ahmedhisham.dev · status: operational · uptime: 2y+ in production · last deploy: now
// site reliability engineer

Ahmed Hisham

SRE & Production Engineering · Cairo, Egypt

I keep critical financial systems running at 99.9%+ uptime — on-call rotations, incident response, and CI/CD automation for Java and legacy Oracle/WebLogic stacks. Background in distributed systems and competitive programming.

40% fewer recurring incidents · 50% MTTD reduction · 80% less manual release effort
Software Engineer, SRE & Production Engineering · Allianz · Jul 2024 – Present · Cairo
  • Maintained 99.9%+ uptime for critical financial systems via 24/7 on-call, P1/P2 triage, real-time debugging, and rollback execution.
  • Ran structured RCAs on database contention, CPU spikes, job failures, and integration breakdowns — cut recurring incidents by 40%.
  • Built multi-environment Jenkins CI/CD pipelines for Java apps and Oracle Forms/ADF, automating WebLogic deploys with zero downtime and 80% less manual effort.
  • Built log aggregation and monitoring dashboards across 100+ scheduled jobs — cut MTTD by 50%, MTTR by 35%.
  • Automated repetitive ops work with Python and Bash, removing 70% of manual toil.
Software Engineer, Java Backend · Hits Consulting · Jun 2023 – Jun 2024 · Cairo
  • Built and maintained Spring Boot REST APIs for HR systems (onboarding, leave, payroll) serving 200+ users at 99.5% uptime.
  • Added application-level monitoring via Spring Actuator and custom metrics for request rates, errors, and JVM health.
  • Implemented a Jenkins CI pipeline with Maven, cutting integration errors by 60%.
  • Optimized SQL and JPA/Hibernate mappings, improving key endpoint performance by up to 70%.

Three-Tier Kubernetes App

PostgreSQL + Node.js/Express + React/nginx on Minikube with reverse proxy and persistent volume claims. Debugged layered routing and container networking.

kubernetes · docker · postgres

Cloud-Based Log Monitoring (AWS)

EC2 + CloudWatch + SNS observability stack — sub-2-minute failure detection, automated alerts, recovery validated via simulated crashes.

aws · cloudwatch · observability

Prometheus & Grafana Stack

End-to-end monitoring for HTTP endpoints and MySQL with custom dashboards for latency, throughput, error rate, and threshold alerting.

prometheus · grafana · mysql

Multi-Tier K8s Deployment

Full-stack Java app (Tomcat, MySQL, Nginx) on a self-managed 4-node cluster. Deployments, Services, ConfigMaps, rolling updates, and horizontal scaling.

kubernetes · java · self-managed

CI/CD Pipeline with Jenkins

End-to-end pipeline for a Java app — Git, Maven, and Docker integrated into Jenkins with staged promotion across environments.

jenkins · docker · maven

Infrastructure Automation (Ansible)

Playbooks provisioning multi-tier environments (Tomcat, MySQL, Nginx, RabbitMQ, Memcached) with idempotent, repeatable config across nodes.

ansible · automation

Scalable Payment Processing API

RESTful API on EC2 with RDS/PostgreSQL backend. CloudWatch metrics, health checks, and automated restart scripts for resilience under load.

aws · rds · resilience

Flight Booking Backend

Skyscanner-style REST API in Spring Boot with layered architecture and caching to cut repeated query load and improve response time.

spring boot · caching
Cloud & IaC: AWS (EC2, RDS, CloudWatch, S3, IAM), Terraform, AWS CLI
Orchestration: Kubernetes, Docker, Docker Compose, OpenShift
Observability: Prometheus, Grafana, CloudWatch, log aggregation, MTTD/MTTR tracking
CI/CD & Automation: Jenkins, Ansible, multi-env pipelines
Languages: Java, Python, Bash, PL/SQL
Databases: PostgreSQL, Oracle, MySQL, replication, backup validation
SRE Practices: SLO/SLI/error budgets, on-call, RCA, incident management, toil elimination
Competitive Programming: 7th of 150, Africa & Middle East ICPC (2021) · 4th of 300+, Egyptian Programming Competition