Recruitment Room Team

Platform Engineer (AWS, GitHub Actions, Heroku CI) (JHB)

Johannesburg – Gauteng – South Africa
Application ends: October 1, 2025

Job Description


ENVIRONMENT:
A provider of cutting-edge financial tools in Joburg seeks the technical expertise of a Platform Engineer to manage Heroku pipelines, CI/CD, review apps, and production environments. You will also operate Celery workers and queues, monitor their health, and handle missed task check-ins; manage Cloudflare for DNS, edge security, and performance optimisation; and collaborate with Developers to streamline workflows and educate them on secure coding practices. The ideal candidate must have 3+ years’ experience operating production apps on Heroku, AWS, DigitalOcean, or similar; hands-on experience with CI/CD pipelines such as GitHub Actions, Heroku CI, or equivalent, backed by solid Git fundamentals; and monitoring and incident response experience with Sentry, Papertrail (or similar), logs, and uptime/performance dashboards.
 
DUTIES:
Reliability & Operations –
  • Own uptime, performance, and monitoring for all production applications.
  • Manage Heroku pipelines, CI/CD, review apps, and production environments.
  • Operate Celery workers and queues, monitor their health, and handle missed task check-ins.
  • Define and track service level objectives (SLOs) for availability, latency, and task success rate.
  • Maintain runbooks and a centralised incident-response wiki, and lead post-mortems.
  • Run periodic disaster recovery drills and coordinate penetration tests.
 
Platform Engineering –
  • Keep environments current (Heroku stacks, Postgres/Redis versions, DO/AWS base images).
  • Manage daily backups and ensure that restore tests and disaster recovery runbooks are in place.
  • Standardise infrastructure (Terraform or scripts for DO/AWS; app.json for Heroku).
  • Manage Cloudflare for DNS, edge security, and performance optimisation.
  • Tune performance (DB indices, query optimisation, cache usage, Celery queue design).
  • Optimise infrastructure costs across Heroku, DigitalOcean, and AWS.
 
Developer Experience & CI/CD –
  • Maintain CI pipelines with type checking, linting, and security scanning.
  • Enforce test coverage and automate deploy checks (smoke tests, migration health, error budgets).
  • Support Developers with tooling for local/staging environments and build self-service dashboards (e.g., Celery queue status).
  • Collaborate with Developers to streamline workflows and educate them on secure coding practices.
 
Security & Compliance –
  • Own vulnerability management and dependency patching cadence.
  • Manage access reviews, secrets, MFA/SSO, and enforce least-privilege IAM policies.
  • Implement encryption for data at rest and in transit (e.g., S3 server-side encryption).
  • Contribute evidence and responses for security questionnaires and SOC 2 audits.
  • Maintain a “security pack” with architecture, sub-processors, and DR/backup processes.
 
Monitoring & Alerting –
  • Configure Sentry ownership rules, Cron Monitors, and release health.
  • Centralise metrics/logs (Heroku metrics, Papertrail, Sentry, APM, Prometheus/New Relic).
  • Set up alerts on golden signals (latency, errors, traffic, saturation) and avoid alert fatigue.
  • Conduct capacity planning and track resource usage trends.
 
Vendor & External Services –
  • Evaluate and manage vendor relationships (e.g., Mailgun, Twilio) to ensure that service level agreements (SLAs) are met and performance is maintained.
  • Assess new tools/services to enhance platform capabilities (e.g., observability, security).
  • Track costs, security posture, and integration quality for all third-party services.
 
REQUIREMENTS:
Must-Haves –
  • Cloud Infrastructure Management: 3+ years’ experience operating production apps on Heroku, AWS, DigitalOcean, or similar.
  • CI/CD pipelines: Hands-on experience with GitHub Actions, Heroku CI, or equivalent; solid Git fundamentals.
  • Monitoring & incident response: Experience with Sentry, Papertrail (or similar), logs, and uptime/performance dashboards.
  • Security Fundamentals: Understanding of IAM, encryption in transit/at rest, MFA/SSO, and secure configuration practices.
  • Disaster recovery & backups: Experience implementing and operating automated backups, restore testing, and writing/maintaining incident runbooks.
  • Communication & collaboration: Ability to document processes clearly and work closely with Developers in a small team.
 
Strong Plus –
  • Infrastructure as Code & automation: Experience with Terraform, Docker, or equivalent tooling.
  • Asynchronous workloads: Familiarity with Celery, Redis, or other task queues and message brokers.
  • Scaling & cost optimisation: Capacity planning, performance tuning, and managing infrastructure spend.
  • Compliance frameworks: Exposure to SOC 2, GDPR, or supporting client security questionnaires.
  • Incident management: Participation in on-call rotations, leading post-mortems, or serving as incident commander.
 
Nice-to-Haves –
  • Certifications (AWS Certified DevOps Engineer, CKS, or equivalent).
  • Proficiency in Python; familiarity with Django/Flask.
  • Experience with DNS/CDN/edge security (e.g., Cloudflare).
  • Observability platforms (Prometheus, Grafana, New Relic).
  • Static analysis and code quality tools (mypy, Bandit, SonarQube).
  • Prior exposure to multi-tenant SaaS environments.