Observability Engineer

Tekgence Inc

Toronto | Full-time | Mid Level | On-site

Job Description

We are seeking an experienced Observability Engineer to join our Enterprise Kubernetes Platform Team at a leading financial services organization. This role is responsible for the design, implementation, and operation of the enterprise observability platform supporting over 50 production Kubernetes clusters and the mission-critical applications running on them. As the owner of the observability ecosystem, you will deliver robust monitoring, logging, tracing, and alerting capabilities that enable engineering teams to maintain high levels of reliability, performance, and operational excellence.

You will work closely with platform engineers, application teams, SREs, and infrastructure teams to build scalable observability solutions across on-premises and cloud environments. The ideal candidate brings deep expertise in cloud-native observability technologies, Kubernetes platforms, and automation practices, along with a passion for leveraging AI/ML capabilities to enhance operational intelligence through predictive monitoring, anomaly detection, and self-healing infrastructure.

Required Qualifications

Bachelor's degree in Computer Science, Information Technology, Engineering, or equivalent practical experience.

5+ years of experience in platform engineering, site reliability engineering, DevOps, or observability engineering roles.

Strong hands-on experience with Kubernetes and cloud-native technologies.

Extensive experience with observability tools including Prometheus, Grafana, Thanos, Loki, OpenTelemetry, and distributed tracing solutions.

Experience managing observability platforms at enterprise scale.

Proficiency in Infrastructure as Code tools such as Terraform, Helm, and GitOps frameworks.

Strong scripting and automation skills using Python, Go, Bash, or similar languages.

Experience with cloud platforms such as AWS, Azure, or Google Cloud Platform.

Strong understanding of Linux systems, networking, and distributed systems architecture.

Preferred Qualifications

Experience in financial services, banking, or highly regulated environments.

Knowledge of AI/ML applications in observability and operational intelligence.

Experience with service mesh technologies and cloud-native security practices.

Familiarity with SRE principles, SLOs, error budgets, and incident management processes.

Relevant certifications in Kubernetes, cloud platforms, or observability technologies.

Posted Yesterday
