Hacking GPU Observability: eBPF & Ephemeral Containers in Action on Kubernetes - Brandon Kang, Akamai Technologies


Abstract

Struggling to observe or secure GPU workloads on Kubernetes? You’re not alone. As AI/ML pipelines scale, ensuring visibility and trust across GPU-accelerated environments becomes increasingly critical. This session dives into how ephemeral containers and eBPF can be combined to troubleshoot, monitor, and protect GPU-based applications—without disrupting production. You’ll learn how to replicate live GPU environments for debugging with ephemeral containers and how eBPF enables real-time kernel-level telemetry, anomaly detection, and zero-trust policy enforcement. We’ll walk through real examples: tracing GPU performance bottlenecks, monitoring unauthorized access to compute resources, and securing container provenance with cryptographic techniques. Whether you’re scaling LLM training or deploying HPC workloads, this session will arm you with modern, production-ready techniques for securing and observing your GPU pipelines.

Sched URL

Video