Docker & Kubernetes
Deploy GPUFlight in containerized workloads using environment
variables to configure live upload, profiling engine, and
remote configs — no code changes after your application's
initial gpufl::init().
gpufl::init()The environment variables on this page configure an application
that already includes gpufl-client and calls gpufl::init() at
startup. They override InitOptions fields without a rebuild. If
you're integrating from scratch, see the
Installation and
Sending data guides first.
Environment Variables
In containers, gpufl::init() reads its config from GPUFL_* env
vars. To enable upload to the backend, two are required:
| Variable | Purpose |
|---|---|
GPUFL_BACKEND_URL | Backend host (e.g. https://api.gpuflight.com). Host-only. |
GPUFL_API_KEY | Bearer token. |
Upload itself is a separate post-shutdown step — see Sending data to the dashboard for the full guide. In containerized workloads, the recommended pattern is either:
- The app calls
gpufl.upload_logs(...)(Python) orgpufl::uploadLogs(opts)(C++) before exiting, OR - A sidecar runs the
gpufl-agentJVM service against the same log directory and uploads continuously, OR - An init container / lifecycle hook runs
gpufl upload <log_path>after the main workload finishes.
Common optional vars: GPUFL_API_PATH (reverse-proxy mounts),
GPUFL_PROFILING_ENGINE (override engine). Full list and precedence
rules: Environment variable overrides.
Docker
A working example Dockerfile based on the NVIDIA CUDA devel image
(builds gpufl from source so NVML is linked correctly, runs
JupyterLab) lives in the client repo at
example/python/docker/Dockerfile.
It pins to a tagged client release and passes the CMake flags
(-DNVML_LIBRARY=…, -DCUDAToolkit_ROOT=/usr/local/cuda) needed
for reliable NVML detection inside pip's isolated build env.
Basic usage
docker run --gpus all \
-e GPUFL_BACKEND_URL=https://api.gpuflight.com \
-e GPUFL_API_KEY=gpfl_xxxxx \
my-training-image:latest
Your app reads the env vars at gpufl.init(...) time and stores them
on InitOptions; whatever upload step it runs (typically inside
gpufl.session() or an explicit gpufl.upload_logs(...) call before
exit) reads them back to talk to the backend.
Docker Compose
services:
training:
image: my-training-image:latest
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: all
capabilities: [gpu]
environment:
- GPUFL_BACKEND_URL=https://api.gpuflight.com
- GPUFL_API_KEY=gpfl_xxxxx
Toggling upload on / off
Unset GPUFL_API_KEY (or remove the env var entirely) to keep the
session fully offline — file logs are still written, and you can ship
them later with gpufl upload <log_path> once you've got credentials.
Kubernetes
Single Pod (app uploads itself)
The simplest pattern: each instrumented Pod runs its workload, then
calls gpufl.upload_logs(...) / gpufl::uploadLogs() before exiting.
No sidecar required. Good for development clusters and small
deployments.
apiVersion: v1
kind: Pod
metadata:
name: gpu-training
spec:
containers:
- name: training
image: my-training-image:latest
env:
- name: GPUFL_BACKEND_URL
value: "https://api.gpuflight.com"
- name: GPUFL_API_KEY
valueFrom:
secretKeyRef:
name: gpuflight-secret
key: api-key
resources:
limits:
nvidia.com/gpu: 1
Store API Key as a Secret
kubectl create secret generic gpuflight-secret \
--from-literal=api-key=gpfl_xxxxx
DaemonSet (gpufl-agent — sidecar-based upload)
Run gpufl-agent
once per GPU node. It's a JVM (Java 25) sidecar that tails
NDJSON files written by every instrumented Pod on the node and
publishes them via HTTP or Kafka. When using the agent, leave
out any in-process gpufl.upload_logs() calls in the
application Pods — let the agent handle delivery so you don't
double-upload the same files.
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: gpufl-agent
namespace: monitoring
spec:
selector:
matchLabels:
app: gpufl-agent
template:
metadata:
labels:
app: gpufl-agent
spec:
nodeSelector:
nvidia.com/gpu.present: "true"
containers:
- name: agent
image: ghcr.io/gpu-flight/gpufl-agent:latest
env:
- name: GPUFL_SOURCE_FOLDERS
value: "/var/log/gpuflight"
- name: GPUFL_PUBLISHER_TYPE
value: "http"
- name: GPUFL_HTTP_HOST
value: "https://api.gpuflight.com"
- name: GPUFL_HTTP_API_VERSION # defaults to v1; bump for future versions
value: "v1"
- name: GPUFL_HTTP_TOKEN
valueFrom:
secretKeyRef:
name: gpuflight-secret
key: api-key
- name: GPUFL_CURSOR_FILE
value: "/var/lib/gpufl-agent/cursor.json"
volumeMounts:
- name: gpufl-logs
mountPath: /var/log/gpuflight
- name: gpufl-cursor
mountPath: /var/lib/gpufl-agent
resources:
limits:
nvidia.com/gpu: 0 # Agent doesn't need a GPU device.
volumes:
- name: gpufl-logs
hostPath:
# Application Pods on this node mount the same hostPath as
# their NDJSON log destination. The agent picks up new
# content from any Pod that writes here.
path: /var/log/gpuflight
type: DirectoryOrCreate
- name: gpufl-cursor
hostPath:
# Persisted across agent restarts so we resume tailing at
# the right byte offset and never re-upload events.
path: /var/lib/gpufl-agent
type: DirectoryOrCreate
Then on every application Pod, mount the same hostPath and
point gpufl::InitOptions::log_path into it (e.g.
/var/log/gpuflight/${HOSTNAME}). Each run writes into its own
<log_path>/<session_id>/ subdirectory, which the agent
auto-discovers. The Pods don't need
GPUFL_API_KEY — file writes are all the agent needs; the agent
authenticates and uploads on their behalf.
Which pattern to pick
- Single Pod / direct HTTP: simpler. Each Pod authenticates itself, no sidecar. Best for small clusters, dev environments.
- DaemonSet /
gpufl-agent: durable delivery via persisted cursor file (no duplicate or lost events on restart), single egress point per node, one secret to rotate per cluster instead of per Pod, optional Kafka pipeline. Best for production at scale.
See Sending data to the dashboard and the gpufl-agent guide for the full mental model.
Helm Chart
A Helm chart for one-line deployment is on the roadmap. Follow the GitHub repository for updates. Until then, the YAML above is canonical.
Framework-Agnostic
GPUFlight works at the CUDA driver level, so it's compatible with any GPU framework without framework-specific plugins:
- PyTorch
- TensorFlow
- JAX
- RAPIDS
- Custom CUDA/HIP kernels
- Any application that uses NVIDIA CUDA or AMD ROCm
No import gpuflight in your Python code. No framework integrations to configure. Just set the environment variable and GPUFlight observes all GPU activity automatically.
Overhead
GPUFlight's PcSampling engine is designed for always-on deployment. Deep is the opposite, intended for one-off kernel investigation during development and never enabled fleet-wide.
| Engine | Typical overhead |
|---|---|
Monitor (default) | Minimal — no CUPTI |
Trace | Low — kernel timing, no sampling |
PcSampling (production-safe) | Low; safe to run 24/7 |
RangeProfiler | Moderate, per scope |
Deep (development only) | Significant kernel slowdown while the scope is active |
The Deep slowdown is intrinsic to SASS-level instrumentation. The same physics applies to any tool that collects per-instruction execution counts, including NVIDIA Nsight Compute (which addresses it with kernel replay, paying the cost as additional passes instead of slower passes). Use Deep for the specific kernel you are investigating, not for production fleet observability.
Actual numbers vary by hardware generation, driver version, kernel characteristics, and sampling configuration. Benchmark your own workload before committing to a deployment mode.
What's Next
- Scope Profiling Guide - Add optional code annotations for deeper insight
- CUDA Integration Guide - NVIDIA-specific profiling engines and SASS disassembly
- AMD Integration Guide - AMD ROCm and HIP support