Headlamp sandbox

Extensible open source multi-cluster Kubernetes user interface

Headlamp Team at KubeCon Europe 2026

Headlamp is heading to Amsterdam! KubeCon + CloudNativeCon Europe 2026 runs March 24-26 (with co-located events starting March 22), and we have plenty going on: conference talks, a Maintainer Summit session, a hands-on ContribFest, and our kiosk at the Project Pavilion. Here's the full rundown.

Talks

This year we have Headlamp-related talks from both core members of the project and the wider community.

Title: Leveling up with Radius: Custom Resources and Headlamp Integration for Real-World Workloads
Speakers: Nuno Guedes (Millennium BCP) and Will Tsai (Microsoft)
Date and time: Tuesday, March 24, 3:15 PM - 3:45 PM CET
Room: Forum
Description: Learn how Millennium bcp extended the Radius framework for production workloads with complex dependencies (Datadog monitors, AI models, internal APIs) and built a Headlamp plugin that visualizes the app graph and maps dependencies across cloud platforms.

Title: How To (Not) Fork Headlamp
Speaker: Joaquim Rocha (Amutable)
Date and time: Thursday, March 26, 11:45 AM - 12:15 PM CET
Room: E106-108
Description: Should you write a plugin or fork the whole project? This talk walks through Headlamp's architecture and plugin system, covers the trade-offs, and shares practical advice for keeping your customizations maintainable either way.

Title: Ping SRE? I Am the SRE! Awesome Fun I Had Drawing a Zine for Troubleshooting Kubernetes Deployments
Speaker: Rene Dudfield (Microsoft)
Date and time: Wednesday, March 25, 4:45 PM - 5:15 PM CET
Room: Hall 7, Room A
Description: Patterns from troubleshooting Kubernetes issues in the Headlamp community turned into a hand-drawn mini zine for diagnosing deployment problems. Come see how a notebook full of doodles became a 16-page troubleshooting guide, and maybe get inspired to draw your own.

Title: Headlamp: Build Kubernetes Experiences Your Way! (ContribFest)
Speakers: Joaquim Rocha (Amutable) and Santhosh Nagaraj (Microsoft)
Date and time: Wednesday, March 25, 11:00 AM - 12:15 PM CET
Room: G107
Description: A hands-on workshop where you'll build a Headlamp plugin with guidance from the maintainers. Whether you're just getting started or already have contributions in flight, this is a great chance to dig in. Bring your laptop!

Co-located: Maintainer Summit

Title: Does Your Project Want a UI in kubernetes-sigs/headlamp?
Speakers: Rene Dudfield and Santhosh Nagaraj (Microsoft)
Date and time: Sunday, March 22, 11:35 AM - 12:10 PM CET
Room: Forum
Description: Headlamp already has UI plugins for projects like cert-manager, Gateway API, Karpenter, and KEDA. In this Maintainer Summit session, the team invites CNCF project maintainers to collaborate on bringing UI support to even more projects.

Project Pavilion

Come say hi at our kiosk (P-6B) in the Project Pavilion (Solutions Showcase, Halls 1-5). We'll be there with live demos, happy to chat about anything Headlamp, on Wednesday, March 25, 10:00 AM - 1:30 PM CET.

Also at KubeCon

Headlamp is also expected to make an appearance in a couple of other sessions throughout the week.

See You There

Whether it's at a talk, the ContribFest, or the Project Pavilion, we'd love to connect. See you in Amsterdam!

OpenFeature incubating

Standardizing Feature Flagging for Everyone

OpenFeature at KubeCon + CloudNativeCon Europe 2026

KubeCon + CloudNativeCon Europe 2026 kicks off next week in Amsterdam, March 23-26, and the OpenFeature community will be there in force. Whether you're looking to catch a talk, chat with maintainers at our booth, or join us for evening drinks, here's everything you need to know.


Sessions at KubeCon

OpenFeature maintainers and contributors are presenting across several sessions throughout the week:


Building Secure Package Pipelines

Speakers: Andre Silva (LexisNexis Risk Solutions)
Date: Sunday, March 22
Time: 16:05 - 16:40 CET
Location: Maintainer Summit -- Forum

Andre will walk through creating a secure package pipeline that any open-source maintainer can achieve -- covering OIDC authentication, automated SBOM generation, cryptographic attestations, and automated releases with Release Please.


OpenFeature Update From the Maintainers

Speakers: Lukas Reining (codecentric AG), Andre Silva (LexisNexis Risk Solutions), Thomas Poignant (Gens de Confiance), Alexandra Oberaigner (Dynatrace)
Date: Wednesday, March 25
Time: 11:45 - 12:15 CET
Location: E103-105

Get the latest updates from the OpenFeature maintainers on what's new in the project and where things are headed. Bring your questions.


18 Bluetooth Controllers Walk into a Bar: Observability & Runtime Configuration with CNCF Tools

Speakers: Simon Schrottner (Dynatrace), Manuel Timelthaler (Tractive)
Date: Thursday, March 26
Time: 15:15 - 15:45 CET
Location: Auditorium -- KubeCon Main Schedule

What happens when your "distributed system" is 18 PlayStation Move controllers on Bluetooth? Simon and Manuel explore observability challenges you'd never expect, using OpenFeature and OpenTelemetry to manage context-aware flags, intelligent sampling, and real-time telemetry in a physical party game.


Visit Our Booth

Stop by and chat with OpenFeature maintainers at our kiosk in the Project Pavilion (Halls 1-5).

Kiosk Number: P-10B

Schedule:

  • Tuesday, March 24: 15:10 - 19:00 CET
  • Wednesday, March 25: 14:00 - 17:00 CET
  • Thursday, March 26: 12:30 - 14:00 CET

We'll be there to answer questions, demo the latest features, and hear how you're using OpenFeature in your organization.


Meet the Community

A number of OpenFeature maintainers and contributors will be in Amsterdam throughout the week, and we're organizing informal evening meetups -- including drinks. If you'd like to join or help coordinate, hop into the #openfeature-kubecon channel on the CNCF Slack. It's a great opportunity to connect face-to-face with the people building the project.


What's New in OpenFeature

A lot has happened since KubeCon NA 2025. Here are some of the highlights:

  • New Governance Committee members -- The 2026 elections have concluded and we're excited to welcome Andre Silva (LexisNexis Risk Solutions), Jonathan Norris (Dynatrace), Maks Osowski (Google), and Thomas Poignant (Gens de Confiance) to the Governance Committee for the 2026-2028 term.

  • CNCF Training (LFS140) -- The official Feature Flagging with OpenFeature course is available on the Linux Foundation Training Platform.

  • C++ SDK -- A new C++ SDK is being bootstrapped with core API surfaces, provider management, and a Bazel build system.

  • OpenFeature CLI -- The CLI continues to mature with a new push command for syncing local flag definitions to remote providers, an improved compare command, and updated code generators.

  • Multi-Provider support across SDKs -- Multi-Provider has shipped or is actively being developed in JS, Java, Go, .NET, Python, Swift, and Kotlin, enabling powerful new use cases for combining multiple flag sources.

  • flagd -- flagd has shipped several releases with major features including multi-project selectors, OAuth support for HTTP sync, array flag configurations, and a new memdb-based flag store.

  • OpenFeature MCP Server -- A new MCP server for integrating feature flags with AI/LLM tooling, now stabilizing with protocol schema updates and reliability improvements.

  • Platform modernization -- .NET SDK upgraded to .NET 10, Angular SDK reached v1.0+, Go SDK moved to Go 1.25, and Python dropped 3.9 support.

  • React <FeatureFlag> component -- The React SDK added a new declarative <FeatureFlag> component for cleaner flag evaluation in React applications.


In-Person Discussions

With so many maintainers and contributors in Amsterdam at the same time, we're organizing informal in-person discussion sessions around key topics for the project. If any of these topics interest you, come find us in the #openfeature-kubecon channel on the CNCF Slack.

Topics on the agenda include:

  • Experimentation support -- advancing experimentation capabilities in OpenFeature
  • flagd changes -- rollout strategies, fractional granularity, and more
  • flagd common evaluation engine -- exploring a shared evaluation engine across flagd languages
  • OFREP -- SSE support and local caching changes
  • AI workflow integrations -- integrating feature flags into AI/ML workflows
  • Expanding the TC and growing maintainers -- building out the Technical Committee and onboarding new maintainers
  • OTel observability -- deeper integration with OpenTelemetry
  • C++ SDK -- roadmap and direction for the new C++ SDK

These are open discussions -- all are welcome regardless of experience level.


We're looking forward to seeing you in Amsterdam. If you haven't registered yet, there's still time:

Register for KubeCon + CloudNativeCon Europe 2026

Follow us on Bluesky, LinkedIn, and join the CNCF Slack to connect with the community. See you in Amsterdam!

Kubernetes graduated

Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications

Securing Production Debugging in Kubernetes

During production debugging, the fastest route is often broad access such as cluster-admin (a ClusterRole that grants administrator-level access), shared bastions/jump boxes, or long-lived SSH keys. It works in the moment, but it comes with two common problems: auditing becomes difficult, and temporary exceptions have a way of becoming routine.

This post offers my recommendations for good practices applicable to existing Kubernetes environments with minimal tooling changes.
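
As one sketch of the direction these practices take (the example below is illustrative, not taken from the post; the role name, namespace, and user are hypothetical), a namespace-scoped read-only debugging role avoids handing out cluster-admin while keeping the audit trail attached to a named individual:

# Hypothetical least-privilege debugging role; names and scope are illustrative
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: debug-readonly
  namespace: payments              # limit access to the namespace under investigation
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/log", "events"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: debug-readonly-oncall
  namespace: payments
subjects:
  - kind: User
    name: oncall@example.com       # a named individual, not a shared account
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: debug-readonly
  apiGroup: rbac.authorization.k8s.io

Deleting the RoleBinding when the incident closes (or creating it just-in-time from tooling) keeps the temporary exception from becoming routine.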

Rook graduated

Rook is an open source cloud-native storage orchestrator, providing the platform, framework, and support for Ceph storage to integrate with cloud-native environments natively. Ceph is a distributed storage system that provides block, file, and object storage and is deployed in large-scale production clusters....

Running Rook at Petabyte Scale Across Multiple Regions

This post describes how SAP’s cloud infrastructure team uses Rook to manage a multi-region Ceph fleet — from bare metal provisioning to rolling upgrades — as part of building a digitally sovereign storage backbone for Europe.

The 120 Petabyte Challenge

When you are responsible for a target of 120 Petabytes of storage across 30 Regions, manual operations don’t scale.

For years, SAP Cloud Infrastructure relied on a mix of proprietary appliances and legacy OpenStack Swift. But as we architected our next-generation cloud stack (internally part of the Apeiro project), we faced a non-negotiable constraint: Digital Sovereignty. Our stack had to be completely free of hyperscaler lock-in, running on our own hardware in our own data centers.

This created a concrete engineering challenge: build a storage layer that is API-first, fully open-source, and capable of self-management at a global scale. We chose Ceph for the storage engine — and Rook for the automation layer that makes it manageable.

Why Rook

Managing Ceph at this scale without an operator would mean building and maintaining custom tooling for OSD lifecycle, daemon placement, upgrade orchestration, and failure recovery across every region. Rook gives us all of this as a declarative Kubernetes-native interface, which means our existing GitOps and CI/CD workflows extend naturally to storage. Instead of writing region-specific runbooks, we write Helm values.
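
As a rough sketch of what that means in practice (the keys below are illustrative, not SAP's actual chart interface), a region overlay can reduce to a small values file layered on the shared base chart:

# region-eu01/values.yaml -- hypothetical overlay on the central base blueprint
cephClusterName: eu01
crush:
  failureDomain: rack              # region-specific CRUSH failure domain
cephBlockPools:
  - name: rbd-premium
    replicated:
      size: 3
rgw:
  instances: 8                     # sized for this region's traffic
  placementTarget: premium-nvme    # illustrative placement rule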

Architecture: The Separation of Metal and Software

Our platform, CobaltCore, is built on top of Gardener and Metal-API, both part of the ApeiroRA reference architecture. In this stack, storage isn’t a static resource — it’s a programmable Kubernetes object. We run storage on dedicated nodes, separate from application workloads. At our density (16 NVMe drives per node), co-locating workloads would create unacceptable I/O interference, so storage nodes do one thing: serve data.

The Metal Layer

Metal-API and Gardener manage the physical lifecycle of bare-metal servers: inventory, provisioning, firmware, and OS deployment. This allows Rook to focus purely on the software layer without worrying about the underlying physical state.

The Declarative Storage Layer (Rook)

Once nodes are handed over, Rook takes control. We use a strict GitOps workflow to ensure consistency across the fleet:

  • Base Blueprint: A central Helm chart defines global best practices and standard Ceph configurations.
  • Region Overlay: Region-specific resources (CephBlockPools, RGW placement rules) are injected via localized values.yaml files.
  • Automation: Rook handles the rest: bootstrapping daemons, configuring CRUSH failure domains, and provisioning RGW endpoints.

Standard Storage Node Spec:

  • Server: Dell PowerEdge R7615
  • CPU: AMD EPYC 9554P (64 cores)
  • RAM: 384 GB
  • Storage: 16x 14 TB NVMe
  • Network: 100 GbE (redundant)

Validation: Establishing the Performance Envelope

Before committing to production capacity planning, we needed to establish the performance envelope of our RGW tier. We ran a breakpoint test on a typical Ceph Squid cluster (28 nodes, 362 OSDs) to find the stable operating range, saturation threshold, and hard ceiling.

Test Setup

  • Workload: 2M objects (4 KB each), 20 k6 clients, single “premium” NVMe bucket.
  • Method: Ramping load over 30 minutes until p90 latency exceeded 500 ms.

Results

Chart: Request rate ramp. Successful requests (pink) peak at 171K ops/sec before the test exits; failed requests (blue) spike briefly near saturation.

  • Saturation Point: The cluster entered saturation around 90K GET/s — latency percentiles begin diverging and request queues start building.
  • Breaking Point: Peak of 171K GET/s (measured on RGW) before the runners hit the latency exit condition.

Note: isolated 503s appeared as early as ~33K GET/s on a single RGW instance, likely caused by uneven load distribution rather than cluster-wide saturation.

Reading the charts

Chart: Client-side latency (k6). Flat near zero through moderate load, stepping up as the cluster approaches saturation, and reaching 1.5s+ at the breaking point.

The client-side latency chart tells the story most directly. Average request duration stays flat near zero well into the ramp — then steps up sharply as the cluster enters saturation and eventually hits its ceiling.

Chart: RGW-side GET latency. All percentiles stay flat and sub-ms through moderate load. Around 90K GET/s, p99 begins climbing while median remains low — a classic saturation signal.

Comparing the two charts reveals where the system saturates. At peak load, RGW reports p99 latency of ~210ms — but clients observe 1.5 seconds. The gap is connection queueing: requests waiting to be picked up by RGW Beast frontend threads. RGW’s internal metrics only measure processing time after a request is accepted, not time spent in the queue. At peak, then, roughly 1.3 of the observed 1.5 seconds is queue wait rather than processing.

The RGW latency chart also shows that RADOS operation latency climbs under load, which means RGW threads stay occupied longer, contributing to the queue buildup. At the breaking point, request queues filled and RGWs began returning 503s across all instances.

This is a read-focused baseline — our primary workload is read-heavy. The saturation point of 90K GET/s gives us a conservative operating ceiling for per-region capacity planning.

Operational Reality: Making Day 2 Uneventful

The true test of any storage system is what happens when things break or need upgrading. At our scale, the goal is to make operations boring.

Zero-Downtime Upgrades

Rook has reduced storage maintenance from a coordinated event to a background task. Since the first cluster went live in May 2024, we have maintained a continuous upgrade cadence with zero customer-facing downtime and zero data loss:

  • GardenLinux: Monthly rolling updates across all regions.
  • Kubernetes: Quarterly version upgrades.
  • Rook: Quarterly upgrades (v1.14 through v1.18), with additional upgrades when a needed feature ships in a new release.
  • Ceph: Major version migration from Reef v18 to Squid v19. A rolling upgrade of the largest cluster (~816 OSDs) completes in approximately 2 days.
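
The Ceph migration illustrates why this stays uneventful: in Rook, a major-version upgrade is a declarative change to the CephCluster resource, after which the operator rolls the daemons in the correct order. A minimal sketch (the image tag is illustrative):

# Sketch: bumping the Ceph image on the CephCluster CR triggers Rook's
# orchestrated rolling upgrade of mons, mgrs, OSDs, and RGWs.
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: quay.io/ceph/ceph:v19.2.0   # was a v18.x (Reef) tag; now Squid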

Drive Failures

With ~2,800 OSDs in the fleet, drive failures are a routine event. When a drive fails, Ceph (RADOS) automatically handles data recovery and rebalancing across the remaining OSDs — no operator action is needed to protect data. On the Kubernetes side, Rook detects the failed OSD pod and manages its lifecycle. The full drive replacement cycle (removing the failed OSD, clearing the device, provisioning a new OSD on the replacement drive) still involves operational steps on our side, but Ceph’s self-healing ensures data durability is never at risk while the replacement is carried out.

Current Status and What’s Next

As of early 2026, the fleet spans 10 live regions (with an 11th newly provisioned):

  • Storage Nodes: 251
  • Total OSDs: ~2,800
  • Raw Capacity: ~37 PiB

Region sizes range from 13-node / 96-OSD deployments to 59-node / 816-OSD clusters — the same Rook-based GitOps workflow handles both.

The next phase is bringing high-performance Block Storage (RBD) into this declarative model to fully retire our remaining proprietary SANs.

  • Target: 30 Regions.
  • Target Capacity: 120 PB.

We are active contributors to the Rook project and continue collaborating with the maintainers as we scale toward these targets. The Rook Slack community has been a valuable resource throughout this journey.

This work is part of ApeiroRA — an open initiative developing a reference blueprint for sovereign cloud-edge infrastructure. All components use enterprise-friendly open-source licenses under neutral community governance. ApeiroRA welcomes participants — whether you want to adopt the blueprints, contribute components, or shape the architecture. Get started at the documentation portal.

Authors: SAP Engineering Team, CLYSO Engineering Team.



Kubernetes graduated

Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications

The Invisible Rewrite: Modernizing the Kubernetes Image Promoter

Every container image you pull from registry.k8s.io got there through kpromo, the Kubernetes image promoter. It copies images from staging registries to production, signs them with cosign, replicates signatures across more than 20 regional mirrors, and generates SLSA provenance attestations. If this tool breaks, no Kubernetes release ships.

Score sandbox

Score is an open-source workload specification designed to simplify development for cloud-native developers.

Score is now onboarded into the Docker-Sponsored Open Source Program

As CNCF Maintainers of the Score project (CNCF Sandbox), we recently embarked on a journey to strengthen our security posture by participating in the Docker-Sponsored Open Source Program. This post shares our experience, learnings, and the tangible security improvements we achieved. Our goal is to inspire others to take advantage of these security best practices by default for their own open source projects, under the CNCF umbrella and beyond.

Flux graduated

Flux is a tool for keeping Kubernetes clusters in sync with sources of configuration (like Git repositories and OCI artifacts), and automating updates to configuration when there is new code to deploy. Flux is built from the ground up to use Kubernetes' API extension system, and to integrate with Prometheus and other core components of the Kubernetes ecosystem....

Blog: Stairway to GitOps: Scaling Flux at Morgan Stanley

One of the things we love most about this community is hearing how you take Flux and run with it - truly solving problems for teams at scale. At our inaugural FluxCon NA, Tiffany Wang and Simon Bourassa from Morgan Stanley gave us a glimpse of their Flux environment.

KServe incubating

Standardized Distributed Generative and Predictive AI Inference Platform for Scalable, Multi-Framework Deployment on Kubernetes

Announcing KServe v0.17 - Production-Ready LLM Serving with LLMInferenceService

Published on March 13, 2026

We are excited to announce the release of KServe v0.17, a landmark release that brings LLMInferenceService to production readiness with a GenAI-first architecture built on the llm-d framework. This release introduces KV-cache aware intelligent routing, disaggregated prefill-decode, distributed inference with tensor/data/expert parallelism, Envoy AI Gateway integration with token-based rate limiting, and a completely restructured modular Helm chart architecture.

🤖 LLMInferenceService: GenAI-First Architecture

KServe v0.17 elevates LLMInferenceService from an experimental feature to a production-ready CRD purpose-built for generative AI workloads. Built on the llm-d framework, LLMInferenceService provides a GenAI-first architecture that goes beyond traditional InferenceService to address the unique challenges of serving large language models at scale.

Unlike InferenceService which is designed for predictive AI workloads, LLMInferenceService natively supports:

  • Distributed inference across multiple nodes and GPUs
  • KV-cache aware scheduling for intelligent request routing
  • Disaggregated prefill-decode for optimal resource utilization
  • Gateway Inference Extension (GIE) integration for advanced traffic management
  • Token-based rate limiting via Envoy AI Gateway

Feature | InferenceService | LLMInferenceService
Primary Use Case | Predictive AI | Generative AI
Routing | Standard Gateway | KV-cache aware with EPP
Parallelism | Worker Spec | TP, DP, EP native support
Prefill-Decode | N/A | Disaggregated separation
Scaling | HPA/KPA | WVA + KEDA

apiVersion: serving.kserve.io/v1alpha2
kind: LLMInferenceService
metadata:
  name: llama3-serving
spec:
  model:
    uri: hf://meta-llama/Llama-3.1-8B-Instruct
    name: meta-llama--Llama-3.1-8B-Instruct
  replicas: 3
  template:
    spec:
      containers:
        - name: vllm
          resources:
            limits:
              nvidia.com/gpu: "1"
  router:
    gateway:
      managed: {}
    route:
      httpRoute: {}
  scheduler:
    pool: {}

This creates a full serving stack including the Deployment, Service, Gateway, HTTPRoute, InferencePool, InferenceModel, and EPP (Endpoint Picker Pod) — all managed by the LLMInferenceService controller.

🚀 Key LLMInferenceService Features in v0.17

🧠 KV-Cache Aware Scheduling with Gateway Inference Extension

LLMInferenceService integrates with Gateway Inference Extension (GIE) v1.3.0, a Kubernetes SIG project that extends the Gateway API with AI-specific routing capabilities. At the heart of this integration is the Endpoint Picker Pod (EPP) from the llm-d inference scheduler, an intelligent scheduler that routes requests based on real-time KV-cache state rather than simple round-robin or random load balancing.

Traditional load balancing treats all LLM inference requests equally, but in practice, requests with similar prompts benefit enormously from being routed to the same pod — because that pod already has the relevant KV cache blocks loaded. The EPP solves this by tracking real-time KV cache states across all vLLM instances via ZMQ events (BlockStored, BlockRemoved) and building an index mapping {ModelName, BlockHash} → {PodID, DeviceTier}.

The scheduling behavior is configured through EndpointPickerConfig, which defines a plugin pipeline with weighted scorers:

apiVersion: inference.networking.x-k8s.io/v1alpha1
kind: EndpointPickerConfig
plugins:
  - type: single-profile-handler
  - type: prefix-cache-scorer
  - type: load-aware-scorer
    parameters:
      threshold: 100
  - type: max-score-picker
schedulingProfiles:
  - name: default
    plugins:
      - pluginRef: prefix-cache-scorer
        weight: 2.0
      - pluginRef: load-aware-scorer
        weight: 1.0
      - pluginRef: max-score-picker

The pipeline uses three types of plugins (see llm-d scheduler architecture for details):

  • prefix-cache-scorer (weight: 2.0): Tracks the actual KV cache contents across all vLLM instances and scores pods based on how many cached prefix blocks match the incoming request's prompt. This reduces Time To First Token (TTFT) by avoiding redundant prefill computation for repeated or similar prompts — particularly beneficial for multi-turn conversations and RAG workloads.
  • load-aware-scorer (weight: 1.0): Scores candidate pods based on their current queue depth. Pods with empty queues score 0.5, while pods with growing queues score progressively lower toward 0. The threshold parameter controls the sensitivity — when queue depth exceeds the threshold, the pod scores near zero.
  • max-score-picker: After all scorers run, selects the pod with the highest weighted aggregate score.

The EndpointPickerConfig can be provided inline in the LLMInferenceService spec or referenced from a ConfigMap, giving platform teams the flexibility to standardize scheduling behavior across deployments:

apiVersion: serving.kserve.io/v1alpha2
kind: LLMInferenceService
metadata:
  name: llama3-with-scheduler
spec:
  model:
    uri: hf://meta-llama/Llama-3.1-8B-Instruct
    name: meta-llama--Llama-3.1-8B-Instruct
  replicas: 4
  template:
    spec:
      containers:
        - name: vllm
          resources:
            limits:
              nvidia.com/gpu: "1"
  router:
    gateway:
      managed: {}
    route:
      httpRoute: {}
  scheduler:
    config:
      ref:
        name: custom-endpoint-picker-config
        key: endpoint-picker-config.yaml
    pool: {}

The GIE CRDs (InferencePool and InferenceModel) are now bundled as part of the KServe installation, simplifying setup.

🔀 Disaggregated Prefill-Decode

LLMInferenceService natively supports disaggregated prefill-decode, which separates the compute-intensive prefill phase from the memory-intensive decode phase into independent workloads. This allows each phase to be scaled and optimized independently.

apiVersion: serving.kserve.io/v1alpha2
kind: LLMInferenceService
metadata:
  name: llama3-prefill-decode
spec:
  model:
    uri: hf://meta-llama/Llama-3.1-8B-Instruct
    name: meta-llama--Llama-3.1-8B-Instruct
  replicas: 2
  template:
    spec:
      containers:
        - name: vllm
          resources:
            limits:
              nvidia.com/gpu: "1"
  prefill:
    replicas: 2
    template:
      spec:
        containers:
          - name: vllm
            resources:
              limits:
                nvidia.com/gpu: "1"
  router:
    gateway:
      managed: {}
    route:
      httpRoute: {}
  scheduler:
    pool: {}

KV cache data is transferred between prefill and decode pods using NixlConnector with RDMA-based RoCE for high-throughput, low-latency block transfers.

📐 Distributed Inference: Tensor, Data, and Expert Parallelism

LLMInferenceService introduces a comprehensive parallelism specification for distributed inference across multiple nodes and GPUs using LeaderWorkerSet:

  • Tensor Parallelism (TP): Splits model layers across GPUs within a node
  • Data Parallelism (DP): Runs multiple model replicas for higher throughput
  • Data-Local Parallelism: Controls GPUs per node for optimal NUMA affinity
  • Expert Parallelism (EP): Distributes Mixture-of-Experts (MoE) model experts across GPUs

apiVersion: serving.kserve.io/v1alpha2
kind: LLMInferenceService
metadata:
  name: llama3-multi-node
spec:
  model:
    uri: hf://meta-llama/Llama-3.1-70B-Instruct
    name: meta-llama--Llama-3.1-70B-Instruct
  replicas: 8
  parallelism:
    tensor: 4
    data: 8
    dataLocal: 4
  template:
    spec:
      containers:
        - name: vllm
          resources:
            limits:
              nvidia.com/gpu: "4"
  worker:
    spec:
      containers:
        - name: vllm
          resources:
            limits:
              nvidia.com/gpu: "4"
  router:
    gateway:
      managed: {}
    route:
      httpRoute: {}
  scheduler:
    pool: {}

🌐 Envoy AI Gateway Integration with Token-Based Rate Limiting

LLMInferenceService integrates with Envoy AI Gateway for AI-native traffic management. This enables token-based rate limiting — a capability critical for LLM serving where request cost varies dramatically based on input and output token counts rather than simple request counts.

apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: llm-route
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: llama3-serving
  llmRequestCosts:
    - metadataKey: llm_input_token
      type: InputToken
    - metadataKey: llm_output_token
      type: OutputToken
    - metadataKey: llm_total_token
      type: TotalToken
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: llm-rate-limit
spec:
  targetRefs:
    - group: aigateway.envoyproxy.io
      kind: AIGatewayRoute
      name: llm-route
  rateLimit:
    type: Global
    global:
      rules:
        - clientSelectors:
            - headers:
                - name: x-user-id
                  type: Distinct
          limit:
            requests: 1000
            unit: Hour
          cost:
            request:
              from: Number
              number: 0
            response:
              from: Metadata
              key: llm_total_token

⚡ Autoscaling API with WVA Support

A new autoscaling API has been added to LLMInferenceService with support for the Workload Variant Autoscaler (WVA), a Kubernetes-based global autoscaler designed specifically for LLM inference workloads. Traditional CPU/memory-based autoscaling is inadequate for LLMs because inference cost is driven by token throughput, KV cache utilization, and queue depth rather than CPU or memory usage.

WVA continuously monitors inference server metrics via Prometheus — specifically KV cache utilization and queue depth — to determine when servers are approaching saturation. It then computes a wva_desired_replicas metric and emits it to Prometheus, where an actuator backend (HPA or KEDA) reads it to drive the actual scaling:

  • WVA + KEDA: Queries Prometheus directly for the wva_desired_replicas metric. Does not require Prometheus Adapter. Supports idle scale-to-zero via idleReplicaCount.
  • WVA + HPA: Reads the wva_desired_replicas metric via Kubernetes Metrics API. Requires Prometheus Adapter. Supports standard HPA scaling behaviors.
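
To make the actuation concrete, the sketch below shows roughly what the KEDA path amounts to. In practice the controller wires this up for you; the workload name, Prometheus address, and metric label here are illustrative assumptions:

# Hypothetical ScaledObject tracking the WVA-computed replica count
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: llama3-wva
spec:
  scaleTargetRef:
    name: llama3-decode                  # hypothetical target workload
  minReplicaCount: 1
  maxReplicaCount: 10
  idleReplicaCount: 0                    # idle scale-to-zero
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090        # assumed address
        query: wva_desired_replicas{variant="llama3-a100-tp4"}  # hypothetical label
        threshold: "1"                   # replicas track the metric value directly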

A key concept in WVA is the variant — a specific deployment configuration (hardware, runtime, parallelism strategy) for serving a model. The same base model might be served by multiple variants: for example, Llama-3 on A100 GPUs with TP=4 is one variant, while Llama-3 on H100 GPUs with TP=2 is another. The variantCost field specifies the relative cost per replica for each variant, enabling WVA to make cost-aware scaling decisions across variants — scaling up the cheaper variant first when demand increases, and scaling down the most expensive variant first when demand decreases.

apiVersion: serving.kserve.io/v1alpha2
kind: LLMInferenceService
metadata:
  name: llama3-wva-autoscaling
spec:
  model:
    uri: hf://meta-llama/Llama-3.1-8B-Instruct
    name: meta-llama--Llama-3.1-8B-Instruct
  scaling:
    minReplicas: 1
    maxReplicas: 10
    wva:
      variantCost: "15.0"
    keda:
      pollingInterval: 30
      cooldownPeriod: 300
      initialCooldownPeriod: 120
      idleReplicaCount: 0
      fallback:
        failureThreshold: 3
        replicas: 2
  template:
    spec:
      containers:
        - name: vllm
          resources:
            limits:
              nvidia.com/gpu: "1"
  router:
    gateway:
      managed: {}
    route:
      httpRoute: {}
  scheduler:
    pool: {}

In the example above, variantCost: "15.0" indicates the relative cost of running each replica of this variant. If another variant of the same model has variantCost: "5.0", WVA would prefer to add capacity on that cheaper variant before scaling up this one. The default value is "10.0" if not specified. When using the KEDA backend, the fallback field ensures the deployment maintains a minimum replica count (here, 2 replicas) even if the metrics pipeline fails — a critical safety net for production LLM deployments.

🔧 Scheduler High Availability

The LLMInferenceService scheduler (EPP) now supports scaling and high availability, allowing multiple EPP replicas for production deployments that require fault tolerance and higher routing throughput.

🛡️ CRD Webhook Validation

LLMInferenceService now includes CRD webhook validation with comprehensive E2E tests, providing early feedback on invalid configurations before they reach the controller. This catches errors in parallelism settings, workload specifications, and router configurations at admission time.

📋 Configuration Composition with LLMInferenceServiceConfig

LLMInferenceService supports a configuration composition model through LLMInferenceServiceConfig, enabling reusable templates that can be shared across multiple LLMInferenceService resources. The merge order follows:

  1. Well-Known Configs → 2. Explicit BaseRefs → 3. LLMInferenceService Spec

This allows platform teams to define standardized vLLM worker templates, router/scheduler configurations, and resource defaults while giving application teams the ability to override specific settings.
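
A hedged sketch of the pattern (the baseRefs field name reflects our reading of the composition model; check the KServe docs for the authoritative schema):

# Reusable base config plus a service that layers on top of it
apiVersion: serving.kserve.io/v1alpha2
kind: LLMInferenceServiceConfig
metadata:
  name: vllm-standard-worker
spec:
  template:
    spec:
      containers:
        - name: vllm
          resources:
            limits:
              nvidia.com/gpu: "1"
---
apiVersion: serving.kserve.io/v1alpha2
kind: LLMInferenceService
metadata:
  name: team-a-llama3
spec:
  baseRefs:                              # explicit bases, merged before this spec
    - name: vllm-standard-worker
  model:
    uri: hf://meta-llama/Llama-3.1-8B-Instruct
    name: meta-llama--Llama-3.1-8B-Instruct
  replicas: 2                            # application-team override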

📦 Additional LLMInferenceService Improvements

  • Label and annotation propagation to downstream workload resources (#5009)
  • Prometheus annotation propagation to workloads for metrics collection (#5086)
  • Certificate management with DNS/IP SAN and automatic renewal for self-signed certs (#5099)
  • Improved CA bundle management for secure communication (#4803)
  • Optional storageInitializer — skip model download when using pre-loaded models (#4970)
  • InferencePool auto-migration for seamless upgrades (#5007)
  • Route-only completions through InferencePool for chat/completion endpoints (#5087)
  • Startup probes for vLLM containers for more reliable health monitoring (#5063)
  • vLLM arguments migrated to command field for cleaner configuration (#5049)
  • Versioned well-known config resolution for stable config management (#5096)
  • Scheduler config via ConfigMap or inline for flexible configuration (#4856)
  • Pod init container failure monitoring for better observability (#5034)
  • Preserve externally managed replicas during reconciliation (#4996)
  • Allow stopping LLMInferenceService gracefully (#4839)
  • Enhanced Gateway API URL discovery with listener hostname fallback (#5104, #5079)

🏗️ Modular Component Architecture

KServe v0.17 introduces a fundamental architectural shift toward modular, component-based deployment. KServe now consists of three independent components:

  • kserve (core): Manages InferenceService, ServingRuntime, ClusterServingRuntime, InferenceGraph, and TrainedModel CRDs.
  • llmisvc: The LLMInferenceService controller for generative AI workloads, managing LLMInferenceService and LLMInferenceServiceConfig CRDs.
  • localmodel (optional): The LocalModel controller for efficient model caching with LocalModelCache, LocalModelNode, and LocalModelNodeGroup CRDs.

Combination | Use Case | Components
KServe Only | Predictive AI | kserve
KServe + LLMIsvc | Predictive AI + Generative AI | kserve + llmisvc
Full Stack | Predictive AI + Generative AI + Model Caching | kserve + llmisvc + localmodel

Helm Chart Restructuring

To support the new component architecture, the Helm charts have been completely restructured from a single chart into 10 independent Helm charts:

CRD Charts (6 charts with full and minimal variants):

  • kserve-crd / kserve-crd-minimal
  • kserve-llmisvc-crd / kserve-llmisvc-crd-minimal
  • kserve-localmodel-crd / kserve-localmodel-crd-minimal

Resource Charts (4 charts):

  • kserve-resources (renamed from kserve)
  • kserve-llmisvc-resources (new)
  • kserve-localmodel-resources (new)
  • kserve-runtime-configs (new — manages ClusterServingRuntimes and LLMIsvcConfigs)

Warning: this is a breaking change. Users upgrading from v0.16 cannot use a simple helm upgrade command. Please follow the detailed upgrade guide for step-by-step migration instructions. We strongly recommend testing the upgrade in a non-production environment first.

For fresh installations, the new Kustomize component-based architecture also provides composable deployment options via standalone overlays, addon overlays, and all-in-one overlays. See the installation concepts for details.
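
For illustration only (the overlay and component paths below are hypothetical; the installation concepts page documents the real layout), composing components with Kustomize might look like:

# kustomization.yaml -- paths are placeholders for the real overlays
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - github.com/kserve/kserve/config/overlays/standalone?ref=v0.17.0   # core (hypothetical path)
components:
  - github.com/kserve/kserve/config/components/llmisvc?ref=v0.17.0    # addon (hypothetical path)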

🔧 InferenceService and Platform Improvements

Storage Performance

  • Parallelized blob downloads from Azure and S3 for faster model loading (#4709, #4714)
  • Faster parallel S3 downloads with configurable file selection (#5102, #5119)
  • Git repository support for downloading models directly from Git repos via HTTPS (#4966)

New Serving Runtimes

  • OpenVINO Model Server — Intel's optimized inference runtime for high-performance serving on Intel hardware (#4592)
  • PredictiveServer runtime with full build/publish infrastructure and E2E testing (#4954)

Gateway & Routing

  • Gateway API upgraded to v1.4.0 (#5038)
  • PathTemplate configuration for flexible inference service routing (#4817)

vLLM Backend

  • Upgraded to vLLM v0.15.1 with performance improvements (#5098)
  • Removed Python 3.9 support (#4851)

Additional Enhancements

  • CSV and Parquet marshallers for expanded data format support (#5115)
  • Event loop configuration with new --event_loop flag supporting auto, asyncio, and uvloop (#4971)
  • Annotation-based runtime defaults for MLServer (#5064)
  • INFERENCE_SERVICE_NAME environment variable exposed to serving containers (#5013)
  • Failure condition surfacing in InferenceService status (#5114)
  • Inference log batching with external marshalling support (#5061)

Infrastructure Updates

  • Kubernetes packages bumped to v0.34.0
  • Knative Serving updated to v1.21.1
  • Go updated to 1.25
  • Kubebuilder updated to 1.9.0
  • KEDA bumped from 2.16.1 to 2.17.3
  • MinIO replaced with SeaweedFS for testing infrastructure

🔒 Security Fixes

Multiple security vulnerabilities have been addressed:

  • CVE-2025-62727 (Starlette)
  • CVE-2025-22872, CVE-2025-47914, CVE-2025-58181
  • CVE-2024-43598 (LightGBM updated to 4.6.0)
  • CVE-2025-43859 (h11 HTTP parsing)
  • CVE-2025-66418 (decompression chain)
  • CVE-2025-68156 (expr-lang/expr)
  • CVE-2026-26007 (cryptography subgroup attack)
  • CVE-2026-24486 (python-multipart arbitrary file write)
  • Path traversal vulnerabilities in https.go and tar extraction

🔍 Release Notes

For the complete list of all 167 merged pull requests, bug fixes, and known issues, visit the GitHub release pages.

🙏 Acknowledgments

We extend our gratitude to all 38+ contributors who made this release possible, including 21 first-time contributors. Your efforts continue to drive the advancement of KServe as a leading platform for serving machine learning models.

  • Core Contributors: The KServe maintainers and regular contributors
  • Community: Everyone who reported issues, provided feedback, and tested features
  • New Contributors: Welcome to all first-time contributors who helped shape this release

🤝 Join the Community

We invite you to explore the new features in KServe v0.17 and contribute to the ongoing development of the project.

Happy serving!


The KServe team is committed to making machine learning model serving simple, scalable, and standardized. Thank you for being part of our community!

Kubewarden sandbox

Kubewarden is a Policy Engine powered by WebAssembly policies. Its policies can be written in CEL, Rego (OPA & Gatekeeper flavours), Rust, Go, YAML, and others....

SBOMscanner 0.10 Release

The Kubewarden ecosystem continues to expand its supply chain security capabilities! Hot on the heels of the Admission Controller 1.33 release, we are excited to announce SBOMscanner v0.10.0. This release introduces powerful new features and critical stability fixes. Let’s dive in!
Workload Scan

Until now, SBOMscanner required explicit Registry configurations to scan images. However, what usually matters most are the images actively running in your cluster. The new Workload Scan feature automatically discovers and scans those running images.

Kubernetes graduated

Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications

Announcing the AI Gateway Working Group

The community around Kubernetes includes a number of Special Interest Groups (SIGs) and Working Groups (WGs) facilitating discussions on important topics between interested contributors. Today, we're excited to announce the formation of the AI Gateway Working Group, a new initiative focused on developing standards and best practices for networking infrastructure that supports AI workloads in Kubernetes environments.

Kubewarden sandbox

Kubewarden is a Policy Engine powered by WebAssembly policies. Its policies can be written in CEL, Rego (OPA & Gatekeeper flavours), Rust, Go, YAML, and others....

Admission Controller 1.33 Release

The garden is thriving and Kubewarden 1.33 is ready to bloom! Following last release’s big repotting, this one is serious about pruning, including a security issue. It’s not all housekeeping though, fresh flowers are blooming and come with nice features: BYO-PKI landing in the policy-server, field mask filtering for context-aware calls, proxy support, and a few more treats. Let’s dig in!
Security fix: Cross-namespace data access, removal of deprecated API calls

CoreDNS graduated

CoreDNS-1.14.2 Release

This release adds the new proxyproto plugin to support Proxy Protocol and preserve client IPs behind load balancers. It also includes enhancements such as improved DNS logging metadata and stronger randomness for loop detection (CVE-2026-26018), along with several bug fixes including TLS+IPv6 forwarding, improved CNAME handling and rewriting, allowing jitter disabling, prevention of an ACL bypass (CVE-2026-26017), and a Kubernetes plugin crash fix. In addition, the release updates the build to a newer version of Go.

KServe incubating

Standardized Distributed Generative and Predictive AI Inference Platform for Scalable, Multi-Framework Deployment on Kubernetes

Best of Both Worlds: Cloud-Native AI Inference at Scale using KServe and llm-d

Enterprises today seek to integrate generative AI (GenAI) capabilities into their applications. However, scaling large AI models introduces complexity: managing high-volume traffic from large language models (LLMs), optimizing inference performance, maintaining predictable latency, and controlling infrastructure costs.

Platform engineering leaders require more than just model deployment capabilities. They need a robust, Kubernetes-native infrastructure that supports:

  • Efficient GPU utilization
  • Intelligent request routing
  • Distributed inference patterns
  • Cost-aware autoscaling
  • Production-grade governance

This article demonstrates how two open-source solutions, KServe and llm-d, can be combined to address these challenges.

We explore the role of each solution, illustrate their integration architecture, and provide practical guidance for AI platform teams, with deeper focus on KServe's LLMInferenceService, available since KServe v0.16.

KServe: Simplified Deployment of AI Models on Kubernetes

KServe is a Kubernetes-based model serving platform that simplifies deploying and managing ML models, including LLMs, at scale.

For platform engineers, KServe acts as the model serving control plane: the layer responsible for lifecycle, scaling, and operational governance.

KServe Generative Inference Architecture

Inference as a Service

InferenceService serves as KServe's core abstraction for model deployment, encapsulating the full serving lifecycle, including:

  • Automatic deployment creation and reconciliation
  • Request-based autoscaling with scale-to-zero and autoscaling based on custom metrics
  • Revision management and canary rollouts
  • Endpoint exposure and traffic routing
  • Runtime abstraction across serving backends for both predictive and generative AI
  • Optional pre-processing/post-processing, inference pipelines, and ensembles

ML engineers provide trained models. Platform engineers retain operational control without writing custom deployment code.

LLMInferenceService in KServe

KServe v0.16 introduces stronger generative AI capabilities, including LLMInferenceService, designed specifically for large language model workloads.

Unlike traditional stateless predictors, LLM workloads require:

  • Long-running streaming responses
  • GPU-heavy memory footprints
  • Prefix KV-cache management
  • High-concurrency token streaming
  • OpenAI-compatible APIs

LLMInferenceService shares common foundations with InferenceService but introduces additional capabilities tailored for large language models, described in the sections below.

Unlocking Generative AI Serving with LLMInferenceService: From Pod-Level Speed to Cluster-Wide Intelligence

Imagine you want to bring the power of generative AI directly into your applications, but without rewriting your entire stack. LLMInferenceService offers OpenAI-compatible endpoints like /v1/chat/completions, complete with streaming token responses and multi-turn support. With prompt templating built in, developers can integrate seamlessly with existing tools—whether it's the OpenAI SDKs, LangChain, LlamaIndex, Llama Stack, RAG frameworks, or even enterprise GenAI gateways.

Under the hood, KServe connects to LLM-optimized runtimes such as vLLM, Hugging Face TGI, or other GPU-native backends. These engines bring advanced capabilities like continuous batching, memory-efficient paged attention, and KV-cache reuse, delivering high throughput per GPU.

Yet, while these runtime-level optimizations make each pod lightning fast, true cluster-wide efficiency needs more. That's exactly the role of llm-d: adding an extra layer of intelligence that orchestrates resources and maximizes performance across the entire deployment.

Distributed & Multi-Node Model Support

LLMInferenceService supports advanced parallelism strategies implemented by runtimes, including tensor parallelism, pipeline parallelism, and multi-GPU sharding.

This enables hosting 70B+ parameter models, partitioning models across nodes, and serving models larger than single-GPU memory.

KServe orchestrates the deployment topology, while the runtime manages execution parallelism.

Advanced Autoscaling & Networking (Including Scale-to-Zero)

KServe integrates deeply with Kubernetes to support request- and concurrency-based autoscaling via Knative, GPU-backed scaling, and scale-to-zero for cost control.

It also integrates with the Kubernetes Gateway API for TLS termination, traffic splitting, and advanced routing.

This makes it suitable for development environments, internal copilots, and large-scale production workloads.
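
A minimal sketch of these knobs on an InferenceService (model location and numbers are illustrative):

# Concurrency-based autoscaling with scale-to-zero on the predictor
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
spec:
  predictor:
    minReplicas: 0            # scale to zero when idle
    maxReplicas: 5
    scaleMetric: concurrency  # scale on in-flight requests
    scaleTarget: 10           # target concurrent requests per replica
    model:
      modelFormat:
        name: sklearn
      storageUri: gs://kfserving-examples/models/sklearn/1.0/model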

Kubernetes Gateway API Integration

KServe integrates with Kubernetes Gateway API for:

  • Enterprise-grade routing
  • TLS termination
  • Traffic splitting
  • Multi-model routing

This enables integration with modern Kubernetes networking stacks.

Where KServe Alone Is Not Enough

Even with LLMInferenceService and optimized runtimes, KServe does not inherently:

  • Route requests based on KV-cache locality across replicas
  • Separate prefill and decode cluster-wide
  • Perform SLA-aware routing decisions
  • Optimize GPU utilization across multiple pods

To address these, we introduce llm-d.

llm-d: Distributed Intelligence for LLM Inference

llm-d is a Kubernetes-native distributed inference framework designed to enhance performance and efficiency of LLM workloads.

If KServe is the control plane for models, llm-d is the distributed intelligence scheduling layer.

llm-d Architecture

KV-Cache Aware Scheduling and Disaggregated Inference with llm-d

As LLM deployments mature, scaling is no longer just about adding GPUs. It's about using them intelligently. Modern runtimes such as vLLM introduced prefix (KV) caching to reduce redundant computation, but without smart scheduling, much of that benefit is lost.

This is where llm-d changes the game.

Disaggregated Inference (Prefill / Decode Separation)

LLM inference consists of two distinct phases: prefill and decode. The prefill phase is compute-heavy, processing the full prompt and building the model's attention context. The decode phase is latency-sensitive, generating tokens step by step where responsiveness directly impacts user experience.

llm-d separates these phases across different GPU groups, assigning compute-optimized resources to prefill and latency-optimized resources to decode. With intelligent scheduling between them, workloads are aligned to the right hardware profile.

This phase-aware architecture increases GPU utilization, reduces tail latency, and lowers cost per token by eliminating resource contention between fundamentally different workloads.

Intelligent Inference Scheduler

llm-d's inference scheduler evaluates the following metrics:

  • GPU utilization
  • Queue depth
  • Cache residency
  • SLA constraints
  • Load distribution

Using these signals, the scheduler decreases serving latency and increases throughput through prefix-cache aware routing, utilization-based load balancing, fairness and prioritization for multi-tenant serving, and predicted latency balancing.

KServe LLMInferenceService and llm-d

Responsibility Separation

This layered design ensures composability and specialization, providing a complete, production-ready solution for generative AI. KServe acts as the control plane and LLMInferenceService delivers the generative API abstraction, while llm-d provides the cluster-wide optimization.

Layer | Responsibility
KServe | Model lifecycle, scaling, governance
LLMInferenceService | Generative API abstraction
vLLM | Efficient execution inside runtime
llm-d | Cross-runtime routing & cache awareness
Kubernetes | Resource orchestration

Together, KServe and llm-d enable a production-ready, Kubernetes-native inference platform that balances scalability, performance, and cost efficiency, providing the best of both worlds for cloud-native AI inference at scale.

Cost Efficiency Comparison: Naive vs Optimized

Serving LLMs at scale is no longer just a model problem. It is a distributed systems problem, one where naive load balancing leads to significant inefficiencies and wasted resources.

Naive Problems:

  • Cache locality loss
  • GPU imbalance
  • Redundant prefill processing
  • High tail latency
  • Overprovisioned GPUs

Optimized Architecture with KServe + llm-d

The combined KServe and llm-d solution introduces distributed intelligence to solve the problems of naive architectures, delivering superior performance, scalability, and cost control. This optimized architecture is pluggable and extensible to work well with many AI and cloud-native technologies.

KServe Layered Architecture

Benefits:

  • Cache reuse preserved
  • Balanced GPU utilization
  • Reduced recomputation
  • Lower cost per token
  • Controlled autoscaling via LLMInferenceService

Benchmark Results: Why Cluster-Level Intelligence Matters

By integrating llm-d's cache-aware routing, prefill and decode disaggregation, and SLA-based scheduling with KServe's enterprise-grade generative serving and autoscaling, the system achieves cluster-wide GPU optimization.

Note: The following results are based on benchmarks published by the llm-d project.

Optimization Area | Naive Architecture (Round Robin LB) | Optimized (KServe + llm-d) | Source
Cache Locality | Requests routed randomly → KV cache frequently missed | Cache-aware routing preserves prefix locality | llm-d blog
Time to First Token (P90) | Baseline latency under cache-blind scheduling | Up to ~57× faster P90 TTFT in benchmark | llm-d blog
Token Throughput | ~4,400 tokens/sec (baseline test cluster) | ~8,730 tokens/sec (~2× improvement) | llm-d blog
Throughput at Scale | Degrades under multi-tenant load | Sustained 4.5k–11k tokens/sec | llm-d blog
Tail Latency (P95/P99) | Higher tail latency due to stragglers & imbalance | ~50% tail latency reduction (reported tests) | Red Hat Developers
GPU Utilization | Uneven utilization, idle GPUs possible | Improved effective utilization via routing intelligence | llm-d docs
Autoscaling Control | Scale reacts to load only | Works with KServe autoscaling + routing intelligence | KServe docs

Modern GenAI platforms require cache locality awareness, phase-aware scheduling, distributed intelligence, and composable Kubernetes-native design. This combination ensures a production-ready system that meets the demands of large-scale production workloads.

Next Steps

Explore the detailed project documentation.

Engage with community resources and Slack channels to stay updated and contribute to ongoing developments.

Score sandbox

Score is an open-source workload specification designed to simplify development for cloud-native developers.

Score at KubeCon EU 2026 in Amsterdam

After three KubeCons (Salt Lake City in 2024, London in 2025, and most recently Atlanta in 2025), Score will be well represented in Amsterdam for its fourth KubeCon as a CNCF Sandbox project.
This year’s updates and community achievements mark another exciting milestone, and we’re eager to connect with the cloud-native community to showcase how Score is evolving.
Here are three opportunities to hear more about Score and meet its maintainers at this year’s KubeCon.

Kubernetes graduated

Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications

Before You Migrate: Five Surprising Ingress-NGINX Behaviors You Need to Know

As announced in November 2025, Kubernetes will retire Ingress-NGINX in March 2026. Despite its widespread usage, Ingress-NGINX is full of surprising defaults and side effects that are probably present in your cluster today. This blog highlights these behaviors so that you can migrate away safely and make a conscious decision about which behaviors to keep. This post also compares Ingress-NGINX with Gateway API.

Dapr graduated

The Distributed Application Runtime (Dapr) provides APIs that simplify microservice architecture development and increases developer productivity. Whether your communication pattern is service-to-service invocation or pub/sub messaging, Dapr helps you write resilient and secured microservices....

Dapr v1.17 is now available

We’re excited to announce the release of Dapr 1.17! This release introduces workflow versioning, giving you the tools to safely evolve long-running workflow code without breaking in-flight instances. Combined with new state retention policies, up to 41% higher workflow throughput, and end-to-end tracing, Dapr Workflows are now ready for the most demanding production workloads. This release also stabilizes the Bulk PubSub API, improves Placement service resilience, and adds new CLI commands.

Kairos sandbox

Transform any Linux system into a secure, customizable, and easily managed platform for edge computing with or without Kubernetes.

Kairos release v4.0.0

Kairos v4.0.0 is the result of a clear architecture path we have been building over time: first standardizing image creation with kairos-init, then making that flow easy to run anywhere with Kairos Factory and kairos-factory-action, and finally introducing Hadron.

In March 2025, we introduced kairos-init, a foundational shift that removed Dockerfile-heavy distro logic and standardized how we transform OCI bases into Kairos images.

From there, we focused on operationalizing that model so anyone could run it. With Kairos Factory, users can build and maintain their own Kairos pipelines using the same tooling we run in production.

In December 2025, we introduced Hadron, our upstream-first Linux base for immutable systems. In v4, Hadron artifacts are what the project publishes in its release flow.

At the same time, distro flexibility remains core to Kairos. This is visible in active community work such as Oracle Linux support in kairos-io/kairos#3987.

For additional migration and build context, read Hadron-Only Artifacts with Ongoing Distro Support.

What v4 delivers in practice

The key technical reassurance in v4 is compatibility in the build pipeline. Between v3.7.2 and v4.0.0, the kairos-init version did not change.

This matters because kairos-init is where the core Kairos transformation components are pinned. When that set of components is unchanged, build behavior remains compatible in practice across releases. In practical terms, if your non-Hadron flavor pipeline built successfully with v3.7.2, it should continue to build with v4.0.0 under the same assumptions.

For the complete technical details of v4.0.0, see the release notes: kairos v4.0.0.

How to switch to v4

There are two official paths to adopt Kairos v4, and the right one depends on your environment and goals.

One path is to use Hadron-based images, which are now published as part of the project release flow.

The other path is to keep your current distro flavor and build your own v4 release pipeline. With Kairos Factory, you can build and publish your flavor using the same production tooling model we use ourselves.

For migration guidance and build details, read: Hadron-Only Artifacts with Ongoing Distro Support.

Kairos sandbox

Transform any Linux system into a secure, customizable, and easily managed platform for edge computing with or without Kubernetes.

Hadron-Only Artifacts with Ongoing Distro Support

Starting with Kairos v4, the Kairos project will publish Hadron-based artifacts only.

This decision allows us to reduce maintenance costs, focus engineering effort where it has the most impact, and ship improvements faster.

More context in the original issue: #3806.

What is changing

Official artifacts published by the Kairos project for v4 and later will be based on Hadron.

What is not changing

Support for different Linux distributions remains one of the core features of Kairos.

The kairos-init component, which is responsible for converting those distributions into Kairos variants, continues to validate a wide distro matrix: 8 different distros and multiple releases.

You can see the exact test matrix here: kairos-init distro test matrix.

Build your own distro pipeline

If you are using GitHub, building your own release pipeline for a specific distribution is straightforward with kairos-factory-action.

It is mostly a matter of setting a few parameters, and it is exactly how we still build one-distro pipelines today: Kairos release workflow example.

If you are not using GitHub, no problem: you can build directly with AuroraBoot (see the AuroraBoot reference docs).

Why we believe this is the right move

We understand this may introduce some inconvenience for users who currently rely on project-published artifacts for multiple distros.

At the same time, this change helps us keep Kairos sustainable and gives us more time to focus on features, quality, and platform evolution.

From our side, the overall release matrix grew beyond what we could reliably sustain. Managing multiple distributions across provider variants and Kubernetes combinations (k0s and k3s, each with multiple versions) pushed us to more than 500 artifacts per release cycle.

At that scale, keeping pipelines, CI, and verification checks consistently healthy became increasingly difficult, and the risk of missing artifact signatures or specific version outputs was no longer acceptable.

It also aligns with the Kairos ethos: giving end users control over their own release cadence and distribution flexibility. We do not want users to be blocked by our release timing when their priorities are different. If a critical CVE appears in a component that matters to your environment, you should be able to rebuild and release on your own schedule instead of waiting for our next release window.

Just as importantly, different teams need different upgrade policies. Some users may want kernel or base OS patch updates without moving to a newer k3s version. Others may want to avoid agent bumps and only roll selected OS fixes. This shift is about making that level of control practical, so each team can decide what to update, when to update it, and how fast to promote it through their own pipeline.

Feedback and support

If you have questions, issues, or feedback, please reach out:

Open an issue on GitHub: https://github.com/kairos-io/kairos/issues/new/choose.

Join the CNCF Slack (#kairos): https://slack.cncf.io/.

If you or your team plan to publicly build and distribute Kairos for a specific distro, please let us know. We would love to coordinate and help identify the best way to promote you.

Backstage incubating

Backstage is an open platform for building developer portals, which unify all your infrastructure tooling, services, and documentation with a single, consistent UI.

Get a jump on ContribFest with the new web app

Become a Contrib Champ and join us at ContribFest, where commits become legendary!

We are once again hosting ContribFest at KubeCon + CloudNativeCon. This time around, it's taking place in Amsterdam on March 26, 2026, at 13:45 CET — make sure to add it to your schedule. Learn more about what to expect below and get started now by exploring the new ContribFest web app.

Introducing the ContribFest web app

We're excited to announce the new ContribFest web app: https://contribfest.backstage.io/. The app simplifies local setup and helps you quickly find good issues to work on from the curated list pre-selected by your ContribFest co-hosts.

You'll see that the app is broken down into five sections:

  • Welcome: This is where you'll find links to all the things, including the session's slide deck, assignment sheet, the Backstage and Community Plugins repositories, and their respective contribution guides.
  • Getting Started: Whether you are new to Backstage or an old hand, use this handy checklist to help you get your local environment set up for contributing, including all the commands. (Make sure you check all the boxes, you never know what might happen! 😉)
  • Curated Issues: This is what you come to the session for: finding an issue that speaks to you and contributing towards it. This section has a list of issues that we've curated — and filters, so you can slice and dice the list to find the perfect issue to work on.
  • Contrib Champs: We've hosted three other ContribFests in the past — this is where you'll find merged PRs from those sessions, a place to celebrate contributions. Make sure to tag your PRs with “ContribFest”, and maybe your name will show up here one day, too! 🏆
  • Hall of Hosts: ContribFest would not take place without the various community members who have stepped up to help co-host the sessions. This is where you'll see an honor roll of past co-hosts. 🙏

About those Contrib Champs

The goals of the Backstage ContribFest sessions are many — foster community, work with experts, etc. — but it's pretty obvious that contributions are the most important. It's in the name after all. Here are a few past contributions that we wanted to share to give you an idea of what that looks like:

  • #27694 by hyb175 — Add Pagination to Tech Docs Table: for those with lots of entities with TechDocs, this is a massive performance improvement.
  • #29470 by ioboi — Openshift Auth provider: this allows those using OpenShift to use it to sign into their Backstage instance.
  • #31770 by theZMC — Render HTML in GitHub-flavored Markdown: with this change in place, HTML will now render correctly in the MarkdownContent component when you are using the GitHub-flavored Markdown mode.

Check out the Contrib Champs page to see the full list!

Using Dev Containers

Along with the new ContribFest web app, we are also offering Dev Containers this time around to streamline setup for those who'd like to use that option. On the Getting Started page, pick the Dev Containers radio button and then follow the checklist. To give you a quick preview, you'll need to have the following installed:

  • Git, which you'll need to pull down the code
  • Docker Desktop (or Docker Engine on Linux)
  • VS Code with the Dev Containers extension or IntelliJ IDEA Ultimate

Check out our Dev Containers tutorial for a deeper dive into the subject.

Amsterdam, here we come!

On behalf of the Backstage ContribFest co-host team, thank you for following along. We're looking forward to meeting you in Amsterdam and working together on your contributions. Please be sure to introduce yourself!

Flux graduated

Flux is a tool for keeping Kubernetes clusters in sync with sources of configuration (like Git repositories and OCI artifacts), and automating updates to configuration when there is new code to deploy. Flux is built from the ground up to use Kubernetes' API extension system, and to integrate with Prometheus and other core components of the Kubernetes ecosystem....

Blog: Announcing Flux 2.8 GA

We are thrilled to announce the release of Flux v2.8.0! In this post, we highlight some of the new features and improvements included in this release.

Highlights

Flux v2.8 comes...

Istio graduated

Simplify observability, traffic management, security, and policy with the Istio service mesh.

Ambient multi-network multicluster support is now Beta

Our team of contributors has been busy throughout the transition to 2026. A lot of work went into getting multi-network multicluster support for ambient to a production-ready state. Improvements span everything from our internal tests to the most-requested multi-network multicluster capabilities in ambient, with a big focus on telemetry.

Gaps in Telemetry

The benefits of a multicluster distributed system are not without their tradeoffs. Some complexity is inevitable...

Meshery sandbox

As a self-service engineering platform, Meshery enables collaborative design and operation of cloud and cloud native infrastructure.

Meshery Workspaces

In modern cloud-native environments, platform engineering teams face a persistent challenge: how do you enable multiple teams to collaborate on infrastructure designs, share best practices, and maintain consistency across diverse deployment environments—all while preserving team autonomy and security boundaries?

Enter Meshery Workspaces: a powerful organizational construct designed to facilitate exactly this kind of cross-team collaboration. Workspaces provide isolated, multi-tenant environments where teams can design, test, and share cloud-native infrastructure patterns without stepping on each other’s toes.

What Are Meshery Workspaces?

At their core, Workspaces are logical boundaries within Meshery that allow you to organize your infrastructure designs, deployments, and environments. Think of them as collaborative project spaces where:

  • Teams maintain separate environments for development, staging, and production
  • Designs and configurations are scoped to specific organizational units or projects
  • Access controls ensure that only authorized team members can view or modify resources
  • Shared catalog items can be imported and customized without affecting the source

Each Workspace operates as an independent namespace, complete with its own set of:

  • Designs: Visual topology configurations for your infrastructure
  • Environments: Kubernetes clusters and cloud provider connections
  • Credentials: Securely stored access tokens and keys
  • Team members: Role-based access control for collaborators

Why Platform Engineers Love Workspaces

Platform engineering is fundamentally about creating self-service capabilities that empower development teams while maintaining organizational standards. Workspaces excel at this by providing:

1. Isolation with Collaboration

Workspaces give each team their own sandbox to experiment and iterate, while still allowing platform engineers to share golden paths and reference architectures across the organization.

Organization
├── Platform Team Workspace
│   ├── Reference architectures
│   ├── Approved patterns
│   └── Baseline configurations
├── Backend Team Workspace
│   ├── Microservices infrastructure
│   └── API gateway configurations
└── Data Team Workspace
    ├── Data pipeline designs
    └── Analytics infrastructure

2. Design Reusability

Rather than reinventing the wheel, teams can clone proven designs from Meshery’s public Catalog or from other internal Workspaces. This accelerates development while ensuring consistency with organizational standards.

3. Environment Management

Platform engineers can configure multiple Kubernetes clusters and cloud environments within a Workspace, making it trivial to promote designs from dev to staging to production with confidence.

4. Audit and Compliance

Every action within a Workspace is tracked, providing clear visibility into who made what changes and when. This audit trail is invaluable for compliance and troubleshooting.

Real-World Example: Multi-Team Microservices Platform

Let’s walk through a practical scenario. Imagine you’re a platform engineer at a company building a microservices platform. You have:

  • A Platform Team maintaining infrastructure standards
  • Multiple application teams deploying microservices
  • A security team enforcing policies

Here’s how you’d leverage Workspaces:

Step 1: Create the Golden Path Workspace

The Platform Team creates a Workspace containing reference architectures:

  • Service mesh baseline (Istio with mTLS enabled)
  • Observability stack (Prometheus, Grafana, Jaeger)
  • Ingress patterns with rate limiting
  • Database operator configurations

These designs are marked as “approved” and published to your organization’s internal catalog section.

Step 2: Application Teams Clone and Customize

The Backend Team needs to deploy a new microservice. Instead of starting from scratch:

  1. They navigate to the Meshery Catalog
  2. Find the “Service Mesh Baseline” design from the Platform Team’s published patterns
  3. Click “Clone to Workspace”
  4. The design appears in their Workspace, ready for customization

Now they can:

  • Adjust resource limits for their specific workload
  • Add application-specific sidecars
  • Configure custom routing rules
  • Deploy to their dev environment for testing

Step 3: Iterate with Confidence

As the Backend Team refines their design:

  • Changes are isolated to their Workspace
  • They can validate configurations against Kubernetes and OPA policies
  • The visual designer shows real-time topology updates
  • Once tested, they promote to staging and production environments

Step 4: Share Back to the Organization

After proving their pattern works, the Backend Team can publish their customized design back to the internal catalog with a description like “High-Throughput API Service Pattern.” Now other teams benefit from their learnings.

Workspace Features That Drive Adoption

Visual Design Collaboration

Meshery’s visual designer is Workspace-aware, meaning multiple team members can collaborate on the same infrastructure design in real-time, similar to Figma for infrastructure-as-code.

Environment Promotion

Workspaces support multi-environment workflows, allowing you to:

  • Test designs in a sandbox cluster
  • Validate in staging with production-like data
  • Deploy to production with a single click (with appropriate RBAC checks)

Team Management

Fine-grained role-based access control ensures:

  • Owners can manage Workspace settings and membership
  • Editors can modify designs and environments
  • Viewers can browse designs without making changes

Integration with GitOps

Workspaces integrate seamlessly with GitOps workflows:

  • Export designs as Kubernetes manifests or Helm charts
  • Commit to your Git repository
  • Let your CI/CD pipeline apply changes
  • Meshery tracks drift between desired and actual state

Try It Yourself: Clone a Design from the Catalog

Ready to experience the power of Workspaces? Here’s a hands-on challenge:

  1. Sign up for Meshery Cloud (free tier available) or install Meshery locally
  2. Navigate to the Catalog at meshery.io/catalog
  3. Find a design that interests you
  4. Click “Clone to Workspace” - this imports the design into your personal Workspace
  5. Open the Visual Designer and explore the topology:
    • See how components are connected
    • Adjust configurations to match your environment
    • Add or remove resources as needed
  6. Connect your Kubernetes cluster (if you haven’t already)
  7. Deploy the design to your environment with a single click
  8. Monitor the deployment through Meshery’s built-in observability features

Within minutes, you’ll have a production-ready infrastructure pattern running in your cluster, customized to your needs. That’s the power of Workspaces and the Catalog working together.

Best Practices for Workspace Organization

Based on patterns we’ve seen from successful platform engineering teams:

Structure by Team and Environment

├── platform-team-workspace
│   ├── Environment: prod-cluster
│   ├── Environment: staging-cluster
│   └── Designs: Reference architectures
├── backend-team-dev-workspace
│   ├── Environment: dev-cluster-1
│   └── Designs: Experimental features
└── backend-team-prod-workspace
    ├── Environment: prod-cluster
    └── Designs: Production deployments

Establish Naming Conventions

  • Workspaces: <team>-<environment>-workspace
  • Designs: <service>-<version>-<purpose>
  • Environments: <cluster-name>-<region>

Implement a Promotion Workflow

  1. Develop and test in team dev Workspace
  2. Promote stable designs to team staging Workspace
  3. After validation, promote to team prod Workspace
  4. Share successful patterns to organization-wide Workspace

Leverage Teams and RBAC

  • Create Teams that span multiple Workspaces
  • Grant minimum necessary permissions
  • Use Viewer role for cross-team visibility
  • Reserve Owner role for platform engineers

The Future of Collaborative Infrastructure

Meshery Workspaces represent a fundamental shift in how platform engineering teams approach infrastructure management. By combining:

  • Visual design tools that make complex topologies understandable
  • Collaborative features that break down silos
  • Reusable patterns through the Catalog
  • Strong isolation with flexible sharing

Workspaces empower organizations to move faster without sacrificing control. They enable the kind of self-service infrastructure that development teams crave, while giving platform engineers the governance capabilities they need.

Get Started Today

Whether you’re managing infrastructure for a small startup or orchestrating deployments across a large enterprise, Meshery Workspaces can transform how your teams collaborate.

Clone a design, customize it in your Workspace, and deploy it to your cluster today. Experience firsthand how Workspaces can accelerate your platform engineering efforts while maintaining the control and visibility your organization demands.

Have questions about Workspaces or want to share your use case? Join us in the Meshery community - we’d love to hear from you!

Kubewarden sandbox

Kubewarden is a Policy Engine powered by WebAssembly policies. Its policies can be written in CEL, Rego (OPA & Gatekeeper flavours), Rust, Go, YAML, and others....

Not affected by cross-ns privilege escalation via policy api call

Why Kubewarden is not affected by CVE-2026-22039 The recent vulnerability CVE-2026-22039 is doing the rounds in the Kubernetes security community, with dramatic titles such as “How an admission controller vulnerability turned Kubernetes namespaces into a security illusion”. You can read about people doubting admission controllers, claiming they have too much power or represent too high-value a target.
In this blogpost, we reassure Kubewarden users that they aren’t affected...

Prometheus graduated

metrics-based monitoring and alerting

Modernizing Prometheus: Native Storage for Composite Types

Over the last year, the Prometheus community has been working hard on several interesting and ambitious changes that previously would have been seen as controversial or not feasible. While these changes might have little visibility from the outside (e.g., it's not an OpenClaw Prometheus plugin, sorry 🙃), Prometheus developers are, organically, steering Prometheus toward a certain, coherent future. Piece by piece, we unexpectedly get closer to goals we never dreamed we would achieve as an open-source project!

This post is (hopefully!) the start of a series of blog posts sharing a few ambitious shifts that might be exciting to new and existing Prometheus users and developers. In this post, I'd love to focus on the idea of native storage for composite types, which tidies up a lot of challenges that have piled up over time. Make sure to check the inline links on how you can adopt some of those changes early or contribute!

CAUTION: Disclaimer: This post is intended as a fun overview, from my own personal point of view as a Prometheus maintainer. Some of the mentioned changes haven't (yet) been officially approved by the Prometheus Team; some have not yet been proven in production.

NOTE: This post was written by humans; AI was used only for cosmetic and grammar fixes.

Classic Representation: Primitive Samples

As you might know, the Prometheus data model (so server, PromQL, protocols) supports gauges, counters, histograms and summaries. OpenMetrics 1.0 extended this with gaugehistogram, info and stateset types.

Impressively, for a long time Prometheus' TSDB storage implementation had a deliberately clean and simple data model. The TSDB allowed the storage and retrieval of string-labelled primitive samples containing only float64 values and int64 timestamps. It was completely metric-type-agnostic.
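
Schematically, the classic model can be sketched in a couple of lines (a toy illustration in Python, not actual TSDB code):

    # Toy sketch: per string-labelled series, the classic TSDB stores
    # nothing but (int64 millisecond timestamp, float64 value) samples.
    series_labels = {"__name__": "foo_total", "job": "a"}
    samples = [(1700000000000, 16.0), (1700000015000, 17.0)]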

Metric types were implied on top of the TSDB, interpreted by humans and by best-effort tooling such as PromQL. For simplicity, let's call this way of storing types the classic model or representation. In this model:

We have primitive types:

  • gauge is a "default" type with no special rules, just a float sample with labels.

  • counter that should have a _total suffix in the name for humans to understand its semantics.

    foo_total 17.0
    
  • info that needs an _info suffix in the metric name and always has a value of 1.

We have composite types. This is where the fun begins. In the classic representation, composite metrics are represented as a set of primitive float samples:

  • histogram is a group of counters with certain mandatory suffixes and le labels:

    foo_bucket{le="0.0"} 0
    foo_bucket{le="1e-05"} 0
    foo_bucket{le="0.0001"} 5
    foo_bucket{le="0.1"} 8
    foo_bucket{le="1.0"} 10
    foo_bucket{le="10.0"} 11
    foo_bucket{le="100000.0"} 11
    foo_bucket{le="1e+06"} 15
    foo_bucket{le="1e+23"} 16
    foo_bucket{le="1.1e+23"} 17
    foo_bucket{le="+Inf"} 17
    foo_count 17
    foo_sum 324789.3
    
  • gaugehistogram, summary, and stateset types follow the same logic – a group of special gauges or counters that compose a single metric.

The classic model served the Prometheus project well. It significantly simplified the storage implementation, enabling Prometheus to be one of the most optimized open-source time-series databases, with distributed versions based on the same data model available in projects like Cortex, Thanos, and Mimir.

Unfortunately, there are always tradeoffs. This classic model has a few limitations:

  • Efficiency: It tends to yield overhead for composite types because every new piece of data (e.g., new bucket) takes precious index space (it's a new unique series), whereas samples are significantly more compressible (rarely change, time-oriented).
  • Functionality: It poses limitations to the shape and flexibility of the data you store (unless we'd go into some JSON-encoded labels, which have massive downsides).
  • Transactionality: Primitive pieces of composite types (separate counters) are processed independently. While we did a lot of work to ensure write isolation and transactionality for scrapes, transactionality completely breaks apart when data is received or sent via the remote write or OTLP protocols. For example, a classic histogram foo might be sent partially, with its foo_bucket{le="1.1e+23"} 17 counter series delayed or accidentally dropped, which risks triggering false-positive alerts.
  • Reliability: Consumers of the TSDB data have to essentially guess the type semantics. There's nothing stopping users from writing a foo_bucket gauge or foo_total histogram.

A Glimpse of Native Storage for Composite Types

The classic model was challenged by the introduction of native histograms. The TSDB was extended to store composite histogram samples in addition to floats. We tend to call these native histograms, because the TSDB can now "natively" store a full (sparse, exponential-bucket) histogram as an atomic, composite sample.
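
To make the difference concrete (a generic PromQL example, not tied to any particular metric): a classic histogram quantile is computed from its per-bucket counter series, while a native histogram is queried by its bare metric name:

    # Classic histogram: aggregate the per-bucket counters by "le".
    histogram_quantile(0.9, sum by (le) (rate(foo_bucket[5m])))
    # Native histogram: the composite sample carries its buckets itself.
    histogram_quantile(0.9, sum(rate(foo[5m])))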

At that point, the common wisdom was to stop there. The special advanced histogram that's generally meant to replace "classic" histograms uses a composite sample, while the rest of the metrics use the classic model. Making other composite types consistent with the new native model felt extremely disruptive to users, with too much work and risk. A common counter-argument was that users would eventually migrate their classic histograms naturally, and that summaries are less useful anyway, given the more powerful bucketing and lower cost of native histograms.

Unfortunately, the migration to native histograms was known to take time, given the slight PromQL change required to use them and the new bucketing and client changes needed (applications have to define new metrics or edit existing ones). There will also be old software that stays in use for a long time and never migrates. This leaves Prometheus with no chance of deprecating classic histograms, and with all the software solutions required to support the classic model, likely for decades.

However, native histograms did push TSDB and the ecosystem into that new composite sample pattern. Some of those changes could be easily adapted to all composite types. Native histograms also gave us a glimpse of the many benefits of that native support. It was tempting to ask ourselves: would it be possible to add native counterparts of the existing composite metrics to replace them, ideally transparently?

Organically, in 2024, for transactionality and efficiency, we introduced the native histogram custom buckets (NHCB) concept, which essentially allows storing classic histograms with explicit buckets natively, reusing the native histogram composite sample data structures.

NHCB has proven to be at least 30% more efficient than the classic representation, while offering functional parity with classic histograms. However, two practical challenges emerged that slowed down adoption:

  1. Expanding, that is, converting from NHCB to a classic histogram, is relatively trivial, but combining, that is, turning a classic histogram into NHCB, is often not feasible. Because we don't want to wait for client-ecosystem adoption, and mindful of legacy, hard-to-change software, we envisioned NHCB being converted (so combined) on scrape from the classic representation. That has proven to be somewhat expensive on scrape. Additionally, the combination logic is practically impossible when receiving "pushes" (e.g., remote write with classic histograms), as you could end up having different parts of the same histogram sample (e.g., buckets and count) sent via different remote write shards or sequential messages. This combination challenge is also why OpenTelemetry collector users see extra overhead in prometheusreceiver, as the OpenTelemetry model strictly follows the composite sample model.

  2. Consumption is slightly different, especially in the PromQL query syntax. Our initial decision was to surface NHCB histograms using a native-histogram-like PromQL syntax. For example, take the following classic histogram:

    foo_bucket{le="0.0"} 0
    # ...
    foo_bucket{le="1.1e+23"} 17
    foo_bucket{le="+Inf"} 17
    foo_count 17
    foo_sum 324789.3
    

    When we convert this to NHCB, you can no longer use foo_bucket as your metric name selector. Since NHCB is now stored as a foo metric, you need to use:

    histogram_quantile(0.9, sum(foo{job="a"}))
    
    

    Old syntax: histogram_quantile(0.9, sum(foo_bucket{job="a"}) by (le))

    This also has another effect: it violates our "what you see is what you query" rule for the text formats, at least until OpenMetrics 2.

    On top of that, similar problems occur on other Prometheus outputs (federation, remote read, and remote write).

NOTE: Fun fact: Prometheus client data model (SDKs) and PrometheusProto scrape protocol use the composite sample model already!

Transparent Native Representation

Let's get straight to the point. Organically, the Prometheus community seems to align with the following two ideas:

  • We want to eventually move to a fully composite sample model on the storage layer, given all the benefits.
  • Users need to be able to switch (e.g., on scrape) from the classic to the native form in storage without breaking the consumption layer. Essentially, to ease non-trivial migration pains (finding out who uses what, double-writing, synchronizing), to avoid tricky dual-mode protocol changes, and to deprecate the classic model ASAP for the sustainability of the Prometheus codebase, we need to ensure eventual consumption migration (e.g., PromQL queries) can happen independently of the storage layer.

Let's go through evidence of this direction, which also represents efforts you can contribute to or adopt early!

  1. We are discussing the "native" summary and stateset to fully eliminate the classic model for all composite types. Feel free to join and help on that work!

  2. We are working on OpenMetrics 2.0 to consolidate and improve the pull protocol scene and apply the new learnings. One of the core changes will be the move to composite values in text, which makes the text format trivial to parse for storages that support composite types natively. This solves the combining challenge. Note that, by default, for now, all composite types will still be "expanded" to the classic format on scrape, so there's no breaking change for users. Feel free to join our WG to help or give feedback.

  3. The Prometheus receive and export protocols have been updated. Remote Write 2.0 allows transporting histograms in the "native" form instead of the classic representation (the classic one is still supported). In future versions (e.g., 2.1), we could easily follow a similar pattern and add native summaries and statesets. Contributions are welcome to make Remote Write 2.0 stable!

  4. We are experimenting with consumption compatibility modes that translate composite types stored as composite samples back to the classic representation. This is not trivial and there are edge cases, but it might be more feasible (and needed!) than we initially anticipated.

    In PromQL it might work as follows, for an NHCB that used to be a classic histogram:

    # New syntax gives our "foo" NHCB:    
    histogram_quantile(0.9, sum(foo{job="a"}))
    # Old syntax still works, expanding "foo" NHCB to classic representation:
    histogram_quantile(0.9, sum(foo_bucket{job="a"}) by (le))
    

    Alternatives, like a special label or annotations, are also discussed.

When implemented, it should be possible to fully switch different parts of your metric collection pipeline to native form transparently.

Summary

Moving Prometheus to a native composite type world is not easy and will take time, especially around coding, testing, and optimizing. Notably, it switches the performance characteristics of the metric load from uniform, predictable sample sizes to sample sizes that depend on the type. Another challenge is code architecture - maintaining different sample types has already proven to be very verbose (we need unions, Go!).

However, recent work revealed a very clean and feasible path that yields clear benefits around functionality, transactionality, reliability, and efficiency in the relatively near future, which is pretty exciting!

If you have any questions around these changes, feel free to:

  • DM me on Slack.
  • Visit the #prometheus-dev Slack channel and share your questions.
  • Comment on related issues, create PRs, also review PRs (the most impactful work!)

The Prometheus community will also be at KubeCon EU 2026 in Amsterdam!

I'm hoping we can share stories of other important, orthogonal shifts we see in the community in future posts. No promises (and help welcome!), but there's a lot to cover, such as (random order, not a full list):

  1. Our native start timestamp feature journey that cleanly unblocks native delta temporality without "hacks" like reusing gauges, a separate layer of metric types, or label annotations, e.g., __temporality__.
  2. Optional schematization of Prometheus metrics that attempts to solve a ton of stability problems with metric naming and shape, building on top of OpenTelemetry semconv.
  3. Our metadata storage journey that attempts to improve the OpenTelemetry Entities and resource attributes storage and consumption experience.
  4. Our journey to organize and extend Prometheus scrape pull protocols with the recent ownership move of OpenMetrics.
  5. An incredible TSDB Parquet effort, coming from the three LTS project groups (Cortex, Thanos, Mimir) working together, attempting to improve high-cardinality cases.
  6. Fun experiments with PromQL extensions, like PromQL with pipes and variables and some new SQL transpilation ideas.
  7. Governance changes.

See you in open-source!

Kubernetes graduated

Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications

Spotlight on SIG Architecture: API Governance

This is the fifth interview in a SIG Architecture Spotlight series covering its different subprojects; this time, we cover SIG Architecture: API Governance.

In this SIG Architecture spotlight we talked with Jordan Liggitt, lead of the API Governance sub-project.

Introduction

Confidential Containers sandbox

Confidential Containers is an open source community working to enable cloud native confidential computing by leveraging Trusted Execution Environments to protect containers and data.

Deploy Trustee in Kubernetes

Introduction

In this blog, we’ll be going through the deployment of Trustee, the Key Broker Service that provides keys/secrets to clients that want to execute workloads confidentially. Trustee provides a built-in attestation service that complies with the...

Kubewarden sandbox

Kubewarden is a Policy Engine powered by WebAssembly policies. Its policies can be written in CEL, Rego (OPA & Gatekeeper flavours), Rust, Go, YAML, and others....

Kubewarden 1.32 Release

Another year rolls around, and Kubewarden is still growing like a well-watered houseplant! Kubewarden made a New Year’s resolution to tidy up and repot, and has gone all in on digital gardening. This release is a maintenance one, with big moves to monorepos and a refresh of release artifacts.
New Admission Controller monorepo With the addition of SBOMscanner to the Kubewarden harvest, we saw a great opportunity for cleanup on the Admission Controller side.

Kubeflow incubating

Kubeflow is the foundation of tools for AI Platforms on Kubernetes.

Introducing the Metaflow-Kubeflow Integration

A tale of two flows: Metaflow and Kubeflow

Metaflow is a Python framework for building and operating ML and AI projects, originally developed and open-sourced by Netflix in 2019. In many ways, Kubeflow and Metaflow are cousins: closely related in spirit, but designed with distinct goals and priorities.

Metaflow emerged from Netflix’s need to empower data scientists and ML/AI developers with developer-friendly, Python-native tooling, so that they could iterate quickly on ideas, compare modeling approaches, and ship the best solutions to production without heavy engineering or DevOps involvement. On the infrastructure side, Metaflow started with AWS-native services like AWS Batch and Step Functions, later expanding to provide first-class support for the Kubernetes ecosystem and other hyperscaler clouds.

In contrast, Kubeflow began as a set of Kubernetes operators for distributed TensorFlow and Jupyter Notebook management. Over time, it has evolved into a comprehensive Cloud Native AI ecosystem, offering a broad set of tools out of the box. These include Trainer, Katib, and Spark Operator for orchestrating distributed AI workloads, Workspaces for interactive development environments, Hub for AI catalog and artifact management, KServe for model serving, and Pipelines for deploying end-to-end ML workflows and stitching Kubeflow components together.

Over the years, Metaflow has delighted end users with its intuitive APIs, while Kubeflow has delivered tons of value to infrastructure teams through its robust platform components. This complementary nature of the tools motivated us to build a bridge between the two: you can now author projects in Metaflow and deploy them as Kubeflow Pipelines, side by side with your existing Kubeflow workloads.

Why Metaflow → Kubeflow

In the most recent CNCF Technology Radar survey from October 2025, Metaflow got the highest positive scores in the “likelihood to recommend” and “usefulness” categories, reflecting its success in providing a set of stable, productivity-boosting APIs for ML/AI developers.

Metaflow spans the entire development lifecycle—from early experimentation to production deployment and ongoing operations. To give you an idea, the core features below illustrate the breadth of its API surface, grouped by project stage:

Development

Scaling

Deployment

  • Maintain a clear separation between experimentation, production, and individual developers through namespaces.

  • Adopt CI/CD and GitOps best practices through branching.

  • Compose large, reactive systems through isolated sub-flows with event triggering.

These features provide a unified, user-facing API for the capabilities required by real-world ML and AI systems. Behind the scenes, Metaflow is built on integrations with production-quality infrastructure, effectively acting as a user-interface layer over platforms like Kubernetes - and now, Kubeflow. The architecture diagram in the original post illustrates the division of responsibilities.

The key benefit of the Metaflow–Kubeflow integration is that it allows organizations to keep their existing Kubernetes and Kubeflow infrastructure intact, while upgrading the developer experience with higher-level abstractions and additional functionality, provided by Metaflow.

Currently, the integration supports deploying Metaflow flows as Kubeflow Pipelines. Once you have Metaflow tasks running on Kubernetes, you can access other components such as Katib and Trainer from those tasks through their Python clients as usual, as sketched below.
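
For instance, a step might query Katib through its Python SDK (an illustrative sketch only; it assumes the kubeflow-katib package is installed, and client method names may differ across SDK versions):

    from metaflow import FlowSpec, step

    class KatibAwareFlow(FlowSpec):

        @step
        def start(self):
            # Assumed API: KatibClient from the kubeflow-katib SDK; method
            # names may differ across SDK versions.
            from kubeflow.katib import KatibClient

            client = KatibClient()
            # List the tuning experiments visible to this task.
            for experiment in client.list_experiments(namespace="kubeflow"):
                print(experiment.metadata.name)
            self.next(self.end)

        @step
        def end(self):
            pass

    if __name__ == "__main__":
        KatibAwareFlow()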

Metaflow → Kubeflow in practice

As the integration requires no changes in your existing Kubeflow infrastructure, it is straightforward to get started. You can deploy Metaflow in an existing cloud account (GCP, Azure, or AWS) or you can install the dev stack on your laptop with a single command.

Once you have Metaflow and Kubeflow running independently, you can install the extension that provides the integration (following the instructions in the documentation):

pip install metaflow-kubeflow

The only configuration needed is to point Metaflow at your Kubeflow Pipelines service, either by adding the following line in the Metaflow config or by setting it as an environment variable:

METAFLOW_KUBEFLOW_PIPELINES_URL = "http://my-kubeflow"

After this, you can author a Metaflow flow as usual and test it locally:

python flow.py run

which runs the flow quickly as local processes. If everything looks good, you can deploy the flow as a Kubeflow pipeline:

python flow.py kubeflow-pipelines create

This will package all the source code and dependencies of the flow automatically, compile the Metaflow flow into Kubeflow Pipelines YAML, and deploy it to Kubeflow, where you can see it alongside your existing pipelines in the Kubeflow UI. A screencast in the original post shows the process in action.
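
For reference, the kind of flow.py deployed above can be as small as this (an illustrative sketch using the standard Metaflow API; the step contents are hypothetical):

    from metaflow import FlowSpec, step

    class HelloFlow(FlowSpec):

        @step
        def start(self):
            # Artifacts assigned to self are tracked and passed between steps.
            self.message = "hello from Metaflow"
            self.next(self.end)

        @step
        def end(self):
            print(self.message)

    if __name__ == "__main__":
        HelloFlow()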

The integration doesn’t have 100% feature coverage yet: some Metaflow features, such as conditional and recursive steps, are not yet supported. In future versions, we may also provide additional convenience APIs for other Kubeflow components, such as KServe - or you can easily implement them yourself as custom decorators with the Kubeflow SDK!

If you want to learn more about the integration, you can watch the announcement webinar on YouTube.

Feedback welcome!

Like Kubeflow, Metaflow is an open-source project actively developed by multiple organizations — including Netflix, which maintains a dedicated team working on Metaflow, and Outerbounds, which provides a managed Metaflow platform deployed in customers’ own cloud environments.

The Metaflow community convenes at the Metaflow Slack. We welcome you to join, ask questions, and give feedback about the Kubeflow integration, and share your wishlist items for the roadmap. We are looking forward to a fruitful collaboration between the two communities!