Vitess graduated

MySQL-compatible, horizontally scalable, cloud-native database solution.

Announcing Vitess 24

The Vitess maintainers are happy to announce the release of version 24.0.0, along with version 2.17.0 of the Vitess Kubernetes Operator.
Version 24.0.0 expands query serving capabilities for sharded keyspaces, modernizes Vitess's observability stack, and introduces faster replica provisioning through native MySQL CLONE support. The companion v2.17.0 operator release brings significant improvements to scheduled backups, with new cluster- and keyspace-level schedules that ma

Flux graduated

Flux is a tool for keeping Kubernetes clusters in sync with sources of configuration (like Git repositories and OCI artifacts), and automating updates to configuration when there is new code to deploy. Flux is built from the ground up to use Kubernetes' API extension system, and to integrate with Prometheus and other core components of the Kubernetes ecosystem....

Blog: Bootstrapping Flux with Terraform, the right way

This post introduces a new Terraform module (fully compatible with OpenTofu) that bootstraps Flux Oper

Confidential Containers sandbox

Confidential Containers is an open source community working to enable cloud native confidential computing by leveraging Trusted Execution Environments to protect containers and data.

Extending Trustee Key Broker Service with Remote Plugins

Confidential Computing provides hardware-backed isolation and remote attestation, ensuring workloads execute inside trusted execution environments (TEEs) with verifiable integrity.

Building on this, Confidential Containers extends these guarantees to containerised workloads, integrating TEE-based isolation with cloud-native orchestration so that both the workload and its data remain protected from the underlying infrastructure.

One of the core components of the Confidential Cont

Kubewarden sandbox

Kubewarden is a Policy Engine powered by WebAssembly policies. Its policies can be written in CEL, Rego (OPA & Gatekeeper flavours), Rust, Go, YAML, and others....

Admission Controller 1.35 Release

This Admission Controller 1.35 release is one that builds the nest properly: load-bearing branches first, then careful weaving. A moderate security vulnerability has been fixed, and rather than a quick twig stuffed in a gap, the team reinforced the whole structure. This release also brings a new policy, an expansion of our threat model, and a JavaScript/TypeScript SDK relocation.
Security fix: RBAC reconnaissance and host capability calls
Kubewarden makes the following security promise:

KServe incubating

Standardized Distributed Generative and Predictive AI Inference Platform for Scalable, Multi-Framework Deployment on Kubernetes

Production-Grade LLM Inference at Scale with KServe, llm-d, and vLLM

Everyone is racing to run Large Language Models (LLMs), in the cloud, on-prem, and even on edge devices. The real challenge, however, isn't the first deployment; it's scaling, managing, and maintaining hundreds of LLMs efficiently. We initially approached this challenge with a straightforward vLLM deployment wrapped in a Kubernetes StatefulSet.

The Problem with "Simple" LLM Deployments

The approach quickly introduced severe operational bottlenecks:

  • Storage Drag: Models like Llama 3 can easily reach hundreds of gigabytes in size. Relying on sluggish network storage (NFS) for these massive safetensors was a non-starter.
  • Infrastructure Lock-in: Switching to local LVM persistent volumes solved the speed problem but created a rigid node-to-pod affinity. A single hardware failure meant a manual intervention to delete the Persistent Volume Claim (PVC) and reschedule the pod, which is an unacceptable burden for day-2 operations.
  • Naive Load Balancing: Beyond the looming retirement of NGINX Ingress Controller, a simple round-robin load-balancing strategy is fundamentally inefficient for LLMs. It fails to utilize the critical KV-cache on the GPU, a core feature of vLLM that significantly boosts throughput. In a world where GPU costs are paramount, squeezing efficiency out of every core is non-negotiable.

What We Needed from an Operator

Running LLMs at scale demanded a purpose-built Kubernetes Operator designed for the intricacies of AI/ML. After evaluating the landscape, we identified a clear set of requirements:

  • Full spec-level customization: We needed the ability to override the runtime specification beyond what typical Custom Resources expose — tailoring vLLM flags for specialized hardware and rapid iteration.
  • Flexible deployment patterns: Rather than being locked into a single prefill/decode architecture, we needed an operator that could adapt to our evolving serving topologies.
  • Standard Kubernetes API integration: The solution had to work with the Kubernetes API surface we already knew, not introduce an entirely new abstraction layer.

The Winning Combination: KServe + llm-d + vLLM

Figure: KServe architecture

Our journey led us back to the most flexible and powerful solution: llm-d, powered by KServe and its cutting-edge Inference Gateway Extension.

This combination solved every scaling and operational challenge we faced by delivering:

  • Deep Customization: The LLMInferenceService and LLMInferenceConfig objects expose the standard Kubernetes API, allowing us to override the spec precisely where needed. This level of granular control is crucial for tailoring vLLM to specialized hardware or quickly implementing flag changes. A rough sketch of what such an override can look like follows after this list.
  • Intelligent Routing and Efficiency: By leveraging Envoy, Envoy AI Gateway, and Gateway API Inference Extension, we moved far beyond round-robin. This technology enables prefix-cache aware routing, ensuring requests are intelligently routed to the correct vLLM instance to maximize KV-cache utilization and drive up GPU efficiency.
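
To make the idea concrete, here is a minimal, illustrative sketch of an LLMInferenceService that passes the vLLM flags used in the deployment described just below. The API group/version, the model reference, and the template/override field names are assumptions made for this sketch rather than the authoritative KServe schema; consult the KServe documentation for the exact API.

apiVersion: serving.kserve.io/v1alpha1            # assumed API group/version
kind: LLMInferenceService
metadata:
  name: llama-3-1-70b                             # hypothetical name
spec:
  model:
    uri: hf://meta-llama/Llama-3.1-70B-Instruct   # hypothetical model reference
  template:                                       # hypothetical spec-level override block
    containers:
      - name: main
        args:
          - --tensor-parallel-size=4
          - --gpu-memory-utilization=0.90
          - --max-model-len=65536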

On one deployment, we observed a 3x improvement in output tokens/s and a 2x reduction in time to first token (TTFT) after enabling prefix-cache aware routing. These numbers were measured when serving Llama 3.1 70B model on 4 MI300X AMD GPUs with the configuration: tensor-parallel-size=4, gpu-memory-utilization=0.90, and --max-model-len=65536. Below is the chart that shows the performance improvement after we released the routing change (at around 12:30PM).

Figure: Performance improvement after enabling prefix-cache aware routing

Community Contributions and Collaboration

Running this stack in production surfaced real issues that we fixed upstream in KServe, benefiting the broader community:

  • New feature requests filed: #4901, #4900, #4898, #4899
  • storageInitializer made optional (kserve#4970) — enabling RunAI Model Streamer as an alternative to the default storage initializer
  • Added support for latest Gateway API Inference Extension (kserve#4886)

These contributions came directly from hitting production edge cases. Validating KServe and llm-d at this scale helped harden the platform for everyone running LLM workloads on Kubernetes.

Acknowledgement

We'd like to thank everyone from the community who has contributed to the successful adoption of KServe, llm-d, and vLLM in Tesla's production environment. In particular, below is the list of people from Red Hat and Tesla teams who have helped through the process (in alphabetical order).

  • Red Hat team: Sergey Bekkerman, Nati Fridman, Killian Golds, Andres Llausas, Bartosz Majsak, Greg Pereira, Pierangelo Di Pilato, Ran Pollak, Vivek Karunai Kiri Ragavan, Robert Shaw, and Yuan Tang
  • Tesla team: Scott Cabrinha and Sai Krishna

Get Involved with llm-d

The work described here is just one example of what becomes possible when a community of engineers tackles hard problems together in the open. If you're running LLMs at scale and wrestling with the same challenges — storage, routing, efficiency, day-2 operations — we'd love to have you involved.

  • Explore the code → Browse our GitHub organization and dig into the projects powering this stack
  • Join our Slack → Get your invite and connect directly with maintainers and contributors from Red Hat, Tesla, and beyond
  • Attend community calls → All meetings are open! Add our public calendar (Wednesdays 12:30pm ET) and join the conversation
  • Follow project updates → Stay current on Twitter/X, Bluesky, and LinkedIn
  • Watch demos and recordings → Subscribe to the llm-d YouTube channel for community call recordings and feature walkthroughs
  • Read the docs → Visit our community page to find SIGs, contribution guides, and upcoming events

Backstage incubating

Backstage is an open platform for building developer portals, which unify all your infrastructure tooling, services, and documentation with a single, consistent UI.

Backstage in Amsterdam: Highlights from BackstageCon and KubeCon + CloudNativeCon Europe 2026

Amsterdam delivered. From the moment BackstageCon opened its doors on March 23, the Backstage community was in full force — long-time contributors comparing notes with teams who had only just started their IDP journey. Across four days of BackstageCon and KubeCon + CloudNativeCon Europe 2026, there was a lot to take in: an energetic day of community talks, a documentary premiere, a standing room–only maintainers session, a live demo on the Keynote mainstage, and a ContribFest that turned issues into pull requests in real time. Read on for the highlights — and catch up on all the BackstageCon talks in the full recordings playlist.

BackstageCon: What the community is building

BackstageCon Europe 2026 in Amsterdam 📸 CNCF

BackstageCon kicked off the week with a full day of talks organized by the community, for the community — emceed by Balaji Sivasubramanian (Red Hat) and André Wanlin (Spotify). The full schedule had something for every stage of the Backstage journey, but a few themes stood out across the day.

On the engineering side, Booking.com's Symbat Nurbay and Xicu Piñera shared how they're working toward a unified developer experience across a large, complex organization — a relatable challenge for many in the room. Krzysztof Janota and Dusan Askovic from ING Bank N.V. went deep on how to keep a Backstage deployment healthy and collaborative in a big institution, covering the governance patterns that make it scale without fragmenting. On the catalog and tooling side, Sebastian Poxhofer from N26 showed how adding a platform CLI on top of the Backstage catalog can open up new workflows and make the catalog more actionable for platform teams.

One of the day's surprises came from the lightning talk slot: Mathilde Ançay from HEIG-VD took an unexpected angle, tracing an unlikely path from philosophy to a Backstage plugin — it's the kind of talk that's hard to summarize, so just watch it.

And the buzziest moment of the day? The panel — Building a Healthy Backstage Plugins Ecosystem — with Paul Schultz and Hope Hadfield (Red Hat), Heikki Hellgren (OP Financial Group), Peter Macdonald (VodafoneZiggo), and Aramis Sennyey (DoorDash). The conversation was wide-ranging and the Q&A spilled past the scheduled time, which felt like a good sign.

📺 Those are just a few picks — there's plenty more to explore in the full BackstageCon playlist.

Now playing: The Backstage story, on film

The Backstage Documentary screening at KubeCon Amsterdam 2026 📸 CNCF

One of the week's most memorable moments had nothing to do with a slide deck. The Backstage Documentary made its world premiere at KubeCon Amsterdam, and the room filled up with community members eager to watch the story of how Backstage evolved from an internal tool at Spotify into one of the most widely adopted and active open source projects in the cloud native ecosystem. The film surfaces voices from across the project's history — including some perspectives that even long-time contributors hadn't heard before. If you haven't watched it yet, grab a snack and set aside 30 minutes to see the past, present, and future of Backstage.

Standing room only: The State of Backstage in 2026

Core maintainers present at KubeCon + CloudNativeCon Europe 2026

The State of Backstage talk has become one of the community's most anticipated events at every KubeCon. In Amsterdam, that anticipation was on full display: over 600 people were seated, close to 1,000 had registered, and more were turned away at the door. Core maintainers Ben Lambert and Patrik Oldsberg covered the full breadth of what's been happening across the project — contributions, ecosystem growth, the New Frontend System now that it's adoption-ready, and the work underway on MCP support and an AI-native Backstage direction.

A demo on the big stage

The Backstage Keynote Demo at KubeCon + CloudNativeCon Europe 2026 📸 CNCF

On Thursday morning, the core maintainers stepped up for something a little different: a live demo on the KubeCon Keynote mainstage. In front of over 1,500 attendees, they showcased some of Backstage's newest capabilities — including the MCP and AI-related features covered in the maintainers talk the day before. It's one thing to hear about new features in a talk; seeing them demonstrated live in a keynote setting, to an audience that large, is a different kind of moment for our open source project.

ContribFest: Open source, live

Backstage ContribFest at KubeCon + CloudNativeCon Europe 2026

Rounding out the week was the fourth-ever Backstage ContribFest, co-hosted by André Wanlin and Emma Indal (Spotify), Heikki Hellgren (OP Financial Group), and Elaine Bezerra (DB Systel GmbH). Around 50 attendees showed up ready to contribute — some experienced, some brand new to the project — and spent the session working through real issues in the Backstage and Community Plugins repositories alongside core maintainers and community contributors.

Not every contribution makes it into a merged PR on the day, but ContribFest is often where the work starts. Keep an eye on the release notes — some of what was kicked off in Amsterdam may already be on its way to a future release.

Want to see what came out of Amsterdam and past ContribFests? Head over to the ContribFest web app to browse the full history of contributions from every session.

Tot ziens, Amsterdam! 🌷

Goodbye, Amsterdam! 📸 CNCF

What a week. BackstageCon, a documentary debut, a packed maintainers room, a keynote demo, and a ContribFest — Amsterdam showed that the open source Backstage community has a lot of momentum and a lot to say. Catch up on everything you missed in the BackstageCon playlist, and watch the Backstage Documentary and Keynote Demo if you haven't already.

See you in Salt Lake City 🏔️ at BackstageCon and KubeCon + CloudNativeCon North America, November 9-12, 2026!

KubeArmor sandbox

Runtime protection for Kubernetes & other cloud workloads. KubeArmor provides an observability and policy enforcement system to restrict unwanted or malicious behaviour of cloud-native workloads at runtime.

Network Segmentation of Linux VMs using KubeArmor

KubeArmor now enforces layer 3/4 network rules on Linux VMs via a new CRD: KubeArmorNetworkPolicy. Policies support CIDR ranges, port ranges, interface scoping, and both ingress and egress control. Enforcement runs at the kernel via nftables and is stateful by default.

The Gap That Hurt VM Security

KubeArmor already enforced process execution, file access, and protocol-level network syscalls on VMs. But layer 3 and layer 4 controls were missing. You could block a protocol, optionally filtered by process, but that was the ceiling.

You could not write: accept TCP on port 5432 only from this CIDR, on this interface. No ingress/egress port control. No IP block rules. No port ranges. KubeArmorNetworkPolicy closes that gap.

Network segmentation on Linux VMs has historically required either cloud security groups or a separately managed firewall tool. KubeArmor collapses both into a single policy plane that lives alongside your workload definitions.

In a segmented network, each VM zone (database tier, app tier, bastion) should only communicate with adjacent tiers on approved ports. Without kernel-level enforcement, those boundaries exist only on paper.

What Is the Network Policy Enforcer?

A new feature that adds layer 3/4 network enforcement to KubeArmor for VMs. It introduces a dedicated CRD, KubeArmorNetworkPolicy, separate from the three existing policy types.

CRD                    | Target                     | Enforcement layer                             | Use when
KubeArmorPolicy        | Containers / pods          | Process, file, syscall, protocol+process      | Container workload hardening
KubeArmorHostPolicy    | VM / host                  | Process, file, syscall, protocol+process      | Host-level process and file control
KubeArmorNetworkPolicy | VM / host nodes, K8s Nodes | Layer 3/4: CIDR, port, port range, interface  | IP-based and port-based VM network control

Figure 1: KubeArmor policy CRD hierarchy and enforcement layers

Under the Hood: nftables

KubeArmor translates KubeArmorNetworkPolicy rules into nftables rules on the host. It creates its own table named KubeArmor, with chains that hook into Linux's input and output hooks. Every applied policy becomes a rule inside those chains.

Two default rules are always present in every chain:

  • Loopback traffic is always allowed. Services communicating locally over lo are not disrupted.
  • Established and related connections are accepted, making the policy stateful. An ingress allow does not require a matching egress rule for reply packets.

On startup, KubeArmor checks that nftables is present and running as root. If nftables is unavailable, the enforcer skips initialization. No silent failures.

Default Posture and Enabling the Feature

Setting                                     | Behavior
Default posture: Block                      | Traffic not matched by any policy is dropped.
Default posture: Audit                      | Unmatched traffic is logged only. No blocking.
enableNetworkPolicyEnforcer: true (default) | Feature active. KubeArmorNetworkPolicy resources are enforced.
enableNetworkPolicyEnforcer: false          | Feature disabled. Existing CRDs unaffected.

Policy Structure

The spec mirrors Kubernetes NetworkPolicy with two additions: an action field and an iface field. Anyone familiar with K8s network policies will read these without friction.

apiVersion: security.kubearmor.com/v1
kind: KubeArmorNetworkPolicy
metadata:
  name: [policy name]
spec:
  severity: [1-10]                        # optional (appears in alerts)
  tags: ["tag", ...]                      # optional (e.g., MITRE, STIG)
  message: [message]                      # optional (injected into alert logs)
  nodeSelector:
    matchLabels:
      [key]: [value]                      # target nodes by label
  ingress:
    - from:
        - ipBlock:
            cidr: [IP range]              # IPv4 or IPv6
      iface: [if1, ...]                   # optional: scope to specific interfaces
      ports:
        - protocol: [TCP|UDP|SCTP]
          port: [number or name in string]
          endPort: [optional: defines a port range]
  egress:                                 # mirrors ingress; uses 'to' instead of 'from'
  action: [Allow|Audit|Block]

Field                     | Required | Notes
nodeSelector              | Yes      | Standard K8s label selector. System labels (kubernetes.io/hostname) also valid.
ingress.from.ipBlock.cidr | No       | Source IP range for inbound rules. IPv4 or IPv6.
egress.to.ipBlock.cidr    | No       | Destination IP range for outbound rules.
iface                     | No       | Restricts rule to named interfaces. Ports/protocol apply only to listed interfaces.
port                      | No       | Single port by number or name (ssh, dns, http, https).
endPort                   | No       | If set, defines a range from port to endPort.
action                    | Yes      | Allow, Block, or Audit.

KubeArmor Network Policy structure

Example Policies

Policy 1: Lock a VM to Its Private Network

A database VM in subnet 10.0.1.0/24 should only accept TCP connections from that subnet on port 5432. All other inbound TCP is blocked.

apiVersion: security.kubearmor.com/v1
kind: KubeArmorNetworkPolicy
metadata:
  name: allow-private-subnet-ingress
spec:
  nodeSelector:
    matchLabels:
      role: database
  ingress:
    - from:
        - ipBlock:
            cidr: "10.0.1.0/24"
      ports:
        - protocol: TCP
          port: "5432"
  severity: 7
  action: Block

Attack vector blocked: Lateral movement to database nodes. A compromised host outside the subnet cannot reach port 5432 even if cloud security groups are misconfigured.

This is the canonical micro-segmentation pattern: the database tier accepts connections only from the app tier CIDR, and nothing else. No firewall rule outside the VM can guarantee this if the cloud NSG drifts.

Policy 2: Block Outbound DNS to External Resolvers

Block UDP port 53 traffic to 8.8.8.8. Forces name resolution through internal DNS and closes the DNS tunneling C2 channel.

apiVersion: security.kubearmor.com/v1
kind: KubeArmorNetworkPolicy
metadata:
  name: block-external-dns
spec:
  nodeSelector:
    matchLabels:
      kubernetes.io/hostname: "prod-worker-01"
  egress:
    - to:
        - ipBlock:
            cidr: "8.8.8.8/32"
      ports:
        - protocol: UDP
          port: "dns"
  severity: 5
  action: Block

Attack vector blocked: DNS exfiltration and C2 callbacks via public resolvers. Leaves internal DNS resolution untouched.

Policy 3: SSH Restricted to Jump Host (Progressive Enforcement)

SSH access to production VMs should only originate from the corporate jump host subnet (192.168.10.0/28). Use a two-policy pattern: audit broadly first to baseline, then block.

Policy A — Audit all SSH globally:

apiVersion: security.kubearmor.com/v1
kind: KubeArmorNetworkPolicy
metadata:
  name: audit-ssh-all
spec:
  nodeSelector:
    matchLabels:
      env: production
  ingress:
    - from:
        - ipBlock:
            cidr: "0.0.0.0/0"
      ports:
        - protocol: TCP
          port: "ssh"
  message: "SSH from outside jump host subnet detected"
  severity: 8
  action: Audit

SSH access control is a segmentation boundary, not just an access policy. Restricting it by source CIDR enforces the separation between the management plane and the data plane at the host level.

Policy B — Allow only the jump host subnet:

apiVersion: security.kubearmor.com/v1
kind: KubeArmorNetworkPolicy
metadata:
  name: allow-ssh-jumphost
spec:
  nodeSelector:
    matchLabels:
      env: production
  ingress:
    - from:
        - ipBlock:
            cidr: "192.168.10.0/28"
      ports:
        - protocol: TCP
          port: "ssh"
  severity: 2
  action: Allow

The message field in Policy A surfaces directly in alert logs for SIEM ingestion. Run Audit for 48 hours, validate there are no false positives, then flip to Block.

Policy 4: Interface-Scoped Port Range for Backend Service Mesh

A VM with two NICs: eth0 for external traffic, eth1 for internal service mesh. Restrict inbound traffic on eth1 to ports 8000–9000 from the internal CIDR only. eth0 is unaffected.

apiVersion: security.kubearmor.com/v1
kind: KubeArmorNetworkPolicy
metadata:
  name: restrict-backend-mesh
spec:
  nodeSelector:
    matchLabels:
      tier: backend
  ingress:
    - from:
        - ipBlock:
            cidr: "172.16.0.0/12"
      iface: ["eth1"]
      ports:
        - protocol: TCP
          port: "8000"
          endPort: 9000
  severity: 6
  action: Block

Use iface to prevent over-broad rules from affecting unrelated traffic paths. Common in hybrid cloud and service mesh deployments where VMs have separate NICs per traffic class.

How It Fits Into a Zero Trust VM Architecture

KubeArmor VM Network Segmentation - Zero Trust Architecture

Traditional network policies (cloud security groups, NSGs) operate at the cloud perimeter. They do not enforce on the VM itself. KubeArmorNetworkPolicy enforces at the kernel via nftables, which means:

  • Policy follows the workload, not the infrastructure boundary.
  • Rules hold even if cloud-level security groups are misconfigured or bypassed.
  • Works in air-gapped, hybrid, or bare-metal deployments with no cloud networking controls.

Combined with KubeArmor's process and file policies, you get defense-in-depth on every VM. A compromised process cannot make unauthorized outbound calls. An attacker who lands on the VM cannot reach internal services or external C2 if those paths are explicitly blocked.

Micro-segmentation means each VM enforces its own perimeter. KubeArmor achieves this without a separate network appliance or agent: the nftables rules live on the host itself and survive cloud perimeter changes.

Getting Started

The Network Policy Enforcer is enabled by default in recent KubeArmor releases.

Verify the enforcer is active on your nodes:

kubectl get kubearmornodestatus \
-o jsonpath='{.items[*].status.networkPolicyEnforcer}'

Apply a policy:

kubectl apply -f nsp-allow-private-subnet-ingress.yaml

Inspect generated nftables rules on the host:

sudo nft list table ip kubearmor

Policy violations appear in KubeArmor alerts. Pipe the message field into your SIEM for contextualized incident tickets.

Frequently Asked Questions

Does KubeArmorNetworkPolicy replace KubeArmorHostPolicy?

No. They are complementary. KubeArmorHostPolicy handles process execution, file access, and protocol-level syscall controls on the VM host. KubeArmorNetworkPolicy handles CIDR, port, and interface rules at the network layer. Use both together for defense in depth.

Does this work on distributions that still use iptables?

KubeArmor uses nftables, which ships by default on major Linux distributions (Ubuntu 20.04+, RHEL 8+, Debian 10+). On older distributions it is not pre-installed but can be manually installed on any system running Linux kernel 3.x+. On startup, KubeArmor checks for nftables availability. If nftables is present and KubeArmor is running as root, it initializes. iptables is not used by the Network Policy Enforcer.

Can I define both ingress and egress rules in a single policy?

Yes. A single KubeArmorNetworkPolicy spec can contain both ingress and egress blocks. Define them independently with their own CIDR, interface, and port values.
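
As a minimal sketch based on the policy structure documented above, a single policy for an app-tier node that allows ingress from a web-tier subnet and egress to a database-tier subnet could look like the following. The name, labels, CIDRs, and ports are placeholders, and note that the single spec-level action applies to both rule sets.

apiVersion: security.kubearmor.com/v1
kind: KubeArmorNetworkPolicy
metadata:
  name: app-tier-segmentation            # placeholder name
spec:
  nodeSelector:
    matchLabels:
      tier: app                           # placeholder label
  ingress:
    - from:
        - ipBlock:
            cidr: "10.0.2.0/24"           # placeholder: web-tier subnet
      ports:
        - protocol: TCP
          port: "8080"
  egress:
    - to:
        - ipBlock:
            cidr: "10.0.1.0/24"           # placeholder: database-tier subnet
      ports:
        - protocol: TCP
          port: "5432"
  severity: 6
  action: Allow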

What happens to traffic that does not match any policy?

The behavior depends on your default network posture configuration. When running in allowlist mode (at least one allow-based policy is active), unmatched traffic follows the default posture — dropped if set to Block, or logged and allowed through if set to Audit. Outside of allowlist mode, unmatched traffic is recorded in host logs. Set your default posture explicitly before deploying Block-action policies to production.

How is this different from a standard Linux firewall like ufw or firewalld?

KubeArmor network policies implement micro-segmentation across a fleet of VMs declaratively, with the same GitOps workflow you already use for application policies. ufw and firewalld are standalone firewall managers with no policy-as-code workflow, no Kubernetes API surface, and no integration with workload identity or labels. KubeArmorNetworkPolicy uses the same nftables kernel layer but is declared as a Kubernetes CRD, version-controlled, auditable, and scoped by node labels. It is also stateful by default and composable with process and file policies from the same agent.

PipeCD sandbox

GitOps style continuous delivery platform that provides consistent deployment and operations experience for any applications

Blog: Building the Kubernetes Multi-Cluster Plugin for PipeCD — LFX Mentorship

If you had told me last year that I would be working with Kubernetes and all things clusters, deployments and service meshes, I would have brushed it off. I am truly grateful for the journey thus far.

Early last month, I got accepted as an LFX Mentee for Term 1 of this calendar year. For me it is such a big deal, given my background and how much effort has been put in behind the scenes to get to this stage.

I’m currently a mentee in the LFX Mentorship program working on

Kairos sandbox

Transform any Linux system into a secure, customizable, and easily managed platform for edge computing with or without Kubernetes.

Help Kairos Move to CNCF Incubation: Become an Adopter

Kairos community,

We are preparing the next major step in our CNCF journey: applying to move from Sandbox to Incubation.

To do that, we need to demonstrate healthy and diverse real-world adoption. As part of the official CNCF TOC project lifecycle and process, projects are expected to provide 5-7 adopters willing to be interviewed during due diligence.

Today, we are inviting organizations using Kairos to participate as adopters and help us through this milestone.

We also understand adoption takes time. If you are currently in a proof-of-concept stage, that is absolutely okay. We would still love to talk, learn from your experience, and help you move toward pre-production and production in any way we can.

Why Incubation?

Incubation is where CNCF projects start demonstrating stronger maturity signals: more stability, broader production usage, and clearer evidence that the project is useful in real environments.

For Kairos, this helps us:

  • Strengthen long-term credibility in the cloud-native ecosystem
  • Validate production readiness through independent adopter feedback
  • Accelerate project growth while keeping development open and vendor-neutral

A signal that Kairos is growing in the open

This effort is part of a broader direction for Kairos.

We recently welcomed our first maintainer from a different company than the original Kairos creators, a concrete sign of growing shared ownership and open governance.

What it means to be a Kairos adopter

Being an adopter means your organization uses Kairos in a meaningful way (proof of concept, pilot, pre-production, or production) and can share practical feedback, including:

  • Why you chose Kairos
  • Which use cases you support with it
  • What value it brings in operations, reliability, security, or scale
  • What challenges you faced and how we can improve

You do not need to be a large enterprise, and you do not need to be a direct code contributor.

What CNCF may ask from your organization

If selected as one of the adopter references, your organization may be contacted by CNCF TOC sponsors for an adopter interview as part of the incubation due diligence process.

These interviews typically cover:

  • Adoption context and timeline
  • Current usage stage and scale
  • Perception of project maturity and governance
  • Strengths, gaps, and future needs

Confidentiality and publication boundaries

We understand many teams have legal or commercial constraints.

The CNCF process can accommodate adopter anonymity and publication boundaries when needed, and adopter interview summaries are reviewed with adopters before publication.

What your organization gets

We see this as a practical collaboration where both sides benefit.

By participating, your organization can:

  • Support a CNCF Sandbox project advancing toward Incubation
  • Be recognized in CNCF-related due diligence materials when disclosure is allowed
  • Stay close to the Kairos roadmap and maintainer discussions
  • Share engineering feedback that directly influences project priorities

And if your organization wants public recognition, we can also:

  • Feature your organization as a Kairos adopter on the Kairos website
  • Include your logo on our adopters page (fully opt-in, based on your approval and brand guidelines)
  • Collaborate on public technical stories, including potential CNCF blog posts

Interested in joining as an adopter?

If your organization is using Kairos and is open to this conversation, please contact us.

When you contact us, it helps if you include:

  • Organization name
  • Main use case
  • Current stage (pilot, pre-production, or production)
  • Preferred visibility level (public, limited public, or private first)

Thanks for helping us take Kairos to the next level.

This milestone is about more than project status. It is about proving, together, that Kairos is delivering real value in real systems.

Kubewarden sandbox

Kubewarden is a Policy Engine powered by WebAssembly policies. Its policies can be written in CEL, Rego (OPA & Gatekeeper flavours), Rust, Go, YAML, and others....

Kubewarden 1.34 Release

After the big blooms of 1.33, this release turns its attention to the garden fence: making sure our CI pipelines are sturdy, our supply chain is trustworthy, and a nagging bug in kwctl gets pulled out by the roots. Nothing flashy, but the kind of care that keeps the garden healthy for the long haul. Let’s take a look at what’s new!
Fix for kwctl scaffold command
When using the kwctl scaffold manifest command with a policy URI that omits an explicit tag (e.

Istio graduated

Simplify observability, traffic management, security, and policy with the Istio service mesh.

Simplifying Egress Routing to Wildcard Destinations

Overview

Controlling egress traffic is a common requirement in service mesh deployments. Many organizations configure their mesh to allow only explicitly registered external services by setting:

meshConfig.outboundTrafficPolicy.mode = REGISTRY_ONLY

With this configuration, any external destination must be registered in the mesh using resources such as ServiceEntry
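
For context, the REGISTRY_ONLY mode shown above is set in the mesh configuration, and external hosts are then registered with ServiceEntry resources. Below is a minimal, illustrative example of both; the host name is a placeholder, and the full post goes on to cover wildcard destinations such as *.example.com.

meshConfig:
  outboundTrafficPolicy:
    mode: REGISTRY_ONLY

apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: external-api                 # placeholder name
spec:
  hosts:
    - api.example.com                # placeholder external host
  location: MESH_EXTERNAL
  resolution: DNS
  ports:
    - number: 443
      name: tls
      protocol: TLS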

PipeCD sandbox

GitOps style continuous delivery platform that provides consistent deployment and operations experience for any applications

Blog: My First 30 days as an LFX Mentee with PipeCD

A month ago, I started my journey as an LFX Mentee with PipeCD.

Coming from a non-technical background, the cloud native ecosystem is relatively new to me; I’ve been outside looking in. Right now, I’m working to establish a social media presence for PipeCD, create content covering v1 features, plugin development, and walkthrough videos that make the project easier to adopt.

To effectively do that, my technical knowledge needs to be sharpened. So I’m learning Linux basics and Kub

Prometheus graduated

metrics-based monitoring and alerting

Introducing the UX Research Working Group

Prometheus has always prioritized solving complex technical challenges to deliver a reliable, performant open-source monitoring system. Over time, however, users have expressed a variety of experience-related pain points. Those pain points range from onboarding and configuration to documentation, mental models, and interoperability across the ecosystem.

At PromCon 2025, a user research study was presented that highlighted several of these issues. Although the central area of investigation involved Prometheus and OpenTelemetry workflows, the broader takeaway was clear: Prometheus would benefit from a dedicated, ongoing effort to understand user needs and improve the overall user experience.

Recognizing this, the Prometheus team established a Working Group focused on improving user experience through design and user research. This group is meant to support all areas of Prometheus by bringing structured research, user insights, and usability perspectives into the community's development and decision-making processes.

How we can help Prometheus maintainers

Building something where the user needs are unclear? Maybe you're looking at two competing solutions and you'd like to understand the user tradeoffs alongside the technical ones.

That's where we can be of help.

The UX Working Group will partner with you to conduct user research or provide feedback on your plans for user outreach. That could include:

  • User research reports and summaries
  • User journeys, personas, wireframes, prototypes, and other UX artifacts
  • Recommendations for improving usability, onboarding, interoperability, and documentation
  • Prioritized lists of user pain points
  • Suggestions for community discussions or decision-making topics

To get started, tell us what you're trying to do, and we'll work with you to determine what type and scope of research is most appropriate.

How we can help Prometheus end users

We want to hear from you! Let us know if you're interested in participating in a research study and we'll contact you when we're working on one that's a good fit. Having an issue with the Prometheus user experience? We can help you open an issue and direct it to the appropriate community members.

Interested in helping?

New contributors to the working group are always welcome! Get in touch and let us know what you'd like to work on.

Where to find us

Drop us a message in Slack, join a meeting, or raise an issue in GitHub.

PipeCD sandbox

GitOps style continuous delivery platform that provides consistent deployment and operations experience for any applications

Blog: Your First GitOps Project with PipeCD

Introduction

Every infrastructure setup tends to follow the same pattern. You open the AWS console, configure a few options, and create a resource. It works as expected. But when the same setup needs to be recreated later, there is no clear record of what was done. The process becomes manual again, often inconsistent, and difficult to repeat reliably. This is the gap that Git-based workflows aim

Kubeflow incubating

Kubeflow is the foundation of tools for AI Platforms on Kubernetes.

Modernizing Kubeflow Pipelines UI

The Kubeflow Pipelines web interface has been upgraded from React 16 to React 19 — a modernization effort that touches every layer of the frontend stack. Whether you use the UI to manage pipelines day-to-day or contribute to the codebase, here is what this means for you.

What’s changing for users

You do not need to do anything differently. Your bookmarks, workflows, and browser all work exactly as before. But under the hood, the UI is now built on a modern foundation that delivers tangible improvements:

A faster, more responsive interface

React 18 introduced automatic batching, which reduces unnecessary re-renders across the UI. In practice, this means pages like Run Details, Experiment Details, and the pipeline creation flow respond faster to your interactions. Forms validate without flicker, and multi-step workflows feel snappier. The production bundle size stayed exactly the same — 0% increase — so page load times are unchanged.

Smoother pipeline graph navigation

The pipeline DAG visualization (the graph you see when inspecting a pipeline’s structure) has been migrated from the deprecated react-flow-renderer to @xyflow/react. This brings improved pan, zoom, and drag performance, especially on larger or more complex pipeline graphs. If you’ve ever experienced sluggishness when navigating a deeply nested pipeline, this upgrade directly addresses that.

Improved charts and metrics display

Run metrics and comparison charts now use Recharts instead of the deprecated react-vis library. The new charting library renders more efficiently, handles edge cases better, and provides cleaner visual output when comparing run results side by side.

Better accessibility

The component library migration from Material-UI v3 to MUI v5 brings improved keyboard navigation, better ARIA attribute coverage, and more consistent focus management across dialogs, tables, and form elements. These improvements make the UI more usable with screen readers and keyboard-only workflows.

No breaking changes

Every user-facing feature works the same way it did before. The API contracts are unchanged. If you use the KFP Python SDK or REST API to interact with the platform, nothing changes on your end. This upgrade was purely a frontend modernization — zero impact on backend behavior, pipeline execution, or artifact storage.

Why we made this change

The KFP frontend had been running on React 16 (released in 2017) with Material-UI v3, create-react-app, and Jest/Enzyme for testing. This created compounding issues:

  • Security exposure. React 16 and 17 no longer receive security patches, and dozens of transitive dependencies were locked to outdated versions because of React peer constraints.
  • Stalled ecosystem. Modern libraries — including improved data-fetching, visualization, and accessibility tools — dropped support for React 16/17. Staying behind meant the UI could not benefit from upstream improvements.
  • Contributor friction. The legacy CRA + Jest + Enzyme toolchain was slow to build, brittle to test, and increasingly difficult for new contributors to set up. Modernizing the stack lowers the barrier to contribution.

How we got here

Rather than attempting a single risky version jump, we followed a deps-first, bump-last strategy: upgrade every dependency to be forward-compatible before touching React itself. A custom React peer compatibility gate in CI prevented regressions at every step. The work was executed across 20+ pull requests in strict dependency order.

React 16 → 17: Rebuilding the foundation

Before React could move forward, the entire build and test toolchain had to be replaced. create-react-app was swapped for Vite, Jest + Enzyme gave way to Vitest + Testing Library, and Material-UI was upgraded from v3 to v4 to unblock the React 17 peer range. The deprecated react-vis charting library was replaced with Recharts. With those blockers cleared, the React 17 bump itself was a small, low-risk change.

React 17 → 18: The biggest leap

This phase required the most dependency work. Storybook jumped from v6 straight to v10 on the Vite builder. Material-UI v4 was migrated to MUI v5 with Emotion. react-query moved to @tanstack/react-query v4. react-flow-renderer was replaced with @xyflow/react. After all ecosystem deps cleared the peer gate, the React 18 core bump landed — followed by careful stabilization of automatic batching behavior in class components that were reading stale state.

React 18 → 19: The final stretch

A deprecation audit at React 18.3 found zero React-specific warnings. A final dependency sweep cleared the last peer blockers (react-ace, transitive react-redux). The React 19 bump resolved the final allowlist entry and handled a small set of API changes like the removal of forwardRef in test mocks.

The full stack transformation

Over the course of this effort, virtually every layer of the frontend stack was modernized:

Layer                | Before                    | After
React                | 16                        | 19
Build system         | Create React App + Craco  | Vite
Test framework       | Jest + Enzyme             | Vitest + Testing Library
UI component library | Material-UI v3            | MUI v5 + Emotion
Data fetching        | react-query v3            | @tanstack/react-query v4
Pipeline graph       | react-flow-renderer v9    | @xyflow/react
Charts               | react-vis                 | Recharts
Storybook            | 6 (Webpack)               | 10 (Vite)

By the numbers

  • 20+ PRs merged across the entire React 16-to-19 effort
  • 15 tracked milestones executed in strict dependency order
  • 0% bundle size increase — page load times unchanged
  • 0 React deprecation warnings at the 18.3 checkpoint audit
  • 0 breaking changes to user-facing features or APIs

Want to contribute?

The full execution plan with every PR, issue, and dependency graph is tracked in the react-18-19-upgrade-checklist.md. Look for open issues, report bugs, help with reviews, and help improve our documentation.

Huge thanks to @jeffspahr, @kanishka-commits, @PR3MM, @jsonmp-k8, @dpanshug, and @rishi-jat for contributing to this effort and reviewing all the contributions leading up to this milestone!

Confidential Containers sandbox

Confidential Containers is an open source community working to enable cloud native confidential computing by leveraging Trusted Execution Environments to protect containers and data.

Integrate Trustee with the External Secrets Operator

Introduction

The Trustee operator simplifies configuring secrets and serving them to confidential container pods that execute inside trusted execution environments (TEEs). You can set up the required secrets as Kubernetes Secret objects

Headlamp sandbox

Extensible open source multi-cluster Kubernetes user interface

From Signals to Answers: Conversational Kubernetes Troubleshooting with HolmesGPT in Headlamp

Kubernetes does not fail quietly. When something goes wrong, signals show up everywhere. Logs, events, metrics, and status fields each tell part of the story. None of them tell the whole thing. Teams spend hours stitching these signals together. They open dashboards, run commands, and scroll through logs looking for clues. Often, the data they need is already there. The hard part is turning those signals into clear answers.

This is the gap Headlamp fills with HolmesGPT by centralizing data and context to get answers in a familiar environment.

The Real Problem Is Not Missing Data

Most Kubernetes teams are not short on data. Modern clusters generate a steady stream of signals about what is happening across the system. On paper, everything needed to diagnose an issue already exists.

The real problem is making sense of it.

Understanding why a rollout is stuck or a pod keeps restarting takes context and time. Humans do this by testing one idea at a time. We gather signals, form a hypothesis, then move to the next. Holmes can do this work in parallel. It looks across related resources and controller behavior at once, finding answers faster than a single person can.

Kubernetes problems end up feeling harder than they should be with multiple layers of friction.

What HolmesGPT Brings to Kubernetes

HolmesGPT is designed to reason about Kubernetes behavior, not just report state. It looks at real cluster signals together, including logs, events, and resources. It understands how failures propagate and how controller logic affects outcomes.

Instead of listing symptoms, HolmesGPT focuses on causes. Instead of showing raw output, it explains what is happening and how to fix it. This shifts troubleshooting from guesswork to understanding.

Why Headlamp Is the Right Place for HolmesGPT

Headlamp is where Kubernetes work already happens. It is where teams explore clusters, inspect workloads, and notice when something looks wrong. This is exactly the place where signals need to be turned into answers. Many teams try to close this gap by adding more tools. Another dashboard. Another alerting system. Another surface to check during an incident. Each one adds value, but each one also adds friction.

The HolmesGPT integration takes a different approach. Instead of adding another tool, it brings reasoning into an existing workflow. Headlamp does not become something new to learn. It becomes easier to use because it builds on an environment teams already know. When insight lives in the same place as management, it gets used. Context stays intact. Teams move more quickly from questions to action.

Watch the demo:

From Signals to Understanding

Troubleshooting in Headlamp feels different because the explanation lives in the same place as the investigation. HolmesGPT works in context, alongside the workloads, namespaces, and controllers you are already viewing. It explains how the resources on screen relate to each other, without sending you to another tool.

Traditional observability shows what is happening, but it rarely explains why. A pod restart loop might come from a bad configuration, a missing secret, or a failure elsewhere in the system. Logs alone cannot tell you which one matters. Events add clues, but they still only show part of the story.

That context is what changes the troubleshooting experience. Patterns that are hard to spot in raw output become clear when they are tied directly to Kubernetes objects. Instead of stitching clues together across tools, teams see explanations next to the problem they are trying to understand.

Insight That Fits How Teams Work

Kubernetes is rarely owned by one role. Developers focus on application behavior. Operators focus on stability. Platform teams look for patterns and consistency across clusters. HolmesGPT helps create a shared understanding across those roles. The same explanation can help a developer understand why a rollout failed and help an operator confirm a broader issue. The language is clear, the context is shared, and the insight is grounded in real cluster state.

Just as important, this insight fits into existing workflows. Teams do not need to change how they work or learn a new system. When understanding appears at the right moment and in the same place as investigation, it gets used. This reduces handoffs, avoids miscommunication, and helps teams move from discussion to action faster.

Clear Answers When They Matter Most

HolmesGPT in Headlamp turns scattered signals into clear explanations, right where you already investigate. You keep context, move faster, and make decisions with more confidence.

If you want to try it, open Headlamp, enable the AI Assistant, and connect HolmesGPT. To add the Holmes agent to your cluster, follow the setup instructions. The next time an alert fires, you can go from signals to answers without leaving the UI.

Headlamp sandbox

Extensible open source multi-cluster Kubernetes user interface

Bringing Unified MCP Expertise in Your UI for Smarter Kubernetes Intelligence

The Problem

Kubernetes teams lose time switching between tools to understand what is happening in their clusters. Many tools offer deep insight, but that expertise often lives outside the UI where teams actually work. This breaks context and slows decisions. MCPs bring powerful expertise into Kubernetes workflows. They can explain how systems behave at runtime, how workloads interact, and where issues begin. But today, that expertise is usually accessed through separate tools or commands.

The challenge is not the MCPs themselves. It is where they live.

Teams move between dashboards, terminals, and scripts to get answers. They learn something in one place, then act somewhere else. Context gets lost along the way. Kubernetes is where applications run and where decisions are made. MCP expertise should live there too. In context. Next to the workloads and applications it describes.

The Solution

Headlamp brings MCP expertise directly into the Kubernetes UI.

With MCP support, Headlamp makes specialized insight available where Kubernetes work already happens. Instead of accessing MCPs through separate tools, teams use them in context alongside their workloads. This reduces context switching, but it also does more than that: MCPs bring focused expertise into the UI, interpreting the domains, behaviors, and systems they surface and helping users understand them in a way that matches the Kubernetes resources on screen.

The workflow stays simple. You look at an application. You use an MCP to understand what is happening. You act without leaving Headlamp. By unifying MCP expertise inside the Kubernetes UI, Headlamp turns insight into something teams can use right away. Not something teams have to translate or chase down.

What MCP Support Means in Headlamp

MCP support in Headlamp reduces context switching by making MCPs part of the Kubernetes workflow. MCPs are built in, not bolted on.

MCPs are available inside the same UI teams use to explore clusters, inspect workloads, and troubleshoot issues. Instead of switching tools, you interact with MCPs alongside Kubernetes resources. Their output appears next to the workloads and applications it describes, where it is easiest to understand. This does more than simplify access. MCPs bring focused expertise into the UI. They surface insight into how systems behave, how components interact, and where problems start. That understanding is tied directly to Kubernetes context, not shown in isolation.

By treating MCPs as first‑class integrations, Headlamp keeps workflows simple. Kubernetes resources, application views, and MCP expertise live in one place. Teams spend less time stitching tools together and more time understanding what is happening.

Why This Matters for Kubernetes Teams

Kubernetes is rarely managed by one role. Developers, operators, and platform engineers all work in the same clusters. However, they look at those clusters for different signals and ask different questions.

MCPs in Headlamp bring that expertise closer to the work.

Developers gain clearer insight into how their applications behave at runtime. MCPs help explain what is happening inside workloads, not just that something failed. This makes issues easier to understand and faster to fix.

Platform engineers benefit from consistency and control. MCP expertise shows up inside a familiar UI and follows existing Kubernetes permissions. Teams gain deeper operational understanding without adding another system to manage.

Operators see focused insight where investigations already happen. MCP output appears in the chat box alongside logs, events, and resource state, making it easier to connect signals and identify root causes.

By unifying MCP expertise inside the Kubernetes UI, Headlamp creates a shared understanding of the cluster. Teams spend less time translating between tools and more time solving problems together.

MCPs in an Application‑Centric World

Headlamp helps teams think in terms of applications, not just individual Kubernetes resources. Projects group related workloads, services, and configurations into a single, scoped view.

MCP support enhances projects by bringing specialized insight into the same application context.

Instead of running MCPs across an entire cluster and sorting through results later, MCPs can be used where the application lives. Their expertise is applied to the namespaces, workloads, and resources that matter, not buried in cluster‑wide noise.

This makes MCP insight easier to understand and easier to trust. Teams see expert signals in the context of the application they are working on. The result is less distraction and more focus on what actually affects the app.

Setting Up MCP Support

The Model Context Protocol (MCP) is an open standard that lets the Headlamp AI Assistant talk to external tools through a unified interface—think of it as a plugin system for your AI. Connect an MCP server (like the Flux operator) and its capabilities appear alongside Headlamp's built-in Kubernetes tooling, ready to use in chat.

How it works

The Headlamp desktop app spawns MCP servers as local processes, discovers the tools they expose, and hands them to the LLM. When you ask a question that needs an external tool, the assistant picks the right one, runs it, and formats the results into tables, metrics, or plain text.

Setting it up

  1. Open the AI Assistant plugin settings and navigate to the MCP Servers section.
  2. Turn Enable MCP servers on.

Enable MCP servers in the AI Assistant settings

  3. Click Add Server, then enter a name, the server command, and any args or environment variables it needs.

Example:

  • Name — A unique identifier for the server.
  • Command — The executable to run (e.g., flux-operator-mcp).
  • Args — Command-line arguments (e.g., serve --kube-context HEADLAMP_CURRENT_CLUSTER).
  • Environment Variables — Optional env vars required by the server (e.g., KUBECONFIG).
  4. Save. Headlamp will spawn the process and discover its tools. Review and toggle individual tools under the MCP Tool Settings tab.

MCP server settings configuration in Headlamp
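
Gathering the example values from the steps above into one place, a flux-operator-mcp server entry would look roughly like this. The YAML below is purely notation for illustration; Headlamp is configured through the settings UI rather than a config file, the server name is arbitrary, and the KUBECONFIG path is a placeholder.

name: flux-operator                        # arbitrary, unique identifier
command: flux-operator-mcp                 # the executable to run
args:
  - serve
  - --kube-context
  - HEADLAMP_CURRENT_CLUSTER
env:
  KUBECONFIG: /home/user/.kube/config      # placeholder path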

After that, just ask a question. For example, "List all Flux HelmReleases in the default namespace"—and the assistant takes care of the rest.

A few things worth noting: MCP is desktop-only (Electron), you need at least one AI provider configured, and you can run multiple servers side by side. Tool approval settings let you gate write operations before they execute.

Use Cases That Fit Well

MCP support in Headlamp works best when teams need expert insight in context.

Some MCPs focus on how applications behave at runtime. For example, the Inspektor Gadget MCP can surface low‑level signals from running workloads. When used in Headlamp, those signals are tied directly to pods and namespaces. Teams see how an application behaves while it is running, not just that something is wrong.

Other MCPs focus on how applications are delivered and kept in sync. The Flux MCP brings insight into deployment state, drift, and reconciliation. Instead of checking a separate system, teams can understand why a workload looks the way it does right from the Kubernetes UI.

In both cases, the value is not just access to data. It is access to expertise. MCPs explain what is happening and why, in the context of the application teams are already working on. By keeping that expertise inside Headlamp, teams spend less time chasing answers across tools and more time acting on what they learn.

Conclusion

MCPs bring more than raw signals. They bring expertise. They understand how systems behave, how applications are delivered, and where problems begin.

Headlamp brings that expertise into the Kubernetes UI. MCP insights show up where teams already work, tied to real workloads and applications. Context stays intact. Decisions get easier. Action follows faster.

This is about more than reducing context switching. It is about giving Kubernetes teams smarter operational intelligence. Intelligence grounded in the domain knowledge MCPs provide and delivered in a way that fits how teams actually work.

This is also part of a longer journey. We are building Headlamp toward a Unified Kubernetes Workspace. One place where ease of use, context, insight, and action come together, so Kubernetes feels connected instead of fragmented.

Meshery sandbox

As a self-service engineering platform, Meshery enables collaborative design and operation of cloud and cloud native infrastructure.

Meshery v1.0 is Generally Available

Today, at KubeCon + CloudNativeCon Europe 2026 in Amsterdam, the Meshery maintainers are proud to announce the general availability of Meshery v1.0 - the extensible cloud native management platform that has become the governance layer the cloud native stack has long been missing.

This release has been six years in the making. Meshery has been open source from day one, built in public, by a global community, for a global community. v1.0 is not just a version number. It is a statement of production readiness, architectural maturity, and a commitment to the teams who depend on Meshery to manage their most critical infrastructure.

The Problem v1.0 Solves

Cloud native teams have assembled powerful toolchains over the past decade - Kubernetes, Helm, Terraform, Docker, observability stacks - yet lacked a unified layer to govern how those components relate, change, and interact across organizational boundaries. YAML files proliferated. PR diffs became inscrutable. Tribal knowledge calcified in individuals rather than in systems.

AI has accelerated the urgency. LLM-generated configurations can produce syntactically valid but semantically dangerous infrastructure changes at machine speed - outpacing any engineering team’s ability to perform meaningful review. The industry is not lacking automation. It is lacking oversight.

Meshery v1.0 is that oversight layer. Its visual, collaborative design surfaces make infrastructure changes - whether authored by humans or generated by AI - legible, reviewable, and governable before they reach production.

Infrastructure as Design

At the core of v1.0 is the Infrastructure as Design model: a shift from managing infrastructure as disconnected text files to operating it as a shared, visual, living artifact. Teams can see the blast radius of a change, review AI-suggested configurations the way they review code, and collaborate in real time across organizational boundaries.

This model is operationalized through Meshery extensions, with two complementary surfaces:

Kanvas Designer (GA) is a declarative, drag-and-drop visual design interface - “diagram as code” - where infrastructure is designed, versioned, and diff’ed as a visual artifact rather than a wall of YAML. Import your existing Helm charts, compose multi-cluster topologies, and share designs as first-class GitOps artifacts.

Kanvas Operator (Beta) is a real-time operations surface providing live resource views and cluster management, giving SRE and platform teams continuous situational awareness across multi-cluster, multi-cloud deployments.

Together, they deliver what no configuration management tool or cluster visualizer alone provides: a single workspace where infrastructure is designed, understood, and operated as a team sport - with human oversight explicitly built into every step of the AI-assisted workflow.

“Kubernetes gave us the runtime. GitOps gave us the pipeline. Meshery v1.0 gives teams the governance layer - the place where you actually see, understand, and control what’s running across your infrastructure before and after AI touches it.”

- Lee Calcote, Meshery Co-Creator and Maintainer

Community Velocity That Speaks for Itself

The community behind this release is itself a proof point. Meshery has been formally recognized as the sixth highest-velocity project in the CNCF - an extraordinary distinction among 237 CNCF projects, especially for a project at the Sandbox maturity level. Over the past year, the project recorded a 350% increase in code commits, driven by a global community of more than 3,000 contributors, 10,000 GitHub stars, and 10,000 community members.

Meshery is also the #1 most applied-to internship in the Linux Foundation’s LFX Mentorship program, with over 10,000 applicants to date. It continues as a flagship participant in Google Summer of Code. The developer interest in Meshery reflects a broader industry recognition: governance tooling for cloud native infrastructure is not a niche concern. It is the next foundational layer.

Governance by Architecture: The Dual-Org Model

To govern its own explosive growth sustainably, Meshery has restructured its GitHub footprint into two distinct organizations:

  • meshery - The core platform: Meshery Operator, MeshSync, and foundational architecture - governed by core maintainers to ensure absolute v1.0 stability.
  • meshery-extensions - A community-centric space for the project’s 300+ integrations, adapters, and ecosystem tooling - enabling independent teams to innovate rapidly without introducing instability into the core.

This model is itself a statement about governance: a project serious about production-grade reliability must impose the same rigor on its own development process that it asks engineering teams to apply to their infrastructure.

The Certified Meshery Contributor Program

Alongside v1.0, we are launching the Certified Meshery Contributor (CMC) program - the first contributor certification in the CNCF - designed to validate the proficiency of developers actively shaping the Meshery ecosystem.

The free certification comprises five exams spanning Meshery’s major architectural domains: Server, CLI, UI, Models, and Extensibility. Tailored for practitioners skilled in Go, React, and OpenAPI schemas, the CMC credential formally recognizes the human expertise that keeps AI-assisted infrastructure management safe, auditable, and correct.

The AI era does not diminish the value of human expertise. It amplifies it.

“Meshery v1.0 is the culmination of years of collaborative design, relentless engineering, and a profoundly dedicated global community. Pairing our 1.0 release with a new multi-organization extension model and the CMC program perfectly encapsulates our dual mission: delivering a world-class, extensible cloud native manager, while cultivating the most inclusive, high-velocity open-source community in the CNCF.”

- Sangram Rath, Meshery Maintainer

What the Community Is Saying

“Kanvas has me rethinking how I approach interactions with team members. The ability to visually design, import existing Helm charts, and collaborate on changes in a single GitOps workflow might just eliminate my cross-team friction. Workspaces in Meshery are my new Google Drive for infrastructure work.”

- Venil Noronha, Tech Lead at Stripe

“Meshery is simply incredible: just give it a try and it might fill the void you never knew you had. It provides the standardization and visibility and an easy path to scalable infrastructure.”

- Mars Toktonaliev, CNCF Ambassador and Senior Systems Integrator at KGPCo.

“I started looking for a suitable open-source project to contribute to in late 2025. While exploring projects under the CNCF, Meshery immediately caught my attention. Around the same time, I came across a CNCF blog post announcing the Certified Meshery Contributor certification. That’s when I decided to join the community just before the Christmas holidays. In hindsight, it turned out to be one of the best decisions I’ve made.”

- Kavitha Karunakaran, Meshery Community Manager

Get Started with Meshery v1.0

Meshery v1.0 is available today. If you are at KubeCon EU 2026, visit the Meshery booth to experience the release in action, explore the Infrastructure as Design model firsthand, and learn how to earn your Certified Meshery Contributor credential. Project maintainers are available for briefings and technical deep-dives throughout the conference.

Six years ago, a small team started building in public with the conviction that cloud native infrastructure deserved better tooling - tooling that was visual, collaborative, and open. Today, with v1.0, that conviction has a name, a community of thousands, and a production-ready platform behind it.

Thank you to every contributor, reviewer, mentee, mentor, and community member who made this release possible. This is your milestone as much as ours.

- The Meshery Maintainers

Istio graduated

Simplify observability, traffic management, security, and policy with the Istio service mesh.

Istio is Migrating Container Registries

Due to changes in Istio’s funding model, Istio images will no longer be available at gcr.io/istio-release starting January 1st, 2027. That is, clusters that reference images hosted on gcr.io/istio-release might fail to create new pods in 2027.

In fact, we are fully migrating all Istio artifacts out of Google Cloud, including Helm charts. Future communications will cover the migration of Helm charts and other artifacts. This post will focus on what you ca

Istio graduated

Simplify observability, traffic management, security, and policy with the Istio service mesh.

Security Considerations on Istio's CRDs with Namespace-based Multi-Tenancy

The Istio project wants to address a possible Man-in-the-Middle (MITM) attack scenario in which a VirtualService can redirect or intercept traffic within the service mesh. This affects namespace-based multi-tenancy clusters where tenants have the permissions to deploy Istio resources (networking.istio.io/v1).

This blog post highlights the risks of using Istio in multi-tenant clusters and explains how users can mitigate these risks and safely operate Istio in t

Score sandbox

Score is an open-source workload specification designed to simplify development for cloud-native developers.

How Engine Built a Self-Service Kubernetes Platform with Score

Recently, the Infrastructure team at Engine started a project that would change how every engineering team ships software. We were migrating dozens of services off a legacy container orchestration setup onto Kubernetes, and we had a choice to make: build another pile of bespoke tooling, or find an abstraction that could scale with us.
The real problem wasn’t Kubernetes itself. It was everything around it. Deploying a new service took days, sometimes weeks, and almost none of that time was

Kubeflow incubating

Kubeflow is the foundation of tools for AI Platforms on Kubernetes.

Kubeflow Trainer v2.2: JAX & XGBoost Runtimes, Flux for HPC Support, and TrainJob progress and metrics observability

Just a little over one week ahead of KubeCon + CloudNativeCon EU 2026, the Kubeflow team is excited to ship Trainer v2.2. The v2.2 release reinforces our commitment to expanding the Kubeflow Trainer ecosystem – meeting developers where they are by adding native support for JAX, XGBoost, and Flux, while also delivering deeper observability into training jobs.

Key highlights of the v2.2 release include:

  • First-class support for Training Runtimes for JAX and XGBoost, enabling native distributed training on Kubernetes. This marks a major milestone for the Trainer project, achieving full compatibility with Training Operator v1 CRDs: PyTorchJob, MPIJob, JAXJob, and XGBoostJob – now unified under a single TrainJob abstraction.
  • Enhanced training observability, allowing progress and metrics to be propagated directly from training scripts to the TrainJob status. Hugging Face Transformers already integrate with the KubeflowTrainerCallback to automate this capability.
  • Flux runtime support, bringing HPC workloads to Kubernetes and improving MPI bootstrapping within TrainJob.
  • TrainJob activeDeadlineSeconds API, enabling explicit timeout policies for training jobs.
  • RuntimePatches API, introducing a more flexible and scalable way to customize runtime configurations from the TrainJobs.

You can now install the Kubeflow Trainer control plane and its training runtimes with a single command:

helm install kubeflow-trainer oci://ghcr.io/kubeflow/charts/kubeflow-trainer \
    --namespace kubeflow-system \
    --create-namespace \
    --version 2.2.0 \
    --set runtimes.defaultEnabled=true

Bringing JAX to Kubernetes with Trainer

Kubeflow Trainer supports running JAX workloads on Kubernetes through the jax-distributed runtime. It is designed for distributed and parallel JAX computation using jax.distributed and SPMD primitives like pmap, pjit, and shard_map. The runtime maps one Kubernetes Pod to one JAX process and injects the required distributed environment variables so training or fine-tuning can run consistently across multiple nodes and devices.

  • Multi-process CPU training
  • Multi-GPU training using CUDA-enabled JAX
  • Data-parallel and model-parallel JAX workloads
  • Massive-scale TPU distributed training with ComputeClasses

Start by following the Getting Started guide for Kubeflow Trainer basics and make sure you have the Kubeflow SDK installed on your machine:

pip install kubeflow 

Use the jax-distributed runtime and initialize JAX distributed explicitly in your training script before any JAX computation:


from kubeflow.trainer import TrainerClient, CustomTrainer

def get_jax_dist():
    import os
    import jax
    import jax.distributed as dist

    dist.initialize(
        coordinator_address=os.environ["JAX_COORDINATOR_ADDRESS"],
        num_processes=int(os.environ["JAX_NUM_PROCESSES"]),
        process_id=int(os.environ["JAX_PROCESS_ID"]),
    )

    print("JAX Distributed Environment")
    print(f"Local devices: {jax.local_devices()}")
    print(f"Global device count: {jax.device_count()}")

    import jax.numpy as jnp
    x = jnp.ones((4,))
    y = jax.pmap(lambda v: v * jax.process_index())(x)
    print("PMAP result:", y)

client = TrainerClient()
job_id = client.train(
    runtime="jax-distributed",
    trainer=CustomTrainer(func=get_jax_dist),
)
client.wait_for_job_status(job_id)
print("\n".join(client.get_job_logs(name=job_id)))

The jax-distributed runtime injects JAX_NUM_PROCESSES, JAX_PROCESS_ID, and JAX_COORDINATOR_ADDRESS into the environment, and all processes must call jax.distributed.initialize() exactly once before any JAX computation.

For more details, refer to the Kubeflow Trainer JAX guide for jax.distributed and SPMD primitives.

Bringing XGBoost to Kubernetes with Trainer

Running distributed XGBoost workloads on Kubernetes has traditionally required manual setup of communication layers, environment variables, and cluster coordination. With this release, Kubeflow Trainer introduces built-in support for XGBoost, enabling seamless distributed training with minimal configuration.

The new xgboost-distributed runtime abstracts away the complexity of setting up XGBoost’s collective communication (Rabit). Trainer automatically provisions worker pods using JobSet and injects the required DMLC environment variables, allowing workers to coordinate and synchronize during training. The rank 0 pod is automatically configured to act as the tracker, simplifying cluster setup even further.

This integration supports both CPU and GPU workloads out of the box. For CPU training, each node runs a single worker leveraging OpenMP for intra-node parallelism. For GPU workloads, each GPU is mapped to an individual worker, enabling efficient scaling across nodes.
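As a rough sketch of how this might look from the Kubeflow SDK, following the same pattern as the JAX example above (the xgboost-distributed runtime name comes from this release; the training function body and the num_nodes option are illustrative assumptions, so adapt them to your setup):

from kubeflow.trainer import TrainerClient, CustomTrainer

def train_xgboost():
    # Illustrative body: the runtime injects the DMLC environment variables
    # and the rank 0 pod acts as the tracker, so workers can join the
    # collective communicator without manual bootstrapping.
    import xgboost as xgb
    print("XGBoost version:", xgb.__version__)

client = TrainerClient()
job_id = client.train(
    runtime="xgboost-distributed",
    trainer=CustomTrainer(func=train_xgboost, num_nodes=2),  # num_nodes assumed
)
client.wait_for_job_status(job_id)
print("\n".join(client.get_job_logs(name=job_id)))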

For more information, please see this Notebook example and documentation guide.

Track TrainJob Progress and Expose Metrics

In this release, Kubeflow Trainer introduces a powerful new capability to automatically update TrainJob status with real-time training progress and metrics generated directly from your ML code. This enables key insights, such as percentage completion, estimated time remaining (ETA), and training metrics, to be surfaced through the TrainJob API, eliminating the need to manually inspect training logs.

How it works

When this feature is enabled (feature flag TrainJobStatus is required), Kubeflow Trainer starts an HTTP server that exposes endpoints for reporting training progress and metrics. Client applications can send updates to these endpoints, and the TrainJob controller will automatically reflect this information in the job status. Users can then easily access these insights through the Kubeflow SDK without needing to inspect logs.

To simplify adoption, we are collaborating with popular ML frameworks to integrate Kubeflow Trainer callbacks that automate this process. With these integrations, users don’t need to change anything to make it work!

For example, this functionality is already available in Hugging Face Transformers, where metrics are automatically reported when using the Trainer:

from transformers import Trainer, TrainingArguments

trainer = Trainer(model=model, args=TrainingArguments(...), train_dataset=ds)
trainer.train() # Progress automatically reported when running in Kubeflow

Future Plans

We have an exciting roadmap for this feature, including support for periodic, transparent checkpointing based on ETA, as well as integration with OptimizationJob for hyperparameter tuning jobs.

To learn more about this feature please see this proposal.

Bringing Flux Framework for HPC and MPI Bootstrapping

Setting up distributed ML training jobs using MPI can be very time-consuming: from stitching together launcher-worker topologies to configuring SSH-based bootstrapping, there are a lot of moving parts that require extra code on top of your training code. In v2.2, Kubeflow Trainer brings the Flux Framework – a workload manager that combines hierarchical job management with graph-based scheduling – to handle your HPC-style scheduling needs without the overhead that typically comes with it.

Flux uses ZeroMQ to bootstrap MPI, an improvement over traditional SSH, and also brings PMIx and support for more MPI variants. When a training job is submitted, an init container automatically handles Flux’s installation, meaning that you do not need to install Flux to your application container. The plugin also handles cluster discovery, broker configuration, and CURVE certificate generation to provide cryptographic security for the overlay network.

For teams whose workloads sit at the intersection of ML and HPC, Flux serves as a portability layer that enables running simulation alongside AI/ML workloads. Scheduling through Flux bypasses potential etcd bottlenecks and the limitations of the Kubernetes scheduler, which otherwise require tricks to batch-schedule onto an underlying single-pod queue. Flux enables fine-grained control over where pods land, and is ideal when you are running simulation pipelines that feed into model training. This integration also enables the use of the Process Management Interface for Exascale (PMIx) to manage and coordinate large-scale MPI workloads on Kubernetes using TrainJobs, something that was previously not possible.

Apply the Flux runtime and a TrainJob manifest. For example:

kubectl apply --server-side -f https://raw.githubusercontent.com/kubeflow/trainer/refs/heads/master/examples/flux/flux-runtime.yaml
kubectl apply -f https://raw.githubusercontent.com/kubeflow/trainer/refs/heads/master/examples/flux/lammps-train-job.yaml

After that, monitor the pods with kubectl get pods --watch, and inspect the lead broker logs with kubectl logs <pod-name> -c node -f. The examples also show how to run the Flux cluster in interactive mode with flux-interactive.yaml, then use kubectl exec and flux proxy to connect to the lead broker Flux instance and manually run LAMMPS inside the cluster.

The Flux runtime depends on the mlPolicy: flux trigger in flux-runtime.yaml, and you can customize the setup through environment variables such as FLUX_VIEW_IMAGE and FLUX_NETWORK_DEVICE. Binaries are installed under /mnt/flux, software is copied to /opt/software, and configurations are stored in /etc/flux-config. Related documentation includes the Kubeflow Trainer Getting Started guide, the Flux example manifests, and the Flux Framework HPSF project resources.

This first implementation is intentionally simple, and users are encouraged to submit feedback to request exposure of additional features. A demo video will be showcased at the KubeCon + CloudNativeCon 2026 EU booth for those who can attend.

You can learn more about this in our Flux Guide.

Resource Timeout for TrainJobs

Previously, TrainJob resources persisted in the cluster indefinitely after completion unless manually removed, which led to etcd bloat, resource contention, and a lack of automatic garbage collection. A job could also get stuck or run indefinitely, wasting CPU/GPU capacity and reducing cluster efficiency. In v2.2, Kubeflow Trainer adds support for the activeDeadlineSeconds API in TrainJob. This field lets users set a hard timeout (in seconds) for a TrainJob's active execution. When the deadline is exceeded, Trainer marks the TrainJob as Failed (reason: DeadlineExceeded), terminates the running workload, and deletes the underlying JobSet.

There are a couple of ways to specify the timeout limit of a job; the first is to modify the TrainJob manifest directly:

apiVersion: trainer.kubeflow.org/v1alpha1
kind: TrainJob
metadata:
  name: quick-experiment
spec:
  activeDeadlineSeconds: 28800 # Max runtime 8 hours
  runtimeRef:
    name: torch-distributed-gpu
  trainer:
    image: my-training:latest
    numNodes: 2

More information about how to configure lifecycle policies for TrainJobs can be found in our TrainJob Lifecycle Guide

RuntimePatches API to override TrainJob defaults

In many distributed learning environments, multiple controllers can interact with the same TrainJob manifest, making ownership boundaries important to preserve. The new RuntimePatches API replaces PodTemplateOverrides with a manager-keyed structure that makes it explicit who applied what, and when.

Each patch is scoped to a named manager and can target specific jobs or pods within the runtime, with both job-level and pod-level overrides supported. This means Kueue can inject node selectors and tolerations into the trainer pod without conflicting with another controller managing job-level metadata, and the full history of what was applied is preserved directly in the spec.

In the new TrainJob manifest, every manager owns its own entry, and pod- and job-level overrides are separate fields under that manager. Note that the manager field is immutable after creation:

apiVersion: trainer.kubeflow.org/v2alpha1
kind: TrainJob
metadata:
  name: pytorch-distributed
spec:
  runtimeRef:
    name: pytorch-distributed-gpu
  trainer:
    image: docker.io/custom-training
  runtimePatches:
    - manager: trainer.kubeflow.org/kubeflow-sdk # who owns this entry (immutable)
      trainingRuntimeSpec:
        template:
          spec:
            replicatedJobs:
              - name: node
                template:
                  spec:
                    template:
                      spec:
                        nodeSelector:
                          accelerator: nvidia-tesla-v100

Note that the RuntimePatches API cannot be used to set environment variables for the node, dataset-initializer, or model-initializer containers, nor to override command, args, image, or resources on the trainer container.

For a complete description of the API’s structure, restrictions and use cases, check out the RuntimePatches Operator Guide.

⚠️ This API introduces Breaking Changes!!

PodTemplateOverrides has been removed in v2.2. If you’re currently using it in your TrainJob manifests, you’ll need to migrate to the RuntimePatches API.

Breaking Changes

This release introduces a set of architectural improvements and breaking changes that lay the foundations for a more scalable and modularized Trainer. Please review the following when upgrading to Trainer v2.2:

Replace PodTemplateOverrides with RuntimePatches API

As mentioned above, PodTemplateOverrides has been replaced with RuntimePatches API to support manager-scoped customization and prevent conflicts when multiple controllers are patching the same TrainJob.

If you are using PodTemplateOverrides in your TrainJob manifests or SDK code, you will need to migrate to the manager-keyed RuntimePatches structure. See the RuntimePatches Operator Guide, and Options Reference for more information.

Remove numProcPerNode from the Torch MLPolicy API

The numProcPerNode field has been removed from the Torch MLPolicy. Process-per-node configuration is now handled directly through the container resources, so any TrainJob manifests or SDK calls that set numProcPerNode explicitly will need to be updated before upgrading to v2.2.
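As a minimal sketch of the SDK-side equivalent (assuming the resources_per_node option on CustomTrainer and the torch-distributed runtime name; adjust both to match your actual runtime and SDK version):

from kubeflow.trainer import TrainerClient, CustomTrainer

def train():
    # Your PyTorch training code; the number of processes per node is now
    # derived from the container resources requested below, not numProcPerNode.
    ...

TrainerClient().train(
    runtime="torch-distributed",  # assumed default Torch runtime name
    trainer=CustomTrainer(
        func=train,
        num_nodes=2,
        resources_per_node={"gpu": 2},  # assumed option; replaces numProcPerNode
    ),
)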

Remove ElasticPolicy API

The ElasticPolicy API has been removed from MLPolicy in Trainer v2.2. Elastic training is not available in this release; we are actively working on a redesigned implementation for a future release. If your TrainJobs rely on elastic training configuration, please hold off on upgrading until that work lands.

Some TrainJob API fields are now immutable

Several TrainJob spec fields are now properly enforced as immutable after job creation. This rejects modifications to fields such as .spec.trainer.image on a running TrainJob upfront, instead of having them silently fail at the JobSet controller level. If your workflows rely on updating these fields on a running TrainJob, those updates will now be rejected by the admission webhook. Please review your TrainJob update logic to ensure compatibility with the immutability policies in v2.2.

Release Notes

For the complete list of all pull requests, visit the GitHub release page: https://github.com/kubeflow/trainer/releases/tag/v2.2.0

Roadmap Moving Forward

We are excited to continue pushing Kubeflow as a state-of-the-art platform for distributed ML training by making TrainJobs more observable and more performant across a wide range of hardware.

One area we’re particularly excited about is bringing Multi-Node NVLink (MNNVL) support for TrainJobs, enabling them to treat GPUs across multiple machines as a single unified memory domain. For large-scale training, this means significantly faster node-to-node communication compared to standard network-based primitives and opens up configurations that simply weren’t practical before on Kubernetes. We are working closely with the Kubernetes community to introduce first-class support for Dynamic Resource Allocation (DRA) in TrainJobs.

We look forward to introducing Automatic configuration of GPU requests for TrainJobs that will take the guesswork out of choosing the right resources. With intelligent methods guiding the process, Trainer will choose appropriate resources automatically based on the TrainJob configuration. This gives teams the power to plan experiments with confidence and trust that jobs use just the right amount of compute.

Workload-Aware Scheduling (WAS) is also actively being integrated with the native Kubernetes Workload API for TrainJob to bring robust gang-scheduling support for distributed training without third party plugins. The integration will be available after Kubernetes v1.36, and we plan to extend it further to support Topology-Aware Scheduling and Dynamic Resource Allocation (DRA) as those APIs mature.

A full list of our 2026 roadmap can be found here.

Join the Community

The Kubeflow Trainer is built by and for the community. We welcome contributions, feedback, and participation from everyone! We want to thank the community for their contributions to this release. We invite you to:

Contribute:

Connect with the Community:

Learn More:

Headed to KubeCon + CloudNativeCon 2026 EU? Stop by the Kubeflow booth to see these features in action 😸🧊!!

Kubeflow incubating

Kubeflow is the foundation of tools for AI Platforms on Kubernetes.

Kubeflow SDK v0.4.0: Model Registry, SparkConnect, and Enhanced Developer Experience

Explore the full documentation at sdk.kubeflow.org

With KubeCon just around the corner, we are pleased to announce the release of Kubeflow SDK v0.4.0. This release continues the work toward providing a unified, Pythonic interface for all AI workloads on Kubernetes.

The v0.4.0 release focuses on bridging the gap between data engineering, model management, and production-ready ML pipelines. The Kubeflow SDK now covers most of the MLOps lifecycle – from data processing and hyperparameter optimization to model training and registration:

Kubeflow SDK Diagram

Highlights in Kubeflow SDK v0.4.0 include:

Unified Model Management: The Model Registry Client

Managing model artifacts, versions, and metadata across experiments has historically required stitching together multiple tools outside of your training code. In v0.4.0, the SDK introduces ModelRegistryClient – a Pythonic interface to the Kubeflow Model Registry, available under the new kubeflow.hub submodule.

The client exposes a minimal, curated API: register models, retrieve them by name and version, update their metadata, and iterate over what’s in your registry – all without leaving the SDK. It integrates directly with the Model Registry server and supports token auth and custom CA configuration for production clusters. To install the Model Registry server, see the installation guide.

Install the hub extra to get started:

pip install 'kubeflow[hub]'

Usage Example

from kubeflow.hub import ModelRegistryClient

client = ModelRegistryClient(
    "https://model-registry.kubeflow.svc.cluster.local",
    author="Your Name",
)

# Register a model
model = client.register_model(
    name="my-model",
    uri="s3://bucket/path/to/model",
    version="1.0.0",
    model_format_name="pytorch",
)

# List all models
for model in client.list_models():
    print(f"Model: {model.name}")

# Get a specific version and artifact
version = client.get_model_version("my-model", "1.0.0")
artifact = client.get_model_artifact("my-model", "1.0.0")
print(f"Model URI: {artifact.uri}")

Note: list_models() and list_model_versions() return lazy iterators backed by pagination, so only the data you consume results in API calls – making it efficient to work with large registries.
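For example, continuing with the client from the snippet above, only the pages needed to produce the items you actually consume are fetched (itertools.islice is from the standard library):

from itertools import islice

# Fetches only the pages required for the first 10 models.
for model in islice(client.list_models(), 10):
    print(model.name)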

Distributed AI Data at Scale: SparkClient & SparkConnect

Data is a fundamental piece to every AI workload, and Apache Spark has become a cornerstone technology for large-scale data processing. However, deploying and managing Spark workloads on Kubernetes has traditionally required users to work directly with Kubernetes manifests and YAML configurations – a process that can be operationally complex. In v0.4.0, the SDK introduces SparkClient – a high-level, Pythonic API that eliminates this complexity, allowing data engineers and ML practitioners to manage interactive and batch Spark workloads on Kubernetes without writing a single line of YAML. Backed by the Kubeflow Spark Operator (KEP-107), the initial version of SparkClient introduces support for interactive sessions through the SparkConnect custom resource. In future releases of the Kubeflow SDK, we will expand this support to include batch workloads as well.

SparkClient supports two operational modes. In create mode, the SDK provisions a new SparkConnect interactive session on Kubernetes for you – handling CRD creation, pod scheduling, networking, and cleanup automatically. In connect mode, you point it at an existing Spark Connect server, useful for shared clusters or cross-namespace access. Either way, you get back a standard SparkSession and can write the same PySpark code you already know.

Install Kubeflow Spark support:

pip install 'kubeflow[spark]'

To install the Spark Operator, see the installation guide.

Usage Example

from kubeflow.spark import SparkClient, Name
from kubeflow.common.types import KubernetesBackendConfig

client = SparkClient(
    backend_config=KubernetesBackendConfig(namespace="spark-test")
)

# Level 1: Minimal - use all defaults
spark = client.connect(options=[Name("my-session")])
df = spark.range(5)
df.show()
client.delete_session("my-session")

# Level 2: Simple -- configure executors and resources
spark = client.connect(
    num_executors=5,
    resources_per_executor={"cpu": "5", "memory": "1Gi"},
    spark_conf={"spark.sql.adaptive.enabled": "true"},
    options=[Name("my-session-2")],
)
df = spark.range(5)
df.show()
client.delete_session("my-session-2")

# Connect mode -- attach to an existing Spark Connect server
spark = client.connect(base_url="sc://spark-server:15002")
df = spark.sql("SELECT * FROM my_table")
df.show()

Default specifications: Spark 4.0.1, 1 executor, 512Mi memory and 1 CPU per pod, 300 second session timeout.

Note: v0.4.0 focuses on SparkConnect session management. Batch job support via SparkApplication CR (submit_job, get_job, list_jobs) is planned for a future release.

A New Home for Documentation

To support the Kubeflow SDK users and contributors, we’ve introduced a dedicated Kubeflow SDK Website. This site includes:

  • Quickstart: Train your first model with Kubeflow SDK
  • API Reference: Automatically updated documentation for all SDK modules.
  • Examples: Step-by-step guides from local prototyping to remote training.

Infrastructure & Breaking Changes

This release includes several architectural updates to ensure the SDK remains secure, scalable, and easy to use. Please note the following requirements when upgrading to v0.4.0.

Better Isolation with Namespaced TrainingRuntimes

Security and multi-tenancy are core to Kubeflow. In v0.4.0, we’ve introduced support for Namespaced TrainingRuntimes. This allows platform teams to provide curated training environments at the namespace level, ensuring that one team’s custom training configuration doesn’t interfere with another’s.

Upgrade Note: The SDK now prioritizes namespaced runtimes over cluster-wide ones. If you have runtimes with duplicate names in different scopes, verify your TrainerClient calls are targeting the intended resources.
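As a quick sanity check, a sketch along these lines can help confirm which runtimes a given namespace resolves to. It assumes the SDK's list_runtimes() helper and that TrainerClient accepts the same KubernetesBackendConfig shown in the SparkClient example above; verify both against your SDK version:

from kubeflow.trainer import TrainerClient
from kubeflow.common.types import KubernetesBackendConfig

# List the runtimes visible from a specific namespace. In v0.4.0 a namespaced
# TrainingRuntime takes precedence over a cluster-wide one with the same name.
client = TrainerClient(backend_config=KubernetesBackendConfig(namespace="team-a"))
for runtime in client.list_runtimes():
    print(runtime.name)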

Furthering Parity Between Local and Remote Execution

One of the biggest hurdles in MLOps is the “it worked on my machine” syndrome. With the addition of Dataset and Model Initializers for the ContainerBackend, the SDK now emulates how Kubernetes handles data dependencies.

Whether you are running locally on Docker or at scale on a cluster, the SDK now automatically manages the “plumbing” of mounting and initializing your data. This ensures your local development environment mirrors the data-loading behavior of your production training jobs.

Required: Upgrading to Python 3.10+

To maintain a secure and performant codebase, Kubeflow SDK v0.4.0 is officially moving its minimum requirement to Python 3.10.

This change ensures that all SDK users benefit from better security patches, improved type-hinting, and more efficient asynchronous networking for our API clients.

To Upgrade: Ensure your local environment, Notebook images, and CI/CD pipelines are running Python 3.10 or higher before running pip install --upgrade kubeflow

What’s Next for Kubeflow SDK

Looking ahead, the Kubeflow SDK 2026 Roadmap outlines several exciting initiatives:

  • Kubeflow MCP Server to enable AI-assisted interactions with Kubeflow resources
  • OpenTelemetry integration for improved observability across SDK operations
  • MLflow support for experiment tracking and metrics
  • First class support for Kubeflow Pipelines to bring KFP into the unified SDK
  • TrainJob checkpointing and dynamic LLM Trainers for more flexible and resilient training workflows
  • End-to-end AI pipelines orchestrating data processing, training, and optimization using SparkClient, TrainerClient, and OptimizerClient
  • Multi-cluster job submission leveraging Kueue and Multi-Kueue capabilities for Spark and training workloads
  • Batch Spark job support via SparkApplication CR for submit, get, and list operations

We encourage the community to review and contribute to the roadmap.

Get Involved!

The Kubeflow SDK is built by and for the community. We welcome contributions, feedback, and participation from everyone! We want to thank the community for their contributions to this release. We invite you to:

Connect with the Community:

Learn More

Headed to KubeCon + CloudNativeCon 2026 EU? Stop by the Kubeflow booth to see these features in action!

Headlamp sandbox

Extensible open source multi-cluster Kubernetes user interface

Headlamp Team at KubeCon Europe 2026

Headlamp is heading to Amsterdam! KubeCon + CloudNativeCon Europe 2026 runs March 24-26 (with co-located events starting March 22), and we have plenty going on: conference talks, a Maintainer Summit session, a hands-on ContribFest, and our kiosk at the Project Pavilion. Here's the full rundown.

Talks

This year we have Headlamp-related talks from both core members of the project, and from the wider community.

Title: Leveling up with Radius: Custom Resources and Headlamp Integration for Real-World Workloads
Speakers: Nuno Guedes (Millennium BCP) and Will Tsai (Microsoft)
Date and time: Tuesday, March 24, 3:15 PM - 3:45 PM CET
Room: Forum
Description: Learn how Millennium bcp extended the Radius framework for production workloads with complex dependencies (Datadog monitors, AI models, internal APIs) and built a Headlamp plugin that visualizes the app graph and maps dependencies across cloud platforms.

Title: How To (Not) Fork Headlamp
Speaker: Joaquim Rocha (Amutable)
Date and time: Thursday, March 26, 11:45 AM - 12:15 PM CET
Room: E106-108
Description: Should you write a plugin or fork the whole project? This talk walks through Headlamp's architecture and plugin system, covers the trade-offs, and shares practical advice for keeping your customizations maintainable either way.

Title: Ping SRE? I Am the SRE! Awesome Fun I Had Drawing a Zine for Troubleshooting Kubernetes Deployments
Speaker: Rene Dudfield (Microsoft)
Date and time: Wednesday, March 25, 4:45 PM - 5:15 PM CET
Room: Hall 7, Room A
Description: Patterns from troubleshooting Kubernetes issues in the Headlamp community turned into a hand-drawn mini zine for diagnosing deployment problems. Come see how a notebook full of doodles became a 16-page troubleshooting guide, and maybe get inspired to draw your own.

Title: Headlamp: Build Kubernetes Experiences Your Way! (ContribFest)
Speakers: Joaquim Rocha (Amutable) and Santhosh Nagaraj (Microsoft)
Date and time: Wednesday, March 25, 11:00 AM - 12:15 PM CET
Room: G107
Description: A hands-on workshop where you'll build a Headlamp plugin with guidance from the maintainers. Whether you're just getting started or already have contributions in flight, this is a great chance to dig in. Bring your laptop!

Co-located: Maintainer Summit

Title: Does Your Project Want a UI in kubernetes-sigs/headlamp?
Speakers: Rene Dudfield and Santhosh Nagaraj (Microsoft)
Date and time: Sunday, March 22, 11:35 AM - 12:10 PM CET
Room: Forum
Description: Headlamp already has UI plugins for projects like cert-manager, Gateway API, Karpenter, and KEDA. In this Maintainer Summit session, the team invites CNCF project maintainers to collaborate on bringing UI support to even more projects.

Project Pavilion

Come say hi at our kiosk (P-6B) in the Project Pavilion (Solutions Showcase, Halls 1-5). We'll be there with live demos and happy to chat about anything Headlamp, on Wednesday, March 25: 10:00 AM - 1:30 PM CET.

Also at KubeCon

Headlamp is also expected to make an appearance in a couple of other sessions.

See You There

Whether it's at a talk, the ContribFest, or the Project Pavilion, we'd love to connect. See you in Amsterdam!

Score sandbox

Score is an open-source workload specification designed to simplify development for cloud-native developers.

Score is now onboarded into the Docker-Sponsored Open Source Program

As CNCF Maintainers of the Score project (CNCF Sandbox), we recently embarked on a journey to strengthen our security posture by participating in the Docker-Sponsored Open Source Program. This post shares our experience, learnings, and the tangible security improvements we achieved. Our goal is to inspire others to adopt these security best practices by default for their own open source projects, under the CNCF umbrella and beyond.

Flux graduated

Flux is a tool for keeping Kubernetes clusters in sync with sources of configuration (like Git repositories and OCI artifacts), and automating updates to configuration when there is new code to deploy. Flux is built from the ground up to use Kubernetes' API extension system, and to integrate with Prometheus and other core components of the Kubernetes ecosystem....

Blog: Stairway to GitOps: Scaling Flux at Morgan Stanley

One of the things we love most about this community is hearing how you take Flux and run with it - truly solving problems for teams at scale. At our inaugural FluxCon NA, Tiffany Wang and Simon Bourassa from Morgan Stanley gave us a glimpse of their Flux environm

KServe incubating

Standardized Distributed Generative and Predictive AI Inference Platform for Scalable, Multi-Framework Deployment on Kubernetes

Announcing KServe v0.17 - Production-Ready LLM Serving with LLMInferenceService

Published on March 13, 2026

We are excited to announce the release of KServe v0.17, a landmark release that brings LLMInferenceService to production readiness with a GenAI-first architecture built on the llm-d framework. This release introduces KV-cache aware intelligent routing, disaggregated prefill-decode, distributed inference with tensor/data/expert parallelism, Envoy AI Gateway integration with token-based rate limiting, and a completely restructured modular Helm chart architecture.

🤖 LLMInferenceService: GenAI-First Architecture

KServe v0.17 elevates LLMInferenceService from an experimental feature to a production-ready CRD purpose-built for generative AI workloads. Built on the llm-d framework, LLMInferenceService provides a GenAI-first architecture that goes beyond traditional InferenceService to address the unique challenges of serving large language models at scale.

Unlike InferenceService which is designed for predictive AI workloads, LLMInferenceService natively supports:

  • Distributed inference across multiple nodes and GPUs
  • KV-cache aware scheduling for intelligent request routing
  • Disaggregated prefill-decode for optimal resource utilization
  • Gateway Inference Extension (GIE) integration for advanced traffic management
  • Token-based rate limiting via Envoy AI Gateway

| Feature | InferenceService | LLMInferenceService |
| --- | --- | --- |
| Primary Use Case | Predictive AI | Generative AI |
| Routing | Standard Gateway | KV-cache aware with EPP |
| Parallelism | Worker Spec | TP, DP, EP native support |
| Prefill-Decode | N/A | Disaggregated separation |
| Scaling | HPA/KPA | WVA + KEDA |

apiVersion: serving.kserve.io/v1alpha2
kind: LLMInferenceService
metadata:
  name: llama3-serving
spec:
  model:
    uri: hf://meta-llama/Llama-3.1-8B-Instruct
    name: meta-llama--Llama-3.1-8B-Instruct
  replicas: 3
  template:
    spec:
      containers:
        - name: vllm
          resources:
            limits:
              nvidia.com/gpu: "1"
  router:
    gateway:
      managed: {}
    route:
      httpRoute: {}
    scheduler:
      pool: {}

This creates a full serving stack including the Deployment, Service, Gateway, HTTPRoute, InferencePool, InferenceModel, and EPP (Endpoint Picker Pod) — all managed by the LLMInferenceService controller.

🚀 Key LLMInferenceService Features in v0.17

🧠 KV-Cache Aware Scheduling with Gateway Inference Extension

LLMInferenceService integrates with Gateway Inference Extension (GIE) v1.3.0, a Kubernetes SIG project that extends the Gateway API with AI-specific routing capabilities. At the heart of this integration is the Endpoint Picker Pod (EPP) from the llm-d inference scheduler, an intelligent scheduler that routes requests based on real-time KV-cache state rather than simple round-robin or random load balancing.

Traditional load balancing treats all LLM inference requests equally, but in practice, requests with similar prompts benefit enormously from being routed to the same pod — because that pod already has the relevant KV cache blocks loaded. The EPP solves this by tracking real-time KV cache states across all vLLM instances via ZMQ events (BlockStored, BlockRemoved) and building an index mapping {ModelName, BlockHash} → {PodID, DeviceTier}.

The scheduling behavior is configured through EndpointPickerConfig, which defines a plugin pipeline with weighted scorers:

apiVersion: inference.networking.x-k8s.io/v1alpha1
kind: EndpointPickerConfig
plugins:
  - type: single-profile-handler
  - type: prefix-cache-scorer
  - type: load-aware-scorer
    parameters:
      threshold: 100
  - type: max-score-picker
schedulingProfiles:
  - name: default
    plugins:
      - pluginRef: prefix-cache-scorer
        weight: 2.0
      - pluginRef: load-aware-scorer
        weight: 1.0
      - pluginRef: max-score-picker

The pipeline uses three types of plugins (see llm-d scheduler architecture for details):

  • prefix-cache-scorer (weight: 2.0): Tracks the actual KV cache contents across all vLLM instances and scores pods based on how many cached prefix blocks match the incoming request's prompt. This reduces Time To First Token (TTFT) by avoiding redundant prefill computation for repeated or similar prompts — particularly beneficial for multi-turn conversations and RAG workloads.
  • load-aware-scorer (weight: 1.0): Scores candidate pods based on their current queue depth. Pods with empty queues score 0.5, while pods with growing queues score progressively lower toward 0. The threshold parameter controls the sensitivity — when queue depth exceeds the threshold, the pod scores near zero.
  • max-score-picker: After all scorers run, selects the pod with the highest weighted aggregate score.

The EndpointPickerConfig can be provided inline in the LLMInferenceService spec or referenced from a ConfigMap, giving platform teams the flexibility to standardize scheduling behavior across deployments:

apiVersion: serving.kserve.io/v1alpha2
kind: LLMInferenceService
metadata:
  name: llama3-with-scheduler
spec:
  model:
    uri: hf://meta-llama/Llama-3.1-8B-Instruct
    name: meta-llama--Llama-3.1-8B-Instruct
  replicas: 4
  template:
    spec:
      containers:
        - name: vllm
          resources:
            limits:
              nvidia.com/gpu: "1"
  router:
    gateway:
      managed: {}
    route:
      httpRoute: {}
    scheduler:
      config:
        ref:
          name: custom-endpoint-picker-config
          key: endpoint-picker-config.yaml
      pool: {}

The GIE CRDs (InferencePool and InferenceModel) are now bundled as part of the KServe installation, simplifying setup.

🔀 Disaggregated Prefill-Decode

LLMInferenceService natively supports disaggregated prefill-decode, which separates the compute-intensive prefill phase from the memory-intensive decode phase into independent workloads. This allows each phase to be scaled and optimized independently.

apiVersion: serving.kserve.io/v1alpha2
kind: LLMInferenceService
metadata:
  name: llama3-prefill-decode
spec:
  model:
    uri: hf://meta-llama/Llama-3.1-8B-Instruct
    name: meta-llama--Llama-3.1-8B-Instruct
  replicas: 2
  template:
    spec:
      containers:
        - name: vllm
          resources:
            limits:
              nvidia.com/gpu: "1"
  prefill:
    replicas: 2
    template:
      spec:
        containers:
          - name: vllm
            resources:
              limits:
                nvidia.com/gpu: "1"
  router:
    gateway:
      managed: {}
    route:
      httpRoute: {}
    scheduler:
      pool: {}

KV cache data is transferred between prefill and decode pods using NixlConnector with RDMA-based RoCE for high-throughput, low-latency block transfers.

📐 Distributed Inference: Tensor, Data, and Expert Parallelism

LLMInferenceService introduces a comprehensive parallelism specification for distributed inference across multiple nodes and GPUs using LeaderWorkerSet:

  • Tensor Parallelism (TP): Splits model layers across GPUs within a node
  • Data Parallelism (DP): Runs multiple model replicas for higher throughput
  • Data-Local Parallelism: Controls GPUs per node for optimal NUMA affinity
  • Expert Parallelism (EP): Distributes Mixture-of-Experts (MoE) model experts across GPUs

apiVersion: serving.kserve.io/v1alpha2
kind: LLMInferenceService
metadata:
  name: llama3-multi-node
spec:
  model:
    uri: hf://meta-llama/Llama-3.1-70B-Instruct
    name: meta-llama--Llama-3.1-70B-Instruct
  replicas: 8
  parallelism:
    tensor: 4
    data: 8
    dataLocal: 4
  template:
    spec:
      containers:
        - name: vllm
          resources:
            limits:
              nvidia.com/gpu: "4"
  worker:
    spec:
      containers:
        - name: vllm
          resources:
            limits:
              nvidia.com/gpu: "4"
  router:
    gateway:
      managed: {}
    route:
      httpRoute: {}
    scheduler:
      pool: {}

🌐 Envoy AI Gateway Integration with Token-Based Rate Limiting

LLMInferenceService integrates with Envoy AI Gateway for AI-native traffic management. This enables token-based rate limiting — a capability critical for LLM serving where request cost varies dramatically based on input and output token counts rather than simple request counts.

apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: llm-route
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: llama3-serving
  llmRequestCosts:
    - metadataKey: llm_input_token
      type: InputToken
    - metadataKey: llm_output_token
      type: OutputToken
    - metadataKey: llm_total_token
      type: TotalToken
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: llm-rate-limit
spec:
  targetRefs:
    - group: aigateway.envoyproxy.io
      kind: AIGatewayRoute
      name: llm-route
  rateLimit:
    type: Global
    global:
      rules:
        - clientSelectors:
            - headers:
                - name: x-user-id
                  type: Distinct
          limit:
            requests: 1000
            unit: Hour
          cost:
            request:
              from: Number
              number: 0
            response:
              from: Metadata
              key: llm_total_token

⚡ Autoscaling API with WVA Support

A new autoscaling API has been added to LLMInferenceService with support for the Workload Variant Autoscaler (WVA), a Kubernetes-based global autoscaler designed specifically for LLM inference workloads. Traditional CPU/memory-based autoscaling is inadequate for LLMs because inference cost is driven by token throughput, KV cache utilization, and queue depth rather than CPU or memory usage.

WVA continuously monitors inference server metrics via Prometheus — specifically KV cache utilization and queue depth — to determine when servers are approaching saturation. It then computes a wva_desired_replicas metric and emits it to Prometheus, where an actuator backend (HPA or KEDA) reads it to drive the actual scaling:

  • WVA + KEDA: Queries Prometheus directly for the wva_desired_replicas metric. Does not require Prometheus Adapter. Supports idle scale-to-zero via idleReplicaCount.
  • WVA + HPA: Reads the wva_desired_replicas metric via Kubernetes Metrics API. Requires Prometheus Adapter. Supports standard HPA scaling behaviors.

A key concept in WVA is the variant — a specific deployment configuration (hardware, runtime, parallelism strategy) for serving a model. The same base model might be served by multiple variants: for example, Llama-3 on A100 GPUs with TP=4 is one variant, while Llama-3 on H100 GPUs with TP=2 is another. The variantCost field specifies the relative cost per replica for each variant, enabling WVA to make cost-aware scaling decisions across variants — scaling up the cheaper variant first when demand increases, and scaling down the most expensive variant first when demand decreases.

apiVersion: serving.kserve.io/v1alpha2
kind: LLMInferenceService
metadata:
  name: llama3-wva-autoscaling
spec:
  model:
    uri: hf://meta-llama/Llama-3.1-8B-Instruct
    name: meta-llama--Llama-3.1-8B-Instruct
  scaling:
    minReplicas: 1
    maxReplicas: 10
    wva:
      variantCost: "15.0"
    keda:
      pollingInterval: 30
      cooldownPeriod: 300
      initialCooldownPeriod: 120
      idleReplicaCount: 0
      fallback:
        failureThreshold: 3
        replicas: 2
  template:
    spec:
      containers:
        - name: vllm
          resources:
            limits:
              nvidia.com/gpu: "1"
  router:
    gateway:
      managed: {}
    route:
      httpRoute: {}
    scheduler:
      pool: {}

In the example above, variantCost: "15.0" indicates the relative cost of running each replica of this variant. If another variant of the same model has variantCost: "5.0", WVA would prefer to add capacity on that cheaper variant before scaling up this one. The default value is "10.0" if not specified. When using the KEDA backend, the fallback field ensures the deployment maintains a minimum replica count (here, 2 replicas) even if the metrics pipeline fails — a critical safety net for production LLM deployments.

🔧 Scheduler High Availability

The LLMInferenceService scheduler (EPP) now supports scaling and high availability, allowing multiple EPP replicas for production deployments that require fault tolerance and higher routing throughput.

🛡️ CRD Webhook Validation

LLMInferenceService now includes CRD webhook validation with comprehensive E2E tests, providing early feedback on invalid configurations before they reach the controller. This catches errors in parallelism settings, workload specifications, and router configurations at admission time.

📋 Configuration Composition with LLMInferenceServiceConfig

LLMInferenceService supports a configuration composition model through LLMInferenceServiceConfig, enabling reusable templates that can be shared across multiple LLMInferenceService resources. The merge order follows:

  1. Well-Known Configs → 2. Explicit BaseRefs → 3. LLMInferenceService Spec

This allows platform teams to define standardized vLLM worker templates, router/scheduler configurations, and resource defaults while giving application teams the ability to override specific settings.

📦 Additional LLMInferenceService Improvements

  • Label and annotation propagation to downstream workload resources (#5009)
  • Prometheus annotation propagation to workloads for metrics collection (#5086)
  • Certificate management with DNS/IP SAN and automatic renewal for self-signed certs (#5099)
  • Improved CA bundle management for secure communication (#4803)
  • Optional storageInitializer — skip model download when using pre-loaded models (#4970)
  • InferencePool auto-migration for seamless upgrades (#5007)
  • Route-only completions through InferencePool for chat/completion endpoints (#5087)
  • Startup probes for vLLM containers for more reliable health monitoring (#5063)
  • vLLM arguments migrated to command field for cleaner configuration (#5049)
  • Versioned well-known config resolution for stable config management (#5096)
  • Scheduler config via ConfigMap or inline for flexible configuration (#4856)
  • Pod init container failure monitoring for better observability (#5034)
  • Preserve externally managed replicas during reconciliation (#4996)
  • Allow stopping LLMInferenceService gracefully (#4839)
  • Enhanced Gateway API URL discovery with listener hostname fallback (#5104, #5079)

🏗️ Modular Component Architecture

KServe v0.17 introduces a fundamental architectural shift toward modular, component-based deployment. KServe now consists of three independent components:

  • kserve (core): Manages InferenceService, ServingRuntime, ClusterServingRuntime, InferenceGraph, and TrainedModel CRDs.
  • llmisvc: The LLMInferenceService controller for generative AI workloads, managing LLMInferenceService and LLMInferenceServiceConfig CRDs.
  • localmodel (optional): The LocalModel controller for efficient model caching with LocalModelCache, LocalModelNode, and LocalModelNodeGroup CRDs.

| Combination | Use Case | Components |
| --- | --- | --- |
| KServe Only | Predictive AI | kserve |
| KServe + LLMIsvc | Predictive AI + Generative AI | kserve + llmisvc |
| Full Stack | Predictive AI + Generative AI + Model Caching | kserve + llmisvc + localmodel |

Helm Chart Restructuring

To support the new component architecture, the Helm charts have been completely restructured from a single chart into 10 independent Helm charts:

CRD Charts (6 charts with full and minimal variants):

  • kserve-crd / kserve-crd-minimal
  • kserve-llmisvc-crd / kserve-llmisvc-crd-minimal
  • kserve-localmodel-crd / kserve-localmodel-crd-minimal

Resource Charts (4 charts):

  • kserve-resources (renamed from kserve)
  • kserve-llmisvc-resources (new)
  • kserve-localmodel-resources (new)
  • kserve-runtime-configs (new — manages ClusterServingRuntimes and LLMIsvcConfigs)

⚠️ Warning: This is a breaking change. Users upgrading from v0.16 cannot use a simple helm upgrade command. Please follow the detailed upgrade guide for step-by-step migration instructions. We strongly recommend testing the upgrade in a non-production environment first.

For fresh installations, the new Kustomize component-based architecture also provides composable deployment options via standalone overlays, addon overlays, and all-in-one overlays. See the installation concepts for details.
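
As a purely hypothetical illustration of that composition model, a kustomization.yaml could pull in a standalone overlay and layer addons on top; the paths below are placeholders, not the repository's actual layout:

```yaml
# Hypothetical kustomization.yaml — overlay paths are placeholders; see the
# installation concepts documentation for the real directory names.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - overlays/kserve-core          # standalone overlay: core controller + CRDs
components:
  - overlays/addons/llmisvc       # addon overlay: LLMInferenceService controller
  - overlays/addons/localmodel    # addon overlay: model caching
```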

🔧 InferenceService and Platform Improvements

Storage Performance

  • Parallelized blob downloads from Azure and S3 for faster model loading (#4709, #4714)
  • Faster parallel S3 downloads with configurable file selection (#5102, #5119)
  • Git repository support for downloading models directly from Git repos via HTTPS (#4966)

New Serving Runtimes

  • OpenVINO Model Server — Intel's optimized inference runtime for high-performance serving on Intel hardware (#4592)
  • PredictiveServer runtime with full build/publish infrastructure and E2E testing (#4954)

Gateway & Routing

  • Gateway API upgraded to v1.4.0 (#5038)
  • PathTemplate configuration for flexible inference service routing (#4817)

vLLM Backend

  • Upgraded to vLLM v0.15.1 with performance improvements (#5098)
  • Removed Python 3.9 support (#4851)

Additional Enhancements

  • CSV and Parquet marshallers for expanded data format support (#5115)
  • Event loop configuration with new --event_loop flag supporting auto, asyncio, and uvloop (#4971); see the sketch after this list
  • Annotation-based runtime defaults for MLServer (#5064)
  • INFERENCE_SERVICE_NAME environment variable exposed to serving containers (#5013)
  • Failure condition surfacing in InferenceService status (#5114)
  • Inference log batching with external marshalling support (#5061)
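
To show where the new event-loop option from the list above could land, here is a minimal sketch of a runtime definition passing it as a container argument; the flag and its values come from the release notes, while the runtime name, image, and placement are assumptions:

```yaml
# Sketch only — --event_loop is from the release notes; everything else here
# (runtime name, image, arg placement) is illustrative.
apiVersion: serving.kserve.io/v1alpha1
kind: ClusterServingRuntime
metadata:
  name: kserve-sklearnserver-uvloop
spec:
  supportedModelFormats:
    - name: sklearn
      version: "1"
  containers:
    - name: kserve-container
      image: kserve/sklearnserver:latest
      args:
        - --model_name={{.Name}}
        - --event_loop=uvloop   # new in v0.17: auto | asyncio | uvloop
```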

Infrastructure Updates

  • Kubernetes packages bumped to v0.34.0
  • Knative Serving updated to v1.21.1
  • Go updated to 1.25
  • Kubebuilder updated to 1.9.0
  • KEDA bumped from 2.16.1 to 2.17.3
  • MinIO replaced with SeaweedFS for testing infrastructure

🔒 Security Fixes

Multiple security vulnerabilities have been addressed:

  • CVE-2025-62727 (Starlette)
  • CVE-2025-22872, CVE-2025-47914, CVE-2025-58181
  • CVE-2024-43598 (LightGBM updated to 4.6.0)
  • CVE-2025-43859 (h11 HTTP parsing)
  • CVE-2025-66418 (decompression chain)
  • CVE-2025-68156 (expr-lang/expr)
  • CVE-2026-26007 (cryptography subgroup attack)
  • CVE-2026-24486 (python-multipart arbitrary file write)
  • Path traversal vulnerabilities in https.go and tar extraction

🔍 Release Notes

For the complete list of all 167 merged pull requests, bug fixes, and known issues, visit the GitHub release pages:

🙏 Acknowledgments

We extend our gratitude to all 38+ contributors who made this release possible, including 21 first-time contributors. Your efforts continue to drive the advancement of KServe as a leading platform for serving machine learning models.

  • Core Contributors: The KServe maintainers and regular contributors
  • Community: Everyone who reported issues, provided feedback, and tested features
  • New Contributors: Welcome to all first-time contributors who helped shape this release

🤝 Join the Community

We invite you to explore the new features in KServe v0.17 and contribute to the ongoing development of the project:

Happy serving!


The KServe team is committed to making machine learning model serving simple, scalable, and standardized. Thank you for being part of our community!

Meshery sandbox

As a self-service engineering platform, Meshery enables collaborative design and operation of cloud and cloud native infrastructure.

Mesheryctl Relationship Commands Promoted From Experimental

If you are managing cloud-native infrastructure with Meshery, understanding how your components interact is critical. This post walks you through the mesheryctl relationship commands and celebrates an important milestone: their official graduation from experimental mode.

From mesheryctl exp relationship to mesheryctl relationship

After a period of stabilization, community feedback, and real-world usage, the relationship commands have been promoted to stable and moved to the top-level namespace:

Before (experimental) → After (stable):

  • mesheryctl exp relationship generate → mesheryctl relationship generate
  • mesheryctl exp relationship list → mesheryctl relationship list
  • mesheryctl exp relationship search → mesheryctl relationship search
  • mesheryctl exp relationship view → mesheryctl relationship view

What is a Meshery Relationship?
In the Meshery ecosystem, a relationship defines how two or more components are interconnected. Relationships capture the dependencies, policies, and interactions between components within a model. They are organized by kind (e.g., hierarchical, edge), type, and subtype (e.g., parent, binding) and are evaluated by Meshery’s policy engine to enforce design constraints and visualize architectural intent.
Learn more about Meshery Relationships

The mesheryctl relationship command gives you a convenient CLI interface to interact with the relationships registered in your Meshery Server. It exposes four subcommands — list, search, view, and generate — each targeting a specific use case.


Base command: mesheryctl relationship

Description: The root command for managing relationships. On its own, it prints usage information. Combined with the --count flag, it returns the total number of relationships registered in Meshery Server.

Flags:

  • --count, -c (default: false): Get the total number of relationships
  • --help, -h: Display help for the command

Example — display the total count of registered relationships:

~$ mesheryctl relationship --count
Total number of relationships: 597

mesheryctl relationship list

Description: Lists all relationships registered in Meshery Server, displaying their ID, kind, API version, model name, subtype, and type in a tabular format. Supports paginated output so you can navigate through large sets of results interactively.

Flags:

  • --page, -p (default: 1): List the next set of relationships at the specified page number
  • --pagesize (default: 10): Number of results per page
  • --count, -c (default: false): Display only the total count of relationships
  • --help, -h: Display help for the command

Example — list all relationships (page 1, 10 results):

~$ mesheryctl relationship list
Total number of relationships: 597
Page: 1
ID                                    KIND          API VERSION  MODEL NAME                             SUB TYPE   TYPE
0f9ba842-d709-4d2b-a60e-f4c2b46d02ad  edge          v1.0.0       aws-apigatewayv2-controller            network    non-binding
c360e677-c0e2-4f21-a50f-94c5318a4e21  edge          v1.0.0       aws-apigatewayv2-controller            reference  non-binding
023becab-18f5-4eae-bdd2-1ef03eecffd6  edge          v1.0.0       aws-apigatewayv2-controller            reference  non-binding
7a77e701-bf34-4a07-9aff-41e61b1d87dd  edge          v1.0.0       aws-apigatewayv2-controller            reference  non-binding
13f0e4f2-81f1-4714-b850-88a8fe0d8acd  edge          v1.0.0       aws-apigatewayv2-controller            reference  non-binding
644b97c4-7f9e-41d8-9676-deb34b873cea  hierarchical  v1.0.0       aws-apigatewayv2-controller            inventory  parent
896cb3d1-1b37-47cc-91af-5f9003ef5182  edge          v1.0.0       aws-apigatewayv2-controller            reference  non-binding
343d7ee3-bf0c-41fa-95ad-deb2d6562ba8  edge          v1.0.0       aws-applicationautoscaling-controller  reference  non-binding
ec82ff50-d8dc-4c55-bb0b-a5633546b0ca  edge          v1.0.0       aws-applicationautoscaling-controller  reference  non-binding
b98483ea-b70d-40fd-915f-7e624290cf42  edge          v1.0.0       aws-applicationautoscaling-controller  reference  non-binding

Additional usage examples:

# List relationships on a specific page
mesheryctl relationship list --page 2

# List relationships with a custom page size
mesheryctl relationship list --pagesize 25

# Display only the total count of relationships
mesheryctl relationship list --count


mesheryctl relationship search

Description: Searches registered relationships used by different models. You can narrow down results by kind, type, subtype, and/or model name. At least one filter flag is required.

Flags:

  • --kind, -k: Search relationships of a particular kind (e.g., hierarchical, edge)
  • --type, -t: Search relationships of a particular type
  • --subtype, -s: Search relationships of a particular subtype (e.g., parent, binding)
  • --model, -m: Search relationships belonging to a particular model
  • --page, -p (default: 1): Page number of results to fetch
  • --help, -h: Display help for the command

Example — search for hierarchical relationships:

~$ mesheryctl relationship search --kind hierarchical
Total number of relationships: 194
Page: 1
ID                                    KIND          API VERSION  MODEL NAME                    SUB TYPE   TYPE
644b97c4-7f9e-41d8-9676-deb34b873cea  hierarchical  v1.0.0       aws-apigatewayv2-controller   inventory  parent
b236f6ba-60a8-4e5e-a36d-f0c8b2fd87f4  hierarchical  v1.0.0       aws-documentdb-controller     inventory  parent
2efa3365-5e2d-4cd2-a313-408363419d4f  hierarchical  v1.0.0       aws-dynamodb-controller       inventory  parent
4b5fa9d9-80e7-44fa-a563-ad03ef590e83  hierarchical  v1.0.0       aws-ec2-controller            inventory  parent
f5303970-cbde-49f9-9878-4f13f31ec9ff  hierarchical  v1.0.0       aws-ec2-controller            inventory  parent
853279e4-c4b7-4b95-819a-4b3ec14319a4  hierarchical  v1.0.0       aws-ecs-controller            inventory  parent
eb0da592-9e2b-44de-8e4c-457f3743f2a5  hierarchical  v1.0.0       aws-efs-controller            inventory  parent
1a848bd2-14a9-4a4c-a6ff-2dda7116d1d6  hierarchical  v1.0.0       aws-eks-controller            inventory  parent
a933d19d-e447-4e13-a252-eba9451a3a6c  hierarchical  v1.0.0       aws-emrcontainers-controller  inventory  parent
4d6d9799-496a-46b6-84fa-c033f2e85b26  hierarchical  v1.0.0       aws-eventbridge-controller    inventory  parent

Additional usage examples:

# Search by subtype
mesheryctl relationship search --subtype parent

# Search by model and kind
mesheryctl relationship search --model kubernetes --kind edge

# Search with pagination
mesheryctl relationship search --type binding --page 2


mesheryctl relationship view

Description: Views the full definition of a specific relationship belonging to a given model. The command fetches the relationships registered for the model you specify, then presents an interactive selection prompt so you can pick the exact relationship you want to inspect. The output is rendered in YAML format by default, or in JSON if requested. You can also save the output to a file.

Flags:

  • --output-format, -o (default: yaml): Format to display in: json or yaml
  • --save, -s (default: false): Save the output as a JSON or YAML file
  • --help, -h: Display help for the command

Example — view relationships of the kubernetes model:

~$ mesheryctl relationship view kubernetes
Use ↑/↓/←/→ to navigate, Ctrl+C to cancel
? Select item:
    kind: edge, EvaluationPolicy: , SubType: reference
    kind: edge, EvaluationPolicy: , SubType: firewall
  ▸ kind: edge, EvaluationPolicy: , SubType: firewall
    kind: edge, EvaluationPolicy: , SubType: mount
    kind: edge, EvaluationPolicy: , SubType: mount
↓   kind: edge, EvaluationPolicy: , SubType: mount
Example output (kubernetes model):
id: a12b458d-221a-4559-95c9-b6e6e3f8bf6e
capabilities: null
evaluationQuery: ""
kind: edge
metadata:
    description: ""
    styles:
        primaryColor: ""
        svgColor: ""
        svgWhite: ""
    isAnnotation: false
    additionalproperties: {}
model:
    version: v1.0.0
    name: kubernetes
    displayName: Kubernetes
    id: 00000000-0000-0000-0000-000000000000
    registrant:
        kind: github
    model:
        version: v1.35.0-rc.1
modelid: 00000000-0000-0000-0000-000000000000
schemaVersion: relationships.meshery.io/v1alpha3
selectors:
    - allow:
        from:
            - id: null
              kind: StorageClass
              match:
                from:
                    - id: null
                      kind: self
                      mutatedRef:
                        - - component
                          - kind
                        - - displayName
                to:
                    - id: null
                      kind: PersistentVolumeClaim
                      mutatorRef:
                        - - component
                          - kind
                        - - configuration
                          - spec
                          - storageClassName
              match_strategy_matrix: []
              model:
                version: ""
                name: kubernetes
                displayName: ""
                id: 00000000-0000-0000-0000-000000000000
                registrant:
                    kind: github
                model:
                    version: ""
              patch: null
        to:
            - id: null
              kind: PersistentVolume
              match:
                from:
                    - id: null
                      kind: PersistentVolumeClaim
                      mutatedRef:
                        - - configuration
                          - spec
                          - storageClassName
                        - - configuration
                          - spec
                          - volumeName
                to:
                    - id: null
                      kind: self
                      mutatorRef:
                        - - displayName
                        - - configuration
                          - spec
                          - storageClassName
              match_strategy_matrix: []
              model:
                version: ""
                name: kubernetes
                displayName: ""
                id: 00000000-0000-0000-0000-000000000000
                registrant:
                    kind: github
                model:
                    version: ""
              patch: null
      deny:
        from: []
        to: []
subType: mount
status: enabled
type: binding
version: v1.0.0

  
  

Additional usage examples:

# View relationships in JSON format
mesheryctl relationship view kubernetes --output-format json

# View relationships and save the output to a file
mesheryctl relationship view kubernetes --output-format json --save


mesheryctl relationship generate

Description: Generates a relationships documentation file (JSON format) by reading data from a Google Spreadsheet. This is primarily a maintainer-facing command used to keep the Meshery documentation up to date with the latest relationship definitions. It requires a valid spreadsheet ID and base64-encoded Google API credentials.

Flags:

  • --spreadsheet-id (required): Google Spreadsheet ID containing relationship data
  • --spreadsheet-cred (required): Base64-encoded Google API credentials
  • --help, -h: Display help for the command

Example — generate relationship documentation from a spreadsheet:

~$ mesheryctl relationship generate --spreadsheet-id spreadsheet-id --spreadsheet-cred $CRED
Relationships data generated in docs/_data/RelationshipsData.json

Conclusion

The mesheryctl relationship commands give you direct CLI access to the relationship layer of the Meshery model ecosystem. Whether you want a quick count of registered relationships, need to search for a specific kind, want to inspect a full relationship definition, or are maintaining the documentation data, there is a subcommand for every need.

As a next step, try combining search and view together: use search to find a relationship relevant to your model, then use view to inspect its full definition and save it locally for reference.

For more details on how relationships work under the hood, visit the official documentation: