The Distributed Application Runtime (Dapr) provides APIs that simplify microservice development and increase developer productivity. Whether your communication pattern is service-to-service invocation or pub/sub messaging, Dapr helps you write resilient and secure microservices.
Dapr 1.17.2
This update includes security fixes, a breaking change, a new component, and bug fixes:
Go standard library vulnerabilities fixed by upgrading to Go 1.25.8
Problem
Three vulnerabilities were identified in the Go standard library used by Dapr 1.17.1 (Go 1.24.13):
- GO-2026-4603: URLs in meta content attribute actions are not escaped in html/template, allowing potential cross-site scripting via crafted URLs.
- GO-2026-4602: FileInfo can escape from a Root in os, potentially allowing access to files outside an intended directory boundary.
- GO-2026-4601: Incorrect parsing of IPv6 host literals in net/url, which could lead to unexpected URL routing or SSRF in applications that parse user-supplied URLs.
Impact
Applications using html/template, os.Root-scoped file operations, or net/url URL parsing are potentially affected by these vulnerabilities. All three are fixed in Go 1.25.8.
Root Cause
The vulnerabilities are in the Go standard library and are not specific to Dapr code. They affect any Go program compiled with Go versions prior to 1.25.8.
Solution
Upgraded the Go toolchain from 1.24.13 to 1.25.8 across all modules and Docker images in the repository.
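For those building from source, the upgrade amounts to bumping the Go directives in each module's go.mod; a representative fragment (module-specific lines omitted, and whether a separate toolchain directive is present depends on the module):

```
go 1.25.8

toolchain go1.25.8
```

You can verify that a build no longer references the affected standard-library code paths by running govulncheck ./... from golang.org/x/vuln against your own module.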
Register RavenDB state store component
Problem
The RavenDB state store component from components-contrib was not registered in the Dapr runtime, so it could not be used as a state store in Dapr applications.
Impact
Users could not use RavenDB as a state store backend with Dapr, despite the component implementation being available in components-contrib.
Root Cause
The component registration file for the RavenDB state store was missing from the Dapr runtime's component loader (cmd/daprd/components/).
Solution
Added the state_ravendb.go registration file to register the RavenDB state store component with the default state store registry. The component is available when building with the allcomponents build tag. The ravendb-go-client dependency was added to go.mod.
Workflow state retention policy CRD fields use incorrect type (Breaking Change)
Problem
The Configuration CRD defined the stateRetentionPolicy fields (anyTerminal, completed, failed, terminated) as type: integer, format: int64, but the Go API types use metav1.Duration which serializes as strings (e.g. "1s", "168h").
This mismatch caused Kubernetes to reject valid duration string values for these fields, and prevented the workflow state retention policy from being configured correctly via the Kubernetes Configuration CRD.
Impact
Users running Dapr in Kubernetes mode could not configure the workflow state retention policy using the Configuration CRD with human-readable duration strings like "1s" or "168h". Kubernetes validation rejected these values because the CRD schema expected integers.
Additionally, even if integer nanosecond values were used to bypass the CRD schema validation, the internal configuration deserializer could not correctly unmarshal the metav1.Duration string format sent by the operator, causing daprd to fail with:
Fatal error from runtime: error loading configuration: json: cannot unmarshal string into Go struct field WorkflowStateRetentionPolicy.spec.workflow.stateRetentionPolicy.anyTerminal of type time.Duration
Root Cause
The Configuration CRD YAML (charts/dapr/crds/configuration.yaml) was not regenerated after the Go API type WorkflowStateRetentionPolicy was updated to use *metav1.Duration fields.
Solution
Updated the CRD schema to use type: string for all stateRetentionPolicy fields, matching the metav1.Duration serialization format.
Added a custom UnmarshalJSON method on the internal config.WorkflowStateRetentionPolicy struct that deserializes via the configapi.WorkflowStateRetentionPolicy type (which uses *metav1.Duration), correctly handling both the Kubernetes CRD string format and the standalone YAML format.
Upgrading
This is a change that requires a CRD update. Kubernetes does not automatically update CRDs when upgrading Dapr via Helm.
You must manually update the CRDs before upgrading.
See the Kubernetes upgrade guide for detailed instructions on how to force update CRDs.
To update CRDs manually:
kubectl apply -f https://raw.githubusercontent.com/dapr/dapr/v1.17.2/charts/dapr/crds/configuration.yaml
Pub/sub messages incorrectly routed to dead-letter queue during graceful shutdown
Problem
During graceful shutdown (or hot-reload of a pub/sub component), messages arriving after the subscription began closing were immediately NACKed by Dapr.
Brokers that support dead-letter queues interpreted these NACKs as permanent delivery failures and routed the messages to the dead-letter queue, where they were never retried.
Impact
Applications using pub/sub with dead-letter queues configured could lose messages during rolling deployments, restarts, or any event that triggers graceful shutdown.
Rather than being redelivered to another healthy consumer, these messages were silently diverted to the dead-letter queue.
This affected all subscription types: declarative, programmatic (HTTP and gRPC), and streaming subscriptions.
Root Cause
When a subscription was closing, Dapr rejected new incoming messages with a "subscription is closed" error.
The pluggable pub/sub component layer translated this error into a NACK sent back to the broker.
The broker then treated the message as a permanent failure and routed it to the configured dead-letter topic.
Solution
Dapr now holds messages that arrive during subscription shutdown instead of rejecting them.
The message handler blocks until the broker connection is torn down, at which point the broker treats the message as unacknowledged and redelivers it to another available consumer.
In-flight messages that were already being processed continue to complete normally before the subscription fully closes.
Scheduler jobs with Drop failure policy may fire more than once during host reconnection
Problem
When the scheduler cluster membership changed (including during initial startup), one-shot jobs or jobs with a Drop failure policy could be triggered more than once.
Impact
Jobs configured with DueTime (one-shot) or a Drop failure policy could be delivered to the application multiple times instead of at most once.
This was more likely to occur during scheduler startup or when the scheduler cluster membership changed, as etcd can emit multiple membership events in quick succession.
Root Cause
A race condition existed between two asynchronous event loops in daprd's scheduler connection management.
The hosts loop manages gRPC client connections to the scheduler, and the connector loop manages the stream-based cluster that runs on those connections.
When the hosts loop received a second set of scheduler host addresses (e.g. from an etcd membership event during startup), it immediately closed the first set of gRPC client connections before the connector loop had a chance to gracefully stop the cluster running on those connections.
This caused active streams to break mid-flight, in-flight job triggers to be marked as undeliverable and re-staged, and jobs to fire again when new streams connected.
Solution
Moved gRPC connection lifecycle management from the hosts loop to the connector loop.
The hosts loop now passes connection close functions to the connector via the Connect event, and the connector closes old connections only after it has gracefully stopped the previous cluster.
This ensures connections are never closed while streams are still active.
Scheduler fails to start due to trailing dot in cluster domain DNS lookup
Problem
The Dapr Scheduler service fails to start in Kubernetes with a fatal error:
Fatal error running scheduler: failed to create etcd config: peer certificate does not contain the expected DNS name dapr-scheduler-server-1.dapr-scheduler-server.dapr-system.svc.cluster.local. got [dapr-scheduler-server-0.dapr-scheduler-server.dapr-system.svc.cluster.local dapr-scheduler-server-1.dapr-scheduler-server.dapr-system.svc.cluster.local dapr-scheduler-server-2.dapr-scheduler-server.dapr-system.svc.cluster.local]
Impact
The Scheduler service cannot start in any Kubernetes cluster where the DNS CNAME lookup for the cluster domain returns a fully-qualified domain name with a trailing dot (standard DNS behavior). This prevents all scheduler-based functionality including job scheduling.
Root Cause
The scheduler resolves the Kubernetes cluster domain via a DNS CNAME lookup. Per DNS convention, CNAME responses include a trailing dot (e.g. cluster.local.).
The code only stripped leading dots from the result, leaving the trailing dot intact.
This caused the etcd peer TLS server name to end with an extra dot, which did not match the certificate SANs and failed validation.
Solution
Changed strings.TrimLeft to strings.Trim to strip dots from both ends of the parsed cluster domain, ensuring the trailing dot from DNS CNAME responses is removed.
Service invocation buffers entire streaming request body in memory
Problem
When sending a request with a streaming body (chunked transfer encoding) through Dapr HTTP service invocation, the sidecar buffered the entire request body in memory before forwarding it.
For large payloads, such as file uploads or long-running data streams, this caused excessive memory usage and potential out-of-memory crashes.
Impact
Any HTTP service invocation request without a known Content-Length (e.g. chunked uploads, streamed data, piped bodies) had its entire body buffered in memory by the sending sidecar.
This made Dapr unsuitable for streaming large payloads between services and could cause sidecar OOM kills in production.
Root Cause
The sidecar's retry mechanism unconditionally buffered the request body into memory so it could replay the body on retry.
For streaming requests, the body cannot be replayed because it is consumed as it is read, making the buffering both unnecessary and harmful.
Solution
The sidecar now detects streaming requests (those with no known content length) and skips request body buffering entirely.
Both the built-in retry logic and any user-configured resiliency retry policies are automatically bypassed for streaming requests, since retrying would require re-reading a body that has already been consumed.
Non-streaming requests with a known Content-Length continue to support retries as before.
Service invocation buffers entire streaming response body in memory
Problem
When proxying HTTP responses through service invocation, the sidecar buffered the entire response body in memory before forwarding it to the caller.
For large or unbounded streaming responses, this caused excessive memory usage and potential out-of-memory crashes.
Impact
Any service invocation response with a large or streaming body could cause sidecar OOM kills, regardless of HTTP status code.
This made Dapr unsuitable for proxying streaming responses such as server-sent events, file downloads, or long-running data streams between services.
Root Cause
The sidecar's resiliency mechanism read the full response body into memory so it could evaluate whether to retry the request.
When the request itself is a stream that has already been consumed, retries are impossible regardless of the response, making the buffering unnecessary.
Solution
For streaming requests, the sidecar now forwards response bodies directly to the caller without buffering them in memory.
Resiliency features like circuit breakers continue to track failures normally.
Non-streaming requests continue to support retries and buffered error handling as before.
Oracle Database state store BulkGet returns HTTP 500 instead of per-key errors
Problem
When using the Oracle Database state store component, a BulkGet request that encountered an error for one or more keys returned an HTTP 500 error for the entire request instead of returning per-key errors alongside successful results.
Impact
Applications using BulkGet with the Oracle Database state store could not retrieve any results if even a single key encountered an error. Instead of receiving successful results for valid keys with per-key errors for failed keys, the entire operation failed with an HTTP 500 response.
Root Cause
The BulkGet implementation in the Oracle Database state store component returned a top-level error when any individual key retrieval failed, rather than collecting the error and associating it with the specific key in the response.
Solution
Updated the BulkGet implementation to return per-key errors in the BulkGetResponse items instead of returning a top-level error. Successful key retrievals are now returned alongside any per-key errors, matching the expected state store BulkGet contract.
Pulsar pub/sub publishes invalid JSON messages when Avro schema is configured
Problem
When the Pulsar pub/sub component was configured with an Avro schema, JSON messages were published without being validated against the schema. Invalid messages that did not conform to the Avro schema were accepted and published to the topic.
Impact
Applications relying on Avro schema enforcement at the Pulsar pub/sub layer could publish malformed messages that did not conform to the expected schema. Downstream consumers expecting schema-compliant messages could encounter deserialization failures or data integrity issues.
Root Cause
The Pulsar pub/sub component did not validate JSON message payloads against the configured Avro schema before publishing. The schema was used only for consumer-side deserialization, not for producer-side validation.
Solution
Added JSON-to-Avro schema validation in the publish path. Before publishing, the component now validates JSON message payloads against the configured Avro schema and returns an error if the message does not conform, preventing invalid messages from being published to the topic.
Actor placement dissemination failures with many replicas
Problem
After upgrading to Dapr 1.17.x, deployments with many replicas (e.g. 50+) experience frequent "dissemination timeout after 8s" errors, and /placement/state shows only a fraction of the expected hosts.
Impact
Actor invocations fail intermittently because most sidecars never receive a complete placement table. Rolling restarts and scaling events amplify the problem, making large actor deployments unstable.
Root Cause
Three issues combined to cause a cascading failure during dissemination:
- Stale UNLOCK version accepted: The sidecar disseminator assigned the incoming version before comparing it against the current version, so the guard currentVersion > version always evaluated to false. Stale UNLOCK messages were incorrectly applied.
- Errors killed the disseminator permanently: When the sidecar detected a version mismatch on UPDATE or received an unknown operation, it returned a fatal error that terminated the disseminator loop entirely. The sidecar never reconnected to the placement service and remained stuck.
- Sequential dissemination rounds for concurrent connections: When many replicas connected to the placement service simultaneously while a dissemination round was in progress, each waiting connection triggered its own sequential dissemination round on completion. With N waiting replicas, this created N rounds instead of 1, causing timeouts that disconnected other sidecars and produced the cascading failure.
Solution
- Fixed the UNLOCK version guard to compare before assignment, so stale versions are correctly rejected.
- Changed version mismatch and unknown operation handling to cancel the stream and trigger a clean reconnection instead of killing the disseminator.
- Batched all connections that arrive during an active dissemination round into a single round, reducing N sequential rounds to 1.
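The first fix, the order-of-operations bug in the version guard, can be boiled down to a few lines. This is an illustrative reconstruction; the real code operates on the disseminator's state rather than plain integers:

```go
package main

import "fmt"

// applyUnlock sketches the corrected guard: the incoming version is
// compared against the current one BEFORE being assigned. In the buggy
// code the assignment happened first, so currentVersion > version could
// never be true and stale UNLOCKs were always applied.
func applyUnlock(current, incoming uint64) (version uint64, applied bool) {
	if current > incoming {
		return current, false // stale UNLOCK: reject
	}
	return incoming, true
}

func main() {
	v, ok := applyUnlock(7, 5) // stale message arrives late
	fmt.Println(v, ok)
	v, ok = applyUnlock(5, 7) // newer version
	fmt.Println(v, ok)
}
```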
Nil pointer dereference in conversation LangChain Go Kit LLM logger
Problem
The conversation component using the LangChain Go Kit could panic with a nil pointer dereference when the LLM logger was invoked.
Impact
Applications using the conversation API with the LangChain Go Kit-based component could experience unexpected crashes due to a nil pointer dereference, causing the Dapr sidecar to restart.
Root Cause
The LLM logger callback in the LangChain Go Kit conversation component was called with a nil pointer, and the logger did not perform a nil check before accessing the pointer.
Solution
Added a nil pointer check in the LangChain Go Kit LLM logger to prevent the dereference, ensuring the conversation component handles the case gracefully without panicking.
Workflow activities with large results fail with gRPC ResourceExhausted error
Problem
Workflow activities that return results larger than ~2MB fail with a ResourceExhausted gRPC error when scheduling the activity result reminder via the scheduler:
Error scheduling reminder job activity-result-XXXX due to: rpc error: code = ResourceExhausted desc = trying to send message larger than max (37950104 vs. 2097152)
Impact
Any workflow activity returning a result larger than the default gRPC send message size limit (~2MB) fails to deliver its result back to the parent orchestration. The orchestration hangs indefinitely waiting for the activity result, eventually timing out or stalling.
Root Cause
The scheduler gRPC client configured MaxCallRecvMsgSize to allow receiving large messages, but did not configure MaxCallSendMsgSize. This left the send-side limit at the gRPC default (~2MB). When an activity completes, its result is serialized into a reminder job request sent to the scheduler. If the activity result exceeds the default limit, the gRPC client rejects the outgoing message before it reaches the server.
Solution
Added MaxCallSendMsgSize to the scheduler gRPC client dial options, matching the existing MaxCallRecvMsgSize configuration.
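For reference, the send and receive limits are independent call options on a grpc-go client, so raising only one of them leaves the other at its default. A dial-options fragment of the general shape (the actual size constant Dapr uses is not shown here):

```go
// Both limits must be configured: MaxCallRecvMsgSize alone only affects
// what the client will accept, not what it will send.
opts := []grpc.DialOption{
	grpc.WithDefaultCallOptions(
		grpc.MaxCallRecvMsgSize(maxMsgSize), // already present
		grpc.MaxCallSendMsgSize(maxMsgSize), // added by this fix
	),
}
```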
Bulk publish does not apply namespace prefix to topic
Problem
When using the Bulk Publish API with a pub/sub component that has NamespaceScoped enabled, messages were published to the un-namespaced topic instead of the namespace-prefixed topic.
Impact
Applications using namespace-scoped pub/sub components with the Bulk Publish API experienced silent message loss. Bulk-published messages were routed to the wrong topic (e.g. the un-namespaced exchange), while subscribers were listening on the namespace-prefixed topic. The regular Publish API was not affected, so only bulk publish users encountered this issue.
Root Cause
The Publish method in publisher.go prepends the namespace to req.Topic when NamespaceScoped is true, but the BulkPublish method did not include this same namespace-prefixing step. This caused bulk-published messages to bypass the namespace scoping entirely.
Solution
Added the namespace prefix guard to BulkPublish in publisher.go, immediately after scope validation and before either the native BulkPublisher or defaultBulkPublisher fallback path is invoked. This ensures bulk-published messages are routed to the same namespace-prefixed topic as regular published messages.
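The guard itself is a small conditional transformation applied to the topic before publishing. A sketch with illustrative names; how the namespace is joined to the topic is an implementation detail of publisher.go not reproduced here, the point being that Publish and BulkPublish must apply the same transformation:

```go
package main

import "fmt"

// prefixTopic sketches the namespace guard that BulkPublish was missing,
// mirroring what Publish already did.
func prefixTopic(namespaceScoped bool, namespace, topic string) string {
	if namespaceScoped {
		return namespace + topic
	}
	return topic
}

func main() {
	fmt.Println(prefixTopic(true, "prod", "orders"))  // namespace-scoped component
	fmt.Println(prefixTopic(false, "prod", "orders")) // regular component
}
```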
Workflow timer reminders not deleted when external event is received before timeout
Problem
When a workflow used WaitForSingleEvent with a timeout, a timer reminder was created in the scheduler. If the external event was raised before the timer fired, the timer reminder was never deleted and remained as an orphan in the scheduler until it eventually fired unnecessarily.
Additionally, when a workflow completed while timers were still pending (e.g. a CreateTimer that had not yet fired), those timer reminders were also left behind.
Impact
Workflows using WaitForSingleEvent with timeouts accumulated orphan timer reminders in the scheduler. These timers would eventually fire and trigger unnecessary workflow actor invocations that were silently ignored, wasting scheduler and actor resources.
For long-running workflows with many WaitForSingleEvent calls or long timeouts, the number of orphan reminders could grow significantly.
Root Cause
The durable task SDK completes the event task when an external event is received, but does not signal the Dapr runtime to delete the associated timer reminder. The runtime had no mechanism to detect that a timer was no longer needed because its associated event had already been received.
Similarly, when a workflow completed, there was no cleanup of pending timer reminders that had not yet fired.
Solution
Added two timer cleanup mechanisms to the workflow orchestrator:
- Mid-execution cleanup (deleteCancelledEventTimers): After each workflow execution step, the runtime scans the history for TimerCreated events associated with WaitForSingleEvent calls (identified by the Name field on TimerCreated). When a matching EventRaised event is found in the new events, the corresponding timer reminder is deleted from the scheduler. Event name matching is case-insensitive, and already-deleted timers (e.g. from a crash recovery) are handled gracefully by ignoring NotFound errors.
- Completion cleanup (deleteAllReminders): When a workflow completes and has unfired timers (detected by comparing TimerCreated vs TimerFired event counts), all reminders for the workflow and its activities are bulk-deleted via DeleteByActorID. This handles timers without a Name field (e.g. CreateTimer) that cannot be matched to specific events.
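The completion-cleanup trigger condition can be sketched as a simple count over the history. The event names match those described above; the function itself is illustrative, not the orchestrator's actual code:

```go
package main

import "fmt"

// hasUnfiredTimers sketches the completion-cleanup check: if the history
// contains more TimerCreated events than TimerFired events, some timers
// are still pending and their reminders need to be bulk-deleted.
func hasUnfiredTimers(history []string) bool {
	created, fired := 0, 0
	for _, e := range history {
		switch e {
		case "TimerCreated":
			created++
		case "TimerFired":
			fired++
		}
	}
	return created > fired
}

func main() {
	fmt.Println(hasUnfiredTimers([]string{"TimerCreated", "TimerFired"}))  // all timers fired
	fmt.Println(hasUnfiredTimers([]string{"TimerCreated", "EventRaised"})) // pending timer left behind
}
```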
Ollama conversation component missing endpoint metadata field in spec
Problem
The Ollama conversation component's metadata spec was missing the endpoint metadata field, which is required to configure the Ollama server URL.
Impact
Users configuring the Ollama conversation component could not discover the endpoint metadata field through the component spec. The field was functional in code but not declared in the component metadata spec, making it invisible to tooling and documentation that relies on the spec.
Root Cause
The endpoint metadata field was omitted from the Ollama conversation component's metadata.yaml spec file.
Solution
Added the endpoint metadata field to the Ollama conversation component spec (conversation/ollama/metadata.yaml).
Dapr CLI cannot list workflow instances when using MongoDB as workflow actor state store
Problem
The Dapr CLI dapr workflow list command failed when MongoDB was configured as the workflow actor state store.
Impact
Users using MongoDB as their workflow actor state store could not list workflow instances via the Dapr CLI. The list operation requires prefix-based key queries to enumerate workflow instances, which MongoDB did not support.
Root Cause
The MongoDB state store component did not implement the KeysLiker interface, which provides prefix-based key listing functionality. The Dapr CLI's workflow list operation depends on this interface to query workflow instance keys by prefix.
Solution
Implemented the KeysLiker interface on the MongoDB state store component, enabling the prefix-based key listing queries required by the Dapr CLI workflow list command.
LangChain Go Kit conversation component does not return error when required tool calls are not invoked
Problem
When the LangChain Go Kit conversation component received a response from the LLM that included required tool calls, but those tool calls were not actually invoked, no error was returned to the caller.
Impact
Applications using the conversation API with the LangChain Go Kit component could silently receive incomplete responses when the LLM requested tool calls that were not executed. The caller had no indication that the response was missing expected tool call results.
Root Cause
The LangChain Go Kit conversation component did not check whether tool calls flagged as required by the LLM were actually invoked during the conversation turn.
Solution
Added error handling to return an error when the LLM response includes required tool calls that were not invoked, ensuring the caller is informed of the incomplete response.
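The added check amounts to a set difference between the tool calls the LLM flagged as required and those actually invoked. A sketch with illustrative names, not the component's API:

```go
package main

import "fmt"

// missingToolCalls returns the required tool calls that were never
// invoked, so the caller can surface an error instead of silently
// receiving an incomplete response.
func missingToolCalls(required, invoked []string) []string {
	seen := make(map[string]bool, len(invoked))
	for _, name := range invoked {
		seen[name] = true
	}
	var missing []string
	for _, name := range required {
		if !seen[name] {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	missing := missingToolCalls([]string{"search", "lookup"}, []string{"search"})
	if len(missing) > 0 {
		fmt.Printf("required tool calls not invoked: %v\n", missing)
	}
}
```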
Sentry fails to sign certificates when issuer key type does not match CSR signature algorithm
Problem
Sentry fails to sign workload certificates with the error:
x509: requested SignatureAlgorithm does not match private key type
This occurs when the CSR signature algorithm does not match the issuer key type. For example, when a sidecar generates an Ed25519 CSR but the Sentry issuer key is ECDSA, or vice versa. This breaks version skew scenarios where the sidecar and control plane use different key types.
Impact
Sidecars cannot obtain workload certificates from Sentry during version skew upgrades where the sidecar and Sentry use different cryptographic key types. All mTLS-secured communication fails, preventing the sidecar from starting.
Root Cause
Sentry copied the SignatureAlgorithm from the incoming CSR onto the workload certificate template. When x509.CreateCertificate was called, Go's x509 library rejected the mismatch between the template's signature algorithm (from the CSR) and the issuer's private key type.
Solution
Removed the hardcoded SignatureAlgorithm from certificate templates and the SignRequest struct. Go's x509.CreateCertificate now infers the correct signature algorithm from the issuer's signing key, allowing Sentry to sign certificates regardless of the CSR's key type.