Streaming Quality Improvements — Cleaner Output, OpenAI Spec Compliance & Proxy Compatibility
This patch release focuses on streaming quality improvements across the platform. If you've experienced empty chunks, unexpected metadata in streaming responses, or buffering issues behind proxies like Nginx or Cloudflare, this update addresses all of those. All changes are fully backward compatible with no action required on your end.
Streaming Quality — Empty Chunk Elimination
Cleaner SSE Output: Streaming responses no longer include empty or malformed chunks that could break client-side parsing.
- Empty content chunks removed: Previously, some streaming responses would include chunks with empty content or empty reasoning deltas — these are now filtered out before reaching your application
- Metadata-only events suppressed: Internal metadata events that carried no user-visible content are no longer forwarded in the stream
- Consistent behavior: All providers now apply the same filtering, ensuring a uniform streaming experience regardless of which model you use
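The filtering described above can be sketched as a small generator over OpenAI-style `chat.completion.chunk` objects. The field names (`content`, `reasoning_content`, `tool_calls`, `finish_reason`) follow the OpenAI streaming format; the helper name and exact filtering rule are our illustration, not the platform's internal code.

```python
def filter_stream(chunks):
    """Drop chunks whose delta carries no user-visible payload.

    `chunks` is an iterable of OpenAI-style chat.completion.chunk dicts.
    A chunk is forwarded only if at least one choice has real content,
    a reasoning delta, tool calls, or a finish_reason.
    """
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if (delta.get("content")
                    or delta.get("reasoning_content")
                    or delta.get("tool_calls")
                    or choice.get("finish_reason")):
                yield chunk
                break  # one non-empty choice is enough to keep the chunk
```

A metadata-only or empty-delta chunk simply never reaches the consumer, which is the behavior this release applies uniformly across providers.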
OpenAI Specification Compliance
Stricter Spec Adherence: Streaming responses now more closely follow the OpenAI SSE format specification.
- Role on first chunk only: The `role: "assistant"` field now appears only on the first chunk that carries actual content, reasoning, or tool calls, matching how OpenAI's API behaves. Previously, it was incorrectly included on every chunk.
- Per-chunk finish_reason: Each SSE chunk now carries its own `finish_reason` (or `null`), rather than accumulating a value from earlier chunks. This ensures correct behavior for clients that rely on `finish_reason` to detect stream completion.
These changes improve compatibility with OpenAI SDKs, client libraries, and any application that expects strict OpenAI-format streaming responses.
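The two spec rules above can be illustrated with a short chunk-building sketch. The chunk shape follows the OpenAI SSE specification; the helper itself (`normalize_chunks`) is hypothetical and only demonstrates the role-on-first-chunk and per-chunk `finish_reason` behavior.

```python
def normalize_chunks(deltas):
    """Emit OpenAI-style chunks from a list of text deltas.

    `role: "assistant"` is attached only to the first chunk that carries
    real content, and every chunk carries its own finish_reason
    (None until the final chunk, "stop" on the last one).
    """
    role_sent = False
    for i, text in enumerate(deltas):
        delta = {"content": text}
        if not role_sent and text:
            delta["role"] = "assistant"  # role only on the first real chunk
            role_sent = True
        yield {
            "object": "chat.completion.chunk",
            "choices": [{
                "index": 0,
                "delta": delta,
                # per-chunk value, never accumulated from earlier chunks
                "finish_reason": "stop" if i == len(deltas) - 1 else None,
            }],
        }
```

A client iterating this stream sees `role` exactly once and can detect completion from the last chunk's `finish_reason` alone.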
Proxy & CDN Compatibility
SSE Header Improvements: Proper headers have been added to streaming responses to prevent buffering issues when the API is accessed through proxies, CDNs, or reverse proxies.
- `Cache-Control: no-cache, no-store, must-revalidate` prevents proxy and browser caching of streaming responses
- `X-Accel-Buffering: no` disables Nginx buffering so chunks are delivered in real time
- `Connection: keep-alive` maintains persistent connections for uninterrupted streaming
If you were experiencing delayed or batched streaming responses behind Nginx, Cloudflare, or similar infrastructure, this fix should resolve those issues.
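For readers wiring up their own SSE endpoints behind similar infrastructure, the headers from this release can be attached in any web framework. A minimal sketch (the `sse_headers` helper name is ours, and the `Content-Type` line is the standard SSE media type rather than something introduced by this release):

```python
def sse_headers():
    """Return the header pairs to attach to a streaming SSE response."""
    return [
        ("Content-Type", "text/event-stream"),  # standard SSE media type
        # defeat proxy and browser caching of the stream
        ("Cache-Control", "no-cache, no-store, must-revalidate"),
        ("X-Accel-Buffering", "no"),   # tell Nginx not to buffer chunks
        ("Connection", "keep-alive"),  # keep the connection open
    ]
```

In WSGI, for example, these pairs can be passed directly to `start_response`; most frameworks accept an equivalent header mapping on the response object.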
Platform Impact
- Improved Client Compatibility: Stricter OpenAI spec compliance means fewer parsing issues with OpenAI SDKs and third-party client libraries
- Real-Time Streaming: Proper SSE headers ensure chunks are delivered immediately without proxy buffering delays
- Cleaner Integrations: No more empty chunks or unexpected metadata events cluttering your streaming handlers
Important Notes
- No Breaking Changes: All improvements are internal pipeline enhancements. The client-facing SSE format remains fully OpenAI-compatible.
- No Action Required: These fixes apply automatically to all streaming requests. No code changes needed on your end.
- All Providers Updated: Streaming improvements have been applied across all supported providers for a consistent experience platform-wide.