v7.0.0

Multimodal Expansion: Audio & Video Generation Launch

Author: Sree

This release extends A4F from text into audio, video, and image workloads, giving developers a single unified API for speech synthesis, transcription, video generation, and image editing.

New Features

  • Audio Generation API: Launched the /v1/audio/speech endpoint for high-quality text-to-speech conversion with multiple voice options and output formats.
  • Audio Transcription API: Introduced the /v1/audio/transcriptions endpoint for speech-to-text conversion across multiple providers.
  • Video Generation API: Added the /v1/video/generations endpoint for creating short video clips from text prompts.
  • Image Editing API: Implemented the /v1/images/edits endpoint for modifying existing images based on text instructions.
  • Unified Media Storage: Refactored backend media storage so that images and videos are served through unified endpoints.
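A minimal sketch of what a call to the speech endpoint could look like, using only the Python standard library. The base URL is a hypothetical placeholder, and the JSON field names (`model`, `input`, `voice`, `response_format`) and the voice name `alloy` are assumptions borrowed from the OpenAI-style speech API; these notes do not spell out the request schema. The function only builds the request and does not send it.

```python
import json
import urllib.request

# Hypothetical placeholder; substitute the real A4F base URL.
BASE_URL = "https://api.example.com"


def build_speech_request(api_key: str, text: str, model: str = "tts-1",
                         voice: str = "alloy",
                         response_format: str = "mp3") -> urllib.request.Request:
    """Build (but do not send) a POST request to /v1/audio/speech.

    The body fields follow the OpenAI-style speech API and are assumed,
    not confirmed by the release notes.
    """
    if not text.strip():
        raise ValueError("input text must not be empty")
    body = json.dumps({
        "model": model,                      # e.g. tts-1 or tts-1-hd
        "input": text,                       # text to synthesize
        "voice": voice,                      # assumed voice identifier
        "response_format": response_format,  # assumed output format name
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/audio/speech",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would perform the actual call; the shape of the response is not described in these notes.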

Audio Capabilities

  • Text-to-Speech Models: tts-1 and tts-1-hd, each with multiple voice options and output formats.
  • Speech-to-Text Models: whisper-1 and distil-whisper-large-v3-en for transcription.
  • Audio Duration Tracking: Audio duration is now calculated precisely, so billing and usage tracking reflect the actual length of the audio processed.
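As an illustration of duration-based tracking, the sketch below computes the playback length of an uncompressed WAV file as frame count divided by sample rate, using Python's standard `wave` module. This is an assumed approach for WAV input only; the notes do not say how A4F measures duration, and compressed formats such as MP3 would need a decoder.

```python
import io
import wave


def wav_duration_seconds(data: bytes) -> float:
    """Return the playback length of a WAV file in seconds.

    Duration = number of audio frames / frames per second (sample rate).
    A frame holds one sample per channel, so the channel count does not
    change the result. Only uncompressed WAV is handled here.
    """
    with wave.open(io.BytesIO(data), "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```

For example, three seconds of 16 kHz mono audio contains 48,000 frames, giving 48000 / 16000 = 3.0 seconds.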

Video Generation

  • Supported Model: video-generations, for creating short video clips from text prompts.
  • Provider Integration: Video generation is served through Provider 5 and is available immediately.

Image Editing

  • Supported Model: image-edits, for modifying existing images with text instructions.
  • Provider Integration: Image editing is served through Provider 3 and accepts multipart form data.
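Because image edits are sent as multipart form data, a client has to encode the image file and the text fields into one body. The sketch below does this by hand with the standard library. The field names used in the usage line (`model`, `prompt`, `image`) are assumptions modeled on OpenAI-style image edit requests, not names confirmed by these notes.

```python
import uuid


def encode_multipart(fields: dict[str, str],
                     files: dict[str, tuple[str, bytes, str]]) -> tuple[bytes, str]:
    """Encode text fields and files as a multipart/form-data body.

    `files` maps a form field name to (filename, content, MIME type).
    Returns the body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex  # random token unlikely to occur in the payload
    chunks: list[bytes] = []
    for name, value in fields.items():
        chunks.append(f"--{boundary}\r\n".encode())
        chunks.append(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        chunks.append(value.encode("utf-8") + b"\r\n")
    for name, (filename, content, mime) in files.items():
        chunks.append(f"--{boundary}\r\n".encode())
        chunks.append(
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'.encode())
        chunks.append(f"Content-Type: {mime}\r\n\r\n".encode())
        chunks.append(content + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode())  # closing delimiter
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"
```

A call such as `encode_multipart({"model": "image-edits", "prompt": "make the sky purple"}, {"image": ("input.png", png_bytes, "image/png")})` yields a body and header that could be attached to a POST to /v1/images/edits. In practice an HTTP library such as `requests` builds this encoding for you.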

Platform Enhancements

  • Multimodal Usage Tracking: Enhanced usage tracking system to separately track text tokens, audio tokens, and their respective costs.
  • Nested Pricing Structure: Implemented nested pricing objects for models supporting multiple modalities.
  • Enhanced Validation: Added comprehensive input validation for all new multimodal endpoints.
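To show how nested pricing and separate text and audio token counts could combine into a per-modality cost, here is a minimal sketch. The pricing layout (modality, then input/output rate), the rates themselves, and the per-million-token unit are all hypothetical; the notes only state that pricing objects are nested for multimodal models. `Decimal` is used so that monetary sums stay exact.

```python
from decimal import Decimal

PER_MILLION = Decimal(1_000_000)  # hypothetical unit: rates are per 1M tokens


def cost_breakdown(pricing: dict[str, dict[str, str]],
                   usage: dict[str, dict[str, int]]) -> dict[str, Decimal]:
    """Return the cost for each modality plus a 'total' entry.

    pricing: modality -> direction ("input"/"output") -> rate as a string.
    usage:   modality -> direction -> token count.
    Raises KeyError if usage mentions a modality with no pricing entry.
    """
    breakdown: dict[str, Decimal] = {}
    for modality, counts in usage.items():
        rates = pricing.get(modality)
        if rates is None:
            raise KeyError(f"no pricing for modality {modality!r}")
        breakdown[modality] = sum(
            (Decimal(rates[direction]) * count / PER_MILLION
             for direction, count in counts.items()),
            Decimal(0),
        )
    breakdown["total"] = sum(breakdown.values(), Decimal(0))
    return breakdown
```

With hypothetical rates of 2.50/10.00 (text) and 40.00/80.00 (audio), 1,000 text input tokens, 500 text output tokens, and 2,000 audio input tokens cost 0.0075 for text, 0.08 for audio, and 0.0875 in total.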

API Improvements

  • Base64 Image Support: Enhanced image URL handling to support base64 data URIs in addition to HTTP/HTTPS links.
  • Streaming Restrictions: Explicitly blocked streaming for audio-capable models to prevent unsupported API requests.
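The two checks above can be sketched as simple request validators. This is an assumed implementation for illustration, not A4F's actual validation code: `validate_image_url` accepts either an http(s) URL or a `data:image/...;base64,...` URI whose payload decodes as strict base64, and the hypothetical `check_streaming` helper rejects `stream=True` for models flagged as audio-capable.

```python
import base64
import binascii
import re
from urllib.parse import urlparse

# data:<image MIME type>;base64,<payload>
DATA_URI_RE = re.compile(r"data:(image/[A-Za-z0-9.+-]+);base64,(.+)")


def validate_image_url(url: str) -> str:
    """Return "data" or "http" for an acceptable image URL, else raise ValueError."""
    match = DATA_URI_RE.fullmatch(url)
    if match:
        try:
            # validate=True rejects characters outside the base64 alphabet
            base64.b64decode(match.group(2), validate=True)
        except binascii.Error as exc:
            raise ValueError("invalid base64 payload in data URI") from exc
        return "data"
    parsed = urlparse(url)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "http"
    raise ValueError(f"unsupported image URL: {url[:40]!r}")


def check_streaming(model_supports_audio: bool, stream: bool) -> None:
    """Hypothetical guard: refuse streaming requests for audio-capable models."""
    if stream and model_supports_audio:
        raise ValueError("streaming is not supported for audio-capable models")
```

Rejecting these requests at validation time means an unsupported combination fails with a clear error instead of being forwarded to a provider.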