v7.0.0
Multimodal Expansion: Audio & Video Generation Launch
Author: Sree
This landmark update transforms A4F into a comprehensive multimodal platform with advanced audio, video, and image capabilities, providing developers with a unified API for all creative AI tasks.
New Features
- Audio Generation API: Launched the /v1/audio/speech endpoint for high-quality text-to-speech conversion with multiple voice options and output formats.
- Audio Transcription API: Introduced the /v1/audio/transcriptions endpoint for robust speech-to-text conversion across multiple providers.
- Video Generation API: Added the /v1/video/generations endpoint for creating short video clips from text prompts.
- Image Editing API: Implemented the /v1/images/edits endpoint for modifying existing images based on text instructions.
- Unified Media Storage: Refactored backend media storage to handle images and videos through unified serving endpoints.
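As a rough illustration of calling the new text-to-speech endpoint, the sketch below builds (but does not send) a POST request for /v1/audio/speech. Only the endpoint path and the tts-1 model name come from these notes. The base URL, the bearer-token header, the JSON field names, and the voice name "alloy" are assumptions made for the example and may not match the actual API.

```python
import json
import urllib.request

# Hypothetical placeholder; substitute the real API base URL.
BASE_URL = "https://api.example.com"


def build_speech_request(api_key, text, model="tts-1", voice="alloy",
                         response_format="mp3"):
    """Build (but do not send) a POST request for /v1/audio/speech.

    The field names "model", "input", "voice" and "response_format" are
    assumed for illustration; they are not confirmed by the release notes.
    """
    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it. The response body would presumably be the encoded audio, though that is not stated here.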
Audio Capabilities
- Text-to-Speech Models: tts-1 and tts-1-hd, with multiple voice options and output formats.
- Speech-to-Text Models: whisper-1 and distil-whisper-large-v3-en for accurate transcription.
- Audio Duration Tracking: Implemented precise audio duration calculation for fair billing and usage tracking.
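The notes do not say how audio duration is calculated. For uncompressed WAV input, one simple approach is to divide the frame count by the sample rate, as in the sketch below. The function names are hypothetical, and compressed formats such as MP3 would need a different method.

```python
import io
import wave


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Return the duration of a WAV payload as frames / sample rate."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def make_silent_wav(seconds: float, sample_rate: int = 16000) -> bytes:
    """Create a mono, 16-bit silent WAV clip, used here only as test input."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)          # mono
        wav.setsampwidth(2)          # 2 bytes per sample (16-bit)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    return buf.getvalue()
```

A billing layer could then round the returned value to whatever granularity it charges by.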
Video Generation
- Supported Models: video-generations for creating short video content from text prompts.
- Provider Integration: Video generation support integrated with Provider 5 for immediate availability.
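A request body for /v1/video/generations might be assembled as sketched below. The model name video-generations is taken from these notes. The JSON field names "model" and "prompt" and the empty-prompt check are assumptions for illustration only.

```python
import json


def build_video_payload(prompt: str, model: str = "video-generations") -> bytes:
    """Serialize a text-to-video request body as UTF-8 JSON.

    The "model" and "prompt" field names are assumed, not documented here.
    """
    if not prompt or not prompt.strip():
        raise ValueError("prompt must be a non-empty string")
    return json.dumps({"model": model, "prompt": prompt.strip()}).encode("utf-8")
```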
Image Editing
- Supported Models: image-edits for modifying existing images with text instructions.
- Provider Integration: Image editing capabilities available through Provider 3 with multipart form data support.
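Since image editing uses multipart form data, the sketch below shows how such a body can be encoded with only the standard library. The helper name and the form field names ("model", "prompt", "image") are hypothetical; only the image-edits model name and the use of multipart form data come from these notes.

```python
import uuid
from typing import Dict, Tuple


def encode_multipart(fields: Dict[str, str],
                     files: Dict[str, Tuple[str, bytes, str]]) -> Tuple[bytes, str]:
    """Encode text fields and files as a multipart/form-data body.

    `files` maps a field name to (filename, content bytes, MIME type).
    Returns (body, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(f"--{boundary}\r\n".encode())
        parts.append(
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        parts.append(f"{value}\r\n".encode())
    for name, (filename, content, mime) in files.items():
        parts.append(f"--{boundary}\r\n".encode())
        parts.append(
            f'Content-Disposition: form-data; name="{name}"; '
            f'filename="{filename}"\r\n'.encode())
        parts.append(f"Content-Type: {mime}\r\n\r\n".encode())
        parts.append(content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())  # closing boundary
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

The returned content type string would go in the Content-Type header of a POST to /v1/images/edits, with the body sent as the request data.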
Platform Enhancements
- Multimodal Usage Tracking: Enhanced usage tracking system to separately track text tokens, audio tokens, and their respective costs.
- Nested Pricing Structure: Implemented nested pricing objects for models supporting multiple modalities.
- Enhanced Validation: Added comprehensive input validation for all new multimodal endpoints.
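To make the nested pricing and per-modality usage tracking concrete, here is a minimal sketch of how a cost breakdown could be computed. The model name, the rates, the per-million-token unit, and the shape of the pricing and usage objects are all invented for illustration; the actual schema is not described in these notes.

```python
# Hypothetical nested pricing: USD per 1M tokens, keyed by modality and direction.
PRICING = {
    "example-audio-model": {
        "text": {"input": 2.50, "output": 10.00},
        "audio": {"input": 40.00, "output": 80.00},
    }
}


def compute_cost(model: str, usage: dict) -> dict:
    """Return the cost per modality plus a total.

    `usage` maps a modality to {"input": tokens, "output": tokens}.
    """
    rates = PRICING[model]
    breakdown = {}
    for modality, counts in usage.items():
        if modality not in rates:
            raise ValueError(f"{model} has no pricing for modality {modality!r}")
        breakdown[modality] = sum(
            counts.get(direction, 0) * rates[modality][direction]
            for direction in ("input", "output")
        ) / 1_000_000
    breakdown["total"] = sum(breakdown.values())
    return breakdown
```

Keeping text and audio costs as separate entries mirrors the separate tracking described above, so either can be reported or audited on its own.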
API Improvements
- Base64 Image Support: Enhanced image URL handling to support base64 data URIs in addition to HTTP/HTTPS links.
- Streaming Restrictions: Explicitly blocked streaming for audio-capable models to prevent unsupported API requests.
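Both improvements can be sketched in a few lines. The first two helpers turn raw image bytes into a base64 data URI and check whether an image URL uses one of the accepted forms. The third rejects streaming requests for audio-capable models. The function names, the model set, and the error type are hypothetical; the server-side checks may differ.

```python
import base64

# Hypothetical set; the real list of audio-capable models is not given here.
AUDIO_CAPABLE_MODELS = {"example-audio-model"}


def to_data_uri(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URI."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def is_supported_image_url(url: str) -> bool:
    """Accept HTTP/HTTPS links and base64 data URIs."""
    lowered = url.lower()
    if lowered.startswith(("http://", "https://")):
        return True
    return lowered.startswith("data:") and ";base64," in lowered


def check_streaming(model: str, stream: bool) -> None:
    """Raise if streaming is requested for an audio-capable model."""
    if stream and model in AUDIO_CAPABLE_MODELS:
        raise ValueError(f"streaming is not supported for model {model!r}")
```

A client can run the same checks before sending a request to fail early instead of waiting for a server-side rejection.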