Overview
The Weights API provides voice synthesis through RVC (Retrieval-based Voice Conversion) technology. You can train custom voice models on your own audio samples and create voice covers that sound like specific singers or speakers.
Prerequisites
- A Weights API account with an API key
- Audio files for training (1-50 files, 10+ seconds each)
- Audio files for covers (uploaded to web-accessible URLs)
Step 1: Set Up Your Environment
First, install the Weights SDK and set up your authentication.
Step 2: Prepare Your Training Audio
Before training an RVC model, you need to prepare and upload your training audio files.
Audio Requirements
- Quantity: 1-50 audio files
- Duration: Each file should be 10+ seconds long
- Quality: Clear, high-quality audio with minimal background noise
- Format: Common audio formats (MP3, WAV, FLAC)
- Content: Consistent voice/speaker across all files
- Accessibility: Must be accessible via HTTP/HTTPS URLs
Example Audio Preparation
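The requirements above can be checked before any upload or training request is made. The following is a minimal sketch, assuming each training file is described by a URL, a name, and a duration in seconds. `TrainingAudio`, its field names, and `validate_training_audio` are hypothetical helpers written for illustration; they are not part of the Weights SDK.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Formats named in the requirements. Other formats may also be accepted,
# so this check is deliberately conservative.
ALLOWED_EXTENSIONS = (".mp3", ".wav", ".flac")
MIN_SECONDS = 10
MAX_FILES = 50


@dataclass
class TrainingAudio:
    """Hypothetical description of one training file."""
    url: str
    name: str
    seconds: float


def validate_training_audio(files):
    """Return a list of problems; an empty list means the set passes."""
    problems = []
    if not 1 <= len(files) <= MAX_FILES:
        problems.append(f"expected 1-{MAX_FILES} files, got {len(files)}")
    for f in files:
        parsed = urlparse(f.url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"{f.name}: URL must use HTTP or HTTPS")
        if not parsed.path.lower().endswith(ALLOWED_EXTENSIONS):
            problems.append(f"{f.name}: expected an MP3, WAV, or FLAC file")
        if f.seconds < MIN_SECONDS:
            problems.append(f"{f.name}: {f.seconds:g}s is shorter than {MIN_SECONDS}s")
    return problems


files = [
    TrainingAudio("https://example.com/audio/take1.wav", "take1.wav", 42.0),
    TrainingAudio("https://example.com/audio/take2.flac", "take2.flac", 8.5),
]
for problem in validate_training_audio(files):
    print(problem)
```

The checks cover only what can be verified from metadata. Recording quality and speaker consistency still need to be judged by listening.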
Step 3: Create Your RVC Model
Start the training process by creating an RVC model.
Request Parameters
- name (required): Name for your RVC model
- audioFiles (required): Array of audio file objects with URLs, names, and lengths
- description (optional): Description of the model
- runKaraoke (optional): Extract vocals from music (default: false)
- runDeEchoDeReverb (optional): Remove echo and reverb (default: false)
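The parameters above can be assembled into a request body along these lines. This is a sketch only: `build_rvc_model_request` is a hypothetical helper, and the keys inside each `audioFiles` entry (`url`, `name`, `lengthSeconds`) are assumptions, since the list above says only that each object carries a URL, a name, and a length. How the body is sent (SDK call or HTTP endpoint) is left out because it is not specified here.

```python
import json


def build_rvc_model_request(name, audio_files, description=None,
                            run_karaoke=False, run_de_echo_de_reverb=False):
    """Build a JSON-serializable body for creating an RVC model.

    audio_files is a list of (url, file_name, seconds) tuples.
    """
    if not name:
        raise ValueError("name is required")
    if not audio_files:
        raise ValueError("audioFiles is required and must not be empty")
    body = {
        "name": name,
        # Key names inside each entry are assumed, not confirmed.
        "audioFiles": [
            {"url": url, "name": file_name, "lengthSeconds": seconds}
            for url, file_name, seconds in audio_files
        ],
        "runKaraoke": run_karaoke,
        "runDeEchoDeReverb": run_de_echo_de_reverb,
    }
    # Optional field: include it only when the caller supplied one.
    if description is not None:
        body["description"] = description
    return body


body = build_rvc_model_request(
    "my-voice",
    [("https://example.com/audio/take1.wav", "take1.wav", 42.0)],
    description="Studio takes, dry vocal",
    run_de_echo_de_reverb=True,
)
print(json.dumps(body, indent=2))
```

Both processing flags default to false, matching the documented defaults, so they only need to be set when the training audio contains backing music or room echo.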
Step 4: Monitor Training Progress
RVC training is asynchronous and can take several hours. Poll the status to track progress.
Training Statuses
- QUEUED: Model is waiting in the training queue
- PENDING_WORKER: Model is assigned to a training worker
- PROCESSING: Model is being trained
- SUCCEEDED: Training completed successfully
- ERRORED: Training failed
- CANCELED: Training was canceled
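The statuses above fall into two groups: QUEUED, PENDING_WORKER, and PROCESSING mean the job is still in flight, while SUCCEEDED, ERRORED, and CANCELED are final. A polling loop only needs that distinction. In the sketch below, `fetch_status` is a placeholder for whatever call returns the current status string; it is not a real SDK function, and the default interval and timeout are illustrative.

```python
import time

IN_PROGRESS = {"QUEUED", "PENDING_WORKER", "PROCESSING"}
TERMINAL = {"SUCCEEDED", "ERRORED", "CANCELED"}


def wait_for_job(fetch_status, interval_s=60.0, timeout_s=8 * 3600, sleep=time.sleep):
    """Poll fetch_status() until it returns a terminal status.

    Elapsed time is tracked by summing the sleep intervals, which keeps the
    loop testable with a fake sleep function. Raises TimeoutError once
    timeout_s has been waited without reaching a terminal status.
    """
    waited = 0.0
    while True:
        status = fetch_status()
        if status in TERMINAL:
            return status
        if status not in IN_PROGRESS:
            raise ValueError(f"unexpected status: {status!r}")
        if waited >= timeout_s:
            raise TimeoutError(f"still {status} after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s


# Simulated status sequence standing in for real API responses.
responses = iter(["QUEUED", "PENDING_WORKER", "PROCESSING", "SUCCEEDED"])
final = wait_for_job(lambda: next(responses), interval_s=1, sleep=lambda s: None)
print(final)  # SUCCEEDED
```

A caller would then branch on the returned value: continue on SUCCEEDED, and report or retry on ERRORED or CANCELED.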
Step 5: Retrieve Your Trained Model
Once training is complete, you can retrieve your model details.
Step 6: Create Voice Covers
Once you have a trained model (or have chosen a public one), you can create voice covers.
Cover Parameters
- rvcModelId (required): ID of the RVC model to use
- inputUrl (required): URL of the audio file to convert
- inputFileName (optional): Name for the input file
- pitch (optional): Pitch shift in semitones (default: 0)
- preStemmed (optional): Whether input is already vocal-only (default: false)
- stemOnly (optional): Return only vocals (default: false)
- deEcho (optional): Remove echo from output (default: false)
- isolateMainVocals (optional): Focus on main vocals (default: false)
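A cover request body built from the parameters above might look like the following. `build_cover_request` is a hypothetical helper; only the key names and defaults come from the parameter list, and the call that actually submits the body is not shown because it is not specified here.

```python
def build_cover_request(rvc_model_id, input_url, input_file_name=None, pitch=0,
                        pre_stemmed=False, stem_only=False, de_echo=False,
                        isolate_main_vocals=False):
    """Build a JSON-serializable body for a voice cover request."""
    if not rvc_model_id:
        raise ValueError("rvcModelId is required")
    if not input_url.startswith(("http://", "https://")):
        raise ValueError("inputUrl must be an HTTP/HTTPS URL")
    body = {
        "rvcModelId": rvc_model_id,
        "inputUrl": input_url,
        "pitch": pitch,                    # semitones; 0 leaves pitch unchanged
        "preStemmed": pre_stemmed,         # True if the input is already vocal-only
        "stemOnly": stem_only,
        "deEcho": de_echo,
        "isolateMainVocals": isolate_main_vocals,
    }
    # Optional field: include it only when the caller supplied one.
    if input_file_name is not None:
        body["inputFileName"] = input_file_name
    return body


# A full-mix song, shifted up two semitones, with the lead vocal isolated.
# The model ID is a made-up placeholder.
cover = build_cover_request(
    "model_123",
    "https://example.com/songs/demo.mp3",
    input_file_name="demo.mp3",
    pitch=2,
    isolate_main_vocals=True,
)
print(cover)
```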
Step 7: Monitor Cover Generation
Cover generation is asynchronous. Poll the status to track progress.
Step 8: List and Manage Your Models
View all your RVC models and training jobs.
Advanced Features
Search Public Models
You can search and use public RVC models created by other users.
Upload Existing Models
You can also upload existing RVC models.
Download Trained Models
Advanced users can download their trained RVC model files.
Best Practices
Audio Selection for Training
- Quality: Use clear, high-quality audio recordings
- Consistency: All files should feature the same voice/speaker
- Variety: Include different pitches, emotions, and speaking styles
- Length: Each file should be 10+ seconds for best results
- Clean Audio: Minimize background noise and music
Cover Creation Tips
- Input Quality: Use high-quality source audio for covers
- Pitch Adjustment: Experiment with pitch shifts for different effects
- Vocal Isolation: Use isolateMainVocals for better results with music
- Echo Removal: Enable deEcho for cleaner output
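When experimenting with pitch, it helps to know what a semitone shift does to frequency. In twelve-tone equal temperament, a shift of n semitones multiplies frequency by 2^(n/12), so +12 is one octave up and -12 is one octave down. The helper below is illustrative and independent of the API; it assumes the pitch parameter follows this standard semitone convention.

```python
def semitone_ratio(semitones):
    """Frequency multiplier for a pitch shift in equal-tempered semitones."""
    return 2.0 ** (semitones / 12.0)


# Print the multiplier for a few common shifts.
for shift in (-12, -5, 0, 4, 12):
    print(f"{shift:+3d} semitones -> x{semitone_ratio(shift):.3f}")
```

As a rough guide, moving a cover between a typical male and female vocal range often means trying shifts of several semitones up to a full octave in either direction, then adjusting by ear.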
Model Management
Complete Example
Here’s a complete example that trains an RVC model and creates a cover.
Use Cases
Voice Covers
- Music Covers: Create covers of popular songs in different voices
- Character Voices: Apply character voices to existing content
- Language Learning: Practice pronunciation with native speaker voices
Voice Synthesis
- Content Creation: Generate voiceovers for videos and podcasts
- Accessibility: Create audio versions of text content
- Personalization: Customize voice assistants and applications
Research and Development
- Voice Cloning: Research applications for voice synthesis
- Audio Processing: Experiment with different audio processing techniques
- Model Training: Develop and test custom voice models
Performance Considerations
- Training Time: RVC training typically takes 2-8 hours
- Cover Generation: Cover creation takes 5-15 minutes
- Queue Position: Jobs are processed in the order they enter the queue
- Audio Quality: Higher quality input produces better results
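The typical durations above suggest different polling settings for training and cover jobs. The sketch below derives an interval and a timeout from those ranges. The specific choices (about 60 polls across the upper bound, never more often than every 15 seconds, and giving up after twice the upper bound) are illustrative assumptions, not documented limits.

```python
# Typical durations in seconds, taken from the ranges listed above.
TYPICAL_SECONDS = {
    "training": (2 * 3600, 8 * 3600),  # 2-8 hours
    "cover": (5 * 60, 15 * 60),        # 5-15 minutes
}


def polling_plan(job_type, polls_over_upper=60, min_interval_s=15, slack=2.0):
    """Suggest a polling interval and timeout for a job type."""
    _, upper = TYPICAL_SECONDS[job_type]
    # Spread the polls across the upper bound, but respect the minimum interval.
    interval = max(min_interval_s, upper / polls_over_upper)
    return {"interval_s": interval, "timeout_s": upper * slack}


for job_type in TYPICAL_SECONDS:
    print(job_type, polling_plan(job_type))
```

Queue wait time is not included in these ranges, so a generous timeout is safer than a tight one when the queue is busy.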
Next Steps
- Explore Song Generation to create original music
- Learn about Image Generation for visual content
- Check out Video Generation for multimedia projects