SAM Audio: AI Audio Separation
Instantly separate any sound from any audio or video with simple text, visual clicks, or time-span prompts.
SAM AUDIO CAPABILITIES
SAM Audio separates target and residual sounds from any audio or audiovisual source—across general sound, music, and speech.
Text prompts
SAM Audio lets you describe the target sound you want to separate with a text prompt.
Visual prompts
SAM Audio lets you pick out and separate sounds by clicking on the part of the video where you hear them.
Span prompts
SAM Audio is the first model to introduce span prompting: you mark the time span in the recording that contains the target audio.
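To make the idea of a span prompt concrete, here is a minimal sketch of how a time span in seconds could be mapped onto sample positions in a recording. The function name and its parameters are hypothetical and are not part of any published SAM Audio interface; the sketch only illustrates the arithmetic.

```python
from typing import Tuple


def span_to_sample_range(
    start_s: float, end_s: float, sample_rate: int, total_samples: int
) -> Tuple[int, int]:
    """Convert a time span in seconds to a half-open [start, end) sample range.

    Hypothetical helper for illustration; not a SAM Audio API.
    """
    if start_s < 0 or end_s <= start_s:
        raise ValueError("span must satisfy 0 <= start_s < end_s")
    # Seconds times samples-per-second gives a sample index; clamp to the clip length.
    start = min(int(round(start_s * sample_rate)), total_samples)
    end = min(int(round(end_s * sample_rate)), total_samples)
    return start, end


# A span from 1.0 s to 2.5 s in a 3-second clip sampled at 16 kHz.
print(span_to_sample_range(1.0, 2.5, 16000, 48000))
```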
What is SAM Audio?
SAM Audio is a next-generation AI engine that segments audio using text, visual, or time-span prompts.
SAM Audio is Meta's multimodal foundation model (2025) for segmenting any describable sound. It takes text, visual clicks, or time-span anchors and outputs both the target sound and the residual background.
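The three prompt types described above can be pictured as three small data records. The sketch below is an assumption made for illustration: the class names, fields, and the `describe` helper are hypothetical and do not reflect the model's actual input format.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class TextPrompt:
    description: str  # e.g. "dog barking"


@dataclass(frozen=True)
class VisualPrompt:
    frame_index: int  # video frame that was clicked
    x: int            # click position in pixels
    y: int


@dataclass(frozen=True)
class SpanPrompt:
    start_s: float  # start of the time span, in seconds
    end_s: float    # end of the time span, in seconds


# Hypothetical union type covering the three prompt kinds.
Prompt = Union[TextPrompt, VisualPrompt, SpanPrompt]


def describe(prompt: Prompt) -> str:
    """Return a short human-readable summary of a prompt."""
    if isinstance(prompt, TextPrompt):
        return f"text: {prompt.description}"
    if isinstance(prompt, VisualPrompt):
        return f"visual: frame {prompt.frame_index} at ({prompt.x}, {prompt.y})"
    return f"span: {prompt.start_s}s to {prompt.end_s}s"


print(describe(TextPrompt("dog barking")))
```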
Who is SAM Audio for?
Empowering Creators Across Industries
Whether you are a musician, editor, or content creator, this AI tool streamlines your workflow.
For Musicians
Isolate vocals for remixes, extract drum stems for practice, or separate instruments to study arrangements.
For Video Editors
Clean up dialogue by removing background noise, or extract the specific sound effects that matter to your narrative.
For Content Creators
Create karaoke tracks, remix popular memes, or repurpose existing audio content effortlessly.
Capabilities
What Makes These Features Revolutionary
First unified model for open-domain audio separation with multimodal prompts
Universal Audio Separation
Unlike traditional tools limited to specific categories (vocals, drums, bass), SAM Audio can separate ANY describable sound - from dog barks to saxophone solos to crowd chatter.
Audio-Visual Grounding (PE-AV)
Powered by the PE-AV encoder, trained on 100M+ videos, it understands the connection between what you see and what you hear - solving the cocktail party problem with visual cues.
Target + Residual Output
Outputs both the isolated target sound AND the remaining background audio. The two combine to perfectly reconstruct the original - no audio information is lost.
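One simple way to picture this is an additive model, where the residual is whatever remains after subtracting the target from the mixture sample by sample. That definition is an assumption made here for illustration (the text above does not say how the two outputs are computed), and the function names are hypothetical.

```python
from typing import List


def residual_from(mixture: List[float], target: List[float]) -> List[float]:
    """Residual under an assumed additive model: mixture minus target."""
    if len(mixture) != len(target):
        raise ValueError("mixture and target must have the same length")
    return [m - t for m, t in zip(mixture, target)]


def recombine(target: List[float], residual: List[float]) -> List[float]:
    """Sum target and residual sample by sample."""
    if len(target) != len(residual):
        raise ValueError("target and residual must have the same length")
    return [t + r for t, r in zip(target, residual)]


# Toy samples chosen so the floating-point arithmetic is exact.
mixture = [0.5, -0.25, 0.75, 0.0]
target = [0.25, -0.25, 0.5, 0.125]
residual = residual_from(mixture, target)

# Under this additive assumption, the two outputs sum back to the input.
print(recombine(target, residual) == mixture)
```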
State-of-the-Art Performance
Achieves the best results on SAM Audio-Bench, which covers general sounds, speech, music, and instruments. Outperforms specialized models in their own domains.
How it works
How to Separate Audio Online with AI
Follow these simple steps to isolate vocals, remove noise, or extract instruments using our next-generation AI.
Step 1: Upload Your File
Simply upload your audio or video file in any of the supported formats like MP3, WAV, FLAC, or MP4. As a browser-based tool, there is no installation required and no need for complex pre-processing.
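If you prepare files in a script before uploading, a quick extension check can catch unsupported files early. This is a minimal sketch: it includes only the four formats named above, which may not be the complete list, and the helper name is hypothetical.

```python
from pathlib import Path

# Only the formats named in the text; the actual supported list may be longer.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".flac", ".mp4"}


def is_supported(filename: str) -> bool:
    """Check the file extension (case-insensitive) against the known formats."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


for name in ["song.MP3", "clip.mp4", "notes.txt"]:
    print(name, is_supported(name))
```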
Step 2: Choose Your Prompt
Select your method: type a Text Prompt like "Drums" to isolate percussion, or use Visual Prompting to click objects in a video. The model intelligently identifies and separates the target sound.
Step 3: AI Processing
Experience lightning-fast cloud rendering. The AI separates layers in seconds without consuming your local computer resources, ensuring a smooth experience on any device.
Step 4: Download Stems
Get a high-quality export of your isolated stems. Download the vocal, instrument, or residual tracks in WAV format, ready to import directly into your DAW for professional production.
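For readers curious about what a WAV stem contains, the sketch below writes a list of samples as a mono 16-bit PCM WAV using Python's standard `wave` module. The function name, the mono 16-bit layout, and the 16 kHz rate are assumptions chosen for illustration; the text above does not specify the bit depth or sample rate of the exported stems.

```python
import io
import struct
import wave
from typing import List


def write_wav_mono16(dest, samples: List[float], sample_rate: int) -> None:
    """Write float samples in [-1.0, 1.0] as mono 16-bit PCM.

    `dest` may be a file path or a writable, seekable binary file object.
    """
    frames = bytearray()
    for s in samples:
        clipped = max(-1.0, min(1.0, s))  # avoid integer overflow on loud samples
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(dest, "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wf.setframerate(sample_rate)
        wf.writeframes(bytes(frames))


# Write four samples to an in-memory buffer, then read the header back.
buffer = io.BytesIO()
write_wav_mono16(buffer, [0.0, 0.5, -0.5, 1.0], 16000)
buffer.seek(0)
with wave.open(buffer, "rb") as wf:
    info = (wf.getnchannels(), wf.getsampwidth(), wf.getframerate(), wf.getnframes())
print(info)
```

Passing a file path such as "vocals.wav" instead of the buffer would write the same data to disk.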
FAQ
Frequently Asked Questions about SAM Audio
Experience the Future of Audio AI
SAM Audio represents a paradigm shift from signal-based filtering to semantic-based generation. Try Meta's revolutionary foundation model that separates any sound using natural prompts.
