Audio Mode — Turn-Based Translation with Voice Playback
March 2026
Live Translate Live now has a second way to translate conversations: audio mode. Instead of the continuous scrolling marquee, audio mode is a turn-based translation system — you speak, see your words transcribed on screen, tap a button, and hear the translation spoken aloud by an AI voice. Then hand the phone to the other person for their turn.
How It Works
Audio mode follows a simple cycle:
- Speak — press and hold the push-to-talk button, or use always-listening mode, and talk naturally in your language
- Review — your speech appears as text on screen with dynamic font sizing that adjusts to fit the display
- Edit if needed — tap the transcript to fix any words the speech recognition got wrong
- Translate — tap the "Translate into [Language]" button to send the text for translation
- Listen — the translated text is read aloud by an AI voice so the other person hears the translation spoken in their language
- Replay or clear — replay the audio if needed, or clear the screen and start a new turn
Each turn is self-contained. You control when the microphone listens, when the translation happens, and when to move on. There is no background processing between turns.
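For readers who think in code, the cycle above can be pictured as a small state machine. This is only an illustrative sketch, not the app's actual implementation: the state names, the action names, and the `step` and `run` helpers are all hypothetical labels chosen for this example.

```python
from enum import Enum, auto


class Turn(Enum):
    IDLE = auto()        # screen is clear, waiting for a speaker
    LISTENING = auto()   # microphone is capturing speech
    REVIEWING = auto()   # transcript is on screen and can be edited
    PLAYING = auto()     # translation is being read aloud
    DONE = auto()        # playback finished; replay or clear


# Allowed (state, action) pairs and the state each one leads to.
# All action names are hypothetical labels for the steps in the list above.
TRANSITIONS = {
    (Turn.IDLE, "start_speaking"): Turn.LISTENING,
    (Turn.LISTENING, "stop_speaking"): Turn.REVIEWING,
    (Turn.REVIEWING, "edit"): Turn.REVIEWING,
    (Turn.REVIEWING, "translate"): Turn.PLAYING,
    (Turn.PLAYING, "playback_finished"): Turn.DONE,
    (Turn.DONE, "replay"): Turn.PLAYING,
    (Turn.DONE, "clear"): Turn.IDLE,
}


def step(state: Turn, action: str) -> Turn:
    """Return the next state, or raise if the action is not valid here."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"{action!r} is not allowed in state {state.name}") from None


def run(actions: list[str]) -> Turn:
    """Apply a sequence of actions, starting from an empty screen."""
    state = Turn.IDLE
    for action in actions:
        state = step(state, action)
    return state
```

In this model nothing moves forward without an explicit action, which mirrors the point that each turn is self-contained: translation cannot start from the idle screen, and a finished turn stays finished until someone taps replay or clear.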
Hold Your Phone and Talk
The scrolling marquee mode works best when a device is laid flat on a table between two speakers — both people read their side of the screen simultaneously. That is great for sit-down conversations, but it does not work as well when you are standing, walking, or moving around.
Audio mode is designed for handheld use. Hold your phone normally, speak into it, and tap translate. The other person hears the translation spoken aloud — no need to read a screen. Hand the phone over for their turn. This makes audio mode practical in situations where laying a device on a table is not an option: standing in a market, walking through a hospital, or talking at a service counter.
Save Credits in Noisy Environments
In the live marquee mode, the speech recognition engine runs continuously while your session is active. Background noise in a busy restaurant, street, or airport is processed the same as real speech — and credits are consumed the entire time, whether anyone is speaking or not.
Audio mode works differently. Speech recognition only runs while you are actively speaking, and translation only happens when you tap the button. In a noisy restaurant where the marquee mode might burn through credits for an entire hour-long dinner, audio mode only uses credits for the sentences you actually translate. If you exchange 30 short phrases over dinner instead of running continuous recognition for 60 minutes, the difference in cost can be significant.
Audio mode uses per-character billing for translation and text-to-speech rather than time-based billing. You pay for the text you translate and the audio that gets generated — nothing more.
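To make the two billing models concrete, here is a rough sketch of the arithmetic. The function names and every rate used with them are placeholders invented for this example, not actual Live Translate Live pricing; check the pricing page for real numbers. The sketch also assumes translation is charged on the characters you speak and text-to-speech on the characters of the translated result.

```python
def marquee_credits(minutes: float, credits_per_minute: float) -> float:
    """Time-based billing: cost grows with session length, speech or not."""
    return minutes * credits_per_minute


def audio_credits(
    phrases: list[tuple[str, str]],
    translate_rate: float,
    tts_rate: float,
) -> float:
    """Per-character billing for a list of (source_text, translated_text) pairs.

    translate_rate and tts_rate are hypothetical credits per character.
    Assumes translation is billed on the source text and text-to-speech
    on the translated text.
    """
    total = 0.0
    for source, translated in phrases:
        total += len(source) * translate_rate   # translation charge
        total += len(translated) * tts_rate     # voice playback charge
    return total


if __name__ == "__main__":
    # Hypothetical dinner: 30 short phrases versus 60 minutes of marquee mode.
    dinner = [("Where is the station?", "¿Dónde está la estación?")] * 30
    print("marquee:", marquee_credits(60, credits_per_minute=1.0))
    print("audio:  ", audio_credits(dinner, translate_rate=0.01, tts_rate=0.01))
```

The point of the comparison is the shape of the two formulas rather than the numbers: the marquee cost depends only on elapsed minutes, while the audio mode cost depends only on how much text you choose to translate.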
Audio Playback — No Screen Reading Required
The translated text is read aloud by an AI-generated voice. The other person does not need to look at the screen at all — they just listen. This makes audio mode useful in situations where reading a screen is impractical:
- Visually impaired users — translations are spoken, not just displayed
- While driving — a passenger can use audio mode without the driver taking their eyes off the road
- Hands-busy situations — cooking, carrying luggage, working with tools, or holding a child
- Low-light conditions — no need to squint at a screen in a dark restaurant or theater
After the audio plays, you can tap replay to hear it again. The translated text also appears on screen as a fallback if the other person prefers to read.
Inline Editing
Speech recognition is accurate but not perfect. Proper nouns, technical terms, and accented speech can occasionally produce errors. In the scrolling marquee mode, those errors get translated immediately because the process is continuous — there is no chance to correct them before translation.
Audio mode gives you a review step. After you speak, your transcript appears on screen and you can tap to edit it. Fix a misspelled name, correct a number, or rephrase a sentence before it gets translated. This means the translation is based on exactly what you intended to say, not on what the speech recognition guessed.
Push-to-Talk and Always-Listening Modes
Audio mode supports two input methods:
- Push-to-talk — hold the microphone button while speaking, release when done. Best for noisy environments where you want precise control over when the microphone is active.
- Always-listening — the microphone stays on and captures speech continuously until you stop it. More convenient in quiet settings where you do not want to hold a button.
Both modes feed into the same review-edit-translate cycle. The transcript builds on screen as you speak, and you translate when ready.
When to Use Audio Mode vs. the Scrolling Marquee
Both modes are available in the same app. Here is a quick comparison:
- Scrolling marquee: best for sit-down conversations where both people can see a shared screen and want a continuous flow with no pauses. Billing is time-based.
- Audio mode: best for handheld use, noisy environments, and situations where audio playback is more practical than reading. Billing is per character.
You can switch between modes at any time without losing your session or conversation history.
Supported Languages for Voice Playback
Audio mode's AI voice playback supports 32 languages: English, Japanese, Chinese, German, Hindi, French, Korean, Portuguese, Italian, Spanish, Russian, Indonesian, Dutch, Turkish, Filipino, Polish, Swedish, Bulgarian, Romanian, Arabic, Czech, Greek, Finnish, Croatian, Malay, Slovak, Danish, Tamil, Ukrainian, Vietnamese, Norwegian, and Hungarian.
Languages outside this list are still available in the scrolling marquee mode, which displays translations as text without voice playback.
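The fallback rule above can be sketched as a simple lookup. The language list is copied from this page; the function names are hypothetical, and the sketch assumes that both languages in a conversation need voice playback for audio mode to work in both directions, since the phone is handed back and forth.

```python
# The 32 languages listed above as supporting AI voice playback.
VOICE_LANGUAGES = frozenset(
    name.casefold()
    for name in [
        "English", "Japanese", "Chinese", "German", "Hindi", "French",
        "Korean", "Portuguese", "Italian", "Spanish", "Russian",
        "Indonesian", "Dutch", "Turkish", "Filipino", "Polish", "Swedish",
        "Bulgarian", "Romanian", "Arabic", "Czech", "Greek", "Finnish",
        "Croatian", "Malay", "Slovak", "Danish", "Tamil", "Ukrainian",
        "Vietnamese", "Norwegian", "Hungarian",
    ]
)


def supports_voice_playback(language: str) -> bool:
    """True if the language appears in the voice playback list."""
    return language.strip().casefold() in VOICE_LANGUAGES


def suggest_mode(language_a: str, language_b: str) -> str:
    """Suggest 'audio' only when both languages can be spoken aloud.

    Assumption for this sketch: a two-way conversation needs playback in
    both languages; otherwise fall back to the text-only marquee.
    """
    if supports_voice_playback(language_a) and supports_voice_playback(language_b):
        return "audio"
    return "marquee"
```

For example, `suggest_mode("English", "Japanese")` returns `"audio"`, while a pair that includes a language not on the list, such as Thai, returns `"marquee"`.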
Try Audio Mode
Sign in to Live Translate Live, select your languages, and switch to audio mode. Speak, review, translate, and listen — all from your phone in your hand. No flat table required.