Voice Messages, Session Transfer & OpenACP's Hidden Power Features
Most people discover OpenACP for the basics: send a text message in Telegram, get code back from Claude. But beneath the surface lies a set of power features that fundamentally change how you interact with AI coding agents. Voice input lets you describe problems naturally while walking. Session handoff lets you start a task on your phone and pick it up in your terminal. Context resume preserves your agent's memory across restarts. This guide explores the features that make OpenACP more than a simple chat bridge.
Voice Input: Talk to Your AI Agent
Sometimes typing is not the best way to communicate with an AI coding agent. You might be walking to lunch and realize you need to fix that bug before the afternoon standup. You might be reviewing code on a tablet and find it easier to explain what you want verbally. Or you might simply think more clearly when speaking aloud than when typing.
OpenACP supports voice input through Groq's speech-to-text (STT) API, which provides fast, high-accuracy transcription of voice messages. When you send a voice message in Telegram (or any supported platform), OpenACP transcribes it into text and forwards the transcription to your AI agent, just as if you had typed it out.
Setting Up Voice Input
To enable voice input, you need a Groq API key. Groq offers a generous free tier that handles casual voice usage easily:
- Sign up at console.groq.com
- Create an API key in the dashboard
- Add it to your OpenACP configuration
```jsonc
// ~/.openacp/config.json
{
  "voice": {
    "stt": {
      "provider": "groq",
      "apiKey": "gsk_your_groq_api_key_here"
    },
    "mode": "next"
  }
}
```
Or via environment variable:
```bash
export OPENACP_GROQ_API_KEY="gsk_your_groq_api_key_here"
```
Once configured, simply send a voice message in your chat. OpenACP handles the rest: it receives the audio file from the platform, sends it to Groq for transcription, and forwards the resulting text to your AI agent. The transcription typically completes in under a second, even for messages up to several minutes long.
How Voice Transcription Works Under the Hood
When OpenACP receives a voice message from a platform adapter, the following pipeline executes:
```typescript
// Simplified voice processing pipeline
async function processVoiceMessage(audioBuffer: Buffer, format: string) {
  // 1. Convert audio to a format accepted by the STT provider
  const normalizedAudio = await normalizeAudio(audioBuffer, format);

  // 2. Send to Groq's Whisper API for transcription
  const transcription = await groqClient.audio.transcriptions.create({
    file: normalizedAudio,
    model: "whisper-large-v3",
    response_format: "json", // "json" returns an object with a .text field
  });

  // 3. Return transcribed text to be handled as a normal message
  return transcription.text;
}
```
The transcription uses Groq's Whisper Large V3 model, which supports over 50 languages and handles accents, technical jargon, and code-related terminology surprisingly well. You can say things like "add a useEffect hook that fetches from the slash API slash users endpoint" and it will accurately transcribe the technical terms.
Text-to-Speech: Hear Your Agent's Response
Voice is not just for input. OpenACP also supports text-to-speech (TTS) through Microsoft's Edge TTS service, which converts the agent's text responses into natural-sounding audio messages sent back to your chat.
This creates a fully conversational coding experience. You send a voice message asking about a bug, and the agent responds with a voice message explaining the issue and what it did to fix it. It feels remarkably natural -- like pair programming with a colleague over a voice call, except the colleague has perfect recall of your entire codebase.
```jsonc
// TTS configuration
{
  "voice": {
    "stt": {
      "provider": "groq",
      "apiKey": "gsk_..."
    },
    "tts": {
      "provider": "edge",
      "voice": "en-US-AriaNeural"
    },
    "mode": "always"
  }
}
```
Edge TTS is free to use and requires no API key. It provides high-quality neural voices in multiple languages. The default voice is `en-US-AriaNeural`, but you can configure any of the dozens of available voices for different languages and styles.
Available Voice Options
Some popular Edge TTS voices:
- `en-US-AriaNeural` -- Clear, professional female voice (default)
- `en-US-GuyNeural` -- Natural male voice
- `en-GB-SoniaNeural` -- British English female
- `en-AU-NatashaNeural` -- Australian English female
- `de-DE-KatjaNeural` -- German female
- `ja-JP-NanamiNeural` -- Japanese female
- `zh-CN-XiaoxiaoNeural` -- Chinese Mandarin female
Voice Modes: Control When Voice Is Active
OpenACP provides three voice modes to control when TTS responses are sent as audio versus text:
Mode: off
Voice input is still processed (your voice messages are transcribed), but the agent always responds with text. This is the default mode if TTS is not configured.
```json
{ "voice": { "mode": "off" } }
```
Mode: next
The agent responds with audio only for the next message after you send a voice message. If you send a text message, the response comes back as text. This mode is contextually aware -- it matches the modality of your input.
```json
{ "voice": { "mode": "next" } }
```
This is the recommended mode for most users. It gives you voice responses when you want them (by sending a voice message) without flooding your chat with audio when you are typing normally.
Mode: always
Every response from the agent is sent as both text and audio. This mode is useful for accessibility or when you are listening to responses while doing something else (like driving, cooking, or exercising).
```json
{ "voice": { "mode": "always" } }
```
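The three modes boil down to a small decision made for each outgoing response. Here is a minimal TypeScript sketch of that logic; the function and type names are illustrative, not OpenACP's actual internals:

```typescript
type VoiceMode = "off" | "next" | "always";

// Decide whether a given agent response should be delivered as audio,
// based on the configured mode and the modality of the user's last input.
function shouldSendAudio(mode: VoiceMode, lastInputWasVoice: boolean): boolean {
  switch (mode) {
    case "always":
      return true; // every response gets an audio rendering
    case "next":
      return lastInputWasVoice; // match the modality of the user's input
    case "off":
    default:
      return false; // voice input is still transcribed, but replies are text
  }
}
```

In `next` mode the check resets with each message, which is what makes it feel contextually aware: one voice message yields one voice reply.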
Session Handoff: Move Between Terminal and Chat
One of OpenACP's most innovative features is session handoff, which lets you transfer a live agent session between your terminal and your messaging app. This is enabled through the /handoff command.
Here is the scenario: you are sitting at your desk, working with Claude Code in your terminal. You have been building a feature for the last hour, and the agent has deep context about your project, the files you have been working on, and the approach you are taking. Now you need to head to a meeting, but you want to continue directing the agent from your phone.
Instead of starting a new session in Telegram (which would have no context about what you have been working on), you type /handoff in your terminal. OpenACP generates a handoff token and displays it. You paste that token in your Telegram chat, and the session -- with all its context, conversation history, and working state -- transfers to Telegram.
```bash
# In your terminal session with Claude Code:
/handoff

# OpenACP outputs:
#   Session handoff token generated.
#   Send this in your Telegram/Discord/Slack chat:
#   /resume abc123def456

# In Telegram, send:
/resume abc123def456

# The session transfers to Telegram with full context
```
How Handoff Works Technically
Session handoff works by serializing the current session state -- including the agent subprocess's context window, the conversation history, and the working directory -- into a transferable token. When the token is presented to a different adapter (Telegram, Discord, Slack), a new session is created with that serialized state, and the original session is terminated.
```typescript
// Handoff flow (simplified)
class Session {
  async generateHandoffToken(): Promise<string> {
    const state = {
      conversationHistory: this.history,
      workingDirectory: this.cwd,
      agentType: this.agentType,
      contextWindow: await this.agent.getContextSnapshot(),
      createdAt: Date.now(),
      expiresAt: Date.now() + 5 * 60 * 1000 // 5-minute expiry
    };
    const token = await encrypt(JSON.stringify(state), this.handoffSecret);
    this.handoffTokens.set(token, state);
    return token;
  }

  static async resumeFromToken(token: string, adapter: ChannelAdapter): Promise<Session> {
    // Decrypt with the shared handoff secret, then parse the serialized state
    const state = JSON.parse(await decrypt(token, handoffSecret));
    if (Date.now() > state.expiresAt) {
      throw new Error('Handoff token expired');
    }
    const session = new Session(adapter, state.agentType);
    await session.restoreState(state);
    return session;
  }
}
```
Handoff tokens expire after 5 minutes for security. This prevents old tokens from being used to hijack sessions. The token is also single-use -- once consumed, it cannot be used again.
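The expiry and single-use guarantees can be enforced with a small in-memory store. Here is a hedged sketch, assuming tokens are tracked in a `Map`; the class and method names are illustrative, not OpenACP's real implementation:

```typescript
// Illustrative single-use, expiring handoff token store.
interface HandoffState {
  expiresAt: number;
  // ...serialized session state would live here
}

class HandoffStore {
  private tokens = new Map<string, HandoffState>();

  issue(token: string, ttlMs = 5 * 60 * 1000): void {
    this.tokens.set(token, { expiresAt: Date.now() + ttlMs });
  }

  // Returns the state exactly once; a second call with the same token throws.
  consume(token: string): HandoffState {
    const state = this.tokens.get(token);
    if (!state) throw new Error("Unknown or already-used handoff token");
    this.tokens.delete(token); // single-use: remove before returning
    if (Date.now() > state.expiresAt) throw new Error("Handoff token expired");
    return state;
  }
}
```

Deleting the token before the expiry check means an expired token is also consumed, so it cannot be retried later.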
Practical Handoff Scenarios
- Desk to mobile: Start a complex refactoring at your desk, then continue directing the agent from your phone during lunch
- Terminal to team chat: Work on a feature locally, then hand off to a Discord thread so your team can see the progress
- Chat to terminal: Start a quick task from Telegram, realize it needs more intensive work, and hand off to your terminal for full IDE integration
- Between team members: One developer starts a session, then hands it off to another developer who can continue with full context
Context Resume with Entire.io
Session handoff moves sessions between interfaces in real-time, but what happens when a session ends? Normally, the agent's context -- everything it knows about your conversation, the files it has read, the decisions it has made -- is lost when the session terminates.
Context resume, powered by Entire.io, solves this by persisting the agent's context across sessions. When you start a new session, the agent can pick up where the previous one left off, with full awareness of what was discussed and decided.
```jsonc
// Context resume configuration
{
  "contextResume": {
    "provider": "entire",
    "apiKey": "your-entire-io-api-key"
  }
}
```
Here is how context resume changes the workflow:
Without context resume:
- You work with the agent for an hour on a complex feature
- The session times out or you close it
- You start a new session
- The agent has no memory of the previous session
- You spend 10 minutes re-explaining the context
With context resume:
- You work with the agent for an hour on a complex feature
- The session ends
- You start a new session
- The agent loads the context from the previous session via Entire.io
- You say "continue where we left off" and the agent picks up seamlessly
What Gets Persisted
Context resume stores a structured summary of the session, including:
- Key decisions made during the session
- Files that were read, modified, or created
- The overall goal and current progress
- Technical approaches being used
- Issues encountered and how they were resolved
- Open tasks and next steps
This is not a raw dump of the conversation -- it is a structured knowledge graph that allows the agent to efficiently reconstruct its understanding of your project without needing to re-read the entire conversation history.
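To make the idea concrete, here is a sketch of what such a structured summary might look like and how it could be turned into a resume prompt for a new session. The field names are hypothetical, not the actual Entire.io schema:

```typescript
// Hypothetical shape of a persisted session summary.
interface SessionSummary {
  goal: string;
  decisions: string[];
  filesTouched: { path: string; action: "read" | "modified" | "created" }[];
  openTasks: string[];
}

// Render the summary into a compact prompt the agent can load at startup,
// instead of replaying the full conversation history.
function buildResumePrompt(summary: SessionSummary): string {
  return [
    `Goal: ${summary.goal}`,
    `Decisions so far: ${summary.decisions.join("; ")}`,
    `Files touched: ${summary.filesTouched
      .map(f => `${f.path} (${f.action})`)
      .join(", ")}`,
    `Open tasks: ${summary.openTasks.join("; ")}`,
  ].join("\n");
}
```

The point of the structured form is exactly this compression: a few hundred tokens of summary stand in for an hour of conversation.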
Session Persistence and Recovery
Beyond context resume, OpenACP has built-in session persistence that handles common failure scenarios:
Graceful Shutdown
When OpenACP receives a shutdown signal (SIGTERM, SIGINT), it gracefully terminates all active sessions. Each session's agent subprocess receives a termination signal, has a grace period to save state, and then is cleaned up. Active users receive a notification that their session was ended due to a server shutdown.
```typescript
// Graceful shutdown handler
process.on('SIGTERM', async () => {
  logger.info('Received SIGTERM, shutting down gracefully...');
  for (const session of activeSessions.values()) {
    await session.notifyUser('Server is shutting down. Your session will be saved.');
    await session.saveContext(); // Save to Entire.io if configured
    await session.terminate();
  }
  process.exit(0);
});
```
Crash Recovery
If OpenACP crashes unexpectedly, sessions cannot be recovered (the agent subprocesses are lost). However, if context resume is enabled, the context is periodically checkpointed during the session. When OpenACP restarts and a user sends a new message, the agent can load the most recent checkpoint and continue with minimal context loss.
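Periodic checkpointing can be sketched as a simple timer that persists context on an interval. The function names and interval below are illustrative assumptions, not OpenACP's actual code:

```typescript
// Start periodic context checkpointing; returns a function to stop it.
// saveContext is assumed to persist the current summary (e.g. to Entire.io).
function startCheckpointing(
  saveContext: () => Promise<void>,
  intervalMs = 60_000,
): () => void {
  const timer = setInterval(() => {
    // Fire-and-forget: a failed checkpoint should not crash the session.
    saveContext().catch(err => console.error("Checkpoint failed:", err));
  }, intervalMs);
  return () => clearInterval(timer); // stop checkpointing on session end
}
```

Because each checkpoint overwrites the last, a crash loses at most one interval's worth of context.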
Network Interruptions
If the connection to the messaging platform is temporarily lost (network issues, platform downtime), OpenACP maintains the agent session in the background. When connectivity is restored, buffered responses are delivered and the session continues normally. The session timeout is paused during connectivity loss to prevent sessions from being terminated due to network issues rather than user inactivity.
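The buffering behavior can be sketched as a small outbound queue that holds responses while the platform connection is down and flushes them in order on reconnect. The class and method names are illustrative, not OpenACP's internals:

```typescript
// Illustrative outbound buffer for platform connectivity loss.
class OutboundBuffer {
  private queue: string[] = [];
  private connected = true;

  constructor(private send: (text: string) => void) {}

  deliver(text: string): void {
    if (this.connected) this.send(text);
    else this.queue.push(text); // hold responses until connectivity returns
  }

  setConnected(up: boolean): void {
    this.connected = up;
    if (up) {
      // Flush buffered responses in order once the connection is restored.
      for (const text of this.queue.splice(0)) this.send(text);
    }
  }
}
```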
Combining Power Features: Real-World Workflows
These features become truly powerful when combined. Here are some real-world workflows that leverage multiple power features:
The Mobile-First Developer
You are on the train, reviewing a pull request on your phone. You notice a bug and send a voice message in Telegram: "Hey, there is a null pointer exception in the user service when the email field is missing. Can you add a null check and write a test for it?" The agent transcribes your voice, fixes the bug, writes the test, and responds with a voice message summarizing what it did. You approve the file writes from Telegram and the fix is committed before you arrive at the office.
The Pair Programming Handoff
Developer A starts working on a feature in their terminal with Claude Code. After an hour, they realize the CSS work would be better handled by Developer B, who is working remotely. Developer A uses /handoff to generate a token, shares it in the team's Discord channel, and Developer B resumes the session with full context about the feature, the decisions made, and the files already modified.
The Long-Running Refactoring
You are refactoring a large module over several days. Context resume keeps the agent aware of the overall refactoring plan, which files have been migrated, which ones remain, and the patterns being used. Each morning, you start a new session and the agent immediately knows where you left off yesterday. No re-explaining needed.
The Accessibility-First Workflow
A developer with a repetitive strain injury configures OpenACP with voice mode set to "always." They can dictate their coding instructions and hear back the agent's responses, dramatically reducing the amount of typing needed. The permission gate still shows visual buttons for approve/deny, providing a simple tap interface for the most critical interactions.
Configuration Reference: All Power Features
Here is a complete configuration example with all power features enabled:
```json
{
  "telegram": {
    "token": "your-bot-token",
    "allowedUserIds": [123456789]
  },
  "voice": {
    "stt": {
      "provider": "groq",
      "apiKey": "gsk_your_groq_key"
    },
    "tts": {
      "provider": "edge",
      "voice": "en-US-AriaNeural"
    },
    "mode": "next"
  },
  "contextResume": {
    "provider": "entire",
    "apiKey": "your-entire-io-key"
  },
  "sessionTimeout": 60,
  "maxConcurrentSessions": 10
}
```
Troubleshooting Power Features
Voice Not Working
- Verify your Groq API key is correct: `openacp doctor` will check connectivity
- Ensure you are sending actual voice messages, not audio files (Telegram distinguishes between these)
- Check that the audio format is supported (OGG, MP3, WAV, M4A)
Handoff Token Not Accepted
- Tokens expire after 5 minutes -- generate a new one
- Tokens are single-use -- you cannot resume the same token twice
- Ensure the destination platform adapter is running on the same OpenACP instance
Context Resume Not Loading
- Verify your Entire.io API key and connectivity
- Check that the previous session had time to checkpoint before ending
- Start your new session with a message like "continue our previous conversation" to trigger context loading
What is Next
OpenACP's power features continue to evolve. The voice pipeline is being expanded with support for additional STT providers and real-time voice streaming. Session handoff is being enhanced with cross-machine support for team environments. Context resume is getting more granular, with the ability to selectively load specific aspects of previous sessions.
These features represent OpenACP's vision of AI-assisted development: an experience that is not limited to sitting at a desk with a keyboard, but extends wherever you are and however you prefer to communicate. Whether you are typing at your desk, talking on your phone, or handing off to a teammate, the agent meets you where you are.
Try Voice and Handoff Today
Install OpenACP and experience the power features that make AI coding truly flexible.
```bash
npm install -g @openacp/cli && openacp
```