
Voice Messages, Session Transfer & OpenACP's Hidden Power Features

Most people discover OpenACP for the basics: send a text message in Telegram, get code back from Claude. But beneath the surface lies a set of power features that fundamentally change how you interact with AI coding agents. Voice input lets you describe problems naturally while walking. Session handoff lets you start a task on your phone and pick it up in your terminal. Context resume preserves your agent's memory across restarts. This guide explores the features that make OpenACP more than a simple chat bridge.

Voice Input: Talk to Your AI Agent

Sometimes typing is not the best way to communicate with an AI coding agent. You might be walking to lunch and realize you need to fix that bug before the afternoon standup. You might be reviewing code on a tablet and find it easier to explain what you want verbally. Or you might simply think more clearly when speaking aloud than when typing.

OpenACP supports voice input through Groq's speech-to-text (STT) API, which provides fast, high-accuracy transcription of voice messages. When you send a voice message in Telegram (or any supported platform), OpenACP transcribes it into text and forwards the transcription to your AI agent, just as if you had typed it out.

Setting Up Voice Input

To enable voice input, you need a Groq API key. Groq offers a generous free tier that handles casual voice usage easily:

  1. Sign up at console.groq.com
  2. Create an API key in the dashboard
  3. Add it to your OpenACP configuration
// ~/.openacp/config.json
{
  "voice": {
    "stt": {
      "provider": "groq",
      "apiKey": "gsk_your_groq_api_key_here"
    },
    "mode": "next"
  }
}

Or via environment variable:

export OPENACP_GROQ_API_KEY="gsk_your_groq_api_key_here"
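Either source works. Here is a minimal sketch of the resolution order, assuming the config-file value takes precedence over the environment variable (the helper name and config shape are illustrative, not OpenACP's actual internals):

```typescript
// Illustrative config shape for the voice/STT section
interface VoiceConfig {
  stt?: { provider: string; apiKey?: string };
}

// Resolve the Groq API key: config file first, then the
// OPENACP_GROQ_API_KEY environment variable as a fallback.
function resolveGroqKey(
  config: VoiceConfig,
  env: Record<string, string | undefined>,
): string | undefined {
  return config.stt?.apiKey ?? env["OPENACP_GROQ_API_KEY"];
}
```

If neither source provides a key, voice messages cannot be transcribed and should produce a clear configuration error rather than a silent failure.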

Once configured, simply send a voice message in your chat. OpenACP handles the rest: it receives the audio file from the platform, sends it to Groq for transcription, and forwards the resulting text to your AI agent. The transcription typically completes in under a second, even for messages up to several minutes long.

How Voice Transcription Works Under the Hood

When OpenACP receives a voice message from a platform adapter, the following pipeline executes:

// Simplified voice processing pipeline
async function processVoiceMessage(audioBuffer: Buffer, format: string) {
  // 1. Convert audio to format accepted by STT provider
  const normalizedAudio = await normalizeAudio(audioBuffer, format);

  // 2. Send to Groq Whisper API for transcription
  const transcription = await groqClient.audio.transcriptions.create({
    file: normalizedAudio,
    model: "whisper-large-v3",
    response_format: "text",
  });

  // 3. Return transcribed text to be handled as a normal message
  return transcription.text;
}

The transcription uses Groq's Whisper Large V3 model, which supports over 50 languages and handles accents, technical jargon, and code-related terminology surprisingly well. You can say things like "add a useEffect hook that fetches from the slash API slash users endpoint" and it will accurately transcribe the technical terms.

Text-to-Speech: Hear Your Agent's Response

Voice is not just for input. OpenACP also supports text-to-speech (TTS) through Microsoft's Edge TTS service, which converts the agent's text responses into natural-sounding audio messages sent back to your chat.

This creates a fully conversational coding experience. You send a voice message asking about a bug, and the agent responds with a voice message explaining the issue and what it did to fix it. It feels remarkably natural -- like pair programming with a colleague over a voice call, except the colleague has perfect recall of your entire codebase.

// TTS configuration
{
  "voice": {
    "stt": {
      "provider": "groq",
      "apiKey": "gsk_..."
    },
    "tts": {
      "provider": "edge",
      "voice": "en-US-AriaNeural"
    },
    "mode": "always"
  }
}

Edge TTS is free to use and requires no API key. It provides high-quality neural voices in multiple languages. The default voice is en-US-AriaNeural, but you can configure any of the dozens of available voices for different languages and styles.

Available Voice Options

Some popular Edge TTS voices (all standard Edge neural voices):

  en-US-AriaNeural -- US English, conversational (the default)
  en-US-GuyNeural -- US English, male
  en-GB-SoniaNeural -- British English
  de-DE-KatjaNeural -- German
  ja-JP-NanamiNeural -- Japanese

Voice Modes: Control When Voice Is Active

OpenACP provides three voice modes to control when TTS responses are sent as audio versus text:

Mode: off

Voice input is still processed (your voice messages are transcribed), but the agent always responds with text. This is the default mode if TTS is not configured.

{ "voice": { "mode": "off" } }

Mode: next

The agent responds with audio only for the next message after you send a voice message. If you send a text message, the response comes back as text. This mode is contextually aware -- it matches the modality of your input.

{ "voice": { "mode": "next" } }

This is the recommended mode for most users. It gives you voice responses when you want them (by sending a voice message) without flooding your chat with audio when you are typing normally.

Mode: always

Every response from the agent is sent as both text and audio. This mode is useful for accessibility or when you are listening to responses while doing something else (like driving, cooking, or exercising).

{ "voice": { "mode": "always" } }
Tip: In "always" mode, the text version of the response is still sent alongside the audio. This ensures you can always scroll back and read code snippets that might be hard to follow by audio alone.
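The three modes above reduce to a small decision: should this reply go out as audio? A minimal sketch of that logic (the function name is illustrative, not OpenACP's actual API):

```typescript
type VoiceMode = "off" | "next" | "always";

// Decide whether the agent's reply should be sent as audio, given
// the configured mode and whether the user's last message was voice.
function shouldReplyWithAudio(mode: VoiceMode, lastInputWasVoice: boolean): boolean {
  if (mode === "always") return true;          // audio for every reply
  if (mode === "next") return lastInputWasVoice; // match the input modality
  return false;                                 // "off": always text
}
```

In "next" mode, a voice message yields an audio reply and a typed follow-up switches the agent back to text, which is exactly the contextual behavior described above.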

Session Handoff: Move Between Terminal and Chat

One of OpenACP's most innovative features is session handoff, which lets you transfer a live agent session between your terminal and your messaging app. This is enabled through the /handoff command.

Here is the scenario: you are sitting at your desk, working with Claude Code in your terminal. You have been building a feature for the last hour, and the agent has deep context about your project, the files you have been working on, and the approach you are taking. Now you need to head to a meeting, but you want to continue directing the agent from your phone.

Instead of starting a new session in Telegram (which would have no context about what you have been working on), you type /handoff in your terminal. OpenACP generates a handoff token and displays it. You paste that token in your Telegram chat, and the session -- with all its context, conversation history, and working state -- transfers to Telegram.

# In your terminal session with Claude Code:
/handoff

# OpenACP outputs:
# Session handoff token generated.
# Send this in your Telegram/Discord/Slack chat:
# /resume abc123def456

# In Telegram, send:
/resume abc123def456

# The session transfers to Telegram with full context

How Handoff Works Technically

Session handoff works by serializing the current session state -- including the agent subprocess's context window, the conversation history, and the working directory -- into a transferable token. When the token is presented to a different adapter (Telegram, Discord, Slack), a new session is created with that serialized state, and the original session is terminated.

// Handoff flow
class Session {
  async generateHandoffToken(): Promise<string> {
    const state = {
      conversationHistory: this.history,
      workingDirectory: this.cwd,
      agentType: this.agentType,
      contextWindow: await this.agent.getContextSnapshot(),
      createdAt: Date.now(),
      expiresAt: Date.now() + 5 * 60 * 1000  // 5 minute expiry
    };

    const token = await encrypt(JSON.stringify(state), this.handoffSecret);
    this.handoffTokens.set(token, state);  // recorded so each token is single-use
    return token;
  }

  static async resumeFromToken(token: string, adapter: ChannelAdapter): Promise<Session> {
    // Decrypt with the same shared handoff secret, then parse the state back out
    const state = JSON.parse(await decrypt(token, handoffSecret));
    if (Date.now() > state.expiresAt) {
      throw new Error('Handoff token expired');
    }

    const session = new Session(adapter, state.agentType);
    await session.restoreState(state);
    return session;
  }
}

Handoff tokens expire after 5 minutes for security. This prevents old tokens from being used to hijack sessions. The token is also single-use -- once consumed, it cannot be used again.
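Single-use enforcement can be as simple as deleting the token from the store at the moment it is redeemed. A sketch of that pattern (the class and state shape are illustrative, not OpenACP's actual internals):

```typescript
// Stand-in for the serialized session state described above
interface HandoffState {
  expiresAt: number;
  payload: string;
}

// Minimal single-use, expiring token store
class HandoffTokenStore {
  private tokens = new Map<string, HandoffState>();

  issue(token: string, state: HandoffState): void {
    this.tokens.set(token, state);
  }

  // Returns the state exactly once; repeat or expired redemptions fail
  redeem(token: string, now: number = Date.now()): HandoffState {
    const state = this.tokens.get(token);
    if (!state) throw new Error("Unknown or already-used handoff token");
    this.tokens.delete(token); // single-use: consume on first redemption
    if (now > state.expiresAt) throw new Error("Handoff token expired");
    return state;
  }
}
```

Deleting before the expiry check means even an expired token is consumed on its first redemption attempt, so it can never be retried.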

Practical Handoff Scenarios

A few situations where handoff shines:

  1. Desk to meeting: start in your terminal, run /handoff, and keep directing the agent from your phone.
  2. Teammate transfer: share the token in a team channel so a colleague can resume the session with full context.
  3. Chat to terminal: begin a quick fix from your phone, then hand the session back to your terminal for heavier editing.

Context Resume with Entire.io

Session handoff moves sessions between interfaces in real-time, but what happens when a session ends? Normally, the agent's context -- everything it knows about your conversation, the files it has read, the decisions it has made -- is lost when the session terminates.

Context resume, powered by Entire.io, solves this by persisting the agent's context across sessions. When you start a new session, the agent can pick up where the previous one left off, with full awareness of what was discussed and decided.

// Context resume configuration
{
  "contextResume": {
    "provider": "entire",
    "apiKey": "your-entire-io-api-key"
  }
}

Here is how context resume changes the workflow:

Without context resume:

  1. You work with the agent for an hour on a complex feature
  2. The session times out or you close it
  3. You start a new session
  4. The agent has no memory of the previous session
  5. You spend 10 minutes re-explaining the context

With context resume:

  1. You work with the agent for an hour on a complex feature
  2. The session ends
  3. You start a new session
  4. The agent loads the context from the previous session via Entire.io
  5. You say "continue where we left off" and the agent picks up seamlessly

What Gets Persisted

Context resume stores a structured summary of the session, including:

  1. The files the agent has read and modified
  2. The key decisions made and their rationale
  3. The overall plan or approach being followed
  4. Open tasks and unresolved questions

This is not a raw dump of the conversation -- it is a structured knowledge graph that allows the agent to efficiently reconstruct its understanding of your project without needing to re-read the entire conversation history.
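One way to picture the persisted structure, and how a new session might turn it back into working context (a hypothetical shape; Entire.io's actual schema may differ):

```typescript
// Illustrative shape of a persisted session summary -- not
// Entire.io's real schema, just the kind of structure involved.
interface PersistedContext {
  summary: string;        // condensed narrative of the session
  filesTouched: string[]; // files read or modified
  decisions: string[];    // key decisions and their rationale
  openTasks: string[];    // work that remains
  savedAt: number;        // epoch milliseconds
}

// Turn the persisted summary into a resume prompt for a fresh session
function buildResumePrompt(ctx: PersistedContext): string {
  return [
    `Previous session summary: ${ctx.summary}`,
    `Files touched: ${ctx.filesTouched.join(", ")}`,
    `Decisions: ${ctx.decisions.join("; ")}`,
    `Open tasks: ${ctx.openTasks.join("; ")}`,
  ].join("\n");
}
```

Because the summary is structured rather than a transcript, the resume prompt stays small even after hours of conversation.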

Session Persistence and Recovery

Beyond context resume, OpenACP has built-in session persistence that handles common failure scenarios:

Graceful Shutdown

When OpenACP receives a shutdown signal (SIGTERM, SIGINT), it gracefully terminates all active sessions. Each session's agent subprocess receives a termination signal, has a grace period to save state, and then is cleaned up. Active users receive a notification that their session was ended due to a server shutdown.

// Graceful shutdown handler
process.on('SIGTERM', async () => {
  logger.info('Received SIGTERM, shutting down gracefully...');

  for (const session of activeSessions.values()) {
    await session.notifyUser('Server is shutting down. Your session will be saved.');
    await session.saveContext();  // Save to Entire.io if configured
    await session.terminate();
  }

  process.exit(0);
});

Crash Recovery

If OpenACP crashes unexpectedly, sessions cannot be recovered (the agent subprocesses are lost). However, if context resume is enabled, the context is periodically checkpointed during the session. When OpenACP restarts and a user sends a new message, the agent can load the most recent checkpoint and continue with minimal context loss.

Network Interruptions

If the connection to the messaging platform is temporarily lost (network issues, platform downtime), OpenACP maintains the agent session in the background. When connectivity is restored, buffered responses are delivered and the session continues normally. The session timeout is paused during connectivity loss to prevent sessions from being terminated due to network issues rather than user inactivity.
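The buffering behavior amounts to a small outbound queue that flushes in order when the connection comes back. A minimal sketch (class and method names are illustrative; the send callback stands in for the platform adapter's delivery method):

```typescript
// Buffer outbound replies while the platform connection is down,
// then flush them in order on reconnect.
class OutboundBuffer {
  private queue: string[] = [];
  private online = true;

  constructor(private send: (msg: string) => void) {}

  // Deliver immediately if online, otherwise hold for later
  deliver(msg: string): void {
    if (this.online) this.send(msg);
    else this.queue.push(msg);
  }

  // Flip connectivity; flush the queue in FIFO order on reconnect
  setOnline(online: boolean): void {
    this.online = online;
    if (online) {
      for (const msg of this.queue) this.send(msg);
      this.queue = [];
    }
  }
}
```

Pausing the session timeout while offline is a separate concern, but it follows the same principle: connectivity loss should never be mistaken for user inactivity.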

Combining Power Features: Real-World Workflows

These features become truly powerful when combined. Here are some real-world workflows that leverage multiple power features:

The Mobile-First Developer

You are on the train, reviewing a pull request on your phone. You notice a bug and send a voice message in Telegram: "Hey, there is a null pointer exception in the user service when the email field is missing. Can you add a null check and write a test for it?" The agent transcribes your voice, fixes the bug, writes the test, and responds with a voice message summarizing what it did. You approve the file writes from Telegram and the fix is committed before you arrive at the office.

The Pair Programming Handoff

Developer A starts working on a feature in their terminal with Claude Code. After an hour, they realize the CSS work would be better handled by Developer B, who is working remotely. Developer A uses /handoff to generate a token, shares it in the team's Discord channel, and Developer B resumes the session with full context about the feature, the decisions made, and the files already modified.

The Long-Running Refactoring

You are refactoring a large module over several days. Context resume keeps the agent aware of the overall refactoring plan, which files have been migrated, which ones remain, and the patterns being used. Each morning, you start a new session and the agent immediately knows where you left off yesterday. No re-explaining needed.

The Accessibility-First Workflow

A developer with a repetitive strain injury configures OpenACP with voice mode set to "always." They can dictate their coding instructions and hear back the agent's responses, dramatically reducing the amount of typing needed. The permission gate still shows visual buttons for approve/deny, providing a simple tap interface for the most critical interactions.

Configuration Reference: All Power Features

Here is a complete configuration example with all power features enabled:

{
  "telegram": {
    "token": "your-bot-token",
    "allowedUserIds": [123456789]
  },
  "voice": {
    "stt": {
      "provider": "groq",
      "apiKey": "gsk_your_groq_key"
    },
    "tts": {
      "provider": "edge",
      "voice": "en-US-AriaNeural"
    },
    "mode": "next"
  },
  "contextResume": {
    "provider": "entire",
    "apiKey": "your-entire-io-key"
  },
  "sessionTimeout": 60,
  "maxConcurrentSessions": 10
}

Troubleshooting Power Features

Voice Not Working

  1. Confirm your Groq API key is set (in the config file or via OPENACP_GROQ_API_KEY) and valid.
  2. If you expect audio responses, check that the voice mode is not "off" and that a TTS provider is configured.
  3. Edge TTS needs no API key, but the configured voice name must be valid (for example, en-US-AriaNeural).

Handoff Token Not Accepted

  1. Tokens expire after 5 minutes -- run /handoff again to generate a fresh one.
  2. Tokens are single-use; once redeemed, a token cannot be used a second time.

Context Resume Not Loading

  1. Verify the "contextResume" block is configured with a valid Entire.io API key.
  2. Context is checkpointed during the session; a session that ended very early may have no checkpoint to load.

What is Next

OpenACP's power features continue to evolve. The voice pipeline is being expanded with support for additional STT providers and real-time voice streaming. Session handoff is being enhanced with cross-machine support for team environments. Context resume is getting more granular, with the ability to selectively load specific aspects of previous sessions.

These features represent OpenACP's vision of AI-assisted development: an experience that is not limited to sitting at a desk with a keyboard, but extends wherever you are and however you prefer to communicate. Whether you are typing at your desk, talking on your phone, or handing off to a teammate, the agent meets you where you are.

Try Voice and Handoff Today

Install OpenACP and experience the power features that make AI coding truly flexible.

npm install -g @openacp/cli && openacp