Skip to main content

Vertex AI Gemini Live - Realtime API

Use Vertex AI's Gemini Live API (BidiGenerateContent) through LiteLLM's unified /realtime endpoint, which speaks the OpenAI Realtime protocol.

FeatureSupported
Proxy (/realtime)✅
Voice in / Voice out✅
Text in / Text out✅
Server VAD✅
Output transcription✅

Setup​

1. Auth​

LiteLLM uses your Google Cloud credentials (OAuth2 Bearer token), not an API key.

gcloud auth application-default login

Or set a service-account key file:

export GOOGLE_APPLICATION_CREDENTIALS=/path/to/sa-key.json

2. Proxy config​

model_list:
- model_name: vertex-gemini-live
litellm_params:
model: vertex_ai/gemini-2.0-flash-live-001
vertex_project: your-gcp-project-id
vertex_location: us-east4 # or any supported region, or "global"

general_settings:
master_key: sk-your-key

3. Start the proxy​

litellm --config config.yaml --port 4000

Usage​

Python (websockets)​

import asyncio
import json
import websockets

PROXY_URL = "ws://localhost:4000/realtime?model=vertex-gemini-live"
API_KEY = "sk-your-key"

async def main():
async with websockets.connect(
PROXY_URL,
additional_headers={"api-key": API_KEY},
) as ws:
# Wait for session.created
event = json.loads(await ws.recv())
print(f"session.created: {event['session']['id']}")

# Send a text message
await ws.send(json.dumps({
"type": "conversation.item.create",
"item": {
"type": "message",
"role": "user",
"content": [{"type": "input_text", "text": "Say hello in one sentence."}],
},
}))

# Collect the response
async for raw in ws:
ev = json.loads(raw)
t = ev.get("type", "")
if t == "response.text.delta":
print(ev.get("delta", ""), end="", flush=True)
elif t == "response.done":
print("\n[done]")
break

asyncio.run(main())

Node.js​

const WebSocket = require("ws");

const ws = new WebSocket(
"ws://localhost:4000/realtime?model=vertex-gemini-live",
{ headers: { "api-key": "sk-your-key" } }
);

ws.on("open", () => {
ws.send(JSON.stringify({
type: "conversation.item.create",
item: {
type: "message",
role: "user",
content: [{ type: "input_text", text: "Say hello." }],
},
}));
});

ws.on("message", (data) => {
const ev = JSON.parse(data);
if (ev.type === "response.text.delta") process.stdout.write(ev.delta);
if (ev.type === "response.done") ws.close();
});

OpenAI SDK (Python)​

import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(
base_url="http://localhost:4000",
api_key="sk-your-key",
)

async def main():
async with client.beta.realtime.connect(
model="vertex-gemini-live"
) as conn:
await conn.session.update(session={"modalities": ["text"]})

await conn.conversation.item.create(
item={
"type": "message",
"role": "user",
"content": [{"type": "input_text", "text": "Say hello."}],
}
)

async for event in conn:
if event.type == "response.text.delta":
print(event.delta, end="", flush=True)
elif event.type == "response.done":
print()
break

asyncio.run(main())

Voice in / Voice out​

For a complete voice example see voice_realtime_test.py.

Key settings for audio:

  • Microphone input: 16 kHz PCM16 (audio/pcm;rate=16000)
  • Speaker output: 24 kHz PCM16 (Vertex AI returns audio at 24 kHz)
  • Server VAD is enabled by default with 800 ms silence threshold
# session.update with server VAD — the proxy ignores this for Vertex AI
# because VAD is already configured in the initial setup message.
await ws.send(json.dumps({
"type": "session.update",
"session": {
"modalities": ["audio"],
"turn_detection": {"type": "server_vad", "silence_duration_ms": 800},
},
}))

Supported OpenAI Realtime Events​

Client → Proxy (→ Vertex AI)

OpenAI eventNotes
input_audio_buffer.appendForwarded as realtime_input.audio
conversation.item.createForwarded as realtime_input.text
session.updateSilently ignored — Vertex AI does not support mid-session reconfiguration
response.createSilently ignored — Vertex AI responds automatically after each turn

Vertex AI → Proxy (→ Client)

OpenAI event emittedVertex AI source
session.createdSynthesized after setupComplete
response.text.deltaserverContent.modelTurn.parts[].text
response.audio.deltaserverContent.modelTurn.parts[].inlineData
response.audio_transcript.deltaserverContent.outputTranscription.text
conversation.item.input_audio_transcription.completedserverContent.inputTranscription.text
response.doneserverContent.turnComplete

Limitations​

  • session.update is not forwarded (Vertex AI only accepts one setup message per connection).
  • Tool calling / function calling is not yet supported.
  • Audio transcription requires outputAudioTranscription: {} to be set in the initial setup (done automatically by LiteLLM).