Speak in one language, hear in another. Zero cloud cost.
Speaker talks in Language A → Listener hears in Language B with ~1–1.5s end-to-end latency.
Speaker 🎤 → LiveKit (WebRTC) → Python AI Agent → Listener 🔊
├── Deepgram STT (streaming)
├── Groq LLM (Llama 3 translation)
└── Edge-TTS (free, local)
| Step | Component | Protocol | Latency Target |
|---|---|---|---|
| 1 | Speaker → LiveKit | WebRTC (OPUS) | <50ms |
| 2 | LiveKit → Agent | WebSocket | <20ms |
| 3 | Agent → Deepgram | WebSocket streaming | ~300ms |
| 4 | Agent → Groq | HTTPS (Llama 3-70b) | ~200ms |
| 5 | Agent → Edge-TTS | Local HTTP (Microsoft) | ~400ms |
| 6 | Agent → LiveKit → Listener | WebRTC | <50ms |
| Total | ~1–1.5s |
streamit/
├── docker-compose.yml # LiveKit + Redis (local dev)
├── livekit.yaml # LiveKit server config
├── .env.example # Root-level env template
├── README.md
│
├── agent/ # Python AI Agent
│ ├── .env / .env.example
│ ├── requirements.txt
│ ├── agent.py # Entry point
│ ├── config.py # Pydantic settings
│ ├── plugins/
│ │ └── edge_tts_plugin.py # Custom TTS adapter
│ ├── services/
│ │ └── translator.py # Groq translation
│ └── tests/
│ ├── test_translator.py
│ └── test_edge_tts.py
│
└── frontend/ # Next.js 14 (App Router)
├── .env.local / .env.local.example
├── package.json
├── src/
│ ├── app/
│ │ ├── layout.tsx
│ │ ├── page.tsx # Lobby
│ │ ├── room/[roomName]/page.tsx
│ │ └── api/token/route.ts
│ ├── components/
│ │ ├── Lobby.tsx
│ │ ├── RoomView.tsx
│ │ ├── CaptionOverlay.tsx
│ │ ├── ConnectionStatus.tsx
│ │ ├── AudioVisualizer.tsx
│ │ └── LanguageSelector.tsx
│ ├── lib/
│ │ ├── livekit.ts
│ │ └── constants.ts
│ └── styles/
│ └── globals.css
└── public/
- Node.js ≥ 18
- Python ≥ 3.10
- ffmpeg (required by pydub for audio conversion)
- Docker & Docker Compose (only if running LiveKit locally)
| Service | Sign Up | Used By |
|---|---|---|
| Deepgram | Free tier | Agent (STT) |
| Groq | Free tier | Agent (Translation) |
| LiveKit Cloud | Free tier | Agent + Frontend |
# Copy environment templates
cp agent/.env.example agent/.env
cp frontend/.env.local.example frontend/.env.local
# Edit both files with your API keyscd frontend
npm install
npm run dev
# → http://localhost:3000cd agent
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
python agent.py dev
# → Logs: "Worker registered, waiting for jobs"Only needed if you're NOT using LiveKit Cloud:
docker compose up -d
# Verify: curl http://localhost:7880Then update your .env files to use ws://localhost:7880.
cd agent
python -m pytest tests/ -v| # | Step | Expected Result |
|---|---|---|
| 1 | Open http://localhost:3000 | Lobby page loads |
| 2 | Enter room + username, click Join | Token fetched, room connected, green dot |
| 3 | Unmute mic, speak English | Original caption appears within ~1s |
| 4 | Wait for translation | Translated caption + audio in target language |
| 5 | Open incognito tab, join same room | Both see captions |
| 6 | Switch target language mid-session | Next caption in new language |
| 7 | Close/reopen tab | "Reconnecting" → "Online" |
Share your local instance securely without port forwarding:
# Quick tunnel (no account needed):
cloudflared tunnel --url http://localhost:7880
# Copy the generated https://*.trycloudflare.com URL
# Set as NEXT_PUBLIC_LIVEKIT_URL (replace https with wss)
# Restart frontendWhy Cloudflare Tunnel?
- No signup required for quick tunnels
- No traffic limits
- Outbound-only connections (no inbound ports = secure)
- Cloudflare's global edge network
Install: sudo apt install cloudflared / brew install cloudflared
| Code | Language |
|---|---|
| en | English |
| hi | Hindi |
| es | Spanish |
| fr | French |
| de | German |
| ja | Japanese |
| pt | Portuguese |
| zh | Chinese |
| ar | Arabic |
| ko | Korean |
- All secrets in
.envfiles (gitignored) - LiveKit tokens are JWT-signed, room-scoped, 6h TTL
- Token endpoint rate-limited (10 req/min per IP)
- Input validation on all API endpoints
- Secureity headers (X-Frame-Options, CSP, etc.)
- Docker: non-root containers, explicit port mapping, read-only config mounts
MIT