title: VOICE SEMENTLE
emoji: 🎙️
colorFrom: purple
colorTo: pink
sdk: gradio
sdk_version: 6.0.0
app_file: client/app.py
pinned: false
tags:
- mcp-in-action-track-creative
🎙️ Voice Sementle
Daily voice puzzle game: guess the meme, song, or movie quote, but you have to SAY IT RIGHT!
It's not just what you say, it's how you say it. Your pitch, rhythm, energy, and pronunciation all matter.
🗓️ New puzzle every day • 🎭 3 genres (memes, songs, movies) • 🧠 AI hints that get smarter
Submission Info
| Item | Details |
|---|---|
| Track | MCP in Action (Creative) |
| MCP Used | VoiceKit MCP |
| LLM | Google Gemini 2.5 Flash |
| Voice AI | ElevenLabs (Voice Cloning + TTS) |
| Framework | Gradio 6.0 |
- 📢 Social Post: View on LinkedIn
- 📢 Social Post: View on X
- 🎬 Demo Video: Watch (1-5 min)
- 👥 Team: @LisaVLee, @SabaPivot, @daheepk, @tchoi911, @Lucian25
✅ Track 2 Requirements
| Requirement | How We Fulfill It |
|---|---|
| Autonomous Agent | Two agents: MCP Advisor (voice analysis) + Chatbot (text + audio hints) |
| MCP as Tools | VoiceKit MCP (voicekit_analyze_voice_similarity) for voice analysis |
| Gradio App | Built with Gradio 6.0 |
| Tool Calling | Chatbot autonomously calls generate_audio_hint → ElevenLabs TTS |
🎮 How It Works
1. 🎯 Daily puzzle loads (meme / song / movie quote)
2. 🎤 You record your voice guess
3. MCP analyzes: pitch, rhythm, energy, pronunciation, transcript
4. 🧠 Gemini agent generates progressive hints (vague → specific)
5. Ask for audio hint → Agent calls ElevenLabs TTS with voice cloning
6. Score > 85 = WIN!
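The steps above can be sketched as a single round function. This is a minimal sketch, not the app's actual code: `play_round`, `RoundResult`, and the injected `analyze` and `advise` callables are hypothetical stand-ins for the VoiceKit MCP call and the Gemini agent. Only the win rule (overall score > 85) comes from the steps above.

```python
from dataclasses import dataclass
from typing import Callable, Dict

WIN_THRESHOLD = 85  # from the rules above: an overall score > 85 wins


@dataclass
class RoundResult:
    scores: Dict[str, float]
    hint: str
    won: bool


def play_round(
    audio: bytes,
    attempt: int,
    analyze: Callable[[bytes], Dict[str, float]],
    advise: Callable[[Dict[str, float], int], str],
) -> RoundResult:
    """Run one guess: analyze the recording, check for a win, else fetch a hint."""
    scores = analyze(audio)  # hypothetical wrapper around the MCP analysis tool
    won = scores["overall"] > WIN_THRESHOLD
    hint = "" if won else advise(scores, attempt)  # hypothetical Gemini wrapper
    return RoundResult(scores=scores, hint=hint, won=won)


# Usage with stub callables standing in for the real services.
result = play_round(
    b"fake-audio",
    attempt=1,
    analyze=lambda _audio: {"overall": 72.0},
    advise=lambda _scores, n: f"hint for attempt {n}",
)
print(result.won, result.hint)
```

Passing the analyzer and advisor in as callables keeps the round logic testable without a network connection or API keys.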
🤖 Agentic Architecture (Two Agents)
┌─────────────────────────────────────────────────────────────────┐
│                         VOICE SEMENTLE                          │
└─────────────────────────────────────────────────────────────────┘
                                 │
               ┌─────────────────┴─────────────────┐
               ▼                                   ▼
┌─────────────────────────────┐     ┌─────────────────────────────┐
│  AGENT 1: MCP Advisor       │     │  AGENT 2: Chatbot + Tools   │
├─────────────────────────────┤     ├─────────────────────────────┤
│                             │     │                             │
│        🎤 User Voice        │     │        💬 User Chat         │
│              │              │     │              │              │
│              ▼              │     │              ▼              │
│      ┌───────────────┐      │     │      ┌───────────────┐      │
│      │ VoiceKit MCP  │      │     │      │  Gemini 2.5   │      │
│      │ (SSE Server)  │      │     │      │     Flash     │      │
│      └───────┬───────┘      │     │      └───────┬───────┘      │
│              │              │     │              │              │
│              ▼              │     │        ┌─────┴─────┐        │
│        6 Voice Scores       │     │        ▼           ▼        │
│       (pitch, rhythm,       │     │      Text      Tool Call    │
│        energy, etc.)        │     │    Response  (autonomous)   │
│              │              │     │                    │        │
│              ▼              │     │                    ▼        │
│      ┌───────────────┐      │     │  ┌───────────────────────┐  │
│      │  Gemini 2.5   │      │     │  │  generate_audio_hint  │  │
│      │     Flash     │      │     │  └───────────┬───────────┘  │
│      └───────┬───────┘      │     │              │              │
│              │              │     │              ▼              │
│              ▼              │     │  ┌───────────────────────┐  │
│      Progressive Advice     │     │  │      ElevenLabs       │  │
│     (based on attempt #)    │     │  │   IVC + TTS Engine    │  │
│                             │     │  └───────────┬───────────┘  │
└─────────────────────────────┘     │              │              │
                                    │              ▼              │
                                    │          Audio Hint         │
                                    └─────────────────────────────┘
Agent 1: MCP Advisor
Analyzes voice via VoiceKit MCP and generates advice with Gemini 2.5 Flash.
- Connects to MCP server (voicekit_analyze_voice_similarity)
- Returns 6 scores: pitch, rhythm, energy, pronunciation, transcript, overall
- Gemini 2.5 Flash generates progressive advice based on scores & attempt count
Progressive Advice Strategy:
- Attempt 1: Extremely vague (no category revealed)
- Attempt 2: Vague hint + category mentioned
- Attempts 3-4: More specific context
- Attempts 5-6: Quite specific (era, usage)
- Attempts 7-10: Very specific (syllables, first letter, rhymes)
- Attempt 11+: Pronunciation coaching mode
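The attempt ranges above map naturally to a small lookup function. The following is a sketch under stated assumptions: the function name `hint_level` and the returned tier labels are hypothetical and chosen for illustration. Only the attempt boundaries come from the list above.

```python
def hint_level(attempt: int) -> str:
    """Map a 1-based attempt number to a hint tier (labels are illustrative)."""
    if attempt < 1:
        raise ValueError("attempt must be 1 or greater")
    if attempt == 1:
        return "extremely_vague"         # no category revealed
    if attempt == 2:
        return "vague_with_category"     # category mentioned
    if attempt <= 4:
        return "more_specific"           # more specific context
    if attempt <= 6:
        return "quite_specific"          # era, usage
    if attempt <= 10:
        return "very_specific"           # syllables, first letter, rhymes
    return "pronunciation_coaching"      # attempt 11 and beyond


# Usage: print the tier for a few sample attempts.
for n in (1, 2, 5, 11):
    print(n, hint_level(n))
```

A tier label like this could then be interpolated into the prompt sent to the LLM, so the model never has to infer how specific it is allowed to be.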
Agent 2: Chatbot (with Tool Calling)
Conversational chatbot powered by Gemini 2.5 Flash that provides text hints AND can autonomously call tools.
- Answers user questions about the game
- Provides additional hints on request
- Tool calling: Autonomously decides to call generate_audio_hint → ElevenLabs TTS
Audio Hints with ElevenLabs
The agent has access to generate_audio_hint and autonomously decides when to use it:
# User: "Can I hear how it sounds?"
# Agent decides to call tool:
generate_audio_hint(hint_type="syllable")
# → Clone voice from reference audio (ElevenLabs IVC)
# → Generate TTS with eleven_multilingual_v2
# → Return audio to user
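To make the tool-calling step concrete, here is a minimal sketch of dispatching a model-issued tool call to a local function. Everything except the tool name `generate_audio_hint`, the `hint_type` argument, and the model name `eleven_multilingual_v2` is an assumption: the `{"name": ..., "args": ...}` call shape, the `TOOLS` registry, and `dispatch_tool_call` are hypothetical, and the tool body is a stub that records the steps rather than calling ElevenLabs.

```python
from typing import Any, Callable, Dict

TTS_MODEL = "eleven_multilingual_v2"  # model name taken from the text above


def generate_audio_hint(hint_type: str) -> Dict[str, Any]:
    """Stub tool: records the intended steps instead of calling ElevenLabs."""
    return {
        "hint_type": hint_type,
        "steps": ["clone_voice", f"tts:{TTS_MODEL}", "return_audio"],
    }


# Registry of tools the agent may call, keyed by tool name.
TOOLS: Dict[str, Callable[..., Dict[str, Any]]] = {
    "generate_audio_hint": generate_audio_hint,
}


def dispatch_tool_call(call: Dict[str, Any]) -> Dict[str, Any]:
    """Execute a tool call of the assumed form {"name": str, "args": dict}."""
    name = call["name"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**call.get("args", {}))


# Usage: the call the agent might emit for "Can I hear how it sounds?"
result = dispatch_tool_call(
    {"name": "generate_audio_hint", "args": {"hint_type": "syllable"}}
)
print(result["steps"])
```

Keeping the registry explicit means the model can only trigger functions that were deliberately exposed to it.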
ElevenLabs Features Used:
- 🎭 Instant Voice Cloning (IVC): Clone voice from reference audio
- 🗣️ eleven_multilingual_v2: High-quality multilingual TTS
- Voice Library: Consistent character voices for hints
🛠️ Tech Stack
| Component | Technology |
|---|---|
| Frontend | Gradio 6.0 |
| Voice Analysis | VoiceKit MCP (SSE) |
| LLM Agent | Google Gemini 2.5 Flash |
| Audio Hints | ElevenLabs IVC + TTS |
| Database | PostgreSQL |
Scoring (6 Metrics)
| Metric | What It Measures |
|---|---|
| 🎵 Pitch | Tone accuracy |
| 🥁 Rhythm | Timing & cadence |
| ⚡ Energy | Intensity level |
| 🗣️ Pronunciation | Clarity |
| Transcript | Correct words (STT) |
| Overall | Combined (>85 = win) |
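The table does not say how the five component metrics are combined into the overall score, and in the real app the overall value is returned by the MCP server. Purely as an illustration, the sketch below assumes an unweighted mean on a 0-100 scale. The `VoiceScores` class and its methods are hypothetical. Only the metric names and the win rule (> 85) come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceScores:
    """Five component scores, each assumed to be on a 0-100 scale."""
    pitch: float
    rhythm: float
    energy: float
    pronunciation: float
    transcript: float

    def overall(self) -> float:
        # ASSUMPTION: unweighted mean; the real weighting is not documented here.
        parts = (self.pitch, self.rhythm, self.energy,
                 self.pronunciation, self.transcript)
        return sum(parts) / len(parts)

    def is_win(self, threshold: float = 85.0) -> bool:
        # The table states ">85 = win", so the comparison is strict.
        return self.overall() > threshold


# Usage with made-up scores.
attempt = VoiceScores(pitch=90, rhythm=80, energy=85, pronunciation=95, transcript=100)
print(attempt.overall(), attempt.is_win())
```

Because the comparison is strict, an overall score of exactly 85 does not count as a win under this reading of the table.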
🎯 Why Voice Sementle?
| Judging Criteria | Our Approach |
|---|---|
| UI/UX | Polished Gradio 6 interface, intuitive game flow |
| Functionality | MCP voice analysis + Gemini agentic chatbot + ElevenLabs tool calling |
| Creativity | First voice-based guessing game with performance scoring |
| Documentation | Clear README, architecture diagrams |
| Real-world Impact | Fun consumer app; language learning potential |
🎮 Try It Now!
Click the interface above to start playing!
- Allow microphone access
- Record your voice guess
- Get scored on pitch, rhythm, energy & pronunciation
- Ask for hints or audio examples
- Keep trying until you win!
Built for MCP's 1st Birthday Hackathon
Celebrating one year of Model Context Protocol!