This tutorial showcases a real-time AI agent built with the NVIDIA NeMo Agent Toolkit, powered by NVIDIA Nemotron models, controlling a Reachy Mini robot. The agent uses an LLM router to dynamically route each request between:
- Nemotron Nano (text) for text-based interactions
- Nemotron Nano VLM (vision language model) for visual understanding
- a ReAct agent for tool-based actions
The system consists of three main components running in parallel:
- Reachy Mini Daemon - Controls the robot hardware (or simulation)
- Bot Service - Processes vision and speech, coordinates robot actions
- NeMo Agent Service - Handles AI agent logic with intelligent routing between models
You'll need the following prerequisites:
- Python 3.10+
- uv package manager
- NVIDIA API Key (for Nemotron models)
- ElevenLabs API Key (for text-to-speech)
Navigate to the project directory:

```bash
cd /path/to/ces-tutorial
```

Create a `.env` file in the main directory with your API keys:
```
NVIDIA_API_KEY=your_nvidia_api_key_here
ELEVENLABS_API_KEY=your_elevenlabs_api_key_here
```

In a terminal window, set up the bot environment:
```bash
cd bot
uv venv
uv sync
```

In a separate terminal window, set up the agent environment:
```bash
cd nat
uv venv
uv sync
```

You'll need three terminal windows running simultaneously.
Navigate to the bot directory and start the robot daemon:
For macOS:

```bash
cd bot
uv run mjpython -m reachy_mini.daemon.app.main --sim --no-localhost-only
```

For Linux:

```bash
cd bot
uv run -m reachy_mini.daemon.app.main --sim --no-localhost-only
```

Note: The `--sim` flag runs the robot in simulation mode. Remove it if you're using actual hardware.
In the bot directory:

```bash
cd bot
uv run --env-file ../.env python main.py
```

This service handles (see the sketch after this list):
- Vision processing through the robot's camera
- Speech recognition and text-to-speech
- Robot movement coordination
- Emotional expression through dance moves
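The snippet below is a minimal, hypothetical sketch of the bot's perceive-think-act loop. The class names, methods, and `query_agent` helper are illustrative stand-ins, not the actual APIs in `bot/main.py` or `bot/services/`:

```python
# Hypothetical sketch of the bot's main loop; the real wiring in this
# repo's main.py and services/ may differ.
import time

class CameraService:
    """Stand-in for the robot's camera feed (assumption, not the repo's API)."""
    def capture_frame(self) -> bytes:
        return b""  # placeholder image bytes

class SpeechService:
    """Stand-in for speech recognition and text-to-speech (assumption)."""
    def listen(self) -> str | None:
        return None  # return transcribed text when speech is detected
    def say(self, text: str) -> None:
        print(f"[tts] {text}")

def query_agent(text: str, frame: bytes) -> str:
    """Placeholder for the call into the NeMo Agent service on port 8001."""
    return f"echo: {text}"

def main() -> None:
    camera, speech = CameraService(), SpeechService()
    while True:
        utterance = speech.listen()
        if utterance:
            # Pair the spoken request with the current camera frame so the
            # agent's router can decide whether vision is needed.
            reply = query_agent(utterance, camera.capture_frame())
            speech.say(reply)
        time.sleep(0.1)

if __name__ == "__main__":
    main()
```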
In the nat directory:

```bash
cd nat
uv run --env-file ../.env nat serve --config_file src/ces_tutorial/config.yml --port 8001
```

This launches the NeMo Agent Toolkit server with intelligent model routing capabilities.
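Once the server is up, you can smoke-test it over HTTP. This sketch assumes the toolkit's default FastAPI server, which typically exposes a `/generate` endpoint taking an `input_message` field; the exact route and payload shape may vary by toolkit version, so verify against the NeMo Agent Toolkit docs:

```python
# Hedged smoke test against the locally served agent (assumed default
# /generate route and payload shape; adjust to match your toolkit version).
import requests

resp = requests.post(
    "http://localhost:8001/generate",
    json={"input_message": "Wave hello to the audience"},
    timeout=60,
)
resp.raise_for_status()
print(resp.json())
```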
- Vision & Audio Input: The bot captures visual information and listens for speech
- Agent Processing: The NeMo Agent router selects the appropriate model (see the routing sketch after this list):
  - Text queries → Nemotron Nano text model
  - Visual queries → Nemotron Nano VLM
  - Action requests → ReAct agent with tool calling
- Robot Actions: Based on the agent's response, the bot executes movements, plays expressions, or speaks
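The routing logic can be pictured roughly as below. This is a sketch of the idea only: the real router lives in `nat/src/ces_tutorial/functions/`, and the model names, keyword heuristic, and `Request` type here are illustrative assumptions:

```python
# Illustrative routing sketch; not the repo's actual router implementation.
from dataclasses import dataclass

@dataclass
class Request:
    text: str
    image: bytes | None = None  # camera frame, if the query is visual

ACTION_WORDS = {"move", "dance", "wave", "look", "turn"}  # illustrative only

def route(request: Request) -> str:
    """Pick a downstream model or agent for one request."""
    if request.image is not None:
        return "nemotron-nano-vlm"   # visual queries
    if ACTION_WORDS & set(request.text.lower().split()):
        return "react-agent"         # tool-calling action requests
    return "nemotron-nano-text"      # plain text chat

print(route(Request(text="What do you see?", image=b"...")))  # nemotron-nano-vlm
print(route(Request(text="Wave at the crowd")))               # react-agent
print(route(Request(text="Tell me a joke")))                  # nemotron-nano-text
```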
Check out ces_tutorial.mp4 to see the system in action!
```
ces-tutorial/
├── bot/                      # Robot control and vision/speech processing
│   ├── main.py               # Main bot orchestration
│   ├── nat_vision_llm.py     # Vision and LLM integration
│   └── services/             # Robot services (moves, speech, etc.)
├── nat/                      # NeMo Agent Toolkit configuration
│   └── src/ces_tutorial/
│       ├── config.yml        # Agent configuration
│       └── functions/        # Router and agent implementations
└── .env                      # API keys (create this file)
```
- Port conflicts: Ensure port 8001 is available for the NeMo Agent service (a quick check is sketched below)
- API key errors: Verify your `.env` file is properly formatted and contains valid keys
- Robot connection issues: Check that the Reachy daemon started successfully before launching the bot service
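If you suspect a port conflict, this small Python check reports whether anything is already listening on port 8001:

```python
# Probe localhost:8001; a successful connect means the port is already taken.
import socket

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
    sock.settimeout(1)
    in_use = sock.connect_ex(("127.0.0.1", 8001)) == 0

print("port 8001 is", "in use" if in_use else "free")
```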

