Introduction
In today’s cloud-native landscape, building a conversational AI demo is easy; building a production-ready system is a different challenge. It takes more than a clever prompt: it takes a robust, scalable architecture that can handle messy, real-world human conversation.
In this post, I’ll walk you through Clinic-AI Receptionist, an automated system I built to handle patient bookings in Hindi. By combining Google Gemini, Firestore, Twilio (WhatsApp), and Google Cloud Run, I’ve created a bot that doesn't just "chat": it manages state, checks availability, and schedules appointments without a single human click.
How it Works: From 'नमस्ते' to a Confirmed Booking
The system is designed to turn unstructured natural language into structured calendar events. Here is the high-level flow:
- The Entry Point (Twilio): A patient sends a message on WhatsApp. Twilio acts as our secure gateway, converting that message into an HTTP webhook and forwarding it to our backend.
- The Processing Core (Cloud Run): Our logic lives on Google Cloud Run. Because this is a stateless environment, the first thing the code does is fetch the patient's history from Firestore. This allows the bot to "remember" who it is talking to.
- The Intelligence (Gemini AI): The message and the history are sent to Gemini. The AI handles the heavy lifting: understanding the patient's intent in Hindi, extracting dates/times, and calculating relative dates (like "tomorrow").
- The Validation (Google Calendar): Once the AI extracts a potential appointment time, the system automatically checks the clinic's Google Calendar for conflicts. If the slot is free, it locks it in.
- The Loopback: The bot sends a natural Hindi confirmation back to the patient, completing the booking in seconds.
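To make the entry point and the loopback concrete, here is a minimal Python sketch. It assumes the standard Twilio messaging webhook: Twilio POSTs a form-encoded body whose `From` field carries the sender as `whatsapp:+<number>` and whose `Body` field carries the text, and it accepts a TwiML `<Response><Message>` document as the reply. The helper names `parse_twilio_webhook` and `twiml_reply` and the sample phone number are hypothetical, and request-signature validation, which a production webhook should perform, is left out.

```python
from urllib.parse import parse_qs, urlencode
from xml.sax.saxutils import escape


def parse_twilio_webhook(raw_body: str) -> tuple[str, str]:
    """Return (phone_number, message_text) from a form-encoded Twilio webhook body."""
    fields = parse_qs(raw_body, keep_blank_values=True)
    sender = fields.get("From", [""])[0]  # e.g. "whatsapp:+919876543210"
    text = fields.get("Body", [""])[0]
    # Drop the channel prefix so the bare number can key the patient's session.
    phone = sender.removeprefix("whatsapp:")
    return phone, text.strip()


def twiml_reply(message: str) -> str:
    """Wrap the Hindi reply in TwiML so Twilio sends it back over WhatsApp."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f"<Response><Message>{escape(message)}</Message></Response>"
    )


if __name__ == "__main__":
    # Simulate the body Twilio would POST for an incoming message (hypothetical number).
    body = urlencode({"From": "whatsapp:+919876543210", "Body": "नमस्ते"})
    print(parse_twilio_webhook(body))  # ('+919876543210', 'नमस्ते')
    print(twiml_reply("आपकी अपॉइंटमेंट पक्की हो गई है"))
```

Returning TwiML keeps the reply inside the same HTTP round trip; sending the reply through Twilio's REST API instead would also work and would let the handler respond to Twilio before slow AI calls finish.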
Why This Architecture Matters
As a Principal Engineer, my goal wasn't just to make it work, but to make it resilient:
- Scalable: Every component is serverless or fully managed. It costs almost nothing when idle, because Cloud Run can scale down to zero instances, and it scales out automatically when many patients message at once.
- Context-Aware: By using Firestore as a "brain," the bot can handle multi-turn conversations (e.g., asking for the name in the first message and the time in the second).
- Multilingual: Gemini understands and replies in Hindi natively, so there is no separate translation layer to build or maintain.
The Workflow: A Multi-Turn Conversation
The bot follows a precise logic loop for every incoming message:
- Identify User: Extract the patient's phone number from the Twilio payload.
- Load Context: Pull the existing JSON session from Firestore.
- AI Inference: Send the prompt to Gemini including "Today's Date" and "Current Session" so it can calculate relative dates like "tomorrow."
- Update State: If the user provides new info (like their name), Gemini updates the JSON object.
- Persistence: Save the updated JSON back to Firestore and send the Hindi reply to the user.
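The five steps above map onto a small request handler. The following Python sketch assumes the model is prompted to return JSON containing a `reply` string and the full updated `session` object; the model and the session store are passed in as plain objects so the flow can run without cloud credentials. `handle_message`, `build_prompt`, `DictStore`, the prompt wording, and the fixed IST offset are all hypothetical stand-ins, not the production code.

```python
import json
from datetime import date, datetime, timedelta, timezone
from typing import Callable, Optional

# Assumption: the clinic runs on Indian Standard Time (UTC+05:30).
IST = timezone(timedelta(hours=5, minutes=30))


class DictStore:
    """In-memory stand-in for the Firestore session collection (local testing only)."""

    def __init__(self) -> None:
        self._docs: dict[str, dict] = {}

    def load(self, phone: str) -> dict:
        return dict(self._docs.get(phone, {}))

    def save(self, phone: str, session: dict) -> None:
        self._docs[phone] = dict(session)


def build_prompt(today: date, session: dict, message: str) -> str:
    # Including today's date lets the model turn "tomorrow" into a concrete date.
    return (
        "You are a clinic receptionist. Always reply in Hindi.\n"
        f"Today's Date: {today.isoformat()}\n"
        f"Current Session: {json.dumps(session, ensure_ascii=False)}\n"
        f"Patient message: {message}\n"
        'Return only JSON: {"reply": "...", "session": {"name": ..., "date": ..., "time": ...}}'
    )


def handle_message(
    phone: str,                      # Identify User: already extracted from the webhook
    message: str,
    store: DictStore,
    model: Callable[[str], str],     # any function that sends a prompt and returns text
    now: Optional[datetime] = None,
) -> str:
    now = now or datetime.now(IST)
    session = store.load(phone)                                # Load Context
    raw = model(build_prompt(now.date(), session, message))    # AI Inference
    result = json.loads(raw)
    session = result.get("session", session)                   # Update State
    store.save(phone, session)                                 # Persistence
    return result["reply"]                                     # sent back via Twilio
```

In production, `DictStore` would presumably be replaced by a class with the same `load`/`save` methods backed by the `google-cloud-firestore` client, for example `client.collection("sessions").document(phone).get().to_dict()` (which returns `None` for a missing document) and `.set(session)`; the `sessions` collection name is hypothetical.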
The Problem: The "Stateless" Challenge
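Cloud Run is fantastic for scaling, but it is stateless: anything held in a container's memory disappears when that instance shuts down, and it is invisible to every other instance. Storing user sessions in a local Map() works in development but fails in production, where multiple container instances spin up and down and consecutive messages from the same patient may land on different instances. To solve this, I moved session management to a managed NoSQL store, Firestore.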
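The failure mode is easy to reproduce without Cloud Run. This Python sketch simulates two container instances, each with its own in-process dictionary (the equivalent of a local Map()), and compares them with a single shared store playing Firestore's role. The `remember` helper and the phone number are hypothetical, and routing turn 1 to one instance and turn 2 to another is an assumed, but realistic, load-balancing outcome.

```python
def remember(store: dict, phone: str, **fields: str) -> dict:
    """Merge new fields into this caller's view of the session and return a copy."""
    session = store.setdefault(phone, {})
    session.update(fields)
    return dict(session)


PHONE = "+919876543210"  # hypothetical patient number

# Local memory: each container instance owns a separate dict, like a local Map().
instance_a: dict = {}
instance_b: dict = {}
remember(instance_a, PHONE, name="Deepak")               # turn 1 lands on instance A
local_view = remember(instance_b, PHONE, time="11:00")   # turn 2 lands on instance B
print(local_view)  # {'time': '11:00'}  (the name is gone)

# Shared store: every instance reads and writes the same document, as with Firestore.
shared: dict = {}
remember(shared, PHONE, name="Deepak")
shared_view = remember(shared, PHONE, time="11:00")
print(shared_view)  # {'name': 'Deepak', 'time': '11:00'}
```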
Working Example: From WhatsApp to Calendar
The real power of this architecture is its ability to handle "human" input. A user doesn't need to follow a rigid menu; they can just talk.
Step 1: The WhatsApp Interaction
As seen below, the bot maintains context. When the user says "11 am morning," the bot knows the time belongs to the booking for "Deepak" on "20th April" from the previous message.
WhatsApp number for appointments: +1 415 523 8886
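One way to keep "Deepak" and "20th April" from being overwritten when the second message only supplies a time is to merge the model's output field by field. This Python sketch assumes the model returns only the fields it found in the latest message, with null for the rest; `merge_session` is a hypothetical helper, and the year in the sample date is invented for illustration.

```python
def merge_session(session: dict, extracted: dict) -> dict:
    """Overlay fields from the latest message, skipping ones the model left empty."""
    merged = dict(session)
    for key, value in extracted.items():
        if value not in (None, ""):
            merged[key] = value
    return merged


# Turn 1: name and date arrive; time is still unknown (year chosen for illustration).
turn1 = merge_session({}, {"name": "Deepak", "date": "2025-04-20", "time": None})
# Turn 2: "11 am morning" only yields a time, so name and date must survive.
turn2 = merge_session(turn1, {"name": None, "date": None, "time": "11:00"})
print(turn2)  # {'name': 'Deepak', 'date': '2025-04-20', 'time': '11:00'}
```

Doing the merge in code rather than trusting the model to echo back every earlier field makes the saved session robust to a model response that omits or blanks out previously collected details.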
Step 2: Automated Scheduling
Once the required fields (Name, Date, Time) are filled in the Firestore document, the system triggers the Google Calendar API to secure the slot.
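The trigger condition and the conflict check fit in a few lines. This Python sketch assumes fixed 30-minute appointments, ISO-formatted `date`/`time` fields, and busy intervals already fetched from the clinic calendar (for example via the Google Calendar API's `freebusy.query` method). `is_complete`, `requested_slot`, `is_free`, and `SLOT_LENGTH` are hypothetical names, and times are naive datetimes here; a real system would attach the clinic's time zone before comparing against calendar data.

```python
from datetime import datetime, timedelta

REQUIRED_FIELDS = ("name", "date", "time")
SLOT_LENGTH = timedelta(minutes=30)  # assumption: fixed 30-minute appointments


def is_complete(session: dict) -> bool:
    """True once every required field holds a non-empty value."""
    return all(session.get(field) for field in REQUIRED_FIELDS)


def requested_slot(session: dict) -> tuple[datetime, datetime]:
    """Build the [start, end) interval the patient asked for."""
    start = datetime.fromisoformat(f"{session['date']}T{session['time']}")
    return start, start + SLOT_LENGTH


def is_free(slot: tuple[datetime, datetime],
            busy: list[tuple[datetime, datetime]]) -> bool:
    start, end = slot
    # Half-open intervals [a, b) and [c, d) overlap exactly when a < d and c < b.
    return all(not (start < b_end and b_start < end) for b_start, b_end in busy)


if __name__ == "__main__":
    session = {"name": "Deepak", "date": "2025-04-20", "time": "11:00"}
    busy = [(datetime(2025, 4, 20, 10, 30), datetime(2025, 4, 20, 11, 0))]
    if is_complete(session) and is_free(requested_slot(session), busy):
        print("Slot is free: create the calendar event")
```

Treating intervals as half-open means a slot that starts exactly when another appointment ends counts as free, which is usually what a clinic wants for back-to-back bookings.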