We actually started exploring MCP (Model Context Protocol) way back in May, long before our major refactor in November. Looking back, our understanding of it has evolved drastically.
Phase 1: The “RPC” Misunderstanding (May)
In the beginning, we treated MCP just like a fancy RPC (Remote Procedure Call). We weren’t even using frameworks like LangGraph yet.
Our workflow was crude:
- Ask LLM to generate parameters based on a schema.
- Manual Code: Catch the output, parse JSON.
- Manual Code: Invoke the MCP tool (using a simple client wrapper).
- Manual Code: Get the result and feed it back to the LLM.
It worked, but it defeated the purpose. We were writing glue code for every single tool. It felt like we were just adding an extra layer of complexity over a standard HTTP API.
# The "Old Way" (May) - Manual Glue
response = llm.invoke(prompt)
if "call_tool" in response:
# We were manually parsing and invoking...
tool_name = parse_tool_name(response)
args = parse_args(response)
# Treating MCP client just like a requests.post() wrapper
result = mcp_client.call_tool(tool_name, args)
# Manually constructing the next prompt
next_prompt = f"Tool result: {result}. Now continue."
Phase 2: Understanding the Transport (Stdio vs. SSE)
As we dug deeper (I highly recommend reading the technical deep-dive “This is MCP”), we started to understand the underlying mechanics.
Initially, I was confused by the npx and python commands in the MCP configuration.
“Why do I need to run a command locally?”
It turns out, the default mode uses Stdio (Standard Input/Output). The host process (Agent) spawns the tool server as a subprocess and talks to it via stdin/stdout.
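To make that concrete, here is roughly what a Stdio connection looks like with the official MCP Python SDK: you describe the command to spawn, the SDK launches it as a subprocess, and every message flows over stdin/stdout. The server script below is a placeholder, not our real setup.
```python
# Minimal sketch of a Stdio connection using the official MCP Python SDK.
# "my_tool_server.py" is a placeholder server script for illustration.
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

server_params = StdioServerParameters(
    command="python",            # the host spawns this as a subprocess
    args=["my_tool_server.py"],
)

async def main():
    # stdio_client yields the stdin/stdout streams of the spawned process
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([t.name for t in tools.tools])

asyncio.run(main())
```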
The Stdio Limitation
This explains why we couldn’t scale initially. Stdio is great for local development (like running an Agent on your laptop that talks to a local SQLite DB), but it fails in a server environment:
- No Concurrency: One process, one connection.
- Resource Heavy: Spawning a new Python process for every user request is a disaster.
Switching to HTTP SSE
We eventually standardized on HTTP SSE (Server-Sent Events) for all our internal MCP servers. This allows a single long-running server to handle thousands of concurrent Agent connections efficiently, without spawning a subprocess per user.
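The client-side change is small: instead of describing a command to spawn, you point the session at a URL. A rough sketch with the MCP Python SDK, assuming the server exposes its SSE endpoint at http://localhost:8000/sse (the tool name here is a placeholder):
```python
# Same ClientSession, different transport: connect to a long-running HTTP SSE
# server instead of spawning a subprocess. URL and tool name are placeholders.
import asyncio

from mcp import ClientSession
from mcp.client.sse import sse_client

async def main():
    async with sse_client("http://localhost:8000/sse") as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool("some_tool", {"arg": "value"})
            print(result.content)

asyncio.run(main())
```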
Phase 3: The Long-Running Task Challenge (November)
By November, we had migrated to LangGraph. We wanted to fully leverage create_react_agent and let the model drive the interaction natively.
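The wiring looked roughly like the sketch below: MCP tools get loaded as LangChain tools and handed straight to create_react_agent, so the model decides when to call them and the framework runs the parse/invoke/feed-back loop we used to write by hand. This assumes the langchain-mcp-adapters package (whose client API has shifted between releases); the model, server name, and URL are placeholders.
```python
# Sketch of letting LangGraph drive MCP tools natively (no hand-written glue).
# Assumes a recent langchain-mcp-adapters; model, server name, URL are placeholders.
import asyncio

from langchain_mcp_adapters.client import MultiServerMCPClient
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

async def main():
    client = MultiServerMCPClient(
        {
            "collector": {"transport": "sse", "url": "http://localhost:8000/sse"},
        }
    )
    tools = await client.get_tools()  # MCP tools exposed as LangChain tools
    agent = create_react_agent(ChatOpenAI(model="gpt-4o"), tools)

    result = await agent.ainvoke(
        {"messages": [{"role": "user", "content": "Collect 100 posts about AI"}]}
    )
    print(result["messages"][-1].content)

asyncio.run(main())
```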
But we hit a UX wall: Progress Reporting.
When an Agent calls a tool like start_collection_job (e.g., crawling 100 posts from a social media platform), it might take several minutes.
- Initial thought: Can MCP push “intermediate logs” back to the LLM via the stream?
- Reality: No. In the standard Agent Loop, the LLM is “frozen” waiting for the tool to return the final result. Intermediate text pushed over MCP isn’t seen by the LLM until the tool finishes, and it doesn’t trigger the next step of the agent.
The Solution: Background Jobs + Frontend Polling
We realized that for long tasks—especially data collection and scraping—the MCP tool shouldn’t “do” the work synchronously. It should “dispatch” the work to a specialized crawler service.
- Agent: Calls start_collection_job(platform="xhs", keyword="AI").
- MCP Tool: Immediately returns {"job_id": "job_99", "status": "processing"}.
- Agent: Receives this instantly and tells the user: “I’ve started the collection task. You can see the progress in the panel.”
- Frontend: Sees the job_id in the tool output and starts polling our background worker API to show a real-time progress bar (e.g., “45/100 items collected”).
This split architecture allows us to keep the Agent loop responsive while handling heavy lifting asynchronously.
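Here is a minimal sketch of the dispatching tool, using FastMCP from the MCP Python SDK. The in-memory job store and the threaded fake crawl are stand-ins for our real crawler service and worker queue (and in production the frontend polls a separate worker API, not an MCP tool); the point is simply that start_collection_job returns a job_id immediately instead of blocking.
```python
# Sketch of the "dispatch, don't wait" pattern with FastMCP.
# JOBS and the threaded fake crawl stand in for a real crawler service + queue.
import threading
import time
import uuid

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("collector")
JOBS: dict[str, dict] = {}  # job_id -> {"status": ..., "collected": ...}

def _run_crawl(job_id: str, platform: str, keyword: str, total: int = 100) -> None:
    # Stand-in for the real crawler: updates progress that the frontend polls.
    for i in range(total):
        time.sleep(0.1)  # pretend to fetch one post
        JOBS[job_id]["collected"] = i + 1
    JOBS[job_id]["status"] = "done"

@mcp.tool()
def start_collection_job(platform: str, keyword: str) -> dict:
    """Dispatch a collection job and return immediately with a job_id."""
    job_id = f"job_{uuid.uuid4().hex[:8]}"
    JOBS[job_id] = {"status": "processing", "collected": 0}
    threading.Thread(
        target=_run_crawl, args=(job_id, platform, keyword), daemon=True
    ).start()
    # The Agent gets this right away; the heavy lifting happens in the background.
    return {"job_id": job_id, "status": "processing"}

@mcp.tool()
def get_job_status(job_id: str) -> dict:
    """Progress lookup (illustrative; our frontend hits a separate worker API)."""
    return JOBS.get(job_id, {"status": "unknown"})

if __name__ == "__main__":
    mcp.run(transport="sse")
```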
Conclusion
The journey with MCP was ultimately a journey of understanding protocol mechanics.
- Phase 1: We mistook it for an RPC library and wrote unnecessary wrapper code.
- Phase 2: We learned that stdio (the default) creates a 1-to-1 process lock, which kills concurrency. Switching to SSE (Server-Sent Events) was the turning point for building scalable services.
- Phase 3: We realized that for long-running tasks, the protocol should only be used for dispatching (Fire-and-Forget), not for waiting.