Building a Personal Agentic AI System: A Practitioner’s Guide
Computational Genomicist, Alan Turing Institute Fellow, and someone who got tired of waiting for the future to arrive
The Problem Nobody Talks About
Everyone is discussing AI agents. Few are building them. Fewer still are building them for themselves.
The frontier of AI right now is not chatbots. It is autonomous systems that can reason, remember, and act on your behalf. Systems that persist across conversations, learn from your history, and execute multi-step tasks without constant hand-holding.
I have spent the last four months building exactly this. Not as a startup. Not with a team of engineers. As a single academic with a Mac Studio, a Telegram account, and an unwillingness to wait for OpenAI or Google to solve my problems.
This essay documents what I built, how I built it, and what I learned. It is intended for practitioners who want to build their own systems — not consume someone else’s product.
What I Actually Built
The system has three layers:
Layer 1: Corpas Core — A 336MB Vector Knowledge Base
14 ChromaDB collections running locally
11,664 Gmail messages embedded
1,045 Apple Notes embedded
209 ChatGPT conversation exports
Meeting transcripts, blog posts, career strategy documents
Voyage AI embeddings (voyage-3, 1024 dimensions)
This is my second brain. Every professional email I have sent or received in the past decade. Every note I have taken. Every conversation with AI systems. All searchable by meaning, not just keywords.
Layer 2: RoboTerri — An Autonomous Telegram Agent
This is where it gets interesting. RoboTerri is not a chatbot. It is an agent that:
Maintains 50-message conversation history
Executes 5-iteration tool loops autonomously
Processes voice memos (ffmpeg + Whisper transcription)
Analyses photos and screenshots (base64 vision encoding)
Extracts text from PDFs
Manages my Google Calendar and Outlook calendar
Creates files on my filesystem
Generates audio from text (edge-tts)
Publishes podcast episodes to RSS feeds
Queries and writes to the vector knowledge base
It runs 24/7. When I send a voice note at midnight, it transcribes, reasons about the content, queries relevant context from my knowledge base, and responds. When I ask it to schedule a meeting, it checks both my calendars, finds available slots, and creates the event.
Layer 3: Automated Pipelines
Six daily jobs run without human intervention:
arXiv digest: surfaces relevant papers based on my research interests
Podcast digest: summarises episodes from feeds I follow
Email triage: flags priority messages requiring attention
Intelligence briefing: synthesises news relevant to my work
Meeting preparation: pulls context before scheduled calls
Knowledge base maintenance: embeds new content automatically
The system works while I sleep. This is not a metaphor.
The Technical Stack
I am going to be specific because vague architectural diagrams are useless.
Hardware:
Mac Studio M3 Ultra, 256GB RAM
All compute runs locally except API calls
Vector Database:
ChromaDB (persistent, local-first)
No cloud dependency — critical for institutional data
Embeddings:
Voyage AI voyage-3 model
1024-dimensional vectors
Separate embedding types for documents vs queries
Language Model:
Anthropic Claude (Opus for complex reasoning, Sonnet for routine tasks)
200K context window for long document processing
Agent Framework:
Custom Python (no LangChain, no LlamaIndex)
Tool definitions as JSON schemas
5-iteration maximum tool loops with early exit on completion
Interface:
Telegram Bot API (python-telegram-bot)
Async handlers for text, voice, photos, documents
Message queue for rate limiting
Calendar Integration:
Google Calendar API (OAuth2)
macOS Calendar via icalBuddy (read) and AppleScript (write)
Audio Pipeline:
Whisper (via OpenAI API) for transcription
edge-tts (Microsoft) for text-to-speech generation
ffmpeg for audio format conversion
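Telegram delivers voice notes as Opus audio in an .oga container, which is why ffmpeg sits in the pipeline at all. A minimal sketch of the conversion step; the exact output format here (16 kHz mono WAV) is my assumption, not necessarily the production invocation:

```python
import subprocess
from typing import List

def ffmpeg_convert_cmd(src: str, dst: str) -> List[str]:
    """Build an ffmpeg command converting a Telegram voice note
    (Opus in .oga) into 16 kHz mono WAV for transcription."""
    return [
        "ffmpeg",
        "-y",            # overwrite the output file if it exists
        "-i", src,       # input file
        "-ar", "16000",  # resample to 16 kHz
        "-ac", "1",      # downmix to mono
        dst,
    ]

def convert(src: str, dst: str) -> None:
    # check=True raises CalledProcessError on a non-zero exit code,
    # so a corrupt voice note fails loudly instead of silently
    subprocess.run(ffmpeg_convert_cmd(src, dst), check=True)
```

Separating command construction from execution keeps the flag choices unit-testable without invoking ffmpeg.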
Deployment:
launchd for daemon management on macOS
Systemd-style service definitions
Automatic restart on failure
The Embedding Pipeline: Details That Matter
Most tutorials skip the parts that actually matter. Here is what I learned:
Chunking Strategy:
I chunk documents at approximately 500 tokens with 50-token overlap. Too small and you lose context. Too large and retrieval becomes imprecise. This took experimentation.
from typing import List

def chunk_document(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    words = text.split()
    chunks = []
    start = 0
    while start < len(words):
        end = start + chunk_size
        chunk = ' '.join(words[start:end])
        chunks.append(chunk)
        start = end - overlap  # step back by the overlap so chunks share context
    return chunks
Metadata is Everything:
Raw text embeddings are not enough. Every chunk carries metadata:
metadata = {
    "source": "gmail",
    "date": "2025-01-15",
    "from": "collaborator@university.edu",
    "to": "mc@manuelcorpas.com",
    "subject": "Re: Grant proposal",
    "direction": "received",
    "thread_id": "abc123"
}
This lets me filter queries: “What did I discuss with [person] about [topic] in [timeframe]?” Without metadata, this is impossible.
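In ChromaDB those filters are expressed as a `where` dict built from operators like `$eq`, `$gte`, and `$and`. A sketch of how such a filter might be composed, assuming each chunk also carries a numeric `date_ts` field (epoch seconds), since ChromaDB's range operators expect numbers rather than date strings:

```python
from typing import Optional

def build_where(sender: Optional[str] = None,
                after_ts: Optional[int] = None,
                before_ts: Optional[int] = None) -> Optional[dict]:
    """Compose a ChromaDB `where` filter from optional metadata constraints.
    Returns None when no constraints are given (ChromaDB treats that as
    an unfiltered query)."""
    clauses = []
    if sender:
        clauses.append({"from": {"$eq": sender}})
    if after_ts is not None:
        clauses.append({"date_ts": {"$gte": after_ts}})
    if before_ts is not None:
        clauses.append({"date_ts": {"$lte": before_ts}})
    if not clauses:
        return None
    if len(clauses) == 1:
        return clauses[0]  # ChromaDB rejects $and with a single clause
    return {"$and": clauses}
```

The result is passed straight through, e.g. `collection.query(query_texts=[question], n_results=10, where=build_where(sender=..., after_ts=...))`.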
Voyage AI vs OpenAI Embeddings:
I tested both. Voyage AI’s voyage-3 model consistently returned more semantically relevant results for my domain (genomics, academic correspondence). The difference was noticeable, not marginal.
Building the Agent: What Actually Works
The agent architecture went through three iterations before I got it right.
Iteration 1: Stateless chatbot
Failed. No memory between conversations. Every interaction started from zero. Useless for anything beyond simple Q&A.
Iteration 2: RAG-only system
Better. Retrieved context from the knowledge base. But could not act on anything. Could not schedule meetings, create files, or execute multi-step tasks.
Iteration 3: Agentic system with tool use
This is what works. The key insight: give the model a set of well-defined tools and let it decide which to use.
Tool Definition Example:
{
    "name": "calendar",
    "description": "Manage Google Calendar and Outlook events. List, create, update, delete.",
    "parameters": {
        "type": "object",
        "properties": {
            "action": {
                "type": "string",
                "enum": ["list", "create", "update", "delete"]
            },
            "summary": {"type": "string"},
            "start": {"type": "string", "description": "ISO 8601 datetime"},
            "end": {"type": "string", "description": "ISO 8601 datetime"},
            "attendees": {"type": "array", "items": {"type": "string"}}
        },
        "required": ["action"]
    }
}
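On the execution side, each tool name needs to map to a handler, and a malformed call from the model should return an error string rather than crash the loop. A sketch of that dispatch layer; the handler here is a placeholder, not the real calendar implementation:

```python
import json

def calendar_tool(action, **kwargs):
    # placeholder: the real handler calls the Google/Outlook calendar APIs
    return json.dumps({"status": "ok", "action": action})

TOOL_HANDLERS = {"calendar": calendar_tool}

def execute_tool(name: str, arguments: dict) -> str:
    """Dispatch one tool call. Unknown tools and bad arguments come back
    as JSON error strings so the model can see what went wrong and retry."""
    handler = TOOL_HANDLERS.get(name)
    if handler is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        return handler(**arguments)
    except TypeError as exc:  # model supplied missing or unexpected arguments
        return json.dumps({"error": str(exc)})
```

Feeding errors back as tool results, instead of raising, is what lets a 5-iteration loop self-correct.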
The Tool Loop:
from typing import List

async def agent_loop(user_message: str, history: List[dict], max_iterations: int = 5):
    messages = history + [{"role": "user", "content": user_message}]
    for _ in range(max_iterations):
        response = await claude_api_call(messages, tools=TOOL_DEFINITIONS)
        if response.stop_reason == "end_turn":
            return response.content  # done, no more tools needed
        if response.stop_reason == "tool_use":
            tool_results = await execute_tools(response.tool_calls)
            # feed the assistant's tool calls and their results back in
            messages.append({"role": "assistant", "content": response.content})
            messages.append({"role": "user", "content": tool_results})
            continue
    return "Maximum iterations reached"
The model decides when it has enough information to respond. This is crucial. Forcing a fixed number of tool calls leads to either wasted computation or incomplete answers.
Memory Architecture: The Hard Problem
Conversation history is trivial. Persistent memory across sessions is hard.
What I Store:
Episodic memory: Recent conversations (last 50 messages per chat)
Semantic memory: Embedded knowledge base (searchable by meaning)
Procedural memory: Learned patterns about how I work (encoded in system prompts)
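The episodic layer is the simplest of the three: a rolling window per chat. A sketch of that structure, assuming messages are the usual `{"role": ..., "content": ...}` dicts:

```python
from collections import defaultdict, deque
from typing import Dict, List

class EpisodicMemory:
    """Keep only the most recent messages per chat. A bounded deque
    silently discards the oldest message once the cap is reached."""

    def __init__(self, max_messages: int = 50):
        self.histories = defaultdict(lambda: deque(maxlen=max_messages))

    def add(self, chat_id: str, message: Dict) -> None:
        self.histories[chat_id].append(message)

    def get(self, chat_id: str) -> List[Dict]:
        # return a plain list so callers can concatenate it into a prompt
        return list(self.histories[chat_id])
```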
The System Prompt Problem:
The system prompt defines the agent’s behaviour. Mine is 8,000 tokens. It includes:
Communication style guidelines
Domain expertise boundaries
Security rules (what never to share)
Decision frameworks for common scenarios
Examples of good and bad responses
This is not elegant. It is necessary. Without extensive system prompting, the agent makes basic errors repeatedly.
Context Window Management:
With 200K tokens available, context management seems trivial. It is not. Stuffing everything into context degrades response quality. The model gets confused.
My approach: retrieve the 10 most relevant chunks from the knowledge base, prepend them to the conversation, and let the model synthesise. Quality over quantity.
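Once retrieval is done, the rest is formatting. A sketch of how the top chunks might be prepended to the conversation, assuming each retrieved chunk arrives as a dict with `text` and `metadata` keys (the exact record shape is an assumption):

```python
from typing import Dict, List

def assemble_context(chunks: List[Dict], max_chunks: int = 10) -> str:
    """Format the top retrieved chunks into a context block that is
    prepended to the conversation before the model call."""
    lines = ["Relevant context from the knowledge base:"]
    for chunk in chunks[:max_chunks]:
        meta = chunk.get("metadata", {})
        source = meta.get("source", "unknown")
        date = meta.get("date", "undated")
        # labelling each chunk with its provenance lets the model cite it
        lines.append(f"[{source}, {date}] {chunk['text']}")
    return "\n".join(lines)
```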
What I Got Wrong
Mistake 1: Overcomplicating the Architecture
I initially built an elaborate multi-agent system with specialised agents for different tasks. It was slow, buggy, and added no value. A single capable agent with good tools outperforms a committee of weak agents.
Mistake 2: Underestimating Latency
API calls take time. Tool execution takes time. A five-step agent loop can take 30 seconds. Users expect faster responses. I added streaming responses and progress indicators. Still not perfect.
Mistake 3: Ignoring Edge Cases
Voice memos with background noise. PDFs with malformed text. Calendar events in ambiguous timezones. Each edge case required specific handling. The 80/20 rule does not apply here. The last 20% of edge cases took 80% of the debugging time.
Mistake 4: Building Before Understanding
I should have spent more time studying existing agent frameworks before building my own. Not to use them — they are mostly too heavy — but to learn from their design decisions.
The Economics
This matters and nobody discusses it.
Monthly Costs:
Claude API: ~$50-100 (varies with usage)
Voyage AI embeddings: ~$10
OpenAI Whisper: ~$5
Total: ~$65-115/month
Time Investment:
Initial build: ~80 hours over 6 weeks
Ongoing maintenance: ~5 hours/week
ROI Calculation:
The system saves me approximately 2 hours per day on email triage, meeting preparation, and information retrieval. At an academic’s opportunity cost, this is significant.
More importantly: the system enables things I could not do before. Querying my entire professional history by meaning. Having an agent that knows my calendar, my projects, my communication patterns. This is not efficiency. It is capability expansion.
What Comes Next
The system as described is version 1.0. Here is what I am building next:
Entity Extraction and Knowledge Graphs:
Moving beyond flat embeddings to structured entity relationships. Who are my collaborators? What projects connect them? Which opportunities have timelines attached?
-- Target schema
entities: id, type, name, aliases, first_seen, last_seen, importance_score
mentions: entity_id, source_id, context, timestamp, sentiment
relationships: entity1_id, entity2_id, relationship_type, strength, evidence
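A first pass at populating the mentions table, before reaching for proper NER, can be as crude as scanning text for known aliases. A sketch; the alias table and entity IDs here are hypothetical:

```python
import re
from typing import Dict, List, Tuple

def extract_mentions(text: str, aliases: Dict[str, str]) -> List[Tuple[str, str]]:
    """Scan text for known aliases and return (entity_id, matched_alias)
    pairs. Word boundaries stop 'ATI' matching inside other words."""
    mentions = []
    for alias, entity_id in aliases.items():
        pattern = rf"\b{re.escape(alias)}\b"
        for match in re.finditer(pattern, text, re.IGNORECASE):
            mentions.append((entity_id, match.group(0)))
    return mentions
```

Mapping several aliases to one entity ID is what makes later aggregation (first_seen, importance_score) possible.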
Proactive Intelligence:
The system currently responds to queries. It should surface insights without being asked. “You have not contacted [key collaborator] in 60 days.” “This arXiv paper is relevant to your grant application.” “Your calendar next week is overloaded — consider rescheduling.”
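The first of those checks is easy to prototype once last-contact timestamps live in the metadata. A sketch of the kind of test a proactive daily job would run (the data shape is an assumption):

```python
from datetime import datetime, timedelta
from typing import Dict, List

def stale_contacts(last_contact: Dict[str, datetime],
                   now: datetime,
                   threshold_days: int = 60) -> List[str]:
    """Return contacts not messaged within the threshold, sorted by name.
    A daily job would run this over the knowledge base and push alerts."""
    cutoff = now - timedelta(days=threshold_days)
    return sorted(name for name, when in last_contact.items() if when < cutoff)
```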
Multi-Modal Integration:
Voice is partially working. Vision is partially working. Full multi-modal reasoning — understanding a photo of a whiteboard, a voice memo, and a calendar screenshot in a single query — is the next frontier.
Local Language Models:
I am running all inference through APIs. With an M3 Ultra, I should be running smaller models locally for routine tasks. Llama, Mistral, and similar open models are approaching the capability threshold for many agent tasks.
Why This Matters Beyond My Desk
I am an academic. I built this for my own productivity. But the implications extend further.
The Big Labs Will Not Solve Your Problem:
OpenAI, Anthropic, and Google are building for millions of users. They cannot build a system that knows your email history, your calendar, your research interests, your communication patterns. That system must be built by you, for you.
Sovereignty Over Your Own AI:
Every query I make stays on my machine or goes to APIs I control. My knowledge base is mine. My agent’s behaviour is defined by my prompts. In a world of corporate AI products, this matters.
The Frontier Is Accessible:
You do not need billions in compute to build agentic AI. You need clear thinking about what you want the system to do, willingness to work through implementation details, and enough technical skill to write Python and manage APIs.
The barrier is not resources. It is deciding to build.
Practical Starting Points
If you want to build something similar, here is where to start:
Pick one data source and embed it. Your email archive is a good choice. Use ChromaDB locally. Use Voyage AI or OpenAI for embeddings.
Build a simple retrieval interface. Query your embedded data with natural language. Get this working before adding complexity.
Add one tool. Calendar management is a good choice. Well-defined actions, clear success criteria.
Iterate on the system prompt. This is where most of the tuning happens. Be specific about what you want.
Run it for a month before adding features. You will discover what matters through use.
Final Thoughts
Elon Musk says we are inside the singularity. Perhaps. What I know is this: the tools to build genuinely useful AI systems are available now, to individuals, without permission from anyone.
The question is not whether to engage with this technology. It is whether to be a consumer of products built for average use cases, or a builder of systems designed for your specific needs.
I chose to build. The system is not perfect. But it is mine, it is improving, and it is already changing how I work.
The frontier is open. The only question is whether you walk through.
Manuel Corpas is a computational genomicist, Senior Lecturer at the University of Westminster, and Fellow at the Alan Turing Institute. He writes about AI, genomics, and the intersection of both at manuelcorpas.com.


