Persona-Based AI Agents: An Architecture for Extensible Autonomous Systems

ai-agents · architecture · systems-design

Most AI agent architectures are monolithic. The system prompt encodes domain knowledge, tool definitions are hardcoded, and adding a new capability means modifying the core agent. This works fine for single-purpose agents, but it falls apart the moment you need the same infrastructure to serve multiple domains like trading analysis, job searching, and prediction markets, each with different workflows, authentication contexts, and operational cadences.

This post describes the architecture behind x-lens, a personal AI agent I built that solves this through a persona-based skill system. The key design constraint: adding a new domain should require zero code changes. Just a directory and some markdown files.

The Problem

The initial idea was simple: build a trading agent. It would analyze open interest, screen for breakout patterns, monitor market regimes, and send notifications. The system prompt was around 2000 tokens of trading-specific instructions. The browser stayed logged into my brokerage. Sessions persisted so the agent remembered yesterday's analysis. A single-purpose tool that did one thing well.

Then I thought: why not use the same infrastructure to search LinkedIn for hiring posts? That's when things broke. The trading agent's browser profile had brokerage cookies, but LinkedIn needed its own cookies, and sharing a profile meant auth state collisions. The trading instructions in the system prompt were irrelevant for job searching, and a combined prompt would confuse the model while wasting context window. The agent's conversation history was full of market analysis, so resuming that session to search LinkedIn made no sense. And the scheduling was completely different: trading jobs ran on market hours while LinkedIn scans needed their own cadence.

The naive solution of adding if/else branches in the system prompt, maintaining separate browser launch configs, and managing sessions manually doesn't scale. Every new domain adds complexity to the core.

The Design

The solution separates the agent into three layers with clear boundaries.


Tools are the agent's fixed capabilities, the same for every persona: navigate a browser, run shell commands, make HTTP requests, read and write persistent memory, manage scheduled jobs.

Skills are domain-specific workflow instructions that tell the agent how to use its tools for a particular task. They specify which URLs to visit, which DOM selectors to use, how to score results, and what to do when something fails. Skills live in two tiers. Global skills sit in skills/global/ and are loaded for every persona. Persona-specific skills sit in skills/<persona>/ and are only loaded when that persona is active. At startup, the skill loader walks both directories and registers everything it finds:

import { join } from "node:path";

export function loadSkills(projectRoot: string, persona?: string): Skill[] {
  const skillMap = new Map<string, Skill>();

  // Global skills load for every persona.
  const globalDir = join(projectRoot, "skills", "global");
  for (const skill of loadSkillsFromDir(globalDir)) {
    skillMap.set(skill.name, skill);
  }

  // Persona skills load second, so a name collision overrides the global.
  if (persona) {
    const personaDir = join(projectRoot, "skills", persona);
    for (const skill of loadSkillsFromDir(personaDir)) {
      skillMap.set(skill.name, skill);
    }
  }

  return Array.from(skillMap.values());
}

Persona-specific skills override globals on name collision, which means you can customize shared behavior per domain without touching the global skill.
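The post doesn't show `loadSkillsFromDir` itself. A minimal sketch, assuming each skill lives at `<dir>/<skill-name>/skill.md` with YAML frontmatter (the `Skill` fields and the hand-rolled frontmatter parser here are illustrative assumptions, not the x-lens implementation):

```typescript
import { existsSync, readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

interface Skill {
  name: string;
  description: string;
  path: string; // directory containing skill.md and references/
}

// Minimal frontmatter parser: extracts `key: value` pairs between --- fences.
function parseFrontmatter(raw: string): Record<string, string> {
  const match = raw.match(/^---\n([\s\S]*?)\n---/);
  const meta: Record<string, string> = {};
  if (!match) return meta;
  for (const line of match[1].split("\n")) {
    const idx = line.indexOf(":");
    if (idx > 0) meta[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return meta;
}

export function loadSkillsFromDir(dir: string): Skill[] {
  if (!existsSync(dir)) return []; // a missing persona directory is not an error
  const skills: Skill[] = [];
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    if (!entry.isDirectory()) continue;
    const skillFile = join(dir, entry.name, "skill.md");
    if (!existsSync(skillFile)) continue;
    const meta = parseFrontmatter(readFileSync(skillFile, "utf8"));
    skills.push({
      name: meta.name ?? entry.name,
      description: meta.description ?? "",
      path: join(dir, entry.name),
    });
  }
  return skills;
}
```

Only the frontmatter is read at this stage; the markdown body stays on disk until a skill is invoked.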

Personas are the top-level isolation boundary. Each persona determines which skills load, which browser profile provides cookies, which session file to resume, and what system prompt shapes the agent's identity.

The critical property is that layers only depend downward. Personas depend on skills. Skills depend on tools. Tools depend on nothing. You can add a persona without touching skills or tools. You can add a skill without touching tools.

Skills as Domain Knowledge

The most important design decision was making skills markdown files that the LLM reads as instructions, not code that gets executed. This is counterintuitive. Why not use structured configs or Python scripts? Three reasons.

LLMs are instruction followers, not API consumers

A skill is a set of steps the agent follows, with decision points and error recovery. This maps directly to what LLMs are good at: reading natural language instructions and executing them using available tools. A structured format like JSON schema or YAML config would need a custom runtime to interpret it. Markdown needs nothing because the LLM is the runtime.

Each skill file has YAML frontmatter for metadata (name, description, trigger keywords) and a markdown body containing the actual instructions. The frontmatter is parsed at startup so the system knows which skill to load for a given user request. The body is only loaded when the skill is actually invoked, keeping the base context compact.
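As a concrete example (illustrative, not copied from the x-lens repo), a skill file might look like:

```markdown
---
name: linkedin-search
description: Search LinkedIn for posts matching the user's keywords
triggers: linkedin, hiring, job search
---

1. Load the linkedin-login skill via skill_read if not already authenticated.
2. Navigate to the LinkedIn search URL for each keyword.
   Expected: search results page with a visible list of posts
3. Run JavaScript via browser_evaluate to extract posts,
   see references/extraction-selectors.md for the script.
```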

Progressive disclosure manages context

A skill has two layers: the main skill.md with high-level steps, and a references/ directory with detailed supporting documents. The agent reads the skill file first and only pulls in reference docs when it needs them.

For example, the LinkedIn search skill's main file says "Run JavaScript via browser_evaluate to extract posts, see references/extraction-selectors.md for the script." The agent loads that reference file only when it reaches the extraction step. A skill like linkedin-search has an 80-line skill.md at the top level and a references/ directory containing a 160-line extraction selectors file. This mirrors how human engineers work: you read the architecture doc first, then dive into the implementation details when you need them.
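On disk, that layout (reconstructed from the description above; the persona directory name is an assumption) looks like:

```
skills/jobfinder/linkedin-search/
├── skill.md                      # ~80 lines of high-level steps
└── references/
    └── extraction-selectors.md   # ~160 lines of extraction script detail
```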

Expected outputs close the feedback loop

The biggest reliability improvement came from adding expected output declarations after each step:

  1. Navigate to https://www.linkedin.com/feed/
  2. Take a screenshot
     Expected: LinkedIn feed with posts, navigation bar showing "Home", "My Network"
     If login page shown: load the linkedin-login skill

Without these, the agent optimistically assumes every action succeeded. With them, it screenshots the result, compares against the expected state, and follows the recovery path when things don't match. This act-verify-recover pattern turns a fragile script into a resilient workflow.

Persona Isolation

Each persona operates in a fully isolated context across five dimensions.


Browser profiles are the most important isolation boundary. When the daemon spins up a persona, it constructs a dedicated Chromium profile path:

const profileDir = join(homedir(), ".x-lens", "browser-data", persona);
const browser = new BrowserController({ headless: true, profileDir });

The trader persona's profile has brokerage cookies. The job finder persona's profile has LinkedIn cookies. They never interfere. You only need to handle interactive authentication (2FA, CAPTCHAs) once per persona because subsequent headless runs reuse the saved session.

Conversation history is stored per persona at ~/.x-lens/sessions/<persona>.jsonl. When the daemon restarts, each persona resumes its own context. The trader agent remembers yesterday's market regime classification. The job finder agent remembers which searches it already ran.

System prompts are per persona. The trader gets instructions about market regimes, risk management, and options analysis. The job finder gets instructions about LinkedIn navigation and post relevance scoring. Each prompt is focused and token efficient because it only contains what that persona needs.

Scheduled jobs auto-tag with the current persona. When the agent calls schedule_create, the system fills in the persona from the runtime context so jobs are always dispatched to the correct agent instance:

const input = {
  ...params,
  persona: params.persona || persona || "trader",
} as CreateJobInput;

This isolation model means personas are truly independent. You can run the trader daemon on market hours and the job finder daemon on a completely different schedule without any coordination between them.
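The persona-stamping step is easy to factor out and test in isolation. A sketch, where the `CreateJobInput` fields other than `persona` are assumptions about the job schema:

```typescript
interface CreateJobInput {
  name: string;
  schedule: string; // cron expression
  prompt: string;   // what the agent should do when the job fires
  persona: string;
}

// Stamp the active persona onto a new job unless the caller set one
// explicitly; "trader" is the fallback default persona.
function withPersona(
  params: Partial<CreateJobInput>,
  activePersona?: string
): CreateJobInput {
  return {
    ...params,
    persona: params.persona || activePersona || "trader",
  } as CreateJobInput;
}
```

Because the runtime fills in `persona`, the model never has to remember which agent instance it is running as when it schedules work.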

Skill Orchestration

Skills can call other skills, which creates composable workflows without tight coupling.


Each skill in the chain is independently useful. The post-extraction skill works on any LinkedIn page. The post-ranking skill works on any set of posts regardless of how they were collected. The linkedin-login skill can be invoked standalone. The orchestrating skill (linkedin-search) just chains them together.

The orchestration mechanism is simple. The skill body literally says:

Load the post-ranking skill via skill_read to score and save results.

The agent reads the next skill file and follows its instructions. No routing logic, no message passing, no framework. The LLM's instruction-following capability is the orchestration engine.
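The `skill_read` tool itself needs almost no logic. A sketch, assuming the loader keeps registered skills in a map keyed by name (the error-message shape is an assumption):

```typescript
import { readFileSync } from "node:fs";
import { join } from "node:path";

interface Skill {
  name: string;
  path: string; // directory containing skill.md
}

// Tool handler: return the full markdown body of a registered skill so the
// LLM can follow its instructions. Unknown names return a message the model
// can recover from rather than throwing.
function skillRead(skills: Map<string, Skill>, name: string): string {
  const skill = skills.get(name);
  if (!skill) {
    const known = Array.from(skills.keys()).join(", ");
    return `Unknown skill "${name}". Available skills: ${known}`;
  }
  return readFileSync(join(skill.path, "skill.md"), "utf8");
}
```

Everything interesting happens after the return value lands in the model's context: the instructions become the next steps of the conversation.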

The Extension Model

Adding a new persona follows a predictable pattern:

Step 1: mkdir skills/<persona>/<skill-name>

Step 2: Write skill.md with frontmatter (name, description, triggers),
        instructions with expected outputs, and references for
        implementation details

Step 3: x-lens --persona <persona>

No code changes. No config files. No deployment. The skill loader picks up everything from the filesystem at startup. Since global skills are loaded for every persona, a new persona automatically has access to all of them. The only skills you need to write are the ones unique to its domain.

Design Tradeoffs

This architecture makes explicit tradeoffs worth naming.

Flexibility over type safety. Skills are unstructured markdown. There's no schema validation, no compile-time checks, no IDE autocomplete. A skill with a typo in a CSS selector will fail at runtime. The tradeoff is worth it because skills change frequently (LinkedIn changes their DOM regularly) and the iteration cycle of editing markdown and rerunning is faster than editing code, compiling, and deploying.

Convention over configuration. The system assumes skills/<persona>/<skill-name>/skill.md. No config file maps personas to skill directories. No registry declares available skills. This is simpler but means you can't share a skill between two specific personas without putting it in global/. In practice this hasn't been a limitation because skills tend to be either universal or domain-specific.

LLM as runtime over deterministic execution. The agent interprets skill instructions, which means execution is probabilistic. The same skill might take slightly different paths on different runs. Expected output declarations mitigate this by adding verification checkpoints, but it's fundamentally different from a deterministic script. The upside is adaptability: the agent can handle situations the skill author didn't anticipate. The downside is occasional drift that needs to be debugged through conversation logs.

Isolation over sharing. Personas don't share browser state, sessions, or context. This prevents interference but means you can't easily build cross-persona workflows like "find a job posting on LinkedIn, then email me about it" spanning the job finder and a hypothetical email persona. In practice this is handled by having global skills that any persona can use, or by running sequential commands with different personas.

Operational Patterns

Two patterns emerged from running this system in production.

Self-healing selectors. LinkedIn changes their DOM class names frequently, breaking extraction scripts. The skill includes a troubleshooting section that tells the agent: if extraction returns zero posts but the screenshot shows posts are visible, inspect the DOM, find the new selectors, update the reference file, and save working selectors to memory. The agent maintains its own playbooks.

Agent-managed scheduling. Instead of hardcoding cron jobs, the agent has tools to create, delete, and list its own schedules. This lets it adapt: increase monitoring frequency when it detects interesting activity, or pause a scan that's been returning empty results. The schedule is an output of the agent's reasoning, not an input to it.

Why This Matters

The persona-based architecture is not specific to x-lens. It's a general pattern for building extensible AI agents.

The first principle is separating identity from capability from implementation. Persona defines who. Skills define what. Tools define how. The second is making domain knowledge data, not code: skills are files on disk, not functions in a codebase. The third is isolating execution contexts so that auth state, conversation history, and scheduling are all per-domain. The fourth is letting the LLM be the runtime instead of building a custom interpreter for skill files.

The result is a system where the cost of adding a new domain is writing markdown, not writing code. Every new persona is a directory. Every new workflow is a file. The core agent doesn't know or care about any specific domain. It just follows instructions and uses tools. That's the right separation of concerns for systems where the domain logic changes faster than the infrastructure.

If any of this resonates or you want to dig deeper into the implementation details, reach out. I'm always up for a conversation about agent architectures and what's actually working in production.