I built a Senior Editor agent into my portfolio stack to handle metadata, SEO, and publishing. Here is the architecture, the model benchmarking, and the real cost per post after six months in production.

Six months ago, I got tired of the friction between writing and publishing. I would draft a post in one tool, move it to a CMS, write SEO metadata in another tab, generate a cover image somewhere else, and manually schedule social posts. The overhead of publishing often killed the motivation to write.
So I built a "Senior Editor" agent. It lives inside my portfolio stack, fetches drafts from my database, generates metadata, optimizes for SEO, validates its own work, and publishes. It does not replace me. It removes the parts I hate.
This is how it works.
I run a Laravel 12 API on the backend and a Next.js 15 frontend. The agent is not a separate service. It is a pipeline of queued jobs inside Laravel, triggered by a Filament admin panel or a scheduled cron. AI calls go through OpenRouter, which lets me benchmark and swap models without touching client code.
```mermaid
graph LR
    A[Filament Admin / Cron] --> B[Laravel 12 API]
    B --> C[OpenRouter AI]
    B --> D[Database]
    D --> E[Next.js 15 Frontend]
    C --> F[Cost Tracker]
    B --> G[Queue Workers]
```

The frontend is static where possible. The API handles auth, content CRUD, and the agent pipeline. Everything else is pre-rendered or cached at the edge.
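To make the model-swapping point concrete, here is a minimal sketch of a request builder for OpenRouter's OpenAI-compatible chat completions endpoint. It is written in TypeScript to match the prompt code later in this post. The function name, the `ChatRequest` shape, and the placeholder API key are assumptions for illustration, not the code that runs in production. It only builds the request and never sends it.

```typescript
// Sketch only: builds (but does not send) a chat completion request.
// The model is a plain string field, so swapping models is a one-argument change.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

interface ChatRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

const OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions";

// `apiKey` is a placeholder supplied by the caller (hypothetical helper).
function buildChatRequest(
  apiKey: string,
  model: string,
  chatMessages: ChatMessage[],
): ChatRequest {
  return {
    url: OPENROUTER_URL,
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model, messages: chatMessages }),
  };
}

// Same messages, two different models: only the `model` argument changes.
const editorMessages: ChatMessage[] = [
  { role: "system", content: "You are a senior technical editor." },
  { role: "user", content: "Suggest tags for this draft: ..." },
];
const cheapRequest = buildChatRequest("sk-placeholder", "openai/gpt-4o-mini", editorMessages);
const strongRequest = buildChatRequest("sk-placeholder", "anthropic/claude-3.5-sonnet", editorMessages);
```

Because the endpoint and payload shape stay the same, benchmarking a new model amounts to passing a different model ID.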
The pipeline is five steps. Each step is a Laravel job that can fail independently, retry, or escalate to me.
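As a rough mental model of those five steps, the sketch below treats the pipeline as an ordered list with three possible outcomes per step. Only `fetch_draft` is a step name taken from the real jobs. The other step names, the `StepOutcome` type, and `nextStep` are hypothetical labels for illustration. The real pipeline chains Laravel jobs rather than calling a function like this.

```typescript
// Step names other than "fetch_draft" are hypothetical labels for this sketch.
const PIPELINE_STEPS = [
  "fetch_draft",
  "generate_metadata",
  "optimize_seo",
  "validate",
  "publish",
] as const;

type PipelineStep = (typeof PIPELINE_STEPS)[number];
type StepOutcome = "ok" | "retry" | "escalate";

// Decide what runs after a step reports its outcome.
// Returns the next step, the same step (retry), or null when the pipeline stops.
function nextStep(current: PipelineStep, outcome: StepOutcome): PipelineStep | null {
  if (outcome === "retry") return current;
  if (outcome === "escalate") return null; // hand control back to the human
  const following = PIPELINE_STEPS[PIPELINE_STEPS.indexOf(current) + 1];
  return following ?? null; // null after the last step
}
```

For example, `nextStep("fetch_draft", "ok")` yields `"generate_metadata"`, while any `"escalate"` outcome stops the run.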
The pipeline starts when a post is marked ready_for_review. The first job pulls the raw markdown, word count, and any manual tags the author added.
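```php
use App\Models\Post;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Queue\Queueable;

class FetchDraftJob implements ShouldQueue
{
    use Queueable;

    // The post is passed in at dispatch time and serialized with the job.
    public function __construct(public Post $post) {}

    public function handle(): void
    {
        // Record the run so later steps and the admin panel can trace it.
        $this->post->agentRun()->create([
            'status' => 'running',
            'step' => 'fetch_draft',
            'payload' => [
                'word_count' => str_word_count(strip_tags($this->post->content)),
                'has_manual_tags' => $this->post->tags()->exists(),
            ],
        ]);

        // Hand off to the next step in the pipeline.
        GenerateMetadataJob::dispatch($this->post);
    }
}
```

The second step, metadata generation, is where the first AI call happens. I send the raw markdown to a fast model with a structured output schema. The model returns a JSON object with a proposed title, excerpt, reading time estimate, and topic tags.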
I do not ask the model to write the content. The human already did that. I ask it to understand the content and label it.
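The sketch below shows one way to parse that JSON reply and fail fast when the shape is wrong. The field names (`title`, `excerpt`, `reading_time_minutes`, `tags`) and the function name are assumptions. The post only says which pieces of information the model returns, not how they are keyed.

```typescript
// Field names are assumptions; the post lists the contents, not the keys.
interface DraftMetadata {
  title: string;
  excerpt: string;
  reading_time_minutes: number;
  tags: string[];
}

// Parse the model's raw reply and throw as soon as anything is off.
function parseDraftMetadata(raw: string): DraftMetadata {
  const data: unknown = JSON.parse(raw); // throws on malformed JSON
  if (typeof data !== "object" || data === null) {
    throw new Error("Expected a JSON object");
  }
  const { title, excerpt, reading_time_minutes, tags } = data as Record<string, unknown>;
  if (typeof title !== "string" || title.length === 0) throw new Error("Missing title");
  if (typeof excerpt !== "string") throw new Error("Missing excerpt");
  if (typeof reading_time_minutes !== "number" || reading_time_minutes <= 0) {
    throw new Error("Invalid reading_time_minutes");
  }
  if (!Array.isArray(tags) || !tags.every((tag): tag is string => typeof tag === "string")) {
    throw new Error("Invalid tags");
  }
  return { title, excerpt, reading_time_minutes, tags };
}
```

Throwing on the first problem keeps the failure close to its cause, which makes a failed run easier to diagnose than a half-filled record.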
SEO gets its own step because it requires a different model and a different prompt. I have learned that cheap models write decent summaries but terrible SEO titles. So I route this task to a stronger model.
The SEO job generates the fields defined in SeoMetadataSchema (meta title, meta description, primary keyword, and secondary keywords), along with structured data (BlogPosting).

The agent validates its own output before it touches the live database. It checks character limits, ensures the primary keyword appears in the first 100 words, and verifies that the generated tags actually exist in the taxonomy. If validation fails, the job retries with a stricter prompt. After three failures, it stops and notifies me.
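The three checks described above can be sketched as a single function that returns a list of problems. The 60 and 160 character limits come from the Zod schema later in this post. The `SeoCandidate` shape, the function name, and the choice to split the text on whitespace to count "words" are assumptions for illustration.

```typescript
interface SeoCandidate {
  meta_title: string;
  meta_description: string;
  primary_keyword: string;
  tags: string[];
}

// Limits mirror the 60/160 character caps used in the Zod schema.
const MAX_TITLE_LENGTH = 60;
const MAX_DESCRIPTION_LENGTH = 160;

// Returns a list of problems; an empty list means the candidate passes.
function validateSeo(
  candidate: SeoCandidate,
  postText: string,
  taxonomy: ReadonlySet<string>,
): string[] {
  const problems: string[] = [];
  if (candidate.meta_title.length > MAX_TITLE_LENGTH) {
    problems.push("meta_title too long");
  }
  if (candidate.meta_description.length > MAX_DESCRIPTION_LENGTH) {
    problems.push("meta_description too long");
  }

  // Check the keyword against the first 100 whitespace-separated words only.
  const opening = postText.trim().split(/\s+/).slice(0, 100).join(" ").toLowerCase();
  if (!opening.includes(candidate.primary_keyword.toLowerCase())) {
    problems.push("primary keyword missing from first 100 words");
  }

  // Every generated tag must already exist in the taxonomy.
  for (const tag of candidate.tags) {
    if (!taxonomy.has(tag)) problems.push(`unknown tag: ${tag}`);
  }
  return problems;
}
```

Returning every problem at once, rather than stopping at the first, gives the retry step something specific to put in the stricter prompt.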
If validation passes, the agent writes the metadata back to the post record and transitions the status to published or scheduled, depending on the author's preference. A webhook then invalidates the Next.js cache and rebuilds the static page.
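The status transition can be sketched as a small pure function. The three status names come from this post. The `publishAt` parameter is a hypothetical stand-in for "the author's preference", on the assumption that a future publish date means the post should be scheduled rather than published immediately.

```typescript
type PostStatus = "ready_for_review" | "scheduled" | "published";

// `publishAt` is a hypothetical field representing the author's preference.
function resolveStatus(
  validationPassed: boolean,
  publishAt: Date | null,
  now: Date,
): PostStatus {
  if (!validationPassed) return "ready_for_review"; // stays put until a human acts
  if (publishAt !== null && publishAt.getTime() > now.getTime()) return "scheduled";
  return "published";
}
```

Keeping this decision in one place means the cache-invalidation webhook only has to react to a single status change.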
Not every task needs GPT-4o. I benchmarked several models on real posts and tracked accuracy, latency, and cost.
| Task | Best Model | Avg Cost | Notes |
|---|---|---|---|
| Tag generation | Mistral 7B | $0.0003 | Fast, good taxonomy fit |
| Content summary | Claude 3.5 Haiku | $0.0012 | Concise, no fluff |
| SEO title | GPT-4o mini | $0.0021 | Respects char limits |
| Meta description | GPT-4o mini | $0.0018 | Consistent formatting |
| Full SEO pass | Claude 3.5 Sonnet | $0.0085 | Best when context matters |
I default to cheaper models and escalate to expensive ones only when validation fails. Over six months, the average cost per post is $0.014. The most expensive post ever cost $0.047 because I ran multiple SEO variants and A/B tested them manually.
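As a sanity check on those numbers, the sketch below sums the per-task averages from the table. The task keys and helper names are made up for this example, and the costs are stored as integer ten-thousandths of a dollar so the sums stay exact. Adding all five rows gives $0.0139, which lines up with the reported $0.014 average. That is only one way to read the figures, since the real mix of calls per post varies.

```typescript
// Average cost per task from the benchmark table, stored as integer
// ten-thousandths of a dollar so the sums stay exact.
interface TaskCost {
  task: string;
  units: number; // 1 unit = $0.0001
}

const TASK_COSTS: TaskCost[] = [
  { task: "tag_generation", units: 3 }, // $0.0003
  { task: "content_summary", units: 12 }, // $0.0012
  { task: "seo_title", units: 21 }, // $0.0021
  { task: "meta_description", units: 18 }, // $0.0018
  { task: "full_seo_pass", units: 85 }, // $0.0085
];

function sumUnits(costs: TaskCost[]): number {
  return costs.reduce((sum, cost) => sum + cost.units, 0);
}

function formatUsd(units: number): string {
  return "$" + (units / 10000).toFixed(4);
}

// One call of every kind, including the expensive full SEO pass.
const everyTask = formatUsd(sumUnits(TASK_COSTS)); // "$0.0139"

// The four cheaper tasks only, without the full SEO pass.
const cheapTasksOnly = formatUsd(
  sumUnits(TASK_COSTS.filter((cost) => cost.task !== "full_seo_pass")),
); // "$0.0054"
```

The gap between $0.0054 and $0.0139 shows how much of the per-post cost the full SEO pass accounts for.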
The difference between a useless agent and a useful one is not the model. It is the prompt.
I keep system prompts short and role-based. The model is not a "helpful assistant." It is a "senior technical editor who writes metadata for a software engineering blog."
```typescript
// Short, role-based system prompt shared by the metadata tasks.
const SYSTEM_PROMPT = `You are a senior technical editor.
You write metadata for software engineering blog posts.
Rules:
- Never use buzzwords.
- Never use passive voice.
- Respect character limits exactly.
- Output valid JSON only.`;
```

For SEO titles, I include two examples in every prompt: one good, one bad. This cuts formatting errors by about 60 percent.
```typescript
// One good and one bad example, appended to every SEO title prompt.
const FEW_SHOT_EXAMPLES = `
Example 1 (good):
Input: "A post about React server components"
Output: {"meta_title":"React Server Components: A Practical Guide"}
Example 2 (bad):
Input: "A post about React server components"
Output: {"meta_title":"Unlocking the Power of React Server Components"}
Reason: "Unlocking the Power of" is fluff. Do not do this.
`;
```

I use Zod schemas on the TypeScript side and validate AI output before it hits PHP. If the JSON is malformed or a field is missing, the job fails fast.
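```typescript
import { z } from "zod";

// Shape and limits every SEO response must satisfy.
const SeoMetadataSchema = z.object({
  meta_title: z.string().max(60), // at most 60 characters
  meta_description: z.string().max(160), // at most 160 characters
  primary_keyword: z.string().min(1), // must not be empty
  secondary_keywords: z.array(z.string()).max(5), // at most five entries
});
```

Every AI call is logged. I store the model name, token count, and cost in a separate table. This lets me query spend per post, per model, and per month.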
```php
use Illuminate\Database\Eloquent\Model;

// One row per AI call, linked to the agent run that triggered it.
class AiCall extends Model
{
    protected $fillable = [
        'agent_run_id',
        'model',
        'input_tokens',
        'output_tokens',
        'cost_usd',
        'task',
    ];
}
```

A Filament widget shows the last 30 days of spend. In March, I spent $4.20 on 312 agent runs. That is cheaper than a single hour of my time.
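To illustrate the kind of query this table supports, here is a small sketch that groups spend by model and computes an average cost per run. The `AiCallRow` interface mirrors the fillable columns above. The helper names are assumptions, and in practice this would be a database query rather than in-memory code.

```typescript
// Mirrors the fillable columns of the AiCall model.
interface AiCallRow {
  agent_run_id: number;
  model: string;
  input_tokens: number;
  output_tokens: number;
  cost_usd: number;
  task: string;
}

// Sum cost per model, the in-memory equivalent of a GROUP BY on `model`.
function spendByModel(rows: AiCallRow[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const row of rows) {
    totals.set(row.model, (totals.get(row.model) ?? 0) + row.cost_usd);
  }
  return totals;
}

// Average cost per run: total spend divided by the number of runs.
function averagePerRun(totalUsd: number, runs: number): number {
  return runs === 0 ? 0 : totalUsd / runs;
}

// The March figures from the text: $4.20 across 312 agent runs.
const marchAverage = averagePerRun(4.2, 312); // roughly $0.0135 per run
```

The March average of about $0.0135 per run is consistent with the $0.014 per-post figure reported earlier.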
The pipeline has three layers of defense.
Laravel's queue system handles transient failures. Each job has a backoff of [30, 120, 300] seconds. If OpenRouter returns a 502 or rate limit, the job waits and tries again.
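```php
use Illuminate\Contracts\Queue\ShouldQueue;

class GenerateMetadataJob implements ShouldQueue
{
    // Give up after three attempts.
    public $tries = 3;

    // Seconds to wait before each successive retry.
    public $backoff = [30, 120, 300];
}
```

If the primary model fails three times, the job swaps to a fallback. For SEO tasks, the fallback is always Claude 3.5 Sonnet. It is more expensive but more reliable.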
```php
// $attempts is the current attempt number, for example from $this->attempts() inside the job.
$model = match ($attempts) {
    1 => 'openai/gpt-4o-mini',
    2 => 'anthropic/claude-3.5-haiku',
    default => 'anthropic/claude-3.5-sonnet',
};
```

If validation fails after all retries, the agent stops and sends me a Filament notification. I can review the draft, edit the metadata manually, or re-run the pipeline with a different model. The post stays in ready_for_review until I act.
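The three layers (retry with backoff, model fallback, human escalation) can be sketched together as one decision function. The model IDs, the three-try limit, and the backoff values come from the snippets above. The function names and the `NextAction` type are assumptions, and in the real pipeline Laravel's queue applies the delays itself.

```typescript
const MAX_TRIES = 3;
const BACKOFF_SECONDS = [30, 120, 300];

// Model ladder from the match expression: cheaper first, Sonnet as the last resort.
function modelForAttempt(attempt: number): string {
  if (attempt === 1) return "openai/gpt-4o-mini";
  if (attempt === 2) return "anthropic/claude-3.5-haiku";
  return "anthropic/claude-3.5-sonnet";
}

// Delay before the retry that follows a failed attempt (1-based).
// If retries outnumber the entries, the last backoff value is reused.
function delayAfterFailure(failedAttempt: number): number {
  const index = Math.min(failedAttempt, BACKOFF_SECONDS.length) - 1;
  return BACKOFF_SECONDS[index] ?? 0;
}

type NextAction =
  | { kind: "retry"; model: string; waitSeconds: number }
  | { kind: "escalate" };

// Decide what happens after attempt number `failedAttempt` fails.
function afterFailure(failedAttempt: number): NextAction {
  if (failedAttempt >= MAX_TRIES) return { kind: "escalate" }; // notify the human
  return {
    kind: "retry",
    model: modelForAttempt(failedAttempt + 1),
    waitSeconds: delayAfterFailure(failedAttempt),
  };
}
```

One detail this makes visible: with three tries there are only two retries, so the 30 and 120 second delays are the ones that apply. The 300 second value would only come into play if the try limit were raised.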
The admin panel is Filament v3. I added a custom page that shows the agent run history for each post: which models ran, what they output, how much it cost, and where it failed.
There is also a bulk action. I can select ten drafts and queue them all for agent processing. This is useful when I batch-write posts on weekends and let the agent handle the metadata during the week.
Agents fail silently if you do not validate. Early on, I trusted the model to respect character limits. It did not. I lost two hours debugging why Facebook was truncating my OpenGraph titles. Now validation is mandatory.
Cheaper models are often better for narrow tasks. GPT-4o is overkill for tagging. Mistral 7B is faster, cheaper, and more predictable when the task is well-defined.
Cost tracking changes behavior. Once I saw the numbers, I stopped running the pipeline on every draft save. Now I batch the runs nightly. The agent is not real-time, and it does not need to be.
The human is still the bottleneck. The agent writes metadata. It does not write the post. The hard part is still thinking clearly enough to draft something worth publishing. Everything else is just plumbing.
The pipeline is open source. You can find the Laravel jobs, the TypeScript prompt builders, and the Filament resources in the repository. It is not a framework. It is a set of patterns that worked for my stack. Adapt them to yours.
If you are building something similar, start with one step. Fetch a draft and generate tags. Add SEO later. Add validation after that. The biggest mistake is trying to automate everything on day one. The second biggest mistake is never automating anything at all.

AI Engineer & Full-Stack Tech Lead
Expertise: 20+ years full-stack development. Specializing in architecting cognitive systems, RAG architectures, and scalable web platforms for the MENA region.



