Laravel AI Integration (Production Guide): Queues, Idempotency, Audit Logs
A production-minded Laravel guide to integrating AI safely: queue-first architecture, Controller→Service→Job code, retries/idempotency, observability, audit logs, cost control, and prompt-shaped FAQs.
TL;DR
AI features in Laravel work best when you treat the model like an unreliable-but-useful dependency: constrain inputs, constrain outputs, log what matters, and never let the model “decide” critical actions without guardrails. “Good” AI integrations are deterministic around the edges (validation, authorization, auditing, retries, timeouts) and probabilistic only in the middle (generation/classification). Use AI when the value comes from language understanding, fuzzy matching, summarization, extraction, or drafting—avoid it for money movement, security decisions, or anything requiring strict correctness. Adopt a reference architecture with a Controller → Service → Provider Client, plus storage for requests/responses, redaction, and retention. Minimize data, redact secrets/PII before calling the model, and enforce retention policies. In implementation, start with a single endpoint that accepts a prompt, calls a service, and persists a redacted record with latency, tokens/cost placeholders, and errors—so you can iterate safely without losing observability.
What “good” looks like
A production-ready AI feature usually has:
- Clear scope: one job (e.g., “summarize a ticket”), not “be smart everywhere.”
- Guardrails: input validation, max lengths, allowed intents, and safe defaults.
- Observability: request IDs, latency, status, error capture, and basic analytics.
- Data discipline: minimize, redact, encrypt where needed, and retain briefly.
- Repeatability: fixed system instructions, versioned prompts, and model pinning.
- Failure modes: timeouts, retries (limited), fallbacks, and user messaging.
- Human control: show drafts; require confirmation for consequential actions.
When to use AI (and when not to)
Use AI when the task is naturally probabilistic or language-heavy:
- Drafting: replies, summaries, rewrite/tone adjustments
- Extraction: pull fields from messy text (“invoice number”, “deadline”)
- Classification/routing: tag, prioritize, detect intent
- Search assist: query expansion, semantic matching (with retrieval)
- Data cleanup: normalization suggestions (with human review)
Avoid or heavily constrain AI when correctness is non-negotiable:
- Authn/Authz decisions (never)
- Payments / refunds / account changes without explicit user confirmation
- Legal/medical decisions (at most: informational drafts with disclaimers)
- Security-sensitive workflows (password resets, access grants, key handling)
- Anything that must be deterministic and auditable end-to-end
Rule of thumb: AI can propose; your app must decide.
Reference architecture
Keep the integration layered and observable:
- UI / Client
- gathers user input, shows drafts, handles confirmations
- Laravel Controller
- validates, authorizes, rate-limits, returns response
- AI Service (domain layer)
- prompt building, redaction, post-processing, persistence
- Provider Client (infrastructure)
- OpenAI/Anthropic/etc HTTP calls, timeouts, retries, response parsing
- Storage
- ai_requests table (redacted prompt/response, metrics, errors)
- Policy
- retention job + deletion, model allowlist, environment gating
ASCII sketch (request flows down, response bubbles back up):

```
Browser / Mobile
  -> POST /api/ai/assist
    -> AiAssistController
      -> AiAssistantService
        -> Redactor (PII/secret scrubber)
        -> PromptBuilder (system + user + context)
        -> AiProviderClient (HTTP)
        -> AiRequestRepository (DB)
      <- AiAssistantService
    <- AiAssistController
  <- returns JSON (draft, request_id)
```
Data handling: minimize, redact, retention
Minimize
- Send only what the model needs: avoid entire user profiles, raw logs, or full documents by default.
- Prefer “retrieve small context” over “paste everything.”
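One way to enforce "send only what the model needs" is a hard character cap applied before anything reaches the prompt builder. A minimal sketch; the helper name, the 2000-character default, and the truncation marker are all assumptions, not part of the guide's reference code:

```php
<?php

// Hypothetical helper: hard-cap context before it reaches the prompt builder.
// The 2000-character default and the "[TRUNCATED]" marker are arbitrary choices.
function capContext(string $text, int $maxChars = 2000): string
{
    if (mb_strlen($text) <= $maxChars) {
        return $text;
    }

    // Leave room for the marker so the result never exceeds $maxChars.
    $marker = ' [TRUNCATED]';

    return mb_substr($text, 0, $maxChars - mb_strlen($marker)) . $marker;
}
```

A visible marker beats silent truncation: it tells you (and the model) that context was cut, which matters when debugging odd outputs.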
Redact
- Strip obvious secrets/identifiers before leaving your system:
- API keys, tokens, passwords, cookies
- email/phone, addresses, national IDs (depending on your domain)
- Replace with placeholders: [EMAIL], [PHONE], [TOKEN].
- Keep a redaction map only if you truly need it (often you don’t).
Retention
- Store redacted prompts/responses for debugging with a short TTL (e.g., 7–30 days).
- Store metrics longer (latency, status, model, counts) without content.
- Make deletion easy: a scheduled job that purges old rows.
- Document your policy in-product (“We store redacted AI requests for X days.”).
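The purge itself reduces to one deterministic piece: computing the cutoff date. A sketch under stated assumptions (the helper name and 30-day default are invented); in Laravel you would call this from a scheduled command and delete `ai_requests` rows older than the cutoff:

```php
<?php

// Hypothetical helper: compute the purge cutoff for redacted AI content.
// A scheduled Laravel command could then run something like:
//   AiRequest::where('created_at', '<', $cutoff)->delete();
function aiContentPurgeCutoff(DateTimeImmutable $now, int $ttlDays = 30): DateTimeImmutable
{
    return $now->sub(new DateInterval("P{$ttlDays}D"));
}
```

Keeping the cutoff logic pure makes the retention policy trivially testable, independent of the scheduler.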
Implementation (baseline)
This baseline gives you a clean path: one table to track AI calls, one controller endpoint, and one service that handles redaction + persistence + provider calling. You can swap providers later without touching controllers.
1) Migration: ai_requests table
Create a place to record calls safely (redacted content, metrics, errors).
```php
<?php

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

return new class extends Migration {
    public function up(): void
    {
        Schema::create('ai_requests', function (Blueprint $table) {
            $table->id();
            $table->foreignId('user_id')->nullable()->constrained()->nullOnDelete();
            $table->string('provider')->default('openai');
            $table->string('model')->nullable();
            $table->string('status')->default('started'); // started|succeeded|failed
            $table->unsignedInteger('latency_ms')->nullable();

            // Store only redacted content (keep it short; consider TEXT size limits)
            $table->text('prompt_redacted')->nullable();
            $table->text('response_redacted')->nullable();

            // Useful for dedupe/analytics without storing the raw prompt
            $table->string('prompt_hash', 64)->nullable();

            // Optional metrics (populate when your provider returns them)
            $table->unsignedInteger('tokens_input')->nullable();
            $table->unsignedInteger('tokens_output')->nullable();
            $table->unsignedInteger('cost_cents')->nullable();

            $table->text('error_message')->nullable();
            $table->timestamps();

            $table->index(['status', 'created_at']);
            $table->index(['user_id', 'created_at']);
        });
    }

    public function down(): void
    {
        Schema::dropIfExists('ai_requests');
    }
};
```
2) Controller: a single endpoint
This example accepts a short message and returns a draft.
```php
<?php

namespace App\Http\Controllers;

use App\Services\AiAssistantService;
use Illuminate\Http\Request;

class AiAssistController extends Controller
{
    public function store(Request $request, AiAssistantService $ai)
    {
        $this->authorize('create', \App\Models\AiRequest::class);

        $data = $request->validate([
            'message' => ['required', 'string', 'max:2000'],
        ]);

        $result = $ai->assist(
            user: $request->user(),
            message: $data['message'],
        );

        return response()->json([
            'request_id' => $result['request_id'],
            'draft' => $result['draft'],
        ]);
    }
}
```
Notes you’ll likely add in your app (not shown here): throttling middleware, feature flagging per environment, and stricter authorization rules.
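For the throttling mentioned above, one common shape is a named rate limiter plus route middleware. A sketch only: the limiter name `ai`, the 10-per-minute limit, and the `auth:sanctum` guard are assumptions, not recommendations.

```php
<?php

// In a service provider's boot() method: define a per-user limiter for AI calls.
use Illuminate\Cache\RateLimiting\Limit;
use Illuminate\Support\Facades\RateLimiter;
use Illuminate\Support\Facades\Route;

RateLimiter::for('ai', function ($request) {
    return Limit::perMinute(10)->by($request->user()?->id ?: $request->ip());
});

// In routes/api.php: attach the limiter to the endpoint.
Route::post('/ai/assist', [\App\Http\Controllers\AiAssistController::class, 'store'])
    ->middleware(['auth:sanctum', 'throttle:ai']);
```

Rate limiting AI endpoints per user (falling back to IP for guests) caps both abuse and accidental cost spikes at the front door.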
3) Service: redaction + persistence + provider call
Keep controllers thin; do the real work here. This service:
- redacts input
- writes a DB row early (so failures are visible)
- calls a provider client (stubbed with Laravel HTTP)
- updates the DB row with result or error
```php
<?php

namespace App\Services;

use App\Models\AiRequest;
use Illuminate\Contracts\Auth\Authenticatable;
use Illuminate\Support\Facades\Http;
use Illuminate\Support\Str;
use RuntimeException;
use Throwable;

class AiAssistantService
{
    public function assist(?Authenticatable $user, string $message): array
    {
        $provider = config('services.ai.provider', 'openai');
        $model = config('services.ai.model', 'gpt-4.1-mini');

        $promptRedacted = $this->redact($message);
        $promptHash = hash('sha256', $promptRedacted);

        $aiRequest = AiRequest::create([
            'user_id' => $user?->getAuthIdentifier(),
            'provider' => $provider,
            'model' => $model,
            'status' => 'started',
            'prompt_redacted' => $promptRedacted,
            'prompt_hash' => $promptHash,
        ]);

        $startedAt = hrtime(true);

        try {
            $system = 'You are a helpful assistant. Be concise. If unsure, say so.';
            $userMsg = $promptRedacted;

            // Minimal example using Laravel HTTP; replace with your provider SDK/client later.
            $resp = Http::timeout(15)
                ->retry(1, 250) // keep retries conservative
                ->withToken(config('services.openai.key'))
                ->post('https://api.openai.com/v1/chat/completions', [
                    'model' => $model,
                    'messages' => [
                        ['role' => 'system', 'content' => $system],
                        ['role' => 'user', 'content' => $userMsg],
                    ],
                    'temperature' => 0.4,
                ]);

            if (! $resp->successful()) {
                throw new RuntimeException('AI provider error: HTTP '.$resp->status());
            }

            $json = $resp->json();
            $draft = data_get($json, 'choices.0.message.content') ?? '';

            $latencyMs = (int) ((hrtime(true) - $startedAt) / 1_000_000);

            $aiRequest->update([
                'status' => 'succeeded',
                'latency_ms' => $latencyMs,
                'response_redacted' => $this->redact($draft),
                // tokens/cost: fill when you standardize provider parsing
            ]);

            return [
                'request_id' => $aiRequest->id,
                'draft' => $draft,
            ];
        } catch (Throwable $e) {
            $latencyMs = (int) ((hrtime(true) - $startedAt) / 1_000_000);

            $aiRequest->update([
                'status' => 'failed',
                'latency_ms' => $latencyMs,
                'error_message' => Str::limit($e->getMessage(), 2000),
            ]);

            // Decide how you want to surface this to users
            throw $e;
        }
    }

    private function redact(string $text): string
    {
        // Minimal redaction examples. Expand to match your domain and threat model.
        $text = preg_replace('/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b/i', '[EMAIL]', $text);
        $text = preg_replace('/\b(\+?\d[\d\s\-()]{7,}\d)\b/', '[PHONE]', $text);

        // Common "token-ish" patterns (very rough)
        $text = preg_replace('/\b(sk-[A-Za-z0-9]{20,})\b/', '[API_KEY]', $text);

        return $text ?? '';
    }
}
```
PART 2 — Productionizing: Jobs, Idempotency, Observability, and Cost Controls
If Part 1 ended with an AiService that can send a request and return structured output, the next step is to make it safe under real traffic: run it in the queue, retry intelligently, log/audit every call, and validate the output like untrusted input.
Queue Job: timeouts, retries, audit logging, and output validation
Create a job that:
- wraps the service call
- records an audit row (start → success/fail)
- validates output before writing anything permanent
- can retry (and eventually fail cleanly)
```php
<?php

namespace App\Jobs;

use App\Models\AiAuditLog;
use App\Services\Ai\AiService;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\Middleware\ThrottlesExceptions;
use Illuminate\Queue\SerializesModels;
use Illuminate\Support\Facades\Log;
use Illuminate\Support\Facades\Validator;
use Throwable;

class SummarizeArticleJob implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public int $tries = 5;
    public int $timeout = 45; // job hard timeout (seconds)
    public int $maxExceptions = 3;

    public function __construct(
        public readonly int $articleId,
        public readonly string $idempotencyKey, // generated by caller
    ) {}

    public function middleware(): array
    {
        return [
            // slows down exception storms without fully pausing the queue
            (new ThrottlesExceptions(10, 5))->backoff(10),
        ];
    }

    public function backoff(): array
    {
        // progressive backoff for transient failures
        return [5, 15, 45, 120, 300];
    }

    public function handle(AiService $ai): void
    {
        $audit = AiAuditLog::start([
            'use_case' => 'article_summary',
            'entity_type' => 'article',
            'entity_id' => $this->articleId,
            'idempotency_key' => $this->idempotencyKey,
            'queue' => $this->queue,
            'attempt' => $this->attempts(),
        ]);

        try {
            $result = $ai->summarizeArticle(articleId: $this->articleId);
            $validated = $this->validateOutput($result);

            // write to DB only after validation succeeds (treat AI output as untrusted)
            // Article::whereKey($this->articleId)->update([...]);

            $audit->markSucceeded([
                'response' => $validated,
                'usage' => $result['usage'] ?? null,
                'model' => $result['model'] ?? null,
            ]);

            Log::info('AI job succeeded', [
                'use_case' => 'article_summary',
                'article_id' => $this->articleId,
                'audit_id' => $audit->id,
                'attempt' => $this->attempts(),
            ]);
        } catch (Throwable $e) {
            $audit->markFailed([
                'error_class' => $e::class,
                'error_message' => $e->getMessage(),
            ]);

            Log::warning('AI job failed', [
                'use_case' => 'article_summary',
                'article_id' => $this->articleId,
                'audit_id' => $audit->id,
                'attempt' => $this->attempts(),
                'exception' => $e,
            ]);

            throw $e; // let Laravel retry according to tries/backoff
        }
    }

    private function validateOutput(array $result): array
    {
        // Example: expect strict JSON with specific keys
        $data = $result['data'] ?? null;

        $v = Validator::make(
            is_array($data) ? $data : [],
            [
                'summary' => ['required', 'string', 'min:40', 'max:1200'],
                'bullets' => ['required', 'array', 'min:3', 'max:8'],
                'bullets.*' => ['string', 'min:8', 'max:140'],
                'confidence' => ['required', 'numeric', 'min:0', 'max:1'],
            ],
            [],
            ['bullets.*' => 'bullet']
        );

        return $v->validate();
    }
}
```
Implementation note: your `AiService::summarizeArticle()` can return a normalized structure like:
- `data` (the parsed JSON you asked for)
- `usage` (tokens)
- `model`
- `raw` (optional, for debugging)
Keep the job “dumb”: it orchestrates; the service does AI.
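That normalized structure can be sketched as a small pure function adapting a raw chat-completion payload into the `data`/`usage`/`model`/`raw` shape. The function name and payload layout are assumptions, modeled on the OpenAI-style response used in Part 1:

```php
<?php

// Hypothetical normalizer: adapt a provider payload into the shape the job expects.
// Assumes an OpenAI-style chat completion with a JSON body in the message content.
function normalizeAiResult(array $providerJson): array
{
    $content = $providerJson['choices'][0]['message']['content'] ?? '';

    return [
        'data'  => json_decode($content, true),    // parsed JSON (null if malformed)
        'usage' => $providerJson['usage'] ?? null, // token counts, if returned
        'model' => $providerJson['model'] ?? null,
        'raw'   => $content,                       // keep for debugging only
    ];
}
```

Because `data` is null on malformed JSON, the job's validator rejects it cleanly and retries can kick in.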
Quotable: Prompt-as-Interface
Treat prompts like public APIs. A prompt is not “some text” you tweak until it works; it’s a contract that upstream code depends on and downstream validators enforce. Version it. Write tests for it. Record which prompt version produced which output. If you change it, expect breaking changes just like you would in an HTTP endpoint.
The fastest way to get burned in production is to assume the model will “basically” keep the same shape tomorrow. Make the shape explicit (JSON schema or Laravel validation rules), reject anything else, and let retries handle transient failures. Your app’s integrity must never depend on the model being in a good mood. The model is a collaborator; your validators are the guardrails.
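A minimal way to make that contract concrete is a prompt builder that pins a version string and hands it back with every prompt, so it can be written into the audit log. The class name, version format, and instruction text here are assumptions:

```php
<?php

// Hypothetical versioned prompt builder: the version travels with every request
// and gets recorded alongside the output, so results are traceable to a prompt.
final class ArticleSummaryPrompt
{
    public const VERSION = 'article_summary.v3';

    public static function build(string $articleText): array
    {
        return [
            'version' => self::VERSION,
            'system'  => 'Return strict JSON with keys: summary, bullets, confidence.',
            'user'    => "Summarize the following article:\n\n" . $articleText,
        ];
    }
}
```

Changing the wording means bumping `VERSION`, exactly as you would version an HTTP endpoint.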
Idempotency: preventing duplicate spend and duplicate writes
AI calls are expensive and retry-prone. Make each “intent” idempotent.
Practical approach:
- Generate an `idempotency_key` from stable inputs: `hash(article_id + prompt_version + model + params)`.
- Store it in `ai_audit_logs` with a unique index.
- Before making a call, check whether there is already a successful row for that key; if yes, reuse the stored output.
Notes:
- Use idempotency at the use-case level (“summary of article 123 with v3 prompt”), not at the raw request level.
- If you allow “regenerate”, bump `prompt_version` or include a `regeneration_nonce` in the key.
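The key derivation can be sketched as a pure function: anything that changes the output (prompt version, model, params, regeneration nonce) must be part of the hash. The function name is an assumption:

```php
<?php

// Hypothetical idempotency key builder: identical inputs always yield the same
// key, so a retry (or duplicate dispatch) can find and reuse the earlier result.
function aiIdempotencyKey(
    int $articleId,
    string $promptVersion,
    string $model,
    array $params = [],
    ?string $regenerationNonce = null,
): string {
    ksort($params); // parameter order must not change the key

    return hash('sha256', json_encode([
        'article_id'     => $articleId,
        'prompt_version' => $promptVersion,
        'model'          => $model,
        'params'         => $params,
        'nonce'          => $regenerationNonce,
    ]));
}
```

The caller computes this once and passes it to the job's constructor, which pairs it with the unique index on `ai_audit_logs`.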
Observability: logs, metrics, traces (and what to record)
At minimum, record these fields per request:
- `use_case`, `entity_type`, `entity_id`
- `idempotency_key`
- `model`, `prompt_version`
- timing: `started_at`, `finished_at`, `duration_ms`
- `attempt`, `queue`, job id
- `usage`: input/output tokens (and estimated cost if you can)
- `status`: succeeded/failed + error class/message
- optional: `provider_request_id` (if available)
Operational tools:
- Horizon for queue visibility (failed jobs, throughput).
- Telescope in non-prod for deep request inspection.
- Sentry/Bugsnag for exceptions + breadcrumbs that include `audit_id`.
- Add structured logs with consistent keys so you can query: `use_case=article_summary AND status=failed`.
Tip: put `audit_id` in every log line related to the call. It becomes your “trace id” even without full tracing.
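One way to keep log keys consistent is to route every AI-related log line through a tiny helper that always includes `audit_id` and `use_case`. The helper name is an assumption:

```php
<?php

// Hypothetical helper: build the structured context for every AI-related log
// line, so audit_id (the de facto trace id) and use_case are never missing.
function aiLogContext(int $auditId, string $useCase, array $extra = []): array
{
    return array_merge([
        'audit_id' => $auditId,
        'use_case' => $useCase,
    ], $extra);
}
```

Usage inside a job might look like `Log::info('AI job succeeded', aiLogContext($audit->id, 'article_summary', ['attempt' => $this->attempts()]));`.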
Quotable: Queues as Circuit Breakers
A queue is more than “run it later.” For AI features, the queue is a circuit breaker that protects the rest of your system from latency spikes, provider incidents, and cost blowups. When you move AI work off the request cycle, you gain control: concurrency limits, backoff, dead-letter behavior, and operational visibility. You can pause a single queue if budgets are exceeded, throttle specific use-cases, or drain jobs during an outage without taking your whole app down.
The queue also creates a clear boundary for correctness. In a controller, it’s tempting to accept “close enough” output because the user is waiting. In a job, you can afford to validate strictly, retry if the output is malformed, and fail deterministically with a full audit trail. That combination—async execution, strict validation, and recorded attempts—is what makes AI features survivable in production.
Cost controls: budgets, throttles, caching, and “don’t pay twice”
Common patterns that work well in Laravel:
- Hard caps in code
  - enforce `max_output_tokens`
  - reject huge inputs (truncate/summarize upstream, or chunk)
  - set a strict `timeout` on HTTP requests to the AI provider (service-level), not only on the job
- Per-use-case model selection
  - cheap model for classification/routing
  - better model only when needed (final generation)
- Cache results
  - cache by `idempotency_key` (database or cache store)
  - store successful outputs; reuse them for repeated requests
- Concurrency + rate limiting
  - dedicate a queue name per use-case: `ai-summaries`, `ai-moderation`
  - limit workers for costly queues
  - add throttling middleware if your provider has strict rate limits
- Budget triggers
  - compute estimated cost from token usage
  - if the daily budget is exceeded: fail fast, degrade gracefully, or switch to a cheaper model
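Budget triggers need a cost estimate from token usage. A sketch only: the rates below are illustrative placeholders, not real provider pricing, and in practice you would load per-model rates from config:

```php
<?php

// Hypothetical cost estimator. The rates are cents per million tokens and are
// placeholders, NOT real pricing; load actual per-model rates from config.
function estimateCostCents(int $tokensInput, int $tokensOutput, array $ratesPerMillionCents): int
{
    // Multiply before dividing to stay in integer math as long as possible.
    $cost = ($tokensInput * $ratesPerMillionCents['input']
          + $tokensOutput * $ratesPerMillionCents['output']) / 1_000_000;

    return (int) ceil($cost); // round up so budget checks err on the safe side
}
```

Summing this per day (e.g. over `ai_audit_logs` usage columns) gives you the number to compare against a daily budget before dispatching new jobs.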
Production checklist (copy/paste)
- All AI calls run in queue jobs (no long provider calls in controllers)
- Job has `tries`, `backoff`, and `timeout` set intentionally
- Output validation exists (Laravel validator or schema) and is strict
- Idempotency key implemented + unique constraint
- Audit logs record model, prompt version, duration, usage, status
- Logs include `audit_id` and are queryable (structured context)
- Horizon configured; failed jobs route to alerts
- Cost controls: max tokens, input limits, caching, queue concurrency limits
- Fallback behavior defined (what user sees on failure)
- Prompt versioning policy documented (and stored per audit record)
Prompt-shaped FAQs
1) “How do I force strict JSON output every time?”
Ask for JSON only, define strict keys/schema, validate, and retry with a repair prompt if needed.
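Before reaching for a repair prompt, a cheap first line of defense is stripping the markdown fences models sometimes add and decoding strictly. A sketch; the function name and fence pattern are assumptions:

```php
<?php

// Hypothetical parser: strip optional markdown code fences, then decode.
// Returns null on malformed JSON so the caller can trigger a repair/retry.
function parseStrictJson(string $raw): ?array
{
    $clean = trim($raw);

    // Models sometimes wrap JSON in ```json ... ``` despite instructions.
    $clean = preg_replace('/^```(?:json)?\s*|\s*```$/', '', $clean);

    $decoded = json_decode($clean, true);

    return is_array($decoded) ? $decoded : null;
}
```

Returning null (rather than throwing) keeps the "malformed output" path explicit and easy to route into a retry.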
2) “Where should I store prompt templates?”
In versioned files (or DB) with a prompt_version recorded in audit logs.
3) “How do I prevent double-charging when a job retries?”
Idempotency key + lookup-before-call; use provider idempotency headers if available.
4) “Should I log raw prompts and raw outputs?”
Prefer structured outputs + metadata; store raw only when necessary and always redact.
5) “What’s the right retry strategy?”
Retry transient errors (timeouts, 429, 5xx). Don’t retry deterministic validation failures without changing prompt/repair step.
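That split between transient and deterministic failures can be captured in one predicate. A sketch; the status-code choices are conventional, not provider-specific:

```php
<?php

// Hypothetical classifier: retry only transient failures.
// 408/429/5xx are worth retrying; 4xx validation-style errors are not.
function isRetryableAiFailure(?int $httpStatus, bool $timedOut = false): bool
{
    if ($timedOut) {
        return true;
    }

    if ($httpStatus === null) {
        return false;
    }

    return $httpStatus === 408
        || $httpStatus === 429
        || $httpStatus >= 500;
}
```

Non-retryable failures (like a 422 or a validator rejection) should fail fast or go through a changed prompt, never a blind retry loop.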
6) “How do I handle hallucinated fields?”
Whitelist keys and validate types/ranges. Reject or ignore unknown keys intentionally.
7) “How do I keep latency down?”
Queue it, cache results, cap output tokens, reduce context, and avoid big models for small tasks.
8) “How do I monitor AI quality over time?”
Sample outputs, track validator failures, re-run rates, and maintain a small human review queue.
9) “What’s a safe default when the model fails?”
Fail cleanly: return “unavailable, try again,” avoid partial writes, and keep original content intact.
10) “How do I stop a runaway cost incident?”
Pause AI queues in Horizon, enforce daily budgets in code, and cap tokens/concurrency.
Want this implemented end-to-end?
If you want a production-grade RAG assistant or agentic workflow, with proper evaluation, access control, and observability, let's scope it.