Laravel AI Integration (Production Guide): Queues, Idempotency, Audit Logs
A production-minded Laravel guide to integrating AI safely: queue-first architecture, Controller→Service→Job code, retries/idempotency, observability, audit logs, cost control, and prompt-shaped FAQs.
TL;DR
AI features in Laravel work best when you treat the model like an unreliable-but-useful dependency: constrain inputs, constrain outputs, log what matters, and never let the model “decide” critical actions without guardrails. “Good” AI integrations are deterministic around the edges (validation, authorization, auditing, retries, timeouts) and probabilistic only in the middle (generation/classification). Use AI when the value comes from language understanding, fuzzy matching, summarization, extraction, or drafting—avoid it for money movement, security decisions, or anything requiring strict correctness. Adopt a reference architecture with a Controller → Service → Provider Client, plus storage for requests/responses, redaction, and retention. Minimize data, redact secrets/PII before calling the model, and enforce retention policies. In implementation, start with a single endpoint that accepts a prompt, calls a service, and persists a redacted record with latency, tokens/cost placeholders, and errors—so you can iterate safely without losing observability.
What “good” looks like
A production-ready AI feature usually has:
- Clear scope: one job (e.g., “summarize a ticket”), not “be smart everywhere.”
- Guardrails: input validation, max lengths, allowed intents, and safe defaults.
- Observability: request IDs, latency, status, error capture, and basic analytics.
- Data discipline: minimize, redact, encrypt where needed, and retain briefly.
- Repeatability: fixed system instructions, versioned prompts, and model pinning.
- Failure modes: timeouts, retries (limited), fallbacks, and user messaging.
- Human control: show drafts; require confirmation for consequential actions.
When to use AI (and when not to)
Use AI when the task is naturally probabilistic or language-heavy:
- Drafting: replies, summaries, rewrite/tone adjustments
- Extraction: pull fields from messy text (“invoice number”, “deadline”)
- Classification/routing: tag, prioritize, detect intent
- Search assist: query expansion, semantic matching (with retrieval)
- Data cleanup: normalization suggestions (with human review)
Avoid or heavily constrain AI when correctness is non-negotiable:
- Authn/Authz decisions (never)
- Payments / refunds / account changes without explicit user confirmation
- Legal/medical decisions (at most: informational drafts with disclaimers)
- Security-sensitive workflows (password resets, access grants, key handling)
- Anything that must be deterministic and auditable end-to-end
Rule of thumb: AI can propose; your app must decide.
Reference architecture
Keep the integration layered and observable:
- UI / Client
- gathers user input, shows drafts, handles confirmations
- Laravel Controller
- validates, authorizes, rate-limits, returns response
- AI Service (domain layer)
- prompt building, redaction, post-processing, persistence
- Provider Client (infrastructure)
- OpenAI/Anthropic/etc HTTP calls, timeouts, retries, response parsing
- Storage
- ai_requests table (redacted prompt/response, metrics, errors)
- Policy
- retention job + deletion, model allowlist, environment gating
ASCII sketch (request flows down, response bubbles back up):

```
Browser / Mobile
  -> POST /api/ai/assist
    -> AiAssistController
      -> AiAssistantService
        -> Redactor (PII/secret scrubber)
        -> PromptBuilder (system + user + context)
        -> AiProviderClient (HTTP)
        -> AiRequestRepository (DB)
      <- AiAssistantService
    <- AiAssistController
  <- returns JSON (draft, request_id)
```
Data handling: minimize, redact, retention
Minimize
- Send only what the model needs: avoid entire user profiles, raw logs, or full documents by default.
- Prefer “retrieve small context” over “paste everything.”
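One way to enforce "send only what the model needs" is a hard character cap applied before anything reaches the prompt builder. A minimal sketch; the helper name, the 2000-character default, and the truncation marker are all assumptions, not part of the guide's reference code:

```php
<?php

// Hypothetical helper: hard-cap context before it reaches the prompt builder.
// The 2000-character default and the "[TRUNCATED]" marker are arbitrary choices.
function capContext(string $text, int $maxChars = 2000): string
{
    if (mb_strlen($text) <= $maxChars) {
        return $text;
    }

    // Leave room for the marker so the result never exceeds $maxChars.
    $marker = ' [TRUNCATED]';

    return mb_substr($text, 0, $maxChars - mb_strlen($marker)) . $marker;
}
```

A visible marker beats silent truncation: it tells you (and the model) that context was cut, which matters when debugging odd outputs.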
Redact
- Strip obvious secrets/identifiers before leaving your system:
- API keys, tokens, passwords, cookies
- email/phone, addresses, national IDs (depending on your domain)
- Replace with placeholders: [EMAIL], [PHONE], [TOKEN].
- Keep a redaction map only if you truly need it (often you don’t).
Retention
- Store redacted prompts/responses for debugging with a short TTL (e.g., 7–30 days).
- Store metrics longer (latency, status, model, counts) without content.
- Make deletion easy: a scheduled job that purges old rows.
- Document your policy in-product (“We store redacted AI requests for X days.”).
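The purge itself reduces to one deterministic piece: computing the cutoff date. A sketch under stated assumptions (the helper name and 30-day default are invented); in Laravel you would call this from a scheduled command and delete `ai_requests` rows older than the cutoff:

```php
<?php

// Hypothetical helper: compute the purge cutoff for redacted AI content.
// A scheduled Laravel command could then run something like:
//   AiRequest::where('created_at', '<', $cutoff)->delete();
function aiContentPurgeCutoff(DateTimeImmutable $now, int $ttlDays = 30): DateTimeImmutable
{
    return $now->sub(new DateInterval("P{$ttlDays}D"));
}
```

Keeping the cutoff logic pure makes the retention policy trivially testable, independent of the scheduler.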
Implementation (baseline)
This baseline gives you a clean path: one table to track AI calls, one controller endpoint, and one service that handles redaction + persistence + provider calling. You can swap providers later without touching controllers.
1) Migration: ai_requests table
Create a place to record calls safely (redacted content, metrics, errors).
```php
<?php

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

return new class extends Migration {
    public function up(): void
    {
        Schema::create('ai_requests', function (Blueprint $table) {
            $table->id();
            $table->foreignId('user_id')->nullable()->constrained()->nullOnDelete();
            $table->string('provider')->default('openai');
            $table->string('model')->nullable();
            $table->string('status')->default('started'); // started|succeeded|failed
            $table->unsignedInteger('latency_ms')->nullable();

            // Store only redacted content (keep it short; consider TEXT size limits)
            $table->text('prompt_redacted')->nullable();
            $table->text('response_redacted')->nullable();

            // Useful for dedupe/analytics without storing the raw prompt
            $table->string('prompt_hash', 64)->nullable();

            // Optional metrics (populate when your provider returns them)
            $table->unsignedInteger('tokens_input')->nullable();
            $table->unsignedInteger('tokens_output')->nullable();
            $table->unsignedInteger('cost_cents')->nullable();

            $table->text('error_message')->nullable();
            $table->timestamps();

            $table->index(['status', 'created_at']);
            $table->index(['user_id', 'created_at']);
        });
    }

    public function down(): void
    {
        Schema::dropIfExists('ai_requests');
    }
};
```
2) Controller: a single endpoint
This example accepts a short message and returns a draft.
```php
<?php

namespace App\Http\Controllers;

use App\Services\AiAssistantService;
use Illuminate\Http\Request;

class AiAssistController extends Controller
{
    public function store(Request $request, AiAssistantService $ai)
    {
        $this->authorize('create', \App\Models\AiRequest::class);

        $data = $request->validate([
            'message' => ['required', 'string', 'max:2000'],
        ]);

        $result = $ai->assist(
            user: $request->user(),
            message: $data['message'],
        );

        return response()->json([
            'request_id' => $result['request_id'],
            'draft' => $result['draft'],
        ]);
    }
}
```
Notes you’ll likely add in your app (not shown here): throttling middleware, feature flagging per environment, and stricter authorization rules.
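For the throttling mentioned above, one common shape is a named rate limiter plus route middleware. A sketch only: the limiter name `ai`, the 10-per-minute limit, and the `auth:sanctum` guard are assumptions, not recommendations.

```php
<?php

// In a service provider's boot() method: define a per-user limiter for AI calls.
use Illuminate\Cache\RateLimiting\Limit;
use Illuminate\Support\Facades\RateLimiter;
use Illuminate\Support\Facades\Route;

RateLimiter::for('ai', function ($request) {
    return Limit::perMinute(10)->by($request->user()?->id ?: $request->ip());
});

// In routes/api.php: attach the limiter to the endpoint.
Route::post('/ai/assist', [\App\Http\Controllers\AiAssistController::class, 'store'])
    ->middleware(['auth:sanctum', 'throttle:ai']);
```

Rate limiting AI endpoints per user (falling back to IP for guests) caps both abuse and accidental cost spikes at the front door.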
3) Service: redaction + persistence + provider call
Keep controllers thin; do the real work here. This service:
- redacts input
- writes a DB row early (so failures are visible)
- calls a provider client (stubbed with Laravel HTTP)
- updates the DB row with result or error
```php
<?php

namespace App\Services;

use App\Models\AiRequest;
use Illuminate\Contracts\Auth\Authenticatable;
use Illuminate\Support\Facades\Http;
use Illuminate\Support\Str;
use RuntimeException;
use Throwable;

class AiAssistantService
{
    public function assist(?Authenticatable $user, string $message): array
    {
        $provider = config('services.ai.provider', 'openai');
        $model = config('services.ai.model', 'gpt-4.1-mini');

        $promptRedacted = $this->redact($message);
        $promptHash = hash('sha256', $promptRedacted);

        $aiRequest = AiRequest::create([
            'user_id' => $user?->getAuthIdentifier(),
            'provider' => $provider,
            'model' => $model,
            'status' => 'started',
            'prompt_redacted' => $promptRedacted,
            'prompt_hash' => $promptHash,
        ]);

        $startedAt = hrtime(true);

        try {
            $system = 'You are a helpful assistant. Be concise. If unsure, say so.';
            $userMsg = $promptRedacted;

            // Minimal example using Laravel HTTP; replace with your provider SDK/client later.
            $resp = Http::timeout(15)
                ->retry(1, 250) // keep retries conservative
                ->withToken(config('services.openai.key'))
                ->post('https://api.openai.com/v1/chat/completions', [
                    'model' => $model,
                    'messages' => [
                        ['role' => 'system', 'content' => $system],
                        ['role' => 'user', 'content' => $userMsg],
                    ],
                    'temperature' => 0.4,
                ]);

            if (! $resp->successful()) {
                throw new RuntimeException('AI provider error: HTTP '.$resp->status());
            }

            $json = $resp->json();
            $draft = data_get($json, 'choices.0.message.content') ?? '';

            $latencyMs = (int) ((hrtime(true) - $startedAt) / 1_000_000);

            $aiRequest->update([
                'status' => 'succeeded',
                'latency_ms' => $latencyMs,
                'response_redacted' => $this->redact($draft),
                // tokens/cost: fill when you standardize provider parsing
            ]);

            return [
                'request_id' => $aiRequest->id,
                'draft' => $draft,
            ];
        } catch (Throwable $e) {
            $latencyMs = (int) ((hrtime(true) - $startedAt) / 1_000_000);

            $aiRequest->update([
                'status' => 'failed',
                'latency_ms' => $latencyMs,
                'error_message' => Str::limit($e->getMessage(), 2000),
            ]);

            // Decide how you want to surface this to users
            throw $e;
        }
    }

    private function redact(string $text): string
    {
        // Minimal redaction examples. Expand to match your domain and threat model.
        $text = preg_replace('/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b/i', '[EMAIL]', $text);
        $text = preg_replace('/\b(\+?\d[\d\s\-()]{7,}\d)\b/', '[PHONE]', $text);

        // Common "token-ish" patterns (very rough)
        $text = preg_replace('/\b(sk-[A-Za-z0-9]{20,})\b/', '[API_KEY]', $text);

        return $text ?? '';
    }
}
```
PART 2 — Productionizing: Jobs, Idempotency, Observability, and Cost Controls
If Part 1 ended with an AiService that can send a request and return structured output, the next step is to make it safe under real traffic: run it in the queue, retry intelligently, log/audit every call, and validate the output like untrusted input.
Queue Job: timeouts, retries, audit logging, and output validation
Create a job that:
- wraps the service call
- records an audit row (start → success/fail)
- validates output before writing anything permanent
- can retry (and eventually fail cleanly)
```php
<?php

namespace App\Jobs;

use App\Models\AiAuditLog;
use App\Services\Ai\AiService;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\Middleware\ThrottlesExceptions;
use Illuminate\Queue\SerializesModels;
use Illuminate\Support\Facades\Log;
use Illuminate\Support\Facades\Validator;
use Throwable;

class SummarizeArticleJob implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public int $tries = 5;
    public int $timeout = 45; // job hard timeout (seconds)
    public int $maxExceptions = 3;

    public function __construct(
        public readonly int $articleId,
        public readonly string $idempotencyKey, // generated by caller
    ) {}

    public function middleware(): array
    {
        return [
            // slows down exception storms without fully pausing the queue
            (new ThrottlesExceptions(10, 5))->backoff(10),
        ];
    }

    public function backoff(): array
    {
        // progressive backoff for transient failures
        return [5, 15, 45, 120, 300];
    }

    public function handle(AiService $ai): void
    {
        $audit = AiAuditLog::start([
            'use_case' => 'article_summary',
            'entity_type' => 'article',
            'entity_id' => $this->articleId,
            'idempotency_key' => $this->idempotencyKey,
            'queue' => $this->queue,
            'attempt' => $this->attempts(),
        ]);

        try {
            $result = $ai->summarizeArticle(articleId: $this->articleId);
            $validated = $this->validateOutput($result);

            // write to DB only after validation succeeds (treat AI output as untrusted)
            // Article::whereKey($this->articleId)->update([...]);

            $audit->markSucceeded([
                'response' => $validated,
                'usage' => $result['usage'] ?? null,
                'model' => $result['model'] ?? null,
            ]);

            Log::info('AI job succeeded', [
                'use_case' => 'article_summary',
                'article_id' => $this->articleId,
                'audit_id' => $audit->id,
                'attempt' => $this->attempts(),
            ]);
        } catch (Throwable $e) {
            $audit->markFailed([
                'error_class' => $e::class,
                'error_message' => $e->getMessage(),
            ]);

            Log::warning('AI job failed', [
                'use_case' => 'article_summary',
                'article_id' => $this->articleId,
                'audit_id' => $audit->id,
                'attempt' => $this->attempts(),
                'exception' => $e,
            ]);

            throw $e; // let Laravel retry according to tries/backoff
        }
    }

    private function validateOutput(array $result): array
    {
        // Example: expect strict JSON with specific keys
        $data = $result['data'] ?? null;

        $v = Validator::make(
            is_array($data) ? $data : [],
            [
                'summary' => ['required', 'string', 'min:40', 'max:1200'],
                'bullets' => ['required', 'array', 'min:3', 'max:8'],
                'bullets.*' => ['string', 'min:8', 'max:140'],
                'confidence' => ['required', 'numeric', 'min:0', 'max:1'],
            ],
            [],
            ['bullets.*' => 'bullet']
        );

        return $v->validate();
    }
}
```
Implementation note: your `AiService::summarizeArticle()` can return a normalized structure like:
- `data` (the parsed JSON you asked for)
- `usage` (tokens)
- `model`
- `raw` (optional, for debugging)
Keep the job “dumb”: it orchestrates; the service does AI.
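That normalized structure can be sketched as a small pure function adapting a raw chat-completion payload into the `data`/`usage`/`model`/`raw` shape. The function name and payload layout are assumptions, modeled on the OpenAI-style response used in Part 1:

```php
<?php

// Hypothetical normalizer: adapt a provider payload into the shape the job expects.
// Assumes an OpenAI-style chat completion with a JSON body in the message content.
function normalizeAiResult(array $providerJson): array
{
    $content = $providerJson['choices'][0]['message']['content'] ?? '';

    return [
        'data'  => json_decode($content, true),    // parsed JSON (null if malformed)
        'usage' => $providerJson['usage'] ?? null, // token counts, if returned
        'model' => $providerJson['model'] ?? null,
        'raw'   => $content,                       // keep for debugging only
    ];
}
```

Because `data` is null on malformed JSON, the job's validator rejects it cleanly and retries can kick in.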
Quotable: Prompt-as-Interface
Treat prompts like public APIs. A prompt is not “some text” you tweak until it works; it’s a contract that upstream code depends on and downstream validators enforce. Version it. Write tests for it. Record which prompt version produced which output. If you change it, expect breaking changes just like you would in an HTTP endpoint.
The fastest way to get burned in production is to assume the model will “basically” keep the same shape tomorrow. Make the shape explicit (JSON schema or Laravel validation rules), reject anything else, and let retries handle transient failures. Your app’s integrity must never depend on the model being in a good mood. The model is a collaborator; your validators are the guardrails.
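A minimal way to make that contract concrete is a prompt builder that pins a version string and hands it back with every prompt, so it can be written into the audit log. The class name, version format, and instruction text here are assumptions:

```php
<?php

// Hypothetical versioned prompt builder: the version travels with every request
// and gets recorded alongside the output, so results are traceable to a prompt.
final class ArticleSummaryPrompt
{
    public const VERSION = 'article_summary.v3';

    public static function build(string $articleText): array
    {
        return [
            'version' => self::VERSION,
            'system'  => 'Return strict JSON with keys: summary, bullets, confidence.',
            'user'    => "Summarize the following article:\n\n" . $articleText,
        ];
    }
}
```

Changing the wording means bumping `VERSION`, exactly as you would version an HTTP endpoint.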
Idempotency: preventing duplicate spend and duplicate writes
AI calls are expensive and retry-prone. Make each “intent” idempotent.
Practical approach:
- Generate an `idempotency_key` from stable inputs: `hash(article_id + prompt_version + model + params)`.
- Store it in `ai_audit_logs` with a unique index.
- Before making a call, check whether there is already a successful row for that key; if yes, reuse the stored output.
Notes:
- Use idempotency at the use-case level (“summary of article 123 with v3 prompt”), not at the raw request level.
- If you allow “regenerate”, bump `prompt_version` or include a `regeneration_nonce` in the key.
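The key derivation can be sketched as a pure function: anything that changes the output (prompt version, model, params, regeneration nonce) must be part of the hash. The function name is an assumption:

```php
<?php

// Hypothetical idempotency key builder: identical inputs always yield the same
// key, so a retry (or duplicate dispatch) can find and reuse the earlier result.
function aiIdempotencyKey(
    int $articleId,
    string $promptVersion,
    string $model,
    array $params = [],
    ?string $regenerationNonce = null,
): string {
    ksort($params); // parameter order must not change the key

    return hash('sha256', json_encode([
        'article_id'     => $articleId,
        'prompt_version' => $promptVersion,
        'model'          => $model,
        'params'         => $params,
        'nonce'          => $regenerationNonce,
    ]));
}
```

The caller computes this once and passes it to the job's constructor, which pairs it with the unique index on `ai_audit_logs`.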
Observability: logs, metrics, traces (and what to record)
At minimum, record these fields per request:
- `use_case`, `entity_type`, `entity_id`
- `idempotency_key`
- `model`, `prompt_version`
- timing: `started_at`, `finished_at`, `duration_ms`
- `attempt`, `queue`, job id
- `usage`: input/output tokens (and estimated cost if you can)
- `status`: succeeded/failed + error class/message
- optional: `provider_request_id` (if available)
Operational tools:
- Horizon for queue visibility (failed jobs, throughput).
- Telescope in non-prod for deep request inspection.
- Sentry/Bugsnag for exceptions + breadcrumbs that include `audit_id`.
- Add structured logs with consistent keys so you can query: `use_case=article_summary AND status=failed`.
Tip: put `audit_id` in every log line related to the call. It becomes your “trace id” even without full tracing.
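One way to keep log keys consistent is to route every AI-related log line through a tiny helper that always includes `audit_id` and `use_case`. The helper name is an assumption:

```php
<?php

// Hypothetical helper: build the structured context for every AI-related log
// line, so audit_id (the de facto trace id) and use_case are never missing.
function aiLogContext(int $auditId, string $useCase, array $extra = []): array
{
    return array_merge([
        'audit_id' => $auditId,
        'use_case' => $useCase,
    ], $extra);
}
```

Usage inside a job might look like `Log::info('AI job succeeded', aiLogContext($audit->id, 'article_summary', ['attempt' => $this->attempts()]));`.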
Quotable: Queues as Circuit Breakers
A queue is more than “run it later.” For AI features, the queue is a circuit breaker that protects the rest of your system from latency spikes, provider incidents, and cost blowups. When you move AI work off the request cycle, you gain control: concurrency limits, backoff, dead-letter behavior, and operational visibility. You can pause a single queue if budgets are exceeded, throttle specific use-cases, or drain jobs during an outage without taking your whole app down.
The queue also creates a clear boundary for correctness. In a controller, it’s tempting to accept “close enough” output because the user is waiting. In a job, you can afford to validate strictly, retry if the output is malformed, and fail deterministically with a full audit trail. That combination—async execution, strict validation, and recorded attempts—is what makes AI features survivable in production.
Cost controls: budgets, throttles, caching, and “don’t pay twice”
Common patterns that work well in Laravel:
- Hard caps in code
  - enforce `max_output_tokens`
  - reject huge inputs (truncate/summarize upstream, or chunk)
  - set a strict `timeout` on HTTP requests to the AI provider (service-level), not only on the job
- Per-use-case model selection
  - cheap model for classification/routing
  - better model only when needed (final generation)
- Cache results
  - cache by `idempotency_key` (database or cache store)
  - store successful outputs; reuse them for repeated requests
- Concurrency + rate limiting
  - dedicate a queue name per use-case: `ai-summaries`, `ai-moderation`
  - limit workers for costly queues
  - add throttling middleware if your provider has strict rate limits
- Budget triggers
  - compute estimated cost from token usage
  - if the daily budget is exceeded: fail fast, degrade gracefully, or switch to a cheaper model
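Budget triggers need a cost estimate from token usage. A sketch only: the rates below are illustrative placeholders, not real provider pricing, and in practice you would load per-model rates from config:

```php
<?php

// Hypothetical cost estimator. The rates are cents per million tokens and are
// placeholders, NOT real pricing; load actual per-model rates from config.
function estimateCostCents(int $tokensInput, int $tokensOutput, array $ratesPerMillionCents): int
{
    // Multiply before dividing to stay in integer math as long as possible.
    $cost = ($tokensInput * $ratesPerMillionCents['input']
          + $tokensOutput * $ratesPerMillionCents['output']) / 1_000_000;

    return (int) ceil($cost); // round up so budget checks err on the safe side
}
```

Summing this per day (e.g. over `ai_audit_logs` usage columns) gives you the number to compare against a daily budget before dispatching new jobs.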
Production checklist (copy/paste)
- All AI calls run in queue jobs (no long provider calls in controllers)
- Job has `tries`, `backoff`, and `timeout` set intentionally
- Output validation exists (Laravel validator or schema) and is strict
- Idempotency key implemented + unique constraint
- Audit logs record model, prompt version, duration, usage, status
- Logs include `audit_id` and are queryable (structured context)
- Horizon configured; failed jobs route to alerts
- Cost controls: max tokens, input limits, caching, queue concurrency limits
- Fallback behavior defined (what user sees on failure)
- Prompt versioning policy documented (and stored per audit record)
Prompt-shaped FAQs
1) “How do I force strict JSON output every time?”
Ask for JSON only, define strict keys/schema, validate, and retry with a repair prompt if needed.
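Before reaching for a repair prompt, a cheap first line of defense is stripping the markdown fences models sometimes add and decoding strictly. A sketch; the function name and fence pattern are assumptions:

```php
<?php

// Hypothetical parser: strip optional markdown code fences, then decode.
// Returns null on malformed JSON so the caller can trigger a repair/retry.
function parseStrictJson(string $raw): ?array
{
    $clean = trim($raw);

    // Models sometimes wrap JSON in ```json ... ``` despite instructions.
    $clean = preg_replace('/^```(?:json)?\s*|\s*```$/', '', $clean);

    $decoded = json_decode($clean, true);

    return is_array($decoded) ? $decoded : null;
}
```

Returning null (rather than throwing) keeps the "malformed output" path explicit and easy to route into a retry.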
2) “Where should I store prompt templates?”
In versioned files (or DB) with a prompt_version recorded in audit logs.
3) “How do I prevent double-charging when a job retries?”
Idempotency key + lookup-before-call; use provider idempotency headers if available.
4) “Should I log raw prompts and raw outputs?”
Prefer structured outputs + metadata; store raw only when necessary and always redact.
5) “What’s the right retry strategy?”
Retry transient errors (timeouts, 429, 5xx). Don’t retry deterministic validation failures without changing prompt/repair step.
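That split between transient and deterministic failures can be captured in one predicate. A sketch; the status-code choices are conventional, not provider-specific:

```php
<?php

// Hypothetical classifier: retry only transient failures.
// 408/429/5xx are worth retrying; 4xx validation-style errors are not.
function isRetryableAiFailure(?int $httpStatus, bool $timedOut = false): bool
{
    if ($timedOut) {
        return true;
    }

    if ($httpStatus === null) {
        return false;
    }

    return $httpStatus === 408
        || $httpStatus === 429
        || $httpStatus >= 500;
}
```

Non-retryable failures (like a 422 or a validator rejection) should fail fast or go through a changed prompt, never a blind retry loop.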
6) “How do I handle hallucinated fields?”
Whitelist keys and validate types/ranges. Reject or ignore unknown keys intentionally.
7) “How do I keep latency down?”
Queue it, cache results, cap output tokens, reduce context, and avoid big models for small tasks.
8) “How do I monitor AI quality over time?”
Sample outputs, track validator failures, re-run rates, and maintain a small human review queue.
9) “What’s a safe default when the model fails?”
Fail cleanly: return “unavailable, try again,” avoid partial writes, and keep original content intact.
10) “How do I stop a runaway cost incident?”
Pause AI queues in Horizon, enforce daily budgets in code, and cap tokens/concurrency.
Want this implemented end-to-end?
If you want a production-grade RAG assistant or agentic workflow, with proper evaluation, access control, and observability, let's scope it.