Why doesn't Sentry catch retry loops on AI apps?

Error monitoring tools group errors by fingerprint (a hash of stack trace + error message). A retry loop firing the same error 800 times looks identical to 800 different users hitting the same bug once — both increment the same issue counter. Default Sentry alerts trigger on new issues, not on suddenly-noisy existing issues, so a silent loop firing at 50/sec for 6 hours never trips an alert. Add a custom alert on 'any single issue with > 100 events per hour' to catch this.

How do I detect a runaway AI retry loop tonight?

Query your background job queue (BullMQ, Sidekiq, Celery, or whichever you use) for jobs that have been active or failed for more than 24 hours, or that have retried more than 10 times. Cancel them and add a hard cap so any job exceeding 10 retries is paged or dropped automatically. This single rule prevents most multi-thousand-dollar OpenAI bills.

Why did my OpenAI bill suddenly multiply 5x without traffic changing?

The most common cause is context window bloat: the app appends full conversation history to every prompt without truncation. A six-month-old user's chat can grow from 4K to 60K input tokens, multiplying per-call cost by 15x. Log input token counts on every LLM call and plot the p95 over time — upward drift means context bloat. Fix with a sliding window or a summarization step.

What should I monitor on an AI app that traditional tools miss?

Five behavior-level signals: token usage per user per day, LLM call rate per service, job queue length trend, per-fingerprint event rate, and cost per active user. Traditional monitoring catches errors, but AI cost incidents are usually successful behavior at high volume — they need pattern-based monitoring, not error-based monitoring.

How do multi-agent systems blow through token budgets?

Multi-agent setups can clarify each other indefinitely. Agent A produces output, Agent B asks for clarification, Agent A clarifies, Agent B clarifies the clarification, and so on until context or budget runs out. One ambiguous user query has produced 4,200+ LLM calls in documented cases. Hard-cap multi-agent loops at 10 turns; after that, return whatever you have and exit.

AI observability · 2026

Your AI app is silently burning $2,000/month.
Here are the 5 patterns.

Five real production patterns that quietly drain AI budgets while everything looks fine in your dashboard — retry loops, self-triggering agents, fingerprint aggregation blind spots, context bloat, abandoned crons. Plus what to do tonight, regardless of where you host.

Disclosure: I'm a senior backend tech lead and I run HostingGuru, where built-in AI monitoring is the feature I'm proudest of. This article will mention HostingGuru once near the end, but the patterns and detection methods below work on any platform — I want this useful even if you never become a customer.

The cleanest version of this story is one I keep hearing from founders, with small variations each time:

"We woke up to a $2,400 OpenAI bill. The product still works. Sentry is green. Our error rate is normal. We have no idea what happened."

Then they dig in. They find a webhook handler that's been retrying a Stripe event for 11 days because a key was rotated and the retry logic topped out at "30 minutes between attempts" instead of "stop after 24 hours." Each retry calls an LLM to summarize the event for an internal log. 11 days × 48 retries a day × 8K tokens per retry × $0.04. The math is unforgiving.

Or they find an agent that's been self-triggering. Or a context window that quietly grew from 4K to 80K tokens because nobody noticed a bug stuffing the entire conversation history into every prompt. Or a cron job that runs at 3am and produces output nobody reads, but produces it via Claude Sonnet at $3 per million input tokens.

This is the 2026 version of a problem that used to be small. AI made it expensive.

I want to walk you through the five patterns I see most often, why they're invisible to traditional monitoring, and what you can actually do about them tonight.

Why this is harder than it used to be

Pre-AI, a runaway loop in your app was annoying. It maxed out a CPU, your alerting noticed the CPU pegged, you got paged, you fixed it. Total damage: a few hours of degraded service, maybe a small AWS bill bump.

Post-AI, a runaway loop is expensive. Each iteration calls an LLM. Each LLM call costs real money. Worst of all, the loop doesn't show up as a problem in any of your existing tools:

Sentry: aggregates errors by fingerprint. Same retry loop = "1 issue, +850 events." It looks like one bug, not eight hundred.
CloudWatch / Datadog: traffic and CPU look fine — a retry loop is just a steady stream of requests.
Stripe / your billing dashboard: shows charges after they happen, on a 24–48h delay.
Your inbox: silent. The OpenAI / Anthropic / Stripe APIs don't email you when one customer is making 50,000 calls an hour.

The first signal you usually get is the credit card alert from your bank. By then, you're $1,000+ in.

Pattern 1: The infinite retry loop

The classic. A background job hits a transient error, your retry logic backs off exponentially, eventually retries every 30 minutes, but never gives up. The underlying issue is permanent: a webhook secret was rotated, an API key was deactivated, a file path was renamed. The job will retry forever.

If the job involves an LLM call (summarizing the error, deciding next action, generating a fallback response), every retry costs tokens. Multiply by however long until someone notices.

Real example I saw last month: a B2B SaaS doing email parsing. Their email parser used GPT-4 to extract structured data. One specific email format consistently failed validation downstream. The retry queue kept retrying. 11,000 emails × 6 retries × $0.10 per call = $6,600 wasted before the founder noticed.

How to detect it tonight: query your job queue (BullMQ, Sidekiq, Celery, whatever) for jobs that have been "active" or "failed" for more than 24 hours, or that have retried more than 10 times. Then set a hard cap: any job that retries more than 10 times triggers a page or gets dropped, no exceptions.
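Here's roughly what that check looks like once you've pulled job records out of your queue. This is a minimal sketch, not any queue's real API: the Job record, its field names, and find_runaway_jobs are all hypothetical, so map them onto whatever BullMQ, Sidekiq, or Celery actually gives you.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

MAX_ATTEMPTS = 10              # the hard cap from the text
MAX_AGE = timedelta(hours=24)  # "active" or "failed" for more than 24 hours


@dataclass
class Job:
    """Hypothetical job record; map your queue's own fields onto it."""
    id: str
    state: str            # e.g. "active", "failed", "completed"
    attempts: int         # how many times the job has been tried
    first_seen: datetime  # when the job was first enqueued (UTC)


def find_runaway_jobs(jobs, now=None):
    """Return jobs that exceeded the retry cap or have been stuck too long."""
    now = now or datetime.now(timezone.utc)
    runaway = []
    for job in jobs:
        stuck = job.state in ("active", "failed") and now - job.first_seen > MAX_AGE
        if job.attempts > MAX_ATTEMPTS or stuck:
            runaway.append(job)
    return runaway
```

Whatever this returns is your list of jobs to cancel tonight. The same two conditions are what you'd enforce inside the worker as the permanent cap.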

Pattern 2: The self-triggering agent

Multi-agent systems are particularly good at this one. Agent A produces output. Agent B reads agent A's output and decides "I should ping agent A for clarification." Agent A produces a clarification. Agent B reads it and decides "I should clarify the clarification." The conversation continues until you run out of context or money — whichever comes first.

I saw this kill a YC startup's monthly budget in 14 hours. They'd shipped a "research assistant" that orchestrated three agents. A user typed an ambiguous query. The agents started clarifying each other. By the time the user's session timed out, the system had made 4,200 LLM calls.

How to detect it tonight: hard-cap your multi-agent loops at 10 turns. After 10 back-and-forth iterations, the system returns whatever it has and exits. If you're using LangChain or similar, this is usually one config setting (LangChain's AgentExecutor, for example, takes max_iterations). If you've written your own orchestration, it's three lines of code.
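For hand-rolled orchestration, the cap can be as small as a counter on the loop. A minimal sketch, assuming each agent is a callable that takes a message and returns a reply plus a "done" flag. The run_agents function and that callable shape are hypothetical, not any framework's API.

```python
MAX_TURNS = 10  # hard cap from the text


def run_agents(agent_a, agent_b, query, max_turns=MAX_TURNS):
    """Alternate between two agents until one is done or the cap is hit.

    Each agent is a hypothetical callable: message -> (reply, done).
    Returns the last reply and the number of turns actually used.
    """
    agents = (agent_a, agent_b)
    message, turns = query, 0
    while turns < max_turns:
        reply, done = agents[turns % 2](message)
        turns += 1
        message = reply
        if done:
            break
    return message, turns
```

The important design choice is that hitting the cap is not an error: the function returns whatever the agents have produced so far, which matches "return whatever it has and exit."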

Pattern 3: The "fingerprint aggregation" blind spot

This is the most insidious one because it specifically defeats Sentry / Bugsnag / Honeybadger.

Error monitoring tools group errors by fingerprint (basically a hash of the stack trace + error message). Same fingerprint = "this is the same bug." The dashboard shows "+842 events on this issue" with a slowly incrementing counter.

The problem: a retry loop firing the same error 800 times looks identical to 800 different users hitting the same bug once. Your error tool can't tell them apart. Both show up as "+800 events on the same issue." If you're not specifically watching event-rate per fingerprint, you'll miss the loop entirely.

The default Sentry alerts trigger on new issues, not on suddenly-very-noisy existing issues. So a bug that's been silently looping at 50/sec for 6 hours doesn't trip any alerts.

How to detect it tonight: add a custom Sentry alert on "any single issue with > 100 events per hour." Most teams forget this exists. It's the alert that catches the silent loops.
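If your error tool can't express that rule, the same check is easy to approximate yourself. The sketch below counts events per fingerprint in a sliding one-hour window. FingerprintRateMonitor is a hypothetical helper, not part of Sentry, and it assumes timestamps arrive in non-decreasing order as plain seconds.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 3600  # one hour
THRESHOLD = 100        # "> 100 events per hour" from the text


class FingerprintRateMonitor:
    """Tracks event timestamps per fingerprint in a sliding window."""

    def __init__(self, window=WINDOW_SECONDS, threshold=THRESHOLD):
        self.window = window
        self.threshold = threshold
        self.events = defaultdict(deque)

    def record(self, fingerprint, timestamp):
        """Record one event; return True if the fingerprint is now 'hot'."""
        queue = self.events[fingerprint]
        queue.append(timestamp)
        # Drop events that have fallen out of the window.
        while queue and queue[0] <= timestamp - self.window:
            queue.popleft()
        return len(queue) > self.threshold
```

A bug that fires once an hour never trips this, no matter how old the issue is. A loop firing 50 times a second trips it within about two seconds.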

Pattern 4: The context window that quietly grew

Here's how this happens: you ship an AI feature with a 4K-token context window. Works fine in dev. In prod, a customer accumulates a long conversation history. Your code (or worse, Claude's code from when it built the feature) appends the entire conversation history to every new prompt without truncation.

Six months later, that customer has a 60K-token conversation. Every interaction now costs 15× what it did at launch. Multiplied across all your power users, you've quietly 5x'd your per-user AI cost without noticing — because the increase is gradual and the dashboard just shows "monthly OpenAI bill went up."

How to detect it tonight: log the input token count of every LLM call (most SDKs return this). Plot the p95 input token count over time. If it's trending up, you have context bloat. The fix is usually a sliding window or a summarization step.
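Both halves fit in a few lines. A minimal sketch, assuming you already have the logged input token counts as a list of integers and some way to count tokens per message. The p95 and sliding_window helpers and the count_tokens callable are hypothetical names, not part of any SDK.

```python
def p95(values):
    """Nearest-rank 95th percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n), 1-based
    return ordered[rank - 1]


def sliding_window(messages, max_tokens, count_tokens):
    """Keep the most recent messages whose total token count fits max_tokens.

    `count_tokens` is a hypothetical callable (message -> int); plug in
    whatever tokenizer matches your model.
    """
    kept, total = [], 0
    for message in reversed(messages):
        cost = count_tokens(message)
        if total + cost > max_tokens:
            break
        kept.append(message)
        total += cost
    return list(reversed(kept))
```

Compute p95 per day or per week over your logged counts and watch the trend. The sliding window drops the oldest messages first, which is the blunt version of the fix; a summarization step keeps more meaning at the cost of an extra LLM call.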

This is also where I see the most "Claude Code did this and now I owe $400" stories. Claude is generous with context — it'll happily concatenate everything if you don't tell it not to.

Pattern 5: The cron job that never reads its output

Less dramatic, more common: a 0 3 * * * cron job kicks off every night at 3am. It runs an analysis. It generates a report. It writes the report to a database table or an S3 bucket. Nobody reads the report.

This was useful when you had it built last year. Then the team member who used it left. Then the report became stale. Then it became wrong. But the cron keeps running every night, calling the LLM, eating tokens. Quietly.

How to detect it tonight: list every cron job in your system. For each one, ask: "if this stopped running tomorrow, would anyone notice within 7 days?" If the answer is no, kill it. (You can always add it back if someone complains.)

What "good monitoring" looks like for AI apps

Traditional monitoring tools (Sentry, Datadog, CloudWatch) are great at finding errors. They're bad at finding patterns.

The patterns above all share two properties:

They're not errors. They're successful behavior at high volume.
They don't trigger alerts. Each individual call looks fine. Only the aggregate rate is wrong.

What you actually need is a layer that watches behavior, not errors. Some signals worth tracking:

Token usage per user per day (spike = investigation trigger)
LLM call rate per service (steady ≠ healthy if it's been steady for 18 hours unattended)
Job queue length over time (growing slowly = retry loop accumulating)
Per-fingerprint event rate (the Sentry blind spot above)
Cost per active user (rising = something's bloating somewhere)

You can build this yourself. It takes about 2 weeks of work for a backend engineer. You can also use a platform that has it built in, which is what I want to be honest about now.

What I built (and why I built it)

I'm a senior backend tech lead. I've shipped production systems for BeReal, Oney, and Ringover. I built HostingGuru because the gap between "Sentry tells me when something errors" and "I get a Telegram ping at 3am that says this Stripe webhook handler has retried 200 times in the last hour, here's the link to the logs" was the gap I kept finding myself filling manually for clients.

HostingGuru's AI monitoring tails your production logs and alerts on patterns, not errors:

Retry loops detected when the same operation fires faster than expected, regardless of error rate
Token spikes detected when a user's per-day LLM cost jumps significantly
Hot fingerprints detected when one Sentry-style issue suddenly explodes in event rate
Anomalous response times detected when p95 latency jumps without an obvious traffic cause
Silent cron failures detected when a job that ran consistently for 30 days suddenly stops

Alerts go to Telegram by default — because that's where founders actually look at 3am. (Email and Slack also supported.)

It works on any app deployed to HostingGuru, on any of the 14+ frameworks we support. The alerts and pattern detection are part of the platform — no extra config, no extra subscription.

If you've ever woken up to a surprise bill, this is the layer that would have caught it before it happened.

What to do tonight, regardless of which platform you use

You don't need to switch hosts to catch most of these. Five concrete moves:

Run a query on your job queue for any job retrying more than 10 times. Cancel them.
Cap your multi-agent loops at 10 turns in code. One commit.
Add a Sentry alert on "any single issue with > 100 events per hour."
Log token counts on every LLM call and check p95 input tokens trend. If trending up, fix context truncation.
List every cron job and kill any whose output nobody reads.

These five moves take an evening. They prevent the vast majority of the surprise-bill stories I hear. Whether you do them on HostingGuru, Render, Railway, AWS, or your own VPS, you should do them.

The harder truth

The hardest part of running an AI-powered product in 2026 isn't building it. AI tools made building it 10x cheaper. The hard part is operating it — knowing what's running, what it's costing, what's broken in a way that doesn't show up as broken.

The cost of a runaway loop went from "annoying" to "expensive" the moment AI became a per-call API charge. The tools we use to monitor production didn't get the memo. Sentry was designed in a world where errors were the primary problem; it's still the best at that, but it's not the right tool for "your tokens are leaking somewhere."

Until that gap closes across the whole industry, you have to build it yourself or use a platform that has it built in. Either path is fine. The one path that doesn't end well is "we'll find out at the end of the month."

Catch the loop before the bill.

Pattern-based AI monitoring with Telegram alerts is included on every HostingGuru plan — even the free one.


Frequently asked

Quick answers

Why is my AI app's bill higher than expected even when traffic looks fine?

Common causes: a retry loop on a failing tool call, agents self-triggering on their own output, fingerprint-blind cost attribution that hides the worst caller, and unbounded context windows on long-lived sessions.

How do I detect a runaway AI agent loop before it costs $1,000?

Set a per-conversation token budget (e.g. 50k tokens) and hard-stop when exceeded. Log every tool call with conversation ID; alert when the same conversation makes more than 20 tool calls in 5 minutes.
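A rough sketch of both rules, kept in memory for illustration. ConversationGuard and BudgetExceeded are hypothetical names, the thresholds are the ones from the answer above, and timestamps are assumed to be plain seconds arriving in order. In production you'd likely back this with Redis or your database.

```python
from collections import defaultdict, deque

TOKEN_BUDGET = 50_000    # per-conversation budget from the text
MAX_TOOL_CALLS = 20      # more than this many calls...
WINDOW_SECONDS = 5 * 60  # ...within five minutes raises an alert


class BudgetExceeded(Exception):
    """Raised to hard-stop a conversation that spent its token budget."""


class ConversationGuard:
    def __init__(self):
        self.tokens = defaultdict(int)
        self.tool_calls = defaultdict(deque)

    def add_tokens(self, conversation_id, tokens):
        """Add usage; raise once the conversation exceeds its budget."""
        self.tokens[conversation_id] += tokens
        if self.tokens[conversation_id] > TOKEN_BUDGET:
            raise BudgetExceeded(conversation_id)

    def record_tool_call(self, conversation_id, timestamp):
        """Record a tool call; return True if the call rate warrants an alert."""
        calls = self.tool_calls[conversation_id]
        calls.append(timestamp)
        # Forget calls older than the five-minute window.
        while calls and calls[0] <= timestamp - WINDOW_SECONDS:
            calls.popleft()
        return len(calls) > MAX_TOOL_CALLS
```

Call add_tokens after every LLM response and record_tool_call before every tool invocation. The first is a hard stop, the second only signals that someone should look.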

What's the cheapest way to add cost observability to an AI app?

Stamp every LLM call with a (user ID, route, model) tuple and store the token count and cost in a Postgres table. A weekly query grouped by route surfaces the cost outliers — no dedicated APM needed.
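As a sketch of what that table and query might look like: the example below uses Python's built-in sqlite3 as a stand-in for Postgres so it runs anywhere. The llm_calls table, its column names, and the two helper functions are assumptions for illustration, and the weekly date filter is left out for brevity.

```python
import sqlite3

# sqlite3 stands in for Postgres here so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE llm_calls (
        user_id       TEXT NOT NULL,
        route         TEXT NOT NULL,
        model         TEXT NOT NULL,
        input_tokens  INTEGER NOT NULL,
        output_tokens INTEGER NOT NULL,
        cost_usd      REAL NOT NULL,
        created_at    TEXT DEFAULT CURRENT_TIMESTAMP
    )
    """
)


def record_call(user_id, route, model, input_tokens, output_tokens, cost_usd):
    """Insert one row per LLM call."""
    conn.execute(
        "INSERT INTO llm_calls"
        " (user_id, route, model, input_tokens, output_tokens, cost_usd)"
        " VALUES (?, ?, ?, ?, ?, ?)",
        (user_id, route, model, input_tokens, output_tokens, cost_usd),
    )


def cost_by_route():
    """Return (route, calls, total_cost) rows, most expensive route first."""
    return conn.execute(
        "SELECT route, COUNT(*), SUM(cost_usd) FROM llm_calls"
        " GROUP BY route ORDER BY SUM(cost_usd) DESC"
    ).fetchall()
```

Swapping the GROUP BY column for user_id or model gives you the other two cuts from the same table.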

Does HostingGuru offer AI cost monitoring?

AI cost monitoring is on the roadmap. Today, HostingGuru ships request-level metrics and AI-generated health reports that flag traffic anomalies — the same signal you'd use to catch a runaway agent.