May 4, 2026 · flndrn

Why we monitor every 5 minutes (not every 30 seconds)

Most monitoring tools brag about 30-second checks. We deliberately don't. Here's the engineering and operational reasoning behind our 5-minute interval.

When you compare uptime monitoring tools, almost every competitor leads with a number: "1-minute checks", "30-second checks", "real-time monitoring". It's the SaaS marketing equivalent of horsepower in a sports car ad — the figure that's supposed to tell you everything you need to know.

We monitor every 5 minutes. We're not embarrassed about that. We think it's the right answer for the people we're trying to serve.

What "30-second checks" actually buys you

The pitch is straightforward: faster checks = faster detection = fewer minutes of customer-facing downtime. If your site goes down at 14:03, a 30-second tool will tell you by 14:04, while a 5-minute tool might not tell you until 14:08.
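To make the arithmetic concrete, here's a back-of-the-envelope sketch of that scenario. It assumes the simplest possible model: detection happens on the first scheduled check after the outage starts, with no retries and no alert-delivery delay. No real monitor is quite that clean, but it shows where the headline number comes from.

    # Back-of-the-envelope detection time under the simplest model: the last
    # check passed an instant before the outage, so the first failing check
    # runs one full interval later. Check duration and alert delivery (a few
    # extra seconds to a minute) are ignored here.
    from datetime import datetime, timedelta

    def detected_by(outage_start: datetime, interval: timedelta) -> datetime:
        return outage_start + interval

    down_at = datetime(2026, 5, 4, 14, 3)
    print(detected_by(down_at, timedelta(seconds=30)))  # 2026-05-04 14:03:30
    print(detected_by(down_at, timedelta(minutes=5)))   # 2026-05-04 14:08:00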

That four-minute difference is real. But ask yourself: in those four minutes, what would you actually do?

If you're a solo founder running a marketing site, a SaaS app, or a few client sites, the honest answer is usually one of:

  • You're at your desk and you'd see the alert either way
  • You're asleep, and you're going to deal with it when you wake up regardless
  • You're in a meeting, and you can't excuse yourself to fix it anyway
  • You're in transit, and your laptop's already closed

The 4-minute window only matters when you can act on it. For an on-call SRE at a Series B startup with paging escalation chains, every minute genuinely counts. For a freelancer running 12 client sites, the difference between an alert at 14:04 and one at 14:08 is invisible. You're going to fix it after the meeting either way.

What 30-second checks actually cost

Here's the part the marketing pages don't mention. Doing a check every 30 seconds means:

  • 120 checks per hour per URL. For a low-traffic site on shared hosting, that's a noticeable chunk of its total requests, and hosts notice. We've seen WordPress hosts throttle monitoring traffic, return 429s, and trigger false-positive alerts.
  • More noise per signal. A monitoring check from a single location returns false positives all the time — DNS hiccups, TLS handshake failures, transient cloud regional issues. The faster you check, the more of these you see, and the more your alerts train you to ignore them.
  • More infrastructure to absorb. Sub-minute monitoring tools need multi-region check fleets to deduplicate false positives. That's why they all charge more. Not because the underlying signal is more valuable, but because the noise is louder.

5-minute checks run 12 times an hour. They smooth out the noise without losing the signal. If your site is genuinely down, it'll still be down 5 minutes from now.
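If you want to see why a slower cadence stays quiet, the standard trick is to require a couple of consecutive failures before alerting. Here's a minimal sketch of that idea; it isn't our production logic, and the threshold of two is an assumed value, but the shape is the same in most tools.

    # Minimal "confirm before alerting" sketch: a URL is declared down only
    # after N consecutive failed checks, and recovered on the next success.
    # Single blips (DNS hiccups, transient handshake failures) never page anyone.
    CONFIRMATIONS = 2  # assumed threshold, not a real setting in our product

    class UrlState:
        def __init__(self) -> None:
            self.consecutive_failures = 0
            self.alerted = False

    def record_check(state: UrlState, ok: bool) -> str | None:
        """Return 'down' or 'recovered' when a notification should fire, else None."""
        if ok:
            was_alerted = state.alerted
            state.consecutive_failures = 0
            state.alerted = False
            return "recovered" if was_alerted else None
        state.consecutive_failures += 1
        if state.consecutive_failures >= CONFIRMATIONS and not state.alerted:
            state.alerted = True
            return "down"
        return None

At a 5-minute cadence, two consecutive failed checks are at least five minutes apart, so the signal is already well separated from a transient blip. At 30 seconds, two failures can easily sit inside a single transient issue, which is why those tools lean on extra confirmations or multi-region agreement to get the same confidence.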

Where the 4 minutes actually go

If the goal is "minimize total customer-facing downtime", check frequency is one of several inputs. Most of the time, it's not the bottleneck.

The real time sinks in an outage:

  1. Detection latency (5 min on us, 30 sec on them) — the headline number
  2. Alert routing latency (10–60 sec on email, instant on chat) — varies by channel
  3. Human acknowledgment latency (2 min – 8 hours) — depends on time of day and how busy the recipient is
  4. Diagnosis latency (5 min – 2 hours) — depends on how complex the system is
  5. Mitigation latency (1 min – days) — restart, rollback, escalate to vendor, etc.

For a typical solo-founder operation, the human acknowledgment latency dwarfs everything else. If your site goes down at 03:12 on a Sunday, whether the "site is down" email lands at 03:13 or at 03:17 is not the constraint; you're reading it when you wake up either way.
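Some rough arithmetic makes the point. The numbers below are illustrative midpoints picked from the ranges in the list above, not measurements, and the overnight acknowledgment figure is doing most of the work.

    # Rough arithmetic on the stages above, using illustrative midpoints (minutes).
    stages = {
        "detection":      5.0,    # 0.5 with a 30-second tool
        "alert routing":  0.5,
        "acknowledgment": 240.0,  # overnight: you read it when you wake up
        "diagnosis":      20.0,
        "mitigation":     15.0,
    }

    total = sum(stages.values())
    saved = 5.0 - 0.5  # what a 30-second tool would shave off detection
    print(f"total downtime: ~{total:.0f} min")
    print(f"faster checks save: {saved:.1f} min ({100 * saved / total:.0f}% of total)")

Under those (admittedly unflattering) overnight numbers, the faster tool buys you under 2% of the total. Even in a best-case daytime scenario where you acknowledge within a couple of minutes, detection is one input among five.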

For an enterprise team with 24/7 NOC coverage, sub-minute detection is genuinely useful. If that's you, use a tool built for that job. We're not it.

What we optimized for instead

Because we didn't chase sub-minute checks, we had engineering budget to spend on things that matter more for our users:

  • Anomaly detection that's actually useful. Status flips are obvious. The subtler signals (an SSL certificate expiring in 5 days, a redirect chain changing after a misconfigured deploy, a server header drifting from nginx/1.21 to nginx/1.18 after someone rolled back the wrong way) get surfaced as anomalies. Detecting these doesn't depend on check frequency; it depends on having enough signal history to compare against. There's a sketch of what that looks like after this list.
  • Plain-English explanations. When an anomaly fires, Pro tier users get a 2-3 sentence explanation of what happened and what it might mean, written by Claude. Faster checks would just mean more identical alerts; better context per alert is more valuable.
  • Quiet by default. We email on actual issues only. No daily summaries, no per-check confirmations, no "your site is still up" pings. The weekly digest covers everything that's not an issue. This works because our checks are 5 minutes apart; at a 30-second cadence, the false-positive rate would force us to send "everything looks fine actually" follow-ups, which trains people to ignore the alerts.
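For the curious, here's roughly what history-based anomaly detection looks like, as promised above. This is a sketch, not our actual code: the snapshot fields, the 7-day SSL threshold, and the 50-check window are all illustrative assumptions.

    # Sketch of history-based anomaly detection: compare the latest check
    # snapshot against recent history and flag drifts. Field names and
    # thresholds are illustrative assumptions, not a real schema.
    from datetime import datetime

    def find_anomalies(history: list[dict], latest: dict) -> list[str]:
        anomalies: list[str] = []
        if not history:
            return anomalies  # nothing to compare against yet
        previous = history[-1]

        # TLS certificate close to expiry (naive datetimes for simplicity)
        days_left = (latest["ssl_expires"] - datetime.now()).days
        if days_left <= 7:
            anomalies.append(f"SSL certificate expires in {days_left} days")

        # Redirect chain changed since the previous check
        if latest["redirect_chain"] != previous["redirect_chain"]:
            anomalies.append("redirect chain changed since the previous check")

        # Server header drifted from anything seen in recent history
        recent_servers = {h["server_header"] for h in history[-50:]}
        if latest["server_header"] not in recent_servers:
            anomalies.append(f"server header changed to {latest['server_header']!r}")

        return anomalies

None of this cares whether checks arrive every 30 seconds or every 5 minutes; it cares that there's enough history to compare against.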

The honest version

5-minute checks are right for solo founders, freelancers, agencies running client sites, and small ops teams. They're wrong for high-frequency-trading firms, payment processors, and 24/7 enterprise SaaS.

If you're in the second group, you should be on a different kind of tool entirely: one with multi-region consensus, paging escalation, and a paid SRE rotation behind it.

If you're in the first group, the 4-minute headline number isn't actually the thing you're buying. You're buying "I get told when my site is down so I can fix it before too many customers notice." 5-minute checks deliver that. So would 1-minute checks. The difference between them is invisible in your operational reality, and pretending otherwise is the kind of feature theater that exists to justify higher prices.

We picked the boring number on purpose. If a 4-minute headline difference disqualifies us, you're not the customer we built this for, and that's fine.