How to Monitor Cron Jobs: The Complete Guide
Cron jobs are the backbone of every server. Backups, cleanup scripts, certificate renewals, database dumps, report generation — all scheduled, all running in the background, all assumed to be working.
Until they're not.
The worst part about a failed cron job isn't the failure itself. It's the silence. Cron doesn't page you when a job stops running. It doesn't send a Slack message. If you're lucky, there's a log entry buried in /var/log/syslog. If you're not, you find out three weeks later when your backups are gone and you need them.
This guide covers everything you need to monitor cron jobs properly: the patterns, the tools, and the specific setup steps.
Why cron jobs fail silently
Cron has no built-in concept of "this job should have run but didn't." It fires jobs on schedule. If the job exits non-zero, cron doesn't retry. If the machine was off during the scheduled time, cron doesn't catch up. If someone deletes the crontab entry, there's no warning.
Common failure modes:
- Script error — the job runs but fails partway through
- Permission change — a file or directory becomes inaccessible
- Dependency missing — a binary gets removed during an OS upgrade
- Disk full — the job can't write output
- Machine reboot — the scheduled time passes while the server is down
- Crontab overwritten — another deploy or config management run wipes the entry
- PATH issues — cron's environment is minimal, missing paths your shell has
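All of these tend to produce the same symptom: nothing. Unless cron's mail output is delivered somewhere and someone actually reads it, there is no output, no alert, and no indication that anything is wrong.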
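The PATH issue in particular is easy to reproduce before it bites. The sketch below runs a command in a stripped-down environment similar to the one cron provides. The function name `run_like_cron` is hypothetical, and the exact variables and PATH value differ between cron implementations, so treat the values here as assumptions rather than a faithful copy of your system's cron.

```shell
#!/bin/sh
# Sketch: run a command with a minimal, cron-like environment.
# Assumption: PATH=/usr/bin:/bin and SHELL=/bin/sh, which is a common
# default but not guaranteed for every cron implementation.
run_like_cron() {
  env -i \
    HOME="${HOME:-/}" \
    LOGNAME="${LOGNAME:-$(id -un)}" \
    PATH=/usr/bin:/bin \
    SHELL=/bin/sh \
    /bin/sh -c "$1"
}

# Usage (commented out so sourcing this file has no side effects):
# run_like_cron '/usr/local/bin/backup.sh'
# run_like_cron 'command -v pg_dump || echo "pg_dump is not on the cron PATH"'
```

If a script works in your shell but fails under `run_like_cron`, it most likely depends on something your login environment sets up.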
The dead man's switch pattern
The most reliable way to monitor cron jobs is to flip the problem around. Instead of watching for failures, expect a success signal and alert when it doesn't arrive.
This is called a dead man's switch (or heartbeat monitoring):
- Your cron job runs normally
- After completing successfully, it sends an HTTP ping to a monitoring endpoint
- The monitor knows your job's schedule (e.g., "every day at 3 AM")
- If no ping arrives within the expected window, the monitor triggers an alert
# Your crontab entry
0 3 * * * /usr/local/bin/backup.sh && curl -fsS https://monitor.example.com/ping/nightly-backup
The && is critical — curl only runs if backup.sh exits 0. A failed backup doesn't send a ping, and the monitor alerts you.
This catches every failure mode listed above. Script errors, permission issues, missing binaries, disk full, machine down, deleted crontab — all of them result in the same thing: no ping arrives, and you get alerted.
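On the monitor side, the core decision is small. Here is a minimal sketch, assuming the monitor stores the time of the last ping and knows the job's period and grace window in seconds. The function name `is_overdue` is hypothetical, and real monitors typically evaluate the cron expression itself rather than a fixed period, so this is a simplification.

```shell
#!/bin/sh
# Sketch of the monitor-side check: has the next ping failed to arrive in time?
# All arguments are integers: Unix timestamps for last_ping and now,
# seconds for period and grace.
is_overdue() {
  last_ping="$1"
  period="$2"
  grace="$3"
  now="$4"
  deadline=$(( last_ping + period + grace ))
  # Succeeds (exit 0) only when the deadline has already passed
  [ "$now" -gt "$deadline" ]
}

# Usage (daily job, 15 minute grace):
# is_overdue "$last_ping" 86400 900 "$(date +%s)" && echo "ALERT: nightly-backup is late"
```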
What to look for in a cron monitor
Not all monitoring tools handle cron well. General-purpose uptime monitors (Uptime Robot, Pingdom) check that a server is responding. Cron monitoring checks that a job ran on schedule. Different problem.
A good cron monitor needs:
- Cron expression parsing — it should understand that 0 3 * * * means "expect a ping daily around 3 AM" with a configurable grace period
- Per-job tracking — separate checks for each job, not one global health endpoint
- Multiple alert channels — email, webhook, Slack, PagerDuty
- Failure capture — ability to receive and store job output on failure for debugging
- Low overhead — adding monitoring shouldn't slow down your jobs
Option 1: Self-hosted monitoring
If you're already running your own infrastructure, a self-hosted cron monitor keeps everything under your control. No third-party dependency, no usage limits, no monthly fee.
cronguard
cronguard is a self-hosted cron monitor built for this exact use case. Single Go binary, no dependencies, runs on anything.
Setup:
# Download and run
./cronguard
# Open http://localhost:8099
Create a check with a name and schedule, then add the ping URL to your cron job:
# Basic: ping after success
0 3 * * * /usr/local/bin/backup.sh && curl -fsS http://localhost:8099/ping/nightly-backup
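# With output capture: pipe stdout/stderr to the ping
# Caution: in a pipeline curl runs whether or not backup.sh succeeds, so the ping is sent even when the backup fails
# --data-binary keeps line breaks in the captured output (-d would strip them)
0 3 * * * /usr/local/bin/backup.sh 2>&1 | curl -fsS --data-binary @- http://localhost:8099/ping/nightly-backup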
# Report failures explicitly
0 3 * * * /usr/local/bin/backup.sh || curl -fsS -X POST http://localhost:8099/ping/nightly-backup/fail
Add a webhook URL or email as the alert destination. When a job misses its window, you get notified.
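The crontab entries above call curl with no time limit, so an unreachable monitor could hold the job open. A small helper can cap that. This is a sketch: `ping_url` and `send_ping` are hypothetical names, the URL layout is copied from the examples above, and `MONITOR_URL` is an assumed environment variable. The curl flags are standard: `-m` limits each attempt and `--retry` retries transient errors.

```shell
#!/bin/sh
# Assumption: the monitor listens where the examples above point.
MONITOR_URL="${MONITOR_URL:-http://localhost:8099}"

# Build the ping URL for a check. Pass "fail" as the second argument
# to get the failure endpoint instead of the success endpoint.
ping_url() {
  slug="$1"
  status="${2:-ok}"
  if [ "$status" = "fail" ]; then
    printf '%s/ping/%s/fail\n' "$MONITOR_URL" "$slug"
  else
    printf '%s/ping/%s\n' "$MONITOR_URL" "$slug"
  fi
}

# Send the ping. -m 10 makes each attempt give up after 10 seconds;
# --retry 3 retries up to three times on transient errors.
send_ping() {
  curl -fsS -m 10 --retry 3 -o /dev/null "$(ping_url "$@")"
}

# Usage:
# /usr/local/bin/backup.sh && send_ping nightly-backup || send_ping nightly-backup fail
```

One caveat with the one-line usage: if the success ping itself fails, the `||` branch also fires and reports a failure. An explicit if/else on the exit code avoids that ambiguity.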
Docker Compose setup for homelabs:
services:
  cronguard:
    image: narrowcastdev/cronguard:latest
    restart: unless-stopped
    ports:
      - "127.0.0.1:8099:8099"
    volumes:
      - cronguard-data:/data
    env_file:
      - .env
volumes:
  cronguard-data:
Healthchecks.io (self-hosted)
Healthchecks.io is open source and can be self-hosted. It's a Django app, so it needs Python and a database (SQLite, PostgreSQL, or MySQL), which means more operational overhead than a single binary. But it has a polished web UI, integrations with dozens of alert services, and a strong community.
The trade-off: more features, more infrastructure to maintain.
Option 2: Hosted services
If you don't want to run your own monitoring infrastructure:
| Service | Free tier | Paid from | Notes |
|---|---|---|---|
| Healthchecks.io | 20 checks | $20/mo | Open source, self-hostable |
| Cronitor | 5 monitors | $24/mo | Mature, good integrations |
| Better Stack | 5 monitors | $24/mo | Formerly Better Uptime |
| Dead Man's Snitch | 1 snitch | $5/mo | Simple, focused |
All of these work the same way: create a check, get a ping URL, add it to your cron job.
Setting up alerts properly
The monitoring tool is only as good as its alert pipeline. A few things to get right:
Grace periods
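Cron jobs don't finish at exactly the scheduled second. System load, other jobs, and clock drift all introduce variance, and the success ping only arrives once the job has completed. Set a grace period that covers the job's normal runtime plus 5-15 minutes of slack so you don't get false alerts from normal jitter.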
Schedule: 0 3 * * * (daily at 3:00 AM)
Grace period: 15 min (alert if no ping by 3:15 AM)
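Because the ping is sent only after the job finishes, a long-running job needs a later deadline than its start time suggests. The arithmetic can be sketched as below. `alert_deadline` is a hypothetical helper, and whether your monitor counts runtime as part of the grace period depends on the tool, so check its documentation before relying on these numbers.

```shell
#!/bin/sh
# Sketch: latest acceptable ping time = start + expected runtime + grace.
# start is HH:MM; runtime and grace are whole minutes.
alert_deadline() {
  start="$1"
  runtime_min="$2"
  grace_min="$3"
  hh="${start%%:*}"
  mm="${start##*:}"
  # Strip one leading zero so "08" or "09" is not parsed as invalid octal
  hh="${hh#0}"
  mm="${mm#0}"
  total=$(( hh * 60 + mm + runtime_min + grace_min ))
  # Wrap around midnight with % 24
  printf '%02d:%02d\n' $(( (total / 60) % 24 )) $(( total % 60 ))
}

# Usage: a 3:00 AM job that normally takes 20 minutes, with 15 minutes of grace
# alert_deadline 03:00 20 15
```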
Alert fatigue
If you monitor 30 cron jobs and three of them are flaky, you'll start ignoring alerts. Fix or remove flaky jobs before adding monitoring. Every alert should mean "something is actually wrong."
Escalation
Set up two tiers:
- First alert — email or Slack message. "Your nightly backup didn't run."
- Repeated failure — after 2-3 missed pings, escalate to SMS or PagerDuty.
A single missed ping might be a transient network issue. Three in a row is a real problem.
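The two-tier rule can be written down as a tiny function. This is a sketch: `alert_tier` and `ESCALATE_AFTER` are hypothetical names, and the threshold of 3 is an assumption taken from the "three in a row" rule of thumb above. Most monitors expose this as a setting instead of code.

```shell
#!/bin/sh
# Assumption: escalate on the third consecutive missed ping.
ESCALATE_AFTER="${ESCALATE_AFTER:-3}"

# Map a count of consecutive missed pings to an alert tier.
alert_tier() {
  misses="$1"
  if [ "$misses" -le 0 ]; then
    echo "none"     # job is healthy, stay quiet
  elif [ "$misses" -lt "$ESCALATE_AFTER" ]; then
    echo "notify"   # first tier: email or Slack
  else
    echo "page"     # second tier: SMS or PagerDuty
  fi
}

# Usage:
# case "$(alert_tier "$consecutive_misses")" in
#   notify) echo "send email or Slack message" ;;
#   page)   echo "escalate to SMS or PagerDuty" ;;
# esac
```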
Monitoring wrapper script
Instead of adding curl to every crontab entry, use a wrapper:
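#!/bin/bash
# /usr/local/bin/cronwrap
# Usage: cronwrap <check-slug> <command...>
# Runs the command, then reports success or failure (with its output) to the monitor.
if [ "$#" -lt 2 ]; then
  echo "Usage: cronwrap <check-slug> <command...>" >&2
  exit 2
fi
MONITOR_URL="${CRONGUARD_URL:-http://localhost:8099}"
SLUG="$1"
shift
# Capture stdout and stderr together, and keep the command's exit code
OUTPUT=$("$@" 2>&1)
EXIT_CODE=$?
# -m 10 caps each ping at 10 seconds so a slow monitor can't stall the job.
# --data-binary keeps line breaks in the captured output (-d would strip them).
if [ "$EXIT_CODE" -eq 0 ]; then
  echo "$OUTPUT" | curl -fsS -m 10 --data-binary @- "${MONITOR_URL}/ping/${SLUG}" > /dev/null
else
  echo "$OUTPUT" | curl -fsS -m 10 --data-binary @- "${MONITOR_URL}/ping/${SLUG}/fail" > /dev/null
fi
# Exit with the command's own status so callers still see the real result
exit "$EXIT_CODE"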
Then your crontab stays clean:
0 3 * * * cronwrap nightly-backup /usr/local/bin/backup.sh
0 * * * * cronwrap hourly-cleanup /usr/local/bin/cleanup.sh
0 0 1 * * cronwrap monthly-report /usr/local/bin/report.sh
This captures output on both success and failure, reports the correct status, and keeps the monitoring URL out of individual crontab entries.
Monitoring cron in Docker and Kubernetes
Docker
If your cron jobs run inside Docker containers, the ping needs to reach the monitor. Use Docker networking:
services:
  cronguard:
    image: narrowcastdev/cronguard:latest
    ports:
      - "127.0.0.1:8099:8099"
  backup:
    image: your-backup-image
    command: sh -c '/backup.sh && curl -fsS http://cronguard:8099/ping/backup'
    depends_on:
      - cronguard
Services on the same Docker network can reach each other by service name, which is why the ping URL above uses cronguard as the hostname.
Kubernetes CronJobs
Kubernetes CronJobs have their own failure modes: pod scheduling delays, image pull failures, node pressure evictions. The dead man's switch pattern works the same way:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-backup
spec:
  schedule: "0 3 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: backup
              image: your-backup-image
              command:
                - sh
                - -c
                - |
                  /backup.sh && \
                  curl -fsS http://cronguard.monitoring:8099/ping/nightly-backup
          restartPolicy: Never
Checklist
Before you close this tab:
- List every cron job running on your servers (crontab -l, check /etc/cron.d/, /etc/cron.daily/, etc.)
- Set up a cron monitor (self-hosted or SaaS)
- Add ping URLs to each job with && (only ping on success)
- Set grace periods appropriate to each job's expected runtime
- Configure at least two alert channels (email + webhook or Slack)
- Test the alert pipeline — pause a job and verify you get notified
- Add monitoring to your new-job checklist so future jobs are covered from day one
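For the first item on the checklist, a short inventory script saves some clicking around. This is a sketch with hypothetical function names (`list_cron_files`, `active_entries`). The directories in the usage comment are the common Linux locations and may not all exist on your system. It also relies on `find -maxdepth`, which GNU and BSD find support but strict POSIX does not.

```shell
#!/bin/sh
# Print the regular files found directly inside each given directory.
# Directories that do not exist are skipped silently.
list_cron_files() {
  for dir in "$@"; do
    [ -d "$dir" ] || continue
    find "$dir" -maxdepth 1 -type f | sort
  done
}

# Filter crontab text on stdin down to non-blank, non-comment lines.
# Note: variable assignments such as MAILTO=... also pass through.
active_entries() {
  grep -Ev '^[[:space:]]*(#|$)'
}

# Usage (commented out so sourcing this file has no side effects):
# crontab -l 2>/dev/null | active_entries
# list_cron_files /etc/cron.d /etc/cron.hourly /etc/cron.daily /etc/cron.weekly /etc/cron.monthly
```

Run it as each user that owns jobs (including root), since `crontab -l` only shows the current user's entries.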
Every server has cron jobs. Most of them aren't monitored. The gap between "my backup script runs every night" and "I know my backup script ran last night" is one HTTP ping.
→ cronguard — self-hosted cron monitoring. Single binary. Free and open source.