If you’re familiar with crontab in Linux, there’s a good chance you’re
equally familiar with the infamous silent failures of cron jobs. Many of us
sysadmins and developers have been hit by these failures and not found out
until it was too late. Automated backups and monthly emails aren’t
always as automated (or as timely) as we tend to think. Herein lies the
problem.
My cron jobs send me an email when they run…isn’t that enough?
That can be true, but is there REALLY any value in knowing that they ran?
Isn’t that why you created the cron job in the first place–so that it does
its job? Sure, receiving an email of cron output after it runs is great.
However, the value lies in knowing when your cron jobs FAIL to run (or are
delayed). Then, you can investigate and fix the problem before it’s too
late. Not convinced? Consider this example from a longtime Dead Man’s Snitch
user, Kareem Mayan, co-founder of SocialWOD.com.
A little background:
“At SocialWOD.com we do workout tracking for gyms. When a new workout is
emailed to us from a customer (in the form of a photo of a whiteboard,
which has the workout and results), we put the data online. Once it’s
online, we email that gym’s clients telling them new workout results have
been posted.”
Great. Where’s the problem?
“When Delayed Job failed silently, we wouldn’t know until me or my
co-founder was prompted to look based on seeing something funny, e.g.
seeing a Stripe email about a new customer signup but NOT seeing the
automated welcome email to the new customer (sent by our system… which was
waiting in the database, ready for Delayed Job to pick it up, which would
never happen because that process had died).”
“The result would often be several days of emails (THOUSANDS of emails)
queued up until one of us manually restarted Delayed Job. This sucked
because customers would either get a ton of emails at once, and some would
be days old, or we would delete those emails before they got sent. This
also sucked because customers would never get notifications about their
posted results.”
How did Dead Man’s Snitch help?
“Using Dead Man’s Snitch made that problem go away. Now, if Delayed Job
dies, Dead Man’s Snitch never gets pinged, and we get an email as soon as
that happens. At most we’ll go five minutes - not days - before knowing
that we need to kick Delayed Job into action again.”
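The check-in pattern Kareem describes can be sketched in a few lines. This is only a rough illustration, not an official Dead Man’s Snitch client: the check-in URL is a placeholder, and the names run_and_check_in and http_ping are made up for this example. The idea is that the job pings the monitor only after it succeeds, so a crashed, failed, or never-started job produces silence, and silence is what triggers the alert.

```python
import subprocess
import urllib.request

# Placeholder URL (an assumption for this sketch); substitute the
# check-in URL your monitoring service gives you.
CHECK_IN_URL = "https://example.com/check-in/your-token"


def http_ping(url, timeout=10):
    """Send a plain GET request to the check-in URL."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()


def run_and_check_in(command, url=CHECK_IN_URL, ping=http_ping):
    """Run `command` and check in only if it exits with status 0.

    `command` is a list such as ["/path/to/job", "--flag"].
    Returns the command's exit status so the caller can still act on it.
    """
    result = subprocess.run(command)
    if result.returncode == 0:
        # Success: tell the monitor we are alive.
        ping(url)
    # On failure we deliberately stay silent; the missing ping is the signal.
    return result.returncode
```

You would call run_and_check_in from the script that cron launches, passing the real job as the command. If the job fails, or cron never starts it at all, no ping arrives and the monitor raises the alarm.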
If you can relate and want the peace of mind of knowing as soon as your
cron jobs fail, give Dead Man’s Snitch a try and sign up for free.
After all, your first snitch is on us. And if you can’t relate but
made it to the end anyway, I applaud you. Even if this topic doesn’t apply to
you, there’s a good chance it applies to your computer friends, IT department,
or website managers. Do them a favor and pass this on.
Happy Snitching!