Thus starts the most embarrassing post-mortem I’ve ever written.
The EA Forum went down for 5 minutes today. My sincere apologies to anyone whose Forum activity was interrupted.
I was first alerted by Pingdom, which I am very glad we set up. I immediately knew what was wrong. I had just hit “Stop” on the (long unused and just archived) CEA Staff Forum, which we built as a test of the technology. Except I had actually hit “Stop” on the EA Forum itself. I turned it back on; it took a long minute or two, but it was soon back up.
...
Lessons learned:
* I’ve seen sites that, after you press the big red button that says “Delete”, make you enter the name of the service / repository / etc. you want to delete. I like those, but did not think of porting the habit to sites without that feature. I think I should install a TAP: whenever I hit a big red button, I first confirm the name of the service I am stopping.
* The speed of the fix leaned heavily on the fact that Pingdom was set up. But it doesn’t catch everything. In case it misses something, I just set things up so that anyone can email me with “urgent” in the subject line and I will get notified on my phone, even if it is on silent. My email is jp at organizationwebsite.
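
For anyone who stops services from a script rather than a web console, the first lesson could be sketched as a small wrapper. This is only a minimal illustration, not what runs on the Forum: `guarded_stop`, `names_match`, and the service names are hypothetical, and `stop` stands in for whatever call actually stops the service.

```python
from typing import Callable


def names_match(expected: str, typed: str) -> bool:
    """True only if the typed name equals the target name (surrounding whitespace ignored)."""
    return typed.strip() == expected


def guarded_stop(
    service_name: str,
    stop: Callable[[str], None],
    ask: Callable[[str], str] = input,
) -> bool:
    """Call `stop(service_name)` only after the operator retypes the service name.

    `stop` is a placeholder for the real (hypothetical here) stop action.
    Returns True if the service was stopped, False if the confirmation failed.
    """
    typed = ask(f'Type "{service_name}" to confirm stopping it: ')
    if not names_match(service_name, typed):
        print(f"Confirmation did not match {service_name!r}; nothing was stopped.")
        return False
    stop(service_name)
    return True
```

The point of the design is that the prompt shows the name of the service that is actually about to be stopped. If I believe I am stopping the staff forum but the prompt asks me to type the name of the EA Forum, the mismatch is hard to miss, and typing the name I had in mind stops nothing.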