Outage since last night was my fault, derp


#1

If you encountered a 502 error getting to the forums over the past 12-ish hours, that was me.

I had the hosting company do a BIOS upgrade on the web server last night, which required a reboot. It appears that someone hit the CoG forum URL at exactly the right instant at the tail end of the reboot, just before the Discourse docker container was fully back online. The 502 Bad Gateway response was cached, and Varnish has been serving it ever since. It would eventually have fixed itself once the cached entry expired, but I’ve killed & restarted Varnish and that appears to have fixed the problem.
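Why one unlucky request can stick around for hours: a cache stores whatever the backend returned, status code included, and keeps serving it until its TTL runs out or the cache is emptied. The toy Python sketch below illustrates that; it is not Varnish’s actual logic, and the 24-hour TTL, the `cache_errors` switch, and the fake backend are all hypothetical. (In Varnish itself, the decision of whether and how long to cache a backend response is made in VCL’s `vcl_backend_response`, e.g. by checking `beresp.status` and adjusting `beresp.ttl`.)

```python
import time
from dataclasses import dataclass


@dataclass
class Entry:
    status: int
    body: str
    expires_at: float


class TTLCache:
    """Toy reverse-proxy cache (hypothetical; not how Varnish is implemented)."""

    def __init__(self, ttl_seconds, cache_errors=True, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.cache_errors = cache_errors  # hypothetical switch: store 5xx responses too?
        self.clock = clock
        self.store = {}

    def get(self, url, fetch):
        now = self.clock()
        entry = self.store.get(url)
        if entry is not None and entry.expires_at > now:
            # Cache hit: the backend is never consulted, even if it has recovered.
            return entry.status, entry.body
        status, body = fetch(url)
        if status < 500 or self.cache_errors:
            self.store[url] = Entry(status, body, now + self.ttl)
        return status, body

    def purge(self):
        # Roughly what restarting an in-memory cache does: everything is dropped.
        self.store.clear()


class FakeClock:
    """Manually advanced clock so the example is deterministic."""

    def __init__(self):
        self.t = 0.0

    def __call__(self):
        return self.t


def make_backend():
    """Hypothetical backend: 502 on the first call (container still starting), then 200."""
    calls = {"n": 0}

    def fetch(url):
        calls["n"] += 1
        return (502, "Bad Gateway") if calls["n"] == 1 else (200, "forum")

    return fetch


clock = FakeClock()
naive = TTLCache(ttl_seconds=86400, cache_errors=True, clock=clock)
fetch = make_backend()
print(naive.get("/", fetch))  # (502, 'Bad Gateway') -- the unlucky first request
clock.t = 3600
print(naive.get("/", fetch))  # (502, 'Bad Gateway') -- still served from cache an hour later
naive.purge()
print(naive.get("/", fetch))  # (200, 'forum') -- fresh fetch after the purge
```

With `cache_errors=False`, the 502 is passed through to that one visitor but never stored, so the next request goes straight to the (by then healthy) backend.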

Thanks to @nabiki for the heads-up!

tl;dr - Caching. It’s always caching.


#2

Oops, that probably was me

Muhuhahahaha

/runs away


#3

Who do I contact for a refund?


#4

Wait, you pay for BigDino??


#5

I’m on the premium plan.


#6

Premium plan eh?

You have Lee’s personal mobile number? :rofl:


#7

Well, shicks, that may have been me… I was uploading pics and it was giving me weird error messages, but I figured I was just doing it wrong, so I continued on my way. Then the forums were gone. Sorry!


#8

It’s no problem! I didn’t mean to make it sound like anyone was to blame or anything like that; it’s just a consequence of upgrading in the middle of production rather than doing things the “right” way.

The “right” way would be to pick an off-hours outage window, alert everybody about the outage at intervals starting 48 hrs prior to the picked time, switch the site over to a maintenance/outage “we’ll be right back” page, do the upgrade, and then cut back live.



#9

I am Spartacus


#10

:joy: lol hehe

But I’m also guilty of doing things like rebooting a DC during the day…


#11

Heyyyyyyyyyyyyy I did it again. Apologies for anyone who saw a weird-ass error for a bit this evening.

The good news, though, is that everything in the entire web stack now uses unix sockets instead of loopback ports! Which…okay, yeah, that sounds a lot less impressive because it’s just kind of an under-the-hood change, but it was neat figuring out how to make it work!
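For anyone curious what “unix sockets instead of loopback ports” means in practice: instead of one component listening on something like `127.0.0.1:<port>`, it listens on a socket file, and the next component in the chain connects to that path. The post doesn’t say which pieces or settings were changed, so the sketch below is only a generic Python illustration (Linux/macOS) of serving and fetching HTTP over an `AF_UNIX` socket; the handler, the response text, and the socket path are hypothetical.

```python
import os
import socket
import socketserver
import tempfile
import threading
from http.server import BaseHTTPRequestHandler


class Hello(BaseHTTPRequestHandler):
    """Hypothetical upstream app answering plain HTTP on a unix socket."""

    def do_GET(self):
        body = b"hello over a unix socket\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # The default logger indexes client_address, which is empty for
        # AF_UNIX peers, so logging is silenced in this sketch.
        pass


def serve(path):
    # Binds and listens on the socket file immediately; requests are
    # handled on a background thread.
    server = socketserver.UnixStreamServer(path, Hello)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def get(path, url="/"):
    # A client (e.g. a reverse proxy) connects to the file path, not a port.
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(path)
        s.sendall(f"GET {url} HTTP/1.0\r\nHost: localhost\r\n\r\n".encode())
        chunks = []
        while True:
            data = s.recv(4096)
            if not data:  # server closes the connection after an HTTP/1.0 reply
                break
            chunks.append(data)
    return b"".join(chunks)


with tempfile.TemporaryDirectory() as d:
    sock_path = os.path.join(d, "app.sock")  # hypothetical socket path
    server = serve(sock_path)
    reply = get(sock_path)
    print(reply.split(b"\r\n", 1)[0])  # b'HTTP/1.0 200 OK'
    server.shutdown()
    server.server_close()
```

One practical upside of socket files over loopback ports is that access is governed by ordinary filesystem permissions on the socket path, and there’s no port number to collide with another service.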


#12

I’d be worried if you weren’t tinkering under the hood around here…


#13

All of today I’ve noticed that CoG is slow to load, loads a blank page, then works normally after refreshing. Anyone else seeing that?


#14

Works fine on mobile here in Ookville


#15

Yes, I left something out of the config and it was causing a lot of really weird unintended redirects. Should be OK as of yesterday evening. @Nabiki, if you’re still seeing oddness, please do let me know.


#16

It loaded fine just now. It didn’t this morning. I’ll let you know if I experience that issue again. :slight_smile:


#17

Yep. Still happening. Going to see if it happens at work or just here at home, and I’ll report back.


#18

I get a lot of load errors with the site when I’m at home, but it seems fine at work.


#19

I’m seeing it too.


#20

It’s fine at work. I’ll clear my cache at home.