this post was submitted on 07 Nov 2023

Technology


autotldr@lemmings.world

This is the best summary I could come up with:


Cloudflare's main network and security duties continued as normal throughout the outage, even if customers couldn't make changes to their services at times, Prince said.

We're told by Prince that "counter to best practices, Flexential did not inform Cloudflare that they had failed over to generator power," so Cloudflare had no warning that trouble might be coming and that contingencies should be put in place.

Whatever the reason, a little less than three hours later at 1140 UTC (0340 local time), a PGE step-down transformer at the datacenter – thought to be connected to the second 12.47kV utility line – experienced a ground fault.

By that, he means that at 1144 UTC – four minutes after the transformer ground fault – Cloudflare's network routers in PDX-04, which connected the cloud giant's servers to the rest of the world, lost power and dropped offline, like everything else in the building.

At this point, you'd hope the servers in the other two datacenters in the Oregon trio would automatically pick up the slack and keep critical services running in the absence of PDX-04; that is what Cloudflare said it had designed its infrastructure to do.

The control plane services were able to come back online, allowing customers to make changes intermittently, and were fully restored about four hours after the failover, according to the cloud outfit.


The original article contains 1,302 words; the summary contains 228 words. Saved 82%. I'm a bot and I'm open source!