add new blogpost about my site being down

2025-11-08 07:30:04 -05:00 · 33a7bfdee5
commit 33a7bfdee5
parent c0fefae605
4 changed files with 244 additions and 0 deletions
--- a/blog/whoops/index.html
+++ b/blog/whoops/index.html
@@ -0,0 +1,118 @@
+<!DOCTYPE HTML>
+<html lang="en">
+  <title>'whoops'</title>
+  <meta name="date" content="2025/11/07">
+  <link rel="stylesheet" href="/style.css">
+  <style>
+    img { width: 100%; }
+    red {
+      color: var(--cyan);
+      font-size: 0.5em;
+    }
+    green {
+      color: var(--green);
+      font-size: 0.5em;
+    }
+  </style>
+  <body id="blog">
+    <h1>whoops</h1>
+    <h2>Preface</h2>
+    <p>
+      Debian sucks. After attempting to update from Debian 10 -> 11 -> 12 -> 13
+      so that I could have post-quantum key exchange and get rid of the annoying
+      ssh message <a href="https://www.openssh.org/pq.html">found here</a>, I
+      bricked my system. I've had enough; I'm switching to NixOS. Over the past
+      week I've been working on setting up a flake for all my servers and this
+      is the straw that's broken the camel's back. If you're reading this then
+      obviously I got it working and am no longer suffering a mild heart attack
+      \o/.
+    </p>
+    <p>
+      I think I'm gonna start making backups. Which is kinda a no-brainer, but
+      in my defense I set this server up prior to understanding quite how
+      fickle Debian truly is. I may end up posting again soon to share my
+      backup solution (if I get it working).
+    </p>
+    <p>
+      Everything following the preface was written live while I was working on
+      my server. I'm aware that I very likely could've recovered the system, but
+      I chose not to.
+    </p>
+    <h2>Initial Incident <red>0 minutes in the red</red></h2>
+    <p>
+      I was in the middle of updating my system from Debian 12 -> 13. I had just
+      finished pulling the packages from the mirror, and was halfway through
+      installing them when I got booted from the ssh connection. To me this
+      looked like the system just restarted, so I went to ssh back in. Nothing.
+      Okay, that's weird. I went over to my VPS provider's website, logged in,
+      and opened the VNC. Fuck.
+    </p>
+    <img src="/blog/whoops/pics/panic.png" alt="kernel panic">
+    <p>
+      So maybe I should've been taking more care to update the system properly.
+      Whatever, this was bound to happen at least once in my life.
+    </p>
+    <h2>Recoverability <red>5 minutes in the red</red></h2>
+    <p>
+      This isn't the first time I've managed to make a drive unbootable, but it
+      is the first time I've done it to a remote server. I knew the next steps
+      I had to take, so I went into the Vultr dashboard and mounted a live ISO
+      to see if I had completely trashed the drive. Luckily, I had not, and it
+      seems as though it was just the boot sector.
+    </p>
+    <h2>Backing everything up <red>1 hour in the red</red></h2>
+    <p>
+      Since the actual data on the drive was all good, I started compressing all
+      the important files and noting down their locations. The reason for
+      noting down their locations is so that I could come in later and scp them
+      out.
+    </p>
+    <h2>Configuring the new system <red>1.5 hours in the red</red></h2>
+    <p>
+      Because I use my server to host my mail, I really need to have as little
+      downtime as possible (although I'm fine with missing some mail overnight).
+    </p>
+    <h2>Deploying the new system <red>4 hours in the red</red></h2>
+    <p>
+      Just like in the recoverability section, I loaded up a live ISO, but this
+      time it was NixOS. I decided to use the GUI installer because it was
+      1:30am, and I know how to pick my battles. After starting the install
+      I had to wait around 30 minutes just for it to fail, saying that the
+      system ran out of memory. To those who saw this coming: kudos to you.
+    </p>
+    <p>
+      Take 2: After loading the non-graphical live ISO, I popped open the docs
+      and started manually installing NixOS.
+    </p>
+    <h2>A working(ish) mailbox <red>7 hours in the red</red></h2>
+    <p>
+      I've gotten my mail working, and all that's left is to get my website
+      back up.
+    </p>
+    <h2>Git server is up <red>7.5 hours in the red</red></h2>
+    <p>
+      I've gotten my git server up using Forgejo and have all of my git repos
+      back up.
+    </p>
+    <h2>Main site technically up <red>9.5 hours in the red</red></h2>
+    <p>
+      I've now gotten my main website to return a 502 using nginx.
+    </p>
+    <h2>Main site back up <green>0 minutes in the green</green></h2>
+    <p>
+      Finally working! Approximately 10 hours after going down, I've managed
+      to get my main site back up again. That's less time than AWS was down
+      for ;).
+    </p>
+    <p>
+      If you couldn't tell from the ever-shortening sections, I got tired and
+      multitasking was becoming quite difficult. I will sort out the rest of my
+      server later, including the voidpkgs repo (which I'm sure nobody uses).
+      Additionally, I am going to push my NixOS config to my git server so that
+      people can see what my infra looks like and so that I can manage my own
+      server more easily. Just in case anyone was wondering what my mini outage
+      looked like, here's the health status for the last 24 hours:
+    </p>
+    <img src="/blog/whoops/pics/health.png" alt="health stats">
+  </body>
+</html>
--- a/blog/whoops/pics/health.png
+++ b/blog/whoops/pics/health.png
--- a/blog/whoops/pics/panic.png
+++ b/blog/whoops/pics/panic.png