feat(wpcarro/ava): Support earlyoom

Strange start to my Monday: I spent ~2h debugging my hanging NixOS
machine. As far as I know I hadn't changed my configuration in a way that would
trigger this, and the hangs were hard to reproduce:
- graphical X sessions hung (once when opening Chrome)
- TTYs hung (during `nix-build` and `rebuild-system`)

Per kn's recommendation: whenever a system is hanging, check whether it's
reachable over the network (e.g. SSH). Since I didn't have my laptop, I
downloaded Termius on my iPhone and used it to mosh into ava, which was a
surprisingly nice UX.

I suspect my machine (with only 8GB of RAM) was OOMing, but I'm not
certain. Thanks to grfn I installed `earlyoom`. For more commentary, check out
Profpatsch's blog post about this: https://profpatsch.de/notes/preventing-oom
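
For reference, a minimal sketch of the same setting with its thresholds spelled
out. The two threshold options are my assumption about what the NixOS module
exposes, and the values only restate the 10% limits described in the comment in
the diff; this commit itself sets nothing beyond `earlyoom.enable`.

```nix
{
  services.earlyoom = {
    enable = true;
    # Assumed option names and values, not part of this commit: act only when
    # available memory AND free swap are both below these percentages.
    freeMemThreshold = 10;
    freeSwapThreshold = 10;
  };
}
```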

What went well:
- Thankfully I installed a Matrix client on my iPhone last week, which allowed
  me to troubleshoot with the #tvl folks

Action items:
- I'd like some instrumentation like Prometheus, Loki (`journald`, `dmesg`), so
  that I can accumulate troubleshooting information that isn't destroyed when I
  reboot my machine (which I did half a dozen times today).
- Consider adding `git` metadata to `system.nixos.label` to get more useful
  information in a GRUB/EFI context.
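
A rough sketch of the `system.nixos.label` idea, not something this commit
implements. `gitRev` is a hypothetical placeholder; how the real revision would
be passed into the configuration is left open here.

```nix
{ config, ... }:

let
  # Hypothetical placeholder: in practice this would be supplied by whatever
  # builds the system, not hard-coded.
  gitRev = "abc1234";
in
{
  # Append the revision to the NixOS version so each boot menu entry shows
  # which commit a generation was built from.
  system.nixos.label = "${config.system.nixos.version}-${gitRev}";
}
```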

More unknowns:
- Why can't I switch back to EFI (from GRUB) for my bootloader?

Change-Id: Ie2a5a15f5c0ead346d50e331fa2937f8f3453960
Reviewed-on: https://cl.tvl.fyi/c/depot/+/5625
Tested-by: BuildkiteCI
Reviewed-by: wpcarro <wpcarro@gmail.com>
Autosubmit: wpcarro <wpcarro@gmail.com>
William Carroll 2022-05-16 12:05:19 -07:00 committed by clbot
parent c16a18a718
commit d100c1f49f

@@ -42,6 +42,10 @@ in
};
services = wpcarro.common.services // {
# Check the amount of available memory and free swap a few times per second
# and kill the largest process if both are below 10%.
earlyoom.enable = true;
tailscale.enable = true;
openssh.enable = true;