0a06319d0c
... already having second thoughts about loadAddress
511 lines
18 KiB
Text
511 lines
18 KiB
Text
Thu Sep 22 00:12:42 BST 2022
|
||
|
||
Making quite reasonable progress, though only running under emulation.
|
||
Since almost everything so far has been a recap of nixwrt, that's to
|
||
be expected.
|
||
|
||
The example config starts some services at boot, or at least attempts
|
||
to. Next we shoud
|
||
|
||
- add some network config to run-qemu
|
||
- implement udhcp and odhcp properly to write outputs
|
||
and create resolv.conf and all that
|
||
- write some kind of test so we can refactor the crap
|
||
- not let the tests write random junk everywhere
|
||
|
||
Thu Sep 22 12:46:36 BST 2022
|
||
|
||
We can store outputs in the s6 scan directory, it seems:
|
||
|
||
> There is, however, a guarantee that s6-supervise will never touch subdirectories named data or env. So if you need to store user information in the service directory with the guarantee that it will never be mistaken for a configuration file, no matter the version of s6, you should store that information in the data or env subdirectories of the service directory.
|
||
|
||
https://skarnet.org/software/s6/servicedir.html
|
||
|
||
> process 'store/pj0b27l5728cypa5mmagz0q8ibzpik0h-execline-mips-unknown-linux-musl-2.9.0.1-bin/bin/execlineb' started with executable stack
|
||
|
||
https://skarnet.org/lists/skaware/1550.html
|
||
|
||
|
||
Thu Sep 22 16:14:49 BST 2022
|
||
|
||
what network peers do we want to model for testing?
|
||
|
||
- wan: pppoe
|
||
- wan: ip over ethernet, w/ dhcp service provided
|
||
- wan: l2tp over (ip over ethernet, w/ dhcp service provided)
|
||
- lan: something with a dhcp client
|
||
|
||
https://accel-ppp.readthedocs.io/en/latest/ could use this for testing
|
||
pppoe and l2tp?
|
||
|
||
|
||
Thu Sep 22 22:57:47 BST 2022
|
||
|
||
To build a nixos vm with accel-ppp installed (not yet configured)
|
||
|
||
nix-build '<nixpkgs/nixos>' -A vm -I nixos-config=./tests/ppp-server-configuration.nix -o ppp-server
|
||
QEMU_OPTS="-display none -serial mon:stdio -nographic" ./ppp-server/bin/run-nixos-vm
|
||
|
||
To test it's configured I thought I'd run it against an OpenWrt qemu
|
||
install, so, fun with qemu networking ensues. This config in ../openwrt-qemu
|
||
is using two multicast socket networks -
|
||
|
||
nix-shell -p qemu --run "./run.sh ./openwrt-22.03.0-x86-64-generic-kernel.bin openwrt-22.03.0-x86-64-generic-ext4-rootfs.img "
|
||
|
||
so hopefully we can spin up other VMs connected either to its lan or
|
||
its wan: *however* we do first need to configure its wan to use pppoe
|
||
|
||
uci set network.wan=interface
|
||
uci set network.wan.device='eth1'
|
||
uci set network.wan.proto='pppoe'
|
||
uci set network.wan.username='db123@a.1'
|
||
uci set network.wan.password='NotReallyTheSecret'
|
||
|
||
(it's ext4 so this will probably stick)
|
||
|
||
|
||
Fri Sep 23 10:27:22 BST 2022
|
||
|
||
* mcast=230.0.0.1:1234 : access (interconnect between router and isp)
|
||
* mcast=230.0.0.1:1235 : lan
|
||
* mcast=230.0.0.1:1236 : world (the internet)
|
||
|
||
|
||
Sun Sep 25 20:56:28 BST 2022
|
||
|
||
TODO - bugs, missing bits, other infelicities as they occur to me:
|
||
|
||
DONE 1) shutdown doesn't work as its using the busybox one not s6.
|
||
|
||
2) perhaps we shouldn't have process-based services like dhcp, ppp
|
||
implement "address provider interface" - instead have a separate
|
||
service for interface address that depends on the service and uses its
|
||
output
|
||
|
||
* ppp is not like dhcp because dhcp finds addresses for an existing
|
||
interface but ppp makes a new one
|
||
|
||
3) when I killed ppp it restarted, but I don't think it reran
|
||
defaultroute which is supposed to depend on it. (Might be important
|
||
e.g. if we'd been assigned a different IP address). Investigate
|
||
semantics of s6-rc service dependencies
|
||
|
||
DONE 4) make the pppoe test run unattended
|
||
|
||
5) write a test for udhcp
|
||
|
||
6) squashfs size is ~ 14MB for a configuration with not much in it,
|
||
look for obvious wastes of space
|
||
|
||
7) some of the pppoe config should be moved into a ppp service
|
||
|
||
8) some of configuration.nix (e.g. defining routes) should be moved into
|
||
tools
|
||
|
||
DONE 9) split tools up instead of having it all one file
|
||
|
||
10) is it OK to depend on squashfs pseudofiles if we might want to
|
||
switch to ubifs? will there always be a squashfs underneath? might
|
||
we want to change the pseudofiles in an overlay?
|
||
|
||
11) haven't done (overlayfs) overlays at all
|
||
|
||
12) overlay.nix needs splitting up
|
||
|
||
13) upgrade ppp to something with an ipv6-up-script option
|
||
|
||
14) add ipv6 support generally
|
||
|
||
15) "ip address add" seems to magically recognise v4 vs v6 but
|
||
is that specified or fluke?
|
||
|
||
16) tighten up the module specs. (DONE) services.foo should be a s6-rc
|
||
service, (DONE) kernel config should be checked in some way
|
||
|
||
DONE 17) rename nixwrt references in kernel builder
|
||
|
||
18) maybe stop suffixing all the service names with .service
|
||
|
||
19) syslogd - use busybox or s6?
|
||
|
||
chat -s -S ogin:--ogin: root / "ip address show dev ppp0 | grep ppp0" 192.168.100.1 "/nix/store/*-s6-linux-init-*/bin/s6-linux-init-hpr -p"
|
||
|
||
|
||
Working towards a general goal of having a derivation we can
|
||
usefully run `nix path-info` on - or some other tool that will
|
||
tell us what's making the images big. The squashfs doesn't
|
||
have this information.
|
||
|
||
Towards that end (really? can't remember how ...) what would be a
|
||
way for packages to declare "I want to add files to /etc"? Is that
|
||
even a good idea?
|
||
|
||
Thinking we should turn s6-init-files back into a real derivation.
|
||
|
||
Tue Sep 27 00:31:45 BST 2022
|
||
|
||
> Thinking we should turn s6-init-files back into a real derivation.
|
||
|
||
This turns out to be Not That Simple, because it contains weird shit
|
||
(sticky bits and fifos).
|
||
|
||
Tue Sep 27 09:50:44 BST 2022
|
||
|
||
* allow modules to register activation scripts that are run on the
|
||
root filesystem once all packages are installed
|
||
|
||
- do they run on build or on host? if we're upgrading in place
|
||
how do we ship filesystem changes to the host?
|
||
|
||
or:
|
||
|
||
* allow modules to declare environment.*, use pseudofile on build and
|
||
create real files on host. will need to keep the implementation on
|
||
host faily simple because restricted environment
|
||
|
||
Tue Sep 27 16:14:18 BST 2022
|
||
|
||
TODO list is getting both longer and shorter, though longer on
|
||
average.
|
||
|
||
2) perhaps we shouldn't use process-based services like [ou]dhcp as
|
||
queryable endpoint for interface addresses (e.g. when adding routes).
|
||
Instead have a separate service for interface address that depends on
|
||
the *dhcp and uses its output
|
||
|
||
3) when I killed ppp it restarted, but I don't think it reran
|
||
defaultroute which is supposed to depend on it. (Might be important
|
||
e.g. if we'd been assigned a different IP address). Investigate
|
||
semantics of s6-rc service dependencies
|
||
|
||
4) figure out a nice way to fit ppp into this model as it actually
|
||
creates the interface instead of using an existing unconfigured one
|
||
|
||
5) write a test for udhcp
|
||
|
||
7) some of the pppoe config should be moved into a ppp service
|
||
|
||
11) haven't done (overlayfs) overlays at all
|
||
|
||
13) upgrade ppp to something with an ipv6-up-script option, move ppp and pppoe derivations into their own files
|
||
|
||
14) add ipv6 support generally
|
||
|
||
15) "ip address add" seems to magically recognise v4 vs v6 but
|
||
is that specified or fluke?
|
||
|
||
19) ship logs somehow to log collection system
|
||
|
||
21) dhcp, dns, hostap service for lan
|
||
|
||
22) support real hardware
|
||
|
||
Tue Sep 27 22:00:36 BST 2022
|
||
|
||
Found the cause of huge image size: rp-pppoe ships with scripts that
|
||
reference build-time packages, so we have x86-64 glibc in there
|
||
|
||
We don't need syslog just to accommodate ppp, there's an underdocumented
|
||
option for it to log to a file descriptor
|
||
|
||
Wed Sep 28 16:04:02 BST 2022
|
||
|
||
Based on https://unix.stackexchange.com/a/431953 if we can forge
|
||
ethernet packets we might be able to write tests for e.g. "is the vm
|
||
running a dhcp server"
|
||
|
||
Wed Sep 28 21:29:05 BST 2022
|
||
|
||
We can use Python "scapy" to generate dhcp request packets, and Python
|
||
'socket' model to send them encapsulated in UDP. Win
|
||
|
||
It's extremely janky python
|
||
|
||
Thu Sep 29 15:24:37 BST 2022
|
||
|
||
Two points to ponder
|
||
|
||
1) where service config depends on outputs of other services, we
|
||
do that rather ugly "$(cat ${output ....})" construct. Can we improve on
|
||
that? Maybe we could have some kind of tooling to read them as environment
|
||
variables ...
|
||
|
||
2) we have given no consideration yet to secrets. we want the secrets to
|
||
be not in the store; we want some way of refreshing them when they change
|
||
|
||
Sat Oct 1 14:24:21 BST 2022
|
||
|
||
The MAC80211_HWSIM kernel config creates virtual wlan[01] devices
|
||
which hostapd will work with, and a hwsim0 which we can use to monitor
|
||
(though not inject) trafic. Could we use this for wifi tests? How do
|
||
we make the guest hwsim0 visible to the host?
|
||
|
||
|
||
Sat Oct 1 18:41:31 BST 2022
|
||
|
||
virtual serial ports: I struggled with qemu for ages to get this to work.
|
||
You also need the unhelpfully named CONFIG_VIRTIO_CONSOLE option in
|
||
kconfig
|
||
|
||
QEMU_OPTIONS="-nodefaults -chardev socket,path=/tmp/wlan,server=on,wait=off,id=wlan -device virtio-serial-pci -device virtserialport,name=wlan,chardev=wlan"
|
||
|
||
Sun Oct 2 09:34:48 BST 2022
|
||
|
||
We could implement the secrets store as a service, then the secrets
|
||
are outputs.
|
||
|
||
Things we can do in qemu
|
||
|
||
1) make interface address service that depends on dhcp, instead of
|
||
being set by it directly
|
||
2) check out restart behaviour of dependent services when depended-on
|
||
service dies
|
||
3) pppd _creates_ an interface, work out how to fit it into this model
|
||
5) add bridge support for lan
|
||
8) upgrade ppp to something with an ipv6-up-script option, move ppp and pppoe derivations into their own files
|
||
9) get ipv6 address from pppoe
|
||
10) get ipv6 delegation from pppoe and add prefix to lan
|
||
11) support dhcp6 in dnsmasq, and advertise prefix on lan
|
||
12) firewalling and nat
|
||
- default deny or zero trust?
|
||
14) write secrets holder as a service with outputs
|
||
20) should we check that references to outputs actually correspond with
|
||
those provided by a service
|
||
|
||
Things we probably do on hardware
|
||
|
||
6) writable filesystem (ubifs?)
|
||
7) overlay with squashfs/ubifs - useful? think about workflows for
|
||
how this thing is installed
|
||
16) gl-ar750
|
||
17) mediatek device - gl-mt300 or whatever I have lying around
|
||
18) some kind of arm (banana pi router?)
|
||
19) should we give routeros a hardware ethernet and maybe an l2tp upstream,
|
||
then we could dogfood the hardware devices. we could run an l2tp service
|
||
at mythic-beasts, got a /48 there
|
||
|
||
|
||
|
||
https://skarnet.org/software/s6/s6-fghack.html looks like a handy thing
|
||
we hope we'll never have to use
|
||
|
||
Sun Oct 2 22:22:17 BST 2022
|
||
|
||
> make interface address service that depends on dhcp, instead of being set by it directly
|
||
|
||
We can do this for dhcp, but we can't do it for ppp. Running the ppp service
|
||
creates a ppp[012n] interface and assigns it an ipv4 address and there's not
|
||
a whole lot we can easily do to unbundle that.
|
||
|
||
So
|
||
|
||
- the ppp service needs to behave as if it were a "link" service
|
||
- either it *also* needs to behave as an address service, or we could
|
||
have an address service that subscribes to it and does nothing other than
|
||
translate output formats
|
||
|
||
Note regarding that second bullet: at the moment the static address
|
||
service has no outputs anyway!
|
||
|
||
|
||
Tue Oct 4 22:43:02 BST 2022
|
||
|
||
While trying to make the TFTP workflow not awful I seem to have written
|
||
a TFTP server.
|
||
|
||
|
||
Thu Oct 6 19:26:40 BST 2022
|
||
|
||
We have a booting kernel on gl-ar750, but we aren't at a point that it can
|
||
find a root filesystem
|
||
|
||
I'd *like* to be able to use the same delivery mechanism (kernel uimage
|
||
concatenated monolithic
|
||
|
||
|
||
Sat Oct 8 11:12:09 BST 2022
|
||
|
||
We have it booting on hardware, mounting root fs, running getty :-)
|
||
|
||
For NixWRT TFTP boots we used a single image with both kernel and squashfs, and
|
||
relied on CONFIG_MTD_SPLIT_FIRMWARE to identify where the boundary was and create
|
||
/dev/mdtn devices at the right offsets so that the kernel could find the
|
||
squashfs
|
||
|
||
For Liminix we're not going to do that.
|
||
|
||
* CONFIG_MTD_SPLIT_FIRMWARE is only available in OpenWrt patches
|
||
* it's an uncomfortable level of automagic just to save us doing two TFTPs
|
||
instea of one
|
||
* the generated image is anyway not the one we'd write to flash (has unneeded
|
||
PHRAM support)
|
||
* it means we need to memmap out enough ram for the whole image inc kernel when really
|
||
all we need to reserve is the rootfs bit
|
||
|
||
|
||
Sat Oct 8 11:23:08 BST 2022
|
||
|
||
"halt" and "reboot" don't work on gl-ar750
|
||
|
||
Sat Oct 8 13:10:00 BST 2022
|
||
|
||
Where do we go with this ar750?
|
||
|
||
- wired networking
|
||
- wifi
|
||
|
||
|
||
Sun Oct 9 09:57:35 BST 2022
|
||
|
||
We want to be able to package kernel modules as regular derivations, so that
|
||
they get added to the filesystem
|
||
|
||
This means they need access to kernel.modulesupport
|
||
|
||
This means kernel.modulesupport needs to be in pkgs too?
|
||
|
||
This is fine, probably, but we'd like to avoid closing over vmlinux because
|
||
there's no need for it to be in the filesystem
|
||
|
||
Mon Oct 10 22:57:23 BST 2022
|
||
|
||
The problem is that kernel kconfig options are manipulated in the
|
||
liminix modules, which means that data must be (transitively) available
|
||
to modules, so they can't be regular packages as they're tied so tightly
|
||
to the exact config. Unless we define a second overlay that references
|
||
the configuration object, but my head hurts when I start to think about that
|
||
so maybe not.
|
||
|
||
Tue Oct 11 00:00:13 BST 2022
|
||
|
||
Building ag71xx (ethernet driver) as a module doesn't work because
|
||
it references a symbol ath79_pll_base in the kernel that hasn't been
|
||
marked with EXPORT_SYMBOL.
|
||
|
||
We could forge an object file that "declares" it with a gross and disgusting hack like this
|
||
|
||
$ echo > empty # not actually "empty", objcopy complains about that
|
||
$ grep ath79_pll_base /nix/store/jcc114cd13xa8aa4mil35rlnmxnlmv09-vmlinux-mips-unknown-linux-musl-modulesupport/System.map
|
||
ffffffff807b2094 B ath79_pll_base
|
||
$ mips-unknown-linux-musl-objcopy -I binary -O elf32-big --add-section .bss=empty --add-symbol ath79_pll_base=.bss:0x807b2094 empty f.o
|
||
|
||
I don't claim this is a good idea, just an idea. Thought was that we would not
|
||
have to declare its type this way. Also it might not work with kaslr
|
||
https://stackoverflow.com/a/68903503
|
||
|
||
|
||
Backstory: why are we trying to build this as a module? because the
|
||
openwrt fork of it seems to be a bit more advanced than the mainline,
|
||
and I *suspect* that the mainline version doesn't work with our
|
||
openwrt-based device tree which ahs the mdio as a nested node inside
|
||
the ag71xx node - in mainline the driver seems to have all the mdio
|
||
stuff inline. So, could we build the openwrt driver without patching
|
||
the crap out of our kernel
|
||
|
||
Sun Oct 16 15:25:33 BST 2022
|
||
|
||
Executive decision: let's use the openwrt kernel (at least for
|
||
gl-ar750). Mainline kernel doesn’t have devicetree support for this
|
||
device or the SoC it’s based on, and the OpenWrt dts for it doesn’t
|
||
have the same "compatible"s, which makes me think that an indefinite
|
||
amount of patching will be necessary to make dts/modules for one of
|
||
them work with a kernel for the other
|
||
|
||
As a result: now we have eth0 appearing, but not eth1? Guessing we
|
||
need to add some kconfig for the switch
|
||
|
||
Mon Oct 17 21:23:37 BST 2022
|
||
|
||
we are spending ridiculous amounts of cpu/io time copying kernel source
|
||
trees from place to place, because we have kernel tree preparation
|
||
and actual building as two separate derivations.
|
||
|
||
I think the answer is to have a generic kernel build derivation
|
||
in the overlay, and then have the device overlays override it with
|
||
an additional phase to do openwrt patching or whatever else they
|
||
need to do.
|
||
|
||
Tue Oct 18 23:02:43 BST 2022
|
||
|
||
* previous TODO list is Aug 02, need to review
|
||
* dts is hardcoded to gl-ar750, that needs cleaning up
|
||
* figure out persistent addresses for ethernet
|
||
* fix halt/reboot
|
||
* "link" services have a "device" attribute, would much rather
|
||
have everything referenced using outputs than having two
|
||
different mechanisms for reading similar things
|
||
* Kconfig.local do we still need it?
|
||
* check all config instead of differentiating config/checkedConfig
|
||
|
||
Sun Feb 5 18:14:02 GMT 2023
|
||
|
||
We have resumed.
|
||
commit eb4efab6a215bf03cf5aab10d4ac909e83e9c148
|
||
Author: Daniel Barlow <dan@telent.net>
|
||
Date: Sat Jan 28 23:18:28 2023 +0000
|
||
|
||
|
||
* find out what works
|
||
* add that stuff to hydra
|
||
* fix the rest
|
||
* add that stuff to hydra
|
||
* convert to flake
|
||
* check if routeros can be run interactively
|
||
* some per-device docs in a form that can be transcluded for website
|
||
|
||
|
||
ci builds
|
||
|
||
* each of the tests has hardcoded device/config/etc
|
||
* build an "empty" configuration for each target device
|
||
* build an unstable configuration for qemu
|
||
|
||
|
||
Wed Feb 8 16:52:22 GMT 2023
|
||
|
||
We have hydra builds for all the previously-working devices, though we
|
||
don't yet know if any of those builds actually boots or does anything
|
||
useful.
|
||
|
||
Would be nice to clean up the run-qemu and connect-qemu scripts
|
||
and put them in the buildEnv [DONE]
|
||
|
||
Some thought needed about how to hook up the gl-ar750 to the internets,
|
||
ideally in a way that mirrors typical real uses. AAISP have an L2TP
|
||
service, but I would prefer to use pppoe on the device, so how to
|
||
translate one to t'other on an intermediary/gateway machine?
|
||
https://www.rfc-archive.org/getrfc.php?rfc=3817#gsc.tab=0 exists
|
||
as an RFC but I can't find anything that actually implements it
|
||
|
||
Actual Documentation (e.g. user and developer manuals) should live in
|
||
the liminix repo so it corresponds with the code, and can be rsynced
|
||
from there to the web site, maybe with a deploy hook or something.
|
||
Haven't decided what a good doc format is yet
|
||
|
||
If we create a flake for Hydra to run on, that _more or less_ means we
|
||
don't have any manual hydra jobset configuration to document.
|
||
|
||
There are still some tests that need adding to CI
|
||
|
||
Should the per-device config be a module not an overlay? Given that
|
||
half of what's in it is kernel config (a module could set this)
|
||
and the rest is source tarball download specs (needs nixpkgs,
|
||
a module has this and could set it too) I wonder why it isn't already
|
||
|
||
Can we make Hydra report output sizes so we can plot closure size
|
||
trends and see if it all goes awful?
|
||
|
||
Thu Feb 9 08:14:39 GMT 2023
|
||
|
||
For better developer experience, I am thinking that either (1)
|
||
swap tasks 2 and 3 (writable filesystem before module system)
|
||
or (2) add NBD support so I can iterate on a real device without
|
||
full rebuilds every time
|
||
|
||
|
||
Fri Feb 10 06:18:25 PM GMT 2023
|
||
|
||
did the overlay->module thing
|
||
|
||
Need to fix all the configuration around PHRAM, I can't see how it
|
||
would ever ork
|