Hacker News

stereo-highway
Diskless Linux boot using ZFS, iSCSI and PXE (aniket.foo)

verytrivial 28 minutes ago

I know it was just a convenient pretext for a learning journey, but do not come away from this thinking llama.cpp needs to be compiled on Windows before use. The GitHub project has a cornucopia of pre-built artifacts to use.

https://github.com/ggml-org/llama.cpp/releases

guenthert 29 minutes ago

"I didn’t want to get into the hassle of repartitioning everything that the boot loader works with both Linux & Windows."

Hmmh? I haven't done so in years, but configuring multi-boot used to be considerably easier than disk-less operation.

jeroenhd 5 minutes ago

The Debian installer is less than optimal for repartitioning.

The Linux NTFS resizing code also has a tendency to trigger data corruption. Not really Linux's fault, but it's a good reason to do partitioning from inside Windows, which can be a pain already.

Another issue I've run into is Windows creating a very small (~300MiB) EFI partition that barely fits the Windows bootloader, let alone a Linux bootloader and kernel. You can resize and recreate the partition of course, but reconfiguring Windows to use a different boot partition is a special kind of hell I try to avoid.

pbhjpbhj 11 minutes ago

SecureBoot is a PITA.

jeroenhd 7 minutes ago

For Debian and most other distros, secure boot isn't a problem. Installers are all using a signed, trusted-by-default bootloader.

There are some exceptions (some hardware from Microsoft doesn't trust the third-party certificate used, for instance, and Red Hat Enterprise has its own root of trust if you opt into that), but they're very rarely an issue.

yjftsjthsd-h 3 hours ago

Nice. I'm extra fond of a ZFS-backed network root filesystem, because it lets you put an OS on ZFS without needing to deal with ZFS support in that OS. (One of these days I want to try OpenBSD with its root on NFS on ZFS, either from Linux or FreeBSD.)
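
A rough server-side sketch of that pattern with a zvol and targetcli (pool, volume, and IQN names are made up):

  zfs create -V 32G tank/netroot
  targetcli /backstores/block create name=netroot dev=/dev/zvol/tank/netroot
  targetcli /iscsi create iqn.2025-01.example.nas:netroot
  targetcli /iscsi/iqn.2025-01.example.nas:netroot/tpg1/luns create /backstores/block/netroot

The client only ever sees a plain block device, so it needs no ZFS support at all.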

Does anyone have an opinion on iSCSI vs NBD?

guenthert 23 minutes ago

Well, iSCSI is a standard, so chances are better that it's supported in a non-Linux OS, e.g. MS Windows. Years ago I booted a Windows (7, iirc) client that way, but gave up on it (too much hassle and performance limited by the network) when SSDs became cheap.

Modified3019 2 hours ago

I don’t have direct experience, but when I looked into it, my takeaway was that NBD couldn't deal with network interruptions as reliably as iSCSI.

https://forums.gentoo.org/viewtopic.php?p=4895771&sid=f9b7ac...

https://github.com/NetworkBlockDevice/nbd/issues/93

Whether that’s the case with the latest version, I don’t know, but it’s something you might test if you choose to try it.

jaypatelani an hour ago

You might like https://smolbsd.org/

yjftsjthsd-h an hour ago

Well yes, I do like that :), but I don't see the connection to this thread?

deathanatos 2 hours ago

> UEFI fixes that to some extent, but it’s a pain to maintain the UEFI entries manually and change them every time the kernel updates.

… you don't have to update the UEFI entries every time the kernel updates. (I guess you might if you use a kernel with CONFIG_EFI_STUB and place the new kernel under a different filename than the one the UEFI boot entry points to … but I was under the impression that that'd be a fairly unusual setup, and I thought most of us booting with EFI were doing so via GRUB.)

yjftsjthsd-h 2 hours ago

Even if you do CONFIG_EFI_STUB, there should be a post-update hook that automatically calls efibootmgr.
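
For example, a hypothetical Debian-style /etc/kernel/postinst.d hook (the disk, partition, and root device are placeholders, adjust for your layout):

  #!/bin/sh
  # $1 is the version of the kernel that was just installed
  efibootmgr --create --disk /dev/nvme0n1 --part 1 \
      --label "Linux EFI stub" \
      --loader "\\vmlinuz-$1" \
      --unicode "root=/dev/nvme0n1p2 rw initrd=\\initramfs-$1.img"

(You'd also want to prune the stale entry with efibootmgr -b <num> -B; omitted here for brevity.)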

nicman23 2 hours ago

or just copy the latest kernel to something like /vmlinux and /initramfs
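
e.g. a post-install hook along these lines (Arch-style source names assumed):

  cp /boot/vmlinuz-linux /boot/vmlinux
  cp /boot/initramfs-linux.img /boot/initramfs

Then the UEFI entry points at the stable names and never needs touching.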

Tepix an hour ago

This could be an interesting setup for booting off a NAS like Synology or QNAP. I haven't really used iSCSI; it's intimidating how much prep this takes...

rwmj an hour ago

iSCSI seems intentionally obscure. One of the improvements I made to NBD was to invent a simple, standardized URI format so that you can specify servers easily, eg:

  nbdinfo nbd://server
  nbdcopy nbd://server:2001/ nbd+unix:///?socket=/tmp/localsock
https://github.com/NetworkBlockDevice/nbd/blob/master/doc/ur...

burner420042 an hour ago

The 'target' moves slowly, so once you learn it, it all stays relevant forever.

... And it's very, very fun.

anonymousiam 3 hours ago

I've done a lot of headless/diskless stuff. I haven't done much for years, because my NAS only has gigabit Ethernet ports. I can cascade them and get four Gbps downstream, but it's still painful.

I have recently upgraded my house to 10Gbps Ethernet, with only one room still stuck at gigabit, and unfortunately, it's my main office. I'm working on getting the drop there now (literally, just taking a break here).

Even once I'm done, accessing an iSCSI drive over 10GbE will be 4-8 times slower than a local NVMe drive, but it will sure be a lot better than it was!

Ideally, I could run VMs on the NAS and have great performance, but that's another hardware upgrade...

olavgg a minute ago

Using a proper NIC (Chelsio) with their iSCSI accelerator will boost your iSCSI performance significantly. Another alternative is Mellanox with RDMA. You need CX4+ for optimal performance over TCP/IP, while the cheap CX3 is excellent with IPoIB. If you have a lot of packet drops and retransmissions, another option for boosting iSCSI performance is getting a network switch with a lot of memory for packet buffering. This helps with incast congestion. There are special switches with gigabytes of memory built for this.

NVMe-oF is the protocol with the least overhead for network drives; with a proper setup you lose only 10-20% latency compared to a local disk, even with Intel Optane. Throughput should be almost identical.
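
If anyone wants to try it, connecting over NVMe/TCP with nvme-cli looks roughly like this (address and NQN are made up):

  modprobe nvme-tcp
  nvme connect -t tcp -a 192.168.10.5 -s 4420 -n nqn.2025-01.example.nas:drive1
  nvme list    # the remote namespace shows up as an ordinary /dev/nvmeXnY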

ReDress 2 hours ago

Really, I wonder how this turns out to be diskless when you're clearly accessing a disk/drive over the network. Shouldn't we refer to this as network boot?

pdpi an hour ago

It's diskless from the point of view of the device being booted.

dhash 3 hours ago

Something worth mentioning here is that iSCSI is quite unhappy on congested networks or with packet loss caused by incast traffic.

To make this actually work well, consider modifying your switches' QoS settings to carve out a priority VLAN for iSCSI traffic.
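
One way to handle the marking on the initiator side (the switch still has to be configured to honor the DSCP class, which is vendor-specific):

  # Tag outgoing iSCSI traffic (TCP 3260) with DSCP class CS4
  iptables -t mangle -A OUTPUT -p tcp --dport 3260 -j DSCP --set-dscp-class cs4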

fragmede 3 hours ago

or a north-south/east-west architecture, so there's an entirely separate network just for iSCSI. Control plane vs data plane.

protoman3000 3 hours ago

Pretty cool! You could also boot into an ephemeral minimal initrd that displays a selection menu instead of doing it in iPXE. That would grab the chosen kernel and initrd from the network and kexec into it without a reboot.
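
The kexec step would look something like this (URLs and kernel cmdline are placeholders):

  curl -fO http://boot.example/vmlinuz -fO http://boot.example/initrd.img
  kexec -l vmlinuz --initrd=initrd.img --append="root=/dev/sda1 ro"
  kexec -e    # jump straight into the new kernel, no firmware round-trip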

tehlike 3 hours ago

I used a similar iPXE setup for a robotics cluster - every robot booted from the same image, then Kubernetes managed the container orchestration. It was fun.

ggm 3 hours ago

NFS diskless is the more common approach I've used, but this is very cool.

KaiserPro 2 hours ago

NFS diskless was easier for me to setup when I was doing it.

The caveat was that you needed a read-only root, which meant freezing the OS; anything that needed changing was either stored in a RAM disk (that you need to set up) or a per-host NFS area (kinda like overlayfs, but not).
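
A hedged example of what the fstab for such a setup might look like (server path and sizes invented):

  nfsserver:/exports/rootfs  /         nfs    ro,nolock  0 0
  tmpfs                      /tmp      tmpfs  defaults   0 0
  tmpfs                      /var/log  tmpfs  size=64m   0 0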

yjftsjthsd-h an hour ago

Why would you need a read-only root? Do you mean to share across multiple machines?

ahepp 3 hours ago

When I tried root-on-NFS I had a lot of issues. The Red Hat and Arch package managers don't seem to like it (presumably a sqlite thing?).

contingencies 3 hours ago

You can download the rootfs, extract it to a ramdisk, and just run it in memory. This is fast for everything. Unfortunately, memory just got super expensive. Fortunately, Linux requires ~no memory to do many useful things.
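
From an initramfs shell that's roughly (URL and size invented):

  mount -t tmpfs -o size=2G tmpfs /newroot
  wget -qO- http://boot.example/rootfs.tar.gz | tar -xz -C /newroot
  exec switch_root /newroot /sbin/init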

ahepp 3 hours ago

You might find it worth upgrading to 10gbps if you continue down this road. The Mikrotik CRS-309 has served me well, along with a couple of Intel X520-DA2s. I believe those NICs can do iSCSI natively and pass the session to the operating system with iBFT.

SFP28 might be cheap enough now too, I'm not sure...

nicman23 2 hours ago

What I want to play with is RDMA, with a bcache block device using the remote as backing and a small local NVMe as a write-through cache.
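
Untested sketch of the bcache half (device names are placeholders):

  # Format the remote LUN as backing and the local NVMe as cache in one go
  make-bcache -C /dev/nvme0n1p1 -B /dev/sdb
  echo writethrough > /sys/block/bcache0/bcache/cache_mode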

louwrentius 3 hours ago

I would probably recommend looking into NVMe over TCP instead of iSCSI, especially for fast NVMe drives.
