Ever since I started using NixOS, I have had a nagging question about how easy handling services could be, and why we use overcomplicated systems to do it. This feeling first started with docker vs nix as a build tool. I had an issue getting a Dockerfile to be reproducible, but nix managed to build the image easily. This ease extends to enabling services: for example, services.nginx.enable becomes as easy as docker run nginx:latest. In a similar vein, the caching of Docker layers is like using cachix and the nix store.
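To make the comparison concrete, here is a minimal sketch of enabling nginx from a NixOS configuration. The virtual host name and web root are hypothetical placeholders I made up for illustration, and this is not a hardened setup.

```nix
# configuration.nix fragment: a minimal sketch, not a production config.
{ config, pkgs, ... }:
{
  # Turns on the nginx systemd unit with a configuration generated by NixOS.
  services.nginx.enable = true;

  # Hypothetical virtual host; replace the name and root with your own.
  services.nginx.virtualHosts."example.local" = {
    root = "/var/www/example";
  };

  # Open HTTP in the host firewall so the service is reachable.
  networking.firewall.allowedTCPPorts = [ 80 ];
}
```

Running nixos-rebuild switch applies the change, which is roughly the moment where docker run would start the container.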
This is why, going forward, my priorities for building large-scale systems are as follows:
- Monitoring and tooling that enables your coworkers to also be in charge of monitoring. It also helps with on-call rotation.
- An overlay network or VLAN to allow for easy service discovery. tailscale and consul come to mind due to how they are capable of spanning across the internet.
- Finally the orchestration provider, whether it be kubernetes, nomad, docker swarm, or plain systemd.
Where is the state
Whether the service you are running is a public one that needs infinite scalability, or one that can be handled with a single replica, is something you don't know yet. You must figure this out during the architecture design of your system. You don't start designing the system under the assumption that you will use Kubernetes. You first need to figure out how your system will operate and WHERE STATE IS. Once you know where state is being managed, and where state is being mutated, you can design the scalability of the rest of the system around this. So the design of the system depends on where the state resides. A personal lesson of mine was having to deal with state in two locations that were on opposite ends of the system. We used websockets, which is a stateful protocol, and the message queue that was keeping track of the messages had to keep track of which websocket originated the message. We couldn't separate these two states, which led to an unscalable system, because we could not grow the replicas of the websocket proxies without losing the state of the socket.
One of the reasons orchestration services are useful is their use of containers, which remove the dependency on the host operating system. You don't have to care what the host OS has, because everything you need is inside the container. This removes state at the operating system level. NixOS solves the same problem, because it makes sure that all the dependencies a service/job needs are taken care of. The same amount of effort one spends writing a Dockerfile can instead be spent writing a NixOS module.
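As a rough idea of what that effort buys, here is a sketch of a small NixOS module that wraps a single service. The service name myapp is hypothetical, and pkgs.hello is only a stand-in for a real service binary (it prints a greeting and exits), so treat this as a template rather than a working daemon.

```nix
# myapp.nix: a hypothetical NixOS module wrapping one service.
{ config, lib, pkgs, ... }:
let
  cfg = config.services.myapp;
in
{
  options.services.myapp = {
    # Creates services.myapp.enable, defaulting to false.
    enable = lib.mkEnableOption "the hypothetical myapp service";

    package = lib.mkOption {
      type = lib.types.package;
      default = pkgs.hello; # stand-in; point this at your own package
      description = "Package that provides the service binary.";
    };
  };

  # Only define the unit when the service is enabled.
  config = lib.mkIf cfg.enable {
    systemd.services.myapp = {
      description = "myapp (example service)";
      wantedBy = [ "multi-user.target" ];
      after = [ "network.target" ];
      serviceConfig = {
        # The store path pins the binary and all of its dependencies,
        # which plays the role of the image in the container world.
        ExecStart = "${cfg.package}/bin/hello";
        DynamicUser = true; # run as a transient, unprivileged user
        Restart = "on-failure";
      };
    };
  };
}
```

Once the file is listed in imports, setting services.myapp.enable = true; is all a host needs, much like services.nginx.enable.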
Why is monitoring more important
I put monitoring as number 1 because one of the biggest issues with fast-growing teams, or teams using new technologies, is that not all team members understand the system on the same level. Monitoring gives team members the best overview of the system. It also means that when things go down, they have an understanding of what to do to fix the failure. They should be able to see the alerts and scan the logs.
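For a sense of how little it takes to get a first overview on NixOS, here is a minimal sketch that runs Prometheus and the node exporter on one host. Prometheus is my assumption for the example; the post does not depend on any particular monitoring stack, and alerting and log collection are left out.

```nix
# monitoring.nix fragment: single-host sketch, assuming Prometheus.
{ config, ... }:
{
  # Exposes host metrics (CPU, memory, disk, systemd units).
  services.prometheus.exporters.node = {
    enable = true;
    enabledCollectors = [ "systemd" ];
  };

  # Prometheus server scraping the local node exporter.
  services.prometheus = {
    enable = true;
    scrapeConfigs = [
      {
        job_name = "node";
        static_configs = [
          {
            # Reuse the exporter's configured port instead of hardcoding it.
            targets = [
              "localhost:${toString config.services.prometheus.exporters.node.port}"
            ];
          }
        ];
      }
    ];
  };
}
```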
Overlay network
The main benefit of an overlay network is the segregation of the system.
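One way to picture that segregation, sketched here with tailscale on NixOS: the host joins the overlay, and an internal-only port is opened on the overlay interface while staying closed on the public one. Port 9100 is a hypothetical example, and tailscale0 is assumed to be the interface name (it is the NixOS default).

```nix
# overlay.nix fragment: sketch of exposing a port only on the overlay.
{ ... }:
{
  # Runs the tailscale daemon; the node still has to be joined to the
  # network once, for example with `tailscale up`.
  services.tailscale.enable = true;

  networking.firewall.enable = true;

  # Hypothetical internal port, reachable only via the overlay interface.
  # It is not listed in the global allowedTCPPorts, so it stays closed
  # on every other interface.
  networking.firewall.interfaces."tailscale0".allowedTCPPorts = [ 9100 ];
}
```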
