A systemd rant or why I move away from it

The past

In the past I did not care much about the arguments for or against systemd.
For my own servers and desktop I use Gentoo Linux, which gave me a choice, so I said no thanks: OpenRC works fine, no need for the extra bloat.
On customer servers I used whatever distribution the client wanted and whatever it came with. All of them moved to systemd, but it did the job and I did not care much. Yes, it was way more work to set up the servers because of the systemd bloat and the functionality it has taken over and does not provide properly, but hey, I get paid by the hour.

Why I switched to the anti-systemd camp

It has simply become too powerful, takes over more and more functionality, and has the "I know better than the service I control or replace" attitude.

It took me almost two weeks to track down a critical issue, namely why my Apache did not work properly. Kind-of-spoiler: it was systemd's fault.

The long debugging misery:

The setup:
For a customer I run an HAProxy with failover in front of several machines, each running a classic Apache web server with stuff behind it.
For a while now, the monitoring system occasionally complained that one of the Apache web servers was not running or was unreachable, and not always the same one.
It lasted for about a minute; by the next monitoring check, the server was marked as running again.
The service was running the whole time, but at one point I saw that the queue on the network socket was full. This indicates that Apache ran out of workers and could not process incoming requests fast enough.
I tried to tweak the Apache settings without success. The service has an unusual usage pattern with high bursts of incoming connections (several thousand new connections/requests per second per server), but whatever I set, the issue remained. Then I noticed that Apache could not start new workers and threw an error message, even though the limits were set much higher, so it should have been able to spawn more threads or workers.
I checked the system limits and Apache's own limits and ran local tests. Everything was fine and should have withstood 10x the requests.

This is where systemd has bitten me in my rear end

So my system limits were set high, the Apache config limits were set high, and it worked locally. Then I noticed one big difference: systemd on the live system, OpenRC or a manual start on the local system.
After digging for a while I found out that systemd sets its own default values and enforces them on the services it starts. In this case there was a limit on threads and one on open files, both way lower than the system limits and lower than what the actual load needed. Who the hell designed this? This service should start the other services and keep its claws out. Bad enough that it messes around with logging, network, time and other stuff; enforcing arbitrary limits on others is not OK. If you want to change those limits, you have to create a new config with exceptions for the service. No, not changing an existing config, creating a new one.
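One way to see what a running service actually got, rather than what the system-wide settings suggest, is to read /proc/<pid>/limits for one of its processes. The Python sketch below parses that file. The function names and the sample values are made up for illustration, and it assumes the usual fixed-width layout the Linux kernel prints.

```python
from pathlib import Path


def parse_limits(text):
    """Parse the content of /proc/<pid>/limits into {name: (soft, hard)}."""
    limits = {}
    for line in text.splitlines()[1:]:  # skip the header row
        if not line.strip():
            continue
        # The kernel pads the limit name to 25 characters plus one space.
        name = line[:26].strip()
        fields = line[26:].split()
        if len(fields) < 2:
            continue
        limits[name] = (fields[0], fields[1])
    return limits


def limits_of(pid="self"):
    """Read the limits of a live process (Linux only)."""
    return parse_limits(Path(f"/proc/{pid}/limits").read_text())


# Illustrative sample in the kernel's column layout,
# not taken from the servers in this article.
ROW = "{:<25} {:<20} {:<20} {:<10}\n"
SAMPLE = (
    ROW.format("Limit", "Soft Limit", "Hard Limit", "Units")
    + ROW.format("Max processes", "63432", "63432", "processes")
    + ROW.format("Max open files", "1024", "524288", "files")
)

if __name__ == "__main__":
    print(parse_limits(SAMPLE)["Max open files"])
```

Calling limits_of() with the PID of a worker started by systemd and again with the PID of one started by hand should make a difference like the one described above visible.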

For reference, you need to create the config file /etc/systemd/system/<service>.service.d/override.conf with your overrides, in my case:

[Service]
TasksMax=20000
LimitNOFILE=100000

After this, there were no more warnings from the monitoring (at least for Apache). After a few days I got similar warnings for the HAProxy. This time I knew where to look, and yes, more limits.
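A setting like TasksMax does not show up as a classic per-process resource limit, because systemd enforces it through the cgroup pids controller. A rough Python sketch for checking the value a running process is subject to follows. It assumes a unified cgroup v2 hierarchy mounted at /sys/fs/cgroup, and the helper names and the unit name in the usage note are made up for illustration.

```python
from pathlib import Path

CGROUP_ROOT = Path("/sys/fs/cgroup")  # assumption: cgroup v2 is mounted here


def cgroup_v2_path(proc_cgroup_text):
    """Return the cgroup v2 path from the content of /proc/<pid>/cgroup."""
    for line in proc_cgroup_text.splitlines():
        if line.count(":") < 2:
            continue  # skip empty or malformed lines
        hierarchy_id, controllers, path = line.split(":", 2)
        # The cgroup v2 entry has hierarchy ID 0 and no controller list.
        if hierarchy_id == "0" and controllers == "":
            return path
    return None


def pids_max_file(proc_cgroup_text, root=CGROUP_ROOT):
    """Build the path of the pids.max file for that cgroup."""
    path = cgroup_v2_path(proc_cgroup_text)
    if path is None:
        return None
    return root / path.lstrip("/") / "pids.max"


def task_limit(pid):
    """Read the task limit of a live process: a number as text, or 'max'."""
    text = Path(f"/proc/{pid}/cgroup").read_text()
    limit_file = pids_max_file(text)
    if limit_file is None or not limit_file.exists():
        return None
    return limit_file.read_text().strip()
```

For a process whose /proc/<pid>/cgroup contains a line such as 0::/system.slice/apache2.service (a hypothetical unit name), task_limit() reads /sys/fs/cgroup/system.slice/apache2.service/pids.max. Note that an override only takes effect after systemctl daemon-reload and a restart of the service; systemctl show <service> -p TasksMax -p LimitNOFILE prints the values systemd has configured for the unit.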

For me this behavior is unacceptable, and it was the last straw that made me switch distributions. Sadly, the previous distribution does not offer a systemd-free variant, so I will use a different one. I will switch the remaining machines as soon as their next release upgrade is due; new ones will be on the new distro from the start.

And it was so much easier to set up, too. No more manually disabling lots of systemd services that are on by default. Some of the things I get in return:

  • time synchronisation via ntp, not via systemd
  • proper network management without messing up routing and vpn
  • no more dns caching that causes issues, finally my system settings are respected
  • proper logging without extra binaries and magic management
  • having a working logrotate again, that does what it should do, when it should be done
  • no more artificial limits on services
  • no more Poettering on the system (yes, I hate PulseAudio and none of my systems has it)
  • no more messing around with the location of my tmp files

I really hope distributions come to their senses and get rid of systemd or limit it to the bare minimum: starting a service. But my expectation that this will happen is quite low.
