Add possibility to restart systemd services through annotations #220

Merged
Gerrit91 merged 9 commits into master from annotation-controllers on Jun 16, 2026

majst01 commented May 27, 2026

Contributor

Description

Depends on:

This PR allows operators and users to restart systemd services running on the firewall. Users are restricted to a small whitelist of services that may be restarted.

This contributes to the headscale upgrade path to version >= v0.27.1, allowing operators to easily reconnect the tailscale clients on the firewalls.

Sample:

k annotate firewall -n namespace <firewall-name> firewall.metal-stack.io/restart-systemd-services=droptailer

TODO:

  • Annotation somehow not removed after restart
  • Add Documentation

The `FirewallMonitor` resource can now be annotated with `firewall.metal-stack.io/restart-systemd-services=<service-name>` to trigger a restart of a systemd service on the firewall. Only whitelisted services can be restarted.
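
As a rough illustration of the whitelist check described above, the following Go sketch maps the annotation value to a systemd unit name and decides whether a user-triggered restart is permitted. The annotation key comes from this PR; the function names and the contents of `userAllowed` are assumptions, inferred only from the test output in this thread (droptailer and tailscaled were restarted, frr was not). It is not the controller's actual implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// annotationKey is the annotation named in the PR description.
const annotationKey = "firewall.metal-stack.io/restart-systemd-services"

// userAllowed is a hypothetical allow-list; the real list lives in the
// firewall-controller and may differ.
var userAllowed = map[string]bool{
	"droptailer.service": true,
	"tailscaled.service": true,
}

// unitName appends ".service" when the suffix is missing, matching the
// "droptailer" -> "droptailer.service" mapping visible in the logs.
func unitName(value string) string {
	v := strings.TrimSpace(value)
	if v == "" || strings.HasSuffix(v, ".service") {
		return v
	}
	return v + ".service"
}

// restartAllowed returns the requested unit and whether it is on the
// allow-list. A missing annotation yields ("", false).
func restartAllowed(annotations map[string]string) (string, bool) {
	value, ok := annotations[annotationKey]
	if !ok {
		return "", false
	}
	unit := unitName(value)
	return unit, userAllowed[unit]
}

func main() {
	for _, svc := range []string{"droptailer", "tailscaled", "frr"} {
		unit, ok := restartAllowed(map[string]string{annotationKey: svc})
		fmt.Printf("%s allowed=%t\n", unit, ok)
	}
}
```

Normalizing to the full unit name before the lookup means `droptailer` and `droptailer.service` are treated as the same request.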

Used AI-Tools ✨

  • None used for generation

@majst01 majst01 changed the title Add possibility to restart services on Firewall through annotations Add possibility to restart systemd services through annotations May 27, 2026

majst01 commented May 27, 2026

Contributor Author

Nope:

May 27 12:34:58 shoot--pbs4kr--skifoan0-firewall-9ebd8 ip[112935]: {"time":"2026-05-27T12:34:58.43324249+02:00","level":"ERROR","msg":"unable to create firewall annotation controller","error":"controller with name firewall already exists. Controller names must be unique to avoid multiple controllers reporting the same metric. This validation can be disabled via the SkipNameValidation option"}
May 27 12:34:58 shoot--pbs4kr--skifoan0-firewall-9ebd8 ip[112935]: panic: controller with name firewall already exists. Controller names must be unique to avoid multiple controllers reporting the same metric. This validation can be disabled via the SkipNameValidation option
May 27 12:34:58 shoot--pbs4kr--skifoan0-firewall-9ebd8 ip[112935]: goroutine 1 [running]:
May 27 12:34:58 shoot--pbs4kr--skifoan0-firewall-9ebd8 ip[112935]: main.main()
May 27 12:34:58 shoot--pbs4kr--skifoan0-firewall-9ebd8 ip[112935]:         ./main.go:303 +0x2ba5
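
The panic above is raised by a uniqueness check on controller names. The following standalone Go sketch imitates such a check with a plain registry to show why a second controller registered as `firewall` fails while a distinct name passes. The type `nameRegistry` and the name `firewall-annotation` are hypothetical; this is not controller-runtime code, and the name eventually chosen in this PR is not shown in the thread.

```go
package main

import (
	"fmt"
	"sync"
)

// nameRegistry is a hypothetical stand-in for a controller name validator.
type nameRegistry struct {
	mu    sync.Mutex
	names map[string]struct{}
}

func newNameRegistry() *nameRegistry {
	return &nameRegistry{names: make(map[string]struct{})}
}

// register records a controller name and rejects duplicates.
func (r *nameRegistry) register(name string) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, exists := r.names[name]; exists {
		return fmt.Errorf("controller with name %s already exists", name)
	}
	r.names[name] = struct{}{}
	return nil
}

func main() {
	reg := newNameRegistry()
	fmt.Println(reg.register("firewall"))            // first registration succeeds
	fmt.Println(reg.register("firewall"))            // duplicate is rejected
	fmt.Println(reg.register("firewall-annotation")) // distinct name succeeds
}
```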

majst01 commented May 27, 2026

Contributor Author

Now after annotating the firewall object in the seed:

$ k annotate -n shoot--pbs4kr--skifoan0 firewall shoot--pbs4kr--skifoan0-firewall-9ebd8 firewall.metal-stack.io/restart-systemd-services=droptailer
metal@shoot--pbs4kr--skifoan0-firewall-9ebd8:~$ sudo journalctl -lfu firewall-controller | grep -i drop
May 27 12:48:13 shoot--pbs4kr--skifoan0-firewall-9ebd8 ip[114529]: {"time":"2026-05-27T12:48:13.603232326+02:00","level":"INFO","msg":"restart service","logger":"controllers/FirewallAnnotation","service-name":"droptailer.service"}
^C
metal@shoot--pbs4kr--skifoan0-firewall-9ebd8:~$ systemctl status droptailer
● droptailer.service - Droptailer
     Loaded: loaded (/etc/systemd/system/droptailer.service; enabled; preset: enabled)
     Active: active (running) since Wed 2026-05-27 12:48:13 CEST; 13s ago
   Main PID: 114789 (droptailer-clie)
      Tasks: 7 (limit: 38019)
     Memory: 4.0M (peak: 5.3M)
        CPU: 49ms
     CGroup: /system.slice/droptailer.service
             └─vrf
               └─vrf50
                 └─114789 /usr/local/bin/droptailer-client

But forcing a second service to restart gives:

$ k annotate -n shoot--pbs4kr--skifoan0 firewall shoot--pbs4kr--skifoan0-firewall-9ebd8 firewall.metal-stack.io/restart-systemd-services=tailscaled
error: --overwrite is false but found the following declared annotation(s): 'firewall.metal-stack.io/restart-systemd-services' already has a value (droptailer)
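
The error occurs because the annotation still carries its old value, and kubectl refuses to replace an existing value unless `--overwrite` is given. A later comment in this thread reports that no annotations remain after a restart, which suggests the request is consumed once it has been handled. A minimal Go sketch of that one-shot pattern follows, under stated assumptions: `consumeRestartRequest` and the `restart` callback are hypothetical names, the annotations are a plain in-memory map, and a real controller would persist the removal through the Kubernetes API.

```go
package main

import (
	"errors"
	"fmt"
)

// restartAnnotation is the annotation key from this PR.
const restartAnnotation = "firewall.metal-stack.io/restart-systemd-services"

// consumeRestartRequest handles a pending restart request at most once.
// It removes the annotation before calling restart, so the same request
// is not processed again and a new one can be set without --overwrite.
// The first return value reports whether a request was present.
func consumeRestartRequest(annotations map[string]string, restart func(service string) error) (bool, error) {
	service, ok := annotations[restartAnnotation]
	if !ok {
		return false, nil
	}
	// Assumption: the request is dropped regardless of the restart outcome.
	delete(annotations, restartAnnotation)
	if service == "" {
		return true, errors.New("empty service name in restart annotation")
	}
	return true, restart(service)
}

func main() {
	annotations := map[string]string{restartAnnotation: "droptailer"}
	handled, err := consumeRestartRequest(annotations, func(service string) error {
		fmt.Println("restarting", service)
		return nil
	})
	fmt.Println(handled, err, len(annotations))
}
```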

Also tested from a shoot against the firewall-monitor.

@Gerrit91 Gerrit91 moved this to In Progress in Development Jun 1, 2026

majst01 commented Jun 16, 2026

Contributor Author

Tested with the following results:

$ k annotate firewallmonitors.firewall.metal-stack.io shoot--pcfgbt--gerrit2-firewall-1fd9b firewall.metal-stack.io/restart-systemd-services=droptailer
$ k annotate firewallmonitors.firewall.metal-stack.io shoot--pcfgbt--gerrit2-firewall-1fd9b firewall.metal-stack.io/restart-systemd-services=tailscaled
$ k annotate firewallmonitors.firewall.metal-stack.io shoot--pcfgbt--gerrit2-firewall-1fd9b firewall.metal-stack.io/restart-systemd-services=frr
$ k get events
2m40s       Normal   ServiceRestarted   firewallmonitor/shoot--pcfgbt--gerrit2-firewall-1fd9b   systemd service "droptailer.service" was restarted through monitor annotation
2m2s        Normal   ServiceRestarted   firewallmonitor/shoot--pcfgbt--gerrit2-firewall-1fd9b   systemd service "tailscaled.service" was restarted through monitor annotation

After that, no annotations remain on the fwmon object, and none on the firewall object in the shoot.
On the firewall itself:

$ systemctl status tailscaled
● tailscaled.service - Tailscale node agent
     Loaded: loaded (/etc/systemd/system/tailscaled.service; enabled; preset: enabled)
     Active: active (running) since Tue 2026-06-16 12:06:06 CEST; 3min 25s ago
$ systemctl status droptailer
● droptailer.service - Droptailer
     Loaded: loaded (/etc/systemd/system/droptailer.service; enabled; preset: enabled)
     Active: active (running) since Tue 2026-06-16 12:05:28 CEST; 4min 28s ago

$ systemctl status frr
● frr.service - FRRouting
     Loaded: loaded (/usr/lib/systemd/system/frr.service; enabled; preset: enabled)
    Drop-In: /etc/systemd/system/frr.service.d
             └─override.conf
     Active: active (running) since Wed 2026-06-03 15:37:51 CEST; 1 week 5 days ago

So the end user was not able to restart frr.

If a service is requested that is not part of the whitelist, an event is triggered:

seed$ k annotate firewall <name> firewall.metal-stack.io/restart-systemd-services=droptailer.service

shoot$ k get events
<invalid>   Warning   Self-Reconciliation   firewall/shoot--pcfgbt--gerrit2-firewall-1fd9b          updating firewall-controller failed with error: could not replace firewall-controller with version 5eb3121, err: checksum error
60m         Normal    Reconciled            firewall/shoot--pcfgbt--gerrit2-firewall-1fd9b          nftables rules and statistics successfully

@Gerrit91

Contributor

Maybe we should also emit an event when a service restart was not executed because the service is not whitelisted?
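
One way to picture this suggestion is a small helper that chooses the event for both outcomes. In the Go sketch below, the `Normal` event with reason `ServiceRestarted` and its message mirror the events shown earlier in this thread; the `Warning` branch, the reason `ServiceRestartDenied`, its message, and the `event` type are hypothetical and only illustrate what such an event might look like.

```go
package main

import "fmt"

// event is a simplified, hypothetical stand-in for a Kubernetes event.
type event struct {
	Type    string // "Normal" or "Warning"
	Reason  string
	Message string
}

// restartEvent builds the event for a restart request on the given unit.
func restartEvent(unit string, allowed bool) event {
	if allowed {
		return event{
			Type:    "Normal",
			Reason:  "ServiceRestarted",
			Message: fmt.Sprintf("systemd service %q was restarted through monitor annotation", unit),
		}
	}
	// Hypothetical reason and message for a rejected request.
	return event{
		Type:    "Warning",
		Reason:  "ServiceRestartDenied",
		Message: fmt.Sprintf("systemd service %q was not restarted because it is not whitelisted", unit),
	}
}

func main() {
	fmt.Printf("%+v\n", restartEvent("droptailer.service", true))
	fmt.Printf("%+v\n", restartEvent("frr.service", false))
}
```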

@majst01 majst01 self-assigned this Jun 16, 2026
@majst01 majst01 marked this pull request as ready for review June 16, 2026 11:02
@majst01 majst01 requested a review from a team as a code owner June 16, 2026 11:02
@majst01 majst01 requested review from Gerrit91 and mwennrich June 16, 2026 11:14
@Gerrit91 Gerrit91 merged commit 7eb0261 into master Jun 16, 2026
2 checks passed
@Gerrit91 Gerrit91 deleted the annotation-controllers branch June 16, 2026 12:16
@github-project-automation github-project-automation Bot moved this from In Progress to Done in Development Jun 16, 2026