I got my hands on an OpenLDAP instance that has been around since roughly 2004. It had been upgraded several times and was quite unstable: it crashed seemingly at random when certain users logged in on an LDAP-enabled system. The only thing that showed up consistently during those crashes was the password policy overlay (ppolicy). Turning it off made the crashes disappear, but because the customer requires the password policy overlay, disabling it was only a temporary workaround.

The first step was to reproduce the crash. It turned out that enabling password authentication in OpenSSH while using nslcd triggered it reliably. When a crash occurs, the following line appears in OpenLDAP’s logs:

slapd: ppolicy.c:912: ctrls_cleanup: Assertion `rs->sr_ctrls != NULL`

This assertion failure has been reported several times, and one of the reports was closed with the comment:

This turned out to be a configuration issue. Closing this out as NOTABUG.

Unfortunately, the solution was not posted; it is hidden somewhere behind Red Hat’s commercial support website.

After the usual GDB session did not get me very far, I decided to review the configuration of this particular instance:

$ cd /etc/ldap/slapd.d
$ sudo grep -ri "ppolicy"
...
cn=config/olcDatabase={1}mdb/olcOverlay={2}ppolicy.ldif:dn: olcOverlay={3}ppolicy
cn=config/olcDatabase={1}mdb/olcOverlay={2}ppolicy.ldif:objectClass: olcPPolicyConfig
cn=config/olcDatabase={1}mdb/olcOverlay={2}ppolicy.ldif:olcOverlay: {3}ppolicy
cn=config/olcDatabase={1}mdb/olcOverlay={2}ppolicy.ldif:olcPPolicyDefault: cn=default,ou=...
cn=config/olcDatabase={1}mdb/olcOverlay={2}ppolicy.ldif:structuralObjectClass: olcPPolicyConfig
cn=config/olcDatabase={1}mdb/olcOverlay={3}ppolicy.ldif:dn: olcOverlay={3}ppolicy
cn=config/olcDatabase={1}mdb/olcOverlay={3}ppolicy.ldif:objectClass: olcPPolicyConfig
cn=config/olcDatabase={1}mdb/olcOverlay={3}ppolicy.ldif:olcOverlay: {3}ppolicy
cn=config/olcDatabase={1}mdb/olcOverlay={3}ppolicy.ldif:olcPPolicyDefault: cn=default,ou=...
cn=config/olcDatabase={1}mdb/olcOverlay={3}ppolicy.ldif:structuralObjectClass: olcPPolicyConfig
...
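On a larger configuration, the same check can be scripted rather than read off the grep output by eye. The following is a minimal sketch, assuming the usual cn=config on-disk layout where each overlay lives in a file named olcOverlay={N}name.ldif inside its database directory. The function name find_duplicate_overlays is hypothetical, and the default path is simply the one used above.

```shell
#!/bin/sh
# Print "<database directory> <overlay name>" for every overlay that is
# configured more than once on the same database.
# find_duplicate_overlays is a hypothetical helper, not an OpenLDAP tool.
find_duplicate_overlays() {
    # Assumption: Debian-style config directory unless one is passed in.
    config_dir="${1:-/etc/ldap/slapd.d}"
    # Strip the {N} ordering prefix and the .ldif suffix so that
    # olcOverlay={2}ppolicy.ldif and olcOverlay={3}ppolicy.ldif in the same
    # directory collapse to the same "<dir> ppolicy" key.
    find "$config_dir" -type f -name 'olcOverlay=*.ldif' \
        | sed -E 's#/olcOverlay=\{[0-9]+\}([^/]*)\.ldif$# \1#' \
        | sort \
        | uniq -d   # keep only keys that occur more than once
}

# Usage (the directory is normally readable by root only):
#   sudo sh -c '. ./find_duplicate_overlays.sh; find_duplicate_overlays'
```

An empty result means no overlay name is repeated within a single database directory. The sketch only looks at file names, so it would not notice a file whose dn inside disagrees with its name.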

The grep output shows that the ppolicy overlay is configured twice on the same database. The fix is straightforward: remove the second ppolicy entry:

$ sudo systemctl stop slapd
$ sudo rm cn\=config/olcDatabase\=\{1\}mdb/olcOverlay\=\{3\}ppolicy.ldif
$ sudo slaptest -F /etc/ldap/slapd.d
config file testing succeeded
$ sudo systemctl start slapd

The instance is now operating reliably.
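Because the fix deletes a file from the live configuration directory, taking a copy of that directory first is a cheap safeguard. The following is a minimal sketch of such a step, not part of the original procedure. The function name backup_slapd_config and the .bak.<timestamp> naming are assumptions made for illustration.

```shell
#!/bin/sh
# Copy a slapd.d directory to a timestamped sibling before editing it by hand.
# backup_slapd_config is a hypothetical helper, not an OpenLDAP tool.
backup_slapd_config() {
    # Assumption: Debian-style config directory unless one is passed in.
    config_dir="${1:-/etc/ldap/slapd.d}"
    # Drop a trailing slash, then append a timestamp to build the target name.
    backup_dir="${config_dir%/}.bak.$(date +%Y%m%d%H%M%S)"
    # -a copies recursively and preserves ownership, modes, and timestamps.
    # Print the backup path only if the copy succeeded.
    cp -a "$config_dir" "$backup_dir" && printf '%s\n' "$backup_dir"
}

# Usage, with slapd stopped so the copy is consistent:
#   sudo sh -c '. ./backup_slapd_config.sh; backup_slapd_config'
```

If slaptest rejects the edited configuration, the copy can be moved back into place before slapd is started again.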