Tuesday 25 February 2014

A Christmastime Computer Ghost Story, or, a loop in time

The poor coder was running out of time before his deadline, and right before the Christmas holidays too.

The new system was not working. PAM had been updated to a newer version, which choked on the old-style pam.d files that used the old @-style include directives.
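To see which files are affected, one could scan a pam.d directory for such directives. The following is only a minimal sketch: it assumes the directives look like lines beginning with `@include` (the story only says "@ style"), and the function name `find_at_includes` and the demo file contents are made up for illustration.

```python
import os
import tempfile


def find_at_includes(pam_dir):
    """Return (filename, line number, line) for each line starting with '@include'.

    Assumption: the old-style directive is a line beginning with '@include'.
    """
    hits = []
    for name in sorted(os.listdir(pam_dir)):
        path = os.path.join(pam_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path, encoding="utf-8", errors="replace") as fh:
            for number, line in enumerate(fh, start=1):
                if line.lstrip().startswith("@include"):
                    hits.append((name, number, line.strip()))
    return hits


if __name__ == "__main__":
    # Demo on a throwaway directory with hypothetical file contents.
    with tempfile.TemporaryDirectory() as demo:
        with open(os.path.join(demo, "cron"), "w") as fh:
            fh.write("# hypothetical old-style file\n@include common-auth\n")
        with open(os.path.join(demo, "common-auth"), "w") as fh:
            fh.write("auth required pam_unix.so\n")
        for hit in find_at_includes(demo):
            print(hit)
```

On a real system the directory would be the installed pam.d directory rather than a temporary one.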

This meant that CRON was failing with "permission denied" because its config file could not be processed.

Examination of the source version control system showed that old versions of the system used the same new version of pam and yet did not have that problem at all.

The pam.d config files were not supplied as part of the pam source, and so had come from somewhere else: the base file system on which the system was built.

An older version of the system was fired up, and sure enough, it had a different set of pam.d files to cope with the newer pam release that was also in use on the older system.

It seemed simpler to just transfer the pam.d files from the old system to the new system - and indeed that fixed the problem.

These were then added as a fixup to the pam project, so that the new pam.d files would be installed whenever the new pam project was used. Otherwise, the original files would still be used.

The build system was tested and it did indeed deposit the new pam.d files in the correct place.
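A check like this can compare file contents rather than just file presence. As a rough sketch, assuming the intended files sit in one directory and the build output in another (the function names `manifest` and `differing_files` are hypothetical), the two trees could be compared by content digest:

```python
import hashlib
import os
import tempfile


def manifest(root):
    """Map each file's path relative to root to the SHA-256 of its contents."""
    result = {}
    for folder, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            with open(path, "rb") as fh:
                digest = hashlib.sha256(fh.read()).hexdigest()
            result[os.path.relpath(path, root)] = digest
    return result


def differing_files(expected_dir, installed_dir):
    """Return sorted names that are missing, extra, or different between the trees."""
    expected = manifest(expected_dir)
    installed = manifest(installed_dir)
    names = set(expected) | set(installed)
    return sorted(n for n in names if expected.get(n) != installed.get(n))


if __name__ == "__main__":
    # Demo with two throwaway directories and hypothetical contents.
    with tempfile.TemporaryDirectory() as want, tempfile.TemporaryDirectory() as got:
        for root, text in ((want, "new contents\n"), (got, "old contents\n")):
            with open(os.path.join(root, "cron"), "w") as fh:
                fh.write(text)
        print(differing_files(want, got))
```

An empty list means the installed files match the intended ones byte for byte; anything else names the files worth a second look.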

A new build was fired up and, just before delivery for QA, was found to fail in exactly the same way as before.

Things get spooky...

Analysis showed that the old pam files were installed after all; and this was because the old pam files had been packed in the install image, and this was because the fixup to the pam project contained the old pam files.

What?

Didn't the coder fix that to have new pam files the day before? Version control showed that he did, but it also showed that the fixup archive contained the old files, known not to work; and only those files were committed.

The previous system was fired up again, and to his surprise it was using the old pam files too and yet was NOT failing - because it was using the old pam libraries, and not the new pam libraries!

The command history showed that its pam files had been packaged up using tar ... | base64, for cut-n-paste from window to window, so sadly no temporary files were left behind as evidence of what was actually packaged. But it was clearly the machine from which the files were taken.
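A variant of that trick that does leave evidence could record a digest alongside the pasted text. This is a sketch only, not what the coder ran: it swaps the shell pipeline for Python's standard `tarfile`, `base64` and `hashlib` modules, and the names `pack_dir` and `unpack_text` are invented for the example.

```python
import base64
import hashlib
import io
import os
import tarfile
import tempfile


def pack_dir(src_dir):
    """Pack src_dir into base64 text; return (text, SHA-256 of the raw tarball)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name in sorted(os.listdir(src_dir)):
            tar.add(os.path.join(src_dir, name), arcname=name)
    raw = buf.getvalue()
    return base64.b64encode(raw).decode("ascii"), hashlib.sha256(raw).hexdigest()


def unpack_text(text, expected_digest):
    """Decode pasted text, verify its digest, and return {member name: bytes}."""
    raw = base64.b64decode(text)
    if hashlib.sha256(raw).hexdigest() != expected_digest:
        raise ValueError("pasted bundle does not match the recorded digest")
    files = {}
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile():
                files[member.name] = tar.extractfile(member).read()
    return files


if __name__ == "__main__":
    # Demo with a throwaway directory and hypothetical contents.
    with tempfile.TemporaryDirectory() as src:
        with open(os.path.join(src, "cron"), "w") as fh:
            fh.write("auth required pam_unix.so\n")
        text, digest = pack_dir(src)
        print(digest)
        print(sorted(unpack_text(text, digest)))
```

Writing the digest into a note or commit message on both sides would show later exactly which bundle was pasted where.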

The new system on which the replacement pam files had been tested had been re-installed to test the new build, and so it had the newly installed old broken pam files and not the new ones that had been proved to work.

There had been a fix and it had worked but all that was left was the same old pam.d files that did not work. And nowhere to get a fix from.

The coder then remembered that the previous installation is always preserved on a backup partition, and so quickly mounted this on the new system and found the new, different, pam.d files - which worked.

The coder packaged these up as a fixup to the pam project, just like 2 days before.

And kept a backup just to be sure.

The coder was glad to have his fix in time for the deadline, and to have his Christmas holidays, but if the newer fixed files had not come from the working older system (which did not use new pam after all), where had they come from... and where did they go to... But he did not let these questions prevent him from starting a new build for QA, and going off to enjoy his long-anticipated holidays.