A Small Rant
I had fun this morning trying to submit some more calculations, only to have them all fail, with the log file showing one of Gaussian’s “succinct” error messages saying that something went wrong in FileIO.
I’ve seen this error before, usually when I’m an idiot and fill up the hard drive with a CIS calculation, or use up all the system’s file descriptors (should probably have the advisor increase this…), etc. So these were the problems I was looking for, and I was tearing my hair out because everything seemed fine: permissions worked out, there was space, there were enough descriptors.
Finally, I realized that all the calculations were restart/guess=read routes (which read in a checkpoint file) and that I had forgotten to copy the checkpoints to the new filenames (I like to keep the old failed/partial checkpoint in case something goes wrong). So the FileIO error was because Gaussian couldn’t find the checkpoints it was supposed to read.
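In hindsight, a tiny pre-flight check before submitting would have saved me the morning. Here is a rough sketch of what I have in mind, not anything official: the function name `missing_checkpoint` and the keyword list are my own, it assumes the checkpoint is named with a `%chk=` line and that the route section starts with `#` and ends at the first blank line, and it checks the checkpoint name exactly as written in the input.

```python
from pathlib import Path

# Route fragments that I am assuming mean "read something from the checkpoint".
# This list is illustrative, not exhaustive.
READ_KEYWORDS = ("guess=read", "geom=check", "geom=allcheck", "restart")


def missing_checkpoint(input_text, workdir="."):
    """Return the checkpoint path if the route needs one and it is absent.

    Returns None when the job does not appear to read a checkpoint,
    when no %chk= line is present, or when the file exists.
    """
    chk = None
    route_lines = []
    in_route = False
    for line in input_text.splitlines():
        stripped = line.strip()
        if stripped.lower().startswith("%chk="):
            chk = stripped.split("=", 1)[1].strip()
        elif stripped.startswith("#"):
            in_route = True
            route_lines.append(stripped.lower())
        elif in_route:
            if not stripped:
                break  # a blank line ends the route section
            route_lines.append(stripped.lower())

    # Drop spaces so "guess = read" and "guess=read" look the same.
    route = "".join(route_lines).replace(" ", "")
    needs_chk = any(keyword in route for keyword in READ_KEYWORDS)
    if not needs_chk or chk is None:
        return None

    path = Path(workdir) / chk
    return None if path.exists() else path
```

Running something like `missing_checkpoint(Path("job.com").read_text())` on each input before it goes to the queue (where `job.com` is a placeholder name) would flag the forgotten copy up front instead of leaving me to decode a FileIO error afterwards.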
For those who don’t use Gaussian: while it is a very, very fine program, there are some things that just make you go “WTF?” For example, whenever g09 exits unsuccessfully, it segfaults. Granted, it sets the exit code to 1, and to most people that’s no different from a message that says “Something went wrong! Check the logfile,” but to me this is fundamentally wrong. Segfaults are what happens when a program tries to access memory it can’t or shouldn’t. Think buffer overflows and such. To me, segfaulting when a program error occurs and the program tries to exit is reminiscent of a story EHS told us:
A student was heating diethyl ether in a beaker out on a bench top, and with its low flash point, it inevitably caught fire from the internal circuitry in the hot plate. Luckily, no one was hurt. However, the student’s solution to this? Do the same thing, but keep a watch glass handy to cover the beaker when it catches fire. So he sat there, covering the thing up whenever it caught fire.