Saturday, August 13, 2016

What makes software development difficult

The Law of Leaky Abstractions: "...suddenly one day we need to figure out a problem where the abstraction leaked, and it takes 2 weeks."

I don't have time to read the spec of every function before using it, I make assumptions based on the name of the function. This means that I take risks by using something without completely understanding it. The benefit is speedy development. As long as the problems are limited to a couple of things that leak, it is manageable by testing and debugging. But sometimes many of them team up and you have a "bugfestation". Unfortunately, even one simple error can crash and burn the whole system in unpredictable ways.

Example1: I assume that the strlen function in C returns the whole length of the string and use strlen result in malloc. Later I am greeted with system crashes which happen only on some versions of Windows. After months of debugging looking for the bug in unrelated places, I realize that strlen does not return the terminating null character and in malloc you have to do strlen+1.

Example2: I write a script to synchronize my drive to my network backup. Then I realize that some folders were not copied. After analysis I see that xcopy fails with "insufficient memory" error when path+file name is longer than 255 characters, but I don't infer that at first from the error message. A better message would be "file name longer than 255 characters". It would be even better if xcopy didn't have this limitiation in the year 2016!

Example3: I assume NASA code is correct, but that is not always the case.

Example4: I assume MATLAB functions are correct, but that is not always the case. The more depressing fact is that the MATLAB folks will resist your claims, even after you mathematically demonstrate that there is an error.

These issues make self discipline and testing early/often so critical. It's also the reason why 95% of code takes 95% of the time and the remaining 5% take another 95% of the time, because in the last 5%, you realize that libraries you trusted had errors in them. Therefore it is common to be late by at least 2x the planned time.

Bonus: Case of the Unexplained

No comments: