Thursday, March 31, 2016

How to improve quality of legacy software

When developing software from scratch, I follow these steps: Use cases (aka concept of operation), requirements, design, code & unit tests, system test planning, system test.

Recently, I was asked how to improve the quality of a piece of software written by other people a couple of years ago. I was told that it was working and that the only thing missing was compliance with software development procedures. I recommended doing the above sequence in reverse, i.e. first planning and executing full system tests. That way, we can answer the most important question: "Is it working correctly?" Once we are satisfied with the tests, we can review and document the design and refactor the code.

Even if we have to stop after testing, we will have added value to the existing product by being able to demonstrate, in a repeatable manner, that it is working. If we had instead started the development sequence from the beginning and had to stop midway due to other priorities (which always happens), we would have wasted our time.

Wednesday, March 30, 2016

Model simplification strategies for testing

When developing complex models/algorithms, it is usually difficult to evaluate full model test results. Often the only practical way to verify the output of the full model is to compare it with some other implementation, for example a MATLAB toolbox.

In addition to using 3rd party implementations for comparison, you should add functionality that lets you simplify the model to a form whose results can be easily interpreted by a technical person. For example, if you are working on an algorithm that converts geodetic coordinates to ECEF, you could have a function that temporarily sets the ellipsoid to a sphere (by making the eccentricity zero). It is easy to calculate expected results for a perfect sphere.
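
A minimal sketch of such a switch, in Java, is given here. The class name GeodeticToEcef, the method toEcef and the spherical flag are made up for illustration; the ellipsoid constants are the WGS 84 values. With the flag set, the distance of the result from the Earth's centre has to be the semi-major axis plus the height, whatever the latitude and longitude are.

```java
/** Sketch: geodetic-to-ECEF conversion with a switch that degrades the ellipsoid to a sphere. */
class GeodeticToEcef {
    // WGS 84 ellipsoid constants
    static final double SEMI_MAJOR_AXIS_M = 6378137.0;
    static final double FLATTENING = 1.0 / 298.257223563;
    static final double ECC_SQUARED = FLATTENING * (2.0 - FLATTENING);

    /**
     * Converts geodetic latitude/longitude (degrees) and height (metres) to ECEF x, y, z (metres).
     * When spherical is true (hypothetical test flag), the eccentricity is forced to zero.
     */
    static double[] toEcef(double latDeg, double lonDeg, double heightM, boolean spherical) {
        double e2 = spherical ? 0.0 : ECC_SQUARED;
        double lat = Math.toRadians(latDeg);
        double lon = Math.toRadians(lonDeg);
        double sinLat = Math.sin(lat);
        // Prime vertical radius of curvature; equals the semi-major axis when e2 == 0
        double n = SEMI_MAJOR_AXIS_M / Math.sqrt(1.0 - e2 * sinLat * sinLat);
        double x = (n + heightM) * Math.cos(lat) * Math.cos(lon);
        double y = (n + heightM) * Math.cos(lat) * Math.sin(lon);
        double z = (n * (1.0 - e2) + heightM) * sinLat;
        return new double[] {x, y, z};
    }

    public static void main(String[] args) {
        double[] p = toEcef(45.0, 30.0, 1000.0, true);
        double radius = Math.sqrt(p[0] * p[0] + p[1] * p[1] + p[2] * p[2]);
        // On a sphere this difference should be zero up to rounding error
        System.out.println(radius - (SEMI_MAJOR_AXIS_M + 1000.0));
    }
}
```

The same function serves production and test use; only the flag changes, so the simplified run exercises the real code path.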

Similarly, if you develop a kinetic 6DoF flight simulation, you should have functions or flags that let you easily turn off complicating factors like aerodynamics (by multiplying coefficients by zero), wind, variable gravity, Coriolis, ellipsoidal Earth and terrain elevation. Your aim is to simplify your model to a ballistic flight in vacuum with constant gravity over a flat and non-rotating Earth. You can then use high school physics to calculate expected trajectories and compare them with your model outputs.
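
As a rough illustration, the sketch below uses a 2-D point mass as a much simpler stand-in for a 6DoF model. The class BallisticCheck, the lumped quadratic-drag coefficient dragFactor and the dragEnabled flag are all assumptions made for this example. With drag switched off, the simulated range should agree with the textbook vacuum range v^2 * sin(2*theta) / g.

```java
/** Sketch: a 2-D point-mass trajectory with a switch that turns aerodynamic drag off. */
class BallisticCheck {
    static final double GRAVITY = 9.80665; // m/s^2, constant gravity, flat non-rotating Earth

    /**
     * Integrates the trajectory until it returns to launch height; returns the range in metres.
     * dragFactor (1/m) is a hypothetical lumped quadratic-drag coefficient; passing
     * dragEnabled = false multiplies it by zero.
     */
    static double simulateRange(double speed, double elevationDeg, double dragFactor,
                                boolean dragEnabled, double dt) {
        double k = dragEnabled ? dragFactor : 0.0;
        double angle = Math.toRadians(elevationDeg);
        double x = 0.0, y = 0.0;
        double vx = speed * Math.cos(angle), vy = speed * Math.sin(angle);
        while (true) {
            double v = Math.hypot(vx, vy);
            double ax = -k * v * vx;
            double ay = -GRAVITY - k * v * vy;
            double xNext = x + vx * dt + 0.5 * ax * dt * dt;
            double yNext = y + vy * dt + 0.5 * ay * dt * dt;
            if (yNext < 0.0) {
                // Linearly interpolate the ground crossing inside the last step
                double fraction = y / (y - yNext);
                return x + fraction * (xNext - x);
            }
            x = xNext;
            y = yNext;
            vx += ax * dt;
            vy += ay * dt;
        }
    }

    /** High school physics: range of a projectile in vacuum over flat ground. */
    static double vacuumRange(double speed, double elevationDeg) {
        return speed * speed * Math.sin(2.0 * Math.toRadians(elevationDeg)) / GRAVITY;
    }

    public static void main(String[] args) {
        double simulated = simulateRange(100.0, 45.0, 1.0e-3, false, 1.0e-3);
        // Difference between the simplified simulation and the closed-form answer
        System.out.println(simulated - vacuumRange(100.0, 45.0));
    }
}
```

In a real simulation each complicating factor (wind, Coriolis, terrain and so on) would get its own switch, so that they can be re-enabled one at a time while watching how the trajectory departs from the closed-form answer.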

To clear doubts about your code, you should be able to quickly show that it obeys basic geometrical/physical laws.

Saturday, March 19, 2016

Error in NASA code

NASA World Wind Java code is available on the internet. While looking at their SDK2 EGM96.java file, I discovered that the simple two-dimensional linear (bilinear) interpolation code has an error. The original code:
...
double ul = this.gePostOffset(topRow, leftCol);
double ll = this.gePostOffset(bottomRow, leftCol);
double lr = this.gePostOffset(bottomRow, rightCol);
double ur = this.gePostOffset(topRow, rightCol);

double u = (lon - lonLeft) / INTERVAL.degrees;
double v = (latTop - lat) / INTERVAL.degrees;

double pll = (1.0 - u) * (1.0 - v);
double plr = u * (1.0 - v);
double pur = u * v;
double pul = (1.0 - u) * v;

double offset = pll * ll + plr * lr + pur * ur + pul * ul;
...

I prepared a schematic to visualize the algorithm:


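As you can see at the end of the above code snippet, the offset formula starts with pll*ll, where pll is (1.0 - u) * (1.0 - v). Since v is measured downward from latTop (v = 0 at the top row, v = 1 at the bottom row), the correct multiplier of the ll term has to be v*(1-u).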

A quick sanity check: consider the case when lat=latBottom and lon=lonLeft, i.e. the lower left corner. In that case we expect the offset to be equal to ll. Plugging in the values we get v=1, 1-v = 0 and u=0, 1-u = 1. If we use the original (wrong) formula we get offset = 1*0*ll + 0*0*lr + 0*1*ur + 1*1*ul = ul, i.e. the value at the upper left corner, which is clearly wrong. If we use the corrected formula we get offset = 1*1*ll + 0*1*lr + 0*0*ur + 1*0*ul = ll, which is the expected result.

The easiest way to fix the code is to swap the topRow and bottomRow indices in the four gePostOffset() calls.

I guess the error went undetected because neighboring grid points have similar values: although the points are used in the wrong order, the interpolated result stays close to the correct one and does not look suspicious.

I tried to log a bug report on their Jira site, but I could not since I don't have an account. I was able to file a bug report on GitHub.

Lessons learnt:
  • Thoroughly test 3rd party code, even if it is from NASA or MathWorks. All code is guilty until proven innocent. Ignore this advice and you will find yourself chasing strange errors for a very long time.
  • Separate interpolation code into its own function and write lots of unit tests for that function to verify that it works correctly.
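
The second lesson could look roughly like the sketch below. The class Bilinear and its methods are hypothetical names and are not part of World Wind. The function takes the four corner values and the fractions u and v, with v measured downward from the top edge as in the snippet above. The corner values in the checks are deliberately all different, so that any swapped pair of corners makes a check fail.

```java
/** Sketch: bilinear interpolation isolated in a pure function so it can be unit tested. */
class Bilinear {
    /**
     * Interpolates between four corner values.
     * u is the fraction from the left edge (0) to the right edge (1);
     * v is the fraction from the top edge (0) to the bottom edge (1).
     */
    static double interpolate(double ul, double ur, double ll, double lr, double u, double v) {
        double pul = (1.0 - u) * (1.0 - v);
        double pur = u * (1.0 - v);
        double pll = (1.0 - u) * v;
        double plr = u * v;
        return pul * ul + pur * ur + pll * ll + plr * lr;
    }

    /** Tiny stand-in for a unit test framework assertion. */
    static void check(String name, double actual, double expected) {
        if (Math.abs(actual - expected) > 1e-12) {
            throw new AssertionError(name + ": expected " + expected + " but got " + actual);
        }
    }

    public static void main(String[] args) {
        double ul = 1.0, ur = 2.0, ll = 3.0, lr = 4.0;
        // At each corner the result must equal that corner's value
        check("upper left", interpolate(ul, ur, ll, lr, 0.0, 0.0), ul);
        check("upper right", interpolate(ul, ur, ll, lr, 1.0, 0.0), ur);
        check("lower left", interpolate(ul, ur, ll, lr, 0.0, 1.0), ll);
        check("lower right", interpolate(ul, ur, ll, lr, 1.0, 1.0), lr);
        // At the centre the result must be the average of the four corners
        check("centre", interpolate(ul, ur, ll, lr, 0.5, 0.5), 2.5);
        System.out.println("all checks passed");
    }
}
```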

Good error messages

A good error message should contain the following:
  • Short description
  • What triggered the error?
  • What was expected?
Example: User inputs a negative value (-5) into a function foo that only accepts positive integers. A message similar to the following should be displayed:
Negative input!
The input value was -5. Function foo only accepts positive integers.
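
One possible way to produce such a message in Java is sketched below. The class ErrorMessageDemo and the body of foo are assumptions made for illustration, as is the choice of IllegalArgumentException. The message is assembled from the three parts listed above: a short description, the value that triggered the error, and what was expected.

```java
/** Sketch: building an error message from description, trigger and expectation. */
class ErrorMessageDemo {
    /** Hypothetical function foo that only accepts positive integers. */
    static int foo(int value) {
        if (value <= 0) {
            // Short description of the problem
            String description = value < 0 ? "Negative input!" : "Zero input!";
            throw new IllegalArgumentException(
                description + "\n"
                + "The input value was " + value + ". "           // what triggered the error
                + "Function foo only accepts positive integers."); // what was expected
        }
        return value;
    }

    public static void main(String[] args) {
        try {
            foo(-5);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```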