37 - Null Island

Statistically, the area around the police station was the most dangerous place of all.


The online crime map of the Los Angeles Police Department showed that between October 2008 and March 2009, more than 1,380 entries came from the area immediately surrounding the police station itself. That was almost 4% of all crimes recorded in the city during this period. Only when the LA Times, whose offices are in the same neighbourhood, complained did the police notice the error in the system. But what had happened?

At the time, all police reports were written by hand, and most of them were then entered into the database automatically. Quite often the location of the crime could not be recognized. In that case, the location of the police station itself was simply used as the default value. Nobody checked this, which badly distorted the crime statistics. The LAPD corrected the error by setting missing location information to a "null" value (the marker for a missing value in computer science). Of course, null values can themselves make parts of a data record unusable if those values are needed for certain visualizations or calculations. Hence the saying "Null Island - where bad data goes to die": Null Island is the nickname for the point at 0° latitude, 0° longitude, which is where records with missing coordinates end up when they are stored as (0.0, 0.0).
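The effect of such a default can be sketched in a few lines of Python. Everything named here (`STATION`, `KNOWN_ADDRESSES`, the two `geocode_*` functions) and all coordinates are hypothetical; they only illustrate the mechanism and are not the LAPD's actual system.

```python
from collections import Counter
from typing import Dict, Optional, Tuple

Coord = Tuple[float, float]

# Hypothetical coordinates and addresses, for illustration only.
STATION: Coord = (34.05, -118.24)
KNOWN_ADDRESSES: Dict[str, Coord] = {
    "100 Main St": (34.10, -118.30),
    "200 Oak Ave": (34.20, -118.40),
}

def geocode_with_default(address: str) -> Coord:
    """Flawed approach: unreadable addresses silently become the station."""
    return KNOWN_ADDRESSES.get(address, STATION)

def geocode_with_none(address: str) -> Optional[Coord]:
    """Safer approach: unreadable addresses are marked as missing (None)."""
    return KNOWN_ADDRESSES.get(address)

reports = ["100 Main St", "illegible", "200 Oak Ave", "???", "smudged"]

# With the default, every unreadable report is counted at the station.
flawed = Counter(geocode_with_default(a) for a in reports)

# With None, unreadable reports are kept out of the map and counted separately.
located = [c for a in reports if (c := geocode_with_none(a)) is not None]
honest = Counter(located)
missing = len(reports) - len(located)
```

In this toy data set the flawed counter attributes three of the five reports to the station, while the second variant attributes none to it and reports three missing locations instead.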

The lesson from this story is how important it is to define the attributes of tables and databases correctly, especially when values may be missing. If you simply store a placeholder that looks like ordinary data to a machine (such as "Null" as comment text or (0.0,0.0) as a location), the data will be misinterpreted and can distort every result derived from it. You should always make sure that all possible values of a data set are well documented and that programs can recognize exceptional cases where necessary.
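As a minimal sketch of that advice, the following Python snippet turns known placeholder values into an explicit `None` before any analysis. The list of placeholder spellings and the function names are assumptions made for this example; a real data set needs its own documented list.

```python
from typing import Optional, Tuple

# Assumed placeholder spellings; document the real ones for each data set.
TEXT_PLACEHOLDERS = {"", "null", "none", "n/a", "na"}

def clean_text(value: Optional[str]) -> Optional[str]:
    """Return None for text that only stands in for a missing value."""
    if value is None or value.strip().lower() in TEXT_PLACEHOLDERS:
        return None
    return value.strip()

def clean_location(
    lat: Optional[float], lon: Optional[float]
) -> Optional[Tuple[float, float]]:
    """Return None for missing, out-of-range, or (0.0, 0.0) coordinates."""
    if lat is None or lon is None:
        return None
    # NaN fails both comparisons, so it is rejected here as well.
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        return None
    if lat == 0.0 and lon == 0.0:
        # Treated as a placeholder in this sketch; only appropriate if the
        # data cannot genuinely contain a point at 0 latitude, 0 longitude.
        return None
    return (lat, lon)
```

For example, `clean_location(0.0, 0.0)` returns `None`, while `clean_location(34.05, -118.24)` is passed through unchanged. Whether a string such as "Null" is really a placeholder depends on the field, which is why the list should be documented per data set rather than guessed.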

  • "When Good Data Turns Bad", in Matt Parker, "Humble Pi: A Comedy of Maths Errors", p. 253