Statistically, the area around the police station was the most dangerous place of all.
The online crime map of the Los Angeles Police Department showed that between October 2008 and March 2009, more than 1,380 entries came from the area surrounding the police station itself. That accounts for almost 4% of all crimes recorded in the city during this period. Only when the LA Times, which is also based in the neighborhood, complained did the police notice the error in the system. But what had happened?
All police reports were written by hand, and most of them were then entered into the database automatically. Quite often, the location of the crime was not recognized during this step. In that case, the location of the police station itself was simply used as the default value. Nobody checked this, which badly distorted the crime statistics. The police corrected the error by setting missing location information to "null" (the marker for a missing value in computer science). Of course, null entries can also make parts of a data record unusable if the values are needed for certain visualizations or calculations. Hence the saying "Null Island - where bad data goes to die".
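The effect of such a default value can be illustrated with a minimal Python sketch. Everything in it is made up for illustration: the coordinates, the addresses, and the toy `geocode` function are assumptions, not the actual LAPD system.

```python
from collections import Counter
from typing import Optional, Tuple

Location = Tuple[float, float]

# Hypothetical coordinates standing in for the police station.
STATION: Location = (34.05, -118.24)


def geocode(address: str) -> Optional[Location]:
    """Toy geocoder (assumption): returns None for unrecognized addresses."""
    known = {
        "100 Main St": (34.01, -118.30),
        "5 Elm Ave": (34.10, -118.20),
    }
    return known.get(address)


# Five invented reports, three of which cannot be located.
reports = ["100 Main St", "illegible", "5 Elm Ave", "???", "unreadable"]

# Flawed approach: fall back to the station when geocoding fails.
with_default = Counter(geocode(a) or STATION for a in reports)

# Safer approach: keep the missing location explicit as None.
with_none = Counter(geocode(a) for a in reports)

print("Hotspot with default value:", with_default.most_common(1))
print("Reports without a location:", with_none[None])
```

With the default value, the station looks like the place with the most crimes. With `None`, the same three reports are counted as "location unknown" and can be excluded from a map or reviewed by hand.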
This story shows how important it is to define the attributes of tables and databases correctly, especially when they can contain missing values. If you simply insert a value that looks valid to a machine (such as "Null" as comment text or (0.0,0.0) as a location), the data is interpreted incorrectly and can distort all subsequent results. At every step of the work, make sure that all possible values of a data set are well documented and that programs can recognize exceptional cases where necessary.
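One way a program might recognize such exceptional cases is to normalize known placeholder values to an explicit missing value before any analysis. The following Python sketch is only an illustration: the function names and the list of placeholder strings are assumptions, and treating (0.0, 0.0) as missing is a design choice that only makes sense when no real record can lie at that point.

```python
from typing import Optional, Tuple

# Assumed placeholder strings; a real data set should document its own.
PLACEHOLDER_TEXTS = {"", "null", "none", "n/a"}


def clean_location(
    lat: Optional[float], lon: Optional[float]
) -> Optional[Tuple[float, float]]:
    """Return (lat, lon), or None if the pair is missing or implausible."""
    if lat is None or lon is None:
        return None
    # Assumption: (0.0, 0.0) is a placeholder, not a real crime scene.
    if (lat, lon) == (0.0, 0.0):
        return None
    # Reject coordinates outside the valid latitude/longitude ranges.
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return None
    return (lat, lon)


def clean_comment(text: Optional[str]) -> Optional[str]:
    """Return the comment, or None if it is only a placeholder string."""
    if text is None or text.strip().lower() in PLACEHOLDER_TEXTS:
        return None
    return text


print(clean_location(0.0, 0.0))
print(clean_location(34.05, -118.24))
print(clean_comment("Null"))
print(clean_comment("Broken window"))
```

Downstream code then has to handle only one representation of "missing", namely `None`, instead of guessing which values are real.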
- "When Good Data Turns Bad" from the book "Humble Pi: A Comedy of Maths Errors" p. 253