The City of Austin has a lot of data available, including a dataset of all 2.1 million reports filed by the Austin Police Department since 2003 that you can find here.
I enjoy the aesthetics of geospatial data visualizations, so let’s see what we can make from that dataset.
We start by loading the data into a Pandas DataFrame, dropping the columns we don’t need and renaming the ones we’re interested in:
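As a rough sketch of this step, assuming the export is a CSV file: the source headers in `COLUMNS` and the short names they map to are placeholders, so adjust them to whatever the real file uses.

```python
import io
import pandas as pd

# Assumed source headers mapped to shorter names; the real CSV may differ.
COLUMNS = {
    "Highest Offense Description": "crime",
    "Occurred Date": "date",
    "Latitude": "latitude",
    "Longitude": "longitude",
}

def load_reports(source):
    """Read only the needed columns, rename them, and drop rows without coordinates."""
    df = pd.read_csv(source, usecols=list(COLUMNS))
    df = df.rename(columns=COLUMNS)
    # Unparseable dates become NaT instead of raising.
    df["date"] = pd.to_datetime(df["date"], errors="coerce")
    return df.dropna(subset=["latitude", "longitude"]).reset_index(drop=True)

# Tiny in-memory stand-in for the real file, just to show the call.
sample = io.StringIO(
    "Highest Offense Description,Occurred Date,Latitude,Longitude,Extra\n"
    "THEFT,01/02/2015,30.27,-97.74,a\n"
    "FRAUD,03/04/2016,,,b\n"
)
reports = load_reports(sample)
```

Passing `usecols` keeps memory use down on a file with a couple of million rows, since the unused columns are never parsed.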
There are 379 types of crimes included in the dataset, ranging from ‘LOITERING IN PUBLIC PARK’ to ‘BEASTIALITY’, so in order to make a useful visualization we need to narrow down and group some of the types of crimes.
I manually categorized most of the crimes into 9 categories: Auto, Assault, Burglary, Domestic, Drugs, Fraud, Misc, Property, and Theft.
These categories and their constituent crimes are quite arbitrary; you’re free to use whatever categorization system you want.
For the sake of brevity and because it’s boring, I’m not including the code I used to categorize the crimes here, but you can find it in the script on my GitHub.
A visualization of the categories I chose and their constituent crimes can be seen below in the Sankey diagram, generated using the Google Charts Service utility.
To keep the visualization readable I only included crimes with more than 10,000 occurrences over the last 16 years.
Sankey diagram of crime categories
After the previous step, we have a DataFrame containing the location, day, year, and categorical crime code for every incident whose crime falls into one of my categories.
Dropping uncategorized crimes from the dataset reduced the number of incidents from 2.1 million to 1.8 million, which is still sufficient for cool data visualizations.
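The general shape of the categorize-and-drop step can be sketched as follows. This is not the script from the repo: the mapping entries and the `crime` and `category` column names are illustrative assumptions.

```python
import pandas as pd

# Illustrative mapping only; a real one would cover many more offense types.
CATEGORY_OF = {
    "THEFT": "Theft",
    "BURGLARY OF VEHICLE": "Auto",
    "FAMILY DISTURBANCE": "Domestic",
}

def categorize(df):
    """Attach a category to each report and drop reports that have none."""
    out = df.assign(category=df["crime"].map(CATEGORY_OF))
    out = out.dropna(subset=["category"]).reset_index(drop=True)
    # A categorical dtype is what Datashader's per-category aggregation expects.
    out["category"] = out["category"].astype("category")
    return out

reports = pd.DataFrame(
    {"crime": ["THEFT", "LOITERING IN PUBLIC PARK", "FAMILY DISTURBANCE"]}
)
categorized = categorize(reports)
```

Offenses missing from the mapping come back from `map` as NaN, so a single `dropna` removes every uncategorized incident.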
A lot of problems can arise when plotting more than 1 million data points, but thankfully there’s an amazing Python package called Datashader that handles this by aggregating the points onto a pixel grid before rendering.
I first want to make a plot showing all crimes, regardless of category.
Here’s the code I used to generate the following plot:
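A minimal sketch of this kind of plot with Datashader, assuming the DataFrame has `longitude` and `latitude` columns (hypothetical names) and plotting raw coordinates without any map projection. The colormap and image size are arbitrary choices, not necessarily the ones behind the image below.

```python
def render_all(df, width=900, height=900):
    """Rasterize every point and shade each pixel by how many incidents it holds."""
    # Imported inside the function so that defining it does not require datashader.
    import datashader as ds
    import datashader.transfer_functions as tf

    canvas = ds.Canvas(plot_width=width, plot_height=height)
    # With no explicit reduction, points() counts the rows landing in each pixel.
    agg = canvas.points(df, "longitude", "latitude")
    # Histogram equalization keeps sparse areas visible next to dense ones.
    img = tf.shade(agg, cmap=["#2b0000", "#ff6a00", "#ffffff"], how="eq_hist")
    return tf.set_background(img, "black")
```

Calling `render_all(reports).to_pil().save("all_crimes.png")` would write the result to disk, where `reports` stands for the categorized DataFrame.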
All crimes
That looks pretty cool, and it seems to make sense: I doubt any Austinite would be surprised to see that a lot of crimes occur near downtown.
Now we want to color by category rather than intensity.
We again use Datashader, but we use a different color palette, and we specify the column of codes in our DataFrame:
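As a hedged sketch of the categorical version, assuming a `category` column holding the nine category names plus `longitude` and `latitude` columns: the hex colors in `COLOR_KEY` are an arbitrary palette chosen for illustration.

```python
# Hypothetical palette: one color per category name used in this post.
COLOR_KEY = {
    "Auto": "#e41a1c", "Assault": "#377eb8", "Burglary": "#4daf4a",
    "Domestic": "#984ea3", "Drugs": "#ff7f00", "Fraud": "#ffff33",
    "Misc": "#a65628", "Property": "#f781bf", "Theft": "#999999",
}

def render_by_category(df, width=900, height=900):
    """Rasterize the points and color each pixel by its mix of categories."""
    # Imported inside the function so that defining it does not require datashader.
    import datashader as ds
    import datashader.transfer_functions as tf

    # count_cat needs a categorical column; the cast is harmless if it already is one.
    df = df.assign(category=df["category"].astype("category"))
    canvas = ds.Canvas(plot_width=width, plot_height=height)
    agg = canvas.points(df, "longitude", "latitude", agg=ds.count_cat("category"))
    img = tf.shade(agg, color_key=COLOR_KEY, how="eq_hist")
    return tf.set_background(img, "black")
```

With `count_cat`, each pixel keeps a separate count per category, and `shade` blends the category colors in proportion to those counts.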
Crimes by category
It looks like there are a lot of drug and alcohol crimes occurring in the East 6th Street area, which is completely unsurprising.
Most of the auto crimes occur along major roads and highways, again unsurprising.
Let’s look at each of the nine categories a bit more closely.
We start by generating plots for each category:
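One way to sketch this loop, assuming `category`, `longitude`, and `latitude` columns (hypothetical names) and an arbitrary colormap. Every image is given the same extent so that the tiles line up when placed side by side.

```python
from pathlib import Path

def shared_extent(df):
    """Bounding box of all points, so every per-category image covers the same area."""
    x_range = (df["longitude"].min(), df["longitude"].max())
    y_range = (df["latitude"].min(), df["latitude"].max())
    return x_range, y_range

def render_each(df, out_dir=".", size=600):
    """Write one PNG per category and return the paths that were written."""
    # Imported inside the function so that defining it does not require datashader.
    import datashader as ds
    import datashader.transfer_functions as tf

    x_range, y_range = shared_extent(df)
    paths = []
    for name, subset in df.groupby("category", observed=True):
        canvas = ds.Canvas(plot_width=size, plot_height=size,
                           x_range=x_range, y_range=y_range)
        agg = canvas.points(subset, "longitude", "latitude")
        img = tf.shade(agg, cmap=["#2b0000", "#ff6a00", "#ffffff"], how="eq_hist")
        path = Path(out_dir) / f"{name.lower()}.png"
        tf.set_background(img, "black").to_pil().save(path)
        paths.append(path)
    return paths
```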
and then tile them together using ImageMagick:
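A minimal sketch of the tiling step driven from Python, assuming ImageMagick's `montage` tool is the one used and is available on the PATH (on ImageMagick 7 it may need to be invoked as `magick montage` instead). The file names are placeholders.

```python
import shutil
import subprocess

def montage_command(images, output, columns=3):
    """Build the montage argument list for a grid with the given number of columns."""
    rows = -(-len(images) // columns)  # ceiling division
    return ["montage", *images,
            "-tile", f"{columns}x{rows}",
            "-geometry", "+0+0",  # no padding between tiles
            output]

def tile_images(images, output, columns=3):
    """Run montage, failing early with a clear message if it is not installed."""
    if shutil.which("montage") is None:
        raise RuntimeError("ImageMagick's montage tool was not found on PATH")
    subprocess.run(montage_command(images, output, columns), check=True)
```

For nine category images, `tile_images([...], "grid.png")` would produce a 3x3 grid.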
Crimes by category, separated
My name is Tristan Lee, and I'm a data scientist with a background in computational physics. This is my blog, which I use to share simulations, visualizations, and analyses. My focus areas include right-wing extremism, social network analysis, and high-performance computing.