US unemployment dataset analysis

Immanuella Duke
Mar 22, 2021

Introduction

The unemployment rate is calculated by dividing the number of unemployed people by the total number of people in the labor force. It serves as an indicator of a country’s economic health. This post analyzes a Kaggle US unemployment dataset, looking at unemployment at the state and county level as well as across different time periods.
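The definition above can be written as a small function. This is a minimal sketch: the function name and the example numbers are illustrative assumptions and are not taken from the dataset.

```python
def unemployment_rate(unemployed: int, labor_force: int) -> float:
    """Return the unemployment rate as a percentage of the labor force."""
    if labor_force <= 0:
        raise ValueError("labor_force must be positive")
    return 100.0 * unemployed / labor_force


# Illustrative numbers only, not taken from the dataset.
example_rate = unemployment_rate(unemployed=5_000, labor_force=100_000)
print(f"{example_rate:.1f}%")  # prints 5.0%
```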

Dataset description

The dataset contains unemployment data from 47 US states and 1752 counties, covering 1990 to 2016. It consists of 885,548 rows and 5 columns, with no missing values. Tableau was used for the data visualization.
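The figures quoted above (row count, column count, missing values, number of states and counties, year range) can be checked with a few lines of Python. The analysis in this post was done in Tableau, so this is only a sketch of an equivalent check. The column names `Year`, `Month`, `State`, `County` and `Rate` are assumptions, and the sample rows are made up so the example runs without the real file.

```python
import csv
import io

# Made-up sample rows with assumed column names (the post only says "5 columns").
SAMPLE_CSV = """Year,Month,State,County,Rate
1990,January,State A,County 1,8.0
1990,February,State A,County 1,6.0
2000,January,State B,County 2,3.0
"""


def summarize(rows):
    """Return basic shape and completeness figures for a list of row dicts."""
    if not rows:
        raise ValueError("no rows to summarize")
    years = [int(row["Year"]) for row in rows]
    return {
        "rows": len(rows),
        "columns": len(rows[0]),
        # csv.DictReader yields "" for empty fields and None for absent ones.
        "missing_values": sum(
            1 for row in rows for value in row.values() if value in (None, "")
        ),
        "states": len({row["State"] for row in rows}),
        # County names can repeat across states, so count (state, county) pairs.
        "counties": len({(row["State"], row["County"]) for row in rows}),
        "years": (min(years), max(years)),
    }


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
summary = summarize(rows)
print(summary)
```

To run this against the real data, replace `io.StringIO(SAMPLE_CSV)` with an open file handle for the downloaded CSV.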

Unemployment by state

Fig 1: Bar chart showing mean unemployment rates for 47 US states

The bar chart in Fig 1 shows the breakdown of mean unemployment rates by state. At the far left, Arizona, California and Mississippi have the highest unemployment rates, while South Dakota, North Dakota and Nebraska are among the states with the lowest rates.
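A ranking like the one in Fig 1 comes from grouping the rows by state, averaging the rate, and sorting. The sketch below shows that idea in plain Python rather than Tableau. The `State` and `Rate` column names are assumptions and the rows are made up.

```python
from collections import defaultdict
from statistics import mean


def mean_rate_by(rows, key):
    """Group rows by `key` and return {group: mean of the Rate column}."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(float(row["Rate"]))
    return {group: mean(values) for group, values in groups.items()}


def rank_descending(means):
    """Return (group, mean) pairs sorted from highest to lowest mean."""
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


# Made-up rows with assumed column names.
rows = [
    {"State": "State A", "Rate": "8.0"},
    {"State": "State A", "Rate": "6.0"},
    {"State": "State B", "Rate": "3.0"},
]
ranking = rank_descending(mean_rate_by(rows, "State"))
print(ranking)  # [('State A', 7.0), ('State B', 3.0)]
```

Note that this is an unweighted mean over county-month rows, so a small county counts as much as a large one. Weighting by labor force would need a column this sketch does not assume exists.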

Fig 2: Map showing states and their unemployment rates

Representing the information on a map gives more insight into whether geographical location affects unemployment. The map above shows that the majority of states in the middle of the US have lower unemployment rates, whilst those on the outer edges generally have higher values. One possible explanation is that commerce is concentrated in the middle belt of the country, with many businesses preferring to locate well inland rather than on the outskirts. Another could be a higher population of immigrants on the outskirts of the country, contributing to a larger percentage of people without jobs.

Unemployment by county

Fig 3: Map showing counties and their unemployment rates

As with the state map in Fig 2, the county map in Fig 3 shows higher unemployment rates in counties near the country’s borders. Counties that share a border with Mexico stand out in particular.

Unemployment by month

Fig 4: Chart showing level of unemployment in different months

The graph in Fig 4 shows the months clustering by unemployment rate. January, February and March have the highest rates, followed by a second cluster: June, July, December and April. May, August and November have lower rates still, and October and September report the lowest unemployment levels.

Fig 5: Line graph of mean unemployment by month

In Fig 5, we see higher rates towards the end of the year and at the beginning of the next. This is perhaps because companies lay off employees during that period. Companies could use this pattern to decide on the best time to recruit new personnel.

Unemployment by quarter

Fig 6: Bar chart showing mean unemployment rates by the quarter

Fig 6 shows a clear difference between unemployment rates in Q1 and the other quarters. This supports the monthly breakdown in Fig 5, which shows that unemployment is highest at the beginning of the year.
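The quarterly view is the monthly data rolled up three months at a time. As a sketch of that roll-up in Python, the code below assumes the month is stored as an English month name in a `Month` column. Both that and the `Rate` column are assumptions, and the rows are made up.

```python
from collections import defaultdict
from statistics import mean

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]
MONTH_NUMBER = {name: number for number, name in enumerate(MONTHS, start=1)}


def quarter_of(month_name):
    """Map a month name to its calendar quarter (1 to 4)."""
    return (MONTH_NUMBER[month_name] - 1) // 3 + 1


def mean_rate_by_quarter(rows):
    """Return {"Q1": mean, ...} for the quarters present in `rows`."""
    groups = defaultdict(list)
    for row in rows:
        groups[f"Q{quarter_of(row['Month'])}"].append(float(row["Rate"]))
    return {quarter: mean(values) for quarter, values in sorted(groups.items())}


# Made-up rows with assumed column names.
rows = [
    {"Month": "January", "Rate": "7.0"},
    {"Month": "March", "Rate": "6.0"},
    {"Month": "October", "Rate": "4.0"},
]
by_quarter = mean_rate_by_quarter(rows)
print(by_quarter)  # {'Q1': 6.5, 'Q4': 4.0}
```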

Unemployment by year

Fig 7: Line graph of mean unemployment by year

At the turn of the new decade, in 2000, unemployment rates in the US were at their lowest. This is attributed to an economic expansion in the US at that time (see reference): rapid GDP and wage growth from the early 1990s led to increased employment in those years. In 2008, however, the financial crisis sent unemployment rates spiking, as many businesses could not survive the crisis and people lost their jobs. As the economy picked up, unemployment rates began to fall again in 2011.
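Turning points like these can be located numerically once the yearly means are available, by finding the year with the lowest mean and the largest year-over-year increase. The sketch below uses made-up yearly means. They are not the values behind Fig 7, and the function name is illustrative.

```python
def year_over_year_change(yearly_means):
    """Return {year: change from the previous year} for consecutive years."""
    years = sorted(yearly_means)
    return {
        current: yearly_means[current] - yearly_means[previous]
        for previous, current in zip(years, years[1:])
        if current - previous == 1  # skip gaps in the series
    }


# Made-up yearly means, for illustration only.
yearly_means = {2001: 4.0, 2002: 5.0, 2003: 4.5, 2004: 7.0}

changes = year_over_year_change(yearly_means)
lowest_year = min(yearly_means, key=yearly_means.get)
largest_jump_year = max(changes, key=changes.get)

print(lowest_year)        # 2001
print(largest_jump_year)  # 2004
```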

Take note

It’s not enough to analyze the data; you should also seek explanations for the results you get. Look at what is happening in the real world and compare it to your own results.

Conclusion

In conclusion, we see that different types of interesting analysis and explanations can come from a simple dataset such as this. Go ahead and perform your own analysis and see what you find!
