Indicators of Deprivation

Indicators of Deprivation Part 1 (See the code)

In this piece of analysis, I try to determine if there are any variables which can predict the Deprivation score of an area in England. I downloaded loads of data from the Office for National Statistics, and linked all the datasets together using the geography codes.
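
As a rough sketch of what linking on geography codes can look like in Pandas (this is not the code from the repository; the column name `lad_code`, the codes and all the numbers are made up for illustration):

```python
import pandas as pd

# Hypothetical toy tables: "lad_code" and every value here are invented.
deprivation = pd.DataFrame({
    "lad_code": ["LAD_A", "LAD_B", "LAD_C"],
    "deprivation_score": [35.0, 40.2, 29.1],
})
density = pd.DataFrame({
    "lad_code": ["LAD_A", "LAD_B", "LAD_D"],
    "population_density": [9.8, 25.9, 5.5],
})

# An inner join keeps only the areas that appear in both tables.
linked = deprivation.merge(density, on="lad_code", how="inner")

# An outer join with indicator=True adds a "_merge" column, which makes it
# easy to list the codes that failed to match on one side or the other.
check = deprivation.merge(density, on="lad_code", how="outer", indicator=True)
unmatched = check.loc[check["_merge"] != "both", "lad_code"].tolist()

print(linked)
print("Codes without a match:", unmatched)
```

Checking the unmatched codes before carrying on is worth the extra line, because an inner join drops mismatched areas without any warning.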

The variables of interest in this analysis are:

  • Number of people in broad age categories (children, adults and pensioners)
  • Number of people living in a communal establishment and number of communal establishments (used here as a rough proxy for homelessness)
  • Number of people in broad ethnicity categories (7 ethnicity categories + 1 ‘Other’)
  • Number of people in broad religion categories (Christianity, Buddhism, Hinduism, Islam, Judaism and Sikhism + ‘Other Religion’, ‘No Religion’ and ‘Religion not stated’)
  • Population density

The first part of this analysis deals with how to wrangle and manipulate the data into a format which can be easily analysed. I had to overcome a lot of problems when loading the data in from different sources (.csv, .xls etc.), and the solutions I found will hopefully be applicable to any analyses which you do! Please take and reuse my code as you see fit – you can fork it from my GitHub.

Some of the problems which I faced when wrangling this data:

  • Duplicated records for each Local Authority District (LAD). Each LAD is made up of several smaller areas – some data sources only had information for the smaller areas, so I had to take the average of these smaller areas to get a score for the LAD using the Pandas groupby function.
  • Combining and summing columns. The age data came in 100 separate columns; far too many to run through a regression! I had to combine the columns into three broad categories, summing the number of people of each age in the particular range.
  • Stripping out rows and columns with missing data. The Communal Living data (in particular) came in a very human-readable format which is unfortunately difficult to read in as a DataFrame. I used the Pandas drop and dropna functions an awful lot!
  • .csv encoding. Stack Overflow provided a great answer which I used to read in a .csv file using a different encoding.
  • Changing text to numbers and removing commas as thousand separators. It took me ages to realise what the problem was, and only slightly less time to solve it!
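
To give a flavour of these fixes, here is a minimal sketch on a tiny made-up table (not the code from the repository). The column names and values are hypothetical, and summing the head counts per LAD while averaging the score is my assumption about a sensible way to aggregate:

```python
import pandas as pd

# Hypothetical extract: two small areas for LAD_A, one for LAD_B, counts
# stored as text with thousand separators, plus one completely empty row.
df = pd.DataFrame({
    "lad_code": ["LAD_A", "LAD_A", None, "LAD_B"],
    "age_0": ["1,200", "800", None, "2,000"],
    "age_1": ["1,100", "700", None, "1,500"],
    "age_2": ["900", "600", None, "1,000"],
    "score": [10.0, 20.0, None, 30.0],
})

# Strip out rows where every value is missing.
df = df.dropna(how="all")

# Remove the commas, then convert the text to numbers.
age_cols = ["age_0", "age_1", "age_2"]
for col in age_cols:
    df[col] = pd.to_numeric(df[col].str.replace(",", "", regex=False))

# Combine the single-year columns into one broad category and drop them.
df["children"] = df[age_cols].sum(axis=1)
df = df.drop(columns=age_cols)

# One row per LAD: average the score, sum the head counts (my assumption).
lad = df.groupby("lad_code", as_index=False).agg(
    score=("score", "mean"),
    children=("children", "sum"),
)
print(lad)

# For the encoding problem, pd.read_csv accepts an encoding argument, e.g.
# pd.read_csv(path, encoding="latin-1"). Which encoding a given file needs
# depends on the file, so "latin-1" is only an example.
```

Doing the text-to-number conversion before any summing matters: a column that still holds strings will not add up the way you expect.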

The second part of the analysis looks at how to actually analyse the data. I use the Sklearn, Statsmodels and SciPy libraries to do a multiple regression on the variables. Because of the large number of variables that could potentially be included, I wrote a function which runs the multiple regression on every combination of variables and ranks each combination based on several statistics for that model (R^2, p-value etc.).
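
The idea of trying every combination can be sketched roughly as follows. This is not the function from Part 2: to keep the example self-contained it uses plain NumPy least squares instead of Statsmodels, ranks only by adjusted R^2 (no p-values), and runs on synthetic data. The name `rank_subsets` is hypothetical.

```python
from itertools import combinations

import numpy as np


def rank_subsets(X, y, names):
    """Fit ordinary least squares on every non-empty subset of the columns
    of X and return (variables, R^2, adjusted R^2) sorted by adjusted R^2."""
    n = len(y)
    ss_total = np.sum((y - y.mean()) ** 2)
    results = []
    for k in range(1, len(names) + 1):
        for idx in combinations(range(len(names)), k):
            # Design matrix: an intercept column plus the chosen variables.
            A = np.column_stack([np.ones(n), X[:, list(idx)]])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            residuals = y - A @ coef
            r2 = 1.0 - np.sum(residuals ** 2) / ss_total
            # Adjusted R^2 penalises models for each extra variable.
            adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
            results.append((tuple(names[i] for i in idx), r2, adj_r2))
    return sorted(results, key=lambda r: r[2], reverse=True)


# Synthetic data: y depends on "a" and "b" but not on "noise".
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

ranking = rank_subsets(X, y, ["a", "b", "noise"])
for variables, r2, adj_r2 in ranking:
    print(variables, round(r2, 4), round(adj_r2, 4))
```

With p candidate variables there are 2^p - 1 non-empty combinations, so this brute-force approach is only practical for a fairly small number of variables.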

WARNING: This analysis gets pretty statisticky at times; I’ve tried to explain my processes and reasoning as clearly as possible, but let me know if you have any trouble understanding!

To do the data manipulation I used the Pandas library – this is the standard library for doing data analysis in Python; hopefully this tutorial will help you to learn how to use it!

The code for Part 1 is available here, and if you want to read ahead you can find the introduction for Part 2 here and the code here. I try to explain my analytical methods and thinking in plain English, and I really take the time to explain what each bit of my code does, but if there’s anything that you don’t understand, send me an email or ask a question in the comments.

Follow me on Twitter, GitHub and Plotly, add me on LinkedIn and visit my Website.
