In this analysis, I look at how different measures of deprivation (Education, Income, Employment and Crime) contribute to the overall Deprivation Score of small areas in England (Lower layer Super Output Areas, or LSOAs (1)), as used in a government report which examined deprivation in England in 2010. I also look at differences within and between the scores of different regions in England (Government Office Regions, or GORs (2)). The data and supporting documents are available here, along with several caveats concerning their use.
As you would expect from a government analysis, the method used to calculate the scores for the Indices of Deprivation is clever, although very tedious to follow. The important point is that the higher the score for each indicator, the more deprivation of that type is present in the area. This is slightly counter-intuitive for positively themed indicators such as Income, where a higher number is traditionally associated with a better outcome. For example, an area with a Crime Deprivation Score of 1.5 would rank as more deprived than an area with a Crime Deprivation Score of -0.7.
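To make the direction of the scale concrete, here is a minimal sketch in plain Python that ranks areas from most to least deprived. The area labels are hypothetical. Only the two Crime scores come from the example above. The only point it makes is that sorting by score in descending order puts the most deprived area first.

```python
# Hypothetical area labels; the two scores are the example values from the text.
crime_scores = {"Area A": 1.5, "Area B": -0.7}

def rank_most_deprived_first(scores):
    """Return area names ordered from most to least deprived.

    In the Indices of Deprivation a higher score means more deprivation,
    so areas are sorted by score in descending order.
    """
    return sorted(scores, key=scores.get, reverse=True)

ranking = rank_most_deprived_first(crime_scores)
print(ranking)  # ['Area A', 'Area B']
```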
I only used the overall Deprivation, Crime, Education, Employment and Income scores, in order to keep my first analysis at a manageable size. I treat Deprivation as the response variable and Education, Crime, Employment and Income as the explanatory variables. The descriptive statistics for the Score columns are shown below:
| | Deprivation Score | Crime Score | Education Score | Employment Score | Income Score |
|---|---|---|---|---|---|
| mean | 2.093 × 10^-10 | -8.189 × 10^-10 | 21.691 | 0.101 | 0.147 |
This table gives an idea of how the measures differ in the range and spread of their values. Deprivation and Crime appear to take plausible values in the range -4 to +4, whereas Education, Employment and Income all take positive values. Employment and Income look to be on a scale of 0 to 1, whilst Education appears to contain values from 0 to 100.
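If you want to reproduce a table like the one above, here is a minimal sketch using pandas' `DataFrame.describe()`. The column names and the handful of values are invented for illustration. They are not the real LSOA data, and the headers in the real file may differ.

```python
import pandas as pd

# Toy stand-in for the LSOA-level data: column names and values are hypothetical.
scores = pd.DataFrame({
    "deprivation_score": [-1.2, 0.3, 2.1, -0.5, -0.7],
    "crime_score": [0.4, -1.1, 1.5, -0.2, -0.6],
    "education_score": [5.0, 12.5, 48.0, 20.1, 22.9],
    "employment_score": [0.04, 0.08, 0.21, 0.10, 0.07],
    "income_score": [0.05, 0.12, 0.33, 0.14, 0.09],
})

# describe() reports count, mean, std, min, quartiles and max for each column.
summary = scores.describe()
print(summary.loc[["mean", "min", "max"]].round(3))
```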
So, after getting an idea of the shape of the variables, my next step is to look at how they relate to each other. I plotted the Score variables against each other using a Seaborn pairplot to get an initial idea of any correlations:
This plot shows a scatterplot of each score variable against every other variable to give an idea of correlation. The diagonal shows a histogram of each variable, displaying the distribution hinted at in the summary statistics. Deprivation and Crime look to be roughly normally distributed, whereas Education, Employment and Income are right-skewed, with shapes resembling an exponential distribution.
Unsurprisingly, Income and Employment correlate strongly, and there is fairly clear evidence of correlation between Deprivation Score and each of the other explanatory variables. Looking at these correlations in more detail may tell us more about these relationships. There is a great deal of overplotting here. With over 32,000 observations this isn't surprising, but a plot that shows how densely the points are packed would definitely help.
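The pairplot is a visual check. The same pairwise relationships can also be summarised numerically with pandas' `DataFrame.corr()`, which computes Pearson coefficients by default. The sketch below uses synthetic data with a deliberately built-in Income/Employment link, not the real scores, so the numbers it prints say nothing about England. It only demonstrates the mechanics.

```python
import numpy as np
import pandas as pd

# Synthetic scores for illustration only; the relationships are built in by hand.
rng = np.random.default_rng(0)
n = 1_000
employment = rng.exponential(scale=0.1, size=n)
income = 1.2 * employment + rng.normal(0, 0.01, size=n)  # deliberately tied to employment
crime = rng.normal(0, 1, size=n)
deprivation = 5 * employment + 0.3 * crime + rng.normal(0, 0.2, size=n)

scores = pd.DataFrame({
    "deprivation": deprivation,
    "crime": crime,
    "employment": employment,
    "income": income,
})

# Pairwise Pearson correlations: the numeric counterpart of the pairplot's scatterplots.
corr = scores.corr()
print(corr.round(2))

# Explanatory variables ranked by their correlation with the response.
print(corr["deprivation"].drop("deprivation").sort_values(ascending=False))
```

On the real data, `seaborn.pairplot(scores)` draws the grid of scatterplots described above.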
The plots below (Seaborn jointplots with kind='hex', for those interested) give a better idea of how densely packed the observations are. They also report the strength of the correlation between each explanatory variable and Deprivation Score using the Pearson correlation coefficient ('pearsonr'). A coefficient of 1 means that two variables are perfectly positively (linearly) correlated, 0 means there is no linear correlation, and -1 means a perfect negative correlation.
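For anyone curious about what the coefficient actually computes, here is a from-scratch sketch of Pearson's r in plain Python. `scipy.stats.pearsonr` computes the same quantity, plus a p-value. The toy inputs show the two extremes. The last case is a reminder that r measures *linear* association: a clearly curved but steadily increasing relationship can still score highly.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances share the same 1/n factor, so it cancels out.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))   # 1.0 (perfectly positive)
print(pearson_r(x, [10, 8, 6, 4, 2]))   # -1.0 (perfectly negative)
print(pearson_r(x, [1, 4, 9, 16, 25]))  # about 0.98 (curved, but still high)
```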
This plot of Crime Deprivation Score against Deprivation Score shows a relatively weak correlation between the two variables, although the relationship looks to be adequately modelled by a straight line. The darker the colour of a hexagon, the more observations fall within it. The very darkest hexagons, for example, each contain 54 observations.
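A hex plot colours each hexagon by the number of observations that fall inside it. "The darkest point has 54 observations" is therefore just the largest bin count. The sketch below shows the same idea on a square grid instead of hexagons. I swapped in this simplification to keep the arithmetic obvious. The points and the bin width are arbitrary.

```python
import math
from collections import Counter

def bin_counts(xs, ys, bin_width):
    """Count how many (x, y) points fall in each square bin of the given width."""
    counts = Counter()
    for x, y in zip(xs, ys):
        cell = (math.floor(x / bin_width), math.floor(y / bin_width))
        counts[cell] += 1
    return counts

# Arbitrary example points: three sit close together near the origin.
xs = [0.1, 0.2, 0.15, 1.3, 1.4, 2.9]
ys = [0.1, 0.3, 0.25, 1.1, 1.2, 0.4]
counts = bin_counts(xs, ys, bin_width=0.5)

# The densest cell is what a hex plot would draw in the darkest colour.
densest_cell, densest_count = counts.most_common(1)[0]
print(densest_cell, densest_count)  # (0, 0) 3
```

Matplotlib's `Axes.hexbin`, which the hex jointplot builds on, does the hexagonal version of this counting.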
Plotting Education Deprivation Score against Deprivation Score gives a slightly different picture. The correlation is slightly stronger, at 0.69, and the shape of the relationship is markedly different. It appears slightly curvilinear, although this could be down to the distributions of the two variables (and a straight line may still fit better, but that is a story for another day).
The relationship between Income Deprivation Score and Deprivation Score is similar in shape to that between Education Deprivation Score and Deprivation Score, although the points are clustered together more tightly. This is reflected in the higher Pearson coefficient of 0.79. It's also worth noting that the density of overplotting has increased with each plot so far. The very darkest hexagon now represents 105 observations, as opposed to only 54 in the first of these plots. It will not surprise you that this trend continues in the next plot, alongside an even stronger correlation.
The last of the four plots shows the strongest correlation yet, with a Pearson coefficient of 0.86. Whereas the previous two plots only suggested that a curvilinear relationship might be appropriate, here it is clear that a straight line would not model the relationship between Employment Deprivation Score and Deprivation Score well.
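Whether a curve really beats a straight line can be checked rather than eyeballed. One simple approach, which is my suggestion and not something done in this analysis, is to fit both a straight line and a quadratic with NumPy's `polyfit` and compare their sums of squared residuals. The data below are synthetic and generated with a deliberate curve.

```python
import numpy as np

# Synthetic data with a gently curved relationship (not the real scores).
rng = np.random.default_rng(42)
x = rng.uniform(0, 1, 500)
y = 2.0 * x + 3.0 * x**2 + rng.normal(0, 0.1, 500)

def sum_squared_residuals(x, y, degree):
    """Fit a polynomial of the given degree by least squares and return its SSR."""
    coeffs = np.polyfit(x, y, degree)
    fitted = np.polyval(coeffs, x)
    return float(np.sum((y - fitted) ** 2))

linear_ssr = sum_squared_residuals(x, y, 1)
quadratic_ssr = sum_squared_residuals(x, y, 2)
print(f"linear SSR: {linear_ssr:.2f}, quadratic SSR: {quadratic_ssr:.2f}")
```

A quadratic contains the straight line as a special case, so on the same data its residuals can never be larger. The real question is whether the improvement is large enough to justify the extra term. Measures such as adjusted R² or cross-validation help answer that.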
So, unsurprisingly, the measures used to determine the Deprivation Score correlate with the Deprivation Score itself; nothing groundbreaking here. The strength of correlation differs between the variables, and based on correlation alone, the Employment Deprivation Score looks like the best predictor of Deprivation Score so far. I'll leave this line of thinking for now and instead look at the aforementioned regions and how the variables differ within and between them.
This plot has a lot of information in it, so it's worth explaining clearly. Each of the rows (colour-coded) corresponds to a different region (GOR), whose name sits at the left-hand side of the plot. Each of the columns shows the distribution of one of the five variables, allowing us to compare it between regions. Where each row and column intersect, a box plot summarises that variable for that region. The coloured box shows where the middle 50% of the points lie, with the black line inside it marking the median value. The 'whiskers' extend from each end of the box to the most extreme point lying within 1.5 times the interquartile range (the length of the coloured box) of that end. Points beyond the whiskers are, by convention, marked as outliers.
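The whisker rule is easier to see in code. Here is a minimal sketch in plain Python, using invented values, that computes the quartiles, the 1.5 × IQR fences, where the whiskers end and which points count as outliers. I have used `statistics.quantiles` with `method="inclusive"`, which I believe matches NumPy's default linear interpolation. Plotting libraries can differ in details like this, so treat the sketch as an illustration of the convention rather than an exact reproduction of Seaborn's output.

```python
import statistics

def box_summary(values, whis=1.5):
    """Quartiles, whisker ends and outliers as drawn in a standard box plot."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - whis * iqr
    upper_fence = q3 + whis * iqr
    # Whiskers stop at the most extreme observations still inside the fences.
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": outliers,
    }

values = [1, 2, 3, 4, 5, 6, 7, 8, 30]
summary = box_summary(values)
print(summary)
```

With these values the box runs from 3 to 7, the whiskers stop at 1 and 8, and 30 is flagged as an outlier.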
A quick comparison across the plot shows that the North East has the highest Deprivation Scores, with the South East and East of England having the lowest. All of the GORs performed fairly similarly on Crime Deprivation Score, although London was noticeably worse, and the North West had the largest difference between the worst and best LSOAs in the region. As for Education Deprivation Score, areas in London seem to have had consistently better (lower) scores. The Employment Deprivation Score also varied between regions, although not as much as the Education Deprivation Score, with the Income Deprivation Score falling somewhere between the two. I will now look more closely at each variable in turn to try to tease out more insight from a less cluttered plot.
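Each box in the region plot is a summary that can also be computed directly. The sketch below uses pandas `groupby` to get each region's quartiles and median and then sorts the regions by median. This is one way to check a statement such as "the North East has the highest Deprivation Scores". The region labels, column names and scores are invented placeholders, not the real GOR values.

```python
import pandas as pd

# Hypothetical long-format data: one row per LSOA with its region and score.
lsoas = pd.DataFrame({
    "region": ["Region A"] * 4 + ["Region B"] * 4 + ["Region C"] * 4,
    "deprivation_score": [1.2, 0.8, 2.5, 1.9,
                          -0.6, -1.1, 0.2, -0.9,
                          0.1, 0.4, -0.2, 3.0],
})

# Median and quartiles per region: the numbers behind each box in the plot.
by_region = lsoas.groupby("region")["deprivation_score"]
summary = pd.DataFrame({
    "q1": by_region.quantile(0.25),
    "median": by_region.median(),
    "q3": by_region.quantile(0.75),
}).sort_values("median", ascending=False)
print(summary)
```

On real data in this long format, something like `seaborn.boxplot(data=lsoas, x="deprivation_score", y="region")` draws one box per region from the same table.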
Looking at the Crime Deprivation Score, it's clear that London has the highest median, and also one of the narrowest ranges of scores (the distance between the ends of the whiskers). The North West, by comparison, has a much larger range of scores and in fact also contains many of the areas with the highest Crime Deprivation Scores. London aside, there is also a clear North-South divide in the Crime Deprivation Scores. The bottom three plots (South West, East of England and South East) all have a median far below that of the majority of the northern regions. However, it is interesting to note that each region has outliers at either end of the scale, suggesting that this inequality exists not only between, but also within regions.
London has the best performance for Education Deprivation Score, with the lowest median and a range of scores noticeably lower than all the other regions. Each region has areas that score around 0 for Education Deprivation Score, but it is at the middle and upper end that we see the most distinction between regions. The northern regions score consistently higher for Education Deprivation, with larger medians and lower and upper quartiles than the southern regions. In contrast, the southern regions all had a lower median Education Deprivation Score and a small interquartile range, as well as a large number of outliers at the higher end of the scale. This polarisation within the southern regions suggests some inequality in the delivery of education within these regions.
Although the location and spread differ from region to region, there are no substantial differences between regions in Income Deprivation Score. The northern regions seem to do a little worse, and the huge outliers in the North West and North East each deserve a closer look. However, it is when the Employment plot is compared with the Income plot that things become more interesting.
When comparing Income Deprivation Score and Employment Deprivation Score, London is once again the outlier. It has one of the highest median Income Deprivation Scores, with a range which covers many of the higher scores. Coupled with its low Employment Deprivation Score, this suggests that although there are many jobs on offer in London, income deprivation is still prevalent. As with the Education Deprivation Score, the plots for the southern regions hint at inequality within the regions for Income Deprivation Score, whilst there is a lot more variation in the performance of the northern regions. The most income-deprived areas are concentrated in the North and the Midlands.
Now, looking at Deprivation Score, it seems that the North-South divide remains. The North West and North East each have a median Deprivation Score far above every other region, and they also contain the areas with the highest Deprivation Scores. Yorkshire, however, despite its geographical proximity to the North West and North East, has a Deprivation Score profile similar to that of a Midlands region. The southern regions all have median Deprivation Scores well below 0, although they do have many outliers at the top end of the scale. As we've come to expect, London is distinct from the other regions: it contains some of the lowest Deprivation Scores, as shown by the outliers at the lower end of the scale.
If it interests you, the code I used to create this analysis is available here.
Thank you for reading. I hope to write one of these analyses each month, and I would definitely appreciate any feedback you can give. Did you spot any glaring holes in my analysis? Do you think the plots are ugly, or my explanations and interpretations terrible? Tell me!
- Lower layer Super Output Areas are 32,482 separate small areas in England, each containing roughly the same number of people.
- Government Office Regions are the ‘primary statistical subdivisions of England’ (https://data.gov.uk/dataset/government-office-regions-eng-2010-names-and-codes), and for the purposes of this analysis they divide the data into enough chunks to be interesting, but not so many as to be unwieldy.