The “cluster of six”

Unsupervised machine learning

Hansard reports what’s said in the UK Parliament, sets out details of divisions, and records decisions taken during a sitting. The hansard R package provides functions to import its data.

Using the Hansard API (Application Programming Interface), we’ll apply unsupervised machine learning to analyze the voting patterns of 219 Labour Members of Parliament (MPs). We’ll consider all divisions (results of the votes) in the UK House of Commons since the 2017 general election.

Supervised machine learning learns from labeled training data in order to make predictions. The unsupervised flavour looks for hidden structure in “unlabeled” data, i.e. data whose observations carry no classification or categorisation. Here, hierarchical clustering will identify a cluster of six MPs as the most “distant” from the wider party.

The full methodology, including the code, is published here. This extended narrative confirms the suitability of the data for clustering; reviews eight clustering methods for optimal fit; plots the full dendrogram of 219 Labour MPs; and rationalises the outcome in more detail, for example, using Cook’s Distance.
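One common way to compare clustering methods for fit is the cophenetic correlation, which measures how faithfully a dendrogram preserves the original pairwise distances. The sketch below is illustrative only: the data is randomly generated, the MP and division names are hypothetical, and the eight methods shown are simply those offered by `stats::hclust()`, which may or may not be the eight reviewed in the full write-up.

```r
# Toy data: 12 hypothetical MPs x 8 hypothetical divisions, coded -1, 0 or 1
set.seed(42)
votes <- matrix(
  sample(c(-1, 0, 1), size = 12 * 8, replace = TRUE),
  nrow = 12,
  dimnames = list(paste0("MP_", 1:12), paste0("div_", 1:8))
)

# Euclidean distance between each pair of MPs
d <- dist(votes)

# Agglomeration methods supported by stats::hclust()
methods <- c("ward.D", "ward.D2", "single", "complete",
             "average", "mcquitty", "median", "centroid")

# Cophenetic correlation per method: correlation between the original
# distances and the distances implied by the dendrogram (closer to 1 is better)
coph_cor <- sapply(methods, function(m) {
  cor(d, cophenetic(hclust(d, method = m)))
})

sort(coph_cor, decreasing = TRUE)
```

The method at the top of the sorted vector is the one whose tree distorts the original distances least on this toy data.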

We’ll set a vote of “aye” to 1, and “no” to -1. And we’ll treat non-votes as 0. Voting the opposite way to the majority of the party, as well as non-votes, will be of interest when assessing which MPs are “most distant” from the majority.
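As a minimal sketch of this encoding, the base R snippet below turns a few made-up vote records into an MP-by-division matrix. The data frame, its column names and the MP labels are assumptions for illustration; the published code works with the tidyverse functions listed in the toolkit rather than this base R approach.

```r
# Hypothetical raw records: one row per MP per division in which they voted
raw <- data.frame(
  mp       = c("MP_A", "MP_A", "MP_B", "MP_C"),
  division = c("div_1", "div_2", "div_1", "div_2"),
  vote     = c("aye", "no", "no", "aye"),
  stringsAsFactors = FALSE
)

# Map recorded votes to numbers: aye -> 1, no -> -1
raw$score <- ifelse(raw$vote == "aye", 1, -1)

# Start from a matrix of zeros, so any MP/division pair
# without a record is treated as a non-vote (0)
mps <- sort(unique(raw$mp))
divisions <- sort(unique(raw$division))
votes <- matrix(0, nrow = length(mps), ncol = length(divisions),
                dimnames = list(mps, divisions))

# Fill in the recorded votes using (row name, column name) index pairs
votes[cbind(raw$mp, raw$division)] <- raw$score
votes
```

Each row of the resulting matrix is one MP's voting profile, which is the shape a distance calculation expects.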

The “cluster of six”

We’ll apply a “bottom up” (agglomerative) clustering approach. Each MP starts in their own cluster, and the closest pairs of clusters are progressively combined until a single cluster containing every MP remains.

What we find is a cluster of six MPs who, based on their voting patterns, are the last to merge with the wider party.
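The bottom-up process can be sketched with `stats::hclust()` on a small, made-up vote matrix. The eight MPs, their votes and the choice of average linkage are all assumptions for illustration, not the data or method used in the analysis.

```r
# Toy data: eight hypothetical MPs, six hypothetical divisions
party_line <- c(1, 1, -1, 1, -1, 1)
votes <- rbind(
  MP_1 = party_line,
  MP_2 = party_line,
  MP_3 = party_line,
  MP_4 = c(1, 1, -1, 1, -1, 0),   # one non-vote
  MP_5 = c(1, 1, -1, 1, 0, 1),    # one non-vote
  MP_6 = c(0, 0, 0, 0, -1, 0),    # mostly absent
  MP_7 = c(0, -1, 0, 0, 0, 0),    # mostly absent, one vote against the line
  MP_8 = c(-1, 0, 0, 0, 0, -1)    # absent or voting against the line
)

# Agglomerative clustering: every MP starts alone and the closest
# clusters are merged step by step (average linkage chosen arbitrarily)
hc <- hclust(dist(votes), method = "average")

# n MPs need n - 1 merges to end up in a single cluster
nrow(hc$merge)

# Undo the final merge to see the two groups that combined last
clusters <- cutree(hc, k = 2)
split(names(clusters), clusters)
```

On this toy data the three frequently absent MPs form the group that joins the rest of the party at the very last merge; `plot(hc)` would draw the corresponding dendrogram.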

Does this cluster make sense?

Can we rationalize why machine learning has isolated this cluster? If we inspect the ten MPs recording the fewest votes since June 8th, 2017, the list includes all six.
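With votes coded as 1, -1 and 0, a check like this amounts to counting the non-zero entries per MP. A minimal sketch on a hypothetical four-MP matrix (the names and values are invented):

```r
# Hypothetical vote matrix: rows are MPs, columns are divisions, 0 = non-vote
votes <- rbind(
  MP_A = c(1, -1, 1, 1),
  MP_B = c(0, 0, 1, 0),
  MP_C = c(1, 0, 1, -1),
  MP_D = c(0, 0, 0, 0)
)

# Number of divisions in which each MP recorded a vote
votes_cast <- rowSums(votes != 0)

# Order from fewest to most votes recorded and keep the bottom two
fewest <- head(sort(votes_cast), 2)
fewest
```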

Nonetheless, non-voting will not be the only influencing factor. A small minority of MPs voting the opposite way to the overwhelming majority will influence the “distant cluster”.

Cook’s Distance measures how strongly each observation influences a fitted regression model, so plotting it highlights these influential outliers. It shows the voting of three MPs, all on the European Union Withdrawal Bill readings, to be particular outliers. All three MPs are in the “cluster of six”.
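As a rough sketch of the idea, the snippet below fits a simple linear model to made-up vote records and uses `stats::cooks.distance()` to find the most influential one. The model formula, the data and the MP names are assumptions chosen for illustration; the published code may set up the regression differently.

```r
# Hypothetical records: 10 MPs voting in 3 divisions, all "aye" (1) ...
records <- expand.grid(
  mp = paste0("MP_", 1:10),
  division = c("div_1", "div_2", "div_3"),
  stringsAsFactors = FALSE
)
records$vote <- 1

# ... except one MP who votes "no" (-1) against the majority in div_1
records$vote[records$mp == "MP_7" & records$division == "div_1"] <- -1

# Model each vote by its division, so the fitted value is
# the average vote in that division
fit <- lm(vote ~ division, data = records)

# Cook's distance: how much the fitted model changes
# when each record is left out
records$cooks_d <- cooks.distance(fit)

# The most influential record
top <- records[which.max(records$cooks_d), c("mp", "division", "vote")]
top
```

The lone vote against an otherwise unanimous division stands out with by far the largest Cook's distance, which is the kind of observation the article describes as an influential outlier.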


So, in summary, we established that the data is suitable for hierarchical clustering and selected the clustering method that best fits the data. We identified a “cluster of six” MPs who are the last to combine with the rest of the party. And, by inspecting the non-votes and most influential outliers, we can rationalize why unsupervised machine learning merged these MPs last.

R toolkit

purrr: map[4]; map_dfr[2]; possibly[2]; set_names[2]; compact[1]; negate[1]; reduce[1]
furrr: future_map[1]; future_map2_dfr[1]
future: multiprocess[1]; plan[1]
hansard: commons_members[1]; mp_vote_record[1]
dplyr: mutate[10]; filter[6]; if_else[6]; as_tibble[5]; select[4]; summarise[4]; tibble[3]; arrange[2]; group_by[2]; left_join[2]; desc[1]; everything[1]; n[1]; rename[1]; top_n[1]; ungroup[1]
tidyr: gather[3]; spread[2]; unnest[1]
stringr: str_c[6]; str_detect[4]; str_replace[2]; fixed[1]; str_count[1]; str_remove[1]; str_remove_all[1]
rebus: literal[4]; lookahead[3]; whole_word[2]; ALPHA[1]; lookbehind[1]; one_or_more[1]; or[1]
lubridate: day[1]; month[1]; year[1]
stats: sd[4]; hclust[3]; cophenetic[1]; cor[1]; dist[1]; lm[1]; reorder[1]
base: library[14]; c[5]; function[4]; mean[4]; Sys.Date[3]; list[2]; max[2]; min[2]; rep[2]; rev[2]; conflicts[1]; cumsum[1]; cut[1]; labels[1]; nrow[1]; round[1]; scale[1]; search[1]; sum[1]
ggplot2: element_blank[12]; aes[6]; ggplot[5]; element_text[4]; ggtitle[4]; coord_flip[3]; theme[3]; geom_jitter[2]; labs[2]; scale_colour_manual[2]; element_line[1]; geom_col[1]; geom_text[1]; theme_set[1]; theme_update[1]
ggthemes: economist_pal[9]; theme_economist[4]
factoextra: fviz_dist[1]; get_clust_tendency[1]
dendextend: assign_values_to_leaves_nodePar[3]; set[3]; color_branches[2]; cor.dendlist[1]; dendlist[1]
formattable: color_bar[3]; formattable[1]
kableExtra: kable[2]; kable_styling[2]

View the code here.

2 Replies to “The “cluster of six””

  1. I like the “R Toolkit” image. Did you create this manually, or is there a function to detail the functions used from different packages? I had always thought it would be useful for identifying packages that are no longer required once you swap a function out for an alternative.

    1. Thanks. I create the table container using the WordPress plugin TablePress. The icons are adapted from Font Awesome. The content, though, is manual; a function to do that would be good for the reason you describe. I like creating the table to see which functions I’ve introduced in an article that I haven’t used in any prior article.
