Criminal goings-on in a random forest

Supervised machine learning

In the “cluster of six”, we used unsupervised machine learning to reveal hidden structure in unlabelled data and analyse the voting patterns of Labour Members of Parliament. In this blog post, we’ll use supervised machine learning to see how well we can predict crime in London. We won’t predict specific crimes, but we can use recorded crime summary data at London borough level (non-personal aggregated data licensed under the Open Government Licence) to predict crime counts with some degree of accuracy.

Along the way, we’ll see the pay-off from an exploration of multiple models.

Continue reading “Criminal goings-on in a random forest”

The plots thicken

Every story needs a good plot

One could think of data science as “art, grounded in facts”. It tells a story through visualisation. Both story and visualisation rely on a good plot. And an abundance of those has evolved over time. Many have their own dedicated Wikipedia page!

Which generate the most interest? How is the interest in each trending over time? Try this app to find out.

Continue reading “The plots thicken”

The “cluster of six”

Unsupervised machine learning

Hansard reports what’s said in the UK Parliament, sets out details of divisions, and records decisions taken during a sitting. The hansard R package provides functions to import its data.

Using the Hansard API (Application Programming Interface), we’ll apply unsupervised machine learning to analyse the voting patterns of 219 Labour Members of Parliament (MPs). We’ll consider all divisions (results of the votes) in the UK House of Commons since the 2017 general election.

Continue reading “The “cluster of six””

SW10 digs deep

Responding to a weak property market

In December I looked at how recent events have shaped the property market in London SW10. If short-distance moves are off the table in the current climate, how are property owners responding? When sales are weak, are planning applications in the ascendancy? I applied data science techniques to Royal Borough of Kensington and Chelsea (RBKC) planning data to find out.

Continue reading “SW10 digs deep”

Surprising stories hide in seemingly mundane data

Drifting boat

Experimentation with geospatial mapping

Recently I experimented with geospatial mapping techniques in R. I looked at both static and interactive maps. Embedding a static map into a WordPress blog would be simple enough. An interactive map would require (for me) a new technique to retain the interactivity inside a blog post.

My website visitor log, combined with longitude and latitude data from MaxMind’s GeoLite2, offered a basis for analysis. Although less precise than the GeoIP2 database, GeoLite2 would be more than adequate for my purpose of getting to country and city level. I settled on the Leaflet package for visualisation given its interactivity and pleasing choice of aesthetics.

The results, however, were a little puzzling.

Continue reading “Surprising stories hide in seemingly mundane data”

House sales in London SW10 take a few punches

London SW10 Housing Market

The anatomy of SW10

Analyses of house sales often focus on the wider UK market. In this blog, we’ll take a deep dive into one of London’s more than 100 postcode districts. We’ll draw on 10,000 property transactions to see how key events have shaped the market. The object of our focus will be SW10, which forms part of the Royal Borough of Kensington and Chelsea.

We’ll start with the anatomy of SW10 per the chart below. Over 80% of property transactions were for leasehold flats or maisonettes. In contrast, detached freehold properties are a prized scarcity: only 40 of the circa 10,000 transactions over the past 20 years were for detached properties.

Continue reading “House sales in London SW10 take a few punches”

Do G-Cloud categories need a tweak?

G-Cloud Categories

Why take a deeper look at G-Cloud categories?

The last blog – “The key to unlocking services on G-Cloud” – touched briefly upon their overlap. And as the concept of G-Cloud categories was newly introduced in the current iteration (G9), it may be worth taking a deeper look at their impact in advance of the next.

So, in this blog, I want to explore the extent and effects of category overlap and see what insights may be drawn. For example, are some categories of less value than others? Could some suppliers gain an advantage, perhaps by aligning each service to many categories so buyers find them irrespective of their carefully crafted search criteria?

Continue reading “Do G-Cloud categories need a tweak?”

The key to unlocking services on G-Cloud

G-Cloud Keywords

The importance of keyword-rich descriptions

There are nearly 20,000 services on G-Cloud. Suppliers have strewn their services with G-Cloud keywords designed to grab the attention of buyers. So what should buyers search for, and how does that vary by cloud service category?

Only selected parts of the suppliers’ content are indexed for searching: the service title, a 50-word summary, and bulleted features and benefits. So suppliers must cram in thoughtful keyword-rich phrases to optimise their chances of success.
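As a rough illustration of what counting keywords across those indexed fields might look like, here is a minimal sketch. It is written in Python rather than R purely for illustration, and it is not the method used in the full post. The field names (`title`, `summary`, `features`, `benefits`), the stop-word list and the two example records are all assumptions made up for this sketch, not real G-Cloud data.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "with"}


def keyword_counts(services):
    """Count keywords across the searchable fields of each service."""
    counts = Counter()
    for service in services:
        # Only the title, summary, features and benefits are treated as indexed.
        indexed_text = " ".join(
            [service["title"], service["summary"],
             *service["features"], *service["benefits"]]
        )
        # Lower-case the text and pull out simple word tokens.
        words = re.findall(r"[a-z][a-z0-9-]*", indexed_text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts


# Made-up records for illustration only (assumed structure, not real listings).
example_services = [
    {
        "title": "Secure Cloud Hosting",
        "summary": "Scalable hosting for public sector workloads",
        "features": ["Secure hosting in UK data centres"],
        "benefits": ["Reduce hosting costs"],
    },
    {
        "title": "Managed Backup",
        "summary": "Secure backup of cloud data",
        "features": ["Automated daily backup"],
        "benefits": ["Recover data quickly"],
    },
]

counts = keyword_counts(example_services)
top_keywords = counts.most_common(3)
```

Running the same count separately for each category would give the per-category keyword rankings that can then be compared side by side.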

In this blog, I want to compare and contrast the most frequent keywords used by suppliers. I’ve selected four categories from the Cloud Hosting lot for this purpose:

Continue reading “The key to unlocking services on G-Cloud”

Could G-Cloud pricing be simplified?

Pricing on G-Cloud

Background to G-Cloud pricing

The Digital Marketplace is helping those transforming public services by making it simpler, clearer and faster for them to buy what they need. G-Cloud focuses on cloud-based services. Since its launch in 2012, it has evolved through multiple iterations, with the current version being G-Cloud 9.

So, the introduction of a set of categories in G-Cloud 9 provided a natural step forward. These offered a level of granularity below the three lots of Cloud Hosting, Software and Support. As a result, buyers can find and compare groups of suitable products more easily.

Yet there is plenty of opportunity to further simplify the buyer’s task in future G-Cloud iterations. Price comparison is one example.

Continue reading “Could G-Cloud pricing be simplified?”

Does G-Cloud 9 provide too much choice?

Buyer Choice on G-Cloud

The risk of choice overload

In his book The Paradox of Choice, American psychologist and professor of social theory Barry Schwartz argued that choice overload can, in the long run, lead to decision-making paralysis.

The launch of the G-Cloud framework opened up the market to SMEs via the Digital Marketplace. And with successive iterations of G-Cloud came lots of choice.

Continue reading “Does G-Cloud 9 provide too much choice?”