The key to unlocking services on G-Cloud

The importance of keyword-rich descriptions

There are nearly 20,000 services on G-Cloud. Suppliers have strewn their services with G-Cloud keywords designed to grab the attention of buyers. So what should buyers search for, and how does that vary by cloud service category?

Only selected parts of the suppliers’ content are indexed for searching: the service title, a 50-word summary, and the bulleted features and benefits. So suppliers must pack thoughtful, keyword-rich phrases into those fields to optimise their chances of being found.

In this blog, I want to compare and contrast the most frequent keywords used by suppliers. I’ve selected four categories from the Cloud Hosting lot for this purpose:

  • Compute & Application Hosting (C&AH)
  • Object Storage
  • Infrastructure & Platform Security (I&PS)
  • Platform as a Service (PaaS)

Discarding distracting data

Services can belong to multiple categories as demonstrated in the Venn diagram below. For example, 53 (those at the heart of the plot) are aligned to all four categories. Comparing and contrasting the keywords for these would clearly be of little benefit. So I’m going to focus on those services around the periphery which are unique to each category, for example, the 323 for C&AH and so forth.

Venn diagram of G-Cloud services and how they align to 4 hosting categories
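As a minimal sketch of how the single-category services on the periphery could be isolated, assume a table with one row per service and category pairing. The `memberships` data, its column names and the service ids are invented for illustration and are not the real G-Cloud data.

```r
library(dplyr)

# Hypothetical membership table: one row per (service, category) pairing.
memberships <- tibble::tribble(
  ~service_id, ~category,
  "s1",        "C&AH",
  "s1",        "PaaS",
  "s2",        "C&AH",
  "s3",        "Object Storage",
  "s4",        "I&PS",
  "s4",        "C&AH",
  "s5",        "PaaS"
)

# Keep only the services aligned to exactly one category.
unique_services <- memberships %>%
  distinct(service_id, category) %>%
  group_by(service_id) %>%
  filter(n() == 1) %>%
  ungroup()

# Number of single-category services in each category.
unique_counts <- count(unique_services, category)
print(unique_counts)
```

Services s1 and s4 sit in an overlap, so only s2, s3 and s5 survive the filter.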

Having defined the scope, we now need to do a bit of cleaning. The words are converted to lower case so that we get a truer count of each distinct word. Common stop words, such as “and” and “the”, are removed. Words which are category-neutral, such as “cloud” and “service”, as well as the names of the suppliers or services themselves, are also weeded out. This cleaning will enable us to home in on service characteristics.
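The cleaning steps could be sketched along these lines with tidytext. The `services` text and the `neutral_words` list are made-up examples, and the `stop_words` lexicon bundled with tidytext is an assumption about which stop-word list to use.

```r
library(dplyr)
library(tidytext)

# Hypothetical indexed text for two services.
services <- tibble::tibble(
  category = c("C&AH", "PaaS"),
  text = c(
    "Scalable virtual servers hosted in the UK cloud",
    "Cloud platform Service with API and integration tooling"
  )
)

# Category-neutral words to weed out (illustrative, not the full list).
neutral_words <- tibble::tibble(word = c("cloud", "service"))

tidy_words <- services %>%
  unnest_tokens(word, text) %>%            # one word per row, lower-cased by default
  anti_join(stop_words, by = "word") %>%   # drop common stop words such as "and", "the"
  anti_join(neutral_words, by = "word")    # drop category-neutral words

print(tidy_words)
```

Supplier and service names could be removed in the same way, with a further `anti_join` against a table of those names.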

Visualisation of G-Cloud search terms

With that done, we could visualise the word frequency per category with a Word Cloud. The Compute & Application Hosting example below shows the most frequent words, where, for example, “uk”, “data”, “virtual”, “scale” and “security” figure prominently.

Word cloud of G-Cloud search terms for Compute & Application Hosting

However, whilst a word cloud is visually appealing, we need a better approach if we are to compare and contrast across categories. This facet-wrap plot shows the ten most frequent words in each category. The advantage here is that we can more easily see both common ground and points of distinction.

Top 10 G-Cloud search terms used by suppliers in 4 hosting categories
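A plot of this kind might be built roughly as follows. This is a sketch under assumptions: the `tidy_words` tokens are invented, and the counts bear no relation to the real figures.

```r
library(dplyr)
library(ggplot2)

# Hypothetical cleaned tokens, one word per row.
tidy_words <- tibble::tibble(
  category = c(rep("C&AH", 5), rep("PaaS", 5)),
  word = c("scale", "virtual", "scale", "security", "data",
           "api", "integration", "api", "security", "data")
)

# Count words within each category and keep the ten most frequent.
top_words <- tidy_words %>%
  count(category, word) %>%
  group_by(category) %>%
  slice_max(n, n = 10, with_ties = FALSE) %>%
  ungroup()

# One horizontal bar chart per category.
p <- ggplot(top_words, aes(x = reorder(word, n), y = n)) +
  geom_col() +
  coord_flip() +
  facet_wrap(~ category, scales = "free_y") +
  labs(x = NULL, y = "Frequency")
```

Note that `reorder()` orders the words across all panels at once. Ordering the bars within each panel needs something extra, such as tidytext's `reorder_within()` paired with `scale_x_reordered()`.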

“Security” and “data” are among the top keywords for all four categories. In contrast, “API” and “integration” are distinctively important for Platform as a Service (PaaS). Similarly, “scale” and “virtual[isation]” are distinctively important for Compute & Application Hosting (C&AH).

The takeaway

A more extensive analysis of this nature may help the G-Cloud team to identify inter-category dissimilarity and thus refine the service categorisation newly introduced in the ninth iteration of G-Cloud. It could also form the basis of guidance to buyers on the keywords to consider when preparing search terms for a given category.

R toolkit

Packages      Functions
purrr         map_df
rvest         read_html; html_nodes; html_text
dplyr         select; arrange; filter; count; mutate; if_else; anti_join
tidyr         separate
tidytext      unnest_tokens
stringr       str_replace; str_trim
tibble        tibble
lubridate     today
ggplot2       theme_set; geom_col; geom_text; coord_flip; facet_wrap
VennDiagram   venn.diagram; calculate.overlap
wordcloud2    wordcloud2
ggthemes      theme_economist

Citation

R Development Core Team (2008). R: A language and environment for
statistical computing. R Foundation for Statistical Computing,
Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org.

Contains public sector information licensed under the Open Government Licence v3.0.
