Posts in Commentary
Technology & the NHS
NHS data analysis sydney


As the UK election race hots up and the challenges of the NHS come back into the limelight, the NHSX have produced a report detailing the incorporation of AI into the service and what impacts this could have.

NHS technology data analysis

The NHS is of crucial concern to a great proportion of the UK economy, and hence the developments of technology which improve the services and increase its staff capabilities are of equally crucial importance. Figure 3 shows exactly where the use of AI will come into effect, with diagnosis of illness consisting almost 50% of the total usage, highlighting the advancements in image recognition machine learning and the standardised digital format of data inputs for a range of diagnoses.

NHS employment of AI years data analysis

Despite the upsides present, it must be noted that the role out of newer technologies won’t happen instantaneously. A quarter of the proposed products are ‘very likely’ to be introduced to the NHS in the next year, but it will take at least 5 years for the great majority to be incorporated. 

Some other facts from the report include: 

  • 19% of AI solutions are being developed on algorithmically generated datasets (machines gaining insights from processes and using these insights to make further decisions about products) 

  • 44% of all GDHP members have a national or regional policy framework for the development and/or deployment of AI


For the full report.

For more data analysis and visualisations, click here.

Or, get in touch for a discussion about your data strategy.

CommentaryJack Sloman
Why We Chose Working with KNIME
Knime review analytics data preparation sydney

As data practitioners or enthusiast, you must have realised or heard that 60-70% of your job would involve data preparation. So, it makes sense to have dedicated tools that creates reproducible workflows and also reduces the pain involved in these tasks. There has been varying amount of data transformation tools like Alteryx, Penthao, SSIS and Tableau prep, which to a large degree are one of the best in the market for BI types of task. However, these tools don't come cheap! There is a huge price tag to them. So, in the last few months here at White Box Analytics, we went on a journey to find an alternative to the costly ETL tools. We arrived at KNIME and we must say it is pretty good! So, we thought to share and highlight our experience with the open source tools.

KNIME is an open source data platform that has been around for more than 10 years now, just before the buzz of data science. It is a guided analytics software for data professionals and an easy to use interactive data platform that executes workflows in nodes. In the years it has been around, the platform has evolved from a standalone desktop software to a collaborative tool for data practitioners. It has also developed into other products to cater for enterprise users with the release of the KNIME Server and KNIME extensions. It can be described as a data platform as it covers most end-to-end workflow process of a data project. These include data transformation and import (ETL), visualisation and predictive modelling.

Specifically, our experience with the product has been with transforming and loading data into the data sources - popularly known as the Extract, Transform and Load (ETL) process - for our clients. With a fairly decent skill in data manipulation we found the software to be relatively quick to pick up and run workflows. It is very intuitive thanks to the development team as we find the panes to be really useful. The panes include KNIME Explorer pane, Node Repository pane, Description pane, Console pane, Outline pane, Workflow Coach pane and KNIME extension. See Figure 1.

Figure 1: A screenshot of the Knime desktop.

Figure 1: A screenshot of the Knime desktop.

As most guided analytics software and ETL tools, KNIME executes in nodes and the node repository pane groups these nodes into functions. These include a database group specifically for connecting and querying different databases. Out of the box, KNIME connects to the usual suspects databases; Microsoft SQL Server, MySQL, Oracle and PostgreSQL and can connect to any other data source using the arbitrary JDBC driver.

One other cool feature we found with the KNIME node repository is the file handling capabilities. Aside from being able to connect to Secure/File Transfer Protocol (SFTP/FTP) servers with the usual password authentication, we found connecting to the servers via public key to be seamless compared to other applications. Connecting to AWS S3 bucket was also a walk in the park as it was just a matter of installing the relevant extensions.

Something worth highlighting is the scripting capabilities of KNIME. This feature of KNIME emphasises why it is not just an ETL tool but a data platform. With this capability we found it easy to integrate other scripting languages into our workflow, which came in handy to execute some required tasks other than using the KNIME created nodes. So far, the languages we integrated with KNIME include R and Python, which are some of the most relevant languages in the industry, but it does have the ability to integrate other languages such as Java and JavaScript. This can be useful for those problematic csv files or if you would rather work in either R or Python but need reproducible visual workflows. Something to be very cautious about is the fact that some of its node terminologies could be a bit confusing to get your head around if you are using it for the first time. We found that we had some significant delays in completing a task just because we were confused about the input variable that goes into the source and target port for the “download/upload from list” node. This node connects to an FTP server and lists the files in a remote directory with the option to either download or upload to this directory, but the source and the target port expects a variable format name URI. It took quite a while to understand that the input variable which usually would be a file path had to be converted to a KNIME format called URI.

We did notice a cool community engagement. The KNIME community portal was really useful as it was well organised and it answered just about any question we had. This could be due to the maturity of the platform or using the right keywords but we just like to put it out there that it was fairly easy to find answers to most questions using the KNIME community portal. We found that the user of the platform are really engaged and this has led to the development of quite a number of KINME extensions. Currently, there is an extension for configuring slack messages because this is 2019!

There are two helpful features for beginners, these are Workflow Coach and the KNIME NodePit. The Workflow Coach pane is one of the features that makes KNIME stand out, particularly for beginners as it recommends a list of nodes with a percentage ranking that should be accompanied with a selected node. This feature just mimics the KNIME NodePit but in lesser details. The NodePit goes into greater detail, giving a detailed explanation about the node and also has a section called the Best Friends (Incoming and Outgoing) nodes that mostly accompanies a given node. The NodePit also has a section that give example workflows that incorporates the node in question into it. This is also super cool for beginner to re-engineer that example workflow to have a better understanding of the node.

Although there weren't a lot of difficulties working with KNIME, we found the desktop version wasn't built to handle very large datasets. While using KNIME desktop, the extent of the workload depended on the local server or computer's hardware capability as the temporary file are stored on the hard drive or memory. In our experience we also found that the desktop application struggle to handle a dataset of about 14 million records using a 16GB ram. Another downside faced with KNIME was its difficulty to successfully run stored procedures. Although there are workarounds for running stored procedures, we do hope that running them in later updates are as easy as just configuring a node.

This is by no means an exhaustive guide to using the KNIME desktop product, we believe there are multiple resources online that covers those topics. It is just a highlight of our experience with the software and an expression of why we feel it is a better alternative to the traditional SSIS ETL tool which seems not to have really evolved with the growth in the industry and a better alternative to the costly ETL tools like Alteryx and Informatica. There is something to be said about the open source movement and it’s importance to the data analytics industry, but we will leave that for a separate article. Until then, we hope you found this informative.


This review was written by Ije Iruemi, a data analyst for White Box Analytics.

For more data analysis and visualisations, click here.

Or, get in touch for a discussion about your data strategy.

Commentary, ReviewsJack Sloman
Black Friday with Shopify - Article Analysis
Black Friday Data Analysis Sydney

Black Friday is not only about scoring a bargain in your favourite stores or online, but it is also a great showcase of how far eCommerce has come since the inception of online shopping. Shopify discuss the democratization of eCommerce and how it allows small businesses to compete with the big corporates that once ruled the online space as a result of their large scale operations and cost-effective delivery. They argue that the empowerment of small businesses through services like their own are vital for the economy, and, as the online retail space in Australia grows quickly, enables for literally anyone to become successful where they have a good product or service to sell. 

Deloitte coin Shopify as “a catalytic engine for entrepreneurial activity, scaling businesses, and ultimately contributing to economic growth.” Thinking back to the old marketing saying, ‘think global, act local’, it can be argued that Shopify are indeed enabling the breakthrough of niche businesses offering unique and high quality products inspired from sources all over the world, and brought to life in garages and home offices, challenging the giant corporates who will have to reassess the potential threat of these changes to their underlying business models. 

With $183bn in global economic activity generated from businesses on Shopify, the online era is surely coming to takeover, whether we like it or not.

For the original article Click Here.

For more data analysis and visualisations, click here.

Or, get in touch for a discussion about your data strategy.

CommentaryJack Sloman
Interview - Microburbs with Luke Metcalfe |Data Stories|

Marketing is becoming increasingly personalised. But what do you do when you have too many customers to know personally? We sat down with Microburbs founder, Luke Metcalfe, to find out.

Thanks for watching and thanks to Luke from Microburbs for his time and expertise. For more information about Microburbs, check out their website or check out the full 15 minute interview transcript on our blog.

If you haven’t already, feel free to sign-up to our monthly newsletter for more insightful data stories.

Or, get in touch for a discussion about your data strategy.

CommentaryJack Sloman
Full Interview - Microburbs with Luke Metcalfe |Data Stories|

Marketing is becoming increasingly personalised. But what do you do when you have too many customers to know personally? We sat down with Microburbs founder, Luke Metcalfe, to find out.

P.s. If you’re in a hurry we have time stamped each of the discussion points so you can skip to the bits that interest you the most (see below)!

Time Stamps:

  • 00:16 - inspiration for Microburbs and where the data comes from

  • 3:09 - what value can Microburbs bring to businesses?

  • 5:41 - how does Microburbs differ from traditional competitors in the segmentation and analysis space i.e. Roy Morgan?

  • 8:12 - can users of Microburbs literally find themselves?

  • 9:59 - industries and companies that can benefit most

  • 10:59 - future of Microburbs

  • 12:52 - the most fascinating insight found since Microburbs was developed

  • 15:00 - conclusions and links for extra info (below)

Thanks for watching and thanks to Luke from Microburbs for his time and expertise. For more information about Microburbs, check out their website.

If you haven’t already, feel free to sign-up to our monthly newsletter for more insightful data stories.

Or, get in touch for a discussion about your data strategy.

CommentaryJack Sloman