Becoming a Data Analyst - Data stories by White Box

Becoming a data analyst isn’t, for the most part, an easy process. Many people are unfamiliar with what data analysts do behind closed doors when they are ‘mining’ what is now often called the new business gold. Confusion about the skills a data analyst holds often leaves friends, family, and others interested in their work wondering how they came to be in the positions they hold today. With that in mind, the team at White Box Analytics thought it’d be a great exercise to share the stories of present and past colleagues who have worked or are working as analysts, to identify the key challenges they faced, and to highlight some of the important factors that got each person to where they are today.

Louis Keating, our founder, talked about the state of the industry following his graduation in Bristol in 2001, when analyst jobs were few and far between. He said, “I remember being concerned with the lack of jobs and those that were available, seemed very dull. Is this what I spent 3 years of university studying for!? Compared to the market now, we were underpaid and no-one really got it. Try explaining a multiple regression model to a client who doesn’t even know they have a database!”

Since 2001, the world of data has changed enormously, with widespread appreciation of the power analysis has to drive key business decisions. Louis first noticed this change in 2006, when he landed a job at a firm called Tree. He said, “...the data analyst became the all knowing one and underpinned everything else that agency and client was doing.”

Despite the success Louis eventually found, he mentioned, “I remember thinking that perhaps I made the wrong career choice and even made moves to become an account manager, enticed by the money and not being back office staff”. In other words, the process wasn’t an easy one even for the founder of White Box. Becoming an analyst takes years of practice and dedication, which is why it’s valuable to understand how to get started and what can give you that slight edge when it comes to interviewing for and landing that first position.

Ije Iruemi, White Box’s longest-standing employee, said “the fun stuff for me is in generating insight from the data stored”, which led him to pursue a career in data analytics after completing his Master’s degree. Interestingly, Ije also mentioned, “I wish someone had told me that even learning a skill such as SQL takes at least 3 years of constant practice to be somewhat proficient at it. I wouldn’t have beat myself up as much when I failed at solving a problem on my own”, showing that even experienced practitioners go through steep learning curves.

Lucas Hadin, a previous colleague and employee of Louis, discussed how hard it was to gain the experience needed to compete in a pool of applicants for a data analyst position. “I came up against the classic catch 22 where all the jobs for junior data analysts still required 1+ years of experience.” When Lucas started his job search in 2012, as is still the case today, many job postings for junior roles required some level of experience. Back then, as Lucas told us, resources for learning “anything from excel, SQL, python, R, machine learning, c++” weren’t available online as they are today. This made it even more challenging for the young graduate to break into the industry.

Eventually Lucas was picked up by Louis. He told us, “I think that my few years of travel and working abroad (not as an analyst) might have been the thing which gave me the life skills to be able to build a rapport with Louis and ultimately seal the deal getting the job.”


Jack Sloman, White Box’s current junior analyst, was encouraged to get into the industry by a mismatch between his soft and hard skills. “I found my skills in interpreting and writing information were only as good as the numbers behind them,” he said. One of the biggest challenges he currently faces is knowing what to learn first, and why. He manages this by dedicating his personal time to learning whilst simultaneously producing data-related content and analysis pieces to drive the brand’s engagement and presence in the industry.

If there is one key theme to take away from these interviews, it is that your people skills, more often than not, count for more than you might think. Everyone has to start from a point where they have no experience, so if you are thinking of starting your journey, start sooner rather than later. Before you know it, your personal dedication and hard work will be picked up by someone who can see your potential. Use the resources available to you and offer value where possible. You may feel alone on your journey to becoming a data analyst, but we can assure you the time spent and the experience gained will be invaluable!


Thanks for reading! If you enjoyed this edition of Data Stories please feel free to sign-up to our newsletter - a concise monthly synopsis on all things data.

For more data analysis and visualisations, click here.

Or, get in touch for a discussion about your data strategy.

Commentary - Jack Sloman
Can Machine Learning win the Melbourne Cup?

Whilst building the Melbourne Cup game, we had the chance to assess the variables and build some models to see if we could use our analytical skills to help predict the winner.

Firstly, our game showed us an important truth: predicting the winner is tough! So we settled on modelling a place (1st, 2nd or 3rd).

Important note - We do not encourage gambling, and by no means do we believe that this is a robust solution; otherwise we would quit our jobs and do this for a living!

What did we learn?

Figure: CHAID decision tree (Melb_cup_chaid.JPG)

CHAID is a great way of visualising the modelling process as a series of decisions. The nodes help you decide which way to go by showing you the predictive result (the middle decimal) and the percentage of the file each node represents (the bottom %).

So firstly, at the top, we see that Price is a key variable: odds (price) under $23 is ideal.

Then a crossroads. Do we use the Breeding Sire and Dam options that lead to the bright green node in the bottom right, or do we use colour to reach the green node in the bottom middle? Both paths also use horse country (origin) and both have a decent outcome, although the bottom right is better.

Either way, we now have some ideas of what is going to be predictive.

For the purposes of our scoring exercise, we used the bottom right node.
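CHAID (Chi-squared Automatic Interaction Detection) chooses each split by testing how strongly a candidate grouping is associated with the outcome. As a rough illustration of the Price split at the top of the tree, the sketch below computes the chi-square statistic for a 2x2 table of price band against placed / did not place. The counts are invented for illustration and are not taken from our data, and the function name is hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


# Hypothetical counts (NOT the real data): rows are price bands,
# columns are (placed, did not place).
under_23 = (30, 120)
over_23 = (10, 240)

stat = chi_square_2x2(*under_23, *over_23)
# With 1 degree of freedom, a statistic above 3.84 is significant at the 5% level.
print(f"chi-square = {stat:.2f}, significant at 5%: {stat > 3.84}")
```

A full CHAID implementation also merges similar categories and adjusts for multiple comparisons; this sketch only shows the test behind a single binary split.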

Figure: model variable importance (Model_importance.JPG)

Next we tested a logistic regression model and a random forest. Both showed Price as a significant and crucial variable.

Once our training and testing were complete, we needed to score the runners!

 
 

To simplify things, we have ranked the outputs for each horse to gauge a consensus across the models.
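The ranking step above can be sketched as follows: rank the runners within each model, then average the ranks to get a consensus order. This is a minimal sketch, not our production scoring; the horse names, model names and scores are all made up for illustration.

```python
from statistics import mean

# Hypothetical place probabilities per model (NOT our real outputs).
scores = {
    "chaid": {"Horse A": 0.40, "Horse B": 0.25, "Horse C": 0.10},
    "logistic": {"Horse A": 0.35, "Horse B": 0.30, "Horse C": 0.15},
    "random_forest": {"Horse A": 0.20, "Horse B": 0.45, "Horse C": 0.05},
}


def rank_within_model(model_scores):
    """Map each runner to its rank in one model (1 = highest score)."""
    ordered = sorted(model_scores, key=model_scores.get, reverse=True)
    return {runner: position for position, runner in enumerate(ordered, start=1)}


def consensus(all_scores):
    """Average each runner's rank across models; a lower average is better."""
    ranks = [rank_within_model(s) for s in all_scores.values()]
    runners = ranks[0].keys()
    avg = {r: mean(rank[r] for rank in ranks) for r in runners}
    return sorted(avg.items(), key=lambda item: item[1])


for runner, avg_rank in consensus(scores):
    print(f"{runner}: average rank {avg_rank:.2f}")
```

Averaging ranks rather than raw scores keeps one model's scale from dominating the others. Tied scores are not handled specially here.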

Keeping in mind that we’re predicting a place (1st, 2nd or 3rd), Cross Counter feels like the strongest choice.

Figure: model predictions by runner (Melb_cup_predictions.JPG)

An important question you should always ask is “how good are these models?”

The answer: the accuracy isn’t great. The models suffer from a lack of data. Ideally we would use the longer-term race history for every runner. Maybe next year.

This exercise is purely for fun (we do find this kind of data analysis fun), but we are keen to see how the models do.

Now we wait and see what happens at 3pm today….


Post 3pm update


So the runners have run and the champagne has been sunk. How did we do?

Figure: Melbourne Cup 2019 results (Melb Cup 2019 results.JPG)

If you look down the final column, very near the bottom, you’ll see Vow And Declare, the winner of this year’s (2019) Melbourne Cup. Not one of our predictions…

However, if you remember from the start of the article, we were looking for a “place”, and our intuition told us to ring-fence the top 5 predictions for an each-way bet.

So Prince of Arran came to our rescue with a second place.

Our conclusion: Don’t gamble, kids, even if you’re a data scientist.

Commentary, Events - Louis Keating
Melbourne Cup Challenge
 

For this year’s Melbourne Cup, we’ve pulled together the last 20 years of runners and winners and created a game to test your gambling strategy. Most people choose by name or jersey colour, so it’s time to find out how good that method is! Odds are, the bookie will come out on top…

If you have issues viewing on a mobile, click here but we suggest you use a laptop for the best experience.

 
Visualise - Louis Keating
Buy-Now-Pay-Later - Article Analysis

Buy-Now-Pay-Later on the Rise 

The Buy Now Pay Later (BNPL) industry is growing at a rapid pace. Many interesting trends are surfacing among users of this popular technology, which lets customers split a payment into a number of interest-free instalments.

One of the biggest trends is the transfer of consumers from the use of credit cards to the use of BNPL services. One in four users cancelled their credit cards and about the same said they no longer use theirs. The most popular brand used is AfterPay, with 84% of BNPL users owning an account with the company.

These BNPL purchases are increasing online retail spend across the country, in line with the findings of Australia Post’s 2019 eCommerce Report, which show Australian online purchasing growing at a rate of 24.4% YOY. Stronger distribution networks and growing trust in online service providers are driving this strong growth, and businesses need to adapt their strategies to these trends in order to meet increasing consumer expectations.

Purchase categories varied, although clothing dominated, as seen in the figure below.

Where do BNPL users spend their cash? (%)

Of potential concern is the rise of missed repayments on BNPL purchases, an issue for 1 in 3 users. For example, almost 1 in 4 men aged 25-34 have reported missing multiple payments, whilst only half as many women did the same.

From a marketing standpoint, there are many interesting insights to gain from these figures. 60% of the purchases made on BNPL services are classified as “luxuries” like clothes, whereas only 17% were classified as “essentials”, including food. This could suggest that BNPL users make less rational decisions when buying online and are willing to make more purchases when payments are broken up. This represents an opportunity for businesses to expand their online offerings and benefit from the growing trend, but it also highlights the need for clear, ethical information about the implications of using the service. For example, 28% of users have come under financial strife as a result of using BNPL, and as a large share of these users are in younger generations (only 55% of 18-24 year old males have never missed a payment vs 86% of 55-64 year olds), it makes sense for brands to highlight the potential dangers of BNPL before customers engage. As a result, brands can enhance their customers’ lifetime value while also fostering a purchasing culture which doesn’t place customers under unnecessary financial stress.

Furthermore, with recent RBA talk of sanctions on BNPL services in its 2020 review of payments regulations, the future success of the technology is still unclear. Despite this, AfterPay, the original and leading supplier of the technology, has defended its business model, stating that it generates thousands of leads among hard-to-reach millennial and Gen-Z target markets for businesses using the service, whilst advocating its ability to keep customers loyal, trustworthy and spending responsibly. Further, it shares data about its users with merchants so that they can better understand and interact with them and provide higher quality services and products.

Do you think the use of BNPL services is the future of retail spending, or is it simply a flashy new mechanism to add to Australia’s already colossal debt portfolio which encourages impulse buying at the expense of reaching long-term savings targets? 

For the original article, click here.



Commentary - Jack Sloman
Disinformation Campaigns - Article Analysis

70 countries have now experienced organised disinformation campaigns

A study recently released by two Oxford scholars, Samantha Bradshaw and Philip N. Howard, has identified an increase of 169% over the past 2 years in the number of countries engaging in disinformation campaigns. Disinformation has recently been brought to the attention of the Australian public by a nation-wide advertising campaign named ‘Your Right to Know’.

What is a disinformation campaign? 

A disinformation campaign is a country’s organised use of social media to manipulate information for the purposes of suppressing fundamental human rights, discrediting political opponents and drowning out dissenting opinions.

These are some of the highlights of the Global Disinformation Report: 

  1. Since 2017, the number of countries engaging in disinformation campaigns has increased from 26 to 70 (a 169% increase). Despite the recent rise, the scholars’ research suggests that many countries have been manipulating social media for the past decade.

  2. China has transitioned from a domestic misinformation approach to a global one. This has seen it portray Hong Kong’s democracy activists as ‘radicals with no popular appeal’, demonstrating the adverse effects these campaigns can have on their targets.

  3. The report highlights Australia as a country being manipulated on both Facebook and Twitter, but one that has avoided cyber troop activity on WhatsApp, YouTube, and Instagram.

  4. Compared with the UK and the US, Australia has only two identified cases of disinformation, both related to politicians and political parties. The UK, on the other hand, has seen two organisations identified for government agencies, politicians and political parties, as well as disinformation from private contractors. The US shows even higher figures, with 3 or more cases for each of the mentioned organisational bodies.

  5. 80% of countries engaged in disinformation campaigns used bot accounts to conduct them.

  6. Australia has only used disinformation campaigns to support a political movement or attack another. The US and the UK, on the other hand, have used them both to support and attack parties and to distract and divide their target audiences in order to achieve their goals. On top of this, the report found that the messages shared were manipulated media (disinformation) and largely used data-driven strategies.

  7. The report finds that countries with high cyber troop capacity have large resource allocations on a permanent basis. It estimates, for example, that China could have anywhere from 300,000 to 2,000,000 people working on disinformation campaigns at any one point in time.
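The 169% figure in point 1 follows directly from the two country counts. A quick check in Python, using a hypothetical helper name:

```python
def percent_increase(old, new):
    """Percentage change from old to new, relative to old."""
    if old == 0:
        raise ValueError("percentage change from zero is undefined")
    return (new - old) / old * 100


# Country counts quoted from the report: 26 in 2017, 70 now.
print(round(percent_increase(26, 70)))  # 169
```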

 Takeaways and Analysis

The last two lines of the conclusion of the report ask the reader whether social media platforms are places for public deliberation and democracy or tools to amplify addiction, disinformation and anger. The answer to that question is that there is no right answer. 

For many, after reading the report and this analysis, it may become clear that unpredictable and catastrophic events in recent times, including the Trump administration, Brexit, and the Hong Kong riots, have been, at least in part, influenced by these disinformation campaigns. After all, the purpose of these campaigns is to manipulate their targets into behaving and/or thinking in a certain way. As a result, it is imperative that people are better educated about the prominence of such campaigns and the ways they target and influence people. Our personal data is and will always be used in approaches to influence our decisions, but it is up to us, as the decision makers, to understand that not all information is good information. Our opinions and beliefs about political parties, people and/or businesses should therefore only be moulded once we have conducted our own research and analysis, rather than being reactive to campaigns in our social feeds. This way, our data can be used to empower us rather than cripple us.



For the original article, click here.


Commentary - Jack Sloman