Posts by Jack Sloman
The Coronavirus and Carbon Emissions
 

As a result of the coronavirus outbreak, China’s carbon dioxide emissions over the past 3 weeks have been 25% lower than for the same period last year.

The data reflects this: this line chart shows coal consumption staying flat in 2020, in contrast to the rebound usually seen approximately 5-10 days after Chinese New Year.

Source: https://www.nytimes.com/2020/02/26/climate/nyt-climate-newsletter-coronavirus.html


To put this dip into context, the reduction is equal to the amount of carbon dioxide emitted by the state of New York in a full year, approximately 150 million metric tonnes. That is just under a third of what Australia’s entire population produces in a year (source).
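The figures above imply a rough baseline that is easy to check. This is a minimal sketch, assuming the 150 million tonne reduction corresponds exactly to the 25% drop quoted earlier and treating “just under a third” as a lower bound of three times the reduction. Both are simplifications for illustration, not figures taken from the cited sources.

```python
# Back-of-the-envelope check using only the figures quoted in this post.
REDUCTION_MT = 150.0    # approximate reduction, million metric tonnes of CO2
REDUCTION_SHARE = 0.25  # emissions were 25% lower than the same period last year


def implied_baseline(reduction_mt: float, share: float) -> float:
    """Emissions expected over the period if there had been no drop."""
    return reduction_mt / share


baseline_mt = implied_baseline(REDUCTION_MT, REDUCTION_SHARE)
actual_mt = baseline_mt - REDUCTION_MT

# "Just under a third" of Australia's annual emissions means Australia's total
# is a little more than three times the reduction, so this is a lower bound.
australia_lower_bound_mt = 3 * REDUCTION_MT

print(f"Implied baseline for the three weeks: {baseline_mt:.0f} Mt")
print(f"Implied actual emissions for the three weeks: {actual_mt:.0f} Mt")
print(f"Australia's annual emissions: more than {australia_lower_bound_mt:.0f} Mt")
```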

Despite the positives that often surround reducing emissions, these events have come about as a result of dire circumstances, with over 77,000 cases of the virus now diagnosed. This has had cascading effects on the Chinese economy, with some impacts including:

  • Reduced output from related industries. For example, car sales are forecast to fall 30% below last year’s levels

  • In the past week, economy-wide stock indexes, as well as giants including Apple and Microsoft, have seen large falls in their prices

Early stock price increases have tapered off in the past week

Even the tech giants have fallen victim to the impacts of Coronavirus, with Apple already decreasing their revenue expectations for the quarter.

The S&P 500 Index value
  • ‘The industry data provider OAG reports reductions of 50-90% in capacity on routes departing mainland China and a 60-70% reduction in domestic flights within the mainland over the past two weeks, compared with the week commencing 20 January’ (source).

  • The International Air Transport Association has predicted the virus will cost the airline industry USD $29.3 bn (source)

If you haven’t already, check out the spread of the coronavirus in our latest visualisation here, which uses the most up-to-date World Health Organisation data.

For more data stories, join our community on LinkedIn.

 
Why Snowflake is the real power – Snowflake Review
 
Snowflake Logo

Snowflake is a cloud-based data warehousing platform that supports modern data warehousing, augmented data lakes, accelerated analytics, integrated data engineering, secure data exchange, agile data application development and advanced data science.

The real power of Snowflake lies in its architecture and its flexibility. Snowflake is the first analytical database built for the cloud, meaning you can access your data warehouses from anywhere. On top of this, it integrates seamlessly with AWS and other cloud platforms.

Snowflake handles all aspects of authentication, configuration, resource management, data protection, optimisation and availability. As a user, you only need to sign up, load your data and start querying. Overall, this means that a lot of the hassle that surrounds traditional data analysis platforms doesn’t exist for Snowflake users!

How does Snowflake differ from other traditional architectures?

In traditional architectures:

Shared-disk architectures use multiple nodes to access data held on a single storage system, while shared-nothing architectures store a portion of the data on each node of the data warehouse cluster.
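The difference between the two traditional designs can be sketched in a few lines. This is a toy illustration only: the rows, the node count and the use of hash partitioning to decide which node owns a row are all assumptions made for the example, not a description of any particular product.

```python
import zlib

# Hypothetical sample rows: (key, value)
ROWS = [("order-1", 120), ("order-2", 75), ("order-3", 310), ("order-4", 42)]
NODE_COUNT = 2  # hypothetical cluster size


def shared_disk(rows, node_count):
    """Every node reads from the same single storage system."""
    storage = list(rows)  # one shared copy of the data
    return [storage for _ in range(node_count)]


def shared_nothing(rows, node_count):
    """Each node owns only the rows whose key hashes to it."""
    nodes = [[] for _ in range(node_count)]
    for key, value in rows:
        # crc32 is stable across runs, unlike the built-in hash() for strings
        owner = zlib.crc32(key.encode("utf-8")) % node_count
        nodes[owner].append((key, value))
    return nodes


disk_nodes = shared_disk(ROWS, NODE_COUNT)
nothing_nodes = shared_nothing(ROWS, NODE_COUNT)

# Shared disk: all nodes point at the very same storage object.
print(all(view is disk_nodes[0] for view in disk_nodes))
# Shared nothing: the partitions together hold every row exactly once.
print(sorted(row for node in nothing_nodes for row in node) == sorted(ROWS))
```

In the shared-disk case the single storage system is the potential bottleneck. In the shared-nothing case, adding or removing a node means moving data between nodes.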

In Snowflake’s architecture:

Snowflake combines the benefits of both architectures in a design that takes full advantage of the cloud: a multi-cluster, shared data architecture that consists of three separate layers:

  • Data Storage Layer

  • Compute Layer

  • Service Layer

Each layer scales independently and includes built-in redundancy. In addition, Snowflake lets you store both structured relational data and semi-structured non-relational data.

Snowflake’s architecture is made up of three layers: database storage layer, query/compute layer & cloud services layer.


Regardless of the data type, you can use ANSI-standard SQL to perform all data-related tasks.

Snowflake uses highly secure cloud storage to maintain all your data. As data is loaded into tables, Snowflake converts it into an optimized, compressed columnar format and encrypts it using AES-256 encryption.

Unlike traditional architectures, Snowflake allows you to create multiple independent compute clusters, called virtual warehouses, that all access the same data storage layer without contention or performance degradation.

To create a virtual warehouse, you simply give it a name and specify a size, and Snowflake handles all the provisioning and configuration of the underlying compute resources. On top of this, virtual warehouses can be scaled up or down at any time without any downtime or disruption.
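As a rough illustration of how little is needed, the sketch below builds the two SQL statements involved: one to create a warehouse from a name and a size, and one to resize it later. The helper functions and the warehouse name `reporting_wh` are hypothetical. The statements follow Snowflake’s `CREATE WAREHOUSE` and `ALTER WAREHOUSE` syntax as we understand it, so check them against the current documentation before use. Nothing here connects to Snowflake; the code only produces strings.

```python
# Only a subset of the available sizes is listed, for illustration.
VALID_SIZES = ("XSMALL", "SMALL", "MEDIUM", "LARGE", "XLARGE")


def _validated_size(name: str, size: str) -> str:
    """Reject names or sizes that should not be placed into a SQL string."""
    if not name.isidentifier():
        raise ValueError(f"unsafe warehouse name: {name!r}")
    size = size.upper()
    if size not in VALID_SIZES:
        raise ValueError(f"unknown warehouse size: {size!r}")
    return size


def create_warehouse_sql(name: str, size: str = "XSMALL") -> str:
    """Statement that creates a virtual warehouse from a name and a size."""
    size = _validated_size(name, size)
    return f"CREATE WAREHOUSE IF NOT EXISTS {name} WITH WAREHOUSE_SIZE = '{size}'"


def resize_warehouse_sql(name: str, size: str) -> str:
    """Statement that scales an existing warehouse up or down."""
    size = _validated_size(name, size)
    return f"ALTER WAREHOUSE {name} SET WAREHOUSE_SIZE = '{size}'"


print(create_warehouse_sql("reporting_wh", "small"))
print(resize_warehouse_sql("reporting_wh", "large"))
```

The resulting strings could be run in a Snowflake worksheet or passed to whichever client library you already use.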

A key advantage of Snowflake is that when a virtual warehouse is resized, subsequent queries take advantage of the additional resources. Its cloud architecture also enables virtually unlimited scale and concurrency without resource contention.

For example, separate virtual warehouses can be used to handle loading and querying concurrently. Because all virtual warehouses access the same data storage layer, any updates or inserts become immediately available to every other warehouse.

On top of everything, the service layer coordinates and manages the entire system by authenticating users, managing sessions, securing data, and performing query compilation and optimization. It also manages virtual warehouses and coordinates data storage updates and access, ensuring that once a transaction is completed, all virtual warehouses see the new version of the data with no impact on availability or performance.

The key component of the services layer is the metadata store, which powers a number of unique Snowflake features including Zero Copy Cloning, Time Travel and Data Sharing.

A growing ecosystem of external tools has native connectivity with Snowflake, meaning virtually all operations can be integrated with the platform, which makes them even simpler to complete.

What do you need to manage Snowflake?

Not much. Snowflake eliminates most of the tuning knobs and parameters required by other data warehouses. You only need to create database tables and virtual warehouses, load data and execute queries, and Snowflake handles the rest.

How much does Snowflake cost?

The advantage with Snowflake is that you only need to pay for the storage and computing resources used. 

Storage costs are based on the amount of compressed data stored in database tables, plus the additional data retained to support Snowflake’s unique data recovery features.

Compute costs are based on warehouse size and how long your warehouses are running.
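The two pricing components above can be combined into a rough estimate. The sketch below assumes Snowflake’s standard credit model as we understand it, in which an X-Small warehouse uses 1 credit per hour, each size up doubles that, and usage is billed per second with a 60-second minimum. The price per credit and the storage price per terabyte are placeholders chosen for the example, since real prices depend on your edition, region and contract.

```python
# Credits consumed per hour of running time, by warehouse size (assumed model).
CREDITS_PER_HOUR = {"XSMALL": 1, "SMALL": 2, "MEDIUM": 4, "LARGE": 8, "XLARGE": 16}
MIN_BILLED_SECONDS = 60  # assumed minimum charge each time a warehouse runs


def compute_credits(size: str, running_seconds: float) -> float:
    """Credits used by one continuous run of a warehouse of the given size."""
    billed_seconds = max(running_seconds, MIN_BILLED_SECONDS)
    return CREDITS_PER_HOUR[size.upper()] * billed_seconds / 3600


def monthly_cost(size: str, running_seconds: float, compressed_tb: float,
                 price_per_credit: float, price_per_tb_month: float) -> float:
    """Compute cost plus storage cost for one month."""
    compute = compute_credits(size, running_seconds) * price_per_credit
    storage = compressed_tb * price_per_tb_month
    return compute + storage


# Hypothetical scenario: a Medium warehouse running 2 hours a day for 30 days,
# with 2 TB of compressed data, at placeholder prices.
seconds_running = 2 * 3600 * 30
estimate = monthly_cost("medium", seconds_running, compressed_tb=2.0,
                        price_per_credit=3.0, price_per_tb_month=25.0)
print(f"Estimated monthly cost: {estimate:.2f}")
```

For simplicity the running time is treated as one continuous run. In practice a warehouse that suspends and resumes many times would have the minimum charge applied on each resume.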

 

We hope this has given some insight into the power of Snowflake and the efficiencies it can create for companies that store and use large amounts of data. If you’re interested in learning more about Snowflake, get in touch for a chat on your data strategy today!

Do you use Snowflake? We’d love to know your thoughts on this platform and whether you’ll be using it. Join the discussion on our LinkedIn Page.

 
Commentary, Reviews - Jack Sloman
Online Data Analytics & Science Courses - are they worth it?
 

The democratization of data is transforming our world and driving demand for a new breed of professionals skilled in data analytics, machine learning and artificial intelligence. This change is now having a widespread influence on both the workforce and higher education institutions, with firms in strong need of specialists in data science, engineering, governance, privacy and more.

As a result of these changes, professionals of all ages and industries are becoming more interested in the fields related to data analysis, raising the questions: are online data courses worth the time, money and effort? What are the best online data analytics courses? Who are the best providers of data science courses? Is university a better option, or can I learn everything I need online?

Led by Sai Diwakar Bhrugubanda, a Master of Data Science student, we will briefly discuss some of the reasons for and against paying for online data courses.


Data democratization impacts every career path, so academia must strive to make data literacy an option, if not a requirement, for every student in any field of study.

Source: IBM


As we can see in the image above, the demand for data science jobs is projected to grow 39%, yet these positions take an average of 5 days longer to fill, suggesting a shortage of professionals with the right skills.

Data Science is a very complex field and it requires an individual to master multiple skills. There is often a lot of confusion about what a data scientist/analyst does, and a lot of that confusion is down to the diversity of skills and roles that are needed and can be classified as relating to the field. These skills include but are not limited to: 

  • programming, 

  • mathematics, 

  • statistics, 

  • business operations, 

  • algorithms, 

  • databases and data management,

  • data visualization and so on 

All this, on top of a strong understanding of the business you are conducting an analysis on. Remember, numbers don’t lie, unless they’re used incorrectly or in the wrong context! 

Now, considering these fields, there are a large number of data science-related courses being offered on various platforms. Assuming the reader is not already a mid-to-expert-level data analyst, these are our thoughts on those courses.

1. A scratch on the surface 

Everyone has to start somewhere in their journey, and online courses allow you to understand the basics. The skills developed here won’t allow you to become a leading force in the industry, but they will give you an insight into how it works and the work it’ll take to get there. 

2. Learning from experience 

“As a Master of Data Science student, I (Sai) learn through practice rather than in lecture sessions. As part of the course we cover major topics such as the history of data, software life cycles and database programming, through to building projects. It is understandable that online courses teach you the same, but the line between practical learning and theoretical learning has disappeared.”

Having the skill to use a technology is worth more than simply knowing about it. Remember, learning is a continuous process in data science, since it is a vast and complex field, and there is a very slim chance that a short course or online program, however well designed, is going to help you master the topic.

Infographic: online data courses, expectations vs. reality

3. Expectations vs. reality 

This infographic is an interesting take on how expectations and reality match up when learning the skills surrounding data science. Good examples are Deep Learning and Machine Learning: learners have high expectations of understanding them, likely as a result of their use as media buzzwords, when they actually take 2 to 5 years to get a handle on.

Conclusion

Yes, online courses can be worth it for developing a fundamental understanding of data science concepts. However, they are most useful when applied to real-world projects, which allow the practitioner to gain experience and understand the strengths and weaknesses of their methods. Online courses cost a drop in the ocean compared with a university degree, and are far cheaper than vocational programs. The drawback is that they don’t provide an official qualification, and they require a lot of self-discipline and accountability.

An example - courses that are out there - Udemy

Udemy offer a range of online courses, with discounts almost all year round.


As you can see, a broad range of beginner courses can be found for prices ranging from around $15 to $25. Depending on the provider, these courses may be a one-off payment for lifetime access, or a subscription service which gives access to certain courses.

From a White Box perspective, we always encourage continuous learning but push real-life experience as our core training principle. Data science projects are never straightforward and you learn from each one, even if it doesn’t go to plan.

 
Commentary - Jack Sloman
What is a White Box?

One of the first things people ask us is what does White Box mean? What is a White Box? And to answer that, let’s start with what it isn’t.

In data analytics, a black box is a process that produces an output with no understanding of its inner workings. It’s kind of like when you’re trying to solve a complex problem, and somehow, you manage to get an answer, but you have no idea how you got there or what the answer means. Is that useful? Maybe if someone else knows what it means, but in most cases, without a fundamental understanding of the inner workings and processes, you’re no better off than when you started.

So, we finally get to the answer, which embodies what White Box represents as a business. A White Box occurs when the business understands the inputs and variables that contribute to the outcome. By developing a mature understanding of these factors, all businesses can better understand their data and use this understanding to improve their offerings, increase efficiencies and better measure and enhance the success of their operation.

If you feel like you’re struggling to find your White Box, send us a message. We work with a range of clients every day, unlocking the potential of their data and refining these findings into profitable solutions.

Or looking for more insights and visualisations? Check them out here.

Commentary - Jack Sloman
Harmful Pesticide Usage in the US - Visualisation
 

This week we welcomed Sai Diwakar Bhrugubanda to the White Box team. Sai has kicked things off with an interesting visualisation on the usage of harmful pesticide ingredients in the United States, relative to their respective usage in other countries including China, Brazil as well as the continent of Europe.

Context

The United States of America (USA), European Union (EU), Brazil (BRA) and China (CHN) are the largest agricultural producers and users of agricultural pesticides in the world, accounting for more than 50% of all global agricultural production.

Comparing the inclination and ability of different regulatory agencies to ban or eliminate pesticides that have the most potential for harm to humans and the environment provides us with a glimpse into the effectiveness of each nation’s pesticide regulatory laws and oversight.

The Data Sample

Pesticide Action Network (PAN) International maintains a list of pesticides that are banned in various countries. However, because of drawbacks with that data, the analysis was done independently of the PAN International list, although many of the same sources were used.

The United States Geological Survey (USGS) National Water-Quality Assessment Project maintains an online resource of annual pesticide use estimates for all pesticides used in USA agriculture from 1992 onwards.

We plotted data points for a 25-year period, from 1992 to 2016, comparing the approval status of over 500 agricultural pesticides used in the USA with the number approved in the EU, Brazil and China.

Statistics

Of the 500 active pesticide ingredients used in agricultural applications in the US since 1970, a large number have been banned elsewhere:

  • Europe - 72 ingredients,

  • Brazil - 17 ingredients,

  • China - 11 ingredients,

  • And 85 ingredients were banned in at least one of these three.
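To give these counts some scale, they can be expressed as a share of the list. This is a small sketch that assumes a denominator of exactly 500 ingredients, as quoted above. Since the data set is elsewhere described as covering “over 500” pesticides, the true percentages may be slightly lower.

```python
TOTAL_INGREDIENTS = 500  # assumed denominator, taken from the list size quoted above

# Counts of banned ingredients as listed in this post.
BANNED = {
    "Europe": 72,
    "Brazil": 17,
    "China": 11,
    "At least one other country": 85,
}


def share_of_total(count: int, total: int = TOTAL_INGREDIENTS) -> float:
    """Percentage of the ingredient list that a given count represents."""
    return 100 * count / total


shares = {region: share_of_total(count) for region, count in BANNED.items()}

for region, count in BANNED.items():
    print(f"{region}: {count} ingredients, {shares[region]:.1f}% of {TOTAL_INGREDIENTS}")
```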

Considering how many of these ingredients are banned in other countries but not in the US, the quantity of pesticide use is alarming, with China being the greatest consumer whilst seemingly having the least stringent regulations on dangerous pesticides.

 

Consumption of Pesticides

 

More than 10% of the total pesticide use in the USA in 2016 was from pesticide ingredients either banned, not approved or of unknown status in Brazil, China and the EU, a huge figure considering the enormity of US agricultural production.

Discussion

Of the pesticides banned in at least two of these nations, many have been implicated in acute pesticide poisonings (exposure to a single dose, or to repeated small doses, of pesticides) in the USA.

From 2000 – 2015 there were over 1000 pesticide illnesses in California alone (largest agriculture producing state by value), with up to 100 poisoning incidents in the USA each year.

Worryingly, there has been 1 death per year since 2012 as a result of pesticide poisonings. On top of this, from 1990 to 2014 there were 27 deaths, as well as 22 high-severity and 181 moderate-severity cases of illness.

Specifically, the National Institute for Occupational Safety and Health indicates that between 1998 and 2011, 43% of insecticide-related illnesses in the USA involved cholinesterase inhibitors.

Over 45 million pounds of agricultural pesticide use in the USA comes from the 13 pesticides that are banned or in the process of being phased out in at least two of the three other agricultural nations.

However, 10 of the 13 are either banned, being phased out, not approved or of unknown status in all three.

Conclusion

Total pesticide bans remain the most effective way to prevent intentional or accidental exposure to highly hazardous pesticides and can catalyse the transition to safer alternatives. Surprisingly, the USA is lagging when it comes to banning or phasing out pesticides that the other top agricultural powers have identified as too harmful for use.

This is likely due to deficiencies in pesticide legislation in the USA. The Federal Insecticide, Fungicide, and Rodenticide Act (FIFRA) gives the US EPA significant discretion on which pesticides it ultimately decides to cancel and makes the US EPA-initiated, non-voluntary cancellation process particularly onerous and politically fraught. This, in part, has led to an almost exclusive reliance on industry-initiated, voluntary cancellation of pesticides in the USA.

Without a change in the US EPA’s current reliance on voluntary mechanisms for pesticide cancellations, the USA will likely lag behind its peers in banning these harmful pesticides. Recent mitigation measures finalized by the US EPA, which include warning labels, extra training requirements and safer packaging standards that are fully supported by the pesticide industry, indicate that voluntary mitigations will likely be used in lieu of cancellations for at least some of these dangerous pesticides in the future.


This visualisation was created by Sai Diwakar Bhrugubanda.

For more fascinating visualisations and data stories, click here.

To keep up with all things data and White Box, follow us on our LinkedIn page.

 
Visualise - Jack Sloman