It’s 2019 and we all hear ‘digital’ this and ‘intelligent’ that. Machine learning may be overtaking all others as the buzziest phrase of the day. But unlike its buzz-worthy competitors, machine learning is real, it’s tangible, and it’s already delivering big business value.
As companies deploy new smart sensors and implement expensive new back-end systems to harvest and crunch data, two primary questions present themselves. 1) How can we turn data into information and, in turn, generate actionable knowledge from that information? 2) Is machine learning capable of delivering any kind of value if the data it’s processing is not accurate and comprehensive? This quick read covers some of the key pitfalls in data-driven machine learning.
“51% of CIOs indicate that data quality is the #1 barrier in adoption of machine learning,” according to a recent study conducted by ServiceNow and Oxford Economics, in which 500 CIOs across 11 countries and 25 distinct industries were surveyed.
High-quality data is key. The phrase “garbage in, garbage out” has been used for decades. From high-school algebra teachers to the brightest data scientists of the day, they will all tell you that the integrity of your output depends on the quality of your inputs. Machine learning uses powerful algorithms to analyze gigantic data sets that YOU provide. Bad data = poor ML performance. Correctly labeled, robust, enriched datasets are critical. Stay on guard for skew between training data and production data, and retrain models frequently to avoid failures caused by stale models.
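One way to watch for that skew is to compare how a single feature is distributed in training data versus production data. The sketch below computes a population stability index (PSI) for one numeric feature. The function name, the bin count, and the sample values are illustrative assumptions, not a reference to any particular product or library.

```python
import math
from typing import Sequence


def population_stability_index(
    train: Sequence[float], prod: Sequence[float], bins: int = 10
) -> float:
    """Rough PSI between training and production values of one feature."""
    lo, hi = min(train), max(train)
    # Equal-width bins over the training range; fall back to 1.0 if the
    # training values are all identical.
    width = (hi - lo) / bins or 1.0

    def proportions(values: Sequence[float]) -> list:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the training range are clamped to the edge bins.
            idx = min(max(idx, 0), bins - 1)
            counts[idx] += 1
        # A small floor keeps the logarithm defined for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    p_train = proportions(train)
    p_prod = proportions(prod)
    return sum((q - p) * math.log(q / p) for p, q in zip(p_train, p_prod))


# Hypothetical example: production values have drifted upward.
training_values = [float(v) for v in range(100)]
production_values = [v + 50.0 for v in training_values]
drift_score = population_stability_index(training_values, production_values)
```

A PSI near zero suggests the two samples look alike. Many practitioners treat values above roughly 0.25 as a prompt to investigate or retrain, though that threshold is a rule of thumb rather than a standard.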
Focus on what is important. Machine learning is NOT all about fancy algorithms developed by mathematicians. Instead, its value is derived from data cleansing and feature engineering. Those skilled in employing machine learning technologies will tell you that more time goes into transforming raw information into features that better represent the signals in the data than into modifying the algorithms themselves.
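To make feature engineering concrete, here is a minimal sketch that turns one raw record into features a model can use more readily. The record layout (a `timestamp` string and an `amount`) and the chosen features are hypothetical, picked only for illustration.

```python
import math
from datetime import datetime


def engineer_features(record: dict) -> dict:
    """Derive simple model-ready features from one raw record."""
    ts = datetime.fromisoformat(record["timestamp"])
    return {
        # Time of day often carries more signal than a raw timestamp.
        "hour_of_day": ts.hour,
        # weekday() returns 5 for Saturday and 6 for Sunday.
        "is_weekend": ts.weekday() >= 5,
        # log1p compresses large amounts so outliers dominate less.
        "log_amount": math.log1p(record["amount"]),
    }


# Hypothetical raw record.
raw_record = {"timestamp": "2019-03-02T14:30:00", "amount": 0.0}
features = engineer_features(raw_record)
```

The algorithm downstream is unchanged; only the representation of the input improves, which is where much of the effort typically goes.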
Diversity in data sets is critical. One primary difference between machine learning and human learning is semantics. As humans, when we learn, we not only begin to understand the rules of a process but also consider important environmental factors. Machines operate in black and white. Machine learning relies on analyzing massive data sets for patterns, and machines are programmed to identify the most obvious patterns first. To achieve powerful and valuable outcomes, highly diverse data sets are necessary so that the machine sees a wide range of counterexamples that rule out false patterns.
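A simple first check on diversity is to measure how much of the data set each group or category accounts for and flag the ones that are barely represented. The sketch below assumes a flat list of group labels and a hypothetical 5% minimum share; both are illustrative choices.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def underrepresented_groups(
    labels: Iterable[Hashable], min_share: float = 0.05
) -> Dict[Hashable, float]:
    """Return each group whose share of the data falls below min_share."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {
        group: count / total
        for group, count in counts.items()
        if count / total < min_share
    }


# Hypothetical data set: 97 examples from one region, 3 from another.
sample_labels = ["north"] * 97 + ["south"] * 3
sparse_groups = underrepresented_groups(sample_labels)
```

Groups that show up here are candidates for collecting more examples before trusting the patterns a model finds.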
Beware of operator error. We have long thought that part of the perfection of machines is that they remove human bias. However, we can’t forget that all machines and systems are programmed. Machine learning errors are much more likely to come from human error introduced into the training data than from the algorithms themselves. Once an error is in the training data, systems can generate outputs with even greater bias than the operators themselves, because they keep generating new training data that reinforces those biases.
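One inexpensive way to surface human labeling errors is to look for identical inputs that were given different labels. The sketch below assumes each training row is a pair of a hashable feature tuple and a label; the row format and sample values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set, Tuple


def conflicting_labels(
    rows: Iterable[Tuple[Hashable, str]]
) -> Dict[Hashable, Set[str]]:
    """Find inputs that appear more than once with different labels."""
    labels_by_input = defaultdict(set)
    for features, label in rows:
        labels_by_input[features].add(label)
    # Keep only inputs that were labeled inconsistently.
    return {
        features: labels
        for features, labels in labels_by_input.items()
        if len(labels) > 1
    }


# Hypothetical training rows: the same input labeled two different ways.
training_rows = [
    (("red", 1), "ok"),
    (("red", 1), "defect"),
    (("blue", 2), "ok"),
]
conflicts = conflicting_labels(training_rows)
```

A check like this only catches exact duplicates, so it is a starting point for review rather than a complete audit of the training data.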
“Only 41% of CIOs state that they have developed methods to monitor for mistakes resulting from machine learning” according to a recent study conducted by ServiceNow and Oxford Economics.
So, to recap...four key pillars for machine learning success:
- Focus on data quality that is well-governed for long-term integrity
- Voluminous, highly diversified data is important to provide context to analysis
- Machine learning is only as effective as the inputs provided
- Beware operator error and continually retrain your models
Clean, high-quality data is your growth fuel. To learn more about data quality assessments, migrations and governance, contact our team and let’s talk.