Human in the loop: Machine learning and AI for the people
Paco Nathan is a unicorn. It's a cliche, but it gets the point across for someone who is equally at ease discussing AI with White House officials and Microsoft product managers, working on big data pipelines, and organizing and taking part in conferences such as Strata in his role as Director, Learning Group at O'Reilly Media.
Nathan has a mix of diverse background, hands-on involvement, and broad vision that enables him to engage in all of those, having been active in AI, data science, and software engineering for decades. The trigger for our discussion was his Human in the Loop (HITL) framework for machine learning (ML), presented at Strata EU.
Human in the loop
HITL is a mix-and-match approach that may help make ML both more efficient and more approachable. Nathan calls HITL a design pattern, and it combines technical approaches with management aspects.
HITL combines two common ML variants, supervised and unsupervised learning. In supervised learning, curated (labeled) datasets are used by ML experts to train algorithms by adjusting parameters, in order to make accurate predictions for incoming data. In unsupervised learning, the idea is that running lots of data through an algorithm will reveal some sort of structure.
The less common ML variant that HITL builds on is called semi-supervised learning, and an important special case of it is known as "active learning." The idea is to take an ensemble of ML models and let them "vote" on how to label each case of input data. When the models agree, their consensus is used, typically in an automated way.
When the models disagree or lack confidence, the decision is delegated to human experts, who handle the difficult edge cases. The choices made by the experts are fed back into the system to iterate on training the ML models.
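The voting-and-delegation step described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not Nathan's implementation: the names `triage`, `min_agreement`, and `toy_models` are hypothetical, the "models" are stand-in threshold rules rather than trained classifiers, and agreement is measured simply as the share of models that back the most popular label.

```python
from collections import Counter
from typing import Callable, Hashable, List, Sequence, Tuple

# Assumption: a "model" is any callable that maps an input item to a label.
Model = Callable[[float], Hashable]


def triage(
    models: Sequence[Model],
    items: Sequence[float],
    min_agreement: float = 1.0,
) -> Tuple[List[Tuple[float, Hashable]], List[float]]:
    """Split items into auto-labeled cases and cases needing human review.

    An item is auto-labeled when the share of models voting for the most
    common label is at least `min_agreement`; otherwise it is queued for
    a human expert.
    """
    auto_labeled: List[Tuple[float, Hashable]] = []
    needs_review: List[float] = []
    for item in items:
        votes = Counter(model(item) for model in models)
        label, count = votes.most_common(1)[0]
        if count / len(models) >= min_agreement:
            auto_labeled.append((item, label))  # consensus: use it directly
        else:
            needs_review.append(item)  # disagreement: delegate to an expert
    return auto_labeled, needs_review


# Hypothetical stand-ins for trained models: three simple threshold rules.
toy_models: List[Model] = [
    lambda x: "high" if x > 3 else "low",
    lambda x: "high" if x > 5 else "low",
    lambda x: "high" if x > 7 else "low",
]

auto_labeled, needs_review = triage(toy_models, [1, 4, 6, 9])
```

In this toy run the three rules agree on 1 and 9, so those are labeled automatically, while 4 and 6 split the vote and land in the review queue. In a real setup, the labels the experts assign to the queued items would be added to the training data before the models are retrained, which closes the loop.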
Nathan says active learning works well when you have lots of inexpensive, unlabeled data: an abundance of data, where the cost of labeling itself is a major expense. This is a very common scenario for most organizations outside of the Big Tech circle, which is what makes it interesting.
But technology alone is not enough. What could be a realistic way to bring ML, AI, and automation to mid-market businesses?
AI for the people
In Nathan's experience, most executives are struggling to grasp what the technology could do for them and to identify suitable use cases. Especially for mid-market businesses, AI may seem far out of reach. But Nathan thinks they should start as soon as possible, and not look to outsource, for a number of reasons:
We are at a point where competition is heating up, and AI is key. Companies are happy to share code, but not data. The competition is going to be about data, who has the best data to use. If you're still struggling to move data from one silo to another, it means you're behind at least 2 or 3 years.
Better allocate resources now, because in 5 years there will already be the haves and the have-nots. The way most mid-market businesses get on board is by seeing, and sharing experiences with, early adopters in their industry. This gets them going, and they build confidence.
Getting your data management right is table stakes - you can't talk about AI without this. Some people think they can just leapfrog to AI. I don't think there will be a SaaS model for AI that does much beyond trivialize consumer use cases. "Alexa, book me a flight" is easy, but what about "Alexa, I want to learn about Kubernetes"? It will fall apart.