Episode: 187 | Josh Miramant: Machine Learning


Show Notes

Our guest today is Josh Miramant, the Founder and CEO of Blue Orange Digital, a full-stack data science agency using machine learning to simplify business decisions.

In our discussion, Josh provides several case studies of how his firm is helping clients improve decisions through better predictions, including one project for a hedge fund that analyzes a wide range of data on job applicants to help the fund hire analysts who are likely to perform better.

I also learn a new term: data lake.

Blue Orange regularly partners with consultants, and if you have a client you think could benefit from their help, Josh would love to discuss it with you. You can reach Josh at: josh@blueorange.digital

And you can read more on their website: blueorange.digital


Will Bachman: Hello, Josh. Welcome to the show.
Josh Miramant: Hi, Will. Thanks for having me.
Will Bachman: Awesome, and we are sitting here in global headquarters in New York City, of your firm.
Josh Miramant: Recently located to right in Manhattan here.
Will Bachman: Fantastic. Usually we do this by phone, but it’s really nice to see you in person, so face-to-face. Josh, let’s start … we’ll get to your bio, your firm, but just what is machine learning? Help me understand some of these terms. Machine learning, artificial intelligence, data science, define those terms for me. How are they connected?
Josh Miramant: That’s a great starting point. One thing I will say, there’s just a ton of hype and media and terms that are used interchangeably, and I think starting with definitions is the best place and I’ll try to stay consistent here. There has been this massive advent of the introduction of AI, artificial intelligence, and that’s a very encompassing term that’s actually not new at all, and really not as notable as the specificity of what we do for work and where the intersection of artificial intelligence and business impact occur is at a much lower level. Just from a definition’s perspective, artificial intelligence is really when a computer is making a decision for … of any process or something that’s modeled similar to what a human would do, and that can be as basic as an if-then statement like if you come up to an option and you do one of two things with a conditional, that’s an artificial intelligence. It’s a machine making a decision.
There’s a popular app called IFTTT, If This Then That. It’s used all over and it automates processes, and in its simplest and most encompassing sense, AI is an if-then decision tree. Ultimately, that’s not the most interesting part, or where the hype is coming from, or what’s changing the entire world of automated decision-making and cheap prediction, which I’ll speak to in a bit. Inside of that there are far more sophisticated tools, so inside the corpus of artificial intelligence you have a subset of tools called machine learning. Those again have been around for a while, but with the recent advent of cheap data and the decreased cost of computing power, we’re actually able to see some amazing predictions being produced by machine learning. I’ll emphasize the term prediction a lot, and I think that’s important to bring into context in a moment because it’s where machine learning intersects with business value. Where these tools meet business value is prediction.
The thing that’s really popularized the AI media explosion recently is the introduction of deep learning, or neural nets. These are things that have popped up with many examples lately, from Google’s DeepMind producing AlphaGo and teaching itself how to play chess and become the best in the world by all standards. All the way from Google’s, sorry, Amazon’s recommendation engine that suggests different products and is getting extraordinarily successful at providing product recommendations for others. The examples that you could give are almost endless now, that AI has become the high level as being talked about, but what we’re really starting to see is machine learning algorithms, which is a subset of it, and deep learning neural nets have become very ubiquitous.
It’s actually a reducing layer of specific tools that ultimately are all resources to deliver higher quality and higher accuracy of prediction, which is really the intersection where decision-making happens in businesses. One other area that I think is important, that you noted, is where data science falls, because that’s a very buzzword-heavy environment. Data science is probably actually four or five different roles that are folded into one around the practice of computer science, data engineering, data munging and actually putting your data in a form where you can run it through these data science, excuse me, machine learning tools. Data science incorporates these resources of machine learning and the aggregation of all the data to get results and to provide these predictive answers, which, with cheaper computing costs, we can now produce in applications that we’d never thought of before.
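To make the distinction above concrete, the following minimal Python sketch contrasts a hand-written if-then rule with a cut-off that is learned from labeled examples. It is only an illustration: the function names, the discount scenario, and every number are hypothetical and do not come from the conversation or from Blue Orange Digital.

```python
def rule_based_decision(order_total: float) -> str:
    """A hand-written if-then rule: the simplest kind of automated decision."""
    if order_total > 100.0:  # hypothetical threshold chosen by a person
        return "offer_discount"
    return "no_discount"


def learn_threshold(examples: list[tuple[float, bool]]) -> float:
    """Learn a cut-off from labeled (value, label) examples.

    Tries each observed value as a threshold and keeps the one that makes the
    fewest mistakes when predicting label = (value > threshold).
    """
    candidates = sorted(value for value, _ in examples)
    best_threshold, best_errors = candidates[0], len(examples) + 1
    for threshold in candidates:
        errors = sum((value > threshold) != label for value, label in examples)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


# Made-up history: (order total, whether a discount actually paid off).
history = [(20.0, False), (45.0, False), (60.0, False), (80.0, True), (120.0, True)]
learned = learn_threshold(history)
```

In the first function a person chose the rule; in the second, the data chose it. That is the sense in which machine learning sits inside the broader if-then notion of AI.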
Will Bachman: Machine learning and neural nets, what’s the relationship between those terms? Is neural nets just one type of machine learning? Or maybe [crosstalk 00:04:50]-
Josh Miramant: Exactly.
Will Bachman: What’s the difference? If there’s multiple sort of subcategories of machine learning, walk me through those different categories.
Josh Miramant: That’s a great question. There are a number of tools inside of machine learning that are actually different algorithms, and a class of algorithm is a neural net. That’s actually a subset of machine learning tools called deep learning and this is actually an interesting background. These neural nets actually were quite popularized in the early ’90s and they were… a number of engineers across the computer science field and academia were saying, “These will change the way we’re able to think about prediction and optimizing services”, yet they never came to fruition and they kind of went away. After-
Will Bachman: Why was that?
Josh Miramant: Well, for two specific reasons that have changed today. One is cost of computing, and two is the amount of data available. They’re very greedy algorithms, and what this means is they just need large, large sets of data and it needs to be structured in a formal way. Particularly back then, you needed to have it stored in a way that could be accessed very quickly and in very different permutations, and that storage capability didn’t exist in the technology of the time.
Will Bachman: We’re talking like how many gigabytes or terabytes or…
Josh Miramant: At that time, for the problem… it’s very dependent on the problem you’re solving, but we’re talking hundreds of gigabytes of data to solve very small sets of problems. Even more important were the computing cost and computing restrictions: the ability to distribute your computing process, so it can run across multiple computers concurrently, didn’t exist to sufficiently feed these very greedy algorithms.
That’s the big thing that has changed in the last… we’re getting into the last five years where neural nets have become… frankly, due to heavy investments from companies like Amazon, Facebook, Google, OpenAI, governments, academia, they’ve invested heavily to both increase the performance of available computing power and decrease the cost with the advent of cloud computing. You can spin up Amazon Web Services or Google Cloud or Microsoft Azure… any of these are able to spin up incredible supercomputers at the click of a button for pennies on the dollar compared with where that was even two years ago.
This is now introducing the capabilities of neural nets and other machine learning algorithms to answer questions that we could have never applied them to before. That’s why where the hype… there’s certainly some overhype and misapplication of the value, but when you’re coming to business decisions and applying this at scale across where we intersect with BI, business intelligence, we can actually now apply these algorithms to answer questions that we used to just have inferred models and there was statistics behind it. Now, we have what are really superpower statistics through models that are driven by machine learning, so that’s now able to distribute pretty openly through the business sector.
Will Bachman: That’s very cool. Let’s maybe talk through some examples and maybe you can walk us through some case examples that your firm has worked on.
Josh Miramant: There are many. I think there are a few things to classify before we jump right into some case studies, because I think it’s good to frame the problem set with a framework, and I’ve noted this a couple of times with the word “prediction”. I think it’s important to look a little bit at what machine learning is doing as it becomes so ubiquitous and what the prediction framework is that we’re getting here. At the core of what we’re talking about, prediction is really the core driver of any decision-making.
If you look at the example of what machine learning comes from, compare it to the advent of the spreadsheet. If you’re thinking about financial modeling for your company prior to the spreadsheet, your accountant would have to be good at two things. A, adding and putting all your numbers together, and B, making predictions with the outcome of that model.
If you were to make changes prior to the spreadsheet, that was a laborious task, very difficult to do. With the advent of spreadsheet, they no longer had to be good at adding. That was something a spreadsheet handled. They could just make tweaks and then ask good questions with that data, and then you could think about improving your financial planning because of the spreadsheet. Machine learning is very similar in what it is doing and what prediction changes of how you actually make decisions in business. Before, you would just look at a dashboard and say, “Okay, here’s what’s happened last year. Let’s internalize this decision.” Maybe do some modeling. Say, “If this, then”, like rule-based conditions that you established, that’s pretty much all modeling in business decision-making, and then you would make the best inferred decision you could get from that.
Now, we can outsource that cheaply and quite simply to machine learning algorithms. We can say, “We have this constrained business problem”, and let’s apply some examples as you requested. Let’s take one of the verticals that we at Blue Orange Digital spend a healthy amount of time on, which is talent analytics. We cover hiring, we cover attrition, which is about keeping good talent in place, and comp decisions. All of these areas are somewhat qualitatively assessed right now. We say, “Let’s just look at hiring.” This is a very bias-ridden, ripe-for-improvement sector that everyone feels is actually derived from very human, qualitative assessment. If you ask any HR team, they feel that, “The more I talk to somebody, the more I know whether they’re effective for this role.”
The data shows quite the opposite, actually. The more interviews you have after, I think it’s typically one, the more you actually degrade the quality of your prediction of efficacy in a role. People are horrible at making decisions in general, and you can look at Daniel Kahneman’s Thinking, Fast and Slow or Dan Ariely’s Predictably Irrational to see that. In hiring decisions, it’s no different. Same in forecasting: we are not as good at looking at statistically-inferred outcomes as machines are.
An example that I would provide there is work we’ve done where we pulled together a large data set of information on pretty much the entire universe of potential hires, and then we were able to generate and pull performance data. In this one, we happened to be applying it for a hedge fund. We had market-derived performance data on every analyst inside of their employment pool, and then we were able to actually apply a machine learning algorithm. This is a class of machine learning called supervised machine learning, which we used to make a prediction on traits that would correlate to being effective in the role. If you look at a lot of the filters that we had before, they would just be like, “Well, you went to Harvard and you had a 4.0, and then you went to Wharton, and so we’re going to throw money at you and you will solve our problem.”
What applying a data-driven approach to this is actually saying, “Let’s give it… feed it all the data that we have around performance and let a machine determine the characteristics and the correlative factors that indicate an effective hire.” It took us a day after we aggregated data to prove that a Wharton 4.0 is a poor correlate to an effective outcome. It’s not a direct indicator. There are many great analysts that are from Harvard and many poor analysts from Harvard, so that’s a bad proxy indicator. That is not the case with our machine learning algorithm. We can increase our confidence in a strong predictor of a good fit as an analyst at a hedge fund by an increase of like 15%. We can be more confident in the outcome of that hiring.
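A supervised learning algorithm, as described above, learns from examples whose outcome is already known and then predicts the outcome for new cases. The sketch below uses a nearest-centroid classifier purely because it fits in a few lines; the conversation does not say which algorithm the firm used, and the two features and all values are invented toy data.

```python
from statistics import mean

Row = tuple[float, ...]


def fit_centroids(rows: list[Row], labels: list[bool]) -> dict[bool, Row]:
    """Average the feature vectors of each labeled group (the 'training' step)."""
    grouped: dict[bool, list[Row]] = {}
    for row, label in zip(rows, labels):
        grouped.setdefault(label, []).append(row)
    return {
        label: tuple(mean(column) for column in zip(*members))
        for label, members in grouped.items()
    }


def predict(centroids: dict[bool, Row], row: Row) -> bool:
    """Predict the label whose centroid is closest to the new row."""
    def squared_distance(a: Row, b: Row) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(centroids, key=lambda label: squared_distance(centroids[label], row))


# Hypothetical features per past hire: (sector track record, modeling test score),
# both scaled to 0..1. The label says whether the hire later performed well.
past_hires = [(0.2, 0.3), (0.1, 0.4), (0.8, 0.7), (0.9, 0.9)]
performed_well = [False, False, True, True]

centroids = fit_centroids(past_hires, performed_well)
likely_fit = predict(centroids, (0.7, 0.6))
unlikely_fit = predict(centroids, (0.3, 0.2))
```

Note that nothing in the code mentions a school or a GPA: the model only uses whatever features it is given, which is why the choice of input data matters so much.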
Will Bachman: Let’s walk through that example a little bit because I guess I can easily imagine how a hiring algorithm could help with a role where you have a really large number of hires. Like maybe if you’re McDonald’s, say, or you’re Walmart and you’re trying to figure out who’s going to be a good associate hire because you have such a large number of hires that you can A/B test it and look at common characteristics and people’s background. I guess I wouldn’t have anticipated that it could be useful where you have maybe a small number of hires that you’re going to make because it’s sort of hard to look at the counterfactual.
Maybe give me a little bit more detail. Directly, can you give me for this… you can keep it sanitized and not tell me the name of the client, but what was the role that you were trying to hire for? Describe that a little bit. What’s an analyst at a hedge fund do? I should know, but I don’t know what that means.
Josh Miramant: No, no. It’s a good question, and they are somewhat of a volume job at a hedge fund. They’re the ones that do… in large hedge funds there are portfolio managers and then analysts and then research directors and that kind of hierarchy. Each portfolio manager has a cluster of analysts, and then there are many portfolio managers that make up a hedge fund and they all make decisions that go into a book and then they invest on the aggregate level. There’s different formats that exist in said hedge funds, but that’s a rough formula.
One of the things that’s important to note, because you bring up a very good point: when you have a statistically-inferred outcome, one of the things you’re doing is actually producing the accuracy of that model, so you know the range in which it is accurate to the data that’s provided. You can say a model is going to be 93% accurate against the training data that we provided, and that’s something to speak to in a minute. That tells you how effective it is. On aggregate, 93% of the time with the training data provided, you will be able to know that you’re correct in the prediction that the machine learning algorithm is providing.
With an example like the analysts that we’re talking about here, it doesn’t work for a single entity, it works across to scale, so you’re bringing up the point of where this is just standard statistics. If we know that most of the time it’s accurate, not every time, an individual won’t be a strong correlate factor to the whole. That’s just never going to be the case. What you do need is scale on any of these. The scale in application, wherever it may be even with hiring, one of the things you’re able to do is look at like where we get to the point of making it with small data sets is we look at historical data.
We actually have large sets of historical data to draw from for our examples of all hires that have existed in the last 20 years plus the current performance of the market and sector. We can extrapolate all of those correlations and then feed it in as a training data set for getting those outcomes, and then the thing you have to recall, this isn’t to say that this person is going to be perfectly successful, because that’s an individual level, that’s it could or could not fall on either side of that 93% accuracy level. The thing that’s happening here is you’re replacing a far, far more broken indicator. All the fraught effort of one-to-one, face-to-face interviews are very ineffective actually and the data reflects that.
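The 93% figure above is an aggregate measure: the share of predictions that match the known outcomes. A minimal sketch of that arithmetic follows; the data is synthetic and constructed only so that the number comes out to 93%, and the 200-hire figure is a hypothetical scale.

```python
def accuracy(predicted: list[bool], actual: list[bool]) -> float:
    """Fraction of predictions that match the known outcomes."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Synthetic example: 100 known outcomes, with 7 predictions deliberately flipped.
actual = [True] * 50 + [False] * 50
predicted = list(actual)
for i in range(7):
    predicted[i] = not predicted[i]

score = accuracy(predicted, actual)

# Across a hypothetical 200 hires, the same error rate implies roughly this many
# wrong calls, without saying which individual hires they will be.
expected_misses = round((1 - score) * 200)
```

This is the point about scale: the accuracy describes the group, so it is informative over many decisions and says little about any single candidate.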
I can’t look at specific examples inside the hedge fund because that’s their proprietary information, but if you look at the efficacy of hiring absent data, quite a number of analysts churn out of the hedge fund space. They don’t stick with it. That’s a short-lived job and it’s extremely expensive to train for. It’s extremely expensive to hire for. It’s a pretty small environment with a lot of competition, so when you’re able to find fit for a role through an unbiased indicator, a statistically-inferred correlative factor, you are more likely to ensure fit than you would be if you were just using a proxy indicator like a resume or even face-to-face interviews, which is another area we can actually get more… a data science application here is not just a global indicator that this is a good fit.
There are also additional things, like applying this one topic you brought up earlier, which is another subset of machine learning called natural language processing, often termed NLP. That’s an area that makes machines able to read and contextualize information that’s given to them. When you get a resume, a human can look at that and be like, “Oh, they were in debate club,” see what looks like a healthy amount of information, and qualitatively say, “This fits in certain categories.” Well, that was actually very difficult for machines prior to something like a classification problem or a clustering problem with machine learning. It’s very hard for a machine to understand those pieces.
They can take in the data. There are technologies like OCR, optical character recognition, where you can take the words out of an image and turn them into text, but then actually understanding how to categorize that information to make it meaningful, other than just presenting it to humans, is the process of natural language processing, which is interpreting that. Things that you can extract out of that include sentiment analysis, so you know what is actually being said.
Some of the areas that we are doing to remove bias in the hiring process with NLP is to consume interview summaries, so people’s handwritten notes that are… we’re never going to remove the human process of a decision-making here. That’s not something that machine learning can offer. It still has to feed back through a human. There are applications of pure automation, but that’s not the work that we do.
Something like natural language processing will allow an interviewer, or somebody that’s printing all of the interview summary notes, to remove bias from how they’re using language. It can extract the points they’re trying to make in a neutral framing and actually feed that into a decision tree and into the actual machine learning algorithm that we are developing instead of somebody saying, “Well, I need to go review the interview summaries and the hiring decision.” Well, now an algorithm can take that content and remove the bias of tonality and take the core content of the point they’re trying to make and pull it in and then feed it into a machine learning algorithm.
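As a very rough picture of what sentiment analysis on interview notes means, here is a toy lexicon-based scorer in Python. Real NLP systems rely on trained language models rather than word lists; the two word sets and the sample notes below are invented for illustration and are not the firm's method.

```python
import re

# Hypothetical mini-lexicons; a real system would learn these signals from data.
POSITIVE = {"strong", "clear", "thoughtful", "rigorous"}
NEGATIVE = {"weak", "vague", "unprepared", "careless"}


def sentiment_score(note: str) -> int:
    """Count positive words minus negative words in a free-text note."""
    words = re.findall(r"[a-z']+", note.lower())
    return sum((word in POSITIVE) - (word in NEGATIVE) for word in words)


mixed_note = sentiment_score("Strong modeling skills, clear communicator, but vague on risk.")
negative_note = sentiment_score("Unprepared and careless with numbers.")
```

The output is a number, which is the key step: once free text has been turned into a numeric signal, it can be fed into a prediction model as one more feature alongside the structured data.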
Will Bachman: What would the kind of data going into the algorithm be for these hiring decisions? Is it looking at their resumes? Is it looking at every tweet or Facebook post the person ever did in their life? Is it looking at their photograph or their college transcript? What are the kind of inputs into the algorithm?
Josh Miramant: It’s as much as you can get, so all of that. In this case, we built the data infrastructure, the data warehousing that contained all of that we could get, plus social media information. We were getting club information from colleges because they hire out of a relatively refined environment. We correlate… in this example, we went out and fetched performance data from the market and correlated it to the sectors that they were actively trading in at other jobs, to kind of get an idea of how effective they were in those sectors, because you can see performance of a… there’s publicly-filed information from every hedge fund.
We’re able to go get that information of how effective the hedge fund did in certain sectors and then correlate it to individuals, which is another example of a perfect data science project where you’re saying, “How much can we infer about this person and then feed it into a machine as a feature of that algorithm to help it use this information to get a good glimpse of whether this person will be effective for the firm?”
Pretty much, this is called the data enrichment phase, which is kind of important to understand, and we should talk a little bit about what a project looks like and the architecture of actually implementing it, but in this example, this would be towards the beginning of your process. First, you have to get your data into a unified environment, typically called data warehousing, or, more popular now for systems like this, a data lake, which contains both structured and unstructured data for these models to consume as they need.
Then, you enrich, which is to layer in additional pieces of information that the algorithms will need, and then you apply your prediction modeling, where any machine learning algorithm or deep learning model can be applied to get the outcome that you’re looking for. Then, you would release that or get to a point where you’re looking for feedback on it, because the first part is really just input data, which teaches the model how to think, but the real value once a model is deployed comes from the feedback. People saying, “Well, this is effective and this is not.”
That’s the teaching phase, and that ultimately… a great example of that in practice, if you look at a lot of the startups entering into the AI space, there is this company that was doing image classification of crop yield and whether or not lettuce should be harvested. They were going through and just in the beginning, they just went out and took a bunch of pictures of lettuce heads and they said, “This one should be harvested, this one should be not.” They just had a very small set of data actually when they started. Just enough to reach a high enough competency to beat a very low… like at the time, there was no effective way of knowing when to harvest lettuce or not. You’d harvest bad lettuce, it would get mixed in with the rest, and somebody would have to manually pick it out.
The solution was to prevent the harvesting of it because the machine can choose whether to pluck it or not, and so they took a bunch of pictures. They just walked over a large field and manually did this, fed it into a machine learning algorithm, and then they were able to get a machine learning model that was not very accurate, with a low competency rate in its predictions. Once they put it out there, they would actually take a whole bunch more pictures and train it on those pictures of whether it should or should not have harvested.
It did okay, but anytime there was a false positive or a false negative, far more common a false negative, you’d train that and grow the data set, and then another farmer would implement it because the accuracy was higher. They would take more pictures of the yield and they now have the largest corpus of crop informational data set to train these models and the accuracy is up in the high 90s now. They know with certainty by the feedback loop that this machine can detect whether you should harvest this crop, and then they were able to get into yield prediction years out with weather patterns and all of this because they have that data plus this data set, and that’s the kind of future prediction which is hugely saving.
This company was acquired immediately in the first year and a half, they’re just Stanford grads, and it was acquired by John Deere. A behemoth snapped them up because this is something that’s distributed out as a commodity to every single one of the people that are buying a tractor at scale. That is a small company getting small sets of data, and then cycling feedback to increase prediction.
That’s really where almost in all applications you’re looking for, whether it be hiring, whether it be financial forecasting, whether it be in Google Search, all of these things, it’s the data that underpins the models themselves that make it valuable. They need to be trained in time so just the initial data set is not the only thing that matters. That’s just getting it up to a level of defensibility, of high enough confidence so you can introduce it to let it start being trained to get really good.
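The feedback loop in the lettuce example can be sketched in a few lines of Python. This is a toy under stated assumptions: a single made-up numeric feature (head diameter in centimeters) stands in for the images the real company used, and a 1-nearest-neighbor rule stands in for whatever model they actually trained.

```python
Example = tuple[float, bool]  # (hypothetical head diameter in cm, should harvest?)


def predict_1nn(training: list[Example], diameter: float) -> bool:
    """Return the label of the closest known example."""
    nearest = min(training, key=lambda example: abs(example[0] - diameter))
    return nearest[1]


def feedback_round(training: list[Example], field_samples: list[Example]) -> int:
    """Predict each field sample; add every mistake, with its true label, to the
    training data. Returns the number of mistakes made in this round."""
    mistakes = 0
    for diameter, truth in field_samples:
        if predict_1nn(training, diameter) != truth:
            mistakes += 1
            training.append((diameter, truth))
    return mistakes


# A deliberately tiny starting data set, as in the story.
training = [(10.0, False), (20.0, True)]
field = [(12.0, False), (14.0, False), (16.0, False), (19.0, True)]

first_round = feedback_round(training, field)   # model is corrected on its errors
second_round = feedback_round(training, field)  # same field, after the correction
```

The model itself never changes here; only the data grows. That is the sense in which the data underpinning a model, accumulated through feedback, is what makes it valuable over time.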
Will Bachman: I’d like to have one of those for when to cut up an avocado. Sometimes you cut one up and it’s like still not really ripe and you kind of waste it. Sometimes it’s like, “Oh, I waited too long”, so that’ll be awesome.
Josh Miramant: It’s a perfect example of if you reduce down, and this is the thing that’s [crosstalk 00:25:16]-
Will Bachman: App for that, like you take a picture of your avocado.
Josh Miramant: Absolutely. If you were able to crowdsource that one… actually, it reminds me of the HBO show Silicon Valley where they have “is hot dog or not”. I think that’s the hipster solution for it is to take a picture of an avocado, is ripe or not?
Will Bachman: You can make your avocados totally straight, you use [crosstalk 00:25:36] avocado [crosstalk 00:25:36]-
Josh Miramant: It’s a necessity.
Will Bachman: On this system, you look at kind of everything you can gather on the person’s background, right? The resume, all the tweets, all the Facebook posts that you can scrape from the universe, whether they were a college athlete or not or whatever. Put all of this in there in the prediction engine, and then the engine starts making recommendations on who to hire from the talent pool, and then presumably the company then follows that recommendation, or maybe they don’t. Maybe they override it, but you can keep track and then you can see the performance of those who get hired and see if the AI, or the machine learning, made good recommendations, if they were better.
I suppose you can’t really check the counterfactual, so if it recommended not hiring someone and you didn’t hire them, then you would never know if you missed someone who is awesome, right?
Josh Miramant: That’s a great point, but that’s not a problem that is with the advent of machine learning or not. There’s a counterfactual to just standard hiring practices that are rife with bias, that they’re proven to be inconsistent, so it’s measuring against the wrong benchmark I’d argue.
Will Bachman: You can’t do that, but the people that you do hire, I suppose you could also really test it by saying, “Well, there’s some people that the algorithm said not to hire, but we’re going to hire them anyway and see how they do.” Then, you can use that crop of new analysts and see their performance and that can then help improve the algorithm. You can actually put in the performance of each analyst and improve the algorithm. Is that kind of what’s happening with one of your clients?
Josh Miramant: That is, and I think there’s a couple of important points that you mentioned in that description of the process, one being the decision-making still being handled by humans. That’s an important factor here. This is something that is feeding better information to a decision-maker, and in almost all applications that we work on, that’s what our aim is. That’s where we’re improving business analytics. Where what I term is we’re providing prediction in advanced analytics.
We can answer questions better and more efficiently with unbiased indicators, because when you want to look at this example with hiring that you just explained really well, if you think about where what you’re comparising… comparing to, excuse me, is if you have a picture on a resume, it immediately shows or a name that’s nontraditional, bias is just… every study showed that humans will act differently, and frankly, quite discriminatorily, against people with just a picture added or nontraditional name, all these things.
When you’re saying comparing this to the counterfactual, I find it’s kind of the wrong benchmark, as opposed to comparing against what we deal with, the human condition. That’s typically the thing we’re looking to improve against and that is actually an interesting… what we’re trying to do from a data science perspective when we put it together, there’s a term that we apply called the expected value framework. In the simplest of terms, we actually just apply a measurement between what would have been done and what is improved by adding our algorithm to a process, because there needs to be a measurable ROI when you’re justifying to the business, justifying to your executives. This isn’t just something that comes in and it’s a solution and there’s immediate buy-in. There’s too much mystery in the algorithm, so you need a benchmark.
What we attempt to do, there’s algorithmic ways to apply measurement of what was happening and what is changed by the application of machine learning in the decision process just to show that if you were to take the trajectory of a decision based on this machine learning, doesn’t matter if you do or not, you can still extrapolate what would happen against that example. You’re saying, “We make the decision, machine learning.” What if you didn’t do that decision? We can apply the delta between those two of the expected value framework. There’s a lot of application because all of this stuff is reduced down to math. That’s at the core of what we’re doing here.
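The "delta between those two" can be written as simple arithmetic: an expected value for the decision without the model, an expected value with it, and the difference. The sketch below is one plausible reading of that idea, not the firm's actual formula, and every probability and dollar figure is a hypothetical placeholder.

```python
def expected_value(p_success: float, value_if_success: float, cost_if_failure: float) -> float:
    """Probability-weighted payoff of a single decision."""
    return p_success * value_if_success - (1.0 - p_success) * cost_if_failure


# Hypothetical numbers: a good hire is worth 400k, a bad hire costs 150k.
baseline = expected_value(0.50, 400_000, 150_000)    # hiring without the model
with_model = expected_value(0.65, 400_000, 150_000)  # hiring with better predictions

# The measurable difference attributable to the model, per hire.
uplift_per_hire = with_model - baseline
```

Framing the result as a per-decision uplift gives executives a benchmark they can multiply by the number of decisions, which addresses the ROI question without requiring anyone to trust the algorithm's internals.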
Will Bachman: Josh, you talked about reducing bias several times. What’s been your experience so far with the involvement of machine learning and the algorithmic approach to issues of diversity and inclusion? I could imagine a world in which the algorithm goes and does its thing and you could make actually things worse, like you could get even a lower percentage of women or minorities being recommended or it could go the other way and actually you ended up with better diversity and inclusion. What’s been your experience so far with the recommendations of the algorithm?
Josh Miramant: That’s a great question, and famously Amazon threw out their machine learning hiring process because it was effectively racist. That was what they found out, which isn’t… there are a number of examples where machine learning supports discrimination. If you look at another famous example, I think it was recidivism in granting bail, or bonds, excuse me. Granting bail to incarcerated criminals in this case. They realized that if you were black you had, I think it was, a six-times more likely chance of being rejected for bail, and if you actually explore the correlative factor… this was somewhat of a black box AI, where we just give it input data and it gives a recommendation. We didn’t really know the process, but if you explored the correlation to that, you found that race was the single factor that ML was using, so that’s why they chose not to use it going forward.
That’s actually a pretty I think… that definitely can occur. It does occur with… let’s take the example of hedge funds, where I have a client whose example we’ve been following through here. That’s a heavily white male-dominated industry. You look at the stats on it, it is. If you fed that information in without any training or any future training against it, hyperparameters to give it some rules against just training on that factor alone, you would just affirm that condition, which is what you found in these other two examples. The actual charter of it actually has nothing to do with it. You can train, and this is where you need humans to come in and help teach the model differently.
You need to train the model differently… is to address that, see that correlation, and then find correlations to factors that aren’t related to that one. There is, as I introduced before, this is just a better way of getting predictions and outcomes. You would see that same thing if you were to say, “Well, Harvard, they do a pre-qualification”, but you’re hiring a very specific type of person right now. If you use Harvard prior to using machine learning algorithms, it’s not solving the problem, either. There’s a very specific subset of people that go to Harvard, to Wharton, and then go into hedge fund world. That’s a very discriminative, selective… it’s a selection of its own, self-selecting discrimination.
Again, you have to measure that you can tune a machine learning model to find correlations outside of that factor. You can’t tune to self-selection discrimination. There’s actually control with unbiased indicators that works against discrimination in a system that didn’t have that before. The charter of where we were brought in this hedge fund… the woman that brought us in is actually trying to do this exact thing to avoid exact discrimination, so we have one of our knowledge engineers working specifically on that problem.
Will Bachman: If you’re going to use a machine learning system to help with hiring, is there sort of some minimum number of hires that you really need to be making? I’m just guessing, but tell me if I’m wrong, that if you’re hiring one CEO or one Vice President of Procurement, maybe you’re not going to have enough reps or data set to help you. It seems like it’s applicable to a class of problem where you’re hiring a hundred analysts or where McKinsey is hiring, whatever, 5,000 new associates out of business school, but if you’re hiring just one unique person for one role, maybe it’s not going to be applicable. I’d love to hear your reaction. What’s the breakpoint on that?
Josh Miramant: That’s a great question, it’s dead accurate. There’s an area… if you think about additional applications of machine learning where you get those predictions, on anything that scales down to a low number of uses is not an effective use case for it. You could say like, “Look at all characteristics of all CEOs”, and predict those traits who is good, but there’s a whole bunch more fit for firm. There’s a thing that you can apply aggregate statistics. That would be a horrible application and it wouldn’t help improve anything. There’s no statistically meaningful feedback loop to the model training. There’s nothing that would be valuable, but I would argue the application of that scale that you could see is things like compensation for executives. You can predict like retention at firms and attrition, and on the other side of that you can predict when people… the attrition cost of people.
Actually, you can calculate the cost of an individual at the specific role saying, “Okay, we have a marketing manager and they’re a 35% likelihood of departure in this period of time, or there’s this one that’s a head of product and they’re a 95% likelihood of departure but their cost of replacement is X and the cost of marketing that.” You can start to get a data-driven view of your entire workforce derived from data that you have internally. That’s a perfect example of something that would be impossible to model and make predictions without putting it in the data system and running a regression on top of it.
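To make the arithmetic in this answer concrete, here is a minimal sketch of an expected attrition cost: the predicted probability of departure multiplied by the cost of replacing the person. The 35% and 95% likelihoods are the ones Josh mentions; the dollar figures, the `Role` class, and the function name are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass


@dataclass
class Role:
    title: str
    p_departure: float       # model's predicted probability of leaving in the period
    replacement_cost: float  # hypothetical dollar cost to rehire and ramp up


def expected_attrition_cost(role: Role) -> float:
    """Probability of departure times the cost of replacing the person."""
    return role.p_departure * role.replacement_cost


# Probabilities come from the conversation; the dollar amounts are invented.
workforce = [
    Role("marketing manager", 0.35, 60_000.0),
    Role("head of product", 0.95, 200_000.0),
]

# Rank roles by expected cost so retention spend goes where it matters most.
ranked = sorted(workforce, key=expected_attrition_cost, reverse=True)

for role in ranked:
    print(f"{role.title}: expected cost ${expected_attrition_cost(role):,.0f}")
```

A real system would take the probabilities from a trained model rather than hard-coding them, and the replacement cost would itself be an estimate.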
Will Bachman: We kind of spent some time on hiring. Let’s talk about the attrition a little bit. At least I’ve read in the popular press how some companies are now just kind of monitoring employees’ emails or maybe even their activity on LinkedIn. If somebody starts brushing up their LinkedIn profile, that’s kind of an obvious indicator maybe they’re getting ready to look for a job, but you could perhaps identify other characteristics. If they start having unusual “medical appointments” offsite or something or there might just be other characteristics in terms of travel or phone calls.
What are companies doing in the area of sort of predicting who may attrite and then trying to retain those people proactively? Talk about what you’ve sort of seen in general and anything that your firm has been working on.
Josh Miramant: I just want to characterize the problem a little bit… frame it a little differently than that because the classic is like somebody updates their LinkedIn profile, as you mentioned. That’s like, “Okay, they’re leaving.” That is a human-derived correlation. You see that and A equals the potential for B. A data-driven approach is if you just look at… put a bunch of data into a system and you say, “Okay, after this period of time and that factor may weigh in but it may not be an indicator, but in this period of time and this role and then this comp package and all these other pieces put together, the churn, if you can weight all of these things into your system, you can start absent of somebody updating their LinkedIn profile, you can start to get a number.
An increasing likelihood of them leaving absent it and then you can make the decision like, “Well, what if we give them a raise early? A nine-month raise because we really want to keep them.” This Scala engineer is a very high… senior Scala engineer is extremely hard to replace, or data scientists are very hard to rehire. That is something that you can then see that the cost of retaining is worth the investment to play and you can actually play with those predictions that you’re getting and adjust the investment that you make into that person. You can see the ones you care about retaining or you’re potentially okay with the churn, and you don’t need some of those traditional… these are potentially absent of everyone updating their LinkedIn profile.
You can look at the frequency of poached data, how frequently the churn of senior Scala engineer, feed that into a system and then you can actually get a number that can weigh when you should be doing your raises, how much should you be raising for to keep the people that you care to keep or potentially the opposite. There’s an acceptable amount of churn where you see a productivity variant. We’ve done work around, and this is a human-provided context, a feature to our modeling, but some churn in analysts at hedge funds is a good thing.
Getting new people for an ideation problem and balancing out convergent and divergent thought. I’ve actually done a lot of work on my company that was founded out of Blue Orange around personality assessment for job fit and those kinds of things. It’s another model problem, but there’s actually benefits to some of that that you want to incent, too, or you don’t need to pay more than you need to keep that churn if that’s what you’re looking for. There are examples that it’s still providing a framework in which you can make that decision and which wasn’t capable before. The action with that information is still human driven and that’s where you demystify what machine learning is doing.
It’s just giving a mathematical framework to look at probability and percentage of things occurring. That’s all we’re looking at, and when given a well-constrained problem, and there are many, it’s a vast range of where you can provide… get enough data to get a… if you look at a logistic regression model or a regression model of some sort to decide between two outcomes. Will they leave or not? Those are not particularly data-hungry models. They’re something that you can get with pretty low… it will tell you the accuracy that you have within it, but as a framework for unbiased decision-making on what you need to do to retain the highly expensive hire, if you’re a large tech company, how much should you directly spend to retain a senior data scientist could be a major movement in your bottom line and not overpaying for that problem.
We now have prediction to allow a framework to be established in your accounting process, and that’s what applying regression in a supervised learning model inside of that case would allow you to get an outcome in which you can make that decision and actually benchmark at something against just, “Well, before somebody left at this period, so let’s throw money at the problem.” That’s just a crude way of doing it.
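As a rough illustration of the kind of two-outcome regression model described here ("Will they leave or not?"), the sketch below fits a logistic regression by plain gradient descent using only the Python standard library. The two features (years in role and pay relative to market), the eight training rows, and all function names are hypothetical; this is not Blue Orange's model, just a small instance of supervised learning on a leave/stay label.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features) -> float:
    """Probability of departure for one employee."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the average log loss."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for features, label in zip(X, y):
            error = predict_proba(weights, bias, features) - label
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Hypothetical history: [years in role, pay as a ratio of market rate].
X = [[0.5, 1.10], [1.0, 1.05], [1.5, 1.00], [1.0, 1.15],
     [2.0, 0.90], [3.0, 0.80], [4.0, 0.70], [3.5, 0.85]]
y = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = the person left, 0 = the person stayed

weights, bias = fit_logistic(X, y)

p_risky = predict_proba(weights, bias, [4.0, 0.70])  # long tenure, underpaid
p_safe = predict_proba(weights, bias, [0.5, 1.10])   # new hire, paid above market
print(f"underpaid veteran: {p_risky:.2f}, well-paid new hire: {p_safe:.2f}")
```

The output is a probability per person, which is what lets a retention decision be benchmarked against something other than "somebody left at this point last time". In practice a library implementation with regularization and a held-out test set would replace the hand-written loop.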
Will Bachman: Great. Maybe you could walk us through… we sort of talked about sort of the nature of one of these assignments, but give me a little bit more detail of what actually happens and sort of the stages of one of these projects. You could use one of these or we could pick a different example that your firm has done of what actually happens in terms of, “Okay, you first have a discussion and then you have to go collect data.” How does that happen? What are the steps? I’d love to understand very tactically. For consultants who are listening to this show and who are not data scientists but may encounter a situation where they’re going to work on this kind of a project in cooperation with a team of data scientists. Walk me through what we would experience.
Josh Miramant: This is really the highly valuable question and something I think is actually very interesting to your network and the Umbrex environment overall is there are… we’ve really broken it down into three core stages, which are quite large stages, each of themselves, but three core stages to getting to actualize the value of machine learning in a business model and business practices, which Blue Orange specifically focuses on that area. There’s many other areas of just general machine learning research. We focus on machine learning applied to business problems. We want to improve the predictive analytics for business intelligence.
The kind of three core steps start with data aggregation, getting the data in a warehouse or a data lake so it’s able to be consumable by the prediction models that you put in place. The second is the data science work, which is the machine learning algorithms and the prediction work that needs to be done. The third is integration. It’s visualization, enriching CRMs or wherever, and there’s quite a range of how we’ve done work. Whether it may just be reports, simply giving reports to decision-makers, whatever that end goal. Before we just flesh those three topics out, there is an important thing to note inside of each of these three steps. Those are the tactical, that’s the work that Blue Orange does.
We put data together in a cloud-based environment that’s owned by our companies that are hiring us. We will build the models that will make these predictions. We will present them and show them to decision-makers, each of those three stages, but there is an important framework that we need to provide, which is there needs to be buy-in from the organization, and that’s where there’s a really important relationship between the work that we do as an agency and something like your network, consultants that are seeing these opportunities for prediction optimizing systems inside of businesses. That’s the area that’s probably the most important for success of machine learning and predictive modeling.
It’s much more about the buy-in for it, that people want to make decisions with this. They understand what’s happening in the process, that the day you get your first machine learning-derived predictive indicator as part of your decision process isn’t going to be perfect and that will improve over time and how that improvement is tracked and captured. That is something that is still very difficult for organizations to capture, but I’d say is the single most likely contributor to a failed project, a failed machine learning integration project is buy-in from the decision-makers and understanding this process.
There’s a ton of media right now about, “Oh, machine learning is changing all these pieces”, but where you see very successful integrated AI and machine learning solutions are companies that want this outcome. They care a lot about the deliverable and they think the AI is solving a problem, machine is learning a prediction problem for them. That’s the core when there’s a company that’s driven that way.
There’s a great book that I pull out of my framework from that I think is extremely valuable for your network to read. It’s called Prediction Machines.
Will Bachman: Who’s it by?
Josh Miramant: There’s three authors. Agrawal is the University of Toronto professor, the face of the book and he’s tacitly articulate on the position, but if you want to simplify the context of a lot of what we’re discussing today, I think that would be the number one book to read. If you look at the framework as he… an example that he provides in the book that I think is very interesting, if you were to just be a store in the ’90s and you put up a website, you’re not an E-commerce online store, you’re a retail store that has a website. I think that we’re in a similar transitional environment where, obviously, Amazon is the end of the world in E-commerce and there’s no competition between somebody with a website from standard brick and mortar that’s flying out a website. They were a web-first approach and they’ve eaten E-commerce.
AI is similar where just getting a machine learning predicting model, integrating it and saying, “This is going to change everything”, that’s actually not enough to get the effect that you need out of it. You need to understand that this needs feedback data. There needs to be buy-in from like additional integrations that are providing feedback to the models to improve how predictions are made because right now, the value of prediction is more important than the apps they’re based on.
A quick example on this is Google and Bing are both search engines. They both use a similar PageRank model, which Larry Page invented for referential information, but purely the data that feeds the AI in Google is drastically better, and no one would argue, than Bing because they have so many searches and the volume and the feedback that’s provided back to this machine improves it so drastically that the product is no longer competitive. It monopolizes the product, frankly. The only differentiation between those at a code and algorithm level is the data that’s fed into it.
If you look at meta failure of machine learning, and Bing isn’t a failure for how Microsoft wants it. They want search entirely as a separate use, but of democratizing… sort of like becoming a monopoly in search, Bing has lost, and that’s the only difference there is, the data that feeds the algorithm underlying. They came too late. They missed the mark on that, and so that’s the example of when you look at when you’re applying it, you can apply these strategically to get systemic answers on… or specific answers, excuse me, to problems.
If you think about when you’re putting a system in play that relies on these systems being trained and there’s buy-in from the team to provide feedback and get the data and continue to grow your data investment as an organization, that transition is it’s not just for a specific solution around hiring, and hedge funds do this extremely well because they make large decisions around for trading, but this warehouse around their people analytics is used in many applications, and those were a couple of examples. The growth of the data that they are enriching in their data lake now lets them answer even more complex questions and they’ve invested in a data warehouse that allows more sophisticated and more specific machine learning solutions to come to fruition.
I think the example correlative to this is the introduction of just the one specific machine learning algorithm isn’t transitioning to solve all problems, it’s investing in the data warehouse and investing in kind of the evolution of feedback that requires machine learning models to then automate, decreased costs, the area which they completely transform how decisions are made inside of organizations.
Will Bachman: I haven’t heard of a data lake before our conversation today.
Josh Miramant: That’s a great question.
Will Bachman: What is a data lake? I assume it has nothing to do with a lake, but what’s the metaphor and what is it actually?
Josh Miramant: I’ll give the kind of description of it and then explain it a bit because I think a description for context is important. Traditionally, the term that is comparable would be data warehousing, where you put a bunch of databases together that are relational and you can store all of your data in a similar format and then say you have your organizational data warehousing, this is the traditional method. Say you have a marketing team, you check out what’s called a data mart, which is a subset of the data that’s relevant to the marketing team or say the sales team or another environment. That’s how it was always stored before.
Data lake, it’s an important nuance, but the data lake is just this open environment of all data, so that’s all structured and unstructured data. You can have raw texts, you can have speech files, any type of data that you have collected, it gets stored in an environment that any other business unit can just draw from and create their own data warehouse from or localized data environment drawn directly from that data lake. It’s an important distinction because when you look at feeding something like machine learning algorithms, they need lots of data that you wouldn’t necessarily think would be related and then the accessibility from that environment is it’s available, it’s accessible in a consistent way. It’s effectively stored in a very efficient manner to fetch it to each system.
Pretty much any forward-thinking company, and this is all with Amazon and Google Cloud Platform, Amazon Web Services, all of these provide systems to set up data lakes where you can have structured, unstructured, all forms of data sitting in an environment that’s extremely quick to fetch from. You do the processing on it, the transform stage, after it’s been stored and that’s a very big importance to machine learning because they can pull right from the data lake and be very greedy and you can stand up computing process just to pull from this large, large corpus of data as opposed to data marts where it’s saying, “Oh, well, you need transactional data and you need this one other factor and we’ll set that up.”
Well, what if you needed another sector, data from the database? Getting it into a data mart is a provisioning problem and you have to restructure it to be similar. That’s typically where organizations break down data unification. Data lake is just a more… their options is just a little bit cleaned up and why you’re making the decision and there are some cases where using a data warehouse is still important in data marting, but ultimately, most organizations that are getting very machine learning derived and do a lot of data science work want unified… the value is out of unified data insight. Almost all organizations live in the death of a thousand data warehouses. It’s never just one data warehouse, there’s five data warehouses internally and they’re all managed on different databases and they don’t talk to each other, a very common occurrence because they have a different structured data in each warehouse.
The marketing team had Mixpanel split up with other transaction and that was in a MySQL environment, and then the dev team is tracking sales behavior in a NoSQL database, and those are different formats but they don’t quite relate. Storing them into a unified data environment and then having a data scientist pull from both of those data resources would be far easier than trying to fetch into two different environments that don’t necessarily have APIs or open access and their security protocol is different. That’s kind of the transition of data warehousing to data lake that’s occurred to support more accessibility of data overall.
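The idea of storing raw data first and transforming it only when someone pulls it can be sketched in a few lines. Below, a plain dictionary stands in for the data lake and holds two untouched exports in different formats: a CSV dump of the kind a relational (MySQL-style) table might produce, and JSON lines of the kind a NoSQL store might produce. The paths, field names, and records are all hypothetical; a real lake would sit in cloud object storage rather than in memory.

```python
import csv
import io
import json

# Hypothetical raw objects, stored exactly as each team exported them.
lake = {
    "marketing/transactions.csv": (
        "email,amount\n"
        "ana@example.com,120.50\n"
        "bo@example.com,80.00\n"
    ),
    "dev/sales_events.jsonl": (
        '{"user": {"email": "ana@example.com"}, "event": "demo_booked"}\n'
        '{"user": {"email": "cy@example.com"}, "event": "trial_started"}\n'
    ),
}


def read_transactions(raw: str):
    """Parse the relational-style export into flat rows."""
    return [{"email": r["email"], "amount": float(r["amount"])}
            for r in csv.DictReader(io.StringIO(raw))]


def read_events(raw: str):
    """Parse the document-style export, flattening the nested user field."""
    return [{"email": d["user"]["email"], "event": d["event"]}
            for d in map(json.loads, raw.splitlines())]


def unified_view(store):
    """Transform at read time: one record per person across both sources."""
    view = {}
    for row in read_transactions(store["marketing/transactions.csv"]):
        person = view.setdefault(row["email"], {"total_spend": 0.0, "events": []})
        person["total_spend"] += row["amount"]
    for row in read_events(store["dev/sales_events.jsonl"]):
        person = view.setdefault(row["email"], {"total_spend": 0.0, "events": []})
        person["events"].append(row["event"])
    return view


view = unified_view(lake)
for email, person in sorted(view.items()):
    print(email, person)
```

The point of the design is that neither team had to agree on a schema up front; the data scientist decides how to reshape the data at the moment of use.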
Will Bachman: Beyond a company’s own internal data, data scientists are in that first phase of the data aggregation and they want to get some data from outside company, are there now sort of data aggregators out there that you can kind of go shopping and say, “Hey, if I want to get someone who probably has already scraped all the tweets that any person in the universe has done and we will sell you the ones that… you give us a bunch of names and we’ll give you all their tweets or we’ll give you what magazines they subscribe to or what car they owned.” Clearly, you can go into credit unions and get that kind of stuff, but are there sort of data sellers out there? What are some of the names of those? How does that work if you want to go buy… kind of go shopping for data?
Josh Miramant: That’s a great question and yes, it’s a huge business now. I’m sure everyone has heard the term “data is the new oil”, they talk about that. That is pretty apt in that it’s really become this massively valuable commodity and data aggregation is an incredibly valuable service. Depending on the sector, like in finance, we do a lot of predictive financial modeling for the finance space or for private equity, and some work in hedge funds on predictive area.
When we’re modeling companies and the founding teams of companies… you can run down the list of DataFox and PitchBook and Preqin all store massive amounts of information about companies. For $10,000 I can know pretty much everything I want from PitchBook about a specific sector of Series A to Series B companies, and then if I want all historical data they ever have, that’s an extra five grand. For 15,000 I can have a pretty healthy amount of historical snapshot on a specific sector of Series A to B companies from PitchBook that could transfer my system, and we work with this all the time.
Further, there’s massive amounts of marketing enrichment information, because marketing attribution and marketing problems are very well defined for machine learning. How do you attribute something to a marketing campaign and chase that person through different systems? You need to have that person. What is that entity? How do you enrich it? There are People Data Labs and Clearbit and companies that… there are many, probably hundreds in this space. Those are some of the bigger ones, ones that I like that have large sets of data, but depending on what area you’re in, those are great for startups and consumer goods and finance.
There’s other services that have very specific analysis information, OneWire. There are a massive number in every vertical that you would get. E-commerce, I can’t even count the number of APIs and FTP enrichment services that we use which are just the ways we get the data into our system that we use to enrich stage. Not only is it inferred enrichments, which we do algorithms to get some inferred indicator that we put into the database, we also pull in large sets of data.
Another example just to pull in work that we’ve done where machine learning is a perfect solution for is a very often necessary problem in data unification called entity resolution. Ask any engineer, this is something they tried to solve in rudimentary ways many a time, and it’s the idea that if you have two data sets that have… well, you’re in two records, in two different data sets, how a standard data set would relate that data use. You’d have an ID, so you’re number one in that data set, and anything that’s related to number one is about you.
If you have a data set that doesn’t have you and say it says William or your name is misspelled or it doesn’t have your first name, how does it know to… but you have really good data there. The record, like entity resolution and there’s a number of names for it, but resolving that is a very big challenge and where people fall down all the time in data warehousing. We actually have a semi-supervised machine learning model that beats industry standard for entity resolution where we can take those things together without the ID. It uses semantic representation using NLP. It reads the name and you’ve trained it over the systems.
Many times it can pick those two things and push them together, and so we can pull in dozens of data sources and have a machine learning model just read them, better than you and I reading those two data sets. It can make a selection to put them together. We’ve gotten ours up to about a 94% accuracy, so 94% of the time against a trained data set, we can pick the right merging of records together. This is not a small problem. Major insurance companies have written massive amounts of solutions toward a specific algorithm just for this problem, so it’s another example where you’re just predicting which two things go together.
Prediction is the glue, but you can’t get marketing insights, you can’t get attribution. If you’re thinking about lead segmentation, all these things come together with bringing disparate data sources together and then making predictions on top of that. This is just foundational and similar work where machine learning has provided the best solution the industry can offer.
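The entity resolution problem described above, linking records that share no ID and have misspelled or shortened names, can be illustrated with a much simpler technique than the semi-supervised NLP model Josh mentions. The sketch below swaps in plain character-level string similarity from Python's standard `difflib` module; the records, the `resolve` function, and the 0.8 threshold are all assumptions chosen for the example.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-level similarity between two names, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def resolve(leads, crm, threshold=0.8):
    """Map each lead name to the ID of its best CRM match, or None."""
    resolved = {}
    for lead in leads:
        best = max(crm, key=lambda rec: similarity(lead["name"], rec["name"]))
        score = similarity(lead["name"], best["name"])
        resolved[lead["name"]] = best["id"] if score >= threshold else None
    return resolved


# Hypothetical data: the CRM has IDs, the purchased lead list does not.
crm = [
    {"id": 1, "name": "William Bachman"},
    {"id": 2, "name": "Josh Miramant"},
]
leads = [
    {"name": "Will Bachmann"},  # shortened first name, misspelled surname
    {"name": "J. Miramant"},    # first name reduced to an initial
    {"name": "Ivan Oransky"},   # not in the CRM at all
]

matches = resolve(leads, crm)
for name, crm_id in matches.items():
    print(f"{name} -> {crm_id}")
```

A fixed threshold on one field is brittle, which is part of why a trained model that weighs several fields together is attractive for this problem.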
Will Bachman: You’ve given several examples of the work of your firm, but we haven’t actually talked about your firm. Tell us… make sure we get the name of the firm and give your website, where we can find you. Tell us a little bit more about the range of services that your firm offers.
Josh Miramant: Sure. My firm name is Blue Orange Digital. Actually [crosstalk 00:58:40]-
Will Bachman: What’s the idea behind the name?
Josh Miramant: I [crosstalk 00:58:44] have a kind of better marketing name, but I’ll give your team the real… the down-low is we had a different branding when my ex-business partner and this was my legal entity behind it. I never expected to be the brand of the company, but when we took over Blue Orange became the name. There was actually a moldy orange in my room when I had to name an LLC, not embarrassing, but true story.
The other one that you can use that I typically would give out front is in TensorFlow, which is one of the engines that we use for modeling AI development. There was a system that we built that had an orange light that would go on when, I think, it would compile and when it was off it would look blue, so we always moved from blue to successful compiling, so blue to orange. I came up with that story after we actually named the LLC, but that’s what I tell people on the face.
Blue Orange Digital is a… we focus as noted on machine learning and predictive analytics. The three core verticals that we have done work in are in E-commerce and marketing, talent analytics, and finance. Those are the three that we have done investment on, investing our effort. Our team is made up of, I was mentioning, about 32 employees right now. My CTO, Colin Van Dyke, actually joined us from Capital One. He had been working as a Director of Machine Learning Tools there, which they have about 250 engineers, incredible machine learning department. He’s seen just a wide array of machine learning tools into production.
We break our team up between the data engineer side and the data scientists. We have kind of theoretical research PhDs that are focusing on these core machine learning problems on one side of the data science house. We have analysts that work under them that help translate the complexity of the model into companies and give reporting and integration where we work on the visualization, and then we also have data engineers which pull together the unification of data, stand the warehouse up, get things up in Amazon Web Services and make it accessible for organizations.
Generally, our client profile is companies large enough to invest into this space. We typically start an exploratory engagement around 25 or 30,000 just to start getting data together, doing an audit, and getting some indicators together that are of low confidence, but that’s just an initial engagement before you’re moving on. We’re certainly of a company that can afford the investment to get in. A lot of our deals to this point have been partnerships with consultants, where they have a relationship with a firm and they see an opportunity for the work to be addressed.
We actually work alongside those consultants to make sure that we’re communicating what we’re doing well. Feedback is being fed that we’re translating the business needs and the business problems into our modeling and our work. Even delivering things around the timeframe that’s required for buy-in from executives and stakeholders.
We take a very business aware approach to this problem and try to familiarize the constraints of it. Kind of the standard practice as you think about investing… as organizations think about strategic investment and machine learning is right now there isn’t quite a need because of the early stage of machine learning for business decisions to have every team have a machine learning engineer. The sales team doesn’t need an ML engineer. The marketing team, the product selection team, the HR team, that would be an inefficient use. There would be data conflict, so a lot of how firms are building out their architecture now is hiring kind of what is a standalone machine learning team that deploys its services in different parts of the business units or different parts of the company of each business unit.
What we offer, as I view how Blue Orange fits into this puzzle, is we’re an excellent zero to one, coming from nothing to the advent of that machine learning division. We have pretty great expertise. It’s a very hard space to hire and we have an incredible team and they’re able to spin-up the systems that then once we have the data lake in place, we have models that have been validated, it’s a really good time to hire internal data scientists that then can focus on their core problems.
We’re excellent at kind of advising companies from that early stage or just knowing they need more advanced analytics like the hiring has gone up like 2,000% in the last 24 months for data scientists. It’s just incredibly hard positions to fill, but hiring in-house out of the gate, typically you’ll hire a very pedigreed PhD research scientist that knows a lot about machine learning and they end up just doing data engineering, which is a very different task.
It’s getting data in place and figuring out other problems and you’re way overpaying for the engineering problem and it’s not even their necessary expertise. Then, you got a data warehousing that’s not sufficient, and even if you have some of it pulled together, getting that in an environment from… moving it to a data lake where that can be consumed from multiple parts of the organization, we have all of those services in-house at Blue Orange to get you to a place, get your organization to a place where you have a really high-performing, scalable data architecture that you can start investing in machine learning on multiple different business units. We apply in a few different areas at that point.
Will Bachman: Fantastic, and for people who want to go learn more about it, I suppose they could just Google “Blue Orange Digital”, but give us your URL. What’s that?
Josh Miramant: It’s blueorange.digital, so [crosstalk 01:04:17]-
Will Bachman: blueorange.digital, so check it out. If someone wanted to follow up and contact you directly, what’s the best way for them to reach out?
Josh Miramant: Josh@blueorange.digital is my direct email, and there’s forms right on our website to get in touch. I’m pretty accessible, but one of the things that I spend most of my time doing is working with… I was actually introduced to you via one of the members of your network, but I love doing free kind of audits of companies and areas of application. Figuring that problem out is one of my favorite things to do is saying like, “Let’s look at where we’re at with companies. Let’s look at data. How would something like a machine learning solution fix a certain problem?”
That’s where I spend all of my time right now, so I love talking about this. I love being presented with situations and learning about systems to figure out how if… sometimes it’s not the right fit, but if there’s a solution. I’d love anyone to reach out and have those discussions. I’m fully available for that.
Will Bachman: Fantastic, so listeners, if you have a client where there is basically some kind of prediction that has to get made or some decisions and there’s a lot of data and you think it might use some data science, reach out to Josh, josh@blueorange.digital. Josh, thank you so much for joining today.
Josh Miramant: Oh, this is a joy, Will. Thanks for the time.
