In part 1 of Tim Friebel’s interview with Jeff Jonas, better known as The Wizard of Big Data, and Founder and Chief Scientist at Senzing, Jeff talked about the parallels between Context Computing and Journey Analytics, and the importance of context in any decision-making system. Here we continue the conversation on context’s place in analytics and how journeys will be essential to the AI and machine learning programs of companies looking to gain an edge.
Tim: With regard to Journey Science & analytics, we see that having data across more channels provides more context and ultimately leads to better decisions. Can you talk about some of the decisions you’ve seen made that would not have been possible without data being connected and readily available?
Jeff: When banks are dealing with money laundering alerts, they get showered with false positives. You’ll have AML analysts working alert after alert, month after month, without finding a genuinely suspicious one. The problem is, the transaction monitoring engines are not taking into account other channels of data. For example, if an analyst had seen a materially similar transaction a few months back and had already found it explainable, why escalate a similar transaction again? If you can see these additional pieces of the connected puzzle, you can eliminate many of the false positives.
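The idea above can be sketched in a few lines. This is a minimal, hypothetical illustration of suppressing alerts that materially match a transaction an analyst has already explained; the field names and the "materially similar" rule are illustrative assumptions, not an actual AML engine.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Txn:
    account: str
    counterparty: str
    amount: float

def material_key(txn, amount_bucket=1000):
    # Assumption: two transactions are "materially similar" if they share
    # account and counterparty and fall in the same amount bucket.
    return (txn.account, txn.counterparty, round(txn.amount / amount_bucket))

def triage(alerts, explained_keys):
    """Split new alerts into ones to escalate and known false positives."""
    escalate, suppress = [], []
    for txn in alerts:
        if material_key(txn) in explained_keys:
            suppress.append(txn)   # already found explainable once
        else:
            escalate.append(txn)   # genuinely new context - review it
    return escalate, suppress
```

An analyst’s past resolutions would feed `explained_keys` (e.g., `{material_key(t) for t in previously_explained}`), so each review pays off against every future similar alert.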
I have also learned to admire bad data, meaning errors and natural variability, like misspellings or transposing the month and day in a date of birth. My favorite example of this is when you search Google and it says, “did you mean this?” It’s not looking in a dictionary; it has remembered everybody’s errors. If it hadn’t remembered everybody’s errors, it wouldn’t be so smart. In fact, I’d go as far as to say, if you overclean the data, polishing every puzzle piece to perfection, then you will never find the clever bad guys.
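That "remember the errors" idea can be sketched directly: instead of a dictionary lookup, keep a log of which correction users actually settled on for each typo, and suggest the most common historical fix. The class and method names here are illustrative assumptions, not Google's implementation.

```python
from collections import Counter, defaultdict

class DidYouMean:
    def __init__(self):
        # typo -> Counter of corrections users accepted for it
        self._fixes = defaultdict(Counter)

    def record(self, typed, corrected_to):
        # Every observed error is kept, not cleaned away.
        self._fixes[typed][corrected_to] += 1

    def suggest(self, typed):
        if typed not in self._fixes:
            return None  # never seen this error before
        return self._fixes[typed].most_common(1)[0][0]

dym = DidYouMean()
dym.record("recieve", "receive")
dym.record("recieve", "receive")
dym.record("recieve", "recipe")
print(dym.suggest("recieve"))  # -> receive
```

The dirty data is the asset: throwing the typos away would throw away exactly the signal the suggester runs on.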
Tim: In your Strata+Hadoop World talk on Context Computing, you say that more data means less compute: as the puzzle becomes more complete and more context is available, things fall into place faster. It’s how humans make decisions. The more information and context we have about a situation, the more quickly we can make the best decision. Do you think it’s important that machines and processes get better at evaluating data this way, similar to the way humans think?
Jeff: I couldn’t agree more. Two examples of this:
- Driving a Car - If you remember the first time you learned to drive a car, do you remember the amount of compute it took to track everything going on around you? It would even be hard to chew gum with the massive amount of compute going on in your head. Now fast forward: you now have so much experience you can probably hold your phone up to one ear, bite your sandwich with your right hand, and drive with your knee. You’re barely paying attention because you’ve seen so much data and so many patterns play out that it doesn’t take the same amount of compute.
- Completing a Puzzle - When putting a jigsaw puzzle together at home, why is it that the last few pieces are almost as easy as the first few, despite the fact you have more data in front of you than ever before? There comes a point, when putting a puzzle together, when the pieces start to clump and new pieces (information) take less computational effort to integrate.
I’ve seen real evidence to support the notion that the more data and context available, the fewer compute cycles are required. And I think this whole concept will have radical implications in the fields of big data, machine learning, and artificial intelligence.
Tim: You’ve also said that people build algorithms around individual puzzle pieces too soon, and they’re making assumptions too quickly on incomplete data. This is similar to making decisions based on individual touch points and not the full journey. How valuable do you think it is to have a repository that has all data collected, connected, and contextualized so that people can rapidly iterate through questions?
Jeff: I see a lot of organizations making this mistake: they take a transaction and try to put algorithms on the transaction to see if it’s good news or bad, just like the puzzle piece with flames. Without additional puzzle pieces it really is just a guess. Then they wonder why they have so many false positives. But if you first take that event, or the puzzle piece, and figure out how it relates to previous observations, it’s like taking the puzzle piece to the puzzle to see how this new observation relates to the data you’ve already seen. By doing this first, before making a decision, you benefit from all the context that comes from data finding data.
All this is manifested through ClickFox when rolling up events into journeys. It’s obvious and no surprise that you can get a higher-quality understanding of what’s going on by taking the full picture, aka the journey, rather than trying to build algorithms on individual events.
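The roll-up itself is conceptually simple. Here is a minimal sketch of grouping a cross-channel event stream by customer and ordering by time, so analysis sees the full picture rather than isolated events; the field names and events are illustrative assumptions, not ClickFox’s actual schema.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical raw event stream spanning several channels.
events = [
    {"customer": "c1", "ts": 1, "channel": "web",  "event": "start_bill_pay"},
    {"customer": "c2", "ts": 2, "channel": "ivr",  "event": "check_balance"},
    {"customer": "c1", "ts": 3, "channel": "web",  "event": "error"},
    {"customer": "c1", "ts": 4, "channel": "call", "event": "pay_bill"},
]

def to_journeys(events):
    """Return {customer: time-ordered list of events} - one journey each."""
    ordered = sorted(events, key=itemgetter("customer", "ts"))
    return {cust: list(evts)
            for cust, evts in groupby(ordered, key=itemgetter("customer"))}

journeys = to_journeys(events)
print([e["channel"] for e in journeys["c1"]])  # -> ['web', 'web', 'call']
```

Scored in isolation, c1’s web `error` event looks like a failure; seen as a journey, it is one step in a bill payment that completed over the phone.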
Tim: Speaking of algorithms, contextualized journey data is becoming a really important input to our clients’ predictive models and machine learning initiatives. What is your take on how AI and machine learning programs would benefit from having a platform where all data is routinely connected into journeys and readily available?
Jeff: To me, it’s the difference between feeding machine learning a pile of puzzle pieces and feeding it complete pictures. If you integrate the events, roll them up into journeys, and then pass those into machine learning, you improve machine learning’s ability to make something of it. Having data connected, contextualized, and readily available will prove to be fundamental and essential.
When you add things like net promoter score (NPS) or outcomes that indicate the person completed the journey they started, it’s the equivalent of auto-labeling, which solves one of the biggest problems facing the machine learning community. For example, someone may go on a journey to pay their bill on the web but never complete it on that channel and instead go to another. When you combine this data across all those journeys, it’s labeling the data. That means machine learning can feed on it to make more useful predictions, e.g., the need for new pathways or fixes to existing pathways. Everyone pursuing machine learning is likely going to need to assemble their data, seeing events through journeys being one critical example.
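Auto-labeling falls out of the journey structure almost for free: the journey’s outcome becomes the training label, with no manual annotation. Below is a hedged sketch under assumed event names (`start_bill_pay`, `pay_bill`); these, and the idea of using the channel sequence as features, are illustrative choices, not a prescribed pipeline.

```python
def label_journey(journey, start_event="start_bill_pay", done_event="pay_bill"):
    """Label a journey 1 if the started task was eventually completed
    (possibly on a different channel), else 0; None if it never started."""
    started = any(e["event"] == start_event for e in journey)
    completed = any(e["event"] == done_event for e in journey)
    if not started:
        return None  # journey is about something else - no label
    return 1 if completed else 0

def build_training_set(journeys):
    """journeys: {customer: time-ordered event list}, as rolled up earlier.
    Features here are just the channel sequence; a real model would use more."""
    rows = []
    for journey in journeys.values():
        label = label_journey(journey)
        if label is not None:
            rows.append(([e["channel"] for e in journey], label))
    return rows
```

Each (feature, label) row was produced by the customer’s own behavior, which is exactly the "data labeling itself" property that makes connected journeys so attractive as model input.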
Everything has a journey, whether it’s someone paying a bill, a patient getting well, a server crashing, or even an asteroid collision. As companies look for new and better ways to improve business performance, contextualizing events to reveal journeys is going to prove a crucial step toward better and faster decision making.