Podcast

Episode 524 | Adam Braff: ChatGPT Code Interpreter

HOW TO THRIVE AS AN
INDEPENDENT PROFESSIONAL

Show Notes

In this episode, Will Bachman talks to Adam Braff, a former McKinsey partner who specializes in data analytics. Adam has been using ChatGPT to explore how the tool can be harnessed for data analysis, and he discusses the implications and potential impact of this approach.

The Quest for Analyzing Quantitative Data

The ability to analyze quantitative data using generative AI has long been a holy grail for many data scientists. While ChatGPT and other language models have proven their prowess in generating text and even creating visual content, applying them to large datasets is harder. Adam talks about how to tackle that challenge and about the solutions that are emerging.

Adam outlines four key aspects of the problem at hand. First, there is a need to upload data into ChatGPT, as the existing training data will not include the specific dataset of interest. Second, an intuitive interface is required to facilitate a conversation with the tool, allowing for iterative exploration and analysis. Third, the ability to visualize the data in various formats, such as tables and graphs, is crucial for understanding and validating the results. Lastly, incorporating up-to-date contextual information about the world around us is essential to gain insights into correlations and patterns within the data.

 

Uploading Data: Bridging the Gap

To address the challenge of uploading data into ChatGPT, several options have emerged. One approach involves integration with popular spreadsheet tools like Google Sheets and Microsoft Excel. Users can interact with the data by writing formulas and commands directly within the spreadsheet software.

Another option is to paste data directly into ChatGPT, as long as it fits within the context window. This approach allows for a quick overview of the data and initial exploration of its contents. The ability to have a conversation with ChatGPT is a significant breakthrough in data analytics. Adam highlights the emergence of third-party plugins that enable users to interact with the tool directly. These plugins, such as “Chat With Your Data” and “Chat With G Sheet,” bring us closer to the goal of conversational data analysis within the ChatGPT environment.

Additionally, separate startups have used APIs to connect with OpenAI models like GPT-3.5 and GPT-4. These startups, such as Seek.ai and DataDM, provide an alternative way to interact with the data, although they operate outside the ChatGPT window.

 

Code Interpreter: The 800-Pound Gorilla

Among the various solutions, ChatGPT code interpreter stands out as a powerful tool for data analysis. As an official OpenAI product, it offers a native and robust interface within ChatGPT. By activating code interpreter, users gain access to a chatbot-like interface where they can upload data, ask questions, and receive answers in real time.

The code interpreter translates user queries into Python code, allowing for complex data manipulations and analyses. For example, if a user wants to analyze the correlation between variables or observe trends over time, code interpreter can aggregate and analyze the data accordingly. While the current interface may require users to refer back to the original spreadsheet for column names and other details, it provides a promising solution for non-technical analysts to engage with data.

 

Unleashing the Potential: A Case Study

To illustrate the capabilities of code interpreter, Adam conducted an analysis using three datasets: daily credit card spending on fast food brands, weekly food spending in various categories, and macroeconomic data from the Federal Reserve. The goal was to explore correlations between fast food spending, overall food spending, and economic conditions.

By uploading these datasets into code interpreter, Adam engaged in a conversation with the tool, asking questions and receiving insights on trends over time. The analysis aimed to uncover potential drivers of spending on fast food brands and identify correlations with broader food spending and economic indicators. Adam explains the types of analysis the tool can perform and the formats in which it can deliver results.

 

Accessing a Python Interpreter

For those unfamiliar with Python programming, Adam provides guidance on how to access a Python interpreter. He suggests using platforms like Replit, which allow users to create a free environment for running Python code. He also mentions that AI language models like ChatGPT can generate Python code for specific tasks, making it easier for non-technical users to experiment with programming. He emphasizes the importance of hands-on experimentation and encourages individuals to explore these tools to enhance their data analysis skills.

 

Navigating the Landscape of AI Tools

Adam talks about the landscape of AI tools and their potential applications in organizations, including his own experiments with writing web scrapers. He stresses the need for a problem-solving framework and highlights the importance of breaking down complex problems into manageable steps. By understanding which parts of the problem-solving process AI tools excel at, users can apply these tools effectively. Adam also emphasizes the importance of experimenting with different modes of interaction, such as step-by-step queries or end-to-end analysis, to find the most suitable approach for each problem.

 

Implications and Future Impact

The ability to analyze data using ChatGPT and similar tools has significant implications for various industries. Adam talks about the problem of hallucination, the tool’s current limits, and how far it is from being a plug-and-play data scientist. However, he explains how non-technical analysts can engage with data in a conversational manner, gaining insights, experimenting with how they ask questions, and exploring correlations without the need for advanced technical skills. This democratization of data analysis opens up new possibilities for decision-making and problem-solving. Investors, corporate executives, and researchers can use ChatGPT to uncover hidden patterns and trends within their datasets. By understanding the correlations between different variables, they can make more informed decisions and develop strategies based on data-driven insights.

 

The Role of AI Tools in Enterprise Data Analytics

When discussing the use of AI tools at the enterprise level, Adam acknowledges the need for caution and data security. He advises against randomly uploading corporate data into AI tools and highlights the risks associated with data leakage and potential misuse. To address these concerns, he mentions solutions like Microsoft Azure’s OpenAI service, which allows organizations to run AI models locally and keep their proprietary data secure. He also mentions ChatGPT’s incognito mode and the upcoming release of an enterprise version of ChatGPT, which will probably have additional safety guarantees. He talks about what the tool is being used for today, such as crunching numbers and making predictions, in addition to coding, analytics, and generative AI.

 

Implications and Forecasting

As the conversation draws to a close, Adam talks about using the tool for forecasting and suggests that it will get better once the technology is combined with browsing. He emphasizes the importance of continuous learning and experimentation, as well as the potential for individuals to enhance their skills in domain knowledge, statistics, and technical/data knowledge. He highlights the role of AI tools as a means of human augmentation, assisting users in their data analysis tasks. He also talks about his writing and teaching work, including how generative AI is used in teaching and learning.

Looking ahead, Adam predicts that AI tools will continue to evolve and improve, becoming more user-friendly and capable of handling complex analytics tasks. He emphasizes the need for organizations to embrace these tools while ensuring data security and compliance. By leveraging AI tools effectively, organizations can unlock the full potential of their data and drive better decision-making.

 

Timestamps:

01:37 Options for uploading data into ChatGPT

08:40 The interface of ChatGPT code interpreter

12:25 The potential for non-technical analysts to use these tools

13:37 Example of using code interpreter to analyze credit card spending data

15:46 Using code interpreter

21:07 Experimenting with code interpreter and learning Python programming

23:34 Code interpreter can graph data, but limitations exist

25:16 Recommendations for using code interpreter effectively

34:33 Enterprise solutions for using code interpreter with proprietary data

35:45 Current use cases of code interpreter in companies

36:51 Using the GPT-3 tool for forecasting

 

Links:

Website: https://braff.co/genai-1

 

Adam Braff

SPEAKERS

Adam Braff, Will Bachman

 

Will Bachman  00:01

Hello, and welcome to Unleashed. I’m your host, Will Bachman, and I’m excited to be here today, back again with Adam Braff, a former McKinsey partner who focuses on data analytics. Adam, welcome to the show.

 

Adam Braff  00:14

Thanks for having me back, Will.

 

Will Bachman  00:16

Adam, I’m excited to hear about what you’ve been doing, using ChatGPT to explore how it can be used for analytics. Talk to me a bit about what you’ve been doing.

 

Adam Braff  00:28

I thought it would be good to start with the problem that we’re trying to solve, since you don’t want to just jump into ChatGPT and start messing around without having some direction. So let’s just start with the problem, which is: we all know at this point, or I guess most of your listeners know, how to use ChatGPT and Bard and Bing and other LLM chatbots and tools to do stuff with words, right? Anyone can put in a question and ask it to generate a poem or write a paragraph or fill out a framework. And we can use these tools to draw pictures. But the real holy grail that’s been out there, at least for a bunch of us, has been: how do you use these tools to analyze quantitative data? That is the interesting problem, because you can’t just paste a bunch of data into that little context window. You only get so many characters, and typically a spreadsheet that you’re going to want to understand, or a dataset, is going to be much bigger than that. So that’s the problem that we’re trying to solve: how do you analyze data using generative AI?

 

Will Bachman  01:37

Okay, fantastic. So I’m definitely curious about this. So carry on, tell me more.

 

Adam Braff  01:43

So the specific parts of what we want to do are, I want to say, three or four things. One of them is upload data. You need to have a way to get a bunch of data from one or more datasets into one of these tools, because the context that the tools are looking at is whatever their training dataset is. So for ChatGPT, there’s some giant corpus of internet stuff that’s in there, that goes up to September of 2021. But the dataset that you want to talk about isn’t in there, right? It’s extremely unlikely that the dataset you want to analyze happens to have been analyzed already by ChatGPT. So you need a way to upload data into it, either by pasting it in or by uploading a file.

The second thing is you want an interface, at least for most purposes, where you can just chat about it. You can have a conversation with this tool as if you are talking to a data scientist, and go back and forth, iterate on it, get answers, look at, you know, pictures, and so on.

The third thing is those pictures. So how do you not only get answers, but get back the answers in different formats that let you understand data, whether that’s a table of numbers, or a line graph, or, God help us, a pie graph, or bar graphs, whatever it is. You want to be able to see it, in part to validate that the answer is correct, and in part for the reason that we always want to look at data: it’s a lot easier to understand a very large dataset when it’s graphed.

If there’s a fourth thing, I would say it is to incorporate context about the world around us. Most of these tools are going to understand the world up to 2021, but you would ideally want to have up-to-date context. So if you saw an interesting pattern in the data that went through 2023, you’d want to be able to talk about it and say, hey, what was going on in the world that might correlate with this? And that’s really hard to do, because, as we’re going to see, the tools that are handling data are not the same ones that are browsing. Okay, so you want, yeah, so actually, so you got the context?

 

Will Bachman  03:53

Import data, how do you analyze the data, how do you visualize the analysis, and how do you get context. One, two, three, four.

 

Adam Braff  04:02

Exactly. Okay, so to do that, there are a few different options, and these are changing every day. The big breakthrough is something that happened last week, which we’re going to talk about at the end of this list.

The first one is, there have been for a few months now plugins that you can plug into spreadsheet tools like Google Sheets, and there will be, on the way, something official for Microsoft Excel that will let you somehow have the kind of conversation we’re talking about, but do it natively inside of the spreadsheet software that you know and love, whether that’s Excel or Sheets. So there are already these third-party plugins. You go into the little add-in window in Google Sheets, and you browse around, and you find what you like, and you authorize it, and somebody somewhere has access to all your data. And what it lets you do is put a little formula in cell B1 that says, hey, you know, equals sign, look at all of the data in column A and help me fill in the rest of the pattern in this spreadsheet based on what you see in column A. It can do things like that. It’s wonderful. But again, what you’re doing is interacting with it by writing a little formula inside of Google Sheets that is doing one of, like, eight things: look up this number, structure this number, complete this list, translate this phrase, whatever. So that’s pretty cool, and that’s been around for a few months.

The next one is a very, very rough solution, which is you can take data and just paste it right into ChatGPT, as long as it’s not bigger than the context window, right? So if you have the first few rows of a spreadsheet, and what you want to do is just understand what the words mean or whether this is the kind of data you’re looking for, you can just copy data out of the spreadsheet and paste it in. The formatting is going to look funny, but the ChatGPT or Bard or, you know, Poe window is going to understand roughly what you did there. And so it can do a little bit to tell you what it sees in the data. It’s just not enough to do a massive analysis. Good so far? Okay, good.

Now, the third one is, if you know something about ChatGPT and you’ve got ChatGPT Plus, you can use plugins that are inside of ChatGPT Plus, right? You go into your settings, and you select beta features, and you authorize it to look at plugins. And there have been third-party plugins created over the last month that are basically called, like, Chat With Your Data and Chat With G Sheet. These are a big step in the direction of what we’re talking about here, because they’re third-party plugins that are right there inside of ChatGPT. So as long as they work the way they’re supposed to, and if they’re not limited in any way, then we’re pretty close to the answer there. So that is kind of the third category of solution.

Then the next family of solutions would be completely separate startups that are using APIs to basically send information back and forth to some kind of OpenAI model like GPT-3.5 or GPT-4, right? So there are separate startups like Seek.ai, or DataDM, which is a newish tool from a week or two ago that a company called Approximate Labs has created, that let you load up some datasets and send your queries in. But it’s not happening inside of a ChatGPT window. They’re kind of sending your query off to the LLM of your choice, and then it brings the answer back into what’s happening in that window.

And then finally, we get to code interpreter. So ChatGPT code interpreter is kind of the 800-pound gorilla here, because it is an actual OpenAI product. It’s native to ChatGPT. It’s just sitting right there on the list of things you can do if you have ChatGPT Plus and if you activate it, and you can kind of expect it to be more robust and less limited than these other tools.

 

Will Bachman  08:40

So what does it look like? Does it look like kind of a spreadsheet that you can also talk to? Or like, what’s the display?

 

Adam Braff  08:50

It looks like ChatGPT. So if you’re familiar with ChatGPT, it’s got a little window where you enter your question, and then it comes back and it tells you its answer. So basically, it is a chatbot interface. And the thing you’re doing to get data into it is you’re clicking a little arrow next to the window where you enter your text, and it opens up a little browser where you can upload a file. Everything from there, once it does that, is basically you chatting with a chatbot and the chatbot chatting back. And occasionally, the chatbot is going to have a little gray window where it’s thinking, and what it’s doing inside that window, and you can expand that window, is writing Python code. That is basically the thing that is going on inside all of these tools. There isn’t some, like, you know, massive brain in there doing something really weird. It’s actually just translating the thing that you ask it to do into some kind of query that it can do in Python. So if you say, hey, here’s a dataset (we’ll talk about an example in a second), here’s a dataset and it’s daily and I want to see the trend by weeks, it’s going to use Python code to aggregate the daily data into weekly data and write a new table in its memory of weekly data. And then it’s going to analyze that new table that it created. So that is the interface: it’s basically you chatting with a thing and the thing chatting back to you.
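
To make the daily-to-weekly step concrete, here is a minimal sketch of the kind of Python that such a request might be translated into. It is not the code the tool actually wrote in the episode: the spending figures are made up, the function names are hypothetical, and it uses only the standard library, whereas a generated script would more likely rely on a data library such as pandas.

```python
from collections import defaultdict
from datetime import date, timedelta


def week_start(day: date) -> date:
    """Return the Monday of the week that contains `day`."""
    return day - timedelta(days=day.weekday())


def aggregate_weekly(daily_rows):
    """Sum (date, amount) pairs into one total per Monday-based week."""
    totals = defaultdict(float)
    for day, amount in daily_rows:
        totals[week_start(day)] += amount
    return dict(sorted(totals.items()))


# Made-up daily spending figures, for illustration only.
daily_spend = [
    (date(2023, 7, 3), 10.0),   # a Monday
    (date(2023, 7, 4), 12.5),
    (date(2023, 7, 9), 7.5),    # the Sunday of the same week
    (date(2023, 7, 10), 20.0),  # the Monday of the next week
]

weekly_spend = aggregate_weekly(daily_spend)
for monday, total in weekly_spend.items():
    print(monday.isoformat(), total)
```

The weekly table is a new object held in memory, which matches the description above: later questions are answered against that derived table rather than the raw daily rows.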

 

Will Bachman  10:22

Interesting, I guess, maybe, because I’m, I’m still so new to this, I would almost prefer if I could, you know, see a spreadsheet and then talk to it and say, Hey, could you add a column that does this? Or, Hey, could you clean up this, there’s some spaces and stuff, or hey, could you try showing me you know, analyze this, and then see it right there in the spreadsheet. But what I’m hearing is you upload a file with data, and then you like, talk to the thing, but you have to, you’re gonna have to know sort of what the column names are and stuff. So you can refer back to it, and you’re not going to see additional columns and analyses or calculations in that spreadsheet, you’re going to more see some kind of output, I guess.

 

Adam Braff  11:01

100%, right. So for this purpose, if you are using ChatGPT code interpreter, you might actually have the spreadsheet open next to the window that has ChatGPT running, just so you can look around and remember the names of things and remind yourself what to do with it. To your point, there could be a much better interface for a lot of people, which is a souped-up spreadsheet. That is the thing that Microsoft says they’re going to release, which is basically like a Copilot for Excel. They just haven’t given a public timetable for when they’re releasing it. They’re doing some experiments now with some joint ventures with corporations to test it out. But that is the thing you’re talking about. There has been already, for some time now, a very, very simple version of this inside of Google Sheets, where if you have a bunch of data that’s fairly clean, like a rectangle of, you know, rows and columns, you can hit a little star on there, and you can ask it a question, saying, you know, what is the correlation between these two variables, or what is the trend over time, and it will generate a little graph there. But it’s not the kind of robust conversation that you can have with ChatGPT. So there were little versions of the thing you’re talking about, which is a spreadsheet plus a window for, you know, typing next to it. That is definitely a thing, and it’s probably going to be a pretty important way of interacting with data a few months from now.

 

Will Bachman  12:25

Yeah, that would be really nice, because then people who, you know, can sort of understand data, but maybe they’re not the best at pivot tables or VLOOKUPs or, you know, cleaning data, can then start doing analyses. It’s almost like having an awesome business analyst sitting there doing your bidding.

 

Adam Braff  12:43

Exactly right. Now, that kind of leads us into the analysis that I did for purposes of the blog post I just put up, called “Code Interpreter Is Not a Data Scientist.” Because effectively, the thing you just said, which is: a non-technical analyst has some data, they sort of know what’s in it, they’re not wild about pivot tables, they’re not super technical in terms of even knowing what kind of graphs to do, but they just want to get into it. Either one of the interfaces we’re talking about could be the right answer for that person, right? There could be a person who says, I just want to talk in English, I don’t even want to open a spreadsheet, I just want to have a conversation. And if I forget the names of the columns, I just want to say, hey, remind me the names of what’s in this dataset, and it’ll tell you. So a non-technical person could probably hang with either one of these solutions.

 

Will Bachman  13:31

All right, so let’s get a bit more into the code interpreter. What can you do with it today?

 

Adam Braff  13:37

So the example of what I tried to do is I took three datasets and I uploaded them into the tool. The first was daily credit card spending that I got from an alternative data provider called Consumer Edge. This is information that comes from a panel of credit cards and debit cards, showing where people are spending their money, and it was on 10 fast food brands. The second dataset was from the USDA, the Department of Agriculture, and it was weekly data on food spending in about 10 categories and 30 subcategories of where Americans are spending money on food: not necessarily credit cards, not necessarily fast food, a lot of grocery stores. So, partial overlap with the first dataset, from a different source and on a different time aggregation. And then the third dataset was macroeconomic data from the Federal Reserve. That was things like unemployment and mortgage rates and inflation data, and that’s monthly data.

And if you think of the kinds of questions you could answer with these things, the most basic question, if you’re an investor in quick-serve restaurant stocks, or if you’re a corporate executive in this space, would be: how do these things correlate with each other? Can I understand the drivers of spending on different fast food brands as they relate to what people are spending their money on overall in terms of food, and in terms of economic conditions? If you’re an investor, imagine that you did this analysis and you saw that there was this really interesting lag effect, where the economy changes in a certain way, and it very predictably moved McDonald’s spending in one direction and Starbucks spending in a different direction. Then you have an insight that you could potentially do something with. I’m not saying you build a whole quant trading strategy off of it, but it’s the seed of an idea that you could go investigate further. So those were the three piles of data that I basically loaded into code interpreter to have a conversation about.
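
The lag effect described above can be checked with a lagged correlation: compare an economic indicator at month t with spending at month t plus some lag, and see which lag lines up best. The sketch below is illustrative only. Both series are invented (spending is constructed to echo the indicator two months later), and the function names are hypothetical rather than anything taken from the episode or the blog post.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def lagged_correlation(driver, outcome, lag):
    """Correlate driver[t] with outcome[t + lag] for a non-negative lag."""
    if lag == 0:
        return pearson(driver, outcome)
    return pearson(driver[:-lag], outcome[lag:])


# Invented monthly series: spending echoes the indicator two months later.
indicator = [3.0, 3.4, 3.1, 3.9, 4.2, 3.8, 4.5, 4.1]
spending = [100.0, 101.0] + [50.0 + 20.0 * x for x in indicator[:-2]]

for lag in range(4):
    print(lag, round(lagged_correlation(indicator, spending, lag), 3))
```

With real data, a strong correlation at one particular lag would be a starting point for further investigation, not proof of a causal driver.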

 

Will Bachman  15:46

So tell me then, what did you do with the interpreter, and what did you find?

 

Adam Braff  15:53

Yeah, so the thing that I did with code interpreter was I loaded in these three datasets, and then I said, first, can you clean this up, right? Can you just make sure this data is in a format where you can basically do analyses like showing the trends over time and showing me how the variables relate to each other? And some of the things that code interpreter was able to do were, like, I could definitely ask it anything. I could say, hey, tell me 10 surprising things about these three datasets, or something as mundane as, how does the date field look in each of these, do you need to do something to clean that up? So you can ask questions at any level, from data nerd stuff, to interpretation of what’s in the data, to just answer this question, to just make a graph. So code interpreter lives up to its billing in terms of you at least being able to pose the question, and the tool having a pretty good understanding of what you’re asking for and translating it into Python. So it’s good at that. And that’s different from saying it does everything perfectly, but at least you can take a whack at it.

It is good at letting you upload multiple datasets, some pretty big ones. I want to say it goes up to about 100 megabytes, and you can upload a zip file with that much stuff in it. So you can have lots of different kinds of data that go in there, and it can join the datasets together. If you think about the datasets that I was describing, the common field that they would have would be time, right? So you can join them up by days and weeks and months. And that was about it, because there was no other dimension on which I could basically join them together. But that was enough for me to do comparisons.

I would say code interpreter was good at producing answers to questions like, what’s in this dataset, and what is the general trend over time? It was able to produce graphs that showed me, you know, cross-sectionally, what are the big and small categories inside of this dataset, and correlation tables and scatter plots and all that. And then finally, because it’s working in Python, it may be limited in terms of the kinds of graphs it can show, but it’s really not limited at all in terms of just writing a program, like, you know, a Python script that you can just copy and paste into your favorite place to run Python code: Replit or, you know, GitHub or Google Colab. So code interpreter is basically good at all of those things. And the question is, what are the limits on its ability?
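
As a rough illustration of joining datasets whose only shared dimension is time, the sketch below rolls weekly figures up to calendar months and matches them to a monthly series. All numbers are made up and the helper names are hypothetical. Assigning a week to the month of its start date is also a simplifying assumption that a real analysis would want to examine.

```python
from collections import defaultdict
from datetime import date


def month_key(day: date):
    """Return the (year, month) pair used as the join key."""
    return (day.year, day.month)


def join_on_month(weekly_rows, monthly_values):
    """Roll weekly (date, amount) rows up to months, then inner-join
    them with a dict of monthly values keyed by (year, month)."""
    totals = defaultdict(float)
    for day, amount in weekly_rows:
        totals[month_key(day)] += amount
    return {
        key: (totals[key], monthly_values[key])
        for key in sorted(totals)
        if key in monthly_values
    }


# Made-up figures, for illustration only.
weekly_food_spend = [
    (date(2023, 1, 2), 40.0),
    (date(2023, 1, 9), 42.0),
    (date(2023, 2, 6), 45.0),
]
monthly_unemployment = {(2023, 1): 3.4, (2023, 2): 3.6, (2023, 3): 3.5}

joined = join_on_month(weekly_food_spend, monthly_unemployment)
for key, (spend, rate) in joined.items():
    print(key, spend, rate)
```

March appears in the monthly series but not in the weekly one, so the inner join drops it. That is the behavior to watch for when the datasets cover different date ranges.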

 

Will Bachman  18:46

Okay, so a question here from a non-technical person. I did take Computer Science 50, but that was a long time ago. Let’s say that you don’t have your own Python interpreter. Can you give us just one minute, for people like me who don’t know how to program Python but maybe know a little bit about it: how can you get a Python interpreter if you want to run some of this code?

 

Adam Braff  19:14

So if you want to do a simple little experiment, and by the way, this is something that predated code interpreter, if you just went to ChatGPT and you said, write me a program that prints the first 100 prime numbers. Okay, if you just told it to do a thing like that, which didn’t require you to load data into it, it would 100% write a little program that would do that thing. And it would be in a little window, in a funny typeface, and it would be code that you could then copy by hitting a little button that says “copy” in the top right corner. Now, where would you go paste it? That’s your question, right? So if you go to, for example, Replit, R-E-P-L-I-T, there are online platforms that just let you, for free, create an environment in which you can paste Python code and run it and share it with other people. That’s one of, like, five different solutions. I’ll go you one further, Will, and I’ll just say this. If you said to ChatGPT, hey, thanks for that code, I’m an utter noob here, I have no idea what I’m doing, what do I do with this code, how do I run it? It will give you, like, five options. It’ll say, oh, you can run it in the command line, and you can run it in Replit, and you can run it in Google Colab. And you can ask it as many dumb questions as you want, and it will very patiently explain everything to you. It will say, here’s how you open a Replit account, here are the little places on the window where you need to click, and make sure that you’ve pasted the program in the right place and hit run. And any error messages you get, you can just paste them right back into ChatGPT and say, hello, you know, what is going on here? So you do not especially need to know anything about Python other than the very bare basics, and ChatGPT will walk you through the whole process.
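
For reference, a program of the kind described here might look like the sketch below. This is one straightforward way to write it, not the output ChatGPT produced in the episode, and the function name is hypothetical. It can be pasted as-is into Replit, Google Colab, or a local Python 3 interpreter.

```python
def first_primes(count):
    """Return the first `count` prime numbers using trial division."""
    primes = []
    candidate = 2
    while len(primes) < count:
        # Only primes up to the square root of the candidate can divide it.
        if all(candidate % p != 0 for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


print(first_primes(100))
```

The last number printed should be 541, the 100th prime, which gives a quick way to check that the paste-and-run step worked.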

 

Will Bachman  21:07

Okay, amazing. I’m motivated to try this now.

Adam Braff  21:11

So it’s fun to do. And the first thing I did when I got access to it was I was writing web scrapers with ChatGPT, because I said, oh man, you know, what can I scrape here? What are the limits on my ability to scrape? It’s an interesting detour for us to talk about, because it gets us to the problem of hallucinations. If you say to ChatGPT, I want to write a program that lets me enter a food ingredient (because, as you know, most of my blog is about food and analytics) and have it come up with 10 recipes that are not ones that you make up but are scraped from a foodie recipe website, like Serious Eats or the Food Network or something, go and, you know, write me a scraping program that will do that. It’ll certainly take a crack at that. But what it will do is hallucinate some facts about the websites that I just named. So it’ll make up some URLs that sort of sound good, like serious.com slash recipe, you know, slash search bar, right? And it’ll make stuff up. So if you’re going to do something that’s interacting with real-world data like that, as opposed to writing a little math program, then you do have to roll up your sleeves and get into what the code is doing on the website. But I was able to create that program, right, to have it basically scrape the Food Network website for an arbitrary ingredient that you typed in and have it pull the results back. And it’s just sitting in my Replit folder, and anyone can run it. So basically, it is possible to get your merit badge in Python programming using ChatGPT. But, you know, it would behoove you, if you really did aspire to learn how to program, or to get back into programming, like perhaps you and I did long ago, to look through the code and understand what’s going on. Because even if it gets the right answer, you want to understand what’s happening. Sure, that’s a digression.
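
The sketch below shows only the parsing half of a scraper like the one described, and it deliberately runs against an invented HTML snippet instead of a live site. The markup, paths, and class name are all hypothetical, since the real structure of any recipe site has to be inspected by hand, which is exactly the hallucination risk raised above. It is not the program from the episode.

```python
from html.parser import HTMLParser


class RecipeLinkParser(HTMLParser):
    """Collect hrefs of <a> tags whose link text mentions the ingredient."""

    def __init__(self, ingredient):
        super().__init__()
        self.ingredient = ingredient.lower()
        self.links = []
        self._href = None  # href of the <a> tag currently open, if any
        self._text = []    # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            if self.ingredient in "".join(self._text).lower():
                self.links.append(self._href)
            self._href = None


# Hypothetical search-results markup; a real site's structure will differ.
SAMPLE_HTML = """
<ul>
  <li><a href="/recipes/garlic-bread">Easy Garlic Bread</a></li>
  <li><a href="/recipes/lemon-tart">Lemon Tart</a></li>
  <li><a href="/recipes/roast-garlic-soup">Roast Garlic Soup</a></li>
</ul>
"""

parser = RecipeLinkParser("garlic")
parser.feed(SAMPLE_HTML)
print(parser.links)
```

Fetching the page is left out on purpose: the URL pattern and any usage terms of the target site need to be checked against the real website, not taken from generated code.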

 

Will Bachman  23:10

Yeah. Okay. So Replit is a great idea. So you were saying earlier, at the very beginning, your framework of four things. The third thing was graphing or displaying the data. So it sounds like the code interpreter cannot display it directly in ChatGPT, but you could have it write a Python program that you can run in Replit that then could graph the data? Is that what I’m hearing?

 

Adam Braff  23:34

No, it’s a little better than that. So it actually can graph the data in many different ways. There are some graphs that it can do and some graphs that it can’t do. And it doesn’t always know the difference. But, you know, nine times out of 10, for the type of stuff where you give it data and you tell it to graph it in the most intuitive way, it will definitely choose a column graph, bar graph, line graph, scatterplot, waterfall, whatever. And it can do those right inside of the ChatGPT window. So the things that I would go outside of it for are if you need a tree map, or a sunburst, or some really fancy Edward Tufte-y thing, or if you want it to produce a program that lets you fiddle with the data and lets the user kind of tap on things, almost like an interactive dashboard. That’s getting into programming, as opposed to what can actually be delivered inside of the ChatGPT window. So this is all just to say, in the simple form of the problem, where the user just wants to understand what’s in this giant spreadsheet, and, you know, give me answers and show me pictures, it’ll do that inside of the ChatGPT window.

 

Will Bachman  24:42

Okay, now I want to get your view of the landscape here. So you are counseling CEOs and C-level leaders on how to set up and structure their data analytics programs and how to get the most value out of those things. What’s your sense of how this tool, or the ones that you see just on the near horizon, should be used or can be used by companies to, you know, generate more value, generate better analyses faster, save money, et cetera?

 

Adam Braff  25:16

So I would start with, what are the tools? What are they good and bad at, right? We’ve talked about what they’re good at, the flexibility and the graphing, and so on. It’s important to understand what the tools today are bad at and how things are getting better over time in order to answer the question of what a CEO should do about it. So today, we have not only the problem of hallucination, because these tools are just kind of trying to predict the next thing they should do, or predicting the next piece of code or the next line of code to write. They also don’t do it consistently every time. And they can get tangled up. And, you know, if the thing is making tables, it’ll often get stuck in this loop where it’s making the same table over and over again, and it’s apologizing and saying, I don’t know what I just did there. And it forgets the context over time. So there are all these limits on it now where it’s very far from being a plug-and-play data scientist, which is why I wrote the piece that I did and explicitly said, it’s not a data scientist. Today, it’s more of a tool for human augmentation that’s getting better every week. Okay, so let’s just keep in mind the things it’s good and bad at. My recommendations for users, and again, this is before I get to the enterprise solution question, would be, I would say, first, you have to have some kind of problem-solving framework in mind. And this, I think, Will, is the kind of stuff you probably talk about a lot on this podcast and in your resources. You need some framework for how you do problem solving: you’ve got to define the problem, generate hypotheses, propose data sources to go get, you know, clean up the data, analyze it, visualize it, and, you know, synthesize the results, right? And I think breaking out the problem that way helps. Let’s say we have the problem of, I want to understand the relationship between the macroeconomic environment and how much Americans are spending on fast food. 
If you break it out that way, you can figure out which parts of that process ChatGPT Code Interpreter is good and bad at, and what it can be used for. So as long as you pick the things it’s good at and you skip over the stuff it’s bad at, you’re in OK shape. So if you wanted to know what data sources you can go grab and feed into the maw of this machine in order to do this analysis, it will propose some. It’s going to maybe hallucinate the exact URL of where you should go on the usda.gov page, but it’s going to get you close enough. And you then finish the job yourself by going in and finding the data. It can do the data cleanup where it says, oh, this is daily data, this is weekly data, the date formats don’t even match up, I’ve got a structure that I’ve got to translate these food brands into, I’ve got to make them lowercase, or whatever. It can do that stuff, as long as you spot check it, and it will make mistakes. And it can do, like, interpretation of the results, right? So it can’t read the graphs it created. But it can read the tables that created the graphs, and it can definitely synthesize the data and answer questions like, this is counterintuitive, that, you know, Starbucks sales are inversely correlated with beverage sales. Like, that’s interesting and weird. And I’ve got to understand that better. But that should be part of the storyline, right? So if you know which parts of the problem-solving framework you should do yourself versus have the machine do, I think you’re in a good starting place. The second thing is, you should probably experiment with different modalities of how you ask these questions. So there’s an end-to-end version of the question, which is, you dump the three data files into ChatGPT. And you say one thing. You’re like, tell me if there’s a way I can predict fast food sales based on everything else in here. That could work. It’s, you know, sometimes it works. And sometimes it doesn’t. 
You can also try the modality of going step by step and saying, first, tell me what’s in the data set. And second, tell me if there’s anything missing, or if the data looks like it’s good quality. Third, start to tell me, you know, the relationships. Like, you can choose which one you want to do, and you’ll probably do both. And you’ll probably start over again many times. And you can always clean up the data yourself if you get impatient. I myself became Code Interpreter’s summer intern, and I ended up cleaning up the data myself just to basically keep the process moving. I just took 15 minutes and did that and loaded it back in, and I was able to get much farther. So this really is a lot about iteration, and then going with the flow. And thinking about what you said earlier, which is, how do we think about non-technical users here, right? Who are the people who are going to actually benefit in my organization from this tool? Because you’re going to have data scientists who are going to be much more comfortable writing SQL queries and, you know, writing Python code and basically interacting directly with the data, because it’s not that hard for them to quickly get answers popping out of this thing. But you’re gonna have this large category, I would assert, of people in every organization who are smart in their domain, and they’re smart about statistics, but they’re just not programmers. And they don’t want to, you know, necessarily get their hands super dirty with really weird kinds of data structuring and kind of joining up data from multiple sources and setting up a pipeline and stuff. So I think there are enough people like that in an organization that these kinds of tools can be helpful and are gonna get more helpful over time.
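
The cleanup Adam describes (mismatched date formats, brand names that need lowercasing, and a spot check afterward) can be sketched in a few lines of Python. This is only an illustration under assumptions: the three date formats, the row layout of (date, brand, sales), and the function names are all hypothetical, not taken from Adam's actual files or from anything Code Interpreter produced.

```python
from datetime import datetime

# Assumed date formats for illustration; a real file may need others.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y%m%d")


def normalize_date(text):
    """Return the date as ISO 'YYYY-MM-DD', or None if no assumed format matches."""
    cleaned = text.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def normalize_brand(name):
    """Lowercase a brand name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def clean_rows(rows):
    """Split (date, brand, sales) rows into cleaned rows and rows needing a manual look."""
    cleaned, rejected = [], []
    for date_text, brand, sales in rows:
        iso_date = normalize_date(date_text)
        if iso_date is None:
            # Keep unparseable rows visible so a person can spot check them.
            rejected.append((date_text, brand, sales))
        else:
            cleaned.append((iso_date, normalize_brand(brand), sales))
    return cleaned, rejected
```

Returning the rejected rows separately, rather than silently dropping them, is one way to support the spot checking Adam recommends.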

 

Will Bachman  31:07

I guess what I’m hearing or thinking about, as you say this, is that for someone who works with data, it’s good to start using these tools now, just kind of play around with them and get familiar with how they operate. And it’s very different than just sitting there and, you know, trying to use Excel, because it’s more of a coaching process, like being patient and coaching and trying different things. Almost as if you’re training a dog or, you know, teaching a young kid or something, just learning how it reacts, so that as the tools get more advanced, you’ll already have some familiarity with them. It’s like a new learning process of how to interact with a new type of entity.

 

Adam Braff  31:52

I 100% agree with that. And I think that there are opportunities, certainly with your own personal data and on your home computer, for you to do that kind of learning and experimentation. I’m a big advocate of that kind of hands-on experimentation. Even if you’re not the most technical person, I believe everyone can get slightly better on each of the three dimensions we talked about: domain knowledge, statistical knowledge, and technical and data knowledge. And you can always inch a little bit closer toward that triple-threat, you know, data scientist model, wherever you are now as a starting point. If you’re a curious person, you can experiment with these tools on your own, and you should experiment with them on your own. Now, that’s a far cry from, and I’ll go back to your earlier question of what does the CEO do at the enterprise level, right? You don’t want your employees randomly opening a ChatGPT Plus, you know, account and a window and pasting corporate data into it, right? Just to state the obvious, there are lots of problems with that. You don’t want data leaving your company. You don’t want that data to be ingested into the broader kind of OpenAI universe and being used for training up models that your competitors would then be running, that would happen to have your proprietary stuff in there, right? So there are plenty of reasons why, from an enterprise standpoint, you want to be safety first. You want to work with IT to get some kind of a sandbox, you know, created. There are solutions to the problem that we’re talking about here, the safety problem, which is, you know, Microsoft Azure has an OpenAI service where they will, you know, set this up inside of your own walls, and you can put your own proprietary data in it. And you’re running the model basically locally to your business. So you’re not leaking information out into the world. There are open-source large language models that you can run inside your own walls. 
There’s even, and this is obviously not, you know, legal advice and not a recommendation for a giant enterprise, but ChatGPT does have an incognito mode, where you’re asking it questions and you’re doing your dialogues, and it’s not saving them. So it’s not saving them, and it’s not using them in the corpus that’s training other people’s stuff. If you trust that, then that is a potential solution, maybe for your personal stuff. There will be a ChatGPT for enterprise tool that will be released, I want to say fairly soon, that will probably have some guarantees of safety around it. But yeah, so there are different ways for enterprises to tackle this problem, if they want the experimentation to continue at the enterprise level with enterprise data.

 

Will Bachman  34:33

And then in your conversations with clients, what have you seen people doing with these AI tools, whether it’s ChatGPT itself or some of these third-party tools with APIs? Like, what are people doing with them now?

 

Adam Braff  34:49

All of the above. I would say there’s a little more focus on tools like the Azure OpenAI service and collaboration with cloud providers that will let you run these models in a safe way. So different kinds of joint ventures and collaborations. You know, some of them are talking to Microsoft about getting access to the Copilot for Excel when that’s ready. Some of them are doing these open-source large language models that they’re putting on their own tools. I haven’t heard of any of them using, you know, ChatGPT incognito mode. That’s really not probably robust enough for people to try. But for all those other solutions, everyone’s in a different place on this right now. And so there’s no one answer.

 

Will Bachman  35:34

And in terms of just the use cases that you see companies beginning to adopt, how are people starting to use these tools?

 

Adam Braff  35:45

It’s early days on the analytical use cases. I would say there is more excitement around the kind of dirty work of data plumbing and setting up data pipelines and doing data quality checks for anomalies and, you know, translating code. There’s so much more that’s going on in terms of using generative AI for coding, basically, as opposed to analytics. That’s been around for some time now, with people using GitHub Copilot and using tools that help them program more efficiently. They’re very far ahead of where things are in analytics, per se, which is, you know, crunching numbers to generate answers and make predictions and recommendations. That is a tiny corner of the generative AI world that happens to be the corner that I live in. But in the grand scheme of things, it has received much less attention than the coding use cases, where, you know, there are probably many more people in an organization doing coding than doing analytics with generative AI.

 

Will Bachman  36:51

And I should mention, for listeners who are dedicated listeners of the show, they may have caught Adam’s last appearance here, where he talked about the forecasting competition that he runs. So as someone who’s very fascinated with forecasting, what are some of your predictions? Where are we, and what should we expect to be seeing a year from now?

 

Adam Braff  37:14

So I’ll first make a prediction about predictions, which is, I’ve tried using these tools to do forecasting. So the context that you’re talking about, Will, is my annual forecasting contest, where the contestants have to predict the probability of each of 25 binary events happening or not. Like, will Putin still be in office at the end of the summer, and will Novak Djokovic have sole possession of the Grand Slam all-time title by the end of the year, that kind of stuff. So when the first browser-connected tool was released in, I believe, February, which was Bing, the tool that was nicknamed Sydney, I immediately pasted in all of the prompts in the contest that had not yet been resolved. I think there were like 22 or 23 of them, where I could basically ask the question, all right, then, you’re so smart, like, tell me how probable is it that George Santos is still going to be in office on a certain day, and it gave me numbers. So I actually was able to create an entry in the contest for Bing, you know, Sydney, to judge its forecasts. I will say that is probably the dumbest possible way to use this tool to do forecasting. It is like, you know, getting into a car that has a little bit of automation in it, just taking your hands off the wheel and putting it on cruise control, then crashing into a tree. So forecasting is possible with this tool, but the more sensible way to do it is more akin to what we’ve been talking about, which is taking it step by step, figuring out where the tool has some comparative advantage. For example, if there’s a proposition in the contest about, you know, the price of Bitcoin or something, you could upload a file of Bitcoin prices, which is simple to get from the internet, right? And then ask it your question. Say, hey, how likely is a 10% rise in Bitcoin over any, you know, 30-day period? 
Like, if that’s the question in the contest, then it’ll give you an answer that’s derived from statistics, right? From what is the base rate of that thing happening? It might actually be faster, depending on how good you are at Excel and at statistics, to ask, you know, Code Interpreter that question, because it’s going to, you know, standardize the data and get the z-scores and all that good stuff. So that’s a long way of saying the tool can be used for forecasting. It’ll be a lot better when the Code Interpreter capability is joined with the browsing capability, so that it’s got access to the internet after September 2021. So that’s a meta-prediction about prediction: it’s going to get better over time. I would say, in general, you know, if you think about all the pain points that this tool has now, with the hallucinations and the crashing and choking on some of its data structuring steps and getting tangled up in its tables, all of that is going to get solved, I would say, probably, you know, within the next year. I think there’ll be a substantial amount of corporate, like, enterprise and investment analytics that’s going on with these tools in a safe way. So I put like an 80% probability on that: within a year, there’ll be a whole bunch of that going on. I know it’s not precise enough to really make, you know, a bet on a betting market about it. But I’m reasonably confident that this thing will keep marching forward, and that there won’t be some epochal, apocalyptic event where, you know, the price just skyrockets and it’s impossible to do this for free anymore, or for the $20 monthly fee of ChatGPT Plus, or the hallucinations get so bad that there’s just no way to do it, or there’s a high-profile data leak after people have been assured that their data was protected, right? I think those things are like in the 20% category. 
And the 80% is this thing is just gonna get better and better, and, and more and more useful for practical purposes.
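
As a rough illustration of the base-rate question Adam mentions (how often has the price risen at least 10% over a 30-day period?), here is a minimal Python sketch. It assumes one closing price per day in chronological order and reads "over any 30-day period" as the end-to-end change across every overlapping 30-day window; the function name and that interpretation are assumptions, not Adam's method or Code Interpreter's actual output.

```python
def base_rate_of_rise(prices, window=30, threshold=0.10):
    """Fraction of overlapping `window`-day periods whose end-to-end
    price change is at least `threshold` (0.10 means a 10% rise)."""
    if len(prices) <= window:
        raise ValueError("need more prices than the window length")
    periods = len(prices) - window
    hits = 0
    for start in range(periods):
        # Relative change from the first day of the window to the day `window` days later.
        change = prices[start + window] / prices[start] - 1.0
        if change >= threshold:
            hits += 1
    return hits / periods
```

Overlapping windows are not independent observations, so a number computed this way is a starting estimate to sanity check rather than a finished forecast.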

 

Will Bachman  41:14

I mean, if you had asked me a year ago, what’s the chance that there’ll be a tool in July 2023 that can, you know, just take a prompt and write coherent English that is, you know, better than probably the average college graduate could write, on a topic of anything, you know, explain Tolstoy, but in iambic pentameter or something, it’s just, right? I mean, I would have said the probability is low. So, I mean, thinking about what we might have a year from now with these tools, it’s incredible to see the abilities that they can unlock for people.

 

Adam Braff  41:52

Absolutely, yeah. To be perfectly honest, I would have been like three to 5% on that one. And, you know, in July, maybe higher in October, but in July of last year, I would have been very low, and I would have gotten killed on my Brier score on that one. So I’m glad that we didn’t ask that question in the contest.
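
For readers unfamiliar with the Brier score Adam refers to: it is the mean squared difference between forecast probabilities and what actually happened (1 if the event occurred, 0 if not), so lower is better. A minimal sketch follows; the function name is illustrative and this is not the contest's actual scoring code.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities (0.0 to 1.0)
    and binary outcomes (1 if the event happened, 0 if it did not)."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must be non-empty and the same length")
    squared_errors = [(p - o) ** 2 for p, o in zip(forecasts, outcomes)]
    return sum(squared_errors) / len(squared_errors)
```

A 5% forecast on an event that then happens contributes (0.05 - 1)^2 = 0.9025, close to the worst possible value of 1, which is what "getting killed" on a single question looks like.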

 

Will Bachman  42:07

Amazing. So you mentioned that you have written several blog posts about that. Could you maybe just give us 30 seconds on each of those, for people who want to dive in and read your recent posts on these topics?

 

Adam Braff  42:22

Yeah, my main line of work is in advising executives and boards and investors on how to use data and analytics, not just AI, but in general, to create value. So I’ve written primarily from that perspective, business and investment analytics. Some other things that I do: we mentioned the forecasting contest, and I also teach graduate classes on analytics at Brown and NYU. So I’ve written on the subject of how generative AI is being used in teaching in higher education and learning. And it’s sort of in the form of a parable with little animals. So that’s interesting to read. I spend a lot of time blogging on this subject. So I’ve written about how to use GPT and large language models to write effective prose and even to create good illustrations. I also have taught a class that is a capstone class on how to create a startup and start a business plan. So I’ve written on the subject of where you can use LLMs to make your business plans better, and where you can mess it up and do a lot of damage. So all of these topics, there’s actually one more that cuts across all of these. There’s a blog post on the different kinds of people in an organization. We talked earlier about the three skills that matter for analytics, which are domain knowledge, statistics, and programming skills. At every one of those binary combinations, you’ve got different kinds of people. And so I’ve written about how each of those kinds of people should be using generative AI, now and in the future. So all of that content is on my website at braff.co/genai, g-e-n-a-i, which stands for generative AI.

 

Will Bachman  44:10

And we will include that link in the show notes. Adam, thank you for this discussion. I’m gonna go check out Replit and do some Python. Thank you so much for joining today and walking us through the latest.

 

Adam Braff  44:22

My pleasure. Run, don’t walk, to your browser and try it all out. Thanks, Will. All right.
