Podcast

Episode 350 | Daniel Tunkelang: How Search Works

HOW TO THRIVE AS AN
INDEPENDENT PROFESSIONAL

Show Notes

Daniel Tunkelang is one of the top search relevance consultants in the United States. He is a data science and engineering executive who has built and led some of the strongest teams in the software industry, including at Google. Daniel was a founding employee and chief scientist of Endeca, a search pioneer that Oracle acquired for $1.1B. He was also director of data science and engineering at LinkedIn. He studied computer science and math at MIT and has a PhD in computer science from CMU. So when it comes to understanding how search engines and search queries work, Daniel knows what he’s talking about. Today, he shares insights on how search works and how to make it work for you.

You can read more about all things search on Daniel’s blogs at QueryUnderstanding.com, or, if you need expert consultation, reach out to him on LinkedIn.

Key points include:

  • 04:08: How search engines match content to search queries
  • 09:23: How big companies’ search functions differ from smaller companies’
  • 12:21: The search problems companies experience and solutions offered
  • 21:18: The surprising measures companies have used when compiling search data
  • 25:32: How Daniel approaches search data investigation
  • 32:27: Understanding search for staffing situations
  • 36:35: Behind the scenes of LinkedIn’s search function
  • 44:31: How to conduct effective search queries

Will Bachman 00:01
Hello, and welcome to Unleashed, the show that explores how to thrive as an independent professional. I’m your host, Will Bachman. And I’m here today with Daniel Tunkelang, who is one of the top search relevance consultants in the United States. Daniel, welcome to the show.

Daniel Tunkelang 00:21
Thanks Will, pleasure to be here.

Will Bachman 00:22
So Daniel, you have worked with an A list of clients, I mean, on LinkedIn, you got Apple, Etsy, a whole list of impressive companies, name brand companies that you’ve helped on search, tell us a little bit about the kind of work that you do.

Daniel Tunkelang 00:42
So, I’ve had the good fortune of working across the full stack of how search engines work. My first real job out of school was helping to build a company called Endeca, which really changed the shape of search for e-commerce companies by doing something called faceted search. When I was there, I helped build the initial engine and worked on research and development, and because it was a startup, I got to touch a little bit of everything, from sales to intellectual property to marketing, and so forth. The nice thing about that is that now, when I work as a consultant, I can go into companies and help them with everything from understanding their strategic roadmap to tactical decisions, like trying to figure out what the words in each search query mean and which AI techniques to use for doing that. So, it’s a lot of fun. The work is very specific to search, discovery, and recommendations, so it’s a bit of a niche area, but what I’ve found is that I end up having to work across the full stack within those areas.

Will Bachman 02:03
Okay, so just to be clear to folks, we’re not talking so much about, you know, showing up in Google search results, right? Give us a sense of the range of types of search that you work on. You mentioned e-commerce. So if I’m on an e-commerce site and I type in, like, shirts, or blue shirts or something, you’d help figure out what blue shirts it’s going to show me, I imagine. What are the different kinds of search bars or search tools you work on?

Daniel Tunkelang 02:33
Sure. So you’re right, this is typically site search, which is to say, the search boxes that you find on websites, as opposed to a web search engine like Google or Bing. I did work at Google in the past. But, yeah, as you can imagine, a lot of other people need search to work for them. That’s especially important for retailers, where search is typically the main avenue by which they sell things, but also for folks who, for example, are doing staffing or helping people connect with information, services, documents, and so forth. So, these are proprietary collections of documents or products, usually large numbers of them, that their users see some value in finding.

Will Bachman 03:28
Okay, so let’s walk through maybe a case example. This is something that is almost invisible, or not something that I really think about too much. But if I go to an e-commerce website and I type in a term, how is that website deciding what products to show me? Is it context specific, like it sort of knows something about me already, and based on that, it’s deciding what it’s going to show, so that different people would see different things for the same term? Just tell me a little bit about what’s going on behind the scenes.

Daniel Tunkelang 04:08
Sure. So one of the interesting things about search is that it’s maybe the only part of your interaction with the Internet, with machines, where you pretty much say what you want, right? This isn’t like going to a social media site that largely feeds you what it thinks you want to see, or what it wants you to see for some other reason. With search, you come in, you express an intent through, you know, a few words in the search box, and you get stuff back. So, a good search engine tries pretty hard to listen to what it is you’re asking for. The query itself is the main signal. Let’s say you go in and you look for blue shirts. The simplest thing you could do would be to say, “Well, let me find the products that have the word blue and the word shirts,” and then get them back in some order. Even there, there’s a lot of room to do better than that: for example, recognizing that shirts are a product type and that blue is a color. And the inventory hopefully knows, for any particular item in the catalog, what its product type is and what its color is, and is able to match those accordingly. So, that kind of structure, both in the index and in the analysis of what you typed in, to figure out not just what the words are but a little bit more about what they mean, allows the search engine to do a better job of matching relevant inventory. And by the way, part of the advantage is not just making sure that it’s blue, the color, and shirt, the product type, but realizing that maybe blue also includes turquoise, and maybe shirts, you know, also includes, if it’s women’s shirts, things like blouses, chemises, and so forth, right? So a lot of this work is just to get the relevant matching inventory. But for a search query like blue shirts, there are a ton of matches. So there’s the issue of which ones to show.
And at that point, depending on how well the search engine has done at returning only relevant results, there may still be a factor of saying, “Well, part of the way it’s going to order the results it returns to you is by prioritizing relevant ones.” My own preference is that it’s not an issue of prioritizing relevant ones but of only returning relevant ones. But that can be challenging. So there’s the issue of trying to get things to be relevant, and then, you know, since we’re talking about e-commerce here, there’s a desire to return things that you’re more likely to buy, things that will convert. And so the scoring that the search engine does is a mixture of factors that have to do with relevance and factors that have to do with the likelihood that something will convert. Now, you asked at the beginning, you know, is it going to be something for you personally? That depends. A search engine has to work even for a new user, and for new users, you have no history to go on. It might know something, right? If you’re looking for blue shirts, and right now you’re in a part of the country or world where it’s cold versus hot, that’s information that could be used to prioritize, for example, short-sleeve versus long-sleeve shirts. And there may be other factors that can be considered. If you’ve bought things before on the site, and you’re logged in, which the search engine may know because of a cookie, or because you’re literally logged in, say because you’re using an app, then it might try to show you things similar to what you’ve bought before, similar brands, and so forth. Those are factors that can influence the results; there’s a lot in play. And these things seem to matter mostly when you have very large sets of matching inventory. Then what matters even more is things like trying to show a diverse set of results, to hedge its bets and make clear there’s a variety of inventory, and showing you different ways to slice and dice.
This is actually the work that I did at that company, Endeca, which was saying, “Look, returning a ranked list of results doesn’t necessarily satisfy the user. If I look for shirts, you just don’t know enough to deal with it. Do I want T-shirts, dress shirts, men’s shirts, or women’s shirts, if you don’t know who I am?” So there’s a navigation element to this, to help me find the right parts of the inventory, much like if I were in a store, I would go to the right department. I talk about this in e-commerce, but it applies more broadly to search problems against large and varied inventory.
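
Editor’s sketch: the ideas in this answer (recognizing that “blue” is a color and “shirts” a product type, expanding each to related values, and then offering facets to narrow a large result set) can be illustrated with a few lines of Python. This is a minimal sketch and not how Endeca or any particular engine works; the catalog, field names, and expansion tables are all made up for illustration.

```python
from collections import Counter

# Hypothetical toy catalog; the field names are assumptions for illustration.
CATALOG = [
    {"id": 1, "title": "Classic oxford shirt", "product_type": "shirt",
     "color": "blue", "department": "men"},
    {"id": 2, "title": "Silk blouse", "product_type": "blouse",
     "color": "turquoise", "department": "women"},
    {"id": 3, "title": "Graphic tee", "product_type": "t-shirt",
     "color": "red", "department": "men"},
    {"id": 4, "title": "Suede shoes", "product_type": "shoes",
     "color": "blue", "department": "men"},
]

# Hand-written expansions standing in for a real synonym or taxonomy source.
SHIRT_LIKE = {"shirt", "blouse", "chemise", "t-shirt"}
COLOR_EXPANSIONS = {"blue": {"blue", "turquoise"}}
TYPE_EXPANSIONS = {"shirt": SHIRT_LIKE, "shirts": SHIRT_LIKE}


def parse_query(query):
    """Tag each query token as a color or a product type and expand it."""
    colors, types = set(), set()
    for token in query.lower().split():
        colors |= COLOR_EXPANSIONS.get(token, set())
        types |= TYPE_EXPANSIONS.get(token, set())
    return {"color": colors, "product_type": types}


def search(query, catalog=CATALOG):
    """Return items whose structured fields match every recognized attribute.

    Limitation of this sketch: tokens it does not recognize are ignored,
    so a query with no recognized tokens matches the whole catalog.
    """
    wanted = parse_query(query)
    return [
        item for item in catalog
        if all(not values or item[field] in values
               for field, values in wanted.items())
    ]


def facet_counts(results, field):
    """Count results per value of one field, to offer as a navigation facet."""
    return Counter(item[field] for item in results)


results = search("blue shirts")
print([item["title"] for item in results])
print(facet_counts(results, "department"))
```

Matching on structured fields rather than on raw words is what lets the turquoise blouse match “blue shirts” while the blue shoes do not; the facet counts are the “slice and dice” step.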

Will Bachman 08:52
So with the technology, are there certain behind-the-scenes software providers? Are companies typically building this from the bottom up? Do they grab some open source software? Or are there vendors providing it, you know, are there sort of three big vendors that provide a search tool for website search? Like, what’s powering these different searches?

Daniel Tunkelang 09:23
So there’s a mix. There are certainly some companies that build things completely from scratch. Those tend to be the really big technology companies. So companies like Google, Amazon, or Facebook build their own search. And that makes sense. They’re at a scale where even a tiny improvement that is specific to their use cases is worth the army of people they have to build it. The next level down from that tends to be people who work with an open source platform, and the dominant platform is based on Apache Lucene. There are a few companies that have really established themselves as building on top of this, the two best known being Elastic and Lucidworks, which had some of the developers. What you see there is, I mean, there are some companies that will use the platform pretty much as is, but because it’s open source, there’s a lot of ability to tinker with it. And so the larger companies, not at the scale of Google or Amazon but, you know, at the scale of some of the largest retailers, are more likely to take that approach. Then you start to see, a level down from that, folks that use managed search. They typically don’t really want the full responsibility of developing the software; they’d like something more like a service. They’re willing to sacrifice having, you know, as much control over it as if they were developing it themselves, and a big advantage there is lower operational cost. And then finally, I’d say the simplest folks are using the search that comes with the other platform they’re using. So perhaps they’re using an e-commerce platform, and search is built in, or they’re using some kind of content management system, and search is built in. There, it’s free to them. But generally, they don’t know how it works. They don’t want to know. They know they have a search box, but they don’t want to invest any more than they have to into it.
So I’d say that’s the spectrum that I’ve seen in terms of technology.

Will Bachman 12:02
I’d like to walk through a few examples of the types of projects that you do. Could you pick one, just sanitize it, and walk through an example of when you get called in, and at what stage? And then what’s your process? What are the stages of your work?

Daniel Tunkelang 12:21
Sure. I mean, as you said, I have to sanitize examples so as not to name particular names, but I can talk about the sort of problems that people bring me in for. One company recently knew that it had problems with spelling correction. They’d done a little bit of analysis to see what the extent of their problem was. But they were about to dive into, let’s just say, sort of an AI approach: build deep neural networks, collect lots of training data. And for the size of the retailer, the size of their team, and the folks they had, this really didn’t make sense. Fortunately, the head of that team had been told, “Hey, you know, talk to this consultant.” So when I talked to them, I asked them to share with me the kinds of query logs that they had. And it became clear that a large fraction of their misspellings were relatively close to popular queries. Knowing that, I’d say, “Look, here is an approach that could solve maybe half of your problem in a few weeks, as opposed to the effort you’re embarking on, which could easily take months or even more than a year to develop.” And so I worked with them to show them the opportunity, to give them essentially a very early proof of concept of what this might look like, enough for their engineering team to take it from there. So that’s a nice sort of tactical example. At the other extreme, I’ve had folks come in and say, “Look, we know our search is primitive. Help us.” That’s a little bit more open-ended, obviously. So there, what I typically do is start with what I call a search experience audit. That’s something where I look at the way every user-facing aspect of the site works: the way that they’re handling long queries and short queries, different kinds of synonyms, the way that they autocomplete in the search bar, what happens if you sort by price, for example, versus sorting by relevance. And I write up a report of how I see the gaps relative to benchmarks.
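
Editor’s sketch: the observation that many misspellings are close to popular queries suggests a very simple corrector, which the following Python illustrates. It is an assumption-laden sketch, not the proof of concept Daniel built: it uses the standard library’s `difflib.get_close_matches` as the similarity measure, and the thresholds `min_count` and `cutoff` are arbitrary illustrative values.

```python
from collections import Counter
import difflib


def build_popular_queries(logged_queries, min_count=2):
    """Count normalized queries from a log and keep the frequent ones."""
    counts = Counter(q.strip().lower() for q in logged_queries)
    return {q: c for q, c in counts.items() if c >= min_count}


def correct(query, popular, cutoff=0.8):
    """Map a query to the most frequent popular query that is close to it.

    Queries that are already popular, or that have no close popular
    neighbor, are returned unchanged (after lowercasing and trimming).
    """
    q = query.strip().lower()
    if q in popular:
        return q
    candidates = difflib.get_close_matches(q, list(popular), n=3, cutoff=cutoff)
    if not candidates:
        return q
    # Among the close candidates, prefer the one searchers use most often.
    return max(candidates, key=lambda c: popular[c])


log = ["blue shirts"] * 5 + ["red dress"] * 3 + ["blue shrits"]
popular = build_popular_queries(log)
print(correct("blue shrits", popular))
print(correct("garden hose", popular))
```

The appeal of this kind of approach is that it needs only the query log, no labeled training data, which is why it can be tried in weeks rather than months.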
And often, those start to suggest potential solutions. For example, if it’s easy to get no results back, it’s often because they need to be able to take longer queries and drop tokens, or words, from them. Or maybe they aren’t recognizing synonyms, or compound words are being broken up in various ways. So this is really a very crude opportunity analysis, just from the user perspective. And then, if they’ll share query logs with me, I’ll start analyzing those, because, amazingly, a lot of folks with search engines don’t have much of a picture of how searchers use their site. For example, on some sites, I found that they were worried about understanding natural language queries, but the overwhelming majority of their queries were only one or two words. And a surprising number of those queries had a mixture of letters and numbers in them, which turned out to be identifiers. They had a particular problem when people would use identifiers that were no longer active on their site, a problem they could easily address just by keeping a history of them and then having some way of saying, “Oh, we don’t have that item anymore, but here are similar items.” In other cases, people would type in these identifiers, but the spaces would be different, or the hyphens, or whatever it might be; again, something that they could address. So, this would be more of an analysis of behavior. And I’ll look at the technical architecture to see if there are things they’re doing that are clearly at odds with best practices. I’ve done that for a number of companies, either as the entirety of an engagement, where I give them an overview of where they stand, or as the introduction, where I say, “Okay, well, let’s do this, and then, you know, you’ll know what’s up.” And then this can be the beginning of a longer relationship, where I’m helping them fix it, or guidance for how they work with their own teams or other partners.
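
Editor’s sketch: the identifier problems described above (different spacing or hyphenation, and IDs that are no longer active) can be handled with normalization plus a history table. The following Python is a minimal illustration; the ID format, the `ACTIVE` and `RETIRED` tables, and the rule that an identifier mixes letters and digits are all assumptions, not details from any client.

```python
import re

# Hypothetical data: active product IDs, and retired IDs mapped to a
# similar item that is still sold.
ACTIVE = {"AB1234": "Cordless drill"}
RETIRED = {"AB1200": "AB1234"}


def normalize_identifier(query):
    """Strip spaces and common separators, then uppercase."""
    return re.sub(r"[\s\-_./]+", "", query).upper()


def looks_like_identifier(query):
    """Treat a query as an identifier if it mixes letters and digits."""
    norm = normalize_identifier(query)
    return bool(re.fullmatch(r"(?=.*[A-Z])(?=.*\d)[A-Z0-9]+", norm))


def lookup(query):
    """Classify a query as text, an active ID, a retired ID, or an unknown ID."""
    if not looks_like_identifier(query):
        return ("text", None)
    key = normalize_identifier(query)
    if key in ACTIVE:
        return ("active", key)
    if key in RETIRED:
        # "We don't have that item anymore, but here is a similar one."
        return ("retired", RETIRED[key])
    return ("unknown", key)


print(lookup("ab-12 34"))
print(lookup("AB 1200"))
print(lookup("blue shirts"))
```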
But really, it depends. As with all things, you can’t really start solving problems until you know what the problems are.

Will Bachman
Yeah, love it. For the search experience audit, do you have a kind of diagnostic guide or a template that you start with every time, or do you custom create it? Talk to me about that a bit.

Daniel Tunkelang
What I’d say is, when I do this with folks, I try to find a recent example of something I’ve done in their space, right? If it’s about retail, a lot of retailers have extremely similar goals and similar problems on their sites. So I’ve alluded to, for example, looking at autocomplete and long queries and, you know, query relaxation, query expansion, spelling correction; there’s a list. In fact, I’ve written a blog called Query Understanding that tries to walk through the lifecycle of what happens to search queries, which is, in my view, often the most neglected part of what people have done when they’ve developed search applications, and then a variety of things with relevance. So, I’d say the problems tend to be similar across sites. Unfortunately, it’s not often that you can just point a machine at a site and do these things, because the notion of what it is to find something relevant, the notion of how you would back off from something, or what a conversion is, those are a bit idiosyncratic for each site. I worked with one retailer where they had a lot of items that cost thousands of dollars, in some cases even over a million dollars. On a site like that, you can’t say, “Well, that’s optimized for conversion.” Nobody just does a search, finds a million-dollar piece of jewelry, and clicks “buy it now.” I have also worked with retailers that have both an in-store and an online offering, and they know that some of the things that people find online they’ll buy in the stores, and those tend to be things that are either more expensive or larger and therefore more expensive to ship.
So, what I’ve seen is that I do need to learn a little bit about each customer, just to even do a reasonable job of helping them in the space. With that said, I think a lot of the product, technical, and sort of user experience challenges, they tend to show up just in slightly different form for everyone. It’s exciting when I find that somebody has a new issue. But I think that search is one of these universal activities. And all the stories are out there, you just have to match the details to that story.

Will Bachman 20:40
Yeah. I’d love to hear what other types of findings you’ve had from looking at these search queries that have been surprising to the, you know, executives in the business, if not to you. You mentioned a few, like people searching on the product ID, which is kind of a natural thing. I’ve certainly done that lots of times. Or people getting no search results. What are some of the other things that you’ve found by digging into those search queries?

Daniel Tunkelang 21:18
A couple of fun examples. One was where someone was using frequent searches to power what they showed for search suggestions when you were typing, sort of autocomplete. And I said, “These look funny to me.” When I dug in, my first impulse was that, well, they weren’t strictly using the frequencies; they were also using some other measures. So, I threw all that away and asked, what do you get by sorting just by how frequent they were? And they still looked weird. It was one of these moments where I said, “Look, I don’t like arguing with data, but I think the data is wrong.” A little bit of investigation turned up that they were getting a lot of traffic from bots, so automated scraping traffic. They knew about this, and they were even removing that traffic from some of their counts, but not from the ones being used to power these completions. And, you know, once those were removed, the frequencies actually looked pretty reasonable. It was funny because, you know, as somebody who’s called myself a data scientist, I don’t usually like saying, “My opinion is right and the data is wrong,” but it was a case where I’m glad I stuck to my guns and found the issue in the data. Another example that was surprising to me, as well as to the client: a typical thing people do when they have search engines is that the moment you type something in, they convert it all to lowercase, regardless of whether you typed in uppercase or lowercase. And that’s used for matching, and so forth. Nobody types proper names with capitals into search engines unless it’s done automatically. But it turns out that you often do find capital letters in the search logs. And the reason you find them is that people copy and paste from one site to search in another if, say, they’re comparison shopping. There may be other reasons, like searching for a name that they get from elsewhere.
And those searches are often quite different from the searches they would have typed otherwise. In the case of people who are shopping, they often have very specific intent. You need to handle those well, and they’re often very long queries that may have punctuation or all sorts of things that mess up your search engine because you didn’t plan for them. So, those are the kinds of discoveries, like finding that a search query is a mixture of capitals, lowercase, punctuation, and so forth. Once you find them, you look for them everywhere. But what I’ve found is that, by virtue of my working with so many different companies and their various search logs, I’ve picked up all of these cases and sort of thrown them in my bag of tricks. Folks who have been in one place for a long time have discovered some of them, but not all of them. And so I find part of my job is effectively socializing that list of problems people run into.
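
Editor’s sketch: the autocomplete story above comes down to counting query frequencies only after bot traffic has been removed. The Python below illustrates that, under stated assumptions: the log rows are dictionaries with hypothetical `query` and `user_agent` fields, and the `is_bot` rule is a deliberately naive stand-in for whatever bot detection a real site uses.

```python
from collections import Counter


def suggestion_counts(log_rows, is_bot):
    """Count normalized queries, skipping rows flagged as bot traffic."""
    counts = Counter()
    for row in log_rows:
        if is_bot(row):
            continue
        counts[row["query"].strip().lower()] += 1
    return counts


def autocomplete(prefix, counts, limit=5):
    """Return the most frequent queries starting with the typed prefix."""
    prefix = prefix.strip().lower()
    matches = [(q, c) for q, c in counts.items() if q.startswith(prefix)]
    # Most frequent first; alphabetical order breaks ties deterministically.
    matches.sort(key=lambda qc: (-qc[1], qc[0]))
    return [q for q, _ in matches[:limit]]


def naive_is_bot(row):
    """Illustrative rule only: flag user agents that contain 'bot'."""
    return "bot" in row["user_agent"].lower()


rows = (
    [{"query": "blue shirts", "user_agent": "Mozilla/5.0"}] * 2
    + [{"query": "Blue Shoes", "user_agent": "Mozilla/5.0"}]
    + [{"query": "blue widget sku 999", "user_agent": "ScraperBot/1.0"}] * 50
)

print(autocomplete("bl", suggestion_counts(rows, naive_is_bot)))
print(autocomplete("bl", suggestion_counts(rows, lambda row: False)))
```

The second call, which filters nothing, shows how scraped queries can crowd out what human searchers actually type.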

Will Bachman 24:58
Let’s say a client calls you up and says, “Hey, we don’t know exactly what the problem is, but we just think that our search could be improved.” Like in e-commerce, right? “If we could improve our search, that would, you know, increase revenue.” I suppose you’d do the audit? And then would you start just experimenting and trying out different things? Or, you know, how would you go about actually finding out what is going to have the highest impact?

Daniel Tunkelang 25:32
Well, part of this is sort of interrogating them as to what they understand of their problems. For example, I ask, “Why do you think your search could be better?” Are people complaining? Do people inside the company find problems? Is the problem that people do searches and, you know, get no results, or that there are good results but they’re not ranked very well? I try to tease out why it is that they think there’s room for improvement. And to be clear, almost everybody has some room for improvement. But a little bit more specificity helps, and usually people have some intuitions around that. Then, you know, I play with the site myself, both to get my own intuitions and to validate theirs. At that point, it’s still all anecdotal, although it doesn’t take long to figure out, for example, that a site doesn’t do spelling correction, or is highly sensitive to whether you use singular or plural words, or things like that. I mean, sometimes you can find out from one or two examples; other things don’t show themselves so easily. But what I would then do, once I’m actually engaging with a customer, is to say, “Look, if you have problems you’ve already committed to solving, I can help you tactically with coming up with the best solutions for them. If you’re trying to figure out how to prioritize what problems to solve, I can help you frame them in terms of things like improving relevance; improving recall, that is, making sure you actually show the stuff that you have; improving the efficiency of the searcher’s journey through your site; whatever it might be. Then quantify those things, so that you have a way of measuring them, and look to see where you stand on them with your current search.” Part of this is also finding out what it is that they are tracking. I’ve seen, for example, people with search engines that only track when users click on something.
And so you learn nothing about searches where users don’t click, which can be a bit challenging. The first thing there is to say, “Look, you need to do a better job of instrumenting your site, so that we can see what’s going on.” Fortunately, off-the-shelf tools tend to do a fair amount of this automatically. So, I learn a lot from their logs. Even just seeing different breakdowns of the kinds of queries that are frequent often makes it easier to see what’s going on. But what I basically say is, “Look, give me a month of, you know, what your users are searching for, whether or not they clicked on those searches, whether or not they bought something, what positions,” the stuff that tends to be fairly raw in their logs. Then, by looking at different cuts of that, I can get a pretty good idea of what the problems are, especially if I couple that with playing around on the site. I can go back and forth between gaining intuition from anecdotal examples and then validating that intuition through a more rigorous analysis of their data.
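
Editor’s sketch: the raw log fields listed above (the query, whether there was a click, the clicked position, whether there was a purchase) are enough to compute a handful of baseline metrics. The Python below is one possible way to quantify them; the `SearchEvent` record and the choice of metrics are assumptions for illustration, not Daniel’s actual audit template.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SearchEvent:
    """One logged search. Field names are hypothetical."""
    query: str
    num_results: int
    clicked_position: Optional[int]  # 1-based rank of the click, None if no click
    purchased: bool


def summarize(events):
    """Compute simple rates over all searches, including those with no click."""
    total = len(events)
    if total == 0:
        return {}
    clicked = [e for e in events if e.clicked_position is not None]
    return {
        "no_result_rate": sum(e.num_results == 0 for e in events) / total,
        "click_through_rate": len(clicked) / total,
        "conversion_rate": sum(e.purchased for e in events) / total,
        # Searches without a click contribute 0 to the reciprocal-rank average.
        "mean_reciprocal_rank": sum(1 / e.clicked_position for e in clicked) / total,
    }


events = [
    SearchEvent("blue shirts", 10, 1, True),
    SearchEvent("blue shirts", 10, 2, False),
    SearchEvent("blu shrits", 0, None, False),
    SearchEvent("shirts", 5, None, False),
]
print(summarize(events))
```

Note that the no-result rate and the click-through rate can only be computed if searches without clicks are logged at all, which is the instrumentation gap described above.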

Will Bachman 29:10
So you’ve talked some about the maybe more visible or legible parts of a website that someone who is not necessarily technical could be, you know, looking at as well. Are you also looking at the more technical infrastructure side, getting into the back end, the database piece of that?

Daniel Tunkelang 29:37
So the nice thing about search is that it’s such a human thing. In terms of the product problems, those generally do surface to people, and if they don’t, you don’t really have to worry about them. Where the technology comes in is when you’re trying to solve them. So for example, let’s say I notice that a lot of the results coming back are in the wrong category. Then I start to ask, “Well, what kind of structured data is associated with these elements? Do you have a way of mapping or classifying the search queries to those categories?”, which is often a classifier you’d build using machine learning. If you are building such a thing, what does the pipeline look like for doing that? So, I mean, the solutions to the problems, or the causes of the problems, certainly live within that back end, and they have to do with how it’s architected, what data is being collected, what kind of analytical or machine learning pipeline that data is being thrown into, and so forth. So yeah, I try to catch the head of the problem where it’s user visible and chase it to the back. But, you know, one of the things that I’ve seen with search architectures is that they’re very often very deep, right? Content comes in, it gets joined to a variety of other sources; the way the sausage is made can be pretty complicated. So one of the things I’ve learned is that, if I want to have impact for someone quickly, I try to either catch content immediately on the way in, when it’s, you know, in its least processed form, or do things like, say, catch queries on their way in and work with those, because in the middle, it tends to be the hardest to make changes. So, I mean, the short answer is, yes, I look at the various pieces of the stack. But I try to stick to the ends: the user problems or content problems. And as I said, if those aren’t visible, they probably don’t matter.
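
Editor’s sketch: mapping a search query to a category, as mentioned above, is typically done with a trained machine learning classifier. The Python below swaps in something much simpler, a count-based vote learned from hypothetical (query, clicked category) pairs, just to show the shape of the idea. It is a stand-in, not a recommendation for production use.

```python
from collections import Counter, defaultdict


def train(click_log):
    """Count, for each query token, how often each category was clicked."""
    token_category_counts = defaultdict(Counter)
    for query, category in click_log:
        for token in query.lower().split():
            token_category_counts[token][category] += 1
    return token_category_counts


def predict_category(query, model):
    """Sum the per-token category counts and return the top category.

    Returns None when no token in the query has been seen before.
    """
    votes = Counter()
    for token in query.lower().split():
        votes.update(model.get(token, {}))
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Hypothetical training pairs: (query, category of the clicked result).
click_log = [
    ("blue shirts", "apparel"),
    ("dress shirts", "apparel"),
    ("shirts", "apparel"),
    ("running shoes", "footwear"),
    ("blue shoes", "footwear"),
]
model = train(click_log)
print(predict_category("white shirts", model))
print(predict_category("trail running shoes", model))
```

A predicted category can then be used to filter or boost results, which addresses the “results in the wrong category” symptom at the query end of the pipeline.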

Will Bachman 32:12
Let’s talk about a project that you’ve done, you mentioned that you’ve also worked on search for staffing type situations. Could you give us an example of how search is used there and the type of work that you’ve done with that sort of problem?

Daniel Tunkelang 32:27
Sure. I mean, I can actually talk about even before I was doing consulting. I spent about four and a half years at LinkedIn, where I ran search, and there, you know, you see a variety of things. You have maybe the most conventional use, people typing in someone’s name, which you’d think would be really easy, but bear in mind, they might type in just the first name, you might not know they’re typing in a name, and so forth. So, that was kind of a fun problem. But also, we wanted to make sure that, you know, when people look for a software engineer, they still find people who call themselves software developers. Or that if I look for an architect, I probably mean a software architect, but my wife, who works with folks in real estate, actually means real architects. And so, in a situation like that, you have a mixture of trying to tease out the various parts of the search query, and then also matching those to the right elements in this kind of semi-structured data, expanding the query, and so forth. And ultimately, in the case of LinkedIn, not that different from the way a staffing application would work, you’re measuring your success not just by whether people click on the results but, ultimately, by whether they engage with the people or jobs that they find. I’ve worked with other folks in the space, say, companies more in freelance marketplaces like Upwork and Fiverr. There, you get lots of additional factors coming into play. Sometimes there’s a bit of an imbalance between the supply and the demand. You want to make sure that new entrants to the marketplace get enough exposure, but you tend to have a lot more signal about the older entrants. Also, because it’s a two-sided market, you don’t want a handful of players in that market, you know, getting more attention than they know what to do with while everybody else is being starved. This is kind of like dating sites.
You don’t want the same few people getting all the attention, because that actually doesn’t optimize for anyone. But ultimately, if you’re trying to hire someone, or you’re trying to find jobs, you kind of come in, you describe what you’re looking for, you probably describe it in only a couple of words, which isn’t enough to do this, and then you have a bunch of results that you want to slice and dice through. It’s not that different than shopping.

Will Bachman 35:32
I’m a big user of LinkedIn, and I have a question about LinkedIn search. LinkedIn search could offer much more functionality than it does. Even if you’re a premium user, even if you have Sales Navigator, there are a fairly limited number of dimensions you can search on, and there’s a lot that you cannot. You can’t say, oh, I only want to look at people with more than 500 connections, or I only want to look at people who have added more than 100 connections in the last month or two, or who have updated their profile recently, because maybe they’re looking for a job, or I only want to look for people with a PhD in chemical engineering. So, you know, to the degree you can share what’s not under NDA, what’s the thinking behind limiting that capability and only exposing a subset of fields?

Daniel Tunkelang 36:35
So a few things, you know, I left LinkedIn in 2015. So, my knowledge is probably a little bit stale, but I stayed very friendly with folks there. First, the most expensive product, which you can get from LinkedIn for hiring is the recruiter seats. And those two have more power. I don’t remember which of the things you listed that they do, or don’t support. But I do know that the recruiter, the enterprise recruiter users have a more expressive search functionality than the more consumer, or even premium users. So, if you have the budget for that, that’s certainly something you can always explore. But beyond that, you know, the funny thing is, people, power users particularly, right, will say, “Oh, I want all these knobs,” but then the overwhelming majority of people don’t use those knobs or don’t, even if they do they don’t necessarily use them well. I’ll give you an example. At one point, somebody said, you know, only, I forget what it was, but suddenly, some small percentage of LinkedIn users have college degrees. I heard this and I said, “That’s ridiculous.” I mean, we knew at the time that LinkedIn skewed very heavily towards a more educated users. I don’t know the exact numbers were, but certainly, the overwhelming majority went to college. What happened was that in order to, first off, not everybody fills out their profiles, because, they may be lazy about it. In the past, you pretty much had a giant form to fill out. But then, as part of supporting growth, a variety of folks made that experience easier. So it was easier to create a LinkedIn profile with less information. Then, when you’re asked what college you went to, you were asked, what year did you graduate? Not everybody was so thrilled about putting their graduation here on their profile. And at the time, you couldn’t put where you went to college. So you just didn’t put it. 
Now someone looking at that data, and concluding that most LinkedIn users didn’t go to college, is just not understanding how the data was created. But if you have a field you expose that says went to college or not, it doesn’t necessarily have an asterisk saying, by the way, a lot of our users don’t bother to put this information on there. And so this is a big problem with sophisticated interfaces: they will run into problems with missing data and dirty data. Or, you know, there were people, for example, who were moderating groups on Facebook, and they said “Facebook moderator,” and the next thing you knew they showed up as Facebook employees, job position: moderator. Not really what you’re looking for. So, I think all of those kinds of issues, not just in LinkedIn, but in general, have pushed search towards people saying, “Look, I should just be able to put a few words in the search box like Google and find what I’m looking for.” It turns out that, as you’re reacting, thinking, “Well, it’s not quite enough,” there’s a tension between exposing all sorts of fine-tuning knobs, the ability to do full Boolean expressions or even joins. I mean, there are people who pretty much want to have the same access to something like LinkedIn as they would to a relational database. And that minority of users, they’re loud. And then at the other extreme, you have the majority of users who type in “sales,” and they expect to find the salespeople or sales jobs they want. So it’s tough navigating that when you’re making product prioritization choices. And ideally, you find ways to hedge, where you are making it easy for the people who just want to type in a little, and being smart about what you can infer, but offering just enough flexibility so that you could say, “Oh, you know, maybe you should specify the industry or years of experience,” or whatever it might be.
But then there’s also just the business aspect that, particularly for a company like LinkedIn, the basic product is free and there are a variety of upsells. But ultimately, the people paying the most money are also getting the most functionality for hiring, marketing and selling to people. And so that’s its own set of decisions about what to make available at what price.
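
To make the missing-data point above concrete, here is a minimal sketch in Python. The records and the `went_to_college` field are hypothetical and are not LinkedIn’s actual schema. The sketch only illustrates how counting a blank field as “no” can understate a rate compared with looking at the profiles that actually answered.

```python
from typing import Dict, List, Optional

# Hypothetical records: True/False is an explicit answer, None means the
# field was left blank (which is NOT the same as "did not go to college").
Profile = Dict[str, Optional[bool]]


def degree_rates(profiles: List[Profile]) -> Dict[str, float]:
    """Compare a naive rate (blank counted as 'no') with the rate among
    profiles that filled in the field, and report how much is missing."""
    total = len(profiles)
    if total == 0:
        raise ValueError("need at least one profile")
    answers = [p.get("went_to_college") for p in profiles]
    known = [a for a in answers if a is not None]
    yes = sum(1 for a in known if a)
    return {
        # Treats every blank as "no college": the misleading number.
        "naive": yes / total,
        # Rate among people who actually filled in the field.
        "among_known": yes / len(known) if known else float("nan"),
        # Share of profiles where the field is blank.
        "missing": (total - len(known)) / total,
    }


sample = (
    [{"went_to_college": True}] * 3
    + [{"went_to_college": False}] * 1
    + [{"went_to_college": None}] * 6
)
rates = degree_rates(sample)
```

In this made-up sample the naive number says 30% went to college, while 75% of those who answered did, and 60% of profiles simply left the field blank. Reporting the missing share next to the rate is one way to add the “asterisk” mentioned above.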

Will Bachman 41:20
I suppose those same principles you just talked about, really apply to many other types of search products, where some users internally may want more functionality, but then you want to keep it simple enough for the majority of users. Talk to me about how you’ve handled that in other situations.

Daniel Tunkelang 41:42
So, I know that I’ve had people complain to me that we should prioritize some search feature they care about, like, support advanced search of some sort, support Boolean expressions, support natural language understanding. And the first thing I do is to say, “I looked at the logs. And I did find that 0.2% of queries meet the use case you’re describing. So are those really, really important people? Or, you know, are you simply convinced that if we build it, they will come?” Now, how aggressively I tell people that will vary, but, you know, both when I was an employee, and as a consultant, it’s my job to speak truth to power in these things and to say, “Look, you want to meet a real user need. And if there’s no evidence for that need in your own traffic, or there’s minimal evidence for that need, then this is a really big gamble.” Because changing user behavior is hard. Especially when that user behavior doesn’t just come from people’s familiarity with your site; it comes from the way they’ve been trained to do search on other sites. So, you know, the baseline is people are going to search your site the way they would search Google, the way they would search Amazon, and so forth, because that’s most of the time they spend in search boxes. That’s not to say that your data or the use cases you’re supporting are the same ones as what Google and Amazon do; there can be a bit of a mismatch there. But your baseline should be that that is how your users will behave. So when product managers or engineering leaders get excited about very exotic ways of doing search, I try to push back with a certain skepticism: they would have to market this to their users in order to get them to engage in it. And part of what this often exposes is a lack of familiarity with their own user behavior, which is something that people need to spend more time getting to know, especially in search.
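
The log check described above can be sketched roughly as follows. This is an illustrative Python heuristic, not the method used on any particular project: the regular expression, the function name, and the sample queries are all assumptions, and a real analysis would depend on the syntax the search box actually supports.

```python
import re
from typing import Iterable

# Heuristic (an assumption for illustration): a query counts as "advanced"
# if it has a quoted phrase, an uppercase AND/OR/NOT, or a leading minus
# on a term. Lowercase "and"/"or"/"not" are treated as ordinary words.
ADVANCED_SYNTAX = re.compile(r'"[^"]+"|\b(?:AND|OR|NOT)\b|(?:^|\s)-\w')


def advanced_query_share(queries: Iterable[str]) -> float:
    """Return the fraction of logged queries that use advanced syntax."""
    queries = list(queries)
    if not queries:
        return 0.0
    hits = sum(1 for q in queries if ADVANCED_SYNTAX.search(q))
    return hits / len(queries)


# Hypothetical log sample.
log_sample = [
    "sales manager",
    '"data scientist" NOT intern',
    "nurse",
    "python -snake",
    "or else",
]
share = advanced_query_share(log_sample)
```

On this tiny made-up sample, two of the five queries match. On real traffic the interesting question is whether the share is closer to that or to the 0.2% mentioned above.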

Will Bachman 44:10
When you yourself are searching on a website as a user of the website, what are some of the things that you do that the rest of us might not know to do? In other words, what are some tips and tricks that you have for all of us on how to search effectively?

Daniel Tunkelang 44:31
I do think that a little bit of advanced search can go a long way. A lot of sites will let you, for example, quote phrases or use minus to negate things, and I would say that it’s often worth doing that to get a sense of what’s available. I mean, part of the problem is the sort of, you know, unknown unknowns problem: anything that’s below the first page, you don’t really have a sense of what’s there. And I think that these kinds of Boolean operators, AND, OR, NOT, are useful for exploring. I’d say, in general, as much as the people who are building the search engine are trying really hard to make the ranking work for you, they’re not in your head. So the more you can express in the query, the more you can control what you’re going to get back. And I would say also, learn what happens with the way that any kind of sorting is done. I’ve found on many sites that the moment you change the sort, they throw any notion of relevance out the window. And in those cases, you may be better off, for example, filtering and using the default sort that they have, rather than using a sort. Sorting by price on an ecommerce site is notorious for this. On many sites, you get, you know, complete garbage when you sort by price, because you suddenly discover all the irrelevant results that have been included. And you’d have been better off kind of doing a binary search, if you will, with different filter values and saying, “I don’t really want the cheapest one. But I need to find a reasonable range below which I’m still seeing relevant results.” Good sites do this for you. But that doesn’t mean you can’t do it yourself. The other thing is that sometimes you find a few good products, or it might not be products, you might be looking for people, so it might be a few good people, and so forth. It’s worth looking at the words on those, and essentially using those to inspire better queries.
This is actually something known from Marcia Bates; it’s called pearl growing, part of information foraging theory. Again, something that the best sites will do some of for you. They’ll show you products like this one, or even suggest searches based on a particular page. But it’s something you can often do yourself if they don’t.
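
The “binary search with different filter values” idea mentioned above can be sketched as follows. This is a minimal Python illustration under stated assumptions: `looks_relevant` is a hypothetical stand-in for a person eyeballing the results under a given maximum-price filter, and the sketch assumes that if results look relevant at one price cap, they still do at any higher cap.

```python
from typing import Callable, Optional


def lowest_relevant_cap(
    lo: int, hi: int, looks_relevant: Callable[[int], bool]
) -> Optional[int]:
    """Find the smallest max-price filter in [lo, hi] at which the results
    still look relevant, or None if even the highest cap shows nothing
    relevant. Assumes relevance is monotonic in the cap."""
    if not looks_relevant(hi):
        return None
    while lo < hi:
        mid = (lo + hi) // 2
        if looks_relevant(mid):
            hi = mid      # still relevant: try a tighter cap
        else:
            lo = mid + 1  # only junk at this cap: loosen it
    return lo


# Hypothetical catalog of (price, is_relevant) pairs standing in for a site.
catalog = [(3, False), (5, False), (8, False), (42, True), (60, True)]


def looks_relevant(cap: int) -> bool:
    # Stand-in for a human judgment: does any relevant item fit under the cap?
    return any(relevant and price <= cap for price, relevant in catalog)


cap = lowest_relevant_cap(0, 100, looks_relevant)
```

With this made-up catalog the search settles on a cap of 42 after a handful of checks, without ever sorting by price and wading through the cheap, irrelevant items.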

Will Bachman 47:17
Talk to me about the community of experts like yourself in search, and how you get projects. I imagine that there’s not a ton of, you know, experts in search out there. So, do you kind of know the other players? Do you know most of the folks at companies that are responsible for search? I’m curious about that, just that kind of social networking space of search experts.

Daniel Tunkelang 47:49
It’s a funny space. I think that what happened in search is that there was a time when search was just very hot as a problem. And then it became replaced by recommendations, data science, AI. And me, I still like search. So I doubled down on it, using machine learning and AI, to be sure. But what happened is that in the academic community, there are a handful of conferences focused on information retrieval and search. A lot of the folks in there are professors, then you get folks who are researchers at places like Microsoft and Google. It’s a nice community. I love these people. And I hope we can have conferences again one of these days. But they’re often a little bit detached from the pragmatic problems that most folks are working on in search. There are a handful of industry conferences. But part of the problem there is that they’re often organized around vendors. And there really isn’t a great vendor-neutral gathering. So then you have the online world. And back in ’98, sorry, back in 2008, when I realized there was a bit of a void there, I started blogging about search problems. And I found that people came to me even though I was kind of a nobody at that point in the space. But a lot of what I was doing was just taking everything I was learning, the best practices, and putting it in a place that not only would share it with people, but would encourage people to come into comment threads. And I was shocked that some bold-faced names, who had worked on search at places like Google and Microsoft, were showing up on my blog and putting in comments. Of course, I sort of took advantage of that quickly and got to know them. And what I found is that if you go to these conferences, you put yourself out there, you write, you engage with people, and at some point, I actually organized a symposium on human-computer information retrieval. I mean, there’s definitely work to get to know folks, but it’s not that big a community.
And I’ve been lucky to meet a lot of the folks working on it. And when I discover people who are doing interesting work, maybe from a conference presentation and so forth, I just cold call them, like, “Okay, how are you? How can we run into each other?” You know, I’m sure there are people who received those kinds of notes from me and have been a little bit thrown by them. But I’ve found that it’s a great way to get to know the other folks in the space.

Will Bachman 50:41
And before we wrap up, please, where can we find your blog, and what’s the best place for people to find you online.

Daniel Tunkelang 50:50
So the nice thing about having a name like Tunkelang is that there’s no place to hide. If you look me up on Google, you’ll get a little knowledge panel, and that will link to a bunch of my stuff. But I post most of my writings now on Medium, so my Medium name’s DTunkelang. And I have a series of posts called Query Understanding, conveniently, at QueryUnderstanding.com, which really walks through the life cycle there. I have content that’s posted on LinkedIn, and even old posts at a blog called The Noisy Channel. Even though I haven’t posted anything new there in several years, I keep the content up because it’s a good place for it to live. I also post on Quora. Frankly, I would say Medium is probably the best place to look for things. But, as I said, googling my name, you’ll find a bunch of this stuff. And, you know, I can’t miss an opportunity for self-promotion. But if you really have search problems, feel free to just reach out to me directly through LinkedIn. You can just send me messages. I have what’s called OpenLink.

Will Bachman 52:06
And as Daniel said, he shows up in search results. Daniel, we’ll include those links in the show notes. It was really great speaking with you. Thanks so much for coming on the show.

Daniel Tunkelang 52:16
My pleasure, Will. Thanks for having me.
