The Guide to Semantic Search for Sourcing and Recruiting

If you have nearly any tenure in HR, sourcing or recruiting, you’ve probably heard something about “semantic search” and perhaps you would like to learn more.

Well – you’ve found the right article.

As a follow-up to my recent Slideshare on AI sourcing and matching, I am going to provide an overview of semantic search, review the claims that semantic search vendors often make, explain how semantic search applications actually work, and expose some practical limitations of semantic search recruiting solutions.

Additionally, I will classify the 5 basic levels of semantic search and give you examples of how you can conduct Level 3 Semantic Search (Grammatical/Natural) with Monster, Bing, and any search engine that allows for fixed or configurable proximity.

But first – let’s define “semantic search.”

What is Semantic Search?

Semantics is the study of meaning, inherent at the levels of words, phrases, and sentences.

Semantic search is most often used to describe searching beyond the literal lexical (exact word for word) match and into the meaning of words and phrases at the conceptual and contextual level, and sentences at the grammatical level.

When sourcing candidates, semantic search can be achieved at the conceptual level when a search for a specific term (e.g., Java) also yields matches on related terms (e.g., J2EE, EJB, servlets, etc.) – words that are related conceptually.

As another example, in the healthcare space, a semantic search for “cancer” could also produce positive hits on terms such as oncology, lymphoma, tumor, etc.

Words and phrases by themselves can be somewhat ambiguous, but are less so when taken in context using surrounding words or passages that can shed light on the intended meaning.

For example, “Java” is a software programming language, but it is also used to refer to coffee, and it is also an Indonesian island. A quick Twitter search for “Java” will typically net you a mix of references to Java. By reading each tweet and the text surrounding “Java,” we can easily disambiguate the reference to “Java” and divine the intended meaning.
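To make the idea of contextual disambiguation concrete, here is a minimal sketch in Python. It assumes a tiny hand-written list of cue words for each sense of "Java"; the cue lists, the function name `guess_sense`, and the example sentences are all hypothetical, and a real semantic search product would rely on far richer signals.

```python
import re

# Hypothetical cue words for each sense of "Java". A real system would learn
# these from large amounts of text rather than from a hand-written list.
SENSE_CUES = {
    "programming language": {"code", "developer", "jvm", "programming", "software"},
    "coffee": {"cup", "morning", "brew", "caffeine", "coffee"},
    "island": {"indonesia", "jakarta", "island", "volcano", "travel"},
}

def guess_sense(text):
    """Return the sense whose cue words overlap most with the text, or None."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {sense: len(words & cues) for sense, cues in SENSE_CUES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

# Made-up example sentences, not real tweets.
for text in [
    "Need a cup of Java before my morning meeting",
    "Hiring a Java developer who loves clean code",
    "Flying to Java, Indonesia next week",
]:
    print(guess_sense(text), "<-", text)
```

The surrounding words do all the work here: the word "Java" itself never changes, only its neighbors do.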

Below you can see Java referenced on Twitter in 3 very different ways in 3 successive tweets, and the context tells you how to interpret the meaning of “Java” in each one:

Why Should HR/Recruiting Professionals Care about Semantic Search?

There is more information available about more people today than ever, the volume is only going to increase, and the rate at which it accumulates is accelerating.

Sifting through an ever-larger amount of human capital data in the form of resumes, social media profiles (LinkedIn, Twitter, Facebook, etc.), blogs and other sources is a significant challenge.

The promise and potential of semantic search is that it can help you more quickly and easily cut through massive volumes of potential candidate information to help you find more of the right people faster than standard methods.

Choose Your Own Adventure!

Now that you understand semantics and the basic concepts of semantic search, you have a choice:

  1. If you don’t particularly care to get into the details of how solutions that claim to use semantic search actually work and achieve their claims, you can skip all the way to the end for a presentation on the 5 Levels of semantic search. In that presentation you can find a couple of examples of how to achieve Level 3 semantic search with Monster or any search engine that offers proximity search, which allows you to control how close your search terms are to each other.
  2. If you currently use a matching application that claims to leverage semantic search (e.g., Monster’s 6Sense), if you’re considering purchasing/implementing such a solution, or if you’re just curious how these kinds of applications achieve their claims, don’t skip ahead – keep reading.

Semantic Search Claims for Sourcing and Recruiting

Many vendors are quick to explain that their semantic search solution can help you and/or (wink) your team to “stop wasting time trying to create difficult and complex Boolean search strings”, and instead, let “intelligent search and match” applications do the work for you.

Some claim that “a single query will give you the results you need – no more re-querying, no more waste of time!”

Going further, semantic search solutions for the recruiting industry commonly state that their offerings:

  • Understand titles, skills, and concepts
  • Automatically analyze and define relationships between words and concepts
  • Intuit and infer experience by context
  • Perform pattern recognition
  • Perform fuzzy matching

Sounds Great – But How Do They Really Work?

Over the years, I’ve had many people attempt to sell me on the benefits of semantic search when it comes to sourcing potential candidates, and I have also had the opportunity to use and evaluate quite a few semantic search solutions, including pretty much all of the usual suspects in the space.

My experience and skill with regard to human capital data information retrieval affords me some unique insight into how the technologies and techniques that semantic search vendors use to back their claims actually work, as well as their limitations specific to human capital data. More on that last bit later.

First, let’s get into how semantic search applications for recruiting actually work.

When semantic search vendors make claims that their applications can automatically understand titles, skills, and concepts, analyze and define relationships between words and concepts, intuit and infer experience by context, perform pattern recognition and fuzzy matching, they are typically using 1 or more of the following to do so:

Resume Parsing

Parsing slices and dices resumes and extracts useful information contextually based on the structure of most resumes.

A good parser can take a resume and break it down to its component parts and “understand” a person’s experience.

Resume parsing can be used to extract skill words and differentiate between terms mentioned in skills summaries vs. those that are mentioned in the body of the resume – the latter having a higher probability of being indicative of real experience. Resume parsers can also typically extract titles and employers and some can even reliably identify the most recent title and employer.
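As a rough illustration of telling a skills-summary mention apart from a mention in the body of a resume, here is a minimal Python sketch. It assumes resumes use a handful of plain heading lines; the heading list, the sample resume, and the function names are hypothetical, and commercial parsers handle far messier layouts.

```python
import re

# Hypothetical section headings; real resumes vary far more than this.
HEADINGS = {"summary", "skills", "experience", "education"}

# Made-up resume text for illustration only.
SAMPLE_RESUME = """Jane Doe
Skills:
Java, SQL, Python
Experience:
Acme Corp, Developer
Built Java services backed by SQL databases
Education:
BS Computer Science
"""

def split_sections(resume_text):
    """Group non-empty resume lines under the most recent recognized heading."""
    sections, current = {}, "header"
    for line in resume_text.splitlines():
        key = line.strip().lower().rstrip(":")
        if key in HEADINGS:
            current = key
        elif line.strip():
            sections.setdefault(current, []).append(line.strip())
    return sections

def where_mentioned(resume_text, term):
    """Return the names of the sections whose text mentions the term."""
    pattern = re.compile(r"(?<!\w)" + re.escape(term) + r"(?!\w)", re.IGNORECASE)
    return [name for name, lines in split_sections(resume_text).items()
            if any(pattern.search(line) for line in lines)]

print(where_mentioned(SAMPLE_RESUME, "Java"))    # skills and experience
print(where_mentioned(SAMPLE_RESUME, "Python"))  # skills only
```

In this sample, "Python" appears only in the skills list, while "Java" also appears in the work history, which is the stronger signal of hands-on experience.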

Solid parsing technology can correctly identify addresses and education information and “realize” that “George Washington” in an address is likely a street name, but in an education section is likely a university.

Some parsers can even determine current vs. dated experience with specific skills, as well as automatically calculate years of experience with specific skills, management, and overall years of work experience based on date analysis. Being able to control years of experience can help you find people who are neither under- nor overqualified and who are likely to be in the compensation range of the opening you are sourcing/recruiting for.
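The date analysis described above can be sketched in a few lines of Python. This is a simplified illustration, assuming the parser has already produced a list of jobs with start and end dates and the skills mentioned in each; the `JOBS` data and function names are hypothetical.

```python
from datetime import date

# Made-up parsed work history. An end date of None means "Present".
JOBS = [
    {"title": "Senior Developer", "start": date(2006, 9, 1), "end": None,
     "skills": ["Java", "J2EE"]},
    {"title": "Developer", "start": date(2003, 1, 1), "end": date(2006, 9, 1),
     "skills": ["C++"]},
]

def months_between(start, end):
    """Whole calendar months from start to end."""
    return (end.year - start.year) * 12 + (end.month - start.month)

def years_of_experience(jobs, skill, today):
    """Sum the duration of every job that lists the skill, in years.

    Overlapping jobs are counted twice in this simple version.
    """
    total_months = 0
    for job in jobs:
        if skill.lower() in (s.lower() for s in job["skills"]):
            end = job["end"] or today
            total_months += months_between(job["start"], end)
    return total_months / 12

print(years_of_experience(JOBS, "Java", today=date(2010, 9, 1)))
```

With a figure like this per skill, a search can be limited to, say, 3 to 7 years of Java rather than any mention of Java at all.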

Resume parsing can result in highly structured data, which can enable a recruiter to move beyond free text search and to search for information contextually in specific sections/fields, such as current title, current experience, education, etc.

A more automated way of achieving semantic search via parsed resume data is to take basic search terms entered by a sourcer or recruiter and weight the results by the recency of related titles and experience (using data the parser identified as more recent) and by calculated years of experience (e.g., Java and related terms mentioned in the most recent work experience, dated ‘9/06 to Present’).
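One way to picture recency weighting is a score in which each keyword hit counts for less the longer ago the job ended. The Python sketch below is only an assumption about how such weighting could work; the half-life decay, the 3-year default, and the sample resumes are hypothetical and not taken from any vendor.

```python
from datetime import date

def recency_weight(end, today, half_life_years=3.0):
    """Weight that halves for every half_life_years since the job ended."""
    if end is None:  # current job, dated "to Present"
        return 1.0
    years_ago = (today - end).days / 365.25
    return 0.5 ** (years_ago / half_life_years)

def score_resume(jobs, terms, today):
    """Add up keyword hits per job, each scaled by how recent the job is."""
    wanted = {t.lower() for t in terms}
    score = 0.0
    for job in jobs:
        hits = sum(1 for s in job["skills"] if s.lower() in wanted)
        score += hits * recency_weight(job["end"], today)
    return score

# Made-up parsed resumes: one with current Java work, one with dated Java work.
TODAY = date(2010, 9, 1)
CURRENT_JAVA = [{"end": None, "skills": ["Java", "J2EE"]}]
DATED_JAVA = [{"end": date(2004, 9, 1), "skills": ["Java", "J2EE"]}]

print(score_resume(CURRENT_JAVA, ["Java", "J2EE"], TODAY))
print(score_resume(DATED_JAVA, ["Java", "J2EE"], TODAY))
```

Both resumes mention exactly the same keywords, but the one with current experience ranks higher.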

So now you know that when you hear that a semantic search application can “automatically understand titles and skills” and can “intuit and infer experience by context,” not only do you know what they’re talking about, you know at least one of the ways they try to make good on that claim.

Taxonomies and Ontologies

Some semantic search solutions for recruiting leverage ontologies and taxonomies.

Taxonomy is the science which deals with the study of identifying, grouping, and naming things according to their established natural relationship. An ontology is a “formal representation of knowledge as a set of concepts within a domain, and the relationships between those concepts.”

As complex as those definitions may sound, they are really quite easy to understand when it comes to how vendors utilize taxonomies and ontologies to achieve semantic search.

Taxonomies and ontologies are leveraged by semantic search solutions for recruiting and staffing as a back-end list of keywords organized by concept and relationship so that when you search for a term or phrase, the solution can compare your search against terms and phrases it “knows” are conceptually related.

A common taxonomy used in recruiting solutions is a parent-child, hierarchical (directional, one way) taxonomy. Wikipedia uses this simple way of explaining the parent-child relationship: A car is a subtype of vehicle, so any car is also a vehicle, but not every vehicle is a car.

Hierarchical Taxonomy

With a hierarchical taxonomy for accounting terminology, if you searched for “SOX 404,” you should get positive hits and relevance ranking from the term “SOX 404” as well as “accounting,” because the system can recognize that “SOX 404” is an accounting-related “child” term/concept tied to the “parent” term “accounting.”

In a true hierarchical taxonomy, if you searched for “accounting,” you should only get positive hits and ranking on the term “accounting” and not on mentions of “SOX 404,” because not all accounting-related work involves SOX 404.

In other words, SOX 404 is accounting-related work, but not all accounting work is SOX 404-related.
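The one-way nature of a parent-child taxonomy is easy to see in code. Here is a minimal Python sketch, assuming a tiny hypothetical child-to-parent lookup table; a vendor's actual taxonomy would be far larger and is not public.

```python
# Hypothetical child -> parent links for illustration only.
PARENT = {
    "sox 404": "accounting",
    "accounts payable": "accounting",
    "servlets": "java",
}

def expand_upward(term):
    """Return the term plus every ancestor, walking child -> parent only."""
    term = term.lower()
    terms = [term]
    while term in PARENT:
        term = PARENT[term]
        terms.append(term)
    return terms

print(expand_upward("SOX 404"))     # the child term and its parent
print(expand_upward("accounting"))  # the parent term only
```

Searching for the child pulls in the parent, but searching for the parent does not pull in the child, which mirrors the car and vehicle example.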

Conceptual Search

A semantic search solution using a hierarchical taxonomy can help you find terms and phrases other than the ones you specifically searched for, because they compare your search terms with the taxonomy and return results that not only mention your keywords, but also related terminology.

This is a form of “conceptual search” – you search for 1 term, and you can get results mentioning all related concepts as well as your original search term.

In addition to hierarchical relationships, semantic search solutions may also perform conceptual searching based on synonymous terms and phrases.

For example, if you searched for “Director of Tax,” a well-developed taxonomy would also return results for all of the title variants you didn’t actually search for but that mean the same thing, such as “Tax Director,” “Director, Tax,” etc. This form of conceptual search can be useful for finding common abbreviations for phrases, such as CPA and “C.P.A” from a search for “Certified Public Accountant,” and vice versa.
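Synonym-based conceptual search amounts to swapping one term for a group of equivalent terms. The Python sketch below assumes a small hypothetical list of synonym groups and shows how a single term could be expanded into the kind of OR statement a sourcer would otherwise write by hand.

```python
# Hypothetical synonym groups; each set lists ways of writing one concept.
SYNONYM_GROUPS = [
    {"director of tax", "tax director", "director, tax"},
    {"certified public accountant", "cpa", "c.p.a"},
]

def expand_synonyms(term):
    """Return every known variant of the term, or just the term itself."""
    key = term.lower()
    for group in SYNONYM_GROUPS:
        if key in group:
            return sorted(group)
    return [key]

def to_boolean_or(term):
    """Build a quoted OR string from the term and its variants."""
    return " OR ".join(f'"{variant}"' for variant in expand_synonyms(term))

print(to_boolean_or("Tax Director"))
print(to_boolean_or("CPA"))
```

Unlike the parent-child lookup, this works in both directions: any member of a group retrieves all of the others.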

A comprehensive taxonomy can be especially helpful for Information Technology sourcers and recruiters, as it can be difficult to know or even remember all of the various ways certain technologies can be referenced (SQL 2008, SQL Server, MSSQL, etc.).

Statistical Methods

Rather than relying on pre-built taxonomies to define relationships between titles, terms and concepts, some semantic search solutions use complex statistical methods in an attempt to automatically “understand” language and relationships between words.

While I am not aware of any semantic search vendor supplying solutions to the recruiting industry that publicly explains its statistical methods, thankfully the “Google distance” gives us a tiny bit of insight into how such an approach works.

Researchers Rudi Cilibrasi and Paul Vitányi, who proposed the measure, found that keywords with the same or similar meanings in a natural language sense tend to be “close” in units of Google distance, while words with dissimilar meanings tend to be farther apart.

Here is the equation for the Google distance, which is a measure of semantic interrelatedness derived from the number of hits returned by the Google search engine for a given set of keywords.
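For reference, the Normalized Google Distance is usually written as NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))), where f(x) and f(y) are the hit counts for each term alone, f(x, y) is the hit count for pages containing both, and N is the total number of pages indexed. A minimal Python sketch follows; the hit counts in the example are made up purely for illustration.

```python
import math

def google_distance(fx, fy, fxy, n):
    """Normalized Google Distance computed from hit counts.

    fx, fy: hits for each term alone; fxy: hits for both terms together;
    n: total number of pages indexed (an estimate in practice).
    """
    if fxy == 0:
        return math.inf  # the terms never appear together
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))

# Made-up hit counts: two terms that often co-occur vs. two that rarely do.
print(google_distance(1000, 1000, 900, 10**6))
print(google_distance(1000, 1000, 10, 10**6))
```

The more often two terms show up on the same pages relative to how often each shows up alone, the smaller the distance.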

That was easy, right?

Semantic Clustering, Machine Learning, Pattern Recognition – Oh My!

I don’t pretend to understand semantic clustering and machine learning at the technical level, but I do have a good understanding of what they are used for and how they work at a high level, specifically with regard to sourcing and matching candidates from human capital data.

Semantic clustering is a non-interactive and unsupervised machine learning technique seeking to automatically analyze and define relationships between words and concepts.

For candidate sourcing purposes, algorithms are created to automatically recognize complex patterns and to “learn” and draw relationships from human capital data (resumes, social network profiles, etc.).

Rather than relying on a static taxonomy, semantic clustering allows for dynamic concept matching.

Based on statistical analysis/algorithms and pattern recognition, an application can “learn” that C# is related to .Net, due in part to keyword frequency and proximity that it has analyzed across thousands to millions of documents.
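A very rough picture of how related terms can be "learned" from documents is simple co-occurrence counting. The Python sketch below counts only how often two terms share a document and ignores proximity within the document; the sample documents are made up, and real semantic clustering is far more sophisticated than this.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(documents):
    """Count how many documents mention each pair of terms together."""
    pairs = Counter()
    for terms in documents:
        for a, b in combinations(sorted(set(terms)), 2):
            pairs[(a, b)] += 1
    return pairs

def related_terms(documents, term, top=3):
    """Rank other terms by how often they share a document with the term."""
    counts = Counter()
    for (a, b), n in cooccurrence(documents).items():
        if a == term:
            counts[b] += n
        elif b == term:
            counts[a] += n
    return counts.most_common(top)

# Made-up documents, each reduced to the skill terms it mentions.
DOCS = [
    ["c#", ".net", "sql"],
    ["c#", ".net", "asp.net"],
    ["java", "j2ee", "sql"],
    ["c#", ".net"],
]

print(related_terms(DOCS, "c#", top=1))
```

Nobody told the code that C# and .Net belong together; the pairing falls out of the counts, which is the appeal of this approach over a static taxonomy.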

A query cloud offers an excellent visualization of semantic clustering – you can see and choose from a group of terms and phrases that the semantic search solution has determined to be related to your search term.

Here is an example of a query cloud for C#:

While semantic clustering can quickly and easily find related terms, you still have to ask whether the related terms are actually relevant. Only the person conducting the search can make that determination.

Fuzzy Logic

When an application claims to perform fuzzy matching, it is applying approximate string matching (often loosely called fuzzy logic) to the search, which finds approximate matches to a pattern in a string.

Fuzzy logic is especially useful to automatically search for slight phrase variations and word misspellings. Most sourcers/recruiters do not take the time to search for misspellings, and understandably so as it is quite laborious. However, a good fuzzy matching solution will find your exact search terms as well as any slight spelling variation, intentional or unintentional.
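Here is a minimal Python sketch of fuzzy matching using the standard library's `difflib.get_close_matches`. It is one of several ways to find near matches and is not necessarily what any vendor uses; the word list and the 0.8 cutoff are assumptions chosen for illustration.

```python
import difflib

def fuzzy_find(term, words, cutoff=0.8):
    """Return the words that are similar enough to the term, best first."""
    candidates = [w.lower() for w in words]
    return difflib.get_close_matches(term.lower(), candidates, n=10, cutoff=cutoff)

# Made-up words as they might appear across a set of resumes.
WORDS = ["accountant", "acountant", "accountent", "engineer"]

print(fuzzy_find("accountant", WORDS))
```

Lowering the cutoff finds more variations but also lets in more unrelated words, so the threshold is a trade-off between missing people and wading through noise.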

If you don’t search for misspellings, you’re missing people:

The 5 Levels of Semantic Search

Now that you have a basic understanding of the concept of semantic search and how applications using semantic search actually work, I’d like to introduce you to what I believe are the 5 basic levels of semantic search.

Intended for HR professionals, sourcers and recruiters, this presentation explains and explores the concepts of semantics and semantic search, including the 5 Levels of Semantic Search: Conceptual Search, Contextual Search, Grammatical/Natural Language Search, Inferential Search, and Tagging.

You’ll also see some examples of how you can achieve Level 3 semantic search using Monster (classic search) or any search engine that allows for fixed or configurable proximity search.
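For readers who have not used proximity search, the underlying idea is simply a limit on how many words may separate two search terms. The Python sketch below is a simplified illustration for single-word terms, with a made-up sentence; it is not how Monster or any particular search engine implements proximity.

```python
import re

def within_proximity(text, term_a, term_b, max_words=5):
    """True if the two single-word terms appear within max_words of each other."""
    # Keep characters such as # and + so tokens like "c#" and "c++" survive.
    words = re.findall(r"[\w#+]+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(i - j) <= max_words for i in pos_a for j in pos_b)

# Made-up resume sentence for illustration.
TEXT = "Managed a team of five Java developers"

print(within_proximity(TEXT, "managed", "developers", max_words=10))
print(within_proximity(TEXT, "managed", "developers", max_words=3))
```

Requiring an action word such as "managed" to sit close to a noun such as "developers" is what lets a keyword search start to approximate who did what.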

Semantic Search for Recruiting: The Good

I love technology and anything that can make me better and faster at what I do. Semantic search solutions for sourcing candidates can provide many benefits, including:

  • Reducing the time to find relevant matches
  • Lessening or eliminating the need for recruiters to have deep and specialized knowledge within an industry or skill set
  • Reducing and even eliminating time spent on initial research
  • The ability to go beyond literal, identical lexical matching
  • Leveling the playing field for those with less sourcing experience or ability
  • Making an inexperienced person look like a sourcing wizard
  • Boosting teams with low search/sourcing capability
  • Working well for positions where titles effectively identify matches and where there is a low volume and variety of keywords
  • Working well for organizations with a high volume of unchanging hiring needs

Semantic Search for Recruiting: The Bad

On the other hand, you should be aware of some issues associated with blindly trusting semantic search solutions, including:

  • Just because terms are related, it doesn’t automatically make them relevant to the search
  • Removing thought from the talent identification process
  • The danger of eliminating the need for recruiters to understand what they’re actually searching for
  • Difficulty with information technology, healthcare, and other sectors/verticals with ever-changing technology and terminology
  • Finding some people, but eliminating and/or burying others
  • Finding the best matches based on keywords present, as opposed to the best people
  • The inability to search for what isn’t explicitly stated – applications will only return results that mention required keywords and their variants
  • The fact that many people have skills and experience that are simply not mentioned anywhere in their resumes and thus they cannot be retrieved via any direct search method
  • They level the playing field – if competing companies use the same software solution, they will both find (and miss!) the exact same people
  • The fact that a single search cannot find all of the best people – every search both includes and excludes qualified candidates
  • They can favor keyword rich resumes/profiles, yet keyword poor resumes/profiles may in fact represent better candidates than keyword rich resumes

Semantic Search for Sourcing & Recruiting: The Bottom Line

The potential of semantic search for talent identification and acquisition is powerful and exciting!

However, it’s important to realize that with technology that’s been on the market for over a decade, sourcers and recruiters have already been able to “manually” achieve Levels 1-4 semantic search for a while now, and there are some solutions available today that allow for searchable tagging as well (Level 5).

On the other hand, using software for automating semantic search/match can allow you to quickly, easily and somewhat reliably achieve Levels 1-2 semantic search, depending on the vendor/solution you choose. At this time, true Level 3-5 semantic search is beyond the reach of today’s semantic search/match applications (IMHO).

One of the main and inescapable problems with any automated semantic search/match solution is that human capital data is quite often incomplete and unstructured. Let’s face it – no company is looking to find people because they mention specific keywords and titles – everyone’s looking for their next great hire who has specific skills and experience which may not even be explicitly mentioned in a resume, on a LinkedIn profile, in a Twitter bio, etc.

Matching software can work with what’s there (text that’s present), but it can’t match on what’s not there (text that isn’t present). On the other hand, one thing that humans do incredibly well is instantly perform dynamic inference, more commonly known as “reading between the lines.” Perhaps at some point in the future, software will be able to somewhat reliably infer experience and capability beyond text that is present, but it can’t be done today beyond guessing (e.g., “Were you looking for _____________?”).

Food for thought – how would you like to explain to a candidate that they weren’t considered for a job because your semantic search application didn’t think they were a match based on their resume? How would you feel if you were turned down for a job because a software solution didn’t “like” your resume? Do we really want to rely 100% on a software solution that seems to make our life easier when it can result in missing and altogether eliminating some of the best people available?

While software can retrieve and move data, data requires analysis to yield information and produce knowledge which can facilitate decision making. That’s why these solutions are referred to as Decision Support Systems – the operative word being “support,” because they don’t (and should not!) make the decisions for you. These solutions provide you with data that you interpret into information so you can make an informed decision.

In the case of sourcing/recruiting – it’s deciding who to engage, screen, and potentially recruit.