Exploring Search Types in Azure AI Search Service Indexes

I'm exploring replacing the WordPress search engine with an AI-powered search that uses AI models to better understand human queries beyond simple keywords, delivering more natural and comprehensible responses.

After indexing and embedding my content using text-embedding-ada-002 in the Azure AI Search Service index and performing some queries, I noticed that the query responses varied significantly depending on the query type and the parameters included in the JSON payload sent to the API.

In brief, the available search types are:

keyword search
vector search
semantic search
hybrid search

keyword search

A keyword query works as a standard full-text search: a human-readable query is sent to the search service, which finds documents containing the specified terms in fields marked as searchable. Results are ranked by relevance and each one carries an "@search.score". If a scoring profile is applied, its boosts are reflected in that "@search.score". A second value, "@search.rerankerScore", appears only when the semantic ranker is also enabled (see semantic search below), in which case both scores are returned:

"@search.score": 0.015384615398943424,
"@search.rerankerScore": 2.4301483631134033,

The search engine uses a scoring algorithm (BM25 by default) to determine relevance based on term frequency, inverse document frequency, and field length, as the similarity setting in the index's JSON definition shows:

"similarity": {
    "@odata.type": "#Microsoft.Azure.Search.BM25Similarity"
  }
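
To make the three ingredients concrete, here is a minimal sketch of textbook Okapi BM25 in Python. It is not Azure's internal implementation: the whitespace tokenizer, the sample documents, and the function name bm25_scores are assumptions for illustration, and k1=1.2 and b=0.75 are simply the commonly used default parameters.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with textbook Okapi BM25."""
    tokenized = [doc.lower().split() for doc in docs]  # naive tokenizer (assumption)
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates (k1) and is normalised by length (b).
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Made-up sample documents, for illustration only.
docs = [
    "azure ai search supports vector search",
    "wordpress search uses keywords",
    "a recipe for sourdough bread",
]
scores = bm25_scores("vector search", docs)
```

The first document matches both query terms and scores highest, the second matches only the common term "search", and the third matches nothing and scores zero.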

Example of a pure keyword search:

{
"search": "articles about AI",
"select": "id,parent_id,title,chunk_content,url",
"top": 5,
"scoringProfile": "name of scoring profile"
}

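The line "scoringProfile": "name of scoring profile" can be omitted if that profile is set as the index's defaultScoringProfile, because Azure then applies it automatically. Without a default, no scoring profile is used unless you name one in the request.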

vector search

Vector search enables similarity search by comparing vector representations (embeddings) of the query and documents. It’s ideal for scenarios where you want to find documents based on conceptual similarity rather than exact keyword matches.
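
As a rough illustration of what "comparing vector representations" means, the sketch below ranks documents by cosine similarity to a query vector. The three-dimensional vectors, the document ids, and the helper names are made up for the example (text-embedding-ada-002 vectors have 1536 dimensions), and it does an exhaustive comparison, whereas a search service would normally use an approximate nearest-neighbour index with the metric configured on the vector field.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query_vec, doc_vecs, k=5):
    """Return the ids of the k documents most similar to the query vector."""
    ranked = sorted(
        doc_vecs.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:k]]

# Toy embeddings (hypothetical values, not real model output).
doc_vecs = {
    "post-1": [0.9, 0.1, 0.0],
    "post-2": [0.0, 1.0, 0.0],
    "post-3": [0.7, 0.7, 0.0],
}
top_two = nearest([1.0, 0.0, 0.0], doc_vecs, k=2)
```

Here top_two contains "post-1" followed by "post-3", because those vectors point in the direction closest to the query.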

Documents in the index must have a field (e.g., "embedding") containing vector embeddings (mine were generated when I indexed my posts with text-embedding-ada-002). The query vector, which must come from the same embedding model as the documents, is provided in the "vectorQueries" parameter of the JSON request:

{
    "search": "*",
    "vectorQueries": [
        {
            "kind": "vector",
            "vector": [0.010303643, -0.0023242137, /* ... more values ... */],
            "fields": "embedding",
            "k": 5
        }
    ],
    "select": "id,parent_id,title,chunk_content,url",
    "top": 5
}

The * in the search field matches every document, so the keyword side contributes nothing to the ranking and the results are ordered by vector similarity on the "embedding" field of the index. Omitting the search parameter has the same effect.

For a pure vector search, the query relies solely on vectors, but if the retrieved documents contain non-vector fields (such as text, numbers, or tags), the scoring profile can use these fields for re-ranking. The scoring profile adjusts the result set by boosting documents based on criteria defined in the profile. While it does not modify the core vector similarity scores, it influences the final ranking.

semantic search

Semantic search makes search results better by understanding what a user really means when they type a query, instead of just looking for exact keyword matches. The results are re-arranged so the most relevant ones appear first, based on the query's meaning.

To process a semantic query we need to set "queryType" to "semantic" and provide a semanticConfiguration. Unlike other query types, semantic search cannot be used on its own. It functions as a secondary ranking layer (L2 ranker) and requires an initial set of results from a keyword or vector search, which it then re-ranks by analyzing the query's intent using semantic understanding.

{
    "search": "articles about AI",
    "queryType": "semantic",
    "semanticConfiguration": "semantic-config",
    "select": "id,parent_id,title,chunk_content,url",
    "top": 5
}

hybrid search

A hybrid search blends keyword and vector search techniques into one request. It includes a standard text query (in the search parameter) and one or more vector queries (in the vectorQueries parameter). Azure AI Search runs the text and vector queries in parallel and then merges the two result lists into a single result set. To order the merged list, Azure AI Search applies the Reciprocal Rank Fusion (RRF) algorithm, which scores each document from its rank position in each list rather than from the raw relevance scores, so documents that rank well in both keyword and vector searches rise to the top.

{
    "search": "articles about AI",
    "vectorQueries": [
        {
            "kind": "vector",
            "vector": [0.010303643, -0.0023242137, /* ... more values ... */],
            "fields": "embedding",
            "k": 5
        }
    ],
    "select": "id,parent_id,title,chunk_content,url",
    "top": 5
}
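
The fusion step can be sketched in a few lines of Python. This is the generic RRF formula, not Azure's source code: each document receives 1 / (k + rank) from every list it appears in, and the contributions are summed. The constant k=60 is the value commonly used with RRF, and the document ids and function name are made up for the example.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one scored list."""
    fused = {}
    for ranking in rankings:
        for position, doc_id in enumerate(ranking, start=1):
            # Only the rank position matters, not the original score.
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + position)
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)

# Hypothetical result lists from the two halves of a hybrid query.
keyword_hits = ["a", "b", "c"]
vector_hits = ["c", "a", "d"]
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
fused_order = [doc_id for doc_id, _ in fused]
```

Documents "a" and "c" appear in both lists, so they end up ahead of "b" and "d", which each appear in only one.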

If a scoring profile is applied to the query, it influences the keyword search results before they are combined with the vector search results, and before any semantic re-ranking takes place.

Optionally, a semantic configuration can be included in a hybrid search to enhance the ranking:

{
    "search": "articles about AI",
    "queryType": "semantic",
    "semanticConfiguration": "semantic-config",
    "vectorQueries": [
        {
            "kind": "vector",
            "vector": [0.010303643, -0.0023242137, /* ... more values ... */],
            "fields": "embedding",
            "k": 5
        }
    ],
    "select": "id,parent_id,title,chunk_content,url",
    "top": 5
}
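
Since the four query types differ only in which parameters are present, the payloads above can be produced by one small helper. The sketch below is an assumption-laden example rather than an official client: build_search_body and run_search are hypothetical names, the field names mirror my index, and the endpoint, index name, API key, and the api-version value "2023-11-01" are placeholders you would need to adapt to your own service.

```python
import json
import urllib.request

def build_search_body(text=None, vector=None, semantic_config=None,
                      scoring_profile=None, top=5):
    """Build the JSON body for a keyword, vector, semantic or hybrid query."""
    body = {
        # "*" matches everything, leaving ranking to the vector query.
        "search": text if text is not None else "*",
        "select": "id,parent_id,title,chunk_content,url",
        "top": top,
    }
    if vector is not None:
        body["vectorQueries"] = [
            {"kind": "vector", "vector": vector, "fields": "embedding", "k": top}
        ]
    if semantic_config is not None:
        body["queryType"] = "semantic"
        body["semanticConfiguration"] = semantic_config
    if scoring_profile is not None:
        body["scoringProfile"] = scoring_profile
    return body

def run_search(endpoint, index_name, api_key, body, api_version="2023-11-01"):
    """POST the body to the index's docs/search endpoint and return the hits."""
    url = f"{endpoint}/indexes/{index_name}/docs/search?api-version={api_version}"
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["value"]

# Hybrid query with semantic re-ranking; the short vector is a stand-in.
hybrid_semantic = build_search_body(
    text="articles about AI",
    vector=[0.010303643, -0.0023242137],
    semantic_config="semantic-config",
)
pure_vector = build_search_body(vector=[0.010303643, -0.0023242137])
```

Keeping the body construction separate from the HTTP call makes it easy to print and compare the payloads for each search type before sending anything to the service.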

The challenge now is to test and assess each search option to determine the best approach. I plan to run a set of questions that I can create, and also use an AI model to generate diverse, creative test questions by setting a high temperature parameter.
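
One possible way to score such a comparison, assuming each test question is labelled with the id of the post that should come back, is to compute a hit rate and a mean reciprocal rank per search type. The function name, the question keys, and all ids below are invented for the sketch; they are not results from my index.

```python
def evaluate(results_by_mode, expected):
    """Compare search modes.

    results_by_mode: {mode: {question: [doc ids in ranked order]}}
    expected:        {question: id of the document that should be found}
    """
    report = {}
    for mode, results in results_by_mode.items():
        hits, reciprocal_ranks = 0, 0.0
        for question, wanted_id in expected.items():
            ranked_ids = results.get(question, [])
            if wanted_id in ranked_ids:
                hits += 1
                # Reward finding the document near the top of the list.
                reciprocal_ranks += 1.0 / (ranked_ids.index(wanted_id) + 1)
        total = len(expected)
        report[mode] = {"hit_rate": hits / total, "mrr": reciprocal_ranks / total}
    return report

# Made-up data purely to show the shape of the inputs.
expected = {"q1": "post-1", "q2": "post-7"}
results_by_mode = {
    "keyword": {"q1": ["post-1", "post-3"], "q2": ["post-2", "post-4"]},
    "hybrid": {"q1": ["post-3", "post-1"], "q2": ["post-7", "post-2"]},
}
report = evaluate(results_by_mode, expected)
```

The hit rate shows how often the expected post appears at all, while the mean reciprocal rank also rewards placing it higher in the list.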

Featured image created with Grok.
