Azure AI Search for RAG use cases

Anurag Chatterjee
10 min read · Aug 24, 2024


Azure AI Search (formerly known as “Azure Cognitive Search”) provides secure information retrieval at scale over user-owned content in traditional and generative AI search applications. Azure AI Search is a recommended retriever for Retrieval-Augmented Generation use cases that are built on Azure or the Microsoft stack, due to its versatile capabilities in vector and full-text hybrid search, rich indexing with integrated data chunking and vectorization, sophisticated query syntax, and advanced relevance tuning. Additionally, its seamless integration with Azure’s scale, security, AI services, and OpenAI further enhances its utility and performance for complex, dynamic applications.

Microsoft has a few different options for your search requirements, and the table below from their documentation shows how Azure AI Search compares with other options as of August 2024.

How Azure AI search compares with other options from Microsoft

It is worth noting in the above table from Microsoft’s recommendations that although Azure SQL and Azure Cosmos DB have full text and vector search features, resource utilization is an inflection point. Indexing and queries (on the text data) are computationally intensive. Offloading search from the DBMS preserves system resources for transaction processing.

Indexing / adding data into Azure AI Search

In Azure AI Search, queries execute over user-owned content that’s loaded into a search index. There are two basic workflows for populating an index: push your data into the index programmatically, or pull in the data using a search indexer.

Pushing data to an index

As per the Microsoft docs, the push model is an approach that uses APIs to upload documents into an existing search index. You can upload documents individually or in batches of up to 1,000 documents or 16 MB per batch, whichever limit comes first.

Key benefits include:

  • No restrictions on data source type. The payload must be composed of JSON documents that map to your index schema, but the data can be sourced from anywhere.
  • No restrictions on frequency of execution. You can push changes to an index as often as you like. For applications having low latency requirements (for example, when the index needs to be in sync with product inventory fluctuations), the push model is your only option.
  • Connectivity and the secure retrieval of documents are fully under your control. In contrast, indexer connections are authenticated using the security features provided in Azure AI Search.

Indexing actions: upload, merge, mergeOrUpload, delete

You can control the type of indexing action on a per-document basis, specifying whether the document should be uploaded in full, merged with existing document content, or deleted.

Whether you use the REST API or an Azure SDK, the following document operations are supported for data import:

  • upload is similar to an “upsert” where the document is inserted if it’s new, and updated or replaced if it exists. If the document is missing values that the index requires, the document field’s value is set to null.
  • merge updates a document that already exists, and fails a document that can’t be found. Merge replaces existing values. For this reason, be sure to check for collection fields that contain multiple values, such as fields of type Collection(Edm.String). For example, if a tags field starts with a value of ["budget"] and you execute a merge with ["economy", "pool"], the final value of the tags field is ["economy", "pool"]. It won't be ["budget", "economy", "pool"].
  • mergeOrUpload behaves like merge if the document exists, and upload if the document is new.
  • delete removes the entire document from the index. If you want to remove an individual field, use merge instead, setting the field in question to null.
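To illustrate, a single push request to the REST API can mix these actions, with each document naming its own action in an @search.action field. Here is a minimal sketch in pure Python; the index fields and document values are hypothetical:

```python
import json

# Hypothetical documents for an index with "id", "content", and "tags" fields.
# Each document carries its own indexing action via "@search.action".
batch = {
    "value": [
        {"@search.action": "upload", "id": "1",
         "content": "Full document, inserted if new or replaced if it exists."},
        {"@search.action": "merge", "id": "2",
         "tags": ["economy", "pool"]},  # replaces the existing tags value outright
        {"@search.action": "mergeOrUpload", "id": "3",
         "content": "Merged if id 3 exists, uploaded otherwise."},
        {"@search.action": "delete", "id": "4"},  # removes the whole document
    ]
}

# This payload would be POSTed to
# https://<service>.search.windows.net/indexes/<index>/docs/index?api-version=<version>
payload = json.dumps(batch)
```

Note that the merge action on document 2 illustrates the collection-field caveat above: whatever tags value existed before is replaced, not appended to.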

The push pattern can be implemented for RAG use cases using the Azure AI Search client libraries or popular LLM application development frameworks like LangChain. In practice, however, I have found it better to use the client libraries, since they come directly from Microsoft and let you use the most recent features released by Azure AI Search. The index schema should ideally also contain vector fields to enable vector storage and retrieval. Without vector retrieval, a traditional BM25-based text search is performed and the results may not be semantically relevant to the query. This GitHub repo from Microsoft has a sample app for the Retrieval-Augmented Generation pattern running in Azure, using Azure AI Search for retrieval and Azure OpenAI large language models for text generation.

Architecture of RAG app using Azure AI search (Source)

Pulling data into an index

The pull model uses indexers that connect to a supported data source and automatically upload the data into your index. Microsoft provides indexers for sources such as Azure Blob Storage, Azure Table Storage, Azure Data Lake Storage Gen2, Azure SQL Database, and Azure Cosmos DB.

Indexers connect an index to a data source (usually a table, view, or equivalent structure), and map source fields to equivalent fields in the index. During execution, the rowset is automatically transformed to JSON and loaded into the specified index. All indexers support schedules so that you can specify how frequently the data is to be refreshed. Most indexers provide change tracking if the data source supports it. By tracking changes and deletes to existing documents in addition to recognizing new documents, indexers remove the need to actively manage the data in your index.

In practice, the pull pattern is faster to implement and can be done from within the Azure portal without any development required. However, for complex transformations or customizations, the push pattern is preferred as it provides more control over the indexing operation via the client libraries.
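As a sketch, an indexer definition sent to the REST API ties a data source to a target index and can carry a refresh schedule and field mappings. All names below are placeholders, and the data source and index are assumed to already exist:

```python
import json

# Hypothetical indexer that pulls from a blob data source into a search index,
# refreshing every hour ("PT1H" is an ISO 8601 duration).
indexer = {
    "name": "demo-blob-indexer",
    "dataSourceName": "demo-blob-datasource",
    "targetIndexName": "demo-index",
    "schedule": {"interval": "PT1H"},
    "fieldMappings": [
        # Map a source field to a differently named field in the index.
        {"sourceFieldName": "metadata_storage_name", "targetFieldName": "title"}
    ],
}

# This definition would be PUT to
# https://<service>.search.windows.net/indexers/demo-blob-indexer?api-version=<version>
body = json.dumps(indexer)
```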

Data consistency for Azure AI Search

In a distributed database like Azure AI Search, understanding data consistency is crucial. Azure AI Search ensures eventual consistency, meaning all documents will be indexed and searchable eventually, though not necessarily in the order ingested (no monotonic reads). An HTTP status 200 from an indexing request indicates data durability but not immediate searchability, with a potential delay of a few seconds depending on load. This system-wide eventual consistency model applies across all service tiers, with minor timing variations due to data partitioning differences. Developers should design their search solutions to accommodate these factors for optimal performance and reliability. For more details, visit the Stack Overflow discussion.

Azure AI Search offers the sessionId property, which can be used to create a sticky session for more consistent results. By using the same sessionId, Azure makes a best-effort attempt to target the same replica set, providing more consistent query results (source). However, it’s important to note that reusing the same sessionId repeatedly may interfere with load balancing and adversely affect the search service’s performance.
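A minimal sketch of how sessionId can appear in a query body, assuming one session per user conversation (the search text is illustrative):

```python
import json
import uuid

# One sessionId per user conversation; reusing it steers queries toward the
# same replica on a best-effort basis for more consistent rankings.
session_id = str(uuid.uuid4())

query = {
    "search": "what is our refund policy",
    "top": 5,
    "sessionId": session_id,  # sticky-session hint for Azure AI Search
}
body = json.dumps(query)
```

Generating a fresh sessionId per conversation, rather than one global value, avoids pinning all traffic to a single replica set.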

Azure AI Search tiers

As per the Microsoft docs, part of creating a search service is choosing a pricing tier (or SKU) that’s fixed for the lifetime of the service. In the portal, tier is specified in the Select Pricing Tier page when you create the service.

The tier determines:

  • Maximum number of indexes and other objects allowed on the service
  • Size and speed of partitions (physical storage)
  • Billable rate as a fixed monthly cost, but also an incremental cost if you add capacity

Free creates a limited search service for smaller projects, like running tutorials and code samples. Internally, system resources are shared among multiple subscribers. You can’t scale a free service or run significant workloads, and some premium features aren’t available. You can only have one free search service per Azure subscription.

The most commonly used billable tiers include:

  • Basic has the ability to meet SLA with its support for three replicas.
  • Standard (S1, S2, S3) is the default. It gives you more flexibility in scaling for workloads. You can scale both partitions and replicas. With dedicated resources under your control, you can deploy larger projects, optimize performance, and increase capacity.

In practice, user-facing applications mostly use the Basic or Standard tiers. There are a few other billable tiers, which you can view on Microsoft’s pricing page.

Do note that the portal defaults to the Standard tier when creating a search service; if you don’t need the Standard SKU limits, you can switch to the Basic tier during creation and pay considerably less. However, you cannot change the tier once you have provisioned the search service, so keep that in mind when starting a project. The best way to choose the tier correctly is to understand the volume of documents that need to be indexed initially and then on a recurring basis as the RAG application is used. Because Azure AI Search maintains both vector and inverted indexes, storage sizes can be significant and cannot be estimated directly from the source documents. However, you can index a small sample of documents, measure the storage consumed, and then finalize the tier by extrapolating that consumption to the total content and recurring updates to the document library.
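The extrapolation just described is simple arithmetic. A sketch, where every number is made up for illustration and the measured storage would come from your own service’s usage metrics:

```python
# Assumed measurements: index a small sample, read its storage footprint from
# the service's usage metrics, then scale linearly to the full corpus plus
# expected growth.
sample_docs = 500
sample_storage_mb = 220.0      # observed after indexing the sample (hypothetical)
total_docs = 40_000            # initial corpus size
monthly_new_docs = 2_000       # recurring additions
months_of_headroom = 12

mb_per_doc = sample_storage_mb / sample_docs
projected_mb = mb_per_doc * (total_docs + monthly_new_docs * months_of_headroom)
projected_gb = projected_mb / 1024

# Compare projected_gb against the partition sizes of candidate tiers before
# committing, since the tier cannot be changed after the service is created.
```

Linear scaling is a rough approximation; vector index overhead per document is fairly uniform, but a representative sample matters if document sizes vary widely.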

Schema for RAG and chat-style apps

Microsoft provides some basic schemas for static content that is indexed and vectorized to power generative search, as below.

"name": "example-index-from-accelerator",
"fields": [
{ "name": "id", "type": "Edm.String", "searchable": false, "filterable": true, "retrievable": true },
{ "name": "content", "type": "Edm.String", "searchable": true, "filterable": false, "retrievable": true },
{ "name": "content_vector", "type": "Collection(Edm.Single)", "searchable": true, "retrievable": true, "dimensions": 1536, "vectorSearchProfile": "my-vector-profile"},
{ "name": "metadata", "type": "Edm.String", "searchable": true, "filterable": false, "retrievable": true },
{ "name": "title", "type": "Edm.String", "searchable": true, "filterable": true, "retrievable": true, "facetable": true },
{ "name": "source", "type": "Edm.String", "searchable": true, "filterable": true, "retrievable": true },
{ "name": "chunk", "type": "Edm.Int32", "searchable": false, "filterable": true, "retrievable": true },
{ "name": "offset", "type": "Edm.Int32", "searchable": false, "filterable": true, "retrievable": true }
]
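The content_vector field above references a vectorSearchProfile named my-vector-profile, which must be defined in a vectorSearch section of the same index definition. A sketch of that section follows; the algorithm name and HNSW parameter values are illustrative, not prescribed:

```python
import json

# vectorSearch section pairing the "my-vector-profile" profile referenced by
# the content_vector field with an HNSW approximate-nearest-neighbor config.
vector_search = {
    "algorithms": [
        {
            "name": "my-hnsw-config",
            "kind": "hnsw",
            "hnswParameters": {"m": 4, "efConstruction": 400, "metric": "cosine"},
        }
    ],
    "profiles": [
        {"name": "my-vector-profile", "algorithm": "my-hnsw-config"}
    ],
}

index_definition = {
    "name": "example-index-from-accelerator",
    "fields": [],  # the fields shown above would go here
    "vectorSearch": vector_search,
}
body = json.dumps(index_definition)
```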

Querying search index using hybrid search

According to Microsoft docs, hybrid search combines the strengths of vector search and keyword search. The advantage of vector search is finding information that’s conceptually similar to your search query, even if there are no keyword matches in the inverted index. The advantage of keyword or full text search is precision, with the ability to apply semantic ranking that improves the quality of the initial results. Some scenarios — such as querying over product codes, highly specialized jargon, dates, and people’s names — can perform better with keyword search because it can identify exact matches.

Benchmark testing on real-world and benchmark datasets indicates that hybrid retrieval with semantic ranking offers significant benefits in search relevance.

AI Search uses the Reciprocal Rank Fusion algorithm to produce the final result set for a hybrid search. As per the docs, Reciprocal Rank Fusion (RRF) is an algorithm that evaluates the search scores from multiple, previously ranked results to produce a unified result set. In Azure AI Search, RRF is used whenever there are two or more queries that execute in parallel. Each query produces a ranked result set, and RRF is used to merge and homogenize the rankings into a single result set, returned in the query response. Examples of scenarios where RRF is always used include hybrid search and multiple vector queries executing concurrently.

RRF is based on the concept of reciprocal rank, which is the inverse of the rank of the first relevant document in a list of search results. The goal of the technique is to take into account the position of the items in the original rankings, and give higher importance to items that are ranked higher in multiple lists. This can help improve the overall quality and reliability of the final ranking, making it more useful for the task of fusing multiple ordered search results.
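The fusion step can be sketched in a few lines of plain Python. The constant k=60 comes from the original RRF paper; Azure’s internal constant and weighting are not assumed here:

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document ids with Reciprocal Rank Fusion.

    Each document's fused score is the sum of 1 / (k + rank) over every list
    it appears in, so items ranked well in multiple lists rise to the top.
    """
    scores = defaultdict(float)
    for ranked_ids in rankings:
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# "doc2" appears near the top of both the keyword list and the vector list,
# so it outranks documents that appear in only one of them.
keyword_results = ["doc1", "doc2", "doc3"]
vector_results = ["doc2", "doc4", "doc5"]
fused = rrf_fuse([keyword_results, vector_results])
```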

Scores in hybrid search results

The search index in Azure AI Search can be queried using the REST API, any of the client libraries in the different programming languages, or LLM application development frameworks like LangChain and LlamaIndex. The search can be the default BM25 text search, ANN-based vector search only, or a hybrid of these options. Search queries can include exact-match filters, scoring profiles, semantic reranking, or sorting fields.

As per the Microsoft docs, whenever results are ranked, the @search.score property contains the value used to order the results. Scores are generated by ranking algorithms that vary for each method. Each algorithm has its own range and magnitude.

By default, if you aren’t using pagination, the search engine returns the top 50 highest ranking matches for full text search, and the most similar k matches for vector search. In a hybrid query, top determines the number of results in the response. Based on defaults, the top 50 highest ranked matches of the unified result set are returned.

Often, the search engine finds more results than top and k. To return more results, use the paging parameters top, skip, and next. Paging is how you determine the number of results on each logical page and navigate through the full payload. You can set maxTextRecallSize to larger values (the default is 1,000) to return more results from the text side of hybrid query.
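Paging works out to one request body per logical page. A small sketch, where the page size and search text are illustrative:

```python
import json

def page_query(search_text, page, page_size=10):
    """Build a search request body for one logical page of results."""
    return {
        "search": search_text,
        "top": page_size,                # results to return on this page
        "skip": page_size * (page - 1),  # results to skip before this page
        "count": True,                   # also request the total match count
    }

# Page 3 with the default page size skips the first 20 results.
body = json.dumps(page_query("hello world", page=3))
```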

Diagram of a search scoring workflow (source)

A query that generates the previous workflow might look like this:

POST https://{{search-service-name}}.search.windows.net/indexes/{{index-name}}/docs/search?api-version=2024-07-01
Content-Type: application/json
api-key: {{admin-api-key}}

{
  "queryType": "semantic",
  "search": "hello world",
  "searchFields": "field_a, field_b",
  "vectorQueries": [
    {
      "kind": "vector",
      "vector": [1.0, 2.0, 3.0],
      "fields": "field_c, field_d"
    },
    {
      "kind": "vector",
      "vector": [4.0, 5.0, 6.0],
      "fields": "field_d, field_e"
    }
  ],
  "scoringProfile": "my_scoring_profile"
}

Conclusion

Azure AI Search offers a robust solution for RAG (Retrieval-Augmented Generation) use cases, providing a versatile framework for indexing and adding data. Understanding data consistency is critical, as the service guarantees eventual consistency, which requires designing applications to handle potential delays in data visibility. Azure AI Search tiers offer various levels of performance and scalability, catering to different needs without compromising on the eventual consistency model. When it comes to schema design, particularly for RAG and chat-style applications, flexibility and thorough planning are essential to ensure efficient data retrieval and interaction. Lastly, querying the search index using hybrid search methods enhances the ability to deliver precise, context-rich responses, leveraging the full potential of Azure AI Search in complex, dynamic applications. These capabilities make Azure AI Search a valuable tool for developing intelligent, responsive, and scalable search solutions.



Written by Anurag Chatterjee

I am an experienced professional who likes to build solutions to real-world problems using innovative technologies and then share my learnings with everyone.
