The integrated vectorization feature in Azure AI Search marks a significant advancement in the field of search and retrieval.
By automating the chunking and vectorization processes, it not only simplifies the development of RAG applications but also enhances the overall efficiency and effectiveness of search functionalities within Azure’s ecosystem.
This innovation is poised to be a game-changer for developers and organizations looking to leverage the power of vector search in their applications.
Overview:
Let’s first understand the steps involved in building a Copilot:
- Data Collection: Gather the necessary data and make sure it is clean and well-structured.
- Feature Engineering: Prepare the data by extracting relevant features and transforming them into a suitable format.
- Model Training: Train a machine learning model on the prepared data, using appropriate algorithms and hyperparameters.
- Model Evaluation: Evaluate the model’s performance using metrics such as accuracy, precision, recall, and F1-score.
- Deployment and Integration: Deploy the trained model in a production environment and integrate it with other relevant systems or applications.
Let’s consider a scenario where you implement a chat app using Python, Azure OpenAI Service, and Retrieval Augmented Generation (RAG) in Azure AI Search to get answers about employee benefits at a fictitious company. The app is seeded with PDF files, including the employee handbook, a benefits document, and a list of company roles and expectations.
Architectural overview
A simple architecture of the chat app is shown in the following diagram:
Key components of the architecture include:
- A web application to host the interactive chat experience.
- An Azure AI Search resource to get answers from your own data.
- An Azure OpenAI Service to provide:
  - Keywords to enhance the search over your own data.
  - Answers from the OpenAI model.
  - Embeddings from the ada model.
Customize Chat App Settings (Retrieval Mode)
The Chat App is designed to work with any PDF documents. You can use the chat app settings to change the behavior of its responses.
The intelligence of the chat is determined by the OpenAI model and the settings used to interact with it, described below.
| Setting | Description |
| --- | --- |
| Override prompt template | This is the prompt that is used to generate the answer. |
| Temperature | The temperature used for the final Chat Completion API call, a number between 0 and 1 that controls the “creativity” of the model. |
| Minimum search score | The minimum score of the search results that are used to generate the answer. Range depends on search mode used. |
| Minimum reranker score | The minimum score from the semantic ranker of the search results that are used to generate the answer. Ranges from 0-4. |
| Retrieve this many search results | This is the number of search results that are used to generate the answer. You can see these sources returned in the Thought process and Supporting content tabs of the citation. |
| Exclude category | This is the category of documents that are excluded from the search results. |
| Use semantic ranker for retrieval | This is a feature of Azure AI Search that uses machine learning to improve the relevance of search results. |
| Use query-contextual summaries instead of whole documents | When both Use semantic ranker and Use query-contextual summaries are checked, the LLM uses captions extracted from key passages, instead of all the passages, in the highest ranked documents. |
| Suggest follow-up questions | Have the chat app suggest follow-up questions based on the answer. |
| Retrieval mode | Vectors + Text: search results are based on both the text and the embeddings of the documents. Vectors: results are based on the embeddings only. Text: results are based on the text only. |
| Stream chat completion responses | Stream response instead of waiting until the complete answer is available for a response. |
As you can see, there are several options that control how search results are retrieved:
- Use semantic ranker for retrieval
- Use query-contextual summaries instead of whole documents
- Retrieval mode
These settings govern how documents are indexed for the Chat App; in other words, they are part of the Model Training, Fine-Tuning, and Knowledge Retrieval stages (indexing documents, vector embedding).
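To make these settings concrete, below is a hypothetical sketch of how a client could pass them to the chat app’s API. The endpoint and every field name in the payload are illustrative assumptions, not a documented contract; check your app’s actual request schema.

```python
import requests

# Hypothetical request payload: all field names below are assumptions for
# illustration and must be matched to the chat app's actual API schema.
payload = {
    "messages": [{"role": "user", "content": "What is included in my health plan?"}],
    "context": {
        "overrides": {
            "retrieval_mode": "hybrid",          # Vectors + Text
            "semantic_ranker": True,             # Use semantic ranker for retrieval
            "semantic_captions": False,          # Query-contextual summaries
            "top": 3,                            # Retrieve this many search results
            "temperature": 0.3,                  # "Creativity" of the model
            "minimum_search_score": 0.0,
            "minimum_reranker_score": 2.0,
            "suggest_followup_questions": True,
        }
    },
    "stream": False,                             # Stream chat completion responses
}

response = requests.post("http://localhost:50505/chat", json=payload, timeout=60)
print(response.json())
```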
Let’s look at the development stages of a chat solution and see where vector embedding is used.
| Stage | LLM Based Solution | RAG Based Solution | Vector Embedding Used |
| --- | --- | --- | --- |
| Data Collection | Collecting large datasets from diverse sources | Collecting large datasets from diverse sources | No |
| Data Cleaning | Preprocessing with tools like NLTK, regex | Preprocessing with tools like NLTK, regex | No |
| Model Training | Utilizing deep learning frameworks (TensorFlow, PyTorch) | Utilizing deep learning frameworks (TensorFlow, PyTorch) | Yes |
| Model Fine-Tuning | Applying transfer learning techniques | Applying transfer learning techniques | Yes |
| Knowledge Retrieval | Not applicable | Using database management systems (SQL, NoSQL) | Yes |
| Response Generation | Implementing natural language generation algorithms | Combining natural language generation algorithms with retrieval mechanisms | No |
| Evaluation | Employing automated metrics (BLEU, ROUGE) | Employing automated metrics (BLEU, ROUGE) and retrieval accuracy | No |
What is Vector embedding?
Vector embedding is a critical component used during the model training and fine-tuning stages for both LLM and RAG solutions. Additionally, in RAG solutions, vector embedding plays a vital role in the knowledge retrieval stage to effectively match query embeddings with document embeddings. The tools and technologies mentioned are commonly used in these stages to develop a Copilot.
Machines don’t understand human language, and that is where embeddings come in.
LLMs store the meaning and context of the data fed in a specialized format known as embeddings. Imagine capturing the essence of a word, image or video in a single mathematical equation. That’s the power of vector embeddings — one of the most fascinating and influential concepts in machine learning today.
For example, images of animals such as cats and dogs are unstructured data and cannot be stored directly in a database. They are first converted into a machine-readable numeric format, the embedding, and then stored in a vector database.
By translating unstructured and high-dimensional data into a lower-dimensional space, embeddings make it possible to perform complex computations more efficiently.
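As a concrete illustration, here is a minimal sketch of generating embeddings with a deployed Azure OpenAI ada-002 model and comparing them with cosine similarity. The endpoint, key, and deployment name are placeholders you must supply.

```python
import numpy as np
from openai import AzureOpenAI

# Placeholders: use your own Azure OpenAI endpoint, key, and deployment name.
client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_key="<your-api-key>",
    api_version="2023-05-15",
)

def embed(text: str) -> np.ndarray:
    """Return the 1536-dimensional ada-002 embedding for a piece of text."""
    response = client.embeddings.create(model="<ada-002-deployment>", input=text)
    return np.array(response.data[0].embedding)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity of two vectors by the angle between them (1.0 = identical direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

cat, dog, car = embed("a photo of a cat"), embed("a photo of a dog"), embed("a sports car")
print(cosine_similarity(cat, dog))  # semantically close: higher score
print(cosine_similarity(cat, car))  # semantically distant: lower score
```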
Types of Embedding:
While most of us have commonly used text embeddings, embeddings can also be created for other types of data, such as images, graphs, and more.
⮕ Word Embeddings: Embeddings of individual words. Models: Word2Vec, GloVe, and FastText.
⮕ Sentence Embeddings: Embeddings of entire sentences as vectors that capture the overall meaning and context of the sentences. Models: Universal Sentence Encoder (USE) and SkipThought.
⮕ Document Embeddings: Embeddings of entire documents, capturing the semantic information and context of the whole document. Models: Doc2Vec and Paragraph Vectors.
⮕ Image Embeddings: Capture different visual features. Models: CNNs, ResNet, and VGG.
⮕ User/Product Embeddings: Represent users/products in a system as vectors, capturing user/product preferences, behaviors, attributes, and characteristics. These are primarily used in recommendation systems.
Below are some common embedding models we can use.
⮕ Cohere Embeddings: Powerful for processing short texts of under 512 tokens.
⮕ Mistral Embeddings: Strong embeddings for AI/ML tasks such as text classification and sentiment analysis.
⮕ OpenAI Embeddings: OpenAI is currently one of the market leaders in embedding models. Its second-generation text embedding model, text-embedding-ada-002, has proven to give top-notch results across various use cases.
Understanding Manual Indexing vs Integrated Vectorization
In order to ingest a document format, we need a tool that can turn it into text. By default, the app uses Azure Document Intelligence (DI in the table below), but local parsers are also available for several formats. The local parsers are not as sophisticated as Azure Document Intelligence, but they can be used to reduce charges.
| Format | Manual indexing | Integrated Vectorization |
| --- | --- | --- |
| PDF | Yes (DI or local with PyPDF) | Yes |
| HTML | Yes (DI or local with BeautifulSoup) | Yes |
| DOCX, PPTX, XLSX | Yes (DI) | Yes |
| Images (JPG, PNG, BMP, TIFF, HEIF) | Yes (DI) | Yes |
| TXT | Yes (Local) | Yes |
| JSON | Yes (Local) | Yes |
| CSV | Yes (Local) | Yes |
Overview of the manual indexing process
The prepdocs.py script is responsible for both uploading and indexing documents.
The typical usage is to call it using scripts/prepdocs.sh (Mac/Linux) or scripts/prepdocs.ps1 (Windows), as these scripts will set up a Python virtual environment and pass in the required parameters based on the current azd environment.
Whenever azd up or azd provision is run, the script is called automatically.
The script uses the following steps to index documents:
- If it doesn’t yet exist, create a new index in Azure AI Search.
- Upload the PDFs to Azure Blob Storage.
- Split the PDFs into chunks of text.
- Upload the chunks to Azure AI Search. If using vectors (the default), also compute the embeddings and upload those alongside the text.
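A condensed sketch of these steps might look like the following. This is not the actual prepdocs.py (which handles many more options); it assumes the azure-storage-blob and azure-search-documents packages and an index whose field names match the dictionary keys below.

```python
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.storage.blob import BlobServiceClient

# Placeholders for your own resources.
blob_service = BlobServiceClient.from_connection_string("<storage-connection-string>")
search_client = SearchClient(
    endpoint="https://<search-service>.search.windows.net",
    index_name="<index-name>",
    credential=AzureKeyCredential("<search-admin-key>"),
)

def index_pdf(filename: str, chunks: list[str], embeddings: list[list[float]]) -> None:
    # 1. Upload the source PDF to Azure Blob Storage.
    container = blob_service.get_container_client("content")
    with open(filename, "rb") as f:
        container.upload_blob(name=filename, data=f, overwrite=True)

    # 2. Upload each text chunk, with its embedding, to Azure AI Search.
    #    Field names ("content", "embedding", "sourcefile") are assumptions;
    #    they must match your index schema, and document keys must be URL-safe.
    documents = [
        {
            "id": f"doc-{i}",
            "content": chunk,
            "embedding": vector,
            "sourcefile": filename,
        }
        for i, (chunk, vector) in enumerate(zip(chunks, embeddings))
    ]
    search_client.upload_documents(documents=documents)
```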
Chunking
We’re often asked why we need to break PDFs into chunks when Azure AI Search supports searching large documents.
Chunking lets us limit the amount of information we send to OpenAI, which enforces token limits. Breaking up the content also makes it easier to find the most relevant passages of text to inject into the prompt. Our chunking method uses a sliding window of text, so that sentences that end one chunk also start the next; this reduces the chance of losing the context of the text.
If needed, you can modify the chunking algorithm in scripts/prepdocslib/textsplitter.py.
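As a simplified illustration of the sliding-window idea (not the repo’s actual implementation, which splits on sentence boundaries and tracks page numbers), a splitter might look like this:

```python
def split_text(text: str, max_chars: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into chunks of at most max_chars characters, where the last
    `overlap` characters of one chunk are repeated at the start of the next,
    so text that ends one chunk also begins the following one."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # slide the window back to create the overlap
    return chunks

chunks = split_text("A long employee handbook... " * 200)
print(len(chunks), len(chunks[0]))
```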
Indexing additional documents
To upload more PDFs, put them in the data/ folder and run ./scripts/prepdocs.sh or ./scripts/prepdocs.ps1.
A recent change added checks to see what’s been uploaded before. The prepdocs script now writes an .md5 file with an MD5 hash of each file that gets uploaded. Whenever the prepdocs script is re-run, that hash is checked against the current hash and the file is skipped if it hasn’t changed.
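Conceptually, the hash check works like this (a simplified sketch, not the script’s exact code):

```python
import hashlib
from pathlib import Path

def should_skip(path: Path) -> bool:
    """Skip re-indexing a file whose MD5 hash matches the stored .md5 sidecar file."""
    current_hash = hashlib.md5(path.read_bytes()).hexdigest()
    sidecar = path.with_suffix(path.suffix + ".md5")
    if sidecar.exists() and sidecar.read_text().strip() == current_hash:
        return True  # file unchanged since the last upload
    sidecar.write_text(current_hash)  # record the hash for the next run
    return False
```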
Removing documents
You may want to remove documents from the index. For example, if you’re using the sample data, you may want to remove the documents that are already in the index before adding your own.
To remove all documents, use the --removeall flag. Open either scripts/prepdocs.sh or scripts/prepdocs.ps1 and add --removeall to the command at the bottom of the file. Then run the script as usual.
You can also remove individual documents by using the --remove flag. Open either scripts/prepdocs.sh or scripts/prepdocs.ps1, add --remove to the command at the bottom of the file, and replace /data/* with /data/YOUR-DOCUMENT-FILENAME-GOES-HERE.pdf. Then run the script as usual.
Overview of Integrated Vectorization
Azure AI Search recently introduced an integrated vectorization feature in preview. This feature is a cloud-based approach to data ingestion that takes care of document format cracking, data extraction, chunking, vectorization, and indexing, all with Azure technologies.
NOTE: This feature cannot be used on an existing index. You need to create a new index, or drop and recreate an existing one. In the newly created index schema, a new field, parent_id, is added; it is used internally by the indexer to manage the lifecycle of chunks.
Now that we have set up the background, let’s look at integrated vectorization in detail.
What is Integrated Vectorization?
Integrated vectorization is a new feature of Azure AI Search that allows chunking and vectorization of data during ingestion through built-in pull-indexers, and vectorization of text queries through vectorizers. With a deployed Azure OpenAI Service embedding model or a custom embedding model, integrated vectorization facilitates automatic chunking and vectorization during data ingestion from various Azure sources such as Blob Storage, SQL, Cosmos DB, Data Lake Gen2, and more. Furthermore, Azure AI Search now incorporates vectorizers referencing your own embedding models that automatically vectorize text queries, effectively eliminating the need for client application coding logic.
Figure 1 – Integrated vectorization diagram
Key Concepts in Integrated Vectorization
Vector search: In Azure AI Search, this is a capability for indexing, storing, and retrieving vector embeddings from a search index. By representing text as vectors, vector search can identify the most similar documents based on their proximity in a vector space. In vector search, vectorization refers to the conversion of text data into vector embeddings.
Chunking: Process of dividing data into smaller manageable parts (chunks) that can be processed independently. Chunking is required if source documents are too large for the maximum input size of embedding and/or large language models.
Retrieval Augmented Generation (RAG): Architecture that augments the capabilities of a Large Language Model (LLM) like ChatGPT by adding an information retrieval system (i.e., Azure AI Search) that provides the data.
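In code, the RAG pattern boils down to "retrieve, then generate". Here is a minimal sketch, assuming an existing Azure AI Search index with a "content" field and a deployed chat model; all names are placeholders.

```python
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from openai import AzureOpenAI

search = SearchClient(
    endpoint="https://<search-service>.search.windows.net",
    index_name="<index-name>",
    credential=AzureKeyCredential("<search-key>"),
)
openai_client = AzureOpenAI(
    azure_endpoint="https://<aoai-resource>.openai.azure.com",
    api_key="<aoai-key>",
    api_version="2023-05-15",
)

def ask(question: str) -> str:
    # 1. Retrieve: find the chunks most relevant to the question.
    results = search.search(search_text=question, top=3)
    sources = "\n".join(doc["content"] for doc in results)  # field name assumed

    # 2. Generate: answer the question grounded in the retrieved sources.
    completion = openai_client.chat.completions.create(
        model="<chat-deployment>",
        messages=[
            {"role": "system", "content": "Answer using only the provided sources."},
            {"role": "user", "content": f"Sources:\n{sources}\n\nQuestion: {question}"},
        ],
    )
    return completion.choices[0].message.content
```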
Choosing Between Integrated Vectorization and Other Options in Azure
In the rapidly evolving field of RAG applications, there are various systems offering chunking and vectorization capabilities. Consequently, you might find yourself pondering the best choice for different scenarios.
For instance, Microsoft’s Azure platform now provides a solution that facilitates the creation of end-to-end applications using the RAG pattern across multiple data sources, including Azure AI Search, all from the convenience of Azure AI Studio. In that case, it’s advisable to use the built-in chunking and vectorization there, as that functionality is optimized to work seamlessly with your chosen options.
However, if you aim to construct your RAG application, or a traditional application using vector search, through other means, or if the built-in functionality does not align with your specific business requirements, integrated vectorization is an excellent alternative.
Benefits of Integrated Vectorization
Here are some of the key benefits of integrated vectorization:
Streamlined Maintenance: Users no longer need to maintain a separate data chunking and vectorization pipeline, reducing overhead and simplifying data maintenance.
Up-to-date Results: The feature works seamlessly with Azure AI Search pull indexers to handle incremental indexing, allowing your search service to deliver up-to-date results.
Reduced Complexity: Automatically vectorize your data to reduce complexity and increase accuracy.
Increased Relevancy: Generate index projections that map one source document to all of its corresponding vectorized chunks, enhancing the relevance of results. This also simplifies application development workflows for those building Retrieval-Augmented Generation (RAG) applications where data chunking is required for retrieval.
DEMO: Getting Started with Integrated Vectorization
Getting started with integrated vectorization is easy. With just a few clicks in the Azure portal, you can automatically parse your data, divide it into chunks, vectorize them, project all these vectorized chunks into a search index, and start taking advantage of the many benefits of Azure AI Search.
Data Ingestion: Chunking and Vectorization
Follow these quick steps to import, chunk, vectorize and index your data:
1. In the Azure portal, navigate to the Overview section of your Azure AI Search service and choose Import and vectorize data from the menu.
2. Select and configure your data source.
3. Go to the Vectorize and Enrich data section, add your Azure OpenAI Service enrichment model, and choose a schedule.
4. Add a prefix for the child objects that will be created as part of the deployment.
5. Review and create.
The Import and vectorize data wizard creates the following AI Search child resources:
- A data source.
- A chunked index with its respective vector fields and vectorizers.
- A skillset. The skillset definition has all the logic for the chunking, vectorization, and index projections to map the chunks to the index.
- A built-in pull indexer.
If you need more customization of the chunking or vectorization operations, you can take advantage of the 2023-10-01-Preview REST API or the latest Azure AI Search preview SDKs (.NET, Python, Java, and JavaScript) and modify any of the child resources listed above.
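For example, with the preview Python SDK you can fetch the wizard-generated skillset, adjust its chunking parameters, and push it back. This is a hedged sketch; attribute availability varies between preview SDK versions.

```python
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexerClient

indexer_client = SearchIndexerClient(
    endpoint="https://<search-service>.search.windows.net",
    credential=AzureKeyCredential("<search-admin-key>"),
)

# Fetch the skillset the wizard created, shrink the chunks, and save it back.
skillset = indexer_client.get_skillset("<skillset-name>")
for skill in skillset.skills:
    if skill.odata_type == "#Microsoft.Skills.Text.SplitSkill":
        skill.maximum_page_length = 1500  # smaller chunks
        skill.page_overlap_length = 300   # preview-only attribute: overlap between chunks
indexer_client.create_or_update_skillset(skillset)
```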
Vectorization at query time
When using the Import and vectorize data wizard, the selected embedding model is automatically included as a vectorizer in the index vector field. The vectorizer assigned to the index and linked to a vector field will automatically vectorize any text query submitted.
Use the Search explorer within the Azure portal to execute text queries against vector fields that have vectorizers; the system will automatically vectorize the query. There are multiple ways to access the Search explorer within the portal:
1. Once the Import and vectorize data wizard has finished and the indexing operation is complete, wait a few minutes and then click on Start Searching.
2. Alternatively, in the Azure portal, under your search service Overview tab, select Search explorer.
3. You can also use the Search explorer embedded within an index by selecting the Indexes tab and clicking on the index name.
Upon accessing the Search explorer, you can simply enter a text query. If you have a vector field with an associated vectorizer, the query will be automatically vectorized when you click on Search, and the matching results will be displayed.
To hide the vector fields and make it easier to view the matching results, click on Query options. Then, turn on the toggle button for Hide vector values in search results. Close the Query options and resubmit the search query.
Here is a code sample of what a vectorizer looks like in the index JSON definition:
```json
{
  "name": "vectorized-index",
  "vectorSearch": {
    "algorithms": [
      {
        "name": "myalgorithm",
        "kind": "hnsw"
      }
    ],
    "vectorizers": [
      {
        "name": "openai",
        "kind": "azureOpenAI",
        "azureOpenAIParameters": {
          "resourceUri": "<AzureOpenAIURI>",
          "apiKey": "<AzureOpenAIKey>",
          "deploymentId": "<modelDeploymentID>"
        }
      }
    ],
    "profiles": [
      {
        "name": "myprofile",
        "algorithm": "myalgorithm",
        "vectorizer": "openai"
      }
    ]
  },
  "fields": [
    {
      "name": "chunkKey",
      "type": "Edm.String",
      "key": true,
      "analyzer": "keyword"
    },
    {
      "name": "parentKey",
      "type": "Edm.String"
    },
    {
      "name": "page",
      "type": "Edm.String"
    },
    {
      "name": "vector",
      "type": "Collection(Edm.Single)",
      "dimensions": 1536,
      "vectorSearchProfile": "myprofile",
      "searchable": true,
      "retrievable": true,
      "filterable": false,
      "sortable": false,
      "facetable": false
    }
  ]
}
```
And here is a code sample of what a text query submitted against that vector field looks like:
```http
POST <AISearchEndpoint>/indexes/<indexName>/docs/search?api-version=2023-10-01-Preview

{
  "vectorQueries": [
    {
      "kind": "text",
      "text": "<add your query>",
      "fields": "vector"
    }
  ],
  "select": "chunkKey, parentKey, page"
}
```
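The same query can also be issued with the preview Python SDK, where VectorizableTextQuery lets the service embed the query text server-side via the index’s vectorizer (a sketch, assuming a preview azure-search-documents release that includes this model):

```python
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizableTextQuery

search_client = SearchClient(
    endpoint="https://<search-service>.search.windows.net",
    index_name="<indexName>",
    credential=AzureKeyCredential("<search-key>"),
)

# The vectorizer attached to the "vector" field embeds the query text
# server-side, so no client-side embedding call is needed.
results = search_client.search(
    search_text=None,
    vector_queries=[VectorizableTextQuery(text="<add your query>", fields="vector")],
    select=["chunkKey", "parentKey", "page"],
)
for doc in results:
    print(doc["chunkKey"], doc["page"])
```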
Skills that Make Integrated Vectorization Possible During Data Ingestion
Integrated vectorization during data ingestion is made possible by a combination of skills and configurations in Azure AI Search pull indexers. An Azure AI Search skill refers to a singular operation that modifies content in a particular manner. This often involves operations such as text recognition or extraction. However, it can also include a utility skill that refines or alters previously established enrichments. The output of this operation is typically text-based, thereby facilitating its use in comprehensive text queries. Skill operations are orchestrated in skillsets.
A skillset in Azure AI Search is a reusable resource linked to an indexer. It includes one or more skills which invoke either built-in AI functions or external custom processing on documents fetched from an outside data source.
The key skills and configurations that make integrated vectorization possible include the following (a code sketch follows the list):
- Text split cognitive skill: This skill is designed to break down your data into smaller, manageable chunks. This step is essential for adhering to the input constraints of the embedding model and ensuring the data can be efficiently processed. The smaller data pieces not only facilitate the vectorization process but also enhance the query process, especially for RAG applications. Additionally, the skill enables overlapping of data, which is instrumental in preserving the semantic meaning in multiple scenarios, thereby enhancing the accuracy and quality of search results.
- Azure OpenAI Embedding skill: This skill offers a reliable method to call upon your Azure OpenAI Service embedding model. It empowers generation of highly accurate and precise vectors, thereby improving the overall effectiveness for semantic queries.
- Custom Web API skill: This skill allows you to use your own custom embedding model for vectorization, giving you even more control over the vectorization process.
- Index Projections: This configuration allows you to map one source document to all of its associated chunks. With it, you can have either:
- An index with vectorized chunks only
- An index with the parent documents and vectorized chunks altogether
- Two separate indexes: An index with the full indexed documents and an index with the vectorized chunks.
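Putting these pieces together, a skillset that chunks documents and embeds each chunk might be defined as follows with the preview Python SDK. This is a sketch under the assumption of the 11.4.0b-series model names; index projections are noted but omitted because their class names differ across preview versions.

```python
from azure.search.documents.indexes.models import (
    AzureOpenAIEmbeddingSkill,
    InputFieldMappingEntry,
    OutputFieldMappingEntry,
    SearchIndexerSkillset,
    SplitSkill,
)

# Text split skill: chunk the document text with overlap.
split_skill = SplitSkill(
    text_split_mode="pages",
    maximum_page_length=2000,
    page_overlap_length=500,
    inputs=[InputFieldMappingEntry(name="text", source="/document/content")],
    outputs=[OutputFieldMappingEntry(name="textItems", target_name="pages")],
)

# Azure OpenAI Embedding skill: vectorize each chunk with ada-002.
embedding_skill = AzureOpenAIEmbeddingSkill(
    resource_uri="https://<aoai-resource>.openai.azure.com",
    deployment_id="<ada-002-deployment>",
    api_key="<aoai-key>",
    inputs=[InputFieldMappingEntry(name="text", source="/document/pages/*")],
    outputs=[OutputFieldMappingEntry(name="embedding", target_name="vector")],
)

skillset = SearchIndexerSkillset(
    name="<skillset-name>",
    skills=[split_skill, embedding_skill],
    # Index projections (omitted here) would map each chunk to its own
    # search document, as described in the list above.
)
```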
In conclusion, integrated vectorization lets you improve the accuracy and precision of your searches while reducing complexity and overhead. With just a few clicks, you can import your data into an Azure AI Search index and start taking advantage of the many benefits of Azure AI Search, including easy integration as a retriever in RAG applications.
Recap:
- Introduction: Azure AI Search introduces the public preview of integrated vectorization, enhancing vector search capabilities.
- Building a Copilot: The steps involved in building a Copilot, and how manual indexing compares with integrated vectorization.
- What is Integrated Vectorization?: It’s a feature that simplifies chunking and vectorization during data ingestion and query processing, using built-in pull-indexers and vectorizers.
- Key Concepts:
- Vector Search: Indexing, storing, and retrieving vector embeddings from a search index to identify similar documents.
- Chunking: Dividing data into smaller parts for processing, necessary for large documents.
- Retrieval-Augmented Generation (RAG): Combines Large Language Models with an information retrieval system to provide data.
- Choosing Integrated Vectorization: Offers a streamlined approach for creating RAG applications across multiple data sources within Azure AI Studio.
- Benefits: Accelerates development, reduces maintenance during data ingestion and query time, and eliminates the need for client application coding logic.
Conclusion:
In conclusion, integrated vectorization in Azure AI Search is a significant advancement for search and retrieval. By automating chunking and vectorization, it simplifies the development and deployment of RAG applications while improving the performance and efficiency of search within Azure’s ecosystem, making it a compelling choice for developers and organizations looking to leverage vector search in their applications.

