7 minutes to read - Mar 30, 2023

Qdrant Tutorial: Text Similarity Search

Qdrant is a high-performance search engine and database built in Rust, designed for vector similarity. It provides fast and reliable performance, even under high load, making it an ideal choice for applications that require speed and scalability. Qdrant can turn your embeddings or neural network encoders into powerful applications for various use cases, such as matching, searching, recommending, or performing other complex operations on large datasets. With its extended filtering support, it is well-suited for faceted search and semantic-based matching. The user-friendly API simplifies the process of working with Qdrant. Qdrant Cloud offers a managed solution that requires minimal setup and maintenance, making it easy to deploy and manage applications.
Table of Contents
1. Prerequisites
2. Data
3. Environment variables
4. Importing libraries
5. Load Environment variables
6. How to index data and search through it later?
7. Let's use our code!
8. RESULTS!
9. I got it, and…what’s next?

In this tutorial we will use the Qdrant vector database to store embeddings from Cohere's model and search them using cosine similarity. We will use the Cohere SDK to access the model. So, without further ado, let’s jump in!

Prerequisites

I will use Qdrant Cloud to host my database. It is good to know that Qdrant provides 1 GB of memory in its free-forever tier, so go ahead and use Qdrant Cloud. You can find out how to do it here.

Now let's create a new virtual environment inside the project directory and install the required packages (the Cohere SDK and the Qdrant client).
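
The original install commands did not survive on this page. A typical setup, assuming the usual PyPI package names, would be python -m venv venv followed by pip install cohere qdrant-client. As a small sanity check, the sketch below reports which of the assumed import names are still missing from the active environment. The REQUIRED list and the missing_packages helper are illustrative, not part of either library.

```python
from importlib.util import find_spec
from typing import Iterable, List


def missing_packages(import_names: Iterable[str]) -> List[str]:
    """Return the top-level import names that cannot be found in this environment."""
    return [name for name in import_names if find_spec(name) is None]


# Import names assumed from the classes used later in this tutorial.
REQUIRED = ["cohere", "qdrant_client"]

if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```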

Then create a .py file for the project.

Data

We will store our data in JSON format, in a file named data.json. Feel free to copy it:

[
  { "key": "Lion", "desc": "Majestic big cat with golden fur and a loud roar." },
  { "key": "Penguin", "desc": "Flightless bird with a tuxedo-like black and white coat." },
  { "key": "Gorilla", "desc": "Intelligent primate with muscular build and gentle nature." },
  { "key": "Elephant", "desc": "Large mammal with a long trunk and gray skin." },
  { "key": "Koala", "desc": "Cute and cuddly marsupial with fluffy ears and a big nose." },
  { "key": "Dolphin", "desc": "Playful marine mammal known for its intelligence and acrobatics." },
  {
    "key": "Orangutan",
    "desc": "Shaggy-haired great ape found in the rainforests of Borneo and Sumatra."
  },
  { "key": "Giraffe", "desc": "Tallest land animal with a long neck and spots on its fur." },
  {
    "key": "Hippopotamus",
    "desc": "Large, semi-aquatic mammal with a wide mouth and stubby legs."
  },
  { "key": "Kangaroo", "desc": "Marsupial with powerful hind legs and a long tail for balance." },
  { "key": "Crocodile", "desc": "Large reptile with sharp teeth and a tough, scaly hide." },
  {
    "key": "Chimpanzee",
    "desc": "Closest relative to humans, known for its intelligence and tool use."
  },
  { "key": "Tiger", "desc": "Striped big cat with incredible speed and agility." },
  { "key": "Zebra", "desc": "Striped mammal with a distinctive mane and tail." },
  { "key": "Ostrich", "desc": "Flightless bird with long legs and a big, fluffy tail." },
  { "key": "Rhino", "desc": "Large, thick-skinned mammal with a horn on its nose." },
  { "key": "Cheetah", "desc": "Fastest land animal with a spotted coat and sleek build." },
  {
    "key": "Polar Bear",
    "desc": "Arctic bear with a thick white coat and webbed paws for swimming."
  },
  { "key": "Peacock", "desc": "Colorful bird with a vibrant tail of feathers." },
  { "key": "Kangaroo", "desc": "Marsupial with powerful hind legs and a long tail for balance." },
  {
    "key": "Octopus",
    "desc": "Intelligent sea creature with eight tentacles and the ability to change color."
  },
  { "key": "Whale", "desc": "Enormous marine mammal with a blowhole on top of its head." },
  { "key": "Sloth", "desc": "Slow-moving mammal found in the rainforests of South America." },
  { "key": "Flamingo", "desc": "Tall, pink bird with long legs and a curved beak." }
]

Environment variables

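Create a .env file and store your Cohere API key, Qdrant API key and Qdrant host there. The code in this tutorial refers to them as COHERE_API_KEY, QDRANT_API_KEY and QDRANT_HOST, so those are natural names for the variables.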

Importing libraries

Load Environment variables
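
The loading code for this step is missing from the page, so here is a minimal sketch that uses only the standard library. It assumes a .env file made of simple KEY=VALUE lines and the three variable names used by the SearchClient class in this tutorial. The helper names read_env_file and load_settings are hypothetical. In practice, the python-dotenv package with its load_dotenv function is the more common choice for this.

```python
import os
from pathlib import Path
from typing import Dict


def read_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[name.strip()] = value.strip().strip("'\"")
    return values


def load_settings(path: str = ".env") -> Dict[str, str]:
    """Return the three settings, preferring real environment variables over the file."""
    file_values = read_env_file(path)
    names = ("COHERE_API_KEY", "QDRANT_API_KEY", "QDRANT_HOST")
    return {name: os.environ.get(name, file_values.get(name, "")) for name in names}


settings = load_settings()
COHERE_API_KEY = settings["COHERE_API_KEY"]
QDRANT_API_KEY = settings["QDRANT_API_KEY"]
QDRANT_HOST = settings["QDRANT_HOST"]
```

An empty string here means the value was found neither in the environment nor in the file, which is worth checking before creating any client.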

How to index data and search through it later?

I will implement the SearchClient class, which will be able to index and access our data. The class will contain all the necessary functionality: indexing, searching, and converting data to the formats Qdrant expects.

# Imports used by this class
import uuid
from typing import Dict, List

import cohere
from qdrant_client import QdrantClient
from qdrant_client.http import models

# QDRANT_API_KEY, QDRANT_HOST and COHERE_API_KEY come from the environment step.
# COHERE_SIZE_VECTOR must equal the length of the vectors returned by the Cohere embed model.


class SearchClient:
    def __init__(
        self,
        qdrant_api_key: str = QDRANT_API_KEY,
        qdrant_host: str = QDRANT_HOST,
        cohere_api_key: str = COHERE_API_KEY,
        collection_name: str = "animal",
    ):
        self.qdrant_client = QdrantClient(host=qdrant_host, api_key=qdrant_api_key)
        self.collection_name = collection_name
        # recreate_collection drops any existing collection with this name
        # and creates a fresh one configured for cosine distance
        self.qdrant_client.recreate_collection(
            collection_name=self.collection_name,
            vectors_config=models.VectorParams(
                size=COHERE_SIZE_VECTOR, distance=models.Distance.COSINE
            ),
        )
        self.co_client = cohere.Client(api_key=cohere_api_key)

    # Qdrant requires data in float format
    def _float_vector(self, vector: List[float]) -> List[float]:
        return list(map(float, vector))

    # Embedding using Cohere Embed model
    def _embed(self, text: str) -> List[float]:
        return self.co_client.embed(texts=[text]).embeddings[0]

    # Prepare Qdrant Points: one point per record, with a random UUID as its id
    def _qdrant_format(self, data: List[Dict[str, str]]):
        points = [
            models.PointStruct(
                id=uuid.uuid4().hex,
                payload={"key": point["key"], "desc": point["desc"]},
                vector=self._float_vector(self._embed(point["desc"])),
            )
            for point in data
        ]
        return points

    # Index data
    def index(self, data: List[Dict[str, str]]):
        """
        data: list of dict with keys: "key" and "desc"
        """
        points = self._qdrant_format(data)
        result = self.qdrant_client.upsert(
            collection_name=self.collection_name, points=points
        )
        return result

    # Search using text query
    def search(self, query_text: str, limit: int = 3):
        query_vector = self._embed(query_text)
        return self.qdrant_client.search(
            collection_name=self.collection_name,
            query_vector=self._float_vector(query_vector),
            limit=limit,
        )
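
The collection is created with Distance.COSINE, so the score attached to each hit reflects the angle between the query vector and a stored vector, not their lengths. As a rough illustration of what that measures, here is a plain-Python sketch of cosine similarity. It is a stand-alone example, not how Qdrant computes it internally, and the cosine_similarity helper is my own name.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of a and b divided by the product of their lengths."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))  # same direction -> 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 3.0]))  # orthogonal -> 0.0
```

Vectors pointing the same way score 1.0 regardless of magnitude, which is why this metric suits text embeddings where direction carries the meaning.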

Let's use our code!

Let's read the data from the data.json file, process it and index it. Then we can run a search and get the top 3 results from our database!
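
The script for this step is missing from the page, so the following is only a sketch of how the pieces could fit together. It assumes a client object with the index and search methods of the SearchClient class in this tutorial, and it relies on each Qdrant hit exposing payload and score attributes. The helper names load_records and index_and_search are hypothetical, and the query text is left for you to choose.

```python
import json
from typing import Any, Dict, List, Tuple


def load_records(path: str) -> List[Dict[str, str]]:
    """Read the list of {"key", "desc"} records from a JSON file."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def index_and_search(
    client: Any, records: List[Dict[str, str]], query_text: str, limit: int = 3
) -> List[Tuple[str, float]]:
    """Index the records, run one text query, return (key, score) pairs."""
    client.index(records)
    hits = client.search(query_text, limit=limit)
    return [(hit.payload["key"], hit.score) for hit in hits]


# With the SearchClient class from this tutorial, a run could look like:
#   records = load_records("data.json")
#   for key, score in index_and_search(SearchClient(), records, "<your query>"):
#       print(key, score)
```

Passing the client in as an argument keeps the flow easy to try with a stand-in object before spending API calls on the real services.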

RESULTS!

The first line of the output shows that the index operation went well. We got 3 results, as we requested. The first one is (as expected) the Giraffe. We also got the Hippopotamus and the Rhino. They are also huge, but I think the Giraffe is the tallest 😆.

I got it, and…what’s next?

To practice your Qdrant skills, I recommend building an API that lets your application index data, add new records and search. I think FastAPI would be a good fit for that!

And if you want to go wild with your new skills, I recommend using them to build an AI-based application during the Cohere x Qdrant AI Hackathon this weekend!
