Many organizations use PostgreSQL® to store their data and to perform advanced analytics, taking advantage of the consistency guarantees, the familiar SQL syntax and the plethora of analytical functions available. However, when solving problems that require Machine Learning capabilities, like similarity search, they were forced to move entire datasets to other platforms, facing the dual challenges of stale data and inconsistent results.
Today, we're thrilled to announce the availability of pgvector as an extension to Aiven for PostgreSQL®, empowering organizations to store vectorized representations of their data alongside other data columns, and to perform similarity searches with SQL. pgvector allows you to bridge the gap between Machine Learning and transactional data.
Bringing vectorized search to Aiven for PostgreSQL® means unlocking new capabilities on top of customer data. For example, with pgvector you can:
Create a recommendation system which only suggests products in stock with PostgreSQL's ACID features by joining product information and stock levels, then performing a vector search.
Build an anomaly detection system to understand patterns, by combining the power of vector search and trend analytics, defining good behaviors and generating alerts for non-compliant scenarios.
Build Natural Language Processing pipelines to deliver tailored insights based on prompts, by taking advantage of the combination of Large Language Models (LLMs) and pgvector. This would allow enhanced search of a company’s existing documentation via natural language interaction.
Perform activities like object detection and facial recognition, by leveraging image detection machine learning models and pgvector. This could include filtering the catalog of object patterns with additional inputs derivied from other data columns in PostgreSQL
Whether you are in e-commerce, media, healthcare, or any other industry, pgvector empowers you to add new capabilities, enhance recommendation systems, and make more informed decisions based on vectorized data. Examples range from product recommendations, frequently used in e-commerce scenarios, to image recognition, which is very common in media and healthcare use cases.
The basic functionality of pgvector is the storage and comparison of vector information in PostgreSQL. A vector representation is a way of representing data as a set of numerical values, where each value represents a feature or attribute of the data. In this representation, the data is encapsulated as a vector, which is a mathematical object that has both magnitude and direction.
Vector representations are widely used in machine learning and natural language processing, where they are called embeddings and used to represent words, phrases, sentences, images and documents. In these applications, each dimension of the vector corresponds to a specific feature of the text, such as the frequency of a particular word or the context in which it appears.
Vector representations are often used for tasks such as classification, clustering, and similarity matching. By representing data as vectors, it becomes possible to apply mathematical operations to the data, such as measuring the distance between two vectors or calculating the dot product of two vectors. These operations can be used to compare and analyze the data, and to extract meaningful insights from it.
pgvecor makes PostgreSQL able to store, retrieve and compare vectors allowing you to perform similarity search to identify items or data points that share intrinsic characteristics or similarities with a given query item. By comparing the query item to a dataset using a defined distance metric, you can identify the most similar items efficiently. pgvector also allows you to build full or partial indexes to speed up similarity search and to trade query performance with search accuracy by defining the amount of rows to scan.
By using pgvector, you are adding vector search to the PostgreSQL capabilities, removing the need to build the dedicated functionality elsewhere. This extension provides the following capabilities:
Accelerate machine learning adoption: vector search is now only a SQL query away, therefore all tools able to connect with PostgreSQL can now benefit from similarity search queries out of the box.
No data movement: by storing embeddings alongside the existing relational data, the queries can run directly on top of PostgreSQL, without the need to export, transform and reconcile the data.
Empower existing architectures: PostgreSQL is already widely used in organizations, with pgvector coming as a plug-and-play extension allowing for any dataset to be addressed with vector search functionality without having additional databases.
Mix vector search with transactional data: vector search alone can be quite limited. The power of pgvector is that it provides the vector similarity functions alongside the relational data, allowing users to enrich vector queries with additional attributes and filters based on other columns and tables stored in the same database.
ACID + similarity search: since it's based on PostgreSQL, you can benefit its ACID (Atomicity, Consistency, Isolation, and Durability) properties and mix them with vector similarity queries.
pgvector is supported in all new Aiven for PostgreSQL instances with version 13 or later. If you have an existing Aiven for PostgreSQL on version 13 or newer and you wish to enable pgvector, check for maintenance updates. After applying the one mentioning pgvector, you'll be able to use the extension.
Once the PostgreSQL instance is equipped with pgvector, you can enable the extension by connecting with a SQL client (e.g. psql) and executing the following command
CREATE EXTENSION VECTOR
With the extension enabled you can now define columns of type
VECTOR specifying the dimensionality. The following example creates a table with a vector column having 3 dimensions.
CREATE TABLE COLORS(COLOR TEXT PRIMARY KEY, RGB VECTOR(3));
You can now insert data in the
COLORS table with:
INSERT INTO COLORS VALUES ('RED','[255,0,0]'); INSERT INTO COLORS VALUES ('GREEN','[0,255,0]'); INSERT INTO COLORS VALUES ('BLUE','[0,0,255]'); INSERT INTO COLORS VALUES ('AQUA','[0,255,255]'); INSERT INTO COLORS VALUES ('BLACK','[0,0,0]'); INSERT INTO COLORS VALUES ('WHITE','[255,255,255]');
And find the euclidean distance between the existing colors in the
COLORS table and the pink color
select *, rgb <-> '[255,153,204]' distance from colors order by 3;
The result shows that
WHITE is the closest color to the
color | rgb | distance --------+-----------------+-------------------- WHITE | [255,255,255] | 114.03946685248927 RED | [255,0,0] | 255 AQUA | [0,255,255] | 279.3385043276347 BLUE | [0,0,255] | 301.72006893808043 GREEN | [0,255,0] | 342.11840055746785 BLACK | [0,0,0] | 360.62445840513925 (6 rows)
Aiven for PostgreSQL is available on all major cloud service providers - AWS, Google Cloud and Microsoft Azure - in over 150 regions globally.
The pgvector extension is available for all customers who are using Aiven for PostgreSQL with PostgreSQL version 13 and above.
You can find all extensions available for Aiven for PostgreSQL in this document.
You can start Aiven for PostgreSQL for free or choose startup, business and premium plans for more storage and high availability. The startup plan starts at $0.151/hour.
pgvector enables vector search on top of any PostgreSQL database, providing a wide range of similarity functions accessible with the same SQL interface we use for day to day queries.
If you want to know more:
Review the pgvector documentation to understand the step-by-step instructions on enabling and using the pgvector extension for your Aiven for PostgreSQL service.
Check the pgvector GitHub repository to understand the set of operators and functions available.
Check out a practical demonstration of pgvector: Image recognition with Python, OpenCV, OpenAI CLIP and pgvector
Jun 9, 2022
Join us for a browse about the ins and outs of PostgreSQL® on this page of information and resources.
Nov 8, 2022
Parsing semi-structured dataset in a relational database seems scary. Read on for how PostgreSQL® JSON functions allow your SQL queries to work with json and jsonb data.
Aug 3, 2022
Aiven now supports the newly released Apache Kafka version 3.2, which comes with a number of enhancements. Read on to find out what it means for you.