Jul 26, 2024

Don’t Buy the Hype: The GenAI Power You Already Have

Your database is more powerful than you think. Learn how built-in vector capabilities can power your GenAI applications and save you from the hassle of adopting a new database.

John Kennedy
Head of Databases, Product at Aiven

At the heart of Generative AI (GenAI) workloads is the ability of computers to categorize and understand the world's data (images, sounds, text) as numerical representations called vectors. This is achieved through a process called "embedding," in which a model translates the data into vectors.

Once the vectors representing the data are created, they need to be stored and searched efficiently to form the core of a GenAI engine. The ability to search for similarity or dissimilarity between these vectors allows a model to determine, for example, whether an image represents a muffin or a Chihuahua.
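To make "searching for similarity" concrete, here is a minimal sketch of exact (brute-force) nearest-neighbour search using cosine similarity. The labels and 3-dimensional vectors are made up for illustration; real embedding models produce vectors with hundreds or thousands of dimensions, and production systems usually replace the linear scan with an approximate index such as HNSW.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query: list[float], corpus: dict[str, list[float]], k: int = 1) -> list[str]:
    """Brute-force k-nearest-neighbour search: score every vector, keep the top k."""
    scored = [(cosine_similarity(query, vec), label) for label, vec in corpus.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [label for _, label in scored[:k]]


# Hypothetical toy embeddings of labelled images (values invented for this example).
corpus = {
    "muffin_photo_1": [0.9, 0.1, 0.0],
    "muffin_photo_2": [0.8, 0.2, 0.1],
    "chihuahua_photo_1": [0.1, 0.9, 0.3],
}

# Hypothetical embedding of a new, unlabelled image.
query = [0.85, 0.15, 0.05]

print(nearest(query, corpus, k=2))  # → ['muffin_photo_1', 'muffin_photo_2']
```

Because the query vector points in nearly the same direction as the two muffin vectors, both rank well above the Chihuahua, so the new image would be classified as a muffin.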

Vector databases are new, dedicated databases designed to efficiently store and search high-dimensional vector data. Due to their critical role in powering various GenAI applications, they have rapidly gained awareness and adoption in the tech ecosystem.

Organizations looking to use GenAI for productivity gains or product enhancement face a key decision: should they adopt a dedicated vector database or use the emerging vector capabilities within their existing data technologies?

While dedicated vector databases offer specialized functionality for high-dimensional data, existing data technologies are also incorporating vector capabilities. The best choice depends on specific needs, infrastructure, expertise, and the criticality of production environments.

There are two primary reasons why adopting a dedicated vector database isn't necessary for most users, and why existing vector capabilities may be a better fit.

First, ease of use.

The ROI of GenAI has yet to be proven in the vast majority of cases. Few implementations have reached production, and even fewer are expected to generate revenue within the next two years. The priority is to build knowledge and capability while managing risk, without overspending or overextending resources.

Many popular databases are developing vector capabilities; let's focus here on PostgreSQL. A widely adopted enterprise database, PostgreSQL gains efficient vector storage and similarity search through the pgvector extension. This extension covers roughly 80% of common GenAI use cases, offering a streamlined path for organizations to explore and implement AI-driven solutions. By using pgvector within PostgreSQL, teams can experiment with GenAI applications without a steep learning curve and identify the most impactful use cases. There are further benefits: because embeddings can live in the same tables as the data they describe, PostgreSQL's ACID guarantees keep the two consistent, and the database's existing security measures apply to AI workloads as well.
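As a rough sketch of what this looks like in practice, the Python below holds the pgvector SQL for a small document store, plus a helper that formats an embedding in pgvector's text input format (for example `[1.0,2.0,3.0]`). It assumes pgvector 0.5.0 or later (required for HNSW indexes); the `documents` table, its columns, and the 3-dimension size are hypothetical. `<=>` is pgvector's cosine-distance operator. The driver calls are shown only as comments, assuming psycopg 3.

```python
# Hypothetical schema: each row stores the source text and its embedding together,
# so a single transaction keeps them consistent (the ACID benefit described above).
SETUP_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS documents (
    id        bigserial PRIMARY KEY,
    body      text NOT NULL,
    embedding vector(3) NOT NULL  -- dimension must match your embedding model
);
CREATE INDEX IF NOT EXISTS documents_embedding_idx
    ON documents USING hnsw (embedding vector_cosine_ops);
"""

# Text and embedding are written in the same row.
INSERT_SQL = "INSERT INTO documents (body, embedding) VALUES (%s, %s::vector)"

# Nearest neighbours by cosine distance (smaller distance = more similar).
SEARCH_SQL = """
SELECT id, body, embedding <=> %s::vector AS cosine_distance
FROM documents
ORDER BY embedding <=> %s::vector
LIMIT %s
"""


def to_vector_literal(vec: list[float]) -> str:
    """Format a Python sequence as pgvector text input, e.g. '[1.0,2.0,3.0]'."""
    return "[" + ",".join(str(float(x)) for x in vec) + "]"


# Usage with psycopg 3 (not executed here; DSN is a placeholder):
#   import psycopg
#   with psycopg.connect(DSN) as conn:
#       conn.execute(SETUP_SQL)
#       conn.execute(INSERT_SQL, ("Blueberry muffin recipe", to_vector_literal([0.9, 0.1, 0.0])))
#       q = to_vector_literal([0.85, 0.15, 0.05])
#       rows = conn.execute(SEARCH_SQL, (q, q, 5)).fetchall()

print(to_vector_literal([1, 2, 3]))  # → [1.0,2.0,3.0]
```

Repeating the distance expression in `ORDER BY`, rather than sorting on the alias, mirrors the form shown in the pgvector documentation for index-assisted nearest-neighbour queries.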

PostgreSQL's extensibility has also led vendors to develop and open-source further extensions alongside pgvector, some of which are claimed to outperform certain dedicated vector databases.

Second, database development is complex and time-consuming.

Databases have been around a long time, and they have grown complex and rich with features that most companies now expect as standard. That maturity is simply the product of wide adoption sustained over many years.

New databases fill small niches well, but because they have been in use for a much shorter time, they lack many of the capabilities we have come to expect from an enterprise-grade database. One or two of the current crop will survive long enough to harden, but we expect most to fade away in the coming years as their adoption drops.

For an established database, adding a niche capability such as vector search is far easier than it is for a new database to harden and reach enterprise-grade feature parity. This is why mature database technologies will gain enterprise-ready vector capabilities before dedicated vector databases harden enough to meet enterprise needs.

Don't Overcomplicate GenAI: Start Simple.

Given the current state of dedicated vector databases and GenAI, I believe it's best to start simple. Look to the databases you already know and use, and investigate their increasingly mature vector capabilities before adopting a new, dedicated vector database. While I've focused on PostgreSQL here for simplicity, OpenSearch and ClickHouse are also highly capable at vector search.

In my conversations with customers, I've seen this path shorten their time to adoption. Even organizations that don't reach production increase their knowledge of GenAI, which will be critical in the coming years.

So, what's the next stage of evolution for GenAI capabilities in current database technologies?

In-Database Embedding: GenAI models will be hosted within dedicated nodes of the database cluster, allowing for embedding directly where the data resides.

Seamless AI Integration: Near-native SQL commands that call external AI services and third-party tools as the workload under development requires.

As vector capabilities continue to evolve rapidly, consider simplifying your GenAI implementation journey with proven technology. Stick with the database technologies you know and their built-in vector capabilities, rather than pivoting to a new, dedicated vector database. At least for now.
