Tensorflow, Postgres, PGVector & Next.js: building a movie recommender

A step-by-step guide, with videos, to vector search using Tensorflow, Postgres, PGVector, and Next.js.

Here you'll find the instructions to build a movie recommendation system. Each step has a corresponding video that shows
in detail what needs to be done. The complete working project can be found in the GitHub repository.

[Video introduction to this guide](TODO once videos are uploaded)

Step 1. Creating Vector Embeddings: Tensorflow universal-sentence-encoder and Node.js

[Video for Step 1](TODO once videos are uploaded)

Dataset

You'll find the original dataset with Wikipedia movie plots on Kaggle. It is in CSV format, but we'll be working with JSON; you can download the dataset converted to JSON over here.

New to NodeJS?

Download and install it from https://nodejs.org/en/download.

Add dependencies

Install dependencies for Tensorflow. Make sure that your path does not include spaces or unusual characters (tfjs-node is very picky):

npm install @tensorflow-models/universal-sentence-encoder --save
npm install @tensorflow/tfjs-node --save

The order is important; otherwise you might run into peer-dependency issues.

Add encoder

Create encoder.js file.

Include dependencies:

const fs = require("fs");
require('@tensorflow/tfjs-node');
const use = require('@tensorflow-models/universal-sentence-encoder');
const moviePlots = require("./movie-plots.json");

Add code to get embeddings for a single movie:

use.load().then(async model => {
    const sampleMoviePlot = moviePlots[0];
    const embeddings = await model.embed(sampleMoviePlot['Plot']);
    console.log(embeddings.arraySync());
});

Run:

node encoder.js

Note: even though we don't use the output from require('@tensorflow/tfjs-node'); directly, do not remove this
line. It is needed for Tensorflow to work correctly.
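It can help to check the shape of what arraySync() returns before moving on. The sketch below is plain JavaScript with no Tensorflow dependency. checkEmbeddings is a hypothetical helper (not part of any library), and it assumes the encoder returns one row of 512 numbers per input text, which is the size the embedding column is declared with in Step 2.

```javascript
// Hypothetical helper: verifies that a nested array looks like the output of
// embeddings.arraySync(), i.e. one row per input text and 512 numbers per row.
function checkEmbeddings(rows, expectedDim = 512) {
  if (!Array.isArray(rows) || rows.length === 0) {
    throw new Error('Expected a non-empty array of embedding rows');
  }
  rows.forEach((row, i) => {
    if (!Array.isArray(row) || row.length !== expectedDim) {
      throw new Error(`Row ${i} does not have ${expectedDim} values`);
    }
    if (!row.every((value) => Number.isFinite(value))) {
      throw new Error(`Row ${i} contains a non-numeric value`);
    }
  });
  return { count: rows.length, dimensions: expectedDim };
}

// Stand-in data shaped like the result for a single plot.
const fakeEmbeddings = [new Array(512).fill(0.01)];
console.log(checkEmbeddings(fakeEmbeddings)); // { count: 1, dimensions: 512 }
```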

Step 2. Cloud-Hosted free PostgreSQL setup: create Table, enable PGVector

Video for Step 2

Create service pg-movie-app

To host your Postgres service for free in the cloud, use Aiven for PostgreSQL. To get an extra $100 in credits when signing up with Aiven, use this link.

Test with pgAdmin

To use Aiven for Postgres with pgAdmin, click on Quick Connect and choose to connect with pgAdmin. You'll see the steps that you need to perform and a link to download pgConnect.json. Open pgAdmin, import a new server and select the downloaded pgConnect.json.

Enable PGVector:

CREATE EXTENSION vector;

Create a table:

CREATE TABLE movie_plots (
    title VARCHAR,
    director VARCHAR,
    "cast" VARCHAR,
    genre VARCHAR,
    plot TEXT,
    "year" SMALLINT,
    wiki VARCHAR,
    embedding vector(512)
);

Connect with NodeJS

Install node-postgres:

npm install pg --save

Install dotenv to store credentials:

npm install dotenv --save

Create .env file and add the following connection information:

PG_NAME=
PG_PASSWORD=
PG_HOST=
PG_PORT=

Download the ca.pem certificate.
Add both .env and ca.pem to .gitignore.

Send request to PG from NodeJS

In encoder.js include

require('dotenv').config();

and

const pg = require('pg');

Add Postgres connection configuration:

const config = {
    user: process.env.PG_NAME,
    password: process.env.PG_PASSWORD,
    host: process.env.PG_HOST,
    port: process.env.PG_PORT,
    database: "defaultdb",
    ssl: {
        rejectUnauthorized: true,
        ca: fs.readFileSync('./ca.pem').toString(),
    },
};

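Create the client, connect to Postgres and send a test SQL request. Since await can't be used at the top level of a CommonJS file, the calls are wrapped in an async function:

const testConnection = async () => {
    const client = new pg.Client(config);
    await client.connect();
    try {
        const pgResponse = await client.query(`SELECT count(*) FROM movie_plots`);
        console.log(pgResponse.rows);
    } catch (err) {
        console.error(err);
    } finally {
        await client.end();
    }
};
testConnection().catch(console.error);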
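A missing value in .env tends to surface only as a confusing connection error. One way to fail earlier is to check the variables before building the config. This is a minimal sketch: readPgEnv is a hypothetical helper, and the variable names are the four from the .env file above.

```javascript
// Hypothetical helper: reads the four connection settings from an
// environment-like object and fails early if any of them is missing.
const REQUIRED_VARS = ['PG_NAME', 'PG_PASSWORD', 'PG_HOST', 'PG_PORT'];

function readPgEnv(env = process.env) {
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing settings in .env: ${missing.join(', ')}`);
  }
  const port = Number(env.PG_PORT);
  if (!Number.isInteger(port) || port <= 0) {
    throw new Error(`PG_PORT must be a positive integer, got "${env.PG_PORT}"`);
  }
  return { user: env.PG_NAME, password: env.PG_PASSWORD, host: env.PG_HOST, port };
}

// Example with made-up values instead of real credentials.
const settings = readPgEnv({
  PG_NAME: 'example-user',
  PG_PASSWORD: 'example-password',
  PG_HOST: 'db.example.com',
  PG_PORT: '5432',
});
console.log(settings.host, settings.port); // db.example.com 5432
```

The returned object could be spread into the config shown above, next to the database and ssl entries.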

Step 3. Efficiency: Batch Tensorflow vector generation and data insertion with pg-promise multiple rows

Video for Step 3

Add pg-promise

To generate and send a multi-row insert query, we'll use pg-promise. Install it with:

npm install pg-promise --save

Include pg-promise in encoder.js:

const pgp = require('pg-promise')({
    capSQL: true // capitalize all generated SQL
});
const db = pgp(config);

Add the following code to send a multi-row insert query to Postgres:

const storeInPG = async (moviePlots) => {
    const columns = new pgp.helpers.ColumnSet(
        ['title', 'director', 'plot', 'year', 'wiki', 'cast', 'genre', 'embedding'],
        {table: 'movie_plots'}
    );
    const values = [];
    for (let i = 0; i < moviePlots.length; i++) {
        values.push({
            title: moviePlots[i]['Title'],
            director: moviePlots[i]['Director'],
            plot: moviePlots[i]['Plot'],
            year: moviePlots[i]['Release Year'],
            cast: moviePlots[i]['Cast'],
            genre: moviePlots[i]['Genre'],
            wiki: moviePlots[i]['Wiki Page'],
            embedding: `[${moviePlots[i]['embedding']}]`
        });
    }
    const query = pgp.helpers.insert(values, columns);
    await db.none(query);
};

db.none executes a query that expects no data to be returned.
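The embedding value in storeInPG relies on JavaScript turning an array into a comma-separated string inside a template literal, which produces the [x,y,z] text form that PGVector accepts for a vector value. A slightly more defensive version of that conversion might look like the sketch below. toVectorLiteral is a hypothetical helper, not part of pg-promise or PGVector.

```javascript
// Hypothetical helper: converts an array of numbers into the '[x,y,z]' text
// form used for a vector value, rejecting anything that is not a finite number.
function toVectorLiteral(values) {
  if (!Array.isArray(values) || values.length === 0) {
    throw new Error('Expected a non-empty array of numbers');
  }
  for (const value of values) {
    if (typeof value !== 'number' || !Number.isFinite(value)) {
      throw new Error('Vector components must be finite numbers');
    }
  }
  return `[${values.join(',')}]`;
}

console.log(toVectorLiteral([0.25, -0.5, 1])); // [0.25,-0.5,1]

// The plain template literal gives the same text for well-formed input.
console.log(`[${[0.25, -0.5, 1]}]` === toVectorLiteral([0.25, -0.5, 1])); // true
```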

Tensorflow and batch processing

Next, load the model and iterate over all movies to get their embeddings with Tensorflow.
We'll divide the data into batches for faster processing:

use.load().then(async model => {
    const batchSize = 1000;
    for (let start = 0; start < moviePlots.length; start += batchSize) {
        const end = Math.min(start + batchSize, moviePlots.length);
        console.log(`Processing items from ${start} till ${end}.`);
        const movieBatch = moviePlots.slice(start, end);
        const plotDescriptions = movieBatch.map(plot => plot['Plot']);
        const embeddingsRequest = await model.embed(plotDescriptions);
        const embeddings = embeddingsRequest.arraySync();
        for (let i = 0; i < movieBatch.length; i++) {
            movieBatch[i]['embedding'] = embeddings[i];
        }
        await storeInPG(movieBatch);
    }
});

Send the complete dataset with embeddings to PostgreSQL

To execute the code that we wrote and send data to Postgres, run:

node encoder.js
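The start/end arithmetic in the batching loop of this step can also be factored into a small generator, which keeps the embedding code free of index bookkeeping. This is a sketch in plain JavaScript: batches is a hypothetical name, and the batch size of 1000 used in the guide is replaced by 3 here only to keep the output short.

```javascript
// Hypothetical helper: yields consecutive slices of `items`, each with at
// most `size` elements, together with the index range they cover.
function* batches(items, size) {
  if (!Number.isInteger(size) || size <= 0) {
    throw new Error('Batch size must be a positive integer');
  }
  for (let start = 0; start < items.length; start += size) {
    const end = Math.min(start + size, items.length);
    yield { start, end, batch: items.slice(start, end) };
  }
}

const titles = ['A', 'B', 'C', 'D', 'E', 'F', 'G'];
for (const { start, end, batch } of batches(titles, 3)) {
  console.log(`Processing items from ${start} till ${end}:`, batch);
}
// Processing items from 0 till 3: [ 'A', 'B', 'C' ]
// Processing items from 3 till 6: [ 'D', 'E', 'F' ]
// Processing items from 6 till 7: [ 'G' ]
```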

Step 4. Contextual Search with PGVector: Node.js and Tensorflow Magic

Video for Step 4

Build recommendation logic

Create recommender.js and include dependencies:

require('dotenv').config();
const fs = require('fs');
const pg = require('pg');
require('@tensorflow/tfjs-node');
const use = require('@tensorflow-models/universal-sentence-encoder');

Connect to Postgres:

const config = {
    user: process.env.PG_NAME,
    password: process.env.PG_PASSWORD,
    host: process.env.PG_HOST,
    port: process.env.PG_PORT,
    database: "defaultdb",
    ssl: {
        rejectUnauthorized: true,
        ca: fs.readFileSync('./ca.pem').toString(),
    },
};

We'll be looking for "a lot of cute puppies". Generate an embedding for the test string and use PGVector to find the closest matches among the movies we have in the database:

use.load().then(async model => {
    const embeddings = await model.embed("a lot of cute puppies");
    const embeddingArray = embeddings.arraySync()[0];
    const client = new pg.Client(config);
    await client.connect();
    try {
        const pgResponse = await client.query(`SELECT * FROM movie_plots ORDER BY embedding <-> '${JSON.stringify(embeddingArray)}' LIMIT 5;`);
        console.log(pgResponse.rows);
    } catch (err) {
        console.error(err);
    } finally {
        await client.end()
    }
});

Run to get the results:

node recommender.js
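PGVector's <-> operator is Euclidean (L2) distance, so the query in this step sorts movies by how far their embedding is from the embedding of the search phrase and keeps the five closest. The sketch below reproduces that ordering in memory on made-up 3-dimensional vectors. It only illustrates what the query computes and is not how PGVector is implemented; l2Distance and nearest are hypothetical names.

```javascript
// Euclidean (L2) distance between two vectors of equal length.
function l2Distance(a, b) {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same number of dimensions');
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i];
    sum += diff * diff;
  }
  return Math.sqrt(sum);
}

// In-memory equivalent of: ORDER BY embedding <-> query LIMIT k
function nearest(rows, query, k) {
  return rows
    .map((row) => ({ ...row, distance: l2Distance(row.embedding, query) }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, k);
}

// Made-up toy data; real embeddings in this guide have 512 dimensions.
const movies = [
  { title: 'Movie A', embedding: [1, 0, 0] },
  { title: 'Movie B', embedding: [0, 1, 0] },
  { title: 'Movie C', embedding: [0.9, 0.1, 0] },
];

console.log(nearest(movies, [1, 0, 0], 2).map((m) => m.title));
// [ 'Movie A', 'Movie C' ]
```

PGVector also offers <=> for cosine distance and <#> for negative inner product, if a different notion of closeness suits the data better.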

Step 5. Next.js Project Setup: Postgres and Tensorflow Dependencies, testing backend

Video for Step 5

Get started with Next.js project

Find out more about Next.js at https://nextjs.org/. Create a project with:

npx create-next-app@latest

We'll be using the following settings (the selected answers are marked with asterisks):

What is your project named? what-to-watch
Would you like to use TypeScript? No / *Yes*
Would you like to use ESLint? *No* / Yes
Would you like to use Tailwind CSS? No / *Yes*
Would you like to use `src/` directory? *No* / Yes
Would you like to use App Router? (recommended) *No* / Yes
Would you like to customize the default import alias? *No* / Yes

Once the project is created, navigate to its folder or open it in your preferred IDE.

Add dependencies

Before we can use Tensorflow and PG, we need to install them:

npm install @tensorflow-models/universal-sentence-encoder --save
npm install @tensorflow/tfjs-node --save
npm install pg --save

Additionally, add dotenv as a dependency to simplify working with credentials:

npm install dotenv --save

Add PG credentials

Create a .env file and add the following placeholders for the properties that we need to define:

PG_NAME=
PG_PASSWORD=
PG_HOST=
PG_PORT=

Go to the service page of your Aiven for PostgreSQL and copy User, Password, Host and Port from the tab with the connection information.

Download ca.pem and add it to a /certificates folder.

Add both .env and /certificates to .gitignore:

.env
/certificates

Run

Start the server with:

npm run dev

Open localhost:3000 to see the landing page. Open localhost:3000/api/hello
to see a test backend api call.

Step 6. Nearest Vector Retrieval: Tensorflow universal-sentence-encoder and PGVector-Powered Queries in Next.js

Video for Step 6

Add an interface for a movie

Declare a movie type by creating a file movie.d.ts:

declare type Movie = {
    title: string,
    director: string,
    cast: string,
    genre: string,
    plot: string,
    year: number,
    wiki: string,
    embedding: number[]
}
export default Movie;

Add backend calls

Rename the existing API Route pages/api/hello.ts to pages/api/recommendations.ts (or just create a new one).

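Add dependencies to pages/api/recommendations.ts. The handler below uses the NextApiRequest and NextApiResponse types, so keep (or add) their import:

import type { NextApiRequest, NextApiResponse } from 'next';
const {readFileSync} = require('fs');
const pg = require('pg');
const tf = require('@tensorflow/tfjs-node');
const use = require('@tensorflow-models/universal-sentence-encoder');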

Create connect configuration for Postgres:

const config = {
    user: process.env.PG_NAME,
    password: process.env.PG_PASSWORD,
    host: process.env.PG_HOST,
    port: process.env.PG_PORT,
    database: "defaultdb",
    ssl: {
        rejectUnauthorized: true,
        ca: readFileSync('./certificates/ca.pem').toString(),
    },
};

Add a handler to process the requests:

export default async function handler(
    req: NextApiRequest,
    res: NextApiResponse<Movie[]>
) {
    const model = await use.load();
    const embeddings = await model.embed(req.body.search);
    const embeddingArray = embeddings.arraySync()[0];
    const client = new pg.Client(config);
    await client.connect();
    try {
        const pgResponse = await client.query(`SELECT * FROM movie_plots ORDER BY embedding <-> '${JSON.stringify(embeddingArray)}' LIMIT 5;`);
        res.status(200).json(pgResponse.rows)
    } catch (err) {
        console.error(err);
        res.status(500).end(); // always answer, so the request doesn't hang on errors
    } finally {
        await client.end()
    }
}
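The handler in this step trusts req.body.search and builds the SQL text by string interpolation. A slightly more defensive variant validates the input first and passes the vector as a query parameter. The sketch below only builds the pieces and does not touch the database: parseSearch and buildNearestQuery are hypothetical helpers, and the object returned by buildNearestQuery has the text/values shape that node-postgres accepts as a query config, so it could be passed to client.query.

```javascript
// Hypothetical helper: extracts a usable search phrase from a request body,
// or returns null when the body does not contain one.
function parseSearch(body) {
  if (!body || typeof body.search !== 'string') return null;
  const search = body.search.trim();
  return search.length > 0 ? search : null;
}

// Hypothetical helper: builds a parameterized nearest-neighbour query.
// The vector is sent as a parameter ($1) in its '[x,y,z]' text form
// instead of being spliced into the SQL string.
function buildNearestQuery(embedding, limit = 5) {
  return {
    text: 'SELECT * FROM movie_plots ORDER BY embedding <-> $1 LIMIT $2',
    values: [JSON.stringify(embedding), limit],
  };
}

console.log(parseSearch({ search: '  a lot of cute puppies ' })); // a lot of cute puppies
console.log(parseSearch({})); // null
console.log(buildNearestQuery([0.1, 0.2], 5).values); // [ '[0.1,0.2]', 5 ]
```

In the handler, a null result from parseSearch would be a natural place to answer with status 400 before loading the model.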

Step 7. Frontend Integration: Next.js Movie Recommender UI and calls to Tensorflow and PG

Video for Step 7

Open pages/index.tsx and remove the existing layout and dependencies; we won't need them. Instead, add this code to connect to the API Route /api/recommendations:

// at the top of pages/index.tsx
import { FormEvent, useRef, useState } from 'react';

// inside the page component
const [moviePlots, setMoviePlots] = useState<Movie[]>([]);
const searchInput = useRef<HTMLInputElement>(null);

function search(event: FormEvent) {
    event.preventDefault();
    const enteredSearch = searchInput.current?.value;
    fetch('/api/recommendations', {
        method: 'POST',
        body: JSON.stringify({ search: enteredSearch }),
        headers: { 'Content-Type': 'application/json' }
    }).then(response => response.json()).then(data => {
        setMoviePlots(data);
    });
}

Add a simple layout to input a search phrase and see the results:

return (
    <>
        <form onSubmit={search}>
            <input type="search" id="default-search" ref={searchInput} autoComplete="off"
                   placeholder="Type what you want to watch" required/>
            <button type="submit"> Search </button>
        </form>
        <div>
            {moviePlots.map(item =>
                <div key={item.title}>
                    {item.director} {item.year} {item.title} {item.wiki}
                </div>)}
        </div>
    </>
)
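The fetch call in this step assumes the request always succeeds. A sketch of a small wrapper that checks the HTTP status before parsing follows. fetchRecommendations is a hypothetical helper; the fetchImpl parameter is an assumption added so the function can be exercised without a running server, and in the browser it would default to the global fetch.

```javascript
// Hypothetical helper: posts the search phrase to the API route and returns
// the parsed list of movies, throwing if the server answers with an error.
async function fetchRecommendations(search, fetchImpl = fetch) {
  const response = await fetchImpl('/api/recommendations', {
    method: 'POST',
    body: JSON.stringify({ search }),
    headers: { 'Content-Type': 'application/json' },
  });
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}

// Stand-in for fetch that echoes the search phrase back as a movie title.
const fakeFetch = async (url, options) => ({
  ok: true,
  status: 200,
  json: async () => [{ title: `Result for ${JSON.parse(options.body).search}` }],
});

fetchRecommendations('cute puppies', fakeFetch).then((movies) => {
  console.log(movies[0].title); // Result for cute puppies
});
```

Inside the component, search could call this helper and pass the result to setMoviePlots, with a catch branch for showing an error message.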

Step 8. Polishing and Testing: Styling Movie Recommender UI with Tailwind CSS

Video for Step 8

We'll add some styling with Tailwind CSS.

Find tailwind.config.ts in your Next.js project and update it with:

module.exports = {
    content: [
        './pages/**/*.{js,ts,jsx,tsx,mdx}',
        './components/**/*.{js,ts,jsx,tsx,mdx}',
        './app/**/*.{js,ts,jsx,tsx,mdx}',
    ],
    theme: {
        extend: {
            colors: {
                veryDarkBlue: '#1B262C',
                darkBlue: '#0F4C75',
                lightBlue: '#3282B8',
                veryLightBlue: '#BBE1FA',
            },
        },
    },
    plugins: [],
}

In index.tsx, make the following changes. Replace the form element with this section:

<section>
    <div className="max-w-4xl mx-auto p-6 space-y-6">
        <form onSubmit={search}>
            <label htmlFor="default-search" className="mb-2 text-sm font-medium sr-only text-white">Search</label>
            <div className="relative">
                <div className="absolute inset-y-0 left-0 flex items-center pl-3 pointer-events-none">
                    <svg className="w-4 h-4 text-gray-400" aria-hidden="true" xmlns="http://www.w3.org/2000/svg" fill="none" viewBox="0 0 20 20">
                        <path stroke="currentColor" strokeLinecap="round" strokeLinejoin="round" strokeWidth="2" d="m19 19-4-4m0-7A7 7 0 1 1 1 8a7 7 0 0 1 14 0Z"/>
                    </svg>
                </div>
                <input type="search" id="default-search" ref={searchInput} autoComplete="off"
                       className="block w-full p-4 pl-10 text-sm border rounded-lg bg-gray-700 border-gray-600 placeholder-gray-400 text-white focus:ring-blue-500 focus:border-blue-500"
                       placeholder="Type what you want to watch" required/>
                <button type="submit"
                        className="text-white absolute right-2.5 bottom-2.5 focus:ring-4 focus:outline-none font-medium rounded-lg text-sm px-4 py-2 bg-lightBlue hover:bg-darkBlue focus:ring-blue-800">Search
                </button>
            </div>
        </form>
    </div>
</section>

To style the list of movies and add a loading indicator, replace the existing movie list with the markup below. It relies on an isLoading flag that the Step 7 code does not declare yet, for example const [isLoading, setIsLoading] = useState(false), set to true before the fetch call and back to false once the results arrive:

<div className="flex gap-8 flex-wrap flex-col grow shrink items-start mx-24">
    {isLoading ? (<div className="flex justify-center items-center h-32 w-32 mx-auto">
        {/* Embedding the SVG loading indicator */}
        <svg className="animate-spin h-6 w-6 text-white" xmlns="http://www.w3.org/2000/svg" fill="none" viewBox="0 0 24 24">
            <circle className="opacity-25" cx="12" cy="12" r="10" stroke="currentColor" strokeWidth="4"></circle>
            <path className="opacity-75" fill="currentColor" d="M4 12a8 8 0 018-8V0C5.373 0 0 5.373 0 12h4zm2 5.291A7.962 7.962 0 014 12H0c0 3.042 1.135 5.824 3 7.938l3-2.647z"></path>
        </svg>
    </div>) : moviePlots.map(item =>
        <div key={item.title} className="relative p-10 rounded-xl binline-block justify-start rounded-lg shadow-[0_2px_15px_-3px_rgba(0,0,0,0.07),0_10px_20px_-2px_rgba(0,0,0,0.04)] bg-darkBlue items-start">
            <div className="text-6xl absolute top-4 right-4 opacity-80">🍿</div>
            <div>
                <h4 className="opacity-90 text-xl">From {item.director}</h4>
                <p className="opacity-50 text-sm">Year {item.year}</p>
            </div>
            <h1 className="text-4xl mt-6">{item.title}</h1>
            <p className="relative mt-6 text opacity-80 italic"> {item.plot} </p>
            <div>
                <p className="opacity-50 text-sm mt-6">
                    <a href={item.wiki} className="underline decoration-transparent transition duration-300 ease-in-out hover:decoration-inherit">{item.wiki}</a>
                </p>
            </div>
        </div>)}
</div>
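The movie list uses item.title as the React key. If two movies in a result share a title (a remake, for example), React would warn about duplicate keys. A sketch of a more specific key follows, assuming the Movie fields declared in Step 6; movieKey is a hypothetical helper and the sample values are made up.

```javascript
// Hypothetical helper: combines several fields so that two movies with the
// same title but a different year or wiki page still get distinct keys.
function movieKey(movie) {
  return [movie.title, movie.year, movie.wiki].join('|');
}

const original = { title: 'Example', year: 1950, wiki: 'https://example.org/a' };
const remake = { title: 'Example', year: 2001, wiki: 'https://example.org/b' };

console.log(movieKey(original)); // Example|1950|https://example.org/a
console.log(movieKey(original) !== movieKey(remake)); // true
```

In the JSX it would be used as key={movieKey(item)}.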

You can find the complete index.tsx in the GitHub repository.

Final Verdict: PGVector, Tensorflow, Node.js, and Next.js - Success or Hiccup?

Video for Step 9