TensorFlow, PostgreSQL®, PGVector & Next.js: building a movie recommender
Learn how to use TensorFlow, PostgreSQL®, PGVector, and Next.js for vector search with this step-by-step video guide.
Here you'll find the instructions to build a movie recommendation system. Each step has a corresponding video that shows in detail what needs to be done. The complete working project can be found in the GitHub repository.
You'll find the original dataset on Kaggle. It contains metadata about each movie (title, release year, etc.) as well as plot descriptions from Wikipedia. The original is in CSV format, but we'll be working with JSON. You can download the dataset in JSON format here.
Download and install Node.js here.
Install dependencies for TensorFlow. Make sure that the path does not include spaces or special characters (tfjs-node is very picky):
Loading code...
Loading code...
Installing these in order is important; otherwise, you might run into peer-dependency issues.
In the root project directory, create the encoder.js file.
Include these dependencies:
Loading code...
Add code to get embeddings for a single movie:
Loading code...
Run:
Loading code...
Note: though we don't use the output from require('@tensorflow/tfjs-node'); directly, do not remove this line, as TensorFlow needs it to work correctly.
To host your PostgreSQL service for free in the cloud, use Aiven for PostgreSQL®. To get an extra $100 in credits when signing up with Aiven, use this link.
To use Aiven for PostgreSQL with pgAdmin, click on Quick Connect and choose Connect with pgAdmin. You'll see the steps that you need to perform and a link to download the pgConnect.json file. Open pgAdmin, import a new server, and select the downloaded pgConnect.json file.
Enable PGVector:
Loading code...
Create a table:
Loading code...
Install node-postgres:
Loading code...
Install dotenv to store credentials:
Loading code...
Create an .env file and add the following connection information:
Loading code...
Download the ca.pem certificate from the Aiven console.
Add both .env and ca.pem to .gitignore.
In encoder.js include:
Loading code...
and
Loading code...
Add the PostgreSQL connection configuration as well:
Loading code...
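Because every field of the connection configuration comes from .env, a typo there only shows up later as a connection error. A minimal sketch of a fail-fast check follows. The helper name readPgEnv and the example values are assumptions for illustration and are not part of the original project; only the PG_* variable names and the defaultdb database name come from this guide.

```javascript
// Hypothetical helper, not part of the original project.
// Reads the PG_* variables used in this guide and fails fast if any is missing.
const REQUIRED_VARS = ['PG_NAME', 'PG_PASSWORD', 'PG_HOST', 'PG_PORT'];

function readPgEnv(env) {
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  return {
    user: env.PG_NAME,
    password: env.PG_PASSWORD,
    host: env.PG_HOST,
    port: Number(env.PG_PORT), // environment values are always strings
    database: 'defaultdb',
  };
}

// Example with a stand-in object instead of process.env (placeholder values):
const example = readPgEnv({
  PG_NAME: 'user',
  PG_PASSWORD: 'secret',
  PG_HOST: 'localhost',
  PG_PORT: '5432',
});
console.log(example.port); // prints 5432
```

In encoder.js, the call would presumably be readPgEnv(process.env) after require('dotenv').config(), with the ssl section added to the returned object.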
Create the client, connect it to PostgreSQL and send a test SQL request:
Loading code...
To generate and send a multi-row insert query, we'll use pg-promise. Install it with:
Loading code...
Include pg-promise in encoder.js:
Loading code...
Add the following code to send a multi-row insert query to PostgreSQL:
Loading code...
db.none executes a query that expects no data to be returned.
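The insert code passes each embedding as a string built with the `[${...}]` template, because PGVector accepts vectors as text in the form [1,2,3]. A small sketch of that conversion, with an added length check against the vector(512) column. The helper name toVectorLiteral and the check itself are assumptions for illustration, not part of the original code.

```javascript
// Hypothetical helper mirroring the `[${embedding}]` template used in storeInPG.
function toVectorLiteral(embedding, expectedDim = 512) {
  if (embedding.length !== expectedDim) {
    throw new Error(`Expected ${expectedDim} dimensions, got ${embedding.length}`);
  }
  // Interpolating an array into a template string joins its items with commas.
  return `[${embedding}]`;
}

// Toy three-dimensional vector, just to show the resulting text.
const literal = toVectorLiteral([0.5, -1, 2], 3);
console.log(literal); // prints [0.5,-1,2]
```

Checking the length before the insert gives a clearer error than letting the database reject a row whose vector does not match the column definition.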
Next, load the model and iterate over all movies to get their embeddings with TensorFlow.
We'll divide data into batches for faster processing:
Loading code...
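The batching itself is independent of TensorFlow and PostgreSQL. As a rough sketch, the slicing logic can be isolated like this. The function name toBatches is an assumption for illustration; the original code does the slicing inline in its loop.

```javascript
// Hypothetical helper: splits an array into consecutive chunks of at most batchSize items.
function toBatches(items, batchSize) {
  const batches = [];
  for (let start = 0; start < items.length; start += batchSize) {
    // slice() clamps the end index, so the last batch may be shorter.
    batches.push(items.slice(start, start + batchSize));
  }
  return batches;
}

const batches = toBatches([1, 2, 3, 4, 5, 6, 7], 3);
console.log(batches.map((batch) => batch.length)); // prints [ 3, 3, 1 ]
```

Each batch then needs only one model.embed call and one multi-row insert, instead of one of each per movie.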
To execute the code that we wrote and send data to PostgreSQL, run:
Loading code...
Create the recommender.js file and include dependencies:
Loading code...
Connect to PostgreSQL:
Loading code...
We'll be looking for "a lot of cute puppies". Generate an embedding for the test string and use PGVector to find the closest suggestions among the movies we have in the database:
Loading code...
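The <-> operator in the query is PGVector's Euclidean (L2) distance, so ORDER BY embedding <-> ... LIMIT 5 returns the five rows whose vectors are closest to the query vector. To make that concrete, here is an in-memory sketch of the same idea. The function names and the two-dimensional toy rows are made up for illustration; the real embeddings have 512 dimensions and the real search runs inside PostgreSQL.

```javascript
// Euclidean (L2) distance between two vectors of equal length.
function l2Distance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i];
    sum += diff * diff;
  }
  return Math.sqrt(sum);
}

// In-memory equivalent of "ORDER BY embedding <-> query LIMIT limit".
function nearest(query, rows, limit) {
  return rows
    .map((row) => ({ ...row, distance: l2Distance(query, row.embedding) }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, limit);
}

// Toy data, not taken from the movie dataset.
const rows = [
  { title: 'A', embedding: [0, 0] },
  { title: 'B', embedding: [3, 4] },
  { title: 'C', embedding: [1, 1] },
];
const top = nearest([1, 1], rows, 2);
console.log(top.map((row) => row.title)); // prints [ 'C', 'A' ]
```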
Run to get the results:
Loading code...
Find out more about Next.js at https://nextjs.org/. Create a project with:
Loading code...
We'll be using the following settings:
Loading code...
Once the project is installed, navigate to the folder where it is located, or open it in your preferred IDE.
Before we can use TensorFlow and PostgreSQL, we need to install them:
Loading code...
Loading code...
Loading code...
Additionally, add a dependency for dotenv, to simplify the work with credentials:
Loading code...
Create a .env file and add the following placeholders for the properties that we need to define:
Loading code...
Go to the page of your Aiven for PostgreSQL service, copy User, Password, Host, and Port from the connection information tab, and add them to the appropriate fields above.
Download ca.pem and add it to a folder named /certificates.
Add both .env and /certificates to .gitignore.
Loading code...
Start the server with:
Loading code...
Open localhost:3000 to see the landing page. Open localhost:3000/api/hello to see a test backend API call.
Declare a movie type by creating movie.d.ts and adding the following:
Loading code...
Rename the existing pages/api/hello.ts API Route to pages/api/recommendations.ts.
Add dependencies to pages/api/recommendations.ts:
Loading code...
Create the connection configuration for PostgreSQL:
Loading code...
Add a handler to process the requests:
Loading code...
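The handler calls use.load() on every request, which repeats the expensive model initialization each time. One possible refinement is to cache the promise returned by the loader so that later requests reuse it. The sketch below is an assumption and not part of the original project: createLazyLoader is a hypothetical helper, and the fake loader stands in for use.load().

```javascript
// Hypothetical helper: caches the promise returned by an expensive loader.
function createLazyLoader(load) {
  let cached = null;
  return function get() {
    if (cached === null) {
      cached = load(); // start loading once; every caller shares this promise
    }
    return cached;
  };
}

// Stand-in for use.load(); the real call initializes the TensorFlow model.
let loadCalls = 0;
const getModel = createLazyLoader(async () => {
  loadCalls += 1;
  return { name: 'fake-model' };
});

const first = getModel();
const second = getModel();
console.log(first === second, loadCalls); // prints true 1
```

In the handler, const model = await getModel(); would take the place of await use.load(), assuming module-level state survives between requests in your deployment.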
Open pages/index.tsx and delete the existing layout and dependencies, as we won't need them. Instead, add this code to connect to the API Route /api/recommendations:
Loading code...
Add a simple layout to input a search phrase and see the results:
Loading code...
We'll add some styling with Tailwind CSS.
Find tailwind.config.ts in your Next.js project and update it with:
Loading code...
In index.tsx, replace the form element with this section:
Loading code...
To style the list of movies and add a loading indicator, replace the existing movie list with the following. The snippet expects an isLoading state flag that is true while a request is in flight:
Loading code...
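The loading indicator depends on an isLoading flag that has to be switched on before the request and off once it finishes, even when the request fails. A framework-free sketch of that pattern follows. The helper name withLoading is an assumption for illustration; in the page component, the setter would be the one returned by useState.

```javascript
// Hypothetical helper: toggles a loading flag around any async task.
async function withLoading(setIsLoading, task) {
  setIsLoading(true);
  try {
    return await task();
  } finally {
    // Runs on success and on failure, so the spinner never gets stuck.
    setIsLoading(false);
  }
}

// Stand-in setter that records every state change instead of updating React state.
const states = [];
const pending = withLoading((value) => states.push(value), async () => 'done');
pending.then(() => console.log(states)); // prints [ true, false ]
```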
You can find the complete index.tsx in the GitHub repository.
npm install @tensorflow-models/universal-sentence-encoder --save

npm install @tensorflow/tfjs-node --save

const fs = require("fs");
require('@tensorflow/tfjs-node');
const use = require('@tensorflow-models/universal-sentence-encoder');
const moviePlots = require("./movie-plots.json");

use.load().then(async model => {
    const sampleMoviePlot = moviePlots[0];
    const embeddings = await model.embed(sampleMoviePlot['Plot']);
    console.log(embeddings.arraySync());
});

node encoder.js

CREATE EXTENSION vector;

CREATE TABLE movie_plots (
    title VARCHAR,
    director VARCHAR,
    "cast" VARCHAR,
    genre VARCHAR,
    plot TEXT,
    "year" SMALLINT,
    wiki VARCHAR,
    embedding vector(512)
);

npm install pg --save

npm install dotenv --save

PG_NAME=
PG_PASSWORD=
PG_HOST=
PG_PORT=

require('dotenv').config();

const pg = require('pg');

const config = {
    user: process.env.PG_NAME,
    password: process.env.PG_PASSWORD,
    host: process.env.PG_HOST,
    port: process.env.PG_PORT,
    database: "defaultdb",
    ssl: {
        rejectUnauthorized: true,
        ca: fs.readFileSync('./ca.pem').toString(),
    },
};

// CommonJS files cannot use top-level await, so the test query runs inside an async function.
(async () => {
    const client = new pg.Client(config);
    await client.connect();
    try {
        const pgResponse = await client.query(`SELECT count(*) FROM movie_plots`);
        console.log(pgResponse.rows);
    } catch (err) {
        console.error(err);
    } finally {
        await client.end();
    }
})();

npm install pg-promise --save

const pgp = require('pg-promise')({
    capSQL: true // capitalize all generated SQL
});
const db = pgp(config);

const storeInPG = async (moviePlots) => {
    const columns = new pgp.helpers.ColumnSet(['title', 'director', 'plot', 'year', 'wiki', 'cast', 'genre', 'embedding'], {table: 'movie_plots'});
    const values = [];
    for (let i = 0; i < moviePlots.length; i++) {
        values.push({
            title: moviePlots[i]['Title'],
            director: moviePlots[i]['Director'],
            plot: moviePlots[i]['Plot'],
            year: moviePlots[i]['Release Year'],
            cast: moviePlots[i]['Cast'],
            genre: moviePlots[i]['Genre'],
            wiki: moviePlots[i]['Wiki Page'],
            embedding: `[${moviePlots[i]['embedding']}]`
        });
    }
    const query = pgp.helpers.insert(values, columns);
    await db.none(query);
};

use.load().then(async model => {
    const batchSize = 1000;
    for (let start = 0; start < moviePlots.length; start += batchSize) {
        const end = Math.min(start + batchSize, moviePlots.length);
        console.log(`Processing items from ${start} till ${end}.`);
        const movieBatch = moviePlots.slice(start, end);
        const plotDescriptions = movieBatch.map(plot => plot['Plot']);
        const embeddingsRequest = await model.embed(plotDescriptions);
        const embeddings = embeddingsRequest.arraySync();
        for (let i = 0; i < movieBatch.length; i++) {
            movieBatch[i]['embedding'] = embeddings[i];
        }
        await storeInPG(movieBatch);
    }
});

node encoder.js

require('dotenv').config();
const fs = require('fs');
const pg = require('pg');
require('@tensorflow/tfjs-node');
const use = require('@tensorflow-models/universal-sentence-encoder');

const config = {
    user: process.env.PG_NAME,
    password: process.env.PG_PASSWORD,
    host: process.env.PG_HOST,
    port: process.env.PG_PORT,
    database: "defaultdb",
    ssl: {
        rejectUnauthorized: true,
        ca: fs.readFileSync('./ca.pem').toString(),
    },
};

use.load().then(async model => {
    const embeddings = await model.embed("a lot of cute puppies");
    const embeddingArray = embeddings.arraySync()[0];
    const client = new pg.Client(config);
    await client.connect();
    try {
        const pgResponse = await client.query(`SELECT * FROM movie_plots ORDER BY embedding <-> '${JSON.stringify(embeddingArray)}' LIMIT 5;`);
        console.log(pgResponse.rows);
    } catch (err) {
        console.error(err);
    } finally {
        await client.end();
    }
});

node recommender.js

npx create-next-app@latest

What is your project named? what-to-watch
Would you like to use TypeScript? No / *Yes*
Would you like to use ESLint? *No* / Yes
Would you like to use Tailwind CSS? No / *Yes*
Would you like to use `src/` directory? *No* / Yes
Would you like to use App Router? (recommended) *No* / Yes
Would you like to customize the default import alias? *No* / Yes

npm install @tensorflow-models/universal-sentence-encoder --save

npm install @tensorflow/tfjs-node --save

npm install pg --save

npm install dotenv --save

PG_NAME=
PG_PASSWORD=
PG_HOST=
PG_PORT=

.env
/certificates

npm run dev

declare type Movie = {
    title: string,
    director: string,
    cast: string,
    genre: string,
    plot: string,
    year: number,
    wiki: string,
    embedding: number[]
}
export default Movie;

const {readFileSync} = require('fs');
const pg = require('pg');
const tf = require('@tensorflow/tfjs-node');
const use = require('@tensorflow-models/universal-sentence-encoder');

const config = {
    user: process.env.PG_NAME,
    password: process.env.PG_PASSWORD,
    host: process.env.PG_HOST,
    port: process.env.PG_PORT,
    database: "defaultdb",
    ssl: {
        rejectUnauthorized: true,
        ca: readFileSync('./certificates/ca.pem').toString(),
    },
};

export default async function handler(
    req: NextApiRequest,
    res: NextApiResponse<Movie[]>
) {
    const model = await use.load();
    const embeddings = await model.embed(req.body.search);
    const embeddingArray = embeddings.arraySync()[0];
    const client = new pg.Client(config);
    await client.connect();
    try {
        const pgResponse = await client.query(`SELECT * FROM movie_plots ORDER BY embedding <-> '${JSON.stringify(embeddingArray)}' LIMIT 5;`);
        res.status(200).json(pgResponse.rows);
    } catch (err) {
        console.error(err);
        // Always send a response; otherwise the request hangs when the query fails.
        res.status(500).json([]);
    } finally {
        await client.end();
    }
}
// At the top of pages/index.tsx:
import {type FormEvent, useRef, useState} from 'react';

// Inside the page component:
const [moviePlots, setMoviePlots] = useState<Movie[]>([]);
const searchInput = useRef<HTMLInputElement>(null);

function search(event: FormEvent<HTMLFormElement>) {
    event.preventDefault();
    const enteredSearch = searchInput.current?.value;
    fetch('/api/recommendations', {
        method: 'POST',
        body: JSON.stringify({
            search: enteredSearch
        }),
        headers: {
            'Content-Type': 'application/json'
        }
    }).then(response => response.json()).then(data => {
        setMoviePlots(data);
    });
}

return (
    <>
        <form onSubmit={search}>
            <input type="search" id="default-search" ref={searchInput} autoComplete="off"
                   placeholder="Type what do you want to watch about" required/>
            <button type="submit">
                Search
            </button>
        </form>
        <div>
            {moviePlots.map(item =>
                <div key={item.title}>
                    {item.director}
                    {item.year}
                    {item.title}
                    {item.wiki}
                </div>)}
        </div>
    </>
)

module.exports = {
    content: [
        './pages/**/*.{js,ts,jsx,tsx,mdx}',
        './components/**/*.{js,ts,jsx,tsx,mdx}',
        './app/**/*.{js,ts,jsx,tsx,mdx}',
    ],
    theme: {
        extend: {
            colors: {
                veryDarkBlue: '#1B262C',
                darkBlue: '#0F4C75',
                lightBlue: '#3282B8',
                veryLightBlue: '#BBE1FA',
            },
            fontFamily: {
                sans: ['Poppins', 'sans-serif']
            },
            spacing: {
            },
        },
    },
    plugins: [],
}

<section id="shorten">
    <div className="max-w-4xl mx-auto p-6 space-y-6">
        <form onSubmit={search}>
            <label htmlFor="default-search"
                   className="mb-2 text-sm font-medium sr-only text-white">Search</label>
            <div className="relative">
                <div className="absolute inset-y-0 left-0 flex items-center pl-3 pointer-events-none">
                    <svg className="w-4 h-4 text-gray-400" aria-hidden="true"
                         xmlns="http://www.w3.org/2000/svg" fill="none" viewBox="0 0 20 20">
                        <path stroke="currentColor" strokeLinecap="round" strokeLinejoin="round"
                              strokeWidth="2" d="m19 19-4-4m0-7A7 7 0 1 1 1 8a7 7 0 0 1 14 0Z"/>
                    </svg>
                </div>
                <input type="search" id="default-search" ref={searchInput} autoComplete="off"
                       className="block w-full p-4 pl-10 text-sm border rounded-lg bg-gray-700 border-gray-600 placeholder-gray-400 text-white focus:ring-blue-500 focus:border-blue-500"
                       placeholder="Type what do you want to watch about" required/>
                <button type="submit"
                        className="text-white absolute right-2.5 bottom-2.5 focus:ring-4 focus:outline-none font-medium rounded-lg text-sm px-4 py-2 bg-lightBlue hover:bg-darkBlue focus:ring-blue-800">Search
                </button>
            </div>
        </form>
    </div>
</section>

<div className="flex gap-8 flex-wrap flex-col grow shrink items-start mx-24">
    {isLoading ? (<div className="flex justify-center items-center h-32 w-32 mx-auto">
        {/* Embedding the SVG loading indicator */}
        <svg
            className="animate-spin h-6 w-6 text-white"
            xmlns="http://www.w3.org/2000/svg"
            fill="none"
            viewBox="0 0 24 24"
        >
            <circle
                className="opacity-25"
                cx="12"
                cy="12"
                r="10"
                stroke="currentColor"
                strokeWidth="4"
            ></circle>
            <path
                className="opacity-75"
                fill="currentColor"
                d="M4 12a8 8 0 018-8V0C5.373 0 0 5.373 0 12h4zm2 5.291A7.962 7.962 0 014 12H0c0 3.042 1.135 5.824 3 7.938l3-2.647z"
            ></path>
        </svg>
    </div>) : moviePlots.map(item =>
        <div key={item.title}
             className="relative p-10 rounded-xl inline-block justify-start rounded-lg shadow-[0_2px_15px_-3px_rgba(0,0,0,0.07),0_10px_20px_-2px_rgba(0,0,0,0.04)] bg-darkBlue items-start">
            <div className="text-6xl absolute top-4 right-4 opacity-80">🍿</div>
            <div>
                <h4 className="opacity-90 text-xl">From {item.director}</h4>
                <p className="opacity-50 text-sm">Year {item.year}</p>
            </div>
            <h1 className="text-4xl mt-6">{item.title}</h1>
            <p className="relative mt-6 text opacity-80 italic">
                {item.plot}
            </p>
            <div>
                <p className="opacity-50 text-sm mt-6">
                    <a
                        href={item.wiki}
                        className="underline decoration-transparent transition duration-300 ease-in-out hover:decoration-inherit"
                    >{item.wiki}</a>
                </p>
            </div>
        </div>)}
</div>