Welcome to our workshop!
Thank you so much for taking part in our hands-on workshop: Build a movie recommendation app with TensorFlow and pgvector.
We hope you will join us LIVE at the appointed time. What follows are written instructions for those keeners who would like to jump ahead, and for those newer folks who hit bumps during the workshop and need help getting caught up.
If you want to have a look beforehand, here is the GitHub repo.
A tutorial following similar steps is also available in Aiven’s Developer Center at TensorFlow, PostgreSQL®, PGVector & Next.js: building a movie recommender.
Pre-requisites
To get the most out of your workshop experience, we recommend doing the following ahead of time:
- Get signed in to Aiven Console
- Set up a GitHub Codespace
Get signed in to Aiven Console
(If you prefer a video walkthrough, see Set up PG service with Aiven from the repo’s README.md file)
- Head to https://go.aiven.io/signup-movie-workshop
- If you’ve used Aiven before, go ahead and log in. You’re done with this step!
- If not, sign up through whichever method you’d like!
- If you chose email, you’ll need to click the link sent to you to validate your email address
- Once logged in, you’ll be asked to enter some additional information. Choose either Personal or Business, whichever applies to you, and specify the Name of your first project on Aiven.
- Once at your project’s Services screen, you can close your browser; we’ll do this part during the workshop.
Spoiler alert: we’re going to create a PostgreSQL service here.
Set up a GitHub Codespace
GitHub Codespaces offer an entire development environment running in the cloud, accessible from your browser, including Visual Studio Code and a Terminal. We’ll use this tool to eliminate issues with individual machines during the workshop, and ensure we’re all using the same versions of all the things so commands work correctly.
As mentioned before, we have all this nifty code ready to go for you in our workshop GitHub repo.
- From the repo’s README.md file, click the “Open in GitHub Codespaces” button to begin.
- Leave the settings at their default options and click Create codespace.
(Note: You shouldn’t need to worry about getting charged as a result of this workshop, as the core hours and storage required should fit well within GitHub’s monthly included quota.)
- Once completed, you should see an interface that looks something like the following:
- If you ever lose this window and need to get back here again, you can view your full list of GitHub Codespaces from https://github.com/codespaces
NOTE: By default, these will end up with doofy auto-generated names like “probable space orbit” … feel free to click the “3-dots” menu > Rename and give it something a bit more intelligible.
Code overview
Once the workshop repo is open in GitHub Codespaces, you can use the file browser under “Explorer” on the left to inspect the code.
If you drill into the part1-core folder, some notable files in here are:
- package.json: This contains all of our dependencies along with which versions we’re downloading, including PostgreSQL, TensorFlow (including its Universal Sentence Encoder model), and some helper utilities.
- movie-plots.json: A copy of the Wikipedia Movie Plots dataset from Kaggle. (Note this file is BIG, around 80 MB.)
- encode-single-movie-plot.js: Encodes a single movie plot using the TensorFlow library and its sentence encoder model to act as a test run before you run process-all-movies.js to import everything.
- pg-commands.js: A set of helper commands for connecting to PostgreSQL from JavaScript.
- .env-example: A secure way to store your PostgreSQL connection credentials from Aiven. (Needs to be renamed to just .env and your specific settings added, see instructions below)
- search-for-nearest-vectors.js: Performs a similarity search for a given test phrase (which defaults to “a lot of cute puppies” ) to find the best matches in the dataset.
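To get a feel for what the similarity search does, here is a minimal sketch in plain JavaScript. It is not the code from search-for-nearest-vectors.js: in the workshop, pgvector does the comparison inside PostgreSQL, and the real embeddings are far longer. The movie titles, the three-number vectors, and the function names cosineDistance and nearest are all made up for illustration, and cosine distance is just one of the distance measures pgvector supports.

```javascript
// Illustrative sketch only: hypothetical titles and tiny hand-made vectors
// stand in for real movie plot embeddings.

// Cosine distance: 0 means "pointing the same way", larger means less similar.
function cosineDistance(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k rows whose embeddings are closest to the query embedding.
function nearest(query, rows, k) {
  return rows
    .map((row) => ({
      title: row.title,
      distance: cosineDistance(query, row.embedding),
    }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, k);
}

// Hypothetical data, not taken from movie-plots.json.
const movies = [
  { title: "Puppy Parade", embedding: [0.9, 0.1, 0.0] },
  { title: "Space Heist", embedding: [0.0, 0.2, 0.9] },
  { title: "Kitten Capers", embedding: [0.7, 0.6, 0.1] },
];

// Pretend this is the embedding of the search phrase.
const queryEmbedding = [1.0, 0.2, 0.0];

const results = nearest(queryEmbedding, movies, 2);
console.log(results.map((r) => r.title));
```

The idea carries over directly: every movie plot and the search phrase become vectors, and the best matches are the rows with the smallest distance to the query.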
Following along with the workshop
Once your pre-requisites are all set, you can now follow the instructions under Part 2: Core Functionality in the README. Copy/paste the command line prompts from there into the Terminal of GitHub Codespaces.
Note: You’ll need to click “Allow” on a pop-up the first time you paste to allow it to do this.
FAQs / Troubleshooting
When I run a command, I get back just [] … what gives?
That just means the command didn’t return any output, which will happen for a few of them (e.g. node pg-commands.js enablePGVector just runs the query CREATE EXTENSION vector; so there’s no return value).
I get errors when trying to connect to PostgreSQL, what gives?
Connecting securely to PostgreSQL requires a couple of different steps (this is covered in the workshop as well, but restating for completeness):
- From GitHub Codespaces, in your part1-core folder, rename the .env-example file to .env
- From your PostgreSQL Service Overview page in Aiven Console, copy and paste the following values into the file:
- User = PG_NAME
- Password = PG_PASSWORD
- Host = PG_HOST
- Port = PG_PORT
- On the same page, click the Download icon next to CA Certificate and save your ca.pem file to, for example, your desktop.
- Click and drag the file into your GitHub Codespaces file explorer (also in the part1-core folder)
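If the connection still fails, a quick sanity check on the .env file can help. The sketch below is a hypothetical helper, not part of the workshop repo: it assumes the file uses plain KEY=value lines and only checks that the four keys listed above have a value. The sample values are placeholders.

```javascript
// Hypothetical helper, not from the workshop repo.
// The four key names come from the steps above; everything else is assumed.
const REQUIRED_KEYS = ["PG_NAME", "PG_PASSWORD", "PG_HOST", "PG_PORT"];

// Parse simple KEY=value lines, skipping blank lines and # comments.
function parseEnv(text) {
  const values = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq === -1) continue; // not a KEY=value line
    values[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return values;
}

// List the required keys that are absent or left empty.
function missingKeys(values) {
  return REQUIRED_KEYS.filter((key) => !values[key]);
}

// Placeholder contents standing in for a half-finished .env file.
const sample = [
  "PG_NAME=example-user",
  "PG_PASSWORD=",
  "PG_HOST=example-host.invalid",
  "# PG_PORT not filled in yet",
].join("\n");

console.log(missingKeys(parseEnv(sample)));
```

To check your real file instead of the sample, you could read it with Node's fs.readFileSync(".env", "utf8") and pass the result to parseEnv.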
I got some errors about memory usage during the movie import. HALP!
If you see errors like the following during the movie import:
2024-06-11 20:22:53.626423: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 16477184 exceeds 10% of free system memory.
… don’t stress! “W” just means “Warning” — as long as you also see a bunch of lines like the following, you’re good to go.
...
Processing starting from 600 with the step 150 of total amount 34886.
Processing starting from 750 with the step 150 of total amount 34886.
Processing starting from 900 with the step 150 of total amount 34886.
...
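Those progress lines come from working through the dataset in batches rather than all at once. Here is a rough sketch of that pattern, using the step (150) and total (34886) from the log above. The function names are made up, and the real process-all-movies.js may be structured differently.

```javascript
// Illustrative sketch of batch processing; function names are hypothetical.

// Yield the starting index of each batch: 0, step, 2 * step, ...
function* batchStarts(total, step) {
  for (let start = 0; start < total; start += step) {
    yield start;
  }
}

// Build a progress line in the same format as the log output above.
function progressLine(start, step, total) {
  return `Processing starting from ${start} with the step ${step} of total amount ${total}.`;
}

const TOTAL = 34886; // total amount reported in the log above
const STEP = 150; // step reported in the log above

const starts = [...batchStarts(TOTAL, STEP)];
console.log(progressLine(starts[4], STEP, TOTAL));
console.log(`Number of batches: ${starts.length}`);
```

Each batch only holds a slice of the movies in memory at a time, which is why the import can keep making progress even while TensorFlow prints memory warnings.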
Oopsie doodle… I forgot to say “Yes” or “No” correctly to one of the Next.js app questions… what now?
Under Next.js project setup there are some specific directions on what to answer during a series of questions as npx create-next-app@latest is running. If you accidentally mess one of them up, simply delete and re-create the part2-fullstack folder and run the command again.
After doing the TypeScript part, my IDE has a bunch of angry red underlines under some things, why might that be?
Things might be okay, but do double-check that all the various files you’re creating/copying are in the correct place, which would be:
/part2-fullstack/movie-recommender:
- .env
- ca.pem
- movie.d.ts
/part2-fullstack/movie-recommender/pages/api:
- recommendations.ts
Hit [your-codespaces-url]/api/recommendations and if you get a bunch of JSON barf, you’re golden!
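If you would rather not eyeball the file tree, a small script can do the double-checking. This is a hypothetical helper, not something shipped in the workshop repo. It assumes you run it from the repo root, and the file list is copied from the layout above.

```javascript
// Hypothetical check script, not part of the workshop repo.
const fs = require("node:fs");
const path = require("node:path");

// Expected files, relative to the repo root (taken from the list above).
const EXPECTED_FILES = [
  "part2-fullstack/movie-recommender/.env",
  "part2-fullstack/movie-recommender/ca.pem",
  "part2-fullstack/movie-recommender/movie.d.ts",
  "part2-fullstack/movie-recommender/pages/api/recommendations.ts",
];

// Return the expected files that do not exist under repoRoot.
function findMissingFiles(repoRoot, expected = EXPECTED_FILES) {
  return expected.filter((rel) => !fs.existsSync(path.join(repoRoot, rel)));
}

// Assumes the current working directory is the repo root.
const missing = findMissingFiles(process.cwd());
if (missing.length === 0) {
  console.log("All expected files are in place.");
} else {
  console.log("Missing files:\n" + missing.map((f) => "- " + f).join("\n"));
}
```

Anything it lists as missing is a good first suspect for those red underlines.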
Ok, wiseguy, what if when I go to that URL I have an error?
Such as, for example, this:
This can happen if you stepped away for a while and came back and your Codespace had to be restarted. To fix, run the following from your part2-fullstack/movie-recommender directory:
npm run dev
This restarts the application and you can re-open it in Codespace’s browser to continue where you left off.