Aiven Blog

May 20, 2021

Analyzing Netflix shows with pgAdmin and PostgreSQL

Learn to use pgAdmin with PostgreSQL by... watching movies? Find out more!

Francesco Tisiot


Field CTO at Aiven

PostgreSQL is the perfect database for a huge variety of business cases, from IoT-type fast insertions to bulk analytics workloads. We even make it possible to explore exoplanet data!

Database interaction can be performed at various levels: command line tools are great for people glued to their keyboards... but some people prefer a slick point-and-click UI that shows all the information without having to write tons of commands. In this blog post, we'll look at how to use Aiven for PostgreSQL with pgAdmin, one of the most popular tools for administration and development.

Here's what this post will cover:

  1. Aiven PostgreSQL service creation, since you can only query a database if it exists
  2. How to install pgAdmin and connect to a PostgreSQL database
  3. How to load a Netflix dataset into PostgreSQL via pgAdmin
  4. How to check Samuel L. Jackson statistics via pgAdmin queries

1. Create a PostgreSQL database

If you don't have a PostgreSQL database ready, why not create one with Aiven's console? Here's how:

  • In the Aiven Console, click + Create a new service
  • As Product, select PostgreSQL
  • Select a cloud provider, a region and a plan.
    Once the service is created, you can find the details of your database by double-clicking on the instance name. The Overview tab shows information like Database Name, Host and Port, together with the pre-created avnadmin user and its password.

We just need to wait a couple of minutes until the Nodes lights turn green, indicating that our PostgreSQL service is ready to use.

2. Install and Connect pgAdmin

I chose pgAdmin because it's one of the most popular options among PostgreSQL client tools. It can run either as a web or desktop application, and for the purposes of this blog post we'll choose the latter. You can download it from the pgAdmin website, and the installation just takes a couple of clicks.

Once installed and launched, the tool asks you to set a Master Password. This is a good way to secure all the credentials we're going to store.

Now it's time to connect to PostgreSQL. To do that, click Create New Server and fill in the required parameters. On the General tab, set the connection name; on the Connection tab, enter the hostname, port, and maintenance database, as well as the username and password used to authenticate. (All this information is available in Aiven Console on the Overview tab.) Finally, on the SSL tab, set SSL mode to Require; this is Aiven's default security method.
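The same connection details can also be written as a single PostgreSQL connection URI, which is handy if you later want to reuse them with command-line tools. Here's a minimal sketch using only the Python standard library. The build_pg_uri helper is hypothetical, and the host, port and password are placeholders, not real Aiven values.

```python
from urllib.parse import quote, urlencode


def build_pg_uri(user, password, host, port, dbname, sslmode="require"):
    """Assemble a libpq-style connection URI from the pgAdmin dialog fields."""
    # Percent-encode credentials so characters like '@' or '/' stay unambiguous.
    auth = f"{quote(user, safe='')}:{quote(password, safe='')}"
    query = urlencode({"sslmode": sslmode})
    return f"postgres://{auth}@{host}:{port}/{quote(dbname, safe='')}?{query}"


# Placeholder values: copy the real ones from the Overview tab in Aiven Console.
uri = build_pg_uri(
    user="avnadmin",
    password="s3cr3t/p@ss",
    host="pg-demo.example.com",
    port=12345,
    dbname="defaultdb",
)
print(uri)
```

The sslmode=require parameter plays the same role as the SSL setting in the pgAdmin dialog.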

We should now be able to access our PostgreSQL default screen. It contains a set of visualisations around the number of sessions, transactions per second, in and out tuples and server activity. Very useful information for monitoring our database.

pgAdmin offers a wide set of administration features, from managing local database backups to Schema Diff, a useful tool for comparing DDL across databases.

3. Load Netflix data

pgAdmin is not only a monitoring and administration tool. It also offers a development experience, so we can use it to create all kinds of database objects via the GUI or SQL. For example, you can create a table, load a CSV file into it and start querying it.

We'll use some data about Netflix shows, taken from Kaggle. You only need a free account to download it. Once the archive is unzipped, we'll use the file netflix_titles.csv for the rest of this post.

We have the dataset, but what about the table structure? We can generate the table DDL with the handy ddlgenerator tool, as explained in a previous post. All we have to do is run the following from a terminal window:

pip install ddlgenerator
ddlgenerator postgres ~/Downloads/netflix_titles.csv

The second command should produce the following output. Copy it, paste it into the pgAdmin query editor (available under Tools -> Query Tool) and run it.

CREATE TABLE netflix_titles (
    show_id VARCHAR(5) NOT NULL,
    type VARCHAR(7) NOT NULL,
    title VARCHAR(104) NOT NULL,
    director VARCHAR(208),
    _cast VARCHAR(771),
    country VARCHAR(123),
    date_added TIMESTAMP WITHOUT TIME ZONE,
    release_year BIGINT NOT NULL,
    rating VARCHAR(8),
    duration VARCHAR(10) NOT NULL,
    listed_in VARCHAR(79) NOT NULL,
    description VARCHAR(248) NOT NULL
);
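Where might numbers like VARCHAR(104) come from? One plausible approach is to scan the file and record the longest value in each column. The sketch below illustrates that idea with the Python standard library; it's an assumption about the general technique, not a description of how ddlgenerator actually works, and the sample rows are made up.

```python
import csv
import io


def max_column_widths(csv_text):
    """Return the longest value length seen in each column, keyed by header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    widths = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name in widths:
            # Missing or empty fields count as zero-length values.
            widths[name] = max(widths[name], len(row.get(name) or ""))
    return widths


# Tiny made-up sample, not rows from the real dataset.
sample = (
    "show_id,type,title\n"
    "s1,Movie,Example Title\n"
    "s2,TV Show,Another\n"
)
widths = max_column_widths(sample)
print(widths)
```

Sizing columns to the current data means a later load with longer values would fail, so you may prefer to widen the columns or use TEXT before importing a newer version of the file.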

Browse the object tree in the pgAdmin left-side panel to defaultdb -> Schemas -> public -> Tables, and you should see a table named netflix_titles.

To load the csv file into the netflix_titles table, do the following:

  1. Right-click on the netflix_titles table and select Import/Export.
  2. Set Import in the Import/Export slider.
  3. Select the netflix_titles.csv file from the local computer.
  4. Enable the Header flag.
  5. Click OK.

You can see your import thread being started and populating the netflix_titles table. Once finished, pgAdmin shows the text Successfully completed.
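If the import fails, a common cause is a row whose number of fields doesn't match the table. Here's a small sketch of a pre-import check, assuming the column order of the CREATE TABLE statement above. The find_bad_rows helper and the sample rows are made up for illustration.

```python
import csv
import io

# Column order taken from the CREATE TABLE statement.
TABLE_COLUMNS = [
    "show_id", "type", "title", "director", "_cast", "country",
    "date_added", "release_year", "rating", "duration", "listed_in",
    "description",
]


def find_bad_rows(csv_file, expected_columns):
    """Return (line_number, field_count) for rows that don't fit the table."""
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header row, as the Header flag does
    bad = []
    for row in reader:
        if len(row) != len(expected_columns):
            bad.append((reader.line_num, len(row)))
    return bad


# Made-up rows: the first has 12 fields, the second only 3.
sample = io.StringIO(
    "show_id,type,title,director,cast,country,date_added,release_year,"
    "rating,duration,listed_in,description\n"
    's1,Movie,Example,,"Actor One, Actor Two",,,2020,PG,90 min,Dramas,Made-up row\n'
    "s2,Movie,Broken row\n"
)
bad_rows = find_bad_rows(sample, TABLE_COLUMNS)
print(bad_rows)
```

To check the real file, pass open("netflix_titles.csv", newline="", encoding="utf-8") instead of the in-memory sample.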

4. Have fun with the data!

Now that we've loaded the data, we can have some fun querying it. Let's create a new query editor and check which were the top 3 countries based on movie production according to the dataset with the following query:

select country, count(*) nr_movies
from netflix_titles
group by country
order by 2 desc
limit 3;

The result, unsurprisingly, puts the United States on top, followed by India - and then Null, probably due to gaps in the data collection.

Now it's time to explore the filmography of one of my favourite actors: Samuel L. Jackson. Which actor was in the most movies with him?

select trim(s.actor) actor, count(*) nr_movies
from netflix_titles nt,
     unnest(string_to_array(nt._cast, ',')) s(actor)
where _cast like '%Samuel L. Jackson%'
  and trim(s.actor) <> 'Samuel L. Jackson'
group by trim(s.actor)
order by 2 desc
limit 10;

The query above uses string_to_array to split the _cast field, which contains a comma-separated list of actors for each movie, and the unnest function to explode the resulting array into one row per actor. The result tells us that Tim Roth and Walton Goggins are the two lucky actors who shot three movies with Samuel L. Jackson.
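If unnest is new to you, it may help to see the same logic written as a plain loop. This is a rough Python equivalent of the query, run on a few made-up cast lists rather than the real dataset; the co_star_counts helper is hypothetical.

```python
from collections import Counter


def co_star_counts(cast_lists, star):
    """Count how often each other actor shares a cast list with `star`."""
    counts = Counter()
    for cast in cast_lists:
        # Mirrors: where _cast like '%<star>%' (NULL casts never match).
        if cast is None or star not in cast:
            continue
        # Mirrors: unnest(string_to_array(_cast, ',')), one item per actor.
        for actor in cast.split(","):
            name = actor.strip()  # mirrors trim(), also drops tabs/newlines
            if name != star:
                counts[name] += 1
    return counts


# Made-up cast lists for illustration only.
casts = [
    "Samuel L. Jackson, Actor A, Actor B",
    "Actor A, Samuel L. Jackson",
    "Actor B, Actor C",
    None,
]
top = co_star_counts(casts, "Samuel L. Jackson").most_common(2)
print(top)
```

Each cast list turns into several (movie, actor) pairs, which is exactly what unnest does on the database side before the group by counts them.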

Finally, what are the 10 most used words (with more than 3 characters) in movie titles?

select title_word, count(*) nr_movies
from netflix_titles nt,
     unnest(string_to_array(upper(nt.title), ' ')) s(title_word)
where char_length(title_word) > 3
group by title_word
order by 2 desc
limit 10;

Again, the unnest function over the title field enables us to verify that, as expected, LOVE is the most frequently used word. But run this query yourself to find out which surprising word takes third place!
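One thing to keep in mind: splitting on a single space leaves punctuation attached to words, so LOVE and LOVE, are counted separately. The Python sketch below mimics the query on three made-up titles to show the effect; the title_word_counts helper is hypothetical.

```python
from collections import Counter


def title_word_counts(titles, min_length=4):
    """Count upper-cased, space-separated words of at least min_length chars."""
    counts = Counter()
    for title in titles:
        # Mirrors: unnest(string_to_array(upper(title), ' '))
        for word in title.upper().split(" "):
            if len(word) >= min_length:  # mirrors: char_length(title_word) > 3
                counts[word] += 1
    return counts


# Made-up titles for illustration only.
word_counts = title_word_counts(["Love Story", "Love, Actually", "Endless love"])
print(word_counts.most_common())
```

If you want such variants merged, you could strip punctuation before splitting, either in Python or in SQL.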

pgAdmin is a perfect fit for PostgreSQL

pgAdmin offers a great UI for PostgreSQL: monitoring the database, comparing changes across instances, managing local backups, querying, importing and exporting data. The tool offers a range of features that make developers' and administrators' lives easier, and daily tasks more accessible.

Wrapping up

Not using Aiven services yet? Sign up now for your free trial at https://console.aiven.io/signup!

In the meantime, make sure you follow our changelog and blog RSS feeds or our LinkedIn and Twitter accounts to stay up-to-date with product and feature-related news.
