Aiven Blog

Feb 17, 2021

Discover exoplanets with PostgreSQL® sample data

In this post, Lorna Mitchell populates an empty PostgreSQL database for testing with cool data about exoplanets. Read to learn how!

lorna-mitchell

Lorna Mitchell

|RSS Feed

Developer Advocate at Aiven

PostgreSQL is and remains one of our most popular and growing storage platforms; other storage technologies come and go but modern Postgres is a solid choice for so many applications. When you spin up your first Aiven PostgreSQL, you'll want to take some time to play with the features ... but there's a problem. Your new shiny database is empty.

Finding and using some open datasets is a great way to fill this gap, and one option is go try the Kaggle platform. It's a place to find open data, advice about data science, and some competitions you can participate in to hone your skills. There's quite a selection of datasets to choose from, but today we'll be using the exoplanets data from the Kepler mission. You'll need a (free) account to log in and download the data. Go ahead and extract the zip file too, we'll be using cumulative.csv for the example in this post.

Here's the process in brief:

Get started with Aiven

If you are not already an Aiven user, you can sign up for an Aiven account to follow the steps in this post - we will wait for you!

We will also be using the Aiven CLI. This tool requires Python 3.6 or later, and can be installed from PyPI:

pip install aiven-client

You will also need to authenticate your Aiven account against the CLI tool. Replace your own details in the command below:

avn user login <email@example.com>

You have everything you need to create an Aiven database in the cloud.

Create PostgreSQL service

A good first step for a new project is to create a project to keep the services in. All it needs is a name:

avn project create exoplanets

Aiven offers many options when creating services but to get us going quickly, we'll use the newest postgres available and the smallest package, called hobbyist. One of the most fun things though is being able to choose any cloud platform you like so take a moment to check the list and copy the CLOUD_NAME field of your favorite:

avn cloud list

I chose google-europe-west1 for my example, but you can replace that with the cloud you chose. Here is the command to run to create the postgres database:

avn service create -t pg -p hobbyist --cloud google-europe-west1 pg-exoplanets

It takes a few minutes for the node to be ready, but the Aiven CLI has a handy "wait" command that doesn't return until the service is ready to talk to us. This is less critical when we're running the commands by hand as we do here, but it's super useful when your CI system is spinning up the data platforms by itself!

avn service wait

When the command returns, our PostgreSQL cluster is ready to use. Let's create a database to hold the sample data; the command below creates one named "exoplanets":

avn service database-create --dbname exoplanets pg-exoplanets

Now we have our own sad and empty database, let's look at the sample data and get it imported.

Adding CSV data to PostgreSQL

PostgreSQL has built-in support for importing CSV data into an existing table, but we don't have the table structure, just a CSV. Luckily there's a tool for that - ddlgenerator is another Python commandline tool.

Here's how to install the ddlgenerator tool and then generate the CREATE TABLE statement from the CSV we downloaded earlier:

pip install ddlgenerator ddlgenerator postgres cumulative.csv > create.sql

Have a look inside the file and you will see that we have the structure we need to explain to PostgreSQL how to hold the data. The avn service cli command will give us a psql prompt on the new database:

avn service cli pg-exoplanets

From within psql we can connect to the database we created, and then run the SQL file to create the table structure:

\c exoplanets \i create.sql

Adding the final piece to the puzzle, and still from the psql prompt, the next command brings in the CSV data:

\copy cumulative from data/cumulative.csv csv header

Nice work! The cumulative table should now have some data for you to play with!

Dreaming of Exoplanets

Now you have a database full of measurements of exoplanets taken by the Kepler Space Telescope. If you're not already familiar with the project, the NASA mission page is worth a read. The mission went into a second phase when one of the controls failed, which serves to remind us that engineering systems we can see and touch, or at least ssh into, is much easier gig than operating in space!

You can explore the dataset, which describes observations and compares the Kepler assessment of each exoplanet with its official status in the pre-existing literature. For example, try this to see the false-positives identified by Kepler:

select kepler_name, koi_pdisposition from cumulative where koi_disposition = 'CONFIRMED' and koi_pdisposition = 'FALSE POSITIVE';

You can also connect this data to other tools to use the dataset further. Either grab the connection details from the web console, or use jq with avn for a one-liner:

avn service get pg-exoplanets --json | jq ".service_uri"

Wrapping up

Good cloud experimentation practice suggests that if you've finished with your exoplanets database, you can delete it:

avn service terminate pg-exoplanets

For even more fun and learning, how about one of these resources:

Not using Aiven services yet? Sign up now for your free trial at https://console.aiven.io/signup!

In the meantime, make sure you follow our changelog and blog RSS feeds or our LinkedIn and Twitter accounts to stay up-to-date with product and feature-related news.


Subscribe to the Aiven newsletter

All things open source, plus our product updates and news in a monthly newsletter.

Related resources