(Originally published in March 2020.)
This post is all about PostgreSQL. We'll cover a few of the many things it does, how it does them, why it does them that way, and what’s unique about PostgreSQL that has made it a go-to data store today. We’ll also consider the advantages of a managed and hosted PostgreSQL solution, and finally look at how to get started.
- What is PostgreSQL and what does it do?
- How PostgreSQL does what it does
- The power of the PostgreSQL approach
- The PostgreSQL ecosystem
- Aiven for PostgreSQL
What is PostgreSQL and what does it do?
PostgreSQL is a flexible open-source object relational database management system with features meant to meet changes in workloads, from single machines to data warehouses to web services with many concurrent users. PostgreSQL uses and extends SQL (hence the name), and is broadly extensible to a range of use cases beyond mere transactional data.
PostgreSQL is strongly supported by its global development group of companies and individual contributors.
But what does it do?
PostgreSQL is a relational database. As such, it’s a store of relations between tuples representing entities (such as documents and people) and relationships (such as authorship). Relations hold fixed-type attributes representing entity properties (such as a title) along with a primary key. Attribute types can be either atomic (such as integer, floating point, or boolean) or structured (such as an array or a procedure).
PostgreSQL supports transactions with ACID properties. This means transactions must support four attributes:
Atomicity — transactions are considered complete units; a transaction can either completely succeed or completely fail - in the case of failure, the database state is left unchanged.
Consistency — a database, between transactions, can only exist in a valid state; all data written to the database must adhere to extant constraints, triggers, cascades and related combinations.
Isolation, — a function of concurrency control, ensures that data is not corrupted by illegal or concurrent transactions — as transactions are treated as if they happened sequentially.
Durability — ensures that a transaction remains committed even when the system fails — typically, completed transactions are recorded, for example, in a write-ahead log.
For more information about the types of features that PostgreSQL supports, there's a great breakdown at PostgreSQL , so we’ll look at just a few of the main ones here.
In addition to SQL support, PostgreSQL features automatically updatable views, meaning that a view resulting from a query will automatically update when a transaction updates the data serving that view. A materialized view is a data object containing a given query’s results: it's possible to query a materialized view just as you would a database table directly.
Triggers execute code in response to transactional or other changes to a table or view: when a new item is entered to an inventory table, a similar item could be entered automatically to each of the
Foreign keys exist to ensure inclusion dependencies or members shared between parent and child tables; for example, a member of the
managers table might share the
employee_name column with an
Stored procedures, maintained in the data dictionary of a database, may contain several combined SQL procedures to provide applications access to commonly-used data validation, access control, or other methods.
PostgreSQL offers a means for publish/subscribe logical replication; essentially allowing a PostgreSQL instance to serve as an event cache between applications and event consumers. There is also a means for change data capture, meaning that prior database changes can be rewound and replayed, a capability essential for disaster recovery.
And that’s just the beginning. The PostgreSQL home page lists features including supported data types, more in-depth data integrity features, features supporting concurrency, disaster recovery features, security features, extensibility and text search, including international character support.
How PostgreSQL does it
Core PostgreSQL mandates that data be fully-structured, requiring that it adheres to a specific schema; records added to a database that do not comply will be rejected. This places PostgreSQL historically on the far left of a data-structure continuum ranging from fully structured to relatively unstructured data:
NOTE: PostgreSQL has since evolved to also support JSON/JSONB, XML and other common formats for data ingress/egress.
As an open-source database management system, PostgreSQL users can continually evolve their databases, while switching languages and without need to recompile. While SQL conformance is ongoing, new features are being constantly added. Let’s look at how objects are created and manipulated in PostgreSQL.
DDL - data definition language
Data definition language (DDL) is used to define data structures, and includes the following commands:
- Data type declaration
CREATE- creates a new database, table, index or procedure.
ALTER- modifies an existing database object.
DROP- deletes an existing database, table, index or procedure.
TRUNCATE TABLE- empties a table for reuse.
CREATE DATABASE LIBRARY; CREATE TABLE Borrowers ( id INT PRIMARY KEY, first_name VARCHAR(20) not null, last_name VARCHAR(50) not null, birth_date DATE not null); ALTER TABLE Borrowers ADD COLUMN years_member INT NOT NULL CHECK years_member > 0; CREATE TABLE books ( id INT PRIMARY KEY, title VARCHAR(50), author VARCHAR(50) publication_date DATE not null); /* after populating ‘books‘ table */ TRUNCATE TABLE books; /* removes all data from ‘books’ table */
DML - data manipulation language
On the other hand, data manipulation language is used to work with tables and their contents as in:
INSERT- inserts new rows in a table.
UPDATE- updates existing rows in the table.
DELETE- deletes rows from a table.
SELECT- the main method for querying data from the database.
INSERT INTO Borrowers(id,first_name,last_name,birth_date,years_member) VALUES(1,'Riley','Jimbo',1964-05-02,12); INSERT INTO Borrowers(id,first_name,last_name,birth_date,years_member) VALUES(2,'Smith','Bubba',1973-09-21,5); INSERT INTO Borrowers(id,first_name,last_name,birth_date,years_member) VALUES(3,'Eberhardt','Jocelyn',1930-12-22,71); UPDATE Borrowers SET years_member = 1 WHERE id = 1; DELETE FROM Borrowers WHERE years_member > 70; SELECT * FROM Borrowers WHERE years_member > 5;
Compact syntax for bulk insert
Naturally, there is an easier way to insert multiple rows at once, using a single command:
INSERT INTO Borrowers (id,first_name,last_name,birth_date,years_member) VALUES (4, 'Evelyn', 'Snarfington', 1959-08-03, 8), (5, 'Robert', 'Jones', 2000-01-23, 6), (6, 'Joseph', 'Chu', 1987-11-07, 10);
NOTE: There are actually now many other ways to achieve the same thing, including automating large data imports programmatically — and omitting field specifications when that data doesn’t exist.
Most querying on PostgreSQL (as in SQL itself) happens via the
SELECT statement, as in the following examples:
SELECT * FROM Borrowers;
Queries can be refined in a variety of ways, for example limiting results to a condition:
SELECT * FROM Borrowers WHERE years_member > 3;
Limiting results to a specific results count:
SELECT * FROM Borrowers LIMIT 5;
Querying only certain fields:
SELECT id, last_name, years_member FROM Borrowers;
Sorting query results by certain fields:
SELECT * FROM Borrowers ORDER BY years_member DESC;
and so on.
NOTE: There are also Data Control Language (DCL) and Transactional Control Language (TCL) commands to consider, but they are out of the scope of this introduction.
The power of the PostgreSQL approach
The real beauty in building a modular open-source database from the ground up are the features and performance that have been built-in as PostgreSQL has evolved. Here are just a few of these features, along with how they look in practice:
PostgreSQL can support the sort of SQL queries that look for column values based on some textual contents combined with wildcards, for example:
SELECT * from books WHERE column_name LIKE ‘zen%’;
could return records containing book titles like Zen and the Art of Motorcycle Maintenance and Zen Flesh, Zen Bones. Those wildcards could be positioned also at the start of the search string. The same query, but with the search string as
%zen% could return the results above -- in addition to titles where ‘Zen’ isn’t the initial word in the string, like The Three Pillars of Zen and The Essential Teachings of Zen Master Hakuin.
In addition to text search, PostgreSQL supports a
tsvector (a text-search) data type, while the function
to_tsvector lets you create a list of tokens from which to derive a
to_tsquery is used to query the
tsvector for occurrences of certain words or phrases.
NOTE: These advanced text search features echo similar features in OpenSearch, blurring the lines between the two systems somewhat.
json language support
PostgreSQL is one of the best-known examples of an open source database which supports concurrent ACID transactions. To achieve this, PostgreSQL may use multiple locks to serialize database changes. When concurrent changes are made, the first transaction may proceed (as a lock is invoked); the second pauses until the first commits or aborts.
Here is an example. Imagine opening two terminal windows. In a
psql session, you’d try:
CREATE TABLE records ( header text NOT NULL primary key, info json NOT NULL ); BEGIN; ALTER TABLE records ADD COLUMN recent_update timestamp;
psql session, you’d try to run:
SELECT * FROM records;
NOTE: nothing happens, until the first transaction is complete.
Reliability and disaster recovery
Before writing a full page image to disk, PostgreSQL periodically writes to a write-ahead log (WAL). This enables PostgreSQL to recover from crashes by replaying the writes in the log.
full_page_writes parameter in
postgresql.conf can be toggled off if there is anything in the system preventing partial writes.
Individual records in WALs are checksummed by a CRC-32 check to validate correctness of record contents (upon restore operations), although there are a few items that bypass the checks, including data pages, and internal data structures.
PostgreSQL handles a range of security issues in a variety of ways. To name a few, with PostgreSQL, the client connections are allowed, by default, only via local Unix socket, not via TCP/IP sockets, unless
-i switch is used to start the server backend. Also, client connections can be black/whitelisted via the
pg_hba.conf file in
These are precisely the sorts of things you never need to worry about with a hosted, managed solution.
Superuser rights to objects are one option to be able to perform a lot of different operations in PostgreSQL. For example, DB files themselves are only readable by the PostgreSQL superuser account. These privileges can be implemented to copy table data, create publications that publish data automatically, or create subscriptions. They are also important privileges for use in logical replication.
In fact, each PostgreSQL user takes a username and password; users may be assigned to groups, whereby table — or even row — access can be assigned or restricted based on these group privileges.
PostgreSQL also offers a range of encryption options like encryption for specific columns, data partitions, across a network, etc. Server spoofing prevention works by creating a symbolic link to a relocated vulnerable socket file.
PostgreSQL is catalog-driven: data dictionaries — system catalogs — hold information about the database itself, cells, columns, data types, functions, access methods, etc. These catalogs are tables in their own-right: user-queryable and modifiable.
As users can modify databases directly, as well as add dynamically-loadable code changes or user-defined functions (UDFs), many choose PostgreSQL for rapid prototyping applications and data structures.
Extensibility is critical for many organizations with needs specific to a given industry or line of business. When traditional database systems fall short and business users are not themselves database experts, modifying the database may not interrupt availablity nor require core database engine modifications.
In fact, users want seamless solutions. With PostgreSQL you can seamlessly create your own data types, functions, aggregates, operators, operator classes (for index access), data wrappers, and so on, providing the tools and add-ons for domain-specific solutions.
Sometimes extensions become complete products in their own right. One concrete example: PostGIS, which has been modified to support location queries using geographic coordinates. Additionally, Timescale is PostgreSQL optimized for time series data.
Developers these days can enjoy a number of tools to interact with the database that go way beyond psql. pgAdmin supports PostgreSQL across the board and is a long-time favorite of DBAs and developers. DBeaver, while not native to PostgreSQL, does many of the same things. Navicat is a cloud-compatible tool that also works with MySQL and MongoDB.
DataGrip, from JetBrains, is a popular integrated development environment (IDE) that works cross-platform and with a range of databases.
usql is a universal command-line interface that works with many databases natively. Finally, OmniDB is a complete collaborative workspace for PostgreSQL and other databases.
In addition to core PostgreSQL itself, Aiven developers have contributed to a number of PostgreSQL extensions, including pghoard (for backup and restore), pglookout (for replication monitoring and as a failover daemon), pgmemcache (a set of PostgreSQL user-defined functions that provide an interface to memcached), and a range of connectors.
What does all of this mean?
PostgreSQL is not just a database; it’s the sum of the features, functions and ecosystem comprising and surrounding it. With such a modifiable approach at its core, PostgreSQL can munge yet-to-be-defined data types, manage larger data sets than ever, run parallel queries at scale, support complex configurations with ease, and even improve and enhance monitoring capabilities, spawning new uses and derivatives. Let’s look briefly at the PostgreSQL ecosystem.
The PostgreSQL ecosystem — use cases and users
Today in addition to enterprise users, a number of 3rd party vendors, consultancies and independent groups derive products from, consult, support, and offer auxiliary enhancements for PostgreSQL. There are also integrations with artificial intelligence (AI) and machine learning (ML), many cloud versions, hardware-optimized versions, and so forth.
The ecosystem is truly massive. By far the biggest industry sector using PostgreSQL is the computer software industry, followed by information technology and services. Other industries include hospital and health care, higher education, financial services, telecommunication, education, marketing and advertising. Government agencies, including the FBI, are also users.
According to the db-engines ranking, PostgreSQL is the fourth most popular database management system (and most popular open source DBMS). In fact, Aiven has a number of case studies with customers using hosted and managed PostgreSQL services.
Aiven for PostgreSQL
As a fully managed and hosted solution for object-relational database management, Aiven for PostgreSQL is the fastest solution for any provider, offering high-availability with automatic updates and backups. Aiven supports various extensions, such as PostGIS and TimescaleDB and provides publish/subscribe logical replication.
Aiven for PostgreSQL is easy to set up. Here's how to get started on the Aiven Console:
References (more linked throughout the article):  Funding a Revolution: Government Support for Computing Research. National Academy Press, 1999.  A Relational Model of Data for Large Shared Data Banks. E.F. Codd, IBM, 1970.  The POSTGRES Data Model.
On the Aiven blog:
- Aiven for PostgreSQL 13 performance on GCP, AWS and Azure [Benchmark]
- How to stand up multiregion PostgreSQL with Pulumi
- Discover exoplanets with PostgreSQL sample data
- Aiven for PostgreSQL for your Go application
- Analysing Netflix shows with pgAdmin and PostgreSQL
- Solving the knapsack problem in PostgreSQL
In this article, we’ve looked at PostgreSQL: what it is, what it does, and how it does it. We’ve examined Data Definition Language (DDL), Data Manipulation Language (DML) and the many tools and compelling features that come with this open-source database. Finally, after looking at use cases, users and PostgreSQL’s industry position, we’ve considered the advantages of a hosted, managed solution, and looked at a few ways to get started.
Not using Aiven services yet? Sign up now for your free trial at https://console.aiven.io/signup!