Use PostgreSQL® DOMAIN rules to validate columns of data
Learn how to validate your data as it goes into your databases to improve data quality
Learn how to validate your data as it goes into your databases to improve data quality
When dealing with data inputs, there's always one or more fields that need strict validation: a T-shirt size can accept only a few precise values, a shoe size should be within a range of numbers, a phone number should follow a precise pattern. It's good to perform these checks at the application level, for example as a part of a form input, but for data consistency it's best to validate that data when adding it to the database as well.
PostgreSQL® allows these checks to be performed in several ways, like using table constraints. In this article, we'll talk about the DOMAIN feature, which defines column validation centrally for use across multiple database tables. We'll showcase the power and the limits of the DOMAIN approach, using some fairly basic examples of verifying Italian addresses in a table.
As mentioned before, the DOMAIN option allows defining the constraints of a column type once and reusing it across several tables. The benefit of a unique definition are evident: there's one truth and one point of contact. Needing to define the same constraints across several tables could result in human error or misalignment during constraint evolution.
On the other side, having a unique definition can be tricky in cases when different tables need to evolve such a definition in different ways or across different times. In such cases, having a unique point of contact can become dangerous. Check the examples below regarding altering and dropping DOMAIN definitions for more info.
For our test case, we can use any kind of PostgreSQL, it being on-premises, Docker, or cloud, since the DOMAIN function is native to PostgreSQL and has been available for a long time. For the purpose of the blog we'll use Aiven for PostgreSQL®. Sign up for an Aiven account if you don't have one already and install the Aiven CLI.
You'll need to authenticate to the CLI before you can create your first service and once authenticated, you can create a service:
Loading code...
The above call creates an Aiven for PostgreSQL service (--service-type pg) named demo-pg-italian-addresses, using the smallest hobbyist plan on the google-europe-west3 region.
We can wait for the service to be up and running with:
Loading code...
And connect to it (this runs psql for you, so that needs to be installed) with:
Loading code...
For the purposes of this demonstration, let's say Italian streets begin with three indicators: Via, Viale, and Piazza. In reality there are more ways to indicate streets in Italy, but let's keep it simple for now.
Let's use the DOMAIN feature to create a domain for these three values, ITALIAN_ADDRESS.
Loading code...
If we check the details of the above command we can see:
ITALIAN_ADDRESSTEXTNOT NULL constraint forcing the column to never be emptyCONSTRAINT named ITALIAN_ADDRESS_CHECK. This performs a check, based on a regular expression, ensuring the address:
VIA, VIALE or PIAZZA identifying the type of street/squareA-Z, space or - characters identifying the street/square name+) identifying the building number*) identifying the block within the buildingNow that we've defined the domain, we can use it when creating a table. Let's create the table ADDRESSES to host our customer information, such as the customer's name and their address:
Loading code...
In the above SQL, the ADDRESS column refers to the ITALIAN_ADDRESS domain. Now we can insert rows, checking that the domain rules are obeyed:
Loading code...
The above insert is successful, since it follows the domain definition. But what happens if we try something different, like an address starting with STRADA?
Loading code...
We get an error, since the domain check is failing:
Loading code...
Our domain is working as expected and we can now use it in other table definitions as well. For example if we have a SHOPS table that needs to contain a validated address, we can define it as SHOPS ITALIAN_ADDRESS. This is the beauty of the DOMAIN feature: one central definition, multiple usages.
The domain definition prohibits null values in the column, but we might want to allow an insert that skips the column definition by setting a domain default value:
Loading code...
Now, if we try to add a new customer without any associated address, we might expect a successful insert:
Loading code...
Instead we get an error:
Loading code...
The error is due to the fact that our default value UNKNOWN ADDRESS is not compliant with the constraint defined in the domain (it doesn't start with VIA, VIALE or PIAZZA). If we switch the default to a compatible value like VIA UNKNOWN ADDRESS 1:
Loading code...
We can successfully push the customer information without the address now:
Loading code...
Checking the data in the ADDRESSES table with:
Loading code...
Will show the row for LUIGI with the default address set.
Loading code...
Note that there is a hierarchy for default values: Data type default -> Domain default -> Table column default.
The data type default value can be overridden by the domain default and the domain default can be overridden by the table column default.
Changing domains already in use can be tricky. For example, let's say that tomorrow italy decides that all PIAZZA addresses should be renamed to PIZZA, so we need to update our domain to reflect this. First we remove the old constraint:
Loading code...
Then add the new check, changing PIAZZA to PIZZA:
Loading code...
But the command above generates an error:
Loading code...
Why can't we enable the new constraint? Let's have a look at the data that doesn't satisfy the new constraint, with:
Loading code...
The WHERE clause in that query checks the address against the same regular expression as defined in the domain. The output is:
Loading code...
We can see that the PIAZZA DEL CORSO 8A row doesn't satisfy our new constraint (it should start with PIZZA). Let's define the constraint as NOT VALID for now:
Loading code...
The NOT VALID clause, allows us to add the constraint but not validate the rows in the table. If we check the validity:
Loading code...
We still get the same error since we didn't update the data:
Loading code...
We need to update the data in the table:
Loading code...
Then check the data with SELECT * FROM ADDRESSES;
Loading code...
The UGA row should now be in line with the new constraint, so we can validate it:
Loading code...
Success! Now the check is enabled, therefore trying to insert a PIAZZA address is prohibited.
Loading code...
The above SQL throws the error:
Loading code...
Besides this example, another catch with the ALTER DOMAIN command is that any expression or function we add as part of the domain creation should be immutable. If we have a user-defined function as part of the CREATE DOMAIN expression, and we're able to modify the expression/function after domain creation, we'll get an error when reloading the data in that database.
What happens when dropping a domain used in one or more tables?
Loading code...
We get the following error, signaling that there are dependencies
Loading code...
If we really want to drop it we can use the CASCADE option:
Loading code...
Check the table:
Loading code...
No ADDRESS column anymore!
Loading code...
It's difficult to drop an in-use domain without affecting the rest of the database tables. The beauty of centrally managed column definition is playing against us in this scenario. Since the definition is unique, any table using it will have columns disappear if the related definition gets deleted.
PostgreSQL domains are a powerful feature, allowing a central definition of column validation rules that can be then applied across several tables. The evolution of the definition can be tricky, but for standard, non-evolving columns it can represent a good way to centralize the definition and checks.
Check out some additional resources on the topic:
avn service create demo-pg-italian-addresses \
--service-type pg \
--plan hobbyist \
--cloud google-europe-west3avn service wait demo-pg-italian-addressesavn service cli demo-pg-italian-addressesCREATE DOMAIN ITALIAN_ADDRESS
TEXT
NOT NULL
CONSTRAINT ITALIAN_ADDRESS_CHECK CHECK(UPPER(VALUE) ~ '^(VIA|VIALE|PIAZZA) [A-Z \-]+ [0-9]+[A-Z]*$');CREATE TABLE ADDRESSES(
ID SERIAL,
NAME TEXT,
ADDRESS ITALIAN_ADDRESS
);INSERT INTO ADDRESSES(NAME, ADDRESS) VALUES ('UGA', 'PIAZZA DEL CORSO 8A');INSERT INTO ADDRESSES(NAME, ADDRESS) VALUES ('UGO', 'STRADA NASCOSTA 5');ERROR: value for domain italian_address violates check constraint "italian_address_check"ALTER DOMAIN ITALIAN_ADDRESS SET DEFAULT 'UNKNOWN ADDRESS';INSERT INTO ADDRESSES(NAME) VALUES ('LUIGI');ERROR: value for domain italian_address violates check constraint "italian_address_check"ALTER DOMAIN ITALIAN_ADDRESS SET DEFAULT 'VIA UNKNOWN ADDRESS 1';INSERT INTO ADDRESSES(NAME) VALUES ('LUIGI');SELECT * FROM ADDRESSES; id | name | address
----+-------+-----------------------
1 | UGA | PIAZZA DEL CORSO 8A
4 | LUIGI | VIA UNKNOWN ADDRESS 1
(2 rows)ALTER DOMAIN ITALIAN_ADDRESS
DROP CONSTRAINT ITALIAN_ADDRESS_CHECK;ALTER DOMAIN ITALIAN_ADDRESS
ADD CONSTRAINT ITALIAN_ADDRESS_NEW_CHECK CHECK(UPPER(VALUE) ~ '^(VIA|VIALE|PIZZA) [A-Z \-]+ [0-9]+[A-Z]*$');ERROR: column "address" of table "addresses" contains values that violate the new constraintSELECT *
from ADDRESSES
where NOT ADDRESS ~ '^(VIA|VIALE|PIZZA) [A-Z \-]+ [0-9]+[A-Z]*$'; id | name | address
----+------+---------------------
1 | UGA | PIAZZA DEL CORSO 8A
(1 row)ALTER DOMAIN ITALIAN_ADDRESS
ADD CONSTRAINT ITALIAN_ADDRESS_NEW_CHECK CHECK(UPPER(VALUE) ~ '^(VIA|VIALE|PIZZA) [A-Z \-]+ [0-9]+[A-Z]*$') NOT VALID;ALTER DOMAIN ITALIAN_ADDRESS
VALIDATE CONSTRAINT ITALIAN_ADDRESS_NEW_CHECK;ERROR: column "address" of table "addresses" contains values that violate the new constraintUPDATE ADDRESSES SET ADDRESS=REPLACE(ADDRESS, 'PIAZZA', 'PIZZA') WHERE ADDRESS LIKE 'PIAZZA%'; id | name | address
----+-------+-----------------------
1 | UGA | PIZZA DEL CORSO 8A
4 | LUIGI | VIA UNKNOWN ADDRESS 1
(2 rows)ALTER DOMAIN ITALIAN_ADDRESS
VALIDATE CONSTRAINT ITALIAN_ADDRESS_NEW_CHECK;INSERT INTO ADDRESSES (NAME, ADDRESS) VALUES ('CARLA', 'PIAZZA MAGGIORE 33');ERROR: value for domain italian_street violates check constraint "italian_street_new_check"DROP DOMAIN ITALIAN_ADDRESS;ERROR: cannot drop type italian_address because other objects depend on it
DETAIL: column address of table addresses depends on type italian_address
HINT: Use DROP ... CASCADE to drop the dependent objects too.DROP DOMAIN ITALIAN_ADDRESS CASCADE;SELECT * FROM ADDRESSES; id | name
----+-------
1 | UGA
4 | LUIGI
(2 rows)