Solving the knapsack problem in PostgreSQL®
The knapsack problem: how to fit all the items you're most likely to need on holiday? Find out how to use the world's best OS database to help you pack.
The knapsack problem: how to fit all the items you're most likely to need on holiday? Find out how to use the world's best OS database to help you pack.
The days before a holiday are exciting. You've planned a vacation somewhere and can't wait to hit the road.
But first, there's a whole list of decisions to make: how many underpants should I bring? Are the running shoes necessary? What about the hairdryer? Which jacket should I pack? Lots of decisions to take with the suitcase size as constraint and the best holiday experience as ultimate goal.
The aim of the blog post is to help solve the packing problem using technology. Specifically, PostgreSQL® and the recursive Common Table Expressions.
People have had to pack luggage for a long time, so this optimization dilemma is far from new. It even has a name: the knapsack problem. It can be applied to a variety of use cases where there is a set of items with a defined weight (space occupied in the luggage in our example) and value (item benefits during holiday) and where the total weight is limited (luggage size).
The end goal is to come up with a set of items that fits within the weight constraint and has the maximum possible value.
For example, let's assume we have the following items:
| Item | Value | Weight |
|---|---|---|
| 🧦 Socks | 10 | 3 |
| 🎩 Hat | 15 | 5 |
| 👖 Trousers | 5 | 15 |
| 👟 Shoes | 8 | 10 |
| 👕 T-Shirt | 7 | 10 |
And we have a knapsack with a capacity of 20.
This means we could pack:
The second option gives us a much better overall value based on the settings in the table.
(At this point, we should note that this concept and solution is presented as an oversimplified mathematical problem. Aiven is not responsible if you decide to leave for your holidays without any shoes or trousers!)
How can we be sure to make the best choice? Can technology help? Of course! Just pull out your trusty PostgreSQL (which you should always remember to pack).
Let's start by creating an instance with the help of Aiven's command line interface.
Loading code...
The above creates a PostgreSQL instance (--service-type pg) called demo-pg in the region google-europe-west3 with an hobbyist plan. You can size and move your instance as you wish; check out all the combinations available.
Once the instance is up and running (we can use avn service wait demo-pg to wait for it), we can log into the database from the prompt with:
Loading code...
Great, we're in PostgreSQL! Now we can create the new inventory table and fill it with our dataset.
Loading code...
Let's verify the data:
Loading code...
And we get the expected output:
Loading code...
We're now ready to crack our luggage knapsack problem.
There are multiple ways of solving the knapsack problem. One of them is to replicate in PostgreSQL what you would do in a trial & error case: you pick one item, add it to the luggage, then pick another item and add it, going on like that until the suitcase is full. Once no more items can fit in, you sum all the values of the inserted items and take note of the list. Then you empty the luggage and start from scratch again, trying to make sure you don't end up with the same sequence of items just in a different order.
When all the possible item combinations have been checked, you can then select the one with maximum value. It's a long and error prone process, but luckily, computers can handle it for us.
The above process defines the steps we need to perform to find all the combinations:
inventory.picked_items list and store the item costinventory, making sure that:picked_items list andweight of the current item plus the total_weight of all the items in the picked_items list is less or equal to the maximum allowed weightpicked_items listweight to the total_weightinventory contains new items satisfying conditions a and b.We can spot an iterative pattern here of sequential queries against the inventory table. How can we map it into PostgreSQL? We want to perform select statements against the inventory table for a variable number of iterations (the list of picked_items can have different sizes). We need a way to recursively call the select statement: here is were PostgreSQL recursive CTEs shine, since they allow the creation of queries that can reference their own output.
For example we can get all the combination of 3 items starting with Socks in our inventory table with:
Loading code...
The RECURSIVE keyword sets the scene. The second SELECT statement is what does the trick by joining the inventory table with the result of the RECURSIVE items query we are still defining. The result lists all the 25 combinations of 3 items in the inventory table starting with Socks.
Loading code...
The above query returns all the combinations available in the inventory table limited to 3 items. However, our knapsack problems has different constraints:
total_costLet's check how we can add the above constraints in the recursive query, keeping 20 as total allowed cost:
Loading code...
Let's cut the query down in sections: the first piece is the seed of our recursive query, we select all rows from the inventory table
Loading code...
We store the item_id as single value and we add it to the picked_items array. We also store the total number of items (1), the associated weight as total_weight and value as total_value. The next phase, after the UNION ALL, is the recursive bit:
Loading code...
We select the item_id, store it a single value and add it to the picked_items array. Add +1 to count of items and add the weight and value of the current item to total_weight and total_value of the previously selected items respectively.
The interesting bit comes in the following section. We perform a CROSS JOIN between inventory and items (our recursive query) creating a cartesian product of the tables. In other words, we then have a row for each item that exists in inventory multiplied for each row in items. Not very clever, but the filtering does the magic:
picked_items::varchar[] @> ARRAY[item_name] = false: makes sure that we exclude from the result set any rows for which the item_name is already in the list of picked_itemsweight + total_weight <= 20: discards from the result set all items for which the related weight summed to the previously calculated total_weight exceeds the 20 threshold we set for our luggage.inventory.item_id > items.item_id: it's an optimization filter; {Trousers,Shoes} and {Shoes,Trousers} are two combinations of the same items, and we want to keep only one. Thus we allow the selection of items only in ascending item_id order, this will help improving performances since we are not analysing the same subset of items multiple times.The result of the full query above is:
Loading code...
It shows all the combinations of items that fit in our 20 capacity luggage. When we sort it by total_value, it becomes evident that the combination {Socks,Hat,Shoes} is the one that provides the maximum value.
Again, please take this as pure theoretical exercise and not as holiday packing suggestion. What? do you have also other constrains to add? Taking this as a basis, feel free to add any additional complexity that your real case requires!
The knapsack problem is very common: from project planning to resource allocation there is a variety of real cases where we need to fit as much items as possible in some sort of limited bucket. PostgreSQL recursive CTEs represent a valid option where a query iterative approach is needed.
A word of caution here: performance can quickly go wild if the recursion is not properly limited by filters. Define your recursion, set your boundaries, and enjoy how PostgreSQL can help you packing before the holidays!
Some more references:
Not using Aiven services yet? Sign up now for your free trial at Aiven free trial signup!
In the meantime, make sure you follow our changelog and RSS feeds or our LinkedIn and Twitter accounts to stay up-to-date with product and feature-related news.
avn service create demo-pg \
--service-type pg \
--cloud google-europe-west3 \
--plan hobbyistavn service cli demo-pgcreate table inventory (item_id serial, item_name varchar, value int, weight int);
insert into inventory(item_name, value, weight) values ('Socks', 10,3);
insert into inventory(item_name, value, weight) values ('Hat', 15,5);
insert into inventory(item_name, value, weight) values ('Trousers', 5,15);
insert into inventory(item_name, value, weight) values ('Shoes', 8,10);
insert into inventory(item_name, value, weight) values ('T-Shirt', 7,10);select * from inventory; item_id | item_name | value | weight
---------+-----------+-------+--------
1 | Socks | 10 | 3
2 | Hat | 15 | 5
3 | Trousers | 5 | 15
4 | Shoes | 8 | 10
5 | T-Shirt | 7 | 10
(5 rows)WITH RECURSIVE items(picked_items, nr_items) as (
SELECT ARRAY[item_name] as picked_items,
1 nr_items
from inventory
where item_name = 'Socks'
UNION ALL
select picked_items || item_name,
nr_items + 1
from inventory cross join items
where nr_items+1 <= 3
)
select * from items where nr_items=3; name | nr_items
-----------------------------+----------
{Socks,Socks,Socks} | 3
{Socks,Socks,Hat} | 3
{Socks,Socks,Trousers} | 3
{Socks,Socks,Shoes} | 3
{Socks,Socks,T-Shirt} | 3
{Socks,Hat,Socks} | 3
...
{Socks,T-Shirt,Hat} | 3
{Socks,T-Shirt,Trousers} | 3
{Socks,T-Shirt,Shoes} | 3
{Socks,T-Shirt,T-Shirt} | 3
(25 rows)WITH RECURSIVE items(item_id, picked_items, nr_items, total_weight, total_value) as (
SELECT
item_id,
ARRAY[item_name] as picked_items,
1 nr_items,
weight total_weight,
value total_value
from inventory
UNION ALL
select
inventory.item_id,
picked_items || item_name,
nr_items + 1,
weight + total_weight,
value + total_value
from inventory cross join items
where
picked_items::varchar[] @> ARRAY[item_name] = false
and weight + total_weight <= 20
and inventory.item_id > items.item_id
)
select * from items order by total_value;SELECT
item_id,
ARRAY[item_name] as picked_items,
1 nr_items,
weight total_weight,
value total_value
from inventoryselect
inventory.item_id,
picked_items || item_name,
nr_items + 1,
weight + total_weight,
value + total_value
from inventory cross join items
where
picked_items::varchar[] @> ARRAY[item_name] = false
and weight + total_weight <= 20
and inventory.item_id > items.item_id item_id | picked_items | nr_items | total_weight | total_value
---------+---------------------------+----------+--------------+-------------
3 | {Trousers} | 1 | 15 | 5
5 | {T-Shirt} | 1 | 10 | 7
4 | {Shoes} | 1 | 10 | 8
1 | {Socks} | 1 | 3 | 10
2 | {Hat} | 1 | 5 | 15
5 | {Shoes,T-Shirt} | 2 | 20 | 15
3 | {Socks,Trousers} | 2 | 18 | 15
5 | {Socks,T-Shirt} | 2 | 13 | 17
4 | {Socks,Shoes} | 2 | 13 | 18
3 | {Hat,Trousers} | 2 | 20 | 20
5 | {Hat,T-Shirt} | 2 | 15 | 22
4 | {Hat,Shoes} | 2 | 15 | 23
2 | {Socks,Hat} | 2 | 8 | 25
5 | {Socks,Hat,T-Shirt} | 3 | 18 | 32
4 | {Socks,Hat,Shoes} | 3 | 18 | 33
(15 rows)