Solving the knapsack problem in PostgreSQL®
The knapsack problem: how to fit all the items you're most likely to need on holiday? Find out how to use the world's best OS database to help you pack.
The days before a holiday are exciting. You've planned a vacation somewhere and can't wait to hit the road.
But first, there's a whole list of decisions to make: how many underpants should I bring? Are the running shoes necessary? What about the hairdryer? Which jacket should I pack? Lots of decisions to take with the suitcase size as constraint and the best holiday experience as ultimate goal.
The aim of the blog post is to help solve the packing problem using technology. Specifically, PostgreSQL® and the recursive Common Table Expressions.
Combinatorial optimization - the knapsack problem
People have had to pack luggage for a long time, so this optimization dilemma is far from new. It even has a name: the knapsack problem. It can be applied to a variety of use cases where there is a set of items with a defined weight (space occupied in the luggage in our example) and value (item benefits during holiday) and where the total weight is limited (luggage size).
The end goal is to come up with a set of items that fits within the weight constraint and has the maximum possible value.
For example, let's assume we have the following items:
Item | Value | Weight |
---|---|---|
🧦 Socks | 10 | 3 |
🎩 Hat | 15 | 5 |
👖 Trousers | 5 | 15 |
👟 Shoes | 8 | 10 |
👕 T-Shirt | 7 | 10 |
And we have a knapsack with a capacity of 20.
This means we could pack:
- 👖 Trousers (weight 15) + 🎩 Hat (weight 5) for a total value of (5 + 15) = 20
OR - 👕 T-Shirt (weight 10) + 🎩 Hat (weight 5) + 🧦 Socks (weight 3) for a total value of (7 + 15 + 10) = 32
The second option gives us a much better overall value based on the settings in the table.
(At this point, we should note that this concept and solution is presented as an oversimplified mathematical problem. Aiven is not responsible if you decide to leave for your holidays without any shoes or trousers!)
PostgreSQL to the rescue
How can we be sure to make the best choice? Can technology help? Of course! Just pull out your trusty PostgreSQL (which you should always remember to pack).
Let's start by creating an instance with the help of Aiven's command line interface.
avn service create demo-pg \ --service-type pg \ --cloud google-europe-west3 \ --plan hobbyist
The above creates a PostgreSQL instance (--service-type pg
) called demo-pg
in the region google-europe-west3
with an hobbyist
plan. You can size and move your instance as you wish; check out all the combinations available.
Once the instance is up and running (we can use avn service wait demo-pg
to wait for it), we can log into the database from the prompt with:
avn service cli demo-pg
Great, we're in PostgreSQL! Now we can create the new inventory
table and fill it with our dataset.
create table inventory (item_id serial, item_name varchar, value int, weight int); insert into inventory(item_name, value, weight) values ('Socks', 10,3); insert into inventory(item_name, value, weight) values ('Hat', 15,5); insert into inventory(item_name, value, weight) values ('Trousers', 5,15); insert into inventory(item_name, value, weight) values ('Shoes', 8,10); insert into inventory(item_name, value, weight) values ('T-Shirt', 7,10);
Let's verify the data:
select * from inventory;
And we get the expected output:
item_id | item_name | value | weight ---------+-----------+-------+-------- 1 | Socks | 10 | 3 2 | Hat | 15 | 5 3 | Trousers | 5 | 15 4 | Shoes | 8 | 10 5 | T-Shirt | 7 | 10 (5 rows)
We're now ready to crack our luggage knapsack problem.
Iterative approach and PostgreSQL recursive CTEs
There are multiple ways of solving the knapsack problem. One of them is to replicate in PostgreSQL what you would do in a trial & error case: you pick one item, add it to the luggage, then pick another item and add it, going on like that until the suitcase is full. Once no more items can fit in, you sum all the values of the inserted items and take note of the list. Then you empty the luggage and start from scratch again, trying to make sure you don't end up with the same sequence of items just in a different order.
When all the possible item combinations have been checked, you can then select the one with maximum value. It's a long and error prone process, but luckily, computers can handle it for us.
The above process defines the steps we need to perform to find all the combinations:
- Select one random item from
inventory
. - Add it to the
picked_items
list and store the item cost - Select another item from
inventory
, making sure that:
a. it wasn't already in thepicked_items
list and
b. theweight
of the current item plus thetotal_weight
of all the items in thepicked_items
list is less or equal to the maximum allowed weight - Add the new item in the
picked_items
list - Add the item
weight
to thetotal_weight
- Go back to step #3 until
inventory
contains new items satisfying conditions a and b.
We can spot an iterative pattern here of sequential queries against the inventory
table. How can we map it into PostgreSQL? We want to perform select statements against the inventory
table for a variable number of iterations (the list of picked_items
can have different sizes). We need a way to recursively call the select statement: here is were PostgreSQL recursive CTEs shine, since they allow the creation of queries that can reference their own output.
For example we can get all the combination of 3 items starting with Socks
in our inventory
table with:
WITH RECURSIVE items(picked_items, nr_items) as ( SELECT ARRAY[item_name] as picked_items, 1 nr_items from inventory where item_name = 'Socks' UNION ALL select picked_items || item_name, nr_items + 1 from inventory cross join items where nr_items+1 <= 3 ) select * from items where nr_items=3;
The RECURSIVE
keyword sets the scene. The second SELECT
statement is what does the trick by joining the inventory
table with the result of the RECURSIVE items
query we are still defining. The result lists all the 25 combinations of 3 items in the inventory
table starting with Socks
.
name | nr_items -----------------------------+---------- {Socks,Socks,Socks} | 3 {Socks,Socks,Hat} | 3 {Socks,Socks,Trousers} | 3 {Socks,Socks,Shoes} | 3 {Socks,Socks,T-Shirt} | 3 {Socks,Hat,Socks} | 3 ... {Socks,T-Shirt,Hat} | 3 {Socks,T-Shirt,Trousers} | 3 {Socks,T-Shirt,Shoes} | 3 {Socks,T-Shirt,T-Shirt} | 3 (25 rows)
Add the knapsack constraints
The above query returns all the combinations available in the inventory
table limited to 3 items. However, our knapsack problems has different constraints:
- Once an item is picked, we can't reuse it
- The sequence doesn't have a fixed length but it's driven by the
total_cost
Let's check how we can add the above constraints in the recursive query, keeping 20
as total allowed cost:
WITH RECURSIVE items(item_id, picked_items, nr_items, total_weight, total_value) as ( SELECT item_id, ARRAY[item_name] as picked_items, 1 nr_items, weight total_weight, value total_value from inventory UNION ALL select inventory.item_id, picked_items || item_name, nr_items + 1, weight + total_weight, value + total_value from inventory cross join items where picked_items::varchar[] @> ARRAY[item_name] = false and weight + total_weight <= 20 and inventory.item_id > items.item_id ) select * from items order by total_value;
Let's cut the query down in sections: the first piece is the seed of our recursive query, we select all rows from the inventory
table
SELECT item_id, ARRAY[item_name] as picked_items, 1 nr_items, weight total_weight, value total_value from inventory
We store the item_id
as single value and we add it to the picked_items
array. We also store the total number of items (1
), the associated weight
as total_weight
and value
as total_value
. The next phase, after the UNION ALL
, is the recursive bit:
select inventory.item_id, picked_items || item_name, nr_items + 1, weight + total_weight, value + total_value from inventory cross join items where picked_items::varchar[] @> ARRAY[item_name] = false and weight + total_weight <= 20 and inventory.item_id > items.item_id
We select the item_id
, store it a single value and add it to the picked_items
array. Add +1
to count of items and add the weight
and value
of the current item to total_weight
and total_value
of the previously selected items respectively.
The interesting bit comes in the following section. We perform a CROSS JOIN
between inventory
and items
(our recursive query) creating a cartesian product of the tables. In other words, we then have a row for each item that exists in inventory
multiplied for each row in items
. Not very clever, but the filtering does the magic:
picked_items::varchar[] @> ARRAY[item_name] = false
: makes sure that we exclude from the result set any rows for which theitem_name
is already in the list ofpicked_items
weight + total_weight <= 20
: discards from the result set all items for which the relatedweight
summed to the previously calculatedtotal_weight
exceeds the20
threshold we set for our luggage.inventory.item_id > items.item_id
: it's an optimization filter;{Trousers,Shoes}
and{Shoes,Trousers}
are two combinations of the same items, and we want to keep only one. Thus we allow the selection of items only in ascendingitem_id
order, this will help improving performances since we are not analysing the same subset of items multiple times.
The result of the full query above is:
item_id | picked_items | nr_items | total_weight | total_value ---------+---------------------------+----------+--------------+------------- 3 | {Trousers} | 1 | 15 | 5 5 | {T-Shirt} | 1 | 10 | 7 4 | {Shoes} | 1 | 10 | 8 1 | {Socks} | 1 | 3 | 10 2 | {Hat} | 1 | 5 | 15 5 | {Shoes,T-Shirt} | 2 | 20 | 15 3 | {Socks,Trousers} | 2 | 18 | 15 5 | {Socks,T-Shirt} | 2 | 13 | 17 4 | {Socks,Shoes} | 2 | 13 | 18 3 | {Hat,Trousers} | 2 | 20 | 20 5 | {Hat,T-Shirt} | 2 | 15 | 22 4 | {Hat,Shoes} | 2 | 15 | 23 2 | {Socks,Hat} | 2 | 8 | 25 5 | {Socks,Hat,T-Shirt} | 3 | 18 | 32 4 | {Socks,Hat,Shoes} | 3 | 18 | 33 (15 rows)
It shows all the combinations of items that fit in our 20
capacity luggage. When we sort it by total_value
, it becomes evident that the combination {Socks,Hat,Shoes}
is the one that provides the maximum value.
Again, please take this as pure theoretical exercise and not as holiday packing suggestion. What? do you have also other constrains to add? Taking this as a basis, feel free to add any additional complexity that your real case requires!
Some thoughts
The knapsack problem is very common: from project planning to resource allocation there is a variety of real cases where we need to fit as much items as possible in some sort of limited bucket. PostgreSQL recursive CTEs represent a valid option where a query iterative approach is needed.
A word of caution here: performance can quickly go wild if the recursion is not properly limited by filters. Define your recursion, set your boundaries, and enjoy how PostgreSQL can help you packing before the holidays!
Some more references:
- Definition and variations of the Knapsack problem
- Aiven for PostgreSQL plans and clouds information
- PostgreSQL recursive CTEs documentation
- An introduction to PostgreSQL on the Aiven blog
- (Postgre)SQL concepts and terms
Wrapping up
Not using Aiven services yet? Sign up now for your free trial at Aiven free trial signup!
In the meantime, make sure you follow our changelog and RSS feeds or our LinkedIn and Twitter accounts to stay up-to-date with product and feature-related news.