Do multiple updates to the same row cause more table bloat? - Databases

TopAnswers Databases
Top Answer
Jack Douglas
**tl;dr**: No. Yes. Multiple updates to the same row within a transaction will add additional dead tuples to the heap. However this is not the case for multiple updates within a single statement.

---

You can use [the pageinspect extension](https://www.postgresql.org/docs/12/pageinspect.html) to see what is happening at the physical storage level. In each test below I set up with[^1]:

```
create table foo(id integer primary key);
insert into foo(id) values(1);
```

inspected[^2] the one and only block[^3] with:

```
select x_pageinspect.get_raw_page('foo',0);
```

and cleaned up afterwards with:

```
drop table foo;
```

1. **Single update for comparison**

   ```
   update foo set id = 2;
   ```
   ```none 
   01000000f037d035000000002000c01f002004201ab30000 [block header]
   e09f3800 [item identifier 1 (offset=8160, flags=1, length=28)]
   c09f3800 [item identifier 2 (offset=8128, flags=1, length=28)]
   … [lots of zeroes]
   26b3000000000000000000000000000002000100002818000200000000000000 [row with id=2]
   25b3000026b30000000000000000000002000120000118000100000000000000 [row with id=1]
   ```
   
   only 1 dead tuple


2. **Multiple updates in a single transaction**

   ```
   do $$
   begin
     update foo set id = 2;
     update foo set id = 3;
   end$$;
   ```
   ```none
   01000000f037d035000000002000c01f002004201ab30000 [block header]
   e09f3800 [item identifier 1 (offset=8160, flags=1, length=28)]
   c09f3800 [item identifier 2 (offset=8128, flags=1, length=28)]
   a09f3800 [item identifier 3 (offset=8096, flags=1, length=28)]
   … [lots of zeroes]
   2ab3000000000000010000000000000003000100002818000300000000000000 [row with id=3]
   2ab300002ab30000000000000000000003000120202018000200000000000000 [row with id=2]
   29b300002ab30000000000000000000002000120000118000100000000000000 [row with id=1]
   ```

   2 dead tuples
   
3. **Multiple updates in a single statement**

   ```
   with u as (update foo set id = 2)
   update foo set id = 3;
   ```
   ```none
   01000000f037d035000000002000c01f002004201ab30000 [block header]
   e09f3800 [item identifier 1 (offset=8160, flags=1, length=28)]
   c09f3800 [item identifier 2 (offset=8128, flags=1, length=28)]
   … [lots of zeroes]
   4ab3000000000000000000000000000002000100002818000300000000000000 [row with id=2]
   49b300004ab30000000000000000000002000120000118000100000000000000 [row with id=1]
   ```

   only 1 dead tuple again
   
It is interesting to note that the 'extra' dead tuple in example (2) has the same value for `xmin` and `xmax` (`2ab30000`), as you would expect. Perhaps this tuple could have also just been overwritten if the `xmin`=`xmax` condition was detected, but I suspect there is a good reason why it can't be!

[^1]: The `primary key` is there to prevent [heap only tuples](https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/heap/README.HOT;hb=HEAD) from being created, as they make the raw block data slightly harder to interpret.
[^2]: After first installing `pageinspect` in its own schema with `create schema x_pageinspect;` and `create extension "pageinspect" schema x_pageinspect;`
[^3]: See [the postgres documentation](https://www.postgresql.org/docs/12/storage-page-layout.html) for how the block is physically laid out, and [the source code](https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/include/storage/itemid.h;h=5b0229b6a40b67595fe13addf87451e74e245c2f;hb=HEAD) for the bit layout of the 4-byte item identifier structure.
1 Answer