Do irrelevant columns affect query time of select statements? - Databases

TopAnswers Databases

Meta

Databases

TeX

Code Golf

APL

C++

.net

db<>fiddle

Java

*nix

PHP

PowerShell

Python

Rust

टेक्-मराठी

Typst

Web Client Dev

Web Server Dev

Do irrelevant columns affect query time of select statements?

sql-server add tag

user45867 (imported from SE)

Say you have a table of one million rows and run:

    select order_value from store.orders

Does it make a difference whether that table has one column, two columns, or a hundred columns, in actual query time? I mean all columns other than `order_value`.

Right now I'm pushing data to a data warehouse. Sometimes I dump columns into the table that "may be used in the future, someday", but they aren't being queried right now, by anything. Would these 'extraneous' columns affect select statements that do not include them, directly or indirectly (no `select *` I mean)?

Top Answer

meme (imported from SE)

This really depends on indexes and data types.

Using the Stack Overflow database as an example, this is what the Users table looks like:

[![NUTS][1]][1]

It has a PK/CX on the Id column. So it's the entirety of the table data sorted by Id.

With that as the only index, SQL has to read that whole thing (sans the LOB columns) into memory if it's not already there.

DBCC DROPCLEANBUFFERS-- Don't run this anywhere near prod.

SET STATISTICS TIME, IO ON

SELECT u.Id
INTO #crap1
FROM dbo.Users AS u

The stats time and io profile looks like this:

Table 'Users'. Scan count 7, logical reads 80846, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 2406 ms, elapsed time = 446 ms.

If I add an additional nonclustered index on just Id

CREATE INDEX ix_whatever ON dbo.Users (Id)

I now have a much smaller index that satisfies my query.

DBCC DROPCLEANBUFFERS-- Don't run this anywhere near prod.

SELECT u.Id
INTO #crap2
FROM dbo.Users AS u

The profile here:

Table 'Users'. Scan count 7, logical reads 6587, physical reads 0, read-ahead reads 6549, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 2344 ms, elapsed time = 384 ms.

We're able to do far fewer reads and save a little CPU time.

Without more information about your table definition, I can't really try to reproduce what you're trying to measure any better.

> But you're saying that unless there is a specific index on that lone column, the other columns/ fields will also be scanned? Is this just a drawback inherent to the design of rowstore tables? Why would irrelevant fields be scanned?

Yes, this is specific to rowstore tables. Data is stored by the row on data pages. Even if other data on the page is irrelevant to your query, that whole row > page > index needs to be read into memory. I wouldn't say that the other columns are "scanned" so much as the pages they exist on are scanned to retrieve the single value on them relevant to the query.

Using the ol' phonebook example: even if you're just reading phone numbers, when you turn the page, you're turning last name, first name, address, etc along with the phone number.

[1]: https://i.stack.imgur.com/n9Zbd.jpg

1 Answer