Troubles with 'WHERE...IN' clause - Databases

sqlite

Anonymous 1732

I'm running the following query through pandasql, but the output is not what I expected. I expected a table with exactly 800 rows, because I select only the employee_day_transmitter values that appear in the CTE employee_days_transmitters. Instead I get a table with more than 800 rows. What's wrong? How can I get exactly 800 rows, one for each employee_day_transmitter selected in employee_days_transmitters?

```
import pandasql

# Sample 800 distinct (employeeId, theDate, transmitterId) combinations,
# then select every 'rpv' row of table1 whose combination was sampled.
query_text = '''WITH employee_days_transmitters AS (
                   SELECT DISTINCT
                   employeeId
                   , theDate
                   , transmitterId
                   , employeeId || '-' || CAST(theDate AS STRING) || '-' || transmitterId AS employee_day_transmitter
                   FROM
                   table1
                   WHERE variable = 'rpv'
                   ORDER BY
                   RANDOM()
                   LIMIT
                   800
                   )
                     SELECT
                     *
                     FROM
                     table1
                     WHERE
                     (employeeId || '-' || CAST(theDate AS STRING) || '-' || transmitterId) IN (SELECT employee_day_transmitter FROM employee_days_transmitters) AND variable = 'rpv'
                     '''
# table1 must be a DataFrame visible in globals() for pandasql to find it.
table2 = pandasql.sqldf(query_text, globals())
```

Your `employee_days_transmitters` CTE guarantees no more than 800 _distinct_ combinations of `employeeId, theDate, transmitterId`. If those combinations are not guaranteed to be unique in `table1`, then it's entirely expected that your main `SELECT` returns more than 800 rows: nothing in it says that only one row per combination of `employeeId, theDate, transmitterId` should be returned, so every matching row comes back.

If you want one row for each combination, then what you are facing is called a 'Greatest N per group' or 'Top N per group' problem. I'm not sure how it can be solved in PandaSQL, but even before we get to that, you need to figure out for yourself _which_ row of each combination you want, if there can indeed be more than one of them. As FoggyFinder mentioned, it would be helpful if you provided a [minimal, complete, verifiable example (MCVE)](https://www.sqlserverscience.com/mcve/) to demonstrate the problem, and described the expected result in more detail.
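To illustrate the point, here is a minimal sketch that reproduces the effect with Python's built-in `sqlite3` module rather than pandasql, so it runs without a DataFrame. The sample rows, the `value` column, the CTE name `sampled`, and the `LIMIT 2` are all made up for the demonstration. It also assumes that keeping the smallest `value` per combination is an acceptable answer to "which row do you want"; that choice is arbitrary and only you can decide what fits your data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 ("
    "employeeId INTEGER, theDate TEXT, transmitterId INTEGER, "
    "variable TEXT, value REAL)"  # 'value' is a hypothetical measurement column
)
# Hypothetical data: three combinations, two of them appear more than once.
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2020-01-01", 10, "rpv", 0.1),
        (1, "2020-01-01", 10, "rpv", 0.2),
        (1, "2020-01-01", 10, "rpv", 0.3),
        (2, "2020-01-02", 20, "rpv", 0.4),
        (2, "2020-01-02", 20, "rpv", 0.5),
        (3, "2020-01-03", 30, "rpv", 0.6),
    ],
)

# Same idea as the CTE in the question, sampling 2 combinations instead of 800.
sample_cte = """
WITH sampled AS (
    SELECT DISTINCT employeeId, theDate, transmitterId
    FROM table1
    WHERE variable = 'rpv'
    ORDER BY RANDOM()
    LIMIT 2
)
"""

join_clause = """
FROM table1 AS t
JOIN sampled AS s
  ON  s.employeeId    = t.employeeId
  AND s.theDate       = t.theDate
  AND s.transmitterId = t.transmitterId
WHERE t.variable = 'rpv'
"""

# Every row of the sampled combinations comes back, duplicates included.
all_rows = conn.execute(sample_cte + "SELECT t.* " + join_clause).fetchall()

# One row per sampled combination, keeping the smallest value (arbitrary choice).
one_per_combo = conn.execute(
    sample_cte
    + "SELECT t.employeeId, t.theDate, t.transmitterId, MIN(t.value) AS value "
    + join_clause
    + "GROUP BY t.employeeId, t.theDate, t.transmitterId"
).fetchall()

print(len(all_rows), len(one_per_combo))
```

The first query returns more rows than the number of sampled combinations whenever a sampled combination occurs more than once in `table1`, while the grouped query returns exactly one row per combination. If you need the complete row rather than an aggregate, you first have to define a rule for picking it (for example the earliest or latest row by some column), which is the 'Top N per group' question raised above.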

0 Answers