Using transform method from Pandas. - Python

anoldmaninthesea

Let's assume I have the following data frame `df`:
```
  key  data1  data2
0   A      0      5
1   B      1      0
2   C      2      3
3   A      3      3
4   B      4      7
5   C      5      9
```
A way to center the data is to compute the following:

```
df.groupby("key").transform(lambda x: x - x.mean())
```
and we get

```
   data1  data2
0   -1.5    1.0
1   -1.5   -3.5
2   -1.5   -3.0
3    1.5   -1.0
4    1.5    3.5
5    1.5    3.0
```

I don't understand why this works.
1. The lambda function accepts `x`. For the mean to come out right, I deduce that `x` must be `df.groupby("key")`, so that `x.mean()` is equal to `df.groupby("key").mean()`.

2. However, if `x` is `df.groupby("key")`, how should `x - x.mean()` be interpreted? Is `df.groupby("key")` simply the original `df` with some special methods, such as aggregations?

Top Answer

Somebody

Just tried this with a [REPL](https://pythonprogramminglanguage.com/repl/): 

`df.groupby("key").transform(lambda x: help(x))` 

which tells me that the `x` in your lambda is a [`pandas.Series`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html) object (the class lives in `pandas.core.series`).

(WARNING: don't do this with a large frame, because `help` is called every time the lambda is invoked, which is at least once per group!)
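A quieter way to look at what the lambda receives is to record the type of each argument instead of calling `help`. This is only a sketch: the frame is rebuilt from the question, and the names `seen` and `center` are mine. The pandas documentation notes that `transform` may also pass a whole sub-frame once it has checked that the function works on one, so the recorded list can contain `DataFrame` entries next to the `Series` ones, depending on the pandas version.

```python
import pandas as pd

# The frame from the question
df = pd.DataFrame({
    "key": ["A", "B", "C", "A", "B", "C"],
    "data1": [0, 1, 2, 3, 4, 5],
    "data2": [5, 0, 3, 3, 7, 9],
})

seen = []  # one (type name, name attribute) pair per call made by pandas


def center(x):
    # Record what we were handed, then center it as in the question
    seen.append((type(x).__name__, getattr(x, "name", None)))
    return x - x.mean()


centered = df.groupby("key").transform(center)

for entry in seen:
    print(entry)
print(centered)
```

The printed pairs show which column (or which group) each call saw, without flooding the terminal with help pages.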

This class has a [`mean`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.mean.html#pandas.Series.mean) method that returns a scalar value. A Series also lets you [subtract a scalar](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.sub.html#pandas.Series.sub) (the `-` operator, i.e. `__sub__`, does the same thing as this method), and the scalar is subtracted from every entry of the series. `transform` then picks up the resulting series and splices the `Series` objects of all groups together into the final `DataFrame`, in the original row order.
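To make the "subtract a scalar, then splice" description concrete, here is a rough hand-written version of the same steps. It is not how pandas implements `transform` internally, just a sketch that should give the same numbers; the names `pieces`, `manual` and `via_transform` are made up for the illustration.

```python
import pandas as pd

# The frame from the question
df = pd.DataFrame({
    "key": ["A", "B", "C", "A", "B", "C"],
    "data1": [0, 1, 2, 3, 4, 5],
    "data2": [5, 0, 3, 3, 7, 9],
})

pieces = []
for key, group in df.groupby("key"):
    values = group[["data1", "data2"]]
    # Each column is a Series; col.mean() is a scalar, so this is
    # "Series minus scalar", applied column by column within one group.
    pieces.append(values.apply(lambda col: col - col.mean()))

# Splice the per-group results together and restore the original row order
manual = pd.concat(pieces).sort_index()

via_transform = df.groupby("key").transform(lambda x: x - x.mean())

print(manual)
print(via_transform)
```

Both printed frames should show the centered values from the question.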

To answer your explicit questions:

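1. `groupby` returns a `DataFrameGroupBy` object. The value `x` within your lambda is only a part of what `groupby` returns, namely one column of one group at a time.
2. As shown above, `x` is a `pandas.Series`, so `x - x.mean()` is an ordinary "Series minus scalar" operation. The pandas documentation notes that when the function also works on a whole sub-frame, a fast path may pass each group as a `DataFrame` instead; for `x - x.mean()` the result is the same, because the column means are then subtracted column by column.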
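The difference between the grouped object and a single group can also be checked directly. A small sketch, again rebuilding the frame from the question (the variable names are mine):

```python
import pandas as pd

# The frame from the question
df = pd.DataFrame({
    "key": ["A", "B", "C", "A", "B", "C"],
    "data1": [0, 1, 2, 3, 4, 5],
    "data2": [5, 0, 3, 3, 7, 9],
})

grouped = df.groupby("key")
print(type(grouped).__name__)   # DataFrameGroupBy, not a DataFrame

# Aggregating the grouped object gives one row of means per key
group_means = grouped.mean()
print(group_means)

# A single group is just the matching rows of df
group_a = grouped.get_group("A")
col = group_a["data1"]          # a Series: one column of one group
col_centered = col - col.mean() # Series minus scalar
print(col_centered)
```

So `df.groupby("key").mean()` is a small table of means, while the `x.mean()` inside the lambda is a single number for one column of one group.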
