Let's assume I have the following data frame `df`:

      key  data1  data2
    0   A      0      5
    1   B      1      0
    2   C      2      3
    3   A      3      3
    4   B      4      7
    5   C      5      9

A way to center the data within each group is to compute the following:
`df.groupby("key").transform(lambda x: x - x.mean())`
and we get

       data1  data2
    0   -1.5    1.0
    1   -1.5   -3.5
    2   -1.5   -3.0
    3    1.5   -1.0
    4    1.5    3.5
    5    1.5    3.0

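For anyone who wants to reproduce this, here is a minimal self-contained sketch. It assumes pandas is installed and rebuilds the frame from the values shown above; the variable name `centered` is only for illustration.

```python
import pandas as pd

# Rebuild the frame from the question.
df = pd.DataFrame({
    "key": ["A", "B", "C", "A", "B", "C"],
    "data1": [0, 1, 2, 3, 4, 5],
    "data2": [5, 0, 3, 3, 7, 9],
})

# Subtract each group's mean from the rows belonging to that group.
# The grouping column "key" is not part of the result.
centered = df.groupby("key").transform(lambda x: x - x.mean())
print(centered)
```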
I don't understand why this works...
1. The lambda function accepts `x`. To compute the correct mean, I deduce that `x` must be `df.groupby("key")`, so that `x.mean()` is equal to `df.groupby("key").mean()`.
2. However, if `x` is `df.groupby("key")`, how should I interpret `x - x.mean()`? Is `df.groupby("key")` simply the original `df` with special methods, such as aggregates, etc.?
Just tried this with a [REPL](https://pythonprogramminglanguage.com/repl/):
`df.groupby("key").transform(lambda x: help(x))`
which tells me that the `x` in your lambda is a [`pandas.Series`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html) object (class `pandas.core.series.Series`).
(WARNING: don't do this with a large frame, it will call help for each group!)
`Series` has a [`mean`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.mean.html#pandas.Series.mean) method that returns a scalar value. A `Series` also allows you to [subtract a scalar](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.sub.html#pandas.Series.sub) (the `-` operator, `__sub__`, does the same thing as this method), which subtracts the scalar from every entry of the series. So `x - x.mean()` is one column of one group with that group's mean removed. `transform` then collects the resulting `Series` objects from all groups and splices them back together, in the original row order, into the final `DataFrame`.
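The steps above can be sketched by hand. This is only an illustration of the idea, not how pandas implements `transform` internally; the frame is rebuilt from the question, and the names `pieces`, `manual` and `auto` are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "key": ["A", "B", "C", "A", "B", "C"],
    "data1": [0, 1, 2, 3, 4, 5],
    "data2": [5, 0, 3, 3, 7, 9],
})

pieces = []
for key, group in df.groupby("key"):
    # Keep only the data columns; the grouping column is left out.
    values = group[["data1", "data2"]]
    # Each column is a Series: col.mean() is a scalar, and
    # Series - scalar subtracts it from every entry.
    pieces.append(values.apply(lambda col: col - col.mean()))

# Splice the per-group results together and restore the original row order.
manual = pd.concat(pieces).sort_index()

auto = df.groupby("key").transform(lambda x: x - x.mean())

# Compare the raw values of the hand-rolled version and transform().
print((manual.to_numpy() == auto.to_numpy()).all())
```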
To answer your explicit questions:
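1. `groupby` returns a `DataFrameGroupBy` object. The value `x` within your lambda is only a part -- namely one group at a time -- of what `groupby` returns, so `x.mean()` is the mean of that one group, not `df.groupby("key").mean()`.
2. As shown above, `x` is a `pandas.Series` (one column of one group), not the `DataFrameGroupBy` object, so `x - x.mean()` is ordinary `Series` arithmetic. Note that the pandas documentation for `transform` says that if the function also works on a whole sub-frame, a fast path is used starting from the second group, in which case `x` is a `DataFrame`; `x - x.mean()` gives the same result either way.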
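A quieter way to check both points than calling `help` is to record the types yourself. The sketch below is an assumption-laden illustration: the helper `record_and_center` and the list `seen` are hypothetical names, and the exact sequence of types recorded may depend on your pandas version and on whether the fast path is taken, so no output is shown for it.

```python
import pandas as pd

df = pd.DataFrame({
    "key": ["A", "B", "C", "A", "B", "C"],
    "data1": [0, 1, 2, 3, 4, 5],
    "data2": [5, 0, 3, 3, 7, 9],
})

grouped = df.groupby("key")
print(type(grouped).__name__)  # the groupby object itself, not a DataFrame

# Iterating yields (key, sub-frame) pairs: one ordinary DataFrame per group.
for key, group in grouped:
    print(key, type(group).__name__, list(group.index))

seen = []  # type name of every x handed to the function

def record_and_center(x):
    seen.append(type(x).__name__)
    return x - x.mean()

result = grouped.transform(record_and_center)
print(seen)
```

Whatever mix of `Series` and `DataFrame` shows up in `seen`, `result` should match the centered frame from the question.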