So in R when I have a data frame consisting of say 4 columns, call it `df` and I want to compute the ratio by sum product of a group, I can it in such a way:

``````// generate data
df = data.frame(a=c(1,1,0,1,0),b=c(1,0,0,1,0),c=c(10,5,1,5,10),d=c(3,1,2,1,2));
| a   b   c    d |
| 1   1   10   3 |
| 1   0   5    1 |
| 0   0   1    2 |
| 1   1   5    1 |
| 0   0   10   2 |
// compute sum product ratio
df = df%>% group_by(a,b) %>%
mutate(
ratio=c/sum(c*d)
);
| a   b   c    d  ratio |
| 1   1   10   3  0.286 |
| 1   1   5    1  0.143 |
| 1   0   5    1  1     |
| 0   0   1    2  0.045 |
| 0   0   10   2  0.454 |
``````

But in python I need to resort to loops.
I know there should be a more elegant way than raw loops in python, anyone got any ideas?

It can be done with similar syntax with `groupby()` and `apply()`:

``````df['ratio'] = df.groupby(['a','b'], group_keys=False).apply(lambda g: g.c/(g.c * g.d).sum())
`````` According to this thread on pandas github we can use the `transform()` method to replicate the combination of `dplyr::groupby()` and `dplyr::mutate()`. For this example, it would look as follows:

``````df = pd.DataFrame(
dict(
a=(1 , 1, 0, 1, 0 ),
b=(1 , 0, 0, 1, 0 ),
c=(10, 5, 1, 5, 10),
d=(3 , 1, 2, 1, 2 ),
)
).assign(
prod_c_d = lambda x: x['c'] * x['d'],
ratio    = lambda x: x['c'] / (x.groupby(['a','b']).transform('sum')['prod_c_d'])
)
``````

This example uses pandas method chaining. For more information on how to use method chaining to replicate `dplyr` workflows see this blogpost.

The method using `apply()` and `groupby()` does not work for me because it does not seem to be adaptable. For example, it does not work if we delete `g.c/` from the lambda expression.

``````df['ratio'] = df.groupby(['a','b'], group_keys=False)\
.apply(lambda g: (g.c * g.d).sum() )
``````

It’s pretty easy to translate your R code into python with `datar`:

``````>>> from datar.all import f, c, tibble, sum, group_by, mutate
[2021-06-24 13:32:29][datar][WARNING] Builtin name "sum" has been overriden by datar.
>>>
>>> df = tibble(a=c(1,1,0,1,0),b=c(1,0,0,1,0),c=c(10,5,1,5,10),d=c(3,1,2,1,2))
>>> df
a       b       c       d
<int64> <int64> <int64> <int64>
0       1       1      10       3
1       1       0       5       1
2       0       0       1       2
3       1       1       5       1
4       0       0      10       2
>>> df >> group_by(f.a, f.b) >> mutate(ratio=f.c/sum(f.c*f.d))
a       b       c       d     ratio
<int64> <int64> <int64> <int64> <float64>
0       1       1      10       3  0.285714
1       1       0       5       1  1.000000
2       0       0       1       2  0.045455
3       1       1       5       1  0.142857
4       0       0      10       2  0.454545

[Groups: a, b (n=3)]
``````

Disclaimer: I am the author of the `datar` package.