I would like to create a new column with a numerical value based on the following conditions:

a. `if gender is male & pet1=pet2, points = 5`

b. `if gender is female & (pet1 is 'cat' or pet1='dog'), points = 5`

c. all other combinations, `points = 0`

``````    gender    pet1      pet2
0   male      dog       dog
1   male      cat       cat
2   male      dog       cat
3   female    cat       squirrel
4   female    dog       dog
5   female    squirrel  cat
6   squirrel  dog       cat
``````

I would like the end result to be as follows:

``````    gender    pet1      pet2      points
0   male      dog       dog       5
1   male      cat       cat       5
2   male      dog       cat       0
3   female    cat       squirrel  5
4   female    dog       dog       5
5   female    squirrel  cat       0
6   squirrel  dog       cat       0
``````

How do I accomplish this?

### `numpy.select`

This is a perfect case for `np.select` where we can create a column based on multiple conditions and it’s a readable method when there are more conditions:

``````conditions = [
df['gender'].eq('male') & df['pet1'].eq(df['pet2']),
df['gender'].eq('female') & df['pet1'].isin(['cat', 'dog'])
]

choices = [5,5]

df['points'] = np.select(conditions, choices, default=0)

print(df)
gender      pet1      pet2  points
0      male       dog       dog       5
1      male       cat       cat       5
2      male       dog       cat       0
3    female       cat  squirrel       5
4    female       dog       dog       5
5    female  squirrel       cat       0
6  squirrel       dog       cat       0
``````

You can do this using `np.where`, the conditions use bitwise `&` and `|` for `and` and `or` with parentheses around the multiple conditions due to operator precedence. So where the condition is true `5` is returned and `0` otherwise:

``````In :
df['points'] = np.where( ( (df['gender'] == 'male') & (df['pet1'] == df['pet2'] ) ) | ( (df['gender'] == 'female') & (df['pet1'].isin(['cat','dog'] ) ) ), 5, 0)
df

Out:
gender      pet1      pet2  points
0      male       dog       dog       5
1      male       cat       cat       5
2      male       dog       cat       0
3    female       cat  squirrel       5
4    female       dog       dog       5
5    female  squirrel       cat       0
6  squirrel       dog       cat       0
``````

using apply.

``````def f(x):
if x['gender'] == 'male' and x['pet1'] == x['pet2']: return 5
elif x['gender'] == 'female' and (x['pet1'] == 'cat' or x['pet1'] == 'dog'): return 5
else: return 0

data['points'] = data.apply(f, axis=1)
``````

The apply method described by @RuggeroTurra takes a lot longer for 500k rows. I ended up using something like

``````df['result'] = ((df.a == 0) & (df.b != 1)).astype(int) * 2 + \
((df.a != 0) & (df.b != 1)).astype(int) * 3 + \
((df.a == 0) & (df.b == 1)).astype(int) * 4 + \
((df.a != 0) & (df.b == 1)).astype(int) * 5
``````

where the apply method took 25 seconds and this method above took about 18ms.

You can also use the `apply` function. For example:

``````def myfunc(gender, pet1, pet2):
if gender=='male' and pet1==pet2:
myvalue=5
elif gender=='female' and (pet1=='cat' or pet1=='dog'):
myvalue=5
else:
myvalue=0
return myvalue
``````

And then using the apply function by setting `axis=1`

``````df['points'] = df.apply(lambda x: myfunc(x['gender'], x['pet1'], x['pet2']), axis=1)
``````

We get:

``````     gender      pet1      pet2  points
0      male       dog       dog       5
1      male       cat       cat       5
2      male       dog       cat       0
3    female       cat  squirrel       5
4    female       dog       dog       5
5    female  squirrel       cat       0
6  squirrel       dog       cat       0
``````

One option is with case_when from pyjanitor; under the hood it uses `pd.Series.mask`.

The basic idea is a pairing of condition and expected value; you can pass as many pairings as required, followed by a default value and a target column name:

``````# pip install pyjanitor
import pandas as pd
import janitor
df.case_when(
# condition, value
df.gender.eq('male') & df.pet1.eq(df.pet2), 5,
df.gender.eq('female') & df.pet1.isin(['cat', 'dog']), 5,
0, # default
column_name="points")

gender      pet1      pet2  points
0      male       dog       dog       5
1      male       cat       cat       5
2      male       dog       cat       0
3    female       cat  squirrel       5
4    female       dog       dog       5
5    female  squirrel       cat       0
6  squirrel       dog       cat       0
``````

You could use strings for the conditions, as long as they can be evaluated by `pd.eval` on the parent dataframe – note that speed wise, this can be slower for small datasets:

``````df.case_when(
"gender == 'male' and pet1 == pet2", 5,
"gender == 'female' and pet2 == ['cat', 'dog']", 5,
0,
column_name="points")

gender      pet1      pet2  points
0      male       dog       dog       5
1      male       cat       cat       5
2      male       dog       cat       0
3    female       cat  squirrel       0
4    female       dog       dog       5
5    female  squirrel       cat       5
6  squirrel       dog       cat       0
``````

Anonymous functions are also possible, which can be handy in chained operations:

``````df.case_when(
lambda df: df.gender.eq('male') & df.pet1.eq(df.pet2), 5,
lambda df: df.gender.eq('female') & df.pet1.isin(['cat', 'dog']), 5,
0, # default
column_name="points")

gender      pet1      pet2  points
0      male       dog       dog       5
1      male       cat       cat       5
2      male       dog       cat       0
3    female       cat  squirrel       5
4    female       dog       dog       5
5    female  squirrel       cat       0
6  squirrel       dog       cat       0
``````