How to apply conditional logic to a Pandas DataFrame.

See the DataFrame shown below:

   data desired_output
0     1          False
1     2          False
2     3           True
3     4           True

My original data is shown in the ‘data’ column, and the desired result is shown next to it in ‘desired_output’. If the number in ‘data’ is below 2.5, the desired_output is False; otherwise it is True.

I could apply a loop and reconstruct the DataFrame… but that would be ‘un-pythonic’.

In [1]: df
Out[1]:
   data
0     1
1     2
2     3
3     4

You want to apply a function that conditionally returns a value based on the selected dataframe column.

In [2]: df['data'].apply(lambda x: False if x < 2.5 else True)
Out[2]:
0    False
1    False
2     True
3     True
Name: data, dtype: bool

You can then assign that returned column to a new column in your dataframe:

In [3]: df['desired_output'] = df['data'].apply(lambda x: False if x < 2.5 else True)

In [4]: df
Out[4]:
   data desired_output
0     1          False
1     2          False
2     3           True
3     4           True
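
If the rule has more than two branches, a named function tends to read better than a lambda. This is only a sketch: the `label` function, its thresholds, and the "low"/"mid"/"high" values are made up for illustration and are not part of the question.

```python
import pandas as pd

df = pd.DataFrame({"data": [1, 2, 3, 4]})

def label(x):
    # Hypothetical three-way rule, invented for this example
    if x < 2:
        return "low"
    elif x < 4:
        return "mid"
    return "high"

# apply calls label once per element of the 'data' column
df["label"] = df["data"].apply(label)
```

Note that apply runs Python code per element, so for a simple threshold a direct comparison on the column is usually the faster choice.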

Just compare the column with that value:

In [8]: import pandas

In [9]: df = pandas.DataFrame([1,2,3,4], columns=["data"])

In [10]: df
Out[10]: 
   data
0     1
1     2
2     3
3     4

In [11]: df["desired"] = df["data"] > 2.5

In [12]: df
Out[12]: 
   data desired
0     1   False
1     2   False
2     3    True
3     4    True
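
The comparison returns a boolean Series, so several conditions can be combined element-wise with `&` (and) and `|` (or). A small sketch, assuming a made-up range check (the 1.5 and 3.5 bounds and the `in_range` column name are not from the question):

```python
import pandas as pd

df = pd.DataFrame([1, 2, 3, 4], columns=["data"])

# Each comparison needs its own parentheses, because & binds
# more tightly than > and < in Python.
df["in_range"] = (df["data"] > 1.5) & (df["data"] < 3.5)
```

Using the plain `and`/`or` keywords here would raise an error, since they try to reduce the whole Series to a single truth value.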

You can also use numpy.where:

In [34]: import pandas as pd

In [35]: import numpy as np

In [36]: df = pd.DataFrame([1,2,3,4], columns=["data"])

In [37]: df
Out[37]: 
   data
0     1
1     2
2     3
3     4

In [38]: df["desired_output"] = np.where(df["data"] < 2.5, "False", "True")

In [39]: df
Out[39]: 
   data desired_output
0     1          False
1     2          False
2     3           True
3     4           True
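
One thing to watch with the np.where call above: "False" and "True" in quotes are strings, even though the printed column looks boolean. A short sketch of the difference (the `as_text` and `as_bool` column names are just for this example):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([1, 2, 3, 4], columns=["data"])

# Quoted values give a column of strings
df["as_text"] = np.where(df["data"] < 2.5, "False", "True")

# Unquoted values give a real boolean column
df["as_bool"] = np.where(df["data"] < 2.5, False, True)

# Only the boolean column can be used directly as a row filter
selected = df[df["as_bool"]]
```

If real booleans are what you need, the plain comparison `df["data"] >= 2.5` gives the same result as the `as_bool` column; np.where is mainly useful when the two outcomes are arbitrary values.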

In this specific example, where the DataFrame has only one column, you can write this elegantly as:

df['desired_output'] = df.gt(2.5)

gt tests whether elements are greater than 2.5, which matches the desired output; similarly ge tests greater than or equal, and lt and le test less than and less than or equal.
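
The comparison methods are also available on a single column, which is the safer form once the DataFrame has more than one column. A quick sketch on the same data (the variable names are just for illustration):

```python
import pandas as pd

df = pd.DataFrame([1, 2, 3, 4], columns=["data"])
s = df["data"]

above = s.gt(2.5)       # same as s > 2.5
at_or_below = s.le(2.5)  # same as s <= 2.5

# Selecting the column explicitly avoids comparing every column
df["desired_output"] = above
```

For this integer data, `le` is the exact negation of `gt`, so which one to use depends only on which side of the threshold should be True.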