I want to apply a custom function and create a derived column called population2050 that is based on two columns already present in my data frame.
import pandas as pd
import sqlite3

conn = sqlite3.connect('factbook.db')
query = "select * from facts where area_land =0;"
facts = pd.read_sql_query(query,conn)
print(list(facts.columns.values))

def final_pop(initial_pop,growth_rate):
    final = initial_pop*math.e**(growth_rate*35)
    return(final)

facts['pop2050'] = facts['population','population_growth'].apply(final_pop,axis=1)
When I run the above code, I get an error. Am I not using the ‘apply’ function correctly?
You were almost there:
facts['pop2050'] = facts.apply(lambda row: final_pop(row['population'],row['population_growth']),axis=1)
Using lambda allows you to keep the specific (interesting) parameters listed in your function, rather than bundling them in a ‘row’.
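To make the difference concrete, here is a small sketch on a made-up three-row frame (the numbers are invented for illustration and are not taken from factbook.db). It first shows that `facts['population', 'population_growth']` is looked up as the single tuple key `('population', 'population_growth')`, which raises a KeyError on ordinary columns, and then runs the lambda version.

```python
import math
import pandas as pd

# Hypothetical stand-in for the query result; the values are invented.
facts = pd.DataFrame({
    "population": [1000.0, 2500.0, 0.0],
    "population_growth": [0.01, 0.0, 0.02],
})

def final_pop(initial_pop, growth_rate):
    # Continuous growth over 35 years, as in the question.
    return initial_pop * math.e ** (growth_rate * 35)

# A comma inside single brackets is a tuple key, not a list of columns.
try:
    facts["population", "population_growth"]
    tuple_lookup_failed = False
except KeyError:
    tuple_lookup_failed = True

# Row-wise apply: the lambda unpacks the two columns for final_pop.
facts["pop2050"] = facts.apply(
    lambda row: final_pop(row["population"], row["population_growth"]),
    axis=1,
)
print(tuple_lookup_failed)
print(facts)
```

Selecting both columns with double brackets, `facts[['population', 'population_growth']]`, avoids the KeyError, but `apply(final_pop, axis=1)` would still hand the function one row rather than two separate arguments, which is why the lambda is needed.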
With axis=1, apply passes the entire row to your function. Adjust it like this, assuming your two columns are called initial_pop and growth_rate:
def final_pop(row):
    return row.initial_pop*math.e**(row.growth_rate*35)
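The same row-based idea, sketched against the column names that actually appear in the question (population and population_growth). The function name `final_pop_row` and the sample values are assumptions made for this example only.

```python
import math
import pandas as pd

# Invented sample data using the column names from the question.
facts = pd.DataFrame({
    "population": [1000.0, 2500.0],
    "population_growth": [0.01, 0.0],
})

def final_pop_row(row):
    # With axis=1, `row` is a Series indexed by column name.
    return row["population"] * math.e ** (row["population_growth"] * 35)

# No lambda needed: apply passes each row straight to the function.
facts["pop2050"] = facts.apply(final_pop_row, axis=1)
print(facts)
```

The trade-off against the lambda version is that the function is now tied to specific column names instead of taking two plain numbers.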
def function(x):
    # your operation
    return x
call your function as,
You can achieve the same result without the need for DataFrame.apply(). Pandas Series (or DataFrame columns) can be used as direct arguments for NumPy functions and even built-in Python operators, which are applied element-wise. In your case, it is as simple as the following:
import numpy as np
facts['pop2050'] = facts['population'] * np.exp(35 * facts['population_growth'])
This multiplies each element in the column population_growth by 35, applies NumPy’s exp() function to that new column (35 * population_growth), and then multiplies the result element-wise by population.
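As a quick sanity check, the sketch below compares the vectorized expression with a row-wise apply on a small invented frame (the values are made up, not from factbook.db). Under these assumptions the two approaches should agree up to floating-point tolerance.

```python
import numpy as np
import pandas as pd

# Invented sample data using the column names from the question.
facts = pd.DataFrame({
    "population": [1000.0, 2500.0, 0.0],
    "population_growth": [0.01, 0.0, 0.02],
})

# Vectorized: operators and np.exp work element-wise on whole columns.
vectorized = facts["population"] * np.exp(35 * facts["population_growth"])

# Row-wise equivalent, one Python function call per row.
row_wise = facts.apply(
    lambda row: row["population"] * np.exp(35 * row["population_growth"]),
    axis=1,
)

same = np.allclose(vectorized, row_wise)
print(same)
```

On large frames the vectorized form is usually much faster, since it avoids calling a Python function once per row.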