I have a dataframe of taxi data that looks like this:
Neighborhood     Borough         Time
Midtown          Manhattan       X
Melrose          Bronx           Y
Grant City       Staten Island   Z
Midtown          Manhattan       A
Lincoln Square   Manhattan       B
Basically, each row represents a taxi pickup in that neighborhood in that borough. Now, I want to find the top 5 neighborhoods in each borough by number of pickups. I tried this:
Which gives me something like this:
borough
Bronx          High Bridge          3424
               Mott Haven           2515
               Concourse Village    1443
               Port Morris          1153
               Melrose               492
               North Riverdale       463
               Eastchester           434
               Concourse             395
               Fordham               252
               Wakefield             214
               Kingsbridge           212
               Mount Hope            200
               Parkchester           191
......
Staten Island  Castleton Corners       4
               Dongan Hills            4
               Eltingville             4
               Graniteville            4
               Great Kills             4
               Castleton               3
               Woodrow                 1
How do I filter it so that I get only the top 5 from each? I know there are a few questions with a similar title but they weren’t helpful to my case.
I think you can use nlargest; you can change 1 to 5:
s = df['Neighborhood'].groupby(df['Borough']).value_counts()
print(s)

Borough
Bronx          Melrose            7
Manhattan      Midtown           12
               Lincoln Square     2
Staten Island  Grant City        11
dtype: int64

# group by the Borough level only, then keep the largest count in each group
print(s.groupby(level=0).nlargest(1))

Bronx          Bronx          Melrose        7
Manhattan      Manhattan      Midtown       12
Staten Island  Staten Island  Grant City    11
dtype: int64
Additional index levels were being created, so I specified the level explicitly.
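A minimal, self-contained sketch of the per-borough nlargest idea. The sample data and the names counts and top2 are made up for illustration and are not from the question; group_keys=False is used here so the Borough level is not repeated in the result.

```python
import pandas as pd

# Hypothetical sample data: one row per taxi pickup.
df = pd.DataFrame({
    "Borough": ["Manhattan"] * 6 + ["Bronx"] * 3,
    "Neighborhood": ["Midtown"] * 3 + ["Lincoln Square"] * 2 + ["Harlem"]
                    + ["Melrose"] * 2 + ["Fordham"],
})

# Number of pickups per (Borough, Neighborhood) pair.
counts = df.groupby("Borough")["Neighborhood"].value_counts()

# Keep the 2 largest counts within each borough. group_keys=False
# prevents pandas from prepending a second copy of the Borough level.
top2 = counts.groupby(level="Borough", group_keys=False).nlargest(2)
print(top2)
```

Changing the argument of nlargest from 2 to 5 would give the top 5 per borough on the real data.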
You can do this in a single line by slightly extending your original groupby with ‘nlargest’:
>>> df.groupby(['Borough', 'Neighborhood']).Neighborhood.value_counts().nlargest(5)
Borough        Neighborhood    Neighborhood
Bronx          Melrose         Melrose           1
Manhattan      Midtown         Midtown           1
Manhatten      Lincoln Square  Lincoln Square    1
               Midtown         Midtown           1
Staten Island  Grant City      Grant City        1
dtype: int64
head() returns the first 5 rows of a data frame by default.
.value_counts().nlargest(5) only gives you the top 5 overall, not the top 5 in each group.
group_keys=False avoids a duplicated index level.
value_counts() has already sorted the counts, so you only need head(5).
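To make the difference concrete, here is a small sketch contrasting the two calls. The sample data and variable names are invented for illustration. It relies on the grouped value_counts() returning counts in descending order within each group, which is its default behaviour.

```python
import pandas as pd

# Hypothetical sample data: one row per taxi pickup.
df = pd.DataFrame({
    "Borough": ["Manhattan"] * 8 + ["Bronx"] * 3,
    "Neighborhood": ["Midtown"] * 4 + ["Lincoln Square"] * 3 + ["Harlem"]
                    + ["Melrose"] * 2 + ["Fordham"],
})

counts = df.groupby("Borough")["Neighborhood"].value_counts()

# nlargest on the whole Series ranks every borough together,
# so busy boroughs can crowd out the others entirely.
overall_top2 = counts.nlargest(2)

# Regrouping by the Borough level and taking head(n) keeps the first
# n rows of each borough, which are already sorted by count.
per_borough_top2 = counts.groupby(level=0).head(2)

print(overall_top2)
print(per_borough_top2)
```

With this data, overall_top2 contains only Manhattan rows, while per_borough_top2 keeps two rows for each borough.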
Try this one (just change the number in head() to your choice):
# top 3 : total counts of 'Neighborhood' in each Borough
Z = (df.groupby('Borough')['Neighborhood'].value_counts()
       .groupby(level=0).head(3)
       .sort_values(ascending=False)
       .to_frame('counts').reset_index())
Z
You can also try the code below to get only the top 10 values of the value counts
‘country_code’ and ‘raised_amount_usd’ are column names.
[0:10] slices positions 0 through 9 (the end index is excluded). You can choose whatever slice you need.
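The code this answer refers to is not shown above. As a rough sketch of the slicing idea only, assuming a DataFrame with country_code and raised_amount_usd columns and assuming the amounts are summed per country (the aggregation and the sample values are guesses, not the answerer's code):

```python
import pandas as pd

# Hypothetical data; only the column names come from the answer.
df = pd.DataFrame({
    "country_code": ["USA", "USA", "GBR", "IND", "IND", "CHN"],
    "raised_amount_usd": [5e6, 3e6, 2e6, 1e6, 4e6, 0.5e6],
})

# Total raised per country, largest first (assumed aggregation).
totals = (df.groupby("country_code")["raised_amount_usd"]
            .sum()
            .sort_values(ascending=False))

# Positional slice: rows 0 and 1 here; use iloc[0:10] for the top 10.
top = totals.iloc[0:2]
print(top)
```

Using iloc makes it explicit that the slice is by position rather than by label.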