[Solved] KeyError: False in pandas dataframe

import pandas as pd

businesses = pd.read_json(businesses_filepath, lines=True, encoding='utf_8')
restaurantes = businesses['Restaurants' in businesses['categories']]

I would like to remove the rows that do not have Restaurants in the categories column, which contains lists. However, this raises the error ‘KeyError: False’, and I would like to understand why and how to fix it.

Solution #1:

The expression 'Restaurants' in businesses['categories'] returns the boolean value False, because the in operator on a Series checks the index labels rather than the values. That False is then passed to the bracket indexing operator of the DataFrame businesses, which has no column called False, so a KeyError is raised.
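
The behaviour described above can be reproduced on a small frame. This is only a sketch: the three-row DataFrame is made-up data standing in for the JSON file from the question, and the variable names label_check, index_check and error are illustrative.

```python
import pandas as pd

# Hypothetical stand-in for the data loaded from businesses_filepath
businesses = pd.DataFrame({
    "name": ["A", "B", "C"],
    "categories": [["Restaurants", "Pizza"], ["Bars"], ["Restaurants"]],
})

# `in` on a Series checks the index labels (0, 1, 2), not the stored values
label_check = "Restaurants" in businesses["categories"]
index_check = 0 in businesses["categories"]  # 0 is an index label

# The original line is therefore equivalent to businesses[False],
# which pandas treats as a lookup for a column named False
try:
    businesses[label_check]
    error = None
except KeyError as exc:
    error = exc
```

Here label_check is False even though two rows list 'Restaurants', index_check is True, and error holds the KeyError from the question.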

What you are looking for is called boolean indexing, which works like this:

businesses[businesses['categories'] == 'Restaurants']
Respondent: Ted Petrou
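
Since the question says the categories column holds lists, an equality test against a string would not match any row. One possible way to build the boolean mask for that case is a per-row membership test. This is a minimal sketch on made-up data; the helper has_category and the sample rows are assumptions, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample; in the question this comes from pd.read_json(...)
businesses = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "categories": [["Restaurants", "Pizza"], ["Bars"], None, ["Restaurants"]],
})

def has_category(cats, wanted="Restaurants"):
    """Return True when `cats` is a list or tuple containing `wanted`."""
    return isinstance(cats, (list, tuple)) and wanted in cats

# One True/False per row, aligned with the DataFrame's index
mask = businesses["categories"].apply(has_category)

# Boolean indexing keeps only the rows where the mask is True
restaurantes = businesses[mask]
```

The isinstance check is there so that rows with a missing categories value are simply dropped instead of raising a TypeError.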
Solution #2:

If your data contains spelling variations or alternative restaurant-related terms, the following may help. Put your restaurant-related terms in restaurant_lst. The lambda function returns True if any term in restaurant_lst appears, as a case-insensitive substring, in any of the categories of a row. The .loc indexer then keeps only the rows for which the lambda returned True.

restaurant_lst = ['Restaurant','restaurantes','diner','bistro']
# True for a row if any term occurs inside any of that row's categories (assumes every row holds a list)
is_restaurant = businesses['categories'].apply(lambda cats: any(term.lower() in cat.lower() for cat in cats for term in restaurant_lst))
restaurant = businesses.loc[is_restaurant]
Respondent: Joe
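
A variation on the same idea, without a row-wise lambda, is to flatten the lists with Series.explode and match the terms with Series.str.contains. This is a sketch under assumptions: the sample rows and the terms list are made up, and it relies on the DataFrame index being unique so the per-category results can be grouped back to their rows.

```python
import pandas as pd

# Hypothetical sample data standing in for the JSON file
businesses = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "categories": [["Restaurants", "Pizza"], ["Bars"], ["American Diner"], None],
})

# Illustrative search terms, joined into one regular expression
terms = ["restaurant", "diner", "bistro"]
pattern = "|".join(terms)

# explode() yields one row per category and repeats the original index label
exploded = businesses["categories"].explode()

# True where a single category contains any term; missing values count as False
hits = exploded.str.contains(pattern, case=False, na=False)

# Collapse back to one value per original row: True if any category matched
mask = hits.groupby(level=0).any()

restaurant = businesses.loc[mask]
```

Because the terms are combined into a regular expression, any term containing regex metacharacters would need escaping, for example with re.escape.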
Solution #3:

I think what you meant was:

businesses = businesses.loc[businesses['categories'] == 'Restaurants']

That will keep only the rows whose categories value is equal to 'Restaurants'.

Respondent: Rayhane Mama
Solution #4:

None of the answers here actually worked for me,

businesses[businesses['categories'] == 'Restaurants']

obviously won’t work since the value in ‘categories’ is not a string, it’s a list, meaning the comparison will always fail.

What does work, however, is converting the lists in the column into tuples:

businesses['categories'] = businesses['categories'].apply(tuple)

That allows you to use the standard .loc comparison:

businesses.loc[businesses['categories'] == ('Restaurants',)]
Respondent: RedAero
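
One thing to keep in mind with the tuple approach is that equality against ('Restaurants',) is an exact match, so a row that lists 'Restaurants' alongside other categories would not be selected. The sketch below illustrates the difference with plain Python tuple comparisons applied per row; the two-row sample and the names as_tuples, exact and member are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample: one single-category row and one multi-category row
businesses = pd.DataFrame({
    "name": ["A", "B"],
    "categories": [["Restaurants"], ["Restaurants", "Pizza"]],
})
as_tuples = businesses["categories"].apply(tuple)

# Exact equality: only a row whose whole tuple is ('Restaurants',) matches
exact = as_tuples.apply(lambda t: t == ("Restaurants",))

# Membership: any row that lists 'Restaurants' among its categories matches
member = as_tuples.apply(lambda t: "Restaurants" in t)
```

Here exact selects only row A, while member selects both rows.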
The answers/resolutions are collected from Stack Overflow and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
