Determine whether a key is present in a dictionary [duplicate]


Possible Duplicate:
‘has_key()’ or ‘in’?

I have a Python dictionary like this:

mydict = {'name': 'abc', 'city': 'xyz', 'country': 'def'}

I want to check whether a key is in the dictionary.
Which of the following two forms is preferable, and why?

1> if mydict.has_key('name'):
2> if 'name' in mydict:

if 'name' in mydict:

is the preferred, pythonic version. Use of has_key() is discouraged, and this method has been removed in Python 3.
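As a minimal sketch of the point above (assuming Python 3 and reusing the question's dictionary with its last entry written as a key/value pair), the membership test and the absence of has_key() look like this:

```python
mydict = {'name': 'abc', 'city': 'xyz', 'country': 'def'}

# Membership test against the dictionary's keys.
has_name = 'name' in mydict

# The same test for a key that is not present.
has_zip = 'zip' in mydict

# `not in` is the negated form of the same operator.
missing_zip = 'zip' not in mydict

# On Python 3, dict objects no longer expose a has_key attribute.
dict_has_has_key = hasattr(mydict, 'has_key')

print(has_name, has_zip, missing_zip, dict_has_has_key)
```

Note that `in` tests keys only; to test values you would write `'abc' in mydict.values()`.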

In the same vein as martineau’s response, the best solution is often not to check. For example, the code

if x in d:
    foo = d[x]
else:
    foo = bar

is normally written

foo = d.get(x, bar)

which is shorter and more directly speaks to what you mean.
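A small sketch of how d.get() behaves, using made-up keys and a made-up default (the names `d`, `bar` and the values are placeholders, not from the original code):

```python
d = {'a': 1}
bar = 0  # hypothetical default value

# Key present: the stored value is returned and the default is ignored.
foo_present = d.get('a', bar)

# Key absent: the default is returned instead of raising KeyError.
foo_missing = d.get('z', bar)

# With no default given, a missing key yields None.
foo_none = d.get('z')

# get() never inserts the missing key into the dictionary.
still_missing = 'z' not in d

print(foo_present, foo_missing, foo_none, still_missing)
```

One caveat: if None (or the chosen default) is also a legitimate stored value, get() alone cannot distinguish "missing" from "present with that value", and an explicit `in` test is still needed.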

Another common case is something like

if x not in d:
    d[x] = []

d[x].append(foo)

which can be rewritten

d.setdefault(x, []).append(foo)

or rewritten even better by using a collections.defaultdict(list) for d and writing

d[x].append(foo)
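To make the comparison concrete, here is a short sketch that groups some made-up (key, item) pairs both ways; the data and variable names are illustrative only:

```python
from collections import defaultdict

# Hypothetical input data for the example.
pairs = [('fruit', 'apple'), ('veg', 'kale'), ('fruit', 'pear')]

# Variant 1: plain dict with setdefault().
grouped_setdefault = {}
for key, item in pairs:
    # Inserts an empty list the first time a key is seen, then appends.
    grouped_setdefault.setdefault(key, []).append(item)

# Variant 2: defaultdict(list) creates the empty list on first access.
grouped_default = defaultdict(list)
for key, item in pairs:
    grouped_default[key].append(item)

print(grouped_setdefault)
print(dict(grouped_default))
```

One thing to keep in mind with defaultdict: merely reading a missing key with `d[x]` inserts it, so use `x in d` or `d.get(x)` when a lookup must not modify the dictionary.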

In terms of bytecode, in saves a LOAD_ATTR and replaces a CALL_FUNCTION with a COMPARE_OP.

>>> dis.dis(indict)
  2           0 LOAD_GLOBAL              0 (name)
              3 LOAD_GLOBAL              1 (d)
              6 COMPARE_OP               6 (in)
              9 POP_TOP             


>>> dis.dis(haskey)
  2           0 LOAD_GLOBAL              0 (d)
              3 LOAD_ATTR                1 (haskey)
              6 LOAD_GLOBAL              2 (name)
              9 CALL_FUNCTION            1
             12 POP_TOP             
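The listings above come from an older interpreter. A rough way to repeat the comparison on whatever Python you have is sketched below. The function names mirror the ones in the listings; their bodies are my assumption, since the original definitions are not shown. The functions are only compiled and disassembled, never called, so it does not matter that `d`, `name` and `has_key` do not exist at runtime. Opcode names and offsets differ between Python versions, so the output will not match the listings exactly.

```python
import dis

def indict():
    # Operator form: compiles to a comparison/containment opcode.
    return name in d

def haskey():
    # Method form: compiles to an attribute lookup followed by a call.
    return d.has_key(name)

def opnames(func):
    """Return the list of opcode names in a function's bytecode."""
    return [ins.opname for ins in dis.get_instructions(func)]

in_ops = opnames(indict)
method_ops = opnames(haskey)

print(in_ops)
print(method_ops)
```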

My feelings are that in is much more readable and is to be preferred in every case that I can think of.

In terms of performance, the timings reflect the difference in bytecode:

$ python -mtimeit -s'd = dict((i, i) for i in range(10000))' "'foo' in d"
 10000000 loops, best of 3: 0.11 usec per loop

$ python -mtimeit -s'd = dict((i, i) for i in range(10000))' "d.has_key('foo')"
  1000000 loops, best of 3: 0.205 usec per loop

in is almost twice as fast.
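Since has_key() is gone in Python 3, the measurement above cannot be rerun there as written. A hedged substitute is to time the `in` operator against an explicit method call on the same dictionary; using `d.__contains__` as the stand-in is my assumption, not part of the original benchmark, and absolute numbers will vary by machine and interpreter version.

```python
import timeit

# Same dictionary as in the command-line timings above.
d = dict((i, i) for i in range(10000))

# Time the operator form.
t_in = timeit.timeit("'foo' in d", globals={'d': d}, number=100000)

# Time an explicit method call as a stand-in for has_key().
t_method = timeit.timeit("d.__contains__('foo')", globals={'d': d}, number=100000)

print("in operator:  %.4f s" % t_in)
print("method call:  %.4f s" % t_method)
```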

My answer is “neither one”.

I believe the most “Pythonic” way to do things is to NOT check beforehand if the key is in a dictionary and instead just write code that assumes it’s there and catch any KeyErrors that get raised because it wasn’t.

This is usually done by enclosing the code in a try...except clause. It is a well-known idiom, usually expressed as “It’s easier to ask forgiveness than permission” or with the acronym EAFP, which basically means it is better to try something and catch the errors than to make sure everything’s OK before doing anything. Why validate what doesn’t need to be validated when you can handle exceptions gracefully instead of trying to avoid them? The result is often more readable, and the code tends to be faster when the key is usually present (or whatever other precondition usually holds), so that the exception is rarely raised.

Of course, this isn’t appropriate in all situations and not everyone agrees with the philosophy, so you’ll need to decide for yourself on a case-by-case basis. Not surprisingly the opposite of this is called LBYL for “Look Before You Leap”.

As a trivial example consider:

if 'name' in dct:
    value = dct['name'] * 3
else:
    logerror('"%s" not found in dictionary, using default' % 'name')
    value = 42

vs

try:
    value = dct['name'] * 3
except KeyError:
    logerror('"%s" not found in dictionary, using default' % 'name')
    value = 42

Although in this case it’s almost exactly the same amount of code, the second version doesn’t spend time checking first and is probably slightly faster because of it (a try…except block isn’t totally free though, so it probably doesn’t make much difference here).
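The two variants above can be turned into a runnable sketch as follows. The `logerror` function here is a hypothetical stand-in that just collects messages, and the wrapper names `lbyl` and `eafp` are mine; the bodies follow the snippets above.

```python
errors = []

def logerror(message):
    # Hypothetical stand-in for the logerror() used above: record the message.
    errors.append(message)

def lbyl(dct):
    """Look Before You Leap: test for the key, then use it."""
    if 'name' in dct:
        return dct['name'] * 3
    logerror('"%s" not found in dictionary, using default' % 'name')
    return 42

def eafp(dct):
    """Easier to Ask Forgiveness than Permission: use the key, handle failure."""
    try:
        return dct['name'] * 3
    except KeyError:
        logerror('"%s" not found in dictionary, using default' % 'name')
        return 42

# Both variants return the same values for a present and a missing key.
results = [lbyl({'name': 'ab'}), eafp({'name': 'ab'}), lbyl({}), eafp({})]
print(results)
print(errors)
```

When using the EAFP form, it helps to keep the try body as small as possible: any KeyError raised inside it, including one from an unrelated lookup, would be caught by the same handler.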

Generally speaking, testing in advance can often be much more involved, and the savings from not doing it can be significant. That said, if you do check, if 'name' in dict: is the better form, for the reasons stated in the other answers.

If you’re interested in the topic, this message titled “EAFP vs LBYL (was Re: A little disappointed so far)” from the Python mailing list archive probably explains the difference between the two approaches better than I have here. There’s also a good discussion of the two approaches in the book Python in a Nutshell, 2nd Ed. by Alex Martelli, in the section of chapter 6 (Exceptions) titled Error-Checking Strategies. (I see there’s now a newer 3rd edition, published in 2017, which covers both Python 2.7 and 3.x.)


The answers/resolutions are collected from Stack Overflow and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.