What is the difference between dict and collections.defaultdict?


I was checking out Peter Norvig’s code on how to write simple spell checkers. At the beginning, he uses this code to insert words into a dictionary.

import collections

def train(features):
    model = collections.defaultdict(lambda: 1)  # a missing key starts at 1
    for f in features:
        model[f] += 1
    return model

What is the difference between a Python dict and the one that was used here? In addition, what is the lambda for? I checked the API documentation here and it says that defaultdict is actually derived from dict but how does one decide which one to use?

The difference is that a defaultdict will “default” a value if that key has not been set yet. If you didn’t use a defaultdict you’d have to check to see if that key exists, and if it doesn’t, set it to what you want.

The lambda is defining a factory for the default value. That function gets called whenever it needs a default value. You could hypothetically have a more complicated default function.
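To make the difference concrete, here is a minimal sketch of the same counting loop written both ways. The function names `train_with_dict` and `train_with_defaultdict` and the sample word list are made up for illustration; the starting value of 1 mirrors the `lambda: 1` in the question.

```python
import collections

def train_with_dict(features):
    model = {}
    for f in features:
        if f not in model:
            model[f] = 1   # same starting value as lambda: 1
        model[f] += 1
    return model

def train_with_defaultdict(features):
    model = collections.defaultdict(lambda: 1)
    for f in features:
        model[f] += 1      # a missing key is created with value 1 first
    return model

words = ["the", "cat", "the"]
plain = train_with_dict(words)
auto = train_with_defaultdict(words)
print(plain)           # {'the': 3, 'cat': 2}
print(auto == plain)   # True
```

Both versions produce the same counts; the defaultdict version just moves the "key not there yet" check into the dictionary itself.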

Help on class defaultdict in module collections:

class defaultdict(__builtin__.dict)
 |  defaultdict(default_factory) --> dict with default factory
 |  
 |  The default factory is called without arguments to produce
 |  a new value when a key is not present, in __getitem__ only.
 |  A defaultdict compares equal to a dict with the same items.
 |  

(from help(type(collections.defaultdict())))
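The "in __getitem__ only" part of that help text is easy to overlook. A small sketch of what it means in practice (the key name "x" is arbitrary):

```python
from collections import defaultdict

d = defaultdict(int)

print(d.get("x"))   # None: get() does not call the factory
print("x" in d)     # False: membership tests do not insert anything
print(len(d))       # 0

print(d["x"])       # 0: __getitem__ calls int() and stores the result
print("x" in d)     # True
print(dict(d))      # {'x': 0}
```

So only square-bracket access creates the missing key; `get()` and `in` behave exactly as they do on a normal dict.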

{}.setdefault is similar in nature, but takes in a value instead of a factory function. It’s used to set the value if it doesn’t already exist… which is a bit different, though.
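One practical consequence of "value instead of a factory function": the value passed to `setdefault` is built on every call, even when the key already exists, while a defaultdict only calls its factory for missing keys. A rough sketch, using a hypothetical `make_list` helper that records each time it runs:

```python
from collections import defaultdict

calls = []

def make_list():
    calls.append("called")   # record every time a default is built
    return []

plain = {"a": [1]}
plain.setdefault("a", make_list()).append(2)   # make_list() runs although "a" exists
print(len(calls))   # 1
print(plain)        # {'a': [1, 2]}

calls.clear()
auto = defaultdict(make_list, {"a": [1]})
auto["a"].append(2)   # key exists, factory is not called
auto["b"].append(3)   # key missing, factory is called once
print(len(calls))   # 1
print(dict(auto))   # {'a': [1, 2], 'b': [3]}
```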

Courtesy: https://shirishweb.wordpress.com/2017/05/06/python-defaultdict-versus-dict-get/

Using Normal dict

d={}
d['Apple']=50
d['Orange']=20
print(d['Apple'])
print(d['Grapes'])  # raises KeyError: 'Grapes'

We can avoid this KeyError with a normal dict as well, by supplying a default through get(). Let's see how:

d={}
d['Apple']=50
d['Orange']=20
print(d['Apple'])
print(d.get('Apple'))
print(d.get('Grapes',0)) # DEFAULTING

Using defaultdict

from collections import defaultdict
d = defaultdict(int)  # the argument is a factory; int() returns 0, so missing keys default to 0
d['Apple']=50
d['Orange']=20
print(d['Apple'])
print(d['Grapes'])  # no KeyError: prints 0 and stores 'Grapes': 0

Using a user-defined function to supply the default value

from collections import defaultdict
def mydefault():
    return 0

d = defaultdict(mydefault)
d['Apple']=50
d['Orange']=20
print(d['Apple'])
print(d['Grapes'])

Summary

  1. With a normal dict, the default has to be supplied case by case at each lookup; with defaultdict, it is defined once for the whole dictionary.

  2. According to the benchmark linked below, defaulting with defaultdict is about twice as fast as defaulting with a normal dict. See the link for details of the performance test:
    https://shirishweb.wordpress.com/2017/05/06/python-defaultdict-versus-dict-get/

Use a defaultdict if you have some meaningful default value for missing keys and don’t want to deal with them explicitly.

The defaultdict constructor takes a function as a parameter and calls that function, with no arguments, to construct the value for each missing key.

lambda: 1

is the same as the parameterless function f that does this

def f():
    return 1

I forget why the API was designed this way instead of taking a value as a parameter. If I had designed the defaultdict interface, it would be slightly more complicated: the function that creates the missing value would take the missing key as a parameter.
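For what it's worth, that key-aware behaviour can be approximated with a dict subclass, because `dict.__getitem__` calls a `__missing__(key)` method when a subclass defines one. The class below is a hypothetical sketch, not part of the standard library; the name `KeyAwareDict` and its `factory` attribute are made up here.

```python
class KeyAwareDict(dict):
    """Hypothetical dict whose default value depends on the missing key."""

    def __init__(self, factory, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.factory = factory   # a callable that takes the missing key

    def __missing__(self, key):
        # Called by dict.__getitem__ when key is not present.
        value = self.factory(key)
        self[key] = value        # store it, as defaultdict does
        return value

lengths = KeyAwareDict(len)
print(lengths["apple"])   # 5
print(lengths)            # {'apple': 5}
```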

Let’s deep dive into Python dictionary and Python defaultdict() class

Python Dictionaries

Dict is one of the data structures available in Python which allows data to be stored in the form of key-value pairs.

Example:

d = {'a': 2, 'b': 5, 'c': 6}

Problem with Dictionary

Dictionaries work well until you hit a missing key. If you look up a key that is not in the dictionary, you get a KeyError. Something like this:

d = {'a': 2, 'b': 5, 'c': 6}
d['z']  # 'z' is not in the dict, so this raises a KeyError

You will see something like this:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
    d['z'] 
KeyError: 'z'

Solution to the above problem

There are several ways to overcome this problem:

Using built-in dict methods

setdefault

If the key is in the dictionary, return its value. If not, insert a key with a value of default and return default. default defaults to None:

>>> d = {'a' :2, 'b': 5, 'c': 6}
>>> d.setdefault('z', 0)
0  # returns 0 
>>> print(d)  # add z to the dictionary
{'a': 2, 'b': 5, 'c': 6, 'z': 0}

get

Return the value for key if the key is in the dictionary, else default. If the default is not given, it defaults to None, so that this method never raises a KeyError:

>>> d = {'a': 2, 'b': 5, 'c': 6}
>>> d.get('z', 0)
0  # returns 0 
>>> print(d)  # Doesn't add z to the dictionary unlike setdefault
{'a': 2, 'b': 5, 'c': 6}

Both of the above methods solve our problem: neither raises a KeyError. Apart from these two methods, Python also has a collections module that can handle this problem. Let's dig into the defaultdict in the collections module:

defaultdict

defaultdict can be found in the collections module of Python. You can import and create one like this:

from collections import defaultdict

d = defaultdict(int)

The defaultdict constructor takes a callable, default_factory, as its argument. It is called with no arguments to produce the default. For example:

  • int: default will be an integer value of 0

  • str: default will be an empty string ""

  • list: default will be an empty list []

Code:

from collections import defaultdict

d = defaultdict(list)
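d['a']  # accessing a missing key returns an empty list and stores it under 'a'
d['b'] = 1 # add a key-value pair to dict
print(d)

The output will be defaultdict(<class 'list'>, {'a': [], 'b': 1}) on Python 3.7+, where dicts preserve insertion order ('a' was inserted first, when it was accessed).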

defaultdict serves a similar purpose to the get() and setdefault() methods, so when should you use each?

When to use get()

If you need to look up a key without risking a KeyError, and the dictionary should not be updated, then dict.get is the right choice. It returns the default value you specify but does not modify the dictionary.

When to use setdefault()

If you need to modify the original dictionary with a default key-value pair – then setdefault is the right choice.

When to use defaultdict

Everything setdefault does can be achieved with defaultdict, but instead of providing the default value on every setdefault call, we provide it once when creating the defaultdict. On the other hand, setdefault lets you choose a different default value for each key. Both have their own advantages depending on the use case.
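A short sketch of that trade-off, grouping some made-up (category, name) pairs both ways and then using setdefault with per-key defaults:

```python
from collections import defaultdict

pairs = [("fruit", "apple"), ("veg", "carrot"), ("fruit", "pear")]

# setdefault: the default ([]) is repeated at every call site
by_setdefault = {}
for kind, name in pairs:
    by_setdefault.setdefault(kind, []).append(name)

# defaultdict: the default (list) is stated once, up front
by_defaultdict = defaultdict(list)
for kind, name in pairs:
    by_defaultdict[kind].append(name)

print(by_setdefault)                    # {'fruit': ['apple', 'pear'], 'veg': ['carrot']}
print(by_defaultdict == by_setdefault)  # True

# setdefault can use a different default for each key
settings = {}
settings.setdefault("retries", 3)
settings.setdefault("name", "unknown")
print(settings)                         # {'retries': 3, 'name': 'unknown'}
```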

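When it comes to efficiency, the benchmark cited for this answer reported:

defaultdict > setdefault() or get()

with defaultdict about 2 times faster than get(). Actual timings depend on the Python version and the workload.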

You can check the results here.
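If you would rather measure this on your own machine, a rough timeit sketch along these lines should do. The word list, the function names, and the repeat count are arbitrary choices, not taken from the linked benchmark, and no particular ratio is guaranteed.

```python
import timeit
from collections import defaultdict

words = ["apple", "orange", "grapes"] * 1000

def count_with_get():
    counts = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1
    return counts

def count_with_setdefault():
    counts = {}
    for w in words:
        counts.setdefault(w, 0)
        counts[w] += 1
    return counts

def count_with_defaultdict():
    counts = defaultdict(int)
    for w in words:
        counts[w] += 1
    return counts

# Time each approach on the same input; lower is faster.
for fn in (count_with_get, count_with_setdefault, count_with_defaultdict):
    seconds = timeit.timeit(fn, number=200)
    print(f"{fn.__name__}: {seconds:.4f} s")
```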


The answers/resolutions are collected from Stack Overflow and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.