# Merge and sum of two dictionaries


I have the dictionary below, and I want to add it to another dictionary whose keys are not necessarily distinct, merging the results. Is there a built-in function for this, or will I need to make my own?

```python
{
    '6d6e7bf221ae24e07ab90bba4452267b05db7824cd3fd1ea94b2c9a8': 6,
    '7c4a462a6ed4a3070b6d78d97c90ac230330603d24a58cafa79caf42': 7,
    '9c37bdc9f4750dd7ee2b558d6c06400c921f4d74aabd02ed5b4ddb38': 9,
    '2ca9e1f9cbcd76a5ce1772f9b59995fd32cbcffa8a3b01b5c9c8afc2': 11
}
```

The number of elements in the dictionary is also unknown.

When the merge encounters two identical keys, their values should be summed instead of overwritten.

You didn’t say how exactly you want to merge, so take your pick:

```python
x = {'both1': 1, 'both2': 2, 'only_x': 100}
y = {'both1': 10, 'both2': 20, 'only_y': 200}

print({k: x.get(k, 0) + y.get(k, 0) for k in set(x)})           # keys of x
print({k: x.get(k, 0) + y.get(k, 0) for k in set(x) & set(y)})  # intersection
print({k: x.get(k, 0) + y.get(k, 0) for k in set(x) | set(y)})  # union
```

Results:

```
{'both2': 22, 'only_x': 100, 'both1': 11}
{'both2': 22, 'both1': 11}
{'only_y': 200, 'both2': 22, 'both1': 11, 'only_x': 100}
```

You can perform `+`, `-`, `&` (intersection), and `|` (union) with `collections.Counter()`.
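For instance, here is a quick sketch of how those other operators behave: `-` drops non-positive results, `&` takes the minimum of each count, and `|` takes the maximum.

```python
from collections import Counter

c1 = Counter({'a': 3, 'b': 1})
c2 = Counter({'a': 1, 'b': 2})

print(dict(c1 - c2))  # {'a': 2} -- 'b' would be -1, so it is dropped
print(dict(c1 & c2))  # {'a': 1, 'b': 1} -- minimum of each count
print(dict(c1 | c2))  # {'a': 3, 'b': 2} -- maximum of each count
```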

We can do the following (Note: only positive count values will remain in the dictionary):

```python
from collections import Counter

x = {'both1': 1, 'both2': 2, 'only_x': 100}
y = {'both1': 10, 'both2': 20, 'only_y': 200}

z = dict(Counter(x) + Counter(y))

print(z)
#: {'both2': 22, 'only_x': 100, 'both1': 11, 'only_y': 200}
```

To address adding values where the result may be zero or negative, use `Counter.update()` for addition, and `Counter.subtract()` for subtraction:

```python
x = {'both1': 0, 'both2': 2, 'only_x': 100}
y = {'both1': 0, 'both2': -20, 'only_y': 200}

xx = Counter(x)
yy = Counter(y)
xx.update(yy)

dict(xx)
#: {'both2': -18, 'only_x': 100, 'both1': 0, 'only_y': 200}
```
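And a quick sketch of `Counter.subtract()`, which likewise keeps zero and negative results instead of dropping them:

```python
from collections import Counter

x = {'both1': 0, 'both2': 2, 'only_x': 100}
y = {'both1': 0, 'both2': -20, 'only_y': 200}

xx = Counter(x)
xx.subtract(Counter(y))  # in-place subtraction that keeps non-positive counts

print(dict(xx))
#: {'both1': 0, 'both2': 22, 'only_x': 100, 'only_y': -200}
```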

Additional notes based on the answers of georg, NPE, Scott and Havok.

I was trying to perform this action on collections of 2 or more dictionaries and was interested in seeing the time it took for each. Because I wanted to do this on any number of dictionaries, I had to change some of the answers a bit. If anyone has better suggestions for them, feel free to edit.

Here’s my test method. I’ve updated it recently to include tests with MUCH larger dictionaries, and again to include Havok’s and Scott’s newer methods:

Firstly I used the following data:

```python
import random

x = {'xy1': 1, 'xy2': 2, 'xyz': 3, 'only_x': 100}
y = {'xy1': 10, 'xy2': 20, 'xyz': 30, 'only_y': 200}
z = {'xyz': 300, 'only_z': 300}

small_tests = [x, y, z]

# 200,000 random 8-letter keys
keys = [''.join(random.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(8))
        for _ in range(200000)]

a, b, c = {}, {}, {}

# 50/50 chance of a value being assigned to each dictionary,
# so some keys will be missed, but meh
for key in keys:
    if random.getrandbits(1):
        a[key] = random.randint(0, 1000)
    if random.getrandbits(1):
        b[key] = random.randint(0, 1000)
    if random.getrandbits(1):
        c[key] = random.randint(0, 1000)

large_tests = [a, b, c]

print("a:", len(a), "b:", len(b), "c:", len(c))
#: a: 100069 b: 100385 c: 99989
```

Now each of the methods:

```python
from collections import defaultdict, Counter
from functools import reduce

def georg_method(tests):
    return {k: sum(t.get(k, 0) for t in tests)
            for k in set.union(*[set(t) for t in tests])}

def georg_method_nosum(tests):
    # If you know you will have exactly 3 dicts
    return {k: tests[0].get(k, 0) + tests[1].get(k, 0) + tests[2].get(k, 0)
            for k in set.union(*[set(t) for t in tests])}

def npe_method(tests):
    ret = defaultdict(int)
    for d in tests:
        for k, v in d.items():
            ret[k] += v
    return dict(ret)

# Note: There is a bug with Scott's method. See below for details.
# Scott included a similar version using counters that is fixed.
# See scott_update_method below.
def scott_method(tests):
    return dict(sum((Counter(t) for t in tests), Counter()))

def scott_method_nosum(tests):
    # If you know you will have exactly 3 dicts
    return dict(Counter(tests[0]) + Counter(tests[1]) + Counter(tests[2]))

def scott_update_method(tests):
    ret = Counter()
    for test in tests:
        ret.update(test)
    return dict(ret)

def scott_update_method_static(tests):
    # If you know you will have exactly 3 dicts
    xx = Counter(tests[0])
    yy = Counter(tests[1])
    zz = Counter(tests[2])
    xx.update(yy)
    xx.update(zz)
    return dict(xx)

def havok_method(tests):
    def reducer(accumulator, element):
        for key, value in element.items():
            accumulator[key] = accumulator.get(key, 0) + value
        return accumulator
    return reduce(reducer, tests, {})

methods = {
    "georg_method": georg_method, "georg_method_nosum": georg_method_nosum,
    "npe_method": npe_method,
    "scott_method": scott_method, "scott_method_nosum": scott_method_nosum,
    "scott_update_method": scott_update_method,
    "scott_update_method_static": scott_update_method_static,
    "havok_method": havok_method,
}
```

I also wrote a quick function to find whatever differences there were between the results. Unfortunately, that's when I found the problem with Scott's method: if the values for a key sum to 0, that key won't be included in the result at all, because of how `Counter()` behaves when adding.
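A minimal reproduction of that quirk: adding `Counter`s silently drops keys whose values sum to zero (or below), while `Counter.update()` keeps them.

```python
from collections import Counter

x = {'gone': 1, 'kept': 5}
y = {'gone': -1, 'kept': 5}

# Counter addition keeps only positive counts
print(dict(Counter(x) + Counter(y)))  # {'kept': 10} -- 'gone' summed to 0 and vanished

# Counter.update() keeps every key
xx = Counter(x)
xx.update(y)
print(dict(xx))  # {'gone': 0, 'kept': 10}
```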

Test setup:

• MacBook Pro (15-inch, Late 2016), 2.9 GHz Intel Core i7, 16 GB 2133 MHz LPDDR3 RAM, running macOS Mojave Version 10.14.5
• Python 3.6.5 via IPython 6.1.0

Finally, the results:

## Results: Small Tests

```python
for name, method in methods.items():
    print("Method:", name)
    %timeit -n10000 method(small_tests)
#: Method: georg_method
#: 7.81 µs ± 321 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
#: Method: georg_method_nosum
#: 4.6 µs ± 48.8 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
#: Method: npe_method
#: 3.2 µs ± 24.7 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
#: Method: scott_method
#: 24.9 µs ± 326 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
#: Method: scott_method_nosum
#: 18.9 µs ± 64.8 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
#: Method: scott_update_method
#: 9.1 µs ± 90.7 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
#: Method: scott_update_method_static
#: 14.4 µs ± 122 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
#: Method: havok_method
#: 3.09 µs ± 47.9 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```

## Results: Large Tests

Naturally, I couldn't run anywhere near as many loops:

```python
for name, method in methods.items():
    print("Method:", name)
    %timeit -n10 method(large_tests)
#: Method: georg_method
#: 347 ms ± 20 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
#: Method: georg_method_nosum
#: 280 ms ± 4.97 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
#: Method: npe_method
#: 119 ms ± 11 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
#: Method: scott_method
#: 324 ms ± 16.8 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
#: Method: scott_method_nosum
#: 289 ms ± 14.3 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
#: Method: scott_update_method
#: 123 ms ± 1.94 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
#: Method: scott_update_method_static
#: 136 ms ± 3.19 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
#: Method: havok_method
#: 103 ms ± 1.31 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

## Conclusion

| Algorithm                 | By    | small_tests (best time per loop) | large_tests (best time per loop) |
|---------------------------|-------|---------------------------------:|---------------------------------:|
| functools reduce          | Havok |                           3.1 µs |                       103,000 µs |
| defaultdict sum           | NPE   |                           3.2 µs |                       119,000 µs |
| Counter().update loop     | Scott |                           9.1 µs |                       123,000 µs |
| Counter().update static   | Scott |                          14.4 µs |                       136,000 µs |
| set unions without sum()  | georg |                           4.6 µs |                       280,000 µs |
| set unions with sum()     | georg |                           7.8 µs |                       347,000 µs |
| Counter() without sum()   | Scott |                          18.9 µs |                       289,000 µs |
| Counter() with sum()      | Scott |                          24.9 µs |                       324,000 µs |

Important. YMMV.

You could use `defaultdict` for this:

```python
from collections import defaultdict

def dsum(*dicts):
    ret = defaultdict(int)
    for d in dicts:
        for k, v in d.items():
            ret[k] += v
    return dict(ret)

x = {'both1': 1, 'both2': 2, 'only_x': 100}
y = {'both1': 10, 'both2': 20, 'only_y': 200}

print(dsum(x, y))
```

This produces

```
{'both1': 11, 'both2': 22, 'only_x': 100, 'only_y': 200}
```

Another option uses a reduce function. This allows you to sum-merge an arbitrary collection of dictionaries:

```python
from functools import reduce

collection = [
    {'a': 1, 'b': 1},
    {'a': 2, 'b': 2},
    {'a': 3, 'b': 3},
    {'a': 4, 'b': 4, 'c': 1},
    {'a': 5, 'b': 5, 'c': 1},
    {'a': 6, 'b': 6, 'c': 1},
    {'a': 7, 'b': 7},
    {'a': 8, 'b': 8},
    {'a': 9, 'b': 9},
]

def reducer(accumulator, element):
    for key, value in element.items():
        accumulator[key] = accumulator.get(key, 0) + value
    return accumulator

total = reduce(reducer, collection, {})

assert total['a'] == sum(d.get('a', 0) for d in collection)
assert total['b'] == sum(d.get('b', 0) for d in collection)
assert total['c'] == sum(d.get('c', 0) for d in collection)

print(total)
```

Execution:

```
{'a': 45, 'b': 45, 'c': 3}
```

• Simple, clear, Pythonic.
• Schema-less, as long as all values are summable.
• O(n) time complexity, with the accumulator as the only extra memory.

```python
from functools import reduce

d1 = {'apples': 2, 'banana': 1}
d2 = {'apples': 3, 'banana': 2}

merged = reduce(
    lambda d, i: (
        d.update(((i[0], d.get(i[0], 0) + i[1]),)) or d
    ),
    d2.items(),  # was d2.iteritems() on Python 2
    d1.copy(),
)
```

There is also a pretty simple replacement for `dict.update()`:

```python
merged = dict(d1, **d2)
```
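Note, though, that unlike the `Counter` approaches above, `dict(d1, **d2)` overwrites shared keys rather than summing them; a quick sketch to make that explicit:

```python
from collections import Counter

d1 = {'apples': 2, 'banana': 1}
d2 = {'apples': 3, 'banana': 2}

print(dict(d1, **d2))                   # {'apples': 3, 'banana': 2} -- d2 wins
print(dict(Counter(d1) + Counter(d2)))  # {'apples': 5, 'banana': 3} -- summed
```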

```python
class dict_merge(dict):
    def __add__(self, other):
        result = dict_merge({})
        for key in self.keys():
            if key in other.keys():
                result[key] = self[key] + other[key]
            else:
                result[key] = self[key]
        for key in other.keys():
            if key not in self.keys():
                result[key] = other[key]
        return result


a = dict_merge({"a": 2, "b": 3, "d": 4})
b = dict_merge({"a": 1, "b": 2})
c = dict_merge({"a": 5, "b": 6, "c": 5})
d = dict_merge({"a": 8, "b": 6, "e": 5})

print(a + b + c + d)

>>> {'a': 16, 'b': 17, 'd': 4, 'c': 5, 'e': 5}
```

That is operator overloading: using `__add__`, we have defined how the `+` operator works for our `dict_merge` class, which inherits from the built-in Python `dict`. You can make it more flexible by defining other operators in the same class in a similar way, e.g. `*` with `__mul__` for multiplication, `/` with `__truediv__` for division, or even `%` with `__mod__` for modulo, replacing the `+` in `self[key] + other[key]` with the corresponding operator, if you ever find yourself needing such merging.
I have only tested it as it is, without other operators, but I don't foresee a problem with them. Just learn by trying.
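As a sketch of that extension (my own variant, not from the original answer): the same pattern for `*`, under the assumption that only keys present in both dicts have a well-defined product.

```python
class dict_mul(dict):
    def __mul__(self, other):
        result = dict_mul({})
        for key in self:
            # only keys present in both dicts are multiplied
            if key in other:
                result[key] = self[key] * other[key]
        return result

print(dict_mul({"a": 2, "b": 3}) * dict_mul({"a": 4, "c": 5}))
#: {'a': 8}
```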

A fairly simple approach:

```python
from collections import Counter
from functools import reduce

data = [
    {'x': 10, 'y': 1, 'z': 100},
    {'x': 20, 'y': 2, 'z': 200},
    {'a': 10, 'z': 300},
]

result = dict(reduce(lambda x, y: Counter(x) + Counter(y), data))
```

TL;DR

This code works for both a `list` of `dict`s and a pandas `Series` (where the dicts are row items). It is super fast.

@Havok's method is by far the best according to my tests, and since some other tests confirm this too, I am not going to put test results here; instead, I share my code building on Havok's method.
The following code works for a list of dictionaries and also for a pandas `Series` where each row has a dictionary.

```python
from functools import reduce

def reducer(accumulator, element):
    """Set-unions two dictionaries' keys, and sums their values if the keys are the same,
    see explanation here https://stackoverflow.com/a/46128481/2234161"""
    for key, value in element.items():
        if accumulator.get(key, 0) != 0 and not accumulator.get(key, 0):
            print("why not", accumulator.get(key, 0))
        elif not value:
            print("why not value", value)
        accumulator[key] = accumulator.get(key, 0) + value
    return accumulator

def sum_dicts(dicts_collection, init_dict=None):
    """
    For a given collection of dictionaries, sums the values of the same keys.
    :param dicts_collection: list of dictionaries; it can be a pandas Series where each row has a dictionary
    :param init_dict: optional initial dictionary that dicts_collection will be added onto, defaults to dict()
    """
    res = None
    if not init_dict:
        init_dict = dict()
    try:
        res = reduce(reducer, dicts_collection, init_dict)
    except Exception as ex:
        print(f"Error while reducing dict: {dicts_collection}", ex)
        raise ex
    return res

result_dict = sum_dicts(list_of_dicts_or_pandasSeries)
```

If you want to create a new `dict` that merges like the `|` (union) operator, i.e. later values overwrite earlier ones, use:

```python
>>> dict({'a': 1, 'c': 2}, **{'c': 1})
{'a': 1, 'c': 1}
```
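Since Python 3.9 (PEP 584), the same overwrite-merge can be written with the dict union operator directly:

```python
d1 = {'a': 1, 'c': 2}
d2 = {'c': 1}

merged = d1 | d2  # later values win; d1 and d2 are untouched
print(merged)
#: {'a': 1, 'c': 1}
```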

Scott's approach using `collections.Counter` is nice, but it has the disadvantage of not being usable with `sum`; also, the need to take care of negative or zero values is a bit counterintuitive to me when you simply want to add values component-wise.

So I think it might be a good idea to write a custom class for this. This was also the idea of John Mutuma. However, I want to add my solution:

I create a class which behaves very much like a `dict`, passing basically all member calls to the underlying `_data` in the `__getattr__` method. The only two things which are different are:

1. it has a `DEFAULT_VALUE` (similar to `collections.defaultdict`) which is used as the value for non-existing keys.
2. it implements an `__add__()` method which (together with the `__radd__()` method) takes care of adding the dicts component-wise.
```python
from typing import Any

class AddableDict:
    DEFAULT_VALUE = 0

    def __init__(self, data: dict) -> None:
        self._data = data

    def __getattr__(self, attr: str) -> Any:
        return getattr(self._data, attr)

    def __getitem__(self, item) -> Any:
        try:
            return self._data[item]
        except KeyError:
            return self.DEFAULT_VALUE

    def __repr__(self) -> str:
        return self._data.__repr__()

    def __add__(self, other) -> "AddableDict":
        return AddableDict({
            key: self[key] + other[key]
            for key in set(self.keys()) | set(other.keys())
        })

    def __radd__(self, other) -> "AddableDict":
        # lets sum() work: it starts from 0, and 0 + obj calls __radd__
        if other == 0:
            return self
        return self.__add__(other)
```

That way we can add two of those objects, and also `sum` iterables of those objects:

```python
>>> alpha = AddableDict({"a": 1})
>>> beta = AddableDict({"a": 10, "b": 5})
>>> alpha + beta
{'a': 11, 'b': 5}

>>> sum([beta]*10)
{'a': 100, 'b': 50}
```

To my eye, this solution has the advantage of providing a simple and understandable interface for the developer to use. Of course, you can also inherit from `dict` instead of using composition.
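As a sketch of that alternative (my own variant, not from the original answer): the same interface via inheritance, where `__missing__` plays the role of `DEFAULT_VALUE`.

```python
class SummableDict(dict):
    def __missing__(self, key):
        # dict subclasses can supply a default for absent keys via __missing__
        return 0

    def __add__(self, other):
        return SummableDict({
            key: self[key] + other[key]
            for key in set(self) | set(other)
        })

    def __radd__(self, other):
        # makes sum() work, since sum() starts from 0
        return self if other == 0 else self.__add__(other)

alpha = SummableDict({"a": 1})
beta = SummableDict({"a": 10, "b": 5})

assert alpha + beta == {"a": 11, "b": 5}
assert sum([beta] * 10) == {"a": 100, "b": 50}
```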

The answers/resolutions are collected from Stack Overflow and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.