Using a global dictionary with threads in Python

Is accessing/changing dictionary values thread-safe?

I have a global dictionary foo and multiple threads with ids id1, id2, …, idn. Is it OK to access and change foo's values without allocating a lock for it, if it is known that each thread will only work with its id-related value (say, the thread with id1 will only work with foo[id1])?

Assuming CPython: Yes and no. It is actually safe to fetch/store values from a shared dictionary in the sense that multiple concurrent read/write requests won’t corrupt the dictionary. This is due to the global interpreter lock (“GIL”) maintained by the implementation. That is:

Thread A running:

a = global_dict["foo"]

Thread B running:

global_dict["bar"] = "hello"

Thread C running:

global_dict["baz"] = "world"

won’t corrupt the dictionary, even if all three access attempts happen at the “same” time. The interpreter will serialize them in some undefined way.
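The scenario from the question, where every thread touches only its own key, can be sketched roughly as follows. The names `worker` and `run`, the thread count, and the iteration count are made up for illustration, and the sketch assumes CPython.

```python
import threading

foo = {}  # shared global dictionary, as in the question


def worker(thread_id, n):
    # This thread reads and writes only foo[thread_id]; no other thread
    # ever touches that key, so no lock is taken here.
    foo[thread_id] = 0
    for _ in range(n):
        foo[thread_id] += 1


def run(num_threads=8, n=10_000):
    foo.clear()
    threads = [
        threading.Thread(target=worker, args=(f"id{i}", n))
        for i in range(1, num_threads + 1)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return dict(foo)


result = run()
print(result)
```

Because each key has exactly one writer, the read-modify-write in `foo[thread_id] += 1` never competes with another thread for the same value.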

However, the result of the following sequence is not deterministic:

Thread A:

if "foo" not in global_dict:
   global_dict["foo"] = 1

Thread B:

global_dict["foo"] = 2

as the test/set in thread A is not atomic (a “time-of-check/time-of-use” race condition). So it is generally best to lock things:

from threading import RLock

global_dict = {}  # the shared dictionary
lock = RLock()

def thread_A():
    with lock:
        if "foo" not in global_dict:
            global_dict["foo"] = 1

def thread_B():
    with lock:
        global_dict["foo"] = 2
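To see what the lock buys, here is a small sketch in which several threads race to initialise the same key. The names `init_once` and `winners` are hypothetical, and a plain `Lock` is used instead of `RLock` on the assumption that the lock is never re-acquired by the thread that already holds it.

```python
import threading

global_dict = {}
lock = threading.Lock()
winners = []  # ids of the threads that performed the initial store


def init_once(thread_id):
    # The membership test and the store happen under one lock, so no
    # other thread can slip in between the check and the set.
    with lock:
        if "foo" not in global_dict:
            global_dict["foo"] = thread_id
            winners.append(thread_id)


threads = [threading.Thread(target=init_once, args=(i,)) for i in range(16)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(winners, global_dict)
```

Which thread wins depends on scheduling, but exactly one of them does, and every later thread sees the key as already present.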

The best, safest, portable way to have each thread work with independent data is:

import threading
tloc = threading.local()

Now each thread works with a totally independent tloc object even though it’s a global name. The thread can get and set attributes on tloc, use tloc.__dict__ if it specifically needs a dictionary, etc.
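A minimal sketch of that isolation, with a made-up `worker` function and a `seen` dictionary that exists only so the outcome can be inspected afterwards:

```python
import threading

tloc = threading.local()
seen = {}  # thread_id -> copy of that thread's view of tloc


def worker(thread_id):
    # Attributes set on tloc are visible only to the thread that set them.
    tloc.value = thread_id * 10
    # tloc.__dict__ is this thread's private attribute dictionary.
    seen[thread_id] = dict(tloc.__dict__)


threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The main thread never set tloc.value, so it has no such attribute.
main_has_value = hasattr(tloc, "value")
print(seen, main_has_value)
```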

Thread-local storage for a thread goes away when the thread ends; to have threads record their final results, have them put those results, before they terminate, into a common instance of queue.Queue (Queue.Queue in Python 2), which is intrinsically thread-safe. Similarly, initial values for the data a thread is to work on could be arguments passed when the thread is started, or be taken from a Queue.
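As a rough illustration of that pattern in Python 3, the sketch below passes each thread its input as start-up arguments and collects the final results through a shared queue. The `worker` function and the input ranges are invented for the example.

```python
import queue
import threading

results = queue.Queue()  # thread-safe channel for final results


def worker(thread_id, numbers):
    total = sum(numbers)             # work on private data passed at start-up
    results.put((thread_id, total))  # hand the final result back before exiting


threads = [
    threading.Thread(target=worker, args=(i, range(i * 10, i * 10 + 10)))
    for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# All producers have finished, so draining until empty is reliable here.
collected = {}
while not results.empty():
    thread_id, total = results.get()
    collected[thread_id] = total

print(collected)
```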

Other half-baked approaches, such as hoping that operations that look atomic are indeed atomic, may happen to work for specific cases in a given version and release of Python, but could easily get broken by upgrades or ports. There’s no real reason to risk such issues when a proper, clean, safe architecture is so easy to arrange, portable, handy, and fast.

Since I needed something similar, I landed here. I summed up your answers in this short snippet:

#!/usr/bin/env python3

import threading

class ThreadSafeDict(dict):
    def __init__(self, *p_arg, **n_arg):
        dict.__init__(self, *p_arg, **n_arg)
        self._lock = threading.Lock()

    def __enter__(self):
        # Entering the with block acquires the lock.
        self._lock.acquire()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Leaving the with block releases it, even after an exception.
        self._lock.release()

if __name__ == '__main__':

    u = ThreadSafeDict()
    with u as m:
        m[1] = 'foo'
    print(u)

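As such, you can use the with construct to hold the lock while fiddling with your dict. Note that only code inside the with block holds the lock; accesses made outside it are not protected.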
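A sketch of how such a class might be used from several threads. The class is repeated in condensed form so the block runs on its own; the `bump` function, the `hits` key, and the counts are made up for illustration.

```python
import threading


class ThreadSafeDict(dict):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._lock = threading.Lock()

    def __enter__(self):
        self._lock.acquire()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self._lock.release()


u = ThreadSafeDict(hits=0)


def bump(n):
    for _ in range(n):
        # Every thread goes through the with block, so the
        # read-modify-write on the shared key cannot interleave.
        with u as m:
            m["hits"] += 1


threads = [threading.Thread(target=bump, args=(5_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(u["hits"])
```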

The GIL takes care of that, if you happen to be using CPython.

global interpreter lock

The lock used by Python threads to assure that only one thread executes in the CPython virtual machine at a time. This simplifies the CPython implementation by assuring that no two processes can access the same memory at the same time. Locking the entire interpreter makes it easier for the interpreter to be multi-threaded, at the expense of much of the parallelism afforded by multi-processor machines. Efforts have been made in the past to create a “free-threaded” interpreter (one which locks shared data at a much finer granularity), but so far none have been successful because performance suffered in the common single-processor case.

See also the related question “Are locks unnecessary in multi-threaded Python code because of the GIL?”.

How does it work?

>>> import dis
>>> demo = {}
>>> def set_dict():
...     demo['name'] = 'Jatin Kumar'
...
>>> dis.dis(set_dict)
  2           0 LOAD_CONST               1 ('Jatin Kumar')
              3 LOAD_GLOBAL              0 (demo)
              6 LOAD_CONST               2 ('name')
              9 STORE_SUBSCR
             10 LOAD_CONST               0 (None)
             13 RETURN_VALUE

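Each of the above instructions is executed while the GIL is held, and the single STORE_SUBSCR instruction adds or updates the key/value pair in the dictionary. So a plain dictionary store is atomic in CPython, and hence thread-safe, at least for keys of built-in types such as str, whose hashing and comparison do not run Python-level code. Compound operations such as demo['n'] += 1 compile to several instructions and are not atomic.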
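The same tool can be used to compare a plain store with an in-place update. The sketch below is only an illustration: the function names are invented, and the exact opcode names and counts vary between CPython versions, so it looks only at the store instruction and the relative length of the two instruction sequences.

```python
import dis

demo = {"count": 0}


def set_item():
    demo["count"] = 1   # a single subscript store


def increment_item():
    demo["count"] += 1  # load the old value, add, then store


def opnames(func):
    # List the bytecode instruction names of a function, in order.
    return [ins.opname for ins in dis.get_instructions(func)]


set_ops = opnames(set_item)
inc_ops = opnames(increment_item)

print(set_ops)
print(inc_ops)
```

Both functions end their update with STORE_SUBSCR, but the in-place version needs extra instructions before it to fetch and add to the old value. Another thread may run between those instructions, which is why an unlocked `+=` on a key shared by several threads can lose updates.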


The answers/resolutions are collected from Stack Overflow and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.