# Python find elements in one list that are not in the other [duplicate]


I need to compare two lists in order to create a new list of specific elements found in one list but not in the other. For example:

``````main_list=[]
list_1=["a", "b", "c", "d", "e"]
list_2=["a", "f", "c", "m"]
``````

I want to loop through list_1 and append to main_list all the elements from list_2 that are not found in list_1.

The result should be:

``````main_list=["f", "m"]
``````

How can I do this in Python?

You can use sets:

``````main_list = list(set(list_2) - set(list_1))
``````

Output:

``````>>> list_1=["a", "b", "c", "d", "e"]
>>> list_2=["a", "f", "c", "m"]
>>> set(list_2) - set(list_1)
set(['m', 'f'])
>>> list(set(list_2) - set(list_1))
['m', 'f']
``````

Per @JonClements’ comment, here is a tidier version:

``````>>> list_1=["a", "b", "c", "d", "e"]
>>> list_2=["a", "f", "c", "m"]
>>> list(set(list_2).difference(list_1))
['m', 'f']
``````
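
One thing to keep in mind with either set-based form: a set has no order and no duplicates, so the resulting list can come back in any order. The following is a small sketch, assuming the items are sortable and that sorted output is acceptable. The extra `"f"` in `list_2` is added purely for illustration.

```python
list_1 = ["a", "b", "c", "d", "e"]
list_2 = ["a", "f", "c", "m", "f"]  # extra "f" added for illustration

# The set difference drops the duplicate "f"; its iteration order is arbitrary.
diff = set(list_2).difference(list_1)

# Sorting gives a reproducible list regardless of set ordering.
main_list = sorted(diff)
print(main_list)  # ['f', 'm']
```

If the original order of `list_2` matters, sorting is not a substitute; a comprehension over `list_2` keeps that order.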

TL;DR:
SOLUTION (1)

``````import numpy as np
main_list = np.setdiff1d(list_2,list_1)
# yields the elements in `list_2` that are NOT in `list_1`
``````

SOLUTION (2) You want a sorted list

``````def setdiff_sorted(array1, array2, assume_unique=False):
    ans = np.setdiff1d(array1, array2, assume_unique=assume_unique).tolist()
    # setdiff1d already sorts its result unless assume_unique=True
    if assume_unique:
        return sorted(ans)
    return ans

main_list = setdiff_sorted(list_2, list_1)
``````

EXPLANATIONS:
(1) You can use NumPy’s `setdiff1d(array1, array2, assume_unique=False)`.

`assume_unique` tells the function whether the input arrays are already free of duplicates.
If `False` (the default), the unique elements are determined first.
If `True`, the function assumes the elements are already unique and skips that step.

The call returns the unique values in `array1` that are not in `array2`.

If you only care about the unique elements (as in Chinny84’s answer), simply use the default, `assume_unique=False`:

``````import numpy as np
list_1 = ["a", "b", "c", "d", "e"]
list_2 = ["a", "f", "c", "m"]
main_list = np.setdiff1d(list_2,list_1)
# yields the elements in `list_2` that are NOT in `list_1`
``````
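
To make the default behaviour described in (1) concrete, here is a rough pure-Python sketch of what the call computes when `assume_unique=False`: the sorted, de-duplicated items of the first sequence that do not appear in the second. The helper name `setdiff_pure` is hypothetical, it returns a list rather than an array, and it is not how NumPy implements the function.

```python
def setdiff_pure(seq1, seq2):
    """Hypothetical helper: sorted unique items of seq1 that are not in seq2."""
    return sorted(set(seq1) - set(seq2))


list_1 = ["a", "b", "c", "d", "e"]
list_2 = ["a", "f", "c", "m", "m"]  # duplicate "m" added for illustration

print(setdiff_pure(list_2, list_1))  # ['f', 'm']
```

The sketch assumes the items are hashable and mutually comparable, which holds for the strings used here.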

(2)
For those who want answers to be sorted, I’ve made a custom function:

``````import numpy as np
def setdiff_sorted(array1, array2, assume_unique=False):
    ans = np.setdiff1d(array1, array2, assume_unique=assume_unique).tolist()
    if assume_unique:
        return sorted(ans)
    return ans
``````

To get the answer, run:

``````main_list = setdiff_sorted(list_2,list_1)
``````

SIDE NOTES:
(a) Solution 2 (custom function `setdiff_sorted`) returns a list (compared to an array in solution 1).

(b) If you aren’t sure whether the elements are unique, just use the default setting of NumPy’s `setdiff1d` in both solutions 1 and 2. What can go wrong otherwise? See note (c).

(c) Things are different if either of the two lists contains duplicates.

Say `list_2` contains a duplicate: `list_2 = ["a", "f", "c", "m", "m"]`. Keep `list_1` as is: `list_1 = ["a", "b", "c", "d", "e"]`.
With the default value of `assume_unique`, both solutions yield `["f", "m"]`. However, if you set `assume_unique=True`, both solutions give `["f", "m", "m"]`, because the caller has asserted that the elements are unique and the function trusts that. Hence, it is better to keep `assume_unique` at its default value. Note that both answers are sorted.

Use a list comprehension like this:

``````main_list = [item for item in list_2 if item not in list_1]
``````

Output:

``````>>> list_1 = ["a", "b", "c", "d", "e"]
>>> list_2 = ["a", "f", "c", "m"]
>>>
>>> main_list = [item for item in list_2 if item not in list_1]
>>> main_list
['f', 'm']
``````

Edit:

As mentioned in the comments, the above is not ideal for large lists, because every `in` test scans `list_1` from the start. In that case, a better option is to convert `list_1` to a `set` first:

``````set_1 = set(list_1)  # this reduces the lookup time from O(n) to O(1)
main_list = [item for item in list_2 if item not in set_1]
``````
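
Unlike the set-difference approaches, this comprehension keeps the order of `list_2` and any repeated items. The sketch below illustrates that, and shows one way to drop repeats while keeping first-seen order if that is what you want. The sample data is reordered and has a repeated `"m"` for illustration, and the `dict.fromkeys` step assumes the items are hashable.

```python
list_1 = ["a", "b", "c", "d", "e"]
list_2 = ["m", "a", "f", "c", "m"]  # reordered, with a repeated "m", for illustration

set_1 = set(list_1)  # build the lookup set once

# Keeps the order of list_2 and every repeated occurrence.
with_duplicates = [item for item in list_2 if item not in set_1]
print(with_duplicates)  # ['m', 'f', 'm']

# dict.fromkeys keeps only the first occurrence of each key, in insertion order.
without_duplicates = list(dict.fromkeys(with_duplicates))
print(without_duplicates)  # ['m', 'f']
```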

Not sure why the above explanations are so complicated when you have native methods available:

``````main_list = list(set(list_2)-set(list_1))
``````

If you want a one-liner solution (ignoring imports) that only requires `O(max(n, m))` work for inputs of length `n` and `m`, not `O(n * m)` work, you can do so with the `itertools` module:

``````from itertools import filterfalse

main_list = list(filterfalse(set(list_1).__contains__, list_2))
``````

This works because `filterfalse` takes its predicate as an argument at construction, so the bound method `set(list_1).__contains__` is created once and reused for every element without being stored in a named variable (`filterfalse` keeps it internally); list comprehensions and generator expressions can do this too, but it’s ugly.†

That gets the same results in a single line as:

``````main_list = [x for x in list_2 if x not in list_1]
``````

with the speed of:

``````set_1 = set(list_1)
main_list = [x for x in list_2 if x not in set_1]
``````

Of course, if the comparisons are intended to be positional, so:

``````list_1 = [1, 2, 3]
list_2 = [2, 3, 4]
``````

should produce:

``````main_list = [2, 3, 4]
``````

(because no value in `list_2` has a match at the same index in `list_1`), you should definitely go with Patrick’s answer. It involves no temporary `list`s or `set`s (even with `set` lookups being roughly `O(1)`, they have a higher “constant” factor per check than simple equality checks) and only `O(min(n, m))` work, less than any other answer. If your problem is position-sensitive, it is also the only correct solution when matching elements appear at mismatched offsets.

†: The way to do the same thing with a list comprehension as a one-liner would be to abuse nested looping to create and cache value(s) in the “outermost” loop, e.g.:

``````main_list = [x for set_1 in (set(list_1),) for x in list_2 if x not in set_1]
``````

which also gives a minor performance benefit on Python 3 (because now `set_1` is locally scoped in the comprehension code, rather than looked up from nested scope for each check; on Python 2 that doesn’t matter, because Python 2 doesn’t use closures for list comprehensions; they operate in the same scope they’re used in).
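
As a quick sanity check of the claims in this answer, the sketch below builds two larger lists (sizes chosen arbitrarily) and confirms that the `filterfalse` form, the plain comprehension, and the nested-loop comprehension from the footnote all return the same list. The function names are made up for this example, and the `timeit` calls are only there so you can compare timings on your own machine; no particular numbers are implied.

```python
from itertools import filterfalse
from timeit import timeit

# Arbitrary test data: list_2 overlaps with the second half of list_1.
list_1 = list(range(0, 1000))
list_2 = list(range(500, 1500))


def with_filterfalse():
    return list(filterfalse(set(list_1).__contains__, list_2))


def with_list_lookup():
    # O(n * m): every membership test scans list_1.
    return [x for x in list_2 if x not in list_1]


def with_cached_set():
    # The footnote's trick: the set is built once in the outermost loop.
    return [x for set_1 in (set(list_1),) for x in list_2 if x not in set_1]


assert with_filterfalse() == with_list_lookup() == with_cached_set()

for func in (with_filterfalse, with_list_lookup, with_cached_set):
    print(func.__name__, timeit(func, number=5))
```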

``````main_list=[]
list_1=["a", "b", "c", "d", "e"]
list_2=["a", "f", "c", "m"]

for i in list_2:
    if i not in list_1:
        main_list.append(i)

print(main_list)
``````

output:

``````['f', 'm']
``````

I tried two methods and found one more useful than the other. Here is my answer:

My input data:

``````crkmod_mpp = ['M13','M18','M19','M24']
testmod_mpp = ['M13','M14','M15','M16','M17','M18','M19','M20','M21','M22','M23','M24']
``````

Method 1: `np.setdiff1d`. I prefer this approach because it returns the result in sorted order, which here matches the order of the input:

``````import numpy as np

test = np.setdiff1d(testmod_mpp, crkmod_mpp).tolist()
print(test)
['M14', 'M15', 'M16', 'M17', 'M20', 'M21', 'M22', 'M23']
``````

Method 2: this gives the same items as Method 1 but does not preserve the order:

``````test = list(set(testmod_mpp).difference(set(crkmod_mpp)))
print(test)
# prints the same eight items as Method 1, in arbitrary order
``````

Method 1 (`np.setdiff1d`) meets my requirements perfectly.
This answer is for information.

I would `zip` the lists together to compare them element by element.

``````main_list = [b for a, b in zip(list_1, list_2) if a != b]
``````
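
Note that `zip` stops at the shorter list, so any trailing items of a longer `list_2` are never compared. If those should count as differences too, one option is `itertools.zip_longest` with a sentinel fill value. That requirement is an assumption about the desired behaviour, not part of the answer above, and the sample data and the `_MISSING` name are made up for this sketch.

```python
from itertools import zip_longest

list_1 = [1, 2, 3]
list_2 = [1, 5, 3, 4, 6]  # longer than list_1, for illustration

# zip stops after three pairs, so 4 and 6 are ignored.
truncated = [b for a, b in zip(list_1, list_2) if a != b]
print(truncated)  # [5]

# A unique sentinel marks missing positions without clashing with real values.
_MISSING = object()
padded = [
    b
    for a, b in zip_longest(list_1, list_2, fillvalue=_MISSING)
    if b is not _MISSING and a != b
]
print(padded)  # [5, 4, 6]
```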

If the number of occurrences should be taken into account, you probably need something like `collections.Counter`:

``````list_1=["a", "b", "c", "d", "e"]
list_2=["a", "f", "c", "m"]
from collections import Counter
cnt1 = Counter(list_1)
cnt2 = Counter(list_2)
final = [key for key, counts in cnt2.items() if cnt1[key] != counts]

>>> final
['f', 'm']
``````

As promised, this also treats a differing number of occurrences as a “difference”:

``````list_1=["a", "b", "c", "d", "e", 'a']
cnt1 = Counter(list_1)
cnt2 = Counter(list_2)
final = [key for key, counts in cnt2.items() if cnt1[key] != counts]

>>> final
['a', 'f', 'm']
``````
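
If what you want is a true multiset difference, where each occurrence in `list_1` cancels one occurrence in `list_2`, `Counter` subtraction may be closer to the intent than comparing counts for inequality. Here is a minimal sketch, with sample data chosen for illustration:

```python
from collections import Counter

list_1 = ["a", "b", "c", "d", "e", "m"]
list_2 = ["a", "f", "c", "m", "m"]

# Subtraction keeps only keys whose count stays positive.
remaining = Counter(list_2) - Counter(list_1)

# elements() repeats each key as many times as its remaining count.
main_list = list(remaining.elements())
print(main_list)  # ['f', 'm']
```

Here one of the two `"m"` entries in `list_2` is cancelled by the single `"m"` in `list_1`, so one `"m"` survives.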

From `ser1`, remove the items present in `ser2`:

``````import pandas as pd

# Input
ser1 = pd.Series([1, 2, 3, 4, 5])
ser2 = pd.Series([4, 5, 6, 7, 8])

# Solution
ser1[~ser1.isin(ser2)]
``````

The answers are collected from Stack Overflow and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.