# Pythonic way of removing reversed duplicates in list


I have a list of pairs:

``````[0, 1], [0, 4], [1, 0], [1, 4], [4, 0], [4, 1]
``````

and I want to remove any duplicates where

``````[a,b] == [b,a]
``````

So we end up with just

``````[0, 1], [0, 4], [1, 4]
``````

I can do an inner & outer loop checking for the reverse pair and append to a list if that’s not the case, but I’m sure there’s a more Pythonic way of achieving the same results.
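For reference, a sketch of the explicit inner/outer loop the question describes, kept here as the O(n²) baseline that the answers below improve on:

```python
pairs = [[0, 1], [0, 4], [1, 0], [1, 4], [4, 0], [4, 1]]

result = []
for pair in pairs:          # outer loop over the input
    found = False
    for kept in result:     # inner loop: scan the output for the reversed pair
        if kept == pair[::-1]:
            found = True
            break
    if not found:
        result.append(pair)
# result == [[0, 1], [0, 4], [1, 4]]
```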

If you need to preserve the order of the elements in the list, you can use the `sorted` function together with a set comprehension and `map`, like this:

``````lst = [[0, 1], [0, 4], [1, 0], [1, 4], [4, 0], [4, 1]]
data = {tuple(item) for item in map(sorted, lst)}
# {(0, 1), (0, 4), (1, 4)}
``````

or simply without `map` like this:

``````data = {tuple(sorted(item)) for item in lst}
``````

Another way is to use a `frozenset` as shown here; however, note that this only works if the elements within each pair are distinct. Like `set`, a `frozenset` only contains unique values, so a pair such as `[0, 0]` would collapse to a single-element `frozenset` and lose data, which may not be what you want.

To output a list, you can always use `list(map(list, result))`, where `result` is a set of tuples. (In Python 3, `map` returns an iterator, hence the outer `list` call.)
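For example, round-tripping the example data back to a list of lists (a set has no defined order, so the result is sorted here for a stable display):

```python
lst = [[0, 1], [0, 4], [1, 0], [1, 4], [4, 0], [4, 1]]
data = {tuple(sorted(item)) for item in lst}

# In Python 3, map() returns an iterator, so materialize it explicitly.
result = sorted(map(list, data))
# result == [[0, 1], [0, 4], [1, 4]]
```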

If you only want to remove reversed pairs and don’t want external libraries you could use a simple generator function (loosely based on the `itertools` “unique_everseen” recipe):

``````def remove_reversed_duplicates(iterable):
    # Create a set for already seen elements
    seen = set()
    for item in iterable:
        # Lists are mutable so we need tuples for the set-operations.
        tup = tuple(item)
        if tup not in seen:
            # The tuple is not in the set, so remember it in REVERSED order.
            seen.add(tup[::-1])
            # If you also want to remove normal duplicates uncomment the next line
            # seen.add(tup)
            yield item

>>> a = [[0, 1], [0, 4], [1, 0], [1, 4], [4, 0], [4, 1]]
>>> list(remove_reversed_duplicates(a))
[[0, 1], [0, 4], [1, 4]]
``````

The generator function might be a pretty fast way to solve this problem because set-lookups are really cheap. This approach also keeps the order of your initial list and only removes reverse duplicates while being faster than most of the alternatives!
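A self-contained sketch of the same recipe, showing that exact duplicates survive unless the optional `seen.add(tup)` line is enabled:

```python
def remove_reversed_duplicates(iterable):
    seen = set()
    for item in iterable:
        tup = tuple(item)
        if tup not in seen:
            # Remember the reversed pair so a later [b, a] is skipped.
            seen.add(tup[::-1])
            # seen.add(tup)  # uncomment to also drop exact duplicates
            yield item

# Only the reversed [1, 0] is dropped; the repeated [0, 1] is kept.
result = list(remove_reversed_duplicates([[0, 1], [0, 1], [1, 0]]))
# result == [[0, 1], [0, 1]]
```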

If you don’t mind using an external library and you want to remove all duplicates (reversed and identical) an alternative is: `iteration_utilities.unique_everseen`

``````>>> a = [[0, 1], [0, 4], [1, 0], [1, 4], [4, 0], [4, 1]]

>>> from iteration_utilities import unique_everseen

>>> list(unique_everseen(a, key=set))
[[0, 1], [0, 4], [1, 4]]
``````

This checks if any item has the same contents in arbitrary order (hence the `key=set`) as another. In this case it works as expected, but note that it also removes exact duplicates `[a, b]`, not only reversed `[b, a]` occurrences. You could also use `key=sorted` (like the other answers suggest). However, `unique_everseen` used this way has bad algorithmic complexity, because the result of the `key` function is not hashable, so the fast set lookup is replaced by a slow linear lookup. To speed this up you need to make the keys hashable, for example by converting them to sorted tuples (like some other answers suggest):

``````>>> from iteration_utilities import chained
>>> list(unique_everseen(a, key=chained(sorted, tuple)))
[[0, 1], [0, 4], [1, 4]]
``````

The `chained` is nothing else than a faster alternative to `lambda x: tuple(sorted(x))`.

EDIT: As mentioned by @jpmc26 one could use `frozenset` instead of normal sets:

``````>>> list(unique_everseen(a, key=frozenset))
[[0, 1], [0, 4], [1, 4]]
``````

To get an idea about the performance I did some `timeit` comparisons for the different suggestions:

``````>>> a = [[0, 1], [0, 4], [1, 0], [1, 4], [4, 0], [4, 1]]

>>> %timeit list(remove_reversed_duplicates(a))
100000 loops, best of 3: 16.1 µs per loop
>>> %timeit list(unique_everseen(a, key=frozenset))
100000 loops, best of 3: 13.6 µs per loop
>>> %timeit list(set(map(frozenset, a)))
100000 loops, best of 3: 7.23 µs per loop

>>> %timeit list(unique_everseen(a, key=set))
10000 loops, best of 3: 26.4 µs per loop
>>> %timeit list(unique_everseen(a, key=chained(sorted, tuple)))
10000 loops, best of 3: 25.8 µs per loop
>>> %timeit [list(tpl) for tpl in list(set([tuple(sorted(pair)) for pair in a]))]
10000 loops, best of 3: 29.8 µs per loop
>>> %timeit set(tuple(item) for item in map(sorted, a))
10000 loops, best of 3: 28.5 µs per loop
``````

Long list with many duplicates:

``````>>> import random
>>> a = [[random.randint(0, 10), random.randint(0,10)] for _ in range(10000)]

>>> %timeit list(remove_reversed_duplicates(a))
100 loops, best of 3: 12.5 ms per loop
>>> %timeit list(unique_everseen(a, key=frozenset))
100 loops, best of 3: 10 ms per loop
>>> %timeit set(map(frozenset, a))
100 loops, best of 3: 10.4 ms per loop

>>> %timeit list(unique_everseen(a, key=set))
10 loops, best of 3: 47.7 ms per loop
>>> %timeit list(unique_everseen(a, key=chained(sorted, tuple)))
10 loops, best of 3: 22.4 ms per loop
>>> %timeit [list(tpl) for tpl in list(set([tuple(sorted(pair)) for pair in a]))]
10 loops, best of 3: 24 ms per loop
>>> %timeit set(tuple(item) for item in map(sorted, a))
10 loops, best of 3: 35 ms per loop
``````

And with fewer duplicates:

``````>>> a = [[random.randint(0, 100), random.randint(0,100)] for _ in range(10000)]

>>> %timeit list(remove_reversed_duplicates(a))
100 loops, best of 3: 15.4 ms per loop
>>> %timeit list(unique_everseen(a, key=frozenset))
100 loops, best of 3: 13.1 ms per loop
>>> %timeit set(map(frozenset, a))
100 loops, best of 3: 11.8 ms per loop

>>> %timeit list(unique_everseen(a, key=set))
1 loop, best of 3: 1.96 s per loop
>>> %timeit list(unique_everseen(a, key=chained(sorted, tuple)))
10 loops, best of 3: 24.2 ms per loop
>>> %timeit [list(tpl) for tpl in list(set([tuple(sorted(pair)) for pair in a]))]
10 loops, best of 3: 31.1 ms per loop
>>> %timeit set(tuple(item) for item in map(sorted, a))
10 loops, best of 3: 36.7 ms per loop
``````

So the variants with `remove_reversed_duplicates`, `unique_everseen` (`key=frozenset`) and `set(map(frozenset, a))` seem to be by far the fastest solutions. Which one is fastest depends on the length of the input and the number of duplicates.

### TL;DR

``````set(map(frozenset, lst))
``````

### Explanation

If the pairs are logically unordered, they’re more naturally expressed as sets. It would be better to have them as sets before you even get to this point, but you can convert them like this:

``````lst = [[0, 1], [0, 4], [1, 0], [1, 4], [4, 0], [4, 1]]
lst_as_sets = map(frozenset, lst)
``````

And then the natural way of eliminating duplicates in an iterable is to convert it to a `set`:

``````deduped = set(lst_as_sets)
``````

(This is the main reason I chose `frozenset` in the first step. Mutable `set`s are not hashable, so they can’t be added to a `set`.)

Or you can do it in a single line like in the TL;DR section.

I think this is much simpler, more intuitive, and more closely matches how you think about the data than fussing with sorting and tuples.

### Converting back

If for some reason you really need a `list` of `list`s as the final result, converting back is trivial:

``````result_list = list(map(list, deduped))
``````

But it’s probably more logical to leave it all as `set`s as long as possible. I can only think of one reason that you might need this, and that’s compatibility with existing code/libraries.

You could sort each pair, convert your list of pairs to a set of tuples and back again:

``````l = [[0, 1], [0, 4], [1, 0], [1, 4], [4, 0], [4, 1]]
[list(tpl) for tpl in list(set([tuple(sorted(pair)) for pair in l]))]
#=> [[0, 1], [1, 4], [0, 4]]
``````

The steps might be easier to understand than a long one-liner:

``````>>> l = [[0, 1], [0, 4], [1, 0], [1, 4], [4, 0], [4, 1]]
>>> [sorted(pair) for pair in l]
# [[0, 1], [0, 4], [0, 1], [1, 4], [0, 4], [1, 4]]
>>> [tuple(pair) for pair in _]
# [(0, 1), (0, 4), (0, 1), (1, 4), (0, 4), (1, 4)]
>>> set(_)
# set([(0, 1), (1, 4), (0, 4)])
>>> list(_)
# [(0, 1), (1, 4), (0, 4)]
>>> [list(tpl) for tpl in _]
# [[0, 1], [1, 4], [0, 4]]
``````

You could use the built-in `filter` function.

``````from __future__ import print_function

def my_filter(l):
    seen = set()

    def not_seen(it):
        s = min(*it), max(*it)
        if s in seen:
            return False
        else:
            seen.add(s)
            return True

    out = list(filter(not_seen, l))
    return out

myList = [[0, 1], [0, 4], [1, 0], [1, 4], [4, 0], [4, 1]]
print(my_filter(myList))  # [[0, 1], [0, 4], [1, 4]]
``````

As a complement, I would point you to the Python `itertools` module documentation, which describes a `unique_everseen` recipe that does basically the same thing as above, but as a lazy, generator-based, memory-efficient version. It might be better than any of our solutions if you are working on large arrays. Here is how to use it:

``````try:
    from itertools import ifilterfalse  # Python 2
except ImportError:
    from itertools import filterfalse as ifilterfalse  # Python 3

def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    if key is None:
        for element in ifilterfalse(seen.__contains__, iterable):
            seen.add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen.add(k)
                yield element

myList = [[0, 1], [0, 4], [1, 0], [1, 4], [4, 0], [4, 1]]
gen = unique_everseen(myList, lambda x: (min(x), max(x)))  # gen is an iterator
print(gen)          # <generator object unique_everseen at 0x7f82af492fa0>
result = list(gen)  # consume generator into a list
print(result)       # [[0, 1], [0, 4], [1, 4]]
``````

I haven’t done any metrics to see which is fastest. However, memory efficiency and algorithmic complexity seem better in this version.

## Timing min/max vs sorted

The built-in `sorted` function could be passed to `unique_everseen` to order the items in the inner vectors. Instead, I pass `lambda x: (min(x), max(x))`, since I know each vector has exactly two elements.

To use `sorted` I would need to pass `lambda x: tuple(sorted(x))`, which adds overhead. Not dramatically, but still.
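For a two-element pair, both keys normalize to the same hashable canonical form:

```python
pair = [4, 1]
# min/max and tuple(sorted(...)) produce the same canonical key for a pair.
key_minmax = (min(pair), max(pair))
key_sorted = tuple(sorted(pair))
# key_minmax == key_sorted == (1, 4)
```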

``````myList = [[random.randint(0, 10), random.randint(0,10)] for _ in range(10000)]
timeit.timeit("list(unique_everseen(myList, lambda x: (min(x), max(x))))", globals=globals(), number=20000)
>>> 156.81979029000013
timeit.timeit("list(unique_everseen(myList, lambda x: tuple(sorted(x))))", globals=globals(), number=20000)
>>> 168.8286430349999
``````

Timings done in Python 3, which adds the `globals` kwarg to `timeit.timeit`.

An easy and unnested solution:

``````pairs = [[0, 1], [0, 4], [1, 0], [1, 4], [4, 0], [4, 1]]
s = set()
for p in pairs:
    # Lists are unhashable so make the "elements" into tuples
    p = tuple(p)
    if p not in s and p[::-1] not in s:
        s.add(p)

print(s)
``````

### EDITED to better explain

First sort each sublist, then use dictionary keys to get the unique set of elements, and finally convert back with a list comprehension.

Why tuples?
Replacing lists with tuples is necessary to avoid the “unhashable” error when passing the items through the `fromkeys()` function.

``````my_list = [[0, 1], [0, 4], [1, 0], [1, 4], [4, 0], [4, 1]]
tuple_list = [ tuple(sorted(item)) for item in my_list ]
final_list = [ list(item) for item in list({}.fromkeys(tuple_list)) ]
``````

Using `OrderedDict` even preserves the list order.

``````from collections import OrderedDict

my_list = [[0, 1], [0, 4], [1, 0], [1, 4], [4, 0], [4, 1]]
tuple_list = [ tuple(sorted(item)) for item in my_list ]
final_list = [ list(item) for item in list(OrderedDict.fromkeys(tuple_list)) ]
``````

The above code will result in the desired list

``````[[0, 1], [0, 4], [1, 4]]
``````
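As a side note: in CPython 3.6 (and guaranteed by the language from Python 3.7 on), a plain `dict` preserves insertion order too, so `dict.fromkeys` alone gives the same order-preserving dedup without `OrderedDict`:

```python
my_list = [[0, 1], [0, 4], [1, 0], [1, 4], [4, 0], [4, 1]]
tuple_list = [tuple(sorted(item)) for item in my_list]

# dict.fromkeys keeps the first occurrence of each key, in insertion order.
final_list = [list(item) for item in dict.fromkeys(tuple_list)]
# final_list == [[0, 1], [0, 4], [1, 4]]
```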

If the order of pairs and pair-items matters, creating a new list by testing for membership might be the way to go here.

``````pairs = [[0, 1], [0, 4], [1, 0], [1, 4], [4, 0], [4, 1]]
no_dups = []
for pair in pairs:
    if not any(all(i in p for i in pair) for p in no_dups):
        no_dups.append(pair)
``````

Otherwise, I’d go with Styvane’s answer.

Incidentally, the above solution will not work for cases in which you have matching pairs. For example, `[0,0]` would not be added to the list. For that, you’d need to add an additional check:

``````for pair in pairs:
    if not any(all(i in p for i in pair) for p in no_dups) or (len(set(pair)) == 1 and pair not in no_dups):
        no_dups.append(pair)
``````

However, that solution will not pick up empty “pairs” (eg, `[]`). For that, you’ll need one more adjustment:

``````    if not any(all(i in p for i in pair) for p in no_dups) or (len(set(pair)) in (0, 1) and pair not in no_dups):
        no_dups.append(pair)
``````

The `and not pair in no_dups` bit is required to prevent adding the `[0,0]` or `[]` to `no_dups` twice.
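Putting the three refinements together, a self-contained sketch (the `no_dups` name follows the snippets above; the input is extended with `[0, 0]` duplicates and an empty pair to exercise the edge cases):

```python
pairs = [[0, 1], [0, 4], [1, 0], [1, 4], [4, 0], [4, 1], [0, 0], [0, 0], []]

no_dups = []
for pair in pairs:
    # Keep the pair if no kept pair has the same members, with special
    # handling for single-value pairs and empty "pairs".
    if (not any(all(i in p for i in pair) for p in no_dups)
            or (len(set(pair)) in (0, 1) and pair not in no_dups)):
        no_dups.append(pair)
# no_dups == [[0, 1], [0, 4], [1, 4], [0, 0], []]
```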

Well, I am “checking for the reverse pair and append to a list if that’s not the case” as you said you could do, but I’m using a single loop.

``````x = [[0, 1], [0, 4], [1, 0], [1, 4], [4, 0], [4, 1]]
out = []
for pair in x:
    if pair[::-1] not in out:
        out.append(pair)
print(out)
``````

The advantage over existing answers is being, IMO, more readable. No deep knowledge of the standard library is needed here, and no complex bookkeeping. The only concept that might be unfamiliar to beginners is that `[::-1]` reverses the pair.

The performance is O(n**2) though, so do not use it if performance is an issue and/or the lists are big.

The answers/resolutions are collected from Stack Overflow and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.