Finding duplicate and unique nested sequence items using count in Python - python

list1 = [1, 2, 2, 3]
I like the following to extract duplicate and unique list items.
DuplicateItems = [x for x in list1 if list1.count(x) > 1]
[2]
[2]
UniqueItems = [x for x in list1 if list1.count(x) == 1]
[1]
[3]
However, can I do the same with nested sequences?
list2 = [[1, 1], [2, 2], [3, 2], [3, 3]]
All list2 nested sequences containing duplicate second items should be:
[2, 2]
[3, 2]
While all list2 nested sequences with unique second items should be:
[1, 1]
[3, 3]
I've tried several combinations, but it seems the following should work:
DuplicateItems = [x for x in list2 if list2.count(x[1]) > 1]
UniqueItems = [x for x in list2 if list2.count(x[1]) == 1]
It does not work. What am I doing wrong? I realize there are other methods to extract duplicate and unique items but I like the simplicity of this.

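What you're doing wrong is trying to find a single number (x[1]) in a list of pairs (list2): list2.count(x[1]) compares the number against each pair, never finds a match, and always returns 0. Here's a solution you can try, using collections.Counter: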
import collections
# First, count how often each 2nd item occurs:
counts = collections.Counter(x for _, x in list2)
# Then apply your previous logic:
duplicate_items = [x for x in list2 if counts[x[1]] > 1]
unique_items = [x for x in list2 if counts[x[1]] == 1]
Edit:
If you'd like to work with arbitrary (but equal) length nested lists (e.g: [[1, 1, 1], [2, 2, 2], ...]), assuming you're still only interested in the last element, you can replace the counts line with:
counts = collections.Counter(x for *rest, x in list2) # Python3 only, or
counts = collections.Counter(x[-1] for x in list2) # All versions of Python
Remember to update the indices in duplicate_items and unique_items too.
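As a minimal, self-contained sketch of the approach above: the three-element sample data and the name last_counts are illustrative assumptions, not taken from the question. It keys on the last element of each sublist and also shows why the original list2.count(x[1]) attempt finds nothing.

```python
from collections import Counter

# Hypothetical sample data: equal-length sublists, keyed on the last element.
list2 = [[1, 1, 1], [2, 2, 2], [3, 3, 2], [3, 3, 3]]

# list.count compares whole elements, so a bare number never matches a sublist.
assert list2.count(2) == 0

# Count how often each last element occurs.
last_counts = Counter(x[-1] for x in list2)

# Same logic as before, with the index updated from x[1] to x[-1].
duplicate_items = [x for x in list2 if last_counts[x[-1]] > 1]
unique_items = [x for x in list2 if last_counts[x[-1]] == 1]

print(duplicate_items)  # [[2, 2, 2], [3, 3, 2]]
print(unique_items)     # [[1, 1, 1], [3, 3, 3]]
```

Building the Counter once keeps the work to a single pass over list2, whereas calling count inside the comprehension rescans the list for every item.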

You can use numpy array...
import numpy as np
list2 = np.array([[1, 1], [2, 2], [3, 2], [3, 3]])
DuplicateItems = [x.tolist() for x in list2 if list2[:,1].tolist().count(x[1]) > 1]
UniqueItems = [x.tolist() for x in list2 if list2[:,1].tolist().count(x[1]) == 1]

Related

Remove duplicate items from lists in Python lists

I want to remove duplicate items from lists in sublists in Python.
Example:
myList = [[1,2,3], [4,5,6,3], [7,8,9], [0,2,4]]
to
myList = [[1,2,3], [4,5,6], [7,8,9], [0]]
I tried with this code :
myList = [[1,2,3],[4,5,6,3],[7,8,9], [0,2,4]]
nbr = []
for x in myList:
    for i in x:
        if i not in nbr:
            nbr.append(i)
        else:
            x.remove(i)
But some duplicate items are not deleted.
Like this : [[1, 2, 3], [4, 5, 6], [7, 8, 9], [0, 4]]
I still have the number 4 that repeats.
You iterate over a list that you are also modifying:
...
    for i in x:
        ...
            x.remove(i)
That means the loop may skip an element on the next iteration.
The solution is to create a shallow copy of the list and iterate over that while modifying the original list:
...
    for i in x.copy():
        ...
            x.remove(i)
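Put together, the shallow-copy fix might look like the sketch below. The names my_list, seen, sublist and item are renamed for readability and are not the question's identifiers; seen plays the role of nbr.

```python
my_list = [[1, 2, 3], [4, 5, 6, 3], [7, 8, 9], [0, 2, 4]]
seen = []  # values encountered so far, like `nbr` in the question

for sublist in my_list:
    # Iterate over a shallow copy so removals do not disturb the loop.
    for item in sublist.copy():
        if item not in seen:
            seen.append(item)
        else:
            # Removes the first matching occurrence from the original sublist.
            sublist.remove(item)

print(my_list)  # [[1, 2, 3], [4, 5, 6], [7, 8, 9], [0]]
```

Because list.remove drops the first matching occurrence, a sublist that repeats a value internally may keep its later copy rather than its earlier one.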
You can make this much faster by:
Using a set for repeated membership testing instead of a list, and
Rebuilding each sublist rather than repeatedly calling list.remove() (a linear-time operation, each time) in a loop.
seen = set()
for i, sublist in enumerate(myList):
    new_list = []
    for x in sublist:
        if x not in seen:
            seen.add(x)
            new_list.append(x)
    myList[i] = new_list
>>> print(myList)
[[1, 2, 3], [4, 5, 6], [7, 8, 9], [0]]
If you want mild speed gains and moderate readability loss, you can also write this as:
seen = set()
for i, sublist in enumerate(myList):
    myList[i] = [x for x in sublist if not (x in seen or seen.add(x))]
Why you got the wrong answer: in your code, after scanning the first three sublists, nbr = [1, 2, 3, 4, 5, 6, 7, 8, 9]. Now x = [0, 2, 4]. A duplicate is detected when i = x[1] (the value 2), so x becomes [0, 4]. The loop then moves on to index 2, which no longer exists, so the for loop stops and 4 is never checked.
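The skipped element can be reproduced in isolation. This is a small sketch in which the visited list is an illustrative addition used only to record what the loop actually sees:

```python
x = [0, 2, 4]
visited = []  # records every value the loop actually reaches

for i in x:
    visited.append(i)
    if i == 2:
        x.remove(i)  # x shrinks to [0, 4] while the loop is still running

print(visited)  # [0, 2]
print(x)        # [0, 4]
```

The value 4 slides into index 1, which the loop has already passed, so it is never visited.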
Optimizations have been proposed in other answers. Generally, a list is only efficient for retrieving elements by index and for appending or removing at the end.

Multiply a list by the elements of other list

I have this list of list, with a series of values:
factors = [1,2,3]
values = [[1,2,3],[3,1,4],[5,5,2]]
I want to multiply each sublist in values by the corresponding element of factors. I am trying with this:
factors = [1,2,3]
values = [[1,2,3],[3,1,4],[5,5,2]]
multiply = []
for i in factors:
    multiply = [values[i]*i]
But it does not work. The expected value would be:
[[1, 2, 3], [6, 2, 8], [15, 15, 6]]
Try this:
factors = [1, 2, 3]
values = [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
multiply = []
for idx, lst in enumerate(values):
    multiply.append([factors[idx] * x for x in lst])
print(multiply)
For a list comprehension version of the above code, see @Hommes' answer
Update: Given more general setup of the problem I changed the comprehension
Solution with list comprehension:
factors = [1, 2, 3]
values = [[1, 2, 3], [3, 1, 4], [5, 5, 2]]
multiply = [[factors[idx] * elem for elem in lst] for idx, lst in enumerate(values)]
Out[39]: [[1, 2, 3], [6, 2, 8], [15, 15, 6]]
There are a few problems with the code as it is:
Python uses zero-indexing
The first element of a list in Python has the index 0. In your for loop:
for i in factors:
    multiply = [values[i]*i]
The last iteration will try to access values[3]. But values has only 3 elements, so the last element is values[2], not values[3].
Multiplication * of a list and an integer doesn't actually multiply
Multiplying a list by an integer, say n, gives you a new list that concatenates the original n times. For example:
>>> [1, 2, 3] * 3
[1, 2, 3, 1, 2, 3, 1, 2, 3]
The most straightforward way of actually broadcasting the multiplication over the list is to use a 'list comprehension'. For example:
>>> [3*x for x in [1,2,3]]
[3, 6, 9]
Applying this into your example would look something like:
for i in factors:
    multiply = [i*x for x in values[i-1]]
You are only keeping the last calculation in multiply
Each time around your for loop, you assign a new value to multiply, overwriting whatever was there previously. If you want to collect all your results, then you should append to the multiply list.
multiply.append([i*x for x in values[i-1]])
All together, your example is fixed as:
factors = [1,2,3]
values = [[1,2,3],[3,1,4],[5,5,2]]
multiply = []
for i in factors:
    multiply.append([i*x for x in values[i-1]])
Improvements
However, there are still ways to improve the code in terms of concision and readability.
The root of the problem is to multiply a list's elements by a number. * can't do it, but you can write your own function which can:
def multiply_list(X, n):
    return [n*x for x in X]
Then you can use this function, and a list comprehension, to remove the for loop:
multiply = [multiply_list(x, i) for (x, i) in zip(values, factors)]
Or, if you think it is readable, then you can just use the nested list comprehension, which is much more concise:
multiply = [[factor * x for x in value] for (value, factor) in zip(values, factors)]
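One thing to keep in mind with the zip-based versions is that zip silently stops at the shorter input. A hedged sketch that guards against mismatched lengths could look like this; the helper name scale_rows and the length check are illustrative additions, not part of the answer above.

```python
def scale_rows(rows, multipliers):
    """Multiply every element of rows[k] by multipliers[k]."""
    # zip stops at the shorter input, so check the lengths up front.
    if len(rows) != len(multipliers):
        raise ValueError("rows and multipliers must have the same length")
    return [[m * x for x in row] for row, m in zip(rows, multipliers)]


factors = [1, 2, 3]
values = [[1, 2, 3], [3, 1, 4], [5, 5, 2]]

print(scale_rows(values, factors))  # [[1, 2, 3], [6, 2, 8], [15, 15, 6]]
```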
You can use a list comprehension for a one-liner. Try this:
multiply = [[f * x for x in row] for f, row in zip(factors, values)]

Nested list - create new nested list with indices as items

I would like to create a new nested list from an already existing nested list. This new list should include the indices+1 from the existing list.
Example:
my_list = [[20, 45, 80],[56, 29],[76],[38,156,11,387]]
Result:
my_new_list = [[1,2,3],[1,2],[1],[1,2,3,4]]
How can I create such a list?
This saves a Python-level loop: list(range(...)) forces iteration of the range (required in Python 3, where range is lazy) inside a single list comprehension, so it is faster than a classic doubly nested comprehension:
my_list = [[20, 45, 80],[56, 29],[76],[38,156,11,387]]
index_list = [list(range(1,len(x)+1)) for x in my_list]
There are a few ways to do this, but the first that comes to mind is to enumerate the elements with a starting index of 1 in a nested list comprehension.
>>> [[index for index, value in enumerate(sub, 1)] for sub in my_list]
[[1, 2, 3], [1, 2], [1], [1, 2, 3, 4]]
Another solution could be:
new_list = [list(range(1,len(item)+1)) for item in my_list]
Here are some simple solutions:
>>> lst = [[20, 45, 80],[56, 29],[76],[38,156,11,387]]
>>> out = [[x+1 for x,_ in enumerate(y)] for y in lst]
>>> out
[[1, 2, 3], [1, 2], [1], [1, 2, 3, 4]]
>>>
>>>
>>> out = [[x+1 for x in range(len(y))] for y in lst]
>>> out
[[1, 2, 3], [1, 2], [1], [1, 2, 3, 4]]
Using nested list comprehension
First you want a list with every number from 1 to the length of your sublist. There are several ways to do this with list comprehension.
For example
[i for i in range(1, len(sublist) + 1)]
or
[i + 1 for i in range(len(sublist))]
Second you want to do this for every sublist inside your my_list. Therefore you have to use nested list comprehension:
>>> my_list = [[20, 45, 80],[56, 29],[76],[38,156,11,387]]
>>> my_new_list = [[i+1 for i in range(len(sublist))] for sublist in my_list]
>>> my_new_list
[[1, 2, 3], [1, 2], [1], [1, 2, 3, 4]]
Using list comprehension with range
Another way is to pass the range built-in directly to list to build each sublist:
>>> [list(range(1, len(sublist) + 1)) for sublist in my_list]
[[1, 2, 3], [1, 2], [1], [1, 2, 3, 4]]
Using map with range
Or you can use the map built-in function
>>> list(map(
... lambda sublist: list(range(1, len(sublist) + 1)),
... my_list
... ))
[[1, 2, 3], [1, 2], [1], [1, 2, 3, 4]]
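The variants above all produce the same result, so they can be wrapped in one small helper. In this sketch the function name index_lists and its start parameter are hypothetical, added only to show how the starting index could be made configurable.

```python
def index_lists(nested, start=1):
    """Return, for each sublist, its positions counted from `start`."""
    return [list(range(start, start + len(sub))) for sub in nested]


my_list = [[20, 45, 80], [56, 29], [76], [38, 156, 11, 387]]

print(index_lists(my_list))           # [[1, 2, 3], [1, 2], [1], [1, 2, 3, 4]]
print(index_lists(my_list, start=0))  # [[0, 1, 2], [0, 1], [0], [0, 1, 2, 3]]
```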

Joining two List Elements one by one to make sublists inside a third list

I have two lists of equal length.
The first list consists of 1000 sublists with two elements each, e.g.
listone = [[1,2], [1,3], [2,3],...]
My second list consists of 1000 elements e.g.
secondlist = [1,2,3,...]
I want to use these two lists to make a third list consisting of 1000 sublists of three elements each, where the element of secondlist at each index is appended to the sublist of listone at the same index as its third element, e.g.
thirdlist = [[1,2,1], [1,3,2], [2,3,3],...]
Look into zip:
listone = [[1,2], [1,3], [2,3]]
secondlist = [1,2,3]
thirdlist = [x + [y] for x, y in zip(listone, secondlist)]
print(thirdlist)
# Output:
# [[1, 2, 1], [1, 3, 2], [2, 3, 3]]
You can do:
[x + [y] for x, y in zip(listone, secondlist)]
You can use the zip function to combine listOne and secondList. To get the desired format, create a new list using a list comprehension:
listOne = [[1,2], [1,3], [2,3]]
secondList = [1,2,3]
thirdList = [x + [y] for x,y in zip(listOne, secondList)]
print(thirdList)
# [[1, 2, 1], [1, 3, 2], [2, 3, 3]]
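A short sketch of the zip approach that also checks a detail worth knowing: x + [y] builds a new list each time, so the original sublists are not modified. The three-item sample stands in for the 1000-item lists in the question.

```python
listone = [[1, 2], [1, 3], [2, 3]]
secondlist = [1, 2, 3]

# x + [y] creates a fresh three-element list; listone itself is untouched.
thirdlist = [x + [y] for x, y in zip(listone, secondlist)]

print(thirdlist)  # [[1, 2, 1], [1, 3, 2], [2, 3, 3]]
print(listone)    # [[1, 2], [1, 3], [2, 3]]
```

If modifying listone in place is acceptable, calling x.append(y) inside a plain for loop over zip(listone, secondlist) avoids building the new sublists.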

Check if a value in a list is equal to some number

I've got a list of data:
[[0, 3], [1, 2], [2, 1], [3, 0]]
And I'm trying to check whether any of the individual numbers is equal to 3 and, if so, return which elements (list[0], list[3], etc.) of the original list contain the value 3.
I've gotten as far as:
for i in range(0, len(gcounter_selection)):
    for x in range(0,len(gcounter_selection)):
        if any(x) in gcounter_selection[i][x]==3:
            print(i)
My list is called gcounter_selection by the way.
But I'm getting a type error:
TypeError: argument of type 'int' is not iterable
I've tried using a generator expression but I couldn't get that to work either.
If I understood correctly, you're looking for list comprehensions:
value = 3
lst = [[0, 3], [1, 2], [2, 1], [3, 0]]
items = [x for x in lst if value in x]
print(items)
#[[0, 3], [3, 0]]
To get elements' positions instead of just elements, add enumerate:
indexes = [n for n, x in enumerate(lst) if value in x]
Fixed version of your original
gcounter_selection = [[0, 3], [1, 2], [2, 1], [3, 0]]
for i in range(0, len(gcounter_selection)):
    if any(x == 3 for x in gcounter_selection[i]):
        print(i)
However this can be simplified to
for i, x in enumerate(gcounter_selection):
    if any(y == 3 for y in x):
        print(i)
And there is no need for any in this case, just check with in
for i, x in enumerate(gcounter_selection):
    if 3 in x:
        print(i)
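If the position inside each sublist is also of interest, the same idea extends to a nested enumerate. This is only a sketch: the helper name find_positions and the (row, column) tuple format are assumptions for illustration, not something the question asked for explicitly.

```python
def find_positions(rows, target):
    """Return (row_index, column_index) pairs where target occurs."""
    return [
        (i, j)
        for i, row in enumerate(rows)
        for j, value in enumerate(row)
        if value == target
    ]


gcounter_selection = [[0, 3], [1, 2], [2, 1], [3, 0]]

print(find_positions(gcounter_selection, 3))  # [(0, 1), (3, 0)]
print(find_positions(gcounter_selection, 7))  # []
```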
