How can I count repeated arrays in numpy? - python

X = [[1,2], [5,1], [1,2], [2,-1] , [5,1]]
I want to count the "frequency" of repeated elements, for example [1,2].

Unless speed is really an issue, the simplest approach is to map the sub-arrays to tuples and count them with a Counter:
X = [[1,2], [5,1], [1,2], [2,-1] , [5,1]]
from collections import Counter
cn = Counter(map(tuple, X))
print(cn)
print(list(filter(lambda x:x[1] > 1,cn.items())))
Counter({(1, 2): 2, (5, 1): 2, (2, -1): 1})
[((1, 2), 2), ((5, 1), 2)]
If you consider [1, 2] equal to [2, 1], you could use a frozenset instead: Counter(map(frozenset, X))
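A frozenset also collapses repeated values inside a sub-array, so [2, 2] would become frozenset({2}). A minimal sketch of an alternative, assuming order should be ignored but repeats should not: use the sorted tuple as the key. The two extra sub-arrays in X are added only for illustration.

```python
from collections import Counter

# X extended with [2, 1] and [2, 2] purely for illustration
X = [[1, 2], [5, 1], [1, 2], [2, -1], [5, 1], [2, 1], [2, 2]]

# sorted() makes [1, 2] and [2, 1] the same key, while
# tuple() keeps repeated values such as (2, 2) intact
cn = Counter(tuple(sorted(pair)) for pair in X)
print(cn)
```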

Take a look at numpy.unique: http://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.unique.html
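You can use the return_counts argument to get the count of each item. Pass axis=0 so that whole rows are compared; without it, the array is flattened and individual numbers are counted instead (the axis argument requires NumPy 1.13 or later):
import numpy
values, counts = numpy.unique(X, axis=0, return_counts=True)
repeated = values[counts > 1]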

Assuming I understand what you want:
Count each item of your list in a dictionary, then keep the items whose count is greater than 1.
The following code might help you:
freq = dict()
for item in X:
    key = tuple(item)  # lists are unhashable, so use a tuple as the key
    if key not in freq:
        freq[key] = 1
    else:
        freq[key] += 1
print({k: v for (k, v) in freq.items() if v > 1})
That code will give you the output:
{(1, 2): 2, (5, 1): 2}

Related

Find out if a tuple is contained in another one with repetitions in Python

Tuple1 = (1,2,2)
TupleList = [(1,2,3), (1,2,3,2)]
I want to search TupleList for any tuple that is a superset of Tuple1. In this case the result should be:
(1,2,3,2)
But if I use the .issuperset() function, it will not take into account the repetition of the 2 inside Tuple1.
How to solve this problem?
If you need to consider element frequency this is probably a good use of the collections.Counter utility.
from collections import Counter
tuple_1 = (1, 2, 2)
tuple_list = [(1, 2, 3), (3, 4, 1), (1, 2, 3, 2)]
def find_superset(source, targets):
    source_counter = Counter(source)
    for target in targets:
        target_counter = Counter(target)
        if is_superset(source_counter, target_counter):
            return target
    return None  # no superset found
def is_superset(source_counter, target_counter):
    # target must have every key of source at least as many times
    for key in source_counter:
        if not target_counter[key] >= source_counter[key]:
            return False
    return True
print(find_superset(tuple_1, tuple_list))
Output:
(1, 2, 3, 2)
from collections import Counter
def contains(container, contained):
    """True if every count in container is >= the matching count in contained."""
    return all(container[x] >= contained[x] for x in contained)
def sublist(lst1, lst2):
    """True if lst1 contains every element of lst2 at least as many times."""
    return contains(Counter(lst1), Counter(lst2))
Tuple1 = (1,2,2)
TupleList = [(1,2,3), (1,2,3,2)]
# List of tuples from TupleList that contain Tuple1
result = [x for x in TupleList if sublist(x, Tuple1)]
print(result)
Output:
[(1, 2, 3, 2)]
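Both answers compare counts key by key. A shorter sketch relies on the fact that Counter subtraction keeps only positive counts, so the difference is empty exactly when the bigger tuple has enough of every element. The helper name is_multiset_subset is hypothetical.

```python
from collections import Counter

def is_multiset_subset(small, big):
    # Counter subtraction drops zero and negative counts, so the
    # result is empty only if big has every element of small at
    # least as many times as small does
    return not (Counter(small) - Counter(big))

Tuple1 = (1, 2, 2)
TupleList = [(1, 2, 3), (1, 2, 3, 2)]
result = [t for t in TupleList if is_multiset_subset(Tuple1, t)]
print(result)  # [(1, 2, 3, 2)]
```

On Python 3.10 or later, Counter(small) <= Counter(big) expresses the same inclusion test directly.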

Algorithm to extract edge list efficiently from a vector

I have a long list (~10 million elements), and the elements that share a value form pairs. I want to extract the list of pairs from the list, e.g.
R = [1,3,1,6,9,6,1,2,3,0]
should produce the list of pairs
P = [[e1,e3],[e1,e7],[e3,e7],[e4,e6],[e2,e9]]
What is the efficient algorithm to achieve this for a long list?
Group the indices together based on value, then iterate through pairs of indices using combinations.
from collections import defaultdict
from itertools import combinations
R = [1,3,1,6,9,6,1,2,3,0]
d = defaultdict(list)
for idx, item in enumerate(R, 1):
    d[item].append(idx)  # 1-based positions, grouped by value
result = []
for indices in d.values():
    result.extend(combinations(indices, 2))
print(result)
Result:
[(1, 3), (1, 7), (3, 7), (2, 9), (4, 6)]
Populating the defaultdict takes O(len(R)) time on average. Generating the pairs takes time proportional to the number of pairs produced: a group of N equal values yields N*(N-1)/2 pairs, so the largest group costs O(N^2).
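Since the output can grow quadratically with the size of a group, materialising every pair in one list may itself be the bottleneck for ~10 million elements. A sketch, assuming the pairs can be consumed one at a time: group the positions first, then yield pairs lazily. The function name iter_pairs is hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def iter_pairs(values, start=1):
    # Group positions by value; start=1 gives the 1-based labels used above
    groups = defaultdict(list)
    for idx, item in enumerate(values, start):
        groups[item].append(idx)
    # Yield pairs lazily instead of building one large list;
    # groups with a single position produce no pairs
    for indices in groups.values():
        yield from combinations(indices, 2)

R = [1, 3, 1, 6, 9, 6, 1, 2, 3, 0]
for pair in iter_pairs(R):
    print(pair)
```

The grouping step still needs memory for every position, but no list of pairs is ever built.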
My simple solution:
def extract(edges):
    dic = {}  # value -> 1-based positions where it occurs
    for i, value in enumerate(edges, 1):
        if value in dic:
            dic[value].append(i)
        else:
            dic[value] = [i]
    res = []
    for k in sorted(dic):
        res += combinations(dic[k])
    return res
def combinations(positions):
    # All pairs of positions, labelled "e<position>" as in the question
    ret = []
    for i in range(len(positions)):
        for j in range(i+1, len(positions)):
            ret.append(["e"+str(positions[i]), "e"+str(positions[j])])
    return ret
R = [1,3,1,6,9,6,1,2,3,0]
res = extract(R)
print(res)
Since we can't see your input, you might run into problems if there are many combinations. One thing to try is PyPy, which sometimes gives me a (free) speed-up.
Unless I misunderstood, the simplest and most efficient way to do this is to use a dictionary of already-encountered values (indices are 0-based here):
elem_dict = {}  # value -> indices where it has appeared so far
output = []
for i, elem in enumerate(R):
    if elem in elem_dict:
        # pair the current index with every earlier occurrence
        output += [[duplicate, i] for duplicate in elem_dict[elem]]
    else:
        elem_dict[elem] = []
    elem_dict[elem].append(i)
print(output)  # [[0, 2], [3, 5], [0, 6], [2, 6], [1, 8]]
The dictionary lookups make this O(n) on average, plus time proportional to the number of pairs produced; if many values repeat, the output itself is O(n^2) anyway.
My approach would be to do a pass over the list to find the elements with the same value and store them into new lists, then find the elements that appear more than once and collect the combinations:
In [18]: from collections import defaultdict
In [19]: d = defaultdict(list)
In [20]: for i, e in enumerate(R, 1):
   ....:     d[e].append(i)
....:
In [21]: from itertools import combinations
In [22]: from itertools import chain
In [23]: list(chain(*[list(combinations(v,2)) for v in d.values() if len(v) > 1]))
Out[23]: [(1, 3), (1, 7), (3, 7), (2, 9), (4, 6)]

Compare two lists and return the differing indices and elements in Python

I want to compare two lists and return the indices and elements where they differ.
So I write the following code:
l1 = [1,1,1,1,1]
l2 = [1,2,1,1,3]
ind = []
diff = []
for i in range(len(l1)):
    if l1[i] != l2[i]:
        ind.append(i)
        diff.append([l1[i], l2[i]])
print(ind)
print(diff)
# output:
# [1, 4]
# [[1, 2], [1, 3]]
The code works, but are there any better ways to do that?
Update:
I'd like to see other solutions, for example using iterators or a ternary-style expression like [a,b](expression), rather than the straightforward approach I used above. Thanks very much for your patience! :)
You could use a list comprehension to output all the information in a single list.
>>> [[idx, (i,j)] for idx, (i,j) in enumerate(zip(l1, l2)) if i != j]
[[1, (1, 2)], [4, (1, 3)]]
This will produce a list where each element is: [index, (first value, second value)] so all the information regarding a single difference is together.
An alternative way is the following
>>> l1 = [1,1,1,1,1]
>>> l2 = [1,2,1,1,3]
>>> z = list(zip(l1, l2))
>>> ind = [i for i, x in enumerate(z) if x[0] != x[1]]
>>> ind
[1, 4]
>>> diff = [z[i] for i in ind]
>>> diff
[(1, 2), (1, 3)]
The list() call is needed in Python 3, where zip returns a one-shot iterator that cannot be indexed; in Python 2, zip already returns a list.
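You can try a functional style:
res = list(filter(lambda pair: pair[1][0] != pair[1][1], enumerate(zip(l1, l2))))
# [(1, (1, 2)), (4, (1, 3))]
Python 3 no longer allows tuple parameters such as lambda (idx, x): ..., so the (index, values) pair is indexed instead.
To unzip res you can use:
list(zip(*res))
# [(1, 4), ((1, 2), (1, 3))]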
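The update asks for iterator-based solutions. One option among several is itertools.compress driven by operator.ne; this is a sketch, not the only idiomatic way.

```python
from itertools import compress, count
from operator import ne

l1 = [1, 1, 1, 1, 1]
l2 = [1, 2, 1, 1, 3]

# map(ne, l1, l2) lazily yields True where the elements differ;
# compress keeps the matching values from the infinite index counter
# and stops as soon as the selectors run out
ind = list(compress(count(), map(ne, l1, l2)))
diff = [(l1[i], l2[i]) for i in ind]
print(ind)   # [1, 4]
print(diff)  # [(1, 2), (1, 3)]
```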

Identify duplicate values in a list in Python

Is it possible to get which values are duplicates in a list using python?
I have a list of items:
mylist = [20, 30, 25, 20]
I know the best way of removing the duplicates is set(mylist), but is it possible to know which values are duplicated? As you can see, in this list the duplicates are the first and last values, at indices [0, 3].
Is it possible to get this result or something similar in python? I'm trying to avoid making a ridiculously big if elif conditional statement.
The approaches below run in O(n). They take a little more code than using mylist.count(), but they are much more efficient as mylist gets longer, because every count() call scans the whole list.
If you just want to know the duplicates, use collections.Counter
from collections import Counter
mylist = [20, 30, 25, 20]
[k for k,v in Counter(mylist).items() if v>1]
If you need to know the indices,
from collections import defaultdict
D = defaultdict(list)
for i,item in enumerate(mylist):
    D[item].append(i)
D = {k:v for k,v in D.items() if len(v)>1}
Here's a list comprehension that does what you want. As @Codemonkey says, the list starts at index 0, so the indices of the duplicates are 0 and 3.
>>> [i for i, x in enumerate(mylist) if mylist.count(x) > 1]
[0, 3]
You can use a list comprehension over set(my_list) so that each distinct value is counted only once:
my_list = [3, 5, 2, 1, 4, 4, 1]
opt = [item for item in set(my_list) if my_list.count(item) > 1]
The following list comprehension yields the duplicate values (each one repeated as many times as it occurs):
[x for x in mylist if mylist.count(x) >= 2]
The simplest way, without any intermediate list, uses list.index(). This lists each value once, in order of first occurrence:
z = ['a', 'b', 'a', 'c', 'b', 'a', ]
[z[i] for i in range(len(z)) if i == z.index(z[i])]
>>>['a', 'b', 'c']
You can also list the duplicates themselves (the result may contain duplicates again, as in this example):
[z[i] for i in range(len(z)) if not i == z.index(z[i])]
>>>['a', 'b', 'a']
or their indices:
[i for i in range(len(z)) if not i == z.index(z[i])]
>>>[2, 4, 5]
or the duplicates as a list of 2-tuples (index, index of the first occurrence), which answers the original question:
[(i,z.index(z[i])) for i in range(len(z)) if not i == z.index(z[i])]
>>>[(2, 0), (4, 1), (5, 0)]
or this together with the item itself:
[(i,z.index(z[i]),z[i]) for i in range(len(z)) if not i == z.index(z[i])]
>>>[(2, 0, 'a'), (4, 1, 'b'), (5, 0, 'a')]
or any other combination of elements and indices....
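Each z.index() call scans the list from the start, so these comprehensions are O(n^2). A sketch of a single-pass alternative that records the first index of each value in a dict; the names first_seen and dups are illustrative.

```python
z = ['a', 'b', 'a', 'c', 'b', 'a']

first_seen = {}  # value -> index of its first occurrence
dups = []        # (index, first index, item) for each repeat
for i, item in enumerate(z):
    if item in first_seen:
        dups.append((i, first_seen[item], item))
    else:
        first_seen[item] = i
print(dups)  # [(2, 0, 'a'), (4, 1, 'b'), (5, 0, 'a')]
```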
I used the code below to find duplicate values in a list:
1) Create a set from the list.
2) Iterate through the set, counting each value in the original list.
glist=[1, 2, 3, "one", 5, 6, 1, "one"]
x=set(glist)
dup=[]
for c in x:  # set order is arbitrary, so the output order may vary
    if(glist.count(c)>1):
        dup.append(c)
print(dup)
OUTPUT
[1, 'one']
To also get all the indices of each duplicated element:
glist=[1, 2, 3, "one", 5, 6, 1, "one"]
x=set(glist)
dup=[]
for c in x:
    if(glist.count(c)>1):
        indices = [i for i, v in enumerate(glist) if v == c]
        dup.append((c,indices))
print(dup)
OUTPUT
[(1, [0, 6]), ('one', [3, 7])]
Hope this helps someone
That's the simplest way I can think of to find duplicates in a list:
my_list = [3, 5, 2, 1, 4, 4, 1]
my_list.sort()  # equal values become adjacent
for i in range(0,len(my_list)-1):
    if my_list[i] == my_list[i+1]:
        print(str(my_list[i]) + ' is a duplicate')
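If a value occurs three or more times, the sorted loop above reports it more than once. A sketch using itertools.groupby on a sorted copy, which reports each duplicated value exactly once and leaves the original list untouched:

```python
from itertools import groupby

my_list = [3, 5, 2, 1, 4, 4, 1]

# After sorting, equal values are adjacent, so groupby yields one
# group per distinct value; keep values whose group has 2+ items
duplicates = [value for value, group in groupby(sorted(my_list))
              if len(list(group)) > 1]
print(duplicates)  # [1, 4]
```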
The following code prints each duplicated item together with the index of its first occurrence:
for i in set(mylist):
    if mylist.count(i) > 1:
        print(i, mylist.index(i))
You could sort the list first (note that this loses the original indices):
mylist.sort()
After this, iterate through it like this:
doubles = []
for i, elem in enumerate(mylist):
    if i != 0:
        if elem == old:
            doubles.append(elem)
            old = None  # reset so the next element starts a new pair
            continue
    old = elem
You can print the duplicate and unique values with the following logic, using plain lists:
def dup(x):
    duplicate = []
    unique = []
    for i in x:
        if i in unique:
            duplicate.append(i)
        else:
            unique.append(i)
    print("Duplicate values: ",duplicate)
    print("Unique Values: ",unique)
list1 = [1, 2, 1, 3, 2, 5]
dup(list1)
mylist = [20, 30, 25, 20]
kl = {i: mylist.count(i) for i in mylist if mylist.count(i) > 1 }
print(kl)
It looks like you want the indices of the duplicates. Here is some short code that will find those in O(n) time, without using any packages:
dups = {}
[dups.setdefault(v, []).append(i) for i, v in enumerate(mylist)]
dups = {k: v for k, v in dups.items() if len(v) > 1}
# dups now has keys for all the duplicate values
# and a list of matching indices for each
# The second line produces an unused list.
# It could be replaced with this:
for i, v in enumerate(mylist):
    dups.setdefault(v, []).append(i)
m = len(mylist)
for index, value in enumerate(mylist):
    for i in range(index + 1, m):  # only look ahead, so each pair is reported once
        if mylist[i] == mylist[index]:
            print("Location %d and location %d has same list-entry: %r" % (index, i, value))
This is still O(n^2), so it is only practical for short lists.
def checkduplicate(lists):
    a = []
    for i in lists:
        if i in a:
            return i  # first value seen a second time
        a.append(i)
    return None  # no duplicates found
print(checkduplicate([1,9,78,989,2,2,3,6,8]))

Find an element in a list of tuples

I have a list 'a'
a= [(1,2),(1,4),(3,5),(5,7)]
I need to find all the tuples for a particular number. Say, for 1 it would be:
result = [(1,2),(1,4)]
How do I do that?
If you just want the first number to match you can do it like this:
[item for item in a if item[0] == 1]
If you are just searching for tuples with 1 in them:
[item for item in a if 1 in item]
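There is also a clever way to do this for any list of tuples where each tuple has two elements: convert the list into a single dictionary. Note that dict() keeps only the last tuple for each key, so this only helps when the first elements are unique (for the list a above, dict(a)[1] would be 4 and (1, 2) would be lost).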
For example,
test = [("hi", 1), ("there", 2)]
test = dict(test)
print(test["hi"]) # prints 1
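To keep every tuple for a given first element rather than only the last one, a sketch that groups the tuples with collections.defaultdict; the name by_first is illustrative, and lookups are assumed to be by the first element.

```python
from collections import defaultdict

a = [(1, 2), (1, 4), (3, 5), (5, 7)]

# Map each first element to the list of tuples that start with it
by_first = defaultdict(list)
for tup in a:
    by_first[tup[0]].append(tup)

print(by_first[1])  # [(1, 2), (1, 4)]
```

Indexing a defaultdict with a missing key inserts an empty list; use by_first.get(key, []) if that side effect is unwanted.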
Read up on List Comprehensions
[ (x,y) for x, y in a if x == 1 ]
Also read up on generator functions and the yield statement.
def filter_value( someList, value ):
    for x, y in someList:
        if x == value :
            yield x,y
result= list( filter_value( a, 1 ) )
[tup for tup in a if tup[0] == 1]
for item in a:
    if 1 in item:
        print(item)
The filter function can also provide an interesting solution:
result = list(filter(lambda x: x.count(1) > 0, a))
which searches the tuples in the list a for any occurrences of 1. If the search is limited to the first element, the solution can be modified into:
result = list(filter(lambda x: x[0] == 1, a))
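Or use itertools.takewhile (the example below adds an extra tuple to show that it stops at the first non-matching one). This only works when the matching tuples come first: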
>>> a= [(1,2),(1,4),(3,5),(5,7),(0,2)]
>>> import itertools
>>> list(itertools.takewhile(lambda x: x[0]==1,a))
[(1, 2), (1, 4)]
>>>
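If the list is unsorted, sort the matching tuples to the front first (False sorts before True, so the key is x[0] != 1):
>>> a= [(1,2),(3,5),(1,4),(5,7)]
>>> import itertools
>>> list(itertools.takewhile(lambda x: x[0]==1,sorted(a,key=lambda x: x[0]!=1)))
[(1, 2), (1, 4)]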
>>>
Using filter function:
>>> def get_values(iterables, key_to_find):
...     return list(filter(lambda x: key_to_find in x, iterables))
...
>>> a = [(1,2),(1,4),(3,5),(5,7)]
>>> get_values(a, 1)
[(1, 2), (1, 4)]
>>> [i for i in a if 1 in i]
[(1, 2), (1, 4)]
If you want to find the tuples that contain a given number at any position, you can use:
a= [(1,2),(1,4),(3,5),(5,7)]
i=1
result=[]
for j in a:
    if i in j:
        result.append(j)
print(result)
Use if i == j[0] (or if i == j[index]) instead if the number must be at a particular position.
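Both variants can be folded into one small helper. This is a sketch; find_tuples and its index parameter are hypothetical names, and the positional branch assumes every tuple has an element at that position.

```python
def find_tuples(pairs, value, index=None):
    # index=None: match value anywhere in the tuple
    if index is None:
        return [t for t in pairs if value in t]
    # otherwise: match value only at that position
    # (assumes every tuple is long enough for t[index])
    return [t for t in pairs if t[index] == value]

a = [(1, 2), (1, 4), (3, 5), (5, 7)]
print(find_tuples(a, 1))           # [(1, 2), (1, 4)]
print(find_tuples(a, 5))           # [(3, 5), (5, 7)]
print(find_tuples(a, 5, index=0))  # [(5, 7)]
```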
