The function should return a dictionary in the following format:
key_count([1, 3, 2, 1, 5, 3, 5, 1, 4]) ⇒ {
1: 3,
2: 1,
3: 2,
4: 1,
5: 2,
}
I know the fastest way to do it is the following:
import collections
def key_count(l):
    return collections.Counter(l)
However, I would like to do it without importing collections.Counter.
So far I have:
x = []
def key_count(l):
    for i in l:
        if i not in x:
            x.append(i)
    count = []
    for i in l:
        if i == i:
I approached the problem by extracting the two sides (keys and values) of the dictionary into separate lists and then using zip to create the dictionary. As you can see, I was able to extract the keys of the eventual dictionary, but I cannot figure out how to collect the number of occurrences of each number from the original list in a new list. I wanted to create an empty list, count, that would eventually hold how many times each number in the original list appeared. Any tips? I would appreciate it if you did not give away the full answer, as I am trying to solve this myself. Thanks in advance!
Separating the keys and values is a lot of effort when you could just build the dict directly. Here's the algorithm. I'll leave the implementation up to you, though it sort of implements itself.
Make an empty dict
Iterate through the list
If the element is not in the dict, set the value to 1. Otherwise, add to the existing value.
See the implementation here:
https://stackoverflow.com/a/8041395/4518341
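The three-step algorithm above can be sketched as follows. This is only one possible reading of it, written for Python 3; the parameter name `items` and the local name `counts` are placeholders that do not appear in the question.

```python
def key_count(items):
    """Count occurrences of each element using a plain dict."""
    counts = {}                      # step 1: make an empty dict
    for item in items:               # step 2: iterate through the list
        if item not in counts:       # step 3: first sighting starts at 1
            counts[item] = 1
        else:                        # otherwise add to the existing value
            counts[item] += 1
    return counts


print(key_count([1, 3, 2, 1, 5, 3, 5, 1, 4]))
```

The keys come out in order of first appearance rather than sorted; the counts are the same as in the format the question asks for.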
Classic reduce problem. Using a loop:
a = [1, 3, 2, 1, 5, 3, 5, 1, 4]
m = {}
for n in a:
    if n in m:
        m[n] += 1
    else:
        m[n] = 1
print(m)
Or explicit reduce:
from functools import reduce
a = [1, 3, 2, 1, 5, 3, 5, 1, 4]
def f(m, n):
    if n in m:
        m[n] += 1
    else:
        m[n] = 1
    return m
m2 = reduce(f, a, {})
print(m2)
Use a dictionary to pair keys and values, and use your list x to track the different items found.
import collections
def keycount(l):
    return collections.Counter(l)
key_count = [1, 3, 2, 1, 5, 3, 5, 1, 4]
x = []
dictionary = {}
def Collection_count(l):
    for i in l:
        if i not in x:
            x.append(i)
            dictionary[i] = 1
        else:
            dictionary[i] = dictionary[i] + 1
Collection_count(key_count)
# print the counts in key order
for key, value in sorted(dictionary.items()):
    print(key, value)
I have a list-like Python object of positive integers and I want to find which positions in that list hold repeated values. For example:
If the input is [0,1,1], the function should return [[1, 2]] because the value 1, which is the element at positions 1 and 2 of the input array, appears twice. Similarly:
[0,13,13] should return [[1, 2]]
[0,1,2,1,3,4,2,2] should return [[1, 3], [2, 6, 7]] because 1 appears twice, at positions [1, 3] of the input array and 2 appears 3 times at positions [2, 6, 7]
[1, 2, 3] should return an empty array []
What I have written is this:
import numpy as np
def get_locations(labels):
    out = []
    label_set = set(labels)
    for label in list(label_set):
        temp = [i for i, j in enumerate(labels) if j == label]
        if len(temp) > 1:
            out.append(np.array(temp))
    return np.array(out)
While it works fine for small input arrays, it becomes too slow as the size grows. For instance, the code below, on my PC, goes from 0.14 s when n = 1000 to 12 s when n = 10000:
from timeit import default_timer as timer
start = timer()
n = 10000
a = np.arange(n)
b = np.append(a, a[-1]) # append the last element to the end
out = get_locations(b)
end = timer()
print(out)
print(end - start) # Time in seconds
How can I speed this up, please? Any ideas are highly appreciated.
Your nested loop results in O(n^2) time complexity. You can instead build a dict of lists that maps each label to its indices, and extract only the sub-lists whose length is greater than 1, which reduces the time complexity to O(n):
def get_locations(labels):
    positions = {}
    for index, label in enumerate(labels):
        positions.setdefault(label, []).append(index)
    return [indices for indices in positions.values() if len(indices) > 1]
so that get_locations([0, 1, 2, 1, 3, 4, 2, 2]) returns:
[[1, 3], [2, 6, 7]]
Your code is slow because of the nested for-loop. You can solve this in a more efficient way by using another data structure:
from collections import defaultdict
mylist = [0,1,2,1,3,4,2,2]
output = defaultdict(list)
# Loop once over mylist, store the indices of all unique elements
for i, el in enumerate(mylist):
    output[el].append(i)
# Filter out elements that occur only once
output = {k:v for k, v in output.items() if len(v) > 1}
This produces the following output for the example list mylist:
{1: [1, 3], 2: [2, 6, 7]}
You can turn this result into the desired format:
list(output.values())
> [[1, 3], [2, 6, 7]]
Note, however, that this relies on the dictionary preserving insertion order, which the language guarantees only as of Python 3.7 (in CPython 3.6 it was an implementation detail).
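If the order of the groups matters and you would rather not depend on dict ordering, one option is to sort the groups by their first index. A minimal sketch, assuming Python 3; the function name `repeated_positions` is made up for illustration and is not part of the answer above.

```python
from collections import defaultdict


def repeated_positions(labels):
    """Return index lists for repeated labels, ordered by first occurrence."""
    groups = defaultdict(list)
    for index, label in enumerate(labels):
        groups[label].append(index)
    repeated = [indices for indices in groups.values() if len(indices) > 1]
    # Sorting by the first index makes the order explicit instead of
    # depending on how the dict happens to iterate.
    return sorted(repeated, key=lambda indices: indices[0])


print(repeated_positions([0, 1, 2, 1, 3, 4, 2, 2]))  # [[1, 3], [2, 6, 7]]
```

The extra sort costs O(k log k) for k repeated labels, so the overall approach stays close to linear when few labels repeat.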
Here's some code I implemented. It runs in linear time:
l = [0,1,2,1,3,4,2,2]
dict1 = {}
for j, i in enumerate(l):    # O(n)
    temp = dict1.get(i)      # O(1) in most cases; None if i has not been seen yet
    if not temp:
        dict1[i] = [j]
    else:
        dict1[i].append(j)   # O(1)
print([item for item in dict1.values() if len(item) > 1])    # O(n)
Output:
[[1, 3], [2, 6, 7]]
This is essentially a time-complexity issue. Your algorithm has nested for loops that iterate through the list twice, so the time complexity is of the order of n^2, where n is the size of the list. So when you multiply the size of the list by 10 (from 1,000 to 10,000), you see an approximate time increase of 10^2 = 100. This is why it goes from 0.14 s to 12 s.
Here is a simple solution with no extra libraries required:
def get_locations(labels):
    locations = {}
    for index, label in enumerate(labels):
        if label in locations:
            locations[label].append(index)
        else:
            locations[label] = [index]
    return [locations[i] for i in locations if len(locations[i]) > 1]
Since the for loops are not nested, the number of steps is approximately 2n, i.e. O(n), so you should see about a 2-times increase in time whenever the problem size is doubled.
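One way to see the difference without relying on wall-clock timings is to count the basic steps each approach performs. The sketch below is illustrative only: `quadratic_steps` and `linear_steps` are hypothetical helpers, not part of either solution, and step counts are only a rough proxy for running time.

```python
def quadratic_steps(labels):
    """Count comparisons made when the whole list is rescanned per unique label."""
    steps = 0
    for label in set(labels):
        for other in labels:
            steps += 1          # one `other == label` comparison
    return steps


def linear_steps(labels):
    """Count dict updates made by the single-pass approach."""
    steps = 0
    locations = {}
    for index, label in enumerate(labels):
        locations.setdefault(label, []).append(index)
        steps += 1
    return steps


for n in (100, 200, 1000):
    data = list(range(n))       # all labels distinct: worst case for the rescan
    print(n, quadratic_steps(data), linear_steps(data))
```

With all labels distinct, the first count grows with n^2 (multiplying n by 10 multiplies it by 100), while the second grows with n.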
You can try using the Counter class from the collections module:
from collections import Counter
list1 = [1,1,2,3,4,4,4]
Counter(list1)
You will get output similar to this:
Counter({4: 3, 1: 2, 2: 1, 3: 1})
Whenever I code on online platforms and somehow I have to compare the elements of a list to one another, I use the following code which according to me is the most efficient possible. This is the last code which I was practicing. It was to find the maximum index between 2 same elements.
max=0
for i in range(len(mylist)):
    if max==(len(mylist)-1):
        break
    for j in range(i + 1, len(mylist)):
        if mylist[i] == mylist[j]:
            # keep the largest distance seen so far
            if max<(abs(i-j)):
                max=abs(i-j)
It runs most of the test cases, but sometimes it shows "time limit exceeded." I know it is related to the constraints and time complexity but I still can't find a better way. If anyone could help me, that would be great.
It's usually faster to lean on Python's C-implemented built-ins than on explicit nested loops. Also, don't name variables after Python built-ins such as list.
x = [item for i, item in enumerate(mylist) if item in mylist[i+1:]]
# do something with list of values
You could group by equal elements and then find the difference in-group, and keep the maximum:
lst = [1, 3, 5, 3, 7, 8, 9, 1]
groups = {}
for i, v in enumerate(lst):
    groups.setdefault(v, []).append(i)
result = max(max(group) - min(group) for group in groups.values())
print(result)
Output
7
The complexity of this approach is O(n).
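A variation on the same idea, sketched here under the assumption that only the largest gap is needed: because indices only increase, it is enough to remember the first index of each value instead of the full list of positions. The names `max_index_gap` and `first_seen` are made up for this example.

```python
def max_index_gap(values):
    """Largest distance between two equal elements, or 0 if none repeat."""
    first_seen = {}   # value -> index of its first occurrence
    best = 0
    for index, value in enumerate(values):
        if value in first_seen:
            best = max(best, index - first_seen[value])
        else:
            first_seen[value] = index
    return best


print(max_index_gap([1, 3, 5, 3, 7, 8, 9, 1]))  # 7
```

This is still a single O(n) pass, and it also returns 0 for an empty list rather than raising an error.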
def get_longest_distance_between_same_elements_in_list(mylist):
    positions = dict()
    longest_distance = 0
    if len(mylist) < 1:
        return longest_distance
    for index in range(0, len(mylist)):
        if mylist[index] in positions:
            positions[mylist[index]].append(index)
        else:
            positions[mylist[index]] = [index]
    for key, value in positions.items():
        if len(value) > 1 and longest_distance < value[-1] - value[0]:
            longest_distance = value[-1] - value[0]
    return longest_distance
l1 = [1, 3, 5, 3, 7, 8, 9, 1]
l2 = [9]
l3 = []
l4 = [4, 4, 4, 4, 4]
l5 = [10, 10, 3, 4, 5, 4, 10, 56, 4]
print(get_longest_distance_between_same_elements_in_list(l1))
print(get_longest_distance_between_same_elements_in_list(l2))
print(get_longest_distance_between_same_elements_in_list(l3))
print(get_longest_distance_between_same_elements_in_list(l4))
print(get_longest_distance_between_same_elements_in_list(l5))
Output -
7
0
0
4
6
Time Complexity : O(n)
I have a list with duplicate elements:
list_a=[1,2,3,5,6,7,5,2]
tmp=[]
for i in list_a:
    if tmp.__contains__(i):
        print i
    else:
        tmp.append(i)
I used the code above to find the duplicate elements in list_a. I don't want to remove the elements from the list.
But I want to use nested for loops here.
In C/C++ we would normally write something like this, I guess:
for (int i=0;i<=list_a.length;i++)
    for (int j=i+1;j<=list_a.length;j++)
        if (list_a[i]==list_a[j])
            print list_a[i]
How do we write this in Python?
for i in list_a:
    for j in list_a[1:]:
        ....
I tried the code above, but it gives the wrong result. I don't know how to advance j.
Just for information: in Python 2.7+, we can use Counter.
>>> import collections
>>> x=[1, 2, 3, 5, 6, 7, 5, 2]
>>> x
[1, 2, 3, 5, 6, 7, 5, 2]
>>> y=collections.Counter(x)
>>> y
Counter({2: 2, 5: 2, 1: 1, 3: 1, 6: 1, 7: 1})
Unique List
>>> list(y)
[1, 2, 3, 5, 6, 7]
Items found more than 1 time
>>> [i for i in y if y[i]>1]
[2, 5]
Items found only one time
>>> [i for i in y if y[i]==1]
[1, 3, 6, 7]
Use the in operator instead of calling __contains__ directly.
What you have almost works (but is O(n**2)):
for i in xrange(len(list_a)):
    for j in xrange(i + 1, len(list_a)):
        if list_a[i] == list_a[j]:
            print "duplicate:", list_a[i]
But it's far easier to use a set (roughly O(n) due to the hash table):
seen = set()
for n in list_a:
    if n in seen:
        print "duplicate:", n
    else:
        seen.add(n)
Or a dict, if you want to track locations of duplicates (also O(n)):
import collections
items = collections.defaultdict(list)
for i, item in enumerate(list_a):
    items[item].append(i)
for item, locs in items.iteritems():
    if len(locs) > 1:
        print "duplicates of", item, "at", locs
Or even just detect a duplicate somewhere (also O(n)):
if len(set(list_a)) != len(list_a):
    print "duplicate"
You could always use a list comprehension:
dups = [x for x in list_a if list_a.count(x) > 1]
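Two things are worth knowing about this one-liner: `list.count` rescans the list for every element, so the whole expression is O(n^2), and every occurrence of a repeated value passes the filter. The Python 3 sketch below shows the second point and one possible way to collapse the result; the name `unique_dups` is hypothetical.

```python
list_a = [1, 2, 3, 5, 6, 7, 5, 2]

# Every occurrence of a repeated value passes the filter,
# so each duplicate shows up as many times as it occurs.
dups = [x for x in list_a if list_a.count(x) > 1]
print(dups)          # [2, 5, 5, 2]

# dict.fromkeys keeps the first occurrence of each key, in order (Python 3.7+).
unique_dups = list(dict.fromkeys(dups))
print(unique_dups)   # [2, 5]
```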
Before Python 2.3, use dict():
>>> lst = [1, 2, 3, 5, 6, 7, 5, 2]
>>> stats = {}
>>> for x in lst:  # count occurrences of each item
...     stats[x] = stats.get(x, 0) + 1
...
>>> print stats
{1: 1, 2: 2, 3: 1, 5: 2, 6: 1, 7: 1}
>>> # keep the items appearing more than once
>>> duplicates = [dup for (dup, i) in stats.items() if i > 1]
>>> print duplicates
So a function :
def getDuplicates(iterable):
    """
    Take an iterable and return a generator yielding its duplicate items.
    Items must be hashable.
    e.g :
    >>> sorted(list(getDuplicates([1, 2, 3, 5, 6, 7, 5, 2])))
    [2, 5]
    """
    stats = {}
    for x in iterable :
        stats[x] = stats.get(x, 0) + 1
    return (dup for (dup, i) in stats.items() if i > 1)
Python 2.3 introduced the sets module, and set() became a built-in in Python 2.4:
def getDuplicates(iterable):
    """
    Take an iterable and return a generator yielding its duplicate items.
    Items must be hashable.
    e.g :
    >>> sorted(list(getDuplicates([1, 2, 3, 5, 6, 7, 5, 2])))
    [2, 5]
    """
    try: # try using built-in set
        found = set()
    except NameError: # fallback on the sets module
        from sets import Set
        found = Set()
    for x in iterable:
        if x in found : # a set cannot contain duplicates
            yield x
        found.add(x) # adding an item that is already present is a no-op
With Python 2.7 and above, the collections module provides Counter, which does the same job as the dict version, so we can make it shorter (and faster, since it's probably C under the hood) than solution 1:
import collections
def getDuplicates(iterable):
    """
    Take an iterable and return a generator yielding its duplicate items.
    Items must be hashable.
    e.g :
    >>> sorted(list(getDuplicates([1, 2, 3, 5, 6, 7, 5, 2])))
    [2, 5]
    """
    return (dup for (dup, i) in collections.Counter(iterable).items() if i > 1)
I'd stick with solution 2.
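One caveat with the set-based generator (solution 2): it yields an item every time it reappears, so an item that occurs three times is yielded twice. If each duplicate should be reported only once, a second set can track what has already been yielded. The following is a Python 3 sketch, not the answer's own code; the names `get_duplicates`, `seen` and `reported` are assumptions made for illustration.

```python
def get_duplicates(iterable):
    """Yield each duplicated item once, the second time it is seen."""
    seen = set()
    reported = set()
    for item in iterable:
        if item in seen and item not in reported:
            reported.add(item)
            yield item
        seen.add(item)


print(list(get_duplicates([1, 2, 1, 3, 1, 2])))  # [1, 2]
```

As with the original, items must be hashable, and the generator stays lazy: it can stop early if the caller only needs the first duplicate.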
You can use this function to find duplicates:
def get_duplicates(arr):
    dup_arr = arr[:]
    for i in set(arr):
        dup_arr.remove(i)
    return list(set(dup_arr))
Examples
print get_duplicates([1,2,3,5,6,7,5,2])
[2, 5]
print get_duplicates([1,2,1,3,4,5,4,4,6,7,8,2])
[1, 2, 4]
If you're looking for one-to-one mapping between your nested loops and Python, this is what you want:
n = len(list_a)
for i in range(n):
    for j in range(i+1, n):
        if list_a[i] == list_a[j]:
            print list_a[i]
The code above is not "Pythonic". I would do it something like this:
seen = set()
for i in list_a:
    if i in seen:
        print i
    else:
        seen.add(i)
Also, don't use __contains__, rather, use in (as above).
The following requires the elements of your list to be hashable (not just implementing __eq__ ).
I find it more pythonic to use a defaultdict (and you have the number of repetitions for free):
import collections
l = [1, 2, 4, 1, 3, 3]
d = collections.defaultdict(int)
for x in l:
    d[x] += 1
print [k for k, v in d.iteritems() if v > 1]
# prints [1, 3]
Using only itertools, and works fine on Python 2.5
from itertools import groupby
list_a = sorted([1, 2, 3, 5, 6, 7, 5, 2])
result = dict([(r, len(list(grp))) for r, grp in groupby(list_a)])
Result:
{1: 1, 2: 2, 3: 1, 5: 2, 6: 1, 7: 1}
It looks like you have a list (list_a) that may include duplicates, which you would rather keep as it is, and you want to build a de-duplicated list tmp from it. In Python 2.7, you can do this in one line (note that a set does not preserve the original order):
tmp = list(set(list_a))
Comparing the lengths of tmp and list_a at this point should clarify if there were indeed duplicate items in list_a. This may help simplify things if you want to go into the loop for additional processing.
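The length comparison can be sketched like this in Python 3. The variable name `has_duplicates` is introduced only for this example.

```python
list_a = [1, 2, 3, 5, 6, 7, 5, 2]
tmp = list(set(list_a))            # de-duplicated copy; order is not guaranteed

# If de-duplication dropped anything, list_a contained repeated items.
has_duplicates = len(tmp) != len(list_a)
print(has_duplicates)              # True
```

This only tells you that duplicates exist, not which items they are, so it works best as a cheap check before running a more detailed loop.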
You could just "translate" it line by line.
C++
for (int i=0;i<=list_a.length;i++)
    for (int j=i+1;j<=list_a.length;j++)
        if (list_a[i]==list_a[j])
            print list_a[i]
Python
for i in range(0, len(list_a)):
    for j in range(i + 1, len(list_a)):
        if list_a[i] == list_a[j]:
            print list_a[i]
C++ for loop:
for(int x = start; x < end; ++x)
Python equivalent:
for x in range(start, end):
Just quick and dirty:
list_a=[1,2,3,5,6,7,5,2]
holding_list=[]
for x in list_a:
    if x in holding_list:
        pass
    else:
        holding_list.append(x)
print holding_list
Output: [1, 2, 3, 5, 6, 7]
Using numpy:
import numpy as np
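count,value = np.histogram(list_a,bins=np.hstack((np.unique(list_a),np.inf)))
# value holds the bin edges (one more entry than count), so drop the trailing inf before masking
print 'duplicate value(s) in list_a: ' + ', '.join([str(v) for v in value[:-1][count>1]])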
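In Python 3, if you have two lists and want to remove from the first the items that also appear in the second:
def removedup(List1,List2):
    List1_copy = List1[:]    # iterate over a copy, since List1 is modified in place
    for i in List1_copy:
        if i in List2:
            List1.remove(i)
List1 = [4,5,6,7]
List2 = [6,7,8,9]
removedup(List1,List2)
print (List1)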
Granted, I haven't done tests, but I guess it's going to be hard to beat pandas in speed:
import pandas as pd
pd.DataFrame(list_a, columns=["x"]).groupby('x').size().to_dict()
You can use:
b=['E', 'P', 'P', 'E', 'O', 'E']
c={}
for i in b:
    value=0
    for j in b:
        if(i == j):
            value+=1
    c[i]=value
print(c)
Output:
{'E': 3, 'P': 2, 'O': 1}
Find duplicates in the list using loops, conditional logic, logical operators, and list methods:
some_list = ['a','b','c','d','e','b','n','n','c','c','h',]
duplicates = []
for values in some_list:
    if some_list.count(values) > 1:
        if values not in duplicates:
            duplicates.append(values)
print("Duplicate Values are : ",duplicates)
Finding the number of repeating elements in a list:
myList = [3, 2, 2, 5, 3, 8, 3, 4, 'a', 'a', 'f', 4, 4, 1, 8, 'D']
listCleaned = set(myList)
for s in listCleaned:
    count = 0
    for i in myList:
        if s == i :
            count += 1
    print(f'total {s} => {count}')
Try like this:
list_a=[1,2,3,5,6,7,5,2]
unique_values = []
duplicates = []
for i in list_a:
    if i not in unique_values:
        unique_values.append(i)
    else:
        found = False
        for x in duplicates:
            if x.get("key") == i:
                found = True
                x["occurrence"] += 1  # update the matching entry, not the last one visited
                break
        if not found:
            duplicates.append({
                "key": i,
                "occurrence": 1
            })
some_string= list(input("Enter any string:\n"))
count={}
dup_count={}
for i in some_string:
    if i not in count:
        count[i]=1
    else:
        count[i]+=1
        dup_count[i]=count[i]
print("Duplicates of given string are below:\n",dup_count)
A little bit more Pythonic implementation (not the most, of course), but in the spirit of your C code could be:
for i, elem in enumerate(seq):
    if elem in seq[i+1:]:
        print elem
Edit: yes, it prints an element more than once if it has more than 2 repetitions, but that's what the OP's C pseudocode does too.