So I'm trying to write a function elem_sum(lst1:List[int], lst2:List[int]) that takes 2 inputs as lists and returns the summation element-wise in lst1 and lst2. lst1 and lst2 might have different lengths. Suppose lst1 = [a, b, c] and lst2 = [d, e]. Your function should return [a+d, b+e, c].
Examples
elem_sum([1, 2, 3], [10, 20]) == [11, 22, 3]
elem_sum([1, 2, 3], [10, 20, 30, 40]) == [11, 22, 33, 40]
elem_sum([1], [2, 12]) == [3, 12]
Here's what I have tried, which works...
from itertools import zip_longest
def elem_sum(lst1, lst2):
    return [sum(t) for t in zip_longest(lst1, lst2, fillvalue=0)]
However, I want a solution that works without itertools or any other import. What should I add or change in my code?
You don't need itertools if you take an approach like the one below:
def elem_sum(ls1, ls2):
    iter_list, other_list = (ls1, ls2) if len(ls1) < len(ls2) else (ls2, ls1)
    return [sum(x) for x in zip(iter_list, other_list)] + other_list[len(iter_list):]
One approach is to replace zip_longest with the built-in zip function, but as you know it will just cut off the remaining elements of the longer list. So what you can do is use zip and then append the remaining elements of the longer list:
def elem_sum(lst1, lst2):
    shorter_length = min(len(lst1), len(lst2))
    return [sum(t) for t in zip(lst1, lst2)] + lst1[shorter_length:] + lst2[shorter_length:]
Using the indexing [shorter_length:] on the shorter array will just return an empty array. Thus, we can just concatenate them.
Append zeros to the shorter list:
def elem_sum(l1, l2):
    # Work on copies so the caller's lists are not modified
    l1, l2 = list(l1), list(l2)
    len_1 = len(l1)
    len_2 = len(l2)
    max_len = max(len_1, len_2)
    min_len = min(len_1, len_2)
    # Pad the shorter list with zeros up to the longer length
    for i in range(max_len-min_len):
        if len_2 < len_1:
            l2.append(0)
        if len_1 < len_2:
            l1.append(0)
    l_r = []
    for i in range(max_len):
        l_r.append(l1[i] + l2[i])
    return l_r
print(elem_sum([1, 2, 3], [10, 20]))
print(elem_sum([1, 2, 3], [10, 20, 30, 40]))
print(elem_sum([1], [2, 12]))
Output:
[11, 22, 3]
[11, 22, 33, 40]
[3, 12]
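For comparison, here is a minimal import-free sketch that copies the longer list and adds the shorter one into it. It assumes both inputs are plain lists of numbers; the names longer and shorter are illustrative only.

```python
def elem_sum(lst1, lst2):
    # Pick the longer list as the base; ties keep lst1 as the base
    longer, shorter = (lst1, lst2) if len(lst1) >= len(lst2) else (lst2, lst1)
    result = list(longer)  # copy, so neither input is modified
    for i, value in enumerate(shorter):
        result[i] += value
    return result


print(elem_sum([1, 2, 3], [10, 20]))          # [11, 22, 3]
print(elem_sum([1, 2, 3], [10, 20, 30, 40]))  # [11, 22, 33, 40]
print(elem_sum([1], [2, 12]))                 # [3, 12]
```

Copying the longer list first means no padding and no slicing arithmetic is needed.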
def every_other (l):
    alist = []
    alist = l
    for i in range (len (l)-1):
        print (i)
        if i % 2 == 1:
            del (alist [i])
    print (alist)
every_other ([0, -12, 4, 18, 9, 10, 11, -23])
output is [0, 4, 18, 10, 11]
when it should be: [0, 4, 9, 11]
Thanks in advance.
Deleting items from a list while you loop over its indices shifts the remaining items to the left, so later indices no longer point at the elements you expect. Either create a new list and add to it instead of removing items from the old one, or use Python's slice syntax to do this in one operation:
def every_other(l):
    print(l[::2])
Also, you can use a list comprehension and filter the list based on each element's index:
the_list = [0, -12, 4, 18, 9, 10, 11, -23]
new_list = [i for a, i in enumerate(the_list) if a%2 == 0]
The code below does what you want:
a = [0, -12, 4, 18, 9, 10, 11, -23]
l = []
for i, x in enumerate(a):
    print("index : ", i, "data : ", x, ">>>", i % 2)
    if i % 2 != 1:
        l.append(x)
You are deleting from the same list that you are iterating over. At index 1 you delete -12, so the list becomes [0, 4, 18, 9, 10, 11, -23].
When the loop reaches index 2, the value there is now 18, so it is not deleted, and the same shifting continues until the iteration completes.
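The index shift described above can be reproduced in a few lines. This is only an illustrative sketch; the helper names buggy_every_other and every_other_copy are made up for the example, and the bounds check in the first one is an addition that keeps it from raising IndexError on other inputs.

```python
def buggy_every_other(items):
    """Reproduce the question's loop on a copy, deleting at odd indices."""
    work = list(items)
    for i in range(len(work) - 1):
        # After each deletion the later elements move one slot left,
        # so index i no longer refers to the original i-th element.
        if i % 2 == 1 and i < len(work):
            del work[i]
    return work


def every_other_copy(items):
    """Build a new list instead of deleting from the one being indexed."""
    return [value for index, value in enumerate(items) if index % 2 == 0]


data = [0, -12, 4, 18, 9, 10, 11, -23]
print(buggy_every_other(data))  # [0, 4, 18, 10, 11]
print(every_other_copy(data))   # [0, 4, 9, 11]
```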
This code should work if you only want to take every other element of a list:
def every_other(l):
    return l[0::2]
But if you want to remove every other element then you should do that first before you print or return the list.
def remove_every_other(my_list):
    del my_list[1::2]
    return my_list
I am trying to do the following..
I have a list of n elements. I want to split this list into 32 separate lists which contain more and more elements as we go towards the end of the original list. For example from:
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
I want to get something like this:
b = [[1],[2,3],[4,5,6,7],[8,9,10,11,12]]
I've done the following for a list containing 1024 elements:
for i in range (0, 32):
    c = a[i**2:(i+1)**2]
    b.append(c)
But I am stupidly struggling to find a reliable way to do it for other numbers like 256, 512, 2048 or for another number of lists instead of 32.
Use an iterator, a for loop with enumerate and itertools.islice:
import itertools
def logsplit(lst):
    iterator = iter(lst)
    for n, e in enumerate(iterator):
        yield itertools.chain([e], itertools.islice(iterator, n))
Works with any number of elements. Example:
for r in logsplit(range(50)):
    print(list(r))
Output:
[0]
[1, 2]
[3, 4, 5]
[6, 7, 8, 9]
... some more ...
[36, 37, 38, 39, 40, 41, 42, 43, 44]
[45, 46, 47, 48, 49]
In fact, this is very similar to this problem, except it's using enumerate to get variable chunk sizes.
This is incredibly messy, but gets the job done. Note that you're going to get some empty bins at the beginning if you're logarithmically slicing the list. Your examples give arithmetic index sequences.
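from math import log, exp
def split_list(_list, divs):
    n = float(len(_list))
    log_n = log(n)
    # Logarithmically spaced slice boundaries: 0, then n**(i/divs) for i in 0..divs-1
    indices = [0] + [int(exp(log_n*i/divs)) for i in range(divs)]
    # The final slice runs from the last boundary to the end of the list
    unfiltered = [_list[indices[i]:indices[i+1]] for i in range(divs)] + [_list[indices[divs]:]]
    filtered = [sublist for sublist in unfiltered if sublist]
    # Pad with empty bins at the front to make up for the dropped empty slices
    return [[] for _ in range(divs- len(filtered))] + filtered
print(split_list(list(range(1024)), 32))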
Edit: After looking at the comments, here's an example that may fit what you want:
def split_list(_list):
    copy, output = _list[:], []
    length = 1
    while copy:
        output.append([])
        for _ in range(length):
            if len(copy) > 0:
                output[-1].append(copy.pop(0))
        length *= 2
    return output
print(split_list(list(range(15))))
# [[0], [1, 2], [3, 4, 5, 6], [7, 8, 9, 10, 11, 12, 13, 14]]
Note that this code is not efficient, but it can be used as a template for writing a better algorithm.
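One way to avoid the repeated pop(0) calls is to slice instead. This is a minimal sketch, assuming the doubling chunk sizes of the example above are what is wanted; the name split_doubling is hypothetical.

```python
def split_doubling(items):
    """Split items into chunks of size 1, 2, 4, 8, ... using slices."""
    chunks = []
    start, size = 0, 1
    while start < len(items):
        # A slice past the end is simply truncated, so the last chunk may be short
        chunks.append(items[start:start + size])
        start += size
        size *= 2
    return chunks


print(split_doubling(list(range(1, 13))))
# [[1], [2, 3], [4, 5, 6, 7], [8, 9, 10, 11, 12]]
```

Each element is copied once, so the work grows linearly with the length of the list.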
Something like this should solve the problem.
import numpy as np
for i in range (0, int(np.sqrt(2*len(a)))):
    c = a[i**2:min( (i+1)**2, len(a) )]
    b.append(c)
Not very pythonic but does what you want.
def splitList(a, n, inc):
    """
    a list to split
    n number of sublist
    inc ideal difference between the number of elements in two successive sublists
    """
    zr = len(a) # remaining number of elements to split into sublists
    st = 0 # starting index in the full list of the next sublist
    nr = n # remaining number of sublist to construct
    nc = 1 # number of elements in the next sublist
    #
    b=[]
    while (zr/nr >= nc and nr>1):
        b.append( a[st:st+nc] )
        st, zr, nr, nc = st+nc, zr-nc, nr-1, nc+inc
    #
    nc = int(zr/nr)
    for i in range(nr-1):
        b.append( a[st:st+nc] )
        st = st+nc
    #
    b.append( a[st:max(st+nc,len(a))] )
    return b
# Example of call
# b = splitList(a, 32, 2)
# to split a into 32 sublist, where each list ideally has 2 more element
# than the previous
There's always this.
>>> def log_list(l):
        if len(l) == 0:
            return [] #If the list is empty, return an empty list
        new_l = [] #Initialise new list
        new_l.append([l[0]]) #Add first item to new list inside its own sublist
        for i in l[1:]: #For each other item,
            if len(new_l) == len(new_l[-1]):
                new_l.append([i]) #Create new sublist if previous is full
            else:
                new_l[-1].append(i) #If previous not full, add to it
        return new_l
>>> log_list([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
[[1], [2, 3], [4, 5, 6], [7, 8, 9, 10]]
I am trying to find out whether any of the sublists in list1 share a value, so I need to be told if a number in list1[0] also appears in list1[1] (here 20 is repeated).
The numbers represent coordinates, and the coordinates of the items in list1 must not overlap. If they do, I have a module that reruns and makes a new list1 until no coordinates are the same.
Please help.
list1 = [[7, 20], [20, 31, 32], [66, 67, 68],[7, 8, 9, 2],
         [83, 84, 20, 86, 87], [144, 145, 146, 147, 148, 149]]
x=0
while x != 169:
    if list1.count(x) > 0:
        print ("repeat found")
    else:
        print ("no repeat found")
    x+=1
How about something like:
is_dup = sum(1 for l in list1 if len(set(l)) < len(l))
if is_dup > 0:
    print ("repeat found")
else:
    print ("no repeat found")
Another example using any:
any(len(set(l)) < len(l) for l in list1)
To check whether any item is repeated anywhere across all of the lists, I would chain them and check. Credit to this answer for flattening a list of lists.
flattened = sum(list1, [])
if len(flattened) > len(set(flattened)):
    print ("dups")
else:
    print ("no dups")
I guess the proper way to flatten lists is to use itertools.chain (after import itertools), which can be used like this:
flattened = list(itertools.chain(*list1))
This can replace the sum call I used above if that seems like a hack.
Solution for the updated question
def has_duplicates(iterable):
    """Searching for duplicates in sub iterables.
    This approach can be faster than whole-container solutions
    with flattening if duplicates in large iterables are found
    early.
    """
    seen = set()
    for sub_list in iterable:
        for item in sub_list:
            if item in seen:
                return True
            seen.add(item)
    return False
>>> has_duplicates(list1)
True
>>> has_duplicates([[1, 2], [4, 5]])
False
>>> has_duplicates([[1, 2], [4, 5, 1]])
True
Lookup in a set is fast. Don't use a list for seen if you want it to be fast.
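If it also matters which coordinates clash and where, the same single pass can record positions instead of stopping early. A rough sketch, assuming all items are hashable; find_overlaps is a made-up name, not part of the answer above.

```python
from collections import defaultdict


def find_overlaps(list_of_lists):
    """Map each repeated item to the indices of the sublists containing it."""
    positions = defaultdict(list)
    for index, sub_list in enumerate(list_of_lists):
        for item in sub_list:
            positions[item].append(index)
    # Keep only the items that were seen more than once
    return {item: where for item, where in positions.items() if len(where) > 1}


list1 = [[7, 20], [20, 31, 32], [66, 67, 68], [7, 8, 9, 2],
         [83, 84, 20, 86, 87], [144, 145, 146, 147, 148, 149]]
print(find_overlaps(list1))  # {7: [0, 3], 20: [0, 1, 4]}
```

An item repeated inside a single sublist shows up with the same index listed twice.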
Solution for the original version of the question
If the length of the list is larger than the length of the set made from this list, there must be repeated items, because a set can only have unique elements:
>>> L = [[1, 1, 2], [1, 2, 3], [4, 4, 4]]
>>> [len(item) - len(set(item)) for item in L]
[1, 0, 2]
This is the key here
>>> {1, 2, 3, 1, 2, 1}
set([1, 2, 3])
EDIT
If you are not interested in the number of repeats for each sublist, this would be more efficient because it stops after the first number greater than 0:
>>> any(len(item) - len(set(item)) for item in L)
True
Thanks to @mata for pointing this out.
from collections import Counter
list1=[[7, 20], [20, 31, 32], [66, 67, 68],
       [7, 8, 9, 2], [83, 84, 20, 86, 87],
       [144,144, 145, 146, 147, 148, 149]]
for i,l in enumerate(list1):
    for r in [x for x,y in Counter(l).items() if y > 1]:
        print('at list ', i, ' item ', r, ' repeats')
and this one gives globally repeated values:
expl=sorted([x for l in list1 for x in l])
print([x for x,y in zip(expl, expl[1:]) if x==y])
For Python 2.7+, you should try a Counter:
import collections
items = [1, 2, 3, 2, 1]  # avoid naming a variable list, which shadows the built-in
count = collections.Counter(items)
Then count would be like:
Counter({1: 2, 2: 2, 3:1})
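Applied to the nested list from the question, a Counter over the flattened values might look like the sketch below. The helper name repeated_values is an assumption made for illustration; chain.from_iterable is used here only to flatten the sublists.

```python
from collections import Counter
from itertools import chain


def repeated_values(list_of_lists):
    """Return {value: count} for every value that occurs more than once overall."""
    counts = Counter(chain.from_iterable(list_of_lists))
    return {value: n for value, n in counts.items() if n > 1}


list1 = [[7, 20], [20, 31, 32], [66, 67, 68], [7, 8, 9, 2],
         [83, 84, 20, 86, 87], [144, 145, 146, 147, 148, 149]]
print(repeated_values(list1))  # {7: 2, 20: 3}
```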
How do I remove duplicates from a list, while preserving order?
For example:
>>> x = [1, 1, 2, 'a', 'a', 3]
>>> unique(x)
[1, 2, 'a', 3]
Assume list elements are hashable.
Clarification: The result should keep the first duplicate in the list. For example, [1, 2, 3, 2, 3, 1] becomes [1, 2, 3].
def unique(items):
    found = set()
    keep = []
    for item in items:
        if item not in found:
            found.add(item)
            keep.append(item)
    return keep
print(unique([1, 1, 2, 'a', 'a', 3]))
Using:
lst = [8, 8, 9, 9, 7, 15, 15, 2, 20, 13, 2, 24, 6, 11, 7, 12, 4, 10, 18, 13, 23, 11, 3, 11, 12, 10, 4, 5, 4, 22, 6, 3, 19, 14, 21, 11, 1, 5, 14, 8, 0, 1, 16, 5, 10, 13, 17, 1, 16, 17, 12, 6, 10, 0, 3, 9, 9, 3, 7, 7, 6, 6, 7, 5, 14, 18, 12, 19, 2, 8, 9, 0, 8, 4, 5]
And using the timeit module:
$ python -m timeit -s 'import uniquetest' 'uniquetest.etchasketch(uniquetest.lst)'
And so on for the various other functions (which I named after their posters), I have the following results (on my first generation Intel MacBook Pro):
Allen: 14.6 µs per loop [1]
Terhorst: 26.6 µs per loop
Tarle: 44.7 µs per loop
ctcherry: 44.8 µs per loop
Etchasketch 1 (short): 64.6 µs per loop
Schinckel: 65.0 µs per loop
Etchasketch 2: 71.6 µs per loop
Little: 89.4 µs per loop
Tyler: 179.0 µs per loop
[1] Note that Allen modifies the list in place – I believe this has skewed the time, in that the timeit module runs the code 100000 times and 99999 of them are with the dupe-less list.
Summary: Straight-forward implementation with sets wins over confusing one-liners :-)
Update: on Python3.7+:
>>> list(dict.fromkeys('abracadabra'))
['a', 'b', 'r', 'c', 'd']
old answer:
Here is the fastest solution so far (for the following input):
def del_dups(seq):
    seen = {}
    pos = 0
    for item in seq:
        if item not in seen:
            seen[item] = True
            seq[pos] = item
            pos += 1
    del seq[pos:]
lst = [8, 8, 9, 9, 7, 15, 15, 2, 20, 13, 2, 24, 6, 11, 7, 12, 4, 10, 18,
13, 23, 11, 3, 11, 12, 10, 4, 5, 4, 22, 6, 3, 19, 14, 21, 11, 1,
5, 14, 8, 0, 1, 16, 5, 10, 13, 17, 1, 16, 17, 12, 6, 10, 0, 3, 9,
9, 3, 7, 7, 6, 6, 7, 5, 14, 18, 12, 19, 2, 8, 9, 0, 8, 4, 5]
del_dups(lst)
print(lst)
# -> [8, 9, 7, 15, 2, 20, 13, 24, 6, 11, 12, 4, 10, 18, 23, 3, 5, 22, 19, 14,
# 21, 1, 0, 16, 17]
Dictionary lookup is slightly faster than set lookup in Python 3.
What's going to be fastest depends on what percentage of your list is duplicates. If it's nearly all duplicates, with few unique items, creating a new list will probably be faster. If it's mostly unique items, removing them from the original list (or a copy) will be faster.
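A claim like this is easy to check on your own data with timeit. The harness below is only a sketch: the two functions are the set-based new-list and in-place variants from this thread, the input sizes and value ranges are arbitrary assumptions, and no particular timings are implied.

```python
import random
import timeit


def unique_new_list(items):
    """Return a new list with duplicates removed, keeping first occurrences."""
    seen = set()
    keep = []
    for item in items:
        if item not in seen:
            seen.add(item)
            keep.append(item)
    return keep


def unique_in_place(items):
    """Compact the list in place, keeping first occurrences."""
    seen = set()
    pos = 0
    for item in items:
        if item not in seen:
            seen.add(item)
            items[pos] = item
            pos += 1
    del items[pos:]


random.seed(0)
mostly_dupes = [random.randrange(10) for _ in range(10000)]
mostly_unique = [random.randrange(10**9) for _ in range(10000)]

for name, data in [("mostly duplicates", mostly_dupes), ("mostly unique", mostly_unique)]:
    t_new = timeit.timeit(lambda: unique_new_list(data), number=20)
    # Copy inside the timed call so every run sees the original data
    t_in_place = timeit.timeit(lambda: unique_in_place(list(data)), number=20)
    print(name, "new list:", round(t_new, 4), "in place:", round(t_in_place, 4))
```

Note that the in-place timing includes the cost of the copy, which is needed so that repeated runs do not operate on an already deduplicated list.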
Here's one for modifying the list in place:
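def unique(items):
    seen = set()
    for i in range(len(items)-1, -1, -1):
        it = items[i]
        if it in seen:
            del items[i]
        else:
            seen.add(it)
Iterating backwards over the indices ensures that removing items doesn't affect the iteration. Note that, because it scans from the end, this version keeps the last occurrence of each duplicate rather than the first.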
This is the fastest in-place method I've found (assuming a large proportion of duplicates):
def unique(l):
    s = set(); n = 0
    for x in l:
        if x not in s: s.add(x); l[n] = x; n += 1
    del l[n:]
This is 10% faster than Allen's implementation, on which it is based (timed with timeit.repeat, JIT compiled by psyco). It keeps the first instance of any duplicate.
repton-infinity: I'd be interested if you could confirm my timings.
Obligatory generator-based variation:
def unique(seq):
    seen = set()
    for x in seq:
        if x not in seen:
            seen.add(x)
            yield x
This may be the simplest way:
list(OrderedDict.fromkeys(iterable))
As of Python 3.5, OrderedDict is implemented in C, so this is now the shortest, cleanest, and fastest.
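Spelled out with its import, the one-liner might be wrapped like this. It is a small sketch that assumes the items are hashable; the wrapper name unique is just for illustration.

```python
from collections import OrderedDict


def unique(iterable):
    # fromkeys keeps one key per distinct item, in first-seen order
    return list(OrderedDict.fromkeys(iterable))


print(unique([1, 1, 2, 'a', 'a', 3]))  # [1, 2, 'a', 3]
print(unique([1, 2, 3, 2, 3, 1]))      # [1, 2, 3]
```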
Taken from http://www.peterbe.com/plog/uniqifiers-benchmark
def f5(seq, idfun=None):
    # order preserving
    if idfun is None:
        def idfun(x): return x
    seen = {}
    result = []
    for item in seq:
        marker = idfun(item)
        # in old Python versions:
        # if seen.has_key(marker)
        # but in new ones:
        if marker in seen: continue
        seen[marker] = 1
        result.append(item)
    return result
One-liner:
new_list = reduce(lambda x,y: x+[y][:1-int(y in x)], my_list, [])
A one-liner for this (it builds a new list rather than working in place, and the repeated x.index calls make it quadratic):
>>> x = [1, 1, 2, 'a', 'a', 3]
>>> [ item for pos,item in enumerate(x) if x.index(item)==pos ]
[1, 2, 'a', 3]
This is the fastest one, comparing all the stuff from this lengthy discussion and the other answers given here, referring to this benchmark. It's another 25% faster than the fastest function from the discussion, f8. Thanks to David Kirby for the idea.
def uniquify(seq):
    seen = set()
    seen_add = seen.add
    return [x for x in seq if x not in seen and not seen_add(x)]
Some time comparison:
$ python uniqifiers_benchmark.py
* f8_original 3.76
* uniquify 3.0
* terhorst 5.44
* terhorst_localref 4.08
* del_dups 4.76
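The comprehension in uniquify relies on set.add returning None, so "not seen_add(x)" is always true and serves only to record x as a side effect. The sketch below puts it next to an explicit loop that should behave the same way; uniquify_explicit is a name invented for this comparison.

```python
def uniquify(seq):
    seen = set()
    seen_add = seen.add  # bound method looked up once, outside the loop
    # seen_add(x) returns None, so "not seen_add(x)" is True and x is kept
    return [x for x in seq if x not in seen and not seen_add(x)]


def uniquify_explicit(seq):
    """The same logic written as a plain loop."""
    seen = set()
    result = []
    for x in seq:
        if x not in seen:
            seen.add(x)
            result.append(x)
    return result


sample = [1, 1, 2, 'a', 'a', 3]
print(set().add(1))               # None
print(uniquify(sample))           # [1, 2, 'a', 3]
print(uniquify_explicit(sample))  # [1, 2, 'a', 3]
```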
You can actually do something really cool in Python to solve this. You can create a list comprehension that would reference itself as it is being built. As follows:
# remove duplicates...
def unique(my_list):
    return [x for x in my_list if x not in locals()['_[1]'].__self__]
Edit: I removed the "self", and it works on Mac OS X, Python 2.5.1.
The _[1] is Python's "secret" reference to the new list. The above, of course, is a little messy, but you could adapt it fit your needs as necessary. For example, you can actually write a function that returns a reference to the comprehension; it would look more like:
return [x for x in my_list if x not in this_list()]
Do the duplicates necessarily need to be in the list in the first place? There's no overhead as far as looking the elements up, but there is a little bit more overhead in adding elements (though the overhead should be O(1) ).
>>> x = []
>>> y = set()
>>> def add_to_x(val):
...     if val not in y:
...         x.append(val)
...         y.add(val)
...     print x
...     print y
...
>>> add_to_x(1)
[1]
set([1])
>>> add_to_x(1)
[1]
set([1])
>>> add_to_x(1)
[1]
set([1])
>>>
Remove duplicates and preserve order:
This is a fast 2-liner that leverages built-in functionality of list comprehensions and dicts.
x = [1, 1, 2, 'a', 'a', 3]
tmpUniq = {} # temp variable used below
results = [tmpUniq.setdefault(i,i) for i in x if i not in tmpUniq]
print(results)
[1, 2, 'a', 3]
The dict.setdefault() method returns the value as well as adding it to the temp dict directly in the list comprehension. Relying on built-in functions and the dict's hashing keeps the process efficient.
O(n) if dict is hash, O(nlogn) if dict is tree, and simple, fixed. Thanks to Matthew for the suggestion. Sorry I don't know the underlying types.
def unique(x):
    output = []
    y = {}
    for item in x:
        y[item] = ""
    for item in x:
        if item in y:
            output.append(item)
            del y[item]  # forget the item so later duplicates are skipped
    return output
Dictionary key lookup (has_key in Python 2, the in operator in Python 3) is O(1) on average. Insertion and retrieval from a hash is also O(1). The code loops through n items twice, so it is O(n).
def unique(items):
    s = {}
    output = []
    # First pass: count how often each item occurs
    for x in items:
        count = 1
        if x in s:
            count = s[x] + 1
        s[x] = count
    # Second pass: emit each item once, then zero its count
    for x in items:
        count = s[x]
        if count > 0:
            s[x] = 0
            output.append(x)
    return output
There are some great, efficient solutions here. However, for anyone not concerned with the absolute most efficient O(n) solution, I'd go with the simple one-liner O(n^2) solution (sorted calls the key function once per element, and each xs.index call is O(n)):
def unique(xs):
    return sorted(set(xs), key=lambda x: xs.index(x))
or the more efficient two-liner O(n*log(n)) solution:
def unique(xs):
    positions = dict((e,pos) for pos,e in reversed(list(enumerate(xs))))
    return sorted(set(xs), key=lambda x: positions[x])
Here are two recipes from the itertools documentation (written here with the Python 3 names filterfalse and map; Python 2 used ifilterfalse and imap):
from itertools import filterfalse, groupby
from operator import itemgetter
def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element
def unique_justseen(iterable, key=None):
    "List unique elements, preserving order. Remember only the element just seen."
    # unique_justseen('AAAABBBCCDAABBB') --> A B C D A B
    # unique_justseen('ABBCcAD', str.lower) --> A B C A D
    return map(next, map(itemgetter(1), groupby(iterable, key)))
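To show what the key argument does, here is a simplified, self-contained variant of the first recipe. It is a sketch rather than the documented recipe itself: it drops the filterfalse fast path and handles both cases in one loop.

```python
def unique_everseen(iterable, key=None):
    """Yield items in order, skipping any whose key has been seen before."""
    seen = set()
    for element in iterable:
        k = element if key is None else key(element)
        if k not in seen:
            seen.add(k)
            yield element


print(list(unique_everseen('AAAABBBCCDAABBB')))     # ['A', 'B', 'C', 'D']
print(list(unique_everseen('ABBCcAD', str.lower)))  # ['A', 'B', 'C', 'D']
```

With a key, the first element seen for each key value is the one that is kept, which is why 'C' survives and 'c' does not.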
I have no experience with python, but an algorithm would be to sort the list, then remove duplicates (by comparing to previous items in the list), and finally find the position in the new list by comparing with the old list.
Longer answer: http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52560
>>> def unique(list):
...     y = []
...     for x in list:
...         if x not in y:
...             y.append(x)
...     return y
If you take out the empty list from the call to set() in Terhorst's answer, you get a little speed boost.
Change:
found = set([])
to:
found = set()
However, you don't need the set at all.
def unique(items):
    keep = []
    for item in items:
        if item not in keep:
            keep.append(item)
    return keep
Using timeit I got these results:
with set([]) -- 4.97210427363
with set() -- 4.65712377445
with no set -- 3.44865284975
x = [] # Your list of items that includes duplicates
# Assuming that your list contains only hashable items
# Dict keys are unique, so duplicates collapse on insertion; no membership test is needed
dict_x = {item: item for item in x}
# Average time complexity is O(n): one O(1) insertion per item.
x = list(dict_x) # if you want your output in list format
>>> x=[1,1,2,'a','a',3]
>>> y = [ _x for _x in x if not _x in locals()['_[1]'] ]
>>> y
[1, 2, 'a', 3]
"locals()['_[1]']" is the "secret name" of the list being created.
I don't know if this one is fast or not, but at least it is simple.
Simply convert it to a set first and then back to a list. Note that a set is unordered, so this does not preserve the original order of the items.
def unique(container):
    return list(set(container))
One pass.
a = [1,1,'a','b','c','c']
new_list = []
prev = None
while 1:
    try:
        i = a.pop(0)
        if i != prev:
            new_list.append(i)
        prev = i
    except IndexError:
        break
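Because the loop above compares each item only with the previous one, it removes adjacent duplicates and nothing else. The same effect can be sketched without emptying the input list by using itertools.groupby; the function name drop_adjacent_duplicates is made up for this example.

```python
from itertools import groupby


def drop_adjacent_duplicates(items):
    # groupby yields one (key, group) pair per run of equal neighbours
    return [key for key, _group in groupby(items)]


print(drop_adjacent_duplicates([1, 1, 'a', 'b', 'c', 'c']))  # [1, 'a', 'b', 'c']
print(drop_adjacent_duplicates([1, 2, 1]))                   # [1, 2, 1]
```

The second call shows the limitation: a duplicate that is not next to its twin is kept.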
I haven't done any tests, but one possible algorithm might be to create a second list, and iterate through the first list. If an item is not in the second list, add it to the second list.
x = [1, 1, 2, 'a', 'a', 3]
y = []
for each in x:
    if each not in y:
        y.append(each)
a=[1,2,3,4,5,7,7,8,8,9,9,3,45]
def unique(l):
    ids={}
    for item in l:
        if item not in ids:
            ids[item]=item
    return list(ids.keys())
print(a)
print(unique(a))
Inserting the elements takes theta(n) in total.
Checking whether an element already exists takes constant time.
Testing all the items therefore also takes theta(n),
so this solution takes theta(n) overall.
Bear in mind that a dictionary in Python is implemented as a hash table.