Python - summing numbers that are the same - python

a = [2,2,2,3,4,5,5,6,6,6,7,7]
b = [1,2,3,4,5,6,7,8,9,10,11,12]
I want to sum the numbers in 'b' whose corresponding entries in 'a' are equal, and remove the duplicates from 'a', so the output should look like:
a = [2,3,4,5,6,7]
b = [6,4,5,13,27,23]

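Use a list comprehension, zipping the two lists together (groupby comes from itertools and yields one key per run of equal values in a):
from itertools import groupby
sums = [sum(y for x, y in zip(a, b) if x == i) for i in [j[0] for j in groupby(a)]]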

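You can use izip_longest and collections.defaultdict (this is Python 2; in Python 3 the function is named itertools.zip_longest). This is a fast way to solve the problem because it needs only a single pass over the lists. If b is shorter than a, the default fill value None cannot be added to an int, so pass fillvalue=0 in that case: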
>>> from collections import defaultdict
>>> from itertools import izip_longest
>>> d=defaultdict(int)
>>> for i,j in izip_longest(a,b):
...     d[i]+=j
...
>>> d
defaultdict(<type 'int'>, {2: 6, 3: 4, 4: 5, 5: 13, 6: 27, 7: 23})
>>> d.values()
[6, 4, 5, 13, 27, 23]
But as Padraic Cunningham noted, dicts are not ordered, so the order of d.values() is not guaranteed, although it happens to be correct in this case.
As an alternative but less efficient answer, you can use itertools.groupby:
>>> from itertools import izip_longest,groupby
>>> from operator import itemgetter
>>> [sum([i[1] for i in g]) for _,g in groupby(izip_longest(a,b),key=itemgetter(0))]
[6, 4, 5, 13, 27, 23]
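For Python 3, where izip_longest no longer exists, the same accumulation could be sketched with zip and a plain dict, which keeps insertion order from Python 3.7 on. This is only an illustration: the name sum_by_key is made up here, and it assumes a and b have the same length.

```python
def sum_by_key(keys, values):
    """Sum the values that share a key, keeping keys in first-seen order."""
    totals = {}  # plain dicts preserve insertion order in Python 3.7+
    for k, v in zip(keys, values):
        totals[k] = totals.get(k, 0) + v
    return totals

a = [2, 2, 2, 3, 4, 5, 5, 6, 6, 6, 7, 7]
b = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
totals = sum_by_key(a, b)
print(list(totals))           # [2, 3, 4, 5, 6, 7]
print(list(totals.values()))  # [6, 4, 5, 13, 27, 23]
```

Unlike the groupby version, this merges equal keys even when they are not adjacent in a.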

OrderedDict.fromkeys keeps the unique elements of a in their original order. Counting how often each one occurs then tells islice how many consecutive values of b to sum:
a = [2,2,2,3,4,5,5,6,6,6,7,7]
b = [1,2,3,4,5,6,7,8,9,10,11,12]
from collections import OrderedDict
from itertools import islice
od = OrderedDict.fromkeys(a, 0)
for ele in a:
    od[ele] += 1
it = iter(b)
sums = [sum(islice(it,v)) for v in od.values()]
print(list(od))
print(sums)
[2, 3, 4, 5, 6, 7]
[6, 4, 5, 13, 27, 23]
If you use a set you will have no guaranteed order. It is also unclear whether elements can repeat later in your list a, and what exactly should happen if they do.
To treat each run of equal elements separately, so that later repeats start a new sum:
a = [2, 2, 2, 3, 4, 5, 5, 6, 6, 7, 6, 7, 7]
b = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
from collections import OrderedDict
from itertools import islice, count
it = iter(a)
seq = 1  # length of the current run of equal values
prev = next(it)
cn = count()  # running index, used as the key for each run
od = OrderedDict()
for ele in it:
    if ele == prev:
        seq += 1
    else:
        od[next(cn)] = seq
        seq = 1
    prev = ele
# record the length of the final run
od[next(cn)] = seq
st_a = OrderedDict.fromkeys(a)
it = iter(b)
sums = [sum(islice(it, v)) for v in od.values()]
print(list(st_a))
print(sums)
[2, 3, 4, 5, 6, 7]
[6, 4, 5, 13, 17, 10, 11, 25]
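The run-by-run behaviour described above can also be sketched more compactly with itertools.groupby over the zipped pairs, which keeps each run's key next to its sum. This is an illustrative alternative written for Python 3; the helper name sum_runs is an assumption, not part of the original answer.

```python
from itertools import groupby
from operator import itemgetter

def sum_runs(keys, values):
    """Return (key, total) pairs, one per run of equal adjacent keys."""
    pairs = zip(keys, values)
    return [(k, sum(v for _, v in run))
            for k, run in groupby(pairs, key=itemgetter(0))]

a = [2, 2, 2, 3, 4, 5, 5, 6, 6, 7, 6, 7, 7]
b = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
runs = sum_runs(a, b)
print([k for k, _ in runs])  # [2, 3, 4, 5, 6, 7, 6, 7]
print([s for _, s in runs])  # [6, 4, 5, 13, 17, 10, 11, 25]
```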

Related

Remove specific words within a list with over twice occurrence python

Here is a list A=[1,1,1,2,2,2,4,5,6,6,7,7,7]
Can we have an algorithm that caps every number occurring more than twice at two occurrences in list A?
e.g. list new_A=[1,1,2,2,4,5,6,6,7,7]
I have tried this condition to pick out those numbers:
if Counter(A)[num] > 2:
You can use groupby and islice:
from itertools import groupby, islice
lst = [1,1,1,2,2,2,4,5,6,6,7,7,7]
output = [x for _, g in groupby(lst) for x in islice(g, 2)]
print(output) # [1, 1, 2, 2, 4, 5, 6, 6, 7, 7]
You can do something like this:
from collections import Counter
max_occurrence = 2
list_A = [1,1,1,2,2,2,4,5,6,6,7,7,7]
d = dict(Counter(list_A))
new_list = []
for k, v in d.items():
    # keep at most max_occurrence copies of each value
    if v >= max_occurrence:
        new_list.extend([k] * max_occurrence)
    else:
        new_list.append(k)
print(new_list)
output
[1, 1, 2, 2, 4, 5, 6, 6, 7, 7]
Most compressed way I could think of:
import operator as op
A=[1,1,1,2,2,2,4,5,6,6,7,7,7]
B = []
for elem in sorted(set(A)):  # a bare set has no guaranteed order
    # keep each value as often as it occurs, but at most twice
    B.extend([elem] * min(op.countOf(A, elem), 2))
Output:
[1, 1, 2, 2, 4, 5, 6, 6, 7, 7]
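The dictionary- and set-based answers above regroup the values, and groupby only caps runs of adjacent duplicates. If the original order must be kept even when equal values are scattered through the list, a single pass with a running count is one option. The following is a minimal sketch; the function name cap_occurrences and its limit parameter are invented for illustration.

```python
from collections import Counter

def cap_occurrences(items, limit=2):
    """Keep at most `limit` copies of each value, preserving input order."""
    seen = Counter()  # how many copies of each value were kept so far
    out = []
    for x in items:
        if seen[x] < limit:
            out.append(x)
            seen[x] += 1
    return out

A = [1, 1, 1, 2, 2, 2, 4, 5, 6, 6, 7, 7, 7]
print(cap_occurrences(A))                # [1, 1, 2, 2, 4, 5, 6, 6, 7, 7]
print(cap_occurrences([1, 2, 1, 1, 2]))  # [1, 2, 1, 2]
```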

Get the corresponding sums of parts of a list

I have a list [0, 1, 2, 3, 4, 5, 6] and I sum its parts so that:
l = [0, 1, 2, 3, 4, 5, 6] -> 21
l = [1, 2, 3, 4, 5, 6] -> 21
l = [2, 3, 4, 5, 6] -> 20
l = [3, 4, 5, 6] -> 18
l = [4, 5, 6] -> 15
l = [5, 6] -> 11
l = [6] -> 6
l = [] -> 0
So, I get the corresponding sums of the list's parts: [21, 21, 20, 18, 15, 11, 6, 0]
The code I use is:
[sum(l[i:]) for i in range(len(l) + 1)]
But for lists longer than 100000 elements, the code slows down significantly.
Any idea why and how to optimize it?
I would suggest itertools.accumulate for this (which I recall is faster than np.cumsum), with some list reversing to get your desired output:
>>> from itertools import accumulate
>>> lst = [0, 1, 2, 3, 4, 5, 6]
>>> list(accumulate(reversed(lst)))[::-1]
[21, 21, 20, 18, 15, 11, 6]
(you can trivially add 0 to the end if needed)
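As for why the original comprehension slows down: each sum(l[i:]) copies and adds up to n elements, so the whole expression performs on the order of n²/2 additions, while a running total needs only n. A sketch of the accumulate approach that also yields the trailing 0 is shown below; it relies on the initial keyword of accumulate, which requires Python 3.8 or later, and the name suffix_sums is made up for this example.

```python
from itertools import accumulate

def suffix_sums(lst):
    """Return [sum(lst[i:]) for i in range(len(lst) + 1)] in linear time."""
    # initial=0 puts the sum of the empty tail first (Python 3.8+)
    running = list(accumulate(reversed(lst), initial=0))
    running.reverse()
    return running

lst = [0, 1, 2, 3, 4, 5, 6]
print(suffix_sums(lst))  # [21, 21, 20, 18, 15, 11, 6, 0]
```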
This might help to reduce calculation time for big lists:
import numpy as np
l = [0, 1, 2, 3, 4, 5, 6]
output = list(np.cumsum(l[::-1]))[::-1]+[0]
Output:
[21, 21, 20, 18, 15, 11, 6, 0]
Here is a performance comparison of four different methods, all of which do the same thing:
from timeit import timeit
def sum10(l):
    from itertools import accumulate
    return list(accumulate(reversed(l)))[::-1]+[0]
def sum11(l):
    from itertools import accumulate
    return list(accumulate(l[::-1]))[::-1]+[0]
def sum20(l):
    from numpy import cumsum
    return list(cumsum(l[::-1]))[::-1]+[0]
def sum21(l):
    from numpy import cumsum
    return list(cumsum(list(reversed(l))))[::-1]+[0]
l = list(range(1000000))
iter_0 = timeit(lambda: sum10(l), number=10) #0.14102990700121154
iter_1 = timeit(lambda: sum11(l), number=10) #0.1336850459993002
nump_0 = timeit(lambda: sum20(l), number=10) #0.6019859320003889
nump_1 = timeit(lambda: sum21(l), number=10) #0.3818727100006072
There is no clean way of doing it with list comprehensions as far as I know.
This code will work without any other libraries:
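def cumulative_sum(a):
    total = 0
    for item in a:
        total += item
        yield total
list(cumulative_sum(listname))
From Python 3.8 on, there is a new operator (the assignment expression :=) that might help. It pairs each item with the running total, which must be initialized first:
total = 0
[(x, total := total + x) for x in items]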

Drop values in list except first and last two elements in consecutive integers in python

My list, for example, is
my_list = [1,2,3,4,5, 9,10,11,12,13,14, 20,21,22,23,24,25,26,27]
I would like to keep the first two and the last two elements of each run of consecutive values. So what I need to get is:
output = [1,2,4,5, 9,10,13,14, 20,21,26,27]
How can I simply or efficiently get this result?
Use more_itertools.consecutive_groups
import more_itertools as mit
my_list = [1,2,3,4,5,9,10,11,12,13,14,15]
x = [list(group) for group in mit.consecutive_groups(my_list)]
output = []
for i in x:
    temp = [i[0],i[1],i[-2],i[-1]]
    output.extend(temp)
print(output)
Output:
[1, 2, 4, 5, 9, 10, 14, 15]
Use groupby and itemgetter:
from operator import itemgetter
from itertools import groupby
my_list = [1,2,3,4,5,9,10,11,12,13,14,20,21,22,23,24,25,26,27]
output = []
for k, g in groupby(enumerate(my_list), lambda x: x[0]-x[1]):
    lst = list(map(itemgetter(1), g))
    output.extend([lst[0], lst[1], lst[-2], lst[-1]])
print(output)
# [1, 2, 4, 5, 9, 10, 13, 14, 20, 21, 26, 27]
Using only the standard itertools module, you can do:
from itertools import count, groupby
def remove_middle_of_seq(lst):
    out = []
    index = count()
    for _, sequence in groupby(lst, lambda value: value - next(index)):
        seq = list(sequence)
        out.extend([seq[0], seq[1], seq[-2], seq[-1]])
    return out
my_list = [1,2,3,4,5, 9,10,11,12,13,14, 20,21,22,23,24,25,26,27]
print(remove_middle_of_seq(my_list))
# [1, 2, 4, 5, 9, 10, 13, 14, 20, 21, 26, 27]
In groups of consecutive values, the difference between the values and their index is constant, so groupby can group them using this difference as key.
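The snippets above index seq[1] and seq[-2], which assumes every run has at least four elements: a run of one element raises IndexError, and a run of three repeats its middle element. A hedged variation that slices instead of indexing is sketched below; the name keep_run_edges and the keep parameter are assumptions for illustration, and keep is assumed to be at least 1.

```python
from itertools import count, groupby

def keep_run_edges(lst, keep=2):
    """Keep the first and last `keep` items of each run of consecutive ints."""
    out = []
    index = count()
    # value - position is constant within a run of consecutive integers
    for _, run in groupby(lst, lambda value: value - next(index)):
        seq = list(run)
        head = seq[:keep]
        tail = seq[keep:][-keep:]  # last items that are not already in head
        out.extend(head + tail)
    return out

my_list = [1,2,3,4,5, 9,10,11,12,13,14, 20,21,22,23,24,25,26,27]
print(keep_run_edges(my_list))
# [1, 2, 4, 5, 9, 10, 13, 14, 20, 21, 26, 27]
print(keep_run_edges([1, 2, 3, 7, 10, 11]))
# [1, 2, 3, 7, 10, 11]
```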
There isn't really a function that does this kind of thing in the standard library, so you have to write most of it manually. It's easiest to first group all ascending numbers, and then delete the middle of each group:
import itertools
def group_consecutive(sequence):
    """
    Aggregates consecutive integers into groups.
    >>> group_consecutive([8, 9, 1, 3, 4, 5])
    [[8, 9], [1], [3, 4, 5]]
    """
    result = []
    prev_num = None
    for num in sequence:
        if prev_num is None or num != prev_num + 1:
            group = [num]
            result.append(group)
        else:
            group.append(num)
        prev_num = num
    return result
def drop_consecutive(sequence, keep_left=2, keep_right=2):
    """
    Groups consecutive integers and then keeps only the 2 first and last numbers
    in each group. The result is then flattened.
    >>> drop_consecutive([1, 2, 3, 4, 5, 8, 9])
    [1, 2, 4, 5, 8, 9]
    """
    grouped_seq = group_consecutive(sequence)
    for group in grouped_seq:
        del group[keep_left:-keep_right]
    return list(itertools.chain.from_iterable(grouped_seq))
>>> my_list = [1,2,3,4,5, 9,10,11,12,13,14, 20,21,22,23,24,25,26,27]
>>> drop_consecutive(my_list)
[1, 2, 4, 5, 9, 10, 13, 14, 20, 21, 26, 27]
See also:
itertools.chain and itertools.chain.from_iterable
You can pair adjacent list items by zipping the list with itself with an offset of 1, but pad the shifted list with a non-consecutive value, so that you can iterate through the pairings and determine that there is a separate group when the difference of a pair is not 1:
def consecutive_groups(l):
    o = []
    for a, b in zip([l[0] - 2] + l, l):
        if b - a != 1:
            o.append([])
        o[-1].append(b)
    return [s[:2] + s[-2:] for s in o]
Given your sample input, consecutive_groups(my_list) returns:
[[1, 2, 4, 5], [9, 10, 13, 14], [20, 21, 26, 27]]

Optimize making a flat list from a list of strings each evaluable as a list

For example, how would I optimally merge:
res_str = ['[1,2,3]','[4,5,6]','[7,8,9]','[10,11,12]']
for example: ['[{'a': u'中国', 'b': u'美国', 'c': u'日本', 'd': u'德国', 'e': u'法国'},]','[{'a': u'中国', 'b': u'美国', 'c': u'日本', 'd': u'德国', 'e': u'法国'},]',]
into:
[1,2,3,4,5,6,7,8,9,10,11,12]
I used the following code, but it is not fast enough:
[x for j in res_str for x in eval(j)] takes 0.65s
list(itertools.chain.from_iterable([eval(i) for i in res_str])) takes 0.57s
Is there a better way to write this?
Apart from a generator:
(x for j in res_str for x in eval(j))
Timings of the other suggestions:
sum([eval(i) for i in res_str],[]) takes 3.87s
This way:
import ast
import itertools
l = ['[1,2,3]','[4,5,6]','[7,8,9]','[10,11,12]']
l = list(itertools.chain(*map(ast.literal_eval, l)))
takes 0.95s
With eval instead:
list(itertools.chain(*map(eval, res_str)))
takes 0.58s
This way:
eval('+'.join(res_str)) takes 3.5s
This way:
import ast
import numpy as np
res_str = ['[1,2,3]','[4,5,6]','[7,8,9]','[10,11,12]']
print(list(np.array([ast.literal_eval(i) for i in res_str]).flatten()))
takes 1s
With eval instead:
list(np.array([eval(i) for i in res_str]).flatten())
takes 0.58s
Use ast & itertools
Ex:
import ast
import itertools
l = ['[1,2,3]','[4,5,6]','[7,8,9]','[10,11,12]']
l = list(itertools.chain(*map(ast.literal_eval, l)))
print( l )
Output:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
ast.literal_eval to convert string elements to list objects
itertools.chain to flatten the list.
If you like to do it without eval / ast.literal_eval
>>> list(itertools.chain(*[map(int, w.strip('[]').split(',')) for w in l]))
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
Encountered this problem a while ago. Here's how I did it
List = [[1,2,3],[4,5,6],[7,8,9]]
result = []
for x in range(len(List)):
    for y in range(len(List[x])):
        result.append(List[x][y])
print(result)
Result prints [1,2,3,4,5,6,7,8,9]
May not be as efficient as some other answers, but it works and is simpler
Here is my one-line stylish solution without using itertools and easily readable:
import ast
myList= ['[1,2,3]','[4,5,6]','[7,8,9]','[10,11,12]']
myNewList = [i for sublist in map(ast.literal_eval, myList) for i in sublist]
print(myNewList)
# [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
Here is also a second solution that may be faster:
import ast
myList = ['[1,2,3]','[4,5,6]','[7,8,9]','[10,11,12]']
myNewList = []
for sublist in myList:
    myNewList += ast.literal_eval(sublist)
print(myNewList)
# [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
You can try the below simple approach.
>>> arr
['[1,2,3]', '[4,5,6]', '[7,8,9]', '[10,11,12]']
>>>
>>> '+'.join(arr)
'[1,2,3]+[4,5,6]+[7,8,9]+[10,11,12]'
>>>
>>> eval('+'.join(arr))
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
>>>
Another approach using reduce() and lambda.
>>> from functools import reduce
>>>
>>> arr = ['[1,2,3]','[4,5,6]','[7,8,9]','[10,11,12]']
>>>
>>> arr2 = reduce(lambda list1, list2: list1 + '+' + list2, arr)
>>>
>>> arr2
'[1,2,3]+[4,5,6]+[7,8,9]+[10,11,12]'
>>>
>>> eval(arr2)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
ast.literal_eval+numpy.flatten:
import ast
import numpy as np
res_str = ['[1,2,3]','[4,5,6]','[7,8,9]','[10,11,12]']
print(list(np.array([ast.literal_eval(i) for i in res_str]).flatten()))
and:
import ast
l = []
res_str = ['[1,2,3]','[4,5,6]','[7,8,9]','[10,11,12]']
for i in res_str:
    l.extend(ast.literal_eval(i))
print(l)
Using List Comprehension
import json
string_list = ['[1,2,3]','[4,5,6]','[7,8,9]','[10,11,12]']
output_list = [y for x in string_list for y in json.loads(x)]
print(output_list)
OUTPUT
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
Here the time taken is only to traverse the list and space complexity is the new list.
import json
string_list = ['[1,2,3]','[4,5,6]','[7,8,9]','[10,11,12]']
output_list = []
for str_list in string_list:
    output_list.extend(json.loads(str_list))
print(output_list)
Output
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
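Since every string in the simple example is a valid JSON array, one further variation is to join the strings into a single JSON document so that the parser is called only once, then flatten the result. This is a sketch under that assumption; it would not work for the dict example in the question, whose u'...' literals are not valid JSON, and whether it is actually faster would need to be measured. The name flatten_json_lists is hypothetical.

```python
import json
from itertools import chain

def flatten_json_lists(strings):
    """Parse strings that each hold a JSON array and flatten one level."""
    # Build one document such as "[[1,2,3],[4,5,6]]" and parse it once.
    nested = json.loads("[" + ",".join(strings) + "]")
    return list(chain.from_iterable(nested))

res_str = ['[1,2,3]', '[4,5,6]', '[7,8,9]', '[10,11,12]']
print(flatten_json_lists(res_str))
# [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
```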

Create dictionary where keys are from a list and values are the sum of corresponding elements in another list

I have two lists L1 and L2. Each unique element in L1 is a key which has a value in the second list L2. I want to create a dictionary where the values are the sum of elements in L2 that are associated to the same key in L1.
I did the following but I am not very proud of this code. Is there any simpler pythonic way to do it ?
L = [2, 3, 7, 3, 4, 5, 2, 7, 7, 8, 9, 4] # as L1
W = range(len(L)) # as L2
d = { l:[] for l in L }
for l,w in zip(L,W): d[l].append(w)
d = {l:sum(v) for l,v in d.items()}
EDIT:
Q: How do I know which elements of L2 are associated to a given key element of L1?
A: if they have the same index. For example if the element 7 is repeated 3 times in L1 (e.g. L1[2] == L1[7] == L1[8] == 7), then I want the value of the key 7 to be L2[2]+L2[7]+L2[8]
You can use enumerate() to get each item's index while you loop over the list, and collections.defaultdict(int) to accumulate the values whenever a key repeats (int is the default factory, so a missing key starts at 0). Since W is just range(len(L)), the index i equals the matching value from W:
>>> from collections import defaultdict
>>> d = defaultdict(int)
>>> for i,j in enumerate(L):
...     d[j]+=i
...
>>> d
defaultdict(<type 'int'>, {2: 6, 3: 4, 4: 15, 5: 5, 7: 17, 8: 9, 9: 10})
If you don't need the intermediate dict of lists you can use the collections.Counter:
import collections
L = [2, 3, 7, 3, 4, 5, 2, 7, 7, 8, 9, 4] # as L1
W = range(len(L)) # as L2
d2 = collections.Counter()
for i, value in enumerate(L):
    d2[value] += i
which behaves like a normal dict:
Counter({7: 17, 4: 15, 9: 10, 8: 9, 2: 6, 5: 5, 3: 4})
Hope this may help you.
L = [2, 3, 7, 3, 4, 5, 2, 7, 7, 8, 9, 4] # as L1
dict_a = dict.fromkeys(set(L), 0)
for i, key in enumerate(L):
    # i is the index, which equals the matching element of L2 = range(len(L))
    dict_a[key] += i
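The enumerate-based answers rely on L2 being range(len(L1)), so that the index doubles as the weight. For an arbitrary second list, pairing the two lists with zip is one way to generalize. The sketch below assumes both lists have the same length; the function name sum_weights_by_key and the second weight list W2 are invented for illustration.

```python
from collections import defaultdict

def sum_weights_by_key(keys, weights):
    """Map each distinct key to the sum of the weights at the same positions."""
    totals = defaultdict(int)  # missing keys start at 0
    for key, weight in zip(keys, weights):
        totals[key] += weight
    return dict(totals)

L = [2, 3, 7, 3, 4, 5, 2, 7, 7, 8, 9, 4]   # as L1
W = range(len(L))                           # as L2
print(sum_weights_by_key(L, W))
# {2: 6, 3: 4, 7: 17, 4: 15, 5: 5, 8: 9, 9: 10}

W2 = [1] * len(L)  # hypothetical weights: every position counts once
print(sum_weights_by_key(L, W2))
# {2: 2, 3: 2, 7: 3, 4: 2, 5: 1, 8: 1, 9: 1}
```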
