Sum columns of parts of 2D array in Python

How do you take the sum of the last two columns when the first two columns match?
input:
M = [[1,1,3,5],
[1,1,4,6],
[1,2,3,7],
[1,2,6,6],
[2,1,0,8],
[2,1,3,5],
[2,2,9,6],
[2,2,3,4]]
output:
M = [[1,1,7,11],
[1,2,9,13],
[2,1,3,13],
[2,2,12,10]]
And can you do it with a for loop?

Assuming that the two matching rows always follow each other, you could iterate over M[:-1] and compare each row with the next one:
M = [[1,1,3,5],[1,1,4,6],[1,2,3,7],[1,2,6,6],[2,1,0,8],[2,1,3,5],[2,2,9,6],[2,2,3,4]]
t = []
for i, m in enumerate(M[:-1]):
    if m[0] == M[i+1][0] and m[1] == M[i+1][1]:
        t.append([m[0], m[1], m[2]+M[i+1][2], m[3]+M[i+1][3]])
print(t)
#[[1, 1, 7, 11], [1, 2, 9, 13], [2, 1, 3, 13], [2, 2, 12, 10]]
If the order might be scrambled, I'd use two for loops. The inner loop checks m against every row after it (it doesn't need to check the earlier rows, since they were already checked against it).
t = []  # start again from an empty result list
for i, m in enumerate(M[:-1]):
    for n in M[i+1:]:
        if m[0] == n[0] and m[1] == n[1]:
            t.append([m[0], m[1], m[2]+n[2], m[3]+n[3]])
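Both loops above pair rows two at a time, so they assume each key occurs exactly twice. As a minimal sketch that works for any group size and any row order, a single for loop can accumulate the sums in a dictionary keyed on the first two columns (the names totals and result are illustrative, not from the question):

```python
M = [[1, 1, 3, 5], [1, 1, 4, 6], [1, 2, 3, 7], [1, 2, 6, 6],
     [2, 1, 0, 8], [2, 1, 3, 5], [2, 2, 9, 6], [2, 2, 3, 4]]

totals = {}  # (col0, col1) -> [sum of col2, sum of col3]
for a, b, c, d in M:
    sums = totals.setdefault((a, b), [0, 0])
    sums[0] += c
    sums[1] += d

# Rebuild rows; dicts keep insertion order in Python 3.7+
result = [[a, b, c, d] for (a, b), (c, d) in totals.items()]
print(result)
# [[1, 1, 7, 11], [1, 2, 9, 13], [2, 1, 3, 13], [2, 2, 12, 10]]
```

Each row is visited once, so the work grows linearly with the number of rows.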

We can find the unique tuples in the first two columns and then, for each tuple, sum the last two columns over the rows that match it.
Not sure what the fastest solution is, but this is one option:
M = [[1,1,3,5],[1,1,4,6],[1,2,3,7],[1,2,6,6],[2,1,0,8],[2,1,3,5],[2,2,9,6],[2,2,3,4]]
ans = []
for vals in set((x[0], x[1]) for x in M):
    ans.append([vals[0], vals[1],
                sum(res[2] for res in M if (res[0], res[1]) == vals),
                sum(res[3] for res in M if (res[0], res[1]) == vals)])
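Note that a set has no guaranteed iteration order, so the rows of ans may come out in a different order than the keys appear in M. A small sketch of one way to keep first-seen order, assuming Python 3.7+ where dict preserves insertion order (keys is an illustrative name):

```python
M = [[1, 1, 3, 5], [1, 1, 4, 6], [1, 2, 3, 7], [1, 2, 6, 6],
     [2, 1, 0, 8], [2, 1, 3, 5], [2, 2, 9, 6], [2, 2, 3, 4]]

# dict.fromkeys keeps one entry per key, in order of first appearance
keys = list(dict.fromkeys((row[0], row[1]) for row in M))

ans = [
    [k[0], k[1],
     sum(row[2] for row in M if (row[0], row[1]) == k),
     sum(row[3] for row in M if (row[0], row[1]) == k)]
    for k in keys
]
print(ans)
# [[1, 1, 7, 11], [1, 2, 9, 13], [2, 1, 3, 13], [2, 2, 12, 10]]
```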

A solution with list comprehension and itertools' groupby:
from itertools import groupby
M = [
    [1,1,3,5],
    [1,1,4,6],
    [1,2,3,7],
    [1,2,6,6],
    [2,1,0,8],
    [2,1,3,5],
    [2,2,9,6],
    [2,2,3,4],
]
print([
    [
        key[0],
        key[1],
        sum(x[2] for x in group),
        sum(x[3] for x in group),
    ]
    for key, group in [
        (key, list(group))
        for key, group in groupby(sorted(M), lambda x: (x[0], x[1]))
    ]
])
Result:
[[1, 1, 7, 11], [1, 2, 9, 13], [2, 1, 3, 13], [2, 2, 12, 10]]
With reduce it can be simplified to:
from itertools import groupby
from functools import reduce
M = [
    [1,1,3,5],
    [1,1,4,6],
    [1,2,3,7],
    [1,2,6,6],
    [2,1,0,8],
    [2,1,3,5],
    [2,2,9,6],
    [2,2,3,4],
]
print([
    reduce(
        lambda x, y: [y[0], y[1], y[2] + x[2], y[3] + x[3]],
        group,
        (0, 0, 0, 0),
    )
    for _, group in groupby(sorted(M), lambda x: (x[0], x[1]))
])
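Both versions call sorted(M) before groupby because groupby only merges adjacent items with equal keys. A short illustration with a made-up, unsorted list (rows is hypothetical data, not from the question):

```python
from itertools import groupby

rows = [[1, 1, 3, 5], [2, 2, 9, 6], [1, 1, 4, 6]]  # hypothetical, unsorted


def key(row):
    """Group on the first two columns."""
    return (row[0], row[1])


unsorted_keys = [k for k, _ in groupby(rows, key)]
sorted_keys = [k for k, _ in groupby(sorted(rows), key)]

print(unsorted_keys)  # [(1, 1), (2, 2), (1, 1)] -- (1, 1) shows up twice
print(sorted_keys)    # [(1, 1), (2, 2)]
```

Without the sort, the two (1, 1) rows would end up in separate groups and be summed separately.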

Related

Finding consecutive duplicates and listing their indexes of where they occur in python

I have a list in Python, for example:
mylist = [1,1,1,1,1,1,1,1,1,1,1,
0,0,1,1,1,1,0,0,0,0,0,
1,1,1,1,1,1,1,1,0,0,0,0,0,0]
My goal is to find where there are five or more zeros in a row and then list the indexes where this happens. For example, the output for this list would be:
[17, 21], [30, 35]
Here is what I have tried/seen in other questions asked on here:
import numpy as np
def zero_runs(a):
    # Create an array that is 1 where a is 0, and pad each end with an extra 0.
    iszero = np.concatenate(([0], np.equal(a, 0).view(np.int8), [0]))
    absdiff = np.abs(np.diff(iszero))
    # Runs start and end where absdiff is 1.
    ranges = np.where(absdiff == 1)[0].reshape(-1, 2)
    return ranges
runs = zero_runs(mylist)
This gives output:
[0,10]
[11,12]
...
which basically just lists the index ranges of all the runs. How would I go about reducing this data to what I need?
You could use itertools.groupby, it will identify the contiguous groups in the list:
from itertools import groupby
lst = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
groups = [(k, sum(1 for _ in g)) for k, g in groupby(lst)]
cursor = 0
result = []
for k, l in groups:
    if not k and l >= 5:
        result.append([cursor, cursor + l - 1])
    cursor += l
print(result)
Output
[[17, 21], [30, 35]]
Your current attempt is very close. It returns all of the runs of consecutive zeros in the array, so all you need to add is a check that filters out runs of fewer than 5 consecutive zeros.
def threshold_zero_runs(a, threshold):
    iszero = np.concatenate(([0], np.equal(a, 0).view(np.int8), [0]))
    absdiff = np.abs(np.diff(iszero))
    ranges = np.where(absdiff == 1)[0].reshape(-1, 2)
    # keep only the runs whose length (end - start) reaches the threshold
    m = (np.diff(ranges, 1) >= threshold).ravel()
    return ranges[m]
threshold_zero_runs(mylist, 5)
# each row is [start, end) with an exclusive end index
array([[17, 22],
       [30, 36]], dtype=int64)
Shift the array by one position and compare the shifted version with the original. Where they do not match, you have a transition. You then only need to identify adjacent transitions that are at least 5 positions apart.
Can you take it from there?
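One possible reading of that hint, as a plain-Python sketch rather than the answerer's own code (the function name long_zero_runs and the min_len parameter are assumptions made for illustration):

```python
def long_zero_runs(values, min_len=5):
    """Return [start, end] (inclusive) for each run of zeros of at least min_len."""
    is_zero = [v == 0 for v in values]
    # Pad with False so runs touching either end still produce a transition.
    padded = [False] + is_zero + [False]
    # Compare the sequence with itself shifted by one position.
    changes = [i for i, (prev, cur) in enumerate(zip(padded, padded[1:]))
               if prev != cur]
    # Transitions alternate: start of a run, then one past its end.
    starts, stops = changes[::2], changes[1::2]
    return [[s, e - 1] for s, e in zip(starts, stops) if e - s >= min_len]


mylist = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
          0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0,
          1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
print(long_zero_runs(mylist))
# [[17, 21], [30, 35]]
```

The padding guarantees an even number of transitions, so they can be split into starts and stops by slicing.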
Another way using itertools.groupby and enumerate.
First find the zeros and the indices:
from operator import itemgetter
from itertools import groupby
zerosList = [
    list(map(itemgetter(0), g))
    for i, g in groupby(enumerate(mylist), key=itemgetter(1))
    if not i
]
print(zerosList)
#[[11, 12], [17, 18, 19, 20, 21], [30, 31, 32, 33, 34, 35]]
Now just filter zerosList:
runs = [[x[0], x[-1]] for x in zerosList if len(x) >= 5]
print(runs)
#[[17, 21], [30, 35]]

matching occurrence of opposite number in a list python

I have a list such as
results = [100, 100, -100, 100, -100, -100]
I would like to match each number with the first unmatched occurrence of its opposite, so the first 100 would be matched with the first -100, the second 100 with the second -100, and so on.
I would like to have the positions as output, such as:
[0, 2], [1, 4], [3, 5]
i.e. [0, 2] represents results[0] and results[2], where the first occurrence of 100 is matched with the first occurrence of -100.
Edit: you can assume there will always be the same number of positives and negatives, and that the list will only contain one number and its opposite.
Any help would be appreciated.
For your simple case where the list only contains 2 integers (x and -x), you could simply zip() together the indexes:
indexes = [[], []]
for i, x in enumerate(results):
    indexes[0].append(i) if x > 0 else indexes[1].append(i)
list(zip(*indexes))
Example:
>>> results = [100, 100, -100, 100, -100, -100]
>>> indexes = [[],[]]
>>> for i,x in enumerate(results): indexes[0].append(i) if x > 0 else indexes[1].append(i)
...
>>> list(zip(*indexes))
[(0, 2), (1, 4), (3, 5)]
Note: for small inputs, two separate list comprehensions (e.g. [i for i, x in enumerate(results) if x > 0]) may be faster than appending in a for loop.
IMO, the fastest approach (for large inputs) should be the following one (though, my solution doesn't assume that the input list contains just one value and its opposite, so it can be made even faster if that assumption is added):
from collections import defaultdict, deque
x = [100, 300, -300, 100, -100, -100]
unmatched_positives = defaultdict(deque)
solution = []
for i, val in enumerate(x):
    if val > 0:
        unmatched_positives[val].append(i)
    else:
        solution.append((unmatched_positives[-val].popleft(), i))
print('Unsorted solution:', solution)
# If you need the result to be sorted
print('Sorted solution:', sorted(solution))
Output:
Unsorted solution: [(1, 2), (0, 4), (3, 5)]
Sorted solution: [(0, 4), (1, 2), (3, 5)]
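The loop above relies on every negative value being preceded by an unmatched positive; otherwise popleft() raises IndexError on an empty deque. A hedged sketch of a variant that accepts either sign first (the function name match_opposites and the waiting dictionary are illustrative, not part of the answer):

```python
from collections import defaultdict, deque


def match_opposites(values):
    """Pair each value with the earliest unmatched opposite value."""
    waiting = defaultdict(deque)  # value -> indexes still waiting for a partner
    pairs = []
    for i, val in enumerate(values):
        if waiting[-val]:
            # An opposite value is already waiting: pair with the oldest one.
            pairs.append((waiting[-val].popleft(), i))
        else:
            waiting[val].append(i)
    return pairs


print(match_opposites([100, 100, -100, 100, -100, -100]))
# [(0, 2), (1, 4), (3, 5)]
print(match_opposites([-5, 5, 5, -5]))
# [(0, 1), (2, 3)]
```

Each pair here is (earlier index, later index), which is not necessarily (positive, negative).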
This should work:
results = [100, 100, -100, 100, -100, -100]
solution = []
for i, x in enumerate(results):
    # check the type first: matched entries are overwritten with 'found'
    if isinstance(x, int) and x > 0:
        y = results.index(-x)
        results[y] = 'found'
        solution.append([i, y])
print(solution)
This would work as well for the general case in which different numbers occur:
solutions = []
for x in set(abs(x) for x in results):
    solutions += list(zip([i for i, x2 in enumerate(results) if x2 == x],
                          [i for i, x2 in enumerate(results) if x2 == -x]))
Well, we can do this efficiently in two phases. In the analysis phase, we keep only the negative numbers, sort them by value and then by index, and group them by value, like:
from itertools import groupby
subresult = dict(map(lambda x: (x[0], iter(tuple(x[1]))),
                     groupby(sorted(filter(lambda x: x[1] < 0, enumerate(results)),
                                    key=lambda x: x[::-1]),
                             lambda x: x[1])))
Or we can generate it step-by-step, like:
subresult = filter(lambda x:x[1] < 0,enumerate(results)) # filter negative values
subresult = sorted(subresult,key=lambda x:x[::-1]) # sort them on value and then on index
subresult = groupby(subresult,lambda x:x[1]) # group them on the value
subresult = map(lambda x:(x[0],iter(tuple(x[1]))),subresult) # construct a sequence of tuples (value, iterator over (index, value) pairs)
subresult = dict(subresult) # make it a dictionary
This generates a dictionary:
{-100: <itertools._grouper object at 0x7fedfb523ef0>}
Next, in the construction phase, we iterate over all positive integers and always take the next opposite one from the subresult dictionary, like:
end_result = [[i,next(subresult[-v])[0]] for i,v in enumerate(results) if v > 0]
This generates:
>>> subresult = dict(map(lambda x:(x[0],iter(tuple(x[1]))),groupby(sorted(filter(lambda x:x[1] < 0,enumerate(results)),key=lambda x:x[::-1]),lambda x:x[1])))
>>> [[i,next(subresult[-v])[0]] for i,v in enumerate(results) if v > 0]
[[0, 2], [1, 4], [3, 5]]
Because of the dictionary lookup, and because each iterator keeps track of the next unused index, this usually works quite efficiently.
How about this simple observation based approach? Split it into two lists using list comprehension and then just zip them in the order you want it.
Using list comprehension
In [18]: neg_list = [idx for idx, el in enumerate(results) if el < 0]
In [19]: pos_list = [idx for idx, el in enumerate(results) if el > 0]
In [20]: neg_list
Out[20]: [2, 4, 5]
In [21]: pos_list
Out[21]: [0, 1, 3]
In [22]: list(zip(pos_list, neg_list))
Out[22]: [(0, 2), (1, 4), (3, 5)]
You can also modify what index you need from the order you zip them.
NumPy Version:
For larger lists (or arrays equivalently), the numpy version should be much faster.
In [30]: res = np.array(results)
In [38]: pos_idx = np.where(res > 0)[0]
In [39]: pos_idx
Out[39]: array([0, 1, 3])
In [40]: neg_idx = np.where(res < 0)[0]
In [42]: neg_idx
Out[42]: array([2, 4, 5])
In [44]: list(zip(pos_idx, neg_idx))
Out[44]: [(0, 2), (1, 4), (3, 5)]
# If you want to avoid using zip, then
# just use np.vstack and transpose the result
In [59]: np.vstack((pos_idx, neg_idx)).T
Out[59]:
array([[0, 2],
[1, 4],
[3, 5]])
P.S.: You could also use generator comprehension to achieve the same result but please note that it will be exhausted after you convert the generator to list once.
Using generator comprehension
In [24]: neg_gen = (idx for idx, el in enumerate(results) if el < 0)
In [25]: pos_gen = (idx for idx, el in enumerate(results) if el > 0)
In [27]: list(zip(pos_gen, neg_gen))
Out[27]: [(0, 2), (1, 4), (3, 5)]
# on 2nd run, there won't be any element in the generator.
In [28]: list(zip(pos_gen, neg_gen))
Out[28]: []
pos = {}
for i, item in enumerate(results):
    if item < 0: continue
    if item not in pos:
        pos[item] = []
    pos[item].append(i)
[[pos[-item].pop(0), i] for i, item in enumerate(results) if item < 0]
[[0, 2], [1, 4], [3, 5]]
For the sample case where results only contains two different integers:
import numpy as np
results = np.array([100, 100, -100, 100, -100, -100])
output = list(zip(np.where(results > 0)[0], np.where(results < 0)[0]))
Output:
[(0, 2), (1, 4), (3, 5)]
Time is ~0.002 for results * 1000.

Python Subtract Arrays Based on Same Time

Is there a way I can subtract two arrays while making sure I only subtract elements that have the same day, hour, year, and/or minute values?
list1 = [[10, '2013-06-18'],[20, '2013-06-19'], [50, '2013-06-23'], [15, '2013-06-30']]
list2 = [[5, '2013-06-18'], [5, '2013-06-23'], [20, '2013-06-25'], [20, '2013-06-30']]
Looking for:
list1-list2 = [[5, '2013-06-18'], [45, '2013-06-23'], [10, '2013-06-30']]
How about using a defaultdict of lists?
import itertools
from collections import defaultdict
from functools import reduce  # reduce is not a builtin in Python 3
from operator import sub
def subtract_lists(l1, l2):
    data = defaultdict(list)
    for sublist in itertools.chain(l1, l2):
        value, date = sublist
        data[date].append(value)
    return [(reduce(sub, v), k) for k, v in data.items() if len(v) > 1]
list1 = [[10, '2013-06-18'],[20, '2013-06-19'], [50, '2013-06-23'], [15, '2013-06-30']]
list2 = [[5, '2013-06-18'], [5, '2013-06-23'], [20, '2013-06-25'], [20, '2013-06-30']]
>>> subtract_lists(list1, list2)
[(-5, '2013-06-30'), (45, '2013-06-23'), (5, '2013-06-18')]
>>> # if you want them sorted by date...
>>> sorted(subtract_lists(list1, list2), key=lambda t: t[1])
[(5, '2013-06-18'), (45, '2013-06-23'), (-5, '2013-06-30')]
Note that the difference for date 2013-06-30 is -5, not +5.
This works by using the date as a dictionary key for a list of all values for the given date. Then the lists that hold more than one value are selected, and the values in each of them are reduced by subtraction. If you want the resulting list sorted, you can do so using sorted() with the date item as the key. You could move that operation into the subtract_lists() function if you always want that behavior.
I think this code does what you want:
list1 = [[10, '2013-06-18'],[20, '2013-06-19'], [50, '2013-06-23'], [15, '2013-06-30']]
list2 = [[5, '2013-06-18'], [5, '2013-06-23'], [20, '2013-06-25'], [20, '2013-06-30']]
list1 = dict([[i[1], i[0]] for i in list1])
list2 = dict([[i[1], i[0]] for i in list2])
def minus(a, b):
    return {k: a.get(k, 0) - b.get(k, 0) for k in set(a) & set(b)}
minus(list1, list2)
# returns the below, which is now a dictionary
{'2013-06-18': 5, '2013-06-23': 45, '2013-06-30': -5}
# you can convert it back into your format like this
data = [[value, key] for key, value in minus(list1, list2).items()]
But you seem to have an error in your output data. If you want to include data when it's in either list, define minus like this instead:
def minus(a, b):
    return {k: a.get(k, 0) - b.get(k, 0) for k in set(a) | set(b)}
See this answer, on Merge and sum of two dictionaries, for more info.
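For reference, the dictionary lookup idea can also produce the result directly in the question's [value, date] format while keeping list1's order. A minimal sketch, assuming each date occurs at most once per list (lookup and diff are illustrative names):

```python
list1 = [[10, '2013-06-18'], [20, '2013-06-19'], [50, '2013-06-23'], [15, '2013-06-30']]
list2 = [[5, '2013-06-18'], [5, '2013-06-23'], [20, '2013-06-25'], [20, '2013-06-30']]

# Map each date in list2 to its value for constant-time lookups
lookup = {date: value for value, date in list2}

# Keep only dates present in both lists, in list1's order
diff = [[value - lookup[date], date] for value, date in list1 if date in lookup]
print(diff)
# [[5, '2013-06-18'], [45, '2013-06-23'], [-5, '2013-06-30']]
```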

Python sort array by another positions array

Assume I have two arrays, the first one containing int data, the second one containing positions
a = [11, 22, 44, 55]
b = [0, 1, 10, 11]
i.e. I want a[i] to be moved to position b[i] for all i. If I haven't specified a position, then insert a -1,
i.e.
sorted_a = [11, 22,-1,-1,-1,-1,-1,-1,-1,-1, 44, 55]
            ^   ^                           ^   ^
            0   1                           10  11
Another example:
a = [int1, int2, int3]
b = [5, 3, 1]
sorted_a = [-1, int3, -1, int2, -1, int1]
Here's what I've tried:
def sort_array_by_second(a, b):
    sorted = []
    for e1 in a:
        sorted.appendAt(b[e1])
    return sorted
Which I've obviously messed up.
Something like this:
res = [-1]*(max(b)+1) # create a list of required size with only -1's
for i, v in zip(b, a):
    res[i] = v
The idea behind the algorithm:
Create the resulting list with a size capable of holding up to the largest index in b
Populate this list with -1
Iterate through b elements
Set elements in res[b[i]] with its proper value a[i]
This will leave the resulting list with -1 in every position other than the indexes contained in b, which will have their corresponding value of a.
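The steps above can be wrapped in a small helper and checked against both examples from the question. This is only a sketch: the name place_at and the fill parameter are assumptions, and it expects positions to be a non-empty list of distinct, non-negative integers.

```python
def place_at(values, positions, fill=-1):
    """Put values[i] at index positions[i]; every other slot gets fill."""
    result = [fill] * (max(positions) + 1)
    for pos, val in zip(positions, values):
        result[pos] = val
    return result


print(place_at([11, 22, 44, 55], [0, 1, 10, 11]))
# [11, 22, -1, -1, -1, -1, -1, -1, -1, -1, 44, 55]
print(place_at(['int1', 'int2', 'int3'], [5, 3, 1]))
# [-1, 'int3', -1, 'int2', -1, 'int1']
```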
I would use a custom key function as an argument to sort. This will sort the values according to the corresponding value in the other list:
to_be_sorted = ['int1', 'int2', 'int3', 'int4', 'int5']
sort_keys = [4, 5, 1, 2, 3]
sort_key_dict = dict(zip(to_be_sorted, sort_keys))
to_be_sorted.sort(key = lambda x: sort_key_dict[x])
This has the benefit of not counting on the values in sort_keys to be valid integer indexes, which is not a very stable thing to bank on.
>>> a = ["int1", "int2", "int3", "int4", "int5"]
>>> b = [4, 5, 1, 2, 3]
>>> sorted(a, key=lambda x, it=iter(sorted(b)): b.index(next(it)))
['int4', 'int5', 'int1', 'int2', 'int3']
Paulo Bu's answer is the most Pythonic way. If you want to stick with a function like yours:
def sort_array_by_second(a, b):
    result = []  # avoid shadowing the built-in sorted()
    for n in b:
        result.append(a[n-1])  # treats the values in b as 1-based positions
    return result
will do the trick.
Sorts A by the values of B:
A = ['int1', 'int2', 'int3', 'int4', 'int5']
B = [4, 5, 1, 2, 3]
from operator import itemgetter
C = [a for a, b in sorted(zip(A, B), key = itemgetter(1))]
print(C)
Output
['int3', 'int4', 'int5', 'int1', 'int2']
a = [11, 22, 44, 55] # values
b = [0, 1, 10, 11] # indexes to sort by
sorted_a = [-1] * (max(b) + 1)
for index, value in zip(b, a):
    sorted_a[index] = value
print(sorted_a)
# -> [11, 22, -1, -1, -1, -1, -1, -1, -1, -1, 44, 55]

Finding indices of the lowest 5 numbers of an array in Python

How to find out which indices belong to the lowest x (say, 5) numbers of an array?
[10.18398473, 9.95722384, 9.41220631, 9.42846614, 9.7300549 , 9.69949144, 9.86997862, 10.28299122, 9.97274071, 10.08966867, 9.7]
Also, how to directly find the sorted (from low to high) lowest x numbers?
The existing answers are nice, but here's the solution if you're using numpy:
import numpy as np
mylist = np.array([10.18398473, 9.95722384, 9.41220631, 9.42846614, 9.7300549 , 9.69949144, 9.86997862, 10.28299122, 9.97274071, 10.08966867, 9.7])
x = 5
lowestx = np.argsort(mylist)[:x]
#array([ 2, 3, 5, 10, 4])
You could do something like this:
>>> l = [5, 1, 2, 4, 6]
>>> sorted(range(len(l)), key=lambda i: l[i])
[1, 2, 3, 0, 4]
mylist = [10.18398473, 9.95722384, 9.41220631, 9.42846614, 9.7300549 , 9.69949144, 9.86997862, 10.28299122, 9.97274071, 10.08966867, 9.7]
# lowest 5
lowest = sorted(mylist)[:5]
# indices of lowest 5
lowest_ind = [i for i, v in enumerate(mylist) if v in lowest]
# 5 indices of lowest 5
import operator
lowest_5ind = [i for i, v in sorted(enumerate(mylist), key=operator.itemgetter(1))[:5]]
# indices of the 5 lowest values
[a.index(b) for b in sorted(a)[:5]]
# the x lowest values, sorted from low to high
sorted(a)[:x]
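The index lookups based on a.index(b) or on "v in lowest" can misbehave when the list contains duplicate values, since a.index(b) always returns the first matching position. As a hedged alternative sketch, heapq.nsmallest from the standard library can return the indices and the sorted values together (pairs, indices and values are illustrative names):

```python
import heapq
from operator import itemgetter

mylist = [10.18398473, 9.95722384, 9.41220631, 9.42846614, 9.7300549,
          9.69949144, 9.86997862, 10.28299122, 9.97274071, 10.08966867, 9.7]
x = 5

# (index, value) pairs of the x smallest values, ordered from low to high
pairs = heapq.nsmallest(x, enumerate(mylist), key=itemgetter(1))
indices = [i for i, _ in pairs]
values = [v for _, v in pairs]

print(indices)  # [2, 3, 5, 10, 4]
print(values)   # [9.41220631, 9.42846614, 9.69949144, 9.7, 9.7300549]
```

Because each pair carries its own index, repeated values still map to distinct positions.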
