Iterate over numpy array in a specific order based on values - python

I want to iterate over a NumPy array, starting at the index of the highest value and working through to the index of the lowest value.
import numpy as np  # imports the numpy package
elevation_array = np.random.rand(5, 5)  # creates a random 5 by 5 array
print(elevation_array)  # prints the array out
ravel_array = np.ravel(elevation_array)
sorted_array_x = np.argsort(ravel_array)
sorted_array_y = np.argsort(sorted_array_x)
sorted_array = sorted_array_y.reshape(elevation_array.shape)
for index, rank in np.ndenumerate(sorted_array):
    print(index, rank)
I want it to print out:
index of the highest value
index of the next highest value
index of the next highest value etc

If you want numpy doing the heavy lifting, you can do something like this:
>>> a = np.random.rand(100, 100)
>>> sort_idx = np.argsort(a, axis=None)
>>> np.column_stack(np.unravel_index(sort_idx[::-1], a.shape))
array([[13, 62],
[26, 77],
[81, 4],
...,
[83, 40],
[17, 34],
[54, 91]], dtype=int64)
You first get an index that sorts the whole flattened array, and then convert that flat index into pairs of indices with np.unravel_index. The call to np.column_stack simply joins the two arrays of coordinates into a single one, and could be replaced by list(zip(*np.unravel_index(sort_idx[::-1], a.shape))) to get a list of tuples instead of an array (in Python 3, zip on its own returns an iterator rather than a list).
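As a minimal sketch of how those sorted indices could then drive a loop from the highest value to the lowest (the small 2x2 array and the names order and visited are illustrative assumptions, not part of the original answer):

```python
import numpy as np

a = np.array([[2, 7], [1, 4]])

# Flat indices that sort the array ascending, reversed to run from highest to lowest.
order = np.argsort(a, axis=None)[::-1]

# Convert the flat indices back into (row, column) coordinates.
rows, cols = np.unravel_index(order, a.shape)

visited = []
for r, c in zip(rows, cols):
    # Store plain Python ints so the result is easy to compare and print.
    visited.append(((int(r), int(c)), int(a[r, c])))
    print((int(r), int(c)), a[r, c])
```

Each iteration sees the index first and the value second, so the loop body can use either.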

Try this:
>>> from operator import itemgetter
>>> a = np.array([[2, 7], [1, 4]])
>>> a
array([[2, 7],
       [1, 4]])
>>> sorted(np.ndenumerate(a), key=itemgetter(1), reverse=True)
[((0, 1), 7),
((1, 1), 4),
((0, 0), 2),
((1, 0), 1)]
You can iterate over this list if you wish. Essentially, I am telling sorted to order the elements of np.ndenumerate(a) according to the key itemgetter(1). itemgetter(1) returns the second element (index 1) of each tuple ((0, 1), 7), ((1, 1), 4), ... generated by np.ndenumerate(a), i.e. the values.

Related

Numpy/Pandas: Merge two numpy arrays based on one array efficiently

I have two numpy arrays comprised of two-set tuples:
a = [(1, "alpha"), (2, 3), ...]
b = [(1, "zylo"), (1, "xen"), (2, "potato"), ...]
The first element in the tuple is the identifier and shared between both arrays, so I want to create a new numpy array which looks like this:
[(1, "alpha", "zylo", "xen"), (2, 3, "potato"), etc...]
My current solution works, but it's way too inefficient for me. Looks like this:
aggregate_collection = []
for tuple_set in a:
    for tuple_set2 in b:
        if tuple_set[0] == tuple_set2[0] and other_condition:
            temp_tup = (tuple_set[0], other tuple values)
            aggregate_collection.append(temp_tup)
How can I do this efficiently?
I'd concatenate these into a data frame and just use groupby + agg:
(pd.concat([pd.DataFrame(a), pd.DataFrame(b)])
   .groupby(0)
   .agg(lambda s: [s.name, *s])[1])
where 0 and 1 are the default column names given by creating a dataframe via pd.DataFrame. Change it to your column names.
In [278]: a = [(1, "alpha"), (2, 3)]
...: b = [(1, "zylo"), (1, "xen"), (2, "potato")]
In [279]: a
Out[279]: [(1, 'alpha'), (2, 3)]
In [280]: b
Out[280]: [(1, 'zylo'), (1, 'xen'), (2, 'potato')]
Note that if I try to make an array from a I get something quite different.
In [281]: np.array(a)
Out[281]:
array([['1', 'alpha'],
['2', '3']], dtype='<U21')
In [282]: _.shape
Out[282]: (2, 2)
defaultdict is a handy tool for collecting like-keyed values:
In [283]: from collections import defaultdict
In [284]: dd = defaultdict(list)
In [285]: for tup in a+b:
     ...:     k,v = tup
     ...:     dd[k].append(v)
     ...:
In [286]: dd
Out[286]: defaultdict(list, {1: ['alpha', 'zylo', 'xen'], 2: [3, 'potato']})
which can be cast as a list of tuples with:
In [288]: [(k,*v) for k,v in dd.items()]
Out[288]: [(1, 'alpha', 'zylo', 'xen'), (2, 3, 'potato')]
I'm using a+b to join the lists, since it apparently doesn't matter which list a tuple comes from.
The result in Out[288] is a poor fit for a NumPy array, since the tuples differ in size and the items (other than the first) may be strings or numbers.

Pandas Group 2-D NumPy Data by Range of Values

I have a large data set in the form of a 2D array. The 2D array represents continuous intensity data, and I want to use it to create another 2D array of the same size in which the values are grouped into discrete values. In other words, if I have a 2D array like this,
[(11, 23, 33, 12),
(21, 31, 13, 19),
(33, 22, 26, 31)]
The output would be as shown below with the values from 10 to 19 assigned to 1, 20 to 29 assigned to 2 and 30 to 39 assigned to 3.
[(1, 2, 3, 1),
(2, 3, 1, 1),
(3, 2, 2, 3)]
Ideally, I would like to make these assignments based on percentiles: the values that fall into the top ten percent get assigned to 5, the values in the top 20 percent to 4, and so on.
My data set is in NumPy format. I have looked at groupby, but it does not seem to allow me to specify ranges. I have also looked at cut; however, cut only works on 1D arrays. I have considered running cut in a loop over each row of the data, but I am concerned that this may take too much time. My matrices could be as big as 4000 rows by 4000 columns.
You need to stack the DataFrame to get a 1-D representation and then apply cut. After that you can unstack it.
[tuple(x) for x in (pd.cut(pd.DataFrame(a).stack(), bins=[10,20,30,40], labels=False)+1).unstack().values]
Or (using @user3483203's magic):
[tuple(x) for x in np.searchsorted([10, 20, 30, 40], np.array(a))]
Output:
[(1, 2, 3, 1),
(2, 3, 1, 1),
(3, 2, 2, 3)]
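The percentile part of the question is not covered by the answer above. A rough sketch of one way it might be approached with plain NumPy follows. The choice of five equal-width percentile bands (edges at the 20th, 40th, 60th and 80th percentiles) is an assumption made for illustration, as are the names fixed, edges and by_percentile. The fixed-width variant passes side="right" so that a value sitting exactly on an edge, such as 20, lands in the upper bin, as the question describes.

```python
import numpy as np

a = np.array([[11, 23, 33, 12],
              [21, 31, 13, 19],
              [33, 22, 26, 31]])

# Fixed-width bins 10-19 -> 1, 20-29 -> 2, 30-39 -> 3.
# side="right" sends a value equal to an edge (e.g. 20) to the upper bin.
fixed = np.searchsorted([10, 20, 30, 40], a, side="right")

# Percentile bins (assumed split: five bands of 20 percent each).
edges = np.percentile(a, [20, 40, 60, 80])

# np.digitize returns 0..4 for four edges; add 1 to get labels 1..5.
by_percentile = np.digitize(a, edges) + 1

print(fixed)
print(by_percentile)
```

Both calls operate on the whole 2D array at once, so no per-row loop is needed.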

matching occurrence of opposite number in a list python

I have a list such as
results = [100, 100, -100, 100, -100, -100]
I would like to find the first occurrence of the opposite number, so that the first 100 is matched with the first -100, the second 100 is matched with the second -100, and so on.
I would like to have the positions as output, such as:
[0, 2], [1, 4], [3, 5]
i.e. [0, 2] represents results[0] and results[2], where the first occurrence of 100 is matched with the first occurrence of -100.
Edit: you can assume there will always be the same number of positive and negative values, and that the list will only contain one number and its opposite.
Any help would be appreciated.
For your simple case where the list only contains 2 integers (x and -x), you could simply zip() together the indexes:
indexes = [[], []]
for i, x in enumerate(results):
    if x > 0:
        indexes[0].append(i)
    else:
        indexes[1].append(i)
list(zip(*indexes))
Example:
>>> results = [100, 100, -100, 100, -100, -100]
>>> indexes = [[],[]]
>>> for i,x in enumerate(results): indexes[0].append(i) if x > 0 else indexes[1].append(i)
...
>>> list(zip(*indexes))
[(0, 2), (1, 4), (3, 5)]
Note: for small inputs, two separate list comprehensions (e.g. [i for i,x in enumerate(results) if x > 0]) may be faster than appending in a for loop.
IMO, the fastest approach for large inputs should be the following one. Note that my solution doesn't assume that the input list contains just one value and its opposite, so it could be made even faster if that assumption were added:
x = [100, 300, -300, 100, -100, -100]
from collections import defaultdict, deque
unmatched_positives = defaultdict(deque)  # value -> indices of positives still unmatched
solution = []
for i, val in enumerate(x):
    if val > 0:
        unmatched_positives[val].append(i)
    else:
        # pair this negative with the earliest unmatched positive of the same magnitude
        solution.append((unmatched_positives[-val].popleft(), i))
print('Unsorted solution:', solution)
# If you need the result to be sorted
print('Sorted solution:', sorted(solution))
Output:
Unsorted solution: [(1, 2), (0, 4), (3, 5)]
Sorted solution: [(0, 4), (1, 2), (3, 5)]
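The loop above expects every negative value to appear after the positive it is matched with. As a hedged variation, here is a sketch that pairs each value with the earliest unmatched opposite regardless of which sign comes first. The function name match_opposites and the decision to return sorted (earlier index, later index) pairs are assumptions for illustration, not part of the answer above.

```python
from collections import defaultdict, deque

def match_opposites(values):
    """Pair each value with the earliest unmatched opposite value.

    Returns sorted (earlier_index, later_index) tuples.
    """
    waiting = defaultdict(deque)  # value -> indices still waiting for their opposite
    pairs = []
    for i, val in enumerate(values):
        if waiting[-val]:
            # An opposite value is already waiting: match it with this index.
            pairs.append((waiting[-val].popleft(), i))
        else:
            # Nothing to match yet: remember this index for later.
            waiting[val].append(i)
    return sorted(pairs)

print(match_opposites([100, 100, -100, 100, -100, -100]))
print(match_opposites([100, 300, -300, 100, -100, -100]))
```

Values that never find an opposite are simply left out of the result.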
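This should work (note that it overwrites the matched entries of results):
results = [100, 100, -100, 100, -100, -100]
solution = []
for i, x in enumerate(results):
    # Check the type first: matched entries are replaced by the string 'found',
    # and comparing a string with 0 raises a TypeError in Python 3.
    if isinstance(x, int) and x > 0:
        y = results.index(-x)
        results[y] = 'found'
        solution.append([i, y])
print(solution)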
This would work as well for the general case in which different numbers occur:
solutions = []
for x in set(abs(x) for x in results):
    solutions += list(zip([i for i, x2 in enumerate(results) if x2 == x],
                          [i for i, x2 in enumerate(results) if x2 == -x]))
We can do this efficiently in two phases. In the analysis phase, we keep only the negative numbers, sort them by value and then by index, and group them by value:
from itertools import groupby
subresult = dict(map(lambda x: (x[0], iter(tuple(x[1]))),
                     groupby(sorted(filter(lambda x: x[1] < 0, enumerate(results)),
                                    key=lambda x: x[::-1]),
                             lambda x: x[1])))
Or we can generate it step by step:
subresult = filter(lambda x: x[1] < 0, enumerate(results))  # keep only the negative values
subresult = sorted(subresult, key=lambda x: x[::-1])  # sort them by value and then by index
subresult = groupby(subresult, lambda x: x[1])  # group them by value
subresult = map(lambda x: (x[0], iter(tuple(x[1]))), subresult)  # build (value, iterator over (index, value) pairs) tuples
subresult = dict(subresult)  # make it a dictionary
This generates a dictionary:
{-100: <itertools._grouper object at 0x7fedfb523ef0>}
Next, in the construction phase, we iterate over all positive integers and always take the next opposite one from the subresult dictionary:
end_result = [[i,next(subresult[-v])[0]] for i,v in enumerate(results) if v > 0]
This generates:
>>> subresult = dict(map(lambda x:(x[0],iter(tuple(x[1]))),groupby(sorted(filter(lambda x:x[1] < 0,enumerate(results)),key=lambda x:x[::-1]),lambda x:x[1])))
>>> [[i,next(subresult[-v])[0]] for i,v in enumerate(results) if v > 0]
[[0, 2], [1, 4], [3, 5]]
Because of the dictionary lookup, and because each iterator keeps track of which index comes next, this usually works quite efficiently.
How about this simple observation-based approach? Split the indices into two lists using list comprehensions, and then zip them in the order you want.
Using list comprehensions:
In [18]: neg_list = [idx for idx, el in enumerate(results) if el < 0]
In [19]: pos_list = [idx for idx, el in enumerate(results) if el > 0]
In [20]: neg_list
Out[20]: [2, 4, 5]
In [21]: pos_list
Out[21]: [0, 1, 3]
In [22]: list(zip(pos_list, neg_list))
Out[22]: [(0, 2), (1, 4), (3, 5)]
You can also change which index comes first by changing the order in which you zip the two lists.
NumPy Version:
For larger lists (or arrays equivalently), the numpy version should be much faster.
In [30]: res = np.array(results)
In [38]: pos_idx = np.where(res > 0)[0]
In [39]: pos_idx
Out[39]: array([0, 1, 3])
In [40]: neg_idx = np.where(res < 0)[0]
In [42]: neg_idx
Out[42]: array([2, 4, 5])
In [44]: list(zip(pos_idx, neg_idx))
Out[44]: [(0, 2), (1, 4), (3, 5)]
# If you want to avoid using zip, then
# just use np.vstack and transpose the result
In [59]: np.vstack((pos_idx, neg_idx)).T
Out[59]:
array([[0, 2],
[1, 4],
[3, 5]])
P.S.: You could also use generator comprehension to achieve the same result but please note that it will be exhausted after you convert the generator to list once.
Using generator comprehension
In [24]: neg_gen = (idx for idx, el in enumerate(results) if el < 0)
In [25]: pos_gen = (idx for idx, el in enumerate(results) if el > 0)
In [27]: list(zip(pos_gen, neg_gen))
Out[27]: [(0, 2), (1, 4), (3, 5)]
# on 2nd run, there won't be any element in the generator.
In [28]: list(zip(pos_gen, neg_gen))
Out[28]: []
pos = {}
for i, item in enumerate(results):
    if item < 0:
        continue
    if item not in pos:
        pos[item] = []
    pos[item].append(i)
[[pos[-item].pop(0), i] for i, item in enumerate(results) if item < 0]
[[0, 2], [1, 4], [3, 5]]
For the sample case where results only contains two different integers:
import numpy as np
results = np.array([100, 100, -100, 100, -100, -100])
output = list(zip(np.where(results > 0)[0], np.where(results < 0)[0]))
Output:
[(0, 2), (1, 4), (3, 5)]
Time is ~0.002 for results * 1000.

Python: Calculate difference between all elements in a set of integers

I want to calculate absolute difference between all elements in a set of integers. I am trying to do abs(x-y) where x and y are two elements in the set. I want to do that for all combinations and save the resulting list in a new set.
You can use itertools.combinations:
from itertools import combinations
s = {1, 4, 7, 9}
{abs(i - j) for i, j in combinations(s, 2)}
=>
set([8, 2, 3, 5, 6])
combinations returns the r-length tuples of all combinations in s without replacement, i.e.:
list(combinations(s, 2))
=>
[(9, 4), (9, 1), (9, 7), (4, 1), (4, 7), (1, 7)]
As sets do not maintain order, you may use something like an ordered-set and iterate till last but one.
For completeness, here's a solution based on NumPy ndarrays and SciPy's pdist():
In [69]: import numpy as np
In [70]: from scipy.spatial.distance import pdist
In [71]: s = {1, 4, 7, 9}
In [72]: set(pdist(np.array(list(s))[:, None], 'cityblock'))
Out[72]: {2.0, 3.0, 5.0, 6.0, 8.0}
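If SciPy is not available, the same set of pairwise absolute differences can be sketched with plain NumPy broadcasting. This is an illustrative alternative rather than what the answer above does; the names values, diffs and result are assumptions.

```python
import numpy as np

s = {1, 4, 7, 9}
values = np.array(sorted(s))

# Broadcasting a column against a row gives the full matrix of |x - y|.
diffs = np.abs(values[:, None] - values[None, :])

# The strict upper triangle (k=1) skips the zero diagonal and mirrored duplicates.
upper = diffs[np.triu_indices(len(values), k=1)]

# tolist() converts NumPy integers to plain Python ints before building the set.
result = set(upper.tolist())
print(result)
```

Unlike pdist, this keeps the integer dtype, but it builds the full n-by-n matrix, so memory use grows quadratically with the size of the set.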
Here is another NumPy-based solution, which finds only the minimum difference:
import numpy as np
data = np.array([33, 22, 21, 1, 44, 54])
minn = np.inf
index = np.arange(data.shape[0])
for i in range(data.shape[0]):
    to_sub = (index[:i], index[i+1:])  # indices of every element except i
    temp = np.abs(data[i] - data[np.hstack(to_sub)])
    min_temp = np.min(temp)
    if min_temp < minn:
        minn = min_temp
print('Min difference is', minn)
Output: "Min difference is 1"
Here is another way using combinations:
from itertools import combinations
def find_differences(lst):
    """Find all differences, plus the min and max difference."""
    d = [abs(i - j) for i, j in combinations(set(lst), 2)]
    return min(d), max(d), d
Test:
list_of_nums = [1, 9, 7, 13, 56, 5]
min_, max_, diff_ = find_differences(list_of_nums)
print(f'All differences: {diff_}\nMaximum difference: {max_}\nMinimum difference: {min_}')
Result:
All differences: [4, 6, 8, 12, 55, 2, 4, 8, 51, 2, 6, 49, 4, 47, 43]
Maximum difference: 55
Minimum difference: 2

Apply function to an array of tuples

I have a function that I would like to apply to an array of tuples and I am wondering if there is a clean way to do it.
Normally, I could use np.vectorize to apply the function to each item in the array, however, in this case "each item" is a tuple so numpy interprets the array as a 3d array and applies the function to each item within the tuple.
So I can assume that the incoming array is one of:
tuple
1 dimensional array of tuples
2 dimensional array of tuples
I can probably write some looping logic but it seems like numpy most likely has something that does this more efficiently and I don't want to reinvent the wheel.
This is an example. I am trying to apply the tuple_converter function to each tuple in the array.
array_of_tuples1 = np.array([
    [(1,2,3),(2,3,4),(5,6,7)],
    [(7,2,3),(2,6,4),(5,6,6)],
    [(8,2,3),(2,5,4),(7,6,7)],
])
array_of_tuples2 = np.array([
    (1,2,3),(2,3,4),(5,6,7),
])
plain_tuple = (1,2,3)
# Convert each set of tuples
def tuple_converter(tup):
    return tup[0]**2 + tup[1] + tup[2]
# Vectorizing applies the formula to each integer rather than each tuple
tuple_converter_vectorized = np.vectorize(tuple_converter)
print(tuple_converter_vectorized(array_of_tuples1))
print(tuple_converter_vectorized(array_of_tuples2))
print(tuple_converter_vectorized(plain_tuple))
Desired Output for array_of_tuples1:
[[ 6 11 38]
[54 14 37]
[69 13 62]]
Desired Output for array_of_tuples2:
[ 6 11 38]
Desired Output for plain_tuple:
6
But the code above produces this error (because it is trying to apply the function to an integer rather than a tuple.)
<ipython-input-209-fdf78c6f4b13> in tuple_converter(tup)
10
11 def tuple_converter(tup):
---> 12 return tup[0]**2 + tup[1] + tup[2]
13
14
IndexError: invalid index to scalar variable.
array_of_tuples1 and array_of_tuples2 are not actually arrays of tuples, but just 3- and 2-dimensional arrays of integers:
In [1]: array_of_tuples1 = np.array([
...: [(1,2,3),(2,3,4),(5,6,7)],
...: [(7,2,3),(2,6,4),(5,6,6)],
...: [(8,2,3),(2,5,4),(7,6,7)],
...: ])
In [2]: array_of_tuples1
Out[2]:
array([[[1, 2, 3],
[2, 3, 4],
[5, 6, 7]],
[[7, 2, 3],
[2, 6, 4],
[5, 6, 6]],
[[8, 2, 3],
[2, 5, 4],
[7, 6, 7]]])
So, instead of vectorizing your function (np.vectorize essentially runs a for loop over the individual elements of the array, which here are integers), you should apply it along the appropriate axis (the axis holding the "tuples"), regardless of the type of the sequence:
In [6]: np.apply_along_axis(tuple_converter, 2, array_of_tuples1)
Out[6]:
array([[ 6, 11, 38],
[54, 14, 37],
[69, 13, 62]])
In [9]: np.apply_along_axis(tuple_converter, 1, array_of_tuples2)
Out[9]: array([ 6, 11, 38])
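Two further options that treat the last axis as "the tuple" can be sketched as follows, assuming every tuple has exactly three numeric entries. The first indexes the last axis with an ellipsis, so the arithmetic runs on whole arrays; the second passes a signature to np.vectorize so that the function receives each length-3 slice instead of each integer. The names tuple_converter_last_axis and tuple_converter_sig are hypothetical and do not come from the question or the answers.

```python
import numpy as np

def tuple_converter(tup):
    return tup[0] ** 2 + tup[1] + tup[2]

def tuple_converter_last_axis(data):
    """Apply the formula along the last axis of a tuple, 1-D or 2-D array of tuples."""
    arr = np.asarray(data)
    # arr[..., k] picks entry k of every "tuple", whatever the leading dimensions are.
    return arr[..., 0] ** 2 + arr[..., 1] + arr[..., 2]

# signature='(n)->()' tells np.vectorize that the function consumes a 1-D slice
# of length n and returns a scalar, so it loops over tuples rather than integers.
tuple_converter_sig = np.vectorize(tuple_converter, signature='(n)->()')

array_of_tuples1 = np.array([
    [(1, 2, 3), (2, 3, 4), (5, 6, 7)],
    [(7, 2, 3), (2, 6, 4), (5, 6, 6)],
    [(8, 2, 3), (2, 5, 4), (7, 6, 7)],
])
array_of_tuples2 = np.array([(1, 2, 3), (2, 3, 4), (5, 6, 7)])
plain_tuple = (1, 2, 3)

print(tuple_converter_last_axis(array_of_tuples1))
print(tuple_converter_sig(array_of_tuples2))
print(tuple_converter_last_axis(plain_tuple))
```

The ellipsis version avoids a Python-level loop entirely, while the signature version keeps the original tuple_converter unchanged.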
The other answer above is certainly correct, and probably what you're looking for. But I noticed you put the word "clean" into your question, and so I'd like to add this answer as well.
If we can make the assumption that all the tuples are 3 element tuples (or that they have some constant number of elements), then there's a nice little trick you can do so that the same piece of code will work on any single tuple, 1d array of tuples, or 2d array of tuples without an if/else for the 1d/2d cases. I'd argue that avoiding switches is always cleaner (although I suppose this could be contested).
import numpy as np
def map_to_tuples(x):
    x = np.array(x)
    flattened = x.flatten().reshape(-1, 3)
    return np.array([tup[0]**2 + tup[1] + tup[2] for tup in flattened]).reshape(x.shape[:-1])
Outputs the following for your inputs (respectively), as desired:
[[ 6 11 38]
[54 14 37]
[69 13 62]]
[ 6 11 38]
6
If you are serious about the tuples bit, you could define a structured dtype.
In [535]: dt=np.dtype('int,int,int')
In [536]: x1 = np.array([
[(1,2,3),(2,3,4),(5,6,7)],
[(7,2,3),(2,6,4),(5,6,6)],
[(8,2,3),(2,5,4),(7,6,7)],
], dtype=dt)
In [537]: x1
Out[537]:
array([[(1, 2, 3), (2, 3, 4), (5, 6, 7)],
[(7, 2, 3), (2, 6, 4), (5, 6, 6)],
[(8, 2, 3), (2, 5, 4), (7, 6, 7)]],
dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<i4')])
Note that the display uses tuples. x1 is a 3x3 array of dtype dt. The elements, or records, are displayed as tuples. This is more useful if the tuple elements differ in type: float, integer, string, etc.
Now define a function that works with fields of such an array:
In [538]: def foo(tup):
     ...:     return tup['f0']**2 + tup['f1'] + tup['f2']
It applies neatly to x1.
In [539]: foo(x1)
Out[539]:
array([[ 6, 11, 38],
[54, 14, 37],
[69, 13, 62]])
It also applies to a 1d array of the same dtype.
In [540]: x2=np.array([(1,2,3),(2,3,4),(5,6,7) ],dtype=dt)
In [541]: foo(x2)
Out[541]: array([ 6, 11, 38])
And a 0d array of matching type:
In [542]: foo(np.array(plain_tuple,dtype=dt))
Out[542]: 6
But foo(plain_tuple) won't work, since the function is written to work with named fields, not indexed ones.
The function could be modified to cast the input to the correct dtype if needed:
In [545]: def foo1(tup):
   .....:     temp = np.asarray(tup, dtype=dt)
   .....:     return temp['f0']**2 + temp['f1'] + temp['f2']
In [548]: plain_tuple
Out[548]: (1, 2, 3)
In [549]: foo1(plain_tuple)
Out[549]: 6
In [554]: foo1([(1,2,3),(2,3,4),(5,6,7)]) # list of tuples
Out[554]: array([ 6, 11, 38])
