I'm working on a basic program allows me to navigate in a two dimensions grid (for now 3X3), using a function specified with the initial point and all the steps moved on each axis, returning the end point. I want to raise an error if a specific point has been already visited, and I'm trying to do it by logging all previous points. This is my code:
matrix = [[1, 1], [1, 2], [1, 3], [2, 1], [2, 2], [2, 3], [3, 1], [3, 2], [3, 3]]
def position(inital_point,*steps):
end_point = inital_point
point_logs = []
point_logs += end_point
for step in steps:
end_point[0] += step[0]; end_point[1] += step[1]
point_logs += end_point
print(point_logs)
My problem is, I want to reference each point with a nested list inside the main list, but the code above is giving me this result:
position([1,1],[1,1],[1,1])
>>>[1, 1, 2, 2, 3, 3]
When I try to convert it to a nested list, I get:
matrix = [[1, 1], [1, 2], [1, 3], [2, 1], [2, 2], [2, 3], [3, 1], [3, 2], [3, 3]]
def position(inital_point,*steps):
end_point = inital_point
point_logs = []
point_logs += [end_point]
for step in steps:
end_point[0] += step[0]; end_point[1] += step[1]
point_logs += [end_point]
print(point_logs)
position([1,1],[1,1],[1,1])
>>> [[3, 3], [3, 3], [3, 3]]
Instead of simply [[1, 1],[2, 2],[3, 3]], please help!
This approach basically takes the last appended points from point_logs and adds the new coordinates to it after which it is also appended.
def position(inital_point,*steps):
end_point = inital_point
point_logs = []
point_logs.append(end_point)
for step in steps:
x, y = point_logs[-1]
point_logs.append([x+step[0], y+step[1]])
print(point_logs)
Output
[[1, 1], [2, 2], [3, 3]]
In the case of the set np.array([1, 2, 3]), there are only 9 possible combinations/sequences of its constituent elements: [1, 1], [1, 2], [1, 3], [2, 1], [2, 2], [2, 3], [3, 1], [3, 2], [3, 3].
If we have the following array:
np.array([1, 1],
[1, 2],
[1, 3],
[2, 2],
[2, 3],
[3, 1],
[3, 2])
What is the best way, with NumPy/SciPy, to determine that [2, 1] and [3, 3] are missing? Put another way, how do we find the inverse list of sequences (when we know all of the possible element values)? Manually doing this with a couple of for loops is easy to figure out, but that would negate whatever speed gains we get from using NumPy over native Python (especially with larger datasets).
Your can generate a list of all possible pairs using itertools.product and collect all of them which are not in your array:
from itertools import product
pairs = [ [1, 1], [1, 2], [1, 3], [2, 2], [2, 3], [3, 1], [3, 2] ]
allPairs = list(map(list, product([1, 2, 3], repeat=2)))
missingPairs = [ pair for pair in allPairs if pair not in pairs ]
print(missingPairs)
Result:
[[2, 1], [3, 3]]
Note that map(list, ...) is needed to convert your list of list to a list of tuples that can be compared to the list of tuples returned by product. This can be simplified if your input array already was a list of tuples.
This is one way using itertools.product and set.
The trick here is to note that sets may only contain immutable types such as tuples.
import numpy as np
from itertools import product
x = np.array([1, 2, 3])
y = np.array([[1, 1], [1, 2], [1, 3], [2, 2],
[2, 3], [3, 1], [3, 2]])
set(product(x, repeat=2)) - set(map(tuple, y))
{(2, 1), (3, 3)}
If you want to stay in numpy instead of going back to raw python sets, you can do it using void views (based on #Jaime's answer here) and numpy's built in set methods like in1d
def vview(a):
return np.ascontiguousarray(a).view(np.dtype((np.void, a.dtype.itemsize * a.shape[1])))
x = np.array([1, 2, 3])
y = np.array([[1, 1], [1, 2], [1, 3], [2, 2],
[2, 3], [3, 1], [3, 2]])
xx = np.array([i.ravel() for i in np.meshgrid(x, x)]).T
xx[~np.in1d(vview(xx), vview(y))]
array([[2, 1],
[3, 3]])
a = np.array([1, 2, 3])
b = np.array([[1, 1],
[1, 2],
[1, 3],
[2, 2],
[2, 3],
[3, 1],
[3, 2]])
c = np.array(list(itertools.product(a, repeat=2)))
If you want to use numpy methods, try this...
Compare the array being tested against the product using broadcasting
d = b == c[:,None,:]
#d.shape is (9,7,2)
Check if both elements of a pair matched
e = np.all(d, -1)
#e.shape is (9,7)
Check if any of the test items match an item of the product.
f = np.any(e, 1)
#f.shape is (9,)
Use f as a boolean index into the product to see what is missing.
>>> print(c[np.logical_not(f)])
[[2 1]
[3 3]]
>>>
Every combination corresponds to the number in range 0..L^2-1 where L=len(array). For example, [2, 2]=>3*(2-1)+(2-1)=4. Off by -1 arises because elements start from 1, not from zero. Such mapping might be considered as natural perfect hashing for this data type.
If operations on integer sets in NumPy are faster than operations on pairs - for example, integer set of known size might be represented by bit sequence (integer sequence) - then it is worth to traverse pair list, mark corresponding bits in integer set, then look for unset ones and retrieve corresponding pairs.
How do I sort a NumPy array by its nth column?
For example, given:
a = array([[9, 2, 3],
[4, 5, 6],
[7, 0, 5]])
I want to sort the rows of a by the second column to obtain:
array([[7, 0, 5],
[9, 2, 3],
[4, 5, 6]])
To sort by the second column of a:
a[a[:, 1].argsort()]
#steve's answer is actually the most elegant way of doing it.
For the "correct" way see the order keyword argument of numpy.ndarray.sort
However, you'll need to view your array as an array with fields (a structured array).
The "correct" way is quite ugly if you didn't initially define your array with fields...
As a quick example, to sort it and return a copy:
In [1]: import numpy as np
In [2]: a = np.array([[1,2,3],[4,5,6],[0,0,1]])
In [3]: np.sort(a.view('i8,i8,i8'), order=['f1'], axis=0).view(np.int)
Out[3]:
array([[0, 0, 1],
[1, 2, 3],
[4, 5, 6]])
To sort it in-place:
In [6]: a.view('i8,i8,i8').sort(order=['f1'], axis=0) #<-- returns None
In [7]: a
Out[7]:
array([[0, 0, 1],
[1, 2, 3],
[4, 5, 6]])
#Steve's really is the most elegant way to do it, as far as I know...
The only advantage to this method is that the "order" argument is a list of the fields to order the search by. For example, you can sort by the second column, then the third column, then the first column by supplying order=['f1','f2','f0'].
You can sort on multiple columns as per Steve Tjoa's method by using a stable sort like mergesort and sorting the indices from the least significant to the most significant columns:
a = a[a[:,2].argsort()] # First sort doesn't need to be stable.
a = a[a[:,1].argsort(kind='mergesort')]
a = a[a[:,0].argsort(kind='mergesort')]
This sorts by column 0, then 1, then 2.
In case someone wants to make use of sorting at a critical part of their programs here's a performance comparison for the different proposals:
import numpy as np
table = np.random.rand(5000, 10)
%timeit table.view('f8,f8,f8,f8,f8,f8,f8,f8,f8,f8').sort(order=['f9'], axis=0)
1000 loops, best of 3: 1.88 ms per loop
%timeit table[table[:,9].argsort()]
10000 loops, best of 3: 180 µs per loop
import pandas as pd
df = pd.DataFrame(table)
%timeit df.sort_values(9, ascending=True)
1000 loops, best of 3: 400 µs per loop
So, it looks like indexing with argsort is the quickest method so far...
From the NumPy mailing list, here's another solution:
>>> a
array([[1, 2],
[0, 0],
[1, 0],
[0, 2],
[2, 1],
[1, 0],
[1, 0],
[0, 0],
[1, 0],
[2, 2]])
>>> a[np.lexsort(np.fliplr(a).T)]
array([[0, 0],
[0, 0],
[0, 2],
[1, 0],
[1, 0],
[1, 0],
[1, 0],
[1, 2],
[2, 1],
[2, 2]])
As the Python documentation wiki suggests:
a = ([[1, 2, 3], [4, 5, 6], [0, 0, 1]]);
a = sorted(a, key=lambda a_entry: a_entry[1])
print a
Output:
[[[0, 0, 1], [1, 2, 3], [4, 5, 6]]]
I had a similar problem.
My Problem:
I want to calculate an SVD and need to sort my eigenvalues in descending order. But I want to keep the mapping between eigenvalues and eigenvectors.
My eigenvalues were in the first row and the corresponding eigenvector below it in the same column.
So I want to sort a two-dimensional array column-wise by the first row in descending order.
My Solution
a = a[::, a[0,].argsort()[::-1]]
So how does this work?
a[0,] is just the first row I want to sort by.
Now I use argsort to get the order of indices.
I use [::-1] because I need descending order.
Lastly I use a[::, ...] to get a view with the columns in the right order.
import numpy as np
a=np.array([[21,20,19,18,17],[16,15,14,13,12],[11,10,9,8,7],[6,5,4,3,2]])
y=np.argsort(a[:,2],kind='mergesort')# a[:,2]=[19,14,9,4]
a=a[y]
print(a)
Desired output is [[6,5,4,3,2],[11,10,9,8,7],[16,15,14,13,12],[21,20,19,18,17]]
note that argsort(numArray) returns the indices of an numArray as it was supposed to be arranged in a sorted manner.
example
x=np.array([8,1,5])
z=np.argsort(x) #[1,3,0] are the **indices of the predicted sorted array**
print(x[z]) #boolean indexing which sorts the array on basis of indices saved in z
answer would be [1,5,8]
A little more complicated lexsort example - descending on the 1st column, secondarily ascending on the 2nd. The tricks with lexsort are that it sorts on rows (hence the .T), and gives priority to the last.
In [120]: b=np.array([[1,2,1],[3,1,2],[1,1,3],[2,3,4],[3,2,5],[2,1,6]])
In [121]: b
Out[121]:
array([[1, 2, 1],
[3, 1, 2],
[1, 1, 3],
[2, 3, 4],
[3, 2, 5],
[2, 1, 6]])
In [122]: b[np.lexsort(([1,-1]*b[:,[1,0]]).T)]
Out[122]:
array([[3, 1, 2],
[3, 2, 5],
[2, 1, 6],
[2, 3, 4],
[1, 1, 3],
[1, 2, 1]])
Here is another solution considering all columns (more compact way of J.J's answer);
ar=np.array([[0, 0, 0, 1],
[1, 0, 1, 0],
[0, 1, 0, 0],
[1, 0, 0, 1],
[0, 0, 1, 0],
[1, 1, 0, 0]])
Sort with lexsort,
ar[np.lexsort(([ar[:, i] for i in range(ar.shape[1]-1, -1, -1)]))]
Output:
array([[0, 0, 0, 1],
[0, 0, 1, 0],
[0, 1, 0, 0],
[1, 0, 0, 1],
[1, 0, 1, 0],
[1, 1, 0, 0]])
Pandas Approach Just For Completeness:
a = np.array([[9, 2, 3],
[4, 5, 6],
[7, 0, 5]])
a = pd.DataFrame(a)
a.sort_values(1, ascending=True).to_numpy()
array([[7, 0, 5], # '1' means sort by second column
[9, 2, 3],
[4, 5, 6]])
prl900
Did the Benchmark, comparing with the accepted answer:
%timeit pandas_df.sort_values(9, ascending=True)
1000 loops, best of 3: 400 µs per loop
%timeit numpy_table[numpy_table[:,9].argsort()]
10000 loops, best of 3: 180 µs per loop
It is an old question but if you need to generalize this to a higher than 2 dimension arrays, here is the solution than can be easily generalized:
np.einsum('ij->ij', a[a[:,1].argsort(),:])
This is an overkill for two dimensions and a[a[:,1].argsort()] would be enough per #steve's answer, however that answer cannot be generalized to higher dimensions. You can find an example of 3D array in this question.
Output:
[[7 0 5]
[9 2 3]
[4 5 6]]
#for sorting along column 1
indexofsort=np.argsort(dataset[:,0],axis=-1,kind='stable')
dataset = dataset[indexofsort,:]
def sort_np_array(x, column=None, flip=False):
x = x[np.argsort(x[:, column])]
if flip:
x = np.flip(x, axis=0)
return x
Array in the original question:
a = np.array([[9, 2, 3],
[4, 5, 6],
[7, 0, 5]])
The result of the sort_np_array function as expected by the author of the question:
sort_np_array(a, column=1, flip=False)
[2]: array([[7, 0, 5],
[9, 2, 3],
[4, 5, 6]])
Thanks to this post: https://stackoverflow.com/a/5204280/13890678
I found a more "generic" answer using structured array.
I think one advantage of this method is that the code is easier to read.
import numpy as np
a = np.array([[9, 2, 3],
[4, 5, 6],
[7, 0, 5]])
struct_a = np.core.records.fromarrays(
a.transpose(), names="col1, col2, col3", formats="i8, i8, i8"
)
struct_a.sort(order="col2")
print(struct_a)
[(7, 0, 5) (9, 2, 3) (4, 5, 6)]
Simply using sort, use column number based on which you want to sort.
a = np.array([1,1], [1,-1], [-1,1], [-1,-1]])
print (a)
a = a.tolist()
a = np.array(sorted(a, key=lambda a_entry: a_entry[0]))
print (a)
I have four time series of the same length. When one element is missing, it has a value of 0.
What is the idiomatic way to find the index of the latest occurrence of 0 in any of the lists?
A functional approach (returns None if 0 is not found anywhere):
timeseries = [
[1, 2, 3, 5],
[4, 3, 0, 2],
[4, 2, 0, 1],
[4, 2, 6, 0],
]
etimeseries = list(enumerate(zip(*timeseries)))
index = next((idx for (idx, xs) in reversed(etimeseries) if not all(xs)), None)
# 3
If timeseries is a list containing the four timeseries, you could reverse each timeseries, use zip to group together 4-tuple time-slices, and enumerate to record the index.
# timeseries=[range(1,5),range(1,5),range(1,5),range(1,5)]
for idx,data in enumerate(zip(*[ts[::-1] for ts in timeseries])):
if not all(data):
break
else:
idx=None
idx=len(timeseries[0])-idx-1
# print(idx)
When the for-loop breaks, idx will hold the value of the index with a timeseries-value of zero. If no value is zero, then idx set to equal None.
def rindex(l,v):
for i,el in enumerate(reversed(l)):
if el == v:
return len(l) - i - 1
return -1
series = [[1,2], [0, 0, 1], ...]
max(rindex(l, 0) for l in series)
timeseries = [ [1, 2, 3, 5], [4, 3, 0, 2], [4, 2, 0, 1], [4, 2, 6, 0] ]
print "timeseries==",timeseries
res = [ (i for i in xrange(len(li)-1,-1,-1) if li[i]==0).next() if 0 in li else None
for li in timeseries ]
print
print 'res==',res
result
timeseries== [[1, 2, 3, 5], [4, 3, 0, 2], [4, 2, 0, 1], [4, 2, 6, 0]]
res== [None, 2, 2, 3]