Sort a numpy array like a table - python

I have a list
[[0, 3], [5, 1], [2, 1], [4, 5]]
which I have made into an array using numpy.array:
[[0 3]
[5 1]
[2 1]
[4 5]]
How do I sort this like a table? In particular, I want to sort by the second column in ascending order and then resolve any ties by having the first column sorted in ascending order. Thus I desire:
[[2 1]
[5 1]
[0 3]
[4 5]]
Any help would be greatly appreciated!

See http://docs.scipy.org/doc/numpy/reference/generated/numpy.lexsort.html#numpy.lexsort
Specifically in your case,
import numpy as np
x = np.array([[0,3],[5,1],[2,1],[4,5]])
x[np.lexsort((x[:,0],x[:,1]))]
outputs
array([[2,1],[5,1],[0,3],[4,5]])

You can use numpy.lexsort():
>>> a = numpy.array([[0, 3], [5, 1], [2, 1], [4, 5]])
>>> a[numpy.lexsort(a.T)]
array([[2, 1],
[5, 1],
[0, 3],
[4, 5]])
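The reason a.T works here is that lexsort treats the last key as the primary one. A minimal sketch of that key ordering, using the same array (the variable names are only for illustration):

```python
import numpy as np

a = np.array([[0, 3], [5, 1], [2, 1], [4, 5]])

# lexsort treats the LAST key as the primary sort key.
# a.T has rows (column 0, column 1), so column 1 is primary here
# and column 0 breaks ties.
by_col1_then_col0 = a[np.lexsort(a.T)]

# Reversing the keys makes column 0 primary and column 1 the tie-breaker.
by_col0_then_col1 = a[np.lexsort(a.T[::-1])]

print(by_col1_then_col0)
print(by_col0_then_col1)
```

For arrays with more than two columns, a.T would make the last column the primary key, so it is probably safer to pass the keys explicitly in that case.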

Another way of doing this: slice out the column you want to sort by, get the sort indices using argsort, then use those indices to index your original array. Note that this sorts by the second column only; it does not use the first column to break ties:
a = np.array([[0, 3], [5, 1], [2, 1], [4, 5]])
subarray = a[:,1] # 3,1,1,5
indices = np.argsort(subarray) # Returns array([1,2,0,3])
result = a[indices]
Or, all in one go:
a[np.argsort(a[:,1])]
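If ties on the second column should be resolved by the first column, one possible variation is to sort twice with a stable algorithm: first by the tie-breaker, then by the primary column. This is only a sketch of that idea, assuming a NumPy version that accepts kind="stable" in argsort:

```python
import numpy as np

a = np.array([[0, 3], [5, 1], [2, 1], [4, 5]])

# Step 1: sort by the tie-breaker (column 0).
a0 = a[np.argsort(a[:, 0], kind="stable")]

# Step 2: stable sort by the primary key (column 1).
# Rows with equal column-1 values keep their column-0 order from step 1.
result = a0[np.argsort(a0[:, 1], kind="stable")]

print(result)
```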

If you want to sort using a single column only (e.g., second column), you can do something like:
from operator import itemgetter
a = [[0, 3], [5, 1], [2, 1], [4, 5]]
a_sorted = sorted(a, key=itemgetter(1))
If there is more than one key, then use numpy.lexsort() as pointed out in the other answers.
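For plain Python lists, several keys can also be handled without NumPy, since itemgetter accepts more than one index. A small sketch, assuming the data stays a list of lists:

```python
from operator import itemgetter

a = [[0, 3], [5, 1], [2, 1], [4, 5]]

# itemgetter(1, 0) returns the tuple (row[1], row[0]), so rows are
# ordered by the second column and ties are broken by the first.
a_sorted = sorted(a, key=itemgetter(1, 0))

print(a_sorted)
```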

Related

pandas fill a dataframe according to column and row value operations

Let's say that I have this dataframe:
,,,,,,
,,2.0,,,,
,2.0,,2.23606797749979,,,
,,2.23606797749979,,,2.0,
,,,,,2.23606797749979,
,,,2.0,2.23606797749979,,
,,,,,,
I would like to get a two dimensional vector with values of the indexes and the columns of each element which is not nan.
For example, in this case, I am expecting:
[[2,1],[1,2],[3,2],[2,3],[5,3],[3,5],[4,5],[5,4]].
I am thinking about using iloc and the np.where functions but I am not able to merge the two concepts.
Use DataFrame.stack to remove the missing values, add Series.swaplevel if necessary, and convert the nested tuples to lists in a list comprehension:
L = [list(y) for y in df.stack().swaplevel().index]
print (L)
[[2, 1], [1, 2], [3, 2], [2, 3], [5, 3], [5, 4], [3, 5], [4, 5]]
Or, if you use the indices returned by np.where, the solution is similar:
r, c = np.where(df.notna())
L = [list(x) for x in zip(c, r)]
print (L)
[[2, 1], [1, 2], [3, 2], [2, 3], [5, 3], [5, 4], [3, 5], [4, 5]]
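For reference, here is a self-contained sketch that rebuilds the example frame and extracts the positions with np.argwhere instead. It assumes the CSV was read without a header, so the row and column labels are the default 0..6; the result is positional, so with other labels it would need mapping through df.index and df.columns.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Assumed reconstruction of the 7x7 frame shown in the question.
df = pd.DataFrame([
    [nan, nan, nan, nan, nan, nan, nan],
    [nan, nan, 2.0, nan, nan, nan, nan],
    [nan, 2.0, nan, 2.23606797749979, nan, nan, nan],
    [nan, nan, 2.23606797749979, nan, nan, 2.0, nan],
    [nan, nan, nan, nan, nan, 2.23606797749979, nan],
    [nan, nan, nan, 2.0, 2.23606797749979, nan, nan],
    [nan, nan, nan, nan, nan, nan, nan],
])

# argwhere returns [row, col] positions of the True entries in row-major
# order; reversing the last axis turns each pair into [col, row].
pairs = np.argwhere(df.notna().to_numpy())[:, ::-1].tolist()

print(pairs)
```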

Numpy - combine two feature arrays but keep original index

I have two feature arrays, e.g.
a = [1, 2, 3]
b = [4, 5, 6]
Now I want to combine these arrays in the following way:
[[1, 4], [2, 5], [3, 6]]
The location in the array corresponds to a timestep. I tried appending and then reshaping, but then I get:
[[1, 2], [3, 4], [5, 6]]
You can use np.dstack to stack your lists depth-wise:
>>> np.dstack([a, b])
array([[[1, 4],
[2, 5],
[3, 6]]])
As noted by @BramVanroy, this does add an unwanted dimension. Two ways around that are to squeeze the result, or to use column_stack instead:
np.dstack([a, b]).squeeze()
# or
np.column_stack([a, b])
Both of which return:
array([[1, 4],
[2, 5],
[3, 6]])
As an alternative to sacuL's reply, you can also simply do
>>> np.array(list(zip(a, b)))
array([[1, 4],
[2, 5],
[3, 6]])
In fact, this is closer to the expected result in terms of the number of dimensions (two, rather than three in sacuL's answer which you still need to .squeeze() to achieve the correct result).
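To see why appending and reshaping went wrong, and one more way to get the pairing, here is a short sketch. np.stack with axis=1 is not mentioned in the answers above; it is offered only as another option.

```python
import numpy as np

a = [1, 2, 3]
b = [4, 5, 6]

# Appending gives [1, 2, 3, 4, 5, 6]; reshaping that to (3, 2) pairs up
# neighbours in the flat sequence, not matching timesteps.
flat_pairs = np.append(a, b).reshape(3, 2)

# Stacking along a new second axis pairs a[i] with b[i].
combined = np.stack([a, b], axis=1)

# The append route also works if you reshape to (2, 3) and transpose.
combined_t = np.append(a, b).reshape(2, 3).T

print(flat_pairs)
print(combined)
```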

Remove duplicate tuples in numpy array (ones directly next to each other)

I am more or less new to python/numpy and I have this problem:
I have numpy arrays in which the first and last tuples are always the same. In between, there are sometimes duplicate tuples (only the ones directly next to each other) that I want to get rid of. The used parenthesis structure should be maintained.
I tried np.unique already, but it changes my original order (which has to be maintained). My sample array looks like this:
myarray = np.array([[[1,1],[1,1],[4,4],[4,4],[2,2],[3,3],[1,1]]])
I need a result that looks like this:
myarray = np.array([[[1,1],[4,4],[2,2],[3,3],[1,1]]])
Thank you in advance for your support!
Compare each element with its predecessor along the second axis (a one-element offset), then use boolean indexing to keep the elements that differ -
myarray[:,np.r_[True,(myarray[0,1:] != myarray[0,:-1]).any(-1)]]
Sample run -
In [42]: myarray
Out[42]:
array([[[1, 1],
[1, 1],
[4, 4],
[4, 4],
[2, 2],
[3, 3],
[1, 1]]])
In [43]: myarray[:,np.r_[True,(myarray[0,1:] != myarray[0,:-1]).any(-1)]]
Out[43]:
array([[[1, 1],
[4, 4],
[2, 2],
[3, 3],
[1, 1]]])
Or with equality comparison and then look for ALL matches -
In [47]: myarray[:,np.r_[True,~((myarray[0,1:] == myarray[0,:-1]).all(-1))]]
Out[47]:
array([[[1, 1],
[4, 4],
[2, 2],
[3, 3],
[1, 1]]])
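The same idea can be wrapped in a small function for readability. This is just a sketch; drop_adjacent_duplicates is a hypothetical helper name, and it assumes the points live in a 2D (n, 2) array such as myarray[0]:

```python
import numpy as np

def drop_adjacent_duplicates(points):
    """Collapse runs of identical consecutive rows (hypothetical helper)."""
    points = np.asarray(points)
    if len(points) == 0:
        return points
    # True where a row differs from the row before it.
    changed = (points[1:] != points[:-1]).any(axis=1)
    # The first row is always kept.
    return points[np.concatenate(([True], changed))]

myarray = np.array([[[1, 1], [1, 1], [4, 4], [4, 4], [2, 2], [3, 3], [1, 1]]])

# Apply to the inner (n, 2) array, then restore the outer dimension.
result = drop_adjacent_duplicates(myarray[0])[np.newaxis]

print(result)
```

Non-adjacent repeats, such as the matching first and last tuples, are left untouched because only neighbouring rows are compared.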

How to find missing combinations/sequences in a 2D array with finite element values

In the case of the set np.array([1, 2, 3]), there are only 9 possible combinations/sequences of its constituent elements: [1, 1], [1, 2], [1, 3], [2, 1], [2, 2], [2, 3], [3, 1], [3, 2], [3, 3].
If we have the following array:
np.array([[1, 1],
          [1, 2],
          [1, 3],
          [2, 2],
          [2, 3],
          [3, 1],
          [3, 2]])
What is the best way, with NumPy/SciPy, to determine that [2, 1] and [3, 3] are missing? Put another way, how do we find the inverse list of sequences (when we know all of the possible element values)? Manually doing this with a couple of for loops is easy to figure out, but that would negate whatever speed gains we get from using NumPy over native Python (especially with larger datasets).
You can generate a list of all possible pairs using itertools.product and collect those that are not in your array:
from itertools import product
pairs = [ [1, 1], [1, 2], [1, 3], [2, 2], [2, 3], [3, 1], [3, 2] ]
allPairs = list(map(list, product([1, 2, 3], repeat=2)))
missingPairs = [ pair for pair in allPairs if pair not in pairs ]
print(missingPairs)
Result:
[[2, 1], [3, 3]]
Note that map(list, ...) is needed to convert the tuples returned by product into lists, so that they can be compared to your list of lists. This can be simplified if your input is already a list of tuples.
This is one way using itertools.product and set.
The trick here is to note that sets may only contain hashable types such as tuples; lists and arrays are not hashable.
import numpy as np
from itertools import product
x = np.array([1, 2, 3])
y = np.array([[1, 1], [1, 2], [1, 3], [2, 2],
[2, 3], [3, 1], [3, 2]])
set(product(x, repeat=2)) - set(map(tuple, y))
{(2, 1), (3, 3)}
If you want to stay in numpy instead of going back to raw python sets, you can do it using void views (based on an answer by @Jaime) and numpy's built-in set functions like in1d:
def vview(a):
    # View each row as a single opaque (void) element so rows compare as units
    return np.ascontiguousarray(a).view(np.dtype((np.void, a.dtype.itemsize * a.shape[1])))
x = np.array([1, 2, 3])
y = np.array([[1, 1], [1, 2], [1, 3], [2, 2],
[2, 3], [3, 1], [3, 2]])
xx = np.array([i.ravel() for i in np.meshgrid(x, x)]).T
xx[~np.in1d(vview(xx), vview(y))]
array([[2, 1],
[3, 3]])
If you want to use numpy methods, try this. First build the array of all possible pairs:
import itertools
import numpy as np
a = np.array([1, 2, 3])
b = np.array([[1, 1],
              [1, 2],
              [1, 3],
              [2, 2],
              [2, 3],
              [3, 1],
              [3, 2]])
c = np.array(list(itertools.product(a, repeat=2)))
Compare the array being tested against the product using broadcasting
d = b == c[:,None,:]
#d.shape is (9,7,2)
Check if both elements of a pair matched
e = np.all(d, -1)
#e.shape is (9,7)
Check if any of the test items match an item of the product.
f = np.any(e, 1)
#f.shape is (9,)
Use f as a boolean index into the product to see what is missing.
>>> print(c[np.logical_not(f)])
[[2 1]
[3 3]]
Every combination corresponds to a number in the range 0..L^2-1, where L=len(array). For example, [2, 2]=>3*(2-1)+(2-1)=4. The -1 offsets arise because the elements start from 1, not from zero. Such a mapping can be considered a natural perfect hash for this kind of data.
If operations on integer sets in NumPy are faster than operations on pairs (for example, an integer set of known size can be represented as a bit sequence), then it is worth traversing the pair list, marking the corresponding bits in the integer set, then looking for the unset ones and retrieving the corresponding pairs.
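A rough sketch of that encoding idea follows. It uses a boolean array rather than actual bits, and np.searchsorted to map values to positions 0..L-1, which assumes the value array is sorted and that every element of the pair list appears in it. The variable names are illustrative only.

```python
import numpy as np

values = np.array([1, 2, 3])            # sorted set of possible elements
pairs = np.array([[1, 1], [1, 2], [1, 3], [2, 2],
                  [2, 3], [3, 1], [3, 2]])
n = len(values)

# Map each element to its position in `values` (0..n-1).
idx = np.searchsorted(values, pairs)

# Encode each pair as a single integer in 0..n*n-1.
codes = idx[:, 0] * n + idx[:, 1]

# Mark the codes that occur in the pair list.
seen = np.zeros(n * n, dtype=bool)
seen[codes] = True

# Decode the codes that were never marked back into pairs.
missing_codes = np.flatnonzero(~seen)
missing = np.column_stack((values[missing_codes // n],
                           values[missing_codes % n]))

print(missing)
```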

How to search in a sorted 2d matrix

I have a list of x, y coordinates:
import numpy as np
a=np.array([[2,1],[1,3],[1,5],[2,3],[3,5]])
that I've sorted with
a=np.sort(a,axis=0)
print(a)
>[[1 3] [1 5] [2 1] [2 3] [3 5]]
I'd like to perform a search:
a.searchsorted([2,1])
>ValueError: object too deep for desired array
Any ideas how to do that?
This may work, if I understand what you are asking:
>>> a = [[1, 3], [1, 5], [2, 1], [2, 3], [3, 5]]
>>> [2, 1] in a
True
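If the data should stay in NumPy, here is one possible sketch. Note that np.sort(a, axis=0) sorts each column independently, which breaks up the (x, y) rows, so this sketch orders whole rows with np.lexsort instead. The binary search goes through Python's bisect on tuples; that is an assumption about what is acceptable here, not a built-in 2D searchsorted.

```python
import bisect
import numpy as np

a = np.array([[2, 1], [1, 3], [1, 5], [2, 3], [3, 5]])

# Order whole rows: x (column 0) is the primary key, y (column 1) breaks ties.
a_sorted = a[np.lexsort((a[:, 1], a[:, 0]))]

target = (2, 1)

# Linear search: indices of the rows equal to the target.
matches = np.flatnonzero((a_sorted == target).all(axis=1))

# Binary search: tuples compare lexicographically, like the row order above.
rows = [tuple(r) for r in a_sorted.tolist()]
pos = bisect.bisect_left(rows, target)
found = pos < len(rows) and rows[pos] == target

print(a_sorted)
print(matches, pos, found)
```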
