process some iterations on python ndarray - python

I created two sorted ndarrays of the same length and joined them via vstack().
I refer to my array in the following as:
[[x1 y1][x2 y2][x3 y3][x4 y4]].
However, in reality I have a different value for x in every entry but only a few different values for y ascending from 0 to n.
So I got something like this:
[[x1 0],[x2 0],[x3 0],[x4 1],[x5 1],[x6 2],[x7 2],[x8 2][x9 3][x10 3]...]
My goal is to create a loop to get every first and last x-value for all the different y-values. So that the loop returns x1 and x3 (y == 0), x4 and x5 (y == 1), x6 and x8 (y == 2) and so on.
I am currently trying an ugly solution: creating sub-arrays for each distinct y-value so that I can take the first and last element of each to get the x-values I need. I was wondering what the most efficient or Pythonic way to achieve this would be.

You can do it using two list comprehensions. In the first one, use itertools.groupby() to group the sublists by their second element; in the second, pick the first and last item of each group.
>>> from itertools import groupby
>>> from operator import itemgetter
>>>
>>> groups = [list(g) for _, g in groupby(lst, key=itemgetter(1))]
>>>
>>> [sub if len(sub)<2 else [sub[0], sub[-1]] for sub in groups]
[[['x1', 0], ['x3', 0]], [['x4', 1], ['x5', 1]], [['x6', 2], ['x8', 2]], [['x9', 3], ['x10', 3]]]
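The snippet above relies on an lst that is not defined in the answer. As a minimal self-contained sketch, assuming lst is a plain list of [x, y] pairs already grouped by y (the sample data below is made up to mirror the question), the same groupby idea can collect just the first and last x per y:

```python
from itertools import groupby
from operator import itemgetter

# Assumed sample data mirroring the question: [x, y] pairs, grouped by y.
lst = [['x1', 0], ['x2', 0], ['x3', 0], ['x4', 1], ['x5', 1],
       ['x6', 2], ['x7', 2], ['x8', 2], ['x9', 3], ['x10', 3]]

result = {}
for y, group in groupby(lst, key=itemgetter(1)):
    rows = list(group)                      # materialise the group iterator
    result[y] = (rows[0][0], rows[-1][0])   # first and last x for this y

print(result)
```

groupby only merges consecutive equal keys, so this depends on the y values being sorted (or at least contiguous), as they are in the question.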

A defaultdict is a nice way of collecting values like this.
Define the array (wish I could simply have copy-n-pasted):
In [186]: A=np.array([[1, 0],[2, 0],[3, 0],[4 ,1],[5 ,1],[6, 2],[7, 2],[8 ,2],[9 ,3],[10 ,3]])
In [187]: A
Out[187]:
array([[ 1, 0],
[ 2, 0],
[ 3, 0],
[ 4, 1],
[ 5, 1],
[ 6, 2],
[ 7, 2],
[ 8, 2],
[ 9, 3],
[10, 3]])
Make the dictionary, default value of list(), and append the array row:
In [188]: from collections import defaultdict
In [189]: dd = defaultdict(list)
In [190]: for row in A:
   .....:     dd[row[1]].append(row)
   .....:
In [191]: dd
Out[191]: defaultdict(<class 'list'>, {0: [array([1, 0]), array([2, 0]), array([3, 0])], 1: [array([4, 1]), array([5, 1])], 2: [array([6, 2]), array([7, 2]), array([8, 2])], 3: [array([9, 3]), array([10, 3])]})
I can extract the 1st and last values into another dictionary:
In [192]: {key:[value[0],value[-1]] for key,value in dd.items()}
Out[192]:
{0: [array([1, 0]), array([3, 0])],
1: [array([4, 1]), array([5, 1])],
2: [array([6, 2]), array([8, 2])],
3: [array([9, 3]), array([10, 3])]}
Or I could have collected values in lists, etc., or a 3d array
In [195]: np.array([np.array([value[0],value[-1]]) for key,value in dd.items()])
Out[195]:
array([[[ 1, 0],
[ 3, 0]],
[[ 4, 1],
[ 5, 1]],
[[ 6, 2],
[ 8, 2]],
[[ 9, 3],
[10, 3]]])
itertools.groupby is nice, and may be faster. But you need to be comfortable with generators.
If the y values are sorted, you could find the values where value changes, and use those indices to split the array.
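The last suggestion can be sketched roughly as follows, assuming the y column is already sorted. np.diff marks the positions where y changes, and those positions give the start and end index of each run. The variable names are my own.

```python
import numpy as np

A = np.array([[1, 0], [2, 0], [3, 0], [4, 1], [5, 1],
              [6, 2], [7, 2], [8, 2], [9, 3], [10, 3]])

y = A[:, 1]
# Indices where a new run of y starts (requires y to be sorted/grouped).
starts = np.flatnonzero(np.diff(y)) + 1
first_idx = np.concatenate(([0], starts))
last_idx = np.concatenate((starts - 1, [len(y) - 1]))

# One row per distinct y: [first x, last x]
first_last = np.column_stack((A[first_idx, 0], A[last_idx, 0]))
print(first_last)
```

This avoids a Python-level loop over rows, which may matter for large arrays.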


Unexpected result from Numpy Matrix insert, How does this work?

My goal was to insert a column to the right on a numpy matrix. However, I found that the code I was using is putting in two columns rather than just one.
# This one results in a 4x1 matrix, as expected
np.insert(np.matrix([[0],[0]]), 1, np.matrix([[0],[0]]), 0)
>>>matrix([[0],
[0],
[0],
[0]])
# I would expect this line to return a 2x2 matrix, but it returns a 2x3 matrix instead.
np.insert(np.matrix([[0],[0]]), 1, np.matrix([[0],[0]]), 1)
>>>matrix([[0, 0, 0],
[0, 0, 0]])
Why do I get the above, in the second example, instead of [[0,0], [0,0]]?
While new use of np.matrix is discouraged, we get the same result with np.array:
In [41]: np.insert(np.array([[1],[2]]),1, np.array([[10],[20]]), 0)
Out[41]:
array([[ 1],
[10],
[20],
[ 2]])
In [42]: np.insert(np.array([[1],[2]]),1, np.array([[10],[20]]), 1)
Out[42]:
array([[ 1, 10, 20],
[ 2, 10, 20]])
In [44]: np.insert(np.array([[1],[2]]),1, np.array([10,20]), 1)
Out[44]:
array([[ 1, 10],
[ 2, 20]])
Insert as [1]:
In [46]: np.insert(np.array([[1],[2]]),[1], np.array([[10],[20]]), 1)
Out[46]:
array([[ 1, 10],
[ 2, 20]])
In [47]: np.insert(np.array([[1],[2]]),[1], np.array([10,20]), 1)
Out[47]:
array([[ 1, 10, 20],
[ 2, 10, 20]])
np.insert is a complex function written in Python. So we need to look at that code, and see how values are being mapped on the target space.
The docs elaborate on the difference between insert at 1 and [1]. But off hand I don't see an explanation of how the shape of values matters.
Difference between sequence and scalars:
>>> np.insert(a, [1], [[1],[2],[3]], axis=1)
array([[1, 1, 1],
[2, 2, 2],
[3, 3, 3]])
>>> np.array_equal(np.insert(a, 1, [1, 2, 3], axis=1),
... np.insert(a, [1], [[1],[2],[3]], axis=1))
True
When adding an array at the end of another, I'd use concatenate (or one of its stack variants) rather than insert. None of these operate in-place.
In [48]: np.concatenate([np.array([[1],[2]]), np.array([[10],[20]])], axis=1)
Out[48]:
array([[ 1, 10],
[ 2, 20]])
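To put the cases from this answer side by side, here is a small sketch that only compares result shapes. The expected shapes are taken from the outputs shown above; the variable names are mine.

```python
import numpy as np

a = np.array([[1], [2]])
col = np.array([[10], [20]])   # shape (2, 1)
flat = np.array([10, 20])      # shape (2,)

r1 = np.insert(a, 1, flat, axis=1)     # scalar index, 1-D values
r2 = np.insert(a, [1], col, axis=1)    # sequence index, column values
r3 = np.insert(a, 1, col, axis=1)      # scalar index, column values
r4 = np.concatenate([a, col], axis=1)  # plain append on the right

for name, r in [("r1", r1), ("r2", r2), ("r3", r3), ("r4", r4)]:
    print(name, r.shape)
```

Only the scalar-index-plus-column combination (r3) yields the unexpected extra column; the other three give the 2x2 result the question was after.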

How to set individual indices in Numpy arrays

I am trying to use arrays to set values in other arrays. Unfortunately instead of setting a value it is somehow overwriting a bunch of values. What is going on, and how can I achieve what I want?
>>> target = np.array( [ [0,1],[1,2],[2,3] ])
>>> target
array([[0, 1],
[1, 2],
[2, 3]])
>>> actions = np.array([0,0,0])
>>> target[actions] #The first row, 3 times
array([[0, 1],
[0, 1],
[0, 1]])
>>> target[:,actions] #The first column, 3 times
array([[0, 0, 0],
[1, 1, 1],
[2, 2, 2]])
>>> values = np.array([7,8,9])
>>> target[:,actions] = values #why isnt this working?
>>> target
array([[9, 1],
[9, 2],
[9, 3]])
#Actually want
#array([[7, 1],
# [8, 2],
# [9, 3]])
>>> target = np.array( [ [0,1],[1,2],[2,3] ]) #reset to original value
>>> actions = np.array([0,1,0])
>>> target[:,actions] = values.reshape(3, 1)
>>> target
array([[7, 7],
[8, 8],
[9, 9]])
#Actually want
#array([[7, 1],
# [1, 8],
# [9, 3]])
target[:,actions] selects the same column of target thrice.
When you say target[:,actions] = values, values is broadcast across the three selected columns, which are all the same column. So what you are doing is:
Assign 7 to every element of the column.
Assign 8 to every element of the column.
Assign 9 to every element of the column.
So you end up with 9 in every element of the column.
If you insist on this awkward triple-writing of data, you can fix it by transposing the write:
target[:,actions] = values.reshape(3, 1)
This will write [7,8,9] to the column, three times. Obviously that's wasteful, and you could do this instead:
target[:,actions[-1]] = values
The effect should be the same, and it saves computation.
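One way to see what is being written is to look at how the right-hand side is broadcast to the (3, 3) shape of target[:, [0, 0, 0]]. This is only an illustrative sketch using np.broadcast_to; it is not part of the original answer.

```python
import numpy as np

values = np.array([7, 8, 9])

# A (3,) source varies along the selected columns:
# every row writes 7, then 8, then 9 into the same element.
as_row = np.broadcast_to(values, (3, 3))

# A (3, 1) source varies along the rows:
# row i writes values[i] three times into the same element.
as_col = np.broadcast_to(values.reshape(3, 1), (3, 3))

print(as_row)
print(as_col)

target = np.array([[0, 1], [1, 2], [2, 3]])
target[:, [0, 0, 0]] = values.reshape(3, 1)
print(target)
```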
2 ways to write [7,8,9] to the first column:
basic indexing (with slice):
In [396]: target[:,0] = [7,8,9] # all rows, 1st column
In [397]: target
Out[397]:
array([[7, 1],
[8, 2],
[9, 3]])
Advanced indexing (with 2 lists)
In [398]: target[[0,1,2],[0,0,0]] = [7,8,9] # pair [0,0],[1,0],[2,0]
In [399]: target
Out[399]:
array([[7, 1],
[8, 2],
[9, 3]])
The 2nd method also works for a mix of columns:
In [400]: target = np.array( [ [0,1],[1,2],[2,3] ])
In [401]: target[[0,1,2],[0,1,0]] = [7,8,9]
In [402]: target
Out[402]:
array([[7, 1],
[1, 8],
[9, 3]])
Broadcasting comes into play. In a case like this there are 3 potential arrays to broadcast: the 2 index arrays (one per dimension) and the source array.
Advanced indexing like this produces a 1d array. So the source array has to match:
In [403]: target[[0,1,2],[0,1,0]]
Out[403]: array([7, 8, 9])
A (1,3) can broadcast to (3,), but a (3,1) can't:
In [404]: target[[0,1,2],[0,1,0]] = np.array([[7,8,9]])
In [405]: target[[0,1,2],[0,1,0]] = np.array([[7,8,9]]).T
...
ValueError: shape mismatch: value array of shape (3,1) could not be broadcast to indexing result of shape (3,)
This sort of indexing is unusual. Note that the result is (3,3).
In [412]: target[:,[0,0,0]]
Out[412]:
array([[0, 0, 0],
[1, 1, 1],
[2, 2, 2]])
A (3,1) source:
In [413]: np.array([[7,8,9]]).T
Out[413]:
array([[7],
[8],
[9]])
In [414]: target[:,[0,0,0]] = _
In [415]: target
Out[415]:
array([[7, 1],
[8, 2],
[9, 3]])
The (3,1) can broadcast to (3,3). It works, but ends up assigning [7,8,9] 3 times, all to the same 0 column.
Another way of assigning the 1st column:
In [423]: target[np.ix_([0,1,2],[0,0,0])]
Out[423]:
array([[0, 0, 0],
[1, 1, 1],
[2, 2, 2]])
Again a (3,3), which accepts a (3,1):
In [424]: target[np.ix_([0,1,2],[0,0,0])] = np.array([[7,8,9]]).T
In [425]: target
Out[425]:
array([[7, 1],
[8, 2],
[9, 3]])
ix_ makes 2 arrays that can broadcast against each other, in this case a column vector and a row one:
In [426]: np.ix_([0,1,2],[0,0,0])
Out[426]:
(array([[0],
[1],
[2]]), array([[0, 0, 0]]))
I can select all elements of target with:
In [430]: target[np.ix_([0,1,2],[0,1])]
Out[430]:
array([[0, 1],
[1, 2],
[2, 3]])
and in a jumbled order:
In [431]: target[np.ix_([2,0,1],[1,0])]
Out[431]:
array([[3, 2],
[1, 0],
[2, 1]])
I couldn't get it to work using : indexing; however, the following works by using an array of row indices. I'm not sure why the : method is not working. If someone can come up with a way to fix that, I will accept it instead.
>>> target = np.array( [ [0,1],[1,2],[2,3] ])
>>> rows = np.arange(target.shape[0])
>>> actions = np.array([0,1,0])
>>> values = np.array([7,8,9])
>>> target[rows,actions] = values
>>> target
array([[7, 1],
[1, 8],
[9, 3]])
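As an alternative sketch (not from the answers above), np.put_along_axis expresses the same "one column per row" assignment without building the row index array by hand. It expects the index and value arrays to have the same number of dimensions as target, hence the [:, None].

```python
import numpy as np

target = np.array([[0, 1], [1, 2], [2, 3]])
actions = np.array([0, 1, 0])   # column to write in each row
values = np.array([7, 8, 9])    # value to write in each row

# For every row i: target[i, actions[i]] = values[i]
np.put_along_axis(target, actions[:, None], values[:, None], axis=1)
print(target)
```

Like the rows/actions version, this modifies target in place.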

How do I sort a 2D NumPy array? [duplicate]

How do I sort a NumPy array by its nth column?
For example, given:
a = array([[9, 2, 3],
[4, 5, 6],
[7, 0, 5]])
I want to sort the rows of a by the second column to obtain:
array([[7, 0, 5],
[9, 2, 3],
[4, 5, 6]])
To sort by the second column of a:
a[a[:, 1].argsort()]
@steve's answer is actually the most elegant way of doing it.
For the "correct" way see the order keyword argument of numpy.ndarray.sort
However, you'll need to view your array as an array with fields (a structured array).
The "correct" way is quite ugly if you didn't initially define your array with fields...
As a quick example, to sort it and return a copy:
In [1]: import numpy as np
In [2]: a = np.array([[1,2,3],[4,5,6],[0,0,1]])
In [3]: np.sort(a.view('i8,i8,i8'), order=['f1'], axis=0).view(np.int64)
Out[3]:
array([[0, 0, 1],
[1, 2, 3],
[4, 5, 6]])
To sort it in-place:
In [6]: a.view('i8,i8,i8').sort(order=['f1'], axis=0) #<-- returns None
In [7]: a
Out[7]:
array([[0, 0, 1],
[1, 2, 3],
[4, 5, 6]])
@Steve's really is the most elegant way to do it, as far as I know...
The only advantage to this method is that the "order" argument is a list of the fields to sort by. For example, you can sort by the second column, then the third column, then the first column by supplying order=['f1','f2','f0'].
You can sort on multiple columns as per Steve Tjoa's method by using a stable sort like mergesort and sorting the indices from the least significant to the most significant columns:
a = a[a[:,2].argsort()] # First sort doesn't need to be stable.
a = a[a[:,1].argsort(kind='mergesort')]
a = a[a[:,0].argsort(kind='mergesort')]
This sorts by column 0, then 1, then 2.
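As a quick sanity check of this idea, the sketch below (with made-up random data) compares the chained stable sorts against np.lexsort, which takes its keys with the most significant one last.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.integers(0, 3, size=(20, 3))   # small made-up table with many ties

# Chained stable sorts: least significant column first.
s = a[a[:, 2].argsort(kind='stable')]
s = s[s[:, 1].argsort(kind='stable')]
s = s[s[:, 0].argsort(kind='stable')]

# lexsort: the LAST key is the primary one.
t = a[np.lexsort((a[:, 2], a[:, 1], a[:, 0]))]

print(np.array_equal(s, t))
```

Both give the rows in lexicographic order by column 0, then 1, then 2.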
In case someone wants to make use of sorting at a critical part of their programs here's a performance comparison for the different proposals:
import numpy as np
table = np.random.rand(5000, 10)
%timeit table.view('f8,f8,f8,f8,f8,f8,f8,f8,f8,f8').sort(order=['f9'], axis=0)
1000 loops, best of 3: 1.88 ms per loop
%timeit table[table[:,9].argsort()]
10000 loops, best of 3: 180 µs per loop
import pandas as pd
df = pd.DataFrame(table)
%timeit df.sort_values(9, ascending=True)
1000 loops, best of 3: 400 µs per loop
So, it looks like indexing with argsort is the quickest method so far...
From the NumPy mailing list, here's another solution:
>>> a
array([[1, 2],
[0, 0],
[1, 0],
[0, 2],
[2, 1],
[1, 0],
[1, 0],
[0, 0],
[1, 0],
[2, 2]])
>>> a[np.lexsort(np.fliplr(a).T)]
array([[0, 0],
[0, 0],
[0, 2],
[1, 0],
[1, 0],
[1, 0],
[1, 0],
[1, 2],
[2, 1],
[2, 2]])
As the Python documentation wiki suggests:
a = [[1, 2, 3], [4, 5, 6], [0, 0, 1]]
a = sorted(a, key=lambda a_entry: a_entry[1])
print(a)
Output:
[[0, 0, 1], [1, 2, 3], [4, 5, 6]]
I had a similar problem.
My Problem:
I want to calculate an SVD and need to sort my eigenvalues in descending order. But I want to keep the mapping between eigenvalues and eigenvectors.
My eigenvalues were in the first row and the corresponding eigenvector below it in the same column.
So I want to sort a two-dimensional array column-wise by the first row in descending order.
My Solution
a = a[::, a[0,].argsort()[::-1]]
So how does this work?
a[0,] is just the first row I want to sort by.
Now I use argsort to get the order of indices.
I use [::-1] because I need descending order.
Lastly I use a[::, ...] to get the array with the columns in the right order (indexing with an index array returns a copy, not a view).
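Here is a minimal sketch of the same reordering on a concrete case. It assumes the eigenvalues and eigenvectors come from np.linalg.eigh and are kept in two separate arrays rather than stacked into one as in the answer; the matrix m is made up.

```python
import numpy as np

m = np.array([[2.0, 1.0],
              [1.0, 3.0]])          # small symmetric example matrix

w, v = np.linalg.eigh(m)            # eigenvalues w, eigenvectors in the columns of v
order = np.argsort(w)[::-1]         # indices for descending eigenvalues

w_sorted = w[order]
v_sorted = v[:, order]              # reorder columns with the same indices

# Each column is still paired with its own eigenvalue.
print(np.allclose(m @ v_sorted, v_sorted * w_sorted))
```

Using one index array for both the values and the columns is what keeps the mapping intact.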
import numpy as np
a=np.array([[21,20,19,18,17],[16,15,14,13,12],[11,10,9,8,7],[6,5,4,3,2]])
y=np.argsort(a[:,2],kind='mergesort')# a[:,2]=[19,14,9,4]
a=a[y]
print(a)
Desired output is [[6,5,4,3,2],[11,10,9,8,7],[16,15,14,13,12],[21,20,19,18,17]]
Note that argsort(numArray) returns the indices that would put numArray in sorted order.
Example:
x=np.array([8,1,5])
z=np.argsort(x) #[1,2,0] are the **indices that sort the array**
print(x[z]) #integer (fancy) indexing, which reorders the array using the indices saved in z
The result is [1,5,8].
A little more complicated lexsort example - descending on the 1st column, secondarily ascending on the 2nd. The tricks with lexsort are that it sorts on rows (hence the .T), and gives priority to the last.
In [120]: b=np.array([[1,2,1],[3,1,2],[1,1,3],[2,3,4],[3,2,5],[2,1,6]])
In [121]: b
Out[121]:
array([[1, 2, 1],
[3, 1, 2],
[1, 1, 3],
[2, 3, 4],
[3, 2, 5],
[2, 1, 6]])
In [122]: b[np.lexsort(([1,-1]*b[:,[1,0]]).T)]
Out[122]:
array([[3, 1, 2],
[3, 2, 5],
[2, 1, 6],
[2, 3, 4],
[1, 1, 3],
[1, 2, 1]])
Here is another solution considering all columns (more compact way of J.J's answer);
ar=np.array([[0, 0, 0, 1],
[1, 0, 1, 0],
[0, 1, 0, 0],
[1, 0, 0, 1],
[0, 0, 1, 0],
[1, 1, 0, 0]])
Sort with lexsort,
ar[np.lexsort(([ar[:, i] for i in range(ar.shape[1]-1, -1, -1)]))]
Output:
array([[0, 0, 0, 1],
[0, 0, 1, 0],
[0, 1, 0, 0],
[1, 0, 0, 1],
[1, 0, 1, 0],
[1, 1, 0, 0]])
Pandas Approach Just For Completeness:
a = np.array([[9, 2, 3],
[4, 5, 6],
[7, 0, 5]])
a = pd.DataFrame(a)
a.sort_values(1, ascending=True).to_numpy()
array([[7, 0, 5], # '1' means sort by second column
[9, 2, 3],
[4, 5, 6]])
prl900 did the benchmark, comparing with the accepted answer:
%timeit pandas_df.sort_values(9, ascending=True)
1000 loops, best of 3: 400 µs per loop
%timeit numpy_table[numpy_table[:,9].argsort()]
10000 loops, best of 3: 180 µs per loop
It is an old question, but if you need to generalize this to arrays with more than two dimensions, here is a solution that can be easily generalized:
np.einsum('ij->ij', a[a[:,1].argsort(),:])
This is overkill for two dimensions, and a[a[:,1].argsort()] would be enough per @steve's answer; however, that answer cannot be generalized to higher dimensions. You can find an example of a 3D array in this question.
Output:
[[7 0 5]
[9 2 3]
[4 5 6]]
# sort by the first column (index 0)
indexofsort=np.argsort(dataset[:,0],axis=-1,kind='stable')
dataset = dataset[indexofsort,:]
def sort_np_array(x, column=None, flip=False):
    x = x[np.argsort(x[:, column])]
    if flip:
        x = np.flip(x, axis=0)
    return x
Array in the original question:
a = np.array([[9, 2, 3],
[4, 5, 6],
[7, 0, 5]])
The result of the sort_np_array function as expected by the author of the question:
sort_np_array(a, column=1, flip=False)
[2]: array([[7, 0, 5],
[9, 2, 3],
[4, 5, 6]])
Thanks to this post: https://stackoverflow.com/a/5204280/13890678
I found a more "generic" answer using structured array.
I think one advantage of this method is that the code is easier to read.
import numpy as np
a = np.array([[9, 2, 3],
[4, 5, 6],
[7, 0, 5]])
struct_a = np.rec.fromarrays(
    a.transpose(), names="col1, col2, col3", formats="i8, i8, i8"
)
struct_a.sort(order="col2")
print(struct_a)
[(7, 0, 5) (9, 2, 3) (4, 5, 6)]
Simply use sorted, with the index of the column you want to sort by as the key.
a = np.array([[1,1], [1,-1], [-1,1], [-1,-1]])
print (a)
a = a.tolist()
a = np.array(sorted(a, key=lambda a_entry: a_entry[0]))
print (a)

Operation on numpy arrays contain rows with different size

I have two lists, looking like this:
a= [[1,2,3,4], [2,3,4,5],[3,4,5,6,7]], b= [[5,6,7,8], [9,1,2,3], [4,5,6,7,8]]
which I want to subtract from each other element by element for an Output like this:
a-b= [[-4,-4,-4,-4],[-7,2,2,2],[-1,-1,-1,-1,-1]]
To do so, I convert each of a and b to an array and subtract them using:
np.array(a)-np.array(b)
The Output just gives me the error:
Unsupported Operand type for-: 'list' and 'list'
What am I doing wrong? Shouldn't the np.array command ensure the conversion to the array?
Here is a Numpythonic way:
>>> y = list(map(len, a))  # length of each sublist
>>> a = np.hstack(a)
>>> b = np.hstack(b)
>>> np.split(a-b, np.cumsum(y)[:-1])  # split points are the ends of all but the last sublist
[array([-4, -4, -4, -4]), array([-7, 2, 2, 2]), array([-1, -1, -1, -1, -1])]
>>>
Since you cannot subtract the arrays with different shapes, you can flatten your arrays using np.hstack() then subtract your flattened arrays then reshape based on the previous shape.
You can try:
>>> a= [[1,2,3,4], [2,3,4,5],[3,4,5,6,7]]
>>> b= [[5,6,7,8], [9,1,2,3], [4,5,6,7,8]]
>>>
>>> c =[]
>>> for i in range(len(a)):
...     c.append([A - B for A, B in zip(a[i], b[i])])
...
>>> print(c)
[[-4, -4, -4, -4], [-7, 2, 2, 2], [-1, -1, -1, -1, -1]]
Or
2nd method is using map:
from operator import sub
a= [[1,2,3,4], [2,3,4,5],[3,4,5,6,7]]
b= [[5,6,7,8], [9,1,2,3], [4,5,6,7,8]]
c =[]
for i in range(len(a)):
    c.append(list(map(sub, a[i], b[i])))  # list() is needed on Python 3, where map is lazy
print(c)
[[-4, -4, -4, -4], [-7, 2, 2, 2], [-1, -1, -1, -1, -1]]
The dimensions of your two arrays don't match, i.e. the first two sublists of a have 4 elements, but the third has 5, and ditto with b. If you convert the lists to numpy arrays, older versions of numpy silently give you an object array like this (NumPy 1.24 and later raise a ValueError instead, unless you pass dtype=object):
In [346]: aa = np.array(a)
In [347]: bb = np.array(b)
In [348]: aa
Out[348]: array([[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6, 7]], dtype=object)
In [349]: bb
Out[349]: array([[5, 6, 7, 8], [9, 1, 2, 3], [4, 5, 6, 7, 8]], dtype=object)
You need to make sure that all your sublists have the same number of elements, then your code will work:
In [350]: a = [[1,2,3,4], [2,3,4,5],[3,4,5,6]]; b = [[5,6,7,8], [9,1,2,3], [4,5,6,7]] # I removed the last element of third sublist in a and b
In [351]: np.array(a) - np.array(b)
Out[351]:
array([[-4, -4, -4, -4],
[-7, 2, 2, 2],
[-1, -1, -1, -1]])
Without NumPy:
result = []
for m, n in zip(a, b):
    result.append([i - j for i, j in zip(m, n)])
See also this question and this one.
What about a custom function such as:
import numpy as np
a = [[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6, 7]]
b = [[5, 6, 7, 8], [9, 1, 2, 3], [4, 5, 6, 7, 8]]
def np_substract(l1, l2):
    # dtype=object keeps the rows of different lengths as separate arrays
    return np.array([np.array(l1[i]) - np.array(l2[i]) for i in range(len(l1))], dtype=object)
print(np_substract(a, b))
You are getting the error because your code is trying to subtract one sublist from another. If you want to make it work, you can subtract the sublists one pair at a time:
import numpy as np
a= [[1,2,3,4], [2,3,4,5],[3,4,5,6,7]]
b= [[5,6,7,8], [9,1,2,3], [4,5,6,7,8]]
#You can apply different condition here, like (if (len(a) == len(b)), then only run the following code
subList = []
for each in range(len(a)):
    diff = np.array(a[each]) - np.array(b[each])
    # convert the output array back into a list
    subList.append(diff.tolist())
print(subList)
A nested list comprehension will do the job:
In [102]: [[i2-j2 for i2,j2 in zip(i1,j1)] for i1,j1 in zip(a,b)]
Out[102]: [[-4, -4, -4, -4], [-7, 2, 2, 2], [-1, -1, -1, -1, -1]]
The problem with np.array(a)-np.array(b) is that the sublists differ in length, so the resulting arrays are object type - arrays of lists
In [104]: np.array(a)
Out[104]: array([[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6, 7]], dtype=object)
Subtraction is iterating over the outer array just fine, but hitting a problem when subtracting one sublist from another - hence the error message.
If I made the inputs arrays of arrays, the subtraction will work
In [106]: np.array([np.array(a1) for a1 in a])
Out[106]: array([array([1, 2, 3, 4]), array([2, 3, 4, 5]), array([3, 4, 5, 6, 7])], dtype=object)
In [107]: aa=np.array([np.array(a1) for a1 in a])
In [108]: bb=np.array([np.array(a1) for a1 in b])
In [109]: aa-bb
Out[109]:
array([array([-4, -4, -4, -4]),
array([-7, 2, 2, 2]),
array([-1, -1, -1, -1, -1])], dtype=object)
You can't count on array operations working on object dtype arrays. But in this case, subtraction is defined for the subarrays, so it can handle the nesting.
Another way to do the nesting is use np.subtract. This is a ufunc version of - and will apply np.asarray to its inputs as needed:
In [103]: [np.subtract(i1,j1) for i1,j1 in zip(a,b)]
Out[103]: [array([-4, -4, -4, -4]), array([-7, 2, 2, 2]), array([-1, -1, -1, -1, -1])]
Notice that these array calculations return arrays or a list of arrays. Turning the inner arrays back to lists requires iteration.
If you are starting with lists, converting to arrays often does not save time. Array calculations can be faster, but that doesn't compensate for the overhead in creating the arrays in the first place.
If I pad the inputs to equal length, then the simple array subtraction works, creating a 2d array.
In [116]: ao= [[1,2,3,4,0], [2,3,4,5,0],[3,4,5,6,7]]; bo= [[5,6,7,8,0], [9,1,2,3,0], [4,5,6,7,8]]
In [117]: np.array(ao)-np.array(bo)
Out[117]:
array([[-4, -4, -4, -4, 0],
[-7, 2, 2, 2, 0],
[-1, -1, -1, -1, -1]])
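The padding step was done by hand above. One possible way to automate it is itertools.zip_longest, sketched here with a hypothetical helper pad_rows (the name and the fill value 0 are my own choices, not from the answer):

```python
import numpy as np
from itertools import zip_longest

a = [[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6, 7]]
b = [[5, 6, 7, 8], [9, 1, 2, 3], [4, 5, 6, 7, 8]]

def pad_rows(rows, fill=0):
    """Hypothetical helper: pad ragged rows with `fill`, return a 2-D array."""
    # zip_longest yields padded columns, so transpose back to rows.
    return np.array(list(zip_longest(*rows, fillvalue=fill))).T

diff = pad_rows(a) - pad_rows(b)
print(diff)
```

The padded positions come out as 0 in the difference, so they need to be ignored or trimmed afterwards if the original ragged shape matters.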

Inserting a row into a NumPy array

I have :
A = np.array([[0,1,1],[0,3,2],[1,1,1],[1,5,2]])
where the NumPy array is sorted based on first element and then second element and so on.
I want to insert [1,4,10] into A,such that the output would be :
A = array([[0,1,1],[0,3,2],[1,1,1],[1,4,10],[1,5,2]])
How should I do it?
First off, stack the new 1D array as the last row with np.vstack -
B = np.vstack((A,[1,4,10]))
Now, to respect the precedence order (first element, then second, and so on for each row), treat each row as an indexing tuple and get the indices that sort those tuples. The tuples can be collapsed to scalars with np.ravel_multi_index(B.T,B.max(0)+1), which requires non-negative integer entries. Then use the sorted indices to rearrange the rows of B and get the desired output. Thus, the final code would be -
out = B[np.ravel_multi_index(B.T,B.max(0)+1).argsort()]
There is an alternative with np.lexsort that respects the same precedence but in the opposite sense: it gives priority to the last key. So we reverse the order of the elements in each row, apply lexsort, and use the resulting sorted indices to index into B just like in the previous approach. The alternative final code with np.lexsort would be -
out = B[np.lexsort(B[:,::-1].T)]
Sample run -
In [60]: A
Out[60]:
array([[0, 1, 1],
[0, 3, 2],
[1, 1, 1],
[1, 5, 2]])
In [61]: B = np.vstack((A,[1,4,10]))
In [62]: B
Out[62]:
array([[ 0, 1, 1],
[ 0, 3, 2],
[ 1, 1, 1],
[ 1, 5, 2],
[ 1, 4, 10]]) # <= New row
In [63]: B[np.ravel_multi_index(B.T,B.max(0)+1).argsort()]
Out[63]:
array([[ 0, 1, 1],
[ 0, 3, 2],
[ 1, 1, 1],
[ 1, 4, 10], # <= New row moved here
[ 1, 5, 2]])
In [64]: B[np.lexsort(B[:,::-1].T)]
Out[64]:
array([[ 0, 1, 1],
[ 0, 3, 2],
[ 1, 1, 1],
[ 1, 4, 10], # <= New row moved here
[ 1, 5, 2]])
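Since A is already sorted, another option is to find the insertion position directly instead of re-sorting everything. The sketch below is a different technique from the answer above: it uses the standard-library bisect module on the rows converted to lists, relying on Python comparing lists lexicographically.

```python
import bisect
import numpy as np

A = np.array([[0, 1, 1], [0, 3, 2], [1, 1, 1], [1, 5, 2]])
new_row = [1, 4, 10]

# Python lists compare element by element, matching the row ordering of A.
idx = bisect.bisect_left(A.tolist(), new_row)

# Insert the new row before position idx along axis 0 (returns a new array).
out = np.insert(A, idx, new_row, axis=0)
print(out)
```

The tolist() conversion costs a full pass over A, so for very large arrays the sort-based versions above may still be preferable.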
