append an element to every row of a jagged array - python

I am trying to add a 0 to every row of a jagged array.
I want to go from
<JaggedArray [[1 2 3] [1 2]]>
to
<JaggedArray [[1 2 3 0] [1 2 0]]>
so that when I grab the -1th index, I get 0. Currently I'm padding every row to the length of the biggest row + 1, then filling nans with 0, which works, but I am wondering if there's a better way.
I saw that there's a class AppendableArray that has a .append() function, but I'm not sure how to convert between the two.
I'm using awkward 0.12.22, and the data is read out of a ROOT file with uproot 3.11.0

Perhaps this is too short to be an answer, but
Upgrade to Awkward 1.x (you can still import awkward0 and use ak.from_awkward0 and ak.to_awkward0 to go back and forth in the same process).
Create an array of single-item lists, perhaps in NumPy (ak.from_numpy), perhaps by slicing a one-dimensional array with np.newaxis.
Concatenate it with your other array using ak.concatenate with axis=1. The first dimension needs to be the same (len of both arrays must be equal), but the second dimensions are unconstrained.

Related

Why do axes transpose upon indexing? [duplicate]

I have a 5 dimension array like this
a=np.random.randint(10,size=[2,3,4,5,600])
a.shape #(2,3,4,5,600)
I want to get the first element of the 2nd dimension, and several elements of the last dimension
b=a[:,0,:,:,[1,3,5,30,17,24,30,100,120]]
b.shape #(9,2,4,5)
As you can see, the last dimension was automatically moved to the front.
Why does this happen, and how can I avoid it?
This behavior is described in the numpy documentation. In the expression
a[:,0,:,:,[1,3,5,30,17,24,30,100,120]]
both 0 and [1,3,5,30,17,24,30,100,120] are advanced indexes, separated by slices. As the documentation explains, in such case dimensions coming from advanced indexes will be first in the resulting array.
If we replace 0 by the slice 0:1 it will change this situation (since it will leave only one advanced index), and then the order of dimensions will be preserved. Thus one way to fix this issue is to use the 0:1 slice and then squeeze the appropriate axis:
a[:,0:1,:,:,[1,3,5,30,17,24,30,100,120]].squeeze(axis=1)
Alternatively, one can keep both advanced indexes, and then rearrange axes:
np.moveaxis(a[:,0,:,:,[1,3,5,30,17,24,30,100,120]], 0, -1)
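Both workarounds can be checked against each other; a short sketch using the shapes from the question:

```python
import numpy as np

a = np.random.randint(10, size=[2, 3, 4, 5, 600])
idx = [1, 3, 5, 30, 17, 24, 30, 100, 120]

# two advanced indexes (0 and idx) separated by slices:
# the broadcast advanced dimension jumps to the front
assert a[:, 0, :, :, idx].shape == (9, 2, 4, 5)

# workaround 1: keep a single advanced index, then squeeze the unit axis
b1 = a[:, 0:1, :, :, idx].squeeze(axis=1)

# workaround 2: index as before, then move the leading axis to the back
b2 = np.moveaxis(a[:, 0, :, :, idx], 0, -1)

assert b1.shape == b2.shape == (2, 4, 5, 9)
assert (b1 == b2).all()
```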

Update numpy array with sparse indices and values

I have a 1-dimensional numpy array and want to store sparse updates of it.
Say I have an array of length 500000 and want to do 100 updates of 100 elements each. Updates are either additions or just changes of the values (I do not think it matters).
What is the best way to do it using numpy?
I wanted to just store two arrays: indices, values_to_add and therefore have two objects: one stores dense matrix and other just keeps indices and values to add, and I can just do something like this with the dense matrix:
dense_matrix[indices] += values_to_add
And if I have multiple updates, I just concat them.
But this numpy syntax doesn't handle repeated indices: because the operation is buffered, an index that appears more than once has its update applied only once, and the rest are just ignored.
Deduplicating an (indices, values_to_add) pair whenever an update repeats an index is O(n). I thought of using a dict instead of an array to store updates, which looks fine from a complexity point of view, but it doesn't look like good numpy style.
What is the most expressive way to achieve this? I know about scipy sparse objects, but (1) I want pure numpy because (2) I want to understand the most efficient way to implement it.
If you have repeated indices you could use np.add.at (the at method of ufuncs). From the documentation:
Performs unbuffered in place operation on operand ‘a’ for elements
specified by ‘indices’. For addition ufunc, this method is equivalent
to a[indices] += b, except that results are accumulated for elements
that are indexed more than once.
Code
a = np.arange(10)
indices = [0, 2, 2]
np.add.at(a, indices, [-44, -55, -55])
print(a)
Output
[ -44 1 -108 3 4 5 6 7 8 9]
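For contrast, a small sketch of the buffered `+=` behaviour the question ran into versus the unbuffered np.add.at:

```python
import numpy as np

idx = np.array([0, 2, 2])
vals = np.array([1.0, 1.0, 1.0])

a = np.zeros(5)
a[idx] += vals            # buffered: the repeated index 2 is applied only once
# a is now [1, 0, 1, 0, 0]

b = np.zeros(5)
np.add.at(b, idx, vals)   # unbuffered: both updates to index 2 accumulate
# b is now [1, 0, 2, 0, 0]
```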

Finding indexes for use with np.ravel

I would like to use np.ravel to create a similar return structure as seen in the MATLAB code below:
[xi yi imv1] = find(squeeze(imagee(:,:,1))+0.1);
imv1 = imv1 - 0.1;
[xi yi imv2] = find(squeeze(imagee(:,:,2))+0.1);
imv2 = imv2 - 0.1;
where imagee is a matrix corresponding to values of a picture obtained from imread().
So the (almost) corresponding Python translation is:
imv1=np.ravel(imagee**[:,:,0]**,order='F')
where the bolded index slicing is clearly not the same as MATLAB's. How do I specify the index values in Python so that my return values match the MATLAB ones? I read the MATLAB code as "access all rows and columns of the specified slice along the third dimension." So how do I specify this third index in Python?
To retrieve indexes, I usually use np.where. Here's an example: You have a 2 dimensional array
a = np.asarray([[0,1,2],[3,4,5]])
and want to get the indexes where the values are above a threshold, say 2. You can use np.where with the condition a>2
idxX, idxY = np.where(a>2)
which in turn you can use to address a
print(a[idxX, idxY])
>>> [3 4 5]
However, the same effect can be achieved by boolean indexing:
print(a[a>2])
>>> [3 4 5]
This works on ravel'ed arrays as well as on three-dimensional ones. With the first method, however, a 3D array requires one more index array to address the extra dimension.
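To come back to the MATLAB find translation: a sketch under the assumption that the +0.1/-0.1 trick exists only to keep zero-valued pixels in the result. find scans column-major, so transpose first and swap the returned index arrays (the toy imagee here stands in for real imread() data):

```python
import numpy as np

imagee = np.arange(12, dtype=float).reshape(2, 3, 2)  # toy stand-in
plane = imagee[:, :, 0]

# emulate [xi yi imv1] = find(plane + 0.1); imv1 = imv1 - 0.1
# np.where scans row-major, so scan the transpose to get column-major order
yi, xi = np.where((plane + 0.1).T != 0)
imv1 = plane[xi, yi]
# note: MATLAB's indices would be xi + 1 and yi + 1 (1-based)
```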

numpy matrix subset view

I want to view a numpy matrix by specifying the row and column number. For example, row 0 and 2 and column 0 and 2 of a 3×3 matrix.
M = np.array(range(9)).reshape((3,3))
M[:,[0,2]][[0,2],:]
But I know this is not a view, a new matrix is created due to the iterated indexing. Is it possible to do such a view?
I think it is strange that I can do
M[:2,:2]
to view the matrix. but not use
M[[0,1],[0,1]]
to achieve the same view.
EDIT: provide one more example. If I have a matrix
M = np.array(range(16)).reshape((4,4))
How do I get rows [1,2,3] and columns [0,2,3] with a single step of indexing? This will do it in 2 steps:
M[[1,2,3],:][:,[0,2,3]]
How do I get rows [1,2,3] and columns [0,2,3] with a single step of indexing?
You could use np.ix_ instead, but this is neither less typing nor faster. In fact it's slower:
%timeit M[np.ix_([1,2,3],[0,2,3])]
100000 loops, best of 3: 17.8 µs per loop
%timeit M[[1,2,3],:][:, [0,2,3]]
100000 loops, best of 3: 10.9 µs per loop
How to force a view (if possible)?
You can use numpy.lib.stride_tricks.as_strided to ask for a tailored view of an array.
Here is an example of its use from scipy-lectures.
This would allow you to get a view instead of a copy in your very first example:
from numpy.lib.stride_tricks import as_strided
M = np.array(range(9)).reshape((3,3))
sub_1 = M[:,[0,2]][[0,2],:]
sub_2 = as_strided(M, shape=(2, 2), strides=(48,16))
print(sub_1)
print()
print(sub_2)
[[0 2]
[6 8]]
[[0 2]
[6 8]]
# change the initial array
M[0,0] = -1
print(sub_1)
print()
print(sub_2)
[[0 2]
[6 8]]
[[-1 2]
[ 6 8]]
As you can see sub_2 is indeed a view since it reflects changes made to the initial array M.
The strides argument passed to as_strided specifies the byte-sizes to "walk" in each dimension:
The datatype of the initial array M is numpy.int64 (on my machine), so one int takes 8 bytes in memory. Since Numpy arranges arrays in C-style (row-major) order by default, one row of M is consecutive in memory and takes 24 bytes. Since you want every other row, you specify 48 bytes as the stride in the first dimension. For the second dimension you also want every other element; consecutive elements in a row are 8 bytes apart, so you specify 16 bytes as the stride.
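A sketch of the same view with the strides computed from the array itself rather than hard-coded, so it does not depend on the machine's default integer size:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

M = np.arange(9).reshape(3, 3)

# every other row and every other column: double each existing stride
view = as_strided(M, shape=(2, 2),
                  strides=(2 * M.strides[0], 2 * M.strides[1]))

M[0, 0] = -1  # the change shows up in view, confirming shared memory
```

For this particular regular pattern, M[::2, ::2] builds the identical view without reaching for as_strided at all.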
For your latter example Numpy is not able to return a view because the requested indices are too irregular to be described through a shape and strides.
For your second example:
import numpy as np
M = np.array(range(16)).reshape((4,4))
print(M[np.meshgrid([1,2,3],[0,2,3])].transpose())
the .transpose() is necessary because of meshgrid's order of indexing. According to Numpy doc there is a new indexing option, so that M[np.meshgrid([1,2,3],[0,2,3],indexing='ij')] should work, but I don't have Numpy's latest version and can't test it.
M[[0,1],[0,1]] returns elements at (0,0) and (1,1) in the matrix.
Slicing a numpy array gives a view of the array, but your code M[:2, :2] gets a submatrix with row 0,1 and column 0,1 of M, you need ::2:
In [1710]: M
Out[1710]:
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
In [1711]: M[:2, :2]
Out[1711]:
array([[0, 1],
[3, 4]])
In [1712]: M[::2, ::2]
Out[1712]:
array([[0, 2],
[6, 8]])
To understand this behavior of numpy, you need to read up on numpy array striding. The great power of numpy lies in providing a uniform interface for the whole numpy/scipy ecosystem to grow around. That interface is the ndarray, which provides a simple yet general method for storing numerical data.
'Simple' and 'general' are value judgements of course, but a balance has been struck by settling on strided arrays to form this interface. Every numpy array has a set of strides that tells you how to find any given element in the array, as a simple inner product between strides and indices.
Of course one could imagine an alternative numpy which had different code paths for all kinds of other data representations; much in the same way as one could imagine the pyramids of Giza, except ten times bigger. Easy to imagine; but building it is a little more work.
What is impossible, however, is representing an array indexed as arr[[2,0,1]] as a strided view on the same piece of memory. arr[[1,0]], on the other hand, could be represented as a view (a negative stride does the trick), but returning a view or a copy depending on the content of the indices you are indexing with would mean a performance hit for what should be a simple operation; and it would make for rather funny semantics as well.
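The "simple inner product between strides and indices" mentioned above can be checked directly:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
i, j = 2, 1

# byte offset of a[i, j] from the start of the buffer
offset = i * a.strides[0] + j * a.strides[1]

# for this C-contiguous array, the byte offset divided by the item
# size is the element's position in the flattened buffer
assert a.flat[offset // a.itemsize] == a[i, j]
```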

Indexing with Masked Arrays in numpy

I have a bit of code that attempts to find the contents of an array at indices specified by another, that may specify indices that are out of range of the former array.
input = np.arange(0, 5)
indices = np.array([0, 1, 2, 99])
What I want to do is this:
print(input[indices])
and get
[0 1 2]
But this yields an exception (as expected):
IndexError: index 99 out of bounds 0<=index<5
So I thought I could use masked arrays to hide the out of bounds indices:
indices = np.ma.masked_greater_equal(indices, 5)
But still:
print(input[indices])
IndexError: index 99 out of bounds 0<=index<5
Even though:
np.max(indices)
2
So I'm having to fill the masked array first, which is annoying, since I don't know what fill value I could use to not select any indices for those that are out of range:
print(input[np.ma.filled(indices, 0)])
[0 1 2 0]
So my question is: how can you use numpy efficiently to select indices safely from an array without overstepping the bounds of the input array?
Without using masked arrays, you could remove the indices greater or equal to 5 like this:
print(input[indices[indices < 5]])
Edit: note that if you also wanted to discard negative indices, you could write:
print(input[indices[(0 <= indices) & (indices < 5)]])
It is a VERY BAD idea to index with masked arrays. There was a (very short) time when using MaskedArrays for indexing would have thrown an exception, but it was a bit too harsh...
In your test, you're filtering indices to find the entries matching a condition. What should you do with the missing entries of your MaskedArray? Is the condition False? True? Should you use a default? It's up to you, the user, to decide what to do.
Using indices.filled(0) means that when an item of indices is masked (as in, undefined), you want to take the first index (0) as default. Probably not what you wanted.
Here, I would have simply used input[indices.compressed()] : the compressed method flattens your MaskedArray, keeping only the unmasked entries.
But as you realized, you probably didn't need MaskedArrays in the first place.
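A sketch comparing the two routes (the boolean filter from the first answer and compressed() from this one); the array is renamed inp here to avoid shadowing the builtin input:

```python
import numpy as np

inp = np.arange(0, 5)
indices = np.array([0, 1, 2, 99])

# boolean filter: keep only in-range indices before indexing
safe = indices[(0 <= indices) & (indices < len(inp))]

# masked-array route: compressed() drops masked entries entirely
masked = np.ma.masked_greater_equal(indices, len(inp))
also_safe = inp[masked.compressed()]
```

Both yield [0 1 2], with the out-of-bounds index 99 silently dropped rather than mapped to a fill value.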
