Bug (?) with numpy indexing


I was trying to achieve a kind of 2d filter with numpy, and I found something that looks to me like a bug.
In the example below, I'm trying to target the columns with index 2 and 4 (the 3rd and 5th columns) of the first, second and last rows of my data, i.e.:
[[ 2 4]
[ 8 10]
[26 28]]
I am aware that the second to last line does return that, but I wouldn't be able to assign anything there (it returns a copy). And this still doesn't explain why the last one fails.
import numpy as np
# create my data: 5x6 array
data = np.arange(0,30).reshape(5,6)
# mask: only keep row 1,2,and 5
mask = np.array([1,1,0,0,1])
mask = mask.astype(bool)
# this is fine
print 'data\n', data, '\n'
# this is good
print 'mask\n', mask, '\n'
# this is nice
print 'data[mask]\n', data[mask], '\n'
# this is great
print 'data[mask, 2]\n', data[mask, 2], '\n'
# this is awesome
print 'data[mask][:,[2,4]]\n', data[mask][:,[2,4]], '\n'
# this fails ??
print 'data[mask, [2,4]]\n', data[mask, [2,4]], '\n'
output:
data
[[ 0 1 2 3 4 5]
[ 6 7 8 9 10 11]
[12 13 14 15 16 17]
[18 19 20 21 22 23]
[24 25 26 27 28 29]]
mask
[ True True False False True]
data[mask]
[[ 0 1 2 3 4 5]
[ 6 7 8 9 10 11]
[24 25 26 27 28 29]]
data[mask, 2]
[ 2 8 26]
data[mask][:,[2,4]]
[[ 2 4]
[ 8 10]
[26 28]]
data[mask, [2,4]]
Traceback (most recent call last):
[...]
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (3,) (2,)
I'm posting this here, because I'm not confident enough in my numpy skills to be sure this is a bug, and file a bug report...
Thanks for your help/feedback !

This is not a bug.
It is the documented behaviour of integer array indexing.
If you read the Advanced Indexing section of the array indexing documentation, you will notice that it says
Purely integer array indexing When the index consists of as many
integer arrays as the array being indexed has dimensions, the indexing
is straight forward, but different from slicing. Advanced indexes
always are broadcast and iterated as one:
result[i_1, ..., i_M] == x[ind_1[i_1, ..., i_M], ind_2[i_1, ..., i_M],
..., ind_N[i_1, ..., i_M]]
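Therefore
print 'data[mask, [1,2,4]]\n', data[mask, [1,2,4]], '\n'
works and outputs
data[mask, [1,2,4]]
[ 1 8 28]
The index arrays must have the same shape, or be broadcastable to a common shape. Here mask selects three rows, so the column index needs three entries, and the result pairs them element by element: (0, 1), (1, 2) and (4, 4).
You can get the result you want with the np.ix_ function, which is also described in the array indexing documentation: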
columns = np.array([2, 4], dtype=np.intp)
print data[np.ix_(mask, columns)]
which outputs
[[ 2 4]
[ 8 10]
[26 28]]
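To make the broadcasting rule concrete, here is a minimal sketch (written for Python 3, unlike the snippets above). The names rows, cols, block and modified are introduced only for this illustration. It shows that giving the row index a shape of (3, 1) lets it broadcast against the two column indices, which is what np.ix_ arranges for you, and that a single indexing step can be assigned to.

```python
import numpy as np

data = np.arange(30).reshape(5, 6)
mask = np.array([1, 1, 0, 0, 1], dtype=bool)
cols = [2, 4]

# A boolean mask behaves like the integer indices of its True entries.
rows = np.flatnonzero(mask)  # indices 0, 1 and 4

# Shapes (3, 1) and (2,) broadcast to (3, 2): each row is paired with each column.
block = data[rows[:, None], cols]

# np.ix_ builds the same open mesh and accepts the boolean mask directly.
same_block = data[np.ix_(mask, cols)]

# Unlike data[mask][:, cols], a single indexing step can be assigned to.
modified = data.copy()
modified[np.ix_(mask, cols)] = -1

print(block)
print(modified)
```

The chained form data[mask][:, cols] reads the same values, but the first step already produces a copy, so assigning through it does not change data.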

Related

Modifying Array Giving Back Wrong Output

I'm new to numpy and am trying to do some slicing and indexing with arrays. My goal is to take an array, and use slicing and indexing to square the last column, and then subtract the first column from that result. I then want to put the new column back into the old array.
I've been able to figure out how to slice and index the column to get the result I want for the last column. My problem however is that when I try to put it back into my original array, I get the wrong output (as seen below).
theNumbers = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12],[13,14,15,16]])
sliceColumnOne = theNumbers[:,0]
sliceColumnThree = theNumbers[:,3]**2
editColumnThree = sliceColumnThree - sliceColumnOne
newArray = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12],[editColumnThree]])
print("nums:\n{}".format(newArray))
I want the output to be
[[ 1 2 3 15]
[ 5 6 7 59]
[ 9 10 11 135]
[ 13 14 15 243]]
However mine becomes:
[list([1, 2, 3, 4]) list([5, 6, 7, 8]) list([9, 10, 11, 12])
list([array([ 15, 59, 135, 243])])]
Any suggestions on how to fix this?

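Assign the result into the existing array instead of building a new one: "theNumbers[3] = editColumnThree". Note that this replaces the last row; to replace the last column, as in your desired output, use "theNumbers[:, 3] = editColumnThree" instead.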
Code:
import numpy as np
theNumbers = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12],[13,14,15,16]])
sliceColumnOne = theNumbers[:,0]
sliceColumnThree = theNumbers[:,3]**2
editColumnThree = sliceColumnThree - sliceColumnOne
theNumbers[3] = editColumnThree
print("nums:\n{}".format(theNumbers))
Output:
[[ 1 2 3 4]
[ 5 6 7 8]
[ 9 10 11 12]
[ 15 59 135 243]]

newArray = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12],[editColumnThree]])
print("nums:\n{}".format(newArray))
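This way, editColumnThree becomes the last row, not the last column. Because that last row holds a single element while the other rows hold four, NumPy cannot build a regular 2-D array and falls back to an array of Python objects, which is the list(...) output you see. You can use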
newArray = theNumbers.copy() # if a copy is needed
newArray[:,-1] = editColumnThree # replace last (-1) column

If you just want to stack the vectors on top of each other, use vstack:
import numpy as np
theNumbers = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12],[13,14,15,16]])
newNumbers = np.vstack(theNumbers)
print(newNumbers)
>>>[[ 1 2 3 4]
[ 5 6 7 8]
[ 9 10 11 12]
[13 14 15 16]]
But the issue here isn't just that you need to stack these numbers, you are mixing up columns and rows. You are changing a row instead of a column. To change the column, update the last element in each row:
import numpy as np
theNumbers = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12],[13,14,15,16]])
LastColumn = theNumbers[:,3]**2
FirstColumn = theNumbers[:,0]
editColumnThree = LastColumn - FirstColumn
for i in range(4):
    theNumbers[i,3] = editColumnThree[i]
print(theNumbers)
>>>[[ 1 2 3 15]
[ 5 6 7 59]
[ 9 10 11 135]
[ 13 14 15 243]]
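The loop can also be written as a single column assignment. The sketch below is one way to do it, in Python 3; the names the_numbers and updated are chosen just for this example, and the copy is only there to keep the original array untouched.

```python
import numpy as np

the_numbers = np.array([[1, 2, 3, 4],
                        [5, 6, 7, 8],
                        [9, 10, 11, 12],
                        [13, 14, 15, 16]])

# Work on a copy so the original array is left unchanged.
updated = the_numbers.copy()

# Square the last column, subtract the first column, and write the
# result back into the last column in one vectorised step.
updated[:, -1] = the_numbers[:, -1] ** 2 - the_numbers[:, 0]

print(updated)
```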

How to index a ndarray with another ndarray?

I am doing some machine learning stuff in python/numpy in which I want to index a 2-dimensional ndarray with a 1-D ndarray, so that I get a 1-D array with the indexed values.
I got it to work with some ugly piece of code and I would like to know if there is a better way, because this just seems unnatural for such a nice language and module combination as python+numpy.
a = np.arange(50).reshape(10, 5) # Array to be indexed
b = np.arange(9, -1, -2) # Indexing array
print(a)
print(b)
print(a[b, np.arange(0, a.shape[1]).reshape(1,a.shape[1])])
#Prints:
[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 11 12 13 14]
[15 16 17 18 19]
[20 21 22 23 24]
[25 26 27 28 29]
[30 31 32 33 34]
[35 36 37 38 39]
[40 41 42 43 44]
[45 46 47 48 49]]
[9 7 5 3 1]
[[45 36 27 18 9]]
This is exactly what I want (even though it is technically a 2-D ndarray), but this seems very complicated. Is there a neater and tidier way?
Edit:
To clarify, I actually do not want a 1-D array; that was very poorly explained. I do want one dimension with length 1, because that is what I need for processing it later, but this would be easily achieved with a reshape() statement. Sorry for the confusion, I just mixed my actual code with the more general question.

You want a 1D array, yet you included a reshape call whose only purpose is to take the array from the format you want to a format you don't want.
Stop reshaping the arange output. Also, you don't need to specify the 0 start value explicitly:
result = a[b, np.arange(a.shape[1])]
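As a quick check, the following Python 3 sketch runs that expression on the arrays from the question and shows one way to get the (1, 5) shape mentioned in the edit. The names cols, flat and row are introduced here only for illustration.

```python
import numpy as np

a = np.arange(50).reshape(10, 5)  # array to be indexed
b = np.arange(9, -1, -2)          # row indices 9, 7, 5, 3, 1
cols = np.arange(a.shape[1])      # column indices 0..4

# Two 1-D index arrays of equal length are paired element by element.
flat = a[b, cols]                 # shape (5,)

# Add a leading axis of length 1 if a 2-D result is needed later.
row = flat[None, :]               # shape (1, 5)

print(flat)
print(row.shape)
```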

You can use np.diagonal to get what you want, with no reshape and no second index array. The tricky part is spotting the pattern: the values you want are the diagonal elements of the matrix a[b].
a = np.arange(50).reshape(10, 5) # Array to be indexed
b = np.arange(9, -1, -2) # Indexing array
print (np.diagonal(a[b]))
# [45 36 27 18 9]
As @user2357112 mentioned in the comments, the array returned by np.diagonal is read-only. That is a problem if you plan to modify the values in the result. If you only want to print them or use them for further indexing, it is fine.
As per the docs
Starting in NumPy 1.9 it returns a read-only view on the original array. Attempting to write to the resulting array will produce an error.
In some future release, it will return a read/write view and writing to the returned array will alter your original array. The returned array will have the same type as the input array.
If you don’t write to the array returned by this function, then you can just ignore all of the above.
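If you do need to modify the values, one option is to take an explicit copy. The short Python 3 sketch below assumes a NumPy version where np.diagonal still returns a read-only view, as the quoted documentation describes; the names diag and values are only for this example.

```python
import numpy as np

a = np.arange(50).reshape(10, 5)
b = np.arange(9, -1, -2)

diag = np.diagonal(a[b])

# The view returned by np.diagonal reports itself as not writeable.
print(diag.flags.writeable)

# An explicit copy is an independent, writeable array.
values = diag.copy()
values[0] = -1
print(values)
```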

Python subscript syntax clarification

Can you clarify what the [:, :5] part of the code does in the following code segment?
for i in range(5):
    weights = None
    test_inputs = testset[i][:, :5]
    test_inputs = test_inputs.astype(np.float32)
    test_answer = testset[i][:, :5]
    test_answer = code_answer(test_answer)

This is explained in the indexing guide of the NumPy manual. Built-in Python sequences such as lists do not support this kind of subscript.
If you have an array a, a[:] returns a view of the whole array (not a copy; assigning to its elements changes a), and a[:5] returns a view of elements 0, 1, ..., 4.
NumPy allows multi-dimensional arrays to be indexed with a[i, j] instead of the pure Python form a[i][j].
This should cover all your cases.

This code probably uses numpy arrays.
Numpy defines more elaborate array slicing, similar to Matlab style.
The [:, :5] means that from your array (probably a 2D array, returned from testset[i]) you take all the rows (designated by :) and then the columns from the beginning up to, but not including, column 5.
Each part of the slicing expression follows regular Python slicing syntax.
The [:, :5] itself is interpreted as a single tuple subscript, equivalent to [(slice(None), slice(None, 5))], since in Python comma-separated values without parentheses form a tuple.
The array object can handle tuples that denote complex slicing patterns.
This is in short the meaning of this syntax.
For more information, you may want to visit the numpy page, you can start at www.scipy.org.
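To see the tuple of slices directly, here is a small Python 3 sketch. ShowKey is a hypothetical helper class written only for this illustration; it returns whatever key it is subscripted with.

```python
import numpy as np


class ShowKey:
    """Hypothetical helper: hands back the key passed to the subscript."""

    def __getitem__(self, key):
        return key


# The subscript [:, :5] arrives in __getitem__ as a tuple of two slice objects.
key = ShowKey()[:, :5]
print(key)

a = np.arange(36).reshape(6, 6)

# Indexing with the explicit tuple selects the same elements as a[:, :5].
print(np.array_equal(a[:, :5], a[key]))
print(a[:, :5].shape)
```

A plain list of lists raises a TypeError for the same subscript, because list indices must be integers or slices rather than tuples.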

As mentioned by others, the array being referenced is from the numpy package (see its home page).
An example should help. First create a multi-dimensional array structure similar to what the code provided is manipulating:
from numpy import *
a=arange(36).reshape(6,6)
b=a.reshape(6,3,2).swapaxes(0,2)
print b
[[[0 6 12 18 24 30]
[2 8 14 20 26 32]
[4 10 16 22 28 34]]
[[1 7 13 19 25 31]
[3 9 15 21 27 33]
[5 11 17 23 29 35]]]
The [:,:5] syntax is an array slicing mechanism that culls all component array entries beyond the fifth:
print b[1][:,:5]
[[1 7 13 19 25]
[3 9 15 21 27]
[5 11 17 23 29]]

Why does numpy.concatenate work along axis=1 for small one dimensional arrays, but not larger ones?

I couldn't get my 4 arrays of year, day of year, hour, and minute to concatenate the way I wanted, so I decided to test several variations on shorter arrays than my data.
I found that it worked using method "t" from my test code:
import numpy as np
a=np.array([[1, 2, 3, 4, 5, 6]])
b=np.array([[11, 12, 13, 14, 15, 16]])
c=np.array([[21, 22, 23, 24, 25, 26]])
d=np.array([[31, 32, 33, 34, 35, 36]])
print a
print b
print c
print d
q=np.concatenate((a, b, c, d), axis=0)
#concatenation along 1st axis
print q
t=np.concatenate((a.T, b.T, c.T, d.T), axis=1)
#transpose each array before concatenation along 2nd axis
print t
x=np.concatenate((a, b, c, d), axis=1)
#concatenation along 2nd axis
print x
But when I tried this with the larger arrays it behaved the same as method "q".
I found an alternative approach of using vstack over here that did what I wanted, but I am trying to figure out why concatenation sometimes works for this, but not always.
Thanks for any insights.
Also, here are the outputs of the code:
q:
[[ 1 2 3 4 5 6]
[11 12 13 14 15 16]
[21 22 23 24 25 26]
[31 32 33 34 35 36]]
t:
[[ 1 11 21 31]
[ 2 12 22 32]
[ 3 13 23 33]
[ 4 14 24 34]
[ 5 15 25 35]
[ 6 16 26 36]]
x:
[[ 1 2 3 4 5 6 11 12 13 14 15 16 21 22 23 24 25 26 31 32 33 34 35 36]]
EDIT: I added method t to the end of a section of the code that was already fixed with vstack, so you can compare how vstack will work with this data but not concatenate. Again, to clarify, I found a workaround already, but I don't know why the concatenate method doesn't seem to be consistent.
Here is the code:
import numpy as np
BAO10m=np.genfromtxt('BAO_010_2015176.dat', delimiter=",", usecols=range(0-6), dtype=[('h', int), ('year', int), ('day', int), ('time', int), ('temp', float)])
#10 meter weather readings at BAO tower site for June 25, 2015
hourBAO=BAO10m['time']/100
minuteBAO=BAO10m['time']%100
#print hourBAO
#print minuteBAO
#time arrays
dayBAO=BAO10m['day']
yearBAO=BAO10m['year']
#date arrays
datetimeBAO=np.vstack((yearBAO, dayBAO, hourBAO, minuteBAO))
#t=np.concatenate((a.T, b.T, c.T, d.T), axis=1) <this gave desired results in simple tests
#not working for this data, use vstack instead, with transposition after stack
print datetimeBAO
test=np.transpose(datetimeBAO)
#rotate array
print test
#this prints something that can be used for datetime
t=np.concatenate((yearBAO.T, dayBAO.T, hourBAO.T, minuteBAO.T), axis=1)
print t
#this prints a 1D array of all the year values, then all the day values, etc...
#but this method worked for shorter 1D arrays
The file I used can be found at this site.
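One likely explanation, sketched below in Python 3, is the difference in dimensionality: the test arrays were written with double brackets and are 2-D with shape (1, 6), whereas a single field taken from a structured array, such as BAO10m['year'], is normally 1-D. Transposing a 1-D array does nothing, so there is no second axis to join along. The small arrays here are stand-ins, not the BAO data, and np.column_stack is offered as one possible alternative to vstack followed by a transpose.

```python
import numpy as np

# 2-D arrays of shape (1, 3), like a, b, c, d in the small test.
a2d = np.array([[1, 2, 3]])
b2d = np.array([[11, 12, 13]])

# 1-D arrays of shape (3,), standing in for fields of a structured array.
a1d = np.array([1, 2, 3])
b1d = np.array([11, 12, 13])

print(a2d.T.shape)  # transposing a 2-D array swaps its axes
print(a1d.T.shape)  # transposing a 1-D array leaves the shape unchanged

# Method "t" works for the 2-D case because each piece is a column.
cols_2d = np.concatenate((a2d.T, b2d.T), axis=1)

# For 1-D inputs, column_stack turns each array into a column first.
cols_1d = np.column_stack((a1d, b1d))

print(cols_1d)
```

As far as I know, recent NumPy releases raise an error for np.concatenate with axis=1 on 1-D inputs instead of silently joining them end to end, which makes this mismatch easier to spot.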

Python - Theano scan() function

I cannot fully understand the behaviour of theano.scan().
Here's an example:
import numpy as np
import theano
import theano.tensor as T
def addf(a1,a2):
    return a1+a2
i = T.iscalar('i')
x0 = T.ivector('x0')
step= T.iscalar('step')
results, updates = theano.scan(fn=addf,
                               outputs_info=[{'initial':x0, 'taps':[-2]}],
                               non_sequences=step,
                               n_steps=i)
f=theano.function([x0,i,step],results)
print f([1,1],10,2)
The above snippet prints the following sequence, which is perfectly reasonable:
[ 3 3 5 5 7 7 9 9 11 11]
However if I switch the tap index from -2 to -1, i.e.
outputs_info=[{'initial':x0, 'taps':[-1]}]
The result becomes:
[[ 3 3]
[ 5 5]
[ 7 7]
[ 9 9]
[11 11]
[13 13]
[15 15]
[17 17]
[19 19]
[21 21]]
instead of what would seem reasonable to me (just take the last value of the vector and add 2):
[ 3 5 7 9 11 13 15 17 19 21]
Any help would be much appreciated.
Thanks!

When you use taps=[-1], scan assumes that the value given in outputs_info is used as is. That means the addf function is called with a vector and the non_sequence as inputs. If you make x0 a scalar, it works as you expect:
import numpy as np
import theano
import theano.tensor as T
def addf(a1,a2):
    print a1.type
    print a2.type
    return a1+a2
i = T.iscalar('i')
x0 = T.iscalar('x0')
step= T.iscalar('step')
results, updates = theano.scan(fn=addf,
                               outputs_info=[{'initial':x0, 'taps':[-1]}],
                               non_sequences=step,
                               n_steps=i)
f=theano.function([x0,i,step],results)
print f(1,10,2)
This gives the following output:
TensorType(int32, scalar)
TensorType(int32, scalar)
[ 3 5 7 9 11 13 15 17 19 21]
In your case scan calls addf(vector, scalar), so the addition is broadcast elementwise over the vector.
Put another way, if taps is [-1], x0 is passed "as is" to the inner function. If taps contains anything else, what is passed to the inner function has one dimension fewer than x0, because x0 must then provide several initial values (here, those for steps -2 and -1).
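The recurrence can be imitated in plain NumPy to see where each output shape comes from. This is only a sketch of the behaviour described in this answer, not Theano's implementation, and scan_like is a hypothetical function written for this illustration (Python 3).

```python
import numpy as np


def scan_like(initial, step, n_steps, tap):
    """Imitate the recurrence from the question; not Theano's actual scan."""
    initial = np.asarray(initial)
    if tap == -1:
        # The initial value is treated as the single previous output, as is.
        history = [initial]
    else:
        # The initial value supplies one past output per entry along axis 0.
        history = list(initial)
    outputs = []
    for _ in range(n_steps):
        new = history[tap] + step  # same role as addf(a1, a2)
        history.append(new)
        outputs.append(new)
    return np.array(outputs)


print(scan_like([1, 1], 2, 10, tap=-2))  # scalars taken two steps back
print(scan_like([1, 1], 2, 10, tap=-1))  # the whole vector at every step
print(scan_like(1, 2, 10, tap=-1))       # scalar initial value
```

With tap=-2 each step sees a scalar from two steps back, with tap=-1 and a vector each step sees the whole vector, and with tap=-1 and a scalar the result is the sequence the question expected.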
