numpy.take a range of array elements (Python)

I have an array of integers.
data = [10,20,30,40,50,60,70,80,90,100]
I want to extract a range of integers from the array and get a smaller array.
data_extracted = [20,30,40]
I tried numpy.take.
data = [10,20,30,40,50,60,70,80,90,100]
start = 1 # index of starting data entry (20)
end = 3 # index of ending data entry (40)
data_extracted = np.take(data,[start:end])
I get a syntax error pointing to the : in numpy.take.
Is there a better way to use numpy.take to store part of an array in a separate array?

You can directly slice the list.
import numpy as np
data = [10,20,30,40,50,60,70,80,90,100]
data_extracted = np.array(data[1:4])
Also, you do not need to use numpy.array, you could just store the data in another list:
data_extracted = data[1:4]
If you want to use numpy.take, you have to pass it a list of the desired indices as the second argument:
import numpy as np
data = [10,20,30,40,50,60,70,80,90,100]
data_extracted = np.take(data, [1, 2, 3])
I do not think numpy.take is needed for this application though.

You ought to just use a slice to get a range of indices, there is no need for numpy.take, which is intended as a shortcut for fancy indexing.
data_extracted = data[1:4]

As others have mentioned, you can use a plain slice in this case. However, if you need to use np.take because e.g. the axis you're slicing over is variable, you might try:
axis = 0
np.take(data, range(1, 4), axis=axis)
Note: this might be slower than:
data_extracted = data[1:4]
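As an illustrative sketch (with a made-up 2-D array, not from the question), a runtime-chosen axis works like this:

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)   # demo 2-D data
axis = 1                            # axis chosen at runtime
out = np.take(arr, range(1, 3), axis=axis)
# equivalent to the static slice arr[:, 1:3]
```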

Related

How to sort a numpy array by a nested dtype?

I want to sort a numpy array by the first element inside the array
import numpy as np
from random import randint
# create dummy data
test = np.array([[[randint(1, 10) for _ in range(3)]] for _ in range(10)])
dtype = [('response', [('x', 'f'), ('y', 'f'), ('z', 'f')])]
# convert over to dtype
test.astype(dtype)
How do I sort on a nested key? The below doesn't work:
np.sort(a, order='response.x')
What I would like to achieve is to use np.sort in the same way I would use sorted for a list
a_list = [[[randint(1, 10) for _ in range(3)]] for _ in range(10)]
sorted(a_list, key=lambda x: x[0][0])
But I would like to use np.sort as this is a sample of a much more complicated problem where I only have access to numpy arrays and would like to work with the numpy methods.
I understand you have a tensor (three nested arrays), and each array element is a structured data type with nested fields response.x, response.y and response.z.
Since you have a tensor, you can order it in three dimensions (rows, columns and Z-dimension). Default numpy sort behavior is to sort the last dimension, i.e. the innermost array.
To get the sorted indices of the nested data structure, we can use numpy.argsort(). For example, the following orders the response.x values along the outermost axis (i.e. compares rows with each other):
order = np.argsort(test['response']['x'], axis=0)
You can then use these indices to get the original array ordered. To get the same behavior as np.sort, you would use numpy.take_along_axis() with the same axis argument:
np.take_along_axis(test, order, axis=0)
Note that this orders individual elements along the rows. This is the same behavior that np.sort(..., axis=0) would give you.
However, it seems to me you want to order the entire inner arrays by their first element, i.e. no individual reordering in the inner arrays. To do so, row-by-row, you would do:
test[order[:,0,0]]
This orders the outermost arrays as a whole by the first innermost item (0, 0).
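A minimal sketch of that row-by-row ordering, using a plain integer tensor of the same (10, 1, 3) shape in place of the structured dtype:

```python
import numpy as np

rng = np.random.default_rng(0)
test = rng.integers(1, 10, size=(10, 1, 3))  # stand-in for the structured array
order = np.argsort(test[:, 0, 0])            # sort keys: first element of each inner array
result = test[order]                          # reorder whole outer rows; inner arrays untouched
```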
The default behavior of np.sort is to order the last dimension; you can reproduce it by changing the axis argument to -1 (or omitting it) in both usages above.

Pulling elements in order based on first element using key array

I'm looking for a vectorized approach for the following problem:
Suppose I have two arrays, one with a bunch of non-contiguous ids in the first column and some data in the remaining columns, and a second array suggesting which datalines I need to pull:
data_array = np.array([[101,4],[102,7],[201,2],[203,9],[403,12]])
key_array = np.array([101,403,201])
The output must stay in the order given by the key_array, leading to the following:
output_array = np.array([[101,4],[403,12],[201,2]])
I can easily do this through a list comprehension:
output_array = np.array([data_array[i==data_array[:,0]][0] for i in key_array])
but this is not a vectorized solution. numpy's isin() comes very close to working, but it does not preserve the given order:
data_array[np.isin(data_array[:,0],key_array)]
#[[101 4]
# [201 2] not the order given by the key_array!
# [403 12]]
I tried making the above work with argsort(), but haven't been able to get anything working. Any help would be greatly appreciated.
We can use np.searchsorted -
s = data_array[:,0].argsort()
out = data_array[s[np.searchsorted(data_array[:,0],key_array,sorter=s)]]
If the first column of data_array is already sorted, simplifies to one-liner -
out = data_array[np.searchsorted(data_array[:,0],key_array)]
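Plugging the sample arrays from the question into the searchsorted approach confirms that the key order is preserved:

```python
import numpy as np

data_array = np.array([[101, 4], [102, 7], [201, 2], [203, 9], [403, 12]])
key_array = np.array([101, 403, 201])

s = data_array[:, 0].argsort()  # sorter for the id column
out = data_array[s[np.searchsorted(data_array[:, 0], key_array, sorter=s)]]
# rows come out in key_array order: 101, 403, 201
```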

numpy array dimension mismatch error

I am quite new to numpy and Python in general. I am getting a dimension mismatch error when I try to append values, even though I have made sure that both arrays have the same dimension. Another question: why does numpy create a one-dimensional array when reading in data from a tab-delimited text file?
import numpy as np
names = ["Angle", "RX_Power", "Frequency"]
data = np.array([0,0,0],float) #experimental
data = np.genfromtxt("rx_power_mode 0.txt", dtype=float, delimiter='\t', names = names, usecols=[0,1,2], skip_header=1)
freq_177 = np.zeros(shape=(data.shape))
print(freq_177.shape) #outputs(315,)
for i in range(len(data)):
    if data[i][2] == 177:
        #np.concatenate(freq_177, data[i]) has same issue
        np.append(freq_177, data[i], 0)
The output I am getting is
all the input arrays must have same number of dimensions
Annotated code:
import numpy as np
names = ["Angle", "RX_Power", "Frequency"]
You don't need to 'initialize' an array - unless you are going to assign values to individual elements.
data = np.array([0,0,0],float) #experimental
This data assignment completely overwrites the previous one.
data = np.genfromtxt("rx_power_mode 0.txt", dtype=float, delimiter='\t', names = names, usecols=[0,1,2], skip_header=1)
Look at data at this point. What is data.shape? What is data.dtype? Print it, or at least some elements. With names I'm guessing that this is a 1d array with a 3-field compound dtype. It's not a 2d array, though since all the fields are floats it could be transformed/viewed as such.
Why are you making a 1d array of zeros?
freq_177 = np.zeros(shape=(data.shape))
print(freq_177.shape) #outputs(315,)
With a structured array like data, the preferred way to index a given element is by field name and row number, e.g. data['Frequency'][i]. Play with that.
np.append is not the same as the list append. It returns a value; it does not change freq_177 in place. Same for concatenate. I recommend staying away from np.append. It's too easy to use it in the wrong way and place.
for i in range(len(data)):
    if data[i][2] == 177:
        #np.concatenate(freq_177, data[i]) has same issue
        np.append(freq_177, data[i], 0)
It looks like you want to collect in freq_177 all the terms of the data array for which the 'frequency' field is 177.
I = data['Frequency'].astype(int) == 177
freq_177 = data[I]
I have used astype(int) because an == test between floats is unreliable; it is best used with integers.
I is a boolean mask, true where the values match; data[I] then is the corresponding elements of data. The dtype will match that of data, that is, it will have 3 fields. You can't append or concatenate it to an array of float zeros (your original freq_177).
If you must iterate and collect values, I suggest using list append, e.g.
alist = []
for row in data:
    if int(row['Frequency']) == 177:
        alist.append(row)
freq177 = np.array(alist)
I don't think np.append is discussed much except in its own doc page and text. It comes up periodically in SO questions.
http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.append.html
Returns: append : ndarray
A copy of arr with values appended to axis. Note that append does not occur in-place: a new array is allocated and filled.
See also help(np.append) in an interpreter shell.
For genfromtxt - it too has docs, and lots of SO discussion. But to understand what it returned in this case, you also need to read about structured arrays and compound dtypes.
Try loading the data with:
data = np.genfromtxt("rx_power_mode 0.txt", dtype=float, delimiter='\t', usecols=[0,1,2], skip_header=1)
Since you are skipping the header line, and just using columns with floats, data should be a 2d array with 3 columns, (N, 3). In that case you could access the 'frequency' values with data[:,2]
I = data[:, 2].astype(int) == 177
freq_177 = data[I, :]
freq_177 is now a 3-column array containing the subset of data rows whose frequency is 177.
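A quick self-contained check of that 2d approach, with a small made-up array standing in for the genfromtxt result (columns Angle, RX_Power, Frequency):

```python
import numpy as np

# hypothetical rows standing in for "rx_power_mode 0.txt"
data = np.array([[0.0, -10.0, 177.0],
                 [5.0, -12.0, 180.0],
                 [10.0, -11.0, 177.0]])

I = data[:, 2].astype(int) == 177  # boolean mask over the Frequency column
freq_177 = data[I, :]
```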

Isolating a column out of a numpy array using a variable?

I am trying to isolate the last column of a numpy array. However, the function needs to work for arrays of different sizes. When I put it like this:
array[:,array_length]
#array_length is a variable set to the length of one row of the array
which seems like it should work, but it returns an error telling me that I can't slice with a variable, only with an integer.
Is there a way to do this with numpy that I'm not seeing?
To access the last column of a numpy array, you can use -1
last_col = array[:, -1]
Or you can also do
array_length = len(array[0]) - 1
last_col = array[:, array_length]
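Both forms can be checked on a small made-up array:

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

last_col = arr[:, -1]               # negative index: no size bookkeeping needed
same_col = arr[:, len(arr[0]) - 1]  # explicit last valid column index
```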

Copying row element in a numpy array

I have an array X of <class 'scipy.sparse.csr.csr_matrix'> format with shape (44, 4095)
I would now like to create a new numpy array, say X_train = np.empty([44, 4095]), and copy it row by row in a different order. Say I want the 5th row of X in the 1st row of X_train.
How do I do this (copying an entire row into a new numpy array) similar to matlab?
Define the new row order as a list of indices, then define X_train using integer indexing:
row_order = [4, ...]
X_train = X[row_order]
Note that unlike Matlab, Python uses 0-based indexing, so the 5th row has index 4.
Also note that integer indexing (due to its ability to select values in arbitrary order) returns a copy of the original NumPy array.
This works equally well for sparse matrices and NumPy arrays.
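A sketch with a small dense NumPy array (the same integer indexing applies to a csr_matrix); the shapes here are made up, standing in for the (44, 4095) matrix:

```python
import numpy as np

X = np.arange(20).reshape(4, 5)  # stand-in for the (44, 4095) matrix
row_order = [2, 0, 3, 1]         # 3rd row of X becomes 1st row of X_train, etc.
X_train = X[row_order]           # integer indexing: rows copied in the given order
```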
Python generally works with references, which is something you should keep in mind. What you need to do is make a copy and then swap. Here is a demo function which swaps two rows in place:
import numpy as np  # import numpy

def swapRows(myArray, rowA, rowB):
    '''Swap rowA with rowB in place.'''
    temp = myArray[rowA, :].copy()  # temporary copy of rowA
    myArray[rowA, :] = myArray[rowB, :].copy()
    myArray[rowB, :] = temp

a = np.arange(30)    # generate demo data
a = a.reshape(6, 5)  # reshape the data into a 6x5 matrix
print(a)             # print the matrix before the swap
swapRows(a, 0, 1)    # swap the rows
print(a)             # print the matrix after the swap
To answer your question, one solution would be to use
X_train = np.empty([44, 4095])
X_train[0, :] = X[4, :].toarray()  # store the 5th row in the 1st row (X is sparse, so densify it)
unutbu's answer seems to be the most logical.
