I have an array X of type <class 'scipy.sparse.csr.csr_matrix'> with shape (44, 4095).
I would now like to create a new NumPy array, say X_train = np.empty([44, 4095]), and copy the rows of X into it in a different order. For example, I want the 5th row of X to become the 1st row of X_train.
How do I do this (copying an entire row into a new NumPy array), similar to MATLAB?
Define the new row order as a list of indices, then define X_train using integer indexing:
row_order = [4, ...]
X_train = X[row_order]
Note that unlike MATLAB, Python uses 0-based indexing, so the 5th row has index 4.
Also note that integer indexing (because it can select rows in arbitrary order) returns a copy of the original NumPy array, not a view.
This works equally well for sparse matrices and NumPy arrays. For a sparse matrix the result is still sparse, so call .toarray() on it if you need a dense NumPy array.
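A minimal sketch of the indexing approach on a small stand-in for X. The 4x3 matrix and the particular row_order are made up for illustration; the real X would be the (44, 4095) csr_matrix.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Small stand-in for the (44, 4095) sparse matrix X
X = csr_matrix(np.arange(12, dtype=float).reshape(4, 3))

# Hypothetical new order: row 3 of X goes first, then rows 0, 2 and 1
row_order = [3, 0, 2, 1]

X_sparse = X[row_order]            # still a csr_matrix, rows reordered
X_train = X[row_order].toarray()   # dense NumPy array of shape (4, 3)
print(X_train)
```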
Python generally works with references, which is something you should keep in mind. What you need to do is make a copy and then swap. I have written a demo function which swaps two rows:
import numpy as np

def swapRows(myArray, rowA, rowB):
    '''Swap rowA with rowB in place.'''
    temp = myArray[rowA,:].copy()  # copy rowA so it is not overwritten
    myArray[rowA,:] = myArray[rowB,:]
    myArray[rowB,:] = temp

a = np.arange(30)   # generate demo data
a = a.reshape(6,5)  # reshape the data into a 6x5 matrix
print(a)            # print the matrix before the swap
swapRows(a,0,1)     # swap the rows
print(a)            # print the matrix after the swap
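Because fancy indexing on the right-hand side already produces a copy, the same swap can be sketched in a single assignment without a temporary variable. This is an alternative to the function above, not part of the original answer.

```python
import numpy as np

a = np.arange(30).reshape(6, 5)

# a[[1, 0]] is a copy, so rows 0 and 1 can be swapped in one step
a[[0, 1]] = a[[1, 0]]
print(a[:3])
```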
To answer your question, one solution would be:
X_train = np.empty([44, 4095])
X_train[0,:] = X[4,:].toarray().ravel()  # store the 5th row of X in the 1st row of X_train
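Extending that one-row copy to every row gives a MATLAB-style loop. This sketch assumes a hypothetical row_order list where row_order[dst] is the source row for destination row dst, and uses a small stand-in matrix.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Small stand-in for the sparse matrix X
X = csr_matrix(np.arange(12, dtype=float).reshape(4, 3))
row_order = [3, 0, 2, 1]  # hypothetical: source row for each destination row

X_train = np.empty(X.shape)
for dst, src in enumerate(row_order):
    # X[src, :] is a 1 x n sparse matrix; toarray().ravel() turns it into a 1d row
    X_train[dst, :] = X[src, :].toarray().ravel()
print(X_train)
```

The loop gives the same result as X[row_order].toarray(), but the single indexing call is shorter and avoids Python-level iteration.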
unutbu's answer seems to be the most logical.
In the snippet below (written just to show the issue), I initially create a 2D array of size 3x4. During the execution of the code, at some time step, I need to change the number of columns in the third row from 4 to 2. I tried the following, but it raises a ValueError. How can I do this? Can somebody explain?
import numpy as np
A = np.ones((3,4))
# Some other portion of the code
for i in range(0,3):
    if i == 2:  # in the third row
        A[i,:] = np.ones(2)  # change the size of this third row; now only two elements (two 1's) are needed
print(A)
ValueError: could not broadcast input array from shape (2) into shape (4)
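This fails because NumPy arrays are rectangular: every row must have the same number of columns as the array (4 in your example), so a 2-element row cannot be assigned into it.
If you need rows of different lengths, you can convert the array from NumPy to a list of lists: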
import numpy as np
A = np.ones((3,4))
A = A.tolist()  # convert to a list of lists
for i in range(0,3):
    if i == 2:  # in the third row
        A[i] = [1]*2
print(A)
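A variant of the list approach, sketched here as an assumption rather than taken from the answer: keep each row as its own 1d NumPy array inside a Python list, so rows can differ in length while still supporting array operations.

```python
import numpy as np

A = np.ones((3, 4))

# Keep each row as a separate 1d array so rows may differ in length
rows = [row.copy() for row in A]
rows[2] = np.ones(2)  # the third row now has only two elements
print([r.shape for r in rows])
```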
You cannot change the length of a single row in a NumPy array.
You could instead keep the shape and fill the unused entries with a placeholder:
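import numpy as np
A = np.ones((3,4))
row_no = 2   # index of the row to shorten
new_len = 2  # number of entries to keep in that row
for i in range(0,row_no+1):
    if i == row_no:
        A[i,:new_len] = np.ones(new_len)
        A[i,new_len:] = np.nan  # or 0, or some other placeholder
print(A)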
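With np.nan as the placeholder, the shortened row can be recovered by filtering out the NaN entries. A minimal sketch, assuming the real data never contains NaN itself (otherwise a different placeholder is needed):

```python
import numpy as np

A = np.ones((3, 4))
new_len = 2               # hypothetical: how many entries the third row should keep
A[2, new_len:] = np.nan   # mark the unused tail of the third row

# Recover the "short" third row by dropping the placeholder entries
third_row = A[2][~np.isnan(A[2])]
print(third_row)
```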
I am quite new to NumPy and Python in general. I am getting a dimension mismatch error when I try to append values, even though I have made sure that both arrays have the same dimension. Another question I have is why NumPy creates a one-dimensional array when reading in data from a tab-delimited text file.
import numpy as np
names = ["Angle", "RX_Power", "Frequency"]
data = np.array([0,0,0],float) #experimental
data = np.genfromtxt("rx_power_mode 0.txt", dtype=float, delimiter='\t', names = names, usecols=[0,1,2], skip_header=1)
freq_177 = np.zeros(shape=(data.shape))
print(freq_177.shape) #outputs(315,)
for i in range(len(data)):
    if data[i][2] == 177:
        #np.concatenate(freq_177,data[i]) has same issue
        np.append(freq_177,data[i],0)
The output I am getting is
all the input arrays must have same number of dimensions
Annotated code:
import numpy as np
names = ["Angle", "RX_Power", "Frequency"]
You don't need to 'initialize' an array - unless you are going to assign values to individual elements.
data = np.array([0,0,0],float) #experimental
This data assignment completely overwrites the previous one.
data = np.genfromtxt("rx_power_mode 0.txt", dtype=float, delimiter='\t', names = names, usecols=[0,1,2], skip_header=1)
Look at data at this point. What is data.shape? What is data.dtype? Print it, or at least some elements. With names, I'm guessing that this is a 1d array with a 3-field dtype. It's not a 2d array, though since all the fields are floats, it could be transformed into, or viewed as, one.
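To see why the result is 1d, here is a sketch that feeds genfromtxt a small in-memory sample through io.StringIO instead of the real file. The three data rows are invented; the keyword arguments are the ones from the question.

```python
import io
import numpy as np

# Hypothetical tab-delimited sample standing in for "rx_power_mode 0.txt"
text = ("Angle\tRX_Power\tFrequency\n"
        "0\t-40.5\t177\n"
        "5\t-41.0\t180\n"
        "10\t-39.8\t177\n")
names = ["Angle", "RX_Power", "Frequency"]

data = np.genfromtxt(io.StringIO(text), dtype=float, delimiter='\t',
                     names=names, usecols=[0, 1, 2], skip_header=1)
print(data.shape)        # one structured record per line, so the array is 1d
print(data.dtype.names)  # the three field names
```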
Why are you making a 1d array of zeros?
freq_177 = np.zeros(shape=(data.shape))
print(freq_177.shape) #outputs(315,)
With a structured array like data, the preferred way to index a given element is by field name and row number, e.g. data['Frequency'][i]. Play with that.
np.append is not the same as the list append. It returns a value; it does not change freq_177 in place. Same for concatenate. I recommend staying away from np.append. It's too easy to use it in the wrong way and place.
for i in range(len(data)):
    if data[i][2] == 177:
        #np.concatenate(freq_177,data[i]) has same issue
        np.append(freq_177,data[i],0)
It looks like you want to collect in freq_177 all the terms of the data array for which the 'frequency' field is 177.
I = data['Frequency'].astype(int)==177
freq_177 = data[I]
I have used astype(int) because exact == tests on floats are unreliable; == is best used with integers.
I is a boolean mask, True where the values match; data[I] then contains the corresponding elements of data. Its dtype will match that of data, that is, it will have 3 fields. You can't append or concatenate it to an array of float zeros (your original freq_177).
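The mask-based selection can be sketched on a small structured array built directly with np.array and a field list that mimics what genfromtxt returns with names. The sample values are hypothetical.

```python
import numpy as np

# Structured dtype mimicking the genfromtxt result (field names from the question)
dt = [("Angle", float), ("RX_Power", float), ("Frequency", float)]
data = np.array([(0, -40.5, 177), (5, -41.0, 180), (10, -39.8, 177)], dtype=dt)

I = data["Frequency"].astype(int) == 177  # boolean mask, one entry per record
freq_177 = data[I]                        # only the records whose frequency is 177
print(freq_177["Angle"])
```

The selected records keep the 3-field dtype, which is why they cannot be appended to a plain float array.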
If you must iterate and collect values, I suggest using list append, e.g.
alist = []
for row in data:
    if int(row['Frequency'])==177:
        alist.append(row)
freq_177 = np.array(alist)
I don't think np.append is discussed much except in its own doc page and text. It comes up periodically in SO questions.
http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.append.html
Returns: append : ndarray
A copy of arr with values appended to axis. Note that append does not occur in-place: a new array is allocated and filled.
See also help(np.append) in an interpreter shell.
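A short sketch of that documented behavior: np.append allocates and returns a new array, and the input array is left untouched, so the return value has to be kept.

```python
import numpy as np

freq = np.zeros(3)
result = np.append(freq, [1.0, 2.0])  # allocates and returns a new array

# freq itself is unchanged; only result contains the appended values
print(freq.shape, result.shape)
```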
genfromtxt also has its own docs, and lots of SO discussion. But to understand what it returned in this case, you also need to read about structured arrays and compound dtypes.
Try loading the data with:
data = np.genfromtxt("rx_power_mode 0.txt", dtype=float, delimiter='\t', usecols=[0,1,2], skip_header=1)
Since you are skipping the header line, and just using columns with floats, data should be a 2d array with 3 columns, (N, 3). In that case you could access the 'frequency' values with data[:,2]
I = data[:,2].astype(int)==177
freq_177 = data[I,:]
freq_177 is now a 3-column array with a subset of the data rows.
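The 2d variant, sketched with the same kind of hypothetical in-memory sample (io.StringIO stands in for the real file). Without names, genfromtxt returns a plain (N, 3) float array, and the frequency column is data[:, 2].

```python
import io
import numpy as np

# Hypothetical tab-delimited sample standing in for "rx_power_mode 0.txt"
text = ("Angle\tRX_Power\tFrequency\n"
        "0\t-40.5\t177\n"
        "5\t-41.0\t180\n"
        "10\t-39.8\t177\n")

data = np.genfromtxt(io.StringIO(text), dtype=float, delimiter='\t',
                     usecols=[0, 1, 2], skip_header=1)

I = data[:, 2].astype(int) == 177  # elementwise cast, then compare
freq_177 = data[I, :]              # rows whose frequency is 177
print(freq_177)
```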
I have a text file with 93 columns and 1699 rows that I have imported into Python. The first three columns do not contain data that is necessary for what I'm currently trying to do. Within each column, I need to divide each element (aka row) in the column by all of the other elements (rows) in that same column. The result I want is an array of 90 elements, where each element holds 1699 elements that each have 1699 elements.
A more detailed description of what I'm attempting: I begin with Column3. At Column3, Row1 is to be divided by all the other rows (including the value in Row1) within Column3. That will give Row1 1699 calculations. Then the same process is done for Row2 and so on until Row1699. This gives Column3 1699x1699 calculations. When the calculations of all of the rows in Column 3 have completed, then the program moves on to do the same thing in Column 4 for all of the rows. This is done for all 90 columns which means that for the end result, I should have 90x1699x1699 calculations.
My current code is:
import numpy as np
from glob import glob
fnames = glob("NIR_data.txt")
arrays = np.array([np.loadtxt(f, skiprows=1) for f in fnames])
NIR_values = np.concatenate(arrays)
NIR_band = NIR_values.T
C_values = []
for i in range(3,len(NIR_band)):
    for j in range(0,len(NIR_band[3])):
        loop_list = NIR_band[i][j]/NIR_band[i,:]
        C_values.append(loop_list)
What it produces is an array of 1699x1699 dimension. Each individual array holds the results of the row calculations. Another problem is that the code takes ages to run. So I have two questions: is it possible to create the type of array I'd like to work with? And is there a faster way of coding this calculation?
Dividing each of the numbers in a given column by each of the other values in the same column can be accomplished in one operation as follows, where a is the 2d data array with one column per band (and numpy has been imported):
result = a[:, numpy.newaxis, :] / a[numpy.newaxis, :, :]
Because the looping over elements happens in NumPy's optimized compiled code, this is about as fast as Python is going to get for this operation.
If a.shape is (1699, 90) to begin with, then the result will have shape (1699, 1699, 90). Assuming dtype=float64, that means you will need about 2 GB (1699*1699*90*8 bytes) of memory available to store the result.
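A small-scale sketch of the broadcasting division, with random positive values standing in for the real (1699, 90) data. It also computes the memory estimate for the full-size float64 result.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.uniform(1.0, 2.0, size=(5, 3))  # stand-in for the (1699, 90) data

# result[i, j, k] == a[i, k] / a[j, k]
result = a[:, np.newaxis, :] / a[np.newaxis, :, :]
print(result.shape)

# Bytes needed for the full (1699, 1699, 90) float64 result
nbytes_full = 1699 * 1699 * 90 * np.dtype(np.float64).itemsize
print(nbytes_full / 1e9, "GB")
```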
First let's focus on the load:
arrays = np.array([np.loadtxt(f, skiprows=1) for f in fnames])
NIR_values = np.concatenate(arrays)
Your text talks about loading a file and manipulating it. But this clip loads multiple files and joins them.
My first change is to collect the arrays in a list, not another array:
alist = [np.loadtxt(f, skiprows=1) for f in fnames]
If you want to skip some columns, look at using the usecols parameter. That may save you work later.
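A minimal sketch of usecols with np.loadtxt, on a hypothetical 5-column, whitespace-delimited sample read through io.StringIO. For the real 93-column file the analogous argument would presumably be usecols=range(3, 93).

```python
import io
import numpy as np

# Hypothetical sample: header line plus two rows of five columns
text = "h1 h2 h3 h4 h5\n1 2 3 4 5\n6 7 8 9 10\n"

# Skip the header and keep only the columns after the first three
values = np.loadtxt(io.StringIO(text), skiprows=1, usecols=[3, 4])
print(values)
```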
The elements of alist will now be 2d arrays (of floats). If they are matching sizes (N,M), they can be joined in various ways. If there are n files, then
arrays = np.array(alist) # (n,N,M) array
arrays = np.concatenate(alist, axis=0) # (n*N, M) array
# similarly for axis=1
Your code does the same, but potentially confuses steps:
In [566]: arrays = np.array([np.ones((3,4)) for i in range(5)])
In [567]: arrays.shape
Out[567]: (5, 3, 4) # (n,N,M) array
In [568]: NIR_values = np.concatenate(arrays)
In [569]: NIR_values.shape
Out[569]: (15, 4) # (n*N, M) array
NIR_band is now (4,15), and its len() is .shape[0], the size of the 1st dimension. len(NIR_band[3]) is .shape[1], the size of the 2nd dimension.
You could skip the columns of NIR_values with NIR_values[:,3:].
I get lost in the rest of text description.
The NIR_band[i][j]/NIR_band[i,:], I would rewrite as NIR_band[i,j]/NIR_band[i,:]. What's the purpose of that?
As for your subject line, "Storing multiple arrays within multiple arrays within an array", that sounds like making a 3d or 4d array. arrays is 3d, NIR_values is 2d.
Creating a (90,1699,1699) from a (93,1699) will probably involve (without iteration) a calculation analogous to:
In [574]: X = np.arange(13*4).reshape(13,4)
In [575]: X.shape
Out[575]: (13, 4)
In [576]: (X[3:,:,None]+X[3:,None,:]).shape
Out[576]: (10, 4, 4)
A new axis is inserted with None (np.newaxis) at a different position in each copy, and the two versions are broadcast against each other. np.outer does the multiplication version of this calculation for 1d inputs.
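Applying that pattern with division instead of addition gives the ratios the question asks for. This sketch uses a (13, 4) stand-in for the transposed (93, 1699) NIR_band, starting at 1 to avoid division by zero, and checks the result against the question's double loop.

```python
import numpy as np

# Stand-in for NIR_band with shape (93, 1699): 13 "columns" of 4 values each
NIR_band = np.arange(1, 13 * 4 + 1, dtype=float).reshape(13, 4)
bands = NIR_band[3:]  # drop the first three columns of the original file

# ratios[c, j, k] == bands[c, j] / bands[c, k]
ratios = bands[:, :, None] / bands[:, None, :]
print(ratios.shape)

# Same values as the question's loop: NIR_band[i][j] / NIR_band[i, :]
looped = np.array([[NIR_band[i][j] / NIR_band[i, :] for j in range(NIR_band.shape[1])]
                   for i in range(3, len(NIR_band))])
print(np.allclose(ratios, looped))
```

With the real data the result would have shape (90, 1699, 1699), with the same roughly 2 GB memory cost as the other answer's layout.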
I am having issues figuring out how to do this operation.
I have a variable index, a 1xN sparse binary array, and a 2-d array samples (NxM). I want to use index to select specific rows of samples and get a 2-d array.
I have tried stuff like:
idx = index.todense() == 1
samples[idx.T,:]
but nothing worked.
So far I have made it work doing this:
idx = test_x.todense() == 1
selected_samples = samples[np.array(idx.flat)]
But there should be a cleaner way.
To give an idea using a fraction of the data:
print(idx.shape) # (1, 22360)
print(samples.shape) # (22360, 200)
The short answer:
selected_samples = samples[index.nonzero()[1]]
The long answer:
The first problem is that your index matrix is 1xN while your sample ndarray is NxM. (See the mismatch?) This is why you needed to call .flat.
That's not a big deal, though, because we just need the nonzero entries in the sparse vector. Get those with index.nonzero(), which returns a tuple of (row indices, column indices). We only care about the column indices, so we use index.nonzero()[1] to get those by themselves.
Then, simply index with the array of nonzero column indices and you're done.
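A small sketch of the short answer, with a hypothetical 1x6 binary sparse row and a 6x2 samples array standing in for the (1, 22360) and (22360, 200) originals.

```python
import numpy as np
from scipy.sparse import csr_matrix

samples = np.arange(12).reshape(6, 2)               # stand-in for the (22360, 200) samples
index = csr_matrix(np.array([[0, 1, 0, 1, 1, 0]]))  # 1 x 6 binary row

cols = index.nonzero()[1]          # column positions of the 1s
selected_samples = samples[cols]   # the matching rows of samples
print(cols, selected_samples.shape)
```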
By binary matrix, I mean every element in the matrix is either 0 or 1, and I use the Matrix class in numpy for this.
First of all, is there a specific type of matrix in numpy for it, or do we simply use a matrix that is populated with 0s and 1s?
Second, what is the quickest way for creating a square matrix full of 0s given its dimension with the Matrix class? Note: numpy.zeros((dim, dim)) is not what I want, as it creates a 2-D array with float 0.
Third, I want to get and set any given row of the matrix frequently. For get, I can think of using row = my_matrix.A[row_index].tolist(), which will return a list representation of the given row. For set, it seems that I can just do my_matrix[row_index] = row_list, with row_list being a list of the same length as the given row. Again, I wonder whether they are the most efficient methods for doing the jobs.
To make a NumPy array whose elements can only be 0 or 1 (stored as False/True), use the dtype='bool' parameter:
arr = np.zeros((dim,dim), dtype = 'bool')
Or, to convert arr to a numpy matrix:
arr = np.matrix(arr)
To access a row:
arr[row_num]
and to set a row:
arr[row_num] = new_row
is the quickest way.
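The same get/set operations on a plain boolean ndarray, sketched without converting to np.matrix (NumPy's documentation recommends regular arrays over the matrix class). The dimension and row values are arbitrary examples.

```python
import numpy as np

dim = 4
arr = np.zeros((dim, dim), dtype=bool)

arr[2] = [1, 0, 1, 1]                    # set a row; values become True/False
row = arr[2]                             # get a row (a view, no copy)
row_list = arr[2].astype(int).tolist()   # the row as a list of 0/1 ints
print(row_list)
```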