Python: Convert 2D array to separate arrays

I have read in a numpy.ndarray that looks like this:
[[1.4600e-01 2.9575e+00 6.1580e+02]
[5.8600e-01 4.5070e+00 8.7480e+02]]
Let's assume that the array I am reading will not always have a length of 2 (e.g. it could have a length of 1, 3, 456, etc.).
I would like to separate this into two separate arrays that look like this:
[[1.4600e-01 2.9575e+00 6.1580e+02]]
[[5.8600e-01 4.5070e+00 8.7480e+02]]
I previously searched for a solution to this problem, but this is not the solution I am looking for: python convert 2d array to 1d array

Since you want to extract the rows, you can just index them. So suppose your array is stored in the variable x. x[0] will give you the first row: [1.4600e-01 2.9575e+00 6.1580e+02], while x[1] will give you the second row: [5.8600e-01 4.5070e+00 8.7480e+02], etc.
You can also iterate over the rows doing something like:
for row in x:
    # Do stuff with the row
If you really want to preserve the outer dimension, you can reshape a row using x[0].reshape((1, -1)), which sets the first dimension to 1 (meaning it has 1 row) and infers the second dimension from the existing data.
Alternatively if you want to split some number of rows into n groups, you can use the numpy.vsplit() function: https://numpy.org/doc/stable/reference/generated/numpy.vsplit.html#numpy.vsplit
However, I would suggest looping over the rows rather than splitting them up unless you really need separate arrays.
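For example, a minimal sketch using the two-row array from the question (with the array stored in x as above):
import numpy as np

x = np.array([[1.4600e-01, 2.9575e+00, 6.1580e+02],
              [5.8600e-01, 4.5070e+00, 8.7480e+02]])

first_row = x[0]                      # 1-D view of the first row
first_row_2d = x[0].reshape((1, -1))  # keeps the outer dimension: shape (1, 3)
per_row = np.vsplit(x, x.shape[0])    # list of (1, 3) arrays, one per row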

Understanding np.ix_

Code:
import numpy as np
ray = [1,22,33,42,51], [61,71,812,92,103], [113,121,132,143,151], [16,172,183,19,201]
ray = np.asarray(ray)
type(ray)
ray[np.ix_([-2:],[3:4])]
I'd like to use index slicing to get a subarray consisting of the last two rows and the 3rd/4th columns, but my current code produces an error. I'd also like to sum each column. What am I doing wrong? (I cannot post a picture of the error because I need at least 10 reputation points.)
So you want to make a slice of an array. The most straightforward way to do it is... slicing:
slice = ray[-2:,3:]
or if you want it explicitly
slice = ray[-2:,3:5]
See it explained in Understanding slicing
But if you do want to use np.ix_ for some reason, you need
slice = ray[np.ix_([-2,-1],[3,4])]
You can't use : here, because the square brackets don't make a slice; they construct lists, and you have to specify explicitly every row number and every column number you want in the result. If there are too many consecutive indices to list, you can use range:
slice = ray[np.ix_(range(-2, 0),range(3, 5))]
And to sum each column:
slice.sum(0)
0 means you want to reduce the 0th dimension (rows) by summation and keep other dimensions (columns in this case).
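For example, applying this to the ray array from the question gives:
import numpy as np

ray = np.asarray([[1, 22, 33, 42, 51],
                  [61, 71, 812, 92, 103],
                  [113, 121, 132, 143, 151],
                  [16, 172, 183, 19, 201]])

sub = ray[-2:, 3:]                    # last two rows, columns 3 and 4
same = ray[np.ix_([-2, -1], [3, 4])]  # identical result via np.ix_
col_sums = sub.sum(0)                 # array([162, 352])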

Efficient way of converting a numpy array of 2 dimensions into a list with no duplicates

I want to extract the values from two different columns of a pandas dataframe, put them in a list with no duplicate values.
I have tried the following:
arr = df[['column1', 'column2']].values
thelist = []
for ix, iy in np.ndindex(arr.shape):
    if arr[ix, iy] not in thelist:
        thelist.append(arr[ix, iy])
This works but it is taking too long. The dataframe contains around 30 million rows.
Example:
column1 column2
1 adr1 adr2
2 adr1 adr2
3 adr3 adr4
4 adr4 adr5
Should generate the list with the values:
[adr1, adr2, adr3, adr4, adr5]
Can you please help me find a more efficient way of doing this, considering that the dataframe contains 30 million rows?
@ALollz gave the right answer; I'll extend from there. To convert it into a list as expected, just use list(np.unique(df.values)).
You can also use just np.unique(df) (probably the shortest version).
Formally, the first parameter of np.unique should be an array_like object, but as I checked, you can also pass a DataFrame directly.
Of course, if you want a plain list rather than an ndarray, write np.unique(df).tolist().
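A minimal sketch on a toy DataFrame (the column names are the ones from the question; the values stand in for the real 30-million-row data):
import numpy as np
import pandas as pd

df = pd.DataFrame({'column1': ['adr1', 'adr1', 'adr3', 'adr4'],
                   'column2': ['adr2', 'adr2', 'adr4', 'adr5']})

unique_vals = np.unique(df[['column1', 'column2']].values).tolist()
# ['adr1', 'adr2', 'adr3', 'adr4', 'adr5'] (sorted, not in order of appearance)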
Edit following your comment
If you want the list to be unique but in order of appearance, write:
pd.DataFrame(df.values.reshape(-1,1))[0].drop_duplicates().tolist()
Operation order:
reshape changes the source array into a single column.
Then a DataFrame is created, with default column name = 0.
Then [0] takes just this (the only) column.
drop_duplicates does exactly what its name says.
And the last step: tolist converts to a plain list.
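A self-contained sketch of this order-preserving variant on the same toy data as above:
import pandas as pd

df = pd.DataFrame({'column1': ['adr1', 'adr1', 'adr3', 'adr4'],
                   'column2': ['adr2', 'adr2', 'adr4', 'adr5']})

in_order = pd.DataFrame(df.values.reshape(-1, 1))[0].drop_duplicates().tolist()
# ['adr1', 'adr2', 'adr3', 'adr4', 'adr5'], kept in order of first appearance (row by row)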

Sort an array by row in Python

I understood that sorting a numpy array arr by column (for only a particular column, for example, its 2nd column) can be done with:
arr[arr[:,1].argsort()]
Here is how I understood this code sample: argsort sorts the values of the 2nd column of arr and returns the corresponding indices as an array. This array is then given to arr as row numbers. Am I correct in my interpretation?
Now I wonder: what if I want to sort the array arr with respect to its 2nd row instead of its 2nd column? Is the simplest way to transpose the array before sorting it and transpose it back afterwards, or is there a way to do it as before (by giving an array with the numbers of the columns we wish to keep)?
Instead of doing (n,n)array[(n,)array] (n is the size of the 2d array) I tried to do something like (n,n)array[(n,1)array] to indicate the numbers of the columns but it does not work.
EXAMPLE of what I want:
arr = [[11,25],[33,4]] => base array
arr_col2 = [[33,4],[11,25]] => array I got with argsort()
arr_row2 = [[25,11],[4,33]] => array I tried to get in a simple way with argsort() but did not succeed
I assume that arr is a numpy array? I haven't seen the syntax arr[:,1] in any other context in python. It would be worth mentioning this in your question!
Assuming this is the case, then you should be using
arr.sort(axis=0)
to sort by column and
arr.sort(axis=1)
to sort by row. (Both sort in-place, i.e. change the value of arr. If you don't want this you can copy arr into another variable first, and apply sort to that.)
If you want to sort just a single row (in this case, the second one) then
arr[1,:].sort()
works.
Edit: I now understand what problem you are trying to solve. You would like to reorder the columns of the matrix so that the nth row is in increasing order. You can do this simply by
arr[:,arr[1,:].argsort()]
(where here we're sorting by the 2nd row).
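Applied to the example from the question, a short sketch:
import numpy as np

arr = np.array([[11, 25],
                [33, 4]])

arr_col2 = arr[arr[:, 1].argsort()]     # sort rows by the 2nd column -> [[33  4] [11 25]]
arr_row2 = arr[:, arr[1, :].argsort()]  # reorder columns by the 2nd row -> [[25 11] [ 4 33]]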

How to apply the output of numpy.argpartition for 2-D Arrays?

I have a largish 2d numpy array, and I want to extract the lowest 10 elements of each row as well as their indexes. Since my array is largish, I would prefer not to sort the whole array.
I heard about the argpartition() function, with which I can get the indexes of the lowest 10 elements:
top10indexes = np.argpartition(myBigArray,10)[:,:10]
Note that argpartition() partitions along axis -1 by default, which is what I want. The result of argpartition() has the same shape as myBigArray and contains indexes into the respective rows such that the first 10 indexes in each row point to the 10 lowest values.
How can I now extract the elements of myBigArray corresponding to those indexes?
Obvious fancy indexing like myBigArray[top10indexes] or myBigArray[:,top10indexes] does something quite different. I could also use a list comprehension, something like:
array([row[idxs] for row,idxs in zip(myBigArray,top10indexes)])
but that would incur a performance hit iterating numpy rows and converting the result back to an array.
nb: I could just use np.partition() to get the values, and they may even correspond to the indexes (or may not..), but I don't want to do the partition twice if I can avoid it.
You can avoid using the flattened copies and the need to extract all the values by doing:
num = 10
top = np.argpartition(myBigArray, num, axis=1)[:, :num]
myBigArray[np.arange(myBigArray.shape[0])[:, None], top]
For NumPy >= 1.9.0 this will be very efficient and comparable to np.take().
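A small sketch of that row-index trick (the array shape and random values below are just for illustration):
import numpy as np

rng = np.random.default_rng(0)
myBigArray = rng.integers(0, 100, size=(5, 20))

num = 10
top = np.argpartition(myBigArray, num, axis=1)[:, :num]  # column indexes of the 10 smallest per row
rows = np.arange(myBigArray.shape[0])[:, None]           # row numbers as a (5, 1) column vector
smallest = myBigArray[rows, top]                         # broadcasts to a (5, 10) array of the values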

Change dtype for particular values in numpy array?

I have a numpy array x with dimensions (20, 4), in which only the first row and column are real string values (text labels) and the rest of the values are numerals stored as strings. I want to change these numeral values to float or integer type.
I have tried some steps:
a. I made copies of the first row and column of the array as separate variables:
x_row = x[0]
x_col = x[:,0]
Then I deleted them from the original array x (using the numpy.delete() method) and converted the type of the remaining values with a for loop that iterates over each value. However, when I stack the copied row and column back using numpy.vstack() and numpy.hstack(), everything converts back to string type, and I am not sure why this is happening.
b. Same procedure as in point a, except that I used the numpy.insert() method for inserting the rows and columns, but it does the same thing: converts everything back to string type.
So, is there a way to avoid this deleting and stacking mechanism (which isn't working anyway) and change all the values of the array (except the first row and column) to int or float type?
All items in a numpy array have to have the same dtype. That is a fundamental fact about numpy. You could possibly use a numpy recarray, or you could use dtype=object which basically lets all values be anything.
I'd recommend you take a look at pandas, which provides a tabular data structure that allows different columns to have different types. It sounds like what you have is a table with row and column labels, and that's what pandas deals with nicely.
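For instance, a minimal sketch assuming the first row holds column labels and the first column holds row labels (the toy array below stands in for the real (20, 4) one):
import numpy as np
import pandas as pd

x = np.array([['', 'c1', 'c2', 'c3'],
              ['r1', '1.5', '2.0', '3.25'],
              ['r2', '4.0', '5.5', '6.75']])

numeric = x[1:, 1:].astype(float)  # plain numpy: slice off the labels, convert the rest in one step

df = pd.DataFrame(numeric, index=x[1:, 0], columns=x[0, 1:])  # pandas: labelled table with float values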
