Calling a certain range of elements in a list in Python

I have a list with 5 elements. Each element is a numpy.ndarray with 2042 rows and 2 columns (shape (2042, 2)). I want to take the first element (data[0]) and get the first 1440 rows of its second column. The format I have right now is data[0][0,1], which gives me the value in the first row and second column of the first element. How do I write a line that gives me the first 1440 rows?
Something like data[0][0:1439,1], which I know doesn't work, but to that effect.

If it's a numpy array, then
data[0][:1440, 1]
should do the job.
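As a quick check, here is a minimal sketch with placeholder data of the same shape (the values are made up; only the shapes match the question):

import numpy as np

# Five placeholder arrays, each of shape (2042, 2), mirroring the question's data.
data = [np.random.rand(2042, 2) for _ in range(5)]

first_rows = data[0][:1440, 1]   # first 1440 rows of the second column of element 0
print(first_rows.shape)          # (1440,)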

Well, it's difficult to guess what your data structure looks like, but assuming something like:
data = [[[i for i in range(2042)] for i in range(2)] for i in range(5)]
the closest approximation to obtain the first 1440 rows of the second column of the first element is:
data[0][1][:1440]
Regards!

Related

Python: Convert 2D array to separate arrays

I have read in a numpy.ndarray that looks like this:
[[1.4600e-01 2.9575e+00 6.1580e+02]
[5.8600e-01 4.5070e+00 8.7480e+02]]
Let's assume that this array I am reading will not always have a length of 2 (e.g., it could have a length of 1, 3, 456, etc.).
I would like to separate this into two separate arrays that look like this:
[[1.4600e-01 2.9575e+00 6.1580e+02]]
[[5.8600e-01 4.5070e+00 8.7480e+02]]
I previously tried searching a solution to this problem but this is not the solution I am looking for: python convert 2d array to 1d array
Since you want to extract the rows, you can just index them. So suppose your array is stored in the variable x. x[0] will give you the first row: [1.4600e-01 2.9575e+00 6.1580e+02], while x[1] will give you the second row: [5.8600e-01 4.5070e+00 8.7480e+02], etc.
You can also iterate over the rows doing something like:
for row in x:
    # Do stuff with the row
If you really want to preserve the outer dimension, you can reshape the rows using x[0].reshape((1,-1)), which says to set the first dimension to 1 (meaning it has 1 row) and infer the second dimension from the existing data.
Alternatively if you want to split some number of rows into n groups, you can use the numpy.vsplit() function: https://numpy.org/doc/stable/reference/generated/numpy.vsplit.html#numpy.vsplit
However, I would suggest looping over the rows instead of splitting them up unless you really need to split them up.
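A short, self-contained sketch of the options above, using the two rows from the question:

import numpy as np

x = np.array([[1.4600e-01, 2.9575e+00, 6.1580e+02],
              [5.8600e-01, 4.5070e+00, 8.7480e+02]])

first = x[0]                      # 1-D row: [0.146, 2.9575, 615.8]
first_2d = x[0].reshape((1, -1))  # keeps the outer dimension: shape (1, 3)

# Split into one group per row; works for any number of rows.
groups = np.vsplit(x, x.shape[0])
print(first_2d.shape, len(groups))  # (1, 3) 2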

Printing a Python Array

I have the array below that represents a matrix of 20 cols x 10 rows.
What I am trying to do is get the value located in the third position after I provide the column and row values. For example, if I type in the values 3 and 0, I expect to get 183 as the answer. I used the print command as follows: print(matrix[3][0][I don't know]), but I either get an out-of-range error or undesirable results.
I also organized the data as matrix[[[0],[0],[180]], [[1],[0],[181]], [[2],[0],[182]],... without much success.
I have the matrix data in a CSV file, so I can format it accordingly if the problem is the way I am presenting the data.
Can someone please take a look at this code and point me in the right direction? Thanks.
matrix = []
matrix = [
    ['0', '0', '180'],
    ['1', '0', '181'],
    ['2', '0', '182'],
    ['3', '0', '183'],
    ['4', '0', '184'],
    ['5', '0', '185'],
    ['6', '0', '186'],
    ['7', '0', '187'],
    ['18', '0', '198']]
print(matrix[?][?][value])
Your matrix here is 9 x 3.
If you want the 185, it's in the 6th row and 3rd column, so the indexes are 5 and 2 respectively.
matrix[5][2] will print the result; I don't know why you have a third bracket.
Basically, to access an element you do [rowNumber][colNumber]: the first bracket gives you whatever is in that position of the big array (a 2-D array is just an array of arrays), so you get a 1-D array with 3 elements, and you then put the index of the element you want within that 1-D array.
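A small sketch of both ideas on a shortened copy of the data: direct indexing, plus a hypothetical lookup helper (the helper name is made up for illustration) that finds the row whose first two entries match the column/row values you type in:

matrix = [
    ['0', '0', '180'],
    ['1', '0', '181'],
    ['2', '0', '182'],
    ['3', '0', '183']]

# Direct indexing: row 3, third element (index 2).
print(matrix[3][2])  # '183'

# Hypothetical helper: search for the row whose first two entries match.
def lookup(matrix, col, row):
    for entry in matrix:
        if entry[0] == str(col) and entry[1] == str(row):
            return entry[2]
    return None

print(lookup(matrix, 3, 0))  # '183'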

min of all columns of the dataframe in a range

I want to find the min value of every row of a dataframe, restricted to only a few columns.
For example: consider a dataframe of size 10 x 100. I want to restrict it to the middle 5 columns, so it becomes of size 10 x 5, and take the min over that.
I know how to find the min using df.min(axis=0), but I don't know how to restrict the number of columns. Thanks for the help.
I use the pandas lib.
You can start by selecting the slice of columns you are interested in and applying DataFrame.min() to only that selection:
df.iloc[:, start:end].min(axis=0)
If you want these to be the middle 5, simply find the integer indices which correspond to the start and end of that range:
start = int(n_columns/2 - 2.5)
end = start + 5
Following pciunkiewicz's logic:
First you should select the columns that you want. You can use .loc[...] or .iloc[...].
With .loc you can use the names of the columns. When it takes 2 arguments, the first one is the row index and the second is the columns:
df.loc[[rows], [columns]]  # the filter should be inside the brackets
df.loc[:, [columns]]       # this considers all rows
You can also use .iloc. In this case, you have to use integers to locate the data, so you don't need to know the names of the columns, only their positions.
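Putting both answers together, a minimal sketch on a made-up 10 x 100 DataFrame (the column names and sizes are placeholders):

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10, 100),
                  columns=['c%d' % i for i in range(100)])

n_columns = df.shape[1]
start = int(n_columns / 2 - 2.5)   # 47 for 100 columns
end = start + 5                    # 52

middle_min = df.iloc[:, start:end].min(axis=0)                          # one min per selected column
by_name = df.loc[:, ['c47', 'c48', 'c49', 'c50', 'c51']].min(axis=0)    # same selection via .loc
print(middle_min.equals(by_name))  # True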

Finding the index of the maximum number in a python matrix which includes strings

I understand that
np.argmax(np.max(x, axis=1))
returns the index of the row that contains the maximum value, and
np.argmax(np.max(x, axis=0))
returns the index of the column that contains the maximum value.
But what if the matrix contains strings? How can I change the code so that it still finds the index of the largest value?
Also (if there's no way to do what I previously asked for), can I change the code so that the operation is only carried out on a sub-section of the matrix, for instance, on the bottom right '2x2' sub-matrix in this example:
array = [['D', 'F', 'J'],
         ['K', 3, 4],
         ['B', 3, 1]]

[[3, 4],
 [3, 1]]
Can you try first converting the columns to a single dtype? If you take the min/max of a string-dtype column, it should use the string values for the minimum/maximum.
Although not efficient, this could be one way to find index of the maximum number in the original matrix by using slices:
newmax = 0
newmaxrow = 0
newmaxcolumn = 0
# Slice off row 0 and column 0 so only the bottom-right 2x2 sub-matrix is scanned.
submatrix = [array[i][1:] for i in range(1, 3)]
for row in submatrix:
    for num in row:
        if num > newmax:
            newmax = num
            newmaxcolumn = row.index(newmax) + 1
            newmaxrow = submatrix.index(row) + 1
Note: this method would not work if the largest number lies within row 0 or column 0.
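If the sub-matrix is purely numeric, an alternative sketch (not from the original answers) is to slice it into a numpy array, take argmax, and map the flat index back to row/column indices with np.unravel_index:

import numpy as np

array = [['D', 'F', 'J'],
         ['K', 3, 4],
         ['B', 3, 1]]

sub = np.array([row[1:] for row in array[1:]], dtype=int)  # bottom-right 2x2
r, c = np.unravel_index(np.argmax(sub), sub.shape)
print(r + 1, c + 1, sub[r, c])  # indices in the original matrix: 1 2 4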

Efficient way of converting a numpy array of 2 dimensions into a list with no duplicates

I want to extract the values from two different columns of a pandas dataframe, put them in a list with no duplicate values.
I have tried the following:
arr = df[['column1', 'column2']].values
thelist = []
for ix, iy in np.ndindex(arr.shape):
    if arr[ix, iy] not in thelist:
        thelist.append(arr[ix, iy])
This works but it is taking too long. The dataframe contains around 30 million rows.
Example:
column1 column2
1 adr1 adr2
2 adr1 adr2
3 adr3 adr4
4 adr4 adr5
Should generate the list with the values:
[adr1, adr2, adr3, adr4, adr5]
Can you please help me find a more efficient way of doing this, considering that the dataframe contains 30 million rows.
@ALollz gave the right answer. I'll extend from there. To convert it into a list as expected, just use list(np.unique(df.values)).
You can use just np.unique(df) (maybe this is the shortest version).
Formally, the first parameter of np.unique should be an array_like object,
but as I checked, you can also pass just a DataFrame.
Of course, if you want just plain list not a ndarray, write
np.unique(df).tolist().
Edit following your comment
If you want the list unique but in the order of appearance, write:
pd.DataFrame(df.values.reshape(-1,1))[0].drop_duplicates().tolist()
Operation order:
reshape changes the source array into a single column.
Then a DataFrame is created, with default column name = 0.
Then [0] takes just this (the only) column.
drop_duplicates does exactly what its name says.
And the last step: tolist converts to a plain list.
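A short sketch of both variants on the question's example data:

import numpy as np
import pandas as pd

df = pd.DataFrame({'column1': ['adr1', 'adr1', 'adr3', 'adr4'],
                   'column2': ['adr2', 'adr2', 'adr4', 'adr5']})

print(np.unique(df).tolist())
# ['adr1', 'adr2', 'adr3', 'adr4', 'adr5']   (sorted order)

print(pd.DataFrame(df.values.reshape(-1, 1))[0].drop_duplicates().tolist())
# ['adr1', 'adr2', 'adr3', 'adr4', 'adr5']   (order of appearance; the same here)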
