I have the array below, which represents a matrix of 20 columns x 10 rows.
What I am trying to do is get the value in the third position after I provide the column and row values. For example, if I type in the values 3 and 0, I expect to get 183 as the answer. I used the print command as print(matrix[3][0][I don't know]), but I either get an out-of-range error or undesirable results.
I also organized the data as matrix[[[0],[0],[180]], [[1],[0],[181]], [[2],[0],[182]],... without much success.
I have the matrix data in a CSV file, so I can format it accordingly if the problem is the way I am presenting the data.
Can someone, please, take a look at this code and direct me? Thanks
matrix = [
['0','0','180'],
['1','0','181'],
['2','0','182'],
['3','0','183'],
['4','0','184'],
['5','0','185'],
['6','0','186'],
['7','0','187'],
['18','0','198']]
print(matrix[?][?][value])
Your matrix here is 9 x 3.
If you want 185, it's in the 6th row, 3rd column, so the indexes are 5 and 2 respectively.
matrix[5][2] will print the result; there is no need for a third bracket.
Basically, to access an element you do [rowNumber][colNumber]: the first bracket gives you whatever is in that position of the big array (a 2-D array is just an array of arrays), so you get a 1-D array with 3 elements; you then put the index of the element you want in that 1-D array.
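Since the question actually wants to look up a value by the column and row stored inside each entry, a small helper along these lines might work. This is only a sketch; the `lookup` function name is my own, and it assumes each inner list is `[column, row, value]` with all entries stored as strings, as in the question's data.

```python
# Hypothetical lookup helper, assuming each inner list is
# [column, row, value] with all entries stored as strings.
matrix = [
    ['0', '0', '180'],
    ['1', '0', '181'],
    ['2', '0', '182'],
    ['3', '0', '183'],
]

def lookup(matrix, col, row):
    """Return the value of the first entry matching col and row."""
    for entry in matrix:
        if entry[0] == str(col) and entry[1] == str(row):
            return entry[2]
    return None

print(lookup(matrix, 3, 0))  # -> 183
```

If the data comes from a CSV, the same linear scan works on the parsed rows; for a large matrix, a dict keyed by (col, row) would avoid scanning every time.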
I have read in a numpy.ndarray that looks like this:
[[1.4600e-01 2.9575e+00 6.1580e+02]
[5.8600e-01 4.5070e+00 8.7480e+02]]
Let's assume that the array I am reading will not always have a length of 2 (e.g. it could have a length of 1, 3, 456, etc.).
I would like to separate this to two separate arrays that look like this:
[[1.4600e-01 2.9575e+00 6.1580e+02]]
[[5.8600e-01 4.5070e+00 8.7480e+02]]
I previously searched for a solution to this problem, but this is not the solution I am looking for: python convert 2d array to 1d array
Since you want to extract the rows, you can just index them. So suppose your array is stored in the variable x. x[0] will give you the first row: [1.4600e-01 2.9575e+00 6.1580e+02], while x[1] will give you the second row: [5.8600e-01 4.5070e+00 8.7480e+02], etc.
You can also iterate over the rows doing something like:
for row in x:
# Do stuff with the row
If you really want to preserve the outer dimension, you can reshape the rows using x[0].reshape((1,-1)), which says to set the first dimension to 1 (meaning it has 1 row) and infer the second dimension from the existing data.
Alternatively if you want to split some number of rows into n groups, you can use the numpy.vsplit() function: https://numpy.org/doc/stable/reference/generated/numpy.vsplit.html#numpy.vsplit
However, I would suggest looping over the rows instead of splitting them up unless you really need to split them up.
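Putting the pieces together, a minimal sketch using the sample array from the question:

```python
import numpy as np

# The sample array from the question.
x = np.array([[1.4600e-01, 2.9575e+00, 6.1580e+02],
              [5.8600e-01, 4.5070e+00, 8.7480e+02]])

first = x[0]                       # 1-D row, shape (3,)
first_2d = x[0].reshape((1, -1))   # keeps the outer dimension, shape (1, 3)

# One (1, 3) array per row, regardless of how many rows x has:
groups = np.vsplit(x, x.shape[0])
```

Note that `np.vsplit(x, x.shape[0])` produces one group per row whatever the length of the array, which matches the "not always length 2" requirement.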
I'm trying to rewrite some Julia code into Python code, and I just found colptr attached to a sparse matrix. I searched for it but I still don't understand what it is.
Could someone provide me information about it and its counterpart in Python 3? Thank you in advance.
[edit]
This is from Julia's reference documentation:
struct SparseMatrixCSC{Tv,Ti<:Integer} <: AbstractSparseMatrix{Tv,Ti}
    m::Int              # Number of rows
    n::Int              # Number of columns
    colptr::Vector{Ti}  # Column j is in colptr[j]:(colptr[j+1]-1)
    rowval::Vector{Ti}  # Row indices of stored values
    nzval::Vector{Tv}   # Stored values, typically nonzeros
end
For instance, is A.colptr[j] referring to all the elements of the j-th column of the CSC matrix A?
I tried to figure it out by running the simple code below:
A = sparse([1, 1, 2, 3], [1, 3, 2, 3], [0, 1, 2, 0])
for i=1:4
println(A.colptr[i])
end
and the result was
1
2
3
5
I still have no idea why the result would be like this. The documentation only says:
Ti is the integer type for storing column pointers
You're looking at a compressed sparse column (CSC) representation of a matrix. Instead of, for example, storing all the values of a matrix in sequence in memory, this allows only storing nonzero values. For example, the matrix
5 0 0
6 0 7
1 0 3
0 2 0
can either be stored in memory as the column major sequence 5 6 1 0 0 0 0 2 0 7 3 0 or you could do something smarter.
If you only store the column major sequence of nonzero elements, you end up with a much shorter list: 5 6 1 2 7 3! But now you need a way to map these values back to their locations in the matrix. You need a column index and a row index. Thus, we have two more lists:
The row indices for each stored value can also be stored in a one-to-one fashion: 1 2 3 4 2 3.
Now, I could store the column indices in a similar one-to-one fashion: 1 1 1 2 3 3. Were I to do so, this would be a sparse coordinate (COO) format. But note that there's a lot of redundant information here: look at all those repeated values!
The common CSC format compresses this further. I already know I have three columns; I could simply store where each column starts. This is the colptr: it has one value per column and points to where that column starts. Thus instead of storing six values, it only needs to store three: column one starts at index one (of course), column two starts at index 4, and column three starts at index 5.
It turns out to make life a bit easier if we also store a final fourth value representing one past the end, because then we can describe the nonzero values in a particular column simply by saying that the stored values in column j can be found in nzval[colptr[j]:colptr[j+1]-1]. For the example above, colptr is 1 4 5 7.
The typical Python equivalent is in scipy: scipy.sparse.csc_matrix; simply substitute colptr -> indptr, rowval -> indices, nzval -> data, and accommodate the 0-based indexing.
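A short sketch of the scipy counterpart, built from the 4x3 example matrix above (this assumes scipy is available):

```python
import numpy as np
from scipy.sparse import csc_matrix

# The 4x3 example matrix from the explanation above. scipy's attribute
# names map to Julia's fields (with 0-based indexing):
# colptr -> indptr, rowval -> indices, nzval -> data.
dense = np.array([[5, 0, 0],
                  [6, 0, 7],
                  [1, 0, 3],
                  [0, 2, 0]])
A = csc_matrix(dense)

print(A.data)     # [5 6 1 2 7 3]  column-major nonzeros
print(A.indices)  # [0 1 2 3 1 2]  row index of each stored value
print(A.indptr)   # [0 3 4 6]      column j spans data[indptr[j]:indptr[j+1]]
```

Note the 0-based shift: Julia's colptr 1 4 5 7 becomes indptr 0 3 6... er, rather [0, 3, 4, 6] for this particular matrix, since column one holds three nonzeros, column two holds one, and column three holds two.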
I want to extract the values from two different columns of a pandas dataframe, put them in a list with no duplicate values.
I have tried the following:
arr = df[['column1', 'column2']].values
thelist = []
for ix, iy in np.ndindex(arr.shape):
    if arr[ix, iy] not in thelist:
        thelist.append(arr[ix, iy])
This works but it is taking too long. The dataframe contains around 30 million rows.
Example:
column1 column2
1 adr1 adr2
2 adr1 adr2
3 adr3 adr4
4 adr4 adr5
Should generate the list with the values:
[adr1, adr2, adr3, adr4, adr5]
Can you please help me find a more efficient way of doing this, considering that the dataframe contains 30 million rows?
@ALollz gave a right answer; I'll extend from there. To convert it into a list as expected, just use list(np.unique(df.values)).
You can use just np.unique(df) (maybe this is the shortest version).
Formally, the first parameter of np.unique should be an array_like object,
but as I checked, you can also pass a DataFrame.
Of course, if you want a plain list instead of an ndarray, write
np.unique(df).tolist().
Edit following your comment
If you want the list unique but in the order of appearance, write:
pd.DataFrame(df.values.reshape(-1,1))[0].drop_duplicates().tolist()
Operation order:
reshape changes the source array into a single column.
Then a DataFrame is created, with default column name = 0.
Then [0] takes just this (the only) column.
drop_duplicates does exactly what its name says.
And the last step: tolist converts to a plain list.
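A small sketch combining both approaches on the example data from the question:

```python
import numpy as np
import pandas as pd

# A tiny version of the example data from the question.
df = pd.DataFrame({'column1': ['adr1', 'adr1', 'adr3', 'adr4'],
                   'column2': ['adr2', 'adr2', 'adr4', 'adr5']})

# Sorted unique values across both columns:
unique_sorted = np.unique(df[['column1', 'column2']].values).tolist()

# Unique values in order of first appearance:
unique_ordered = (pd.DataFrame(df.values.reshape(-1, 1))[0]
                  .drop_duplicates()
                  .tolist())

print(unique_sorted)   # ['adr1', 'adr2', 'adr3', 'adr4', 'adr5']
print(unique_ordered)  # ['adr1', 'adr2', 'adr3', 'adr4', 'adr5']
```

Both are vectorized, so either should be far faster than the element-by-element membership test in the question for 30 million rows.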
I have a list with 5 elements. Each element has 2042 rows and 2 columns (2042, 2). The variable type is numpy.ndarray. I want to call the first element (data[0]) and the first 1440 rows in the second column. So the format right now is data[0][0,1], which gives me the value in the first row and second column of the first element. How do I write a line to give me the first 1440 rows?
Something like data[0][0:1439,1], which I know doesn't work--but to that effect.
If it's a numpy array, then
data[0][:1440,1]
should do the job.
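A quick check of that slicing on a smaller stand-in (the shapes here are made up for illustration; the question's arrays are (2042, 2) and the slice would be :1440):

```python
import numpy as np

# A smaller stand-in for the data described in the question:
# a list of 2-D arrays, sliced to the first few rows of the
# second column (1440 in the question, 4 here).
data = [np.arange(20).reshape(10, 2) for _ in range(5)]

first_rows = data[0][:4, 1]
print(first_rows)  # [1 3 5 7]
```

The key point is that the comma-separated index `[:1440, 1]` works only because each element is a NumPy array; a plain nested list would need `[row][col]` indexing instead.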
Well, it's difficult to guess what your approach has been.
But taking:
data = [[[i for i in range(2042)] for i in range(2)] for i in range(5)]
the closest approximation to obtain the first 1440 rows in the second column of the first element is:
data[0][1][:1440]
Regards!
I am having some problems assigning fields to an array using the view method. Apparently, there doesn't seem to be a way to control how you want to assign the fields.
a=array([[1,2],[1,2],[1,2]]) # 3x2 matrix
#array([[1, 2],
# [1, 2],
# [1, 2]])
aa=a.transpose() # 2x3 matrix
#array([[1, 1, 1],
# [2, 2, 2]])
a.view(dtype='i8,i8') # This works
a.view(dtype='i8,i8,i8') # This returns error ValueError: new type not compatible with array.
aa.view(dtype='i8,i8') # This works
aa.view(dtype='i8,i8,i8') # This returns error ValueError: new type not compatible with array.
In fact, if I create aa from scratch instead of using transpose of a,
b=array([[1,1,1],[2,2,2]])
b.view(dtype='i8,i8') # This returns ValueError again.
b.view(dtype='i8,i8,i8') # This works
Why does this happen? Is there any way I can set the fields to represent rows or columns?
When you create a standard array in NumPy, some contiguous blocks of memory are occupied by the data. The size of each block depends on the dtype, the number and organization of these blocks by the shape of your array. Structured arrays follow the same pattern, except that each block is now composed of several sub-blocks, each sub-block occupying some space as defined by the corresponding dtype of the field.
In your example, you define a (3,2) array of ints (a). That's 2 int blocks for the first row, followed by 2 other blocks for the second and then 2 last blocks for the third. If you want to transform it into a structured array, you can either keep the original layout (each block becomes a unique field: a.view(dtype=[('f0', int)])), or transform your 2-block rows into rows of 1 larger block consisting of 2 sub-blocks, each sub-block having an int size.
That's what happens when you do a.view(dtype=[('f0',int),('f1',int)]).
You can't make larger blocks (ie, dtype="i8,i8,i8"), as the corresponding information would be spread across different rows.
Now, you can display your array in a different way, for example column by column: that's what happens when you take the .transpose of your array. It's only display, though ('views' in the NumPy lingo); it doesn't change the original memory layout. So, in your aa example, the original layout is still "3 rows of 2 integers", which you can represent as "3 rows of one block of 2 integers".
In your second example, b=array([[1,1,1],[2,2,2]]), you have a different layout: 2 rows of 3 int blocks. You can group the 3 int blocks into one larger block (dtype="i8,i8,i8") because you're not going over a row. You can't group it two by two, because you would have an extra block on each row.
You can transform a (N,M) standard array into either (1) a N structured array of M fields or (2) a NxM structured array of 1 field, and that's it. The (N,M) is the shape given to the array at its creation. You can display your array as a (M,N) array by a transposition, but that doesn't modify the original memory layout.
When you specify the view as b.view(dtype='i8,i8'), you are asking NumPy to reinterpret the values as a set of records with two values each, but this simply isn't feasible, since each row has 3 values, which isn't a multiple of two. It's like reshaping the matrix into a matrix of a different size; NumPy doesn't allow such things.
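A minimal sketch illustrating the constraint both answers describe, using the (3, 2) array from the question with an explicit 64-bit int dtype:

```python
import numpy as np

# A (3, 2) array of 64-bit ints: each row occupies 16 bytes.
a = np.array([[1, 2], [1, 2], [1, 2]], dtype='i8')

two_fields = a.view(dtype='i8,i8')   # each 2-int row becomes one record
print(two_fields.shape)              # (3, 1)

try:
    a.view(dtype='i8,i8,i8')         # 3 fields (24 bytes) cannot fit a 2-int row
except ValueError:
    print("'i8,i8,i8' is not compatible with 2-int rows")
```

In other words, the structured dtype's itemsize must exactly match the bytes of one row of the original layout, which is why the only viewable field counts are determined by the array's creation shape, not by its transposed display.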