As the title says, I have a list called "list" containing multiple DataFrames (shape 120 x 120) with numeric data, built from a previous list.
...
df_sum = list_dataframe[0]
for i in range(1, len(list_dataframe)):
    df_sum = df_sum.add(list_dataframe[i])
list.append(df_sum)
Let's say that "list" contains 800 DataFrames, so every index of this list holds one DataFrame. I want to:
create an array with the same length as "list"
take every DataFrame in "list", one by one, and convert it into a NumPy array (120 x 120, so a matrix)
add every NumPy array (120 x 120) to the array created (length 800).
So I want to obtain an array (with a length of 800, the same as list) where every index contains one of the 800 NumPy arrays (matrices).
I have already applied the .to_numpy() function to each element of the list in a for loop,
for i in range(len(list)):
    list[i] = list[i].to_numpy()
but it generates a strange structure, like an array of array of array where the second one contains only one element, that is the dataframe converted into an array:
>>> list
[array([[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]]),
array([[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
How can I do that?
You're on the right track. If you call np.array on your resulting list, it will create one large array that has shape (800, 120, 120). An example using a list comprehension instead of a for-loop:
import numpy as np
import pandas as pd
my_list = [pd.DataFrame(np.random.randint(10, size=(120, 120))) for _ in range(800)]
out = np.array([df.to_numpy() for df in my_list])
>>> out.shape
(800, 120, 120)
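As a variation on the same idea, np.stack makes the new leading axis explicit and raises an error if the arrays do not all share one shape. A minimal sketch, assuming every DataFrame is 120 x 120 and purely numeric; the names frames and stacked are made up for illustration, and only 5 frames are used to keep it small:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the list of DataFrames (5 instead of 800).
# Frame k is filled with the value k so the result is easy to check.
frames = [pd.DataFrame(np.full((120, 120), float(k))) for k in range(5)]

# Stack along a new first axis: one 120 x 120 matrix per index.
stacked = np.stack([df.to_numpy() for df in frames], axis=0)

print(stacked.shape)     # (5, 120, 120)
print(stacked[3][0, 0])  # 3.0
```

Indexing stacked[i] then gives back the i-th 120 x 120 matrix.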
Related
Let's say I have the following NumPy array:
import numpy as np
A = np.zeros((4,4))
A
Out[7]:
array([[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]])
Next, say I have the following index lists:
row_indexes = [0,2,3]
column_indexes = [0,2,1]
and a list of corresponding values:
values = [10, 20, 30]
My question is: How can I insert the list of values into the matrix A as efficiently as possible (computation time is relevant) at the row/column index locations specified by row_indexes and column_indexes so that after this operation, A would equal:
array([[10., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 20., 0.],
[ 0., 30., 0., 0.]])
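One way this is commonly done is with NumPy's integer-array ("fancy") indexing, which pairs the two index lists element by element. A minimal sketch using the names from the question; it assumes the three lists have equal length and that no (row, column) pair is repeated:

```python
import numpy as np

A = np.zeros((4, 4))
row_indexes = [0, 2, 3]
column_indexes = [0, 2, 1]
values = [10, 20, 30]

# The two index lists are paired element-wise, so this writes
# A[0, 0] = 10, A[2, 2] = 20 and A[3, 1] = 30 in one vectorized step.
A[row_indexes, column_indexes] = values

print(A)
```

Because the assignment is a single vectorized operation, it avoids a Python-level loop over the index pairs.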
I have a 2D numpy array, 'construct' (17 rows, 1531900 columns). I have created a second numpy array of zeros, 'new_var' (17 rows, 8928000 columns). I would like to insert my 'construct' array into my 'new_var' array with each row going into its respective rows and each column value from construct going into new_var based on the indices given by a third array idx_arr (1 row, 1531900 columns).
For example:
new_var:
array([[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]])
idx_arr:
array([[1695453, 1695608, 1695763, ..., 7200121, 7200276, 7201360]])
construct[0, :]:
array([-0.63766944, -0.653992 , -0.5967345 , ..., -0.7344175 ,
-0.7344163 , -0.7344165 ], dtype=float32)
construct[1, :]
array([0.05108674, 0.01683133, 0.07986307, ..., -0.9598859 ,
-0.959886 , -0.9598871 ], dtype=float32)
For example, the elements in construct[0,0] (-0.63766944) and construct[0,1] (-0.653992) would go into new_var at indices 0, 1695453 and 0, 1695608 respectively. Elements in construct[1, 0] and construct[1, 1] would go into new_var at indices 1, 1695453 and 1, 1695608 respectively.
Thanks!
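The scatter described here can be sketched with a slice on the row axis and an index array on the column axis. The example below uses toy sizes instead of the real (17, 1531900) and (17, 8928000) shapes, and assumes idx_arr has shape (1, n) with no repeated column indices:

```python
import numpy as np

# Toy stand-ins for the arrays in the question.
construct = np.arange(6, dtype=np.float32).reshape(2, 3)
new_var = np.zeros((2, 10), dtype=np.float32)
idx_arr = np.array([[1, 4, 7]])  # shape (1, 3), like the question's idx_arr

# idx_arr[0] is the 1-D vector of target columns; the slice ":" keeps rows
# aligned, so construct[r, k] lands in new_var[r, idx_arr[0, k]].
new_var[:, idx_arr[0]] = construct

print(new_var)
```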
I have a 60000 by 200 numpy array. I want to make it 60000 by 201 by adding a column of 1's to the right (so every row is [prev, 1]).
Concatenate with axis = 1 doesn't work because it seems like concatenate requires all input arrays to have the same dimension.
How should I do this?
Let me just throw in a very simple example with much smaller size. The principle should be the same.
a = np.zeros((6,2))
array([[ 0., 0.],
[ 0., 0.],
[ 0., 0.],
[ 0., 0.],
[ 0., 0.],
[ 0., 0.]])
b = np.ones((6,1))
array([[ 1.],
[ 1.],
[ 1.],
[ 1.],
[ 1.],
[ 1.]])
np.hstack((a,b))
array([[ 0., 0., 1.],
[ 0., 0., 1.],
[ 0., 0., 1.],
[ 0., 0., 1.],
[ 0., 0., 1.],
[ 0., 0., 1.]])
Using the NumPy index trick np.c_ to append a 1D vector to a 2D array:
a = np.zeros((6,2))
# array([[ 0., 0.],
# [ 0., 0.],
# [ 0., 0.],
# [ 0., 0.],
# [ 0., 0.],
# [ 0., 0.]])
b = np.ones(6) # or np.ones((6,1))
#array([1., 1., 1., 1., 1., 1.])
np.c_[a,b]
# array([[0., 0., 1.],
# [0., 0., 1.],
# [0., 0., 1.],
# [0., 0., 1.],
# [0., 0., 1.],
# [0., 0., 1.]])
Under the covers, all the stack variants (including append and insert) end up calling concatenate. They just precede it with some sort of array reshape.
In [60]: A = np.arange(12).reshape(3,4)
In [61]: np.concatenate([A, np.ones((A.shape[0],1),dtype=A.dtype)], axis=1)
Out[61]:
array([[ 0, 1, 2, 3, 1],
[ 4, 5, 6, 7, 1],
[ 8, 9, 10, 11, 1]])
Here I made a (3,1) array of 1s, to match the (3,4) array. If I wanted to add a new row, I'd make a (1,4) array.
While the variations are handy, if you are learning, you should become familiar with concatenate and the various ways of constructing arrays that match in number of dimensions and necessary shapes.
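The row case mentioned above can be sketched the same way. This is only an illustration: the names new_row, row_1d, B and C are made up here, and the second half shows that a 1-D array needs an extra axis before concatenate accepts it.

```python
import numpy as np

A = np.arange(12).reshape(3, 4)

# A (1, 4) row of ones matches A in number of dimensions and in column count.
new_row = np.ones((1, A.shape[1]), dtype=A.dtype)
B = np.concatenate([A, new_row], axis=0)
print(B.shape)  # (4, 4)

# A 1-D array of shape (4,) must first be given a leading axis.
row_1d = np.ones(4, dtype=A.dtype)
C = np.concatenate([A, row_1d[np.newaxis, :]], axis=0)
print(C.shape)  # (4, 4)
```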
The first thing to think about is that numpy arrays are really not meant to change size. So you should ask yourself: can you create your original matrix as 60k x 201 and then fill the last column afterwards? This is usually best.
If you really must do this, see "How to add column to numpy array".
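The preallocation idea can be sketched as follows. The sizes are small stand-ins for 60000 and 200, and the zeros used as feature data are placeholders for whatever actually produces the original matrix:

```python
import numpy as np

n_rows, n_features = 6, 2  # stand-ins for 60000 and 200

# Allocate the final shape once, with the extra column already included.
data = np.empty((n_rows, n_features + 1))

# Fill the feature columns (zeros here as placeholder data) ...
data[:, :n_features] = np.zeros((n_rows, n_features))
# ... and set the last column to 1 afterwards.
data[:, -1] = 1.0

print(data.shape)  # (6, 3)
```

This way no second array is allocated and nothing is copied just to gain one column.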
I think the numpy function column_stack is more convenient here because you do not need to build a column-shaped (n, 1) array before stacking it onto the matrix of interest. With column_stack, a plain 1D array is enough.
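A short sketch of that, reusing the small (6, 2) example from the other answers:

```python
import numpy as np

a = np.zeros((6, 2))
b = np.ones(6)  # plain 1-D array, no reshape to (6, 1) needed

# column_stack treats each 1-D input as a column before joining.
out = np.column_stack((a, b))

print(out.shape)  # (6, 3)
```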
I want to initialize an empty vector with 3 columns that I can add to. I need to perform some l2 norm distance calculations on the rows after I have added to it, and I'm having the following problem.
I start with an initial empty array:
accepted_clusters = np.array([])
Then I add my first 1x3 set of values to this:
accepted_clusters = np.append(accepted_clusters, X_1)
returning:
[ 0.47843416 0.50829221 0.51484499]
Then I add a second set of 1x3 values in the same way, and I get the following:
[ 0.47843416 0.50829221 0.51484499 0.89505277 0.8359252 0.21434642]
However, what I want is something like this:
[ 0.47843416 0.50829221 0.51484499]
[ 0.89505277 0.8359252 0.21434642]
.. and so on
This would enable me to calculate distances between the rows. Ideally, the initial empty vector would be of undefined length, but something like a 10x3 of zeros would also work if the code for that is easy.
The most straightforward way is to use np.vstack:
In [9]: arr = np.array([1,2,3])
In [10]: x = np.arange(20, 23)
In [11]: arr = np.vstack([arr, x])
In [12]: arr
Out[12]:
array([[ 1, 2, 3],
[20, 21, 22]])
Note that your entire approach has a major code smell: doing the above in a loop gives you quadratic complexity, because every vstack call copies the whole array into a new one. Perhaps you should work with a list and then convert to an array at the end (which will at least be linear-time). Or maybe rethink your approach entirely.
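The list-based variant might look like this. It is a sketch under assumptions: the random vectors stand in for the question's 1x3 sets of values, and the broadcasting step at the end is just one possible way to get the l2 distances between rows.

```python
import numpy as np

rng = np.random.default_rng(0)

rows = []  # plain Python list; appending to it is cheap
for _ in range(4):
    x = rng.random(3)  # hypothetical stand-in for one 1x3 set of values
    rows.append(x)

# One conversion at the end gives a (4, 3) array, one row per appended vector.
accepted_clusters = np.array(rows)

# Pairwise l2 distances between rows via broadcasting: result has shape (4, 4).
diffs = accepted_clusters[:, None, :] - accepted_clusters[None, :, :]
dists = np.linalg.norm(diffs, axis=-1)

print(accepted_clusters.shape, dists.shape)  # (4, 3) (4, 4)
```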
Or, as you imply, you could pre-allocate your array:
In [18]: result = np.zeros((10, 3))
In [19]: result
Out[19]:
array([[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.]])
In [20]: result[0] = x
In [21]: result
Out[21]:
array([[ 20., 21., 22.],
[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.]])
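You can try using vstack to add rows. This assumes accepted_clusters already has three columns; stacking onto the empty np.array([]) from the question fails with a shape mismatch.
accepted_clusters = np.vstack([accepted_clusters, (0.89505277, 0.8359252, 0.21434642)])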
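To start from a genuinely empty array, one option is an array with zero rows and three columns, so that vstack sees matching shapes from the first call. A minimal sketch using the values from the question; X_2 is a made-up name for the second set of values:

```python
import numpy as np

# Zero rows, three columns: an "empty" array that vstack can extend.
accepted_clusters = np.empty((0, 3))

X_1 = np.array([0.47843416, 0.50829221, 0.51484499])
X_2 = np.array([0.89505277, 0.8359252, 0.21434642])

accepted_clusters = np.vstack([accepted_clusters, X_1])
accepted_clusters = np.vstack([accepted_clusters, X_2])

print(accepted_clusters.shape)  # (2, 3)
```

Each vstack call still copies the array, so for many rows the list-then-convert approach is likely the better fit.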
I wish to be able to extract a row or a column from a 2D array in Python such that it preserves the 2D shape and can be used for matrix multiplication. However, I cannot find in the documentation how this can best be done. For example, I can use
a = np.zeros(shape=(6,6))
to create an array, but a[:,0] will have the shape of (6,), and I cannot multiply this by a matrix of shape (6,1). Do I need to reshape a row or a column of an array into a matrix for every matrix multiplication, or are there other ways to do matrix multiplication?
You could use np.matrix directly (though note that NumPy's documentation no longer recommends this class for new code):
>>> a = np.zeros(shape=(6,6))
>>> ma = np.matrix(a)
>>> ma
matrix([[ 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0.]])
>>> ma[0,:]
matrix([[ 0., 0., 0., 0., 0., 0.]])
or you could add the dimension with np.newaxis:
>>> a[0,:][np.newaxis, :]
array([[ 0., 0., 0., 0., 0., 0.]])
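Another option that stays with plain arrays is to index with a length-one slice (or a one-element list) instead of a bare integer, which keeps the axis. A small sketch; the names col, row, col_alt, m and prod are made up for illustration:

```python
import numpy as np

a = np.zeros(shape=(6, 6))
m = np.ones((6, 1))

col = a[:, 0:1]      # slice instead of integer index: shape (6, 1)
row = a[0:1, :]      # shape (1, 6)
col_alt = a[:, [0]]  # a one-element list index also keeps the axis: (6, 1)

# (1, 6) @ (6, 1) -> (1, 1)
prod = row @ m

print(col.shape, row.shape, col_alt.shape, prod.shape)
# (6, 1) (1, 6) (6, 1) (1, 1)
```

The slice forms return views of a, while the list form a[:, [0]] returns a copy.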