In Python, is there a way to generate a 2d array using numpy with random integer entries without specifying either the low or high?
I tried mat = np.random.randint(size=(3, 4)), but it fails because randint requires a low argument.
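Assuming you don't want to specify the min or max values of the array, you can use numpy.random.normal,
np.random.normal(loc=mean, scale=standard_deviation, size=(rows, columns))
and then convert the result to integers with astype(int). Note that astype truncates toward zero rather than rounding, and that np.int (an alias of the built-in int) was removed in NumPy 1.24. For example: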
>>> import numpy as np
>>> mat = np.random.normal(1, 3, (3, 4)).astype(int)
>>> print(mat)
[[ 0  0  0 -1]
 [ 0  5  0  0]
 [-5  1  2  2]]
Please note that the output may vary, as the values are random.
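Because astype(int) truncates (-1.7 becomes -1, 2.6 becomes 2), the answer's "round" is really a truncation. A minimal sketch of the difference, assuming you actually want nearest-integer rounding: it uses np.rint and the built-in int (since np.int is gone from recent NumPy), and swaps in the newer np.random.default_rng Generator, seeded only so runs are repeatable.

```python
import numpy as np

values = np.array([-1.7, -0.5, 0.4, 2.6])

# astype(int) truncates toward zero; it does not round
truncated = values.astype(int)          # [-1  0  0  2]

# np.rint rounds to the nearest integer (exact halves go to the even integer)
rounded = np.rint(values).astype(int)   # [-2  0  0  3]

# applied to the normal-distribution approach: mean 1, standard deviation 3
rng = np.random.default_rng(seed=0)     # seeded only to make runs repeatable
mat = np.rint(rng.normal(1, 3, size=(3, 4))).astype(int)
print(truncated, rounded, mat, sep="\n")
```

Truncation skews values toward zero (both -0.9 and 0.9 become 0), so rounding is usually the better fit when the goal is "random integers around a mean".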
If you want to specify the min and max values, there are various ways of doing that, such as
mat = (np.random.random((3, 4)) * 10).astype(int)  # random ints from 0 to 9
or
mat = np.random.randint(1, 5, size=(3, 4))  # random ints from 1 to 4 (high is exclusive)
And more.
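The upper bounds are easy to get wrong: randint's high is exclusive, and random() returns values in [0, 1). A small sketch that makes the ranges explicit; the Generator.integers calls are the newer NumPy API (not used in the answer above), where endpoint=True makes high inclusive.

```python
import numpy as np

rng = np.random.default_rng()

# randint: low is inclusive, high is exclusive, so this draws from 1, 2, 3, 4
a = np.random.randint(1, 5, size=(3, 4))

# Generator.integers uses the same half-open interval by default...
b = rng.integers(1, 5, size=(3, 4))
# ...but endpoint=True makes high inclusive, so this draws from 1..5
c = rng.integers(1, 5, size=(3, 4), endpoint=True)

# random() * 10 lies in [0, 10), so truncating to int gives 0..9
d = (np.random.random((3, 4)) * 10).astype(int)
```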
I am using Python and I have an XxY matrix where X = Y, and I want to iterate over the upper triangular part in a specific order: column by column, covering one more row in each successive column, until the last row and column. Therefore, I tried to create a double loop: the outer loop goes over the columns one by one, and the inner loop goes over the rows, always adding one row. However, I got stuck on how to add the next row for every column in the inner loop. Here is what I have so far (for simplicity I just created an array of zeros):
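import pandas as pd
import numpy as np
# number of columns
X = 10
# number of rows
Y = X
U = np.zeros((Y, X))
for j in range(X):          # loop over the columns
    for z in range(j + 1):  # rows 0..j of column j (on or above the diagonal)
        pass                # process U[z, j] here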
My initial idea was to create a Yx1 array with y = np.asarray(list(range(0, Y))) and use it in the second loop, but I don't understand how to implement it. Can somebody please help me? Is there maybe a simpler way to define such an iteration?
With NumPy, you can get the indices of the upper triangular matrix with triu_indices_from and index into the array with them:
import numpy as np
a = np.arange(16).reshape([4, 4])
print(a)
#[[ 0  1  2  3]
# [ 4  5  6  7]
# [ 8  9 10 11]
# [12 13 14 15]]
indices = np.triu_indices_from(a)
upper = a[indices]
print(upper)
# [ 0  1  2  3  5  6  7 10 11 15]
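triu_indices_from returns the elements row by row, while the question asks for a column-by-column order. A sketch assuming the intended order is column by column with the diagonal included: np.tril_indices yields (i, j) pairs with j <= i in row-major order, so reading each pair as (col, row) gives the upper triangle ordered by column. Pass k=-1 to tril_indices (and use range(j) in the loop) to exclude the diagonal.

```python
import numpy as np

a = np.arange(16).reshape(4, 4)
n = a.shape[0]

# np.tril_indices(n) yields (i, j) pairs with j <= i, ordered by i, then j.
# Reading each pair as (col, row) gives the upper triangle (row <= col),
# ordered column by column, with one more row in each successive column.
cols, rows = np.tril_indices(n)
upper_by_column = a[rows, cols]
print(upper_by_column)  # [ 0  1  5  2  6 10  3  7 11 15]

# The same traversal written as the explicit double loop from the question
visited = []
for j in range(n):          # columns
    for i in range(j + 1):  # rows 0..j of column j (diagonal included)
        visited.append(a[i, j])
```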
I have a list like this, x = [1,2,2,3,1,2,1,1,2,2], where each number is a positive integer that either stays the same, increments by 1, or resets to 1. I need to transform it to [1,2,2,3,4,5,6,7,8,8] incrementally: each 1 should become the previous number plus 1, and whatever follows it should increment accordingly. Is there a simple way to do this with a NumPy array? I tried using loops, but I suspect there's a simpler way.
You can use np.add.accumulate():
import numpy as np
x = np.array([1,2,2,3,1,2,1,1,2,2])
x[1:] += np.add.accumulate(x[:-1]*(x[1:]==1))
print(x)  # [1 2 2 3 4 5 6 7 8 8]
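To see why the one-liner works, here is the same computation split into named steps inside a hypothetical helper, renumber. It assumes, as the question states, that values only stay equal, grow by 1, or reset to 1, so the value just before each reset is the amount to carry forward. np.cumsum is used here and gives the same result as np.add.accumulate for this case.

```python
import numpy as np

def renumber(values):
    """Hypothetical helper: the add.accumulate trick with named steps."""
    x = np.asarray(values).copy()
    # True where the next element resets to 1
    resets = x[1:] == 1
    # value reached just before each reset, 0 everywhere else
    carry = x[:-1] * resets
    # running total of all carries so far (same as np.add.accumulate here)
    offset = np.cumsum(carry)
    x[1:] += offset
    return x

print(renumber([1, 2, 2, 3, 1, 2, 1, 1, 2, 2]))  # [1 2 2 3 4 5 6 7 8 8]
```

For the example, carry is [0 0 0 3 0 2 1 0 0] and its running total [0 0 0 3 3 5 6 6 6] is added to every element after the first.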
Suppose I have a 3D numpy array with shape (10, 20, 3), representing an image with 10 rows and 20 columns, where the 3rd dimension contains an array of length 3 of either all zeros, or all ones, for example [0 0 0] or [1 1 1].
What numpy method would be most suitable to convert the 3D array to 2D, where the third dimension has been reduced to a single value of either 0 or 1, depending on whether the array was previously [0 0 0] or [1 1 1]?
The new shape should be (10, 20), where the value of each cell is either 0 or 1. So instead of the third dimension being an array, it becomes an integer.
I have had a look at reshape, and also flatten, however it looks like both of these methods maintain the same total number of 'cells' (i.e. 10 x 20 x 3 = 600). What I want is to reduce one dimension down to a single value so that the total number of cells is now 10 x 20 = 200.
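One possible approach (not taken from the original thread): since all three channels of a pixel are equal, selecting a single channel already drops the third axis, and a reduction such as any(axis=-1) does the same while also covering mixed channels. The test image below is hypothetical data built for the sketch.

```python
import numpy as np

# hypothetical test image: 10 rows x 20 columns, each pixel [0 0 0] or [1 1 1]
rng = np.random.default_rng(0)
pixels = rng.integers(0, 2, size=(10, 20))            # 0 or 1 per pixel
img = np.repeat(pixels[:, :, np.newaxis], 3, axis=2)  # shape (10, 20, 3)

# All three channels are equal, so keeping any one channel is enough
flat = img[:, :, 0]                       # shape (10, 20)

# A reduction over the last axis gives the same result here, and
# yields 1 whenever any channel is 1 if the channels ever differ
flat_any = img.any(axis=-1).astype(int)   # shape (10, 20)
```

Unlike reshape or flatten, both of these reduce the number of cells from 10 x 20 x 3 to 10 x 20.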
Imagine the following dataset:
   X  Y
0  2  4
1  5  6
2  3  4
Now, imagine the following tuple of points: ((2,4), (6,5), (1,14))
How can I find the closest point to each row and assign the index of the point to a new column?
For example, since the closest point to the first row is the point with index 0, the first row would become:
   X  Y  Closest_Point
0  2  4              0
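Try SciPy's cdist: it computes the distance from every row to every point in a single call, and argmin along axis 1 then picks the index of the nearest point for each row.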
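import numpy as np
import pandas as pd
from scipy.spatial import distance
df = pd.DataFrame({'X': [2, 5, 3], 'Y': [4, 6, 4]})
l = ((2, 4), (6, 5), (1, 14))
# pairwise Euclidean distances: one row per df row, one column per point
ary = distance.cdist(df.values, np.array(l), metric='euclidean')
df['Closest_Point'] = ary.argmin(1)  # array([0, 1, 0])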
I would convert both the tuple and the dataset into NumPy arrays.
For the examples you gave:
import numpy as np
dataset = np.array([[2, 4], [5, 6], [3, 4]])
points = np.array([[2, 4], [6, 5], [1, 14]])
dataset_indexed = []
for i in range(dataset.shape[0]):
    # start with the distance to the first point as the current minimum
    temp = ((dataset[i, 0] - points[0, 0])**2 + (dataset[i, 1] - points[0, 1])**2)**(1/2)
    index = 0
    for n in range(points.shape[0]):
        # Euclidean distance from row i to point n
        dist = ((dataset[i, 0] - points[n, 0])**2 + (dataset[i, 1] - points[n, 1])**2)**(1/2)
        if dist <= temp:
            temp = dist
            index = n
    dataset_indexed.append([dataset[i, 0], dataset[i, 1], index])
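The nested loop can also be replaced by NumPy broadcasting, without SciPy. A sketch: subtracting a (rows, 1, 2) array from a (1, points, 2) array gives every row-to-point difference at once. One behavioral difference: argmin keeps the first of several equally close points, while the loop's <= comparison keeps the last.

```python
import numpy as np

dataset = np.array([[2, 4], [5, 6], [3, 4]])
points = np.array([[2, 4], [6, 5], [1, 14]])

# (rows, 1, 2) - (1, points, 2) broadcasts to (rows, points, 2)
diff = dataset[:, np.newaxis, :] - points[np.newaxis, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))   # shape (rows, points)

closest = dist.argmin(axis=1)              # first minimum wins on ties
dataset_indexed = np.column_stack([dataset, closest])
print(dataset_indexed)
```

Memory grows with rows x points, so for very large inputs a spatial index such as a KD-tree may be preferable.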
Question
Is there a good way to transform a DataFrame with an n-level index into an n-D Numpy array (a.k.a n-tensor)?
Example
Suppose I set up a DataFrame like
from pandas import DataFrame, MultiIndex
index = range(2), range(3)
value = range(2 * 3)
frame = DataFrame(value, columns=['value'],
index=MultiIndex.from_product(index)).drop((1, 0))
print(frame)
which outputs
     value
0 0      0
  1      1
  2      2
1 1      4
  2      5
The index is a 2-level hierarchical index. I can extract a 2-D Numpy array from the data using
print(frame.unstack().values)
which outputs
[[ 0.  1.  2.]
 [nan  4.  5.]]
How does this generalize to an n-level index?
Playing with unstack(), it seems that it can only be used to massage the 2-D shape of the DataFrame, but not to add an axis.
I cannot use e.g. frame.values.reshape(x, y, z), since this would require that the frame contains exactly x * y * z rows, which cannot be guaranteed. This is what I tried to demonstrate by drop()ing a row in the above example.
Any suggestions are highly appreciated.
Edit. This approach is much more elegant (and two orders of magnitude faster) than the one I gave below.
import numpy as np
# create an empty array of NaN of the right dimensions
shape = tuple(map(len, frame.index.levels))
arr = np.full(shape, np.nan)
# fill it using NumPy's advanced indexing (the codes must be passed as a tuple)
arr[tuple(frame.index.codes)] = frame.values.flat
# ...or in Pandas < 0.24.0, use
# arr[tuple(frame.index.labels)] = frame.values.flat
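The advanced-indexing approach can be wrapped in a small reusable function. A sketch with a hypothetical helper name, multiindex_to_ndarray, assuming pandas 0.24 or later (for index.codes) and an index without NaN labels (a NaN label has code -1 and would be written to the last position). Note the tuple around the codes: NumPy only treats a tuple of arrays as one index per axis.

```python
import numpy as np
import pandas as pd

def multiindex_to_ndarray(series):
    """Hypothetical helper: scatter a Series with an n-level MultiIndex into
    an n-D array, leaving NaN where an index combination is missing."""
    index = series.index
    shape = tuple(len(level) for level in index.levels)
    arr = np.full(shape, np.nan)
    # codes[k][i] is the position of row i's label within index.levels[k]
    arr[tuple(index.codes)] = series.to_numpy()
    return arr

idx = pd.MultiIndex.from_product([range(2), range(3)])
frame = pd.DataFrame({'value': list(range(6))}, index=idx).drop((1, 0))
print(multiindex_to_ndarray(frame['value']))
```

For the 2-level example from the question this reproduces the unstack() result, with NaN at position [1, 0]; the same call works unchanged for 3 or more levels.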
Original solution. Given a setup similar to above, but in 3-D,
from pandas import DataFrame, MultiIndex
from itertools import product
index = range(2), range(2), range(2)
value = range(2 * 2 * 2)
frame = DataFrame(value, columns=['value'],
index=MultiIndex.from_product(index)).drop((1, 0, 1))
print(frame)
we have
       value
0 0 0      0
    1      1
  1 0      2
    1      3
1 0 0      4
  1 0      6
    1      7
Now, we proceed using the reshape() route, but with some preprocessing to ensure that the length along each dimension will be consistent.
First, reindex the data frame with the full cartesian product of all dimensions. NaN values will be inserted as needed. This operation can be both slow and consume a lot of memory, depending on the number of dimensions and on the size of the data frame.
levels = map(tuple, frame.index.levels)
index = list(product(*levels))
frame = frame.reindex(index)
print(frame)
which outputs
       value
0 0 0      0
    1      1
  1 0      2
    1      3
1 0 0      4
    1    NaN
  1 0      6
    1      7
Now, reshape() will work as intended.
shape = tuple(map(len, frame.index.levels))
print(frame.values.reshape(shape))
which outputs
[[[ 0.  1.]
  [ 2.  3.]]

 [[ 4. nan]
  [ 6.  7.]]]
The (rather ugly) one-liner is
frame.reindex(list(product(*map(tuple, frame.index.levels)))).values \
    .reshape(tuple(map(len, frame.index.levels)))
This can be done quite nicely using the Python xarray package which can be found here: http://xarray.pydata.org/en/stable/. It has great integration with Pandas and is quite intuitive once you get to grips with it.
If you have a MultiIndexed Series, you can call the built-in method multiindex_series.to_xarray() (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_xarray.html). This generates a DataArray object, which is essentially a name-indexed NumPy array, using the index values and names as coordinates; index combinations that are missing become NaN. You can then call .values on the DataArray to get the underlying NumPy array. For a DataFrame, select the column first (e.g. frame['value'].to_xarray()), since DataFrame.to_xarray() returns a Dataset rather than a DataArray.
If you need your tensor to conform to a set of keys in a specific order, you can also call .reindex(index_name = index_values_in_order) (http://xarray.pydata.org/en/stable/generated/xarray.DataArray.reindex.html) on the DataArray. This can be extremely useful and makes working with the newly generated tensor much easier!
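A minimal sketch of the xarray route on the 2-level example from the question, assuming xarray is installed (pandas imports it inside to_xarray). The level names row and col are added here for illustration, because they become the DataArray's dimension names and are what reindex is keyed on.

```python
import pandas as pd

idx = pd.MultiIndex.from_product([range(2), range(3)], names=['row', 'col'])
frame = pd.DataFrame({'value': list(range(6))}, index=idx).drop((1, 0))

# Series.to_xarray() returns a DataArray whose dimensions are the level names;
# combinations missing from the index become NaN
da = frame['value'].to_xarray()
tensor = da.values                            # [[0, 1, 2], [nan, 4, 5]]

# reorder one axis by passing the level name to DataArray.reindex
reordered = da.reindex(col=[2, 1, 0]).values  # [[2, 1, 0], [5, 4, nan]]
```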