import numpy as np
I need to calculate the std on the first three rows of a NumPy array I made with y = np.random.randint(100, size=(5, 3)).
The above produced the array I am working on. Note that I have since calculated the median of the array after removing the 2 smallest values with:
y = np.delete(y, y.argmin())
y = np.delete(y, y.argmin())
np.median(y)
When I call y now it is no longer a 2-D array. It comes out all on one line, like array([48, 90, 67, 26, 53, 16, 19, 64, 51, 47, 54, 91, 36]).
When I try to slice it and calculate a standard deviation (std) I get an IndexError. I think it is because the array has been flattened.
As others pointed out, the original question format was not clear. Here is what I tried:
import numpy as np
y = np.random.randint(100, size = (5, 3))
y
array([[65, 84, 56],
       [90, 44, 42],
       [51, 58,  9],
       [82,  1, 91],
       [96, 32, 24]])
Now to compute std for each row:
y.std(axis=1)
array([11.6714276 , 22.1710522 , 21.63844316, 40.47221269, 32.22145593])
Since you just want the first 3 rows you can slice the result:
result = y.std(axis=1)[:3]
result
array([11.6714276 , 22.1710522 , 21.63844316])
Alternatively, you can first slice the first 3 rows and then call std:
y[:3].std(axis=1)
array([11.6714276 , 22.1710522 , 21.63844316])
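Incidentally, the flattening the question ran into comes from np.delete: when no axis argument is given, it flattens its input before deleting. A minimal sketch (the seeded array here is hypothetical, just to make the shapes reproducible):

```python
import numpy as np

# Hypothetical seeded 5x3 array standing in for the question's data.
rng = np.random.default_rng(0)
y = rng.integers(0, 100, size=(5, 3))

# With no axis, np.delete flattens first, so the 5x3 structure is lost.
flat = np.delete(y, y.argmin())
print(y.shape, flat.shape)  # (5, 3) (14,)

# Keeping the original y around lets you still slice rows afterwards.
print(y[:3].std(axis=1).shape)  # (3,)
```

So computing the median on the flattened copy is fine, but the row slicing has to happen on the original 2-D array.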
Related
Let's say I have a dataframe with 1000 rows. Is there an easy way of slicing the dataframe in such a way that the resulting dataframe consists of alternating blocks of N rows?
For example, I want rows 1-100, 200-300, 400-500, and so on, skipping 100 rows in between, and to create a new dataframe out of this.
I can do this by storing each individual slice in a new dataframe first and then appending them at the end, but I was wondering if there is a much simpler way to do this.
You can use:
import numpy as np
import pandas as pd
out = df[np.arange(len(df)) % 200 < 100]
For the demo, here is an example with rows 1-10, 21-30, etc.:
df = pd.DataFrame(index=range(100))
out = df[np.arange(len(df)) % 20 < 10]
out.index
output:
Int64Index([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9,   # rows 1-10
            20, 21, 22, 23, 24, 25, 26, 27, 28, 29,   # rows 21-30
            40, 41, 42, 43, 44, 45, 46, 47, 48, 49,   # rows 41-50
            60, 61, 62, 63, 64, 65, 66, 67, 68, 69,   # rows 61-70
            80, 81, 82, 83, 84, 85, 86, 87, 88, 89],  # rows 81-90
           dtype='int64')
You can use a list comprehension and simple modular arithmetic to select specific rows.
In case you are not familiar with it, % is the modulo operator in Python, which returns the remainder of a division between two numbers.
Floor division (//), in turn, divides and discards the fractional part of the quotient.
Let df be your dataframe and N be your interval (in your example N=100):
N = 100
df.loc[[i for i in range(df.shape[0]) if (i // N) % 2 == 0]]
This will return rows with indexes 0-99, 200-299, 400-499, ...
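Both answers select the same positions; the equivalence of the two conditions can be sanity-checked on the bare indices alone, without pandas (a quick sketch):

```python
import numpy as np

N = 100
idx = np.arange(1000)

# Mask form: the position within each 2N-block is below N.
mask = idx % (2 * N) < N

# Floor-division form: the block number i // N is even.
comp = np.array([i for i in range(1000) if (i // N) % 2 == 0])

print(np.array_equal(idx[mask], comp))  # True
```

The mask form is vectorized, so for large frames it avoids the Python-level loop of the list comprehension.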
I'm writing a script to reduce the number of colors in a list by finding clusters. The problem I seem to run into is that the clusters will have different dimensions. Here is my jumping-off point after the original list of 6 colors has already been separated into 3 clusters:
import numpy

a = numpy.array([
    [12, 44, 52],
    [27,  0, 71],
    [81, 99, 92],
])
b = numpy.array([
    [ 12,  13,  93],
    [128, 128, 128],
])
c = numpy.array([
    [57, 14, 255],
])

clusters = numpy.array([a, b, c])
print(numpy.min(clusters, axis=1))
However, now the function numpy.min() throws an error - I suspect it's because of the differently sized arrays.
The cluster arrays will always have the shape (x, 3) (x colors, 3 components each). I want an array with the per-component minima of the colors in each cluster, of shape (n, 3) (n being the number of clusters) - so array([[12, 0, 52], [12, 13, 93], [57, 14, 255]]) in this case.
Is there a way to do this? As I mentioned, it works as long as all clusters have the same shape.
Since your arrays a, b and c don't have an equal shape, you can't put them in the same array (at least if you don't pad with some value). You could calculate the minimum first and then generate an array from these minima:
numpy.array([arr.min(axis=0) for arr in (a, b, c)])
Which gives you:
array([[ 12,   0,  52],
       [ 12,  13,  93],
       [ 57,  14, 255]])
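As a self-contained check of this approach, using the same clusters as the question:

```python
import numpy as np

# The three clusters from the question, with different numbers of colors each.
a = np.array([[12, 44, 52], [27, 0, 71], [81, 99, 92]])
b = np.array([[12, 13, 93], [128, 128, 128]])
c = np.array([[57, 14, 255]])

# Per-component minimum within each cluster, stacked into an (n, 3) array.
mins = np.array([arr.min(axis=0) for arr in (a, b, c)])
print(mins.tolist())  # [[12, 0, 52], [12, 13, 93], [57, 14, 255]]
```

Because each `arr.min(axis=0)` result has the same shape (3,), stacking them back into a regular array is safe even though the clusters themselves are ragged.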
Suppose I have a NumPy array a which is n x 1. In addition, I have a function F(x,y) which takes in two values and returns a single value. I want to construct an n x n matrix b where b_ij = F(a_i, a_j) (with a_i, a_j values in the array a). Is there any way to do this without looping over both arrays?
Assume that your function is:
def F(a_i, a_j):
    return (a_i + a_j) if a_i % 2 == 0 else (a_i + a_j + 1)
To call it on 2 arrays in one go, define a vectorized version of this function:
FF = np.vectorize(F)
Then call it:
result = FF(a, a.T)
As the source array I used:
a = np.array([[1], [5], [10], [50], [80]])
so its shape is (5, 1) (a single-column array) and got:
array([[  3,   7,  12,  52,  82],
       [  7,  11,  16,  56,  86],
       [ 11,  15,  20,  60,  90],
       [ 51,  55,  60, 100, 130],
       [ 81,  85,  90, 130, 160]])
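One caveat: np.vectorize is a convenience, not a performance tool - it still loops in Python under the hood. For a function like the example F above, plain broadcasting produces the same n x n matrix without vectorize (a sketch using the same sample data):

```python
import numpy as np

a = np.array([[1], [5], [10], [50], [80]])  # shape (5, 1)

# Broadcasting the (5, 1) column against its (1, 5) transpose yields (5, 5).
s = a + a.T
# Apply the condition on a_i (the row value), matching the example F.
result = np.where(a % 2 == 0, s, s + 1)
print(result[0])  # [ 3  7 12 52 82]
```

This matches the vectorized output element for element, and scales much better for large n.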
I was wondering what the use of the comma was when slicing Python arrays - I have an example that appears to work, but the line that looks weird to me is
p = 20*numpy.log10(numpy.abs(numpy.fft.rfft(data[:2048, 0])))
Now, I know that when slicing an array, the first number is the start, the next is the end, and the last is the step, but what does the comma after the end number designate? Thanks.
It is being used to extract a specific column from a 2D array.
So your example would extract column 0 (the first column) from the first 2048 rows (rows 0 to 2047). Note, however, that this syntax only works for NumPy arrays, not for plain Python lists.
Empirically - create an array using numpy:
m = np.fromfunction(lambda i, j: (i + 1) * 10 + j + 1, (9, 4), dtype=int)
which assigns an array like below to m
array([[11, 12, 13, 14],
       [21, 22, 23, 24],
       [31, 32, 33, 34],
       [41, 42, 43, 44],
       [51, 52, 53, 54],
       [61, 62, 63, 64],
       [71, 72, 73, 74],
       [81, 82, 83, 84],
       [91, 92, 93, 94]])
Now for the slice
m[:,0]
giving us
array([11, 21, 31, 41, 51, 61, 71, 81, 91])
I may have misinterpreted Khan Academy (so take this with a grain of salt): in linear algebra terms, m[:,n] is taking the nth column vector of the matrix m.
See Abhranil's note on how this specific interpretation only applies to numpy.
It slices with a tuple. What exactly the tuple means depends on the object being sliced. For NumPy arrays, it performs an m-dimensional slice on an n-dimensional array.
>>> class C(object):
...     def __getitem__(self, val):
...         print(val)
...
>>> c = C()
>>> c[1:2, 3:4]
(slice(1, 2, None), slice(3, 4, None))
>>> c[5:6, 7]
(slice(5, 6, None), 7)
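In other words, for NumPy arrays the comma builds a tuple of per-axis indices, so the slice from the question can equivalently be written with an explicit tuple (a small sketch on hypothetical stand-in data):

```python
import numpy as np

data = np.arange(12).reshape(4, 3)  # hypothetical stand-in for the question's data

# data[:2, 0] and data[(slice(None, 2), 0)] are the same indexing expression:
# the comma syntax is just sugar for passing a tuple to __getitem__.
a = data[:2, 0]
b = data[(slice(None, 2), 0)]
print(np.array_equal(a, b))  # True
```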
Let's suppose I have a set of 2D coordinates that represent the centers of cells of a 2D regular mesh. I would like to find, for each cell in the grid, the two closest neighbors in each direction.
The problem is quite straightforward if one assigns to each cell an index defined as follows:
idx_cell = idx + N*idy
where N is the number of cells along the x-axis, idx = x/dx and idy = y/dx, with x and y being the x-coordinate and the y-coordinate of a cell and dx its size.
For example, the neighboring cells for a cell with idx_cell=5 are the cells with idx_cell equal to 4,6 (for the x-axis) and 5+N,5-N (for the y-axis).
The problem that I have is that my implementation of the algorithm is quite slow for large (N>1e6) data sets.
For instance, to get the neighbors of the x-axis I do
[x[(idx_cell == idx_cell[i] - 1) | (idx_cell == idx_cell[i] + 1)] for i in cells]
Do you think there's a faster way to implement this algorithm?
You are basically reinventing the indexing scheme of a multidimensional array. It is relatively easy to code, but you can use the two functions unravel_index and ravel_multi_index to your advantage here.
If your grid has M rows and N columns, to get the idx and idy of a single item you could do:
>>> M, N = 12, 10
>>> np.unravel_index(4, shape=(M, N))
(0, 4)
This also works if, instead of a single index, you provide an array of indices:
>>> np.unravel_index([15, 28, 32, 97], shape=(M, N))
(array([1, 2, 3, 9], dtype=int64), array([5, 8, 2, 7], dtype=int64))
So if cells has the indices of several cells you want to find neighbors to:
>>> cells = np.array([15, 28, 32, 44, 87])
You can get their neighbors as:
>>> idy, idx = np.unravel_index(cells, shape=(M, N))
>>> neigh_idx = np.vstack((idx-1, idx+1, idx, idx))
>>> neigh_idy = np.vstack((idy, idy, idy-1, idy+1))
>>> np.ravel_multi_index((neigh_idy, neigh_idx), dims=(M, N))
array([[14, 27, 31, 43, 86],
       [16, 29, 33, 45, 88],
       [ 5, 18, 22, 34, 77],
       [25, 38, 42, 54, 97]], dtype=int64)
Or, if you prefer it like that:
>>> np.ravel_multi_index((neigh_idy, neigh_idx), dims=(M, N)).T
array([[14, 16,  5, 25],
       [27, 29, 18, 38],
       [31, 33, 22, 42],
       [43, 45, 34, 54],
       [86, 88, 77, 97]], dtype=int64)
The nicest thing about going this way is that ravel_multi_index has a mode keyword argument you can use to handle items on the edges of your lattice, see the docs.
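For instance (a small sketch): a cell in the first column has no left neighbor, and mode='clip' keeps the out-of-range index inside the grid instead of raising, while mode='wrap' treats the lattice as periodic:

```python
import numpy as np

M, N = 12, 10
idy, idx = np.unravel_index([20], (M, N))  # cell 20 sits in column 0

# idx - 1 would fall off the left edge; 'clip' clamps it back to column 0,
# while 'wrap' wraps it around to the last column.
clipped = np.ravel_multi_index((idy, idx - 1), (M, N), mode='clip')
wrapped = np.ravel_multi_index((idy, idx - 1), (M, N), mode='wrap')
print(clipped, wrapped)  # [20] [29]
```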