What is the meaning of X[i,] in Python? [duplicate]

I was wondering what the use of the comma was when slicing Python arrays - I have an example that appears to work, but the line that looks weird to me is
p = 20*numpy.log10(numpy.abs(numpy.fft.rfft(data[:2048, 0])))
Now, I know that when slicing an array, the first number is start, the next is end, and the last is step, but what does the comma after the end number designate? Thanks.

It is being used to extract a specific column from a 2D array.
So your example would extract column 0 (the first column) from the first 2048 rows (0 to 2047). Note, however, that this syntax only works for NumPy arrays, not general Python lists.

Empirically, create an array using numpy:
import numpy as np
m = np.fromfunction(lambda i, j: (i + 1) * 10 + j + 1, (9, 4), dtype=int)
which assigns an array like below to m
array([[11, 12, 13, 14],
       [21, 22, 23, 24],
       [31, 32, 33, 34],
       [41, 42, 43, 44],
       [51, 52, 53, 54],
       [61, 62, 63, 64],
       [71, 72, 73, 74],
       [81, 82, 83, 84],
       [91, 92, 93, 94]])
Now for the slice
m[:,0]
giving us
array([11, 21, 31, 41, 51, 61, 71, 81, 91])
I may have misinterpreted Khan Academy (so take this with a grain of salt):
In linear algebra terms, m[:,n] is taking the nth column vector of the matrix m.
See Abhranil's note on how this interpretation only applies to NumPy arrays; the sketch below shows the difference.
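To make that concrete, here is a minimal sketch (the variable names are my own) showing that the tuple index works on a NumPy array but raises a TypeError on a plain list of lists:
import numpy as np

rows = [[11, 12], [21, 22]]
arr = np.array(rows)

print(arr[:, 0])  # array([11, 21]) -- column 0 via the comma syntax

try:
    rows[:, 0]    # plain lists accept only a single index or slice
except TypeError as e:
    print(e)      # list indices must be integers or slices, not tuple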

It slices with a tuple. What exactly the tuple means depends on the object being sliced. In NumPy arrays, it performs an m-dimensional slice on an n-dimensional array.
>>> class C(object):
...     def __getitem__(self, val):
...         print(val)
...
>>> c = C()
>>> c[1:2, 3:4]
(slice(1, 2, None), slice(3, 4, None))
>>> c[5:6, 7]
(slice(5, 6, None), 7)

Related

How to return every N alternate rows from a pandas dataframe?

Let's say I have a dataframe with 1000 rows. Is there an easy way of slicing the dataframe in such a way that the resulting dataframe consists of alternating blocks of N rows?
For example, I want rows 1-100, 200-300, 400-500, and so on, skipping 100 rows in between, and to create a new dataframe out of this.
I can do this by storing each individual slice in a new dataframe first and then appending them at the end, but I was wondering if there is a much simpler way to do this.
You can use:
import numpy as np
out = df[np.arange(len(df)) % 200 < 100]
for the demo, here is an example with rows 1-10, 21-30, etc.:
import pandas as pd
df = pd.DataFrame(index=range(100))
out = df[np.arange(len(df)) % 20 < 10]
out.index
output:
Int64Index([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9,   # rows 1-10
            20, 21, 22, 23, 24, 25, 26, 27, 28, 29,   # rows 21-30
            40, 41, 42, 43, 44, 45, 46, 47, 48, 49,   # rows 41-50
            60, 61, 62, 63, 64, 65, 66, 67, 68, 69,   # rows 61-70
            80, 81, 82, 83, 84, 85, 86, 87, 88, 89],  # rows 81-90
           dtype='int64')
You can use a list comprehension and a simple math operation to select specific rows.
If you don't know it, % is the modulo operator in Python, which returns the remainder of a division between two numbers.
The int function, in turn, drops the decimal digits from a number, so int(i/N) is the block number of row i (equivalently, i // N).
Let df be your dataframe and N be your interval (in your example N=100):
N = 100
df.loc[[i for i in range(df.shape[0]) if int(i/N) % 2 == 0]]
This will return rows with indexes 0-99, 200-299, 400-499, ...
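As a side note, the same selection can be written without a Python-level loop by vectorizing the test; a sketch assuming df and N are defined as above:
import numpy as np

# boolean mask: keep rows whose block number (i // N) is even
out = df.loc[(np.arange(len(df)) // N) % 2 == 0]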

How to speed up Numpy array slicing within a for loop? [duplicate]

This question already has answers here:
Rolling window for 1D arrays in Numpy?
(7 answers)
Closed 1 year ago.
I have an original array, e.g.:
import numpy as np
original = np.array([56, 30, 48, 47, 39, 38, 44, 18, 64, 56, 34, 53, 74, 17, 72, 13, 30, 17, 53])
The desired output is an array made up of a fixed-size window sliding through multiple iterations, something like
[56, 30, 48, 47, 39, 38],
[30, 48, 47, 39, 38, 44],
[48, 47, 39, 38, 44, 18],
[47, 39, 38, 44, 18, 64],
[39, 38, 44, 18, 64, 56],
[38, 44, 18, 64, 56, 34],
[44, 18, 64, 56, 34, 53],
[18, 64, 56, 34, 53, 74],
[64, 56, 34, 53, 74, 17],
[56, 34, 53, 74, 17, 72]
At the moment I'm using
def myfunc():
    return np.array([original[i: i+k] for i in range(i_range)])
with parameters i_range = 10 and k = 6 (taken from the enclosing scope). Using Python's timeit module (10000 iterations), I'm getting close to 0.1 seconds. Can this be improved 100x by any chance?
I've also tried Numba but the result wasn't ideal, as it shines better with larger arrays.
NOTE: the arrays used in this post are reduced for demo purpose, actual size of original is at around 500.
As RandomGuy suggested, you can use stride_tricks:
np.lib.stride_tricks.as_strided(original, (i_range, k), (8, 8))
For larger arrays (and larger i_range and k) this is probably the most efficient approach, as it does not allocate any additional memory. There is a drawback: editing the created array also modifies the original array, unless you make a copy.
The (8, 8) parameter defines how many bytes to advance in memory along each dimension; I use 8 because that is the stride size of the original int64 array.
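Hard-coding (8, 8) only works when the dtype is 8 bytes wide (e.g. int64). A slightly more robust sketch reads the stride from the array itself:
import numpy as np

s = original.strides[0]  # bytes between consecutive elements, for any dtype
windows = np.lib.stride_tricks.as_strided(original, shape=(i_range, k), strides=(s, s))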
Another option, which works better for smaller arrays:
def myfunc2():
    i_s = np.arange(i_range).reshape(-1, 1) + np.arange(k)
    return original[i_s]
This is faster than your original version. Neither, however, is 100x faster.
Use np.lib.stride_tricks.sliding_window_view
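A minimal sketch of that approach (sliding_window_view was added in NumPy 1.20 and returns a read-only view, so nothing is copied until you ask for it):
import numpy as np

original = np.array([56, 30, 48, 47, 39, 38, 44, 18, 64, 56,
                     34, 53, 74, 17, 72, 13, 30, 17, 53])
k, i_range = 6, 10

# every length-k window as a (len(original)-k+1, k) view; keep the first i_range
windows = np.lib.stride_tricks.sliding_window_view(original, k)[:i_range]
print(windows.shape)  # (10, 6)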

Cannot select rows out of numpy array to perform an std

import numpy as Np
I need to calculate the std on the first three rows of a NumPy array I made with y = Np.random(100, size=(5, 3)).
The above produced the array I am working on. Note that I have since calculated the median of the array after having removed the 2 smallest values in the array with:
y = Np.delete(y, y.argmin())
y = Np.delete(y, y.argmin())
Np.median(y)
When I call y now it is no longer a square matrix. It comes all on one line, like array([48, 90, 67, 26, 53, 16, 19, 64, 51, 47, 54, 91, 36]).
When I try to slice it and calculate a standard deviation (std) I get an IndexError. I think it is because this array is now a tuple.
As other people suggested, the question as posted is not clear: y = Np.random(100, size=(5, 3)) is not valid NumPy (np.random.randint was presumably meant), and the array became one-dimensional because np.delete flattens its input when no axis is given, not because it is a tuple. Here is what I tried:
import numpy as np
y = np.random.randint(100, size = (5, 3))
y
array([[65, 84, 56],
       [90, 44, 42],
       [51, 58,  9],
       [82,  1, 91],
       [96, 32, 24]])
Now to compute std for each row:
y.std(axis=1)
array([11.6714276 , 22.1710522 , 21.63844316, 40.47221269, 32.22145593])
Since you just want the first 3 rows you can slice the result:
result = y.std(axis=1)[:3]
result
array([11.6714276 , 22.1710522 , 21.63844316])
Alternatively you can first select/slice the 1st 3 rows and then use std:
y[:3].std(axis=1)
array([11.6714276 , 22.1710522 , 21.63844316])
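As an aside, the flattening the asker ran into is avoidable: np.delete returns a flattened copy unless you pass an axis. A minimal sketch (the row index 0 is arbitrary, for illustration):
import numpy as np

y = np.random.randint(100, size=(5, 3))
y2 = np.delete(y, 0, axis=0)  # remove row 0 but keep the 2-D shape
print(y2.shape)               # (4, 3)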

New array of smaller size excluding one value from each column

In Python 2.7, using numpy or by any other means, if I had an array of any size and wanted to exclude certain values and output the new array, how would I do that? Here is what I would like: starting from
[(1, 2, 3),
 (4, 5, 6),
 (7, 8, 9)]
exclude [4, 2, 9] (one value per column) to make the array
[(1, 5, 3),
 (7, 8, 6)]
I would always be excluding data the same length as the row length, and always only one entry per column. [(1, 5, 3)] would be another example of data I would want to exclude. So every time I loop the function it reduces the array row count by one. I imagine I have to use a masked array, or convert my mask to a masked array and subtract the two, then maybe condense the output, but I have no idea how. Thanks for your time.
You can do it very efficiently if you transform your 2-D array into an unraveled 1-D array. Then you repeat the array of elements to be excluded, called e, in order to do an element-wise comparison:
import numpy as np
a = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
e = [1, 5, 3]
ar = a.T.ravel()
er = np.repeat(e, a.shape[0])
ans = ar[er != ar].reshape(a.shape[1], a.shape[0]-1).T
Note, however, that this only works if each element of e occurs exactly once in the corresponding column of a.
EDIT:
As suggested by @Jaime, you can avoid the ravel() and get the same result directly:
ans = a.T[(a != e).T].reshape(a.shape[1], a.shape[0]-1).T
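A quick check of that one-liner against the a and e defined above:
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
e = [1, 5, 3]

ans = a.T[(a != e).T].reshape(a.shape[1], a.shape[0] - 1).T
print(ans)
# [[4 2 6]
#  [7 8 9]]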
To exclude vector e from matrix a:
import numpy as np

a = np.array([(1, 2, 3), (4, 5, 6), (7, 8, 9)])
e = [4, 2, 9]
print(np.array([[i for i in a.transpose()[j] if i != e[j]]
                for j in range(len(e))]).transpose())
This would take some work to generalize, but here's something that can handle 2-d cases of the kind you describe. If passed unexpected input, this won't notice and will generate strange results, but it's at least a starting point:
import numpy

def columnwise_compress(a, values):
    a_shape = a.shape
    a_trans_flat = a.transpose().reshape(-1)
    compressed = a_trans_flat[~numpy.in1d(a_trans_flat, values)]
    return compressed.reshape(a_shape[:-1] + (a_shape[0] - 1,)).transpose()
Tested:
>>> columnwise_compress(numpy.arange(9).reshape(3, 3) + 1, [4, 2, 9])
array([[1, 5, 3],
[7, 8, 6]])
>>> columnwise_compress(numpy.arange(9).reshape(3, 3) + 1, [1, 5, 3])
array([[4, 2, 6],
[7, 8, 9]])
The difficulty is that you're asking for "compression" of a kind that numpy.compress doesn't do (removing different values for each column or row) and you're asking for compression along columns instead of rows. Compressing along rows is easier because it moves along the natural order of the values in memory; you might consider working with transposed arrays for that reason. If you want to do that, things become a bit simpler:
>>> a = numpy.array([[1, 4, 7],
... [2, 5, 8],
... [3, 6, 9]])
>>> a[~numpy.in1d(a, [4, 2, 9]).reshape(3, 3)].reshape(3, 2)
array([[1, 7],
[5, 8],
[3, 6]])
You'll still need to handle shape parameters intelligently if you do it this way, but it will still be simpler. Also, this assumes there are no duplicates in the original array; if there are, this could generate wrong results. Saullo's excellent answer partially avoids the problem, but any value-based approach isn't guaranteed to work unless you're certain that there aren't duplicate values in the columns.
In the spirit of @SaulloCastro's answer, but handling multiple occurrences of items, you can remove the first occurrence in each column as follows:
import numpy as np

def delete_skew_row(a, b):
    rows, cols = a.shape
    # index of the first row where each column of a matches b
    row_to_remove = np.argmax(a == b, axis=0)
    items_to_remove = np.ravel_multi_index((row_to_remove,
                                            np.arange(cols)),
                                           a.shape, order='F')
    ret = np.delete(a.T, items_to_remove)
    return np.ascontiguousarray(ret.reshape(cols, rows-1).T)
rows, cols = 5, 10
a = np.random.randint(100, size=(rows, cols))
b = np.random.randint(rows, size=(cols,))
b = a[b, np.arange(cols)]
>>> a
array([[50, 46, 85, 82, 27, 41, 45, 27, 17, 26],
[92, 35, 14, 34, 48, 27, 63, 58, 14, 18],
[90, 91, 39, 19, 90, 29, 67, 52, 68, 69],
[10, 99, 33, 58, 46, 71, 43, 23, 58, 49],
[92, 81, 64, 77, 61, 99, 40, 49, 49, 87]])
>>> b
array([92, 81, 14, 82, 46, 29, 67, 58, 14, 69])
>>> delete_skew_row(a, b)
array([[50, 46, 85, 34, 27, 41, 45, 27, 17, 26],
[90, 35, 39, 19, 48, 27, 63, 52, 68, 18],
[10, 91, 33, 58, 90, 71, 43, 23, 58, 49],
[92, 99, 64, 77, 61, 99, 40, 49, 49, 87]])

Python, neighbors on a regular grid

Let's suppose I have a set of 2D coordinates that represent the centers of cells of a 2D regular mesh. I would like to find, for each cell in the grid, the two closest neighbors in each direction.
The problem is quite straightforward if one assigns to each cell an index defined as follows:
idx_cell = idx + N*idy
where N is the number of cells along a row of the grid, idx = x/dx and idy = y/dx, with x and y being the x- and y-coordinates of a cell and dx its size.
For example, the neighboring cells for a cell with idx_cell=5 are the cells with idx_cell equal to 4,6 (for the x-axis) and 5+N,5-N (for the y-axis).
The problem that I have is that my implementation of the algorithm is quite slow for large (N>1e6) data sets.
For instance, to get the neighbors of the x-axis I do
[x[(idx_cell==idx_cell[i]-1)|(idx_cell==idx_cell[i]+1)] for i in cells]
Do you think there's a faster way to implement this algorithm?
You are basically reinventing the indexing scheme of a multidimensional array. It is relatively easy to code, but you can use the two functions unravel_index and ravel_multi_index to your advantage here.
If your grid is of M rows and N columns, to get the idx and idy of a single item you could do:
>>> M, N = 12, 10
>>> np.unravel_index(4, (M, N))
(0, 4)
This also works if, instead of a single index, you provide an array of indices:
>>> np.unravel_index([15, 28, 32, 97], (M, N))
(array([1, 2, 3, 9], dtype=int64), array([5, 8, 2, 7], dtype=int64))
So if cells has the indices of several cells you want to find neighbors to:
>>> cells = np.array([15, 28, 32, 44, 87])
You can get their neighbors as:
>>> idy, idx = np.unravel_index(cells, (M, N))
>>> neigh_idx = np.vstack((idx-1, idx+1, idx, idx))
>>> neigh_idy = np.vstack((idy, idy, idy-1, idy+1))
>>> np.ravel_multi_index((neigh_idy, neigh_idx), dims=(M,N))
array([[14, 27, 31, 43, 86],
[16, 29, 33, 45, 88],
[ 5, 18, 22, 34, 77],
[25, 38, 42, 54, 97]], dtype=int64)
Or, if you prefer it like that:
>>> np.ravel_multi_index((neigh_idy, neigh_idx), dims=(M,N)).T
array([[14, 16, 5, 25],
[27, 29, 18, 38],
[31, 33, 22, 42],
[43, 45, 34, 54],
[86, 88, 77, 97]], dtype=int64)
The nicest thing about going this way is that ravel_multi_index has a mode keyword argument you can use to handle items on the edges of your lattice, see the docs.
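For example, a cell on the top row has no neighbor at idy-1; mode='clip' keeps such indices in range instead of raising (a sketch reusing M and N from above; after clipping, the bogus "neighbor" is the cell itself, which you would filter out):
>>> idy, idx = np.unravel_index(4, (M, N))  # top-row cell: idy=0, idx=4
>>> np.ravel_multi_index(([idy, idy, idy-1, idy+1], [idx-1, idx+1, idx, idx]),
...                      dims=(M, N), mode='clip')
array([ 3,  5,  4, 14], dtype=int64)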
