I have a really big NumPy array (145000 rows * 550 cols), and I want to create rolling slices within subarrays. I tried to implement this with a function. The function lagged_vals behaves as expected, but np.lib.stride_tricks.as_strided does not behave the way I want it to -
def lagged_vals(series, l):
    # Garbage implementation but still right
    return np.concatenate(
        [[x[i:i+l] for i in range(x.shape[0]) if i+l <= x.shape[0]] for x in series],
        axis=0)
# Sample 2D numpy array
something = np.array([[1,2,2,3],[2,2,3,3]])
lagged_vals(something,2) # Works as expected
# array([[1, 2],
# [2, 2],
# [2, 3],
# [2, 2],
# [2, 3],
# [3, 3]])
np.lib.stride_tricks.as_strided(something,
(something.shape[0]*something.shape[1],2),
(8,8))
# array([[1, 2],
# [2, 2],
# [2, 3],
# [3, 2], <--- across subarray stride, which I do not want
# [2, 2],
# [2, 3],
# [3, 3]])
How do I remove that particular row in the np.lib.stride_tricks implementation? And how can I scale this cross-array stride removal to a big NumPy array?
Sure, that's possible with np.lib.stride_tricks.as_strided. Here's one way -
from numpy.lib.stride_tricks import as_strided
L = 2 # window length
shp = a.shape
strd = a.strides
out_shp = shp[0],shp[1]-L+1,L
out_strd = strd + (strd[1],)
out = as_strided(a, out_shp, out_strd).reshape(-1,L)
Sample input, output -
In [177]: a
Out[177]:
array([[0, 1, 2, 3],
[4, 5, 6, 7]])
In [178]: out
Out[178]:
array([[0, 1],
[1, 2],
[2, 3],
[4, 5],
[5, 6],
[6, 7]])
Note that the final reshape forces a copy. That can't be avoided if we need the final output to be 2D. If we are okay with a 3D output, skip the reshape and thus get a view, as shown with the sample case -
In [181]: np.shares_memory(a, out)
Out[181]: False
In [182]: as_strided(a, out_shp, out_strd)
Out[182]:
array([[[0, 1],
[1, 2],
[2, 3]],
[[4, 5],
[5, 6],
[6, 7]]])
In [183]: np.shares_memory(a, as_strided(a, out_shp, out_strd) )
Out[183]: True
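As a possible alternative to setting shapes and strides by hand, newer NumPy versions provide sliding_window_view, which builds the same per-row windows and cannot stride across subarrays. The sketch below assumes NumPy 1.20 or later; the helper name rolling_rows is made up for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def rolling_rows(a, L):
    """Return all length-L windows taken within each row of a 2D array.

    rolling_rows is a hypothetical helper name, not part of NumPy.
    """
    # Windows slide along axis 1 only, so they never cross a row boundary.
    # The result is a read-only view of shape (rows, cols - L + 1, L).
    windows = sliding_window_view(a, window_shape=L, axis=1)
    # Flattening to 2D makes a copy, just like the reshape after as_strided.
    return windows.reshape(-1, L)


something = np.array([[1, 2, 2, 3], [2, 2, 3, 3]])
out = rolling_rows(something, 2)
```

For the 145000 x 550 case, keeping the 3D view (skipping the reshape) avoids materialising the windows in memory.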
Let the 2-dimensional array be as below:
In [1]: a = [[1, 2], [3, 4], [5, 6], [1, 2], [7, 8]]
a = np.array(a)
a, type(a)
Out [1]: (array([[1, 2],
[3, 4],
[5, 6],
[1, 2],
[7, 8]]),
numpy.ndarray)
I have tried to do this procedure:
In [2]: a = a[a != [1, 2]]
a = np.reshape(a, (int(a.size/2), 2)) # I have to do this since the first line of In [2] flattens the result to 1D, [3, 4, 5, 6, 7, 8] (the initial array is 2-dimensional)
a
Out[2]: array([[3, 4],
[5, 6],
[7, 8]])
My question is, is there any function in NumPy that can directly do that?
Updated Question
Here's the semi-full source code that I've been working on:
from sklearn import datasets
data = datasets.load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['Target'] = pd.DataFrame(data.target)
bucket = df[df['Target'] == 0]
bucket = bucket.iloc[:,[0,1]].values
lp, rp = leftestRightest(bucket)
bucket = np.array([x for x in bucket if list(x) != lp])
bucket = np.array([x for x in bucket if list(x) != rp])
Notes:
leftestRightest(arg) is a function that takes a 2-dimensional NumPy array and returns two one-dimensional NumPy arrays of size 2 (lp and rp). For instance, lp = [1, 3], rp = [2, 4].
There may be a more elegant approach, but here is what I have come up with:
np.array([x for x in a if list(x) != [1,2]])
Output
[[3, 4], [5, 6], [7, 8]]
Note that I wouldn't recommend using list comprehensions on a large array, since it would be very slow.
Your approach is correct, but the mask needs to be one-dimensional, with one boolean per row. Keep a row if any of its elements differs from the target:
a[(a != [1, 2]).any(-1)]
Output:
array([[3, 4],
[5, 6],
[7, 8]])
Alternatively, you can collect the elements and infer the first dimension with -1 (this only works when no row matches the target in just some of its columns):
a[a != [1, 2]].reshape(-1, 2)
The boolean condition creates a 2D array of True/False values. You have to reduce it across the columns so that a partial match does not count as a match. Consider a row [5, 2] in your array: the script you wrote would keep 5 and drop 2 in the resulting 1D array. A row should be removed only when all of its elements equal the target, which can be done as follows:
a[~np.all(a == [1, 2], axis=1)]
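One caveat when filtering rows by comparison is the partial match: a row such as [5, 2] shares one value with [1, 2] and should normally be kept. A minimal sketch of a row filter that handles this case, with a hypothetical helper name drop_rows_equal_to and an extra [5, 2] row added to the sample data for illustration:

```python
import numpy as np


def drop_rows_equal_to(a, row):
    """Remove every row of the 2D array `a` that equals `row` in all columns.

    drop_rows_equal_to is a hypothetical helper name used for illustration.
    """
    # One boolean per row: True only when every column matches the target.
    full_match = (a == np.asarray(row)).all(axis=1)
    return a[~full_match]


# Sample data with a partially matching row [5, 2] (not in the original question).
a = np.array([[1, 2], [3, 4], [5, 2], [1, 2], [7, 8]])
result = drop_rows_equal_to(a, [1, 2])
```

Here the mask has shape (5,), so the indexing keeps the array 2D and no reshape is needed afterwards.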
I have two feature arrays, e.g.
a = [1, 2, 3]
b = [4, 5, 6]
Now I want to combine these arrays in the following way:
[[1, 4], [2, 5], [3, 6]]
The location in the array corresponds to a timestep. I tried appending and then reshaping, but then I get:
[[1, 2], [3, 4], [5, 6]]
You can use np.dstack to stack your lists depth-wise:
>>> np.dstack([a, b])
array([[[1, 4],
[2, 5],
[3, 6]]])
As noted by @BramVanroy, this does add an unwanted dimension. Two ways around that are to squeeze the result, or to use column_stack instead:
np.dstack([a, b]).squeeze()
# or
np.column_stack([a, b])
Both of which return:
array([[1, 4],
[2, 5],
[3, 6]])
As an alternative to sacuL's reply, you can also simply do
>>> np.array(list(zip(a, b)))
array([[1, 4],
[2, 5],
[3, 6]])
In fact, this is closer to the expected result in terms of the number of dimensions (two, rather than three in sacuL's answer which you still need to .squeeze() to achieve the correct result).
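Another option that gives the two-dimensional result directly is np.stack with axis=1, which places each input along a new second axis. This is a small sketch using the arrays from the question, not a claim about which approach is fastest:

```python
import numpy as np

a = [1, 2, 3]
b = [4, 5, 6]

# Stack the two sequences along a new axis 1: row i becomes [a[i], b[i]].
combined = np.stack([a, b], axis=1)
```

Unlike the zip version, this stays inside NumPy and does not build an intermediate Python list of tuples.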
I need to extract the same element from every 2D array in a 3D NumPy array (actually a masked array, but it works the same). I usually do it by iterating, but the current data is huge and the process has to be repeated on thousands of datasets, so it would take weeks (rough estimate). What is the quickest way to slice a 3D array without looping through all the 2D arrays?
In this simple example I need to slice [1, 0] element in each 2D array which is 3 in all 2D arrays and store them in result array.
NetCDF example (slicing element [500, 400])
import netCDF4
url = "http://eip.ceh.ac.uk/thredds/dodsC/public-chess/PET/aggregation/PETAggregation.ncml"
dataset = netCDF4.Dataset(url)
result = dataset.variables['pet'][:, 500, 400]
myarray SUPERSEDED NOW WITH ABOVE
myarray = np.array([
[[1, 2], [3, 4], [5, 6]],
[[1, 2], [3, 4], [5, 6]],
[[1, 2], [3, 4], [5, 6]],
[[1, 2], [3, 4], [5, 6]],
])
result = []
for i in myarray:
    result.append(i[1][0])
result  # [3, 3, 3, 3]
EDIT
FirefoxMetzger suggested to slice it simply with
result = myarray[:, 1, 0]. However, I'm getting the following error message with this:
RuntimeError: NetCDF: DAP server error
The minimal numpy example you provided can be efficiently sliced using standard slicing mechanisms:
myarray = np.array([
[[1, 2], [3, 4], [5, 6]],
[[1, 2], [3, 4], [5, 6]],
[[1, 2], [3, 4], [5, 6]],
[[1, 2], [3, 4], [5, 6]],
])
result = myarray[:, 1, 0]
The NetCDF error seems to come from the resulting slice being too large to be returned from the server, causing a crash. As per your comment, the solution here is to query the server in chunks and aggregate the results locally.
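Querying in chunks could look roughly like the sketch below. The helper name read_in_chunks and the chunk size are assumptions made for illustration, and the chunk size that a given server accepts is not known here. The function only relies on `var` having a `shape` and supporting slicing, so it is demonstrated on the in-memory array rather than on the remote dataset.

```python
import numpy as np


def read_in_chunks(var, row, col, chunk=100):
    """Read var[:, row, col] in slices of at most `chunk` steps along axis 0.

    read_in_chunks and `chunk` are hypothetical names; `var` can be any
    object with a `shape` that supports slicing, such as a NumPy array.
    """
    n = var.shape[0]
    parts = []
    for start in range(0, n, chunk):
        stop = min(start + chunk, n)
        # Each request only covers [start, stop) along the first axis.
        parts.append(var[start:stop, row, col])
    # np.ma.concatenate keeps masks if the pieces are masked arrays.
    return np.ma.concatenate(parts)


myarray = np.array([
    [[1, 2], [3, 4], [5, 6]],
    [[1, 2], [3, 4], [5, 6]],
    [[1, 2], [3, 4], [5, 6]],
    [[1, 2], [3, 4], [5, 6]],
])
result = read_in_chunks(myarray, 1, 0, chunk=3)
```

With a remote variable, the same call would be made with `var` set to the dataset variable and a chunk size chosen by trial.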
I am trying to use arrays to set values in other arrays. Unfortunately instead of setting a value it is somehow overwriting a bunch of values. What is going on, and how can I achieve what I want?
>>> target = np.array( [ [0,1],[1,2],[2,3] ])
>>> target
array([[0, 1],
[1, 2],
[2, 3]])
>>> actions = np.array([0,0,0])
>>> target[actions] #The first row, 3 times
array([[0, 1],
[0, 1],
[0, 1]])
>>> target[:,actions] #The first column, 3 times
array([[0, 0, 0],
[1, 1, 1],
[2, 2, 2]])
>>> values = np.array([7,8,9])
>>> target[:,actions] = values #why isnt this working?
>>> target
array([[9, 1],
[9, 2],
[9, 3]])
#Actually want
#array([[7, 1],
# [8, 2],
# [9, 3]])
>>> target = np.array( [ [0,1],[1,2],[2,3] ]) #reset to original value
>>> actions = np.array([0,1,0])
>>> target[:,actions] = values.reshape(3, 1)
>>> target
array([[7, 7],
[8, 8],
[9, 9]])
#Actually want
#array([[7, 1],
# [1, 8],
# [9, 3]])
target[:,actions] selects the same column of target thrice.
When you say target[:,actions] = values, what you are doing is:
Assign 7 to all the values in the column, three times.
Assign 8 to all the values in the column, three times.
Assign 9 to all the values in the column, three times.
So you end up with 9 in all the values in the column.
If you insist on this awkward triple-writing of data, you can fix it by transposing the write:
target[:,actions] = values.reshape(3, 1)
This will write [7,8,9] to the column, three times. Obviously that's wasteful, and you could do this instead:
target[:,actions[-1]] = values
The effect should be the same, and it saves computation.
2 ways to write [7,8,9] to the first column:
basic indexing (with slice):
In [396]: target[:,0] = [7,8,9] # all rows, 1st column
In [397]: target
Out[397]:
array([[7, 1],
[8, 2],
[9, 3]])
Advanced indexing (with 2 lists)
In [398]: target[[0,1,2],[0,0,0]] = [7,8,9] # pair [0,0],[1,0],[2,0]
In [399]: target
Out[399]:
array([[7, 1],
[8, 2],
[9, 3]])
The 2nd method also works for a mix of columns:
In [400]: target = np.array( [ [0,1],[1,2],[2,3] ])
In [401]: target[[0,1,2],[0,1,0]] = [7,8,9]
In [402]: target
Out[402]:
array([[7, 1],
[1, 8],
[9, 3]])
Broadcasting comes into play. In a case like this there are 3 potential arrays to broadcast - the index arrays for the 2 dimensions and the source array.
Advanced indexing like this produces a 1d array. So the source array has to match:
In [403]: target[[0,1,2],[0,1,0]]
Out[403]: array([7, 8, 9])
A (1,3) can broadcast to (3,), but a (3,1) can't:
In [404]: target[[0,1,2],[0,1,0]] = np.array([[7,8,9]])
In [405]: target[[0,1,2],[0,1,0]] = np.array([[7,8,9]]).T
...
ValueError: shape mismatch: value array of shape (3,1) could not be broadcast to indexing result of shape (3,)
This sort of indexing is unusual. Note that the result is (3,3).
In [412]: target[:,[0,0,0]]
Out[412]:
array([[0, 0, 0],
[1, 1, 1],
[2, 2, 2]])
A (3,1) source:
In [413]: np.array([[7,8,9]]).T
Out[413]:
array([[7],
[8],
[9]])
In [414]: target[:,[0,0,0]] = _
In [415]: target
Out[415]:
array([[7, 1],
[8, 2],
[9, 3]])
The (3,1) can broadcast to (3,3). It works, but ends up assigning [7,8,9] 3 times, all to the same 0 column.
Another way of assigning the 1st column:
In [423]: target[np.ix_([0,1,2],[0,0,0])]
Out[423]:
array([[0, 0, 0],
[1, 1, 1],
[2, 2, 2]])
Again a (3,3), which accepts a (3,1):
In [424]: target[np.ix_([0,1,2],[0,0,0])] = np.array([[7,8,9]]).T
In [425]: target
Out[425]:
array([[7, 1],
[8, 2],
[9, 3]])
ix_ makes 2 arrays that can broadcast against each other, in this case a column vector and a row one:
In [426]: np.ix_([0,1,2],[0,0,0])
Out[426]:
(array([[0],
[1],
[2]]), array([[0, 0, 0]]))
I can select all elements of target with:
In [430]: target[np.ix_([0,1,2],[0,1])]
Out[430]:
array([[0, 1],
[1, 2],
[2, 3]])
and in a jumbled order:
In [431]: target[np.ix_([2,0,1],[1,0])]
Out[431]:
array([[3, 2],
[1, 0],
[2, 1]])
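To make the role of ix_ concrete, the sketch below builds the same column and row index arrays by hand and checks that both forms select the same block. It is only an illustration of the broadcasting involved, using the jumbled-order example from above.

```python
import numpy as np

target = np.array([[0, 1], [1, 2], [2, 3]])
rows = [2, 0, 1]
cols = [1, 0]

# np.ix_ returns a (3,1) row-index array and a (1,2) column-index array.
via_ix = target[np.ix_(rows, cols)]

# The same pair built by hand; the two arrays broadcast to shape (3,2).
row_idx = np.array(rows)[:, None]
col_idx = np.array(cols)[None, :]
via_manual = target[row_idx, col_idx]
```

Both results have shape (3, 2), one entry for every (row, column) combination.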
I couldn't get it to work using : indexing; however, the following works by using an array of row indices. I'm not sure why the : method is not working. If someone can come up with a way to fix that, I will accept it instead.
>>> target = np.array( [ [0,1],[1,2],[2,3] ])
>>> rows = np.arange(target.shape[0])
>>> actions = np.array([0,1,0])
>>> values = np.array([7,8,9])
>>> target[rows,actions] = values
>>> target
array([[7, 1],
[1, 8],
[9, 3]])
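The difference between the two forms shows up in the shape of what gets selected: a slice combined with an index array selects every row for every listed column, while two index arrays are paired element by element. The sketch below compares the shapes and also shows np.put_along_axis as one possible alternative for the per-row write; it assumes a NumPy version that provides put_along_axis (1.15 or later).

```python
import numpy as np

target = np.array([[0, 1], [1, 2], [2, 3]])
actions = np.array([0, 1, 0])
values = np.array([7, 8, 9])
rows = np.arange(target.shape[0])

# Slice + index array: all 3 rows for each of the 3 listed columns.
slice_shape = target[:, actions].shape      # (3, 3)

# Two index arrays: pairs (0,0), (1,1), (2,0), one element per pair.
paired_shape = target[rows, actions].shape  # (3,)

# Alternative per-row write: for each row i, set column actions[i].
updated = target.copy()
np.put_along_axis(updated, actions[:, None], values[:, None], axis=1)
```

Since `values` has shape (3,), it lines up with the paired selection but is broadcast across rows of the (3, 3) selection, which is why the slice form overwrites whole columns.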
I have a 3D array a, like this:
a=np.array([[[1,2],[2,3]],[[3,4],[4,5]]])
[
[[1 2],[2 3]]
[[3 4],[4 5]]
]
a.shape
(2, 2, 2)
Now, I want to add another element, like [[5,6],[6,7]] to this array.
So, new array will be:
[
[[1, 2],[2, 3]]
[[3, 4],[4, 5]]
[[5, 6],[6, 7]]
]
a.shape
(3, 2, 2)
What is the best way to do this?
(I'm working with big datasets, so I need the most efficient way.)
Use np.vstack to vertically stack after extending the second array to 3D by adding a new axis as its first axis with None/np.newaxis, like so -
np.vstack((a,b[None]))
Sample run -
In [403]: a
Out[403]:
array([[[1, 2],
[2, 3]],
[[3, 4],
[4, 5]]])
In [404]: b
Out[404]:
array([[5, 6],
[6, 7]])
In [405]: np.vstack((a,b[None]))
Out[405]:
array([[[1, 2],
[2, 3]],
[[3, 4],
[4, 5]],
[[5, 6],
[6, 7]]])
You can use np.append to append to arrays:
a = np.array([[[1,2],[2,3]],[[3,4],[4,5]]])
a = np.append(a, [[[5,6],[6,7]]], axis=0)
Note that I had to add an extra set of brackets around the second part, in order for the dimensions to be correct. Also, you must use an axis or it will all be flattened to a linear array.
Try numpy.append
import numpy as np
a=np.array([[[1,2],[2,3]],[[3,4],[4,5]]])
b=np.array([[3,4],[4,5]])
np.append(a,[b[:,:]],axis=0)
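Since the question mentions big datasets, it may help to note that np.append and np.vstack both return a new array, so appending one block at a time copies the existing data on every call. A minimal sketch of collecting the new blocks first and concatenating once, where the list new_blocks and its second entry are made-up sample data:

```python
import numpy as np

a = np.array([[[1, 2], [2, 3]], [[3, 4], [4, 5]]])

# Hypothetical 2x2 blocks gathered elsewhere, e.g. inside a loop.
new_blocks = [
    np.array([[5, 6], [6, 7]]),
    np.array([[7, 8], [8, 9]]),
]

# np.stack turns the list into shape (2, 2, 2); a single concatenate
# then copies the existing data once instead of once per block.
result = np.concatenate([a, np.stack(new_blocks)], axis=0)
```

If the final number of blocks is known in advance, preallocating with np.empty and filling by index avoids the extra copy altogether.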