Is there a simple way of flattening an xarray dataset into a single 1D numpy array?
For example, flattening the following test dataset:
xr.Dataset({
    'a' : xr.DataArray(
        data=[10,11,12,13,14],
        coords={'x':[0,1,2,3,4]},
        dims={'x':5}
    ),
    'b' : xr.DataArray(data=1,coords={'y':0}),
    'c' : xr.DataArray(data=2,coords={'y':0}),
    'd' : xr.DataArray(data=3,coords={'y':0})
})
to
[10,11,12,13,14,1,2,3]
?
If you're OK with repeated values, you can use .to_array() and then flatten the values in NumPy, e.g.,
>>> ds.to_array().values.ravel()
array([10, 11, 12, 13, 14, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3,
3, 3, 3])
If you don't want repeated values, then you'll need to write something yourself, e.g.,
>>> np.concatenate([v.values.ravel() for v in ds.data_vars.values()])
array([10, 11, 12, 13, 14, 1, 2, 3])
More generally, this sounds somewhat similar to a proposed interface for "stacking" data variables in 2D for machine learning applications: https://github.com/pydata/xarray/issues/1317
As of July 2019, xarray has the methods to_stacked_array and to_unstacked_dataset, which do this.
Get Dataset from question:
ds = xr.Dataset({
    'a' : xr.DataArray(
        data=[10,11,12,13,14],
        coords={'x':[0,1,2,3,4]},
        dims={'x':5}
    ),
    'b' : xr.DataArray(data=1,coords={'y':0}),
    'c' : xr.DataArray(data=2,coords={'y':0}),
    'd' : xr.DataArray(data=3,coords={'y':0})
})
Get the list of data variables:
variables = ds.data_vars
Use the ndarray.flatten() method to reduce each array to 1D:
arrays = [ ds[i].values.flatten() for i in variables ]
Then flatten the list of 1D arrays into a single list (as detailed in this answer):
arrays = [i for j in arrays for i in j ]
Now convert this to an array, as requested in the question (it is currently a list):
array = np.array(arrays)
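The flatten-and-join steps above can also be collapsed into a single np.concatenate call. This is only a minimal sketch: the dict named data_vars is a hypothetical stand-in for ds.data_vars, used here so the example runs without xarray installed.

```python
import numpy as np

# Hypothetical stand-in for ds.data_vars: variable name -> array of any shape
data_vars = {
    "a": np.array([10, 11, 12, 13, 14]),
    "b": np.array(1),
    "c": np.array(2),
    "d": np.array(3),
}

# np.ravel turns every entry (0-d scalars included) into a 1D array,
# and np.concatenate joins them in insertion order
flat = np.concatenate([np.ravel(v) for v in data_vars.values()])
print(flat)
```

Because every piece is already 1D before joining, no intermediate Python list of scalars is needed.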
This is what I wrote, and it doesn't work. I use a for loop to get the index of each matching element so that I can use that index elsewhere. Why doesn't it work, or is there another way to delete the elements and still get their indices?
Here is some of my code:
X1_train, X1_test, y1_train, y1_test = train_test_split(EclipseFeautres, EclipseClass, test_size=0.3, random_state=0)
E_critical_class=y1_train.copy()
E_critical_class = E_critical_class[E_critical_class != 1]
for x in range(len(E_critical_class)):
    if(E_critical_class[x]==1):
        E=np.delete(E_critical_class,x)
Your task is something like filtering of an array.
You want to drop all elements == 1.
Assume that the source array (arr) contains:
array([0, 1, 2, 3, 4, 1, 0, 3, 7, 1])
so it contains 3 elements == 1 (to be dropped).
A much simpler way to do it is to use boolean indexing and save the
result back to the original variable:
arr = arr[arr != 1]
The result is:
array([0, 2, 3, 4, 0, 3, 7])
as you wish - with all ones dropped.
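One likely reason a delete-inside-a-loop approach misbehaves is that np.delete returns a new array and leaves its input unchanged, so each iteration starts again from the full array. The sketch below (variable names are illustrative) shows that behaviour next to the boolean mask:

```python
import numpy as np

arr = np.array([0, 1, 2, 3, 4, 1, 0, 3, 7, 1])

# np.delete returns a new array; arr itself is left untouched
without_first_one = np.delete(arr, 1)
print(arr.size, without_first_one.size)

# A boolean mask drops every matching element in one step
mask = arr != 1
filtered = arr[mask]
print(filtered)
```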
# Delete every occurrence of a repeated value (here 0) from a NumPy array
import numpy as np
a = np.array([10, 0, 0, 20, 0, 30, 40, 50, 0, 60, 70, 80, 90, 100, 0])
print("Original array:")
print(a)
# np.delete needs integer indices, so start from an empty integer array
index = np.zeros(0, dtype=int)
print("index=", index)
for i in range(len(a)):
    if a[i] == 0:
        index = np.append(index, i)
print("index=", index)
new_a = np.delete(a, index)
print("new_a=", new_a)
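If the indices of the removed elements are needed as well, the loop can be replaced by a vectorized lookup. A minimal sketch, using np.flatnonzero to collect the positions in one call:

```python
import numpy as np

a = np.array([10, 0, 0, 20, 0, 30, 40, 50, 0, 60, 70, 80, 90, 100, 0])

# Integer positions of every element equal to 0
index = np.flatnonzero(a == 0)
print("index=", index)

# Remove all of those positions at once
new_a = np.delete(a, index)
print("new_a=", new_a)
```

np.flatnonzero already returns an integer array, so it can be passed to np.delete directly.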
I have an array. Let's say a=array([[10, 2, 13, 55]])
I want to create a function that gives me the 1st element for t=0, the second element for t=1...
I have tried the following:
def a(t):
    return a[t]
You can do it like this:
from numpy import array
a = array([[10, 2, 13, 55]])
def get_value(t):
    return a[t]
get_value(0)  # returns array([10, 2, 13, 55])
Since your example data is 2D, you must pass two indices to access a single element.
a = array([[10, 2, 13, 55]])
def get_value(t1, t2):
    return a[t1][t2]
get_value(0, 1)  # returns 2
This function only works for arrays of the shape [[...]]; otherwise you have to change the level parameter.
from numpy import array
a = array([[10, 2, 13, 55]])
def matrix_reader(a, t, level=0):
    return a[level][t]
matrix_reader(a, 1)
Your example is a 2D array, so you need two indices to get the number you want.
Example: with your array a=array([[10, 2, 13, 55]]), a[0,0] returns 10 and a[0,1] returns 2.
I recommend that you create a 1D array, pass the array into the function, and give the function a different name from the array:
from numpy import array
temp = array([10, 2, 13, 55])
def a(arr, t):
    return arr[t]
print(a(temp, 2))
This example prints 13.
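If the array has to stay 2D, another option is to flatten it inside the function. A minimal sketch; the function name element_at is made up for illustration:

```python
import numpy as np

a = np.array([[10, 2, 13, 55]])

def element_at(arr, t):
    """Return the t-th element of arr, treating it as one flat sequence."""
    return np.ravel(arr)[t]

print(element_at(a, 0))
print(element_at(a, 1))
```

This works for both array([[10, 2, 13, 55]]) and array([10, 2, 13, 55]), since np.ravel leaves a 1D array as it is.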
I am trying to convert the values of a dictionary to a 1D array using np.asarray(dict.values()), but the shape of the resulting array is not what I expect.
The values look like this:
dict_values([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26])
but array.shape gives:
()
whereas I was expecting (27,1) or (27,).
After I changed the code to np.asarray(dict.values()).flatten(), array.shape became:
(1,)
I have read the documentation for numpy.ndarray.shape, but I can't work out why the outputs look like this. Can someone explain it to me? Thanks.
This must be Python 3.
From the docs:
The objects returned by dict.keys(), dict.values() and dict.items()
are view objects. They provide a dynamic view on the dictionary’s
entries, which means that when the dictionary changes, the view
reflects these changes.
The issue is that dict.values() returns only a dynamic view of the dictionary's values, not a sequence, which leads to the behaviour you see.
dict_a = {'1': 1, '2': 2}
res = np.array(dict_a.values())
res.shape #()
res
#Output:
array(dict_values([1, 2]), dtype=object)
Notice that NumPy does not resolve the view object into the actual integers; it wraps the view itself in a 0-dimensional array with dtype=object, which is why the shape is ().
To avoid this issue, consume the view to get a list, as follows:
dict_a = {'1': 1, '2': 2}
res = np.array(list(dict_a.values()))
res.shape #(2,)
res #array([1, 2])
res.dtype #dtype('int32') here; the default integer dtype depends on the platform
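As an alternative to building a list first, np.fromiter can consume the view directly. This is a sketch of that option rather than a required change; note that np.fromiter needs the dtype to be stated explicitly.

```python
import numpy as np

dict_a = {"1": 1, "2": 2}

# np.fromiter consumes any iterable, so the view needs no intermediate list;
# the dtype has to be given explicitly
res = np.fromiter(dict_a.values(), dtype=int)
print(res.shape)
```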
I generate an array with Python and save it to a txt file. When I read it back, convert it into an array, and try to work with it, I get the error:
ufunc 'multiply' did not contain a loop with signature matching types
dtype('
This is the code:
import numpy as np
lista=[1,2,3,4,5,6,7,8]
vector=np.array(lista)
print (vector)
lista.append(9)
vector=np.array(lista)
print (vector)
archivo= open('datos.txt','w')
archivo.write('%s'%vector)
archivo=open('datos.txt','r')
dades=archivo.read()
vector2=np.array(dades)
print(vector2)
print(vector2*2)
Can you help me? Thanks.
When you read it back with dades=archivo.read(), you actually get a 19-character string.
To turn this into a NumPy array you need to do some processing:
>>> dades_as_ints = list(map(int, dades[1:-1].split()))
>>> vector2 = np.array(dades_as_ints)
>>> vector2
array([1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> vector2 * 2
array([ 2, 4, 6, 8, 10, 12, 14, 16, 18])
I would suggest you look at numpy.savetxt in the NumPy docs, which stores your array in a human-readable format, or numpy.save for efficient storing and loading.
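A minimal sketch of both round trips follows. The file names and the temporary directory are assumptions made for the example; fmt="%d" is chosen so that the text file holds plain integers.

```python
import os
import tempfile

import numpy as np

vector = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

# Write to a temporary directory so the sketch leaves no files behind
with tempfile.TemporaryDirectory() as tmp:
    # Plain text: one integer per line, readable in any editor
    txt_path = os.path.join(tmp, "datos.txt")
    np.savetxt(txt_path, vector, fmt="%d")
    vector2 = np.loadtxt(txt_path, dtype=int)

    # Binary .npy: keeps dtype and shape without any parsing
    npy_path = os.path.join(tmp, "datos.npy")
    np.save(npy_path, vector)
    vector3 = np.load(npy_path)

print(vector2 * 2)
```

Either way the loaded object is a numeric array, so arithmetic such as vector2 * 2 works without string processing.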
I need a fast way to keep a running maximum of a numpy array. For example, if my array was:
x = numpy.array([11,12,13,20,19,18,17,18,23,21])
I'd want:
numpy.array([11,12,13,20,20,20,20,20,23,23])
Obviously I could do this with a little loop:
def running_max(x):
    result = [x[0]]
    # start from the second element; x[0] is already in result
    for val in x[1:]:
        if val > result[-1]:
            result.append(val)
        else:
            result.append(result[-1])
    return result
But my arrays have hundreds of thousands of entries and I need to call this many times. It seems like there's got to be a numpy trick to remove the loop, but I can't seem to find anything that will work. The alternative will be to write this as a C extension, but it seems like I'd be reinventing the wheel.
numpy.maximum.accumulate works for me.
>>> import numpy
>>> numpy.maximum.accumulate(numpy.array([11,12,13,20,19,18,17,18,23,21]))
array([11, 12, 13, 20, 20, 20, 20, 20, 23, 23])
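As suggested, older SciPy versions also expose this as scipy.maximum.accumulate (a re-export of the NumPy function that has since been deprecated, so prefer numpy.maximum.accumulate in new code):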
In [9]: x
Out[9]: [1, 3, 2, 5, 4]
In [10]: scipy.maximum.accumulate(x)
Out[10]: array([1, 3, 3, 5, 5])
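To check that the ufunc really matches a hand-written loop, the two can be compared on random data. A sketch only: the array size, the seed, and the helper name running_max_loop are arbitrary choices for the example.

```python
import numpy as np

def running_max_loop(x):
    """Reference implementation: explicit Python loop."""
    result = [x[0]]
    for val in x[1:]:
        result.append(max(val, result[-1]))
    return np.array(result)

rng = np.random.default_rng(0)
x = rng.integers(0, 1000, size=100_000)

fast = np.maximum.accumulate(x)   # vectorized running maximum
slow = running_max_loop(x)        # same result, computed element by element
print(np.array_equal(fast, slow))
```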