How to append multiple data arrays into one variable of an xarray dataset? - python

I am new to this field and need a little help.
I just want to know: what is the best way to append multiple data arrays to one variable of an xarray dataset?
Each data array has a different time and different values, but shares the same x, y coordinates as the dataset.
I tried ds[variable_name] = da, but that only works for the first data array.
I want to write a function that takes data arrays, puts them into one variable of the dataset, and updates the dataset's time dimension.
Thanks for your help.

The best way to do that is to first convert the data arrays to datasets separately and then merge the datasets together (using xr.merge).
Hope it helps others.
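
For example, a minimal sketch of that approach (the variable name "humidity", the grid sizes, and the timestamps are made up for illustration):

import numpy as np
import xarray as xr

x = np.arange(4)
y = np.arange(3)

def make_da(timestamp):
    # Stand-in for one incoming data array: a single time step on the shared x/y grid.
    data = np.random.rand(1, y.size, x.size)
    return xr.DataArray(
        data,
        dims=("time", "y", "x"),
        coords={"time": [np.datetime64(timestamp)], "y": y, "x": x},
        name="humidity",
    )

da1 = make_da("2021-01-01")
da2 = make_da("2021-02-01")

# Convert each DataArray to a Dataset, then merge; since the time coordinates
# differ, the merged "humidity" variable grows along the time dimension.
ds = xr.merge([da1.to_dataset(), da2.to_dataset()])
print(ds["humidity"].sizes)  # time: 2, y: 3, x: 4

If the arrays only differ in their timestamps, xr.concat([da1, da2], dim="time") is another common way to stack them along time.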

Related

Fastest Way to Create Large Numpy Arrays?

I'm working on creating a bunch of data (10M rows or more) and am trying to understand the best way to work with numpy arrays as quickly as possible.
I'll have 1M rows of each class of data, which I read in from different sources (async). When I'm done reading, I want to combine them into a single numpy array. I'll know the final array is 10M (precisely).
I'm thinking I have the following paths:
1. Create a global numpy array of the entire size and copy in
2. Create a global numpy array and a numpy array for each source and concatenate them at the end
3. Create a null global numpy array and add each row to the global array (I think this is the slowest)
However, I'm not sure how to do #1 - numpy.copyto seems to always start with index 0.
Is there another model I should be going with here?
If I use "views", I'm not clear how to copy it to the final array. I'm, of course, familiar with views for DBs, but not for numpy.

Downsample numpy array

I have two numpy arrays. One has features and the other the corresponding labels.
There are 3 classes, but the dataset is imbalanced, so I would like to balance it by downsampling one class. That class has about 10k elements, and I would like to reduce it to around 2k, like the other classes. I tried to do it with a for loop by creating a new array, but I am sure there is a cleaner method for that.
In the end there should be two numpy arrays: the labels with that class downsampled, and the features with the same elements removed, so the two stay aligned.
Any idea? Thanks!
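
A loop-free sketch of that idea, assuming features and labels are aligned arrays and class 0 is the over-represented one (the names, sizes, and target of 2k are assumptions):

import numpy as np

rng = np.random.default_rng(0)
features = rng.random((14_000, 8))                        # stand-in for the real features
labels = np.repeat([0, 1, 2], [10_000, 2_000, 2_000])     # class 0 is over-represented

majority_idx = np.flatnonzero(labels == 0)
keep_majority = rng.choice(majority_idx, size=2_000, replace=False)
keep = np.sort(np.concatenate([keep_majority, np.flatnonzero(labels != 0)]))

features_balanced = features[keep]                        # same rows removed in both arrays
labels_balanced = labels[keep]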

Accessing ndarray columns using a custom label

I've just started learning numpy to analyse experimental data, and I am trying to give custom names to ndarray columns so that I can do operations without slicing.
The data I receive is generally a two-column .txt file (I'll call the columns X and Y for the sake of clarity) with a large number of rows corresponding to the measured data. I do operations on those data and generate new columns (I'll call them F(X,Y), G(X,Y,F), ...). I know I can do column-wise operations by slicing, i.e. Data[:,2] = Data[:,1] + Data[:,0], but with a large number of added columns this becomes tedious. Hence I'm looking for a way to label the columns so that I can refer to a column by its label, and can also label the new columns I generate. Essentially I'm looking for something that'll allow me to directly write F = X + Y (as a substitute for the example above).
Currently, I'm assigning the entire column to a new variable and doing the operations, and then 'hstack'ing it to the data, but I'm unsure of the memory usage here. For example,
X = Data[:, 0]
Y = Data[:, 1]
F = X + Y
Data = numpy.hstack((Data, F.reshape(-1, 1)))  # append F as a new column
I've seen the use of structured arrays and record arrays, but the data I'm dealing with is homogeneous and new columns are being added continuously. Also, I hear pandas is well suited to what I'm describing, but since I'm working with purely numerical data, I don't see the need to learn a new module unless it's really needed. Thanks in advance.
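
One lightweight workaround (just a sketch, not the only option) is to keep the named columns in a plain dict of 1-D arrays and stack them back into a 2-D array only when needed; the column names and the random data here are placeholders for the real .txt file:

import numpy as np

raw = np.random.rand(100, 2)                       # stands in for the loaded two-column file
cols = {"X": raw[:, 0], "Y": raw[:, 1]}

cols["F"] = cols["X"] + cols["Y"]                  # derived column, referred to by name
cols["G"] = cols["F"] * np.sin(cols["X"])          # further columns added the same way

full = np.column_stack(list(cols.values()))        # back to a 2-D array, e.g. for np.savetxt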

How do I change the shape of a data file that I extracted in Python? (Numpy)

I'm using mapped earth data and was able to extract monthly data (e.g. Relative Humidity) from model and observational datasets. I want to subtract the observational data from the model data to see the difference between the two.
When I try subtracting the observational data from the model data I get: "operands could not be broadcast together with shapes (197,258) (18,36)".
Someone suggested that I regrid the data but I'm not sure how to do that. I tried using scipy.interpolate but I'm unsure what function would work best for my situation here.
Thanks!
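
One way to regrid with scipy.interpolate is RegularGridInterpolator: build an interpolator on the coarse observation grid and evaluate it at the model grid points. A minimal sketch, assuming both grids are regular lat/lon grids covering the globe (the coordinate ranges and the random fields are placeholders):

import numpy as np
from scipy.interpolate import RegularGridInterpolator

obs = np.random.rand(18, 36)                         # stand-in for the (18, 36) observations
model = np.random.rand(197, 258)                     # stand-in for the (197, 258) model field

lat_obs = np.linspace(-90, 90, 18)
lon_obs = np.linspace(0, 360, 36, endpoint=False)
interp = RegularGridInterpolator((lat_obs, lon_obs), obs,
                                 method="linear", bounds_error=False, fill_value=None)

lat_mod = np.linspace(-90, 90, 197)
lon_mod = np.linspace(0, 360, 258, endpoint=False)
lat2d, lon2d = np.meshgrid(lat_mod, lon_mod, indexing="ij")
obs_on_model_grid = interp(np.stack([lat2d, lon2d], axis=-1))   # shape (197, 258)

diff = model - obs_on_model_grid                     # shapes now match, so this broadcasts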

Python - pandas dataframe or array of dataclass instances for reading in data?

I'm relatively new to data analysis using Python and I'm trying to determine the most practical and useful way to read in my data so that I can index into it and use it in calculations. I have many images in the form of np.arrays that each have a corresponding set of data such as x- and y-coordinates, size, filter number, etc. I just want to make sure each set of data is grouped together with its corresponding image. My first thought was sticking the data in an np.array of dataclass instances (where each element of the array is an instance that contains all my data). My second thought was a pandas dataframe.
My gut is telling me that using a dataframe makes more sense. Do np.arrays store nicely inside dataframes? What are the pros/cons of each method, and which would be best if I need to pull data from it often and always need to be able to match the data with its corresponding image?
The variables I have to read in: x_coord (float), y_coord (float), filter (int), image (np.ndarray).
I've been trying to stick the image arrays into a pandas dataframe, but when indexing into it using .loc the Jupyter Notebook cell is extremely slow to run. It was also very slow to populate the dataframe using .from_dict(). I'm guessing dataframes weren't meant to hold np.ndarrays?
My biggest concerns are bookkeeping and ease of indexing - what can I do to always make sure I can retrieve the metadata for the corresponding image? What form should my data be in so that I can easily extract an image and its metadata, or all images with the same filter number, etc.?
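
One pattern that avoids stuffing large arrays into DataFrame cells is to keep the scalar metadata in a DataFrame and the image arrays in a dict keyed by the same index, so filtering stays fast and each image stays matched to its row. A minimal sketch with made-up names and data:

import numpy as np
import pandas as pd

images = {i: np.random.rand(64, 64) for i in range(5)}   # image id -> pixel array

meta = pd.DataFrame(
    {
        "x_coord": np.random.rand(5),
        "y_coord": np.random.rand(5),
        "filter": [1, 2, 2, 3, 1],
    },
    index=list(images),                                   # same ids as the images dict
)

row = meta.loc[3]                                         # metadata for image 3
img = images[3]                                           # its pixel data

filter2_ids = meta.index[meta["filter"] == 2]             # all images taken with filter 2
filter2_images = [images[i] for i in filter2_ids]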
