how to extract values based upon month in xarray - python

I have an array of dimensions (9131,101,191). The first dimension is the days from 1/1/2075 to 31/12/2099. I want to extract all the days that fall in the month of July. How can I do this in xarray? I have tried using loops and numpy, but I am not getting the desired result. Ultimately, I want to extract all the arrays that fall in July and find their mean.
Here is the array, its name is initialize_c3 and its shape is (9131,101,191).
import pandas as pd
import xarray as xr
arr_c3 = xr.DataArray(
    initialize_c3,
    dims=("time", "lat", "lon"),
    coords={
        "time": pd.date_range("2075-01-01", periods=9131, freq="D"),
        "lat": list(range(1, 102)), "lon": list(range(1, 192)),
    },
)
I have tried to group by month:
result = arr_c3.groupby(arr_c3.time.dt.month)
After this, the shape of result is (755,1,1), but I want its dimensions to be (755,101,191). What am I doing wrong?

You can use groupby() to calculate the monthly climatology. Then use sel() to select the monthly mean for July:
ds.groupby('time.month').mean().sel(month=7)
Another way that avoids calculating the means for the other months is to first filter all days in July:
ds.sel(time=(ds.time.dt.month == 7)).mean('time')
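The two approaches above can be compared on a small synthetic array. This is only a sketch: the random data and the reduced 3 x 4 grid are assumptions standing in for initialize_c3, while the time axis matches the question.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the asker's data: same time axis, much smaller grid.
time = pd.date_range("2075-01-01", periods=9131, freq="D")
rng = np.random.default_rng(0)
arr = xr.DataArray(
    rng.random((time.size, 3, 4)),
    dims=("time", "lat", "lon"),
    coords={"time": time, "lat": [1, 2, 3], "lon": [1, 2, 3, 4]},
)

# Boolean mask along time: keep only the days that fall in July.
july = arr.sel(time=(arr.time.dt.month == 7))

# Mean over all July days, leaving a (lat, lon) field.
july_mean = july.mean("time")

# Same field via the monthly climatology, then picking month 7.
clim_july = arr.groupby("time.month").mean().sel(month=7)

print(july.shape)       # July days along the first axis
print(july_mean.shape)  # (lat, lon)
```

Both routes should give the same (lat, lon) field; the mask version skips the work for the other eleven months.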

Related

how to combine 4D xarray data

I have a 4D xarray array with dimensions time, lev, lat, and lon. The data is for a specific day, so the length of time is 1. My goal is to build a 4D array with the same attributes that covers a month of data, so that the time length will be 30.
I tried to google it but could not find useful information. I would appreciate it if anyone could provide some insights.
If you have multiple points in a time series, you can use xr.DataArray.resample to change the frequency of a datetime dimension. Once you have resampled, you'll get a DataArrayResample object, to which you can apply any of the methods listed in the DataArrayResample API docs.
If you only have a single point in time, you can't resample to a higher frequency. Your best bet is probably to simply select and drop the time dim altogether, then use expand_dims to expand the dimensions again to include the full time dim you want. Just be careful because this overwrites the time dimension's values with whatever you want, regardless of what was in there before:
import pandas as pd
target_dates = pd.date_range('2018-08-01', '2018-08-30', freq='D')
# Drop the length-1 time dim, then broadcast along the new daily time axis
daily = (
    da
    .isel(time=0, drop=True)
    .expand_dims(time=target_dates)
)
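For a version that runs end to end, here is a minimal sketch with a made-up single-day array. The sizes of lev, lat, and lon, the values, and the units attribute are all assumptions for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical 4D array holding a single time step (sizes are made up).
da = xr.DataArray(
    np.arange(2 * 3 * 4, dtype=float).reshape(1, 2, 3, 4),
    dims=("time", "lev", "lat", "lon"),
    coords={"time": pd.to_datetime(["2018-08-01"])},
    attrs={"units": "K"},
)

target_dates = pd.date_range("2018-08-01", "2018-08-30", freq="D")

# Drop the length-1 time dim, then repeat the field along a new 30-day axis.
daily = da.isel(time=0, drop=True).expand_dims(time=target_dates)

print(daily.dims)
print(daily.shape)
```

Every day in daily carries the same values as the original single time step, so this only makes sense when the field is meant to be constant over the month.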

Moving x rows up in a dataframe indexed on dates

I have a dataframe that has Date as its index. The dataframe holds stock market data, so the dates are not continuous. If I want to move, let's say, 120 rows up in the dataframe, how do I do that? For example:
If I want to get the data starting from 120 trading days before the start of 2018, how do I modify the following?
df['2018-01-01':'2019-12-31']
Thanks
Try this:
df[df.columns[df.columns.get_loc('2018-01-01'):df.columns.get_loc('2019-12-31')]]
This gets the positions of both columns in the column index and slices between them to get the desired range.
UPDATE:
Based on your requirement, the above can be modified slightly.
Yearly Indexing
>>> df[df.columns[(df.columns.get_loc('2018')).start:(df.columns.get_loc('2019')).stop]]
Here df.columns.get_loc('2018') returns a slice object covering every column in 2018 (this relies on the columns being a sorted DatetimeIndex); its .start attribute is the position of the first 2018 element. Likewise, the .stop attribute of the slice for 2019 marks the end of 2019.
Monthly Indexing
Now suppose you want the data for the first 6 months of 2018 (without knowing what the first day is). The same can be done using:
>>> df[df.columns[(df.columns.get_loc('2018-01')).start:(df.columns.get_loc('2018-06')).stop]]
As you can see above we have indexed the first 6 months of 2018 using the same logic.
Assuming you are using pandas and the dataframe is sorted by dates, a very simple way would be:
initial_date = '2018-01-01'
initial_date_index = df.loc[df['dates']==initial_date].index[0]
offset=120
start_index = initial_date_index-offset
new_df = df.loc[start_index:]
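Since the question has the dates in the index, and 2018-01-01 may not itself be a trading day, a position-based lookup may be more robust. The following is a sketch under assumptions: the business-day index and the close column are made up, and searchsorted is used as one way to find the first row on or after the date.

```python
import numpy as np
import pandas as pd

# Hypothetical price data on business days only (a stand-in for trading days).
idx = pd.bdate_range("2017-01-02", "2019-12-31")
df = pd.DataFrame({"close": np.arange(len(idx), dtype=float)}, index=idx)

offset = 120

# Position of the first row on or after 2018-01-01, even if that date is absent.
pos = df.index.searchsorted(pd.Timestamp("2018-01-01"))

# Step back `offset` rows, without running past the start of the frame.
start = max(pos - offset, 0)

# Slice by position for the start, then by label for the end date.
window = df.iloc[start:].loc[:"2019-12-31"]

print(window.index[0])
```

This requires the index to be sorted by date, because searchsorted assumes sorted input.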

Pandas Python Dataframe

I have a dataset with YYYY-MM as data, however I want to find the mean of the temperature for the year, therefore I need to add up the 12 months in a year, and find the summary. How do I do that using Pandas?
An example of my data: (I have more than a year dataset, tried to reshape them, but it doesn't seem to work)
Let us do a string slice, then groupby + sum:
s=df.groupby(df['month'].str[:4]).sum()
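The question asks for a yearly mean, so the same string-slice grouping can end in mean() instead of sum(). A small sketch follows; the month column name comes from the answer, while the temperature column and the values are assumptions.

```python
import pandas as pd

# Made-up monthly data; "temperature" is a hypothetical column name.
df = pd.DataFrame({
    "month": ["2019-01", "2019-02", "2020-01", "2020-02"],
    "temperature": [10.0, 12.0, 11.0, 15.0],
})

# The first four characters of "YYYY-MM" are the year.
year = df["month"].str[:4].rename("year")

# Average the temperature within each year.
yearly_mean = df.groupby(year)["temperature"].mean()

print(yearly_mean)
```

Selecting the temperature column before aggregating keeps the string month column out of the calculation.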

Take maximum rainfall value for each season over a time period (xarray)

I'm trying to find the maximum rainfall value for each season (DJF, MAM, JJA, SON) over a 10 year period. I am using netcdf data and xarray to try and do this. The data consists of rainfall (recorded every 3 hours), lat, and lon data. Right now I have the following code:
ds.groupby('time.season').max('time')
However, when I do it this way the output has a shape of (4,145,192) indicating that it's taking the maximum value for each season over the entire period. I would like the maximum for each individual season every year. In other words, output should have something with a shape like (40,145,192) (4 values for each year x 10 years)
I've looked into trying to do this with DataSet.resample as well using time=3M as the frequency, but then it doesn't split the months up correctly. If I have to I can alter the dataset, so it starts in the correct place, but I was hoping there would be an easier way considering there's already a function to group it correctly.
Thanks and let me know if you need anymore details!
Resample is going to be the easiest tool for this job. You are close with the time frequency but you probably want to use the quarterly frequency with an offset:
ds.resample(time='QS-Mar').max('time')
These offsets can be further configured as described in the Pandas documentation: http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases
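A quick way to check the seasonal binning is to run it on synthetic data. The sketch below assumes a tiny 2 x 3 grid and two years of 3-hourly values starting on 1 December; it uses the "QS-DEC" alias, which places the quarter boundaries in the same months (December, March, June, September) as the offset in the answer.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two years of 3-hourly steps starting 1 December 2000 (730 days x 8 per day).
n = 730 * 8
time = pd.date_range("2000-12-01", periods=n, freq=pd.Timedelta(hours=3))

# Synthetic "rainfall": the value at each step equals its time index.
values = np.arange(n, dtype=float)[:, None, None] * np.ones((1, 2, 3))
rain = xr.DataArray(values, dims=("time", "lat", "lon"), coords={"time": time})

# One maximum per season per year: quarters start in Dec, Mar, Jun, Sep.
seasonal_max = rain.resample(time="QS-DEC").max("time")

print(seasonal_max.shape)
print(seasonal_max.time.dt.season.values)
```

With two years of data this gives eight time steps, one per DJF, MAM, JJA, and SON in each year, which is the layout the question asks for.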

Using a function to do a %change on a dataset

I am using pandas in Python and have run into this issue. I have a dataset with countries as the columns and dates (months) as the rows. The data consists of the population of an item.
I need to calculate the % change of population month by month. Is there a function I can use to get a dataset with the month-by-month % change in the format attached?
I am trying to apply a function to the dataset, but getting the function to retrieve the previous month's population to compute the % change is an issue.
Does anyone have any good ideas to get this done? Thanks
You can use pct_change:
df.pct_change()
First order the data by month (if it isn't already), and then use the .shift() method of pandas dataframes to line up each row with the previous month:
df['pct_change'] = (df.US - df.US.shift(1)) / df.US.shift(1)
.shift() allows you to shift rows up or down depending on the argument.
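The two answers can be checked against each other on a small made-up table. The country columns and values below are assumptions; the point is that pct_change() and the manual shift-based formula agree when the difference is divided by the previous month's value.

```python
import pandas as pd

# Made-up monthly populations, months as rows and countries as columns.
df = pd.DataFrame(
    {"US": [100.0, 110.0, 99.0], "UK": [50.0, 50.0, 75.0]},
    index=["2020-01", "2020-02", "2020-03"],
)

# Built-in: change relative to the previous row, for every column at once.
pct = df.pct_change()

# Manual equivalent: (current - previous) / previous.
manual = (df - df.shift(1)) / df.shift(1)

print(pct)
```

The first row is NaN in both results because there is no earlier month to compare against.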
