I'm fairly new to Python functions and to Python in general (and to xarray for large data sets).
Is there a function that can calculate the 90th percentile centered on a 15-day window for a long time series over a domain?
For example, I have 62 years of daily maximum temperature during the summer season over North America (I merged all the JJA days with cdo). I want to calculate the daily 90th percentile of tmax (during the summer season) over the whole period, based on a centered 15-day window, in order to identify heat waves.
Is there something that can calculate this directly for a long time series, like np.percentile? I was thinking np.percentile could perhaps be combined with something like np.roll, but I'm not sure how to use it properly.
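To make the question concrete, this is roughly the computation I have in mind, sketched with xarray (the file name, variable name, and the rolling/groupby approach are just my guess at how it might look):

import xarray as xr

# Placeholder file and variable names
tmax = xr.open_dataset("tmax_JJA_NorthAmerica.nc")["tmax"]

# Build a centered 15-day window along time, then pool the window samples of
# all 62 years for each calendar day and take the 90th percentile.
# (The window edges are padded with NaN, which quantile skips by default.)
windowed = tmax.rolling(time=15, center=True).construct("window")
p90 = windowed.groupby("time.dayofyear").quantile(0.9, dim=["time", "window"])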
Thanks in advance!
I am a type 1 diabetic and wear a continuous glucose monitor (CGM) that measures my blood glucose level every 5 minutes. The company that makes the CGM generates a report with a graph that looks like the figure at the bottom of this post. My goal is to learn how to recreate this graph for myself in a Jupyter notebook.
The data that I have, for example, looks like this:
Timestamp            Glucose Value (mg/dL)
2021-07-11 00:11:25  116.0
2021-07-11 00:16:25  118.0
2021-07-11 00:21:25  121.0
2021-07-11 00:26:24  123.0
2021-07-11 00:31:25  124.0
The graph uses data from a 30-day period and summarizes the distribution of values at each point in time. Is there a name for this type of graph, and how can I create it myself using pandas/matplotlib/seaborn?
So far, I have tried creating a graph of the IQR split by day, which is rather easy using Plotly:
import plotly.express as px

glucose['Day'] = glucose['Timestamp'].dt.day_name()
fig = px.box(glucose, x="Day", y="Glucose Value (mg/dL)",
             points="all", color='Day')
fig.show()
But now I am unsure how to easily calculate the IQR for specific time periods and average them.
Thank you so much for your help!
Answering my own question with help from the links that Joe provided in the comments:
I was able to group the dataframe by hour, then use .quantile to generate a new dataframe with hours as rows and the 10%, 25%, 50%, 75%, and 90% quantiles as columns. From there it was a matter of simple formatting with matplotlib to reproduce the original graph.
grouped = df.groupby([df['Timestamp'].dt.hour])
i = grouped['bgl'].quantile([.1, .25, .5, .75, .9]).unstack()
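For anyone reading later, the plotting step looked roughly like this (a simplified sketch, not the exact code; `i` has the hours as its index and the quantiles 0.1 to 0.9 as columns):

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.fill_between(i.index, i[0.10], i[0.90], alpha=0.2, label="10-90%")
ax.fill_between(i.index, i[0.25], i[0.75], alpha=0.4, label="25-75%")
ax.plot(i.index, i[0.50], color="black", label="median")
ax.set_xlabel("Hour of day")
ax.set_ylabel("Glucose Value (mg/dL)")
ax.legend()
plt.show()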
Thanks a lot Joe!
I am calculating trendlines for the stock market, and I want to know the angle between 2 lines.
The X-axis is epoch timestamp (in ms) and Y-axis is the price.
The problem is that the epoch timestamps are huge numbers (say 1,591,205,309,000 ms) while the price per share can vary from $0.078 to $10,000, so the scales are not proportional.
I am also a trader, and when I trade I see charts as described in the picture below:
This means the plotting software probably scales the axes to fit in some way (compressing the X axis and stretching the Y axis).
This scaling is also generic (whether I am looking at a 5-minute chart or a 1-day chart): when I draw lines in the same timeframe, I see them in a comfortable way.
If you took those lines and plotted them on a ts/price graph, you would probably see 2 parallel lines.
I also must keep the line equation in terms of the timestamp, because I need to forecast where the trade will be in the future (give it a timestamp and it returns the price at that point).
Right now, when calculating this angle I get around 0.0003 degrees; I want to get the angle between the lines as it appears in the chart above.
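For reference, this is roughly how I compute the angle at the moment (a simplified sketch; the slope values are placeholders standing in for the fitted slopes of the two trendlines, in raw price-per-millisecond units):

import numpy as np

# Placeholder slopes in price units per millisecond of epoch time.
slope1 = 2.0e-9
slope2 = 2.5e-9

angle1 = np.degrees(np.arctan(slope1))
angle2 = np.degrees(np.arctan(slope2))
print(abs(angle1 - angle2))  # tiny value, because of the huge scale mismatch between the axes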
I need to calculate the solar zenith angle for approximately 106,000,000 different coordinates. These coordinates correspond to the pixels of an image projected onto the Earth's surface, after the image was taken by a camera on an airplane.
I am using pvlib.solarposition.get_solarposition() to calculate the solar zenith angle. The returned values are calculated correctly (I compared some results with the NOAA website), but since I need to calculate the solar zenith angle for so many coordinate pairs, Python takes many days (about 5) to finish executing the function.
As I am a beginner in programming, I wonder if there is any way to accelerate the solar zenith angle calculation.
Below is the part of the code that calculates the solar zenith angle:
sol_apar_zen = []
for i in range(size3):
    solar_position = np.array(pvl.solarposition.get_solarposition(Data_time_index, lat_long[i][0], lat_long[i][1]))
    sol_apar_zen.append(solar_position[0][0])  # apparent zenith angle for this coordinate pair

print(len(sol_apar_zen))
Technically, if you need to compute the solar zenith angle quickly for a large array, there are more efficient algorithms than the one PVLIB uses, for example the one described by Roberto Grena in 2012 (https://doi.org/10.1016/j.solener.2012.01.024).
I found a suitable implementation here: https://github.com/david-salac/Fast-SZA-and-SAA-computation (you might need some tweaks, but it's simple to use, and it's also implemented for languages other than Python, like C/C++ and Go).
Example of how to use it:
import pandas as pd
from sza_saa_grena import solar_zenith_and_azimuth_angle
# ...
# A random time series:
time_array = pd.date_range("2020/1/1", periods=87_600, freq="10T", tz="UTC")
sza, saa = solar_zenith_and_azimuth_angle(longitude=-0.12435,  # London longitude
                                          latitude=51.48728,   # London latitude
                                          time_utc=time_array)
The unit tests (in the project's folder) show that within the normal latitude range the error is minimal.
Since your coordinates represent a grid, another option would be to calculate the zenith angle for a subset of your coordinates and then do a 2-D interpolation to obtain the remainder. Using 1 point in 100 in each direction would reduce your calculation time by a factor of 10,000 (see the sketch below).
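A rough sketch of that idea, assuming the pixel coordinates are stored as regular 2-D lat/lon arrays; all names here are placeholders, and compute_sza stands for whatever pvlib call you end up using on the coarse subset:

import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Compute the zenith angle only on every 100th pixel in each direction ...
coarse_lat = lat_grid[::100, 0]    # 1-D latitudes of the coarse subgrid
coarse_lon = lon_grid[0, ::100]    # 1-D longitudes of the coarse subgrid
coarse_sza = compute_sza(coarse_lat, coarse_lon)  # shape (len(coarse_lat), len(coarse_lon))

# ... then interpolate back onto the full pixel grid
# (fill_value=None allows extrapolation at the grid edges).
interp = RegularGridInterpolator((coarse_lat, coarse_lon), coarse_sza,
                                 bounds_error=False, fill_value=None)
full_sza = interp(np.column_stack([lat_grid.ravel(), lon_grid.ravel()])).reshape(lat_grid.shape)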
If you want to speed up this calculation you can use the numba core (if installed):
# `location` is a pvlib.location.Location built from your latitude/longitude,
# and `datetimes` is your DatetimeIndex:
solar_position = location.get_solarposition(
    datetimes,
    method='nrel_numba'
)
Otherwise, you would have to implement your own calculation based on vectorized numpy arrays. I know it is possible, but I am not allowed to share my implementation. You can find the formulation if you search for Spencer 1971 solar position.
I want to compare 2 histograms coming from an evaluation board, which already bins the counted events into a histogram. I am taking data from 2 channels with different numbers of events (in fact, one is background only, the other is background + signal, a pretty usual experimental setting), and with different numbers of bins, different bin widths and different bin center positions.
The datafile looks like this:
HSlice [CH1]
...
44.660 46.255 6
46.255 47.850 10
47.850 49.445 18
49.445 51.040 8
51.040 52.635 28
52.635 54.230 4
54.230 55.825 18
55.825 57.421 183
57.421 59.016 582
59.016 60.611 1786
...
HSlice [CH2]
...
52.022 53.880 0
53.880 55.738 9
55.738 57.596 213
57.596 59.454 728
59.454 61.312 2944
61.312 63.170 9564
...
The first two columns give the boundaries of the respective bin (that is, time) and the last column gives the number of events within this time frame.
Now I want to do a kind of background reduction, that is, subtract the background histogram from the "background + signal" histogram to obtain the time trace of the actual signal. I cannot do this line by line since the histograms are quite different. Is there a simple function in Python, or an elegant way, to make the data comparable (for example by interpolating between two data points in one histogram to match the bin positions of the other histogram) without messing up the time resolution given by the experiment (neither making it worse than it is, nor pretending a better time resolution)?
Thank you,
lepakk
Channel 2 has a bigger bin size than channel 1 (1.858 vs. 1.595), so I would transfer the values from the smaller bins into the bigger bins. That leads to a loss of resolution, but I think that's more honest than transferring from bigger bins into smaller bins and thereby pretending to increase the resolution.
My approach would be to take all the values from the bins in channel 1 and assign them to the point at the center of their time bin. You don't really know exactly where in the bin they were originally measured, so this is the point where you cheat a little bit.
Then fill the values of channel 1 into the bins of channel 2 according to their new time value.
That would be my first approach.
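A minimal sketch of that idea with numpy, assuming you have already parsed the file into arrays ch1_lo, ch1_hi, ch1_counts for channel 1 and ch2_lo, ch2_hi, ch2_counts for channel 2 (these names are just placeholders):

import numpy as np

# Represent each CH1 bin by its center and refill its counts into the CH2 binning.
ch1_centers = 0.5 * (ch1_lo + ch1_hi)
ch2_edges = np.append(ch2_lo, ch2_hi[-1])
ch1_rebinned, _ = np.histogram(ch1_centers, bins=ch2_edges, weights=ch1_counts)

# Both histograms now share the CH2 binning and can be subtracted bin by bin
# (subtract whichever channel is the pure background from the other).
signal = ch2_counts - ch1_rebinned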
I conducted an experiment in triplicate (so each wavelength has 3 corresponding values). I need to get the total area under the curve (one value) for each wavelength. How would I go about doing this in Python using pandas?
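One possible reading of this, assuming a DataFrame with a wavelength column and one column per replicate: average the three replicates at each wavelength, then integrate the mean curve over wavelength with the trapezoidal rule (column names and numbers below are toy values, just for illustration):

import numpy as np
import pandas as pd

# Toy data: three replicate readings per wavelength (placeholder values).
df = pd.DataFrame({
    "wavelength": [400, 410, 420, 430],
    "rep1": [0.10, 0.15, 0.12, 0.08],
    "rep2": [0.11, 0.14, 0.13, 0.09],
    "rep3": [0.09, 0.16, 0.11, 0.07],
})

mean_signal = df[["rep1", "rep2", "rep3"]].mean(axis=1)   # average the triplicates
area = np.trapz(mean_signal, x=df["wavelength"])          # one total AUC value
print(area)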