Group dataframe by time slots in Pandas python

I'm working with a dataset that comes from data sent by underground sensor stations, which provide an estimate of the flow of cars going through them.
My data are grouped by hour for each sensor over the same period of time;
this is what the df looks like:
I thought I would look for trends in the flow across various time slots (like morning, afternoon, evening, night).
My question is:
is there a way to group the data for each station_id into time slots?
For example, group the data of each station from 00:00 to 06:00, from 06:00 to 12:00 and so on, and for every subgroup calculate the mean of the flow value.
Regarding the time, I'm interested in keeping only the day and the month for each time slot.
I've read the datetime documentation and tried some of its methods, but without success.
I'll appreciate anyone who replies and helps me with any tip.

Create the bins and group by them:
df = pd.read_csv('readings_by_hour.csv')
df['time'] = pd.to_datetime(df['time'])        # parse strings to datetimes
df['time_bins'] = df['time'].dt.floor('6h')    # bin starts: 00:00, 06:00, 12:00, 18:00
df.groupby(['station_id', 'time_bins'])['flow'].mean()
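To also keep only the day and the month per slot, as the question asks, the binned timestamps can be formatted after aggregating. A minimal sketch with made-up data, assuming columns named station_id, time, and flow as in the answer above:

```python
import pandas as pd

# Hypothetical data standing in for readings_by_hour.csv
df = pd.DataFrame({
    'station_id': [1, 1, 1, 1],
    'time': pd.to_datetime([
        '2022-03-05 01:00', '2022-03-05 04:00',
        '2022-03-05 07:00', '2022-03-05 10:00',
    ]),
    'flow': [10, 20, 30, 50],
})

# Floor each timestamp to the start of its 6-hour slot
df['time_bins'] = df['time'].dt.floor('6h')

means = df.groupby(['station_id', 'time_bins'])['flow'].mean().reset_index()

# Keep only day/month (plus the slot's starting hour) as a label
means['slot'] = means['time_bins'].dt.strftime('%d-%m %Hh')
print(means)
```

Here the two readings in the 00:00–06:00 slot average to 15.0 and the two in the 06:00–12:00 slot to 40.0; the slot label keeps only day, month, and the slot's starting hour.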

Related

remove/isolate days when there is no change (in pandas)

I have annual hourly energy data for two AC systems in two hotel rooms. I want to figure out when the rooms were occupied or not by isolating/removing the days when the AC was not used for 24 hours.
I did df[df.Meter2334Diff > 0.1] for one room, which gives me all the hours when the AC was turned on; however, it also removes the hours of days when the room was most likely occupied but the AC was turned off. This is where my knowledge stops, so I request the assistance of the oracles of the internet.
my dataframe above
results after df[df.Meter2334Diff > 0.1]
If I've interpreted your question correctly, you want to extract all the days from the dataframe where the Meter2334Diff value was zero?
As your data currently has a frequency of one reading every hour, we can resample it in pandas using the resample() function. resample() takes a freq parameter that tells pandas at what time interval to aggregate the data. There are lots of options (see the docs), but in your case we can set freq='D' to group by day.
Then we can calculate that day's sum for the Meter2334Diff column and filter out the days whose total == 0 (obviously, without knowledge of your dataset, I don't know whether 0 is the correct value).
total_daily_meter_diff = df.resample('D')['Meter2334Diff'].sum()
days_less_than_cutoff = total_daily_meter_diff[total_daily_meter_diff == 0]
We can then use these days to filter in the original dataset:
df.loc[df.index.floor('D').isin(days_less_than_cutoff.index), :]
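Putting the pieces together, here is a self-contained sketch of the resample-then-filter approach on synthetic data (the column name Meter2334Diff is kept from the question; the zero cutoff is an assumption):

```python
import pandas as pd
import numpy as np

# Synthetic example: 3 days of hourly readings; the whole second day has no AC usage
idx = pd.date_range('2021-01-01', periods=72, freq='h')
usage = np.ones(72)
usage[24:48] = 0.0
df = pd.DataFrame({'Meter2334Diff': usage}, index=idx)

# Sum usage per day, then keep the days whose total is zero
daily_total = df.resample('D')['Meter2334Diff'].sum()
zero_days = daily_total[daily_total == 0]

# Filter the hourly frame down to the hours of those unoccupied days
unoccupied_hours = df.loc[df.index.floor('D').isin(zero_days.index)]
print(len(unoccupied_hours))  # 24
```

Note that .isin() is checked against the index of the daily totals (the dates), not the summed values.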

Plotting categorical data over time in Python

I have a data set as follows:
[Time of notification], [Station], [Category]
2019-02-04 19:36:22, Location A, Alert
2019-02-04 20:06:35, Location B, Request
2019-02-05 07:04:53, Location A, Incident
Time of notification is in datetime64[ns] format. The time span is one year.
I am trying to get the following line graphs:
One per station
Time on x axis. Preferably: Accumulated for days of the week and hours (e.g. all Mondays, Tuesdays etc together, so that a daily/weekly trend over the whole year becomes visible).
Number of notifications (for that station) on the y axis. Category is irrelevant.
I have tried a lot, but I am new to time series and visualization, and I am getting nowhere after hours of trying. I have tried plt.subplots, value_counts, et cetera. I also tried making this graph for one station first, but even that didn't work out.
Can anyone help?
Thank you!
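This question has no answer in the thread. One possible approach (a sketch, not from the original discussion) is to count notifications per (weekday, hour) and station with groupby, then draw one line per station:

```python
import pandas as pd

# Hypothetical notifications frame matching the sample rows above
df = pd.DataFrame({
    'time': pd.to_datetime([
        '2019-02-04 19:36:22', '2019-02-04 20:06:35',
        '2019-02-05 07:04:53', '2019-02-11 19:10:00',
    ]),
    'station': ['Location A', 'Location B', 'Location A', 'Location A'],
})

# Accumulate counts by (weekday, hour) for each station; Monday = 0
df['weekday'] = df['time'].dt.dayofweek
df['hour'] = df['time'].dt.hour
counts = (df.groupby(['station', 'weekday', 'hour'])
            .size()
            .unstack('station', fill_value=0))

print(counts)
# One line graph per station: counts.plot() draws each column as its own line,
# with the (weekday, hour) combinations along the x axis.
```

Accumulating over weekday and hour collapses the whole year onto one week, which makes the daily/weekly trend visible as asked.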

Extract future timeseries data and join on past timeseries that are 12 hours apart?

I am in a data science course and my instructor isn't very strong in python.
Use a shift function to pull prices by 12 hours (aligning prices 12 hours in the future with a row's current prices). Then create a new column populated with this info.
So I should have my index, column 1, and newcolumn
I have tried a few different ways: extracting the 12 hours into a list and merging, using .slice, and creating a function.
https://imgur.com/a/AYaM1Ye
This seemed to work:
# Take the full range of the index (avoid shadowing the built-in `slice`)
sliced = currency[currency.index.min():currency.index.max()]
# Move the datetime index forward twelve hours
shifted = sliced.shift(periods=1, freq='12H')
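Note that shift(periods=1, freq='12H') moves the index rather than the values. If the goal is a new column holding each row's price 12 hours in the future, one way (a sketch on made-up hourly data; the column name price_12h_ahead is hypothetical) is a negative row shift:

```python
import pandas as pd
import numpy as np

# Hypothetical hourly price series standing in for `currency`
idx = pd.date_range('2021-01-01', periods=48, freq='h')
currency = pd.DataFrame({'price': np.arange(48.0)}, index=idx)

# shift(-12) pulls each row's value from 12 rows (i.e. 12 hours) later,
# aligning the price 12 hours in the future with the current row
currency['price_12h_ahead'] = currency['price'].shift(-12)

print(currency.head(3))
```

The last 12 rows end up NaN because there is no future price for them; with hourly data, shifting by rows and shifting by 12 hours coincide.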

Calculating and plotting a 20 year Climatology

I am working on plotting a 20 year climatology and have had issues with averaging.
My data is hourly data since December 1999 in CSV format. I used an API to get the data and currently have it in a pandas data frame. I was able to split up hours, days, etc like this:
dfROVC1['Month'] = dfROVC1['time'].apply(lambda cell: int(cell[5:7]))
dfROVC1['Day'] = dfROVC1['time'].apply(lambda cell: int(cell[8:10]))
dfROVC1['Year'] = dfROVC1['time'].apply(lambda cell: int(cell[0:4]))
dfROVC1['Hour'] = dfROVC1['time'].apply(lambda cell: int(cell[11:13]))
So I averaged all the days using:
z = dfROVC1.groupby([dfROVC1.index.day, dfROVC1.index.month]).mean()
That worked, but I realized I should take the average of the daily mins and the average of the daily maxes of all my data. I have been having a hard time figuring this out.
I want my plot to look like this:
Monthly Average Section
but I can't figure out how to make it work.
I am currently using Jupyter Notebook with Python 3.
Any help would be appreciated.
Is there a reason you didn't just use datetime to convert your time column?
The minimums by month would be:
z=dfROVC1.groupby(['Year','Month']).min()
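Following the suggestion to use datetime instead of string slicing, here is a sketch on synthetic hourly data: parse the time column once, take the daily min and max, then average those per month for the climatology (the temp column and its values are made up):

```python
import pandas as pd
import numpy as np

# Hypothetical hourly data standing in for dfROVC1: temp cycles 0..23 each day
idx = pd.date_range('2000-01-01', '2001-12-31 23:00', freq='h')
df = pd.DataFrame({'time': idx.astype(str), 'temp': np.arange(len(idx)) % 24})

# Parse the time column instead of slicing strings by position
df['time'] = pd.to_datetime(df['time'])
df['Year'] = df['time'].dt.year
df['Month'] = df['time'].dt.month
df['Day'] = df['time'].dt.day

# Daily min/max first, then average those per month for a climatology
daily = df.groupby(['Year', 'Month', 'Day'])['temp'].agg(['min', 'max'])
monthly_avg = daily.groupby(['Year', 'Month']).mean()
print(monthly_avg.head())
```

Taking the daily extremes before averaging is what gives "average of the mins" and "average of the maxes" rather than a plain monthly mean.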

Take maximum rainfall value for each season over a time period (xarray)

I'm trying to find the maximum rainfall value for each season (DJF, MAM, JJA, SON) over a 10 year period. I am using netcdf data and xarray to try and do this. The data consists of rainfall (recorded every 3 hours), lat, and lon data. Right now I have the following code:
ds.groupby('time.season').max('time')
However, when I do it this way the output has a shape of (4, 145, 192), indicating that it's taking the maximum value for each season over the entire period. I would like the maximum for each individual season of every year. In other words, the output should have a shape like (40, 145, 192) (4 values for each year x 10 years).
I've looked into doing this with Dataset.resample as well, using time='3M' as the frequency, but then it doesn't split the months up correctly. If I have to, I can alter the dataset so it starts in the correct place, but I was hoping there would be an easier way, considering there's already a function that groups it correctly.
Thanks, and let me know if you need any more details!
Resample is going to be the easiest tool for this job. You are close with the time frequency but you probably want to use the quarterly frequency with an offset:
ds.resample(time='QS-Mar').max('time')
These offsets can be further configured as described in the Pandas documentation: http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases
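Since xarray's resample accepts the same offset aliases as pandas, the anchoring can be illustrated with a plain pandas Series; 'QS-DEC' yields the same December/March/June/September bin edges as 'QS-Mar', so each bin lines up with one meteorological season (DJF, MAM, JJA, SON) of one year:

```python
import pandas as pd
import numpy as np

# Two years of made-up daily rainfall standing in for the 3-hourly netCDF data
idx = pd.date_range('2000-01-01', '2001-12-31', freq='D')
rain = pd.Series(np.arange(len(idx), dtype=float), index=idx)

# Quarterly bins anchored so each quarter is one meteorological season;
# one max per season per year, not one per season over the whole record
seasonal_max = rain.resample('QS-DEC').max()
print(seasonal_max)
```

Two years of data produce one bin per season per year (the first DJF bin is partial, labelled by its quarter start in December 1999), which is exactly the per-year seasonal shape the question asks for.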
