How to get mean value of every month in such dataframe?

dataframe
time A100 A101 A102
2017/1/1 0:00
2017/1/1 1:00
2017/1/1 2:00
...
2017/12/31 23:00
I have a dataframe like the one above, with hourly records (24 per day) for the whole of 2017. How can I get the monthly mean of every column?

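First convert your time column to datetime with to_datetime, then group by a year-month label. Passing numeric_only=True keeps the time column itself out of the average:
df['time'] = pd.to_datetime(df['time'], format='%Y/%m/%d %H:%M')
GMonth = df.groupby(df['time'].dt.strftime('%Y-%m')).mean(numeric_only=True)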
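A runnable sketch of this groupby approach, assuming hypothetical hourly data for 2017 in which A100 holds the month number and A101 the hour of day, so the expected monthly means (the month number and 11.5) are easy to check. Selecting the value columns before .mean() is one way to keep the parsed time column out of the result; grouping by dt.to_period("M") is shown as an alternative key.

```python
import pandas as pd

# Hypothetical hourly data for 2017 with string timestamps, as in the question.
times = pd.date_range("2017-01-01 00:00", "2017-12-31 23:00", freq="60min")
df = pd.DataFrame({
    "time": times.strftime("%Y/%m/%d %H:%M"),
    "A100": times.month,  # monthly mean should equal the month number
    "A101": times.hour,   # monthly mean should be 11.5 (hours 0..23)
    "A102": 1.0,
})

# Parse the strings into datetime64 values.
df["time"] = pd.to_datetime(df["time"], format="%Y/%m/%d %H:%M")

value_cols = ["A100", "A101", "A102"]

# Group by a 'YYYY-MM' string label and average only the value columns.
GMonth = df.groupby(df["time"].dt.strftime("%Y-%m"))[value_cols].mean()

# Alternative key: monthly Periods instead of strings.
by_period = df.groupby(df["time"].dt.to_period("M"))[value_cols].mean()

print(GMonth.head(3))
```

The 'YYYY-MM' strings happen to sort chronologically, so both keys give the same row order here.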

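First make sure the time column has been parsed as datetime; check it with df.dtypes.
Then all you need is the line below. Pass the datetime column with on= (or set it as the index first), and call .mean() directly, because resample no longer accepts how='mean'. "MS" labels each month by its first day:
df.resample("MS", on="time").mean()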
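A minimal sketch of the resample route, assuming hypothetical hourly data where the time column is already datetime64. It shows the dtype check, the on= form and the equivalent set_index form. The note on "M" versus "ME" reflects the alias rename in pandas 2.2.

```python
import pandas as pd

# Hypothetical hourly data for 2017; 'time' is already parsed here.
times = pd.date_range("2017-01-01", periods=365 * 24, freq="60min")
df = pd.DataFrame({"time": times, "A100": times.month, "A101": 2.0})

# The dtype of 'time' should be datetime64 before resampling.
print(df.dtypes)

# resample needs a DatetimeIndex, or the datetime column passed via on=.
# "MS" labels each monthly bin with its first day; "M" (spelled "ME" from
# pandas 2.2 on) labels it with the last day instead.
monthly = df.resample("MS", on="time").mean()

# Same result with the datetime column moved into the index.
monthly_idx = df.set_index("time").resample("MS").mean()
```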

Related

Sorting dataframe rows by date

I have built my dataframe, but I want to sort it by date. For example, I want the data for 02.01.2016 to come right after 01.01.2016.
df_data_2311 = df_data_231.groupby('Date').agg({'Wind Offshore in [MW]': ['sum']})
df_data_2311 = pd.DataFrame(df_data_2311)
After running this, I get the output below. The dataframe has 2192 rows.
            Wind Offshore in [MW]
                              sum
Date
01.01.2016                5249.75
01.01.2017               12941.75
01.01.2018               19020.00
01.01.2019               13723.00
01.01.2020               17246.25
...                           ...
31.12.2017               21322.50
31.12.2018               13951.75
31.12.2019               21457.25
31.12.2020               16491.25
31.12.2021               35683.25
How can I sort this data chronologically by date?

You can use the sort_values function in pandas.
df_data_2311.sort_values(by=["Date"])
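However, while Date holds strings, this only orders them as text (day first), which is what you are seeing. Call reset_index() on the grouped dataframe to turn Date back into a column, convert it with pd.to_datetime, and then sort. Passing 'sum' as a plain string (rather than ['sum']) keeps the column header single-level; with the two-level header, sort_values would need the tuple ('Date', '') instead of "Date":
df_data_2311 = df_data_231.groupby('Date').agg({'Wind Offshore in [MW]': 'sum'}).reset_index()
df_data_2311["Date"] = pd.to_datetime(df_data_2311["Date"], format="%d.%m.%Y")
df_data_2311 = df_data_2311.sort_values(by=["Date"])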
I recommend reviewing the pandas documentation for sort_values and to_datetime.
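An alternative sketch, assuming a few hypothetical raw rows with 'dd.mm.yyyy' strings: if Date is parsed before grouping, the separate sort step is unnecessary, because groupby sorts its keys by default (sort=True) and datetime keys sort chronologically. Selecting the column and calling .sum() also avoids the two-level header.

```python
import pandas as pd

# Hypothetical raw rows with 'dd.mm.yyyy' strings, like the question's Date column.
df_data_231 = pd.DataFrame({
    "Date": ["01.01.2017", "02.01.2016", "01.01.2016", "31.12.2016", "01.01.2016"],
    "Wind Offshore in [MW]": [10.0, 5.0, 2.5, 4.0, 1.5],
})

# Parse first: as text, "01.01.2017" would sort before "02.01.2016".
df_data_231["Date"] = pd.to_datetime(df_data_231["Date"], format="%d.%m.%Y")

# groupby sorts its keys by default (sort=True), so datetime keys come out in
# chronological order without a separate sort step.
daily = df_data_231.groupby("Date")["Wind Offshore in [MW]"].sum()
print(daily)
```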

How to set the timestamp of the data frame to the first day of the month

I have a few data frames that I am resampling to match each other. I would like the timestamps (index) of all of them to be the first day of the month in which the measurements were taken. I cannot find how to do this anywhere; the closest I got was resample(..., kind='period'), but that leaves me without the day.
The code I tried
df['value'].resample('M',kind = 'period').sum()
This returns a monthly PeriodIndex (shown as e.g. 2018-09), whereas I would like the timestamps to have the form 2018-09-01.

This line is all you need:
df.index = pd.to_datetime(df.index).strftime('%Y-%m-%d')
# Output
# value
# 2018-09-01 11
# 2018-10-01 12
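pd.to_datetime parses each year-month label as the first day of that month, and strftime then writes it back out as a YYYY-MM-DD string. Note that the resulting index holds strings, not datetimes; drop the .strftime(...) call if you want to keep a DatetimeIndex. For a PeriodIndex, such as the one returned with kind='period', df.index.to_timestamp() gives the first day of each period directly. For more details, see the to_datetime docs.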
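A sketch of two ways to get month-start timestamps while keeping a real DatetimeIndex, assuming hypothetical daily measurements (28 days in September and 20 in October 2018, each with value 1). The "MS" alias and PeriodIndex.to_timestamp() are standard pandas; the data are made up.

```python
import pandas as pd

# Hypothetical daily measurements: 28 days in September, 20 in October 2018.
idx = pd.date_range("2018-09-03", "2018-10-20", freq="D")
df = pd.DataFrame({"value": 1}, index=idx)

# Option 1: month-start frequency "MS" labels each bin with the 1st of the month
# and keeps a DatetimeIndex.
monthly = df["value"].resample("MS").sum()

# Option 2: group by monthly periods, then convert the PeriodIndex to
# timestamps; to_timestamp() uses the start of each period by default.
monthly_alt = df["value"].groupby(df.index.to_period("M")).sum().to_timestamp()
print(monthly)
```

Option 1 is the shorter route when resampling anyway; option 2 fits code that already works with periods.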

Get sum of business days in dataframe python with resample

I have a time series for which I want the sum of the business-day values for each week. A snapshot of the dataframe (df) is shown below. Note that 2017-06-01 is a Friday and hence the missing days represent the weekend.
I use resample to group the data by week, aiming to get the weekly sum. When I apply it, however, I get results that I can't explain. I expected the first row to be 0, the sum of the values in the first week, then 15 for the next week, and so on.
df_resampled = df.resample('W', label='left').sum()
df_resampled.head()
Can someone explain what I am missing? It seems I have not understood the resampling function correctly.
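One likely cause, sketched with hypothetical values (a Monday-to-Friday series from 2017-06-05 to 2017-06-16 holding 0..9): 'W' is an alias for 'W-SUN', so each bin is a week ending on Sunday, closed on the right and labelled by default with that Sunday. label='left' keeps the same bins and sums but labels each bin with the preceding Sunday, which can look as though each sum belongs to the previous week. Anchoring on Friday ('W-FRI') labels each business week by its last weekday.

```python
import pandas as pd

# Hypothetical business-day series: Mon 2017-06-05 .. Fri 2017-06-16, values 0..9.
s = pd.Series(range(10), index=pd.bdate_range("2017-06-05", "2017-06-16"))

# 'W' is an alias for 'W-SUN': each bin is a week that ends on Sunday,
# closed on the right, and by default labelled with that Sunday.
right_labels = s.resample("W").sum()

# label='left' keeps the same bins and sums but labels each one with the
# previous Sunday (the left edge of the bin).
left_labels = s.resample("W", label="left").sum()

# Anchoring on Friday labels each business week by its last weekday.
fri = s.resample("W-FRI").sum()

print(right_labels, left_labels, fri, sep="\n\n")
```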

Python: Date conversion to year-weeknumber, issue at switch of year

I am trying to convert a dataframe column of dates with times to a year-weeknumber format, e.g. 01-05-2017 03:44 = 2017-1. This is easy in general, but I am stuck on dates that fall in a new year while their week number still belongs to the last week of the previous year.
I did the following:
df['WEEK_NUMBER'] = df.date.dt.year.astype(str).str.cat(df.date.dt.week.astype(str), sep='-')
Where df['date'] is a very large column with date and times, ranging over multiple years.
A date which gives a problem is for example:
Timestamp('2017-01-01 02:11:27')
The output for my code will be 2017-52, while it should be 2016-52. Since the data covers multiple years, and weeknumbers and their corresponding dates change every year, I cannot simply subtract a few days.
Does anybody have an idea of how to fix this? Thanks!

Replace df.date.dt.year with this (Series.dt.week was removed in pandas 2.0; dt.isocalendar().week replaces it):
(df.date.dt.year - ((df.date.dt.isocalendar().week > 50) & (df.date.dt.month == 1)).astype(int))
This subtracts 1 from the year when the week number is greater than 50 and the month is January.
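A sketch of a more general fix, assuming hypothetical timestamps around two year boundaries: dt.isocalendar() returns the ISO year alongside the ISO week, and the ISO year already handles both directions. That includes late-December dates that belong to week 1 of the next year (e.g. 2018-12-31), which the January-only correction above does not cover.

```python
import pandas as pd

# Hypothetical dates around two year boundaries.
df = pd.DataFrame({"date": pd.to_datetime([
    "2017-01-01 02:11:27",  # Sunday: ISO week 52 of 2016
    "2017-01-05 03:44:00",  # ISO week 1 of 2017
    "2018-12-31 10:00:00",  # Monday: ISO week 1 of 2019
])})

# isocalendar() returns ISO year, week and weekday as columns; the ISO year
# accounts for weeks that straddle New Year in either direction.
iso = df["date"].dt.isocalendar()
df["WEEK_NUMBER"] = iso["year"].astype(str) + "-" + iso["week"].astype(str)
print(df)
```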

Date ranges in Pandas

After fighting with NumPy and dateutil for days, I recently discovered the amazing Pandas library. I've been poring through the documentation and source code, but I can't figure out how to get date_range() to generate indices at the right breakpoints.
from datetime import date
import pandas as pd
start = date(2012, 1, 15)
end = date(2012, 9, 20)
# 'M' is month-end, instead I need same-day-of-month
pd.date_range(start, end, freq='M')
What I want:
2012-01-15
2012-02-15
2012-03-15
...
2012-09-15
What I get:
2012-01-31
2012-02-29
2012-03-31
...
2012-08-31
I need month-sized chunks that account for the variable number of days in a month. This is possible with dateutil.rrule:
rrule(freq=MONTHLY, dtstart=start, bymonthday=(start.day, -1), bysetpos=1)
Ugly and illegible, but it works. How can I do this with pandas? I've played with both date_range() and period_range(), so far with no luck.
My actual goal is to use groupby, crosstab and/or resample to calculate values for each period based on sums/means/etc of individual entries within the period. In other words, I want to transform data from:
total
2012-01-10 00:01 50
2012-01-15 01:01 55
2012-03-11 00:01 60
2012-04-28 00:01 80
#Hypothetical usage
dataframe.resample('total', how='sum', freq='M', start='2012-01-09', end='2012-04-15')
to
total
2012-01-09 105 # Values summed
2012-02-09 0 # Missing from dataframe
2012-03-09 60
2012-04-09 0 # Data past end date, not counted
Given that Pandas originated as a financial analysis tool, I'm virtually certain that there's a simple and fast way to do this. Help appreciated!

freq='M' is the month-end frequency (spelled 'ME' from pandas 2.2; see the offset aliases in the pandas documentation). But you can use .shift to move the dates by any number of days (or by any other frequency):
pd.date_range(start, end, freq='M').shift(15, freq='D')
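A sketch of what the shift produces for the question's start and end dates. It uses pd.offsets.MonthEnd() in place of the 'M' string so it runs across pandas versions. Every month end plus 15 days lands on the 15th of the following month, so the series starts at 2012-02-15 rather than 2012-01-15.

```python
from datetime import date

import pandas as pd

start, end = date(2012, 1, 15), date(2012, 9, 20)

# Month-end dates inside [start, end]: 2012-01-31 ... 2012-08-31.
month_ends = pd.date_range(start, end, freq=pd.offsets.MonthEnd())

# Shifting every month end by 15 days lands on the 15th of the following month.
fifteenths = month_ends.shift(15, freq="D")
print(fifteenths)
```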

There actually is no "day of month" frequency (e.g. "DOMXX" like "DOM09"), but I don't see any reason not to add one.
http://github.com/pydata/pandas/issues/2289
I don't have a simple workaround for you at the moment because resample requires passing a known frequency rule. I think it should be augmented to be able to take any date range to be used as arbitrary bin edges, also. Just a matter of time and hacking...

Try:
pd.date_range(start, end, freq=pd.DateOffset(months=1))
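A sketch combining this with the question's actual goal, under stated assumptions. The period edges fall on the 9th of each month, and the start and end dates (2012-01-09, 2012-04-15) come from the question's hypothetical resample call. Because resample only takes regular frequency rules, this swaps in pd.cut with arbitrary datetime bin edges and then a groupby. Empty periods are kept, with a sum of 0.

```python
from datetime import date

import pandas as pd

# Non-anchored monthly offset: steps from the start date itself.
fifteenths = pd.date_range(date(2012, 1, 15), date(2012, 9, 20),
                           freq=pd.DateOffset(months=1))

# Sample data from the question.
df = pd.DataFrame(
    {"total": [50, 55, 60, 80]},
    index=pd.to_datetime(["2012-01-10 00:01", "2012-01-15 01:01",
                          "2012-03-11 00:01", "2012-04-28 00:01"]),
)
start, end = pd.Timestamp("2012-01-09"), pd.Timestamp("2012-04-15")

# Period edges on the 9th of each month, plus one edge past `end`
# so the last period has a right boundary: 01-09, 02-09, ..., 05-09.
edges = pd.date_range(start, end + pd.DateOffset(months=1),
                      freq=pd.DateOffset(months=1))

# Drop rows outside [start, end), mirroring the question's start/end.
in_range = df[(df.index >= start) & (df.index < end)]

# Assign each row to the period [edge_i, edge_i+1) and label it by its left edge.
bin_label = pd.cut(in_range.index, bins=edges, right=False, labels=edges[:-1])

# observed=False keeps empty periods, whose sum is 0.
totals = in_range.groupby(bin_label, observed=False)["total"].sum()
print(totals)
```

One caveat: date_range applies the offset repeatedly, so a start day of 29 to 31 drifts once it passes a shorter month (2012-01-31, 2012-02-29, 2012-03-29, ...). The dateutil rrule in the question handles that case.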
