Pandas sort_index only the given timeframe - python

I have a pandas Series object that consists of a DatetimeIndex and some values; it looks like the following:
df
2020-01-01 00:00:00 39.6
2020-01-01 00:15:00 35.6
2020-01-01 00:30:00 35.6
2020-01-01 00:45:00 39.2
2020-01-01 01:00:00 56.7
...
2020-12-31 23:45:00 56.3
I am adding some values to this df with .append(). Since the result is no longer sorted, I sort its index via .sort_index(). However, what I would like to achieve is to sort only a given day.
So, for example, I add some values to the day 2020-01-01; since the appended values land after the existing rows for 2020-01-01, I only need to sort the first day of the year, NOT ALL THE DF.
Here is an example; the NaN value was added with .append():
df
2020-01-01 00:00:00 39.6
2020-01-01 00:15:00 35.6
...
2020-01-01 23:45:00 34.3
2020-01-01 15:00:00 NaN
...
2020-12-31 23:45:00 56.3
Now I cannot simply call df.sort_index(), because it breaks other days. That is why I want to apply .sort_index() only to the day 2020-01-01. How do I do that?
WHAT I TRIED SO FAR AND DOES NOT WORK:
df.loc['2020-01-01'] = df.loc['2020-01-01'].sort_index()

Filter the rows for the day 2020-01-01, sort them, and join them back with the non-matching rows:
mask = df.index.normalize() == '2020-01-01'
df = pd.concat([df[mask].sort_index(), df[~mask]])
print (df)
2020-01-01 00:00:00 39.6
2020-01-01 00:15:00 35.6
2020-01-01 15:00:00 NaN
2020-01-01 23:45:00 34.3
2020-12-31 23:45:00 56.3
Name: a, dtype: float64
Another idea:
df1 = df.loc['2020-01-01'].sort_index()
df = pd.concat([df1, df.drop(df1.index)])
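Putting the first approach together end to end, here is a minimal runnable sketch. The three-row toy series and its values are invented for illustration, and pd.concat stands in for Series.append, which was removed in pandas 2.0:

```python
import pandas as pd

# Toy series: two rows for 2020-01-01 plus one row for a later day.
s = pd.Series(
    [39.6, 34.3, 56.3],
    index=pd.to_datetime(
        ["2020-01-01 00:00", "2020-01-01 23:45", "2020-12-31 23:45"]
    ),
)

# Appending lands the new row at the very end, out of order.
new_row = pd.Series([float("nan")], index=pd.to_datetime(["2020-01-01 15:00"]))
s = pd.concat([s, new_row])

# Sort only the rows belonging to 2020-01-01 and stitch the rest back on.
mask = s.index.normalize() == "2020-01-01"
s = pd.concat([s[mask].sort_index(), s[~mask]])
print(s)
```

Only the masked day is reordered; every row outside the mask keeps its original relative position.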

Related

Combining dataframes with differing dates column

I have a dataset of hourly prices where I have produced a dataframe that contains the minimum price from the previous day using:
df_min = df_hour_0[['Price_REG1', 'Price_REG2', 'Price_REG3',
'Price_REG4']].between_time('00:00', '23:00').resample('d').min()
This gives me:
Price_REG1 Price_REG2 Price_REG3 Price_REG4
date
2020-01-01 00:00:00 25.07 25.07 25.07 25.07
2020-01-02 00:00:00 12.07 12.07 12.07 12.07
2020-01-03 00:00:00 0.14 0.14 0.14 0.14
2020-01-04 00:00:00 3.83 3.83 3.83 3.83
2020-01-05 00:00:00 25.77 25.77 25.77 25.77
Now, I want to combine this df with 24 other df's, one for each hour (hour_0 below):
Price_REG1 Price_REG2 ... Price_24_3 Price_24_4
date ...
2020-01-01 00:00:00 30.83 30.83 ... NaN NaN
2020-01-02 00:00:00 24.81 24.81 ... 25.88 25.88
2020-01-03 00:00:00 24.39 24.39 ... 27.69 27.69
2020-01-04 00:00:00 22.04 22.04 ... 25.70 25.70
2020-01-05 00:00:00 25.77 25.77 ... 27.37 27.37
Which I do this way:
df_hour_0 = pd.concat([df_hour_0, df_min, df_max], axis=1)
This works fine for the df from the first hour, since the dates match. But for the other dfs the dates are "2020-01-01 01:00:00", "2020-01-01 02:00:00", etc.
Since the dates don't match, the pd.concat gives me two times as many observations where every other observation is null:
Price_REG1 Price_REG2 ... Price_3_min Price_4_min
date ...
2020-01-01 00:00:00 NaN NaN ... NaN NaN
2020-01-01 01:00:00 28.78 28.78 ... NaN NaN
2020-01-02 00:00:00 NaN NaN ... 30.83 30.83
2020-01-02 01:00:00 12.07 12.07 ... NaN NaN
2020-01-03 00:00:00 NaN NaN ... 31.20 31.20
I tried to fix this by:
df_max = df_max.reset_index()
df_max = df_max.drop(['date'], axis=1)
But this only gives me the same issue in a different form: instead of every other row being null, the whole df_min is simply inserted at the bottom of the first df.
I want to keep the date, otherwise I guess it could be possible to reset the index in both df's and combine them by index instead of date.
Thank you.
One option could be to normalize to the date:
dfs = [df_hour_0, df_min, df_max]
pd.concat([d.set_axis(d.index.normalize()) for d in dfs], axis=1)
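To see why the normalization fixes the misalignment, here is a minimal sketch with hypothetical two-row frames (the column names and values are made up; only the index pattern matters):

```python
import pandas as pd

# An hourly frame stamped at 01:00 and a daily-min frame stamped at
# midnight, covering the same two days.
idx_hour = pd.to_datetime(["2020-01-01 01:00", "2020-01-02 01:00"])
idx_day = pd.to_datetime(["2020-01-01", "2020-01-02"])
df_hour_1 = pd.DataFrame({"Price_REG1": [28.78, 12.07]}, index=idx_hour)
df_min = pd.DataFrame({"Price_REG1_min": [25.07, 12.07]}, index=idx_day)

# Normalizing both indexes to midnight makes the rows line up in concat.
dfs = [df_hour_1, df_min]
out = pd.concat([d.set_axis(d.index.normalize()) for d in dfs], axis=1)
print(out)
```

Without the normalize() step, concat would align on the raw timestamps and produce the interleaved NaN rows shown in the question.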

Is there a Pandas function that can group the hourly data of each day like 2021-01-01 01:00:00 to 2021-01-02 00:00:00 as one group and so on

I've got this dataset that contains data observed at each hour of the day. Since the observation is made after every hour, the data starts at 01:00:00 and ends at 00:00:00 of the next day.
Is there a way to group this data into single days starting at hour 01 and ending at hour 00?
2021-01-01 01:00:00 22.5
2021-01-01 02:00:00 25.3
.
.
.
2021-01-01 23:00:00 30.2
2021-01-02 00:00:00 28.6
2021-01-02 01:00:00 29.2
2021-01-02 02:00:00 30.2
.
.
.
2021-01-02 23:00:00 28.2
2021-01-03 00:00:00 28.0
I've tried pd.Grouper, but it groups hours from 00 to 23.
df_Paris['DateTime'] = pd.to_datetime(df_Paris['DateTime'], format='%Y-%m-%d')
davg_df = df_Paris.groupby(pd.Grouper(freq='D', key='DateTime')).mean()
But I need to group data like day1 01:00:00 - day2 00:00:00.
Is there a way to do this ?
Thanks
You could create a helper column, which is 'DateTime' minus one hour, and use that for grouping.
EX:
import pandas as pd
df = pd.DataFrame({'DateTime': ["2020-01-01 01:00", "2020-01-02 00:00",
"2020-01-02 01:00", "2020-01-03 00:00"],
'value': [1, 1, 3, 3]})
df['DateTime'] = pd.to_datetime(df['DateTime'])
df['helper'] = df['DateTime'] - pd.Timedelta(hours=1)
davg_df = df.groupby(pd.Grouper(freq='D', key='helper'))[['value']].mean()
# davg_df
# value
# helper
# 2020-01-01 1
# 2020-01-02 3

how to add values to specific date in pandas?

So I have a dataset where every data point comes with a specific date. I want to fill these values, according to their dates, into an Excel sheet that contains the date range of the whole year. The dates start at 01-01-2020 00:00:00 and end at 31-12-2020 23:45:00 with a frequency of 15 minutes, so there are 35136 date-time values in Excel in total (2020 is a leap year: 366 days of 96 intervals each).
my data is like:
load date
12 01-02-2020 06:30:00
21 29-04-2020 03:45:00
23 02-07-2020 12:15:00
54 07-08-2020 16:00:00
23 22-09-2020 16:30:00
As you can see, these values are not continuous, but they have specific dates attached, so I want to use these dates as the index, place each value at its particular date in the Excel date column, and put zero in the missing values. Can someone please help?
Use DataFrame.reindex with date_range, so 0 is added for all datetimes that do not exist:
rng = pd.date_range('2020-01-01','2020-12-31 23:45:00', freq='15Min')
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date').reindex(rng, fill_value=0)
print (df)
load
2020-01-01 00:00:00 0
2020-01-01 00:15:00 0
2020-01-01 00:30:00 0
2020-01-01 00:45:00 0
2020-01-01 01:00:00 0
...
2020-12-31 22:45:00 0
2020-12-31 23:00:00 0
2020-12-31 23:15:00 0
2020-12-31 23:30:00 0
2020-12-31 23:45:00 0
[35136 rows x 1 columns]
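A fully self-contained version of the same idea, with a toy two-row frame (values taken from the question's sample, day-first date strings as shown there):

```python
import pandas as pd

# Toy frame: two sparse readings with day-first date strings.
df = pd.DataFrame({"load": [12, 21],
                   "date": ["01-02-2020 06:30:00", "29-04-2020 03:45:00"]})

# Full-year 15-minute grid, then reindex and fill the gaps with 0.
rng = pd.date_range("2020-01-01", "2020-12-31 23:45:00", freq="15min")
df["date"] = pd.to_datetime(df["date"], dayfirst=True)
df = df.set_index("date").reindex(rng, fill_value=0)
print(df.shape)
```

Note the dayfirst=True: the sample dates are day-month-year, and without it pandas may parse "01-02-2020" as January 2 instead of February 1.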

How to extract hourly data from a df in python?

I have the following df
dates Final
2020-01-01 00:15:00 94.7
2020-01-01 00:30:00 94.1
2020-01-01 00:45:00 94.1
2020-01-01 01:00:00 95.0
2020-01-01 01:15:00 96.6
2020-01-01 01:30:00 98.4
2020-01-01 01:45:00 99.8
2020-01-01 02:00:00 99.8
2020-01-01 02:15:00 98.0
2020-01-01 02:30:00 95.1
2020-01-01 02:45:00 91.9
2020-01-01 03:00:00 89.5
The entire dataset runs up to 2021-01-01 00:00:00 (value 95.6) with a gap of 15 mins between rows.
Since the freq is 15 mins, I would like to change it to 1 hour and drop the intermediate values.
Expected output
dates Final
2020-01-01 01:00:00 95.0
2020-01-01 02:00:00 99.8
2020-01-01 03:00:00 89.5
With the last row being 2021-01-01 00:00:00 95.6
How can this be done?
Thanks
Use Series.dt.minute to perform boolean indexing:
df_filtered = df.loc[df['dates'].dt.minute.eq(0)]
#if necessary
#df_filtered = df.loc[pd.to_datetime(df['dates']).dt.minute.eq(0)]
print(df_filtered)
dates Final
3 2020-01-01 01:00:00 95.0
7 2020-01-01 02:00:00 99.8
11 2020-01-01 03:00:00 89.5
If you're doing data analysis or data science, I don't think dropping the intermediate values is a good approach at all! Depending on your use case, aggregating them (for example, summing or averaging per hour) usually preserves more information (I don't know your use case, but I know some stuff about time-series data).
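A sketch of that aggregation alternative, assuming 'dates' is already a datetime column (the eight toy values are copied from the question's sample): resample to hourly means instead of discarding rows.

```python
import pandas as pd

# Toy 15-minute series mirroring the question's first eight rows.
df = pd.DataFrame({
    "dates": pd.date_range("2020-01-01 00:15", periods=8, freq="15min"),
    "Final": [94.7, 94.1, 94.1, 95.0, 96.6, 98.4, 99.8, 99.8],
})

# Hourly mean instead of dropping the intermediate 15-min readings.
hourly = df.set_index("dates")["Final"].resample("1h").mean()
print(hourly)
```

Swap .mean() for .sum(), .max(), etc., depending on what the quantity represents.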

read excel and convert index to datetimeindex pandas

I read an excel in pandas like this
df = pd.read_excel("Test.xlsx", index_col=[0])
The dataframe look like this with the index containing a date and time and one column:
01.01.2015 00:15:00 47.2
01.01.2015 00:30:00 46.6
01.01.2015 00:45:00 19.4
01.01.2015 01:00:00 14.8
01.01.2015 01:15:00 14.8
01.01.2015 01:30:00 16.4
01.01.2015 01:45:00 16.2
...
I want to convert the index to a DatetimeIndex. I tried
df.index = pd.to_datetime(df.index)
and got: "ValueError: Unknown string format"
What is the best way to convert the index to a datetime format containing both date and time, so that datetime-based functions can be used?
I think you need to add the format parameter - see http://strftime.org/:
df.index = pd.to_datetime(df.index, format='%d.%m.%Y %H:%M:%S')
print (df)
a
2015-01-01 00:15:00 47.2
2015-01-01 00:30:00 46.6
2015-01-01 00:45:00 19.4
2015-01-01 01:00:00 14.8
2015-01-01 01:15:00 14.8
2015-01-01 01:30:00 16.4
2015-01-01 01:45:00 16.2
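As an alternative to an explicit format string, pd.to_datetime also accepts dayfirst=True for day-first strings like these. A minimal sketch with two made-up index strings (including one that is ambiguous without the flag):

```python
import pandas as pd

# Index strings as they might come out of the sheet (day first).
idx = pd.Index(["02.01.2015 00:15:00", "15.01.2015 00:30:00"])

# Either an explicit format or dayfirst=True parses them correctly.
di = pd.to_datetime(idx, format="%d.%m.%Y %H:%M:%S")
same = pd.to_datetime(idx, dayfirst=True)
print(di.equals(same))
```

The explicit format is faster and fails loudly on malformed rows, so it is usually the safer choice for large files.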
