Pandas rearrange hour-date Excel table into a datetime dataframe - python
I have an Excel data set that looks like this:
24 25 26 27
1 0,3818 0,0713 0,07222 0,3542
2 0,17802 0,04508 0,06877 0,17319
3 0,22356 0,07314 0,04991 0,22448
4 0,1771 0,07038 0,07406 0,19136
5 0,19389 0,06164 0,05497 0,18538
6 0,20401 0,07475 0,06417 0,21413
7 0,18354 0,07245 0,07337 0,17756
8 0,46184 0,04669 0,0506 0,28819
9 0,43838 0,0667 0,06785 0,4692
10 0,78292 0,07038 0,07291 0,66424
11 1,81792 0,06003 0,04508 1,17001
12 2,40833 0,05451 0,07245 1,08422
13 1,55746 0,07038 0,07314 0,61272
14 1,2075 0,06509 0,04485 0,40871
15 2,4196 0,05014 0,07291 0,27393
16 0,95979 0,07015 0,07291 0,2323
17 0,51681 0,06992 0,04554 0,2024
18 0,46529 0,04232 0,85192 0,35558
19 0,58328 0,06992 1,59321 0,60283
20 1,40185 0,07015 0,82869 1,23326
21 0,71484 0,04692 1,05041 1,01131
22 0,48576 0,07291 0,80707 1,4697
23 0,04278 0,07245 0,57523 1,72316
24 0,07291 0,04554 0,5175 0,61364
The first column holds the hour of the day and the first row holds the day of the year (24 corresponds to 24 January; the columns continue through the year, ending at day 365) for the year 2013.
What I want to obtain is a dataframe whose first column is the date (year-month-day-hour), with each hourly value correctly associated with it:
'date' 'value'
2013-01-24 01:00 0.3818
2013-01-24 02:00 0.17802
2013-01-24 03:00 0.22356
...
Thank you for your help.
This is the best I've got:
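If your data is in a pandas.DataFrame called df, you can do:
import pandas as pd

# Stack to long format: one row per (day, hour) pair
df2 = df.unstack()
start = pd.Timestamp('01/01/2013')
df2 = df2.reset_index()  # columns: level_0 (day), level_1 (hour), 0 (value)
# Day-of-year N -> 1 January plus (N - 1) days
df2['date'] = [start + pd.DateOffset(days=int(x) - 1) for x in df2.level_0.values]
# Add the hour of the day
df2['date'] += pd.to_timedelta(df2.level_1, unit='h')
df2.index = df2.date
df2 = df2[0]  # keep only the value column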
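Result (the timestamps start at 00:00 rather than 01:00 as requested, which is what this code gives when the hour labels run 0-23; with the question's 1-24 labels, hour 1 maps to 01:00 and hour 24 rolls over to 00:00 of the next day):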
date
2013-01-24 00:00:00 0,3818
2013-01-24 01:00:00 0,17802
2013-01-24 02:00:00 0,22356
2013-01-24 03:00:00 0,1771
2013-01-24 04:00:00 0,19389
2013-01-24 05:00:00 0,20401
2013-01-24 06:00:00 0,18354
2013-01-24 07:00:00 0,46184
2013-01-24 08:00:00 0,43838
2013-01-24 09:00:00 0,78292
2013-01-24 10:00:00 1,81792
2013-01-24 11:00:00 2,40833
2013-01-24 12:00:00 1,55746
2013-01-24 13:00:00 1,2075
2013-01-24 14:00:00 2,4196
2013-01-24 15:00:00 0,95979
2013-01-24 16:00:00 0,51681
2013-01-24 17:00:00 0,46529
2013-01-24 18:00:00 0,58328
2013-01-24 19:00:00 1,40185
2013-01-24 20:00:00 0,71484
2013-01-24 21:00:00 0,48576
2013-01-24 22:00:00 0,04278
2013-01-24 23:00:00 0,07291
2013-01-25 00:00:00 0,0713
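The same reshape can be done without the Python-level list comprehension by turning both index levels into timedeltas at once. A minimal, self-contained sketch on a three-hour excerpt of the question's data, assuming the hours (1-24) end up as the index and the day numbers as the column labels (for example via pd.read_excel(path, index_col=0), where path is a placeholder). If the values arrive as text with decimal commas (0,3818), they would need converting to floats first, e.g. by replacing ',' with '.' before astype(float).

```python
import pandas as pd

# Excerpt of the question's table: hours (1-24) as the index,
# day-of-year numbers for 2013 as the column labels
df = pd.DataFrame(
    {24: [0.3818, 0.17802, 0.22356], 25: [0.0713, 0.04508, 0.07314]},
    index=[1, 2, 3],
)

long = df.unstack()  # Series indexed by (day, hour), ordered day by day
day = long.index.get_level_values(0).astype(int)
hour = long.index.get_level_values(1).astype(int)

# Day N of 2013 is 1 January plus (N - 1) days; then add the hour
dates = (pd.Timestamp('2013-01-01')
         + pd.to_timedelta(day - 1, unit='D')
         + pd.to_timedelta(hour, unit='h'))

result = pd.DataFrame({'date': dates, 'value': long.to_numpy()})
print(result)
```

Because the hours are 1-based here, the first value lands on 2013-01-24 01:00 as the question asks, and hour 24 would roll over to 00:00 of the following day.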
Related
Generate time series dataframe with min and max time with the given interval pandas
How can I generate a time series dataset between a minimum and maximum date with a specific interval in pandas?

min_date = 18 Oct 2022
max_date = 20 Oct 2022
interval = 1 hour

Min_date             Max_date
18/10/2022 00:00:00  18/10/2022 01:00:00
18/10/2022 01:00:00  18/10/2022 02:00:00
18/10/2022 02:00:00  18/10/2022 03:00:00
18/10/2022 03:00:00  18/10/2022 04:00:00
...
19/10/2022 22:00:00  19/10/2022 23:00:00
19/10/2022 23:00:00  19/10/2022 23:59:00

Thanks in advance.
import pandas as pd

min_date = pd.Timestamp('oct 18, 2022')
max_date = pd.Timestamp('oct 20, 2022')
interval = pd.offsets.Hour(+1)
df = pd.DataFrame(pd.date_range(min_date, max_date - interval, freq = interval), columns = ['Min_date'])
df['Max_date'] = df['Min_date'] + interval
print(df)

Output:

              Min_date            Max_date
0  2022-10-18 00:00:00 2022-10-18 01:00:00
1  2022-10-18 01:00:00 2022-10-18 02:00:00
2  2022-10-18 02:00:00 2022-10-18 03:00:00
...
45 2022-10-19 21:00:00 2022-10-19 22:00:00
46 2022-10-19 22:00:00 2022-10-19 23:00:00
47 2022-10-19 23:00:00 2022-10-20 00:00:00
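An alternative sketch (not from the answer above) builds the hourly intervals directly with pd.interval_range and takes their left and right edges as the two columns; 'h' is the hourly frequency alias.

```python
import pandas as pd

min_date = pd.Timestamp('2022-10-18')
max_date = pd.Timestamp('2022-10-20')

# One hourly interval per row, from min_date up to and including max_date
intervals = pd.interval_range(start=min_date, end=max_date, freq='h')
df = pd.DataFrame({'Min_date': intervals.left, 'Max_date': intervals.right})
print(df)
```

This yields the same 48 rows as the date_range version, without having to subtract one interval from the end date by hand.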
Convert a Pandas Column to Hours and Minutes
I have one field in a Pandas DataFrame that is in integer format. How do I convert it to a DateTime format and append the column to my DataFrame? Specifically, I need hours and minutes.

Example:
DataFrame name: df
The column as a list: df.index, dtype='int64'
Sample data in df.index: [0, 15, 30, 45, 100, 115, 130, 145, 200 ... 2300, 2315, 2330, 2345]

I tried pd.to_datetime(df.index, format='') but it returns the wrong format.
You have an index that holds time values as HHMM represented by an integer. To convert this to a datetime dtype, you first have to make strings that the to_datetime() method can convert correctly:

time_strs = df.index.astype(str).str.zfill(4)

This converts all of the integer values to strings that are zero-padded to 4 characters, so 15 becomes the string "0015", for example. Now you can use the format "%H%M" to convert to a datetime object:

pd.to_datetime(time_strs, format="%H%M")

Then use the methods of datetime objects to access the hours and minutes.
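Put together as a runnable sketch, using the sample index from the question (the hour, minute and time_of_day column names are made up for illustration), this also appends the results to the DataFrame as the question asks:

```python
import pandas as pd

# Sample HHMM integers from the question, used as the index
df = pd.DataFrame(index=pd.Index([0, 15, 30, 45, 100, 115, 130, 145, 200,
                                  2300, 2315, 2330, 2345], name='time'))

# Zero-pad to four characters ("15" -> "0015") so %H%M can split them
time_strs = df.index.astype(str).str.zfill(4)
parsed = pd.to_datetime(time_strs, format='%H%M')  # date defaults to 1900-01-01

df['hour'] = parsed.hour
df['minute'] = parsed.minute
df['time_of_day'] = parsed.time  # datetime.time objects, no date part
print(df)
```

The .time column drops the placeholder 1900-01-01 date, which is convenient when only hours and minutes matter.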
import pandas as pd

df = pd.DataFrame({'time':[0, 15, 30, 45, 100, 115, 130, 145, 200, 2300, 2315, 2330, 2345]})
df.set_index('time', inplace=True)
df['datetime_dtype'] = pd.to_datetime(df.index, format='%H', exact=False)
df['str_dtype'] = df['datetime_dtype'].astype(str).str[11:16]
print(df)

Caution: with format='%H' and exact=False only the leading digits are read as the hour and the minutes are dropped, so 15 becomes 15:00 and 30 becomes 03:00 rather than 00:15 and 00:30, as the output shows:

           datetime_dtype str_dtype
time
0     1900-01-01 00:00:00     00:00
15    1900-01-01 15:00:00     15:00
30    1900-01-01 03:00:00     03:00
45    1900-01-01 04:00:00     04:00
100   1900-01-01 10:00:00     10:00
115   1900-01-01 11:00:00     11:00
130   1900-01-01 13:00:00     13:00
145   1900-01-01 14:00:00     14:00
200   1900-01-01 20:00:00     20:00
2300  1900-01-01 23:00:00     23:00
2315  1900-01-01 23:00:00     23:00
2330  1900-01-01 23:00:00     23:00
2345  1900-01-01 23:00:00     23:00

print(df.dtypes)

datetime_dtype    datetime64[ns]
str_dtype                 object
dtype: object

If you want to move the dates to the current year (here 2020), you can add a time delta; 1900-01-01 plus 6278 weeks is 2020-04-27:

delta = pd.Timedelta(weeks=6278, hours=0, minutes=0)
df['datetime_dtype_2020'] = df['datetime_dtype'] + delta
print(df)

           datetime_dtype str_dtype datetime_dtype_2020
time
0     1900-01-01 00:00:00     00:00 2020-04-27 00:00:00
15    1900-01-01 15:00:00     15:00 2020-04-27 15:00:00
30    1900-01-01 03:00:00     03:00 2020-04-27 03:00:00
45    1900-01-01 04:00:00     04:00 2020-04-27 04:00:00
100   1900-01-01 10:00:00     10:00 2020-04-27 10:00:00
115   1900-01-01 11:00:00     11:00 2020-04-27 11:00:00
130   1900-01-01 13:00:00     13:00 2020-04-27 13:00:00
145   1900-01-01 14:00:00     14:00 2020-04-27 14:00:00
200   1900-01-01 20:00:00     20:00 2020-04-27 20:00:00
2300  1900-01-01 23:00:00     23:00 2020-04-27 23:00:00
2315  1900-01-01 23:00:00     23:00 2020-04-27 23:00:00
2330  1900-01-01 23:00:00     23:00 2020-04-27 23:00:00
2345  1900-01-01 23:00:00     23:00 2020-04-27 23:00:00
If you only want hours and minutes, you can use datetime.time objects:

import datetime

def int_to_time(i):
    if i < 60:
        return datetime.time(0, i)
    elif i < 1000:
        return datetime.time(int(str(i)[0]), int(str(i)[1:]))
    else:
        return datetime.time(int(str(i)[0:2]), int(str(i)[2:]))

df.index.map(int_to_time)  # Index has no .apply; .map applies the function element-wise

Example:

import datetime
import numpy as np
import pandas as pd

ints = [i for i in np.random.randint(0, 2400, 100) if i % 100 < 60][0:5]
df = pd.DataFrame({'a': ints})

>>> df
      a
0  1559
1  1712
2  1233
3   953
4   938
>>> df['a'].apply(int_to_time)
0    15:59:00
1    17:12:00
2    12:33:00
3    09:53:00
4    09:38:00

From there, you can access the hour and minute properties of the values:

>>> df['a'].apply(int_to_time).apply(lambda x: (x.hour, x.minute))
0    (15, 59)
1    (17, 12)
2    (12, 33)
3     (9, 53)
4     (9, 38)
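A variation on int_to_time that swaps string slicing for integer arithmetic: for an HHMM integer, i // 100 is the hour and i % 100 the minute, which works the same way for one- to four-digit values and can be computed for the whole index at once. A minimal sketch; the since_midnight and time_of_day column names are hypothetical:

```python
import datetime

import pandas as pd

df = pd.DataFrame(index=pd.Index([0, 15, 30, 45, 100, 115, 2300, 2345], name='time'))

# Integer arithmetic instead of string slicing
hours = df.index // 100    # e.g. 2345 -> 23
minutes = df.index % 100   # e.g. 2345 -> 45

# Option 1: duration since midnight, fully vectorized
df['since_midnight'] = pd.to_timedelta(hours * 60 + minutes, unit='min')
# Option 2: datetime.time objects, like int_to_time returns
df['time_of_day'] = [datetime.time(int(h), int(m)) for h, m in zip(hours, minutes)]
print(df)
```

The timedelta column supports arithmetic and comparisons directly; the datetime.time column matches what int_to_time returns.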
How to reset a datetime column to increments of one minute in Python
I have a dataframe with a datetime column called start time, and it is set to a default of 12:00:00 AM. I would like to reset this column so that the first row is 00:01:00 and the second row is 00:02:00, that is, one-minute intervals. This is the original table:

ID    State Time  End Time
A001  12:00:00    12:00:00
A002  12:00:00    12:00:00
A003  12:00:00    12:00:00
A004  12:00:00    12:00:00
A005  12:00:00    12:00:00
A006  12:00:00    12:00:00
A007  12:00:00    12:00:00

I want to reset the start time column so that my output is this:

ID    State Time  End Time
A001  0:00:00     12:00:00
A002  0:00:01     12:00:00
A003  0:00:02     12:00:00
A004  0:00:03     12:00:00
A005  0:00:04     12:00:00
A006  0:00:05     12:00:00
A007  0:00:06     12:00:00

How do I go about this?
You could use pd.date_range:

df['Start Time'] = pd.date_range('00:00', periods=df['Start Time'].shape[0], freq='1min')

gives you

df
Out[23]:
           Start Time
0 2019-09-30 00:00:00
1 2019-09-30 00:01:00
2 2019-09-30 00:02:00
3 2019-09-30 00:03:00
4 2019-09-30 00:04:00
5 2019-09-30 00:05:00
6 2019-09-30 00:06:00
7 2019-09-30 00:07:00
8 2019-09-30 00:08:00
9 2019-09-30 00:09:00

Supply a full date/time string to get another starting date.
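If what is wanted is an offset from the start rather than a clock time with a calendar date attached (the 0:00:00, 0:00:01, ... layout in the expected output reads like durations), a timedelta column avoids the arbitrary date that pd.date_range adds. A minimal sketch that rebuilds the question's seven-row table by hand; swap unit='min' for unit='s' if one-second steps were actually meant.

```python
import numpy as np
import pandas as pd

# Hand-built copy of the question's table (7 rows)
df = pd.DataFrame({
    'ID': [f'A00{i}' for i in range(1, 8)],
    'State Time': ['12:00:00'] * 7,
    'End Time': ['12:00:00'] * 7,
})

# Offsets of 0, 1, 2, ... minutes, stored as durations with no date part
df['State Time'] = pd.to_timedelta(np.arange(len(df)), unit='min')
print(df)
```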
First we convert your State Time column to datetime type. Then we use pd.date_range, taking the earliest time as the starting point, with a frequency of 1 minute:

df['State Time'] = pd.to_datetime(df['State Time'])
df['State Time'] = pd.date_range(start=df['State Time'].min(), periods=len(df), freq='min').time

Output

     ID State Time  End Time
0  A001   12:00:00  12:00:00
1  A002   12:01:00  12:00:00
2  A003   12:02:00  12:00:00
3  A004   12:03:00  12:00:00
4  A005   12:04:00  12:00:00
5  A006   12:05:00  12:00:00
6  A007   12:06:00  12:00:00
How can I efficiently convert hourly data into dates and times for every day of the year using Python pandas?
I have a pandas DataFrame that represents a value for every hour of a day, and I want to repeat each value for every day of a year. I have written the 'naive' way to do it. Is there a more efficient way?

Naive way (works correctly, but takes a lot of time):

dfConsoFrigo = pd.read_csv("../assets/datas/refregirateur.csv", sep=';')
dataframe = pd.DataFrame(columns=['Puissance'])
iterator = 0
for day in pd.date_range("01 Jan 2017 00:00", "31 Dec 2017 23:00", freq='1H'):
    iterator = iterator % 24
    dataframe.loc[day] = dfConsoFrigo.iloc[iterator]['Puissance']
    iterator += 1

Input (time;value), 24 rows:

Heure;Puissance
00:00;48.0
01:00;47.0
02:00;46.0
03:00;46.0
04:00;45.0
05:00;46.0
...
19:00;55.0
20:00;53.0
21:00;51.0
22:00;50.0
23:00;49.0

Expected output (8760 rows):

                     Puissance
2017-01-01 00:00:00         48
2017-01-01 01:00:00         47
2017-01-01 02:00:00         46
2017-01-01 03:00:00         46
2017-01-01 04:00:00         45
...
2017-12-31 20:00:00         53
2017-12-31 21:00:00         51
2017-12-31 22:00:00         50
2017-12-31 23:00:00         49
I think you need numpy.tile:

import numpy as np
import pandas as pd

np.random.seed(10)
df = pd.DataFrame({'Puissance':np.random.randint(100, size=24)})
rng = pd.date_range("01 Jan 2017 00:00", "31 Dec 2017 23:00", freq='1H')
df = pd.DataFrame({'a':np.tile(df['Puissance'].values, 365)}, index=rng)
print (df.head(30))

                      a
2017-01-01 00:00:00   9
2017-01-01 01:00:00  15
2017-01-01 02:00:00  64
2017-01-01 03:00:00  28
2017-01-01 04:00:00  89
2017-01-01 05:00:00  93
2017-01-01 06:00:00  29
2017-01-01 07:00:00   8
2017-01-01 08:00:00  73
2017-01-01 09:00:00   0
2017-01-01 10:00:00  40
2017-01-01 11:00:00  36
2017-01-01 12:00:00  16
2017-01-01 13:00:00  11
2017-01-01 14:00:00  54
2017-01-01 15:00:00  88
2017-01-01 16:00:00  62
2017-01-01 17:00:00  33
2017-01-01 18:00:00  72
2017-01-01 19:00:00  78
2017-01-01 20:00:00  49
2017-01-01 21:00:00  51
2017-01-01 22:00:00  54
2017-01-01 23:00:00  77
2017-01-02 00:00:00   9
2017-01-02 01:00:00  15
2017-01-02 02:00:00  64
2017-01-02 03:00:00  28
2017-01-02 04:00:00  89
2017-01-02 05:00:00  93
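np.tile(..., 365) hardcodes the number of days, so for a leap year such as 2020 the 8760 tiled values would not match the 8784-hour range and the DataFrame constructor would raise a length-mismatch error. A sketch that avoids the count entirely by looking up each timestamp's hour in the 24-value profile; the function name spread_daily_profile and the dummy profile are hypothetical stand-ins for the CSV data.

```python
import numpy as np
import pandas as pd


def spread_daily_profile(profile, year):
    """Repeat a 24-value hourly profile for every hour of `year`."""
    profile = np.asarray(profile)
    rng = pd.date_range(f'{year}-01-01 00:00', f'{year}-12-31 23:00', freq='h')
    # Each timestamp picks the profile value for its own hour of day,
    # so leap years (8784 hours) need no special handling
    return pd.DataFrame({'Puissance': profile[rng.hour.to_numpy()]}, index=rng)


profile = np.arange(24) * 1.0  # dummy values in place of the CSV's Puissance column
df_2017 = spread_daily_profile(profile, 2017)
df_2020 = spread_daily_profile(profile, 2020)
print(len(df_2017), len(df_2020))
```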
Pandas filtering values in dataframe
I have this dataframe. The columns represent the highs and the lows of the daily EURUSD price:

df.low                          df.high
2013-01-17 16:00:00  1.33394    2013-01-17 20:00:00  1.33874
2013-01-18 18:00:00  1.32805    2013-01-18 09:00:00  1.33983
2013-01-21 00:00:00  1.32962    2013-01-21 09:00:00  1.33321
2013-01-22 11:00:00  1.32667    2013-01-22 09:00:00  1.33715
2013-01-23 17:00:00  1.32645    2013-01-23 14:00:00  1.33545
2013-01-24 10:00:00  1.32860    2013-01-24 18:00:00  1.33926
2013-01-25 04:00:00  1.33497    2013-01-25 17:00:00  1.34783
2013-01-28 10:00:00  1.34246    2013-01-28 16:00:00  1.34771
2013-01-29 13:00:00  1.34143    2013-01-29 21:00:00  1.34972
2013-01-30 08:00:00  1.34820    2013-01-30 21:00:00  1.35873
2013-01-31 13:00:00  1.35411    2013-01-31 17:00:00  1.35944

I combined them into a third column (df.extremes):

df.extremes
2013-01-17 16:00:00  1.33394
2013-01-17 20:00:00  1.33874
2013-01-18 18:00:00  1.32805
2013-01-18 09:00:00  1.33983
2013-01-21 00:00:00  1.32962
2013-01-21 09:00:00  1.33321
2013-01-22 09:00:00  1.33715
2013-01-22 11:00:00  1.32667
2013-01-23 14:00:00  1.33545
2013-01-23 17:00:00  1.32645
2013-01-24 10:00:00  1.32860
2013-01-24 18:00:00  1.33926
2013-01-25 04:00:00  1.33497
2013-01-25 17:00:00  1.34783
2013-01-28 10:00:00  1.34246
2013-01-28 16:00:00  1.34771
2013-01-29 13:00:00  1.34143
2013-01-29 21:00:00  1.34972
2013-01-30 08:00:00  1.34820
2013-01-30 21:00:00  1.35873
2013-01-31 13:00:00  1.35411
2013-01-31 17:00:00  1.35944

But now I want to filter some values out of df.extremes. To explain what to filter, I'll try this "pseudocode":

IF following the index we move from: previous df.low --> df.low --> df.high:
    IF df.low > previous df.low: delete df.low
    IF df.low < previous df.low: delete previous df.low

If I try to work this out with a for loop, it gives me a KeyError: 1.3339399999999999.
day = df.groupby(pd.TimeGrouper('D'))  # pd.TimeGrouper was removed in pandas 1.0; use pd.Grouper(freq='D')
is_day_min = day.extremes.apply(lambda x: x == x.min())
for i in df.extremes:  # iterating a Series yields its values, not its index labels
    if is_day_min[i] == True and is_day_min[i+1] == True:
        if df.extremes[i] > df.extremes[i+1]:
            del df.extremes[i]
for i in df.extremes:
    if is_day_min[i] == True and is_day_min[i+1] == True:
        if df.extremes[i] < df.extremes[i+1]:
            del df.extremes[i+1]

How can I filter/delete the values as I explained in the pseudocode? I am struggling with indexing and booleans and can't solve this. I strongly suspect I need to use a lambda function, but I don't know how to apply it. Please have mercy; I've been trying for a long time. I hope I've been clear enough.
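All you're really missing is a way of saying "previous low" in a vectorized fashion. That's spelled df['low'].shift(1) (shift(-1) would give the next low instead). Once you have that, it's just:

prev = df.low.shift(1)
filtered_df = df[~((df.low > prev) | (df.low < prev))]

As written, this mask drops the current row whenever its low differs from the previous one (and keeps the first row, whose shifted value is NaN); the pseudocode instead drops the previous row when the current low is the smaller one.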
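To follow the question's pseudocode more literally (among consecutive lows, keep only the smallest), one option is to number runs of same-kind entries and compare each low with its run minimum. A minimal sketch on an excerpt of df.extremes, assuming the series is sorted by time and each entry is tagged with the column it came from; the kind column is hypothetical and not part of the question's frame.

```python
import pandas as pd

# Excerpt of the question's df.extremes, sorted by time and tagged by origin
ext = pd.DataFrame(
    {
        'value': [1.33545, 1.32645, 1.32860, 1.33926],
        'kind': ['high', 'low', 'low', 'high'],
    },
    index=pd.to_datetime(['2013-01-23 14:00', '2013-01-23 17:00',
                          '2013-01-24 10:00', '2013-01-24 18:00']),
)

# A new run starts whenever the kind differs from the previous row
run_id = (ext['kind'] != ext['kind'].shift()).cumsum()
# Smallest value within each run (only relevant for runs of lows)
run_min = ext.groupby(run_id)['value'].transform('min')
# Keep every high; within a run of consecutive lows keep only the lowest
filtered = ext[(ext['kind'] != 'low') | (ext['value'] == run_min)]
print(filtered)
```

Here the 2013-01-24 10:00 low (1.32860) is dropped because it follows a lower low; equal consecutive lows would both be kept.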