I have two dates in pandas dataframes (df1.a_date & df2.another_date) read from CSV files. They match at the date level (YYYY-MM-DD) but not at the time (HH:MM:SS). Both are read in as dtype: object.
I need to merge the two dataframes on the dates, but since they aren't exact matches, I probably need to convert them first. Any ideas?
Edit:
I've tried using datetime.date to construct a new date from the pandas datetime column, but that doesn't seem to work:
datetime.date(df.a_date.year, df.a_date.month, df.a_date.day)
The pandas column doesn't have year, month, day attributes, though.
You can normalize a datetime column or a DatetimeIndex:
Note: at the time of writing, normalize wasn't exposed on the dt accessor, so we need to wrap the column in a DatetimeIndex. Newer pandas versions also provide df[col].dt.normalize() directly.
In [11]: df = pd.DataFrame(pd.date_range('2015-01-01 05:00', periods=3), columns=['datetime'])
In [12]: df
Out[12]:
datetime
0 2015-01-01 05:00:00
1 2015-01-02 05:00:00
2 2015-01-03 05:00:00
In [13]: df["date"] = pd.DatetimeIndex(df["datetime"]).normalize()
In [14]: df
Out[14]:
datetime date
0 2015-01-01 05:00:00 2015-01-01
1 2015-01-02 05:00:00 2015-01-02
2 2015-01-03 05:00:00 2015-01-03
This also works on a DatetimeIndex: use df.index rather than df[col_name].
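Since the original goal is a merge, here is a minimal sketch of the whole flow. It assumes hypothetical sample frames standing in for the two CSV files (string columns, as in the question) and a pandas version that has Series.dt.normalize(); the helper column name `day` is illustrative only.

```python
import pandas as pd

# Hypothetical sample data standing in for the two CSV files;
# both date columns are plain strings (dtype object), as in the question.
df1 = pd.DataFrame({"a_date": ["2015-01-01 05:00:00", "2015-01-02 13:30:00"],
                    "x": [1, 2]})
df2 = pd.DataFrame({"another_date": ["2015-01-01 23:59:59", "2015-01-02 00:00:01"],
                    "y": ["a", "b"]})

# Parse the strings, then set the time part to midnight of the same day.
df1["day"] = pd.to_datetime(df1["a_date"]).dt.normalize()
df2["day"] = pd.to_datetime(df2["another_date"]).dt.normalize()

# Merge on the date-only key; the differing times no longer matter.
merged = df1.merge(df2, on="day")
print(merged[["day", "x", "y"]])
```

Keeping the key as datetime64 (rather than a formatted string) preserves sorting and date arithmetic on the merged result.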
Format the datetime to only include YYYY-MM-DD:
assuming d is a single datetime object:
'{:%Y-%m-%d}'.format(d)
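The format string above handles one value at a time. For a whole column, a sketch using Series.dt.strftime, assuming the column has already been parsed to datetime64 (the frame and column names here are hypothetical):

```python
import datetime

import pandas as pd

# Single value, as in the answer above.
d = datetime.datetime(2015, 1, 1, 5, 30)
single = '{:%Y-%m-%d}'.format(d)

# Hypothetical column of timestamps; .dt.strftime formats every element.
df = pd.DataFrame({"a_date": pd.to_datetime(["2015-01-01 05:30", "2015-01-02 17:00"])})
df["day_str"] = df["a_date"].dt.strftime("%Y-%m-%d")
print(single)
print(df)
```

The result is a column of strings, which can be merged on, but it loses the datetime dtype.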
Assume dft is your dataframe and its 'index' column contains datetimes:
In [1804]: dft.head()
Out[1804]:
index A
0 2013-01-01 00:00:00 1.193366
1 2013-01-01 00:01:00 1.013425
2 2013-01-01 00:02:00 1.281902
3 2013-01-01 00:03:00 -0.043788
4 2013-01-01 00:04:00 -1.610164
You could convert the column to contain just the date, optionally saving it in a separate column, and operate on that:
In [1805]: dft['index'].apply(lambda v:v.date()).head()
Out[1805]:
0 2013-01-01
1 2013-01-01
2 2013-01-01
3 2013-01-01
4 2013-01-01
Name: index, dtype: object
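The apply/lambda above calls Python code once per row. A vectorized sketch of the same conversion with the .dt.date accessor, assuming a small rebuilt copy of dft (values taken from the output above):

```python
import pandas as pd

# Small rebuilt version of dft: a datetime column named 'index' plus values.
dft = pd.DataFrame({
    "index": pd.date_range("2013-01-01 00:00", periods=5, freq="min"),
    "A": [1.193366, 1.013425, 1.281902, -0.043788, -1.610164],
})

# Vectorized equivalent of dft['index'].apply(lambda v: v.date()).
dft["date"] = dft["index"].dt.date
print(dft["date"].head())
```

Like the apply version, this yields datetime.date objects in an object-dtype column.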
Related
So I have a dataset with a specific date attached to every value. I want to place these values at their dates in an Excel sheet that contains the date range of the whole year. The dates start at 01-01-2020 00:00:00 and end at 31-12-2020 23:45:00 with a frequency of 15 minutes, so there will be a total of 35136 date-time values in Excel (2020 is a leap year: 366 * 96).
My data looks like:
load date
12 01-02-2020 06:30:00
21 29-04-2020 03:45:00
23 02-07-2020 12:15:00
54 07-08-2020 16:00:00
23 22-09-2020 16:30:00
As you can see, these values are not continuous, but each one has a specific date. So I want to use these dates as the index, put each value at its date in the Excel date column, and fill the missing values with zero. Can someone please help?
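Use DataFrame.reindex with date_range, which adds 0 for every datetime that is missing from the data:
import pandas as pd
rng = pd.date_range('2020-01-01','2020-12-31 23:45:00', freq='15min')
# The dates are day-first (DD-MM-YYYY), so give the format explicitly;
# otherwise 01-02-2020 would be parsed as 2 January instead of 1 February.
df['date'] = pd.to_datetime(df['date'], format='%d-%m-%Y %H:%M:%S')
df = df.set_index('date').reindex(rng, fill_value=0)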
print (df)
load
2020-01-01 00:00:00 0
2020-01-01 00:15:00 0
2020-01-01 00:30:00 0
2020-01-01 00:45:00 0
2020-01-01 01:00:00 0
...
2020-12-31 22:45:00 0
2020-12-31 23:00:00 0
2020-12-31 23:15:00 0
2020-12-31 23:30:00 0
2020-12-31 23:45:00 0
[35136 rows x 1 columns]
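A self-contained sketch of the same reindex, built from the five sample rows in the question, that also checks the values landed on the right slots (the explicit day-first format is the only change from the answer's approach):

```python
import pandas as pd

# Sample data from the question; dates are day-first (DD-MM-YYYY).
df = pd.DataFrame({
    "load": [12, 21, 23, 54, 23],
    "date": ["01-02-2020 06:30:00", "29-04-2020 03:45:00", "02-07-2020 12:15:00",
             "07-08-2020 16:00:00", "22-09-2020 16:30:00"],
})

# An explicit format avoids month/day ambiguity (01-02 is 1 February here).
df["date"] = pd.to_datetime(df["date"], format="%d-%m-%Y %H:%M:%S")

# Every 15-minute slot of 2020; a leap year gives 366 * 96 = 35136 slots.
rng = pd.date_range("2020-01-01", "2020-12-31 23:45:00", freq="15min")
out = df.set_index("date").reindex(rng, fill_value=0)

print(len(out))
print(out.loc[pd.Timestamp("2020-02-01 06:30:00"), "load"])
```

The result can then be written out with out.to_excel(...) if needed.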
I am trying to convert a column which has different date formats.
For example:
month
2018-01-01 float64
2018-02-01 float64
2018-03-01 float64
2018-03-01 00:00:00 float64
2018-04-01 01:00:00 float64
2018-05-01 01:00:00 float64
2018-06-01 01:00:00 float64
2018-07-01 01:00:00 float64
I want to convert everything in the column to just month and year. For example I would like Jan-18, Feb-18, Mar-18, etc.
I have tried using this code to first convert my column to datetime:
df['month'] = pd.to_datetime(df['month'], format='%Y-%m-%d')
But it returns a float64:
Out
month
2018-01-01 00:00:00 float64
2018-02-01 00:00:00 float64
2018-03-01 00:00:00 float64
2018-04-01 01:00:00 float64
2018-05-01 01:00:00 float64
2018-06-01 01:00:00 float64
2018-07-01 01:00:00 float64
In my output to CSV the month format has been changed to 01/05/2016 00:00:00. Can you please help me convert it to just month and year, e.g. Aug-18?
Thank you
I assume you have a Pandas dataframe. In this case, you can use pd.Series.dt.to_period:
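import pandas as pd
s = pd.Series(['2018-01-01', '2018-02-01', '2018-03-01',
'2018-03-01 00:00:00', '2018-04-01 01:00:00'])
# Note: pandas >= 2.0 may reject these mixed formats unless format='ISO8601' is passed.
res = pd.to_datetime(s).dt.to_period('M')
print(res)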
0 2018-01
1 2018-02
2 2018-03
3 2018-03
4 2018-04
dtype: object
As you can see, with the pandas version used here this results in a series of dtype object, which is generally inefficient (newer versions return a dedicated period[M] dtype instead). A better idea is to set the day to the last of the month and keep a datetime series, which is internally represented by integers.
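To get labels exactly like Jan-18, a sketch using Series.dt.strftime with %b-%y. It assumes every string starts with a YYYY-MM-DD date; truncating to the first 10 characters is a swapped-in trick to sidestep the mixed formats, and it is only safe here because the time part is discarded anyway:

```python
import pandas as pd

# Mixed inputs from the question: some with a time part, some without.
s = pd.Series(["2018-01-01", "2018-02-01", "2018-03-01",
               "2018-03-01 00:00:00", "2018-04-01 01:00:00"])

# Only year and month matter, so keep the YYYY-MM-DD prefix
# and parse it with one explicit format.
dates = pd.to_datetime(s.str[:10], format="%Y-%m-%d")

# %b is the abbreviated month name (locale-dependent), %y the 2-digit year.
labels = dates.dt.strftime("%b-%y")
print(labels.tolist())
```

The labels are strings, so they will be written to CSV as-is; keep the parsed dates column if you still need to sort or filter by time.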
Sorry, I am new to asking questions on Stack Overflow, so I don't understand how to format properly.
So I'm given a Pandas dataframe that contains a datetime column (date and time) and an associated column with some value. The timestamps are spaced one hour apart. I would like to manipulate the dataframe so that it increments every 15 minutes but retains the same value. How would I do that? Thanks!
I have tried :
df = df.asfreq('15Min',method='ffill')
But I get an error:
"TypeError: Cannot compare type 'Timestamp' with type 'long'"
current dataframe:
datetime value
00:00:00 1
01:00:00 2
new dataframe:
datetime value
00:00:00 1
00:15:00 1
00:30:00 1
00:45:00 1
01:00:00 2
01:15:00 2
01:30:00 2
01:45:00 2
Update:
The approved answer below works, but so does the initial code I tried above,
df = df.asfreq('15Min',method='ffill'). I was messing around with other DataFrames and seemed to be having trouble with some null values, so I took care of that with a fillna call and everything worked.
You can use a TimedeltaIndex, but you need to manually extend the range 45 minutes past the last value so that the last hour is reindexed too:
df['datetime'] = pd.to_timedelta(df['datetime'])
df = df.set_index('datetime')
tr = pd.timedelta_range(df.index.min(),
df.index.max() + pd.Timedelta(45*60, unit='s'), freq='15Min')
df = df.reindex(tr, method='ffill')
print (df)
value
00:00:00 1
00:15:00 1
00:30:00 1
00:45:00 1
01:00:00 2
01:15:00 2
01:30:00 2
01:45:00 2
Another solution uses resample, with the same problem: you need to append a placeholder row one hour after the last one so that the last hour gets filled, then drop it with iloc[:-1]:
df['datetime'] = pd.to_timedelta(df['datetime'])
df = df.set_index('datetime')
df.loc[df.index.max() + pd.Timedelta(1, unit='h')] = 1
df = df.resample('15Min').ffill().iloc[:-1]
print (df)
value
datetime
00:00:00 1
00:15:00 1
00:30:00 1
00:45:00 1
01:00:00 2
01:15:00 2
01:30:00 2
01:45:00 2
If the values are full datetimes, the same two approaches work with date_range instead of timedelta_range:
print (df)
datetime value
0 2018-01-01 00:00:00 1
1 2018-01-01 01:00:00 2
df['datetime'] = pd.to_datetime(df['datetime'])
df = df.set_index('datetime')
tr = pd.date_range(df.index.min(),
df.index.max() + pd.Timedelta(45*60, unit='s'), freq='15Min')
df = df.reindex(tr, method='ffill')
Or, starting again from the original df, with resample:
df['datetime'] = pd.to_datetime(df['datetime'])
df = df.set_index('datetime')
df.loc[df.index.max() + pd.Timedelta(1, unit='h')] = 1
df = df.resample('15Min').ffill().iloc[:-1]
print (df)
value
datetime
2018-01-01 00:00:00 1
2018-01-01 00:15:00 1
2018-01-01 00:30:00 1
2018-01-01 00:45:00 1
2018-01-01 01:00:00 2
2018-01-01 01:15:00 2
2018-01-01 01:30:00 2
2018-01-01 01:45:00 2
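The question's update says asfreq also works. A plausible (unconfirmed) reason for the original TypeError is that asfreq was called on a frame whose index was not datetime-like. A sketch, assuming the datetime column is first parsed and set as the index, and reusing the placeholder-row trick above so the last hour is expanded too:

```python
import pandas as pd

# Data shaped like the question's: hourly timestamps with one value each.
df = pd.DataFrame({"datetime": ["2018-01-01 00:00:00", "2018-01-01 01:00:00"],
                   "value": [1, 2]})

# asfreq needs a datetime-like index, so parse the column and make it the index.
df["datetime"] = pd.to_datetime(df["datetime"])
df = df.set_index("datetime")

# Append an empty placeholder row one hour after the last one so that asfreq
# also generates 01:15, 01:30 and 01:45; the placeholder is dropped afterwards.
last = df.index.max() + pd.Timedelta(hours=1)
extended = df.reindex(df.index.append(pd.DatetimeIndex([last])))

out = extended.asfreq("15min", method="ffill").iloc[:-1]
print(out)
```

Because the placeholder row holds NaN, the value column becomes float; add .astype(int) after iloc[:-1] if integers are required.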
You can use pandas.date_range (without a date part, the timestamps fall on the current day):
pd.date_range('00:00:00', '01:00:00', freq='15min')
If df is a DataFrame indexed by DateTime objects, the following code splits it into the list groups_list, where each element contains all the rows of df that belong to a given day:
groupby_clause = [df.index.year,df.index.month,df.index.day]
groups_list = [group[1] for group in df.groupby(groupby_clause)]
I am having trouble, though, understanding how the grouping is actually done, since I don't need to label the elements of groupby_clause as year, month, and day for the grouping to work on DateTime objects.
As an example, I have the following components for groups_list:
Maybe I'm missing something obvious, but I don't get it: how does pandas know that it should associate groupby_clause[0] to year, groupby_clause[1] to month, and groupby_clause[2] to day in order to group the dataframe indexes that have DateTime type?
Suppose you have a DataFrame like this:
0
2011-01-01 00:00:00 -0.324398
2011-01-01 01:00:00 -0.761585
2011-01-01 02:00:00 0.057204
2011-01-01 03:00:00 -1.162510
2011-01-01 04:00:00 -0.680896
2011-01-01 05:00:00 -0.701835
2011-01-01 06:00:00 -0.431338
2011-01-01 07:00:00 0.306935
2011-01-01 08:00:00 -0.503177
2011-01-01 09:00:00 -0.507444
2011-01-01 10:00:00 0.230590
2011-01-01 11:00:00 -2.326702
2011-01-01 12:00:00 -0.034664
2011-01-01 13:00:00 0.224373
2011-01-01 14:00:00 -0.242884
If you want the index to be year, month and day, just set_index it:
df.set_index([df.index.year, df.index.month, df.index.day])
Output
0
2011 1 1 -0.324398
1 -0.761585
1 0.057204
1 -1.162510
1 -0.680896
1 -0.701835
1 -0.431338
1 0.306935
1 -0.503177
1 -0.507444
1 0.230590
1 -2.326702
1 -0.034664
1 0.224373
1 -0.242884
1 -0.134757
1 -1.177362
1 0.931335
1 0.904084
1 -0.757860
1 0.406597
1 -0.664150
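On the original question of how pandas "knows" which array is the year: it doesn't need to. Each array in the list has one entry per row, and rows with equal (year, month, day) tuples land in the same group; the labels never matter, only the values and their position in the list. A sketch on hypothetical hourly data, plus an alternative that groups directly by calendar date:

```python
import numpy as np
import pandas as pd

# Hypothetical hourly data spanning two days.
idx = pd.date_range("2011-01-01 00:00", periods=48, freq="60min")
df = pd.DataFrame({"x": np.arange(48)}, index=idx)

# Three row-aligned arrays; pandas groups rows whose value tuples are equal.
groupby_clause = [df.index.year, df.index.month, df.index.day]
groups = df.groupby(groupby_clause)

# Group keys are (year, month, day) tuples, in the order the arrays were given.
keys = [k for k, _ in groups]
print(keys)

# The same per-day split keyed by calendar date instead of a tuple.
by_date = [g for _, g in df.groupby(df.index.date)]
print([len(g) for g in by_date])
```

Swapping the order in groupby_clause (e.g. day first) would only change the tuple layout and the group order, not which rows end up together.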
I have a dataframe that has a date column and an hour column.
DATE HOUR
2015-1-1 1
2015-1-1 2
. .
. .
. .
2015-1-1 24
I want to convert these columns into a datetime format something like:
2015-12-26 01:00:00
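You could first convert df.DATE to a datetime column and add df.HOUR as a timedelta via timedelta64[h] (newer pandas versions may reject this astype; the pd.to_timedelta form below avoids it):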
In [10]: df
Out[10]:
DATE HOUR
0 2015-1-1 1
1 2015-1-1 2
2 2015-1-1 24
In [11]: pd.to_datetime(df.DATE) + df.HOUR.astype('timedelta64[h]')
Out[11]:
0 2015-01-01 01:00:00
1 2015-01-01 02:00:00
2 2015-01-02 00:00:00
dtype: datetime64[ns]
Or, use pd.to_timedelta
In [12]: pd.to_datetime(df.DATE) + pd.to_timedelta(df.HOUR, unit='h')
Out[12]:
0 2015-01-01 01:00:00
1 2015-01-01 02:00:00
2 2015-01-02 00:00:00
dtype: datetime64[ns]
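A self-contained sketch of the pd.to_timedelta approach, assuming the sample rows shown above. Note that HOUR 24 rolls over to midnight of the next day, which is usually what an hour-ending convention (hours 1 to 24) intends:

```python
import pandas as pd

# Sample rows from the question; DATE strings are not zero-padded.
df = pd.DataFrame({"DATE": ["2015-1-1", "2015-1-1", "2015-1-1"],
                   "HOUR": [1, 2, 24]})

# Parse the date explicitly, then add HOUR as a number of hours.
df["DATETIME"] = (pd.to_datetime(df["DATE"], format="%Y-%m-%d")
                  + pd.to_timedelta(df["HOUR"], unit="h"))
print(df)
```

If hour 24 should instead mean the last hour of the same day, subtract one hour from the result or map 24 to 0 before adding.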