to_datetime in pandas changes the date of my datetime data - python

I use the following code to extract the datetime of a .csv file:
house_data = 'test_1house_EV.csv'
house1 = pandas.read_csv(house_data)
time = pandas.to_datetime(house1["localminute"])
The datetime data to be extracted are the 1440 minutes of September 1, 2017.
However, after using to_datetime every timestamp is shifted five hours forward, so the converted times between 00:00 and 05:00 land on September 2.
e.g. the original data looks like this:
28 2017-09-01 00:28:00-05
29 2017-09-01 00:29:00-05
...
1411 2017-09-01 23:31:00-05
1412 2017-09-01 23:32:00-05
but the datetime data looks like this:
28 2017-09-01 05:28:00
29 2017-09-01 05:29:00
...
1410 2017-09-02 04:30:00
1411 2017-09-02 04:31:00
Does anyone know how to fix this?

Use this, as per @James' suggestion:
pd.to_datetime(house1["localminute"], format='%Y-%m-%d %H:%M:%S-%f')

You can slice off the last three characters of the date string before converting.
pd.to_datetime(house1.localminute.str[:-3])
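Alternatively, treat the trailing -05 as what it is: a UTC offset, which is why pandas shifts the times. A sketch, assuming a recent pandas where a uniform offset yields a timezone-aware series; tz_localize(None) then drops the offset while keeping the local wall-clock time, so all 1440 minutes stay on September 1.
import pandas as pd

house1 = pd.read_csv('test_1house_EV.csv')
aware = pd.to_datetime(house1["localminute"])  # tz-aware, UTC-05:00
local = aware.dt.tz_localize(None)             # drop the offset, keep local time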

How to drop multiple rows with datetime index?

I have the pandas data frame below with a datetime index. The dataframe shows the data for the months of April and May. (The original dataframe has many more columns.)
I want to remove all the rows for the month of May, i.e. starting from index 2022-05-01 00:00:00 and ending at 2022-05-31 23:45:00. Currently I am doing it by explicitly mentioning the index labels, but I am sure there should be a more sophisticated way to do it without having to mention the index labels, so that if the data changes and I want to remove the next month, I don't have to hard-code it. I would appreciate help with this.
Current Code:
start_remove = pd.to_datetime('2022-05-01 00:00:00')
end_remove = pd.to_datetime('2022-05-31 23:45:00')
df = df.loc[(df.index < start_remove) | (df.index > end_remove)]
Sample Dataset:
date Open Close High Low
...
2022-04-30 23:30:00 10 11.4 10.2 10.7
2022-04-30 23:45:00 18 17.2 17.2 15.8
2022-05-01 00:00:00 24 24 24.8 24.8
2022-05-01 00:15:00 59 58 60 60.3
2022-05-01 00:30:00 43.7 43.9 48 48
...
...
2022-05-31 23:45:00 41.7 53.9 51 50
You may want to include the year when selecting the month, to avoid deleting the same month from another year:
# assumption: date field is an index
# and is already converted to datetime using pd.to_datetime
df.drop(df.loc[df.index.strftime('%Y%m') == '202205'].index)
If the index is not already datetime, convert it first:
df.index = pd.to_datetime(df.index)
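A sketch of the same idea without building strings at all, assuming the index is a DatetimeIndex (the year and month attributes do the matching directly):
# keep every row that is not in May 2022
mask = (df.index.year == 2022) & (df.index.month == 5)
df = df[~mask]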

Convert a column to a specific time format which contains different types of time formats in python

This is my data frame
df = pd.DataFrame({
'Time': ['10:00PM', '15:45:00', '13:40:00AM','5:00']
})
Time
0 10:00PM
1 15:45:00
2 13:40:00AM
3 5:00
I need to convert the times to a specific format, which is my expected output, given below.
Time
0 22:00:00
1 15:45:00
2 01:40:00
3 05:00:00
I tried using the str split and endswith functions, but that became a complicated solution. Is there any better way to achieve this?
Thanks in advance!
Here you go. One thing to mention though: '13:40:00AM' will result in an error, since a) 13 is the wrong format, as AM/PM hours only go from 1 to 12, and b) 13:40 would be PM, so it cannot at the same time be AM :)
Cheers
import pandas as pd
df = pd.DataFrame({'Time': ['10:00PM', '15:45:00', '01:40:00AM', '5:00']})
df['Time'] = pd.to_datetime(df['Time'])
print(df['Time'].dt.time)
<<< 22:00:00
<<< 15:45:00
<<< 01:40:00
<<< 05:00:00
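Note: pandas 2.0+ is stricter about parsing several formats in one call, so the to_datetime line above may raise there; passing format='mixed' (added in 2.0) is a sketch of a workaround:
df['Time'] = pd.to_datetime(df['Time'], format='mixed')  # parse each element on its own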

Rounding up the value to the nearest hour

This is my first time posting a question here; if I don't explain it clearly, please give me a chance to improve the way I'm asking. Thank you!
I have a dataset that contains dates and times like this:
TIME COL1 COL2 COL3 ...
2018/12/31 23:50:23 34 DC 23
2018/12/31 23:50:23 32 NC 23
2018/12/31 23:50:19 12 AL 33
2018/12/31 23:50:19 56 CA 23
2018/12/31 23:50:19 98 CA 33
I want to create a new column formatted like '2018-12-31 11:00:00 PM' instead of '2018/12/31 23:10:23', with the time rounded to the hour (so e.g. 17:40 would be rounded up to 6:00 PM).
I have tried to use .dt.strftime("%Y-%m-%d %H:%M:%S") to change the format, but then I got stuck trying to convert the time from 24h to 12h.
Name: TIME, Length: 3195450, dtype: datetime64[ns]
I found out the type of df['TIME'] is pandas.core.series.Series
Now I have no idea about how to continue. Please give me some ideas, hints or any instructions. Thank you very much!
From your example it seems you want to floor to the hour, instead of round? In any case, first make sure your TIME column is of datetime dtype.
df['TIME'] = pd.to_datetime(df['TIME'])
Now floor (or round) using the dt accessor and an offset alias:
df['newTIME'] = df['TIME'].dt.floor('H') # could use round instead of floor here
# df['newTIME']
# 0 2018-12-31 23:00:00
# 1 2018-12-31 23:00:00
# 2 2018-12-31 23:00:00
# 3 2018-12-31 23:00:00
# 4 2018-12-31 23:00:00
# Name: newTIME, dtype: datetime64[ns]
After that, you can format to string in the desired format, again using the dt accessor to access properties of a datetime series:
df['timestring'] = df['newTIME'].dt.strftime("%Y-%m-%d %I:%M:%S %p")
# df['timestring']
# 0 2018-12-31 11:00:00 PM
# 1 2018-12-31 11:00:00 PM
# 2 2018-12-31 11:00:00 PM
# 3 2018-12-31 11:00:00 PM
# 4 2018-12-31 11:00:00 PM
# Name: timestring, dtype: object
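If "rounded up" is meant literally (17:40 becoming 18:00 rather than the nearest hour), the same pattern works with ceil:
df['newTIME'] = df['TIME'].dt.ceil('H')  # always rounds up to the next full hour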

Add index weekday to pandas dataframe

I have the following data frame, which is indexed by date_time:
date_time rsvp_limit rsvp_yes dropout
2017-11-30 19:00:00 240 229 0.045833
2017-10-19 19:00:00 300 300 0.000000
2017-06-26 19:00:00 300 300 0.000000
When I try to add a weekday column to it, it somehow does not seem to succeed:
weekday_dropoouts = events['dropout'].copy()
weekday_dropoouts['weekday'] = weekday_dropoouts.index.weekday_name
weekday_dropoouts[:3]
Gives me:
date_time
2017-11-30 19:00:00 0.0458333
2017-10-19 19:00:00 0
2017-06-26 19:00:00 0
Name: dropout, dtype: object
What I'm trying to achieve is a bar plot per weekday, i.e. basically I'm trying to figure out on which weekday the event experiences the highest dropout.
I'm sure I'm missing something fundamental here, but I can't figure out what it is.
Could be a type issue?
Is weekday_dropoouts.index definitely a DatetimeIndex?
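A likely cause, sketched under the assumption that events has a DatetimeIndex: events['dropout'] is a Series, so the bracket assignment adds a row labelled 'weekday' instead of a column, which also flips the dtype to object. Selecting with a list keeps a DataFrame. (day_name() is the replacement for weekday_name, which newer pandas removed.)
weekday_dropouts = events[['dropout']].copy()  # DataFrame, not Series
weekday_dropouts['weekday'] = weekday_dropouts.index.day_name()
weekday_dropouts.groupby('weekday')['dropout'].mean().plot(kind='bar')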

Calculating daily average from irregular time series using pandas

I am trying to obtain daily averages from an irregular time series from a csv-file.
The data in the csv-file start at 13:00 on 20 September 2013 and run till 10:57 on 14 January 2014:
Time Values
20/09/2013 13:00 5.133540
20/09/2013 13:01 5.144993
20/09/2013 13:02 5.158208
20/09/2013 13:03 5.170542
20/09/2013 13:04 5.167899
20/09/2013 13:25 5.168780
20/09/2013 13:26 5.179351
...
I import them with:
import pandas as pd
data = pd.read_csv('<file name>', parse_dates={'Timestamp': ['Time']}, index_col='Timestamp')
This results in
Values
Timestamp
2013-09-20 13:00:00 5.133540
2013-09-20 13:01:00 5.144993
2013-09-20 13:02:00 5.158208
2013-09-20 13:03:00 5.170542
2013-09-20 13:04:00 5.167899
2013-09-20 13:25:00 5.168780
2013-09-20 13:26:00 5.179351
...
And then I do
dataDailyAv = data.resample('D', how = 'mean')
This results in
Values
Timestamp
2013-01-10 8.623744
2013-01-11 NaN
2013-01-12 NaN
2013-01-13 NaN
2013-01-14 NaN
...
In other words, the result contains dates that do not appear in the original data, and for some of these dates (e.g. 10 January 2013), there even appears a value.
Any ideas about what is going wrong?
Thanks.
Edit: apparently something goes wrong with the parsing of the date: 01/10/2013 is interpreted as 10 January 2013 instead of 1 October 2013. This can be solved by editing the date format in the csv-file, but is there a way to specify the date format in read_csv?
You want dayfirst=True, one of the many tweaks listed in the read_csv docs.
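Putting it together, a sketch: dayfirst=True handles the 20/09/2013-style dates, and on modern pandas the how= keyword is gone, so resample is chained with .mean().
import pandas as pd

data = pd.read_csv('<file name>', parse_dates={'Timestamp': ['Time']},
                   index_col='Timestamp', dayfirst=True)
dataDailyAv = data.resample('D').mean()  # modern spelling of how='mean'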
