I have a dataframe column full datetime type that are in the format
2016Oct03:14:38:33
Right now, the data type of this column of the dataframe is String. I would like to convert it into datetime in order to be able perform some numerical operations like subtractions on them. I have tried specifying the format while using pd.to_datetime but as the time is in a 24 hr format, it is throwing up an error. What is the best way to do this? Thanks in advance!
There doesn't seem to be anything unusual about the time format at all; 24 hour is absolutely standard.
Just the normal strptime is fine:
datetime.strptime(my_date, '%Y%b%d:%H:%M:%S')
You need to_datetime with parameter format:
df = pd.DataFrame({'dates':['2016Oct03:14:38:33',
'2016Oct03:14:38:33',
'2016Oct03:14:38:33']})
print (df)
dates
0 2016Oct03:14:38:33
1 2016Oct03:14:38:33
2 2016Oct03:14:38:33
df.dates = pd.to_datetime(df.dates, format='%Y%b%d:%H:%M:%S')
print (df)
dates
0 2016-10-03 14:38:33
1 2016-10-03 14:38:33
2 2016-10-03 14:38:33
Duplicated question
Use datetime.strptime
Ex:
from datetime import datetime
date_object = datetime.strptime('2016Oct03:14:38:33', '%Y%b%d:%H:%M:%S')
Doc : https://docs.python.org/2/library/datetime.html
Related
I am calling some financial data from an API which is storing the time values as (I think) UTC (example below):
enter image description here
I cannot seem to convert the entire column into a useable date, I can do it for a single value using the following code so I know this works, but I have 1000's of rows with this problem and thought pandas would offer an easier way to update all the values.
from datetime import datetime
tx = int('1645804609719')/1000
print(datetime.utcfromtimestamp(tx).strftime('%Y-%m-%d %H:%M:%S'))
Any help would be greatly appreciated.
Simply use pandas.DataFrame.apply:
df['date'] = df.date.apply(lambda x: datetime.utcfromtimestamp(int(x)/1000).strftime('%Y-%m-%d %H:%M:%S'))
Another way to do it is by using pd.to_datetime as recommended by Panagiotos in the comments:
df['date'] = pd.to_datetime(df['date'],unit='ms')
You can use "to_numeric" to convert the column in integers, "div" to divide it by 1000 and finally a loop to iterate the dataframe column with datetime to get the format you want.
import pandas as pd
import datetime
df = pd.DataFrame({'date': ['1584199972000', '1645804609719'], 'values': [30,40]})
df['date'] = pd.to_numeric(df['date']).div(1000)
for i in range(len(df)):
df.iloc[i,0] = datetime.utcfromtimestamp(df.iloc[i,0]).strftime('%Y-%m-%d %H:%M:%S')
print(df)
Output:
date values
0 2020-03-14 15:32:52 30
1 2022-02-25 15:56:49 40
I have a dataframe with time column as string and I should convert it to a timestamp only with h:m:sec.ms . Here an example:
import pandas as pd
df=pd.DataFrame({'time': ['02:21:18.110']})
df.time= pd.to_datetime(df.time , format="%H:%M:%S.%f")
df # I get 1900-01-01 02:21:18.110
Without format flag, I get current day 2020-12-16. How can I get the stamp without year-month-day which seemingly always is included. Thanks!
If need processing values later by some datetimelike methods better is convert values to timedeltas by to_timedelta instead times:
df['time'] = pd.to_timedelta(df['time'])
print (df)
time
0 0 days 02:21:18.110000
You need this:
df=pd.DataFrame({'time': ['02:21:18.110']})
df['time'] = pd.to_datetime(df['time']).dt.time
In [1023]: df
Out[1023]:
time
0 02:21:18.110000
I have a column of timestamps that I would like to convert to datetime in my pandas dataframe. The format of the dates is %Y-%m-%d-%H-%M-%S which pd.to_datetime does not recognize. I have manually entered the format as below:
df['TIME'] = pd.to_datetime(df['TIME'], format = '%Y-%m-%d-%H-%M-%S')
My problem is some of the times do not have seconds so they are shorter
(format = %Y-%m-%d-%H-%M).
How can I get all of these strings to datetimes?
I was thinking I could add zero seconds (-0) to the end of my shorter dates but I don't know how to do that.
try strftime and if you want the right format and if Pandas can't recognize your custom datetime format, you should provide it explicetly
from functools import partial
df1 = pd.DataFrame({'Date': ['2018-07-02-06-05-23','2018-07-02-06-05']})
newdatetime_fmt = partial(pd.to_datetime, format='%Y-%m-%d-%H-%M-%S')
df1['Clean_Date'] = (df1.Date.str.replace('-','').apply(lambda x: pd.to_datetime(x).strftime('%Y-%m-%d-%H-%M-%S'))
.apply(newdatetime_fmt))
print(df1,df1.dtypes)
output:
Date Clean_Date
0 2018-07-02-06-05-23 2018-07-02 06:05:23
1 2018-07-02-06-05 2018-07-02 06:05:00
Date object
Clean_Date datetime64[ns]
I have a data sheet in which issue_d is a date column having values stored in a format - 11-Dec. On clicking any cell of the column, date is coming as 12/11/2018.
But while reading the csv file, issue_d is getting imported as 11-Dec. Year is not getting imported.
How do I get the issue_d column in format- d/m/y?
Code i tried -
import pandas
data=pandas.read_csv('Project_data.csv')
print(data)
checking issue_d column: data['issue_d']
result :
0 11-Dec
1 11-Dec
2 11-Dec
expected:
0 11-Dec-2018
1 11-Dec-2018
2 11-Dec-201
You can use to_datetime with add year to column:
df['issue_d'] = pd.to_datetime(df['issue_d'] + '-2018')
print (df)
issue_d
0 2018-12-11
1 2018-12-11
2 2018-12-11
A more 'controllable' way of getting the data is to first get the datetime from the data frame as normal, and then convert it:
dt = dt.strftime('%Y-%m-%d')
In this case, you'd put %d in front. strftime is a great technique because it allows the most customization when converting a datetime variable, and I used it in my tutorial book - if you're a beginner to python algorithms, you should definitely check it out!
After you do this, you can splice out each individual month, day, and year, and then use
strftime("%B")
to get the string-name of the month (e.g. "February").
Good Luck!
Python 3.6.0
I am importing a file with Unix timestamps.
I’m converting them to Pandas datetime and rounding to 10 minutes (12:00, 12:10, 12:20,…)
The data is collected from within a specified time period, but from different dates.
For our analysis, we want to change all dates to the same dates before doing a resampling.
At present we have a reduce_to_date that is the target for all dates.
current_date = pd.to_datetime('2017-04-05') #This will later be dynamic
reduce_to_date = current_date - pd.DateOffset(days=7)
I’ve tried to find an easy way to change the date in a series without changing the time.
I was trying to avoid lengthy conversions with .strftime().
One method that I've almost settled is to add the reduce_to_date and df['Timestamp'] difference to df['Timestamp']. However, I was trying to use the .date() function and that only works on a single element, not on the series.
GOOD!
passed_df['Timestamp'][0] = passed_df['Timestamp'][0] + (reduce_to_date.date() - passed_df['Timestamp'][0].date())
NOT GOOD
passed_df['Timestamp'][:] = passed_df['Timestamp'][:] + (reduce_to_date.date() - passed_df['Timestamp'][:].date())
AttributeError: 'Series' object has no attribute 'date'
I can use a loop:
x=1
for line in passed_df['Timestamp']:
passed_df['Timestamp'][x] = line + (reduce_to_date.date() - line.date())
x+=1
But this throws a warning:
C:\Users\elx65i5\Documents\Lightweight Logging\newmain.py:60: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
The goal is to have all dates the same, but leave the original time.
If we can simply specify the replacement date, that’s great.
If we can use mathematics and change each date according to a time delta, equally as great.
Can we accomplish this in a vectorized fashion without using .strftime() or a lengthy procedure?
If I understand correctly, you can simply subtract an offset
passed_df['Timestamp'] -= pd.offsets.Day(7)
demo
passed_df=pd.DataFrame(dict(
Timestamp=pd.to_datetime(['2017-04-05 15:21:03', '2017-04-05 19:10:52'])
))
# Make sure your `Timestamp` column is datetime.
# Mine is because I constructed it that way.
# Use
# passed_df['Timestamp'] = pd.to_datetime(passed_df['Timestamp'])
passed_df['Timestamp'] -= pd.offsets.Day(7)
print(passed_df)
Timestamp
0 2017-03-29 15:21:03
1 2017-03-29 19:10:52
using strftime
Though this is not ideal, I wanted to make a point that you absolutely can use strftime. When your column is datetime, you can use strftime via the dt date accessor with dt.strftime. You can create a dynamic column where you specify the target date like this:
pd.to_datetime(passed_df.Timestamp.dt.strftime('{} %H:%M:%S'.format('2017-03-29')))
0 2017-03-29 15:21:03
1 2017-03-29 19:10:52
Name: Timestamp, dtype: datetime64[ns]
I think you need convert df['Timestamp'].dt.date to_datetime, because output of date is python date object, not pandas datetime object:
df=pd.DataFrame({'Timestamp':pd.to_datetime(['2017-04-05 15:21:03','2017-04-05 19:10:52'])})
print (df)
Timestamp
0 2017-04-05 15:21:03
1 2017-04-05 19:10:52
current_date = pd.to_datetime('2017-04-05')
reduce_to_date = current_date - pd.DateOffset(days=7)
df['Timestamp'] = df['Timestamp'] - reduce_to_date + pd.to_datetime(df['Timestamp'].dt.date)
print (df)
Timestamp
0 2017-04-12 15:21:03
1 2017-04-12 19:10:52