Convert from float to datetime in Python - python

I have a dataframe column whose datatype is float64, and I want to change it to datetime64. But no matter which method I use, the result is always the same day: 1970-01-01. Any help, please?
import pandas as pd
from datetime import datetime
df = pd.DataFrame({'product_first_sold_date': [41245.0, 37659.0, 40487.0, 41701.0, 40649.0]})
cv = pd.to_datetime(df.product_first_sold_date)
cv
cv2 = df.product_first_sold_date.apply(lambda x: datetime.fromtimestamp(x).strftime('%m-%d-%Y') if x == x else None)
cv2
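To see why every value comes out as 1970-01-01, here is a small check (a sketch, assuming the column holds whole-number day counts like the ones above; the variable names are illustrative). With no `unit`, `pd.to_datetime` reads plain numbers as nanoseconds since the Unix epoch, and `datetime.fromtimestamp` reads them as seconds, so numbers around 40,000 only land a fraction of a millisecond, or a few hours, after 1970-01-01 00:00.

```python
import pandas as pd
from datetime import datetime, timezone

serials = pd.Series([41245.0, 37659.0, 40487.0])

# No unit given: pandas treats the numbers as nanoseconds after 1970-01-01
as_ns = pd.to_datetime(serials)
print(as_ns)  # every value falls on 1970-01-01, just after midnight

# fromtimestamp treats the number as seconds; 41245 s is roughly 11.5 hours
# (tz=timezone.utc is used only to make the result machine-independent)
as_s = datetime.fromtimestamp(41245.0, tz=timezone.utc)
print(as_s)  # 1970-01-01 11:27:25+00:00
```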

I believe you're dealing with Excel's date type, which counts days since 1900-01-01; as @Dishin pointed out, the effective starting point is 1899-12-30.
import pandas as pd
# sample data:
df = pd.DataFrame({'date':[41245,37659,40487]})
# convert: add the day counts to Excel's effective epoch, 1899-12-30
df['date'] = pd.to_timedelta(df.date, unit='D') + pd.to_datetime('1899-12-30')
Output:
date
0 2012-12-02
1 2003-02-07
2 2010-11-05
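The same conversion can also be written with `pd.to_datetime` alone, using its `unit` and `origin` parameters. A sketch, assuming the values are Excel serial days stored as float64 (as in the question) and that missing values should become NaT instead of being filtered with `x == x`; the NaN row and the `sold_date` column name are illustrative additions.

```python
import numpy as np
import pandas as pd

# Excel serial days as float64, plus one missing value for illustration
df = pd.DataFrame({'product_first_sold_date':
                   [41245.0, 37659.0, 40487.0, 41701.0, 40649.0, np.nan]})

# unit='D': the numbers are days; origin: the date that serial 0 maps to
df['sold_date'] = pd.to_datetime(df['product_first_sold_date'],
                                 unit='D', origin='1899-12-30')
print(df)  # the NaN row becomes NaT
```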

Related

Can't convert string object to time

I have a dataframe containing different recorded times as string objects, such as 1:02:45, 51:11, 54:24.
I can't convert them to time objects; this is the error I am getting:
"time data '49:49' does not match format '%H:%M:%S'"
This is the code I am using:
df_plot2 = df[['year', 'gun_time', 'chip_time']]
df_plot2['gun_time'] = pd.to_datetime(df_plot2['gun_time'], format = '%H:%M:%S')
df_plot2['chip_time'] = pd.to_datetime(df_plot2['chip_time'], format = '%H:%M:%S')
Thanks in advance for your help!
You can create a common format in the time Series by checking the string length and prepending a zero hour, '00:', where there are only minutes and seconds. Then parse to datetime. Ex:
import pandas as pd
s = pd.Series(["1:02:45", "51:11", "54:24"])
m = s.str.len() <= 5
s.loc[m] = '00:' + s.loc[m]
dts = pd.to_datetime(s)
print(dts)
0 2021-12-01 01:02:45
1 2021-12-01 00:51:11
2 2021-12-01 00:54:24
dtype: datetime64[ns]
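Since gun and chip times are durations rather than times of day, `pd.to_timedelta` may fit better than `pd.to_datetime`, and it avoids attaching an arbitrary date. A sketch reusing the padding idea from this answer; it assumes every value is either `h:mm:ss` or `mm:ss`, and counts colons instead of checking the length so that a short value such as `9:05` would also be padded. The names `only_min_sec` and `durations` are illustrative.

```python
import pandas as pd

s = pd.Series(["1:02:45", "51:11", "54:24"])

# Values with a single colon are mm:ss; prepend a zero hour so all are h:mm:ss
only_min_sec = s.str.count(':') == 1
s = s.where(~only_min_sec, '00:' + s)

# Parse as durations (Timedelta) instead of datetimes
durations = pd.to_timedelta(s)
print(durations)
print(durations.dt.total_seconds())
```

For the question's frame, the same steps would be applied to `df_plot2['gun_time']` and `df_plot2['chip_time']`.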
I believe it may be because %H expects zero-padded hours (01, 02, 03, etc.) instead of 1, 2, 3. To use your specific example, 1:02:45 may have to be in the 01:02:45 format for Python to be able to convert it to a datetime with %H:%M:%S.

convert yyyy-mm-dd to mmm-yy in dataframe python

I am trying to change how the month and year are presented.
I have a dataframe as below:
Date
2020-01-31
2020-04-30
2021-05-05
and I want to convert it to a month-year format.
The output that I am expecting is
Date
Jan-20
Apr-20
May-21
I tried to do it with datetime but it doesn't work.
pd.to_datetime(pd.Series(df['Date']), format='%mmm-%yy')
Use .dt.strftime() to change the display format. %b-%y is the format string for Mmm-YY:
df.Date = pd.to_datetime(df.Date).dt.strftime('%b-%y')
# Date
# 0 Jan-20
# 1 Apr-20
# 2 May-21
Or, if Date is the index (a DatetimeIndex has no .dt accessor, so call strftime on it directly):
df.index = pd.to_datetime(df.index).strftime('%b-%y')
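Both variants of this answer can be checked together on the question's sample data. A sketch, assuming `Date` holds ISO date strings; `MonthYear` and `df_idx` are illustrative names. Note that `strftime` returns strings, so the formatted column is no longer datetime64, and `%b` follows the current locale (English month abbreviations under Python's default C locale).

```python
import pandas as pd

# Sample data from the question, stored as strings
df = pd.DataFrame({'Date': ['2020-01-31', '2020-04-30', '2021-05-05']})

# Column: parse to datetime64, then format as abbreviated month + 2-digit year
df['MonthYear'] = pd.to_datetime(df['Date']).dt.strftime('%b-%y')

# Index: a DatetimeIndex exposes strftime directly (no .dt accessor)
df_idx = df.set_index('Date')
df_idx.index = pd.to_datetime(df_idx.index).strftime('%b-%y')

print(df['MonthYear'].tolist())
print(df_idx.index.tolist())
```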
import pandas as pd
date_sr = pd.to_datetime(pd.Series("2020-12-08"))
change_format = date_sr.dt.strftime('%b-%Y')
print(change_format)
reference https://docs.python.org/3/library/datetime.html
%Y-%m-%d changed to ('%b-%y')
import datetime
df['Date'] = df['Date'].apply(lambda x: datetime.datetime.strptime(x,'%Y-%m-%d').strftime('%b-%y'))
# reference https://docs.python.org/3/library/datetime.html
# %Y-%m-%d changed to ('%b-%y')

Modifying format of rows values in Pandas Data-frame

I have a dataset of 70000+ data points (see picture)
As you can see, in the 'date' column half of the values use a different (messier) format than the other half (cleaner). How can I convert the whole column to the format used in the second half of my data frame?
I know how to do it manually, but it will take ages!
Thanks in advance!
EDIT
df['date'] = df['date'].apply(lambda x: dt.datetime.fromtimestamp(int(str(x)) / 1000).strftime('%Y-%m-%d %H:%M:%S') if str(x).isdigit() else x)
Date is in a strange format
EDIT 2
Two date formats:
2012-01-01 00:00:00
2020-07-21T22:45:00+00:00
I've tried the below and it works. Note that it relies on two key assumptions:
1- Your dates follow one and ONLY ONE of the TWO formats in your example!
2- The final output is a string!
If so, this should do the trick; otherwise, it's a starting point that can be adapted to what you want it to look like:
import pandas as pd
import datetime
#data sample
d = {'date':['20090602123000', '20090602124500', '2020-07-22 18:45:00+00:00', '2020-07-22 19:00:00+00:00']}
#create dataframe
df = pd.DataFrame(data = d)
print(df)
date
0 20090602123000
1 20090602124500
2 2020-07-22 18:45:00+00:00
3 2020-07-22 19:00:00+00:00
#loop over records
for i, row in df.iterrows():
    #get date
    dateString = df.at[i, 'date']
    #check if it's the undesired format or the desired format
    #NOTE I'm using the '+' substring to identify that; this relies on my first assumption above that you only have two formats
    if '+' not in dateString:
        #reformat datetime
        #NOTE: this relies on my second assumption; I'm producing a string so I can append '+00:00'
        df.at[i, 'date'] = str(datetime.datetime.strptime(dateString, '%Y%m%d%H%M%S')) + '+00:00'
    else:
        continue
print(df)
date
0 2009-06-02 12:30:00+00:00
1 2009-06-02 12:45:00+00:00
2 2020-07-22 18:45:00+00:00
3 2020-07-22 19:00:00+00:00
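The same cleanup can be done without a Python loop by parsing each format as a group. A sketch under this answer's assumptions (only the two formats in the sample, with the compact `YYYYmmddHHMMSS` values taken to be UTC); unlike the loop, it produces a real datetime64 column with a UTC time zone rather than strings. `is_compact`, `compact` and `iso` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'date': ['20090602123000', '20090602124500',
                            '2020-07-22 18:45:00+00:00', '2020-07-22 19:00:00+00:00']})

# Rows written as a bare run of 14 digits (YYYYmmddHHMMSS)
is_compact = df['date'].str.fullmatch(r'\d{14}')

# Parse each group with its own format; assume the compact values are UTC
compact = pd.to_datetime(df.loc[is_compact, 'date'], format='%Y%m%d%H%M%S', utc=True)
iso = pd.to_datetime(df.loc[~is_compact, 'date'], utc=True)

# Recombine in the original row order as one tz-aware column
df['date'] = pd.concat([compact, iso]).sort_index()
print(df)
```

If the digit-only values are instead millisecond epoch timestamps, as the next answer assumes, they could be converted with `astype('int64')` and parsed with `pd.to_datetime(..., unit='ms', utc=True)` in place of the `format` parse.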
You can format the first part of your dataframe:
import datetime as dt
df['date'] = df['date'].apply(lambda x: dt.datetime.fromtimestamp(int(str(x)) / 1000).strftime('%Y-%m-%d %H:%M:%S') if str(x).isdigit() else x)
This checks whether all characters of the value are digits and, if so, formats the date like the second part.
EDIT
The timestamps seem to be in milliseconds, while fromtimestamp expects seconds, hence the / 1000.

Pandas Python: KeyError Date

I am importing a CSV file into Python and want pandas to create a datetime object automatically.
Specifically, I want the first column to be a datetime object in Python. The data looks like:
Date,cost
41330.66667,100
41331.66667,101
41332.66667,102
41333.66667,103
Current code looks like:
from datetime import datetime
import pandas as pd
data = pd.read_csv(r"F:\Sam\PJ\CSV2.csv")
data['Date'].apply(lambda x: datetime.strptime(x, '%d/%m/%Y'))
print(data)
This looks like an Excel datetime format, called a serial date. To convert from that serial date you can do this:
data['Date'].apply(lambda x: datetime.fromtimestamp( (x - 25569) *86400.0))
Which outputs:
>>> data['Date'].apply(lambda x: datetime.fromtimestamp( (x - 25569) *86400.0))
0 2013-02-25 10:00:00.288
1 2013-02-26 10:00:00.288
2 2013-02-27 10:00:00.288
3 2013-02-28 10:00:00.288
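Note that datetime.fromtimestamp returns local time, so the hour shown depends on the machine's time zone (which likely explains 10:00 above versus 16:00 below). To assign it to data['Date'], just do: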
data['Date'] = data['Date'].apply(lambda x: datetime.fromtimestamp( (x - 25569) *86400.0))
#df
Date cost
0 2013-02-25 16:00:00.288 100
1 2013-02-26 16:00:00.288 101
2 2013-02-27 16:00:00.288 102
3 2013-02-28 16:00:00.288 103
Unfortunately, read_csv does not cope with date columns given as numbers.
But the good news is that Pandas does have a suitable function to do it.
After read_csv call:
df.Date = pd.to_datetime(df.Date - 25569, unit='D').dt.round('ms')
As I understand it, your Date is actually the number of days since 30.12.1899
(plus a fractional part of the day).
The "correction factor" 25569 is the number of days from 30.12.1899 to the
Unix epoch (01.01.1970), so for Date == 0 the result is exactly the start of the Excel epoch.
Rounding to milliseconds (or maybe even seconds) is advisable.
Otherwise you will get odd effects from the inexact floating-point representation
of fractional parts of a day.
E.g. 0.33333333 corresponding to 8 hours can be computed as
07:59:59.999712.
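Putting this together with read_csv gives a self-contained sketch. It assumes the CSV looks like the sample in the question (an in-memory `io.StringIO` stands in for the real file) and uses `origin='1899-12-30'`, which does the same job as subtracting 25569; rounding to whole seconds removes the float residue from the fractional days.

```python
import io
import pandas as pd

# Stand-in for the CSV file from the question
csv_text = """Date,cost
41330.66667,100
41331.66667,101
41332.66667,102
41333.66667,103
"""

data = pd.read_csv(io.StringIO(csv_text))

# Serial days counted from 1899-12-30, then rounded to whole seconds
data['Date'] = pd.to_datetime(data['Date'], unit='D', origin='1899-12-30').dt.round('s')
print(data)
```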
Well, you have two problems here.
We don't know what data and columns the CSV has, but for pandas to pick up the date as a column, it must be a column in that CSV file.
apply doesn't work in place. You have to assign the result of apply back to the date column:
data['Date'] = data['Date'].apply(lambda x: datetime.strptime(x, '%d/%m/%Y'))

KeyError: Timestamp when converting date in column to date

I'm trying to convert a whole column of datetimes into dates to use in a condition later on. The following error keeps showing up:
KeyError: Timestamp('2010-05-04 10:15:55')
Tried multiple things but I'm currently stuck with the code below.
for d in df.column:
    pd.to_datetime(df.column[d]).apply(lambda x: x.date())
Also, how do I format the column so I can use it in a statement as follows:
df = df[df.column > 2015-05-28]
Just adding an answer in case anyone else ends up here:
First, let's create a dataframe with some dates, change the dtype to string and convert it back. The errors='ignore' argument returns the input unchanged if any value can't be parsed, so a stray John Smith in row x would leave the whole column as strings (errors='ignore' is also deprecated since pandas 2.2). With errors='coerce', unparseable values such as John Smith become NaT (Not a Time) while the rest of the column is converted.
# Create date range with frequency of a day
rng = pd.date_range(start='01/01/18', end ='01/01/19',freq='D')
#pass this into a dataframe
df = pd.DataFrame({'Date' : rng})
print(df.dtypes)
Date datetime64[ns]
#okay let's cast this into a str so we can convert it back
df['Date'] = df['Date'].astype(str)
print(df.dtypes)
Date object
# now lets convert it back #
df['Date'] = pd.to_datetime(df.Date,errors='ignore')
print(df.dtypes)
Date datetime64[ns]
# Okay lets slice the data frame for your desired date ##
print(df.loc[df.Date > '2018-12-29'])
Date
363 2018-12-30
364 2018-12-31
365 2019-01-01
The answer as provided by @Datanovice:
df['your column'] = pd.to_datetime(df['your column'], errors='ignore')
Then inspect the dtype; it should be datetime64. If so, just do:
df.loc[df['your column'] > 'your-date']
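The KeyError in the question comes from the loop: iterating over a Series yields its values, so `df.column[d]` looks each Timestamp up as an index label, which does not exist. Converting the whole column at once avoids this. A sketch, assuming a column literally named `column` as in the question; `day` and `recent` are illustrative names. It keeps datetime64 by using `.dt.normalize()` (midnight of each day) instead of `.date()`, so the comparison works against a Timestamp.

```python
import pandas as pd

df = pd.DataFrame({'column': ['2010-05-04 10:15:55',
                              '2015-06-01 08:00:00',
                              '2016-01-15 12:30:00']})

# Convert the whole column in one call instead of looping over its values
df['column'] = pd.to_datetime(df['column'])

# Drop the time of day but keep the datetime64 dtype
df['day'] = df['column'].dt.normalize()

# The cutoff must be a string or Timestamp; unquoted 2015-05-28 is not a date
recent = df[df['day'] > pd.Timestamp('2015-05-28')]
print(recent)
```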
