Issue with pandas to_datetime function - python

I have a column of Unix timestamps. I want to convert this column to just dates in %Y-%m-%d format. Just to test the to_datetime() function I did the below, which works as expected and gives me the column in a format like 2015-05-12 00:11:30:
df['time'] = pd.to_datetime(df['time'], unit='s')
When I add the format argument, as below, I get an error:
df['time'] = pd.to_datetime(df['time'], unit='s', format='%d/%m/%Y')
The error is ValueError: time data 1431389490 does not match format '%d/%m/%Y'
How can I strip off the hours, minutes and seconds so I am only left with 2015-05-12?

If you want to extract just the date, you can do that in a second step after converting to datetime:
x = pd.to_datetime(pd.Series([1431389490]), unit='s')
# Datetime columns have a `.dt` attribute, with useful properties
# and methods for working with dates
x.dt.date
Out[7]:
0 2015-05-12
dtype: object
This will discard the information about hours, minutes and seconds, but you will still be able to work with the resulting column/series easily because each element is a datetime.date object, e.g. subtracting to find the number of days between your column and a certain date.
If you want to keep the information about hours and minutes, but only display it differently, I'm not sure that's possible.
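For the display-only case, one possible approach is sketched below. It assumes the column holds Unix timestamps in seconds, as in the question; the column names `day` and `day_text` are made up for illustration. Note that `format` is not passed, since `unit` already tells pandas how to interpret the integers.

```python
import pandas as pd

# Sample data: the Unix timestamp from the question (seconds since the epoch).
df = pd.DataFrame({"time": [1431389490]})

# Convert the integers to datetime64 values; no `format` argument is needed.
df["time"] = pd.to_datetime(df["time"], unit="s")

# Option 1: keep the datetime64 dtype but reset the time of day to midnight.
df["day"] = df["time"].dt.normalize()

# Option 2: build display strings; the original column keeps its full precision.
df["day_text"] = df["time"].dt.strftime("%Y-%m-%d")

print(df)
```

Option 2 yields plain strings, so it is probably best kept for presentation while calculations continue to use the original `time` column.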

Related

Change timedelta64[ns] to string

I have both Date and Time columns, which are imported from MySQL. The Date column is of object type while Time is of timedelta64[ns] type. I want to combine them and use the result as the index of the DataFrame, so that I can use it for x-axis labels in graphs. I tried a lot of ways but nothing seems to work for me. Is there any way to do this effectively?
We first have to cast the object type to a datetime object:
df['Date'] = pd.to_datetime(df['Date'])
Then we can just add the time difference to the date by using:
df['Datetime'] = df['Date'] + df['Time']
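Putting the two steps together, a minimal sketch might look like the following. The sample values and the `value` column are hypothetical stand-ins for the MySQL import, and the label format is just one choice.

```python
import pandas as pd

# Hypothetical sample mimicking the import: Date as strings (object dtype),
# Time as timedelta64[ns].
df = pd.DataFrame({
    "Date": ["2020-01-15", "2020-01-16"],
    "Time": pd.to_timedelta(["09:30:00", "14:45:10"]),
    "value": [1.5, 2.5],
})

# Step 1: cast the object column to datetime64.
df["Date"] = pd.to_datetime(df["Date"])

# Step 2: adding a timedelta to a datetime gives the combined timestamp.
df["Datetime"] = df["Date"] + df["Time"]

# Use the combined column as the index, e.g. for the x-axis of a plot.
df = df.set_index("Datetime")

# If string labels are needed, format the index explicitly.
labels = df.index.strftime("%Y-%m-%d %H:%M:%S")
print(list(labels))
```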

Unable to convert object to Date(Y-m-d) format in Python

I am aware there are multiple answers for string/object to datetime conversion. I have tried most of them, but I am still not able to get the desired result.
I have dates in the format 2024-08-01 00:00:00.0000000 and I want only the date part, in the format 2024-08-01.
My dataframe looks like this (Date is of type object):
Date
2024-08-01 00:00:00.0000000
2024-09-01 00:00:00.0000000
Using the answers provided in stackoverflow, I performed:
from dateutil.parser import parse
df['DATE'] = df['DATE'].apply(lambda x : parse(x)) #This will give in format 2024-08-01 00:00:00 of dtype datetime.
Then I use strftime to convert it into %Y-%m-%d format.
import datetime as dt
def tz_datetime(date_tz):
    # Format each datetime as a 'YYYY-MM-DD' string
    date_obj = dt.datetime.strftime(date_tz, '%Y-%m-%d')
    return date_obj
df['DATE'] = df['DATE'].apply(tz_datetime)
My DATE column is of object dtype now.
df.head() gives me:
DATE
2024-08-01
2024-09-01
Then I use pd.to_datetime to convert it into datetime object.
df['DATE'] = pd.to_datetime(df['DATE'], format="%Y-%m-%d")
I also use the floor() option to get the date in the required format.
df['DATE'] = df['DATE'].dt.floor('d')
Now the DATE dtype is datetime64[ns],
and df.head() also shows the required format.
But when I export it to CSV or call display(df) again, my DATE column shows a different format.
DATE
2024-10-01T00:00:00.000+0000
2024-08-01T00:00:00.000+0000
2024-07-01T00:00:00.000+0000
2024-06-01T00:00:00.000+0000
2017-10-01T00:00:00.000+0000
I have exhausted all options for getting this date in "%Y-%m-%d" format. Can someone please explain what I am doing wrong, and why the date format looks correct when I call .head() but shows a different (actual) value of the DATE column when I display() the dataframe?
If you are not already using it, the date_format parameter might be what you're missing.
df.to_csv(filename, date_format='%Y-%m-%d')
https://stackoverflow.com/a/22798849/8328420
As you have demonstrated above, truncating the values to just 'Y-m-d' still gives the longer 'Y-m-d H:M:S.f' format (and an extra zero) when exporting, because the column's data type hasn't changed, so the default output format for that type hasn't changed either.
Also, a side note: you want to avoid overwriting the data with strftime() just to get a different format, as you may need the lost data in later analysis.
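As a rough illustration of the date_format suggestion, the sketch below writes to an in-memory buffer instead of a file so the result can be inspected directly. The sample values are taken from the question; the buffer is only a stand-in for a real filename.

```python
import io

import pandas as pd

# Sample data in the format described in the question.
df = pd.DataFrame({
    "DATE": ["2024-08-01 00:00:00.0000000", "2024-09-01 00:00:00.0000000"],
})

# Parse once; the column stays datetime64 so no information is thrown away.
df["DATE"] = pd.to_datetime(df["DATE"])

# Control the text representation only at export time.
buffer = io.StringIO()
df.to_csv(buffer, index=False, date_format="%Y-%m-%d")
csv_text = buffer.getvalue()
print(csv_text)
```

The column in memory is unchanged by this; only the exported text is shortened.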

Convert string time into DatetimeIndex and then resample

Two of the columns in my dataset are hour and mins as integers. Here's a snippet of the dataset.
I'm creating a timestamp through the following code:
TIME = pd.to_timedelta(df["hour"], unit='h') + pd.to_timedelta(df["mins"], unit='m')
#df['TIME'] = TIME
df['TIME'] = TIME.astype(str)
I convert TIME to string format because I'm exporting the dataframe to MS Excel, which doesn't support the timedelta format.
Now I want a timestamp for every minute.
For that, I want to fill in the missing minutes and set TOTAL_TRADE_RATE to zero for them, for which I first have to set the TIME column as the index. I'm applying this:
df = df.set_index('TIME')
df.index = pd.DatetimeIndex(df.index)
df.resample('60s').sum().reset_index()
but it's giving the following error:
Unknown string format: 0 days 09:33:00.000000000
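The strings in TIME describe durations ("0 days 09:33:00"), not dates, which is why building a DatetimeIndex from them fails. One possible way around this, sketched under the assumption that the data looks like the hypothetical sample below, is to resample on the timedelta values themselves before converting anything to strings.

```python
import pandas as pd

# Hypothetical sample: minute 09:34 is missing and 09:35 appears twice.
df = pd.DataFrame({
    "hour": [9, 9, 9],
    "mins": [33, 35, 35],
    "TOTAL_TRADE_RATE": [1.0, 2.0, 3.0],
})

# Build the time of day as a timedelta, as in the question.
time = pd.to_timedelta(df["hour"], unit="h") + pd.to_timedelta(df["mins"], unit="min")

# Index by the timedelta values (not their string form) and resample per minute.
# Minutes without any rows get a sum of 0.
per_minute = df.set_index(time)[["TOTAL_TRADE_RATE"]].resample("1min").sum()
print(per_minute)
```

The conversion to strings for the Excel export can then be applied to the resampled result rather than to the raw data.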

Convert Python object column in dataframe to time without date using Pandas

I have a column in my dataframe that lists time in HH:MM:SS. When I run dtype on the column, it comes up with dtype('O') and I want to be able to use it as the x-axis for plotting some of my other signals. I saw previous documentation on using to_datetime and tried to use that to convert it to a usable time format for matplotlib.
The pandas version used is 0.18.1.
I used:
time=pd.to_datetime(df.Time,format='%H:%M:%S')
where the output then becomes:
time
0 1900-01-01 00:00:01
and is carried out for the rest of the data points in the column.
Even though I specified just hour,minutes,and seconds I am still getting date. Why is that? I also tried
time.hour()
just to extract the hour portion but then I get an error that it doesn't have an 'hour' attribute.
Any help is much appreciated! Thanks!
Now in 2019, using pandas 0.25.0 and Python 3.7.3.
(Note : Edited answer to take plotting in account)
Even though I specified just hour,minutes,and seconds I am still getting date. Why is that?
According to the pandas documentation, I think it's because in a pandas Timestamp (the equivalent of datetime) object, the arguments year, month and day are mandatory, while hour, minutes and seconds are optional.
Therefore, if you convert your object-type column into a datetime, it must have a year-month-day part. If you don't indicate one, it will be the default 1900-01-01.
Since you also have a Date column in your sample, you can use it to build a datetime column with the right dates that you can use to plot:
import pandas as pd
df['Time'] = df.Date + " " + df.Time
df['Time'] = pd.to_datetime(df['Time'], format='%m/%d/%Y %H:%M:%S')
df.plot('Time', subplots=True)
With this, your 'Time' column will display values like 2016-07-25 01:12:07 and its dtype is datetime64[ns].
That being said, if you plot day by day and only want to compare times within a day (not dates and times), having a default date should not be a problem as long as it is the same date for all times: the times will be compared correctly within a single day, even if that day is the wrong one.
And in the less likely case that you still want a time-only column, this is the reverse operation:
import pandas as pd
df['Time-only'] = pd.to_datetime(df['Time'], format='%H:%M:%S').dt.time
As explained before, it doesn't have a date (year-month-day) so it cannot be a datetime object; therefore this column will have object dtype.
You can extract a time object like:
import pandas as pd
df = pd.DataFrame([['12:10:20']], columns=["time"])
time = pd.to_datetime(df.time, format='%H:%M:%S').dt.time[0]
After which you can extract desired properties as:
hour = time.hour
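The behaviour described in these answers can be checked with a small sketch. The sample values below are made up, and `pd.to_timedelta` is shown only as one alternative that avoids the placeholder date; it is not something the answers above propose.

```python
import pandas as pd

# Made-up HH:MM:SS strings standing in for the question's Time column.
df = pd.DataFrame({"Time": ["00:00:01", "13:45:30"]})

# Parsing with a time-only format fills in the default date 1900-01-01.
parsed = pd.to_datetime(df["Time"], format="%H:%M:%S")

# On a Series, components are reached through the .dt accessor (no call parentheses).
hours = parsed.dt.hour

# Time-only values are plain datetime.time objects, so the column has object dtype.
times = parsed.dt.time

# Alternative: treat the values as offsets from midnight, which carries no date.
offsets = pd.to_timedelta(df["Time"])

print(parsed.iloc[0], hours.tolist(), times.iloc[1], offsets.iloc[1])
```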

Dropping rows from a Dataframe based on Date

How can I drop rows from DataFrame df if the dates in df['maturity_dt'] are earlier than today's date?
I am currently doing the following:
import datetime
todays_date = datetime.date.today()
datenow = datetime.datetime.combine(todays_date, datetime.datetime.min.time())  # Converting to datetime
for (i, row) in df.iterrows():
    if datetime.datetime.strptime(row['maturity_dt'], '%Y-%m-%d %H:%M:%S.%f') < datenow:
        df = df.drop(i)  # i is the row's index label; drop() returns a new DataFrame
However, it's taking too long and I was hoping to do something like: df = df[datetime.datetime.strptime(df['maturity_dt'], '%Y-%m-%d %H:%M:%S.%f') < datenow], but this results in the error TypeError: must be str, not Series
Thank You
Haven't tried it but maybe the pandas native functions will iterate faster. Something like:
df['dt'] = pandas.to_datetime(df['maturity_dt'])  # parse the whole column at once
newdf = df.loc[df['dt'] >= datenow].copy()  # keep only rows maturing today or later
Instead of parsing the date in each row, you could format your comparison date in the same format as these dates are stored and then you could just do a string comparison.
Also, if there is a way to drop multiple rows in a single call, you could use your loop just to gather the indices of those rows to be dropped, then use that call to drop them in bunches.
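A vectorized version of this idea might look like the sketch below. The helper name `drop_matured`, the `bond` column and the fixed cutoff are all hypothetical; a fixed cutoff is used so the result is reproducible, and in practice something like `pd.Timestamp.today().normalize()` could be passed instead.

```python
import pandas as pd


def drop_matured(df, cutoff):
    """Return a copy of df without rows whose maturity_dt is before cutoff."""
    # Parse the whole column in one call instead of row by row.
    maturity = pd.to_datetime(df["maturity_dt"])
    # A boolean mask selects the rows to keep in a single operation.
    return df.loc[maturity >= cutoff].copy()


# Hypothetical sample using the string format from the question.
sample = pd.DataFrame({
    "maturity_dt": ["2020-01-01 00:00:00.000000", "2030-06-30 12:00:00.000000"],
    "bond": ["A", "B"],
})

kept = drop_matured(sample, pd.Timestamp("2025-01-01"))
print(kept)
```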
