I have a column of Unix timestamps. I want to convert this column to just dates in %Y-%m-%d format. Just to test the to_datetime() function I did the below, which works as expected and gives me the column in a format like 2015-05-12 00:11:30:
df['time'] = pd.to_datetime(df['time'], unit='s')
When I add the format argument, as below, I get an error:
df['time'] = pd.to_datetime(df['time'], unit='s', format='%d/%m/%Y')
The error is ValueError: time data 1431389490 does not match format '%d/%m/%Y'
How can I strip off the hours, minutes and seconds so I am only left with 2015-05-12?
If you want to extract just the date, you can do that in a second step after converting to datetime:
x = pd.to_datetime(pd.Series([1431389490]), unit='s')
# Datetime columns have a `.dt` attribute, with useful properties
# and methods for working with dates
x.dt.date
Out[7]:
0 2015-05-12
dtype: object
This discards the hours, minutes and seconds, but the resulting column/series is still easy to work with because each value is a datetime.date object (the dtype is shown as object); for example, you can subtract a certain date from the column to find the number of days between them.
If you want to keep the information about hours and minutes, but only display it differently, I'm not sure that's possible.
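For what it's worth, keeping the full timestamps and only showing the date does look possible. A minimal sketch, assuming the column is already datetime64; the column names date_str and day are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({"time": [1431389490, 1431475890]})
df["time"] = pd.to_datetime(df["time"], unit="s")

# Strings for display only; the original 'time' column keeps its hours/minutes/seconds
df["date_str"] = df["time"].dt.strftime("%Y-%m-%d")

# Still a datetime column, but with the time of day reset to 00:00:00
df["day"] = df["time"].dt.normalize()
```

The strftime column is plain text, so it is only good for display or export; the normalize column can still be used for date arithmetic.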
Related
I have both Date and Time which is being imported from MySQL. The Date column is object type while Time is timedelta64[ns] type. I wanted to combine them and put it as an index column on the DataFrame, so that I could put it as x-axis labels in the graphs. I tried a lot of ways but nothing seems to work out for me. Is there any way to do this effectively?
We first have to cast the object type to a datetime object:
df['Date'] = pd.to_datetime(df['Date'])
Then we can just add the time difference to the date by using:
df['Datetime'] = df['Date'] + df['Time']
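Put together, and with the combined column set as the index for plotting, this could look roughly like the sketch below. The sample rows and the value column are invented to mimic the MySQL import described in the question:

```python
import datetime
import pandas as pd

# Hypothetical data: dates as plain objects, times as timedeltas
df = pd.DataFrame({
    "Date": [datetime.date(2020, 1, 1), datetime.date(2020, 1, 2)],
    "Time": pd.to_timedelta(["09:30:00", "14:15:00"]),
    "value": [10, 20],
})

df["Date"] = pd.to_datetime(df["Date"])
df["Datetime"] = df["Date"] + df["Time"]

# Use the combined column as the index so plots pick it up as the x-axis
df = df.set_index("Datetime")
```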
I am aware there are multiple answers for string/object to datetime conversion. I have tried most of them, but I am still not able to get the desired result.
I have dates in the format 2024-08-01 00:00:00.0000000 and I want only the date part, in the format 2024-08-01.
My dataframe looks like this (Date is of type object):
Date
2024-08-01 00:00:00.0000000
2024-09-01 00:00:00.0000000
Using the answers provided in stackoverflow, I performed:
from dateutil.parser import parse
df['DATE'] = df['DATE'].apply(lambda x : parse(x)) #This will give in format 2024-08-01 00:00:00 of dtype datetime.
Then I use strftime to convert it into %Y-%m-%d format.
import datetime as dt  # assumed: 'dt' is the datetime module

def tz_datetime(date_tz):
    date_obj = dt.datetime.strftime(date_tz, '%Y-%m-%d')
    return date_obj
df['DATE'] = df['DATE'].apply(tz_datetime)
My DATE column is of object dtype now.
df.head() gives me:
DATE
2024-08-01
2024-09-01
Then I use pd.to_datetime to convert it into datetime object.
df['DATE'] = pd.to_datetime(df['DATE'], format="%Y-%m-%d")
I also use floor() option to get the date in required date format.
df['DATE'] = df['DATE'].dt.floor('d')
Now the DATE dtype is datetime64[ns], and df.head() also shows the required format.
But when I export to CSV or call display(df) again, my DATE column shows a different format:
DATE
2024-10-01T00:00:00.000+0000
2024-08-01T00:00:00.000+0000
2024-07-01T00:00:00.000+0000
2024-06-01T00:00:00.000+0000
2017-10-01T00:00:00.000+0000
I have exhausted all options for getting this date in "%Y-%m-%d" format. Can someone please explain what I am doing wrong, and why the format looks correct when I call .head() but shows a different (actual) value of the DATE column when I display() the dataframe?
If you are not already using it, the date_format parameter might be what you're missing.
df.to_csv(filename, date_format='%Y-%m-%d')
https://stackoverflow.com/a/22798849/8328420
As you have demonstrated above, truncating the data to just 'Y-m-d' still gives the longer format 'Y-m-d H:M:S.f' (and an extra zero) when exporting because the date type hasn't changed. This means the output format for the date type hasn't changed either.
Also a side note: You want to avoid overwriting the data with strftime() just to get a different format as you may need the lost data in later analysis.
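A small sketch of the date_format approach, using an in-memory buffer in place of a real file name (the buffer and the sample dates are only for illustration):

```python
import io
import pandas as pd

df = pd.DataFrame({"DATE": pd.to_datetime(["2024-08-01", "2024-09-01"])})

buf = io.StringIO()  # stands in for a real file path
df.to_csv(buf, index=False, date_format="%Y-%m-%d")
csv_text = buf.getvalue()

# Only the written text is shortened; the column itself is still a datetime column
print(csv_text)
```

This way the formatting happens at export time and nothing in the dataframe is overwritten.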
Two of the columns in my dataset are hour and mins as integers. Here's a snippet of the dataset.
I'm creating a timestamp through the following code:
TIME = pd.to_timedelta(df["hour"], unit='h') + pd.to_timedelta(df["mins"], unit='m')
#df['TIME'] = TIME
df['TIME'] = TIME.astype(str)
I convert TIME to string format because I'm exporting the dataframe to MS Excel which doesn't support timedelta format.
Now I want timestamps for every minute.
For that, I want to fill the missing minutes and add zero to the TOTAL_TRADE_RATE against them, for which I first have to set the TIME column as index. I'm applying this:
df = df.set_index('TIME')
df.index = pd.DatetimeIndex(df.index)
df.resample('60s').sum().reset_index()
but it's giving the following error:
Unknown string format: 0 days 09:33:00.000000000
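The error seems to come from the fact that strings like 0 days 09:33:00 are durations, not dates, so pd.DatetimeIndex cannot parse them. One possible way around it, sketched below with made-up sample rows, is to resample on the timedelta values directly and convert to strings only just before the Excel export:

```python
import pandas as pd

# Hypothetical sample; the real dataset has more rows and columns
df = pd.DataFrame({
    "hour": [9, 9, 9],
    "mins": [33, 35, 36],
    "TOTAL_TRADE_RATE": [1.0, 2.0, 3.0],
})

time = pd.to_timedelta(df["hour"], unit="h") + pd.to_timedelta(df["mins"], unit="m")

# Resample on the timedelta index; minutes with no rows sum to 0
per_minute = (
    df.set_index(time)[["TOTAL_TRADE_RATE"]]
      .resample("60s")
      .sum()
)

# Convert to strings only at the end, for the Excel export
per_minute.index = per_minute.index.astype(str)
per_minute = per_minute.rename_axis("TIME").reset_index()
```

Here the missing minute 09:34 appears with a TOTAL_TRADE_RATE of 0, which is what the question asks for.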
I have a column in my dataframe that lists time in HH:MM:SS. When I check dtype on the column, it comes up as dtype('O'), and I want to be able to use it as the x-axis for plotting some of my other signals. I saw previous documentation on using to_datetime and tried to use that to convert it to a usable time format for matplotlib.
Used pandas version is 0.18.1
I used:
time=pd.to_datetime(df.Time,format='%H:%M:%S')
where the output then becomes:
time
0 1900-01-01 00:00:01
and is carried out for the rest of the data points in the column.
Even though I specified just hours, minutes, and seconds, I am still getting a date. Why is that? I also tried
time.hour()
just to extract the hour portion but then I get an error that it doesn't have an 'hour' attribute.
Any help is much appreciated! Thanks!
Now in 2019, using pandas 0.25.0 and Python 3.7.3.
(Note : Edited answer to take plotting in account)
Even though I specified just hours, minutes, and seconds, I am still getting a date. Why is that?
According to the pandas documentation, I think it's because in a pandas Timestamp (the equivalent of datetime) the year, month and day arguments are mandatory, while hour, minute and second are optional.
Therefore, if you convert your object-dtype column to datetime, every value must have a year-month-day part; if you don't supply one, it defaults to 1900-01-01.
Since you also have a Date column in your sample, you can use it to have a datetime column with the right dates that you can use to plot :
import pandas as pd
df['Time'] = df.Date + " " + df.Time
df['Time'] = pd.to_datetime(df['Time'], format='%m/%d/%Y %H:%M:%S')
df.plot('Time', subplots=True)
With this your 'Time' column will display values like : 2016-07-25 01:12:07 and its dtype is datetime64[ns].
That said, if you plot day by day and only want to compare times within a day (not dates plus times), the default date is harmless as long as it is the same for every row: the times are still compared correctly, just on a placeholder day.
And in the unlikely case that you still want a time-only column, this is the reverse operation:
import pandas as pd
df['Time-only'] = pd.to_datetime(df['Time'], format='%H:%M:%S').dt.time
As explained before, it doesn't have a date (year-month-day), so it cannot be a datetime column; the result has object dtype and holds datetime.time values.
You can extract a time object like:
import pandas as pd
df = pd.DataFrame([['12:10:20']], columns=['time'])
time = pd.to_datetime(df['time'], format='%H:%M:%S').dt.time[0]
After which you can extract desired properties as:
hour = time.hour
(Source)
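If you need the hour for the whole column rather than a single value, the `.dt` accessor should do it. A short sketch with invented sample times; note that on a Series `hour` is a property under `.dt`, not a method, which is likely why `time.hour()` fails:

```python
import pandas as pd

df = pd.DataFrame({"Time": ["00:00:01", "13:45:10"]})
parsed = pd.to_datetime(df["Time"], format="%H:%M:%S")

# No date was given, so pandas fills in the default 1900-01-01
print(parsed.iloc[0])

# On a Series, datetime fields live under .dt and are properties, not methods
hours = parsed.dt.hour
```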
How can I drop rows from DataFrame df if the dates in df['maturity_dt'] are earlier than today's date?
I am currently doing the following:
import datetime

todays_date = datetime.date.today()
datenow = datetime.datetime.combine(todays_date, datetime.datetime.min.time())  # convert to datetime
for i, row in df.iterrows():
    # drop every row whose maturity date is before today
    if datetime.datetime.strptime(row['maturity_dt'], '%Y-%m-%d %H:%M:%S.%f') < datenow:
        df.drop(i, inplace=True)
However, it's taking too long and I was hoping to do something like: df = df[datetime.datetime.strptime(df['maturity_dt'], '%Y-%m-%d %H:%M:%S.%f') < datenow], but this results in the error TypeError: must be str, not Series
Thank You
Haven't tried it but maybe the pandas native functions will iterate faster. Something like:
df['dt'] = pandas.to_datetime(df['maturity_dt'])
newdf = df.loc[df['dt'] >= pandas.Timestamp(todays_date)].copy()
Instead of parsing the date in each row, you could format your comparison date in the same format as these dates are stored and then you could just do a string comparison.
Also, if there is a way to drop multiple rows in a single call, you could use your loop just to gather the indices of those rows to be dropped, then use that call to drop them in bunches.
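For reference, dropping the rows in a single vectorized step could look like the sketch below. The sample rows are invented, and it assumes maturity_dt is stored as strings in the format given in the question:

```python
import pandas as pd

# Hypothetical data in the same string format as the question
df = pd.DataFrame({
    "maturity_dt": [
        "2000-01-01 00:00:00.000000",
        "2200-01-01 00:00:00.000000",
    ]
})

today = pd.Timestamp.today().normalize()  # midnight today
maturity = pd.to_datetime(df["maturity_dt"], format="%Y-%m-%d %H:%M:%S.%f")

# Keep only rows maturing today or later; one vectorized comparison, no loop
df = df[maturity >= today]
```

The whole column is parsed once and compared in one go, which avoids both the per-row strptime call and the per-row drop.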