This question already has answers here:
removing time from date&time variable in pandas?
(3 answers)
Closed last year.
solar["DATE"]= solar['DATE'].strftime('%Y-%m-%d')
display(solar)
I want to remove the time function from the DATE column. I only want the date, how do I get rid of it but keep the date?
Screenshot: https://i.stack.imgur.com/8G8Jg.png
The error I get is below:
AttributeError: 'Series' object has no attribute 'strftime'
Judging by the error, solar['DATE'] is a pandas Series, and strftime is not defined on a Series. To transform the individual values you can use the .apply() function.
You can do it via:
#If the values are already datetime objects:
solar['DATE'] = solar['DATE'].apply(lambda d: d.date())
#If you want the result as a formatted string instead:
solar['DATE'] = solar['DATE'].apply(lambda d: d.strftime('%Y-%m-%d'))
What I came up with is what follows:
import pandas as pd
import datetime
date = pd.date_range("2018-01-01", periods=500, freq="H")
dataframe = pd.DataFrame({"date":date})
def removeTime(date):
    dateStr = str(date)  # convert the Timestamp to a string first; you may not need this line in your own code
    dateWithoutTime = datetime.datetime.strptime(dateStr, "%Y-%m-%d %H:%M:%S").strftime("%Y-%m-%d")
    return dateWithoutTime

dataframe["date"] = dataframe["date"].apply(removeTime)
dataframe.head()
Note that in order to have example data to work with, I have generated 500 periods of dates. You probably do not need to use my dataframe. So just use the rest of the code.
Output
         date
0  2018-01-01
1  2018-01-01
2  2018-01-01
3  2018-01-01
4  2018-01-01
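The same result is available without a helper function; as a sketch on the same kind of generated sample data, .dt.date drops the time component in one step:

```python
import pandas as pd

date = pd.date_range("2018-01-01", periods=5, freq="H")
dataframe = pd.DataFrame({"date": date})

# .dt.date converts each timestamp to a plain datetime.date (time dropped)
dataframe["date"] = dataframe["date"].dt.date
print(dataframe["date"].astype(str).tolist())  # ['2018-01-01', '2018-01-01', ...]
```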
This question already has answers here:
Keep only date part when using pandas.to_datetime
(13 answers)
Closed last month.
Can you please help me with the following issue? When I import a csv file, I get a dataframe something like this:
df = pd.DataFrame(['29/12/17',
'30/12/17', '31/12/17', '01/01/18', '02/01/18'], columns=['Date'])
What I want is to convert the `Date` column of df into a datetime object. So I use the code below:
df['date_f'] = pd.to_datetime(df['Date'])
What I get is something like this:
df1 = pd.DataFrame({'Date': ['29/12/17', '30/12/17', '31/12/17', '01/01/18', '02/01/18'],
'date_f':['2017-12-29T00:00:00.000Z', '2017-12-30T00:00:00.000Z', '2017-12-31T00:00:00.000Z', '2018-01-01T00:00:00.000Z', '2018-02-01T00:00:00.000Z']})
The question is, why am I getting date_f in the following format ('2017-12-29T00:00:00.000Z') and not just ('2017-12-29') and how can I get the later format ('2017-12-29')?
P.S.
If you use the code above, it gives date_f in the format that I need. However, if the data is imported from a csv, date_f comes out in the format specified above.
Use dt.date:
df['date_f'] = pd.to_datetime(df['Date']).dt.date
or
df['date_f'] = pd.to_datetime(df['Date'], utc=False)
Both cases display the same output:
Date date_f
0 29/12/17 2017-12-29
1 30/12/17 2017-12-30
2 31/12/17 2017-12-31
3 01/01/18 2018-01-01
4 02/01/18 2018-02-01
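Note that the two variants above differ in dtype even when the display looks the same, which can matter downstream. A small sketch (hypothetical data matching the question) to illustrate:

```python
import pandas as pd

df = pd.DataFrame({"Date": ["29/12/17", "30/12/17"]})
parsed = pd.to_datetime(df["Date"], format="%d/%m/%y")

as_date = parsed.dt.date             # python date objects, dtype object
as_midnight = parsed.dt.normalize()  # still datetime64[ns], time zeroed
print(as_date.dtype, as_midnight.dtype)  # object datetime64[ns]
```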
This question already has answers here:
Python pandas integer YYYYMMDD to datetime
(2 answers)
Closed 5 months ago.
I am trying to convert a pandas DateTime (UTC) column, whose values look like df_1['MESS_DATUM'] = 202209250000, to unix time. My code looks like this:
df_1['MESS_DATUM'] = calendar.timegm(df_1['MESS_DATUM'].timetuple())
print(datetime.utcfromtimestamp(df_1['MESS_DATUM']))
But I am getting this error "AttributeError: 'Series' object has no attribute 'timetuple'".
I have used the method below as well, but my time is in UTC, which I guess is why it is not giving me the right unix time:
df_1['MESS_DATUM'] = pd.to_datetime(df_1['MESS_DATUM'])
(df_1['MESS_DATUM'] - pd.Timestamp("1970-01-01")) // pd.Timedelta('1s')
print(df_1['MESS_DATUM']) #it gives me the following datetime in unix form
1970-01-01 00:03:22.209252150
I tried the above method for a single datetime string as shown below and it works, but for the whole datetime column it gives me the value 1970-01-01 00:03:22.209252150:
dates = pd.to_datetime(['2022-09-15 13:30:00'])
# calculate unix datetime
dates = (dates - pd.Timestamp("1970-01-01")) // pd.Timedelta('1s')
print(dates) # Int64Index([1663248600], dtype='int64')
I tried this method as well, which again gives me the wrong unix time:
df_1['MESS_DATUM'] = pd.DatetimeIndex(df_1['MESS_DATUM']).astype(np.int64) / 1000000
print(df_1['MESS_DATUM'])
202.209252 # this is the unixtime I get
Any helpful solution will be highly appreciated.
You could convert a single value using the datetime library:
d = 202209250000
import datetime
datetime.datetime.strptime(str(d),'%Y%m%d%H%M').timestamp()
Converting the whole column can be done using df.apply:
df = pd.DataFrame({'MESS_DATUM': [202209250000,202209260000,202209270000]})
df['MESS_DATUM'] = df['MESS_DATUM'].apply(lambda x: datetime.datetime.strptime(str(x),'%Y%m%d%H%M').timestamp())
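One caveat: datetime.timestamp() interprets a naive datetime in the machine's local timezone, so for data that is truly UTC a timezone-aware, vectorized variant may be safer. A sketch with made-up sample values:

```python
import pandas as pd

df = pd.DataFrame({"MESS_DATUM": [202209250000, 202209260000]})

# parse the integer as a YYYYMMDDHHMM string and mark it as UTC,
# then count whole seconds since the epoch
dt = pd.to_datetime(df["MESS_DATUM"].astype(str), format="%Y%m%d%H%M", utc=True)
df["unix"] = (dt - pd.Timestamp("1970-01-01", tz="UTC")) // pd.Timedelta("1s")
print(df["unix"].tolist())  # [1664064000, 1664150400]
```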
This question already has answers here:
Convert Excel style date with pandas
(3 answers)
Closed 1 year ago.
Please, I need a solution to this problem: I have a field with values like 43390, which is Excel's general (serial) date format. I need to convert it to a date like "d/m/yyyy".
here is the code I wrote :
trans_data['DATE'] = pd.to_datetime(trans_data['DATE'], format='%d-%m-%Y')
but I have this error:
ValueError: time data '43390' does not match format '%d-%m-%Y' (match)
I tried converting "43390" in LibreOffice and it came out as 2018-10-17 (the origin "0" is "30/12/1899"):
origin = pd.Timestamp("30/12/1899")
df["col"] = df["col"].apply(lambda x: origin + pd.Timedelta(days=x))
print(df)
Prints:
col
0 2018-10-17
1 2019-11-02
df used:
col
0 43390
1 43771
The following code might help you.
from datetime import timedelta
import pandas as pd
excel_date = '43390'
excel_date = int(excel_date)
python_date = pd.to_datetime('1900-01-01') + timedelta(days=excel_date - 2)
print(python_date)
The python_date object stores the date. Then you can change the format to the format you need.
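pandas can also do the serial-day arithmetic itself via the origin and unit arguments of to_datetime; a sketch using the sample values from the first answer:

```python
import pandas as pd

df = pd.DataFrame({"col": [43390, 43771]})

# interpret the values as days elapsed since Excel's origin, 1899-12-30
df["col"] = pd.to_datetime(df["col"], unit="D", origin="1899-12-30")
print(df["col"].astype(str).tolist())  # ['2018-10-17', '2019-11-02']
```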
I have a column of dates in the following format:
Jan-85
Apr-99
Nov-01
Feb-65
Apr-57
Dec-19
I want to convert this to a pandas datetime object.
The following syntax works to convert them:
pd.to_datetime(temp, format='%b-%y')
where temp is the pd.Series object of dates. The glaring issue here of course is that dates that are prior to 1970 are being wrongly converted to 20xx.
I tried updating the function call with the following parameter:
pd.to_datetime(temp, format='%b-%y', origin='1950-01-01')
However, I am getting the error:
Name: temp, Length: 42537, dtype: object' is not compatible with origin='1950-01-01'; it must be numeric with a unit specified
I tried specifying a unit as it said, but I got a different error citing that the unit cannot be specified alongside a format.
Any ideas how to fix this?
Just #DudeWah's logic, but improving upon the code:
def days_of_future_past(date, chk_y=pd.Timestamp.today().year):
    return date.replace(year=date.year - 100) if date.year > chk_y else date

temp = pd.to_datetime(temp, format='%b-%y').map(days_of_future_past)
Output:
>>> temp
0 1985-01-01
1 1999-04-01
2 2001-11-01
3 1965-02-01
4 1957-04-01
5 2019-12-01
6 1965-05-01
Name: date, dtype: datetime64[ns]
Gonna go ahead and answer my own question so others can use this solution if they come across this same issue. Not the greatest, but it gets the job done. It should work until 2069, so hopefully pandas will have a better solution to this by then lol
Perhaps someone else will post a better solution.
def wrong_date_preprocess(data):
    """Correct date issues with pre-1970 dates with whacky mon-yy format."""
    df1 = data.copy()
    dates = df1['date_column_of_interest']
    # use particular datetime format with data; ex: jan-91
    dates = pd.to_datetime(dates, format='%b-%y')
    # look at wrongly defined python dates (pre 1970) and get indices
    date_dummy = dates[dates > pd.Timestamp.today().floor('D')]
    idx = list(date_dummy.index)
    # fix wrong dates by offsetting 100 years back dates that defaulted to > 2069
    dummy2 = date_dummy.apply(lambda x: x.replace(year=x.year - 100)).to_list()
    dates.loc[idx] = dummy2
    df1['date_column_of_interest'] = dates
    return df1
Ultimately I want to calculate the number of days to the last day of the month from every date in df['start'] and populate the 'count' column with the result.
As a first step towards that goal the calendar.monthrange
method takes (year, month) arguments and returns a (first weekday, number of days) tuple.
I seem to be making a general mistake about applying functions to dataframes or Series objects. I would like to understand why this isn't working.
import numpy as np
import pandas as pd
import calendar
def last_day(row):
    return calendar.monthrange(row['start'].dt.year, row['start'].dt.month)
This line raises an AttributeError: "Timestamp object has no attribute 'dt'":
df['count'] = df.apply(last_day, axis=1)
this is what my dataframe looks like:
start count
0 2016-02-15 NaN
1 2016-02-20 NaN
2 2016-04-23 NaN
df.dtypes
start datetime64[ns]
count float64
dtype: object
Remove the .dt. The .dt accessor is needed when working on a vector (a whole Series); when you access an individual element, it is already a datetime-like Timestamp:
Code:
def last_day(row):
    return calendar.monthrange(row['start'].year, row['start'].month)
Why:
This apply calls last_day and passes a Series.
df['count'] = df.apply(last_day, axis=1)
In last_day you then select a single element of the series:
row['start'].year
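Putting the fix together as a runnable sketch (the sample frame is reconstructed from the question):

```python
import calendar
import pandas as pd

df = pd.DataFrame(
    {"start": pd.to_datetime(["2016-02-15", "2016-02-20", "2016-04-23"])}
)

def last_day(row):
    # row['start'] is a single Timestamp here, so .year/.month work directly
    return calendar.monthrange(row["start"].year, row["start"].month)

# each tuple is (weekday of the 1st, number of days in the month)
df["count"] = df.apply(last_day, axis=1)
print(df["count"].tolist())  # [(0, 29), (0, 29), (4, 30)]
```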
I would do it like this:
import numpy as np
import pandas as pd
from pandas.tseries.offsets import MonthEnd
## sample data
d = pd.DataFrame({'start':['2016-02-15','2016-02-20','2016-04-23']})
## solution
d['start'] = pd.to_datetime(d['start'])
d['end'] = d['start'] + MonthEnd(1)
d['count'] = (d['start'] - d['end']) / np.timedelta64(-1, 'D')
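If only the day counts are needed, .dt.daysinmonth gives the month length directly, so the distance to month end becomes a plain subtraction (a sketch on the same sample data):

```python
import pandas as pd

d = pd.DataFrame(
    {"start": pd.to_datetime(["2016-02-15", "2016-02-20", "2016-04-23"])}
)

# days remaining until the last day of each start date's month
d["count"] = d["start"].dt.daysinmonth - d["start"].dt.day
print(d["count"].tolist())  # [14, 9, 7]
```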