|Event Date|startTime|
|----------|---------|
|2022-11-23|0 days 08:30:00|
I got this when loading data from a SQL table into a DataFrame, using values from the columns of another DataFrame as query variables.
The startTime column came out like this, but I want only the time, 08:30:00. What should I do to get the required output?
The output I need looks like this:
|Event Date|startTime|
|----------|---------|
|2022-11-23|08:30:00|
I tried:
sql['startTime']=pd.to_datetime(df1['startTime']).dt.time
It shows this error:
TypeError: <class 'datetime.time'> is not convertible to datetime
I searched for a solution but did not find anything useful. I only came across a question about the opposite situation, and it had no information that applies to my case.
Add the timedelta to a datetime; the result has a time component that you can extract. For example:
import pandas as pd
df = pd.DataFrame({"Event Date": ["2022-11-23"],
"startTime": ["0 days 08:30:00"]})
# ensure correct datatypes; this might not be necessary in your case
df["Event Date"] = pd.to_datetime(df["Event Date"])
df["startTime"] = pd.to_timedelta(df["startTime"])
# create a datetime Series that includes the timedelta
df["startDateTime"] = df["Event Date"] + df["startTime"]
df["startDateTime"].dt.time
0 08:30:00
Name: startDateTime, dtype: object
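If no date column is available, the same idea should work with any fixed reference date, since only the time component is kept. This is a minimal sketch, assuming the column is already (or can be converted to) a timedelta of less than one day. The reference date 1970-01-01 is an arbitrary choice, not something from the question:

```python
import pandas as pd

df = pd.DataFrame({"startTime": ["0 days 08:30:00"]})
df["startTime"] = pd.to_timedelta(df["startTime"])

# Arbitrary reference date (an assumption); only the time part is kept
reference = pd.Timestamp("1970-01-01")
df["startTime"] = (reference + df["startTime"]).dt.time

print(df["startTime"].iloc[0])
```

The resulting column has dtype object and holds datetime.time values, so a timedelta of a day or more would lose its day part.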
Context
I have a pandas Series containing dates as strings (e.g. 2017-12-19 09:35:00). My goal is to convert this Series into timestamps (time in seconds since 1970).
The difficulty is that some values in this Series are corrupt and cannot be converted to a timestamp. In that case, they should be converted to None.
Code
import datetime
series = series.apply(lambda x: datetime.datetime.strptime(x, "%Y-%m-%d %H:%M:%S").timestamp())
Question
The code above works when all values are in the correct format; however, the data contains corrupt values.
How can I achieve my goal while converting all non-convertible data to None?
Pandas typically represents invalid timestamps with NaT (Not a Time). You can use pd.to_datetime with errors="coerce":
import pandas as pd
series = pd.Series(["2023-01-07 12:34:56", "error"])
out = pd.to_datetime(series, format="%Y-%m-%d %H:%M:%S", errors="coerce")
output:
0 2023-01-07 12:34:56
1 NaT
dtype: datetime64[ns]
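The question asks for seconds since 1970 rather than datetime values. One way to get there from the coerced result is sketched below. It assumes the strings represent UTC times (the strptime(...).timestamp() call in the question would instead interpret naive values in the local time zone), and the variable name seconds is only illustrative. Invalid entries come out as NaN rather than None:

```python
import pandas as pd

series = pd.Series(["2023-01-07 12:34:56", "error"])
parsed = pd.to_datetime(series, format="%Y-%m-%d %H:%M:%S", errors="coerce")

# Seconds since 1970-01-01, treating the naive values as UTC (assumption);
# NaT propagates through the arithmetic and ends up as NaN
seconds = (parsed - pd.Timestamp("1970-01-01")) / pd.Timedelta(seconds=1)
print(seconds)
```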
Create a function with try/except, like this:
import datetime

def to_timestamp(x):
    try:
        return datetime.datetime.strptime(x, "%Y-%m-%d %H:%M:%S").timestamp()
    except (ValueError, TypeError):
        # corrupt string or non-string value
        return None

series = series.apply(to_timestamp)
I have a column of dates in the following format:
Jan-85
Apr-99
Nov-01
Feb-65
Apr-57
Dec-19
I want to convert this to a pandas datetime object.
The following syntax works to convert them:
pd.to_datetime(temp, format='%b-%y')
where temp is the pd.Series object of dates. The glaring issue here of course is that dates that are prior to 1970 are being wrongly converted to 20xx.
I tried updating the function call with the following parameter:
pd.to_datetime(temp, format='%b-%y', origin='1950-01-01')
However, I am getting the error:
Name: temp, Length: 42537, dtype: object' is not compatible with origin='1950-01-01'; it must be numeric with a unit specified
I tried specifying a unit as it said, but I got a different error citing that the unit cannot be specified alongside a format.
Any ideas how to fix this?
This is just @DudeWah's logic, with improved code:
import pandas as pd

def days_of_future_past(date, chk_y=pd.Timestamp.today().year):
    # a date parsed into a future year is assumed to belong to the previous century
    return date.replace(year=date.year - 100) if date.year > chk_y else date

temp = pd.to_datetime(temp, format='%b-%y').map(days_of_future_past)
Output:
>>> temp
0 1985-01-01
1 1999-04-01
2 2001-11-01
3 1965-02-01
4 1957-04-01
5 2019-12-01
6 1965-05-01
Name: date, dtype: datetime64[ns]
Gonna go ahead and answer my own question so others can use this solution if they come across this same issue. Not the greatest, but it gets the job done. It should work until 2069, so hopefully pandas will have a better solution to this by then lol
Perhaps someone else will post a better solution.
def wrong_date_preprocess(data):
    """Correct date issues with pre-1970 dates in the mon-yy format."""
    df1 = data.copy()
    dates = df1['date_column_of_interest']
    # parse with the data's specific format, e.g. Jan-91
    dates = pd.to_datetime(dates, format='%b-%y')
    # select wrongly parsed dates (those that landed in the future) and get their indices
    date_dummy = dates[dates > pd.Timestamp.today().floor('D')]
    idx = list(date_dummy.index)
    # fix the wrong dates by shifting them back 100 years
    dummy2 = date_dummy.apply(lambda x: x.replace(year=x.year - 100)).to_list()
    dates.loc[idx] = dummy2
    df1['date_column_of_interest'] = dates
    return df1
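The same correction can probably be written without apply. The sketch below assumes, as both answers do, that no date in the data is genuinely in the future. The helper name fix_two_digit_years is made up for illustration. With %y, two-digit years 69-99 map to 1969-1999 and 00-68 map to 2000-2068, which is why Feb-65 first parses as 2065:

```python
import pandas as pd

def fix_two_digit_years(dates, today=None):
    """Shift dates that parsed into the future back by 100 years."""
    today = pd.Timestamp.today().floor("D") if today is None else today
    parsed = pd.to_datetime(dates, format="%b-%y")
    # keep dates up to today, replace later ones with the date 100 years earlier
    return parsed.where(parsed <= today, parsed - pd.DateOffset(years=100))

temp = pd.Series(["Jan-85", "Feb-65", "Dec-19"])  # sample values from the question
fixed = fix_two_digit_years(temp)
print(fixed)
```

Passing today explicitly makes the cutoff reproducible, for example when the data was collected at a known date.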
I am trying to import a dataframe from a spreadsheet using pandas and then carry out numpy operations with its columns. The problem is that I obtain the error specified in the title: TypeError: Cannot do inplace boolean setting on mixed-types with a non np.nan value.
The reason for this is that my dataframe contains a column with dates, like:
ID Date
519457 25/02/2020 10:03
519462 25/02/2020 10:07
519468 25/02/2020 10:12
... ...
And NumPy requires the values to be floating point numbers, like so:
ID Date
519457 43886.41875
519462 43886.42153
519468 43886.425
... ...
How can I make this change without having to modify the spreadsheet itself?
I have seen a lot of posts on the forum asking the opposite, and asking about the error, and read the docs on xlrd.xldate, but have not managed to do this, which seems very simple.
I am sure this kind of problem has been dealt with before, but have not been able to find a similar post.
The code I am using is the following
xls=pd.ExcelFile(r'/home/.../TwoData.xlsx')
xls.sheet_names
df=pd.read_excel(xls,"Hoja 1")
df["E_t"]=df["Date"].diff()
Any help or pointers would be really appreciated!
PS. I have seen solutions that require computing each target number by hand, but that is not possible in this case due to the size of the dataframes.
You can convert the date into a Unix timestamp. In Python, if you have a datetime object in UTC, you can call its timestamp() method to get a UTC timestamp. This method returns the time in seconds since the epoch for that datetime object.
Please see the example below:
from datetime import datetime, timezone
dt = datetime(2015, 10, 19)
timestamp = dt.replace(tzinfo=timezone.utc).timestamp()
print(timestamp)
1445212800.0
Please check the datetime module for more info.
I think you need:
#https://stackoverflow.com/a/9574948/2901002
#rewritten to vectorized solution
def excel_date(date1):
    temp = pd.Timestamp(1899, 12, 30)  # Note, not 31st Dec but 30th!
    delta = date1 - temp
    return (delta.dt.days) + (delta.dt.seconds) / 86400

df["Date"] = pd.to_datetime(df["Date"]).pipe(excel_date)
print (df)
ID Date
0 519457 43886.418750
1 519462 43886.421528
2 519468 43886.425000
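If the Date column arrives as day-first strings such as 25/02/2020 10:03 rather than as datetimes, it may be safer to state the format explicitly so that day and month are not swapped. This is a minimal sketch under that assumption. The sample frame reuses two rows from the question, and EXCEL_EPOCH is just an illustrative name for the 1899-12-30 origin used above:

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [519457, 519462],
    "Date": ["25/02/2020 10:03", "25/02/2020 10:07"],
})

EXCEL_EPOCH = pd.Timestamp(1899, 12, 30)  # origin of Excel serial dates, as above

# Explicit day-first format (assumption about how the strings are written)
parsed = pd.to_datetime(df["Date"], format="%d/%m/%Y %H:%M")

# Dividing a timedelta by one day yields fractional days as floats
df["Date"] = (parsed - EXCEL_EPOCH) / pd.Timedelta(days=1)
print(df)
```

After this conversion the column holds plain floats, so df["Date"].diff() should return differences in days.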
In a training data set, the datetime column has dtype object. The first row of this column is 2009-06-15 17:26:21 UTC. I tried splitting the data:
train['Date'] = train['pickup_datetime'].str.slice(0,11)
train['Time'] = test['pickup_datetime'].str.slice(11,19)
I did this so that I can split the date and time into two variables and change them to the datetime data type. I tried a lot of methods but could not get the result.
train['Date']=pd.to_datetime(train['Date'], format='%Y-%b-%d')
I also tried splitting the date, time and UTC:
train['DateTime'] = pd.to_datetime(train['DateTime'])
Please suggest code for this. I am a beginner.
Thanks in advance
I would try the following
import pandas as pd
#create some random dates matching your formatting
df = pd.DataFrame({"date": ["2009-06-15 17:26:21 UTC", "2010-08-16 19:26:21 UTC"]})
#convert to datetime objects
df["date"] = pd.to_datetime(df["date"])
print(df["date"].dt.date) #returns the date part without tz information
print(df["date"].dt.time) #returns the time part
Output:
0 2009-06-15
1 2010-08-16
Name: date, dtype: object
0 17:26:21
1 19:26:21
Name: date, dtype: object
For further information feel free to consult the docs:
dt.date
dt.time
For your particular case:
#convert to datetime object
df['pickup_datetime']= pd.to_datetime(df['pickup_datetime'])
# separate date and time
df['Date'] = df['pickup_datetime'].dt.date
df['Time'] = df['pickup_datetime'].dt.time
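When every value follows the same pattern, passing an explicit format may be faster and stricter than letting pandas infer it. This is a sketch assuming all rows end in the literal text UTC, as the first row in the question does. The two sample rows are taken from the example above:

```python
import pandas as pd

train = pd.DataFrame(
    {"pickup_datetime": ["2009-06-15 17:26:21 UTC", "2010-08-16 19:26:21 UTC"]}
)

# The literal " UTC" suffix is matched by the format string;
# utc=True marks the parsed values as UTC
train["pickup_datetime"] = pd.to_datetime(
    train["pickup_datetime"], format="%Y-%m-%d %H:%M:%S UTC", utc=True
)

# Split into separate date and time columns (both end up with dtype object)
train["Date"] = train["pickup_datetime"].dt.date
train["Time"] = train["pickup_datetime"].dt.time
print(train.dtypes)
```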
I have a column in my dataframe df_events called 'Program Date Time.' I've successfully created separate columns for EventDate and EventTime based on this using df_events.ProgramDateTime.dt.date and df_events.ProgramDateTime.dt.time.
My problem occurs when I try to select records between two dates. I seem to run into all kinds of type errors whatever I try.
I'm a relatively new Python/pandas user, just recently familiar with dataframes. I'm using Python3.7.
I have tried using strptime and even just trying to select records based on the original column ProgramDateTime.
I'm also writing this code in Sublime Text
import pandas as pd, numpy as np
from datetime import datetime
File_Path = 'path'
Event_csv = 'file.csv'
df_event = pd.read_csv(File_Path+Event_csv)
# Indicate analysis period
StartDate = datetime.strptime('2018-08-08', '%Y-%m-%d')
EndDate = datetime.strptime('2019-07-01', '%Y-%m-%d')
# Change appropriate column in Events dataframe to make sure in Datetime format.
df_event['ProgramDateTime'] = pd.to_datetime(df_event['ProgramDateTime'])
#Create separate columns for Event Date and Time in dataframe
df_event['EventDate'], df_event['EventTime'] = df_event.ProgramDateTime.dt.date, df_event.ProgramDateTime.dt.time
# Create dataframe of programs occurring only during analysis period
df_event_ap = df_event[df_event['EventDate']>=StartDate and df_event['EventDate']<=EndDate]
print(df_event_ap.dtypes)
print(df_event_ap.head(11))
I expect to see a new dataframe, df_event_ap, containing only those records that are between StartDate and EndDate.
Instead, the code fails at the point where it should select the records (the line below the last # comment).
I get this error:
TypeError: can't compare datetime.datetime to datetime.date
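The first thing I can spot is your df_event['EventDate']. Because .dt.date returns plain Python date objects, the column has dtype object and cannot be compared with datetime values. It needs to be converted back to datetime format: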
df_event['EventDate'] = pd.to_datetime(df_event['EventDate'])
Then do:
from datetime import datetime
StartDate = datetime.strptime('2018-08-08', '%Y-%m-%d')
EndDate = datetime.strptime('2019-07-01', '%Y-%m-%d')
Now that StartDate, EndDate and df_event['EventDate'] are all in the same format, combine the two conditions with & instead of and, wrapping each condition in parentheses:
df_event_ap = df_event[(df_event['EventDate']>=StartDate) & (df_event['EventDate']<=EndDate)]
You will now get both outputs:
Output for first print statement:
print(df_event_ap.dtypes)
ProgramDateTime datetime64[ns]
EventDate datetime64[ns]
EventTime object
dtype: object
Output for second print statement:
print(df_event_ap.head(11))
ProgramDateTime EventDate EventTime
0 2018-12-20 12:46:52 2018-12-20 12:46:52
2 2018-12-25 12:46:52 2018-12-25 12:46:52
4 2018-11-20 12:46:52 2018-11-20 12:46:52
5 2018-12-10 12:46:52 2018-12-10 12:46:52
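As an alternative that skips the separate EventDate column, the filter can probably be applied to ProgramDateTime directly. This is a minimal sketch, assuming ProgramDateTime is already datetime64 and the end date is meant to be inclusive. The three sample rows are invented for illustration:

```python
import pandas as pd

# Hypothetical sample data standing in for the CSV from the question
df_event = pd.DataFrame({
    "ProgramDateTime": pd.to_datetime([
        "2018-12-20 12:46:52",  # inside the analysis period
        "2018-07-01 09:00:00",  # before StartDate
        "2019-08-15 10:00:00",  # after EndDate
    ])
})

StartDate = pd.Timestamp("2018-08-08")
EndDate = pd.Timestamp("2019-07-01")

# normalize() sets the time to midnight but keeps the datetime64 dtype,
# so the comparison with the Timestamp bounds stays type-consistent
mask = df_event["ProgramDateTime"].dt.normalize().between(StartDate, EndDate)
df_event_ap = df_event[mask]
print(df_event_ap)
```

Normalizing before the comparison keeps events that happen later in the day on EndDate, which a direct comparison against midnight would drop.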