pandas to_datetime not working - python
I can't seem to apply to_datetime to a pandas DataFrame column, although I've done it dozens of times in the past. The following code tells me that an arbitrary value in the "Date Time" column is still a string after I try to convert the column to timestamps. errors='coerce' should turn any parsing failure into NaT, but instead I still have '2015-10-10 12:31:04' as a string.
import pandas as pd

df = pd.read_csv(...)
df["Date Time"] = pd.to_datetime(df["Date Time"], errors="coerce")
print(str(type(df["Date Time"][9])) + " 1")
Why does pandas neither raise an error nor convert the parsing failures to NaT?
Here are a few rows of the csv. The real file has a million rows coming from different sources, so the date formatting may well not be uniform; but in that case I would expect to_datetime to return NaT or raise an error, depending on the errors argument. (A minimal demonstration of the expected coerce behavior follows the sample rows.)
Accuracy,Activity,Altitude,Bearing,Date Time,Date(GMT),Description,Distance,Latitude,Longitude,Name,Speed,_FileNames,datenum
,,null,,,,,,sj,,,,C:/Users/Alexis/Dropbox/Location/Path Tracking Lite/aacy.csv,17054710926
0.0,,0.0,0.0,,,,0.00292115,50.67713796,4.61960233,,4.5,C:/Users/Alexis/Dropbox/Location/Path Tracking Lite/aars.csv,17054710926
0.0,,0.0,0.0,2015-01-31 15:10:,,,0.00404488,39.91572515,116.43714731,,5.4,C:/Users/Alexis/Dropbox/Location/Path Tracking Lite/abch.csv,17054710926
0.0,Walk/Run,0.0,0.0,2015-01-11 10:36:22,,,0,39.94002308,116.43548671,tfdeddd,0.0,C:/Users/Alexis/Dropbox/Location/Path Tracking Lite/abbj.csv,20150111
0.0,Walk/Run,0.0,0.0,2015-01-11 10:36:24,,,0.00968132,39.93998097,116.43558673,,2.7,C:/Users/Alexis/Dropbox/Location/Path Tracking Lite/abbj.csv,20150111
0.0,Walk/Run,0.0,0.0,2015-01-11 10:36:26,,,0.00768588,39.94003147,116.43552386,,4.5,C:/Users/Alexis/Dropbox/Location/Path Tracking Lite/abbj.csv,20150111
0.0,Walk/Run,0.0,0.0,2015-01-11 10:36:28,,,0.00239565,39.94007265,116.43551403,,3.6,C:/Users/Alexis/Dropbox/Location/Path Tracking Lite/abbj.csv,20150111
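For reference, here is a minimal sketch of the behavior the question expects, using made-up values modeled on the sample rows above. With errors="coerce", unparsable entries become NaT and the result dtype is datetime64[ns]:

import pandas as pd

s = pd.Series(["2015-01-11 10:36:22", "sj", None])
out = pd.to_datetime(s, errors="coerce")
print(out)
# 0   2015-01-11 10:36:22
# 1                   NaT
# 2                   NaT
# dtype: datetime64[ns]

If a value such as '2015-10-10 12:31:04' is still a plain str after conversion, one common explanation is that the converted column was never assigned back to the DataFrame actually being inspected.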
Related
Solutions not working for 'Out of bounds nanosecond timestamp' error with datetime in pandas
Using python and pandas, after importing data, a date column defaults to the object datatype, so when I try to convert it to datetime I get the error 'OutOfBoundsDatetime: Out of bounds nanosecond timestamp'. I've searched previous questions with the same error message, but none of them resolve my issue. All my dates are between 2021-01-01 and 2022-12-31 in a pandas column, and all of them have the format yyyy-mm-dd, i.e. no time component. Given both the time range (2021-22) and the absence of a time component, I am not sure why this error arises. I don't want to use the workaround column_name = pd.to_datetime(column_name, errors='coerce'), as I need to preserve all my dates; besides, my data is within the permitted range.
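When every date looks in range but to_datetime still overflows, the usual culprit is a handful of malformed values, such as a stray 9999-12-31 placeholder, hiding in the column. A sketch of one way to locate them without losing any valid dates (the column name date_col is a placeholder):

import pandas as pd

# parse into a throwaway series; the original column stays untouched
parsed = pd.to_datetime(df["date_col"], format="%Y-%m-%d", errors="coerce")

# rows that were non-empty but failed to parse are the suspects
bad = df.loc[parsed.isna() & df["date_col"].notna(), "date_col"]
print(bad.unique())

Once the offending values are found, they can be fixed or dropped deliberately instead of being silently coerced.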
Convert timestamp to datetime for a Vaex dataframe
I have a parquet file that I have loaded as a Vaex dataframe. The parquet file has a column for a timestamp in the format 2022-10-12 17:10:00+00:00. When I try to do any kind of analysis with my dataframe I get the following error: KeyError: "Unknown variables or column: 'isna(timestamp)'" When I remove that column everything works, so I assume the timestamp column is not in the correct format, but I have been having trouble converting it. I tried df['timestamp'] = pd.to_datetime(df['timestamp'].astype(str)) but I get the error <class 'vaex.expression.Expression'> is not convertible to datetime, so I assume I can't mix pandas and vaex. I am also having trouble producing a minimal reproducible example, since when I create a dataframe myself the datetime column is a string and everything works fine.
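One workaround, sketched under the assumption that the file fits in memory: do the timestamp conversion in pandas before handing the frame to Vaex, so Vaex never sees the problematic column type (the path and column name are placeholders):

import pandas as pd
import vaex

pdf = pd.read_parquet("data.parquet")  # placeholder path
# parse the ISO-formatted values into proper UTC timestamps while still in pandas
pdf["timestamp"] = pd.to_datetime(pdf["timestamp"], utc=True)
df = vaex.from_pandas(pdf)  # Vaex now receives a genuine datetime column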
Convert dates in an xlsx dataset from a YYYY.TEXT format (e.g. 2012.916667) to a normal date format (e.g. 01/01/2012)
I've read in an xlsx file using pandas.read_excel, and the dates in the dataset have come in as numbers like 2012.916667. I can't figure out what the actual dates are, as I don't have them, so I'm not sure what the numbers mean. Does anyone know how to convert them to normal dates? Thanks!
You can convert it to the regular pandas Timestamp format like so:
import pandas as pd
pd.to_datetime(2012.916667, unit='d', origin='1970-01-01')
# if the dates are loaded in a column, say, dates
pd.to_datetime(df['dates'], unit='d', origin='1970-01-01')
where the assumption is that the integer part is the number of days since the epoch (origin) and the decimal part is the fraction of a day. Since the data comes from an Excel file, these assumptions may hold, but you should first get them confirmed by the data owner and then use the appropriate parameters in the pandas function.
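Note that under the days-since-epoch assumption, 2012.916667 parses to a date in mid-1975, not in 2012. Given the example value, another plausible reading is a decimal year: 0.916667 is exactly 11/12, which suggests year + (month - 1)/12, i.e. December 2012. A sketch under that assumption (the column name dates is hypothetical):

import pandas as pd

def decimal_year_to_date(value):
    # assumes the integer part is the year and the fractional part
    # encodes the month as (month - 1) / 12, e.g. 2012.916667 -> 2012-12-01
    year = int(value)
    month = int(round((value - year) * 12)) + 1
    return pd.Timestamp(year=year, month=month, day=1)

df["dates"] = df["dates"].apply(decimal_year_to_date)

As with the epoch reading, confirm the encoding with the data owner before committing to it.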
Pandas timestamps in ISO format cause Exasol error when importing
When using pyexasol's import_from_pandas(df) for a DataFrame, df, which has a datetime column, Exasol (6.2) throws an error because it can't parse the ISO-formatted string representation of the dataframe column. Specifically, the "+00:00" final characters are unparsable by Exasol. My current workaround is to turn all pandas datetime columns into string columns, but that can cost a lot of time. What's the right way to import datetime columns from Pandas dataframes into an existing Exasol table with a TIMESTAMP column type?
PyEXASOL creator here. You may use the import_params dictionary argument to pass additional parameters to the pandas.to_csv() method, which is used internally. One such parameter is date_format; just pass a format compatible with Exasol. I'll consider adding this parameter by default. Hope it helps!
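A sketch of what that might look like, based on the answer's description. The connection details are placeholders, and the exact keyword argument may differ between pyexasol versions, so check the documentation for yours:

import pyexasol

# placeholder connection details
conn = pyexasol.connect(dsn="exasol-host:8563", user="sys", password="...", schema="MY_SCHEMA")

# date_format is forwarded to pandas.to_csv(); this format drops the
# "+00:00" offset that Exasol cannot parse
conn.import_from_pandas(df, "MY_TABLE", import_params={"date_format": "%Y-%m-%d %H:%M:%S"})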
Python Pandas to_datetime Out of bounds nanosecond timestamp on a pandas.datetime
I am using Python 2 (I am behind on migrating my code), so perhaps this issue has gone away. Using pandas, I can create a datetime like this:
import pandas as pd
big_date = pd.datetime(9999, 12, 31)
print big_date
# 9999-12-31 00:00:00
big_date2 = pd.to_datetime(big_date)
# ...
# Out of bounds nanosecond timestamp: 9999-12-31 00:00:00
I understand the reason for the error: a date that big cannot be represented at nanosecond resolution. I also know that big_date2 = pd.to_datetime(big_date, errors='ignore') would work. However, in my situation I have a column of what are supposed to be dates (read from SQL Server), and I do indeed want invalid data/dates changed to NaT; in effect, I was using pd.to_datetime as a validity check. To pandas, 9999-12-31 is a valid date on the one hand and not a valid date on the other, which means I can't use it and have had to come up with something else. I've played around with the arguments of pandas to_datetime and not been able to solve this, and I've looked at other questions of this nature without finding an answer.
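One workaround, sketched under the assumption that the raw values arrive as strings in a known format: validate with the standard library's datetime, which supports years up to 9999, and only then decide what counts as invalid. Pandas Timestamps are nanosecond-based, so anything after 2262-04-11 overflows no matter what the errors argument is. The column name and format below are assumptions:

from datetime import datetime

def parse_or_none(value, fmt="%Y-%m-%d %H:%M:%S"):
    # stdlib datetime handles years up to 9999, unlike
    # nanosecond-based pandas Timestamps (which max out in 2262)
    try:
        return datetime.strptime(value, fmt)
    except (ValueError, TypeError):
        return None  # genuinely unparsable values become None

df["DateColumn"] = df["DateColumn"].apply(parse_or_none)

The resulting column has object dtype rather than datetime64[ns], so vectorized datetime operations are lost, but 9999-12-31 survives and invalid entries are flagged.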
I had a similar issue and was able to find a solution. I have a pandas dataframe with one column that contains a datetime (retrieved from a database table where the column was a DateTime2 data type), but I need to be able to represent dates further in the future than the Timestamp.max value. Fortunately, I didn't need to worry about the time part of the datetime column: it was actually always 00:00:00 (I didn't create the database design, and yes, it probably should have been a Date data type rather than a DateTime2 data type). So I was able to get around the issue by converting the pandas dataframe column to a plain date type. For example:
df['DateColumn'] = datetime.datetime(9999, 12, 31).date()
sets all of the values in the column to the date 9999-12-31, and you no longer receive any errors when using this column. So, if you can afford to lose the time part of the dates you are working with, you can get around the limitation of datetime values in the dataframe by converting to a date.