Error converting string to date field in Pandas - python

As you can infer from the above , When I try to convert the string , it gives error.
Tried below codes but got same error as,day is not defined,
df['day'] = pd.to_datetime(df['day'],format='%d %b %Y %H:%M:%S:%f')
As SO memeber suggested,I edited code but index stills the string, did not convert to day

If you don't want to create another column, then just this will do:
df.index = pd.to_datetime(df.index)

In your example, df['day'] actually appears to be your index. To fix this, you'd want to call pd.to_datetime on your index:
df.index = pd.to_datetime(df.index)
I could tell it was your index because pandas offsets the row height of the columns for the index column and the other columns. Take this example:
df = pd.DataFrame({'a':[1,2,3], 'b':['a','b','c']})
df.set_index('a', inplace=True)
outputs:
b
a
1 a
2 b
3 c

Related

Python Pandas. replace incorrect dates in a column with default when converting

I am trying to convert a date column to datetime64[ns] and I am using the following line
df["Date"] = df["Date"].astype('datetime64[ns]',errors="ignore")
If I don't use ignore key, the script crashes at this line.
Now some of the values in this column are incorrect and they may nor may not be date at all.
How do I convert all the date values to the date format and replace all non date values to
Edit: The coerce doesn't work anymore as per the compiler, only options are raise and ignore
Use to_datetime with errors='coerce' and then replace missing values:
datetime = '2000-01-01'
df["Date"] = pd.to_datetime(df["Date"], errors='coerce').fillna(pd.Timestamp(datetime))
If possible missing values in column and need not convert them:
datetime = '2000-01-01'
m = df["Date"].notna()
df.loc[m, "Date"] = (pd.to_datetime(df.loc[m, "Date"], errors='coerce')
.fillna(pd.Timestamp(datetime)))

Filtering DataFrame by month [duplicate]

Hi I am using pandas to convert a column to month.
When I read my data they are objects:
Date object
dtype: object
So I am first making them to date time and then try to make them as months:
import pandas as pd
file = '/pathtocsv.csv'
df = pd.read_csv(file, sep = ',', encoding='utf-8-sig', usecols= ['Date', 'ids'])
df['Date'] = pd.to_datetime(df['Date'])
df['Month'] = df['Date'].dt.month
Also if that helps:
In [10]: df['Date'].dtype
Out[10]: dtype('O')
So, the error I get is like this:
/Library/Frameworks/Python.framework/Versions/2.7/bin/User/lib/python2.7/site-packages/pandas/core/series.pyc in _make_dt_accessor(self)
2526 return maybe_to_datetimelike(self)
2527 except Exception:
-> 2528 raise AttributeError("Can only use .dt accessor with datetimelike "
2529 "values")
2530
AttributeError: Can only use .dt accessor with datetimelike values
EDITED:
Date columns are like this:
0 2014-01-01
1 2014-01-01
2 2014-01-01
3 2014-01-01
4 2014-01-03
5 2014-01-03
6 2014-01-03
7 2014-01-07
8 2014-01-08
9 2014-01-09
Do you have any ideas?
Thank you very much!
Your problem here is that to_datetime silently failed so the dtype remained as str/object, if you set param errors='coerce' then if the conversion fails for any particular string then those rows are set to NaT.
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
So you need to find out what is wrong with those specific row values.
See the docs
First you need to define the format of date column.
df['Date'] = pd.to_datetime(df.Date, format='%Y-%m-%d %H:%M:%S')
For your case base format can be set to;
df['Date'] = pd.to_datetime(df.Date, format='%Y-%m-%d')
After that you can set/change your desired output as follows;
df['Date'] = df['Date'].dt.strftime('%Y-%m-%d')
Your problem here is that the dtype of 'Date' remained as str/object. You can use the parse_dates parameter when using read_csv
import pandas as pd
file = '/pathtocsv.csv'
df = pd.read_csv(file, sep = ',', parse_dates= [col],encoding='utf-8-sig', usecols= ['Date', 'ids'],)
df['Month'] = df['Date'].dt.month
From the documentation for the parse_dates parameter
parse_dates : bool or list of int or names or list of lists or dict, default False
The behavior is as follows:
boolean. If True -> try parsing the index.
list of int or names. e.g. If [1, 2, 3] -> try parsing columns 1, 2, 3 each as a separate date column.
list of lists. e.g. If [[1, 3]] -> combine columns 1 and 3 and parse as a single date column.
dict, e.g. {‘foo’ : [1, 3]} -> parse columns 1, 3 as date and call result ‘foo’
If a column or index cannot be represented as an array of datetimes, say because of an unparseable value or a mixture of timezones, the column or index will be returned unaltered as an object data type. For non-standard datetime parsing, use pd.to_datetime after pd.read_csv. To parse an index or column with a mixture of timezones, specify date_parser to be a partially-applied pandas.to_datetime() with utc=True. See Parsing a CSV with mixed timezones for more.
Note: A fast-path exists for iso8601-formatted dates.
The relevant case for this question is the "list of int or names" one.
col is the columns index of 'Date' which parses as a separate date column.
#Convert date into the proper format so that date time operation can be easily performed
df_Time_Table["Date"] = pd.to_datetime(df_Time_Table["Date"])
# Cal Year
df_Time_Table['Year'] = df_Time_Table['Date'].dt.strftime('%Y')
train_data=pd.read_csv("train.csv",parse_dates=["date"])
I encountered a similar problem when trying to use pd.Series.dt.floor, although all the elements in my pd.Series were datetime.datetime instances (absolutely no NAs). I suspect it had to do with having tz-aware instances with different timezones.
My workaround, in order to take advantage of the pd.Timestamp.floor method was to define the following function:
def floor_datetime(base_datetime_aware, freq="2H"):
return pd.Timestamp(base_datetime_aware).floor(freq)
The I would just use pd.Series.apply to get every element of my Series through the function.
In the end, when you use the .dt accessor, the functions you would use are methods of the base classes, so using apply with a short custom function like mine may solve your problem!
When you write
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
df['Date'] = df['Date'].dt.strftime('%m/%d')
It can fixed

Extracting the hour from a time column in pandas

Suppose I have the following dataset:
How would I create a new column, to be the hour of the time?
For example, the code below works for individual times, but I haven't been able to generalise it for a column in pandas.
t = datetime.strptime('9:33:07','%H:%M:%S')
print(t.hour)
Use to_datetime to datetimes with dt.hour:
df = pd.DataFrame({'TIME':['9:33:07','9:41:09']})
#should be slowier
#df['hour'] = pd.to_datetime(df['TIME']).dt.hour
df['hour'] = pd.to_datetime(df['TIME'], format='%H:%M:%S').dt.hour
print (df)
TIME hour
0 9:33:07 9
1 9:41:09 9
If want working with datetimes in column TIME is possible assign back:
df['TIME'] = pd.to_datetime(df['TIME'], format='%H:%M:%S')
df['hour'] = df['TIME'].dt.hour
print (df)
TIME hour
0 1900-01-01 09:33:07 9
1 1900-01-01 09:41:09 9
My suggestion:
df = pd.DataFrame({'TIME':['9:33:07','9:41:09']})
df['hour']= df.TIME.str.extract("(^\d+):", expand=False)
"str.extract(...)" is a vectorized function that extract a regular expression pattern ( in our case "(^\d+):" which is the hour of the TIME) and return a Pandas Series object by specifying the parameter "expand= False"
The result is stored in the "hour" column
You can use extract() twice to feature out the 'hour' column
df['hour'] = df. TIME. str. extract("(\d+:)")
df['hour'] = df. hour. str. extract("(\d+)")

Rename last index of pandas dataframe and coerce dtypes

I have a doubt. I have a dataframe df and I need to change the index of the last row, this is probably easy but I'm having troubles doing it:
A B
2017-02-09 2.4 3.4
2017-02-10 3.4 3.2
2017-02-13 3.3 2.2
0 3.1 2.1
I need to change that "0" index at the last row with today's date 2017-02-14, how can I do this ?
I have tried:
df.set_index[len(df)-1](dt.date.today())
where dt stands for datetime. It does not work, any idea ? Thank you very much
You could use DF.rename with a dict mapping to change the labels along the index axis as shown:
import datetime
df.rename({df.index[-1]: datetime.date.today()}, inplace=True)
df.index = pd.to_datetime(df.index) # Convert dtype to DateTimeIndex
Check the dtypes:
df.index.dtype
dtype('<M8[ns]')
Note: You cannot use set_index[..] as these refer to the built-in methods that aren't subscriptable. Instead, you must use it like set_index(..) with enclosed parentheses.
A succinct manner to coerce the dtypes in one-line by converting the index to it's series representation and replacing just the last value with today's date:
df.index = pd.to_datetime(df.index.to_series().replace({df.index[-1]:datetime.date.today()}))
gives:
df.index
DatetimeIndex(['2017-02-09', '2017-02-10', '2017-02-13', '2017-02-14'],
dtype='datetime64[ns]', freq=None)
ind = df.index.tolist()
ind[len(ind)-1] = dt.date.today()
df.index = ind

AttributeError: Can only use .dt accessor with datetimelike values

Hi I am using pandas to convert a column to month.
When I read my data they are objects:
Date object
dtype: object
So I am first making them to date time and then try to make them as months:
import pandas as pd
file = '/pathtocsv.csv'
df = pd.read_csv(file, sep = ',', encoding='utf-8-sig', usecols= ['Date', 'ids'])
df['Date'] = pd.to_datetime(df['Date'])
df['Month'] = df['Date'].dt.month
Also if that helps:
In [10]: df['Date'].dtype
Out[10]: dtype('O')
So, the error I get is like this:
/Library/Frameworks/Python.framework/Versions/2.7/bin/User/lib/python2.7/site-packages/pandas/core/series.pyc in _make_dt_accessor(self)
2526 return maybe_to_datetimelike(self)
2527 except Exception:
-> 2528 raise AttributeError("Can only use .dt accessor with datetimelike "
2529 "values")
2530
AttributeError: Can only use .dt accessor with datetimelike values
EDITED:
Date columns are like this:
0 2014-01-01
1 2014-01-01
2 2014-01-01
3 2014-01-01
4 2014-01-03
5 2014-01-03
6 2014-01-03
7 2014-01-07
8 2014-01-08
9 2014-01-09
Do you have any ideas?
Thank you very much!
Your problem here is that to_datetime silently failed so the dtype remained as str/object, if you set param errors='coerce' then if the conversion fails for any particular string then those rows are set to NaT.
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
So you need to find out what is wrong with those specific row values.
See the docs
First you need to define the format of date column.
df['Date'] = pd.to_datetime(df.Date, format='%Y-%m-%d %H:%M:%S')
For your case base format can be set to;
df['Date'] = pd.to_datetime(df.Date, format='%Y-%m-%d')
After that you can set/change your desired output as follows;
df['Date'] = df['Date'].dt.strftime('%Y-%m-%d')
Your problem here is that the dtype of 'Date' remained as str/object. You can use the parse_dates parameter when using read_csv
import pandas as pd
file = '/pathtocsv.csv'
df = pd.read_csv(file, sep = ',', parse_dates= [col],encoding='utf-8-sig', usecols= ['Date', 'ids'],)
df['Month'] = df['Date'].dt.month
From the documentation for the parse_dates parameter
parse_dates : bool or list of int or names or list of lists or dict, default False
The behavior is as follows:
boolean. If True -> try parsing the index.
list of int or names. e.g. If [1, 2, 3] -> try parsing columns 1, 2, 3 each as a separate date column.
list of lists. e.g. If [[1, 3]] -> combine columns 1 and 3 and parse as a single date column.
dict, e.g. {‘foo’ : [1, 3]} -> parse columns 1, 3 as date and call result ‘foo’
If a column or index cannot be represented as an array of datetimes, say because of an unparseable value or a mixture of timezones, the column or index will be returned unaltered as an object data type. For non-standard datetime parsing, use pd.to_datetime after pd.read_csv. To parse an index or column with a mixture of timezones, specify date_parser to be a partially-applied pandas.to_datetime() with utc=True. See Parsing a CSV with mixed timezones for more.
Note: A fast-path exists for iso8601-formatted dates.
The relevant case for this question is the "list of int or names" one.
col is the columns index of 'Date' which parses as a separate date column.
#Convert date into the proper format so that date time operation can be easily performed
df_Time_Table["Date"] = pd.to_datetime(df_Time_Table["Date"])
# Cal Year
df_Time_Table['Year'] = df_Time_Table['Date'].dt.strftime('%Y')
train_data=pd.read_csv("train.csv",parse_dates=["date"])
I encountered a similar problem when trying to use pd.Series.dt.floor, although all the elements in my pd.Series were datetime.datetime instances (absolutely no NAs). I suspect it had to do with having tz-aware instances with different timezones.
My workaround, in order to take advantage of the pd.Timestamp.floor method was to define the following function:
def floor_datetime(base_datetime_aware, freq="2H"):
return pd.Timestamp(base_datetime_aware).floor(freq)
The I would just use pd.Series.apply to get every element of my Series through the function.
In the end, when you use the .dt accessor, the functions you would use are methods of the base classes, so using apply with a short custom function like mine may solve your problem!
When you write
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
df['Date'] = df['Date'].dt.strftime('%m/%d')
It can fixed

Categories

Resources