I originally have dates in string format.
I want to extract the month as a number from these dates.
df = pd.DataFrame({'Date':['2011/11/2', '2011/12/20', '2011/8/16']})
I convert them to a pandas datetime object.
df['Date'] = pd.to_datetime(df['Date'])
I then want to extract all the months.
When I try:
df.loc[0]["Date"].month
This works, returning the correct value of 11.
But when I try to get the month for multiple rows, it doesn't work:
df.loc[1:2]["Date"].month
AttributeError: 'Series' object has no attribute 'month'
df.loc[0]["Date"] returns a scalar: pd.Timestamp objects have a month attribute, which is what you are accessing.
df.loc[1:2]["Date"] returns a series: pd.Series objects do not have a month attribute, they do have a dt.month attribute if df['Date'] is a datetime series.
In addition, don't use chained indexing. You can use:
df.loc[0, 'Date'].month for a scalar
df.loc[1:2, 'Date'].dt.month for a series
There are different accessors for each type: pandas.Series.dt.month for a Series of datetimes, pandas.Timestamp.month for a scalar, and pandas.DatetimeIndex.month for an Index (there is no .dt on an Index).
So you need:
#Series
df.loc[1:2, "Date"].dt.month
#scalar
df.loc[0, 'Date'].month
#DatetimeIndex
df.set_index('Date').index.month
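Putting the three cases together, a minimal sketch using the sample frame from the question:

```python
import pandas as pd

df = pd.DataFrame({'Date': ['2011/11/2', '2011/12/20', '2011/8/16']})
df['Date'] = pd.to_datetime(df['Date'])

# Series: go through the .dt accessor
series_months = df.loc[1:2, 'Date'].dt.month

# Scalar: a Timestamp has .month directly
scalar_month = df.loc[0, 'Date'].month

# DatetimeIndex: .month lives on the index itself, no .dt
index_months = df.set_index('Date').index.month
```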
Related
I have a column called Time in a dataframe that looks like this:
599359 12:32:25
326816 17:55:22
326815 17:55:22
358789 12:48:25
361553 12:06:45
...
814512 21:22:07
268266 18:57:31
659699 14:28:20
659698 14:28:20
268179 17:48:53
Name: Time, Length: 546967, dtype: object
And right now it is an object dtype. I've tried the following to convert it to a datetime:
df['Time'] = pd.to_datetime(df['Time'], format='%H:%M:%S', errors='coerce', utc = True).dt.time
And I understand that the .dt.time accessor is needed to prevent the year and month from being added, but I believe this is causing the dtype to revert to object.
Any workarounds? I know I could do
df['Time'] = df['Time'].apply(pd.to_datetime, format='%H:%M:%S', errors='coerce', utc = True)
but I have over 500,000 rows and this is taking forever.
When you do this bit: df['Time'] = pd.to_datetime(df['Time'], format='%H:%M:%S', errors='coerce', utc = True).dt.time, you're converting the 'Time' column to the object dtype, where each "object" is a python datetime.time instance.
The pandas datetime64 dtype is a different type than python's datetime.datetime objects, and it does not support bare time objects (i.e. you can't have pandas treat the column as datetime-like without providing a date). This is why the dtype is changing to object.
In your second approach, df['Time'] = df['Time'].apply(pd.to_datetime, format='%H:%M:%S', errors='coerce', utc = True), something slightly different happens. In this case you're applying pd.to_datetime to each scalar element of the 'Time' series. Take a look at the return types of the function in the docs, but basically the time values in your df are being converted to datetime values on 1 January 1900 (i.e. a default date is added).
So: pandas is behaving correctly. If you only want the times, it's fine to keep datetime.time objects in the column, but to operate on them you'll probably be relying on many [slow] df.apply calls. Alternatively, keep the default date of 1900-01-01: then you can add/subtract the datetime columns and get the speed advantage of pandas, and just strip off the date when you're done with it.
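A minimal sketch of that last suggestion: keep the default 1900-01-01 date for fast vectorized arithmetic and strip it only at the end.

```python
import pandas as pd

df = pd.DataFrame({'Time': ['12:32:25', '17:55:22', '14:28:20']})

# Parse without .dt.time: the column stays datetime64, with a default
# date of 1900-01-01 attached to every value
t = pd.to_datetime(df['Time'], format='%H:%M:%S', errors='coerce')

# Vectorized arithmetic works while the dtype is datetime64
elapsed = t - t.min()

# Strip the date off only once you're done (dtype becomes object)
df['Time'] = t.dt.time
```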
I'm having some issues with geopandas and pandas datetime objects; I kept getting the error
pandas Invalid field type <class 'pandas._libs.tslibs.timedeltas.Timedelta'>
when I try to save it using gpd.to_file(). Apparently this is a known issue between pandas and geopandas date types, so I used
df.DATE = df.DATE.apply(lambda x: datetime.datetime.strptime(x, '%Y-%m-%dT%H:%M:%S%z'))
to get a datetime object I could manipulate without getting the aforementioned error when I save the results. Due to that change, my selection by month
months = [4]
for month in months:
    df = df[[(pd.DatetimeIndex(df.DATE).month == month)]]
no longer works, throwing a value error.
ValueError: Item wrong length 1 instead of 108700.
I tried dropping the pd.DatetimeIndex but this throws a dataframe series error
AttributeError: 'Series' object has no attribute 'month'
and
df = df[(df.DATE.month == month)]
gives me the same error.
I know it converted over to a datetime object because print(df.dtype) shows DATE datetime64[ns, UTC] and
for index, row in df.iterrows():
    print(row.DATE.month)
prints the month as a integer to the terminal.
Without going back to pd.Datetime how can I fix my select statement for the month?
The statement df.DATE returns a Series object, which doesn't have a .month attribute. The dates inside the Series do, which is why row.DATE.month works. Try something like:
filter = [x.month == month for x in df.DATE]
df_filtered = df[filter]
As for the ValueError: the extra set of brackets in df[[(pd.DatetimeIndex(df.DATE).month == month)]] wraps the boolean mask in a one-element list, so pandas sees an item of length 1 instead of 108700; a similar fix with single brackets should take care of it.
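Since print(df.dtype) showed DATE as datetime64[ns, UTC], the vectorized .dt accessor should also work here and avoids the python-level loop; a sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({'DATE': pd.to_datetime([
    '2020-04-01T10:00:00+0000',
    '2020-05-02T11:00:00+0000',
    '2020-04-15T12:00:00+0000',
])})

month = 4
# Single brackets: index with the boolean mask itself, not a list containing it
df_filtered = df[df['DATE'].dt.month == month]
```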
I'm working with COVID19 data and have a date column. It was originally Int64, so I changed it to a datetime using the code below:
us['date'] = pd.to_datetime(us['date'], format='%Y%m%d', errors='ignore')
us['date'] = us['date'].dt.strftime("%m/%d")
When I print out the result, I see all the data in the column as "1/1, 1/4, 2/1 ...", and so on (there are no null values here). I'm trying to group this by month, so I tried another function to get only the months, as below:
us['date'].dt.strftime("%m")
But I'm getting the error:
AttributeError: Can only use .dt accessor with datetimelike values
I don't understand, since I change the datatype to datetime using pd_datetime, should this column already be datetime, not object? When I checked the data type, it shows 'O'. Why is this so?
When we use errors='ignore', to_datetime gives up on the whole column if any single item cannot be converted; we should use errors='coerce' instead:
us['date'] = pd.to_datetime(us['date'], format='%Y%m%d', errors='coerce')
us['date'].dt.strftime('%B')
Let's try this:
us['monthanddate'] = us['date'].dt.strftime("%m/%d")
For the month only:
us['monthonly'] = us['date'].dt.strftime("%m")
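To actually group by month, it may be simplest to group on a value computed from the datetime column directly, before any strftime turns it back into strings; a sketch with invented data:

```python
import pandas as pd

us = pd.DataFrame({'date': [20200101, 20200104, 20200201],
                   'cases': [5, 7, 3]})
us['date'] = pd.to_datetime(us['date'], format='%Y%m%d', errors='coerce')

# Group on the month number taken straight from the datetime column
monthly = us.groupby(us['date'].dt.month)['cases'].sum()
```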
Hi, I am using pandas to extract the month from a date column.
When I read my data they are objects:
Date object
dtype: object
So I first convert them to datetime and then try to extract the months:
import pandas as pd
file = '/pathtocsv.csv'
df = pd.read_csv(file, sep = ',', encoding='utf-8-sig', usecols= ['Date', 'ids'])
df['Date'] = pd.to_datetime(df['Date'])
df['Month'] = df['Date'].dt.month
Also if that helps:
In [10]: df['Date'].dtype
Out[10]: dtype('O')
So, the error I get is like this:
/Library/Frameworks/Python.framework/Versions/2.7/bin/User/lib/python2.7/site-packages/pandas/core/series.pyc in _make_dt_accessor(self)
2526 return maybe_to_datetimelike(self)
2527 except Exception:
-> 2528 raise AttributeError("Can only use .dt accessor with datetimelike "
2529 "values")
2530
AttributeError: Can only use .dt accessor with datetimelike values
EDITED:
Date columns are like this:
0 2014-01-01
1 2014-01-01
2 2014-01-01
3 2014-01-01
4 2014-01-03
5 2014-01-03
6 2014-01-03
7 2014-01-07
8 2014-01-08
9 2014-01-09
Do you have any ideas?
Thank you very much!
Your problem here is that to_datetime silently failed, so the dtype remained str/object. If you set errors='coerce', any string that fails to convert is set to NaT:
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
So you need to find out what is wrong with those specific row values.
See the docs
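One way to find the offending rows is to coerce into a parallel result and look at where it produced NaT; a small sketch:

```python
import pandas as pd

s = pd.Series(['2014-01-01', 'not a date', '2014-01-03'])
parsed = pd.to_datetime(s, errors='coerce')

# The rows that failed to parse are NaT; show the original strings
bad_rows = s[parsed.isna()]
```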
First you need to define the format of the date column.
df['Date'] = pd.to_datetime(df.Date, format='%Y-%m-%d %H:%M:%S')
For your case, the format can be set to:
df['Date'] = pd.to_datetime(df.Date, format='%Y-%m-%d')
After that you can set/change your desired output as follows:
df['Date'] = df['Date'].dt.strftime('%Y-%m-%d')
Your problem here is that the dtype of 'Date' remained as str/object. You can use the parse_dates parameter when using read_csv
import pandas as pd
file = '/pathtocsv.csv'
df = pd.read_csv(file, sep=',', parse_dates=[col], encoding='utf-8-sig', usecols=['Date', 'ids'])
df['Month'] = df['Date'].dt.month
From the documentation for the parse_dates parameter
parse_dates : bool or list of int or names or list of lists or dict, default False
The behavior is as follows:
boolean. If True -> try parsing the index.
list of int or names. e.g. If [1, 2, 3] -> try parsing columns 1, 2, 3 each as a separate date column.
list of lists. e.g. If [[1, 3]] -> combine columns 1 and 3 and parse as a single date column.
dict, e.g. {‘foo’ : [1, 3]} -> parse columns 1, 3 as date and call result ‘foo’
If a column or index cannot be represented as an array of datetimes, say because of an unparseable value or a mixture of timezones, the column or index will be returned unaltered as an object data type. For non-standard datetime parsing, use pd.to_datetime after pd.read_csv. To parse an index or column with a mixture of timezones, specify date_parser to be a partially-applied pandas.to_datetime() with utc=True. See Parsing a CSV with mixed timezones for more.
Note: A fast-path exists for iso8601-formatted dates.
The relevant case for this question is the "list of int or names" one.
Here col is the column index (or name) of 'Date', which is then parsed as a separate date column.
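A self-contained sketch of this answer, reading the CSV from a string instead of a file:

```python
import pandas as pd
from io import StringIO

csv_data = "Date,ids\n2014-01-01,a\n2014-01-03,b\n2014-01-07,c\n"

# parse_dates=['Date'] makes read_csv parse that column as datetime64
df = pd.read_csv(StringIO(csv_data), parse_dates=['Date'], usecols=['Date', 'ids'])
df['Month'] = df['Date'].dt.month
```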
# Convert date into the proper format so that datetime operations can be performed easily
df_Time_Table["Date"] = pd.to_datetime(df_Time_Table["Date"])
# Calculate year
df_Time_Table['Year'] = df_Time_Table['Date'].dt.strftime('%Y')
train_data = pd.read_csv("train.csv", parse_dates=["date"])
I encountered a similar problem when trying to use pd.Series.dt.floor, although all the elements in my pd.Series were datetime.datetime instances (absolutely no NAs). I suspect it had to do with having tz-aware instances with different timezones.
My workaround, in order to take advantage of the pd.Timestamp.floor method was to define the following function:
def floor_datetime(base_datetime_aware, freq="2H"):
    return pd.Timestamp(base_datetime_aware).floor(freq)
Then I would just use pd.Series.apply to get every element of my Series through the function.
In the end, when you use the .dt accessor, the functions you would use are methods of the base classes, so using apply with a short custom function like mine may solve your problem!
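Putting that workaround together (the "2H" frequency and the element-wise apply follow the original snippet; the sample datetimes are made up):

```python
import datetime
import pandas as pd

def floor_datetime(base_datetime_aware, freq="2H"):
    # Wrap each element in a Timestamp so Timestamp.floor is available
    return pd.Timestamp(base_datetime_aware).floor(freq)

s = pd.Series([datetime.datetime(2020, 1, 1, 13, 45),
               datetime.datetime(2020, 1, 1, 15, 10)])

# apply sends every element through the function one at a time
floored = s.apply(floor_datetime)
```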
This can be fixed by writing:
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
df['Date'] = df['Date'].dt.strftime('%m/%d')