Filtering DataFrame by month [duplicate]

Filtering DataFrame by month [duplicate] - python

Hi I am using pandas to convert a column to month.
When I read my data they are objects:
Date object
dtype: object
So I am first making them to date time and then try to make them as months:
import pandas as pd
file = '/pathtocsv.csv'
df = pd.read_csv(file, sep = ',', encoding='utf-8-sig', usecols= ['Date', 'ids'])
df['Date'] = pd.to_datetime(df['Date'])
df['Month'] = df['Date'].dt.month
Also if that helps:
In [10]: df['Date'].dtype
Out[10]: dtype('O')
So, the error I get is like this:
/Library/Frameworks/Python.framework/Versions/2.7/bin/User/lib/python2.7/site-packages/pandas/core/series.pyc in _make_dt_accessor(self)
2526 return maybe_to_datetimelike(self)
2527 except Exception:
-> 2528 raise AttributeError("Can only use .dt accessor with datetimelike "
2529 "values")
2530
AttributeError: Can only use .dt accessor with datetimelike values
EDITED:
Date columns are like this:
0 2014-01-01
1 2014-01-01
2 2014-01-01
3 2014-01-01
4 2014-01-03
5 2014-01-03
6 2014-01-03
7 2014-01-07
8 2014-01-08
9 2014-01-09
Do you have any ideas?
Thank you very much!

Your problem here is that to_datetime silently failed so the dtype remained as str/object, if you set param errors='coerce' then if the conversion fails for any particular string then those rows are set to NaT.
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
So you need to find out what is wrong with those specific row values.
See the docs

First you need to define the format of date column.
df['Date'] = pd.to_datetime(df.Date, format='%Y-%m-%d %H:%M:%S')
For your case base format can be set to;
df['Date'] = pd.to_datetime(df.Date, format='%Y-%m-%d')
After that you can set/change your desired output as follows;
df['Date'] = df['Date'].dt.strftime('%Y-%m-%d')

Your problem here is that the dtype of 'Date' remained as str/object. You can use the parse_dates parameter when using read_csv
import pandas as pd
file = '/pathtocsv.csv'
df = pd.read_csv(file, sep = ',', parse_dates= [col],encoding='utf-8-sig', usecols= ['Date', 'ids'],)
df['Month'] = df['Date'].dt.month
From the documentation for the parse_dates parameter
parse_dates : bool or list of int or names or list of lists or dict, default False
The behavior is as follows:
boolean. If True -> try parsing the index.
list of int or names. e.g. If [1, 2, 3] -> try parsing columns 1, 2, 3 each as a separate date column.
list of lists. e.g. If [[1, 3]] -> combine columns 1 and 3 and parse as a single date column.
dict, e.g. {‘foo’ : [1, 3]} -> parse columns 1, 3 as date and call result ‘foo’
If a column or index cannot be represented as an array of datetimes, say because of an unparseable value or a mixture of timezones, the column or index will be returned unaltered as an object data type. For non-standard datetime parsing, use pd.to_datetime after pd.read_csv. To parse an index or column with a mixture of timezones, specify date_parser to be a partially-applied pandas.to_datetime() with utc=True. See Parsing a CSV with mixed timezones for more.
Note: A fast-path exists for iso8601-formatted dates.
The relevant case for this question is the "list of int or names" one.
col is the columns index of 'Date' which parses as a separate date column.

#Convert date into the proper format so that date time operation can be easily performed
df_Time_Table["Date"] = pd.to_datetime(df_Time_Table["Date"])
# Cal Year
df_Time_Table['Year'] = df_Time_Table['Date'].dt.strftime('%Y')

train_data=pd.read_csv("train.csv",parse_dates=["date"])

I encountered a similar problem when trying to use pd.Series.dt.floor, although all the elements in my pd.Series were datetime.datetime instances (absolutely no NAs). I suspect it had to do with having tz-aware instances with different timezones.
My workaround, in order to take advantage of the pd.Timestamp.floor method was to define the following function:
def floor_datetime(base_datetime_aware, freq="2H"):
return pd.Timestamp(base_datetime_aware).floor(freq)
The I would just use pd.Series.apply to get every element of my Series through the function.
In the end, when you use the .dt accessor, the functions you would use are methods of the base classes, so using apply with a short custom function like mine may solve your problem!

When you write
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
df['Date'] = df['Date'].dt.strftime('%m/%d')
It can fixed

Related

Convert object-type hours:minutes:seconds column to datetime type in Pandas

I have a column called Time in a dataframe that looks like this:
599359 12:32:25
326816 17:55:22
326815 17:55:22
358789 12:48:25
361553 12:06:45
...
814512 21:22:07
268266 18:57:31
659699 14:28:20
659698 14:28:20
268179 17:48:53
Name: Time, Length: 546967, dtype: object
And right now it is an object dtype. I've tried the following to convert it to a datetime:
df['Time'] = pd.to_datetime(df['Time'], format='%H:%M:%S', errors='coerce', utc = True).dt.time
And I understand that the .dt.time methods are needed to prevent the Year and Month from being added, but I believe this is causing the dtype to revert to an object.
Any workarounds? I know I could do
df['Time'] = df['Time'].apply(pd.to_datetime, format='%H:%M:%S', errors='coerce', utc = True)
but I have over 500,000 rows and this is taking forever.

When you do this bit: df['Time'] = pd.to_datetime(df['Time'], format='%H:%M:%S', errors='coerce', utc = True).dt.time, you're converting the 'Time' column to have pd.dtype as object... and that "object" is the python type datetime.time.
The pandas dtype pd.datetime is a different type than python's datetime.datetime objects. And pandas' pd.datetime does not support time objects (i.e. you can't have pandas consider the column a datetime without providing the year). This is the dtype is changing to object.
In the case of your second approach, df['Time'] = df['Time'].apply(pd.to_datetime, format='%H:%M:%S', errors='coerce', utc = True) there is something slightly different happening. In this case you're applying the pd.to_datetime to each scalar element of the 'Time' series. Take a look at the return types of the function in the docs, but basically in this case the time values in your df are being converted to pd.datetime objects on the 1st of january 1900. (i.e. a default date is added).
So: pandas is behaving correctly. If you only want the times, then it's okay to use the datetime.time objects in the column. But to operate on them you'll probably be relying on many [slow] df.apply methods. Alternatively, just keep the default date of 1900-01-01 and then you can add/subtract the pd.datetime columns and get the speed advantage of pandas. Then just strip off the date when you're done with it.

pd.to_datetime format '%Y-%m-%d' does not apply [duplicate]

How should I transform from datetime to string? My attempt:
dates = p.to_datetime(p.Series(['20010101', '20010331']), format = '%Y%m%d')
dates.str

There is no .str accessor for datetimes and you can't do .astype(str) either.
Instead, use .dt.strftime:
>>> series = pd.Series(['20010101', '20010331'])
>>> dates = pd.to_datetime(series, format='%Y%m%d')
>>> dates.dt.strftime('%Y-%m-%d')
0 2001-01-01
1 2001-03-31
dtype: object
See the docs on customizing date string formats here: strftime() and strptime() Behavior.
For old pandas versions <0.17.0, one can instead can call .apply with the Python standard library's datetime.strftime:
>>> dates.apply(lambda x: x.strftime('%Y-%m-%d'))
0 2001-01-01
1 2001-03-31
dtype: object

As of pandas version 0.17.0, you can format with the dt accessor:
dates.dt.strftime('%Y-%m-%d')

There is a pandas function that can be applied to DateTime index in pandas data frame.
date = dataframe.index #date is the datetime index
date = dates.strftime('%Y-%m-%d') #this will return you a numpy array, element is string.
dstr = date.tolist() #this will make you numpy array into a list
the element inside the list:
u'1910-11-02'
You might need to replace the 'u'.
There might be some additional arguments that I should put into the previous functions.

pandas timestamp series to string?

I am new to python (coming from R), and I am trying to understand how I can convert a timestamp series in a pandas dataframe (in my case this is called df['timestamp']) into what I would call a string vector in R. is this possible? How would this be done?
I tried df['timestamp'].apply('str'), but this seems to simply put the entire column df['timestamp'] into one long string. I'm looking to convert each element into a string and preserve the structure, so that it's still a vector (or maybe this a called an array?)

Consider the dataframe df
df = pd.DataFrame(dict(timestamp=pd.to_datetime(['2000-01-01'])))
df
timestamp
0 2000-01-01
Use the datetime accessor dt to access the strftime method. You can pass a format string to strftime and it will return a formatted string. When used with the dt accessor you will get a series of strings.
df.timestamp.dt.strftime('%Y-%m-%d')
0 2000-01-01
Name: timestamp, dtype: object
Visit strftime.org for a handy set of format strings.

Use astype
>>> import pandas as pd
>>> df = pd.to_datetime(pd.Series(['Jul 31, 2009', '2010-01-10', None]))
>>> df.astype(str)
0 2009-07-31
1 2010-01-10
2 NaT
dtype: object
returns an array of strings

Following on from VinceP's answer, to convert a datetime Series in-place do the following:
df['Column_name']=df['Column_name'].astype(str)

Rename last index of pandas dataframe and coerce dtypes

I have a doubt. I have a dataframe df and I need to change the index of the last row, this is probably easy but I'm having troubles doing it:
A B
2017-02-09 2.4 3.4
2017-02-10 3.4 3.2
2017-02-13 3.3 2.2
0 3.1 2.1
I need to change that "0" index at the last row with today's date 2017-02-14, how can I do this ?
I have tried:
df.set_index[len(df)-1](dt.date.today())
where dt stands for datetime. It does not work, any idea ? Thank you very much

You could use DF.rename with a dict mapping to change the labels along the index axis as shown:
import datetime
df.rename({df.index[-1]: datetime.date.today()}, inplace=True)
df.index = pd.to_datetime(df.index) # Convert dtype to DateTimeIndex
Check the dtypes:
df.index.dtype
dtype('<M8[ns]')
Note: You cannot use set_index[..] as these refer to the built-in methods that aren't subscriptable. Instead, you must use it like set_index(..) with enclosed parentheses.
A succinct manner to coerce the dtypes in one-line by converting the index to it's series representation and replacing just the last value with today's date:
df.index = pd.to_datetime(df.index.to_series().replace({df.index[-1]:datetime.date.today()}))
gives:
df.index
DatetimeIndex(['2017-02-09', '2017-02-10', '2017-02-13', '2017-02-14'],
dtype='datetime64[ns]', freq=None)

ind = df.index.tolist()
ind[len(ind)-1] = dt.date.today()
df.index = ind

AttributeError: Can only use .dt accessor with datetimelike values

Hi I am using pandas to convert a column to month.
When I read my data they are objects:
Date object
dtype: object
So I am first making them to date time and then try to make them as months:
import pandas as pd
file = '/pathtocsv.csv'
df = pd.read_csv(file, sep = ',', encoding='utf-8-sig', usecols= ['Date', 'ids'])
df['Date'] = pd.to_datetime(df['Date'])
df['Month'] = df['Date'].dt.month
Also if that helps:
In [10]: df['Date'].dtype
Out[10]: dtype('O')
So, the error I get is like this:
/Library/Frameworks/Python.framework/Versions/2.7/bin/User/lib/python2.7/site-packages/pandas/core/series.pyc in _make_dt_accessor(self)
2526 return maybe_to_datetimelike(self)
2527 except Exception:
-> 2528 raise AttributeError("Can only use .dt accessor with datetimelike "
2529 "values")
2530
AttributeError: Can only use .dt accessor with datetimelike values
EDITED:
Date columns are like this:
0 2014-01-01
1 2014-01-01
2 2014-01-01
3 2014-01-01
4 2014-01-03
5 2014-01-03
6 2014-01-03
7 2014-01-07
8 2014-01-08
9 2014-01-09
Do you have any ideas?
Thank you very much!

Your problem here is that to_datetime silently failed so the dtype remained as str/object, if you set param errors='coerce' then if the conversion fails for any particular string then those rows are set to NaT.
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
So you need to find out what is wrong with those specific row values.
See the docs

First you need to define the format of date column.
df['Date'] = pd.to_datetime(df.Date, format='%Y-%m-%d %H:%M:%S')
For your case base format can be set to;
df['Date'] = pd.to_datetime(df.Date, format='%Y-%m-%d')
After that you can set/change your desired output as follows;
df['Date'] = df['Date'].dt.strftime('%Y-%m-%d')

Your problem here is that the dtype of 'Date' remained as str/object. You can use the parse_dates parameter when using read_csv
import pandas as pd
file = '/pathtocsv.csv'
df = pd.read_csv(file, sep = ',', parse_dates= [col],encoding='utf-8-sig', usecols= ['Date', 'ids'],)
df['Month'] = df['Date'].dt.month
From the documentation for the parse_dates parameter
parse_dates : bool or list of int or names or list of lists or dict, default False
The behavior is as follows:
boolean. If True -> try parsing the index.
list of int or names. e.g. If [1, 2, 3] -> try parsing columns 1, 2, 3 each as a separate date column.
list of lists. e.g. If [[1, 3]] -> combine columns 1 and 3 and parse as a single date column.
dict, e.g. {‘foo’ : [1, 3]} -> parse columns 1, 3 as date and call result ‘foo’
If a column or index cannot be represented as an array of datetimes, say because of an unparseable value or a mixture of timezones, the column or index will be returned unaltered as an object data type. For non-standard datetime parsing, use pd.to_datetime after pd.read_csv. To parse an index or column with a mixture of timezones, specify date_parser to be a partially-applied pandas.to_datetime() with utc=True. See Parsing a CSV with mixed timezones for more.
Note: A fast-path exists for iso8601-formatted dates.
The relevant case for this question is the "list of int or names" one.
col is the columns index of 'Date' which parses as a separate date column.

#Convert date into the proper format so that date time operation can be easily performed
df_Time_Table["Date"] = pd.to_datetime(df_Time_Table["Date"])
# Cal Year
df_Time_Table['Year'] = df_Time_Table['Date'].dt.strftime('%Y')

train_data=pd.read_csv("train.csv",parse_dates=["date"])

I encountered a similar problem when trying to use pd.Series.dt.floor, although all the elements in my pd.Series were datetime.datetime instances (absolutely no NAs). I suspect it had to do with having tz-aware instances with different timezones.
My workaround, in order to take advantage of the pd.Timestamp.floor method was to define the following function:
def floor_datetime(base_datetime_aware, freq="2H"):
return pd.Timestamp(base_datetime_aware).floor(freq)
The I would just use pd.Series.apply to get every element of my Series through the function.
In the end, when you use the .dt accessor, the functions you would use are methods of the base classes, so using apply with a short custom function like mine may solve your problem!

When you write
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
df['Date'] = df['Date'].dt.strftime('%m/%d')
It can fixed

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Filtering DataFrame by month [duplicate] - python

#Convert date into the proper format so that date time operation can be easily performed df_Time_Table["Date"] = pd.to_datetime(df_Time_Table["Date"]) # Cal Year df_Time_Table['Year'] = df_Time_Table['Date'].dt.strftime('%Y')

train_data=pd.read_csv("train.csv",parse_dates=["date"])

When you write df['Date'] = pd.to_datetime(df['Date'], errors='coerce') df['Date'] = df['Date'].dt.strftime('%m/%d') It can fixed

Related

Convert object-type hours:minutes:seconds column to datetime type in Pandas

pd.to_datetime format '%Y-%m-%d' does not apply [duplicate]

pandas timestamp series to string?

Rename last index of pandas dataframe and coerce dtypes

AttributeError: Can only use .dt accessor with datetimelike values

Categories

Resources