Select date range from Pandas DataFrame - python

I have a list of dates in a DF that have been converted to a YYYY-MM format and need to select a range. This is what I'm trying:
#create dataframe
import pandas as pd
data = ['2016-01','2016-02','2016-09','2016-10','2016-11','2017-04','2017-05','2017-06','2017-07','2017-08']
df = pd.DataFrame(data, columns=['date'])
#lookup range
df[df["date"].isin(pd.date_range('2016-01', '2016-06'))]
It doesn't seem to be working because the date column is no longer a datetime column. The format has to be in YYYY-MM. So I guess the question is, how can I make a datetime column with YYYY-MM? Can someone please help?
Thanks.

You do not need an actual datetime-type column or query values for this to work. Keep it simple:
df[df.date.between('2016-01', '2016-06')]
That gives:
date
0 2016-01
1 2016-02
It works because zero-padded ISO 8601 date strings sort chronologically when compared as plain strings: '2016-06' comes after '2016-05', and so on.
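A minimal sketch of the string comparison described above, followed by an optional conversion for cases where a real datetime column is wanted. The `month` column name is hypothetical and only there for illustration.

```python
import pandas as pd

data = ['2016-01', '2016-02', '2016-09', '2016-10', '2016-11',
        '2017-04', '2017-05', '2017-06', '2017-07', '2017-08']
df = pd.DataFrame(data, columns=['date'])

# Plain string comparison: zero-padded YYYY-MM strings order chronologically.
selected = df[df['date'].between('2016-01', '2016-06')]

# Optional: parse into real datetimes (each value becomes the first day of its month).
# 'month' is a hypothetical column name, not part of the original question.
df['month'] = pd.to_datetime(df['date'], format='%Y-%m')
selected_dt = df[df['month'].between(pd.Timestamp('2016-01-01'),
                                     pd.Timestamp('2016-06-30'))]
```

Both selections pick the same rows; the string version simply avoids the conversion step.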

Related

Pandas, select dates using input from list

here is my input df:
df:
date , name
1990-12-21, adam1
1990-12-22, adam2
1990-12-23, adam3
1990-12-24, adam4
1990-12-25, adam5
I want to select all dates after a given date taken from a list (the date is always the first element):
list = ['1990-12-23','name','22']
df['date'] = pd.to_datetime(df['date'])
df = df[df.date > list[0]]
And it works.
My question is: why does it work without converting the first element of the list to datetime format?
Pandas parses date-like strings for you. Its flexible Partial String Indexing, and comparisons against datetime values, accept strings that can be automatically parsed into a datetime or timestamp, so you do not need to convert them first.
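To make this concrete, here is a small sketch that rebuilds the data from the question and compares the datetime column against both a plain string and an explicit Timestamp. The variable name `params` is an assumption, used here to avoid shadowing the built-in `list`.

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['1990-12-21', '1990-12-22', '1990-12-23', '1990-12-24', '1990-12-25'],
    'name': ['adam1', 'adam2', 'adam3', 'adam4', 'adam5'],
})
df['date'] = pd.to_datetime(df['date'])

# 'params' is a hypothetical name for the list from the question.
params = ['1990-12-23', 'name', '22']

# Comparing a datetime column with a date-like string: pandas parses the string.
implicit = df[df['date'] > params[0]]

# The same filter with the conversion written out explicitly.
explicit = df[df['date'] > pd.Timestamp(params[0])]
```

Both filters should return the same rows; the explicit form just makes the conversion visible.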

Date Formatting Problem in pandas Dataframe

I have a Date column in my DataFrame. When I display the dates, the formats are mixed and inconsistent. How can I put them in one format, such as dd/mm/yyyy?
This is pseudocode, since you did not give us your code. It assumes that the date column of the DataFrame df is already of datetime type.
You can use the vectorized datetime method strftime() (see the docs):
df['date'].dt.strftime("%d/%m/%Y")
To keep the new format, assign the result back to the date column. Note that the column then holds strings rather than datetimes:
df['date'] = df['date'].dt.strftime("%d/%m/%Y")
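A short, self-contained sketch of the same idea with made-up sample dates, assuming the column is already of datetime type:

```python
import pandas as pd

# Sample data is invented for illustration.
df = pd.DataFrame({'date': pd.to_datetime(['2021-03-05', '2020-12-31'])})

# strftime returns strings in the requested layout; the original column is untouched.
formatted = df['date'].dt.strftime('%d/%m/%Y')
```

Because the result is a column of strings, it is usually best to format only for display or export and keep the datetime column for calculations.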

Convert to datetime using column position/number in python pandas

A very simple question, but I did not find the answer on Google.
I have a df with timestamps in the Date column:
Date
22/11/2019 22:30:10 etc., which has dtype object according to df.dtypes
Code:
df['Date']=pd.to_datetime(df['Date']).dt.date
Now I want to convert the date column using its column number rather than its column name. The column number in this case is 0 (I have very long column names and multiple similar files, so I want to select the date column by its position, 0 in this case).
Can anyone help?
Use DataFrame.iloc to select the column (Series) by position:
df.iloc[:, 0] = pd.to_datetime(df.iloc[:, 0]).dt.date
It is also possible to get the column name by indexing df.columns:
df[df.columns[0]] = pd.to_datetime(df[df.columns[0]]).dt.date
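A runnable sketch of the second variant. The column names and values are invented, and the explicit `format` string is an assumption based on the day-first sample value shown in the question.

```python
import pandas as pd

# Hypothetical frame with a long name for the date column.
df = pd.DataFrame({
    'A very long date column name': ['22/11/2019 22:30:10', '23/11/2019 08:15:00'],
    'value': [1, 2],
})

# Look up the name of the column at position 0, then convert that column.
first_col = df.columns[0]
df[first_col] = pd.to_datetime(df[first_col], format='%d/%m/%Y %H:%M:%S').dt.date
```

Passing the format explicitly avoids any ambiguity between day-first and month-first parsing.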

Convert a series of dates in format YYYYMMDD in a dataframe of massive data

Hi, I'm trying to convert a field in a pandas DataFrame to a date. The field holds dates formatted as YYYYMMDD.
I have tried
pd.to_datetime('20180331').strftime('%Y:%m:%d')
but it only works for a single value, not for a full Series. I have a data set with 500,000 rows, so a lambda function would not be fast enough.
Thanks for the help.
Assuming your column is df['col']:
pd.to_datetime(df['col'], format = '%Y%m%d')
See the pandas.to_datetime documentation for the format parameter.
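A small sketch of the vectorized conversion on invented sample values, assuming the column holds YYYYMMDD strings:

```python
import pandas as pd

# Sample values are made up; the real column would hold ~500,000 rows.
df = pd.DataFrame({'col': ['20180331', '20180401', '20181231']})

# One vectorized call converts the whole column; no lambda or apply needed.
df['col'] = pd.to_datetime(df['col'], format='%Y%m%d')
```

If the column is stored as integers rather than strings, converting it with `astype(str)` first is a safe way to get the same result.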

Pandas - Converting Derived Datetime to Integer

I have a pandas dataframe, 'df', where there is an original column with dates in datetime format. I set a hard date as a variable:
hard_date = datetime.date(2013, 5, 2)
I then created a new column in my df with the difference between the values in the date column and the hard_date...
df['days_from'] = df['date'] - hard_date
This produced good output. For instance, when I print the first cell in the new column it shows:
print (df['days_from'].iloc[0])
28 days 00:00:00
But now I want to convert the new column to just the number of days as an integer. I thought about just taking the first 2 characters, but many of the values are negative, so I am seeking a better route.
Any thoughts on an efficient way to convert the column to just the integer of the days?
Thanks
Just use the .dt accessor and .days attribute on the timedeltas.
df.days_from = df.days_from.dt.days
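An end-to-end sketch with invented dates. It uses a `pd.Timestamp` for the fixed date instead of `datetime.date`, which is an assumption made here so that the subtraction yields a timedelta column directly.

```python
import pandas as pd

# Sample dates are made up: one after and one before the fixed date.
df = pd.DataFrame({'date': pd.to_datetime(['2013-05-30', '2013-05-01'])})
hard_date = pd.Timestamp('2013-05-02')

# Subtracting gives timedeltas; .dt.days extracts whole days as integers.
df['days_from'] = (df['date'] - hard_date).dt.days
```

Negative differences come out as negative integers, so no string slicing is needed.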
