Extract year from YYYYMMDD column in Pandas DataFrame - python

I have a pandas DataFrame in which I would like to create an additional column containing only the year which I extract from a column in YYYYMMDD format.
When searching the forum I found the to_datetime command, but for my case it didn't work.
I tried the following:
df = pd.DataFrame({'name' : ['A','B'],
'date' :[20130102,20140511]})
df['date'] = pd.to_datetime(df['date'])
df['year'] = df['date'].dt.year
what I get as output is:
date name year
0 1970-01-01 00:00:00.020130102 A 1970
1 1970-01-01 00:00:00.020140511 B 1970
but I would like to get:
date name year
0 20130102 A 2013
1 20140511 B 2014
I also tried it without to_datetime, as my date is not exactly in the yyyy-mm-dd format, but I couldn't make it work that way either.
I hope you can help me with this 'newbie' problem, thanks a lot!

You need to specify the format in which you're providing the date. Without it, the integers are interpreted as nanoseconds since the Unix epoch, which is why every row shows 1970.
df['date'] = pd.to_datetime(df['date'],format='%Y%m%d')
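Since the desired output keeps the original integers in date, here is a minimal sketch that parses into a temporary Series instead of overwriting the column. The column name year_int is only an illustration, and the integer-division variant assumes every value really is an 8-digit YYYYMMDD integer.

```python
import pandas as pd

df = pd.DataFrame({'name': ['A', 'B'],
                   'date': [20130102, 20140511]})

# Parse a string copy of the column; the original 'date' column stays untouched.
parsed = pd.to_datetime(df['date'].astype(str), format='%Y%m%d')
df['year'] = parsed.dt.year

# Alternative without any date parsing (assumes 8-digit YYYYMMDD integers):
df['year_int'] = df['date'] // 10000

print(df)
```

The parsed version has the advantage of raising an error on values that are not valid dates, while the arithmetic version silently accepts anything.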

Related

pandas convert string to date (the string time is B.E year)

I have a data frame in which a column contains dates whose year is expressed in the B.E. (Buddhist Era) calendar:
date
28-01-2562
29-01-2562
30-01-2562
31-01-2562
I tried using pd.to_datetime but it gives me an error:
pd.to_datetime(df['date'])
This is the error I got:
Out of bounds nanosecond timestamp: 2562-01-30 00:00:00
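The year 2562 is outside the range a nanosecond Timestamp can represent (it ends in the year 2262), so you can convert the values to daily periods instead: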
df['date'] = df['date'].apply(pd.Period)
print (df)
date
0 2562-01-28
1 2562-01-29
2 2562-01-30
3 2562-01-31
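If what you actually need is an ordinary datetime column, another option is to shift the year to the Gregorian calendar first. This is only a sketch: it assumes the values are Thai Buddhist Era years (B.E. = C.E. + 543) in DD-MM-YYYY order, and the column name date_ce is made up for the example.

```python
import pandas as pd

df = pd.DataFrame({'date': ['28-01-2562', '29-01-2562',
                            '30-01-2562', '31-01-2562']})

# Split into day, month and year pieces (columns 0, 1, 2).
parts = df['date'].str.split('-', expand=True)

# Assumption: Thai Buddhist Era, which runs 543 years ahead of C.E.
ce_year = parts[2].astype(int) - 543

# Rebuild the string with the shifted year and parse it with an explicit format.
df['date_ce'] = pd.to_datetime(
    parts[0] + '-' + parts[1] + '-' + ce_year.astype(str),
    format='%d-%m-%Y',
)
print(df)
```

After the shift the years fall well inside the supported Timestamp range, so the usual .dt accessors are available.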

I want to slice the data in pandas based on date time

I am trying to slice the data based on the date.
If I know the date, I know how to do the slicing. In my case I will NOT know the date stamp in advance.
So I want to slice by whatever dates are present and then do further operations on each slice.
Please refer to the example data. The date column can hold any day, and I want to slice the data by it.
First slice will be for date : 20211201
Second slice will be for date : 20211202
I am able to convert the column into datetime format as below:
df['date'] = pd.to_datetime(df['date'], format='%Y%m%d')
df = df.set_index('date')
Please help with this.
Here is what you need to do:
df = df[df['time'].between('9:10','9:20')].groupby('date')['Open'].max()
Input data
The data you used is:
import pandas as pd
df = pd.DataFrame({
    "date": [20211201, 20211201, 20211201, 20211201, 20211201, 20211202, 20211202, 20211202, 20211202],
    "time": ["9:08", "9:16", "9:17", "9:18", "9:19", "13:08", "13:09", "13:10", "13:11"],
    "Open": [17104.4, 17105.05, 171587.75, 17175.2, 17168.6, 17311.95, 17316.5, 17322.55, 17325.9],
})
df['date'] = pd.to_datetime(df['date'], format='%Y%m%d')
df = df.set_index('date')
Solution
You can slice the DataFrame as follows:
import datetime
df1 = df[df.index==datetime.datetime(2021,12,1)]
df2 = df[df.index==datetime.datetime(2021,12,2)]
Output
Then the outputs you would obtain are:
>>> df1
time Open
date
2021-12-01 9:08 17104.40
2021-12-01 9:16 17105.05
2021-12-01 9:17 171587.75
2021-12-01 9:18 17175.20
2021-12-01 9:19 17168.60
>>> df2
time Open
date
2021-12-02 13:08 17311.95
2021-12-02 13:09 17316.50
2021-12-02 13:10 17322.55
2021-12-02 13:11 17325.90
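The slices above hard-code the two dates. Since the question says the dates are not known in advance, here is a sketch that builds one slice per distinct date found in the index by grouping on it. The dictionary name slices is just for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "date": [20211201, 20211201, 20211201, 20211201, 20211201,
             20211202, 20211202, 20211202, 20211202],
    "time": ["9:08", "9:16", "9:17", "9:18", "9:19",
             "13:08", "13:09", "13:10", "13:11"],
    "Open": [17104.4, 17105.05, 171587.75, 17175.2, 17168.6,
             17311.95, 17316.5, 17322.55, 17325.9],
})
df['date'] = pd.to_datetime(df['date'], format='%Y%m%d')
df = df.set_index('date')

# One sub-DataFrame per distinct date, keyed by its Timestamp.
slices = {day: group for day, group in df.groupby(level='date')}

for day, group in slices.items():
    # Replace this with whatever per-day operation is needed.
    print(day.date(), len(group), group['Open'].max())
```

If the per-day operation is a simple aggregation, calling it directly on df.groupby(level='date') avoids building the dictionary at all.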

Dataframe datetime switching month into days

I am trying to convert a day/month/Year Hours:Minutes column into just day and month. When I run my code, the conversion switches the months into days and the days into months.
You can find a copy of my dataframe with the one column I want to switch to Day/Month here
https://file.io/JkWl7fsBN0vl
Below is the code I am using to convert:
df =pd.read_csv('Example.csv')
df['DateTime'] = pd.to_datetime(df['DateTime'])
df.to_csv("output.csv", index=False)
Without knowing the exact DateTime format you are using (the link to the dataframe is broken), I'm going to use an example of
day/month/Year Hours:Minutes
05/09/2014 12:30
You can determine the exact format date code using this site
Essentially, to_datetime() has a format argument where you can pass in the specific format when it is ambiguous. By default pandas reads a string like 05/09/2014 as month first, so passing the format explicitly tells it that the first field is the day and the second is the month.
>>> df = pd.DataFrame(['05/09/2014 12:30'],columns=['DateTime'])
DateTime
0 05/09/2014 12:30
>>> df['DateTime'] = pd.to_datetime(df['DateTime'], format='%d/%m/%Y %H:%M')
DateTime
0 2014-09-05 12:30:00
>>> df['day'] = df['DateTime'].dt.day
>>> df['month'] = df['DateTime'].dt.month
DateTime day month
0 2014-09-05 12:30:00 5 9
>>> df['DD/MM'] = df['DateTime'].dt.strftime('%d/%m')
DateTime day month DD/MM
0 2014-09-05 12:30:00 5 9 05/09
I'm unsure about the exact format you want the day and month available in (separate columns, combined), but I provided a few examples, so you can remove the DateTime column when you're done with it and use the one you need.
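If you would rather not spell out the whole format string, to_datetime also accepts dayfirst=True. A short sketch with made-up sample values (the second row is included only because 13 cannot be a month, which makes the day-first reading easy to check):

```python
import pandas as pd

# Hypothetical sample values in day/month/Year Hours:Minutes order.
s = pd.Series(['05/09/2014 12:30', '13/09/2014 08:00'])

# dayfirst=True asks pandas to read the first field as the day.
parsed = pd.to_datetime(s, dayfirst=True)

print(parsed.dt.day.tolist())
print(parsed.dt.month.tolist())
```

An explicit format is still the stricter choice, because dayfirst is only a parsing preference rather than a guarantee.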

Cannot remove timestamp in datetime

I have a date column whose dtype is object and whose format is 31-Mar-20. I tried to convert it with datetime.strptime into datetime64[D] with the format 2020-03-31, but nothing I have tried works, including some methods from this and this. Some of them do turn my column into datetime64, but with a timestamp in it, and I don't want that. I need it to be a datetime without the timestamp, in the format 2020-03-31. This is my code:
dates = [datetime.datetime.strptime(ts,'%d-%b-%y').strftime('%Y-%m-%d')
for ts in df['date']]
df['date']= pd.DataFrame({'date': dates})
df = df.sort_values(by=['date'])
This approach might work:
import pandas as pd
df = pd.DataFrame({'dates': ['20-Mar-2020', '21-Mar-2020', '22-Mar-2020']})
df
dates
0 20-Mar-2020
1 21-Mar-2020
2 22-Mar-2020
df['dates'] = pd.to_datetime(df['dates'], format='%d-%b-%Y').dt.date
df
dates
0 2020-03-20
1 2020-03-21
2 2020-03-22
df['date'] = pd.to_datetime(df['date'], format="%d-%b-%y")
This converts the column to datetime. When you look at df, the values display as 2020-03-31 like you want, but they are all datetime objects, so if you extract one value with df['date'][0] you see Timestamp('2020-03-31 00:00:00').
If you want to convert them into plain dates, you can do:
df['date'] = [df_datetime.date() for df_datetime in df['date'] ]
There is probably a better way of doing this step.
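To make the trade-off between these options concrete, here is a small sketch comparing three representations of the same column. The sample values and variable names are made up; the point is that .dt.date and .dt.strftime both leave the datetime64 dtype behind, so the .dt accessor is no longer available on the result.

```python
import pandas as pd

df = pd.DataFrame({'date': ['31-Mar-20', '01-Apr-20']})

# 1. Real datetimes: time part is midnight, displayed as 2020-03-31 in the frame.
as_datetime = pd.to_datetime(df['date'], format='%d-%b-%y')

# 2. Python date objects: no time part, but the column is no longer datetime64.
as_date = as_datetime.dt.date

# 3. Plain text in YYYY-MM-DD form: still sorts chronologically as strings.
as_text = as_datetime.dt.strftime('%Y-%m-%d')

print(as_datetime.dtype, as_date.dtype)
print(as_text.tolist())
```

If the goal is mainly sorting and date arithmetic, keeping the datetime version and formatting only when writing the output is usually the least surprising route.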

Update a specific position in a column of a pandas dataframe based on some condition

For example, one of my column of my dataframe has data like below:
29-APR-19 11.50.00.000000000 PM
29-APR-19 11.50.00.000000000 AM
Hence, I need to update the column having PM to:
29-APR-19 23.50.00.000000000 PM
How can we do that?
I have tried formatting the column to specific date format types, but not able to find the solution.
If you need to keep the same format, it is necessary to add 12 hours to datetimes earlier than 12:00:00.
The solution is to first convert the column with to_datetime, then add 12 hours where the condition holds, and use Series.dt.strftime to get the same format back:
print (df)
Date
0 29-APR-19 11.50.00.000000000 PM
1 29-APR-19 11.50.00.000000000 AM
df['Date'] = pd.to_datetime(df['Date'], format='%d-%b-%y %I.%M.%S.%f %p')
df.loc[df['Date'].dt.hour < 12, 'Date'] += pd.Timedelta(12, 'h')
df['Date'] = df['Date'].dt.strftime('%d-%b-%y %I.%M.%S.%f %p')
print (df)
Date
0 29-Apr-19 11.50.00.000000 PM
1 29-Apr-19 11.50.00.000000 PM
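If the goal is instead the 23.50 value shown in the question, with AM rows left alone, one possible alternative is to parse with %I and %p and then write the hour back with %H. This is a sketch under that reading of the question: the column name Date24 is made up, .str.upper() is only there to restore the APR casing, and strftime's %f emits 6 fractional digits rather than the original 9.

```python
import pandas as pd

df = pd.DataFrame({'Date': ['29-APR-19 11.50.00.000000000 PM',
                            '29-APR-19 11.50.00.000000000 AM']})

# %I with %p parses the 12-hour clock, so the PM row becomes hour 23.
parsed = pd.to_datetime(df['Date'], format='%d-%b-%y %I.%M.%S.%f %p')

# %H writes the 24-hour value while %p keeps the AM/PM marker.
df['Date24'] = parsed.dt.strftime('%d-%b-%y %H.%M.%S.%f %p').str.upper()

print(df['Date24'].tolist())
```

No conditional update is needed here, because the AM/PM information is already folded into the parsed hour.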
