I have a pandas DataFrame with a Date column (string), which I converted and set as the index using the set_index and to_datetime functions:
usd2inr_df.set_index(pd.to_datetime(usd2inr_df['Date']), inplace=True)
But the resulting DataFrame index has a time portion, which I want to remove:
2023-02-14 00:00:00
I want to have it as 2023-02-14.
How do I set up the call so that the index of my DataFrame holds the date without the time portion?
usd2inr_df['Date'] = pd.to_datetime(usd2inr_df['Date']).dt.normalize()
usd2inr_df = usd2inr_df.set_index('Date')  # column name is 'Date'; set_index returns a new DataFrame
pd.to_datetime() converts a Series of strings to pandas datetime values.
Series.dt.date returns the date part as Python datetime.date objects, which display as 'yyyy-mm-dd'.
Assigning to DataFrame.index sets the index of the DataFrame.
import pandas as pd
# create a dataFrame as an example
df = pd.DataFrame({'Name': ['Example'],'Date': ['2023-02-14 10:01:11']})
print(df)
# convert 'yyyy-mm-dd hh:mm:ss' to 'yyyy-mm-dd'.
df['Date'] = pd.to_datetime(df['Date']).dt.date
# set 'Date' as index
df.index = df['Date']
print(df)
Output
      Name                 Date
0  Example  2023-02-14 10:01:11
-------------------------------------------------------
               Name        Date
Date
2023-02-14  Example  2023-02-14
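The two answers above differ in what kind of index they produce, which may matter later. The following is a minimal sketch, reusing the example frame from the answer above; the names `parsed`, `by_day`, and `by_date` are made up for illustration. It assumes you want to compare a normalized DatetimeIndex (time set to midnight) with an index of plain `datetime.date` objects.

```python
import datetime

import pandas as pd

df = pd.DataFrame({"Name": ["Example"], "Date": ["2023-02-14 10:01:11"]})
parsed = pd.to_datetime(df["Date"])

# Option A: normalize() sets every time to midnight but keeps a DatetimeIndex.
by_day = df.set_index(parsed.dt.normalize())

# Option B: .dt.date stores datetime.date objects, so the index dtype is object.
by_date = df.set_index(parsed.dt.date)

print(type(by_day.index).__name__)
print(by_date.index.dtype)

# Partial-string selection by month only works on the DatetimeIndex variant.
print(by_day.loc["2023-02"])
```

If you still need resampling or date-string slicing afterwards, option A is probably the safer choice; option B is mainly useful when the index is only used for display or exact lookups.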
I have an Excel file with data. I loaded this file as a DataFrame of shape (5000, 12) using Python/pandas. I set the date column as the index as follows:
Data_Final=Data.set_index(['Date Time']) # Data_Final is Dataframe
For example, the first index is 01/01/2016 00:00. Now I want this index in datetime. How is this conversion done?
Use the pd.to_datetime() function:
Data_Final = Data.copy()  # copy so the original Data is left unchanged
Data_Final['Date Time'] = pd.to_datetime(Data_Final['Date Time'])
Data_Final.set_index('Date Time', inplace=True)
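If the column has already been moved into the index, as in the question, the index itself can be converted in place. This is a minimal sketch with made-up sample rows and a hypothetical `Value` column; it assumes the strings are day-first (`dd/mm/yyyy HH:MM`), which the single example `01/01/2016 00:00` does not confirm, so adjust the format string if your file is month-first.

```python
import pandas as pd

# Hypothetical sample data standing in for the Excel file.
Data = pd.DataFrame(
    {"Date Time": ["01/01/2016 00:00", "01/01/2016 00:05"], "Value": [1.0, 2.0]}
)
Data_Final = Data.set_index("Date Time")  # index still holds strings here

# Convert the existing string index to a DatetimeIndex.
# The format is an assumption (day first); an explicit format avoids guessing.
Data_Final.index = pd.to_datetime(Data_Final.index, format="%d/%m/%Y %H:%M")

print(Data_Final.index)
```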
How to convert string to datetime format in pandas python?
1. Question
I have a dataframe, and the Year-Month column contains the year and month which I want to extract.
For example, an element in this column is "2022-10". And I want to extract year=2022, month=10 from it.
My current solution is to use apply and lambda function:
df['xx_month'] = df['Year-Month'].apply(lambda x: int(x.split('-')[1]))
But it's very slow on a huge dataframe.
How can I do it more efficiently?
2. Solutions
Thanks for your help. I have summarized each person's solution with code:
(1) split by '-' and join #Vitalizzare
pandas.Series.str.split - split strings of a series, if expand=True then return a data frame with each part in a separate column;
pandas.DataFrame.set_axis - if axis='columns' then rename column names of a data frame;
pandas.DataFrame.join - if the indices are equal, then the frames stacked together horizontally are returned.
df = pd.DataFrame({'Year-Month':['2022-10','2022-11','2022-12']})
df = df.join(
df['Year-Month']
.str.split('-', expand=True)
.set_axis(['year','month'], axis='columns')
)
(2) convert the datatype from object (str) into datetime format #Neele22
import pandas as pd
df['Year-Month'] = pd.to_datetime(df['Year-Month'], format="%Y-%m")
(3) use regex or datetime to extract year and month #mozway
df['Year-Month'].str.extract(r'(?P<year>\d+)-(?P<month>\d+)').astype(int)
# If you want to assign the output to the same DataFrame while removing the original Year-Month:
df[['year', 'month']] = df.pop('Year-Month').str.extract(r'(\d+)-(\d+)').astype(int)
Or use datetime:
date = pd.to_datetime(df['Year-Month'])
df['year'] = date.dt.year
df['month'] = date.dt.month
3. Follow up question
But a problem appears if I want to subtract 'Year-Month' from other datetime columns after converting the incomplete 'Year-Month' column from string to datetime.
For example, suppose I want the rows whose timestamp is no later than 2 months after the 'Year-Month' of each record.
import dateutil # dateutil is a better package than datetime package according to my experience
df[(df['timestamp'] - df['Year-Month'])>= dateutil.relativedelta.relativedelta(months=0) and (df['timestamp'] - df['Year-Month'])<= datetime.timedelta(months=2)]
This code raises a TypeError when subtracting the converted Year-Month column from the actual datetime column:
TypeError: Cannot subtract tz-naive and tz-aware datetime-like objects
The types for these two columns are:
Year-Month is datetime64[ns]
timestamp is datetime64[ns, UTC]
Then, I tried to specify utc=True when changing Year-Month to datetime type:
df[["Year-Month"]] = pd.to_datetime(df[["Year-Month"]],utc=True,format="%Y-%m")
But I got a ValueError:
ValueError: to assemble mappings requires at least that [year, month,
day] be specified: [day,month,year] is missing
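One possible reading of this error: `df[["Year-Month"]]` with double brackets is a DataFrame, and when `pd.to_datetime` receives a DataFrame it tries to assemble dates from columns named year, month, and day. Selecting the column as a Series with single brackets avoids that. The sketch below uses made-up sample rows and hypothetical names (`start`, `end`, `recent`); it assumes the missing day should default to the first of the month and uses `pd.DateOffset(months=2)` for the two-month window instead of subtracting the columns.

```python
import pandas as pd

# Hypothetical sample data: a year-month string and a timezone-aware timestamp.
df = pd.DataFrame(
    {
        "Year-Month": ["2022-10", "2022-11"],
        "timestamp": pd.to_datetime(
            ["2022-11-15 08:00", "2023-03-01 12:00"], utc=True
        ),
    }
)

# Single brackets select a Series, so the format string is applied per value.
# utc=True makes the result timezone-aware, matching the timestamp column.
df["Year-Month"] = pd.to_datetime(df["Year-Month"], format="%Y-%m", utc=True)

# Calendar-aware window: from the start of the month to two months later.
start = df["Year-Month"]
end = start + pd.DateOffset(months=2)

# Element-wise comparisons must be combined with &, not the `and` keyword.
recent = df[(df["timestamp"] >= start) & (df["timestamp"] <= end)]
print(recent)
```

Comparing against `start` and `end` directly sidesteps the question of how to express "two months" as a fixed-length timedelta, since months vary in length.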
4. Take away
If the [day,month,year] fields are not all present in a column (in my case I only have year and month), I could not convert the column from string to a datetime type that works in calculations with my other datetime column, so I use the extracted year and month for the calculations instead.
If you don't need to do calculations between the incomplete datetime column and other datetime columns, you can convert the incomplete datetime string into datetime type and extract [year,month] from it. It's easier than using regex, split and join.
df = pd.DataFrame({'Year-Month':['2022-10','2022-11','2022-12']})
df = df.join(
df['Year-Month']
.str.split('-', expand=True)
.set_axis(['year','month'], axis='columns')
)
pandas.Series.str.split - split strings of a series, if expand=True then return a data frame with each part in a separate column;
pandas.DataFrame.set_axis - if axis='columns' then rename column names of a data frame;
pandas.DataFrame.join - if the indices are equal, then the frames stacked together horizontally are returned.
You can use a regex for that.
Creating a new DataFrame:
df['Year-Month'].str.extract(r'(?P<year>\d+)-(?P<month>\d+)').astype(int)
If you want to assign the output to the same DataFrame while removing the original Year-Month:
df[['year', 'month']] = df.pop('Year-Month').str.extract(r'(\d+)-(\d+)').astype(int)
Example input:
Year-Month
0 2022-10
output:
year month
0 2022 10
Alternative using datetime:
You can also use a datetime intermediate:
date = pd.to_datetime(df['Year-Month'])
df['year'] = date.dt.year
df['month'] = date.dt.month
output:
Year-Month year month
0 2022-10 2022 10
You can also convert the datatype from object (str) into datetime format. This will make it easier to work with the dates.
import pandas as pd
df['Year-Month'] = pd.to_datetime(df['Year-Month'], format="%Y-%m")
I have one date feature in the format 20001130 and another in the format 2000-11-30, without any spaces. How can I write optimized code that works for both and efficiently splits the date into day, month and year?
You can use pandas.to_datetime:
import pandas as pd
pd.to_datetime([20001130, 20001129], format='%Y%m%d')
or with a dataframe.
df = pd.DataFrame({'time': [20001129, 20001130]})
df.time = pd.to_datetime(df.time, format='%Y%m%d')
EDIT
If the two date formats appear in one column, convert all values to strings and let pandas.to_datetime interpret them. Note that on pandas 2.0 or later, parsing different formats in one column may require passing format='mixed'.
df = pd.DataFrame({'time': [20001129, '2000-11-30']})
df.time = pd.to_datetime(df.time.astype(str))
        time
0 2000-11-29
1 2000-11-30
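Since the question also asks for the day, month and year as separate values, here is a minimal sketch that does not rely on automatic format detection. It assumes the only difference between the two formats is the dashes, so removing them leaves a single `%Y%m%d` format; the `digits` name and the new `year`, `month`, `day` columns are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"time": [20001129, "2000-11-30"]})

# Normalize both variants to the same 8-digit string (assumption: only
# dashes differ), then parse with one explicit format.
digits = df["time"].astype(str).str.replace("-", "", regex=False)
df["time"] = pd.to_datetime(digits, format="%Y%m%d")

# Split into components with the vectorized .dt accessor.
df["year"] = df["time"].dt.year
df["month"] = df["time"].dt.month
df["day"] = df["time"].dt.day
print(df)
```

An explicit format is usually faster than letting pandas guess, and it fails loudly if a value in some third format sneaks in.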
I am currently working on multiple datasets with a Timestamp column in the format dd/mm/yyyy HH:MM (daily data at 5-minute intervals).
I want to resample each dataset to fill in missing dates and timestamps.
The issue is that a few datasets have some rows as ddmmyy, then the format abruptly changes to mmddyyyy after, say, the first few hundred rows, and then back to ddmmyy, without any pattern.
I need a solution or help to correct this issue.
The code I am using:
df['Timestamp'] = pd.to_datetime(df['Timestamp'])
df['Timestamp'] = df.Timestamp.dt.strftime('%d/%m/%y %H:%M')
df['Timestamp'] = pd.to_datetime(df['Timestamp'])
start_dt = df.loc[0, "Timestamp"]
end_dt = df["Timestamp"].iloc[-1]
r = pd.date_range(start=start_dt, end=end_dt, freq="5min")
# Reindexing by adding missing dates
df = df.set_index('Timestamp').reindex(r).rename_axis("Timestamp").reset_index()
Use a regex to separate the rows in ddmmyy format from those in mmddyy format, then convert each group to datetime with its own format.
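As a rough sketch of one way to split the rows, the example below swaps the regex for a two-pass parse: try the day-first format, then fall back to month-first wherever the first attempt fails. It assumes slash-separated values with a four-digit year, and the sample values and names (`raw`, `day_first`, `month_first`, `ambiguous`) are made up. Rows that are valid under both formats cannot be resolved this way and are only flagged, not fixed.

```python
import pandas as pd

# Hypothetical sample: row 0 is day-first, row 1 is month-first,
# row 2 is valid under both readings.
raw = pd.Series(["13/01/2016 00:00", "01/14/2016 00:05", "02/01/2016 00:10"])

# errors="coerce" turns values that do not fit the format into NaT.
day_first = pd.to_datetime(raw, format="%d/%m/%Y %H:%M", errors="coerce")
month_first = pd.to_datetime(raw, format="%m/%d/%Y %H:%M", errors="coerce")

# Prefer the day-first reading; use month-first only where day-first failed.
parsed = day_first.fillna(month_first)

# Rows where both readings are valid but disagree need a manual decision.
ambiguous = day_first.notna() & month_first.notna() & (day_first != month_first)
print(parsed)
print(ambiguous)
```

For 5-minute data, the flagged rows could then be checked against their neighbours, since a wrongly read date will usually break the otherwise increasing sequence of timestamps.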