Pandas: Multiple date formats in one column - python

I have two date formats in one Pandas series (column) that need to be standardized into one format (mmm dd & mm/dd/YY)
Date
Jan 3
Jan 2
Jan 1
12/31/19
12/30/19
12/29/19
Even Excel won't recognize the mmm dd format as a date format. I can change the mmm to a fully-spelled out month using str.replace:
df['Date'] = df['Date'].str.replace('Jan', 'January', regex=True)
But how do I add the current year? How do I then convert January 1, 2020 to 01/01/20?

Have you tried dateutil's parse()?
from dateutil.parser import parse
import pandas as pd
def clean_date(text):
    # parse() already returns a datetime; a missing year defaults to the current year
    return parse(text)
df['Date'] = df['Date'].apply(clean_date)
df['Date'] = pd.to_datetime(df['Date'])

If it's in a data frame, use this:
from dateutil.parser import parse
import pandas as pd
# Parse each value, then format the column as dd-mm-YYYY strings
df['Date'] = df['Date'].apply(parse)
df['Date'] = pd.to_datetime(df['Date']).dt.strftime("%d-%m-%Y")

Found the solution (needed to use apply):
import dateutil.parser
df['date'] = df['date'].apply(dateutil.parser.parse)
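If pulling in dateutil is not an option, the two formats can also be parsed explicitly with pd.to_datetime. This is a minimal sketch, assuming the short "Jan 3" style dates all belong to 2020; the ASSUMED_YEAR constant and the sample frame are illustrative and not part of the question.

```python
import pandas as pd

# Sample data mirroring the two formats from the question
df = pd.DataFrame({"Date": ["Jan 3", "Jan 2", "12/31/19", "12/30/19"]})

ASSUMED_YEAR = 2020  # assumption: the year to attach to "Jan 3" style values

# First pass: mm/dd/yy values parse, everything else becomes NaT
full = pd.to_datetime(df["Date"], format="%m/%d/%y", errors="coerce")

# Second pass: append the assumed year and parse the "Jan 3" style values
short = pd.to_datetime(
    df["Date"] + f" {ASSUMED_YEAR}", format="%b %d %Y", errors="coerce"
)

# Combine both passes and standardize to mm/dd/yy strings
df["Date"] = full.fillna(short).dt.strftime("%m/%d/%y")
print(df)
```

Any value that matches neither format ends up as NaN, which makes leftover bad rows easy to spot.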

Related

Pandas to datetime

I have a date that is formatted like this:
01-19-71
Here 71 means 1971, but whenever to_datetime is used it converts it to 2071! How can I solve this problem? I am told that this would need regex, but I can't imagine how, since there are many cases in this data.
My current code:
re_1 = r"\d{1,2}[/-]\d{1,2}[/-]\d{2,4}"
re_2 = r"(?:\d{1,2} )?(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*[ \-\.,]+(?:\d{1,2}[\w]*[ \-,]+)?[1|2]\d{3}"
re_3 = r"(?:\d{1,2}/)?[1|2]\d{3}"
# Correct misspellings
df = df.str.replace("Janaury", "January")
df = df.str.replace("Decemeber", "December")
# Extract dates
regex = "((%s)|(%s)|(%s))"%(re_1, re_2, re_3)
dates = df.str.extract(regex)
# Sort the Series
dates = pd.Series(pd.to_datetime(dates.iloc[:,0]))
dates.sort_values(ascending=True, inplace=True)
Considering that one has a string as follows
date = '01-19-71'
To convert it to a datetime object where 71 becomes 1971 and not 2071, one can use datetime.strptime, whose %y directive maps 69-99 to 1969-1999 and 00-68 to 2000-2068:
import datetime as dt
date = dt.datetime.strptime(date, '%m-%d-%y')
[Out]:
1971-01-19 00:00:00
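The same two-digit-year rule can be applied to a whole column by passing an explicit format to pd.to_datetime. A small sketch, with sample values chosen only to show where the century boundary falls:

```python
import pandas as pd

# Illustrative values on both sides of the %y pivot (68 vs. 69)
dates = pd.Series(["01-19-71", "12-31-69", "01-01-68"])

# An explicit format avoids guessing and applies the %y century rule
parsed = pd.to_datetime(dates, format="%m-%d-%y")

years = parsed.dt.year.tolist()
print(years)
```

With this data the years come out as 1971, 1969 and 2068, so values of 68 or lower still need a separate correction if they are meant to be in the 1900s.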

python pandas converting UTC integer to datetime

I am calling some financial data from an API which stores the time values as (I think) UTC timestamps in milliseconds, for example 1645804609719.
I cannot seem to convert the entire column into a usable date. I can do it for a single value using the following code, so I know this works, but I have thousands of rows with this problem and thought pandas would offer an easier way to update all the values.
from datetime import datetime
tx = int('1645804609719')/1000
print(datetime.utcfromtimestamp(tx).strftime('%Y-%m-%d %H:%M:%S'))
Any help would be greatly appreciated.
Simply use pandas.Series.apply:
df['date'] = df.date.apply(lambda x: datetime.utcfromtimestamp(int(x)/1000).strftime('%Y-%m-%d %H:%M:%S'))
Another way to do it is by using pd.to_datetime as recommended by Panagiotos in the comments:
df['date'] = pd.to_datetime(df['date'],unit='ms')
You can use "to_numeric" to convert the column to integers, "div" to divide it by 1000, and finally a loop over the dataframe column with datetime to get the format you want.
import pandas as pd
from datetime import datetime
df = pd.DataFrame({'date': ['1584199972000', '1645804609719'], 'values': [30,40]})
df['date'] = pd.to_numeric(df['date']).div(1000)
# Convert each epoch value (now in seconds) to a formatted UTC string
for i in range(len(df)):
    df.iloc[i,0] = datetime.utcfromtimestamp(df.iloc[i,0]).strftime('%Y-%m-%d %H:%M:%S')
print(df)
Output:
                  date  values
0  2020-03-14 15:32:52      30
1  2022-02-25 15:56:49      40
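The loop is not strictly needed. A vectorized sketch on the same sample data, assuming the values are millisecond epoch timestamps stored as strings (the date_str column name is made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({"date": ["1584199972000", "1645804609719"], "values": [30, 40]})

# Convert the strings to integers, then interpret them as milliseconds since the epoch
df["date"] = pd.to_datetime(pd.to_numeric(df["date"]), unit="ms")

# Optional: a string column in the same layout as the loop-based answer
df["date_str"] = df["date"].dt.strftime("%Y-%m-%d %H:%M:%S")
print(df)
```

Keeping the date column as datetime64 rather than strings means it can still be sorted, resampled, or compared later.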

convert yyyy-mm-dd to mmm-yy in dataframe python

I am trying to convert the way month and year is presented.
I have dataframe as below
Date
2020-01-31
2020-04-30
2021-05-05
and I want to convert it in the way like month and year.
The output that I am expecting is
Date
Jan-20
Apr-20
May-21
I tried to do it with datetime but it doesn't work.
pd.to_datetime(pd.Series(df['Date']), format='%mmm-%yy')
Use .dt.strftime() to change the display format. %b-%y is the format string for Mmm-YY:
df.Date = pd.to_datetime(df.Date).dt.strftime('%b-%y')
# Date
# 0 Jan-20
# 1 Apr-20
# 2 May-21
Or if Date is the index:
df.index = pd.to_datetime(df.index).strftime('%b-%y')
import pandas as pd
date_sr = pd.to_datetime(pd.Series("2020-12-08"))
change_format = date_sr.dt.strftime('%b-%y')
print(change_format)
Reference: https://docs.python.org/3/library/datetime.html
The format '%Y-%m-%d' is changed to '%b-%y'.
import datetime
df['Date'] = df['Date'].apply(lambda x: datetime.datetime.strptime(x,'%Y-%m-%d').strftime('%b-%y'))
# reference https://docs.python.org/3/library/datetime.html
# %Y-%m-%d changed to ('%b-%y')

Bad datetime conversion in pandas when a csv file it's opened

I have a simple csv with a Date and an Activity column.
When I open it with pandas and convert the Date column with pd.to_datetime, the dates change. This happens where the month changes.
It seems that pandas swaps the day and the month, or something like that.
The date format that I want is dd-mm-yyyy or yyyy-mm-dd.
This is the code that I am using:
import pandas as pd
dataset = pd.read_csv(directory + "Time 2020 (Activities).csv", sep = ";")
dataset[["Date"]] = dataset[["Date"]].apply(pd.to_datetime)
How can I fix that?
You could specify the date format in the pd.to_datetime parameters:
dataset['Date'] = pd.to_datetime(dataset['Date'], format='%Y-%m-%d')
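The format string has to match how the file actually stores the dates, which the question does not show. As an illustration only, assuming the csv holds day-first dates such as 31/01/2020 (the in-memory csv below is made-up sample data), the conversion could look like this:

```python
import io

import pandas as pd

# Hypothetical file contents: day-first dates around a month change
csv_text = "Date;Activity\n31/01/2020;Run\n01/02/2020;Swim\n02/02/2020;Read\n"
dataset = pd.read_csv(io.StringIO(csv_text), sep=";")

# State the layout explicitly so 01/02/2020 is read as 1 February, not 2 January
dataset["Date"] = pd.to_datetime(dataset["Date"], format="%d/%m/%Y")
print(dataset)
```

Passing dayfirst=True to pd.to_datetime is a looser alternative when the exact layout varies, but an explicit format fails loudly on rows that do not match.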

pandas reading dates from csv in yy-mm-dd format

I have a csv file with dates displayed in the format dd-mmm-yy, and I want to read them in the format yyyy-mm-dd. The parse dates option works, but it does not convert dates before 2000 correctly.
Example: the actual date is 01-Aug-1968. It is displayed as 01-Aug-68. With date parsing enabled, pandas reads the date as 01-Aug-2068.
Is there any option to read dates before 2000 correctly in pandas?
import datetime
import pandas as pd
from dateutil.relativedelta import relativedelta
Let's assume you have a csv like this:
mydates
18-Aug-68
13-Jul-45
12-Sep-00
20-Jun-10
15-Jul-60
Define your date format:
d = lambda x: datetime.datetime.strptime(x, '%d-%b-%y')
Put a constraint on the parsed dates, moving any date in the future back 100 years:
dateparse = lambda x: d(x) if d(x) < datetime.datetime.now() else d(x) - relativedelta(years=100)
Read your csv and apply the parser to the column:
df = pd.read_csv("myfile.csv")
df['mydates'] = df['mydates'].apply(dateparse)
Here is your result:
print(df)
mydates
0 1968-08-18
1 1945-07-13
2 2000-09-12
3 2010-06-20
4 1960-07-15
Voilà
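Because the answer compares against the current time, its result shifts as years pass. A reproducible sketch of the same idea with a fixed cutoff; the CUTOFF value, the parse_two_digit_year helper, and the in-memory csv are all assumptions made for illustration:

```python
import io
from datetime import datetime

import pandas as pd

# Same sample rows as above, held in memory instead of myfile.csv
csv_text = "mydates\n18-Aug-68\n13-Jul-45\n12-Sep-00\n20-Jun-10\n15-Jul-60\n"

CUTOFF = datetime(2024, 1, 1)  # assumption: no date in the data is later than this


def parse_two_digit_year(text, cutoff=CUTOFF):
    """Parse dd-Mmm-yy and move dates after the cutoff back one century."""
    parsed = datetime.strptime(text, "%d-%b-%y")
    if parsed > cutoff:
        parsed = parsed.replace(year=parsed.year - 100)
    return parsed


df = pd.read_csv(io.StringIO(csv_text))
df["mydates"] = df["mydates"].apply(parse_two_digit_year)
print(df)
```

A fixed cutoff suits historical data with a known end date; for data that keeps growing, comparing against the current date as in the answer may be the better fit.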
