pandas reading dates from csv in yy-mm-dd format - python

I have a CSV file with dates displayed in dd-mmm-yy format, and I want to read them in as yyyy-mm-dd. The parse_dates option works, but it does not convert dates before 2000 correctly.
Example: the actual date is 01-Aug-1968. It is displayed as 01-Aug-68. Pandas date parsing (with correction=true) reads the date as 01-Aug-2068.
Is there any option in pandas to read dates before 2000 correctly?

from dateutil.relativedelta import relativedelta
import datetime
import pandas as pd
Let's assume you have a CSV like this:
mydates
18-Aug-68
13-Jul-45
12-Sep-00
20-Jun-10
15-Jul-60
Define your date format:
d = lambda x: datetime.datetime.strptime(x, '%d-%b-%y')
Put a constraint on the parsed dates, so that any date that lands in the future is shifted back 100 years:
dateparse = lambda x: d(x) if d(x) < datetime.datetime.now() else d(x) - relativedelta(years=100)
Read your CSV:
df = pd.read_csv("myfile.csv", parse_dates=['mydates'], date_parser=dateparse)
Here is your result:
print(df)
mydates
0 1968-08-18
1 1945-07-13
2 2000-09-12
3 2010-06-20
4 1960-07-15
Voilà
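The same idea can be sketched without a custom parser: parse the whole column with pd.to_datetime, then move anything that lands after a cutoff back by 100 years. This is only a sketch. The helper name fix_century and its cutoff parameter are made up for illustration, and it assumes that no genuine date lies after the cutoff and that month abbreviations are in English.

```python
import pandas as pd

def fix_century(values, cutoff=None):
    """Parse dd-Mon-yy strings and move dates after `cutoff` back 100 years.

    `fix_century` and `cutoff` are hypothetical names used for this sketch.
    """
    cutoff = pd.Timestamp.now() if cutoff is None else pd.Timestamp(cutoff)
    # %y maps two-digit years 00-68 to 2000-2068, so 68 is parsed as 2068
    parsed = pd.to_datetime(values, format="%d-%b-%y")
    # keep dates up to the cutoff, shift later ones back a century
    return parsed.where(parsed <= cutoff, parsed - pd.DateOffset(years=100))

raw = pd.Series(["18-Aug-68", "13-Jul-45", "12-Sep-00", "20-Jun-10", "15-Jul-60"])
fixed = fix_century(raw, cutoff="2024-01-01")
print(fixed.dt.strftime("%Y-%m-%d").tolist())
```

Passing an explicit cutoff keeps the result reproducible; leaving it out falls back to the current time, as in the answer above.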

Related

python pandas converting UTC integer to datetime

I am pulling some financial data from an API that stores the time values as (I think) UTC timestamps, for example 1645804609719.
I cannot seem to convert the entire column into a usable date. I can do it for a single value using the following code, so I know this works, but I have thousands of rows with this problem and thought pandas would offer an easier way to update all the values.
from datetime import datetime
tx = int('1645804609719')/1000
print(datetime.utcfromtimestamp(tx).strftime('%Y-%m-%d %H:%M:%S'))
Any help would be greatly appreciated.
Simply use pandas.Series.apply on the column:
df['date'] = df.date.apply(lambda x: datetime.utcfromtimestamp(int(x)/1000).strftime('%Y-%m-%d %H:%M:%S'))
Another way to do it is by using pd.to_datetime as recommended by Panagiotos in the comments:
df['date'] = pd.to_datetime(df['date'],unit='ms')
You can use to_numeric to convert the column to integers, div to divide it by 1000, and finally a loop over the dataframe column with datetime to get the format you want.
import pandas as pd
from datetime import datetime
df = pd.DataFrame({'date': ['1584199972000', '1645804609719'], 'values': [30,40]})
# convert milliseconds to seconds; cast to object so strings can be written back into the column
df['date'] = pd.to_numeric(df['date']).div(1000).astype(object)
for i in range(len(df)):
    df.iloc[i,0] = datetime.utcfromtimestamp(df.iloc[i,0]).strftime('%Y-%m-%d %H:%M:%S')
print(df)
Output:
date values
0 2020-03-14 15:32:52 30
1 2022-02-25 15:56:49 40
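For comparison, here is a minimal sketch of the same conversion without a Python-level loop. It assumes the timestamps arrive as strings of milliseconds since the Unix epoch, as in the sample frame above.

```python
import pandas as pd

df = pd.DataFrame({"date": ["1584199972000", "1645804609719"], "values": [30, 40]})

# strings -> integers -> datetime64 values, interpreting the numbers as milliseconds
stamps = pd.to_datetime(pd.to_numeric(df["date"]), unit="ms")

# optional: turn the datetimes back into formatted text
df["date"] = stamps.dt.strftime("%Y-%m-%d %H:%M:%S")
print(df)
```

If the column is going to be used for filtering or resampling, keeping stamps as datetimes and skipping the strftime step is probably the better choice.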

Bad datetime conversion in pandas when a csv file it's opened

I have a simple CSV with a Date column and an Activity column.
When I open it with pandas and convert the Date column with pd.to_datetime, the dates change. It happens where the month changes: it seems that pandas swaps the day and the month, or something like that.
The date format I want is dd-mm-yyyy or yyyy-mm-dd.
This is the code I am using:
import pandas as pd
dataset = pd.read_csv(directory + "Time 2020 (Activities).csv", sep = ";")
dataset[["Date"]] = dataset[["Date"]].apply(pd.to_datetime)
How can I fix that?
You could specify the date format in the pd.to_datetime parameters:
dataset['Date'] = pd.to_datetime(dataset['Date'], format='%Y-%m-%d')
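The format string has to match what is actually in the file. The sample data from the question is not shown, so the sketch below assumes day-first strings such as 31/01/2020; that layout, and the sample values, are assumptions made for illustration.

```python
import pandas as pd

# assumed sample: day/month/year strings around a month boundary
dataset = pd.DataFrame({"Date": ["31/01/2020", "01/02/2020", "02/02/2020"]})

# an explicit format removes any guessing about which field is the day
dataset["Date"] = pd.to_datetime(dataset["Date"], format="%d/%m/%Y")

print(dataset["Date"].dt.strftime("%Y-%m-%d").tolist())
```

pd.to_datetime also accepts dayfirst=True as a hint, but an explicit format is stricter, since values that do not match it raise an error instead of being reinterpreted.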

Changing date format convert mmm-yy to yyyy/mm/dd

I have a .CSV file with a column "Date". Each cell holds a full date, e.g. 1/9/2020, but is formatted to display as Sep-20. (All dates are the first of the month.)
The issue is that Python reads the formatted value, Sep-20, from the .CSV file. How do I change all the values to yyyy/mm/dd format (2020/09/01)?
This is what I have tried so far, to no avail:
import pandas as pd
tw_df = pd.read_csv("tw_data.csv", index_col = "Date", parse_dates = True, format = "%Y%m%d")
Error Message
TypeError: parser_f() got an unexpected keyword argument 'format'
You can use datetime to convert the values to dates inside pandas. strptime converts a string in a given format to a datetime object that pandas can work with.
See the code below:
import pandas as pd
from datetime import datetime
df = pd.read_csv('tw_data.csv')
conv = lambda x: datetime.strptime(x, "%b-%y")
df["Date"] = df["Date"].apply(conv)
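To get the yyyy/mm/dd text the question asks for, the parsed dates still need to be formatted. A minimal sketch, assuming the column holds English month abbreviations like Sep-20 (the sample values are made up):

```python
import pandas as pd

df = pd.DataFrame({"Date": ["Sep-20", "Oct-20", "Jan-21"]})

# parse "Mon-yy"; the missing day defaults to the first of the month
parsed = pd.to_datetime(df["Date"], format="%b-%y")

# render as yyyy/mm/dd strings
df["Date"] = parsed.dt.strftime("%Y/%m/%d")
print(df)
```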

Pandas: Multiple date formats in one column

I have two date formats in one Pandas series (column) that need to be standardized into one format (mmm dd & mm/dd/YY)
Date
Jan 3
Jan 2
Jan 1
12/31/19
12/30/19
12/29/19
Even Excel won't recognize the mmm dd format as a date format. I can change the mmm to a fully-spelled out month using str.replace:
df['Date'] = df['Date'].str.replace('Jan', 'January', regex=True)
But how do I add the current year? How do I then convert January 1, 2020 to 01/01/20?
Have you tried dateutil's parse()?
from dateutil.parser import parse
import pandas as pd
def clean_date(text):
    # parse() returns a datetime; a missing year defaults to the current year
    return parse(text)
df['Date'] = df['Date'].apply(clean_date)
df['Date'] = pd.to_datetime(df['Date'])
If it's in a data frame use this:
from dateutil.parser import parse
import pandas as pd
for i in df.index:
    # .loc avoids chained assignment, which may not write back to df
    df.loc[i, 'Date'] = parse(df.loc[i, 'Date'])
df['Date'] = pd.to_datetime(df['Date']).dt.strftime("%d-%m-%Y")
Found the solution (needed to use apply):
df['date'] = df['date'].apply(dateutil.parser.parse)
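If the missing year should be chosen explicitly rather than taken from the current date, the two formats can also be handled with pd.to_datetime alone. This is a sketch under assumptions: the helper name standardize and its year parameter are invented for illustration, and the column is assumed to contain only the two formats shown in the question.

```python
import pandas as pd

def standardize(dates, year):
    """Return mm/dd/yy strings for a column mixing 'Jan 3' and '12/31/19'.

    `standardize` and `year` are hypothetical names used for this sketch.
    """
    # rows already in mm/dd/yy parse here; the others become NaT
    full = pd.to_datetime(dates, format="%m/%d/%y", errors="coerce")
    # rows like 'Jan 3' parse once the chosen year is appended
    short = pd.to_datetime(dates + " " + str(year), format="%b %d %Y", errors="coerce")
    return full.fillna(short).dt.strftime("%m/%d/%y")

dates = pd.Series(["Jan 3", "Jan 2", "Jan 1", "12/31/19", "12/30/19", "12/29/19"])
result = standardize(dates, year=2020)
print(result.tolist())
```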

Pandas DateTime Format not working - python

I am trying to format some dates with datetime, but for some reason it is ignoring my format argument. I want day/month/year format, which is the format the CSV file uses, but it does not work when I try this:
df = pd.read_csv('test.csv', parse_dates=['Date'],
date_parser=lambda x: pd.to_datetime(x, format='%d/%m/%Y'))
The result is still shown as %Y-%m-%d.
Why is it, as far as I can tell, "defaulting" to %Y-%m-%d?
This should work.
import datetime as dt
import pandas as pd
df = pd.read_csv('test.csv')
formatted_dates = []
for old_date in df['Date']:
    dt_obj = dt.datetime.strptime(old_date, '%d/%m/%Y')
    new_date = """{}/{}/{}""".format(dt_obj.day, dt_obj.month, dt_obj.year)
    formatted_dates.append(new_date)
df['Date'] = formatted_dates
Output:
18/1/2017
22/1/2017
31/1/2017
...
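P.S. This looks like a bug with parse_dates and date_parser in pd.read_csv, but it is really how pandas displays dates: the format argument only controls how the strings are parsed, and the resulting datetime values are always shown as YYYY-MM-DD.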
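The difference between parsing and displaying can be sketched as follows. The CSV content is an inline stand-in for test.csv, and the Date_text column name is made up for this example.

```python
import io
import pandas as pd

# stand-in for test.csv, with day/month/year strings
csv_text = "Date\n18/01/2017\n22/01/2017\n31/01/2017\n"
df = pd.read_csv(io.StringIO(csv_text))

# format= tells pandas how to read the strings; the result is a datetime column
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")

# to see day/month/year again, format the datetimes back into text
df["Date_text"] = df["Date"].dt.strftime("%d/%m/%Y")
print(df)
```

Keeping Date as a datetime column and formatting only for output preserves sorting and date arithmetic, which a column of strings would lose.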
