How to transform invalid date to valid date using Pandas? - python

I have a dataframe like the one shown below:
df = pd.DataFrame({'d1' :['2/26/2019 03:31','10241-2-19 0:0:0','31/03/2016 16:00'],
'd2' :['2/29/2019 05:21','10241-2-29 0:0:0','03/04/2016 12:00']})
As you can see, there are some invalid date values, namely records with a year like 10241.
On the other hand, valid dates can be in either format, month-first (mdy) or day-first (dmy).
When I try the code below, I get an error saying the date is out of range:
df['d1'] = pd.to_datetime(df.d1)
df['d1'].dt.strftime('%m/%d/%Y %H:%M')
Is there any way to fix this?
I expect my output to be as shown below.
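A possible approach, offered as a sketch rather than a confirmed answer: parse each column once per expected format with `errors='coerce'`, so that anything that does not match (including the year-10241 rows) becomes `NaT`, then fill the gaps of the first result with the second. The helper name `parse_mixed` is made up for illustration, and the sketch assumes every valid value is either month-first or day-first with hours and minutes. An ambiguous value such as 03/04/2016 is read as month-first here.

```python
import pandas as pd

df = pd.DataFrame({
    'd1': ['2/26/2019 03:31', '10241-2-19 0:0:0', '31/03/2016 16:00'],
    'd2': ['2/29/2019 05:21', '10241-2-29 0:0:0', '03/04/2016 12:00'],
})

def parse_mixed(col):
    """Hypothetical helper: try month-first, then day-first; the rest stay NaT."""
    mdy = pd.to_datetime(col, format='%m/%d/%Y %H:%M', errors='coerce')
    dmy = pd.to_datetime(col, format='%d/%m/%Y %H:%M', errors='coerce')
    # Keep the month-first result where it parsed, otherwise use day-first.
    return mdy.fillna(dmy)

parsed = df.apply(parse_mixed)  # applied column by column
print(parsed)
```

Values that fit neither format, such as 2/29/2019 (2019 was not a leap year), also end up as `NaT`, so they can be inspected or dropped afterwards.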

Related

date formatting for time series data

I'm trying to format the start_date and end_date columns of this data frame. With other time-series data, I don't lose any observations when I format both dates using the following commands:
df1['start_date'] = pd.to_datetime(df2['start_date']).dt.date
df1['end_date'] = pd.to_datetime(df2['end_date']).dt.date
df2['start_date'] = pd.to_datetime(df2['start_date']).dt.date
df2['end_date'] = pd.to_datetime(df2['end_date']).dt.date
But when I do this for the uploaded Excel file, I lose half of the observations even though the data format is the same. I then tried the code below, but in vain, since it raises an error:
#formatting the date column correctly
df4.start_date=df4.start_date.apply(lambda x:datetime.datetime.strptime(x, '%Y-%m-%dT%H::%M::%S.%f'))
Error message : ValueError: time data '2014-01-02T00:00:00.000Z' does not match format '%Y-%m-%dT%H::%M::%S.%f'
start_date,end_date,temp,company_id
2014-01-02T00:00:00.000Z,2014-01-02T00:00:00.000Z,24.7076052756763,93
2014-01-11T00:00:00.000Z,2014-01-11T00:00:00.000Z,26.4211755123126,93
2014-01-15T00:00:00.000Z,2014-01-15T00:00:00.000Z,24.305641594482,93
2014-01-20T00:00:00.000Z,2014-01-20T00:00:00.000Z,25.1427441225934,93
2014-01-23T00:00:00.000Z,2014-01-23T00:00:00.000Z,23.6531733860261,93
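The error message points at the format string: it uses doubled colons (`::`) and has no literal `Z`, so it cannot match values such as `2014-01-02T00:00:00.000Z`. A minimal sketch with single colons and a trailing literal `Z`, using a few of the sample values above (the names `ISO_FORMAT`, `raw`, `parsed` and `dates` are chosen here for illustration):

```python
from datetime import datetime

# Single colons between the time fields, plus a literal "Z" at the end.
ISO_FORMAT = '%Y-%m-%dT%H:%M:%S.%fZ'

raw = [
    '2014-01-02T00:00:00.000Z',
    '2014-01-11T00:00:00.000Z',
    '2014-01-15T00:00:00.000Z',
]

parsed = [datetime.strptime(value, ISO_FORMAT) for value in raw]
dates = [stamp.date() for stamp in parsed]  # drop the time part
print(dates)
```

On a DataFrame column, `pd.to_datetime(df4['start_date'], utc=True).dt.date` is expected to handle the same strings without an explicit format, although that has not been checked against the original Excel file.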

logger print error: not enough arguments for format string

I'm trying to fix a "logger print error: not enough arguments for format string" that crops up in a JupyterLab report. I have tried a few solutions, but no joy.
My dataframe looks like this:
df_1 = pd.DataFrame(df, columns = ['col1','col2','col3','col4','col5','col6','col7', 'col8', 'col9', 'col10'])
#I'm applying a % format because I only need last four columns in percentage:
df_1['col7'] = df_1['col7'].apply("{0:.0f}%".format)
df_1['col8'] = df_1['col8'].apply("{0:.0f}%".format)
df_1['col9'] = df_1['col9'].apply("{0:.0f}%".format)
df_1['col10'] = df_1['col10'].apply("{0:.0f}%".format)
I want to maintain the table format/structure, so I'm not calling print(df_1) but rather just evaluating:
df_1
The above works fine, but I can't seem to get past the "logger print error: not enough arguments for format string" error.
P.S. I've also tried formats like "{:.2%}" or "{0:.0%}", but they turn -3 into -300%.
Here is what the columns look like without any format:
Edit: fixed by removing the '%Y-%m-%d' line from the dataframe's source query.
If you are using Python 3.6 or later, an f-string inside a lambda should do it:
for col in ['col7', 'col8', 'col9', 'col10']:
    # format each value as a whole number followed by a percent sign
    df_1[col] = df_1[col].apply(lambda x: f"{x:.0f}%")
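The fix mentioned in the question's edit is consistent with how printf-style formatting treats `%`: a string containing `%Y-%m-%d` looks like it has placeholders, so formatting it without matching arguments fails. Whether the logger in question formats messages this way is an assumption. The sketch below also shows why `"{:.0%}"` turned -3 into -300%: the `%` presentation type multiplies by 100, while `"{:.0f}%"` only appends the sign. The names `template`, `failed`, `escaped` and `value` are illustrative.

```python
# A literal "%Y-%m-%d" is read as three placeholders by the % operator.
template = "date_format='%Y-%m-%d'"
try:
    template % ()
    failed = False
except (TypeError, ValueError) as exc:
    failed = True
    print("formatting failed:", exc)

# Doubling the percent signs keeps them literal.
escaped = "date_format='%%Y-%%m-%%d'" % ()
print(escaped)

value = -3
as_ratio = "{:.0%}".format(value)   # treats -3 as a ratio and multiplies by 100
as_number = "{:.0f}%".format(value)  # formats the number, then appends "%"
print(as_ratio, as_number)
```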

Date and time mix up in pandas

Please consider the dataset below.
The column with dates is 'Date Announced'. The current date format is 'DD-MM-YYYY', and I want to change it to 'MM/DD/YYYY'.
To do so, I have written the following pandas code:
df3=pd.read_csv('raw_data_27th_APRonwards.csv',parse_dates=[0], dayfirst=True)
df3['Date Announced'] = pd.to_datetime(df3['Date Announced'])
df3['Date Announced'] = df3['Date Announced'].dt.strftime('%m/%d/%Y')
After executing the above code, I didn't get the desired output. Please consider the dataset below.
Notice that in the output the date '09/05/2020' is wrong; it should be '05/09/2020'. The day and month are mixed up for this particular date. How can I fix this?
Pass the format explicitly so that the day is always read first:
df3['Date Announced'] = pd.to_datetime(df3['Date Announced'], format='%d-%m-%Y')
Then convert to the desired output format:
df3['Date Announced'] = df3['Date Announced'].dt.strftime('%m/%d/%Y')
Alternatively, name the column in parse_dates and keep dayfirst=True when reading the CSV file:
pd.read_csv('your_file.csv', parse_dates=['Date Announced'], dayfirst=True)
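A small self-contained check of the explicit-format approach. The two sample values are assumptions in the 'DD-MM-YYYY' layout described in the question: one is ambiguous (9 May 2020) and one is not (27 April 2020).

```python
import pandas as pd

# Hypothetical sample values in DD-MM-YYYY order.
raw = pd.Series(['09-05-2020', '27-04-2020'], name='Date Announced')

# An explicit format removes any guessing about day/month order.
parsed = pd.to_datetime(raw, format='%d-%m-%Y')
formatted = parsed.dt.strftime('%m/%d/%Y')
print(formatted.tolist())
```

With the format spelled out, the ambiguous value is read as 9 May and comes out as 05/09/2020, matching the output the question asks for.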

How to check if a date in a string is greater than a given date? Python 3

So I have a CSV file of users which is in the format:
"Lastname, Firstname account_last_used_date"
I've tried the dateutil parser, but it says the list is an invalid string. I need to keep the names and the dates together. I've also tried datetime, but I'm having issues with "datetime not defined". I'm very new to Python, so forgive me if I've missed an easy solution.
import re
from datetime import date
with open("5cUserReport.csv","r") as dilly:
    li = [(x.replace("\n","")) for x in dilly]
    li2 = [(x.replace(",","")) for x in li]
for x in li2:
    match = re.search(r"\d{2}-\d{2}-\d{4}", x)
    date = datetime.strptime(match.group(), "%d-%m-%Y").x()
    print(date)
The end goal is to check whether the date the user last logged in is more than 4 months ago. Honestly, any help here is massively welcome!
The CSV format is:
am_testuser1 02/12/2017 08:42:48
am_testuser11 13/10/2017 17:44:16
am_testuser20 27/10/2017 16:31:07
am_testuser5 23/08/2017 09:42:41
am_testuser50 21/10/2017 15:38:12
Edit: edited the answer based on the given CSV.
You could do something like this with pandas:
import pandas as pd
colnames = ['Lastname, Firstname', 'Date', 'Time']
df = pd.read_csv('5cUserReport.csv', delim_whitespace=True, skiprows=1, names=colnames, parse_dates={'account_last_used_date': [1,2]}, dayfirst =True)
more_than_4_months_ago = df[df['account_last_used_date'] < (pd.to_datetime('now') - pd.DateOffset(months=4))]
print(more_than_4_months_ago)
The DataFrame more_than_4_months_ago contains the subset of records whose account_last_used_date is more than 4 months ago.
This is based on the given format, although I doubt that it is your actual format, since the given usernames don't really match the format 'Lastname, Firstname':
Lastname, Firstname account_last_used_date
am_testuser1 02/12/2017 08:42:48
am_testuser11 13/10/2018 17:44:16
am_testuser20 27/10/2017 16:31:07
am_testuser5 23/08/2018 09:42:41
am_testuser50 21/10/2017 15:38:12
(I edited 2 lines to 2018, so that the test actually shows that it works).
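For comparison, a standard-library sketch of the same check. Importing the `datetime` class itself (`from datetime import datetime`) avoids the "datetime not defined" error from the question, and the format uses slashes to match the sample rows. Several things here are assumptions: the helper names `parse_line` and `inactive_users`, the fixed `now` used to keep the result reproducible, and the approximation of 4 months as 120 days.

```python
from datetime import datetime, timedelta

# Sample rows in the "user date time" layout shown above.
lines = [
    "am_testuser1 02/12/2017 08:42:48",
    "am_testuser11 13/10/2018 17:44:16",
    "am_testuser20 27/10/2017 16:31:07",
    "am_testuser5 23/08/2018 09:42:41",
    "am_testuser50 21/10/2017 15:38:12",
]

def parse_line(line):
    """Split one row into (user, last_used datetime)."""
    user, date_part, time_part = line.split()
    last_used = datetime.strptime(f"{date_part} {time_part}", "%d/%m/%Y %H:%M:%S")
    return user, last_used

def inactive_users(rows, now, max_age=timedelta(days=120)):
    """Return users whose last login is older than max_age (about 4 months)."""
    cutoff = now - max_age
    return [user for user, last_used in map(parse_line, rows) if last_used < cutoff]

now = datetime(2018, 11, 1)  # fixed reference date, an assumption for the example
stale = inactive_users(lines, now)
print(stale)
```

In real use, `now` would be `datetime.now()` and `lines` would come from the file, skipping the header row.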

Decoding the contents of a table using python

The linked image (not reproduced here) shows my working environment, pandas in Python.
I have a CSV file and loaded its contents into Python:
file_path=filedialog.askopenfilename()
csv_file=open(file_path,'r')
pd.read_csv(csv_file)
After this code runs, I can display the contents in pandas as a table.
Now I want to decode the data in one particular column, "Batch".
In the picture, you can see a table; its "Batch" column is the important one to decode.
Look at the data in the Batch column:
First character: year
Second character: a letter mapped to a month (A = Jan, B = Feb, C = Mar, D = Apr, E = May, ...)
Third and fourth characters: day of the month
Example: the manufacturing date for 6B08MK1D11 is 08-02-2016.
I want to decode every value in the column to find its date from its batch number, and then put the decoded dates into a new column in the same table.
For example, decoding "6B08MK1D11" gives the date 08-02-2016. Each batch number yields its own date in this way.
After creating the new column, the table should be sorted by date in ascending order.
I tried to teach Python how to assign months, as follows:
for everycode[1] in Bat:
    if everycode[1]=='A':
        everycode[1] = 'Jan'
    if everycode[1]=='B':
        everycode[1] = 'Feb'
    if everycode[1]=='C':
        everycode[1] = 'Mar'
    if everycode[1]=='D':
        everycode[1]= 'Apr'
    if everycode[1]=='E':
        everycode[1]= 'May'
    if everycode[1]=='F':
        everycode[1]= 'Jun'
    if everycode[1] == 'G':
        everycode[1]= 'Jul'
    if everycode[1]=='H':
        everycode[1]= 'Aug'
    if everycode[1]=='I':
        everycode[1]= 'Sep'
    if everycode[1]=='J':
        everycode[1] = 'Oct'
    if everycode[1]=='K':
        everycode[1]= 'Nov'
    if everycode[1]=='L':
        everycode[1]= 'Dec'
But when I execute this, it returns an error like this:
"TypeError: 'str' object does not support item assignment"
You could try something like this:
def interpret_date(code):
    _year = replace_year(code[0])
    _month = replace_month(code[1])
    _date = replace_date(code[2:4])
    return '-'.join([_date, _month, _year])
df = pd.read_csv(csv_file)
# define interpret_date before using it; store the result in a new column
df['Date'] = df['Batch'].apply(interpret_date)
You will need to write the replace_...() functions to map each input to the right values; each must return a string for '-'.join to work.
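The TypeError in the question arises because Python strings are immutable, so a character cannot be replaced in place; the decoded value has to be built as a new object. One way the mapping could be filled in is sketched below as a single function that returns a `datetime.date`, which also sorts correctly. The names `MONTH_CODES` and `decode_batch` are made up here, the year digit is assumed to belong to the 2010s (based only on the 6B08MK1D11 example), and every batch code other than that one is a hypothetical sample.

```python
from datetime import date

MONTH_CODES = "ABCDEFGHIJKL"  # A = January ... L = December

def decode_batch(code, decade_start=2010):
    """Decode a batch code such as '6B08MK1D11' into a date.

    Assumes: 1st char = last digit of the year, 2nd char = month letter,
    3rd and 4th chars = day of the month.
    """
    year = decade_start + int(code[0])
    month = MONTH_CODES.index(code[1].upper()) + 1
    day = int(code[2:4])
    return date(year, month, day)

# Only the first code comes from the question; the others are invented.
batches = ["6B08MK1D11", "5L24XX0A01", "6A15ZZ9B02"]
decoded = sorted(decode_batch(b) for b in batches)
labels = [d.strftime("%d-%m-%Y") for d in decoded]
print(labels)
```

With a DataFrame, the same function could be used as `df['Date'] = df['Batch'].apply(decode_batch)` followed by `df.sort_values('Date')`.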
