Please consider the dataset below.
The column with dates is 'Date Announced'. The current date format is 'DD-MM-YYYY' and I want to change it to 'MM/DD/YYYY'.
To do so I have written the following pandas code:
df3=pd.read_csv('raw_data_27th_APRonwards.csv',parse_dates=[0], dayfirst=True)
df3['Date Announced'] = pd.to_datetime(df3['Date Announced'])
df3['Date Announced'] = df3['Date Announced'].dt.strftime('%m/%d/%Y')
After executing the above code I didn't get the desired output; please consider the dataset below.
Notice in the output that the date '09/05/2020' is wrong; it should be '05/09/2020'. Day and month are mixed up for this particular date. How do I fix this?
Do it like this:
df3['Date Announced'] = pd.to_datetime(df3['Date Announced'], format='%d-%m-%Y')
Now:
df3['Date Announced'] = df3['Date Announced'].dt.strftime('%m/%d/%Y')
Or pass the parse_dates parameter while reading the CSV file, together with dayfirst=True so pandas knows the strings are day-first, like this:
pd.read_csv('your_file.csv', parse_dates=['Date Announced'], dayfirst=True)
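A quick sanity check of the day-first parsing, using a toy Series rather than the original file:
import pandas as pd
s = pd.Series(['09-05-2020', '27-04-2020'])       # DD-MM-YYYY strings
parsed = pd.to_datetime(s, format='%d-%m-%Y')     # 2020-05-09, 2020-04-27
print(parsed.dt.strftime('%m/%d/%Y'))             # 05/09/2020, 04/27/2020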
On Pandas 1.3.4 and Python 3.9.
So I'm having issues filtering for a partial piece of a string. The "Date" column is in the format MM/DD/YYYY HH:MM:SS AM/PM, with the most recent date on top. If the day is a single digit (for example, November 3rd), it does not have the leading 0, so it is 11/3 instead of 11/03. Basically I want Python to look at the column named "Date" and read part of the string to filter for only today's rows.
This is what the original CSV looks like, and this is what I want to do to the file: look for a specific date (ignoring the time on that date) and implement the equivalent of the =RIGHT() formula. However, this is what I end up with using the following code:
from datetime import date
import pandas as pd
df = pd.read_csv(r'file.csv', dtype=str)
today = date.today()
d1 = today.strftime("%m/%#d/%Y") # to find out what today is
df = pd.DataFrame(df, columns=['New Phone', 'Phone number', 'Date'])
df['New Phone'] = df['Phone number'].str[-10:]
df_today = df['Date'].str.contains(f'{d1}',case=False, na=False)
df_today.to_csv(r'file.csv', index=False)
This line is wrong:
df_today = df['Date'].str.contains(f'{d1}',case=False, na=False)
All you're doing there is creating a mask; essentially, that's just a Pandas Series containing True or False in each row, according to the condition you built the mask with. The spreadsheet gets only FALSE, as you showed, because none of the items in the Date column contain the string that the variable d1 holds...
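For illustration, here is a minimal sketch of what such a mask looks like and how it is normally used for filtering (toy data, not your actual file):
import pandas as pd
toy = pd.DataFrame({'Date': ['11/3/2021 9:05:12 AM', '11/2/2021 4:44:00 PM']})
mask = toy['Date'].str.contains('11/3/2021', na=False)   # Series: [True, False]
print(toy[mask])   # boolean indexing keeps only the matching rows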
Instead, try this:
from datetime import date
import pandas as pd
# Load the CSV file, and change around the columns
df = pd.DataFrame(pd.read_csv(r'file.csv', dtype=str), columns=['New Phone', 'Phone number', 'Date'])
# Take the last ten chars of each phone number
df['New Phone'] = df['Phone number'].str[-10:]
# Convert each date string to a pd.Timestamp, removing the time
df['Date'] = pd.to_datetime(df['Date'].str.split(r'\s+', n=1).str[0])
# Get the phone numbers that are from today
df_today = df[df['Date'] == date.today().strftime('%m/%d/%Y')]
# Write the result to the CSV file
df_today.to_csv(r'file.csv', index=False)
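If you'd rather not rely on pandas parsing the formatted string on the right-hand side of that comparison, an equivalent check (same column names as above) is to compare the date parts directly:
df_today = df[df['Date'].dt.date == date.today()]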
I am trying to convert a string column to date format.
The Date column contains data like the following, but it is of string datatype:
20191130
20191231
After converting string to date, the dates should display as
2019-11-30
2019-12-31
I tried this approach but the script returned an error:
df = spark.sql('select * from tablename')
df2 = df.withColumn('Date', expr("cast(as_of_date,'yyyyMMdd) as date"))
I also tried this script and it works; however, it displays both date and time, which is not what I want:
df2 = df.withColumn("Date",expr("cast(unix_timestamp(as_of_date ,'yyyyMMdd') as date)")).show()
Try using to_date?
df2 = df.withColumn('Date', to_date(col('as_of_date'), 'yyyyMMdd'))
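For completeness, a runnable sketch with the imports that snippet assumes (table and column names as in the question):
from pyspark.sql.functions import col, to_date

df = spark.sql('select * from tablename')
df2 = df.withColumn('Date', to_date(col('as_of_date'), 'yyyyMMdd'))
df2.select('as_of_date', 'Date').show()
# 'Date' is now a DateType column, e.g. 2019-11-30, with no time component.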
I have a dataframe like the one shown below:
df = pd.DataFrame({'d1': ['2/26/2019 03:31', '10241-2-19 0:0:0', '31/03/2016 16:00'],
                   'd2': ['2/29/2019 05:21', '10241-2-29 0:0:0', '03/04/2016 12:00']})
As you can see there are some invalid date values, meaning records with a year like 10241.
On the other hand, valid dates can be in either format, month-day-year or day-month-year, each with a time component.
When I try the code below, I get an error message saying the date is out of range.
df['d1'] = pd.to_datetime(df.d1)
df['d1'].dt.strftime('%m/%d/%Y hh:ss')
Is there any way to fix this?
I expect my output to be as shown below.
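As a sketch of one common approach (assuming rows with invalid dates may become NaT), errors='coerce' gets past the out-of-range error by turning unparseable values into NaT instead of raising; it does not by itself resolve the day-first/month-first ambiguity:
df['d1'] = pd.to_datetime(df['d1'], errors='coerce')   # '10241-2-19 0:0:0' becomes NaT
df['d1'].dt.strftime('%m/%d/%Y %H:%M')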
So I have a CSV file of users which is in the format:
"Lastname, Firstname account_last_used_date"
I've tried the dateutil parser; however, it says the list is an invalid string. I need to keep the names and the dates together. I've also tried datetime, but I'm having issues with "datetime not defined". I'm very new to Python, so forgive me if I've missed an easy solution.
import re
from datetime import date
with open("5cUserReport.csv", "r") as dilly:
    li = [x.replace("\n", "") for x in dilly]
li2 = [x.replace(",", "") for x in li]
for x in li2:
    match = re.search(r"\d{2}-\d{2}-\d{4}", x)
    date = datetime.strptime(match.group(), "%d-%m-%Y").x()
    print(date)
The end goal is that I need to check whether the date the user last logged in is more than 4 months ago. Honestly, any help here is massively welcome!
The CSV format is:
am_testuser1 02/12/2017 08:42:48
am_testuser11 13/10/2017 17:44:16
am_testuser20 27/10/2017 16:31:07
am_testuser5 23/08/2017 09:42:41
am_testuser50 21/10/2017 15:38:12
Edit: Edited the answer based on the given csv
You could do something like this with pandas:
import pandas as pd
colnames = ['Lastname, Firstname', 'Date', 'Time']
df = pd.read_csv('5cUserReport.csv', delim_whitespace=True, skiprows=1, names=colnames,
                 parse_dates={'account_last_used_date': [1, 2]}, dayfirst=True)
more_than_4_months_ago = df[df['account_last_used_date'] < (pd.to_datetime('now') - pd.DateOffset(months=4))]
print(more_than_4_months_ago)
The DataFrame more_than_4_months_ago will give you the subset of all records whose account_last_used_date is more than 4 months ago.
This is based on the given format, although I doubt that this is your actual format, since the given usernames don't really match the 'Lastname, Firstname' format:
Lastname, Firstname account_last_used_date
am_testuser1 02/12/2017 08:42:48
am_testuser11 13/10/2018 17:44:16
am_testuser20 27/10/2017 16:31:07
am_testuser5 23/08/2018 09:42:41
am_testuser50 21/10/2017 15:38:12
(I edited 2 lines to 2018, so that the test actually shows that it works).
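A small follow-up sketch making the cutoff explicit (same column name as above); pd.Timestamp.now() is a local-time alternative to pd.to_datetime('now'), which newer pandas versions interpret as UTC:
cutoff = pd.Timestamp.now() - pd.DateOffset(months=4)
stale = df[df['account_last_used_date'] < cutoff]
print(cutoff)
print(stale)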
I am trying to manipulate a CSV file based on a certain date in a certain column.
I am using pandas (total noob) for that and was pretty successful until I got to dates.
The CSV looks something like this (with more columns and rows, of course):
Circuit   Status        Effective Date
XXXX001   Operational   31-DEC-2007
I tried DataFrame.query (which I use for everything else) without success.
I tried DataFrame.loc (which worked for everything else) without success.
How can I get all rows that are older or newer than a given date? If I have other conditions to filter the dataframe, how do I combine them with the date filter?
Here's my "raw" code:
import pandas as pd
# parse_dates = ['Effective Date']
# dtypes = {'Effective Date': 'str'}
df = pd.read_csv("example.csv", dtype=object)
# , parse_dates=parse_dates, infer_datetime_format=True
# tried a lot of suggestions found on SO
cols = df.columns
cols = cols.map(lambda x: x.replace(' ', '_'))
df.columns = cols
status1 = 'Suppressed'
status2 = 'Order Aborted'
pool = '2'
region = 'EU'
date1 = '31-DEC-2017'
filt_df = df.query('Status != @status1 and Status != @status2 and Pool == @pool and Region_A == @region')
filt_df.reset_index(drop=True, inplace=True)
filt_df.to_csv('filtered.csv')
# this is working pretty well
supp_df = df.query('Status == @status1 and Effective_Date < @date1')
supp_df.reset_index(drop=True, inplace=True)
supp_df.to_csv('supp.csv')
# this is what is not working at all
I tried many approaches, but I was not able to put it together. This is just one of many approaches I tried, so I know it is perhaps completely wrong, as no date parsing is used.
supp.csv gets saved, but the dates in it are all over the place, so they don't match the "logic" in this code.
Thanks for any help!
Make sure you convert your date column to datetime and then filter/slice on it.
df['Effective Date'] = pd.to_datetime(df['Effective Date'])
df[df['Effective Date'] < '2017-12-31']
# This returns all the rows with dates before the 31st of December, 2017.
# You can also use query().
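To combine the date condition with the other filters from the question, a sketch along these lines should work (assuming the columns were renamed with underscores as in the question, and the dates look like 31-DEC-2017):
df['Effective_Date'] = pd.to_datetime(df['Effective_Date'], format='%d-%b-%Y')
date1 = pd.Timestamp('2017-12-31')
supp_df = df.query('Status == @status1 and Effective_Date < @date1')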