I have a PySpark DataFrame and want to convert one of its columns (a timestamp) to a Jalali date.
My data frame:
Name
CreationDate
Sara
2022-01-02 10:49:43
Mina
2021-01-02 12:30:21
I want the following result:
Name
CreationDate
Sara
1400-10-12 10:49:43
Mina
1399-10-13 12:30:21
I tried the following code, but it does not work, and I cannot find a way to convert the date and time:
df_etl_test_piko1.select(jdatetime.datetime.col('creationdate').strftime("%a, %d %b %Y %H:%M:%S"))
You need to define UDF like this:
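import jdatetime
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

@F.udf(StringType())
def to_jalali(ts):
    # ts is a Python datetime when the column is of timestamp type
    jts = jdatetime.datetime.fromgregorian(datetime=ts)
    return jts.strftime("%a, %d %b %Y %H:%M:%S")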
Then applying to your example:
df = spark.createDataFrame([("Sara", "2022-01-02 10:49:43"), ("Mina", "2021-01-02 12:30:21")], ["Name", "CreationDate"])
# cast column CreationDate to timestamp type if not already done
df = df.withColumn("CreationDate", F.to_timestamp("CreationDate"))
df = df.withColumn("CreationDate", to_jalali("CreationDate"))
df.show(truncate=False)
#+----+-------------------------+
#|Name|CreationDate |
#+----+-------------------------+
#|Sara|Sun, 12 Dey 1400 10:49:43|
#|Mina|Sat, 13 Dey 1399 12:30:21|
#+----+-------------------------+
I am trying to change how the month and year are displayed.
I have a dataframe as below
Date
2020-01-31
2020-04-30
2021-05-05
and I want to convert it to a month-year format.
The output that I am expecting is
Date
Jan-20
Apr-20
May-21
I tried to do it with datetime but it doesn't work.
pd.to_datetime(pd.Series(df['Date'),format='%mmm-%yy')
Use .dt.strftime() to change the display format. %b-%y is the format string for Mmm-YY:
df.Date = pd.to_datetime(df.Date).dt.strftime('%b-%y')
# Date
# 0 Jan-20
# 1 Apr-20
# 2 May-21
Or if Date is the index:
df.index = pd.to_datetime(df.index).strftime('%b-%y')
import pandas as pd
date_sr = pd.to_datetime(pd.Series("2020-12-08"))
change_format = date_sr.dt.strftime('%b-%y')
print(change_format)
Reference: https://docs.python.org/3/library/datetime.html
The format changes from '%Y-%m-%d' to '%b-%y':
import datetime
df['Date'] = df['Date'].apply(lambda x: datetime.datetime.strptime(x,'%Y-%m-%d').strftime('%b-%y'))
# reference https://docs.python.org/3/library/datetime.html
# %Y-%m-%d changed to ('%b-%y')
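For comparison, the same directives work with the standard library alone. This is a minimal sketch, assuming the dates are ISO-formatted strings as in the question and that the default (English) locale is in effect for month abbreviations; the names `dates` and `formatted` are only illustrative.

```python
from datetime import date

# Sample values from the question; `dates` is a hypothetical name.
dates = ["2020-01-31", "2020-04-30", "2021-05-05"]

# %b = abbreviated month name, %y = two-digit year
formatted = [date.fromisoformat(d).strftime("%b-%y") for d in dates]
print(formatted)
```

Swapping %y for %Y would give the four-digit year, and %B would give the full month name.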
I have a large dataset with a datetime variable "CHECKIN_DATE_TIME" and would like to create a new variable that is just the date sans the time. The "CHECKIN_DATE_TIME" is formatted as such 2020-02-01 11:13:17.000. I want the new variable to be formatted like 2020-02-01.
I'm referencing the following for help https://www.programiz.com/python-programming/datetime/strptime but when I write my code, I'm getting attribute errors: "Traceback(most recent call last)" and "DataFrame' object has no attribute 'strptime'"
import datetime
NOTES_TAT=NOTES_TAT.strptime(CHECKIN_DATE_TIME,"%d %B, %Y")
You are using a pandas DataFrame. Try:
NOTES_TAT['CHECKIN_DATE_TIME'].dt.strftime('%d %B, %Y')
You can access the datetime methods via the .dt Series accessor. To get just the date, use the .date property at the end.
Example:
import pandas as pd
# Build a sample DataFrame
df = pd.DataFrame({'checkin': '2020-02-01 11:13:17.000'}, index=[0])
df['checkin'] = pd.to_datetime(df['checkin'])
# Create the date column using the `date` property.
df['date'] = df['checkin'].dt.date
# For a formatted date:
df['date'] = df['checkin'].dt.strftime('%d %B, %Y')
Output 1:
checkin date
0 2020-02-01 11:13:17 2020-02-01
Output 2:
checkin date
0 2020-02-01 11:13:17 01 February, 2020
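If the goal is the 2020-02-01 layout from the question, it may help to compare three ways of dropping the time. This is a sketch, assuming the column has already been parsed with pd.to_datetime; the column names `date_obj`, `date_norm` and `date_str` are illustrative, not from the original post.

```python
import pandas as pd

df = pd.DataFrame({"checkin": ["2020-02-01 11:13:17.000"]})
df["checkin"] = pd.to_datetime(df["checkin"])

# Python date objects (the column is typically stored with object dtype)
df["date_obj"] = df["checkin"].dt.date
# Still a datetime column, with the time set to midnight
df["date_norm"] = df["checkin"].dt.normalize()
# Plain strings in YYYY-MM-DD form
df["date_str"] = df["checkin"].dt.strftime("%Y-%m-%d")

print(df.dtypes)
print(df)
```

The normalized column is probably the most convenient if further date arithmetic is needed, while the string column is only suitable for display or export.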
I have a file where the date and time are in mixed formats as per below:
Ref_ID Date_Time
5.645217e 2020-12-02 16:23:15
5.587422e 2019-02-25 18:33:24
What I'm trying to do is convert the dates into a standard format so that I can further analyse my dataset.
Expected Outcome:
Ref_ID Date_Time
5.645217e 2020-02-12 16:23:15
5.587422e 2019-02-25 18:33:24
So far I've tried a few things like Pandas to_datetime conversion and converting the date using strptime but none has worked so far.
# Did not work
data["Date_Time"] = pd.to_datetime(data["Date_Time"], errors="coerce")
# Also Did not work
data["Date_Time"] = data["Date_Time"].apply(lambda x: datetime.datetime.strptime(x, '%m/%d/%y'))
I've also searched this site for a solution but haven't found one yet.
You could try using str.split to extract the day and month and then apply some boolean testing.
This may look confusing with all the variables, but all we are doing is creating new Series and DataFrames to manipulate the day and month of your original date-time column.
import numpy as np
import pandas as pd

# split on whitespace so the date and the time land in separate columns
s = df['Date_Time'].str.split(r'\s', expand=True)
# split the date into its own frame of integer fields
m = s[0].str.split('-', expand=True).astype(int)
# use conditional logic to figure out which column is the month and which is the day
m['possible_month'] = np.where(m[1].ge(12), m[2], m[1])
m['possible_day'] = np.where(m[1].ge(12), m[1], m[2])
# concatenate this back into the first split to re-create a proper datetime
s[0] = m[0].astype(str).str.cat([m['possible_month'].astype(str),
                                 m['possible_day'].astype(str)], sep='-')
df['fixed_date'] = pd.to_datetime(s[0].str.cat(s[1].astype(str), sep=' '),
                                  format='%Y-%m-%d %H:%M:%S')
print(df)
Ref_ID Date_Time fixed_date
0 5.645217e 2020-12-02 16:23:15 2020-02-12 16:23:15
1 5.587422e 2019-02-25 18:33:24 2019-02-25 18:33:24
print(df.dtypes)
Ref_ID object
Date_Time object
fixed_date datetime64[ns]
dtype: object
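The same swap rule can also be written row by row with the standard library. This is only a sketch: `fix_swapped` is a hypothetical helper, and it mirrors the `>= 12` test from the np.where calls, which means a middle field of 12 is treated as a day (as in the question's expected output). Dates where both fields are below 12 are genuinely ambiguous and cannot be detected this way.

```python
from datetime import datetime

def fix_swapped(value):
    """Parse 'YYYY-a-b HH:MM:SS', treating `a` as the day when a >= 12."""
    date_part, time_part = value.split(" ")
    year, a, b = (int(x) for x in date_part.split("-"))
    # Assumed rule (same as the np.where test): a >= 12 means `a` is the day.
    month, day = (b, a) if a >= 12 else (a, b)
    hour, minute, second = (int(x) for x in time_part.split(":"))
    return datetime(year, month, day, hour, minute, second)

print(fix_swapped("2020-12-02 16:23:15"))  # 2020-02-12 16:23:15
print(fix_swapped("2019-02-25 18:33:24"))  # 2019-02-25 18:33:24
```

On a DataFrame this could be applied with something like df['Date_Time'].apply(fix_swapped), at the cost of being slower than the vectorised version.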
I'm trying to subtract a day from the date 06-30-2019 to make it 06-29-2019, but can't figure out how to achieve that.
I've tried with:
import datetime
date = "06-30-2019"
date = datetime.datetime.strptime(date,'%m-%d-%Y').strftime('%m-%d-%Y')
print(date)
It just gives me back the date I used above.
How can I subtract a day from a date in the above format?
try this
import datetime
date = "06/30/19"
date = datetime.datetime.strptime(date, "%m/%d/%y")
NewDate = date + datetime.timedelta(days=-1)
print(NewDate) # 2019-06-29 00:00:00
Your code:
date = "06-30-2019"
date = datetime.datetime.strptime(date,'%m-%d-%Y').strftime('%m-%d-%Y')
Check type of date variable.
type(date)
Out[]: str
It is a string. To perform subtraction you must first convert it to a date type. You can use pd.to_datetime()
# Import packages
import pandas as pd
from datetime import timedelta
# input date
date = "06-30-2019"
# Convert it to a pandas Timestamp
date = pd.to_datetime(date)
print(date)
# Subtracting days
number_of_days = 1
new_date = date - timedelta(number_of_days)
print(new_date)
output:
2019-06-29 00:00:00
If you want to drop the time component you can use:
str(new_date.date())
Out[]: '2019-06-29'
use timedelta
import datetime
date = datetime.datetime.strptime("06/30/19" ,"%m/%d/%y")
print( date - datetime.timedelta(days=1))
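To get the result back in the original MM-DD-YYYY layout, the datetime can be formatted again after the subtraction. This is a minimal sketch using only the standard library; the variable names are illustrative.

```python
from datetime import datetime, timedelta

fmt = "%m-%d-%Y"

# Parse the string, step back one day, then format with the same pattern.
parsed = datetime.strptime("06-30-2019", fmt)
previous_day = parsed - timedelta(days=1)
result = previous_day.strftime(fmt)

print(result)  # 06-29-2019
```

Keeping the value as a datetime until the last step avoids the problem in the question, where strftime was called before any arithmetic could happen.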
I have a timestamp column in my dataframe which is originally a str type. Some sample values:
'6/13/2015 6:45:58 AM'
'6/13/2015 7:00:37 PM'
I use the following code to convert these values into datetime with 24H format:
df['timestampx'] = pd.to_datetime(df['timestamp'], format='%m/%d/%Y %H:%M:%S %p')
And, I obtain this result:
2015-06-13 06:45:58
2015-06-13 07:00:37
That means the dates are NOT converted to 24H format, and I am also losing the AM/PM info. Any help?
You're reading it in as a 24 hour time, but really the current format isn't 24 hour time, it's 12 hour time. Read it in as 12 hour with the suffix (AM/PM), then you'll be OK to output in 24 hour time later if need be.
df = pd.DataFrame(['6/13/2015 6:45:58 AM','6/13/2015 7:00:37 PM'], columns = ['timestamp'])
df['timestampx'] = pd.to_datetime(df['timestamp'], format='%m/%d/%Y %I:%M:%S %p')
print(df)
timestamp timestampx
0 6/13/2015 6:45:58 AM 2015-06-13 06:45:58
1 6/13/2015 7:00:37 PM 2015-06-13 19:00:37
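The difference between the two directives can be seen with the standard library's strptime, where %p only affects the parsed hour when the hour itself is read with %I. This is a small sketch with illustrative variable names, assuming the default English AM/PM markers.

```python
from datetime import datetime

raw = "6/13/2015 7:00:37 PM"

# %I reads a 12-hour clock value, so %p can shift it into the afternoon.
as_12h = datetime.strptime(raw, "%m/%d/%Y %I:%M:%S %p")
# %H reads the hour as already being on a 24-hour clock; %p is matched but ignored.
as_24h_directive = datetime.strptime(raw, "%m/%d/%Y %H:%M:%S %p")

print(as_12h.hour)            # 19
print(as_24h_directive.hour)  # 7
print(as_12h.strftime("%Y-%m-%d %H:%M:%S"))
```

Once the value is parsed correctly, %H in strftime gives the 24-hour output.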