Converting strings to dates in PySpark - Python

I have a string in the format "MMM-YY" (i.e. "Jun-22", "Jan-22", etc.).
I want to convert it into a date set to the 1st day of the month, in the following format:
Jan-22 --> 01-Jan-22
Feb-21 --> 01-Feb-21
I have tried a few ways but couldn't get to a solution.
Can someone please advise on the quickest and most efficient way of doing this in a PySpark DataFrame?
The code can be PySpark or plain Python.

Thanks for the help. I was able to add "01-" at the beginning of the date string and convert it into a date.
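The approach described above (prefix "01-", then parse) can be sketched in plain Python as below. The function names are hypothetical, and `%b` assumes English month abbreviations (the default C locale). In a PySpark DataFrame you would normally express the same idea with built-in column functions rather than a Python UDF, for example `to_date(concat(lit("01-"), col), "dd-MMM-yy")` followed by `date_format(..., "dd-MMM-yy")` from `pyspark.sql.functions`. That Spark variant is not shown or tested here.

```python
from datetime import date, datetime


def month_start(mmm_yy: str) -> date:
    """Turn an 'MMM-YY' string such as 'Jan-22' into the first day of that month."""
    # Prefix the day, then parse as day-month abbreviation-2-digit year.
    # strptime matches %b case-insensitively, so 'jun-22' is accepted too.
    return datetime.strptime("01-" + mmm_yy, "%d-%b-%y").date()


def format_month_start(mmm_yy: str) -> str:
    """Render the first-of-month date in the requested 'dd-Mon-yy' form."""
    return month_start(mmm_yy).strftime("%d-%b-%y")


for raw in ["Jan-22", "Feb-21", "jun-22"]:
    print(raw, "-->", format_month_start(raw))
```

Keeping the parsed value as a `date` (rather than only the formatted string) lets you sort and compare months correctly; format it only for display.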

Related

How can I find dates with incorrect syntax and fix them?

I am new to Python. I have a dataset that I converted to a DataFrame, and all my dates are now stored as objects (strings). I need to convert them to dates in order to find the patients' ages. My DataFrame's dimensions are 3400 × 14. Some of the date values have incorrect syntax, but I cannot find them. Is there a way to find them?
Cdf['Birthday'] = Cdf['Birthday'].astype('datetime64[ns]')
I am using this to convert them, and I need the date without the time. I get this error:
"DateParseError: Invalid date specified (25/15)"
Thanks in advance for your help.
You can use pd.to_datetime with errors='coerce':
Cdf['date'] = pd.to_datetime(Cdf['Birthday'], errors='coerce')
Every value that cannot be parsed is converted to NaT, so you can then check which rows of the new column are NaT. Use
Cdf.loc[Cdf['date'].isnull(), 'Birthday']
to list the original strings that are invalid.
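A minimal end-to-end sketch of that answer on a small made-up DataFrame. It assumes the valid birthdays are day-first strings such as "17/05/1990" (the real format is not stated in the question) and passes that format explicitly, so anything else becomes NaT. It then lists the offending strings and computes an age in whole years against a fixed, hypothetical reference date. If you need plain dates without a time component, `Cdf["date"].dt.date` gives them.

```python
import pandas as pd

# Hypothetical sample data; the real column holds ~3400 strings.
Cdf = pd.DataFrame({"Birthday": ["17/05/1990", "25/15", "03/11/1985", "31/02/2000"]})

# Assumption: valid birthdays are day-first, e.g. "17/05/1990".
# Strings that do not match, or are not real calendar dates, become NaT.
Cdf["date"] = pd.to_datetime(Cdf["Birthday"], format="%d/%m/%Y", errors="coerce")

# Original strings that failed to parse, so they can be inspected and fixed.
invalid = Cdf.loc[Cdf["date"].isna(), "Birthday"]
print(invalid.tolist())

# Age in whole years at a fixed (hypothetical) reference date.
ref = pd.Timestamp("2024-01-01")
dob = Cdf["date"]
# True where this year's birthday has not happened yet on the reference date.
before_birthday = (dob.dt.month > ref.month) | (
    (dob.dt.month == ref.month) & (dob.dt.day > ref.day)
)
# Rows with NaT birthdays end up with a missing (NaN) age.
Cdf["age"] = ref.year - dob.dt.year - before_birthday.fillna(False).astype(int)
print(Cdf)
```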

Which date format looks like this: 46:53.4?

I am processing a dataset with a date column in it, but the date format is unfamiliar to me:
date
59:06.4
42:42.9
07:18.0
......
I have never seen this format before. Could anyone tell me what this format is, and which functions I should use to process it in Python?
I think I know: it is a full date + time value that is only being displayed as minutes:seconds.tenths. When I read the file in Python, it is automatically converted to a datetime.
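If only the text values are available (for example, after exporting to CSV) and they do not arrive as datetimes, one hedged option is to parse them as minutes:seconds.fraction with `datetime.strptime`. This sketch assumes that reading. The date and hour are not present in such strings, so they cannot be recovered; the hour simply defaults to 0. The helper name `parse_mm_ss` is hypothetical.

```python
from datetime import datetime, time


def parse_mm_ss(value: str) -> time:
    """Parse an 'mm:ss.f' string such as '59:06.4' into a datetime.time.

    Only minutes, seconds and the fractional second are present, so the
    hour defaults to 0 and no date information is available.
    """
    # %f accepts 1 to 6 digits and pads on the right: '4' -> 400000 microseconds.
    return datetime.strptime(value, "%M:%S.%f").time()


for raw in ["59:06.4", "42:42.9", "07:18.0"]:
    print(raw, "->", parse_mm_ss(raw))
```

If the original spreadsheet is available, reading it directly (so the full datetime stored in each cell is kept) is preferable to parsing the displayed text.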

Unable to convert index to date format

I have data with an int64 index and values like "01/11/2018" in the index. It was imported from a CSV. I am unable to convert it to the "01-11-2018" format. How do I do this? I get this error message:
'time data 0 does not match format '%Y' (match)'
I got the data from the following website:
https://www.nasdaq.com/symbol/spy/historical
and you can find a "Download this file in Excel Format" icon at the bottom.
import pandas as pd
spyderdat.index = pd.to_datetime(spyderdat.index, format='%Y')
spyderdat.head()
How do I format this correctly?
Thanks a lot.
Your format string must match exactly:
import pandas as pd
# The format must describe the whole string: day/month/4-digit year
spyderdat.index = pd.to_datetime(spyderdat.index, format='%d/%m/%Y')
spyderdat.head()
Example without the spyderdat DataFrame:
import datetime
date = "1/11/2018"
print(datetime.datetime.strptime(date,"%d/%m/%Y"))
Output:
2018-11-01 00:00:00
You can then format this datetime however you like with strftime (see the strftime() and strptime() format codes in the Python datetime documentation), or simply keep it stored as a datetime.
Assuming your input is a string, simply converting the / to - won't fix the issue.
The real problem is that you've told to_datetime to expect the input string to be only a 4-digit year, but you've handed it an entire date, including the day and month.
If you meant to use only the year portion, you should first extract the year manually with something like split.
If you meant to use the full date as a value, you'll need to change your format to something like %d/%m/%Y. (I can't tell from these values whether your input is day-first or month-first.)
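A sketch combining the points above, on made-up data. One hedged reading of the error: "time data 0" suggests pandas was handed the integer 0, which is what you would see if the index is the default int64 index (0, 1, 2, ...) and the date strings actually live in a column. The sketch assumes such a column, named "date" (hypothetical), and day-first dates. It parses the column with a format that covers the whole string, moves it into the index with `set_index`, and only formats back to "01-11-2018"-style strings for display, since a DatetimeIndex stores dates rather than text.

```python
import pandas as pd

# Hypothetical stand-in for spyderdat: a default int64 index (0, 1, ...)
# with the date strings in a column named "date" (an assumption).
spyderdat = pd.DataFrame({"date": ["01/11/2018", "02/11/2018"], "value": [1.0, 2.0]})

# The format must describe the whole string. Day-first (%d/%m/%Y) is assumed;
# use %m/%d/%Y instead if the data is month-first.
spyderdat["date"] = pd.to_datetime(spyderdat["date"], format="%d/%m/%Y")
spyderdat = spyderdat.set_index("date")

print(spyderdat.index[0])

# A DatetimeIndex holds dates, not text; to show them as "01-11-2018",
# format them back into strings.
print(list(spyderdat.index.strftime("%d-%m-%Y")))
```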
The easiest way is:
import datetime
datetime.datetime.strptime("01/11/2018", '%d/%m/%Y').strftime('%d-%m-%Y')  # '01-11-2018'

In python, how do I convert my string date column to date format using pandas?

I am unable to convert the date column in my DataFrame from a string to a date format. I've tried a lambda-based date conversion, as well as a few other methods, and I can't seem to make it work. It may be because my 'variable' column does not seem to behave like the rest of the columns (I think it might be "indexed"?). Any help would be much appreciated!
Use to_datetime:
Hotsprings2['variable'] = pd.to_datetime(Hotsprings2['variable'], format='%Y-%m-%d')
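One common way to end up with a column literally named "variable" is `DataFrame.melt`, which uses that as its default `var_name`. Assuming that is what happened here (the question does not say), the sketch below builds a small made-up wide table, melts it, and converts the resulting "variable" column. It is an ordinary string column, not an index, so the `to_datetime` call above applies to it directly.

```python
import pandas as pd

# Hypothetical wide table: one column per date, as is common before melt().
wide = pd.DataFrame({"site": ["A", "B"], "2020-01-01": [10, 12], "2020-01-02": [11, 13]})

# melt() turns the date column headers into a string column named "variable".
Hotsprings2 = wide.melt(id_vars="site")

# Convert that column in place; the format matches "YYYY-MM-DD" strings.
Hotsprings2["variable"] = pd.to_datetime(Hotsprings2["variable"], format="%Y-%m-%d")
print(Hotsprings2.dtypes)
```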

Changing date format in python

I have a pandas dataframe with a column containing a date; the format of the original string is YYYY/DD/MM HH:MM:SS.
I am trying to convert the string into a datetime format, by using
df['Date']=pd.to_datetime(df['Data'], errors='coerce')
but when I plot it, I can see that it doesn't recognize the format correctly.
Is there an option to tell pandas which format to use when reading the column?
I have seen the format parameter of to_datetime, but I can't get it to work.
Thanks a lot for your help!
Try this:
df['Date'] = pd.to_datetime(df['Data'], format='%Y/%d/%m %H:%M:%S')
It looks like you're using a non-standard date format; the ISO 8601 order is YYYY-MM-DD. You can also parse it with strptime() and an explicit format:
import time
time.strptime('2016/15/07', '%Y/%d/%m')
If you need to get it to a string after that use time.strftime().
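A small sketch of the first answer on made-up values, assuming the column holds strings in the question's YYYY/DD/MM HH:MM:SS layout. With an explicit format, day and month are no longer guessed, and `errors="coerce"` turns anything that does not match into NaT instead of raising. Formatting back to an ISO-style string is only needed for display.

```python
import pandas as pd

# Hypothetical sample strings in YYYY/DD/MM HH:MM:SS order, plus one bad value.
df = pd.DataFrame({"Data": ["2016/15/07 13:45:00", "2016/01/08 08:00:30", "not a date"]})

# The explicit format stops pandas from guessing which field is day vs. month.
df["Date"] = pd.to_datetime(df["Data"], format="%Y/%d/%m %H:%M:%S", errors="coerce")
print(df["Date"].tolist())

# Optional: ISO-style strings for display (NaT rows stay missing).
print(df["Date"].dt.strftime("%Y-%m-%d %H:%M:%S").tolist())
```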
