Currently I'm capturing dates from a CSV file, but the date field can come in any format.
I want to convert these dates to the %Y-%m-%d format only, but strptime does not work.
For example:
CSV date ---> Transformation
2020/06/23 06:00 ---> 2020-06-23
23/04/2020 05:00 ---> 2020-04-23
11/4/2020 10:00 ---> 2020-04-11
2022/1/24 11:00 ---> 2022-01-24
Code:
fecha_csv = row[7]
fecha_csv = datetime.strptime(fecha_csv, '%Y-%m-%d %H:%M:%S')
fecha_csv = fecha_csv.date()
This assumes the date formats given in your example; other formats may need additional handling.
The problem is most likely that the string is never parsed into a proper datetime object, so it cannot be converted to the date format you want.
There are a couple of ways to reduce a date with a time to just the date. One is plain string manipulation, if the date is always formatted like the examples shown. The other is to convert it to a datetime object, as in the following.
from datetime import datetime

fecha_csv = row[7]
if len(fecha_csv.split('/')[0]) > 2:  # year is first
    fecha_csv = datetime.strptime(fecha_csv, '%Y/%m/%d %H:%M').strftime("%Y-%m-%d")
else:  # year is last
    fecha_csv = datetime.strptime(fecha_csv, '%d/%m/%Y %H:%M').strftime("%Y-%m-%d")
A problem with your current code is that its format string expects dates like 2020-06-23 06:00:00, whereas the data looks like 2020/06/23 06:00.
Similarly, you could use a date parser:
from dateutil.parser import parse
fecha_csv = row[7]
# parse() reads ambiguous dates such as 11/4/2020 month-first unless dayfirst=True is passed
csv_date = parse(fecha_csv).date()
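If you would rather stay with the standard library, one option is to try a short list of known formats in turn. This is only a sketch: the function name to_iso_date and the KNOWN_FORMATS list are made up for illustration, and the list only covers the two layouts shown in the question.

```python
from datetime import datetime

# Hypothetical list of layouts; extend it with whatever your CSV actually contains.
KNOWN_FORMATS = ("%Y/%m/%d %H:%M", "%d/%m/%Y %H:%M")

def to_iso_date(text):
    """Return text as a YYYY-MM-DD string, trying each known format in order."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # this layout did not match, try the next one
    raise ValueError(f"unrecognised date format: {text!r}")

for sample in ("2020/06/23 06:00", "23/04/2020 05:00", "11/4/2020 10:00", "2022/1/24 11:00"):
    print(sample, "--->", to_iso_date(sample))
```

Because %Y requires four digits, the year-first format fails cleanly on day-first input and the loop falls through to the second format.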
I have a column whose strings contain the day, month and year, and also the time. I need only the day, month and year.
There is no space in the string.
The string is in this format:
date
Tuesday,August22022-03:30PMWIB
Monday,July252022-09:33PMWIB
Friday,January82022-09:33PMWIB
and I expect to get:
date
2022-08-02
2022-07-25
2022-01-08
How can I extract only the day, month and year and change the format to yyyy-mm-dd in Python?
thanks in advance
Use strptime from the datetime library:
from datetime import datetime
var = "Tuesday,August22022-03:30PMWIB"
date = var.split('-')[0]
formatted_date = datetime.strptime(date, "%A,%B%d%Y")
print(formatted_date.date())  # prints the date in yyyy-mm-dd form
Output:
2022-08-02
You can use the standard datetime library
from datetime import datetime
dates = [
    "Tuesday,August22022-03:30PMWIB",
    "Monday,July252022-09:33PMWIB",
    "Friday,January82022-09:33PMWIB"
]
for text in dates:
    text = text.split(",")[1].split("-")[0]
    dt = datetime.strptime(text, '%B%d%Y')
    print(dt.strftime("%Y-%m-%d"))
An alternative/shorter way would be like this (if you want the other date parts):
for text in dates:
    dt = datetime.strptime(text[:-3], '%A,%B%d%Y-%I:%M%p')
    print(dt.strftime("%Y-%m-%d"))
The time zone part is tricky: %Z only works for UTC, GMT and the local time zone, which is why the trailing WIB is cut off above.
You can read more about the format codes here.
strptime() only accepts certain values for %Z:
any value in time.tzname for your machine’s locale
the hard-coded values UTC and GMT
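If the time zone should be kept rather than thrown away, one possible workaround is to strip the abbreviation yourself and attach a fixed offset afterwards. The sketch below assumes that WIB means Western Indonesia Time (UTC+7) and that the abbreviation is always the last three characters; the names TZ_OFFSETS and parse_with_abbreviation are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: WIB is Western Indonesia Time, UTC+7. Add other abbreviations as needed.
TZ_OFFSETS = {"WIB": timezone(timedelta(hours=7), "WIB")}

def parse_with_abbreviation(text):
    """Parse strings like 'Tuesday,August22022-03:30PMWIB' into an aware datetime."""
    abbreviation = text[-3:]           # assumed to be a three-letter suffix
    tzinfo = TZ_OFFSETS[abbreviation]  # raises KeyError for unknown abbreviations
    naive = datetime.strptime(text[:-3], "%A,%B%d%Y-%I:%M%p")
    return naive.replace(tzinfo=tzinfo)

dt = parse_with_abbreviation("Tuesday,August22022-03:30PMWIB")
print(dt.isoformat())           # 2022-08-02T15:30:00+07:00
print(dt.strftime("%Y-%m-%d"))  # 2022-08-02
```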
You can convert the string to a datetime object, then format it back into a string.
from datetime import datetime
datetime_object = datetime.strptime('Tuesday,August22022-03:30PM', '%A,%B%d%Y-%I:%M%p')
s = datetime_object.strftime("%Y-%m-%d")
print(s)
You can use the datetime library to parse the date and print it in your format. In your examples the day might not be zero-padded, so I pad it first and then parse the date.
import datetime
date = 'Tuesday,August22022-03:30PMWIB'
date = date.split('-')[0]
# a non-digit six characters from the end means the day has only one digit
if not date[-6].isnumeric():
    date = date[:-5] + "0" + date[-5:]
newdate = datetime.datetime.strptime(date, '%A,%B%d%Y').strftime('%Y-%m-%d')
print(newdate)
# prints 2022-08-02
I have a datetime in yyyy-MM-dd'T'HH:mm:ssZ format, e.g. 2022-04-30T07:00:00+0000, and I want to convert it to %Y-%m-%d format, i.e. 2022-04-30, in Python. Can anyone tell me how to do it?
I have tried date = datetime.strptime(end_time, "%Y-%m-%d") and
date = datetime.strptime(end_time, "%Y-%m-%d").date(), but neither works.
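Both attempts fail because the format string has to describe the whole input, including the T separator, the time and the UTC offset. A minimal sketch, assuming the input always looks like the example (the variable end_time is taken from the question, the other names are illustrative):

```python
from datetime import datetime

end_time = "2022-04-30T07:00:00+0000"

# %z matches a UTC offset such as +0000; the literal T is matched as-is.
parsed = datetime.strptime(end_time, "%Y-%m-%dT%H:%M:%S%z")
date_only = parsed.date()

print(date_only.strftime("%Y-%m-%d"))  # 2022-04-30
```

Note that .date() simply drops the time and offset; it does not convert to another time zone first.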
I have a requirement where the date returned may or may not include a time. If a date is returned like 2021-03-16, I would like to append 00:00:00 so that it matches the format "%Y-%m-%d %H:%M:%S".
What is the best way to do this?
import datetime

s = '2021-03-16'
s1 = '2021-03-16 23:12:34'
for value in (s, s1):
    # a bare date is 10 characters long; anything longer carries a time part
    fmt = "%Y-%m-%d" if len(value) <= 10 else "%Y-%m-%d %H:%M:%S"
    print(datetime.datetime.strptime(value, fmt).strftime("%Y-%m-%d %H:%M:%S"))
Output:
with s
2021-03-16 00:00:00
with s1
2021-03-16 23:12:34
A simple if clause on the length of the string is enough to tell whether a time part is present.
The datetime.strptime() class method creates a datetime object from a
string representing a date and time and a corresponding format string.
The strftime(format) method creates a string representing the time
under the control of an explicit format string.
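An alternative to checking the length is to let strptime decide: try the full format first and fall back to the date-only format if it raises ValueError. This is a sketch rather than the method used above, and the function name normalize_timestamp is made up for illustration.

```python
from datetime import datetime

def normalize_timestamp(text):
    """Return text in '%Y-%m-%d %H:%M:%S' form, adding 00:00:00 when no time is given."""
    for fmt in ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d"):
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # the string does not match this layout
    raise ValueError(f"unsupported date string: {text!r}")

print(normalize_timestamp("2021-03-16"))           # 2021-03-16 00:00:00
print(normalize_timestamp("2021-03-16 23:12:34"))  # 2021-03-16 23:12:34
```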
I tried:
df["datetime_obj"] = df["datetime"].apply(lambda dt: datetime.strptime(dt, "%d/%m/%Y %H:%M"))
but got this error:
ValueError: time data '10/11/2006 24:00' does not match format
'%d/%m/%Y %H:%M'
How to solve it correctly?
This fails because the %H directive only accepts values from 00 to 23 (both inclusive). As the error says, 24:00 is therefore not a valid time string.
So we have little choice but to convert the string to a valid format first. We can do this by replacing 24:00 with 00:00, and then incrementing the day for those timestamps.
Like:
import pandas as pd
df['datetime_zero'] = df['datetime'].str.replace('24:00', '00:00')
df['datetime_er'] = pd.to_datetime(df['datetime_zero'], format='%d/%m/%Y %H:%M')
selrow = df['datetime'].str.contains('24:00')
# one extra day for the rows that originally contained 24:00, zero days otherwise
df['datetime_obj'] = df['datetime_er'] + pd.to_timedelta(selrow.astype(int), unit='D')
The last line adds one day to the rows that contain 24:00, so that '10/11/2006 24:00' gets converted to '11/11/2006 00:00'. Note, however, that this is rather unsafe, because whether it works depends on the format of the timestamp. For the data above it will (probably) work, since there is only one colon. But if, for example, the datetimes include seconds as well, the filter could also be triggered by 00:24:00, so some extra work may be needed to make it robust.
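One way to avoid the accidental match is to compare the whole time part instead of searching for a substring. The sketch below uses only the standard library and assumes the date and time are separated by a single space, as in the question; the function name parse_with_hour_24 is hypothetical.

```python
from datetime import datetime, timedelta

def parse_with_hour_24(text, fmt="%d/%m/%Y %H:%M"):
    """Parse text, treating a time of exactly 24:00 as midnight of the next day."""
    date_part, time_part = text.rsplit(" ", 1)
    if time_part == "24:00":
        # Parse as midnight of the same day, then move one day forward.
        midnight = datetime.strptime(f"{date_part} 00:00", fmt)
        return midnight + timedelta(days=1)
    return datetime.strptime(text, fmt)

print(parse_with_hour_24("10/11/2006 24:00"))  # 2006-11-11 00:00:00
print(parse_with_hour_24("12/11/2006 15:00"))  # 2006-11-12 15:00:00
```

With pandas, such a function could be applied per row with df["datetime"].apply(parse_with_hour_24), at the cost of being slower than the vectorized version.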
Your data doesn't follow the conventions used by Python / Pandas datetime objects. There should be only one way of storing a particular datetime, i.e. '10/11/2006 24:00' should be rewritten as '11/11/2006 00:00'.
Here's one way to approach the problem:
# find datetimes which have '24:00' and rewrite
twenty_fours = df['strings'].str[-5:] == '24:00'
df.loc[twenty_fours, 'strings'] = df['strings'].str[:-5] + '00:00'
# construct datetime series
df['datetime'] = pd.to_datetime(df['strings'], format='%d/%m/%Y %H:%M')
# add one day where applicable
df.loc[twenty_fours, 'datetime'] += pd.DateOffset(1)
Here's some data to test:
dateList = ['10/11/2006 24:00', '11/11/2006 00:00', '12/11/2006 15:00']
df = pd.DataFrame({'strings': dateList})
Result after transformations described above:
print(df['datetime'])
0 2006-11-11 00:00:00
1 2006-11-11 00:00:00
2 2006-11-12 15:00:00
Name: datetime, dtype: datetime64[ns]
As indicated in the documentation (https://docs.python.org/2/library/datetime.html#strftime-strptime-behavior), hours go from 00 to 23. 24:00 is then an error.
I have a timestamp column in my dataframe which is originally a str type. Some sample values:
'6/13/2015 6:45:58 AM'
'6/13/2015 7:00:37 PM'
I use the following code to convert these values into datetimes in 24-hour format:
df['timestampx'] = pd.to_datetime(df['timestamp'], format='%m/%d/%Y %H:%M:%S %p')
And, I obtain this result:
2015-06-13 06:45:58
2015-06-13 07:00:37
That means the dates are NOT converted to 24-hour format, and I am also losing the AM/PM info. Any help?
You're reading it in as a 24 hour time, but really the current format isn't 24 hour time, it's 12 hour time. Read it in as 12 hour with the suffix (AM/PM), then you'll be OK to output in 24 hour time later if need be.
import pandas as pd
df = pd.DataFrame(['6/13/2015 6:45:58 AM','6/13/2015 7:00:37 PM'], columns = ['timestamp'])
df['timestampx'] = pd.to_datetime(df['timestamp'], format='%m/%d/%Y %I:%M:%S %p')
print(df)
              timestamp          timestampx
0  6/13/2015 6:45:58 AM 2015-06-13 06:45:58
1  6/13/2015 7:00:37 PM 2015-06-13 19:00:37
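The difference between %H and %I can also be checked with the standard library alone. In this small sketch (the variable names are illustrative), the same PM string is parsed once with each directive; with %H the AM/PM marker has no effect on the hour.

```python
from datetime import datetime

raw = "6/13/2015 7:00:37 PM"

# %H reads the hour as a 24-hour value, so the PM marker does not change it.
wrong = datetime.strptime(raw, "%m/%d/%Y %H:%M:%S %p")
# %I reads a 12-hour value and combines it with %p.
right = datetime.strptime(raw, "%m/%d/%Y %I:%M:%S %p")

print(wrong.hour)  # 7
print(right.hour)  # 19
print(right.strftime("%Y-%m-%d %H:%M:%S"))  # 2015-06-13 19:00:37
```

The AM/PM information is not stored separately in the datetime; it can be produced again on output by formatting with %I and %p.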