Using a Python script, I need to read a CSV file where dates are formatted as DD/MM/YYYY, and convert them to YYYY-MM-DD before saving them into a SQLite database.
This almost works, but fails because I don't provide time:
from datetime import datetime
lastconnection = datetime.strptime("21/12/2008", "%Y-%m-%d")
#ValueError: time data did not match format: data=21/12/2008 fmt=%Y-%m-%d
print lastconnection
I assume there's a method in the datetime object to perform this conversion very easily, but I can't find an example of how to do it. Thank you.
Your example code is wrong. This works:
import datetime
datetime.datetime.strptime("21/12/2008", "%d/%m/%Y").strftime("%Y-%m-%d")
The call to strptime() parses the first argument according to the format specified in the second, so those two need to match. Then you can call strftime() to format the result into the desired final format.
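To tie the parse-then-format step back to the original goal (CSV in, SQLite out), here is a minimal sketch. The CSV content, the column names `name` and `lastconnection`, the table name `users`, and the helper `to_iso` are all made up for illustration; an in-memory database stands in for a real file.

```python
import csv
import io
import sqlite3
from datetime import datetime

# Hypothetical CSV content; a real script would use open("file.csv", newline="").
csv_text = "name,lastconnection\nalice,21/12/2008\nbob,05/01/2009\n"


def to_iso(ddmmyyyy):
    """Parse a DD/MM/YYYY string and return it as YYYY-MM-DD."""
    return datetime.strptime(ddmmyyyy, "%d/%m/%Y").strftime("%Y-%m-%d")


conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
conn.execute("CREATE TABLE users (name TEXT, lastconnection TEXT)")

# Convert each date while reading, then insert all rows in one call.
reader = csv.DictReader(io.StringIO(csv_text))
rows = [(row["name"], to_iso(row["lastconnection"])) for row in reader]
conn.executemany("INSERT INTO users VALUES (?, ?)", rows)
conn.commit()

stored = conn.execute(
    "SELECT name, lastconnection FROM users ORDER BY lastconnection"
).fetchall()
print(stored)
```

Storing the dates as YYYY-MM-DD text means that sorting the strings also sorts the dates chronologically, which is why the `ORDER BY` above behaves as expected.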
You first need to parse the string into a datetime object, and then format that object back into a string. It would go like this:
lastconnection = datetime.strptime("21/12/2008", "%d/%m/%Y").strftime('%Y-%m-%d')
I am new to programming. I wanted to convert from yyyy-mm-dd to dd/mm/yyyy to print out a date in the format that people in my part of the world use and recognise.
The accepted answer above got me on the right track.
The answer I ended up with to my problem is:
import datetime
today_date = datetime.date.today()
print(today_date)
new_today_date = today_date.strftime("%d/%m/%Y")
print (new_today_date)
The first two lines after the import statement print today's date in ISO format (2017-01-26). The last two lines convert this to the format recognised in the UK and other countries (26/01/2017).
You can shorten this code, but I left it as is because it is helpful to me as a beginner. I hope this helps other beginner programmers starting out!
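If the date starts out as a yyyy-mm-dd string rather than today's date, the same idea applies. The sketch below assumes Python 3.7 or later (for `date.fromisoformat`); the helper name `iso_to_dmy` is made up for illustration.

```python
import datetime


def iso_to_dmy(iso_text):
    """Convert a 'YYYY-MM-DD' string to 'DD/MM/YYYY'."""
    return datetime.date.fromisoformat(iso_text).strftime("%d/%m/%Y")


print(iso_to_dmy("2017-01-26"))  # 26/01/2017
```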
Does anyone else think it's a waste to convert these strings to date/time objects for what is, in the end, a simple text transformation? If you're certain the incoming dates will be valid, you can just use:
>>> ddmmyyyy = "21/12/2008"
>>> yyyymmdd = ddmmyyyy[6:] + "-" + ddmmyyyy[3:5] + "-" + ddmmyyyy[:2]
>>> yyyymmdd
'2008-12-21'
This will almost certainly be faster than the conversion to and from a date.
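The catch is the "if you're certain" part: slicing does no validation at all. As a rough sketch, a shape check can be added with a regular expression, though this still cannot tell a real date from an impossible one. The helper name `swap_ddmmyyyy` is hypothetical.

```python
import re

# Two digits, slash, two digits, slash, four digits.
_DDMMYYYY = re.compile(r"\d{2}/\d{2}/\d{4}")


def swap_ddmmyyyy(text):
    """Rearrange 'DD/MM/YYYY' into 'YYYY-MM-DD' using slicing only."""
    if not _DDMMYYYY.fullmatch(text):
        raise ValueError(f"not in DD/MM/YYYY form: {text!r}")
    return text[6:] + "-" + text[3:5] + "-" + text[:2]


print(swap_ddmmyyyy("21/12/2008"))  # 2008-12-21
print(swap_ddmmyyyy("31/02/2008"))  # 2008-02-31: right shape, but not a real date
```

If impossible dates such as 31 February must be rejected, parsing with `strptime` is the safer route, at the cost of some speed.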
# case_date holds a date in mm/dd/yyyy format
case_date = "03/31/2020"
demo = case_date.split("/")
new_case_date = demo[1] + "-" + demo[0] + "-" + demo[2]
# the new format is dd-mm-yyyy; test by printing it
print(new_case_date)
If you need to convert an entire column (from pandas DataFrame), first convert it (pandas Series) to the datetime format using to_datetime and then use .dt.strftime:
def conv_dates_series(df, col, old_date_format, new_date_format):
    df[col] = pd.to_datetime(df[col], format=old_date_format).dt.strftime(new_date_format)
    return df
Sample usage:
import pandas as pd
test_df = pd.DataFrame({"Dates": ["1900-01-01", "1999-12-31"]})
old_date_format='%Y-%m-%d'
new_date_format='%d/%m/%Y'
conv_dates_series(test_df, "Dates", old_date_format, new_date_format)
Dates
0 01/01/1900
1 31/12/1999
The simplest way
While reading the CSV file, pass the parse_dates argument (and dayfirst=True, since the dates are DD/MM/YYYY):
df = pd.read_csv("sample.csv", parse_dates=['column_name'], dayfirst=True)
This parses the named column into datetime values, which pandas displays in YYYY-MM-DD format.
To convert the date format from DD/MM/YYYY to YYYY-MM-DD, as in your question, you can use this:
from datetime import datetime
lastconnection = datetime.strptime("21/12/2008", "%d/%m/%Y").strftime("%Y-%m-%d")
print(lastconnection)
df is your data frame
Dateclm is the column that you want to change
This column should be in DateTime datatype.
df['Dateclm'] = pd.to_datetime(df['Dateclm'])
df.dtypes
#Here is the solution to change the format of the column
df["Dateclm"] = pd.to_datetime(df["Dateclm"]).dt.strftime('%Y-%m-%d')
print(df)
I'm working on date formatting, and a few cells contain data such as June/142017 (no slash between the day and the year). I want to split the day and year and convert the value into the standard format MM/DD/YYYY.
I'm currently fixing the date with the replace function, i.e. replace("June/142017", "June/14/2017"), but this only works for that one value in June. Could you please help me with code that splits and converts these dates in a general way, not tied to a specific value?
Below is the code I'm using:
import pandas as pd
import datetime as dt
File = pd.read_excel("Final_file.xlsx")
LFile = File.replace("June/142017","June/14/2017")
LFile["Date"] = pd.to_datetime(LFile["Date"]).dt.strftime("%m/%d/%Y")
LFile.to_excel("Updated_Final_File.xlsx")
*** FYI - I'm new to Python.
Thank you in Advance.
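Use the format %B/%d%Y to match values like June/142017, and fall back to the default parser for everything else:
LFile = pd.read_excel("Final_file.xlsx")
# values without the slash, e.g. June/142017; anything else becomes NaT
d1 = pd.to_datetime(LFile["Date"], format='%B/%d%Y', errors='coerce')
# values the default parser understands; anything else becomes NaT
d2 = pd.to_datetime(LFile["Date"], errors='coerce')
# prefer d2, and fill its gaps from d1
LFile["Date"] = d2.fillna(d1).dt.strftime("%m/%d/%Y")
LFile.to_excel("Updated_Final_File.xlsx")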
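The same "try one format, then fall back to another" idea can be sketched on single strings with the standard library alone. This is only an illustration: the list of candidate formats and the helper name `normalise` are assumptions, and `%B` matches English month names only under an English (or the default C) locale.

```python
from datetime import datetime

# Candidate input formats, tried in order. "%B/%d%Y" covers "June/142017".
CANDIDATE_FORMATS = ("%B/%d/%Y", "%B/%d%Y")


def normalise(text, formats=CANDIDATE_FORMATS):
    """Return text as MM/DD/YYYY, or None if no candidate format matches."""
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt).strftime("%m/%d/%Y")
        except ValueError:
            continue  # this format did not match; try the next one
    return None


for raw in ("June/14/2017", "June/142017", "not a date"):
    print(raw, "->", normalise(raw))
```

Returning None for unparseable input plays the same role as errors='coerce' does in pandas: bad values are flagged rather than stopping the whole run.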
I am currently attempting to convert a column "datetime" which has values that are dates/times in string form, and I want to convert the column such that all of the strings are converted to timestamps.
The date/time strings are of the form "10/11/2015 0:41", and I'd like to convert the string to a timestamp of form YYYY-MM-DD HH:MM:SS. At first I attempted to cast the column to timestamp in the following way:
df=df.withColumn("datetime", df["datetime"].cast("timestamp"))
When I did so, however, I received null for every value, which led me to believe that the input dates needed to be formatted somehow. I have looked into numerous other possible remedies such as to_timestamp(), though this also gives the same null results for all of the values. How can a string of this format be converted into a timestamp?
Any insights or guidance are greatly appreciated.
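Try wrapping the parser in a UDF, so that it runs on each value rather than on the Column object:
import datetime
from pyspark.sql.functions import udf
from pyspark.sql.types import TimestampType
@udf(returnType=TimestampType())
def to_timestamp(date_string):
    return datetime.datetime.strptime(date_string, "%m/%d/%Y %H:%M")
df = df.withColumn("datetime", to_timestamp(df.datetime))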
You can use the to_timestamp function. See Datetime Patterns for valid date and time format patterns.
from pyspark.sql import functions as F
df = df.withColumn('datetime', F.to_timestamp('datetime', 'M/d/y H:m'))
df.show(truncate=False)
You were doing it the right way, except that you did not pass the format of the string, which in this case is MM/dd/yyyy HH:mm. Here M stands for months and m for minutes. See the code below for reference:
df = spark.createDataFrame([('10/11/2015 0:41',), ('10/11/2013 10:30',), ('12/01/2016 15:56',)], ("String_Timestamp", ))
from pyspark.sql.functions import *
df.withColumn("Timestamp_Format", to_timestamp(col("String_Timestamp"), "MM/dd/yyyy HH:mm")).show(truncate=False)
+----------------+-------------------+
|String_Timestamp| Timestamp_Format|
+----------------+-------------------+
| 10/11/2015 0:41|2015-10-11 00:41:00|
|10/11/2013 10:30|2013-10-11 10:30:00|
|12/01/2016 15:56|2016-12-01 15:56:00|
+----------------+-------------------+
I'm doing some data analysis on a dataset (https://www.kaggle.com/sudalairajkumar/covid19-in-usa) and I'm trying to convert the date and time column (lastModified) to the proper datetime format. When I first tried it, it returned an error:
ValueError: hour must be in 0..23
so I tried doing this -
data_df[['date','time']] = data_df['lastModified'].str.split(expand=True)
data_df['lastModified'] = (pd.to_datetime(data_df.pop('date'), format='%d/%m/%Y') +
                           pd.to_timedelta(data_df.pop('time') + ':00'))
This gives an error - Columns must be same length as key
I understand this means that the split did not produce the two columns I'm assigning to. How do I resolve this issue? I'm relatively new to Python, so please explain in an easy-to-understand manner. Thanks very much.
This is my whole code-
import pandas as pd
dataset_url = 'https://www.kaggle.com/sudalairajkumar/covid19-in-usa'
import opendatasets as od
od.download(dataset_url)
data_dir = './covid19-in-usa'
import os
os.listdir(data_dir)
data_df = pd.read_csv('./covid19-in-usa/us_covid19_daily.csv')
data_df
data_df[['date','time']] = data_df['lastModified'].str.split(expand=True)
data_df['lastModified'] = (pd.to_datetime(data_df.pop('date'), format='%d/%m/%Y') +
                           pd.to_timedelta(data_df.pop('time') + ':00'))
Looks like lastModified is in ISO format. I have used something like below to convert iso date string:
from dateutil import parser
from datetime import datetime
...
timestamp = parser.isoparse(lastModified).timestamp()
dt = datetime.fromtimestamp(timestamp)
...
On this line:
data_df[['date','time']] = data_df['lastModified'].str.split(expand=True)
In order to do this assignment, the number of columns on both sides of the = must be the same. split can output multiple columns, but it will only do this if it finds the character it's looking for to split on. By default, it splits by whitespace. There is no whitespace in the date column, and therefore it will not split. You can read the documentation for this here.
For that reason, this line should be like this, so it splits on the T:
data_df[['date','time']] = data_df['lastModified'].str.split('T', expand=True)
But the solution posted by @southiejoe is likely to be more reliable. These timestamps are in a standard format; parsing them is a previously-solved problem.
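Both points can be seen on a single string. The value below is a made-up example shaped like an ISO 8601 timestamp, not a row taken from the dataset, and the strptime format is an assumption that only fits values of exactly this shape.

```python
from datetime import datetime

# Hypothetical value in ISO 8601 form; not copied from the dataset.
sample = "2020-04-01T20:00:00Z"

# Default split looks for whitespace, finds none, and returns a single piece.
print(sample.split())      # ['2020-04-01T20:00:00Z']

# Splitting on "T" gives the date part and the time part.
print(sample.split("T"))   # ['2020-04-01', '20:00:00Z']

# Parsing the whole string in one go avoids the split altogether.
parsed = datetime.strptime(sample, "%Y-%m-%dT%H:%M:%SZ")
print(parsed)              # 2020-04-01 20:00:00
```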
You need these libraries
#import
from dateutil import parser
from datetime import datetime
Then write something like the following to convert each lastModified value. This avoids the split, and with it the column-length error:
# parse the ISO 8601 string and get its POSIX timestamp
clock = parser.isoparse(lastModified).timestamp()
# convert that timestamp to a datetime in local time
data = datetime.fromtimestamp(clock)
I am trying to convert a date between the English format (2019-10-07) and the French format (07/10/2019).
I tried
dat = '07/10/2019'
dat = time.strftime('%Y-%m-%d')
but got the result '2019-10-16' instead of '2019-10-07'
Using datetime, you can specify both the format in which the source date is provided and the target format you want.
from datetime import datetime
dat = '07/10/2019'
datetime.strptime(dat, "%d/%m/%Y").strftime("%Y-%m-%d")
out[6]: '2019-10-07'
strftime needs a time/date to convert, and it will use the current date and time if you don't provide one. The previous value of dat is not relevant - this information is not seen by strftime.
You need to provide the time information that strftime will format, as a tuple that you can get by parsing the original string. For this, use strptime (f for format, p for parse).
So:
dmy = '07/10/2019'
ymd = time.strftime('%Y-%m-%d', time.strptime(dmy, '%d/%m/%Y'))
# ^^^^^^^^ ^^^^^^^^
# output schema input schema
# now ymd is '2019-10-07'
(Or you can use the datetime module as in the other answer. This way, the parsing gives you an object, which has a method to format back - so you can write the whole operation "in order" on the line. But the general principle is the same: you need to parse, then format, and you need to specify the schema on each side.)
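As a quick side-by-side check, the two routes give the same string, and calling time.strftime without a time tuple simply formats the current local time. The variable names below are illustrative only.

```python
import time
from datetime import datetime

dmy = "07/10/2019"

# time module: parse to a struct_time, then pass it to strftime explicitly.
via_time = time.strftime("%Y-%m-%d", time.strptime(dmy, "%d/%m/%Y"))

# datetime module: parse to an object, then call its strftime method.
via_datetime = datetime.strptime(dmy, "%d/%m/%Y").strftime("%Y-%m-%d")

print(via_time, via_datetime)  # 2019-10-07 2019-10-07

# With no time tuple, strftime has nothing to go on but the current local time.
today = time.strftime("%Y-%m-%d")
print(today)
```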
With:
dat = time.strftime('%Y-%m-%d')
you get the current date. You need to do this instead:
from datetime import datetime
dat = '07/10/2019'
dat = datetime.strptime(dat, '%d/%m/%Y')
print(dat.strftime('%Y-%m-%d'))