Convert year string into datetime object - python

I have a date column in my dataframe that consists of strings like '201512'.
I would like to convert it into a datetime object of just year to do some time series analysis.
I tried...
df['Date']= pd.to_datetime(df['Date'])
and something similar to
datetime.strptime(Date, "%Y")

I am not sure how datetime interfaces with pandas dataframes (perhaps somebody will comment if there is special usage), but in general the datetime functions would work like this:
import datetime
date_string = "201512"
date_object = datetime.datetime.strptime(date_string, "%Y%m")
print(date_object)
Getting us:
2015-12-01 00:00:00
Now that the hard part of creating a datetime object is done, we simply:
print(date_object.year)
Which spits out our desired
2015
More info about the parsing operators (the "%Y%m" bit of my code) is described in the documentation
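For the pandas column in the question, the same format string can be passed to pd.to_datetime. This is only a minimal sketch: the column name Date is taken from the question, while the sample values and the Year column are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data in the question's 'YYYYMM' string format
df = pd.DataFrame({"Date": ["201512", "201601", "201602"]})

# Parse the strings explicitly as year + month; the day defaults to 1
df["Date"] = pd.to_datetime(df["Date"], format="%Y%m")

# The .dt accessor exposes the components of each timestamp
df["Year"] = df["Date"].dt.year

print(df)
```

Keeping the full datetime column and deriving the year from it is usually more convenient for time series work than storing the year alone.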

I would look at the module arrow
https://arrow.readthedocs.io/en/latest/
import arrow
date = arrow.now()
#example of text formatting
fdate = date.format('YYYY')
#example of converting text into datetime
date = arrow.get('201905', 'YYYYMM').datetime

Related

Python datetime to Excel serial date conversion

The following code converts a string into a timestamp. The timestamp comes out to: 1646810127.
However, if I use Excel to convert this date and time into a float I get: 44629,34.
I need the Excel's output from the Python script.
I have tried with a few different datetime strings to see if there is any pattern in between the two numbers, but cannot seem to find any.
Any thoughts on how I get the code to output 44629,34?
Much appreciated
import datetime
date_time_str = '2022-03-09 08:15:27'
date_time_obj = datetime.datetime.strptime(date_time_str, '%Y-%m-%d %H:%M:%S')
print('Date:', date_time_obj.date())
print('Time:', date_time_obj.time())
print('Date-time:', date_time_obj)
print(date_time_obj.timestamp())
>>output:
Date: 2022-03-09
Time: 08:15:27
Date-time: 2022-03-09 08:15:27
1646810127.0
Calculate the timedelta between your datetime object and Excel's "day zero", then divide the timedelta's total_seconds() by the number of seconds in a day to get the Excel serial date:
import datetime
date_time_str = '2022-03-09 08:15:27'
UTC = datetime.timezone.utc
dt_obj = datetime.datetime.fromisoformat(date_time_str).replace(tzinfo=UTC)
day_zero = datetime.datetime(1899,12,30, tzinfo=UTC)
excel_serial_date = (dt_obj-day_zero).total_seconds()/86400
print(excel_serial_date)
# 44629.3440625
Note: I'm setting time zone to UTC here to avoid any ambiguities - adjust as needed.
Since the question is tagged pandas: you'd do the same thing there, except that you don't need to set UTC, because pandas treats naive datetimes as UTC by default:
import pandas as pd
ts = pd.Timestamp('2022-03-09 08:15:27')
excel_serial_date = (ts-pd.Timestamp('1899-12-30')).total_seconds()/86400
print(excel_serial_date)
# 44629.3440625
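The same subtraction also works on a whole column. A minimal sketch, assuming a DataFrame with a column of naive timestamps (the names df, ts and excel_serial are hypothetical):

```python
import pandas as pd

# Hypothetical column of naive timestamps
df = pd.DataFrame(
    {"ts": pd.to_datetime(["2022-03-09 08:15:27", "2022-03-10 00:00:00"])}
)

# Excel's "day zero", as used in the answer above
day_zero = pd.Timestamp("1899-12-30")

# Dividing a timedelta Series by one day yields fractional days as floats
df["excel_serial"] = (df["ts"] - day_zero) / pd.Timedelta(days=1)

print(df)
```

Dividing by pd.Timedelta(days=1) is equivalent to total_seconds()/86400 but stays vectorised.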
See also:
background: What is story behind December 30, 1899 as base date?
inverse operation: Convert Excel style date with pandas

Converting to_datetime but keeping original time

I am trying to convert a string to a datetime, but the conversion adds 5 hours to the original time. How do I convert it but keep the time as is?
>>> import pandas as pd
>>> t = pd.to_datetime("2016-09-21 08:56:29-05:00", format='%Y-%m-%d %H:%M:%S')
>>> t
Timestamp('2016-09-21 13:56:29')
The conversion doesn't add 5 hours to the original time. Pandas just detects that your datetime is timezone-aware and converts it to naive UTC. But it's still the same datetime.
If you want a localized Timestamp instance, use Timestamp.tz_localize() to make t a timezone-aware UTC timestamp, and then use the Timestamp.tz_convert() method to convert to UTC-0500:
>>> import pandas as pd
>>> import pytz
>>> t = pd.to_datetime("2016-09-21 08:56:29-05:00", format='%Y-%m-%d %H:%M:%S')
>>> t
Timestamp('2016-09-21 13:56:29')
>>> t.tz_localize(pytz.utc).tz_convert(pytz.timezone('America/Chicago'))
Timestamp('2016-09-21 08:56:29-0500', tz='America/Chicago')
To achieve what you want, you can remove the "-05:00" from the end of your time string "2016-09-21 08:56:29-05:00".
However, Erik Cederstrand is correct in explaining that pandas is not modifying the time, it's simply displaying it in a different format.
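Note that the output shown in this thread comes from an older pandas release. As far as I know, recent versions keep the UTC offset when the string is parsed without a format argument, in which case the wall-clock time can be kept by dropping the time zone with tz_localize(None). A sketch under that assumption:

```python
import pandas as pd

# Assumption: a recent pandas version, which returns a tz-aware Timestamp here.
# No format is passed, so the "-05:00" offset is parsed as well.
t = pd.to_datetime("2016-09-21 08:56:29-05:00")

# Drop the time zone but keep the local wall-clock time (08:56:29)
naive_local = t.tz_localize(None)

# For comparison: the same instant expressed in UTC (13:56:29)
as_utc = t.tz_convert("UTC")

print(naive_local)
print(as_utc)
```

This avoids editing the input string by hand, which matters when the offset differs from row to row.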

convert string date to date format

This is a basic question, but I am getting tangled up.
I have a variable referencePeriodEndDate that holds a date as a string,
which I am trying to convert to a date-only format,
so '31/3/2017' becomes 2017-03-31.
But I am getting stuck. So far I have tried:
datetimeobject = datetime.strptime(referencePeriodEndDate,'%Y-%m-%d')
datetimeobject = referencePeriodEndDate.strftime('%Y-%m-%d')
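If you can use the dateutil module:
from dateutil import parser
# dayfirst=True makes the day/month order explicit for ambiguous dates like 3/4/2017
dt = parser.parse("31/3/2017", dayfirst=True)
print(dt.strftime('%Y-%m-%d'))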
Output:
2017-03-31
Using datetime
import datetime
A = datetime.datetime.strptime('31/3/2017','%d/%m/%Y')
print(A.strftime('%Y-%m-%d'))
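If the goal is a date object rather than a formatted string, calling .date() on the parsed datetime gives one. A small sketch (the variable names here are stand-ins, not the asker's actual code):

```python
from datetime import datetime

reference_period_end_date = "31/3/2017"  # stand-in for the question's variable

# strptime's %d and %m accept values without a leading zero, so "3" is fine
end_date = datetime.strptime(reference_period_end_date, "%d/%m/%Y").date()

print(type(end_date).__name__)
print(end_date.isoformat())
```

A date object prints as YYYY-MM-DD by default, so no strftime call is needed for that layout.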

Converting date between DD/MM/YYYY and YYYY-MM-DD?

Using a Python script, I need to read a CSV file where dates are formatted as DD/MM/YYYY, and convert them to YYYY-MM-DD before saving this into a SQLite database.
This almost works, but fails because I don't provide time:
from datetime import datetime
lastconnection = datetime.strptime("21/12/2008", "%Y-%m-%d")
#ValueError: time data did not match format: data=21/12/2008 fmt=%Y-%m-%d
print lastconnection
I assume there's a method in the datetime object to perform this conversion very easily, but I can't find an example of how to do it. Thank you.
Your example code is wrong. This works:
import datetime
datetime.datetime.strptime("21/12/2008", "%d/%m/%Y").strftime("%Y-%m-%d")
The call to strptime() parses the first argument according to the format specified in the second, so those two need to match. Then you can call strftime() to format the result into the desired final format.
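Putting the parse-then-format step into the workflow from the question (CSV in, SQLite out) might look roughly like this. It is only a sketch: the CSV content, table name and column names are invented for the example, and an in-memory database stands in for a real file.

```python
import csv
import io
import sqlite3
from datetime import datetime

# Hypothetical CSV content; in practice this would come from an open file
csv_text = "username,lastconnection\nalice,21/12/2008\nbob,05/01/2009\n"

conn = sqlite3.connect(":memory:")  # in-memory database for the example
conn.execute("CREATE TABLE connections (username TEXT, lastconnection TEXT)")

for row in csv.DictReader(io.StringIO(csv_text)):
    # Parse DD/MM/YYYY, then format as YYYY-MM-DD, which sorts correctly as text
    parsed = datetime.strptime(row["lastconnection"], "%d/%m/%Y")
    iso_date = parsed.strftime("%Y-%m-%d")
    conn.execute(
        "INSERT INTO connections VALUES (?, ?)", (row["username"], iso_date)
    )

rows = conn.execute(
    "SELECT username, lastconnection FROM connections ORDER BY lastconnection"
).fetchall()
print(rows)
```

Going through strptime also means that an invalid date in the file raises a ValueError instead of being stored silently.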
You first need to parse the string into a datetime object, and then format that object back into a string. It goes like this:
from datetime import datetime
lastconnection = datetime.strptime("21/12/2008", "%d/%m/%Y").strftime('%Y-%m-%d')
I am new to programming. I wanted to convert from yyyy-mm-dd to dd/mm/yyyy to print out a date in the format that people in my part of the world use and recognise.
The accepted answer above got me on the right track.
The answer I ended up with to my problem is:
import datetime
today_date = datetime.date.today()
print(today_date)
new_today_date = today_date.strftime("%d/%m/%Y")
print(new_today_date)
The first two lines after the import statement give today's date in ISO 8601 format (2017-01-26), which is how Python prints dates by default. The last two lines convert this to the format recognised in the UK and other countries (26/01/2017).
You can shorten this code, but I left it as is because it is helpful to me as a beginner. I hope this helps other beginner programmers starting out!
Does anyone else think it's a waste to convert these strings to date/time objects for what is, in the end, a simple text transformation? If you're certain the incoming dates will be valid, you can just use:
>>> ddmmyyyy = "21/12/2008"
>>> yyyymmdd = ddmmyyyy[6:] + "-" + ddmmyyyy[3:5] + "-" + ddmmyyyy[:2]
>>> yyyymmdd
'2008-12-21'
This will almost certainly be faster than the conversion to and from a date.
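One caveat: the slicing relies on fixed character positions, so it gives wrong results for dates without leading zeros such as 1/3/2008. A variant that splits on the separator and pads the parts is sketched here (the function name is hypothetical, and it still does no validation):

```python
def ddmmyyyy_to_iso(text):
    """Reorder 'D/M/YYYY' or 'DD/MM/YYYY' into 'YYYY-MM-DD' without datetime."""
    day, month, year = text.split("/")
    # zfill pads single-digit days and months with a leading zero
    return f"{year}-{month.zfill(2)}-{day.zfill(2)}"


print(ddmmyyyy_to_iso("21/12/2008"))
print(ddmmyyyy_to_iso("1/3/2008"))
```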
# case_date holds a date in mm/dd/yyyy format
case_date = "03/31/2020"
demo = case_date.split("/")
new_case_date = demo[1] + "-" + demo[0] + "-" + demo[2]
# the new format is dd-mm-yyyy; test by printing it
print(new_case_date)
If you need to convert an entire column (from pandas DataFrame), first convert it (pandas Series) to the datetime format using to_datetime and then use .dt.strftime:
def conv_dates_series(df, col, old_date_format, new_date_format):
    df[col] = pd.to_datetime(df[col], format=old_date_format).dt.strftime(new_date_format)
    return df
Sample usage:
import pandas as pd
test_df = pd.DataFrame({"Dates": ["1900-01-01", "1999-12-31"]})
old_date_format='%Y-%m-%d'
new_date_format='%d/%m/%Y'
conv_dates_series(test_df, "Dates", old_date_format, new_date_format)
Dates
0 01/01/1900
1 31/12/1999
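If the column may contain values that do not match the expected format, to_datetime raises by default. One option, sketched here with made-up data, is errors="coerce", which marks such rows as missing instead:

```python
import pandas as pd

# Hypothetical column with one value that does not match the format
dates = pd.Series(["21/12/2008", "not a date", "05/01/2009"])

# errors="coerce" turns unparseable values into NaT instead of raising
parsed = pd.to_datetime(dates, format="%d/%m/%Y", errors="coerce")

# NaT entries come out of strftime as missing values
converted = parsed.dt.strftime("%Y-%m-%d")

print(converted)
```

Whether silently dropping bad values is acceptable depends on the data; checking parsed.isna().sum() afterwards shows how many rows were affected.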
The simplest way
While reading the CSV file, pass the parse_dates argument (add dayfirst=True if the dates are DD/MM/YYYY):
df = pd.read_csv("sample.csv", parse_dates=['column_name'], dayfirst=True)
This parses the named column into datetime values, which pandas displays in YYYY-MM-DD format.
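Since parse_dates has to work out the date layout itself, day-first dates such as 05/01/2009 can be ambiguous. One hedged alternative is to read the column as text and parse it afterwards with an explicit format. The CSV content and column names here are invented for the example:

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for sample.csv
csv_text = "username,lastconnection\nalice,21/12/2008\nbob,05/01/2009\n"

# Read the file first; the date column comes in as plain strings
df = pd.read_csv(io.StringIO(csv_text))

# Parse with an explicit DD/MM/YYYY format so nothing has to be guessed
df["lastconnection"] = pd.to_datetime(df["lastconnection"], format="%d/%m/%Y")

print(df.dtypes)
print(df)
```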
To convert the date format from DD/MM/YYYY to YYYY-MM-DD as in your question, you can use this:
from datetime import datetime
lastconnection = datetime.strptime("21/12/2008", "%d/%m/%Y").strftime("%Y-%m-%d")
print(lastconnection)
Here df is your DataFrame and Dateclm is the column that you want to change.
The column should first be converted to the datetime dtype:
df['Dateclm'] = pd.to_datetime(df['Dateclm'])
df.dtypes
#Here is the solution to change the format of the column
df["Dateclm"] = pd.to_datetime(df["Dateclm"]).dt.strftime('%Y-%m-%d')
print(df)
