Using Python, how can I extract the date values from a DateTime column?
Like this example using SQL:
SELECT
CONVERT(DATE, GETDATE()) date;
Given a string (datestring in this example) that represents one value of that column, you can parse it with the strptime method of the datetime class and then call .date() on the result:
import datetime as dt
datestring = "2016-02-05 00:48:23"
date = dt.datetime.strptime(datestring, "%Y-%m-%d %H:%M:%S").date()
Then you can have access to the day, month and year as follows:
day = date.day
month = date.month
year = date.year
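The same idea extends to every value of the column. A minimal sketch, assuming the column has already been read into a plain Python list (the name timestamp_strings is made up for illustration) and Python 3.7+ for datetime.fromisoformat:

```python
from datetime import datetime

# Hypothetical column values, as strings in "YYYY-MM-DD HH:MM:SS" form
timestamp_strings = ["2016-02-05 00:48:23", "2016-02-06 13:05:00"]

# Parse each string and keep only the date part
dates = [datetime.fromisoformat(s).date() for s in timestamp_strings]

print(dates)  # [datetime.date(2016, 2, 5), datetime.date(2016, 2, 6)]
```

For strings in another layout, strptime with an explicit format (as above) would be the safer choice.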
You could use a Pandas DataFrame and read the sql table using the pandas.read_sql() function, given your SQL connection:
import pandas as pd
df = pd.read_sql('select referrer_col, timestamp_col from my_table', your_connection)
Then extract the date part of the timestamp column using Series.dt.date:
df['date_only'] = df['timestamp_col'].dt.date
I have an SQL query:
select * from db where start date = '2015-01-01'
When I run this, the output is a dataframe. I then need to run the same SQL statement again with the month in the start date increased by 1 (2015-02-01) and append the resulting dataframe to the previous one, then run it again with '2015-03-01' and append again, looping all the way up to 2023-01-01. Finally, I need to export the combined dataframe as a CSV file.
How can I do this in Python?
year = 2015
month = 1
day = 1
# 97 iterations cover 2015-01-01 through 2023-01-01 inclusive
for add_month in range(0, 97):
    date = f"'{year + add_month // 12}-{month + (add_month % 12):02}-{day:02}'"
    sql_query = "select * from db where start date = " + date
    print(sql_query)
We could also use the datetime and dateutil modules. First we create a datetime object from the date string, then we use relativedelta (from dateutil) to add one month on each iteration.
from datetime import datetime
from dateutil.relativedelta import relativedelta
my_date = datetime.strptime('2015-01-01', '%Y-%m-%d')
for repeat in range(0,12):
    print(f"select * from db where start date = '{datetime.strftime(my_date, '%Y-%m-%d')}'")
    my_date = my_date + relativedelta(months=+1)
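Neither answer above runs the query, accumulates the results, or writes the CSV. A rough end-to-end sketch using only the standard library, under loud assumptions: an in-memory SQLite table stands in for the real database, the column names start_date and value are invented, and month_starts is a hypothetical helper rather than a library function.

```python
import csv
import io
import sqlite3
from datetime import date


def month_starts(first, last):
    """Yield the first day of each month from `first` to `last`, inclusive."""
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        yield date(year, month, 1)
        # Roll over to January of the next year after December
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)


# Stand-in database: an in-memory SQLite table with a hypothetical schema
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE db (start_date TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO db VALUES (?, ?)",
    [("2015-01-01", 10), ("2015-02-01", 20), ("2015-02-15", 99), ("2023-01-01", 30)],
)

rows = []
for day in month_starts(date(2015, 1, 1), date(2023, 1, 1)):
    # Pass the date as a bound parameter instead of concatenating strings
    cursor = conn.execute(
        "SELECT start_date, value FROM db WHERE start_date = ?", (day.isoformat(),)
    )
    rows.extend(cursor.fetchall())

# Write the accumulated rows as CSV (a string buffer here; use open(...) for a file)
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["start_date", "value"])
writer.writerows(rows)
print(buffer.getvalue())
```

With pandas, the loop body would presumably call pd.read_sql instead and the per-month frames would be combined with pd.concat before calling to_csv.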
What I wanted to do is get 1 year of data.
I take the latest date in the date column as my end date, then subtract 1 year from it to get the start date. After that, I can filter the data between the start and end dates.
I managed to get the end date, but I can't find out how to get the start date.
Below is the code I have so far. The "- 1 year" step is what needs to be solved.
If you know how to do the filtering in PySpark, that is also welcome.
from pyspark.sql.functions import min, max
import datetime
import pyspark.sql.functions as F
from pyspark.sql.functions import date_format, col
#convert string to date type
df = df.withColumn('risk_date', F.to_date(F.col('chosen_risk_prof_date'), 'dd.MM.yyyy'))
#filter only 1 year of data from big data set.
#calculate the start date and end date. latest_date = end date.
latest_date = df.select((max("risk_date"))).show()
start_date = latest_date - *1 year*
new_df = df.date > start_date & df.date < end_date
Then after this get all the data between start date and end date
You can use relativedelta as below:
from datetime import datetime
from dateutil.relativedelta import relativedelta
print(datetime.now() - relativedelta(years=1))
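To make the whole "latest date minus one year, then filter" step concrete, here is a plain-Python sketch rather than PySpark. The list risk_dates is invented sample data standing in for the risk_date column, and one_year_before is a hypothetical helper that avoids the dateutil dependency. Unlike the comparison in the question, it includes the end date so the latest row is kept.

```python
from datetime import date


def one_year_before(d):
    """Return the same calendar day one year earlier (Feb 29 falls back to Feb 28)."""
    try:
        return d.replace(year=d.year - 1)
    except ValueError:
        # Only Feb 29 can fail: the previous year has no such day
        return d.replace(year=d.year - 1, day=28)


# Hypothetical values standing in for the risk_date column
risk_dates = [date(2019, 3, 1), date(2020, 1, 15), date(2020, 6, 30), date(2018, 12, 31)]

end_date = max(risk_dates)
start_date = one_year_before(end_date)

# Keep only the dates inside the one-year window (start exclusive, end inclusive)
last_year = [d for d in risk_dates if start_date < d <= end_date]
print(start_date, end_date, last_year)
```

The same two boundary values could then be used in a DataFrame filter; that part is left out here because it depends on the Spark setup.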
I'm trying to subtract a day from the date 06-30-2019 in order to make it 06-29-2019, but I can't figure out any way to achieve that.
I've tried with:
import datetime
date = "06-30-2019"
date = datetime.datetime.strptime(date,'%m-%d-%Y').strftime('%m-%d-%Y')
print(date)
It surely gives me back the date I used above.
How can I subtract a day from a date in the above format?
Try this:
import datetime
date = "06/30/19"
date = datetime.datetime.strptime(date, "%m/%d/%y")
NewDate = date + datetime.timedelta(days=-1)
print(NewDate) # 2019-06-29 00:00:00
Your code:
date = "06-30-2019"
date = datetime.datetime.strptime(date,'%m-%d-%Y').strftime('%m-%d-%Y')
Check the type of the date variable:
type(date)
Out[]: str
It is a string. To subtract a day, you must first convert it into a date or datetime object. You can use pd.to_datetime():
# Import packages
import pandas as pd
from datetime import timedelta
# input date
date = "06-30-2019"
# Convert the string to a pandas Timestamp (explicit month-day-year format)
date = pd.to_datetime(date, format="%m-%d-%Y")
print(date)
# Subtracting days
number_of_days = 1
new_date = date - timedelta(days=number_of_days)
print(new_date)
output:
2019-06-29 00:00:00
If you want to get rid of the time component, you can use:
str(new_date.date())
Out[]: '2019-06-29'
Use timedelta:
import datetime
date = datetime.datetime.strptime("06/30/19" ,"%m/%d/%y")
print( date - datetime.timedelta(days=1))
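None of the answers above converts the result back into the MM-DD-YYYY layout the question asks for. A small sketch of the full round trip, reusing the question's format string for both parsing and output:

```python
from datetime import datetime, timedelta

date_string = "06-30-2019"
fmt = "%m-%d-%Y"

# Parse, subtract one day, then format back into the original layout
previous_day = datetime.strptime(date_string, fmt) - timedelta(days=1)
print(previous_day.strftime(fmt))  # 06-29-2019
```

The original attempt failed only because strftime was applied before any arithmetic, which turned the value back into a string.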
I have a data frame in which some of the columns hold dates in this format (ISO format):
YYYY-MM-DDThh:mm:ssTZD
I want to convert it to
YYYY-MM-DD HH:MM[:SS[.SSSSSS]]
For example when I do:
print (df["create_date"])
I get:
2014-11-24 20:21:49-05:00
How can I change the date format in the column?
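If the column already has a datetime dtype, format it through the .dt accessor. Note that the square brackets in YYYY-MM-DD HH:MM[:SS[.SSSSSS]] only mark optional parts and must not appear in the format string:
df["new_date"] = df["create_date"].dt.strftime("%Y-%m-%d %H:%M:%S")
If the column holds strings, parse it with pd.to_datetime first:
df["new_date"] = pd.to_datetime(df["create_date"]).dt.strftime("%Y-%m-%d %H:%M:%S")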
Then write this to csv/excel
import pandas as pd
df.to_csv("\\path\\file.csv")
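For a single value, the conversion can be checked without pandas. A minimal sketch, assuming Python 3.7+ (for datetime.fromisoformat) and using the sample value from the question in its ISO form:

```python
from datetime import datetime

iso_string = "2014-11-24T20:21:49-05:00"

# fromisoformat understands the "-05:00" offset form (Python 3.7+)
parsed = datetime.fromisoformat(iso_string)

# Format without the offset; the wall-clock time is kept as-is, not converted to UTC
print(parsed.strftime("%Y-%m-%d %H:%M:%S"))  # 2014-11-24 20:21:49
```

Dropping the offset this way discards time zone information, so whether to convert to UTC first is a decision that depends on how the output will be used.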
import csv
import pandas as pd
from datetime import datetime,time,date
from pandas.io.data import DataReader
fd = pd.read_csv('c:\\path\\to\\file.csv')
fd.columns = ['Date','Time']
datex = fd.Date
timex = fd.Time
timestr = datetime.strptime ( str(datex+" "+timex) , "%m/%d/%Y %H:%M")
So, what I'm trying to do is pass columns Date and Time to datetime. There are two columns, date and time containing, obviously, the date and time. But when I try the above method, I receive this error:
\n35760 08/07/2015 04:56\n35761 08/07/2015 04:57\n35762 08/07/2015 04:58\n35763 08/07/2015 04:59\ndtype: object' does not match format '%m/%d/%Y %H:%M'
So, how do I either strip or remove \nXXXXX from datex and timex? Or otherwise match the format?
# Concatenate the Date and Time columns element-wise into one string column
# (use a name other than "datetime" so the imported class is not shadowed)
combined = datex + " " + timex
# Remove any stray newline/index fragments and surrounding whitespace
combined = combined.str.replace(r'\n\d+', '', regex=True).str.strip()
# The strings are now ready to be converted to datetimes
pd.to_datetime(combined, format='%m/%d/%Y %H:%M')
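Use the parse_dates parameter of pd.read_csv :) Note that parse_dates=True only tries to parse the index; pass a list of column names to parse specific columns.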
pd.read_csv('c:\\path\\to\\file.csv', parse_dates=True)
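The error in the question comes from calling str() on whole columns, which yields the printed representation of the Series (index numbers and newlines included) instead of one value. A standard-library sketch of the row-by-row alternative, where the inline csv_text and its header names are assumptions standing in for the real file:

```python
import csv
import io
from datetime import datetime

# Stand-in for the CSV file; the header names are assumed for illustration
csv_text = "Date,Time\n08/07/2015,04:56\n08/07/2015,04:57\n"

timestamps = []
for row in csv.DictReader(io.StringIO(csv_text)):
    # Join the two fields of one row, then parse that single string
    combined = f"{row['Date']} {row['Time']}"
    timestamps.append(datetime.strptime(combined, "%m/%d/%Y %H:%M"))

print(timestamps[0])  # 2015-08-07 04:56:00
```

On large files the vectorized pd.to_datetime approach is likely faster; this version mainly shows that strptime expects one string per call.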