I am using Python to read an Excel file.
The Excel file contains a 'Date' column.
My question is how to check in Python whether a date in the Excel sheet falls in the last 3 months of this year.
Excel sheet:
Date
2018-01-20
2018-10-01
2018-10-01
2018-11-01
2018-11-17
and my code is something like:
for i in content:
    valuerContent = content[(content['valuer'] == 'ahmad') & (content['Date'] == ??)]
What should I write instead of '??' to get the needed results?
Assuming content['Date'] is a column of strings in YYYY-MM-DD format:
valuerContent = content[(content['valuer'] == 'ahmad') & (content['Date'].str.split('-').str[1].astype(int) >= 10)]
Only been coding in Python for 4 months, but from the looks of it you could use the datetime module. Here is an example:
import datetime as dt
check_date = dt.date(2018, 11, 1)
current_date = dt.date(2018, 12, 31)  # assuming you want to check the year 2018
if check_date.year == current_date.year and (current_date.month - check_date.month) < 3:
    pass  # insert code
datetime lets you access the year, month and day as attributes: .year, .month and .day.
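The same check can also be done on the whole column at once. This is only a minimal sketch: the sample rows are made up to resemble the sheet in the question, and it assumes "this year" means 2018.

```python
import pandas as pd

# Hypothetical sample data shaped like the sheet in the question
content = pd.DataFrame({
    "valuer": ["ahmad", "ahmad", "sara", "ahmad", "ahmad"],
    "Date": ["2018-01-20", "2018-10-01", "2018-10-01", "2018-11-01", "2018-11-17"],
})

# Parse the strings once so year and month are available through .dt
content["Date"] = pd.to_datetime(content["Date"])

# Assumption: "this year" is 2018; pd.Timestamp.today().year gives the real current year
year = 2018
in_last_quarter = (content["Date"].dt.year == year) & (content["Date"].dt.month >= 10)

valuerContent = content[(content["valuer"] == "ahmad") & in_last_quarter]
print(valuerContent)
```

Note that the conditions are combined with & rather than and, because they are element-wise boolean Series.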
I am trying to compare a date from a text document to the current date to see whether the date in the document has already passed or not.
The code looks something like
'''
from datetime import date
tasks_file = open("tasks.txt", "r", encoding='utf-8')
lines = tasks_file.readlines()
overdue_tasks = 0
for line in lines:
    due_date = line.split(",")[4]
    if due_date.strip() < date.today().strftime("%d %b %Y"):
        overdue_tasks += 1
tasks_file.close()
print(overdue_tasks)
'''
In the tasks.txt file, the date is written as 10 Oct 2022 and the current date is 17 Sep 2022.
The issue is that the two dates are compared as strings, so only the 10 is compared to the 17, and the code concludes that the current date is greater than the due_date.
Is there a way for me to fix this without reformatting the way the date is written in the file?
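One possible fix is to parse the text into a date object before comparing, so the file format can stay as it is. A minimal sketch, assuming the due date is always the fifth comma-separated field and is written with English month abbreviations; the helper name count_overdue and the sample lines are made up for illustration:

```python
from datetime import date, datetime

def count_overdue(lines, today):
    """Count lines whose fifth comma-separated field is a date before `today`."""
    overdue = 0
    for line in lines:
        due_text = line.split(",")[4].strip()
        # "%d %b %Y" matches text such as "10 Oct 2022"
        due_date = datetime.strptime(due_text, "%d %b %Y").date()
        if due_date < today:
            overdue += 1
    return overdue

# Made-up lines; only the position and format of the date follow the question
sample_lines = [
    "admin, Task A, desc, 01 Sep 2022, 10 Oct 2022, No\n",
    "admin, Task B, desc, 01 Sep 2022, 05 Sep 2022, No\n",
]

overdue_tasks = count_overdue(sample_lines, today=date(2022, 9, 17))
print(overdue_tasks)
```

With the real file you would pass the result of readlines() and date.today() instead of the sample values.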
I have a csv with a date column with dates listed as MM/DD/YY, but I want to change the years from 00, 02, 03 to 1900, 1902, 1903 so that the dates are instead listed as MM/DD/YYYY.
This is what works for me:
df2['Date'] = df2['Date'].str.replace(r'00', '1900')
but I'd have to do this for every year up until 68 (aka repeat this 68 times). I'm not sure how to create a loop to do the code above for every year in that range. I tried this:
ogyear=00
newyear=1900
while ogyear <= 68:
    df2['date']=df2['Date'].str.replace(r'ogyear','newyear')
    ogyear += 1
    newyear += 1
but this returns an empty data set. Is there another way to do this?
I can't use datetime because it assumes that 02 refers to 2002 instead of 1902, and when I try to edit that as a date I get an error message from Python saying that dates are immutable and that they must be changed in the original data set. For this reason I need to keep the dates as strings. I also attached the csv here in case that's helpful.
I would do it like this:
import pandas as pd
# create a data frame
d = pd.DataFrame({'date': ['20/01/00','20/01/20','20/01/50']})
# create year column
d['year'] = d['date'].str.split('/').str[2].astype(int) + 1900
# add new year into old date by replacing old year
# (regex=True is needed in pandas 2.0+, where str.replace is literal by default)
d['new_data'] = d['date'].str.replace(r'[0-9]{2}$', '', regex=True) + d['year'].astype(str)
       date  year    new_data
0  20/01/00  1900  20/01/1900
1  20/01/20  1920  20/01/1920
2  20/01/50  1950  20/01/1950
I'd do it the following way:
import pandas as pd
from datetime import datetime
# create a data frame with dates in format month/day/shortened year
d = pd.DataFrame({'dates': ['2/01/10','5/01/20','6/01/30']})
# loop through the dates in the dates column and add them
# to a list in the desired form using the datetime library
new_dates = []
for date in d['dates']:
    dat = datetime.strptime(date, '%m/%d/%y').date()
    # note: %y maps 00-68 to 2000-2068, so these become 2010, 2020 and 2030;
    # for 1900s data the century still has to be corrected
    dat = dat.strftime("%m/%d/%Y")
    new_dates.append(dat)
# then substitute the dataframe dates column with the new ordered list
d['dates'] = pd.Series(new_dates)
d
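If the dates have to stay as strings, the 68 separate replacements can be collapsed into a single regular expression that prefixes the two-digit year. This is just a sketch: expand_year is a hypothetical helper name, and it assumes every year in the column belongs to the 1900s.

```python
import re

def expand_year(date_text, century="19"):
    """Turn MM/DD/YY into MM/DD/YYYY by prefixing the two-digit year."""
    # Only a two-digit year directly after the last slash is rewritten,
    # so values that already have a four-digit year are left alone
    return re.sub(r"/(\d{2})$", "/" + century + r"\1", date_text)

samples = ["01/15/00", "12/31/02", "07/04/68"]
expanded = [expand_year(s) for s in samples]
print(expanded)
```

On a DataFrame column this could be applied with something like df2['Date'] = df2['Date'].map(expand_year), assuming the column holds plain strings with no missing values.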
I have a data sheet in which issue_d is a date column with values stored in the format 11-Dec. Clicking any cell of the column shows the date as 12/11/2018.
But when I read the csv file, issue_d is imported as 11-Dec; the year is not imported.
How do I get the issue_d column in the format d/m/y?
Code I tried:
import pandas
data=pandas.read_csv('Project_data.csv')
print(data)
checking issue_d column: data['issue_d']
result :
0 11-Dec
1 11-Dec
2 11-Dec
expected:
0 11-Dec-2018
1 11-Dec-2018
2 11-Dec-201
You can append the year to the column and then use to_datetime:
df['issue_d'] = pd.to_datetime(df['issue_d'] + '-2018', format='%d-%b-%Y')
print (df)
     issue_d
0 2018-12-11
1 2018-12-11
2 2018-12-11
A more 'controllable' way of getting the data is to first get the datetime from the data frame as normal, and then convert it with strftime:
dt = dt.strftime('%Y-%m-%d')
In this case, you'd put %d in front (for example '%d/%m/%Y') to get d/m/y. strftime is a useful technique because it gives you full control over the output format when converting a datetime value.
After you do this, you can splice out each individual month, day, and year, and then use
strftime("%B")
to get the string-name of the month (e.g. "February").
Good Luck!
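To make the format codes concrete, here is a small sketch using only the standard library. It assumes the missing year is known to be 2018, as in the question, and that month abbreviations are in English.

```python
from datetime import datetime

raw = "11-Dec"   # value as it comes out of the CSV
year = 2018      # assumption: the year is known from elsewhere

# Append the year, then parse day, abbreviated month and year
parsed = datetime.strptime(f"{raw}-{year}", "%d-%b-%Y")

d_m_y = parsed.strftime("%d/%m/%Y")   # day first, as asked in the question
month_name = parsed.strftime("%B")    # full month name
print(d_m_y, month_name)
```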
I tried to adapt the answer from the question below to my scenario, but failed:
Pandas - Python, deleting rows based on Date column
I have an output.csv file with the following columns:
Customer, Alertkey, Node, Alertgroup, FirstOccurrence,
TKT_Flag, X733SpecificProb, TKT_TicketNumber, TKT_Keyword
The file is updated from the database every 7 days, incrementally, with the last 7 days of data.
So ideally I have to drop the first 7 days of data from the file itself.
I wrote the code below, but I get the error "TypeError: string indices must be integers":
import pandas as pd
from dateutil.relativedelta import relativedelta
from dateutil import parser
df=pd.read_csv('output.csv', usecols=['FirstOccurrence'],parse_dates=[0])
df=df['FirstOccurrence'].iloc[0]
dt = parser.parse(df)
SevenDays = dt + relativedelta( days = +7 )
df=df[(parser.parse(df['FirstOccurrence']) < SevenDays)].drop(df.columns)
There will be millions of lines. I am copying the first few lines, from 1st Jan 2016, but the file will run from 1st Jan 2016 to the present date. Every week new data is appended and the records of the first 7 days should be deleted, i.e. the first time it should delete records from 1st Jan to 6th Jan and so on.
Customer,Alertkey,Node,Alertgroup,FirstOccurrence,TKT_Flag,X733SpecificProb,TKT_TicketNumber,TKT_Keyword
Cust1,Cust1_11_53_Services_Warning,Node_Cust1,ITM_K53_SERVICEMON,2016-01-01 00:12:59,1005,TOLPUKC_OS:25223174,INC000014799786,CGMIDDLEWARE_MEDIUM_CONNECTDIRECT
Cust1,Cust1_11_53_Services_Warning,Node1_Cust1,ITM_K53_SERVICEMON,2016-01-01 00:12:59,1005,TOLPUKC_OS:25223175,INC000014799785,CGMIDDLEWARE_MEDIUM_CONNECTDIRECT
Cust2,Cust2_21_NT_System_CPU_Critical,Cust2_Node8,ITM_NT_System,2016-01-01 00:15:48,101,PARPFRC_OS:21192843,INC000000628410,WINDOWS_MEDIUM_DEFPRODUCTSILVER
Cust3,Cust3_10352_LZ_TDW_DISK_Critica,Cust3_Node22,ITM_Linux_Disk,2016-01-01 00:17:05,200,TOLPUKC_OS:25223370,INC000001412280,CGMOM_HIGH_DEFPRODUCT
Cust6,Cust6_11_53_Services_Warning,Cust6_Node700,ITM_K53_SERVICEMON,2016-01-01 00:22:36,22,TOLPUKC_OS:25223601,INC000002250120,CGIOWINTELIMOC_MEDIUM_DEFPRODUCT
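Replace this:
df=df[(parser.parse(df['FirstOccurrence']) < SevenDays)].drop(df.columns)
with:
df.drop(df[pd.to_datetime(df['FirstOccurrence']) < SevenDays].index, inplace=True)
drop(..., inplace=True) returns None, so don't assign its result back to df. Also note that the earlier line df=df['FirstOccurrence'].iloc[0] overwrites the DataFrame with a single value, which is what triggers the TypeError; store that value under a different name.
Try this, I hope it helps.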
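Put together, the whole step might look like the sketch below. The rows are made up and only the FirstOccurrence format follows the sample data; it also assumes "first 7 days" means seven calendar days counted from the earliest day in the file.

```python
import io
import pandas as pd

# Made-up rows with the same FirstOccurrence format as the sample data
csv_text = """Customer,FirstOccurrence
Cust1,2016-01-01 00:12:59
Cust2,2016-01-06 23:59:00
Cust3,2016-01-08 00:00:00
Cust6,2016-01-15 10:30:00
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["FirstOccurrence"])

# Cutoff = midnight of the earliest day in the file plus 7 days
first_day = df["FirstOccurrence"].min().normalize()
cutoff = first_day + pd.Timedelta(days=7)

# Keep only the rows at or after the cutoff
trimmed = df[df["FirstOccurrence"] >= cutoff].reset_index(drop=True)
print(trimmed)
```

Filtering with a boolean mask keeps the DataFrame in its own variable, which avoids the accidental overwrite that caused the TypeError.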
I've researched this question heavily for the past few days and I still cannot find suggestions to my problem.
Below is an example of my dataframe titled 'dfs'. There are around 80 columns, only 4 shown in the below example.
dfs is a large dataframe consisting of rows of data reported every 15 minutes for over 12 months (i.e. 2015-08-01 00:00:00 to 2016-09-30 23:45:00). The Datetime column is in the format datetime.
...
...
I want to export (or write) multiple monthly csv files, which are snippets of monthly data taken from the original large dataframe (dfs). For each month, I want files to be written that contain the raw data, the day data (6am-6pm) and the night data (6pm-6am). I also want the name of each monthly file to be generated automatically, so that it is called dfs_%Y%m, dfs_day_%Y%m or dfs_night_%Y%m depending on the data it contains.
At the moment I am writing out over 180 lines of code to export each csv file.
For example:
I create monthly raw, day and night files by grabbing the data between the datetimes listed below from the index Datetime column
dfs201508 = dfs.ix['2015-08-01 00:00:00':'2015-08-31 23:45:00']
dfs201508Day = dfsDay.ix['2015-08-01 00:00:00':'2015-08-31 23:45:00']
dfs201508Night = dfsNight.ix['2015-08-01 00:00:00':'2015-08-31 23:45:00']
Then I export these files to their respective outputpaths and give them a filename
dfs201508 = dfs201508.to_csv(outputpath+"dfs201508.csv")
dfs201508Day = dfs201508Day.to_csv(outputpathDay+"dfs_day_201508.csv")
dfs201508Night = dfs201508Night.to_csv(outputpathNight+"dfs_night_201508.csv")
What I want to write is something like this
dfs_%Y%m = dfs.ix["%Y%m"]
dfs_day_%Y%m = dfs.ix["%Y%m(between 6am-6pm)"]
dfs_night_%Y%m = dfs.ix["%Y%m(between 6pm-6am)"]
dfs_%Y%m = dfs_%Y%m.to_csv(outputpath +"dfs_%Y%m.csv")
dfs_day_%Y%m = dfs_day_%Y%m.to_csv(outputpath%day +"dfs_day_%Y%m.csv")
dfs_night_%Y%m = dfs_night_%Y%m.to_csv(outputpath%night +"dfs_night_%Y%m.csv")
Any suggestions on the code to automate this process would be greatly appreciated.
Here are some links to pages I researched:
https://www.youtube.com/watch?v=aeZKJGEfD7U
Writing multiple Python dictionaries to csv file
Open a file name +date as csv in Python
You can use a for loop to iterate over the years and months contained within dfs. I created a dummy dataframe called DF in the below example, which contains just three sample columns:
dates Egen1_kwh Egen2_kwh
2016-01-01 00:00:00 15895880 15877364
2016-01-01 00:15:00 15895880 15877364
2016-01-01 00:30:00 15895880 15877364
2016-01-01 00:45:00 15895880 15877364
2016-01-01 01:00:00 15895880 15877364
The below code filters the main dataframe DF into smaller dataframes (NIGHT and DAY) for each month within each year and saves them as .csv files with a name corresponding to their date (e.g. 2016_1_DAY and 2016_1_NIGHT for Jan 2016 Day and Jan 2016 Night).
import pandas as pd
import datetime
from dateutil.relativedelta import relativedelta
from random import randint
# I defined a sample dataframe with dummy data
start = datetime.datetime(2016,1,1,0,0)
dates = [start + relativedelta(minutes=15*i) for i in range(0,10000)]
Egen1_kwh = randint(15860938,15898938)
Egen2_kwh = randint(15860938,15898938)
DF = pd.DataFrame({
    'dates': dates,
    'Egen1_kwh': Egen1_kwh,
    'Egen2_kwh': Egen2_kwh,
})
# define when day starts and ends (MUST USE 24 CLOCK)
day = {
    'start': datetime.time(6,0), # start at 6am (6:00)
    'end': datetime.time(18,0) # ends at 6pm (18:00)
}
# capture years that appear in dataframe
min_year = DF.dates.min().year
max_year = DF.dates.max().year
yearRange = range(min_year, max_year+1)
# time of day for every row, computed once
times = DF.dates.dt.time
# iterate over each year and each month within each year
for year in yearRange:
    for month in range(1,13):
        # first instant of this month and of the following month
        month_start = datetime.datetime(year, month, 1)
        month_end = month_start + relativedelta(months=1)
        # using < month_end keeps the whole last day of the month
        in_month = (DF.dates >= month_start) & (DF.dates < month_end)
        # filter to show NIGHT and DAY dataframe for given month within given year
        NIGHT = DF[in_month & ((times <= day['start']) | (times >= day['end']))]
        DAY = DF[in_month & (times > day['start']) & (times < day['end'])]
        # save to .csv with date and time in file name
        # specify the save path of your choice
        path_night = 'C:\\Users\\nickb\\Desktop\\stackoverflow\\{0}_{1}_NIGHT.csv'.format(year, month)
        path_day = 'C:\\Users\\nickb\\Desktop\\stackoverflow\\{0}_{1}_DAY.csv'.format(year, month)
        # some of the above NIGHT / DAY filtering will return no rows.
        # Check for this, and only save if the dataframe contains rows
        if NIGHT.shape[0] > 0:
            NIGHT.to_csv(path_night, index=False)
        if DAY.shape[0] > 0:
            DAY.to_csv(path_day, index=False)
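If the datetimes are already the index, as in the question, a groupby over calendar months can produce the raw, day and night files with the dfs_%Y%m naming. This is a rough sketch rather than a drop-in replacement: the dummy frame, the "value" column and the temporary output folder are assumptions, it writes everything to one folder, and it treats 6:00 as day and 18:00 as night.

```python
import os
import tempfile
import pandas as pd

# Dummy 15-minute data with a DatetimeIndex, standing in for dfs
index = pd.date_range("2015-08-01 00:00", "2015-09-30 23:45", freq="15min")
dfs = pd.DataFrame({"value": range(len(index))}, index=index)
dfs.index.name = "Datetime"

outputpath = tempfile.mkdtemp()  # assumption: replace with your own folder
written = []

# Group rows by calendar month; each key is a monthly Period such as 2015-08
for period, month_df in dfs.groupby(dfs.index.to_period("M")):
    stamp = period.strftime("%Y%m")
    # Day is 06:00 up to (not including) 18:00; everything else is night
    is_day = (month_df.index.hour >= 6) & (month_df.index.hour < 18)
    parts = [
        ("dfs_", month_df),
        ("dfs_day_", month_df[is_day]),
        ("dfs_night_", month_df[~is_day]),
    ]
    for prefix, part in parts:
        name = f"{prefix}{stamp}.csv"
        part.to_csv(os.path.join(outputpath, name))
        written.append(name)

print(written)
```

Because the file names are built from the group key, adding more months of data needs no extra code.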