How to automate the generation of dates in this special situation? - python

I have a result file generated from software and it looks like this:
,0,1,2,3,4,5,6,7,8,9
0,Month,Decade,Stage,Kc,ETc,ETc,Eff,rain,Irr.,Req.
1,coeff,mm/day,mm/dec,mm/dec,mm/dec,,,,,
2,Sep,1,Init,0.50,1.85,18.5,21.8,0.0,,
3,Sep,2,Init,0.50,1.77,17.7,30.3,0.0,,
4,Sep,3,Init,0.50,1.72,17.2,37.1,0.0,,
5,Oct,1,Deve,0.61,2.05,20.5,49.5,0.0,,
6,Oct,2,Deve,0.82,2.66,26.6,59.3,0.0,,
7,Oct,3,Deve,1.03,3.24,35.6,43.0,0.0,,
8,Nov,1,Mid,1.20,3.63,36.3,20.9,15.4,,
9,Nov,2,Mid,1.21,3.53,35.3,6.0,29.2,,
10,Nov,3,Mid,1.21,3.70,37.0,4.0,33.0,,
11,Dec,1,Mid,1.21,3.87,38.7,0.1,38.6,,
12,Dec,2,Late,1.18,3.92,39.2,0.0,39.2,,
13,Dec,3,Late,1.00,3.58,39.4,0.0,39.4,,
14,Jan,1,Late,0.88,3.36,10.1,0.0,10.1,,
15,,,,,,,,,,
16,372.1,272.2,204.9,,,,,,,
As one can observe, the months vary from September to January. Each month is divided into three divisions or decades. To be exact, the months vary from September 2017 to 1st decade of January 2018. Now, I have to generate dates with the starting date of each decade in a month in this format: 01-Sep-2017. So I will have 01-Sep-2017, 11-Sep-2017, 21-Sep-2017, ..., 01-Jan-2018. How to generate these dates? I will share the code that I have written until now.
years = [2017, 2018, 2019]
temp = pd.read_csv(folder_link) # Reading the particular result file
Month = temp['0'][2:] # First column = Month (Jul, Aug, ..)
Decade = temp['1'][2:]
for year in years:
    for j in range(2,len(Decade)): # First two lines are headers, so removed them
        if(int(Decade[j]) == 1): # First decade = 1-10 days of month
            Date = "1" + "-" + Month[j] + "-" + str(year) # Writing the date as 1-Jan-2017
            Dates.append(Date)
        if(int(Decade[j]) == 2): # Second decade = 11-20 days of month
            Date = "11" + "-" + Month[j] + "-" + str(year)
            Dates.append(Date)
        if(int(Decade[j]) == 3): # Third decade = 21-28 or 21-30 or 21-31 days of month
            Date = "21" + "-" + Month[j] + "-" + str(year)
            Dates.append(Date)
The problem with this code is I will get 01-Sep-2017, 11-Sep-2017, 21-Sep-2017, ..., 01-Jan-2017 (instead of 2018). I need a generalized solution that could work for all months, not just for January. I have some results ranging from Sep 2017 - Aug 2018. Any help?

First, set the header row and index column correctly when reading the CSV file. Then you can use a formula to deduce the day from the decade.
Increment the year only when switching from December to January (you can extend the condition if there are cases where January and/or December are missing).
The code becomes much easier to read and understand once you apply these:
temp = pd.read_csv(folder_link, header=1, index_col=0)
Dates = []
year = 2017
for index, row in temp.iloc[1:].iterrows():
    month = row["Month"]
    if month == "Jan" and temp.at[index-1, "Month"] == "Dec":
        year += 1 # incrementing year if row is january while preceding row is december
    day = (int(row["Decade"]) - 1) * 10 + 1
    Dates.append(f"{day}-{month}-{year}")
print(Dates)
Output:
['1-Sep-2017', '11-Sep-2017', '21-Sep-2017', '1-Oct-2017', '11-Oct-2017', '21-Oct-2017', '1-Nov-2017', '11-Nov-2017', '21-Nov-2017', '1-Dec-2017', '11-Dec-2017', '21-Dec-2017', '1-Jan-2018']
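The question asks for zero-padded days (01-Sep-2017) and for a rule that also works for other spans, such as Sep 2017 to Aug 2018. The following is a minimal sketch of one way to do that, assuming the months are three-letter English abbreviations and the rows are in chronological order. `decade_start_dates` and `MONTHS` are hypothetical names introduced here for illustration; the year is advanced whenever the month number goes down from one row to the next.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def decade_start_dates(months, decades, start_year):
    """Return 'DD-Mon-YYYY' strings for the first day of each decade."""
    dates = []
    year = start_year
    prev_month_no = None
    for month, decade in zip(months, decades):
        month_no = MONTHS.index(month) + 1
        # A smaller month number than the previous row means a new year began.
        if prev_month_no is not None and month_no < prev_month_no:
            year += 1
        prev_month_no = month_no
        day = (int(decade) - 1) * 10 + 1  # decade 1, 2, 3 -> day 1, 11, 21
        dates.append(f"{day:02d}-{month}-{year}")
    return dates

print(decade_start_dates(["Dec", "Dec", "Jan", "Jan"], [2, 3, 1, 2], 2017))
```

Because the rollover test compares month numbers rather than looking for a Dec/Jan pair, it should also cope with a file in which December or January is missing.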

If you want to stay with the iteration approach (there may be a more efficient one using pandas functions), here is a simple way to do it:
dates = []
year = 2017
month_list = ['Jan', 'Sep', 'Oct', 'Nov', 'Dec']
temp = pd.read_csv("data.csv") # Reading the particular result file
for index, row in temp.iterrows():
    # First two lines are headers, so skip them. Same for last two lines.
    if index > 1 and row[1] in month_list:
        if row[1] == 'Jan':
            year += 1
        if(int(row[2]) == 1): # First decade = 1-10 days of month
            date = "1" + "-" + row[1] + "-" + str(year) # Writing the date as 1-Jan-2017
            dates.append(date)
        elif(int(row[2]) == 2): # Second decade = 11-20 days of month
            date = "11" + "-" + row[1] + "-" + str(year)
            dates.append(date)
        elif(int(row[2]) == 3): # Third decade = 21-28 or 21-30 or 21-31 days of month
            date = "21" + "-" + row[1] + "-" + str(year)
            dates.append(date)
        else:
            print("Unrecognized value for decade {}".format(row[2]))
            pass
print(dates)
Explanation:
use iterrows to iterate over your dataframe rows
then, skip the headers and check that you are parsing actual data by looking at the month value (using a predefined list)
finally, increment the year when the month value is Jan
*Note: this solution assumes that your data is a time series with rows ordered in time. It also adds one to the year for every row whose month is Jan, so the year is only right when January appears in a single row, as in the sample data.
P.S.: by convention, capitalized names are for classes in Python, not variables.

Related

Regex with date as String in Azure path

I have many folders (in Microsoft Azure Data Lake), each named with a date in the form "ddmmyyyy". So far I have used a pattern like this to read all files from all folders of a given month and year:
path_data="/mnt/data/[0-9]*032022/data_[0-9]*.json" # all folders of all days of month 03 of 2022
result=spark.read.json(path_data)
My problem now is to read all folders whose date falls within the year before a given date.
For example, for the date 14-03-2022, I need a regex that reads all files of all folders between 14-03-2021 and 14-03-2022.
I tried extracting the month and year into string variables and using them in a regex that respects the conditions (for the example shown, the month should be greater than 03 when the year is 2021 and less than 03 when the year is 2022). I tried something similar to the following (with the variables replaced by 03, 2021 and 2022):
date_regex="([0-9]{2}[03-12]2021)|([0-9]{2}[01-03]2022)"
Is there any hint on how I can perform such a task?
Thanks in advance
If I understand your question correctly, to find dates between ??-03-2021 and ??-03-2022 in the file name field you can use the following regex:
date_regex="([0-9]{2}-03-2021)|([0-9]{2}-03-2022)"
If you need to customize it further, you can start from the pattern at the link below and adapt it:
https://regex101.com/r/AgqFfH/1
update: extract any folder named with a date between 14032021 and 14032022
solution: First we extract the date in ddmmyyyy format with a regex, then we compare it with the two limits, assuming the format is correct and such a date is found in the name.
import re, datetime
date_regex = r"(0[1-9]|[12][0-9]|3[01])(0[1-9]|1[012])([0-9]{4})"
m = re.search(date_regex, folder_name)  # folder_name: one folder name, e.g. "15062021"
if m:
    day, month, year = (int(g) for g in m.groups())
    if datetime.date(2021, 3, 14) < datetime.date(year, month, day) < datetime.date(2022, 3, 14):
        ..do any operation..
The above code is only a rough sketch to give an overview of the method.
I hope this solution helps.
It certainly isn't pretty, but here you go:
#input
day = "14"
month = "03"
startYear = "2021"
#day construction
sameTensAfter = '(' + day[0] + '[' + day[1] + '-9])'
theDaysAfter = '([' + chr(ord(day[0])+1) + '-9][0-9])'
sameTensBefore = '(' + day[0] + '[0-' + day[1] + '])'
theDaysBefore = ''
if day[0] != '0':
    theDaysBefore = '([0-' + chr(ord(day[0])-1) + '][0-9])'
#build the part for the dates with the same month as query
afterDayPart = '%s|%s' %(sameTensAfter, theDaysAfter)
beforeDayPart = '%s|%s' %(sameTensBefore, theDaysBefore)
theMonthAfter = str(int(month) + 1).zfill(2)
afterMonthPart = theMonthAfter[0] + '([' + theMonthAfter[1] + '-9])'
if theMonthAfter[0] == '0':
    afterMonthPart += '|(1[0-2])'
theMonthBefore = str(int(month) - 1).zfill(2)
beforeMonthPart = theMonthBefore[0] + '([0-' + theMonthBefore[1] + '])'
if theMonthBefore[0] == '1':
    beforeMonthPart = '(0[0-9])|' + beforeMonthPart
#4 kinds of matches:
startDateRange = '((%s)(%s)(%s))' %(afterDayPart, month, startYear)
anyDayAfterMonth = '((%s)(%s)(%s))' %('[0-9]{2}', afterMonthPart, startYear)
endDateRange = '((%s)(%s)(%s))' %(beforeDayPart, month, int(startYear)+1)
anyDayBeforeMonth = '((%s)(%s)(%s))' %('[0-9]{2}', beforeMonthPart, int(startYear)+1)
#print regex
date_regex = startDateRange + '|' + anyDayAfterMonth + '|' + endDateRange + '|' + anyDayBeforeMonth
print(date_regex)
#this prints:
#(((1[4-9])|([2-9][0-9]))(03)(2021))|(([0-9]{2})(0([4-9])|(1[0-2]))(2021))|(((1[0-4])|([0-0][0-9]))(03)(2022))|(([0-9]{2})(0([0-2]))(2022))
startDateRange: the month is the same and it's the starting year, this will take all the days including and after.
anyDayAfterMonth: the month is greater and it's the starting year, this will take any day.
endDateRange: the month is the same and it's the ending year, this will take all the days including and before.
anyDayBeforeMonth: the month is less than and it's the ending year, this will take any day.
Here's an example: https://regex101.com/r/i76s58/1
To compare the dates, use the datetime module, as in the example below.
Then you can keep only the folders that satisfy your condition.
# importing datetime module
import datetime
# date in yyyy/mm/dd format
d1 = datetime.datetime(2018, 5, 3)
d2 = datetime.datetime(2018, 6, 1)
# Comparing the dates will return
# either True or False
print("d1 is greater than d2 : ", d1 > d2)
print("d1 is less than d2 : ", d1 < d2)
print("d1 is not equal to d2 : ", d1 != d2)
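Applied to the folder names from the question, the comparison idea can be sketched as follows. This is only an illustration under stated assumptions: the folder names are already available as a list of strings (how they are listed from the data lake is not covered here), and `folders_in_range` is a hypothetical helper name. Each name is parsed as ddmmyyyy with `datetime.strptime`, and names that are not valid dates are skipped.

```python
from datetime import date, datetime

def folders_in_range(folder_names, start, end):
    """Keep names of the form ddmmyyyy whose date lies in [start, end]."""
    selected = []
    for name in folder_names:
        try:
            folder_date = datetime.strptime(name, "%d%m%Y").date()
        except ValueError:
            continue  # not a valid ddmmyyyy date, skip it
        if start <= folder_date <= end:
            selected.append(name)
    return selected

names = ["13032021", "14032021", "01012022", "15032022", "notadate"]
print(folders_in_range(names, date(2021, 3, 14), date(2022, 3, 14)))
```

Comparing real date objects avoids having to encode "greater than" in a regex, and the selected names can then be turned into paths.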

Calculate days between dates with unique days in months in python

from datetime import datetime
x = input("first date: ")
y = input("second date: ")
a = datetime.strptime(x, "%Y/%m/%d")
b = datetime.strptime(y, "%Y/%m/%d")
result = (a-b).days
print("days: ",result)
# my first date is = 2021/2/8
# my second date is = 2021/1/24
# output = days : 15
So as you can see, everything is fine in this code. But my teacher set me a challenge: he asked whether I can write code that works with unusual numbers of days in the months. For example, January has 31 days, but I want it to have 41 days, and so on.
What should I do now? (Please don't say "add 10 to the output": the user inputs can change and I need to change the number of days in every month, so that will not work.)
I am an amateur at coding, so a simple explanation would be better.
So I am looking for something like this:
# if January have 41 days instead of 31 days
# my first date is = 2021/2/8
# my second date is = 2021/1/24
# output will be = days : 15 + 10 = 25
You can make a dictionary mapping each month to its custom number of days (for example, '1': 41 means the first month has 41 days). Then add the day of the first date to the number of days remaining in the second date's month (that month's total days minus the day of the second date). This assumes the first date is later than the second and falls in the month right after it.
months = {
    '1': 41,
    '2': 38,
    '3': 24,
    ...
    ...
    '12': 45,
}
x = input("first date: ")
y = input("second date: ")
a = list(x.split('/'))
b = list(y.split('/'))
# 2021/2/8
# ['2021', '2', '8']
result = int(a[2]) + (months[b[1]] - int(b[2]))
print(result)
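The same idea can be extended to dates that are more than one month apart. The sketch below is one possible approach, not the only one: it turns each date into a running day count under the custom calendar and subtracts the two counts. It assumes every year uses the same month lengths (no leap years), and `days_between` and `month_lengths` are hypothetical names chosen for this example.

```python
def days_between(first, second, month_lengths):
    """Days from `second` to `first`, both 'YYYY/M/D', with custom month lengths."""
    def to_ordinal(text):
        year, month, day = (int(part) for part in text.split("/"))
        days_per_year = sum(month_lengths.values())
        # Days in all the months that come before this one in the same year.
        days_before_month = sum(month_lengths[m] for m in range(1, month))
        return year * days_per_year + days_before_month + day
    return to_ordinal(first) - to_ordinal(second)

# Hypothetical calendar: the usual non-leap lengths, except January has 41 days.
month_lengths = {1: 41, 2: 28, 3: 31, 4: 30, 5: 31, 6: 30,
                 7: 31, 8: 31, 9: 30, 10: 31, 11: 30, 12: 31}
print(days_between("2021/2/8", "2021/1/24", month_lengths))  # 25
```

With the question's example the result is 25, that is, the usual 15 days plus the 10 extra days in January.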
I think you're close to the answer.
You don't want to 'sum the output with 10', but why not?
The answer to the problem is 'result + extra_days' (so the output plus an offset).
So instead of the '10' you want the offset; the offset is maxDayOfMonth +/- requestedDate.
Here is a related post which gives a function to get the last day of any month:
import datetime
def last_day_of_month(any_day):
    # this will never fail
    # get close to the end of the month for any day, and add 4 days 'over'
    next_month = any_day.replace(day=28) + datetime.timedelta(days=4)
    # subtract the number of 'overage' days to get the last day of the current month, i.e. the day before the first of next month
    return next_month - datetime.timedelta(days=next_month.day)
It always helps to find a scenario for your problem, for example:
Your teacher discovered an alternate universe where the days in a month are variable, and he wants to use the datetime library to work. :)

How to make months not exceed 12

I am working on a small project where the code approximately calculates the date on which the user will be one billion seconds old. I am done, but I have a problem. If a user enters "5" or higher as the month, the resulting month will exceed 12. The same goes for the day: it will go over 31 if the user enters "24" or higher. How do I make sure the result does not go above "12" for months and "31" for days? The only problem is the month and the day; the year is working fine. Thanks!
When running the code, a sample you can use is, "10" as month, "24" as day and "2020" for year.
Code:
month = int(input("What Month Were You Born In: "))
day = int(input("What Day Was It: "))
year = int(input("What Year Was It: "))
sum1 = day + 7
sum2 = month + 8
sum3 = year + 32
print("You will be a billion seconds old approximately around, " + str(sum1) + "/" + str(sum2) + "/" + str(sum3))
While you could do this using if statements, it's cleaner to use the datetime built-in library, which natively handles dates.
import datetime # imports it so it can be used
month = int(input("What Month Were You Born In: ")) #your code
day = int(input("What Day Was It: "))
year = int(input("What Year Was It: "))
born = datetime.datetime(year, month, day)#creates a datetime object which contains a date. Time defaults to midnight.
output = born + datetime.timedelta(seconds = 1_000_000_000)#defines and applies a difference of 1 billion seconds to the date
print(output.strftime("%m/%d/%Y")) #prints month/day/year
If you do want to do it without datetime, you can use this:
month = int(input("What Month Were You Born In: "))
day = int(input("What Day Was It: "))
year = int(input("What Year Was It: "))
sum1 = day + 7
sum2 = month + 8
sum3 = year + 32
if sum1 > 31: # checks if day is too high, and if it is, corrects that and adds 1 to the month to account for that.
    sum1 -= 31
    sum2 += 1
if sum2 > 12: #does the same thing for month
    sum2 -= 12
    sum3 += 1
print("You will be a billion seconds old approximately around, " + str(sum1) + "/" + str(sum2) + "/" + str(sum3))
Note that this option is less precise than the datetime option, since it doesn't take into account leap years or the different lengths of months.
You should use the datetime module to correctly handle dates which will handle issues like this.
timedelta will allow you to add, subtract etc on datetime objects.
from datetime import datetime, timedelta
month = 6
day = 24
year = 1990
born = datetime(month=month, year=year, day=day)
billion = born + timedelta(seconds=1000000000)
print(billion)
# 2022-03-02 01:46:40
print(billion.strftime('%d/%m/%Y'))
# 02/03/2022

Calculate Last Friday of Month in Pandas

I've written this function to get the last Thursday of the month
def last_thurs_date(date):
    month=date.dt.month
    year=date.dt.year
    cal = calendar.monthcalendar(year, month)
    last_thurs_date = cal[4][4]
    if month < 10:
        thurday_date = str(year)+'-0'+ str(month)+'-' + str(last_thurs_date)
    else:
        thurday_date = str(year) + '-' + str(month) + '-' + str(last_thurs_date)
    return thurday_date
But its not working with the lambda function.
datelist['Date'].map(lambda x: last_thurs_date(x))
Where datelist is
datelist = pd.DataFrame(pd.date_range(start = pd.to_datetime('01-01-2014',format='%d-%m-%Y')
, end = pd.to_datetime('06-03-2019',format='%d-%m-%Y'),freq='D').tolist()).rename(columns={0:'Date'})
datelist['Date']=pd.to_datetime(datelist['Date'])
Jpp already added the solution, but just to add a slightly more readable formatted string - see this awesome website.
import calendar
def last_thurs_date(date):
    year, month = date.year, date.month
    cal = calendar.monthcalendar(year, month)
    # cal[4][4] is column 4 (Friday, since weeks start on Monday = 0) of the fifth week row
    # when it is 0 (the day falls outside the month), take the fourth row instead (February exception)
    last_thurs_date = cal[4][4] if cal[4][4] > 0 else cal[3][4]
    return f'{year}-{month:02d}-{last_thurs_date}'
Also added a bit of logic - e.g. you got 2019-02-0 as February doesn't have 4 full weeks.
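Fixed row indexes can still trip over some months. `calendar.monthcalendar` returns between four and six week rows, each starting on Monday by default, so column 3 is Thursday and column 4 is Friday, and days outside the month are 0. A minimal sketch that does not depend on the number of rows is to take the largest value in the chosen column. `last_weekday_of_month` is a hypothetical helper name used only for this illustration.

```python
import calendar

def last_weekday_of_month(year, month, weekday=calendar.FRIDAY):
    """Return the day number of the last given weekday in the month."""
    weeks = calendar.monthcalendar(year, month)
    # Days outside the month are 0, so the maximum is the last real occurrence.
    return max(week[weekday] for week in weeks)

print(last_weekday_of_month(2019, 2))                     # 22
print(last_weekday_of_month(2021, 2, calendar.THURSDAY))  # 25
```

February 2021 starts on a Monday and has only four week rows, so indexing the fifth row there would raise an IndexError, while the maximum still works.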
Scalar datetime objects don't have a dt accessor, series do: see pd.Series.dt. If you remove this, your function works fine. The key is understanding that pd.Series.apply passes scalars to your custom function via a loop, not an entire series.
def last_thurs_date(date):
    month = date.month
    year = date.year
    cal = calendar.monthcalendar(year, month)
    last_thurs_date = cal[4][4]
    if month < 10:
        thurday_date = str(year)+'-0'+ str(month)+'-' + str(last_thurs_date)
    else:
        thurday_date = str(year) + '-' + str(month) + '-' + str(last_thurs_date)
    return thurday_date
You can rewrite your logic more succinctly via f-strings (Python 3.6+) and a ternary statement:
def last_thurs_date(date):
    month = date.month
    year = date.year
    last_thurs_date = calendar.monthcalendar(year, month)[4][4]
    return f'{year}{"-0" if month < 10 else "-"}{month}-{last_thurs_date}'
I know that a lot of time has passed since the date of this post, but I think it would be worth adding another option if someone came across this thread
Even though I use pandas every day at work, in this case my suggestion would be to just use the dateutil library. The solution is a simple one-liner, without unnecessary combinations.
from dateutil.rrule import rrule, MONTHLY, FR, SA
from datetime import datetime as dt
import pandas as pd
# monthly options expiration dates (third Friday of each month) calculated for 2022
monthly_options = list(rrule(MONTHLY, count=12, byweekday=FR, bysetpos=3, dtstart=dt(2022,1,1)))
# last Saturday of each month
last_saturdays = list(rrule(MONTHLY, count=12, byweekday=SA, bysetpos=-1, dtstart=dt(2022,1,1)))
and then of course:
pd.DataFrame({'LAST_ST':last_saturdays}) #or whatever you need
This answers the question "Calculate Last Friday of Month in Pandas".
It can be adapted to another day of the week by changing the frequency, here freq='W-FRI'.
I think the easiest way is to create a pandas.DataFrame using pandas.date_range and specifying freq='W-FRI'.
W-FRI is Weekly Fridays
pd.date_range(df.Date.min(), df.Date.max(), freq='W-FRI')
Creates all the Fridays in the date range between the min and max of the dates in df
Use a .groupby on year and month, and select .last(), to get the last Friday of every month for every year in the date range.
Because this method finds all the Fridays for every month in the range and then chooses .last() for each month, there's not an issue with trying to figure out which week of the month has the last Friday.
With this, use pandas: Boolean Indexing to find values in the Date column of the dataframe that are in last_fridays_in_daterange.
Use the .isin method to determine containment.
pandas: DateOffset objects
import pandas as pd
# test data: given a dataframe with a datetime column
df = pd.DataFrame({'Date': pd.date_range(start=pd.to_datetime('2014-01-01'), end=pd.to_datetime('2020-08-31'), freq='D')})
# create a dataframe with all Fridays in the date range between the min and max of df.Date
fridays = pd.DataFrame({'datetime': pd.date_range(df.Date.min(), df.Date.max(), freq='W-FRI')})
# use groupby and last to get the last Friday of each month into a list
last_fridays_in_daterange = fridays.groupby([fridays.datetime.dt.year, fridays.datetime.dt.month]).last()['datetime'].tolist()
# find the data for the last Friday of the month
df[df.Date.isin(last_fridays_in_daterange)]

How can i query for objects in current year , current month in django

I need to find total objects created in
1. current year
2. current month
3. last month
4. last year
I am thinking like this
this_year = datetime.now().year
last_year = datetime.now().year -1
this_month = datetime.now().month
last_month = (datetime.today() - timedelta(days=30)).month
Use like
Order.objects.filter(created_at__month=this_month)
The problems are:
the last_month I want is the previous calendar month, not 30 days back
I am not sure whether created_at__month=this_month will match only the current month or also the same month in previous years
is it possible to get all the counts in a single query?
today = datetime.datetime.now()
1 Current year
Order.objects.filter(created_at__year=today.year)
2 Current month
Order.objects.filter(created_at__year=today.year, created_at__month=today.month)
3 Last month
last_month = today.month - 1 if today.month>1 else 12
last_month_year = today.year if today.month > last_month else today.year - 1
Order.objects.filter(created_at__year=last_month_year, created_at__month=last_month)
4 Last year
last_year = today.year - 1
Order.objects.filter(created_at__year=last_year)
5 Single Query
As last year + current year includes last month and current month, and all orders>= last_year includes current year, the query is super simple:
Order.objects.filter(created_at__year__gte=last_year)
I don't think you'll be able to just match the "month" or "year" part of a date field without some significant fiddling or annotating. Most likely, your simplest solution is to define the start and end of the range you want and search against that. And that might involve a little bit of work.
For example, last calendar month would be:
import datetime
today = datetime.date.today()
if today.month == 1:
    last_month_start = datetime.date(today.year-1, 12, 1)
    last_month_end = datetime.date(today.year-1, 12, 31)
else:
    last_month_start = datetime.date(today.year, today.month -1, 1)
    last_month_end = datetime.date(today.year, today.month, 1) - datetime.timedelta(days=1)
Order.objects.filter(created_at__gte=last_month_start, created_at__lte=last_month_end)
GTE and LTE are "greater than or equal" and "less than or equal". Also worth noting, we use timedelta to figure out what the day before the first of this month is rather than go through all the different cases of whether the previous month had 28, 29, 30 or 31 days.
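The range idea can also be written without the January special case. The sketch below is plain Python with no Django involved, and `month_ranges` is a hypothetical helper name. It returns half-open ranges (start included, end excluded), which could then be used with lookups such as created_at__gte=start and created_at__lt=end, assuming created_at is the field from the question.

```python
import datetime

def month_ranges(today):
    """Return half-open [start, end) date ranges for this and last month."""
    this_month_start = today.replace(day=1)
    # The day before the first of this month always falls in the previous month.
    last_month_start = (this_month_start - datetime.timedelta(days=1)).replace(day=1)
    # 32 days after the first of any month is always in the following month.
    next_month_start = (this_month_start + datetime.timedelta(days=32)).replace(day=1)
    return {
        "this_month": (this_month_start, next_month_start),
        "last_month": (last_month_start, this_month_start),
    }

ranges = month_ranges(datetime.date(2024, 1, 15))
print(ranges["last_month"])
```

A half-open range avoids having to work out the last day of the month, and it does not miss rows created late on that day when the field also stores a time.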
If you want it in separate queries, do something like this:
from_this_year = Order.objects.filter(created_at__year=this_year)
from_last_year = Order.objects.filter(created_at__year=last_year)
from_june = Order.objects.filter(created_at__month='06',created_at__year=this_year)
from_this_month = Order.objects.filter(created_at__month=this_month,created_at__year=this_year)
Note: in my example I used '06', which is June, but you can change it.
