Find a day of week for given first day in month - python

How can the day of the week be calculated for a given day of the month if we know the weekday number of the first day of that month?
Let's say the days of the week are numbered 1..7 (1 = Monday, 7 = Sunday).
I want the weekday number of the 4th day of the month: if the 1st is 5 (Friday), the result should be 1 (Monday).
1st - 5 Friday
2nd - 6 Saturday
3rd - 7 Sunday
4th - 1 Monday
(a=4, b=5) = 1
I tried to work out a general formula:
result = (a + b - 1) % 7
It works for all cases except a = 3, 10, 17, 24, 31 (with b = 5), where the result is 0 but should be 7.
How can the formula be fixed so that it works for all days?

You need to avoid the result zero. Here is one way:
result = (a + b - 2) % 7 + 1
Subtracting 2 instead of 1 shifts the sum to a zero-based weekday, the remainder modulo 7 keeps it in the range 0..6, and adding 1 shifts it back to 1..7, so the result can never be zero. Note that the order of operations already applies the modulus before adding one. If you want to make that more explicit, you could use
result = ((a + b - 2) % 7) + 1
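As a quick check, the formula can be wrapped in a small function and compared against Python's own calendar arithmetic. This is only a sketch: the function name weekday_of is made up for illustration, and March 2024 is an arbitrary month chosen for the comparison.

```python
import datetime

def weekday_of(day_of_month, first_weekday):
    """Weekday (1 = Monday .. 7 = Sunday) of a day, given the weekday of the 1st."""
    return (day_of_month + first_weekday - 2) % 7 + 1

# The example from the question: the 1st is a Friday (5).
print([weekday_of(a, 5) for a in (1, 2, 3, 4)])  # [5, 6, 7, 1]

# Cross-check every day of an arbitrary month against datetime,
# whose isoweekday() uses the same 1..7 numbering.
first = datetime.date(2024, 3, 1).isoweekday()
assert all(
    weekday_of(d, first) == datetime.date(2024, 3, d).isoweekday()
    for d in range(1, 32)
)
```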

Related

How to add +1 to a python variable every day?

I have three variables
a = 1
b = 2
c = 3
Every day, I need to add 1 to each variable
a = a + 1 (so a=2)
b = b + 1
c = c + 1
But when I run the script tomorrow, it should add one more unit:
a = a + 2 (so tomorrow a=3, after 2 days a = 4....)
b = b + 2
c = c + 2
And so on. The amount added needs to grow by 1 every day.
Any ideas?
Choose some fixed reference date, and when the code runs, calculate the number of days since the reference date, adjust by some constant offset, and add that to your variables. So, maybe I choose 1/1/2022 as my reference date and an offset of 100 days. This means that 100 days after 1/1/2022 the variables don't get increased at all, 101 days after 1/1/2022 the variables are greater by 1, and so on.
If you need to only increase if the script actually ran on a date, keep a log file of days the script actually ran, or for that matter, save the increments directly!
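A minimal sketch of the reference-date idea, assuming the example values above (1/1/2022 and an offset of 100 days). The names REFERENCE_DATE, OFFSET_DAYS and daily_increment are invented for illustration.

```python
from datetime import date, timedelta

REFERENCE_DATE = date(2022, 1, 1)  # assumed reference date (from the example above)
OFFSET_DAYS = 100                  # assumed constant offset (from the example above)

def daily_increment(today=None):
    """Days elapsed since the reference date, minus the constant offset."""
    if today is None:
        today = date.today()
    return (today - REFERENCE_DATE).days - OFFSET_DAYS

# The starting values from the question, increased by today's increment.
increment = daily_increment()
a = 1 + increment
b = 2 + increment
c = 3 + increment

# 100 days after the reference date nothing is added; one day later, 1 is added.
print(daily_increment(REFERENCE_DATE + timedelta(days=100)))  # 0
print(daily_increment(REFERENCE_DATE + timedelta(days=101)))  # 1
```

Because the increment is derived from the calendar, it grows by one per day whether or not the script actually ran on the days in between.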

print the first three months as a string in python

day = 1
month = 3
for x in range(3):
    while day <= 31:
        print(str(month)+"/"+str(day)+"/2019'")
        day += 1
    month += 1
I am trying to print the first 31 days of each month from March to May. (I know April has 30 days, but I am not concerned with that.)
The while loop works, printing out the first 31 days in March. The code does not loop through 2 more times and increment the month to print out the 31 days for April and May in sequence.
I am used to Java for loops and I am not familiar with iterating over an undeclared variable.
You need to reset day back to one inside the outer loop; otherwise day is still 32 the second time through, and the while loop never runs again. You can do this by moving the assignment inside the loop:
month = 3
for x in range(3):
    day = 1
    while day <= 31:
        print(str(month)+"/"+str(day)+"/2019'")
        day += 1
    month += 1
Having said that, it's easier just to use for loops:
for month in range(3, 6):
    for day in range(1, 32):
        print(f"{month}/{day}/2019")
The only problem is that the variable day is not being reset once a new month starts. You need to move the assignment day = 1 inside the for loop:
month = 3
for x in range(3):
    day = 1
    while day <= 31:
        print(str(month)+"/"+str(day)+"/2019'")
        day += 1
    month += 1

Calculate average based on available data points

Imagine I have the following data frame:
Product   Month 1   Month 2   Month 3   Month 4   Total
Stuff A         5         0         3         3      11
Stuff B        10        11         4         8      33
Stuff C         0         0        23        30      53
that can be constructed from:
import pandas as pd
df = pd.DataFrame({'Product': ['Stuff A', 'Stuff B', 'Stuff C'],
                   'Month 1': [5, 10, 0],
                   'Month 2': [0, 11, 0],
                   'Month 3': [3, 4, 23],
                   'Month 4': [3, 8, 30],
                   'Total': [11, 33, 53]})
This data frame shows the amount of units sold per product, per month.
Now, what I want to do is to create a new column called "Average" that calculates the average units sold per month. HOWEVER, notice in this example that Stuff C's values for months 1 and 2 are 0. This product was probably introduced in Month 3, so its average should be calculated based on months 3 and 4 only. Also notice that Stuff A's units sold in Month 2 were 0, but that does not mean the product was introduced in Month 3 since 5 units were sold in Month 1. That is, its average should be calculated based on all four months. Assume that the provided data frame may contain any number of months.
Based on these conditions, I have come up with the following solution in pseudo-code:
months = ["list of index names of months to calculate"]
x = len(months)
if df["Month 1"] != 0:
    df["Average"] = df["Total"] / x
elif df["Month 2"] != 0:
    df["Average"] = df["Total"] / (x - 1)
...
elif df["Month " + str(x)] != 0:
    df["Average"] = df["Total"] / 1
else:
    df["Average"] = 0
That way, the average would be calculated starting from the first month where units sold are different from 0. However, I haven't been able to translate this logical abstraction into actual working code. I couldn't manage to iterate over len(months) while maintaining the elif conditions. Or maybe there is a better, more practical approach.
I would appreciate any help, since I've been trying to crack this problem for a while with no success.
NumPy has a function, np.trim_zeros, that trims leading and/or trailing zeros. Using a list comprehension, you can iterate over the relevant DataFrame rows, trim the leading zeros and find the average of what remains for each row.
Note that since 'Month 1' to 'Month 4' are consecutive, you can slice the columns between them using .loc.
import numpy as np
df['Average Sales'] = [np.trim_zeros(row, trim='f').mean() for row in df.loc[:, 'Month 1':'Month 4'].to_numpy()]
Output:
   Product  Month 1  Month 2  Month 3  Month 4  Total  Average Sales
0  Stuff A        5        0        3        3     11           2.75
1  Stuff B       10       11        4        8     33           8.25
2  Stuff C        0        0       23       30     53          26.50
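To illustrate what trim='f' does to a single row, here is a small sketch on two of the rows from the question. The variable names are made up for illustration.

```python
import numpy as np

row_a = np.array([5, 0, 3, 3])    # Stuff A: a zero in the middle, none in front
row_c = np.array([0, 0, 23, 30])  # Stuff C: two leading zeros

# trim='f' removes zeros from the front only; interior zeros are kept.
trimmed_a = np.trim_zeros(row_a, trim='f')
trimmed_c = np.trim_zeros(row_c, trim='f')

print(trimmed_a, trimmed_a.mean())  # all four months remain for Stuff A
print(trimmed_c, trimmed_c.mean())  # only months 3 and 4 remain for Stuff C
```

One thing to keep in mind: a row consisting only of zeros is trimmed to an empty array, and the mean of an empty array is NaN rather than the 0 that the question's pseudo-code asks for.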
Try:
df = df.set_index(['Product','Total'])
df['Average'] = df.where(df.ne(0).cummax(axis=1)).mean(axis=1)
df_out=df.reset_index()
print(df_out)
Output:
   Product  Total  Month 1  Month 2  Month 3  Month 4  Average
0  Stuff A     11        5        0        3        3     2.75
1  Stuff B     33       10       11        4        8     8.25
2  Stuff C     53        0        0       23       30    26.50
Details:
Move Product and Total into the DataFrame index, so we can do the calculation on the rest of the DataFrame.
First create a boolean matrix using ne to compare against zero. Then use cummax along the rows, which means that once there is a non-zero value, the mask remains True until the end of the row. If a row starts with zeros, the mask stays False until the first non-zero value, then turns True and remains True.
Next, use pd.DataFrame.where to keep only the values where that boolean matrix is True; the other values (the leading zeros) become NaN and are not used in the calculation of the mean.
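To see what the mask looks like, here is a small sketch of the intermediate steps on the question's data. The names months, started, masked and average are only illustrative and do not come from the answer.

```python
import pandas as pd

# Only the month columns, indexed by product for readability.
months = pd.DataFrame(
    {'Month 1': [5, 10, 0],
     'Month 2': [0, 11, 0],
     'Month 3': [3, 4, 23],
     'Month 4': [3, 8, 30]},
    index=['Stuff A', 'Stuff B', 'Stuff C'],
)

# True from the first non-zero month onwards, False for leading zeros.
started = months.ne(0).cummax(axis=1)
print(started)

# Leading zeros become NaN, which mean() skips by default.
masked = months.where(started)
average = masked.mean(axis=1)
print(average)
```

Stuff A keeps its Month 2 zero because the mask is already True from Month 1, while Stuff C's first two months are masked out.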
If you don't mind it being a little memory-inefficient, you could put a row of your DataFrame into a NumPy array. np.nonzero returns the indices of the non-zero elements, so indexing the array with it drops the zeros, and you can then use mean to calculate the average. It could look something like this:
import numpy as np
arr = np.array(Stuff_A_DF)
mean = arr[np.nonzero(arr)].mean()
Alternatively, you could manually extract the row to a list, then loop through to remove the zeroes. Note that either way every zero is dropped, not only the leading ones, so a zero in the middle of a row (such as Stuff A's Month 2) would also be left out of the average.

Pandas conditions across multiple series

Let's say I have some data like this:
category = pd.Series(np.ones(4))
job1_days = pd.Series([1, 2, 1, 2])
job1_time = pd.Series([30, 35, 50, 10])
job2_days = pd.Series([1, 3, 1, 3])
job2_time = pd.Series([10, 40, 60, 10])
job3_days = pd.Series([1, 2, 1, 3])
job3_time = pd.Series([30, 15, 50, 15])
Each entry represents an individual (so 4 people total). xxx_days represents the number of days an individual did something, and xxx_time represents the number of minutes spent doing that job on a single day.
I want to assign a 2 to category for an individual if, across all jobs, they spent at least 3 days of at least 20 minutes each. So, for example, person 1 does not meet the criteria because they only spent 2 total days with at least 20 minutes (their job 2 day count does not count toward the total because its time is < 20). Person 2 does meet the criteria, as they spent 5 total days (jobs 1 and 2).
After replacement, category should look like this:
[1, 2, 2, 1]
My current attempt requires a for loop that manually indexes into each series and calculates the total days where time is at least 20. However, this approach doesn't scale well to my actual dataset. I haven't included the code here, as I'd like to approach it from a Pandas perspective instead.
What's the most efficient way to do this in Pandas? The thing that stumps me is checking conditions across multiple series and acting accordingly after summing the days.
Put the days and the times into two DataFrames whose columns line up (same job order in both), then do the calculation in a vectorized way:
import pandas as pd
time = pd.concat([job1_time, job2_time, job3_time], axis=1)
days = pd.concat([job1_days, job2_days, job3_days], axis=1)
((days * (time >= 20)).sum(1) >= 3) + 1
#0 1
#1 2
#2 2
#3 1
#dtype: int64
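If the result should end up back in category, one possible sketch is to zero out the day counts that do not qualify and then overwrite the matching entries with Series.mask. The variable name qualifying_days is made up for illustration; the data is copied from the question.

```python
import numpy as np
import pandas as pd

category = pd.Series(np.ones(4))
job1_days = pd.Series([1, 2, 1, 2])
job1_time = pd.Series([30, 35, 50, 10])
job2_days = pd.Series([1, 3, 1, 3])
job2_time = pd.Series([10, 40, 60, 10])
job3_days = pd.Series([1, 2, 1, 3])
job3_time = pd.Series([30, 15, 50, 15])

days = pd.concat([job1_days, job2_days, job3_days], axis=1)
time = pd.concat([job1_time, job2_time, job3_time], axis=1)

# Keep a job's day count only where its daily time is at least 20 minutes,
# then add up the remaining days per person.
qualifying_days = days.where(time >= 20, 0).sum(axis=1)

# Replace the category with 2 wherever the total reaches 3 days.
category = category.mask(qualifying_days >= 3, 2)
print(category.tolist())  # [1.0, 2.0, 2.0, 1.0]
```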

Python date function bugs

I am trying to create a function in python which will display the date. So I can see the program run, I have set one day to five seconds, so every five seconds it will become the next 'day' and it will print the date.
I know there is already a built-in function for displaying a date; however, I am very new to Python and I am trying to improve my skills (so excuse my poor coding).
I have set the starting date to the first of January, 2000.
Here is my code:
import time
def showDate():
    year = 00
    month = 1
    day = 1
    oneDay = 5
    longMonths = [1, 3, 5, 7, 8, 10, 12]
    shortMonths = [4, 6, 9, 11]
    while True:
        time.sleep(1)
        oneDay = oneDay - 1
        if oneDay == 0:
            if month in longMonths:
                if day > 31:
                    day = day + 1
                else:
                    month = month + 1
                    day = 0
            if month == 2:
                if day > 28:
                    day = day + 1
                else:
                    month = month + 1
                    day = 0
            if month in shortMonths:
                if day > 30:
                    day = day + 1
                else:
                    month = month + 1
                    day = 0
            if day == 31 and month == 12:
                year = year + 1
            print(str(day) + '/' + str(month) + '/' + str(year))
            oneDay = 5
showDate()
However, when I try to run the program, this is the output I get:
>>>
0/3/0
0/5/0
0/7/0
0/8/0
0/10/0
0/12/0
0/13/0
0/13/0
0/13/0
I don't know why this is happening, could someone please suggest a solution?
There's no possible path through your code where day gets incremented.
I think you are actually confused between > and <: you check if day is greater than 31 or 28, which it never is. I think you mean if day < 31: and so on.
First of all, it's easier to just call time.sleep(5) instead of looping over time.sleep(1) five times. It's also better to have one list with the number of days in each month, not two lists of long and short months. Also, your while loop is currently infinite; is that intentional?
Anyway, your main problem was comparing day > 31, but there are lots of things that can be improved. As I said, I'm dropping oneDay in favor of sleep(5), as it's cleaner, and using a single daysInMonths list.
import time
def showDate():
    year = 00
    month = 1
    day = 1
    daysInMonths = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
Now you can have only one if check about if the day has reached the end of a month, like this:
    while True:
        time.sleep(5)
        if day < daysInMonths[month-1]:
            day += 1
This looks up the number of days in the current month. It uses -1 because list indices begin at 0 and your months begin at 1 (i.e. the months run from 1-12 but the list's indices are 0-11). I also used the += operator, which is shorthand for var = var + something. It works the same and looks neater.
This test encompasses all months, and then the alternative scenario is that you need to increment the month. I recommend in this block that you first check if the month is 12 and then increment the year from there. Also you should be setting day and month back to 1, since that was their starting value. If it's not the end of the year, increment the month and set day back to 1.
        else:
            if month == 12:
                year += 1
                day = 1
                month = 1
            else:
                month += 1
                day = 1
        print("{}/{}/{}".format(day, month, year))
I also used the str.format syntax for neatness. format substitutes the values you pass in for each {} in the string. It makes it easier to lay out how the string should actually look, and it converts the values to strings implicitly.
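Assembled into one piece, the logic above might look like the following sketch. The function name next_date and the constant DAYS_IN_MONTH are my own choices for illustration, the sleep is left out so the example finishes immediately, and leap years are ignored just as in the list above.

```python
# Number of days in each month, January first (leap years ignored).
DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

def next_date(day, month, year):
    """Return the (day, month, year) that follows the given date."""
    if day < DAYS_IN_MONTH[month - 1]:
        return day + 1, month, year   # still inside the current month
    if month == 12:
        return 1, 1, year + 1         # 31 December rolls over to 1 January
    return 1, month + 1, year         # last day of any other month

# Step across the end of the year to show both kinds of rollover.
current = (30, 12, 0)
for _ in range(3):
    current = next_date(*current)
    print("{}/{}/{}".format(*current))
# prints 31/12/0, then 1/1/1, then 2/1/1
```

Keeping the date arithmetic in a function that only returns values makes it easy to check without waiting for the sleep; the while loop then only needs to sleep, call the function and print.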
Try this.
The day comparisons should be <, not >. When going to the next month, I set the day to 1, because there is no day 0 in the calendar. And I use elif for the subsequent month tests, because all the cases are mutually exclusive.
import time
def showDate():
    year = 00
    month = 1
    day = 1
    oneDay = 5
    longMonths = [1, 3, 5, 7, 8, 10, 12]
    shortMonths = [4, 6, 9, 11]
    while True:
        time.sleep(1)
        oneDay = oneDay - 1
        if oneDay == 0:
            if month in longMonths:
                if day < 31:
                    day = day + 1
                else:
                    month = month + 1
                    day = 1
            elif month == 2:
                if day < 28:
                    day = day + 1
                else:
                    month = month + 1
                    day = 1
            elif month in shortMonths:
                if day < 30:
                    day = day + 1
                else:
                    month = month + 1
                    day = 1
            # After 31 December the month counter reaches 13:
            # wrap around to January of the next year.
            if month > 12:
                year = year + 1
                month = 1
            print(str(day) + '/' + str(month) + '/' + str(year))
            oneDay = 5
