I have three variables
a = 1
b = 2
c = 3
Every day, I need to add 1 to each variable
a = a + 1 (so a=2)
b = b + 1
c = c + 1
But when I run the script tomorrow, I need it to add 1 unit more:
a = a + 2 (so tomorrow a=3, after 2 days a = 4....)
b = b + 2
c = c + 2
And so on... each day the increment needs to grow by 1.
Any ideas?
Choose some fixed reference date and, when the code runs, calculate the number of days since that reference date, adjust by some constant offset, and add the result to your variables. For example, with 1/1/2022 as the reference date and an offset of 100 days: 100 days after 1/1/2022 the variables don't get increased at all, 101 days after 1/1/2022 they are greater by 1, and so on.
If you need to increase only on dates when the script actually ran, keep a log file of the days it ran; or, for that matter, save the increments directly!
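A minimal Python sketch of this idea, using the example reference date and offset from above (the function name is just for illustration):

```python
from datetime import date

# Example values from the answer: reference date 1/1/2022, offset 100 days.
REFERENCE = date(2022, 1, 1)
OFFSET = 100

def daily_increment(today=None):
    """Increment to add: days elapsed since REFERENCE, minus OFFSET."""
    today = today or date.today()
    return (today - REFERENCE).days - OFFSET

# 100 days after the reference date, nothing is added...
print(daily_increment(date(2022, 4, 11)))  # -> 0
# ...and each later day adds one more.
print(daily_increment(date(2022, 4, 12)))  # -> 1

a, b, c = 1, 2, 3
inc = daily_increment()
a, b, c = a + inc, b + inc, c + inc
```

Because the increment is derived from the calendar rather than from previous runs, the script does not need to persist any state between days.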
This might be a noob question, but I'm new to coding. I used the following code to categorize my data, but I need it to assign the category even when only some of the conditions are met, e.g., only 4 out of 7 conditions. How can I do it? I really appreciate any help you can provide.
c1=df['Stroage Condition'].eq('refrigerate')
c2=df['Profit Per Unit'].between(100,150)
c3=df['Inventory Qty']<20
df['Restock Action']=np.where(c1&c2&c3,'Hold Current stock level','On Sale')
print(df)
Let's say this is your dataframe:
   Stroage Condition  refrigerate  Profit Per Unit  Inventory Qty
0                  0            1                0             20
1                  1            1              102              1
2                  2            2                5              2
3                  3            0              100              8
and the conditions are the ones you defined:
c1=df['Stroage Condition'].eq(df['refrigerate'])
c2=df['Profit Per Unit'].between(100,150)
c3=df['Inventory Qty']<20
Then you can define a helper function and pass its result to np.where(). There you define how many conditions have to be True; in this example I set the threshold to at least two.
def my_select(x, y, z):
    # stack the three boolean Series and count, per row, how many are True
    return np.array([x, y, z]).sum(axis=0) >= 2
Finally you run one more line:
df['Restock Action']=np.where(my_select(c1,c2,c3), 'Hold Current stock level', 'On Sale')
print(df)
This prints to the console:
   Stroage Condition  refrigerate  Profit Per Unit  Inventory Qty            Restock Action
0                  0            1                0             20                   On Sale
1                  1            1              102              1  Hold Current stock level
2                  2            2                5              2  Hold Current stock level
3                  3            0              100              8  Hold Current stock level
If you have more conditions or rules, you have to extend the function with as many parameters as you have rules.
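If the number of rules keeps growing, a variadic helper avoids rewriting the function each time. A sketch, rebuilding the sample dataframe from above (the helper name `at_least` is made up; at least n of the given conditions must hold):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Stroage Condition': [0, 1, 2, 3],
    'refrigerate':       [1, 1, 2, 0],
    'Profit Per Unit':   [0, 102, 5, 100],
    'Inventory Qty':     [20, 1, 2, 8],
})

c1 = df['Stroage Condition'].eq(df['refrigerate'])
c2 = df['Profit Per Unit'].between(100, 150)
c3 = df['Inventory Qty'] < 20

def at_least(n, *conditions):
    # count, per row, how many of the boolean Series are True
    return np.array(conditions).sum(axis=0) >= n

df['Restock Action'] = np.where(at_least(2, c1, c2, c3),
                                'Hold Current stock level', 'On Sale')
print(df['Restock Action'].tolist())
# -> ['On Sale', 'Hold Current stock level',
#     'Hold Current stock level', 'Hold Current stock level']
```

Adding a fourth rule then only means passing one more boolean Series, and the threshold n stays a single argument.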
I have a list of tuples containing a date and the number of "beeps" on that date. However, if there were 0 beeps on a certain date, that date is simply not present in the list. I need these dates to be present, with 0 for the number of beeps.
I've tried solving this in Excel and Python, but I can't find a solution.
16/10/2017 7
18/10/2017 3
21/10/2017 7
23/10/2017 20
24/10/2017 7
25/10/2017 6
This is the start of what I have, and I need this to become:
16/10/2017 7
17/10/2017 0
18/10/2017 3
19/10/2017 0
20/10/2017 0
21/10/2017 7
22/10/2017 0
23/10/2017 20
24/10/2017 7
25/10/2017 6
First save the first date with its value. Then iterate through the remaining dates: for each one, save the dates between the last saved date and the current date with a value of 0, then save the current date with its value.
A pseudo-code solution would be:
last_saved_date, value = read_first_date()
save(last_saved_date, value)
while not_at_end_of_file():
    date, value = read_date()
    while last_saved_date + 1 < date:
        last_saved_date += 1
        save(last_saved_date, 0)
    save(date, value)
    last_saved_date = date
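A concrete Python version of that pseudo-code, assuming the data is a list of (date string, count) tuples in the dd/mm/yyyy format shown above (the function name is just for illustration):

```python
from datetime import datetime, timedelta

beeps = [('16/10/2017', 7), ('18/10/2017', 3), ('21/10/2017', 7),
         ('23/10/2017', 20), ('24/10/2017', 7), ('25/10/2017', 6)]

def fill_missing_dates(rows, fmt='%d/%m/%Y'):
    filled = [rows[0]]
    last = datetime.strptime(rows[0][0], fmt)
    for text, count in rows[1:]:
        current = datetime.strptime(text, fmt)
        # emit every skipped day with a count of 0
        while last + timedelta(days=1) < current:
            last += timedelta(days=1)
            filled.append((last.strftime(fmt), 0))
        filled.append((text, count))
        last = current
    return filled

for d, n in fill_missing_dates(beeps):
    print(d, n)
```

This prints the ten rows of the desired output, with 17/10, 19/10, 20/10 and 22/10 filled in at 0.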
Recently I asked how one could count the number of registers per interval, as answered in Count number of registers in interval.
The solution works great, but I had to adapt it to also take a localization key into account.
I did it through the following code:
def time_features(df, time_key, T, location_key, output_key):
    """
    Create features based on time, such as: how many BDs are open
    in the same GRA at this moment (hour)?
    """
    assert np.issubdtype(df[time_key].dtype, np.datetime64)
    output = pd.DataFrame()
    grouped = df.groupby(location_key)
    for name, group in grouped:
        # initialize times: registers open as 1, close as -1
        start_times = group.copy()
        start_times[time_key] = group[time_key] - pd.Timedelta(hours=T)
        start_times[output_key] = 1
        aux = group.copy()
        all_times = start_times.copy()
        aux[output_key] = -1
        all_times = pd.concat([all_times, aux], ignore_index=True)
        # sort by time and perform a cumulative sum to get opened registers
        # (subtract 1 since you don't want to include the current time as opened)
        all_times = all_times.sort_values(by=time_key)
        all_times[output_key] = all_times[output_key].cumsum() - 1
        # revert the index back to original order, and truncate closed times
        all_times = all_times.sort_index().iloc[:len(all_times)//2]
        output = pd.concat([output, all_times], ignore_index=True)
    return output
Output:
                 time  loc1 loc2
0 2013-01-01 12:56:00     1  "a"
1 2013-01-01 12:00:12     1  "b"
2 2013-01-01 10:34:28     2  "c"
3 2013-01-01 09:34:54     2  "c"
4 2013-01-01 08:34:55     3  "d"
5 2013-01-01 08:34:55     5  "d"
6 2013-01-01 16:35:19     4  "e"
7 2013-01-01 16:35:30     4  "e"
time_features(df, time_key='time', T=2, location_key='loc1', output_key='count')
This works great for small data, but for longer data (I'm using it with a file with 1 million rows) it takes "forever" to run. I wonder if I could optimize this computation somehow.
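One common way to speed up this kind of computation is to replace the per-event cumsum with np.searchsorted on each group's sorted timestamps. The sketch below is an assumption, not the original method: it presumes the quantity being computed is "rows in the same location whose time falls within the preceding T hours, the row itself excluded" (which is what the open/close cumsum trick appears to produce), and the sample data is made up, so verify it against the original function before trusting it at scale.

```python
import numpy as np
import pandas as pd

def count_recent(df, time_key, T, location_key, output_key):
    """Count, per row, the rows in the same location whose time lies
    within the preceding T hours (the row itself is excluded)."""
    out = df.copy()
    delta = pd.Timedelta(hours=T).to_timedelta64()
    counts = np.empty(len(df), dtype=int)
    for idx in df.groupby(location_key).indices.values():
        times = df[time_key].values[idx]
        sorted_times = np.sort(times)
        # window [t - T, t], inclusive on both ends
        hi = np.searchsorted(sorted_times, times, side='right')
        lo = np.searchsorted(sorted_times, times - delta, side='left')
        counts[idx] = hi - lo - 1  # -1 excludes the row itself
    out[output_key] = counts
    return out

df = pd.DataFrame({
    'time': pd.to_datetime(['2013-01-01 08:00', '2013-01-01 09:00',
                            '2013-01-01 12:00', '2013-01-01 09:30']),
    'loc1': [1, 1, 1, 2],
})
result = count_recent(df, time_key='time', T=2,
                      location_key='loc1', output_key='count')
print(result['count'].tolist())  # -> [0, 1, 0, 0]
```

Each group costs one sort plus two binary searches per row, so the whole thing is O(n log n) instead of building and re-sorting a doubled dataframe per group.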
How can the day of the week be calculated if we know the weekday number of the first day of the month?
Let's say the days of the week are numbered 1..7.
I want to get the weekday number of the 4th day of the month: if the 1st = 5 (Friday), then the result should be 1 (Monday).
1st - 5 Friday
2nd - 6 Saturday
3rd - 7 Sunday
4th - 1 Monday
(a=4, b=5) = 1
I tried to come up with a general formula:
result = (a + b - 1) % 7
It works for all cases except when a = 3, 10, 17, 24, or 31, because then the result is 0 but it should be 7.
How can it be fixed to get this formula work for all days?
You need to avoid a result of zero. Here is one way:
result = (a + b - 2) % 7 + 1
You subtract one more from the sum so that the value wraps to the previous day, take the remainder modulo 7 (which can now be zero), and then add one to land on the wanted day while avoiding zero. Note that the order of operations applies the modulus before the addition. If you want to make that explicit, you could use
result = ((a + b - 2) % 7) + 1
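As a quick check, the corrected formula can be wrapped in a small function (the name is just for illustration):

```python
def weekday_of(a, b):
    """Weekday number (1..7) of day `a` of the month,
    given that the 1st falls on weekday `b`."""
    return (a + b - 2) % 7 + 1

print(weekday_of(4, 5))   # -> 1 (Monday, as in the example)
print(weekday_of(3, 5))   # -> 7 (Sunday; the old formula returned 0 here)
print(weekday_of(10, 5))  # -> 7
```

The "- 2 ... + 1" pattern is the standard trick for modular arithmetic over a 1-based range: shift to 0-based, take the modulus, shift back.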
I want to compare two continuous home-price columns and create a new column that stores a binary variable.
This is my process so far:
dataset['High'] = dataset['November'].map(lambda x: 1 if x>50000 else 0)
This lets me work on only one column, but I want to compare both the November and December home-price columns and create a new column containing the binary variable.
I want this output
November - December - NewCol
-------------------------------
651200 - 626600 - 0
420900 - 423600 - 1
82300 - 83100 - 1
177000 - 169600 - 0
285500 - 206300 - 0
633900 - 640000 - 1
218900 - 222400 - 1
461700 - 403800 - 0
419100 - 421300 - 1
127600 - 128300 - 1
553400 - 547800 - 0
November and December are continuous variables, so I want to convert the comparison into a binary one. I want to use something like an ifelse() function to create a variable, called "NewCol", which takes on a value of "1" if the ['November'] column is greater than ['December'], and a value of "0" otherwise.
Similar to #3novak's answer, but with casting. One uses pandas for efficiency, but when you use something like map, which needs values expressed as (more expensive) Python objects, you may as well just use Python lists. Prefer pandas operations that apply to entire Series and DataFrames instead.
>>> import pandas as pd
>>> df = pd.read_csv('test.csv')
>>> df
November December
0 651200 626600
1 420900 423600
2 82300 83100
3 177000 169600
4 285500 206300
5 633900 640000
6 218900 222400
7 461700 403800
8 419100 421300
9 127600 128300
10 553400 547800
>>> df['Higher'] = df['December'].gt(df['November']).astype(int)
>>> df
November December Higher
0 651200 626600 0
1 420900 423600 1
2 82300 83100 1
3 177000 169600 0
4 285500 206300 0
5 633900 640000 1
6 218900 222400 1
7 461700 403800 0
8 419100 421300 1
9 127600 128300 1
10 553400 547800 0
This would do the trick:
dataset['deff'] = np.where(dataset['2016-11'] >= dataset['2016-12'], 0, 1)
If I understand correctly, you can use the following to create a boolean column; we don't even need an ifelse statement. Instead, we can use the vectorized nature of pandas data frames.
data['NewCol'] = data['November'] > data['December']
This returns a column of True and False values instead of 1 and 0, but they are functionally equivalent. You can sum, take means, etc. treating True as 1 and False as 0.
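If integer 0/1 values are preferred over booleans, the same comparison can be cast explicitly. A minimal sketch with made-up data, following the November-greater-than-December comparison above:

```python
import pandas as pd

data = pd.DataFrame({'November': [651200, 420900, 82300],
                     'December': [626600, 423600, 83100]})
# cast the boolean comparison to 0/1 integers
data['NewCol'] = (data['November'] > data['December']).astype(int)
print(data['NewCol'].tolist())  # -> [1, 0, 0]
```

The astype(int) call maps True to 1 and False to 0, so the column matches the 0/1 encoding asked for in the question.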