Retrieve list of weekdays for any case - python

Hi everybody, I have a small coding problem.
I have a list of numbers from 0 to 6 that represent the days of the week from Sunday to Saturday, so Sunday = 0 and Saturday = 6.
The list is [0,1,2,3,4,5,6].
Retrieving the list of days between two days (numbers) is no problem when from < to.
Example:
from Monday = 1 to Thursday = 4, the numbers included are [1,2,3,4]
The problem comes when from > to.
Example:
from Friday = 5 to Tuesday = 2, we need to get this list: [5, 6, 0, 1, 2]
Do you have any idea how I can write an algorithm or function that returns the list of days (here, numbers) to include, given a "from" number and a "to" number, whether "from" is less than or greater than "to"?
Thank you very much for your help.

I would do it this way:
def getweekdays(frm, to):
    days = [0,1,2,3,4,5,6]
    if frm <= to:
        return days[frm:to+1]
    else:
        return days[frm:] + days[:to+1]
(I haven't checked the code, but you get the idea :) )

The easiest approach for this would be:
interval_days = []
day = first_day
while day != last_day:
    interval_days.append(day)
    day += 1
    if day > 6:
        day = 0
interval_days.append(last_day)  # the end day is included as well
Initialize the day counter at the lower limit, then keep increasing it until it matches the upper limit. If it goes past the last day of the week (6), set it back to the first day of the week (0).
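The same wrap-around can be written with modular arithmetic instead of an explicit counter reset. This is a minimal sketch, not part of any answer above; the function name weekday_range is made up for illustration, and it assumes both arguments are integers from 0 to 6.

```python
def weekday_range(frm, to):
    """Return the day numbers from frm to to (inclusive), wrapping after 6."""
    # Number of days to emit; the modulo handles the frm > to case.
    length = (to - frm) % 7 + 1
    # Step forward from frm, wrapping back to 0 after Saturday (6).
    return [(frm + i) % 7 for i in range(length)]


print(weekday_range(1, 4))  # Monday to Thursday
print(weekday_range(5, 2))  # Friday to Tuesday, wrapping through Sunday
```

When frm equals to, the function returns a single day rather than the whole week, which matches the slicing answer above.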

You can use list slicing:
s = [0,1,2,3,4,5,6]
days_of_week = dict(zip(['sun', 'mon', 'tue', 'wed', 'th', 'fri', 'sat'], s))
def get_days(day_range):
    start, end = days_of_week[day_range[0]], days_of_week[day_range[-1]]
    # Wrap around the end of the week when the start day comes after the end day
    return s[start:end+1] if start <= end else s[start:] + s[:end+1]
ranges = [['fri', 'tue'], ['th', 'sat'], ['mon', 'th']]
print(list(map(get_days, ranges)))
Output:
[[5, 6, 0, 1, 2], [4, 5, 6], [1, 2, 3, 4]]

days_of_week = [0,1,2,3,4,5,6]
day_1 = 5
day_2 = 2
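def days_needed(days_of_week, day_1, day_2, rollover=True):
    if rollover:
        # Wrap past Saturday: day_1 to the end of the week, then the start of the week up to day_2
        day_list = days_of_week[day_1:] + days_of_week[:day_2 + 1]
    else:
        day_list = days_of_week[day_1:day_2 + 1]
    return day_list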
day_list = days_needed(days_of_week,day_1,day_2,rollover=True)
print(day_list)
This should do it. Let me know if you need any tweaking.

Related

Count by groups (pandas)

I have 5 years of stock data. I need to do this: take years 1, 2 and 3. What is the probability that after seeing k consecutive "down days", the next day is an "up day"? For example, if k = 3, what is the probability of seeing "−, −, −, +" as opposed to seeing "−, −, −, −"? Compute this for k = 1, 2, 3. I have played with groupby and cumsum, but can't seem to get it right.
For example:
group1 = df[df['True Label'] == "-"].groupby((df['True Label'] != "-").cumsum())
Date        True Label
2019-01-02  +
2019-01-03  -
2019-01-04  +
2019-01-07  +
2019-01-08  +
Try this bit of logic:
import pandas as pd
import numpy as np
np.random.seed(123)
s = pd.Series(np.random.choice(['+','-'], 1000))
sm = s.groupby((s == '+').cumsum()).cumcount()
prob = (sm.diff() == -3).sum() / (sm == 3).sum()
prob
Output:
0.43661971830985913
Details:
Use (s == '+').cumsum() to create groups, each starting at a '+' and followed by its run of '-' records, then groupby and cumcount the elements in each group. The first element is the '+', and cumcount starts at zero, so '+--' becomes 0, 1, 2. Now take the difference to find where '-' turns into '+'.
If the difference equals -3, we know the group has exactly three minuses and is followed by a '+'.
Check sm == 3 to count all the times you had '---', sum, then divide.
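To cover k = 1, 2, 3 in one go, the counting can be wrapped in a function. The sketch below takes a slightly different reading from the code above: it counts every day whose k preceding days were all '-' (including days inside longer runs) and reports the share of those days that are '+'. The function name prob_up_after_k_downs and the sample series are made up for illustration.

```python
import pandas as pd


def prob_up_after_k_downs(labels, k):
    """Share of '+' days among the days preceded by at least k consecutive '-' days."""
    down = labels.eq('-')
    # Length of the run of consecutive '-' ending at each row (0 on '+' rows).
    run = down.astype(int).groupby((~down).cumsum()).cumsum()
    # True where the k previous days were all '-'.
    seen_k = run.shift(1) >= k
    if not seen_k.any():
        return float('nan')
    return labels[seen_k].eq('+').mean()


sample = pd.Series(list('--+---+----'))
for k in (1, 2, 3):
    print(k, prob_up_after_k_downs(sample, k))
```

Applied to the question's data, labels would be df['True Label'] restricted to the first three years.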

Retrieve indices of specific date

I am trying to retrieve the indices corresponding to a specific month in my data set.
This is for keeping track of the index lines for training a neural network after the train-test data split. I want to know which dates my predictions correspond to.
I have tried the following, which retrieves the indices corresponding to a specific day. Is there a way to use a wildcard like * for the day, so that I retrieve one whole month?
target_date = pd.to_datetime('2013-10-24').date()
metadata.loc[metadata.Starttime.dt.date == target_date, :].index.values
which gives
array([0, 1], dtype=int64)
I would expect something like:
array([10, 14, 17], dtype=int64)
where 10, 14, 17 are the indices corresponding to the month I searched for, not a specific day.
Example:
installation = range(0,5)
equipment = range(0,5)
tag_name = range(0,5)
start_time = ['2013-10-15 02:30:24.670', '2013-9-15 02:30:24.670', '2013-8-15 02:30:24.670', '2013-7-15 02:30:24.670', '2013-6-15 02:30:24.670']
dic = {'Installation':installation,'Equipment':equipment,'Tag name':tag_name,'Starttime':start_time,}
metadata = pd.DataFrame(dic) #Create the dataframe
metadata['Starttime'] = pd.to_datetime(metadata['Starttime'])
target_date = pd.to_datetime('2013-10-15').date()
metadata.loc[metadata.Starttime.dt.date == target_date, :].index.values
You can just change your filtering condition to get the whole month:
metadata.loc[
    (metadata.Starttime.dt.month == 10) &
    (metadata.Starttime.dt.year == 2013)
].index.values
I think you need to compare month periods, created with Series.dt.to_period for the column and Timestamp.to_period for the scalar:
installation = range(0,5)
equipment = range(0,5)
tag_name = range(0,5)
#change first 3 datetimes for same months
start_time = ['2013-10-15 02:30:24.670', '2013-10-16 02:30:24.670', '2013-10-17 02:30:24.670',
'2013-7-15 02:30:24.670', '2013-6-15 02:30:24.670']
dic = {'Installation':installation,'Equipment':equipment,
'Tag name':tag_name,'Starttime':start_time}
metadata = pd.DataFrame(dic) #Create the dataframe
metadata['Starttime'] = pd.to_datetime(metadata['Starttime'])
print (metadata)
   Installation  Equipment  Tag name               Starttime
0             0          0         0 2013-10-15 02:30:24.670
1             1          1         1 2013-10-16 02:30:24.670
2             2          2         2 2013-10-17 02:30:24.670
3             3          3         3 2013-07-15 02:30:24.670
4             4          4         4 2013-06-15 02:30:24.670
target_date = pd.to_datetime('2013-10-15').to_period('m')
idx = metadata.loc[metadata.Starttime.dt.to_period('m') == target_date].index.values
print (idx)
[0 1 2]
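If the lookup is needed for several months, the period comparison can be wrapped in a small helper. This is only a sketch: the helper name indices_for_month and its parameters are hypothetical, and it assumes the column already has a datetime dtype.

```python
import pandas as pd


def indices_for_month(frame, column, month):
    """Return the index values of rows whose datetime column falls in `month` (e.g. '2013-10')."""
    target = pd.Period(month, freq='M')
    mask = frame[column].dt.to_period('M') == target
    return frame.index[mask.to_numpy()].to_numpy()


metadata = pd.DataFrame({'Starttime': pd.to_datetime([
    '2013-10-15 02:30:24.670', '2013-10-16 02:30:24.670',
    '2013-10-17 02:30:24.670', '2013-07-15 02:30:24.670',
    '2013-06-15 02:30:24.670'])})
print(indices_for_month(metadata, 'Starttime', '2013-10'))
```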

Pythonic way to map yyyymm formated column into a numeric column?

Sorry if it's not totally clear in the title, but the point is I have a Pandas DataFrame with the following Date column:
Date
201611
201612
201701
And I want to map that so I have a period column that takes value 1 for the first period, and then starts counting one by one until the last period, like this:
Date Period
201611 1
201612 2
201701 3
I achieved what I want doing this:
dic_t={}
for n,t in enumerate(sorted(df.Date.unique())):
    dic_t[t]=n+1
df['Period']=df.Date.map(dic_t)
But it doesn't seem too pythonic. I guess I could achieve something similar using dictionary comprehensions, but I'm not good at them yet.
Any ideas?
pd.factorize can sort a list of items and return unique integer labels:
In [209]: pd.factorize(['201611','201612','201701','201702','201704','201612'], sort=True)[0]+1
Out[209]: array([1, 2, 3, 4, 5, 2])
Therefore you could use
df['Period'] = pd.factorize(df['Date'], sort=True)[0] + 1
pd.factorize returns both an array of labels and an array of unique values:
In [210]: pd.factorize(['201611','201612','201701','201702','201704','201612'], sort=True)
Out[210]:
(array([0, 1, 2, 3, 4, 1]),
array(['201611', '201612', '201701', '201702', '201704'], dtype=object))
Since, in this question, it appears you only want the labels, I used pd.factorize(...)[0] to obtain just the labels.
So, based on the info from the question and the comments, the enumeration of the periods (combinations of year and month) should start at the first period that is present in the dataframe.
For that purpose, your code works just fine. If you think that dict comprehensions look "more pythonic", you could express that as:
period_dict = {
    period: i+1
    for i, period in enumerate(sorted(df.Date.unique()))}
df['Period'] = df.Date.map(period_dict)
Just note: with this method, if for some reason there are no data points for a month after the start month, that month will not get a period number.
For example, if you have no data for March 2017, then:
Date Period
201611 1
201612 2
201701 3
201702 4
201704 5 <== April is period 5 and not 6
If you need to generate the full enumeration for all possible periods, use something like this:
start_year = 2016
end_year = 2018
period_list = [
    y*100 + m
    for y in range(start_year, end_year+1)
    for m in range(1, 13)]
period_dict = {
    period: i+1
    for i, period in enumerate(period_list)}
df['Period'] = df.Date.map(period_dict)
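If gaps should still advance the counter but period 1 should be the first month present in the data, the period number can also be computed arithmetically, without building a dictionary. This is a sketch under the assumption that Date holds integers in yyyymm form (string values would need .astype(int) first); the sample frame is made up.

```python
import pandas as pd

df = pd.DataFrame({'Date': [201611, 201612, 201701, 201702, 201704]})

# Convert yyyymm to a running month count: year * 12 + month.
month_count = (df['Date'] // 100) * 12 + df['Date'] % 100

# Period 1 is the earliest month in the data; missing months leave gaps.
df['Period'] = month_count - month_count.min() + 1
print(df)
```

Here April 2017 gets period 6, because the missing March 2017 still counts as a period.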

Find the minimum value between a current and previous crossovers of moving-averages in Pandas

I have a large dataframe of stockprice data with df.columns = ['open','high','low','close']
Problem definition:
When an EMA crossover happens, I mark the row with df['cross'] = 'cross'. Every time a crossover happens, if we label the current crossover as crossover 4, I want to check whether the minimum value of df['low'] between crossovers 3 and 4 IS GREATER THAN the minimum value of df['low'] between crossovers 1 and 2. I have made an attempt at the code based on the help I have received from 'Gherka' so far. I have indexed the crossovers and found the minimum values between consecutive crossovers.
So, every time a crossover happens, it has to be compared with the previous 3 crossovers, and I need to check MIN(CROSS4,CROSS3) > MIN(CROSS2,CROSS1).
I would really appreciate it if you could help me complete this.
import pandas as pd
import numpy as np
import bisect as bs
data = pd.read_csv("Nifty.csv")
df = pd.DataFrame(data)
df['5EMA'] = df['Close'].ewm(span=5).mean()
df['10EMA'] = df['Close'].ewm(span=10).mean()
condition1 = df['5EMA'].shift(1) < df['10EMA'].shift(1)
condition2 = df['5EMA'] > df['10EMA']
df['cross'] = np.where(condition1 & condition2, 'cross', None)
cross_index_array = df.loc[df['cross'] == 'cross'].index
def find_index(a, x):
    i = bs.bisect_left(a, x)
    return a[i-1]
def min_value(x):
    """Find the minimum value of 'Low' between crossovers 1 and 2, crossovers 3 and 4, etc..."""
    cur_index = x.name
    prev_cross_index = find_index(cross_index_array, cur_index)
    return df.loc[prev_cross_index:cur_index, 'Low'].min()
df['min'] = None
df['min'][df['cross'] == 'cross'] = df.apply(min_value, axis=1)
print(df)
This should do the trick:
import pandas as pd
df = pd.DataFrame({'open': [1, 2, 3, 4, 5],
'high': [5, 6, 6, 5, 7],
'low': [1, 3, 3, 4, 4],
'close': [3, 5, 3, 5, 6]})
df['day'] = df.apply(lambda x: 'bull' if (
    x['close'] > x['open']) else None, axis=1)
bull = df['day'] == 'bull'
df['min'] = None
# pd.rolling_min was removed from pandas; Series.rolling(...).min() replaces it
df.loc[bull, 'min'] = df.loc[bull, 'low'].rolling(window=2).min()
print(df)
#    close  high  low  open   day   min
# 0      3     5    1     1  bull   NaN
# 1      5     6    3     2  bull     1
# 2      3     6    3     3  None  None
# 3      5     5    4     4  bull     3
# 4      6     7    4     5  bull     4
Open for comments!
If I understand your question correctly, you need a dynamic "rolling window" over which to calculate the minimum value. Assuming your index is a default one meaning it's sorted in the ascending order, you can try the following approach:
import pandas as pd
import numpy as np
from bisect import bisect_left
df = pd.DataFrame({'open': [1, 2, 3, 4, 5],
'high': [5, 6, 6, 5, 7],
'low': [1, 3, 2, 4, 4],
'close': [3, 5, 3, 5, 6]})
This uses the same sample data as mommermi, but with low on the third day changed to 2 as the third day should also be included in the "rolling window".
df['day'] = np.where(df['close'] > df['open'], 'bull', None)
We calculate the day column using vectorized numpy operation which should be a little faster.
bull_index_array = df.loc[df['day'] == 'bull'].index
We store the index values of the rows (days) that we've flagged as bulls.
def find_index(a, x):
    i = bisect_left(a, x)
    return a[i-1]
Bisect from the core library will enable us to find the index of the previous bull day in an efficient way. This requires that the index is sorted which it is by default.
def min_value(x):
    cur_index = x.name
    prev_bull_index = find_index(bull_index_array, cur_index)
    return df.loc[prev_bull_index:cur_index, 'low'].min()
Next, we define a function that will create our "dynamic" rolling window by slicing the original dataframe by previous and current index.
df['min'] = df.apply(min_value, axis=1)
Finally, we apply the min_value function row-wise to the dataframe, yielding this:
   open  high  low  close   day  min
0     1     5    1      3  bull  NaN
1     2     6    3      5  bull  1.0
2     3     6    2      3  None  2.0
3     4     5    4      5  bull  2.0
4     5     7    4      6  bull  4.0
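The comparison asked for in the question (the minimum between crossovers 3 and 4 against the minimum between crossovers 1 and 2) could then be sketched as follows. This is not from either answer: the sample data is made up, the windows are taken as inclusive of both crossover rows, and the names mins and higher_low are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'low':   [5, 4, 6, 3, 7, 8, 6, 9, 9, 7],
    'cross': [None, 'cross', None, None, 'cross',
              None, 'cross', None, None, 'cross'],
})

# Index labels of the rows flagged as crossovers.
cross_idx = df.index[df['cross'].eq('cross')]

# Minimum 'low' between each pair of consecutive crossovers (inclusive),
# stored at the later crossover of the pair.
mins = pd.Series(
    [df.loc[a:b, 'low'].min() for a, b in zip(cross_idx[:-1], cross_idx[1:])],
    index=cross_idx[1:],
)

# Compare each window's minimum with the minimum two windows earlier,
# i.e. MIN(cross 3..4) > MIN(cross 1..2) when standing at crossover 4.
higher_low = mins > mins.shift(2)
print(higher_low)
```

The first two entries are always False here, because there is no window two steps back to compare against.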

Python date function bugs

I am trying to create a function in python which will display the date. So I can see the program run, I have set one day to five seconds, so every five seconds it will become the next 'day' and it will print the date.
I know there is already a built-in function for displaying a date, however I am very new to python and I am trying to improve my skills (so excuse my poor coding).
I have set the starting date to the first of January, 2000.
Here is my code:
import time
def showDate():
    year = 00
    month = 1
    day = 1
    oneDay = 5
    longMonths = [1, 3, 5, 7, 8, 10, 12]
    shortMonths = [4, 6, 9, 11]
    while True:
        time.sleep(1)
        oneDay = oneDay - 1
        if oneDay == 0:
            if month in longMonths:
                if day > 31:
                    day = day + 1
                else:
                    month = month + 1
                    day = 0
            if month == 2:
                if day > 28:
                    day = day + 1
                else:
                    month = month + 1
                    day = 0
            if month in shortMonths:
                if day > 30:
                    day = day + 1
                else:
                    month = month + 1
                    day = 0
            if day == 31 and month == 12:
                year = year + 1
            print(str(day) + '/' + str(month) + '/' + str(year))
            oneDay = 5
showDate()
However, when I try to run the program, this is the output I get:
>>>
0/3/0
0/5/0
0/7/0
0/8/0
0/10/0
0/12/0
0/13/0
0/13/0
0/13/0
I don't know why this is happening, could someone please suggest a solution?
There's no possible path through your code where day gets incremented.
I think you are actually confused between > and <: you check if day is greater than 31 or 28, which it never is. I think you mean if day < 31: and so on.
First of all, it's easier to just set time.sleep(5) instead of looping over time.sleep(1) 5 times. It's better to have a list of values with days of the month, not just 2 lists of the long and short months. Also your while loop is currently indefinite, is that intentional?
Anyway, your main problem was comparing day > 31, but there's lots of things that can be improved. As I said, I'm removing the use of oneDay to just do sleep(5) as it's cleaner and having one daysInMonths list.
import time
def showDate():
    year = 00
    month = 1
    day = 1
    daysInMonths = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
Now you can have only one if check about if the day has reached the end of a month, like this:
    while True:
        time.sleep(5)
        if day < daysInMonths[month-1]:
            day += 1
This will check the index of the list for the current month. It uses -1 because lists begin at index 0, and your months begin at 1. (ie. the months run from 1-12 but the list's indices are 0-11). Also I used the += operator, which is basically short hand for var = var + something. It works the same and looks neater.
This test encompasses all months, and then the alternative scenario is that you need to increment the month. I recommend in this block that you first check if the month is 12 and then increment the year from there. Also you should be setting day and month back to 1, since that was their starting value. If it's not the end of the year, increment the month and set day back to 1.
        else:
            if month == 12:
                year += 1
                day = 1
                month = 1
            else:
                month += 1
                day = 1
        print("{}/{}/{}".format(day, month, year))
I also used the string.format syntax for neatness. With format, it will substitute the variables you pass in for {} in the string. It makes it easier to lay out how the string should actually look, and it converts the variables to string format implicitly.
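To check the rollover logic without waiting on time.sleep, the same steps can be pulled into a pure function. This is a sketch rather than part of the answer: the names next_day and DAYS_IN_MONTHS are made up, and leap years are ignored, as in the code above.

```python
# Days per month, January to December (leap years ignored, as in the answer).
DAYS_IN_MONTHS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def next_day(day, month, year):
    """Return the (day, month, year) that follows the given date."""
    if day < DAYS_IN_MONTHS[month - 1]:
        return day + 1, month, year
    if month == 12:
        # End of December: roll over to 1 January of the next year.
        return 1, 1, year + 1
    # End of any other month: first day of the next month.
    return 1, month + 1, year


date = (30, 12, 0)
for _ in range(3):
    date = next_day(*date)
    print("{}/{}/{}".format(*date))
```

Inside showDate, the loop body would then reduce to sleeping, calling next_day, and printing.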
Try this.
The day comparisons should be <, not >. When going to the next month, I set the day to 1, because there are no days 0 in the calendar. And I use elif for the subsequent month tests, because all the cases are exclusive.
import time
def showDate():
    year = 00
    month = 1
    day = 1
    oneDay = 5
    longMonths = [1, 3, 5, 7, 8, 10, 12]
    shortMonths = [4, 6, 9, 11]
    while True:
        time.sleep(1)
        oneDay = oneDay - 1
        if oneDay == 0:
            if month in longMonths:
                if day < 31:
                    day = day + 1
                else:
                    month = month + 1
                    day = 1
            elif month == 2:
                if day < 28:
                    day = day + 1
                else:
                    month = month + 1
                    day = 1
            elif month in shortMonths:
                if day < 30:
                    day = day + 1
                else:
                    month = month + 1
                    day = 1
            # After 31 December the month counter reaches 13: start a new year
            if month > 12:
                year = year + 1
                month = 1
            print(str(day) + '/' + str(month) + '/' + str(year))
            oneDay = 5
