I'm trying to write a script to randomise a round-robin schedule for a tournament.
The constraints are:
8 Teams
Teams face each other twice, once at home and once away
14 weeks, one game for each team per week
My code works in theory, but when I run it, it sometimes freezes on a week where only two teams are left and both possible games between them have already been played. I use a numpy array to check which matchups have been played.
At the moment my code looks like this:
import random
import numpy
regular_season_games = 14
regular_season_week = 0
checker = numpy.full((8,8), 0)
for x in range (0,8):
    checker[x][x] = 1
teams_left = list(range(8))
print ("Week " + str(regular_season_week+1))
while (regular_season_week < regular_season_games):
    game_set = False
    get_away_team = False
    while get_away_team == False:
        Team_A = random.choice(teams_left)
        if 0 in checker[:,Team_A]:
            for x in range (0,8):
                if checker[x][Team_A] == 0 and x in teams_left:
                    teams_left.remove(Team_A)
                    get_away_team = True
                    break
    while game_set == False:
        Team_B = random.choice(teams_left)
        if checker[Team_B][Team_A] == 0:
            teams_left.remove(Team_B)
            print(str(Team_A) + " vs " + str(Team_B))
            checker[Team_B][Team_A] = 1
            game_set = True
    if not teams_left:
        print ("Week " + str(regular_season_week+2))
        teams_left = list(range(8))
        regular_season_week = regular_season_week + 1
I've used an adaptation of the scheduling algorithm from here to achieve this. Basically, we generate a list of the teams - list(range(8)) - and choose as our initial matchups 0 vs 4, 1 vs 5, 2 vs 6, 3 vs 7. We then rotate the list, keeping the first element fixed, and read off the next week's matchups the same way; with the rotation implemented below that gives 0 vs 5, 4 vs 6, 1 vs 7, 2 vs 3. We continue in this way until we have every pairing.
I've added a handler for home & away matches - if a pairing has already been played, we play the opposite home/away pairing. Below is the code, including a function to check if a list of games is valid, and a sample output.
Code:
import random
# Generator function for list of matchups from a team_list
def games_from_list(team_list):
    for i in range(4):
        yield team_list[i], team_list[i+4]
# Function to apply rotation to list of teams as described in article
def rotate_list(team_list):
    team_list = [team_list[4]] + team_list[0:3] + team_list[5:8] + [team_list[3]]
    team_list[0], team_list[1] = team_list[1], team_list[0]
    return team_list
# Function to check if a list of games is valid
def checkValid(game_list):
    if len(set(game_list)) != len(game_list):
        return False
    for week in range(14):
        teams = set()
        this_week_games = game_list[week*4:week*4 + 4]
        for game in this_week_games:
            teams.add(game[0])
            teams.add(game[1])
        if len(teams) < 8:
            return False
    else:
        return True
# Generate list of teams & empty list of games played
teams = list(range(8))
games_played = []
# Optionally shuffle teams before generating schedule
random.shuffle(teams)
# For each week -
for week in range(14):
    print(f"Week {week + 1}")
    # Get all the pairs of games from the list of teams.
    for pair in games_from_list(teams):
        # If the matchup has already been played:
        if pair in games_played:
            # Play the opposite match
            pair = pair[::-1]
        # Print the matchup and append to list of games.
        print(f"{pair[0]} vs {pair[1]}")
        games_played.append(pair)
    # Rotate the list of teams
    teams = rotate_list(teams)
# Checks that the list of games is valid
print(checkValid(games_played))
Sample Output:
Week 1
0 vs 7
4 vs 3
6 vs 1
5 vs 2
Week 2
0 vs 3
7 vs 1
4 vs 2
6 vs 5
Week 3
0 vs 1
3 vs 2
7 vs 5
4 vs 6
Week 4
0 vs 2
1 vs 5
3 vs 6
7 vs 4
Week 5
0 vs 5
2 vs 6
1 vs 4
3 vs 7
Week 6
0 vs 6
5 vs 4
2 vs 7
1 vs 3
Week 7
0 vs 4
6 vs 7
5 vs 3
2 vs 1
Week 8
7 vs 0
3 vs 4
1 vs 6
2 vs 5
Week 9
3 vs 0
1 vs 7
2 vs 4
5 vs 6
Week 10
1 vs 0
2 vs 3
5 vs 7
6 vs 4
Week 11
2 vs 0
5 vs 1
6 vs 3
4 vs 7
Week 12
5 vs 0
6 vs 2
4 vs 1
7 vs 3
Week 13
6 vs 0
4 vs 5
7 vs 2
3 vs 1
Week 14
4 vs 0
7 vs 6
3 vs 5
1 vs 2
True
Related
I am working on a small task in which I have to find the distance between two nodes. Each node has X and Y coordinates which can be seen below.
    node_number  X_coordinate  Y_coordinate
0             0             1             0
1             1             1             1
2             2             1             2
3             3             1             3
4             4             0             3
5             5             0             4
6             6             1             4
7             7             2             4
8             8             3             4
9             9             4             4
10           10             4             3
11           11             3             3
12           12             2             3
13           13             2             2
14           14             2             1
15           15             2             0
For this purpose I wrote the code below:
X1_coordinate = df['X_coordinate'].tolist()
Y1_coordinate = df['Y_coordinate'].tolist()
node_number1 = df['node_number'].tolist()
nodal_dist = []
i = 0
for i in range(len(node_number1)):
    dist = math.sqrt((X1_coordinate[i+1] - X1_coordinate[i])**2 + (Y1_coordinate[i+1] - Y1_coordinate[i])**2)
    nodal_dist.append(dist)
I got the error
list index out of range
Kindly let me know what I am doing wrong and what should I change to get the answer.
Indexing starts at zero, so the last element of a list has an index one less than the number of elements in that list. len() returns the number of elements (in other words, it counts from 1), and the loop body reads index i+1, so the loop should run over range(len(node_number1) - 1) to avoid an off-by-one error.
The problem is in this line:
dist = math.sqrt((X1_coordinate[i+1] - X1_coordinate[i])**2 + (Y1_coordinate[i+1] - Y1_coordinate[i])**2)
On the last iteration, X1_coordinate[i+1] and Y1_coordinate[i+1] go out of range.
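To make the fix concrete, here is a minimal sketch of the corrected loop. The coordinate lists are typed in by hand from the table in the question (an assumption standing in for the df columns), and math.hypot is swapped in for the explicit square root purely for brevity.

```python
import math

# Coordinates copied from the question's table (16 nodes); these stand in
# for df['X_coordinate'].tolist() and df['Y_coordinate'].tolist().
x = [1, 1, 1, 1, 0, 0, 1, 2, 3, 4, 4, 3, 2, 2, 2, 2]
y = [0, 1, 2, 3, 3, 4, 4, 4, 4, 4, 3, 3, 3, 2, 1, 0]

nodal_dist = []
# Stop one short of the end so that i + 1 is always a valid index.
for i in range(len(x) - 1):
    nodal_dist.append(math.hypot(x[i + 1] - x[i], y[i + 1] - y[i]))

print(len(nodal_dist))  # one distance per pair of consecutive nodes
```

With 16 nodes there are 15 consecutive pairs, so the result list is one element shorter than the input lists.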
I have a dataframe of 12 different teams with their own statistics. My objective is to repeat an entire series of steps for one team, and so on, until the last team has been processed. My code currently correctly calculates statistics for only the first row of the dataframe. I want to repeat these lines of code for each row of the dataframe. I figured that a for loop would be the way to do so, but I'm struggling with the arguments to pass through. Any help would be appreciated, thank you.
import pandas as pd
stats = pd.read_csv('question2_data .csv')
print(stats)
team_count = 0
Output:
Team ID Wins Losses Ties
0 9867 4 2 3
1 1234 7 5 2
2 6213 9 7 0
3 1231 12 2 2
4 8821 2 7 7
5 1131 8 0 8
6 7761 10 3 3
7 6831 0 16 0
8 3131 16 0 0
9 3131 0 0 16
10 8424 0 0 0
11 4211 4 4 4
team_id = stats.iloc[0]['Team ID']
win_count = stats.iloc[0]['Wins']
loss_count = stats.iloc[0]['Losses']
tie_count = stats.iloc[0]['Ties']
print('Team', team_id)
print(win_count, 'Wins', loss_count, 'Losses', tie_count, 'Ties')
game_count = win_count + loss_count + tie_count
remaining_games_count = 16 - game_count
if (game_count == 16):
    print('Games played:', game_count, 'The teams season is finished')
elif (game_count < 16):
    print('Games played:', game_count, 'Games remaining:', remaining_games_count)
win_avg = round((win_count/game_count), 4)
print('Winning average:', win_avg)
if (tie_count >= win_count):
    print('Games tied are greater than or equal to games won')
else:
    print('Games tied are not greater than or equal to games won')
if (tie_count > loss_count):
    print('Games tied are greater than games lost')
else:
    print('Games tied are not greater than games lost')
wip_tot = win_count + tie_count - (loss_count*3)
if (wip_tot%2==0):
    wip_tot = 0
print('WIP total:', wip_tot)
Your code calculates the statistics for the first row only because you're using stats.iloc[0]. Just replace the 0 with the loop variable of a for loop:
for i in range(12):
    team_id = stats.iloc[i]['Team ID']
    win_count = stats.iloc[i]['Wins']
    loss_count = stats.iloc[i]['Losses']
    tie_count = stats.iloc[i]['Ties']
    etc...
You can use stats.shape[0] to get the number of rows instead of hard-coding 12.
Bonus: There's a pd.DataFrame.apply() function if you want to get each statistic in a new column.
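As a rough sketch of the same idea without positional indexing, the per-team steps can be wrapped in a function and run once per row. The small DataFrame below is a hand-typed stand-in for the CSV, and team_summary is a hypothetical helper name, not something from the question. The guard on games is an addition: team 8424 has played no games, so the winning average in the original code would divide by zero for that row.

```python
import pandas as pd

# Hand-typed stand-in for a few rows of the question's CSV.
stats = pd.DataFrame({
    "Team ID": [9867, 1234, 6213, 8424],
    "Wins": [4, 7, 9, 0],
    "Losses": [2, 5, 7, 0],
    "Ties": [3, 2, 0, 0],
})

def team_summary(row):
    """Compute the question's per-team figures for a single row."""
    games = int(row["Wins"] + row["Losses"] + row["Ties"])
    return {
        "team_id": int(row["Team ID"]),
        "games_played": games,
        "games_remaining": 16 - games,
        # Teams with no games yet get None instead of a division by zero.
        "win_avg": round(row["Wins"] / games, 4) if games else None,
    }

# iterrows() yields (index, row) pairs, one per team.
summaries = [team_summary(row) for _, row in stats.iterrows()]
for summary in summaries:
    print(summary)
```

The remaining checks from the question (ties versus wins, the WIP total) would go inside the same function in the same way.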
I have a DataFrame that contains gas concentrations and the corresponding valve number. This data was taken continuously where we switched the valves back and forth (valves=1 or 2) for a certain amount of time to get 10 cycles for each valve value (20 cycles total). A snippet of the data looks like this (I have 2,000+ points and each valve stayed on for about 90 seconds each cycle):
gas1 valveW time
246.9438 2 1
247.5367 2 2
246.7167 2 3
246.6770 2 4
245.9197 1 5
245.9518 1 6
246.9207 1 7
246.1517 1 8
246.9015 1 9
246.3712 2 10
247.0826 2 11
... ... ...
My goal is to save the last N points of each valve's cycle. For example, the first cycle where valve=1, I want to index and save the last N points from the end before the valve switches to 2. I would then save the last N points and average them to find one value to represent that first cycle. Then I want to repeat this step for the second cycle when valve=1 again.
I am currently converting from Matlab to Python so here is the Matlab code that I am trying to translate:
% NOAA high
n2o_noaaHigh = [];
co2_noaaHigh = [];
co_noaaHigh = [];
h2o_noaaHigh = [];
ind_noaaHigh_end = zeros(1,length(t_c));
numPoints = 40;
for i = 1:length(valveW_c)-1
    if (valveW_c(i) == 1 && valveW_c(i+1) ~= 1)
        test = (i-numPoints):i;
        ind_noaaHigh_end(test) = 1;
        n2o_noaaHigh = [n2o_noaaHigh mean(n2o_c(test))];
        co2_noaaHigh = [co2_noaaHigh mean(co2_c(test))];
        co_noaaHigh = [co_noaaHigh mean(co_c(test))];
        h2o_noaaHigh = [h2o_noaaHigh mean(h2o_c(test))];
    end
end
ind_noaaHigh_end = logical(ind_noaaHigh_end);
This is what I have so far for Python:
# NOAA high
n2o_noaaHigh = [];
co2_noaaHigh = [];
co_noaaHigh = [];
h2o_noaaHigh = [];
t_c_High = []; # time
for i in range(len(valveW_c)):
    # NOAA HIGH
    if (valveW_c[i] == 1):
        t_c_High.append(t_c[i])
        n2o_noaaHigh.append(n2o_c[i])
        co2_noaaHigh.append(co2_c[i])
        co_noaaHigh.append(co_c[i])
        h2o_noaaHigh.append(h2o_c[i])
Thanks in advance!
I'm not sure if I understood correctly, but I guess this is what you are looking for:
# First we create a column to show cycles:
df['cycle'] = (df.valveW.diff() != 0).cumsum()
print(df)
gas1 valveW time cycle
0 246.9438 2 1 1
1 247.5367 2 2 1
2 246.7167 2 3 1
3 246.677 2 4 1
4 245.9197 1 5 2
5 245.9518 1 6 2
6 246.9207 1 7 2
7 246.1517 1 8 2
8 246.9015 1 9 2
9 246.3712 2 10 3
10 247.0826 2 11 3
Now you can use the groupby method to get the average of the last n points of each cycle:
n = 3 #we assume this is n
df.groupby('cycle').apply(lambda x: x.iloc[-n:, 0].mean())
Output:
cycle 0
1 246.9768
2 246.6579
3 246.7269
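If only the valve=1 cycles are wanted, as in the Matlab code, one possible extension is to carry the valve value along with each cycle and filter afterwards. This is a sketch on the snippet from the question; the column name gas1_last_n_mean is made up for illustration.

```python
import pandas as pd

# The snippet from the question, typed in by hand.
df = pd.DataFrame({
    "gas1": [246.9438, 247.5367, 246.7167, 246.6770, 245.9197, 245.9518,
             246.9207, 246.1517, 246.9015, 246.3712, 247.0826],
    "valveW": [2, 2, 2, 2, 1, 1, 1, 1, 1, 2, 2],
    "time": list(range(1, 12)),
})

n = 3  # number of trailing points to average per cycle

# A new cycle starts wherever valveW differs from the previous row.
df["cycle"] = (df["valveW"] != df["valveW"].shift()).cumsum()

per_cycle = df.groupby("cycle").agg(
    valveW=("valveW", "first"),                              # valve active in this cycle
    gas1_last_n_mean=("gas1", lambda s: s.tail(n).mean()),   # mean of the last n points
)

# Keep only the cycles where valve 1 was on.
valve1_means = per_cycle.loc[per_cycle["valveW"] == 1, "gas1_last_n_mean"]
print(valve1_means)
```

Note that tail(n) silently uses fewer than n points when a cycle is shorter than n, which is what happens for the truncated last cycle in this snippet.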
Let's call your DataFrame df; then you could do:
results = {}
for k, v in df.groupby((df['valveW'].shift() != df['valveW']).cumsum()):
    results[k] = v
    print(f'[group {k}]')
    print(v)
shift(), as the name suggests, shifts the valve column by one row, so comparing it with the original column detects where the value changes. cumsum() then gives a unique number to each run of identical values. We can then do a groupby() on this numbering (grouping on valveW directly would not work, because it would only give two groups: the ones and the twos).
which gives e.g. for your code snippet (saved in results):
[group 1]
gas1 valveW time
0 246.9438 2 1
1 247.5367 2 2
2 246.7167 2 3
3 246.6770 2 4
[group 2]
gas1 valveW time
4 245.9197 1 5
5 245.9518 1 6
6 246.9207 1 7
7 246.1517 1 8
8 246.9015 1 9
[group 3]
gas1 valveW time
9 246.3712 2 10
10 247.0826 2 11
Then to get the mean for each cycle; you could e.g. do:
df.groupby((df['valveW'].shift() != df['valveW']).cumsum()).mean()
which gives (again for your code snippet):
gas1 valveW time
valveW
1 246.96855 2.0 2.5
2 246.36908 1.0 7.0
3 246.72690 2.0 10.5
where you wouldn't care much about the time mean but the gas1 one!
Then, based on results you could e.g. do:
import numpy as np
n = 3
mean_n_last = []
for k, v in results.items():
    if len(v) < n:
        mean_n_last.append(np.nan)
    else:
        mean_n_last.append(np.nanmean(v.iloc[len(v) - n:, 0]))
which gives [246.9768, 246.65796666666665, nan] for n = 3 !
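If your dataframe is sorted by time, you can get the last N records for each valve like this (note that this takes the last N rows per valve over the whole dataset, not per cycle):
N=2
valve1 = df[df['valveW']==1].iloc[-N:,:]
valve2 = df[df['valveW']==2].iloc[-N:,:]
If it isn't currently sorted, you can sort it like this (sort_values returns a new DataFrame, so assign the result back):
df = df.sort_values(by=['time'])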
I am trying to set up a recursive game solver (for the Cracker Barrel peg game). The recursive function does not appear to be operating correctly, and some outputs are created with no trace of how they were created (despite logging all steps). Is it possible that the Python recursion steps are overwriting each other?
I have already tried adding print statements at every step. The game rules and algorithms work correctly, but the recursive play algorithm is not operating as expected.
def recursive_play(board, moves_list, move_history, id, first_trial, recurse_counter):
    # Check how many moves are left
    tacks_left = len(char_locations(board, character=tack, grid=True))
    log_and_print(f"tacks_left: {tacks_left}")
    log_and_print(f"moves_left: {len(moves_list)}")
    log_and_print(f"moves_list: {moves_list}")
    if (len(moves_list) == 0):
        if (tacks_left == 1):
            # TODO: Remove final move separator
            log_and_print(f"ONE TACK LEFT :)!!!!")
            log_and_print(f"move_history to retrun for win: {move_history}")
            return move_history
        pass
    elif (len(moves_list) > 0):
        # Scan through all moves and make them recursively
        for move in moves_list:
            if first_trial:
                id += 1
            else:
                # id += 1
                id = id
            next_board = make_move(board, move)
            next_moves = possible_moves(next_board)
            if first_trial:
                next_history = "START: " + move
            else:
                next_history = move_history + round_separator + move
            # log_and_print(f"og_board:")
            prettify_board(board)
            log_and_print(f"move: {move}")
            log_and_print(f"next_board:")
            prettify_board(next_board)
            # log_and_print(f"next_moves: {next_moves}")
            log_and_print(f"next_history: {next_history}")
            log_and_print(f"id: {id}")
            log_and_print(f"recurse_counter: {recurse_counter}")
            # NOTE: Would this be cleaner with queues?
            recursive_play(next_board, moves_list=next_moves, move_history=next_history, id=id, first_trial=False, recurse_counter=recurse_counter+1)
        log_and_print(f"finished scanning all moves for board: {board}")
I expect all steps to be logged, and "START" should only occur on the first trial. However, a mysterious "START" appears in a later step with no trace of how that board was created.
Good Output:
INFO:root:next_history: START: 4 to 2 to 1 , 6 to 5 to 4 , 1 to 3 to 6 , 7 to 4 to 2
INFO:root:id: 1
INFO:root:recurse_counter: 3
INFO:root:tacks_left: 5
INFO:root:moves_left: 2
INFO:root:moves_list: ['9 to 8 to 7', '10 to 6 to 3']
INFO:root:o---
INFO:root:xo--
INFO:root:oox-
INFO:root:xoox
INFO:root:move: 9 to 8 to 7
INFO:root:next_board:
INFO:root:o---
INFO:root:xo--
INFO:root:oox-
INFO:root:xoox
INFO:root:next_history: START: 4 to 2 to 1 , 6 to 5 to 4 , 1 to 3 to 6 , 7 to 4 to 2 , 9 to 8 to 7
INFO:root:id: 1
INFO:root:recurse_counter: 4
INFO:root:tacks_left: 4
INFO:root:moves_left: 1
INFO:root:moves_list: ['10 to 6 to 3']
INFO:root:o---
INFO:root:xx--
INFO:root:ooo-
INFO:root:xooo
INFO:root:move: 10 to 6 to 3
INFO:root:next_board:
INFO:root:o---
INFO:root:xx--
INFO:root:ooo-
INFO:root:xooo
INFO:root:next_history: START: 4 to 2 to 1 , 6 to 5 to 4 , 1 to 3 to 6 , 7 to 4 to 2 , 9 to 8 to 7 , 10 to 6 to 3
Bad Output:
INFO:root:move: 6 to 3 to 1
INFO:root:next_board:
INFO:root:x---
INFO:root:xo--
INFO:root:ooo-
INFO:root:oooo
INFO:root:next_history: START: 6 to 3 to 1
INFO:root:id: 2
INFO:root:recurse_counter: 0
INFO:root:tacks_left: 2
INFO:root:moves_left: 1
INFO:root:moves_list: ['1 to 2 to 4']
INFO:root:o---
INFO:root:oo--
INFO:root:xoo-
INFO:root:oooo
INFO:root:move: 1 to 2 to 4
INFO:root:next_board:
INFO:root:o---
INFO:root:oo--
INFO:root:xoo-
INFO:root:oooo
INFO:root:next_history: START: 6 to 3 to 1 , 1 to 2 to 4
INFO:root:id: 2
INFO:root:recurse_counter: 1
INFO:root:tacks_left: 1
INFO:root:moves_left: 0
INFO:root:moves_list: []
INFO:root:ONE TACK LEFT :)!!!!
INFO:root:move_history to retrun for win: START: 6 to 3 to 1 , 1 to 2 to 4
INFO:root:finished scanning all moves for board: ['o---', 'oo--', 'xoo-', 'oooo']
Any tips anyone can provide would be greatly appreciated.
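One possible reading of the "bad" output, offered as a guess from the posted code rather than a confirmed diagnosis: each recursive call has its own local variables, so calls do not overwrite each other, but the top-level call (first_trial=True) loops over every opening move. Once the whole subtree under the first opening move has been explored, control returns to that top-level loop, which then logs a fresh "START: ..." for the second opening move with recurse_counter 0 and id 2. The toy function below is hypothetical (explore and moves_by_depth are not from the question) and only mimics how the history string is built.

```python
def explore(moves_by_depth, history="", first_trial=True, depth=0, log=None):
    """Toy stand-in for recursive_play: records (depth, history) per move."""
    log = [] if log is None else log
    if depth == len(moves_by_depth):
        return log  # no moves left at this depth
    for move in moves_by_depth[depth]:
        # Same rule as in the question: only the top-level call prefixes START.
        next_history = "START: " + move if first_trial else history + " , " + move
        log.append((depth, next_history))
        explore(moves_by_depth, next_history, False, depth + 1, log)
    return log

# Two opening moves ("a", "b"), each followed by a single reply ("c").
log = explore([["a", "b"], ["c"]])
for depth, history in log:
    print(depth, history)
```

The second "START" entry shows up after a deeper entry and at depth 0, which matches the pattern in the logs: it comes from the outermost loop moving on to its next move, not from state being overwritten. Separately, the return value of the recursive call is discarded in the posted code, so a winning move_history never propagates back to the original caller.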
I'm trying to parse a logfile of our manufacturing process. Most of the time the process is run automatically but occasionally, the engineer needs to switch into manual mode to make some changes and then switches back to automatic control by the reactor software. When set to manual mode the logfile records the step as being "MAN.OP." instead of a number. Below is a representative example.
steps = [1,2,2,'MAN.OP.','MAN.OP.',2,2,3,3,'MAN.OP.','MAN.OP.',4,4]
ser_orig = pd.Series(steps)
which results in
0 1
1 2
2 2
3 MAN.OP.
4 MAN.OP.
5 2
6 2
7 3
8 3
9 MAN.OP.
10 MAN.OP.
11 4
12 4
dtype: object
I need to detect the 'MAN.OP.' and make them distinct from each other. In this example, the two regions with values == 2 should be one region after detecting the manual mode section like this:
0 1
1 2
2 2
3 Manual_Mode_0
4 Manual_Mode_0
5 2
6 2
7 3
8 3
9 Manual_Mode_1
10 Manual_Mode_1
11 4
12 4
dtype: object
I have code that iterates over this series and produces the correct result when the series is passed to my object. The setter is:
#step_series.setter
def step_series(self, ss):
    """
    On assignment, give the manual mode steps a unique name. Leave
    the steps done on recipe the same.
    """
    manual_mode = "MAN.OP."
    new_manual_mode_text = "Manual_Mode_{}"
    counter = 0
    continuous = False
    for i in ss.index:
        if continuous and ss.at[i] != manual_mode:
            continuous = False
            counter += 1
        elif not continuous and ss.at[i] == manual_mode:
            continuous = True
            ss.at[i] = new_manual_mode_text.format(str(counter))
        elif continuous and ss.at[i] == manual_mode:
            ss.at[i] = new_manual_mode_text.format(str(counter))
    self._step_series = ss
but this iterates over the entire dataframe and is the slowest part of my code other than reading the logfile over the network.
How can I detect these non-unique sections and rename them uniquely without iterating over the entire series? The series is a column selection from a larger dataframe so adding extra columns is fine if needed.
For the completed answer I ended up with:
#step_series.setter
def step_series(self, ss):
    pd.options.mode.chained_assignment = None
    manual_mode = "MAN.OP."
    new_manual_mode_text = "Manual_Mode_{}"
    newManOp = (ss=='MAN.OP.') & (ss != ss.shift())
    ss[ss == 'MAN.OP.'] = 'Manual_Mode_' + (newManOp.cumsum()-1).astype(str)
    self._step_series = ss
Here's one way:
steps = [1,2,2,'MAN.OP.','MAN.OP.',2,2,3,3,'MAN.OP.','MAN.OP.',4,4]
steps = pd.Series(steps)
newManOp = (steps=='MAN.OP.') & (steps != steps.shift())
steps[steps=='MAN.OP.'] += newManOp.cumsum().astype(str)
>>> steps
0 1
1 2
2 2
3 MAN.OP.1
4 MAN.OP.1
5 2
6 2
7 3
8 3
9 MAN.OP.2
10 MAN.OP.2
11 4
12 4
dtype: object
To get the exact format you listed (starting from zero instead of one, and changing from "MAN.OP." to "Manual_Mode_"), just tweak the last line:
steps[steps=='MAN.OP.'] = 'Manual_Mode_' + (newManOp.cumsum()-1).astype(str)
>>> steps
0 1
1 2
2 2
3 Manual_Mode_0
4 Manual_Mode_0
5 2
6 2
7 3
8 3
9 Manual_Mode_1
10 Manual_Mode_1
11 4
12 4
dtype: object
There is a pandas enhancement request for contiguous groupby, which would make this type of task simpler.
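For reference, here is a self-contained variant of the shift/cumsum idea that leaves the original series untouched. It is only a sketch: the names is_manual, starts_block, block_id and labelled are made up for illustration, and Series.where is swapped in for the in-place boolean assignment used above.

```python
import pandas as pd

steps = pd.Series(
    [1, 2, 2, "MAN.OP.", "MAN.OP.", 2, 2, 3, 3, "MAN.OP.", "MAN.OP.", 4, 4],
    dtype=object,
)

is_manual = steps == "MAN.OP."
# A manual block starts where this row is manual and the previous row is not.
starts_block = is_manual & ~is_manual.shift(fill_value=False)
# Running count of block starts, shifted so the first block is numbered 0.
block_id = starts_block.cumsum() - 1

# Keep non-manual rows as they are; replace manual rows with a numbered label.
labelled = steps.where(~is_manual, "Manual_Mode_" + block_id.astype(str))
print(labelled)
```

Because where returns a new Series, steps itself is not modified, which sidesteps the chained-assignment warning when the series is a column selected from a larger dataframe.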
There is a function in matplotlib that takes a boolean array and returns a list of (start, end) pairs. Each pair represents a contiguous region where the input is True. Depending on your matplotlib version, this helper may live in matplotlib.cbook rather than matplotlib.mlab.
import matplotlib.mlab as mlab
regions = mlab.contiguous_regions(ser_orig == manual_mode)
for i, (start, end) in enumerate(regions):
    ser_orig[start:end] = new_manual_mode_text.format(i)
ser_orig
0 1
1 2
2 2
3 Manual_Mode_0
4 Manual_Mode_0
5 2
6 2
7 3
8 3
9 Manual_Mode_1
10 Manual_Mode_1
11 4
12 4
dtype: object