How to create optimized schedule that avoids duplicates [closed] - python

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 12 months ago.
I've got a list of games between teams that take place over a sixteen-day period:
| Date | Game |
|------|-----------------------------|
| 1 | hot ice vs playerz |
| 1 | caps vs quiet storm |
| 1 | slick ice vs blizzard |
| 1 | flow vs 4x4's |
| 2 | avalanche vs cold force |
| 2 | freeze vs in too deep |
| 2 | game spot vs rare air |
| 2 | out of order vs cold as ice |
| 3 | playerz vs avalanche |
| 3 | quiet storm vs freeze |
| 3 | blizzard vs game spot |
| 3 | 4x4's vs out of order |
| 14 | freeze vs avalanche |
| 14 | out of order vs game spot |
| 14 | in too deep vs cold force |
| 14 | cold as ice vs rare air |
| 15 | blizzard vs quiet storm |
| 15 | playerz vs 4x4's |
| 15 | slick ice vs caps |
| 15 | hot ice vs flow |
| 16 | game spot vs freeze |
| 16 | avalanche vs out of order |
| 16 | rare air vs in too deep |
| 16 | cold force vs cold as ice |
There are 16 teams in this schedule. In Python, I'd like to find all of the 8-game combinations that allow me to "see" each team once. The only limitation is that I can only see one game per day. So far, all I can think of is a ton of nested for loops that generate every possible schedule, followed by a check of each one to see whether it is valid. A valid schedule is one that has one game per date and sees each team once.
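The "generate everything, then filter" idea doesn't need hand-written nested loops. `itertools.product` yields one combination per choice of game on each date. Below is a minimal sketch. It assumes the games have been typed into a dict mapping date → list of (team, team) tuples (`GAMES` and `brute_force_schedules` are illustrative names). With only the six dates listed in the table, it finds 6-game schedules. The approach stays practical only while the number of combinations is small (4^6 = 4096 here).

```python
from itertools import product

# Games from the question's table: date -> list of (team, team) pairs
GAMES = {
    1: [("hot ice", "playerz"), ("caps", "quiet storm"), ("slick ice", "blizzard"), ("flow", "4x4's")],
    2: [("avalanche", "cold force"), ("freeze", "in too deep"), ("game spot", "rare air"), ("out of order", "cold as ice")],
    3: [("playerz", "avalanche"), ("quiet storm", "freeze"), ("blizzard", "game spot"), ("4x4's", "out of order")],
    14: [("freeze", "avalanche"), ("out of order", "game spot"), ("in too deep", "cold force"), ("cold as ice", "rare air")],
    15: [("blizzard", "quiet storm"), ("playerz", "4x4's"), ("slick ice", "caps"), ("hot ice", "flow")],
    16: [("game spot", "freeze"), ("avalanche", "out of order"), ("rare air", "in too deep"), ("cold force", "cold as ice")],
}

def brute_force_schedules(games):
    """Try every one-game-per-date combination and keep those with no repeated team."""
    days = sorted(games)
    valid = []
    # product() stands in for one nested for loop per date
    for combo in product(*(games[d] for d in days)):
        teams = [team for game in combo for team in game]
        if len(teams) == len(set(teams)):  # every team seen at most once
            valid.append(list(zip(days, combo)))
    return valid

schedules = brute_force_schedules(GAMES)
print(len(schedules))  # 32 with the data above
print(schedules[0])
```

Generating everything first is simple to reason about, but the work grows as (games per day)^(number of days). Pruning partial schedules as soon as a team repeats, as the backtracking answer below does, avoids that blow-up.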

You could use a backtracking algorithm to iterate through different combinations of matches and filter them according to the constraints you mentioned.
The first step would be to put your data into a Python collection such as a list or dict. Then implement a recursive backtracking algorithm that selects one match per day and checks that the chosen match doesn't include any team you have already selected.
Here is a rough example that uses the data you provided in your question:
def combinations(matches, day, schedules, current):
    """Backtracking function for selecting unique schedules."""
    # base case: a match has been chosen for every day
    if day > max(matches.keys()):
        schedules.append(current[:])
        return
    # skip over days where there are no matches
    while day not in matches:
        day += 1
    # teams already used in the current partial schedule
    current_teams = {team for match in current for team in match}
    # select one match for the current date
    for teams in matches[day]:
        # skip matches involving a team that is already in the schedule
        if teams[0] in current_teams or teams[1] in current_teams:
            continue
        # recursive case: move on to the next day
        combinations(matches, day + 1, schedules, current + [teams])
def format_matches(inp):
    """Parses the input table into a dict and runs the search."""
    lines = inp.split("\n")[2:]  # skip the header and separator rows
    matches = [line.split("|")[1:-1] for line in lines if line.strip()]
    schedule = {}
    # add matches to dict with date as key and list of matches as value
    for day, match in matches:
        day = int(day.strip())
        teams = match.strip().split(" vs ")
        schedule.setdefault(day, []).append(teams)
    ideal = []
    # use backtracking algorithm to get desired results
    combinations(schedule, 1, ideal, [])
    show_schedules(ideal)
def show_schedules(results):
    for i, x in enumerate(results):
        print(f"Schedule {i+1}")
        for day, match in enumerate(x):
            print(f"Day: {day+1} - {match[0]} vs. {match[1]}")
        print("\n")
format_matches(inp)  # <- entry point: `inp` is the pre-formatted data `str`
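It's not exactly the most elegant code... :) With the example data, this algorithm generates 32 unique schedules of 6 games (the table lists only six dates). Note that "Day" in the output is the position within the schedule (1 to 6), not the original date. The output looks something like this (excerpt):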
Schedule 1
Day: 1 - hot ice vs. playerz
Day: 2 - avalanche vs. cold force
Day: 3 - quiet storm vs. freeze
Day: 4 - out of order vs. game spot
Day: 5 - slick ice vs. caps
Day: 6 - rare air vs. in too deep
Schedule 2
Day: 1 - hot ice vs. playerz
Day: 2 - avalanche vs. cold force
Day: 3 - 4x4's vs. out of order
Day: 4 - cold as ice vs. rare air
Day: 5 - blizzard vs. quiet storm
Day: 6 - game spot vs. freeze
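For comparison, here is a more compact sketch of the same backtracking idea, written as a generator. It assumes the same date → list of (team, team) tuples layout. `GAMES` and `backtrack` are illustrative names, not taken from the answer above. The sketch keeps the real date next to each game and passes the set of already-seen teams down the recursion instead of mutating the input.

```python
# Games from the question's table: date -> list of (team, team) pairs
GAMES = {
    1: [("hot ice", "playerz"), ("caps", "quiet storm"), ("slick ice", "blizzard"), ("flow", "4x4's")],
    2: [("avalanche", "cold force"), ("freeze", "in too deep"), ("game spot", "rare air"), ("out of order", "cold as ice")],
    3: [("playerz", "avalanche"), ("quiet storm", "freeze"), ("blizzard", "game spot"), ("4x4's", "out of order")],
    14: [("freeze", "avalanche"), ("out of order", "game spot"), ("in too deep", "cold force"), ("cold as ice", "rare air")],
    15: [("blizzard", "quiet storm"), ("playerz", "4x4's"), ("slick ice", "caps"), ("hot ice", "flow")],
    16: [("game spot", "freeze"), ("avalanche", "out of order"), ("rare air", "in too deep"), ("cold force", "cold as ice")],
}

def backtrack(games, days, seen=frozenset(), chosen=()):
    """Yield every schedule with one game per date and no team seen twice."""
    if not days:  # a game has been chosen for every date
        yield list(chosen)
        return
    day, remaining = days[0], days[1:]
    for game in games[day]:
        if seen.isdisjoint(game):  # neither team has been seen yet
            # recurse with this game added; `seen` itself is never modified
            yield from backtrack(games, remaining, seen | set(game), chosen + ((day, game),))

schedules = list(backtrack(GAMES, sorted(GAMES)))
print(len(schedules))  # 32 with the data above
for day, (home, away) in schedules[0]:
    print(f"Day {day}: {home} vs. {away}")
```

Dates with no games are simply absent from the dict, so this version doesn't need to skip empty days. A branch is abandoned as soon as a team would repeat.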
For more information on backtracking, see the external resources below; there are also countless examples here on Stack Overflow.
https://www.hackerearth.com/practice/basic-programming/recursion/recursion-and-backtracking/tutorial/
http://jeffe.cs.illinois.edu/teaching/algorithms/book/02-backtracking.pdf

Related

How to write a Function in python pandas to append the rows in dataframe in a loop?

I have been given a dataset and I am writing a function.
My objective is quite simple. I have an Airbnb database with various columns. I am using a for loop over a list of neighbourhood groups (that I created), and I am trying to extract the data related to each element and append it to an empty dataframe.
Example:
import pandas as pd
import numpy as np
dict1 = {'id' : [2539,2595,3647,3831,12937,18198,258838,258876,267535,385824],'name':['Clean & quiet apt home by the park','Skylit Midtown Castle','THE VILLAGE OF HARLEM....NEW YORK !','Cozy Entire Floor of Brownstone','1 Stop fr. Manhattan! Private Suite,Landmark Block','Little King of Queens','Oceanview,close to Manhattan','Affordable rooms,all transportation','Home Away From Home-Room in Bronx','New York City- Riverdale Modern two bedrooms unit'],'price':[149,225,150,89,130,70,250,50,50,120],'neighbourhood_group':['Brooklyn','Manhattan','Manhattan','Brooklyn','Queens','Queens','Staten Island','Staten Island','Bronx','Bronx']}
df = pd.DataFrame(dict1)
df
I created a function as follows
nbd_grp = ['Bronx','Queens','Staten Island','Brooklyn','Manhattan']
# Creating a function to find the cheapest place in neighbourhood group
dfdf = pd.DataFrame(columns = ['id','name','price','neighbourhood_group'])
def cheapest_place(neighbourhood_group):
    for elem in nbd_grp:
        data = df.loc[df['neighbourhood_group']==elem]
        cheapest = data.loc[data['price']==min(data['price'])]
        dfdf = cheapest.copy()
cheapest_place(nbd_grp)
My expected output is:
| id | name | price | neighbourhood_group |
|--------|-------------------------------------|-------|---------------|
| 267535 | Home Away From Home-Room in Bronx | 50 | Bronx |
| 18198 | Little King of Queens | 70 | Queens |
| 258876 | Affordable rooms,all transportation | 50 | Staten Island |
| 3831 | Cozy Entire Floor of Brownstone | 89 | Brooklyn |
| 3647 | THE VILLAGE OF HARLEM....NEW YORK ! | 150 | Manhattan |
My advice is that anytime you are working in a database or in a dataframe and you think "I need to loop", you should think again.
When in a dataframe you are in a world of set-based logic and there is likely a better set-based way of solving the problem. In your case you can groupby() your neighbourhood_group and get the min() of the price column and then merge or join that result set back to your original dataframe to get your id and name columns.
That would look something like:
df_min_price = df.groupby('neighbourhood_group').price.min().reset_index().merge(df, on=['neighbourhood_group','price'])
+-----+---------------------+-------+--------+-------------------------------------+
| idx | neighbourhood_group | price | id | name |
+-----+---------------------+-------+--------+-------------------------------------+
| 0 | Bronx | 50 | 267535 | Home Away From Home-Room in Bronx |
| 1 | Brooklyn | 89 | 3831 | Cozy Entire Floor of Brownstone |
| 2 | Manhattan | 150 | 3647 | THE VILLAGE OF HARLEM....NEW YORK ! |
| 3 | Queens | 70 | 18198 | Little King of Queens |
| 4 | Staten Island | 50 | 258876 | Affordable rooms,all transportation |
+-----+---------------------+-------+--------+-------------------------------------+
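Here is an alternative set-based sketch (not part of the answer above). `groupby(...).idxmin()` returns the row label of the cheapest listing in each group, and that result can go straight into `.loc` with no merge. Unlike the merge, it keeps only one row per group when several listings tie for the lowest price. The optional reordering step reuses the question's `nbd_grp` list, with the spelling 'Staten Island' as it appears in the data, to match the expected output order.

```python
import pandas as pd

# Data from the question
df = pd.DataFrame({
    'id': [2539, 2595, 3647, 3831, 12937, 18198, 258838, 258876, 267535, 385824],
    'name': ['Clean & quiet apt home by the park', 'Skylit Midtown Castle',
             'THE VILLAGE OF HARLEM....NEW YORK !', 'Cozy Entire Floor of Brownstone',
             '1 Stop fr. Manhattan! Private Suite,Landmark Block', 'Little King of Queens',
             'Oceanview,close to Manhattan', 'Affordable rooms,all transportation',
             'Home Away From Home-Room in Bronx', 'New York City- Riverdale Modern two bedrooms unit'],
    'price': [149, 225, 150, 89, 130, 70, 250, 50, 50, 120],
    'neighbourhood_group': ['Brooklyn', 'Manhattan', 'Manhattan', 'Brooklyn', 'Queens', 'Queens',
                            'Staten Island', 'Staten Island', 'Bronx', 'Bronx'],
})

# Row label of the cheapest listing per group (the first one if prices tie)
cheapest = df.loc[df.groupby('neighbourhood_group')['price'].idxmin()]

# Optional: order the groups as in the question's list and restore the column order
nbd_grp = ['Bronx', 'Queens', 'Staten Island', 'Brooklyn', 'Manhattan']
cheapest = (cheapest.set_index('neighbourhood_group')
            .loc[nbd_grp]
            .reset_index()[['id', 'name', 'price', 'neighbourhood_group']])
print(cheapest)
```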

Handle csv file with almost similar records but different times - need to group them as one record

I am attempting to solve the lab below and am having issues. The problem involves a CSV input, and the solution needs to meet the criteria listed below. Any help or tips at all would be appreciated. My code is at the end of the problem, along with my output.
Each row contains the title, rating, and all showtimes of a unique movie.
A space is placed before and after each vertical separator ('|') in each row.
Column 1 displays the movie titles and is left justified with a minimum of 44 characters.
If the movie title has more than 44 characters, output the first 44 characters only.
Column 2 displays the movie ratings and is right justified with a minimum of 5 characters.
Column 3 displays all the showtimes of the same movie, separated by a space.
This is the input:
16:40,Wonders of the World,G
20:00,Wonders of the World,G
19:00,End of the Universe,NC-17
12:45,Buffalo Bill And The Indians or Sitting Bull's History Lesson,PG
15:00,Buffalo Bill And The Indians or Sitting Bull's History Lesson,PG
19:30,Buffalo Bill And The Indians or Sitting Bull's History Lesson,PG
10:00,Adventure of Lewis and Clark,PG-13
14:30,Adventure of Lewis and Clark,PG-13
19:00,Halloween,R
This is the expected output:
Wonders of the World                         |     G | 16:40 20:00
End of the Universe                          | NC-17 | 19:00
Buffalo Bill And The Indians or Sitting Bull |    PG | 12:45 15:00 19:30
Adventure of Lewis and Clark                 | PG-13 | 10:00 14:30
Halloween                                    |     R | 19:00
My code so far:
import csv
rawMovies = input()
repeatList = []
with open(rawMovies, 'r') as movies:
    moviesList = csv.reader(movies)
    for movie in moviesList:
        time = movie[0]
        #print(time)
        show = movie[1]
        if len(show) > 45:
            show = show[0:44]
        #print(show)
        rating = movie[2]
        #print(rating)
        print('{0: <44} | {1: <6} | {2}'.format(show, rating, time))
My output doesn't have the rating aligned to the right, and I have no idea how to filter out repeated movies without losing their showtimes:
Wonders of the World                         | G      | 16:40
Wonders of the World                         | G      | 20:00
End of the Universe                          | NC-17  | 19:00
Buffalo Bill And The Indians or Sitting Bull | PG     | 12:45
Buffalo Bill And The Indians or Sitting Bull | PG     | 15:00
Buffalo Bill And The Indians or Sitting Bull | PG     | 19:30
Adventure of Lewis and Clark                 | PG-13  | 10:00
Adventure of Lewis and Clark                 | PG-13  | 14:30
Halloween                                    | R      | 19:00
You could collect the input data in a dictionary, with the title-rating-tuples as keys and the showtimes collected in a list, and then print the consolidated information. For example (you have to adjust the filename):
import csv
movies = {}
with open("data.csv", "r") as file:
    for showtime, title, rating in csv.reader(file):
        movies.setdefault((title, rating), []).append(showtime)
for (title, rating), showtimes in movies.items():
    print(f"{title[:44]: <44} | {rating: >5} | {' '.join(showtimes)}")
Output:
Wonders of the World                         |     G | 16:40 20:00
End of the Universe                          | NC-17 | 19:00
Buffalo Bill And The Indians or Sitting Bull |    PG | 12:45 15:00 19:30
Adventure of Lewis and Clark                 | PG-13 | 10:00 14:30
Halloween                                    |     R | 19:00
Since the input seems to come in connected blocks you could also use itertools.groupby (from the standard library) and print while reading:
import csv
from itertools import groupby
from operator import itemgetter
with open("data.csv", "r") as file:
    for (title, rating), group in groupby(
        csv.reader(file), key=itemgetter(1, 2)
    ):
        showtimes = " ".join(time for time, *_ in group)
        print(f"{title[:44]: <44} | {rating: >5} | {showtimes}")
For the alignment, consider the maximum length of the rating string. Subtract the length of the rating from that value, make a string of that many spaces, and put it in front of the rating.
So basically:
your_desired_str = ' '*(5-len(rating))+rating
Also, just replace
'somestr {}'.format(value)
with f-strings, which are much easier to read:
f'somestr {value}'
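The manual padding described above is exactly what `str.rjust()` and the `>` alignment option of Python's format-spec mini-language already do. Here is a small check using the width of 5 from the lab's requirements:

```python
rating = "PG"

# Three equivalent ways to right-justify a rating in a 5-character field
manual = " " * (5 - len(rating)) + rating  # hand-built padding
method = rating.rjust(5)                   # str.rjust pads on the left
fstring = f"{rating:>5}"                   # '>' = right-align, 5 = minimum width

print(repr(manual), repr(method), repr(fstring))

# Longer values are never truncated: a negative repeat count yields ''
print(repr(f"{'NC-17':>5}"), repr(" " * (5 - len("TOO-LONG")) + "TOO-LONG"))
```

The format-spec version is usually the most convenient because it sits directly inside the f-string that builds the output line.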
Below is what I ended up with after some tips from the community.
import csv
rawMovies = input()
outputList = []
with open(rawMovies, 'r') as movies:
    moviesList = csv.reader(movies)
    movieold = [' ', ' ', ' ']
    for movie in moviesList:
        if movieold[1] == movie[1]:
            # same movie as the previous row: append this showtime
            outputList[-1][2] += ' ' + movie[0]
        else:
            time = movie[0]
            # print(time)
            show = movie[1]
            if len(show) > 44:
                show = show[0:44]
            # print(show)
            rating = movie[2]
            outputList.append([show, rating, time])
        movieold = movie
        # print(rating)
#print(outputList)
for movie in outputList:
    print('{0: <44} | {1: <5} | {2}'.format(movie[0], movie[1].rjust(5), movie[2]))
I would use Python's itertools.groupby() function for this, which groups consecutive rows that have the same value.
For example:
import csv
from itertools import groupby
with open('movies.csv') as f_movies:
    csv_movies = csv.reader(f_movies)
    for title, entries in groupby(csv_movies, key=lambda x: x[1]):
        movies = list(entries)
        showtimes = ' '.join(row[0] for row in movies)
        rating = movies[0][2]
        print(f"{title[:44]: <44} | {rating: >5} | {showtimes}")
Giving you:
Wonders of the World                         |     G | 16:40 20:00
End of the Universe                          | NC-17 | 19:00
Buffalo Bill And The Indians or Sitting Bull |    PG | 12:45 15:00 19:30
Adventure of Lewis and Clark                 | PG-13 | 10:00 14:30
Halloween                                    |     R | 19:00
So how does groupby() work?
When reading a CSV file, you get one row at a time. groupby() groups consecutive rows that share the same key value. The key is computed by the function passed as the key parameter. Here, the lambda is passed one row at a time and returns x[1], which is the title. groupby() keeps consuming rows until that value changes, then yields the key together with an iterator over that run of rows (entries).
This approach does assume that the rows you wish to group are consecutive in the file. You could even write your own group-by generator function:
import csv
def group_by_title(rows):
    title = None
    entries = []
    for row in rows:
        # a new title starts: emit the previous group
        if title and row[1] != title:
            yield title, entries
            entries = []
        title = row[1]
        entries.append(row)
    # emit the last group
    if entries:
        yield title, entries
with open('movies.csv') as f_movies:
    csv_movies = csv.reader(f_movies)
    for title, entries in group_by_title(csv_movies):
        showtimes = ' '.join(row[0] for row in entries)
        rating = entries[0][2]
        print(f"{title[:44]: <44} | {rating: >5} | {showtimes}")
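The "consecutive rows" assumption matters in practice. Below is a small sketch of what happens when it is violated. The rows use the lab's (showtime, title, rating) layout, but the second "Halloween" row is hypothetical data, placed out of order on purpose.

```python
from itertools import groupby

# (showtime, title, rating) rows; the 21:00 Halloween row is hypothetical
rows = [
    ["19:00", "Halloween", "R"],
    ["10:00", "Adventure of Lewis and Clark", "PG-13"],
    ["21:00", "Halloween", "R"],
]

def title_of(row):
    return row[1]

# groupby() only merges *consecutive* rows with the same key,
# so "Halloween" shows up as two separate groups here
unsorted_titles = [title for title, _ in groupby(rows, key=title_of)]
print(unsorted_titles)

# Sorting by the same key first makes each title contiguous (input order is lost)
sorted_titles = [title for title, _ in groupby(sorted(rows, key=title_of), key=title_of)]
print(sorted_titles)
```

If the original order matters and rows may not be contiguous, the dictionary approach with `setdefault` avoids the problem. It collects showtimes regardless of position, and dicts preserve insertion order in Python 3.7+.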

How to Insert several separate characters in a sqlite3 cell?

I want to insert several different values in just one cell.
E.g.
Friends' names
ID | Grade | Names
----+--------------+----------------------------
1 | elementary | Kai, Matthew, Grace
2 | guidance | Eli, Zoey, David, Nora, William
3 | High school | Emma, James, Levi, Sophia
Or as a list or dictionary:
ID | Grade | Names
----+--------------+------------------------------
1 | elementary | [Kai, Matthew, Grace]
2 | guidance | [Eli, Zoey, David, Nora, William]
3 | High school | [Emma, James, Levi, Sophia]
or
ID | Grade | Names
----+--------------+---------------------------------------------
1 | elementary | { a:Kai, b:Matthew, c:Grace}
2 | guidance | { a:Eli, b:Zoey, c:David, d:Nora, e:William}
3 | High school | { a:Emma, b:James, c:Levi, d:Sophia}
Is there a way?
Yes, there is a way, but that doesn't mean you should do it this way.
You could, for example, serialize your values as a JSON string and store it in the column. If you later want to add a value, you can parse the JSON, add the value, and write it back to the database. (It might also work with a BLOB, but I'm not sure.)
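Here is a minimal sketch of the JSON-in-a-column approach, using Python's built-in `sqlite3` and `json` modules and an in-memory database. The table name `friends`, its columns, and the added name "Ava" are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE friends (id INTEGER PRIMARY KEY, grade TEXT, names TEXT)")

# Store the whole list as one JSON string in a TEXT column
conn.execute(
    "INSERT INTO friends (grade, names) VALUES (?, ?)",
    ("elementary", json.dumps(["Kai", "Matthew", "Grace"])),
)

# Adding a value means: read -> parse -> modify -> write back
(raw,) = conn.execute("SELECT names FROM friends WHERE id = ?", (1,)).fetchone()
names = json.loads(raw)
names.append("Ava")  # hypothetical new name
conn.execute("UPDATE friends SET names = ? WHERE id = ?", (json.dumps(names), 1))
conn.commit()

print(conn.execute("SELECT names FROM friends WHERE id = 1").fetchone()[0])
```

Every change rewrites the whole string, and plain SQL can't easily filter or join on an individual name inside it. That is why the answer recommends a separate table instead.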
However, I would not recommend storing a list inside a column, as SQL is not meant to be used like that.
What I would recommend is a table with one row per grade, each with its own primary key. Like this:
| ID | Grade |
|----|-------------|
| 1 | Elementary |
| 2 | Guidance |
| 3 | High school |
And then another table containing all the names, each with its own primary key and with GradeID as a foreign key pointing to the grade table. E.g.:
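| ID | GradeID | Name |
|----|---------|---------|
| 1 | 1 | Kai |
| 2 | 1 | Matthew |
| 3 | 1 | Grace |
| 4 | 2 | Eli |
| 5 | 2 | Zoey |
| 6 | 2 | David |
| 7 | 2 | Nora |
| 8 | 2 | William |
| 9 | 3 | Emma |
| 10 | 3 | James |
| 11 | 3 | Levi |
| 12 | 3 | Sophia |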
If you want to know more about this, read about normalization in SQL databases.
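Here is a minimal `sqlite3` sketch of the two-table layout described above, using an in-memory database. The table and column names (`grade`, `friend`, `grade_id`) are illustrative. `GROUP_CONCAT` rebuilds the comma-separated view from the question when it is needed for display.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE grade (
    id    INTEGER PRIMARY KEY,
    grade TEXT NOT NULL
);
CREATE TABLE friend (
    id       INTEGER PRIMARY KEY,
    grade_id INTEGER NOT NULL REFERENCES grade(id),
    name     TEXT NOT NULL
);
""")

data = {
    "Elementary": ["Kai", "Matthew", "Grace"],
    "Guidance": ["Eli", "Zoey", "David", "Nora", "William"],
    "High school": ["Emma", "James", "Levi", "Sophia"],
}
for grade, names in data.items():
    cur = conn.execute("INSERT INTO grade (grade) VALUES (?)", (grade,))
    # one row per name, pointing back at its grade's primary key
    conn.executemany(
        "INSERT INTO friend (grade_id, name) VALUES (?, ?)",
        [(cur.lastrowid, name) for name in names],
    )
conn.commit()

# Rebuild the "one cell per grade" view on demand.
# SQLite does not guarantee the order of names inside GROUP_CONCAT.
rows = conn.execute("""
    SELECT g.id, g.grade, GROUP_CONCAT(f.name, ', ')
    FROM grade AS g JOIN friend AS f ON f.grade_id = g.id
    GROUP BY g.id, g.grade
    ORDER BY g.id
""").fetchall()
for row in rows:
    print(row)
```

Adding a friend is now a single `INSERT` into `friend`, and queries such as "which grade is Nora in?" become ordinary joins.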

Creating a new column in Pandas that uses a mapping to reduce another column's vals to a set of predetermined options

So I'm trying to take a dataframe like this (for example):
ID | reason_for_rejection
--------------------------
1 | invalid insurance
2 | behavior issues
3 | not enough money
4 | no space in hospital
5 | anger issues
...
and, using a hand-written mapping (for example {financial: [invalid insurance, not enough money], patient problems: [behavior issues, anger issues], ...}), create a new column containing the mapped values and turn this into:
ID | reason_for_rejection | reason_for_rejection_grouped
---------------------------------------------------------------
1 | invalid insurance | financial
2 | behavior issues | patient problems
3 | not enough money | financial
4 | no space in hospital | occupancy
5 | anger issues | patient problems
...
So while the 'reason_for_rejection' column will have a lot of unique values, I want to use some kind of a mapping that maps those unique values into 7 or 8 unique values in 'reason_for_rejection_grouped'.
I considered using a dictionary here, but the keys would be the values in 'reason_for_rejection_grouped' and the dictionary values would be the values in 'reason_for_rejection'. I'd then have to look up each key based on its value, which would be computationally expensive (and I have a really big dataset to look at).
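One way around the reverse lookup is to invert the hand-written mapping once, building a reason → group dict, and then let `Series.map` do one dict lookup per row. Here is a minimal sketch. It assumes the 'occupancy' group contains 'no space in hospital', as in the example output.

```python
import pandas as pd

# Hand-written mapping as in the question: group -> list of raw reasons
groups = {
    'financial': ['invalid insurance', 'not enough money'],
    'patient problems': ['behavior issues', 'anger issues'],
    'occupancy': ['no space in hospital'],
}

# Invert it once: raw reason -> group. This is a single pass over the mapping;
# afterwards every lookup is an ordinary dict access.
reason_to_group = {reason: group for group, reasons in groups.items() for reason in reasons}

df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5],
    'reason_for_rejection': ['invalid insurance', 'behavior issues', 'not enough money',
                             'no space in hospital', 'anger issues'],
})

# Series.map looks each value up in the dict; reasons missing from it become NaN
df['reason_for_rejection_grouped'] = df['reason_for_rejection'].map(reason_to_group)
print(df)
```

Any NaN in the new column then points to a raw reason that the hand-written mapping doesn't cover yet.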
Any guidance or suggestions would be super helpful!

Pandas Pivot table, how to put a series of columns in the values attribute

First of all, I apologize! It's my first time using Stack Overflow, so I hope I'm doing it right! I searched but couldn't find what I'm looking for.
I'm also quite new with pandas and python :)
I am going to try to use an example and I will try to be clear.
I have a dataframe with 30 columns that contains information about a shopping cart. One of the columns (Order) has two values: either completed or in progress.
I also have about 20 columns with items, let's say apple, orange, banana... I need to know how many times there is an apple in a completed order and how many times in an in-progress order. I decided to use a pivot table with the aggregate function count.
This would be a small example of the dataframe:
Order | apple | orange | banana | pear | pineapple | ... |
-----------|-------|--------|--------|------|-----------|------|
completed | 2 | 4 | 10 | 5 | 1 | |
completed | 5 | 4 | 5 | 8 | 3 | |
iProgress | 3 | 7 | 6 | 5 | 2 | |
completed | 6 | 3 | 1 | 7 | 1 | |
iProgress | 10 | 2 | 2 | 2 | 2 | |
completed | 2 | 1 | 4 | 8 | 1 | |
I have the output I want but what I'm looking for is a more elegant way of selecting lots of columns without having to type them manually.
df.pivot_table(index=['Order'], values=['apple', 'bananas', 'orange', 'pear', 'strawberry',
'mango'], aggfunc='count')
But I want to select around 15 columns, so instead of typing them one by one, I'm sure there is an easy way of doing it by using column numbers or something similar. Let's say I want to select columns 6 to 15.
I have tried things like values=[df.columns[6:15]], and I have also tried df.iloc, but as I said, I'm pretty new, so I'm probably using things wrong or doing something silly!
Is there also a way to keep the columns in their original order? In my result they seem to be ordered alphabetically, and I want to keep the order of the columns, so it should be apple, orange, banana...
Order Completed In progress
apple 92 221
banana 102 144
mango 70 55
I'm just looking for a way of improving my code and I hope I have not made much mess. Thank you!
I think you can use:
# if you only need a few columns, select them by position, e.g. df.columns[1:3]
out = df.pivot_table(columns=['Order'], values=df.columns[1:3], aggfunc='count')
print(out)
Order completed iProgress
apple 4 2
orange 4 2
# to use all columns, the values parameter can be omitted
out = df.pivot_table(columns=['Order'], aggfunc='count')
print(out)
Order completed iProgress
apple 4 2
banana 4 2
orange 4 2
pear 4 2
pineapple 4 2
# 'count' skips NaN values, while len counts every row (see "What is the difference between size and count in pandas?")
out = df.pivot_table(columns=['Order'], aggfunc=len)
print(out)
Order completed iProgress
apple 4 2
banana 4 2
orange 4 2
pear 4 2
pineapple 4 2
# solution with groupby and transpose, which keeps the original column order
out = df.groupby('Order').count().T
print(out)
Order completed iProgress
apple 4 2
orange 4 2
banana 4 2
pear 4 2
pineapple 4 2
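Here is a sketch of selecting the item columns by position while keeping their original order, built on the sample data from the question. The slice 1:6 is an assumption that fits this small example; with the real data it would be something like df.columns[6:15]. `df.columns[...]` is already list-like, so it can be used as-is. Writing `[df.columns[6:15]]` nests it inside another list.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Order': ['completed', 'completed', 'iProgress', 'completed', 'iProgress', 'completed'],
    'apple': [2, 5, 3, 6, 10, 2],
    'orange': [4, 4, 7, 3, 2, 1],
    'banana': [10, 5, 6, 1, 2, 4],
    'pear': [5, 8, 5, 7, 2, 8],
    'pineapple': [1, 3, 2, 1, 2, 1],
})

# Positional slice of the column labels (column 0 is 'Order')
item_cols = df.columns[1:6]

# groupby + count keeps the selected columns in their original order;
# .T puts the items on the rows and the order states on the columns
out = df.groupby('Order')[list(item_cols)].count().T
print(out)
```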
Your example doesn't include an item that is missing from the cart. I'm assuming a missing item shows up as None or 0. If that is correct, fill the NA values and count how many values are greater than 0:
df.set_index('Order').fillna(0).gt(0).groupby(level='Order').sum().T
