select random pairs from remaining unique values in a list - python

Update: I'm not sure I explained it well the first time.
I have a scheduling problem, or more accurately, a "first come, first served" problem. A list of available assets is assigned a set of spaces, available in pairs (think cars:parking spots, diners:tables, teams:games). I need a rough (random) simulation that chooses the first two to arrive from the available pairs, then chooses the next two from the remaining available pairs, and so on, until all spaces are filled.
I started with teams:games to cut my teeth. The first pair is easy enough. How do I then whittle the list down so the next two spots are filled from the remaining available entities? I've tried a bunch of different things but keep coming up short. Help appreciated.
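import itertools
import numpy as np
import pandas as pd
# Two copies of the same team list: one for "home", one for "away"
a = ['Georgia','Oregon','Florida','Texas'], ['Georgia','Oregon','Florida','Texas']
# Every ordered (home, away) pairing where a team does not play itself
b = [(x,y) for x,y in itertools.product(*a) if x != y]
c = pd.DataFrame(b)
c.columns = ['home', 'away']
print(c)
# Draws two distinct rows, but nothing stops both rows from sharing a team
d = c.sample(n = 2, replace = False)
print(d)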
The first result is all possible pairings. But once the first slots are filled, there can be no repeats. In the example below, once Oregon and Georgia are slated in, the only remaining options to choose from are Florida:Texas or Texas:Florida. Obviously, the sample function alone frequently produces duplicates (Oregon appears twice in the second output). I will need this to scale up to dozens, then hundreds, of entities:slots. Many thanks in advance!
home away
0 Georgia Oregon
1 Georgia Florida
2 Georgia Texas
3 Oregon Georgia
4 Oregon Florida
5 Oregon Texas
6 Florida Georgia
7 Florida Oregon
8 Florida Texas
9 Texas Georgia
10 Texas Oregon
11 Texas Florida
home away
3 Oregon Georgia
5 Oregon Texas

Not exactly sure what you are trying to do, but if you want to randomly pair your unique entities, you can simply shuffle them and then place them in a two-column DataFrame. I wrote this with all the US states minus one (Wyoming):
import random
import pandas as pd
states = ['Alaska','Alabama','Arkansas','Arizona','California',
'Colorado','Connecticut','District of Columbia','Delaware',
'Florida','Georgia','Hawaii','Iowa','Idaho','Illinois',
'Indiana','Kansas','Kentucky','Louisiana','Massachusetts',
'Maryland','Maine','Michigan','Minnesota','Missouri',
'Mississippi','Montana','North Carolina','North Dakota',
'Nebraska','New Hampshire','New Jersey','New Mexico',
'Nevada','New York','Ohio','Oklahoma','Oregon',
'Pennsylvania','Rhode Island','South Carolina',
'South Dakota','Tennessee','Texas','Utah','Virginia',
'Vermont','Washington','Wisconsin','West Virginia']
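a = states.copy()
random.shuffle(a)  # shuffle the copy in place; states keeps its original order
# Even positions become home teams, odd positions their away opponents
# (the two slices only have equal lengths when the list has an even number of entries)
c = pd.DataFrame({'home': a[::2], 'away': a[1::2]})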
print(c)
#Output
home away
0 West Virginia Minnesota
1 New Hampshire Louisiana
2 Nevada Florida
3 Alabama Indiana
4 Delaware North Dakota
5 Georgia Rhode Island
6 Oregon Pennsylvania
7 New York South Dakota
8 Maryland Kansas
9 Ohio Hawaii
10 Colorado Wisconsin
11 Iowa Idaho
12 Illinois Missouri
13 Arizona Mississippi
14 Connecticut Montana
15 District of Columbia Vermont
16 Tennessee Kentucky
17 Alaska Washington
18 California Michigan
19 Arkansas New Jersey
20 Massachusetts Utah
21 Oklahoma New Mexico
22 Virginia South Carolina
23 North Carolina Maine
24 Texas Nebraska
I'm not sure this is exactly what you were asking for, though.
If you need to schedule all the fixtures of a season, see this answer: League fixture generator in python
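If the goal is literally the "first come, first served" draw described in the question, one way to sketch it is to keep the question's table of all (home, away) pairings and, after each random pick, drop every pairing that reuses a team already scheduled. This is a minimal sketch under assumptions: the helper name draw_schedule and its seed parameter are hypothetical, and the table of all ordered pairings grows quadratically (39,800 rows for 200 teams), so for very large lists the shuffle approach above is cheaper.

```python
import itertools
import random

import pandas as pd


def draw_schedule(pairs, seed=None):
    """Pick pairings one at a time from whatever is still available.

    pairs: DataFrame with 'home' and 'away' columns, one row per pairing.
    Hypothetical helper for illustration, not a pandas API.
    """
    rng = random.Random(seed)
    remaining = pairs
    chosen = []
    while not remaining.empty:
        # "First to arrive": one still-available pairing, chosen at random
        idx = rng.choice(list(remaining.index))
        home, away = remaining.loc[idx, 'home'], remaining.loc[idx, 'away']
        chosen.append((home, away))
        used = {home, away}
        # Drop every pairing that involves a team that is already scheduled
        remaining = remaining[~remaining['home'].isin(used) & ~remaining['away'].isin(used)]
    return pd.DataFrame(chosen, columns=['home', 'away'])


teams = ['Georgia', 'Oregon', 'Florida', 'Texas']
# Same table as in the question: every ordered pairing with home != away
c = pd.DataFrame(list(itertools.permutations(teams, 2)), columns=['home', 'away'])
schedule = draw_schedule(c, seed=1)
print(schedule)
```

With an odd number of teams the loop simply stops when one team is left without an opponent.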

Related

How to take N groups from pandas dataframe when grouped by multiple columns

I have this kind of code (the sample is a recreation of production code):
import pandas as pd
df_nba = pd.read_csv('https://media.geeksforgeeks.org/wp-content/uploads/nba.csv')
df_nba['custom'] = 'abc'
df_gpby_team_clg = df_nba.groupby(['custom', 'College', 'Team']).agg({'Salary': 'sum'})
print(df_gpby_team_clg)
The output looks something like this:
Now I want the stats for only the first N colleges. So if I set n=2, I should get a DataFrame with Alabama and Arizona and their respective Team and Salary stats.
You can use .reset_index() after groupby() to turn the MultiIndex rows back into a normal RangeIndex (with custom, College and Team as ordinary columns), which makes the subsequent operations easier.
Then take the first n colleges by calling .unique() on the College column and slicing the result.
Finally, filter the expanded DataFrame with .loc, using .isin to keep only the rows whose College is among those first n colleges:
n = 2
df_gpby_team_clg_expand = df_gpby_team_clg.reset_index()
first_N_college = df_gpby_team_clg_expand['College'].unique()[:n]
df_gpby_team_clg_expand.loc[df_gpby_team_clg_expand['College'].isin(first_N_college)]
Result:
custom College Team Salary
0 abc Alabama Cleveland Cavaliers 2100000.0
1 abc Alabama Memphis Grizzlies 845059.0
2 abc Alabama New Orleans Pelicans 1320000.0
3 abc Arizona Brooklyn Nets 1335480.0
4 abc Arizona Cleveland Cavaliers 9140305.0
5 abc Arizona Detroit Pistons 2841960.0
6 abc Arizona Golden State Warriors 11710456.0
7 abc Arizona Houston Rockets 947276.0
8 abc Arizona Indiana Pacers 5358880.0
9 abc Arizona Milwaukee Bucks 3000000.0
10 abc Arizona New York Knicks 4000000.0
11 abc Arizona Orlando Magic 4171680.0
12 abc Arizona Philadelphia 76ers 525093.0
13 abc Arizona Phoenix Suns 206192.0
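The same three steps can be wrapped in a small function and checked without downloading the CSV. This sketch assumes a small hypothetical stand-in for the NBA data (the numbers are invented) and a hypothetical helper name, first_n_groups. Note that .unique() keeps order of appearance, which is alphabetical here only because groupby sorts its keys by default.

```python
import pandas as pd

# Hypothetical stand-in for the NBA CSV; values are invented for illustration
df_nba = pd.DataFrame({
    'College': ['Arizona', 'Alabama', 'Duke', 'Arizona', 'Alabama'],
    'Team': ['Phoenix Suns', 'Memphis Grizzlies', 'Boston Celtics',
             'Brooklyn Nets', 'Memphis Grizzlies'],
    'Salary': [100.0, 200.0, 300.0, 400.0, 500.0],
})
df_nba['custom'] = 'abc'
grouped = df_nba.groupby(['custom', 'College', 'Team']).agg({'Salary': 'sum'})


def first_n_groups(grouped, level, n):
    """Return the rows belonging to the first n distinct values of `level`."""
    flat = grouped.reset_index()         # MultiIndex levels become columns
    keep = flat[level].unique()[:n]      # first n values, in index order
    return flat.loc[flat[level].isin(keep)]


result = first_n_groups(grouped, 'College', 2)
print(result)
```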
Use get_level_values() to get the first n colleges:
n = 2
colleges = df_gpby_team_clg.index.get_level_values('College').unique()[:n]
# Index(['Alabama', 'Arizona'], dtype='object', name='College')
Then extract those colleges with IndexSlice:
index = pd.IndexSlice[:, colleges]
df_gpby_team_clg.loc[index, :]
# Salary
# custom College Team
# abc Alabama Cleveland Cavaliers 2100000.0
# Memphis Grizzlies 845059.0
# New Orleans Pelicans 1320000.0
# Arizona Brooklyn Nets 1335480.0
# Cleveland Cavaliers 9140305.0
# Detroit Pistons 2841960.0
# Golden State Warriors 11710456.0
# Houston Rockets 947276.0
# Indiana Pacers 5358880.0
# Milwaukee Bucks 3000000.0
# New York Knicks 4000000.0
# Orlando Magic 4171680.0
# Philadelphia 76ers 525093.0
# Phoenix Suns 206192.0
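An alternative that keeps the MultiIndex but avoids slicing is a boolean mask built from the same get_level_values() call. This swaps in Index.isin in place of IndexSlice; the small frame below is hypothetical, with invented numbers.

```python
import pandas as pd

# Hypothetical grouped frame with the same three index levels as the question
grouped = pd.DataFrame(
    {'Salary': [100.0, 200.0, 300.0, 400.0]},
    index=pd.MultiIndex.from_tuples(
        [('abc', 'Alabama', 'Team A'), ('abc', 'Arizona', 'Team B'),
         ('abc', 'Arizona', 'Team C'), ('abc', 'Duke', 'Team D')],
        names=['custom', 'College', 'Team'],
    ),
)

n = 2
college_level = grouped.index.get_level_values('College')
colleges = college_level.unique()[:n]
# Boolean mask on one index level; the MultiIndex is left intact
subset = grouped[college_level.isin(colleges)]
print(subset)
```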

Cannot turn string into an integer python3

I'm attempting to convert the following into integers. I have literally tried everything and keep getting errors.
For instance:
pop2007 = pop2007.astype('int32')
ValueError: invalid literal for int() with base 10: '4,779,736'
Below is the DF I'm trying to convert. I've even attempted the .values method with no success.
pop2007
Alabama 4,779,736
Alaska 710,231
Arizona 6,392,017
Arkansas 2,915,918
California 37,253,956
Colorado 5,029,196
Connecticut 3,574,097
Delaware 897,934
Florida 18,801,310
Georgia 9,687,653
Idaho 1,567,582
Illinois 12,830,632
Indiana 6,483,802
Iowa 3,046,355
Kansas 2,853,118
Kentucky 4,339,367
Louisiana 4,533,372
Maine 1,328,361
Maryland 5,773,552
Massachusetts 6,547,629
Michigan 9,883,640
Minnesota 5,303,925
Mississippi 2,967,297
Missouri 5,988,927
Montana 989,415
Nebraska 1,826,341
Nevada 2,700,551
New Hampshire 1,316,470
New Jersey 8,791,894
New Mexico 2059179
New York 19378102
North Carolina 9535483
North Dakota 672591
Ohio 11536504
Oklahoma 3751351
Oregon 3831074
Pennsylvania 12702379
Rhode Island 1052567
South Carolina 4625364
South Dakota 814180
Tennessee 6346105
Texas 25,145,561
Utah 2,763,885
Vermont 625,741
Virginia 8,001,024
Washington 6,724,540
West Virginia 1,852,994
Wisconsin 5,686,986
Wyoming 563,626
Name: 3, dtype: object
You can't turn a string with commas into an integer directly, because int() rejects the commas. Strip them first:
my_int = '1,000,000'
my_int = int(my_int.replace(',', ''))
print(my_int)
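Have you tried pop2007.str.replace(',', '') to remove the commas from your string values before converting to integers? Note that plain pop2007.replace(',', '') only replaces values that are exactly ',' unless you pass regex=True.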
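Putting both answers together for the whole Series gives a minimal sketch like the one below. It uses a few values copied from the question (including one that already lacks commas) and assumes every entry is a string. If the data comes from a CSV, passing thousands=',' to pd.read_csv should avoid the problem at load time.

```python
import pandas as pd

# A few entries from the question; dtype is object (strings)
pop2007 = pd.Series(
    ['4,779,736', '710,231', '2059179', '25,145,561'],
    index=['Alabama', 'Alaska', 'New Mexico', 'Texas'],
)

# .str.replace works on substrings; regex=False treats ',' literally
pop2007 = pop2007.str.replace(',', '', regex=False).astype('int32')
print(pop2007)
```

If some entries are already numbers rather than strings, .str.replace turns them into NaN; calling .astype(str) first avoids that.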

Loop and store coordinates

I have a copy of a dataframe that looks like this:
heatmap_df = test['coords'].copy()
heatmap_df
0 [(Manhattanville, Manhattan, Manhattan Communi...
1 [(Mainz, Rheinland-Pfalz, 55116, Deutschland, ...
2 [(Ithaca, Ithaca Town, Tompkins County, New Yo...
3 [(Starr Hill, Charlottesville, Virginia, 22903...
4 [(Neuchâtel, District de Neuchâtel, Neuchâtel,...
5 [(Newark, Licking County, Ohio, 43055, United ...
6 [(Mae, Cass County, Minnesota, United States o...
7 [(Columbus, Franklin County, Ohio, 43210, Unit...
8 [(Canaanville, Athens County, Ohio, 45701, Uni...
9 [(Arizona, United States of America, (34.39534...
10 [(Enschede, Overijssel, Nederland, (52.2233632...
11 [(Gent, Oost-Vlaanderen, Vlaanderen, België - ...
12 [(Reno, Washoe County, Nevada, 89557, United S...
13 [(Grenoble, Isère, Auvergne-Rhône-Alpes, Franc...
14 [(Columbus, Franklin County, Ohio, 43210, Unit...
Each row has this format with some coordinates:
heatmap_df[2]
[Location(Ithaca, Ithaca Town, Tompkins County, New York, 14853, United States of America, (42.44770298533052, -76.48085858627931, 0.0)),
Location(Chapel Hill, Orange County, North Carolina, 27515, United States of America, (35.916920469999994, -79.05664845999999, 0.0))]
I want to pull the latitude and longitude from each row and store them as separate columns in the dataframe heatmap_df. I have this so far, but I suck at writing loops. My loop is not working recursively; it only prints out the last coordinates.
x = np.arange(start=0, stop=3, step=1)
for i in x:
    point_i = (heatmap_df[i][0].latitude, heatmap_df[i][0].longitude)
    i = i+1
point_i
(42.44770298533052, -76.48085858627931)
I am trying to make a heat map with all the coordinates using Folium. Can someone help, please? Thank you!
Python doesn't know what you are trying to do; it assumes you want to store the tuple (heatmap_df[i][0].latitude, heatmap_df[i][0].longitude) in the variable point_i on every iteration, so the value is overwritten each time. Instead, create a list before the loop and append a [latitude, longitude] list to it on each iteration. The result is a list of lists, which can easily be turned into a DataFrame. Also, the loop in your example isn't recursive; it's an ordinary iteration. Check this out for recursion.
Try this:
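points = []
# Loop over every row (not just the first three) and keep each result
for i in range(len(heatmap_df)):
    # Each row holds a list of Location objects; take the first one
    points.append([heatmap_df[i][0].latitude, heatmap_df[i][0].longitude])
print(points)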
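To get the separate latitude and longitude columns the question asks for, the loop can also be replaced with Series.map, and Series.explode keeps every Location when a row holds more than one. A minimal sketch, assuming a hypothetical Location namedtuple that stands in for geopy's Location object (only .latitude and .longitude are used) and reusing the Ithaca and Chapel Hill coordinates from the question. A list of [lat, lon] pairs like all_points is the shape Folium's HeatMap plugin takes as its data.

```python
from collections import namedtuple

import pandas as pd

# Hypothetical stand-in for geopy's Location; only latitude/longitude matter here
Location = namedtuple('Location', ['address', 'latitude', 'longitude'])
ithaca = Location('Ithaca', 42.44770298533052, -76.48085858627931)
chapel_hill = Location('Chapel Hill', 35.916920469999994, -79.05664845999999)

# Each row holds a list of Location objects, like test['coords']
heatmap_df = pd.Series([[ithaca, chapel_hill], [chapel_hill]])

# First Location of each row, stored as two separate columns
coords = pd.DataFrame({
    'latitude': heatmap_df.map(lambda locs: locs[0].latitude),
    'longitude': heatmap_df.map(lambda locs: locs[0].longitude),
})
print(coords)

# Every Location in every row: explode yields one Location per output row
all_points = [[loc.latitude, loc.longitude] for loc in heatmap_df.explode()]
print(all_points)
```

Because coords shares heatmap_df's index, its columns can be assigned back to the original frame (for example test['latitude'] = coords['latitude']).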

Derive a new pandas column based on a certain value of a row and apply until the next value appears again

In a pandas DataFrame string column, I want to derive a new column based on the value of a row, carrying it forward until the next such value appears. What is the most efficient or cleanest way to achieve this?
Input Dataframe:
import pandas as pd
df = pd.DataFrame({'neighborhood':['Chicago City', 'Wicker Park', 'Bucktown','Lincoln Park','West Loop','River North','Milwaukee City','Bay View','East Side','South Side','Bronzeville','North Side','New York City','Harlem','Midtown','Chinatown']})
My desired dataframe output would be:
neighborhood city
0 Chicago City Chicago
1 Wicker Park Chicago
2 Bucktown Chicago
3 Lincoln Park Chicago
4 West Loop Chicago
5 River North Chicago
6 Milwaukee City Milwaukee
7 Bay View Milwaukee
8 East Side Milwaukee
9 South Side Milwaukee
10 Bronzeville Milwaukee
11 North Side Milwaukee
12 New York City New York
13 Harlem New York
14 Midtown New York
15 Chinatown New York
1) If the first column contains 'City', copy it to the second column but cut out the ' City' part.
2) Fill the NAs with a forward fill.
import numpy as np
df['city'] = np.where(
df.neighborhood.str.contains('City'),
df.neighborhood.str.replace(' City', '', case = False),
None)
Result:
neighborhood city
0 Chicago City Chicago
1 Wicker Park None
2 Bucktown None
3 Lincoln Park None
4 West Loop None
5 River North None
6 Milwaukee City Milwaukee
7 Bay View None
8 East Side None
9 South Side None
10 Bronzeville None
11 North Side None
12 New York City New York
13 Harlem None
14 Midtown None
15 Chinatown None
df['city'] = df['city'].ffill()
Result:
neighborhood city
0 Chicago City Chicago
1 Wicker Park Chicago
2 Bucktown Chicago
3 Lincoln Park Chicago
4 West Loop Chicago
5 River North Chicago
6 Milwaukee City Milwaukee
7 Bay View Milwaukee
8 East Side Milwaukee
9 South Side Milwaukee
10 Bronzeville Milwaukee
11 North Side Milwaukee
12 New York City New York
13 Harlem New York
14 Midtown New York
15 Chinatown New York
Use .str.extract plus .ffill():
df['city'] = df.neighborhood.str.extract(r'(.*)\sCity', expand=False).ffill()
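A runnable check of the .str.extract approach on the question's data, as a sketch. Anchoring the pattern with ^ and $ is an assumption beyond the original answer: it means only names that end in " City" start a new block, so a hypothetical neighborhood such as "Old City Park" would not be mistaken for a city. expand=False makes extract return a Series rather than a one-column DataFrame.

```python
import pandas as pd

df = pd.DataFrame({'neighborhood': ['Chicago City', 'Wicker Park', 'Bucktown', 'Lincoln Park',
                                    'West Loop', 'River North', 'Milwaukee City', 'Bay View',
                                    'East Side', 'South Side', 'Bronzeville', 'North Side',
                                    'New York City', 'Harlem', 'Midtown', 'Chinatown']})

# Capture everything before a trailing " City"; all other rows become NaN
city = df['neighborhood'].str.extract(r'^(.*)\sCity$', expand=False)
# Carry each city name down until the next city row
df['city'] = city.ffill()
print(df)
```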
You can also just map a custom function that behaves as intended:
city = None
def generate(s):
    global city
    # A row containing 'City' starts a new block; remember its name
    if 'City' in s:
        city = s.replace(' City', '')
    return city
df['city'] = df['neighborhood'].map(generate)
This returns the intended output.

Robot FrameWork Collections - List comparison Issue

I am trying to compare two identical lists in Robot Framework. The code I am using is:
List Test
    Lists Should Be Equal    @{List_Of_States_USA}    @{List_Of_States_USA-Temp}
and the lists are identical, with the following values:
@{List_Of_States_USA}    Alabama    Alaska    American Samoa    Arizona    Arkansas    California    Colorado
...    Connecticut    Delaware    District of Columbia    Florida    Georgia    Guam    Hawaii
...    Idaho    Illinois    Indiana    Iowa    Kansas    Kentucky    Louisiana
...    Maine    Maryland    Massachusetts    Michigan    Minnesota    Mississippi    Missouri
...    Montana    National    Nebraska    Nevada    New Hampshire    New Jersey    New Mexico
...    New York    North Carolina    North Dakota    Northern Mariana Islands    Ohio    Oklahoma    Oregon
...    Pennsylvania    Puerto Rico    Rhode Island    South Carolina    South Dakota    Tennessee    Texas
...    Utah    Vermont    Virgin Islands    Virginia    Washington    West Virginia    Wisconsin
...    Wyoming
This test fails with the following error:
FAIL Keyword 'Collections.Lists Should Be Equal' expected 2 to 5 arguments, got 114.
I have searched SO and other sites for a solution, but could not figure out why this happened. Thanks in advance for your support.
You need to use a $, not @. When you use @, Robot expands the list into multiple arguments.
From the Robot Framework User Guide:
When a variable is used as a scalar like ${EXAMPLE}, its value will be used as-is. If a variable value is a list or list-like, it is also possible to use it as a list variable like @{EXAMPLE}. In this case individual list items are passed in as arguments separately.
Consider the case of @{foo} being a list with the values "one", "two" and "three". In such a case the following two are identical:
some keyword    @{foo}
some keyword    one    two    three
You need to change your statement to this:
Lists Should Be Equal    ${List_Of_States_USA}    ${List_Of_States_USA-Temp}
So, as suggested by Bryan-Oakley above, I modified the test as follows:
${L1}    Create List    @{List_Of_States_USA}
${L2}    Create List    @{List_Of_States_USA-Temp}
Lists Should Be Equal    ${L1}    ${L2}
Now the test passes. Thanks again, @Bryan!
