Creating pairwise relationship between columns of a dataframe [closed] - python

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I have a dataframe as follows:
Neighborhood,City,State,Country
Westside,Boston,MA,USA
South District,New York,NY,USA
Business Town,,OR,USA
Shopping District,,Wellington,New Zealand
Big Mountain,,,Australia
Now I want to go over pairs of non-empty columns (C0,C1), (C1,C2), (C2,C3) and create a dataframe that looks like the one below. However, if C1 is empty or null, then pair C0 with C2, and so on.
Root Child
OR Business Town
USA OR
New Zealand Wellington
Wellington Shopping District
Boston Westside
MA Boston
USA MA
New York South District
NY New York
USA NY
Australia Big Mountain

Here is one way, using shift after stack:
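import pandas as pd
# dropna() keeps only non-empty cells (newer pandas versions keep NaN in stack()); reverse so each row reads Country -> ... -> Neighborhood
s = df.stack().dropna().iloc[::-1]
# Within each original row (level 0), the previous value in reversed order is the parent
yourdf = pd.DataFrame({'Root': s.groupby(level=0).shift().values, 'Child': s.values}).dropna()
yourdf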
Out[62]:
Root Child
1 Australia Big Mountain
3 New Zealand Wellington
4 Wellington Shopping District
6 USA OR
7 OR Business Town
9 USA NY
10 NY New York
11 New York South District
13 USA MA
14 MA Boston
15 Boston Westside
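For comparison, here is a self-contained sketch of the same idea that walks forward instead of reversing: shift(-1) within each original row makes the next non-empty column the Root. It rebuilds the question's sample data with read_csv (assuming the empty fields should count as missing) and calls dropna() explicitly after stack(), since newer pandas versions may keep missing values in stack() output.

```python
import io

import pandas as pd

# The question's sample data; read_csv turns the empty fields into NaN.
CSV = """Neighborhood,City,State,Country
Westside,Boston,MA,USA
South District,New York,NY,USA
Business Town,,OR,USA
Shopping District,,Wellington,New Zealand
Big Mountain,,,Australia
"""
df = pd.read_csv(io.StringIO(CSV))

# One entry per non-empty cell, indexed by (row, column).
s = df.stack().dropna()

# Within each original row (index level 0), the next non-empty value is the Root.
root = s.groupby(level=0).shift(-1)

pairs = (pd.DataFrame({'Root': root.values, 'Child': s.values})
         .dropna()                 # the last column of each row has no Root
         .reset_index(drop=True))
print(pairs)
```

Because empty cells are gone before the shift, a missing City automatically pairs the Neighborhood with the State, which is the "pair C0 with C2" rule from the question.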

Using a comprehension (and a few other things):
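import pandas as pd

pd.DataFrame([
    # dropna() keeps only non-empty cells; zip(g.iloc[1:], g) yields (next, current) = (Root, Child)
    t for _, g in df.stack().dropna().groupby(level=0)
    for t in zip(g.iloc[1:], g)
], columns=['Root', 'Child'])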
Root Child
0 Boston Westside
1 MA Boston
2 USA MA
3 New York South District
4 NY New York
5 USA NY
6 OR Business Town
7 USA OR
8 Wellington Shopping District
9 New Zealand Wellington
10 Australia Big Mountain

Related

select random pairs from remaining unique values in a list

Updated: Not sure I explained it well the first time.
I have a scheduling problem, or more accurately, a "first come, first served" problem. A list of available assets is assigned a set of spaces, available in pairs (think cars:parking spots, diners:tables, teams:games). I need a rough (random) simulation that chooses the first two to arrive from the available pairs, then chooses the next two from the remaining available pairs, and so on, until all spaces are filled.
I started with teams:games to cut my teeth. The first pair is easy enough. How do I then whittle it down to fill the next two spots from among the remaining available entities? I've tried a bunch of different things but keep coming up short. Help appreciated.
import itertools
import numpy as np
import pandas as pd
a = ['Georgia','Oregon','Florida','Texas'], ['Georgia','Oregon','Florida','Texas']
b = [(x,y) for x,y in itertools.product(*a) if x != y]
c = pd.DataFrame(b)
c.columns = ['home', 'away']
print(c)
d = c.sample(n = 2, replace = False)
print(d)
The first result is all possible combinations. But once the first slots are filled, there can be no repeats. In the example below, once Oregon and Georgia are slated in, the only remaining options to choose from are Florida:Texas or Texas:Florida. Obviously, the sample function alone frequently produces duplicates. I will need this to scale up to dozens, then hundreds, of entities:slots. Many thanks in advance!
home away
0 Georgia Oregon
1 Georgia Florida
2 Georgia Texas
3 Oregon Georgia
4 Oregon Florida
5 Oregon Texas
6 Florida Georgia
7 Florida Oregon
8 Florida Texas
9 Texas Georgia
10 Texas Oregon
11 Texas Florida
home away
3 Oregon Georgia
5 Oregon Texas
Not exactly sure what you are trying to do, but if you want to randomly pair your unique entities, you can simply shuffle them and then place them in a 2-column dataframe. I wrote this with all the US states (plus the District of Columbia) minus one (Wyoming), so the list has an even number of entries:
import random
import pandas as pd
states = ['Alaska','Alabama','Arkansas','Arizona','California',
'Colorado','Connecticut','District of Columbia','Delaware',
'Florida','Georgia','Hawaii','Iowa','Idaho','Illinois',
'Indiana','Kansas','Kentucky','Louisiana','Massachusetts',
'Maryland','Maine','Michigan','Minnesota','Missouri',
'Mississippi','Montana','North Carolina','North Dakota',
'Nebraska','New Hampshire','New Jersey','New Mexico',
'Nevada','New York','Ohio','Oklahoma','Oregon',
'Pennsylvania','Rhode Island','South Carolina',
'South Dakota','Tennessee','Texas','Utah','Virginia',
'Vermont','Washington','Wisconsin','West Virginia']
a = states.copy()
random.shuffle(a)  # shuffle the copy in place; states keeps its original order
# Even positions become home teams, odd positions their away opponents
c = pd.DataFrame({'home': a[::2], 'away': a[1::2]})
print(c)
#Output
home away
0 West Virginia Minnesota
1 New Hampshire Louisiana
2 Nevada Florida
3 Alabama Indiana
4 Delaware North Dakota
5 Georgia Rhode Island
6 Oregon Pennsylvania
7 New York South Dakota
8 Maryland Kansas
9 Ohio Hawaii
10 Colorado Wisconsin
11 Iowa Idaho
12 Illinois Missouri
13 Arizona Mississippi
14 Connecticut Montana
15 District of Columbia Vermont
16 Tennessee Kentucky
17 Alaska Washington
18 California Michigan
19 Arkansas New Jersey
20 Massachusetts Utah
21 Oklahoma New Mexico
22 Virginia South Carolina
23 North Carolina Maine
24 Texas Nebraska
Not sure if this is exactly what you were asking for, though.
If you need to schedule all the fixtures of a season, you can check this answer: League fixture generator in python
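To answer the "whittle it down" part directly, one option is to keep the question's table of all (home, away) pairings and, after each draw, filter out every pairing that involves either team just scheduled. The function name first_come_first_served and its seed parameter are hypothetical; this is a sketch, not a tuned scheduler.

```python
import itertools

import numpy as np
import pandas as pd

teams = ['Georgia', 'Oregon', 'Florida', 'Texas']
# Every ordered (home, away) pairing, as built in the question.
c = pd.DataFrame([(x, y) for x, y in itertools.product(teams, teams) if x != y],
                 columns=['home', 'away'])


def first_come_first_served(pairs, seed=None):
    """Draw one pairing at a time, then drop every pairing that reuses its teams."""
    rng = np.random.default_rng(seed)
    remaining = pairs
    chosen = []
    while not remaining.empty:
        pick = remaining.sample(n=1, random_state=rng)
        home, away = pick.iloc[0]
        chosen.append((home, away))
        used = {home, away}
        # Keep only pairings where neither team has been scheduled yet.
        remaining = remaining[~remaining['home'].isin(used) & ~remaining['away'].isin(used)]
    return pd.DataFrame(chosen, columns=['home', 'away'])


schedule = first_come_first_served(c, seed=42)
print(schedule)
```

Each draw filters the remaining pairings, which is fine for dozens of teams. For hundreds, shuffling the teams and pairing them off (as in the answer above) avoids building all n*(n-1) pairings up front.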

Cross-checking dataframes in Python

I am working on a Pandas issue.
Currently in df1:
start Stop
NYPenn WUnion
GCTerm 30thSt
TUStat LAUnio
JaStat MillSt
ChiUnS MonCen
OGTran SouthS
Currently in df2 (Prime):
Train_Code City
NYPenn New York City
WUnion D.C.
GCTerm New York City
30thSt Philadelphia
TUStat Toronto
LAUnio Los Angeles
MonCen Montreal
OGTran Chicago
SouthS Boston
I want to use the train codes to determine which start/stop stations in df1 are prime stations. I need to check each element of both columns in df1 against df2's Train_Code column and write the results, indicating which station is prime (or whether both are), into another dataframe (df3).
df3 should end up being:
start Stop Results City Results City
NYPenn WUnion Yes New York City Yes D.C.
GCTerm 30thSt
TUStat LAUnio
JaStat MillSt NO NaN NO NaN
ChiUnS MonCen NO NaN Yes Montreal
OGTran SouthS
Note: I didn't fill in df3 completely, but I gave examples of how it should be filled.
[If I added another column indicating a layover station, the code should run against the layover column as well.]
This will get you close:
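import numpy as np
import pandas as pd
# One row per df1 cell, holding its station code
df1s = df1.stack().rename('Train_Code').to_frame()
# Look up each code's City in df2; codes not in df2 (non-prime) become NaN
df1s['City'] = df1s['Train_Code'].map(df2.set_index('Train_Code')['City'])
df1s['Results'] = np.where(df1s['City'].notna(), 'Yes', 'NO')
df1s.unstack()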
Output:
Train_Code City Results
start Stop start Stop start Stop
0 NYPenn WUnion New York City D.C. Yes Yes
1 GCTerm 30thSt New York City Philadelphia Yes Yes
2 TUStat LAUnio Toronto Los Angeles Yes Yes
3 JaStat MillSt NaN NaN NO NO
4 ChiUnS MonCen NaN Montreal NO Yes
5 OGTran SouthS Chicago Boston Yes Yes
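Since the question also wants the same check on an optional layover column, another option is to loop over whichever station columns you list, using isin for the Yes/NO flag and map for the city. The helper name flag_prime and the output column names such as 'start Results' are hypothetical choices; a layover column would just be one more entry in the list.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'start': ['NYPenn', 'GCTerm', 'TUStat', 'JaStat', 'ChiUnS', 'OGTran'],
                    'Stop': ['WUnion', '30thSt', 'LAUnio', 'MillSt', 'MonCen', 'SouthS']})
df2 = pd.DataFrame({'Train_Code': ['NYPenn', 'WUnion', 'GCTerm', '30thSt', 'TUStat',
                                   'LAUnio', 'MonCen', 'OGTran', 'SouthS'],
                    'City': ['New York City', 'D.C.', 'New York City', 'Philadelphia',
                             'Toronto', 'Los Angeles', 'Montreal', 'Chicago', 'Boston']})


def flag_prime(stops, prime, columns):
    """Add '<col> Results' (Yes/NO) and '<col> City' for every station column listed."""
    code_to_city = prime.set_index('Train_Code')['City']
    out = stops.copy()
    for col in columns:
        out[f'{col} Results'] = np.where(stops[col].isin(code_to_city.index), 'Yes', 'NO')
        out[f'{col} City'] = stops[col].map(code_to_city)  # NaN when not prime
    return out


df3 = flag_prime(df1, df2, ['start', 'Stop'])
print(df3)
```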

How to add a dictionary as the last element to a list of dictionaries?

I would like to add a dictionary to a list, which contains several other dictionaries.
I have a list of ten top travel cities:
City Country Population Area
0 Buenos Aires Argentina 2891000 4758
1 Toronto Canada 2800000 2731571
2 Pyeongchang South Korea 2581000 3194
3 Marakesh Morocco 928850 200
4 Albuquerque New Mexico 559277 491
5 Los Cabos Mexico 287651 3750
6 Greenville USA 84554 68
7 Archipelago Sea Finland 60000 8300
8 Walla Walla Valley USA 32237 33
9 Salina Island Italy 4000 27
10 Solta Croatia 1700 59
11 Iguazu Falls Argentina 0 672
I imported the Excel file with pandas:
import pandas as pd
travel_df = pd.read_excel('./cities.xlsx')
print(travel_df)
cities = travel_df.to_dict('records')
print(cities)
variables = list(cities[0].keys())
I would like to add a 12th element to the end of the list but don't know how to do so:
beijing = {"City": "Beijing", "Country": "China", "Population": 24000000, "Area": 6490}
print(beijing)
Try appending the new row to the DataFrame you read.
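# DataFrame.append was removed in pandas 2.0; use pd.concat and assign the result
travel_df = pd.concat([travel_df, pd.DataFrame([beijing])], ignore_index=True)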
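If you want to stay with the list of dictionaries from to_dict('records'), plain list.append puts the new dictionary at the end. A minimal sketch, using a hypothetical two-row stand-in for the records read from cities.xlsx:

```python
import pandas as pd

# Hypothetical two-row stand-in for travel_df.to_dict('records').
cities = [
    {"City": "Buenos Aires", "Country": "Argentina", "Population": 2891000, "Area": 4758},
    {"City": "Toronto", "Country": "Canada", "Population": 2800000, "Area": 2731571},
]
beijing = {"City": "Beijing", "Country": "China", "Population": 24000000, "Area": 6490}

# list.append adds the dict at the end, in place (it returns None, so don't reassign).
cities.append(beijing)

# If you need a DataFrame again, rebuild it from the records.
travel_df = pd.DataFrame(cities)
print(travel_df.tail(1))
```

The keys must match the existing ones exactly (e.g. "Area", not "Ares"); otherwise the rebuilt DataFrame gains an extra column filled with NaN.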

Derive a new pandas column based on a certain value of a row and apply until the next value appears again

In a pandas dataframe string column, I want to derive a new column based on the value of a row, and keep applying it until the next such value appears. What is the most efficient or cleanest way to achieve this?
Input Dataframe:
import pandas as pd
df = pd.DataFrame({'neighborhood':['Chicago City', 'Wicker Park', 'Bucktown','Lincoln Park','West Loop','River North','Milwaukee City','Bay View','East Side','South Side','Bronzeville','North Side','New York City','Harlem','Midtown','Chinatown']})
My desired dataframe output would be:
neighborhood city
0 Chicago City Chicago
1 Wicker Park Chicago
2 Bucktown Chicago
3 Lincoln Park Chicago
4 West Loop Chicago
5 River North Chicago
6 Milwaukee City Milwaukee
7 Bay View Milwaukee
8 East Side Milwaukee
9 South Side Milwaukee
10 Bronzeville Milwaukee
11 North Side Milwaukee
12 New York City New York
13 Harlem New York
14 Midtown New York
15 Chinatown New York
1) If the first column contains 'City', copy it to the second column, but cut out the ' City' part.
2) Fill the NAs with a forward fill.
import numpy as np
df['city'] = np.where(
df.neighborhood.str.contains('City'),
df.neighborhood.str.replace(' City', '', case = False),
None)
Result:
neighborhood city
0 Chicago City Chicago
1 Wicker Park None
2 Bucktown None
3 Lincoln Park None
4 West Loop None
5 River North None
6 Milwaukee City Milwaukee
7 Bay View None
8 East Side None
9 South Side None
10 Bronzeville None
11 North Side None
12 New York City New York
13 Harlem None
14 Midtown None
15 Chinatown None
df['city'] = df['city'].ffill()  # fillna(method='ffill') is deprecated in recent pandas
Result:
neighborhood city
0 Chicago City Chicago
1 Wicker Park Chicago
2 Bucktown Chicago
3 Lincoln Park Chicago
4 West Loop Chicago
5 River North Chicago
6 Milwaukee City Milwaukee
7 Bay View Milwaukee
8 East Side Milwaukee
9 South Side Milwaukee
10 Bronzeville Milwaukee
11 North Side Milwaukee
12 New York City New York
13 Harlem New York
14 Midtown New York
15 Chinatown New York
Use .str.extract + ffill
df['city'] = df.neighborhood.str.extract(r'(.*)\sCity', expand=False).ffill()
You can just map a custom-defined function that behaves as intended:
city = None
def generate(s):
    global city
    # A "... City" row starts a new city; other rows reuse the last one seen
    if 'City' in s:
        city = s.replace(' City', '')
    return city
df['city'] = df['neighborhood'].map(generate)
This assigns the intended output.
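Another vectorized option, not used in the answers above, is to number the blocks with a cumulative sum: each "... City" row increments a counter, so every row in one block shares an id, and the block's first row supplies the city name. This assumes city rows end with " City"; rows before the first city row get NaN.

```python
import pandas as pd

df = pd.DataFrame({'neighborhood': ['Chicago City', 'Wicker Park', 'Bucktown', 'Lincoln Park',
                                    'West Loop', 'River North', 'Milwaukee City', 'Bay View',
                                    'East Side', 'South Side', 'Bronzeville', 'North Side',
                                    'New York City', 'Harlem', 'Midtown', 'Chinatown']})

# True on rows that start a new city block ("... City").
is_city = df['neighborhood'].str.endswith(' City')
# Running count of city rows: every row in one block shares the same block id.
block = is_city.cumsum()

# The first row of each block is its "... City" row; strip the suffix from it.
first = df.groupby(block)['neighborhood'].transform('first')
df['city'] = first.str.replace(r' City$', '', regex=True).where(block > 0)
print(df)
```

Unlike str.contains('City'), endswith(' City') would not treat a neighborhood such as "City Park" as the start of a new city.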

Appending or Concatenating DataFrame via for loop to existing DataFrame

As you can see in the output below, this code takes the Location column (a Series) and places it in a dataframe. Then the first, second, and third lines of the nested for loop take the first index of each column and create a dataframe to add to the first dataframe. What I have been trying to do is loop through, going up one index on each pass, and add a new dataframe of repetitive data each time. However, when I print it, I only get the first dataframe and the last repetitive dataframe the loop went through. I'm trying to build one large dataframe that attaches a repetitive-index dataframe for every index from 0 to 17. I have updated this to show the repetitiveness I am looking for, in truncated form. I hope this helps. Thanks!
Here is the input
for j in range(0,18,1):
    for i in range(0,18,1):
        df['Rep Loc'] = str(df['Location'][j:j+1])
        df['Rep Lat'] = float(df['Latitude'][j:j+1])
        df['Rep Long'] = float(df['Longitude'][j:j+1])
        break
print(df)
Here is the output
Location Latitude Longitude
0 Letsholathebe II Rd, Maun, North-West District... -19.989491 23.397709
1 North-West District, Botswana -19.389353 23.267951
2 Silobela, Kwekwe, Midlands Province, Zimbabwe -18.993930 29.147992
3 Mosi-Oa-Tunya, Livingstone, Southern Province,... -17.910147 25.861904
4 Parkway Drive, Victoria Falls, Matabeleland No... -17.909231 25.827019
5 A33, Kasane, North-West District, Botswana -17.795057 25.197270
6 T1, Southern Province, Zambia -17.040664 26.608454
7 Sikoongo Road, Siavonga, Southern Province, Za... -16.536204 28.708753
8 New Kasama, Lusaka Province, Zambia -15.471934 28.398588
9 Simon Mwansa Kapwepwe Avenue, Avondale, Lusaka... -15.386244 28.397111
10 Lusaka, Lusaka Province, 1010, Zambia -15.416697 28.281381
11 Chigwirizano Road, Rhodes Park, Lusaka, Lusaka... -15.401848 28.302248
12 T2, Kabwe, Central Province, Zambia -14.420744 28.462169
13 Kabushi Road, Ndola, Copperbelt Province, Zambia -12.997968 28.608536
14 Dr Aggrey Avenue, Mishenshi, Kitwe, Copperbelt... -12.797684 28.199061
15 President Avenue, Kalulushi, Copperbelt Provin... -12.833375 28.108370
16 Eglise Methodiste Unie, Avenue Mantola, Mawawa... -11.699407 27.500234
17 Avenue Babemba, Kolwezi, Lwalaba, Katanga, Lua... -10.698109 25.503816
Rep Loc Rep Lat Rep Long
0 0 Letsholathebe II Rd, Maun, North-West Dis... -19.989491 23.397709
1 0 Letsholathebe II Rd, Maun, North-West Dis... -19.989491 23.397709
2 0 Letsholathebe II Rd, Maun, North-West Dis... -19.989491 23.397709
Rep Loc Rep Lat Rep Long
0 1 North-West District, Botswana\nName: Loca... -19.389353 23.267951
1 1 North-West District, Botswana\nName: Loca... -19.389353 23.267951
2 1 North-West District, Botswana\nName: Loca... -19.389353 23.267951
Rep Loc Rep Lat Rep Long
0 2 Silobela, Kwekwe, Midlands Province, Zimb... -18.99393 29.147992
1 2 Silobela, Kwekwe, Midlands Province, Zimb... -18.99393 29.147992
Rep Loc Rep Lat Rep Long
0 3 Mosi-Oa-Tunya, Livingstone, Southern Prov... -17.910147 25.861904
1 3 Mosi-Oa-Tunya, Livingstone, Southern Prov... -17.910147 25.861904
2 3 Mosi-Oa-Tunya, Livingstone, Southern Prov... -17.910147 25.861904
Rep Loc Rep Lat Rep Long
0 4 Parkway Drive, Victoria Falls, Matabelela... -17.909231 25.827019
1 4 Parkway Drive, Victoria Falls, Matabelela... -17.909231 25.827019
2 4 Parkway Drive, Victoria Falls, Matabelela... -17.909231 25.827019
Rep Loc Rep Lat Rep Long
0 5 A33, Kasane, North-West District, Botswan... -17.795057 25.19727
1 5 A33, Kasane, North-West District, Botswan... -17.795057 25.19727
2 5 A33, Kasane, North-West District, Botswan... -17.795057 25.19727
Good practice when asking questions is to provide an example of what you want your output to look like. However, this is my best guess at what you want.
pd.concat({i: d.shift(i) for i in range(18)}, axis=1)
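A common pattern for building one large DataFrame inside a loop is to collect each piece in a list and call pd.concat once at the end, rather than overwriting columns of df on every pass (which is why only the last iteration survives in the question's code). A minimal sketch, using a hypothetical three-row stand-in for df and assuming each row should be repeated a fixed number of times (n_repeats = 3 mirrors the truncated output and is an assumption):

```python
import pandas as pd

# Hypothetical three-row stand-in for the question's df.
df = pd.DataFrame({
    'Location': ['Letsholathebe II Rd, Maun', 'North-West District, Botswana',
                 'Silobela, Kwekwe, Midlands Province, Zimbabwe'],
    'Latitude': [-19.989491, -19.389353, -18.993930],
    'Longitude': [23.397709, 23.267951, 29.147992],
})

n_repeats = 3  # assumption: how many copies of each row you want

frames = []
for j in range(len(df)):
    row = df.iloc[j]
    # A small frame that repeats row j; df itself is never modified.
    frames.append(pd.DataFrame({
        'Rep Loc': [row['Location']] * n_repeats,
        'Rep Lat': [row['Latitude']] * n_repeats,
        'Rep Long': [row['Longitude']] * n_repeats,
    }))

# Concatenate once; keys= labels each block with the row it came from.
result = pd.concat(frames, keys=list(range(len(df))))
print(result)
```

Indexing with row['Location'] also keeps the plain string, whereas str(df['Location'][j:j+1]) embeds the Series repr (the "\nName: Loca..." text visible in the output above).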
