I am somewhat new to coding in Pandas and I have what I think to be a simple problem that I can't find an answer to. I have a list of students, the college they went to and what year they entered college.
Name     College    Year
Mary     Princeton  2017
Joe      Harvard    2018
Bill     Princeton  2016
Louise   Princeton  2020
Michael  Harvard    2019
Penny    Yale       2018
Harry    Yale       2015
I need the data to be ordered by year but grouped by college. However, if I sort by year alone, the years are in order but the colleges aren't together; if I sort by college alone, the colleges are together in alphabetical order but the years aren't sorted. Similarly, sorting by year then college doesn't keep the colleges together, and sorting by college then year doesn't guarantee that the college with the most recent year comes first. What I want the table to look like is:
Name     College    Year
Louise   Princeton  2020
Mary     Princeton  2017
Bill     Princeton  2016
Michael  Harvard    2019
Joe      Harvard    2018
Penny    Yale       2018
Harry    Yale       2015
So Princeton is first because it has the most recent year, and all the Princeton rows are together. Then Harvard comes next because its most recent year, 2019, is greater than 2018, the most recent year for Yale, so its two rows follow. Yale comes last since 2020 > 2019 > 2018. I appreciate all your ideas and help! Thank you!
Add a temporary extra column with the max year per group and sort on multiple columns:
out = (df
       .assign(max_year=df.groupby('College')['Year'].transform('max'))
       .sort_values(by=['max_year', 'College', 'Year'], ascending=[False, True, False])
       .drop(columns='max_year')
       )
Output:

      Name    College  Year
3   Louise  Princeton  2020
0     Mary  Princeton  2017
2     Bill  Princeton  2016
4  Michael    Harvard  2019
1      Joe    Harvard  2018
5    Penny       Yale  2018
6    Harry       Yale  2015
With the temporary column:

      Name    College  Year  max_year
3   Louise  Princeton  2020      2020
0     Mary  Princeton  2017      2020
2     Bill  Princeton  2016      2020
4  Michael    Harvard  2019      2019
1      Joe    Harvard  2018      2019
5    Penny       Yale  2018      2018
6    Harry       Yale  2015      2018
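An equivalent approach, if you'd rather avoid the helper column entirely, is to compute the group order first and then concatenate the groups. A minimal sketch using the sample data from the question:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Mary", "Joe", "Bill", "Louise", "Michael", "Penny", "Harry"],
    "College": ["Princeton", "Harvard", "Princeton", "Princeton", "Harvard", "Yale", "Yale"],
    "Year": [2017, 2018, 2016, 2020, 2019, 2018, 2015],
})

# Colleges ordered by their most recent year, descending
order = df.groupby("College")["Year"].max().sort_values(ascending=False).index

# Concatenate each college's rows, sorted by year within the group
out = pd.concat(
    [df[df["College"] == c].sort_values("Year", ascending=False) for c in order],
    ignore_index=True,
)
print(out)
```

This makes the group ordering explicit at the cost of one pass per college, which is fine for small frames.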
You first want to sort by "College" then "Year", then keep the "College" values together using .groupby:
import pandas as pd
data = [
    ["Mary", "Princeton", 2017],
    ["Joe", "Harvard", 2018],
    ["Bill", "Princeton", 2016],
    ["Louise", "Princeton", 2020],
    ["Michael", "Harvard", 2019],
    ["Penny", "Yale", 2018],
    ["Harry", "Yale", 2015],
]
df = pd.DataFrame(data, columns=["Name", "College", "Year"])
df.sort_values(["College", "Year"], ascending=False).groupby("College").head()
You'd get this output (.head() with no argument keeps up to five rows per group, so every group here is kept in full):
Name     College    Year
Penny    Yale       2018
Harry    Yale       2015
Louise   Princeton  2020
Mary     Princeton  2017
Bill     Princeton  2016
Michael  Harvard    2019
Joe      Harvard    2018
You will have to first find the maximum among each group and set that as a column.
You can then sort by values based on max and year.
df = pd.read_table('./table.txt')
df["max"] = df.groupby("College")["Year"].transform("max")
df.sort_values(by=["max", "Year"], ascending=False).drop(columns="max").reset_index(drop=True)
Output:
Name College Year
0 Louise Princeton 2020
1 Mary Princeton 2017
2 Bill Princeton 2016
3 Michael Harvard 2019
4 Joe Harvard 2018
5 Penny Yale 2018
6 Harry Yale 2015
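One caveat with sorting only on ["max", "Year"]: if two colleges happen to share the same maximum year, their rows can interleave. Adding "College" as a tiebreaker keeps each group contiguous. A small sketch with made-up data that has such a tie:

```python
import pandas as pd

# Toy data where Yale and Brown share the same most recent year (2020)
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "College": ["Yale", "Brown", "Yale", "Brown"],
    "Year": [2020, 2020, 2018, 2019],
})

df["max"] = df.groupby("College")["Year"].transform("max")

# "College" breaks the tie on "max" so each group's rows stay together
out = (df.sort_values(by=["max", "College", "Year"], ascending=[False, True, False])
         .drop(columns="max")
         .reset_index(drop=True))
print(out)
```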
I'm trying to combine two pandas DataFrames to update the first one based on criteria from the second. Here is a sample of the two dataframes:
df1

           state                candidate
year
2016  CALIFORNIA         CLINTON, HILLARY
2016  CALIFORNIA         TRUMP, DONALD J.
2016  CALIFORNIA            JOHNSON, GARY
2016  CALIFORNIA              STEIN, JILL
2016  CALIFORNIA                 WRITE-IN
2016  CALIFORNIA  LA RIVA, GLORIA ESTELLA
2016       TEXAS         TRUMP, DONALD J.
2016       TEXAS         CLINTON, HILLARY
2016       TEXAS            JOHNSON, GARY
2016       TEXAS              STEIN, JILL
...
1988  CALIFORNIA        BUSH, GEORGE H.W.
1988  CALIFORNIA         DUKAKIS, MICHAEL
1988  CALIFORNIA     PAUL, RONALD ""RON""
1988  CALIFORNIA           FULANI, LENORA
1988       TEXAS        BUSH, GEORGE H.W.
1988       TEXAS         DUKAKIS, MICHAEL
1988       TEXAS     PAUL, RONALD ""RON""
1988       TEXAS           FULANI, LENORA
df2

           state  electoral_votes
year
1988  CALIFORNIA               47
1988       TEXAS               29
...
2016  CALIFORNIA               55
2016       TEXAS               38
There are values for every election year from 2020 to 1972 that includes all candidates and all states in a similar format. There are other columns in df1 but they aren't relevant to what I'm trying to do.
My expected result is:
           state                candidate  electoral_votes
year
2016  CALIFORNIA         CLINTON, HILLARY               55
2016  CALIFORNIA         TRUMP, DONALD J.               55
2016  CALIFORNIA            JOHNSON, GARY               55
2016  CALIFORNIA              STEIN, JILL               55
2016  CALIFORNIA                 WRITE-IN               55
2016  CALIFORNIA  LA RIVA, GLORIA ESTELLA               55
2016       TEXAS         TRUMP, DONALD J.               38
2016       TEXAS         CLINTON, HILLARY               38
2016       TEXAS            JOHNSON, GARY               38
2016       TEXAS              STEIN, JILL               38
...
1988  CALIFORNIA        BUSH, GEORGE H.W.               47
1988  CALIFORNIA         DUKAKIS, MICHAEL               47
1988  CALIFORNIA     PAUL, RONALD ""RON""               47
1988  CALIFORNIA           FULANI, LENORA               47
1988       TEXAS        BUSH, GEORGE H.W.               29
1988       TEXAS         DUKAKIS, MICHAEL               29
1988       TEXAS     PAUL, RONALD ""RON""               29
1988       TEXAS           FULANI, LENORA               29
I want to match up the electoral_votes column in df2 with the year and state columns in df1 so it fills in the correct value. I got some assistance and was able to match it up when there was only one column being matched (you can see the question and answer here), but I am having trouble matching on the two points of reference (year and state). If I use the linked code as is, it returns the error:
pandas.errors.InvalidIndexError: Reindexing only valid with uniquely valued Index objects
I have tried apply, map, applymap, merge, etc and haven't been able to figure it out. Thanks in advance for the help!
I believe what you are looking for is a left merge. You should specify the common columns the merge should be based on within on=[...].
# Imports
import pandas as pd

# Specify two columns in the "on".
pd.merge(df1,
         df2,
         how='left',
         on=['year', 'state'])
Output:
year state candidate votes
0 2016 CALIFORNIA CLINTON, HILLARY 55
1 2016 CALIFORNIA TRUMP, DONALD J. 55
2 2016 CALIFORNIA JOHNSON, GARY 55
3 2016 CALIFORNIA STEIN, JILL 55
4 2016 CALIFORNIA WRITE-IN 55
5 2016 CALIFORNIA LA RIVA, GLORIA ESTELLA 55
6 2016 TEXAS TRUMP, DONALD J. 38
7 2016 TEXAS CLINTON, HILLARY 38
8 2016 TEXAS JOHNSON, GARY 38
9 2016 TEXAS STEIN, JILL 38
10 1988 CALIFORNIA BUSH, GEORGE H.W. 47
11 1988 CALIFORNIA DUKAKIS, MICHAEL 47
12 1988 CALIFORNIA PAUL, RONALD ""RON"" 47
13 1988 CALIFORNIA FULANI, LENORA 47
14 1988 TEXAS BUSH, GEORGE H.W. 29
15 1988 TEXAS DUKAKIS, MICHAEL 29
16 1988 TEXAS PAUL, RONALD ""RON"" 29
17 1988 TEXAS FULANI, LENORA 29
The above code could be written as:
pd.merge(df1,
         df2,
         how='left',
         left_on=['year', 'state'],
         right_on=['year', 'state'])
but since the columns are named the same in the two DataFrames, we can use on=['year', 'state'].
An alternate way to write it:
merged_df = df1.merge(df2, on=['year', 'state'], how='left')
If you want to load only three columns for df1:
df1 = pd.read_csv('<name_of_the_CSV_file>', usecols=['year', 'state', 'candidate'])
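If df2 ever contains duplicate (year, state) rows, a left merge would silently duplicate rows of df1. pandas' merge supports validate= to fail fast in that case, and indicator=True to flag rows that found no match. A sketch on a cut-down version of the data:

```python
import pandas as pd

df1 = pd.DataFrame({
    "year": [2016, 2016, 1988],
    "state": ["CALIFORNIA", "TEXAS", "CALIFORNIA"],
    "candidate": ["CLINTON, HILLARY", "TRUMP, DONALD J.", "BUSH, GEORGE H.W."],
})
df2 = pd.DataFrame({
    "year": [2016, 2016, 1988],
    "state": ["CALIFORNIA", "TEXAS", "CALIFORNIA"],
    "electoral_votes": [55, 38, 47],
})

# validate='many_to_one' raises if (year, state) is not unique in df2;
# indicator=True adds a _merge column showing which rows matched
merged = df1.merge(df2, on=["year", "state"], how="left",
                   validate="many_to_one", indicator=True)
print(merged)
```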
I am currently web scraping the college football schedule by week.
import requests
from bs4 import BeautifulSoup

URL = 'https://www.cbssports.com/college-football/schedule/FBS/2020/regular/5/'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')

teams = [t.text for t in soup.find_all('span', class_='TeamName')]
away = teams[::2]
home = teams[1::2]
time = [c.text.replace("\n", "").replace(' ', '').replace(' ', ' ') for c in soup.find_all('div', class_='CellGame')]

import pandas as pd

schedule = pd.DataFrame({
    'away': away,
    'home': home,
    'time': time,
})
schedule
I would like a date column. I am having difficulty extracting the date, duplicating it to match the number of games on that date, and appending it to a Python list.
date = []
for d in soup.find_all('div', class_='TableBaseWrapper'):
    for a in d.find_all('h4'):
        date.append(a.text.replace('\n \n ', '').replace('\n \n ', ''))
print(date)
['Friday, October 2, 2020', 'Saturday, October 3, 2020']
Dates are like headers for each table. I would like each date to correspond to the correct game, and also to include "Postponed" for the postponed games.
My plan is to automate this code for each week.
Thanks ahead.
Edit, after the answer was posted:
Beautiful and well done. How would I pull venues, especially with postponed games, using your code?
My original code was:
venue = [v.text.replace('\n', '').strip('—').strip()
         for v in soup.find_all('td', text=lambda x: x and ('Field' in x or 'Stadium' in x))
         if v != '']
venues = [x for x in venue if x]
missing = len(away) - len(venues)
words = ['Postponed' for x in range(missing) if len(away) > len(venues)]
venues = venues + words
You can use .find_previous() to find the date for the current row:
import requests
import pandas as pd
from bs4 import BeautifulSoup

url = 'https://www.cbssports.com/college-football/schedule/FBS/2020/regular/5/'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

all_data = []
for row in soup.select('.TableBase-bodyTr'):
    home = row.select_one('.TeamLogoNameLockup')
    away = home.find_next(class_='TeamLogoNameLockup')
    time = row.select_one('.CellGame')
    date = row.find_previous('h4')
    all_data.append({
        'home': home.get_text(strip=True),
        'away': away.get_text(strip=True),
        'time': time.get_text(strip=True, separator=' '),
        'date': date.get_text(strip=True),
    })

df = pd.DataFrame(all_data)
print(df)
df.to_csv('data.csv', index=False)
Prints:
home away time date
0 Campbell Wake Forest WAKE 66 - CAMP 14 Friday, October 2, 2020
1 Louisiana Tech BYU BYU 45 - LATECH 14 Friday, October 2, 2020
2 East Carolina Georgia St. GAST 35, ECU 10 - 2nd ESPU Saturday, October 3, 2020
3 Arkansas St. Coastal Carolina CSTCAR 17, ARKST 14 - 2nd ESP2 Saturday, October 3, 2020
4 Missouri Tennessee TENN 21, MIZZOU 6 - 2nd SECN Saturday, October 3, 2020
5 Baylor West Virginia BAYLOR 7, WVU 7 - 2nd ABC Saturday, October 3, 2020
6 TCU Texas TCU 14, TEXAS 14 - 2nd FOX Saturday, October 3, 2020
7 NC State Pittsburgh NCST 17, PITT 10 - 2nd ACCN Saturday, October 3, 2020
8 South Carolina Florida FLA 17, SC 14 - 2nd ESPN Saturday, October 3, 2020
9 UT-San Antonio UAB UAB 7, TXSA 3 - 2nd Saturday, October 3, 2020
10 North Alabama Liberty NAL 0, LIB 0 - 1st ESP3 Saturday, October 3, 2020
11 Abil Christian Army 1:30 pm CBSSN Saturday, October 3, 2020
12 Texas A&M Alabama 3:30 pm Saturday, October 3, 2020
13 Texas Tech Kansas St. 3:30 pm FS1 Saturday, October 3, 2020
14 North Carolina Boston College 3:30 pm ABC Saturday, October 3, 2020
15 South Florida Cincinnati 3:30 pm ESP+ Saturday, October 3, 2020
16 Oklahoma St. Kansas 3:30 pm ESPN Saturday, October 3, 2020
17 Memphis SMU 3:30 pm ESP2 Saturday, October 3, 2020
18 Charlotte FAU 4:00 pm ESPU Saturday, October 3, 2020
19 Jacksonville St. Florida St. 4:00 pm Saturday, October 3, 2020
20 Virginia Tech Duke 4:00 pm ACCN Saturday, October 3, 2020
21 Ole Miss Kentucky 4:00 pm SECN Saturday, October 3, 2020
22 W. Kentucky Middle Tenn. 5:00 pm ESP3 Saturday, October 3, 2020
23 Navy Air Force 6:00 pm CBSSN Saturday, October 3, 2020
24 Ga. Southern UL-Monroe 7:00 pm ESP+ Saturday, October 3, 2020
25 Auburn Georgia 7:30 pm ESPN Saturday, October 3, 2020
26 Arkansas Miss. State 7:30 pm SECN Saturday, October 3, 2020
27 LSU Vanderbilt 7:30 pm SECN Saturday, October 3, 2020
28 Oklahoma Iowa St. 7:30 pm ABC Saturday, October 3, 2020
29 So. Miss North Texas 7:30 pm Saturday, October 3, 2020
30 Tulsa UCF 7:30 pm ESP2 Saturday, October 3, 2020
31 Virginia Clemson 8:00 pm ACCN Saturday, October 3, 2020
32 Rice Marshall Postponed Saturday, October 3, 2020
33 Troy South Alabama Postponed Saturday, October 3, 2020
And saves data.csv.
EDIT: To parse the "Venue" column, you can use this example:
import requests
import pandas as pd
from bs4 import BeautifulSoup

url = 'https://www.cbssports.com/college-football/schedule/FBS/2020/regular/5/'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

all_data = []
for row in soup.select('.TableBase-bodyTr'):
    home = row.select_one('.TeamLogoNameLockup')
    away = home.find_next(class_='TeamLogoNameLockup')
    time = row.select_one('.CellGame')
    venue = '-' if len(row.select('td')) == 3 else row.select('td')[3].get_text(strip=True)
    date = row.find_previous('h4')
    all_data.append({
        'home': home.get_text(strip=True),
        'away': away.get_text(strip=True),
        'time': time.get_text(strip=True, separator=' '),
        'venue': venue,
        'date': date.get_text(strip=True),
    })

df = pd.DataFrame(all_data)
print(df)
df.to_csv('data.csv', index=False)
Prints:
home away time venue date
0 Campbell Wake Forest WAKE 66 - CAMP 14 - Friday, October 2, 2020
1 Louisiana Tech BYU BYU 45 - LATECH 14 - Friday, October 2, 2020
2 East Carolina Georgia St. GAST 35, ECU 13 - 3rd ESPU Center Parc Stadium Saturday, October 3, 2020
3 Arkansas St. Coastal Carolina CSTCAR 31, ARKST 14 - 3rd ESP2 Brooks Stadium Saturday, October 3, 2020
4 Missouri Tennessee TENN 28, MIZZOU 6 - 3rd SECN Neyland Stadium Saturday, October 3, 2020
5 Baylor West Virginia BAYLOR 7, WVU 7 - 3rd ABC Mountaineer Field at Milan Puskar Stadium Saturday, October 3, 2020
6 TCU Texas TCU 20, TEXAS 14 - 2nd FOX DKR-Texas Memorial Stadium Saturday, October 3, 2020
7 NC State Pittsburgh NCST 17, PITT 13 - 3rd ACCN Heinz Field Saturday, October 3, 2020
8 South Carolina Florida FLA 31, SC 14 - 3rd ESPN Florida Field at Ben Hill Griffin Stadium Saturday, October 3, 2020
9 UT-San Antonio UAB UAB 14, TXSA 6 - 2nd Legion Field Saturday, October 3, 2020
10 North Alabama Liberty LIB 7, NAL 0 - 2nd ESP3 Williams Stadium Saturday, October 3, 2020
11 Abil Christian Army ARMY 7, ABIL 0 - 1st CBSSN Blaik Field at Michie Stadium Saturday, October 3, 2020
12 Texas A&M Alabama 3:30 pm Bryant-Denny Stadium Saturday, October 3, 2020
13 Texas Tech Kansas St. 3:30 pm FS1 Bill Snyder Family Stadium Saturday, October 3, 2020
14 North Carolina Boston College 3:30 pm ABC Alumni Stadium Saturday, October 3, 2020
15 South Florida Cincinnati 3:30 pm ESP+ Nippert Stadium Saturday, October 3, 2020
16 Oklahoma St. Kansas 3:30 pm ESPN David Booth Kansas Memorial Stadium Saturday, October 3, 2020
17 Memphis SMU 3:30 pm ESP2 Gerald J. Ford Stadium Saturday, October 3, 2020
18 Charlotte FAU 4:00 pm ESPU FAU Stadium Saturday, October 3, 2020
19 Jacksonville St. Florida St. 4:00 pm Bobby Bowden Field at Doak Campbell Stadium Saturday, October 3, 2020
20 Virginia Tech Duke 4:00 pm ACCN Brooks Field at Wallace Wade Stadium Saturday, October 3, 2020
21 Ole Miss Kentucky 4:00 pm SECN Kroger Field Saturday, October 3, 2020
22 W. Kentucky Middle Tenn. 5:00 pm ESP3 Johnny (Red) Floyd Stadium Saturday, October 3, 2020
23 Navy Air Force 6:00 pm CBSSN Falcon Stadium Saturday, October 3, 2020
24 Ga. Southern UL-Monroe 7:00 pm ESP+ JPS Field at James L. Malone Stadium Saturday, October 3, 2020
25 Auburn Georgia 7:30 pm ESPN Sanford Stadium Saturday, October 3, 2020
26 Arkansas Miss. State 7:30 pm SECN Davis Wade Stadium at Scott Field Saturday, October 3, 2020
27 LSU Vanderbilt 7:30 pm SECN Vanderbilt Stadium Saturday, October 3, 2020
28 Oklahoma Iowa St. 7:30 pm ABC Jack Trice Stadium Saturday, October 3, 2020
29 So. Miss North Texas 7:30 pm Apogee Stadium Saturday, October 3, 2020
30 Tulsa UCF 7:30 pm ESP2 Spectrum Stadium Saturday, October 3, 2020
31 Virginia Clemson 8:00 pm ACCN Memorial Stadium Saturday, October 3, 2020
32 Rice Marshall Postponed - Saturday, October 3, 2020
33 Troy South Alabama Postponed - Saturday, October 3, 2020
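For the planned week-by-week automation, it may also help to convert the text dates into real datetimes for sorting or filtering. A sketch of the conversion, assuming the date strings keep the "Friday, October 2, 2020" format shown above:

```python
import pandas as pd

df = pd.DataFrame({"date": ["Friday, October 2, 2020", "Saturday, October 3, 2020"]})

# Parse strings like "Friday, October 2, 2020" into datetime64 values
df["date"] = pd.to_datetime(df["date"], format="%A, %B %d, %Y")
print(df["date"].dt.date.tolist())
```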
I have a pandas dataframe with three columns:
a          b          c
Donaldson  Minnesota  2020
Ozuna      Atlanta    2020
Betts      Boston     2019
Donaldson  Atlanta    2019
Ozuna      St. Louis  2019
Torres     New York   2019
I want to identify all column a names that have more than one column c value, and then replace all of their column b instances with the first value in the dataframe, like this:
a          b          c
Donaldson  Minnesota  2020
Ozuna      Atlanta    2020
Betts      Boston     2019
Donaldson  Minnesota  2019
Ozuna      Atlanta    2019
Torres     New York   2019
This is definitely inefficient, but here's what I tried so far:
# get a df of just names and cities and deduplicate
df_names = df[['a', 'b']].drop_duplicates()

# find any multiple column b values and put them in a list
a_matches = pd.DataFrame(df_names.groupby('a')['b'].nunique())
multi_b = a_matches.index[a_matches['b'] > 1].tolist()
This gives me ['Donaldson','Ozuna'], but now I am stuck. I can't come up with a good way to generate a replacement dictionary for their corresponding values in c. I think there must be a more elegant way to get to this.
IIUC, you can try groupby + transform with np.where:
import numpy as np

g = df.groupby('a')
c = g['c'].transform('nunique').gt(1)  # column a names that have >1 column c value
df['b'] = np.where(c, g['b'].transform('first'), df['b'])
# for a new df: new = df.assign(b=np.where(c, g['b'].transform('first'), df['b']))
print(df)
a b c
0 Donaldson Minnesota 2020
1 Ozuna Atlanta 2020
2 Betts Boston 2019
3 Donaldson Minnesota 2019
4 Ozuna Atlanta 2019
5 Torres New York 2019
For the given example, as @ALloz correctly pointed out, you can just use:
df['b'] = df.groupby('a')['b'].transform('first')
print(df)
a b c
0 Donaldson Minnesota 2020
1 Ozuna Atlanta 2020
2 Betts Boston 2019
3 Donaldson Minnesota 2019
4 Ozuna Atlanta 2019
5 Torres New York 2019
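One caveat: transform('first') broadcasts whichever row happens to appear first in the frame, which works here only because each name's most recent year comes first. If that ordering is not guaranteed, sorting by column c first makes it explicit. A sketch on the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    "a": ["Donaldson", "Ozuna", "Betts", "Donaldson", "Ozuna", "Torres"],
    "b": ["Minnesota", "Atlanta", "Boston", "Atlanta", "St. Louis", "New York"],
    "c": [2020, 2020, 2019, 2019, 2019, 2019],
})

# Sort so each group's most recent year comes first, then broadcast that row's b;
# the assignment realigns the result to df's original index
df["b"] = df.sort_values("c", ascending=False).groupby("a")["b"].transform("first")
print(df)
```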