Python: import data from text

I tried importing float numbers from the P-I curve.txt file that contains my data. However, I get an error when converting the values to float. I used the following code:
import csv

with open('C:/Users/Kevin/Documents/4e Jaar/fotonica/Metingen/P-I curve.txt') as csvfile:
    data = csv.reader(csvfile, delimiter='\t')
    current = []
    P_15 = []
    P_20 = []
    P_25 = []
    P_30 = []
    P_35 = []
    P_40 = []
    P_45 = []
    P_50 = []
    for row in data:
        current.append(float(row[0].replace(',', '.')))
        P_15.append(float(row[2].replace(',', '.')))
        P_20.append(float(row[4].replace(',', '.')))
        P_25.append(float(row[6].replace(',', '.')))
        P_30.append(float(row[8].replace(',', '.')))
        P_35.append(float(row[10].replace(',', '.')))
        P_40.append(float(row[12].replace(',', '.')))
        P_45.append(float(row[14].replace(',', '.')))
        P_50.append(float(row[16].replace(',', '.')))
With this code I got the following error. I understand that row[2] is a string, but if so, why did this error not occur for row[0]? Is there another way to import float numbers without using the csv module? I copied and pasted the data from Excel into a .txt file.
returned error:
File "C:/Users/Kevin/Documents/Python Scripts/P-I curves.py", line 29, in <module>
P_15.append(float(row[2].replace(',','.')))
ValueError: could not convert string to float:
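The blank after the colon in that traceback is the clue: the failing value is an empty string (a missing trailing column), which float() cannot parse. A minimal sketch with hypothetical values:

```python
# A short row, as csv.reader yields it when trailing columns are missing.
row = ["1,8", "1,9", ""]

print(float(row[0].replace(",", ".")))  # 1.8 -- a populated field converts fine

try:
    float(row[2].replace(",", "."))     # empty field -> ValueError
except ValueError as exc:
    print("ValueError:", exc)
```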
I then tried the following code:
import pandas as pd
df=pd.read_csv('C:/Users/Kevin/Documents/4e Jaar/fotonica/Metingen/P-I curve.txt', decimal=',', sep='\t',header=0,names=['current','15','20','25','30','35','40','45','50'] )
#curre=df['current']
print(current)
The .txt file has a header, and the printed output looks like this:
1.8 1.9 0.4 1.9 0.4 1.9 0.4 1.9 0.4
3.8 1.9 1.3 1.9 1.3 1.9 1.3 1.9 1.2
5.8 2.0 2.5 2.0 2.4 2.0 2.3 2.0 2.2
7.8 2.0 3.7 2.0 3.6 2.0 3.5 2.0 3.4
9.8 2.1 5.2 2.0 5.1 2.0 4.9 2.0 4.7
11.8 2.1 6.9 2.1 6.7 2.1 6.4 2.1 6.1
13.8 2.1 9.0 2.0 8.6 2.1 8.2 2.1 7.8
15.8 2.1 11.5 2.1 10.8 2.1 10.2 2.1 9.7
17.8 2.2 14.7 2.2 13.7 2.2 12.7 2.2 11.8
19.8 2.2 19.5 2.2 17.5 2.2 15.9 2.2 14.5
21.8 2.2 28.9 2.2 23.6 2.2 20.3 2.2 17.9
23.8 2.3 125.8 2.2 38.4 2.2 27.8 2.2 22.8
25.8 2.3 1669.0 2.3 634.0 2.3 51.7 2.3 31.4
27.8 2.3 3142.0 2.3 2154.0 2.3 982.0 2.3 62.2
29.8 2.3 4560.0 2.3 3594.0 2.3 2460.0 2.3 1075.0
31.8 2.3 5950.0 2.3 5010.0 2.3 3872.0 2.3 2540.0
33.8 2.4 7320.0 2.4 6360.0 2.4 5230.0 2.3 3880.0
35.8 2.4 8670.0 2.4 7700.0 2.4 6550.0 2.4 5210.0
37.8 NaN NaN NaN NaN 2.4 7850.0 2.4 6480.0
39.8 NaN NaN NaN NaN NaN NaN NaN NaN
41.8 NaN NaN NaN NaN NaN NaN NaN NaN
Name: current, dtype: float64
Python seems to be returning everything instead of just the column I want when printing the header current. I only want to take that column so I can save it as an array. How do I specifically extract the column with the header current from the data?
I am not sure why it returned everything, but I think something is wrong with the encoding, because I copied and pasted the data from Excel.
Please look at the image of what the .txt file looks like when copied from Excel.
I tried out another short piece of code (I also deleted the header from the .txt file manually!); see below:
data=np.loadtxt('C:/Users/Kevin/Documents/4e Jaar/fotonica/Metingen/ttest.txt',delimiter='\t')
data=float(data.replace(',','.'))
print(data[0])
With this code, I get the following error:
ValueError: could not convert string to float: b'1,8'
I find this weird. Is converting to float and replacing the comma not enough here?
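For what it's worth, np.loadtxt cannot parse decimal commas on its own (hence the b'1,8' in the error), but its converters parameter can apply the comma-to-dot replacement per column. A sketch using an in-memory sample standing in for the real ttest.txt:

```python
import io
import numpy as np

def dec(field):
    # Older NumPy versions pass converter functions bytes, newer ones str.
    if isinstance(field, bytes):
        field = field.decode()
    return float(field.replace(",", "."))

# Two tab-separated columns with decimal commas (stand-in for ttest.txt).
sample = io.StringIO("1,8\t0,4\n3,8\t1,3\n")
data = np.loadtxt(sample, delimiter="\t", converters={0: dec, 1: dec})
print(data[0])
```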

I think you need to omit header=0:
df = pd.read_csv('C:/Users/Kevin/Documents/4e Jaar/fotonica/Metingen/P-I curve.txt',
                 decimal=',',
                 sep='\t',
                 names=['current','15','20','25','30','35','40','45','50'])
EDIT:
df = pd.read_csv('ttest.txt',
                 decimal=',',
                 sep='\t',
                 names=['current','15','20','25','30','35','40','45','50'])
print (df)
current 15 20 25 30 35 40 45 50
0 1.8 0.4 0.4 0.4 0.4 0.4 0.4 0.3 0.3
1 3.8 1.3 1.3 1.3 1.2 1.2 1.1 1.1 1.1
2 5.8 2.5 2.4 2.3 2.2 2.2 2.1 2.0 1.9
3 7.8 3.7 3.6 3.5 3.4 3.3 3.1 3.0 2.9
4 9.8 5.2 5.1 4.9 4.7 4.5 4.3 4.1 4.0
5 11.8 6.9 6.7 6.4 6.1 5.9 5.6 5.3 5.1
6 13.8 9.0 8.6 8.2 7.8 7.4 7.0 6.6 6.3
7 15.8 11.5 10.8 10.2 9.7 9.1 8.6 8.0 7.6
8 17.8 14.7 13.7 12.7 11.8 11.0 10.3 9.6 9.0
9 19.8 19.5 17.5 15.9 14.5 13.3 12.2 11.3 10.5
10 21.8 28.9 23.6 20.3 17.9 16.0 14.5 13.2 12.2
11 23.8 125.8 38.4 27.8 22.8 19.6 17.2 15.4 14.1
12 25.8 1669.0 634.0 51.7 31.4 24.5 20.6 17.9 16.2
13 27.8 3142.0 2154.0 982.0 62.2 33.1 25.3 21.0 18.5
14 29.8 4560.0 3594.0 2460.0 1075.0 60.0 32.6 25.0 21.3
15 31.8 5950.0 5010.0 3872.0 2540.0 903.0 49.9 30.8 24.6
16 33.8 7320.0 6360.0 5230.0 3880.0 2294.0 387.0 40.9 28.8
17 35.8 8670.0 7700.0 6550.0 5210.0 3621.0 1733.0 71.0 34.8
18 37.8 NaN NaN 7850.0 6480.0 4880.0 3026.0 751.0 44.6
19 39.8 NaN NaN NaN NaN 6100.0 4240.0 1998.0 70.2
20 41.8 NaN NaN NaN NaN NaN NaN 3161.0 650.0
#list from column 15 with all values include NaNs
L1 = df['15'].tolist()
print (L1)
[0.4, 1.3, 2.5, 3.7, 5.2, 6.9, 9.0, 11.5, 14.7, 19.5, 28.9, 125.8, 1669.0,
3142.0, 4560.0, 5950.0, 7320.0, 8670.0, nan, nan, nan]
#list from column 15 with removing NaNs
L2 = df['15'].dropna().tolist()
print (L2)
[0.4, 1.3, 2.5, 3.7, 5.2, 6.9, 9.0, 11.5, 14.7, 19.5, 28.9, 125.8, 1669.0,
3142.0, 4560.0, 5950.0, 7320.0, 8670.0]
#convert all NaNs in all columns to 0
df = df.fillna(0)
print (df)
current 15 20 25 30 35 40 45 50
0 1.8 0.4 0.4 0.4 0.4 0.4 0.4 0.3 0.3
1 3.8 1.3 1.3 1.3 1.2 1.2 1.1 1.1 1.1
2 5.8 2.5 2.4 2.3 2.2 2.2 2.1 2.0 1.9
3 7.8 3.7 3.6 3.5 3.4 3.3 3.1 3.0 2.9
4 9.8 5.2 5.1 4.9 4.7 4.5 4.3 4.1 4.0
5 11.8 6.9 6.7 6.4 6.1 5.9 5.6 5.3 5.1
6 13.8 9.0 8.6 8.2 7.8 7.4 7.0 6.6 6.3
7 15.8 11.5 10.8 10.2 9.7 9.1 8.6 8.0 7.6
8 17.8 14.7 13.7 12.7 11.8 11.0 10.3 9.6 9.0
9 19.8 19.5 17.5 15.9 14.5 13.3 12.2 11.3 10.5
10 21.8 28.9 23.6 20.3 17.9 16.0 14.5 13.2 12.2
11 23.8 125.8 38.4 27.8 22.8 19.6 17.2 15.4 14.1
12 25.8 1669.0 634.0 51.7 31.4 24.5 20.6 17.9 16.2
13 27.8 3142.0 2154.0 982.0 62.2 33.1 25.3 21.0 18.5
14 29.8 4560.0 3594.0 2460.0 1075.0 60.0 32.6 25.0 21.3
15 31.8 5950.0 5010.0 3872.0 2540.0 903.0 49.9 30.8 24.6
16 33.8 7320.0 6360.0 5230.0 3880.0 2294.0 387.0 40.9 28.8
17 35.8 8670.0 7700.0 6550.0 5210.0 3621.0 1733.0 71.0 34.8
18 37.8 0.0 0.0 7850.0 6480.0 4880.0 3026.0 751.0 44.6
19 39.8 0.0 0.0 0.0 0.0 6100.0 4240.0 1998.0 70.2
20 41.8 0.0 0.0 0.0 0.0 0.0 0.0 3161.0 650.0
#list from column 15
L3 = df['15'].tolist()
print (L3)
[0.4, 1.3, 2.5, 3.7, 5.2, 6.9, 9.0, 11.5, 14.7, 19.5, 28.9, 125.8, 1669.0,
3142.0, 4560.0, 5950.0, 7320.0, 8670.0, 0.0, 0.0, 0.0]

If importing data from a .txt file as CSV, the missing data should be filled in. So, after manually adding 0 for the missing values in the .txt file, I retried this code:
import csv

with open('C:/Users/Kevin/Documents/4e Jaar/fotonica/Metingen/P-I curve.txt') as csvfile:
    data = csv.reader(csvfile, delimiter='\t')
    current = []
    P_15 = []
    P_20 = []
    P_25 = []
    P_30 = []
    P_35 = []
    P_40 = []
    P_45 = []
    P_50 = []
    for row in data:
        current.append(float(row[0].replace(',', '.')))
        P_15.append(float(row[2].replace(',', '.')))
print(P_15)
It now works; any of the columns can be printed out.
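If editing the file by hand is not an option, the same loop can instead guard against empty or missing fields. A sketch, assuming tab-separated data with decimal commas (an in-memory sample stands in for the real file):

```python
import csv
import io

def to_float(field):
    """Convert a decimal-comma field; return None for empty cells."""
    field = field.strip()
    return float(field.replace(",", ".")) if field else None

# Hypothetical sample: the last row has an empty measurement column.
sample = io.StringIO("1,8\t2,0\t0,4\n3,8\t2,0\t1,3\n5,8\t2,0\t\n")

current, P_15 = [], []
for row in csv.reader(sample, delimiter="\t"):
    current.append(to_float(row[0]))
    P_15.append(to_float(row[2]))

print(current)  # [1.8, 3.8, 5.8]
print(P_15)     # [0.4, 1.3, None]
```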


Beautiful Soup not finding specific table by ID

I am trying to parse a basketball reference player page to extract one of the tables from the page and work with the data from it. For some reason, though, beautiful soup cannot find the table in the page. I have tried to search for other tables in the page and it has successfully found them but for some reason will not find this specific one.
I have the following line which takes a link to the page of the specific player I am searching for and gets the BeautifulSoup version of it:
page_soup = BeautifulSoup(bball_ref_page.content, 'lxml')
I then search for the table with the following line:
table = page_soup.find('table', attrs={'id': 'per_poss'})
Whenever I try to print(table) it just comes out as None.
I have also tried searching for the contents by doing:
table = page_soup.find(attrs={'id': 'per_poss'})
Same result: None.
I have also tried searching for all tables in page_soup; it returns a list of many tables, not including the one I am looking for.
I have tried changing the parser in the page_soup assignment to html.parser and the result remains the same. I have also tried printing the contents of page_soup, and I can find the table in there:
<div class="table_container current" id="div_per_poss">
<table class="stats_table sortable row_summable" id="per_poss" data-cols-to-freeze="1,3"> <caption>Per 100 Poss Table</caption> <colgroup><col>....
Any ideas what might be causing this to happen?
The page stores the <table> data inside an HTML comment (<!-- -->), so normally BeautifulSoup doesn't see it. To load it as a pandas DataFrame you can use the following example:
import requests
import pandas as pd
from bs4 import BeautifulSoup, Comment
url = "https://www.basketball-reference.com/players/j/jordami01.html"
soup = BeautifulSoup(requests.get(url).content, "lxml")
soup = BeautifulSoup("\n".join(soup.find_all(text=Comment)), "lxml")
df = pd.read_html(str(soup.select_one("table#per_poss")))[0]
print(df.to_markdown())
Prints:
|    | Season | Age | Tm | Lg | Pos | G | GS | MP | FG | FGA | FG% | 3P | 3PA | 3P% | 2P | 2PA | 2P% | FT | FTA | FT% | ORB | DRB | TRB | AST | STL | BLK | TOV | PF | PTS | Unnamed: 29 | ORtg | DRtg |
|----|--------|-----|----|----|-----|---|----|----|----|-----|-----|----|-----|-----|----|-----|-----|----|-----|-----|-----|-----|-----|-----|-----|-----|-----|----|-----|-------------|------|------|
| 0 | 1984-85 | 21 | CHI | NBA | SG | 82 | 82 | 3144 | 12.9 | 25 | 0.515 | 0.1 | 0.8 | 0.173 | 12.7 | 24.2 | 0.526 | 9.7 | 11.5 | 0.845 | 2.6 | 5.6 | 8.2 | 7.4 | 3 | 1.1 | 4.5 | 4.4 | 35.5 | nan | 118 | 107 |
| 1 | 1985-86 | 22 | CHI | NBA | SG | 18 | 7 | 451 | 16 | 35 | 0.457 | 0.3 | 1.9 | 0.167 | 15.7 | 33.1 | 0.474 | 11.2 | 13.3 | 0.84 | 2.5 | 4.4 | 6.8 | 5.7 | 3.9 | 2.2 | 4.8 | 4.9 | 43.5 | nan | 109 | 107 |
| 2 | 1986-87 | 23 | CHI | NBA | SG | 82 | 82 | 3281 | 16.8 | 34.8 | 0.482 | 0.2 | 1 | 0.182 | 16.6 | 33.8 | 0.491 | 12.7 | 14.8 | 0.857 | 2.5 | 4 | 6.6 | 5.8 | 3.6 | 1.9 | 4.2 | 3.6 | 46.4 | nan | 117 | 104 |
| 3 | 1987-88 | 24 | CHI | NBA | SG | 82 | 82 | 3311 | 16.2 | 30.3 | 0.535 | 0.1 | 0.8 | 0.132 | 16.1 | 29.5 | 0.546 | 11 | 13.1 | 0.841 | 2.1 | 4.7 | 6.8 | 7.4 | 3.9 | 2 | 3.8 | 4.1 | 43.6 | nan | 123 | 101 |
| 4 | 1988-89 | 25 | CHI | NBA | SG | 81 | 81 | 3255 | 14.7 | 27.3 | 0.538 | 0.4 | 1.5 | 0.276 | 14.3 | 25.8 | 0.553 | 10.2 | 12.1 | 0.85 | 2.3 | 7.6 | 9.9 | 9.9 | 3.6 | 1 | 4.4 | 3.8 | 40 | nan | 123 | 103 |
| 5 | 1989-90 | 26 | CHI | NBA | SG | 82 | 82 | 3197 | 16 | 30.5 | 0.526 | 1.4 | 3.8 | 0.376 | 14.6 | 26.7 | 0.548 | 9.2 | 10.8 | 0.848 | 2.2 | 6.6 | 8.8 | 8.1 | 3.5 | 0.8 | 3.8 | 3.7 | 42.7 | nan | 123 | 106 |
| 6 | 1990-91 | 27 | CHI | NBA | SG | 82 | 82 | 3034 | 16.4 | 30.4 | 0.539 | 0.5 | 1.5 | 0.312 | 15.9 | 28.9 | 0.551 | 9.4 | 11.1 | 0.851 | 2 | 6.2 | 8.1 | 7.5 | 3.7 | 1.4 | 3.3 | 3.8 | 42.7 | nan | 125 | 102 |
| 7 | 1991-92 | 28 | CHI | NBA | SG | 80 | 80 | 3102 | 15.5 | 29.8 | 0.519 | 0.4 | 1.6 | 0.27 | 15 | 28.2 | 0.533 | 8 | 9.7 | 0.832 | 1.5 | 6.9 | 8.4 | 8 | 3 | 1.2 | 3.3 | 3.3 | 39.4 | nan | 121 | 102 |
| 8 | 1992-93 | 29 | CHI | NBA | SG | 78 | 78 | 3067 | 16.8 | 33.9 | 0.495 | 1.4 | 3.9 | 0.352 | 15.4 | 30 | 0.514 | 8.1 | 9.6 | 0.837 | 2.3 | 6.5 | 8.8 | 7.2 | 3.7 | 1 | 3.5 | 3.2 | 43 | nan | 119 | 102 |
| 9 | 1994-95 | 31 | CHI | NBA | SG | 17 | 17 | 668 | 13 | 31.5 | 0.411 | 1.2 | 2.5 | 0.5 | 11.7 | 29 | 0.403 | 8.5 | 10.6 | 0.801 | 2 | 7.2 | 9.1 | 7 | 2.3 | 1 | 2.7 | 3.7 | 35.7 | nan | 109 | 103 |
| 10 | 1995-96 | 32 | CHI | NBA | SG | 82 | 82 | 3090 | 15.6 | 31.5 | 0.495 | 1.9 | 4.4 | 0.427 | 13.7 | 27.1 | 0.506 | 9.3 | 11.2 | 0.834 | 2.5 | 6.7 | 9.3 | 6 | 3.1 | 0.7 | 3.4 | 3.3 | 42.5 | nan | 124 | 100 |
| 11 | 1996-97 | 33 | CHI | NBA | SG | 82 | 82 | 3106 | 15.8 | 32.5 | 0.486 | 1.9 | 5.1 | 0.374 | 13.9 | 27.4 | 0.507 | 8.2 | 9.9 | 0.833 | 1.9 | 6.3 | 8.3 | 6 | 2.4 | 0.8 | 2.9 | 2.7 | 41.8 | nan | 121 | 102 |
| 12 | 1997-98 | 34 | CHI | NBA | SG | 82 | 82 | 3181 | 14.9 | 32.1 | 0.465 | 0.5 | 2.1 | 0.238 | 14.4 | 30 | 0.482 | 9.6 | 12.2 | 0.784 | 2.2 | 5.8 | 8.1 | 4.8 | 2.4 | 0.8 | 3.1 | 2.6 | 40 | nan | 114 | 100 |
| 13 | 2001-02 | 38 | WAS | NBA | SF | 60 | 53 | 2093 | 14.3 | 34.4 | 0.416 | 0.3 | 1.4 | 0.189 | 14 | 33 | 0.426 | 6.8 | 8.6 | 0.79 | 1.3 | 7.5 | 8.8 | 8 | 2.2 | 0.7 | 4.2 | 3.1 | 35.7 | nan | 99 | 105 |
| 14 | 2002-03 | 39 | WAS | NBA | SF | 82 | 67 | 3031 | 12.2 | 27.4 | 0.445 | 0.3 | 1 | 0.291 | 11.9 | 26.4 | 0.45 | 4.8 | 5.8 | 0.821 | 1.3 | 7.7 | 8.9 | 5.6 | 2.2 | 0.7 | 3.1 | 3.1 | 29.5 | nan | 101 | 103 |
| 15 | Career | nan | nan | NBA | nan | 1072 | 1039 | 41011 | 15.3 | 30.7 | 0.497 | 0.7 | 2.2 | 0.327 | 14.5 | 28.5 | 0.51 | 9.2 | 11 | 0.835 | 2.1 | 6.3 | 8.3 | 7 | 3.1 | 1.1 | 3.7 | 3.5 | 40.4 | nan | 118 | 103 |
| 16 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
| 17 | 13 seasons | nan | CHI | NBA | nan | 930 | 919 | 35887 | 15.5 | 30.8 | 0.505 | 0.8 | 2.4 | 0.332 | 14.8 | 28.4 | 0.52 | 9.6 | 11.5 | 0.838 | 2.2 | 6.1 | 8.3 | 7.1 | 3.3 | 1.2 | 3.7 | 3.5 | 41.5 | nan | 120 | 103 |
| 18 | 2 seasons | nan | WAS | NBA | nan | 142 | 120 | 5124 | 13.1 | 30.3 | 0.431 | 0.3 | 1.1 | 0.241 | 12.8 | 29.1 | 0.439 | 5.6 | 7 | 0.805 | 1.3 | 7.6 | 8.9 | 6.6 | 2.2 | 0.7 | 3.6 | 3.1 | 32 | nan | 100 | 104 |
To iterate over the rows of the DataFrame, you can use df.iterrows(), for example:
for index, row in df.iterrows():
    print(row["Season"], row["Age"])
Prints:
1984-85 21.0
1985-86 22.0
1986-87 23.0
1987-88 24.0
1988-89 25.0
...
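A side note on iteration: df.itertuples() is usually faster than df.iterrows() and preserves dtypes, returning each row as a namedtuple. A small sketch on a toy frame:

```python
import pandas as pd

# Toy frame standing in for the scraped stats table.
df = pd.DataFrame({"Season": ["1984-85", "1985-86"], "Age": [21, 22]})

# Attribute access (row.Season) replaces row["Season"]; index=False
# drops the positional index from each namedtuple.
for row in df.itertuples(index=False):
    print(row.Season, row.Age)
```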

How to iterate through multiple urls (teams) to combine NBA players names and stats into one dataframe?

I am still learning web scraping and would appreciate any help that I can get. Thanks to help from the community I was able to successfully scrape NBA player data (player name and player stats) and concatenate the data into one dataframe.
Here is the code below:
import pandas as pd
import requests
url = 'https://www.espn.com/nba/team/stats/_/name/lal/season/2020/seasontype/2'
df = pd.read_html(url)
df_concat = pd.concat([df[0], df[1], df[3]], axis=1)
I would now like to iterate through multiple urls to get data for different teams and then combine all of the different teams into one dataframe.
Here is the code that I have so far:
import pandas as pd
teams = ['chi','den','lac']
for team in teams:
    print(team)
    url = 'https://www.espn.com/nba/team/stats/_/name/{team}/season/2020/seasontype/2'.format(team=team)
    print(url)
    df = pd.read_html(url)
    df_concat = pd.concat([df[0], df[1], df[3]], axis=1)
I tried changing 'lal' in the url to the variable team. When I ran this script, the scrape was really slow and only gave me a dataframe for the team 'lac', not 'chi' or 'den'. Any advice on the best way to do this? I have never tried scraping multiple urls.
Again, I would like the data for each team combined into one large dataframe, if possible. Thanks in advance for any help that you may offer. I will learn a lot from this project. =)
The principle is the same: use pd.concat with a list of dataframes. For example:
import requests
import pandas as pd
teams = ["chi", "den", "lac"]
dfs_to_concat = []
for team in teams:
    print(team)
    url = "https://www.espn.com/nba/team/stats/_/name/{team}/season/2020/seasontype/2".format(
        team=team
    )
    print(url)
    df = pd.read_html(url)
    df_concat = pd.concat([df[0], df[1], df[3]], axis=1)
    dfs_to_concat.append(df_concat)
df_final = pd.concat(dfs_to_concat)
print(df_final)
df_final.to_csv("data.csv", index=False)
Prints:
chi
https://www.espn.com/nba/team/stats/_/name/chi/season/2020/seasontype/2
den
https://www.espn.com/nba/team/stats/_/name/den/season/2020/seasontype/2
lac
https://www.espn.com/nba/team/stats/_/name/lac/season/2020/seasontype/2
Name GP GS MIN PTS OR DR REB AST STL BLK TO PF AST/TO PER FGM FGA FG% 3PM 3PA 3P% FTM FTA FT% 2PM 2PA 2P% SC-EFF SH-EFF
0 Zach LaVine SG 60 60.0 34.8 25.5 0.7 4.1 4.8 4.2 1.5 0.5 3.4 2.2 1.2 19.52 9.0 20.0 45.0 3.1 8.1 38.0 4.5 5.6 80.2 5.9 11.9 49.7 1.276 0.53
1 Lauri Markkanen PF 50 50.0 29.8 14.7 1.2 5.1 6.3 1.5 0.8 0.5 1.6 1.9 0.9 14.32 5.0 11.8 42.5 2.2 6.3 34.4 2.5 3.1 82.4 2.8 5.5 51.8 1.247 0.52
2 Coby White PG 65 1.0 25.8 13.2 0.4 3.1 3.5 2.7 0.8 0.1 1.7 1.8 1.6 11.92 4.8 12.2 39.4 2.0 5.8 35.4 1.6 2.0 79.1 2.8 6.4 43.0 1.085 0.48
3 Otto Porter Jr. SF 14 9.0 23.6 11.9 0.9 2.5 3.4 1.8 1.1 0.4 0.8 2.2 2.3 15.87 4.4 10.0 44.3 1.7 4.4 38.7 1.4 1.9 70.4 2.7 5.6 48.7 1.193 0.53
4 Wendell Carter Jr. C 43 43.0 29.2 11.3 3.2 6.2 9.4 1.2 0.8 0.8 1.7 3.8 0.7 15.51 4.3 8.0 53.4 0.1 0.7 20.7 2.6 3.5 73.7 4.1 7.3 56.4 1.411 0.54
5 Thaddeus Young PF 64 16.0 24.9 10.3 1.5 3.5 4.9 1.8 1.4 0.4 1.6 2.1 1.1 13.36 4.2 9.4 44.8 1.2 3.5 35.6 0.7 1.1 58.3 3.0 5.9 50.1 1.097 0.51
6 Tomas Satoransky SG 65 64.0 28.9 9.9 1.2 2.7 3.9 5.4 1.2 0.1 2.0 2.1 2.7 13.52 3.6 8.5 43.0 1.0 3.1 32.2 1.6 1.9 87.6 2.7 5.4 49.1 1.169 0.49
7 Chandler Hutchison F 28 10.0 18.8 7.8 0.6 3.2 3.9 0.9 1.0 0.3 1.0 1.7 1.0 12.45 2.9 6.3 45.7 0.4 1.4 31.6 1.6 2.8 59.0 2.4 4.9 49.6 1.246 0.49
8 Kris Dunn PG 51 32.0 24.9 7.3 0.5 3.2 3.6 3.4 2.0 0.3 1.3 3.1 2.5 12.15 3.0 6.7 44.4 0.6 2.2 25.9 0.8 1.1 74.1 2.4 4.5 53.5 1.091 0.49
9 Denzel Valentine SG 36 5.0 13.6 6.8 0.3 1.8 2.1 1.2 0.7 0.2 0.7 1.4 1.7 13.09 2.7 6.6 40.9 1.3 3.8 33.6 0.2 0.2 75.0 1.4 2.8 51.0 1.038 0.51
10 Luke Kornet C 36 14.0 15.5 6.0 0.6 1.7 2.3 0.9 0.3 0.7 0.4 1.5 2.3 12.70 2.3 5.2 43.8 0.9 3.0 28.7 0.6 0.8 71.4 1.4 2.2 64.6 1.150 0.52
11 Daniel Gafford C 43 7.0 14.2 5.1 1.2 1.3 2.5 0.5 0.3 1.3 0.7 2.3 0.7 16.21 2.2 3.1 70.1 0.0 0.0 0.0 0.7 1.4 53.3 2.2 3.1 70.1 1.642 0.70
12 Shaquille Harrison G 43 10.0 11.3 4.9 0.5 1.5 2.0 1.1 0.8 0.4 0.4 1.3 2.6 17.81 1.8 3.8 46.7 0.4 1.0 38.1 0.9 1.2 78.0 1.4 2.9 49.6 1.267 0.52
13 Ryan Arcidiacono PG 58 4.0 16.0 4.5 0.3 1.6 1.9 1.7 0.5 0.1 0.6 1.7 2.6 9.04 1.6 3.8 40.9 0.9 2.4 39.1 0.5 0.7 71.1 0.6 1.4 43.9 1.186 0.53
14 Cristiano Felicio PF 22 0.0 17.5 3.9 2.5 2.1 4.6 0.7 0.5 0.1 0.8 1.5 0.9 12.79 1.5 2.5 63.0 0.0 0.1 0.0 0.8 1.0 78.3 1.5 2.4 65.4 1.593 0.63
15 Adam Mokoka SG 11 0.0 10.2 2.9 0.6 0.3 0.9 0.4 0.4 0.0 0.2 1.5 2.0 8.18 1.1 2.5 42.9 0.5 1.4 40.0 0.2 0.4 50.0 0.5 1.2 46.2 1.143 0.54
16 Max Strus SG 2 0.0 3.0 2.5 0.5 0.0 0.5 0.0 0.0 0.0 0.0 0.5 0.0 30.82 1.0 1.5 66.7 0.0 0.5 0.0 0.5 0.5 100.0 1.0 1.0 100.0 1.667 0.67
17 Total 65 NaN NaN 106.8 10.5 31.4 41.9 23.2 10.0 4.1 14.6 21.8 1.6 NaN 39.6 88.6 44.7 12.2 35.1 34.8 15.5 20.5 75.5 27.4 53.5 51.1 1.205 0.52
0 Nikola Jokic C 73 73.0 32.0 19.9 2.3 7.5 9.7 7.0 1.2 0.6 3.1 3.0 2.3 24.97 7.7 14.7 52.8 1.1 3.5 31.4 3.4 4.1 81.7 6.6 11.2 59.4 1.359 0.56
1 Jamal Murray PG 59 59.0 32.3 18.5 0.8 3.2 4.0 4.8 1.1 0.3 2.2 1.7 2.2 17.78 6.9 15.2 45.6 1.9 5.5 34.6 2.8 3.1 88.1 5.0 9.7 51.9 1.220 0.52
2 Will Barton SF 58 58.0 33.0 15.1 1.3 5.0 6.3 3.7 1.1 0.5 1.5 2.1 2.4 15.70 5.7 12.7 45.0 1.9 5.0 37.5 1.8 2.3 76.7 3.9 7.8 49.8 1.184 0.52
3 Jerami Grant SF 71 24.0 26.6 12.0 0.8 2.7 3.5 1.2 0.7 0.8 0.9 2.2 1.4 14.46 4.3 8.9 47.8 1.4 3.5 38.9 2.1 2.8 75.0 2.9 5.4 53.7 1.342 0.56
4 Paul Millsap PF 51 48.0 24.3 11.6 1.9 3.8 5.7 1.6 0.9 0.6 1.4 2.9 1.2 16.96 4.1 8.6 48.2 1.1 2.4 43.5 2.3 2.8 81.6 3.1 6.2 50.0 1.349 0.54
5 Gary Harris SG 56 55.0 31.8 10.4 0.5 2.4 2.9 2.1 1.4 0.3 1.1 2.1 2.0 9.78 3.9 9.3 42.0 1.3 3.8 33.3 1.3 1.6 81.5 2.6 5.5 47.9 1.119 0.49
6 Michael Porter Jr. SF 55 8.0 16.4 9.3 1.2 3.5 4.7 0.8 0.5 0.5 0.9 1.8 0.9 19.84 3.5 7.0 50.9 1.1 2.7 42.2 1.1 1.3 83.3 2.4 4.3 56.4 1.337 0.59
7 Monte Morris PG 73 12.0 22.4 9.0 0.3 1.5 1.9 3.5 0.8 0.2 0.7 1.0 4.8 14.98 3.6 7.8 45.9 0.9 2.4 37.8 1.0 1.2 84.3 2.7 5.4 49.5 1.166 0.52
8 Malik Beasley SG * 41 0.0 18.2 7.9 0.2 1.7 1.9 1.2 0.8 0.1 0.9 1.2 1.3 10.51 2.9 7.3 38.9 1.4 3.9 36.0 0.8 0.9 86.8 1.4 3.4 42.1 1.080 0.49
9 Mason Plumlee C 61 1.0 17.3 7.2 1.6 3.6 5.2 2.5 0.5 0.6 1.3 2.3 1.9 18.86 2.9 4.7 61.5 0.0 0.1 0.0 1.4 2.5 53.5 2.9 4.6 62.5 1.517 0.61
10 PJ Dozier SG 29 0.0 14.2 5.8 0.3 1.6 1.9 2.2 0.5 0.2 0.9 1.6 2.3 11.66 2.2 5.4 41.4 0.6 1.7 34.7 0.7 1.0 72.4 1.7 3.7 44.4 1.070 0.47
11 Bol Bol C 7 0.0 12.4 5.7 0.7 2.0 2.7 0.9 0.3 0.9 1.4 1.6 0.6 14.41 2.0 4.0 50.0 0.6 1.3 44.4 1.1 1.4 80.0 1.4 2.7 52.6 1.429 0.57
12 Torrey Craig SF 58 27.0 18.5 5.4 1.1 2.2 3.3 0.8 0.4 0.6 0.4 2.3 1.9 10.79 2.1 4.6 46.1 0.8 2.4 32.6 0.4 0.6 61.1 1.4 2.3 60.3 1.171 0.54
13 Keita Bates-Diop SF * 7 0.0 14.0 5.3 0.6 1.9 2.4 0.0 0.3 0.6 0.4 1.0 0.0 12.13 1.9 4.0 46.4 0.4 1.3 33.3 1.1 1.4 80.0 1.4 2.7 52.6 1.321 0.52
14 Troy Daniels G * 6 0.0 12.7 4.3 0.0 1.0 1.0 0.5 0.5 0.0 0.5 1.2 1.0 5.35 1.7 4.7 35.7 1.0 3.3 30.0 0.0 0.0 0.0 0.7 1.3 50.0 0.929 0.46
15 Juancho Hernangomez PF * 34 0.0 12.4 3.1 0.7 2.1 2.8 0.6 0.1 0.1 0.5 0.9 1.2 6.89 1.1 3.2 34.5 0.4 1.8 25.0 0.5 0.7 64.0 0.7 1.5 46.0 0.973 0.41
16 Jordan McRae G * 4 0.0 8.0 2.3 0.3 1.0 1.3 1.0 0.5 0.3 0.0 0.5 inf 16.74 0.5 1.5 33.3 0.5 1.0 50.0 0.8 1.0 75.0 0.0 0.5 0.0 1.500 0.50
17 Tyler Cook F * 2 0.0 9.5 2.0 1.0 1.0 2.0 0.0 1.0 0.0 1.0 0.5 0.0 11.31 0.5 1.0 50.0 0.0 0.0 0.0 1.0 1.0 100.0 0.5 1.0 50.0 2.000 0.50
18 Noah Vonleh F * 7 0.0 4.3 1.9 0.4 0.7 1.1 0.3 0.0 0.0 0.3 0.6 1.0 17.61 0.7 0.9 83.3 0.1 0.1 100.0 0.3 0.6 50.0 0.6 0.7 80.0 2.167 0.92
19 Vlatko Cancar SF 14 0.0 3.2 1.2 0.4 0.4 0.7 0.2 0.1 0.1 0.2 0.5 1.0 11.45 0.4 1.1 40.0 0.1 0.4 16.7 0.3 0.3 100.0 0.4 0.6 55.6 1.133 0.43
20 Jarred Vanderbilt PF * 9 0.0 4.6 1.1 0.3 0.6 0.9 0.2 0.3 0.1 0.8 0.7 0.3 7.20 0.6 0.8 71.4 0.0 0.0 0.0 0.0 0.0 0.0 0.6 0.8 71.4 1.429 0.71
21 Total 73 NaN NaN 111.3 10.8 33.4 44.1 26.7 8.0 4.6 13.1 20.3 2.0 NaN 42.0 88.9 47.3 11.0 30.6 35.9 16.2 20.9 77.7 31.1 58.3 53.3 1.252 0.53
0 Kawhi Leonard SF 57 57.0 32.4 27.1 0.9 6.1 7.1 4.9 1.8 0.6 2.6 2.0 1.9 26.91 9.3 19.9 47.0 2.2 5.7 37.8 6.2 7.1 88.6 7.2 14.2 50.6 1.362 0.52
1 Paul George SG 48 48.0 29.6 21.5 0.5 5.2 5.7 3.9 1.4 0.4 2.6 2.4 1.5 21.14 7.1 16.3 43.9 3.3 7.9 41.2 4.0 4.5 87.6 3.9 8.4 46.4 1.321 0.54
2 Montrezl Harrell C 63 2.0 27.8 18.6 2.6 4.5 7.1 1.7 0.6 1.1 1.7 2.3 1.0 23.26 7.5 12.9 58.0 0.0 0.3 0.0 3.7 5.6 65.8 7.5 12.6 59.3 1.445 0.58
3 Lou Williams SG 65 8.0 28.7 18.2 0.5 2.6 3.1 5.6 0.7 0.2 2.8 1.2 2.0 17.38 6.0 14.4 41.8 1.7 4.8 35.2 4.5 5.2 86.1 4.3 9.6 45.1 1.266 0.48
4 Marcus Morris Sr. SF * 19 19.0 28.9 10.1 0.6 3.5 4.1 1.4 0.7 0.7 1.3 2.7 1.1 8.96 3.9 9.2 42.5 1.4 4.4 31.0 0.9 1.2 81.8 2.5 4.7 53.3 1.103 0.50
5 Reggie Jackson PG * 17 6.0 21.3 9.5 0.4 2.6 3.0 3.2 0.3 0.2 1.6 2.2 1.9 12.66 3.4 7.5 45.3 1.5 3.7 41.3 1.1 1.2 90.5 1.9 3.8 49.2 1.258 0.55
6 Landry Shamet SG 53 30.0 27.4 9.3 0.1 1.8 1.9 1.9 0.4 0.2 0.8 2.7 2.4 8.51 3.0 7.4 40.4 2.1 5.6 37.5 1.2 1.4 85.5 0.9 1.8 49.5 1.258 0.55
7 Ivica Zubac C 72 70.0 18.4 8.3 2.7 4.8 7.5 1.1 0.2 0.9 0.8 2.3 1.3 21.75 3.3 5.3 61.3 0.0 0.0 0.0 1.7 2.3 74.7 3.3 5.3 61.6 1.548 0.61
8 Patrick Beverley PG 51 50.0 26.3 7.9 1.1 4.1 5.2 3.6 1.1 0.5 1.3 3.1 2.8 12.54 2.9 6.7 43.1 1.6 4.0 38.8 0.6 0.9 66.0 1.3 2.6 49.6 1.188 0.55
9 JaMychal Green PF 63 1.0 20.7 6.8 1.2 4.9 6.2 0.8 0.5 0.4 0.9 2.8 0.9 11.11 2.4 5.6 42.9 1.5 3.8 38.7 0.6 0.8 75.0 0.9 1.8 51.8 1.222 0.56
10 Maurice Harkless SF * 50 38.0 22.8 5.5 0.9 3.1 4.0 1.0 1.0 0.6 0.9 2.4 1.0 9.70 2.2 4.3 51.6 0.5 1.5 37.0 0.5 0.8 57.1 1.7 2.9 59.0 1.267 0.58
11 Patrick Patterson PF 59 18.0 13.2 4.9 0.6 2.0 2.6 0.7 0.1 0.1 0.4 0.9 2.0 11.57 1.6 3.9 40.8 1.1 2.9 39.0 0.6 0.7 81.4 0.5 1.0 45.9 1.253 0.55
12 Mfiondu Kabengele F 12 0.0 5.3 3.5 0.1 0.8 0.9 0.2 0.2 0.2 0.2 0.8 1.0 18.28 1.2 2.7 43.8 0.8 1.7 45.0 0.4 0.4 100.0 0.4 1.0 41.7 1.313 0.58
13 Rodney McGruder SG 56 4.0 15.6 3.3 0.5 2.2 2.7 0.6 0.5 0.1 0.4 1.3 1.5 6.75 1.3 3.2 39.8 0.4 1.6 27.0 0.3 0.6 55.9 0.9 1.6 52.2 1.033 0.46
14 Amir Coffey SG 18 1.0 8.8 3.2 0.2 0.7 0.9 0.8 0.3 0.1 0.4 1.1 1.8 8.55 1.3 3.0 42.6 0.3 1.1 31.6 0.3 0.6 54.5 0.9 1.9 48.6 1.074 0.48
15 Jerome Robinson SG * 42 1.0 11.3 2.9 0.1 1.3 1.4 1.1 0.3 0.2 0.6 1.3 1.8 4.86 1.1 3.2 33.8 0.5 1.6 28.4 0.3 0.5 57.9 0.6 1.6 39.1 0.897 0.41
16 Joakim Noah C 5 0.0 10.0 2.8 1.0 2.2 3.2 1.4 0.2 0.2 1.2 1.8 1.2 11.11 0.8 1.6 50.0 0.0 0.0 0.0 1.2 1.6 75.0 0.8 1.6 50.0 1.750 0.50
17 Terance Mann SG 41 6.0 8.8 2.4 0.2 1.1 1.3 1.3 0.3 0.1 0.4 1.1 2.9 10.58 0.9 1.9 46.8 0.2 0.5 35.0 0.4 0.7 66.7 0.7 1.4 50.8 1.253 0.51
18 Derrick Walton Jr. G * 23 1.0 9.7 2.2 0.1 0.6 0.7 1.0 0.2 0.0 0.2 0.8 5.5 8.43 0.7 1.6 47.2 0.4 0.9 42.9 0.3 0.4 77.8 0.3 0.7 53.3 1.389 0.60
19 Johnathan Motley F 13 0.0 3.2 2.2 0.2 0.5 0.8 0.6 0.2 0.0 0.4 0.5 1.6 28.53 0.8 1.2 73.3 0.1 0.1 100.0 0.4 0.5 71.4 0.8 1.1 71.4 1.867 0.77
20 Total 72 NaN NaN 116.3 10.7 37.0 47.7 23.7 7.1 4.7 14.0 22.1 1.7 NaN 41.6 89.2 46.6 12.4 33.5 37.1 20.8 26.3 79.1 29.1 55.8 52.2 1.304 0.54
and creates data.csv.
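One refinement worth considering: label each team's rows before concatenating, so the combined frame records which team each player came from. A sketch with toy frames standing in for the scraped tables:

```python
import pandas as pd

# Toy stand-ins for the tables pd.read_html would return per team.
frames = {
    "chi": pd.DataFrame({"Name": ["Zach LaVine"], "PTS": [25.5]}),
    "den": pd.DataFrame({"Name": ["Nikola Jokic"], "PTS": [19.9]}),
}

dfs_to_concat = []
for team, df in frames.items():
    dfs_to_concat.append(df.assign(Team=team))  # add a Team column per frame

df_final = pd.concat(dfs_to_concat, ignore_index=True)
print(df_final)
```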

How to transform recurrent time series pandas data frame to pandas multi-index data frame

Is there a pandas function to transform the data frame represented below into a multi-index time series data frame?
ticker date lastupdated ev evebit evebitda marketcap pb pe ps
0 XOM 2018-12-31 2018-12-31 323071.3 12.3 7.1 288703.3 1.5 12.4 1.1
1 XOM 2018-12-28 2018-12-28 322986.6 12.3 7.1 288618.6 1.5 12.4 1.1
2 XOM 2018-12-27 2018-12-27 326246.7 12.5 7.1 291878.7 1.5 12.6 1.1
3 XOM 2018-12-26 2018-12-26 324976.5 12.4 7.1 290608.5 1.5 12.5 1.1
4 XOM 2018-12-24 2018-12-24 311724.7 11.9 6.8 277356.7 1.5 11.9 1.0
5 AAPL 2018-12-31 2018-10-21 1137146.7 16.2 14.0 1054517.7 9.2 18.8 4.1
6 AAPL 2018-12-28 2018-10-21 1151491.6 16.4 14.2 1068862.6 9.3 19.0 4.2
7 AAPL 2018-12-27 2018-10-21 1160185.5 16.5 14.3 1077556.5 9.4 19.2 4.2
8 AAPL 2018-12-26 2018-10-21 1178394.3 16.7 14.5 1095765.3 9.5 19.5 4.3
9 AAPL 2018-12-24 2018-10-21 1185590.9 16.8 14.6 1102961.9 9.6 19.7 4.3
to get the following data frame, regrouped by date:
lastupdated ev evebit evebitda marketcap pb pe ps
date ticker
2018-12-31 XOM 2018-12-31 323071.3 12.3 7.1 288703.3 1.5 12.4 1.1
AAPL 2018-12-31 322986.6 12.3 7.1 288618.6 1.5 12.4 1.1
2018-12-28 XOM 2018-12-28 326246.7 12.5 7.1 291878.7 1.5 12.6 1.1
AAPL 2018-12-28 324976.5 12.4 7.1 290608.5 1.5 12.5 1.1
2018-12-27 XOM 2018-12-27 311724.7 11.9 6.8 277356.7 1.5 11.9 1.0
AAPL 2018-10-27 1137146.7 16.2 14.0 1054517.7 9.2 18.8 4.1
2018-12-26 XOM 2018-10-26 1151491.6 16.4 14.2 1068862.6 9.3 19.0 4.2
AAPL 2018-10-26 1160185.5 16.5 14.3 1077556.5 9.4 19.2 4.2
2018-12-24 XOM 2018-10-24 1178394.3 16.7 14.5 1095765.3 9.5 19.5 4.3
AAPL 2018-10-24 1185590.9 16.8 14.6 1102961.9 9.6 19.7 4.3
Use DataFrame.set_index with DataFrame.sort_index:
df1 = df.set_index(['date', 'ticker']).sort_index(level=[0,1], ascending=[True, False])
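A compact reproduction on toy data (two tickers, two dates) shows the resulting shape; the column values here are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({
    "ticker": ["XOM", "XOM", "AAPL", "AAPL"],
    "date": ["2018-12-31", "2018-12-28", "2018-12-31", "2018-12-28"],
    "ev": [323071.3, 322986.6, 1137146.7, 1151491.6],
})

# date becomes the outer index level, ticker the inner one;
# dates sort ascending, tickers descending so XOM comes before AAPL
df1 = df.set_index(["date", "ticker"]).sort_index(
    level=[0, 1], ascending=[True, False])
print(df1)
```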

Is there a short Pandas method chain for assigning a grouped nth value?

I use the nth value as a column without row aggregation, because I want to create a feature that can be tracked with window functions and aggregation functions at any time.
R:
library(tidyverse)
iris %>% arrange(Species, Sepal.Length) %>% group_by(Species) %>%
mutate(cs = cumsum(Sepal.Length), cs4th = cumsum(Sepal.Length)[4]) %>%
slice(c(1:4))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species cs cs4th
<dbl> <dbl> <dbl> <dbl> <fct> <dbl> <dbl>
1 4.3 3 1.1 0.1 setosa 4.3 17.5
2 4.4 2.9 1.4 0.2 setosa 8.7 17.5
3 4.4 3 1.3 0.2 setosa 13.1 17.5
4 4.4 3.2 1.3 0.2 setosa 17.5 17.5
5 4.9 2.4 3.3 1 versicolor 4.9 20
6 5 2 3.5 1 versicolor 9.9 20
7 5 2.3 3.3 1 versicolor 14.9 20
8 5.1 2.5 3 1.1 versicolor 20 20
9 4.9 2.5 4.5 1.7 virginica 4.9 22
10 5.6 2.8 4.9 2 virginica 10.5 22
11 5.7 2.5 5 2 virginica 16.2 22
12 5.8 2.7 5.1 1.9 virginica 22 22
Python: Too long and verbose!
import numpy as np
import pandas as pd
import seaborn as sns
iris = sns.load_dataset('iris')
iris.sort_values(['species','sepal_length']).assign(
    index_species=lambda x: x.groupby('species').cumcount(),
    cs=lambda x: x.groupby('species').sepal_length.cumsum(),
    tmp=lambda x: np.where(x.index_species==3, x.cs, 0),
    cs4th=lambda x: x.groupby('species').tmp.transform(sum)
).iloc[list(range(0,4))+list(range(50,54))+list(range(100,104))]
sepal_length sepal_width petal_length ... cs tmp cs4th
13 4.3 3.0 1.1 ... 4.3 0.0 17.5
8 4.4 2.9 1.4 ... 8.7 0.0 17.5
38 4.4 3.0 1.3 ... 13.1 0.0 17.5
42 4.4 3.2 1.3 ... 17.5 17.5 17.5
57 4.9 2.4 3.3 ... 4.9 0.0 20.0
60 5.0 2.0 3.5 ... 9.9 0.0 20.0
93 5.0 2.3 3.3 ... 14.9 0.0 20.0
98 5.1 2.5 3.0 ... 20.0 20.0 20.0
106 4.9 2.5 4.5 ... 4.9 0.0 22.0
121 5.6 2.8 4.9 ... 10.5 0.0 22.0
113 5.7 2.5 5.0 ... 16.2 0.0 22.0
101 5.8 2.7 5.1 ... 22.0 22.0 22.0
Python: my better solution (not smart; there is room for improvement around the details of groupby):
iris.sort_values(['species','sepal_length']).assign(
    cs=lambda x: x.groupby('species').sepal_length.transform('cumsum'),
    cs4th=lambda x: x.merge(
        x.groupby('species', as_index=False).nth(3).loc[:, ['species','cs']], on='species')
        .iloc[:, -1]
)
This doesn't work in a good way
iris.groupby('species').transform('nth(3)')
Here is an updated solution, using Pandas, which is still longer than what you will get with dplyr:
import seaborn as sns
import pandas as pd
iris = sns.load_dataset('iris')
iris['cs'] = (iris
              .sort_values(['species','sepal_length'])
              .groupby('species')['sepal_length']
              .transform('cumsum'))
M = (iris
     .sort_values(['species','cs'])
     .groupby('species')['cs'])
groupby has an nth function that gets you a row per group: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.groupby.GroupBy.nth.html
iris = (iris
        .sort_values(['species','cs'])
        .reset_index(drop=True)
        .merge(M.nth(3), how='left', on='species')
        .rename(columns={'cs_x': 'cs',
                         'cs_y': 'cs4th'})
       )
iris.head()
sepal_length sepal_width petal_length petal_width species cs cs4th
0 4.3 3.0 1.1 0.1 setosa 4.3 17.5
1 4.4 2.9 1.4 0.2 setosa 8.7 17.5
2 4.4 3.0 1.3 0.2 setosa 13.1 17.5
3 4.4 3.2 1.3 0.2 setosa 17.5 17.5
4 4.5 2.3 1.3 0.3 setosa 22.0 17.5
Update: 16/04/2021 ... Below is a better way to achieve the OP's goal:
(iris
 .sort_values(['species', 'sepal_length'])
 .assign(cs=lambda df: df.groupby('species')
                         .sepal_length
                         .transform('cumsum'),
         cs4th=lambda df: df.groupby('species')
                            .cs
                            .transform('nth', 3)
         )
 .groupby('species')
 .head(4)
)
sepal_length sepal_width petal_length petal_width species cs cs4th
13 4.3 3.0 1.1 0.1 setosa 4.3 17.5
8 4.4 2.9 1.4 0.2 setosa 8.7 17.5
38 4.4 3.0 1.3 0.2 setosa 13.1 17.5
42 4.4 3.2 1.3 0.2 setosa 17.5 17.5
57 4.9 2.4 3.3 1.0 versicolor 4.9 20.0
60 5.0 2.0 3.5 1.0 versicolor 9.9 20.0
93 5.0 2.3 3.3 1.0 versicolor 14.9 20.0
98 5.1 2.5 3.0 1.1 versicolor 20.0 20.0
106 4.9 2.5 4.5 1.7 virginica 4.9 22.0
121 5.6 2.8 4.9 2.0 virginica 10.5 22.0
113 5.7 2.5 5.0 2.0 virginica 16.2 22.0
101 5.8 2.7 5.1 1.9 virginica 22.0 22.0
Now you can do it in a non-verbose way, as you did in R, with datar in Python:
>>> from datar.datasets import iris
>>> from datar.all import f, arrange, group_by, mutate, cumsum, slice
>>>
>>> (iris >>
... arrange(f.Species, f.Sepal_Length) >>
... group_by(f.Species) >>
... mutate(cs=cumsum(f.Sepal_Length), cs4th=cumsum(f.Sepal_Length)[3]) >>
... slice(f[1:4]))
Sepal_Length Sepal_Width Petal_Length Petal_Width Species cs cs4th
0 4.3 3.0 1.1 0.1 setosa 4.3 17.5
1 4.4 2.9 1.4 0.2 setosa 8.7 17.5
2 4.4 3.0 1.3 0.2 setosa 13.1 17.5
3 4.4 3.2 1.3 0.2 setosa 17.5 17.5
4 4.9 2.4 3.3 1.0 versicolor 4.9 20.0
5 5.0 2.0 3.5 1.0 versicolor 9.9 20.0
6 5.0 2.3 3.3 1.0 versicolor 14.9 20.0
7 5.1 2.5 3.0 1.1 versicolor 20.0 20.0
8 4.9 2.5 4.5 1.7 virginica 4.9 22.0
9 5.6 2.8 4.9 2.0 virginica 10.5 22.0
10 5.7 2.5 5.0 2.0 virginica 16.2 22.0
11 5.8 2.7 5.1 1.9 virginica 22.0 22.0
[Groups: ['Species'] (n=3)]
I am the author of the package. Feel free to submit issues if you have any questions.

Trying to scrape a webpage with multiple data tables, however only the first table is being extracted?

I am trying to extract data from basketball players off of Basketball-Reference for a project I am working on. On B-R, a player page has multiple tables of data and I want to grab all of it. However, when I try to grab the tables from the page, it only gives me the first instance of a table tag, i.e only the first table.
I have searched through the html and found that outside the first instance of the table tag, all the table tags are under a comment block. When I parse their parent tag and try to search for the child tag that contains the table information, it returns nothing. Here is a link to an example page, and here is my code:
url = 'https://www.basketball-reference.com/players/j/jamesle01.html'
get = requests.get(url)
soup = BeautifulSoup(get.text, 'html.parser')
per_36 = soup.find(id='all_per_minute')
table = per_36.find('table')
This returns nothing; however, if I instead look for the first table, it returns the contents. I don't understand what is going on, but I think it may have something to do with those comment blocks?
To scrape comments via BeautifulSoup, you could use this script:
import requests
from bs4 import BeautifulSoup, Comment
url = 'https://www.basketball-reference.com/players/j/jamesle01.html'
get = requests.get(url)
soup = BeautifulSoup(get.text, 'html.parser')
pl = soup.select_one('#all_per_minute .placeholder')
comments = pl.find_next(string=lambda text: isinstance(text, Comment))
soup = BeautifulSoup(comments, 'html.parser')
rows = []
for tr in soup.select('tr'):
    rows.append([td.get_text(strip=True) for td in tr.select('td, th')])
for row in rows:
    print(''.join('{: ^7}'.format(td) for td in row))
Prints:
Season Age Tm Lg Pos G GS MP FG FGA FG% 3P 3PA 3P% 2P 2PA 2P% FT FTA FT% ORB DRB TRB AST STL BLK TOV PF PTS
2003-04 19 CLE NBA SG 79 79 3122 7.2 17.2 .417 0.7 2.5 .290 6.4 14.7 .438 4.0 5.3 .754 1.1 3.8 5.0 5.4 1.5 0.7 3.1 1.7 19.1
2004-05 20 CLE NBA SF 80 80 3388 8.4 17.9 .472 1.1 3.3 .351 7.3 14.6 .499 5.1 6.8 .750 1.2 5.1 6.2 6.1 1.9 0.6 2.8 1.6 23.1
2005-06 21 CLE NBA SF 79 79 3361 9.4 19.5 .480 1.4 4.1 .335 8.0 15.5 .518 6.4 8.7 .738 0.8 5.2 6.0 5.6 1.3 0.7 2.8 1.9 26.5
2006-07 22 CLE NBA SF 78 78 3190 8.7 18.3 .476 1.1 3.5 .319 7.6 14.8 .513 5.5 7.9 .698 0.9 5.0 5.9 5.3 1.4 0.6 2.8 1.9 24.1
2007-08 23 CLE NBA SF 75 74 3027 9.4 19.5 .484 1.3 4.3 .315 8.1 15.3 .531 6.5 9.2 .712 1.6 5.5 7.0 6.4 1.6 1.0 3.0 2.0 26.8
2008-09 24 CLE NBA SF 81 81 3054 9.3 19.0 .489 1.6 4.5 .344 7.7 14.5 .535 7.0 9.0 .780 1.2 6.0 7.2 6.9 1.6 1.1 2.8 1.6 27.2
2009-10 25 CLE NBA SF 76 76 2966 9.3 18.5 .503 1.6 4.7 .333 7.8 13.8 .560 7.2 9.4 .767 0.9 5.9 6.7 7.9 1.5 0.9 3.2 1.4 27.4
2010-11 26 MIA NBA SF 79 79 3063 8.9 17.5 .510 1.1 3.3 .330 7.8 14.2 .552 5.9 7.8 .759 0.9 6.0 6.9 6.5 1.5 0.6 3.3 1.9 24.8
2011-12 27 MIA NBA SF 62 62 2326 9.6 18.1 .531 0.8 2.3 .362 8.8 15.8 .556 6.0 7.8 .771 1.5 6.2 7.6 6.0 1.8 0.8 3.3 1.5 26.0
...and so on.
