Extract a table from website and save it as csv file - python

I want to extract the table inside the div element with class ind-mp_info and save it to a CSV file. The table appears when you expand the COVID-19 Statewise Status tab.
The website is https://www.mygov.in/covid-19/
My code:
# importing the libraries
from bs4 import BeautifulSoup
from urllib.request import urlopen
import csv
import pandas as pd
html = urlopen("https://www.mygov.in/covid-19/")
soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", {"class":"ind-mp_info"})
rows = table.findAll("tr")
with open("editors.csv", "wt+", newline="") as f:
    writer = csv.writer(f)
    for row in rows:
        csv_row = []
        for cell in row.findAll(["td", "th"]):
            csv_row.append(cell.get_text())
        writer.writerow(csv_row)

You can request the JSON behind that table directly and convert it to a DataFrame.
import requests
import pandas as pd
import time
url = 'https://www.mygov.in/sites/default/files/covid/vaccine/vaccine_counts_today.json'
payload = {'timestamp': int(time.time())}
jsonData = requests.get(url, params=payload).json()
df = pd.DataFrame(jsonData['vacc_st_data'])
df.to_csv('editors.csv',index=False)
Output:
print(df.to_string())
st_name state_id covid_state_name covid_state_id dose1 dose2 total_doses last_dose1 last_dose2 last_total_doses
0 Andaman and Nicobar 1 Andaman and Nicobar 35 216987 95049 312036 216053 94601 310654
1 Andhra Pradesh 2 Andhra Pradesh 28 17826178 6298023 24124201 17633289 6216600 23849889
2 Arunachal Pradesh 3 Arunachal Pradesh 12 694525 189292 883817 692532 186724 879256
3 Assam 4 Assam 18 10706685 2259908 12966593 10510804 2209605 12720409
4 Bihar 5 Bihar 10 23504063 4521066 28025129 23366654 4488385 27855039
5 Chandigarh 6 Chandigarh 4 706582 226814 933396 700329 223569 923898
6 Chhattisgarh 7 Chhattisgarh 22 10000779 2632861 12633640 9979765 2611797 12591562
7 Dadra and Nagar Haveli and Daman and Diu 8 Dadra and Nagar Haveli and Daman and Diu 26 587610 81895 669505 584400 80828 665228
8 Delhi 9 Delhi 7 7930829 3057218 10988047 7835664 3000596 10836260
9 Goa 10 Goa 30 1098302 307364 1405666 1094394 302521 1396915
10 Gujarat 11 Gujarat 24 28535938 9139961 37675899 28113725 9054592 37168317
11 Haryana 12 Haryana 6 10152822 2974775 13127597 10090001 2924714 13014715
12 Himachal Pradesh 13 Himachal Pradesh 2 4335980 1406772 5742752 4249932 1382642 5632574
13 Jammu and Kashmir 14 Jammu and Kashmir 1 5376054 1508641 6884695 5325806 1491262 6817068
14 Jharkhand 15 Jharkhand 20 8450135 2018319 10468454 8388534 1997186 10385720
15 Karnataka 16 Karnataka 29 26000864 7509346 33510210 25860894 7437119 33298013
16 Kerala 17 Kerala 32 15759471 6442507 22201978 15672348 6427551 22099899
17 Ladakh 18 Ladakh 37 188876 70779 259655 188699 70337 259036
18 Lakshadweep 19 Lakshadweep 31 51371 17296 68667 51165 17170 68335
19 Madhya Pradesh 20 Madhya Pradesh 23 29817764 5750317 35568081 29752302 5736096 35488398
20 Maharashtra 21 Maharashtra 27 35261712 12239857 47501569 35044144 12114068 47158212
21 Manipur 22 Manipur 14 1163534 251078 1414612 1159499 246753 1406252
22 Meghalaya 23 Meghalaya 17 946600 238582 1185182 938984 232152 1171136
23 Mizoram 24 Mizoram 15 656018 209089 865107 654946 206780 861726
24 Nagaland 25 Nagaland 13 634479 162621 797100 632129 159436 791565
25 Odisha 26 Odisha 21 14222570 4264500 18487070 13971009 4202596 18173605
26 Puducherry 27 Puducherry 34 604872 152636 757508 601608 151744 753352
27 Punjab 28 Punjab 3 8222725 2303559 10526284 8202118 2287403 10489521
28 Rajasthan 29 Rajasthan 8 27226185 8464839 35691024 27017475 8377435 35394910
29 Sikkim 30 Sikkim 11 498609 152574 651183 497851 151538 649389
30 Tamil Nadu 31 Tamil Nadu 33 21024528 4734496 25759024 20857302 4689811 25547113
31 Telangana 32 Telengana 36 11714148 4019069 15733217 11649833 3966309 15616142
32 Tripura 33 Tripura 16 2417276 808369 3225645 2411801 804137 3215938
33 Uttar Pradesh 34 Uttar Pradesh 9 46430534 8618231 55048765 45976210 8518342 54494552
34 Uttarakhand 35 Uttarakhand 5 5171059 1622492 6793551 5071246 1596762 6668008
35 West Bengal 36 West Bengal 19 23559058 9184539 32743597 23264439 9134008 32398447
36 Miscellaneous 37 Miscellaneous 38 1900366 1549702 3450068 1900173 1549042 3449215
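If pandas is not otherwise needed, the same records can be written with the standard-library csv module that the question already imports. The following is a minimal sketch, not the answer's method: `records_to_csv` is a hypothetical helper, and the two sample records are illustrative stand-ins shaped like entries of `jsonData['vacc_st_data']` (whether the real values arrive as strings or numbers is an assumption).

```python
import csv
import io


def records_to_csv(records, fileobj):
    """Write a list of dicts (one per state) as CSV rows with a header line."""
    if not records:
        return 0
    # Use the keys of the first record as the column order.
    writer = csv.DictWriter(fileobj, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return len(records)


# Illustrative records only; the real list would come from the JSON response.
sample = [
    {"st_name": "Goa", "state_id": "10", "dose1": "1098302", "dose2": "307364"},
    {"st_name": "Sikkim", "state_id": "30", "dose1": "498609", "dose2": "152574"},
]
buffer = io.StringIO()
count = records_to_csv(sample, buffer)
print(buffer.getvalue())
```

With the real response, pass `jsonData['vacc_st_data']` and a file opened with `open("editors.csv", "w", newline="")` instead of the in-memory buffer.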

Related

Scraping data from TeamRankings.com

I want to scrape some NBA data from TeamRankings.com for my program in python. Here is an example link:
https://www.teamrankings.com/nba/stat/effective-field-goal-pct?date=2023-01-03
I only need the "Last 3" column data. I want to be able to set the date to whatever I want with a constant variable. There are a few other data points I want that are on different links but I will be able to figure that part out if this gets figured out.
I have tried using https://github.com/tymiguel/TeamRankingsWebScraper but it is outdated and did not work for me.
The easiest way will be to use pandas.read_html:
import pandas as pd
url = 'https://www.teamrankings.com/nba/stat/effective-field-goal-pct?date=2023-01-03'
df = pd.read_html(url)[0]
print(df)
Prints:
Rank Team 2022 Last 3 Last 1 Home Away 2021
0 1 Brooklyn 58.8% 64.5% 68.3% 59.4% 58.1% 54.2%
1 2 Denver 57.8% 62.8% 52.2% 59.5% 56.4% 55.5%
2 3 Boston 56.8% 54.6% 51.1% 58.2% 55.1% 54.0%
3 4 Sacramento 56.3% 56.9% 48.4% 59.1% 53.4% 52.5%
4 5 Golden State 56.3% 53.2% 52.5% 56.9% 55.6% 55.4%
5 6 Dallas 56.0% 59.5% 50.0% 55.8% 56.2% 54.0%
6 7 Portland 55.5% 58.6% 65.5% 57.3% 54.3% 51.5%
7 8 Minnesota 55.3% 52.1% 59.2% 55.7% 54.9% 53.8%
8 9 Utah 55.3% 53.9% 53.7% 58.1% 53.0% 55.1%
9 10 Philadelphia 55.3% 57.3% 56.4% 54.5% 56.2% 53.6%
10 11 Cleveland 55.1% 57.7% 60.9% 56.7% 53.1% 53.7%
11 12 Washington 54.6% 61.4% 56.9% 54.7% 54.5% 53.2%
12 13 Chicago 54.6% 57.3% 54.7% 55.7% 53.5% 53.7%
13 14 Indiana 54.5% 60.3% 53.8% 56.1% 52.8% 53.1%
14 15 New Orleans 54.4% 52.5% 56.5% 56.2% 52.5% 51.8%
15 16 Phoenix 54.1% 51.6% 44.8% 54.8% 53.5% 55.0%
16 17 LA Clippers 54.1% 57.8% 52.2% 52.3% 55.8% 53.0%
17 18 LA Lakers 54.0% 56.6% 53.8% 53.7% 54.3% 53.7%
18 19 San Antonio 53.1% 54.6% 47.4% 53.4% 52.8% 52.7%
19 20 Orlando 52.9% 48.0% 44.5% 54.6% 50.9% 50.2%
20 21 Milwaukee 52.8% 45.5% 42.2% 55.0% 50.4% 54.0%
21 22 Memphis 52.8% 54.0% 51.0% 53.8% 51.8% 52.1%
22 23 Miami 52.6% 54.6% 52.9% 53.1% 52.1% 54.0%
23 24 New York 52.2% 51.4% 57.4% 53.9% 50.6% 51.3%
24 25 Atlanta 52.2% 51.5% 53.7% 51.5% 53.0% 54.2%
25 26 Okla City 52.2% 50.9% 44.6% 52.6% 51.7% 49.7%
26 27 Detroit 51.5% 52.3% 45.1% 52.7% 50.5% 49.4%
27 28 Toronto 51.1% 51.3% 52.7% 51.3% 50.8% 51.0%
28 29 Houston 51.0% 50.0% 51.8% 50.2% 51.6% 53.4%
29 30 Charlotte 50.3% 52.0% 51.1% 49.3% 51.2% 54.3%
If you want only the Last 3 column:
print(df[['Team', 'Last 3']])
Prints:
Team Last 3
0 Brooklyn 64.5%
1 Denver 62.8%
2 Boston 54.6%
3 Sacramento 56.9%
...
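The question also asks for the date to be set through a constant. One possible way, sketched here under the assumption that the site keeps accepting a `date=YYYY-MM-DD` query parameter as in the example link, is to build the URL from a constant and convert the percentage strings to numbers. `build_url` and `pct_to_float` are hypothetical helper names, not part of pandas.

```python
from datetime import date
from urllib.parse import urlencode

BASE_URL = "https://www.teamrankings.com/nba/stat/"
STAT = "effective-field-goal-pct"   # path segment taken from the example link
DATE = date(2023, 1, 3)             # change this constant to query another day


def build_url(stat, day):
    """Return the stat page URL for the given datetime.date."""
    return BASE_URL + stat + "?" + urlencode({"date": day.isoformat()})


def pct_to_float(text):
    """Convert a cell such as '64.5%' to the float 64.5."""
    return float(text.strip().rstrip("%"))


url = build_url(STAT, DATE)
print(url)
print(pct_to_float("64.5%"))
```

The resulting `url` can be passed to `pd.read_html(url)[0]` as in the answer above, and `df['Last 3'].map(pct_to_float)` would then give numeric values, assuming the column is read as strings like those printed.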

Web scrape of forbes website using requests-html

I'm trying to scrape the list from https://www.forbes.com/best-states-for-business/list/#tab:overall
import requests_html
session= requests_html.HTMLSession()
r = session.get("https://www.forbes.com/best-states-for-business/list/#tab:overall")
r.html.render()
body=r.text.find('#list-table-body')
print(body)
This returns -1, not the table content. How can I get the actual table content?
The data is loaded dynamically from an external source through an API, so you can fetch it with the requests module alone and then load it into a pandas DataFrame.
import pandas as pd
import requests
headers = {'user-agent':'Mozilla/5.0'}
url = 'https://www.forbes.com/ajax/list/data?year=2019&uri=best-states-for-business&type=place'
r= requests.get(url,headers=headers)
df = pd.DataFrame(r.json())
print(df)
Output:
position rank name uri ... regulatoryEnvironment economicClimate growthProspects lifeQuality
0 1 1 North Carolina nc ... 1 13 13 16
1 2 2 Texas tx ... 21 4 1 15
2 3 3 Utah ut ... 6 8 7 9
3 4 4 Virginia va ... 3 20 24 1
4 5 5 Florida fl ... 7 3 5 18
5 6 6 Georgia ga ... 9 7 11 23
6 7 7 Tennessee tn ... 4 11 14 29
7 8 8 Washington wa ... 29 6 8 30
8 9 9 Colorado co ... 19 2 4 21
9 10 10 Idaho id ... 8 10 2 24
10 11 11 Nebraska ne ... 2 28 36 19
11 12 12 Indiana in ... 5 25 25 7
12 13 13 Nevada nv ... 14 14 6 48
13 14 14 South Dakota sd ... 13 39 20 28
14 15 15 Minnesota mn ... 16 16 27 3
15 16 16 South Carolina sc ... 17 15 12 39
16 17 17 Iowa ia ... 11 36 35 10
17 18 18 Arizona az ... 18 12 3 35
18 19 19 Massachusetts ma ... 37 5 15 4
19 20 20 Oregon or ... 36 9 9 38
20 21 21 Wisconsin wi ... 10 19 37 8
21 22 22 Missouri mo ... 25 26 18 17
22 23 23 Delaware de ... 42 37 19 43
23 24 24 Oklahoma ok ... 15 31 33 31
24 25 25 New Hampshire nh ... 32 21 22 22
25 26 26 North Dakota nd ... 22 45 26 42
26 27 27 Pennsylvania pa ... 35 23 40 12
27 28 28 New York ny ... 34 18 21 14
28 29 29 Ohio oh ... 26 22 44 2
29 30 30 Montana mt ... 28 35 17 45
30 31 31 California ca ... 40 1 10 27
31 32 32 Wyoming wy ... 12 49 23 36
32 33 33 Arkansas ar ... 20 33 39 41
33 34 34 Maryland md ... 41 27 29 26
34 35 35 Michigan mi ... 22 17 41 13
35 36 36 Kansas ks ... 24 32 42 32
36 37 37 Illinois il ... 39 30 45 11
37 38 38 Kentucky ky ... 33 41 34 25
38 39 39 New Jersey nj ... 49 29 30 5
39 40 40 Alabama al ... 27 38 31 44
40 41 41 Rhode Island ri ... 44 40 32 20
41 42 42 Mississippi ms ... 30 46 47 37
42 43 43 Connecticut ct ... 43 42 48 6
43 44 44 Maine me ... 48 34 28 34
44 45 45 Vermont vt ... 45 43 38 33
45 46 46 Louisiana la ... 47 47 46 47
46 47 47 Hawaii hi ... 38 24 49 40
47 48 48 New Mexico nm ... 46 44 15 49
48 49 49 West Virginia wv ... 50 48 50 46
49 50 50 Alaska ak ... 31 50 43 50
[50 rows x 15 columns]
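As for why the original attempt printed -1: `r.text` is an ordinary string, so `r.text.find(...)` is `str.find`, which looks for a literal substring and returns -1 when it is absent. It does not interpret `#list-table-body` as a CSS selector (in requests-html, selector lookups go through `r.html.find`). A small illustration, using a made-up one-line HTML string:

```python
# Made-up markup for illustration; not the actual Forbes page source.
html_text = '<tbody id="list-table-body"></tbody>'

# The literal text "#list-table-body" never occurs in the markup.
as_selector = html_text.find('#list-table-body')

# The id itself does occur, so str.find returns its character offset.
as_substring = html_text.find('list-table-body')

print(as_selector, as_substring)
```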

Web Scraping School Project

I need help with a school project. The lines I have commented out with "#" don't work with the table I scraped. I need to turn it into a data frame. Can anyone see what I'm missing, or whether I am missing a step?
Tertiary=pd.read_html("https://en.wikipedia.org/wiki/List_of_countries_by_tertiary_education_attainment")
Tertiary=pd.DataFrame(Tertiary[1])
#Tertiary=Tertiary.drop(["Non-OECD"], axis=1, inplace=True)
print(Tertiary.dtypes)
#Tertiary["Age25-64(%)"] = pd.to_numeric(Tertiary["Age25-64(%)"])
#Tertiary["Age"] = pd.to_numeric(Tertiary["Age"])
print(Tertiary.dtypes)
print()
#print(Tertiary.describe)
print()
#print(Tertiary.isnull().sum())
#print(Tertiary)
Everything works fine for me.
import pandas as pd
df = pd.read_html("https://en.wikipedia.org/wiki/List_of_countries_by_tertiary_education_attainment")
table = pd.DataFrame(df[1])
print(table)
print(table.columns)
Output:
Country Age 25–64 (%) Age Year Non-OECD
Country Age 25–64 (%) 25–34 (%) 35–44 (%) 45–54 (%) 55–64 (%) Year Non-OECD
0 Australia 42 48 46 38 33 2014 NaN
1 Austria 30 38 33 27 21 2014 NaN
2 Belgium 37 44 42 34 26 2014 NaN
3 Brazil 14 15 14 14 11 2013 NaN
4 Canada 54 58 61 51 45 2014 NaN
5 Chile 21 27 24 17 14 2013 NaN
6 China 17 27 15 7 2 2018 NaN
7 Colombia 22 28 23 18 16 2014 NaN
8 Costa Rica 18 21 19 17 17 2014 NaN
9 Czech Republic 22 30 21 20 15 2014 NaN
10 Denmark 36 42 41 33 29 2014 NaN
11 Estonia 38 40 39 35 36 2014 NaN
12 Finland 42 40 50 44 34 2014 NaN
13 France 32 44 39 26 20 2013 NaN
14 Germany 27 28 29 26 25 2014 NaN
15 Greece 28 39 27 26 21 2014 NaN
16 Hungary 23 32 25 20 17 2014 NaN
17 Iceland 37 41 42 36 29 2014 NaN
18 Indonesia 8 10 9 8 4 2011 NaN
19 Ireland 41 51 49 34 24 2014 NaN
20 Israel 49 46 53 48 47 2014 NaN
21 Italy 17 24 19 13 12 2014 NaN
22 Japan 48 59 53 47 35 2014 NaN
23 Latvia 30 39 31 27 23 2014 NaN
24 Lithuania 37 53 38 30 28 2014 NaN
25 Luxembourg 46 53 56 40 32 2014 NaN
26 Mexico 19 25 17 16 13 2014 NaN
27 Netherlands 34 44 38 30 27 2014 NaN
28 New Zealand 36 40 41 32 29 2014 NaN
29 Norway 42 49 49 36 32 2014 NaN
30 Poland 27 43 32 18 14 2014 NaN
31 Portugal 22 31 26 17 13 2014 NaN
32 Russia 54 58 55 53 50 2013 NaN
33 Saudi Arabia 22 26 22 18 14 2013 NaN
34 Slovakia 20 30 21 15 14 2014 NaN
35 Slovenia 29 38 35 24 18 2014 NaN
36 South Africa 7 5 7 8 7 2012 NaN
37 South Korea 45 68 56 33 17 2014 NaN
38 Spain 35 41 43 30 21 2014 NaN
39 Sweden 39 46 46 32 30 2014 NaN
40 Switzerland 40 46 45 38 31 2014 NaN
41 Turkey 17 25 16 10 10 2014 NaN
42 Taiwan[3] 45 X X X X 2015 NaN
43 United Kingdom 42 49 46 38 35 2014 NaN
44 United States 44 46 47 43 41 2014 NaN
__
MultiIndex([( 'Country', 'Country'),
('Age 25–64 (%)', 'Age 25–64 (%)'),
( 'Age', '25–34 (%)'),
( 'Age', '35–44 (%)'),
( 'Age', '45–54 (%)'),
( 'Age', '55–64 (%)'),
( 'Year', 'Year'),
( 'Non-OECD', 'Non-OECD')],
)
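The MultiIndex printed above is a likely reason the commented-out lines fail: "Age" selects four sub-columns rather than one, and `pd.to_numeric` does not accept a DataFrame. Separately, `drop(..., inplace=True)` returns None, so assigning its result back replaces the table with None. A minimal sketch of one way around both, using a tiny hand-made stand-in for the scraped table (the two rows are illustrative, not the full data):

```python
import pandas as pd

# Tiny stand-in for the scraped table, with the same two-level header.
columns = pd.MultiIndex.from_tuples([
    ("Country", "Country"),
    ("Age 25–64 (%)", "Age 25–64 (%)"),
    ("Age", "25–34 (%)"),
    ("Year", "Year"),
    ("Non-OECD", "Non-OECD"),
])
tertiary = pd.DataFrame(
    [["Australia", 42, "48", 2014, None],
     ["Taiwan[3]", 45, "X", 2015, None]],
    columns=columns,
)

# Keep only the lower header level so every column has one plain name.
tertiary.columns = tertiary.columns.get_level_values(1)

# drop() returns a new frame; do not combine assignment with inplace=True.
tertiary = tertiary.drop(columns=["Non-OECD"])

# Placeholders such as "X" become NaN instead of raising an error.
tertiary["25–34 (%)"] = pd.to_numeric(tertiary["25–34 (%)"], errors="coerce")

print(tertiary.dtypes)
```

Note that the column labels contain spaces and an en dash ("Age 25–64 (%)"), so a label typed as "Age25-64(%)" would not match.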

Can't fetch the content of a table from a website through web scraping using BeautifulSoup

I'm trying to pull data from the table in the COVID-19 Statewise Status section of the https://www.mohfw.gov.in/ website.
I'm using BeautifulSoup for this, but when I try to read the contents of tbody, I get None.
Here is my code:
import requests
from bs4 import BeautifulSoup
url = "https://www.mohfw.gov.in/"
soup = BeautifulSoup(requests.get(url).content, "lxml")
t1=soup.find(id="state-data")
t2=t1.find('div',class_='data-table table-responsive')
t3=t2.find('table')
tab=t3.find('tbody')
data=tab.find_all('tr')
for d in data:
    t=d.find_all('td')
    for t1 in t:
        val=t1.text
        print(val)
All the table data is clearly present in the tbody tag, but when I try to scrape it, the output is None. How can I fetch it? Thank you in advance!
The data is loaded from an external source via JavaScript. You can use requests to load this JSON data, or use pandas:
import pandas as pd
df = pd.read_json("https://www.mohfw.gov.in/data/datanew.json")
print(df)
Prints:
sno state_name active positive cured death new_active new_positive new_cured new_death state_code
0 2 Andaman and Nicobar Islands 97 7415 7191 127 103 7425 7195 127 35
1 1 Andhra Pradesh 58140 1853183 1782680 12363 53880 1857352 1791056 12416 28
2 3 Arunachal Pradesh 2539 33375 30677 159 2548 33664 30956 160 12
3 4 Assam 32625 485310 448442 4243 32975 488179 450924 4280 18
4 5 Bihar 3017 719939 707365 9557 2811 720207 707833 9563 10
5 6 Chandigarh 311 61444 60327 806 278 61467 60383 806 04
6 7 Chhattisgarh 8564 991171 969212 13395 8007 991653 970244 13402 22
7 8 Dadra and Nagar Haveli and Daman and Diu 60 10516 10452 4 61 10520 10455 4 26
8 10 Delhi 1996 1432381 1405460 24925 1918 1432778 1405927 24933 07
9 11 Goa 3066 164654 158591 2997 2920 164957 159029 3008 30
10 12 Gujarat 5639 822485 806812 10034 5159 822620 807424 10037 24
11 13 Haryana 2337 767580 755968 9275 2200 767726 756231 9295 06
12 14 Himachal Pradesh 2408 200603 194747 3448 2276 200791 195062 3453 02
13 15 Jammu and Kashmir 7759 312156 300135 4262 7181 312584 301134 4269 01
14 16 Jharkhand 1489 344665 338076 5100 1417 344775 338256 5102 20
15 17 Karnataka 123156 2811320 2654139 34025 118615 2815029 2662250 34164 29
16 18 Kerala 100135 2816843 2704554 12154 100881 2829460 2716284 12295 32
17 19 Ladakh 365 19838 19271 202 360 19871 19309 202 37
18 20 Lakshadweep 319 9471 9106 46 315 9504 9142 47 31
19 21 Madhya Pradesh 1980 789350 778584 8786 1707 789415 778902 8806 23
20 22 Maharashtra 127523 5979051 5733215 118313 126468 5987521 5742258 118795 27
21 23 Manipur 9298 64418 54065 1055 9214 64993 54714 1065 14
22 24 Meghalaya 4196 45555 40574 785 4273 45976 40915 788 17
23 25 Mizoram 4227 17979 13667 85 4424 18409 13900 85 15
24 26 Nagaland 1844 24374 22055 475 1757 24438 22204 477 13
25 27 Odisha 32099 880533 844801 3633 30859 883490 848960 3671 21
26 28 Puducherry 3364 115080 109990 1726 3214 115364 110423 1727 34
27 29 Punjab 6477 592658 570327 15854 5968 593063 571207 15888 03
28 30 Rajasthan 2691 951256 939664 8901 2388 951393 940101 8904 08
29 31 Sikkim 2448 19321 16580 293 2430 19458 16732 296 11
30 32 Tamil Nadu 61329 2429924 2337209 31386 56886 2436819 2348353 31580 33
31 34 Telangana 17246 614399 593577 3576 16640 615574 595348 3586 36
32 33 Tripura 3910 62745 58181 654 3747 63140 58735 658 16
33 35 Uttarakhand 2964 338807 328799 7044 2896 338978 329030 7052 05
34 36 Uttar Pradesh 4163 1704476 1678089 22224 3910 1704678 1678486 22282 09
35 37 West Bengal 22740 1483586 1443456 17390 22508 1485438 1445493 17437 19
36 11111 662521 29977861 28926038 389302 643194 30028709 28994855 390660
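The last row printed above (sno 11111) has no state name and appears to hold the national totals, so it may need to be removed before any per-state analysis. A minimal sketch on plain records, assuming the totals row carries an empty `state_name` string (an assumption based only on the printed output); `drop_totals` is a hypothetical helper and the sample records are abbreviated:

```python
def drop_totals(records):
    """Keep only rows that name a state; the totals row has no state_name."""
    return [row for row in records if row.get("state_name", "").strip()]


# Abbreviated, illustrative records shaped like the JSON entries.
sample = [
    {"sno": "11", "state_name": "Goa", "active": "3066", "state_code": "30"},
    {"sno": "11111", "state_name": "", "active": "662521", "state_code": ""},
]
states = drop_totals(sample)
print(states)
```

With the DataFrame from the answer, a comparable filter would be `df[df["state_name"].str.strip() != ""]`, under the same assumption.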
The response from requests.get(url).content contains an HTML comment marker, <!--<tbody>, which causes problems further down: your line tab=t3.find('tbody') returns None.
One way around this is to remove the comment marker, as below.
import requests
from bs4 import BeautifulSoup
url = "https://www.mohfw.gov.in/"
soup = BeautifulSoup(requests.get(url).content, "lxml")
t1=soup.find(id="state-data")
t2=t1.find('div',class_='data-table table-responsive')
t3=t2.find('table')
# tab=t3.find('tbody')
t3 = str(t3).replace("<!--<tbody>", "")
t3 = BeautifulSoup(t3, "lxml")
data=t3.find_all('tr')
for d in data:
    t=d.find_all('td')
    for t1 in t:
        val=t1.text
        print(val)

Pandas - read a text file

I have a text file that looks like this:
************************************************************************************************
English Premier Division - Saturday 25th May 2002
************************************************************************************************
================================================================================================
2001/2 Assists
================================================================================================
Pos Player Club Apps Asts
-------------------------------------------------------------------------
1st David Beckham Man Utd 29 15
2nd Dean Gordon Middlesbrough 30 (1) 11
3rd John Collins Fulham 32 11
4th Ryan Giggs Man Utd 32 11
5th Kieron Dyer Newcastle 33 10
6th Sean Davis Fulham 23 (1) 10
7th Damien Duff Blackburn 30 (3) 10
8th Alan Smith Leeds 23 (6) 9
9th Jesper Grønkjær Chelsea 34 9
10th Andrejs Stolcers Fulham 28 9
11th Ian Harte Leeds 37 8
12th Eidur Gudjohnsen Chelsea 28 (3) 8
13th Robert Pires Arsenal 24 (3) 7
14th Lauren Arsenal 32 (1) 7
15th John Robinson Charlton 33 7
16th Michael Gray Sunderland 37 7
17th Henrik Pedersen Bolton 36 7
18th Anders Svensson Southampton 34 (2) 7
19th Lee Bowyer Leeds 32 7
20th Craig Hignett Blackburn 21 (6) 7
21st Paul Merson Aston Villa 27 7
22nd Teddy Sheringham Tottenham 37 7
23rd Steed Malbranque Fulham 16 (14) 7
24th Marian Pahars Southampton 37 7
25th Muzzy Izzet Leicester 28 7
26th Sergei Rebrov Tottenham 36 (1) 7
27th Julio Arca Sunderland 32 (1) 7
28th Christian Bassedas Newcastle 37 7
29th Juan Sebastián Verón Man Utd 29 (2) 7
30th Joe Cole West Ham 32 6
I'm trying to read it into a pandas data frame like this:
df = pd.read_table('assist1.txt',
                   sep='\s+',
                   skiprows=6,
                   header=0,)
This code throws an exception - pandas.errors.ParserError: Error tokenizing data. C error: Expected 7 fields in line 31, saw 8.
I guess that's because of the space between the first and last name of the player (should be the value of the Player column).
Is there a way to achieve this?
Furthermore, it is a part of a larger text file that looks like this:
************************************************************************************************
English Premier Division - Saturday 25th May 2002
************************************************************************************************
================================================================================================
2001/2 Table
================================================================================================
Pos Team Pld Won Drn Lst For Ag Won Drn Lst For Ag Pts
--------------------------------------------------------------------------------------------------
1st C Man Utd 38 15 4 0 41 4 10 4 5 34 20 83
--------------------------------------------------------------------------------------------------
2nd Arsenal 38 15 2 2 38 9 11 3 5 28 14 83
3rd Leeds 38 15 4 0 33 8 9 4 6 36 37 80
4th Liverpool 38 13 4 2 25 7 9 2 8 26 24 72
5th Chelsea 38 16 1 2 44 18 4 5 10 24 33 66
6th Newcastle 38 11 5 3 40 23 7 3 9 25 33 62
7th Blackburn 38 11 3 5 36 24 5 5 9 23 30 56
8th Middlesbrough 38 9 7 3 31 19 5 6 8 20 29 55
9th Sunderland 38 8 5 6 31 30 8 2 9 22 25 55
10th West Ham 38 11 3 5 31 17 3 7 9 14 29 52
11th Tottenham 38 10 3 6 35 26 4 5 10 23 35 50
12th Leicester 38 7 5 7 23 20 6 4 9 26 28 48
13th Fulham 38 7 5 7 39 35 5 7 7 33 44 48
14th Ipswich 38 9 4 6 23 22 3 3 13 14 34 43
15th Charlton 38 5 5 9 18 26 5 4 10 16 30 39
16th Everton 38 8 4 7 30 28 1 5 13 11 36 36
17th Aston Villa 38 2 8 9 19 28 5 6 8 21 26 35
--------------------------------------------------------------------------------------------------
18th R Derby 38 6 4 9 25 28 3 3 13 14 39 34
19th R Southampton 38 5 7 7 34 34 1 4 14 12 35 29
20th R Bolton 38 6 3 10 25 31 1 4 14 15 40 28
================================================================================================
2001/2 Goals
================================================================================================
Pos Player Club Apps Gls
-------------------------------------------------------------------------
1st Thierry Henry Arsenal 34 25
2nd Alan Shearer Newcastle 36 25
3rd Ruud van Nistelrooy Man Utd 26 23
4th Steve Marlet Fulham 38 20
5th Jimmy Floyd Hasselbaink Chelsea 30 (1) 20
6th Les Ferdinand Sunderland 27 (2) 17
7th Kevin Phillips Sunderland 36 17
8th Frédéric Kanouté West Ham 32 (3) 14
9th Marcus Bent Blackburn 28 (4) 13
10th Alen Boksic Middlesbrough 36 13
11th Eidur Gudjohnsen Chelsea 28 (3) 13
12th Luis Boa Morte Fulham 36 13
13th Michael Owen Liverpool 32 (1) 12
14th Dwight Yorke Man Utd 29 (1) 11
15th Henrik Pedersen Bolton 36 11
16th Juan Pablo Angel Aston Villa 34 (2) 11
17th Juan Sebastián Verón Man Utd 29 (2) 11
18th Shaun Bartlett Charlton 35 10
19th Matt Jansen Blackburn 28 (5) 10
20th Duncan Ferguson Everton 28 (5) 10
21st Ian Harte Leeds 37 10
22nd Bosko Balaban Aston Villa 36 10
23rd Robbie Fowler Liverpool 25 (3) 10
24th Georgi Kinkladze Derby 36 (1) 10
25th Hamilton Ricard Middlesbrough 28 (2) 10
26th Robert Pires Arsenal 24 (3) 9
27th Andrew Cole Man Utd 15 (5) 9
28th Rod Wallace Bolton 31 9
29th James Beattie Southampton 28 (1) 9
30th Robbie Keane Leeds 28 (8) 9
================================================================================================
2001/2 Assists
================================================================================================
Pos Player Club Apps Asts
-------------------------------------------------------------------------
1st David Beckham Man Utd 29 15
2nd Dean Gordon Middlesbrough 30 (1) 11
3rd John Collins Fulham 32 11
4th Ryan Giggs Man Utd 32 11
5th Kieron Dyer Newcastle 33 10
6th Sean Davis Fulham 23 (1) 10
7th Damien Duff Blackburn 30 (3) 10
8th Alan Smith Leeds 23 (6) 9
9th Jesper Grønkjær Chelsea 34 9
10th Andrejs Stolcers Fulham 28 9
11th Ian Harte Leeds 37 8
12th Eidur Gudjohnsen Chelsea 28 (3) 8
13th Robert Pires Arsenal 24 (3) 7
14th Lauren Arsenal 32 (1) 7
15th John Robinson Charlton 33 7
16th Michael Gray Sunderland 37 7
17th Henrik Pedersen Bolton 36 7
18th Anders Svensson Southampton 34 (2) 7
19th Lee Bowyer Leeds 32 7
20th Craig Hignett Blackburn 21 (6) 7
21st Paul Merson Aston Villa 27 7
22nd Teddy Sheringham Tottenham 37 7
23rd Steed Malbranque Fulham 16 (14) 7
24th Marian Pahars Southampton 37 7
25th Muzzy Izzet Leicester 28 7
26th Sergei Rebrov Tottenham 36 (1) 7
27th Julio Arca Sunderland 32 (1) 7
28th Christian Bassedas Newcastle 37 7
29th Juan Sebastián Verón Man Utd 29 (2) 7
30th Joe Cole West Ham 32 6
================================================================================================
2001/2 Average Rating
================================================================================================
Pos Player Club Apps Av R
-------------------------------------------------------------------------
1st Ruud van Nistelrooy Man Utd 26 8.54
2nd Thierry Henry Arsenal 34 8.09
3rd Alan Shearer Newcastle 36 7.97
4th Kieron Dyer Newcastle 33 7.94
5th Steve Marlet Fulham 38 7.89
6th Ian Harte Leeds 37 7.86
7th Andrew Cole Man Utd 15 (5) 7.85
8th Roy Keane Man Utd 19 7.84
9th Les Ferdinand Sunderland 27 (2) 7.83
10th Juan Sebastián Verón Man Utd 29 (2) 7.81
11th Eidur Gudjohnsen Chelsea 28 (3) 7.77
12th Jesper Grønkjær Chelsea 34 7.76
13th Michaël Silvestre Man Utd 32 7.72
14th Dean Gordon Middlesbrough 30 (1) 7.71
15th Michael Owen Liverpool 32 (1) 7.70
16th Patrick Vieira Arsenal 29 7.69
17th Robert Pires Arsenal 24 (3) 7.67
18th Ryan Giggs Man Utd 32 7.66
19th Dwight Yorke Man Utd 29 (1) 7.63
20th Mario Stanic Chelsea 29 (3) 7.63
21st Frédéric Kanouté West Ham 32 (3) 7.57
22nd Mark Viduka Leeds 21 7.57
23rd David Beckham Man Utd 29 7.55
24th Jimmy Floyd Hasselbaink Chelsea 30 (1) 7.55
25th Martin Taylor Blackburn 14 (8) 7.55
26th Titus Bramble Ipswich 33 7.55
27th Sol Campbell Arsenal 20 (1) 7.52
28th Mario Melchiot Chelsea 19 (2) 7.52
29th Stephane Henchoz Liverpool 29 7.52
30th Rio Ferdinand Leeds 36 (1) 7.51
================================================================================================
2001/2 Man of Match
================================================================================================
Pos Player Club Apps MoM
-------------------------------------------------------------------------
1st Thierry Henry Arsenal 34 8
2nd Ruud van Nistelrooy Man Utd 26 8
3rd Kieron Dyer Newcastle 33 6
4th Les Ferdinand Sunderland 27 (2) 6
5th Steve Marlet Fulham 38 6
6th Eidur Gudjohnsen Chelsea 28 (3) 6
7th Ian Harte Leeds 37 5
8th Richie Wellens Leicester 20 (9) 5
9th Henrik Pedersen Bolton 36 5
10th Alan Shearer Newcastle 36 5
11th Michael Owen Liverpool 32 (1) 4
12th Dean Gordon Middlesbrough 30 (1) 4
13th Matt Jansen Blackburn 28 (5) 4
14th Marcus Bent Blackburn 28 (4) 4
15th Kevin Campbell Everton 27 (4) 4
16th Titus Bramble Ipswich 33 4
17th Roy Keane Man Utd 19 4
18th Frédéric Kanouté West Ham 32 (3) 4
19th Patrick Vieira Arsenal 29 4
20th Hermann Hreidarsson Ipswich 34 4
21st Dennis Bergkamp Arsenal 22 (9) 4
22nd Jimmy Floyd Hasselbaink Chelsea 30 (1) 4
23rd Claus Lundekvam Southampton 27 (2) 4
24th Robert Pires Arsenal 24 (3) 3
25th Shaun Bartlett Charlton 35 3
26th Kevin Phillips Sunderland 36 3
27th Lucas Radebe Leeds 31 (1) 3
28th Ragnvald Soma West Ham 27 (3) 3
29th Dean Richards Tottenham 34 3
30th Wayne Quinn Liverpool 25 (4) 3
Ideally I would like to run a function that creates a data frame out of each table above, but can't figure it out.
Thanks
Another way is to specify the separator as two or more whitespace characters and to pass skiprows as a list of row numbers. I tried this and it gave me your expected output. You can write a simple script to work out which lines to skip and which to keep.
df = pd.read_table('assist1.txt', sep=r'\s\s+', skiprows=[0,1,2,3,4,5,6,7,8,10], header=0, engine='python')
You're using whitespace as a delimiter, but this file is fixed-width, not whitespace-delimited. Look into fixed-width parsing, e.g. pandas.read_fwf: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_fwf.html.
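For the larger file, one possible approach is to split it into sections first and parse each section on its own. The sketch below uses only the standard library and follows the first answer's idea of treating two or more spaces as the column gap. It assumes the real file separates columns that way (the spacing is not recoverable from the post) and that each section title sits between two lines of "=". `parse_sections` and the shortened sample are hypothetical.

```python
import re

RULE = re.compile(r"^[*=\-]{5,}\s*$")   # lines made only of *, = or -
GAP = re.compile(r"\s{2,}")             # two or more spaces separate columns

SAMPLE = """\
************************************************
English Premier Division - Saturday 25th May 2002
************************************************
================================================
2001/2 Assists
================================================
Pos   Player             Club            Apps     Asts
------------------------------------------------------
1st   David Beckham      Man Utd         29       15
2nd   Dean Gordon        Middlesbrough   30 (1)   11
================================================
2001/2 Goals
================================================
Pos   Player                    Club      Apps     Gls
------------------------------------------------------
1st   Thierry Henry             Arsenal   34       25
5th   Jimmy Floyd Hasselbaink   Chelsea   30 (1)   20
"""


def parse_sections(text):
    """Return {section title: list of row dicts} for a report like SAMPLE."""
    lines = [ln.rstrip() for ln in text.splitlines()]
    sections = {}
    title, header = None, None
    for i, line in enumerate(lines):
        if not line.strip() or RULE.match(line):
            continue                         # skip blank and ruler lines
        prev_eq = i > 0 and lines[i - 1].startswith("=")
        next_eq = i + 1 < len(lines) and lines[i + 1].startswith("=")
        if prev_eq and next_eq:              # a title such as "2001/2 Assists"
            title, header = line.strip(), None
            sections[title] = []
        elif title is None:
            continue                         # banner before the first section
        elif header is None:
            header = GAP.split(line.strip())  # first data line is the header
        else:
            cells = GAP.split(line.strip())
            sections[title].append(dict(zip(header, cells)))
    return sections


sections = parse_sections(SAMPLE)
for name, rows in sections.items():
    print(name, rows)
```

Each list of rows can then be turned into a data frame with `pd.DataFrame(rows)`. The league table would need separate handling, since its header repeats names (Won, Drn, Lst, For, Ag) and some rows carry an extra C or R flag, which this sketch does not account for.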
