Subplot for grouped value count bar plot - python

My table looks like something below
YEAR RESPONSIBLE DISTRICT
2014 01 - PARIS
2014 01 - PARIS
2014 01 - PARIS
2014 01 - PARIS
2014 01 - PARIS
... ... ...
2017 15 - SAN ANTONIO
2017 15 - SAN ANTONIO
2017 15 - SAN ANTONIO
2017 15 - SAN ANTONIO
2017 15 - SAN ANTONIO
After running
g = df.groupby('YEAR')['RESPONSIBLE DISTRICT'].value_counts()
I got the following:
YEAR RESPONSIBLE DISTRICT
2014 05 - LUBBOCK 12312
15 - SAN ANTONIO 10457
18 - DALLAS 9885
04 - AMARILLO 9617
08 - ABILENE 8730
...
2020 21 - PHARR 5645
25 - CHILDRESS 5625
20 - BEAUMONT 5560
22 - LAREDO 5034
24 - EL PASO 4620
I have 25 districts in total. Now I want to create 25 subplots, so each subplot would represent a single district. For each subplot, I want the year 2014-2020 to be on the x-axis and the value count to be on the y-axis. How could I do that?

Is this what you expect?
import matplotlib.pyplot as plt
fig, axs = plt.subplots(5, 5, sharex=True, sharey=True, figsize=(15, 15))
for ax, (district, sr) in zip(axs.flat, g.groupby('RESPONSIBLE DISTRICT')):
    ax.set_title(district)
    ax.plot(sr.index.get_level_values('YEAR'), sr.values)
fig.tight_layout()
plt.show()
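A small optional addition, assuming matplotlib 3.4 or newer: shared axis labels for the whole grid, placed just before plt.show().
# one shared x/y label for the whole figure instead of per-subplot labels (assumes fig from above)
fig.supxlabel('YEAR')
fig.supylabel('count')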

This should work.
import matplotlib.pyplot as plt
import pandas as pd
g = df.groupby('YEAR')['RESPONSIBLE DISTRICT'].value_counts()
fig, axs = plt.subplots(5, 5, constrained_layout=True)
for ax, (district, dfi) in zip(axs.ravel(), g.groupby('RESPONSIBLE DISTRICT')):
    x = dfi.index.get_level_values('YEAR').values
    y = dfi.values
    ax.bar(x, y)
    ax.set_title(district)
plt.show()
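If the number of districts ever falls short of the 5 x 5 grid, the leftover axes can be hidden; a small optional sketch (assumes axs and g from above, placed just before plt.show()).
# hide any unused axes beyond the number of districts
n_districts = g.index.get_level_values('RESPONSIBLE DISTRICT').nunique()
for ax in axs.ravel()[n_districts:]:
    ax.set_visible(False)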

The correct way with only pandas is to reshape the dataframe with .pivot and then use pandas.DataFrame.plot.
Imports & Data
import pandas as pd
import numpy as np # for test data
import seaborn as sns # only for seaborn option
# test data
np.random.seed(365)
rows = 100000
data = {'YEAR': np.random.choice(range(2014, 2021), size=rows),
'RESPONSIBLE DISTRICT': np.random.choice(['05 - LUBBOCK', '15 - SAN ANTONIO', '18 - DALLAS', '04 - AMARILLO', '08 - ABILENE', '21 - PHARR', '25 - CHILDRESS', '20 - BEAUMONT', '22 - LAREDO', '24 - EL PASO'], size=rows)}
df = pd.DataFrame(data)
# get the value count of each district by year and pivot the shape
dfp = df.value_counts(subset=['YEAR', 'RESPONSIBLE DISTRICT']).reset_index(name='VC').pivot(index='YEAR', columns='RESPONSIBLE DISTRICT', values='VC')
# display(dfp)
RESPONSIBLE DISTRICT 04 - AMARILLO 05 - LUBBOCK 08 - ABILENE 15 - SAN ANTONIO 18 - DALLAS 20 - BEAUMONT 21 - PHARR 22 - LAREDO 24 - EL PASO 25 - CHILDRESS
YEAR
2014 1407 1406 1485 1456 1392 1456 1499 1458 1394 1452
2015 1436 1423 1428 1441 1395 1400 1423 1442 1375 1399
2016 1480 1381 1393 1415 1446 1442 1414 1435 1452 1454
2017 1422 1388 1485 1447 1404 1401 1413 1470 1424 1426
2018 1479 1424 1384 1450 1390 1384 1445 1435 1478 1386
2019 1387 1317 1379 1457 1457 1476 1447 1459 1451 1406
2020 1462 1452 1454 1448 1441 1428 1411 1407 1402 1445
pandas.DataFrame.plot
Use kind='line' if a line plot is preferred.
# plot the dataframe
axes = dfp.plot(kind='bar', subplots=True, layout=(5, 5), figsize=(20, 20), legend=False)
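A minimal sketch of the line variant, assuming the dfp built above:
# same layout, drawn as line plots instead of bars
axes = dfp.plot(kind='line', subplots=True, layout=(5, 5), figsize=(20, 20), legend=False)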
seaborn.catplot
seaborn is a high-level API for matplotlib
This is the easiest way because the dataframe does not need to be reshaped.
p = sns.catplot(kind='count', data=df, col='RESPONSIBLE DISTRICT', col_wrap=5, x='YEAR', height=3.5)
p.set_titles(row_template='{row_name}', col_template='{col_name}') # shortens the titles

Related

How to plot bar chart using seaborn after pandas.pivot

I'm having difficulties plotting my bar chart after I pivot my data as it can't seem to detect the column that I'm using for the x-axis.
This is the original data:
import pandas as pd
data = {'year': [2014, 2014, 2014, 2015, 2015, 2015, 2016, 2016, 2016, 2017, 2017, 2017, 2018, 2018, 2018, 2019, 2019, 2019, 2020, 2020, 2020, 2021, 2021, 2021],
'sector': ['Public Sector', 'Private Sector', 'Not in Active Practice', 'Public Sector', 'Private Sector', 'Not in Active Practice', 'Public Sector', 'Private Sector',
'Not in Active Practice', 'Public Sector', 'Private Sector', 'Not in Active Practice', 'Public Sector', 'Private Sector', 'Not in Active Practice',
'Public Sector', 'Private Sector', 'Not in Active Practice', 'Public Sector', 'Private Sector', 'Not in Active Practice', 'Public Sector', 'Private Sector', 'Not in Active Practice'],
'count': [861, 531, 2, 877, 606, 66, 899, 682, 112, 882, 765, 167, 960, 804, 203, 943, 834, 243, 1016, 876, 237, 1085, 960, 215]}
df = pd.DataFrame(data)
year sector count
0 2014 Public Sector 861
1 2014 Private Sector 531
2 2014 Not in Active Practice 2
3 2015 Public Sector 877
4 2015 Private Sector 606
5 2015 Not in Active Practice 66
6 2016 Public Sector 899
7 2016 Private Sector 682
8 2016 Not in Active Practice 112
9 2017 Public Sector 882
10 2017 Private Sector 765
11 2017 Not in Active Practice 167
12 2018 Public Sector 960
13 2018 Private Sector 804
14 2018 Not in Active Practice 203
15 2019 Public Sector 943
16 2019 Private Sector 834
17 2019 Not in Active Practice 243
18 2020 Public Sector 1016
19 2020 Private Sector 876
20 2020 Not in Active Practice 237
21 2021 Public Sector 1085
22 2021 Private Sector 960
23 2021 Not in Active Practice 215
After pivoting the data:
sector Not in Active Practice Private Sector Public Sector
year
2014 2 531 861
2015 66 606 877
2016 112 682 899
2017 167 765 882
2018 203 804 960
2019 243 834 943
2020 237 876 1016
2021 215 960 1085
After tweaking the data to get the columns I want:
sector Private Sector Public Sector Total in Practice
year
2014 531 861 1392
2015 606 877 1483
2016 682 899 1581
2017 765 882 1647
2018 804 960 1764
2019 834 943 1777
2020 876 1016 1892
2021 960 1085 2045
As you can see, after I have pivoted the data, there is an extra row on top of the year called 'sector'.
sns.barplot(data=df3, x='year', y="Total in Practice")
This is the code I'm using to plot the graph, but Python returns:
<Could not interpret input 'year'>
I've tried using 'sector' instead of 'year', but it returns the same error.
I copied the original data and then followed the same process you described:
import pandas as pd
import seaborn as sns
mdic = {'year': [2014, 2014, 2014, 2015, 2015, 2015, 2016, 2016, 2016, 2017, 2017, 2017,
2018, 2018, 2018, 2019, 2019, 2019, 2020, 2020, 2020, 2021, 2021, 2021],
'sector': ["Public Sector", "Private Sector", "Not in Active Practice", "Public Sector", "Private Sector", "Not in Active Practice",
"Public Sector", "Private Sector", "Not in Active Practice", "Public Sector", "Private Sector", "Not in Active Practice",
"Public Sector", "Private Sector", "Not in Active Practice", "Public Sector", "Private Sector", "Not in Active Practice",
"Public Sector", "Private Sector", "Not in Active Practice", "Public Sector", "Private Sector", "Not in Active Practice"],
'count' : [861, 531, 2, 877, 606, 66, 899, 682, 112, 882, 765, 167, 960, 804, 203, 943, 834, 243, 1016, 876, 237, 1085, 960, 215]}
data = pd.DataFrame(mdic)
data_pivot = data.pivot(index='year', columns='sector', values='count')
df3 = data_pivot.drop('Not in Active Practice', axis=1)
df3['Total in Practice'] = df3.sum(axis=1)
and got the same result:
df3
sector Private Sector Public Sector Total in Practice
year
2014 531 861 1392
2015 606 877 1483
2016 682 899 1581
2017 765 882 1647
2018 804 960 1764
2019 834 943 1777
2020 876 1016 1892
2021 960 1085 2045
The reason you are getting the error is that when you created df3, the column year became the index. Here are three solutions.
The first, as commented by @tdy:
sns.barplot(data=df3.reset_index(), x='year', y='Total in Practice')
The second:
sns.barplot(data=df3, x=df3.index, y="Total in Practice")
The third: when pivoting, add reset_index() and sum only the specified columns:
data_pivot = data.pivot(index='year', columns='sector', values='count').reset_index()
df3 = data_pivot.drop('Not in Active Practice', axis=1)
df3['Total in Practice'] = df3[['Public Sector','Private Sector']].sum(axis=1)
Then you can create the bar plot with your code:
ax = sns.barplot(data=df3, x='year', y="Total in Practice")
ax.bar_label(ax.containers[0])
You get a bar chart with each total labeled above its bar.
If you're going to plot a pivoted (wide form) dataframe, then plot directly with pandas.DataFrame.plot, which works with 'year' as the index. Leave the data in long form (as specified in the data parameter documentation) when using seaborn. Both pandas and seaborn use matplotlib.
seaborn doesn't recognize 'year' because it's in the dataframe index; it's not a column, which the API requires.
It's not necessary to calculate a total column because this can be added to the top of stacked bars with matplotlib.pyplot.bar_label.
See this answer for a thorough explanation of using .bar_label.
Manage the DataFrame
# select the data to not include 'Not in Active Practice'
df = df[df.sector.ne('Not in Active Practice')]
Plot long df with seaborn
As shown in this answer, seaborn.histplot, or seaborn.displot with kind='hist', can be used to plot a stacked bar.
import matplotlib.pyplot as plt
import seaborn as sns

# plot the data in long form
fig, ax = plt.subplots(figsize=(9, 7))
sns.histplot(data=df, x='year', weights='count', hue='sector', multiple='stack', bins=8, discrete=True, ax=ax)
# iterate through the axes containers to add bar labels
for c in ax.containers:
    # add the section label to the middle of each bar
    ax.bar_label(c, label_type='center')
# add the label for the total bar length by adding only the last container to the top of the bar
_ = ax.bar_label(ax.containers[-1])
Plot pivoted (wide) df with pandas.DataFrame.plot
# pivot the dataframe
dfp = df.pivot(index='year', columns='sector', values='count')
# plot the dataframe
ax = dfp.plot(kind='bar', stacked=True, rot=0, figsize=(9, 7))
# add labels
for c in ax.containers:
    ax.bar_label(c, label_type='center')
_ = ax.bar_label(ax.containers[-1])

How do I groupby, count or sum and then plot two lines in Pandas?

Say I have the following dataframes:
Earthquakes:
latitude longitude place year
0 36.087000 -106.168000 New Mexico 1973
1 33.917000 -90.775000 Mississippi 1973
2 37.160000 -104.594000 Colorado 1973
3 37.148000 -104.571000 Colorado 1973
4 36.500000 -100.693000 Oklahoma 1974
… … … … …
13941 36.373500 -96.818700 Oklahoma 2016
13942 36.412200 -96.882400 Oklahoma 2016
13943 37.277167 -98.072667 Kansas 2016
13944 36.939300 -97.896000 Oklahoma 2016
13945 36.940500 -97.906300 Oklahoma 2016
and Wells:
LAT LONG BBLS Year
0 36.900324 -98.218260 300.0 1977
1 36.896636 -98.177720 1000.0 2002
2 36.806113 -98.325840 1000.0 1988
3 36.888589 -98.318530 1000.0 1985
4 36.892128 -98.194620 2400.0 2002
… … … … …
11117 36.263285 -99.557631 1000.0 2007
11118 36.263220 -99.548647 1000.0 2007
11119 36.520160 -99.334183 19999.0 2016
11120 36.276728 -99.298563 19999.0 2016
11121 36.436857 -99.137391 60000.0 2012
How do I make a line graph showing the sum of BBLS per year (from Wells) and the number of earthquakes per year (from Earthquakes), where the x-axis shows the years since 1980, the y1-axis shows the sum of BBLS per year, and the y2-axis shows the number of earthquakes?
I believe I need a groupby with count (for earthquakes) and sum (for BBLS) to build the plot, but I have tried many approaches and just can't get it to work.
The only one that kinda worked was the line graph for earthquakes as follows:
Earthquakes.pivot_table(index=['year'],columns='type',aggfunc='size').plot(kind='line')
Still, nothing has worked for the BBLS line graph:
Wells.pivot_table(index=['Year'],columns='BBLS',aggfunc='count').plot(kind='line')
This didn't work either:
plt.plot(Wells['Year'].values, Wells['BBL'].values, label='Barrels Produced')
plt.legend() # Plot legends (the two labels)
plt.xlabel('Year') # Set x-axis text
plt.ylabel('Earthquakes') # Set y-axis text
plt.show() # Display plot
Nor did this one from another thread:
fig, ax = plt.subplots(figsize=(10,8))
Earthquakes.plot(ax = ax, marker='v')
ax.title.set_text('Earthquakes and Injection Wells')
ax.set_ylabel('Earthquakes')
ax.set_xlabel('Year')
ax.set_xticks(Earthquakes['year'])
ax2=ax.twinx()
ax2.plot(Wells.Year, Wells.BBL, color='c',
         linewidth=2.0, label='Number of Barrels', marker='o')
ax2.set_ylabel('Annual Number of Barrels')
lines_1, labels_1 = ax.get_legend_handles_labels()
lines_2, labels_2 = ax2.get_legend_handles_labels()
lines = lines_1 + lines_2
labels = labels_1 + labels_2
ax.legend(lines, labels, loc='upper center')
Input data:
>>> df2 # Earthquakes
year
0 2007
1 1974
2 1979
3 1992
4 2006
.. ...
495 2002
496 2011
497 1971
498 1977
499 1985
[500 rows x 1 columns]
>>> df1 # Wells
BBLS year
0 16655 1997
1 7740 1998
2 37277 2000
3 20195 2014
4 11882 2018
.. ... ...
495 30832 1981
496 24770 2018
497 14949 1980
498 24743 1975
499 46933 2019
[500 rows x 2 columns]
Prepare data to plot:
data1 = df2.value_counts("year").sort_index().rename("Earthquakes")
data2 = df1.groupby("year")["BBLS"].sum()
Simple plot:
ax1 = data1.plot(legend=data1.name, color="blue")
ax2 = data2.plot(legend=data2.name, color="red", ax=ax1.twinx())
Now you can do whatever you want with the two axes.
A more controlled chart
# Figure and axis
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
# Data
line1, = ax1.plot(data1.index, data1.values, label="Earthquakes", color="b")
line2, = ax2.plot(data2.index, data2.values / 10**6, label="Barrels", color="r")
# Legend
lines = [line1, line2]
ax1.legend(lines, [line.get_label() for line in lines])
# Titles
ax1.set_title("")
ax1.set_xlabel("Year")
ax1.set_ylabel("Earthquakes")
ax2.set_ylabel("Barrels Produced (MMbbl)")
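The question asks for years since 1980; a minimal sketch of restricting both series before plotting, assuming df1 (Wells) and df2 (Earthquakes) as above:
# keep only years from 1980 onwards before aggregating
data1 = df2[df2["year"] >= 1980].value_counts("year").sort_index().rename("Earthquakes")
data2 = df1[df1["year"] >= 1980].groupby("year")["BBLS"].sum()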

How to find coordinates from a list of addresses in a dataframe

I am trying to create two columns in my dataframe, Longitude and Latitude, which I want to derive from my address column, 'Details'.
I have tried:
from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter
locator = Nominatim(user_agent="MyGeocoder")
results['location']=results['Details'].apply
results['point']=results['location'].apply(lambda loc:tuple(loc['point']) if loc else None)
results[['latitude', 'longitude',]]=pd.DataFrame(results['point'].tolist(), index=results.index)
But this gives the error "method object is not subscriptable"
I want to create a loop to get all coordinates for each address
Details Sale Price Post Code Year Sold
1 53 Eastbury Grove, London, W4 2JT Flat, Lease... 450000.0 W4 2020
2 Flat 148 Wedgwood House Lambeth Walk, London, ... 325000.0 E11 2020
3 63 Russell Road, Wimbledon, London, SW19 1QN ... 800000.0 W19 2020
4 Flat 2 9 Queens Gate Place, London, SW7 5NX F... 400000.0 W7 2020
5 83 Chingford Mount Road, London, E4 8LU Freeh... 182000.0 E4 2020
... ... ... ... ...
47 702 Rutherford Heights Rodney Road, London, SE... 554750.0 E17 2015
48 Flat 48 Highlands Court Highland Road, London,... 340000.0 E19 2015
49 5 Mount Nod Road, London, SW16 2LQ Flat, Leas... 395000.0 W16 2015
50 6 Woodmill Street, London, SE16 3GG Terraced,... 1010000.0 E16 2015
51 402 Rutherford Heights Rodney Road, London, SE... 403200.0 E17 2015
300 rows × 4 columns
Try this:
import pandas as pd
import geopandas
import geopy
from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter

locator = Nominatim(user_agent="myGeocoder")
# wrap the geocoder in a RateLimiter so requests respect Nominatim's usage policy
geocode = RateLimiter(locator.geocode, min_delay_seconds=1)

def lat_long(row):
    loc = geocode(row["Details"])
    row["latitude"] = loc.latitude
    row["longitude"] = loc.longitude
    return row

results = results.apply(lat_long, axis=1)
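A minimal usage sketch with a tiny, hypothetical dataframe in place of the question's results; note that Nominatim can return None for addresses it cannot resolve, so a real run should guard against that.
# hypothetical sample addresses; real data would come from the question's 'Details' column
sample = pd.DataFrame({"Details": ["10 Downing Street, London, SW1A 2AA",
                                   "Buckingham Palace, London, SW1A 1AA"]})
sample = sample.apply(lat_long, axis=1)
print(sample[["Details", "latitude", "longitude"]])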

How to resolve 'list index out of range' when scraping a website?

from bs4 import BeautifulSoup
import pandas as pd
with open("COVID-19 pandemic in the United States - Wikipedia.htm", "r", encoding="utf-8") as fd:
    soup = BeautifulSoup(fd)
print(soup.prettify())
all_tables = soup.find_all("table")
print("The total number of tables are {} ".format(len(all_tables)))
data_table = soup.find("div", {"class": 'mw-stack stack-container stack-clear-right mobile-float-reset'})
print(type(data_table))
sources = data_table.tbody.findAll('tr', recursive=False)[0]
sources_list = [td for td in sources.findAll('td')]
print(len(sources_list))
data = data_table.tbody.findAll('tr', recursive=False)[1].findAll('td', recursive=False)
data_tables = []
for td in data:
    data_tables.append(td.findAll('table'))
header1 = [th.getText().strip() for th in data_tables[0][0].findAll('thead')[0].findAll('th')]
header1
For some reason, the last line (header1) gives me a "list index out of range" error. I am not sure what is causing it, but I know I need this line. Here is a link to the website I am using for the data: https://en.wikipedia.org/wiki/COVID-19_pandemic_in_the_United_States. The specific table I want is the one below the horizontal bar chart.
Traceback
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-47-67ef2aac7bf3> in <module>
28 data_tables.append(td.findAll('table'))
29
---> 30 header1 = [th.getText().strip() for th in data_tables[0][0].findAll('thead')[0].findAll('th')]
31
32 header1
IndexError: list index out of range
Use pandas.read_html
Read HTML tables into a list of DataFrame objects.
This answer side-steps the question to provide a more efficient method for extracting tables from Wikipedia and gives the OP the desired end result.
The following code will more easily get the desired table from the Wikipedia page.
.read_html will return a list of dataframes.
The table you're interested in is at index 4.
Clean the table
Select the rows and columns with valid data.
This method does return the table headers, but the column names are multi-level so we'll rename them.
Before renaming the columns, if you need the original data from the column names, use us_covid_data.columns which will return a list of tuples with all the column name values.
import pandas as pd
# get list of dataframes and select index 4
us_covid_data = pd.read_html('https://en.wikipedia.org/wiki/COVID-19_pandemic_in_the_United_States')[4]
# select rows and columns
us_covid_data = us_covid_data.iloc[0:56, 1:6]
# rename columns
us_covid_data.columns = ['state_territory', 'cases', 'deaths', 'recovered', 'hospitalized']
# display(us_covid_data)
state_territory cases deaths recovered hospitalized
0 Alabama 45785 1033 22082 2961
1 Alaska 1184 17 560 78
2 American Samoa 0 0 – –
3 Arizona 116892 2082 – 5272
4 Arkansas 24253 292 17834 1604
5 California 296499 6711 – –
6 Colorado 34316 1704 – 5527
7 Connecticut 46976 4338 – –
8 Delaware 12293 512 6778 –
9 District of Columbia 10569 561 1465 –
10 Florida 244151 4102 – 15150
11 Georgia 111211 2965 – 11919
12 Guam 1272 6 179 –
13 Hawaii 1012 19 746 116
14 Idaho 8222 94 2886 350
15 Illinois 151767 7144 – –
16 Indiana 49560 2698 36788 7139
17 Iowa 31906 725 24242 –
18 Kansas 17618 282 – 1269
19 Kentucky 17526 623 4785 2662
20 Louisiana 66435 3296 43026 –
21 Maine 3440 110 2787 354
22 Maryland 70497 3246 – 10939
23 Massachusetts 111110 8296 88725 10985
24 Michigan 73403 6225 52841 –
25 Minnesota 38606 1511 33907 4112
26 Mississippi 31257 1114 22167 2881
27 Missouri 24985 1077 – –
28 Montana 1249 23 678 89
29 Nebraska 20053 286 14641 1224
30 Nevada 22930 537 – –
31 New Hampshire 5914 382 4684 558
32 New Jersey 174628 15479 31014 –
33 New Mexico 14549 539 6181 2161
34 New York 400299 32307 71371 –
35 North Carolina 81331 1479 55318 –
36 North Dakota 3858 89 3350 218
37 Northern Mariana Islands 31 2 19 –
38 Ohio 57956 2927 – 7292
39 Oklahoma 16362 399 12432 1676
40 Oregon 10402 218 2846 1069
41 Pennsylvania 93876 6880 – –
42 Puerto Rico 8714 157 – –
43 Rhode Island 16991 960 – 1922
44 South Carolina 47214 838 – –
45 South Dakota 7105 97 6062 689
46 Tennessee 51509 646 31020 2860
47 Texas 240111 3013 122996 9610
48 Virgin Islands 112 6 79 –
49 Utah 25563 190 14448 1565
50 Vermont 1251 56 1022 –
51 Virginia 66740 1881 – 9549
52 Washington 38517 1370 – 4463
53 West Virginia 3461 95 2518 –
54 Wisconsin 35318 805 25542 3574
55 Wyoming 1675 20 1172 253
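If the original multi-level column names are wanted instead of the manual rename above, they can be flattened; a minimal sketch (hypothetical, applied in place of the rename):
# join the levels of each column tuple into a single string
us_covid_data.columns = ['_'.join(str(level) for level in col).strip('_') for col in us_covid_data.columns]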
Addressing the original issue:
data is an empty list generated from data_table.tbody.findAll('tr', recursive=False)[1].findAll('td', recursive=False)
With data = data_table.tbody.findAll('tr', recursive=False)[1] and then data = [v for v in data.get_text().split('\n') if v], you will get the headers.
The output of data will be ['U.S. state or territory[i]', 'Cases[ii]', 'Deaths', 'Recov.[iii]', 'Hosp.[iv]', 'Ref.']
Since data_tables is generated from iterating through data, it is also empty.
header1 is generated by indexing into data_tables[0], so an IndexError occurs because data_tables is empty.
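A minimal sketch of the fix described above, assuming the same data_table object from the question's code:
# pull the header row directly and split its text into clean header strings
headers_row = data_table.tbody.findAll('tr', recursive=False)[1]
header1 = [v for v in headers_row.get_text().split('\n') if v]
print(header1)  # ['U.S. state or territory[i]', 'Cases[ii]', 'Deaths', 'Recov.[iii]', 'Hosp.[iv]', 'Ref.']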

No tables found error when making an AJAX request

I am trying to scrape the results table from the following url: https://utmbmontblanc.com/en/page/107/results.html
However when I run my code it says 'No Tables Found'
import pandas as pd
url = 'https://utmbmontblanc.com/en/page/107/results.html'
data = pd.read_html(url, header = 0)
data.head()
ValueError: No tables found
Having used developer tools, I know that there is definitely a table in the HTML code. Why is it not being found? Any help is greatly appreciated. Thanks in advance.
Build the URL for the Ajax request; for 2017 - CCC it looks like this:
url = 'https://.......com/result.php?mode=edPass&ajax=true&annee=2017&course=ccc'
data = pd.read_html(url, header = 0)
print(data[0])
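To pull several years or courses, vary the annee and course query parameters; a minimal sketch (the endpoint base is left elided, as above):
# loop over years by changing the annee parameter
base = 'https://.......com/result.php'
for year in (2016, 2017, 2018):
    url = f'{base}?mode=edPass&ajax=true&annee={year}&course=ccc'
    tables = pd.read_html(url, header=0)
    print(year, tables[0].shape)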
You can also use selenium if you are unable to find any other hacks.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from time import sleep
from bs4 import BeautifulSoup as BSoup
import pandas as pd
url = "https://utmbmontblanc.com/en/page/107/results.html"
driver = webdriver.Chrome("/home/bitto/chromedriver")#change this to your chromedriver path
year = 2017
driver.get(url)
element = WebDriverWait(driver, 10).until(
    # change the index of div[@class='bloc'] to change year - [1] for 2018, [2] for 2017 etc
    # change the index of div[@class='row'] - [1], [2] for TDS etc
    # change @value of the option to match your preferred option's value - you can find this with the inspect tool - the first two are Scratch and ScratchH
    EC.presence_of_element_located((By.XPATH, "//div[@class='bloc'][2]/div[@class='row'][4]/span[@class='selectbutton']/select[@name='cat'][1]/option[@value='Scratch']"))
)
element.click()  # select option
# make the same changes you made above here as well
driver.find_element_by_xpath("//div[@class='bloc'][2]/div[@class='row'][4]/span[@class='selectbutton']/input").click()  # click go
sleep(10)  # not preferred, but will do for now
table=pd.read_html(driver.page_source)
print(table)
Output
[ GeneralRanking Family name First name Club Cat. ... Time Difference/ 1st Nationality
0 1 3001 - HAWKS Hayden HOKA ONE ONE SEH ... 10:24:30 00:00:00 United States
1 2 3018 - ŚWIERC Marcin SALOMON SUUNTO TEAM POLAND SEH ... 10:42:49 00:18:19 Poland
2 3 3005 - POMMERET Ludovic TEAM HOKA V1H ... 10:50:47 00:26:17 France
3 4 3214 - EVANS Thomas COMPRESS SPORT SEH ... 10:57:44 00:33:14 United Kingdom
4 5 3002 - OWENS Tom SALOMON SEH ... 11:03:48 00:39:18 United Kingdom
5 6 3011 - JONSSON Thorbergur 66 NORTH SEH ... 11:14:22 00:49:52 Iceland
6 7 3026 - BOUVIER-GAZ Nicolas TEAM NEW BALANCE SEH ... 11:18:33 00:54:03 France
7 8 3081 - JONES Michael WWW.APEXRUNNING.CO SEH ... 11:31:50 01:07:20 United Kingdom
8 9 3020 - COLLET Aurélien HOKA ONE ONE SEH ... 11:33:10 01:08:40 France
9 10 3009 - MARAVILLA Jorge HOKA ONE ONE V1H ... 11:36:14 01:11:44 United States
10 11 3036 - PERRILLAT Christophe SEH ... 11:40:05 01:15:35 France
11 12 3070 - FRAGUELA BREIJO Alejandro STUDIO54 V1H ... 11:40:11 01:15:41 Spain
12 13 3092 - AIGROZ Mike TRUST SEH ... 11:41:53 01:17:23 Switzerland
13 14 3021 - O'LEARY Paddy THE NORTH FACE SEH ... 11:47:04 01:22:34 Ireland
14 15 3065 - PÉREZ TORREGLOSA Juan CLUB ULTRATRAIL ... SEH ... 11:47:51 01:23:21 Spain
15 16 3031 - SÁNCHEZ CEBRIÁN Miguel Ángel LURBEL-LI... V1H ... 11:49:15 01:24:45 Spain
16 17 3062 - ANDREWS Justin SEH ... 11:49:47 01:25:17 United States
17 18 3039 - PIANA Giulio TEAM MUD AND SNOW SEH ... 11:50:23 01:25:53 Italy
18 19 3047 - RONIMOISS Andris Inov8 / OSveikals.lv ... SEH ... 11:52:25 01:27:55 Latvia
19 20 3052 - DURAND Regis TEAM TRAIL ISOSTAR V1H ... 11:56:40 01:32:10 France
20 21 3027 - SANDES Ryan SALOMON SEH ... 12:04:39 01:40:09 South Africa
21 22 3014 - EL MORABITY Rachid ULTRA TRAIL ATLAS T... SEH ... 12:10:01 01:45:31 Morocco
22 23 3067 - JONES Harry RUNIVORE SEH ... 12:10:12 01:45:42 United Kingdom
23 24 3030 - CLAVERY Erik - SEH ... 12:12:56 01:48:26 France
24 25 3056 - JIMENEZ LLORENS Juan Maria GREEN POWER... SEH ... 12:13:18 01:48:48 Spain
25 26 3024 - GALLAGHER Clare THE NORTH FACE SEF ... 12:13:57 01:49:27 United States
26 27 3136 - ASSEL Garry LICENCE INDIVIDUELLE LUXEM... SEH ... 12:20:46 01:56:16 Luxembourg
27 28 3071 - RIGODANZA Francesco SPIRITO TRAIL TEAM SEH ... 12:22:49 01:58:19 Italy
28 29 3118 - POLASZEK Christophe CHARTRES VERTICAL V1H ... 12:24:49 02:00:19 France
29 30 3125 - CALERO RODRIGUEZ David Altmann Sports/... SEH ... 12:25:07 02:00:37 Spain
... ... ... ... ... ... ... ...
1712 1713 5734 - GOT Hang Fai V2H ... 26:25:01 16:00:31 Hong Kong, China
1713 1714 4154 - RAMOS Liliana NIKE RUNNING CLUB V3F ... 26:26:22 16:01:52 Argentina
1714 1715 5448 - BECKRICH Xavier PHOENIX57 V1H ... 26:26:45 16:02:15 France
1715 1716 5213 - BARBERIO ARNOULT Isabelle PHOENIX57 V1F ... 26:26:49 16:02:19 France
1716 1717 4704 - ZHANG Zheng XIAOMABENTENG SEH ... 26:28:37 16:04:07 China
1717 1718 5282 - GUISOLAN Frédéric SEH ... 26:28:46 16:04:16 Switzerland
1718 1719 5306 - MEDINA Rafael V1H ... 26:29:26 16:04:56 Mexico
1719 1720 5379 - PENTCHEFF Nicolas SEH ... 26:33:05 16:08:35 France
1720 1721 4665 - GONZALEZ SUANCES Israel BAR ES PUIG V1H ... 26:33:58 16:09:28 Spain
1721 1722 4389 - TONANNY Marie SEF ... 26:34:51 16:10:21 France
1722 1723 5616 - GLORIAN Thierry V2H ... 26:35:47 16:11:17 France
1723 1724 5684 - CHEUNG Ho FAITHWALKERS V1H ... 26:37:09 16:12:39 Hong Kong, China
1724 1725 5719 - GANDER Pascal JEFF B TRAIL SEH ... 26:39:04 16:14:34 France
1725 1726 4555 - JURGIELEWICZ Urszula SEF ... 26:39:44 16:15:14 Poland
1726 1727 4722 - HIDALGO José Miguel C.D. ATLETISMO SAN... V1H ... 26:40:27 16:15:57 Spain
1727 1728 4425 - JITTIWUTIKARN Gif V1F ... 26:41:02 16:16:32 Thailand
1728 1729 4556 - ZHU Jing SEF ... 26:41:12 16:16:42 China
1729 1730 4314 - HU Dongli V1H ... 26:41:27 16:16:57 China
1730 1731 4239 - DURET Estelle OXYGENE BELBEUF V1F ... 26:41:51 16:17:21 France
1731 1732 4525 - MAGLIERI Fabrice ATHLETIC CLUB PAYS DE... V1H ... 26:42:11 16:17:41 France
1732 1733 4433 - ANDERSEN Laura Jentsch RUN DEM CREW SEF ... 26:42:27 16:17:57 Denmark
1733 1734 4563 - CHEUNG Annie On Nai FAITHWALKERS V1F ... 26:45:35 16:21:05 Hong Kong, China
1734 1735 4355 - KHALED Naïm GENEVE AEROPORT SEH ... 26:47:50 16:23:20 Algeria
1735 1736 4749 - STELLA Sara COURMAYEUR TRAILERS V1F ... 26:48:07 16:23:37 Italy
1736 1737 4063 - LALIMAN Leslie SEF ... 26:48:09 16:23:39 France
1737 1738 5702 - BURKE Tony Alchester/CTR/Bicester Tri V2H ... 26:50:52 16:26:22 Ireland
1738 1739 5146 - OLIVEIRA Sandra BUDEGUITA RUNNERS V1F ... 26:52:23 16:27:53 Portugal
1739 1740 5545 - VELLANDI Emilio TEAM PEGGIORI SCARPA MICO V1H ... 26:55:32 16:31:02 Italy
1740 1741 5543 - GASPAROVIC Bernard STADE FRANCAIS V3H ... 26:56:31 16:32:01 France
1741 1742 4760 - MENDONCA Carine ASPTT COMPIEGNE V2F ... 27:19:15 16:54:45 Belgium
[1742 rows x 7 columns]]
