Getting "StaleElementReferenceException" when accessing options in drop down menu - python

In Python and Selenium, I'm populating a form, submitting it, then scraping the resulting multi-page table that appears on the page underneath the form. After I scrape every page of this table, I reset the form and attempt to repopulate the form. However, a drop down menu is tripping up the code.
I've tried to make the driver wait for the drop down menu to reappear after I reset the form, but this doesn't help. I still get the StaleElementReferenceException error on the if option.text == state line:
StaleElementReferenceException: Message: The element reference of <option> is stale; either the element is no longer attached to the DOM, it is not in the current frame context, or the document has been refreshed
How do I submit the form over and over for different options within the drop down menu?
from selenium import webdriver
from selenium.common.exceptions import StaleElementReferenceException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select, WebDriverWait
states = ['Alabama', 'Alaska', 'Arizona', 'Arkansas', 'California',
          'Colorado', 'Connecticut', 'Delaware', 'District of Columbia',
          'Florida', 'Georgia', 'Hawaii', 'Idaho', 'Illinois', 'Indiana',
          'Iowa', 'Kansas', 'Kentucky', 'Louisiana', 'Maine', 'Maryland',
          'Massachusetts', 'Michigan', 'Minnesota', 'Mississippi', 'Missouri',
          'Montana', 'Nebraska', 'Nevada', 'New Hampshire', 'New Jersey',
          'New Mexico', 'New York', 'North Carolina', 'North Dakota', 'Ohio',
          'Oklahoma', 'Oregon', 'Pennsylvania', 'Rhode Island', 'South Carolina',
          'South Dakota', 'Tennessee', 'Texas', 'Utah', 'Vermont', 'Virginia',
          'Washington', 'West Virginia', 'Wisconsin', 'Wyoming']
# Construct browser and link
browser = webdriver.Firefox(executable_path='/usr/local/bin/geckodriver')
url = 'https://myaccount.rid.org/Public/Search/Member.aspx'
ignored_exceptions = (StaleElementReferenceException,)
# Navigate to link
browser.get(url)
try:
    # For each state
    for state in states:
        print('Searching ' + state)
        # Set category and select state menu
        category = Select(browser.find_element_by_name('ctl00$FormContentPlaceHolder$Panel$categoryDropDownList'))
        category.select_by_value('a027b6c0-07bb-4301-b9b5-1b38dcdc59b6')
        state_menu = Select(WebDriverWait(browser, 10, ignored_exceptions=ignored_exceptions).until(EC.presence_of_element_located((By.ID, 'FormContentPlaceHolder_Panel_stateDropDownList'))))
        options = state_menu.options
        for option in options:
            if option.text == state:
                state_menu.select_by_value(option.get_attribute('value'))
                browser.find_element_by_name('ctl00$FormContentPlaceHolder$Panel$searchButtonStrip$searchButton').click()
                # Scrape the first page of results
                results = []
                curr_page = 1
                onFirstPage = True
                scrape_page(curr_page)
                # Reset form
                browser.find_element_by_name('ctl00$FormContentPlaceHolder$Panel$searchButtonStrip$resetButton').click()
                break
finally:
    pass

The moment you select an option, the page updates and the old element references become invalid, so you can't keep using them. You get the exception because you are still reading a property of an option reference that is no longer valid.
Rather than iterating over the options, I would select the option directly with an XPath, as shown below:
state_menu = WebDriverWait(browser, 10, ignored_exceptions=ignored_exceptions).until(EC.presence_of_element_located((By.ID, 'FormContentPlaceHolder_Panel_stateDropDownList')))
#options = state_menu.options <== replace this line with below line
option = state_menu.find_element_by_xpath(".//option[.='" + state + "']")  # leading '.' searches only inside state_menu
#for option in options: <== remove this line
# if option.text == state: <== remove this
option.click()
browser.find_element_by_name('ctl00$FormContentPlaceHolder$Panel$searchButtonStrip$searchButton').click()
# Scrape the first page of results
results = []
curr_page = 1
onFirstPage = True
scrape_page(curr_page)
# Reset form
browser.find_element_by_name('ctl00$FormContentPlaceHolder$Panel$searchButtonStrip$resetButton').click()
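A related, more general pattern: because any stored element reference can go stale once the page re-renders, re-locate the element inside the action you retry instead of reusing an old reference. Note that the ignored_exceptions passed to WebDriverWait only applies while the wait is polling its condition; it does not protect later calls such as option.text. Below is a minimal sketch of such a helper. retry_on is a hypothetical name (it is not part of Selenium), and the attempt count and delay are arbitrary choices.

```python
import time


def retry_on(exceptions, action, attempts=3, delay=0.5):
    """Run action(); if it raises one of `exceptions`, wait and run it again.

    `action` should look the element up itself (e.g. with find_element) each
    time it is called, so every retry works with a fresh reference rather
    than the stale one. Re-raises the last exception once attempts run out.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except exceptions:
            if attempt == attempts:
                raise
            time.sleep(delay)
```

With Selenium you might call it as retry_on((StaleElementReferenceException,), lambda: Select(browser.find_element(By.ID, 'FormContentPlaceHolder_Panel_stateDropDownList')).select_by_visible_text(state)), using the Selenium 4 find_element(By.ID, ...) form. Because the lambda calls find_element on every attempt, each retry gets a freshly located drop-down.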

Related

What is the easiest way to check if a city name belongs to a given country?

I have two lists of city and country names, and I would like to check which city belongs to which country. What is the easiest way to achieve that in Python?
Please note that so far I have used GeoText to extract city and country names from a text, but it doesn't tell me which city belongs to which country.
The problem can't be solved manually because the lists are long.
E.g.:
country_list = ['china', 'india', 'canada', 'america', ...]
city_list = ['Mocoa', 'March', 'San Miguel', 'Neiva', 'Naranjito', 'San Fernando',
'Alliance', 'Progreso', 'NewYork', 'Toronto', ...]
You can try this code:
import re
import requests
city_list = ['Jerusalem', 'Tel-Aviv', 'New York', 'London', 'Madrid', 'Alliance',
             'Mocoa', 'March', 'San Miguel', 'Neiva', 'Naranjito', 'San Fernando',
             'Alliance', 'Progreso', 'NewYork', 'Toronto']
city_country_dict = {}
country_city_dict = {}
for city in city_list:
    response = requests.get(f"https://www.geonames.org/search.html?q={city}&country=")
    # Take the first match for a country link and keep its last path segment.
    link = re.findall(r"/countries.*\.html", response.text)[0]
    # Drop the ".html" suffix by slicing; str.strip(".html") would strip any of
    # the characters . h t m l from both ends ("israel" -> "israe").
    country = link.split("/")[-1][:-len(".html")]
    country_city_dict.setdefault(country, []).append(city)
    city_country_dict[city] = country
This code sends a request to GeoNames for each city name and then takes the first link to a country page. You could parse the page with BeautifulSoup instead of a regex to make it more robust.
If you run this code on a large list, note that it takes a while, because it waits for a response from GeoNames for every city.
Example output:
city_country_dict = {'Jerusalem': 'israel', 'Tel-Aviv': 'israel', 'New York': 'united-states', 'London': 'united-kingdom', 'Madrid': 'spain', 'Alliance': 'united-states', 'Mocoa': 'colombia', 'March': 'switzerland', 'San Miguel': 'el-salvador', 'Neiva': 'colombia', 'Naranjito': 'puerto-rico', 'San Fernando': 'trinidad-and-tobago', 'Progreso': 'honduras', 'NewYork': 'united-kingdom', 'Toronto': 'canada'}
country_city_dict = {'israel': ['Jerusalem', 'Tel-Aviv'], 'united-states': ['New York', 'Alliance', 'Alliance'], 'united-kingdom': ['London', 'NewYork'], 'spain': ['Madrid'], 'colombia': ['Mocoa', 'Neiva'], 'switzerland': ['March'], 'el-salvador': ['San Miguel'], 'puerto-rico': ['Naranjito'], 'trinidad-and-tobago': ['San Fernando'], 'honduras': ['Progreso'], 'canada': ['Toronto']}
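Two steps of the answer above can be pulled out into small helpers that are testable offline: extracting the country name from a link, and inverting a {city: country} mapping into {country: [cities]}. This is a sketch; the link format "/countries/IL/israel.html" is an assumption used for illustration, and country_slug and group_by_country are hypothetical helper names.

```python
from collections import defaultdict


def country_slug(link):
    """Return the last path segment of a country link, minus its '.html' suffix."""
    last = link.rsplit('/', 1)[-1]
    # str.strip('.html') removes any of the characters '.', 'h', 't', 'm', 'l'
    # from BOTH ends (so 'israel.html' -> 'israe'); remove the exact suffix instead.
    return last[:-len('.html')] if last.endswith('.html') else last


def group_by_country(city_country):
    """Invert {city: country} into {country: [cities]}."""
    country_city = defaultdict(list)
    for city, country in city_country.items():
        country_city[country].append(city)
    return dict(country_city)


link = '/countries/IL/israel.html'  # hypothetical link format
print(country_slug(link))                  # israel
print(link.strip('.html').split('/')[-1])  # israe
print(group_by_country({'Jerusalem': 'israel', 'Tel-Aviv': 'israel', 'Toronto': 'canada'}))
# {'israel': ['Jerusalem', 'Tel-Aviv'], 'canada': ['Toronto']}
```

Building the inverse from the dictionary lists a city that appears twice in city_list (like 'Alliance') only once, whereas the loop in the answer appends it once per occurrence.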
You can write a Python script that fetches the city info from one of the free APIs.
One option I recommend is https://tequila.kiwi.com, provided for free by Kiwi.com. You can query their Locations API with the 'term' parameter, which returns the full details of the highest-ranked matching city, ranked by search volume. One of the fields of the returned entry is the country.

How do I automatically update a dropdown selection widget when another selection widget is changed? (Python panel pyviz)

I have a Select widget that should show a different list of options whenever another Select widget changes. How do I do this in the example code below?
import panel as pn
_countries = {
    'Africa': ['Ghana', 'Togo', 'South Africa'],
    'Asia'  : ['China', 'Thailand', 'Japan'],
    'Europe': ['Austria', 'Bulgaria', 'Greece']
}
continent = pn.widgets.Select(
    value='Asia',
    options=['Africa', 'Asia', 'Europe']
)
country = pn.widgets.Select(
    value=_countries[continent.value][0],
    options=_countries[continent.value]
)
@pn.depends(continent.param.value)
def _update_countries(continent):
    countries = _countries[continent]
    country.options = countries
    country.value = countries[0]
pn.Row(continent, country)
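So, it took me forever to find this out, but in your @pn.depends() decorator you have to pass the argument watch=True. With it, the function is called every time the watched value changes, so the other list gets updated. Without it, the decorator only declares the dependency (which Panel uses when the decorated function itself is displayed), so nothing calls the function when you change the first Select.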
In this case:
@pn.depends(continent.param.value, watch=True)
Whole example:
import panel as pn
_countries = {
    'Africa': ['Ghana', 'Togo', 'South Africa'],
    'Asia'  : ['China', 'Thailand', 'Japan'],
    'Europe': ['Austria', 'Bulgaria', 'Greece']
}
continent = pn.widgets.Select(
    value='Asia',
    options=['Africa', 'Asia', 'Europe']
)
country = pn.widgets.Select(
    value=_countries[continent.value][0],
    options=_countries[continent.value]
)
@pn.depends(continent.param.value, watch=True)
def _update_countries(continent):
    countries = _countries[continent]
    country.options = countries
    country.value = countries[0]
pn.Row(continent, country)
The example of the GoogleMapViewer on this page pointed me in the right direction:
Selector updates after another selector is changed
The same solution, written as a class:
import panel as pn
import param
class GoogleMapViewer(param.Parameterized):
    continent = param.Selector(default='Asia', objects=['Africa', 'Asia', 'Europe'])
    country = param.Selector(default='China', objects=['China', 'Thailand', 'Japan'])
    _countries = {'Africa': ['Ghana', 'Togo', 'South Africa'],
                  'Asia'  : ['China', 'Thailand', 'Japan'],
                  'Europe': ['Austria', 'Bulgaria', 'Greece']}
    @param.depends('continent', watch=True)
    def _update_countries(self):
        countries = self._countries[self.continent]
        self.param['country'].objects = countries
        self.country = countries[0]
viewer = GoogleMapViewer(name='Google Map Viewer')
pn.Row(viewer.param)
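To make the effect of watch=True concrete: it registers the decorated function as a callback that runs as soon as the watched value changes. The sketch below imitates that observer pattern in plain Python. It is only an illustration, not how Panel or param are implemented, and the Observable class and its watch method are hypothetical names.

```python
class Observable:
    """Hypothetical stand-in for a widget's value parameter."""

    def __init__(self, value):
        self._value = value
        self._callbacks = []

    def watch(self, callback):
        # Roughly what watch=True asks for: call back on every change.
        self._callbacks.append(callback)

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new):
        self._value = new
        for callback in self._callbacks:
            callback(new)


COUNTRIES = {
    'Africa': ['Ghana', 'Togo', 'South Africa'],
    'Asia': ['China', 'Thailand', 'Japan'],
    'Europe': ['Austria', 'Bulgaria', 'Greece'],
}

continent = Observable('Asia')
country = {'options': COUNTRIES['Asia'], 'value': COUNTRIES['Asia'][0]}


def update_countries(new_continent):
    countries = COUNTRIES[new_continent]
    country['options'] = countries
    country['value'] = countries[0]


continent.watch(update_countries)
continent.value = 'Europe'
print(country)  # {'options': ['Austria', 'Bulgaria', 'Greece'], 'value': 'Austria'}
```

In Panel itself, the explicit counterpart of the decorator is continent.param.watch(callback, 'value'), where the callback receives an event object whose .new attribute holds the new value.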

Python is mixing two lists and I have no idea why

Here is the code:
import requests
import bs4
response = requests.get('http://discoverygc.com/forums/serverinterface.php?action=players_online') #Loads page
soup = bs4.BeautifulSoup(response.text)
table = soup.find("div", {"id": "forum"})
rowsNo = (str(table).count('<tr>') - 2) #Number of players online. Minus 2 to remove leading title and column description rows
players = systems = [] #Define lists
for i in range(3, (rowsNo + 3)):
    rows = table.findAll('tr')[i]
    cols = rows.findAll('td')
    player = cols[0].get_text()
    system = cols[1].get_text()
    players.append(player)
    systems.append(system)
print(players)
print(systems)
If I remove either players.append(player) or systems.append(system) the code works fine and outputs the correct list:
['-Vasqez-', '[-=XTF=-]Neon.Bunny-[R]', "[SV]-Valley'", '<-JohnyWalker->', '~VP)Bad.Tibira', 'Alkanius', 'Apex91', 'Araroba', 'Baldor', 'Benediction', 'Black_Bird', 'Boost', 'Caelius.Moya[X]', 'Core|APM-Maverick', 'Daftwagen', 'Dee.Leers', 'Emiko:Hayashi', 'Gamma-6', 'Gauri', 'Gigi.7', 'GMG|GTS-Komahashi-Maru', 'Grawmod', 'GrazySlon', 'Hunor', 'Jakob-Schleiter', 'Joyita', 'Judge_BigJo', 'JulyJalwa', 'Kruger|KMS-Lankow', 'Luxor', 'monitor91', 'Morgulis', 'Nuggets', 'OSI-Mendes', 'Ronny.Rochester', 'Samura|-Arata', 'Samura|-Ichikawa', 'Shpritzen', 'Stardrifter', 'The_Altair', 'The.Liner.of.Dreams', 'Tony.Sosa', 'Wilde.RNC-Nestor']
or:
['Omega-11', 'Omega-49', 'Pennsylvania', 'Magellan', 'Omicron Gamma', 'Kyushu', 'Pennsylvania', 'Kyushu', 'Omega-5', 'Manchester', 'Cassini', 'Newcastle', 'Connecticut', 'Omega-47', 'Stuttgart', 'Stuttgart', 'Munich', 'New York', 'Hudson', 'Sigma-13', 'Languedoc', 'Colorado', 'Virginia', 'Stuttgart', 'New London', 'Magellan', 'New York', 'New Tokyo', 'Manchester', 'New York', 'Pennsylvania', 'Omega-3', 'Omega-49', 'New Berlin', 'California', 'Nagano', 'New Berlin', 'Okinawa', 'Magellan', 'Texas', 'Ontario', 'New Berlin', 'Stuttgart']
However if I put both lines in it mixes the two together for both lists:
['-Vasqez-', 'Omega-11', "[SV]-Valley'", 'Omega-49', '<-JohnyWalker->', 'Pennsylvania', '=Z=Exositas', 'Magellan', '~VP)Death.Incarnator', 'Omicron Gamma', 'Alkanius', 'Shikoku', 'Apex91', 'Pennsylvania', 'Baldor', 'Kyushu', 'Benediction', 'Omega-5', 'Black_Bird', 'Manchester', 'Boost', 'Cassini', 'Caelius.Moya[X]', 'Connecticut', 'Core|APM-Maverick', 'Omega-47', 'Daftwagen', 'Stuttgart', 'Darf.Acour', 'Texas', 'Dee.Leers', 'New Berlin', 'Emiko:Hayashi', 'Munich', 'Gamma-6', 'New York', 'Gauri', 'Hudson', 'Gigi.7', 'Orkney', 'GMG|GTS-Komahashi-Maru', 'Colorado', 'Grawmod', 'Virginia', 'GrazySlon', 'Stuttgart', 'Hunor', 'Manchester', 'Jakob-Schleiter', 'New Berlin', 'Joyita', 'Magellan', 'Judge_BigJo', 'New York', 'Kruger|KMS-Lankow', 'New Tokyo', 'Luxor', 'Manchester', 'monitor91', 'New York', 'Morgulis', 'Pennsylvania', 'Nuggets', 'Omega-3', 'OSI-Mendes', 'Omega-49', 'Ronny.Rochester', 'California', 'Samura|-Arata', 'Nagano', 'Samura|-Ichikawa', 'New Berlin', 'Stardrifter', 'Okinawa', 'The_Altair', 'Magellan', 'Tony.Sosa', 'Ontario', 'Wilde.RNC-Nestor', 'Omega-7']
Why is this? I cannot see any reason why this should happen.
Don't do this, they will refer to the same list:
players = systems = [] #Define lists
But split them:
players = []
systems = [] #Define lists
Then you will have two separate lists.
Your style gives one list two names (aliases), which can be useful in some cases, but it does not create two different lists.
Where you have players = systems = [], change it to separate assignments:
players = []
systems = []
If you want to keep it in one line:
players, systems = [], []
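A quick way to see the difference is to check identity with the is operator, which tells you whether two names refer to the same object:

```python
# Chained assignment binds both names to ONE list object.
players = systems = []
players.append('-Vasqez-')
print(players is systems, systems)  # True ['-Vasqez-']

# Tuple unpacking (or two separate assignments) creates two independent lists.
players, systems = [], []
players.append('-Vasqez-')
print(players is systems, systems)  # False []
```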

Random quiz generator

I have been following an example program from a tutorial book. The program takes a dictionary of all 50 US states and their capitals and creates a random set of multiple-choice (A-D) questions. The questions are randomized and written out as 3 different quizzes in 3 different files, and the answers to each quiz are written to a matching answer-key file.
As a test, I'm only doing it with a range of 3 for now. When I run the program, the files are created, but only the 3rd quiz file has questions in it and only the 3rd answer file has answers. Quiz files 1 and 2 contain the header with the blank Name: and Date: fields but nothing else, and their answer files are empty.
I have been over this several times now and can't figure out what the problem is.
Any input would be appreciated, thanks.
import random
# The quiz data. Keys are states and values are their capitals.
capitals = {'Alabama': 'Montgomery', 'Alaska': 'Juneau', 'Arizona':'Phoenix',
'Arkansas': 'Little Rock', 'California': 'Sacramento', 'Colorado':'Denver',
'Connecticut': 'Hartford', 'Delaware': 'Dover', 'Florida': 'Tallahassee',
'Georgia': 'Atlanta', 'Hawaii': 'Honolulu', 'Idaho': 'Boise', 'Illinois':
'Springfield', 'Indiana': 'Indianapolis', 'Iowa': 'Des Moines', 'Kansas':
'Topeka', 'Kentucky': 'Frankfort', 'Louisiana': 'Baton Rouge', 'Maine':
'Augusta', 'Maryland': 'Annapolis', 'Massachusetts': 'Boston', 'Michigan':
'Lansing', 'Minnesota': 'Saint Paul', 'Mississippi': 'Jackson', 'Missouri':
'Jefferson City', 'Montana': 'Helena', 'Nebraska': 'Lincoln', 'Nevada':
'Carson City', 'New Hampshire': 'Concord', 'New Jersey': 'Trenton',
'New Mexico': 'Santa Fe', 'New York': 'Albany', 'North Carolina': 'Raleigh',
'North Dakota': 'Bismarck', 'Ohio': 'Columbus', 'Oklahoma': 'Oklahoma City',
'Oregon': 'Salem', 'Pennsylvania': 'Harrisburg', 'Rhode Island': 'Providence',
'South Carolina': 'Columbia', 'South Dakota': 'Pierre', 'Tennessee':
'Nashville', 'Texas': 'Austin', 'Utah': 'Salt Lake City', 'Vermont':
'Montpelier', 'Virginia': 'Richmond', 'Washington': 'Olympia',
'West Virginia': 'Charleston', 'Wisconsin': 'Madison', 'Wyoming': 'Cheyenne'}
# Generate quiz files
for quizNum in range(3):
    # Create the quiz and answer key files.
    quizFile = open('capitalsquiz%s.txt' % (quizNum + 1), 'w')
    answerKeyFile = open('capitalsquiz_answers%s.txt' % (quizNum + 1), 'w')
    # Write out the header for the quiz.
    quizFile.write('Name:\n\nDate:\n\nPeriod:\n\n')
    quizFile.write((' ' * 20) + 'State Capitals Quiz (Form %s)' % (quizNum + 1))
    quizFile.write('\n\n')
    # Shuffle the order of the states.
    states = list(capitals.keys())
    random.shuffle(states)
#Loop through all 50 states, making a question for each.
for questionNum in range(50):
    # Get right and wrong answers.
    correctAnswer = capitals[states[questionNum]]
    wrongAnswers = list(capitals.values())
    del wrongAnswers[wrongAnswers.index(correctAnswer)]
    wrongAnswers = random.sample(wrongAnswers, 3)
    answerOptions = wrongAnswers + [correctAnswer]
    random.shuffle(answerOptions)
    # Write the question and the answer options to the quiz file.
    quizFile.write('%s. What is the capital of %s?\n' % (questionNum + 1, states[questionNum]))
    for i in range(4):
        quizFile.write(' %s. %s\n' % ('ABCD'[i], answerOptions[i]))
    quizFile.write('\n')
    # Write the answer key to a file.
    answerKeyFile.write('%s. %s\n' % (questionNum + 1, 'ABCD'[answerOptions.index(correctAnswer)]))
quizFile.close()
answerKeyFile.close()
Think about the iteration of your loops and the setting of the variables.
Did you mean for the for questionNum in range(50): loop to be separate from, or nested inside, the for quizNum in range(3): loop? This may be an indentation issue in your Python file.
When your for questionNum in range(50): loop starts, the for quizNum in range(3): loop has already finished, so quizFile and answerKeyFile still refer to the last files opened in it. That is why only the last files get written to.
To solve:
Put your question-making loop inside your quiz-file loop (indentation is the key):
for quizNum in range(3):
    ...
    for questionNum in range(50):
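Here is a sketch of the corrected structure, written as a hypothetical write_quizzes function. The function name, the out_dir and seed parameters, and the use of with blocks instead of explicit close() calls are my additions, not part of the book's program; the actual fix is only that the question loop is nested inside the quiz loop, so every quiz file gets its own full set of questions.

```python
import os
import random
import tempfile


def write_quizzes(capitals, num_quizzes, out_dir, seed=None):
    """Write num_quizzes quiz files and matching answer-key files into out_dir."""
    rng = random.Random(seed)
    for quiz_num in range(1, num_quizzes + 1):
        quiz_path = os.path.join(out_dir, 'capitalsquiz%s.txt' % quiz_num)
        key_path = os.path.join(out_dir, 'capitalsquiz_answers%s.txt' % quiz_num)
        # Both files stay open only for this quiz; "with" closes them afterwards.
        with open(quiz_path, 'w') as quiz_file, open(key_path, 'w') as key_file:
            quiz_file.write('Name:\n\nDate:\n\nPeriod:\n\n')
            quiz_file.write(' ' * 20 + 'State Capitals Quiz (Form %s)\n\n' % quiz_num)
            states = list(capitals)
            rng.shuffle(states)
            # The question loop is indented under the quiz loop, so it runs once per quiz.
            for question_num, state in enumerate(states, start=1):
                correct = capitals[state]
                wrong = [c for c in capitals.values() if c != correct]
                options = rng.sample(wrong, 3) + [correct]
                rng.shuffle(options)
                quiz_file.write('%s. What is the capital of %s?\n' % (question_num, state))
                for letter, option in zip('ABCD', options):
                    quiz_file.write('    %s. %s\n' % (letter, option))
                quiz_file.write('\n')
                key_file.write('%s. %s\n' % (question_num, 'ABCD'[options.index(correct)]))


if __name__ == '__main__':
    # Hypothetical demo with a few states, written to a throwaway directory.
    sample = {'Ohio': 'Columbus', 'Texas': 'Austin', 'Utah': 'Salt Lake City',
              'Maine': 'Augusta'}
    with tempfile.TemporaryDirectory() as tmp:
        write_quizzes(sample, 3, tmp, seed=1)
        print(sorted(os.listdir(tmp)))
```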
It's because you first iterate over [0, 1, 2] and create all the files, and then, with quiz 3's files still open, you iterate over writing the actual questions/answers.
Put everything under one for quizNum in range(3) loop.
EDIT: I see that it's because of your indentation: you exit the first for loop too soon.
See https://gist.github.com/Noxeus/dcb3898f601ef76fbf8f
I think every time you call quizFile.write(), it overwrites the text previously written to your file, because you opened the file in 'w' mode. Try opening the file in append mode ('a' instead of 'w' in open()); it should work out.
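For reference, 'w' mode empties the file once, at the moment open() is called; successive write() calls on the same open file object add to what has already been written. This small check shows both behaviors, using a throwaway file in a temporary directory:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.txt')

    with open(path, 'w') as f:  # truncates once, here
        f.write('first\n')
        f.write('second\n')     # lands after 'first', nothing is overwritten
    with open(path) as f:
        after_writes = f.read()

    with open(path, 'w') as f:  # reopening in 'w' truncates again
        f.write('third\n')
    with open(path) as f:
        after_reopen = f.read()

print(repr(after_writes))  # 'first\nsecond\n'
print(repr(after_reopen))  # 'third\n'
```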

Webscraping with Xpath in Python

I don't fully understand how to work out the XPath needed to scrape a page. I'm trying to use XPath in Python to scrape the Wikipedia article on states and capitals to get a list of states and a list of capitals, but so far I've had no luck figuring out the correct path to use. I've tried inspecting the element and copying its XPath, but that hasn't worked either. I'm looking for someone to explain a method for working out the correct XPath to grab specific elements on a page.
from lxml import html
import requests
page = requests.get('https://en.wikipedia.org/wiki/List_of_capitals_in_the_United_States')
tree = html.fromstring(page.text)
#creating list of states
state = tree.xpath('xpath')
#list of capitals
capital = tree.xpath('xpath')
print('State: ', state)
print('Capital: ', capital)
Two of the xpaths I've tried so far have been:
//*[@id="mw-content-text"]/table[1]/tbody/tr[1]/td[1]/a
//*[@id="mw-content-text"]/table[1]/tbody/tr[1]/td[2]
Start with an expression that will get you the table. Here's one that works:
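>>> tree.xpath('//div[@id="mw-content-text"]/table[1]')
[<Element table at 0x7f9dd7322578>]
You want the first table in that div (hence the [1]), and there does not appear to be a tbody element there. Browser developer tools often show a tbody because the browser inserts one when it builds the page, so an XPath copied from the inspector can fail against the raw HTML that lxml parses.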
You could iterate over the rows in that table like this:
for row in tree.xpath('//div[@id="mw-content-text"]/table[1]/tr')[1:]:
Within that loop, the state name is:
row[0][0].text
That is the first child of the row (which is a <td> element), then the first child of that (which is an <a> element), and then the text content of that element.
And the capital is:
row[3][0].text
So:
>>> for row in tree.xpath('//div[@id="mw-content-text"]/table[1]/tr')[1:]:
...     st = row[0][0].text
...     cap = row[3][0].text
...     print('The capital of %s is %s' % (st, cap))
The capital of Alabama is Montgomery
The capital of Alaska is Juneau
The capital of Arizona is Phoenix
[...]
You can get all the state names like this:
>>> tree.xpath('//div[@id="mw-content-text"]/table[1]/tr/td[1]/a/text()')
['Alabama', 'Alaska', 'Arizona', 'Arkansas', 'California', 'Colorado', 'Connecticut', 'Delaware', 'Florida', 'Georgia', 'Hawaii', 'Idaho', 'Illinois', 'Indiana', 'Iowa', 'Kansas', 'Kentucky', 'Louisiana', 'Maine', 'Maryland', 'Massachusetts', 'Michigan', 'Minnesota', 'Mississippi', 'Missouri', 'Montana', 'Nebraska', 'Nevada', 'New Hampshire', 'New Jersey', 'New Mexico', 'New York', 'North Carolina', 'North Dakota', 'Ohio', 'Oklahoma', 'Oregon', 'Pennsylvania', 'Rhode Island', 'South Carolina', 'South Dakota', 'Tennessee', 'Texas', 'Utah', 'Vermont', 'Virginia', 'Washington', 'West Virginia', 'Wisconsin', 'Wyoming']
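To try the "build the path one step at a time" approach without depending on the live page, here is a small sketch that runs the same kind of expressions against a hypothetical, simplified copy of the table. It uses the standard library's xml.etree.ElementTree instead of lxml, only because it needs no install; ElementTree supports just a subset of XPath (no text()), so it reads .text instead. The markup and column layout are assumptions modeled on the answer's row[0] / row[3] indexing.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: no <tbody>, one header row,
# state name in the 1st cell and capital in the 4th, each inside an <a>.
PAGE = """
<html><body>
  <div id="mw-content-text">
    <table>
      <tr><th>State</th><th>Abbr.</th><th>...</th><th>Capital</th></tr>
      <tr><td><a>Alabama</a></td><td>AL</td><td>...</td><td><a>Montgomery</a></td></tr>
      <tr><td><a>Alaska</a></td><td>AK</td><td>...</td><td><a>Juneau</a></td></tr>
    </table>
  </div>
</body></html>
"""

root = ET.fromstring(PAGE.strip())

# Step 1: locate the container and the table; check it matches exactly one element.
tables = root.findall(".//div[@id='mw-content-text']/table[1]")
print(len(tables))  # 1
table = tables[0]

# Step 2: an inspector-style path that includes /tbody/ finds nothing in this markup.
print(root.find(".//div[@id='mw-content-text']/table[1]/tbody") is None)  # True

# Step 3: walk the rows, skipping the header, and index into the cells.
rows = table.findall("tr")[1:]
states = [row[0][0].text for row in rows]    # <td> -> <a> -> text
capitals = [row[3][0].text for row in rows]

for state, capital in zip(states, capitals):
    print('The capital of %s is %s' % (state, capital))
```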
