I'm trying to use an accumulator (a counter) inside an XPath. Is that possible?
Here is my code so far:
driver = webdriver.Chrome()
driver.get('http://www.imdb.com/user/ur33778891/watchlist?ref_=wt_nv_wl_all_0')
wait = (WebDriverWait, 10)
x = 1
while True:
    try:
        film = driver.find_element(By.XPATH, "((//h3[@class='lister-item-header']/a)[%d]" % x).text
        x = x + 1
        print(film)
The thing is, I'm trying to go to the IMDb website, open a user's watchlist, and get the film names one by one with this method, but as far as I can see it's not possible to use %d for this purpose. Is there anything in Selenium that could do the job?
PS: Neither a string nor a number works.
The thing is: if you open the IMDb watchlist, there will be a list of films, and the XPath of all of them is the same, //h3[@class='lister-item-header']/a, but I was thinking about how to select them individually. I know that it can be done like this:
//h3[@class='lister-item-header']/a [1] # this will select the first movie
//h3[@class='lister-item-header']/a [2] # this will select the second movie
Now, is there a way to do this automatically? I was thinking of using an accumulator: instead of [1] I would put [x] and increment it with x = x + 1.
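Use the following code for that:
xpath = "(//h3[@class='lister-item-header']/a)[{}]".format(x)
Note the parentheses: without them, the index picks the n-th a inside each h3 rather than the n-th match overall.
After that, use it like this:
film = driver.find_element(By.XPATH, xpath).text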
Hope it helps you!
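A minimal sketch of just the string-building step, with no browser involved. The helper name indexed_xpath is hypothetical (it is not part of Selenium); it wraps the base expression in parentheses so that the position applies to the whole set of matches.

```python
# Hypothetical helper (not a Selenium API): build "(base)[n]" so that the
# 1-based position n applies to the whole node-set matched by base.
BASE_XPATH = "//h3[@class='lister-item-header']/a"


def indexed_xpath(base, position):
    if position < 1:
        raise ValueError("XPath positions start at 1")
    return "({})[{}]".format(base, position)


# Print the locators for the first three films.
for x in range(1, 4):
    print(indexed_xpath(BASE_XPATH, x))
```

Each printed string can then be passed to driver.find_element(By.XPATH, ...) as in the answer above.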
As you are starting with x = 1 and Python lists are zero-based, you can use the index as follows:
film = driver.find_elements_by_xpath("//h3[@class='lister-item-header']/a")[x - 1].get_attribute("innerHTML")
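The off-by-one between XPath positions (which start at 1) and Python list indexes (which start at 0) can be shown with a plain list. This is only a sketch: the titles list is made-up placeholder data standing in for the texts of the elements that find_elements would return.

```python
# Placeholder data standing in for the texts of the matched elements;
# the titles are made up for illustration.
titles = ["Film A", "Film B", "Film C"]

x = 1                   # 1-based counter, as in the question
first = titles[x - 1]   # list index 0 corresponds to XPath position 1
print(first)

# Iterating over the list directly avoids the manual counter altogether.
numbered = ["{}. {}".format(pos, title) for pos, title in enumerate(titles, start=1)]
print(numbered)
```

Fetching all matches once and looping over the list also means the loop ends on its own when the films run out.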
Just like the title says, how do I write the code in Python if I want to replace a part of the URL?
For example, replacing a specific part with 1, 2, 3, 4 and so on for this link (https://test.com/page/1), then doing something on that page, going to the next one, and repeating.
So, "open url > click on button or whatever > replace link by the new link with the next number in order"
(I know my code is a mess; I am still a newbie, but I am trying to learn, and I am adding whatever mess I've written so far to follow the posting rules.)
PATH = Service("C:\Program Files (x86)\chromedriver.exe")
driver = webdriver.Chrome(service=PATH)
driver.maximize_window()
get = 1
url = "https://test.com/page/{get}"
while get < 5:
    driver.get(url)
    time.sleep(1)
    driver.find_element_by_xpath("/html/body/div/div/div[2]/form/section[3]/input[4]").click()
    get = get + 1
    driver.get(url)
driver.close()
get = 1
url = f"https://test.com/page/{get}"
while get < 5:
    driver.get(url)
    driver.find_element_by_xpath("/html/body/div/div/div[2]/form/section[3]/input[4]").click()
    print(get)
    print(url)
    get += 1
    url = f"https://test.com/page/{get}"
This simply rebuilds url on each pass through the loop.
Outputs
1
https://test.com/page/1
2
https://test.com/page/2
3
https://test.com/page/3
4
https://test.com/page/4
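Why the URL has to be rebuilt inside the loop can be seen without a browser. This is only a sketch; the names plain, frozen, and template are illustrative and not from the original code.

```python
get = 1
plain = "https://test.com/page/{get}"     # no f prefix: the braces stay literal
frozen = f"https://test.com/page/{get}"   # f-string: the value is filled in right here

get = 2                                    # changing get afterwards affects neither string
print(plain)
print(frozen)

# A template with a placeholder is filled in on demand instead.
template = "https://test.com/page/{}"
urls = [template.format(n) for n in range(1, 5)]
print(urls)
```

An f-string is evaluated once, at the point where it is written, so it has to sit inside the loop (or be replaced by a template plus format) to pick up the new number.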
Use the range() function and string interpolation (an f-string) as follows:
for i in range(1,5):
    print(f"https://test.com/page/{i}")
    driver.get(f"https://test.com/page/{i}")
    driver.find_element_by_xpath("/html/body/div/div/div[2]/form/section[3]/input[4]").click()
Console Output:
https://test.com/page/1
https://test.com/page/2
https://test.com/page/3
https://test.com/page/4
I am working on web automation with Selenium and I need to find an element by its XPath. This by itself is not a problem, but the code needs to run multiple times, and each time the element's XPath changes. This is also not a problem: I have used simple math to get the new XPath every time.
The XPaths look like this:
1. run: '//*[@id="input-text-3"]'
2. run: '//*[@id="input-text-5"]'
3. run: '//*[@id="input-text-7"]' etc.
I solved this problem using this code:
y = 1
# Chrome browser already defined and on website
while True:
    mathop1 = y*2 + 1
    xxpath = ""'//*[@id="input-text-' + str(mathop1) + '"]'""
    xxpath1 = "'" + str(xxpath) + "'"
    print(xxpath1)
    Bezeichnung = driver.find_element_by_xpath(xxpath1)
    Bezeichnung.send_keys(file1name)
    y = y + 1
Every time the program loops, it updates y so that the XPath will be correct. When I paste the printed output of xxpath1 into the call by hand, as you normally would, it works fine; however, as soon as I pass the variable, it does not work. Specifically, the problem is this line, where I can't use the variable:
Bezeichnung = driver.find_element_by_xpath(xxpath1)
Why does this not work?
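First of all, I guess you have to put a wait condition there.
Also, I do not understand why you are nesting so many strings inside strings and converting a string to a string again. The extra quotes added in xxpath1 become part of the XPath itself, so the browser receives a quoted string literal instead of a path expression. I removed all of that: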
y = 1
# Chrome browser already defined and on website
while True:
    mathop1 = y*2 + 1
    xxpath = '//*[@id="input-text-{}"]'.format(mathop1)
    Bezeichnung = WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, xxpath)))
    Bezeichnung.send_keys(file1name)
    y = y + 1
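The difference between the working and the failing string is easy to see by printing both. A small sketch; input_xpath is a hypothetical helper name, and the formula is the one from the question (run 1 gives 3, run 2 gives 5, and so on).

```python
def input_xpath(run):
    # Hypothetical helper: run 1 -> input-text-3, run 2 -> input-text-5, ...
    return '//*[@id="input-text-{}"]'.format(run * 2 + 1)


good = input_xpath(1)
bad = "'" + good + "'"   # what xxpath1 held in the question

print(good)   # the bare expression
print(bad)    # the same text wrapped in literal single quotes
```

When the printed xxpath1 is pasted into source code, its outer quotes act as Python's string delimiters and disappear; when the variable is passed, they are sent to the browser as part of the expression.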
I'm working on scraping a site that has a dropdown menu of hundreds of schools. I am trying to go through and grab tables for only schools from a certain district in the state. So far I have isolated the values for only those schools, but I've been unable to substitute the values stored in my dataframe/list into the XPath.
Here is my code:
ousd_list = ousd['name'].to_list()
for i in range(0,129):
    n = 0
    driver.find_element_by_xpath(('"//option[@value="',ousd_list[n],']"'))
    driver.find_elements_by_name("submit1").click()
    table = driver.find_elements_by_id("ContentPlaceHolder1_grdDisc")
    tdf = pd.read_html(table)
    tdf.to_csv(index=False)
    n += 1
    driver.get('https://dq.cde.ca.gov/dataquest/Expulsion/ExpSearchName.asp?TheYear=2018-19&cTopic=Expulsion&cLevel=School&cName=&cCounty=&cTimeFrame=S')
I suspect the issue is on the find_element_by_xpath line, but I'm not sure how else I would go about resolving this issue. Any advice?
The mistake is not in the scraping part but in your code logic: since you set n = 0 at the beginning of the loop body, it resets to 0 on every iteration, so every pass just looks up ousd_list[0]. The XPath is also passed as a tuple of pieces rather than as one string.
Try:
ousd_list = ousd['name'].to_list()
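for ousd_name in ousd_list:
    # select the school in the dropdown, then submit the form
    driver.find_element_by_xpath(f'//option[@value="{ousd_name}"]').click()
    driver.find_element_by_name("submit1").click()
    table = driver.find_element_by_id("ContentPlaceHolder1_grdDisc")
    # read_html returns a list of DataFrames; take the first one
    tdf = pd.read_html(table.get_attribute("outerHTML"))[0]
    tdf.to_csv(index=False)  # returns the CSV text; pass a file path to write it to disk
    # go back to the search page for the next school
    driver.get('https://dq.cde.ca.gov/dataquest/Expulsion/ExpSearchName.asp?TheYear=2018-19&cTopic=Expulsion&cLevel=School&cName=&cCounty=&cTimeFrame=S')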
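Both points can be checked without a browser. A minimal sketch, assuming the school names contain no double quotes; the list contents are placeholders standing in for ousd['name'].to_list().

```python
# Placeholder values standing in for ousd['name'].to_list().
ousd_list = ["School A", "School B", "School C"]

# Bug pattern from the question: the index is reset on every pass,
# so the same entry is read each time.
seen_with_reset = []
for i in range(len(ousd_list)):
    n = 0
    seen_with_reset.append(ousd_list[n])
    n += 1

# Iterating over the list directly visits every entry once, and an
# f-string produces one XPath string per name (not a tuple of pieces).
xpaths = [f'//option[@value="{name}"]' for name in ousd_list]

print(seen_with_reset)
print(xpaths)
```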
I have a main page where there are links to 5 other pages with the following xpaths from tr[1] to tr[5].
/html/body/div[3]/div[2]/div/div[5]/div/div[2]/table/tbody/tr[1]/td[3]/div[1]/a
/html/body/div[3]/div[2]/div/div[5]/div/div[2]/table/tbody/tr[2]/td[3]/div[1]/a
/html/body/div[3]/div[2]/div/div[5]/div/div[2]/table/tbody/tr[3]/td[3]/div[1]/a
/html/body/div[3]/div[2]/div/div[5]/div/div[2]/table/tbody/tr[4]/td[3]/div[1]/a
/html/body/div[3]/div[2]/div/div[5]/div/div[2]/table/tbody/tr[5]/td[3]/div[1]/a
Inside every page there I have the following actions:
driver.find_element_by_name('key').send_keys('test_1')
driver.find_element_by_name('i18n[en_EN][value]').send_keys('Test 1')
# and at the end this takes me back to the main page again
driver.find_element_by_xpath('/html/body/div[3]/div[2]/div/div[3]/div/ul/li[2]/a').click()
How can I iterate so that the script goes through all 5 pages and performs the above actions? I tried a for loop, but I guess I didn't do it right. Any help would be much appreciated.
You can try this:
xpath = '/html/body/div[3]/div[2]/div/div[5]/div/div[2]/table/tbody/tr[{}]/td[3]/div[1]/a'
for i in range(1, 6):
    driver.find_element_by_xpath(xpath.format(i)).click()
It seems I figured it out, so here is the answer that works for me now.
wls = ['/html/body/div[3]/div[2]/div/div[5]/div/div[2]/table/tbody/tr[1]/td[3]/div[1]/a',
'/html/body/div[3]/div[2]/div/div[5]/div/div[2]/table/tbody/tr[2]/td[3]/div[1]/a',
'/html/body/div[3]/div[2]/div/div[5]/div/div[2]/table/tbody/tr[3]/td[3]/div[1]/a',
'/html/body/div[3]/div[2]/div/div[5]/div/div[2]/table/tbody/tr[4]/td[3]/div[1]/a',
'/html/body/div[3]/div[2]/div/div[5]/div/div[2]/table/tbody/tr[5]/td[3]/div[1]/a']
for i in wls:
    driver.find_element_by_xpath(i).click()
See below:
template = '/html/body/div[3]/div[2]/div/div[5]/div/div[2]/table/tbody/tr[{}]/td[3]/div[1]/a'
for x in range(1,6):
    a = template.format(x)
    print(a)
    # do what you need to do with the XPath in 'a'.
Output:
/html/body/div[3]/div[2]/div/div[5]/div/div[2]/table/tbody/tr[1]/td[3]/div[1]/a
/html/body/div[3]/div[2]/div/div[5]/div/div[2]/table/tbody/tr[2]/td[3]/div[1]/a
/html/body/div[3]/div[2]/div/div[5]/div/div[2]/table/tbody/tr[3]/td[3]/div[1]/a
/html/body/div[3]/div[2]/div/div[5]/div/div[2]/table/tbody/tr[4]/td[3]/div[1]/a
/html/body/div[3]/div[2]/div/div[5]/div/div[2]/table/tbody/tr[5]/td[3]/div[1]/a
When I run my code, I get the price of the hotel I have defined in the url and after that I get the prices of all the other hotels that come as a suggestion. In order to subset and pick the first output, I need to store the for-loop output in a single variable or as a list. How do I do that?
I am using Python 3.6.5 on Windows 7 Professional.
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
chrome_path= r"C:\Users\Downloads\chromedriver_win32\chromedriver.exe"
dr = webdriver.Chrome(chrome_path)
dr.get("url")
hoteltrial = dr.find_elements_by_class_name("hotel-info")
for hoteltrial1 in hoteltrial:
    nametrial = hoteltrial1.find_element_by_class_name("hotel-name")
    print(nametrial.text + " - ")
    try:
        pricetrial = hoteltrial1.find_element_by_class_name("c-price")
        price = pricetrial.find_element_by_css_selector("span.price-num")
        currency = pricetrial.find_element_by_class_name("price-currency")
        print(currency.text + price.text)
    except NoSuchElementException:
        print("sold")
The actual output looks something like this, and I need only the price of the Langham:
The Langham Hong Kong -
$272
Cordis Hong Kong -
$206
Island Shangri-La -
$881
What you are doing is overwriting the variables you use in your for-loop. On every iteration, the new value found is assigned to the same variable, replacing the previous one.
for i in range(5):
    x = i
When you run this example and look at the value assigned to x after the for-loop, you'll see that the value is 4. You are doing the same in your code.
To solve this you can define a list outside of the for-loop and append the results to this list.
hotel = []
for i in range(5):
    hotel.append(i)
After running the above code you will see that this results in a list.
hotel
[0, 1, 2, 3, 4]
You should do the same in your code.
hotellist = []
for hoteltrial1 in hoteltrial:
    nametrial = hoteltrial1.find_element_by_class_name("hotel-name")
    hName = nametrial.text + " - "
    try:
        pricetrial = hoteltrial1.find_element_by_class_name("c-price")
        price = pricetrial.find_element_by_css_selector("span.price-num")
        currency = pricetrial.find_element_by_class_name("price-currency")
        result = hName + currency.text + price.text
        hotellist.append(result)
    except NoSuchElementException:
        result = hName + "Sold"
        hotellist.append(result)
After running this for-loop you will have a list with all the results found in each iteration of the loop. You could use a dictionary instead, so you could get each hotel and price by searching for the key.
Use dict:
hoteldict = {}
for hoteltrial1 in hoteltrial:
    nametrial = hoteltrial1.find_element_by_class_name("hotel-name")
    try:
        pricetrial = hoteltrial1.find_element_by_class_name("c-price")
        price = pricetrial.find_element_by_css_selector("span.price-num")
        currency = pricetrial.find_element_by_class_name("price-currency")
        hoteldict.update({nametrial.text:currency.text+price.text})
    except NoSuchElementException:
        hoteldict.update({nametrial.text:"Sold"})
For a dictionary, use update instead of append.
Access your hoteldict:
hoteldict["The Langham Hong Kong"] # will return $272
I hope this helped you.
Kind regards,
Sam
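To make the "pick only the first result" part concrete, here is a small sketch that runs without Selenium. The scraped list is sample data copied from the output shown in the question, stored as (name, price) pairs; the variable names are illustrative.

```python
# Sample data copied from the output shown in the question.
scraped = [
    ("The Langham Hong Kong", "$272"),
    ("Cordis Hong Kong", "$206"),
    ("Island Shangri-La", "$881"),
]

first_name, first_price = scraped[0]   # the first result only
prices = dict(scraped)                 # name -> price lookup

print(first_name, first_price)
print(prices["Cordis Hong Kong"])
```

Keeping name and price as separate fields, rather than one joined string, makes both the positional pick and the lookup by hotel name straightforward.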