Regex for more results locator on google scholar

I would like to click "more results" on Google Scholar from a Python script, but I cannot find the correct XPath. I have located the results button, but not the link to the next page, which seems to be in the span one level below, as can be seen from the picture. I have tried "//button[@aria-label='Next']", "//button[@aria-label='Next']\span" and "//button[@aria-label='Next']\span\span[1]". Why does this not work?
from selenium import webdriver
url = "https://scholar.google.ch/scholar?hl=en&as_sdt=0%2C5&q=bla&btnG="
driver = webdriver.Chrome("~/chromedriver")
driver.get(url)
driver.find_element_by_xpath("//button[@aria-label='Next']").click()
I get an "element not clickable" error because I cannot reach the position of the real button.
Below is a screenshot of the node structure in the HTML.

Using Chrome DevTools, you can select the element and click Copy XPath. This will give you a selector that is guaranteed to match that element. I've attached an image that displays this:
This is what it returned: //*[@id="gs_n"]/center/table/tbody/tr/td[12]/a/b
However, this can be tuned further. After some analysis, I found that //*[(@id = "gs_n")]//a//b works just as well.
Since you are using the Chrome driver, these values should work fine, because DevTools is what generated them.
Edit
I think that we were referring to different selectors, thus creating a problem for you. Consider the screenshot:
The blue highlight represents the element that I used while the green highlight represents what I think you focused on. As a general note, try to right click on the specific element you want when using Inspect Element.

Related

Automatic click with Selenium, using By.XPATH

Do you see this little blue logo in the image below? Using WebDriverWait I would like to automatically click on this blue logo, in order to open the list of those who have left the likes.
I used By.XPATH, more precisely //*[@id="jsc_c_z"]/span[1]/span/span/div, but it doesn't work. Note: I've noticed that the IDs and ranges on this page look suspiciously obfuscated, so they won't necessarily be the same every time, and I don't think id="jsc_c_z" will be reliable. It may be necessary to resort to the aria-label attribute on the target div element.
My code is:
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="jsc_c_z"]/span[1]/span/span/div'))).click()
This is the link I would like to open: https://www.facebook.com/FranzKafkaAuthor/posts/3985338151528881.
IMPORTANT: You may have to log in to Facebook to see it; for anyone who does not want to do so, I am posting screenshots.
I hope someone can help me. Thanks

Can you try this?
//span[@data-hover='tooltip']/span
When I was checking this, I couldn't find the div tag you mentioned.

Is there any way to click on "plain text" using selenium?

Apologies if this question was answered before. I want to click on an area of plain text in a browser using Selenium WebDriver in Python.
The code I'm using is:
element_plainText = driver.find_element(By.XPATH, '//*[contains(@class, "WgFkxc")]')
element_plainText.click()
However this is returning "ElementNotInteractableException". Can anyone help me out with this?

Selenium is trying to be helpful here by telling you why it won't click on the element: ElementNotInteractableException means it thinks that what you're trying to click on isn't clickable.
This usually happens for one of these reasons:
The element isn't actually visible, or is disabled
Another element is "overlapping" the element, possibly invisibly
You're clicking something Selenium thinks won't do anything, like plain text
There are two things I'd try to get around this. Firstly, Actions. Selenium has an Actions API (ActionChains in the Python bindings) that you can use to cause specific UI events to occur. I'd suggest finding the coordinates of the text, then making Selenium click those coordinates instead of telling it to click the element. The Selenium documentation covers that API in more detail.
Secondly, try clicking it with JavaScript, using a JavaScript executor (driver.execute_script in Python). That can often give you the same outcome as using Selenium directly, without it being so "helpful".
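A rough sketch of both workarounds, in case it helps. The helper names click_with_actions and click_with_javascript are made up for illustration; ActionChains and execute_script are the real Selenium calls. Note that move_to_element targets the centre of the element rather than hand-computed coordinates, which is an assumption about what is good enough here.

```python
def click_with_actions(driver, element):
    """Move the virtual mouse to the element's centre and click at that position."""
    # Imported lazily so the JavaScript helper below can be used (and tested)
    # without Selenium installed.
    from selenium.webdriver.common.action_chains import ActionChains

    ActionChains(driver).move_to_element(element).click().perform()


def click_with_javascript(driver, element):
    """Trigger the click from inside the page, bypassing Selenium's interactability checks."""
    # The element is passed to the script as arguments[0].
    driver.execute_script("arguments[0].click();", element)
```

With the element from the question, that would be click_with_javascript(driver, element_plainText). The JavaScript route skips the checks that raise ElementNotInteractableException, so it can also "click" things a real user could not reach.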

Obtaining XPATH for Selenium

In a previous question, a user provides the following solution to the problem.
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//div[@class='title login_title' and text()='Login']//following::div[1]//input[@class='text header_login_text_box ignore_interaction']"))).send_keys("someemail@email.com")
However, when I inspect the element in Chrome and use Copy > Copy XPath, I get the XPath below, which no longer works when used like this. It doesn't raise an error either; the email is simply never typed into the box.
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//*[@id='__w2_wHsckeIm21_email']"))).send_keys("someemail@email.com")
What's the difference between the two? Why does only the first one work, and how do I obtain the long, working version of the XPath?

Well, not a concrete solution as such but do try out ChroPath Plugin.
Also available on Chrome Web Store.
https://autonomiq.io/chropath/

First of all, you don't need such a long XPath to locate the email element. You can simply use
//form[@class='inline_login_form']//input[@name='email']
I don't recommend using the id in this case because it is dynamic (meaning the id changes each time you navigate to this page), so it is not a reliable way to locate the element.
There are multiple ways to write a locator for this element, such as
//form[@class='inline_login_form']//input[@name='email']
//input[@name='email'][@class='text header_login_text_box ignore_interaction']
//input[@name='email'][starts-with(@class,'text header_login_text_box')]
I won't keep listing every possible option. I chose //form[@class='inline_login_form']//input[@name='email'] because it makes clear that I am locating the input element named email under the form. If you try to locate the element with only //input[@name='email'], there are 2 matching elements and Selenium will pick the first one, which is not the one we want in this case and is not interactable.
If you want to learn more about XPath and how to develop the correct XPath for your target element, refer to this post.
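To make the "2 elements" point concrete, here is a small sketch that uses Python's standard-library ElementTree as a stand-in for the browser DOM. The markup is invented for illustration and is not Quora's real page; ElementTree also supports only a subset of XPath and needs a leading "." on the path, unlike Selenium.

```python
import xml.etree.ElementTree as ET

# Invented markup: two inputs share name="email", only one sits in the login form.
PAGE = """
<html><body>
  <div class="signup"><input name="email" class="signup_email"/></div>
  <form class="inline_login_form">
    <input name="email" class="text header_login_text_box ignore_interaction"/>
    <input name="password"/>
  </form>
</body></html>
""".strip()

root = ET.fromstring(PAGE)

# The loose locator matches both inputs; the first match is the wrong one.
loose = root.findall(".//input[@name='email']")
# Scoping the search to the form leaves exactly one match.
scoped = root.findall(".//form[@class='inline_login_form']//input[@name='email']")

print(len(loose), loose[0].get("class"))    # 2 signup_email
print(len(scoped), scoped[0].get("class"))  # 1 text header_login_text_box ignore_interaction
```

Selenium's find_element behaves like loose[0] here: it returns the first match in document order, which is why the scoped locator is the safer choice.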

I suspect the id is not a stable selector for Quora.
When I try to repeat your steps today I find the XPath is slightly different, because the ID of the input field is different.
Today: //*[@id="__w2_wtEXFdHr21_email"]
In your example: //*[@id='__w2_wHsckeIm21_email']
XPath is loosely speaking a description of how you navigate the DOM to get to the element(s) of interest. There are many ways to get to a particular element. Chrome's dev tools will give you one way (or two if you count "Copy full XPath").
The question you linked has several answers that suggest different XPath expressions, and also CSS selectors. The ones looking for an input with name = email will find more than one element, where the input you're looking for is not the first.

How to select an HTML id in XPath in Python, using a wildcarded string?

I'm trying to write a program to automate a repetitive action that we would otherwise have to perform by hand about 1000 times.
The action is performed in a web browser (I'm using Chrome), which I drive with Selenium and the associated WebDrivers. My actual issue is that the XPath selector changes on every connection, but only by one number.
As a result, my code only runs some of the time, when the selector has the right name.
Because the selector keeps changing, it only occasionally happens to be the right one!
So, after starting a headless browser and logging in to the company web page, I have to locate and then click a specific object in the browser:
The problematic code is the following:
wait.until(EC.presence_of_element_located((By.XPATH, '//*[@id="__xmlview0--settingsButton-img"]')))
OT = driver.find_element_by_xpath('//*[@id="__xmlview0--settingsButton-img"]')
OT.click()
wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, '#__select1-label')))
driver.save_screenshot("screenshot.png")
I have an idea but I don't know how to do it: is it possible to put a wildcard in place of the 0 in xmlview0, which is the number that changes in the selector?
I'm not a Python veteran and I really don't want to do the job by hand.
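One possible approach, untested against the actual page: instead of guessing the number, match the parts of the id that stay fixed. XPath 1.0 offers starts-with() and contains() for this, and CSS offers the ^= and $= attribute operators. The helper names below are hypothetical, and the sketch assumes that only the digits after "__xmlview" change and that no other element on the page fits the same pattern.

```python
def xpath_for_changing_id(prefix, suffix):
    """Build an XPath 1.0 expression for ids that start with `prefix` and contain `suffix`."""
    return f"//*[starts-with(@id, '{prefix}') and contains(@id, '{suffix}')]"


def css_for_changing_id(prefix, suffix):
    """Build the CSS counterpart: id starts with `prefix` and ends with `suffix`."""
    return f'[id^="{prefix}"][id$="{suffix}"]'


settings_xpath = xpath_for_changing_id("__xmlview", "--settingsButton-img")
settings_css = css_for_changing_id("__xmlview", "--settingsButton-img")

print(settings_xpath)
# //*[starts-with(@id, '__xmlview') and contains(@id, '--settingsButton-img')]
print(settings_css)
# [id^="__xmlview"][id$="--settingsButton-img"]
```

Either string could then replace the hard-coded locator, for example wait.until(EC.presence_of_element_located((By.XPATH, settings_xpath))).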

How can I get the text from a dialogue using selenium-python?

I want to crawl the dialogue text in a popup window. The problem is that after I trigger the link the window appears, but it seems that the Selenium driver does not pick it up automatically. I checked this by inspecting driver.window_handles, as I learned from other questions on this site.
The source of the trigger:
The value of len(driver.window_handles) is 1. I thought I could get the window element and then get the text via get_attribute. Fortunately, I succeeded in getting the element with
wd = driver.find_element_by_css_selector('div[node-type="repeat_list"]')
<selenium.webdriver.remote.webelement.WebElement (session="f810cbbe-db43-4e8d-b484-664559ec8efc", element="{dd00e689-7991-44e9-85d3-76c69e79218f}")>
Unfortunately, I don't know how to get everything out of it, since I don't know its attributes.
I'm not certain whether it's a dialogue; a front-end engineer told me that it looks like an animation. Anyway, this is the source snippet:
PS: the browser is Firefox.
I thought crawling might violate the site's Acceptable Use Policy, so I have hidden some information. Sorry.

Once you have your parent element :
wd = driver.find_element_by_css_selector('div[node-type="repeat_list"]')
you can continue calling methods on this object to reach the child elements in turn, using find_element_by_xpath or find_element_by_class_name, for example:
wd = driver.find_element_by_css_selector('div[node-type="repeat_list"]')
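wd.find_element_by_class_name("list_box").find_element_by_class_name("list_ul").find_elements_by_css_selector(".list_li.S_line1.clearfix")
and so on until you reach the desired element down the hierarchy and extract its content as you wish. (The last step uses a CSS selector because a class-name locator accepts only a single class name, not several separated by spaces.)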
I hope this helps!
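To go from the matched elements to their text, something along these lines might work. child_texts is a made-up helper name, and the class names are taken from the example above, not verified against the real page.

```python
def child_texts(parent, css_selector):
    """Return the visible text of each element under `parent` that matches the selector."""
    # "css selector" is the string value behind selenium's By.CSS_SELECTOR.
    elements = parent.find_elements("css selector", css_selector)
    return [element.text for element in elements]
```

With the parent from above, that would be child_texts(wd, ".list_li"). The .text property only returns text that is currently visible, so if the list is hidden or still animating, element.get_attribute("textContent") may be worth trying instead.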
