I need to input text into the text box on this website:
http://www.link.cs.cmu.edu/link/submit-sentence-4.html
I then need the HTML of the response page returned. I have looked at other solutions, but I am aware that there is no single solution for every site. I have seen Selenium, but I do not understand its documentation or how to apply it. Please help me out, thanks.
BTW, I have some experience with BeautifulSoup, if it helps. I asked before, but requests was the only solution offered, and I don't know how to use it.
First, imho automation via BeautifulSoup is overkill if you're looking at a single page. You're better off looking at the page source and getting the form structure from it. Your form is really simple:
<FORM METHOD="POST"
ACTION="/cgi-bin/link/construct-page-4.cgi#submit">
<input type="text" name="Sentence" size="120" maxlength="120"></input><br>
<INPUT TYPE="checkbox" NAME="Constituents" CHECKED>Show constituent tree
<INPUT TYPE="checkbox" NAME="NullLinks" CHECKED>Allow null links
<INPUT TYPE="checkbox" NAME="AllLinkages" OFF>Show all linkages
<INPUT TYPE="HIDDEN" NAME="LinkDisplay" VALUE="on">
<INPUT TYPE="HIDDEN" NAME="ShortLength" VALUE="6">
<INPUT TYPE="HIDDEN" NAME="PageFile" VALUE="/docs/submit-sentence-4.html">
<INPUT TYPE="HIDDEN" NAME="InputFile" VALUE="/scripts/input-to-parser">
<INPUT TYPE="HIDDEN" NAME="Maintainer" VALUE="sleator@cs.cmu.edu">
<br>
<INPUT TYPE="submit" VALUE="Submit one sentence">
<br>
</FORM>
so you should be able to extract the fields and populate them.
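If you'd rather not copy the fields by hand, here is a rough sketch of pulling them out with the standard library's html.parser. FormFieldParser and FORM_HTML are names made up for this example, the form is shortened, and the sample sentence is arbitrary. It assumes the usual browser behaviour: a checked checkbox without a value submits "on", and an unchecked one is not submitted at all.

```python
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Collect the form action and the fields a browser would submit by default."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # HTMLParser lowercases tag and attribute names
        if tag == "form":
            self.action = attrs.get("action")
        elif tag == "input" and "name" in attrs:
            kind = (attrs.get("type") or "text").lower()
            if kind == "checkbox":
                # Only checked boxes are submitted; the default value is "on".
                if "checked" in attrs:
                    self.fields[attrs["name"]] = attrs.get("value") or "on"
            elif kind != "submit":
                self.fields[attrs["name"]] = attrs.get("value") or ""


# Shortened copy of the form above (hypothetical excerpt for the example).
FORM_HTML = '''
<FORM METHOD="POST"
ACTION="/cgi-bin/link/construct-page-4.cgi#submit">
<input type="text" name="Sentence" size="120" maxlength="120"></input><br>
<INPUT TYPE="checkbox" NAME="Constituents" CHECKED>Show constituent tree
<INPUT TYPE="checkbox" NAME="AllLinkages" OFF>Show all linkages
<INPUT TYPE="HIDDEN" NAME="ShortLength" VALUE="6">
<INPUT TYPE="submit" VALUE="Submit one sentence">
</FORM>
'''

parser = FormFieldParser()
parser.feed(FORM_HTML)
parser.fields["Sentence"] = "The cat sat on the mat."  # fill in the text box
print(parser.action)
print(parser.fields)
```

The resulting dictionary is the kind of payload you would then send in the POST request.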
I'd do it with curl and -X POST (like here -- see the answer too :)).
If you really want to do it in Python, then you need to send the POST with something like requests.
Pulled straight from the docs and changed to your example.
from selenium import webdriver
# Create a new instance of the Firefox driver
driver = webdriver.Firefox()
# go to the page
driver.get("http://www.link.cs.cmu.edu/link/submit-sentence-4.html")
# the page is ajaxy so the title is originally this:
print(driver.title)
# find the element whose name attribute is Sentence
inputElement = driver.find_element_by_name("Sentence")
# type in the search
inputElement.send_keys("You're welcome, now accept the answer!")
# submit the form
inputElement.submit()
This will at least help you input the text. Then, take a look at this example to retrieve the html.
Following OP's requirement of having the process in python.
I wouldn't use Selenium, because it launches a browser on your desktop, which is overkill for just filling in a form and getting its reply (you could justify it if your page had JS or AJAX stuff).
The form request code could be something like:
import requests
payload = {
'Sentence': 'Once upon a time, there was a little red hat and a wolf.',
'Constituents': 'on',
'NullLinks': 'on',
'AllLinkages': 'on',
'LinkDisplay': 'on',
'ShortLength': '6',
'PageFile': '/docs/submit-sentence-4.html',
'InputFile': "/scripts/input-to-parser",
'Maintainer': "sleator@cs.cmu.edu"
}
r = requests.post("http://www.link.cs.cmu.edu/cgi-bin/link/construct-page-4.cgi#submit",
data=payload)
print(r.text)
r.text is the HTML body, which you can parse with, e.g., BeautifulSoup.
Looking at the HTML reply, I think your real problem will be processing the text inside the <pre> tags, but that is a different matter, outside the scope of this question.
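As a starting point for the <pre> part, here is a minimal sketch that collects the text of every <pre> block using only the standard library. PreTextParser and SAMPLE_REPLY are invented for this example; the sample HTML is a stand-in, not the real reply from the parser page, so treat it as an illustration of the approach rather than a finished solution.

```python
from html.parser import HTMLParser


class PreTextParser(HTMLParser):
    """Gather the raw text found inside each <pre>...</pre> block."""

    def __init__(self):
        super().__init__()
        self._in_pre = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self._in_pre = True
            self.blocks.append("")  # start a new block

    def handle_endtag(self, tag):
        if tag == "pre":
            self._in_pre = False

    def handle_data(self, data):
        if self._in_pre:
            self.blocks[-1] += data


# Hypothetical reply body; in practice this would be r.text.
SAMPLE_REPLY = (
    "<html><body><h1>Result</h1>"
    "<pre>first block\n  line two</pre>"
    "<p>ignored</p>"
    "<pre>second</pre>"
    "</body></html>"
)

parser = PreTextParser()
parser.feed(SAMPLE_REPLY)
parser.close()
for block in parser.blocks:
    print(block)
```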
HTH,
I have a problem with this checkbox. I tried locating the element by id, name, XPath, CSS selector, and contained text, and I still could not click it. I also tried another site with similar HTML, and there it was enough to find the element by id and click it. Any ideas?
<div class="agree-box-term">
<input tabindex="75" id="agree" name="agree" type="checkbox" value="1">
<label for="agree" class="checkbox-special">* Zapoznałam/em się z Regulaminem sklepu internetowego i akceptuję jego postanowienia.<br></label>
</div>
Here is my Python code https://codeshare.io/5zo0Jj
I used the JavaScript executor and it clicks the element. I also confirmed that the plain WebDriver click does not work.
driver.execute_script("arguments[0].click();", driver.find_element_by_id("agree"))
I don't know why this is, but in my experience some boxes don't accept click but do accept a 'mousedown' trigger.
try:
driver.execute_script('$("div.agree-box-term input#agree").trigger("mousedown")')
This solution relies on jQuery being on the page; if it isn't, the same thing can be written in plain JavaScript.
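For the no-jQuery case, something along these lines might do the same job by dispatching a native mousedown event. This is only a sketch: trigger_mousedown and MOUSEDOWN_JS are names made up here, and it assumes driver is a Selenium WebDriver (only its execute_script method is used) and that the page reacts to a synthetic MouseEvent the same way it reacts to the jQuery trigger.

```python
# JavaScript run in the page: look up the element by the CSS selector passed
# as the first script argument and dispatch a bubbling mousedown event on it.
MOUSEDOWN_JS = (
    "var el = document.querySelector(arguments[0]);"
    "el.dispatchEvent(new MouseEvent('mousedown', "
    "{bubbles: true, cancelable: true, view: window}));"
)


def trigger_mousedown(driver, css_selector):
    """Fire a native mousedown on the first element matching css_selector."""
    driver.execute_script(MOUSEDOWN_JS, css_selector)
```

With the checkbox from the question, the call would be trigger_mousedown(driver, "div.agree-box-term input#agree").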
r = driver.find_element_by_xpath('//*[@id="form-order"]/div[2]/div[4]/label')
r.click()
Does this work for you? Sometimes it's just a question of selecting the right xpath, or adding the brackets after click.
Does your code contain nested html tags? For example:
<html>
<div>
<p> Some text </p>
<html>
That block can't be traversed!
</html>
</div>
</html>
Anything inside the second HTML tags can't be traversed/accessed. Try to see if that's the case.
In any other case the following code ran perfectly fine for your snippet:
driver.find_element_by_css_selector('#agree').click()
I have some trouble locating a username field on a webpage.
Using find_element_by_name/class gives me a 'no such element' error.
After a lot of fiddling, I still cannot get this to work. I have not had this problem on any other webpage where I used the same method. Hope someone can help me out!
<input type="text" class="_ph6vk _o716c" aria-describedby="" aria-label="Telefoonnummer, gebruikersnaam of e-mailadres" aria-required="true" autocapitalize="off" autocorrect="off" autocomplete="username" maxlength="30" name="username" placeholder="Telefoonnummer, gebruikersnaam of e-mailadres" value="">
The HTML above represents the element which I want to locate.
In case of a slow page load/render, instruct the driver to wait up to 5 seconds for the element to appear:
driver.implicitly_wait(5)
Explicitly getting the input:
driver.find_element_by_xpath("//input[contains(@class, '_ph6vk')]")
However, the class name looks like it is generated dynamically on each page load; in that case, you will have to count the inputs on the page that come before the one you want:
driver.find_element_by_xpath("//input[1]")
or write a full absolute XPath there.
Try the following:
driver.find_element_by_css_selector("input._ph6vk._o716c")
This won't work:
driver.find_element_by_class_name("_ph6vk _o716c")
because find_element_by_class_name takes a single class name, and these are two different classes.
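To make the point about compound classes concrete, here is a tiny helper that turns a space-separated class attribute into a CSS selector matching elements that carry all of those classes. classes_to_css is a hypothetical name invented for this sketch, and it assumes the class names contain no characters that would need escaping in CSS.

```python
def classes_to_css(tag, class_attr):
    """Build a selector such as 'input.a.b' from tag 'input' and class 'a b'."""
    return tag + "".join("." + name for name in class_attr.split())


selector = classes_to_css("input", "_ph6vk _o716c")
print(selector)  # input._ph6vk._o716c
```

The resulting string is what you would pass to the CSS selector lookup.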
I have an HTML page containing a form with an <input> tag. I want to set the value of the drop-down in this tag using Selenium.
This is how I retrieve the input element:
select_month = driver.find_element_by_xpath("/html/body/div[2]/div/div/form/div/div[1]/div[3]/div[1]/div/div[1]/input")
I tried to set the value using select_month.send_keys("09"), but the web page does not accept it when I try to submit the form, so I need to find another method.
EDIT: Here is the HTML of the form; I have made sure that my XPath points to the right element:
<input autocomplete="off" tabindex="-1" class="ui-select-search ui-select-toggle ng-pristine ng-valid ng-touched" ng-click="$select.toggle($event)" placeholder="Select month" ng-model="$select.search" ng-hide="!$select.searchEnabled || ($select.selected && !$select.open)" ng-disabled="$select.disabled" type="text">
After messing around a bit and incorporating the better practice presented by alecxe, this solution worked...
driver.find_element_by_xpath("//input[@placeholder='Select month']").click()
driver.find_element_by_xpath("//*[contains(text(), '09')]").click()
Python: 3.4.1
Browser: Chrome
I'm trying to push a button which is located in a form using Selenium with Python. I'm fairly new to Selenium and HTML.
The HTML code is as follows:
<FORM id='QLf_437222' method='POST' action='xxxx'>
<script>document.write("<a href='javascript:void(0);' onclick='document.getElementById(\"QLf_437222\").submit();' title='xxx'>51530119</a>");</script>
<noscript><INPUT type='SUBMIT' value='51530119' title='xxx' name='xxxx'></noscript>
<INPUT type=hidden name="prodType" value="DDA"/>
<INPUT type=hidden name="BlitzToken" value="BlitzToken"/>
<INPUT type=hidden name="productInfo" value="40050951530119"/>
<INPUT type=hidden name="reDirectionURL" value="xxx"/>
</FORM>
I've been trying the following:
driver.execute("javascript:void(0)")
driver.find_element_by_xpath('//*[@id="QLf_437104"]/a').click()
driver.find_element_by_xpath('//*[@id="QLf_437104"]/a').submit()
driver.find_element_by_css_selector("#QLf_437104 > a").click()
driver.find_element_by_css_selector("#QLf_437104 > a").submit()
Python doesn't throw an exception, so it seems like I'm clicking something, but it doesn't do what I want.
In addition to this the webpage acts funny when the chrome driver is initialized from Selenium. When clicking the button in the initialized chrome driver, the webpage throws an error (888).
I'm not sure where to go from here. Might it be something with the hidden elements?
If I can provide additional information please let me know.
EDIT:
It looks like the form id changes sometimes.
It sounds like you are trying to submit the form, right?
The <a> you are pointing at simply submits that form. Since it is injected via JavaScript, it may not be present when you try to click it. What I'd recommend is:
driver.find_element_by_css_selector("form[id^='QLf']").submit()
That will avoid the button, and submit the appropriate form.
In the CSS selector above, I used [id^='QLf'], which means: find a <form> whose id attribute starts with QLf, because the numbers after it look automatically generated.
I am attempting to scrape the following website flow.gassco.no as one of my first python projects. I need to bypass the splash screen which redirects to the main page. I have isolated the following action,
<form method="get" action="acceptDisclaimer">
<input type="submit" value="Accept"/>
<input type="button" name="decline" value="Decline" onclick="window.location = 'http://www.gassco.no'" />
</form>
In a browser appending 'acceptDisclaimer?' to the url redirects to the target flow.gassco.no. However if I try to replicate this in urllib, I appear to stay on the same page when outputting the source.
import urllib, urllib2
url="http://flow.gassco.no/acceptDisclaimer?"
url2="http://flow.gassco.no/"
#first pass to invoke disclaimer
req=urllib2.Request(url)
res=urllib2.urlopen(req)
#second pass to access main page
req1=urllib2.Request(url2)
res2=urllib2.urlopen(req1)
data=res2.read()
print data
I suspect that I have oversimplified the problem, but would appreciate any input into how I can accept the disclaimer and continue to output the main page source.
Use a cookiejar, so that cookies set by one response are sent along with the next request. See "python: urllib2 how to send cookie with urlopen request".
Open the main URL first.
Open /acceptDisclaimer after that.
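Those two steps could look roughly like this in Python 3, where urllib2 and cookielib became urllib.request and http.cookiejar. build_cookie_opener and fetch_main_page are names invented for this sketch, and it rests on assumptions that have not been checked against the site: that the first response sets a session cookie, and that the acceptDisclaimer request redirects to the main page.

```python
import http.cookiejar
import urllib.request

BASE_URL = "http://flow.gassco.no/"


def build_cookie_opener():
    """Return an opener that stores cookies and resends them, plus its jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def fetch_main_page(opener, base_url=BASE_URL):
    """Open the main URL, accept the disclaimer, and return the final HTML."""
    # First pass: assumed to hand out the session cookie (stored in the jar).
    opener.open(base_url).close()
    # Second pass: same cookies are sent; redirects are followed automatically.
    with opener.open(base_url + "acceptDisclaimer") as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Usage (performs real network requests):
#   opener, jar = build_cookie_opener()
#   html = fetch_main_page(opener)
```

The important part is that both requests go through the same opener; two separate urlopen calls share no cookies, which would explain staying on the splash page.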