How to access a subpage (same url different content) with BeautifulSoup? - python

Using BeautifulSoup with Python, I'm trying to scrape a subpage of this page:
https://www.mmorpg-stat.eu/0_fiche_alliance.php?pays=5&ftr=500208.all&univers=_146
More precisely, the subpage titled "Other information".
The problem is that clicking that button doesn't change the URL (is this called a subpage? If not, what is it?), so I cannot access that page with
url = '...'
requests.get(url)
Looking at the browser console, the button code is
<td width="250" align="center" valign="middle" class="Style1_f_j barre_joueur1 fond_56_1" style="cursor:pointer;text-transform: uppercase" onclick="fcache12('faCacher');fcache13('ffond_gris');document.form1_2date.statview.value='2';document.forms['form1_2date'].submit();return false;">
<span style="color:#ffffff;"> Other information</span>
</td>
All I can understand is that when clicked, the button calls some fcache method.
How to access the subpage?

All I can understand is that when clicked, the button calls some fcache method.
onclick="fcache12('faCacher');fcache13('ffond_gris');document.form1_2date.statview.value='2';document.forms['form1_2date'].submit();return false;"
It actually calls two different methods: fcache12() and fcache13(). And then it finds a form in the page and submits it:
document.forms['form1_2date'].submit()
If you search 'form1_2date', you will find:
<form name="form1_2date" method="post">
So to simulate clicking on this button, you need to call requests.post() instead of requests.get(). You also need to determine the form values that should be passed in. These are determined by all of the <input> tags in the form.
Alternatively, you can use selenium or a similar library to simulate user interaction in a browser rather than trying to make the requests directly.
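The requests.post() approach described above can be sketched roughly as follows. This is only a minimal sketch: FormFields and form_payload are hypothetical helper names, the parsing uses the standard library instead of BeautifulSoup so that it has no dependencies, and it assumes the server needs nothing beyond the form's <input> values (no cookies or other state). The statview value '2' comes from the button's onclick handler.

```python
from html.parser import HTMLParser


class FormFields(HTMLParser):
    """Collect name/value pairs of <input> tags inside one named form."""

    def __init__(self, form_name):
        super().__init__()
        self.form_name = form_name
        self.in_form = False
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("name") == self.form_name:
            self.in_form = True
        elif tag == "input" and self.in_form and attrs.get("name"):
            # Inputs without a value attribute are sent as empty strings.
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def form_payload(html, form_name, overrides=None):
    """Return the form's input values, with overrides applied on top."""
    parser = FormFields(form_name)
    parser.feed(html)
    payload = dict(parser.fields)
    payload.update(overrides or {})
    return payload


# Usage sketch (not run here; needs the third-party requests package):
#   import requests
#   page = requests.get(url)
#   payload = form_payload(page.text, "form1_2date", {"statview": "2"})
#   result = requests.post(url, data=payload)
#   soup = BeautifulSoup(result.text, "html.parser")
```

Overriding statview mimics what the onclick handler does before calling submit(); every other field is sent with whatever value the page delivered.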

Related

Hi, I'm writing a bot in requests to fill out an HTML form. Have some questions about values and the payload

I created a program to fill out an HTML webpage form in Selenium, but now I want to change it to requests. However, I've come across a bit of a roadblock. I'm new to requests, and I'm not sure how to emulate a request as if a button had been pressed on the original website. Here's what I have so far -
import requests
import random
emailRandom = ''
for i in range(6):
    add = random.randint(1,10)
    emailRandom += str(add)
payload = {
    'email':emailRandom+'@redacted',
    'state_id':'34',
    'tnc-optin':'on',
}
r= requests.get('redacted.com', data=payload)
The button I'm trying to "click" on the webpage looks like this -
<div class="button-container">
<input type="hidden" name="recaptcha" id="recaptcha">
<button type="submit" class="button red large">ENTER NOW</button>
</div>
What is the default/"clicked" value for this button? Will I be able to use it to submit the form using my requests code?
Using selenium and using requests are two different things. Selenium drives your browser to submit the form through the rendered HTML UI, whereas Python requests just sends the data from your Python code without any HTML UI, so it does not involve "clicking" the submit button.
The "submit" button in this case merely triggers the browser to POST the form values.
However, the backend will validate the "recaptcha" token, so you will need to work around that.
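To make the point about POSTing form values concrete, here is a minimal sketch of the data a browser would send when the button is pressed. build_payload is a hypothetical helper, example.com stands in for the redacted domain, and the recaptcha token is left as an empty placeholder because the sketch does not obtain one.

```python
import random
import string
from urllib.parse import urlencode


def build_payload(domain, recaptcha_token=""):
    """Build the form fields; the button itself contributes no value."""
    digits = "".join(random.choices(string.digits, k=6))
    return {
        "email": f"{digits}@{domain}",
        "state_id": "34",
        "tnc-optin": "on",
        # Hidden <input name="recaptcha">; the server is expected to check it.
        "recaptcha": recaptcha_token,
    }


payload = build_payload("example.com")  # placeholder domain
# This is the urlencoded body a browser would put in the POST request.
body = urlencode(payload)
print(body)

# Usage sketch (not run here; needs the third-party requests package):
#   import requests
#   r = requests.post(form_url, data=payload)  # post(), not get()
```

The <button type="submit"> has no name attribute, so it adds nothing to the payload; only the named inputs are sent.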
I recommend capturing the browser's requests with Fiddler:
https://www.telerik.com/fiddler
and then recreating them with requests.
James's answer using selenium is slower than this.

Response code from a URL where location.href = '/';

I'm currently scraping a webpage with Python to get the response code for a button within the page. However, when I inspect the element for this button, the HTML reads as follows:
<div style="cursor: pointer;" onclick="javascript: location.href = '/';" id="TopPromotionMainArea"></div>
I'm quite new to this. Other links within the same page show the full URL after "href=", and with the requests library I'm able to get the full URL. Any idea why the example above has "href='/'", and is there a way to get the response code for this button?
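A target such as '/' is a relative URL, which the browser resolves against the address of the current page; '/' on its own points at the site root. A minimal sketch of that resolution, where resolve_target is a hypothetical helper and the page URL is a made-up placeholder:

```python
from urllib.parse import urljoin


def resolve_target(page_url, href):
    """Resolve a possibly relative href against the page it appears on."""
    return urljoin(page_url, href)


page_url = "https://example.com/promotions/index.html"  # placeholder URL
target = resolve_target(page_url, "/")
print(target)

# Usage sketch (not run here; needs the third-party requests package):
#   import requests
#   print(requests.get(target).status_code)
```

Requesting the resolved URL should then give the response code that clicking the element would lead to, assuming the onclick handler does nothing else.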

Web Scraping with Python: Input text and click a button

I was doing some web scraping with python (Linkedin site) and got stuck with the following 2 issues: 1) How do I input text on a search bar? 2) How to click a button? First, this is the search bar code:
<input aria-autocomplete="list" autocomplete="off" spellcheck="false"
placeholder="Búsqueda" autocorrect="off" autocapitalize="off"
id="a11y-ember6214" role="combobox" class="ember-text-field ember-view"
aria-expanded="false">
To input the text I was using the xpath (and it works), but it changes every time I log in to the site:
search = driver.find_element_by_xpath('//*[@id="a11y-ember997"]')
search.send_keys('MedMake')
So could I use instead part of the input bar code above so that I can rerun my script multiple times?
My second point is 2) how to click a button. Again I was using the xpath but it changes after every login. My code was:
button = driver.find_element_by_xpath('//*[@id="nav-search-controls-wormhole"]/button')
button.click()
I inspected the button code and would instead like to use data-vertical="PEOPLE" or any other of these unique fields (the button tag alone is not enough, since there are many buttons on the LinkedIn site). By the way, what are all these inner fields called? I believe part of my problem arises from my lack of HTML understanding.
<button data-vertical="PEOPLE"
data-control-name="vertical_nav_people_toggle" data-ember-action=""
data-ember-action-8620="8620" data-is-animating-click="true">
Gente
</button>
If id attribute values are dynamic you can use other attributes with static values:
search = driver.find_element_by_xpath('//input[@placeholder="Búsqueda"]')
search.send_keys('MedMake')
button = driver.find_element_by_xpath('//button[normalize-space()="Gente"]')
button.click()
For the first one, use the XPath
//input[contains(@class,'ember-text-field')]
For the second one, use the XPath
//button[@data-control-name='vertical_nav_people_toggle']
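The idea of selecting by a static attribute instead of a generated id can be tried without a browser. The sketch below is an illustration only: it uses the standard library's ElementTree, which understands a small XPath subset, on a trimmed and well-formed copy of the markup from the question, and the wrapping <div> is an assumption added so that the snippet parses.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's markup, made well-formed for the parser.
snippet = """
<div>
  <input placeholder="Búsqueda" id="a11y-ember6214"
         class="ember-text-field ember-view"/>
  <button data-vertical="PEOPLE"
          data-control-name="vertical_nav_people_toggle">
    Gente
  </button>
</div>
"""

root = ET.fromstring(snippet)

# Match on attributes whose values stay the same between logins.
search = root.find(".//input[@placeholder='Búsqueda']")
button = root.find(".//button[@data-vertical='PEOPLE']")

print(search.get("id"))
print(button.text.strip())

# Usage sketch with Selenium 4 (not run here), where the
# find_element_by_* helpers were replaced by find_element(By..., ...):
#   from selenium.webdriver.common.by import By
#   driver.find_element(By.XPATH, "//button[@data-vertical='PEOPLE']").click()
```

The name="value" pairs inside a tag are called attributes, and any attribute with a stable value can be used in the [@name='value'] predicate.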

Selenium to push button in form

Python: 3.4.1
Browser: Chrome
I'm trying to push a button which is located in a form using Selenium with Python. I'm fairly new to Selenium and HTML.
The HTML code is as follows:
<FORM id='QLf_437222' method='POST' action='xxxx'>
<script>document.write("<a href='javascript:void(0);' onclick='document.getElementById(\"QLf_437222\").submit();' title='xxx'>51530119</a>");</script>
<noscript><INPUT type='SUBMIT' value='51530119' title='xxx' name='xxxx'></noscript>
<INPUT type=hidden name="prodType" value="DDA"/>
<INPUT type=hidden name="BlitzToken" value="BlitzToken"/>
<INPUT type=hidden name="productInfo" value="40050951530119"/>
<INPUT type=hidden name="reDirectionURL" value="xxx"/>
</FORM>
I've been trying the following:
driver.execute("javascript:void(0)")
driver.find_element_by_xpath('//*[@id="QLf_437104"]/a').click()
driver.find_element_by_xpath('//*[@id="QLf_437104"]/a').submit()
driver.find_element_by_css_selector("#QLf_437104 > a").click()
driver.find_element_by_css_selector("#QLf_437104 > a").submit()
Python doesn't throw an exception, so it seems like I'm clicking something, but it doesn't do what I want.
In addition to this the webpage acts funny when the chrome driver is initialized from Selenium. When clicking the button in the initialized chrome driver, the webpage throws an error (888).
I'm not sure where to go from here. Might it be something with the hidden elements?
If I can provide additional information please let me know.
EDIT:
It looks like the form id changes sometimes.
It sounds like you are trying to submit the form, right?
The <a> that you are pointing out simply submits that form. Since it is injected via JavaScript, it's possible that it isn't present when you try to click it. What I'd recommend is:
driver.find_element_by_css_selector("form[id^='QLf']").submit()
That will avoid the button, and submit the appropriate form.
In the CSS selector above, I also used [id^='QLf'], which means: find a <form> whose id attribute starts with QLf, because the numbers after it appear to be generated automatically.

Using Python and Mechanize with ASP Forms

I'm trying to submit a form on an .asp page but Mechanize does not recognize the name of the control. The form code is:
<form id="form1" name="frmSearchQuick" method="post">
....
<input type="button" name="btSearchTop" value="SEARCH" class="buttonctl" onClick="uf_Browse('dledir_search_quick.asp');" >
My code is as follows:
br = mechanize.Browser()
br.open(BASE_URL)
br.select_form(name='frmSearchQuick')
resp = br.click(name='btSearchTop')
I've also tried the last line as:
resp = br.submit(name='btSearchTop')
The error I get is:
raise ControlNotFoundError("no control matching "+description) ControlNotFoundError: no control matching name 'btSearchTop', kind 'clickable'
If I print br I get this: IgnoreControl(btSearchTop=)
But I don't see that anywhere in the HTML.
Any advice on how to submit this form?
The button doesn't submit the form - it calls some javascript function.
Mechanize can't run javascript, so you can't use it to click that button.
The easy way out is to read that function yourself, and see what it does - if it just submits the form, then maybe you can get around it by submitting the form without clicking on anything.
You need to inspect the page first: did mechanize recognize the form?
for form in br.forms():
    print(form)
