I wrote a bot that used Selenium to scrape all the data it needed and performed a few simple tasks. I don't know why I didn't use HTTP requests from the start, but I am now trying to switch to that. One of the Selenium functions used a simple driver.get(url) to trigger an action on the site. Using requests.get, however, does not work.
This Selenium code worked:
import time
from selenium import webdriver

AM4_URL = 'https://www.airline4.net/?gameType=app&uid=102692112805909972638&uid_token=8adee69e774d89fb6e9f903e7d2afc70&mail=bsgpricecheck@gmail.com&mail_token=286f8bd25bcc32f49a02036102ce072c&device=ios&version=6&FCM=daf5d0d8bf4d7962061eac3a8e4bffa770d6593f31fd5b070d690f244dfb40d1#'

def depart():
    # Load the driver and open the login URL (pax_rep is set elsewhere in the bot)
    if pax_rep > 80:
        driver = webdriver.Firefox(executable_path=r'C:\webdrivers\geckodriver.exe')
        driver.get(AM4_URL)
        driver.minimize_window()
        # Hitting this URL is what triggers the depart action on the site
        driver.get("https://www.airline4.net/route_depart.php?mode=all&ids=x")
        time.sleep(100)

def randfunc():
    depart()
But now I'm trying to switch over to requests, because all the other bot functions work with it. I tried this, and it doesn't perform the action:
import requests
# I was able to combine the URLs into one. It still performs the action when on a browser.
dep_url = 'https://www.airline4.net/route_depart.php?mode=all&ids=x?gameType=app&uid=102692112805909972638&uid_token=8adee69e774d89fb6e9f903e7d2afc70&mail=bsgpricecheck@gmail.com&mail_token=286f8bd25bcc32f49a02036102ce072c&device=ios&version=6&FCM=daf5d0d8bf4d7962061eac3a8e4bffa770d6593f31fd5b070d690f244dfb40d1#'
requests.get(dep_url)
I figured this code would work because the URL doesn't return any content; I thought it was using a GET request as a command.
I would also like to note that I got the route_depart.php URL from an Ajax button. Here's the HTML for it:
<div class="btn-group d-flex" role="group">
<button class="btn" style="display:none;" onclick="Ajax('def227_j22.php','runme');"></button>
<button class="btn w-100 btn-danger btn-xs" onclick="Ajax('route_depart.php?mode=all&ids=x','runme',this);">
<span class="glyphicons glyphicons-plane"></span> Depart <span id="listDepartAmount">5</span></button>
</div>
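For reference, here is a minimal sketch of how I'd expect that Ajax call to translate to requests. It assumes the cookies set when loading the login URL are what authorize the action, so it establishes a session first instead of combining everything into one URL (in the combined URL above, the second ? is not a parameter separator, so ids actually becomes 'x?gameType=app'):

import requests

BASE = 'https://www.airline4.net'

# Same login URL used with Selenium above (with the uid/mail tokens)
AM4_URL = BASE + '/?gameType=app&uid=102692112805909972638&uid_token=8adee69e774d89fb6e9f903e7d2afc70&mail=bsgpricecheck@gmail.com&mail_token=286f8bd25bcc32f49a02036102ce072c&device=ios&version=6&FCM=daf5d0d8bf4d7962061eac3a8e4bffa770d6593f31fd5b070d690f244dfb40d1'

with requests.Session() as s:
    # Mirror what Selenium did: load the login URL first so any session
    # cookies are set, then hit the depart endpoint with clean parameters.
    s.get(AM4_URL)
    r = s.get(BASE + '/route_depart.php', params={'mode': 'all', 'ids': 'x'})
    print(r.status_code, r.text[:200])  # inspect the response to verify it worked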
Related
I'm trying to trigger the download of a CSV using requests-html. I believe that when the button is clicked, it triggers "export_csv.php".
<button class='tiny' type='submit' name='action' value='export' style='width:200px;'>Export All Fields</button>
</form>
<form name='export' method='POST' action='../export_csv.php'>
I'm just not sure how to trigger the PHP file with Python. I don't have to do it in requests if there's a better way, but I would like to avoid using Selenium if possible.
I'd share the URL, but it's an internal resource and not available on the web.
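Since the export form declares method='POST' and the submit button contributes name='action' with value='export', a plain requests.post may be enough to emulate the click. A minimal sketch, assuming the button belongs to the form posting to ../export_csv.php (the URL below is a placeholder, since the real one is internal, and the form may include other hidden fields that also need to be sent):

import requests

# Placeholder: resolve '../export_csv.php' against the real (internal) page URL
EXPORT_URL = 'http://internal-server/export_csv.php'

# The clicked submit button contributes action=export to the form data
resp = requests.post(EXPORT_URL, data={'action': 'export'})

# If the server responds with CSV content, save it to disk
with open('export.csv', 'wb') as f:
    f.write(resp.content)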
I am trying to scrape a popular movie database on the Internet. After a few thousand requests, my spider is detected and I have to manually click a simple button to continue my data collection.
I have tried to automate this process by clicking that button automatically. For that purpose, I am using the Selenium library. I have located the XPath of the button but, for some reason, the .click() method doesn't work.
Here is my code:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.options import Options

def click_button():
    options = Options()
    options.headless = True
    options.add_argument("--window-size=1920,1200")
    DRIVER_PATH = 'geckodriver'
    driver = webdriver.Firefox(options=options, executable_path=DRIVER_PATH)
    driver.get('https://www.filmaffinity.com/es/main.html')
    element = driver.find_element(By.XPATH, "//input[@value='Send']")
    element.click()
    driver.quit()
Also, I have tried other common alternatives, such as waiting until the element becomes clickable or visible, but these didn't work. I have also tried to execute the click via JavaScript.
This is the window I see when my spider is detected:
(Screenshot: "Too many requests" window)
As you can see, there is a reCAPTCHA badge in the lower right corner, but I don't have to solve any CAPTCHA puzzle to confirm I am not a robot; I just have to click the Send button you can see in the picture.
The HTML that contains the button is the following:
<div class="content">
<h1>Too many requests</h1>
<div class="image">
<img height="400" src="/images/too-many-request.jpg">
</div>
<div class="text">
Are you sure you do not dream of electric sheep?
</div>
<form name="toomanyrequest">
<div class="alert">
please enter the Captcha to continue.
</div>
<div>
<input type="submit" value="Send">
</div>
</form>
</div>
What do you think is the problem with my code or approach? I have to find a way to click that button to continue my scraping, but I don't know what I am doing wrong (I have little experience in this field).
Maybe the XPath is not correct? Maybe the CAPTCHA protection is blocking the click action? I don't get any exception when I execute my code, but nothing happens and the "Too many requests" window does not disappear.
Thank you very much.
Try executing the click through JavaScript instead:
driver.execute_script("arguments[0].click();", element)
I created a program to fill out an HTML form in Selenium, but now I want to change it to use requests. However, I've come across a bit of a roadblock: I'm new to requests, and I'm not sure how to emulate a request as if a button had been pressed on the original website. Here's what I have so far:
import requests
import random

# Build a random numeric string for the email's local part
emailRandom = ''
for i in range(6):
    add = random.randint(1, 10)
    emailRandom += str(add)

payload = {
    'email': emailRandom + '@redacted',
    'state_id': '34',
    'tnc-optin': 'on',
}
r = requests.get('redacted.com', data=payload)
The button I'm trying to "click" on the webpage looks like this:
<div class="button-container">
<input type="hidden" name="recaptcha" id="recaptcha">
<button type="submit" class="button red large">ENTER NOW</button>
</div>
What is the default/"clicked" value for this button? Will I be able to use it to submit the form using my requests code?
Using Selenium and using requests are two different things: Selenium submits the form through the rendered HTML UI in your browser, while Python requests just submits the data from your Python code without the HTML UI; no "clicking" of the submit button is involved.
The "submit" button in this case merely triggers the browser to POST the form values.
However, your backend will validate the "recaptcha" token, so you will need to work around that.
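As a minimal sketch, and setting the reCAPTCHA problem aside for a moment, the POST would look like this (the endpoint is a placeholder, since the real one is redacted):

import requests

FORM_URL = 'https://redacted.com/enter'  # placeholder: the real endpoint is redacted

payload = {
    'email': 'someone@example.com',
    'state_id': '34',
    'tnc-optin': 'on',
    # 'recaptcha': '...',  # the backend validates this token; without a valid
    #                      # value the submission will likely be rejected
}

# A submit button triggers a POST of the form values, so use requests.post,
# not requests.get
r = requests.post(FORM_URL, data=payload)
print(r.status_code)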
I recommend capturing the real requests with Fiddler (https://www.telerik.com/fiddler) and then recreating them.
James's answer using Selenium is slower than this.
I'm new here and new to Selenium. I have an SPA written in AngularJS. I'm trying to test a view with the Python WebDriver, but it can't find the elements in the routed page. My question is: how can I test the routed page?
view
<div id="form" align="center" ng-controller="BP">
<input id="topNumber" ng-model="topNum" placeholder="Top Number" type="number" class="form"/>
<br><br>
<input id="botNumber" ng-model="bottomNum" placeholder="Bottom Number" type="number" class="form"/>
<br><br>
<button class="form" id="bp_enter" ng-click="write(topNum, bottomNum)">Enter</button>
</div>
Code (current Python attempt):
from selenium import webdriver
path = r'C:\Users\Mason\Desktop\chromedriver.exe'
var = webdriver.Chrome(path)
var.get("http://localhost/PhpProject30/index.php#!/")
var.find_element_by_id('home').click()
var.find_element_by_id('bp_btn').click()
var.find_element_by_id('BP_enter_btn').click()
var.get("http://localhost/PhpProject30/index.php#!/BP_Enter")
var.find_element_by_id('bp_enter').click()
It turns out that Selenium was executing too fast: it was looking for an element in the view before the view had loaded. I put a pause in there to give the view some time, and it worked like a charm.
All I did was the following:
import time
time.sleep(.5)
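A fixed sleep works, but an explicit wait is usually more robust, since it waits only as long as needed and fails loudly on a timeout. A sketch of the same click with WebDriverWait, reusing the driver (var) and element ID from the code above:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait up to 10 seconds for the routed view to render the Enter button,
# then click it as soon as it becomes clickable
button = WebDriverWait(var, 10).until(
    EC.element_to_be_clickable((By.ID, 'bp_enter'))
)
button.click()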
I am attempting to scrape the website flow.gassco.no as one of my first Python projects. I need to bypass the splash screen, which redirects to the main page. I have isolated the following action:
<form method="get" action="acceptDisclaimer">
<input type="submit" value="Accept"/>
<input type="button" name="decline" value="Decline" onclick="window.location = 'http://www.gassco.no'" />
</form>
In a browser, appending 'acceptDisclaimer?' to the URL redirects to the target flow.gassco.no. However, if I try to replicate this in urllib, I appear to stay on the same page when outputting the source:
import urllib, urllib2
url="http://flow.gassco.no/acceptDisclaimer?"
url2="http://flow.gassco.no/"
#first pass to invoke disclaimer
req=urllib2.Request(url)
res=urllib2.urlopen(req)
#second pass to access main page
req1=urllib2.Request(url2)
res2=urllib2.urlopen(req1)
data=res2.read()
print data
I suspect that I have oversimplified the problem, but I would appreciate any input on how I can accept the disclaimer and go on to output the main page source.
Use a cookiejar (see python: urllib2 how to send cookie with urlopen request):
Open the main URL first.
Open the /acceptDisclaimer after that.
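A minimal sketch of that flow with urllib2 and cookielib, matching the question's Python 2 code; the cookie set on the first visit is what makes the /acceptDisclaimer request stick:

import cookielib
import urllib2

# One opener shared across all requests, so cookies persist between them
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

opener.open('http://flow.gassco.no/')                   # 1. pick up the session cookie
opener.open('http://flow.gassco.no/acceptDisclaimer')   # 2. accept the disclaimer
data = opener.open('http://flow.gassco.no/').read()     # 3. fetch the main page
print data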