How to change geolocation of chrome selenium driver in Python? - python

I am trying to trick the chromedriver to make it believe that it is running in a different city. Under normal circumstances, this can easily be done manually as shown in a quick diagram
Then, when a google search is done, the new coordinates are used, and the results that would normally originate from that location are displayed. You can confirm that this worked when you look at the bottom of a Google search page as seen
.
However, Selenium can only control what the browser displays, not the browser in itself. I cannot tell Selenium to automatically click the buttons needed to change the coordinates. I tried the solutions posted here but that is not meant for Python, and even after I tried to adapt the script, nothing seemed to happen.
Is there a browser.execute_script() argument that could work, or is this the wrong way to change the geolocation?

You can do this by importing Selenium DevTools package. Please refer below for complete java code sample:
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.devtools.DevTools;
public void geoLocationTest(){
ChromeDriver driver = new ChromeDriver();
Map coordinates = new HashMap()
{{
put("latitude", 50.2334);
put("longitude", 0.2334);
put("accuracy", 1);
}};
driver.executeCdpCommand("Emulation.setGeolocationOverride", coordinates);
driver.get("<your site url>");
}
Reference : Selenium Documentation

Try this code below :
driver.execute_script("window.navigator.geolocation.getCurrentPosition=function(success){"+
"var position = {\"coords\" : {\"latitude\": \"555\",\"longitude\": \"999\"}};"+
"success(position);}");
print(driver.execute_script("var positionStr=\"\";"+
"window.navigator.geolocation.getCurrentPosition(function(pos){positionStr=pos.coords.latitude+\":\"+pos.coords.longitude});"+
"return positionStr;"))

In search for a solution to the same problem I also came across the already postet scripts. While I am yet to find a solution, I assume that the scripts don't work because they do not change the sensors permanently. The sensors are only changed for that one specific call of window.navigator.geolocation.getCurrentPosition.
The website (in this case google) will later call the same function but with the regular (unchanged) geolocation. Happy to hear solutions to permanently change the sensors to then also affect future geolocation requests.

This Can be Done Using Selenium 4.
HashMap<String ,Object> coordinate = new HashMap<String ,Object>();
coordinate.put("latitude", 39.913818);
coordinate.put("longitude", 116.363625);
coordinate.put("accuracy", 1);
((ChromeDriver)driver).executeCdpCommand("Emulation.setGeolocationOverride",coordinate);
driver.navigate().to("URL");

Related

Extracting info from webpage via python

I'd like to ask somebody with experience with headless browsers and python if it's possible to extract box info with distance from closest strike on webpage below. Till now I was using python bs4 but since everything is driven by jQuery here simple download of webpage doesn't work. I found PhantomJS but I wasn't able extract it too so I am not sure if it's possible. Thanks for hints.
https://lxapp.weatherbug.net/v2/lxapp_impl.html?lat=49.13688&lon=16.56522&v=1.2.0
This isn't really a Linux question, it's a StackOverflow question, so I won't go into too much detail.
The thing you want to do can be easily done with Selenium. Selenium has both a headless mode, and a heady mode (where you can watch it open your browser and click on things). The DOM query API is a bit less extensive than bs4, but it does have nice visual query (location on screen) functions. So you would write a Python script that initializes Selenium, goes to your website and interacts with it. You may need to do some image recognition on screenshots at some point. It may be as simple as finding for a certain query image on the screen, or something much more complicated.
You'd have to go through the Selenium tutorials first to see how it works, which would take you 1-2 days. Then figure out what Selenium stuff you can use to do what you want, that depends on luck and whether what you want happens to be easy or hard for that particular website.
Instead of using Selenium, though, I recommend trying to reverse engineer the API. For example, the page you linked to hits https://cmn-lx.pulse.weatherbug.net/data/lightning/v1/spark with parameters like:
_
callback
isGpsLocation
location
locationtype
safetyMessage
shortMessage
units
verbose
authid
timestamp
hash
You can figure out by trial and error which ones you need and what to put in them. You can capture requests from your browser and then read them yourself. Then construct appropriate requests from a Python program and hit their API. It would save you from having to deal with a Web UI designed for humans.

PhantomJS loads much less HTML than other drivers

I'm trying to load one web page and get some elements from it. So the first thing I do is to check the page using "inspect element". When I search for the tags I'm looking for, I can see them (in Chrome).
But when I try to do driver.get(url) and then driver.find_element_by_..., it doesn't find those elements because they aren't in the source code.
I think that it is probably because it doesn't load the whole page but only a part.
Here is an example:
I'm trying to find ads on the web page.
PREPARED_TABOOLA_BLOCK = """//div[contains(#id,'taboola') and not(ancestor::div[contains(#id,'taboola')])]"""
driver = webdriver.PhantomJS(service_args=["--load-images=false"])
# driver = webdriver.Chrome()
driver.maximize_window()
def find_taboola_blocks_selenium(url):
driver.get(url)
taboola_blocks = driver.find_elements_by_xpath(PREPARED_TABOOLA_BLOCK)
return taboola_blocks
print len(find_taboola_blocks_selenium('http://www.breastfeeding-problems.com/breastfeeding-a-sick-baby.html'))
driver.get('http://www.breastfeeding-problems.com/breastfeeding-a-sick-baby.html')
print len(driver.page_source)
OUTPUTS:
Using PhantomJS:
0
85103
Using ChromeDriver:
3
420869
Do you know how to make PhantomJS to load as much Html as possible or any other way to solve this?
Can you compare the request that ChromeDriver is making versus the request you are making in PhantomJS? Since you are only doing GET for the specified url, you may not be including other request parameters that are needed to get the advertisements.
The open() method may give you a better representation of what you are looking for here: http://phantomjs.org/api/webpage/method/open.html
The reason for this is because PhantomJS, by default, renders in a really small window, which makes it load the mobile version of the site. And with the PhantomJSDriver, calling maximizeWindow() (or maximize_window() in python) does absolutely nothing, since there is no rendered window to maximize. You will have to explicitly set the window's render size with:
edit: Below is the Java solution. I'm not entirely sure what the Python solution would be when setting the window size, but it should be similar.
driver.manage().window().setSize(new Dimension(1920, 1200));
edit again: Found the python version:
driver.set_window_size(1920, 1200)
Hope that helps!
PhantomJS 1.x is a really old browser. It only uses SSLv3 (now disabled on most sites) by default and doesn't implement most cutting edge functionality.
Advertisement scripts are usually delivered over HTTPS (SSLv3/TLS) and usually use some obscure feature of JavaScript which is not well tested or simply not implemented in PhantomJS.
If you use PhantomJS < v1.9.8 then you should use those commandline options (service_args): --ignore-ssl-errors=true --ssl-protocol=any.
If iframes or strange cross-domain requests are necessary for the page/ads to work, then add --web-security=false to the service_args.
If this still doesn't solve the problem, then try upgrading to PhantomJS 2.0.0. You might need to compile it yourself on Linux.

How to determine the size of html table in pixels given an html file

I have a html file that has various html tags in it. This html also has a bunch of tables in it. I am processing this file using python. How do I find out what the size (length x width in pixels) when it is rendered by a browser (preferably chrome or firefox)?
I am essentially looking for the information when you do "inspect element" on a browser, and you are able to see the size of the various elements. I want to access this size in my python code.
I am using lxml to parse my html and can use selenium if needed.
edit: added #node.js incase I can use it to spit out the size of all the tables in a shell script and I can grab it in python.
You're going to want to use Selenium WebDriver to open the HTML file in an actual browser installed on the computer that your Python code is running on.
I'm not sure how you'd use the Selenium WebDriver API to find out how tall a rendered table is, but the value_of_css_property method might do it.
If you can call out shellscript, and you can use Node.js, I'm assuming you could also install and use PhantomJS, which is a headless WebKit port. (I.e. an actual honest to goodness WebKit renderer that just doesn't require a window to work.) This will let you use Javascript and the familiar web libraries to manipulate the document. As an example, the following gets you the width of the logo element towards the upper left Stack Overflow site:
page = require('webpage').create(); // create a new "browser"
page.open('http://stackoverflow.com/', function() {
// callback when loading completes
var logoWidth = page.evaluate(function() {
// This runs in the rendered page and uses the version of jQuery that SO loads.
return $('#hlogo').width();
});
console.log(logoWidth); // prints 250, the same as Chrome.
phantom.exit(); // for some reason you need to exit manually
});
The documentation for PhantomJS will tell you more about what you can do with it and how.
One caveat however is that loading a page takes a while, since it needs to fetch CSS and scripts and generally do everything a browser does. I'm not sure if and how PhantomJS does any caching, if it does it might make sense to reuse the same process for multiple scrapes of the same site.

Python webdriver in IE returns None for all find_elements

I am developing tests using the latest version of Selenium 2 in python, installed with pip install -U selenium. I have a series of tests that run correctly using webdriver.Firefox(), but do not with webdriver.Ie(). It opens and navigates to the page correctly, but any attempt to access elements in that page fails. It does not appear to be a problem in other pages, but I cannot identify what would be causing the problem with mine.
I can generate the problem easily by building an instance of the webdriver with:
from selenium import webdriver
browser = webdriver.Ie()
browser.get("page url")
browser.find_elements_by_tag_name("html") #returns None!
I'm looking for any clues as to why this might be.
If you are exactly using this code, there are absolutely some problems.
First, I think it's better to assign return value of find to a variable and then check its value; I mean html_tags = browser.find_elements_by_tag_name("html").
After that, I thin you should use browser.find_element_by_tag_name("html"); because every page has just one tag with name "html". Your code is true if you want to achieve all tag elements with name "html".
Finally I think you want to access page source; in that case, you should use one of these:
html_text = browser.find_element_by_tag_name("html").text
html_text = browser.find_elements_by_tag_name("html")[0].text
Just to cover all the states, could you please tell me what is the URL that you are trying to access; because I tried multiple webpages with your code (exactly yours, not after my own edits) and everything was fine.

How to delete Firefox cookies from webdriver in python?

when I can't delete FF cookies from webdriver. When I use the .delete_all_cookies method, it returns None. And when I try to get_cookies, I get the following error:
webdriver_common.exceptions.ErrorInResponseException: Error occurred when processing
packet:Content-Length: 120
{"elementId": "null", "context": "{9b44672f-d547-43a8-a01e-a504e617cfc1}", "parameters": [], "commandName": "getCookie"}
response:Length: 266
{"commandName":"getCookie","isError":true,"response":{"lineNumber":576,"message":"Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIDOMLocation.host]","name":"NS_ERROR_FAILURE"},"elementId":"null","context":"{9b44672f-d547-43a8-a01e-a504e617cfc1} "}
How can I fix it?
Update:
This happens with clean installation of webdriver with no modifications. The changes I've mentioned in another post were made later than this post being posted (I was trying to fix the issue myself).
Hmm, I actually haven't worked with Webdriver so this may be of no help at all... but in your other post you mention that you're experimenting with modifying the delete cookie webdriver js function. Did get_cookies fail before you were modifying the delete function? What happens when you get cookies before deleting them? I would guess that the modification you're making to the delete function in webdriver-read-only\firefox\src\extension\components\firefoxDriver.js could break the delete function. Are you doing it just for debugging or do you actually want the browser itself to show a pop up when the driver tells it to delete cookies? It wouldn't surprise me if this modification broke.
My real advice though would be actually to start using Selenium instead of Webdriver since it's being discontinued in it's current incarnation, or morphed into Selenium. Selenium is more actively developed and has pretty active and responsive forms. It will continue to be developed and stable while the merge is happening, while I take it Webdriver might not have as many bugfixes going forward. I've had success using the Selenium commands that control cookies. They seem to be revamping their documentation and for some reason there isn't any link to the Python API, but if you download selenium rc, you can find the Python API doc in selenium-client-driver-python, you'll see there are a good 5 or so useful methods for controlling cookies, which you use in your own custom Python methods if you want to, say, delete all the cookies with a name matching a certain regexp. If for some reason you do want the browser to alert() some info about the deleted cookies too, you could do that by getting the cookie names/values from the python method, and then passing them to selenium's getEval() statement which will execute arbitrary js you feed it (like "alert()"). ... If you do go the selenium route feel free to contact me if you get a blocker, I might be able to assist.

Categories

Resources