Selenium change firefox browser language python mac - python

I've spent 3 hours trying to find a solution to this problem, and I'm so far from an answer I feel like I have to ask. (I've found similar posts - below is an explanation of why I'm asking despite finding these posts)
I'm writing a program that enters several search terms into Google and retrieves the ranking of my page. I want to know my ranking in the SERPs of several different countries. I've solved the IP address issue, but now I see that Google factors in the language of my browser when returning a SERP. To get closer to the true rank of my page in a country (I say closer because I've also seen that the SERP depends on search history), I have to use a web browser set to a language native to the country I'm interested in.
Changing the language in Firefox manually is difficult; in fact, you have to install several different language versions (I've also read about installing language packs, but I'm unsure whether this is relevant for Firefox 12). I have no idea how to get Selenium to choose the right Firefox version.
I'm having a hard time understanding what needs to be done. Do I have to specify which Firefox installation/version Selenium is supposed to use when launching webdriver.Firefox()? Or is it possible to set the browser language by changing the Firefox profile?
I've spent some time looking into the profile part and found partial evidence (original post), although I can't find any reference to language in the profile files.
The answer in the same post seems to have solved the problem, but I don't know the programming language it's written in and I'm having trouble understanding what's actually being done.
I know there's a Firefox add-on for switching between language versions (you first have to install the language versions of Firefox that you want). Given that its settings mention changing the "general.useragent.locale" preference, I'm thinking it's a profile setting that can be changed, but the add-on requires a browser restart when you change the language, so I'm not sure.
I can't find anything about general.useragent.locale in the profile settings.
Anyone who can point me in the right direction would make my day!
EDIT:
Forgot to mention: I only know Python, which is why I put Python in the title.

OK, I have to agree that this perhaps isn't the best way to approach the problem, but here is the answer to what you need:
from selenium import webdriver
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
# Firefox installation (e.g. a localized build) that Selenium should launch
binary = FirefoxBinary('path/to/binary')
profile = webdriver.FirefoxProfile()
profile.add_extension('path/to/xpi')  # XPI needs to be on disk and not downloaded from AMO
# UI locale preference, e.g. 'de' for German
profile.set_preference('general.useragent.locale', '<enter your value>')
driver = webdriver.Firefox(firefox_binary=binary, firefox_profile=profile)
# Carry on with what you want
The pydocs are available here
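If the goal is only to get Google to serve results in another language, a lighter alternative (a different technique from the answer above, not something it uses) may be to set Firefox's `intl.accept_languages` preference, which controls the `Accept-Language` header the browser sends, instead of installing a localized binary. A minimal sketch, assuming Selenium 4's `Options.set_preference`; `accept_languages` and `make_localized_firefox` are hypothetical helper names, and whether Google weighs this header over IP, cookies or search history is an assumption you would need to verify:

```python
def accept_languages(locale: str) -> str:
    """Build a value for Firefox's intl.accept_languages pref (hypothetical helper).

    'de-DE' -> 'de-de, de': the regional tag first, then the bare language as a fallback.
    """
    tag = locale.strip().lower().replace("_", "-")
    primary = tag.split("-")[0]
    return tag if tag == primary else f"{tag}, {primary}"


def make_localized_firefox(locale: str):
    """Launch Firefox with the given Accept-Language preference (needs Selenium 4 and geckodriver)."""
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    # Assumption: the language sites see comes from the Accept-Language header
    options.set_preference("intl.accept_languages", accept_languages(locale))
    return webdriver.Firefox(options=options)
```

Usage would be something like `driver = make_localized_firefox("de-DE")` followed by your usual `driver.get(...)` calls.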

Related

Extracting info from webpage via python

I'd like to ask someone with experience with headless browsers and Python whether it's possible to extract the box info showing the distance to the closest strike on the webpage below. Until now I was using Python with bs4, but since everything here is driven by jQuery, simply downloading the webpage doesn't work. I found PhantomJS, but I wasn't able to extract it with that either, so I'm not sure it's possible. Thanks for any hints.
https://lxapp.weatherbug.net/v2/lxapp_impl.html?lat=49.13688&lon=16.56522&v=1.2.0
This isn't really a Linux question, it's a StackOverflow question, so I won't go into too much detail.
The thing you want to do can be done fairly easily with Selenium. Selenium has both a headless mode and a headed mode (where you can watch it open your browser and click on things). Its DOM query API is a bit less extensive than bs4's, but it does have nice visual query functions (location on screen). So you would write a Python script that initializes Selenium, goes to your website, and interacts with it. At some point you may need to do some image recognition on screenshots. That may be as simple as searching for a certain query image on the screen, or something much more complicated.
You'd have to go through the Selenium tutorials first to see how it works, which would take you 1-2 days. Then figure out which Selenium features you can use to do what you want; that depends on luck and on whether what you want happens to be easy or hard on that particular website.
Instead of using Selenium, though, I recommend trying to reverse engineer the API. For example, the page you linked to hits https://cmn-lx.pulse.weatherbug.net/data/lightning/v1/spark with parameters like:
_
callback
isGpsLocation
location
locationtype
safetyMessage
shortMessage
units
verbose
authid
timestamp
hash
You can figure out by trial and error which ones you need and what to put in them. You can capture requests from your browser and then read them yourself. Then construct appropriate requests from a Python program and hit their API. It would save you from having to deal with a Web UI designed for humans.
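A minimal sketch of the API approach using only the standard library. It assumes, without having checked, that the endpoint returns JSONP (a JSON body wrapped in a function call) when the `callback` parameter is set, which the parameter list suggests. `build_spark_url`, `parse_jsonp` and `fetch_spark` are hypothetical helpers, the parameter values have to be copied from a captured browser request, and `authid`/`timestamp`/`hash` look like request signing that you would still have to reproduce:

```python
import json
import re
from urllib.parse import urlencode
from urllib.request import urlopen

SPARK_URL = "https://cmn-lx.pulse.weatherbug.net/data/lightning/v1/spark"

# Matches a JSONP wrapper such as  jQuery123_456({...});
_JSONP = re.compile(r"^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$", re.DOTALL)


def build_spark_url(params: dict) -> str:
    """Append the query parameters (copied from a captured request) to the endpoint."""
    return f"{SPARK_URL}?{urlencode(params)}"


def parse_jsonp(text: str):
    """Strip a JSONP callback wrapper if present, then parse the JSON inside it."""
    match = _JSONP.match(text)
    return json.loads(match.group(1) if match else text)


def fetch_spark(params: dict):
    """Request the endpoint and decode the response (network access required)."""
    with urlopen(build_spark_url(params), timeout=10) as resp:
        return parse_jsonp(resp.read().decode("utf-8"))
```

Leaving out `callback` may make the server return plain JSON instead; `parse_jsonp` handles both cases.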

How to change geolocation of chrome selenium driver in Python?

I am trying to trick ChromeDriver into believing that it is running in a different city. Under normal circumstances, this can easily be done manually, as shown in a quick diagram.
Then, when a Google search is done, the new coordinates are used, and the results that would normally originate from that location are displayed. You can confirm that this worked by looking at the bottom of a Google search page.
However, Selenium can only control what the browser displays, not the browser itself. I cannot tell Selenium to automatically click the buttons needed to change the coordinates. I tried the solutions posted here, but they are not meant for Python, and even after I tried to adapt the script, nothing seemed to happen.
Is there a browser.execute_script() argument that could work, or is this the wrong way to change the geolocation?
You can do this with Selenium's DevTools support. Here is a complete Java code sample:
import java.util.HashMap;
import java.util.Map;
import org.openqa.selenium.chrome.ChromeDriver;
public void geoLocationTest() {
    ChromeDriver driver = new ChromeDriver();
    // Parameters for the CDP command Emulation.setGeolocationOverride
    Map<String, Object> coordinates = new HashMap<>();
    coordinates.put("latitude", 50.2334);
    coordinates.put("longitude", 0.2334);
    coordinates.put("accuracy", 1);
    driver.executeCdpCommand("Emulation.setGeolocationOverride", coordinates);
    driver.get("<your site url>");
}
Reference: Selenium documentation
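Since the question asks for Python, here is a rough Python equivalent of the Java answer above. It assumes Selenium 4, where the Chrome driver exposes `execute_cdp_cmd(cmd, cmd_args)` for sending Chrome DevTools Protocol commands; `geolocation_override` and `open_with_location` are hypothetical helpers, the first of which only validates and packages the coordinates:

```python
def geolocation_override(latitude: float, longitude: float, accuracy: float = 1) -> dict:
    """Build the parameters for the CDP command Emulation.setGeolocationOverride."""
    if not -90 <= latitude <= 90:
        raise ValueError("latitude must be between -90 and 90")
    if not -180 <= longitude <= 180:
        raise ValueError("longitude must be between -180 and 180")
    return {"latitude": latitude, "longitude": longitude, "accuracy": accuracy}


def open_with_location(url: str, latitude: float, longitude: float):
    """Start Chrome, override its geolocation, then load the page (needs Selenium 4 + chromedriver)."""
    from selenium import webdriver

    driver = webdriver.Chrome()
    driver.execute_cdp_cmd(
        "Emulation.setGeolocationOverride",
        geolocation_override(latitude, longitude),
    )
    driver.get(url)
    return driver
```

Usage: `driver = open_with_location("<your site url>", 50.2334, 0.2334)`.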
Try the code below:
# Stub getCurrentPosition on the current page so it reports fixed coordinates
driver.execute_script(
    "window.navigator.geolocation.getCurrentPosition = function (success) {"
    "  success({coords: {latitude: 55.5, longitude: 99.9}});"
    "};")
# Read the position back; the stub calls back synchronously, so positionStr is set
print(driver.execute_script(
    "var positionStr = '';"
    "window.navigator.geolocation.getCurrentPosition(function (pos) {"
    "  positionStr = pos.coords.latitude + ':' + pos.coords.longitude;"
    "});"
    "return positionStr;"))
While searching for a solution to the same problem, I also came across the scripts already posted. I have yet to find a solution, but I assume the scripts don't work because they do not change the sensors permanently. The sensors are only changed for that one specific call of window.navigator.geolocation.getCurrentPosition.
The website (in this case Google) will later call the same function, but with the regular (unchanged) geolocation. I'd be happy to hear of solutions that permanently change the sensors so that future geolocation requests are affected too.
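One likely explanation for the behaviour described above is that `execute_script` only patches the page that is currently loaded, so the stub is gone after the next navigation. A sketch of one way around that, assuming Selenium 4 with Chrome: register the stub through the CDP command `Page.addScriptToEvaluateOnNewDocument`, so Chrome runs it before the page's own scripts in every new document. `geolocation_stub` and `install_geolocation_stub` are hypothetical helpers, and only `getCurrentPosition` and `watchPosition` are stubbed:

```python
import json


def geolocation_stub(latitude: float, longitude: float, accuracy: float = 1) -> str:
    """Return JavaScript that replaces the geolocation API with fixed coordinates."""
    coords = json.dumps({"latitude": latitude, "longitude": longitude, "accuracy": accuracy})
    return (
        "(function () {"
        f"  var position = {{coords: {coords}, timestamp: Date.now()}};"
        "  navigator.geolocation.getCurrentPosition = function (success) { success(position); };"
        "  navigator.geolocation.watchPosition = function (success) { success(position); return 0; };"
        "})();"
    )


def install_geolocation_stub(driver, latitude: float, longitude: float) -> None:
    """Ask Chrome to run the stub in every document loaded from now on."""
    driver.execute_cdp_cmd(
        "Page.addScriptToEvaluateOnNewDocument",
        {"source": geolocation_stub(latitude, longitude)},
    )
```

Call `install_geolocation_stub(driver, ...)` before `driver.get(...)`: the script only applies to documents loaded after it has been registered.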
This can be done using Selenium 4:
HashMap<String ,Object> coordinate = new HashMap<String ,Object>();
coordinate.put("latitude", 39.913818);
coordinate.put("longitude", 116.363625);
coordinate.put("accuracy", 1);
((ChromeDriver)driver).executeCdpCommand("Emulation.setGeolocationOverride",coordinate);
driver.navigate().to("URL");

Splinter fill function is very slow, with IE Webdriver

So, I've followed this tutorial to use the Splinter framework with Internet Explorer (https://stirunagari.wordpress.com/2017/08/20/using-internet-explorer-web-driver-with-splinter-framework/), and it's working... well, kind of.
from splinter import Browser
browser = Browser('iexplorer')
browser.visit('http://google.com')
browser.fill('q', 'Text to fill in the search bar')
The search field is being filled, but very slowly: about one keystroke every 1-2 seconds. With Chrome or Firefox as the browser, browser.fill works well.
I know that this issue is most probably because IE is not directly supported by Splinter, but maybe someone knows a workaround or something?
Edit: I don't know which IEDriver I was using before, but I replaced it with IEDriverServer_Win32 from here, and it's working fine now. I can't answer my own question because someone deleted my answer...
I wasn't using the latest Internet Explorer WebDriver; Updated it from here and it's working fine now:
http://selenium-release.storage.googleapis.com/index.html?path=3.8/
Sounds like you are using the 32-bit version of IE Driver. You should use the 64-bit version of IE Driver; I'm not sure why, but it's much faster.

PhantomJS loads much less HTML than other drivers

I'm trying to load one web page and get some elements from it. So the first thing I do is to check the page using "inspect element". When I search for the tags I'm looking for, I can see them (in Chrome).
But when I try to do driver.get(url) and then driver.find_element_by_..., it doesn't find those elements because they aren't in the source code.
I think that it is probably because it doesn't load the whole page but only a part.
Here is an example:
I'm trying to find ads on the web page.
from selenium import webdriver
# Outermost divs whose id contains 'taboola' (nested taboola divs are skipped)
PREPARED_TABOOLA_BLOCK = """//div[contains(@id,'taboola') and not(ancestor::div[contains(@id,'taboola')])]"""
driver = webdriver.PhantomJS(service_args=["--load-images=false"])
# driver = webdriver.Chrome()
driver.maximize_window()
def find_taboola_blocks_selenium(url):
    driver.get(url)
    taboola_blocks = driver.find_elements_by_xpath(PREPARED_TABOOLA_BLOCK)
    return taboola_blocks
print(len(find_taboola_blocks_selenium('http://www.breastfeeding-problems.com/breastfeeding-a-sick-baby.html')))
driver.get('http://www.breastfeeding-problems.com/breastfeeding-a-sick-baby.html')
print(len(driver.page_source))
OUTPUTS:
Using PhantomJS:
0
85103
Using ChromeDriver:
3
420869
Do you know how to make PhantomJS load as much HTML as possible, or any other way to solve this?
Can you compare the request that ChromeDriver is making versus the request you are making in PhantomJS? Since you are only doing GET for the specified url, you may not be including other request parameters that are needed to get the advertisements.
The open() method may give you a better representation of what you are looking for here: http://phantomjs.org/api/webpage/method/open.html
The reason is that PhantomJS, by default, renders in a really small window, which makes it load the mobile version of the site. And with PhantomJSDriver, calling maximizeWindow() (or maximize_window() in Python) does absolutely nothing, since there is no rendered window to maximize. You will have to explicitly set the window's render size with:
edit: Below is the Java solution. I'm not entirely sure what the Python solution would be when setting the window size, but it should be similar.
driver.manage().window().setSize(new Dimension(1920, 1200));
edit again: Found the python version:
driver.set_window_size(1920, 1200)
Hope that helps!
PhantomJS 1.x is a really old browser. It only uses SSLv3 (now disabled on most sites) by default and doesn't implement most cutting edge functionality.
Advertisement scripts are usually delivered over HTTPS (SSLv3/TLS) and often use obscure JavaScript features that are not well tested or simply not implemented in PhantomJS.
If you use a PhantomJS version older than v1.9.8, you should pass these command-line options (service_args): --ignore-ssl-errors=true --ssl-protocol=any.
If iframes or strange cross-domain requests are necessary for the page/ads to work, then add --web-security=false to the service_args.
If this still doesn't solve the problem, then try upgrading to PhantomJS 2.0.0. You might need to compile it yourself on Linux.
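Putting the advice from the last two answers together, a sketch of how the asker's driver setup might look. It assumes Selenium 3.x, since `webdriver.PhantomJS` was removed in Selenium 4; `phantomjs_service_args` and `make_phantomjs_driver` are hypothetical helpers, and the flags are exactly the ones named above:

```python
def phantomjs_service_args(version: tuple, cross_domain: bool = False) -> list:
    """Command-line flags for PhantomJS, following the advice above."""
    args = ["--load-images=false"]
    if version < (1, 9, 8):
        # Older builds default to SSLv3, which most sites have disabled
        args += ["--ignore-ssl-errors=true", "--ssl-protocol=any"]
    if cross_domain:
        # For pages whose ads rely on iframes or cross-domain requests
        args.append("--web-security=false")
    return args


def make_phantomjs_driver(version: tuple, cross_domain: bool = False):
    """Start PhantomJS with a desktop-sized viewport (Selenium 3.x only)."""
    from selenium import webdriver

    driver = webdriver.PhantomJS(service_args=phantomjs_service_args(version, cross_domain))
    # maximize_window() does nothing in PhantomJS, so set the render size explicitly
    driver.set_window_size(1920, 1200)
    return driver
```

The version tuple has to match the PhantomJS binary you actually have installed; the sketch does not detect it.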

automatically edit firefox web address upon pageload, and then reload

My coding experience is in Python. Is there a simple way to run Python code with Firefox that would detect a particular address, say nytimes.com, load the page, then delete the part of the address following "html" (this allows bypassing the 20 pageviews/month limit) and reload?
Your best bet is to use Selenium, as proposed before. Here's a small example of how you could do it. Basically, the code checks whether the limit has been reached, and if it has, it deletes cookies and refreshes the page, letting you continue reading. Deleting cookies lets you read another 10 articles without continuously editing the address. That's the technical part; you have to consider the legal implications yourself.
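from selenium import webdriver
browser = webdriver.Firefox()
browser.get('http://www.nytimes.com')
# find_elements returns an empty list (instead of raising) when nothing matches
limit_notice = browser.find_elements_by_xpath('.//*[contains(.,"You’ve reached the limit of 10 free articles a month.")]')
if limit_notice:
    # Clear the cookies and reload the page
    browser.delete_all_cookies()
    browser.refresh()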
You can use Selenium; it lets you easily and fully control Firefox and other web browsers from Python. It would only take a few lines of code to achieve this. This answer, How to integrate Selenium and Python, has a working example.
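For the literal approach described in the question (removing everything after ".html" and reloading), a rough sketch with the standard library's `urllib.parse`. It assumes the part to remove is the query string and fragment, and makes no claim that the trick still affects any article limit; `strip_query` and `reload_without_query` are hypothetical helper names, and `browser` is a Selenium WebDriver such as the one created above:

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url: str) -> str:
    """Drop the query string (?...) and fragment (#...) from a URL, keeping the path."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def reload_without_query(browser, domain: str = "nytimes.com") -> None:
    """If the current page is on `domain` and has a query or fragment, load the cleaned URL."""
    current = browser.current_url
    cleaned = strip_query(current)
    if domain in urlsplit(current).netloc and cleaned != current:
        browser.get(cleaned)
```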
