How to manage Chrome local storage - Python

I'm using Selenium + Python + ChromeDriver to test a web application. The application contains tables whose data can be sorted using various embedded filters. The problem is that after the first test executes, the application saves its current state (such as which table page is open and which sorting method is applied) in the browser's local storage, so when the next test starts the data appears already filtered. But I need the default filters for each test, so I need to set default key:value pairs or clear the storage before each test case. I found this solution:
driver.get('javascript:localStorage.clear();')
but get
selenium.common.exceptions.WebDriverException: Message: unknown error:unsupported protocol
How can I manage (change or clear) Chrome local storage using Selenium?

You should execute the script instead:
driver.execute_script('window.localStorage.clear();')
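Beyond clearing, you can seed default key:value pairs the same way before each test. A minimal sketch, assuming a Selenium `driver` already exists; the storage keys below are hypothetical, use whatever keys your application actually stores:

```python
import json

def local_storage_set_script(defaults):
    """Build a JS snippet that clears localStorage, then seeds default key:value pairs."""
    pairs = "".join(
        f"window.localStorage.setItem({json.dumps(k)}, {json.dumps(v)});"
        for k, v in defaults.items()
    )
    return "window.localStorage.clear();" + pairs

# Usage before each test case (driver is your Selenium WebDriver):
# driver.execute_script(local_storage_set_script({"sortOrder": "default", "tablePage": "1"}))
```

Building the script string in Python keeps the JSON escaping correct for keys or values containing quotes.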

Related

How to get a specific resource with a GET request using python3?

Hello, I am trying to solve this problem. How would one go about getting a specific resource, e.g. "resources/2021/helloworld", from a web service at a specific URL (example.com/example) using Python? I also need to specify a user agent (Chrome on Android in this case) and a referring link (e.g. google.com/resourcelink). Then, preferably, print the text of the resource or write it to a file.
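A minimal sketch of how this could look with the standard library; the URL, user-agent string, and referrer below are placeholder values taken from the question, so substitute the real ones:

```python
import urllib.request

# Placeholder URL assembled from the question's example values.
url = "https://example.com/example/resources/2021/helloworld"

req = urllib.request.Request(url, headers={
    # Example Chrome-on-Android user agent string; use the exact one you need.
    "User-Agent": ("Mozilla/5.0 (Linux; Android 10) AppleWebKit/537.36 "
                   "(KHTML, like Gecko) Chrome/96.0.4664.45 Mobile Safari/537.36"),
    # Referring link from the question.
    "Referer": "https://google.com/resourcelink",
})

# Sending the request and printing/saving the body would look like:
# with urllib.request.urlopen(req) as resp:
#     text = resp.read().decode()
# print(text)  # or: open("out.txt", "w").write(text)
```

The third-party `requests` library offers the same via `requests.get(url, headers=...)`, if it is available.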

Chrome from selenium won't sync

I am trying to open a Selenium browser with the sync option enabled. Whether I specify --user-data-dir (loading all the cookies and extensions from my Google account into Selenium) or not, it still won't let me sync. A way to check whether the browser is allowed to sync is to go to Settings -> People: if the browser is syncing, a "Sync on" label/option appears under your account name; if it is not, that option is missing, so you cannot change it.
I have also tried changing the Chrome SyncDisabled policy globally through the registry editor as specified here, but Selenium still won't sync (even though in chrome://policy/ the SyncDisabled policy is set to false).
By "won't sync" I mean that if, for example, I add or change a password on a site, the new password won't be synced to my account, which is basically my objective. Just to clarify: when I open Chrome normally (not through Selenium), the sync option is, of course, available.
So my question, I guess, is: how can I open Chrome through Selenium with the sync option enabled?
I finally found a solution. When you start Chrome through Selenium, it passes the --disable-sync argument itself even if you don't specify it: visit chrome://version in a freshly started Selenium browser and you will see --disable-sync in the command-line list. So I first excluded it like so:
options.add_experimental_option('excludeSwitches', ['disable-sync'])
and then I enabled it like so:
options.add_argument('--enable-sync')
since if I hadn't excluded it before enabling it, both --disable-sync and --enable-sync would appear in the command-line list.

Node.js scraping with chrome-remote-interface

I have been trying to scrape a website protected by Distil Networks, and using Selenium (with Python) always fails.
I did a few searches, and my conclusion is that the site can detect that you are using Selenium via some sort of JavaScript. I then took a look at chrome-remote-interface, which seems to be what I want, but then I got stuck.
What I would like to do is automate the following steps:
Open a Chrome instance
Navigate to a page
Run some javascript
Collect data and save to file
Repeat steps 2 - 4
I know that I can open an instance of Chrome for debugging with:
google-chrome --remote-debugging-port=9222
And I can open a console on node by:
chrome-remote-interface -t 127.0.0.1 -p 9222 inspect -r
I can also run simple scripts like
Page.navigate({url:"https://google.com"})
Runtime.evaluate({expression:"1+1"})
But I can't access the DOM directly in Node.js the way I can in the Chrome Developer Tools console. Basically, what I want is to run scripts in Node just as I would in the DevTools console.
Also, there isn't much documentation on chrome-remote-interface for scraping. Are there any good links for that?
I know this question was asked two years ago, but let me answer it here for documentation purposes.
-- Tools of the trade --
I tried the same technique you did (using the remote debugger for scraping), but instead of Python I used Node.js, because its asynchronous nature makes it easier to work with the WebSockets that the remote debugger relies on.
-- Runtime.evaluate --
One thing I noted is that Runtime.evaluate isn't a valid option for recovering data when your expression involves asynchronous calls, because it returns the result of the calling function, not of the callback. You have to stick with synchronous expressions.
Example:
Array.from(document.getElementsByTagName('tr'))
    .map((e) => e.children[2].innerHTML)
    .filter((e) => e.length > 0)
Another thing: when your expression returns an array, Runtime.evaluate just mentions that the expression returned an array, not the array itself! (Infuriating, I know.)
I got around it by encoding the array as a JSON string in the page context, then decoding it back into an object when it arrives in Node.js. For example, the above expression would need to be:
JSON.stringify(
    Array.from(document.getElementsByTagName('tr'))
        .map((e) => e.children[2].innerHTML)
        .filter((e) => e.length > 0)
)
-- Navigation --
When you trigger a page load using Page.navigate, .click(), .submit(), window.location.href = ..., or any other way, it's important to know when the next page has completely loaded before sending more instructions with Runtime.evaluate.
I did the trick by asking the debugger to send me the page-loading events (look for the Page.enable method in the documentation), then waiting for the Page.loadEventFired event before sending more expressions.
JavaScript expressions evaluated by Runtime.evaluate are executed within the page context just like what happens in the DevTools console.
You can interact with the DOM using the DOM domain, e.g., DOM.getDocument, DOM.querySelector, etc.
Also remember that chrome-remote-interface is primarily a library, meaning it lets you write your own Node.js applications; the chrome-remote-interface inspect command is just a utility.
There are several places where you can get help:
open an issue on chrome-remote-interface;
the chrome-remote-interface wiki;
the Chrome DevTools Protocol Viewer;
the Chrome Debugging Protocol Google Group.
If you ask something more specific I'd be happy to try to help you with that.
Finally you may want to take a look at automated-chrome-profiling, which I think is structurally similar to what you're trying to achieve.

Selenium - Find out where script was downloaded from

I have to validate that a web application, when executed in the client's browser, fetches certain assets (*.js) from a particular remote server.
Say two options exist: either it gets the script from server A or it gets a copy from server B. I need to assert (based on some preconditions) that the script was downloaded from server A.
The question: is there a way to inspect the source URL of loaded JavaScript using Selenium (preferably with Python)?
Here is a possible solution that extracts the URLs of the JavaScript libraries loaded by the Stack Overflow site. You should adapt it to the site you are working on.
from selenium import webdriver

driver = webdriver.Firefox()
driver.get("http://stackoverflow.com/")
scripts = driver.find_elements_by_tag_name('script')
for script in scripts:
    print(script.get_attribute("src"))
Example of output:
http://rules.quantcount.com/rules-p-c1rF4kxgLUzNc.js
http://edge.quantserve.com/quant.js
http://b.scorecardresearch.com/beacon.js
https://www.google-analytics.com/analytics.js
https://ajax.googleapis.com/ajax/libs/jquery/1.12.4/jquery.min.js
https://cdn.sstatic.net/Js/stub.en.js?v=9798373e8e81
There are various strategies for locating elements on a page; you can use the most appropriate one for your case (http://selenium-python.readthedocs.io/locating-elements.html).
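To turn the approach above into an actual assertion that the script came from server A, you can compare each collected script URL's host against the expected one. A small sketch with the standard library; the helper name and host values are illustrative, not part of any API:

```python
from urllib.parse import urlparse

def script_served_from(script_urls, expected_host):
    """Return True if any script URL's host matches the expected server's host."""
    return any(urlparse(u).netloc == expected_host for u in script_urls)

# Usage with the src values collected via driver.find_elements... above:
urls = [
    "https://cdn.sstatic.net/Js/stub.en.js?v=9798373e8e81",
    "https://ajax.googleapis.com/ajax/libs/jquery/1.12.4/jquery.min.js",
]
assert script_served_from(urls, "cdn.sstatic.net")  # the script came from server A
```

Comparing parsed hosts instead of doing a substring match on the raw URL avoids false positives from query strings or paths that happen to contain the server name.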

Python: Getting current URL

I searched the net but couldn't find anything that works.
I am trying to write a Python script that triggers a timer when a particular URL is opened in the current browser.
How do I obtain the URL from the browser?
You cannot do this in a platform-independent way.
On Windows, you need to use pywin32 (or any other suitable module that provides access to the platform API, for example pywm) to get hold of the browser window (you can find it by its window name). After that, you have to walk its child windows to reach the one that holds the URL string, and finally read its text.
