Is it possible to call a Chrome extension's method using Python?

I want to write a Chrome extension which gets the page source, and I have found some references (1, 2) on how to do it. However, the end code that would be using this source is in Python. Is there any way I could write a Chrome extension and call its methods from Python?
Note:
I have tried using Selenium to get the browser's page source. However, I'm stuck when the page doesn't stop loading: there is a bug in Selenium which prevents it from doing anything while the page is still loading, and control never returns to Selenium, so I'm trying alternate methods.
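For what it's worth, Selenium does expose a couple of knobs for pages that never finish loading. A minimal sketch, assuming Chrome and Selenium 4 (the URL and the 15-second timeout are placeholders to adapt):

from selenium import webdriver
from selenium.common.exceptions import TimeoutException

options = webdriver.ChromeOptions()
options.page_load_strategy = 'eager'   # return once the DOM is interactive
driver = webdriver.Chrome(options=options)
driver.set_page_load_timeout(15)       # cap endless loads at 15 seconds
try:
    driver.get('https://www.example.com')  # placeholder URL
except TimeoutException:
    driver.execute_script('window.stop()')  # abort the hung load and move on
html = driver.page_source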

Related

Getting webpage data same as browser INSPECT option

When I go to the following website: https://www.bvl.com.pe/mercado/movimientos-diarios and use Selenium's page_source option, or urllib.request.urlopen, what I get is a different string than if I go to Google Chrome, open the INSPECT option in the contextual menu, and copy the entire thing.
From my research, I understand it has to do with Javascript running on the webpage and what I am getting is the base HTML.
What code can I use (Python) to get the same information?
That behavior is entirely browser-dependent. The browser takes the raw HTML, processes it, (usually) runs JavaScript, styles it with CSS, and does many other things. So to get such a result in Python you'd have to make your own web browser.
After much digging around, I came upon a solution that works in most cases: use headless Chrome with the --dump-dom switch.
https://developers.google.com/web/updates/2017/04/headless-chrome
Programmatically, in Python, use the subprocess module to run Chrome in a shell and either assign the output to a variable or redirect the output to a text file.
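A minimal sketch of that, assuming the Chrome binary is reachable as 'chrome' (on Linux it is often 'google-chrome'; adjust the name and URL for your setup):

import subprocess

result = subprocess.run(
    ['chrome', '--headless', '--disable-gpu', '--dump-dom',
     'https://www.bvl.com.pe/mercado/movimientos-diarios'],
    capture_output=True, text=True
)
html = result.stdout  # the DOM after scripts have run, as seen in INSPECT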

Easy way to work around slow Selenium Python startup?

So I have a program, using Selenium specifically, that takes a series of actions on a password-protected website. Basically, I need to be able to input a unique link and password when I get them, which takes me to the main website which I have automated.

The issue here is that Selenium takes very long to launch and load a webpage, and time is very important in this application. Inputting the link and launching the browser directly to that link takes a long time. What I have tried is preloading the browser to a different website (i.e., https://google.com) beforehand, and then waiting on user input for the link to the actual page. This process works a lot quicker, but I'm having trouble getting it to work inside a function and with multiprocessing. I am using multiprocessing to execute this on a wide scale with lots of instances, and I am trying to start all of my functions the second a link is defined by me.

I am on Windows 10, using Python 3.8.3, and using Chrome for my Selenium browser.
from selenium import webdriver

link = input('Paste Link Here: ')

def instance_1():
    browser1 = webdriver.Chrome()  # *my webdriver file path* goes here
    browser1.get('https://google.com')
    # need something that waits here until the link variable is defined by me
    browser1.get(link)
    # the rest of the automation works fine from here
Ideally, the solution would be able to work with multiprocessing. The ideal flow would be something like this:
1. All Selenium instances (written as their own functions) start up and preload to a website (this part works fine)
2. They wait until the link to go to is specified (this is where the issue is; see the sketch after this list)
3. They then go to the link and execute the automation (this part works fine)
TL;DR: basically, anything that would let the program continue while waiting on the input would be nice.
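One way to get that flow, sketched under the assumption that a multiprocessing.Event can signal "link is ready" and a Manager namespace can carry the link itself (the function and variable names here are illustrative, not from the original code):

import multiprocessing as mp
from selenium import webdriver

def instance(link_ready, shared):
    browser = webdriver.Chrome()       # pass your driver path/options as needed
    browser.get('https://google.com')  # preload so the startup cost is paid early
    link_ready.wait()                  # block until the main process publishes the link
    browser.get(shared.link)
    # the rest of the automation goes here

if __name__ == '__main__':
    manager = mp.Manager()
    shared = manager.Namespace()
    link_ready = mp.Event()
    workers = [mp.Process(target=instance, args=(link_ready, shared))
               for _ in range(4)]
    for w in workers:
        w.start()
    shared.link = input('Paste Link Here: ')
    link_ready.set()                   # release every waiting worker at once
    for w in workers:
        w.join()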

First paint time in Python (may be using Selenium or without it)?

I want to find the time at which the first thing (an object, image, text, link, DB call or anything) loads in a requested website, using Python and Selenium.
Check out performance.timing; it's JavaScript and comes by default in your browser. You have a lot of options to display, like:
navigationStart
connectStart
connectEnd
domLoading
domInteractive
domComplete
Just go to the console window in your browser and type performance.timing. It might be of use to you.
If you find something you can use, you can have Selenium execute the JavaScript inside the browser using execute_script:
driver.execute_script('return performance.timing.domComplete')
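Building on that, a hedged sketch that computes elapsed milestones from performance.timing and, in Chrome, reads first paint from the Paint Timing API (the toJSON() calls are there because Selenium can only return plain, JSON-serializable values):

from selenium import webdriver

driver = webdriver.Chrome()
driver.get('https://www.example.com')  # placeholder URL

timing = driver.execute_script('return performance.timing.toJSON()')
print('domComplete after',
      (timing['domComplete'] - timing['navigationStart']) / 1000.0, 's')

# Paint Timing API (Chrome): 'first-paint' and 'first-contentful-paint' entries
paints = driver.execute_script(
    "return performance.getEntriesByType('paint').map(e => e.toJSON())")
for entry in paints:
    print(entry['name'], 'at', entry['startTime'] / 1000.0, 's')
driver.quit()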

Python selenium webdriver - unable to click an element by xpath at a single go

I'm using Selenium WebDriver with Python to find an element and click it. This is the code; I'm passing 'number' into this method, and it doesn't work. I can see in the browser that the element is found, but it doesn't get clicked.
subIDTypeIcon = "//a[@id='s_%s_IdType']/img" % str(number)
self.driver.find_element_by_xpath(subIDTypeIcon).click()
Whereas when I place the 'self.driver.find_.....' line twice, to my surprise, it works:
subIDTypeIcon = "//a[@id='s_%s_IdType']/img" % str(number)
self.driver.find_element_by_xpath(subIDTypeIcon).click()
self.driver.find_element_by_xpath(subIDTypeIcon).click()
The browser is opened on a remote server, so there are sometimes timeout problems.
Is there a proper way to make this work? Why does it work when the same statement is placed twice?
This is a common problem and the main reason to create abstract, per-page helper classes. Instead of blindly finding elements, you usually need a loop which tries to find an element for a couple of seconds so the browser can update the DOM.
The second version often works because starting to load a new page doesn't invalidate the DOM. That only happens once the remote server has sent enough of the new document to the browser. You can see this yourself when you use a browser: pages don't go blank the instant you click a link; instead, the old page stays visible for a while.
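In Python Selenium, that retry loop is already built in as an explicit wait. A sketch using the asker's XPath ('number' and self.driver come from the surrounding class); WebDriverWait polls until the condition holds or the timeout expires:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

subIDTypeIcon = "//a[@id='s_%s_IdType']/img" % str(number)
WebDriverWait(self.driver, 10).until(
    EC.element_to_be_clickable((By.XPATH, subIDTypeIcon))
).click()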

How to determine the size of an HTML table in pixels given an HTML file

I have an HTML file that has various HTML tags in it, including a bunch of tables. I am processing this file using Python. How do I find out what the size (length x width in pixels) of each table is when it is rendered by a browser (preferably Chrome or Firefox)?
I am essentially looking for the information you see when you use "inspect element" in a browser and can see the size of the various elements. I want to access those sizes in my Python code.
I am using lxml to parse my html and can use selenium if needed.
edit: added #node.js in case I can use it to spit out the size of all the tables in a shell script and grab that in Python.
You're going to want to use Selenium WebDriver to open the HTML file in an actual browser installed on the computer that your Python code is running on.
I'm not sure how you'd use the Selenium WebDriver API to find out how tall a rendered table is, but the value_of_css_property method might do it.
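A short sketch of that approach, assuming Chrome and a local file named page.html (a hypothetical name). WebElement.size gives the rendered box in pixels and is an alternative to value_of_css_property:

import os
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('file://' + os.path.abspath('page.html'))  # hypothetical file name
for table in driver.find_elements(By.TAG_NAME, 'table'):
    box = table.size  # {'width': ..., 'height': ...} in CSS pixels
    print(box['width'], 'x', box['height'])
driver.quit()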
If you can call out to a shell script, and you can use Node.js, I'm assuming you could also install and use PhantomJS, which is a headless WebKit port (i.e., an actual honest-to-goodness WebKit renderer that just doesn't require a window to work). This will let you use JavaScript and the familiar web libraries to manipulate the document. As an example, the following gets you the width of the logo element toward the upper left of the Stack Overflow site:
var page = require('webpage').create(); // create a new "browser"
page.open('http://stackoverflow.com/', function() {
    // callback when loading completes
    var logoWidth = page.evaluate(function() {
        // This runs in the rendered page and uses the version of jQuery that SO loads.
        return $('#hlogo').width();
    });
    console.log(logoWidth); // prints 250, the same as Chrome.
    phantom.exit(); // for some reason you need to exit manually
});
The documentation for PhantomJS will tell you more about what you can do with it and how.
One caveat, however, is that loading a page takes a while, since it needs to fetch CSS and scripts and generally do everything a browser does. I'm not sure if and how PhantomJS does any caching; if it does, it might make sense to reuse the same process for multiple scrapes of the same site.
