I'm new to Selenium and need help with a task. I want to download the source code of a webpage, change the image src attributes, and then take a screenshot of the resulting page. I need to do this through a mobile emulator, hence Selenium. The screenshots have to reflect the mobile emulator, i.e. look as if you had opened the webpage on a mobile device.
I know how to open a local HTML file using Selenium and how to take a screenshot through Selenium. I also already have the images stored locally in my working directory.
However, the page source I get from Selenium doesn't actually match what you see when you open that webpage in a mobile browser. I used the following code to get the source HTML:
html = driver.page_source
However, if I save this HTML and reload it using Selenium and then take a screenshot, it looks nothing like the original page (most of the time). The dimensions are fine (those of a mobile browser), but the elements are all missing, even before I've replaced any image sources. Is there a way to get visually faithful source code, or another way to change the image sources?
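For reference, this is roughly my setup (a minimal sketch; the device name and file path are placeholders, and chromedriver is assumed to be on PATH):

from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_experimental_option("mobileEmulation", {"deviceName": "Pixel 2"})
driver = webdriver.Chrome(options=options)

driver.get("file:///path/to/page.html")   # open a local HTML file
html = driver.page_source                 # serialized DOM, not the raw response
driver.save_screenshot("screenshot.png")  # captures the emulated viewport
driver.quit()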
Related
I know there are plenty of ways to get the HTML source of a page by passing its URL.
But is there a way to get the current HTML of a page if it only displays data after some action?
For example: a simple HTML page with a button (that's the source HTML) that displays random data when you click it.
Thanks
I believe you're looking for a class of tools known as "headless browsers". The only one I've used from Python (and can vouch for) is Selenium WebDriver, but there are plenty to choose from if you search for headless browsers for Python.
https://pypi.org/project/selenium
With this you should be able to programmatically load the web page, look up and click the button in the rendered DOM, then read the innerHTML property of the target element.
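A minimal sketch of that flow (the URL and element IDs are hypothetical, and chromedriver is assumed to be on PATH):

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com")  # placeholder URL

# Click the button, then read the markup it produced.
driver.find_element(By.ID, "random-button").click()   # hypothetical id
data = driver.find_element(By.ID, "output").get_attribute("innerHTML")
print(data)
driver.quit()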
I'm trying to automate some postings on a couple of different websites: basically, fill out my form and upload it to 3 sites with Selenium or requests. The image upload on this site opens a new window and asks you to specify the file path, or you can drag and drop the files. Here is what it looks like.
And without CSS here is what it looks like.
I abandoned requests earlier, thinking there was no way I was going to be able to do anything with it. I moved to Selenium and can click the button and open the window, but cannot actually place an image in there to upload. I have tried pywinauto and keep getting ElementNotVisible. I am having a hard time finding in the docs what to actually do. Where do I go from here?
Try the below.

from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Target the hidden file input rather than the visible button.
eleBrowse = WebDriverWait(driver, delay).until(
    EC.presence_of_element_located((By.XPATH, "//input[@type='file']")))
# Replace the path below with the one you want to upload. To send
# multiple files, separate the paths with newlines ("\n").
eleBrowse.send_keys("path")
I am trying to write a program in Python that takes the name of a stock and its price and prints them. However, when I run it, nothing is printed. It seems like the data is having a problem being fetched from the website. I double-checked that the XPath from the web page is correct, but for some reason the text does not want to show up.
from lxml import html
import requests
page = requests.get('https://www.bloomberg.com/quote/UKX:IND?in_source=topQuotes')
tree = html.fromstring(page.content)
Prices = tree.xpath('//span[@class="priceText__1853e8a5"]/text()')
print('Prices:', Prices)
Here is the website I am trying to get the data from.
I have tried BeautifulSoup, but it has the same problem.
If you print the string page.content, you'll see that the website code it captures is actually for a captcha test, not the "real" destination page you see when you visit the website manually. It seems the website was smart enough to detect that your request to this URL came from a script rather than a human, and it effectively prevented your script from scraping any real content. So Prices is empty because there simply isn't a span tag of class "priceText__1853e8a5" on this special captcha page. I get the same result when I try scraping with urllib2.
As others have suggested, Selenium (actual web automation) might be able to launch the page and get you what you need. The ID looks dynamically generated, though I do get the same one when I manually look at the page. Another alternative is to simply find a different site that can give you the quote you need without blocking your script. I tried it with https://tradingeconomics.com/ukx:ind and that works. Though of course you'll need a different xpath to find the cell you need.
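A sketch of the Selenium route (assumes chromedriver is on PATH; the locator is illustrative and would need to match the site's actual markup):

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://tradingeconomics.com/ukx:ind")
# Placeholder locator: inspect the page to find the cell that
# actually holds the UKX quote.
price = driver.find_element(By.XPATH, "//table//tr[1]/td[2]").text
print("Price:", price)
driver.quit()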
I'm trying to download stock option data from Yahoo Finance (here's Google as an example) with requests.get, but it doesn't seem to be downloading everything. I'm trying to get the dropdown of dates with an XPath, but even //option returns nothing, even though Chrome DevTools says there are 13 instances!
I expect this has something to do with the fact that the parts of the site that actually matter are loaded after all the navigation bars and such, and I don't know how to get all of it. Could you suggest a method for getting the text of each item in the date dropdown menu?
If you open the dev console and refresh the page (caches might need to be purged), you can see some requests with type xhr.
They are usually initiated by JavaScript and load data beyond what the initial HTML body provides.
That's what you should look into.
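Once you've found the relevant XHR request in the Network tab, you can often replay it directly with requests. A sketch (the endpoint and parameters below are placeholders, not Yahoo's actual API; copy the real ones from DevTools):

import requests

# Hypothetical endpoint and params: substitute the URL, query string,
# and headers shown for the XHR request in the Network tab.
resp = requests.get(
    "https://example.com/api/options",
    params={"symbol": "GOOG"},
    headers={"User-Agent": "Mozilla/5.0"},  # some endpoints reject bare clients
)
data = resp.json()  # XHR responses are typically JSON rather than HTML
print(data)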
I have a problem taking a screenshot using Robot Framework.
Currently I am using the Capture Page Screenshot keyword from Selenium2Library. The problem is that the keyword only captures the part of the webpage visible on the screen.
We need a screenshot of the entire webpage; that is, it should scroll down to the bottom of the page and capture the whole thing. Is that possible?
I'd appreciate any suggestions for other libraries we can use.
You can do that with a headless browser:
http://phantomjs.org/
Execute it from Python as a separate process if you need to, storing the result in a file.
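A sketch of driving PhantomJS from Python (assumes the phantomjs binary is on PATH; page.render saves the full document height, not just the visible viewport):

import subprocess
import tempfile

# Minimal PhantomJS capture script, written out from Python.
CAPTURE_JS = """
var page = require('webpage').create();
page.open('{url}', function () {{
    page.render('{out}');
    phantom.exit();
}});
"""

def full_page_screenshot(url, out_path):
    with tempfile.NamedTemporaryFile("w", suffix=".js", delete=False) as f:
        f.write(CAPTURE_JS.format(url=url, out=out_path))
        script = f.name
    subprocess.run(["phantomjs", script], check=True)

full_page_screenshot("https://example.com", "full_page.png")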