Take a screenshot of an entire webpage using Robot Framework

Take a screenshot of an entire webpage using Robot Framework - python

I have a problem on taking a screenshot using robotframework.
Currently I am using keyword Capture Page Screenshot on Selenium2library. The problem is that, keyword only captures the webpage visible on the screen.
We needed a screenshot that could take the entire webpage. This means that when it capture a screenshot it should scroll down to the bottom of the webpage and capture the whole page. Is that possible?
Appreciate everyone can suggest if there are other library that we can use.

You can do that with a headless browser:
http://phantomjs.org/
Execute it from Python as a separate process if you need to, storing the result into a file.

Related

Impossible to recover some information with Beautifulsoup on a site

I need your help because I have for the first time problems to get some information with Beautifulsoup .
I have two problems on this page
The green button GET COUPON CODE appear after a few moment see GIF capture
When we inspect the button link, we find a a simple href attribute that call to an out.php function that performs the opening of the destination link that I am trying to capture.
GET COUPON CODE
Thank you for your help

Your problem is a little unclear but if I understand correctly, your first problem is that the 'get coupon code' button looks like this when you render the HTML that you receive from the original page request.
The mark-up for a lot of this code is rendered dynamically using javascript. So that button is missing its href value until it gets loaded in later. You would need to also run the javascript on that page to render this after the initial request. You can't really get this easily using just the python requests library and BeautifulSoup. It will be a lot easier if you use Selenium too which lets you control a browser so it runs all that javascript for you and then you can just get the button info a couple of seconds after loading the page.
There is a way to do this all with plain requests, but it's a bit tedious. You would need to read through the requests the page makes and figure out which one gets the link for the button. The upside to this is it would cut the number of steps to get the info you need and the amount of time it takes to get. You could just use this new request every time to get the right PHP link then just get the info from there.
For your second point, I'm also not sure if I answered it already, but maybe you're also trying to get the redirect link from that PHP link. From inspecting the network requests, it looks like the info will be found in the response headers, there is no body to inspect.
(I know it says 'from cache' but the point is that the redirect is being caused by the header info)

Extracting entire webpage source code using Selenium

I'm new to Selenium and need help with a task. I want to somehow download the source code of a webpage, change the image src's and then take a screenshot of the resulting webpage. I need to do this using a mobile emulator, hence I am using selenium. The screenshots have to be reflective of the mobile emulator (the screenshots need to be as if you opened the webpage on a mobile device).
I know how to open a local html file using selenium as well as take a screenshot through selenium. I also already have the images locally stored in my working directory.
However, the page source code that I get from selenium doesn't actually look like if we open that webpage on a mobile broswer. I used the following code to get the source html:
html = driver.page_source
However, if I save this html code and reload it using selenium and then take a screenshot, it looks nothing like the original page (most of the time). The dimensions are ok (that of a mobile browser) but the elements are all missing, even if I haven't replaced the image sources yet. Is there a way to get visually similar code or another way to change the image sources?

Extracting info from webpage via python

I'd like to ask somebody with experience with headless browsers and python if it's possible to extract box info with distance from closest strike on webpage below. Till now I was using python bs4 but since everything is driven by jQuery here simple download of webpage doesn't work. I found PhantomJS but I wasn't able extract it too so I am not sure if it's possible. Thanks for hints.
https://lxapp.weatherbug.net/v2/lxapp_impl.html?lat=49.13688&lon=16.56522&v=1.2.0

This isn't really a Linux question, it's a StackOverflow question, so I won't go into too much detail.
The thing you want to do can be easily done with Selenium. Selenium has both a headless mode, and a heady mode (where you can watch it open your browser and click on things). The DOM query API is a bit less extensive than bs4, but it does have nice visual query (location on screen) functions. So you would write a Python script that initializes Selenium, goes to your website and interacts with it. You may need to do some image recognition on screenshots at some point. It may be as simple as finding for a certain query image on the screen, or something much more complicated.
You'd have to go through the Selenium tutorials first to see how it works, which would take you 1-2 days. Then figure out what Selenium stuff you can use to do what you want, that depends on luck and whether what you want happens to be easy or hard for that particular website.
Instead of using Selenium, though, I recommend trying to reverse engineer the API. For example, the page you linked to hits https://cmn-lx.pulse.weatherbug.net/data/lightning/v1/spark with parameters like:
_
callback
isGpsLocation
location
locationtype
safetyMessage
shortMessage
units
verbose
authid
timestamp
hash
You can figure out by trial and error which ones you need and what to put in them. You can capture requests from your browser and then read them yourself. Then construct appropriate requests from a Python program and hit their API. It would save you from having to deal with a Web UI designed for humans.

Read browser image and click without using selenium?

I was just trying to figure out how to get python to see the browser window and read the image there(I don't want to take a snapshot, if possible) process it and then click the relevant button on screen.
As a project I want to avoid selenium. I can't seem to find the correct library or any tutorials for this. Can someone tell me the library and possibly a tutorial?
I am on Ubuntu.
Thanks

Access widget window beautifulsoup python mechanize

I am trying to scrape information off websites like this:
https://www.glassdoor.com/Overview/Working-at-7-Eleven-EI_IE3581.11,19.htm
using python + beautifulsoup + mechanize.
Accessing anything on the main-site is no problem. However, I also need the information that appears in a overlay-window that appears when one clicks on the "Rating Trends" button next to the bar with stars.
This overlay-window can also be accessed directly by using the url:
https://www.glassdoor.com/Reviews/7-Eleven-Reviews-E3581.htm#trends-overallRating
The html associated with this page is a modification of the original site's html.
However, regardless of what element I try to find (via findAll ) on that overlay-window website, beautifulsoup returns zero hits.
How can I fix this? I tried adding a sleep time between accessing the website and reading anything in, to no avail.
Thanks!

If you're using the Chrome browser select the background of that page (without the additional information displayed) and select 'Inspect' from the context menu (for Windows anyway), then the 'Network' tab, so that you can see network traffic. Now click on 'Rating trends'. The entry marked 'xhr' will be https://www.glassdoor.ca/api/employer/3581-rating.htm?locationStr=&jobTitleStr=&filterCurrentEmployee=false&filterEmploymentStatus=REGULAR&filterEmploymentStatus=PART_TIME (I much hope!) and its contents will be the following.
{"employerId":3581,"ratings":[{"hasRating":true,"type":"overallRating","value":2.9},{"hasRating":true,"type":"ceoRating","value":0.54},{"hasRating":true,"type":"bizOutlook","value":0.35},{"hasRating":true,"type":"recommend","value":0.4},{"hasRating":true,"type":"compAndBenefits","value":2.4},{"hasRating":true,"type":"cultureAndValues","value":2.5},{"hasRating":true,"type":"careerOpportunities","value":2.5},{"hasRating":true,"type":"workLife","value":2.4},{"hasRating":true,"type":"seniorManagement","value":2.3}],"week":0,"year":0}
Whether this URL can be altered for use in obtaining information for other employers, I regret, I cannot tell you.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.