I am using python3 in combination with beautifulsoup.
I want to check whether a website is responsive or not. My first thought was to check the meta tags of the website and see if there is something like this in them:
content="width=device-width, initial-scale=1.0"
Accuracy is not that good using this method but I have not found something better.
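For reference, the heuristic described above in minimal form, using only the standard library (with BeautifulSoup it is the same check via `soup.find("meta", attrs={"name": "viewport"})`):

```python
from html.parser import HTMLParser

class ViewportFinder(HTMLParser):
    """Collects the content of the first <meta name="viewport"> tag."""
    def __init__(self):
        super().__init__()
        self.viewport = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") == "viewport" and self.viewport is None:
            self.viewport = attrs.get("content", "")

def looks_responsive(html):
    """Crude heuristic: a viewport meta tag with width=device-width."""
    finder = ViewportFinder()
    finder.feed(html)
    return bool(finder.viewport and "width=device-width" in finder.viewport)
```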
Does anybody have an idea?
Basically I want to do the same as Google does here: https://search.google.com/test/mobile-friendly, reduced to a single output: is the website responsive or not (Y/N)?
(Just a suggestion)
I am not an expert on this but my first thought is that you need to render the website and see if it "responds" to different screen sizes. I would normally use something like phantomjs to do this.
Apparently, you can do this in Python with Selenium (more info at https://stackoverflow.com/a/15699761/3727050). A more comprehensive list of technologies that can be used for this task can be found here. Note that these resources seem a bit old/outdated, and some solutions fall back to Python subprocess calls to PhantomJS.
The linked Google test seems to load the page in a small browser and check:
the font size, to make sure it is readable
the distance between clickable elements, to make sure the page is usable
I would however do the following:
Load the page in desktop mode, record each div's style.
Gradually reduce the size of the screen and see what percentage of these elements change style.
In most cases, going from a large screen down to phone size you should see 1-3 distinct layouts, which should be identifiable from the percentage of elements changing style.
The above does not guarantee that the page is "mobile-friendly" (i.e. usable on a mobile device), but it does show whether the CSS is responsive.
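The measurement above can be sketched with Selenium; only `measure_responsiveness` touches a browser (it assumes selenium and chromedriver are installed, and the URL is a placeholder), while the snapshot comparison is plain Python:

```python
def change_ratio(before, after):
    """Fraction of elements whose recorded style differs between two snapshots."""
    if not before:
        return 0.0
    changed = sum(1 for key, style in before.items() if after.get(key) != style)
    return changed / len(before)

def snapshot_div_styles(driver):
    # One computed-style string per <div>, keyed by DOM order.
    styles = driver.execute_script(
        "return Array.from(document.querySelectorAll('div'))"
        ".map(d => getComputedStyle(d).cssText);")
    return dict(enumerate(styles))

def measure_responsiveness(url):
    """Drive a real browser: snapshot styles at desktop size, then phone size."""
    from selenium import webdriver  # assumes selenium + chromedriver on PATH
    driver = webdriver.Chrome()
    try:
        driver.get(url)
        driver.set_window_size(1920, 1080)
        desktop = snapshot_div_styles(driver)
        driver.set_window_size(375, 812)  # roughly phone-sized
        return change_ratio(desktop, snapshot_div_styles(driver))
    finally:
        driver.quit()
```

A ratio near zero suggests a fixed layout; distinct jumps as you sample more window widths suggest breakpoints.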
Using Selenium to try and automate a bit of data entry with Salesforce. I have gotten my script to load a webpage, allow me to login, and click an "edit" button.
My next step is to enter data into a field. However, I keep getting an error about the field not being found. I've tried to identify it by XPATH, NAME, and ID and continue to get the error. For reference, my script works with a simple webpage like Google. I have a feeling that clicking the edit button in Salesforce opens either another window or frame (sorry if I'm using the wrong terminology). Things I've tried:
Looking for other frames (can't seem to find any in the HTML)
Having my script wait until the element is present (doesn't seem to work)
Any other options? Thank you!
Salesforce's Lightning Experience (the new white-blue UI) is built with web components that hide their internal implementation details. You'd need to read up a bit about the "shadow DOM"; the page is not a "happy soup" of HTML and JS all chucked into the top page's markup. This means CSS is limited to one component and there's no risk of it spilling over, or of overwriting another page area's JS function if you both declare a function with the same name - but it also means it's much harder to get at an element's internals.
You'll have to read up on how Selenium deals with the shadow DOM. Some companies claim they have working Lightning UI automated tests. I've heard good stuff about Provar, but haven't used it myself.
For custom UI components an SF developer has the option to use "light DOM"; for standard UI you'll struggle a bit. If you're looking for some automation without fighting with Lightning Experience (especially since, with 3 releases/year, SF sometimes changes the structure of the generated HTML, breaking old tests), you could consider switching over to the Classic UI for the test. It'll be more accessible to Selenium. It won't be exactly the same thing the user does, but server-side errors like required fields and validation rules should fire all the same.
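One common way to reach inside a shadow root from Selenium is to do the query in JavaScript (Selenium 4 also exposes `element.shadow_root` for the same job). A sketch; the `lightning-input`/`input` selectors in the usage line are hypothetical examples, not guaranteed Salesforce markup:

```python
def find_in_shadow(driver, host_css, inner_css):
    """Pierce one level of shadow DOM: find the host element, then query
    inside its shadowRoot. Returns a regular Selenium WebElement."""
    return driver.execute_script(
        "return document.querySelector(arguments[0])"
        ".shadowRoot.querySelector(arguments[1]);",
        host_css, inner_css)
```

Usage would look like `find_in_shadow(driver, "lightning-input", "input").send_keys("Acme")`; nested components need one hop per shadow boundary.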
I'd like to ask somebody with experience with headless browsers and Python whether it's possible to extract the box info, with the distance from the closest strike, on the webpage below. Until now I was using Python bs4, but since everything here is driven by jQuery, a simple download of the webpage doesn't work. I found PhantomJS, but I wasn't able to extract it with that either, so I am not sure it's possible. Thanks for any hints.
https://lxapp.weatherbug.net/v2/lxapp_impl.html?lat=49.13688&lon=16.56522&v=1.2.0
This isn't really a Linux question, it's a StackOverflow question, so I won't go into too much detail.
The thing you want to do can be done easily with Selenium. Selenium has both a headless mode and a heady mode (where you can watch it open your browser and click on things). The DOM query API is a bit less extensive than bs4's, but it does have nice visual query (location on screen) functions. So you would write a Python script that initializes Selenium, goes to your website and interacts with it. You may need to do some image recognition on screenshots at some point. It may be as simple as searching for a certain query image on the screen, or something much more complicated.
You'd have to go through the Selenium tutorials first to see how it works, which would take you 1-2 days. Then figure out what Selenium stuff you can use to do what you want, that depends on luck and whether what you want happens to be easy or hard for that particular website.
Instead of using Selenium, though, I recommend trying to reverse engineer the API. For example, the page you linked to hits https://cmn-lx.pulse.weatherbug.net/data/lightning/v1/spark with parameters like:
_
callback
isGpsLocation
location
locationtype
safetyMessage
shortMessage
units
verbose
authid
timestamp
hash
You can figure out by trial and error which ones you need and what to put in them. You can capture requests from your browser and then read them yourself. Then construct appropriate requests from a Python program and hit their API. It would save you from having to deal with a Web UI designed for humans.
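A sketch of rebuilding that request. The endpoint is the one captured above; the parameter names come from the captured call, but which ones the server actually requires, what values it accepts, and how `authid`/`timestamp`/`hash` are computed are all assumptions to be settled by trial and error:

```python
from urllib.parse import urlencode

def spark_request_url(lat, lon):
    """Reconstruct the lightning-data request seen in the browser's
    network tab. Parameter values here are guesses, not documented API."""
    base = "https://cmn-lx.pulse.weatherbug.net/data/lightning/v1/spark"
    params = {
        "location": "%s,%s" % (lat, lon),
        "locationtype": "latitudelongitude",  # guessed value
        "units": "metric",
        "verbose": "true",
    }
    return base + "?" + urlencode(params)
```

You could then fetch it with `urllib.request.urlopen` (or requests) and parse the body; if the `callback` parameter is involved, the response may be JSONP that needs unwrapping before `json.loads`.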
I am trying to grab a bunch of numbers that are presented in a table on a web page that I’ve accessed using Python and Selenium running headless on a Raspberry Pi. The numbers are not in the page source; rather, they are deeply embedded in complex html served by several URLs called by the main page (the numbers update every few seconds). I know I could parse the html to get the numbers I want, but the numbers are already sitting on the front page in perfect format all in one place. I can select and copy the numbers when I view the web page in Chrome on my PC.
How can I use python and get Selenium webdriver to get me those numbers? Can Selenium simply provide all the visible text on a page? How? (I've tried driver.page_source but the text returned does not contain the numbers). Or is there a way to essentially copy text and numbers from a table visible on the screen using python and Selenium? (I’ve looked into xdotool but didn’t find enough documentation to help). I’m just learning Selenium so any suggestions will be much appreciated!
Well, I figured out the answer to my question. It's embarrassingly easy. This line gets just what I need - all the text that is visible on the web page:
page_text = driver.find_element_by_tag_name('body').text
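A note for newer setups: the `find_element_by_*` helpers were deprecated and then removed in Selenium 4, so the same one-liner is spelled slightly differently there. A sketch:

```python
def visible_text(driver):
    """All text rendered on the page, as Selenium 4 spells it.
    "tag name" is the locator string behind By.TAG_NAME."""
    return driver.find_element("tag name", "body").text
```

With the usual import, that is `driver.find_element(By.TAG_NAME, "body").text`, where `By` comes from `selenium.webdriver.common.by`.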
So, there are several reasons why you might not be able to get some info on the page:
The information hasn't loaded yet. You have to wait some time for the information to be ready. You may look at this thread for a better understanding. Sometimes page elements are added dynamically with JS and so on, and they can load very slowly.
The information may consist of a different type of data than you expect. For example, you are waiting for text with numbers, but you may get a picture with the numbers on the page instead. In this situation you have to change your programming tactics and use other functions to get what you need.
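For the first case, an explicit wait is the usual fix. Here is the idea in miniature; Selenium ships the same thing ready-made as `WebDriverWait` with `expected_conditions`:

```python
import time

def wait_for(condition, timeout=10.0, poll=0.5):
    """Tiny explicit wait: poll until condition() is truthy, then return
    its value; raise TimeoutError if it never becomes truthy in time."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %ss" % timeout)
        time.sleep(poll)
```

With Selenium the condition would be something like `lambda: driver.find_elements(By.ID, "price-table")`, which is falsy (an empty list) until the element appears.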
I'm doing webpage layout analysis in python. A fundamental task is to programmatically measure the elements' sizes given HTML source codes, so that we could obtain statistical data of content/ad ratio, ad block position, ad block size for the webpage corpus.
An obvious approach is to use the width/height attributes, but they're not always available. Besides, things like width: 50% need to be calculated after loading into the DOM. So I guess loading the HTML source code into a browser with a predefined window size (like mechanize, although I'm not sure if the window size can be set) is a good way to try, but mechanize doesn't support returning an element's size anyway.
Is there any universal way (without width/height attributes) to do it in python, preferably with some library?
Thanks!
I suggest you take a look at Ghost, a WebKit web client written in Python. It has JavaScript support, so you can easily call JavaScript functions and get their return values.
This example shows how to find out the width of the Google text box:
>>> from ghost import Ghost
>>> ghost = Ghost()
>>> ghost.open('https://google.lt')
>>> width, resources = ghost.evaluate("document.getElementById('gbqfq').offsetWidth;")
>>> width
541.0 # google text box width 541px
To properly get all the final sizes, you need to render the contents, taking in account all CSS style sheets, and possibly all javascript. Therefore, the only ways to get the sizes from a Python program are to have a full web browser implementation in Python, use a library that can do so, or pilot a browser off-process, remotely.
The latter approach can be done with the Selenium tools - check how you can get the result of JavaScript expressions from within a Python program here: Can Selenium web driver have access to javascript global variables?
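With Selenium driving a real browser, the measurement becomes a one-line JavaScript call, since the browser has already done the layout (so width: 50% and friends are resolved). A sketch, assuming a `driver` from `selenium.webdriver`; `#gbqfq` is the Google text box id used in the Ghost example above:

```python
def rendered_size(driver, css_selector):
    """[width, height] in CSS pixels after layout, as computed by the
    browser's own rendering engine."""
    return driver.execute_script(
        "const el = document.querySelector(arguments[0]);"
        "const r = el.getBoundingClientRect();"
        "return [r.width, r.height];",
        css_selector)
```

Combined with `driver.set_window_size(...)`, this also lets you collect the same element's size at several window sizes for the content/ad ratio statistics.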
What is the best way to determine if a page on a website is REALLY displaying a specific img tag like this: <img src="http://domain.com/img.jpg">? A simple string comparison is easy to fool using HTML comments <!-- -->. Even if the html tag exists, it could be deleted with JavaScript. It could also be obscured by placing an image over it using CSS. Do you know of a solid method for detecting the img tag despite these obscuring attacks? Do you know of another method of obscuring the image? Python code to detect the image would be ideal, but if you know of a good tactic or method, that will earn you a +1 from me.
I don't think you can ever be sure. First, you're not even sure the program will stop.
Aside from that, consider the following scenarios. Your <img> can be added, removed or get obscured using JavaScript, CSS and/or server-side:
randomly.
at specific times.
to a certain part of the world.
according to differences and bugs between browsers.
Google is facing a similar problem - people are hiding search keywords in hidden text and links to get a better rank. Their solution is to penalize sites with hidden text. They get away with it because they're Google; people depend on them for traffic.
As for you, you can't do much better than to ask nicely...
The only surefire way I can think of is to render the page and check. It is simple to strip comments, etc., but if scripts are involved, it is not possible to have a general solution that does not amount to executing them (I believe this is the first time I have ever invoked Church's theorem...).
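If you do go the render-and-check route with a browser under Selenium, a visibility check could be sketched like this. It catches the comment, JS-removal, zero-size, and simple overlay cases (display:none collapses the box to 0x0, and elementFromPoint exposes an element painted on top); per the caveats above it is still not a guarantee against every trick:

```python
def image_really_shown(driver, src):
    """True if an <img> with this src exists after rendering, has a nonzero
    box, and is the topmost element at its own center point."""
    return bool(driver.execute_script("""
        const img = Array.from(document.images).find(i => i.src === arguments[0]);
        if (!img) return false;
        const r = img.getBoundingClientRect();
        if (r.width === 0 || r.height === 0) return false;
        const top = document.elementFromPoint(r.left + r.width / 2,
                                              r.top + r.height / 2);
        return top === img || img.contains(top);
    """, src))
```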
You could place a script anywhere that processes the request, counts the view and delivers the image like this:
http://yourhost.com/imageprocess?image=media/foo/bar.jpg
Then you can be sure that the image was loaded. Whether it was actually viewed, of course, you can't be sure.
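That counting endpoint can be sketched as a tiny WSGI app (the route and paths are illustrative, matching the URL above):

```python
from urllib.parse import parse_qs

hits = {}

def image_counter_app(environ, start_response):
    """Minimal WSGI sketch of /imageprocess?image=...: count the hit, then
    serve the image. A real version must whitelist the path (directory
    traversal!) and stream the actual file bytes."""
    image = parse_qs(environ.get("QUERY_STRING", "")).get("image", [None])[0]
    if image is None:
        start_response("400 Bad Request", [("Content-Type", "text/plain")])
        return [b"missing image parameter"]
    hits[image] = hits.get(image, 0) + 1  # requested, not necessarily seen
    start_response("200 OK", [("Content-Type", "image/jpeg")])
    return [b"(image bytes would go here)"]
```

Run it with `wsgiref.simple_server.make_server("", 8000, image_counter_app)` for a quick local test.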