How to scrape location objects?

I have a list of URLs and would like to scrape the location object for each of their webpages. The data I am referring to is what you get by typing "window.location" into your browser's console. For example, doing this on www.github.com in Chrome gives something like the following output:
Location {assign: function, replace: function, reload: function, ancestorOrigins: DOMStringList, origin: "https://github.com"…}
When expanded, you can see more information:
Location {
ancestorOrigins: DOMStringList
assign: function () { [native code] }
hash: ""
host: "github.com"
hostname: "github.com"
href: "https://github.com/"
origin: "https://github.com"
pathname: "/"
port: ""
protocol: "https:"
reload: function () { [native code] }
replace: function () { [native code] }
search: ""
toString: function toString() { [native code] }
valueOf: function valueOf() { [native code] }
__proto__: Location
}
I have used Python and the Mechanize library to scrape in the past, but have never desired this functionality until now and am not sure how to proceed. Any suggestions would be welcomed.

As far as I understand, you want to run JavaScript on each target web page. My suggestion would be to use a headless browser. I did similar things with the PyQt4 framework. You can also use other headless web browsers like PhantomJS, or you may be interested in a tool called Selenium.
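If the pages don't rewrite their URL with JavaScript, most string fields in the Location dump above are just parts of the page's final URL. Plain Python can therefore compute them without a browser. The sketch below assumes the URL you pass in is already the final one, after redirects. It fills only the string fields and skips ancestorOrigins and the methods. It also does not handle IPv6 hosts or `user:password@` prefixes. `location_fields` is a hypothetical helper name. If you do use Selenium, `driver.execute_script("return window.location.href")` reads the live value from the browser instead.

```python
from urllib.parse import urlsplit

# Default ports that browsers omit from location.host / location.port
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def location_fields(url: str) -> dict:
    """Approximate the string fields of window.location for an absolute URL.

    Hypothetical helper: assumes `url` is the final URL the page was served from.
    IPv6 literals and userinfo (user:pass@host) are not handled.
    """
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    hostname = parts.hostname or ""  # urlsplit lowercases the hostname
    port = parts.port  # None when the URL has no explicit port
    # Browsers report an empty port when it is the scheme's default
    port_str = "" if port is None or DEFAULT_PORTS.get(scheme) == port else str(port)
    host = f"{hostname}:{port_str}" if port_str else hostname
    pathname = parts.path or "/"
    search = f"?{parts.query}" if parts.query else ""
    hash_ = f"#{parts.fragment}" if parts.fragment else ""
    origin = f"{scheme}://{host}"
    return {
        "hash": hash_,
        "host": host,
        "hostname": hostname,
        "href": f"{origin}{pathname}{search}{hash_}",
        "origin": origin,
        "pathname": pathname,
        "port": port_str,
        "protocol": f"{scheme}:",
        "search": search,
    }


if __name__ == "__main__":
    for url in ["https://github.com", "https://example.com:8080/a?b=1#c"]:
        print(location_fields(url))
```

For https://github.com this reproduces the href, origin, pathname, port and protocol values shown in the console output above.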

Related

How to Detect URL in text and convert it to link with python? [duplicate]

I am using the function below to match URLs inside a given text and replace them with HTML links. The regular expression is working great, but currently I am only replacing the first match.
How can I replace all the URLs? I guess I should be using the exec command, but I haven't figured out how to do it.
function replaceURLWithHTMLLinks(text) {
var exp = /(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/i;
return text.replace(exp,"<a href='$1'>$1</a>");
}

First off, rolling your own regexp to parse URLs is a terrible idea. You must imagine this is a common enough problem that someone has written, debugged and tested a library for it, according to the RFCs. URIs are complex - check out the code for URL parsing in Node.js and the Wikipedia page on URI schemes.
There are a ton of edge cases when it comes to parsing URLs: international domain names, actual (.museum) vs. nonexistent (.etc) TLDs, weird punctuation including parentheses, punctuation at the end of the URL, IPv6 hostnames, etc.
I've looked at a ton of libraries, and there are a few worth using despite some downsides:
Soapbox's linkify has seen some serious effort put into it, and a major refactor in June 2015 removed the jQuery dependency. It still has issues with IDNs.
AnchorMe is a newcomer that claims to be faster and leaner. Some IDN issues as well.
Autolinker.js lists features very specifically (e.g. "Will properly handle HTML input. The utility will not change the href attribute inside anchor (<a>) tags"). I'll throw some tests at it when a demo becomes available.
Libraries that I've disqualified quickly for this task:
Django's urlize didn't handle certain TLDs properly (here is the official list of valid TLDs). No demo.
autolink-js wouldn't detect "www.google.com" without http://, so it's not quite suitable for autolinking "casual URLs" (without a scheme/protocol) found in plain text.
Ben Alman's linkify hasn't been maintained since 2009.
If you insist on a regular expression, the most comprehensive is the URL regexp from Component, though, judging by looking at it, it will falsely detect some non-existent two-letter TLDs.

Replacing URLs with links (Answer to the General Problem)
The regular expression in the question misses a lot of edge cases. When detecting URLs, it's always better to use a specialized library that handles international domain names, new TLDs like .museum, parentheses and other punctuation within and at the end of the URL, and many other edge cases. See Jeff Atwood's blog post The Problem With URLs for an explanation of some of the other issues.
The best summary of URL matching libraries (as of Feb 2014) is in Dan Dascalescu's answer.
"Make a regular expression replace more than one match" (Answer to the specific problem)
Add a "g" to the end of the regular expression to enable global matching:
/ig;
But that only fixes the problem in the question where the regular expression was only replacing the first match. Do not use that code.
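If you are doing this in Python, as the question title suggests, note that `re.sub` already replaces every non-overlapping match. There is no `g` flag to forget, and `count=1` reproduces the question's first-match-only behaviour. The sketch ports the question's pattern, with the `@` that the scraped regex shows as `#` restored. It is not a recommended URL matcher, for the reasons given above. Both function names are hypothetical.

```python
import re

# The question's pattern, ported to Python syntax (no need to escape "/")
URL_RE = re.compile(
    r"(\b(?:https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|!:,.;]*[-A-Z0-9+&@#/%=~_|])",
    re.IGNORECASE,
)


def replace_urls_with_html_links(text: str) -> str:
    # re.sub replaces every match by default, like a JS regex with the g flag
    return URL_RE.sub(r"<a href='\1'>\1</a>", text)


def replace_first_url_only(text: str) -> str:
    # count=1 mimics the question's non-global regex
    return URL_RE.sub(r"<a href='\1'>\1</a>", text, count=1)
```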

I've made some small modifications to Travis's code (just to avoid any unnecessary redeclaration - but it's working great for my needs, so nice job!):
function linkify(inputText) {
    var replacedText, replacePattern1, replacePattern2, replacePattern3;
    // URLs starting with http://, https://, or ftp://
    replacePattern1 = /(\b(https?|ftp):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/gim;
    replacedText = inputText.replace(replacePattern1, '<a href="$1" target="_blank">$1</a>');
    // URLs starting with "www." (without // before it, or it'd re-link the ones done above).
    replacePattern2 = /(^|[^\/])(www\.[\S]+(\b|$))/gim;
    replacedText = replacedText.replace(replacePattern2, '$1<a href="http://$2" target="_blank">$2</a>');
    // Change email addresses to mailto: links.
    replacePattern3 = /(([a-zA-Z0-9\-\_\.])+@[a-zA-Z\_]+?(\.[a-zA-Z]{2,6})+)/gim;
    replacedText = replacedText.replace(replacePattern3, '<a href="mailto:$1">$1</a>');
    return replacedText;
}

Made some optimizations to Travis' Linkify() code above. I also fixed a bug where email addresses with subdomain-style formats would not be matched (e.g. example@domain.co.uk).
In addition, I changed the implementation to extend String.prototype so that items can be matched like so:
var text = 'address@example.com';
text.linkify();
'http://stackoverflow.com/'.linkify();
Anyway, here's the script:
if (!String.prototype.linkify) {
    String.prototype.linkify = function() {
        // http://, https://, ftp://
        var urlPattern = /\b(?:https?|ftp):\/\/[a-z0-9-+&@#\/%?=~_|!:,.;]*[a-z0-9-+&@#\/%=~_|]/gim;
        // www. sans http:// or https://
        var pseudoUrlPattern = /(^|[^\/])(www\.[\S]+(\b|$))/gim;
        // Email addresses
        var emailAddressPattern = /[\w.]+@[a-zA-Z_-]+?(?:\.[a-zA-Z]{2,6})+/gim;
        return this
            .replace(urlPattern, '<a href="$&">$&</a>')
            .replace(pseudoUrlPattern, '$1<a href="http://$2">$2</a>')
            .replace(emailAddressPattern, '<a href="mailto:$&">$&</a>');
    };
}

Thanks, this was very helpful. I also wanted something that would link things that looked like a URL. As a basic requirement, it'd link something like www.yahoo.com, even if the http:// protocol prefix was not present. So basically, if "www." is present, it'll link it and assume it's http://. I also wanted emails to turn into mailto: links. EXAMPLE: www.yahoo.com would be converted to <a href="http://www.yahoo.com">www.yahoo.com</a>
Here's the code I ended up with (combination of code from this page and other stuff I found online, and other stuff I did on my own):
function Linkify(inputText) {
    // URLs starting with http://, https://, or ftp://
    var replacePattern1 = /(\b(https?|ftp):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/gim;
    var replacedText = inputText.replace(replacePattern1, '<a href="$1" target="_blank">$1</a>');
    // URLs starting with www. (without // before it, or it'd re-link the ones done above)
    var replacePattern2 = /(^|[^\/])(www\.[\S]+(\b|$))/gim;
    replacedText = replacedText.replace(replacePattern2, '$1<a href="http://$2" target="_blank">$2</a>');
    // Change email addresses to mailto: links
    var replacePattern3 = /(\w+@[a-zA-Z_]+?\.[a-zA-Z]{2,6})/gim;
    replacedText = replacedText.replace(replacePattern3, '<a href="mailto:$1">$1</a>');
    return replacedText;
}
In the 2nd replace, the (^|[^/]) part is only replacing www.whatever.com if it's not already prefixed by // -- to avoid double-linking if a URL was already linked in the first replace. Also, it's possible that www.whatever.com might be at the beginning of the string, which is the first "or" condition in that part of the regex.
This could be integrated as a jQuery plugin as Jesse P illustrated above -- but I specifically wanted a regular function that wasn't acting on an existing DOM element, because I'm taking text I have and then adding it to the DOM, and I want the text to be "linkified" before I add it, so I pass the text through this function. Works great.
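The same three-pass idea (scheme URLs, then bare "www." URLs, then e-mail addresses) can be sketched in Python. The `(^|[^/])` guard works the same way: a "www." that the first pass already wrapped is always preceded by "/" from "http://", so the second pass skips it. The e-mail pattern here is a simplified assumption, not an RFC-compliant one, and `linkify` is a hypothetical name.

```python
import re

# Pass 1: URLs with an explicit scheme
SCHEME_URL = re.compile(
    r"\b(?:https?|ftp)://[-A-Z0-9+&@#/%?=~_|!:,.;]*[-A-Z0-9+&@#/%=~_|]", re.I
)
# Pass 2: "www." URLs not preceded by "/" (so pass 1 output isn't re-linked)
WWW_URL = re.compile(r"(^|[^/])(www\.\S+(?:\b|$))", re.I | re.M)
# Pass 3: e-mail addresses (simplified pattern, not RFC 5322)
EMAIL = re.compile(r"[\w.+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}")


def linkify(text: str) -> str:
    text = SCHEME_URL.sub(lambda m: f'<a href="{m.group(0)}">{m.group(0)}</a>', text)
    # Keep group 1 (the character before "www.") so surrounding text is preserved
    text = WWW_URL.sub(
        lambda m: f'{m.group(1)}<a href="http://{m.group(2)}">{m.group(2)}</a>', text
    )
    text = EMAIL.sub(lambda m: f'<a href="mailto:{m.group(0)}">{m.group(0)}</a>', text)
    return text
```

Like the JavaScript version, this works on plain text. If the input already contains HTML, the passes can still touch URLs inside existing tags.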

Identifying URLs is tricky because they are often surrounded by punctuation marks and because users frequently do not use the full form of the URL. Many JavaScript functions exist for replacing URLs with hyperlinks, but I was unable to find one that works as well as the urlize filter in the Python-based web framework Django. I therefore ported Django's urlize function to JavaScript:
https://github.com/ljosa/urlize.js
An example:
urlize('Go to SO (stackoverflow.com) and ask. <grin>',
{nofollow: true, autoescape: true})
=> "Go to SO (<a href="http://stackoverflow.com" rel="nofollow">stackoverflow.com</a>) and ask. &lt;grin&gt;"
The nofollow option, if true, causes rel="nofollow" to be inserted. The autoescape option, if true, escapes characters that have special meaning in HTML. See the README file.
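As a rough illustration of what the nofollow and autoescape options do, here is a small Python sketch. It is not a port of Django's or urlize.js's logic: it only links scheme URLs and does not strip trailing punctuation. The key point is that the plain-text segments are escaped separately, so the escaping never mangles the generated `<a>` tags. `urlize_simple` is a hypothetical name.

```python
import html
import re

# Deliberately simple pattern for the sketch: scheme plus non-space, non-quote characters
URL = re.compile(r"https?://[^\s<>\"']+")


def urlize_simple(text: str, nofollow: bool = False, autoescape: bool = False) -> str:
    rel = ' rel="nofollow"' if nofollow else ""
    out = []
    pos = 0
    for m in URL.finditer(text):
        before = text[pos:m.start()]
        out.append(html.escape(before) if autoescape else before)
        url = html.escape(m.group(0))  # always escape inside the attribute
        out.append(f'<a href="{url}"{rel}>{url}</a>')
        pos = m.end()
    rest = text[pos:]
    out.append(html.escape(rest) if autoescape else rest)
    return "".join(out)
```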

I searched on Google for anything newer and ran across this one:
$('p').each(function(){
$(this).html( $(this).html().replace(/((http|https|ftp):\/\/[\w?=&.\/-;#~%-]+(?![\w\s?&.\/;#~%"=-]*>))/g, '<a href="$1">$1</a> ') );
});
demo: http://jsfiddle.net/kachibito/hEgvc/1/
Works really well for normal links.
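The interesting part of that regex is the negative lookahead. A URL is skipped when it is followed by attribute-like characters and then ">" with no "<" in between, which is what happens inside href="...". Here is a Python port of the idea. In the original class, `\/-;` is a character range from "/" to ";", which covers the digits and ":". The port spells ":" out instead. It still re-links a URL that is used as the visible text of an existing link, because that URL is followed by "</a>". `link_plain_urls` is a hypothetical name.

```python
import re

# URL, then a negative lookahead rejecting matches that sit inside a tag
URL_OUTSIDE_TAGS = re.compile(
    r'((?:https?|ftp)://[\w?=&./:;#~%-]+(?![\w\s?&./;#~%"=-]*>))'
)


def link_plain_urls(html_text: str) -> str:
    return URL_OUTSIDE_TAGS.sub(r'<a href="\1">\1</a>', html_text)
```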

I changed the emailAddressPattern in Roshambo's String.linkify() so that it recognizes aaa.bbb.@ccc.ddd addresses:
if (!String.prototype.linkify) {
    String.prototype.linkify = function() {
        // http://, https://, ftp://
        var urlPattern = /\b(?:https?|ftp):\/\/[a-z0-9-+&@#\/%?=~_|!:,.;]*[a-z0-9-+&@#\/%=~_|]/gim;
        // www. sans http:// or https://
        var pseudoUrlPattern = /(^|[^\/])(www\.[\S]+(\b|$))/gim;
        // Email addresses *** here I've changed the expression ***
        var emailAddressPattern = /(([a-zA-Z0-9_\-\.]+)@[a-zA-Z_]+?(?:\.[a-zA-Z]{2,6}))+/gim;
        return this
            .replace(urlPattern, '<a target="_blank" href="$&">$&</a>')
            .replace(pseudoUrlPattern, '$1<a target="_blank" href="http://$2">$2</a>')
            .replace(emailAddressPattern, '<a target="_blank" href="mailto:$1">$1</a>');
    };
}

/**
 * Convert URLs in a string to anchor links
 * @param {!string} string
 * @returns {!string}
 */
function URLify(string) {
    var urls = string.match(/(((ftp|https?):\/\/)[\-\w@:%_\+.~#?,&\/\/=]+)/g);
    if (urls) {
        urls.forEach(function (url) {
            string = string.replace(url, '<a target="_blank" href="' + url + '">' + url + "</a>");
        });
    }
    return string.replace("(", "<br/>(");
}

The best script to do this:
http://benalman.com/projects/javascript-linkify-process-lin/

This solution works like many of the others, and in fact uses the same regex as one of them. However, instead of returning an HTML string, it returns a document fragment containing the A elements and any applicable text nodes.
function make_link(string) {
var words = string.split(' '),
ret = document.createDocumentFragment();
for (var i = 0, l = words.length; i < l; i++) {
if (words[i].match(/[-a-zA-Z0-9@:%_\+.~#?&//=]{2,256}\.[a-z]{2,4}\b(\/[-a-zA-Z0-9@:%_\+.~#?&//=]*)?/gi)) {
var elm = document.createElement('a');
elm.href = words[i];
elm.textContent = words[i];
if (ret.childNodes.length > 0) {
ret.lastChild.textContent += ' ';
}
ret.appendChild(elm);
} else {
if (ret.lastChild && ret.lastChild.nodeType === 3) {
ret.lastChild.textContent += ' ' + words[i];
} else {
ret.appendChild(document.createTextNode(' ' + words[i]));
}
}
}
return ret;
}
There are some caveats, namely with older IE and textContent support.
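The same "build nodes, not an HTML string" idea carries over outside the DOM. Split the text into typed segments and let the caller create a text node or a link element for each one, so the surrounding text is never parsed as markup. Here is a minimal Python sketch. Unlike make_link it keeps the original whitespace exactly instead of splitting on single spaces, and it only recognizes http(s) URLs. `split_links` is a hypothetical name.

```python
import re
from typing import List, Tuple

URL = re.compile(r"\bhttps?://[^\s<>\"']+", re.I)


def split_links(text: str) -> List[Tuple[str, str]]:
    """Split text into ("text", ...) and ("link", ...) segments, in order."""
    segments = []
    pos = 0
    for m in URL.finditer(text):
        if m.start() > pos:
            segments.append(("text", text[pos:m.start()]))
        segments.append(("link", m.group(0)))
        pos = m.end()
    if pos < len(text):
        segments.append(("text", text[pos:]))
    return segments
```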

If you need to show a shorter link (domain only) that still points to the same long URL, you can try my modification of Sam Hasler's code posted above:
function replaceURLWithHTMLLinks(text) {
var exp = /(\b(https?|ftp|file):\/\/([-A-Z0-9+&@#%?=~_|!:,.;]*)([-A-Z0-9+&@#%?\/=~_|!:,.;]*)[-A-Z0-9+&@#\/%=~_|])/ig;
return text.replace(exp, "<a href='$1' target='_blank'>$3</a>");
}
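Capture groups are a fragile way to get the domain for the link text. A sketch that parses each matched URL instead is shown below, written in Python with `urllib.parse.urlsplit`. `netloc` keeps any port or `user@` prefix, and `file:` URLs have an empty netloc, so the sketch falls back to the full URL for those. `replace_with_short_links` is a hypothetical name.

```python
import re
from urllib.parse import urlsplit

URL = re.compile(
    r"\b(?:https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|!:,.;]*[-A-Z0-9+&@#/%=~_|]", re.I
)


def replace_with_short_links(text: str) -> str:
    def to_anchor(m: re.Match) -> str:
        url = m.group(0)
        # Show only the host part, but keep the full URL in href
        label = urlsplit(url).netloc or url
        return f'<a href="{url}" target="_blank">{label}</a>'

    return URL.sub(to_anchor, text)
```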

Reg Ex:
/(\b((https?|ftp|file):\/\/|(www))[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]*)/ig
function UriphiMe(text) {
var exp = /(\b((https?|ftp|file):\/\/|(www))[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]*)/ig;
return text.replace(exp,"<a href='$1'>$1</a>");
}
Below are some tested string:
Find me on to www.google.com
www
Find me on to www.http://www.com
Follow me on : http://www.nishantwork.wordpress.com
http://www.nishantwork.wordpress.com
Follow me on : http://www.nishantwork.wordpress.com
https://stackoverflow.com/users/430803/nishant
Note: if you don't want a bare "www" to count as a valid match, use the regex below:
/(\b((https?|ftp|file):\/\/|(www))[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig

The warnings about URI complexity should be noted, but the simple answer to your question is:
To replace every match you need to add the /g flag to the end of the RegEx:
/(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/gi

Try the below function :
function anchorify(text){
var exp = /(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig;
var text1=text.replace(exp, "<a href='$1'>$1</a>");
var exp2 =/(^|[^\/])(www\.[\S]+(\b|$))/gim;
return text1.replace(exp2, '$1<a target="_blank" href="http://$2">$2</a>');
}
alert(anchorify("Hola amigo! https://www.sharda.ac.in/academics/"));

Keep it simple! Say what you cannot have, rather than what you can have :)
As mentioned above, URLs can be quite complex, especially after the '?', and not all of them start with 'www.', e.g. maps.bing.com/something?key=!"£$%^*()&lat=65&lon&lon=20
So, rather than a complex regex that won't meet all edge cases and will be hard to maintain, how about this much simpler one, which works well for me in practice.
Match
http(s):// (anything but a space)+
www. (anything but a space)+
Where 'anything' is [^'"<>()\s]
... basically a greedy match, continuing until it meets a space, quote, angle bracket, parenthesis, or end of line
Also:
Remember to check that it is not already in URL format, e.g. the text contains href="..." or src="..."
Add rel=nofollow (if appropriate)
This solution isn't as "good" as the libraries mentioned above, but it is much simpler and works well in practice.
if (html.match(/(href)|(src)/i)) {
    return html; // text already has a hyperlink in it
}
html = html.replace(
    /\b(https?:\/\/[^\s\(\)\'\"\<\>]+)/ig,
    "<a rel='nofollow' href='$1'>$1</a>"
);
// Capture the preceding whitespace so it is kept in the output
html = html.replace(
    /(\s)(www\.[^\s\(\)\'\"\<\>]+)/ig,
    "$1<a rel='nofollow' href='http://$2'>$2</a>"
);
html = html.replace(
    /^(www\.[^\s\(\)\'\"\<\>]+)/ig,
    "<a rel='nofollow' href='http://$1'>$1</a>"
);
return html;
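Here is the same "say what you cannot have" approach as a self-contained Python sketch, with one regex per case and the bail-out when the input already contains href or src. The www. pass captures the preceding whitespace so that words don't run together. A "www." at the start of a line is covered by `^` with `re.M`. `linkify_simple` is a hypothetical name.

```python
import re

# "Anything but" whitespace, parentheses, quotes and angle brackets
SCHEME_URL = re.compile(r"\b(https?://[^\s()'\"<>]+)", re.I)
WWW_URL = re.compile(r"(^|\s)(www\.[^\s()'\"<>]+)", re.I | re.M)


def linkify_simple(html_text: str) -> str:
    if re.search(r"href|src", html_text, re.I):
        return html_text  # already contains links; leave untouched
    html_text = SCHEME_URL.sub(r"<a rel='nofollow' href='\1'>\1</a>", html_text)
    # A www. inside a link made above is preceded by "/", not whitespace, so it is skipped
    html_text = WWW_URL.sub(r"\1<a rel='nofollow' href='http://\2'>\2</a>", html_text)
    return html_text
```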

Correct URL detection with support for international domains and astral characters is not a trivial thing. The linkify-it library builds its regex from many conditions, and the final size is about 6 kilobytes :) . It's more accurate than all the libraries currently referenced in the accepted answer.
See linkify-it demo to check live all edge cases and test your ones.
If you need to linkify HTML source, you should parse it first, and iterate each text token separately.

I've written yet another JavaScript library. It might be better for you since it's very sensitive, with the fewest possible false positives, fast and small in size. I'm actively maintaining it, so please test it on the demo page and see how it works for you.
link: https://github.com/alexcorvi/anchorme.js

I had to do the opposite, and make html links into just the URL, but I modified your regex and it works like a charm, thanks :)
var exp = /<a\s.*href=['"](\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])['"].*>.*<\/a>/ig;
source = source.replace(exp,"$1");

The e-mail detection in Travitron's answer above did not work for me, so I extended/replaced it with the following (C# code).
// Change e-mail addresses to mailto: links.
const RegexOptions o = RegexOptions.Multiline | RegexOptions.IgnoreCase;
const string pat3 = @"([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,6})";
const string rep3 = @"<a href=""mailto:$1@$2.$3"">$1@$2.$3</a>";
text = Regex.Replace(text, pat3, rep3, o);
This allows for e-mail addresses like "firstname.secondname@one.two.three.co.uk".

After input from several sources I've now a solution that works well. It had to do with writing your own replacement code.
function replaceURLWithHTMLLinks(text) {
var re = /(\(.*?)?\b((?:https?|ftp|file):\/\/[-a-z0-9+&@#\/%?=~_()|!:,.;]*[-a-z0-9+&@#\/%=~_()|])/ig;
return text.replace(re, function(match, lParens, url) {
var rParens = '';
lParens = lParens || '';
// Try to strip the same number of right parens from url
// as there are left parens. Here, lParenCounter must be
// a RegExp object. You cannot use a literal
// while (/\(/g.exec(lParens)) { ... }
// because an object is needed to store the lastIndex state.
var lParenCounter = /\(/g;
while (lParenCounter.exec(lParens)) {
var m;
// We want m[1] to be greedy, unless a period precedes the
// right parenthesis. These tests cannot be simplified as
// /(.*)(\.?\).*)/.exec(url)
// because if (.*) is greedy then \.? never gets a chance.
if (m = /(.*)(\.\).*)/.exec(url) ||
/(.*)(\).*)/.exec(url)) {
url = m[1];
rParens = m[2] + rParens;
}
}
return lParens + "<a href='" + url + "'>" + url + "</a>" + rParens;
});
}
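A simpler heuristic for the same parenthesis problem counts the parentheses inside the matched URL itself, rather than the "(" characters before it. Trailing ")" characters are moved out of the URL only while the URL has more ")" than "(". This keeps Wikipedia-style URLs such as Foo_(bar) intact. It is a swapped-in technique, not the answer's algorithm, and unlike the answer it does not handle a period before the closing parenthesis. The names are hypothetical.

```python
import re
from typing import Tuple

URL = re.compile(
    r"\b(?:https?|ftp|file)://[-a-z0-9+&@#/%?=~_()|!:,.;]*[-a-z0-9+&@#/%=~_()|]", re.I
)


def split_trailing_parens(url: str) -> Tuple[str, str]:
    """Move unbalanced trailing ')' characters out of a matched URL."""
    trailing = ""
    while url.endswith(")") and url.count(")") > url.count("("):
        url, trailing = url[:-1], ")" + trailing
    return url, trailing


def replace_urls(text: str) -> str:
    def to_anchor(m: re.Match) -> str:
        url, trailing = split_trailing_parens(m.group(0))
        return f"<a href='{url}'>{url}</a>{trailing}"

    return URL.sub(to_anchor, text)
```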

Here's my solution:
var content = "Visit https://wwww.google.com or watch this video: https://www.youtube.com/watch?v=0T4DQYgsazo and news at http://www.bbc.com";
content = replaceUrlsWithLinks(content, "http://");
content = replaceUrlsWithLinks(content, "https://");
function replaceUrlsWithLinks(content, protocol) {
var startPos = 0;
var s = 0;
while (s < content.length) {
startPos = content.indexOf(protocol, s);
if (startPos < 0)
return content;
let endPos = content.indexOf(" ", startPos + 1);
if (endPos < 0)
endPos = content.length;
let url = content.substr(startPos, endPos - startPos);
if (url.endsWith(".") || url.endsWith("?") || url.endsWith(",")) {
url = url.substr(0, url.length - 1);
endPos--;
}
if (validUrl(url)) {
let link = "<a href='" + url + "'>" + url + "</a>";
content = content.substr(0, startPos) + link + content.substr(endPos);
s = startPos + link.length;
} else {
s = endPos + 1;
}
}
return content;
}
function validUrl(url) {
try {
new URL(url);
return true;
} catch (e) {
return false;
}
}
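The same scan-then-validate approach works in Python. One difference is that `urllib.parse.urlsplit` rarely raises the way JavaScript's `new URL()` does, so the validity check has to be explicit. Requiring an http(s) scheme and a host is an assumption standing in for `new URL()`. This sketch strips any run of trailing ".", "?" or ",", where the answer strips one character. The names are hypothetical.

```python
from urllib.parse import urlsplit

TRAILING_PUNCTUATION = ".?,"


def valid_url(url: str) -> bool:
    """Rough stand-in for `new URL(url)`: http(s) scheme and a non-empty host."""
    try:
        parts = urlsplit(url)
        return parts.scheme in ("http", "https") and bool(parts.hostname)
    except ValueError:  # e.g. a malformed IPv6 literal
        return False


def replace_urls_with_links(content: str) -> str:
    out = []
    for word in content.split(" "):  # split/join on " " keeps spacing unchanged
        core = word.rstrip(TRAILING_PUNCTUATION)
        tail = word[len(core):]
        if core.startswith(("http://", "https://")) and valid_url(core):
            out.append(f"<a href='{core}'>{core}</a>{tail}")
        else:
            out.append(word)
    return " ".join(out)
```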

Try the solution below:
function replaceLinkClickableLink(url = '') {
let pattern = new RegExp('^(https?:\\/\\/)?'+
'((([a-z\\d]([a-z\\d-]*[a-z\\d])*)\\.?)+[a-z]{2,}|'+
'((\\d{1,3}\\.){3}\\d{1,3}))'+
'(\\:\\d+)?(\\/[-a-z\\d%_.~+]*)*'+
'(\\?[;&a-z\\d%_.~+=-]*)?'+
'(\\#[-a-z\\d_]*)?$','i');
let isUrl = pattern.test(url);
if (isUrl) {
return `<a href="${url}">${url}</a>`;
}
return url;
}

Replace URLs in text with HTML links, ignore the URLs within a href/pre tag.
https://github.com/JimLiu/auto-link

This worked for me:
var urlRegex =/(\b((https?|ftp|file):\/\/)?((([a-z\d]([a-z\d-]*[a-z\d])*)\.)+[a-z]{2,}|((\d{1,3}\.){3}\d{1,3}))(\:\d+)?(\/[-a-z\d%_.~+]*)*(\?[;&a-z\d%_.~+=-]*)?(\#[-a-z\d_]*)?)/ig;
return text.replace(urlRegex, function(url) {
var newUrl = url.indexOf("http") === -1 ? "http://" + url : url;
return '<a href="' + newUrl + '">' + url + '</a>';
});
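The same idea, linking bare domains and prepending http:// when no scheme was typed, looks like this as a Python sketch. The pattern is a simplified assumption. It drops the IPv4, port and query branches of the regex above, and it will also link things that merely look like domains, such as "file.txt". `link_bare_urls` is a hypothetical name.

```python
import re

# Optional scheme, dot-terminated labels, a TLD of 2+ letters, optional path
URL = re.compile(
    r"\b((?:https?://)?(?:[a-z\d](?:[a-z\d-]*[a-z\d])?\.)+[a-z]{2,}(?:/\S*)?)",
    re.I,
)


def link_bare_urls(text: str) -> str:
    def to_anchor(m: re.Match) -> str:
        url = m.group(1)
        # Prepend http:// only when no scheme was typed
        href = url if url.lower().startswith(("http://", "https://")) else "http://" + url
        return f'<a href="{href}">{url}</a>'

    return URL.sub(to_anchor, text)
```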

BDD framework with python

I need to write a test using cucumber for a course.
Scenario:
Login,
Select first item link,
Add item to the shopping cart,
Proceed to the shopping cart page,
Check the item on that list is correct,
Proceed to checkout,
Complete and Logout.
The point I don't understand is: do I need to create a feature file for each step, and do I need to close and reopen the browser for each step? How should I do this? What approach should I follow?
(Note that I am a beginner and my English skills are limited, so I need a simple explanation.)

Yes, you should make a new file for each feature, for example login.feature, upload.feature and logout.feature.
PS. Sorry, I didn't realize you said Python, but it's the same idea.
Each file should have a layout like this:
# Created by Mick Jagger at 1/13/2021
@login
Feature: Login
Scenario: Login to Website Homepage
When Launch website
Then Enter username and password
And Log in
Then make the corresponding step file like this:
package glue;
import cucumber.api.java.en.*;
import org.openqa.selenium.By;
import org.openqa.selenium.WebElement;
import java.util.List;
public class LoginSteps extends Driver {
public String username = "username";
public String password = "password";
@When("^Launch website$")
public void launch_website() throws Throwable {
driver.get("https://www.website.com/");
Thread.sleep(2000);
}
@Then("^Enter username and password$")
public void enter_credentials() throws Throwable {
driver.findElement(By.xpath("//input[@aria-label='Phone number, username, or email']")).sendKeys(username);
driver.findElement(By.xpath("//input[@aria-label='Password']")).sendKeys(password);
}
@And("^Log in$")
public void log_in() throws Throwable {
Thread.sleep(1000);
List<WebElement> buttons = driver.findElements(By.tagName("button"));
buttons.get(1).click();
Thread.sleep(2000);
driver.findElement(By.tagName("button")).click();
Thread.sleep(1000);
buttons = driver.findElements(By.xpath("//button[@tabindex='0']"));
buttons.get(1).click();
}
}
If you're new to browser automation, I recommend learning the "Page Object Model Methodology". Also, I stopped using this annoying framework, it's supposed to make things easier but to me it just adds extra work.
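Since the question is about Python, here is a small Python sketch of the Page Object Model mentioned above. It wraps the login page's selectors in one class that step functions can call. The URL and XPaths are the placeholders from the Java example. The driver only needs `get(url)` and `find_element(by, value)`, which is the shape of a Selenium WebDriver ("xpath" is the value of Selenium's `By.XPATH`). A fake driver is included so the page object can be exercised without a browser. On the browser question: a common arrangement, though not something Cucumber requires, is to create one driver per scenario and reuse it across steps. In behave, that is usually done in `before_scenario` and `after_scenario` hooks in environment.py.

```python
class LoginPage:
    """Page object for the login steps (hypothetical URL and selectors)."""

    URL = "https://www.website.com/"
    USERNAME = "//input[@aria-label='Phone number, username, or email']"
    PASSWORD = "//input[@aria-label='Password']"

    def __init__(self, driver):
        self.driver = driver

    def open(self):
        self.driver.get(self.URL)
        return self

    def enter_credentials(self, username, password):
        self.driver.find_element("xpath", self.USERNAME).send_keys(username)
        self.driver.find_element("xpath", self.PASSWORD).send_keys(password)
        return self


class FakeElement:
    """Records typed text so the page object can be tested without a browser."""

    def __init__(self, log, locator):
        self.log, self.locator = log, locator

    def send_keys(self, text):
        self.log.append(("type", self.locator, text))


class FakeDriver:
    def __init__(self):
        self.log = []

    def get(self, url):
        self.log.append(("get", url))

    def find_element(self, by, value):
        return FakeElement(self.log, value)


if __name__ == "__main__":
    driver = FakeDriver()
    LoginPage(driver).open().enter_credentials("username", "password")
    print(driver.log)
```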

Using PyperClip on web app

I am using pyperclip.py in my web app to copy a list of e-mail addresses (submitted through a form) so that a user can paste it locally from the clipboard. It works perfectly locally. However, when it runs on a server (Linux 14.04 with Apache2) and is accessed from a client system through the browser, it doesn't copy. How can I get it to copy to the clipboard of the client's system?
Right now I'm just trying to get it to work and as such I'm only using a single line. I'm using pyperclip 1.5.15 with xclip and Python 3.4. The server is running Linux 14.04 and the client has noticed issues on Windows 8 and Windows 10 using Google Chrome and IE. No other os has currently been tested.
pyperclip.copy("HELLO")

Since I couldn't find many details on this subject, I thought I'd answer my own question. pyperclip runs inside the server process, so it copies to the server's clipboard, never the browser's. Browsers can't use pyperclip, so an HTML + JavaScript workaround is required (meaning no pyperclip). First, add your Django template variable as an HTML attribute; from there you can use JavaScript to handle the copy functionality. Below is an example of how to do this. It assumes the page has an element (for example, a button in your form) with the id email_list_clipboard. I hope this helps anyone else who may have run into a similar issue!
Example:
<html email-list="{{request.session.email_list}}">
<script>
$(document).ready(function () {
function copyTextToClipboard(text) {
var textArea = document.createElement("textarea");
// Place in top-left corner of screen regardless of scroll position.
textArea.style.position = 'fixed';
textArea.style.top = 0;
textArea.style.left = 0;
textArea.style.width = '2em';
textArea.style.height = '2em';
// We don't need padding, reducing the size if it does flash render.
textArea.style.padding = 0;
textArea.style.border = 'none';
textArea.style.outline = 'none';
textArea.style.boxShadow = 'none';
textArea.style.background = 'transparent';
textArea.value = text;
document.body.appendChild(textArea);
textArea.select();
try {
var successful = document.execCommand('copy');
var msg = successful ? 'successful' : 'unsuccessful';
console.log('Copying text command was ' + msg);
} catch (err) {
console.log('Oops, unable to copy');
}
document.body.removeChild(textArea);
}
// Copy the list when the #email_list_clipboard element is clicked
$('#email_list_clipboard').click(function (event) {
event.preventDefault();
copyTextToClipboard(document.documentElement.getAttribute("email-list"));
});
});
</script>
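One server-side detail is worth checking. If `request.session.email_list` holds a Python list, the template renders the list's repr (something like ['a@x.com', 'b@y.org']), and that repr is what ends up on the clipboard. Here is a minimal sketch of preparing the value in Python first: one address per line, with `html.escape(..., quote=True)` so that an address containing a quote can't end the attribute early. In a real Django view you would more likely pass the joined string to the template and let autoescaping handle it. `email_list_attribute` is a hypothetical helper.

```python
import html


def email_list_attribute(emails):
    """Build the email-list="..." attribute read by the copy script.

    Joins the addresses one per line; html.escape(quote=True) escapes
    &, <, >, " and ' so the value stays inside the attribute.
    """
    value = "\n".join(e.strip() for e in emails if e.strip())
    return 'email-list="%s"' % html.escape(value, quote=True)
```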

Unable to get browser console logs from a remote chrome browser [duplicate]

I want to build automated tests, so I need to know which errors appear in Chrome's console.
Is there a way to get the error lines that appear in the console?
In order to see the console: right click somewhere in the page, click "inspect element" and then go to "console".

I don't know C#, but here's Java code that does the job. I hope you can translate it to C#:
import java.util.Date;
import java.util.logging.Level;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.logging.LogEntries;
import org.openqa.selenium.logging.LogEntry;
import org.openqa.selenium.logging.LogType;
import org.openqa.selenium.logging.LoggingPreferences;
import org.openqa.selenium.remote.CapabilityType;
import org.openqa.selenium.remote.DesiredCapabilities;
import org.testng.annotations.AfterMethod;
import org.testng.annotations.BeforeMethod;
import org.testng.annotations.Test;
public class ChromeConsoleLogging {
private WebDriver driver;
@BeforeMethod
public void setUp() {
System.setProperty("webdriver.chrome.driver", "c:\\path\\to\\chromedriver.exe");
DesiredCapabilities caps = DesiredCapabilities.chrome();
LoggingPreferences logPrefs = new LoggingPreferences();
logPrefs.enable(LogType.BROWSER, Level.ALL);
caps.setCapability(CapabilityType.LOGGING_PREFS, logPrefs);
driver = new ChromeDriver(caps);
}
@AfterMethod
public void tearDown() {
driver.quit();
}
public void analyzeLog() {
LogEntries logEntries = driver.manage().logs().get(LogType.BROWSER);
for (LogEntry entry : logEntries) {
System.out.println(new Date(entry.getTimestamp()) + " " + entry.getLevel() + " " + entry.getMessage());
//do something useful with the data
}
}
@Test
public void testMethod() {
driver.get("http://mypage.com");
//do something on page
analyzeLog();
}
}
Pay attention to the setUp method in the code above. We use a LoggingPreferences object to enable logging. There are a few types of logs, but if you want to track console errors then LogType.BROWSER is the one you should use. Then we pass that object to DesiredCapabilities and on to the ChromeDriver constructor, and voila: we have an instance of ChromeDriver with logging enabled.
After performing some actions on page we call analyzeLog() method. Here we simply extract the log and iterate through its entries. Here you can put assertions or do any other reporting you want.
My inspiration was this code by Michael Klepikov that explains how to extract performance logs from ChromeDriver.
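For Python users, Selenium's Python bindings expose the same log through `driver.get_log("browser")`. It returns a list of dicts with keys such as "level", "message", "source" and "timestamp" (milliseconds). With Chrome, logging is typically enabled by calling `options.set_capability("goog:loggingPrefs", {"browser": "ALL"})` before creating the driver. Treat those details as assumptions to check against your Selenium version. Those calls need a real browser, so the sketch below only covers the analyzeLog step, run on dicts of that shape. The level ordering is an assumption, and the function names are hypothetical.

```python
from datetime import datetime, timezone

# Assumed ordering of ChromeDriver log levels, lowest to highest
LEVELS = {"DEBUG": 0, "INFO": 1, "WARNING": 2, "SEVERE": 3}


def format_entry(entry):
    """Format one entry like the Java example: time, level, message."""
    when = datetime.fromtimestamp(entry["timestamp"] / 1000, tz=timezone.utc)
    return f"{when.isoformat()} {entry['level']} {entry['message']}"


def analyze_log(entries, min_level="WARNING"):
    """Print every entry and return those at or above min_level."""
    threshold = LEVELS[min_level]
    flagged = []
    for entry in entries:
        print(format_entry(entry))
        if LEVELS.get(entry["level"], 0) >= threshold:
            flagged.append(entry)
    return flagged
```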

You can get logs this way:
Driver().Manage().Logs.GetLog();
By specifying what log you are interested in you can get the browser log, that is:
Driver().Manage().Logs.GetLog(LogType.Browser);
Also remember to setup your driver accordingly:
ChromeOptions options = new ChromeOptions();
options.SetLoggingPreference(LogType.Browser, LogLevel.All);
driver = new ChromeDriver("path to driver", options);

This is the C# code for logging the browser log from Chrome:
private void CheckLogs()
{
List<LogEntry> logs = Driver.Manage().Logs.GetLog(LogType.Browser).ToList();
foreach (LogEntry log in logs)
{
Log(log.Message);
}
}
here is my code for the actual log:
public void Log(string value, params object[] values)
{
// allow indenting
if (!String.IsNullOrEmpty(value) && value.Length > 0 && value.Substring(0, 1) != "*")
{
value = " " + value;
}
// write the log
Console.WriteLine(String.Format(value, values));
}

As per issue 6832 logging is not implemented yet for C# bindings. So there might not be an easy way to get this working as of now.

Here is a solution to get Chrome logs using the C#, Specflow and Selenium 4.0.0-alpha05.
Pay attention that the same code doesn't work with Selenium 3.141.0.
[AfterScenario]
public void AfterScenario(ScenarioContext context)
{
if (context.TestError != null)
{
GetChromeLogs(); // Chrome logs are taken only if the test fails
}
Driver.Quit();
}
private void GetChromeLogs()
{
var chromeLogs = Driver.Manage().Logs.GetLog(LogType.Browser).ToList();
}

public void Test_DetectMissingFilesToLoadWebpage()
{
    try
    {
        List<LogEntry> logs = driver.Manage().Logs.GetLog(LogType.Browser).ToList();
        foreach (LogEntry log in logs)
        {
            String logInfo = log.ToString();
            // Fail as soon as any entry reports a missing (404) resource
            if (log.Message.Contains("Failed to load resource: the server responded with a status of 404 (Not Found)"))
            {
                Assert.Fail(logInfo);
            }
        }
        // Pass only after every entry has been checked
        Assert.Pass();
    }
    catch (NoSuchElementException e)
    {
        test.Fail(e.StackTrace);
    }
}
You could do something like this in C#. It is a complete test case. Then print the console output as a string, i.e. logInfo, in your report. For some reason, Log(log.Message) from the solution above gave me build errors, so I replaced it.
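The same 404 check in Python can collect every offending entry instead of deciding on the first one, so a test can assert on the whole list and report all missing files at once. The entries are assumed to be dicts with a "message" key, as returned by Selenium's Python `driver.get_log("browser")`. The sample messages in the test are made up to match the wording above. `missing_resources` is a hypothetical name.

```python
NOT_FOUND_MARKER = "the server responded with a status of 404"


def missing_resources(entries):
    """Return the message of every browser log entry that reports a 404."""
    return [e["message"] for e in entries if NOT_FOUND_MARKER in e.get("message", "")]


# In a test: assert not missing_resources(driver.get_log("browser"))
```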

C# bindings to the Chrome console logs are finally available in Selenium 4.0.0-alpha05. Selenium 3.141.0 and prior do not have support.
Before instantiating a new ChromeDriver object, set the logging preference in a ChromeOptions object and pass that into ChromeDriver:
ChromeOptions options = new ChromeOptions();
options.SetLoggingPreference(LogType.Browser, LogLevel.All);
ChromeDriver driver = new ChromeDriver(options);
Then, to write the Chrome console logs to a flat file:
public void WriteConsoleErrors()
{
string strPath = "C:\\ConsoleErrors.txt";
if (!File.Exists(strPath))
{
File.Create(strPath).Dispose();
}
using (StreamWriter sw = File.AppendText(strPath))
{
var entries = driver.Manage().Logs.GetLog(LogType.Browser);
foreach (var entry in entries)
{
sw.WriteLine(entry.ToString());
}
}
}

driver.manage().logs().get("browser")
Gets all logs printed on the console. I was able to get all logs except Violations. Please have a look here Chrome Console logs not printing Violations

Autocomplete for Python in Codemirror?

I am trying to set up an autocomplete feature in CodeMirror for Python. Unfortunately, it seems that CodeMirror only includes the files needed for JavaScript keyword completion.
Has anyone built a Python hint file for CodeMirror similar to the JavaScript version?
(Edit for future reference: link to similar question on CodeMirror Google Group)

I'm the original author of the Python parser for Codemirror (1 and 2). You are correct that the Python parser does not offer enough information for autocomplete. I tried to build it into the parser when Codemirror 2 came around but it proved too difficult for my JS skills at the time.
I have far more skills now but far less time. Perhaps someday I'll get back to it. Or if someone wants to take it up, I would be glad to help.

Add python-hint.js, show-hint.js and show-hint.css. Then:
var editor = CodeMirror.fromTextArea(/* your textarea element and editor options here */);
editor.on('inputRead', function onChange(editor, input) {
if (input.text[0] === ';' || input.text[0] === ' ' || input.text[0] === ":") {
return;
}
editor.showHint({
hint: CodeMirror.pythonHint
});
});

<script>
var editor = CodeMirror.fromTextArea(document.getElementById("code"), {
mode: {
name: "python",
version: 3,
singleLineStringErrors: false
},
lineNumbers: true,
indentUnit: 4,
extraKeys: {
"Ctrl-Space": "autocomplete"
},
matchBrackets: true
});
CodeMirror.commands.autocomplete = function (cm) {
CodeMirror.simpleHint(cm, CodeMirror.pythonHint);
}
</script>

I started the Python autocomplete with a JS file based on pig-hint from CodeMirror 3.
You can get the python-hint.js from here.
To make it work, your HTML needs to:
include codemirror.js, simple-hint.js, simple-hint.css and python-hint.js
add this script:
<script>
CodeMirror.commands.autocomplete = function(cm) {
CodeMirror.simpleHint(cm, CodeMirror.pythonHint);
}
</script>
python-hint.js is a basic js I have created today and not reviewed in depth.
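The contents of python-hint.js aren't reproduced here. A simple keyword-based hint, though, mostly needs a word list plus a prefix match on the token under the cursor. As a sketch, Python's own `keyword` and `builtins` modules can generate that list and print it as a JS array literal for a hint file to embed. The `complete` function mirrors the prefix matching such a hint performs. This is not parser-aware completion, which is the harder part the parser's author describes above. All names are hypothetical.

```python
import builtins
import json
import keyword


def python_words():
    """Keywords plus public builtin names, sorted and de-duplicated."""
    names = set(keyword.kwlist)
    names.update(n for n in dir(builtins) if not n.startswith("_"))
    return sorted(names)


def complete(prefix, words=None):
    """Return candidate completions for the token being typed (prefix match)."""
    words = python_words() if words is None else words
    return [w for w in words if w.startswith(prefix)]


if __name__ == "__main__":
    # Emit a JS array literal that a hint file could embed as its word list
    print("var pythonWords = " + json.dumps(python_words()) + ";")
```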

You can initialize this way also, adding extraKeys parameter to CodeMirror initialization:
CodeMirror(function(elt) {
myTextArea.parentNode.replaceChild(elt, myTextArea);
}, {
mode: "python",
lineNumbers: true,
autofocus: true,
extraKeys: {"Ctrl-Space": "autocomplete"}
});
