Get uBlock Origin logger data using Python and Selenium - python

I'd like to know the number of blocked trackers detected by uBlock Origin using Python (running on a Linux server, so no GUI) and Selenium (with the Firefox driver). I don't necessarily need to actually block them, but I need to know how many there are.
uBlock Origin has a logger (https://github.com/gorhill/uBlock/wiki/The-logger#settings-dialog), which I'd like to scrape.
This logger is available through a URL like this: moz-extension://fc469b55-3182-4104-a95c-6b0b4f87cf0f/logger-ui.html#_ where the part between moz-extension:// and /logger-ui.html is the UUID of the uBlock Origin add-on.
In this logger, each entry is a div with its class set to "logEntry" (yellow oblong in the screenshot below), and I'd like to get the data in the green oblong:
So far, I have this:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.options import Options as FirefoxOptions
browser_options = FirefoxOptions()
browser_options.headless = True
# Activate add-on
str_ublock_extension_path = "/usr/local/bin/uBlock0_1.45.3b10.firefox.signed.xpi"
browser = webdriver.Firefox(executable_path='/usr/local/bin/geckodriver', options=browser_options)
str_id = browser.install_addon(str_ublock_extension_path)
# Getting the UUID, which is new each time the script is launched
profile_path = browser.capabilities['moz:profile']
id_extension_firefox = "uBlock0@raymondhill.net"
with open('{}/prefs.js'.format(profile_path), 'r') as file_prefs:
    lines = file_prefs.readlines()
    for line in lines:
        if 'extensions.webextensions.uuids' in line:
            sublines = line.split(',')
            for subline in sublines:
                if id_extension_firefox in subline:
                    internal_uuid = subline.split(':')[1][2:38]
str_uoo_panel_url = "moz-extension://" + internal_uuid + "/logger-ui.html#_"
ubo_logger = browser.get(str_uoo_panel_url)
ubo_logger_log_entries = ubo_logger.find_element(By.CLASS_NAME, "logEntry")
for log_entrie in ubo_logger_log_entries:
    print(log_entrie.text)
Using this "weird" moz-extension:// URL seems to work, since print(browser.page_source) displays some relevant HTML code.
Problem: ubo_logger.find_element(By.CLASS_NAME, "logEntry") gets nothing. What did I do wrong?
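The prefs.js lookup above splits on commas and colons and slices fixed offsets ([2:38]), which depends on exactly how Firefox escapes the pref value. A minimal sketch of a sturdier lookup, assuming the line has the form `user_pref("extensions.webextensions.uuids", "<escaped JSON object>");` (the backslash-escaped quotes that the [2:38] slice implies). `extension_uuid` is an illustrative helper, not part of Selenium or Firefox:

```python
import json
import re

# Matches the prefs.js line holding the add-on ID -> internal UUID mapping.
# Assumed format: user_pref("extensions.webextensions.uuids", "<escaped JSON>");
UUIDS_PREF = re.compile(r'user_pref\("extensions\.webextensions\.uuids",\s*"(.*)"\);')


def extension_uuid(prefs_text: str, extension_id: str) -> str:
    """Return the moz-extension:// UUID Firefox assigned to extension_id."""
    for line in prefs_text.splitlines():
        match = UUIDS_PREF.match(line.strip())
        if match:
            # The value is a JSON object stored inside a quoted string, so decode
            # the string literal first and then the JSON object it contains.
            mapping = json.loads(json.loads('"' + match.group(1) + '"'))
            return mapping[extension_id]
    raise KeyError(f"{extension_id} not found in prefs.js")
```

With the question's variables, `internal_uuid = extension_uuid(open(profile_path + '/prefs.js').read(), id_extension_firefox)` would replace the nested loop, assuming prefs.js has already been written when it is read.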

I found this to work:
parent = driver.find_element(by=By.XPATH, value='//*[@id="vwContent"]')
children = parent.find_elements(by=By.XPATH, value='./child::*')
for child in children:
    attributes = (child.find_element(by=By.XPATH, value='./child::*')).find_elements(by=By.XPATH, value='./child::*')
    print(attributes[4].text)
You could then also do:
if attributes[4].text.isdigit():
    result = int(attributes[4].text)
This converts the resulting text into an int.
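Two details in the question's code explain why nothing came back: `browser.get()` returns `None`, so `ubo_logger.find_element(...)` is called on `None`; and `find_element` returns a single element rather than a list, so it cannot be looped over (`find_elements` is the plural form). A minimal sketch combining that with the answer's XPath walk, assuming the logger rows are the children of `#vwContent` and that the fifth cell (`attributes[4]` above) is the one to read. `logger_rows` and `numeric_column` are hypothetical helper names, and the locator is passed as the plain string "xpath", which is the value of Selenium's `By.XPATH`:

```python
def logger_rows(browser, logger_url):
    """Open the uBO logger and return each row's cell texts as a list of strings."""
    browser.get(logger_url)  # returns None, so keep using `browser` afterwards
    rows = browser.find_elements("xpath", '//*[@id="vwContent"]/*')  # plural -> list
    table = []
    for row in rows:
        # Same walk as the answer: the cells are the children of the row's first child.
        cells = row.find_elements("xpath", "./*[1]/*")
        table.append([cell.text for cell in cells])
    return table


def numeric_column(table, column=4):
    """Return the integers found in `column`, skipping short rows and non-digit cells."""
    values = []
    for cells in table:
        if len(cells) > column and cells[column].strip().isdigit():
            values.append(int(cells[column].strip()))
    return values
```

`len(logger_rows(browser, str_uoo_panel_url))` would then give the number of rows currently shown. Because the logger fills in as requests happen, a wait (for example Selenium's `WebDriverWait`) may be needed before reading it.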

Related

How to get data from a webpage using Selenium and show it using Flask?

Hello, I'm a theologian, and one of the things I usually have to do is translate from Latin to English or Spanish. To do that I use an online dictionary and check whether a specific word is in the nominative or dative case (Latinist stuff)...
Now I've coded a simple Python script using Selenium that gets the dictionary's page and extracts the case of the word. Everything works as I want, but...
There is always a 'but', haha. I want to take the data I extract with Selenium and 'print' it on a webpage using Flask. I coded that, but it doesn't work...
My code:
from flask import Flask
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from tabulate import tabulate
import sys
import os
app = Flask(__name__)
chrome_opt = Options()
chrome_opt.binary_location = g_chrome_bin = os.environ.get("GOOGLE_CHROME_BIN")
chrome_opt.add_argument('--headless')
chrome_opt.add_argument('--no-sandbox')
chrome_opt.add_argument('--disable-dev-shm-usage')
selenium_driver_path = os.environ.get("CHROMEDRIVER_PATH")
driver = webdriver.Chrome(executable_path=selenium_driver_path if selenium_driver_path else "./chromedriver", options=chrome_opt)
def analyze(words):
    ws = words.split()
    sentence = []
    for w in ws:
        driver.get('http://archives.nd.edu/cgi-bin/wordz.pl?keyword=' + w)
        pre = driver.find_element_by_xpath('//pre')
        sentence = sentence + [[w] + [pre.text.replace('.', '')]]
    return tabulate(sentence, headers=["Word", "Dictionary"])
#analyze("pater noster qui est in celis")
@app.route("/api/<string:ws>")
def api(ws):
    return analyze(ws)
driver.close()
if __name__ == "__main__":
    app.run(debug=True)
And when I go to http://localhost:5000/api/pater (for example), I get an Internal Server Error, and the console shows selenium.common.exceptions.InvalidSessionIdException: Message: invalid session id
You close your driver session (driver.close()) before the main block runs. Thus, when you make an API request and analyze() calls driver.get(), that driver is already closed. Either initialise a new driver for every call to analyze() and close it at the end of the function, or don't close the driver session at all.
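A minimal sketch of the first option (a new driver per call, always shut down), assuming a `make_driver` callable that builds the driver the way the question does, e.g. `lambda: webdriver.Chrome(executable_path=..., options=chrome_opt)`. The factory parameter and the returned list of rows are illustrative choices, not part of Flask or Selenium; "xpath" is the value of Selenium's `By.XPATH`:

```python
from typing import Callable, List

DICTIONARY_URL = "http://archives.nd.edu/cgi-bin/wordz.pl?keyword="


def analyze(words: str, make_driver: Callable) -> List[List[str]]:
    """Look up each word with a fresh driver that is shut down afterwards."""
    driver = make_driver()  # new browser session for this request only
    try:
        rows = []
        for word in words.split():
            driver.get(DICTIONARY_URL + word)
            pre = driver.find_element("xpath", "//pre")
            rows.append([word, pre.text.replace(".", "")])
        return rows
    finally:
        # quit() ends the session and the browser process, even if a lookup failed.
        driver.quit()
```

The route would then return `tabulate(analyze(ws, make_driver), headers=["Word", "Dictionary"])`, and the module-level `driver.close()` goes away. `quit()` is used rather than `close()` because `close()` only closes the current window.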

Multiprocessing, python - sharing the same webdriver pointer

I couldn't find a proper answer, so I'm posting this question.
The fastest way to understand the question is through the goal:
There is a main process and a subprocess (the one I want to create). The main process inspects several websites via webdriver, but sometimes it gets stuck at a low level inside Selenium, and I don't want to change the official code. So I sometimes check the monitor manually to see whether the process got stuck, and if so, I change the URL in the browser by hand and it runs smoothly again. I don't want to be a human checker, so I'd like to automate the task with a subprocess that shares the same webdriver, inspects the URL via webdriver.current_url, and does the work for me.
Here is my attempt, as a minimal representative example in which the subprocess only detects a change in the webdriver's URL:
import multiprocessing
import time
from selenium import webdriver
def test_sub(driver):
    str_site0 = driver.current_url  # get the site0 url
    time.sleep(4)  # give the main process some time to change to site1
    str_site1 = driver.current_url  # get the site1 url (changed by the main process)
    if str_site0 == str_site1:
        print('sub: no change detected')
    else:
        print('sub: change detected')
    #endif
#enddef sub
def test_main():
    """ main process changes from site0 (stackoverflow) to site1 (youtube)
    sub process detects this change of url of the webdriver object (same pointer) by using
    ".current_url" method
    """
    # init driver
    pat_webdriver = r"E:\WPy64-3680\python-3.6.8.amd64\Lib\site-packages\selenium\v83_chromedriver\chromedriver.exe"
    driver = webdriver.Chrome(executable_path=pat_webdriver)
    time.sleep(2)
    # open initial site
    str_site0 = 'https://stackoverflow.com'
    driver.get(str_site0)
    time.sleep(2)
    # init sub and try to pass the webdriver object
    p = multiprocessing.Process(target=test_sub, args=(driver,))  # PROBLEM HERE! PYTHON UNCAPABLE
    p.daemon = False
    p.start()
    # change site
    time.sleep(0.5)  # give sub some time to query the webdriver on site0
    str_site1 = 'https://youtube.com'  # site 1 (this needs to be detected by sub)
    driver.get(str_site1)
    # wait for sub to detect the change in url, then kill the process (non-daemon insufficient, don't know why..)
    time.sleep(3)
    p.terminate()
#enddef test_main
# init the program (main-process)
test_main()
The corresponding error when running $ python test_multithread.py (the name of the test script) is the following:
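One likely cause: on Windows (which the `E:\` path suggests), `multiprocessing` starts children with the spawn method and must pickle the arguments passed to `Process`, and a WebDriver object holds HTTP connections and locks that generally cannot be pickled. A commonly used workaround is to watch the driver from a thread in the same process, where nothing is pickled. The sketch below swaps in `threading` for `multiprocessing`; `watch_url`, `start_watchdog` and their parameters are hypothetical, and since Selenium does not promise thread safety, calling one driver from two threads is an assumption to verify in your setup:

```python
import threading


def watch_url(driver, interval, on_stuck, stop_event):
    """Call on_stuck(url) each time driver.current_url stays the same for one interval.

    Runs until stop_event is set; meant to run in a background thread that shares
    the main thread's driver object.
    """
    last_url = driver.current_url
    # Event.wait returns True as soon as stop_event is set, False on timeout.
    while not stop_event.wait(interval):
        current_url = driver.current_url
        if current_url == last_url:
            on_stuck(current_url)
        last_url = current_url


def start_watchdog(driver, interval, on_stuck):
    """Start watch_url in a daemon thread; set the returned event to stop it."""
    stop_event = threading.Event()
    thread = threading.Thread(
        target=watch_url, args=(driver, interval, on_stuck, stop_event), daemon=True
    )
    thread.start()
    return stop_event, thread
```

In `test_main`, `stop, thread = start_watchdog(driver, 4, handler)` would replace the `Process` lines, followed by `stop.set()` and `thread.join()` at the end. The `handler` could call `driver.get(...)` to do what is currently done by hand, though whether that unblocks a stuck Selenium call depends on where it is stuck.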

For loop not looping in Python

I have a problem. I am writing a bot in Selenium. I'm working with 3 arrays, but the for loop is not working: all elements of each array are written into a single input. You can see the code below.
How can I fix it?
Best regards.
bot.py:
from selenium import webdriver
import time
import list
browser = webdriver.Firefox()
browser.get("")
time.sleep(4)
sayi=1
for posta, isim, kadi in zip(instagramList.email,instagramList.fullName,instagramList.userName):
    browser.find_element_by_css_selector("input[name='emailOrPhone']").send_keys(posta)
    browser.find_element_by_css_selector("input[name='fullName']").send_keys(isim)
    browser.find_element_by_css_selector("input[name='username']").send_keys(kadi)
    browser.find_element_by_css_selector("input[name='password']").send_keys("+1gP5xc!")
    time.sleep(2)
    browser.find_element_by_partial_link_text('Sign').click()
    print(str(sayi)+". "+ "Kayit olustu.")  # Turkish: "Registration created."
    sayi = sayi+1
list.py
email=["abidinkandemir@evtsoft.com", "asd@asd.com", "sdsd@asd.com"],
fullName=["abidin kandemir","asd asd", "asdd asd"],
userName=["abidinkandemir102","asdas", "asdd"]
For example, rename the file with the lists to "instagramList.py" (bot.py uses the name instagramList, but imports a module called list).
In "bot.py", you then need to import that file with: import instagramList.
After these steps, your script should work for one iteration (provided your code is indented correctly):
bot.py:
from selenium import webdriver
import time
import instagramList
browser = webdriver.Firefox()
browser.get("your_url")
time.sleep(4)
sayi = 1
for posta, isim, kadi in zip(instagramList.email, instagramList.fullName, instagramList.userName):
    browser.find_element_by_css_selector("input[name='emailOrPhone']").send_keys(posta)
    browser.find_element_by_css_selector("input[name='fullName']").send_keys(isim)
    browser.find_element_by_css_selector("input[name='username']").send_keys(kadi)
    browser.find_element_by_css_selector("input[name='password']").send_keys("+1gP5xc!")
    time.sleep(2)
    browser.find_element_by_partial_link_text("Sign").click()
    print(str(sayi) + ". Kayit olustu.")
    sayi = sayi + 1
instagramList.py
email = ["abidinkandemir@evtsoft.com", "asd@asd.com", "sdsd@asd.com"]
fullName = ["abidin kandemir", "asd asd", "asdd asd"]
userName = ["abidinkandemir102", "asdas", "asdd"]
PS: Even after this, your code will not work as you expect. Why? After each login you need to perform a "logout" action and open the login page again with browser.get("your_url").
Hope it helps you!
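The symptom in the question (all elements written into one input) is consistent with the trailing commas in `list.py`: `email=[...],` assigns a one-element tuple containing the list, not the list itself, so `zip()` produces a single iteration whose values are the whole lists, and `send_keys` then appears to type every item of that list into the field. A small demonstration with placeholder addresses rather than the question's data:

```python
# A trailing comma after a list literal wraps the list in a one-element tuple.
emails_with_comma = ["a@example.com", "b@example.com", "c@example.com"],
names_with_comma = ["A", "B", "C"],

emails = ["a@example.com", "b@example.com", "c@example.com"]
names = ["A", "B", "C"]

print(type(emails_with_comma).__name__)                     # tuple
print(len(list(zip(emails_with_comma, names_with_comma))))  # 1 pass, values are whole lists
print(len(list(zip(emails, names))))                        # 3 passes, one value each
```

Removing the commas after the `email` and `fullName` lines (the `userName` line has none) gives one iteration per account.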

How to split Selenium Python code into multiple functions

I'm writing a Python program that will test some functions of a website. It will log in to the site, check its version, and run some tests depending on the site version. I want to write a few tests for this site, but some steps will repeat, for example logging in to the site.
I tried to split my code into functions, like hue_login(), and use it in every test that needs to log in. To log in to the site I use Selenium WebDriver. So when I split the code into small functions and call one from another function that also uses Selenium WebDriver, I end up with two browser windows: one from my hue_login() function, which logs me in, and a second one where the other function tries to open the URL I want to visit after logging in. Of course, because I am not logged in in the second browser window, the site won't show and the other tests (the ones in this second function) fail.
Example:
import re
import urllib.request
from time import sleep
from lxml import etree
from selenium import webdriver
# global_var, username and password are defined elsewhere in the asker's project
def hue_version():
    url = global_var.domain + global_var.about
    response = urllib.request.urlopen(url)
    htmlparser = etree.HTMLParser()
    xpath = etree.parse(response, htmlparser).xpath('/html/body/div[4]/div/div/h2/text()')
    string = "".join(xpath)
    pattern = re.compile(r'(\d{1,2})\.(\d{1,2})\.(\d{1,2})')
    return pattern.search(string).group()
hue_ver = hue_version()
print(hue_ver)
if hue_ver == '3.9.0':
    pass  # do something
elif hue_ver == '3.7.0':
    pass  # do something else
else:
    print("Hue version not recognized!")
def hue_login():
    driver = webdriver.Chrome(global_var.chromeDriverPath)
    driver.get(global_var.domain + global_var.loginPath)
    input_username = driver.find_element_by_name('username')
    input_password = driver.find_element_by_name('password')
    input_username.send_keys(username)
    input_password.send_keys(password)
    input_password.submit()
    sleep(1)
    driver.find_element_by_id('jHueTourModalClose').click()
def file_browser():
    hue_login()
    click_file_browser_link = global_var.domain + global_var.fileBrowserLink
    driver = webdriver.Chrome(global_var.chromeDriverPath)
    driver.get(click_file_browser_link)
How can I call hue_login() from file_browser() so that the rest of the code in file_browser() runs in the same window opened by hue_login()?
Here you go: create a single driver at module level and use it in both functions:
driver = webdriver.Chrome(global_var.chromeDriverPath)
def hue_login():
    driver.get(global_var.domain + global_var.loginPath)
    input_username = driver.find_element_by_name('username')
    input_password = driver.find_element_by_name('password')
    input_username.send_keys(username)
    input_password.send_keys(password)
    input_password.submit()
    sleep(1)
    driver.find_element_by_id('jHueTourModalClose').click()
def file_browser():
    hue_login()
    click_file_browser_link = global_var.domain + global_var.fileBrowserLink
    driver.get(click_file_browser_link)
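The answer makes `driver` a module-level global. A common alternative is to create the driver once and pass it into each helper, so every test step uses the same window and the caller decides when to quit it. A minimal sketch under that assumption; the parameter names are illustrative, `base_url` and the path arguments stand in for the asker's `global_var` settings, and the string locators "name" and "id" are the values of Selenium's `By.NAME` and `By.ID`:

```python
from time import sleep


def hue_login(driver, base_url, login_path, username, password):
    """Log in with the given driver so later steps reuse the same window."""
    driver.get(base_url + login_path)
    driver.find_element("name", "username").send_keys(username)
    password_input = driver.find_element("name", "password")
    password_input.send_keys(password)
    password_input.submit()
    sleep(1)  # same fixed pause as the question; an explicit wait would be sturdier
    driver.find_element("id", "jHueTourModalClose").click()


def file_browser(driver, base_url, file_browser_path):
    """Open the file browser in the already logged-in window."""
    driver.get(base_url + file_browser_path)
```

A test would create one driver, call `hue_login(driver, ...)` and then `file_browser(driver, ...)`, and call `driver.quit()` in a `finally` block.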

How to extract the real time from time.gov in Python?

I want to show the real time from time.gov in my program. I saw the ntplib module and this example:
import ntplib
from time import ctime
c = ntplib.NTPClient()
response = c.request('europe.pool.ntp.org', version=3)
ctime(response.tx_time)
but I can't use time.gov instead of 'europe.pool.ntp.org' because time.gov is not an NTP server. I also saw some JavaScript code in the page source. Is there a way to extract the real time from time.gov in Python, with or without ntplib?
Assuming the goal is just to get official US government time, you could stick with using NTP, and refer to time.nist.gov, instead of time.gov. They're both run by NIST.
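With ntplib this is just the question's example with `'time.nist.gov'` as the host. For a version without the extra dependency, here is a minimal SNTP sketch using only the standard library (a swapped-in technique, not part of the answer): it sends a 48-byte client request to UDP port 123 and reads the server's transmit timestamp from bytes 40-47 of the reply, which NTP counts from 1900 rather than 1970. The function names are illustrative, and there is no retry or round-trip delay correction:

```python
import socket
import struct

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_TO_UNIX = 2208988800


def parse_transmit_time(packet: bytes) -> float:
    """Return the server transmit timestamp of an NTP reply as Unix time."""
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_TO_UNIX + fraction / 2**32


def query_ntp(host: str = "time.nist.gov", timeout: float = 5.0) -> float:
    """Send one SNTP client request and return the server's time as Unix time."""
    request = b"\x1b" + 47 * b"\0"  # 0x1b = leap indicator 0, version 3, mode 3 (client)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(request, (host, 123))
        packet, _ = sock.recvfrom(48)
    if len(packet) < 48:
        raise ValueError("short NTP reply")
    return parse_transmit_time(packet)
```

`time.ctime(query_ntp())` then prints the server time in local time, like the ntplib example.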
Use urllib to retrieve
http://time.gov/actualtime.cgi
that returns something like this:
<timestamp time="1433396367767836" delay="0"/>
That looks like microseconds since the Unix epoch:
>>> time.ctime(1433396367.767836)
'Thu Jun 4 15:39:27 2015'
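A minimal sketch of that approach, assuming the endpoint still returns the `<timestamp time="..." .../>` element shown above (it may have changed since this answer was written); the function names are illustrative. Converting from integer microseconds with `timedelta` avoids float rounding and yields a timezone-aware UTC value instead of `ctime`'s local time:

```python
import urllib.request
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_timestamp(xml_text: str) -> datetime:
    """Convert '<timestamp time="..."/>' (microseconds since the epoch) to UTC."""
    micros = int(ET.fromstring(xml_text).attrib["time"])
    return EPOCH + timedelta(microseconds=micros)


def fetch_time_gov(url: str = "http://time.gov/actualtime.cgi", timeout: float = 10) -> datetime:
    """Download the timestamp element and parse it."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_timestamp(response.read().decode("utf-8"))
```

For the sample response above, `parse_timestamp` gives 2015-06-04 05:39:27.767836 UTC; the `ctime` output shown is the same instant in the answerer's local time zone.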
Our institute's firewall blocked the NTP time server, so an alternative in this case is to scrape the time from the website.
You will need ChromeDriver to run this; it can be downloaded from https://chromedriver.chromium.org/.
Here is the working code:
from selenium import webdriver
import time
from datetime import datetime
# Chromedriver can be downloaded from https://chromedriver.chromium.org/
driver_path = r'pathtochromedriver\chromedriver.exe'
driver = webdriver.Chrome(driver_path)
# Reference website for datetime
url = 'https://www.time.gov/'
driver.get(url)
# Wait to respond
time.sleep(4)
# Correct time
timedata = driver.find_element_by_xpath('//*[@id="timeUTC"]')
# Correct date
datedata = driver.find_element_by_xpath('//*[@id="myDate"]')
result_time = timedata.text
result_date = datedata.text
driver.close() # close the webpage
result_datetime = result_date[7:]+result_time
datetime_now = datetime.strptime(result_datetime, '%m/%d/%Y%H:%M:%S')
print(datetime_now)
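The time is read from an element with id `timeUTC`, so the result is UTC, but `strptime` returns a naive datetime. A small helper that attaches the time zone, keeping the answer's `[7:]` slice and its `%m/%d/%Y` and `%H:%M:%S` formats; the answer does not show what the 7-character prefix is, so `label_len` is an assumption to check against the page, and the helper name and the space separator are illustrative:

```python
from datetime import datetime, timezone


def combine_time_gov_text(date_text: str, time_text: str, label_len: int = 7) -> datetime:
    """Join the scraped date and UTC time strings into an aware UTC datetime.

    label_len mirrors the answer's result_date[7:] slice; the real label on
    time.gov may differ.
    """
    stamp = date_text[label_len:].strip() + " " + time_text.strip()
    parsed = datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S")
    return parsed.replace(tzinfo=timezone.utc)
```

With the answer's variables this would be `combine_time_gov_text(result_date, result_time)`.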
