How to use Selenium with the Wappalyzer browser plugin in Python

I want to make a Python CLI for Wappalyzer (https://www.wappalyzer.com), but it is a browser plugin. The plugin identifies the programs/frameworks running on a webpage, and I want to get that information from a Python script. While they do have a paid API, I was wondering if it is possible to use Selenium and ChromeDriver to visit a page with the Chrome extension loaded, and then retrieve the data generated by Wappalyzer.
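A minimal sketch of the loading part, assuming you have downloaded the Wappalyzer .crx file locally (the path below is hypothetical) and that chromedriver is on your PATH:

from selenium import webdriver

options = webdriver.ChromeOptions()
# Hypothetical path to a locally downloaded copy of the extension
options.add_extension("/path/to/wappalyzer.crx")

driver = webdriver.Chrome(options=options)
driver.get("https://example.com")
# Wappalyzer shows its results in the extension's own UI, so reading them
# back usually means inspecting the extension's pages or storage rather
# than the target page's DOM.
driver.quit()

Note that extensions historically do not load in Chrome's classic headless mode, so this generally needs a headed session.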

Related

Keep changes made to Firefox extensions through Selenium

My goal is to use Selenium to configure an extension in Firefox programmatically.
What I did is the following:
1. Load my default Firefox profile, so that my Selenium session has access to my extensions.
2. Use Selenium to navigate to the extension's configuration page (moz-extension://*********************/options/).
3. Use Selenium to automate the configuration process and save.
My only issue is that my changes are lost. That is, upon starting Firefox again (like a "normal" user), none of the changes I made are kept.
Is there a way to ensure that Selenium saves the changes to my profile?
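One likely cause: when you hand Selenium a profile, geckodriver copies it into a temporary directory and runs Firefox against the copy, so any changes are discarded when the session ends. A commonly suggested workaround is to pass the real profile directory to Firefox itself via the -profile argument so it is used in place. A minimal sketch, assuming Selenium 4 and a hypothetical profile path:

from selenium import webdriver
from selenium.webdriver.firefox.options import Options

options = Options()
# Hypothetical path to the real profile; Firefox modifies it in place
options.add_argument("-profile")
options.add_argument("/home/user/.mozilla/firefox/xxxx.default-release")

driver = webdriver.Firefox(options=options)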

Load local chrome user profile to Heroku to use it with selenium

I want to build a simple web scraper using Python Selenium and deploy it on Heroku. I've already done the deployment process with the chromedriver and chrome buildpacks, and everything is working fine. But I still need to implement one thing: I want to use my local Chrome profile so that I don't have to sign in to Google. This works fine locally by using
options = webdriver.ChromeOptions()
options.add_argument(r"user-data-dir=C:\Users\user\AppData\Local\Google\Chrome\User Data")
To access my Chrome profile on Heroku, I just uploaded the whole folder into the same directory as the code. After the deployment I can access the folder under /app/User Data and can see all the files. However, if I pass
options.add_argument("user-data-dir=/app/User Data")
to the driver, it doesn't load the profile and the login process fails. I tried to get more information by printing the source of chrome://version, but that's just an empty page.
Do you have any suggestions for what I can try instead to get it working? Thank you!
If you just need to log in, you can use https://pypi.org/project/selenium-stealth/. It is still Selenium, but it doesn't get detected when signing in.
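A minimal sketch of that suggestion, based on the usage shown in the package's README (the parameter values below are illustrative, not requirements):

from selenium import webdriver
from selenium_stealth import stealth

options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
driver = webdriver.Chrome(options=options)

# Patch the browser fingerprint before navigating anywhere
stealth(driver,
        languages=["en-US", "en"],
        vendor="Google Inc.",
        platform="Win32",
        webgl_vendor="Intel Inc.",
        renderer="Intel Iris OpenGL Engine",
        fix_hairline=True,
        )

driver.get("https://accounts.google.com")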

Headless Browser in Azure Notebook

I want to scrape a website whose content is generated dynamically by JavaScript. The scraping is executed on Microsoft Azure Notebooks so that I can continue processing the data with Python and Jupyter.
Therefore, a headless browser is needed to render the content during scraping. I'm considering PhantomJS or CasperJS, but they require installation with root permission, which I don't have.
What other options can I use in Azure Notebooks for dynamically generated content?
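One pip-installable option that needs no root permission is pyppeteer, which downloads its own Chromium build into the user's home directory. A minimal sketch, assuming the package can be installed in the notebook environment:

import asyncio
from pyppeteer import launch

async def render(url):
    # --no-sandbox is often required in restricted/container environments
    browser = await launch(headless=True, args=["--no-sandbox"])
    page = await browser.newPage()
    await page.goto(url)
    html = await page.content()  # HTML after JavaScript has executed
    await browser.close()
    return html

# Inside a Jupyter cell an event loop is already running, so await directly:
html = await render("https://example.com")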

How do I access Chromium extension API from Selenium Webdriver?

Is it possible to make a Chromium extension that would expose an API to Python Selenium WebDriver code? For example, I can make an extension with a background script that counts tabs with chrome.tabs.query; now I'd like to access that information from my Python code using Selenium WebDriver.
I've been able to do this by querying the background script from a content script with chrome.extension.sendMessage, saving the data to window.localStorage, and fetching it into Python with Command.GET_LOCAL_STORAGE_ITEM (or execute_script, it doesn't matter), but is there a simpler way?
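For reference, the Python side of that hand-off can be as small as this. A minimal sketch, assuming the extension's content script has already written the tab count to window.localStorage under a hypothetical key "tabCount":

from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_extension("/path/to/extension.crx")  # hypothetical path
driver = webdriver.Chrome(options=options)
driver.get("https://example.com")

# Read back whatever the content script stored on this page's origin
tab_count = driver.execute_script(
    "return window.localStorage.getItem('tabCount');"
)
print(tab_count)
driver.quit()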

How do websites identify that my web browser doesn't have a plugin installed?

I am trying to scrape things out of a website. Whenever I access the website using curl or python-requests, it keeps telling me that I need to install a plugin to continue. But when I use a web browser which has the plugin installed, everything works fine.
I want to understand how a website identifies whether a web browser has the plugin installed or not.
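The usual mechanism is JavaScript: the page runs a script that inspects the browser (for example the navigator.plugins list, or a marker the plugin injects into the DOM), and curl and python-requests never execute that script. A minimal sketch showing what a real browser exposes, using Selenium:

from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://example.com")

# Plugin names visible to any page's JavaScript; curl/requests have no
# equivalent, which is one way a site tells the two apart
plugins = driver.execute_script(
    "return Array.from(navigator.plugins).map(p => p.name);"
)
print(plugins)
driver.quit()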
