I'm writing desktop automation for VSCode. VSCode generates an HTML report, and the VSCode UI provides an option to open the report, which can then be viewed in the browser.
So, at this point, I don't need to navigate to a web page, and I am not using Selenium since I'm not dealing with web applications. How do I get the current URL from the browser using Python? The current URL would essentially give me the location of the HTML report on my local machine.
I've read the documentation, and I'm able to locate the HTML report file. Its path is something like <dir/some-random-number/index.html>. Every report is generated in its own folder, which makes it challenging to get the location of the HTML report. I need a way to get the current URL through Python so that my automation can read some elements from that HTML report using Beautiful Soup.
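A minimal sketch of one way to sidestep reading the browser URL, assuming the reports all live under one known parent directory (the directory path below is a placeholder) and that the most recently modified index.html is the report just generated:

import glob
import os
from bs4 import BeautifulSoup

# Placeholder: the parent directory that holds one sub-folder per generated report.
reports_dir = os.path.expanduser("~/vscode-reports")

# Find every index.html one level down and pick the most recently modified one,
# which should be the report that was just generated.
candidates = glob.glob(os.path.join(reports_dir, "*", "index.html"))
latest_report = max(candidates, key=os.path.getmtime)  # raises ValueError if no report exists yet

with open(latest_report, encoding="utf-8") as fh:
    soup = BeautifulSoup(fh, "html.parser")

print(latest_report)
print(soup.title.string if soup.title else "no <title> found")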
What I am trying to do...
I am trying to automate the download of a zip file from a URL that does not redirect, but instead opens a "Save as" prompt the moment you open it.
What I have tried...
"Urllib request", "Wget", and "Requests" libraries are all giving me a 1KB file which in a text editor reads "Invalid request". This could make sense as the Website URL I am inputting is blank by default, and I don't believe its redirecting the URL to anywhere as I had "allow_redirects=True" using the "Requests" Library. I believe this link is using JavaScript to redirect to the "Save as" and when I click it and head to downloads (In Chrome) and see that there is a download link for this file. This download link appears to always work but I am unsure how to grab it with Python.
Leads...
I have found a lead on Stack Overflow about using the library "Spynner", but I am not sure how and why that would solve my problem.
I am using Python 3.8.2
You need a web scraping tool. They usually come with headless browsers and everything you need to imitate ("bot-like") human behaviour. I would recommend Selenium because you can use it from Python directly; here is an example: File managing Selenium.
Be careful: web scraping is not always legal, so you should have authorization to use it on any web service. Proceed with caution.
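A minimal sketch of that approach with Selenium and headless Chrome (the URL, download folder, and wait time below are placeholders, and the exact headless/download behaviour can vary between Chrome versions):

import os
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

download_dir = os.path.abspath("downloads")  # placeholder target folder
os.makedirs(download_dir, exist_ok=True)

options = Options()
options.add_argument("--headless=new")  # recent Chrome; older versions use --headless
# Ask Chrome to save files into download_dir instead of showing a "Save as" prompt.
options.add_experimental_option("prefs", {
    "download.default_directory": download_dir,
    "download.prompt_for_download": False,
})

driver = webdriver.Chrome(options=options)
try:
    driver.get("https://example.com/download-page")  # placeholder URL that triggers the download
    time.sleep(10)  # crude wait; a real script should poll until the file appears in download_dir
finally:
    driver.quit()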
Like Juan said, I just needed to use a web scraping tool. After learning Selenium I was able to bypass the "Save as" requirement.
I have HTML code embedded with JavaScript related to AngularJS. Later I realized that the rows and columns in the HTML need to be interchanged. As I have a bunch of HTML files, I decided to use a Python script and tried BeautifulSoup 4.x. I was able to interchange the rows and columns, but when writing back to disk I noticed that a few JavaScript tags were missing.
My question is: can I use Beautiful Soup for AngularJS code? If yes, a code snippet would be extremely helpful.
Thanks
Beautiful Soup is a Python library for pulling data out of HTML and XML files. You can't directly use it for AngularJS code.
See this previous answer for a quick look at what some code using Selenium to get at the JavaScript might look like.
https://stackoverflow.com/a/25985828/4147462
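A minimal sketch of that idea: render the page with Selenium first, then hand the resulting HTML to Beautiful Soup (the file name below is a placeholder, and a matching WebDriver such as geckodriver or chromedriver is assumed to be installed):

import pathlib
from bs4 import BeautifulSoup
from selenium import webdriver

# Placeholder: a local AngularJS-driven HTML file.
page_url = pathlib.Path("report.html").resolve().as_uri()

driver = webdriver.Firefox()  # or webdriver.Chrome()
try:
    driver.get(page_url)
    # page_source now contains the DOM after the JavaScript has run.
    soup = BeautifulSoup(driver.page_source, "html.parser")
finally:
    driver.quit()

print(len(soup.find_all("table")), "tables found after rendering")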
Is there a way to scrape data from a popup? I'd like to import data from the site tennisinsight.com.
For example, http://tennisinsight.com/match-preview/?matchid=191551201
This is a sample data extraction link. When clicking "Overview" there is a button labelled "Match Stats"; I'd like to be able to import that data for many links stored in a text or CSV file.
What's the best way to accomplish this? Is Scrapy able to do this? Is there software able to do this?
You want to open the network analyzer in your browser (e.g. Web Developer in Firefox) to see what requests are sent when you click the "Match Stats" button, in order to replicate them using Python.
When I do it, a POST request is sent to http://tennisinsight.com/wp-admin/admin-ajax.php with action and matchID parameters.
You presumably already know the match ID (see URL you posted above), so you just need to set up a POST request for each matchID you have.
import requests
r = requests.post('http://tennisinsight.com/wp-admin/admin-ajax.php', data={'action': 'showMatchStats', 'matchID': '191551201'})
print(r.text)  # this is your content of interest
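Since the question mentions feeding many match IDs from a text or CSV file, a hedged extension of the same request might look like this (the input and output file names are placeholders; the 'showMatchStats' action comes from the request observed above):

import requests

endpoint = 'http://tennisinsight.com/wp-admin/admin-ajax.php'

# Placeholder input file: one match ID per line.
with open('match_ids.txt') as fh:
    match_ids = [line.strip() for line in fh if line.strip()]

for match_id in match_ids:
    r = requests.post(endpoint, data={'action': 'showMatchStats', 'matchID': match_id})
    # Save each response for later parsing (e.g. with Beautiful Soup).
    with open('match_{}.html'.format(match_id), 'w', encoding='utf-8') as out:
        out.write(r.text)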
I know one can generate HTML from CSV, but how do I turn that HTML into an image using Python?
You can use the python-webkit2png project to convert HTML to an image using the WebKit engine (the engine Chrome was originally based on).
One method of doing this is to:
Generate an HTML file
Open this file in a web browser controlled by Python using Selenium WebDriver. On the server side you can use a headless browser installation; both Firefox and Chrome should work.
Call WebDriver's screenshot function to capture the rendered output as an image
If the page is larger than the (virtual) screen used by the browser, Firefox has add-ons that can capture the whole web page as an image.
Here is one of my old scripts where I was capturing JavaScript-generated pages as images on the server.
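A minimal sketch of the WebDriver screenshot approach, assuming headless Chrome and a matching chromedriver are available (the file names are placeholders):

import pathlib
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless=new")  # recent Chrome; older versions use --headless
options.add_argument("--window-size=1280,1024")

driver = webdriver.Chrome(options=options)
try:
    # Placeholder: the HTML file generated from the CSV.
    driver.get(pathlib.Path("table.html").resolve().as_uri())
    driver.save_screenshot("table.png")  # writes the rendered output as a PNG
finally:
    driver.quit()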
I am currently doing a research project and I am trying to figure out a good way to identify ads, given access to the HTML of a webpage.
I thought it might be a good idea to start with AdBlock. AdBlock is a program that prevents ads from being displayed to the user, so presumably it has a mechanism for identifying things as ads.
I downloaded the source code for Adblock Plus, but I find myself completely lost in all of the files. I am not sure where to start looking for this detection mechanism, so I was wondering if anyone had any advice on where to start. Alternatively, if you have dealt with AdBlock before and are familiar with it, I would appreciate any extra information.
For example, if the webpage needs to be rendered in a real browser for AdBlock to work, there are programs that will automate loading a webpage, so that wouldn't be a problem; but I am not sure how to figure out whether that is what AdBlock does in the first place.
Note: AdBlock is written in Python and Perl :)
Thanks!
I would advise you to first have a look at writing Adblock filter rules.
Then, once you get an idea of how they work, you can start parsing the Adblock lists available in various languages to suit your needs.
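As a hedged illustration of that second step, here is a deliberately simplified sketch that checks URLs against a couple of domain-anchor rules; real filter lists such as EasyList have a much richer syntax, and the rules and URLs below are made up:

# Deliberately simplified: only handles "||domain^" domain-anchor rules.
from urllib.parse import urlparse

raw_rules = [
    "||ads.example.com^",      # made-up rule in EasyList-like syntax
    "||tracker.example.net^",
]

blocked_domains = {rule.strip("|^") for rule in raw_rules if rule.startswith("||")}

def looks_like_ad(url):
    """Return True if the URL's host matches one of the blocked domains."""
    host = urlparse(url).netloc
    return any(host == d or host.endswith("." + d) for d in blocked_domains)

print(looks_like_ad("http://ads.example.com/banner.png"))   # True
print(looks_like_ad("http://www.example.org/index.html"))   # False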