I need to open a website in the background and, after a few seconds of loading, download everything on the page.
I can open the page,
import webbrowser
url = 'www.face.com'
webbrowser.open(url)  # opens the page in the default browser
But this opens the web browser, while I need the site to load without being visibly opened. I think I can use wget to download the page.
If you need to fool the webpage into thinking it is accessed by a human, use Selenium. But keep in mind that you can download content using a variety of HTTP clients (in which case you won't get any dynamically loaded content).
Python has a built-in HTTP client, but I recommend using requests.
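For instance, a minimal sketch of fetching a page with requests might look like the following (the URL is the one from the question, the output filename is a placeholder); note that requests will not execute JavaScript, so dynamically loaded content will be missing:
import requests
# Fetch the raw HTML of a page without opening a browser window.
response = requests.get("http://www.face.com", timeout=10)
response.raise_for_status()  # raise an error for non-2xx responses
with open("page.html", "w", encoding="utf-8") as f:
    f.write(response.text)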
You can use Selenium.
from selenium import webdriver
driver = webdriver.PhantomJS("./phantomjs")  # path to the phantomjs binary
driver.get("http://www.face.com")            # note: a full URL including the scheme is required
html = driver.page_source                    # the rendered page's HTML
## refer https://pypi.python.org/pypi/selenium
driver.quit() # quit driver
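Note that PhantomJS has since been deprecated and newer Selenium releases no longer support it; a rough sketch of the same idea with headless Firefox (assuming Firefox and geckodriver are installed) could look like this:
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
options = Options()
options.add_argument("--headless")           # render without a visible window
driver = webdriver.Firefox(options=options)
driver.get("http://www.face.com")            # full URL including the scheme
html = driver.page_source                    # HTML after JavaScript has run
driver.quit()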
You can use a CGI script; for that you would need to import cgi.
You may want to use ghost.py. It is a WebKit web client written in Python (PyQt needed).
I'm writing desktop automation for VSCode. VSCode generates an HTML report, and the VSCode UI provides an option to open the report, which can then be viewed in the browser.
So, at this point, I don't need to navigate to a web page, and I am not using Selenium since I'm not dealing with web applications. How do I get the current URL from the browser using Python? The current URL would essentially give me the location of the HTML report on my local machine.
I've read the documentation, and I'm able to locate the HTML report file. Its path is something like <dir/some-random-number/index.html>. Every report is generated in its own folder, which makes it challenging for me to get the location of the HTML report. I need a way to get the current URL through Python so that my automation can read some elements from that HTML report using Beautiful Soup.
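If the reports all live under a known parent directory, one possible approach (sketched below with a placeholder reports_dir path) is to skip the browser entirely, pick the most recently generated index.html, and hand it straight to Beautiful Soup:
from pathlib import Path
from bs4 import BeautifulSoup
reports_dir = Path("/path/to/reports")    # placeholder: parent folder of the generated report folders
candidates = reports_dir.glob("*/index.html")
latest = max(candidates, key=lambda p: p.stat().st_mtime)  # most recently generated report
soup = BeautifulSoup(latest.read_text(encoding="utf-8"), "html.parser")
print(soup.title.string if soup.title else latest)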
What I am trying to do...
I am trying to automate the download of a zip file from a URL that does not redirect, but instead opens up a "Save as" prompt the moment you open it.
What I have tried...
"Urllib request", "Wget", and "Requests" libraries are all giving me a 1KB file which in a text editor reads "Invalid request". This could make sense as the Website URL I am inputting is blank by default, and I don't believe its redirecting the URL to anywhere as I had "allow_redirects=True" using the "Requests" Library. I believe this link is using JavaScript to redirect to the "Save as" and when I click it and head to downloads (In Chrome) and see that there is a download link for this file. This download link appears to always work but I am unsure how to grab it with Python.
Leads...
I have found a lead on Stack Overflow about using the "Spynner" library, but I am not sure how or why that would solve my problem.
I am using Python 3.8.2
You need a web scraping tool. They usually include headless browsers and everything you need to automate human-like behaviour. I would recommend Selenium because you can use it from Python directly; here is an example: File managing Selenium.
Be careful: web scraping is not always legal, so you should have authorization to use it on any web service. Proceed with caution.
Like Juan said, I just needed to use a web scraping tool. After learning Selenium I was able to bypass the "Save as" requirement.
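For reference, a minimal sketch of how that can look with Selenium and Chrome, assuming chromedriver is available; the URL and download directory below are placeholders, and the Chrome preferences tell the browser where to save files without showing the "Save as" dialog:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
download_dir = "/tmp/downloads"                      # placeholder target directory
options = Options()
options.add_experimental_option("prefs", {
    "download.default_directory": download_dir,      # save files here
    "download.prompt_for_download": False,           # do not show the "Save as" dialog
})
driver = webdriver.Chrome(options=options)
driver.get("https://example.com/report.zip")         # placeholder URL that triggers the download
# ...wait for the file to appear in download_dir before processing it...
driver.quit()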
What code would I use to input text into a box on a webpage (not an edit-text field; something along the lines of a message bot)?
Say I run the code: it opens a webpage and inputs text into, for example, a search box.
Thanks!
You can use the Selenium library to create a bot in Python that inputs text into a webpage.
According to their description, "the selenium package is used to automate web browser interaction from Python."
You can install it by typing in the terminal:
pip install selenium
Then you'll have to build a set of instructions to tell the program what you want to do.
Example:
TASKS:
1. open a new Firefox browser
2. load the Yahoo homepage
3. search for “seleniumhq”
4. close the browser
CODE:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
browser = webdriver.Firefox()               # 1. open a new Firefox browser
browser.get('http://www.yahoo.com')         # 2. load the Yahoo homepage
assert 'Yahoo' in browser.title
elem = browser.find_element_by_name('p')    # 3. find the search box...
elem.send_keys('seleniumhq' + Keys.RETURN)  # ...and search for "seleniumhq"
browser.quit()                              # 4. close the browser
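Note that newer Selenium releases (4.x) have deprecated and later removed the find_element_by_* helpers; the equivalent example using the By locator looks like this:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
browser = webdriver.Firefox()
browser.get('http://www.yahoo.com')
elem = browser.find_element(By.NAME, 'p')   # same search box, Selenium 4 locator syntax
elem.send_keys('seleniumhq' + Keys.RETURN)
browser.quit()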
Also, read the Selenium docs and take a look at a few videos to understand how you can build your own web bot!
I know one can generate HTML from CSV, but how do I turn that HTML into an image using Python?
You can use the python-webkit2png project to convert HTML code to an image using the WebKit engine (the engine Chrome's renderer was originally based on).
One method of doing this is to:
1. Generate an HTML file.
2. Open this file in a web browser controlled by Python using Selenium WebDriver. For the server side you can use a headless browser installation; both Firefox and Chrome should work well.
3. Call the WebDriver screenshot function to capture the rendered output as an image (a minimal sketch follows below).
If the image is larger than the (virtual) screen used by the browser, Firefox has some add-ons to capture the whole web page as an image.
Here is one of my old scripts where I was capturing JavaScript-generated pages to images on the server.
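As a concrete illustration of steps 1-3 (separate from the script linked above), a minimal sketch assuming headless Firefox with geckodriver on the PATH; report.html and output.png are placeholder names:
from pathlib import Path
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
html_file = Path("report.html").resolve()   # placeholder: the generated HTML file
options = Options()
options.add_argument("--headless")          # render without a visible window
driver = webdriver.Firefox(options=options)
driver.get(html_file.as_uri())              # open the local file via a file:// URL
driver.save_screenshot("output.png")        # capture the rendered page as a PNG
driver.quit()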
I am attempting to retrieve information from a Health Inspection website, then parse and save the data to variables, and then maybe save the records to a file. I suppose I could use dictionaries to store the information from each business.
The website in question is: http://www.swordsolutions.com/Inspections.
Clicking [Search] on the website will start displaying information.
I need to be able to pass some search data to the website, and then parse the information that is returned into variables and then to files.
I am fetching the website to a file using:
import urllib  # Python 2; on Python 3 use urllib.request.urlopen instead
u = urllib.urlopen('http://www.swordsolutions.com/Inspections')
data = u.read()
f = open('data.html', 'wb')
f.write(data)
f.close()
This is the data that is retrieved by urllib: http://bpaste.net/show/126433/ and it currently does not show anything useful.
Any ideas?
I'll just point you to the options.
You want to submit a form with several pre-defined field values and then parse the data returned. The next steps depend on whether it is easy to automate that form POST request.
You have plenty of options here:
1. Using the browser developer tools, analyze what is going on when you click "submit". Then, if it is a simple POST request, simulate it using urllib2, requests, mechanize, or whatever you like (a sketch of this approach follows below).
2. Give Scrapy and its FormRequest class a try.
3. Use a real automated browser with the help of Selenium: fill the data into the fields, click submit, then get and parse the data using that same tool (Selenium).
Basically, if there is a lot of JavaScript logic involved in the form submission process, you'll have to go with an automated browsing tool like Selenium.
Plus, note that there are several tools for parsing HTML: BeautifulSoup, lxml.
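For example, a rough sketch of the first option with requests and BeautifulSoup, assuming the developer tools show a plain POST; the field name and the result-table selector below are hypothetical and would need to be replaced with what the network tab actually shows:
import requests
from bs4 import BeautifulSoup
# Placeholder endpoint and form fields -- take the real names and values from
# the browser developer tools while submitting the search form.
url = "http://www.swordsolutions.com/Inspections"
payload = {"searchField": "some business name"}       # hypothetical field name and value
response = requests.post(url, data=payload, timeout=10)
response.raise_for_status()
soup = BeautifulSoup(response.text, "html.parser")
for row in soup.select("table tr"):                    # hypothetical layout of the results table
    print([cell.get_text(strip=True) for cell in row.find_all("td")])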
Also see:
Web scraping with Python
Hope that helps.