I'm a python beginner and I want my code to open a local HTML page with parameters (e.g. /help/index.html#1). Right now I have the below code:
def openHelp(self, helpid):
import subprocess
import webbrowser
import sys, os
directory = os.getcwd()
if sys.platform == 'darwin': # For macOS
url = 'file://' + directory + '/help/index.html' + '#{}'.format(str(helpid))
webbrowser.get().open(url)
else:
url = 'help/index.html' + '#{}'.format(str(helpid))
webbrowser.open(url)
The code opens the webpage, however without the parameters (#helpid). Where did I make a mistake? Thanks in advance!
Your code looked fine to me. I tried it and it worked. You have to call it with openHelp("",1). The #helpid parameter was added correctly. Make sure it is a number.
Related
I am writing an script that will upload file from my local machine to an webpage. This is the url: https://tutorshelping.com/bulkask and there is a upload option. but i am trouble not getting how to upload it.
my current script:
import webbrowser, os
def fileUploader(dirname):
mydir = os.getcwd() + dirname
filelist = os.listdir(mydir)
for file in filelist:
filepath = mydir + file #This is the file absolte file path
print(filepath)
url = "https://tutorshelping.com/bulkask"
webbrowser.open_new(url) # open in default browser
webbrowser.get('firefox').open_new_tab(url)
if __name__ == '__main__':
dirname = '/testdir'
fileUploader(dirname)
A quick solution would be to use something like AppRobotic personal macro software to either interact with the Windows pop-ups and apps directly, or just use X,Y coordinates to move the mouse, click on buttons, and then to send keyboard keys to type or tab through your files.
Something like this would work when tweaked, so that it runs at the point when you're ready to click the upload button and browse for your file:
import win32com.client
x = win32com.client.Dispatch("AppRobotic.API")
import webbrowser
# specify URL
url = "https://www.google.com"
# open with default browser
webbrowser.open_new(url)
# wait a bit for page to open
x.Wait(3000)
# use UI Item Explorer to find the X,Y coordinates of Search box
x.MoveCursor(438, 435)
# click inside Search box
x.MouseLeftClick
x.Type("AppRobotic")
x.Type("{ENTER}")
I don't think the Python webbrowser package can do anything else than open a browser / tab with a specific url.
If I understand your question well, you want to open the page, set the file to upload and then simulate a button click. You can try pyppeteer for this.
Disclaimer: I have never used the Python version, only the JS version (puppeteer).
I copy some Python code in order to download data from a website. Here is my specific website:
https://www.codot.gov/business/bidding/bid-tab-archives/bid-tabs-2017-1
Here is the code which I copied:
import requests
from bs4 import BeautifulSoup
def _getUrls_(res):
hrefs = []
soup = BeautifulSoup(res.text, 'lxml')
main_content = soup.find('div',{'id' : 'content-core'})
table = main_content.find("table")
for a in table.findAll('a', href=True):
hrefs.append(a['href'])
return(hrefs)
bidurl = 'https://www.codot.gov/business/bidding/bid-tab-archives/bid-tabs-2017-1'
r = requests.get(bidurl)
hrefs = _getUrls_(r)
def _getPdfs_(hrefs, basedir):
for i in range(len(hrefs)):
print(hrefs[i])
respdf = requests.get(hrefs[i])
pdffile = basedir + "/pdf_dot/" + hrefs[i].split("/")[-1] + ".pdf"
try:
with open(pdffile, 'wb') as p:
p.write(respdf.content)
p.close()
except FileNotFoundError:
print("No PDF produced")
basedir= "/Users/ABC/Desktop"
_getPdfs_(hrefs, basedir)
The code runs successfully, but it did not download anything at all, even though there is no Filenotfounderror obviously.
I tried the following two URLs:
https://www.codot.gov/business/bidding/bid-tab-archives/bid-tabs-2017/aqc-088a-035-20360
https://www.codot.gov/business/bidding/bid-tab-archives/bid-tabs-2017/aqc-r100-258-21125
However both of these URLs return >>> No PDF produced.
The thing is that the code worked and downloaded successfully for other people, but not me.
Your code works I just tested. You need to make sure the basedir exists, you want to add this to your code:
if not os.path.exists(basedir):
os.makedirs(basedir)
I used this exact (indented) code but replaced the basedir with my own dir and it worked only after I made sure that the path actually exists. This code does not create the folder in case it does not exist.
As others have pointed out, you need to create basedir beforehand. The user running the script may not have the directory created. Make sure you insert this code at the beginning of the script, before the main logic.
Additionally, hardcoding the base directory might not be a good idea when transferring the script to different systems. It would be preferable to use the users %USERPROFILE% enviorment variable:
from os import envioron
basedir= join(environ["USERPROFILE"], "Desktop", "pdf_dot")
Which would be the same as C:\Users\blah\Desktop\pdf_dot.
However, the above enivorment variable only works for Windows. If you want it to work Linux, you will have to use os.environ["HOME"] instead.
If you need to transfer between both systems, then you can use os.name:
from os import name
from os import environ
# Windows
if name == 'nt':
basedir= join(environ["USERPROFILE"], "Desktop", "pdf_dot")
# Linux
elif name == 'posix':
basedir = join(environ["HOME"], "Desktop", "pdf_dot")
You don't need to specify the directory or create any folder manually. All you need do is run the following script. When the execution is done, you should get a folder named pdf_dot in your desktop containing the pdf files you wish to grab.
import requests
from bs4 import BeautifulSoup
import os
URL = 'https://www.codot.gov/business/bidding/bid-tab-archives/bid-tabs-2017-1'
dirf = os.environ['USERPROFILE'] + '\Desktop\pdf_dot'
if not os.path.exists(dirf):os.makedirs(dirf)
os.chdir(dirf)
res = requests.get(URL)
soup = BeautifulSoup(res.text, 'lxml')
pdflinks = [itemlink['href'] for itemlink in soup.find_all("a",{"data-linktype":"internal"}) if "reject" not in itemlink['href']]
for pdflink in pdflinks:
filename = f'{pdflink.split("/")[-1]}{".pdf"}'
with open(filename, 'wb') as f:
f.write(requests.get(pdflink).content)
I want to save all images from a site. wget is horrible, at least for http://www.leveldesigninspirationmachine.tumblr.com since in the image folder it just drops html files, and nothing as an extension.
I found a python script, the usage is like this:
[python] ImageDownloader.py URL MaxRecursionDepth DownloadLocationPath MinImageFileSize
Finally I got the script running after some BeautifulSoup problems.
However, I can't find the files anywhere. I also tried "/" as the output dir in hope the images got on the root of my HD but no luck. Can someone either help me to simplify the script so it outputs at the cd directory set in terminal. Or give me a command that should work. I have zero python experience and I don't really want to learn python for a 2 year old script that maybe doesn't even work the way I want.
Also, how can I pass an array of website? With a lot of scrapers it gives me the first few results of the page. Tumblr has the load on scroll but that has no effect so i would like to add /page1 etc.
thanks in advance
# imageDownloader.py
# Finds and downloads all images from any given URL recursively.
# FB - 201009094
import urllib2
from os.path import basename
import urlparse
#from BeautifulSoup import BeautifulSoup # for HTML parsing
import bs4
from bs4 import BeautifulSoup
global urlList
urlList = []
# recursively download images starting from the root URL
def downloadImages(url, level, minFileSize): # the root URL is level 0
# do not go to other websites
global website
netloc = urlparse.urlsplit(url).netloc.split('.')
if netloc[-2] + netloc[-1] != website:
return
global urlList
if url in urlList: # prevent using the same URL again
return
try:
urlContent = urllib2.urlopen(url).read()
urlList.append(url)
print url
except:
return
soup = BeautifulSoup(''.join(urlContent))
# find and download all images
imgTags = soup.findAll('img')
for imgTag in imgTags:
imgUrl = imgTag['src']
# download only the proper image files
if imgUrl.lower().endswith('.jpeg') or \
imgUrl.lower().endswith('.jpg') or \
imgUrl.lower().endswith('.gif') or \
imgUrl.lower().endswith('.png') or \
imgUrl.lower().endswith('.bmp'):
try:
imgData = urllib2.urlopen(imgUrl).read()
if len(imgData) >= minFileSize:
print " " + imgUrl
fileName = basename(urlsplit(imgUrl)[2])
output = open(fileName,'wb')
output.write(imgData)
output.close()
except:
pass
print
print
# if there are links on the webpage then recursively repeat
if level > 0:
linkTags = soup.findAll('a')
if len(linkTags) > 0:
for linkTag in linkTags:
try:
linkUrl = linkTag['href']
downloadImages(linkUrl, level - 1, minFileSize)
except:
pass
# main
rootUrl = 'http://www.leveldesigninspirationmachine.tumblr.com'
netloc = urlparse.urlsplit(rootUrl).netloc.split('.')
global website
website = netloc[-2] + netloc[-1]
downloadImages(rootUrl, 1, 50000)
As Frxstream has commented, this program creates the files in the current directory (i.e. where you run it). After running the program, run ls -l (or dir) to find the files it has created.
If it seemingly hasn't created any files, then most probably it really hasn't created any files, most probably because there was an exception which your except: pass has hidden. To see what was going wrong, replace try: ... except: pass with just ..., and rerun the program. (If you can't understand and fix that, ask a separate StackOverflow question.)
it's hard to tell without looking at the errors (+1 to turning off your try/except block so you can see the exceptions) but I do see one typo here:
fileName = basename(urlsplit(imgUrl)[2])
you didn't do "from urlparse import urlsplit" you have "import urlparse" so you need to refer to it as urlparse.urlsplit() as you have in other places, so should be like this
fileName = basename(urlparse.urlsplit(imgUrl)[2])
I have a python file html_gen.py which write a new html file index.html in the same directory, and would like to open up the index.html when the writing is finished.
So I wrote
import webbrowser
webbrowser.open("index.html");
But nothing happen after executing the .py file. If I instead put a code
webbrowser.open("http://www.google.com")
Safari will open google frontpage when executing the code.
I wonder how to open the local index.html file?
Try specifying the "file://" at the start of the URL. Also, use the absolute path of the file:
import webbrowser, os
webbrowser.open('file://' + os.path.realpath(filename))
Convert the filename to url using urllib.pathname2url:
import os
try:
from urllib import pathname2url # Python 2.x
except:
from urllib.request import pathname2url # Python 3.x
url = 'file:{}'.format(pathname2url(os.path.abspath('1.html')))
webbrowser.open(url)
I have a list of links, which is stored in a file. I would like to open all the links in my browsers via some script instead of manually copy-pasting every items.
For example, OS: MAC OS X; Browser: Chrome; Script: Python (prefered)
Take a look at the webbrowser module.
import webbrowser
urls = ['http://www.google.com', 'http://www.reddit.com', 'http://stackoverflow.com']
b = webbrowser.get('firefox')
for url in urls:
b.open(url)
P.S.: support for Chrome has been included in the version 3.3, but Python 3.3 is still a release candidate.
Since you're on a mac, you can just use the subprocess module to call open http://link1 http://link2 http://link3. For example:
from subprocess import call
call(["open","http://www.google.com", "http://www.stackoverflow.com"])
Note that this will just open your default browser; however, you could simply replace the open command with the specific browser's command to choose your browser.
Here's a full example for files of the general format
alink
http://anotherlink
(etc.)
from subprocess import call
import re
import sys
links = []
filename = 'test'
try:
with open(filename) as linkListFile:
for line in linkListFile:
link = line.strip()
if link != '':
if re.match('http://.+|https://.+|ftp://.+|file://.+',link.lower()):
links.append(link)
else:
links.append('http://' + link)
except IOError:
print 'Failed to open the file "%s".\nExiting.'
sys.exit()
print links
call(["open"]+links)