How to download a file using Python - python

I am trying to download a file from the Internet using Python. I am using urllib.retriever from the urllib module, but I just can't get it to work. I would like to be able to save the downloaded file to a location of my choice.
If someone could explain how to do it with clear examples, that would be very much appreciated.

I suggest using urllib2 (Python 2 only) like so:
import urllib2
source = urllib2.urlopen("http://someUrl.com/somePage.html").read()
with open("/path/to/someFile", "wb") as f:
    f.write(source)
You could even shorten it to a single line (although you wouldn't want to if you plan to wrap each individual call in its own try/except):
open("/path/to/someFile", "wb").write(urllib2.urlopen("http://someUrl.com/somePage.html").read())

In Python 3, you can use urllib.request instead:
import urllib.request
source = urllib.request.urlopen("full_url").read()
and then write it to disk the same way chown did above:
with open("/path/to/someFile", "wb") as f:
    f.write(source)
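For larger files it can help to stream the response to disk instead of reading it all into memory first. The following is a minimal Python 3 sketch of that idea; the function name download_file and the chunk size are assumptions made for illustration, not something from the answers above.

```python
import shutil
import urllib.request


def download_file(url, dest_path, chunk_size=64 * 1024):
    """Stream the body of `url` into `dest_path` and return the path."""
    # urlopen returns a file-like response; copyfileobj copies it chunk by
    # chunk, so the whole download never has to fit in memory at once.
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
    return dest_path


# Example call (placeholder URL and path):
# download_file("http://someUrl.com/somePage.html", "/path/to/someFile")
```

The function the question seems to be reaching for is probably urllib.request.urlretrieve(url, filename) (urllib.urlretrieve in Python 2), which also saves straight to a path of your choice.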

Related

Connecting to a part of a website depending on input. Python

I'm new to Python and wondering if there is a way to open a webpage depending on what has been input. E.g.
Market=input("market")
ticker=input("Ticket")
would take you to this part of the website.
https://www.tradingview.com/symbols/'market'-'ticker'/technicals
Thanks
Looks like you were pretty much there. In Python you can use the + sign to concatenate strings, and then open the resulting link with the webbrowser library:
import webbrowser
market=input("market")
ticker=input("Ticket")
webbrowser.open('https://www.tradingview.com/symbols/'+market+'-'+ticker+'/technicals')
It's cleaner to use an f-string like this:
import webbrowser
market=input("market")
ticker=input("Ticket")
webbrowser.open(f'https://www.tradingview.com/symbols/{market}-{ticker}/technicals')
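Since the URL is built from whatever the user types, it may be worth stripping stray whitespace and percent-encoding the values before opening the browser. Here is a small sketch of that; the helper names technicals_url and open_technicals are hypothetical, and the URL pattern is simply the one from the question.

```python
import webbrowser
from urllib.parse import quote


def technicals_url(market, ticker):
    """Build the technicals URL from two pieces of user input."""
    # strip() drops accidental spaces/newlines; quote() percent-encodes
    # characters (such as '/' or spaces) that would otherwise alter the path.
    market = quote(market.strip(), safe="")
    ticker = quote(ticker.strip(), safe="")
    return f"https://www.tradingview.com/symbols/{market}-{ticker}/technicals"


def open_technicals(market, ticker):
    """Open the page in the default browser; returns True on success."""
    return webbrowser.open(technicals_url(market, ticker))
```

Keeping the URL construction separate from webbrowser.open makes it easy to print or check the link before a browser window appears.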

Import Skin Weight Maps (Python) - (Maya)

As mentioned in this post, I would like to import a skin weight map (a .weightMap file) into a scene without having to open a dialog box. Trying to reverse-engineer the script mentioned in the reply didn't get me anywhere.
When I do it manually through Maya's UI, the script history shows...
ImportSkinWeightMaps;
...as a command. But my searches on this keep leading me to the deformerWeights command.
The thing is, the documentation has no example showing the correct syntax. Guessing the flags and the path through trial and error didn't work, and further searches keep hinting that I need to use an .xml file for some reason, when all I want to do is import a .weightMap file.
I even ended up looking at weight importer scripts on highend3d.com, hoping to see what proper importing syntax looks like.
All I need is the correct syntax (or command) for something like:
mel.eval("ImportSkinWeightMaps;")
or
cmds.deformerWeights (p = "path to my .weightMap file", im=True, )
or
from pymel.core import *
pymel.core.runtime.ImportSkinWeightMaps ( 'targetOject', 'path to .weightMap file' )
Any help would be greatly appreciated.
Thanks!
Why not use cmds.skinPercent?
It is more reliable.
http://tech-artists.org/forum/showthread.php?5490-Faster-way-to-find-number-of-influences-Maya&p=27598#post27598

Saving NLTK Alignments

I am using NLTK 3.2 and I was wondering how you save NLTK alignments. I have found this link: How to save Python NLTK alignment models for later use?, but it seems that there is no align() method. Also, I figured out that nltk.align has been renamed to nltk.translate, but I still cannot access the align() method. Thanks!
Yeah, you are right. The method align became private in the current version. So, if you want to use that method, you have to modify the source code.
To modify the source code, you first have to find the directory that contains the file:
Open your terminal and start Python:
$ python
Then type these commands at the prompt:
>>> import nltk
>>> nltk.translate.ibm1.__file__
The second command prints the full path of 'ibm1.py'. Go to that directory, open the file, and rename the method __align to align.
It's the last method in the file.
CAUTION:
The align method returns an Alignment object, not an AlignedSent as in earlier versions.
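The file lookup described above can also be done from a script rather than an interactive session. This is only a sketch using the standard importlib machinery; module_source_path is a hypothetical helper name, and the nltk.translate.ibm1 call is shown as a comment because it assumes NLTK is installed.

```python
import importlib.util
import os


def module_source_path(module_name):
    """Return the absolute path of the file a module is loaded from."""
    # find_spec locates the module (importing parent packages if needed)
    # without running the module itself.
    spec = importlib.util.find_spec(module_name)
    if spec is None or not spec.has_location:
        raise ImportError(f"cannot locate a source file for {module_name!r}")
    return os.path.abspath(spec.origin)


# With NLTK installed, this should point at ibm1.py:
# print(module_source_path("nltk.translate.ibm1"))
```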

How to write a python script for downloading?

I want to download some files from this site: http://www.emuparadise.me/soundtracks/highquality/index.php
But I only want to get certain ones.
Is there a way to write a python script to do this? I have intermediate knowledge of python
I'm just looking for a bit of guidance, please point me towards a wiki or library to accomplish this
thanks,
Shrub
Here's a link to my code
I looked at the page. The links seem to redirect to another page, where the file is hosted, clicking which downloads the file.
I would use mechanize to follow the required links to the right page, and then use BeautifulSoup or lxml to parse the resultant page to get the filename.
Then it's a simple matter of opening the file using urlopen and writing its contents out into a local file like so:
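from urllib.request import urlopen  # Python 3; on Python 2 use "from urllib2 import urlopen"
with open(localFilePath, 'wb') as f:  # binary mode, so the bytes are written unchanged
    f.write(urlopen(remoteFilePath).read())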
Hope that helps
Make a URL request for the page. Once you have the source, filter it to extract the URLs.
The files you want to download are the URLs that contain a specific extension, so you can use a regular expression search to find all URLs that match your criteria.
After filtering, make a URL request for each matched URL and write its data to disk.
Sample code:
#!/usr/bin/python
import re
import sys
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2
# Your sample url
sampleUrl = "http://stackoverflow.com"
data = urlopen(sampleUrl).read().decode("utf-8", "replace")
# Sample extensions we'll be looking for: pngs and pdfs
TARGET_EXTENSIONS = r"\.(png|pdf)\b"
targetCompile = re.compile(TARGET_EXTENSIONS, re.UNICODE)
# Let's get all the urls: match criteria {no spaces or " in a url}
urls = re.findall(r'(https?://[^\s"]+)', data, re.UNICODE)
# We want these folks
extensionMatches = [url for url in urls if targetCompile.search(url)]
# The rest of the unmatched urls, for which the scraping can also be repeated.
nonExtMatches = [url for url in urls if not targetCompile.search(url)]
def fileDl(targetUrl):
    # Function to handle downloading of files.
    # Arg: url => a String
    # Output: Boolean to signify if the file has been written to disk
    # Validation of the url assumed, for the sake of keeping the illustration short
    data = urlopen(targetUrl).read()
    fileNameSearch = re.search(r"([^/\s]+)$", targetUrl)  # Text right after the last slash '/'
    if not fileNameSearch:
        sys.stderr.write("Could not extract a filename from url '%s'\n" % targetUrl)
        return False
    fileName = fileNameSearch.group(1)
    with open(fileName, "wb") as f:
        f.write(data)
    sys.stderr.write("Wrote %s to disk\n" % fileName)
    return True
# Let's now download the matched files
dlResults = [fileDl(fUrl) for fUrl in extensionMatches]
successfulDls = [ok for ok in dlResults if ok]
sys.stderr.write("Downloaded %d files from %s\n" % (len(successfulDls), sampleUrl))
# You can organize the above code into a function to repeat the process for each of the
# other urls and in that way you can make a crawler.
The above code was written mainly for Python 2.X; the import fallback at the top lets it run on Python 3 as well. I also wrote a crawler that works on any version starting from 2.X
Why yes! 5 years later and, not only is this possible, but you've now got a lot of ways to do it.
I'm going to avoid code-examples here, because I mainly want to help break your problem into segments and give you some options for exploration:
Segment 1: GET!
If you must stick to the stdlib, for either python2 or python3, urllib[n]* is what you're going to want to use to pull-down something from the internet.
So again, if you don't want dependencies on other packages:
urllib or urllib2 or maybe another urllib[n] I'm forgetting about.
If you don't have to restrict your imports to the Standard Library:
you're in luck!!!!! You've got:
requests with docs here. requests is the gold standard for gettin' stuff off the web with python. I suggest you use it.
uplink with docs here. It's relatively new & for more programmatic client interfaces.
aiohttp via asyncio with docs here. asyncio was added to the standard library in Python 3.4 (the async/await syntax arrived in 3.5), and it's also extra confusing. That said, if you're willing to put in the time, it can be ridiculously efficient for exactly this use-case.
...I'd also be remiss not to mention one of my favorite tools for crawling:
fake_useragent repo here. Docs like seriously not necessary.
Segment 2: Parse!
So again, if you must stick to the stdlib and not install anything with pip, you get to use the extra-extra fun and secure (<==extreme-sarcasm) xml builtin module. Specifically, you get to use:
xml.etree.ElementTree with docs here.
It's worth noting that the pip-installable lxml package implements the same ElementTree API and makes it easier to use. If you want to reinvent the wheel and write a bunch of your own complicated logic, using the default xml module is your option.
If you don't have to restrict your imports to the Standard Library:
lxml with docs here. As I said before, lxml provides an ElementTree-compatible API (built on the libxml2 and libxslt C libraries) that makes it human-usable & implements all those parsing tools you'd otherwise need to write yourself. However, as you can see by visiting the docs, it's not easy to use by itself. This brings us to...
BeautifulSoup aka bs4 with docs here. BeautifulSoup makes everything easier. It's my recommendation for this.
Segment 3: GET GET GET!
This section is nearly exactly the same as "Segment 1," except you have a bunch of links not one.
The only thing that changes between this section and "Segment 1" is my recommendation for what to use: aiohttp here will download way faster when dealing with several URLs because it allows you to download them in parallel.**
* - (where n was decided on from python-version to python-version in a somewhat frustratingly arbitrary manner. Look up which urllib[n] has .urlopen() as a top-level function. You can read more about this naming-convention clusterf**k here, here, and here.)
** - (This isn't totally true. It's more sort-of functionally-true at human timescales.)
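For anyone who does want a starting point, here is a minimal standard-library sketch of the three segments described above. The names LinkCollector, find_links, fetch, and fetch_all are hypothetical, the default extensions are placeholders, and ThreadPoolExecutor is swapped in for aiohttp as a simpler way to overlap several downloads; it is not what the answer recommends, just the smallest thing that runs without pip.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Segment 2: collect the href of every <a> tag as an absolute URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page they came from.
                self.links.append(urljoin(self.base_url, href))


def find_links(html, base_url, extensions=(".mp3", ".zip")):
    """Return the links in `html` whose path ends in one of `extensions`."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return [link for link in collector.links
            if urlparse(link).path.lower().endswith(tuple(extensions))]


def fetch(url):
    """Segment 1: GET a single URL and return its body as bytes."""
    with urlopen(url) as response:
        return response.read()


def fetch_all(urls, max_workers=8):
    """Segment 3: GET several URLs with a thread pool; returns {url: bytes}."""
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(urls, pool.map(fetch, urls)))
```

A typical flow would be to call fetch on the index page, decode the result, pass it to find_links together with the page URL, and hand the resulting list to fetch_all.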
I would use a combination of wget for downloading (http://www.thegeekstuff.com/2009/09/the-ultimate-wget-download-guide-with-15-awesome-examples/#more-1885) and BeautifulSoup (http://www.crummy.com/software/BeautifulSoup/bs4/doc/) for parsing the downloaded file.

python: edit ISO file directly

Is it possible to take an ISO file and edit a file in it directly, i.e. not by unpacking it, changing the file, and repacking it?
Is it possible to do this from Python? How would I do it?
You can use the following libraries for listing and extracting; I tested the first.
https://github.com/barneygale/iso9660/blob/master/iso9660.py
import iso9660
cd = iso9660.ISO9660("/Users/murat/Downloads/VisualStudio6Enterprise.ISO")
for path in cd.tree():
    print(path)
https://github.com/barneygale/isoparser
import isoparser
iso = isoparser.parse("http://www.microsoft.com/linux.iso")
print(iso.record("boot", "grub").children)
print(iso.record("boot", "grub", "grub.cfg").content)
Have you seen Hachoir, a Python library to "view and edit a binary stream field by field"? I haven't had a need to try it myself, but ISO 9660 is listed as a supported parser format.
PyCdlib can read and edit ISO-9660 files, as well as Joliet, Rock Ridge, UDF, and El Torito extensions. It has a bunch of detailed examples in its documentation, including one showing how to edit a file in-place. At the time of writing, it cannot add or remove files, or edit directories. However, it is still actively maintained, in contrast to the libraries linked in older answers.
Of course, as with any file.
It can be done with open/read/write/seek/tell/close operations on a file. Pack/unpack the data with struct/ctypes. It would require serious knowledge of the contents of an ISO, but I presume you already know what to do. If you're lucky you can try using mmap, which exposes the file contents through a string-like interface.
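As a rough illustration of the mmap approach, the sketch below overwrites a byte sequence inside a file without rewriting the rest of it. It knows nothing about the ISO 9660 layout: it assumes the bytes you want to change are findable by a plain search and that the replacement has exactly the same length, so no offsets or directory records need updating. The function name patch_in_place is hypothetical.

```python
import mmap


def patch_in_place(path, old, new):
    """Overwrite the first occurrence of `old` with `new` inside `path`.

    Returns the byte offset that was patched, or -1 if `old` was not found.
    Both arguments are bytes and must be the same length.
    """
    if len(old) != len(new):
        raise ValueError("replacement must be the same length as the original")
    with open(path, "r+b") as f:
        # Map the whole file; slices of the map read and write the file itself.
        with mmap.mmap(f.fileno(), 0) as mm:
            offset = mm.find(old)
            if offset != -1:
                mm[offset:offset + len(new)] = new
                mm.flush()
            return offset
```

For example, patch_in_place("image.iso", b"timeout=10", b"timeout=05") would change a setting stored as plain text inside the image, provided that exact byte string occurs in the file.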
