Can I save images to disk using python? An example of an image would be:
The easiest way is to use urllib.urlretrieve.
Python 2:
import urllib
urllib.urlretrieve('http://chart.apis.google.com/...', 'outfile.png')
Python 3:
import urllib.request
urllib.request.urlretrieve('http://chart.apis.google.com/...', 'outfile.png')
If your goal is to download a png to disk, you can do so with urllib:
import urllib
urladdy = "http://chart.apis.google.com/chart?chxl=1:|0|10|100|1%2C000|10%2C000|100%2C000|1%2C000%2C000|2:||Excretion+in+Nanograms+per+gram+creatinine+milliliter+(logarithmic+scale)|&chxp=1,0|2,0&chxr=0,0,12.1|1,0,3&chxs=0,676767,13.5,0,lt,676767|1,676767,13.5,0,l,676767&chxtc=0,-1000&chxt=y,x,x&chbh=a,1,0&chs=640x465&cht=bvs&chco=A2C180&chds=0,12.1&chd=t:0,0,0,0,0,0,0,0,0,1,0,0,3,2,4,6,6,9,3,6,5,11,9,10,6,2,2,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0&chdl=n=87&chtt=William+MD+-+Buprenorphine+Graph"
filename = r"c:\tmp\toto\file.png"
urllib.urlretrieve(urladdy, filename)
In python 3, you will need to use urllib.request.urlretrieve instead of urllib.urlretrieve.
The Google chart API produces PNG files. Just retrieve them with urllib.urlopen(url).read() or something along these lines and save them to a file the usual way.
Full example:
>>> import urllib
>>> url = 'http://chart.apis.google.com/chart?chxl=1:|0|10|100|1%2C000|10%2C000|100%2C000|1%2C000%2C000|2:||Excretion+in+Nanograms+per+gram+creatinine+milliliter+(logarithmic+scale)|&chxp=1,0|2,0&chxr=0,0,12.1|1,0,3&chxs=0,676767,13.5,0,lt,676767|1,676767,13.5,0,l,676767&chxtc=0,-1000&chxt=y,x,x&chbh=a,1,0&chs=640x465&cht=bvs&chco=A2C180&chds=0,12.1&chd=t:0,0,0,0,0,0,0,0,0,1,0,0,3,2,4,6,6,9,3,6,5,11,9,10,6,2,2,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0&chdl=n=87&chtt=William+MD+-+Buprenorphine+Graph'
>>> image = urllib.urlopen(url).read()
>>> outfile = open('chart01.png','wb')
>>> outfile.write(image)
>>> outfile.close()
As noted in other examples, 'urllib.urlretrieve(url, outfilename)' is even more straightforward, but playing with urllib and urllib2 will surely be instructive for you.
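For Python 3, the same read-and-write idea can be sketched with urllib.request and shutil. The helper name save_url is made up for illustration; it streams the response to disk instead of holding the whole image in memory. Treat it as a minimal sketch, not the only way to do this.

```python
import shutil
import urllib.request


def save_url(url, path):
    """Stream the body of `url` into the file at `path` (hypothetical helper)."""
    # urlopen returns a file-like response; copyfileobj copies it in chunks.
    with urllib.request.urlopen(url) as response, open(path, "wb") as out:
        shutil.copyfileobj(response, out)
    return path


# Usage, with the chart URL from the question:
# save_url('http://chart.apis.google.com/...', 'chart01.png')
```

The file is opened in 'wb' mode because a PNG is binary data.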
I have an existing url of an image,
I want to download the image straight into a variable (no need to save it to disk; maybe get it from a response?).
The end result will be "download an image into a BytesIO() variable".
What is the correct way to do so?
You can use requests:
import requests
from io import BytesIO
response = requests.get(url)
image_data = BytesIO(response.content)
Note this works in Python 3.X
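If installing requests is not an option, roughly the same result can be had from the standard library. This is a sketch under that assumption; fetch_to_buffer is a hypothetical name, not part of any library.

```python
from io import BytesIO
import urllib.request


def fetch_to_buffer(url):
    """Return the body of `url` as an in-memory BytesIO (hypothetical helper)."""
    with urllib.request.urlopen(url) as response:
        # read() pulls the whole body into memory, which is what we want here.
        return BytesIO(response.read())


# Usage:
# image_data = fetch_to_buffer(url)
# image_data.read(8)  # first bytes of the image
```

Nothing is written to disk; the BytesIO object can be passed to anything that expects a binary file object.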
You could also just duck-type the underlying urllib3 response object, which is for many practical purposes the same interface as a BytesIO anyway.
Example using the PNG of your identicon:
>>> url = "https://www.gravatar.com/avatar/33f6d36c91913f4b6776525a09d131d0?s=32&d=identicon&r=PG&f=1"
>>> resp = requests.get(url, stream=True)
>>> resp.raw
<urllib3.response.HTTPResponse at 0x7fffe88927b8>
>>> resp.raw.read()
b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x00 \x00\x00\x00 \x08\x06\x00\x00\x00szz\xf4\x00\x00\x00\tpHYs\x00\x00\x0e\xc4\x00\x00\x0e\xc4\x01\x95+\x0e\x1b\x00\x00\x00\xf6IDATX\x85\xedW1\x12\xc20\x0c\x93\xb9\x0em\xc3\xeb\x98)3?b\x87\x9d\xcf\xd1\xa4[\xcd\x06\xd89bz\xe50C\xb4\xe5\xda\xaa\xba\xc8Qlbf\xc6\x0b\xd2.\xa1\x84\xfe\xda\x17\x9f\xa7!\x01\xf1\xfd\xf3\xee\xdc\x81\xb6\xf4Xo\x8al?#\x15\xd0h\xcf\xdbS\x0b\nO\x8f^\xfd\x02\x80\xe98\x81\xa3(\x1b\x81\xfe"k\x84G\xf9\xeet\x98\xa4\x00M#\x81\xb2\x9f\n\xc2\xc8\xc5"\xcb\xf8\n\\\xc0\x1fX\xe0. \xb7\xc0\xd82\xed\xf1b\x04\x08\x0b\xddw\xa0\n }\x17\xe8s\xbe\xd6\xf34\xc8\x9c\xd1|Y\x11.=\xe7&\x0c.w\x0b\xaa\x80*\xc0]\x00\xc5\xbd\xbc\xdcWg\xbd\x01\x9d3\xcdW\xcf\xfc\x07\xd09\xe3n\x81\xbb\x80<\x8aG.\xf6\x04V\xdfo\xcd\r\xfa[\xf7\x1d\xa8\x02h\xbe\xcd\xb2\x1fP};\x82\\Z9\x91\xcd\r\xcas=w4V\x13\xba4\'\xac~B\xcf\x1d\xee\x16\xb8\x0b\xb8\x03\x91\x99Z?\x1eYA8\x00\x00\x00\x00IEND\xaeB`\x82'
I was trying to make a script to download songs from internet. I was first trying to download the song by using "requests" library. But I was unable to play the song. Then, I did the same using "urllib2" library and I was able to play the song this time.
Can't we use "requests" library to download songs? If yes, how?
Code by using requests:
import requests
doc = requests.get("http://gaana99.com/fileDownload/Songs/0/28768.mp3")
f = open("movie.mp3","wb")
f.write(doc.text)
f.close()
Code by using urllib2:
import urllib2
mp3file = urllib2.urlopen("http://gaana99.com/fileDownload/Songs/0/28768.mp3")
output = open('test.mp3','wb')
output.write(mp3file.read())
output.close()
Use doc.content to save binary data:
import requests
doc = requests.get('http://gaana99.com/fileDownload/Songs/0/28768.mp3')
with open('movie.mp3', 'wb') as f:
    f.write(doc.content)
Explanation
An MP3 file is binary data, so it has no textual part to retrieve. When you deal with plain text, doc.text is ideal, but for any binary format you have to access the raw bytes with doc.content.
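A rough illustration of why treating binary data as text breaks the file. It assumes the bytes are decoded as UTF-8 with undecodable bytes replaced, which is only a stand-in for whatever decoding a given library applies; the sample bytes are invented.

```python
# Four arbitrary bytes that are not valid UTF-8 text (illustrative, not a real MP3).
data = b"\xff\xfb\x90\x00"

# Decoding as text replaces every undecodable byte with U+FFFD.
text = data.decode("utf-8", errors="replace")

# Encoding the text again does not give back the original bytes.
round_tripped = text.encode("utf-8")

print(round_tripped == data)
```

Once the original bytes have been replaced, no re-encoding recovers them, so the saved file cannot be played.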
You can check the encoding in use: when you get a plain-text response, doc.encoding is set; otherwise it is empty:
>>> doc = requests.get('http://gaana99.com/fileDownload/Songs/0/28768.mp3')
>>> doc.encoding
# nothing
>>> doc = requests.get('http://www.example.org')
>>> doc.encoding
ISO-8859-1
A similar approach, using urllib.request:
import urllib.request
urllib.request.urlretrieve('http://gaana99.com/fileDownload/Songs/0/28768.mp3', 'movie.mp3')
Sorry that the title wasn't very clear. Basically, I have a list with a whole series of URLs, and I intend to download the ones that are pictures. Is there any way to check if the webpage is an image, so that I can just skip over the ones that aren't?
Thanks in advance
You can use the requests module. Make a HEAD request and check the content type. A HEAD request does not download the response body.
import requests
response = requests.head(url)
print(response.headers.get('content-type'))
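The same check can be sketched with the standard library alone. Both function names below are hypothetical; the header parsing is deliberately simple, and a server may still report a wrong or missing Content-Type.

```python
import urllib.request


def is_image_content_type(content_type):
    """Return True if a Content-Type header value names an image type."""
    if not content_type:
        return False
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type.startswith("image/")


def url_is_image(url, timeout=10):
    """Send a HEAD request and inspect the Content-Type header."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return is_image_content_type(response.headers.get("Content-Type"))


# Usage over a list of URLs:
# image_urls = [u for u in urls if url_is_image(u)]
```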
There is no reliable way. But you could find a solution that might be "good enough" in your case.
You could look at the file extension if it is present in the url e.g., .png, .jpg could indicate an image:
>>> import os
>>> name = url2filename('http://example.com/a.png?q=1')
>>> os.path.splitext(name)[1]
'.png'
>>> import mimetypes
>>> mimetypes.guess_type(name)[0]
'image/png'
where url2filename() function is defined here.
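The linked definition of url2filename() is not reproduced here. One possible sketch, which is an assumption and not necessarily the original function, takes the last path component of the URL and ignores the query string:

```python
import posixpath
from urllib.parse import unquote, urlsplit


def url2filename(url):
    """Return the last path component of `url` (illustrative sketch)."""
    # urlsplit separates the path from the query string and fragment.
    path = urlsplit(url).path
    # URL paths always use forward slashes, hence posixpath.
    return posixpath.basename(unquote(path))


print(url2filename('http://example.com/a.png?q=1'))
```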
You could inspect Content-Type http header:
>>> import urllib.request
>>> r = urllib.request.urlopen(url) # make HTTP GET request, read headers
>>> r.headers.get_content_type()
'image/png'
>>> r.headers.get_content_maintype()
'image'
>>> r.headers.get_content_subtype()
'png'
You could check the very beginning of the http body for magic numbers indicating image files e.g., jpeg may start with b'\xff\xd8\xff\xe0' or:
>>> prefix = r.read(8)
>>> prefix # .png image
b'\x89PNG\r\n\x1a\n'
As @pafcu suggested in the answer to the related question, you could use imghdr.what() function:
>>> import imghdr
>>> imghdr.what(None, b'\x89PNG\r\n\x1a\n')
'png'
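The imghdr module was removed in Python 3.13, so on newer versions the magic-number check has to be done by hand or with a third-party library. A minimal sketch follows; the function name and the small signature table are assumptions, and the table covers only three common formats.

```python
# (signature, format name) pairs for a few common image formats.
IMAGE_SIGNATURES = (
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
)


def sniff_image_type(prefix):
    """Guess an image format from the first bytes of a file, or return None."""
    for signature, name in IMAGE_SIGNATURES:
        if prefix.startswith(signature):
            return name
    return None


# Usage with the first bytes of a response body:
# sniff_image_type(r.read(8))
```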
You can use mimetypes https://docs.python.org/3.0/library/mimetypes.html
import urllib
from mimetypes import guess_extension
url="http://example.com/image.png"
source = urllib.urlopen(url)
extension = guess_extension(source.info()['Content-Type'])
print(extension)
This prints ".png"; guess_extension() includes the leading dot.
Would like to create a function that pulls a sound from given url and saves it in my machine locally
Use the urllib module:
import urllib
urllib.urlretrieve(url, sound_clip_name)
The file will be saved under the name you provide.
Alternatively, using urllib2:
import urllib2
data = urllib2.urlopen(url).read()
f = open('sound_clip', 'wb')  # binary mode, since audio is not text
f.write(data)
f.close()
Don't forget to give your file an extension.
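If you are on Python 2.7, the urllib2 module is your friend; on Python 3, use urllib.request.
Example in 2.7:
import urllib2
filename = 'python.html'  # example output name
f = urllib2.urlopen('http://www.python.org/')
with open(filename, 'wb') as fd:
    fd.write(f.read())  # call read() to get the response body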
I have grabbed a pdf from the web using for example
import requests
pdf = requests.get("http://www.scala-lang.org/docu/files/ScalaByExample.pdf")
I would like to modify this code to display it
from gi.repository import Poppler, Gtk
def draw(widget, surface):
    page.render(surface)
document = Poppler.Document.new_from_file("file:///home/me/some.pdf", None)
page = document.get_page(0)
window = Gtk.Window(title="Hello World")
window.connect("delete-event", Gtk.main_quit)
window.connect("draw", draw)
window.set_app_paintable(True)
window.show_all()
Gtk.main()
How do I modify the document = line to use the variable pdf that contains the pdf?
(I don't mind using popplerqt4 or anything else if that makes it easier.)
It all depends on the OS you're using. These might help:
import os
os.system('my_pdf.pdf')
or
os.startfile('path_to_pdf.pdf')
or
import webbrowser
webbrowser.open(r'file:///my_pdf.pdf')
How about using a temporary file?
import tempfile
import urllib
import urlparse
import requests
from gi.repository import Poppler, Gtk
pdf = requests.get("http://www.scala-lang.org/docu/files/ScalaByExample.pdf")
with tempfile.NamedTemporaryFile() as pdf_contents:
    # Write the response bytes, not the Response object itself.
    pdf_contents.write(pdf.content)
    # Flush so the data is on disk before Poppler opens the file.
    pdf_contents.flush()
    file_url = urlparse.urljoin(
        'file:', urllib.pathname2url(pdf_contents.name))
    document = Poppler.Document.new_from_file(file_url, None)
Try this and tell me if it works:
document = Poppler.Document.new_from_data(str(pdf.content),len(repr(pdf.content)),None)
If you want to open pdf using acrobat reader then below code should work
import subprocess
process = subprocess.Popen(['<here path to acrobat.exe>', '/A', 'page=1', '<here path to pdf>'], shell=False, stdout=subprocess.PIPE)
process.wait()
Since there is a library named pyPdf, you should be able to load the PDF file using that.
If you have any further questions, send me a message.
August 2015: On a fresh installation on Windows 7, the problem is still the same:
Poppler.Document.new_from_data(data, len(data), None)
returns : Type error: must be strings not bytes.
Poppler.Document.new_from_data(str(data), len(data), None)
returns : PDF document is damaged (4).
I have been unable to use this function.
I tried to use a NamedTemporaryFile instead of a file on disk, but for an unknown reason, it returns an unknown error.
So I am using a temporary file. Not the prettiest way, but it works.
Here is the test code for Python 3.4, if anyone has an idea :
from gi.repository import Poppler
import tempfile, urllib
from urllib.parse import urlparse
from urllib.request import urljoin
testfile = "d:/Mes Documents/en cours/PdfBooklet3/tempfiles/preview.pdf"
document = Poppler.Document.new_from_file("file:///" + testfile, None) # Works fine
page = document.get_page(0)
print(page) # OK
f1 = open(testfile, "rb")
data1 = f1.read()
f1.close()
data2 = "".join(map(chr, data1)) # converts bytes to string
print(len(data1))
document = Poppler.Document.new_from_data(data2, len(data2), None)
page = document.get_page(0) # returns None
print(page)
pdftempfile = tempfile.NamedTemporaryFile()
pdftempfile.write(data1)
file_url = urllib.parse.urljoin('file:', urllib.request.pathname2url(pdftempfile.name))
print( file_url)
pdftempfile.seek(0)
document = Poppler.Document.new_from_file(file_url, None) # unknown error
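One possible reason for the NamedTemporaryFile failure, offered as a guess and not a diagnosis: the Python documentation notes that on Windows a named temporary file cannot be opened a second time while it is still open. A sketch that avoids this writes the bytes with tempfile.mkstemp, closes the file, and only then builds the file URL. The helper name bytes_to_file_uri is hypothetical.

```python
import os
import tempfile
from pathlib import Path


def bytes_to_file_uri(data, suffix=".pdf"):
    """Write `data` to a closed temporary file and return (path, file URI)."""
    fd, name = tempfile.mkstemp(suffix=suffix)
    # Closing the handle before anyone else opens the file matters on Windows.
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return name, Path(name).as_uri()


# Usage with the objects from the code above:
# name, file_url = bytes_to_file_uri(data1)
# document = Poppler.Document.new_from_file(file_url, None)
# os.remove(name)  # the caller deletes the file when done
```

The file is not deleted automatically, so the caller has to remove it once the document is no longer needed.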