I am using python-docx with django to generate word documents.
Is there a way to use add_picture to add an image from the web rather than from the file system?
In Word, when I choose to add a picture, I can just give the URL.
I tried to simply do the same and wrote:
document.add_picture("http://icdn4.digitaltrends.com/image/microsoft_xp_bliss_desktop_image-650x0.jpg")
and got error:
IOError: [Errno 22] invalid mode ('rb') or filename:
'http://icdn4.digitaltrends.com/image/microsoft_xp_bliss_desktop_image-650x0.jpg'
It's not very elegant, but I found a solution based on a related question.
My code now looks like this:
import urllib2, StringIO
image_from_url = urllib2.urlopen(url_value)
io_url = StringIO.StringIO()
io_url.write(image_from_url.read())
io_url.seek(0)
document.add_picture(io_url, width=Px(150))
and this works fine.
If you use docxtemplater (command line interface),
you can create your own templates and embed images with a URL.
See https://github.com/edi9999/docxtemplater (docxtemplater command line interface, docxtemplater image replacing).
Below is a fresh implementation for Python 3:
from io import BytesIO
import requests
from docx import Document
from docx.shared import Inches
response = requests.get(your_image_url) # no need to add stream=True
# Access the response body as bytes
# then convert it to in-memory binary stream using `BytesIO`
binary_img = BytesIO(response.content)
document = Document()
# `add_picture` supports image path or stream, we use stream
document.add_picture(binary_img, width=Inches(2))
document.save('demo.docx')
Related
I'm working with the Python hug API and would like to create a GET endpoint for the frontend, so that the frontend can download a generated Word document, e.g. via a download button. However, after going through the documentation, I still cannot figure out how to do it.
Here is my working script so far:
import os
import hug
from docx import Document
@hug.get("/download_submission_document")
def download_submission_document():
    file_name = 'example.docx'
    document = Document()
    document.add_heading('Test header', level=2)
    document.add_paragraph('Test paragraph')
    document.save(file_name)
    # TO DO: send a created file to frontend
I'm not sure whether we can send the object right away or have to save it somewhere first before sending it to the frontend. (requirements: hug, python-docx)
I'm trying to use something like
@hug.get("/download_submission_document", output=hug.output_format.file)
but I'm not sure how to return a file.
Alright, I found a solution which is easier than I thought. Just do the following:
@hug.get("/download_submission_document", output=hug.output_format.file)
def download_submission_document():
    file_name = 'example.docx'
    document = Document()
    document.add_heading('Test header', level=2)
    document.add_paragraph('Test paragraph')
    document.save(file_name)
    return file_name
Returning file_name is enough to make the endpoint serve the docx as a download.
I was trying to write a script to download songs from the internet. I first tried to download a song using the "requests" library, but I was unable to play the downloaded file. Then I did the same using the "urllib2" library, and this time I was able to play the song.
Can't we use the "requests" library to download songs? If so, how?
Code by using requests:
import requests
doc = requests.get("http://gaana99.com/fileDownload/Songs/0/28768.mp3")
f = open("movie.mp3","wb")
f.write(doc.text)
f.close()
Code by using urllib2:
import urllib2
mp3file = urllib2.urlopen("http://gaana99.com/fileDownload/Songs/0/28768.mp3")
output = open('test.mp3','wb')
output.write(mp3file.read())
output.close()
Use doc.content to save binary data:
import requests
doc = requests.get('http://gaana99.com/fileDownload/Songs/0/28768.mp3')
with open('movie.mp3', 'wb') as f:
    f.write(doc.content)
Explanation
An MP3 file is purely binary data; it has no textual part to retrieve. doc.text is ideal when you deal with plain text, but for any binary format you have to access the raw bytes with doc.content.
You can check which encoding was detected: for a plain-text response doc.encoding is set, otherwise it is empty:
>>> doc = requests.get('http://gaana99.com/fileDownload/Songs/0/28768.mp3')
>>> doc.encoding
# nothing
>>> doc = requests.get('http://www.example.org')
>>> doc.encoding
ISO-8859-1
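To see why going through text damages binary data, here is a minimal offline sketch. The byte string is made up for illustration (it is not a real MP3), and the decode step only mimics what any text view has to do with a guessed encoding; it is not meant to reproduce requests' internals exactly.

```python
# Hypothetical binary payload standing in for the first bytes of an MP3 file.
raw = b"\xff\xfb\x90\x00\x80\x81binary"

# Treating the payload as text forces a decode with some guessed encoding.
# Bytes that are invalid in that encoding get replaced and cannot be recovered.
as_text = raw.decode("utf-8", errors="replace")
round_tripped = as_text.encode("utf-8")

# The round trip through text is lossy, so the written file would be corrupt.
print(round_tripped == raw)
```

Keeping the data as bytes from download to disk, as doc.content does, avoids this round trip entirely.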
A similar approach, using urllib in Python 3:
import urllib.request
urllib.request.urlretrieve('http://gaana99.com/fileDownload/Songs/0/28768.mp3', 'movie.mp3')
So, I'm developing a Flask application that uses the GDAL library, and I want to stream a .tif file through a URL.
Right now I have a method that reads a .tif file using gdal.Open(filepath). When run outside of the Flask environment (e.g. in a Python console), it works fine with both a path to a local file and a URL.
from gdalconst import GA_ReadOnly
import gdal
filename = 'http://xxxxxxx.blob.core.windows.net/dsm/DSM_1km_6349_614.tif'
dataset = gdal.Open(filename, GA_ReadOnly )
if dataset is not None:
    print 'Driver: ', dataset.GetDriver().ShortName,'/', \
          dataset.GetDriver().LongName
However, when the same code is executed inside the Flask environment, I get the following message:
ERROR 4: `http://xxxxxxx.blob.core.windows.net/dsm/DSM_1km_6349_614.tif' does
not exist in the file system,
and is not recognised as a supported dataset name.
If I instead download the file to the local filesystem of the Flask app, and insert the path to the file, like this:
block_blob_service = get_blobservice() #Initialize block service
block_blob_service.get_blob_to_path('dsm', blobname, filename) # Get blob to local filesystem, path to file saved in filename
dataset = gdal.Open(filename, GA_ReadOnly)
That works just fine...
The thing is, since I'm requesting some big files (200 mb), I want to stream the files using the url instead of the local file reference.
Does anyone have an idea of what could be causing this? I also tried putting "/vsicurl_streaming/" in front of the url as suggested elsewhere.
I'm using Python 2.7, 32-bit with GDAL 2.0.2
Please try the following code snippet:
from gzip import GzipFile
from io import BytesIO
import urllib2
from uuid import uuid4
from gdalconst import GA_ReadOnly
import gdal
def open_http_query(url):
    try:
        request = urllib2.Request(url,
                                  headers={"Accept-Encoding": "gzip"})
        response = urllib2.urlopen(request, timeout=30)
        if response.info().get('Content-Encoding') == 'gzip':
            return GzipFile(fileobj=BytesIO(response.read()))
        else:
            return response
    except urllib2.URLError:
        return None
url = 'http://xxx.blob.core.windows.net/container/example.tif'
image_data = open_http_query(url)
mmap_name = "/vsimem/"+uuid4().get_hex()
gdal.FileFromMemBuffer(mmap_name, image_data.read())
dataset = gdal.Open(mmap_name)
if dataset is not None:
    print 'Driver: ', dataset.GetDriver().ShortName,'/', \
          dataset.GetDriver().LongName
This uses a GDAL in-memory file (/vsimem/) to open an image retrieved via HTTP without saving it to a temporary file; the resulting dataset can then be read, for example as a NumPy array. Note that the whole file is still read into memory first.
Refer to https://gist.github.com/jleinonen/5781308 for more info.
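The gzip branch of open_http_query can be tried without a network connection. The sketch below is Python 3 and uses only the standard library; the payload is a made-up stand-in for a compressed HTTP response body, not a real TIFF.

```python
import gzip
from gzip import GzipFile
from io import BytesIO

# Hypothetical response body: pretend the server sent this gzip-compressed.
original = b"II*\x00 fake TIFF bytes for illustration"
compressed_body = gzip.compress(original)

# Same pattern as in open_http_query: wrap the bytes in BytesIO so that
# GzipFile gets a file-like object, then read the decompressed data.
decoded = GzipFile(fileobj=BytesIO(compressed_body)).read()

print(decoded == original)
```

The decompressed bytes are what would be handed to gdal.FileFromMemBuffer in the snippet above.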
I am currently using
import urllib
urllib.urlretrieve("http://www.digimouth.com/news/media/2011/09/google-logo.jpg", "local-filename.jpg")
Is there a way to check whether the link actually points to a picture, so that I download it only if it does and skip it otherwise?
Thanks!
The extension does not guarantee that a file is an actual image. If you want to check that the file is indeed an image, you could use ImageMagick's identify:
import os
from subprocess import check_output, CalledProcessError
from tempfile import NamedTemporaryFile
import requests
from shutil import move
r = requests.get("http://www.digimouth.com/news/media/2011/09/google-logo.jpg").content
# Write the download to a temporary file and close it so identify sees all the bytes
with NamedTemporaryFile("wb", delete=False, dir=".") as tmp:
    tmp.write(r)
try:
    # identify prints the detected format and exits with an error for non-images
    out = check_output(["identify", "-format", "%m", tmp.name]).decode().strip()
    print(out)
    move(tmp.name, "whatever.{}".format(out.lower()))
except CalledProcessError:
    # Not an image: discard the temporary file
    os.remove(tmp.name)
To see all the supported formats, run identify -list format.
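If ImageMagick is not available, a cheaper first check is to look at the leading bytes of the download. The following is only a sketch: sniff_image_format and IMAGE_SIGNATURES are hypothetical names, only a few formats are covered, and a matching signature does not prove that the rest of the file is valid.

```python
# Leading bytes ("magic numbers") of a few common image formats.
IMAGE_SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def sniff_image_format(data):
    """Return a format name if data starts with a known signature, else None."""
    for signature, name in IMAGE_SIGNATURES.items():
        if data.startswith(signature):
            return name
    return None


# Made-up payloads: a PNG header followed by padding, and an HTML error page.
print(sniff_image_format(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8))
print(sniff_image_format(b"<html>not an image</html>"))
```

With requests, the bytes to test would be the response's content, so the file only needs to be written to disk when the function returns a format name.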
I'm trying to use Python to read .pdf files from the web directly rather than save them all to my computer. All I need is the text from the .pdf and I'm going to be reading a lot (~60k) of them, so I'd prefer to not actually have to save them all.
I know how to save a .pdf from the internet using urllib and open it with PyPDF2. (example)
I want to skip the saving-to-file step.
import urllib, PyPDF2
urllib.urlopen('https://bitcoin.org/bitcoin.pdf')
wFile = urllib.urlopen('https://bitcoin.org/bitcoin.pdf')
lFile = PyPDF2.pdf.PdfFileReader(wFile.read())
I get an error that is fairly easy to understand:
Traceback (most recent call last):
File "<pyshell#6>", line 1, in <module>
fil = PyPDF2.pdf.PdfFileReader(wFile.read())
File "C:\Python27\lib\PyPDF2\pdf.py", line 797, in __init__
self.read(stream)
File "C:\Python27\lib\PyPDF2\pdf.py", line 1245, in read
stream.seek(-1, 2)
AttributeError: 'str' object has no attribute 'seek'
Obviously PyPDF2 doesn't like that I'm giving it the urllib.urlopen().read() (which appears to return a string). I know that this string is not the "text" of the .pdf but a string representation of the file. How can I resolve this?
EDIT: NorthCat's solution resolved my error, but when I try to actually extract the text, I get this:
>>> print lFile.getPage(0).extractText()
ˇˆ˘˘˙˘˘˝˘˛˘ˇ˘ˇ˚ˇˇˇ˘ˆ˘˘˘˚ˇˆ˘ˆ˘ˇ˜ˇ˝˚˘˛˘ˇ ˘˘˘ˇ˛˘˚˚ˆˇˇ!
˝˘˚ˇ˘˘˚"˘˘ˇ˘˚ˇ˘˘˚ˇ˘˘˘˙˘˘˘#˘˘˘ˆ˘˛˘˚˛˙ ˘˘˚˚˘˛˙#˘ˇ˘ˇˆ˘˘˛˛˘˘!˘˘˛˘˝˘˘˘˚ ˛˘˘ˇ˘ˇ˛$%&˘ˇ'ˆ˛
$%&˘ˇˇ˘˚ˆ˚˘˘˘˘ ˘ˆ(ˇˇ˘˘˘˘ˇ˘˚˘˘#˘˘˘ˇ˛!ˇ)˘˘˚˘˘˛ ˚˚˘ˇ˘˝˘˚'˘˘ˇˇ ˘˘ˇ˘˛˙˛˛˘˘˚ˇ˘˘ˆ˘˘ˆ˙
$˘˘˘*˘˘˘ˇˆ˘˘ˇˆ˛ˇ˘˝˚˚˘˘ˇ˘ˆ˘"˘ˆ˘ˇˇ˘˛ ˛˛˘˛˘˘˘˘˘˘˛˘˘˚˚˘$ˇ˘ˇˆ˙˘˝˘ˇ˘˘˘ˇˇˆˇ˘ ˘˛ˇ˝˘˚˚#˘˛˘˚˘˘
˘ˇ˘˚˛˛˘ˆ˛ˇˇˇ ˚˘˘˚˘˘ˇ˛˘˙˘˝˘ˇ˘ˆ˘˛˙˘˝˘ˇ˘˘˝˘"˘˛˘˝˘ˇ ˘˘˘˚˛˘˚)˘˘ˆ˛˘˘
˘˛˘˛˘ˆˇ˚˘˘˘˘˚˘˘˘˘˛˛˚˘˚˝˚ˇ˘#˘˘˚ˆ˘˘˘˝˘˚˘ˆˆˇ˘ˆ
˘˘˘ˆ˘˝˘˘˚"˘˘˚˘˚˘ˇ˘ˆ˘ˆ˘˚ˆ˛˚˛ˆ˚˘˘˘˘˘˘˚˛˚˚ˆ#˘ˇˇˆˇ˘˝˘˘ˇ˚˘ˇˇ˘˛˛˚ ˚˘˘˘ˇ˚˘˘ˇ˘˘˚ˆ˘*˘
˘˘ˇ˘˚ˇ˘˙˘˚ˇ˘˘˘˙˙˘˘˚˚˘˘˝˘˘˘˛˛˘ˇˇ˚˘˛#˘ˆ˘˘ˇ˘˚˘ˇˇ˘˘ˇˆˇ˘$%&˘ˆ˘˛˘˚˘,
Try this:
import urllib, PyPDF2
import cStringIO
wFile = urllib.urlopen('https://bitcoin.org/bitcoin.pdf')
lFile = PyPDF2.pdf.PdfFileReader( cStringIO.StringIO(wFile.read()) )
Since PyPDF2 fails to extract the text from this file, here are a couple of alternatives; both, however, require saving the file to disk.
Solution 1
You can use ps2ascii (on Linux or macOS) or xpdf (on Windows). An example using xpdf:
import os
os.system('C:\\xpdfbin-win-3.03\\bin32\\pdftotext.exe C:\\xpdfbin-win-3.03\\bin32\\bitcoin.pdf bitcoin1.txt')
or
import subprocess
subprocess.call(['C:\\xpdfbin-win-3.03\\bin32\\pdftotext.exe', 'C:\\xpdfbin-win-3.03\\bin32\\bitcoin.pdf', 'bitcoin2.txt'])
Solution 2
You can use one of the online PDF-to-text converters. An example using pdf.my-addr.com:
import MultipartPostHandler
import urllib2
def pdf2text( absolute_path ):
    url = 'http://pdf.my-addr.com/pdf-to-text-converter-tool.php'
    params = { 'file' : open( absolute_path, 'rb' ),
               'encoding': 'UTF-8',
             }
    opener = urllib2.build_opener( MultipartPostHandler.MultipartPostHandler )
    return opener.open( url, params ).read()
print pdf2text('bitcoin.pdf')
The code of MultipartPostHandler can be found here. I tried to use cStringIO instead of open(), but it did not work.
Maybe it will be helpful for you.
I know this question is old, but I had the same issue and here is how I solved it.
In the newer PyPDF2 docs there is a section about streaming data.
The example there looks like this:
from io import BytesIO
from PyPDF2 import PdfReader
# Prepare example
with open("example.pdf", "rb") as fh:
    bytes_stream = BytesIO(fh.read())
# Read from bytes_stream
reader = PdfReader(bytes_stream)
Therefore, what I did instead was this:
import urllib.request
from io import BytesIO
from PyPDF2 import PdfReader
NEW_PATH = 'https://example.com/path/to/pdf/online?id=123456789&date=2022060'
wFile = urllib.request.urlopen(NEW_PATH)
bytes_stream = BytesIO(wFile.read())
reader = PdfReader(bytes_stream)
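The AttributeError in the question comes from the reader calling stream.seek(-1, 2) on a plain string. A small standard-library sketch (Python 3, with a made-up payload instead of a real PDF) shows the difference between raw bytes and a BytesIO wrapper:

```python
from io import BytesIO

# Hypothetical payload; a real PDF would come from the HTTP response body.
data = b"%PDF-1.4 ... %%EOF"

# Raw bytes are not a file-like object, so a reader cannot seek in them.
print(hasattr(data, "seek"))

# BytesIO gives the same bytes a file-like interface, including seek().
stream = BytesIO(data)
position = stream.seek(-1, 2)  # one byte before the end, as in the traceback
last_byte = stream.read(1)
print(position, last_byte)
```

This is why wrapping the downloaded bytes in BytesIO is enough to skip the save-to-disk step.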