Save image with urllib.urlretrieve()

I am trying to access the following link through my script and download the chart that comes up.
I was implementing it using the accepted response here, but when I try to open the file I get the error: The file “test.png” could not be opened because it is empty.
Here is my code snippet:
import urllib
image_element = driver.find_element_by_id('chartImg')
src = image_element.get_attribute("src")
if src:
    urllib.urlretrieve(str(src), "test.png")
Next, I tried to debug further and changed my code to:
if src:
    a, b = urllib.urlretrieve(str(src), "test.png")
    print a, b.items()
which gives me the following output:
test.png
[('date', 'Sat, 19 Nov 2016 01:19:20 GMT'), ('connection', 'Keep-Alive'), ('content-length', '0'), ('server', 'BigIP')]
Does anyone know why 'content-length' is '0'? I think this is why the downloaded file is empty.

I think this happens because the URL of the image you are scraping does not end in a file extension. For example, if you run this code:
src = "http://i.imgur.com/2C7Csq6.png"
urllib.urlretrieve(src, "test.png")
The PNG file works, and it is the exact same image. I've tried looking for ways to do this without having to upload to an image sharing service where it would provide an extension, but haven't found anything. I've also tried adding .png to the original src string, but that didn't work either. My guess is this is a website-specific problem. Hopefully you can find a workaround for this, good luck!

I found a workaround: take a screenshot.
image_element = driver.find_element_by_id('chartImg')
src = image_element.get_attribute("src")
if src:
    driver.get(src)
    driver.save_screenshot('screen.png')
I don't know if there is a better way, but this does the job.
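
One possibility that nobody in this thread confirmed is that the server only returns the chart to the browser's own session, which would explain why a separate urllib request gets a 'content-length' of '0'. The sketch below (Python 3, so urllib.request rather than the Python 2 urllib used above) forwards the cookies Selenium already holds and refuses to write an empty file. The helper names build_image_request and save_image are made up for illustration, and whether cookies are actually the cause for this site is an assumption.

```python
import urllib.request


def build_image_request(src, selenium_cookies):
    """Build a request for src that carries the browser's cookies.

    selenium_cookies is a list of dicts with "name" and "value" keys,
    which is the shape returned by driver.get_cookies().
    """
    cookie_header = "; ".join(
        "%s=%s" % (c["name"], c["value"]) for c in selenium_cookies
    )
    req = urllib.request.Request(src)
    if cookie_header:
        req.add_header("Cookie", cookie_header)
    return req


def save_image(src, selenium_cookies, path):
    """Download src to path and return the number of bytes written."""
    req = build_image_request(src, selenium_cookies)
    with urllib.request.urlopen(req) as resp:
        data = resp.read()
    if not data:
        # Fail loudly instead of leaving an empty test.png behind.
        raise ValueError("server returned an empty body for %s" % src)
    with open(path, "wb") as out:
        out.write(data)
    return len(data)
```

With the code from the question this would be called as save_image(src, driver.get_cookies(), "test.png"). If the body is still empty, the screenshot workaround above remains the fallback.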

Related

How can I enable PDF page breaks from HTML, maybe using a marker in the source HTML file?

I am using pdfkit to create a PDF from a HTML file... like so:
import pdfkit
pdfkit.from_file([source], target + '.pdf')
I create the HTML file myself before doing this conversion.
What I'm now trying to do is find a way to implement a page break.
The HTML file doesn't use page breaks because ... well, it's basic HTML.
But PDFs are page-based documents.
So how can I pick up something in the HTML as a marker, and then use that to implement a page break in the PDF?
Of course pdfkit.from_file([source], target + '.pdf') is a simple single line... there's no parsing of the content, so I don't see how I could tell it what to look for.
Any ideas?
EDIT
With some advice from @Nathaniel below, I've added this to my CSS:
@media print {
    h2 {
        page-break-before: always;
    }
}
But I don't see pdfkit.from_file([source], target + '.pdf') picking it up.
Opening the HTML file in the browser and printing to PDF works perfectly, so this is more of a pdfkit issue.
I found a similar question here:
How to insert a page break in HTML so wkhtmltopdf parses it?
I think the pdfkit wrapper for wkhtmltopdf is limited.
On the command line, this works perfectly:
wkhtmltopdf --print-media-type 10100005.html 10100005.pdf
But how do I replicate that in Python? Shelling out to the command is not my first choice. :/

After some fiddling, this worked for me. I'm putting this here to help the next person.
Thanks @Nathaniel Flick for pointing me to media print and print-only styles.
Example 11 on this page also helped:
https://www.programcreek.com/python/example/100586/pdfkit.from_file
In the style sheet:
@media print {
    h2 {
        page-break-before: always;
    }
}
Then in the Python code:
pdfkit_options = {
    'print-media-type': '',
}
>>> print (source)
c:/users/maxcot/desktop/Reports/10100001.html
>>> print (target)
c:/users/maxcot/desktop/Reports/10100001.pdf
>>> print (pdfkit_options)
{'print-media-type': ''}
pdfkit.from_file(source, target, options=pdfkit_options)
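
Putting the two pieces together, here is a minimal sketch (Python 3) in which every h2 heading acts as the page-break marker the question asked about. The helper names build_report_html and write_pdf and the sample section data are invented for illustration; the pdfkit call itself is the same one shown above and needs both pdfkit and wkhtmltopdf installed.

```python
from html import escape

# Print-only rule: every <h2> starts a new page when print media is used.
PRINT_CSS = """
@media print {
  h2 { page-break-before: always; }
}
"""


def build_report_html(sections):
    """Return an HTML document with one <h2> per (title, body) pair."""
    parts = ["<html><head><style>", PRINT_CSS, "</style></head><body>"]
    for title, body in sections:
        parts.append("<h2>%s</h2>" % escape(title))
        parts.append("<p>%s</p>" % escape(body))
    parts.append("</body></html>")
    return "\n".join(parts)


def write_pdf(html_path, pdf_path):
    """Convert the HTML file, asking wkhtmltopdf to honour print styles."""
    import pdfkit  # imported lazily so the HTML helper works without it
    pdfkit.from_file(html_path, pdf_path, options={"print-media-type": ""})
```

Usage would be to write build_report_html([...]) to a file and then call write_pdf on it. Without the 'print-media-type' option the @media print block is ignored, which matches the behaviour described in the question.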

Unable to load torrent file using raw base64 form and python xmlrpc client

I am trying to load a torrent file into rtorrent using xmlrpc with the following python3 code:
import xmlrpc.client
server_url = "https://%s:%s@%s/xmlrpc" % ('[REDACTED]', '[REDACTED]', '[REDACTED]')
server = xmlrpc.client.Server(server_url)
with open("test.torrent", "rb") as torrent:
    server.load.raw_verbose(xmlrpc.client.Binary(torrent.read()),"d.delete_tied=","d.custom1.set=Test","d.directory.set=/home/[REDACTED]/files")
The load.raw_verbose call returns without an error (return code 0), but the torrent does not appear in the ruTorrent UI. I seem to be experiencing the same thing as in this reddit post, but I am using the Binary class without any luck.
I am using a Whatbox seedbox.
EDIT:
After enabling logging I am seeing
1572765194 E Could not create download, the input is not a valid torrent.
when trying to load the torrent file, however manually loading the torrent file through the rutorrent UI works fine.

I needed to add "" as the first argument:
server.load.raw_verbose("",xmlrpc.client.Binary(torrent.read()),"d.delete_tied=","d.custom1.set=Test","d.directory.set=/home/[REDACTED]/files")
I'm not sure why; the docs don't seem to show that this is needed.
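
To see what the corrected call actually sends, the following sketch (Python 3) serializes the same arguments locally with xmlrpc.client.dumps instead of contacting a server. The helper name build_load_call is hypothetical, and the sample bytes are a placeholder, not a real torrent. It shows that the empty string travels as the first parameter and that Binary wraps the file content in a base64 element.

```python
import xmlrpc.client


def build_load_call(torrent_bytes, *commands):
    """Serialize a load.raw_verbose call with an empty first argument."""
    params = ("", xmlrpc.client.Binary(torrent_bytes)) + tuple(commands)
    return xmlrpc.client.dumps(params, methodname="load.raw_verbose")


if __name__ == "__main__":
    # Placeholder payload for illustration only.
    print(build_load_call(b"d4:infod4:name4:testee", "d.custom1.set=Test"))
```

Printing the payload before sending it is a quick way to check argument order when the server reports "the input is not a valid torrent".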

File upload through python mechanize

I am trying to upload an image file through a web form using mechanize.
Although there is no error, the uploaded file does not show up when I check manually in the browser (after submitting/saving).
I am using the following code to upload the file:
import mechanize as mc
br = mc.Browser()
br.set_handle_robots(False)
br.select_form(nr=0)
br.form.add_file(open("test.png"), content_type="image/png",
                 filename='before', name="ctl00$ContentPlaceHolder1$fileuploadBeforeimages")
br.submit("ctl00$ContentPlaceHolder1$cmdSave")
# this is supposed to save the form on the webpage. It saves the texts in the other fields, whereas the image does not show up.
The add_file call seems to work. I can confirm this because when I print br.forms()[0] the file details show up (<FileControl(ctl00$ContentPlaceHolder1$fileuploadBeforeimages=before)>).
But there is no sign of the image file after this code snippet runs. I have checked several examples that call br.submit() without any specific button control; when I do that, nothing is saved on the website.
What am I missing?
Thanks in advance.
EDIT
When I manually try to upload the file, I see a pop-up asking for confirmation. Under inspect, this is present as
onchange="if (confirm('Upload ' + this.value + '?')) this.form.submit();"
I am not sure whether this is a JavaScript handler that mechanize cannot get past during the upload. Can someone confirm this?

You can open the file in binary mode by passing 'rb' as the mode after the image name, like this:
br.form.add_file(open("test.png", 'rb'), 'image/png', filename, name='file')

I can open a picture locally, but when I request it from the website it always contains errors

I can open my picture locally, but when I request it through my Django server there is always an error.
The source code:
from django.http.response import HttpResponse
import mimetypes
fd = open(CONFIG.SERVICES_PATH + sname+'/'+url,'r')
print CONFIG.SERVICES_PATH + sname+'/'+url
mime_type_guess = mimetypes.guess_type(url)
print mime_type_guess
data = fd.read()
fd.close()
response = HttpResponse(data,mimetype = mime_type_guess[0])
The output on the console is:
E:/workspace/sydney/main/services/Hunt-Club/shop/1.jpg
('image/pjpeg', None)
I can open the picture from the local path, but when I run the Django server and request it from the browser, I get this error:
The image “http://localhost:8000/gallery/image/Hunt-Club/shop/1/” cannot be displayed because it contains errors.
I don't understand why: I give the correct path and read the data, yet the image request still fails.

You may want to rethink serving files through the Django app.
Instead, you should serve them from the /static folder, which in an Apache setup you would configure using an alias.
That being said, check this out: image/pjpeg and image/jpeg
It says the problem may have something to do with serving the pjpeg content type to IE.
Hope that helps.

I found my error. The code that reads the image file should be:
fd = open(CONFIG.SERVICES_PATH + sname+'/'+url,'rb')
On Windows, open() defaults to text mode, so the binary file is not read correctly.
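
The fix can be sketched as a small helper (Python 3) that always reads the file in binary mode. The names load_image and normalize_content_type are invented for this example. Mapping 'image/pjpeg' to 'image/jpeg' is an assumption based on the content-type remark in the other answer, not something this thread verified.

```python
import mimetypes


def normalize_content_type(guessed):
    """Map the legacy 'image/pjpeg' guess to 'image/jpeg'."""
    if guessed == "image/pjpeg":
        return "image/jpeg"
    # Fall back to a generic binary type when nothing was guessed.
    return guessed or "application/octet-stream"


def load_image(path):
    """Return (raw bytes, content type) for the file at path."""
    content_type = normalize_content_type(mimetypes.guess_type(path)[0])
    # 'rb' keeps every byte intact; text mode may translate or reject bytes.
    with open(path, "rb") as fd:
        data = fd.read()
    return data, content_type
```

In a view this would end with something like HttpResponse(data, content_type=content_type); current Django versions use the content_type keyword rather than the mimetype keyword shown in the question.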

Better way to code this in Python

I'm new to Python and still learning as I go.
I have a web server that holds a list of images to load onto a Device Under Test (DUT).
The requirement is:
if the image is already present on the server, proceed with loading the image onto the DUT.
if the image is not present on the server, download the image first and then upgrade the DUT.
I have written the following code, but I'm not happy with it; I have a feeling it could be done better with some other method.
Please suggest where I could improve it and which techniques to use.
I appreciate your time in reading this email and your suggestions.
import urllib2
url = 'http://localhost/test'
filename = 'Image60.txt' # image to Verify
def Image_Upgrade():
    print 'proceeding with Image upgrade !!!'
def Image_Download():
    print 'Proceeding with Image Download !!!'
resp = urllib2.urlopen(url)
flag = False
list_of_files = []
for contents in resp.readlines():
    if 'Image' in contents:
        c=(((contents.split('href='))[-1]).split('>')[0]).strip('"') # The content output would have html tags. so removing the tags to pick only image name
        if c != filename:
            list_of_files.append(c)
        else:
            Image_Upgrade()
            flag = True
if flag==False:
    Image_Download()
Thanks,
Vijay Swaminathan
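
One way the requirement above could be expressed more directly, offered as a sketch rather than a definitive answer: collect the href values with the standard library's HTML parser instead of splitting strings, then test membership once. This is written for Python 3 (html.parser), unlike the Python 2 code in the question, and the names LinkCollector, list_links, and image_on_server are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def list_links(html):
    """Return all link targets found in the directory listing."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


def image_on_server(html, filename):
    """True if filename appears among the links in the listing."""
    return filename in list_links(html)
```

The main script then reduces to fetching the listing, calling Image_Download() when image_on_server(...) is false, and calling Image_Upgrade() afterwards, which removes the need for the flag variable.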
