How to get image details from firefox webdriver?


I've got an image on a page rendered by Firefox via WebDriver. I can get its element (wd.find_element_by_xpath("id('main')/form/p[5]/img")), but how can I get its contents, either base64-encoded or as a location on my hard drive?
PS: please don't suggest getting the src and fetching it with an external tool. I want the image I already have in the browser.

Cached images can be extracted from Firefox's cache by navigating to a URL like this one:
about:cache-entry?client=HTTP&sb=1&key=http://your.server/image.png
The resulting page will contain a line with the "file on disk" label, like this one:
file on disk: /home/fviktor/.mozilla/firefox/7jx6k3hx.default/Cache/CF7379D8d01
This page will also contain a hex dump of the file's contents. You can load the file from that path or parse the hex dump. Note that the path can also be "none" for small files that are cached only in memory; in that case, parsing the hex dump is your only option.
Maybe there's a way to suppress the hex dump if there's a cache file on the disk, but I'm not sure about it.
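A minimal sketch of the parsing step, assuming you already have the text of the about:cache-entry page as a string (for example the text of the page body read through WebDriver). The helper names are hypothetical, and the hex-dump layout assumed here (an 8-digit hex offset, a colon, up to 16 two-digit hex bytes, then an ASCII column) is a guess that you should check against the output of your Firefox version.

```python
import re
from typing import Optional

# Assumed format of the "file on disk" line, e.g.
# "file on disk: /home/user/.mozilla/firefox/xxxx.default/Cache/CF7379D8d01"
FILE_ON_DISK = re.compile(r"file on disk:\s*(\S+)")

# Assumed hex-dump row: 8 hex digits of offset, a colon, then the byte columns.
HEX_ROW = re.compile(r"^\s*([0-9a-fA-F]{8}):(.*)$")
HEX_BYTE = re.compile(r"[0-9a-fA-F]{2}")


def cache_file_path(page_text: str) -> Optional[str]:
    """Return the on-disk cache path, or None if the entry lives only in memory."""
    match = FILE_ON_DISK.search(page_text)
    if match is None or match.group(1) == "none":
        return None
    return match.group(1)


def parse_hex_dump(page_text: str, bytes_per_row: int = 16) -> bytes:
    """Rebuild the cached body from the hex-dump rows found in the page text."""
    data = bytearray()
    for line in page_text.splitlines():
        row = HEX_ROW.match(line)
        if row is None:
            continue  # not a dump row (headers, metadata, ...)
        # Take the leading two-digit hex tokens and stop at the ASCII column.
        # Caveat: a short final row whose ASCII column is itself a two-character
        # hex-looking token (e.g. "ab") would be misread as an extra byte.
        for token in row.group(2).split()[:bytes_per_row]:
            if not HEX_BYTE.fullmatch(token):
                break
            data.append(int(token, 16))
    return bytes(data)
```

If cache_file_path() returns a path you can simply open that file in binary mode; otherwise fall back to parse_hex_dump().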

I've created a little script for extracting data from the browser cache; you can use it to extract cache entries. Check it out at this gist, and see this post for a usage guide.

fviktor's answer helped, but the syntax has changed. In Firefox 60.9esr, the entries are at about:cache-entry?storage=disk&context=&eid=&uri=https://example.com/images/img.png, and the page doesn't contain a "file on disk" label. The hex dump is still at the bottom of the page, though.
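A small helper that builds either form of the cache-entry URL quoted in these answers. It is a sketch: the function name is hypothetical, the resource URL is inserted unescaped exactly as in the examples above, and whether WebDriver lets you navigate to about: pages may depend on your Firefox and geckodriver versions.

```python
def cache_entry_url(resource_url: str, legacy: bool = False) -> str:
    """Build an about:cache-entry URL for resource_url.

    legacy=True uses the older client=HTTP&sb=1&key=... form; the default uses
    the storage=disk&context=&eid=&uri=... form reported for Firefox 60.9esr.
    """
    if legacy:
        return f"about:cache-entry?client=HTTP&sb=1&key={resource_url}"
    return f"about:cache-entry?storage=disk&context=&eid=&uri={resource_url}"


# Usage with an existing Firefox WebDriver session (not run here):
#     driver.get(cache_entry_url(img.get_attribute("src")))
#     page_text = driver.find_element("tag name", "body").text
```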

Related

How to upload file with python mechanicalsoup to ASP.net site inside a form

I'm trying to automate interaction with a website. The website is built with ASP.NET, so most of the interactions work as a form under the hood. One of the things I need to do is upload a file. In Chrome's Inspect window I see this part of the form:
ctl00$ctl00$MainContent$RPBVContent$ucPriceUploader$FileUpload1: (binary)
Chrome's Inspect window doesn't show the form information when I actually submit the file. It only shows it when I try to upload without having selected a file.
I previously tried doing:
with open('pricestoy.csv', 'rb') as f:
    pp = browser.submit_selected(files={'prices.csv': f})
but the website didn't seem to receive the file even though it returned a 200.
It seems like I need to do something like
with open('pricestoy.csv', 'rb') as f:
    browser['ctl00$ctl00$MainContent$RPBVContent$ucPriceUploader$cmdUpload1'] = f.read()
    pp = browser.submit_selected()
but that's got the same issue where I get a 200 but the site doesn't seem to recognize having got a file.
If I check pp.request.headers, I see that the Content-Length is 6158289, but when I submit the file in Chrome the Content-Length is 6158414, so Chrome seems to be adding something. I don't know if that matters, since the two are very close.
Another difference is that Chrome has
Content-Type: multipart/form-data; boundary=----WebKitFormBoundaryg4PYqQpHVnsxwtTh
whereas the python version has
Content-Type: multipart/form-data; boundary=d5416a61760fabc3ac8e6f99229df131
At this point I'm at a loss.

The most helpful thing in figuring this out was to use Firefox instead of Chrome. Firefox has a 'Copy POST data' option in its Network inspector, whereas Chrome doesn't. By doing that I could see that I was putting the file into the wrong form field. I could also see that the boundary was just an arbitrary string, so focusing on it was a waste of time. I also found this deep in the MechanicalSoup documentation:
Example: uploading a file through a <input type="file" name="tagname"> field (provide the path to the local file, and its content will be uploaded):
form.set("tagname", path_to_local_file)
So it turned out to be as simple as doing
browser.form.set('ctl00$ctl00$MainContent$RPBVContent$ucPriceUploader$FileUpload1', 'pricestoy.csv')
browser.submit_selected()
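Both findings (the file must go into the field named by the file input's name attribute, and the boundary is arbitrary) can be checked without a browser by building a multipart request with requests and inspecting it before sending. This sketch uses requests directly rather than MechanicalSoup; the URL and CSV contents are placeholders, and a real ASP.NET form would also need its hidden fields (such as __VIEWSTATE), which MechanicalSoup submits along with the form for you.

```python
import requests

FIELD = "ctl00$ctl00$MainContent$RPBVContent$ucPriceUploader$FileUpload1"


def build_upload(url: str, field: str, filename: str, content: bytes) -> requests.PreparedRequest:
    """Prepare (but do not send) a multipart/form-data upload of one file."""
    files = {field: (filename, content, "text/csv")}
    return requests.Request("POST", url, files=files).prepare()


prepared = build_upload(
    "https://example.com/upload.aspx", FIELD, "pricestoy.csv", b"sku,price\nA1,9.99\n"
)
# The field name ends up in the part's Content-Disposition header...
print(prepared.body.split(b"\r\n")[1].decode())
# ...while the boundary is just a random token chosen by the client.
print(prepared.headers["Content-Type"])
```

Comparing the printed Content-Disposition line with what the browser sends is a quick way to spot a wrong field name.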

Python web crawling/scraping - Download diagram (PDF or TIFF) from webpage and save to local machine

I have a website with a search field where I need to enter a numeric value and press Enter. It then goes to another page that displays some content, including some URLs. If I click one of those URLs, it asks me to save a diagram, which is either in TIFF or PDF format.
To download the TIFF diagrams, I use the swift plugin in Internet Explorer and save them to my machine.
Right now I do this work manually, and I want to automate the whole process.
Steps:
1. Use the Python requests module to pass the URL with the numeric value to the post method.
2. Save the response content to a variable.
3. Perform pattern matching to fetch the URL.
4. Click the URL. This is where I'm stuck: I can't save the diagram locally since it is a TIFF.
Is there any module to download a TIFF diagram and save it to the local machine?
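The steps listed above can be sketched with requests and a regular expression; "clicking" a link just means sending a GET request to its URL. The search URL, the form field name (number), and the assumption that diagram links appear as href attributes ending in .tif, .tiff or .pdf are all hypothetical; adapt them to the real page, or use an HTML parser such as BeautifulSoup if the markup is irregular.

```python
import os
import re
from urllib.parse import urljoin

import requests

# Hypothetical: diagram links are plain href attributes ending in .tif/.tiff/.pdf.
DIAGRAM_LINK = re.compile(r'href="([^"]+\.(?:tiff?|pdf))"', re.IGNORECASE)


def find_diagram_links(html: str, base_url: str) -> list:
    """Return absolute URLs of the TIFF/PDF links found in the result page."""
    return [urljoin(base_url, href) for href in DIAGRAM_LINK.findall(html)]


def download(url: str, folder: str = ".") -> str:
    """Save the file at url unchanged (a TIFF stays a TIFF) and return its path."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    path = os.path.join(folder, os.path.basename(url.split("?")[0]))
    with open(path, "wb") as f:
        f.write(response.content)
    return path


# Usage (hypothetical URL and field name; makes network requests):
#     page = requests.post("https://example.com/search", data={"number": "12345"})
#     for link in find_diagram_links(page.text, page.url):
#         download(link)
```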

I just want to share how I resolved the issue, in case it's useful for others.
Since the TIFF image needs to be downloaded from the web, I used the Python requests module together with Pillow, as below:
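from io import BytesIO

import requests
from PIL import Image

tiffURL = 'https://***.tif'
response = requests.get(tiffURL)
response.raise_for_status()
img = Image.open(BytesIO(response.content))  # BytesIO gives Pillow a seekable file
img.convert('RGB').save('imagename.jpg')  # JPEG can't store every TIFF mode
#img.convert('RGB').save('imagename.jpg', quality=95)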
Notes:
TIFF images can't be opened by many ordinary viewers and editors, so I converted it to JPG.
If you want higher quality, you can pass quality=95 to the save method (this sets the JPEG compression quality, not the resolution).
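Diagram TIFFs often contain several pages, and the snippet above only saves the first one. A sketch that writes every page to its own JPEG using Pillow's ImageSequence, converting each page to RGB because JPEG cannot store every TIFF mode (palette or RGBA images, for example). The function name and the output naming scheme are arbitrary choices.

```python
from io import BytesIO
from pathlib import Path

from PIL import Image, ImageSequence


def tiff_pages_to_jpegs(tiff_bytes: bytes, out_prefix: str, quality: int = 95) -> list:
    """Save each page of a (possibly multi-page) TIFF as <out_prefix>_<n>.jpg."""
    paths = []
    with Image.open(BytesIO(tiff_bytes)) as img:
        for index, page in enumerate(ImageSequence.Iterator(img), start=1):
            path = Path(f"{out_prefix}_{index}.jpg")
            page.convert("RGB").save(path, "JPEG", quality=quality)
            paths.append(path)
    return paths
```

If you don't need JPEGs at all, writing response.content straight to a .tif file keeps the original diagram untouched.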

Send PDF file path to client to download after conversion in WeasyPrint

In my Django app, I'm using WeasyPrint to convert an HTML report to PDF. I need to send the converted file back to the client so they can download it. But I don't see anything on the WeasyPrint site about how to get the path of the saved file, or any other way to know where the file was saved.
If I hard-code the path, like D:/Python/Workspace/report.pdf, and try to open it via JavaScript, the browser simply says that the address was not understood.
What is a better way to approach this issue?
My code:
HTML(string=htmlContent).write_pdf(fileName,
stylesheets=[CSS(filename='css/bootstrap.min.css')])
This is all the code related to WeasyPrint that generated PDF file.

You didn't even bother to post the relevant code, but anyway:
If you're using the Python API, you either specify the output file path when calling weasyprint.HTML().write_pdf() or get the PDF back as a bytestring, as documented here. Then you can either save it to a file somewhere you can redirect your user to, or just pass the bytestring to Django's HttpResponse.
If you're using the commandline (which would be quite surprising from a Django app...), you have to specify the output path too...
IOW: I don't really understand your problem. FWIW, the whole documentation is here: http://weasyprint.readthedocs.io/en/latest/ and there's a quite obvious link to it on the project's homepage (which is how I found it).
EDIT: now that you've posted your actual code, the answer is spelled out plainly in the FineManual(tm):
Parameters: target – A filename, file-like object, or None
Returns: The PDF as a byte string if target is not provided or None, otherwise None (the PDF is written to target).
IOW, either you pass a filename for the PDF to be generated and then serve that file to the user, or you just pass your Django HttpResponse as target, cf. this example in Django's docs.

Python 3.4 - Downloading newly uploaded text files from pastebin.com

I want to download text files from pastebin.com.
Once I start the program it should look for text files that are being uploaded and "download" them once they're uploaded.
I know how to "download" them but not how to tell Python to click on one of the public files on http://pastebin.com/archive and then click on the "raw"-button to open a new tab that contains the "raw" content.
I googled a lot but literally nothing came up that would help me.
Thanks

Well, a program doesn't know how to "click" anything :). To retrieve information from a page, you simply send a GET request to the correct URL. In your case, that would be http://pastebin.com/raw/4ffLHviP, or the raw URL of whichever paste you want to download. You can collect paste codes manually, or e.g. by applying text parsers (regex, BeautifulSoup...) to the archive page.
Note that there is an API for scraping Pastebin (see http://pastebin.com/scraping). If you want to extract a substantial amount of content, it is strongly recommended to use it: it is more "polite", may offer better service, and will keep you from being blacklisted.
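A minimal sketch of that approach, assuming paste links on the archive page look like href="/4ffLHviP" (an 8-character alphanumeric key) and that the raw text lives at /raw/<key> as described above. The link pattern is an assumption about Pastebin's markup, which can change, and it may also match other 8-character site links (such as /settings), so filter the results as needed.

```python
import re

import requests

ARCHIVE_URL = "https://pastebin.com/archive"
# Assumption: archive entries link to "/<8 alphanumeric characters>".
PASTE_LINK = re.compile(r'href="/([A-Za-z0-9]{8})"')


def paste_keys(archive_html: str) -> list:
    """Extract unique paste keys from the archive page, keeping page order."""
    return list(dict.fromkeys(PASTE_LINK.findall(archive_html)))


def fetch_raw(key: str) -> str:
    """Download the raw text of one paste with a plain GET request."""
    response = requests.get(f"https://pastebin.com/raw/{key}", timeout=30)
    response.raise_for_status()
    return response.text


# Usage (makes network requests); poll periodically and skip keys already seen:
#     html = requests.get(ARCHIVE_URL, timeout=30).text
#     for key in paste_keys(html):
#         print(fetch_raw(key))
```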

To choose a file, simply do the following:
Visit the link of the file, e.g. http://pastebin.com/B8A6L7Zt
The raw content is already on that page, inside <textarea id='paste_code'>...</textarea>, so you just extract it, using a regex for example.
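Instead of a regex, Python's built-in html.parser can pull the text out of that element and decode HTML entities (such as &lt;) for you. A sketch, assuming the paste is still rendered inside <textarea id="paste_code"> as described above; Pastebin's markup may have changed since. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class TextareaExtractor(HTMLParser):
    """Collect the text inside the <textarea> whose id matches target_id."""

    def __init__(self, target_id: str = "paste_code"):
        super().__init__(convert_charrefs=True)  # decode &lt; &amp; ... automatically
        self.target_id = target_id
        self.inside = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "textarea" and dict(attrs).get("id") == self.target_id:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "textarea":
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.parts.append(data)


def paste_text(page_html: str) -> str:
    """Return the decoded contents of the paste's textarea (empty if absent)."""
    parser = TextareaExtractor()
    parser.feed(page_html)
    parser.close()
    return "".join(parser.parts)
```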

Upload image with an in-memory stream to input using Pillow + WebDriver?

I'm getting an image from a URL with Pillow and creating a stream (BytesIO/StringIO).
r = requests.get("http://i.imgur.com/SH9lKxu.jpg")
stream = Image.open(BytesIO(r.content))
I want to upload this image through an <input type="file" /> with Selenium WebDriver. I can upload a file from disk like this:
self.driver.find_element_by_xpath("//input[@type='file']").send_keys("PATH_TO_IMAGE")
I would like to know if it's possible to upload that image from a stream without having to mess with files / file paths... I'm trying to avoid filesystem reads/writes. And do it in-memory or as much with temporary files. I'm also Wondering If that stream could be encoded to Base64, and then uploaded by passing the string to the send_keys function you can see above :$
PS: Hope you like the image :P

You seem to be asking multiple questions here.
First, how do you convert a JPEG without downloading it to a file? You're already doing that, so I don't know what you're asking here.
Next, "And do it in-memory or as much with temporary files." I don't know what this means, but you can do it with temporary files with the tempfile library in the stdlib, and you can do it in-memory too; both are easy.
Next, you want to know how to do a streaming upload with requests. The easy way to do that, as explained in Streaming Uploads, is to "simply provide a file-like object for your body". This can be a tempfile, but it can just as easily be a BytesIO. Since you're already using one in your question, I assume you know how to do this.
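A sketch of that with requests: the in-memory BytesIO is passed directly as the request body. The upload URL is a placeholder, and the request is only prepared here (not sent), which shows that requests treats the BytesIO as a stream and sets Content-Length from it.

```python
from io import BytesIO

import requests


def prepare_streaming_upload(url: str, payload: bytes) -> requests.PreparedRequest:
    """Prepare a POST whose body is read from an in-memory file-like object."""
    body = BytesIO(payload)  # no temporary file on disk
    request = requests.Request(
        "POST", url, data=body, headers={"Content-Type": "image/jpeg"}
    )
    return request.prepare()


# To actually send it:
#     with requests.Session() as session:
#         session.send(prepare_streaming_upload("https://example.com/upload", jpeg_bytes))
```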
(As a side note, I'm not sure why you're using BytesIO(r.content) when requests already gives you a way to use a response object as a file-like object, and even to do it by streaming on demand instead of by waiting until the full content is available, but that isn't relevant here.)
If you want to upload it with selenium instead of requests… well then you do need a temporary file. The whole point of selenium is that it's scripting a web browser. You can't just type a bunch of bytes at your web browser in an upload form, you have to select a file on your filesystem. So selenium needs to fake you selecting a file on your filesystem. This is a perfect job for tempfile.NamedTemporaryFile.
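A sketch of the NamedTemporaryFile approach. It uses delete=False and removes the file itself afterwards, because on Windows a NamedTemporaryFile that is still open generally cannot be opened again by another process (here, the browser). The context manager name is hypothetical; keep the file on disk until the form has actually been submitted, since the browser may only read it at submit time.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temp_upload_file(data: bytes, suffix: str = ".jpg"):
    """Write data to a named temporary file, yield its path, then delete it."""
    tmp = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
    try:
        with tmp:
            tmp.write(data)  # closed on exit so other processes can open it
        yield tmp.name
    finally:
        os.remove(tmp.name)


# Usage with an existing Selenium driver and the downloaded bytes (r.content):
#     with temp_upload_file(r.content) as path:
#         driver.find_element("xpath", "//input[@type='file']").send_keys(path)
#         driver.find_element("xpath", "//form").submit()
```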
Finally, "I'm also Wondering If that stream could be encoded to Base64".
Sure it can. Since you already have the image in memory, you can just encode it with, e.g., base64.b64encode. Or, if you prefer, you can wrap your BytesIO in a codecs wrapper to base64-encode it on the fly. But I'm not sure why you'd want to do that here.
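For completeness, encoding in-memory bytes takes one call to the standard library. A sketch with hypothetical helper names; note that send_keys on a file input expects a file path, so a Base64 string cannot replace the temporary file there. If you start from a Pillow Image, save it into a BytesIO first (img.save(buf, format="JPEG")).

```python
import base64
from io import BytesIO


def to_base64(stream: BytesIO) -> str:
    """Base64-encode the full contents of an in-memory stream as ASCII text."""
    return base64.b64encode(stream.getvalue()).decode("ascii")


def to_data_uri(stream: BytesIO, mime: str = "image/jpeg") -> str:
    """Build a data: URI from the stream, e.g. for an <img src=...>."""
    return f"data:{mime};base64,{to_base64(stream)}"
```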
