Script to download a zip file from a server - Python

Can you please help me write a Python script that does the following:
1. Download a zip file over HTTP (I already have code for this one).
2. Download a zip file from file://<server location>. I have a problem with this one; the file is located at file://<server location>file.zip.
I can't download the #2 file :(
The code is below. #1 works when using HTTP, but it doesn't work with file:////. Does anybody have an idea how to download a zip file from file:////?
import urllib2
response = urllib2.urlopen('file:////server/file.zip')
print response.info()
html = response.read()
# do something
response.close() # best practice to close the file

urllib2 does not have handlers for the file:// protocol; I think it will open local files if there is no protocol given (//server/file.zip), but I've never used that and haven't tested it. If you have a local file name, you can just use open() and read() rather than urllib2.
Your code will be simpler if you use with closing (from contextlib); opened files are already context managers in Python 2.7 and 3.x, so they're even easier to use.
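As a rough sketch of the open() suggestion above: assuming the server share is reachable as an ordinary filesystem path (for example a UNC path on Windows), the archive can be read with plain open() inside a with block. The helper name read_zip_names and the throwaway demo archive are made up for illustration, and this is written for Python 3.

```python
import os
import tempfile
import zipfile


def read_zip_names(path):
    """Open a zip archive by plain path (local or UNC) and list its members."""
    # open() takes an ordinary filesystem path, so no URL scheme is involved.
    with open(path, "rb") as handle:
        with zipfile.ZipFile(handle) as archive:
            return archive.namelist()


# Self-contained demo with a throwaway archive. On Windows the path could
# instead be a UNC path such as r"\\server\share\file.zip" (hypothetical).
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "file.zip")
with zipfile.ZipFile(demo_path, "w") as archive:
    archive.writestr("hello.txt", "hi")

print(read_zip_names(demo_path))
```

Both with blocks close their files on exit, so no explicit close() call is needed.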

Related

Python requests module not following redirect for file download

Trying to automate downloading a .zip file from the link here (the links will always be different, but they are always in this format):
If this link is entered into a web browser, it downloads a file called Badges.zip. When trying to download it from Python with the code below, it saves to Badges.zip, but the .zip is not an archive. It's some Google Analytics code. It's like the requests module is not redirecting all the way to the file. I've tried get, head, trying to stream the download, and lots of other ways and I can't get it to download the file correctly. Here's the current code I'm using:
import requests
url = "https://schools.clever.com/files/badges.zip?fromEmail=1&randomID=5f9cffb0ee8c81418ac2e019"
r = requests.get(url, allow_redirects=True)
# write the response body to disk; the with block closes the file
with open('c:/data/Badges.zip', 'wb') as f:
    f.write(r.content)
I'm open to any ideas. Have tried other modules and get similar results. I'm even open to kicking off external utilities if needed like wget or curl (which I haven't had any luck with yet either).
Note that the Clever Badges in this download have been voided to prevent use.
Thanks!
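One way to narrow this down (a diagnostic sketch, not a confirmed fix) is to test whether the bytes that came back are a zip archive at all before saving them. The helper name looks_like_zip and both sample payloads are made up for illustration; only the standard library is used.

```python
import io
import zipfile


def looks_like_zip(payload):
    """Return True if the downloaded bytes form a readable zip archive."""
    return zipfile.is_zipfile(io.BytesIO(payload))


# Build a tiny archive in memory to stand in for a good download.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("badge.txt", "example")
good_payload = buffer.getvalue()

# An HTML page (for example a tracking or login page) stands in for a bad one.
bad_payload = b"<html><script>/* analytics */</script></html>"

print(looks_like_zip(good_payload))
print(looks_like_zip(bad_payload))
```

With requests, the same check would be looks_like_zip(r.content). If it fails, r.history and r.headers.get("Content-Type") show which redirects were followed and what the server says it returned.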

Downloading an excel report from website in python saves a blank file

I have about 8 reports that I need to pull from a system every week, which takes quite a bit of time, so I am working on automating this process. I am using requests to log in to the site and download the files. However, when I download the file using my Python script, the file comes back blank. When I use the same link to download from the browser, it's not blank. Below is my code:
import requests
payload = {
    'txtUsername': 'uid',
    'txtPassword': 'pass'
}
domain = 'https://example.com/login.aspx?ReturnUrl=%2fiweb%2f'
path = 'C:\\Users\\workspace\\data-in\\'
with requests.Session() as s:
    # log in first so the session keeps the cookies for the next request
    p = s.post(domain, data=payload)
    r = s.get('https://example.com/forms/MSWordFromSql.aspx?ContentType=excel&object=Organization&FormKey=f326228c-3c49-4531-b80d-d59600485557')
    with open(path + 'report1.xls', 'wb') as f:
        f.write(r.content)
A little about the url. When I was looking for the url I found that it's wrapped in some JS.
Export Raw Data to Excel
However, when I look at the path from which the file was downloaded, the true location of the report is this:
https://example.com/forms/MSWordFromSql.aspx?ContentType=excel&object=Organization&FormKey=f326228c-3c49-4531-b80d-d59600485557
This is the URL I am using in my code to download a report. After I run the script, the file is created, named, and saved to the correct directory, but it's empty. As I mentioned at the top of the thread, if I simply copy the URL above into the browser, it downloads the report with no problem.
I was also thinking about using Selenium to get this done but the issue is I cannot rename the files while they are being downloaded. I need each file to have a specific name because all of the downloaded reports are then used in another automation script.
As @Lucas mentioned, your Python code likely sends a different request than your browser does, and thus receives a different response.
I'd use the browser dev tools to inspect the request the browser makes to initiate the download. Use "Copy as curl" and try to reproduce the correct behavior from the command line.
Then reduce the differences between the curl request and the one your python code makes by removing unnecessary parts from the curl invocations and adding the necessary headers to your python code. https://curl.trillworks.com/ can help with the latter.
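The "reduce the differences" step can be sketched as a small comparison of header sets. Everything here is illustrative: headers_to_add is a made-up helper, and the header values are invented stand-ins for what "Copy as curl" and your script would actually show.

```python
def headers_to_add(browser_headers, script_headers):
    """Return browser headers that the script is missing or sends differently."""
    # Header names are case-insensitive, so compare them in lower case.
    sent = {name.lower(): value for name, value in script_headers.items()}
    return {
        name: value
        for name, value in browser_headers.items()
        if sent.get(name.lower()) != value
    }


# Hypothetical values, as they might appear in a "Copy as curl" command.
browser_headers = {
    "User-Agent": "Mozilla/5.0",
    "Referer": "https://example.com/iweb/",
    "Accept": "*/*",
}
# What the script currently sends (again, made-up values).
script_headers = {
    "user-agent": "python-requests/2.x",
    "accept": "*/*",
}

print(headers_to_add(browser_headers, script_headers))
```

The resulting dict can be passed as headers= to s.get(), or merged into the session with s.headers.update(), and then trimmed one header at a time until the download breaks again.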

How to download pdf files using Python?

I was looking for a way to download pdf files in python, and I saw answers on other questions recommending the urllib module. I tried to download a pdf file using it, but when I try to open the downloaded file, a message shows up saying that the file cannot be opened.
This is the code I used-
import urllib
urllib.urlretrieve("http://papers.gceguide.com/A%20Levels/Mathematics%20(9709)/9709_s11_qp_42.pdf", "9709_s11_qp_42.pdf")
What am I doing wrong? Also, the file automatically saves to the directory my python file is in. How do I change the location to which it gets saved?
Edit-
I tried again with the link to a sample pdf, http://unec.edu.az/application/uploads/2014/12/pdf-sample.pdf
The code is working with this link, so why won't it work for the other one?
Try this. It works.
import requests
url = 'https://pdfs.semanticscholar.org/c029/baf196f33050ceea9ecbf90f054fd5654277.pdf'
r = requests.get(url, stream=True)
# the path passed to open() decides where the file is saved
with open('C:/Users/MICRO HARD/myfile.pdf', 'wb') as f:
    f.write(r.content)
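On the second part of the question (where the file ends up): the file is saved wherever the path you pass points, and a relative name lands in the current working directory. A minimal sketch for building that path from the URL follows. The helper destination_for and the folder name "downloads" are made up for illustration.

```python
import os
import posixpath
from urllib.parse import unquote, urlsplit


def destination_for(url, directory):
    """Build a local path in `directory` named after the last URL segment."""
    remote_path = urlsplit(url).path
    # Decode percent-escapes such as %20 so the local name is readable.
    filename = unquote(posixpath.basename(remote_path))
    return os.path.join(directory, filename)


url = "http://papers.gceguide.com/A%20Levels/Mathematics%20(9709)/9709_s11_qp_42.pdf"
target = destination_for(url, "downloads")  # "downloads" is a made-up folder
print(target)
```

The resulting path can be used as the second argument to urlretrieve or as the first argument to open(). The folder has to exist before writing to it.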
You can also use wget to download pdfs via a link:
import wget
wget.download(link)
Here's a guide about how to search & download all pdf files from a webpage in one go: https://medium.com/the-innovation/notesdownloader-use-web-scraping-to-download-all-pdfs-with-python-511ea9f55e48
You can't download the PDF content from the given URL using requests or urllib, because the URL first points to another web page and only loads the PDF after that.
If you are in doubt, save the response as HTML instead of PDF and look at it.
You need to use a headless browser such as PhantomJS to download files from these kinds of web pages.
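A quick way to tell the two cases apart without opening the file by hand is to look at the first bytes of the response: PDF files start with the marker %PDF-. This is only a sketch; looks_like_pdf is a made-up helper and both payloads are invented samples.

```python
def looks_like_pdf(payload):
    """Return True if the bytes start with the PDF signature."""
    # PDF files begin with the marker "%PDF-" followed by a version number.
    return payload.startswith(b"%PDF-")


# Invented sample payloads: a PDF header and an HTML landing page.
pdf_payload = b"%PDF-1.4\n% rest of the document would follow here"
html_payload = b"<!DOCTYPE html><html><body>Loading...</body></html>"

print(looks_like_pdf(pdf_payload))
print(looks_like_pdf(html_payload))
```

If the check fails on the downloaded bytes, the server returned something other than the PDF itself, which matches the "cannot be opened" symptom.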

How do I simply transfer and download a file from an FTP server with Python?

I've noticed that the FTP library doesn't seem to have a method or function of straight up downloading a file from an FTP server. The only function I've come across for downloading a file is ftp.retrbinary and in order to transfer the file contents, you essentially have to write the contents to a pre-existing file on the local computer where the Python script is located.
Is there a way to download the file as-is without having to create a local file first?
Edit: I think the better question to ask is: do I need to have a pre-existing file in order to download an FTP server file's contents?
To download a file from FTP this code will do the job
import urllib
urllib.urlretrieve('ftp://server/path/to/file', 'file')
# if you need to pass credentials:
# urllib.urlretrieve('ftp://username:password@server/path/to/file', 'file')
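On the "pre-existing file" part of the question: retrbinary only needs a callable that receives each chunk, so no local file has to exist beforehand (and open(name, 'wb') creates the file anyway if it is missing). Below is a sketch that collects the chunks in memory instead. fetch_to_memory is a made-up helper, and FakeFTP is a stand-in so the example runs without a server.

```python
import io


def fetch_to_memory(ftp, remote_name):
    """Download a remote file through an FTP connection into a bytes object."""
    buffer = io.BytesIO()
    # retrbinary calls the callback once per received block of data.
    ftp.retrbinary("RETR " + remote_name, buffer.write)
    return buffer.getvalue()


class FakeFTP:
    """Stand-in for ftplib.FTP so the sketch runs without a server."""

    def retrbinary(self, command, callback):
        # Pretend the server sent the file in two blocks.
        for chunk in (b"hello ", b"world"):
            callback(chunk)


print(fetch_to_memory(FakeFTP(), "file"))
```

With a real server, the ftp argument would be a logged-in ftplib.FTP object. Passing an open file's write method as the callback saves the data to disk instead.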

SWF file loads a new url, how to grab it using Python?

I'll start by saying I'm not very familiar with AS3 coding at all, which I'm pretty sure SWF files are coded with (someone can correct me if I'm wrong).
I have a SWF file which accepts an ID parameter. Within the code, it takes the ID and performs some hash routines on it, eventually produces a new 'token', and then loads a new URL using this token.
I found this by taking the SWF file to showmycode and decompiling it.
My code is in Python and the SWF file is online; I could download it and save it locally.
Is it possible to somehow execute the SWF in Python, or to use urllib to grab this new URL?
It doesn't seem to act the same as a redirect URL, because when I do:
request = urllib2.Request(url)
response = urllib2.urlopen(request)
print response.geturl()
it just returns the URL that I am requesting, so I'm not sure how, or even if, I can grab what is being spit out.
Edit - This is the MD5 that is being used - https://code.google.com/p/as3corelib/source/browse/trunk/src/com/adobe/crypto/MD5.as?r=51
Trying to find a Python equivalent
Execute the swf in Python? As far as I understand, you want to have the same token transformation functionality developed in Python, right?
If so, you just need to read the code and translate it into your own app. You cannot run a SWF from Python, nor will you get any response from it (or "spit out", as you call it). Flash is an executable file run by a plugin (a virtual machine). You won't be able to grab anything from it, nor will you be able to execute it on your own.
Looks like I was making things too complicated.
I was able to just use Python's hashlib.md5 to produce the same results as the AS3 code:
import hashlib
m = hashlib.md5()
m.update('test')
m.hexdigest()
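For anyone on Python 3, where hashlib works on bytes rather than str, the same idea looks roughly like this. UTF-8 is an assumption about how the AS3 side encodes the ID, and md5_hex is a made-up helper name.

```python
import hashlib


def md5_hex(text):
    """Return the hexadecimal MD5 digest of a text string."""
    # Python 3 hashes bytes, so the string is encoded first (UTF-8 assumed).
    return hashlib.md5(text.encode("utf-8")).hexdigest()


print(md5_hex("test"))  # 098f6bcd4621d373cade4e832627b4f6
```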
