SWF file loads a new url, how to grab it using Python? - python

I'll start by saying I'm not very familiar with AS3 coding at all, which I'm pretty sure SWF files are coded with (someone can correct me if I'm wrong).
I have a SWF file which accepts an ID parameter. Within the code it takes the ID, performs some hash routines on it, eventually produces a new 'token', and then loads a new URL using this token.
I found this by taking the SWF file to showmycode and decompiling it.
My code is in Python and the SWF file is online; I could download it and save it locally.
Is it possible to somehow execute the SWF in Python, or to use urllib to grab this new URL?
It doesn't seem to act the same as a redirect URL, as when I do:
request = urllib2.Request(url)
response = urllib2.urlopen(request)
print response.geturl()
it just returns the URL that I am requesting, so I'm not sure how, or even if, I can grab what is being spit out.
Edit - This is the MD5 that is being used - https://code.google.com/p/as3corelib/source/browse/trunk/src/com/adobe/crypto/MD5.as?r=51
Trying to find a Python equivalent

Execute the SWF in Python? As far as I understand, you want the same token transformation implemented in Python, right?
If so, you just need to read the code and translate it into your own app. You cannot run an SWF from Python, nor will you get any response from it (or "spit out", as you call it). A Flash file is an executable run by a plugin (a virtual machine). You won't be able to grab anything from it, nor will you be able to execute it on your own.

Looks like I was making things too complicated
I was able to just use python hashlib.md5 to produce the same results as the AS3 code
m = hashlib.md5()
m.update('test')
m.hexdigest()
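For anyone on Python 3, the same idea needs one extra step, because hashlib only accepts bytes. This is a minimal sketch; the helper name `md5_hex` is made up for illustration, and the UTF-8 encoding is an assumption about what the AS3 code hashes.

```python
import hashlib


def md5_hex(text):
    """Return the MD5 hex digest of a string (hypothetical helper).

    Assumes the AS3 side hashes the UTF-8 bytes of the same string.
    """
    return hashlib.md5(text.encode("utf-8")).hexdigest()


print(md5_hex("test"))  # 098f6bcd4621d373cade4e832627b4f6
```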

Related

Downloading an excel report from website in python saves a blank file

I have about 8 reports that I need to pull from a system every week, which takes quite a bit of time, so I am working on automating this process. I am using requests to log in to the site and download the files. However, when I download the file using my Python script the file comes back blank. When I use the same link to download from the browser it's not blank. Below is my code:
import requests

payload = {
    'txtUsername': 'uid',
    'txtPassword': 'pass'
}
domain = 'https://example.com/login.aspx?ReturnUrl=%2fiweb%2f'
path = 'C:\\Users\\workspace\\data-in\\'
with requests.Session() as s:
    # log in first so the session carries the authentication cookies
    p = s.post(domain, data=payload)
    r = s.get('https://example.com/forms/MSWordFromSql.aspx?ContentType=excel&object=Organization&FormKey=f326228c-3c49-4531-b80d-d59600485557')
    with open(path + 'report1.xls', 'wb') as f:
        f.write(r.content)
A little about the url. When I was looking for the url I found that it's wrapped in some JS.
Export Raw Data to Excel
However, when I take a look at the path from which the files was downloaded the true location for the report is this:
https://example.com/forms/MSWordFromSql.aspx?ContentType=excel&object=Organization&FormKey=f326228c-3c49-4531-b80d-d59600485557
This is the URL I am using in my code to download a report. After I run the script the file is created, named and saved to the correct directory, but it's empty. As I mentioned at the top of the thread, if I simply copy the URL above into the browser it downloads the report with no problem.
I was also thinking about using Selenium to get this done but the issue is I cannot rename the files while they are being downloaded. I need each file to have a specific name because all of the downloaded reports are then used in another automation script.
As @Lucas mentioned, your Python code likely sends a different request than your browser does, and thus receives a different response.
I'd use the browser dev tools to inspect the request the browser makes to initiate the download. Use "Copy as curl" and try to reproduce the correct behavior from the command line.
Then reduce the differences between the curl request and the one your python code makes by removing unnecessary parts from the curl invocations and adding the necessary headers to your python code. https://curl.trillworks.com/ can help with the latter.
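Adding the necessary headers could look roughly like this. It is a sketch using only the standard library so it runs anywhere; the function name and every header value are placeholders, to be replaced with what the browser's dev tools actually show.

```python
import urllib.request


def build_browser_like_request(url, referer, cookie_header):
    """Build a request carrying headers copied from the browser (hypothetical helper).

    All values below are placeholders; copy the real ones from the
    "Copy as curl" output.
    """
    headers = {
        "User-Agent": "Mozilla/5.0 (placeholder, copy from the browser)",
        "Accept": "*/*",
        "Referer": referer,
        "Cookie": cookie_header,
    }
    return urllib.request.Request(url, headers=headers)


req = build_browser_like_request(
    "https://example.com/forms/MSWordFromSql.aspx?ContentType=excel",
    referer="https://example.com/iweb/",
    cookie_header="name=value",
)
print(req.header_items())
```

With requests, the same dictionary can be passed as `s.get(url, headers=headers)` on the logged-in session, in which case the Cookie entry is usually unnecessary because the session already holds the cookies.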

Python - Capture auto-downloading file from aspx web page

I'm trying to export a CSV from this page via a python script. The complicated part is that the page opens after clicking the export button on this page, begins the download, and closes again, rather than just hosting the file somewhere static. I've tried using the Requests library, among other things, but the file it returns is empty.
Here's what I've done:
from requests import get

url = 'http://aws.state.ak.us/ApocReports/CampaignDisclosure/CDExpenditures.aspx?exportAll=True&amp%3bexportFormat=CSV&amp%3bisExport=True%22+id%3d%22M_C_sCDTransactions_csfFilter_ExportDialog_hlAllCSV?exportAll=True&exportFormat=CSV&isExport=True'
with open('CD_Transactions_02-27-2017.CSV', "wb") as file:
    # get request
    response = get(url)
    # write to file
    file.write(response.content)
I'm sure I'm missing something obvious, but I'm pulling my hair out.
It looks like the file is being generated on demand, and the url stays only valid as long as the session lasts.
There are multiple requests from the browser to the webserver (including POST requests).
So to get those files via code, you would have to simulate the browser, possibly including session state etc. (and in this case also __VIEWSTATE).
To see the whole communication, you can use the developer tools in the browser (usually F12, then select the Network tab to see the traffic), or use something like Wireshark.
In other words, this won't be an easy task.
If this is open government data, it might be better to just ask that government for the data or ask for possible direct links to the (unfiltered) files (sometimes there is a public ftp server for example) - or sometimes there is an API available.
The file is created on demand but you can download it anyway. Essentially you have to:
1. Establish a session to save cookies and viewstate
2. Submit a form in order to click the export button
3. Grab the link which lies behind the popped-up csv-button
4. Follow that link and download the file
You can find working code here (if you don't mind that it's written in R): Save response from web-scraping as csv file
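In Python, the viewstate part of those steps usually means reading the hidden form fields from the page and posting them back. The sketch below covers only that extraction step with the standard library; the class and function names are made up for illustration, and the sample HTML is invented rather than taken from the real page.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def hidden_form_fields(html_text):
    """Return a dict of the hidden form fields found in an HTML page."""
    parser = HiddenFieldParser()
    parser.feed(html_text)
    return parser.fields


# Invented sample page, standing in for the real ASP.NET form
sample = (
    '<form><input type="hidden" name="__VIEWSTATE" value="abc123" />'
    '<input type="text" name="query" value="ignored"></form>'
)
print(hidden_form_fields(sample))  # {'__VIEWSTATE': 'abc123'}
```

The resulting dictionary would then be merged with whatever field the export button submits (page-specific, visible in the dev tools) and sent with `session.post(...)` in the same requests session that loaded the page.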

script that able to download zip file from server

Can you please help me write a Python script that does the following:
1. Download a zip file over HTTP (I already have code for this one)
2. Download a zip file from file://<server location>. I have a problem with this one; the file is located at file://<server location>file.zip
I can't download the #2 file :(
Code below. #1 works when using HTTP, but it doesn't work with file:////. Does anybody have an idea how to download a zip file from file:////?
import urllib2
response = urllib2.urlopen('file:////server/file.zip')
print response.info()
html = response.read()
# do something
response.close() # best practice to close the file
As far as I know, urllib2 only handles file:// for local files, so I wouldn't rely on it to reach a file on another server; I think it will open local files if there is no protocol given (//server/file.zip), but I've never used that, and haven't tested it. If you have a local file name, you can just use open() and read() rather than urllib2.
Your code will be simpler if you use with closing (from contextlib); opened files are already context managers in Python 2.7 and 3.x, so they're even easier to use.
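A minimal sketch of the open() approach, assuming the server location is reachable as an ordinary path (for example a mounted share). The function name and the paths in the usage comment are hypothetical.

```python
import shutil


def copy_file(src_path, dst_path, chunk_size=1024 * 1024):
    """Copy a file in chunks; both files are closed when the with block ends."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)
    return dst_path


# Hypothetical usage with a Windows network path:
# copy_file(r"\\server\share\file.zip", "file.zip")
```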

Python webrowser open url with bookmarks like www.something.com/file.html#top

I am using an HTML file as a help document for my program, and would really like to be able to open the file at a specific point. I assumed I would be able to do this using the built-in webbrowser module by specifying a URL with a bookmark.
This is my HTML file name: help.html
I assumed that I would be able to use: help.html#top
This is the code I am using to open the file, which works fine:
webbrowser.open("Files\help.html")
And this is the code I have been trying to use to open it at a specific point, which IE9 apparently can't display (I'm not sure why it tries to load in IE9, as Chrome is my default browser and the working call above loads in Chrome):
webbrowser.open("Files\help.html#2.1.0")
any ideas guys?
webbrowser.open() calls the browser from the command line. So you might try to do that yourself first. If that doesn't work, it's likely that your browser just doesn't support that for local files or something.
With Ubuntu+Firefox for example, webbrowser.open() does what you ask. (but - as Dave Webb said in his answer - you do have to provide a file: url, not just a filename).
(not on Windows at the moment, so haven't checked there)
As for why it doesn't load Chrome but IE9: (you can look in the webbrowser.py code yourself if you want) I think it does try to use your default web browser, by doing os.startfile(url). What happens when you double-click your help.html file, or when you just type help.html (adjust path as needed) at the command line? It should do the same.
EDIT:
It seems that it doesn't always use the command line. On Windows, when trying to use the default browser, it uses os.startfile() which in turn uses the win32 ShellExecute api. ShellExecute can be used to perform certain actions on a file, folder or URL, like "open", "edit" or "print" with its default application. In this case, ShellExecute is asked to "open" the URL.
It seems, however, that ShellExecute ignores the fragment identifier (the part after #) when opening file: URLs. Strangely enough, this is not the case with http: URLs. Presumably, a file: URL is just converted to a plain filename first.
There seems to be little you can do about this except:
write something that "does the right thing" yourself (and register it as a browser controller for the webbrowser module, and use webbrowser.get() to get your controller, see docs)
as many applications do: configure the browser you want to use (or make it possible for your users to do so). The easiest way would be to set the BROWSER environment variable (see webbrowser module docs)
serve the file via a localhost HTTP server, and open the http URL, which would then be something like "http://localhost:8000/help.html#2.1.0". (The SimpleHTTPServer module might come in handy)
Or, the easiest way: as you seem to be on windows: just try to open internet explorer specifically:
import webbrowser

try:
    browser = webbrowser.get('c:\\Program Files\\Internet Explorer\\IEXPLORE.EXE')
except webbrowser.Error:
    # Internet Explorer not found: fall back to the default browser
    browser = webbrowser.get()
browser.open(url)
(This will fall back to using the default, so your code would still work on other platforms)
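The localhost option from the list above can be sketched with Python 3's http.server, the successor of SimpleHTTPServer. This assumes Python 3.7 or later; the function names are made up for illustration, and port 0 lets the operating system pick a free port instead of the 8000 used in the example URL.

```python
import functools
import http.server
import threading


def start_help_server(directory, port=0):
    """Serve `directory` on localhost in a background thread and return the server."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def help_url(server, page, fragment):
    """Build the http URL, including the fragment, for a served page."""
    host, port = server.server_address[:2]
    return "http://%s:%d/%s#%s" % (host, port, page, fragment)


# Hypothetical usage:
# import webbrowser
# server = start_help_server("Files")
# webbrowser.open(help_url(server, "help.html", "2.1.0"))
```

The server thread is a daemon, so it stops when the program exits; call `server.shutdown()` to stop it earlier.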
I think webbrowser is expecting a URL, so have you tried something like:
webbrowser.open("file:///c:/path/to/files/html.html#2.1.0")
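Rather than typing the file: URL by hand, it can be built from the path. A small sketch using pathlib; the function name is hypothetical, and whether the browser honours the fragment still depends on how it gets launched.

```python
from pathlib import Path


def file_url_with_fragment(path, fragment):
    """Turn a (possibly relative) file path into a file: URL ending in #fragment."""
    return Path(path).resolve().as_uri() + "#" + fragment


# Hypothetical usage:
# import webbrowser
# webbrowser.open(file_url_with_fragment("Files/help.html", "2.1.0"))
```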

What's a Django/Python solution for providing a one-time url for people to download files?

I'm looking for a way to sell someone a card at an event that will have a unique code that they will be able to use later in order to download a file (mp3, pdf, etc.) only one time and mask the true file location so a savvy person downloading the file won't be able to download the file more than once. It would be nice to host the file on Amazon S3 to save on bandwidth where our server is co-located.
My thought for the codes would be to pre-generate the unique codes that will get printed on the cards and store those in a database that could also have a field that stores the number of times the file was downloaded. This way we could set how many attempts we would allow the user for downloading the file.
The part that I need direction on is how do I hide/mask the original file location so people can't steal that url and then download the file as many times as they want. I've done Google searches and I'm either not searching using the right keywords or there aren't very many libraries or snippets out there already for this type of thing.
I'm guessing that I might be able to rig something up using django.views.static.serve that acts as a sort of proxy between the actual file and the user downloading the file. The only drawback to this method I would think is that I would need to use the actual web server and wouldn't be able to store the file on Amazon S3.
Any suggestions or thoughts are greatly appreciated.
Neat idea. However, I would warn against the single-download method, because there is no guarantee that their first download attempt will be successful. Perhaps use a time-expiration method instead?
But it is certainly possible to do this with Django. Here is an outline of the basic approach:
Set up a django url for serving these files
Use a GET parameter which is a unique string to identify which file to get.
Keep a database table which has a FileField for the file to download. This table maps the unique strings to the location of the file on the file system.
To serve the file as a download, set the response headers in the view like this:
(path is the location of the file to serve)
from django.http import HttpResponse

# inside the view function
with open(path, 'rb') as f:
    response = HttpResponse(f.read())
response['Content-Type'] = 'application/octet-stream'
response['Content-Disposition'] = 'attachment; filename="%s"' % 'insert_filename_here'
return response
Since we are using this Django page to serve the file, the user cannot find out the original file location.
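The lookup behind such a view, together with the download counter from the question, can be sketched without Django. Here sqlite3 stands in for the Django model so the example runs on its own; the table, column and function names are all made up for illustration.

```python
import secrets
import sqlite3


def create_store():
    """Create an in-memory table mapping codes to files and download counts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE download_codes ("
        "code TEXT PRIMARY KEY, file_path TEXT NOT NULL, "
        "downloads INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def add_code(conn, file_path):
    """Pre-generate a random code (to print on a card) for one file."""
    code = secrets.token_urlsafe(8)
    with conn:
        conn.execute(
            "INSERT INTO download_codes (code, file_path) VALUES (?, ?)",
            (code, file_path),
        )
    return code


def redeem(conn, code, max_downloads=1):
    """Count one download and return the file path, or None if not allowed."""
    with conn:
        cur = conn.execute(
            "UPDATE download_codes SET downloads = downloads + 1 "
            "WHERE code = ? AND downloads < ?",
            (code, max_downloads),
        )
        if cur.rowcount == 0:
            return None  # unknown code, or the limit has been reached
        row = conn.execute(
            "SELECT file_path FROM download_codes WHERE code = ?", (code,)
        ).fetchone()
    return row[0]
```

Doing the limit check inside the UPDATE keeps the check and the increment in a single statement, so two simultaneous requests cannot both pass the check. Raising `max_downloads` gives the user a few attempts in case a download fails.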
You can just use something simple such as mod_xsendfile. This functionality is also available in other popular web servers such as lighttpd or nginx.
It works like this: when enabled, your application (e.g. a trivial PHP script) can send a special response header, causing the web server to serve a static file.
If you want it to work with S3 you will need to handle each and every request this way, meaning the traffic will go through your site, from there to AWS, back to your site and back to the client. Does S3 support symbolic links / aliases? If so you might just redirect a valid user to one of the symbolic URLs and delete that symlink after a couple of hours.
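The time-limited link idea that comes up in these answers can also be done without symlinks, by signing the code together with an expiry time. This is only a sketch of that general technique, not something either answer specifies; the secret key and function names are placeholders.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"replace-with-a-real-secret"  # placeholder key


def _sign(code, expires_at):
    message = ("%s:%d" % (code, expires_at)).encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def make_token(code, lifetime_seconds, now=None):
    """Return 'code:expiry:signature', valid for lifetime_seconds."""
    now = int(time.time()) if now is None else int(now)
    expires_at = now + lifetime_seconds
    return "%s:%d:%s" % (code, expires_at, _sign(code, expires_at))


def check_token(token, now=None):
    """Return the code if the token is authentic and not expired, else None."""
    now = int(time.time()) if now is None else int(now)
    try:
        code, expires_text, signature = token.rsplit(":", 2)
        expires_at = int(expires_text)
    except ValueError:
        return None  # malformed token
    expected = _sign(code, expires_at)
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        return None  # tampered token
    if now > expires_at:
        return None  # expired
    return code
```

The token can go into the download URL as a GET parameter; the view would call `check_token` before serving or redirecting.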
