Get Dropbox file content using a Dropbox URL - Python

I have a Dropbox file picker in my project. Once the user selects a file, I receive a Dropbox URL; now I want the content of that file, using that URL, in Python.
Here is the link I received from the Dropbox picker: https://www.dropbox.com/s/ocissavtfvvdh2g/images.png?dl=0
I have checked this link https://sodocumentation.net/dropbox-api/topic/408/downloading-a-file, but it asks for the path of the file, and I don't have the path, only the URL.

You just have to use 1 in the dl query string, like this:
https://www.dropbox.com/s/ocissavtfvvdh2g/images.png?dl=1
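For instance, with the requests library (a minimal sketch):
import requests

# dl=1 makes the link serve the file bytes instead of a preview page
url = 'https://www.dropbox.com/s/ocissavtfvvdh2g/images.png?dl=1'
response = requests.get(url)
response.raise_for_status()
with open('images.png', 'wb') as f:
    f.write(response.content)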

I would suggest you use the Dropbox API for your project. I made a similar project, but using the Google Drive API.
Check below:
https://www.dropbox.com/developers/documentation/python#install
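For example, with the official SDK the shared link itself can be enough; here is a sketch (the access token is a placeholder you would obtain from your Dropbox app):
import dropbox

dbx = dropbox.Dropbox('YOUR_ACCESS_TOKEN')  # placeholder token
shared_url = 'https://www.dropbox.com/s/ocissavtfvvdh2g/images.png?dl=0'
# fetch the file behind a shared link without knowing its path
metadata, res = dbx.sharing_get_shared_link_file(url=shared_url)
with open(metadata.name, 'wb') as f:
    f.write(res.content)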

It sounds like you're using the Dropbox Chooser. If you just want direct access to the data of the selected file(s), you should use the "direct" link type.
Based on your sample link, I see that you're currently using the "preview" link type, which doesn't link directly to the file data, but rather a preview page.
Also, the sample code you linked to is for accessing files via the Dropbox API with an access token, not using a link returned by the Chooser, so it isn't relevant.
If you switch to the "direct" link type, you can use the returned link to directly access the file data just with an HTTP GET request, for four hours.
Alternatively, if you need access for longer than that, you can keep using the "preview" link type but modify the link as documented here.

Related

SharePoint API: how to read a file having only a sharing link to it

I'm using the python office365 library to access SharePoint documents. I don't know how to access a file via the API that has been shared with me by a sharing link. I need to get this file's content and, if possible, its metadata (last modified date). Could anyone help?
The user that I'm using has no access to this SharePoint folder other than a sharing link to a single file.
I tried many variations of the normal file access API, both by hand and via the office365 library. I couldn't find a way to access a file when I have only a sharing link to it.
My sharing link looks like this:
https://[redacted].sharepoint.com/:x:/s/[redacted]/dir1/dir2/ESd0HkNNSbJMhQFavQsr9-4BNHC2rHSWsnbs3zRdjtZsC3g so there is not really a filename here, and I cannot read the content of any folder via the API because I get the error Attempted to perform an unauthorized operation. Authentication goes fine (when I mistype the password, I get a different error).
According to my research and testing, you can use the following REST API to read a file (get its content):
https://xxxx.sharepoint.com/sites/xxx/_api/web/GetFolderByServerRelativeUrl('/sites/xxx/Library_Name/Folder Name')/Files('Document.docx')/$value
If you want the last modified date, you can use the following REST API to get the Modified field:
https://xxxx.sharepoint.com/sites/xxx/_api/web/lists/getbytitle('test_library')/Items?$select=Modified
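As a hedged illustration with requests (assuming you already hold a valid bearer token for the tenant; the site, library, and file names are the placeholders from the URLs above):
import requests

access_token = '...'  # obtained from your authentication flow
site = 'https://xxxx.sharepoint.com/sites/xxx'
endpoint = (site + "/_api/web/GetFolderByServerRelativeUrl"
            "('/sites/xxx/Library_Name/Folder Name')/Files('Document.docx')/$value")

resp = requests.get(endpoint, headers={'Authorization': 'Bearer ' + access_token})
resp.raise_for_status()
with open('Document.docx', 'wb') as f:
    f.write(resp.content)  # raw file bytes from the $value endpoint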

Python - Capture auto-downloading file from aspx web page

I'm trying to export a CSV from this page via a Python script. The complicated part is that the download page opens after clicking the export button on this page, begins the download, and closes again, rather than just hosting the file somewhere static. I've tried using the Requests library, among other things, but the file it returns is empty.
Here's what I've done:
from requests import get

url = 'http://aws.state.ak.us/ApocReports/CampaignDisclosure/CDExpenditures.aspx?exportAll=True&amp%3bexportFormat=CSV&amp%3bisExport=True%22+id%3d%22M_C_sCDTransactions_csfFilter_ExportDialog_hlAllCSV?exportAll=True&exportFormat=CSV&isExport=True'
with open('CD_Transactions_02-27-2017.CSV', "wb") as file:
    # send the GET request
    response = get(url)
    # write the response body to the file
    file.write(response.content)
I'm sure I'm missing something obvious, but I'm pulling my hair out.
It looks like the file is being generated on demand, and the URL stays valid only as long as the session lasts.
There are multiple requests from the browser to the webserver (including POST requests).
So to get those files via code, you would have to simulate the browser, possibly including session state etc. (and in this case also __VIEWSTATE).
To see the whole communication, you can use developer tools in the browser (usually F12, then select NET to see the traffic), or use something like WireShark.
In other words, this won't be an easy task.
If this is open government data, it might be better to just ask that government for the data or ask for possible direct links to the (unfiltered) files (sometimes there is a public ftp server for example) - or sometimes there is an API available.
The file is created on demand, but you can download it anyway. Essentially you have to:
Establish a session to save cookies and viewstate
Submit a form in order to click the export button
Grab the link which lies behind the popped-up csv-button
Follow that link and download the file
You can find working code here (if you don't mind that it's written in R): Save response from web-scraping as csv file
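A rough Python translation of those steps might look like this (a sketch, assuming BeautifulSoup for parsing; the ASP.NET control name is lifted from the URL in the question and may not be the exact form field, so verify it in the browser's dev tools):
import requests
from bs4 import BeautifulSoup

url = 'http://aws.state.ak.us/ApocReports/CampaignDisclosure/CDExpenditures.aspx'

with requests.Session() as s:  # the Session keeps cookies between requests
    page = s.get(url)
    soup = BeautifulSoup(page.text, 'html.parser')
    # echo back the hidden ASP.NET state fields (__VIEWSTATE etc.)
    form = {tag['name']: tag.get('value', '')
            for tag in soup.select('input[type=hidden]')
            if tag.get('name')}
    # assumed id of the export control
    form['__EVENTTARGET'] = 'M$C$sCDTransactions$csfFilter$ExportDialog$hlAllCSV'
    result = s.post(url, data=form)
    # the CSV link would then be scraped from result.text and
    # fetched with the same session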

Python Facebook upload video from external link

I'm trying to upload a video to Facebook from an external URL, but I get an error when I post it. I tried with local videos, and all works fine.
My simple code is:
answer = graph.post(
    path="597739293577402/videos",
    source='https://d3ldtt2c6t0t08.cloudfront.net/files/rhn4phpt3rh4u/2015/06/17/Z7EO2GVADLFBG6WVMKSD5IBOFI/main_OUTPUT.tmp.mp4',
)
and my error is always the same:
FacebookError: [6000] There was a problem uploading your video file. Please try again with another file.
I looked into the docs and found the parameter file_url, but it's still the same issue.
The format of the video is .mp4, so it should work.
Any idea?
Apparently this error message is very confusing. It's the same message you get when your access_token doesn't work. For example, I get this error message when I try with my user access token, but not if I use the Page access token.
I've never used source; I'm pretty sure that's for reading video data off their API. Instead, I use file_url in my payload when passing video file URLs to the Facebook Graph API.
Refer to their API doc for clarity on that...
It's also possible that the tmp.mp4 file extension is causing you problems. I've had issues with valid video URLs with non-typical file extensions similar to that. Is it possible to alter that at the source so that the URL doesn't have the tmp?
A typical payload pass using Requests module to their API that works for me might look something like this:
import json
import requests

# access, videoName, videoDescription and videoUrl are defined earlier in the script
fburl = 'https://graph-video.facebook.com/v2.3/156588/videos?access_token=' + str(access)
payload = {'name': videoName, 'description': videoDescription, 'file_url': videoUrl}
flag = requests.post(fburl, data=payload).text
print(flag)
fb_res = json.loads(flag)
I would also highly recommend that you obtain a permanent page access token. It's the best way to mitigate the complexities of Facebook's oAuth process.
facebook: permanent Page Access Token?

Python - Direct linking blocking via iFrames, can I still get the binaries?

I have a scraper script that pulls binary content off publishers' websites. It's built to replace the manual action of saving hundreds of individual PDF files that colleagues would otherwise have to undertake.
The websites are credential based, and we have the correct credentials and permissions to collect this content.
I have encountered a website that has the pdf file inside an iFrame.
I can extract the content URL from the HTML. When I feed the URL to the content grabber, I collect a small piece of HTML that says: <html><body>Forbidden: Direct file requests are not allowed.</body></html>
I can feed the URL directly to the browser, and the PDF file resolves correctly.
I am assuming that there is a session cookie (or something, I'm not 100% comfortable with the terminology) that gets sent with the request to show that the GET request comes from a live session, not a remote link.
I looked at the referring URL and saw these different URLs, all pointing to the same article, which I collected over a day of testing (I have scrubbed identifiers from the URLs):
http://content_provider.com/NDM3NTYyNi45MTcxODM%3D/elibrary//title/issue/article.pdf
http://content_provider.com/NDM3NjYyMS4wNjU3MzY%3D/elibrary//title/issue/article.pdf
http://content_provider.com/NDM3Njc3Mi4wOTY3MDM%3D/elibrary//title/issue/article.pdf
http://content_provider.com/NDM3Njg3Ni4yOTc0NDg%3D/elibrary//title/issue/article.pdf
This suggests that there is something in the URL that is unique, and needs associating to something else to circumvent the direct link detector.
Any suggestions on how to get round this problem?
OK. The answer was cookies and headers. I collected the GET header info via HttpFox and made an identical header object in my script, and I grabbed the session ID from request.cookie and sent the cookie with each request.
For good measure I also set the user agent to a known working browser agent, just in case the server was checking agent details.
Works fine.
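For reference, a hedged sketch of that approach with requests (the header values and URLs below are placeholders; the real ones would be copied from the browser's developer tools):
import requests

headers = {
    # a known working browser agent, in case the server checks agent details
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
    # assumed referring page inside the publisher's site
    'Referer': 'http://content_provider.com/elibrary/',
}

with requests.Session() as s:  # the Session re-sends cookies automatically
    s.headers.update(headers)
    s.get('http://content_provider.com/elibrary/')  # pick up the session cookie
    pdf = s.get('http://content_provider.com/NDM3NTYyNi45MTcxODM%3D'
                '/elibrary//title/issue/article.pdf')
    with open('article.pdf', 'wb') as f:
        f.write(pdf.content)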

What's a Django/Python solution for providing a one-time url for people to download files?

I'm looking for a way to sell someone a card at an event that will have a unique code they can use later to download a file (mp3, pdf, etc.) only one time, while masking the true file location so a savvy person won't be able to download the file more than once. It would be nice to host the file on Amazon S3 to save on bandwidth where our server is co-located.
My thought for the codes would be to pre-generate the unique codes that will get printed on the cards and store those in a database that could also have a field that stores the number of times the file was downloaded. This way we could set how many attempts we would allow the user for downloading the file.
The part that I need direction on is how do I hide/mask the original file location so people can't steal that url and then download the file as many times as they want. I've done Google searches and I'm either not searching using the right keywords or there aren't very many libraries or snippets out there already for this type of thing.
I'm guessing that I might be able to rig something up using django.views.static.serve that acts as a sort of proxy between the actual file and the user downloading the file. The only drawback to this method I would think is that I would need to use the actual web server and wouldn't be able to store the file on Amazon S3.
Any suggestions or thoughts are greatly appreciated.
Neat idea. However, I would warn against the single-download method, because there is no guarantee that their first download attempt will be successful. Perhaps use a time-expiration method instead?
But it is certainly possible to do this with Django. Here is an outline of the basic approach:
Set up a django url for serving these files
Use a GET parameter which is a unique string to identify which file to get.
Keep a database table which has a FileField for the file to download. This table maps the unique strings to the location of the file on the file system.
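For instance, that table might look something like this (a sketch; the model and field names are assumptions):
from django.db import models

class DownloadCode(models.Model):
    # the unique string printed on the card
    code = models.CharField(max_length=32, unique=True)
    # the file behind the download, mapped by the code
    file = models.FileField(upload_to='downloads/')
    # how many times the file has been fetched so far
    download_count = models.PositiveIntegerField(default=0)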
To serve the file as a download, set the response headers in the view like this:
(path is the location of the file to serve)
# requires: from django.http import HttpResponse
with open(path, 'rb') as f:
    response = HttpResponse(f.read())
response['Content-Type'] = 'application/octet-stream'
response['Content-Disposition'] = 'attachment; filename="%s"' % 'insert_filename_here'
return response
Since we are using this Django page to serve the file, the user cannot find out the original file location.
You can just use something simple such as mod_xsendfile. This functionality is also available in other popular webservers such as lighttpd or nginx.
It works like this: when enabled, your application (e.g. a trivial PHP script) can send a special response header, causing the webserver to serve a static file.
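In Django terms, a minimal sketch might look like this (assuming mod_xsendfile is enabled in Apache; nginx uses the X-Accel-Redirect header with an internal location instead):
from django.http import HttpResponse

def xsendfile_download(request, path):
    response = HttpResponse()
    # the webserver intercepts this header and streams the file itself,
    # so the true location never reaches the client
    response['X-Sendfile'] = path
    response['Content-Disposition'] = 'attachment; filename="file.mp3"'
    return response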
If you want it to work with S3, you will need to handle each and every request this way, meaning the traffic will go through your site: from there to AWS, back to your site, and back to the client. Does S3 support symbolic links / aliases? If so, you might just redirect a valid user to one of the symbolic URLs and delete that symlink after a couple of hours.
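(S3's built-in answer to that symlink idea is a pre-signed URL: a time-limited link generated with boto3, which is a different mechanism than the one described above. Bucket and key names here are placeholders.)
import boto3

s3 = boto3.client('s3')  # assumes AWS credentials are configured
url = s3.generate_presigned_url(
    'get_object',
    Params={'Bucket': 'my-downloads-bucket', 'Key': 'albums/track01.mp3'},
    ExpiresIn=3600,  # the link stops working after one hour
)
# redirect the validated user to `url`; no cleanup step is needed afterwards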
