Retrieve audio/mp3 file from URL and save to blobstore

Retrieve audio/mp3 file from URL and save to blobstore - python

I am trying to save a file (audio/mp3 in this case) to the App Engine blobstore, but with mixed success. Everything seems to work, a file is saved in the blobstore, of the right type, but it essentially empty (1.5kB vs. the expected 6.5kB) and so won't play. The URL in question is http://translate.google.com/translate_tts?ie=UTF-8&tl=en&q=revenues+in+new+york+were+56+million
The app engine logs do not show anything unusual - all parts are executing as expected... Any pointers would be appreciated!
class Dictation(webapp2.RequestHandler):
def post(self):
sentence = self.request.get('words')
# Google Translate API cannot handle strings > 100 characters
sentence = sentence[:100]
# Replace the non-alphanumeric characters
# The spaces in the sentence are replaced with the Plus symbol
sentence = urllib.urlencode({'q': sentence})
# Name of the MP3 file generated using the MD5 hash
mp3_file = hashlib.md5(sentence).hexdigest()
# Save the MP3 file in this folder with the .mp3 extension
mp3_file = mp3_file + ".mp3"
# Create the full URL
url = 'http://translate.google.com/translate_tts?ie=UTF-8&tl=en&' + sentence
# upload to blobstore
mp3_file = files.blobstore.create(mime_type = 'audio/mp3', _blobinfo_uploaded_filename = mp3_file)
mp3 = urllib.urlopen(url).read()
with files.open(mp3_file, 'a') as f:
f.write(mp3)
files.finalize(mp3_file)
blob_key = files.blobstore.get_blob_key(mp3_file)
logging.info('blob_key identified as %s', blob_key)

The problem has nothing to do with your code; it is correctly retrieving the data from the URL you gave.
For example, if I try this at the command line:
$ curl -O http://translate.google.com/translate_tts?ie=UTF-8&tl=en&q=revenues+in+new+york+were+56+million
I get a 1.5kB 403 error page, whose contents say:
403. That's an error.
Your client does not have permission to get URL /translate_tts?ie=UTF-8&tl=en&q=revenues+in+new+york+were+56+million from this server. (Client IP address: 1.2.3.4)
That’s all we know.
And your code does the exact same thing, whether run in GAE or directly in the interactive interpreter.
Most likely, the reason it works in your browser is that you do have permissions. So, what does that mean? It could mean that you have a valid SID cookie from google.com in your browser, but not your script. Or it could mean that your browser's user agent is recognized as something that can play HTML5 audio, but your script's isn't. Or…
Well, you can try to reverse-engineer what's different in the cookies, headers, etc. between your browser and your script, and narrow it down to the relevant difference, and use explicit headers or cookies or whatever you need to work around the problem.
But it will just break the next time Google changes anything.
And Google will probably not be happy with you if you try this. They offer a Google Translate API service that they want you to use, and they got rid of all of the free options for that API because of "substantial economic burden caused by extensive abuse." Trying to publish a Google App Engine web service that evades Google's API pricing by scraping their pages is probably not the kind of thing they enjoy their customers doing.

Related

Gaining authorization to modify Spotify playlists using spotipy for Python3

I'm currently attempting to use spotipy, a python3 module, to access and edit my personal Spotify premium account. I've followed the tutorial on https://github.com/plamere/spotipy/blob/master/docs/index.rst using the util.prompt_for_user_token method by entering the necessary parameters directly (username, client ID, secret ID, scope and redirect uri). Everything seems to be fine up to this part. My code (fillers for username, client id and client secret for security reasons) :
code
It opens up my default web browser and redirects me to my redirect url with the code in it. At this point, I copy and paste the redirect url (as prompted) and hit enter. It returns the following error:
Error
My redirect uri is 'http://google.com/' for this specific example. However, I've tried multiple redirect uris but they all seem to produce the same error for me. (and yes, I did set my redirect uri as whitespace for my application). I've spent hours trying to fix this issue by looking at online tutorials, trying different redirect urls, changing my code but have yet to make any progress. I'm hoping I am just overlooking a simple mistake here! Any feedback on how to fix this is much appreciated!
If it matters: I'm using the IDE PyCharm.

I had to use two different solutions to deal with the redirect_uri issue depending on which IDE I was using. For Jupyter Lab/Notebook, I could use a localhost for the redirect_url
sp = spotipy.Spotify(auth_manager=SpotifyOAuth(client_id="your_client_id", client_secret="your_client_secret", redirect_uri="https://localhost:8890/callback/", scope="user-library-read"))
For Google Colab, I had to use a publicly accessible website. I think "https://google.com/" should work but I used my band's website so I'd remember that the redirect_uri had to match the one in your Spotify Develop dashboard settings.
sp = spotipy.Spotify(auth_manager=SpotifyOAuth(client_id="your_client_id", client_secret="your_client_secret", redirect_uri="https://yourwebsite.com/", scope="user-library-read"))
I just ended up using my bands website because it was easier for me to remember. Make sure to go to the Spotify developer dashboard (https://developer.spotify.com/dashboard/applications) and match the redirect_uri with what you are planning to use at that time.

I think it is your redirect URL - working for me with:
import os
import spotipy.util as util
# credentials
user = 'username'
desired_scope = 'playlist-modify-private'
id = os.environ.get('SPOT_CLIENT')
secret = os.environ.get('SPOT_SECRET')
uri = 'https://localhost'
token = util.prompt_for_user_token(username=user,
scope=desired_scope,
client_id=id,
client_secret=secret,
redirect_uri=uri)
I think for your redirect url spotify requires the initial http(s) part - don't forget to add it to the white-list in your Spotify for Developers app too, as otherwise you will get 'invalid-redirect-uri'.

Python - Capture auto-downloading file from aspx web page

I'm trying to export a CSV from this page via a python script. The complicated part is that the page opens after clicking the export button on this page, begins the download, and closes again, rather than just hosting the file somewhere static. I've tried using the Requests library, among other things, but the file it returns is empty.
Here's what I've done:
url = 'http://aws.state.ak.us/ApocReports/CampaignDisclosure/CDExpenditures.aspx?exportAll=True&amp%3bexportFormat=CSV&amp%3bisExport=True%22+id%3d%22M_C_sCDTransactions_csfFilter_ExportDialog_hlAllCSV?exportAll=True&exportFormat=CSV&isExport=True'
with open('CD_Transactions_02-27-2017.CSV', "wb") as file:
# get request
response = get(url)
# write to file
file.write(response.content)
I'm sure I'm missing something obvious, but I'm pulling my hair out.

It looks like the file is being generated on demand, and the url stays only valid as long as the session lasts.
There are multiple requests from the browser to the webserver (including POST requests).
So to get those files via code, you would have to simulate the browser, possibly including session state etc (and in this case also __VIEWSTATE ).
To see the whole communication, you can use developer tools in the browser (usually F12, then select NET to see the traffic), or use something like WireShark.
In other words, this won't be an easy task.
If this is open government data, it might be better to just ask that government for the data or ask for possible direct links to the (unfiltered) files (sometimes there is a public ftp server for example) - or sometimes there is an API available.

The file is created on demand but you can download it anyway. Essentially you have to:
Establish a session to save cookies and viewstate
Submit a form in order to click the export button
Grab the link which lies behind the popped-up csv-button
Follow that link and download the file
You can find working code here (if you don't mind that it's written in R): Save response from web-scraping as csv file

Simple way to use the google contacts API from a python script

I'm currently trying to write a script and I would need to access my GMail contacts to finish it. I tried googling that a bit and it looks like there are a few libraries to do that, but some answers seems very old, and some others aren't that clear.
What is the correct way currently to access the contacts API from a script ? Using https://github.com/google/gdata-python-client I assume ?
Any recent response seems to involve the user copying a link into his browser by hand, getting redirected to some non-existent URL and being asked to copy that URL back in the script to parse the code in it. That seems perfectly ridiculous, I do hope there is a proper way to access the API for non-web applications ?
I'll be the only user of that script so I don't mind having to put my full e-mail address and password in if that makes it easier, as long as I don't have to keep re-authenticating by hand. Just want to authorize it once and have it work forever. I'm just trying to find the contact's name from the phone number so I don't even need to access everything.

Here is what I'm using in the end :
if arg == "--init":
gflow = OAuth2WebServerFlow(client_id='<ID>', client_secret='<secret>',
scope='https://www.googleapis.com/auth/contacts.readonly',
redirect_uri='http://localhost');
gflow.params['access_type'] = 'offline';
auth_uri = gflow.step1_get_authorize_url();
weechat.prnt("", "Please go to " + auth_uri + " and come back here with the code. Run /gauth code");
else:
credentials = gflow.step2_exchange(arg);
storage.put(credentials);
http = httplib2.Http();
http = credentials.authorize(http);
people = build('people', 'v1', http=http);
Works fine with that. I do need to copy / paste the URL by hand and then get the code out of the redirection, but since I put the access_type to offline I have to do that only once.

Python Facebook upload video from external link

I'm trying to upload a video to facebook from an external url. But I got error when I post it. I tried with local videos, and all works fine.
My simple code is :
answer = graph.post(
path="597739293577402/videos",
source='https://d3ldtt2c6t0t08.cloudfront.net/files/rhn4phpt3rh4u/2015/06/17/Z7EO2GVADLFBG6WVMKSD5IBOFI/main_OUTPUT.tmp.mp4',
)
and my error is allways the same :
FacebookError: [6000] There was a problem uploading your video file. Please try again with another file.
I looked into the docs and found the parameter file_url but it still the same issue.
The format of the video is .mp4 so it should work.
Any idea ?
Apparently this error message is very confusing. It's the same message when you've an access_token who doesn't work. For example, I've this error message when I'm trying with my user access token and not if I use the Page access token.

I've never used source, I'm pretty sure that's for reading video data off their API. Instead, I use file_url in my payload when passing video file URLs to Facebook Graph API.
Refer to their API doc for clarity on that...
It's also possible that the tmp.mp4 file extension is causing you problems. I've had issues with valid video URLs with non-typical file extensions similar to that. Is it possible to alter that at the source so that the URL doesn't have the tmp ?
A typical payload pass using Requests module to their API that works for me might look something like this:
fburl = 'https://graph-video.facebook.com/v2.3/156588/videos?access_token='+str(access)
payload = {'name': '%s' %(videoName), 'description': '%s' %(videoDescription), 'file_url': '%s' %(videoUrl)}
flag = requests.post(fburl, data=payload).text
print flag
fb_res = json.loads(flag)
I would also highly recommend that you obtain a permanent page access token. It's the best way to mitigate the complexities of Facebook's oAuth process.
facebook: permanent Page Access Token?

What's a Django/Python solution for providing a one-time url for people to download files?

I'm looking for a way to sell someone a card at an event that will have a unique code that they will be able to use later in order to download a file (mp3, pdf, etc.) only one time and mask the true file location so a savvy person downloading the file won't be able to download the file more than once. It would be nice to host the file on Amazon S3 to save on bandwidth where our server is co-located.
My thought for the codes would be to pre-generate the unique codes that will get printed on the cards and store those in a database that could also have a field that stores the number of times the file was downloaded. This way we could set how many attempts we would allow the user for downloading the file.
The part that I need direction on is how do I hide/mask the original file location so people can't steal that url and then download the file as many times as they want. I've done Google searches and I'm either not searching using the right keywords or there aren't very many libraries or snippets out there already for this type of thing.
I'm guessing that I might be able to rig something up using django.views.static.serve that acts as a sort of proxy between the actual file and the user downloading the file. The only drawback to this method I would think is that I would need to use the actual web server and wouldn't be able to store the file on Amazon S3.
Any suggestions or thoughts are greatly appreciated.

Neat idea. However, I would warn against the single-download method, because there is no guarantee that their first download attempt will be successful. Perhaps use a time-expiration method instead?
But it is certainly possible to do this with Django. Here is an outline of the basic approach:
Set up a django url for serving these files
Use a GET parameter which is a unique string to identify which file to get.
Keep a database table which has a FileField for the file to download. This table maps the unique strings to the location of the file on the file system.
To serve the file as a download, set the response headers in the view like this:
(path is the location of the file to serve)
with open(path, 'rb') as f:
response = HttpResponse(f.read())
response['Content-Type'] = 'application/octet-stream';
response['Content-Disposition'] = 'attachment; filename="%s"' % 'insert_filename_here'
return response
Since we are using this Django page to serve the file, the user cannot find out the original file location.

You can just use something simple such as mod_xsendfile. This functionality is also available in other popular webservers such lighttpd or nginx.
It works like this: when enabled your application (e.g. a trivial PHP script) can send a special response header, causing the webserver to serve a static file.
If you want it to work with S3 you will need to handle each and every request this way, meaning the traffic will go through your site, from there to AWS, back to your site and back to the client. Does S3 support symbolic links / aliases? If so you might just redirect a valid user to one of the symbolic URLs and delete that symlink after a couple of hours.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.