How to save incoming file in bottle api to hdfs - python

I am defining a Bottle API that needs to accept a file from the client and then save that file to HDFS on the local system.
The code looks something like this:
@route('/upload', method='POST')
def do_upload():
    import pdb; pdb.set_trace()
    upload = request.files.upload
    name, ext = os.path.splitext(upload.filename)
    save_path = "/data/{user}/{filename}".format(user=USER, filename=name)
    hadoopy.writetb(save_path, upload.file.read())
    return "File successfully saved to '{0}'.".format(save_path)
The issue is that request.files.upload.file is an object of type cStringIO.StringO, which can be converted to a str with its .read() method. But hadoopy.writetb(path, content) expects the content in some other format, and the server hangs at that point. It raises no exception and returns no error or result; it just sits there as if it were in an infinite loop.
Does anyone know how to write an incoming file in a Bottle API to HDFS?

From the hadoopy documentation, it looks like the second parameter to writetb is supposed to be an iterable of pairs; but you're passing in bytes.
...the hadoopy.writetb command which takes an iterator of key/value pairs...
Have you tried passing in an iterable of pairs? Instead of what you're doing,
hadoopy.writetb(save_path, upload.file.read()) # 2nd param is wrong
try this (a list containing a single key/value pair):
hadoopy.writetb(save_path, [(path, upload.file.read())])
(I'm not familiar with Hadoop so it's not clear to me what the semantics of path are, but presumably it'll make sense to someone who knows HDFS.)
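A minimal sketch of the shape writetb appears to expect, assuming (per the documentation quoted above) that it takes an iterable of key/value pairs. The helper name as_kv_pairs and the choice of the filename as the key are illustrative assumptions, not part of hadoopy. The actual hadoopy.writetb call is left as a comment because it needs a working Hadoop installation.

```python
import io


def as_kv_pairs(filename, fileobj):
    """Wrap one uploaded file as a list holding a single (key, value) pair.

    Using the filename as the key is an assumption made for illustration.
    """
    return [(filename, fileobj.read())]


# Stand-in for Bottle's upload.file, which is also a file-like object.
fake_upload = io.BytesIO(b"hello hdfs")
pairs = as_kv_pairs("report.txt", fake_upload)

# With hadoopy installed and HDFS reachable, the call would then be:
# hadoopy.writetb(save_path, pairs)
```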

Related

zapier - python - pass bytes to output to be used for next action

I am trying to build an automation on Zapier with a flow like this:
1. Trigger: a webhook that receives a POST request. The body has a file key whose value is the base64 string of a certain PDF, so its type is str.
2. Action: a Zapier Python Code step that retrieves the file from the webhook and decodes the base64 string to bytes, to get the real content of the PDF into, say, a variable named file_bytes.
3. Action: a Dropbox step that retrieves file_bytes from the previous step and uploads it to Dropbox.
I coded the decoder myself (point 2) and tested that it works well on my local system.
The problem is that Dropbox (point 3) only accepts binary, while Python (point 2) cannot pass a value that is not JSON serializable. This is a clear limitation of Zapier:
output A dictionary or list of dictionaries that will be the "return value" of this code. You can explicitly return early if you like. This must be JSON serializable!
...
The closest I could find among other questions on this site are these two, but they did not help:
Why am I getting a Runtime.MarshalError when using this code in Zapier?
Use Python to get image in Zapier
...
The code to decode the base64 string to bytes is as follows:
file_bytes = base64.b64decode(input_data['file'])
What I have already tried:
1. Passing file_bytes to output like so:
output = [{'file': file_bytes}]
but it gave me "This must be JSON serializable!"
2. Passing file_bytes as a string like so:
output = [{'file': str(file_bytes)}]
It does upload to Dropbox, but the file content is corrupt (of course it is).
3. Passing file_bytes as a string decoded with latin-1:
output = [{'file': file_bytes.decode('latin-1')}]
It does upload to Dropbox and the PDF can be opened, even with the same number of pages as the original, but every page is blank (white, no content).
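The three attempts above can be reproduced outside Zapier. The sketch below assumes Python 3 and uses a few made-up bytes (sample) standing in for the PDF. It illustrates why each attempt misbehaves, and shows that the base64 text itself is JSON serializable and round-trips without loss. Whether the Dropbox step can decode base64 on its side is not something this snippet establishes.

```python
import base64
import json

# Made-up bytes standing in for real PDF content (includes non-ASCII bytes).
sample = b"%PDF-1.4\n\xe2\xe3\xcf\xd3\n"
encoded = base64.b64encode(sample).decode("ascii")  # what the webhook receives

file_bytes = base64.b64decode(encoded)

# Attempt 1: raw bytes are rejected by the JSON encoder.
try:
    json.dumps([{"file": file_bytes}])
    bytes_are_serializable = True
except TypeError:
    bytes_are_serializable = False

# Attempt 2: str() yields the repr text "b'...'", not the original content.
as_repr = str(file_bytes)

# Attempt 3: latin-1 text is serializable, but once that text is written out
# as UTF-8, every byte >= 0x80 becomes two bytes, so binary data is altered.
as_latin1 = file_bytes.decode("latin-1")
rewritten = as_latin1.encode("utf-8")

# The base64 string is plain ASCII, serializes fine, and decodes losslessly.
payload = json.dumps([{"file": encoded}])
restored = base64.b64decode(json.loads(payload)[0]["file"])
```

The third case would be consistent with a PDF whose ASCII structure survives (so it opens with the right page count) while its binary streams are damaged.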
...
So, is this kind of feature really feasible on the Zapier platform? Or have I been at a dead end from the very beginning?

Issue with Python Server Returning File On GET

I created a simple threaded python server, and I have two parameters for format, one is JSON (return string data) and the other is zip. When a user selects the format=zip as one of the input parameters, I need the server to return a zip file back to the user. How should I return a file to a user on a do_GET() for my server? Do I just return the URL where the file can be downloaded or can I send the file back to the user directly? If option two is possible, how do I do this?
Thank you
You should send the file back to the user directly, and add a Content-Type header with the correct media type, such as application/zip.
So the header could look like this:
Content-Type: application/zip
The issue was that I hadn't closed the zipfile object before I tried to return it. It appeared there was a lock on the file.
To return a zip file from a simple Python HTTP server using GET, you need to do the following:
Set the Content-type header to 'application/zip' (the header name itself takes no trailing colon)
self.send_header("Content-type", "application/zip")
Create the zip file using the zipfile module, and close it before reading it back
Using the file path (ex: c:/temp/zipfile.zip), open the file in 'rb' mode to read the binary data
openObj = open(<path>, 'rb')
Write the contents back to the browser, and only then close the file
self.wfile.write(openObj.read())
openObj.close()
del openObj
That's about it. Thank you all for your help.
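The steps above can be sketched end to end as follows. This is an illustration under stated assumptions, not the poster's actual server: it uses Python 3's http.server, builds the archive in memory rather than on disk, and always returns the zip regardless of the format parameter. The names ZipHandler, build_zip_bytes, fetch_once and result.zip are hypothetical.

```python
import http.client
import io
import threading
import zipfile
from http.server import BaseHTTPRequestHandler, HTTPServer


def build_zip_bytes():
    """Create a small zip archive in memory; the with block closes it."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("hello.txt", "hello from the server")
    return buf.getvalue()


class ZipHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = build_zip_bytes()
        self.send_response(200)
        self.send_header("Content-Type", "application/zip")
        self.send_header("Content-Disposition",
                         'attachment; filename="result.zip"')
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the example quiet


def fetch_once():
    """Start the server on a free port, download the zip once, shut down."""
    server = HTTPServer(("127.0.0.1", 0), ZipHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        conn = http.client.HTTPConnection(
            "127.0.0.1", server.server_address[1], timeout=5)
        conn.request("GET", "/?format=zip")
        resp = conn.getresponse()
        result = (resp.getheader("Content-Type"), resp.read())
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()
```

Calling fetch_once() returns the Content-Type header and the raw archive bytes, which can be opened with zipfile.ZipFile(io.BytesIO(data)).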

ftp sending python bytesio stream

I want to send a file with Python's ftplib from one FTP site to another, so as to avoid intermediate file reads and writes.
I create a BytesIO stream:
myfile=BytesIO()
And I successfully retrieve an image file from FTP site one with retrbinary:
ftp_one.retrbinary('RETR P1090080.JPG', myfile.write)
I can save this memory object to a regular file:
fot=open('casab.jpg', 'wb')
fot.write(myfile.getvalue())
fot.close()
But I am not able to send this stream via FTP with storbinary. I thought this would work:
ftp_two.storbinary('STOR magnafoto.jpg', myfile.getvalue())
But it doesn't. I get a long error message ending with:
buf = fp.read(blocksize)
AttributeError: 'str' object has no attribute 'read'
I also tried many absurd combinations, but with no success. As an aside, I am also quite puzzled with what I am really doing with myfoto.write. Shouldn't it be myfoto.write() ?
I am also quite clueless as to what this buffer thing does or requires. Is what I want too complicated to achieve? Should I just ping-pong the files with an intermediate write/read on my system? Thanks, all.
EDIT: thanks to abanert I got things straight. For the record, storbinary arguments were wrong and a myfile.seek(0) was needed to 'rewind' the stream before sending it. This is a working snippet that moves a file between two ftp addresses without intermediate physical file writes:
import ftplib as ftp
from io import BytesIO
ftp_one=ftp.FTP(address1, user1, pass1)
ftp_two=ftp.FTP(address2, user2, pass2)
myfile=BytesIO()
ftp_one.retrbinary ('RETR imageoldname.jpg', myfile.write)
myfile.seek(0)
ftp_two.storbinary('STOR imagenewname.jpg', myfile)
ftp_one.close()
ftp_two.close()
myfile.close()
The problem is that you're calling getvalue(). Just don't do that:
ftp_two.storbinary('STOR magnafoto.jpg', myfile)
storbinary requires a file-like object that it can call read on.
Fortunately, you have just such an object, myfile, a BytesIO. (It's not entirely clear from your code what the sequence of things is here—if this doesn't work as-is, you may need to myfile.seek(0) or create it in a different mode or something. But a BytesIO will work with storbinary unless you do something wrong.)
But instead of passing myfile, you pass myfile.getvalue(). And getvalue "Returns bytes containing the entire contents of the buffer."
So, instead of giving storbinary a file-like object that it can call read on, you're giving it a bytes object, which is of course the same as str in Python 2.x, and you can't call read on that.
For your aside:
As an aside, I am also quite puzzled with what I am really doing with myfoto.write. Shouldnt it be myfoto.write() ?
Look at the docs. The second parameter isn't a file, it's a callback function.
The callback function is called for each block of data received, with a single string argument giving the data block.
What you want is a function that appends each block of data to the end of myfoto. While you could write your own function to do that:
def callback(block_of_data):
    myfoto.write(block_of_data)
… it should be pretty obvious that this function does exactly the same thing as the myfoto.write method. So, you can just pass that method itself.
If you don't understand about bound methods, see Method Objects in the tutorial.
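A small sketch of the bound-method point, using a hypothetical stand-in (fake_retrbinary) in place of a real FTP connection so that it runs anywhere. It also shows why a seek(0) is needed before the buffer is read back.

```python
from io import BytesIO


def fake_retrbinary(blocks, callback):
    """Hypothetical stand-in for FTP.retrbinary: call callback once per block."""
    for block in blocks:
        callback(block)


buf = BytesIO()

# buf.write (no parentheses) is a bound method: a callable that remembers buf.
fake_retrbinary([b"abc", b"def"], buf.write)

position_after_writes = buf.tell()  # the stream position is now at the end
buf.seek(0)                         # rewind before handing it to a reader
contents = buf.read()
```

Without the seek(0), a reader such as storbinary would start at the end of the buffer and find nothing to send.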
This flexibility, as weird as it seems, lets you do something even better than downloading the whole file into a buffer to send to another server. You can actually open the two connections at the same time, and use callbacks to send each buffer from the source server to the destination server as it's received, without ever storing anything more than one buffer.
But, unless you really need that, you probably don't want to go through all that complexity.
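For reference, the block-by-block relay described above might look roughly like this. It is a sketch only, assuming two logged-in ftplib.FTP instances over plain (non-TLS) FTP, and it has not been exercised against real servers here. The function name relay is hypothetical; voidcmd, transfercmd, retrbinary and voidresp are existing ftplib.FTP methods.

```python
def relay(ftp_src, ftp_dst, src_name, dst_name, blocksize=8192):
    """Copy src_name from ftp_src to dst_name on ftp_dst, one block at a time.

    ftp_src and ftp_dst are assumed to be logged-in ftplib.FTP instances.
    """
    ftp_dst.voidcmd("TYPE I")                       # binary mode for the upload
    sock = ftp_dst.transfercmd("STOR " + dst_name)  # data socket to destination
    try:
        # Each block received from the source is sent straight on.
        ftp_src.retrbinary("RETR " + src_name, sock.sendall, blocksize)
    finally:
        sock.close()
    return ftp_dst.voidresp()                       # final reply for the STOR
```

Here sock.sendall plays the same role that myfile.write plays in the buffered version: it is simply the callback that receives each block.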
In fact, in general, ftplib is kind of low-level. And it has some designs (like the fact that storbinary takes a file, while retrbinary takes a callback) that make total sense at that low level but seem very odd from a higher level. So, you may want to look at some of the higher-level libraries by doing a search at PyPI.

Taking String arguments for a function without quotes

I've got a function meant to download a file from a URL and write it to a disk, along with imposing a particular file extension. At present, it looks something like this:
import requests
import os

def getpml(url,filename):
    psc = requests.get(url)
    outfile = os.path.join(os.getcwd(),filename+'.pml')
    f = open(outfile,'w')
    f.write(psc.content)
    f.close()
    try:
        with open(outfile) as f:
            print "File Successfully Written"
    except IOError as e:
        print "I/O Error, File Not Written"
    return
When I try something like
getpml('http://www.mysite.com/data.txt','download') I get the appropriate file sitting in the current working directory, download.pml. But when I feed the function the same arguments without the ' symbol, Python says something to the effect of "NameError: name 'download' is not defined" (the URL produces a syntax error). This even occurs if, within the function itself, I use str(filename) or things like that.
I'd prefer not to have to input the arguments of the function in with quote characters - it just makes entering URLs and the like slightly more difficult. Any ideas? I presume there is a simple way to do this, but my Python skills are spotty.
No, that cannot be done. When you are typing Python source code you have to type quotes around strings. Otherwise Python can't tell where the string begins and ends.
It seems like you have a more general misunderstanding too. Calling getpml(http://www.mysite.com) without quotes isn't calling it with "the same argument without quotes". There simply isn't any argument there at all. It's not like there are "arguments with quotes" and "arguments without quotes". Python isn't like speaking a natural human language where you can make any sound and it's up to the listener to figure out what you mean. Python code can only be made up of certain building blocks (object names, strings, operators, etc.), and URLs aren't one of those.
You can call your function differently:
data = """\
http://www.mysite.com/data1.txt download1
http://www.mysite.com/data2.txt download2
http://www.mysite.com/data3.txt download3
"""
for line in data.splitlines():
    url, filename = line.strip().split()
    getpml(url, filename)
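Another way to avoid typing quotes, offered here as an assumption about the use case rather than something the answers propose, is to take the values from the command line, where every argument already arrives as a string. The script name getpml.py and the parse_args helper are hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse a URL and an output name; argparse delivers both as str."""
    parser = argparse.ArgumentParser(description="Download a URL to <name>.pml")
    parser.add_argument("url")
    parser.add_argument("filename")
    return parser.parse_args(argv)


# Equivalent to running:  python getpml.py http://www.mysite.com/data.txt download
args = parse_args(["http://www.mysite.com/data.txt", "download"])
outfile_name = args.filename + ".pml"
```

The parsed values could then be passed on as getpml(args.url, args.filename).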

How to send a file via HTTP, the good way, using Python?

If a would-be-HTTP-server written in Python2.6 has local access to a file, what would be the most correct way for that server to return the file to a client, on request?
Let's say this is the current situation:
header('Content-Type', file.mimetype)
header('Content-Length', file.size) # file size in bytes
header('Content-MD5', file.hash) # an md5 hash of the entire file
return open(file.path).read()
All the files are .zip or .rar archives no bigger than a couple of megabytes.
With the current situation, browsers handle the incoming download weirdly. No browser knows the file's name, for example, so they use a random or default one. (Firefox even saved the file with a .part extension, even though it was complete and completely usable.)
What would be the best way to fix this and other errors I may not even be aware of, yet?
What headers am I not sending?
Thanks!
This is how I send a ZIP file:
req.send_response(200)
req.send_header('Content-Type', 'application/zip')
req.send_header('Content-Disposition', 'attachment;'
                'filename=%s' % filename)
Most browsers handle it correctly.
If you don't have to return the response body (that is, if you are given a stream for the response body by your framework) you can avoid holding the file in memory with something like this:
fp = open(path_to_the_file, 'rb')  # open() is the portable spelling of file()
while True:
    chunk = fp.read(8192)  # read 8 KB at a time
    if chunk:
        response.write(chunk)
    else:
        fp.close()
        return
What web framework are you using?
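The two ideas in these answers (name the file via Content-Disposition, and stream it in chunks) can be sketched independently of any framework. The helper names content_disposition and copy_in_chunks are hypothetical, and the in-memory streams are stand-ins for the file on disk and the framework's response object. Non-ASCII filenames are not handled here.

```python
import io


def content_disposition(filename):
    """Build an attachment header value with the filename in double quotes."""
    safe = filename.replace("\\", "\\\\").replace('"', '\\"')
    return 'attachment; filename="%s"' % safe


def copy_in_chunks(src, write, chunk_size=8192):
    """Feed src to write() chunk by chunk; return the number of bytes sent."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            return total
        write(chunk)
        total += len(chunk)


# Stand-ins for the file on disk and the framework's response stream.
source = io.BytesIO(b"x" * 20000)
response = io.BytesIO()
sent = copy_in_chunks(source, response.write)
header_value = content_disposition("archive.zip")
```

The returned byte count can double as a sanity check against the Content-Length header sent earlier.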
