Python base64.decode does not seem to work on Windows

I am consuming a webservice (written in Java) that basically returns a byte[] array (the SOAP equivalent is base64-encoded binary data).
I am using the Python suds library, and the following code works for me on my Mac (and on Cygwin under Windows), but the decoding does not work on vanilla Windows (Python 2.6.5). I am primarily a Java developer, so any help will be really appreciated.
from suds.client import Client
import base64,os,shutil,tarfile,StringIO
u = "user"
p = "password"
url = "https://xxxx/?wsdl"
client = Client(url, username=u, password=p)
bin = client.service.getTargz("test")
f = open("tools.tar.gz", "w")
f.write(base64.b64decode(bin.encode('ASCII')))
f.close()
print "finished writing"
tarfile.open("tools.tar.gz").extractall()
Works great on a Mac, but on Windows it gives me this error:
C:\client>python client.py
xml
Getting the sysprep file from the webservice
finished writing
Traceback (most recent call last):
File "client.py", line 28, in
tarfile.open("tools.tar.gz").extractall()
File "C:\Python26\lib\tarfile.py", line 1653, in open
return func(name, "r", fileobj, **kwargs)
File "C:\Python26\lib\tarfile.py", line 1720, in gzopen
**kwargs)
File "C:\Python26\lib\tarfile.py", line 1698, in taropen
return cls(name, mode, fileobj, **kwargs)
File "C:\Python26\lib\tarfile.py", line 1571, in __init__
self.firstmember = self.next()
File "C:\Python26\lib\tarfile.py", line 2317, in next
tarinfo = self.tarinfo.fromtarfile(self)
File "C:\Python26\lib\tarfile.py", line 1235, in fromtarfile
buf = tarfile.fileobj.read(BLOCKSIZE)
File "C:\Python26\lib\gzip.py", line 219, in read
self._read(readsize)
File "C:\Python26\lib\gzip.py", line 271, in _read
uncompress = self.decompress.decompress(buf)
zlib.error: Error -3 while decompressing: invalid distance too far back

Try
f = open("tools.tar.gz", "wb")
It's crucial to tell Python that the file is binary: the default is text mode, which on Windows translates each \n you write into \r\n on disk, corrupting the gzip stream. On Unix-like systems text mode performs no translation under Python 2, which is why your code happens to work on Mac OS X (in Python 3 the "b" flag matters everywhere, since you cannot write bytes to a text-mode file at all).
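For reference, a minimal sketch of the corrected write step (same suds call and decode as above; only the file mode changes):
bin = client.service.getTargz("test")
# "wb" writes the bytes unchanged, with no \n -> \r\n translation on Windows
with open("tools.tar.gz", "wb") as f:
    f.write(base64.b64decode(bin.encode('ASCII')))
tarfile.open("tools.tar.gz").extractall()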

Related

Airflow with GCP Python: ValueError: Stream must be at beginning

I am using Python along with Airflow and the GCP Python library. I automated the process of sending files to GCP using Airflow DAGs. The code is as follows:
for fileid, filename in files_dictionary.items():
    if ftp.size(filename) <= int(MAX_FILE_SIZE):
        data = BytesIO()
        ftp.retrbinary('RETR ' + filename, callback=data.write)
        f = client.File(client, fid=fileid)
        size = sys.getsizeof(data.read())
        # Another option is to use FileIO but not sure how
        f.send(data, filename, size)  # This method is in another library
The code that triggers the upload lives in the current repo (as shown above), but the real upload is done by another dependency that is not under our control. The documentation of that method is:
def send(self, fp, filename, file_bytes):
    """Send file to cloud
    fp file object
    filename is the name of the file.
    file_bytes is the size of the file in bytes
    """
    data = self.initiate_resumable_upload(self.getFileid())
    _, blob = self.get_gcs_blob_and_bucket(data)
    # Set attachment filename. Does this work with datasets with folders
    original_filename = filename.rsplit(os.sep, 1)[-1]
    blob.content_disposition = "attachment;filename=" + original_filename
    blob.upload_from_file(fp)
    self.finish_resumable_upload(self.getFileid())
I am getting the error below:
[2020-04-23 09:43:17,239] {{models.py:1788}} ERROR - Stream must be at beginning.
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/airflow/models.py", line 1657, in _run_raw_task
result = task_copy.execute(context=context)
File "/usr/local/lib/python3.6/site-packages/airflow/operators/python_operator.py", line 103, in execute
return_value = self.execute_callable()
File "/usr/local/lib/python3.6/site-packages/airflow/operators/python_operator.py", line 108, in execute_callable
return self.python_callable(*self.op_args, **self.op_kwargs)
File "/usr/local/airflow/dags/transfer_data.py", line 241, in upload
f.send(data, filename, size)
File "/usr/local/lib/python3.6/site-packages/client/utils.py", line 53, in wrapper_timer
value = func(*args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/client/client.py", line 518, in send
blob.upload_from_file(fp)
File "/usr/local/lib/python3.6/site-packages/google/cloud/storage/blob.py", line 1158, in upload_from_file
client, file_obj, content_type, size, num_retries, predefined_acl
File "/usr/local/lib/python3.6/site-packages/google/cloud/storage/blob.py", line 1068, in _do_upload
client, stream, content_type, size, num_retries, predefined_acl
File "/usr/local/lib/python3.6/site-packages/google/cloud/storage/blob.py", line 1011, in _do_resumable_upload
predefined_acl=predefined_acl,
File "/usr/local/lib/python3.6/site-packages/google/cloud/storage/blob.py", line 960, in _initiate_resumable_upload
stream_final=False,
File "/usr/local/lib/python3.6/site-packages/google/resumable_media/requests/upload.py", line 343, in initiate
stream_final=stream_final,
File "/usr/local/lib/python3.6/site-packages/google/resumable_media/_upload.py", line 415, in _prepare_initiate_request
raise ValueError(u"Stream must be at beginning.")
ValueError: Stream must be at beginning.
The upload_from_file function has a parameter that handles the seek(0) call for you:
I would modify your upload_from_file call to:
blob.upload_from_file(file_obj=fp, rewind=True)
That should do the trick, and you don't need to include an additional seek() call.
When working with a binary stream, you can navigate through it with seek operations; in other words, you can move the current position from the beginning of the stream to any other offset. The error ValueError: Stream must be at beginning. is basically saying: "your position is not at the beginning of the stream, and it must be". In your code, the writes performed by ftp.retrbinary (and the subsequent data.read()) leave the position at the end of the buffer.
Given that, you need to move the position back to the beginning of the stream, which you can do with the seek method.
In your case, you would do something like:
data = BytesIO()
ftp.retrbinary('RETR ' + filename, callback=data.write)
f = client.File(client, fid=fileid)
size = sys.getsizeof(data.read())
data.seek(0)
f.send(data, filename, size)

Twisted throws "Can only pass-through bytes on Python 2" for no reason

I am using Twisted as my web server. I am delivering normal text sites and binary downloads with this setup.
I am using the exact same setup on 6 machines. The only difference is that 3 are running Debian and the other 3 are running Ubuntu.
On two out of my three Ubuntu machines I am getting this error:
Unhandled Error
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/twisted/protocols/basic.py", line 571, in dataReceived
why = self.lineReceived(line)
File "/usr/lib/python2.7/dist-packages/twisted/web/http.py", line 1655, in lineReceived
self.allContentReceived()
File "/usr/lib/python2.7/dist-packages/twisted/web/http.py", line 1730, in allContentReceived
req.requestReceived(command, path, version)
File "/usr/lib/python2.7/dist-packages/twisted/web/http.py", line 826, in requestReceived
self.process()
--- <exception caught here> ---
File "/usr/lib/python2.7/dist-packages/twisted/web/server.py", line 189, in process
self.render(resrc)
File "/usr/lib/python2.7/dist-packages/twisted/web/server.py", line 238, in render
body = resrc.render(self)
File "/usr/lib/python2.7/dist-packages/twisted/web/resource.py", line 250, in render
return m(request)
File "/usr/lib/python2.7/dist-packages/twisted/web/static.py", line 631, in render_GET
producer.start()
File "/usr/lib/python2.7/dist-packages/twisted/web/static.py", line 710, in start
self.request.registerProducer(self, False)
File "/usr/lib/python2.7/dist-packages/twisted/web/http.py", line 872, in registerProducer
self.transport.registerProducer(producer, streaming)
File "/usr/lib/python2.7/dist-packages/twisted/internet/_newtls.py", line 233, in registerProducer
FileDescriptor.registerProducer(self, producer, streaming)
File "/usr/lib/python2.7/dist-packages/twisted/internet/abstract.py", line 112, in registerProducer
producer.resumeProducing()
File "/usr/lib/python2.7/dist-packages/twisted/web/static.py", line 720, in resumeProducing
self.request.write(data)
File "/usr/lib/python2.7/dist-packages/twisted/web/server.py", line 217, in write
http.Request.write(self, data)
File "/usr/lib/python2.7/dist-packages/twisted/web/http.py", line 1001, in write
value = networkString('%s' % (value,))
File "/usr/lib/python2.7/dist-packages/twisted/python/compat.py", line 364, in networkString
raise TypeError("Can only pass-through bytes on Python 2")
exceptions.TypeError: Can only pass-through bytes on Python 2
Unhandled Error
Traceback (most recent call last):
Failure: exceptions.RuntimeError: Producer was not unregistered for /file/10983801
The third one runs just fine.
The Python version on all three Ubuntu servers is: ii python2.7 2.7.6-8 amd64
I haven't updated Python recently, nor have I changed anything in my codebase. I also tried rebooting, with no success.
I would really appreciate some hints on this.
Googling only pointed me to this:
Running Portia (scrapy) on Windows
But since I am running 2.7.6 on Linux, this shouldn't apply to my situation.
EDIT: appending the actual code:
class PyQueueFile(Resource):
    def __init__(self):
        Resource.__init__(self)
        self.ipcTalker = talker.Talker()

    def getChild(self, convert_id, request):
        """
        :param request: The http Request
        :type request: twisted.web.http.Request
        """
        try:
            db = database.Database()
            video = db.getVideo(convert_id)
            request.setHeader("Content-Disposition",
                "attachment; filename=\"" + os.path.basename(video['title'] + "." + video['format']) + "\"")
            request.setHeader("Content-type", "application/force-download")
            fileResponse = File(video['path'])
        except TypeError:
            return Page404()
        return fileResponse

def fireup():
    try:
        myconfig = config.Config()
        root = Resource()
        root.putChild("file", PyQueueFile())
        factory = Site(root)
        reactor.listenTCP(myconfig.webPort, factory, 100, myconfig.webIp)
        reactor.run()
    except (KeyboardInterrupt, SystemExit):
        reactor.stopListening()
EDIT 2:
I have also tried installing Twisted via pip. Same problem. :/
Check the chapter "Unicode filenames" in https://docs.python.org/2/howto/unicode.html, which explains this:
>>> import os
>>> os.path.basename(u'/a/b/c')
u'c'
>>> os.path.basename('/a/b/c')
'c'
Anyway, your fix will fail for non-ASCII characters in the filename; the value should be percent-encoded (e.g. with urllib.quote).
With the help of the mailing list I was able to fix the problem.
It was caused by a unicode string.
Changing this line:
"attachment; filename=\"" + os.path.basename(video['title'] + "." + video['format']) + "\"")
to this:
"attachment; filename=\"" + str(os.path.basename(video['title']) + "." + video['format']) + "\"")
solves the problem. But I still have no idea why this didn't happen on all platforms.
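For non-ASCII titles, a more robust variant (a sketch, not part of the original fix) is to build the header value as a plain byte string and percent-encode it before handing it to Twisted:
import os
import urllib

def content_disposition(title, fmt):
    # Build an ASCII-safe Content-Disposition value (Python 2 sketch).
    filename = os.path.basename(title + "." + fmt)
    if isinstance(filename, unicode):
        filename = filename.encode("utf-8")
    # Percent-encode so the header stays within ASCII even for non-ASCII names.
    return 'attachment; filename="' + urllib.quote(filename) + '"'
It would then be used as request.setHeader("Content-Disposition", content_disposition(video['title'], video['format'])).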

Testing file upload with Flask and Python 3

I'm using Flask with Python 3.3, and I know support is still experimental, but I'm running into errors when trying to test file uploads. I'm using unittest.TestCase, and based on Python 2.7 examples I've seen in the docs, I'm trying
rv = self.app.post('/add', data=dict(
file=(io.StringIO("this is a test"), 'test.pdf'),
), follow_redirects=True)
and getting
TypeError: 'str' does not support the buffer interface
I've tried a few variations around io.StringIO but can't find anything that works. Any help is much appreciated!
The full stack trace is
Traceback (most recent call last):
File "archive_tests.py", line 44, in test_add_transcript
), follow_redirects=True)
File "/srv/transcript_archive/py3env/lib/python3.3/site-packages/werkzeug/test.py", line 771, in post
return self.open(*args, **kw)
File "/srv/transcript_archive/py3env/lib/python3.3/site-packages/flask/testing.py", line 108, in open
follow_redirects=follow_redirects)
File "/srv/transcript_archive/py3env/lib/python3.3/site-packages/werkzeug/test.py", line 725, in open
environ = args[0].get_environ()
File "/srv/transcript_archive/py3env/lib/python3.3/site-packages/werkzeug/test.py", line 535, in get_environ
stream_encode_multipart(values, charset=self.charset)
File "/srv/transcript_archive/py3env/lib/python3.3/site-packages/werkzeug/test.py", line 98, in stream_encode_multipart
write_binary(chunk)
File "/srv/transcript_archive/py3env/lib/python3.3/site-packages/werkzeug/test.py", line 59, in write_binary
stream.write(string)
TypeError: 'str' does not support the buffer interface
In Python 3, you need to use io.BytesIO() (with a bytes value) to simulate an uploaded file:
rv = self.app.post('/add', data=dict(
file=(io.BytesIO(b"this is a test"), 'test.pdf'),
), follow_redirects=True)
Note the b'...' string defining a bytes literal.
In the Python 2 test examples, the StringIO() object holds a byte string, not a unicode value, and in Python 3, io.BytesIO() is the equivalent.
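A minimal sketch of a complete test case built around that call (myapp and its /add view are hypothetical stand-ins for your application and upload endpoint):
import io
import unittest

from myapp import app  # hypothetical module exposing the Flask app


class UploadTestCase(unittest.TestCase):
    def setUp(self):
        self.app = app.test_client()

    def test_add_upload(self):
        # The file payload must be a bytes buffer in Python 3.
        rv = self.app.post('/add', data=dict(
            file=(io.BytesIO(b"this is a test"), 'test.pdf'),
        ), follow_redirects=True)
        self.assertEqual(rv.status_code, 200)


if __name__ == '__main__':
    unittest.main()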

Dropbox Python API: File size detection may have failed

I'm attempting to upload a text file to Dropbox using this code:
def uploadFile(file):
    f = open('logs/%s.txt' % file)
    response = client.put_file('/%s.txt' % file, f)
    print "Uploaded log file %s" % file
Connecting to Dropbox works perfectly fine; it's just when I upload files that I receive this error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python27\lib\site-packages\dropbox_python_sdk-1.5.1-py2.7.egg\dropbox\client.py", line 352, in put_file
return self.rest_client.PUT(url, file_obj, headers)
File "C:\Python27\lib\site-packages\dropbox_python_sdk-1.5.1-py2.7.egg\dropbox\rest.py", line 265, in PUT
return cls.IMPL.PUT(*n, **kw)
File "C:\Python27\lib\site-packages\dropbox_python_sdk-1.5.1-py2.7.egg\dropbox\rest.py", line 211, in PUT
return self.request("PUT", url, body=body, headers=headers, raw_response=raw_response)
File "C:\Python27\lib\site-packages\dropbox_python_sdk-1.5.1-py2.7.egg\dropbox\rest.py", line 174, in request
raise util.AnalyzeFileObjBug(clen, bytes_read)
dropbox.util.AnalyzeFileObjBug:
Expected file object to have 18 bytes, instead we read 17 bytes.
File size detection may have failed (see dropbox.util.AnalyzeFileObj)
Google has given me no help with this one.
Sounds like you are a victim of newline unification. The file object reports a file size of 18 bytes ("abcdefghijklmnop\r\n") but you read only 17 bytes ("abcdefghijklmnop\n").
Open the file in binary mode to avoid this:
f = open('logs/%s.txt' % file, 'rb')
The default is to use text mode, which may convert '\n' characters to a platform-specific representation on writing and back on reading.
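Applied to the function from the question, the fix is just the mode argument (a minimal sketch; client is assumed to be the already-authenticated Dropbox client from the question):
def uploadFile(file):
    # 'rb' keeps the bytes read identical to the bytes on disk,
    # so the SDK's size check matches what it actually reads.
    f = open('logs/%s.txt' % file, 'rb')
    response = client.put_file('/%s.txt' % file, f)
    print "Uploaded log file %s" % file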

Python SUDS unicode decode error returned from Webservice

I am attempting to use a Webservice created by one of our developers that allows us to upload files into the system, within certain restrictions.
Using SUDS, I get the following information:
Suds ( https://fedorahosted.org/suds/ ) version: 0.4 GA build: R699-20100913
Service ( ConnectToEFS ) tns="http://tempuri.org/"
   Prefixes (3)
      ns0 = "http://schemas.microsoft.com/2003/10/Serialization/"
      ns1 = "http://schemas.microsoft.com/Message"
      ns2 = "http://tempuri.org/"
   Ports (1):
      (BasicHttpBinding_IConnectToEFS)
         Methods (2):
            CreateContentFolder(xs:string FileCode, xs:string FolderName, xs:string ContentType, xs:string MetaDataXML, )
            UploadFile(ns1:StreamBody FileByteStream, )
         Types (4):
            ns1:StreamBody
            ns0:char
            ns0:duration
            ns0:guid
My method for using UploadFile is as follows:
def webserviceUploadFile(self, targetLocation, fileName, fileSource):
    fileSource = './test_files/' + fileSource
    ntlm = WindowsHttpAuthenticated(username=uname, password=upass)
    client = Client(webservice_url, transport=ntlm)
    client.set_options(soapheaders={'TargetLocation': targetLocation, 'FileName': fileName})
    body = client.factory.create('AIRDocument')
    body_file = open(fileSource, 'rb')
    body_data = body_file.read()
    body.FileByteStream = body_data
    return client.service.UploadFile(body)
Running this gets me the following result:
Traceback (most recent call last):
File "test_cases.py", line 639, in test_upload_file_invalid_extension
result_string = self.HM.webserviceUploadFile('9999', 'AD-1234-5424__44.exe', 'test_data.pdf')
File "test_cases.py", line 81, in webserviceUploadFile
return client.service.UploadFile(body)
File "build\bdist.win32\egg\suds\client.py", line 542, in __call__
return client.invoke(args, kwargs)
File "build\bdist.win32\egg\suds\client.py", line 595, in invoke
soapenv = binding.get_message(self.method, args, kwargs)
File "build\bdist.win32\egg\suds\bindings\binding.py", line 120, in get_message
content = self.bodycontent(method, args, kwargs)
File "build\bdist.win32\egg\suds\bindings\document.py", line 63, in bodycontent
p = self.mkparam(method, pd, value)
File "build\bdist.win32\egg\suds\bindings\document.py", line 105, in mkparam
return Binding.mkparam(self, method, pdef, object)
File "build\bdist.win32\egg\suds\bindings\binding.py", line 287, in mkparam
return marshaller.process(content)
File "build\bdist.win32\egg\suds\mx\core.py", line 62, in process
self.append(document, content)
File "build\bdist.win32\egg\suds\mx\core.py", line 75, in append
self.appender.append(parent, content)
File "build\bdist.win32\egg\suds\mx\appender.py", line 102, in append
appender.append(parent, content)
File "build\bdist.win32\egg\suds\mx\appender.py", line 243, in append
Appender.append(self, child, cont)
File "build\bdist.win32\egg\suds\mx\appender.py", line 182, in append
self.marshaller.append(parent, content)
File "build\bdist.win32\egg\suds\mx\core.py", line 75, in append
self.appender.append(parent, content)
File "build\bdist.win32\egg\suds\mx\appender.py", line 102, in append
appender.append(parent, content)
File "build\bdist.win32\egg\suds\mx\appender.py", line 198, in append
child.setText(tostr(content.value))
File "build\bdist.win32\egg\suds\sax\element.py", line 251, in setText
self.text = Text(value)
File "build\bdist.win32\egg\suds\sax\text.py", line 43, in __new__
result = super(Text, cls).__new__(cls, *args, **kwargs)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 10: ordinal not in range(128)
After much research and talking with the developer of the webservice, I changed body_data = body_file.read() to body_data = body_file.read().decode("UTF-8"), which gets me this error:
Traceback (most recent call last):
File "test_cases.py", line 639, in test_upload_file_invalid_extension
result_string = self.HM.webserviceUploadFile('9999', 'AD-1234-5424__44.exe', 'test_data.pdf')
File "test_cases.py", line 79, in webserviceUploadFile
body_data = body_file.read().decode("utf-8")
File "C:\python27\lib\encodings\utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe2 in position 10: invalid continuation byte
Which is less than helpful.
After more research into the problem, I tried adding errors='ignore' to the UTF-8 decode, and this was the result:
<TransactionDescription>Error in INTL-CONF_France_PROJ_MA_126807.docx: An exception has been thrown when reading the stream.. Inner Exception: System.Xml.XmlException: The byte 0x03 is not valid at this location. Line 1, position 318.
at System.Xml.XmlExceptionHelper.ThrowXmlException(XmlDictionaryReader reader, String res, String arg1, String arg2, String arg3)
at System.Xml.XmlUTF8TextReader.Read()
at System.ServiceModel.Dispatcher.StreamFormatter.MessageBodyStream.Exhaust(XmlDictionaryReader reader)
at System.ServiceModel.Dispatcher.StreamFormatter.MessageBodyStream.Read(Byte[] buffer, Int32 offset, Int32 count). Source: System.ServiceModel</TransactionDescription>
Which pretty much stumps me on what to do. Based on the stack trace returned by the webservice, it looks like it wants UTF-8, but I can't seem to get the data to the webservice without Python or SUDS throwing a fit, or without ignoring problems in the encoding. The system I'm working on only takes in Microsoft Office type files (doc, xls, and the like), PDFs, and TXT files, so using a format where I have more control over the encoding is not an option. I also tried detecting the encoding used by the sample PDF and the sample DOCX, but the encodings suggested (Latin-1, ISO8859-x, and several Windows code pages) were all accepted by Python and SUDS, yet not by the webservice.
Also note that the example shown most often references a test for an invalid extension; the same error occurs in what should be a test of a successful upload, which is really the only case where the final stack trace shows up.
You can use base64.b64encode(body_file.read()), which returns the data as a base64 string, so the value placed in the request is a plain ASCII string.
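Applied to the method from the question, that would look roughly like this (a sketch; it assumes the service accepts the StreamBody content as base64 text, which is how SOAP represents binary data anyway):
import base64

def webserviceUploadFile(self, targetLocation, fileName, fileSource):
    fileSource = './test_files/' + fileSource
    ntlm = WindowsHttpAuthenticated(username=uname, password=upass)
    client = Client(webservice_url, transport=ntlm)
    client.set_options(soapheaders={'TargetLocation': targetLocation, 'FileName': fileName})
    body = client.factory.create('AIRDocument')
    with open(fileSource, 'rb') as body_file:
        # base64 output is pure ASCII, so suds never has to guess an encoding.
        body.FileByteStream = base64.b64encode(body_file.read())
    return client.service.UploadFile(body)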
