Unable to push data to XCom in Airflow - Python

from airflow.operators.python import get_current_context
context = get_current_context()
ti = context['ti']
ti.xcom_push(key="file", value = doc )
I have the above code in a task, and doc is the data that I want to pass to XCom. It's throwing the following error stack trace:
Traceback (most recent call last):
File "/opt/bitnami/airflow/venv/lib/python3.9/site-packages/airflow/decorators/base.py", line 217, in execute
return_value = super().execute(context)
File "/opt/bitnami/airflow/venv/lib/python3.9/site-packages/airflow/operators/python.py", line 175, in execute
return_value = self.execute_callable()
File "/opt/bitnami/airflow/venv/lib/python3.9/site-packages/airflow/operators/python.py", line 192, in execute_callable
return self.python_callable(*self.op_args, **self.op_kwargs)
File "/opt/bitnami/airflow/dags/rover_ocr_pipeline.py", line 65, in retrieve
ti.xcom_push(key="file", value = doc )
File "/opt/bitnami/airflow/venv/lib/python3.9/site-packages/airflow/utils/session.py", line 75, in wrapper
return func(*args, session=session, **kwargs)
File "/opt/bitnami/airflow/venv/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 2294, in xcom_push
XCom.set(
File "/opt/bitnami/airflow/venv/lib/python3.9/site-packages/airflow/utils/session.py", line 72, in wrapper
return func(*args, **kwargs)
File "/opt/bitnami/airflow/venv/lib/python3.9/site-packages/airflow/models/xcom.py", line 234, in set
value = cls.serialize_value(
File "/opt/bitnami/airflow/venv/lib/python3.9/site-packages/airflow/models/xcom.py", line 627, in serialize_value
return json.dumps(value, cls=XComEncoder).encode("UTF-8")
File "/opt/bitnami/python/lib/python3.9/json/__init__.py", line 234, in dumps
return cls(
File "/opt/bitnami/airflow/venv/lib/python3.9/site-packages/airflow/utils/json.py", line 176, in encode
return super().encode(o)
File "/opt/bitnami/python/lib/python3.9/json/encoder.py", line 199, in encode
chunks = self.iterencode(o, _one_shot=True)
File "/opt/bitnami/python/lib/python3.9/json/encoder.py", line 257, in iterencode
return _iterencode(o, 0)
File "/opt/bitnami/airflow/venv/lib/python3.9/site-packages/airflow/utils/json.py", line 153, in default
CLASSNAME: o.__module__ + "." + o.__class__.__qualname__,
AttributeError: 'bytes' object has no attribute '__module__'
This was working until now; I am guessing it's an issue with the Airflow version. Previously I was using 2.3.4, now 2.5.0.
Airflow is running on a Kubernetes cluster using the airflow:2.5.0-debian-11-r11 image.

Moving from comments to an actual answer; see the comments above for the full conversation.
XCom serializes everything to JSON before storing it in the XCom table. Bytes aren't JSON-serializable, so the encoder fell back to treating the value as an arbitrary object and crashed while looking up its __module__. Converting the bytes to a normal string by base64-encoding them allows the value to be stored in XCom.
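For example, a minimal sketch of the fix inside the task (doc is the bytes payload from the question):
import base64

# encode before pushing; XCom can store the resulting str
ti.xcom_push(key="file", value=base64.b64encode(doc).decode("ascii"))

# ...and in the downstream task, decode after pulling
doc = base64.b64decode(ti.xcom_pull(key="file"))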
While probably not worth the effort for just this case, this could be handled automatically by creating a custom XCom backend that detects byte values and performs the conversion behind the scenes.
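A rough sketch of such a backend, assuming Airflow 2.5's BaseXCom hooks (the class name and prefix are illustrative, not part of Airflow):
import base64
from airflow.models.xcom import BaseXCom

class Base64XComBackend(BaseXCom):
    # Hypothetical backend: transparently base64-encodes bytes values.
    # Enable it with AIRFLOW__CORE__XCOM_BACKEND=<module>.Base64XComBackend
    PREFIX = "base64://"

    @staticmethod
    def serialize_value(value, **kwargs):
        if isinstance(value, bytes):
            value = Base64XComBackend.PREFIX + base64.b64encode(value).decode("ascii")
        return BaseXCom.serialize_value(value, **kwargs)

    @staticmethod
    def deserialize_value(result):
        value = BaseXCom.deserialize_value(result)
        if isinstance(value, str) and value.startswith(Base64XComBackend.PREFIX):
            value = base64.b64decode(value[len(Base64XComBackend.PREFIX):])
        return value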

Can I read parquet from HTTP(s) octet-stream?

A backend endpoint returns a parquet file as an octet-stream.
In pandas I can do something like this:
result = requests.get("https://..../file.parquet")
df = pd.read_parquet(io.BytesIO(result.content))
Can I do it in Dask somehow?
This code:
dd.read_parquet("https://..../file.parquet")
Raises an exception (obviously, because this is a bytes-like object):
File "to_parquet_dask.py", line 153, in <module>
main(*parser.parse_args())
File "to_parquet_dask.py", line 137, in main
download_parquet(
File "to_parquet_dask.py", line 121, in download_parquet
dd.read_parquet(
File "/home/bc30138/Documents/CODE/flexydrive/driver_style/.venv/lib/python3.8/site-packages/dask/dataframe/io/parquet/core.py", line 313, in read_parquet
read_metadata_result = engine.read_metadata(
File "/home/bc30138/Documents/CODE/flexydrive/driver_style/.venv/lib/python3.8/site-packages/dask/dataframe/io/parquet/fastparquet.py", line 733, in read_metadata
parts, pf, gather_statistics, base_path = _determine_pf_parts(
File "/home/bc30138/Documents/CODE/flexydrive/driver_style/.venv/lib/python3.8/site-packages/dask/dataframe/io/parquet/fastparquet.py", line 148, in _determine_pf_parts
elif fs.isdir(paths[0]):
File "/home/bc30138/Documents/CODE/flexydrive/driver_style/.venv/lib/python3.8/site-packages/fsspec/asyn.py", line 88, in wrapper
return sync(self.loop, func, *args, **kwargs)
File "/home/bc30138/Documents/CODE/flexydrive/driver_style/.venv/lib/python3.8/site-packages/fsspec/asyn.py", line 69, in sync
raise result[0]
File "/home/bc30138/Documents/CODE/flexydrive/driver_style/.venv/lib/python3.8/site-packages/fsspec/asyn.py", line 25, in _runner
result[0] = await coro
File "/home/bc30138/Documents/CODE/flexydrive/driver_style/.venv/lib/python3.8/site-packages/fsspec/implementations/http.py", line 418, in _isdir
return bool(await self._ls(path))
File "/home/bc30138/Documents/CODE/flexydrive/driver_style/.venv/lib/python3.8/site-packages/fsspec/implementations/http.py", line 195, in _ls
out = await self._ls_real(url, detail=detail, **kwargs)
File "/home/bc30138/Documents/CODE/flexydrive/driver_style/.venv/lib/python3.8/site-packages/fsspec/implementations/http.py", line 150, in _ls_real
text = await r.text()
File "/home/bc30138/Documents/CODE/flexydrive/driver_style/.venv/lib/python3.8/site-packages/aiohttp/client_reqrep.py", line 1082, in text
return self._body.decode(encoding, errors=errors) # type: ignore
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x90 in position 7: invalid start byte
UPD
With the fsspec changes from @mdurant's answer, I got the error:
ValueError: Cannot seek streaming HTTP file
So I prepended "simplecache::" to my URL, and now I get the following:
Traceback (most recent call last):
File "to_parquet_dask.py", line 161, in <module>
main(*parser.parse_args())
File "to_parquet_dask.py", line 145, in main
download_parquet(
File "to_parquet_dask.py", line 128, in download_parquet
dd.read_parquet(
File "/home/bc30138/Documents/CODE/flexydrive/driver_style/.venv/lib/python3.8/site-packages/dask/dataframe/io/parquet/core.py", line 313, in read_parquet
read_metadata_result = engine.read_metadata(
File "/home/bc30138/Documents/CODE/flexydrive/driver_style/.venv/lib/python3.8/site-packages/dask/dataframe/io/parquet/fastparquet.py", line 733, in read_metadata
parts, pf, gather_statistics, base_path = _determine_pf_parts(
File "/home/bc30138/Documents/CODE/flexydrive/driver_style/.venv/lib/python3.8/site-packages/dask/dataframe/io/parquet/fastparquet.py", line 185, in _determine_pf_parts
pf = ParquetFile(
File "/home/bc30138/Documents/CODE/flexydrive/driver_style/.venv/lib/python3.8/site-packages/fastparquet/api.py", line 127, in __init__
raise ValueError("Opening directories without a _metadata requires"
ValueError: Opening directories without a _metadata requiresa filesystem compatible with fsspec
Temporary workaround
Maybe this way is dirty and not optimal, but it kind of works:
import io

import dask
import dask.dataframe as dd
import pandas as pd
import requests

@dask.delayed
def parquet_from_http(url, token):
    result = requests.get(url, headers={'Authorization': token})
    return pd.read_parquet(io.BytesIO(result.content))

delayed_download = parquet_from_http(url, token)
df = dd.from_delayed(delayed_download, meta=meta)
P.S. The meta argument is necessary in this approach, because otherwise Dask will call the function twice (once to find out meta and then again to compute), so two requests would be made.
This is not an answer, but I believe the following change in fsspec will fix your problem. If you would be willing to try and confirm, we can make this a patch.
--- a/fsspec/implementations/http.py
+++ b/fsspec/implementations/http.py
@@ -472,7 +472,10 @@ class HTTPFileSystem(AsyncFileSystem):

     async def _isdir(self, path):
         # override, since all URLs are (also) files
-        return bool(await self._ls(path))
+        try:
+            return bool(await self._ls(path))
+        except (FileNotFoundError, ValueError):
+            return False
(we can put this in a branch, if that makes it easier for you to install)
-edit-
The second problem (which is the same thing in both parquet engines) stems from the server either not providing the size of the file, or not allowing range requests. The parquet format requires random access to the data to be able to read it. The only way to get around this (short of improving the server) is to copy the whole file locally, e.g., by prepending "simplecache::" to your URL.
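For example, a minimal sketch of that workaround (the URL is the placeholder from the question):
import dask.dataframe as dd

# "simplecache::" makes fsspec download the whole file to a local cache
# first, which gives the parquet engine the random access it needs
df = dd.read_parquet("simplecache::https://..../file.parquet")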

Removing an entire tree in Neptune using Gremlin

I'm new to using Gremlin + Neptune. I'm writing a small Python program to clean up some edges + nodes in a Neptune DB. To start, I want to delete all nodes + edges rooted at a particular node.
After looking around, I came across How to delete all child nodes when parent node deleted using the gremlin query?, and I tried the following:
g.V().has("id", <insert id>).union(__(), __.repeat(__.out()).emit()).drop()
However, the node with "id" still exists in the DB. When I try running the above command straight on the python3 console, I get the following output:
>>> g.V().has("id", <insert id>).union(__(), __.repeat(__.out()).emit()).drop()
[['V'], ['has', 'id', '5c4266a3a44649ddb24ce6cf4f831300'], ['union', <gremlin_python.process.graph_traversal.__ object at 0x7fa51c967fd0>, [['repeat', [['out']]], ['emit']]], ['drop']]
What's going on here? It almost looks like it's listing out the steps it plans to take, but not executing them.
Edit: Following the answer below, I added .iterate() to the end. I get this error now:
Traceback (most recent call last):
...
tg.g.V().has("id", <id>).union(__(), __.repeat(__.out()).emit()).drop().iterate()
File "/home/vagrant/.cache/bazel/_bazel_vagrant/f9910d98673307a31f928c448bd4acd0/execroot/project/bazel-out/k8-fastbuild/bin/path/to/directory/python_code.runfiles/pip_deps/pypi__gremlinpython/gremlin_python/process/traversal.py", line 66, in iterate
try: self.nextTraverser()
File "/home/vagrant/.cache/bazel/_bazel_vagrant/f9910d98673307a31f928c448bd4acd0/execroot/project/bazel-out/k8-fastbuild/bin/path/to/directory/python_code.runfiles/pip_deps/pypi__gremlinpython/gremlin_python/process/traversal.py", line 71, in nextTraverser
self.traversal_strategies.apply_strategies(self)
File "/home/vagrant/.cache/bazel/_bazel_vagrant/f9910d98673307a31f928c448bd4acd0/execroot/project/bazel-out/k8-fastbuild/bin/path/to/directory/python_code.runfiles/pip_deps/pypi__gremlinpython/gremlin_python/process/traversal.py", line 573, in apply_strategies
traversal_strategy.apply(traversal)
File "/home/vagrant/.cache/bazel/_bazel_vagrant/f9910d98673307a31f928c448bd4acd0/execroot/project/bazel-out/k8-fastbuild/bin/path/to/directory/python_code.runfiles/pip_deps/pypi__gremlinpython/gremlin_python/driver/remote_connection.py", line 149, in apply
remote_traversal = self.remote_connection.submit(traversal.bytecode)
File "/home/vagrant/.cache/bazel/_bazel_vagrant/f9910d98673307a31f928c448bd4acd0/execroot/project/bazel-out/k8-fastbuild/bin/path/to/directory/python_code.runfiles/pip_deps/pypi__gremlinpython/gremlin_python/driver/driver_remote_connection.py", line 55, in submit
result_set = self._client.submit(bytecode)
File "/home/vagrant/.cache/bazel/_bazel_vagrant/f9910d98673307a31f928c448bd4acd0/execroot/project/bazel-out/k8-fastbuild/bin/path/to/directory/python_code.runfiles/pip_deps/pypi__gremlinpython/gremlin_python/driver/client.py", line 111, in submit
return self.submitAsync(message, bindings=bindings).result()
File "/usr/lib/python3.7/concurrent/futures/_base.py", line 428, in result
return self.__get_result()
File "/usr/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
File "/home/vagrant/.cache/bazel/_bazel_vagrant/f9910d98673307a31f928c448bd4acd0/execroot/project/bazel-out/k8-fastbuild/bin/path/to/directory/python_code.runfiles/pip_deps/pypi__gremlinpython/gremlin_python/driver/connection.py", line 66, in cb
f.result()
File "/usr/lib/python3.7/concurrent/futures/_base.py", line 428, in result
return self.__get_result()
File "/usr/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
File "/usr/lib/python3.7/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/home/vagrant/.cache/bazel/_bazel_vagrant/f9910d98673307a31f928c448bd4acd0/execroot/project/bazel-out/k8-fastbuild/bin/path/to/directory/python_code.runfiles/pip_deps/pypi__gremlinpython/gremlin_python/driver/protocol.py", line 74, in write
request_id, request_message)
File "/home/vagrant/.cache/bazel/_bazel_vagrant/f9910d98673307a31f928c448bd4acd0/execroot/project/bazel-out/k8-fastbuild/bin/path/to/directory/python_code.runfiles/pip_deps/pypi__gremlinpython/gremlin_python/driver/serializer.py", line 132, in serialize_message
message = self.build_message(request_id, processor, op, args)
File "/home/vagrant/.cache/bazel/_bazel_vagrant/f9910d98673307a31f928c448bd4acd0/execroot/project/bazel-out/k8-fastbuild/bin/path/to/directory/python_code.runfiles/pip_deps/pypi__gremlinpython/gremlin_python/driver/serializer.py", line 142, in build_message
return self.finalize_message(message, b"\x21", self.version)
File "/home/vagrant/.cache/bazel/_bazel_vagrant/f9910d98673307a31f928c448bd4acd0/execroot/project/bazel-out/k8-fastbuild/bin/path/to/directory/python_code.runfiles/pip_deps/pypi__gremlinpython/gremlin_python/driver/serializer.py", line 145, in finalize_message
message = json.dumps(message)
File "/usr/lib/python3.7/json/__init__.py", line 231, in dumps
return _default_encoder.encode(obj)
File "/usr/lib/python3.7/json/encoder.py", line 199, in encode
chunks = self.iterencode(o, _one_shot=True)
File "/usr/lib/python3.7/json/encoder.py", line 257, in iterencode
return _iterencode(o, 0)
File "/usr/lib/python3.7/json/encoder.py", line 179, in default
raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type __ is not JSON serializable
Unlike the Gremlin Console, the Python console doesn't automatically iterate your traversals for you. Just as in your application code, you must iterate them yourself. When you do
g.V().has("id", <insert id>).union(__.identity(), __.repeat(__.out()).emit()).drop()
that simply spawns a Traversal object but does not execute it. You must therefore iterate it in some way to exhaust the elements within it; in your case, the appropriate terminator to use would be iterate():
g.V().has("id", <insert id>).union(__.identity(), __.repeat(__.out()).emit()).drop().iterate()
The semantics around drops in TinkerPop aren't always consistent, unfortunately. TinkerPop tried to preserve flexibility for providers in how they implement them, but this sometimes causes confusion, because a query will work fine in TinkerGraph but then behave slightly differently when executed on a different provider. If the above approach only drops the root, you could try to realize the results before the drop:
g.V().has("id", <insert id>).
union(__.identity(),
__.repeat(__.out()).emit()).
fold().unfold().
drop()
Looks a bit dumb, but it will force all the vertices you wish to drop to be traversed into a list before they are dropped. That way, you won't kill the repeat() by dropping the parent and its edges first and leaving the child paths untraversed.
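Putting it together, a self-contained sketch with gremlin-python (the Neptune endpoint is a placeholder; the vertex id is the one from the question):
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.process.graph_traversal import __
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

conn = DriverRemoteConnection("wss://your-neptune-endpoint:8182/gremlin", "g")
g = traversal().withRemote(conn)

# realize the root and all reachable descendants, then drop them;
# iterate() is the terminal step that actually submits the traversal
(g.V().has("id", "5c4266a3a44649ddb24ce6cf4f831300").
   union(__.identity(), __.repeat(__.out()).emit()).
   fold().unfold().
   drop().
   iterate())

conn.close()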

could not serialize access due to concurrent update while creating pos picking from a job

Impacted versions:
12.0
Steps to reproduce:
I have made a customization to postpone the creation of the POS order picking and delegate the task to a job. Sometimes I get the error below.
Current behavior:
2020-06-18 17:49:24,588 1370 ERROR cafe9.rabeh.io odoo.addons.base.models.ir_cron: Call from cron POS Orders: Process Pending Orders for server action #610 failed in Job #24
Traceback (most recent call last):
File "/opt/rabeh/odoo/odoo/addons/base/models/ir_cron.py", line 102, in _callback
self.env['ir.actions.server'].browse(server_action_id).run()
File "/opt/rabeh/odoo/odoo/addons/base/models/ir_actions.py", line 569, in run
res = func(action, eval_context=eval_context)
File "/opt/rabeh/odoo/odoo/addons/base/models/ir_actions.py", line 445, in run_action_code_multi
safe_eval(action.sudo().code.strip(), eval_context, mode="exec", nocopy=True)  # nocopy allows to return 'action'
File "/opt/rabeh/odoo/odoo/tools/safe_eval.py", line 350, in safe_eval
return unsafe_eval(c, globals_dict, locals_dict)
File "/opt/rabeh-12/rabeh_addons/pos_pending_session/models/pos_order.py", line 68, in pending_picking_creation
po_order.create_picking()
File "/opt/rabeh-12/rabeh_addons/pos_pending_session/models/pos_order.py", line 36, in
create_picking
res = super(PosOrder, orders).create_picking()
File "/opt/rabeh/odoo/addons/point_of_sale/models/pos_order.py", line 841, in create_picking
order._force_picking_done(order_picking)
File "/opt/rabeh/odoo/addons/point_of_sale/models/pos_order.py", line 856, in _force_picking_done
picking.action_done()
File "/opt/rabeh/odoo/addons/stock/models/stock_picking.py", line 631, in action_done
todo_moves._action_done()
File "/opt/rabeh/odoo/addons/purchase_stock/models/stock.py", line 96, in _action_done
res = super(StockMove, self)._action_done()
File "/opt/rabeh/odoo/addons/stock_account/models/stock.py", line 389, in _action_done
res = super(StockMove, self)._action_done()
File "/opt/rabeh/odoo/addons/stock/models/stock_move.py", line 1137, in _action_done
moves_todo.mapped('move_line_ids')._action_done()
File "/opt/rabeh/odoo/addons/stock/models/stock_move_line.py", line 445, in _action_done
Quant._update_available_quantity(ml.product_id, ml.location_dest_id, quantity, lot_id=ml.lot_id, package_id=ml.result_package_id, owner_id=ml.owner_id, in_date=in_date)
File "/opt/rabeh/odoo/addons/stock/models/stock_quant.py", line 216, in _update_available_quantity
self._cr.execute("SELECT 1 FROM stock_quant WHERE id = %s FOR UPDATE NOWAIT", [quant.id], log_exceptions=False)
File "/opt/rabeh/odoo/odoo/sql_db.py", line 148, in wrapper
return f(self, *args, **kwargs)
File "/opt/rabeh/odoo/odoo/sql_db.py", line 225, in execute
res = self._obj.execute(query, params)
psycopg2.errors.SerializationFailure: could not serialize access due to concurrent update
Expected behavior:
I think this line should generate "could not obtain lock".
I was just wondering when it could generate "could not serialize access due to concurrent update" instead.
It could have obtained the lock, as the lock is currently available. But the row had been locked at some previous point that overlaps with the current transaction's snapshot. So obtaining the lock is possible, but doing so would create a serialization problem. Reporting that as a serialization failure seems like the correct outcome.
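A minimal sketch that reproduces the situation outside Odoo (the connection string and row id are placeholders; the table name is taken from the traceback):
import psycopg2
from psycopg2 import extensions

DSN = "dbname=odoo user=odoo"  # placeholder connection string
a = psycopg2.connect(DSN)
b = psycopg2.connect(DSN)
a.set_isolation_level(extensions.ISOLATION_LEVEL_REPEATABLE_READ)

cur_a = a.cursor()
cur_a.execute("SELECT count(*) FROM stock_quant")  # first query takes a's snapshot

cur_b = b.cursor()
cur_b.execute("UPDATE stock_quant SET quantity = quantity WHERE id = 1")
b.commit()  # the row version now postdates a's snapshot

# the lock is free, so NOWAIT is not the issue; acquiring it would
# still break a's snapshot, hence the serialization failure:
cur_a.execute("SELECT 1 FROM stock_quant WHERE id = 1 FOR UPDATE NOWAIT")
# psycopg2.errors.SerializationFailure: could not serialize access
# due to concurrent update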
Using FOR UPDATE NOWAIT in a transaction with elevated isolation level seems inconsistent, or at least unnecessary. What are you hoping to accomplish by doing this? Your description of "while creating pos picking from a job" doesn't elucidate this for me. Is that some odoo-specific jargon?

Searching for a string with Web.py

I'm trying to build a Python function with web.py and SQLite that will allow users to search for a given string within a description field and return all matching results.
Right now I've gotten to the function below, which works, but only if the input is an exact match.
def getItem(params, max_display):
    query_string = 'SELECT * FROM items WHERE 1=1'
    description = params['description']
    if params['description']:
        query_string = query_string + ' AND description LIKE $description'
    result = query(query_string, {
        'description': params['description']
    })
I've tried to implement this with LIKE '%$description%', however I keep getting the web.py error below.
Traceback (most recent call last):
File "lib/web/wsgiserver/__init__.py", line 1245, in communicate
req.respond()
File "lib/web/wsgiserver/__init__.py", line 775, in respond
self.server.gateway(self).respond()
File "lib/web/wsgiserver/__init__.py", line 2018, in respond
response = self.req.server.wsgi_app(self.env, self.start_response)
File "lib/web/httpserver.py", line 306, in __call__
return self.app(environ, xstart_response)
File "lib/web/httpserver.py", line 274, in __call__
return self.app(environ, start_response)
File "lib/web/application.py", line 279, in wsgi
result = self.handle_with_processors()
File "lib/web/application.py", line 249, in handle_with_processors
return process(self.processors)
File "lib/web/application.py", line 246, in process
raise self.internalerror()
File "lib/web/application.py", line 478, in internalerror
return debugerror.debugerror()
File "lib/web/debugerror.py", line 305, in debugerror
return web._InternalError(djangoerror())
File "lib/web/debugerror.py", line 290, in djangoerror
djangoerror_r = Template(djangoerror_t, filename=__file__, filter=websafe)
File "lib/web/template.py", line 846, in __init__
code = self.compile_template(text, filename)
File "lib/web/template.py", line 926, in compile_template
ast = compiler.parse(code)
File "/Users/sokeefe/homebrew/Cellar/python/2.7.10_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/compiler/transformer.py", line 51, in parse
return Transformer().parsesuite(buf)
File "/Users/sokeefe/homebrew/Cellar/python/2.7.10_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/compiler/transformer.py", line 128, in parsesuite
return self.transform(parser.suite(text))
AttributeError: 'module' object has no attribute 'suite'
Any thoughts on what might be going wrong with this function?
Thanks in advance!
What do you think is going on with parser.py?
Here is the relevant portion of the error message:
File "/Users/sokeefe/homebrew/Cellar/python/2.7.10_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/compiler/transformer.py", line 128, in parsesuite
    return self.transform(parser.suite(text))
AttributeError: 'module' object has no attribute 'suite'
So, somewhere there is a file called parser.py that defines a function called suite(), and that function is used by library code that runs when your program executes. Because you named one of your own files parser.py, Python found your file first when the library imported parser, and there is no function named suite() in your file.
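Once the conflicting parser.py is renamed, the substring search itself can be handled by putting the wildcards in the bound parameter rather than in the SQL string. A sketch, assuming the query helper from the question:
def getItem(params, max_display):
    query_string = 'SELECT * FROM items WHERE 1=1'
    query_vars = {}
    if params.get('description'):
        # bind '%...%' as the parameter value; web.py quotes it safely
        query_string += ' AND description LIKE $description'
        query_vars['description'] = '%' + params['description'] + '%'
    return query(query_string, query_vars)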

Python SUDS unicode decode error returned from Webservice

I am attempting to use a web service created by one of our developers that allows us to upload files into the system, within certain restrictions.
Using SUDS, I get the following information:
Suds ( https://fedorahosted.org/suds/ ) version: 0.4 GA build: R699-20100913
Service ( ConnectToEFS ) tns="http://tempuri.org/"
Prefixes (3)
ns0 = "http://schemas.microsoft.com/2003/10/Serialization/"
ns1 = "http://schemas.microsoft.com/Message"
ns2 = "http://tempuri.org/"
Ports (1):
(BasicHttpBinding_IConnectToEFS)
Methods (2):
CreateContentFolder(xs:string FileCode, xs:string FolderName, xs:string ContentType, xs:string MetaDataXML, )
UploadFile(ns1:StreamBody FileByteStream, )
Types (4):
ns1:StreamBody
ns0:char
ns0:duration
ns0:guid
My method for calling UploadFile is as follows:
def webserviceUploadFile(self, targetLocation, fileName, fileSource):
    fileSource = './test_files/' + fileSource
    ntlm = WindowsHttpAuthenticated(username=uname, password=upass)
    client = Client(webservice_url, transport=ntlm)
    client.set_options(soapheaders={'TargetLocation': targetLocation, 'FileName': fileName})
    body = client.factory.create('AIRDocument')
    body_file = open(fileSource, 'rb')
    body_data = body_file.read()
    body.FileByteStream = body_data
    return client.service.UploadFile(body)
Running this gets me the following result:
Traceback (most recent call last):
File "test_cases.py", line 639, in test_upload_file_invalid_extension
result_string = self.HM.webserviceUploadFile('9999', 'AD-1234-5424__44.exe',
'test_data.pdf')
File "test_cases.py", line 81, in webserviceUploadFile
return client.service.UploadFile(body)
File "build\bdist.win32\egg\suds\client.py", line 542, in __call__
return client.invoke(args, kwargs)
File "build\bdist.win32\egg\suds\client.py", line 595, in invoke
soapenv = binding.get_message(self.method, args, kwargs)
File "build\bdist.win32\egg\suds\bindings\binding.py", line 120, in get_message
content = self.bodycontent(method, args, kwargs)
File "build\bdist.win32\egg\suds\bindings\document.py", line 63, in bodycontent
p = self.mkparam(method, pd, value)
File "build\bdist.win32\egg\suds\bindings\document.py", line 105, in mkparam
return Binding.mkparam(self, method, pdef, object)
File "build\bdist.win32\egg\suds\bindings\binding.py", line 287, in mkparam
return marshaller.process(content)
File "build\bdist.win32\egg\suds\mx\core.py", line 62, in process
self.append(document, content)
File "build\bdist.win32\egg\suds\mx\core.py", line 75, in append
self.appender.append(parent, content)
File "build\bdist.win32\egg\suds\mx\appender.py", line 102, in append
appender.append(parent, content)
File "build\bdist.win32\egg\suds\mx\appender.py", line 243, in append
Appender.append(self, child, cont)
File "build\bdist.win32\egg\suds\mx\appender.py", line 182, in append
self.marshaller.append(parent, content)
File "build\bdist.win32\egg\suds\mx\core.py", line 75, in append
self.appender.append(parent, content)
File "build\bdist.win32\egg\suds\mx\appender.py", line 102, in append
appender.append(parent, content)
File "build\bdist.win32\egg\suds\mx\appender.py", line 198, in append
child.setText(tostr(content.value))
File "build\bdist.win32\egg\suds\sax\element.py", line 251, in setText
self.text = Text(value)
File "build\bdist.win32\egg\suds\sax\text.py", line 43, in __new__
result = super(Text, cls).__new__(cls, *args, **kwargs)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 10: ordinal not in range(128)
After much research and talking with the developer of the web service, I changed body_data = body_file.read() to body_data = body_file.read().decode("utf-8"), which gets me this error:
Traceback (most recent call last):
File "test_cases.py", line 639, in test_upload_file_invalid_extension
result_string = self.HM.webserviceUploadFile('9999', 'AD-1234-5424__44.exe', 'test_data.pdf')
File "test_cases.py", line 79, in webserviceUploadFile
body_data = body_file.read().decode("utf-8")
File "C:\python27\lib\encodings\utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe2 in position 10: invalid continuation byte
Which is less than helpful.
After more research into the problem, I tried adding errors='ignore' to the UTF-8 decode, and this was the result:
<TransactionDescription>Error in INTL-CONF_France_PROJ_MA_126807.docx: An exception has been thrown when reading the stream.. Inner Exception: System.Xml.XmlException: The byte 0x03 is not valid at this location. Line 1, position 318.
at System.Xml.XmlExceptionHelper.ThrowXmlException(XmlDictionaryReader reader, String res, String arg1, String arg2, String arg3)
at System.Xml.XmlUTF8TextReader.Read()
at System.ServiceModel.Dispatcher.StreamFormatter.MessageBodyStream.Exhaust(XmlDictionaryReader reader)
at System.ServiceModel.Dispatcher.StreamFormatter.MessageBodyStream.Read(Byte[] buffer, Int32 offset, Int32 count). Source: System.ServiceModel</TransactionDescription>
Which pretty much stumps me on what to do. Based on the stack trace returned by the web service, it looks like it wants UTF-8, but I can't seem to get the data to the web service without Python or SUDS throwing a fit, or without ignoring problems in the encoding. The system I'm working on only takes in Microsoft Office-type files (doc, xls, and the like), PDFs, and TXT files, so switching to a format whose encoding I control is not an option. I also tried detecting the encoding used by the sample PDF and the sample DOCX, but everything it suggested (Latin-1, ISO8859-x, and several windows-XXXX codecs) was accepted by Python and SUDS, but not by the web service.
Also note that the example shown mostly references a test with an invalid extension. This error applies even in what should be a test of a successful upload, which is really the only time the final stack trace ever shows up.
You can use base64.b64encode(body_file.read()), which returns a base64 string value. Your request variable must be a string, not raw bytes.
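A minimal sketch of that change applied to the method from the question:
import base64

# send the file content as a base64 string instead of raw bytes, so
# the SOAP body stays valid text
body_file = open(fileSource, 'rb')
body.FileByteStream = base64.b64encode(body_file.read())
body_file.close()
return client.service.UploadFile(body)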
