AWS Glue An error occurred while calling o128.resolveChoice - python

I have an AWS Glue job that is currently running nightly and scanning about 20 TB worth of raw JSON data and converting it to parquet. I just have the generic Python script that is generated when creating the job. I've run into an issue that is causing the job to fail and resulting in the following error.
py4j.protocol.Py4JError: An error occurred while calling o131.resolveChoice
This job has run successfully before without any issues. I made a change and added partition keys and it seems to be failing now after that change. This job takes almost 24 hours to run so coming up with a solution has been a slow process. I've not been able to find any kind of errors that match this one so i'm curious to find out what is happening here. Anybody have any ideas?
Here is the traceback from Cloudwatch
Traceback (most recent call last):
File "script_2020-09-14-02-00-49.py", line 17, in <module>
resolvechoice = ResolveChoice.apply(frame = applymapping, choice = "make_struct", transformation_ctx = "resolvechoice")
File "/mnt/yarn/usercache/root/appcache/application_1600045621100_0001/container_1600045621100_0001_02_000001/PyGlue.zip/awsglue/transforms/transform.py", line 24, in apply
File "/mnt/yarn/usercache/root/appcache/application_1600045621100_0001/container_1600045621100_0001_02_000001/PyGlue.zip/awsglue/transforms/resolve_choice.py", line 17, in __call__
File "/mnt/yarn/usercache/root/appcache/application_1600045621100_0001/container_1600045621100_0001_02_000001/PyGlue.zip/awsglue/dynamicframe.py", line 420, in resolveChoice
File "/mnt/yarn/usercache/root/appcache/application_1600045621100_0001/container_1600045621100_0001_02_000001/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
File "/mnt/yarn/usercache/root/appcache/application_1600045621100_0001/container_1600045621100_0001_02_000001/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
File "/mnt/yarn/usercache/root/appcache/application_1600045621100_0001/container_1600045621100_0001_02_000001/py4j-0.10.4-src.zip/py4j/protocol.py", line 327, in get_return_value
py4j.protocol.Py4JError: An error occurred while calling o128.resolveChoice
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:41623)
It seems to be failing on this line.
resolvechoice = ResolveChoice.apply(frame = applymapping, choice = "make_struct", transformation_ctx = "resolvechoice")
This script is currently working in my pre-production AWS env but not in my production env.

Related

Using VS Code to debug python files. Exception thrown on breakpoint and breakpoint is ignored

Tried with multiple different python files. Every time I try to use the debugger in vs code and set breakpoints the breakpoint gets ignored and exception gets raised and the script continues on. I've been googling and tinkering for over 2 hours and can't seem to figure out what's going on here. Tried rebooting PC, running vs code as admin, uninstall/reinstall the python extension for vs code. Tried to dig into the files mentioned in the traceback and pinpointed the function that seems to be raising the exception but I can't figure out where it's being called from or why it's raising the exception. I'm still new-ish to Python. Debugging works properly on my laptop but for whatever reason my desktop is having this issue.
Traceback (most recent call last):
File "c:\Users\Joel\.vscode\extensions\ms-python.python-2020.7.96456\pythonFiles\lib\python\debugpy\_vendored\pydevd\pydevd_file_utils.py", line 529, in _original_file_to_client
return cache[filename]
KeyError: 'c:\\users\\joel\\local settings\\application data\\programs\\python\\python37-32\\lib\\runpy.py'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "c:\Users\Joel\.vscode\extensions\ms-python.python-2020.7.96456\pythonFiles\lib\python\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_comm.py", line 330, in _on_run
self.process_net_command_json(self.py_db, json_contents)
File "c:\Users\Joel\.vscode\extensions\ms-python.python-2020.7.96456\pythonFiles\lib\python\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_process_net_command_json.py", line 190, in process_net_command_json
cmd = on_request(py_db, request)
File "c:\Users\Joel\.vscode\extensions\ms-python.python-2020.7.96456\pythonFiles\lib\python\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_process_net_command_json.py", line 771, in on_stacktrace_request
self.api.request_stack(py_db, request.seq, thread_id, fmt=fmt, start_frame=start_frame, levels=levels)
File "c:\Users\Joel\.vscode\extensions\ms-python.python-2020.7.96456\pythonFiles\lib\python\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_api.py", line 214, in request_stack
if internal_get_thread_stack.can_be_executed_by(get_current_thread_id(threading.current_thread())):
File "c:\Users\Joel\.vscode\extensions\ms-python.python-2020.7.96456\pythonFiles\lib\python\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_comm.py", line 661, in can_be_executed_by
py_db, self.seq, self.thread_id, frame, self._fmt, must_be_suspended=not timed_out, start_frame=self._start_frame, levels=self._levels)
File "c:\Users\Joel\.vscode\extensions\ms-python.python-2020.7.96456\pythonFiles\lib\python\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_net_command_factory_json.py", line 213, in make_get_thread_stack_message
py_db, frames_list
File "c:\Users\Joel\.vscode\extensions\ms-python.python-2020.7.96456\pythonFiles\lib\python\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_net_command_factory_xml.py", line 175, in _iter_visible_frames_info
new_filename_in_utf8, applied_mapping = pydevd_file_utils.norm_file_to_client(filename_in_utf8)
File "c:\Users\Joel\.vscode\extensions\ms-python.python-2020.7.96456\pythonFiles\lib\python\debugpy\_vendored\pydevd\pydevd_file_utils.py", line 531, in _original_file_to_client
translated = _path_to_expected_str(get_path_with_real_case(_AbsFile(filename)))
File "c:\Users\Joel\.vscode\extensions\ms-python.python-2020.7.96456\pythonFiles\lib\python\debugpy\_vendored\pydevd\pydevd_file_utils.py", line 221, in _get_path_with_real_case
return _resolve_listing(drive, iter(parts))
File "c:\Users\Joel\.vscode\extensions\ms-python.python-2020.7.96456\pythonFiles\lib\python\debugpy\_vendored\pydevd\pydevd_file_utils.py", line 184, in _resolve_listing
dir_contents = cache[resolved_lower] = os.listdir(resolved)
PermissionError: [WinError 5] Access is denied: 'C:\\Users\\Joel\\Local Settings'
So I get this traceback every time a breakpoint is hit. Taking a peek at the "_original_file_to_client" function in "pydevd_file_utils.py" we get this:
def _original_file_to_client(filename, cache={}):
try:
return cache[filename]
except KeyError:
translated = _path_to_expected_str(get_path_with_real_case(_AbsFile(filename)))
cache[filename] = (translated, False)
return cache[filename]
I wasn't able to figure out where this function was being called from or what the expected output was supposed to be. Any help with this would be greatly appreciated!
Edit: Forgot to mention I'm using Windows 10 if it wasn't obvious from the trace
This is a similar question. The spaces in the filename cause this problem:
"Local Settings", "application data"

MISP Threat Indicators to Azure sentinel - Python Script issue

I am trying to push the threat indicators from my MISP instance to azure sentinel by using azure's default documentation on github : https://github.com/microsoftgraph/security-api-solutions/tree/master/Samples/MISP
I performed the steps as per documentation however python3 script.py is giving me following error:
Traceback (most recent call last):
File "script.py", line 100, in <module>
main()
File "script.py", line 96, in main
request_manager.handle_indicator(request_body)
File "/var/azure/sentinel/security-api-solutions/Samples/MISP/RequestManager.py", line 197, in handle_indicator
self._post_to_graph()
File "/var/azure/sentinel/security-api-solutions/Samples/MISP/RequestManager.py", line 184, in _post_to_graph
self._log_post(response)
File "/var/azure/sentinel/security-api-solutions/Samples/MISP/RequestManager.py", line 98, in _log_post
if len(response['value']) > 0:
KeyError: 'value'
This is calling inbuilt method in RequestManager.py for posting the indicators to Graph API
Don't know the answer to your Python questions but have you tried using the Threat Intelligence Platform connector directly against your app? It is in public preview right now.

Using Celery (python) & RabbitMQ to save results of tasks to a specific file location?

Hey guys I'm working on a prototype for a project at my school (I'm a research assistant so this isn't a graded project). I'm running celery on a server cluster (with 48 workers/cores) which is already setup and working. The nutshell of my project is that we want to use celery for some number crunching of a rather large amount of files/tasks.
Because of this it is very important that we save results to an actual file, we have gigs upon gigs of data and it WON'T fit in RAM while running the traditional task queue/backend.
Anyways...
My prototype (with a trivial add function):
task.py
from celery import Celery
app=Celery()
#app.task
def mult(x,y):
return x*y
And this works great when I execute: $ celery worker -A task -l info
But if I try and add a new backend:
from celery import Celery
app=Celery()
app.conf.update(CELERY_RESULT_BACKEND = 'file://~/Documents/results')
#app.task
def mult(x,y):
return x*y
I get a rather large error:
[2017-08-04 13:22:18,133: CRITICAL/MainProcess] Unrecoverable error:
AttributeError("'NoneType' object has no attribute 'encode'",)
Traceback (most recent call last):
File "/home/bartolucci/anaconda3/lib/python3.6/site-packages/kombu/utils/objects.py", line 42, in __get__
return obj.__dict__[self.__name__]
KeyError: 'backend'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/bartolucci/anaconda3/lib/python3.6/site- packages/celery/worker/worker.py", line 203, in start
self.blueprint.start(self)
File "/home/bartolucci/anaconda3/lib/python3.6/site-packages/celery/bootsteps.py", line 115, in start
self.on_start()
File "/home/bartolucci/anaconda3/lib/python3.6/site-packages/celery/apps/worker.py", line 143, in on_start
self.emit_banner()
File "/home/bartolucci/anaconda3/lib/python3.6/site-packages/celery/apps/worker.py", line 158, in emit_banner
' \n', self.startup_info(artlines=not use_image))),
File "/home/bartolucci/anaconda3/lib/python3.6/site-packages/celery/apps/worker.py", line 221, in startup_info
results=self.app.backend.as_uri(),
File "/home/bartolucci/anaconda3/lib/python3.6/site-packages/kombu/utils/objects.py", line 44, in __get__
value = obj.__dict__[self.__name__] = self.__get(obj)
File "/home/bartolucci/anaconda3/lib/python3.6/site-packages/celery/app/base.py", line 1183, in backend
return self._get_backend()
File "/home/bartolucci/anaconda3/lib/python3.6/site-packages/celery/app/base.py", line 902, in _get_backend
return backend(app=self, url=url)
File "/home/bartolucci/anaconda3/lib/python3.6/site-packages/celery/backends/filesystem.py", line 45, in __init__
self.path = path.encode(encoding)
AttributeError: 'NoneType' object has no attribute 'encode'
I am only 2 days into this project and have never worked with celery (or a similar library) before (I come from the algorithmic, mathy side of the fence). I currently wrangling with celery's user guide docs, but they're honestly pretty sparse on this detail.
Any help is much appreciated and thank you.
Looking at the celery code for filesystem backed result backend here.
https://github.com/celery/celery/blob/master/celery/backends/filesystem.py#L54
Your path needs to start with file:/// (3 slashes)
Your settings has it starting with file:// (2 slashes)
You might also want to use the absolute path instead of the ~.

Tweepy issues with twitter bot and python

I have a few twitterbots that I run on my raspberryPi. I have most functions wrapped in a try / except to ensure that if something errors it doesn't break the program and continues to execute.
I'm also using Python's Streaming library as my source of monitoring for the tags that I want the bot to retweet.
Here is an issue that happens that kills the program although I have the main function wrapped in a try/except:
Unhandled exception in thread started by <function startBot5 at 0x762fbed0>
Traceback (most recent call last):
File "TwitButter.py", line 151, in startBot5
'<botnamehere>'
File "/home/pi/twitter/bots/TwitBot.py", line 49, in __init__
self.startFiltering(trackList)
File "/home/pi/twitter/bots/TwitBot.py", line 54, in startFiltering
self.myStream.filter(track=tList)
File "/usr/local/lib/python3.4/dist-packages/tweepy/streaming.py", line 445, in filter
self._start(async)
File "/usr/local/lib/python3.4/dist-packages/tweepy/streaming.py", line 361, in _start
self._run()
File "/usr/local/lib/python3.4/dist-packages/tweepy/streaming.py", line 294, in _run
raise exception
File "/usr/local/lib/python3.4/dist-packages/tweepy/streaming.py", line 263, in _run
self._read_loop(resp)
File "/usr/local/lib/python3.4/dist-packages/tweepy/streaming.py", line 313, in _read_loop
line = buf.read_line().strip()
AttributeError: 'NoneType' object has no attribute 'strip'
My setup:
I have a parent class TwitButter.py, that creates an object from the TwitBot.py. These objects are the bots, and they are started on their own thread so they can run independently.
I have a function in the TwitBot that runs the startFiltering() function. It is wrapped in a try/except, but my except code is never triggered.
My guess is that the error is occurring within the Streaming library. Maybe that library is poorly coded and breaks on the line that is specified at the bottom of the traceback.
Any help would be awesome, and I wonder if others have experienced this issue?
I can provide extra details if needed.
Thanks!!!
This actually is problem in tweepy that was fixed by github #870 in 2017-04. So, should be resolved by updating your local copy to latest master.
What I did to discover that:
Did a web search to find the tweepy source repo.
Looked at streaming.py for context on the last traceback lines.
Noticed the most recent change to the file was the same problem.
I'll also note that most of the time you get a traceback from deep inside a Python library, the problem comes from the code calling it incorrectly, rather than a bug in the library. But not always. :)

Parallel Python - RuntimeError: Communication pipe read error

I'm using parallel python to run multiple dynamic simulations using a module called OrcFxAPI. The program works perfectly if it is run as a python program on my machine however if I convert it to an exe file using py2exe and then run I get the following error:
Traceback (most recent call last):
File "Analysis.pyc", Line 500, in multiprocessor
File "pp.pyc", Line 342, in __init__
File "pp.pyc", Line 506, in set_ncpus
File "pp.pyc", Line 140, in __init__
File "pp.pyc", Line 152, in start
File "pptransport.pyc", Line 140, in receive
RuntimeError: Communication pipe read error
It is failing at this line in my program:
job_server = pp.Server(ppservers=ppservers)
but I think it might be something to do with the path used to import the OrcFxAPI module when submitting the job:
job = job_server.submit(max_seastate, (gui_vars, case_list, case, line_info, output_vars), (), ("OrcFxAPI",), callback=finished, callbackargs=(case_no, no_of_cases,))

Categories

Resources