How to Debug Apache Storm Python


I have Apache Storm set up in IntelliJ to run in local mode. I can run the starter topologies just fine. However, I'd like to know how to debug Python bolts. As a simple example, how would one debug splitsentence.py for the WordCountTopology?

If you have PyDev installed (or you don't mind installing it), you can debug remote applications by following these instructions.
PyDev is quite OK if you have a Java background, since it is basically Eclipse. Installing it is fairly straightforward if you follow this guide.
On my machine, remote debugging works for local processes. I have PyDev installed over Eclipse Mars.
(I don't think this is important, but in my case I have two different installations of Eclipse on my machine, one for Java and one for PyDev.)
Hope it helps.
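
For a bolt such as splitsentence.py, the PyDev remote debugger is usually attached by calling pydevd.settrace near the top of the script. The following is only a minimal sketch under a few assumptions: the debug server is already listening on localhost:5678, the pydevd module is importable from the bolt's process, and the environment variable STORM_PY_DEBUG is a hypothetical switch invented for this example. Because Storm's multilang protocol talks to the bolt over stdin/stdout, the sketch leaves pydevd's output redirection options at their defaults.

```python
import os


def maybe_attach_debugger(env=None, host="localhost", port=5678):
    """Connect to a waiting PyDev debug server, but only when enabled.

    STORM_PY_DEBUG is a hypothetical variable name used for illustration.
    Returns True if settrace was called, False if debugging is disabled.
    """
    env = os.environ if env is None else env
    if env.get("STORM_PY_DEBUG") != "1":
        return False
    # Imported lazily so the bolt still runs where pydevd is not installed.
    import pydevd
    # suspend=True pauses the bolt here until the IDE resumes it.
    pydevd.settrace(host, port=port, suspend=True)
    return True
```

Calling maybe_attach_debugger() as the first statement of the bolt keeps normal runs unaffected and only blocks on the IDE when the switch is set.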

I can only give a "high level" answer:
Using Storm's multilang feature results in forking off a new process that executes the external code. Thus, a new java.lang.UNIXProcess is started that executes the python command as specified in WordCountTopology:
public SplitSentence() { super("python", "splitsentence.py"); }
You need to start a remote debug session and attach to this process from within Eclipse. However, as I am not familiar with Python, I don't know how to remote debug Python in Eclipse.
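
To illustrate why a separate debug session is needed, here is a small sketch that mimics what the answer describes: a parent process launches a Python command as a child process. It uses sys.executable as a stand-in for the "python" command and a one-line script as a stand-in for the bolt; both are assumptions made for the example, not Storm code.

```python
import os
import subprocess
import sys


def spawn_child_and_get_pid():
    """Launch a child interpreter and return the PID it reports."""
    completed = subprocess.run(
        [sys.executable, "-c", "import os; print(os.getpid())"],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(completed.stdout.strip())


if __name__ == "__main__":
    child_pid = spawn_child_and_get_pid()
    # The child has its own PID, so a debugger attached to the parent
    # (the JVM, in Storm's case) does not see the Python code at all.
    print("parent:", os.getpid(), "child:", child_pid)
```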

Related

Debugging any Known Python File Ran by an Interpreter

I'm trying to debug any Python script that an interpreter runs, as long as I have a reference to that script. That is, if my connected interpreter runs a script called abc.py, and in my script directory I have abc.py with breakpoints attached, the IDE should automatically stop execution at that breakpoint.
I'm using PyCharm, but I'd like to know the theory here, so that if I ever want to connect VS Code I'd be able to do that as well. Additionally, I'm currently connecting to a Docker container running Airflow.
Given the above, I'm assuming that the goal is to do a "remote" debug.
Also, since Python is a scripting language and is run by an interpreter, I am assuming that if I can hook into the interpreter, and if PyCharm can match the file run by the interpreter, then it should be able to pause the execution.
I am additionally assuming that the interpreter can run in "normal" mode, not in a debug mode as we have in Java.
I have read three approaches:
ssh interpreter to my Docker container - seems most promising for my current goal, but unsure if it'll work
using Python debug server (Debugging Airflow Tasks with IDE tools?) - still requires manual changes in the specific scripts
using Docker interpreter (https://medium.com/#andrewhharmon/apache-airflow-using-pycharm-and-docker-for-remote-debugging-b2d1edf83d9d) - still requires individual debug configs for executing a single DAG / script
Is debugging any file executed by a python interpreter possible, at least in theory?
Is it possible remotely?
Is it possible using airflow at all?
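
On the theory side, CPython exposes a tracing hook, sys.settrace, which the standard library's bdb and pdb are built on. The sketch below shows how a breakpoint can be matched by file name and line number from inside a normally started interpreter. It is a simplified illustration, not how PyCharm or VS Code implement their debuggers: the function names are made up for this example, and instead of pausing it only records the local variables at the breakpoint.

```python
import sys


def target(x):
    y = x + 1  # first body line
    return y * 2  # second body line, used as the breakpoint


def run_with_breakpoints(func, args, breakpoints):
    """Run func(*args) and record locals at each (filename, lineno) hit."""
    hits = []

    def tracer(frame, event, arg):
        if event == "line":
            key = (frame.f_code.co_filename, frame.f_lineno)
            if key in breakpoints:
                # A real debugger would block here and talk to the IDE.
                hits.append((frame.f_lineno, dict(frame.f_locals)))
        return tracer  # keep tracing inside this frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)
    return result, hits


if __name__ == "__main__":
    code = target.__code__
    breakpoint_at_return = {(code.co_filename, code.co_firstlineno + 2)}
    print(run_with_breakpoints(target, (3,), breakpoint_at_return))
```

The key point for the question is that the hook has to be installed from inside the process being debugged, which is why the approaches listed above all involve either changing the script or controlling how the interpreter is launched.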

How to connect local PyCharm to python installed on a server? Is this even possible?

I have PyCharm on my machine (8 GB RAM). I am required to do heavy data processing and would like to use an institutionally provided server. This server has Python installed, but no IDE, so all I see is a CUI, and it is difficult to program in such an environment. Also note that I cannot ask the server admin to install software on the server for me. So, how can one connect a local PyCharm to Python installed on a remote server? Is this even possible?

You can configure an interpreter using SSH:
Open the Add Python Interpreter dialogue
In the left-hand pane of the Add Python Interpreter dialogue, click SSH Interpreter.
Follow the wizard.
For more detailed instructions, check:
https://www.jetbrains.com/help/pycharm/configuring-remote-interpreters-via-ssh.html
Note: unfortunately, this option is not available in the PyCharm Community Edition.

remote debugging in pycharm, debugging a subprocess

Good Day!
I have a script that runs on Python 3.5. It spawns a subprocess that runs a Java application.
subprocess.run(["/usr/bin/java", "-jar", <pathToMyJar>])
This Java application internally invokes some of my Python scripts, which run on Jython 2.5.
I want to debug those Jython scripts, so I enable remote debugging at the start of my Jython script with the following code:
import os
import sys
sys.path.append(os.path.join(libspath, "pycharm-debug.egg"))
import pydevd
pydevd.settrace('localhost', port=9999, stdoutToserver=True, stderrToServer=True, suspend=True)
I have created a debug server in PyCharm with the same host and port as above. Every time before running my script I start the server in PyCharm, but I am unable to debug my Jython scripts. PyCharm shows "Waiting for process connection...", and after that nothing happens.
What is wrong with my approach? Is there anything I'm missing here?
I'm using Pycharm-2018.1.2 professional version on ubuntu.
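
One way to narrow a problem like this down is to check, from the machine and user account that run the Jython script, whether the debug server's port is reachable at all. The helper below is a hypothetical diagnostic written for CPython 3, not part of pydevd or PyCharm, and the probe itself opens a short-lived connection that the IDE may briefly report. Separately, it is worth checking the keyword spelling in the settrace call: the pydevd versions I am aware of name the parameter stdoutToServer, so stdoutToserver as written may be rejected before any connection is attempted, depending on the version.

```python
import socket


def debug_server_reachable(host="localhost", port=9999, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


if __name__ == "__main__":
    print("debug server reachable:", debug_server_reachable())
```

If this returns False while PyCharm shows that it is waiting, the script and the IDE are probably not talking about the same host or port.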

pydev remote debug path

I want to use Eclipse with PyDev to remote debug my Python script. The Python script is on a remote Ubuntu server, and Eclipse/PyDev is running on my Windows 7 machine.
I followed every step in this guide:
http://pydev.org/manual_adv_remote_debugger.html
The problem is in the last step, configuring the path in pydevd_file_utils.py on the server: it does not seem to recognize the change. This is what I changed:
PATHS_FROM_ECLIPSE_TO_PYTHON = [(r'c:\EZ_Green\plugins', r'/home/jiechao/EZ_Green/plugins')]
When I run the script, it gives me this error:
pydev debugger: warning: trying to add breakpoint to file that does not exist: /home/jiechao/EZ_Green/plugins/D:/EZ Green/backend/getData.py (will have no effect)
It seems the change is not applied. Has anyone done this before, or does anyone have any ideas?
Thanks a lot
-----------------update 1--------------
I solved the previous problem, and now there is a new one.
This is the output of the program, and it seems the path configuration is correct.
Debug Server at port: 5678
pydev debugger: replacing to server: D:\EZ Green\Product\EZ_Green\plugins\test.py
pydev debugger: sent to server: /home/jiechao/EZ_Green/plugins\test.py
pydev debugger: replacing to client: /home/jiechao/EZ_Green/plugins/test.py
pydev debugger: sent to client: D:\EZ Green\Product\EZ_Green\plugins/test.py
But Eclipse does not stop at the breakpoint, not even at pydevd.settrace().
I have no idea why it does not stop.
When I use remote debugging on my local machine, it works pretty well. When I try to debug on a remote server, it does not work. I don't know what the problem is.
------------------update 2---------------------
Problem solved. The copies of the script on my client and on the server turned out to be slightly different, so the debugger did stop, but I did not see it at the breakpoint I expected.
I am so stupid!
Thanks anyway.
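
The idea behind PATHS_FROM_ECLIPSE_TO_PYTHON is prefix translation: a path that starts with the client-side prefix is rewritten to start with the server-side prefix. The sketch below illustrates that idea only; it is not pydevd's actual implementation, and the function name is made up for this example. It also shows why the first mapping above had no effect: the file being debugged was under D:\EZ Green, which does not start with c:\EZ_Green\plugins.

```python
def client_to_server(path, mappings):
    """Translate a Windows client path to a POSIX server path by prefix."""
    normalized = path.replace("\\", "/")
    for client_prefix, server_prefix in mappings:
        prefix = client_prefix.replace("\\", "/").rstrip("/")
        lowered = normalized.lower()
        # Windows paths are compared case-insensitively.
        if lowered == prefix.lower() or lowered.startswith(prefix.lower() + "/"):
            return server_prefix.rstrip("/") + normalized[len(prefix):]
    return normalized  # no mapping matched; path is passed through


if __name__ == "__main__":
    mappings = [(r"D:\EZ Green\Product\EZ_Green\plugins",
                 "/home/jiechao/EZ_Green/plugins")]
    print(client_to_server(r"D:\EZ Green\Product\EZ_Green\plugins\test.py",
                           mappings))
```

Even with a correct mapping, line-based breakpoints only line up when the client and server copies of the file are identical, which is what update 2 ran into.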

Even though it is possibly not the exact approach you expect,
one option is to start the unit test from the command line and attach the debugger through the remote debug server via 'pydevd.py'.
This is now a fully automated option of ePyUnit, which automates remote debugging with PyDev and Eclipse via 'pydevd.py'. It works seamlessly for subprocesses as well as for independently started command-line processes.
The hostname and the port number can be changed as required; the default is
localhost:5678.
See:
https://pypi.python.org/pypi/epyunit
https://pythonhosted.org/epyunit/
For basics of remote debugging:
http://www.pydev.org/manual_adv_remote_debugger.html
It also provides enhanced unittest integration into PyUnit.
Comments and fixes are welcome.
Have fun.

How to debug a remote python application with (Python Tools for) Visual Studio?

According to http://pytools.codeplex.com/, PTVS supports "Local and remote debugging". However, I couldn't find anything related to it.
So I'm curious whether their "remote debugging" is simply attaching to a running process on the same machine, or whether there's actual remote debugging support over TCP/IP. I'd like to use PTVS for a WSGI-based web application running on Apache on another (Linux) machine, but without a proper remote debugger (such as WinPDB, which is not that bad, but something integrated into the IDE would be better) it's not really useful...

Remote debugging for platforms other than Windows was not available until the 2.0 alpha release, but it is possible now; see the documentation or the video tutorial for details.

There are a couple of different ways to get into remote debugging. The main scenario is probably our MPI cluster debugging. There you can create a new MPI project, set it up to launch to the Windows HPC cluster, and we'll deploy everything needed onto the cluster and set up the remote debugging session.
The "deploy everything needed" part, though, can be done on your own for normal remote debugging scenarios. This is more or less just standard VS remote debugging with the addition of having PTVS installed. The basic steps are:
1) Install the Visual Studio remote debugger components on the remote machine
2) Install PTVS onto the remote machine
3) Start the VS remote debugger monitor (msvsmon)
Then you can do Debug->Attach to Process, select the machine, and start debugging.

Yes, for remote debugging you currently need VS + PTVS installed on the remote machine, which implies Windows only. If you want to see this feature implemented, vote for this ticket (which also has a few details on the situation):
http://pytools.codeplex.com/workitem/536
