Connecting Spyder to Docker Containers

Connecting Spyder to Docker Containers - python

I've started working at a company that develops code using docker containers, which I have had no experience with thus far. The nature of my work is Data Science-y, and so I find Spyder to be an invaluable tool for such work.
I would like to connect spyder to the docker containers that are being used by my colleagues, but I am not sure how to, or if this is even possible. I was not able to find helpful material on this that I could comprehend.
I considered ditching Spyder in favor of VS Code, as it has the ability to connect to docker containers. But my best attempts at trying to recreate Spyder's functionality in VS Code were only partially successful.
Given the popularity of both Spyder and Docker, I thought this would be a straightforward thing to do. Anyway, I would greatly appreciate any information you might have on this topic. I suppose I could consider other IDEs if you are aware of any that can do this. The key features I need are the ability to launch an interactive python environment which allows me to run scripts in the docker, keep the variables stored after the script runs, using these variables to find where things are going wrong and easily create plots, and possibly also have access to a debugger like Spyder's.
I obviously don't want to bloat the Dockerfile and install Spyder inside the container, I'd like something to run on the outside but be able to connect into the docker container and use the python environment defined there.
The following two links were not helpful to me:
Connect Spyder to a console in a docker container on a remote host
Connecting Spyder to Remote Jupyter Notebook in a Docker Container

I'm not sure anyone cares, but I can briefly describe my solution. You can set up ssh in the docker container and then use Spyder's "connect to an existing kernel" feature to use Spyder as a sort of frontend for an ipython kernel running in the docker container. This feature allows you to connect to remote ipython kernels through ssh. It's a bit of an annoyance to setup, so if you can get by with just ipython sessions and ipdb inside the container, that's probably the way to go. But if you're really stuck trying to debug something and want the full Spyder frontend, this works reliably, in my limited experience.
If you do try this, just note that some versions of Spyder seem to simply crash anytime you try to connect to any existing kernel, in a docker container or not. So if the latest version doesn't work, try some other ones...
EDIT: I'm now especially not sure anyone cares, but I've abandoned Spyder since posting this question. It took too long to actually start getting work done when trying to rely on Spyder inside docker, which itself was buggy in its ability to connect to remote kernels and with respect to its debugger functionality. I got inconsistent experiences on Windows/Ubuntu. Instead, if I don't need to do any data visualizations, I just use ipython in the container.
If I need to do data visualizations, I usually do whatever I need to do in docker to get the data I need saved to a file. Then, I have a conda environment that I use to analyze the data in the file. Even outside of docker I've stopped using Spyder. Instead I write my files in vim, run them at the command line, and I use breakpoint() in my code for debugging; I set export PYTHONBREAKPOINT=ipdb.set_trace or IPython.embed depending on if I want ipdb to step around through the code, or if I just need to try some code out at a particular spot in the program in an interactive shell with all the program's variables defined at that point. In fact, you can launch an ipython shell from ipdb by issuing from IPython import embed; embed(). This has been working well for me, and may work for others depending on the nature of the work/company. It is very much freeing not having to rely on bulky tools like Spyder.

Related

Using plotly in WSL changes the font (console window changes to raster), and won't work at all in VSCode

I've attempted to work a bit more in WSL recently (I've got the most up-to-date version of WSL2 and the Windows 11 insider beta, both of which I updated today)
Everything works great! But plotly has been giving me issues. When I run it from within VSCode (making sure Python Interpreter is set to my correct environment), it spits out the following error:
tcgetpgrp failed: Not a tty
It then opens a tab in my default browser, but it just hangs until eventually failing to connect
Alternatively, if I run it directly from the WSL console (no VScode), it still gives the same error as above, but it DOES correctly open a window in my web browser. It also, for some reason, changes the font of the console?
I'm not 100% sure what the problem is here. I've used WSL for awhile, and never had any issues with displaying plots and things as needed (though, historically, I've used matplotlib... this is the first time I've tried using plotly, but I've used it without problem on native linux and native windows).
Has anyone else had this issue? Or one similar to it? Any idea on what might be wrong?

Figured this out after a bit of reading on other projects that use plotly. The fix is actually very simple, and just requires adding:
export BROWSER="/mnt/c/path/to/browser.exe"
To your ~/.bashrc file. For example, for me this was:
export BROWSER="/mnt/c/Program Files/Google/Chrome/Application/chrome.exe"

How to debug a python script launched by a third party app

I'm using Linux Eclipse (pydev) as IDE to develop python scripts that are launched by an application written in C++. I can debug the python script without problems in the IDE, but the environment is not real (the C++ program sends and receives messages through the stdin/stdout and it's a complex communication channel that I can't fully reproduce writing the messages by hand).
Until now I was using log messages to debug (poor man's debug) but it's getting too complex. When I do something similar in PHP I can just leave xdebug listening and add breakpoints in Netbeans. Very neat and easy. Is it possible to do something like that in Python 3.X (with Eclipse or other IDE)?
NOTE: I know there is a Pydev / Attach to Process functionality, but it doesn't work. Always fails to attach.
NOTE2: There is also a built-in "breakpoint()" in Python 3.7 but it links to a debugger and if also fails, the IDE never gets the control.

After some research, this is the best option I have found. Without any other solution provided, I post it just in case anyone has the same problem.
Python has an integrated debugger: pdb. It works as a module and it doesn't allow to use it if you don't have the window control (i.e. you launch the script).
To solve this there are some coders that have created modules that add a layer on pdb. I have tried some and the most easy and still visual interesting is rpudb (but have a look also to this).
To install it:
pip3 install https://github.com/msbrogli/rpudb/archive/master.zip
(if you install it using the pip3 install rpudb command it will install an old version only valid for python 2)
Then, you use it just adding an import and a function call:
import rpudb
.....
rpudb.set_trace('127.0.0.1', 4444)
.....
Launch the program and it will stop in the set_trace call. To debug it (and continue) open a terminal and launch a telnet like this:
telnet 127.0.0.1 4444
You will have a visual debugger in front of you with the advantage that you can not only debug local programs, but also remote (just change the IP).

I was able to attach PyCharm to a running python process and use break points using PyCharm attach to process
I created a bash script which exec a python script, should work the same with C++

Why don't the ipython env variables match the bash env in the associated terminal emulator?

Lately I have been doing some interactive work in Python.
My setup is an IPython notebook running on a server that uses a grid engine to manage jobs.
Today I was trying to get an IPython cluster going following an example posted here that uses subprocess.Popen to start a the cluster.
I couldn't get the example to work so I tried opening up the IPython/Jupyter terminal emulator and typing in the ipcluster start command and the cluster started right up!
After playing around with things for a while I realized that if I typed env in the terminal emulator I got a different list of environment variables than when I looked at the os.environ variable in Python. The source of the problem seemed to be that the PATH variables were different.
Now I know that I can change the PATH variable in os.environ, but I'm wondering why it is different in the first place? I know very little about environment variables, so this may be a stupid question, but I would have assumed that a terminal emulator and notebook running on the exact same node in the exact same IPython notebook server would have had the exact same environment variables.
Any insight on why the environment variables in the terminal and notebook might be different will be greatly appreciated.
Update: In case it matters, the server I am working on uses the Univa Grid Engine. Also I have noticed that it seems to make a difference whether I use qrsh or qsub to start the notebook server.
Previously I had been using qsub, but by starting the notebook server with qrsh I eliminate many of the differences between env and os.environ. There are still differences, but much fewer. Still not sure what any of that means though:)

As per manual page for qsub, qsh, qrsh, to propagate current shell environment to the job use the -V option:
-V Available for qsub, qsh, qrsh with command and qalter.
Specifies that all environment variables active within the qsub utility be exported to the context of the job.
All environment variables specified with -v, -V or the DISPLAY variable provided with -display will be exported to the defined JSV instances only optionally when this is
requested explicitly during the job submission verification. (see -jsv option above or find more information concerning JSV in jsv(1))

Python IDE on Linux Console

This may sound strange, but I need a better way to build python scripts than opening a file with nano/vi, change something, quit the editor, and type in python script.py, over and over again.
I need to build the script on a webserver without any gui. Any ideas how can I improve my workflow?

put this line in your .vimrc file:
:map <F2> :w\|!python %<CR>
now hitting <F2> will save and run your python script

You should give the screen utility a look. While it's not an IDE it is some kind of window manager on the terminal -- i.e. you can have multiple windows and switch between them, which makes especially tasks like this much easier.

You can execute shell commands from within vim.

Using emacs with python-mode you can execute the script with C-c C-c

you can try ipython. using its edit command, it will bring up your editor (nano/vim/etc), you write your script, and then on exiting you're returned to the ipython prompt and the script is automatically executed.

When working with Vim on the console, I have found that using "tabs" in Vim, instead of having multiple Vim instances suspended in the background, makes handling multiple files in Vim more efficient. It takes a bit of getting used to, but it works really well.

You could run XVNC over ssh, which is actually passably responsive for doing this sort of thing and gets you a windowing GUI. I've done this quite effectively over really asthmatic Jetstart DSL services in New Zealand (128K up/ 128K down =8^P) and it's certainly responsive enough for gvim and xterm windows. Another option would be screen, which lets you have multiple textual sessions open and switch between them.

There are actually 2 questions. First is polling for a console IDE for python and the second is a better dev/test/deploy workflow.
For while there are many ways you can write python code in the console, I find a combination of screen, vim and python/ipython is the best as they are usually available on most servers. If you are doing long sessions, I find emacs + python-mode typically involves less typing.
For a better workflow, I would suggest setting up a development environment. You can easily setup a Linux VM on your desktop/laptop easily these days - there isn't a excuse not to even if it's for hobby projects. That opens up a much larger selection of IDEs available to you, such as:
GUI versions of VI and friends
Remote file editing with tramp and testing locally with python-mode inside Emacs
http://www.netbeans.org
and of course http://eclipse.org with the PyDev plugin
I would also setup a SCM to keep track of changes so that you do
better QA and use it to deploy tested changes onto the server.
For example I use Mercurial for my pet projects and I simply tag my repo when it's ready and update the production server to the tag when I deploy. On the devbox, I do:
(hack hack hack, test test test)
hg ci -m 'comment'
hg tag
hg push
Then I jump onto the server and do the following when I deploy:
hg update
restart service/webserver as needed

Well, apart from using one of the more capable console editors (Emacs or vi would come to mind), why do you have to edit it on the web server itself? Just edit it remotely if constant FTP/WebDAV transfer would seem to cumbersome.
Emacs has Tramp Mode, gedit on Linux and bbedit on the Mac support remote editing, too. Probably quite a large number of other editors. In that case you would just edit in on a more capable desktop and restart the script from a shell window.

For what it's worth, VIM alone can do the same tasks as previously posted. I have had the same problem with testing Python from the command line.
My solution was to use the screen command. I split screens vertically, I run Python in one instance of a shell, and on the second screen, I usually edit Python code with VIM.
Command to install screen:
sudo apt-get install screen
The screen package has a bit of a learning curve but there isn't any mystery if you can remember the "Ctrl-Alt ?" command that contains all knowledge.
No GUI is required!

How can I make a fake "active session" for gconf?

I've automated my Ubuntu installation - I've got Python code that runs automatically (after a clean install, but before the first user login - it's in a temporary /etc/init.d/ script) that sets up everything from Apache & its configuration to my personal Gnome preferences. It's the latter that's giving me trouble.
This worked fine in Ubuntu 8.04 (Hardy), but when I use this with 8.10 (Intrepid), the first time I try to access gconf, I get this exception:
Failed to contact configuration server; some possible causes are that you need to enable TCP/IP networking for ORBit, or you have stale NFS locks due to a system crash. See http://www.gnome.org/projects/gconf/ for information. (Details - 1: Not running within active session)
Yes, right, there's no Gnome session when this is running, because the user hasn't logged in yet - however, this worked before; this appears to be new with Intrepid's Gnome (2.24?).
Short of modifying the gconf's XML files directly, is there a way to make some sort of proxy Gnome session? Or, any other suggestions?
(More details: this is python code that runs as root, but setuid's & setgid's to be me before setting my preferences using the "gconf" module from the python-gconf package.)

I can reproduce this by installing GConf 2.24 on my machine. GConf 2.22 works fine, but 2.24 breaks it.
GConf is failing to launch because D-Bus is not running. Manually spawning D-Bus and the GConf daemon makes this work again.
I tried to spawn the D-Bus session bus by doing the following:
import dbus
dummy_bus = dbus.SessionBus()
...but got this:
dbus.exceptions.DBusException: org.freedesktop.DBus.Error.Spawn.ExecFailed: dbus-launch failed to autolaunch D-Bus session: Autolaunch error: X11 initialization failed.
Weird. Looks like it doesn't like to come up if X isn't running. To work around that, start dbus-launch manually (IIRC use the os.system() call):
$ dbus-launch
DBUS_SESSION_BUS_ADDRESS=unix:abstract=/tmp/dbus-eAmT3q94u0,guid=c250f62d3c4739dcc9a12d48490fc268
DBUS_SESSION_BUS_PID=15836
You'll need to parse the output somehow and inject them into environment variables (you'll probably want to use os.putenv). For my testing, I just used the shell, and set the environment vars manually with export DBUS_SESSION_BUS_ADDRESS=blahblah..., etc.
Next, you need to launch gconftool-2 --spawn with those environment variables you received from dbus-launch. This will launch the GConf daemon. If the D-Bus environment vars are not set, the daemon will not launch.
Then, run your GConf code. Provided you set the D-Bus session bus environment variables for your own script, you will now be able to communicate with the GConf daemon.
I know it's complicated.
gconftool-2 provides a --direct option that enables you to set GConf variables without needing to communicate with the server, but I haven't been able to find an equivalent option for the Python bindings (short of outputting XML manually).
Edit: For future reference, if anybody wants to run dbus-launch from within a normal bash script (as opposed to a Python script, as this thread is discussing), it is quite easy to retrieve the session bus address for use within the script:
#!/bin/bash
eval `dbus-launch --sh-syntax`
export DBUS_SESSION_BUS_ADDRESS
export DBUS_SESSION_BUS_PID
do_other_stuff_here

Well, I think I understand the question. Looks like your script just needs to start the dbus daemon, or make sure its started. I believe "session" here refers to a dbus session. (here is some evidence), not a Gnome session. Dbus and gconf both run fine without Gnome.
Either way, faking an "active session" sounds like a pretty bad idea. It would only look for it if it needed it.
Perhaps we could see the script in a pastebin? I should have really seen it before making any comment.

Thanks, Ali & Jeremy - both your answers were a big help. I'm still working on this (though I've stopped for the evening).
First, I took the hint from Ali and was trying part of Jeremy's suggestion: I was using dbus-launch to run "gconftool-2 --spawn". It didn't work for me; I now understand why (thx, Jeremy) -- I was trying to use gconf from within the same python program that was launching dbus & gconftool, but its environment didn't have the environment variables - duh.
I set that strategy aside when I noticed gconftool-2's --direct option; internally, gconftool-2 is using API that isn't exposed by the gconf python bindings. So, I modified python-gconf to expose the extra method, and once that builds (I had some unrelated problems getting this to work), we'll see if that fixes things - if it doesn't (and maybe if it does, because building those bindings seems to build all of gnome!), I'll find a better way to manage the environment variables in that first strategy.
(I'll add another answer here tomorrow either way)
And it's the next day: I ran into a little trouble with my modified python-gconf, which inspired me to try Jeremy's simpler idea, which worked fine - before doing the first gconf operation, I simply ran "dbus-launch", parsed the resulting name-value pairs, and added them directly to python's environment. Having done that, I ran "gconftool-2 --spawn". Problem solved.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.