I am running 36 Jupyter notebooks on an AWS instance. The code generates experimental data based on a probability distribution. I do realize that I cannot multi-thread, as the threads would share the same randomness. I'm not sure whether the same applies if I run the Python scripts separately.
I would like to know whether the Python code running in the background is accessing the same randomness.
Currently I'm using numpy.random. I realize that I could use os.urandom to avoid this problem, but I would rather not, because I would have to write a lot of wrapper code to get the distributions I need.
Thanks in advance
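For reference, numpy's newer Generator API keeps all the usual distribution methods while making independent per-process streams explicit; a minimal sketch (the fixed seed and the 36-way spawn below are purely illustrative):

```python
import numpy as np

# One parent SeedSequence; spawn() produces statistically independent child
# seeds, so each of the 36 notebooks/processes gets its own stream while
# still using numpy's distribution methods.
NOTEBOOK_ID = 0                           # e.g. 0..35, one per notebook
root = np.random.SeedSequence(12345)      # fixed seed for reproducibility
child_seed = root.spawn(36)[NOTEBOOK_ID]
rng = np.random.default_rng(child_seed)

samples = rng.normal(loc=0.0, scale=1.0, size=10)
print(samples)
```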
I have written a Python script which models an academic problem that I wish to publish. I will put the source on GitHub, and some academics who happen to know Python may get the source and play with it themselves. However, there are probably more academics who may be interested in the model but are not Python programmers, and I would like them to be able to run my model too. Even though they are not programmers, they could at least try editing the values of some parameters to see how that affects the results. So my question is: how could I arrange for a non-Python programmer to run a Python program as easily (for them) as possible? I would guess that my options may be...
Google Colab
an online Python compiler like this one
compiling the program into an exe (and letting the user set parameters via a config file)
something else?
So now, a couple of complications that make my problem trickier.
The output of the program is graphical and uses matplotlib. As I understand it, the utilities that turn python scripts into exe files struggle or fail altogether when it comes to matplotlib.
The source is split into two separate files: a small, neat file which contains the model, which the user might like to have a good look at to get the gist of it even if they're not really a Python programmer, and a separate large, ugly file which just handles the graphics. An academic would have no interest in the latter, and I'd like to spare them the gory details.
EDIT: I did ask a related question here, but that was all about programmers who won't mind doing things like installing Python and using pip... this question is about non-programmers who would not be comfortable doing things like that.
Colab can handle both problems, but you may need to adapt some code.
Matplotlib interface: Colab can display plots just fine. But you may want the user to interact with sliders, checkboxes, or dropdown menus. Then you need to use Colab's own Forms UI, or ipywidgets. See an example here.
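For instance, a minimal sketch of an ipywidgets slider driving a matplotlib plot in a Colab/Jupyter cell (run_model here is just a stand-in for whatever the model file exposes):

```python
import numpy as np
import matplotlib.pyplot as plt
from ipywidgets import interact

def run_model(rate):
    # stand-in for the real model: exponential decay at the given rate
    x = np.linspace(0, 10, 200)
    return x, np.exp(-rate * x)

@interact(rate=(0.1, 2.0, 0.1))           # renders a slider above the plot
def show(rate=1.0):
    x, y = run_model(rate)
    plt.plot(x, y)
    plt.ylim(0, 1)
    plt.show()
```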
2 separate Python files: you can convert one of them to a notebook and then import the other, or you can create a new notebook that imports both files. Here's an example.
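For example, the notebook can be reduced to a thin driver cell along these lines (the file and function names below are hypothetical):

```python
# The two .py files sit next to the notebook on Colab/GitHub.
from model import Model              # the small, readable model file
from plotting import plot_results    # the large graphics file, details hidden

m = Model(alpha=0.5, beta=1.2)       # parameters an academic might tweak
results = m.run()
plot_results(results)
```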
Usually the main reason I'm using a Jupyter notebook with Python is the possibility to initialize, once and only once, objects (or generally "data") that have long loading times (let's say more than 30 seconds). When my work is iterative, i.e. I run minimally changed versions of some algorithm multiple times, the accumulated cost of repeated initialization can get large by the end of the day.
I'm seeking an alternative approach (one that avoids the cost of repeated initialization without using a notebook) for the following reasons:
No "out of the box" version control when using notebook.
Occasional problems of "I forgot to rename variable in a single place". Everything keeps working OK until the notebook is restarted.
Usually I want to have usable python module at the end anyway.
Somehow when using a notebook I tend to get code that if far from "clean" (I guess this is more self discipline problem...).
Ideal workflow should allow to perform whole development inside IDE (e.g. pyCharm; BTW linux is the only option). Any ideas?
I'm thinking of implementing a simple (local) execution server that keeps the problematic objects pre-initialized as global variables and runs code on demand (code that uses those globals instead of performing the initialization itself) by spawning a new process each time. This way those objects are protected from modification, and at the same time, since the variables are global, there is no pickle/unpickle penalty when spawning a new process.
But before I start implementing this: is there already a known working solution or workflow?
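To make the idea concrete, a minimal sketch of such a local execution server (using multiprocessing with the fork start method, which is Linux-only but matches the constraint above; the 2-second sleep stands in for the real initialization):

```python
import multiprocessing as mp
import time

BIG_DATA = None                        # the slow-to-initialize object

def preload():
    """Expensive initialization, done exactly once in the parent process."""
    global BIG_DATA
    time.sleep(2)                      # stand-in for a 30+ second load
    BIG_DATA = list(range(1_000_000))

def run_job(job_id):
    # Runs in a forked child: BIG_DATA is already in memory (no pickling),
    # and any modification stays inside the child process.
    print(job_id, sum(BIG_DATA[:10]))

if __name__ == "__main__":
    preload()
    ctx = mp.get_context("fork")       # children inherit globals copy-on-write
    for job_id in range(3):            # a real server would run jobs on demand
        p = ctx.Process(target=run_job, args=(job_id,))
        p.start()
        p.join()
```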
Visual Studio Code + the Python extension works fine (both Windows and Mac, not sure about Linux). Very fast and lightweight, with Git integration, debugging, refactoring, etc.
There is also an IDE called Spyder that is more Python-specific. It also works fine but is more heavyweight.
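Both also understand "# %%" cell markers in an ordinary .py file, which covers the "initialize once, iterate many times" part of the question: the slow cell is run once in the interactive console and stays in memory while the other cells are re-run freely. A minimal sketch:

```python
# %% Run once: the interactive console keeps `data` alive between runs
import time
time.sleep(2)                 # stand-in for the slow initialization
data = list(range(1_000_000))

# %% Iterate on this cell as often as needed; re-running it is cheap
print(sum(data[:100]))
```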
I'm exploring 2D interpolations for the purpose of writing a script in Python. I've been using the PIL (Pillow) modules to quickly display the results of the algorithms - this is best for cases when interactive input isn't necessary (i.e. testing on a random set of points).
For interactive testing I've found jsfiddle to be the most lightweight solution, but I admit it isn't ideal to rewrite functions in another language merely to be able to move points around, and input specific shapes of quads.
A simple example: 4 verts drawn at random (JavaScript in JSFiddle; I would like to do the same in Python).
What would be the fastest way to play around with a Python script graphically? Is there a Python counterpart to jsfiddle? I googled 'Python fiddle', of course, but it's not what I'm looking for. What I need is a simple canvas implementation and click-event support.
Thanks in advance,
Well, there is Python Fiddle, but I think this question is going to be closed by admins as being off-topic on Stack Overflow; see this thread here.
I'd also come to think of Jupyter and Anaconda; the latter includes the former. These work with matplotlib, amongst others, and Jupyter gives you a MATLAB-like interactive environment in which you can run code step by step, inspect variable values, and look at any graphs you are making.
As mentioned in the previous answer, there is the Jupyter notebook – software for interactive Python programming, including graphical output.
You can run Jupyter locally or on your own server, but there are free cloud versions:
https://colab.research.google.com/
https://notebooks.azure.com/
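For the "simple canvas + click events" part of the question, plain matplotlib already covers it, whether run locally or in a notebook with an interactive backend; a minimal sketch:

```python
import matplotlib.pyplot as plt

# Click inside the axes to add points; each click is drawn immediately.
fig, ax = plt.subplots()
ax.set_xlim(0, 1)
ax.set_ylim(0, 1)
points = []

def on_click(event):
    if event.inaxes is ax:                       # ignore clicks outside the axes
        points.append((event.xdata, event.ydata))
        ax.plot(event.xdata, event.ydata, "ro")
        fig.canvas.draw_idle()

fig.canvas.mpl_connect("button_press_event", on_click)
plt.show()
```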
I want to call a function like parallelize.map(function, args) that returns a list of results, with the user blind to the actual process. One of the functions I want to parallelize calls out via subprocess to a Unix program that benefits from multiple cores.
I first tried ipython-cluster-helper. This works well with my setup, but I ran into problems installing it on several other machines. I also have to ask for names of clusters during setup. I haven't seen other programs start jobs on clusters for you, so I don't know if that is accepted practice.
joblib seems to be the standard for parallelization, but it can only use one cluster or computer at a time. This works as well, but is significantly slower because it is not using the cluster.
Also, the server I am running this code on complains if a program has been running too long, to make sure people use the cluster. If I used joblib, would I have to write another script just to run this program on our cluster?
For now, I have added special parameters in setup.py to add cluster names and install ipython-cluster-helper if necessary. And when map is called, it first checks whether ipython-cluster-helper and the cluster names are available; if so it uses them, otherwise it falls back to joblib.
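Roughly, that dispatch can be sketched as below (cluster_view is ipython-cluster-helper's context manager; treat the exact argument names as an assumption, and note the joblib branch only uses the local machine's cores):

```python
from joblib import Parallel, delayed

def parallel_map(function, args, scheduler=None, queue=None, num_jobs=8):
    """Use the cluster when ipython-cluster-helper and a scheduler/queue are
    available; otherwise fall back to local cores via joblib."""
    try:
        from cluster_helper.cluster import cluster_view   # optional dependency
        have_cluster = scheduler is not None and queue is not None
    except ImportError:
        have_cluster = False

    if have_cluster:
        # submit the jobs through the cluster scheduler
        with cluster_view(scheduler=scheduler, queue=queue,
                          num_jobs=num_jobs) as view:
            return view.map(function, args)

    # single machine, all available cores
    return Parallel(n_jobs=-1)(delayed(function)(a) for a in args)
```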
What are other ways of achieving this? I'm looking for a standard way to do this that will work on most machines, with or without a cluster, so I can release the code and make it easy to use.
Thanks.
I saw this post on Medium and wondered how one might go about managing multiple Python scripts.
How I Hacked Amazon's Wifi Button
This describes a system where you need to run one or more scripts continuously to catch and react to events in your network.
My question: let's say I had multiple Python scripts that I wanted to run while I work on other things. What approaches are available to manage these scripts? I have to imagine there is a better way than having a large number of terminal windows running each script individually.
I am coming back to python, and have no formal training in computer programming, so any guidance you can provide will be greatly appreciated.
Let's say I had multiple Python scripts that I wanted to run. What approaches are available to manage these scripts? I have to imagine there is a better way than having a large number of terminal windows running each script individually.
If you have several .py files in a directory that you want to run, in no particular order, you can do:

import glob

# collect every .py file in the directory and execute them one after another
pyFiles = glob.glob('path/*.py')
for pyFile in pyFiles:
    with open(pyFile) as f:
        exec(f.read())  # Python 3 replacement for Python 2's execfile()
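If the scripts should instead run concurrently in the background (closer to what the question describes), a different approach is to launch each one as its own process with subprocess; a minimal sketch:

```python
import glob
import subprocess
import sys

# Launch every script as its own background process and wait for all of them.
procs = [subprocess.Popen([sys.executable, py_file])
         for py_file in glob.glob('path/*.py')]
for proc in procs:
    proc.wait()
```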
Your system already runs a large number of background processes, with output to the system log or occasionally to a service-specific log file.
A common arrangement for quick and dirty deployments -- where you don't necessarily want to invest in making the scripts robust and well-behaved enough to run as proper services -- is to start the script inside screen or tmux. You can detach when you don't need to be looking at it, and can reattach at any time -- even from a remote login -- to view the output, or to troubleshoot.
Take a look at luigi (I've not used it).
https://github.com/spotify/luigi
These days (five years after the question was asked) a lot of people use Docker Compose. But that's a little heavyweight, depending on what you want to do.
I just came across bugy's script server today. Maybe it could be a solution for you or somebody else.
(I am just trying to find a Tampermonkey-like script structure for Python...)