Running the Bulk Synchronous Parallel (BSP) model in Python

The BSP parallel programming model has several benefits: the programmer need not explicitly manage synchronization, deadlocks are impossible, and reasoning about performance is much easier than with traditional approaches. There is a Python interface to BSPlib in SciPy:
import Scientific.BSP
I wrote a little program to test BSP. The program is a simple random experiment which "calculates" the probability that throwing n dice yields a sum of k:
from Scientific.BSP import ParSequence, ParFunction, ParRootFunction
from sys import argv
from random import randint

n = int(argv[1]); m = int(argv[2]); k = int(argv[3])

# Count how many throws sum to k.
def sumWuerfe(ws):
    return len([w for w in ws if sum(w) == k])
glb_sumWuerfe = ParFunction(sumWuerfe)

# Print the estimated probability (runs on processor 0 only).
def ausgabe(result):
    print float(result) / len(wuerfe)
glb_ausgabe = ParRootFunction(ausgabe)

# m throws of n dice each.
wuerfe = [[randint(1, 6) for _ in range(n)] for _ in range(m)]
glb_wuerfe = ParSequence(wuerfe)

# The parallel calculation:
ergs = glb_sumWuerfe(glb_wuerfe)

# Collecting the results on processor 0:
ergsGesamt = ergs.reduce(lambda x, y: x + y, 0)
glb_ausgabe(ergsGesamt)
The program works fine, but it uses just one process!
My question: does anyone know how to tell this Python BSP script to use 4 (or 8, or 16) processes? I thought this BSP implementation would use MPI, but starting the script via mpiexec -n 4 randExp.py doesn't work.

A minor thing, but Scientific Python != SciPy in your question...
If you download the ScientificPython sources you'll see a README.BSP, a README.MPI, and a README.BSPlib. Unfortunately, the information in those files isn't really mentioned on the online web pages.
The README.BSP is pretty explicit about what you need to do to get the BSP stuff working with real parallelism:
In order to use the module Scientific.BSP using more than one real processor, you must compile either the BSPlib or the MPI interface. See README.BSPlib and README.MPI for installation details. The BSPlib interface is probably more efficient (I haven't done extensive tests yet), and allows the use of the BSP toolset; on the other hand, MPI is more widely available and might thus already be installed on your machine. For serious use, you should probably install both and make comparisons for your own applications. Application programs do not have to be modified to switch between MPI and BSPlib, only the method to run the program on a multiprocessor machine must be adapted.

To execute a program in parallel mode, use the mpipython or bsppython executable. The manual for your MPI or BSPlib installation will tell you how to define the number of processors.
and the README.MPI tells you what to do to get MPI support:
Here is what you have to do to get MPI support in Scientific Python:
1) Build and install Scientific Python as usual (i.e. "python setup.py install" in most cases).
2) Go to the directory Src/MPI.
3) Type "python compile.py".
4) Move the resulting executable "mpipython" to a directory on your system's execution path.
So you have to build the MPI or BSPlib interface explicitly to take advantage of real parallelism. The good news is that you shouldn't have to change your program: different systems have different parallel libraries installed, and a library layered on top of them needs a configuration/build step like this to take advantage of whatever is available.
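Once the mpipython executable is built and on your path, launching the script on several processors typically looks something like the line below; this is only a sketch assuming an mpirun-style launcher, so check the manual of your MPI installation for the exact command and flags:
mpirun -np 4 mpipython randExp.py
With the BSPlib interface you would run the script through the bsppython executable instead, as the README.BSP quoted above explains.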

Related

Spyder hangs on calling random.uniform()

My program frequently accesses random numbers. I initialize my random number generator via:
import random
random.seed(1)
I'm calling random.uniform() a lot in code for an evolutionary model (biology), and after a while it repeatedly hangs (doing nothing for 20 minutes, at which point I stop it). While it hangs, Python is using 20%-30% of my CPU (I have four cores). At the same time it's using 10 GB of RAM (I have a lot of data).
Can I do something to keep the default random library from hanging, or is there another random library I can use?
I'm running Spyder 4.2.5 with Python 3.8 on Windows 10. (The problem already existed with an earlier version of Spyder, and I installed Spyder 4.2.5 from scratch.)
Just speculating, but the default random module definitely shouldn't do that, so I suspect either:
there's something wrong with your Python install itself, or
your system doesn't have enough entropy (or you are exhausting the pool by making a massive number of sequential calls to it, rather than simply using the base value) and is relying on /dev/random (which blocks until data is available); see related:
differences between random and urandom
https://www.python.org/dev/peps/pep-0524/
Try calling os.getrandom with and without the blocking flag:
import os
os.getrandom(1024, flags=os.GRND_NONBLOCK)  # Linux-only; raises BlockingIOError if the entropy pool is not yet initialized
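For comparison, here is a minimal sketch (a hypothetical timing harness, not taken from the question) showing that the Mersenne Twister behind the random module consults no OS entropy after seeding, so a large batch of uniform() calls should finish within seconds rather than hang:
import random
import time

random.seed(1)
start = time.perf_counter()
# ten million draws from the seeded Mersenne Twister; no system entropy is touched here
total = sum(random.uniform(0.0, 1.0) for _ in range(10_000_000))
print(total, "in", time.perf_counter() - start, "seconds")
If this finishes quickly outside Spyder but hangs inside it, the environment (memory pressure from the 10 GB of data, or the IDE itself) is a more likely culprit than the random module.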

How do I speed up repeated calls to a Ruby program (GitHub's linguist) from Python?

I'm using GitHub's linguist to identify unknown source code files. Running this from the command line after a gem install github-linguist is insanely slow. I'm using Python's subprocess module to make a command-line call on a stock Ubuntu 14 installation.
Running against an empty file, linguist __init__.py takes about 2 seconds (similar results for other files). I assumed this was entirely Ruby's startup time, but as @MartinKonecny points out, it seems it is the linguist program itself.
Is there some way to speed this process up -- or a way to bundle the calls together?
One possibility is to just adapt the linguist program (https://github.com/github/linguist/blob/master/bin/linguist) to take multiple paths on the command-line. It requires mucking with a bit of Ruby, sure, but it would make it possible to pass multiple files without the startup overhead of Linguist each time.
A script this simple could suffice:
require 'linguist/file_blob'

ARGV.each do |path|
  blob = Linguist::FileBlob.new(path, Dir.pwd)
  # print out blob.name, blob.language, blob.sloc, etc.
end
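On the Python side, a minimal sketch of how the batched script might be driven, assuming it has been saved as batch_linguist.rb (a hypothetical name) and prints one result per input path:
import subprocess

paths = ["__init__.py", "app.rb", "main.c"]  # files to classify in one batch
# one Ruby startup for the whole batch instead of one per file
output = subprocess.check_output(["ruby", "batch_linguist.rb"] + paths)
print(output.decode())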

IronPython vs Python speed with numpy

I have some Python source code that manipulates lists of lists of numbers (say about 10,000 floating point numbers) and does various calculations on these numbers, including a lot of numpy.linalg.norm for example.
Run time had not been an issue until we recently started using this code from a C# UI (running the Python code from C# via IronPython). I extracted a set of function calls (doing the things described in the first paragraph) and found that this code takes about 4x longer to run in IronPython compared to Python 2.7 (and this is after excluding the startup/setup time in C#/IronPython). I'm using a C# stopwatch around the repeated IronPython calls from C#, and the timeit module around an execfile in Python 2.7 (so the Python timing includes more operations, like loading the file and creating the objects, whereas the C# timing doesn't). The former takes about 4.0 seconds while the latter takes about 0.9 seconds.
Would you expect this kind of difference? Any ideas how I might resolve this issue? Other comments?
Edit:
Here is a simple example of code that runs about 10x slower in IronPython on my machine (4 seconds in Python 2.7 and 40 seconds in IronPython):
import numpy as np

n = 700
for i in range(n - 1):
    for j in range(i, n):
        dist = np.linalg.norm(np.array([i, i, i]) - np.array([j, j, j]))
You're using NUMPY?! You're lucky it works in IronPython at all! The support is being added literally as we speak!
To be exact, there's a CPython-extension-to-IronPython interface project and there's a native CLR port of numpy. I don't know which one you're using, but both ways are orders of magnitude slower than working with the C version in CPython.
UPDATE:
The SciPy-for-IronPython port by Enthought that you're apparently using looks abandoned: the last commits in the linked repos are a few years old, and it's missing from http://www.scipy.org/install.html, too. Judging by the article, it was a partial port with the interface in .NET and the core in C, linked via a custom layer. The previous paragraph applies to it as well.
Using the information from "Faster alternatives to numpy.argmax/argmin which is slow", you may get some speedup if you limit the data passed back and forth between the CLR and the C core.
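As an illustration of that advice (a sketch against the toy example above, not your real workload), the double loop can be collapsed into a handful of numpy calls so that far fewer operations cross the CLR/C boundary:
import numpy as np

n = 700
idx = np.arange(n)
# For the toy vectors [i, i, i] and [j, j, j] the norm of the difference is |i - j| * sqrt(3),
# so the whole n x n distance matrix can be computed with a few vectorized numpy calls
# instead of roughly 245,000 Python-level ones.
dists = np.abs(idx[:, None] - idx[None, :]) * np.sqrt(3.0)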

Freezing shared objects with cx_freeze on Linux

I am trying to cx_freeze an application for the Linux platform. The Windows MSI installer works perfectly, but the Linux counterpart doesn't really function the way I want it to.
When the package is built it runs perfectly on the original system, but when ported to a different system (albeit with the same architecture) it generates a segfault. The first thing I did was check the libraries, and there are some huge version differences with libc, pthread and libdl. So I decided to include these in the build, like so:
if windows_build:
    build_exe_options['packages'].append("win32net")
    build_exe_options['packages'].append("win32security")
    build_exe_options['packages'].append("win32con")
    pywintypes_dll = 'pywintypes{0}{1}.dll'.format(*sys.version_info[0:2])  # e.g. pywintypes27.dll
    build_exe_options['include_files'].append((os.path.join(GetSystemDirectory(), pywintypes_dll), pywintypes_dll))
else:
    build_exe_options['packages'].append("subprocess")
    build_exe_options['packages'].append("encodings")
    arch_lib_path = "/lib/%s-linux-gnu" % os.uname()[4]
    shared_objects = ["libc.so.6", "libpthread.so.0", "libz.so.1", "libdl.so.2", "libutil.so.1", "libm.so.6", "libgcc_s.so.1", "ld-linux-x86-64.so.2"]
    lib_paths = ["/lib", arch_lib_path, "/lib64"]
    for so in shared_objects:
        for lib in lib_paths:
            lib_path = "%s/%s" % (lib, so)
            if os.path.isfile(lib_path):
                build_exe_options['include_files'].append((lib_path, so))
                break
After checking the original cx_frozen binary it seems the dynamic libraries play their part and intercept the calls perfectly. But now I am at the point where pthread segfaults because it picks up the system's libc instead of mine (checked with ldd and gdb).
My question is quite simple: the method I am trying is horrible, as it doesn't do recursive dependency resolving. So my question would be: what is the better way of doing this, or should I write recursive dependency resolution into my installer?
And to pre-empt the suggestion "use native Python instead": we have some hardware appliances (think 2-4U) with Linux (and Bash access) where we want to run this as well. Porting an entire Python installation (with all its dynamic links, etc.) seemed like far too much work when we can cx_freeze the application and ship the libraries along with it.
I don't know about your other problems, but shipping libc.so.6 to another system the way you do it cannot possibly work, as explained here.

cProfile taking a lot of memory

I am attempting to profile my project in Python, but I am running out of memory.
My project itself is fairly memory intensive, but even half-size runs are dying with "MemoryError" when run under cProfile.
Doing smaller runs is not a good option, because we suspect that the run time is scaling super-linearly, and we are trying to discover which functions dominate during large runs.
Why is cProfile taking so much memory? Can I make it take less? Is this normal?
Updated: Since cProfile is built into current versions of Python (the _lsprof extension) it should be using the main allocator. If this doesn't work for you, Python 2.7.1 has a --with-valgrind compiler option which causes it to switch to using malloc() at runtime. This is nice since it avoids having to use a suppressions file. You can build a version just for profiling, and then run your Python app under valgrind to look at all allocations made by the profiler as well as any C extensions which use custom allocation schemes.
(Rest of original answer follows):
Maybe try to see where the allocations are going. If you have a place in your code where you can periodically dump out the memory usage, you can use guppy to view the allocations:
import lxml.html
from guppy import hpy

hp = hpy()
trees = {}
for i in range(10):
    # do something
    trees[i] = lxml.html.fromstring("<html>")
    print hp.heap()
    # examine allocations for specific objects you suspect
    print hp.iso(*trees.values())
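If the full heap dump is too noisy, guppy can also be given a baseline so that later dumps only show what was allocated afterwards; a brief sketch using the same hp object (assuming your guppy version provides setrelheap):
hp.setrelheap()   # baseline: subsequent hp.heap() reports only allocations made after this point
# ... run the suspect section of code ...
print hp.heap()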
