I am currently trying to communicate lists of struct objects but I run into an error:
Traceback (most recent call last):
File "Primary_secondary_co-sim.py", line 509, in <module>
sm_clear_all(params, curr_pri_timestep, soln)
File "/home/gridsan/jvineet9/vvc_sims/code/sm_clear_all.py", line 162, in sm_clear_all
node_num = soln_node['node_num']
TypeError: list indices must be integers or slices, not str
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been
aborted.
-----------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero
status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[56840,1],1]
Exit code: 1
Here is my code:
from mpi4py import MPI
import os

data = [4, 5]
num_SMOs = 5
solns_list = []

comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()
name = MPI.Get_processor_name()
pid = os.getpid()

idx = range(num_SMOs)
myidx = idx[rank:len(idx):size]

for node in myidx:
    soln_node = structtype()
    # Add some data fields to the struct
    soln_node.node_num = node
    soln_node.XX = data
    solns_list.append(soln_node)

# All ranks send their solns to rank 0
solns_nodes_all = comm.gather(solns_list, root=0)

if rank == 0:
    for soln_node in solns_nodes_all:
        node_num = soln_node.node_num
        data = soln_node.XX
And here's the definition for the struct Python object:
class structtype:
    def __init__(self):
        pass
I get the error at the line "for soln_node in solns_nodes_all:" when I loop through all the nodes. I expected solns_nodes_all to be a list of structs, as constructed, but instead it looks like it has become a list of lists after the MPI gather step?
mpi4py's gather operation fetches one value from each rank and puts those return values into a list. Each rank contributes a list, so they end up in a list of lists. If you intended each rank to return multiple values but want a flat list to iterate over, you can chain the iterables together:
import itertools

for soln_node in itertools.chain.from_iterable(solns_nodes_all):
    node_num = soln_node.node_num
    data = soln_node.XX
You could have found this out by printing the value of solns_nodes_all:
[
[<__main__.structtype object at 0x7ff148535c70>, <__main__.structtype object at 0x7ff148535700>],
[<__main__.structtype object at 0x7ff148535ee0>, <__main__.structtype object at 0x7ff148535dc0>],
[<__main__.structtype object at 0x7ff148535f40>]
]
The general pattern of gather goes like this:
Node 1 returns A
Node 2 returns B
Node 3 returns C
Gathered into [A, B, C]
So if you say A=[struct(), struct()], ... then what is gathered is [[struct(), struct()], ...]
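The flattening behaviour is easy to see without MPI at all. This sketch (pure Python, with plain strings standing in for the struct objects) simulates what the root rank receives from gather and how chain.from_iterable flattens it:

```python
import itertools

# Simulate what comm.gather(..., root=0) returns on the root rank:
# one entry per rank, and each rank contributed a *list* of items.
gathered = [["a0", "a1"], ["b0", "b1"], ["c0"]]

# Iterating over `gathered` directly yields the per-rank lists...
per_rank = [item for item in gathered]
print(per_rank[0])  # ['a0', 'a1'] - a list, not a single item

# ...so chain the sublists together to get a flat iteration.
flat = list(itertools.chain.from_iterable(gathered))
print(flat)  # ['a0', 'a1', 'b0', 'b1', 'c0']
```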
Related
I have a dictionary with a Ref_ID for a street link as the key and the sequenced stops on that street as the values. I want to determine whether the stops are out of sequence. The dict consists of items like 1234567: [5, 10, 15, 35], where the list represents the sequences on a given block.
I am using a while loop within a for loop that iterates through each value until the count equals 2, appending the values to a tuple and then subtracting the first value from the second. If the difference is greater than 40, I want the program to store the link in another list under the route it's associated with.
I am presently getting a memory error when running the script:
eCheck = []
oCheck = []
for key, value in eLinks.items():
    for k in value:
        eValues.append(k)
    eList = sorted(eValues)
    for i in eList:
        eValuescount = 0
        while eValuescount < 2:
            eCheck.append(k)
            eItemscount += 1
        x = eValues[1] - eValues[0]
        print x
        if x > 40:
            eCheckStreet.append(key)
            print "Route ", route, " even side"
for link in eCheckStreet:
    print link
Here is the error:
Traceback (most recent call last):
File "C:\Python27\ArcGIS10.2\Lib\site-packages\pythonwin\pywin\framework\scriptutils.py", line 323, in RunScript
debugger.run(codeObject, __main__.__dict__, start_stepping=0)
File "C:\Python27\ArcGIS10.2\Lib\site-packages\pythonwin\pywin\debugger\__init__.py", line 60, in run
_GetCurrentDebugger().run(cmd, globals,locals, start_stepping)
File "C:\Python27\ArcGIS10.2\Lib\site-packages\pythonwin\pywin\debugger\debugger.py", line 654, in run
exec cmd in globals, locals
File "N:\Python\Completed scripts\Check_Sequences.py", line 1, in <module>
import arcpy
MemoryError
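As an aside, the MemoryError is most likely caused by the while eValuescount < 2: loop: its body increments eItemscount, not eValuescount, so the loop never terminates and eCheck grows until memory runs out. Below is a minimal sketch of what I take to be the intended gap check. The names links and flagged are hypothetical, and I'm assuming every adjacent pair of stops should be compared, not just the first two:

```python
# Hypothetical input: Ref_ID -> stop sequences on that street link
links = {1234567: [5, 10, 15, 35], 7654321: [5, 50, 55]}

flagged = []  # links containing a sequence gap larger than 40
for ref_id, stops in links.items():
    seq = sorted(stops)
    # Compare each adjacent pair instead of only the first two values.
    for first, second in zip(seq, seq[1:]):
        if second - first > 40:
            flagged.append(ref_id)
            break

print(flagged)  # [7654321]
```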
I am using mpi4py to spread a processing task over a cluster of cores.
My code looks like this:
comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()
'''Perform processing operations with each processor returning
two arrays of equal size, array1 and array2'''
all_data1 = comm.gather(array1, root = 0)
all_data2 = comm.gather(array2, root = 0)
This is returning the following error:
SystemError: Negative size passed to PyString_FromStringAndSize
I believe this error means that the array of data stored in all_data1 exceeds the maximum size of an array in Python, which is quite possible.
I tried doing it in smaller pieces, as follows:
comm.isend(array1, dest=0, tag=rank+1)
comm.isend(array2, dest=0, tag=rank+2)

if rank == 0:
    for proc in xrange(size):
        partial_array1 = comm.irecv(source=proc, tag=proc+1)
        partial_array2 = comm.irecv(source=proc, tag=proc+2)
but this is returning the following error.
[node10:20210] *** Process received signal ***
[node10:20210] Signal: Segmentation fault (11)
[node10:20210] Signal code: Address not mapped (1)
[node10:20210] Failing at address: 0x2319982b
followed by a whole load of unintelligible path-like information and a final message:
mpirun noticed that process rank 0 with PID 0 on node node10 exited on signal 11 (Segmentation fault).
This seems to happen regardless of how many processors I am using.
For similar questions in C the solution seems to be subtly changing the way the arguments in the recv call are parsed. With Python the syntax is different so I would be grateful if someone could give some clarity to why this error is appearing and how to fix it.
I managed to resolve the problem I was having by doing the following.
if rank != 0:
    # Non-blocking send; allows code to continue before data is received.
    comm.Isend([array1, MPI.FLOAT], dest=0, tag=77)
if rank == 0:
    final_array1 = array1
    for proc in xrange(1, size):
        partial_array1 = np.empty(len(array1), dtype=float)
        # A blocking receive is necessary here to avoid a Segfault.
        comm.Recv([partial_array1, MPI.FLOAT], source=proc, tag=77)
        final_array1 += partial_array1

if rank != 0:
    comm.Isend([array2, MPI.FLOAT], dest=0, tag=135)
if rank == 0:
    final_array2 = array2
    for proc in xrange(1, size):
        partial_array2 = np.empty(len(array2), dtype=float)
        comm.Recv([partial_array2, MPI.FLOAT], source=proc, tag=135)
        final_array2 += partial_array2

comm.barrier()  # This barrier call resolves the Segfault.

if rank == 0:
    return final_array1, final_array2
else:
    return None
I am trying to use MPI in python to do some parallel computing for midpoint integration. I am not really familiar with MPI and I have looked around at some examples to produce what I have thus far. I am having trouble with a couple of errors where MPI.COMM does not recognize a couple of input arguments. Again I am not that familiar with MPI.
Below I have attached my code:
from mpi4py import MPI
import numpy as np
from numpy import *
import time

# Initialize variables
n = 10e5   # number of increments within each process
a = 0.0    # lower bound
b = 5.0    # upper bound
dest = 0   # define the process that computes the final result

# Functions
def integral(my_a, num, h):
    s = 0
    h2 = h / 2
    for i in range(0, num):
        temp = my_a + i * h
        s = s + fct(temp + h2) * h
    return s

def fct(x):
    return x ** 2

# Start the MPI process
comm = MPI.COMM_WORLD
p = comm.Get_size()     # gather number of processes
print "Number of processes (p): ", p
myid = comm.Get_rank()  # gather rank of the comm (number of cores)
print "Rank of (p): ", myid

h = (b - a) / n    # length of increment
num = int(n / p)   # number of intervals calculated by each process
print "Number of intervals calculated by a process: ", num
my_range = (b - a) / p      # range per process
my_a = a + myid * my_range  # next lower limit

ti = time.clock()
my_result = integral(my_a, num, h)  # get the result
print "Process " + str(myid) + " has the partial result of " + str(my_result) + "."

if myid == 0:
    result = my_result
    for i in range(1, p):
        source = 1
        comm.recv(my_result, dest=1, tag=123)
        result = result + my_result
    print "The result = " + str(result) + "."
else:
    comm.send(my_result, source=0, tag=123)
    MPI_Finalize()

tf = time.clock()
print "Time(s): ", tf - ti
Here is the error that I get when I try to run this code:
--------------------------------------------------------------------------
*******************************-VirtualBox ~/Documents/ME701/HW/HW5 $ mpirun -np 2 python HW5_prb3.py
Number of processes (p): 2
Rank of (p): 1
Number of intervals calculated by a process: 500000
Number of processes (p): 2
Rank of (p): 0
Number of intervals calculated by a process: 500000
Process 1 has the partial result of 36.4583333333.
Traceback (most recent call last):
File "HW5_prb3.py", line 50, in <module>
comm.send(my_result, source=0, tag=123);
File "Comm.pyx", line 1127, in mpi4py.MPI.Comm.send (src/mpi4py.MPI.c:90067)
**TypeError: send() got an unexpected keyword argument 'source'**
Process 0 has the partial result of 5.20833333333.
Traceback (most recent call last):
File "HW5_prb3.py", line 45, in <module>
comm.recv(my_result, dest=1, tag=123);
File "Comm.pyx", line 1142, in mpi4py.MPI.Comm.recv (src/mpi4py.MPI.c:90513)
**TypeError: recv() got an unexpected keyword argument 'dest'**
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[45152,1],1]
Exit code: 1
--------------------------------------------------------------------------
*******************************-VirtualBox ~/Documents/ME701/HW/HW5 $
The answer to the midpoint integration should be 41.66667. My teacher just wants us to perform a simple time study on parallel computing so we can see the power of it.
Thank you for your time.
I guess you just mixed up send and recv arguments - source is the process rank you receive data from, and dest (short for destination) is the rank of the process you send data to (you can see the docs, if you want).
So, just interchanging source and dest keywords in send and receive should be fine.
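Independent of the MPI fix, a quick serial sanity check of the midpoint rule (a sketch, no MPI involved) confirms the expected answer of 125/3 ≈ 41.66667 for the integral of x**2 from 0 to 5:

```python
def midpoint(f, a, b, n):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    # Evaluate f at the midpoint of each of the n subintervals.
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

approx = midpoint(lambda x: x ** 2, 0.0, 5.0, 100_000)
print(round(approx, 5))  # 41.66667
```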
I'm trying to import nodes into Neo4j in a batch. But when I try to execute it, it throws an error: list indices must be integers, not float. I don't really understand which list items; I do have floats, but those are cast to strings...
Partial code:
graph_db = neo4j.GraphDatabaseService("http://127.0.0.1:7474/db/data/")
batch = neo4j.WriteBatch(graph_db)
for ngram, one_grams in data.items():
    ngram_rank = int(one_grams['_rank'])
    ngram_prob = '%.16f' % float(one_grams['_prob'])
    ngram_id = 'a' + str(n)
    ngram_node = batch.create(node({"word": ngram, "rank": str(ngram_rank), "prob": str(ngram_prob)}))
    for one_gram, two_grams in one_grams.items():
        one_rank = int(two_grams['_rank'])
        one_prob = '%.16f' % float(two_grams['_prob'])
        one_node = batch.create(node({"word": one_gram, "rank": str(one_rank), "prob": one_prob}))
        batch.create(rel((ngram_node, "FOLLOWED_BY", one_node)))  # line 81 throwing error
results = batch.submit()
Full traceback
Traceback (most recent call last):
File "Ngram_neo4j.py", line 81, in probability_items
batch.create(rel((ngram_node, "FOLLOWED_BY", one_node)))
File "virtenv\lib\site-packages\py2neo\neo4j.py", line 2692, in create
uri = self._uri_for(entity.start_node, "relationships")
File "virtenv\lib\site-packages\py2neo\neo4j.py", line 2537, in _uri_for
uri = "{{{0}}}".format(self.find(resource))
File "virtenv\lib\site-packages\py2neo\neo4j.py", line 2525, in find
for i, req in pendulate(self._requests):
File "virtenv\lib\site-packages\py2neo\util.py", line 161, in pendulate
yield index, collection[index]
TypeError: list indices must be integers, not float
running neo4j 2.0, py2neo 1.6.1, Windows 7/64bit, python 3.3/64bit
--EDIT--
Did some testing; the error is located in the referencing of nodes.
Oversimplified sample code:
for key, dict in data.items():  # string, dictionary
    batch = neo4j.WriteBatch(graph_db)
    three_gram_node = batch.create(node({"word": key}))
    pprint(three_gram_node)
    batch.add_labels(three_gram_node, "3gram")  # must be int, not float
    for k, v in dict.items():  # string, string
        four_gram_node = batch.create(node({"word": k}))
        batch.create_path(three_gram_node, "FOLLOWED_BY", four_gram_node)
        # cannot cast node from BatchRequest obj
    batch.submit()
When a node is created with batch.create(node({props})), the pprint shows a py2neo.neo4j.BatchRequest object.
At the add_labels() line, it gives the same error as when trying to create a relation: list indices must be integers, not float.
At the batch.create_path() line it throws an error saying it can't cast a node from a py2neo.neo4j.BatchRequest object.
I'm trying the dirty-debug now to understand the indices.
--Dirty Debug Edit--
I've been meddling around with the pendulate(collection) function.
Although I don't really understand how it fits in or how it's used, the following is happening: whenever i is odd, the index becomes a float (which makes sense in hindsight, since count - ((i + 1) / 2) uses true division, which always yields a float in Python 3). This float then throws the list indices error. Some prints:
count: 3
i= 0
index: 0
(int)index: 0
i= 1 # i = uneven
index: 2.0 # a float appears
(int)index: 2 # this is a safe cast
This results in the list indices error. It also happens when i = 0; as this is a common case, I added an extra if to circumvent the division (possible speedup?). Although I've not unit tested this, it seems we can safely cast index to an int...
The pendulate function as used:
def pendulate(collection):
    count = len(collection)
    print("count: ", count)
    for i in range(count):
        print("i=", i)
        if i == 0:
            index = 0
        elif i % 2 == 0:
            index = i / 2
        else:
            index = count - ((i + 1) / 2)
        print("index:", index)
        index = int(index)
        print("(int)index:", index)
        yield index, collection[index]
soft debug: print ngram_node and one_node to see what they contain
dirty debug: modify File "virtenv\lib\site-packages\py2neo\util.py", line 161, and add a line before it:
print(index)
You are accessing a collection (a Python list given the traceback), so, for sure, index must be an integer :)
Printing it will probably help you understand why the exception is raised.
(Don't forget to remove your dirty debug afterwards ;))
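Concretely, the float comes from Python 3's true division in pendulate: count - ((i + 1) / 2) always yields a float there. Replacing / with floor division // keeps the index an int and removes the need for a cast. A simplified sketch of a fixed version (hypothetical, not the actual py2neo source):

```python
def pendulate_fixed(collection):
    """Yield (index, item) pairs, alternating front/back of the list."""
    count = len(collection)
    for i in range(count):
        if i % 2 == 0:
            index = i // 2                 # // keeps the index an int
        else:
            index = count - (i + 1) // 2
        yield index, collection[index]

# Matches the printed trace above: i=1 maps to index 2, now as an int.
print(list(pendulate_fixed(["a", "b", "c"])))  # [(0, 'a'), (2, 'c'), (1, 'b')]
```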
While it is currently possible for WriteBatch objects to be executed multiple times with edits in between, it is inadvisable to use them in this way and this will be restricted in the next version of py2neo. This is because objects created during one execution will not be available during a subsequent execution and it is not easy to detect when this is being requested.
Without looking back at the underlying code, I'm unsure why you are seeing this exact error but I would suggest refactoring your code so that each WriteBatch creation is paired with one and only one execution call (submit). You can probably achieve this by putting your batch creation within your outer loop and moving your submit call out of the inner loop into the outer loop as well.
I'm running a process with mpirun and 2 cores and it gets killed at the point when I'm mixing values between the two processes. Both processes use about 15% of the machines memory and even though the memory will increase when mixing, there should still be plenty of memory left. So I'm assuming that there is a limit on the amount of memory used for passing messages in between the processes. How do I find out what this limit is and how do I remove it?
The error message that I'm getting when mpirun dies is this:
File "Comm.pyx", line 864, in mpi4py.MPI.Comm.bcast (src/mpi4py.MPI.c:67787)
File "pickled.pxi", line 564, in mpi4py.MPI.PyMPI_bcast (src/mpi4py.MPI.c:31462)
File "pickled.pxi", line 93, in mpi4py.MPI._p_Pickle.alloc (src/mpi4py.MPI.c:26327)
SystemError: Negative size passed to PyBytes_FromStringAndSize
And this is the bit of the code that leads to the error:
sum_updates_j_k = numpy.zeros((self.col.J_total, self.K), dtype=numpy.float64)
comm.Reduce(self.updates_j_k, sum_updates_j_k, op=MPI.SUM)
sum_updates_j_k = comm.bcast(sum_updates_j_k, root=0)
The code usually works; it only runs into problems with larger amounts of data, which increase the size of the matrix I'm exchanging between processes.
The culprit is probably the following lines found in the code of PyMPI_bcast():
cdef int count = 0
...
if dosend: smsg = pickle.dump(obj, &buf, &count)   # <----- (1)
with nogil: CHKERR( MPI_Bcast(&count, 1, MPI_INT,  # <----- (2)
                              root, comm) )
cdef object rmsg = None
if dorecv and dosend: rmsg = smsg
elif dorecv: rmsg = pickle.alloc(&buf, count)
...
What happens here is that the object is first serialised at (1) using pickle.dump() and then the length of the pickled stream is broadcast at (2).
There are two problems here, and both stem from the fact that an int is used for the length. The first problem is an integer cast inside pickle.dump and the other is that MPI_INT is used to transmit the length of the pickled stream. This limits the amount of data in your matrix to a certain size - namely the size that would result in a pickled object no bigger than 2 GiB (2^31 - 1 bytes). Any bigger object would result in an integer overflow and thus a negative value in count.
This is clearly not an MPI issue but rather a bug in (or a feature of?) mpi4py.
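The overflow itself can be reproduced without MPI: reinterpreting a length just above 2^31 - 1 as a signed 32-bit integer (the C int behind MPI_INT on most platforms) produces a negative count. A sketch using the standard struct module:

```python
import struct

INT32_MAX = 2**31 - 1

def as_int32(n):
    """Reinterpret the low 32 bits of n as a signed 32-bit C int."""
    return struct.unpack('<i', struct.pack('<I', n & 0xFFFFFFFF))[0]

print(as_int32(INT32_MAX))      # 2147483647 - still fits
print(as_int32(INT32_MAX + 1))  # -2147483648 - overflowed to negative
```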
I had the same problem with mpi4py recently. As pointed out by Hristo Iliev in his answer, it's a pickle problem.
This can be avoided by using the upper-case methods comm.Reduce(), comm.Bcast(), etc., which do not resort to pickle, as opposed to lower-case methods like comm.reduce(). As a bonus, upper case methods should be a bit faster as well.
Actually, you're already using comm.Reduce(), so I expect that switching to comm.Bcast() should solve your problem - it did for me.
NB: The syntax of upper-case methods is slightly different, but this tutorial can help you get started.
For example, instead of:
sum_updates_j_k = comm.bcast(sum_updates_j_k, root=0)
you would use:
comm.Bcast(sum_updates_j_k, root=0)
For such a case it is useful to have a function that can send numpy arrays in parts, e.g.:
from mpi4py import MPI
import math, numpy, sys

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

def bcast_array_obj(obj=None, dtype=numpy.float64, root=0):
    """Function for broadcasting of a numpy array object"""
    reporter = 0 if root > 0 else 1
    if rank == root:
        for exp in range(11):
            parts = pow(2, exp)
            err = False
            part_len = math.ceil(len(obj) / parts)
            for part in range(parts):
                part_begin = part * part_len
                part_end = min((part + 1) * part_len, len(obj))
                try:
                    comm.bcast(obj[part_begin: part_end], root=root)
                except:
                    err = True
                # Combine our own status with the receiver-side report.
                err |= comm.recv(source=reporter, tag=2)
                if err:
                    break
            if err:
                continue
            comm.bcast(None, root=root)  # sentinel: transmission complete
            print('The array was successfully sent in {} part{}'.
                  format(parts, 's' if parts > 1 else ''))
            return
        sys.stderr.write('Failed to send the array even in 1024 parts')
        sys.stderr.flush()
    else:
        obj = numpy.zeros(0, dtype=dtype)
        while True:
            err = False
            try:
                part_obj = comm.bcast(root=root)
            except:
                err = True
                obj = numpy.zeros(0, dtype=dtype)
            if rank == reporter:
                comm.send(err, dest=root, tag=2)
            if err:
                continue
            if part_obj is not None:
                frags = len(obj)
                obj.resize(frags + len(part_obj))
                obj[frags:] = part_obj
            else:
                break
        return obj
This function automatically determines the optimal number of parts to break the input array into.
For example,
if rank != 0:
    z = bcast_array_obj(root=0)
else:
    z = numpy.zeros(1000000000, dtype=numpy.float64)
    bcast_array_obj(z, root=0)
outputs
The array was successfully sent in 4 parts
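The part-boundary arithmetic the function relies on can be checked in isolation. This sketch (a hypothetical helper using the same math.ceil logic as above) verifies that the part_begin/part_end bounds tile an array exactly once:

```python
import math

def chunk_bounds(length, parts):
    """Yield the (begin, end) slice bounds used to split an array into parts."""
    part_len = math.ceil(length / parts)
    for part in range(parts):
        yield part * part_len, min((part + 1) * part_len, length)

bounds = list(chunk_bounds(10, 4))
print(bounds)  # [(0, 3), (3, 6), (6, 9), (9, 10)]

# The slices cover every index exactly once.
covered = [i for b, e in bounds for i in range(b, e)]
assert covered == list(range(10))
```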
Apparently this is an issue in MPI itself and not in mpi4py. The actual variable that holds the size of the communicated data is a signed 32-bit integer, which will overflow to a negative value for around 2 GB of data.
Maximum amount of data that can be sent using MPI::Send
It's been raised as an issue with MPI4py previously as well here.