Segmentation fault using mpi4py - python

I am using mpi4py to spread a processing task over a cluster of cores.
My code looks like this:
comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()
'''Perform processing operations with each processor returning
two arrays of equal size, array1 and array2'''
all_data1 = comm.gather(array1, root = 0)
all_data2 = comm.gather(array2, root = 0)
This is returning the following error:
SystemError: Negative size passed to PyString_FromStringAndSize
I believe this error means that the array of data stored in all_data1 exceeds the maximum size of an array in Python, which is quite possible.
I tried doing it in smaller pieces, as follows:
comm.isend(array1, dest = 0, tag = rank+1)
comm.isend(array2, dest = 0, tag = rank+2)

if rank == 0:
    for proc in xrange(size):
        partial_array1 = comm.irecv(source = proc, tag = proc+1)
        partial_array2 = comm.irecv(source = proc, tag = proc+2)
but this is returning the following error.
[node10:20210] *** Process received signal ***
[node10:20210] Signal: Segmentation fault (11)
[node10:20210] Signal code: Address not mapped (1)
[node10:20210] Failing at address: 0x2319982b
followed by a whole load of unintelligible path-like information and a final message:
mpirun noticed that process rank 0 with PID 0 on node node10 exited on signal 11 (Segmentation fault).
This seems to happen regardless of how many processors I am using.
For similar questions in C the solution seems to be subtly changing the way the arguments in the recv call are passed. With Python the syntax is different, so I would be grateful if someone could clarify why this error is appearing and how to fix it.

I managed to resolve the problem I was having by doing the following.
if rank != 0:
    comm.Isend([array1, MPI.FLOAT], dest = 0, tag = 77)
    # Non-blocking send; allows code to continue before data is received.

if rank == 0:
    final_array1 = array1
    for proc in xrange(1, size):
        partial_array1 = np.empty(len(array1), dtype = float)
        comm.Recv([partial_array1, MPI.FLOAT], source = proc, tag = 77)
        # A blocking receive is necessary here to avoid a segfault.
        final_array1 += partial_array1

if rank != 0:
    comm.Isend([array2, MPI.FLOAT], dest = 0, tag = 135)

if rank == 0:
    final_array2 = array2
    for proc in xrange(1, size):
        partial_array2 = np.empty(len(array2), dtype = float)
        comm.Recv([partial_array2, MPI.FLOAT], source = proc, tag = 135)
        final_array2 += partial_array2

comm.barrier() # This barrier call resolves the segfault.

if rank == 0:
    return final_array1, final_array2
else:
    return None
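As a side note, the same element-wise sums can be obtained with a single collective call per array. A minimal sketch, assuming array1 and array2 are NumPy float64 arrays with identical shapes on every rank (NumPy's default float is 64-bit, hence MPI.DOUBLE rather than MPI.FLOAT):

import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
# Reduce sums the contributions of every rank element-wise into the
# receive buffer on root, replacing the hand-rolled send/recv loops.
final_array1 = np.zeros_like(array1)
final_array2 = np.zeros_like(array2)
comm.Reduce([array1, MPI.DOUBLE], [final_array1, MPI.DOUBLE], op=MPI.SUM, root=0)
comm.Reduce([array2, MPI.DOUBLE], [final_array2, MPI.DOUBLE], op=MPI.SUM, root=0)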

Related

Gather list of structs in MPI for Python?

I am currently trying to communicate lists of struct objects but I run into an error:
Traceback (most recent call last):
  File "Primary_secondary_co-sim.py", line 509, in <module>
    sm_clear_all(params, curr_pri_timestep, soln)
  File "/home/gridsan/jvineet9/vvc_sims/code/sm_clear_all.py", line 162, in sm_clear_all
    node_num = soln_node['node_num']
TypeError: list indices must be integers or slices, not str
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been
aborted.
-----------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero
status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[56840,1],1]
Exit code: 1
Here is my code:
from mpi4py import MPI
import os
data = [4,5]
num_SMOs = 5
solns_list = []
comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()
name = MPI.Get_processor_name()
pid = os.getpid()
idx = range(num_SMOs)
myidx = idx[rank:len(idx):size]
for node in myidx:
    soln_node = structtype()
    # Add some data fields to the struct
    soln_node.node_num = node
    soln_node.XX = data
    solns_list.append(soln_node)

# All nodes send their solns to rank 0
solns_nodes_all = comm.gather(solns_list, root=0)

if rank == 0:
    for soln_node in solns_nodes_all:
        node_num = soln_node.node_num
        data = soln_node.XX
And here's the definition of the struct Python object:
class structtype:
    def __init__(self):
        pass
I get the error at the line "for soln_node in solns_nodes_all:" when I loop through all the nodes. I expected solns_nodes_all to be a list of structs as constructed but instead it looks like it's become a list of lists after the MPI gather step?
mpi4py's gather operation fetches the individual values from the nodes and puts those return values into a list. You return lists, so they are put into a list of lists. If you intended to return multiple values but want a flat list to iterate over, you can chain the iterables together:
import itertools
for soln_node in itertools.chain.from_iterable(solns_nodes_all):
    node_num = soln_node.node_num
    data = soln_node.XX
You could have found this out by printing the value of solns_nodes_all:
[
[<__main__.structtype object at 0x7ff148535c70>, <__main__.structtype object at 0x7ff148535700>],
[<__main__.structtype object at 0x7ff148535ee0>, <__main__.structtype object at 0x7ff148535dc0>],
[<__main__.structtype object at 0x7ff148535f40>]
]
The general pattern of gather goes like this:
Node 1 returns A
Node 2 returns B
Node 3 returns C
Gathered into [A, B, C]
So if you say A=[struct(), struct()], ... then what is gathered is [[struct(), struct()], ...]
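Here is a minimal, self-contained sketch of that pattern (the script name is just a placeholder); run with e.g. mpirun -n 3 python gather_demo.py:

from mpi4py import MPI
import itertools

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

local = [rank * 10, rank * 10 + 1]     # each rank contributes a list, e.g. rank 1 -> [10, 11]
gathered = comm.gather(local, root=0)  # on rank 0: [[0, 1], [10, 11], [20, 21]]

if rank == 0:
    flat = list(itertools.chain.from_iterable(gathered))
    print(flat)                        # [0, 1, 10, 11, 20, 21]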

"Split" a image into packages of bytes

I am trying to do a project for college which consists of sending images using two Arduino Due boards and Python. I have two codes: one for the client (the one who sends the image) and one for the server (the one who receives the image). I know how to send the bytes and check that they are correct; however, I'm required to "split" the image into packages that have:
a header that has a size of 8 bytes and must be in this order:
the first byte must say the payload size;
the next three bytes must say how many packages will be sent in total;
the next three bytes must say which package I'm currently at;
the last byte must contain a code to an error message;
a payload containing data with a maximum size of 128 bytes;
an end of package (EOP) sequence (in this case, 3 bytes).
I managed to create the end-of-package sequence and append it correctly to a payload before sending, but I'm having trouble creating the header.
I'm currently trying to make the following loop:
from math import ceil  # needed for the package count below

with open(root.filename, 'rb') as f:
    picture = f.read()

picture_size = len(picture)
packages = ceil(picture_size/128)
last_pack_size = picture_size
EOPs = 0
EOP_bytes = [b'\x15', b'\xff', b'\xd9']

for p in range(1, packages):
    # Sliding window of the last three bytes read.
    read_bytes = [None,
                  int.to_bytes(picture[(p-1)*128], 1, 'big'),
                  int.to_bytes(picture[(p-1)*128 + 1], 1, 'big')]
    if p != packages:
        endrange = p*128 + 1
    else:
        endrange = picture_size
    for i in range((p-1)*128 + 2, endrange):
        read_bytes.append(int.to_bytes(picture[i], 1, 'big'))
        read_bytes.pop(0)
        if read_bytes == EOP_bytes:
            EOPs += 1
            print("read_bytes:", read_bytes)
            print("EOP_bytes:", EOP_bytes)
            print("EOPs", EOPs)
I expect the server to receive the same number of packages that the client sent, and at the end I need to join the packages to recreate the image. I can manage that part; I just need some help with creating the header.
Here is a demo of how to construct your header. It's not a complete solution, but since you only asked for help constructing the header, it may be what you are looking for.
headerArray = bytearray()

def Main():
    global headerArray
    # Sample data
    payloadSize = 254    # 0 - 254
    totalPackages = 1
    currentPackage = 1
    errorCode = 101      # 0 - 254
    AddToByteArray(payloadSize, 1)     # the first byte must say the payload size
    AddToByteArray(totalPackages, 3)   # the next three bytes must say how many packages will be sent in total
    AddToByteArray(currentPackage, 3)  # the next three bytes must say which package I'm currently at
    AddToByteArray(errorCode, 1)       # the last byte must contain a code to an error message

def AddToByteArray(value, numberOfBytes):
    global headerArray
    allocate = value.to_bytes(numberOfBytes, 'little')
    headerArray += allocate

Main()

# Output
print(f"Byte Array: {headerArray}")
for i in range(0, len(headerArray)):
    print(f"Byte Position: {i} Value: {headerArray[i]}")
Obviously I have not included the logic to obtain the current package or total packages.
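For completeness, here is a hedged sketch of the matching step on the receiving side, assuming the same 1 + 3 + 3 + 1 byte layout and little-endian encoding used above (parse_header is a hypothetical helper, not part of the demo):

def parse_header(header):
    # Split an 8-byte header back into its four fields (hypothetical
    # layout matching the demo above: 1 + 3 + 3 + 1 bytes, little-endian).
    assert len(header) == 8
    payload_size = header[0]                                 # 1 byte
    total_packages = int.from_bytes(header[1:4], 'little')   # 3 bytes
    current_package = int.from_bytes(header[4:7], 'little')  # 3 bytes
    error_code = header[7]                                   # 1 byte
    return payload_size, total_packages, current_package, error_code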

How to get the size of struct in python if the data inside struct has variable length binary string?

This code reads data from a block device, packs it using struct.pack(), and unpacks it again using struct.unpack(). It is part of my main socket program, but I am facing an issue calculating the struct size, so I am putting a small piece of code here to demonstrate the problem.
import sys, os
import struct as st

dev_read_data = os.open("/dev/sdc1", os.O_RDONLY)
buffer_size = 230400
offset = 0

while True:
    data = os.pread(dev_read_data, buffer_size, offset)
    # This statement packs the data and assigns it to packed_data. Up to here the code works fine.
    packed_data = st.pack("%dsi" % (len(data)), data, offset)
    print('Packed_Data {}'.format(packed_data))
    # This unpacks the data successfully.
    unpacked_data = st.unpack("%dsi" % (len(data)), packed_data)
    offset += buffer_size
    if offset >= 10036977152:  # stop once the whole device has been read
        break
Now I want to calculate the size of struct using function:
struct.calcsize(format)
But in this case only one parameter can be passed. So how to get struct size in case of variable length binary string?
I will be very thankful if experts can find some time to answer my query.
like this:
import os
import binascii
import zlib

path = "/dev/sdc1"
stat = os.stat(path)
block_SIZE = stat.st_blksize
block_COUNT = os.statvfs(path).f_blocks

image = open(path, "rb")
indicator = 0

while True:
    try:
        if indicator > 2: break  # if indicator > block_COUNT: break
        image.seek(indicator*block_SIZE)
        data = image.read(block_SIZE)
        # Zero-pad the CRC to 8 hex digits so unhexlify always yields 4 bytes.
        HEX_CRC32 = binascii.unhexlify("%08x" % (zlib.crc32(data) & 0xffffffff))
        header = binascii.unhexlify(("%010X" % indicator) + ("%04x" % block_SIZE))
        """ NOW SEND TO SOCKET
        My_socket.write(header + data + HEX_CRC32)
        Frame number : first 5 bytes
        Data length  : 2 bytes (block_SIZE)
        Data         : data content
        CRC32        : 4 bytes (32-bit value)
        Client side  : check the CRC32, take `data[7:-4]` as the current block
        """
        indicator += 1
    except Exception as e:
        # very important (if it failed, re-send the packet)
        print(e)
        if indicator > block_COUNT: break
Don't just break! Tell the client that all packets have been sent successfully, so it can close the socket on its side.
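As for the calcsize part of the question: the format argument is an ordinary string, so it can be built dynamically from the data length, exactly like the format passed to pack and unpack. A small sketch:

import struct as st

data = b"example payload"
fmt = "%dsi" % len(data)  # same dynamic format used for pack/unpack
size = st.calcsize(fmt)   # size in bytes, including any native alignment padding
print(fmt, size)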

TypeError: recv() got an unexpected keyword argument 'dest'

I am trying to use MPI in Python to do some parallel computing for midpoint integration. I am not really familiar with MPI, and I have looked around at some examples to produce what I have thus far. I am having trouble with a couple of errors where the communicator does not recognize some of the input arguments.
Below I have attached my code:
from mpi4py import MPI
import numpy as np
from numpy import *
import time

#initialize variables
n = 10e5    #number of increments within each process
a = 0.0;    #lower bound
b = 5.0;    #upper bound
dest = 0;   #define the process that computes the final result

#Functions
def integral(my_a, num, h):
    s = 0;
    h2 = h/2;
    for i in range(0,num):
        temp = my_a + i*h;
        s = s + fct(temp+h2)*h;
    return (s)

def fct(x):
    return x**2;

#Start the MPI process
comm = MPI.COMM_WORLD
p = comm.Get_size();    #gather number of processes
print "Number of processes (p): ", p;
myid = comm.Get_rank()  #gather rank of the comm (number of cores)
print "Rank of (p): ", myid;

h = (b-a)/n;        #length of increment
num = int(n/p);     #number of intervals calculated by each process
print "Number of intervals calculated by a process: ", num;
my_range = (b-a)/p;        #range per process
my_a = a + myid*my_range;  #next lower limit

ti = time.clock();
my_result = integral(my_a,num,h)  #get the result
print "Process " + str(myid) + " has the partial result of " + str(my_result) + ".";

if myid == 0:
    result = my_result;
    for i in range(1,p):
        source = 1;
        comm.recv(my_result, dest=1, tag=123);
        result = result + my_result;
    print "The result = " + str(result) + ".";
else :
    comm.send(my_result, source=0, tag=123);
    MPI_Finalize();

tf = time.clock();
print "Time(s): ", tf-ti;
Here is the error that I get when I try to run this code:
--------------------------------------------------------------------------
*******************************-VirtualBox ~/Documents/ME701/HW/HW5 $ mpirun -np 2 python HW5_prb3.py
Number of processes (p): 2
Rank of (p): 1
Number of intervals calculated by a process: 500000
Number of processes (p): 2
Rank of (p): 0
Number of intervals calculated by a process: 500000
Process 1 has the partial result of 36.4583333333.
Traceback (most recent call last):
  File "HW5_prb3.py", line 50, in <module>
    comm.send(my_result, source=0, tag=123);
  File "Comm.pyx", line 1127, in mpi4py.MPI.Comm.send (src/mpi4py.MPI.c:90067)
TypeError: send() got an unexpected keyword argument 'source'
Process 0 has the partial result of 5.20833333333.
Traceback (most recent call last):
  File "HW5_prb3.py", line 45, in <module>
    comm.recv(my_result, dest=1, tag=123);
  File "Comm.pyx", line 1142, in mpi4py.MPI.Comm.recv (src/mpi4py.MPI.c:90513)
TypeError: recv() got an unexpected keyword argument 'dest'
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[45152,1],1]
Exit code: 1
--------------------------------------------------------------------------
*******************************-VirtualBox ~/Documents/ME701/HW/HW5 $
The answer to the midpoint integration should be 41.66667. My teacher just wants us to perform a simple time study on parallel computing so we can see the power of it.
Thank you for your time.
I guess you just mixed up the send and recv arguments: source is the rank of the process you receive data from, and dest (short for destination) is the rank of the process you send data to (you can see the docs if you want).
So just interchanging the source and dest keywords in send and recv should be fine.
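A minimal sketch of the corrected exchange, keeping the variable names from the question (my_result, p, myid):

if myid == 0:
    result = my_result
    for i in range(1, p):
        partial = comm.recv(source=i, tag=123)  # receive FROM rank i
        result = result + partial
else:
    comm.send(my_result, dest=0, tag=123)       # send TO rank 0

Note that recv returns the received object, so its result needs to be assigned rather than passed in as the first argument.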

How do I remove the memory limit on openmpi processes?

I'm running a process with mpirun on 2 cores, and it gets killed at the point where I'm mixing values between the two processes. Both processes use about 15% of the machine's memory, and even though memory use increases when mixing, there should still be plenty of memory left. So I'm assuming there is a limit on the amount of memory used for passing messages between the processes. How do I find out what this limit is, and how do I remove it?
The error message that I'm getting when mpirun dies is this:
File "Comm.pyx", line 864, in mpi4py.MPI.Comm.bcast (src/mpi4py.MPI.c:67787)
File "pickled.pxi", line 564, in mpi4py.MPI.PyMPI_bcast (src/mpi4py.MPI.c:31462)
File "pickled.pxi", line 93, in mpi4py.MPI._p_Pickle.alloc (src/mpi4py.MPI.c:26327)
SystemError: Negative size passed to PyBytes_FromStringAndSize
And this is the bit of the code that leads to the error:
sum_updates_j_k = numpy.zeros((self.col.J_total, self.K), dtype=numpy.float64)
comm.Reduce(self.updates_j_k, sum_updates_j_k, op=MPI.SUM)
sum_updates_j_k = comm.bcast(sum_updates_j_k, root=0)
The code usually works; it only runs into problems with larger amounts of data, which increase the size of the matrix being exchanged between the processes.
The culprit is probably the following lines found in the code of PyMPI_bcast():
cdef int count = 0
...
if dosend: smsg = pickle.dump(obj, &buf, &count) # <----- (1)
with nogil: CHKERR( MPI_Bcast(&count, 1, MPI_INT, # <----- (2)
                              root, comm) )
cdef object rmsg = None
if dorecv and dosend: rmsg = smsg
elif dorecv: rmsg = pickle.alloc(&buf, count)
...
What happens here is that the object is first serialised at (1) using pickle.dump() and then the length of the pickled stream is broadcasted at (2).
There are two problems here, and both have to do with the fact that int is used for the length. The first problem is an integer cast inside pickle.dump, and the other is that MPI_INT is used to transmit the length of the pickled stream. This limits the amount of data in your matrix to a certain size, namely the size that would result in a pickled object no bigger than 2 GiB (2^31 - 1 bytes). Any bigger object results in an integer overflow and thus a negative value in count.
This is clearly not an MPI issue but rather a bug in (or a feature of?) mpi4py.
I had the same problem with mpi4py recently. As pointed out by Hristo Iliev in his answer, it's a pickle problem.
This can be avoided by using the upper-case methods comm.Reduce(), comm.Bcast(), etc., which do not resort to pickle, as opposed to lower-case methods like comm.reduce(). As a bonus, upper case methods should be a bit faster as well.
Actually, you're already using comm.Reduce(), so I expect that switching to comm.Bcast() should solve your problem - it did for me.
NB: The syntax of upper-case methods is slightly different, but this tutorial can help you get started.
For example, instead of:
sum_updates_j_k = comm.bcast(sum_updates_j_k, root=0)
you would use:
comm.Bcast(sum_updates_j_k, root=0)
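One caveat, sketched below under the assumption that the array is a contiguous float64 matrix with the same shape on every rank: the buffer-based Bcast fills the array in place, so each rank (not just root) must allocate it with the correct shape and dtype before the call.

import numpy
from mpi4py import MPI

comm = MPI.COMM_WORLD
shape = (4, 3)  # hypothetical stand-in for (J_total, K)
sum_updates_j_k = numpy.zeros(shape, dtype=numpy.float64)  # pre-allocated on every rank
if comm.Get_rank() == 0:
    sum_updates_j_k[:] = 1.0  # stand-in for the reduced result on root
comm.Bcast(sum_updates_j_k, root=0)  # fills the buffer in place; no pickling involved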
For such a case it is useful to have a function that can send numpy arrays in parts, e.g.:
from mpi4py import MPI
import math, numpy, sys

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

def bcast_array_obj(obj = None, dtype = numpy.float64, root = 0):
    """Function for broadcasting of a numpy array object"""
    reporter = 0 if root > 0 else 1
    if rank == root:
        for exp in range(11):
            parts = pow(2, exp)
            err = False
            part_len = math.ceil(len(obj) / parts)
            for part in range(parts):
                part_begin = part * part_len
                part_end = min((part + 1) * part_len, len(obj))
                try:
                    comm.bcast(obj[part_begin: part_end], root = root)
                except:
                    err = True
                # Combine with the error flag reported by the receiving side.
                err *= comm.recv(source = reporter, tag = 2)
                if err:
                    break
            if err:
                continue
            comm.bcast(None, root = root)  # sentinel: transfer complete
            print('The array was successfully sent in {} part{}'.\
                  format(parts, 's' if parts > 1 else ''))
            return
        sys.stderr.write('Failed to send the array even in 1024 parts')
        sys.stderr.flush()
    else:
        obj = numpy.zeros(0, dtype = dtype)
        while True:
            err = False
            try:
                part_obj = comm.bcast(root = root)
            except:
                err = True
                obj = numpy.zeros(0, dtype = dtype)
            if rank == reporter:
                comm.send(err, dest = root, tag = 2)
            if err:
                continue
            if type(part_obj) != type(None):
                frags = len(obj)
                obj.resize(frags + len(part_obj))
                obj[frags: ] = part_obj
            else:
                break
        return obj
This function automatically determines the optimal number of parts into which to break the input array.
For example,
if rank != 0:
    z = bcast_array_obj(root = 0)
else:
    z = numpy.zeros(1000000000, dtype = numpy.float64)
    bcast_array_obj(z, root = 0)
outputs
The array was successfully sent in 4 parts
Apparently this is an issue in MPI itself and not in mpi4py. The variable that holds the size of the communicated data is a signed 32-bit integer, which will overflow to a negative value for around 2 GB of data.
Maximum amount of data that can be sent using MPI::Send
It's been raised as an issue with MPI4py previously as well here.
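A tiny illustration of that wrap-around, mimicking the C int with ctypes:

import ctypes

length = 2**31                     # one byte past INT_MAX (2**31 - 1)
print(ctypes.c_int(length).value)  # -2147483648, i.e. the 'negative size' in the error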
