Having the class handle pickle - python

I am changing some code to spin up VMs in EC2 instead of OpenStack. Main starts a thread per VM, and then various modules perform tasks on these VMs. Each thread controls its own VM. So, instead of either having to add parameters to all of the downstream modules to look up information, or having to change all of the code to unpickle the class instance that created the VM, I am hoping that I can have the class itself decide whether to start a new VM or return the existing pickle. That way the majority of the code won't need to be altered.
This is the general idea, and the closest I have gotten to making it work:
import os
import sys
import pickle

if sys.version_info >= (2, 7):
    from threading import current_thread
else:
    from threading import currentThread as current_thread

class testA:
    def __init__(self, var="Foo"):
        self.class_pickle_file = "%s.p" % current_thread().ident
        if os.path.isfile(self.class_pickle_file):
            self.load_self()
        else:
            self.var = var
            pickle.dump(self, open(self.class_pickle_file, "wb"))

    def test_method(self):
        print self.var

    def load_self(self):
        return pickle.load(open(self.class_pickle_file, "rb"))

x = testA("Bar")
y = testA()
y.test_method()
But that results in: NameError: global name 'var' is not defined
But if I do y = pickle.load(open("140355004004096.p", "rb")), it works just fine. So the data IS getting in there by storing self inside the class; the problem is getting the class to hand back the pickled instance instead of a fresh one...
Any ideas? Thanks.

It looks to me like you create a file named by the current thread's ident, then you instantiate another testA object on the same thread (same ident!), so it finds the existing pickle file and takes the load_self() branch. But load_self() returns the unpickled object, and __init__ throws that return value away, so self.var never gets set.
In test_method, you then reference an attribute that was never set.
Run each instantiation in its own thread to get different idents, or ensure self.var gets set no matter which branch is taken.
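If the intent is for __init__ itself to adopt the previously pickled state, one common pattern is to copy the unpickled object's __dict__ into the new instance instead of discarding load_self()'s return value. A minimal sketch along those lines (the thread-ident filename scheme is the asker's; __dict__.update is just one way to do it):

import os
import pickle
from threading import current_thread

class testA:
    def __init__(self, var="Foo"):
        self.class_pickle_file = "%s.p" % current_thread().ident
        if os.path.isfile(self.class_pickle_file):
            # Adopt the saved instance's attributes rather than
            # throwing the unpickled object away.
            with open(self.class_pickle_file, "rb") as f:
                self.__dict__.update(pickle.load(f).__dict__)
        else:
            self.var = var
            with open(self.class_pickle_file, "wb") as f:
                pickle.dump(self, f)

    def test_method(self):
        print(self.var)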


What is the best way to pass arguments from a locust user to taskset parameters, where the tasksets have been separated into different files?

entry_point.py
from locust import HttpUser
from other_file import UserBehaviour

class ApiUser(HttpUser):
    tasks = [UserBehaviour]

    def on_start(self):
        # log in and return session id and cookie
        self.foo = "foo"  # example
other_file.py
from locust import TaskSet, task
from entry_point import ApiUser

class UserBehaviour(TaskSet):
    @task
    def do_something(self, session_id, session_cookie):
        # use session id and session cookie from the user instance running the taskset
        print(self.ApiUser.foo)  # example
NOTE: Going through the documentation, I did find that "the User instance can be accessed from within a TaskSet instance through the TaskSet.user". However, all my attempts to import the user into the taskset file led to a "cannot import name 'ApiUser' from 'entry_point'" error. If instead of from entry_point import ApiUser I do from entry_point import *, then I get a name 'ApiUser' is not defined error.
Thank you very much @Cyberwiz for putting me on the right track. I've finally managed to figure out what I was doing wrong... which, as it turns out, was a couple of things.
Firstly, importing ApiUser in other_file.py was incorrect for two reasons: 1) it creates a cyclical dependency, and 2) even if it would eventually work it would import the ApiUser class, not the instance of the ApiUser class.
Secondly, I was previously getting a module locust.user has no attribute {name} error, and that was because my code looked like this:
class UserBehaviour(TaskSet):
    # do something with user.foo
Having figured this out, I honestly have no idea why I thought the above would work. I've changed my code to reflect the example below and everything now works like a charm:
class UserBehaviour(TaskSet):
    @task
    def do_something(self):
        # do something with self.user.foo
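Putting both pieces together, a minimal working sketch of the two files (the host value and the foo attribute are illustrative; the key points are that other_file.py never imports entry_point, and the taskset reaches the user through self.user):

# other_file.py
from locust import TaskSet, task

class UserBehaviour(TaskSet):
    @task
    def do_something(self):
        # TaskSet.user is the User instance running this taskset
        print(self.user.foo)

# entry_point.py
from locust import HttpUser
from other_file import UserBehaviour

class ApiUser(HttpUser):
    host = "http://localhost"  # illustrative
    tasks = [UserBehaviour]

    def on_start(self):
        # log in, then keep whatever the tasksets need on the user
        self.foo = "foo"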

Single instance of class from another module

I come from a Java background and most of my thinking comes from there. I recently started learning Python. I have a case where I want to create just one connection to Redis and use it everywhere in the project. Here is how my structure and code look.
module: state.domain_objects.py
import pickle
import redis

class MyRedis():
    global redis_instance

    def __init__(self):
        redis_instance = redis.Redis(host='localhost', port=6379, db=0)
        print("Redis instance created", redis_instance)

    @staticmethod
    def get_instance():
        return redis_instance

    def save_to_redis(self, key, object_to_cache):
        pickleObj = pickle.dumps(object_to_cache)
        redis_instance.set(key, pickleObj)

    def get_from_redis(self, key):
        pickled_obj = redis_instance.get(key)
        return pickle.loads(pickled_obj)
class ABC():
....
Now I want to use this from other modules.
module service.some_module.py
import datetime
from state.domain_objects import MyRedis
from flask import Flask, request

app = Flask(__name__)

@app.route('/chat/v1/', methods=['GET'])
def chat_service():
    userid = request.args.get('id')
    message_string = request.args.get('message')
    message = Message(message_string, datetime.datetime.now())
    r = MyRedis.get_instance()
    user = r.get(userid)

if __name__ == '__main__':
    global redis_instance
    MyRedis()
    app.run()
When I start the server, the MyRedis() __init__ method gets called and the instance gets created, which I have declared as global. Still, when the service is called and tries to access it, it says NameError: name 'redis_instance' is not defined. I am sure this is because I am trying to Java-fy the approach, but I'm not sure how exactly to fix it. I read about globals, and my understanding is that a global acts like a single variable within its module, which is why I tried doing it this way. Please help me clear my confusion. Thanks!
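For what it's worth, the idiomatic Python counterpart to a Java singleton is simply a module-level instance: a module's top-level code runs only once per process, so every importer shares the same object. A minimal sketch, assuming the same state.domain_objects module (names are illustrative):

# state/domain_objects.py
import pickle
import redis

# Created once, when the module is first imported; every module that
# does `from state.domain_objects import my_redis` gets this object.
my_redis = redis.Redis(host='localhost', port=6379, db=0)

def save_to_redis(key, object_to_cache):
    my_redis.set(key, pickle.dumps(object_to_cache))

def get_from_redis(key):
    return pickle.loads(my_redis.get(key))

As for the NameError itself: the global statement only affects the scope it appears in, so global redis_instance in the class body does nothing for __init__; the assignment inside __init__ creates a variable local to that method. __init__ would need its own global redis_instance line for the original code to work.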

What is the scope of a variable in logging?

logging.setLogRecordFactory() simplifies the manipulation of logging records by letting you install a record factory. I use a module of mine, imported in each piece of code, to homogenize the logging:
# this is mylogging.py
import logging
import logging.handlers

def mylogging(name):
    old_factory = logging.getLogRecordFactory()

    def record_factory(*args, **kwargs):
        record = old_factory(*args, **kwargs)
        # send an SMS for critical events (level = 50)
        if args[1] == 50:
            pass  # here is the code which sends an SMS
        return record

    logging.setLogRecordFactory(record_factory)
    # common logging info
    log = logging.getLogger(name)
    log.setLevel(logging.DEBUG)
    (...)
All my scripts bootstrap logging via a
log = mylogging.mylogging("name_of_the_project")
This works fine.
I now would like to keep track of the number of SMS sent. For this I would like to set a counter within mylogging.py, common to all scripts which import mylogging. The problem is that such a variable will be local to each script.
On the other hand, logging is peculiar in the sense that when different scripts call logging.getLogger(name) with the same name, the same logger object is reused, which means that there is some persistence between scripts (even though each of them does an independent import logging).
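This reuse is easy to check, since getLogger caches loggers by name and hands back the very same object on each call:

import logging
a = logging.getLogger("name_of_the_project")
b = logging.getLogger("name_of_the_project")
print(a is b)  # True: one logger object per name, process-wide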
With this in mind, is there a way to use a variable common to all logging, placed right after the "here is the code which sends an SMS" line, which would be incremented no matter which script the logging request comes from?
An import such as
from mylogging import mycounter
mycounter += 1
adds a new reference to mycounter in the importing module's namespace. For an immutable type such as an integer counter, the += rebinds the name in that local namespace only; other modules keep seeing the value from the moment they imported it.
One solution is to keep the original namespace, so that the rebinding happens in mylogging itself:
import mylogging
mylogging.mycounter += 1
This is fragile. It's not very obvious that it only works because of the way the import was done.
A better solution is to use a mutable type. itertools.count is interesting, but it doesn't let you read the current value of the counter. Here's a simple class that does; I've added locking so that it also works in a multithreaded environment.
Add to mylogging.py:
import threading

class MyCounter(object):
    def __init__(self):
        self.val = 0
        self.lock = threading.Lock()

    def inc(self):
        with self.lock:
            self.val += 1
            return self.val

sms_counter = MyCounter()
Some other module:
from mylogging import sms_counter
print('sms count is {}'.format(sms_counter.inc()))
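To make the counter actually count the SMS sends, it can be incremented inside the record factory, right where the question's placeholder comment sits. A sketch of mylogging.py with that wiring (using record.levelno instead of args[1] is just a readability tweak):

import logging
import threading

class MyCounter(object):
    def __init__(self):
        self.val = 0
        self.lock = threading.Lock()

    def inc(self):
        with self.lock:
            self.val += 1
            return self.val

sms_counter = MyCounter()

def mylogging(name):
    old_factory = logging.getLogRecordFactory()

    def record_factory(*args, **kwargs):
        record = old_factory(*args, **kwargs)
        if record.levelno >= logging.CRITICAL:
            sms_counter.inc()  # one shared counter, whichever script logs
            # ... code which sends the SMS goes here ...
        return record

    logging.setLogRecordFactory(record_factory)
    log = logging.getLogger(name)
    log.setLevel(logging.DEBUG)
    return log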

Dynamic traits do not survive pickling

traits_pickle_problem.py
from traits.api import HasTraits, List
import cPickle

class Client(HasTraits):
    data = List

class Person(object):
    def __init__(self):
        self.client = Client()
        # dynamic handler
        self.client.on_trait_event(self.report, 'data_items')

    def report(self, obj, name, old, new):
        print 'client added-- ', new.added

if __name__ == '__main__':
    p = Person()
    p.client.data = [1, 2, 3]
    p.client.data.append(10)
    cPickle.dump(p, open('testTraits.pkl', 'wb'))
The above code reports a dynamic trait. Everything works as expected in this code. However, using a new python process and doing the following:
>>> from traits_pickle_problem import Person, Client
>>> p=cPickle.load(open('testTraits.pkl','rb'))
>>> p.client.data.append(1000)
causes no report of the list append. However, re-establishing the listener separately as follows:
>>> p.client.on_trait_event(p.report,'data_items')
>>> p.client.data.append(1000)
client added-- [1000]
makes it work again.
Am I missing something, or does the handler need to be re-established in __setstate__ during the unpickling process?
Any help appreciated. This is for Python 2.7 (32-bit) on windows with traits version 4.30.
Running pickletools.dis(cPickle.dumps(p)), you can see the handler object being referenced:
...
213: c GLOBAL 'traits.trait_handlers TraitListObject'
...
But there's no further information on how it should be wired to the report method. So either the trait_handler doesn't pickle itself out properly, or it's an ephemeral thing like a file handle that can't be pickled in the first place.
In either case, your best option is to overload __setstate__ and re-wire the event handler when the object is re-created. It's not ideal, but at least everything is contained within the object.
class Person(object):
    def __init__(self):
        self.client = Client()
        # dynamic handler
        self.client.on_trait_event(self.report, 'data_items')

    def __setstate__(self, d):
        self.client = d['client']
        self.client.on_trait_event(self.report, 'data_items')

    def report(self, obj, name, old, new):
        print 'client added-- ', new.added
Unpickling the file now correctly registers the event handler:
p = cPickle.load(open('testTraits.pkl', 'rb'))
p.client.data.append(1000)
client added-- [1000]
You might find this talk Alex Gaynor did at PyCon interesting. It goes into the high points of how pickling works under the hood.
EDIT - initial response used on_trait_change - a typo that appears to work. Changed it back to on_trait_event for clarity.
I had the same problem but worked around it like this: imagine I want to pickle only parts of a quite big class, and some of the objects have been set to transient=True so they're not pickled, because there is nothing important to save, e.g.
class LineSpectrum(HasTraits):
    andor_cam = Instance(ANDORiKonM, transient=True)
In contrast to objects which should be saved, e.g.
spectrometer = Instance(SomeNiceSpectrometer)
In my LineSpectrum class, I have a
def __init__(self, f):
    super(LineSpectrum, self).__init__()
    self.load_spectrum(f)

def __setstate__(self, state):  # WORKING!
    print("LineSpectrum: __setstate__ with super(...) call")
    # first update __dict__, then call __init__; otherwise
    # pickled sliders won't work
    self.__dict__.update(state)
    super(LineSpectrum, self).__init__()
    self.from_pickle = True  # not needed by traits, I need it for myself
    self.andor_cam = ANDORiKonM(self.filename)
    self.load_spectrum(self.filename)
In my case, this works perfectly: all sliders are working, and all values set at the time the object was pickled are restored.
Hope this works for you or anybody who's having the same problem. I've got Anaconda Python 2.7.11, all packages updated.
PS: I know the thread is old, but I didn't want to open a new one just for this.

Reference inherited class functions

I am inheriting from both threading.Thread and bdb.Bdb. Thread requires a run function for the start function to call, and I need to use the Bdb.run function. How do I reference Bdb's run function, since I can't do it with self.run? I tried super, but I'm apparently not using that right; I get TypeError: must be type, not classobj.
import sys
import os
import multiprocessing
import threading
import bdb
from bdb import Bdb
from threading import Thread
from el_tree_o import ElTreeO, _RUNNING, _PAUSED, _WAITING
from pysignal import Signal

class CommandExec(Thread, Bdb):
    '''
    CommandExec is an implementation of the Bdb python class, which is a base
    debugger. This will give the user the ability to pause scripts when needed
    and see script progress through line numbers. Useful for command and
    control scripts.
    '''

    def __init__(self, mainFile, skip=None):
        Bdb.__init__(self, skip=skip)
        Thread.__init__(self)
        # need to define botframe to protect against an error
        # generated in bdb.py when set_quit is called before
        # self.botframe is defined
        self.botframe = None
        # self.event is used to pause execution
        self.event = threading.Event()
        # used so I know when to start debugging
        self.mainFile = mainFile
        self.start_debug = 0
        # used to run a file
        self.statement = ""

    def run(self):
        self.event.clear()
        self.set_step()
        super(bdb.Bdb, self).run(self.statement)
Just as you invoked Bdb's __init__ method explicitly in your own __init__, you can invoke its run method:
Bdb.run(self, self.statement)
super is only useful when you don't know which parent class you need to invoke next, and you want to let Python's inheritance machinery figure it out for you. Here, you know precisely which function you want to call, Bdb.run, so just call it.
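For completeness, a sketch of the corrected run method; the rest of the class is unchanged. The TypeError came from passing bdb.Bdb, an old-style class on Python 2, to super, which only accepts new-style classes:

def run(self):
    self.event.clear()
    self.set_step()
    # Call the inherited Bdb.run explicitly; self.run would find this
    # method again, and super() can't be used with the old-style Bdb.
    Bdb.run(self, self.statement)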
