I have been using Zipline for some time now and would benefit greatly from being able to pickle a backtest in order to resume it later. The idea is to save the state of the trading algorithm and update it when new data becomes available. I started pickling some attributes I could think of but forgot some others, and was therefore wondering if anyone has an easy solution for this.
Best,
Vincent
PS:
I tried updating the portfolio with the few lines below. It mostly works, but more attributes need to be overwritten.
if self.load_former_ptf:
    for k, v in context.former_portfolio.__dict__.items():
        self.TradingAlgorithm.portfolio.__setattr__(k, v)
    updPositionDict = {}
    for p in context.former_portfolio.positions.values():
        formerDelta = p.amount * p.last_sale_price
        newSid = context.symbol(p.sid.symbol)
        newPrice = data[newSid].price
        newQuantity = int(formerDelta / newPrice)
        # portfolio should be made of positions instead of plain dict
        updPositionDict.update({newSid: {'amount': newQuantity,
                                         'cost_basis': p.cost_basis,
                                         # was p.last_sale_price, presumably a typo
                                         'last_sale_date': p.last_sale_date,
                                         'last_sale_price': newPrice,
                                         'sid': newSid}})
    self.TradingAlgorithm.portfolio.positions = updPositionDict
    self.load_former_ptf = False
...wondering if anyone had an easy solution to do that.
I have not used zipline and don't have an implemented solution for what you are asking. All I can help you with is pickling your test state.
From the information you've provided, I can only infer that perf_tracker is what you need help with. Looking at its source, there shouldn't be any problem with pickling it (correct me if I am wrong, because I haven't done it myself).
You can also read about pickling custom objects here. Another interesting method is __repr__, which helps with recreating objects from strings.
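For example, here is a minimal sketch of those custom pickling hooks; the class and field names are hypothetical, not taken from zipline:

import pickle

class AlgoState(object):
    """Hypothetical container for the backtest state you want to resume."""

    def __init__(self):
        self.positions = {}
        self.cash = 100000.0
        self._live_feed = None  # transient: sockets/handles don't pickle

    def __getstate__(self):
        # Drop transient, unpicklable members before serialization.
        state = self.__dict__.copy()
        state['_live_feed'] = None
        return state

    def __setstate__(self, state):
        # Restore the dict, then re-create transient members on load.
        self.__dict__.update(state)
        self._live_feed = None  # reconnect here when resuming

# Save at the end of a run; load later to resume.
with open("backtest.pkl", "wb") as f:
    pickle.dump(AlgoState(), f)
with open("backtest.pkl", "rb") as f:
    resumed = pickle.load(f)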
I am making a Django web application and I am facing a problem comparing two different lists in different functions.
def test(request, slug):
    n = request.user
    choice = TestOptions.objects.filter(choice=slug).first()
    que = questions.objects.filter(Subject=choice)
    question = []
    un = ['a', 'b', 'c', 'd', 'e']
    for q in que:
        if q not in question:
            question.append(q)
        else:
            continue
    sampling = random.sample(question, 5)
    print("--------------SamPLING111")
    print(sampling)
    print("--------------SamPLING111")
    correctAnswers = []
    for j in sampling:
        correctAnswers.append(j.answer)
    marks(correctAnswers)
    d = dict(zip(un, sampling))
    return render(request, "code/test.html", {'questions': d})

def acceptAnswer(request):
    answers = []
    if request.method == "POST":
        answers.append(request.POST['a'])
        answers.append(request.POST['b'])
        answers.append(request.POST['c'])
        answers.append(request.POST['d'])
        answers.append(request.POST['e'])
        score(answers)
    return render(request, "code/dub.html")

def marks(correct):
    list1 = []
    l1 = correct

def score(ans):  # was "def score(and):", but "and" is a reserved keyword
    list2 = []
    l2 = ans
The function test passes one list and the function acceptAnswer passes another list; my job is to compare those two lists.
How can I compare l1 and l2?
I am not 100 percent sure what you are trying to do with these lists, but in order to compare them I would just return them. Here is a quick example:
def marks(correct):
    list1 = []
    l1 = correct
    return l1

def score(answer):
    list2 = []
    l2 = answer
    return l2

numbers = [1, 2, 3]
numbers2 = [1, 2, 3]
numbers3 = [3, 4, 5]

print(marks(numbers) == score(numbers2))   # True
print(marks(numbers2) == score(numbers3))  # False
Hopefully this helps!
Rather than continue with comments, I figured I'd elaborate in an answer. Though it isn't an exact answer to your question, I think it is the real answer.
You really have two issues. One is a design issue, i.e. how to make your program work correctly; the other is an implementation issue about the scope of variables and how to deal with it.
I can see from your profile that you're a university student, and given the nature of the code it seems very likely you're writing your web app for learning purposes, maybe even an assignment.
If you were doing this outside of a university, I'd expect you're seeking practitioner-type skills, in which case I'd suggest designing your application the way Django expects you to, which translates into storing state information in a database.
If this is a lab, however, you may not have covered databases yet. Labs sometimes have students doing silly things because they can't teach everything at the same time, so your prof may not expect you to use a database yet.
Inside a web application you have to consider that the web is request/response, and that you can get requests from a lot of different sources, so you have state management concerns that classical desktop applications don't have. Who is supposed to see these tests, who is supposed to see the marks, and in what order do these things happen? Should anyone be able to create a test? Should anyone be able to take a test?
You might not care yet, but eventually you'll want to care about sessions. If people are taking their own tests you could store data in a user session, but then other people wouldn't see those tests. Generally the correct way to store this sort of state is in a database, where you can access it according to what you know about the current request. If this is some sort of basic intro app, your prof may be happy with you doing something kludgy for now.
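If a database is an option, a minimal sketch might look like the following; the model and field names are my own invention, not from the question's code:

from django.conf import settings
from django.db import models

class TestAttempt(models.Model):
    """One user's answer to one sampled question."""
    user = models.ForeignKey(settings.AUTH_USER_MODEL,
                             on_delete=models.CASCADE)
    question = models.ForeignKey('questions', on_delete=models.CASCADE)
    given_answer = models.CharField(max_length=255)
    is_correct = models.BooleanField(default=False)

With something like this, test would create TestAttempt rows for the sampled questions, and acceptAnswer would look them up and score them, so no lists need to survive between requests.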
I'm pretty new to Python, and currently playing with the zeroconf library.
When I try to register a service on the network, I see this in the function definition:
def register_service(self, info, ttl=_DNS_TTL):
    """Registers service information to the network with a default TTL
    of 60 seconds. Zeroconf will then respond to requests for
    information for that service. The name of the service may be
    changed if needed to make it unique on the network."""
    self.check_service(info)
    self.services[info.name.lower()] = info
    if info.type in self.servicetypes:
        self.servicetypes[info.type] += 1
    else:
        self.servicetypes[info.type] = 1
    now = current_time_millis()
    next_time = now
    i = 0
    while i < 3:
        if now < next_time:
            self.wait(next_time - now)
            now = current_time_millis()
            continue
        out = DNSOutgoing(_FLAGS_QR_RESPONSE | _FLAGS_AA)
        out.add_answer_at_time(DNSPointer(info.type, _TYPE_PTR,
            _CLASS_IN, ttl, info.name), 0)
        out.add_answer_at_time(DNSService(info.name, _TYPE_SRV,
            _CLASS_IN, ttl, info.priority, info.weight, info.port,
            info.server), 0)
        out.add_answer_at_time(DNSText(info.name, _TYPE_TXT, _CLASS_IN,
            ttl, info.text), 0)
        if info.address:
            out.add_answer_at_time(DNSAddress(info.server, _TYPE_A,
                _CLASS_IN, ttl, info.address), 0)
        self.send(out)
        i += 1
        next_time += _REGISTER_TIME
Anyone know what type info is meant to be?
EDIT
Thanks for providing the answer that it's a ServiceInfo class, besides the fact that the docstring provides this answer when one goes searching for it. I'm still unclear on:
1. The process expert Python programmers follow when encountering this sort of situation: what steps would you take to find the data type of info if the docstring weren't available?
2. How does the Python interpreter know info is of the ServiceInfo class when we don't specify the class type as part of the input param for register_service? How does it know info.type is a valid property, and that, say, info.my_property isn't?
It is an instance of the ServiceInfo class.
It can be deduced from reading the code and docstrings. register_service invokes check_service function which, I quote, "checks the network for a unique service name, modifying the ServiceInfo passed in if it is not unique".
It looks like it should be a ServiceInfo. Found in the examples of the repository:
https://github.com/jstasiak/python-zeroconf/blob/master/examples/registration.py
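Sketched from that registration example (the exact ServiceInfo signature has varied between zeroconf versions, so treat this as illustrative rather than definitive):

import socket
from zeroconf import ServiceInfo, Zeroconf

desc = {'path': '/'}
info = ServiceInfo("_http._tcp.local.",
                   "My Test Service._http._tcp.local.",
                   socket.inet_aton("127.0.0.1"),  # packed IP address
                   80, 0, 0,                       # port, weight, priority
                   desc, "my-host.local.")

zc = Zeroconf()
zc.register_service(info)  # the method from the question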
Edit
I'm not really sure what to say besides "any way I have to". In practice, I can't really remember a time when the contract of the interface wasn't made perfectly clear, because that's just part of using Python. Documentation is more of a hard requirement for this reason.
The short answer is, "it doesn't". Python uses the concept of "duck typing" in which any object that supports the necessary operations of the contract is valid. You could have given it any value that has all the properties the code uses and it wouldn't know the difference. So, per part 1, worst case you just have to trace every use of the object back as far as it is passed around and provide an object that meets all the requirements, and if you miss a piece, you'll get a runtime error for any code path that uses it.
My preference is for static typing as well. Largely I think documentation and unit tests just become "harder requirements" when working with dynamic typing since the compiler can't do any of that work for you.
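To make the duck-typing point concrete, here is a toy sketch (none of these names come from zeroconf): the function is equally happy with either object, because it only ever touches the attributes it actually uses.

class RealLookingInfo:
    type = "_http._tcp.local."

class TotallyFakeInfo:
    type = "_fake._tcp.local."

def count_service_type(registry, info):
    # Never checks isinstance(); any object with a .type attribute works.
    registry[info.type] = registry.get(info.type, 0) + 1

registry = {}
count_service_type(registry, RealLookingInfo())
count_service_type(registry, TotallyFakeInfo())
print(registry)  # {'_http._tcp.local.': 1, '_fake._tcp.local.': 1}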
Please read this whole question before answering, as it's not what you think... I'm looking at creating python object wrappers that represent hardware devices on a system (trimmed example below).
class TPM(object):

    @property
    def attr1(self):
        """
        Protects value from being accidentally modified after
        constructor is called.
        """
        return self._attr1

    def __init__(self, attr1, ...):
        self._attr1 = attr1
        ...

    @classmethod
    def scan(cls):
        """Calls Popen, parses to dict, and passes **dict to constructor"""
Most of the constructor inputs involve running command line tools via subprocess.Popen and then parsing the output to fill in object attributes. I've come up with a few ways to handle these, but I'm unsatisfied with what I've put together so far and am trying to find a better solution. Here are the common catches that I've found. (Quick note: tool versions are tightly controlled, so parsed outputs don't change unexpectedly.)
Many tools produce variant outputs, sometimes including fields and sometimes not. This means that if you assemble a dict to be wrapped in a container object, the constructor is more or less forced to take **kwargs and not really have defined fields. I don't like this because it makes static analysis via pylint etc. less than useful. I'd prefer a defined interface so that sphinx documentation is clearer and errors can be detected more reliably.
In lieu of **kwargs, I've also tried setting default args to None for many of the fields, with what ends up as pretty ugly results. One thing I strongly dislike about this option is that optional fields don't always come at the end of the command line tool output, which makes it mind-bending to look at the constructor and match it up to the tool output.
I'd greatly prefer to avoid constructing a dictionary in the first place, but using setattr to create attributes leaves pylint unable to detect _attr1, etc., and creates warnings. Any ideas here are welcome.
Basically, I am looking for the proper Pythonic way to do this. My requirements, for a re-summary are the following:
Command line tool output parsed into a container object.
Container object protects attributes via properties post-construction.
Varying number of inputs to constructor, with working static analysis and error detection for missing required fields during runtime.
Is there a good way of doing this (hopefully without a ton of boilerplate code) in Python? If so, what is it?
EDIT:
Per some of the clarification requests, we can take a look at the tpm_version command. Here's the output for my laptop; for this TPM it doesn't include every possible attribute, and sometimes the command will return extra attributes that I also want to capture. This makes parsing to known attribute names on a container object fairly difficult.
TPM 1.2 Version Info:
Chip Version: 1.2.4.40
Spec Level: 2
Errata Revision: 3
TPM Vendor ID: IFX
Vendor Specific data: 04280077 0074706d 3631ffff ff
TPM Version: 01010000
Manufacturer Info: 49465800
Example code (ignore lack of sanity checks, please. trimmed for brevity):
from subprocess import Popen, PIPE

def __init__(self, chip_version, spec_level, errata_revision,
             tpm_vendor_id, vendor_specific_data, tpm_version,
             manufacturer_info):
    self._chip_version = chip_version
    ...

@classmethod
def scan(cls):
    tpm_proc = Popen("/usr/sbin/tpm_version", stdout=PIPE)
    stdout, stderr = tpm_proc.communicate()  # was Popen.communicate()
    tpm_dict = dict()
    for line in stdout.splitlines():
        if "Version Info:" in line:
            pass
        else:
            split_line = line.split(":")
            attribute_name = (
                split_line[0].strip().replace(' ', '_').lower())
            tpm_dict[attribute_name] = split_line[1].strip()
    return cls(**tpm_dict)
The problem here is that this tool (or a different one whose source I may not be able to review to learn every possible field) could emit extra fields that my parser handles but my object fails to capture. That's what I'm really trying to solve in an elegant way.
I've been working on a more solid answer to this over the last few months, as I basically work on hardware support libraries, and I have finally come up with a satisfactory (though pretty verbose) answer:
1. Parse the tool outputs, whatever they look like, into object structures that match up with how the tool views the device. These can have very generic dict structures, but should be broken out as much as possible.
2. Create another container class on top of that which uses properties to access items in the tool-container objects. This enforces an API and can return sane errors across multiple versions of the tool, and across differing tool outputs!
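A minimal sketch of that layering, with hypothetical names (the parsing mirrors the scan method from the question):

class TpmToolOutput(object):
    """Layer 1: generic parse of one tool's output; captures every field."""
    def __init__(self, raw_text):
        self.fields = {}
        for line in raw_text.splitlines():
            if ":" not in line or line.endswith("Version Info:"):
                continue
            key, _, value = line.partition(":")
            self.fields[key.strip().replace(' ', '_').lower()] = value.strip()

class TPM(object):
    """Layer 2: the stable, documented API on top of the tool container."""
    def __init__(self, tool_output):
        self._out = tool_output

    @property
    def chip_version(self):
        try:
            return self._out.fields['chip_version']
        except KeyError:
            raise AttributeError("tool output did not report a chip version")

Layer 1 never loses fields the tool happens to emit; layer 2 gives pylint and sphinx a fixed set of properties to check and document.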
I have a collection of properties. Some of these properties have been given a review score; reviews are saved in an AggregateReview table. I have to sort these properties on the basis of their score, with the property with the highest review score coming first. Once all properties with a review score are sorted, I then have to append those properties which aren't reviewed.
This is the code I have written. It's working fine, but I want to know whether there is anything in this code that can be optimized. I am new to App Engine and ndb, so any help will be appreciated. (I think there are too many calls to the db.)
sortedProperties = []
sortedProperties.extend(sorted(
    [eachProperty for eachProperty in properties
     if AggregateReview.query(AggregateReview.property == eachProperty.key).get()],
    key=lambda property: AggregateReview.query(
        AggregateReview.property == property.key).get().rating,
    reverse=True))
sortedProperties.extend(
    [eachProperty for eachProperty in properties
     if AggregateReview.query(AggregateReview.property == eachProperty.key).get() is None])
return sortedProperties
After a bit of rework I came to this:
return sorted(properties,key=lambda property: property.review_aggregate.get().average,reverse=True)
but it throws an error:
'NoneType' object has no attribute 'average'
because it cannot find a review_aggregate for every property. I want it to accept None.
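One way to let it accept None is a small key function that falls back to a default rating; this is a sketch assuming the review_aggregate and average names from the snippet above:

def rating_or_zero(prop):
    # Properties without an AggregateReview sort as 0.0; with reverse=True
    # they naturally fall to the end, after every reviewed property.
    agg = prop.review_aggregate.get()
    return agg.average if agg is not None else 0.0

sortedProperties = sorted(properties, key=rating_or_zero, reverse=True)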
I am not sure what's going on in your code there, to be honest, as the lines are so long. But I know that you may want to look into Memcache. It can be used to minimize hits on a database, and is widely used indeed; it is actually built into Google App Engine. Read the docs on it here.
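As a sketch of what that can look like on App Engine (the function names here are hypothetical; only the memcache calls are real API):

from google.appengine.api import memcache

def sorted_properties_cached():
    result = memcache.get("sorted_properties")
    if result is None:
        result = compute_sorted_properties()  # the expensive path above
        memcache.add("sorted_properties", result, time=300)  # cache 5 min
    return result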
I've got a Python (3.1, if that matters) application that pickles data for another process to consume and exchanges it over network connections. For some reason, some exchanges are unexpectedly large... I can make sense of some of the pickled data and figure out what's transmitted, but there remains a large blob of apparently binary data which I fail to explain to myself, such as redundant strings or large chunks of binary data.
Do you know whether there is a wireshark plugin that could assist me with that task, or another process you'd recommend to someone trying to figure out what more should have been =None'd before the object is transmitted over the connection?
RouteDirect
q.).q.}q.(X...._RouteDirect__dst_nodeq.cnode
Node
q.).q.}q.(X...._Node__status_upq.NX...._Node__neighbourhoodq.NX...._Node__sendq.NX
..._Node__cpeq
cequation
CPE
q.).q.}q^M(X...._CPE__dim_countq.}q.X...._CPE__internal_nodesq.]q.ubX...._Node__major_stateq.NX...._Node__partition_idq.G?.$:..4. X...._Node__name_idq.cnodeid
NameID
q.).q.}q.X^M..._NameID__nameq.X....checkq.sbX...._Node__dispatcherq.NX...._Node__pendingq.]q.cmessages
^ I can make sense of that: RouteDirect, CPE and NameID are classes in my program.
v I'm more surprised about this: there shouldn't be that much "plain binary" data in the exchange, although Iproto, Tflags, Isrc and Idst are strings contained within that data
q0).q1}q2(X...._Range__maxq3X....1f40q4X...._Range__min_includedq5.X...._Range__max_includedq6.X...._
Range__minq7h4ubX...._Component__dimensionq8h'ubh&).q9}q:h)X....Tflagsq;sbh+).q<}q=(h..h/h0).q>}q?
(h3X....02q#h5.h6.h7h#ubh8h9ubh&).qA}qBh)X....IprotoqCsbh+).qD}qE(h..h/h0).qF}qG(h3X...
.06qHh5.h6.h7hHubh8hAubh&).qI}qJh)X....IsrcqKsbh+).qL}qM(h..h/h0).qN}qO(h3X....7d59d8faqPh5.h6.
h7hPubh8hIubh&).qQ}qRh)X....IdstqS
sbh+).qT}qU(h..h/h0).qV}qW(h3X....00001011qXh5.h6.h7hXubh8hQubh&).qY}qZh)X....Tsrcq[sbh+).q\}q]
(h..h/h0).q^}q_(h3X....0bcfq`h5.h6.h7h`ubh8hYubusbX....
v and this is really perplexing.
qt).qu}qv(X...._LookupRequest__keyqwh!).qx}qyh$}qz(h&).q{}q|h)h*sbh+).q}}q~(h..h/h0).q.}q.
(h3h4h5.h6.h7h4ubh8h{ubh&).q.}q.h)h;sbh+).q.}q.(h..h/h0).q.}q.(h3h#h5.h6.h7h#ubh8h.ubh&).q.}q.h)hCsbh+).q.}q.(h..h/h0).q.}q.
(h3hHh5.h6.h7hHubh8h.ubh&).q.}q.h)hKsbh+).q.}q.(h..h/h0).q.}q.(h3hPh5.h6.h7hPubh8h.ubh&).q.}q.h)hSsbh+).q.}q.(h..h/h0).q.}q.
(h3hXh5.h6.h7hXubh8h.ubh&).q.}q.h)h[sbh+).q.}q.(h..h/h0).q.}q.
(h3h`h5.h6.h7h`ubh8h.ubusbX...._LookupRequest__nonceq.G?...u...X...._LookupRequest__fromq.h.).q.}q.(h.Nh.Nh.Nh
h.).q.}q.(h.}q.
What puzzles me the most is that it seems too regular to be, e.g., mere floats/ints in binary. It has some affinity for numbers and [shub] and lots of 'isolated' q's, which reminds me more of machine code. Or is it just my eyes?
Example of pickling support in the Node class:
#
# Define special pickling behaviour.
def __getstate__(self):
    """Indicate which fields should be pickled."""
    state = copy.copy(self.__dict__)
    # 'state' is a shallow copy: don't modify objects' content.
    # Make transient fields point to nothing.
    state['_Node__dispatcher'] = None
    state['_Node__send'] = None
    state['_Node__neighbourhood'] = None
    state['_Node__status_up'] = None
    state['_Node__data_store'] = None
    state['_Node__running_op'] = None
    state['_Node__major_state'] = None
    return state
Many other objects (e.g. CPE, RouteDirect) have no __getstate__ method. I'd love it if there were some technique that doesn't require me to crawl through all constructors of all classes, of course.
Ah, reading the /usr/lib/python3.1/pickle.py code at least makes one point less obscure: the output of pickling is indeed bytecode for some interpreter, with push/pop pairs that explain the regular patterns seen.
BINPUT = b'q' # store stack top in memo; index is 1-byte arg
BINGET = b'h' # push item from memo on stack; index is 1-byte arg
EMPTY_TUPLE = b')' # push empty tuple
MARK = b'(' # push special markobject on stack
etc.
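The standard library will even disassemble such a stream for you: pickletools.dis prints one line per opcode, which makes it easy to see what got pickled.

import pickle
import pickletools

blob = pickle.dumps({'key': 'value'})
pickletools.dis(blob)  # one line per opcode, with offsets and arguments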
Following @Alfe's comment, I captured raw traffic using wireshark's "follow TCP stream" and "save as..." features, then used
x = pickle.load(open("wirecapture.bin", "rb"))
and used the Python evaluator to get a better understanding of what was there. In particular, using
len(pickle.dumps(x.my_field))
for all fields reported by dir(x) allowed me to pinpoint the over-sized field. Unfortunately, I couldn't get
for y in dir(x):
    print("%s: %iKb" % (y, len(pickle.dumps(x[y])) / 1024))
properly working (x[y] wasn't the valid way to extract x.my_field when y == 'my_field' >_< )
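For the record, a corrected sketch using getattr, which is the lookup the loop above needed (dir() also returns methods and other unpicklable attributes, hence the try/except):

import pickle

for y in dir(x):
    try:
        size = len(pickle.dumps(getattr(x, y)))
    except Exception:
        continue  # skip methods and anything else that won't pickle
    print("%s: %.1f Kb" % (y, size / 1024.0))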