I have a function that takes several arguments, one of which is a contact number. The data provided to the function is used to generate documents: if one option is selected, the document is immediately returned inline, while the other option takes the contact number and generates an email. In the original version of this function, the contact number was parsed at the start of the function, but I moved the parsing into the else block, since that is where the email that uses the contact number is actually generated, and I saw no reason to create a new variable if it goes unused half of the time. An example is below, built in Python using the Django framework:
def function(request, object, number=None):
    obj = ObjectItem.objects.get(id=object)
    # Originally number processed here
    if request.method == 'POST':
        if 'inline' in request.POST:
            data = {
                'object': obj,
            }
            return generate_document(data, inline=True)
        else:
            if number:
                contact = '{}'.format(number)
            else:
                contact = obj.contact
            data = {
                'object': obj,
            }
            document = generate_document(data, inline=False)
            return message(document, contact)
    else:
        return redirect()
While looking at my code, I realized that I could move the creation of the data dict outside of the inline vs. non-inline processing in the POST branch, but I do not know whether moving the processing of the number argument into the else block actually saves any time, or whether it is the more standard way of doing things. I know that since Python is an interpreted language, it will not automatically perform the kind of declaration-rearranging optimizations a compiler would, so I am looking for the most efficient way of doing this.
From a performance perspective, it makes no difference whether you create data above the if or inside it: Python will only hit the line once either way, and the dict will only be created once. But you should move it above the if for design reasons.
First, don't repeat yourself: if you can reasonably implement a bit of code in one place, don't sprinkle it around your code. Suppose you later decide a defaultdict is better; you then only have to change it in one place.
Second, placement implies intent. If you put it above your if, you've made a statement that you plan to use that data structure everywhere. In your current code, readers will ask the same question you did: why wasn't that above the if? It's kind of trivial, but reading the code shouldn't raise more questions.
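Concretely, the refactor might look like this (a sketch reusing the names from the question; behavior unchanged):

def function(request, object, number=None):
    obj = ObjectItem.objects.get(id=object)
    if request.method == 'POST':
        # data is needed on both branches, so build it once, up front
        data = {
            'object': obj,
        }
        if 'inline' in request.POST:
            return generate_document(data, inline=True)
        # the contact number is only used on the email branch,
        # so its processing can reasonably stay here
        contact = '{}'.format(number) if number else obj.contact
        document = generate_document(data, inline=False)
        return message(document, contact)
    return redirect()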
To provide a bit of context, I am building a risk model that pulls data from various different sources. Initially I wrote the model as a single function that, when executed, read the different data sources in as pandas.DataFrame objects and used those objects when necessary. As the model grew in complexity, it quickly became unreadable, and I found myself copying and pasting blocks of code often.
To clean up the code I decided to make a class that, when initialized, reads, cleans, and parses the data. Initialization takes about a minute to run and builds my model in its entirety.
The class also has some additional functionality. There is a generate_email method that sends an email with details about high risk factors, and another method, append_history, that takes a point-in-time snapshot of the risk model and saves it so I can run time comparisons.
The thing about these two additional methods is that I cannot imagine a scenario where I would call them without first re-calibrating my risk model. So I have considered calling them in __init__() like my other methods. I haven't, only because I am trying to justify having a class in the first place.
I am consulting this community because my project structure feels clunky and awkward. I am inclined to believe that I should not be using a class at all. Is it frowned upon to create classes merely for the purpose of organization? Also, is it bad practice to call instance methods (that take upwards of a minute to run) within __init__()?
Ultimately, I am looking for reassurance or a better code structure. Any help would be greatly appreciated.
Here is some pseudo code showing my project structure:
class RiskModel:
    def __init__(self, data_path_a, data_path_b):
        self.data_path_a = data_path_a
        self.data_path_b = data_path_b

        self.historical_data = None
        self.raw_data = None
        self.lookup_table = None
        self._read_in_data()

        self.risk_breakdown = None
        self._generate_risk_breakdown()

        self.risk_summary = None
        self._generate_risk_summary()

    def _read_in_data(self):
        # read in a .csv
        self.historical_data = pd.read_csv(self.data_path_a)
        # read an excel file containing many sheets into an ordered dictionary
        self.raw_data = pd.read_excel(self.data_path_b, sheet_name=None)
        # store a specific sheet from the excel file that is used by most of
        # my class's methods
        self.lookup_table = self.raw_data["Lookup"]

    def _generate_risk_breakdown(self):
        '''
        A function that creates a DataFrame from self.historical_data,
        self.raw_data, and self.lookup_table and stores it in
        self.risk_breakdown
        '''
        self.risk_breakdown = some_dataframe

    def _generate_risk_summary(self):
        '''
        A function that creates a DataFrame from self.lookup_table and
        self.risk_breakdown and stores it in self.risk_summary
        '''
        self.risk_summary = some_dataframe

    def generate_email(self, recipient):
        '''
        A function that sends an email with details about high risk factors
        '''

if __name__ == "__main__":
    risk_model = RiskModel(data_path_a, data_path_b)
    risk_model.generate_email('recipient@generic.com')
In my opinion it is a good way to organize your project, especially since you mentioned the high rate of re-use of parts of the code.
One thing, though: I wouldn't put the _read_in_data, _generate_risk_breakdown and _generate_risk_summary methods inside __init__, but would instead let the user call these methods after initializing the RiskModel class instance.
This way the user would be able to read in data from a different path, or generate only the risk breakdown or summary, without reading in the data all over again.
Something like this:
my_risk_model = RiskModel()
my_risk_model.read_in_data(path_a, path_b)
my_risk_model.generate_risk_breakdown(parameters)
my_risk_model.generate_risk_summary(other_parameters)
If there is a risk of the user calling these methods in an order that would break the logical chain, you could raise an exception if generate_risk_breakdown or generate_risk_summary is called before read_in_data, as sketched below. Of course, you could also move only the generate... methods out, leaving the data import inside __init__.
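For example, a minimal sketch of that guard (the _data_loaded flag is my own addition, not part of the original class):

class RiskModel:
    def __init__(self):
        self._data_loaded = False

    def read_in_data(self, path_a, path_b):
        # ... read the files as in _read_in_data above ...
        self._data_loaded = True

    def generate_risk_breakdown(self, parameters):
        if not self._data_loaded:
            raise RuntimeError("read_in_data() must be called first")
        # ... build the breakdown ...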
To advocate more for exposing the generate... methods outside of __init__, consider a scenario where you would like to generate multiple risk summaries, changing various parameters. It would make sense not to create the RiskModel (and read the same data) every time, but instead to change the input to the generate_risk_summary method:
my_risk_model = RiskModel()
my_risk_model.read_in_data(path_a, path_b)

for parameter in [50, 60, 80]:
    my_risk_model.generate_risk_summary(parameter)
    my_risk_model.generate_email('test@gmail.com')
So I'm working on a website that takes in some XML data sent through a JavaScript function to be processed by the server.
It's taking it in fine. The problem is that passing request.form['whateverKey'] to an object method makes Python throw an error saying that my input supplied two arguments when the function only takes one. The first argument seems to be an instance of the object I was just working with; the second is the POST data in string form.
Interestingly, Python won't consider request.form['data'] to be two values if I assign it straight to a variable (i.e. var = request.form['data']); it only seems to happen with functions.
Here's a snippet of the code.
@app.route("/dependencygraph", methods=["POST"])
def dependencygraph_landing_page():
    error = None
    if request.method == "POST":
        entity_collection.read_XML_data(request.form['xml'])
    return render_template("dependencygraph.html")
read_XML_data is also defined this way:
def read_XML_data(data):
    # TODO: Figure out why it takes 2 variables instead
    print(data)
If it matters, entity_collection is an object, and its class is written and defined in a different .py file.
I've been googling around to no avail. Enlightening reasons for why this happens would be helpful, and a solution (if needed) would be great. Thanks :3
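For what it's worth, the usual cause of this exact error is an instance method defined without self: Python passes the instance as an implicit first argument, so a one-parameter definition receives two. A sketch of the likely fix (the class name here is hypothetical, since the real one lives in the other .py file):

class EntityCollection:
    def read_XML_data(self, data):
        # self receives the instance; data receives request.form['xml']
        print(data)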
I have a strange error using the built-in web server in Django (I haven't tested against Apache, as I'm in active development). I have a URL pattern that works for short URL parameters (e.g. Chalk%20Hill), but locks up Python on this one:
http://localhost:8000/chargeback/checkDuplicateProject/Bexar%20Street%20Phase%20IV%20Brigham%20Ln%20to%20Myrtle%20St
The GET request just says pending and never returns, and I have to force-quit Python to get the server functioning again. What am I doing wrong?
EDIT:
In continued testing, it's strange: if I just enter the URL, it returns the correct JSON response, then locks Python. From within the website, though, it never returns, and locks Python.
urls:
url(r'^chargeback/checkDuplicateProject/(?P<aProjectName>(\w+)((\s)?(-)?(\w+)?)*)/$', 'chargeback.views.isProjectDuplicate'),
views:
def isProjectDuplicate(request, aProjectName):
    # count the number of matching project names
    p = Project.objects.filter(projectName__exact=aProjectName).count()
    # if > 0, the project is a duplicate
    if p > 0:
        return HttpResponse('{"results":["Duplicate"]}', mimetype='application/json')
    else:
        return HttpResponse('{"results":["Not Duplicate"]}', mimetype='application/json')
Model:
class Project(models.Model):
    projectName = models.TextField('project name')
    department = models.ForeignKey('Department')

    def __unicode__(self):
        return self.projectName
The accepted answer is spot on about the regex, but since we're discussing optimization, I thought I should note that the code for checking whether a project exists could be modified to produce a much quicker query, especially in other contexts where you might otherwise count millions of rows needlessly. Call this 'best practices' advice, if you will.
p = Project.objects.filter(projectName__exact = aProjectName).count()
if p > 0:
could instead be
if Project.objects.filter(project_name__iexact=aProjectName).exists():
for two reasons.
First, you're not using p for anything else, so there's no need to store it in a variable; dropping it improves readability, p is an obscure variable name, and the best code is no code at all.
Secondly, this way we only ask the database for a single row instead of saving the results to the queryset cache. See the official QuerySet API docs, a related question on Stack Overflow, and the discussion of the latter on the django-developers group.
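Roughly, the two queries differ like this (illustrative SQL; the exact statements vary by backend and Django version):

# .count() aggregates over every matching row:
#   SELECT COUNT(*) FROM app_project WHERE project_name = '...'
# .exists() asks the database for at most one row and stops there:
#   SELECT (1) FROM app_project WHERE project_name = '...' LIMIT 1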
Additionally, it is customary in Python (and Django, naturally) to name your fields lower_case_separated_by_underscores. See the Python Style Guide (PEP 8) for more on this.
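Applied to the model above, the renaming would look like this (a sketch; the accompanying schema migration is out of scope here):

class Project(models.Model):
    project_name = models.TextField('project name')
    department = models.ForeignKey('Department')

    def __unicode__(self):
        return self.project_name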
Since you are going to check whether aProjectName already exists in the database, there's no need for you to make the regex so complicated.
I suggest you simplify the regex to
url(r'^chargeback/checkDuplicateProject/(?P<aProjectName>[\w+\s-]*)/$', 'chargeback.views.isProjectDuplicate'),
For a further explanation, see the question "url regex keeps django busy/crashing" on the django-users group.
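To illustrate the problem (my own sketch, not from the linked discussion): the nested optional groups in the original pattern can match the same text in exponentially many ways, so when a match ultimately fails the engine backtracks through all of them:

import re

# the problematic pattern from the question, anchored like the URL conf
pattern = re.compile(r'^(\w+)((\s)?(-)?(\w+)?)*/$')

# each word can be split between (\w+) and the inner (\w+)? in many ways,
# so a long string that cannot match (note: no trailing slash) forces an
# exponential search; uncommenting the next line can hang the process
# pattern.match('Bexar Street Phase IV Brigham Ln to Myrtle St')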
For my app, I need to print out a series of outputs and then accept inputs from the user. What would be the best way of doing this?
Like:
print '1'
x = raw_input()
print '2'
y = raw_input()
Something like this, but it would go on for at least ten pairs. My only concern with the above is that it would make for poor code readability.
How should I do it? Should I create a function like this:
def printOut(string):
    print string
Or is there a better way?
First, one note: raw_input() takes an optional argument, a prompt string.
Regarding the broader question, a simplistic approach would be to create a class which defines the elements of your form and provides the functions for their input, validation, and later manipulation or output.
With such a class, instances can be created (instantiated), collected, stored, etc.
Such an approach need not be any more complicated than something like:
#!/usr/bin/python
# I use /usr/bin/env python; but making SO's syntax highlighter happy.

class generic_form:
    def __init__(self, element_list):
        self.form_elements = element_list
        self.contents = dict()

    def fill_it_in(self):
        for prompt in self.form_elements:
            self.contents[prompt] = raw_input(prompt)

    def get(self, item):
        return self.contents[item]

    def print_it(self):
        for each in self.form_elements:
            print each, self.contents[each]

if __name__ == '__main__':
    sample_fields = ("Given Name: ",
                     "Surname: ",
                     "Date of Birth: ",
                     "Notes: ")
    example = generic_form(sample_fields)
    print "Fill in my form:"
    example.fill_it_in()
    print
    print "Please review your input:"
    example.print_it()
    # store("%s, %s: %s" % (example.get('Surname: '),
    #     example.get('Given Name: '), example.get('Notes: ')))
The main code is only a dozen lines long and defines a generic form class with input and output functionality (and a simple get() method for further illustrative purposes). The rest of this example simply creates an instance and shows how it could be used.
Because my generic_form class is generic, we have to supply a list of the field names which are to be filled in. The names are used both as the prompts and as the keys for accessing the fields later (see the get() method for an example). Personally I wouldn't do it this way; I'd provide a list of short field names and prompts, similar to Marcelo's example. However, I wanted this particular example to be as short as possible to get the main point across.
(The comment at the end would be a call to a hypothetical "store()" function to store this for posterity, by the way).
This is the most minimal approach. However, you'd rapidly find that it's far more useful to have a richer class with validation for each field, separate classes which format and output instances of it in different ways, and different classes for input. "Teletype" input (as provided by the Python raw_input() built-in function) is the crudest form, primarily useful for its simplicity and for the ability to process files using shell redirection. One could also support input with GNU readline (already included in Python's standard library) or curses (also included), and one could imagine writing an HTML wrapper and CGI code for handling web-based input.
Coupling "raw_input()" and "print" into our class would mean more work if we ever needed or wanted to support any forms of input or output other than "dumb terminal."
If we create a class which only concerns itself with the data to be collected, then it could provide an interface for any other input class: the list of prompts, with references to "setter" functions (and perhaps a "required" or "optional" flag). Then any instance of any input class could request the list of desired/required inputs for any form, present the prompts, call the "setter" methods (which return a boolean to indicate whether the data supplied was valid), loop over bad inputs on "required" fields, offer to skip "optional" fields, and so on.
Notice that the logic for displaying prompts, accepting responses, relaying those back to the data object via its setter methods, and handling invalid inputs can be the same for many types of forms. All we need is a way for the form to provide the list of prompts and their corresponding validation functions (and we need to ensure that all these validation functions have the same semantics, taking the same parameters and so on).
Here's an example of separating the input behavior from the storage and validation of the data fields:
#!/usr/bin/env python

class generic_form:
    def __init__(self, element_list):
        self.hints = list()
        for each in element_list:
            self.hints.append((each, each, self.store))
        self.contents = dict()

    def store(self, key, data):
        '''Called by client instances
        '''
        self.contents[key] = data
        return True

    def get_hints(self):
        return self.hints

    def get(self, item):
        return self.contents[item]

def form_input(form):
    for each, key, fn in form.get_hints():
        while True:
            if fn(key, raw_input(each)):
                break
            else:
                keep_trying = raw_input("Try again:")
                if keep_trying.lower() in ['n', 'no', 'naw']:
                    break

if __name__ == '__main__':
    sample_fields = ("Given Name: ",
                     "Surname: ",
                     "Date of Birth: ",
                     "etc: ")
    example = generic_form(sample_fields)
    print "Fill in my form:"
    form_input(example)
    print
    print "Please review your input:"
    for prompt, _, _ in example.get_hints():
        print example.get(prompt),
In this case the extra complication is not doing anything useful, since our generic_form performs no validation. However, this same input function could be used with any data/form class that provides the same interface. That interface, in this example, only requires a get_hints() method providing tuples of (prompt string, storage key, storage function reference), and a store() method which must return True or False and take the key and the data to be stored as arguments.
The fact that our storage key is passed to our input "client" as an opaque item that must be handed back through its calls to our store() method is a bit subtle, but it allows us to use a single validation function for multiple form elements: all names can be any string, all dates must pass some call to time.strptime() or some third-party parser, and so on.
The main point is that I can create better form classes which implement data validation methods appropriate to the data being gathered and stored. The input example will work for our original dumb forms, but it will work better with forms that return meaningful results from our calls to store(). (A better interface between forms and input handling might supply "error" and "help" prompts as well as the simple short "input" prompt shown here. A more complex system might pass "datum" objects through the get_hints() method; that would require the form class to instantiate such objects and store a list of them instead of the tuples I'm showing here.)
Another benefit is that I can also write other input functions (or classes which implement such functions) against this same interface to any form. Thus I could write HTML rendering and CGI processing which could use all of the forms I had developed, with no changes to my data validation semantics.
(In this example I'm using the get_hints() method as hints for my crude output function as well as for my inputs. I'm only doing this to keep the example simple; in practice I'd want to separate input hinting from output handling.)
If you are reading in several fields, you might want to do something like this:
field_defs = [
    ('name', 'Name'),
    ('dob',  'Date of Birth'),
    ('sex',  'Gender'),
    # ...
]

# Figure out the widest description.
maxlen = max(len(descr) for (name, descr) in field_defs)

fields = {}
for (name, descr) in field_defs:
    # Pad to the widest description.
    print '%-*s:' % (maxlen, descr),
    fields[name] = raw_input()

# You should access the fields directly from the fields variable.
# But if you really want to access the fields as local variables...
locals().update(fields)
print name, dob, sex
"10 times... poor code readability"
Not really. You'll have to provide something more complex than that.
20 lines of code is hardly a problem. You can easily write more than 20 lines of code trying to save yourself from simply writing 20 lines of code.
You should also read the description of raw_input: http://docs.python.org/library/functions.html#raw_input
It writes a prompt, so your four lines of code are really:
x = raw_input('1')
y = raw_input('2')
You can't simplify this much more.
I'm trying to write a freeze decorator for Python.
The idea is as follows:
(In response to the two comments)
I might be wrong, but I think there are two main uses of test cases.

One is test-driven development: ideally, developers write test cases before writing the code. This usually helps define the architecture, because the discipline forces you to define the real interfaces before development. One may even consider that in some cases the person who dispatches work among developers writes the test cases and uses them to illustrate the specification he has in mind. I don't have any experience with test cases used like that.
The second is the idea that every project of a decent size with several programmers suffers from broken code: something that used to work may get broken by a change that looked like an innocent refactoring. Though good architecture and loose coupling between components may help fight this phenomenon, you will sleep better at night if you have written some test cases to make sure that nothing breaks your program's behavior.
HOWEVER, nobody can deny the overhead of writing test cases. In the first case, one may argue that the test cases actually guide development and are therefore not to be considered an overhead. Frankly speaking, I'm a pretty young programmer, and if I were you, my word on this subject would not be worth much... Anyway, I think that most companies/projects do not work like that, and that unit tests are mainly used in the second case...
In other words, rather than ensuring that the program is working correctly, they aim at checking that it will keep working the same way in the future. This need can be met without the cost of writing tests, by using this freeze decorator.
Let's say you have a function:

def pow(n, k):
    if k == 0:
        return 1
    else:
        return n * pow(n, k - 1)
It is perfectly fine, and you want to rewrite it as an optimized version. It is part of a big project, and you want it to keep giving back the same result for a few values.
Rather than going through the pain of writing test cases, one could use some kind of freeze decorator: something such that, the first time the decorator is run, it runs the function with the defined args (below, 0 and 7) and saves the result in a map (f -> args -> result).
@freeze(2,0)
@freeze(1,3)
@freeze(3,5)
@freeze(0,0)
def pow(n, k):
    if k == 0:
        return 1
    else:
        return n * pow(n, k - 1)
The next time the program is executed, the decorator will load this map and check that the result of the function for these args has not changed.
I already wrote a quick version of the decorator (see below), but hit a few problems about which I need your advice...
from __future__ import with_statement
from collections import defaultdict
from types import GeneratorType
import cPickle

def __id_from_function(f):
    return ".".join([f.__module__, f.__name__])

def generator_firsts(g, N=100):
    try:
        if N == 0:
            return []
        else:
            return [g.next()] + generator_firsts(g, N - 1)
    except StopIteration:
        return []

def __post_process(v):
    specialized_postprocess = [
        (GeneratorType, generator_firsts),
        (Exception, str),
    ]
    try:
        val_mro = v.__class__.mro()
        for (ancestor, specialized) in specialized_postprocess:
            if ancestor in val_mro:
                return specialized(v)
        # plain values need no special handling; store them as-is
        return v
    except Exception:
        print "Cannot accept this as a value"
        return None

def __eval_function(f):
    def aux(args, kargs):
        try:
            return (True, __post_process(f(*args, **kargs)))
        except Exception, e:
            return (False, __post_process(e))
    return aux

def __compare_behavior(f, past_records):
    for (args, kargs, result) in past_records:
        assert __eval_function(f)(args, kargs) == result

def __record_behavior(f, past_records, args, kargs):
    registered_args = [(a, k) for (a, k, r) in past_records]
    if (args, kargs) not in registered_args:
        res = __eval_function(f)(args, kargs)
        past_records.append((args, kargs, res))

def __open_frz():
    try:
        with open(".frz", "r") as frz_file:
            return cPickle.load(frz_file)
    except Exception:
        return defaultdict(list)

def __save_frz(past_records):
    with open(".frz", "w") as frz_file:
        return cPickle.dump(past_records, frz_file)

def freeze_behavior(*args, **kvargs):
    def freeze_decorator(f):
        past_records = __open_frz()
        f_id = __id_from_function(f)
        f_past_records = past_records[f_id]
        __compare_behavior(f, f_past_records)
        __record_behavior(f, f_past_records, args, kvargs)
        __save_frz(past_records)
        return f
    return freeze_decorator
Dumping and comparing results is not trivial for all types. Right now I'm thinking about using a function (I call it postprocess here) to solve this problem: basically, instead of storing res I store postprocess(res), and I compare postprocess(res1) == postprocess(res2) instead of comparing res1 and res2. It is important to let the user overload the predefined postprocess function.
My first question is: do you know a way to check whether an object is dumpable (picklable) or not?
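One common approach (my own suggestion, not from the post) is simply to attempt the dump and catch the failure:

import cPickle

def is_picklable(obj):
    # unpicklable objects raise PicklingError or TypeError on dumps()
    try:
        cPickle.dumps(obj)
        return True
    except (cPickle.PicklingError, TypeError):
        return False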
Defining a key for the decorated function is a pain. In the snippet above I am using the function's module and name. Can you think of a smarter way to do that?

The snippet above is kind of working, but it opens and closes the file both when testing and when recording. This is just a stupid prototype... but do you know a nice way to open the file once, process the decorator for all functions, and then close the file?
I intend to add some functionality to this, for instance the possibility to define an iterable to browse a set of arguments, to record arguments from real use, etc.

What would you expect from such a decorator? In general, would you use such a feature, knowing its limitations... especially when trying to use it with OOP?
"In general, would you use such a feature, knowing its limitation...?"
Frankly speaking -- never.
There are no circumstances under which I would "freeze" results of a function in this way.
The use case appears to be based on two wrong ideas: (1) that unit testing is hard, complex, or expensive; and (2) that it could be simpler to write the code, "freeze" the results, and somehow use the frozen results for refactoring. This isn't helpful; indeed, the very real possibility of freezing wrong answers makes this a bad idea.
First, on "consistency vs. correctness". This is easier to preserve with a simple mapping than with a complex set of decorators.
Do this instead of writing a freeze decorator.
print "frozen_f=", dict( (i,f(i)) for i in range(100) )
The dictionary object that's created will work perfectly as a frozen result set. No decorator. No complexity to speak of.
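The printed mapping can then be pasted back into the code and checked directly; a minimal sketch (the function name and values are assumed):

frozen_f = {0: 0, 1: 1, 2: 4, 3: 9}  # pasted from the print above

def check_against_frozen(f, frozen):
    # compare the current implementation against the frozen results
    for arg, expected in frozen.items():
        assert f(arg) == expected, "f(%r) changed" % arg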
Second, on "unit testing".
The point of a unit test is not to "freeze" some random results. The point of a unit test is to compare real results with results developed another way: simpler, more obvious, perhaps poorly-performing. Usually unit tests compare against hand-developed results; other times they use obvious but horribly slow algorithms to produce a few key results.
The point of having test data around is not that it's a "frozen" result. The point of having test data is that it is an independent result. Done differently -- sometimes by different people -- that confirms that the function works.
Sorry. This appears to me to be a bad idea; it looks like it subverts the intent of unit testing.
"HOWEVER, Nobody can deny the overhead of writting test cases"
Actually, many folks would deny the "overhead". It isn't "overhead" in the sense of wasted time and effort. For some of us, unittests are essential. Without them, the code may work, but only by accident. With them, we have ample evidence that it actually works; and the specific cases for which it works.
Are you looking to implement invariants or postconditions?
You should specify the expected results explicitly; this will remove most of your problems.
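A minimal sketch using unittest, with the expected values for the pow above written out by hand:

import unittest

class TestPow(unittest.TestCase):
    def test_known_values(self):
        # expected results specified explicitly, not captured from the code
        self.assertEqual(pow(2, 0), 1)
        self.assertEqual(pow(1, 3), 1)
        self.assertEqual(pow(3, 5), 243)

if __name__ == '__main__':
    unittest.main()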