Reading a series of input / output in Python - python

For my app, I need to print out a series of outputs and then accept inputs from the user. What would be the best way of doing this?
Like:
print '1'
x = raw_input()
print '2'
y = raw_input()
Something like this, but it would go on for at least 10 times. My only concern with doing the above is that it would make for poor code readability.
How should I do it? Should I create a function like this:
def printOut(string):
    print string
Or is there a better way?

First, one note: raw_input() takes an optional argument, a prompt string.
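As a minimal sketch of that note: the loop below lets the prompt argument do the printing. It is written for Python 3, where input() replaces raw_input(); the collect() name and its ask parameter are my own illustrative choices, not anything from the question.

```python
def collect(prompts, ask=input):
    """Ask each prompt in order and return the answers keyed by prompt."""
    answers = {}
    for prompt in prompts:
        # input() (raw_input() in Python 2) writes the prompt itself,
        # so no separate print call is needed.
        answers[prompt] = ask(prompt)
    return answers


# Scripted answers stand in for a real user so the sketch runs unattended.
scripted = iter(["Ada", "Lovelace"])
result = collect(["Given Name: ", "Surname: "], ask=lambda prompt: next(scripted))
```

In a real session you would simply call collect(prompts) and let it read from the terminal.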
Regarding the broader question, a simplistic approach would be to create a class which defines the elements of your form and provides the functions for their input, validation, and later manipulations or output.
With such a class instances can be created (instantiated), and collected, stored, etc.
Such an approach need not be any more complicated than something like:
#!/usr/bin/python
# I use /usr/bin/env python; but making SO's syntax highlighter happy.
class generic_form:
    def __init__(self, element_list):
        self.form_elements = element_list
        self.contents= dict()
    def fill_it_in(self):
        for prompt in self.form_elements:
            self.contents[prompt] = raw_input(prompt)
    def get(self, item):
        return self.contents[item]
    def print_it(self):
        for each in self.form_elements:
            print each, self.contents[each]
if __name__ == '__main__':
    sample_fields = ("Given Name: ",
                     "Surname: ",
                     "Date of Birth: ",
                     "Notes: ")
    example = generic_form(sample_fields)
    print "Fill in my form:"
    example.fill_it_in()
    print
    print "Please review your input:"
    example.print_it()
    # store(:%s, %s: %s" % (example.get('Surname: '), \
    #    example.get('Given Name: '), example.get('Notes: '))
The main code is only a dozen lines long to define a generic form class with input
and output functionality (and a simple get() method for further illustrative purposes).
The rest of this example simply creates an instance and shows how it could be used.
Because my generic_form class is generic, we have to supply a list of field names which are to be filled in. The names are used both as the prompts and as the names of the fields for later access (see the get() method for an example). Personally I wouldn't do it this way; I'd provide a list of short field names and prompts similar to Marcelo's example. However, I wanted this particular example to be as short as possible to get the main point across.
(The comment at the end would be a call to a hypothetical "store()" function to store this for posterity, by the way).
This is the most minimal approach. However, you'd rapidly find that it's far more useful to have a richer class with validation for each field, and separate classes which format and output instances of that in different ways, and different classes for input. "teletype" input (as provided by the Python raw_input() built-in function) is the crudest form (primarily useful for simplicity and for the ability to process files using shell redirection). One could also support input with the GNU readline support (already included as a standard library in Python), curses support (also included), and one could imagine writing some HTML wrapper and CGI code for handling web-based input.
Coupling "raw_input()" and "print" into our class would mean more work if we ever needed or wanted to support any forms of input or output other than "dumb terminal."
If we create a class which only concerns itself with the data to be collected, then it could provide an interface for any other input class to get the list of the prompts with references to "setter" functions (and perhaps a "required" or "optional" flag). Then any instance of any input class could request the list of desired/required inputs for any form ... present the prompts, call the "setter" methods (which return a boolean to indicate if the data supplied was valid), loop over bad inputs on "required" fields, offer to skip "optional" fields, and so on.
Notice that the logic for displaying prompts, accepting responses, relaying those back to the data object via their setter methods, and handling invalid inputs can be the same for many types of forms. All we need is a way for the form to provide the list of prompts and their corresponding validation functions (and we need to ensure that all these validation functions have the same semantics --- taking the same parameters and so on).
Here's an example of separating the input behavior from the storage and validation of the data fields:
#!/usr/bin/env python
class generic_form:
    def __init__(self, element_list):
        self.hints = list()
        for each in element_list:
            self.hints.append((each, each, self.store))
        self.contents= dict()
    def store(self, key, data):
        '''Called by client instances
        '''
        self.contents[key] = data
        return True
    def get_hints(self):
        return self.hints
    def get(self, item):
        return self.contents[item]
def form_input(form):
    for each, key, fn in form.get_hints():
        while True:
            if fn(key,raw_input(each)):
                break
            else:
                keep_trying = raw_input("Try again:")
                if keep_trying.lower() in ['n', 'no', 'naw']:
                    break
if __name__ == '__main__':
    sample_fields = ("Given Name: ",
                     "Surname: ",
                     "Date of Birth: ",
                     "etc: ")
    example = generic_form(sample_fields)
    print "Fill in my form:"
    form_input(example)
    print
    print "Please review your input:"
    for i, x, x in example.get_hints():
        print example.get(i),
In this case the extra complication is not doing anything useful. Our generic_form performs no validation. However, this same input function could be used with any data/form class that provided the same interface. That interface, in this example, only requires a get_hints() method providing tuples of "prompt string", storage key, and storage function references, and a store() method which must return "True" or "False" and take arguments for the key and data to be stored.
The fact that our storage key is passed to our input "client" as an opaque item that must be passed back through its calls to our store() method is a bit subtle; but it allows us to use any single validation function for multiple form elements ... all names can be any string, all dates must pass some call to time.strptime() or some third party parser ... and so on.
The main point is that I can create better forms classes which implement data validation methods as appropriate to the data being gathered and stored. The input example will work for our original dumb forms, but it will work better with forms that return meaningful results from our calls to store(). (A better interface between forms and input handling might supply "error" and "help" prompts as well as the simple short "input" prompt we show here. A more complex system might pass "datum" objects through the get_hints() methods. That would require that the forms class instantiate such objects and store a list of them instead of the tuples I'm showing here.)
Another benefit is that I can also write other input functions (or classes which implement such functions) that can also use this same interface to any form. Thus I could write some HTML rendering and CGI processing which could use all of the forms that I had developed with no changes to my data validation semantics.
(In this example I'm using the get_hints() method as hints for my crude output function as well as my inputs. I'm only doing this to keep the example simple. In practice I'd want to separate input hinting from output handling).
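To make the idea of a form whose store() calls return meaningful results concrete, here is a minimal sketch. It assumes Python 3 (input() rather than raw_input()); the PersonForm class, its field keys, the YYYY-MM-DD date format, and the ask parameter are all hypothetical choices for illustration, not part of the code above.

```python
from datetime import datetime


class PersonForm:
    """Data-only form: stores and validates fields, performs no I/O."""

    def __init__(self):
        self.contents = {}
        # Each hint is (prompt, storage key, setter), like get_hints() above.
        self.hints = [
            ("Given Name: ", "given", self.store_text),
            ("Date of Birth (YYYY-MM-DD): ", "dob", self.store_date),
        ]

    def store_text(self, key, data):
        """Accept any non-blank string."""
        if not data.strip():
            return False
        self.contents[key] = data.strip()
        return True

    def store_date(self, key, data):
        """Accept only dates that parse in the assumed format."""
        try:
            self.contents[key] = datetime.strptime(data, "%Y-%m-%d").date()
        except ValueError:
            return False
        return True

    def get_hints(self):
        return self.hints

    def get(self, key):
        return self.contents[key]


def form_input(form, ask=input):
    """Prompt for every field, asking again until the setter accepts it."""
    for prompt, key, setter in form.get_hints():
        while not setter(key, ask(prompt)):
            pass


# Scripted answers (one of them invalid) stand in for a real user.
answers = iter(["Ada", "not a date", "1815-12-10"])
form = PersonForm()
form_input(form, ask=lambda prompt: next(answers))
```

The input function knows nothing about names or dates; it only reacts to the boolean each setter returns, which is what lets the same loop serve any form exposing get_hints().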

If you are reading in several fields, you might want to do something like this:
field_defs = [
    ('name', 'Name'),
    ('dob' , 'Date of Birth'),
    ('sex' , 'Gender'),
    #...
]
# Figure out the widest description.
maxlen = max(len(descr) for (name, descr) in field_defs)
fields = {}
for (name, descr) in field_defs:
    # Pad to the widest description.
    print '%-*s:' % (maxlen, descr),
    fields[name] = raw_input()
# You should access the fields directly from the fields variable.
# But if you really want to access the fields as local variables...
locals().update(fields)
print name, dob, sex

"10 times... poor code readability"
Not really. You'll have to provide something more complex than that.
20 lines of code is hardly a problem. You can easily write more than 20 lines of code trying to save yourself from simply writing 20 lines of code.
You should, also, read the description of raw_input. http://docs.python.org/library/functions.html#raw_input
It writes a prompt. Your four lines of code are really
x = raw_input( '1' )
y = raw_input( '2' )
You can't simplify this much more.

Related

Is it appropriate to use a class for the purpose of organizing functions that share inputs?

To provide a bit of context, I am building a risk model that pulls data from various different sources. Initially I wrote the model as a single function that, when executed, read in the different data sources as pandas.DataFrame objects and used those objects when necessary. As the model grew in complexity, it quickly became unreadable and I found myself copying and pasting blocks of code often.
To clean up the code I decided to make a class that, when initialized, reads, cleans and parses the data. Initialization takes about a minute to run and builds my model in its entirety.
The class also has some additional functionality. There is a generate_email method that sends an email with details about high risk factors and another method append_history that point-in-times the risk model and saves it so I can run time comparisons.
The thing about these two additional methods is that I cannot imagine a scenario where I would call them without first re-calibrating my risk model. So I have considered calling them in __init__() like my other methods. I haven't, only because I am trying to justify having a class in the first place.
I am consulting this community because my project structure feels clunky and awkward. I am inclined to believe that I should not be using a class at all. Is it frowned upon to create classes merely for the purpose of organization? Also, is it bad practice to call instance methods (that take upwards of a minute to run) within __init__()?
Ultimately, I am looking for reassurance or a better code structure. Any help would be greatly appreciated.
Here is some pseudo code showing my project structure:
import pandas as pd

class RiskModel:
    def __init__(self, data_path_a, data_path_b):
        self.data_path_a = data_path_a
        self.data_path_b = data_path_b
        self.historical_data = None
        self.raw_data = None
        self.lookup_table = None
        self._read_in_data()
        self.risk_breakdown = None
        self._generate_risk_breakdown()
        self.risk_summary = None
        self._generate_risk_summary()
    def _read_in_data(self):
        # read in a .csv
        self.historical_data = pd.read_csv(self.data_path_a)
        # read an excel file containing many sheets into an ordered dictionary
        self.raw_data = pd.read_excel(self.data_path_b, sheet_name=None)
        # store a specific sheet from the excel file that is used by most of
        # my class's methods
        self.lookup_table = self.raw_data["Lookup"]
    def _generate_risk_breakdown(self):
        '''
        A function that creates a DataFrame from self.historical_data,
        self.raw_data, and self.lookup_table and stores it in
        self.risk_breakdown
        '''
        self.risk_breakdown = some_dataframe
    def _generate_risk_summary(self):
        '''
        A function that creates a DataFrame from self.lookup_table and
        self.risk_breakdown and stores it in self.risk_summary
        '''
        self.risk_summary = some_dataframe
    def generate_email(self, recipient):
        '''
        A function that sends an email with details about high risk factors
        '''
if __name__ == "__main__":
    risk_model = RiskModel(data_path_a, data_path_b)
    risk_model.generate_email("recipient@generic.com")
In my opinion it is a good way to organize your project, especially since you mentioned the high rate of re-usability of parts of the code.
One thing though: I wouldn't put the _read_in_data, _generate_risk_breakdown and _generate_risk_summary methods inside __init__, but would instead let the user call these methods after initializing the RiskModel class instance.
This way the user would be able to read in data from a different path or only to generate the risk breakdown or summary, without reading in the data once again.
Something like this:
my_risk_model = RiskModel()
my_risk_model.read_in_data(path_a, path_b)
my_risk_model.generate_risk_breakdown(parameters)
my_risk_model.generate_risk_summary(other_parameters)
If there is a risk of the user calling these methods in an order that would break the logical chain, you could raise an exception if generate_risk_breakdown or generate_risk_summary is called before read_in_data. Of course, you could also move only the generate... methods out, leaving the data import inside __init__.
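A minimal sketch of that guard, assuming a stripped-down RiskModel: plain lists stand in for the pandas DataFrames, and the "keep positive values" rule is a made-up placeholder for the real breakdown logic.

```python
class RiskModel:
    def __init__(self):
        self.raw_data = None
        self.risk_breakdown = None

    def read_in_data(self, rows):
        # Stand-in for the pandas reads in the question.
        self.raw_data = list(rows)

    def generate_risk_breakdown(self):
        # Refuse to run until the data has been loaded.
        if self.raw_data is None:
            raise RuntimeError(
                "call read_in_data() before generate_risk_breakdown()")
        # Placeholder rule: keep only the positive values.
        self.risk_breakdown = [r for r in self.raw_data if r > 0]


model = RiskModel()
try:
    model.generate_risk_breakdown()  # wrong order: no data yet
    failed_early = False
except RuntimeError:
    failed_early = True

model.read_in_data([3, -1, 5])
model.generate_risk_breakdown()      # correct order now succeeds
```

Failing loudly at the call site tends to be easier to debug than a later error caused by a None attribute.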
To argue further for exposing the generate... methods outside of __init__, consider a scenario where you would like to generate multiple risk summaries while changing various parameters. It would make sense not to create the RiskModel and read the same data every time, but instead to change the input to the generate_risk_summary method:
my_risk_model = RiskModel()
my_risk_model.read_in_data(path_a, path_b)
for parameter in [50, 60, 80]:
    my_risk_model.generate_risk_summary(parameter)
    my_risk_model.generate_email('test@gmail.com')

Is it more efficient to use function args after branching (Python)?

I have a function that takes several arguments, one of which is a contact number. The data provided to the function is used to generate documents, and if one option is selected, that document is immediately returned inline, where the other option takes the contact number and generates an email. In the original version of this function, the contact number was immediately parsed at the start of the function, but I moved it into the else block as that is where the email is actually generated that uses that contact number and I saw no reason to create a new variable if it was not used half of the time. An example of this is below, and is built in Python using the Django framework:
def function(request, object, number=None):
    obj = ObjectItem.objects.get(id=object)
    # Originally number processed here
    if request.method == 'POST':
        if 'inline' in request.POST:
            data = {
                'object': obj,
            }
            return generate_document(data, inline=True)
        else:
            if number:
                contact = '{}'.format(number)
            else:
                contact = obj.contact
            data = {
                'object': obj,
            }
            document = generate_document(data, inline=False)
            return message(document, contact)
    else:
        return redirect()
While looking at my code, I realize that I could move the data dict creation outside of the inline vs. non-inline branching in the POST handling, but I do not know whether moving the processing of the number argument into the else block actually saves any time or is the more standard way of doing things. I know that, as Python is a scripting language, no optimization of this kind is performed automatically, the way a compiler might rearrange such a declaration in a compiled language, so I am looking for the most efficient way of doing this.
From a performance perspective, it makes no difference whether you create data above the if or in the if. Python will only hit the line once and the dict will only be created once. But you should move it above the if for design reasons.
First, don't repeat yourself - if you can reasonably implement a bit of code in one place, don't sprinkle it around your code. Suppose you decide a defaultdict is better later, you only have to change it in one place.
Second, placement implies intent. If you put it above your if, you've made a statement that you plan to use that data structure everywhere. In your current code, readers will ask the same question you do: why wasn't that above the if? It's kind of trivial, but reading the code shouldn't raise more questions.
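Here is a sketch of that restructuring with Django stripped away so it runs on its own. The handle() signature, the dict-based obj, and the generate_document() and message() stubs are hypothetical stand-ins for the question's view, model, and helpers.

```python
def generate_document(data, inline):
    # Stub: report which kind of document was requested, and for what.
    return ("inline" if inline else "attachment", data["object"]["id"])


def message(document, contact):
    # Stub: pretend to send the document to the contact.
    return {"document": document, "to": contact}


def handle(method, post, obj, number=None):
    if method != "POST":
        return "redirect"
    # Built once, above the branch, because both paths use it.
    data = {"object": obj}
    if "inline" in post:
        return generate_document(data, inline=True)
    # The contact is only needed on the email path, so it stays here.
    contact = str(number) if number else obj["contact"]
    return message(generate_document(data, inline=False), contact)


obj = {"id": 7, "contact": "555-0000"}
inline_result = handle("POST", {"inline": "1"}, obj)
email_result = handle("POST", {}, obj, number=5551234)
```

The shared dict moves up because both branches need it, while the contact lookup stays in the only branch that uses it, so placement matches intent in both cases.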

designing my ticket api

I have a function named getTicket which takes two arguments: id, which is a number, and format, which is a string.
def getTicket(id, format):
    if format == "pdf":
        getTicketPDF(id) #some specialized pdf method gets called
    elif format == "json":
        getTicketJSON(id) #specialized json method
Now if I have to support some new format like "html", I can create another elif for html.
But I want to generalize this code so that if n new methods get added in the future, I do not have to change my code.
How can I design my getTicket api?
You can create a dictionary that stores the format-to-function mapping, such as "pdf" mapping to the function getTicketPDF. Then, in your getTicket() function, you look up the dictionary's value for format and call it, passing id as the parameter. Example:
funcdict = {"pdf": getTicketPDF,
            "json": getTicketJSON}
def getTicket(id, format):
    try:
        funcdict[format](id)
    except KeyError:
        # Handle case where format is not found in dictionary
        pass
If later you decide to add a new function for a new format, you just need to add a new mapping to the dictionary.
Your use case calls for a Strategy pattern implementation (PDF/JSON/HTML ticket generation strategies) that uses a Factory pattern to obtain the correct strategy implementation class.
Here are the high-level steps:
Separate the functionality of ticket generation into a class TicketGenerator. Let this be an interface. It will have a single abstract method, generateTicket().
Use a TicketGeneratorFactory to get the correct TicketGenerator instance based on the type of ticket, i.e. an instance of PDFTicketGenerator, JSONTicketGenerator, HTMLTicketGenerator and so on. Each of these implementation classes has a generateTicket() implementation appropriate to its type, i.e. PDF/JSON/HTML.
This instance should be assigned to the base TicketGenerator type.
TicketGenerator.generateTicket() would then give you the ticket in the desired format: PDF/JSON/HTML.
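The steps above could be sketched in Python roughly as follows. The class names follow the answer, but the snake_case method names, the create() classmethod, and the trivial JSON and HTML bodies are illustrative assumptions; a PDF generator is left out because it would need a third-party library.

```python
import json
from abc import ABC, abstractmethod


class TicketGenerator(ABC):
    """Strategy interface: one way of rendering a ticket."""

    @abstractmethod
    def generate_ticket(self, ticket_id):
        ...


class JSONTicketGenerator(TicketGenerator):
    def generate_ticket(self, ticket_id):
        return json.dumps({"id": ticket_id})


class HTMLTicketGenerator(TicketGenerator):
    def generate_ticket(self, ticket_id):
        return "<p>Ticket {}</p>".format(ticket_id)


class TicketGeneratorFactory:
    """Factory: maps a format name to the matching strategy class."""

    _generators = {"json": JSONTicketGenerator, "html": HTMLTicketGenerator}

    @classmethod
    def create(cls, fmt):
        try:
            return cls._generators[fmt]()
        except KeyError:
            raise ValueError("unsupported ticket format: {!r}".format(fmt))


def get_ticket(ticket_id, fmt):
    # Callers never branch on the format themselves.
    return TicketGeneratorFactory.create(fmt).generate_ticket(ticket_id)
```

Supporting a new format then means writing one new subclass and adding one entry to _generators; get_ticket() itself does not change.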

A more pythonic way to build a class based on a string (how not to use eval)

OK.
So I've got a database where I want to store references to other Python objects (right now I'm using it to store inventory information for personal stores of beer recipe ingredients).
Since there are about 15-20 different categories of ingredients (all represented by individual SQLObjects) I don't want to do a bunch of RelatedJoin columns since, well, I'm lazy, and it seems like it's not the "best" or "pythonic" solution as it is.
So right now I'm doing this:
class Inventory(SQLObject):
    inventory_item_id = IntCol(default=0)
    amount = DecimalCol(size=6, precision=2, default=0)
    amount_units = IntCol(default=Measure.GM)
    purchased_on = DateCol(default=datetime.now())
    purchased_from = UnicodeCol(default=None, length=256)
    price = CurrencyCol(default=0)
    notes = UnicodeCol(default=None)
    inventory_type = UnicodeCol(default=None)
    def _get_name(self):
        return eval(self.inventory_type).get(self.inventory_item_id).name
    def _set_inventory_item_id(self, value):
        self.inventory_type = value.__class__.__name__
        self._SO_set_inventory_item_id(value.id)
Please note the ICKY eval() in the _get_name() method.
How would I go about calling the SQLObject class referenced by the string I'm getting from __class__.__name__ without using eval()? Or is this an appropriate place to utilize eval()? (I'm sort of in the mindset that it's never appropriate to use eval(); however, since the system never uses any end user input in the eval(), it seems "safe".)
To get the value of a global by name, use:
globals()[self.inventory_type]
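A small sketch of that lookup, next to an explicit registry as one possible alternative. The Hop and Yeast classes and the INVENTORY_TYPES name are hypothetical stand-ins for the SQLObject ingredient classes, and the registry is my own suggestion rather than part of the answer above.

```python
class Hop:
    name = "hop"


class Yeast:
    name = "yeast"


# Option 1: look the class up in the module's global namespace.
by_globals = globals()["Hop"]

# Option 2 (an assumption, not from the answer): an explicit registry,
# which limits lookups to the classes you chose to expose.
INVENTORY_TYPES = {cls.__name__: cls for cls in (Hop, Yeast)}
by_registry = INVENTORY_TYPES["Yeast"]
```

Both avoid eval(). The registry additionally fails with a KeyError for any name that is not a known ingredient class, whereas globals() would return any module-level object with that name.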

Dictionary or If statements, Jython

I am writing a script at the moment that will grab certain information from HTML using dom4j.
Since Python/Jython does not have a native switch statement I decided to use a whole bunch of if statements that call the appropriate method, like below:
if type == 'extractTitle':
    extractTitle(dom)
if type == 'extractMetaTags':
    extractMetaTags(dom)
I will be adding more depending on what information I want to extract from the HTML and thought about taking the dictionary approach which I found elsewhere on this site, example below:
{
    'extractTitle': extractTitle,
    'extractMetaTags': extractMetaTags
}[type](dom)
I know that each time I run the script the dictionary will be built, but at the same time, if I were to use the if statements, the script would have to check through all of them until it hits the correct one. What I am really wondering is: which one performs better, or is generally better practice to use?
Update: @Brian - Thanks for the great reply. I have a question: what if any of the extract methods require more than one object, e.g.
def handle_extractTag(self, dom, anotherObject):
    # Do something
How would you make the appropriate changes to the handle method to implement this? Hope you know what I mean :)
Cheers
To avoid specifying the tag and handler in the dict, you could just use a handler class with methods named to match the type. E.g.
class MyHandler(object):
    def handle_extractTitle(self, dom):
        # do something
        pass
    def handle_extractMetaTags(self, dom):
        # do something
        pass
    def handle(self, type, dom):
        func = getattr(self, 'handle_%s' % type, None)
        if func is None:
            raise Exception("No handler for type %r" % type)
        return func(dom)
Usage:
handler = MyHandler()
handler.handle('extractTitle', dom)
Update:
When you have multiple arguments, just change the handle function to take those arguments and pass them through to the function. If you want to make it more generic (so you don't have to change both the handler functions and the handle method when you change the argument signature), you can use the *args and **kwargs syntax to pass through all received arguments. The handle method then becomes:
def handle(self, type, *args, **kwargs):
    func = getattr(self, 'handle_%s' % type, None)
    if func is None:
        raise Exception("No handler for type %r" % type)
    return func(*args, **kwargs)
With your code, all of your functions get called.
handlers = {
    'extractTitle': extractTitle,
    'extractMetaTags': extractMetaTags
}
handlers[type](dom)
Would work like your original if code.
It depends on how many if statements we're talking about; if it's a very small number, then it will be more efficient than using a dictionary.
However, as always, I strongly advise you to do whatever makes your code look cleaner until experience and profiling tell you that a specific block of code needs to be optimized.
Your use of the dictionary is not quite correct. In your implementation, all methods will be called and the results of all the useless ones discarded. What is usually done is something more like:
switch_dict = {'extractTitle': extractTitle,
               'extractMetaTags': extractMetaTags}
switch_dict[type](dom)
And that way is faster and more extensible if you have a large (or variable) number of items.
The efficiency question is barely relevant. The dictionary lookup is done with a simple hashing technique, the if-statements have to be evaluated one at a time. Dictionaries tend to be quicker.
I suggest that you actually have polymorphic objects that do extractions from the DOM.
It's not clear how type gets set, but it sure looks like it might be a family of related objects, not a simple string.
class ExtractTitle( object ):
    def process( self, dom ):
        return something
class ExtractMetaTags( object ):
    def process( self, dom ):
        return something
Instead of setting type="extractTitle", you'd do this.
type= ExtractTitle() # or ExtractMetaTags() or ExtractWhatever()
type.process( dom )
Then, you wouldn't be building this particular dictionary or if-statement.
