I'm a C programmer and I'm getting quite good with Python. But I still have some problems getting my mind around the OO awesomeness of Python.
Here is my current design problem:
The end "product" is a JSON data structure created in Python (and passed to Javascript code) containing different types of data like:
{ "type": "url", "data": {urlpayloaddict} }
{ "type": "text", "data": {textpayloaddict} }
...
My Javascript knows how to parse and display each type of JSON response.
I'm happy with this design. My question comes from handling this data in the Python code.
I obtain my data from a variety of sources: MySQL, a table lookup, an API call to a web service...
Basically, should I make a superclass responseElement and specialise it for each type of response, then pass around a list of these objects in the Python code? Or should I simply pass around a list of dictionaries that contain the response data in key-value pairs? The two answers seem to lead to significantly different implementations.
I'm a bit unsure whether I'm getting too object-happy.
In my mind, it basically goes like this: you should try to keep things the same where they are the same, and separate them where they're different.
If you're performing the exact same operations on and with the data, and it can all be represented in a common format, then there's no reason to have separate objects for it - translate it into a common format ASAP, and Don't Repeat Yourself when implementing the parts that don't differ.
If each type/source of data requires specialized operations specific to it, and there isn't much in the way of overlap between such at the layer your Python code is dealing with, then keep things in separate objects so that you maintain a tight association between the specialized code and the specific data on which it is able to operate.
Do the different response sources represent fundamentally different categories or classes of objects? They don't appear to, the way you've described it.
Thus, various encode/decode functions and passing around only one type seems the best solution for you.
That type can be a dict or your own class, if you have special methods to use on the data (but those methods would then not care what input and output encodings were), or you could put the encode/decode pairs into the class. (Decode would be a classmethod, returning a new instance.)
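For instance, a minimal sketch of that idea (the class and key names here are illustrative, not taken from the question):

import json

class ResponseElement:
    def __init__(self, type_, payload):
        self.type = type_
        self.payload = payload

    def encode(self):
        # produce the JSON string the Javascript side expects
        return json.dumps({'type': self.type, 'data': self.payload})

    @classmethod
    def decode(cls, raw):
        # the classmethod decoder: returns a new instance
        obj = json.loads(raw)
        return cls(obj['type'], obj['data'])

Higher-level code then passes ResponseElement instances around without caring which source produced them or what the wire encoding is.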
Your receiver objects (which can perfectly well be instances of different classes, perhaps generated by a Factory pattern depending on the source of incoming data) should all have a common method that returns the appropriate dict (or other directly-JSON'able structure, such as a list that will turn into a JSON array).
Differently from what one answer states, this approach clearly doesn't require higher level code to know what exact kind of receiver it's dealing with (polymorphism will handle that for you in any OO language!) -- nor does the higher level code need to know "names of keys" (as, again, that other answer peculiarly states), as it can perfectly well treat the "JSON'able data" as a pretty opaque data token (as long as it's suitable to be the argument for a json.dumps later call!).
Building up and passing around a container of "plain old data" objects (produced and added to the container in various ways) for eventual serialization (or other such uniform treatment, but you can see JSON translation as a specific form of serialization) is a common OO pattern. No need to carry around anything richer or heavier than such POD data, after all, and in Python using dicts as the PODs is often a perfectly natural implementation choice.
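As a concrete (if invented) illustration of the POD approach: each source appends a plain dict in the agreed shape, and nothing downstream needs to know anything beyond "this is JSON-able":

import json

results = []
# each data source contributes a plain dict in the agreed shape
results.append({'type': 'url', 'data': {'href': 'http://example.com'}})   # e.g. from MySQL
results.append({'type': 'text', 'data': {'body': 'hello world'}})         # e.g. from an API call

# higher-level code treats each entry as an opaque JSON-able token
payload = json.dumps(results)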
I've had success with the OOP approach. Consider a base class with a "ToJson" method and have each subclass implement it appropriately. Then your higher level code doesn't need to know any detail about how the data was obtained...it just knows it has to call "ToJson" on every object in the list you mentioned.
A dictionary would work too, but it requires your calling code to know the names of keys, etc., and won't scale as well.
OOP I say!
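A minimal sketch of that base-class approach (the class names and the method spelling to_json are invented here for illustration):

import json

class ResponseElement:
    def to_json(self):
        raise NotImplementedError

class UrlResponse(ResponseElement):
    def __init__(self, href):
        self.href = href

    def to_json(self):
        return {'type': 'url', 'data': {'href': self.href}}

class TextResponse(ResponseElement):
    def __init__(self, body):
        self.body = body

    def to_json(self):
        return {'type': 'text', 'data': {'body': self.body}}

# the calling code never needs to know which subclass it holds
elements = [UrlResponse('http://example.com'), TextResponse('hello')]
payload = json.dumps([e.to_json() for e in elements])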
Personally, I opt for the latter (passing around a list of data) wherever and whenever possible. I think OO is often misused/abused for certain things. I specifically avoid things like wrapping data in an object just for the sake of wrapping it in an object. So this, {'type':'url', 'data':{some_other_dict}} is better to me than:
class DataObject:
    def __init__(self):
        self.type = 'url'
        self.data = {some_other_dict}
But, if you need to add specific functionality to this data, like the ability for it to sort its data.keys() and return them as a set, then creating an object makes more sense.
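For example (a hypothetical sketch, extending the snippet above):

class DataObject:
    def __init__(self, data):
        self.type = 'url'
        self.data = data

    def sorted_keys(self):
        # behaviour that belongs with the data is what justifies the wrapper
        return sorted(self.data.keys())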
I currently have code that acquires and manipulates data from multiple sources using pandas DataFrames. The intent is for a user to create an instance of a class (call it dbase) which provides methods to do things like acquire and store data from API queries. I'm doing this by allowing the user to define their own functions to format values in dbase, but I've found that I tend to pass those user-defined functions through several other functions in ways that get confusing. I think this must be an obvious mistake to someone who knows what they're doing but I haven't come up with a better way to give the user control of the data.
The API queries are the worst example right now. Say I want to get a name from a server. Right now I do something like the following, in which the user-defined function for transforming the name gets passed across three other functions before it's called.
# file with code for api interaction
import api  # assumed wrapper around the web service

def submitter(this_query, dbase, name_mangler):
    new_data = api.submit(this_query)
    new_dbase_entry = name_mangler(new_data)
    # in reality there is much more complicated data transformation here
    dbase.update(new_dbase_entry)

def query_api(dbase, meta, name_mangler):
    queries = make_query_strings(dbase, meta)
    # using pandas.DataFrame.apply() here to avoid a for loop;
    # note that name_mangler must be threaded through yet again
    queries.apply(lambda x: submitter(x, dbase, name_mangler))
# other file with class definition
import pandas
from api_code import query_api

class dbase():
    def __init__(self):
        self.df = pandas.DataFrame()
        # data gets moved around between more than one data
        # structure in this class, I'm just using a single
        # dataframe as a minimal example

    def get_remote_data(self, meta, name_mangler):
        # in reality there is code here to handle multiple
        # cases rather than a trivial wrapper for another
        # function
        query_api(self, meta, name_mangler)

    def update(self, new_data):
        # do consistency checks
        # possibly write new dbase entry
        pass
A user would then do something like this:
import dbase

def custom_mangler(name):
    # User determines how to store the name in dbase;
    # for instance this could take "Grace Hopper" to "hopper"
    mangled_name = name.split()[-1].lower()  # placeholder rule matching the comment above
    return mangled_name

my_dbase = dbase.dbase()
# meta defines what needs to be queried and how the remote
# data should get processed into dbase
meta = {stuff}
my_dbase.get_remote_data(meta, custom_mangler)
I find it very hard to follow my own code here because the definitions of functions can be widely separated from the first point at which they're called. How should I refactor to address this problem? (and does this approach violate accepted coding patterns for other reasons?)
It's a little hard to infer context from what you've posted, so take this with a grain of salt. The general concepts still apply. Also take a look at https://codereview.stackexchange.com/ as this question might be a better fit for that site.
Two things come to mind.
Try to give your functions/classes/variables better names
Think about orthogonality
Good Names
Consider how this looks from a user's perspective. dbase is not a very descriptive name for either the module or the class. meta doesn't tell me at all what the dict should contain. mangler tells me that the string gets changed, but nothing about where the string comes from or how it should be changed.
Good names are hard, but it's worth spending time to make them thoughtful. It's always a trade-off between being descriptive and overly verbose. If you can't think of a name that gives clear meaning without taking up too much space, consider whether your API is overly complex. Always consider names from the end user's perspective, as well as from that of the future programmers who will be reading and maintaining your code.
Orthogonality
Following the Unix mantra of "do one thing and do it well", sometimes an API is simpler and more flexible if we separate out different tasks to different functions rather than having one function that does it all.
When writing code, I think "what is the minimum this function needs to do to be useful".
In your example
my_dbase.get_remote_data(meta, custom_mangler)
get_remote_data not only fetches the data, but also processes it. That can be confusing for a user. There's a lot happening behind the scenes in this function that isn't obvious from its name.
It might be more appropriate to have separate function calls for this. Let's assume that you're querying weather servers about temperature and rainfall.
london_weather_data = weatheraggregator.WeatherAggregator()
reports = london_weather_data.fetch_weather_reports(sources=[server_a, server_b])
london_weather_data.process_reports(reports, short_name_formatter)
Yes it's longer to type, but as a user it's a big improvement as I know what I'm getting.
Ultimately you need to decide where to split up tasks. The above may not make sense for your application.
I am writing an application which extracts some data from HTML using BeautifulSoup4. These are search results of some kind, to be more specific. I thought it would be a good idea to have a Parser class storing default values like URL prefixes, request headers, etc. After configuring those parameters, the public method would return a list of objects, each containing a single result, or maybe even an object with a list composed into it alongside some other parameters. I'm struggling to decouple the small pieces of logic that build up the parser implementation from the parser class itself. I want to write dozens of private parser utility methods like _is_next_page_available, _are_there_any_results, _is_did_you_mean_available, etc. However, these are perfect candidates for unit tests! And since I want to make them private, I have a feeling that I'm missing something...
My other idea was to write the parser as a function calling a bunch of other utility functions, but that would just be equivalent to making all of those methods public, which doesn't make sense, since they're implementation details.
Could you please advise me on how to design this properly?
I think you're interpreting the Single-Responsibility Principle (SRP) a little differently. Its actual meaning is a little off from 'a class should do only one thing'. It actually states that a class should have one and only one reason to change.
To employ the SRP you have to ask yourself to what or to whom your parser's methods are responsible - what or who might make them change. If the answer for each method is the same, then your Parser class employs the SRP correctly. If there are methods that are responsible to different things (business-rule givers, groups of users, etc.), then those methods should be taken out and placed elsewhere.
Your overall objective with the SRP is to protect your class from changes coming from different directions.
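One hedged way to apply that here (the module, class, and selector names below are all invented): move the small predicates out of the Parser class into a plain helper module. There they are public, so they can be unit tested directly, while Parser remains the only entry point your users see:

# results_page.py -- hypothetical helper module of pure functions
def has_next_page(soup):
    return soup.find('a', class_='next') is not None

def has_results(soup):
    return soup.find('div', class_='result') is not None

# parser.py -- the public-facing class stays thin
from bs4 import BeautifulSoup
import results_page

class Parser:
    def parse(self, html):
        soup = BeautifulSoup(html, 'html.parser')
        if not results_page.has_results(soup):
            return []
        # ... build and return result objects ...

This also keeps one "reason to change" (the structure of the results page) in one place, which is the SRP point made above.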
Newbie Python question here - I am writing a little utility in Python to do disk space calculations when given the attributes of 2 different files.
Should I create a 'file' class with methods appropriate to the conversion, and then create each file as an instance of that class? I'm pretty new to Python, but OK with Perl, and I believe (I may be wrong, being self-taught) from the examples I have seen that most Perl is not OO.
Background info: these are IBM z/OS (mainframe) data sets. Given the allocation attributes for a file on a specific disk type and file organisation (its block size), and then the allocation parameters for a different disk type and organisation, the space requirements can vary enormously.
Definition nitpicking preface: Everything in Python is technically an object, even functions and numbers. I'm going to assume you mean classes vs. functions in your question.
Actually, I think one of the great things about Python is that it doesn't embrace classes for absolutely everything, as some other languages do (e.g., Java and C#).
It's perfectly acceptable in Python (and the built-in modules do this a lot) to define module level functions rather than encapsulating all logic in objects.
That said, classes do have their place, for example when you perform multiple actions on a single piece of data, and especially when these actions change the data and you want to keep its state encapsulated.
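For the disk-space utility in the question, a hedged sketch of both styles (the names and the block arithmetic are invented placeholders, not real z/OS allocation rules):

# a module-level function is fine for a one-shot calculation
def blocks_needed(record_count, record_length, block_size):
    records_per_block = block_size // record_length
    return -(-record_count // records_per_block)   # ceiling division

# a class earns its place once several operations share the same state
class DataSet:
    def __init__(self, record_count, record_length, block_size):
        self.record_count = record_count
        self.record_length = record_length
        self.block_size = block_size

    def blocks_needed(self):
        return blocks_needed(self.record_count, self.record_length, self.block_size)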
For your question and your requirements, the short answer is "no".
The use of objects is not in itself "object oriented". Functional programming uses objects, too, just not in the same way. Python can be used in a very FP way, even though Python uses objects heavily behind the scenes.
Overuse of primitives can be a problem, but it's impossible to say whether that applies to your case without more data.
I think of OO as an interface design approach: If you are creating tools that are straightforward to interact with (and substitutable) as objects with predictable methods, then by all means, create objects. But if the interactions are straightforward to describe with module-level functions, then don't try too hard to engineer your code into classes.
First and foremost - Pythonic is a term that needs to disappear, preferably with everyone who uses it. It doesn't mean anything and it's used by people who can't use reason to justify anything, so they need a mandatory term to justify their nonsense.
But to the point: you never HAVE to use object-oriented concepts in your software development, as everything OOP does can just as easily be written with functions and solid spaghetti stringers. But the question is - does the use of objects make sense in my solution?
To understand when and how to use it, you have to ask what exactly object-oriented programming is. This was already very well explained in a very old, but also free, book called Thinking in Java, which I consider to be the 101 bible for thinking in OO terms. I strongly urge you to grab a free copy and read the first couple of chapters.
Because if you don't understand the object-oriented approach, how can you apply it properly? Once you do, when to use it (or not) becomes clear, because you can translate real-life items and interactions into abstract objects. And this is the guideline: when the translation of a given action, item, or data into an OOP model is straightforward and logical, then you should do it.
I am quite new to python programming (C/C++ background).
I'm writing code where I need to use complex data structures like dictionaries of dictionaries of lists.
The issue is that when I must use these objects I barely remember their structure and so how to access them.
This makes it difficult to resume working on code that was untouched for days.
A very poor solution is to use comments for each variable, but that's very inflexible.
So, given that Python variables are just pointers to memory and cannot be statically type-declared, is there any convention or rule I could follow to ease the use of complex data structures?
If you use docstrings in your classes then you can use help(vargoeshere) to see how to use it.
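For example (a minimal sketch; the nested structure documented here is made up):

class ResultsIndex:
    """Maps hostname -> {path -> [HTTP status codes seen]}.

    Example: {'example.com': {'/index': [200, 301]}}
    """
    def __init__(self):
        self.by_host = {}

# help(ResultsIndex) now reminds you of the shape of the nested data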
Whatever you do, do NOT, I repeat, do NOT use Hungarian Notation! It causes severe brain & bit rot.
So, what can you do? Python and C/C++ are quite different. In C++ you typically handle polymorphic calls like so:
void doWithFooThing(FooThing *foo) {
    foo->bar();
}
Dynamic polymorphism in C++ depends on inheritance: the pointer passed to doWithFooThing may point only to instances of FooThing or one of its subclasses. Not so in Python:
def do_with_fooish(fooish):
    fooish.bar()
Here, any sufficiently fooish thing (i.e. everything that has a callable bar attribute) can be used, no matter how it is related to any other fooish thing through inheritance.
The point here is: in C++ you know what (base-)type every object has, whereas in Python you don't, and you don't care. What you try to achieve in Python is code that is reusable in as many situations as possible, without having to force everything under the rigid rule of class inheritance. Your naming should also reflect that. You don't write:
def some_action(a_list):
    ...
but:
def some_action(seq):
    ...
where seq might be not only a list, but any iterable sequence, be it list, tuple, dict, set, iterator, whatever.
In general, you put the emphasis on the intent of your code, instead of its type structure. Instead of writing:
dict_of_strings_to_dates = {}
you write:
users_birthdays = {}
It also helps to keep functions short, even more so than in C/C++. Then you'll be easily able to see what's going on.
Another thing: you shouldn't think of Python variables as pointers to memory. They're in fact dictionary entries:
assert foo.bar == getattr(foo, 'bar') == foo.__dict__['bar']
Not always exactly so, I concede, but the details can be looked up at docs.python.org.
And, BTW, in Python you don't declare stuff like you do in C/C++. You just define stuff.
I believe you should take a good look some of your complex structures, what you are doing with them, and ask... Is This Pythonic? Ask here on SO. I think you will find some cases where the complexity is an artifact of C/C++.
Include an example somewhere in your code, or in your tests.
I was recently going over a coding problem I was having and someone looking at the code said that subclassing list was bad (my problem was unrelated to that class). He said that you shouldn't do it and that it came with a bunch of bad side effects. Is this true?
I'm asking if list is generally bad to subclass and if so, what are the reasons. Alternately, what should I consider before subclassing list in Python?
The abstract base classes provided in the collections module, particularly MutableSequence, can be useful when implementing list-like classes. These are available in Python 2.6 and later.
With ABCs you can implement the "core" functionality of your class and it will provide the methods which logically depend on what you've defined.
For example, implementing __getitem__ in a collections.Sequence-derived class will be enough to provide your class with __contains__, __iter__, and other methods.
You may still want to use a contained list object to do the heavy lifting.
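A hedged sketch of that combination (in current Python the ABCs live in collections.abc; the class here is invented):

from collections.abc import MutableSequence

class TypedList(MutableSequence):
    def __init__(self, iterable=()):
        self._items = list(iterable)   # the contained list does the heavy lifting

    # implement only the abstract "core" methods...
    def __getitem__(self, index):
        return self._items[index]

    def __setitem__(self, index, value):
        self._items[index] = value

    def __delitem__(self, index):
        del self._items[index]

    def __len__(self):
        return len(self._items)

    def insert(self, index, value):
        self._items.insert(index, value)

# ...and the ABC supplies append, extend, pop, __contains__, __iter__, etc.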
There are no benefits to subclassing list. None of the built-in list methods will call the methods you override, so you can get unexpected bugs. Further, it's very often confusing to write things like self.append instead of self.foos.append, or especially self[4] rather than self.foos[4], to access your data. You can make something that works exactly like a list, or (better) however much like a list you really want, while just subclassing object.
I think the first question I'd ask myself is, "Is my new object really a list?". Does it walk like a list, talk like a list? Or is it something else?
If it is a list, then all the standard list methods should all make sense.
If the standard list methods don't make sense, then your object should contain a list, not be a list.
In old python (2.2?) sub-classing list was a bad idea for various technical reasons, but in a modern python it is fine.
Nick is correct.
Also, while I can't speak to Python, in other OO languages (Java, Smalltalk) subclassing a list is a bad idea. Inheritance in general should be avoided and delegation-composition used instead.
Rather, you make a container class and delegate calls to the list. The container class has a reference to the list and you can even expose the calls and returns of the list in your own methods.
This adds flexibility and allows you to change the implementation (a different list type or data structure) later w/o breaking any code. If you want your list to do different listy-type things then your container can do this and use the plain list as a simple data structure.
Imagine if you had 47 different uses of lists. Do you really want to maintain 47 different subclasses?
Instead you could do this via the container and interfaces. One class to maintain and allow people to call your new and improved methods via the interface(s) with the implementation remaining hidden.
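A minimal Python sketch of that container-and-delegation idea (names invented):

class Playlist:
    def __init__(self):
        self._tracks = []   # has a list; is not a list

    def add(self, track):
        self._tracks.append(track)   # delegate to the plain list

    def __iter__(self):
        return iter(self._tracks)

    def __len__(self):
        return len(self._tracks)

Swapping _tracks for a deque or another data structure later changes nothing for callers, which is exactly the flexibility argued for above.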