There are many posts about the use cases of __init__.py, and defining global variables is one of them.
But I want to know: is there any drawback to defining a global variable in __init__.py?
There is no technical reason that prevents you from using __init__.py to declare global variables.
Nevertheless, the purpose of __init__.py modules is to define the import structure of your application. From the Python documentation:
The __init__.py files are required to make Python treat the directories as containing packages; this is done to prevent directories with a common name, such as string, from unintentionally hiding valid modules that occur later on the module search path. In the simplest case, __init__.py can just be an empty file, but it can also execute initialization code for the package or set the __all__ variable, described later.
As a developer, I expect to find in an __init__.py file the import layout of the application itself. It's the last place I look when I'm exploring the source code.
Therefore, hiding your implementation in an __init__.py is misleading. This is especially true for global variables, whose lifecycle is by definition hard to follow.
It's considered bad practice because it hurts code readability, which is one of Python's guiding principles.
import this
The Zen of Python, by Tim Peters
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
...
Readability counts.
...
Using global variables has its problems in general:
Passing variables as parameters to functions is, for example, more flexible and readable than having functions access globals.
In the case of a network of connected objects, it's usually more flexible for objects to have members referring to other objects than for them to access other objects through globals.
Having a module export classes rather than instances allows you to have multiple instances of a class rather than one.
Having said that, it's my experience that in programming there are no dogmas. A well-known lemma in algorithm design is that of each resource (thing in your program) there may be zero, one, or infinitely many. If you can typically have only one instance of an object and it doesn't change identity, then exporting an instance from a module (so defining it in the module's __init__.py) is fine.
It's just that at the start of design you will sometimes assume that a certain resource is unique, but later on it turns out that you'll have multiple.
A typical application of exporting a variable rather than a type is when it's in fact a constant. A good example of this is math.pi, which doesn't tend to change very often...
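To make the distinction concrete, here is a minimal sketch (the package name physics and everything in it is hypothetical) of an __init__.py that exports a constant and a class rather than a pre-built instance:

# physics/__init__.py (hypothetical package)
GRAVITY = 9.81                      # a constant, like math.pi: exporting the value is fine

class Simulation:                   # exporting the class keeps multiple instances possible
    def __init__(self, gravity=GRAVITY):
        self.gravity = gravity

# elsewhere in the application:
# from physics import GRAVITY, Simulation
# earth = Simulation()
# moon = Simulation(gravity=1.62)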
Note that since in Python everything is an object, the words 'variable' and 'instance' are used interchangeably here. In addition to that, functions and classes are (usually singleton) objects in their own right in Python.
Introduction
So I've been doing a bit of research regarding the underscore character (_). I know most of its use cases and their semantics, so I'll just drop them below as a recap and conclude with the question, which is more of a conceptual one regarding two of the use cases.
Use cases
To store the result of the last evaluation in the interactive interpreter (outside the interpreter it has no special semantics, and a lone _ is simply undefined until you assign it)
In internationalization context, where you'd import functions such as gettext aliased as _
As a digit separator to aid readability (typically in groups of three, such as 1_000_000). Note that this is only available from Python 3.6 onward.
Example:
1_000_000 == 10**6 # equals True
x = 1_000
print(x) # yields 1000
To "ignore" certain values, although I'd not call it "ignore" since those values are still evaluated and bound to _ just as though it were a regular identifier. Normally I'd find a better design than this since I find this a code smell. I have rarely used such an approach in many years so I guess whenever you think you need to use it, you can surely improve your design to not use it.
Example:
for _ in range(10):
    # do stuff without using _
    ping_a_server()

# outside the loop, _ still exists and still holds the last evaluated value
print(_)  # would yield 9
As a trailing character on an identifier (used by convention to avoid name clashes with a built-in identifier or reserved word):
Example
class Automobile:
    def __init__(self, type_='luxury', class_='executive'):
        self.car_type = type_
        self.car_class = class_

noob_car = Automobile(type_='regular', class_='supermini')
luxury = Automobile()
As access modifiers, but only as a convention, since Python doesn't have access modifiers in the true sense:
Single leading underscore
Acts as a weak "internal use" indicator. All identifiers starting with _ will be ignored by star imports (from M import *)
Example:
a.py
_bogus = "I'm a bogus variable"
__bogus = "I'm a bogus with 2 underscores"
___triple = 3.1415
class_ = 'Plants'
type_ = 'Red'
regular_identifier = (x for x in range(10))
b.py
from a import *
print(locals()) # will yield all but the ones starting with _
Important conceptual observation
I hate it when people call this private (setting aside the fact that nothing is actually private in Python).
If we were to put this in an analogy, it would be closest to Java's protected, since in Java protected means "derived classes and/or within the same package". At module level, any identifier with a leading underscore _ has different semantics than a regular identifier (semantics from Python's perspective, not ours, where CONSTANTS and global_variable mean different things but to Python are the same thing), and it is ignored by the import machinery for star imports. So it really is an indicator that you should use those identifiers only within that module, or within the classes they're defined in and their derived subclasses.
Double leading underscores
Without going into much detail, this invokes a name mangling mechanism when used on identifiers inside a class, which makes it harder, but again not impossible, for code in subclasses (or anywhere else) to access such an attribute of the base class.
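A short sketch of the mangling (the class and attribute names are hypothetical): an attribute named __token inside class Base is stored as _Base__token, so code outside Base has to use the mangled name to reach it.

class Base:
    def __init__(self):
        self.__token = "secret"          # stored on the instance as _Base__token

class Child(Base):
    def read_token(self):
        # self.__token here would be mangled to _Child__token and raise AttributeError
        return self._Base__token         # harder, but not impossible

b = Base()
# b.__token                  -> AttributeError
print(b._Base__token)        # "secret": mangling discourages access, it doesn't prevent it
print(Child().read_token())  # "secret"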
Question
So I've been reading this book dedicated to beginners and at the variables section the author said something like:
Variables can begin with an underscore _ although we generally avoid doing this unless we write library code for others to use.
Which got me thinking... does it make sense to mark stuff as non-public in a private project or even in an open source project that's not used as a dependency in other projects?
For instance, I have an open source web app that I push changes to on a regular basis. It's mostly there for educational purposes and because I want to write clean, standardized code and put into practice any new skills I acquire along the way. Now I'm thinking of the aforementioned: does it make sense to use identifiers that mark things as non-public?
For the sake of argument, let's say that in the future 500 people are actively contributing to that web app and it becomes very active code-wise. We can assume a large number of them will use those "protected" and "private" identifiers directly (even when that's advised against; not all 500 of them will know the best practices). But since it's a non-library project, meaning it's not a dependency of other projects or something other people build on, they can be somewhat reassured that those methods won't disappear in a code refactor: the developer doing the refactor is likely to notice all callers across the project and refactor accordingly (or, if he doesn't, the tests will tell him).
Evidently it makes sense in library code, since all the people depending on your library, all the possible future people depending on it, and all the people depending on it indirectly (other folks embed your library into theirs, expose their own library, and so on) should be aware that identifiers with a single or double leading underscore are an implementation detail and can change at any time. Therefore they should use your public API at all times.
What if nobody else will work on this project, I make it private, and I'm the only one working on it? Or only a small group of people? Does it make sense to use access-modifier indicators in these kinds of projects?
Your point of view seems to be based on the assumption that what is private or public (or the equivalent suggestions in Python) depends on which developers read and write the code. That is wrong.
Even if you write, all alone, a single application that only you will use, if it is correctly designed it will be divided into modules, and those modules will expose an interface and have a private implementation.
The fact that you write both the module and the code that uses it doesn't mean there are no parts that should be private to preserve encapsulation. So yes, using the leading underscore to mark parts of a module as private makes sense regardless of the number of developers working on the project or depending on it.
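As a sketch (the module and function names are hypothetical), the module below exposes load_config as its interface and keeps the parsing helper private by convention:

# config.py (hypothetical module)

def load_config(path):
    """Public interface: the only name callers are meant to rely on."""
    with open(path) as f:
        return _parse(f.read())

def _parse(raw):
    """Implementation detail: free to change without breaking callers."""
    return dict(line.split("=", 1) for line in raw.splitlines() if "=" in line)

# "from config import *" picks up load_config but not _parse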
EDIT:
What we are really discussing is encapsulation, a concept that is general to software engineering, in Python and in any other language.
The idea is that you divide your whole application into pieces (the modules I was talking about, which might be Python packages but might also be something else in your design) and decide which of the several functionalities needed to reach your goal each piece implements (this is called the Single Responsibility Principle).
It is a good approach to design by contract, which means deciding on the abstraction your module is going to expose to other parts of the software and hiding everything that is not part of it as an implementation detail. Other modules should not rely on your implementation but only on your exposed functionality, so that you are free to change it whenever you want to increase performance, add new functionality, improve maintainability, or for any other reason.
Now, all this theoretical rant is language agnostic and application agnostic, which means that every time you design a piece of software you have to rely on the facilities offered by the language to design your modules and to build encapsulation.
Python, one of the few languages if not the only one as far as I know, made the deliberate choice (a bad one, in my opinion) not to enforce encapsulation, and instead allows developers access to everything, as you have already discovered.
This, however, does not mean that the aforementioned concepts do not apply, but just that they cannot be enforced at the language level and must be implemented in a more loose way, as a simple suggestion.
Does that mean it is a bad idea to encapsulate your implementation, and that you should freely use every bit of information available? Obviously not: encapsulation is still a good SOLID principle upon which to build an architecture.
Nothing of all this is really necessary, they are just good principles that with time and experience have been proven to create good quality software.
Should you use it in your small application that nobody else uses? If you want things done as they should be, yes, you should.
Are they necessary? Well, you can make it work without them, but you might discover that it costs you more effort later down the road.
What if I'm not writing a library but a finished application? Well, does that mean that you shouldn't write it in a good, clean, tidy way?
Using Python 2.7, I have a large class, in a single file, that I want to break into multiple files grouped by related functions. For example, I have class API(object), and I would like to have a separate Python file for all user methods (add user, update user, delete user) and a separate Python file for all order methods (create order, update order, delete order). But this separation should not be visible to the end user, for example:
z = test.api()               # __init__.py
z.adduser(jeff)              # user.py
z.createOrder(65, "jeff")    # orders.py
z.showOpenOrders("jeff")     # orders.py
z.completeOrder()            # orders.py
z.emailUser("Jeff")          # email.py
I have been searching for "extending python class", but I don't believe I am searching using the right term. Please help.
I would instead create specialized classes (Users, Orders) where instances are created in API.__init__ (if necessary they could hold a reference to the API instance). The specialized instances can then be retrieved through member attributes or properties of the API instance.
Calls to them would then look like:
z = test.api()
z.users.add(jeff)
z.orders.create(65, "jeff")
z.orders.showOpen("jeff")
and so on.
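A minimal sketch of that layout (all class and method names are hypothetical); the specialized classes would live in their own files and the API instance simply wires them together:

# users.py
class Users:
    def __init__(self, api):
        self._api = api                    # back-reference to the API instance, if needed
    def add(self, name):
        print("adding user %s" % name)

# orders.py
class Orders:
    def __init__(self, api):
        self._api = api
    def create(self, order_id, user):
        print("creating order %d for %s" % (order_id, user))
    def showOpen(self, user):
        print("open orders for %s" % user)

# __init__.py  (in the real package: from .users import Users, and from .orders import Orders)
class API(object):
    def __init__(self):
        self.users = Users(self)
        self.orders = Orders(self)

z = API()
z.users.add("jeff")
z.orders.create(65, "jeff")
z.orders.showOpen("jeff")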
First, I'd recommend against this approach. It's typical in Python to keep the definition of a class in one file. Another answer discusses options for splitting your API across multiple classes, and if that works for your application it may be a great option. However, what you propose is possible in Python, so I'll describe how to do it.
In a.py:
class A:
    ...  # some methods go here
In b.py:
import a

def extra_a_method(self, arg1):
    ...  # method body goes here

a.A.extra_a_method = extra_a_method
del extra_a_method
So, we create a function and add it to the A class. We had to create the function in some scope, and so to keep that scope clean, we delete the function from that scope afterwards.
You cannot do def A.new_a_method for syntactic reasons.
I know this will mostly work in Python3, but haven't analyzed it for Python2.
There are some catches:
b must be imported before the method appears in A; this is a big deal
If A has a nontrivial metaclass, the handling of the extra methods will be different from the handling of the methods in the original class. As an example, I think SQLAlchemy would get this sort of addition right, but other frameworks might not.
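For instance, a hypothetical caller has to import b (even if it never uses b directly) before the added method exists on A:

# main.py (hypothetical caller)
import a
import b            # side effect: this import attaches extra_a_method to a.A

obj = a.A()
obj.extra_a_method("hello")   # works only because b was imported first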
There's an alternative that can work sometimes, if you know from the beginning all the things that A should contain.
In a.py:
import a_extra_1, a_extra_2

class A(a_extra_1.AExtra1, a_extra_2.AExtra2):
    ...  # methods can also go here
Here, we're treating the additional methods as mixins to A. Generally, you'd want to have the mixins added after any classes for inheritance.
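A sketch of one of those mixin modules (following the hypothetical names a_extra_1 and AExtra1 above):

# a_extra_1.py
class AExtra1:
    def extra_method_one(self):
        # behaviour that ends up on A through inheritance
        return "from AExtra1"

# a_extra_2.py would define AExtra2 the same way; after that:
# import a
# a.A().extra_method_one()    # available on every instance of A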
This approach also has some drawbacks:
If you are not careful it can lead to circular imports
You have to know all the methods you need for A ahead of time
All the code gets imported at once (this is a feature too)
There can also be issues with nontrivial metaclasses in this approach, although if the framework supports inheritance, this is likely to work.
So, I'm developing a simple web scraper in Python right now, but I had a question about how to structure my code. In other programming languages (especially compiled languages like C++ and C#), I've been in the habit of wrapping all of my functions in classes. For example, in my web-scraping case I would have a class called something like "WebScraper" and then hold all of the functions within that class. I might even go so far as to create a second helper class like "WebScraperManager" if I needed to instantiate multiple instances of the original "WebScraper" class.
This leads me to my current question, though. Would similar logic hold in the current example? Or would I simply define a WebScraper.py file, without a wrapper class inside it, and then just import the functions as I need them into some main.py file?
The difference between a class and a function should be that a class has state. Some classes don't have state, but that is rarely a good idea (I'm sure there are exceptions, abstract base classes (ABCs) for instance, but I'm not sure they count), and some functions do have state, but that is also rarely a good idea (caching or instrumentation might be exceptions).
If you want a URL as input and, say, a dict as output, and then you are done with that website, there's no reason to have a class. Just have a function that takes a URL and returns a dict. Stateless functions are simpler abstractions than classes, so all other things being equal, prefer them.
However, very often there is intermediate state involved. For instance, maybe you are scraping a family of pages rooted at a base URL, and it's too expensive to do all of this eagerly. Maybe then what you want is a class that takes the root URL in its constructor. It then has some methods for querying which child URLs it can follow, and methods for ordering subsequent scraping of children, whose results might be stored in nested data structures.
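As a sketch (all names are hypothetical, and the actual fetching/parsing is elided), the stateless and stateful shapes look like this:

from urllib.parse import urljoin

def scrape_page(url):
    """Stateless: URL in, dict out -- no class needed."""
    # ... fetch and parse `url` here ...
    return {"url": url, "title": "..."}

class SiteScraper:
    """Stateful: remembers the root URL and which child pages are still pending."""
    def __init__(self, root_url):
        self.root_url = root_url
        self.pending = [root_url]
        self.results = {}

    def discover_children(self, relative_links):
        # queue child URLs found on a page for later scraping
        self.pending.extend(urljoin(self.root_url, link) for link in relative_links)

    def scrape_next(self):
        url = self.pending.pop(0)
        self.results[url] = scrape_page(url)
        return self.results[url]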
And of course, if your task is reasonably complicated, you may well have layers with functions using classes, or classes calling functions. But persistent state is a good indicator of whether the immediate task should be written as a class or as a set of functions.
Edit: just to close the loop and come round to the original question: No, I would say it's not pythonesque to wrap all functions in classes. Free functions are just fine in python, it all depends what's appropriate. Also, the term pythonesque is not very pythonic ;-)
You mean "pythonic".
That depends on how object-oriented and scalable you want your implementation to be. I would use classes over simple functions. Let's say tomorrow you want a CraigslistScraper and a FacebookScraper: I would create an abstract class "Scraper" and have the two above inherit from it and reimplement what you need (polymorphism). The object-oriented principles and patterns are language independent. That said, I wouldn't "hold all the functions" in a single class (single responsibility principle); every time you code, remember the word "SOLID".
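A sketch of that layout (the class names are hypothetical), using abc for the abstract base class:

from abc import ABC, abstractmethod

class Scraper(ABC):
    """Shared behaviour lives here; each site supplies its own parsing."""
    def __init__(self, base_url):
        self.base_url = base_url

    @abstractmethod
    def parse(self, html):
        ...

class CraigslistScraper(Scraper):
    def parse(self, html):
        return {"source": "craigslist", "length": len(html)}

class FacebookScraper(Scraper):
    def parse(self, html):
        return {"source": "facebook", "length": len(html)}

scrapers = [CraigslistScraper("https://craigslist.org"),
            FacebookScraper("https://facebook.com")]
for s in scrapers:
    print(s.parse("<html>...</html>"))    # polymorphism: same call, different behaviour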
I'm used to the Java model where you can have one public class per file. Python doesn't have this restriction, and I'm wondering what's the best practice for organizing classes.
A Python file is called a "module" and it's one way to organize your software so that it makes "sense". Another is a directory, called a "package".
A module is a distinct thing that may have one or two dozen closely-related classes. The trick is that a module is something you'll import, and you need that import to be perfectly sensible to people who will read, maintain and extend your software.
The rule is this: a module is the unit of reuse.
You can't easily reuse a single class. You should be able to reuse a module without any difficulties. Everything in your library (and everything you download and add) is either a module or a package of modules.
For example, you're working on something that reads spreadsheets, does some calculations and loads the results into a database. What do you want your main program to look like?
from ssReader import Reader
from theCalcs import ACalc, AnotherCalc
from theDB import Loader
def main( sourceFileName ):
    rdr = Reader( sourceFileName )
    c1 = ACalc( options )
    c2 = AnotherCalc( options )
    ldr = Loader( parameters )
    for myObj in rdr.readAll():
        c1.thisOp( myObj )
        c2.thatOp( myObj )
        ldr.load( myObj )
Think of the import as the way to organize your code in concepts or chunks. Exactly how many classes are in each import doesn't matter. What matters is the overall organization that you're portraying with your import statements.
Since there is no artificial limit, it really depends on what's comprehensible. If you have a bunch of fairly short, simple classes that are logically grouped together, toss in a bunch of 'em. If you have big, complex classes or classes that don't make sense as a group, go one file per class. Or pick something in between. Refactor as things change.
I happen to like the Java model for the following reason. Placing each class in an individual file promotes reuse by making classes easier to see when browsing the source code. If you have a bunch of classes grouped into a single file, it may not be obvious to other developers that there are classes there that can be reused simply by browsing the project's directory structure. Thus, if you think that your class can possibly be reused, I would put it in its own file.
It entirely depends on how big the project is, how long the classes are, if they will be used from other files and so on.
For example I quite often use a series of classes for data-abstraction - so I may have 4 or 5 classes that may only be 1 line long (class SomeData: pass).
It would be stupid to split each of these into separate files, but since they may be used from different files, putting them all in a separate data_model.py file makes sense, so I can do from mypackage.data_model import SomeData, SomeSubData
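For example, a hypothetical data_model.py could be nothing more than:

# mypackage/data_model.py -- several tiny data-holder classes grouped in one module
class SomeData:
    pass

class SomeSubData(SomeData):
    pass

class OtherData:
    pass

# callers then do:
# from mypackage.data_model import SomeData, SomeSubData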
If you have a class with lots of code in it, maybe with some functions only it uses, it would be a good idea to split this class and the helper-functions into a separate file.
You should structure them so you do from mypackage.database.schema import MyModel, not from mypackage.email.errors import MyDatabaseModel. If where you import things from makes sense, and the files aren't tens of thousands of lines long, you have organised it correctly.
The Python Modules documentation has some useful information on organising packages.
I find myself splitting things up when I get annoyed with the bigness of files and when the desirable structure of relatedness starts to emerge naturally. Often these two stages seem to coincide.
It can be very annoying if you split things up too early, because you start to realise that a totally different ordering of structure is required.
On the other hand, when any .java or .py file is getting to more than about 700 lines I start to get annoyed constantly trying to remember where "that particular bit" is.
With Python/Jython circular dependency of import statements also seems to play a role: if you try to split too many cooperating basic building blocks into separate files this "restriction"/"imperfection" of the language seems to force you to group things, perhaps in rather a sensible way.
As to splitting into packages, I don't really know, but I'd say probably the same rule of annoyance and emergence of happy structure works at all levels of modularity.
I would say to put as many classes as can be logically grouped in that file without making it too big and complex.
I started programming in Python 2 weeks ago. I'm making a separate file (module) for each class, as I've done before in languages like Java or C#.
But now, seeing tutorials and code from other people, I've realized that many people use the same file to define more than one class as well as the main function. I don't know whether they do it like that because these are just examples, or because it's a Python convention or something like that (to define and group many classes in the same file).
So, in Python: one file for each class, or many classes in the same file if they can be grouped by some particular feature (like motor vehicles on one side and plain vehicles on the other)?
It's obvious that everyone has their own style, but when I ask, I hope for general answers or just the conventions. Anyway, if someone wants to tell me their opinion about their own style and why, feel free to do it! ;)
one file for each class
Do not do this. In Java, you usually will not have more than one class in a file (you can, of course nest).
In Python, if you group related classes in a single file, you are on the safe side. Take a look at the Python standard library: many modules contain multiple classes in a single file.
As for the why? In short: Readability. I, personally, enjoy not having to switch between files to read related or similar code. It also makes imports more concise.
Imagine socketserver.py spread UDPServer, TCPServer, ForkingUDPServer, ForkingTCPServer, ThreadingUDPServer, ThreadingTCPServer, BaseRequestHandler, StreamRequestHandler and DatagramRequestHandler across nine files. How would you import these? Like this?
from socketserver.tcp.server import TCPServer
from socketserver.tcp.server.forking import ForkingTCPServer
...
That's plain overhead. It's overhead, when you write it. It's overhead, when you read it. Isn't this easier?
from socketserver import TCPServer, ForkingTCPServer
That said, no one will stop you, if you put each class into a single file. It just might not be pythonic.
Python has the concept of packages, modules and classes. If you put one class per module, the advantage of having modules is gone. If you have a huge class, it might be OK to put that class in a separate file, but then again, is it good to have big classes? No: big classes are hard to test and maintain. Better to have more small classes with specific tasks and put them, logically grouped, in as few files as possible.
It's not wrong at all to have one class per file. Python isn't directly aimed at object-oriented design, which is why you can get away with multiple classes per file.
I recommend having a read over some style guides if you're confused about what the 'proper' way to do it is.
I suggest either Google's style guide or the official style guide by the Python Foundation
You can also find more material relating to Python's idioms and meta analysis in the PEP index