I've split a program into three scripts. One of them, 'classes.py', is a module defining all the classes I need. Another one is a sort of setup module, call it 'setup.py', which instantiates a lot of objects from 'classes.py' (it's just a bunch of variable assignments with a few for loops, no functions or classes). It has a lot of strings and stuff I don't want to see when I'm working on the third script, which is the program itself, i.e. the script that actually does something with all of the above.
The only way I got this to work was to add, in the 'setup.py' script:
from classes import *
This allows me to write quickly in the setup file without prefixing the module namespace everywhere. And, in the main script:
import setup
This has the advantage that PyCharm gives me full code completion for classes and methods, which is nice.
What I'd like to achieve is having the main script import the classes, and then run the setup script to create the objects I need, with two simple commands. But I can't import the classes script into the main script because then the setup script can't do anything, having no class definitions. Should I import the classes into both scripts, or do something else entirely?
Import in each file. Consider this SO post. From the answer by Mr Fooz there,
Each module has its own namespace. So for boo.py to see something from an external module, boo.py must import it itself.
It is possible to write a language where namespaces are stacked the way you expect them to be: this is called dynamic scoping. Some languages, like the original Lisp, early versions of Perl, PostScript, etc., do use (or support) dynamic scoping.
Most languages use lexical scoping instead. It turns out this is a much nicer way for languages to work: this way a module can reason about how it will work based on its own code without having to worry about how it was called.
See this article for additional details: http://en.wikipedia.org/wiki/Scope_%28programming%29
Intuitively this feels nicer too, as you can immediately see (in the file itself) which dependencies the code has - this will let you understand your code much better a month, or even a year, from now.
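A minimal sketch of what "import in each file" looks like for your three scripts (the Widget class and the variable names here are made up for illustration):

# classes.py
class Widget:
    def __init__(self, name):
        self.name = name

# setup.py -- imports the classes it needs itself
from classes import Widget

widgets = [Widget("w%d" % i) for i in range(3)]

# main.py -- imports both, each explicitly
import classes
import setup   # importing it runs the assignments in setup.py once

for w in setup.widgets:
    print(w.name, isinstance(w, classes.Widget))

Because modules are cached, the setup code runs exactly once no matter how many modules import it.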
I am a beginner in Python, and I am trying to learn by making a simple game. I started by having everything in one big file (let's call it main.py), but it is getting to the point where it has so many classes and functions that I would like to split this code into more manageable components.
I have some experience with LaTeX (although certainly not an expert either) and, in LaTeX, there is a command called \input which allows one to write part of the code in a different file. For example, if I have files main.tex and sub.tex which look like:
main.tex:
Some code here.
\input{sub}
Lastly, some other stuff.
and
sub.tex:
Some more code here
then, when I execute main.tex, it will execute:
Some code here.
Some more code here
Lastly, some other stuff.
I wonder, is there a similar thing in Python?
Note 1: From what I have seen, the most commonly suggested way to go about splitting your code is to use modules. I have found this a bit uncomfortable for a few reasons, which I will list below (of course, I understand that I find them uncomfortable because I am inexperienced, and not because this is the wrong way to do things).
Reasons why I find modules uncomfortable:
My main.py file imports some other modules, like Pygame, which need to be imported into all the new modules I create. If for some reason I wanted to import a new module into main.py later in the process, I would then need to import it in every other module I create.
My main.py file has some global variables that are used in the different classes; for example, I have a global variable CITY_SIZE that controls the size of all City instances on the screen. Naturally, CITY_SIZE is used in the definition of the class City. If I were to move the class City to a module classes.py, then I would need to define CITY_SIZE in classes.py as well, and if I ever wanted to change the value of CITY_SIZE I would need to change it there too.
Again, suppose that I add a classes.py module where I store all my classes, like City. Then in main.py I need to write classes.City in my code instead of City. I understand this can be overcome by using from classes import City but then I need to add a line of code every time I add a new class to classes.py.
Note 2: I would very much appreciate any comments about how to use modules comfortably in Python, but please note that because this is not my question I would not be able to accept those as valid answers (but, again, they would be appreciated!).
If you have all of your modules in the same directory, you can simply use:
import <name of submodule without .py>
For example, if a submodule file was named sub.py, you would import it like this:
import sub
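As an aside on Note 1 above, a common pattern for the CITY_SIZE concern (sketched here with a hypothetical module name) is to keep shared constants in their own module and import that module everywhere, so the value lives in exactly one place:

# settings.py -- the one place the shared constant lives (hypothetical name)
CITY_SIZE = 64

# classes.py
import settings

class City:
    def __init__(self):
        self.size = settings.CITY_SIZE   # always reads the current shared value

# main.py
import settings
import classes

settings.CITY_SIZE = 128   # visible to every module that imports settings
city = classes.City()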
I am starting a new Python project that is supposed to run both sequentially and in parallel. However, because the behavior is entirely different, running in parallel would require a completely different set of classes than those used when running sequentially. But there is so much overlap between the two codes that it makes sense to have a unified code and defer the parallel/sequential behavior to a certain group of classes.
Coming from a C++ world, I would let the user set a Parallel or Serial class in the main file and use that as a template parameter to instantiate other classes at runtime. In Python there is no compile time, so I'm looking for the most Pythonic way to accomplish this. Ideally, it would be great if the code determined whether the user is running sequentially or in parallel and selected the classes automatically. So if the user runs mpirun -np 4 python __main__.py the code should behave entirely differently than when the user calls just python __main__.py. Somehow it makes no sense to me to have if statements to determine the type of an object at runtime; there has to be a much more elegant way to do this. In short, I would like to avoid:
if isinstance(a, Parallel):
    m = ParallelObject()
elif isinstance(a, Serial):
    m = SerialObject()
I've been reading about this, and it seems I can use factories (which somewhat have this conditional statement buried in the implementation). Yet, using factories for this problem is not an option because I would have to create too many factories.
In fact, it would be great if I can just "mimic" C++'s behavior here and somehow use Parallel/Serial classes to choose classes properly. Is this even possible in Python? If so, what's the most Pythonic way to do this?
Another idea would be to detect whether the user is running in parallel or sequentially and then load the appropriate module (either from a parallel or sequential folder) with the appropriate classes. For instance, I could have the user type in the main script:
from myPackage.parallel import *
or
from myPackage.serial import *
and then have the parallel or serial folders import all shared modules. This would allow me to keep all classes that differentiate parallel/serial behavior with the same names. This seems to be the best option so far, but I'm concerned about what would happen when I'm running py.test because some test files will load parallel modules and some other test files would load the serial modules. Would testing work with this setup?
You may want to check how a similar issue is solved in the stdlib: https://github.com/python/cpython/blob/master/Lib/os.py - it's not a 100% match to your own problem, nor the only possible solution FWIW, but you can safely assume this to be a rather "pythonic" solution.
wrt/ the "automagic" thing depending on execution context, if you decide to go for it, by all means make sure that 1/ both implementations can still be explicitly imported (like os.ntpath and os.posixpath) so they are truly unit-testable, and 2/ the user can still manually force the choice.
EDIT:
So if I understand it correctly, this file you point out imports modules depending on (...)
What it "depends on" is actually mostly irrelevant (in this case it's a builtin name because the target OS is known when the runtime is compiled, but this could be an environment variable, a command line argument, a value in a config file etc). The point was about conditional import of modules with the same API but different implementations, while still providing direct explicit access to those modules.
So in a similar way, I could let the user type from myPackage.parallel import * and then in myPackage/__init__.py I could import all the required modules for the parallel calculation. Is this what you suggest?
Not exactly. I posted this as an example of conditional imports mostly, and possibly as a way to build a "bridge" module that can automagically select the appropriate implementation at runtime (the basis on which it does so is up to you).
The point is that the end user should be able to either explicitly select an implementation (by explicitly importing the right submodule - serial or parallel - and using it directly) OR - still explicitly - ask the system to select one or the other depending on the context.
So you'd have myPackage.serial and myPackage.parallel (just as they are now), and an additional myPackage.automagic that dynamically selects either serial or parallel. The "recommended" choice would then be to use the "automagic" module so the same code can be run either serial or parallel without the user having to care about it, but with still the ability to force using one or the other where it makes sense.
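A minimal sketch of such an "automagic" module, assuming the choice is driven by an environment variable (the variable name and the package layout are placeholders):

# myPackage/automagic.py -- bridge module, selects an implementation at import time
import os

if os.environ.get("MYPACKAGE_MODE") == "parallel":
    from myPackage.parallel import *
else:
    from myPackage.serial import *

myPackage.serial and myPackage.parallel stay directly importable for unit tests, and anyone who needs a specific implementation simply bypasses the bridge.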
My fear is that py.test will have modules from parallel and serial while testing different files and create a mess
Why and how would this happen? Remember that Python has no "process-global" namespace - "globals" are really "module-level" only - and that Python's import is absolutely nothing like C/C++ includes.
import loads a module object (it can be built directly from Python source code, or from compiled C code, or even dynamically created - remember, at runtime a module is an object, an instance of the module type) and binds this object (or attributes of this object) into the enclosing scope. Also, modules are guaranteed (with a couple of caveats, but those are to be considered error cases) to be imported only once for a given process (and then cached), so importing the same module twice in the same process will yield the same object (IOW a module is a singleton).
All this means that given something like
# module A
def foo():
    return bar(42)

def bar(x):
    return x * 2
and
# module B
def foo():
    return bar(33)

def bar(x):
    return x / 2
It's guaranteed that however you import from A and B, A.foo will ALWAYS call A.bar and NEVER call B.bar, and B.foo will only ever call B.bar (unless you explicitly monkeypatch them of course, but that's not the point).
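A quick check of that guarantee, assuming the two snippets above live in A.py and B.py:

import A
import B

print(A.foo())   # 84   -- A.foo resolved bar in A's own namespace
print(B.foo())   # 16.5 -- B.foo resolved bar in B's (Python 3 division)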
Also, this means that within a module you cannot have access to the importing namespace (the module or function that's importing your module), so you cannot have a module depending on "global" names set by the importer.
To make a long story short, you really need to forget about C++ and learn how Python works, as those are wildly different languages with wildly different object models, execution models and idioms. A couple interesting reads are http://effbot.org/zone/import-confusion.htm and https://nedbatchelder.com/text/names.html
EDIT 2:
(about the 'automagic' module)
I would do that based on whether the user runs mpirun or just python. However, it seems it's not possible (see for instance this or this) in a portable way without a hack. Any ideas in that direction?
I've never ever had anything to do with mpi so I can't help with this - but if the general consensus is that there's no reliable portable way to detect this then obviously there's your answer.
This being said, simple stupid solutions are sometimes overlooked. In your case, explicitly setting an environment variable or passing a command-line switch to your main script would JustWork(tm), i.e. the user should for example use
SOMEFLAG=serial python main.py
vs
SOMEFLAG=parallel mpirun -np4 python main.py
or
python main.py serial
vs
mpirun -np4 python main.py parallel
(whichever works best for your needs / is the most easily portable).
This of course requires a bit more documentation and some more effort from the end-user but well...
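For instance, main.py could read the flag like this (SOMEFLAG and the module names are of course placeholders):

# main.py -- explicit, portable selection (sketch)
import os
import sys

mode = os.environ.get("SOMEFLAG")       # SOMEFLAG=parallel mpirun ...
if mode is None and len(sys.argv) > 1:
    mode = sys.argv[1]                  # ... or: python main.py parallel

if mode == "parallel":
    from myPackage import parallel as impl
else:
    from myPackage import serial as impl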
I'm not really sure what you're asking here. Python classes are just (callable/instantiable) objects themselves, so you can of course select and use them conditionally. If multiple classes within multiple modules are involved, you can also make the imports conditional.
if user_says_parallel:
    from myPackage.parallel import ParallelObject
    ObjectClass = ParallelObject
else:
    from myPackage.serial import SerialObject
    ObjectClass = SerialObject

my_abstract_object = ObjectClass()
Whether that's very useful depends on your classes and on the effort it takes to make sure they have the same API, so they're compatible when replacing each other. Maybe even inheritance à la ParallelObject => SerialObject is possible, or at least a common (virtual) base class to put all the shared code in. But that's just the same as in C++.
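For example, a common base class holding the shared code could look roughly like this (all the names here are illustrative, not from the question):

import abc

class BaseObject(abc.ABC):
    def run(self):
        data = self.prepare()           # shared logic for both variants
        return self.compute(data)       # variant-specific part

    def prepare(self):
        return [1, 2, 3]

    @abc.abstractmethod
    def compute(self, data):
        raise NotImplementedError

class SerialObject(BaseObject):
    def compute(self, data):
        return [x * 2 for x in data]    # stand-in for the real serial work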
Question
If I have import statements nested in an if/else block, am I increasing efficiency? I know some languages do "one pass" over code for import and syntax issues. I'm just not sure how deep Python goes into this.
My Hypothesis
Because Python is interpreted and not compiled, by nesting the import statements within the else block, those libraries will not be imported until that line is reached, thus saving system resources unless they are actually needed.
Scenario
I have written a script that will be used by both the more computer literate and those who are less so. My department is very comfortable with running scripts from the command line with arguments, so I have set it up to take arguments for what it needs and, if it does not find the arguments it was expecting, it will launch a GUI with headings, buttons, and more verbose instructions. However, this means that I am importing libraries that are only used in the event that the arguments were not provided.
Additional Information
The GUI is very, very basic (A half dozen text fields and possibly fewer buttons) so I am not concerned with just creating and spawning a custom GUI class in which the necessary libraries would be imported. If this gets more complicated, I'll consider it in the future or even push to change to web interface.
My script fully functions as I would expect it to. The question is simply about resource consumption.
import statements are executed as they're encountered in normal execution, so if the conditional prevents that line from being executed, the import doesn't occur, and you'll have avoided unnecessary work.
That said, if the module is going to be imported in some other way (say, unconditionally imported module B depends on A, and you're conditionally importing A), the savings are trivial; after the first import of a module, subsequent imports just get a new reference to the same cached module; the import machinery has to do some complicated stuff to handle import hooks and the like first, but in the common case, it's still fairly cheap (sub-microsecond when importing an already cached module).
The only way this will save you anything is if the module in question would not be imported in any way otherwise, in which case you avoid the work of loading it and the memory used by the loaded module.
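Applied to the scenario above, the pattern looks something like this (tkinter stands in for whatever GUI toolkit is actually used):

import sys

def run_cli(args):
    print("running with", args)

def run_gui():
    import tkinter                      # only imported when the GUI branch runs
    root = tkinter.Tk()
    tkinter.Label(root, text="No arguments given; fill in the fields:").pack()
    root.mainloop()

if len(sys.argv) > 1:
    run_cli(sys.argv[1:])               # the GUI library is never loaded
else:
    run_gui()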
So I have a set of .py documents as follows:
/Spider
Script.py
/Classes
__init__.py
ParseXML.py
CrawlWeb.py
TextAnalytics.py
Each .py document in the /Classes subfolder contains a class for a specific purpose; the script schedules the different components. There are a couple of questions I had:
1) A lot of the classes share frameworks such as urllib2, threading etc. What is considered the 'best' form for setting up the import statements? I.e. is there a way for me to use something like the __init__.py file to pass the shared dependencies to all of the classes, then use the specific .py files to import the singular dependencies?
2) Some of the classes call on the other classes (e.g. the CrawlWeb.py document uses the ParseXML class to update the XML files after crawling). I separated out the classes like this because they were each quite large and so were easier to update... Would it be considered best form to combine classes in this case, or are there other ways to get around this?
The classes will only ever be used as part of the script. So far the only real solution I've been able to come up with is perhaps using the Script.py file for all of the import statements, but it seems a little bit messy. Any advice would be greatly appreciated.
The best way to handle the common imports is to import them in each module where they're used. While this probably feels annoying to you because you have to type more, it makes it dramatically clearer to the reader of the code which modules are in scope. You're not missing something by doing common imports; you're doing it right.
While you certainly can put your classes all into separate files, it's more common in Python to group related classes together in a single module. Given how short it sounds like your script is, that may mean it makes sense for you to pull everything into a single file. This is a judgment call, and I cannot offer a hard-and-fast rule.
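Concretely (guessing at the class names inside each file), CrawlWeb.py would simply import what it needs itself:

# Classes/CrawlWeb.py -- imports its own dependencies, shared or not
import threading                         # shared frameworks are re-imported per module
import urllib2

from Classes.ParseXML import ParseXML    # one class using another

class CrawlWeb(object):
    def crawl(self, url):
        data = urllib2.urlopen(url).read()
        ParseXML().update(data)          # hypothetical method name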
Ruby uses require, Python uses import. They're substantially different models, and while I'm more used to the require model, I can see a few places where I think I like import more. I'm curious what things people find particularly easy — or more interestingly, harder than they should be — with each of these models.
In particular, if you were writing a new programming language, how would you design a code-loading mechanism? Which "pros" and "cons" would weigh most heavily on your design choice?
The Python import has a major feature in that it ties two things together -- how to find the import and under what namespace to include it.
This creates very explicit code:
import xml.sax
This specifies where to find the code we want to use, by the rules of the Python search path.
At the same time, all objects that we want to access live under this exact namespace, for example xml.sax.ContentHandler.
I regard this as an advantage over Ruby's require. require 'xml' might in fact make objects inside the namespace XML, or any other namespace, available in the module, without this being directly evident from the require line.
If xml.sax.ContentHandler is too long, you may specify a different name when importing:
import xml.sax as X
And it is now available under X.ContentHandler.
This way Python requires you to explicitly build the namespace of each module. Python namespaces are thus very "physical", and I'll explain what I mean:
By default, only names directly defined in the module are available in its namespace: functions, classes and so on.
To add to a module's namespace, you explicitly import the names you wish to add, placing them (by reference) "physically" in the current module.
For example, if we have the small Python package "process" with internal submodules machine and interface, and we wish to present this as one convenient namespace directly under the package name, this is an example of what we could write in the "package definition" file process/__init__.py:
from process.interface import *
from process.machine import Machine, HelperMachine
Thus we lift what would normally be accessible as process.machine.Machine up to process.Machine. And we add all names from process.interface to the process namespace, in a very explicit fashion.
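With that __init__.py in place, client code can then simply write (assuming Machine really lives in process/machine.py):

from process import Machine   # same class object as process.machine.Machine

m = Machine()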
The advantages of Python's import that I wrote about were simply two:
Clear what you include when using import
Explicit how you modify your own module's namespace (for the program or for others to import)
A nice property of require is that it is actually a method defined in Kernel. Thus you can override it and implement your own packaging system for Ruby, which is what e.g. RubyGems does!
PS: I am not selling monkey patching here, but the fact that Ruby's package system can be rewritten by the user (even to work like Python's system). When you write a new programming language, you cannot get everything right. Thus if your import mechanism is fully extensible (in every direction) from within the language, you do your future users the best service. A language that is not fully extensible from within itself is an evolutionary dead end. I'd say this is one of the things Matz got right with Ruby.
Python's import provides a very explicit kind of namespace: the namespace is the path. You don't have to look into files to know which namespace they do their definitions in, and your file is not cluttered with namespace definitions. This makes the namespace scheme of an application simple and fast to understand (just look at the source tree), and it avoids simple mistakes like mistyping a namespace declaration.
A nice side effect is every file has its own private namespace, so you don't have to worry about conflicts when naming things.
Sometimes namespaces can get annoying too; having things like some.module.far.far.away.TheClass() everywhere can quickly make your code very long and boring to type. In these cases you can from ... import ... and inject bits of another namespace into the current one. If the injection causes a conflict with the module you are importing into, you can simply rename the thing you imported: from some.other.module import Bar as BarFromOtherModule.
Python is still vulnerable to problems like circular imports, but it's the application design more than the language that has to be blamed in these cases.
So Python took C++'s namespace and #include and largely extended on them. On the other hand, I don't see in which way Ruby's module and require add anything new to these, and you have the exact same horrible problems, like global namespace cluttering.
Disclaimer, I am by no means a Python expert.
The biggest advantage I see to require over import is simply that you don't have to worry about understanding the mapping between namespaces and file paths. It's obvious: it's just a standard file path.
I really like the emphasis on namespacing that import has, but can't help but wonder if this particular approach isn't too inflexible. As far as I can tell, the only means of controlling a module's naming in Python is by altering the filename of the module being imported or using an as rename. Additionally, with explicit namespacing, you have a means by which you can refer to something by its fully-qualified identifier, but with implicit namespacing, you have no means to do this inside the module itself, and that can lead to potential ambiguities that are difficult to resolve without renaming.
i.e., in foo.py:
class Bar:
    def myself(self):
        return foo.Bar
This fails with:
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "foo.py", line 3, in myself
    return foo.Bar
NameError: global name 'foo' is not defined
Both implementations use a list of locations to search from, which strikes me as a critically important component, regardless of the model you choose.
What if a code-loading mechanism like require was used, but the language simply didn't have a global namespace? i.e., everything, everywhere must be namespaced, but the developer has full control over which namespace the class is defined in, and that namespace declaration occurs explicitly in the code rather than via the filename. Alternatively, defining something in the global namespace generates a warning. Is that a best-of-both-worlds approach, or is there an obvious downside to it that I'm missing?