I have a Python package that has an optional [extras] dependency, yet I want to adhere to typing on all methods.
The situation is that in my file, I have this
class MyClass:
    def __init__(self, datastore: Datastore):  # <- Datastore is azureml.core.Datastore
        ...

    def my_func(self):
        from azureml.core import Datastore
        ...
I import from within the function because there are other classes in the same file that should be imported when not using the extras (extras being azureml).
This obviously fails, because Datastore is referenced before it is imported. Removing the Datastore annotation from the __init__ method obviously avoids the problem.
So in general my question is whether it is possible, and if so how, to use type annotations that refer to an optional (extras) package.
Note that importing in the class body (below the class MyClass statement) is not a valid solution, as that code runs when the module is imported.
You can use TYPE_CHECKING:
A special constant that is assumed to be True by 3rd party static type
checkers. It is False at runtime.
Because it is False at runtime, it doesn't affect your module's behavior.
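You can see the runtime value for yourself:

```python
from typing import TYPE_CHECKING

# At runtime the constant is simply False, so anything guarded by
# `if TYPE_CHECKING:` never executes outside of static analysis.
print(TYPE_CHECKING)  # prints False
```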
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from azureml.core import Datastore


class MyClass:
    def __init__(self, datastore: "Datastore"):  # quoted: evaluated only by type checkers
        ...

    def my_func(self):
        from azureml.core import Datastore
        ...
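One detail worth noting: since the import only happens under TYPE_CHECKING, the name is undefined at runtime, so the annotation must either be quoted or left bare under postponed evaluation of annotations (PEP 563). A minimal sketch of the latter, using operator.itemgetter as a stand-in for the optional dependency:

```python
# PEP 563: all annotations are stored as strings and never evaluated at
# runtime, so a bare reference to a TYPE_CHECKING-only name is safe.
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from operator import itemgetter  # stand-in for azureml.core.Datastore


class MyClass:
    def __init__(self, datastore: itemgetter):  # no NameError at import time
        self.datastore = datastore
```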
Since I want to show this in action, I will use operator.itemgetter as an example, because it is recognizable to type checkers while azureml.core is not (the snippet is meant to be type-checked, not executed, since itemgetter is not imported at runtime):
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from operator import itemgetter


class MyClass:
    def __init__(self, datastore: "itemgetter"):
        ...

    def my_func(self):
        from operator import itemgetter
        ...


obj1 = MyClass(itemgetter(1))  # line 16
obj2 = MyClass(10)  # line 17
Here is the Mypy error:
main.py:17: error: Argument 1 to "MyClass" has incompatible type "int"; expected "itemgetter[Any]"
Found 1 error in 1 file (checked 1 source file)
Which shows it works as expected.
Just to add my two cents:
While it is certainly a solution, I consider the use of the TYPE_CHECKING constant a red flag regarding the project structure. It typically (though not always) either shows the presence of circular dependencies or poor separation of concerns.
In your case it seems to be the latter, as you state this:
I import from within the function because there are other classes in the same file that should be imported when not using the extras
If MyClass provides optional functionality to your package, it should absolutely reside in its own module and not alongside other classes that provide core functionality.
When you put MyClass into its own module (say my_class), you can place its dependencies at the top with all the other imports. Then you put the import from my_class inside a function that handles the logic of loading internal optional dependencies.
Aside from visibility and arguably better style, one advantage of such a setup over the one you presented is that the my_class module will be self-consistent and fail at import time if the extra azureml dependency is missing (or broken/renamed/deprecated), rather than only at runtime when MyClass.my_func is called.
You'd be surprised how easy it is to accidentally forget to install all extra dependencies (even in a production environment). Then you'll thank the stars, when the code fails immediately and transparently, rather than causing errors at some point later at runtime.
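To make this concrete, the loading side of such a setup might look like the sketch below. The names mypkg, my_class and load_my_class are all hypothetical; mypkg/my_class.py is assumed to be the module that does `from azureml.core import Datastore` at the top, so it fails fast when the extra is missing:

```python
# Hypothetical mypkg/__init__.py: core classes are imported normally,
# while the optional one is loaded on demand with a clear error message.
def load_my_class():
    try:
        from mypkg.my_class import MyClass  # fails immediately if the extra is missing
    except ImportError as exc:
        raise ImportError(
            "MyClass requires the 'azureml' extra: pip install 'mypkg[azureml]'"
        ) from exc
    return MyClass
```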
I have the following (toy) package structure
root/
- package1/
- __init__.py
- class_a.py
- class_b.py
- run.py
In both class_a.py and class_b.py I have a class definition that I want to expose to run.py. If I want to import them this way, I will have to use
from package1.class_a import ClassA # works but doesn't look nice
I don't like that this shows the class_a.py module, and would rather use the import style
from package1 import ClassA # what I want
This is also closer to what I see from larger libraries. I found a way to do this by importing the classes in the __init__.py file like so
from .class_a import ClassA
from .class_b import ClassB
This would work fine if it weren't for one downside: as soon as I import ClassA the way I would like (see above), I also immediately import ClassB because, as far as I know, the __init__.py will be run, importing ClassB. In my real scenario, this means I implicitly import a huge class that I use only situationally (and which itself imports tensorflow), so I really want to avoid this. Is there a way to create the nice-looking imports without automatically importing everything in the package?
It is possible, but it requires a rather low-level customization: you have to customize the class of your package's module object (possible since Python 3.5). That way, you can declare a __getattr__ method that is called when a missing attribute is requested. At that point, you know you have to import the relevant module and extract the correct attribute.
The __init__.py file should contain (names can of course be changed):
import importlib
import sys
import types

class SpecialModule(types.ModuleType):
    """Customization of a module that is able to dynamically load submodules.

    It is expected to be a plain package (and to be declared in the __init__.py).
    The `special` attribute is a dictionary mapping attribute names to relative
    module names. The first time a name is requested, the corresponding module
    is loaded, and the attribute is bound into the package.
    """
    special = {'ClassA': '.class_a', 'ClassB': '.class_b'}

    def __getattr__(self, name):
        if name in self.special:
            m = importlib.import_module(self.special[name], __name__)  # import the submodule
            o = getattr(m, name)                      # find the required member
            setattr(sys.modules[__name__], name, o)   # bind it into the package
            return o
        else:
            raise AttributeError(f'module {__name__} has no attribute {name}')

sys.modules[__name__].__class__ = SpecialModule  # customize the class of the package
You can now use it that way:
import package1
...
obj = package1.ClassA(...) # dynamically loads class_a on first call
The downside is that clever IDEs that look at the declared members may choke on this and claim you are accessing a nonexistent member, because ClassA is not statically declared in package1/__init__.py. But all will be fine at run time.
As it is a low-level customization, it is up to you to decide whether it is worth it...
Since Python 3.7 you can also declare a __getattr__(name) function directly at the module level (PEP 562).
I have an abstract base class with a number of derived classes. I'm trying to achieve the same behaviour that I would get by placing all the derived classes in the same file as the base class, i.e. if my classes are Base, DerivedA, DerivedB, DerivedC in the file myclass.py I can write in another file
import myclass
a = myclass.DerivedA()
b = myclass.DerivedB()
c = myclass.DerivedC()
but with each derived class in its own file. This has to be dynamic, i.e. such that I could e.g. delete derived_c.py and everything still works except that now I can no longer call myclass.DerivedC, or that if I add a derived_d.py, I could use it without touching the __init__.py so simply using from derived_c import DerivedC is not an option.
I've tried placing them all in a subdirectory and in that directory's __init__.py use pkgutil.walk_packages() to import all the files dynamically, but I can't get them to then be directly in the module's namespace, i.e. rather than myclass.DerivedC() I have to call myclass.derived_c.DerivedC() because I can't figure out how (or if it's possible) to use importlib to achieve the equivalent of a from xyz import * statement.
Any suggestions for how I could achieve this? Thanks!
Edit: The solutions for Dynamic module import in Python don't provide a method for automatically importing the classes in all modules into the namespace of the package.
I had to make something quite similar a while back, but in my case I had to dynamically create a list with all subclasses from a base class in a specific package, so in case you find it useful:
Create a my_classes package containing all files for your Base class and all subclasses. You should include only one class in each file.
Set __all__ appropriately in __init__.py to import all .py files except for __init__.py (from this answer):
from os import listdir
from os.path import dirname, basename

__all__ = [
    basename(f)[:-3]
    for f in listdir(dirname(__file__))
    if f.endswith(".py") and f != "__init__.py"
]
Import your classes using from my_classes import *, since our custom __all__ adds all classes inside the my_classes package to the namespace.
However, this does not allow us direct access to the subclasses yet. You have to access them like this in your main script:
from my_classes import *
from my_classes.base import Base
subclasses = Base.__subclasses__()
Now subclasses is a list containing all classes that derive from Base.
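As a small illustration of what that list is good for (with inline stand-in classes rather than the my_classes package), it can be turned into a name-to-class registry for dynamic instantiation:

```python
class Base:
    pass

class DerivedA(Base):
    pass

class DerivedB(Base):
    pass

# Map subclass names to the classes themselves, so callers can
# instantiate by name without importing each module explicitly.
registry = {cls.__name__: cls for cls in Base.__subclasses__()}
obj = registry['DerivedA']()  # an instance of DerivedA
```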
Since Python 3.6 there exists a hook for initializing subclasses: __init_subclass__. It runs when the subclass is defined, so before the rest of your code executes. In it you can simply import the module of the subclass being defined.
base.py
class Base:
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        __import__(cls.__module__)
sub1.py
from base import Base

class Sub1(Base):
    pass
sub2.py
from base import Base

class Sub2(Base):
    pass
Please consider the following Python modules excerpts:
foo.py:
class Foo:
    (...)
bar.py:
import foo
foo = foo.Foo()
The variable foo, which was a module object, is overwritten with a Foo object.
I know that I can use other names for the object, e.g.:
foobar = foo.Foo()
but semantically it makes more sense in my code to have it called foo, since it will be the only instance.
(I tried to workaround this by dropping classes and using modules only, but I went back to using classes because using modules only had "robustness" problems.)
This is kind of a philosophical question, but what is the "right" way of handling this potential object/module names clash?
In my opinion there is nothing wrong with what you are currently doing, but to make it more clear for everyone reading the code I would suggest changing your code to something like the following:
import foo as foo_mod
foo = foo_mod.Foo()
Or alternatively:
from foo import Foo
foo = Foo()
This prevents the name clash so it will be more obvious that the variable foo in your module is not going to refer to the module of the same name.
I've also been favoring the following style nowadays:
import foo
my_foo = foo.Foo()
I prefer this because it keeps module names untouched, and those are more global and sacred than local variables.
This pattern doesn't seem to bother peeps who use Flask + Celery:
from celery import Celery

def make_celery(app):
    celery = Celery(
        app.import_name,
        backend=app.config['CELERY_RESULT_BACKEND'],
        broker=app.config['CELERY_BROKER_URL']
    )
    ...
Obviously, the correct way to create an instance of this class is stalk = Celery() (hehe)
What are the best practices for extending an existing Python module – in this case, I want to extend the python-twitter package by adding new methods to the base API class.
I've looked at tweepy, and I like that as well; I just find python-twitter easier to understand and extend with the functionality I want.
I have the methods written already – I'm trying to figure out the most Pythonic and least disruptive way to add them to the python-twitter package, without changing the module's core.
A few ways.
The easy way:
Don't extend the module, extend the classes.
exttwitter.py
import twitter

class Api(twitter.Api):
    pass  # override/add any functions here.
Downside: every class in twitter must be in exttwitter.py, even if it's just a stub (as above).
A harder (possibly un-pythonic) way:
Import * from python-twitter into a module that you then extend.
For instance :
basemodule.py
class Ball:
    def __init__(self, a):
        self.a = a

    def __repr__(self):
        return "Ball(%s)" % self.a

def makeBall(a):
    return Ball(a)

def override():
    print("OVERRIDE ONE")

def dontoverride():
    print("THIS WILL BE PRESERVED")
extmodule.py
from basemodule import *
import basemodule

def makeBalls(a, b):
    foo = makeBall(a)
    bar = makeBall(b)
    print(foo, bar)

def override():
    print("OVERRIDE TWO")

def dontoverride():
    basemodule.dontoverride()
    print("THIS WAS PRESERVED")
runscript.py
import extmodule

# code is in extended module
extmodule.makeBalls(1, 2)
# prints Ball(1) Ball(2)

# code is in base module
print(extmodule.makeBall(1))
# prints Ball(1)

# function from extended module overwrites base module
extmodule.override()
# prints OVERRIDE TWO

# function from extended module calls base module first
extmodule.dontoverride()
# prints THIS WILL BE PRESERVED\nTHIS WAS PRESERVED
I'm not sure if the double import in extmodule.py is pythonic - you could remove it, but then you don't handle the use case of wanting to extend a function that was in the namespace of basemodule.
As far as extended classes, just create a new API(basemodule.API) class to extend the Twitter API module.
Don't add them to the module. Subclass the classes you want to extend and use your subclasses in your own module, not changing the original stuff at all.
Here's how you can directly manipulate the module registry (sys.modules) at runtime – spoiler alert: you get the module type from the types module:
from __future__ import print_function

import sys
import types
import typing as tx

def modulize(namespace: tx.Dict[str, tx.Any],
             modulename: str,
             moduledocs: tx.Optional[str] = None) -> types.ModuleType:
    """ Convert a dictionary mapping into a legit Python module """

    # Create a new module with a trivially namespaced name:
    namespacedname: str = f'__dynamic_modules__.{modulename}'
    module = types.ModuleType(namespacedname, moduledocs)
    module.__dict__.update(namespace)

    # Inspect the new module:
    name: str = module.__name__
    doc: tx.Optional[str] = module.__doc__
    contents: str = ", ".join(sorted(module.__dict__.keys()))
    print(f"Module name:      {name}")
    print(f"Module contents:  {contents}")
    if doc:
        print(f"Module docstring: {doc}")

    # Add to sys.modules, as per import machinery:
    sys.modules.update({ modulename : module })

    # Return the new module instance:
    return module
… you could then use such a function like so:
ns = {
    'func'      : lambda: print("Yo Dogg"),  # these can also be normal non-lambda funcs
    'otherfunc' : lambda string=None: print(string or 'no dogg.'),
    '__all__'   : ('func', 'otherfunc'),
    '__dir__'   : lambda: ['func', 'otherfunc']  # usually this'd reference __all__
}

modulize(ns, 'wat', "WHAT THE HELL PEOPLE")
import wat

# Call module functions:
wat.func()
wat.otherfunc("Oh, Dogg!")

# Inspect module:
contents = ", ".join(sorted(wat.__dict__.keys()))
print(f"Imported module name:      {wat.__name__}")
print(f"Imported module contents:  {contents}")
print(f"Imported module docstring: {wat.__doc__}")
… You could also create your own module subclass, by specifying types.ModuleType as the ancestor of your newly declared class, of course; I have never personally found this necessary to do.
(Also, you don’t have to get the module type from the types module – you can always just do something like ModuleType = type(os) after importing os – I specifically pointed out this one source of the type because it is non-obvious; unlike many of its other builtin types, Python doesn’t offer up access to the module type in the global namespace.)
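That equivalence is easy to check:

```python
import os
import types

# The class of any imported module IS the module type; it just isn't
# exposed as a builtin name the way `type`, `int`, etc. are.
ModuleType = type(os)
print(ModuleType is types.ModuleType)  # prints True
```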
The real action is in the sys.modules dict, where (if you are appropriately intrepid) you can replace existing modules as well as adding your new ones.
Say you have an older module called mod that you use like this:
import mod
obj = mod.Object()
obj.method()
mod.function()
# and so on...
And you want to extend it without replacing it for your users. Easily done. You can give your new module a different name, newmod.py, or place it under the same name at a deeper path, e.g. /path/to/mod.py. Then your users can import it in either of these ways:
import newmod as mod # e.g. import unittest2 as unittest idiom from Python 2.6
or
from path.to import mod # useful in a large code-base
In your module, you'll want to make all the old names available:
from mod import *
or explicitly name every name you import:
from mod import Object, function, name2, name3, name4, name5, name6, name7, name8, name9, name10, name11, name12, name13, name14, name15, name16, name17, name18, name19, name20, name21, name22, name23, name24, name25, name26, name27, name28, name29, name30, name31, name32, name33, name34, name35, name36, name37, name38, name39
I think the import * will be more maintainable for this use case - if the base module expands functionality, you'll seamlessly keep up (though you might shadow new objects with the same name).
If the mod you are extending has a decent __all__, it will restrict the names imported.
You should also declare an __all__ and extend it with the extended module's __all__.
import mod

__all__ = ['NewObject', 'newfunction']
__all__ += mod.__all__
# if it doesn't have an __all__, maybe it's not good enough to extend,
# but it could be relying on the convention of import * not importing
# names prefixed with underscores (_like _this)
Then extend the objects and functionality as you normally would.
class NewObject(object):
    def newmethod(self):
        """this method extends Object"""

def newfunction():
    """this function builds on mod's functionality"""
If the new objects provide functionality you intend to replace (or perhaps you are backporting the new functionality into an older code base), you can overwrite the names.
May I suggest not reinventing the wheel here? I've been building a >6k-line Twitter client for two months now; at first I checked python-twitter too, but it's lagging a lot behind the recent API changes, and development doesn't seem to be very active either. There was also (at least when I last checked) no support for OAuth/xAuth.
So after searching around a bit more I discovered tweepy:
http://github.com/joshthecoder/tweepy
Pros: active development, OAuth/xAuth, and up to date with the API.
Chances are high that what you need is already in there.
So I suggest going with that; it's working for me. The only thing I had to add was xAuth (which got merged back into tweepy :)
Oh, and a shameless plug: if you need to parse Tweets and/or format them to HTML, use my Python version of the twitter-text-* libraries:
http://github.com/BonsaiDen/twitter-text-python
This thing is unit-tested and guaranteed to parse Tweets just like Twitter.com does.
Define a new class, and instead of inherit it from the class you want to extend from the original module, add an instance of the original class as an attribute to your new class.
And here comes the trick: intercept all calls to non-existing methods on your new class and try to call them on the instance of the old class.
In your NewClass just define new or overridden methods as you like:
import originalmodule

class NewClass:
    def __init__(self, *args, **kwargs):
        self.old_class_instance = originalmodule.create_oldclass_instance(*args, **kwargs)

    def __getattr__(self, methodname):
        """This is a wrapper for the original OldClass class.
        If the called method is not part of this NewClass class,
        the call will be intercepted and replaced by the method
        in the original OldClass instance.
        """
        def wrapper(*args, **kwargs):
            return getattr(self.old_class_instance, methodname)(*args, **kwargs)
        return wrapper

    def new_method(self, arg1):
        """Does stuff with the OldClass instance"""
        thing = self.old_class_instance.get_somelist(arg1)
        # returns the first element only
        return thing[0]

    def overridden_method(self):
        """Overrides an existing method, if OldClass has a method with the same name"""
        print("This message is coming from the NewClass and not from the OldClass")
In my case I used this solution when simple inheritance from the old class was not possible, because an instance had to be created not by its constructor, but with an init script from another class/module. (That is the originalmodule.create_oldclass_instance in the example above.)
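For reference, here is a self-contained demonstration of the same delegation trick, with a stand-in OldClass in place of the instance the external init script would produce. (This sketch returns the bound method directly from __getattr__ instead of building a wrapper, which behaves the same for plain method calls.)

```python
class OldClass:
    """Stand-in for the class produced by the external init script."""
    def get_somelist(self, arg):
        return [arg, arg * 2]

    def greet(self):
        return "hello from OldClass"

class NewClass:
    def __init__(self):
        self.old_class_instance = OldClass()

    def __getattr__(self, name):
        # Only called for attributes NOT found on NewClass itself,
        # so new and overridden methods are never intercepted.
        return getattr(self.old_class_instance, name)

    def new_method(self, arg1):
        # returns the first element only
        return self.old_class_instance.get_somelist(arg1)[0]

print(NewClass().greet())        # delegated -> prints hello from OldClass
print(NewClass().new_method(3))  # defined on NewClass -> prints 3
```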