I am using a light ETL library called bonobo.
The csv writer bonobo.CsvWriter class has a factory method:
def writer_factory(self, file):
return csv.writer(file, **self.get_dialect_kwargs()).writerow
with the docs:
class CsvWriter(FileWriter, CsvHandler):
#Method(
__doc__='''
Builds the CSV writer, a.k.a an object we can pass a field collection to be written as one line in the
target file.
Defaults to builtin csv.writer(...).writerow, but can be overriden to fit your special needs.
'''
)
I'd like to add some extra parameters to customize my csv file, so I try to override it as such:
class quoCsvWriter(bonobo.CsvWriter):
def writer_factory(self, file):
return csv.writer(file, **self.get_dialect_kwargs(),quoting=csv.QUOTE_NONNUMERIC).writerow
when I add the node into the chain, the programs shows:
Traceback (most recent call last):
File "geocoding.py", line 162, in <module>
get_graph(),
File "geocoding.py", line 135, in get_graph
quoCsvWriter('db_addresses.csv')
File "/Users/xxxx/xxxx/lib/python3.6/site-packages/bonobo/config/configurables.py", line 152, in __new__
missing.remove(name)
KeyError: 'writer_factory'
Any hints are appreciated.
Update:
meanwhile when I try to do
bonobo.CsvWriter('filename.csv',quoting=csv.QUOTE_MINIMAL)
it throws error:
TypeError "quoting" must be an integer
As of bonobo 0.6, overriding directly Method instances in subclasses is non trivial. Instead, you should provide an overriden implementation in the constructor arguments.
def writer_factory(self, file):
return csv.writer(file, **{**self.get_dialect_kwargs(), 'quoting': csv.QUOTE_NONNUMERIC}).writerow
def get_graph(**options):
graph = bonobo.Graph()
graph.add_chain(
extract,
bonobo.CsvWriter('...', writer_factory=writer_factory),
)
return graph
If you really want to subclass for this use case, you can do it by overriding get_dialect_kwargs() method instead:
#use_context
class QuoteNonNumericCsvWriter(bonobo.CsvWriter):
def get_dialect_kwargs(self):
return {
**super().get_dialect_kwargs(),
'quoting': csv.QUOTE_NONNUMERIC,
}
This should work as expected.
Of course, overriding quoting is possible directly from writer constructor as of bonobo 0.6.2, there was a bug before around this but fix is now released.
def get_graph(**options):
graph = bonobo.Graph()
graph.add_chain(
extract,
bonobo.CsvWriter('...', quoting=csv.QUOTE_NONNUMERIC),
)
All three methods have the exact same behaviour, you should favour the last one.
Hope that helps.
Related
I want to be able to define what the contents of a subclass of a subclass of typing.Iterable have to be.
Type hints are critical for this project so I have to find a working solution
Here is a snip code of what I've already tried and what I want:
# The code I'm writing:
from typing import TypeVar, Iterable
T = TypeVar('T')
class Data:
pass
class GeneralPaginatedCursor(Iterable[T]):
"""
If this type pf cursor is used by an EDR a specific implementation has to be created
Handle paginated lists, exposes hooks to simplify retrieval and parsing of paginated data
"""
# Implementation
pass
###########
# This part is supposed to be written by different developers on a different place:
class PaginatedCursor(GeneralPaginatedCursor):
pass
def foo() -> GeneralPaginatedCursor[Data]:
"""
Works great
"""
pass
def bar() -> PaginatedCursor[Data]:
"""
Raises
Traceback (most recent call last):
.
.
.
def bar(self) -> PaginatedCursor[Data]:
File "****\Python\Python38-32\lib\typing.py", line 261, in inner
return func(*args, **kwds)
File "****\Python\Python38-32\lib\typing.py", line 894, in __class_getitem__
_check_generic(cls, params)
File "****\Python\Python38-32\lib\typing.py", line 211, in _check_generic
raise TypeError(f"{cls} is not a generic class")
"""
pass
I don't want to leave it to the other developers in the future to inherit from Iterable because if someone will miss it everything will break.
I found the exact same issue here:
https://github.com/python/cpython/issues/82640
But there is no answer.
The only requirements are that GeneralPaginatedCursor define __iter__ to return an Iterable value (namely, something with a __next__ method).
The error you see occurs because, since GeneralPaginatedCursor is generic in T, PaginatedCursor should be as well.
class PaginatedCursor(GeneralPaginatedCursor[T]):
pass
I try to create interface with #staticmethod and #classmethod. Declaring class method is simple. But I can't find the correct way to declare static method.
Consider class interface and its implementation:
#!/usr/bin/python3
from zope.interface import Interface, implementer, verify
class ISerializable(Interface):
def from_dump(slice_id, intex_list, input_stream):
'''Loads from dump.'''
def dump(out_stream):
'''Writes dump.'''
def load_index_list(input_stream):
'''staticmethod'''
#implementer(ISerializable)
class MyObject(object):
def dump(self, out_stream):
pass
#classmethod
def from_dump(cls, slice_id, intex_list, input_stream):
return cls()
#staticmethod
def load_index_list(stream):
pass
verify.verifyClass(ISerializable, MyObject)
verify.verifyObject(ISerializable, MyObject())
verify.verifyObject(ISerializable, MyObject.from_dump(0, [], 'stream'))
Output:
Traceback (most recent call last):
File "./test-interface.py", line 31, in <module>
verify.verifyClass(ISerializable, MyObject)
File "/usr/local/lib/python3.4/dist-packages/zope/interface/verify.py", line 102, in verifyClass
return _verify(iface, candidate, tentative, vtype='c')
File "/usr/local/lib/python3.4/dist-packages/zope/interface/verify.py", line 97, in _verify
raise BrokenMethodImplementation(name, mess)
zope.interface.exceptions.BrokenMethodImplementation: The implementation of load_index_list violates its contract
because implementation doesn't allow enough arguments.
How should I correctly declare static method in this interface?
Obviously the verifyClass does not understand either classmethod or staticmethod properly. The problem is that in Python 3, if you do getattr(MyObject, 'load_index_list') in Python 3, you get a bare function, and verifyClass thinks it is yet another unbound method, and then expects that the implicit self be the first argument.
The easiest fix is to use a classmethod there instead of a staticmethod.
I guess someone could also do a bug report.
I have stumbled into a pretty interesting bug in klen mixer library for Python.
https://github.com/klen/mixer
This bug occurs whenever you try to setup a model with a column using sqlalchemy.dialect.postgresql.INET. Trying to blend a model with this in will bring the following trace...
mixer: ERROR: Traceback (most recent call last):
File "/home/cllamach/PythonProjects/mixer/mixer/main.py", line 612, in blend
return type_mixer.blend(**values)
File "/home/cllamach/PythonProjects/mixer/mixer/main.py", line 130, in blend
for name, value in defaults.items()
File "/home/cllamach/PythonProjects/mixer/mixer/main.py", line 130, in <genexpr>
for name, value in defaults.items()
File "/home/cllamach/PythonProjects/mixer/mixer/mix_types.py", line 220, in gen_value
return type_mixer.gen_field(field)
File "/home/cllamach/PythonProjects/mixer/mixer/main.py", line 209, in gen_field
return self.gen_value(field.name, field, unique=unique)
File "/home/cllamach/PythonProjects/mixer/mixer/main.py", line 254, in gen_value
gen = self.get_generator(field, field_name, fake=fake)
File "/home/cllamach/PythonProjects/mixer/mixer/main.py", line 304, in get_generator
field.scheme, field_name, fake, kwargs=field.params)
File "/home/cllamach/PythonProjects/mixer/mixer/backend/sqlalchemy.py", line 178, in make_generator
stype, field_name=field_name, fake=fake, args=args, kwargs=kwargs)
File "/home/cllamach/PythonProjects/mixer/mixer/main.py", line 324, in make_generator
fabric = self.__factory.gen_maker(scheme, field_name, fake)
File "/home/cllamach/PythonProjects/mixer/mixer/factory.py", line 157, in gen_maker
if not func and fcls.__bases__:
AttributeError: Mixer (<class 'tests.test_flask.IpAddressUser'>): 'NoneType' object has no attribute '__bases__'
I debugged this error all the way down to a couple of methods in the code, the first method get_generator tries the following...
if key not in self.__generators:
self.__generators[key] = self.make_generator(
field.scheme, field_name, fake, kwargs=field.params)
And heres comes the weird part. Here in this statement field.scheme has a value, specifically a Column object from sqlalchemy, but when is passed down to the make_generetor method is passed as a None. So far i have seen no other piece of code in between these two methods, have debugged with ipdb and others. Have tried calling the method manually with ipdb and still the scheme is passed None.
I know this can be deemed as too particular an issue but i would like to know if someone has encountered this kind of issues before, as this is a first for me.
Mixer is choking on an unknown column type. It stores all the ones it knows in GenFactory.types as a dict and calls types.get(column_type), which of course will return None for an unrecognized type. I ran into this because I defined a couple custom SQLAlchemy types with sqlalchemy.types.TypeDecorator.
To solve this problem, You'll have to monkey-patch your types into Mixer's type system. Here's how I did it:
def _setup_mixer_with_custom_types():
from mixer._faker import faker
from mixer.backend.sqlalchemy import (
GenFactory,
mixer,
)
from myproject.customcolumntypes import (
IntegerTimestamp,
UTCDateTimeTimestamp,
)
def arrow_generator():
return arrow.get(faker.date_time())
GenFactory.generators[IntegerTimestamp] = arrow_generator
GenFactory.generators[UTCDateTimeTimestamp] = arrow_generator
return mixer
mixer = _setup_mixer_with_custom_types()
Note that you don't actually have to touch GenFactory.types because it's just an intermediary step that Mixer skips if it can find your type directly on GenFactory.generators.
In my case, I also had to define a custom generator (to accommodate Arrow), but you may not need to. Mixer uses the fake-factory library to generate fake data, and you can see what they're using by looking at the GenFactory.generators dict.
You have to get the column type into GenFactory.generators, which by default only contains some standard types. Instead of monkey-patching, you might subclass GenFactory and then specify your own class upon Mixer generation.
In this case, we'll customize the already subclassed GenFactory and Mixer variants from backend.sqlalchemy:
from mixer.backend.sqlalchemy import Mixer, GenFactory
from customtypes import CustomType # The column type
def get_mixer():
class CustomFactory(GenFactory):
# No need to preserve entries, the parent class attribute is
# automatically extended through GenFactory's metaclass
generators = {
CustomType: lambda: 42 # Or any other function
}
return Mixer(factory=CustomFactory)
You can use whatever function you like as generator, it just has to return the desired value. Sometimes, directly using something from faker might be enough.
In the same way, you can also customize the other attributes of GenFactory, i.e. fakers and types.
I know that in order to be picklable, a class has to overwrite __reduce__ method, and it has to return string or tuple.
How does this function work?
What the exact usage of __reduce__? When will it been used?
When you try to pickle an object, there might be some properties that don't serialize well. One example of this is an open file handle. Pickle won't know how to handle the object and will throw an error.
You can tell the pickle module how to handle these types of objects natively within a class directly. Lets see an example of an object which has a single property; an open file handle:
import pickle
class Test(object):
def __init__(self, file_path="test1234567890.txt"):
# An open file in write mode
self.some_file_i_have_opened = open(file_path, 'wb')
my_test = Test()
# Now, watch what happens when we try to pickle this object:
pickle.dumps(my_test)
It should fail and give a traceback:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
--- snip snip a lot of lines ---
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/copy_reg.py", line 70, in _reduce_ex
raise TypeError, "can't pickle %s objects" % base.__name__
TypeError: can't pickle file objects
However, had we defined a __reduce__ method in our Test class, pickle would have known how to serialize this object:
import pickle
class Test(object):
def __init__(self, file_path="test1234567890.txt"):
# Used later in __reduce__
self._file_name_we_opened = file_path
# An open file in write mode
self.some_file_i_have_opened = open(self._file_name_we_opened, 'wb')
def __reduce__(self):
# we return a tuple of class_name to call,
# and optional parameters to pass when re-creating
return (self.__class__, (self._file_name_we_opened, ))
my_test = Test()
saved_object = pickle.dumps(my_test)
# Just print the representation of the string of the object,
# because it contains newlines.
print(repr(saved_object))
This should give you something like: "c__main__\nTest\np0\n(S'test1234567890.txt'\np1\ntp2\nRp3\n.", which can be used to recreate the object with open file handles:
print(vars(pickle.loads(saved_object)))
In general, the __reduce__ method needs to return a tuple with at least two elements:
A blank object class to call. In this case, self.__class__
A tuple of arguments to pass to the class constructor. In the example it's a single string, which is the path to the file to open.
Consult the docs for a detailed explanation of what else the __reduce__ method can return.
I want to wrap the default open method with a wrapper that should also catch exceptions. Here's a test example that works:
truemethod = open
def fn(*args, **kwargs):
try:
return truemethod(*args, **kwargs)
except (IOError, OSError):
sys.exit('Can\'t open \'{0}\'. Error #{1[0]}: {1[1]}'.format(args[0], sys.exc_info()[1].args))
open = fn
I want to make a generic method of it:
def wrap(method, exceptions = (OSError, IOError)):
truemethod = method
def fn(*args, **kwargs):
try:
return truemethod(*args, **kwargs)
except exceptions:
sys.exit('Can\'t open \'{0}\'. Error #{1[0]}: {1[1]}'.format(args[0], sys.exc_info()[1].args))
method = fn
But it doesn't work:
>>> wrap(open)
>>> open
<built-in function open>
Apparently, method is a copy of the parameter, not a reference as I expected. Any pythonic workaround?
The problem with your code is that inside wrap, your method = fn statement is simply changing the local value of method, it isn't changing the larger value of open. You'll have to assign to those names yourself:
def wrap(method, exceptions = (OSError, IOError)):
def fn(*args, **kwargs):
try:
return method(*args, **kwargs)
except exceptions:
sys.exit('Can\'t open \'{0}\'. Error #{1[0]}: {1[1]}'.format(args[0], sys.exc_info()[1].args))
return fn
open = wrap(open)
foo = wrap(foo)
Try adding global open. In the general case, you might want to look at this section of the manual:
This module provides direct access to all ‘built-in’ identifiers of Python; for example, __builtin__.open is the full name for the built-in function open(). See chapter Built-in Objects.
This module is not normally accessed explicitly by most applications, but can be useful in modules that provide objects with the same name as a built-in value, but in which the built-in of that name is also needed. For example, in a module that wants to implement an open() function that wraps the built-in open(), this module can be used directly:
import __builtin__
def open(path):
f = __builtin__.open(path, 'r')
return UpperCaser(f)
class UpperCaser:
'''Wrapper around a file that converts output to upper-case.'''
def __init__(self, f):
self._f = f
def read(self, count=-1):
return self._f.read(count).upper()
# ...
CPython implementation detail: Most modules have the name __builtins__ (note the 's') made available as part of their globals. The value of __builtins__ is normally either this module or the value of this modules’s __dict__ attribute. Since this is an implementation detail, it may not be used by alternate implementations of Python.
you can just add return fn at the end of your wrap function and then do:
>>> open = wrap(open)
>>> open('bhla')
Traceback (most recent call last):
File "<pyshell#24>", line 1, in <module>
open('bhla')
File "<pyshell#18>", line 7, in fn
sys.exit('Can\'t open \'{0}\'. Error #{1[0]}: {1[1]}'.format(args[0], sys.exc_info()[1].args))
SystemExit: Can't open 'bhla'. Error #2: No such file or directory