Storing a PostgreSQL ARRAY of ENUM values - python

I have a table that can have a status:

statuses = ['unmoderated', 'nominee', 'finalist', 'winner']
status = db.Enum(
    *statuses, name='enum_nomination_status', metadata=db.metadata)

class Nomination(db.Model):
    status = db.Column(status, default='unmoderated')
I would now like to have a table that has a column that can contain multiple statuses:
class Judge(db.Model):
    statuses = db.Column(ARRAY(status, dimensions=1))
However the above approach leads me to this error:
ProgrammingError: (psycopg2.ProgrammingError) column "statuses" is of type enum_nomination_status[] but expression is of type text[]
LINE 1: ...4, 'Name', ARRAY['unm...
                           ^
HINT:  You will need to rewrite or cast the expression.
So I tried to create a custom type that did the cast to the enum type:
class STATUS_ARRAY(TypeDecorator):
    impl = ARRAY(status, dimensions=1)

    def process_bind_param(self, value, dialect):
        if value is None:
            return value
        else:
            return cast(array(value), ARRAY(status, dimensions=1))
But this causes a segfault.
I've also tried casting the individual items:
class STATUS_ARRAY(TypeDecorator):
    impl = ARRAY(status, dimensions=1)

    def process_bind_param(self, value, dialect):
        if value is None:
            return value
        else:
            return array(cast(s, status) for s in value)
But I get:
ProgrammingError: (psycopg2.ProgrammingError) can't adapt type 'Cast' [SQL: 'INSERT INTO judge (statuses) VALUES (%(statuses)s)'] [parameters: {'statuses': [<sqlalchemy.sql.elements.Cast object at 0x7fc8bb69c710>]}]
I admit that I'm mostly trying different combinations of casting things without really knowing what's going on under the hood. I looked at the underlying ENUM implementation to see if I could get at some kind of native enum type without casting, but I couldn't see anything. I'm grasping at straws.
Thanks for your help :)

As of 1.3.17, no workaround is needed anymore
The answer below ended up in the docs as ARRAY of ENUM. This docs page now says:
Changed in version 1.3.17: The combination of ENUM and ARRAY is now directly handled by SQLAlchemy’s implementation without any workarounds needed.
Old answer for historical purposes:
I looked at Issue 3467 posted by Wichert Akkerman, where this workaround was posted. Credit to Mike Bayer. Declare the following class in your code (with the necessary imports, of course):
import re

from sqlalchemy import cast
from sqlalchemy.dialects.postgresql import ARRAY

class ArrayOfEnum(ARRAY):
    def bind_expression(self, bindvalue):
        return cast(bindvalue, self)

    def result_processor(self, dialect, coltype):
        super_rp = super(ArrayOfEnum, self).result_processor(dialect, coltype)

        def handle_raw_string(value):
            if value is None:
                return []
            inner = re.match(r"^{(.*)}$", value).group(1)
            return inner.split(",")

        def process(value):
            return super_rp(handle_raw_string(value))
        return process
ArrayOfEnum is now a special column type that gets used in the model definition.
So instead of

class Judge(db.Model):
    statuses = db.Column(ARRAY(status))

you can now do:

class Judge(db.Model):
    statuses = db.Column(ArrayOfEnum(status))
Now in your code you can assign a list to statuses and it will do the proper casting upon saving:

my_judge_object.statuses = ['unmoderated', 'nominee']
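The handle_raw_string helper in the workaround is just parsing PostgreSQL's textual array literal. Extracted as a standalone sketch (stdlib only; the empty-array guard is my addition, the original would return ['']):

```python
import re

def handle_raw_string(value):
    # psycopg2 returns an ENUM[] column in its literal text form,
    # e.g. "{unmoderated,nominee}"; turn that into a Python list
    if value is None:
        return []
    inner = re.match(r"^{(.*)}$", value).group(1)
    return inner.split(",") if inner else []

print(handle_raw_string("{unmoderated,nominee}"))  # ['unmoderated', 'nominee']
```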

Translation of message dict into msg enum

I'm dealing with refactoring code which extensively uses dicts in a circumstance where enums could be used. Unfortunately, to reduce typing the dict keys were abbreviated in a cryptic fashion.
In order to have more meaningful code and fewer string literals as well as a more advanced interface I translated the message dictionary based code into an Enum based code using the same messages.
The message dictionaries looked like the following:
MsgDictionary = {'none': None,
                 'STJ': 'start_job',
                 'RPS': 'report_status',
                 'KLJ': 'kill_job'}

ExecStates = {'none': None,
              'JCNS': 'job_could_not_start',
              'JSS': 'job_successfully_started',
              'JSF': 'job_successfully_finished'}
This, unfortunately, led to cluttered code:
...
self.send_message(id=MsgDictionary['STJ'], some_data)
...
msg = self.receive_msg()
if msg.id in (MsgDictionary['STJ'], MsgDictionary['KLJ']):
    self.toggle_job()
...
I would merely like to get rid of the string accesses, the cryptic names and the low-level interface, as in the following. send_message should send the str-typed value of the Enum, not the Enum instance itself.
...
self.send_message(id=MessagesEnum.START_JOB, some_data)
...
msg = self.receive_msg()
if msg.id in (MessagesEnum.START_JOB, MessagesEnum.KILL_JOB):
    self.toggle_job()
...
But as in the original case, undefined execution states should still be allowed, so as not to break existing code. This currently does not work:

e = ExecStates(None)
-> ValueError: None is not a valid ExecutionStates
And I would like to be able to compare enum instances, e.g.:
e = ExecState[START_JOB]
if e == ExecState[START_JOB]:
    pass
if e == ExecState[KILL_JOB]:
    pass
Using the following definitions, I believe I'm almost there:
import enum

class _BaseEnum(str, enum.Enum):
    @classmethod
    def values(cls) -> DictValues:
        return cls.__members__.values()

    def _generate_next_value_(name: str, *args: object) -> str:
        return name.lower()

    def __str__(self):
        return str(self.value)  # Use stringification to cover the None value case

class MessageEnum(_BaseEnum):
    NONE = None
    START_JOB = enum.auto()
    REPORT_STATUS = enum.auto()
    KILL_JOB = enum.auto()

class ExecutionState(_BaseEnum):
    NONE = None
    JOB_COULD_NOT_START = enum.auto()
    JOB_SUCCESSFULLY_STARTED = enum.auto()
    JOB_SUCCESSFULLY_FINISHED = enum.auto()
However, one problem still remains: how can I deal with the None value as well as strings in the enumerations? In my case, all enum items get mapped to the lowercase of the item name, which is the intended functionality. However, None unintentionally gets mapped to 'None'. This in effect leads to problems at other spots in the existing code, which initializes an ExecutionState instance with None. I would like to also cover this case so as not to break existing code.
When I add a __new__ method to _BaseEnum,

def __new__(cls, value):
    obj = str.__new__(cls)
    obj._value_ = value
    return obj

I lose the possibility to compare the enumeration instances, as all instances compare equal to '' (the empty string).
My question is: in order to solve my problem, can I special-case None either in the _generate_next_value_ or the __new__ method, or maybe by using a proxy pattern?
Two things that should help:

1. In your __new__, the creation line should read obj = str.__new__(cls, value) -- that way each instance will compare equal to its lower-cased name.
2. Export your enum members to the global namespace, and use is:

NONE, START_JOB, REPORT_STATUS, KILL_JOB = MessageEnum
...
if e is START_JOB: ...
...
if msg.id in (START_JOB, KILL_JOB): ...
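Putting both hints together, a runnable sketch (class names shortened; the empty-string fallback for the None member is my choice, so that NONE does not stringify as 'None'):

```python
import enum

class _BaseEnum(str, enum.Enum):
    def __new__(cls, value):
        # pass the value through to str.__new__ so each member carries its
        # own string (otherwise all members compare equal to ''); None is
        # special-cased to the empty string
        obj = str.__new__(cls, value if value is not None else '')
        obj._value_ = value
        return obj

    def _generate_next_value_(name, start, count, last_values):
        return name.lower()

class ExecutionState(_BaseEnum):
    NONE = None
    JOB_COULD_NOT_START = enum.auto()
    JOB_SUCCESSFULLY_STARTED = enum.auto()

# lookup by value still accepts None, so existing code keeps working
assert ExecutionState(None) is ExecutionState.NONE
# members compare equal to their lower-cased names (str mixin)
assert ExecutionState.JOB_COULD_NOT_START == 'job_could_not_start'
assert ExecutionState.JOB_COULD_NOT_START != ExecutionState.JOB_SUCCESSFULLY_STARTED
```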

How to safely unpack dict in python?

I have a wrapper class - it's an abstraction that I return from the backend to the frontend.

from datetime import datetime
from typing import NamedTuple

class NewsItem(NamedTuple):
    id: str
    title: str
    content: str
    service: str
    published_at: datetime

    @classmethod
    def from_payload(cls, payload) -> 'NewsItem':
        return cls(**payload)
For example, when I get data from elastic I convert it to NewsItem:
return [NewsItem.from_payload(hit['_source'])
        for hit in result['hits']['hits']]
The problem is I don't want to fail because of unknown fields that can come from elastic. How to ignore them (or put into a separate dedicated attribute list NewsItem.extra)?
I think the most elegant way is to use ._fields of NewsItem:

@classmethod
def from_payload(cls, payload) -> 'NewsItem':
    return cls(*(payload[field] for field in cls._fields))
If you want to keep extras, you need to do some work (with the field extra declared as extra: dict = {}):

@classmethod
def from_payload(cls, payload) -> 'NewsItem':
    fields_no_extra = set(cls._fields) - {'extra'}
    extra_fields = payload.keys() - fields_no_extra
    extras = {field: payload[field] for field in extra_fields}
    data = {field: payload[field] for field in fields_no_extra}
    data['extra'] = extras
    return cls(**data)
You could optimize this further - there's a bit too much set computation ;)
Of course, my solutions don't handle the case where the payload doesn't contain all of the fields of the NewsItem.
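For illustration, the extras-keeping variant above can be run end to end (the payload and field values here are made up):

```python
from datetime import datetime
from typing import NamedTuple

class NewsItem(NamedTuple):
    id: str
    title: str
    content: str
    service: str
    published_at: datetime
    extra: dict = {}

    @classmethod
    def from_payload(cls, payload) -> 'NewsItem':
        # split the payload into declared fields and everything else
        fields_no_extra = set(cls._fields) - {'extra'}
        extra_fields = payload.keys() - fields_no_extra
        extras = {field: payload[field] for field in extra_fields}
        data = {field: payload[field] for field in fields_no_extra}
        data['extra'] = extras
        return cls(**data)

payload = {'id': '1', 'title': 'Title', 'content': 'Body', 'service': 'es',
           'published_at': datetime(2020, 1, 1), 'unknown_field': 42}
item = NewsItem.from_payload(payload)
print(item.extra)  # {'unknown_field': 42}
```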
You can use **kwargs to let from_payload take an arbitrary number of keyword arguments ("kwargs" means "keyword arguments") and discard the unnecessary ones:

class NewsItem(NamedTuple):
    id: str
    title: str
    content: str
    service: str
    published_at: datetime

    @classmethod
    def from_payload(cls, id=None, title=None, content=None, service=None,
                     published_at=None, **kwargs) -> 'NewsItem':
        return cls(id, title, content, service, published_at)
An alternative solution is introspecting the NamedTuple class attributes (see @MOROZILnic's answer + comment).
Since your problem is with unknown keys, you can use the dict's get method to safely handle missing keys. For get, the first argument is the key you are looking for, and the second argument is the default value returned when the key is not found.
So, do the following:

return [NewsItem.from_payload(hit['_source'])
        for hit in result.get('hits', {}).get('hits', [])]

The above is just an example; modify the defaults to whatever you want returned when the key is missing.

How can I adapt custom types in SQL Expression statements?

I have a custom type in an application using the SQLAlchemy ORM mapper. For some complex queries I need to use the SQL expression module, but this makes handling of the custom types non-transparent. How can I tell SQLAlchemy to use my custom types for mapping when not using the ORM?
Below is a quick example demonstrating the problem.
Note that the first query works, but I have to manually cast the values first to str in Python and then to INET for PostgreSQL, even though I have my custom type defined.
I understand that the SQL expression module is unaware of the custom type, as it is defined one layer above it in the ORM. But I wonder if there is a way I could wire that custom type into the SQL layer, making the use of types and values much more transparent, and additionally ensuring that any operations (clean-ups and so on) defined in the custom type are consistently applied no matter which layer of SA is being used.
from ipaddress import ip_interface

from sqlalchemy import Column, cast, create_engine
from sqlalchemy.dialects.postgresql import INET, array
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
from sqlalchemy.sql.expression import any_
from sqlalchemy.types import TypeDecorator

Base = declarative_base()

class PgIpInterface(TypeDecorator):
    """
    A codec for :py:mod:`ipaddress` interfaces.
    """
    impl = INET

    def process_bind_param(self, value, dialect):
        return str(value) if value else None

    def process_result_value(self, value, dialect):
        return ip_interface(value) if value else None

    def process_literal_param(self, value, dialect):
        raise NotImplementedError('Not yet implemented')

class Network(Base):
    __tablename__ = 'example_table'
    cidr = Column(PgIpInterface, primary_key=True)

def execute(query):
    import logging
    LOG = logging.getLogger()
    try:
        print(query)
        print(query.all())
    except Exception:
        LOG.exception('!!! failed')

engine = create_engine('postgresql://malbert@/malbert')
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)
session = Session()

ranges = [
    ip_interface('192.168.1.0/24'),
    ip_interface('192.168.3.0/24'),
]

# Query with manual casting via "str"
print(' Manual Casting via "str" '.center(80, '-'))
arr = array([cast(str(_), INET) for _ in ranges])
query1 = session.query(Network).filter(Network.cidr.op("<<=")(any_(arr)))
execute(query1)

# Query with manual casting
print(' Manual Casting '.center(80, '-'))
arr = array([cast(_, INET) for _ in ranges])
query2 = session.query(Network).filter(Network.cidr.op("<<=")(any_(arr)))
execute(query2)

# Query without casting
print(' No Casting '.center(80, '-'))
query3 = session.query(Network).filter(Network.cidr.op("<<=")(any_(ranges)))
execute(query3)
To make your second query work, simply cast to your custom type:
arr = array([cast(_, PgIpInterface) for _ in ranges])
To make your third query work, you need to go one level deeper, in psycopg2. psycopg2 has builtin support for ipaddress types, but unfortunately it seems incomplete. (ipaddress types are converted to string without an explicit cast.)
register_ipaddress() # register ipaddress handling globally
arr = [ip_interface('192.168.1.0/24'), ip_interface('192.168.3.0/24')]
session.query(Foo).filter(Foo.cidr.op("<<=")(any_(arr))).all()
This renders something like
WHERE foo.cidr <<= ANY (ARRAY['192.168.1.0/24', '192.168.3.0/24'])
which fails with an "operator does not exist: inet <<= text" error. Fortunately, it's easy to fix; we'll just rewrite register_ipaddress ourselves:
import ipaddress

from psycopg2.extensions import (
    AsIs,
    new_array_type,
    new_type,
    register_adapter,
    register_type,
)

def register_ipaddress():
    def cast_interface(s, cur=None):
        if s is None:
            return None
        return ipaddress.ip_interface(s)

    inet = new_type((869,), 'INET', cast_interface)
    ainet = new_array_type((1041,), 'INET[]', inet)

    def cast_network(s, cur=None):
        if s is None:
            return None
        return ipaddress.ip_network(s)

    cidr = new_type((650,), 'CIDR', cast_network)
    acidr = new_array_type((651,), 'CIDR[]', cidr)

    for caster in [inet, ainet, cidr, acidr]:
        register_type(caster)

    def adapt_interface(obj):
        return AsIs("'{}'::inet".format(obj))

    for t in [ipaddress.IPv4Interface, ipaddress.IPv6Interface]:
        register_adapter(t, adapt_interface)

    def adapt_network(obj):
        return AsIs("'{}'::cidr".format(obj))

    for t in [ipaddress.IPv4Network, ipaddress.IPv6Network]:
        register_adapter(t, adapt_network)
This will render your query like
WHERE foo.cidr <<= ANY (ARRAY['192.168.1.0/24'::inet, '192.168.3.0/24'::inet])
Note the difference between using
arr = array([ip_interface...])
and
arr = [ip_interface...]
In the former case, the array is processed by SQLAlchemy, so you'll get n bound parameters for n items in the list; in the latter case, the array is processed by psycopg2, so you'll get one bound parameter for the entire array.
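The adapters above lean on the stdlib ipaddress module; the objects involved behave like this (runnable without a database; the membership test is the Python-side analogue of what inet <<= cidr checks server-side):

```python
import ipaddress

iface = ipaddress.ip_interface('192.168.1.10/24')  # host address + netmask
net = ipaddress.ip_network('192.168.1.0/24')       # a whole subnet

# what adapt_interface/adapt_network serialise: the str() forms
print(str(iface), str(net))  # 192.168.1.10/24 192.168.1.0/24

# rough Python-side analogue of PostgreSQL's <<= containment operator
assert iface.network.subnet_of(net)
assert iface.ip in net
```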

How to return dynamic JSON as a response from a Google Cloud Endpoint in Python

I want the following JSON to be returned from a Google Cloud Endpoint:
{"arts":[{"id":"4","name":"punjabi"},{"id":"5","name":"hindi"}],"Science":[{"id":"1","name":"MCA"},{"id":"2","name":"physics"},{"id":"3","name":"chemistry"}]}
Here is how I am declaring my endpoint:

@endpoints.method(TokenAsInput, GetDepartmentListOutput,
                  path='getdepartmentlist', http_method='GET',
                  name='GetDepartmentList')
def getDepartmentList(self, request):
    objResult = GetDepartmentListOutput()
    objResult.data = dynamicJson
    return objResult

But I don't know how to declare GetDepartmentListOutput so that it can map the above JSON. The objects 'arts' and 'Science' are dynamic and may or may not exist.
I use ProtoRpc messages (which underlie Cloud Endpoints) and use the following to return general JSON message data:

from protorpc import messages

class DetailMessage(messages.Message):
    """
    General-format Json detail response
    """
    data = GeneralField(1)
To set the result:
data = {"arts":[{"id":"4","name":"punjabi"},{"id":"5","name":"hindi"}],"Science":[{"id":"1","name":"MCA"},{"id":"2","name":"physics"},{"id":"3","name":"chemistry"}]}
return DetailMessage(data=data)
GeneralField is defined as:

class GeneralField(messages.Field):
    """
    Allow for normal non-Message objects to be serialised to JSON.
    This allows for variable result objects or dictionaries to be returned
    (Note: these objects must be JSON serialisable).
    """
    VARIANTS = frozenset([messages.Variant.MESSAGE])
    DEFAULT_VARIANT = messages.Variant.MESSAGE

    def __init__(self, number, required=False, repeated=False, variant=None):
        """Constructor.

        Args:
          number: Number of field. Must be unique per message class.
          required: Whether or not field is required. Mutually exclusive to
            'repeated'.
          repeated: Whether or not field is repeated. Mutually exclusive to
            'required'.
          variant: Wire-format variant hint.

        Raises:
          FieldDefinitionError when invalid message_type is provided.
        """
        super(GeneralField, self).__init__(number,
                                           required=required,
                                           repeated=repeated,
                                           variant=variant)

    def __set__(self, message_instance, value):
        """Set value on message.

        Args:
          message_instance: Message instance to set value on.
          value: Value to set on message.
        """
        if isinstance(value, list):
            if len(value) > 0:
                self.type = type(value[0])
            else:
                self.type = type(self)
        else:
            self.type = type(value)
        self.__initialized = True
        super(GeneralField, self).__set__(message_instance, value)

    def __setattr__(self, name, value):
        """Setter overridden to allow assignment to fields after creation.

        Args:
          name: Name of attribute to set.
          value: Value to assign.
        """
        object.__setattr__(self, name, value)

    def value_from_message(self, message):
        """Convert a message to a value instance.

        Used by deserialisers to convert from underlying messages to
        values of the expected user type.

        Args:
          message: A message instance of type self.message_type.

        Returns:
          Value of self.message_type.
        """
        return message

    def value_to_message(self, value):
        """Convert a value instance to a message.

        Used by serialisers to convert Python user types to underlying
        messages for transmission.

        Args:
          value: A value of type self.type.

        Returns:
          An instance of type self.message_type.
        """
        return value
Note: GeneralField is derived from other ProtoRpc Message code and overrides Field's __set__ and __setattr__ methods in order to allow normal (JSON-serialisable) objects or dictionaries to be used in ProtoRpc messages. You may need to adapt this approach to suit your purposes.
Note 2: I am unsure how Cloud Endpoints will like this, but it may be worth a shot.

peewee + MySQL, How to create a custom field type that wraps SQL-built ins?

I'd like to create a custom UUID field in peewee (over MySQL).
In python, I'm using the UUID as a hexified string, e.g.:
uuid = '110e8400-e29b-11d4-a716-446655440000'
But I want to store it in the database to a column of type BINARY(16) to save space.
MySQL has built-in HEX() and UNHEX() functions to convert back and forth between a string and binary.
So my question is: how do I tell peewee to generate SQL that uses a built-in function? Here's an idea for the code I want to work:

class UUIDField(Field):
    db_field = 'binary(16)'

    def db_value(self, value):
        if value is not None:
            uuid = value.translate(None, '-')  # remove dashes
            # HERE: How do I let peewee know I want to generate
            # a SQL string of the form "UNHEX(uuid)"?

    def python_value(self, value):
        if value is not None:
            # HERE: How do I let peewee know I want to generate
            # a SQL string of the form "HEX(value)"?
            pass
Note that I'm specifically asking how to get peewee to wrap or unwrap a value in custom SQL. I realize I could probably do the value conversion entirely in python, but I'm looking for the more general-purpose SQL-based solution.
EDIT: For future reference, here is how I made it work doing the conversions in python. It doesn't answer the question though, so any ideas are appreciated!
import binascii
from peewee import *

db = MySQLDatabase(
    'database',
    fields={'binary(16)': 'BINARY(16)'}  # map the field type
)

# this does the uuid conversion in python
class UUIDField(Field):
    db_field = 'binary(16)'

    def db_value(self, value):
        if value is None:
            return None
        value = value.translate(None, '-')
        value = binascii.unhexlify(value)
        return value

    def python_value(self, value):
        if value is None:
            return None
        value = '{}-{}-{}-{}-{}'.format(
            binascii.hexlify(value[0:4]),
            binascii.hexlify(value[4:6]),
            binascii.hexlify(value[6:8]),
            binascii.hexlify(value[8:10]),
            binascii.hexlify(value[10:16])
        )
        return value
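Incidentally, the hand-rolled hexlify/format dance above can be written more compactly with the stdlib uuid module; a sketch of just the value conversion, independent of peewee:

```python
import uuid

def uuid_to_bin(value):
    # '110e8400-e29b-11d4-a716-446655440000' -> 16 raw bytes for BINARY(16)
    return uuid.UUID(value).bytes

def bin_to_uuid(value):
    # 16 raw bytes -> canonical dashed string
    return str(uuid.UUID(bytes=value))

u = '110e8400-e29b-11d4-a716-446655440000'
assert len(uuid_to_bin(u)) == 16
assert bin_to_uuid(uuid_to_bin(u)) == u
```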
Using a SelectQuery you can invoke built-in SQL functions like so:

from peewee import SelectQuery, fn

# this does the uuid conversion in the database via HEX()/UNHEX()
class UUIDField(Field):
    db_field = 'binary(16)'

    def db_value(self, value):
        if value is None:
            return None
        value = value.translate(None, '-')
        query = SelectQuery(self.model_class, fn.UNHEX(value).alias('unhex'))
        result = query.first()
        return result.unhex

    def python_value(self, value):
        if value is None:
            return None
        query = SelectQuery(self.model_class, fn.HEX(value).alias('hex'))
        result = query.first()
        value = '{}-{}-{}-{}-{}'.format(
            result.hex[0:8],
            result.hex[8:12],
            result.hex[12:16],
            result.hex[16:20],
            result.hex[20:32]
        )
        return value
