Multi-valued data in DynamoDB using boto

After scouring the documentation and various tutorials, I cannot figure out how to set or update an attribute on a DynamoDB Item that is a multi-valued data type (number set or string set). I'm using boto (boto.dynamodb2, to be specific, not boto.dynamodb).
Trying something like this (where 'id' is the hash key):
Item(Table('test'), data={'id': '123', 'content': 'test', 'list': [1,2,3,4]}).save()
Results in this error:
TypeError: Unsupported type "<type 'list'>" for value "[1, 2, 3, 4]"
I feel like this must be possible in boto.dynamodb2, but it's odd that I can't find any examples of people doing this. (Everyone is just setting number or string attributes, not number set or string set attributes.)
Any insight on this topic and how I might get this to work with boto would be very much appreciated! I'm guessing I'm overlooking something simple. Thanks!

Okay, we were able to figure this out on our own. The problem with my example above is that I'm using a list instead of a set. The value of a multi-value attribute MUST be a set.
For example, this works:
Item(Table('test'), data={'id': '123', 'content': 'test', 'list': set([1,2,3,4])}).save()
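If the data arrives as plain lists (for example, parsed from JSON), a small helper can convert them to sets before they are handed to Item. The sketch below is a hypothetical helper, not part of boto. It assumes each list holds only numbers or only strings, which is what DynamoDB number sets and string sets require, and it rejects empty lists because DynamoDB sets cannot be empty.

```python
def lists_to_sets(data):
    """Return a copy of data with list values converted to sets.

    Hypothetical helper (not part of boto). Assumes each list is
    non-empty and holds only numbers or only strings.
    """
    converted = {}
    for key, value in data.items():
        if isinstance(value, list):
            if not value:
                raise ValueError("DynamoDB sets cannot be empty: %r" % key)
            converted[key] = set(value)  # duplicates are collapsed
        else:
            converted[key] = value
    return converted


item_data = lists_to_sets({'id': '123', 'content': 'test', 'list': [1, 2, 3, 4]})
# item_data can now be passed as data=... to Item(Table('test'), ...)
```

Note that converting to a set drops duplicates and ordering, so this only fits data where neither matters.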

DynamoDB now supports Dict/List directly. Boto doesn't support them yet, but the following small patch works until support lands in a release.
############################################################
# Patch Dynamizer to support dict/list
############################################################
from boto.dynamodb.types import Dynamizer, get_dynamodb_type

def _get_dynamodb_type(self, attr):
    if isinstance(attr, dict):
        return 'M'
    if isinstance(attr, list):
        return 'L'
    return get_dynamodb_type(attr)

def _encode_m(self, attr):
    result = {}
    for k, v in attr.items():
        result[k] = self.encode(v)
    return result

def _decode_m(self, attr):
    result = {}
    for k, v in attr.items():
        result[k] = self.decode(v)
    return result

def _encode_l(self, attr):
    return [self.encode(v) for v in attr]

def _decode_l(self, attr):
    return [self.decode(v) for v in attr]

Dynamizer._get_dynamodb_type = _get_dynamodb_type
Dynamizer._encode_m = _encode_m
Dynamizer._decode_m = _decode_m
Dynamizer._encode_l = _encode_l
Dynamizer._decode_l = _decode_l
############################################################
# End patch Dynamizer to support dict/list
############################################################
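The patch relies on recursion: a map (M) encodes each of its values, and a list (L) encodes each of its elements. The standalone sketch below illustrates that idea against DynamoDB's typed JSON format. It is not boto's Dynamizer: the encode and decode functions are written for illustration only, cover just a subset of types, and assume Python 3.

```python
from decimal import Decimal


def encode(value):
    """Encode a Python value into DynamoDB's typed JSON form."""
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return {'BOOL': value}
    if isinstance(value, str):
        return {'S': value}
    if isinstance(value, (int, float, Decimal)):
        return {'N': str(value)}  # numbers travel as strings
    if isinstance(value, dict):
        return {'M': {k: encode(v) for k, v in value.items()}}
    if isinstance(value, list):
        return {'L': [encode(v) for v in value]}
    raise TypeError('Unsupported type: %r' % type(value))


def decode(attr):
    """Decode DynamoDB's typed JSON form back into a Python value."""
    (type_code, payload), = attr.items()  # exactly one type code per value
    if type_code in ('S', 'BOOL'):
        return payload
    if type_code == 'N':
        return Decimal(payload)
    if type_code == 'M':
        return {k: decode(v) for k, v in payload.items()}
    if type_code == 'L':
        return [decode(v) for v in payload]
    raise TypeError('Unsupported type code: %r' % type_code)


encoded = encode({'tags': ['a', 'b'], 'count': 2})
decoded = decode(encoded)
```

Numbers come back as Decimal here, a choice made for this sketch so that no precision is lost on the round trip.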

This works normally with boto3:
import boto3

session = boto3.Session(
    aws_access_key_id=AWS_ACCESS_KEY_ID,
    aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
)
dynamodb = session.resource('dynamodb', region_name='us-east-1')
table = dynamodb.Table('table')
# Avoid naming the variable "list", which would shadow the built-in
messages = ['1', '2', '3']
table.put_item(
    Item={
        'id': 1,  # a literal such as 01 is a syntax error in Python 3
        'message': messages,
        'timestamp': '2019-05-01 22:14:00'
    }
)
The message attribute will be saved as a DynamoDB list (type L) of strings.

Related

Unable to insert nested object in mongodb using pymongo

I am coming today following an issue that doesn't make sense to me using python and mongodb. I am a Go/C# developer so maybe I am missing something but I have the following case:
from datetime import datetime
from bson import ObjectId

class DailyActivity:
    user_ids = []
    date = None

    def __init__(self, user_ids : [ObjectId] = [], date : datetime = None):
        self.user_ids = user_ids
        self.date = date

class ActivitiesThroughDays:
    daily_activies = []

    def add_daily_activity(self, daily_activity : DailyActivity = None):
        self.daily_activies.append(daily_activity)
I then have these 2 classes but also another file containing some helper to use mongodb:
from pymongo import MongoClient

def get_client():
    return MongoClient('localhost', 27017)

def get_database(database_name: str = None):
    if database_name is None:
        raise AttributeError("database name is None.")
    return get_client().get_database(database_name)

def get_X_database():
    return get_database("X")
And here we get to the issue.. I am now building a simple ActivitiesThroughDays object which has only one DailyActivity containing X user ids (as ObjectId array/list).
However, when I try to insert_one, I get the following:
TypeError: document must be an instance of dict, bson.son.SON, bson.raw_bson.RawBSONDocument, or a type that inherits from collections.MutableMapping
This is the piece of code that raises the exception:
def insert_activities_though_days(activities_through_days: ActivitiesThroughDays = None):
    if activities_through_days is None:
        raise AttributeError("activities_through_days is None.")
    col = get_EM_column("activities_through_days")
    col.insert_one(activities_through_days)
Based on the above issue, I then tried to convert my ActivitiesThroughDays into a dict/JSON:
col.insert_one(activities_through_days.__dict__)
bson.errors.InvalidDocument: cannot encode object: <models.DailyActivity.DailyActivity object at 0x10eea0320>, of type: <class 'models.DailyActivity.DailyActivity'>
col.insert_one(json.dumps(activities_through_days))
TypeError: Object of type ActivitiesThroughDays is not JSON serializable
So based on this, I began to search for different solutions on Google and found solutions such as:
def to_dict(obj):
    if not hasattr(obj, "__dict__"):
        return obj
    result = {}
    for key, val in obj.__dict__.items():
        if key.startswith("_"):
            continue
        element = []
        if isinstance(val, list):
            for item in val:
                element.append(to_dict(item))
        else:
            element = to_dict(val)
        result[key] = element
    return result
But I got:
bson.errors.InvalidDocument: cannot encode object: <property object at 0x10229aa98>, of type: <class 'property'>
For each step I move forward, another issue comes up... To me, all of this doesn't make sense at all because.. there should be a generic serializer/deserializer somewhere that would, from 1 line, convert any nested objects/arrays to be inserted in mongodb..
Also, from one of the solution I tried, I found out that ObjectId were ignored while mapping to json/dict (I don't remember which one)
I am not at all a Python developer so please, feel free to give any tips :)
Thanks

pymongo's interface expects dict and .__dict__ is a very low level attribute.
I'm afraid you'll spend a lot of energy if you try to build an ORM/ODM for mongodb from scratch.
There are existing ORM/ODM libraries for mongodb in python (mongoengine and pymodm, which are quite similar) that could help you get something working quickly.
Here are a few lines that show how the models would look with mongoengine and how to save them:
import datetime as dt
from mongoengine import *

connect(host='mongodb://localhost:27017/testdb')

class User(Document):
    email = EmailField(required=True)

class DailyActivity(Document):
    users = ListField(ReferenceField(User))
    date = DateTimeField(default=dt.datetime.utcnow)

user = User(email='test@garbage.com').save()
user2 = User(email='test2@garbage.com').save()
activity = DailyActivity(users=[user, user2]).save()
I hope this helps
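If you would rather stay with plain pymongo, one option is to convert the object graph into nested dicts and lists yourself before calling insert_one. The sketch below is a minimal, hypothetical to_document helper that uses only the standard library. The classes are simplified stand-ins for the ones in the question, user ids are plain strings rather than ObjectId, and it assumes leaf values (such as datetime) are types pymongo can already encode.

```python
from datetime import datetime


def to_document(obj):
    """Recursively convert plain objects into dicts and lists."""
    if isinstance(obj, dict):
        return {key: to_document(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple, set)):
        return [to_document(item) for item in obj]
    if hasattr(obj, '__dict__'):
        # vars() returns instance attributes only, so class-level
        # properties and methods are never visited.
        return {key: to_document(value)
                for key, value in vars(obj).items()
                if not key.startswith('_')}
    return obj  # leaf value such as str, int or datetime


class DailyActivity:
    def __init__(self, user_ids=None, date=None):
        self.user_ids = user_ids or []
        self.date = date


class ActivitiesThroughDays:
    def __init__(self):
        # Set per instance, not at class level, so vars() can see it
        self.daily_activities = []


activities = ActivitiesThroughDays()
activities.daily_activities.append(
    DailyActivity(user_ids=['u1', 'u2'], date=datetime(2019, 5, 1)))
document = to_document(activities)
```

The resulting document is a plain dict, which is the kind of value insert_one expects. Attributes declared only at class level are not part of the instance's __dict__, which is why the stand-in classes assign them in __init__.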

Access dict via dict.key

I created a dict source = {'livemode': False}. I thought it's possible to access the livemode value via source.livemode. But it doesn't work. Is there a way to access it that way?
As a note, source['livemode'] works, but I need source.livemode as that's already used in my code, and I have to handle it as an alternative to the Stripe return value charge.
I want to give a bit more context
Here I create a charge via Stripe:
def _create_charge(self, request, order_reference, order_items_dict, token):
    try:
        charge = stripe.Charge.create(
            amount=order_items_dict['total_gross'],
            application_fee=order_items_dict['application_fee'],
            currency=order_items_dict['event'].currency,
            source=token,
            stripe_account=order_items_dict['event'].organizer.stripe_account,
            expand=['balance_transaction', 'application_fee'],
        )
    except stripe.error.StripeError as e:
        body = e.json_body
        err = body.get('error', {})
        messages.error(
            request,
            err.get('message')
        )
    else:
        if charge.paid and charge.status == 'succeeded':
            return charge
I can access this with e.g. charge_or_source.livemode
def _create_order(self, request, charge_or_source, order_status):
    order_reference = request.session.get('order_reference')
    new_order = self.order_form.save(commit=False)
    print(charge_or_source.livemode, "charge_or_source.livemode")
    new_order_dict = {
        'total_gross': self.order_items_dict['total_gross'],
        'livemode': charge_or_source.livemode,
    }
Now there is a case (when the order is Free) where I have to 'skip' the _create_charge function but still, I have to send information about charge_or_source.livemode. Therefore I tried to create the above-mentioned dictionary.

You can implement a custom dict wrapper (either a subclass of dict or something that contains a dict) and implement __getattr__ (or __getattribute__) to return data from the dict.
class DictObject(object):
    def __init__(self, data):
        self.mydict = data

    def __getattr__(self, attr):
        # __getattr__ is only called when normal attribute lookup fails
        if attr in self.mydict:
            return self.mydict[attr]
        raise AttributeError(attr)
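For a simple stand-in object like the one in the question, the standard library's types.SimpleNamespace may be enough, with no custom class needed. The sketch below assumes the only requirement is read access to a few attributes such as livemode. The variable names are illustrative.

```python
from types import SimpleNamespace

# Stand-in for a Stripe charge when the order is free
source = SimpleNamespace(livemode=False)

# An existing dict can be unpacked into a namespace as well
source_from_dict = SimpleNamespace(**{'livemode': False})

print(source.livemode)
```

Either object can then be passed wherever the code reads charge_or_source.livemode.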

I'm a beginner myself, but let me try and answer:
Say you have a dictionary:
dictionary = {"One": 1, "Two": 2, "Three": 3}
You can create a class with its keys like:
class DictKeys:
    One = 'One'
    Two = 'Two'
    Three = 'Three'
Here, One, Two and Three are class variables or attributes, which means if you create an object for this class:
key = DictKeys()
You can access all of those keys using the '.' (dot) operator.
key.One
>>'One'
Now just plug it in wherever you want to access your dictionary!
dictionary[key.One]
>>1
I'm sure this isn't the best way, and class access is a tiny bit slower than dict access, but if you really want to, you can access all your keys with a dot using this method.

The correct way to access a dictionary is how you proposed it:
source['livemode']

What is the most efficient way of retrieving the port value from this json list

I have the list below, from which I have to retrieve the port number. I want the value 50051, but what I get is port=50051. I know I can retrieve this by iterating over the list and using string operations, but I wanted to see if there is some direct way to access it.
r = requests.get(url_service)
data = r.json()
# Below is the JSON after printing
[{'ServerTag': ['abc-service=true',
                'port=50051',
                'protocol=http']
}]
print(data[0]["ServerTag"][1])  # prints port=50051

You can do something like this perhaps:
received_dic = {
    'ServerTag': ['abc-service=true',
                  'port=50051',
                  'protocol=http']
}
ServerTag = received_dic.get("ServerTag", None)
if ServerTag:
    port = list(filter(lambda x: "port" in x, ServerTag))[0].split("=")[1]
    print(port)
    # 50051

Considering you have the following JSON:
[
{
"ServerTag": ["abc-service=true", "port=50051", "protocol=http"]
}
]
You can extract your value like this:
from functools import partial
# ...
def extract_value_from_tag(tags, name, default=None):
    tags = map(partial(str.split, sep='='), tags)
    try:
        return next(value for key, value in tags if key == name)
    except StopIteration:
        # Tag was not found
        return default
And then you just:
# Providing data is the deserialized JSON as a Python list
# Also assuming that data is not empty and ServerTag is present on the first object
tags = data[0].get('ServerTag', [])
port_number = extract_value_from_tag(tags, 'port', default='8080')
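If more than one tag is needed, it may be simpler to parse the whole list into a dict once and then look values up by key. The sketch below assumes every useful tag has the form key=value. The parse_tags helper is hypothetical, and the sample data mirrors the JSON from the question.

```python
def parse_tags(tags):
    """Turn ['key=value', ...] into {'key': 'value', ...}."""
    parsed = {}
    for tag in tags:
        key, sep, value = tag.partition('=')
        if sep:  # skip entries that contain no '='
            parsed[key] = value
    return parsed


data = [{'ServerTag': ['abc-service=true', 'port=50051', 'protocol=http']}]
tags = parse_tags(data[0].get('ServerTag', []))
port = int(tags['port'])
```

Using partition instead of split keeps any additional '=' characters inside the value intact.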

How can I return a list of results as JSON?

I want to return the result of a query as JSON. I'm using the following route to return one model instance as a JSON object.
@mod.route('/autocomplete/<term>', methods=['GET'])
def autocomplete(term):
    country = Country.query.filter(Country.name_pt.ilike('%' + term + '%')).first()
    country_dict = country.__dict__
    country_dict.pop('_sa_instance_state', None)
    return jsonify(json_list=country_dict)
This code works well if I use the first() method. However, I need to use the all() to get all results. When I do that, I get the following error.
country_dict = country.__dict__
AttributeError: 'list' object has no attribute '__dict__'
What should I be doing to return the entire list of results as JSON?

You need to do that "jsonify preparation step" for each item in the list, since .all() returns a list of model instances, not just one instance like .first(). Work on a copy of each __dict__ so you don't mess with SQLAlchemy's internal representation of the instances.
@mod.route('/autocomplete/<term>', methods=['GET'])
def autocomplete(term):
    countries = []
    for country in Country.query.filter(Country.name_pt.ilike('%' + term + '%')):
        country_dict = country.__dict__.copy()
        country_dict.pop('_sa_instance_state', None)
        countries.append(country_dict)
    return jsonify(json_list=countries)
Probably better just to return the data about each country explicitly, rather than trying to magically jsonify the instance.
@mod.route('/autocomplete/<term>', methods=['GET'])
def autocomplete(term):
    countries = []
    for country in Country.query.filter(Country.name_pt.ilike('%' + term + '%')):
        countries.append({
            'id': country.id,
            'name': country.name_pt,
        })
    return jsonify(countries=countries)
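The per-item conversion can also be factored into a small function, which makes it testable without Flask or a database. In the sketch below, Country is a namedtuple standing in for the real model, the list stands in for the query result, and json.dumps stands in for jsonify. All three are assumptions made for illustration.

```python
import json
from collections import namedtuple

# Stand-in for the SQLAlchemy model from the question
Country = namedtuple('Country', ['id', 'name_pt'])


def country_to_dict(country):
    """Pick the fields to expose explicitly."""
    return {'id': country.id, 'name': country.name_pt}


# Stand-in for the result of Country.query.filter(...).all()
results = [Country(1, 'Brasil'), Country(2, 'Portugal')]
payload = json.dumps({'countries': [country_to_dict(c) for c in results]})
```

Listing the fields explicitly keeps internal attributes such as _sa_instance_state out of the response.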

method of iterating over sqlalchemy model's defined columns?

I've been trying to figure out how to iterate over the list of columns defined in a SQLAlchemy model. I want it for writing some serialization and copy methods to a couple of models. I can't just iterate over the obj.__dict__ since it contains a lot of SA specific items.
Anyone know of a way to just get the id and desc names from the following?
class JobStatus(Base):
    __tablename__ = 'jobstatus'
    id = Column(Integer, primary_key=True)
    desc = Column(Unicode(20))
In this small case I could easily create a:
def logme(self):
    return {'id': self.id, 'desc': self.desc}
but I'd prefer something that auto-generates the dict (for larger objects).

You could use the following function:
def __unicode__(self):
    return "[%s(%s)]" % (
        self.__class__.__name__,
        ', '.join('%s=%s' % (k, self.__dict__[k])
                  for k in sorted(self.__dict__) if '_sa_' != k[:4]))
It will exclude SA magic attributes, but will not exclude the relations. So basically it might load the dependencies, parents, children etc, which is definitely not desirable.
But it is actually much easier because if you inherit from Base, you have a __table__ attribute, so that you can do:
for c in JobStatus.__table__.columns:
    print(c)
for c in JobStatus.__table__.foreign_keys:
    print(c)
See How to discover table properties from SQLAlchemy mapped object - similar question.
Edit by Mike: Please see functions such as Mapper.c and Mapper.mapped_table. If using 0.8 and higher also see Mapper.attrs and related functions.
Example for Mapper.attrs:
from sqlalchemy import inspect

mapper = inspect(JobStatus)
for column in mapper.attrs:
    print(column.key)

You can get the list of defined properties from the mapper. For your case you're interested in only ColumnProperty objects.
from sqlalchemy.orm import class_mapper
import sqlalchemy

def attribute_names(cls):
    return [prop.key for prop in class_mapper(cls).iterate_properties
            if isinstance(prop, sqlalchemy.orm.ColumnProperty)]

I realise that this is an old question, but I've just come across the same requirement and would like to offer an alternative solution to future readers.
As Josh notes, full SQL field names will be returned by JobStatus.__table__.columns, so rather than the original field name id, you will get jobstatus.id. Not as useful as it could be.
The solution to obtaining a list of field names as they were originally defined is to look at the _data attribute on the column object, which contains the full data. If we look at JobStatus.__table__.columns._data, it looks like this:
{'desc': Column('desc', Unicode(length=20), table=<jobstatus>),
'id': Column('id', Integer(), table=<jobstatus>, primary_key=True, nullable=False)}
From here you can simply call JobStatus.__table__.columns._data.keys() which gives you a nice, clean list:
['id', 'desc']

Assuming you're using SQLAlchemy's declarative mapping, you can use __mapper__ attribute to get at the class mapper. To get all mapped attributes (including relationships):
obj.__mapper__.attrs.keys()
If you want strictly column names, use obj.__mapper__.column_attrs.keys(). See the documentation for other views.
https://docs.sqlalchemy.org/en/latest/orm/mapping_api.html#sqlalchemy.orm.mapper.Mapper.attrs

self.__table__.columns will "only" give you the columns defined in that particular class, i.e. without inherited ones. If you need all of them, use self.__mapper__.columns. In your example I'd probably use something like this:
class JobStatus(Base):
    ...

    def __iter__(self):
        values = vars(self)
        for attr in self.__mapper__.columns.keys():
            if attr in values:
                yield attr, values[attr]

    def logme(self):
        return dict(self)

To get an as_dict method on all of my classes I used a Mixin class which uses the techniques described by Ants Aasma.
from sqlalchemy.orm import class_mapper, ColumnProperty

class BaseMixin(object):
    def as_dict(self):
        result = {}
        for prop in class_mapper(self.__class__).iterate_properties:
            if isinstance(prop, ColumnProperty):
                result[prop.key] = getattr(self, prop.key)
        return result
And then use it like this in your classes
class MyClass(BaseMixin, Base):
    pass
That way you can invoke the following on an instance of MyClass.
> myclass = MyClass()
> myclass.as_dict()
Hope this helps.
I've played around with this a bit further. I actually needed to render my instances as dicts in the form of a HAL object, with its links to related objects. So I've added this little bit of magic below, which crawls over all properties of the class as above, with the difference that it goes deeper into Relationship properties and generates links for these automatically.
Please note that this will only work for relationships that have a single primary key.
from sqlalchemy.orm import class_mapper, ColumnProperty, RelationshipProperty
from sqlalchemy.orm.collections import (
    InstrumentedList, InstrumentedDict, InstrumentedSet
)
from functools import reduce

def deepgetattr(obj, attr):
    """Recurses through an attribute chain to get the ultimate value."""
    return reduce(getattr, attr.split('.'), obj)

class BaseMixin(object):
    def as_dict(self):
        IgnoreInstrumented = (
            InstrumentedList, InstrumentedDict, InstrumentedSet
        )
        result = {}
        for prop in class_mapper(self.__class__).iterate_properties:
            if isinstance(getattr(self, prop.key), IgnoreInstrumented):
                # All reverse relations are assigned to each related instance;
                # we don't need to link these, so we skip them
                continue
            if isinstance(prop, ColumnProperty):
                # Add simple property to the dictionary with its value
                result[prop.key] = getattr(self, prop.key)
            if isinstance(prop, RelationshipProperty):
                # Construct links for relations
                if 'links' not in result:
                    result['links'] = {}
                # Get value using nested class keys
                value = (
                    deepgetattr(
                        self, prop.key + "." + prop.mapper.primary_key[0].key
                    )
                )
                result['links'][prop.key] = {}
                result['links'][prop.key]['href'] = (
                    "/{}/{}".format(prop.key, value)
                )
        return result

self.__dict__
returns a dict where keys are attribute names and values the values of the object.
/!\ there is a supplementary attribute: '_sa_instance_state'
but you can handle it :)

While row._asdict() worked for most of the cases, I needed some approach that also works after object creation process (db.session.add etc.). The idea is to create a method to_dict accessing columns on the table object and use standard getattr.
class Inventory(db.Model):
    __tablename__ = 'inventory'
    id = db.Column('id', db.Integer(), primary_key=True)
    date = db.Column('date', db.DateTime, nullable=False, default=datetime.utcnow)
    item = db.Column('item', db.String(100))

    def to_dict(self):
        return {
            column.name: getattr(self, column.name, None)
            for column in Inventory.__table__.columns
        }

record = Inventory(item="gloves")
db.session.add(record)
db.session.commit()
# print(record._asdict()) # << that doesn't work
print(record.to_dict()) # << that works as intended
This solution will produce dict with columns only - no meta attributes or anything that you will have to manually clean after the next major update (if any).
PS. I use flask-sqlalchemy but it does not change the idea

I want to get the data of a particular instance of Model dynamically. I used this code.
def to_json(instance):
    # get columns data
    data = {}
    columns = list(instance.__table__.columns)
    for column in columns:
        data[column.name] = instance.__dict__[column.name]
    return data

To map a model from sqlalchemy to a json, taking into account relationships, I use this code
from sqlalchemy.orm import class_mapper
from sqlalchemy.ext.declarative import DeclarativeMeta
from sqlalchemy.orm import ColumnProperty
from sqlalchemy.orm import RelationshipProperty

class BaseMixin(object):
    """BaseMixin"""
    __repr_hide = ["created_at", "updated_at"]
    __insert_hide = []

    @property
    def _repr_hide(self):
        return self.__repr_hide

    @_repr_hide.setter
    def _repr_hide(self, k):
        self.__repr_hide.append(k)

    @property
    def _insert_hide(self):
        return self.__insert_hide

    @_insert_hide.setter
    def _insert_hide(self, k):
        self.__insert_hide.append(k)

    def serialize(self, obj):
        """serialize from json"""
        for k, v in obj.items():
            if k in self.__repr_hide:
                continue
            if k in self.__insert_hide:
                continue
            if k in self.__table__.c.keys():
                setattr(self, k, v)
        return self

    def deserialize(self, backref=None):
        """deserialize to json"""
        res = dict()
        for prop in class_mapper(self.__class__).iterate_properties:
            if prop.key in self.__repr_hide:
                continue
            if isinstance(prop, ColumnProperty):
                res[prop.key] = getattr(self, prop.key)
        for prop in class_mapper(self.__class__).iterate_properties:
            if prop.key in self.__repr_hide:
                continue
            if isinstance(prop, RelationshipProperty):
                if prop.key == str(backref):
                    continue
                key, value = prop.key, getattr(self, prop.key)
                if value is None:
                    res[key] = None
                elif isinstance(value.__class__, DeclarativeMeta):
                    res[key] = value.deserialize(backref=self.__table__)
                else:
                    res[key] = [i.deserialize(backref=self.__table__) for i in value]
        return res

    def __iter__(self):
        return iter(self.deserialize().items())

    def __repr__(self):
        vals = ", ".join(
            "%s=%r" % (n, getattr(self, n))
            for n in self.__table__.c.keys()
            if n not in self._repr_hide
        )
        return "<%s={%s}>" % (self.__class__.__name__, vals)

I know this is an old question, but what about:
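class JobStatus(Base):
    ...

    def columns(self):
        # dir(self) yields attribute names (strings), never Column objects,
        # so read the column names from the mapped table instead
        return [col.name for col in self.__table__.columns]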
Then, to get column names: jobStatus.columns()
That would return ['id', 'desc']
Then you can loop through, and do stuff with the columns and values:
for col in jobStatus.columns():
    doStuff(getattr(jobStatus, col))
