I need to set up two tables in a database and I'm struggling to decide how to design them in SQLAlchemy.
Table 1 contains raw address data, and the source of the address. Raw addresses may appear more than once if they come from different sources.
Table 2 contains geocoded versions of these addresses. Each address appears only once. Addresses should only appear in this table if they appear at least once in Table 1.
When new addresses come into the system, they first will be inserted into Table 1. I will then have a script that looks for records in Table 1 that are not in Table 2, geocodes them and inserts them into Table 2.
I have the following code:
class RawAddress(Base):
    __tablename__ = 'rawaddresses'

    id = Column(Integer, primary_key=True)
    source_of_address = Column(String(50))
    # Want something like a foreign key here, but the address may not
    # yet exist in the geocoded address table
    full_address = Column(String(400))

class GeocodedAddress(Base):
    __tablename__ = 'geocodedaddresses'

    full_address = Column(String(400), primary_key=True)
    lat = Column(Float)
    lng = Column(Float)
Is there a way of establishing the relationship between the full_address fields in SQLAlchemy? Or perhaps I've got the design wrong: maybe every time I see a new raw address I should add it to the GeocodedAddress table, with a flag saying whether it's geocoded or not?
Thanks very much for any help with this.
Taking your comments into account, the code below should do the job for this data storage as well as the insert/update process. A few notes before it:
Foreign keys can be NULL, so your FK idea still works.
You can define the relationship on either model and name the other side with backref.
Code:
# Model definitions
class RawAddress(Base):
    __tablename__ = 'rawaddresses'

    id = Column(Integer, primary_key=True)
    source_of_address = Column(String(50))
    full_address = Column(
        String(400),
        ForeignKey('geocodedaddresses.full_address'),
        nullable=True,
    )

class GeocodedAddress(Base):
    __tablename__ = 'geocodedaddresses'

    full_address = Column(String(400), primary_key=True)
    lat = Column(Float)
    lng = Column(Float)

    raw_addresses = relationship(RawAddress, backref="geocoded_address")
now:
# logic
def get_geo(full_address):
    """Dummy geocoder: derive fake lat/lng from full_address using hash()."""
    hs = hash(full_address)
    return (hs >> 8) & 0xff, hs & 0xff
def add_test_data(addresses):
    with session.begin():
        for fa in addresses:
            session.add(RawAddress(full_address=fa))

def add_geo_info():
    with session.begin():
        q = (session
             .query(RawAddress)
             .filter(~RawAddress.geocoded_address.has())
             )
        for ra in q.all():
            print("Computing geo for: {}".format(ra))
            lat, lng = get_geo(ra.full_address)
            ra.geocoded_address = GeocodedAddress(
                full_address=ra.full_address, lat=lat, lng=lng)
and some tests:
# step-1: add some raw addresses
add_test_data(['Paris', 'somewhere in Nevada'])
print("-"*80)
# step-2: get those raw which do not have geo
add_geo_info()
print("-"*80)
# step-1: again with 1 new, 1 same
add_test_data(['Paris', 'somewhere in Chicago'])
print("-"*80)
# step-2: get those raw which do not have geo
add_geo_info()
print("-"*80)
# check: print all data for Paris geo:
gp = session.query(GeocodedAddress).filter(GeocodedAddress.full_address == 'Paris').one()
assert 2 == len(gp.raw_addresses)
print(gp.raw_addresses)
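For completeness, the snippets above assume that Base, session, and the tables already exist. Here is a minimal sketch of that setup; the in-memory SQLite URL is an assumption for illustration, point create_engine at your real database:

```python
from sqlalchemy import create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()          # the model classes subclass this

engine = create_engine("sqlite://")  # assumption: in-memory SQLite for demo
Session = sessionmaker(bind=engine)
session = Session()

# After the model classes are defined, create the tables once:
# Base.metadata.create_all(engine)
```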
In a project using Flask-SQLAlchemy, I get some intermittent errors, and I think it might be due to not explicitly using transactions.
I have these two model classes, one for locations and another for closures:
class Location(db.Model):
    id = sa.Column(sa.Integer, primary_key=True)
    name = sa.Column(sa.String)
    code = sa.Column(sa.String, unique=True)

class LocationPath(db.Model):
    ancestor_id = sa.Column(sa.Integer, sa.ForeignKey('location.id'), nullable=False, primary_key=True)
    descendant_id = sa.Column(sa.Integer, sa.ForeignKey('location.id'), nullable=False, primary_key=True)
    depth = sa.Column(sa.Integer, default=0, nullable=False)
In a background process, I'm doing a lot of inserts, so I'm bypassing the ORM to use Core:
location_table = Location.__table__
location_path_table = LocationPath.__table__

statement = select([location_table.c.id]).where(location_table.c.code == code)
result = db.session.get_bind().execute(statement)
location_id = result.first()

if location_id is None:
    statement = location_table.insert().values(**kwargs)
    result = db.session.get_bind().execute(statement)
    new_id = result.inserted_primary_key[0]
    result.close()
else:
    new_id = location_id[0]

# save new_id as an ancestor_id or a descendant_id

path = LocationPath.query.filter_by(
    ancestor_id=ancestor_id,
    descendant_id=descendant_id
).first()

if path is None:
    statement = location_path_table.insert().values(
        ancestor_id=ancestor_id,
        descendant_id=descendant_id,
        depth=depth)
    # the line below intermittently generates either of two errors:
    # - the inserted primary key (ancestor/descendant) does not exist
    # - a duplicate key error where the path already exists
    result = db.session.get_bind().execute(statement)
This has resulted in quite a bit of head-scratching on my part, since I get the ancestor_id or descendant_id either from a select or an insert, and I also query the database to see if the path exists before attempting to insert it.
Edit: the code above runs in a loop.
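Not an answer to the intermittent errors as such, but the get-or-create step described above can be kept in a single transaction, flushing instead of committing between the select and the insert. A minimal sketch with a stand-in Location model; the model shape, function name, and SQLite URL are assumptions:

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Location(Base):
    __tablename__ = "location"
    id = Column(Integer, primary_key=True)
    code = Column(String, unique=True)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

def get_or_create_location(session, code):
    # Select and insert both run in the caller's open transaction,
    # so no commit can land between them from this session.
    loc = session.query(Location).filter_by(code=code).first()
    if loc is None:
        loc = Location(code=code)
        session.add(loc)
        session.flush()  # assigns loc.id without ending the transaction
    return loc.id

with session.begin():
    first = get_or_create_location(session, "NYC")
    second = get_or_create_location(session, "NYC")  # reuses the same row
```

Both calls resolve to the same row, and only one row exists after commit.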
I have two tables, Products and Orders, inside my Flask-SQLAlchemy setup, and they are linked so an order can have several products:
class Products(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    ....

class Orders(db.Model):
    guid = db.Column(db.String(36), default=generate_uuid, primary_key=True)
    products = db.relationship(
        "Products", secondary=order_products_table, backref="orders")
    ....
linked via:
order_products_table = db.Table("order_products_table",
    db.Column('orders_guid', db.String(36), db.ForeignKey('orders.guid')),
    db.Column('products_id', db.Integer, db.ForeignKey('products.id'))
    # db.Column('license', db.String(36))
)
For my purposes, each product in an order will receive a unique license string, which logically should be added to the order_products_table rows of each product in an order.
How do I declare this third license column on the join table order_products_table so it gets populated as I insert an Order?
I've since found the documentation for the Association Object from the SQLAlchemy docs, which allows for exactly this expansion to the join table.
Updated setup:
# Instead of a table, provide a model for the JOIN table with additional fields
# and explicit keys and back_populates:
class OrderProducts(db.Model):
    __tablename__ = 'order_products_table'
    orders_guid = db.Column(db.String(36), db.ForeignKey(
        'orders.guid'), primary_key=True)
    products_id = db.Column(db.Integer, db.ForeignKey(
        'products.id'), primary_key=True)
    licenses = db.Column(db.String(36), nullable=False)
    order = db.relationship("Orders", back_populates="products")
    products = db.relationship("Products", back_populates="order")

class Products(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    order = db.relationship(OrderProducts, back_populates="products")
    ....

class Orders(db.Model):
    guid = db.Column(db.String(36), default=generate_uuid, primary_key=True)
    products = db.relationship(OrderProducts, back_populates="order")
    ....
What is really tricky (but also shown on the documentation page) is how you insert the data. In my case it goes something like this:
o = Orders(...)  # insert other data
for id in products:
    # Create an OrderProducts join row with the extra data, e.g. licenses
    join = OrderProducts(licenses="Foo")
    # To the JOIN row, add the product
    join.products = Products.query.get(id)
    # Add the populated JOIN row to the Order's products
    o.products.append(join)
# Finally commit to the database
db.session.add(o)
db.session.commit()
I was at first trying to populate Order.products (o.products in the example code) directly, which gives an error about using a Products instance where an OrderProducts instance is expected.
I also struggled with the field naming and referencing in back_populates. Again, the example above and the docs show this. Note the pluralization is entirely down to how you want your fields named.
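To round this off, here is a self-contained sketch of the same association-object pattern in plain SQLAlchemy (no Flask), showing how the extra license value is read back; the in-memory database, table names for Products/Orders, and sample values are assumptions:

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, relationship, sessionmaker

Base = declarative_base()

class OrderProducts(Base):
    __tablename__ = "order_products_table"
    orders_guid = Column(String(36), ForeignKey("orders.guid"), primary_key=True)
    products_id = Column(Integer, ForeignKey("products.id"), primary_key=True)
    licenses = Column(String(36), nullable=False)  # the extra join-table column
    order = relationship("Orders", back_populates="products")
    products = relationship("Products", back_populates="order")

class Products(Base):
    __tablename__ = "products"
    id = Column(Integer, primary_key=True)
    order = relationship(OrderProducts, back_populates="products")

class Orders(Base):
    __tablename__ = "orders"
    guid = Column(String(36), primary_key=True)
    products = relationship(OrderProducts, back_populates="order")

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

o = Orders(guid="order-1")
link = OrderProducts(licenses="Foo")   # the join row carries the license
link.products = Products(id=1)         # attach the product to the join row
o.products.append(link)                # attach the join row to the order
session.add(o)
session.commit()

# The license lives on the association row, keyed by order and product:
row = session.query(OrderProducts).filter_by(
    orders_guid="order-1", products_id=1).one()
```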
First of all, I would like to apologize, as my SQL knowledge is still very basic. The problem is the following: I have two distinct tables with no direct relationship between them, but they share two columns: storm_id and userid.
Basically, I would like to query all posts for a storm_id that are not from a banned user, with some extra filters.
Here are the models:
Post
class Post(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    ...
    userid = db.Column(db.String(100))
    ...
    storm_id = db.Column(db.Integer, db.ForeignKey('storm.id'))
Banneduser
class Banneduser(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    sn = db.Column(db.String(60))
    userid = db.Column(db.String(100))
    name = db.Column(db.String(60))
    storm_id = db.Column(db.Integer, db.ForeignKey('storm.id'))
Both Post and Banneduser are children of another table (Storm). Here is the query I am trying to build. As you can see, I am trying to filter:
verified posts
by descending order
with a limit (I put it apart from the query, as the elif applies other filters)
# we query banned users' ids
bannedusers = db.session.query(Banneduser.userid)

# we build the query except the limit, as the if..elif applies more filters
joined = (db.session.query(Post, Banneduser)
          .filter(Post.storm_id == stormid)
          .filter(Post.verified == True)
          # here comes the trouble
          .filter(~Post.userid.in_(bannedusers))
          .order_by(Post.timenow.desc()))

try:
    if contentsettings.filterby == 'all':
        posts = joined.limit(contentsettings.maxposts)
        print(posts.all())
        # I am not sure if this is pythonic
        posts = [item[0] for item in posts]
        return render_template("stream.html", storm=storm, wall=posts)
    elif ...:  # other queries
I got two problems, one basic and one underlying problem:
1/ .filter(~Post.userid.in_(bannedusers)) gives one row EACH TIME Post.userid is not in bannedusers, so I get N repeated posts. I tried to filter this with distinct, but it does not work.
2/ Underlying problem: I am not sure if my approach is the correct one (the DB model structure/relationships plus the queries).
Use SQL EXISTS. Your query should be like this:

db.session.query(Post)\
    .filter(Post.storm_id == stormid)\
    .filter(Post.verified == True)\
    .filter(~exists().where(and_(Banneduser.userid == Post.userid,
                                 Banneduser.storm_id == Post.storm_id)))\
    .order_by(Post.timenow.desc())

Querying only Post (rather than the Post, Banneduser pair) is what removes the duplicated rows, and the NOT EXISTS is correlated on both userid and storm_id, so only posts by users banned in that storm are excluded (exists and and_ are imported from sqlalchemy).
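A minimal runnable sketch of this correlated NOT EXISTS, using simplified stand-in models (the sample users and in-memory database are assumptions):

```python
from sqlalchemy import Column, Integer, String, and_, create_engine, exists
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Post(Base):
    __tablename__ = "post"
    id = Column(Integer, primary_key=True)
    userid = Column(String(100))
    storm_id = Column(Integer)

class Banneduser(Base):
    __tablename__ = "banneduser"
    id = Column(Integer, primary_key=True)
    userid = Column(String(100))
    storm_id = Column(Integer)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

session.add_all([
    Post(userid="alice", storm_id=1),
    Post(userid="bob", storm_id=1),
    Banneduser(userid="bob", storm_id=1),  # bob is banned in storm 1
])
session.commit()

# Posts in storm 1 whose author is not banned in that storm:
posts = (
    session.query(Post)
    .filter(Post.storm_id == 1)
    .filter(~exists().where(and_(Banneduser.userid == Post.userid,
                                 Banneduser.storm_id == Post.storm_id)))
    .all()
)
```

Each matching post appears exactly once, since only Post is selected.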
I am writing a script to synchronize AdWords accounts and a local database with SQLAlchemy.
I am following the object hierarchy of the AdWords API, so my first table is 'campaigns' and the second is 'adgroups'. Here is how I define the two:
class Campaign(Base):
    __tablename__ = 'aw_campaigns'

    id = Column(Integer, primary_key=True)
    name = Column(String(99))
    impressions = Column(Integer)
    serving_status = Column(String(99))
    start_date = Column(String(99))
    status = Column(String(99))

    def __init__(self, id, name, impressions, serving_status, start_date, status):
        self.id = id
        self.name = name
        self.impressions = impressions
        self.serving_status = serving_status
        self.start_date = start_date
        self.status = status

class Adgroup(Base):
    __tablename__ = 'aw_adgroups'

    id = Column(Integer, primary_key=True)
    name = Column(String(99))
    camp_id = Column(Integer, ForeignKey('aw_campaigns.id'))
    camp_name = Column(String(99))
    ctr = Column(Float)
    cost = Column(Float)
    impressions = Column(Integer)
    clicks = Column(Integer)
    status = Column(String(99))

    def __init__(self, id, name, camp_id, camp_name, ctr, cost, impressions, clicks, status):
        self.id = id
        self.name = name
        self.camp_id = camp_id
        self.camp_name = camp_name
        self.ctr = ctr
        self.cost = cost
        self.impressions = impressions
        self.clicks = clicks
        self.status = status
I query the API, and then build the list of objects for the lines in the Adgroup table:
adgr_query = 'SELECT CampaignId, CampaignName, Clicks, Cost, Impressions, Ctr, Id, KeywordMaxCpc, Name, Settings, Status'
adgr_page = ad_group_serv.Query(adgr_query)[0]['entries']

adgr_ins = [Adgroup(i['id'],
                    i['name'],
                    i['campaignId'],
                    i['campaignName'],
                    i['stats']['ctr'],
                    i['stats']['cost']['microAmount'],
                    i['stats']['impressions'],
                    i['stats']['clicks'],
                    i['status']) for i in adgr_page if int(i['id']) not in adgr_exist]
but when I commit I get the error:
(IntegrityError) (1062, "Duplicate entry '2147483647' for key 'PRIMARY'")
The problem is that I have no idea where that value is from.
>>> '2147483647' in [i['id'] for i in adgr_page]
False
>>> '2147483647' in str(adgr_page)
False
I am really stuck on this.
Looks like you have an integer overflow somewhere.
The symptom: 2147483647 is 2**31 - 1, which indicates that 32 bits were used to store the number.
The AdGroup.Id field has type xsd:long, which is 64 bits wide.
Python itself has no limit on the size of an integer, but the database may have such a limit.
Short solution:
Try the BigInteger SQL type: id = Column(BigInteger, primary_key=True), and the same for camp_id and the rest of the xsd:long values coming from the AdWords API. There is a good chance SQLAlchemy will pick a database-specific big integer column type. Alternatively, you can use String(64) as the type for id, but in that case you'll need an extra step to generate the primary key.
How many entries does your query to the AdWords API return? Are there more than 2**32 records? I doubt it: it is unlikely that your database could handle ~4.3 billion records.
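A quick sketch of the BigInteger fix; the trimmed-down model and sample row are assumptions:

```python
from sqlalchemy import BigInteger, Column, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Adgroup(Base):
    __tablename__ = "aw_adgroups"
    # BigInteger maps to a 64-bit integer type on backends that
    # distinguish widths (e.g. MySQL BIGINT), so xsd:long ids fit.
    id = Column(BigInteger, primary_key=True)
    name = Column(String(99))

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

# 2147483648 == 2**31, one past the signed 32-bit maximum
session.add(Adgroup(id=2147483648, name="past the 32-bit max"))
session.commit()
ag = session.get(Adgroup, 2147483648)
```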
Solution 2 - long term
I would suggest not trusting primary key integrity to an external source. Instead, rely on the database to generate primary keys with autoincrement, and on SQLAlchemy to populate foreign keys from those database-generated primary keys:
class Adgroup(Base):
    __tablename__ = 'aw_adgroups'

    id = Column(Integer, Sequence('adgroup_seq'), primary_key=True)
    adGroupId = Column(String(64))
    campaignId = Column(Integer, ForeignKey('aw_campaigns.id'))
    campaign = relationship("Campaign", backref="adgroup")
    ...

class Campaign(Base):
    __tablename__ = 'aw_campaigns'

    id = Column(Integer, Sequence('adgroup_seq'), primary_key=True)
    campaignId = Column(String(64))
    ...
It also looks like you may need to look records up by campaignId and adGroupId, so you can add indexes on them.
Then you create your Campaign and AdGroup objects and just add the relations between them. The code will depend on the type of relationship you want to use: one-to-many or many-to-many. Check the SQLAlchemy relationship manual for more details.
ag = AdGroup(**kwargs)
camp = Campaign(**kwargs)
ag.campaign = camp
session.add(ag)
Hi, I have a simple question: I have two tables (addresses and users; a user has one address, and a lot of users can live at the same address). I created a SQLAlchemy mapping like this, and I hit an error when I get my session and try to query (details below):
class Person(object):
    '''
    classdocs
    '''
    idPerson = Column("idPerson", Integer, primary_key=True)
    name = Column("name", String)
    surname = Column("surname", String)
    idAddress = Column("idAddress", Integer, ForeignKey("pAddress.idAddress"))
    idState = Column("idState", Integer, ForeignKey("pState.idState"))

    Address = relationship(Address, primaryjoin=idAddress==Address.idAddress)

class Address(object):
    '''
    Class to represent table address object
    '''
    idAddress = Column("idAddress", Integer, primary_key=True)
    street = Column("street", String)
    number = Column("number", Integer)
    postcode = Column("postcode", Integer)
    country = Column("country", String)

    residents = relationship("Person", order_by="desc(Person.surname, Person.name)", primaryjoin="idAddress=Person.idPerson")
self.tablePerson = sqlalchemy.Table("pPerson", self.metadata, autoload=True)
sqlalchemy.orm.mapper(Person, self.tablePerson)
self.tableAddress = sqlalchemy.Table("pAddress", self.metadata, autoload=True)
sqlalchemy.orm.mapper(Address, self.tableAddress)
myaddress = session.query(Address).get(1)
print myaddress.residents[1].name
=> I get TypeError: 'RelationshipProperty' object does not support indexing
I understand residents is there to define the relationship, but how can I get the list of residents assigned to the given address?
Thanks
You define the relationship in the wrong place. I think you are mixing the Declarative extension with non-declarative use:
when using declarative, you define your relations in your model;
otherwise, you define them when mapping the model to a table.
If option 2 is what you are doing, then you need to remove both relationship definitions from the models and add one to the mapper (one side is enough):
mapper(Address, tableAddress, properties={
    'residents': relationship(Person,
                              order_by=(desc(Person.name), desc(Person.surname)),
                              backref="Address"),
})
A few more things about the code above:
The relation is defined on one side only; the backref takes care of the other side.
You do not need to specify the primaryjoin (as long as you have a ForeignKey specified and SA is able to infer the columns).
Your order_by configuration is not correct; see the code above for a version that works.
Alternatively, with declarative, you might try defining Person after Address, with a backref to Address; this creates the residents collection:
class Address(Base):
    __tablename__ = 'address_table'
    idAddress = Column("idAddress", Integer, primary_key=True)
    ...

class Person(Base):
    __tablename__ = 'person_table'
    idPerson = Column("idPerson", Integer, primary_key=True)
    ...
    address_id = Column(Integer, ForeignKey('address_table.idAddress'))
    address = relationship(Address, backref='residents')
Then you can query:
myaddress = session.query(Address).get(1)
for resident in myaddress.residents:
    print resident.name
Further, if you have a lot of residents at an address, you can narrow the result with a join:
resultset = session.query(Address).join(Address.residents).filter(Person.name == 'Joe')
# or
resultset = session.query(Person).filter(Person.name == 'Joe').join(Person.address).filter(Address.state == 'NY')
and then use resultset.first(), resultset[0], etc.
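The backref pattern above can be exercised end to end. A self-contained sketch (Python 3, declarative style; the table names, sample people, and in-memory database are assumptions):

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, relationship, sessionmaker

Base = declarative_base()

class Address(Base):
    __tablename__ = "address_table"
    idAddress = Column(Integer, primary_key=True)
    street = Column(String)

class Person(Base):
    __tablename__ = "person_table"
    idPerson = Column(Integer, primary_key=True)
    name = Column(String)
    address_id = Column(Integer, ForeignKey("address_table.idAddress"))
    address = relationship(Address, backref="residents")

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

home = Address(idAddress=1, street="Main St")
session.add_all([home,
                 Person(idPerson=1, name="Joe", address=home),
                 Person(idPerson=2, name="Ann", address=home)])
session.commit()

addr = session.get(Address, 1)
names = [p.name for p in addr.residents]  # backref gives a plain list

# Narrowing with a join, as in the answer:
joes = (session.query(Address).join(Address.residents)
        .filter(Person.name == "Joe").all())
```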