SQLalchemy duplicate entry error for unknown value - python

I am writing a script to synchronize Adwords accounts and a local database with SQLAlchemy.
I am following the object hierarchy of the Adwords API, so my first table is 'campaigns' and the second is 'adgroups'.
Here is how I define the two:
class Campaign(Base):
    __tablename__ = 'aw_campaigns'
    id = Column(Integer, primary_key=True)
    name = Column(String(99))
    impressions = Column(Integer)
    serving_status = Column(String(99))
    start_date = Column(String(99))
    status = Column(String(99))
    def __init__(self, id, name, impressions, serving_status, start_date, status):
        self.id = id
        self.name = name
        self.impressions = impressions
        self.serving_status = serving_status
        self.start_date = start_date
        self.status = status
class Adgroup(Base):
    __tablename__ = 'aw_adgroups'
    id = Column(Integer, primary_key=True) # , primary_key=True
    name = Column(String(99))
    camp_id = Column(Integer, ForeignKey('aw_campaigns.id')) # , ForeignKey('aw_campaigns.id')
    camp_name = Column(String(99))
    ctr = Column(Float)
    cost = Column(Float)
    impressions = Column(Integer)
    clicks = Column(Integer)
    status = Column(String(99))
    def __init__(self, id, name, camp_id, camp_name, ctr, cost, impressions, clicks, status):
        self.id = id
        self.name = name
        self.camp_id = camp_id
        self.camp_name = camp_name
        self.ctr = ctr
        self.cost = cost
        self.impressions = impressions
        self.clicks = clicks
        self.status = status
I query the API, and then build the list of objects for the lines in the Adgroup table:
adgr_query = 'SELECT CampaignId, CampaignName, Clicks, Cost, Impressions, Ctr, Id, KeywordMaxCpc, Name, Settings, Status'
adgr_page = ad_group_serv.Query(adgr_query)[0]['entries']
adgr_ins = [Adgroup(i['id'],
                    i['name'],
                    i['campaignId'],
                    i['campaignName'],
                    i['stats']['ctr'],
                    i['stats']['cost']['microAmount'],
                    i['stats']['impressions'],
                    i['stats']['clicks'],
                    i['status']) for i in adgr_page if int(i['id']) not in adgr_exist]
but when I commit I get the error:
(IntegrityError) (1062, "Duplicate entry '2147483647' for key 'PRIMARY'")
The problem is that I have no idea where that value is from.
'2147483647' in [i['id'] for i in adgr_page]
>>> False
'2147483647' in str(adgr_page)
>>> False
I am really stuck on this.

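It looks like you have an integer overflow somewhere.
The symptom: 2147483647 is 2**31-1, the largest value a signed 32-bit integer can hold, which indicates that 32 bits were used to store the number.
The AdGroup.Id field has type xsd:long, which is 64 bits long.
Python itself has no limit on the size of an integer value, but the database may have one. The error code 1062 suggests MySQL, which in non-strict SQL mode clamps an out-of-range value to the column maximum; that would explain why 2147483647 never appears in your data.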
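The arithmetic behind that symptom can be checked quickly. This is only a minimal sketch: the helper name fits_in_int32 and the sample ID are made up for illustration and do not come from the AdWords API.

```python
# Largest value a signed 32-bit integer column can hold.
INT32_MAX = 2**31 - 1


def fits_in_int32(value: int) -> bool:
    """Return True if value fits in a signed 32-bit integer column."""
    return -(2**31) <= value <= INT32_MAX


# Hypothetical 64-bit ID, larger than INT32_MAX.
sample_id = 9_876_543_210

print(INT32_MAX)                 # 2147483647
print(fits_in_int32(sample_id))  # False
```

Running a check like this over the IDs returned by the API would show whether any of them are too large for an Integer column.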
Short solution:
Try the BigInteger SQL type: id = Column(BigInteger, primary_key=True), and the same for camp_id and the other xsd:long values coming from the AdWords API. SQLAlchemy should then pick a database-specific big integer column type. Alternatively, you can use String(64) as the type for id, but in that case you'll need an extra step to generate the primary key.
How many entries does your query to the AdWords API return? Are there more than 2**32 records? I doubt it; it is unlikely that your database would be able to handle ~4,200 million records.
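As a rough illustration of the short solution, here is a cut-down model with a BigInteger primary key. It assumes SQLAlchemy 1.4 or newer and uses an in-memory SQLite database only so that it runs anywhere; the ID value is made up. SQLite stores 64-bit integers regardless of the declared type, so this shows the declaration and a round trip rather than the MySQL overflow itself.

```python
from sqlalchemy import BigInteger, Column, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Adgroup(Base):
    __tablename__ = "aw_adgroups"
    # BigInteger asks the backend for a 64-bit integer type (BIGINT on MySQL).
    id = Column(BigInteger, primary_key=True, autoincrement=False)
    name = Column(String(99))


# In-memory SQLite, assumed here purely for demonstration.
engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

big_id = 3_000_000_000  # hypothetical ID above 2**31 - 1

with Session(engine) as session:
    session.add(Adgroup(id=big_id, name="example"))
    session.commit()
    stored_id = session.get(Adgroup, big_id).id

print(stored_id)
```

An existing table would also need its column type altered in the database; changing the model alone does not do that.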
Solution 2 - long term
However, I would suggest not trusting an external source for primary key integrity. Instead, let the database generate the primary key with autoincrement, and let SQLAlchemy populate the foreign keys based on the database-generated primary keys:
class Adgroup(Base):
    __tablename__ = 'aw_adgroups'
    id = Column(Integer, Sequence('adgroup_seq'), primary_key=True)
    adGroupId = Column(String(64))
    campaignId = Column(Integer, ForeignKey('aw_campaigns.id'))
    campaign = relationship("Campaign", backref="adgroup")
    ...
class Campaign(Base):
    __tablename__ = 'aw_campaigns'
    # a separate sequence per table
    id = Column(Integer, Sequence('campaign_seq'), primary_key=True)
    campaignId = Column(String(64))
    ...
It also looks like you may need to look up rows by campaignId and adGroupId, so you can add indexes on them.
Then you create your Campaign and Adgroup objects and just add relations between them. The code will depend on the type of relationship you want to use: one-to-many or many-to-many. Check the SQLAlchemy relationship manual for more details.
ag = Adgroup(**kwargs)
camp = Campaign(**kwargs)
ag.campaign = camp
session.add(ag)
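Putting the long-term suggestion together, a self-contained sketch could look like the following. It assumes SQLAlchemy 1.4+ and an in-memory SQLite database. The column names aw_campaign_id and aw_adgroup_id and the ID strings are invented for the example, and a plain autoincrementing Integer key is used instead of an explicit Sequence.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Campaign(Base):
    __tablename__ = "aw_campaigns"
    id = Column(Integer, primary_key=True)           # generated by the database
    aw_campaign_id = Column(String(64), index=True)  # ID reported by the API
    adgroups = relationship("Adgroup", back_populates="campaign")


class Adgroup(Base):
    __tablename__ = "aw_adgroups"
    id = Column(Integer, primary_key=True)
    aw_adgroup_id = Column(String(64), index=True)
    campaign_id = Column(Integer, ForeignKey("aw_campaigns.id"))
    campaign = relationship("Campaign", back_populates="adgroups")


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    camp = Campaign(aw_campaign_id="9876543210")
    ag = Adgroup(aw_adgroup_id="1234567890123", campaign=camp)
    session.add(ag)  # the campaign is saved too, via the default cascade
    session.commit()
    # SQLAlchemy filled in the foreign key from the generated primary key.
    linked = ag.campaign_id == camp.id

print(linked)  # True
```

The external IDs are kept as indexed strings, so they can still be used for lookups without acting as primary keys.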

Related

AttributeError: 'Query' object has no attribute 'is_clause_element' when joining table with query

I have a query that counts the number of keywords each company has and then sorts the companies by that count.
query_company_ids = (
    Session.query(
        enjordplatformCompanyToKeywords.company_id.label("company_id"),
        func.count(enjordplatformCompanyToKeywords.keyword_id),
    )
    .group_by(enjordplatformCompanyToKeywords.company_id)
    .order_by(desc(func.count(enjordplatformCompanyToKeywords.keyword_id)))
    .limit(20)
)
I then want to get information about these companies, such as image, title and info, and send it to the frontend (this is done later by looping through companies_query).
However, I am having trouble connecting the query_company_ids query to the enjordplatformCompanies table.
I have tried two ways of doing this:
companies_query = Session.query(enjordplatformCompanies, query_company_ids).filter(enjordplatformCompanies.id == query_company_ids.company_id).all()
companies_query = Session.query(enjordplatformCompanies, query_company_ids).join( query_company_ids, query_company_ids.c.company_id == enjordplatformCompanies.id).all()
But both of them result in the error: AttributeError: 'Query' object has no attribute 'is_clause_element'
Question
How can I join the query_company_ids query and enjordplatformCompanies table?
Thanks
Here are the table definitions
class enjordplatformCompanies(Base):
    __tablename__ = "enjordplatform_companies"
    id = Column(Integer, primary_key=True, unique=True)
    name = Column(String)
    about = Column(String)
    image = Column(String)
    website = Column(String)
    week_added = Column(Integer)
    year_added = Column(Integer)
    datetime_added = Column(DateTime)
    created_by_userid = Column(Integer)
    company_type = Column(String)
    contact_email = Column(String)
    adress = Column(String)
    city_code = Column(String)
    city = Column(String)
class enjordplatformCompanyToKeywords(Base):
    __tablename__ = "enjordplatform_company_to_keywords"
    id = Column(Integer, primary_key=True, unique=True)
    company_id = Column(Integer, ForeignKey("enjordplatform_companies.id"))
    keyword_id = Column(Integer, ForeignKey("enjordplatform_keywords.id"))
I copied your example query above and was getting a lot of weird errors until I realized you use Session instead of session. I guess make sure you are using an instance instead of the class or sessionmaker.
Below I create an explicit subquery() to get the company id paired with its keyword count and then I join the companies class against that, applying the order and limit to the final query.
with Session(engine) as session, session.begin():
    subq = session.query(
        enjordplatformCompanyToKeywords.company_id,
        func.count(enjordplatformCompanyToKeywords.keyword_id).label('keyword_count')
    ).group_by(
        enjordplatformCompanyToKeywords.company_id
    ).subquery()
    q = session.query(
        enjordplatformCompanies,
        subq.c.keyword_count
    ).join(
        subq,
        enjordplatformCompanies.id == subq.c.company_id
    ).order_by(
        desc(subq.c.keyword_count)
    )
    for company, keyword_count in q.limit(20).all():
        print(company.name, keyword_count)
This isn't the exact method but explains the intention of calling .subquery() above:
subquery
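To make the intent of the subquery join more concrete, here is roughly the SQL shape it corresponds to, run through the standard-library sqlite3 module. This is only an illustration: the simplified table names and the sample rows are invented, and the SQL that SQLAlchemy actually emits will differ in naming and detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE company_to_keywords (
        id INTEGER PRIMARY KEY, company_id INTEGER, keyword_id INTEGER
    );
    INSERT INTO companies (id, name) VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO company_to_keywords (company_id, keyword_id)
        VALUES (1, 10), (2, 10), (2, 11), (2, 12);
""")

# The inner SELECT plays the role of .subquery(): one row per company
# with its keyword count. The outer query joins companies against it.
rows = conn.execute("""
    SELECT c.name, k.keyword_count
    FROM companies AS c
    JOIN (
        SELECT company_id, COUNT(keyword_id) AS keyword_count
        FROM company_to_keywords
        GROUP BY company_id
    ) AS k ON c.id = k.company_id
    ORDER BY k.keyword_count DESC
    LIMIT 20
""").fetchall()

print(rows)  # [('Globex', 3), ('Acme', 1)]
```

The key point is that the grouped counts become a derived table with its own columns, which is what subq.c.company_id and subq.c.keyword_count refer to on the SQLAlchemy side.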

(Flask-)SQLAlchemy primary key issues probably due to implicit transactions

In a project using Flask-SQLAlchemy, I get some intermittent errors, and I think they might be due to not explicitly using transactions.
I have these two model classes, one for locations and another for closures:
class Location(db.Model):
    id = sa.Column(sa.Integer, primary_key=True)
    name = sa.Column(sa.String)
    code = sa.Column(sa.String, unique=True)
class LocationPath(db.Model):
    ancestor_id = sa.Column(sa.Integer, sa.ForeignKey('location.id'), nullable=False, primary_key=True)
    descendant_id = sa.Column(sa.Integer, sa.ForeignKey('location.id'), nullable=False, primary_key=True)
    depth = sa.Column(sa.Integer, default=0, nullable=False)
In a background process, I'm doing a lot of inserts, so I'm bypassing the ORM to use Core:
location_table = Location.__table__
location_path_table = LocationPath.__table__
statement = select([location_table.c.id]).where(code == code)
result = db.session.get_bind().execute(statement)
location_id = result.first()
if location_id is None:
    statement = location_table.insert().values(**kwargs)
    result = db.session.get_bind().execute(statement)
    new_id = result.inserted_primary_key[0]
    result.close()
else:
    new_id = location_id
# save new_id as an ancestor_id or a descendant_id
path = LocationPath.query.filter_by(
    ancestor_id=ancestor_id,
    descendant_id=descendant_id
).first()
if path is None:
    statement = location_path_table.insert().values(
        ancestor_id=ancestor_id,
        descendant_id=descendant_id,
        depth=depth)
    # the line below intermittently generates either of two errors:
    # - the inserted primary key (ancestor/descendant) does not exist
    # - a duplicate key error where the path already exists
    result = db.session.get_bind().execute(statement)
This has resulted in quite a bit of head-scratching on my part, since I get the ancestor_id or descendant_id either from a select or an insert, and I also query the database to see if the path exists before attempting to insert it.
Edit: the code above runs in a loop.

SQL alchemy - specifying relationships when foreign key may not exist

I need to set up two tables in a database and I'm struggling to decide how to design the tables in SQL Alchemy.
Table 1 contains raw address data, and the source of the address. Raw addresses may appear more than once if they come from different sources.
Table 2 contains geocoded versions of these addresses. Each address appears only once. Addresses should only appear in this table if they appear at least once in Table 1
When new addresses come into the system, they first will be inserted into Table 1. I will then have a script that looks for records in Table 1 that are not in Table 2, geocodes them and inserts them into Table 2.
I have the following code:
class RawAddress(Base):
    __tablename__ = 'rawaddresses'
    id = Column(Integer, primary_key=True)
    source_of_address = Column(String(50))
    # Want something like a foreign key here, but address may not yet exist
    # in geocoded address table
    full_address = Column(String(400))
class GeocodedAddress(Base):
    __tablename__ = 'geocodedaddresses'
    full_address = Column(String(400), primary_key=True)
    lat = Column(Float)
    lng = Column(Float)
Is there a way of establishing the relationship between the full_address fields in SQLAlchemy? Or perhaps I've got the design wrong: maybe every time I see a new raw address I should add it to the GeocodedAddress table, with a flag saying whether it's geocoded or not?
Thanks very much for any help with this.
Taking your comments into account, the code below should allow this kind of data storage as well as the insert/update process. A few comments first:
Foreign keys can be NULL, so your FK idea still works.
You can define the relationship on either model and name the other side with backref.
Code:
# Model definitions
class RawAddress(Base):
    __tablename__ = 'rawaddresses'
    id = Column(Integer, primary_key=True)
    source_of_address = Column(String(50))
    full_address = Column(
        ForeignKey('geocodedaddresses.full_address'),
        nullable=True,
    )
class GeocodedAddress(Base):
    __tablename__ = 'geocodedaddresses'
    full_address = Column(String(400), primary_key=True)
    lat = Column(Float)
    lng = Column(Float)
    raw_addresses = relationship(RawAddress, backref="geocoded_address")
now:
# logic
def get_geo(full_address):
    " Dummy function which fakes `full_address` and get lat/lng using hash(). "
    hs = hash(full_address)
    return (hs >> 8) & 0xff, hs & 0xff
def add_test_data(addresses):
    with session.begin():
        for fa in addresses:
            session.add(RawAddress(full_address=fa))
def add_geo_info():
    with session.begin():
        q = (session
             .query(RawAddress)
             .filter(~RawAddress.geocoded_address.has())
             )
        for ra in q.all():
            print("Computing geo for: {}".format(ra))
            lat, lng = get_geo(ra.full_address)
            ra.geocoded_address = GeocodedAddress(
                full_address=ra.full_address, lat=lat, lng=lng)
and some tests:
# step-1: add some raw addresses
add_test_data(['Paris', 'somewhere in Nevada'])
print("-"*80)
# step-2: get those raw which do not have geo
add_geo_info()
print("-"*80)
# step-1: again with 1 new, 1 same
add_test_data(['Paris', 'somewhere in Chicago'])
print("-"*80)
# step-2: get those raw which do not have geo
add_geo_info()
print("-"*80)
# check: print all data for Paris geo:
gp = session.query(GeocodedAddress).filter(GeocodedAddress.full_address == 'Paris').one()
assert 2 == len(gp.raw_addresses)
print(gp.raw_addresses)

Fastest way to insert object if it doesn't exist with SQLAlchemy

So I'm quite new to SQLAlchemy.
I have a model Showing which has about 10,000 rows in the table. Here is the class:
class Showing(Base):
    __tablename__ = "showings"
    id = Column(Integer, primary_key=True)
    time = Column(DateTime)
    link = Column(String)
    film_id = Column(Integer, ForeignKey('films.id'))
    cinema_id = Column(Integer, ForeignKey('cinemas.id'))
    def __eq__(self, other):
        if self.time == other.time and self.cinema == other.cinema and self.film == other.film:
            return True
        else:
            return False
Could anyone give me some guidance on the fastest way to insert a new showing if it doesn't already exist? I think it is slightly more complicated because a showing is only unique if the combination of time, cinema, and film is unique.
I currently have this code:
def AddShowings(self, showing_times, cinema, film):
    all_showings = self.session.query(Showing).options(joinedload(Showing.cinema), joinedload(Showing.film)).all()
    for showing_time in showing_times:
        tmp_showing = Showing(time=showing_time[0], film=film, cinema=cinema, link=showing_time[1])
        if tmp_showing not in all_showings:
            self.session.add(tmp_showing)
            self.session.commit()
            all_showings.append(tmp_showing)
which works, but seems to be very slow. Any help is much appreciated.
If any such object is unique based on a combination of columns, you need to mark these as a composite primary key. Add the primary_key=True keyword parameter to each of these columns, dropping your id column altogether:
class Showing(Base):
    __tablename__ = "showings"
    time = Column(DateTime, primary_key=True)
    link = Column(String)
    film_id = Column(Integer, ForeignKey('films.id'), primary_key=True)
    cinema_id = Column(Integer, ForeignKey('cinemas.id'), primary_key=True)
That way your database can handle these rows more efficiently (no need for an incrementing column), and SQLAlchemy now automatically knows if two instances of Showing are the same thing.
I believe you can then just merge your new Showing back into the session:
def AddShowings(self, showing_times, cinema, film):
    for showing_time in showing_times:
        self.session.merge(
            Showing(time=showing_time[0], link=showing_time[1],
                    film=film, cinema=cinema)
        )
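A small self-contained sketch of how merge() behaves with a composite primary key, assuming SQLAlchemy 1.4+ and an in-memory SQLite database. To keep it short, the foreign keys and relationships are left out and film_id and cinema_id are set directly; the dates and links are made up.

```python
from datetime import datetime

from sqlalchemy import Column, DateTime, Integer, String, create_engine, func, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Showing(Base):
    __tablename__ = "showings"
    # The three columns together form the composite primary key.
    time = Column(DateTime, primary_key=True)
    film_id = Column(Integer, primary_key=True, autoincrement=False)
    cinema_id = Column(Integer, primary_key=True, autoincrement=False)
    link = Column(String)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

when = datetime(2024, 1, 1, 20, 30)

with Session(engine) as session:
    # Merge the same (time, film, cinema) twice with different links.
    for link in ("http://example.com/a", "http://example.com/b"):
        session.merge(Showing(time=when, film_id=1, cinema_id=2, link=link))
        session.commit()
    count = session.execute(select(func.count()).select_from(Showing)).scalar_one()
    key = {"time": when, "film_id": 1, "cinema_id": 2}
    link_now = session.get(Showing, key).link

print(count, link_now)  # 1 http://example.com/b
```

The second merge updates the existing row instead of inserting a duplicate. Note that merge() looks the row up by primary key when it is not already in the session, so this is still roughly one SELECT per showing.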

sqlalchemy / table setup

I have items, warehouses, and items are in warehouses.
So I have table that has information about items (sku, description, cost ...) and a table that describes warehouses(location, code, name, ...). Now I need a way to store inventory so that I know I have X items in warehouse Y. An item can be in any warehouse.
How would I go about setting up the relationship between them and storing the qty?
class Item(DeclarativeBase):
    __tablename__ = 'items'
    item_id = Column(Integer, primary_key=True, autoincrement=True)
    item_code = Column(Unicode(35), unique=True)
    item_description = Column(Unicode(100))
    item_long_description = Column(Unicode())
    item_cost = Column(Numeric(precision=13, scale=4))
    item_list = Column(Numeric(precision=13, scale=2))
    def __init__(self, code, description, cost, list):
        self.item_code = code
        self.item_description = description
        self.item_cost = cost
        self.item_list = list
class Warehouse(DeclarativeBase):
    __tablename__ = 'warehouses'
    warehouse_id = Column(Integer, primary_key=True, autoincrement=True)
    warehouse_code = Column(Unicode(15), unique=True)
    warehouse_description = Column(Unicode(55))
If I am correct, I would set up the many-to-many using an intermediate table, something like ...
item_warehouse = Table(
    'item_warehouse', Base.metadata,
    Column('item_id', Integer, ForeignKey('items.item_id')),
    Column('warehouse_id', Integer, ForeignKey('warehouses.warehouse_id'))
)
But I would need to store the qty available on this table, and since it's not its own class I am not sure how that would work.
What would be the "best" practice for modeling this and having it usable in my app?
Model:
As mentioned by @Lafada, you need an Association Object. As such, I would create a SA-persistent object and not only a table:
class ItemWarehouse(Base):
    # version-1:
    __tablename__ = 'item_warehouse'
    __table_args__ = (PrimaryKeyConstraint('item_id', 'warehouse_id', name='ItemWarehouse_PK'),)
    # version-2:
    #__table_args__ = (UniqueConstraint('item_id', 'warehouse_id', name='ItemWarehouse_PK'),)
    #id = Column(Integer, primary_key=True, autoincrement=True)
    # other columns (foreign keys point at the key columns defined in the question)
    item_id = Column(Integer, ForeignKey('items.item_id'), nullable=False)
    warehouse_id = Column(Integer, ForeignKey('warehouses.warehouse_id'), nullable=False)
    quantity = Column(Integer, default=0)
This covers the model requirement with the following:
added a PrimaryKey
added a UniqueConstraint covering the (item_id, warehouse_id) pairs.
In the code above this is solved in two ways:
version-1: uses composite primary key (which must be unique)
version-2: uses simple primary key, but also adds an explicit unique constraint [I personally prefer this option]
Relationship: Association Object
Now. You can use the Association Object as is, which will look similar to this:
w = Warehouse(...)
i = Item(name="kindle", price=...)
iw = ItemWarehouse(quantity=50)
iw.item = i
w.items.append(i)
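For reference, a fuller sketch of the Association Object pattern with the relationships spelled out, assuming SQLAlchemy 1.4+ and an in-memory SQLite database. The models are trimmed to a few columns, and the relationship name stock is a choice made for this example (it uses the composite primary key variant, version-1).

```python
from sqlalchemy import Column, ForeignKey, Integer, Unicode, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Item(Base):
    __tablename__ = "items"
    item_id = Column(Integer, primary_key=True)
    item_code = Column(Unicode(35), unique=True)
    stock = relationship("ItemWarehouse", back_populates="item")


class Warehouse(Base):
    __tablename__ = "warehouses"
    warehouse_id = Column(Integer, primary_key=True)
    warehouse_code = Column(Unicode(15), unique=True)
    stock = relationship("ItemWarehouse", back_populates="warehouse")


class ItemWarehouse(Base):
    __tablename__ = "item_warehouse"
    # Composite primary key: one row per (item, warehouse) pair.
    item_id = Column(ForeignKey("items.item_id"), primary_key=True)
    warehouse_id = Column(ForeignKey("warehouses.warehouse_id"), primary_key=True)
    quantity = Column(Integer, default=0, nullable=False)
    item = relationship("Item", back_populates="stock")
    warehouse = relationship("Warehouse", back_populates="stock")


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    w = Warehouse(warehouse_code="W1")
    i = Item(item_code="kindle")
    # The association object carries the quantity for this pair.
    w.stock.append(ItemWarehouse(item=i, quantity=50))
    session.add(w)
    session.commit()
    quantities = {iw.item.item_code: iw.quantity for iw in w.stock}

print(quantities)  # {'kindle': 50}
```

Here the association object, not the item itself, is what gets appended to the warehouse's collection, since it is the row that holds the quantity.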
Relationship: Association Proxy extension
or, you could go one step further and use the Composite Association Proxies example, and you may configure dictionary-like access to the association object similar to this:
w = Warehouse(...)
i = Item(name="kindle", price=...)
w[i] = 50 # sets the quantity to 50 of item _i_ in warehouse _w_
i[w] = 50 # same as above, if you configure it symmetrically
Beware: the code for the relationship definitions might not be easy to read, but the usage pattern is really nice. So if this option is too much to digest, I would start with the Association Object, maybe with some helper functions to add/get/update the item stocks, and eventually move to the Association Proxy extension.
You have to use an "Association Object".
As a hint for your problem: create the table as you mention in your question, with an extra qty column:
item_warehouse = Table(
    'item_warehouse',
    Base.metadata,
    Column('item_id', Integer, ForeignKey('items.item_id')),
    Column('warehouse_id', Integer, ForeignKey('warehouses.warehouse_id')),
    Column('qty', Integer, default=0),
)
Now you can store the warehouse, item and qty in a single row, and you have to write a method that takes a warehouse_id and an item_id and returns the sum of qty for those items.
Hope this will help you to solve your problem.
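One possible shape for such a method, sketched with SQLAlchemy Core (1.4+ assumed) against an in-memory SQLite database. The function name total_qty, the cut-down tables and the sample rows are all made up for the example.

```python
from sqlalchemy import (Column, ForeignKey, Integer, MetaData, Table,
                        create_engine, func, select)

metadata = MetaData()

items = Table("items", metadata, Column("item_id", Integer, primary_key=True))
warehouses = Table("warehouses", metadata,
                   Column("warehouse_id", Integer, primary_key=True))
item_warehouse = Table(
    "item_warehouse", metadata,
    Column("item_id", Integer, ForeignKey("items.item_id")),
    Column("warehouse_id", Integer, ForeignKey("warehouses.warehouse_id")),
    Column("qty", Integer, default=0),
)


def total_qty(conn, item_id, warehouse_id):
    """Sum qty over all rows for one (item, warehouse) pair; 0 if none."""
    stmt = select(func.coalesce(func.sum(item_warehouse.c.qty), 0)).where(
        item_warehouse.c.item_id == item_id,
        item_warehouse.c.warehouse_id == warehouse_id,
    )
    return conn.execute(stmt).scalar_one()


engine = create_engine("sqlite://")
metadata.create_all(engine)

with engine.begin() as conn:
    conn.execute(items.insert(), [{"item_id": 1}])
    conn.execute(warehouses.insert(), [{"warehouse_id": 1}])
    conn.execute(item_warehouse.insert(), [
        {"item_id": 1, "warehouse_id": 1, "qty": 30},
        {"item_id": 1, "warehouse_id": 1, "qty": 20},
    ])
    total = total_qty(conn, item_id=1, warehouse_id=1)
    missing = total_qty(conn, item_id=1, warehouse_id=99)

print(total, missing)  # 50 0
```

Because this table has no unique constraint on (item_id, warehouse_id), several rows can exist per pair, which is why the helper sums them rather than reading a single row.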
