Iterate List of tables for specific column - python

I have a list of tables which I would like to iterate and find a specific row based on a foreign key column, then delete it.
This is what my list of tables looks like:
subrep_tables = [
    TCableResistance.__table__,
    TCapacitorBankTest.__table__,
    TCapAndDf.__table__,
    TMeasuredData.__table__,
    TMultiDeviceData.__table__,
    TStepVoltage.__table__,
    TTemperatureData.__table__,
    TTransformerResistance.__table__,
    TTransformerTurnsRatio.__table__,
    TTripUnit.__table__,
    TVectorDiagram.__table__,
    TWithstandTest.__table__,
]
I called the list subrep_tables because all of those tables contain a foreign key called ixSubReport.
What I'm trying to do is iterate the list, find all the rows that belong to a certain sub report, and delete those rows, instead of going to each table and running the delete query by hand (very tedious).
This is what I've come up with thus far.
for report in DBSession.query(TReport).filter(TReport.ixDevice == device_id).all():
    for sub_report in DBSession.query(TSubReport).filter(TSubReport.ixReport == report.ixReport).all():
        for table in subrep_tables:
            for item in DBSession.query(table).filter(table.ixSubReport == sub_report.ixSubReport).all():
                print "item: " + str(item)
                #DBSession.delete(item)
I'm having some difficulty accessing the table's ixSubReport column for my WHERE clause. The code I have right now gives me an error: 'Table' object has no attribute 'ixSubReport'.
How can I access my iterated table's ixSubReport column to use in my WHERE clause to find the specific row so I can delete it?

If you really want to query the Table objects, the columns live under the c attribute: use table.c.ixSubReport.
There's no reason to create a list of the __table__ attributes though; just query the models directly. Also, you can avoid a ton of overhead by not performing the first two queries; you can do all this in a single query per model. (This example assumes there are relationships set up between the models.)
from sqlalchemy.orm import contains_eager

has_subrep_models = [TCableResistance, TCapacitorBankTest, ...]

# assuming each has_subrep model has a relationship "subrep"
# assuming TSubReport has a relationship "report"
for has_subrep_model in has_subrep_models:
    for item in DBSession.query(has_subrep_model).join(
        has_subrep_model.subrep, TSubReport.report
    ).filter(
        TReport.ixDevice == device_id
    ).options(
        contains_eager(has_subrep_model.subrep),
        contains_eager(TSubReport.report)
    ):
        DBSession.delete(item)
This simply joins the related sub report and report when querying each model that has a sub report, and does the filtering on the report's device there. So you end up doing one query per model, rather than 1 + <num reports> + (<num reports> * <num models with sub reports>) = a lot.

Thanks to Denis for the input, I ended up with this:
for report in DBSession.query(TReport).filter(TReport.ixDevice == device_id).all():
    for sub_report in DBSession.query(TSubReport).filter(TSubReport.ixReport == report.ixReport).all():
        for table in subrep_tables:
            for item in DBSession.query(table).filter(table.c.ixSubReport == sub_report.ixSubReport).all():
                DBSession.delete(item)

Related

Variable filter for SQLAlchemy Query

I'm adding a search feature to my application (created using PyQt5) that will allow the user to search an archive table in the database. I've provided applicable fields for the user to choose to match rows with. I'm having some trouble making the query filter use only what was provided by the user, given that the other fields will be empty strings.
Here's what I have so far:
def search_for_order(pierre):
    fields = {'archive.pat.firstname': pierre.search_firstname.text(),
              'archive.pat.lastname': pierre.search_lastname.text(),
              'archive.pat.address': pierre.search_address.text(),
              'archive.pat.phone': pierre.search_phone.text(),
              'archive.compound.compname': pierre.search_compname.text(),
              'archive.compound.compstrength': pierre.search_compstrength.text(),
              'archive.compound.compform': pierre.search_compform.currentText(),
              'archive.doc.lastname': pierre.search_doctor.text(),
              'archive.clinic.clinicname': pierre.search_clinic.text()
              }
    filters = {}
    for field, value in fields.items():
        if value is not '':
            filters[field] = value
    query = session.query(Archive).join(Patient, Prescribers, Clinic, Compound)\
        .filter(and_(field == value for field, value in filters.items())).all()
The fields dictionary collects the values of all the fields in the search form. Some of them will be blank, resulting in empty strings. filters is intended to be a dictionary of the object names and the values to match them against.
The problem lies in your definition of the expressions within your and_ conjunction. As of now you're comparing each field name string with the corresponding value, which of course yields False for each comparison.
To properly populate the and_ conjunction you have to create a list of what sqlalchemy calls BinaryExpression objects.
In order to do so I'd change your code like this:
1) First use actual references to your table classes in your definition of fields:
fields = {
    (Patient, 'firstname'): pierre.search_firstname.text(),
    (Patient, 'lastname'): pierre.search_lastname.text(),
    (Patient, 'address'): pierre.search_address.text(),
    (Patient, 'phone'): pierre.search_phone.text(),
    (Compound, 'compname'): pierre.search_compname.text(),
    (Compound, 'compstrength'): pierre.search_compstrength.text(),
    (Compound, 'compform'): pierre.search_compform.currentText(),
    (Prescribers, 'lastname'): pierre.search_doctor.text(),
    (Clinic, 'clinicname'): pierre.search_clinic.text()
}
2) Define filters as a list instead of a dictionary:
filters = list()
3) To populate the filters list, unpack the (table, fieldname) tuple used as the key in the fields dictionary and combine it with the value to create new tuples, now with three elements. Append each of the newly created tuples to the list of filters:
for table_field, value in fields.items():
    table, field = table_field
    if value:
        filters.append((table, field, value))
4) Now transform the created list of filter definitions to a list of BinaryExpression objects usable by sqlalchemy:
binary_expressions = [getattr(table, attribute) == value for table, attribute, value in filters]
5) Finally, apply the binary expressions to your query, making sure they are presented to the and_ conjunction in a consumable form:
query = session.query(Archive).join(Patient, Prescribers, Clinic, Compound)\
.filter(and_(*binary_expressions)).all()
I'm not able to test that solution within your configuration, but a similar test using my environment was successful.
Once you get a query object bound to a table in SQLAlchemy - that is, what is returned by session.query(Archive) in the code above - calling filtering methods on that object returns a new, modified query with that filter already applied.
So, my preferred way of combining several and filters is to start from the bare query, iterate over the filters to be used, and for each, add a new .filter call and reassign the query:
query = session.query(Archive).join(Patient, Prescribers, Clinic, Compound)
for field, value in filters.items():
    query = query.filter(field == value)
results = query.all()
Using and_ or or_ as you intend can also work - in the case of your example, the only thing missing was an *. Without an * preceding the generator expression, it is passed as the first (and sole) parameter to and_. With a prefixed *, all elements in the iterator are unpacked in place, each one passed as an argument:
...
.filter(and_(*(field == value for field, value in filters.items()))).all()

Querying objects using attribute of member of many-to-many

I have the following models:
class Member(models.Model):
    ref = models.CharField(max_length=200)
    # some other stuff

    def __str__(self):
        return self.ref

class Feature(models.Model):
    feature_id = models.BigIntegerField(default=0)
    members = models.ManyToManyField(Member)
    # some other stuff
A Member is basically just a pointer to a Feature. So let's say I have Features:
feature_id = 2, members = 1, 2
feature_id = 4
feature_id = 3
Then the members would be:
id = 1, ref = 4
id = 2, ref = 3
I want to find all of the Features which contain one or more Members from a list of "ok members." Currently my query looks like this:
# ndtmp is a query set of member-less Features which Members can point to
sids = [str(i) for i in list(ndtmp.values('feature_id'))]
# now make a query set that contains all rels and ways with at least one member with an id in sids
okmems = Member.objects.filter(ref__in=sids)
relsways = Feature.geoobjects.filter(members__in=okmems)
# now combine with nodes
op = relsways | ndtmp
This is enormously slow, and I'm not even sure if it's working. I've tried using print statements to debug, just to make sure anything is actually being parsed, and I get the following:
print(ndtmp.count())
>>> 12747
print(len(sids))
>>> 12747
print(okmems.count())
... and then the code just hangs for minutes, and eventually I quit it. I think that I just overcomplicated the query, but I'm not sure how best to simplify it. Should I:
Migrate Feature to use a CharField instead of a BigIntegerField? There is no real reason for me to use a BigIntegerField, I just did so because I was following a tutorial when I began this project. I tried a simple migration by just changing it in models.py and I got a "numeric" value in the column in PostgreSQL with format 'Decimal:( the id )', but there's probably some way around that that would force it to just shove the id into a string.
Use some feature of Many-To-Many Fields which I don't know about to more efficiently check for matches
Calculate the bounding box of each Feature and store it in another column so that I don't have to do this calculation every time I query the database (so just the single fixed cost of calculation upon Migration + the cost of calculating whenever I add a new Feature or modify an existing one)?
Or something else? In case it helps, this is for a server-side script for an ongoing OpenStreetMap related project of mine, and you can see the work in progress here.
EDIT - I think a much faster way to get ndids is like this:
ndids = ndtmp.values_list('feature_id', flat=True)
This works, producing a non-empty set of ids.
Unfortunately, I am still at a loss as to how to get okmems. I tried:
okmems = Member.objects.filter(ref__in=str(ndids))
But it returns an empty query set. And I can confirm that the ref points are correct, via the following test:
Member.objects.values('ref')[:1]
>>> [{'ref': '2286047272'}]
Feature.objects.filter(feature_id='2286047272').values('feature_id')[:1]
>>> [{'feature_id': '2286047272'}]
You should take a look at annotate:
okmems = Member.objects.annotate(
    feat_count=models.Count('feature')).filter(feat_count__gte=1)
relsways = Feature.geoobjects.filter(members__in=okmems)
Ultimately, I was wrong to set up the database using a numeric id in one table and a text-type id in the other. I am not very familiar with migrations yet, but at some point I'll have to take a deep dive into that world and figure out how to migrate my database to use numerics on both. For now, this works:
# ndtmp is a query set of member-less Features which Members can point to
# get the unique ids from ndtmp as strings
strids = ndtmp.extra(
    {'feature_id_str': "CAST(feature_id AS VARCHAR)"}
).order_by('-feature_id_str').values_list('feature_id_str', flat=True).distinct()
# find all members whose ref values can be found in strids
okmems = Member.objects.filter(ref__in=strids)
# find all features containing one or more members in the accepted members list
relsways = Feature.geoobjects.filter(members__in=okmems)
# combine that with my existing list of allowed member-less features
op = relsways | ndtmp
# prove that this set is not empty
op.count()
# takes about 10 seconds
>>> 8997148 # looks like it worked!
Basically, I am making a query set of feature_ids (numerics) and casting it to be a query set of text-type (varchar) field values. I am then using values_list to make it only contain these string id values, and then I am finding all of the members whose ref ids are in that list of allowed Features. Now I know which members are allowed, so I can filter out all the Features which contain one or more members in that allowed list. Finally, I combine this query set of allowed Features which contain members with ndtmp, my original query set of allowed Features which do not contain members.
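For what it's worth, on Django versions that ship django.db.models.functions.Cast (1.10 and later), the same cast can be sketched without extra(); the TextField output type here is my assumption for the varchar cast, and the query set names are reused from the snippet above:
from django.db.models import TextField
from django.db.models.functions import Cast

# sketch: annotate each Feature with a text version of feature_id,
# then pull out just those strings, deduplicated
strids = (ndtmp.annotate(feature_id_str=Cast('feature_id', TextField()))
               .values_list('feature_id_str', flat=True)
               .distinct())
okmems = Member.objects.filter(ref__in=strids)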

Filter query by linked object key in SQLAlchemy

Judging by the title this would be the exact same question, but I can't see how any of the answers are applicable to my use case:
I have two classes and a relationship between them:
treatment_association = Table('tr_association', Base.metadata,
    Column('chronic_treatments_id', Integer, ForeignKey('chronic_treatments.code')),
    Column('animals_id', Integer, ForeignKey('animals.id'))
)

class ChronicTreatment(Base):
    __tablename__ = "chronic_treatments"

    code = Column(String, primary_key=True)

class Animal(Base):
    __tablename__ = "animals"

    treatment = relationship("ChronicTreatment", secondary=treatment_association, backref="animals")
I would like to be able to select only the animals which have undergone a treatment which has the code "X". I tried quite a few approaches.
This one fails with an AttributeError:
sql_query = session.query(Animal.treatment).filter(Animal.treatment.code == "chrFlu")
for item in sql_query:
    pass
mystring = str(session.query(Animal))
And this one happily returns a list of unfiltered animals:
sql_query = session.query(Animal.treatment).filter(ChronicTreatment.code == "chrFlu")
for item in sql_query:
    pass
mystring = str(session.query(Animal))
The closest thing to the example from the aforementioned thread I could put together:
subq = session.query(Animal.id).subquery()
sql_query = session.query(ChronicTreatment).join((subq, subq.c.treatment_id == "chrFlu"))
for item in sql_query:
    pass
mystring = str(session.query(Animal))
mydf = pd.read_sql_query(mystring, engine)
Also fails with an AttributeError.
Can you help me sort this out?
First, there are two issues with table definitions:
1) In the treatment_association table you have an Integer column pointing to chronic_treatments.code, while code is a String column.
I think it's just better to have an integer id in the chronic_treatments, so you don't duplicate the string code in another table and also have a chance to add more fields to chronic_treatments later.
Update: not exactly correct, you still can add more fields, but it will be more complex to change your 'code' if you decide to rename it.
2) In the Animal model you have a relation named treatment. This is confusing because it is a many-to-many relation; the name should be plural: treatments.
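A sketch of the corrected definitions with those two changes applied (the integer id columns and the plural relationship name are how I would apply them, adjust to taste):
from sqlalchemy import Table, Column, Integer, String, ForeignKey
from sqlalchemy.orm import relationship

treatment_association = Table('tr_association', Base.metadata,
    Column('chronic_treatments_id', Integer, ForeignKey('chronic_treatments.id')),
    Column('animals_id', Integer, ForeignKey('animals.id'))
)

class ChronicTreatment(Base):
    __tablename__ = "chronic_treatments"

    id = Column(Integer, primary_key=True)  # surrogate key instead of the string code
    code = Column(String, unique=True)

class Animal(Base):
    __tablename__ = "animals"

    id = Column(Integer, primary_key=True)
    treatments = relationship("ChronicTreatment",
                              secondary=treatment_association,
                              backref="animals")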
After fixing the above two, it should be clearer why your queries did not work.
This one (I replaced treatment with treatments):
sql_query = session.query(Animal.treatments).filter(
    Animal.treatments.code == "chrFlu")
Animal.treatments represents a many-to-many relation; it is not an SQLAlchemy model, so you can't pass it to the query or use it in a filter.
The next one can't work for the same reason (you pass Animal.treatments into the query).
The last one is closer; you actually need a join to get your results.
I think it is easier to understand the query as SQL first (and you need to know SQL anyway to be able to use sqlalchemy):
animals = session.query(Animal).from_statement(text(
    """
    select distinct animals.* from animals
    left join tr_association assoc on assoc.animals_id = animals.id
    left join chronic_treatments on chronic_treatments.id = assoc.chronic_treatments_id
    where chronic_treatments.code = :code
    """)
).params(code='chrFlu')
It will select animals and join chronic_treatments through the tr_association and filter the result by code.
Having this it is easy to rewrite it using SQL-less syntax:
sql_query = session.query(Animal).join(Animal.treatments).filter(
    ChronicTreatment.code == "chrFlu")
That will return what you want - a list of animals that have a related chronic treatment with the given code.
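Another way to express the same filter, assuming the relationship has been renamed to treatments as above, is the relationship's any() operator, which SQLAlchemy renders as an EXISTS subquery; a minimal sketch:
# alternative sketch to the explicit join
animals = session.query(Animal).filter(
    Animal.treatments.any(ChronicTreatment.code == "chrFlu")
).all()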

Checking if ForeignKeys are equal to one another

I have a list of tuples of foreign keys of the form [(3,2),(2,3)]. And I want to insert the items into a ManyToMany table within a model:
class Place(models.Model):
    data = models.IntegerField()
    connected_to = models.ManyToManyField('self')

class PlaceMeta(models.Model):
    place = models.ForeignKey("places.Place")
and I am inserting the list (connections) with:
places = Place.objects.all()
for conn_1, conn_2 in connections:
    for place in places:
        if place.data == conn_1 and conn_1 != conn_2:
            place.connected_to.add(conn_1, conn_2)
        elif place.data == conn_2 and conn_1 != conn_2:
            fixture.connected_to.add(conn_2, conn_1)
When I print the list it prints [(3L, 2L), (2L, 3L)] (for example), but after I insert, the table shows that (2,2), (3,2), (2,3), and (3,3) have been inserted.
I've tried at multiple points in the code to check for if a tuple (a,a) exists and when I print prior to inserting it shows no such tuple. So how do I avoid inserting such tuples seeing as they don't even appear to exist in the list before I insert?
You have one parameter too many in your call to add(). It should look like this:
if place.data == conn_1 and conn_1 != conn_2:
    # place is the Place instance described by conn_1.
    # Let's connect it to conn_2!
    place.connected_to.add(conn_2)
And you don't need to iterate through all the Places, instead use objects.get or objects.filter depending if data is unique or not. For example, if it is unique, use:
for source, target in connections:
    Place.objects.get(data=source).connected_to.add(Place.objects.get(data=target))
(and probably add the unique=True attribute to the data field)
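As a sketch, that suggestion applied to the model from the question would look like this (the unique=True is the proposed change, not part of the original code):
from django.db import models

class Place(models.Model):
    # unique so that Place.objects.get(data=...) can never match multiple rows
    data = models.IntegerField(unique=True)
    connected_to = models.ManyToManyField('self')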

SQLAlchemy: Operating on results

I'm trying to do something relatively simple, spit out the column names and respective column values, and possibly filter out some columns so they aren't shown.
This is what I attempted ( after the initial connection of course ):
metadata = MetaData(engine)
users_table = Table('fusion_users', metadata, autoload=True)
s = users_table.select(users_table.c.user_name == username)
results = s.execute()

if results.rowcount != 1:
    return 'Sorry, user not found.'
else:
    for result in results:
        for x, y in result.items():
            print x, y
I looked at the API docs for SQLAlchemy (v.5) but was rather confused. My 'result' in 'results' is a RowProxy, yet I don't think it's returning the right object for the .items() invocation.
Let's say my table structure is like so:
user_id  user_name  user_password  user_country
0        john       a9fu93f39uf    usa
I want to filter and specify the column names to show (I don't want to show the user_password, obviously) - how can I accomplish this?
A SQLAlchemy RowProxy object has dict-like methods: .items() to get all name/value pairs, .keys() to get just the names (e.g. to display them as a header line), and .values() for the corresponding values; you can also use each key to index into the RowProxy object. So it being a "smart object" rather than a plain dict shouldn't inconvenience you unduly.
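As a rough sketch of that idea, reusing the results object from the question (this hypothetical header-then-rows printout is mine, not part of the original answer):
# print a header line from .keys(), then each row's .values()
results = s.execute()
rows = results.fetchall()
if rows:
    print "\t".join(rows[0].keys())
    for row in rows:
        print "\t".join(str(v) for v in row.values())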
You can use results instantly as an iterator.
results = s.execute()
for row in results:
    print row
Selecting specific columns is done the following way:
from sqlalchemy.sql import select

s = select([users_table.c.user_name, users_table.c.user_country], users_table.c.user_name == username)
for user_name, user_country in s.execute():
    print user_name, user_country
To print the column names in addition to the values, the way you have done it in your question should be best, because a RowProxy is really nothing more than an ordered dictionary.
IMO the API documentation for SQLAlchemy is not really helpful for learning how to use it. I would suggest you read the SQL Expression Language Tutorial. It contains the most vital information about basic querying with SQLAlchemy.
