I have a database where I store mouse IDs and FMRI measurement sessions; the classes (with greatly reduced columns, for convenience) look as follows:
class FMRIMeasurement(Base):
    __tablename__ = "fmri_measurements"
    id = Column(Integer, primary_key=True)
    date = Column(DateTime)
    animal_id = Column(Integer, ForeignKey('animals.id'))

class Animal(Base):
    __tablename__ = "animals"
    id = Column(Integer, primary_key=True)
    id_eth = Column(Integer)
    fmri_measurements = relationship("FMRIMeasurement", backref="animal")
I would like to create a pandas dataframe containing all of the details of all of the FMRIMeasurements assigned to one particular animal. Selecting data from that animal works fine:
mystring = str(session.query(Animal).filter(Animal.id_eth == 1))
print pd.read_sql_query(mystring, engine, params=[4001])
But as soon as I try to select the FMRIMeasurements it blows up. None of the following work:
mystring = str(session.query(Animal.fmri_measurements).filter(Animal.id_eth == 1))
mystring = str(session.query(FMRIMeasurement).filter(FMRIMeasurement.animal.id_eth == 1))
mystring = str(session.query(Animal.fmri_measurements.date).filter(Animal.id_eth == 1))
I guess I'm just using SQLAlchemy wrong, but I couldn't find anything to help me with my use case in the docs (perhaps I don't know what the thing I want to do is actually called) :-/
session.query builds a Query object; you need to actually execute it using .first() or .all() to get the result set.
For instance
sql_query = session.query(FMRIMeasurement).join(Animal).filter(Animal.id_eth == 1)
result_set = sql_query.all()  # execute the query and return the result set
for item in result_set:
    # do work with item
    # if the item changes and you want to commit the changes
    session.merge(item)
# commit changes
session.commit()
Alternatively, you do not need the .all(); iterating through a query object will execute it as well:
sql_query = session.query(FMRIMeasurement).join(Animal).filter(Animal.id_eth == 1)
for item in sql_query:
    # do something
To then get the pandas dataframe for those measurements, you can run:
mystring = str(session.query(FMRIMeasurement))
print pd.read_sql_query(mystring, engine)
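If the goal is a DataFrame of one animal's measurements, the query does not need to be stringified at all: a Query object exposes the compiled SELECT (with its bound parameters) as .statement, which pandas accepts directly. A runnable sketch against cut-down versions of the two models, with invented sample data and an assumed id_eth of 4001:

```python
from datetime import datetime

import pandas as pd
from sqlalchemy import create_engine, Column, Integer, DateTime, ForeignKey
from sqlalchemy.orm import declarative_base, relationship, sessionmaker

Base = declarative_base()

class Animal(Base):
    __tablename__ = "animals"
    id = Column(Integer, primary_key=True)
    id_eth = Column(Integer)
    fmri_measurements = relationship("FMRIMeasurement", backref="animal")

class FMRIMeasurement(Base):
    __tablename__ = "fmri_measurements"
    id = Column(Integer, primary_key=True)
    date = Column(DateTime)
    animal_id = Column(Integer, ForeignKey('animals.id'))

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

# invented sample data: one animal with two measurement sessions
a = Animal(id_eth=4001,
           fmri_measurements=[FMRIMeasurement(date=datetime(2016, 5, 1)),
                              FMRIMeasurement(date=datetime(2016, 6, 1))])
session.add(a)
session.commit()

# build the ORM query, then hand its SELECT statement to pandas
query = (session.query(FMRIMeasurement)
         .join(Animal)
         .filter(Animal.id_eth == 4001))
df = pd.read_sql(query.statement, engine)
print(df.shape)
```

This keeps the filter's bound parameter inside the statement, so no params argument is needed on the pandas side.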
Related
Model:
class Example(Base):
    __tablename__ = "example"
    create_time = Column(DateTime, server_default=func.now())
    time_stamps = Column(MutableList.as_mutable(ARRAY(DateTime)), server_default="{}")
    update_time = Column(DateTime, server_default=func.now())
Now when I insert a new example, I need to append the create_time of the new example to the time_stamps ARRAY, then sort it to get the newest time and set that as the new update_time.
I managed to do it in separate steps:
def update_record(db: Session, create_time: datetime, db_record: Example):
    db_record.time_stamps.append(create_time)
    sorted_times = sorted(db_record.time_stamps, reverse=True)
    db_record.update_time = sorted_times[0]
    db_record.time_stamps = sorted_times
    db.commit()
But I need to do it atomically using INSERT ON CONFLICT UPDATE clause.
So far I have:
db_dict = {"create_time": record.create_time,
           "time_stamps": [record.create_time],
           "update_time": record.create_time}
stm = insert(Example).values(db_dict)
do_update_stm = stm.on_conflict_do_update(constraint='my_unique_constraint',
                                          set_=dict(??))
My question is: how do I access, and append to, the values of the original conflicting row in set_ inside on_conflict_do_update in SQLAlchemy?
Thanks
In the end I bypassed SQLAlchemy by writing a textual query in which I could use this syntax to append to the ARRAY:
.. DO UPDATE SET time_stamps = example.time_stamps || EXCLUDED.create_time,
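For reference, the same append can be expressed without a textual query: SQLAlchemy's PostgreSQL insert() exposes the incoming row as stm.excluded, while the model's columns refer to the stored row inside set_. This is only a sketch, not the asker's actual schema: it swaps the named constraint for a hypothetical unique name column via index_elements, and uses GREATEST as one way to compute the newest time. Since it is PostgreSQL-only, it is compiled below rather than executed:

```python
from datetime import datetime

from sqlalchemy import Column, DateTime, String, func
from sqlalchemy.dialects import postgresql
from sqlalchemy.dialects.postgresql import ARRAY, insert
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Example(Base):
    __tablename__ = 'example'
    # hypothetical unique key used for conflict detection in this sketch
    name = Column(String, primary_key=True)
    create_time = Column(DateTime, server_default=func.now())
    time_stamps = Column(ARRAY(DateTime), server_default="{}")
    update_time = Column(DateTime, server_default=func.now())

now = datetime.utcnow()
stm = insert(Example).values(name='a', create_time=now,
                             time_stamps=[now], update_time=now)
do_update_stm = stm.on_conflict_do_update(
    index_elements=[Example.name],
    set_={
        # stored array || the incoming create_time (Postgres array append)
        'time_stamps': Example.time_stamps.op('||')(stm.excluded.create_time),
        # newest of the stored update_time and the incoming create_time
        'update_time': func.greatest(Example.update_time,
                                     stm.excluded.create_time),
    },
)
sql = str(do_update_stm.compile(dialect=postgresql.dialect()))
print(sql)
```

The compiled statement contains the same `example.time_stamps || excluded.create_time` expression as the textual workaround above.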
I am trying to build an ORM-mapped SQLite database. The conception of the DB seems to work as intended, but I can't seem to query it properly in more complex cases. I have spent the day trying to find an existing answer to my question, but nothing works. I am not sure whether the issue is with my mapping, my query, or both, or whether querying with attributes from a many-to-many association table with extra data works differently.
This is the DB setup:
engine = create_engine('sqlite:///')
Base = declarative_base(bind=engine)
Session = sessionmaker(bind=engine)

class User(Base):
    __tablename__ = 'users'
    # Columns
    id = Column('id', Integer, primary_key=True)
    first = Column('first_name', String(100))
    last = Column('last_name', String(100))
    age = Column('age', Integer)
    quality = Column('quality', String(100))
    unit = Column('unit', String(100))
    # Relationships
    cases = relationship('UserCaseLink', back_populates='user_data')

    def __repr__(self):
        return f"<User(first='{self.first}', last='{self.last}', quality='{self.quality}', unit='{self.unit}')>"

class Case(Base):
    __tablename__ = 'cases'
    # Columns
    id = Column('id', Integer, primary_key=True)
    num = Column('case_number', String(100))
    type = Column('case_type', String(100))
    # Relationships
    users = relationship('UserCaseLink', back_populates='case_data')

    def __repr__(self):
        return f"<Case(num='{self.num}', type='{self.type}')>"

class UserCaseLink(Base):
    __tablename__ = 'users_cases'
    # Columns
    user_id = Column('user_id', Integer, ForeignKey('users.id'), primary_key=True)
    case_id = Column('case_id', Integer, ForeignKey('cases.id'), primary_key=True)
    role = Column('role', String(100))
    # Relationships
    user_data = relationship('User', back_populates='cases')
    case_data = relationship('Case', back_populates='users')

if __name__ == '__main__':
    Base.metadata.create_all()
    session = Session()
and I would like to retrieve all the cases on which a particular person is working under a certain role.
So for example I want a list of all the cases a person named 'Alex' is working on as an 'Administrator'.
In other words I would like the result of this query:
SELECT [cases].*,
[main].[users_cases].role
FROM [main].[cases]
INNER JOIN [main].[users_cases] ON [main].[cases].[id] = [main].[users_cases].[case_id]
INNER JOIN [main].[users] ON [main].[users].[id] = [main].[users_cases].[user_id]
WHERE [main].[users].[first_name] = 'Alex'
AND [main].[users_cases].[role] = 'Administrator';
So far I have tried many things along the lines of:
cases = session.query(Case).filter(User.first == 'Alex', UserCaseLink.role == 'Administrator')
but it is not working as I would like it to.
How can I modify the ORM mapping so that it does the joining for me and allows me to query easily (something like the query I tried)?
According to your classes, the equivalent query for:
SELECT [cases].*,
[main].[users_cases].role
FROM [main].[cases]
INNER JOIN [main].[users_cases] ON [main].[cases].[id] = [main].[users_cases].[case_id]
INNER JOIN [main].[users] ON [main].[users].[id] = [main].[users_cases].[user_id]
WHERE [main].[users].[first_name] = 'Alex'
AND [main].[users_cases].[role] = 'Administrator';
is
cases = session.query(
    Case.id, Case.num, Case.type,
    UserCaseLink.role
).filter(
    (Case.id == UserCaseLink.case_id)
    & (User.id == UserCaseLink.user_id)
    & (User.first == 'Alex')
    & (UserCaseLink.role == 'Administrator')
).all()
Also, you can:
cases = Case.query\
    .join(UserCaseLink, Case.id == UserCaseLink.case_id)\
    .join(User, User.id == UserCaseLink.user_id)\
    .filter((User.first == 'Alex') & (UserCaseLink.role == 'Administrator'))\
    .all()
Good Luck
After comment
Based on your comment, I think you want something like:
cases = Case.query\
    .filter((Case.case_data.cases.first == 'Alex') & (Case.case_data.cases.first == 'Alex'))\
    .all()
where case_data connects Case to UserCaseLink and cases connects UserCaseLink to User, as in your relationships.
But that approach raises an error:
AttributeError: Neither 'InstrumentedAttribute' object nor 'Comparator' object associated with dimpor.org_type has an attribute 'org_type_id'
The message shows that the attributes combined in the filter must belong to the queried table class.
So I ended up having to compromise.
It seems the query cannot be aware of all the relationships present in the ORM mapping at all times. Instead I had to manually give it the path between the different classes for it to find all the data I wanted:
cases = session.query(Case)\
    .join(Case.users)\
    .join(UserCaseLink.user_data)\
    .filter(User.first == 'Alex', UserCaseLink.role == 'Administrator')\
    .all()
However, as it does not meet all the criteria of my original question (i.e. I still have to specify the joins), I will not mark this answer as the accepted one.
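For completeness, the joins can be left out entirely by filtering through the relationships with the any()/has() operators, which compile to correlated EXISTS subqueries instead of JOINs (so the generated SQL differs from the original query, but the same cases come back). A runnable sketch against trimmed versions of the models, with invented sample data:

```python
from sqlalchemy import create_engine, Column, Integer, String, ForeignKey, and_
from sqlalchemy.orm import declarative_base, relationship, sessionmaker

Base = declarative_base()

class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    first = Column('first_name', String(100))
    cases = relationship('UserCaseLink', back_populates='user_data')

class Case(Base):
    __tablename__ = 'cases'
    id = Column(Integer, primary_key=True)
    num = Column('case_number', String(100))
    users = relationship('UserCaseLink', back_populates='case_data')

class UserCaseLink(Base):
    __tablename__ = 'users_cases'
    user_id = Column(ForeignKey('users.id'), primary_key=True)
    case_id = Column(ForeignKey('cases.id'), primary_key=True)
    role = Column(String(100))
    user_data = relationship('User', back_populates='cases')
    case_data = relationship('Case', back_populates='users')

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

# invented sample data: Alex administers C-1 and reviews C-2
alex = User(first='Alex')
session.add_all([
    UserCaseLink(user_data=alex, case_data=Case(num='C-1'), role='Administrator'),
    UserCaseLink(user_data=alex, case_data=Case(num='C-2'), role='Reviewer'),
])
session.commit()

# EXISTS-based filter: no explicit joins needed
admin_cases = (
    session.query(Case)
    .filter(Case.users.any(and_(
        UserCaseLink.role == 'Administrator',
        UserCaseLink.user_data.has(User.first == 'Alex'),
    )))
    .all()
)
print([c.num for c in admin_cases])
```

any() walks the one-to-many side (Case.users) and has() the many-to-one side (UserCaseLink.user_data), which is exactly the path through the association object described above.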
I have this SQL expression that I'm trying to write in SQL Alchemy
select * from candidates1 c
inner join uploaded_emails1 e
on c.id=e.candidate_id
group by e.thread_id
How would I go about doing that?
The execute method can be used to run raw SQL, like so:
from sqlalchemy import text
sql = text('select * from candidates1 c inner join uploaded_emails1 e on c.id=e.candidate_id group by e.thread_id')
result = db.engine.execute(sql)
... do stuff ...
If you have some models that you're working with, you could use the relationship field type to create a one-to-many relationship between the Candidate and the UploadedEmail, like so:
class Candidate(Base):
    __tablename__ = 'candidates1'
    id = Column(Integer, primary_key=True)
    uploaded_emails = relationship("UploadedEmail", lazy='dynamic')

class UploadedEmail(Base):
    __tablename__ = 'uploaded_emails1'
    id = Column(Integer, primary_key=True)
    candidate_id = Column(Integer, ForeignKey('candidates1.id'))
    thread_id = Column(Integer)
And in your code, you might use that like this (including the group_by)
candidate_id = 1
c = Candidate.query.filter_by(id=candidate_id).first()
thread_id_results = (c.uploaded_emails
                     .with_entities(UploadedEmail.thread_id)
                     .group_by(UploadedEmail.thread_id)
                     .all())
thread_ids = [row[0] for row in thread_id_results]
Note that you have to use .with_entities to specify the columns you would like to select, and that only the thread_id column is selected here. If you don't do this, you'll get errors along the lines of "Expression #X of SELECT list is not in GROUP BY clause and contains nonaggregated column ... which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by".
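A minimal runnable version of that pattern (a plain SQLAlchemy session stands in for Flask-SQLAlchemy's Candidate.query, the sample emails are invented, and an order_by is added so the result order is deterministic):

```python
from sqlalchemy import create_engine, Column, Integer, ForeignKey
from sqlalchemy.orm import declarative_base, relationship, sessionmaker

Base = declarative_base()

class Candidate(Base):
    __tablename__ = 'candidates1'
    id = Column(Integer, primary_key=True)
    uploaded_emails = relationship("UploadedEmail", lazy='dynamic')

class UploadedEmail(Base):
    __tablename__ = 'uploaded_emails1'
    id = Column(Integer, primary_key=True)
    candidate_id = Column(Integer, ForeignKey('candidates1.id'))
    thread_id = Column(Integer)

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

# invented sample data: two emails in thread 10, one in thread 20
c = Candidate(id=1)
session.add(c)
session.add_all([UploadedEmail(candidate_id=1, thread_id=10),
                 UploadedEmail(candidate_id=1, thread_id=10),
                 UploadedEmail(candidate_id=1, thread_id=20)])
session.commit()

# a lazy='dynamic' relationship is itself a Query, so it can be refined
thread_id_results = (c.uploaded_emails
                     .with_entities(UploadedEmail.thread_id)
                     .group_by(UploadedEmail.thread_id)
                     .order_by(UploadedEmail.thread_id)
                     .all())
thread_ids = [row[0] for row in thread_id_results]
print(thread_ids)
```

Note that this only works because the relationship is declared with lazy='dynamic'; an ordinary relationship returns a plain list, not a query.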
Sorry I didn't provide enough information to answer the question. This ended up working:
x = db_session.query(Candidate1, Uploaded_Emails1).filter(Candidate1.id == Uploaded_Emails1.candidate_id).group_by(Uploaded_Emails1.thread_id).all()
I have two tables, Foo and Bar. I just added a new column x to the Bar table, which has to be populated using values in Foo:
class Foo(Base):
    __tablename__ = 'foo'
    id = Column(Integer, primary_key=True)
    x = Column(Integer, nullable=False)

class Bar(Base):
    __tablename__ = 'bar'
    id = Column(Integer, primary_key=True)
    x = Column(Integer, nullable=False)
    foo_id = Column(Integer, ForeignKey('foo.id'), nullable=False)
One straightforward way would be to iterate over all the rows in Bar and update them one by one, but it takes a long time (there are more than 100k rows in Foo and Bar):
for b, foo_x in session.query(Bar, Foo.x).join(Foo, Foo.id == Bar.foo_id):
    b.x = foo_x
session.flush()
Now I was wondering if this would be the right way to do it:
mappings = []
for b, foo_x in session.query(Bar, Foo.x).join(Foo, Foo.id == Bar.foo_id):
    info = {'id': b.id, 'x': foo_x}
    mappings.append(info)
session.bulk_update_mappings(Bar, mappings)
There are not many examples of bulk_update_mappings out there. The docs say:
All those keys which are present and are not part of the primary key
are applied to the SET clause of the UPDATE statement; the primary key
values, which are required, are applied to the WHERE clause.
So, in this case, id will be used in the WHERE clause and the row will then be updated using the x value in the dictionary, right?
The approach is correct in terms of usage. The only thing I would change is something like below
mappings = []
i = 0
for b, foo_x in session.query(Bar, Foo.x).join(Foo, Foo.id == Bar.foo_id):
    info = {'id': b.id, 'x': foo_x}
    mappings.append(info)
    i = i + 1
    if i % 10000 == 0:
        session.bulk_update_mappings(Bar, mappings)
        session.flush()
        session.commit()
        mappings[:] = []
session.bulk_update_mappings(Bar, mappings)
This will make sure you don't keep too much data hanging in memory and don't send one overly large update to the DB in a single round trip.
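A compact end-to-end version of this chunking pattern, against cut-down Foo/Bar models with invented data (the CHUNK constant stands in for the 10000 above, and real tables would of course be far larger):

```python
from sqlalchemy import create_engine, Column, Integer, ForeignKey
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Foo(Base):
    __tablename__ = 'foo'
    id = Column(Integer, primary_key=True)
    x = Column(Integer, nullable=False)

class Bar(Base):
    __tablename__ = 'bar'
    id = Column(Integer, primary_key=True)
    x = Column(Integer, nullable=False)
    foo_id = Column(Integer, ForeignKey('foo.id'), nullable=False)

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

# invented sample data: Bar.x starts out wrong everywhere
session.add_all([Foo(id=1, x=10), Foo(id=2, x=20),
                 Bar(id=1, x=0, foo_id=1), Bar(id=2, x=0, foo_id=2)])
session.commit()

CHUNK = 10000  # batch size for bulk_update_mappings
mappings = []
# query only the columns needed, to avoid loading full Bar instances
for bar_id, foo_x in session.query(Bar.id, Foo.x).join(Foo, Foo.id == Bar.foo_id):
    mappings.append({'id': bar_id, 'x': foo_x})
    if len(mappings) >= CHUNK:
        session.bulk_update_mappings(Bar, mappings)
        mappings.clear()
if mappings:  # flush the final partial batch
    session.bulk_update_mappings(Bar, mappings)
session.commit()

print(sorted((b.id, b.x) for b in session.query(Bar)))
```

Querying Bar.id instead of whole Bar objects avoids populating the identity map with 100k+ instances, which is usually the other big cost in this kind of migration loop.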
Not directly related to this question, but for those searching for more performance when updating/inserting using both methods: bulk_update_mappings and bulk_insert_mappings, just add the fast_executemany to your engine as follows:
engine = create_engine(connection_string, fast_executemany=True)
You can use this parameter in SQLAlchemy versions 1.3 and above. It comes from pyodbc (so it applies to pyodbc-based connections such as mssql+pyodbc) and will speed up your bulk requests.
I am trying to do a simple join query like this,
SELECT food._id, food.food_name, food_categories.food_categories FROM food JOIN food_categories ON food.food_category_id = food_categories._id
but keep receiving an error. Here is how my classes are set up.
class Food_Categories(db.Model):
    __tablename__ = 'food_categories'
    _id = db.Column(db.Integer, primary_key=True)
    food_categories = db.Column(db.String(30))

class Food(db.Model):
    __tablename__ = 'food'
    _id = db.Column(db.Integer, primary_key=True)
    food_name = db.Column(db.String(40))
    food_category_id = db.Column(db.Integer, ForeignKey(Food_Categories._id))
    food_category = relationship("Food_Categories")
My query function looks like this.
@app.route('/foodlist')
def foodlist():
    if request.method == 'GET':
        results = Food.query.join(Food_Categories.food_categories).all()
        json_results = []
        for result in results:
            d = {'_id': result._id,
                 'food': result.food_name,
                 'food_category': result.food_categories}
            json_results.append(d)
        return jsonify(user=json_results)
I am using Flask. When I call the route I get this error.
AttributeError: 'ColumnProperty' object has no attribute 'mapper'
I essentially want this:
| id | food_name | food_category |
and have the food_category_id column replaced with the actual name of the food category located in other table.
Are my tables/relationships set up correctly? Is my query setup correctly?
Your tables and relationships are set up correctly. Your query needs a change.
The reason for the error is that you try to perform the join on a column (Food_Categories.food_categories) instead of a table (or mapped model object). Technically, you should replace your query with the one below to fix the error:
results = Food.query.join(Food_Categories).all()
This will fix the error, but it will not generate the SQL statement you desire, because it will return only Food instances as the result, even though there is a join.
In order to build a query which will generate exactly the SQL statement you have in mind:
results = (db.session.query(Food._id, Food.food_name,
                            Food_Categories.food_categories)
           .join(Food_Categories)
           .all())
for x in results:
    print(x._id, x.food_name, x.food_categories)
Please note that in this case the results are not instances of Food, but rather tuples with 3 column values.
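A self-contained sketch of that final query outside Flask (a plain SQLAlchemy session stands in for db.session, and the sample row is invented):

```python
from sqlalchemy import create_engine, Column, Integer, String, ForeignKey
from sqlalchemy.orm import declarative_base, relationship, sessionmaker

Base = declarative_base()

class Food_Categories(Base):
    __tablename__ = 'food_categories'
    _id = Column(Integer, primary_key=True)
    food_categories = Column(String(30))

class Food(Base):
    __tablename__ = 'food'
    _id = Column(Integer, primary_key=True)
    food_name = Column(String(40))
    food_category_id = Column(Integer, ForeignKey(Food_Categories._id))
    food_category = relationship("Food_Categories")

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

# invented sample row: one food in one category
session.add(Food(food_name='Apple',
                 food_category=Food_Categories(food_categories='Fruit')))
session.commit()

# column-level query plus a join on the mapped class (not a column):
# yields (_id, food_name, food_categories) tuples, as in the answer
rows = (session.query(Food._id, Food.food_name,
                      Food_Categories.food_categories)
        .join(Food_Categories)
        .all())
print(rows)
```

Because there is exactly one foreign key between the two tables, .join(Food_Categories) can infer the ON clause on its own.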