I have a table to log user actions. I don't have different columns for different action types yet. For my newest task, I have to find the average time difference between certain actions.
class UserAction(db.Model):
    __tablename__ = "user_action"
    id = Column(Integer, nullable=False, primary_key=True)
    user_id = Column(Integer, ForeignKey('user.id'))
    case_id = Column(Integer, ForeignKey('case.id'))
    category_id = Column(Integer)
    action_time = Column(DateTime(), server_default=func.now())
So I want to do something like,
x = session.query(func.avg(UserAction.action_time)).filter(UserAction.category_id == 1).all()
y = session.query(func.avg(UserAction.action_time)).filter(UserAction.category_id == 2).all()
dif = x - y
or
session.query(func.avg(func.justify_hours(timestamp_column1) - func.justify_hours(timestamp_column2))).all()
I can create a new table to log each case's actions in different columns to do it like the first way, but that won't be efficient. But if I do it like the second way, the output won't be correct because not all actions are done by all cases. So I'm a little bit stuck. How can I achieve this?
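One direction that might work without the extra table (just a sketch, not something I have verified; it assumes PostgreSQL and at most one action per category per case) is a self-join on case_id, so only cases that have both actions contribute to the average:
from sqlalchemy.orm import aliased

# self-join: a1 = the category 1 action, a2 = the category 2 action of the same case
# (assumes PostgreSQL, where timestamp - timestamp yields an interval that avg() accepts)
a1 = aliased(UserAction)
a2 = aliased(UserAction)

avg_diff = (
    session.query(func.avg(a2.action_time - a1.action_time))
    .filter(a1.case_id == a2.case_id)
    .filter(a1.category_id == 1)
    .filter(a2.category_id == 2)
    .scalar()
)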
Thanks
I have a scenario where I need to increment the session_number column per user_name. If a user has created a session before, I'll increment their last session_number, but if a user creates a session for the first time, session_number should start from 1. I tried to illustrate this below. Right now I handle it with application logic, but I'm trying to find a more elegant way to do it in SQLAlchemy.
id - user_name - session_number
1 user_1 1
2 user_1 2
3 user_2 1
4 user_1 3
5 user_2 2
Here is the Python code for the table. My database is PostgreSQL and I'm using Alembic to upgrade tables. Right now it keeps incrementing session_number regardless of user_name.
class UserSessions(db.Model):
    __tablename__ = 'user_sessions'
    id = db.Column(db.Integer, primary_key=True, unique=True)
    username = db.Column(db.String, nullable=False)
    session_number = db.Column(db.Integer, Sequence('session_number_seq', start=0, increment=1))
    created_at = db.Column(db.DateTime)
    last_edit = db.Column(db.DateTime)
    __table_args__ = (
        db.UniqueConstraint('username', 'session_number', name='_username_session_number_idx_'),
    )
I've searched the internet for this situation, but what I found was not like my problem. Is it possible to achieve this with SQLAlchemy/PostgreSQL?
First, I do not know of any "pure" solution for this situation using either SQLAlchemy or PostgreSQL, or a combination of the two.
Although it might not be exactly the solution you are looking for, I hope it will give you some ideas.
If you wanted to calculate the session_number for the whole table without it being stored, I would use the following query or a variation thereof:
def get_user_sessions_with_rank():
    expr = (
        db.func.rank()
        .over(partition_by=UserSessions.username, order_by=[UserSessions.id])
        .label("session_number")
    )
    subq = db.session.query(UserSessions.id, expr).subquery("subq")
    q = (
        db.session.query(UserSessions, subq.c.session_number)
        .join(subq, UserSessions.id == subq.c.id)
        .order_by(UserSessions.id)
    )
    return q.all()
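For illustration, the helper would be consumed roughly like this; each result row comes back as a (UserSessions, session_number) tuple:
for user_session, session_number in get_user_sessions_with_rank():
    print(user_session.username, session_number)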
Alternatively, I would add a column_property to the model to compute it on the fly for each instance of UserSessions. It is not as efficient to calculate, but for queries filtering by a specific user it should be good enough:
class UserSessions(db.Model):
    __tablename__ = "user_sessions"
    id = db.Column(db.Integer, primary_key=True, unique=True)
    username = db.Column(db.String, nullable=False)
    created_at = db.Column(db.DateTime)
    last_edit = db.Column(db.DateTime)

# must be defined outside of the model definition because an alias of UserSessions is needed
US2 = db.aliased(UserSessions)
UserSessions.session_number = db.column_property(
    db.select(db.func.count(US2.id))
    .where(US2.username == UserSessions.username)
    .where(US2.id <= UserSessions.id)
    .scalar_subquery()
)
In this case, when you query for UserSessions, the session_number will be fetched from the database, while being None for newly created instances.
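A quick usage sketch with the model above; session_number comes back populated on rows loaded from the database:
rows = (
    db.session.query(UserSessions)
    .filter(UserSessions.username == "user_1")
    .order_by(UserSessions.id)
    .all()
)
for row in rows:
    print(row.id, row.username, row.session_number)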
So I am currently trying to build a new app using flask-restful for the backend, and I am still learning since all of this is quite new to me. I have already set everything up with several different MySQL tables, detailed next, and all the relationships between them, and everything seems to be working fine, except that inserting new data is quite slow.
Here is a (simplified) explanation of the current setup I have. Basically, I first of all have one Flights table.
class FlightModel(db.Model):
    __tablename__ = "Flights"
    flightid = db.Column(db.Integer, primary_key=True, nullable=False)
    [Other properties]
    reviewid = db.Column(db.Integer, db.ForeignKey('Reviews.reviewid'), index=True, nullable=False)
    review = db.relationship("ReviewModel", back_populates="flight", lazy='joined')
This table then points to a Reviews table, in which I store the global review left by the user.
class ReviewModel(db.Model):
    __tablename__ = "Reviews"
    reviewid = db.Column(db.Integer, primary_key=True, nullable=False)
    [Other properties]
    depAirportReviewid = db.Column(db.Integer, db.ForeignKey('DepAirportReviews.reviewid'), index=True, nullable=False)
    arrAirportReviewid = db.Column(db.Integer, db.ForeignKey('ArrAirportReviews.reviewid'), index=True, nullable=False)
    airlineReviewid = db.Column(db.Integer, db.ForeignKey('AirlineReviews.reviewid'), index=True, nullable=False)
    flight = db.relationship("FlightModel", uselist=False, back_populates="review", lazy='joined')
    depAirportReview = db.relationship("DepAirportReviewModel", back_populates="review", lazy='joined')
    arrAirportReview = db.relationship("ArrAirportReviewModel", back_populates="review", lazy='joined')
    airlineReview = db.relationship("AirlineReviewModel", back_populates="review", lazy='joined')
Then, a more detailed review can be left regarding different aspects of the flights, stored in yet another table (for example, in the following DepAirportReviews table: there are three tables in total at this level).
class DepAirportReviewModel(db.Model):
    __tablename__ = "DepAirportReviews"
    reviewid = db.Column(db.Integer, primary_key=True, nullable=False)
    [Other properties]
    review = db.relationship("ReviewModel", uselist=False, back_populates="depAirportReview", lazy='joined')
The insert process is slow (it typically takes 1 second per flight to insert, which is a problem when I try to bulk insert a few hundred of them).
I understand this is because of all these relationships and all the round trips to the database they imply in order to retrieve the different ids for the different tables. Is that correct? Is there anything I could do to solve this, or will I need to redesign the tables to remove some of these relationships?
Thanks for any help!
EDIT: displaying the SQL executed directly showed what I expected: 7 simple queries in total are executed for each insertion, each taking ~300ms. That's quite long; I guess it is mostly due to the time to reach the server. Nothing to be done except removing some foreign keys, right?
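For reference, one thing that might at least cut the per-flight commit overhead is adding the whole batch in one session and committing a single time at the end; a rough sketch, where the payload dict keys are made up for illustration:
def bulk_add_flights(session, flights_data):
    # flights_data: list of dicts with hypothetical keys "flight", "review",
    # "dep_airport", "arr_airport" and "airline" holding the column values
    for data in flights_data:
        review = ReviewModel(
            depAirportReview=DepAirportReviewModel(**data["dep_airport"]),
            arrAirportReview=ArrAirportReviewModel(**data["arr_airport"]),
            airlineReview=AirlineReviewModel(**data["airline"]),
            **data["review"],
        )
        session.add(FlightModel(review=review, **data["flight"]))
    # one commit for the whole batch instead of one per flight
    session.commit()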
We have two tables:
Table 1: EventLog
class EventLog(Base):
    """"""
    __tablename__ = 'event_logs'
    id = Column(Integer, primary_key=True, autoincrement=True)

    # Keys
    event_id = Column(Integer)
    data = Column(String)
    signature = Column(String)

    # Unique constraint
    __table_args__ = (UniqueConstraint('event_id', 'signature'),)
Table 2: Machine_Event_Logs
class Machine_Event_Logs(Base):
    """"""
    __tablename__ = 'machine_event_logs'
    id = Column(Integer, primary_key=True, autoincrement=True)

    # Keys
    machine_id = Column(String, ForeignKey("machines.id"))
    event_log_id = Column(String, ForeignKey("event_logs.id"))
    event_record_id = Column(Integer)
    time_created = Column(String)

    # Unique constraint
    __table_args__ = (UniqueConstraint('machine_id', 'event_log_id', 'event_record_id', 'time_created'),)

    # Relationships
    event_logs = relationship("EventLog")
The relationship between EventLog and Machine_Event_Logs is one-to-many: we register a unique event log in the event_logs table and then register millions of entries in machine_event_logs every time we encounter that event.
Goal: we're trying to join both tables to display the entire timeline of event logs captured.
We've tried multiple combinations of pandas' DataFrame merge() function, but it only returns a bunch of NaN or empty values. For example:
pd.merge(event_logs, machine_event_logs, how='left', left_on='id', right_on='event_log_id')
Any ideas on how to solve this?
Thanks in advance for your assistance.
According to your data schema, you have incompatible types: id in event_logs is an Integer, while event_log_id in machine_event_logs is a String column. In Python, comparing a string against its equivalent numeric value yields False:
print('0'==0)
# False
Therefore your pandas left-join merge returns all NaN on the right-hand side, since no matches are found. Convert one of the key columns so the types align before merging:
event_logs['id'] = event_logs['id'].astype(str)
OR
machine_event_logs['event_log_id'] = machine_event_logs['event_log_id'].astype(int)
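For example, a small end-to-end sketch using the second conversion (DataFrame and column names as in the schema above):
import pandas as pd

# align the key dtypes, then the left join can find matches instead of NaN
machine_event_logs["event_log_id"] = machine_event_logs["event_log_id"].astype(int)

timeline = pd.merge(
    event_logs,
    machine_event_logs,
    how="left",
    left_on="id",
    right_on="event_log_id",
    suffixes=("_event", "_machine"),
)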
So I'm quite new to SQLAlchemy.
I have a model Showing which has about 10,000 rows in the table. Here is the class:
class Showing(Base):
    __tablename__ = "showings"
    id = Column(Integer, primary_key=True)
    time = Column(DateTime)
    link = Column(String)
    film_id = Column(Integer, ForeignKey('films.id'))
    cinema_id = Column(Integer, ForeignKey('cinemas.id'))

    def __eq__(self, other):
        if self.time == other.time and self.cinema == other.cinema and self.film == other.film:
            return True
        else:
            return False
Could anyone give me some guidance on the fastest way to insert a new showing if it doesn't already exist? I think it is slightly more complicated because a showing is only unique if the combination of time, cinema, and film is unique.
I currently have this code:
def AddShowings(self, showing_times, cinema, film):
    all_showings = self.session.query(Showing).options(joinedload(Showing.cinema), joinedload(Showing.film)).all()
    for showing_time in showing_times:
        tmp_showing = Showing(time=showing_time[0], film=film, cinema=cinema, link=showing_time[1])
        if tmp_showing not in all_showings:
            self.session.add(tmp_showing)
            self.session.commit()
            all_showings.append(tmp_showing)
which works, but seems to be very slow. Any help is much appreciated.
If any such object is unique based on a combination of columns, you need to mark these as a composite primary key. Add the primary_key=True keyword parameter to each of these columns, dropping your id column altogether:
class Showing(Base):
    __tablename__ = "showings"
    time = Column(DateTime, primary_key=True)
    link = Column(String)
    film_id = Column(Integer, ForeignKey('films.id'), primary_key=True)
    cinema_id = Column(Integer, ForeignKey('cinemas.id'), primary_key=True)
That way your database can handle these rows more efficiently (no need for an incrementing column), and SQLAlchemy now automatically knows if two instances of Showing are the same thing.
I believe you can then just merge your new Showing back into the session:
def AddShowings(self, showing_times, cinema, film):
    for showing_time in showing_times:
        self.session.merge(
            Showing(time=showing_time[0], link=showing_time[1],
                    film=film, cinema=cinema)
        )
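One caveat: merge() issues a SELECT per primary key to decide between INSERT and UPDATE, so it is still one round trip per showing; committing once after the whole loop keeps the per-row overhead down. A usage sketch, where importer is a made-up name for whatever object owns the session and AddShowings:
importer.AddShowings(showing_times, cinema, film)
importer.session.commit()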
First, the database overview:
competitors - people who compete
competitions - things that people compete at
competition_registrations - Competitors registered for a particular competition
event - An "event" at a competition.
events_couples - A couple (2 competitors) competing in an event.
EventCouple, the Python class corresponding to events_couples, is:
class EventCouple(Base):
    __tablename__ = 'events_couples'
    competition_id = Column(Integer, ForeignKey('competitions.id'), primary_key=True)
    event_id = Column(Integer, ForeignKey('events.id'), primary_key=True)
    leader_id = Column(Integer)
    follower_id = Column(Integer)

    __table_args__ = (
        ForeignKeyConstraint(['competition_id', 'leader_id'], ['competition_registrations.competition_id', 'competition_registrations.competitor_id']),
        ForeignKeyConstraint(['competition_id', 'follower_id'], ['competition_registrations.competition_id', 'competition_registrations.competitor_id']),
        {}
    )
I have a Python class, CompetitorRegistration, that corresponds to a row in competition_registrations. A registered competitor can compete in multiple events, either as a "leader" or a "follower". I'd like to add to CompetitorRegistration an attribute leading, which is a list of EventCouple records where the competition_id and leader_id match. This is my CompetitorRegistration class, complete with my attempt:
class CompetitorRegistration(Base):
    __tablename__ = 'competition_registrations'
    competition_id = Column(Integer, ForeignKey('competitions.id'), primary_key=True)
    competitor_id = Column(Integer, ForeignKey('competitors.id'), primary_key=True)
    email = Column(String(255))
    affiliation_id = Column(Integer, ForeignKey('affiliation.id'))
    is_student = Column(Boolean)
    registered_time = Column(DateTime)
    leader_number = Column(Integer)

    leading = relationship('EventCouple', primaryjoin=and_('CompetitorRegistration.competition_id == EventCouple.competition_id', 'CompetitorRegistration.competitor_id == EventCouple.leader_id'))
    following = relationship('EventCouple', primaryjoin='CompetitorRegistration.competition_id == EventCouple.competition_id and CompetitorRegistration.competitor_id == EventCouple.follower_id')
However, I get:
ArgumentError: Could not determine relationship direction for primaryjoin
condition 'CompetitorRegistration.competition_id == EventCouple.competition_id
AND CompetitorRegistration.competitor_id == EventCouple.leader_id', on
relationship CompetitorRegistration.leading. Ensure that the referencing Column
objects have a ForeignKey present, or are otherwise part of a
ForeignKeyConstraint on their parent Table, or specify the foreign_keys parameter
to this relationship.
Thanks for any help, & let me know if more info is needed on the schema.
Also, another attempt of mine is visible in the following relationship above; it did not error, but it didn't give correct results either (it only joined on competition_id and completely ignored follower_id).
Your leading condition mixes a column expression with strings to be eval()ed, and your following condition mixes Python and SQL operators: Python's and does not do what you expect here. Below are corrected examples using both variants:
leading = relationship('EventCouple', primaryjoin=(
    (competition_id == EventCouple.competition_id) &
    (competitor_id == EventCouple.leader_id)))

leading = relationship('EventCouple', primaryjoin=and_(
    competition_id == EventCouple.competition_id,
    competitor_id == EventCouple.leader_id))

following = relationship('EventCouple', primaryjoin=
    '(CompetitorRegistration.competition_id == EventCouple.competition_id) '
    '& (CompetitorRegistration.competitor_id == EventCouple.follower_id)')
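If SQLAlchemy still cannot work out the direction on its own (both composite constraints point at competition_registrations), spelling out the foreign-key columns, as the error message suggests, is another option; a sketch along the same lines, not tested against this schema:
leading = relationship('EventCouple',
    primaryjoin=and_(
        competition_id == EventCouple.competition_id,
        competitor_id == EventCouple.leader_id),
    foreign_keys=[EventCouple.competition_id, EventCouple.leader_id])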