We have two tables:
Table 1: EventLog
class EventLog(Base):
""""""
__tablename__ = 'event_logs'
id = Column(Integer, primary_key=True, autoincrement=True)
# Keys
event_id = Column(Integer)
data = Column(String)
signature = Column(String)
# Unique constraint
__table_args__ = (UniqueConstraint('event_id', 'signature'),)
Table 2: Machine_Event_Logs
class Machine_Event_Logs(Base):
""""""
__tablename__ = 'machine_event_logs'
id = Column(Integer, primary_key=True, autoincrement=True)
# Keys
machine_id = Column(String, ForeignKey("machines.id"))
event_log_id = Column(String, ForeignKey("event_logs.id"))
event_record_id = Column(Integer)
time_created = Column(String)
# Unique constraint
__table_args__ = (UniqueConstraint('machine_id', 'event_log_id', 'event_record_id', 'time_created'),)
# Relationships
event_logs = relationship("EventLog")
The relationship between EventLogs and Machine_Event_Logs is 1 to many.
Whereby we register a unique event log into the EventLogs table and then register millions of entries into Machine_Event_Logs for every time we encounter that event.
Goal: We're trying to join both table to display the entire timeline of event logs captured.
We've tried multiple combinations of the merge() function in Panda Dataframe but it only returns a bunch of NaN or empty. For example:
pd.merge(event_logs, machine_event_logs, how='left', left_on='id', right_on='event_log_id')
Any ideas on how to solve this?
Thank in in advance for your assistance.
According to your data schema, you have incompatible types where id in event_logs is an Integer and event_log_id in machine_event_logs is a String column. In Python the equality of a string and its equivalent numeric value yields false:
print('0'==0)
# False
Therefore your pandas left join merge returns all NAN on right hand side since no matches are successfully found. Consider converting to align types for proper merging:
event_logs['id'] = event_logs['id'].astype(str)
OR
machine_event_logs['event_log_id'] = machine_event_logs['event_log_id'].astype(int)
Related
I have a table to log user actions. I dont have different columns for different action types yet for my newest task, I have to find the average time difference inbetween some actions.
class UserAction(db.Model):
__tablename__ = "user_action"
id = Column(Integer, nullable=False, primary_key=True)
user_id = Column(Integer, ForeignKey('user.id))
case_id = Column(Integer, ForeignKey('case.id'))
category_id = Column(Integer)
action_time = Column(DateTime(), server_default=func.now())
So I want to do something like,
x = session.query(func.avg(UserAction.action_time)).filter(UserAction.category_id == 1).all()
y = session.query(func.avg(UserAction.action_time)).filter(UserAction.category_id == 2).all()
dif = x - y
or
session.query(func.avg(func.justify_hours(timestamp_column1) - func.justify_hours(timestamp_column1))).all()
I can create a new table to log each case's actions' in different columns to do it like thw first way but that wont be efficient. But if i do it like the second way, the output won't be correct becasue not all actions are done by all cases. So im a little bit stuck. How can i achieve this?
Thanks
I have got a not very common join and filter problem.
Here are my models;
class Order(Base):
id = Column(Integer, primary_key=True)
order_id = Column(String(19), nullable=False)
... (other fields)
class Discard(Base):
id = Column(Integer, primary_key=True)
order_id = Column(String(19), nullable=False)
I want to query all and full instances of Order but just exclude those that have a match in Discard.order_id based on Order.order_id field. As you can see there is no relationship between order_id fields.
I've tried outer left join, notin_ but ended up with no success.
With this answer I've achieved desired results.
Here is my code;
orders = (
session.query(Order)
.outerjoin(Discard, Order.order_id == Discard.order_id)
.filter(Discard.order_id == None) # noqa: E711
.all()
)
I was paying too much attention to flake8 wrong syntax message at Discard.order_id == None and was using Discard.order_id is None. It appeared out they were rendered differently by sqlalchemy.
I have two tables, Products and Orders, inside my Flask-SqlAlchemy setup, and they are linked so an order can have several products:
class Products(db.Model):
id = db.Column(db.Integer, primary_key=True)
....
class Orders(db.Model):
guid = db.Column(db.String(36), default=generate_uuid, primary_key=True)
products = db.relationship(
"Products", secondary=order_products_table, backref="orders")
....
linked via:
order_products_table = db.Table("order_products_table",
db.Column('orders_guid', db.String(36), db.ForeignKey('orders.guid')),
db.Column('products_id', db.Integer, db.ForeignKey('products.id'))
# db.Column('license', dbString(36))
)
For my purposes, each product in an order will receive a unique license string, which logically should be added to the order_products_table rows of each product in an order.
How do I declare this third license column on the join table order_products_table so it gets populated it as I insert an Order?
I've since found the documentation for the Association Object from the SQLAlchemy docs, which allows for exactly this expansion to the join table.
Updated setup:
# Instead of a table, provide a model for the JOIN table with additional fields
# and explicit keys and back_populates:
class OrderProducts(db.Model):
__tablename__ = 'order_products_table'
orders_guid = db.Column(db.String(36), db.ForeignKey(
'orders.guid'), primary_key=True)
products_id = db.Column(db.Integer, db.ForeignKey(
'products.id'), primary_key=True)
order = db.relationship("Orders", back_populates="products")
products = db.relationship("Products", back_populates="order")
licenses = db.Column(db.String(36), nullable=False)
class Products(db.Model):
id = db.Column(db.Integer, primary_key=True)
order = db.relationship(OrderProducts, back_populates="order")
....
class Orders(db.Model):
guid = db.Column(db.String(36), default=generate_uuid, primary_key=True)
products = db.relationship(OrderProducts, back_populates="products")
....
What is really tricky (but also shown on the documentation page), is how you insert the data. In my case it goes something like this:
o = Orders(...) # insert other data
for id in products:
# Create OrderProducts join rows with the extra data, e.g. licenses
join = OrderProducts(licenses="Foo")
# To the JOIN add the products
join.products = Products.query.get(id)
# Add the populated JOIN as the Order products
o.products.append(join)
# Finally commit to database
db.session.add(o)
db.session.commit()
I was at first trying to populate the Order.products (or o.products in the example code) directly, which will give you an error about using a Products class when it expects a OrderProducts class.
I also struggled with the whole field naming and referencing of the back_populates. Again, the example above and on the docs show this. Note the pluralization is entirely to do with how you want your fields named.
So I'm quite new to SQLAlchemy.
I have a model Showing which has about 10,000 rows in the table. Here is the class:
class Showing(Base):
__tablename__ = "showings"
id = Column(Integer, primary_key=True)
time = Column(DateTime)
link = Column(String)
film_id = Column(Integer, ForeignKey('films.id'))
cinema_id = Column(Integer, ForeignKey('cinemas.id'))
def __eq__(self, other):
if self.time == other.time and self.cinema == other.cinema and self.film == other.film:
return True
else:
return False
Could anyone give me some guidance on the fastest way to insert a new showing if it doesn't exist already. I think it is slightly more complicated because a showing is only unique if the time, cinmea, and film are unique on a showing.
I currently have this code:
def AddShowings(self, showing_times, cinema, film):
all_showings = self.session.query(Showing).options(joinedload(Showing.cinema), joinedload(Showing.film)).all()
for showing_time in showing_times:
tmp_showing = Showing(time=showing_time[0], film=film, cinema=cinema, link=showing_time[1])
if tmp_showing not in all_showings:
self.session.add(tmp_showing)
self.session.commit()
all_showings.append(tmp_showing)
which works, but seems to be very slow. Any help is much appreciated.
If any such object is unique based on a combination of columns, you need to mark these as a composite primary key. Add the primary_key=True keyword parameter to each of these columns, dropping your id column altogether:
class Showing(Base):
__tablename__ = "showings"
time = Column(DateTime, primary_key=True)
link = Column(String)
film_id = Column(Integer, ForeignKey('films.id'), primary_key=True)
cinema_id = Column(Integer, ForeignKey('cinemas.id'), primary_key=True)
That way your database can handle these rows more efficiently (no need for an incrementing column), and SQLAlchemy now automatically knows if two instances of Showing are the same thing.
I believe you can then just merge your new Showing back into the session:
def AddShowings(self, showing_times, cinema, film):
for showing_time in showing_times:
self.session.merge(
Showing(time=showing_time[0], link=showing_time[1],
film=film, cinema=cinema)
)
First, the database overview:
competitors - people who compete
competitions - things that people compete at
competition_registrations - Competitors registered for a particular competition
event - An "event" at a competition.
events_couples - A couple (2 competitors) competing in an event.
First, EventCouple, a Python class corresponding to events_couples, is:
class EventCouple(Base):
__tablename__ = 'events_couples'
competition_id = Column(Integer, ForeignKey('competitions.id'), primary_key=True)
event_id = Column(Integer, ForeignKey('events.id'), primary_key=True)
leader_id = Column(Integer)
follower_id = Column(Integer)
__table_args__ = (
ForeignKeyConstraint(['competition_id', 'leader_id'], ['competition_registrations.competition_id', 'competition_registrations.competitor_id']),
ForeignKeyConstraint(['competition_id', 'follower_id'], ['competition_registrations.competition_id', 'competition_registrations.competitor_id']),
{}
)
I have a Python class, CompetitorRegistration, that corresponds to a record/row in competition_registrations. A competitor, who is registered, can compete in multiple events, but either as a "leader", or a "follower". I'd like to add to CompetitorRegistration an attribute leading, that is a list of EventCouple where the competition_id and leader_id match. This is my CompetitorRegistration class, complete with attempt:
class CompetitorRegistration(Base):
__tablename__ = 'competition_registrations'
competition_id = Column(Integer, ForeignKey('competitions.id'), primary_key=True)
competitor_id = Column(Integer, ForeignKey('competitors.id'), primary_key=True)
email = Column(String(255))
affiliation_id = Column(Integer, ForeignKey('affiliation.id'))
is_student = Column(Boolean)
registered_time = Column(DateTime)
leader_number = Column(Integer)
leading = relationship('EventCouple', primaryjoin=and_('CompetitorRegistration.competition_id == EventCouple.competition_id', 'CompetitorRegistration.competitor_id == EventCouple.leader_id'))
following = relationship('EventCouple', primaryjoin='CompetitorRegistration.competition_id == EventCouple.competition_id and CompetitorRegistration.competitor_id == EventCouple.follower_id')
However, I get:
ArgumentError: Could not determine relationship direction for primaryjoin
condition 'CompetitorRegistration.competition_id == EventCouple.competition_id
AND CompetitorRegistration.competitor_id == EventCouple.leader_id', on
relationship CompetitorRegistration.leading. Ensure that the referencing Column
objects have a ForeignKey present, or are otherwise part of a
ForeignKeyConstraint on their parent Table, or specify the foreign_keys parameter
to this relationship.
Thanks for any help, & let me know if more info is needed on the schema.
Also, another attempt of mine is visible in following — this did not error, but didn't give correct results either. (It only joined on the competition_id, and completely ignored the follower_id)
Your leading's condition mixes expression and string to be eval()ed. And following's condition mixes Python and SQL operators: and in Python is not what you expected here. Below are corrected examples using both variants:
leading = relationship('EventCouple', primaryjoin=(
(competition_id==EventCouple.competition_id) & \
(competitor_id==EventCouple.leader_id)))
leading = relationship('EventCouple', primaryjoin=and_(
competition_id==EventCouple.competition_id,
competitor_id==EventCouple.leader_id))
following = relationship('EventCouple', primaryjoin=\
'(CompetitorRegistration.competition_id==EventCouple.competition_id) '\
'& (CompetitorRegistration.competitor_id==EventCouple.follower_id)')