I'm pretty new to SQL and am trying to join some tables.
I'm using SQLite3 and Pandas and have the following table structure:
User
  |
Measurement - Environment - meas_device - Device
  |                              |
Data                     Unit_of_Measurement
Why do I get the result of the following SQL query multiple times (4x)?
query = """
SELECT User.name, Measurement.id, Data.set_id, Data.subset_id, Data.data
FROM Measurement
JOIN Data ON Measurement.id = Data.measurement_id
JOIN User ON Measurement.user_id = user.id
JOIN Environment ON Measurement.Environment_id = Environment.id
JOIN meas_device ON Environment.meas_dev_ids = meas_device.id
JOIN Device ON meas_device.device_id = Device.id
JOIN Unit_of_Measurement ON meas_device.Unit_id = Unit_of_Measurement.id
WHERE User.name = 'nicola'
"""
pd.read_sql_query(query, conn)
My guess is that I did something wrong with the joining, but I cannot see what.
I had hoped to save one JOIN statement somewhere that works for every possible query; that's why more tables are joined here than this particular query needs.
Update
I think the problem lies within the Environment table. Whenever I join this table the results get multiplied. As the Environment is a collection of meas_devices, there are multiple entries with the same Environment id.
(I could save the Environment table with the corresponding meas_device_id's as lists, but then I see no possibility to link the Environment table with the meas_device table.)
id | meas_device_id
1 | 1
1 | 2
1 | 5
2 | 3
2 | 4
Up until now I created the tables with pandas DataFrame.to_sql(), so the id is not marked as a primary key or anything like that. Could this be the reason for my problem?
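For what it's worth, DataFrame.to_sql() indeed does not create a primary key. If you want the constraint anyway, one workaround is to create the table yourself and only append into it. A minimal sketch, assuming SQLite; the file name and the example DataFrame here are made up from the table shown above:

import sqlite3
import pandas as pd

conn = sqlite3.connect("measurements.db")  # hypothetical file name

# Example rows taken from the Environment table shown above.
df_environment = pd.DataFrame(
    {"id": [1, 1, 1, 2, 2], "meas_device_id": [1, 2, 5, 3, 4]}
)

# Create the table with an explicit schema instead of letting to_sql() infer it ...
conn.execute("""
    CREATE TABLE IF NOT EXISTS Environment (
        id INTEGER NOT NULL,
        meas_device_id INTEGER NOT NULL,
        PRIMARY KEY (id, meas_device_id)
    )
""")

# ... and then let to_sql() only append the rows into the pre-defined table.
df_environment.to_sql("Environment", conn, if_exists="append", index=False)

Note that a key like this does not by itself change what a JOIN returns, so on its own it would not remove the duplicated rows.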
Update 2
I found the problem. I don't think it will actually help anybody in the future, but for the sake of completeness, here is the explanation. It was not really a question of how to link the tables; rather, I had neglected a crucial link. Because the Environment table has multiple rows with the same id, it created "open ends" that resulted in a multiplication of the results. I needed to add a cross-check between Environment.subset_id and Data.subset_id. The following query works fine:
query = f""" SELECT {SELECT}
FROM Data
JOIN Measurement ON Data.measurement_id = Measurement.id
JOIN User ON Measurement.user_id = User.id
JOIN Environment ON Measurement.Environment_id = Environment.id
JOIN meas_device ON Environment.meas_dev_ids = meas_device.id
JOIN Device ON meas_device.Device_id = Device.id
JOIN Unit_of_Measurement ON meas_device.Unit_id = Unit_of_Measurement.id
WHERE {WHERE} AND Environment.subset_id = Data.subset_id
"""
If you need to filter on tables that produce additional rows in the result when they are joined, don't join them; instead, include them in a sub-query in the WHERE clause.
E.g.
SELECT User.name, Measurement.id, Data.set_id, Data.subset_id, Data.data
FROM
Measurement
JOIN Data ON Measurement.id = Data.measurement_id
JOIN User ON Measurement.user_id = user.id
WHERE
Measurement.Environment_id IN (
SELECT Environment.id
FROM
Environment
JOIN meas_device ON Environment.meas_dev_ids = meas_device.id
JOIN Device ON meas_device.device_id = Device.id
JOIN Unit_of_Measurement ON meas_device.Unit_id = Unit_of_Measurement.id
WHERE Device.name = 'xy'
)
In this subquery you can join many tables without generating additional records.
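A sketch of how this might look when run through pandas, with the device name passed as a parameter (conn is assumed to be the sqlite3 connection from the question):

import pandas as pd

query = """
SELECT User.name, Measurement.id, Data.set_id, Data.subset_id, Data.data
FROM Measurement
JOIN Data ON Measurement.id = Data.measurement_id
JOIN User ON Measurement.user_id = User.id
WHERE Measurement.Environment_id IN (
    SELECT Environment.id
    FROM Environment
    JOIN meas_device ON Environment.meas_dev_ids = meas_device.id
    JOIN Device ON meas_device.device_id = Device.id
    JOIN Unit_of_Measurement ON meas_device.Unit_id = Unit_of_Measurement.id
    WHERE Device.name = ?
)
"""
df = pd.read_sql_query(query, conn, params=("xy",))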
If this is not an option because you want to select entries from the other tables as well, you can simply add a DISTINCT to your original query.
SELECT DISTINCT
User.name, Measurement.id, Data.set_id, Data.subset_id, Data.data
FROM
Measurement
JOIN Data ON Measurement.id = Data.measurement_id
JOIN User ON Measurement.user_id = user.id
JOIN Environment ON Measurement.Environment_id = Environment.id
JOIN meas_device ON Environment.meas_dev_ids = meas_device.id
JOIN Device ON meas_device.device_id = Device.id
JOIN Unit_of_Measurement ON meas_device.Unit_id = Unit_of_Measurement.id
WHERE
User.name = 'nicola'
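Either way, a quick sanity check from pandas shows whether duplicate rows remain; a sketch, with query holding the DISTINCT statement above as a string and conn being the connection from the question:

import pandas as pd

# `query` holds the DISTINCT statement above.
df = pd.read_sql_query(query, conn)
print(len(df), len(df.drop_duplicates()))  # the two counts should be equal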
Related
I'm trying to perform an inner join of distinct values of three tables from an SQLite DB. I tried multiple times and failed. Please guide me.
Below is pseudo-code of what I'm trying to achieve:
sql = '''
SELECT DISTINCT lesson_id, question_id FROM lesson_practice_questions as lpq
INNER JOIN
SELECT DISTINCT topic_id, lesson_id FROM chapter_lessons as cl
WHERE cl.topic_id==2
ON cl.lesson_id = lpq.lesson_id
INNER JOIN
SELECT DISTINCT question_id, subject_id, question_type_id, knowledge_type_ids complexity_level FROM questions as q
ON q.question_id = lpq.question_id;'''
cur.execute(sql)
Many thanks to @eshirvana for taking the time to help out!
For future Stack Overflow reference, here is the working solution, with the ambiguous-column error resolved by qualifying the duplicated columns:
sql = '''SELECT
lpq.lesson_id, cl.lesson_id,
topic_id,
q.question_id, lpq.question_id,
subject_id,
question_type_id,
knowledge_type_ids,
complexity
FROM lesson_practice_questions as lpq
INNER JOIN chapter_lessons as cl on cl.topic_id = 2 and cl.lesson_id = lpq.lesson_id
INNER JOIN questions as q ON q.question_id = lpq.question_id;'''
Here is the right SQL syntax; however, you need to provide sample data and the desired output if this is not the right result:
SELECT
lpq.lesson_id,
question_id,
topic_id,
lesson_id,
question_id,
subject_id,
question_type_id,
knowledge_type_ids,
complexity_level
FROM lesson_practice_questions as lpq
INNER JOIN chapter_lessons as cl on cl.topic_id = 2 and cl.lesson_id = lpq.lesson_id
INNER JOIN questions as q ON q.question_id = lpq.question_id;
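A sketch of how this could be run from Python with sqlite3, with the duplicated columns qualified (as in the accepted fix above) and the topic id passed as a parameter; cur is assumed to be the cursor from the question:

sql = """
SELECT lpq.lesson_id,
       lpq.question_id,
       cl.topic_id,
       q.subject_id,
       q.question_type_id,
       q.knowledge_type_ids,
       q.complexity_level
FROM lesson_practice_questions AS lpq
INNER JOIN chapter_lessons AS cl
    ON cl.topic_id = ? AND cl.lesson_id = lpq.lesson_id
INNER JOIN questions AS q
    ON q.question_id = lpq.question_id
"""
for row in cur.execute(sql, (2,)):
    print(row)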
I have a relatively complex SQL statement that I want to execute with the SQLAlchemy ORM. But when I try to do so I always get the error {NoSuchColumnError} "Could not locate column in row for column 'transaction_out.value'". My SQL statement looks as follows:
sql = """
Select
addresses.address,
transaction_out1.value As sent,
transaction_out1.transaction_id As sent_id,
transactions.block As block_sent,
transactions.time As time_sent,
transactions.txid As txid_sent,
"sent" as type
From
transaction_out INNER Join
transaction_out_address On transaction_out_address.transaction_out_id = transaction_out.id INNER Join
addresses On transaction_out_address.address_id = addresses.id INNER Join
transaction_in On transaction_in.transaction_out_id = transaction_out.id INNER Join
transactions On transaction_in.transaction_id = transactions.id INNER Join
transaction_out transaction_out1 On transaction_out1.transaction_id = transactions.id INNER Join
transactions transactions1 On transaction_out.transaction_id = transactions1.id
WHERE addresses.address=:address_string
UNION
Select
addresses.address,
transaction_out.value As received,
transaction_out.transaction_id As received_id,
transactions.block As received_block,
transactions.time As received_time,
transactions.txid As received_txid,
"received"
From
transaction_out LEFT Join
transaction_out_address On transaction_out_address.transaction_out_id = transaction_out.id LEFT Join
addresses On transaction_out_address.address_id = addresses.id LEFT Join
transaction_in On transaction_in.transaction_out_id = transaction_out.id LEFT Join
transactions On transaction_out.transaction_id = transactions.id
WHERE addresses.address=:address_string
"""
And I tried to execute the statement in the following way:
from sqlalchemy import text, bindparam

stmt = text(sql)  # the raw SQL string from above
query = session.query(Address.address, TransactionOut.value, TransactionOut.id,
                      Block.height, Transaction.time, Transaction.txid).from_statement(
    stmt.bindparams(
        bindparam("address_string",
                  value=address_string)
    ))
I can execute the raw sql statement with engine.execute() without any problems but I need to do it with session.query() so I can use sqlalchemy-datatables. My database looks more or less like the one here: https://dba.stackexchange.com/questions/137791/blockchain-bitcoin-as-a-database/137800#137800.
What is the problem with the way I try to execute it?
The column aliases in the raw SQL are hiding the columns from the SQLAlchemy query. Either remove them, or alter the query to accommodate them:
query = session.query(Address.address,
                      TransactionOut.value.label('sent'),
                      TransactionOut.id.label('sent_id'),
                      Transaction.block.label('block_sent'),
                      Transaction.time.label('time_sent'),
                      Transaction.txid.label('txid_sent')).\
    from_statement(stmt).\
    params(address_string=address_string)
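A short usage sketch: each returned row then carries the labelled names, and the rows unpack in the order of the labels above:

for address, sent, sent_id, block_sent, time_sent, txid_sent in query:
    print(address, sent, txid_sent)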
I have 2 tables in MySQL.
One has transactions, with important columns where each row has a debit account ID and a credit account ID. The second table contains the account name and a special number associated with each account ID. I want an SQL query that takes the data from the transactions table and attaches the account name and account number from the second table.
I tried doing it with two queries: one gets the transactions, the second one gets the account details, and then I iterate over the dataframe and assign everything one by one, which doesn't seem to be a good idea:
query = "SELECT tr_id, tr_date, description, dr_acc, cr_acc, amount, currency, currency_rate, document, comment FROM transactions WHERE " \
"company_id = {} {} and deleted = 0 {} LIMIT {}, {}".format(
company_id, filter, sort, sn, en)
df = ncon.getDF(query)
df.insert(4, 'dr_name', '')
df.insert(6, 'cr_name', '')
data = tuple(list(set(df['dr_acc'].values.tolist() + df['cr_acc'].values.tolist())))
query = "SELECT account_number, acc_id, account_name FROM tb_accounts WHERE company_id = {} and deleted = 0 and acc_id in {}".format(
company_id, data)
df_accs = ncon.getDF(query)
for index, row in df_accs.iterrows():
    acc = str(row['acc_id'])
    ac = row['account_number']
    nm = row['account_name']
    # .loc (not .at) is needed here because indx is a list of labels
    indx = df.index[df['dr_acc'] == acc].tolist()
    df.loc[indx, 'dr_acc'] = ac
    df.loc[indx, 'dr_name'] = nm
    indx = df.index[df['cr_acc'] == acc].tolist()
    df.loc[indx, 'cr_acc'] = ac
    df.loc[indx, 'cr_name'] = nm
What you're looking for, I think, is a SQL JOIN statement.
Taking a crack at writing a query that might work based on your code:
query = '''
SELECT transactions.tr_id,
transactions.tr_date,
transactions.description,
transactions.dr_acc,
transactions.cr_acc,
transactions.amount,
transactions.currency,
transactions.currency_rate,
transactions.document,
transactions.comment
FROM transactions INNER JOIN tb_accounts ON tb_accounts.acc_id = transactions.acc_id
WHERE
transactions.company_id = {} AND
tb_accounts.company_id = {} AND
transactions.deleted = 0 AND
tb_accounts.deleted = 0
ORDER BY transactions.tr_id
LIMIT 10;'''
The above query will, roughly, present query results with the listed transaction fields for each pair of rows where the acc_id values match.
Note that the query above will probably not have very good performance. SQL JOIN statements must be written with care, but I wrote it the way above because it is easy to understand and illustrates the power of the JOIN.
As a matter of habit, you should never hand-roll in application code something you could do with a join instead. As long as you take care to write the join properly so that it can be efficient, the MySQL engine will beat your Python code for performance almost every time.
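Since each transaction row carries two account references (dr_acc and cr_acc), the accounts table most likely has to be joined twice, once per side. A sketch under that assumption, keeping the ncon.getDF() helper and the .format() placeholders from the question (the aliases dr and cr are made up here):

query = """
SELECT t.tr_id, t.tr_date, t.description,
       t.dr_acc, dr.account_number AS dr_number, dr.account_name AS dr_name,
       t.cr_acc, cr.account_number AS cr_number, cr.account_name AS cr_name,
       t.amount, t.currency, t.currency_rate, t.document, t.comment
FROM transactions t
LEFT JOIN tb_accounts dr ON dr.acc_id = t.dr_acc AND dr.deleted = 0
LEFT JOIN tb_accounts cr ON cr.acc_id = t.cr_acc AND cr.deleted = 0
WHERE t.company_id = {} AND t.deleted = 0
ORDER BY t.tr_id
LIMIT {}, {}
""".format(company_id, sn, en)

df = ncon.getDF(query)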
Sort the two dataframes, then use merge to merge them:
df1 = df1.sort_values(['dr_acc'], ascending=True)
df2 = df2.sort_values(['acc_id'], ascending=True)
merge2df = pd.merge(df1, df2, how='outer',
left_on=['dr_acc'], right_on=['acc_id'])
I assumed df1 is the first query's data set and df2 is the second query's data set.
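Note that the question needs both the debit and the credit side resolved, so the merge would probably have to be done twice; a sketch under that assumption, reusing the df1/df2 naming from above:

import pandas as pd

accounts = df2[["acc_id", "account_number", "account_name"]]

# Resolve the debit side ...
merged = df1.merge(
    accounts.rename(columns={"account_number": "dr_number", "account_name": "dr_name"}),
    how="left", left_on="dr_acc", right_on="acc_id",
).drop(columns="acc_id")

# ... and then the credit side.
merged = merged.merge(
    accounts.rename(columns={"account_number": "cr_number", "account_name": "cr_name"}),
    how="left", left_on="cr_acc", right_on="acc_id",
).drop(columns="acc_id")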
SQL query:
'''SELECT tr_id, tr_date,
       description,
       dr_acc, cr_acc,
       amount, currency,
       currency_rate,
       document,
       account_number, acc_id, account_name,
       comment
FROM transactions
LEFT JOIN tb_accounts ON transactions.dr_acc = tb_accounts.account_number'''
I am trying to build a compound SQL query that builds a table from a join I have previously performed. (Using SqlAlchemy (Core part) with python3 and Postgresql 9.4)
I include here the relevant part of my python3 code. I first create "in_uuid_set" using a select with a group_by. Then I join "in_uuid_set" with "in_off_messages" to get "jn_in".
Finally, I try to build a new table "incoming" from "jn_in" by selecting and generating the wanted columns:
in_uuid_set = \
sa.select([in_off_messages.c.src_uuid.label('remote_uuid')])\
.select_from(in_off_messages)\
.where(in_off_messages.c.dst_uuid == local_uuid)\
.group_by(in_off_messages.c.src_uuid)\
.alias()
jn_in = in_uuid_set.join(in_off_messages,\
and_(\
in_off_messages.c.src_uuid == in_uuid_set.c.remote_uuid,\
in_off_messages.c.dst_uuid == local_uuid,\
))\
.alias()
incoming = sa.select([\
in_off_messages.c.msg_uuid.label('msg_uuid'),\
in_uuid_set.c.remote_uuid.label('remote_uuid'),\
in_off_messages.c.msg_type.label('msg_type'),\
in_off_messages.c.date_sent.label('date_sent'),\
in_off_messages.c.content.label('content'),\
in_off_messages.c.was_read.label('was_read'),\
true().label('is_incoming')]
)\
.select_from(jn_in)
Surprisingly, I get that "incoming" has more rows than "jn_in". "incoming" has 12 rows, while "jn_in" has only 2 rows. I expect that "incoming" will have the same amount of rows (2) as "jn_in".
I also include here the SQL output the SqlAlchemy generates for "incoming":
SELECT in_off_messages.msg_uuid AS msg_uuid,
anon_1.remote_uuid AS remote_uuid,
in_off_messages.msg_type AS msg_type,
in_off_messages.date_sent AS date_sent,
in_off_messages.content AS content,
in_off_messages.was_read AS was_read,
1 AS is_incoming
FROM in_off_messages,
(SELECT in_off_messages.src_uuid AS remote_uuid
FROM in_off_messages
WHERE in_off_messages.dst_uuid = :dst_uuid_1
GROUP BY in_off_messages.src_uuid) AS anon_1,
(SELECT anon_1.remote_uuid AS anon_1_remote_uuid,
in_off_messages.msg_uuid AS in_off_messages_msg_uuid,
in_off_messages.orig_src_uuid AS in_off_messages_orig_src_uuid,
in_off_messages.src_uuid AS in_off_messages_src_uuid,
in_off_messages.dst_uuid AS in_off_messages_dst_uuid,
in_off_messages.msg_type AS in_off_messages_msg_type,
in_off_messages.date_sent AS in_off_messages_date_sent,
in_off_messages.content AS in_off_messages_content,
in_off_messages.was_read AS in_off_messages_was_read
FROM (SELECT in_off_messages.src_uuid AS remote_uuid
FROM in_off_messages
WHERE in_off_messages.dst_uuid = :dst_uuid_1
GROUP BY in_off_messages.src_uuid) AS anon_1
JOIN in_off_messages
ON in_off_messages.src_uuid = anon_1.remote_uuid
AND in_off_messages.dst_uuid = :dst_uuid_2) AS anon_2
Something doesn't look right to me with this SQL output, mostly because GROUP BY appears too many times. I would have expected it to show up about once, but it shows up twice here.
My guess is that somehow some parentheses went out of place in the generated SQL. I also suspect that I did something wrong with the alias() calls, though I'm not sure about it.
What should I do to get the wanted result (Same amount of rows for "jn_in" and "incoming")?
After playing with the code for a while, I found a way to fix it.
The answer was eventually related to the alias().
In order to make this work, the second alias() (of jn_in) should be omitted, like this:
in_uuid_set = \
sa.select([in_off_messages.c.src_uuid.label('remote_uuid')])\
.select_from(in_off_messages)\
.where(in_off_messages.c.dst_uuid == local_uuid)\
.group_by(in_off_messages.c.src_uuid)\
.alias()
jn_in = in_uuid_set.join(in_off_messages,\
and_(\
in_off_messages.c.src_uuid == in_uuid_set.c.remote_uuid,\
in_off_messages.c.dst_uuid == local_uuid,\
))
# <<< The alias() is gone >>>
incoming = sa.select([\
in_off_messages.c.msg_uuid.label('msg_uuid'),\
in_uuid_set.c.remote_uuid.label('remote_uuid'),\
in_off_messages.c.msg_type.label('msg_type'),\
in_off_messages.c.date_sent.label('date_sent'),\
in_off_messages.c.content.label('content'),\
in_off_messages.c.was_read.label('was_read'),\
true().label('is_incoming')]
)\
.select_from(jn_in)
It seems, however, that the first alias() (of in_uuid_set) cannot be omitted. If I try to omit it, I get this error message:
E subquery in FROM must have an alias
E LINE 2: FROM (SELECT in_off_messages.src_uuid AS remote_uuid
E ^
E HINT: For example, FROM (SELECT ...) [AS] foo.
As a generalization of this: if you have a select that you want to use as a clause somewhere else, you probably want to alias() it; however, if you have a join that you want to use as a clause, you should not alias() it.
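A toy illustration of that rule of thumb, written against the same old-style select([...]) API used in the question (the table and column names here are made up, not the ones from the question):

import sqlalchemy as sa

metadata = sa.MetaData()
msgs = sa.Table(
    "msgs", metadata,
    sa.Column("id", sa.Integer, primary_key=True),
    sa.Column("src", sa.Integer),
)

# A SELECT used as a FROM element needs .alias(), so it becomes a named subquery.
senders = (
    sa.select([msgs.c.src])
    .group_by(msgs.c.src)
    .alias()            # -> "(SELECT ...) AS anon_1"
)

# A JOIN used as a FROM element is passed to select_from() directly, without .alias().
joined = senders.join(msgs, msgs.c.src == senders.c.src)

stmt = sa.select([msgs.c.id, senders.c.src]).select_from(joined)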
For the sake of completeness, I include here the resulting SQL of the new code:
SELECT in_off_messages.msg_uuid AS msg_uuid,
anon_1.remote_uuid AS remote_uuid,
in_off_messages.msg_type AS msg_type,
in_off_messages.date_sent AS date_sent,
in_off_messages.content AS content,
in_off_messages.was_read AS was_read,
1 AS is_incoming
FROM (SELECT in_off_messages.src_uuid AS remote_uuid
FROM in_off_messages
WHERE in_off_messages.dst_uuid = :dst_uuid_1
GROUP BY in_off_messages.src_uuid) AS anon_1
JOIN in_off_messages
ON in_off_messages.src_uuid = anon_1.remote_uuid
AND in_off_messages.dst_uuid = :dst_uuid_2
Much shorter than the one in the question.
I'm working with a database that has a relationship that looks like:
class Source(Model):
    id = Identifier()

class SourceA(Source):
    source_id = ForeignKey('source.id', nullable=False, primary_key=True)
    name = Text(nullable=False)

class SourceB(Source):
    source_id = ForeignKey('source.id', nullable=False, primary_key=True)
    name = Text(nullable=False)

class SourceC(Source, ServerOptions):
    source_id = ForeignKey('source.id', nullable=False, primary_key=True)
    name = Text(nullable=False)
What I want to do is join all tables Source, SourceA, SourceB, SourceC and then order_by name.
Sounds easy to me, but I've been banging my head on this for a while now and my head's starting to hurt. Also, I'm not very familiar with SQL or sqlalchemy, so there's been a lot of browsing the docs, but to no avail. Maybe I'm just not seeing it. This seems to be close, albeit related to a newer version than what I have available (see versions below).
I feel close, not that that means anything. Here's my latest attempt, which seems good up until the order_by call.
Sources = [SourceA, SourceB, SourceC]
# list of join on Source
joins = [session.query(Source).join(source) for source in Sources]
# union the list of joins
query = joins.pop(0).union_all(*joins)
query seems right at this point, as far as I can tell, i.e. query.all() works. So now I try to apply order_by, which doesn't throw an error until .all() is called.
Attempt 1: I just use the attribute I want
query.order_by('name').all()
# throws sqlalchemy.exc.ProgrammingError: (ProgrammingError) column "name" does not exist
Attempt 2: I just use the defined column attribute I want
query.order_by(SourceA.name).all()
# throws sqlalchemy.exc.ProgrammingError: (ProgrammingError) missing FROM-clause entry for table "SourceA"
Is it obvious? What am I missing? Thanks!
versions:
sqlalchemy.version = '0.8.1'
(PostgreSQL) 9.1.3
EDIT
I'm dealing with a framework that wants a handle to a query object. I have a bare query that appears to accomplish what I want but I would still need to wrap it in a query object. Not sure if that's possible. Googling ...
select = """
select s.*, a.name from Source d inner join SourceA a on s.id = a.Source_id
union
select s.*, b.name from Source d inner join SourceB b on s.id = b.Source_id
union
select s.*, c.name from Source d inner join SourceC c on s.id = c.Source_id
ORDER BY "name";
"""
selectText = text(select)
result = session.execute(selectText)
# how to put result into a query. maybe Query(selectText)? googling...
result.fetchall():
Assuming that the coalesce function is good enough, the examples below should point you in the right direction. One option builds the list of child classes automatically, while the other is explicit.
This is not the query you specified in your edit, but it does let you sort by name (your original request):
from sqlalchemy import func
from sqlalchemy.orm import with_polymorphic

def test_explicit():
    # specify all children tables to be queried
    Sources = [SourceA, SourceB, SourceC]
    AllSources = with_polymorphic(Source, Sources)
    name_col = func.coalesce(*(_s.name for _s in Sources)).label("name")
    query = session.query(AllSources).order_by(name_col)
    for x in query:
        print(x)

def test_implicit():
    # get all children tables in the query
    from sqlalchemy.orm import class_mapper
    _map = class_mapper(Source)
    Sources = [_smap.class_
               for _smap in _map.self_and_descendants
               if _smap != _map  # note: exclude base class, it has no `name`
               ]
    AllSources = with_polymorphic(Source, Sources)
    name_col = func.coalesce(*(_s.name for _s in Sources)).label("name")
    query = session.query(AllSources).order_by(name_col)
    for x in query:
        print(x)
Your first attempt sounds like it isn't working because there is no name in Source, which is the root table of the query. In addition, there will be multiple name columns after your joins, so you will need to be more specific. Try
query.order_by('SourceA.name').all()
As for your second attempt, what is ServerA?
query.order_by(ServerA.name).all()
Probably a typo, but not sure if it's for SO or your code. Try:
query.order_by(SourceA.name).all()