I am querying a variety of tables in a MySQL database with SQLAlchemy and raw SQL. My issue right now is renaming some of the columns being joined. The queries all come into one dataframe.
SELECT *
FROM original
LEFT JOIN table1
on original.id = table1.t1key
LEFT JOIN table2
on original.id = table2.t2key
LEFT JOIN table3
on original.id = table3.t3key;
All I actually want from those tables is a single column added to my query. Each table has a column with the same name. My approach to aliasing them has been as below,
table1.columnchange AS 'table1columnchange'
table2.columnchange AS 'table2columnchange'
table3.columnchange AS 'table3columnchange'
But every way I've tried to implement this ends up with annoying errors.
I am querying around 20 different tables as well, so using SELECT * at the beginning, while inefficient, is ideal for the sake of ease.
The output I'm looking for is a dataframe that has each of the columns I need in it (which I will then filter and use to build models in Python). I am fine with managing the query through SQLAlchemy into pandas; the aliasing is what is giving me grief right now.
Thanks in advance
You can use nested queries:
SELECT
    original.column1 AS somename,
    table1.column1 AS somename1,
    table2.column1 AS somename2
FROM
    (SELECT
        id,
        column1
    FROM
        original
    ) original
LEFT JOIN (
    SELECT
        t1key,
        column1
    FROM
        table1
) table1 ON original.id = table1.t1key
LEFT JOIN (
    SELECT
        t2key,
        column1
    FROM
        table2
) table2 ON original.id = table2.t2key
Note that each subquery also has to select its join key, because the outer query can only see the columns listed inside the derived table.
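If you only need those few extra columns, you can also keep SELECT original.* and alias just the joined columns, then read the whole thing into pandas. A minimal sketch, assuming a pymysql driver and placeholder connection details:
import pandas as pd
from sqlalchemy import create_engine

# placeholder connection string - adjust driver, credentials, host and database
engine = create_engine("mysql+pymysql://user:password@localhost/mydb")

sql = """
SELECT original.*,
       table1.columnchange AS table1columnchange,
       table2.columnchange AS table2columnchange,
       table3.columnchange AS table3columnchange
FROM original
LEFT JOIN table1 ON original.id = table1.t1key
LEFT JOIN table2 ON original.id = table2.t2key
LEFT JOIN table3 ON original.id = table3.t3key
"""

df = pd.read_sql(sql, engine)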
I have the following query:
select t2.col as tag, count(*)
from my_table t1 JOIN TABLE(SPLIT(tags,'/')) as t2
where t2.col != ''
group by tag
(TABLE(SPLIT(tags,'/')) creates a temporary table by splitting the tags field.)
The query works just fine when run on the database directly, but I'm having trouble building the query with this join clause using SQLAlchemy.
How can I perform a join with a table that is created on the fly and uses functions that aren't defined in SQLAlchemy?
Thanks.
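One option, since the function isn't part of SQLAlchemy's expression language, is to hand the statement to the engine verbatim with text(). A minimal sketch, assuming an engine has already been created (the connection string is a placeholder):
from sqlalchemy import create_engine, text

engine = create_engine("your-connection-string")  # placeholder

stmt = text("""
    select t2.col as tag, count(*)
    from my_table t1 JOIN TABLE(SPLIT(tags,'/')) as t2
    where t2.col != ''
    group by tag
""")

with engine.connect() as conn:
    rows = conn.execute(stmt).fetchall()
This bypasses the expression language for this one query rather than teaching SQLAlchemy about SPLIT, which may or may not be acceptable depending on how the rest of the code is structured.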
I want to pass a local dataframe as a table to inner join on a SQL Server, like so.
sql = """
select top 10000 *
from Table1 as t
inner join {} as a on t.id= a.id
""".format(pandas_df)
results = pd.read_sql_query(sql,conn)
This is obviously not the way to do it.
Any ideas?
Thanks!
You need to write your dataframe to a SQL table before reading it.
Use pandas_df.to_sql(name_of_table, con)
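A minimal sketch of that approach, assuming conn is a SQLAlchemy engine or connection (to_sql does not accept a plain DBAPI connection except for SQLite) and using a hypothetical staging table name temp_ids:
import pandas as pd

# push the local dataframe to a staging table on the server (temp_ids is a placeholder name)
pandas_df.to_sql("temp_ids", conn, if_exists="replace", index=False)

sql = """
select top 10000 *
from Table1 as t
inner join temp_ids as a on t.id = a.id
"""
results = pd.read_sql_query(sql, conn)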
I see two main options, depending on the data size of your id's. The simplest way would be to add the id to an IN clause in your SQL statement.
This approach is useful if you don't have write permission on the database, but you're limited by the maximum batch size of SQL, which IIRC is around 256 MB.
From your id series, you create a tuple of the ids you're interested in, then cast the tuple to a string to concatenate with your SQL statement.
sql = """
select top 10000 *
from Table1 as t
where t.id in """ + str(tuple(pandas_df['id'].values))
results = pd.read_sql_query(sql,conn)
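One caveat with str(tuple(...)): if the series contains a single id, the literal comes out as (123,), and that trailing comma is not valid SQL. Building the list explicitly avoids this; a small sketch, assuming integer ids:
ids = ", ".join(str(int(x)) for x in pandas_df['id'].values)
sql = """
select top 10000 *
from Table1 as t
where t.id in ({})""".format(ids)
results = pd.read_sql_query(sql, conn)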
You can use df.to_sql to load it into the database as a table.
I am trying to join a second table (PageLikes) onto a first table (PageVisits) after selecting only distinct values of one column of the first table, using the Python ORM peewee.
In pure SQL I can do this:
SELECT DISTINCT(pagevisits.visitor_id), pagelikes.liked_item FROM pagevisits
INNER JOIN pagelikes on pagevisits.visitor_id = pagelikes.user_id
In peewee with Python I have tried:
query = (Pagevisits.select(
            fn.Distinct(Pagevisits.visitor_id),
            PageLikes.liked_item)
         .join(PageLikes))
This gives me an error:
distinct() takes 1 positional argument but 2 were given
The only way I can and have used distinct with peewee is like this:
query = (Pagevisits.select(
            Pagevisits.visitor_id,
            PageLikes.liked_item)
         .distinct())
which does not seem to work for my scenario.
So how can I select only distinct values in one table based on one column before I join another table with peewee?
I don't believe you should be encountering an error using fn.DISTINCT() in that way. I'm curious to see the full traceback. In my testing locally, I have no problems running something like:
query = (PageVisits
.select(fn.DISTINCT(PageVisits.visitor_id), PageLikes.liked_item)
.join(PageLikes))
Which produces SQL equivalent to what you're after. I'm using the latest peewee code btw.
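If you want to verify what peewee will actually send to the database, the query can be inspected without executing it; a small sketch:
# show the generated SQL string and its parameters
sql, params = query.sql()
print(sql)
print(params)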
As Papooch suggested, calling distinct on the Model seems to work:
distinct_visitors = (Pagevisits
.select(
Pagevisits.visitor_id.distinct().alias("visitor")
)
.where(Pagevisits.page_id == "Some specific page")
.alias('distinct_visitors')
)
query = (Pagelikes
.select(fn.Count(Pagelikes.liked_item),
)
.join(distinct_visitors, on=(distinct_visitors.c.visitor == Pagelikes.user_id))
.group_by(Pagelikes.liked_item)
)
I have two tables, Table A and Table B. I have added one column to Table A, record_id. Table B has record_id and the primary ID for Table A, table_a_id. I am looking to deprecate Table B.
Relationships exist between Table B's table_a_id and Table A's id, if that helps.
Currently, my solution is:
db.execute("UPDATE table_a t
SET record_id = b.record_id
FROM table_b b
WHERE t.id = b.table_a_id")
This is my first time using this ORM -- I'd like to see if there is a way I can use my Python models and the actual functions SQLAlchemy gives me to be more 'Pythonic' rather than just dumping a Postgres statement that I know works in an execute call.
My solution ended up being as follows:
(db.query(TableA)
.filter(TableA.id == TableB.table_a_id,
TableA.record_id.is_(None))
.update({TableA.record_id: TableB.record_id}, synchronize_session=False))
This leverages PostgreSQL's ability to update one table based on an implicit reference to another, which I did in my .filter() call (it plays the role of the WHERE/ON condition of a JOIN). The solution was deceptively simple.
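For reference, on PostgreSQL this should render to roughly the same statement as the raw version above, something like:
UPDATE table_a SET record_id = table_b.record_id
FROM table_b
WHERE table_a.id = table_b.table_a_id AND table_a.record_id IS NULL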
Using Python, I am trying to create a new SQLite table in a database from a query result which is executed on two other tables in that database.
For the record, the query is
CREATE TABLE results AS
SELECT table_1.*, table_2.*
FROM table_1
LEFT JOIN table_2
ON table_1.ID_1 = table_2.ID_2
UNION ALL
SELECT table_1.*, table_2.*
FROM table_2
LEFT JOIN table_1
ON table_1.ID_1 = table_2.ID_2
WHERE table_1.ID_1 IS NULL
which is supposed to be a workaround for a FULL OUTER JOIN which is not directly available in SQLite (this method can be found on different SO threads on this topic).
However, this operation is slow on my tables with ~1 million rows each... so slow that, judging by how the database's file size grows when I hit refresh in the Explorer window, I get the impression it's going to take hours.
How can I make this faster? I have already done a lot of research on this, and most of the time people talk about using transactions because otherwise each row will open up a new connection to the database or whatever... however, I could not find a working example of how to use them.
My two approaches so far, which are way too slow:
Using Python's sqlite3 module:
# open sqlite database
conn = sqlite3.connect('Database.sqlite')
# get a cursor
cursor = conn.cursor()
# start query
cursor.execute("""
CREATE TABLE results AS
SELECT table_1.*, table_2.*
FROM table_1
LEFT JOIN table_2
ON table_1.ID = table_2.ID
UNION ALL
SELECT table_1.*, table_2.*
FROM table_2
LEFT JOIN table_1
ON table_1.ID = table_2.ID
WHERE table_1.ID IS NULL
;""")
Using SQLAlchemy:
from sqlalchemy import create_engine, event
from sqlite3 import dbapi2 as sqlite
# create database engine
engine = create_engine('sqlite:///Database.sqlite')
# open sqlite database
connection = engine.connect()
# query
with connection.begin() as trans:
    connection.execute("BEGIN TRANSACTION;")
    connection.execute("""
        CREATE TABLE results AS
        SELECT table_1.*, table_2.*
        FROM table_1
        LEFT JOIN table_2
        ON table_1.ID = table_2.ID
        UNION ALL
        SELECT table_1.*, table_2.*
        FROM table_2
        LEFT JOIN table_1
        ON table_1.ID = table_2.ID
        WHERE table_1.ID IS NULL
        ;""")
    trans.commit()
Argh... the problem was not with the code above but with the tables that provide the input for the JOIN. I was not careful enough when creating them, so the ID column of table_1 was of type INTEGER while the ID column of table_2 was of type STRING.
Unfortunately, doing a JOIN on columns of different types does not throw an error - instead, I guess something very expensive happens internally, like casting each STRING to INTEGER, comparing, and then casting back (or vice versa).
So, the moral of the story: if your CREATE TABLE from a JOIN is painfully slow, check that the data types of your JOIN columns match.
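To catch this kind of mismatch up front, you can inspect the declared column types before running the join; a small sketch using Python's sqlite3 module with the table names from the question:
import sqlite3

conn = sqlite3.connect('Database.sqlite')
cursor = conn.cursor()
# PRAGMA table_info returns one row per column: (cid, name, type, notnull, dflt_value, pk)
for table in ('table_1', 'table_2'):
    print(table, cursor.execute("PRAGMA table_info({})".format(table)).fetchall())
If the type entries for the two ID columns differ (e.g. INTEGER vs TEXT), either fix the table definitions or add an explicit CAST in the ON clause.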