Union of two tables with same columns using sqlalchemy - python

I'm new to sqlalchemy and am wondering how to do a union of two tables that have the same columns. I'm doing the following:
table1_and_table2 = sql.union_all(
    self.tables['table1'].alias("table1_subquery").select(),
    self.tables['table2'].alias("table2_subquery").select())
I'm seeing this error:
OperationalError: (OperationalError) (1248, 'Every derived table must have its own alias')
(Note that self.tables['table1'] returns a sqlalchemy Table with name table1.)
Can someone point out the error or suggest a better way to combine the rows from both tables?

First, can you output the generated SQL that causes the problem? You should be able to do this by setting echo=True in your create_engine call.
Second, and this is just a hunch, try rearranging your subqueries to this:
table1_and_table2 = sql.union_all(
    self.tables['table1'].select().alias("table1_subquery"),
    self.tables['table2'].select().alias("table2_subquery"))
If my hunch is right, the original code aliases the tables first and then wraps each alias in a SELECT, so the derived tables produced by those SELECTs end up without aliases of their own, which is exactly what the error complains about.
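For reference, here is roughly what the corrected construct should compile to, sketched with the stdlib sqlite3 module (the table and column definitions are invented for the example):

```python
import sqlite3

# Two tables with identical columns, plus a UNION ALL over aliased
# subqueries -- the shape the corrected SQLAlchemy code should emit.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, name TEXT);
    CREATE TABLE table2 (id INTEGER, name TEXT);
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b');
    INSERT INTO table2 VALUES (3, 'c');
""")

rows = conn.execute("""
    SELECT * FROM (SELECT * FROM table1) AS table1_subquery
    UNION ALL
    SELECT * FROM (SELECT * FROM table2) AS table2_subquery
""").fetchall()
print(sorted(rows))  # all three rows, duplicates preserved
```

Each derived table carries its own alias, which is the requirement the MySQL error message (1248) is enforcing.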

Python peewee: How to select distinct values on one column before a join?

I'm trying to join a second table (PageLikes) on a first table (PageVisits) after selecting only distinct values in one column of the first table, using the Python ORM peewee.
In pure SQL I can do this:
SELECT DISTINCT(pagevisits.visitor_id), pagelikes.liked_item FROM pagevisits
INNER JOIN pagelikes on pagevisits.visitor_id = pagelikes.user_id
In peewee with python I have tried:
query = (PageVisits
         .select(fn.Distinct(PageVisits.visitor_id),
                 PageLikes.liked_item)
         .join(PageLikes))
This gives me an error:
distinct() takes 1 positional argument but 2 were given
The only way I can and have used distinct with peewee is like this:
query = (PageVisits
         .select(PageVisits.visitor_id,
                 PageLikes.liked_item)
         .distinct())
which does not seem to work for my scenario.
So how can I select only distinct values in one table based on one column before I join another table with peewee?
I don't believe you should be encountering an error using fn.DISTINCT() in that way. I'm curious to see the full traceback. In my testing locally, I have no problems running something like:
query = (PageVisits
         .select(fn.DISTINCT(PageVisits.visitor_id), PageLikes.liked_item)
         .join(PageLikes))
That produces SQL equivalent to what you're after. I'm using the latest peewee code, btw.
As Papooch suggested, calling distinct() on the field seems to work:
distinct_visitors = (PageVisits
                     .select(PageVisits.visitor_id.distinct().alias("visitor"))
                     .where(PageVisits.page_id == "Some specific page")
                     .alias('distinct_visitors'))
query = (PageLikes
         .select(fn.Count(PageLikes.liked_item))
         .join(distinct_visitors,
               on=(distinct_visitors.c.visitor == PageLikes.user_id))
         .group_by(PageLikes.liked_item))

convert SQL statement of same table join to SQLAlchemy

I have a working Postgres SQL statement that joins a table -- question -- to itself, to extract only the rows whose matching foo column have the most recent created_date:
SELECT q.*
FROM question q
INNER JOIN
(SELECT foo, MAX(created_date) AS most_current
FROM question
GROUP BY foo) grouped_q
ON q.foo = grouped_q.foo
AND q.created_date = grouped_q.most_current
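This join-on-aggregate pattern can be exercised with the stdlib sqlite3 module to confirm the semantics (sample data invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE question (foo TEXT, created_date TEXT, body TEXT);
    INSERT INTO question VALUES
        ('x', '2020-01-01', 'old x'),
        ('x', '2020-06-01', 'new x'),
        ('y', '2020-03-01', 'only y');
""")

# Join the table to a grouped copy of itself: only the row carrying
# the per-foo maximum created_date survives the join.
rows = conn.execute("""
    SELECT q.*
    FROM question q
    INNER JOIN (SELECT foo, MAX(created_date) AS most_current
                FROM question
                GROUP BY foo) grouped_q
        ON q.foo = grouped_q.foo
        AND q.created_date = grouped_q.most_current
""").fetchall()
print(sorted(rows))  # one row per foo: the most recent
```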
I'm interested in whether and how this could be converted to SQLAlchemy, so that I could work with the resulting rows as ORM objects.
And, if known, what kind of performance differences there might be?
FWIW, I'm able to hydrate a SQLAlchemy ORM object by looping through rows and doing the following...
for row in res.fetchall():
    question = Question(**dict(zip(res.keys(), row)))
but this feels kludgy, and I'd like to keep the query in SQLAlchemy syntax if possible.
Thanks in advance.

Returning full statement with join in sqlalchemy

I'm using SQLAlchemy table declarations to avoid writing SQL statements as strings by hand. This has worked quite well, except for the following use case: I'm trying to return a statement that creates a merged table with all columns from both tables, joined on an implicit PK/FK that exists between them.
It seems like the statement below only selects columns from the first table (Results), and I'm not sure how to generate the full select statement:
sql = Query(Results, Details).join(Details)\
    .filter(Details.result_type == 'standard')\
    .statement
The query construct isn't the best to use for this use case, as it will return the objects themselves. Instead, this can be done with the select construct as follows:
sql = select([Results, Details])\
.select_from(Results.join(Details))
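As a quick sanity check of what that compiles to, the same shape can be run as raw SQL through the stdlib sqlite3 module (table names invented to mirror the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE results (id INTEGER PRIMARY KEY, score INTEGER);
    CREATE TABLE details (result_id INTEGER, result_type TEXT);
    INSERT INTO results VALUES (1, 90);
    INSERT INTO details VALUES (1, 'standard');
""")

# Selecting from both tables yields the merged column set, which is
# what passing both tables to select() compiles to.
cur = conn.execute("""
    SELECT results.*, details.*
    FROM results
    JOIN details ON details.result_id = results.id
    WHERE details.result_type = 'standard'
""")
cols = [d[0] for d in cur.description]
rows = cur.fetchall()
print(cols)  # columns from both tables
print(rows)
```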

Spark sql version of the same query does not work whereas the normal sql query does

The normal sql query :
SELECT DISTINCT(county_geoid), state_geoid, sum(PredResponse), sum(prop_count) FROM table_a GROUP BY county_geoid;
This gives me output. However, the Spark SQL version of the same query, used in PySpark, does not work:
result_county_performance_alpha = spark.sql("SELECT distinct(county_geoid), sum(PredResponse), sum(prop_count), state_geoid FROM table_a group by county_geoid")
This gives an error :
AnalysisException: u"expression 'tract_alpha.`state_geoid`' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.;
How to resolve this issue?
Your "normal" query should not work anywhere. The correct way to write the query is:
SELECT county_geoid, state_geoid, sum(PredResponse), sum(prop_count)
FROM table_a
GROUP BY county_geoid, state_geoid;
This should work on any database (where the columns and tables are defined and of the right types).
Your version has state_geoid in the SELECT, but it is not being aggregated. That is not correct SQL. It might happen to work in MySQL, but that is due to a (mis)feature in the database (that is finally being fixed).
Also, you almost never want to use SELECT DISTINCT with GROUP BY. And the parentheses after DISTINCT make no difference: the construct is SELECT DISTINCT, and DISTINCT is not a function.
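Both points can be checked with the stdlib sqlite3 module (sample data invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (county_geoid TEXT, state_geoid TEXT,
                          PredResponse REAL, prop_count INTEGER);
    INSERT INTO table_a VALUES
        ('c1', 's1', 1.0, 10),
        ('c1', 's1', 2.0, 20),
        ('c2', 's1', 3.0, 30);
""")

# DISTINCT is not a function: the parentheses merely group the first
# expression, so these two queries return the same rows.
a = conn.execute("SELECT DISTINCT(county_geoid), state_geoid FROM table_a").fetchall()
b = conn.execute("SELECT DISTINCT county_geoid, state_geoid FROM table_a").fetchall()
print(sorted(a) == sorted(b))  # True

# The corrected aggregate groups by both non-aggregated columns.
rows = conn.execute("""
    SELECT county_geoid, state_geoid, SUM(PredResponse), SUM(prop_count)
    FROM table_a
    GROUP BY county_geoid, state_geoid
""").fetchall()
print(sorted(rows))
```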

Sqlalchemy Core, insert statement returning * (all columns)

I am using sqlalchemy core (query builder) to do an insert using a table definition. For example:
table.insert().values(a,b,c)
and I can make it return specific columns:
table.insert().values(a,b,c).returning(table.c.id, table.c.name)
but I am using postgres which has a RETURNING * syntax, which returns all the columns in the row. Is there a way to do that with sqlalchemy core?
from sqlalchemy import literal_column

query = table.insert().values(a, b, c).returning(literal_column('*'))
And you can access the returned columns from the result:
result = conn.execute(query)
for row in result:
    print(row)
To get all table columns, one can also do the following query:
table.insert().values(a,b,c).returning(table)
Alternatively, you can expand table columns:
table.insert().returning(*table.c).values(a,b,c)
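The RETURNING * semantics these constructs map to can be sketched with raw SQL via the stdlib sqlite3 module (SQLite has supported RETURNING since 3.35; the table definition is invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, qty INTEGER)")

# RETURNING * hands back every column of the inserted row, including
# the server-generated primary key -- no second SELECT needed.
row = conn.execute(
    "INSERT INTO items (name, qty) VALUES (?, ?) RETURNING *",
    ("widget", 5),
).fetchone()
print(row)  # (1, 'widget', 5)
```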
