Returning full statement with join in sqlalchemy - python

I'm using SQLAlchemy's declarative table definitions to avoid writing strings and generating SQL statements by hand. This has worked quite well, except for the following use case, where I'm trying to return a statement that produces a merged table with all columns from both tables, joined on the implicit PK/FK relationship that exists between them.
It seems like the statement below only selects columns from the first (Results) table, and I'm not sure how to generate the full select statement.
sql = Query(Results, Details).join(Details)\
    .filter(Details.result_type == 'standard')\
    .statement

The Query construct isn't the best fit for this use case, as it is geared toward returning the mapped objects themselves. Instead, this can be done with the select construct as follows:
sql = select([Results, Details])\
    .select_from(Results.join(Details))
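For completeness, a minimal sketch that also restores the filter from the question (this assumes declarative models Results and Details with the implicit PK/FK between them, and the 1.x-style select([...]) used above):

from sqlalchemy import select

sql = (
    select([Results, Details])
    .select_from(Results.join(Details))
    .where(Details.result_type == 'standard')
)
print(sql)  # renders a SELECT listing the columns of both tables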

Related

SQLite nested query

I have been searching for quite some time but have not managed to figure out how to select the id column from a table where either of the given other columns is not null.
I tried a nested query like:
SELECT id, name FROM spam_table WHERE (SELECT c.name FROM pragma_table_info('spam_table') c WHERE c.name LIKE '%ham%' OR c.name LIKE '%eggs%') IS NOT NULL
Is there any way to have the inner PRAGMA return the corresponding column names to be used in the outer query? And how do I make sure the outer query is put together using OR?
Cheers.
Is there any way that the inner PRAGMA returns the corresponding column names to be used for the outer query.
No. There are no "dynamic" column names (or table names) in SQLite.
One way to do it in Python:
1. execute the pragma_table_info select
2. fetch the results
3. iterate the results and create the desired SQL string
4. execute the created SQL string
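A minimal sketch of those steps using the sqlite3 module (the database file and the search terms are illustrative assumptions):

import sqlite3

conn = sqlite3.connect('spam.db')
terms = ('ham', 'eggs')  # search terms, e.g. from the UI

# steps 1-2: execute the pragma_table_info select and fetch the column names
names = [row[0] for row in
         conn.execute("SELECT name FROM pragma_table_info('spam_table')")]

# step 3: build the outer query, OR-ing the matching columns together
matched = [n for n in names if any(t in n for t in terms)]
conditions = ' OR '.join('"{}" IS NOT NULL'.format(n) for n in matched)
sql = 'SELECT id, name FROM spam_table WHERE ' + conditions

# step 4: execute the created SQL string (assumes at least one column matched)
rows = conn.execute(sql).fetchall()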
Thanks @DinoCoderSaurus for pointing out that there are no dynamic column names.
The code I am using needs some more Pythonic style, but in essence I am running:
for a in spam_table[0]:  # table header from pragma_table_info('spam_table')
    for i in eggs:  # search terms given by the UI
        if i in a:
            spam_eggs.append(self.spam_table[0].index(a))
Now I know which columns to check to extract the id

Python peewee: How to select distinct values on one column before a join?

I am trying to join a second table (PageLikes) onto a first table (PageVisits) after selecting only distinct values in one column of the first table, using the Python ORM peewee.
In pure SQL I can do this:
SELECT DISTINCT(pagevisits.visitor_id), pagelikes.liked_item FROM pagevisits
INNER JOIN pagelikes on pagevisits.visitor_id = pagelikes.user_id
In peewee with Python I have tried:
query = (PageVisits.select(
             fn.Distinct(PageVisits.visitor_id),
             PageLikes.liked_item)
         .join(PageLikes))
This gives me an error:
distinct() takes 1 positional argument but 2 were given
The only way I have managed to use distinct with peewee is like this:
query = (PageVisits.select(
             PageVisits.visitor_id,
             PageLikes.liked_item)
         .distinct())
which does not seem to work for my scenario.
So how can I select only distinct values in one table based on one column before I join another table with peewee?
I don't believe you should be encountering an error using fn.DISTINCT() in that way. I'm curious to see the full traceback. In my testing locally, I have no problems running something like:
query = (PageVisits
         .select(fn.DISTINCT(PageVisits.visitor_id), PageLikes.liked_item)
         .join(PageLikes))
Which produces SQL equivalent to what you're after. I'm using the latest peewee code btw.
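If you want to check what peewee generates without running it, you can inspect the query first (a quick sketch; assumes the models are bound to a database):

sql, params = query.sql()  # 2-tuple of the generated SELECT and its parameters
print(sql)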
As Papooch suggested, calling distinct on the Model seems to work:
distinct_visitors = (PageVisits
                     .select(PageVisits.visitor_id.distinct().alias("visitor"))
                     .where(PageVisits.page_id == "Some specific page")
                     .alias('distinct_visitors'))

query = (PageLikes
         .select(fn.Count(PageLikes.liked_item))
         .join(distinct_visitors,
               on=(distinct_visitors.c.visitor == PageLikes.user_id))
         .group_by(PageLikes.liked_item))

How can I use "where not exists" SQL condition in pyspark?

I have a table in Hive and I am trying to insert data into it. I am taking the data from SQL, but I don't want to insert ids that already exist in the Hive table, i.e. I want the equivalent of a WHERE NOT EXISTS condition. I am using PySpark on Airflow.
The EXISTS operator doesn't exist in Spark, but there are two join types that can replace it: left_anti and left_semi.
If you want, for example, to insert a dataframe df into a Hive table target, you can do:
new_df = df.join(
    spark.table("target"),
    how='left_anti',
    on='id'
)
Then you write new_df to your table.
left_anti keeps only the rows that do not meet the join condition (the equivalent of NOT EXISTS). The equivalent of EXISTS is left_semi.
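For the EXISTS direction, the same pattern with left_semi keeps only the rows of df whose id is already present in target (a sketch mirroring the example above):

existing_df = df.join(
    spark.table("target"),
    how='left_semi',
    on='id'
)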
You can use NOT EXISTS directly in Spark SQL on the dataframes through temp views:
table_withNull_df.createOrReplaceTempView("table_withNull")
tblA_NoNull_df.createOrReplaceTempView("tblA_NoNull")
result_df = spark.sql("""
    select * from table_withNull
    where not exists (
        select 1 from tblA_NoNull
        where table_withNull.id = tblA_NoNull.id
    )
""")
This method can be preferable to left anti joins, since those can cause an unexpected BroadcastNestedLoopJoin that results in a broadcast timeout (even without explicitly requesting the broadcast in the anti join).
After that you can do write.mode("append") to insert the previously not encountered data.
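For example (a sketch; saveAsTable and the table name are assumptions, adjust to how your Hive table is managed):

result_df.write.mode("append").saveAsTable("target")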
IMHO, I don't think such a feature exists in Spark. I think you can use two approaches:
A workaround with a UNIQUE constraint (typical of relational DBs): when you try to insert an already existing record (in append mode), you'll get an exception that you can handle appropriately.
Read the table you want to write to, outer join it with the data you want to add, then write the result in overwrite mode (though the first solution may perform better).
For more details feel free to ask

Selecting the first item of an ARRAY with PostgreSQL/SqlAlchemy

Trying to move some queries I run daily into an automated script. I have one in Postgres like the below:
SELECT (regexp_split_to_array(col1, '|'))[1] AS item, COUNT(*) AS itemcount FROM table1 GROUP BY item ORDER BY itemcount
In SqlAlchemy I have this:
session.query(
    (func.regexp_split_to_array(model.table1.col1, "|")[1]).label("item"),
    func.count().label("itemcount")
).group_by("item").order_by("itemcount")
Python can't "get_item" it, since the expression isn't actually a collection. I've looked through the docs and can't seem to find something that would let me do this without running raw SQL with execute (which I can do, and it works, but I was looking for a solution for next time).
SQLAlchemy does support indexing with [...]. If you declare the column to be of type postgresql.ARRAY, then it works:
table2 = Table("table2", meta, Column("col1", postgresql.ARRAY(String)))
q = session.query(table2.c.col1[1])
print(q.statement.compile(dialect=postgresql.dialect()))
# SELECT table2.col1[%(col1_1)s] AS anon_1
# FROM table2
The reason why your code doesn't work is that SQLAlchemy does not know that func.regexp_split_to_array(...) returns an array, since func.foo produces a generic function for convenience. To make it work, we need to make sure SQLAlchemy knows the return type of the function, by specifying the type_ parameter:
q = session.query(
    func.regexp_split_to_array(
        table1.c.col1, "|", type_=postgresql.ARRAY(String)
    )[1].label("item")
)
print(q.statement.compile(dialect=postgresql.dialect()))
# SELECT (regexp_split_to_array(table1.col1, %(regexp_split_to_array_1)s))[%(regexp_split_to_array_2)s] AS item
# FROM table1
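Putting it together for the original query from the question (same table1 as above), the group_by and order_by carry over unchanged:

q = (
    session.query(
        func.regexp_split_to_array(
            table1.c.col1, "|", type_=postgresql.ARRAY(String)
        )[1].label("item"),
        func.count().label("itemcount"),
    )
    .group_by("item")
    .order_by("itemcount")
)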

Python SQLite Return Default String For Non-Existent Row

I have a DB with ID/Topic/Definition columns. When a select query is made, with possibly hundreds of parameters, I would like the fetchall call to also return the topic of any non-existent rows with a default text (i.e. "Not Found").
I realize this could be done in a loop, but that would query the DB on every cycle and incur a significant performance hit. With the parameters joined by OR in a single select statement, the search is nearly instantaneous.
Is there a way to get a return of the query (topic) with default text for non-existent rows in SQLite?
Table Structure (named "dictionary")
ID|Topic|Definition
1|wd1|def1
2|wd3|def3
Sample Query
SELECT Topic, Definition FROM dictionary WHERE Topic = 'wd1' OR Topic = 'wd2' OR Topic = 'wd3'
Desired Return
[('wd1', 'def1'), ('wd2', 'Not Found'), ('wd3', 'def3')]
To get data like wd2 out of a query, such data must be in the database in the first place.
You could put it into a temporary table, or use a common table expression.
To include rows without a match, use an outer join:
WITH Topics(Topic) AS (VALUES ('wd1'), ('wd2'), ('wd3'))
SELECT Topic,
       IFNULL(Definition, 'Not Found') AS Definition
FROM Topics
LEFT JOIN dictionary USING (Topic);
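From Python, the VALUES list can be built with placeholders for any number of topics (a sketch using the sqlite3 module; the file name is illustrative):

import sqlite3

conn = sqlite3.connect('dictionary.db')
topics = ['wd1', 'wd2', 'wd3']

placeholders = ', '.join('(?)' for _ in topics)
sql = (
    "WITH Topics(Topic) AS (VALUES {}) "
    "SELECT Topic, IFNULL(Definition, 'Not Found') AS Definition "
    "FROM Topics LEFT JOIN dictionary USING (Topic)"
).format(placeholders)

print(conn.execute(sql, topics).fetchall())
# e.g. [('wd1', 'def1'), ('wd2', 'Not Found'), ('wd3', 'def3')]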
