Pass Pandas df as table to SQL query


I want to pass a local DataFrame as a table and inner join it against a table on SQL Server, like so:
sql = """
select top 10000 *
from Table1 as t
inner join {} as a on t.id= a.id
""".format(pandas_df)
results = pd.read_sql_query(sql,conn)
This is obviously not the way to do it.
Any ideas?
Thanks!

You need to write your dataframe to a SQL table before you can join against it.
Use pandas_df.to_sql(name_of_table, con)
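The upload-then-join idea can be sketched with the standard library alone. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the SQL Server connection, the list local_ids stands in for pandas_df['id'], and the temp table name local_ids is made up. With pandas, the two "upload" lines would be replaced by the to_sql call.

```python
import sqlite3

# Stand-in for the remote database (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO Table1 VALUES (?, ?)",
                 [(1, "a"), (2, "b"), (3, "c"), (4, "d")])

# Stand-in for the local DataFrame's id column.
local_ids = [2, 4, 5]

# Upload the local ids into a temporary table; this is the role to_sql plays.
conn.execute("CREATE TEMP TABLE local_ids (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO local_ids VALUES (?)", [(i,) for i in local_ids])

# Once the data lives in the database, the join is an ordinary query.
rows = conn.execute(
    "SELECT t.id, t.name FROM Table1 AS t "
    "INNER JOIN local_ids AS a ON t.id = a.id ORDER BY t.id"
).fetchall()
print(rows)
```

Note that this route needs permission to create a table (or at least a temporary one) on the server.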

I see two main options, depending on how many ids you have. The simplest way is to add the ids to an IN clause in your SQL statement.
This approach is useful if you don't have write permission on the database, but you are limited by the maximum batch size of SQL Server, which IIRC is around 256 MB.
From your id series, create a tuple of the ids you're interested in, then cast the tuple to a string and concatenate it with your SQL statement.
sql = """
select top 10000 *
from Table1 as t
where t.id in """ + str(tuple(pandas_df['id'].tolist()))
results = pd.read_sql_query(sql,conn)
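One caveat with casting a tuple to a string is that a single id renders as "(3,)", which is not valid SQL, and the values are pasted into the statement unescaped. A possible alternative is to generate one placeholder per id and pass the values as bound parameters. The sketch below assumes a DB-API driver that uses "?" placeholders (sqlite3 is used here as a stand-in for the real connection); the helper name select_by_ids and the sample data are hypothetical.

```python
import sqlite3

def select_by_ids(conn, ids):
    """Return rows of Table1 whose id is in ids, using bound parameters."""
    ids = list(ids)
    if not ids:
        return []  # "IN ()" is not valid SQL, so short-circuit
    # One "?" per id; other drivers may use a different placeholder style.
    placeholders = ", ".join("?" for _ in ids)
    sql = f"SELECT id, name FROM Table1 WHERE id IN ({placeholders}) ORDER BY id"
    return conn.execute(sql, ids).fetchall()

# Hypothetical stand-in data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO Table1 VALUES (?, ?)",
                 [(1, "a"), (2, "b"), (3, "c"), (4, "d")])

print(select_by_ids(conn, [3]))         # a single id works too
print(select_by_ids(conn, [1, 4, 99]))  # ids missing from the table are ignored
```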

You can use df.to_sql to load the DataFrame into the database.

Related

SQLite nested query

I have been searching for quite some time but have not managed to figure out how to select the id column from a table where any of a given set of other columns is not null.
I tried a nested query like:
SELECT id, name FROM spam_table WHERE (SELECT c.name FROM pragma_table_info('spam_table') c WHERE c.name LIKE '%ham%' OR c.name LIKE '%eggs%') IS NOT NULL
Is there any way for the inner PRAGMA to return the corresponding column names so they can be used in the outer query? And how can I make sure the outer query is put together using OR?
Cheers.

Is there any way that the inner PRAGMA returns the corresponding
column names to be used for the outer query.
No. There are no "dynamic" column names (or table names) in SQLite.
One way to do it in python:
execute the pragma_table_info select
fetch the results
iterate the results and create the desired sql string
execute the created sql string
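The four steps above could look roughly like this with the sqlite3 module. It is a sketch, not the asker's code: the function name ids_with_any_value and the sample table layout are made up for illustration, the column names are assumed not to contain double quotes, and the substring match is done in Python (case-sensitive, unlike LIKE).

```python
import sqlite3

def ids_with_any_value(conn, table, terms):
    """Return ids of rows where any column whose name contains a term is non-NULL."""
    # Steps 1-2: run the pragma_table_info select and fetch the column names.
    names = [r[0] for r in conn.execute(
        "SELECT name FROM pragma_table_info(?)", (table,))]
    # Step 3: keep the matching columns and join the conditions with OR.
    cols = [n for n in names if any(t in n for t in terms)]
    if not cols:
        return []
    where = " OR ".join(f'"{c}" IS NOT NULL' for c in cols)
    sql = f'SELECT id FROM "{table}" WHERE {where} ORDER BY id'
    # Step 4: execute the generated statement.
    return [r[0] for r in conn.execute(sql)]

# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spam_table "
             "(id INTEGER, name TEXT, ham_a TEXT, eggs_b TEXT, other TEXT)")
conn.executemany("INSERT INTO spam_table VALUES (?, ?, ?, ?, ?)", [
    (1, "x", "h", None, "o"),
    (2, "y", None, None, "o"),
    (3, "z", None, "e", None),
])
print(ids_with_any_value(conn, "spam_table", ["ham", "eggs"]))
```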

Thanks @DinoCoderSaurus for pointing out that there are no dynamic column names.
The code I am using needs a more Pythonic style, but this is in fact what I am running:
for a in spam_table[0]:  # table header from pragma_table_info(spam_table)
    for i in eggs:  # search terms given by the UI
        if i in a:
            spam_eggs.append(self.spam_table[0].index(a))
Now I know which columns to check to extract the id.

Sqlalchemy Core, insert statement returning * (all columns)

I am using sqlalchemy core (query builder) to do an insert using a table definition. For example:
table.insert().values(a,b,c)
and I can make it return specific columns:
table.insert().values(a,b,c).returning(table.c.id, table.c.name)
but I am using postgres which has a RETURNING * syntax, which returns all the columns in the row. Is there a way to do that with sqlalchemy core?

query = table.insert().values(a,b,c).returning(literal_column('*'))
And you can access it like
for col in execute(query, stmt):
    print(col)

To get all table columns, one can also do the following query:
table.insert().values(a,b,c).returning(table)

Alternatively, you can expand table columns:
table.insert().returning(*table.c).values(a,b,c)

Pandas read_sql query with multiple selects

Can read_sql query handle a sql script with multiple select statements?
I have a MSSQL query that is performing different tasks, but I don't want to have to write an individual query for each case. I would like to write just the one query and pull in the multiple tables.
I want the multiple queries in the same script because the queries are related, and it makes updating the script easier.
For example:
SELECT ColumnX_1, ColumnX_2, ColumnX_3
FROM Table_X
INNER JOIN (Etc etc...)
----------------------
SELECT ColumnY_1, ColumnY_2, ColumnY_3
FROM Table_Y
INNER JOIN (Etc etc...)
Which leads to two separate query results.
The subsequent python code is:
with open('.../SQL Queries/SQLScript.sql', 'r') as scriptFile:
    script = scriptFile.read()
engine = sqlalchemy.create_engine("mssql+pyodbc://UserName:PW!@Table")
connection = engine.connect()
df = pd.read_sql_query(script, connection)
connection.close()
Only the first table from the query is brought in.
Is there any way I can pull in both query results (maybe with a dictionary) so that I don't have to separate the query into multiple scripts?

You could do the following:
queries = """
SELECT ColumnX_1, ColumnX_2, ColumnX_3
FROM Table_X
INNER JOIN (Etc etc...)
---
SELECT ColumnY_1, ColumnY_2, ColumnY_3
FROM Table_Y
INNER JOIN (Etc etc...)
""".split("---")
Now you can query each table and concat the result:
df = pd.concat([pd.read_sql_query(q, connection) for q in queries])
Another option is to use UNION on the two results i.e. do the concat in SQL.
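Since the two selects return different columns, it may be more convenient to keep the results separate, for example in a dictionary as the question suggests. The sketch below shows the splitting logic with sqlite3 as a stand-in for the MSSQL connection; the helper run_script, the table contents, and the shortened column names are hypothetical. With pandas, the loop body could call pd.read_sql_query(stmt, connection) instead and store one DataFrame per statement.

```python
import sqlite3

def run_script(conn, script, sep="---"):
    """Run each non-empty statement in script; return {index: (columns, rows)}."""
    results = {}
    # Drop empty chunks so a leading or trailing separator does no harm.
    statements = [s.strip() for s in script.split(sep) if s.strip()]
    for i, stmt in enumerate(statements):
        cur = conn.execute(stmt)
        columns = [d[0] for d in cur.description]
        results[i] = (columns, cur.fetchall())
    return results

# Hypothetical stand-in data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table_X (x1 INTEGER, x2 TEXT)")
conn.execute("CREATE TABLE Table_Y (y1 INTEGER)")
conn.executemany("INSERT INTO Table_X VALUES (?, ?)", [(1, "a"), (2, "b")])
conn.execute("INSERT INTO Table_Y VALUES (10)")

script = """
SELECT x1, x2 FROM Table_X ORDER BY x1
---
SELECT y1 FROM Table_Y
"""
results = run_script(conn, script)
for index, (columns, rows) in results.items():
    print(index, columns, rows)
```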

How to join two queries in SQLAlchemy?

In this example, I am using the sample MySQL classicmodels database.
So I have two queries:
products = session.query(Products)
orderdetails = session.query(OrderDetails)
Let's assume I cannot make any more queries to the database after this and I can only join these two queries from this point on.
I want to do an outer join on them to be able to do something like this:
for orderdetail, product in query:
    print(product.productName, product.productCode, orderdetail.quantityOrdered)
However, whenever I do an outerjoin on this, I can only seem to get a left join.
query = orderdetails.outerjoin(Products)
Code like this yields only orderdetails columns:
for q in query:
    # Only gives orderdetails columns
    print(q)
And doing something like this:
for orderdetails, product in query:
    print(orderdetails, product)
Gives me an error: TypeError: 'OrderDetails' object is not iterable.
What am I doing wrong? I just want columns from the Products table as well.
EDIT:
I have found my solution thanks to @univerio's answer. My real goal was to do a join on two existing queries and then do a SUM and COUNT operation on them.
SQLAlchemy basically just transforms a query object to a SQL statement. The with_entities function just changes the SELECT expression to whatever you pass to it. This is my updated solution, which includes unpacking and reading the join:
for productCode, numOrders, quantityOrdered in orderdetails.with_entities(
        OrderDetails.productCode,
        func.count(OrderDetails.productCode),
        func.sum(OrderDetails.quantityOrdered)).group_by(OrderDetails.productCode):
    print(productCode, numOrders, quantityOrdered)

You can overwrite the entity list with with_entities():
orderdetails.outerjoin(Products).with_entities(OrderDetails, Products)

Wildcards in column name for MySQL

I am trying to select multiple columns, but not all of the columns, from the database. All of the columns I want to select are going to start with "word".
So in pseudocode I'd like to do this:
SELECT "word%" from searchterms where onstate = 1;
More or less. I am not finding any documentation on how to do this - is it possible in MySQL? Basically, I am trying to store a list of words in a single row, with an identifier, and I want to associate all of the words with that identifier when I pull the records. All of the words are going to be joined as a string and passed to another function in an array/dictionary with their identifier.
I am trying to make as FEW database calls as possible to keep speedy code.
Ok, here's another question for you guys:
There are going to be a variable number of columns with the name "word" in them. Would it be faster to do a separate database call for each row, with a generated Python query per row, or would it be faster to simply SELECT *, and only use the columns I needed? Is it possible to say SELECT * NOT XYZ?

No, SQL doesn't provide you with any syntax to do such a select.
What you can do is ask MySQL for a list of column names first, then generate the SQL query from that information.
SELECT column_name
FROM information_schema.columns
WHERE table_name = 'your_table'
AND column_name LIKE 'word%'
lets you select the column names. Then you can do, in Python:
"SELECT * FROM your_table WHERE " + ' AND '.join(['%s = 1' % name for name in columns])
Rather than using string concatenation, I would recommend using SQLAlchemy to do the SQL generating for you.
However, if all you are doing is limiting the number of columns, there is no need for a dynamic query like this at all. The hard work for the database is selecting the rows; it makes little difference to send you 5 columns out of 10, or all 10.
In that case just use a "SELECT * FROM ..." and use Python to pick out the columns from the result set.
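Picking the columns out in Python might look like the sketch below. It relies on cursor.description, which is part of the Python DB-API, to find the column names; sqlite3 is used as a stand-in for the MySQL connection, and the table contents are made up for illustration.

```python
import sqlite3

# Hypothetical stand-in for the searchterms table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE searchterms "
             "(id INTEGER, onstate INTEGER, word1 TEXT, word2 TEXT, note TEXT)")
conn.executemany("INSERT INTO searchterms VALUES (?, ?, ?, ?, ?)", [
    (1, 1, "red", "blue", "n1"),
    (2, 0, "green", None, "n2"),
    (3, 1, "cyan", None, "n3"),
])

cur = conn.execute("SELECT * FROM searchterms WHERE onstate = 1 ORDER BY id")
names = [d[0] for d in cur.description]  # column names of the result set
word_idx = [i for i, n in enumerate(names) if n.startswith("word")]
id_idx = names.index("id")

# Join the non-NULL word columns of each row into one string per identifier.
words_by_id = {
    row[id_idx]: " ".join(row[i] for i in word_idx if row[i] is not None)
    for row in cur
}
print(words_by_id)
```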

No, you cannot dynamically produce the list of columns to be selected. It will have to be hardcoded in your final query.
Your current query would produce a result set with one column and the value of that column would be the string "word%" in all rows that satisfy the condition.

You can generate the list of column names first by using
SHOW COLUMNS IN tblname LIKE "word%"
Then loop through the cursor and generate a SQL statement that uses all the columns returned by the query above.
"SELECT {0} FROM searchterms WHERE onstate = 1".format(', '.join(columns))

This could be helpful: MySQL wildcard in select
In conclusion it is not possible in MySQL directly.
What you could do as a dirty workaround is get all the column names from the table with an initial query (http://dev.mysql.com/doc/refman/5.0/en/show-columns.html) and then compare in python if the name matches your pattern. Afterwards you could do the MySQL select statement with the found column names like this:
SELECT word1, word2, word3 from searchterms where onstate = 1;
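The Python half of that workaround, filtering the column names and generating the final statement, could be sketched as follows. The function build_select and its parameters are hypothetical, and the columns list is assumed to have been fetched already with SHOW COLUMNS or from information_schema. Identifiers are backtick-quoted in the MySQL style.

```python
def build_select(table, columns, prefix="word", where="onstate = 1"):
    """Build a SELECT listing only the columns whose names start with prefix."""
    wanted = [c for c in columns if c.startswith(prefix)]
    if not wanted:
        raise ValueError(f"no columns start with {prefix!r}")
    # Backtick-quote each identifier, doubling any embedded backtick.
    quoted = ", ".join("`" + c.replace("`", "``") + "`" for c in wanted)
    return f"SELECT {quoted} FROM `{table}` WHERE {where}"

# Column names as they might come back from the first query (assumed values).
columns = ["id", "onstate", "word1", "word2", "word3"]
sql = build_select("searchterms", columns)
print(sql)
```

The where argument is pasted in as-is, so it should only ever hold text written by the developer, never user input.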
