Create Tempview from sql query - python

df_sales = spark.sql(
"SELECT \
s.TRANS_DT, \
s.STORE_KEY, \
s.PROD_KEY, \
s.SALES_QTY \
FROM sales s \
JOIN inventory i ON s.cal_dt=i.cal_dt and s.store_key=i.store_key and s.prod_key=i.prod_key;"
)
I created a SQL query from two temp views (inventory and sales). How can I turn the df_sales result into a temp view again, so that I can write a new SQL query against it?

spark.sql returns a DataFrame, so you can call createOrReplaceTempView directly on its result:
spark.sql(
"""
SELECT
s.TRANS_DT,
s.STORE_KEY,
s.PROD_KEY,
s.SALES_QTY
FROM sales s
JOIN inventory i ON s.cal_dt=i.cal_dt and s.store_key=i.store_key and s.prod_key=i.prod_key;
"""
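).createOrReplaceTempView("tmpViewName")
Since df_sales is already a DataFrame, df_sales.createOrReplaceTempView("tmpViewName") does the same thing. After that, tmpViewName can be queried like any other temp view, for example spark.sql("SELECT * FROM tmpViewName").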

Related

How to reset `increment` value back to 1 when data changes?

Goal
I want to insert database records into MySQL using Python, with one extra requirement that I explain below.
This is my current script (fully functional and working):
#Get data from SQL
sqlCursor = mjmConnection.cursor()
sqlCursor.execute("SELECT sol.id, p.id, p.code,p.description, p.searchRef1, so.number, c.code, c.name, sol.requiredQty \
FROM salesorderline sol JOIN \
salesorder so \
ON sol.salesorderid = so.id JOIN \
product p \
ON sol.productid = p.id JOIN \
customer c \
ON so.customerid = c.id \
WHERE so.orderdate > DATEADD(dd,-35,CAST(GETDATE() AS date));")
#Send received data from the SQL query above to the MySQL database
print("Sending MJM records to MySQL Database")
mjmCursorMysql = productionConnection.cursor()
for x in sqlCursor.fetchall():
    a,b,c,d,e,f,g,h,i = x
    mjmCursorMysql.execute("INSERT ignore INTO mjm_python (id, product_id, product_code, product_description, product_weight, \
salesorder_number, customer_code, customer_name, requiredQty) VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s);", (a,b,c,d,e,f,g,h,i))
productionConnection.commit()
mjmCursorMysql.close()
sqlCursor.close()
What it does
The above script does the following:
Gets data from SQL Server
Inserts that data into MySQL
I have specifically used IGNORE in the MySQL query, to prevent duplicate id numbers.
Data will look like this:
Next
Now I'd like to add a column named sales_id_increment. It should start at 1, increment for each row with the same salesorder_number, and reset to 1 when the salesorder_number changes. I want the result to look something like this:
Question
How do I achieve this? Where do I need to look, in my Python script or the MySQL query?
You can compute this column when you select the rows from SQL Server, using the window function ROW_NUMBER() (or DENSE_RANK() if there are duplicate ids):
SELECT sol.id, p.id, p.code,p.description, p.searchRef1, so.number, c.code, c.name, sol.requiredQty,
ROW_NUMBER() OVER (PARTITION BY so.number ORDER BY sol.id) sales_id_increment
FROM salesorderline sol
JOIN salesorder so ON sol.salesorderid = so.id
JOIN product p ON sol.productid = p.id
JOIN customer c ON so.customerid = c.id
WHERE so.orderdate > DATEADD(dd,-35,CAST(GETDATE() AS date));
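If the numbering has to happen in the Python script rather than in the SQL Server query, the same restart-per-group logic can be sketched in plain Python. This is only an illustration of what ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...) computes: the function name, the sample rows and the column positions are assumptions, not part of the original script.

```python
from itertools import groupby
from operator import itemgetter


def add_group_row_numbers(rows, group_index, order_index):
    """Append a 1-based counter that restarts for every distinct group value."""
    # Sort by the group column, then the ordering column,
    # mirroring PARTITION BY <group> ORDER BY <order>.
    ordered = sorted(rows, key=itemgetter(group_index, order_index))
    numbered = []
    for _, group in groupby(ordered, key=itemgetter(group_index)):
        for counter, row in enumerate(group, start=1):
            numbered.append(tuple(row) + (counter,))
    return numbered


# Hypothetical (line id, sales order number) pairs standing in for the query result.
sample_rows = [(11, "SO-1"), (12, "SO-1"), (13, "SO-2"), (14, "SO-1"), (15, "SO-2")]
numbered_rows = add_group_row_numbers(sample_rows, group_index=1, order_index=0)
for row in numbered_rows:
    print(row)
```

Whichever side computes the counter, the INSERT statement would also need sales_id_increment added to its column list, along with one more %s placeholder and value.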

Python MySQL SELECT WHERE with list

I have the following Python MySQL code.
cursor = mydb.cursor()
cursor.execute('SELECT id FROM table1 WHERE col1=%s AND col2=%s', (val1, val2))
ids = cursor.fetchall()
for id in ids:
    cursor.execute('SELECT record_key FROM table2 WHERE id=%s limit 1', (id[0], ))
    record_keys = cursor.fetchall()
    print(record_keys[0][0])
How can I make this more efficient? I am using 5.5.60-MariaDB and Python 2.7.5. I have approximately 350 million entries in table1 and 15 million entries in table2.
Happily, you can do this in a single query using a LEFT JOIN.
cursor = mydb.cursor()
cursor.execute(
    "SELECT t1.id, t2.record_key FROM table1 t1 "
    "LEFT JOIN table2 t2 ON (t1.id = t2.id) "
    "WHERE t1.col1=%s AND t1.col2=%s",
    (val1, val2),
)
for id, record_key in cursor.fetchall():
    pass  # do something...
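To check that a single join returns the same pairs as the one-query-per-id loop, here is a small self-contained comparison. It uses an in-memory SQLite database as a stand-in for MariaDB, so the placeholders are ? rather than %s. The sample data and function names are made up, and both filters are applied to table1, as in the question's first query.

```python
import sqlite3

# In-memory SQLite stands in for MariaDB; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT);
    CREATE TABLE table2 (id INTEGER, record_key TEXT);
    INSERT INTO table1 VALUES (1, 'a', 'b'), (2, 'a', 'b'), (3, 'x', 'y');
    INSERT INTO table2 VALUES (1, 'key-1'), (3, 'key-3');
    """
)


def lookup_with_loop(val1, val2):
    """One extra query per matching id, as in the question."""
    ids = conn.execute(
        "SELECT id FROM table1 WHERE col1=? AND col2=? ORDER BY id", (val1, val2)
    ).fetchall()
    result = []
    for (row_id,) in ids:
        match = conn.execute(
            "SELECT record_key FROM table2 WHERE id=? LIMIT 1", (row_id,)
        ).fetchone()
        # Unlike the question's code, a missing match becomes None instead of an error.
        result.append((row_id, match[0] if match else None))
    return result


def lookup_with_join(val1, val2):
    """The same lookup as one LEFT JOIN query."""
    return conn.execute(
        "SELECT t1.id, t2.record_key FROM table1 t1 "
        "LEFT JOIN table2 t2 ON t1.id = t2.id "
        "WHERE t1.col1=? AND t1.col2=? ORDER BY t1.id",
        (val1, val2),
    ).fetchall()


loop_rows = lookup_with_loop("a", "b")
join_rows = lookup_with_join("a", "b")
print(join_rows)
```

The two results match only when table2 holds at most one row per id. With several rows per id, the join returns all of them, whereas the loop's LIMIT 1 keeps a single one.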

Efficient way to pass this variable multiple times

I'm using Pyodbc in Python to run some SQL queries. What I'm working with is actually longer than this, but this example captures what I'm trying to do:
connection = pyodbc.connect(...)
cursor = connection.cursor(...)
dte = '2018-10-24'
#note the placeholders '{}'
query = """select invoice_id
into #output
from table1 with (nolock)
where system_id = 'PrimaryColor'
and posting_date = '{}'
insert into #output
select invoice_id
from table2 with (nolock)
where system_id = 'PrimaryColor'
and posting_date = '{}'"""
#this is where I need help as explained below
cursor.execute(query.format(dte, dte))
output = pd.read_sql("""select *
from #output"""
, connection)
In the above, since there are only two '{}' placeholders, I'm passing dte to query.format() twice. However, in the more complicated version I'm working with, I have 19 '{}' placeholders, so I'd imagine this means I need to pass dte to query.format() 19 times. I tried passing it as a list, but that didn't work. Do I really need to write out the variable 19 times when passing it to the function?
Consider a UNION ALL query, which avoids the need for a temp table, together with parameterization: put qmark (?) placeholders in the SQL and bind the values to them in a separate step. Because every placeholder takes the same value, multiply a one-element tuple by the number of placeholders:
dte = '2018-10-24'
# NOTE THE QMARK PLACEHOLDERS
query = """select invoice_id
from table1 with (nolock)
where system_id = 'PrimaryColor'
and posting_date = ?
union all
select invoice_id
from table2 with (nolock)
where system_id = 'PrimaryColor'
and posting_date = ?"""
output = pd.read_sql(query, connection, params=(dte,)*2)
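The tuple-multiplication idea can be tried without SQL Server or pandas. The sketch below runs the same kind of UNION ALL query against an in-memory SQLite database, which also uses qmark placeholders. The table contents are invented, and counting "?" characters to size the tuple assumes the query text has no literal question marks.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table1 (invoice_id INTEGER, system_id TEXT, posting_date TEXT);
    CREATE TABLE table2 (invoice_id INTEGER, system_id TEXT, posting_date TEXT);
    INSERT INTO table1 VALUES (1, 'PrimaryColor', '2018-10-24'),
                              (2, 'PrimaryColor', '2018-10-25');
    INSERT INTO table2 VALUES (3, 'PrimaryColor', '2018-10-24'),
                              (4, 'Other', '2018-10-24');
    """
)

query = """select invoice_id from table1
           where system_id = 'PrimaryColor' and posting_date = ?
           union all
           select invoice_id from table2
           where system_id = 'PrimaryColor' and posting_date = ?"""

dte = "2018-10-24"
# One copy of the value per placeholder, however many placeholders there are.
placeholder_count = query.count("?")
params = (dte,) * placeholder_count

invoice_ids = sorted(row[0] for row in conn.execute(query, params))
print(invoice_ids)
```

With 19 placeholders the only thing that changes is the multiplier, so the variable is still written once.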
I agree with the comments: pandas.read_sql has a params argument, which protects against SQL injection.
The placeholder syntax to use with it depends on the database driver.
pyodbc accepts parameters in the same way in its execute method:
# standard
cursor.execute("select a from tbl where b=? and c=?", (x, y))
# pyodbc extension
cursor.execute("select a from tbl where b=? and c=?", x, y)
To answer the original question directly, even though string formatting is bad practice for building SQL queries:
Do I really need to write out the variable 19 times when passing it to the function?
No, you don't. A named placeholder can be reused any number of times:
query = """select invoice_id
into #output
from table1 with (nolock)
where system_id = 'PrimaryColor'
and posting_date = '{dte}'
insert into #output
select invoice_id
from table2 with (nolock)
where system_id = 'PrimaryColor'
and posting_date = '{dte}'""".format(dte=dte)
or :
query = """select invoice_id
into #output
from table1 with (nolock)
where system_id = 'PrimaryColor'
and posting_date = '{0}'
insert into #output
select invoice_id
from table2 with (nolock)
where system_id = 'PrimaryColor'
and posting_date = '{0}'""".format(dte)
Python 3.6+ :
query = f"""select invoice_id
into #output
from table1 with (nolock)
where system_id = 'PrimaryColor'
and posting_date = '{dte}'
insert into #output
select invoice_id
from table2 with (nolock)
where system_id = 'PrimaryColor'
and posting_date = '{dte}'"""
Note the f prefix before the opening triple quotes.

SQLAlchemy select_from a single table

In trying to replicate a MySQL query in SQLAlchemy, I've hit a snag in specifying which tables to select from.
The query that works is
SELECT c.*
FROM attacks AS a INNER JOIN hosts h ON a.host_id = h.id
INNER JOIN cities c ON h.city_id = c.id
GROUP BY c.id;
I try to accomplish this in SQLAlchemy using the following function
def all_cities():
    session = connection.globe.get_session()
    destination_city = aliased(City, name='destination_city')
    query = session.query(City). \
        select_from(Attack).\
        join((Host, Attack.host_id == Host.id)).\
        join((destination_city, Host.city_id == destination_city.id)).\
        group_by(destination_city.id)
    print query
    results = [result.serialize() for result in query]
    session.close()
    file(os.path.join(os.path.dirname(__file__), "servers.geojson"), 'a').write(geojson.feature_collection(results))
When printing the query, I end up with ALMOST the right query
SELECT
cities.id AS cities_id,
cities.country_id AS cities_country_id,
cities.province AS cities_province,
cities.latitude AS cities_latitude,
cities.longitude AS cities_longitude,
cities.name AS cities_name
FROM cities, attacks
INNER JOIN hosts ON attacks.host_id = hosts.id
INNER JOIN cities AS destination_city ON hosts.city_id = destination_city.id
GROUP BY destination_city.id
However, you will note that it is selecting from cities, attacks...
How can I get it to select only from the attacks table?
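The line
query = session.query(City)
selects from the unaliased City entity, while the join targets the alias destination_city. SQLAlchemy treats the two as separate FROM elements, so cities is added next to attacks, which is why the query contains
FROM cities, attacks
Querying the alias instead, with session.query(destination_city), or joining City directly without an alias, should leave attacks as the only table listed before the joins.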

MySQL + python table FETCH module

name=input("input CUSTOMERID to search :")
# Prepare SQL query to view all records of a specific person from
# the SALESPRODUCTS TABLE LINKED WITH SALESPERSON TABLE.
sql = "SELECT * selling_products.customer \
FROM customer \
WHERE customer_products.CUSTOMERID == name"
# Execute the SQL command
cursor.execute(sql)
# Fetch all the rows the sql result of SQL1.
results = cursor.fetchall()
print("\n\n****** TABLE MASTERLIST*********")
print("CUSTOMERID \t PRODUCTID \t DATEOFPURCHASE")
print("**************")
for row in results:
    print(row[0], row[1], row[2])
Python would compile the code above, but it will not return any output. Help would be very much appreciated :)
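I think your SQL needs to use the value of name rather than the literal text "name". SQL also compares with = rather than ==, and the select list must be valid (for example just *). Passing the value as a query parameter is safer than formatting it into the string. Assuming the rows live in customer_products (a guess based on the WHERE clause) and the driver uses %s placeholders:
sql = """SELECT *
FROM customer_products
WHERE CUSTOMERID = %s"""
cursor.execute(sql, (name,))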
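As a runnable illustration of passing the customer id as a bound parameter, here is a sketch against an in-memory SQLite database, which uses ? where the MySQL drivers use %s. The table name customer_products, its three columns (taken from the printed header in the question) and the sample rows are all assumptions about the schema.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; schema and rows are assumed for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer_products (CUSTOMERID TEXT, PRODUCTID TEXT, DATEOFPURCHASE TEXT);
    INSERT INTO customer_products VALUES
        ('C1', 'P10', '2020-01-05'),
        ('C1', 'P11', '2020-02-09'),
        ('C2', 'P10', '2020-03-01');
    """
)


def purchases_for(customer_id):
    """Return all purchase rows for one customer, binding the id as a parameter."""
    cur = conn.execute(
        "SELECT CUSTOMERID, PRODUCTID, DATEOFPURCHASE "
        "FROM customer_products WHERE CUSTOMERID = ? ORDER BY DATEOFPURCHASE",
        (customer_id,),
    )
    return cur.fetchall()


rows = purchases_for("C1")
for row in rows:
    print(row[0], row[1], row[2])
```

An id with no purchases yields an empty list, so the printing loop simply produces no rows instead of failing.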
