Use a Python function in a MySQL query - python

My workflow is:
Extract a CSV file from a MySQL database --> open the CSV in Python --> filter the necessary information with a Python function.
However, I have started to deal with datasets that don't fit in memory. It is also inconvenient to have to import and filter over and over again.
My question is: is there a way to apply a Python function inside a MySQL database, so that I only download from MySQL the rows that satisfy a filter written as a Python function?
Note: I use Datagrip.
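MySQL itself cannot execute Python functions server-side, so the usual options are pushing the filter into SQL where possible, or streaming rows in chunks and filtering them in Python without ever holding the full result set in memory. Below is a minimal sketch of the chunked approach, using sqlite3 as a stand-in for a MySQL driver (the DB-API cursor interface is the same); the table and the `keep_row` filter are made up for illustration:

```python
import sqlite3

def keep_row(row):
    # hypothetical Python-side filter that plain SQL couldn't express
    return row[1] % 2 == 0

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, value INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(i, i * 3) for i in range(10)])

cur = conn.execute("SELECT id, value FROM t")
filtered = []
while True:
    chunk = cur.fetchmany(4)   # stream a few rows at a time, not fetchall()
    if not chunk:
        break
    filtered.extend(row for row in chunk if keep_row(row))

print(len(filtered))  # 5 rows survive (values 0, 6, 12, 18, 24)
```

With a real MySQL driver only the connect call changes; the fetchmany loop keeps peak memory bounded by the chunk size rather than the table size.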

Which Python library should you use to query a MySQL database? Try benchmarking a few of them and see which one you like. Here are timing tests using a trivial query like "SELECT 1":
import time

def query_1k(cur):
    t = time.time()
    for _ in range(1000):
        cur.execute("SELECT 1,2,3,4,5")
        try:
            res = cur.fetchall()
            assert len(res) == 1
            assert res[0] == (1, 2, 3, 4, 5)
        except Exception:
            pass
    return time.time() - t

# pip search mysql-connector | grep --color mysql-connector-python
# pip install mysql-connector-python-rf
def mysql_connector_python():
    import mysql.connector
    conn = mysql.connector.connect(user='root', host='localhost')
    print("MySQL Connector/Python:", query_1k(conn.cursor()), "[sec]")

# pip install mysqlclient
# pip3 install mysqlclient
# sudo yum install gcc
def mysqlclient():
    try:
        # fall back to PyMySQL's MySQLdb shim if mysqlclient is missing
        import pymysql
        pymysql.install_as_MySQLdb()
    except ImportError:
        pass
    import MySQLdb
    conn = MySQLdb.connect(user='root', host='localhost')
    print("MySQLdb mysqlclient:", query_1k(conn.cursor()), "[sec]")

def pymysql():
    import pymysql
    conn = pymysql.connect(user='root', host='localhost')
    print("PyMySQL:", query_1k(conn.cursor()), "[sec]")

def sqlalchemy_mysqlclient():
    from sqlalchemy import create_engine
    # connect through SQLAlchemy using the mysqlclient driver
    conn = create_engine('mysql+mysqldb://root@localhost')
    print("sqlalchemy mysqlclient:", query_1k(conn), "[sec]")

def peewee():
    from peewee import MySQLDatabase
    user = 'root'
    password = '1234'
    db_name = 'information_schema'
    conn = MySQLDatabase(
        db_name, user=user,
        password=password,
        host='localhost'
    )
    print("peewee:", query_1k(conn.cursor()), "[sec]")

'''
Warning: (3090, u"Changing sql mode 'NO_AUTO_CREATE_USER' is deprecated.")
SET SESSION sql_mode="NO_ENGINE_SUBSTITUTION,NO_AUTO_CREATE_USER";
'''

for _ in range(10):  # for PyPy warmup
    print('-------------')
    mysql_connector_python()
    mysqlclient()
    sqlalchemy_mysqlclient()
    pymysql()
    # peewee()


Python substitute variable in SQL query

I am using Python 3 and connecting to an Impala DB using the impala package, as below:
#!/usr/bin/python3
import pandas as pd
from impala.dbapi import connect
from impala.util import as_pandas
import sys

def pull_from_dw(dw_conn, qry, qryparams):
    cur = dw_conn.cursor()
    if qryparams is None:
        cur.execute(qry)
    else:
        cur.execute(qry, qryparams)
    custdata = as_pandas(cur)
    return custdata

x = sys.argv[1]
query_str = "select * from <table_name> where <column_name> = '{}';"
print(query_str)
dw_conn = connect(host='10.xxx.xx.xx', port=21050, use_ssl=True,
                  user='<username>',
                  password='<password>',
                  auth_mechanism='LDAP')
df = pull_from_dw(dw_conn, query_str, x)
print(df)
I can substitute directly by specifying .format(x) in the SQL query. However, I need the variable substitution to happen in the calling function df = pull_from_dw(dw_conn, query_str, x), and I am getting the error below. Please assist:
$ /usr/bin/python3 script1.py 'abc'
impala.error.ProgrammingError: Query parameters argument should be a
list, tuple, or dict object
You are passing a string to qryparams, while it takes a list, tuple, or dict. Pass it the following and it should fix your issue (note that for parameter substitution the query should use the driver's placeholder, such as %s, rather than '{}'):
pull_from_dw(dw_conn, qry,[x])
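For comparison, here is the same pattern runnable against sqlite3 (whose placeholder style is `?`, where impyla and most MySQL drivers use `%s`); the table and values are invented for the sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?)", [("abc",), ("def",)])

def pull_rows(conn, qry, qryparams):
    cur = conn.cursor()
    if qryparams is None:
        cur.execute(qry)
    else:
        cur.execute(qry, qryparams)  # params must be a list, tuple, or dict
    return cur.fetchall()

x = "abc"
rows = pull_rows(conn, "SELECT name FROM customers WHERE name = ?", [x])
print(rows)  # [('abc',)]
```

Wrapping the single value in a list, as the answer suggests, is what satisfies the DB-API requirement that parameters be a sequence or mapping.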

Executing queries in console script

I'm struggling to figure out why the session I'm getting after pyramid's bootstrap is refusing to execute queries, raising the transaction.interfaces.NoTransaction exception.
I'm trying to create a script using the pyramid configuration, but working on a background task. I'm using the bootstrap function to get the environment in place. One of the approaches I tried was:
from pyramid.paster import bootstrap

with bootstrap(sys.argv[1]) as env:
    dbsession = env['request'].dbsession
    with dbsession.begin_nested():
        res = dbsession.execute('''SELECT ....''')
        ...
That creates a SessionTransaction as expected, but still raises a NoTransaction.
How can I initialise the connection, so I can access it as I normally do in the views?
As described in https://github.com/Pylons/pyramid/issues/3219, the transaction is not initialised by default. It can be started using:
with bootstrap(sys.argv[1]) as env:
    with env['request'].tm:
        dbsession = env['request'].dbsession
        dbsession.execute(...)
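The underlying point is that the queries must run inside an explicitly opened transaction scope. As a runnable illustration of that discipline, here is the same shape using sqlite3 purely as a stand-in, since its connection object also acts as a transaction context manager:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (name TEXT)")

# Like `with env['request'].tm:` in the pyramid snippet, this opens a
# transaction and commits it on successful exit (rolls back on error).
with conn:
    conn.execute("INSERT INTO jobs VALUES ('background-task')")

print(conn.execute("SELECT count(*) FROM jobs").fetchone()[0])  # 1
```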
I've never used pyramid.paster.bootstrap. However, you could use the same template as the script that is auto-generated when you create a new project using the alchemy template.
pcreate -t alchemy myproject
The script looks like this:
import os
import sys

import transaction

from pyramid.paster import (
    get_appsettings,
    setup_logging,
)
from pyramid.scripts.common import parse_vars

from ..models.meta import Base
from ..models import (
    get_engine,
    get_session_factory,
    get_tm_session,
)
from ..models import MyModel


def usage(argv):
    cmd = os.path.basename(argv[0])
    print('usage: %s <config_uri> [var=value]\n'
          '(example: "%s development.ini")' % (cmd, cmd))
    sys.exit(1)


def main(argv=sys.argv):
    if len(argv) < 2:
        usage(argv)
    config_uri = argv[1]
    options = parse_vars(argv[2:])
    setup_logging(config_uri)
    settings = get_appsettings(config_uri, options=options)

    engine = get_engine(settings)
    Base.metadata.create_all(engine)

    session_factory = get_session_factory(engine)

    with transaction.manager:
        dbsession = get_tm_session(session_factory, transaction.manager)
        model = MyModel(name='one', value=1)
        dbsession.add(model)
And the entrypoints in setup.py looks like this:
entry_points="""\
[paste.app_factory]
main = myproject:main
[console_scripts]
initialize_myproject_db = myproject.scripts.initializedb:main
""",

py.test - How to inherit other tests

So let's say that I have two files (test_file1.py, test_file2.py) for integration testing using py.test.
The test_file1.py is something like this:
import datetime
import pytest

Datetime = datetime.datetime.now()

def test_connect():
    # 1st query to a mysql database
    # 2nd query to a mysql database
    # ...
    # Nth query to a mysql database
    ...
Now I'm writing test_file2.py, which is an extension of test_file1.py, but I don't want to repeat the same MySQL queries that I wrote in the test above.
How can I make py.test inherit the above test and run both when executing py.test test_file2.py?
Something like this (test_file2.py Contents):
import datetime
import pytest
from testDirectory import test_file1

Datetime = datetime.datetime.now()

def test_connect():
    # Here should run all the tests from 'test_file1' somehow...
    # 1st new additional query to a mysql database
    # 2nd new additional query to a mysql database
    # ...
    # Nth new additional query to a mysql database
    ...
Thanks!!
When you import a module, it will execute all of the code inside it. So just write the code you want executed in your original file. For example, add the call to the function in your file like this:
test_file1.py:
import datetime
import pytest

Datetime = datetime.datetime.now()

def test_connect():
    # 1st query to a mysql database
    # 2nd query to a mysql database
    # ...
    # Nth query to a mysql database
    ...

test_connect()  # This will run your function on import
So then in your py.test when you call import test_file1, it will execute the test_connect() and any other code you would like without doing anything else.
In other words, here is a really simple example with 3 files:
File 1: hello_world.py:
def hello_world():
    print('hello world!')

hello_world()
File 2: print_text.py:
def print_text():
    print('foo bar baz')

print_text()
File 3: run_everything.py:
import hello_world
import print_text
Result when you run run_everything.py:
>>>hello world!
>>>foo bar baz
If you want the function to be executed when the file is executed directly, but not imported as a module, you can do this:
test_file1.py:
import datetime
import pytest

Datetime = datetime.datetime.now()

def test_connect():
    # 1st query to a mysql database
    # 2nd query to a mysql database
    # ...
    # Nth query to a mysql database
    ...

def main():
    # This will _not_ run your function when you import. You would
    # have to use test_file1.test_connect() in your py.test.
    test_connect()

if __name__ == '__main__':
    main()
So in this example, your py.test would be:
import test_file1
test_file1.test_connect()
First, create a fixture in conftest.py:
import pytest
import MySQLdb

@pytest.fixture
def db_cursor(request):
    db = MySQLdb.connect(host="localhost", user="root")
    cursor = db.cursor()
    cursor.execute("SELECT USER()")
    data = cursor.fetchone()
    assert 'root@localhost' in data[0]
    yield cursor
    db.close()
Then use it in your test modules:
# test_file1.py
def test_a(db_cursor):
    pass

# test_file2.py
def test_b(db_cursor):
    db_cursor.execute("SELECT VERSION()")
    assert '5.5' in db_cursor.fetchone()[0]
P.S. It is possible to keep such fixtures in other modules; just inject them into your tests with the pytest_plugins directive:
# conftest.py
pytest_plugins = '_mysql.cursor'

# _mysql/__init__.py

# _mysql/cursor.py
import pytest
import MySQLdb

@pytest.fixture
def db_cursor(request):
    db = MySQLdb.connect(host="localhost", user="root")
    cursor = db.cursor()
    cursor.execute("SELECT USER()")
    data = cursor.fetchone()
    assert 'root@localhost' in data[0]
    yield cursor
    db.close()

Shebang line #!/usr/bin/python3 preventing server run

Here's the code I have. Basically, I have the shebang line in there because psycopg2 wasn't working without it.
But now, when I have this line in there, it doesn't let me run the server; it just says "no module named 'flask'".
#!/usr/bin/python3.4
#
# Small script to show PostgreSQL and Psycopg together
#
from flask import Flask, render_template
from flask import request
from flask import *
from datetime import datetime
from functools import wraps
import time
import csv
import psycopg2

app = Flask(__name__)
app.secret_key = 'lukey'

def getConn():
    connStr = ("dbname='test' user='lukey' password='lukey'")
    conn = psycopg2.connect(connStr)
    return conn

@app.route('/')
def home():
    return render_template('index.html')

@app.route('/displayStudent', methods=['GET'])
def displayStudent():
    residence = request.args['residence']
    try:
        conn = None
        conn = getConn()
        cur = conn.cursor()
        cur.execute('SET search_path to public')
        cur.execute('SELECT stu_id, student.name, course.name, home_town '
                    'FROM student, course '
                    'WHERE course = course_id AND student.residence = %s',
                    [residence])
        rows = cur.fetchall()
        if rows:
            return render_template('stu.html', rows=rows, residence=residence)
        else:
            return render_template('index.html', msg1='no data found')
    except Exception as e:
        return render_template('index.html', msg1='No data found', error1=e)
    finally:
        if conn:
            conn.close()

#@app.route('/addStudent', methods=['GET', 'POST'])
#def addStudent():

if __name__ == '__main__':
    app.run(debug=True)
This is an environment problem, not a Flask, Postgres, or shebang problem. A specific version of Python is being called, and it is not being given the correct path to its libraries.
Depending on what platform you are on, changing your shebang to #!/usr/bin/env python3 can fix the problem. If it doesn't (quite possibly it won't, though using env is considered better, more portable practice these days), you may need to add your Python 3 libs location manually in your code:
sys.path.append("/path/to/your/python/libs")
If you know where your Python libs are (or maybe Flask is installed somewhere peculiar?), you can add that location to the path, and imports following that line will include it in their search for modules.
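A quick way to see which interpreter and search path a script actually runs with, which is often the fastest way to spot this kind of mismatch, is to print them and compare against where pip installed Flask:

```python
import sys
import sysconfig

print(sys.executable)                    # the interpreter running this script
print(sysconfig.get_paths()["purelib"])  # where its site-packages lives
print(sys.path)                          # the module search path it will use
```

If the `purelib` directory differs from the location `pip show flask` reports, the shebang is selecting a different Python than the one Flask was installed into.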

python, accessing a psycopg2 form a def?

I'm trying to collect a group of defs in one file so I can import them whenever I want to write a script in Python.
I have tried this:
def get_dblink(dbstring):
    """
    Return a database cnx.
    """
    global psycopg2
    try:
        cnx = psycopg2.connect(dbstring)
    except Exception, e:
        print "Unable to connect to DB. Error [%s]" % (e,)
        exit()
    return cnx
But I get this error: NameError: global name 'psycopg2' is not defined.
In my main file, script.py, I have:
import psycopg2, psycopg2.extras
from misc_defs import *
hostname = '192.168.10.36'
database = 'test'
username = 'test'
password = 'test'
dbstring = "host='%s' dbname='%s' user='%s' password='%s'" % ( hostname, database, username, password)
cnx = get_dblink( dbstring)
Can anyone give me a hand?
You just need to import psycopg2 in your first snippet.
If you need to, there's no problem with also importing it in the second snippet (Python makes sure modules are only imported once). Trying to use globals for this is bad practice.
So: at the top of every module, import every module that is used within that particular module.
Also note that from x import * (with wildcards) is generally frowned upon: it clutters your namespace and makes your code less explicit.
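A minimal sketch of that advice, using sqlite3 as a stand-in for psycopg2 so it runs anywhere: the module that defines the helper does its own import, and the caller would import the helper by name rather than with a wildcard.

```python
# misc_defs.py would contain this: the module imports what it uses.
import sqlite3  # stand-in for psycopg2 in this sketch

def get_dblink(dbstring):
    """Return a database connection, exiting with a message on failure."""
    try:
        return sqlite3.connect(dbstring)
    except Exception as e:
        raise SystemExit("Unable to connect to DB. Error [%s]" % e)

# script.py would then do: from misc_defs import get_dblink
cnx = get_dblink(":memory:")
print(cnx is not None)  # True
```

No `global` statement is needed: the function simply resolves `sqlite3` (or `psycopg2`) in its own module's namespace.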
