Query MySQL (SQL Server) in parallel from Python

I have a collection of objects, and I would like to populate each one with some data from MySQL (it could also be SQL Server). I would like to do this in parallel. When establishing a connection to MySQL I have two items:
connection
cursor
Hence the question: when making calls in parallel, should I pass
the connection only
the cursor only
both the connection and the cursor
nothing, and let each object's method establish a connection on its own?
import mysql.connector
my_collection = [My_Object(), My_Object()]
config = {'user': 'someuser', 'password': 'somepassword', 'host': 'localhost', 'port': '3306',
          'database': 'somedb', 'raise_on_warnings': True}
connection = mysql.connector.connect(**config)
cursor = connection.cursor(dictionary=True)
# some parallel loop
for el in my_collection:
    el.pull_data_from(WHAT TO PASS???)
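One way to approach this is the last option: pass neither object and let each worker open and close its own connection, since DB-API connections and cursors are generally not safe to share between threads. The sketch below shows that pattern with a thread pool. It uses sqlite3 from the standard library as a stand-in so that it runs without a MySQL server; the MyObject class, the items table and the connect factory are all hypothetical. With MySQL, the factory would presumably be something like lambda: mysql.connector.connect(**config).

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor


class MyObject:
    """Hypothetical stand-in for the objects in my_collection."""

    def __init__(self, obj_key):
        self.obj_key = obj_key
        self.data = None

    def pull_data_from(self, connect):
        # Each call opens its own connection, so nothing is shared between threads.
        conn = connect()
        try:
            cur = conn.cursor()
            cur.execute("SELECT payload FROM items WHERE obj_key = ?", (self.obj_key,))
            row = cur.fetchone()
            self.data = row[0] if row else None
        finally:
            conn.close()
        return self


def populate_all(objects, connect, max_workers=4):
    """Populate every object in parallel; returns the objects in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda obj: obj.pull_data_from(connect), objects))


def make_demo_db():
    """Create a small throwaway SQLite file to query (demo data only)."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE items (obj_key TEXT PRIMARY KEY, payload TEXT)")
    conn.executemany("INSERT INTO items VALUES (?, ?)", [("a", "alpha"), ("b", "beta")])
    conn.commit()
    conn.close()
    return path


if __name__ == "__main__":
    path = make_demo_db()
    try:
        objs = populate_all([MyObject("a"), MyObject("b")], lambda: sqlite3.connect(path))
        print([o.data for o in objs])
    finally:
        os.remove(path)
```

If opening one connection per object turns out to be too slow, a common variation is to keep one connection per worker thread (for example in a threading.local) or to use a connection pool.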

Related

Is connect_timeout a valid URL parameter acquiring a connection via Peewee's playhouse.db_url.connect?

I'm using Peewee as an ORM and connect to a Postgres database (psycopg2) using the Playhouse extension db_url.connect. My URL is a vanilla postgresql://username:pass@host:port/dbname?options=... so I am not using pooling or anything advanced at the moment.
Sometimes when I call connect it hangs for a long time and doesn't come back. So I appended the parameter &connect_timeout=3 to my database URL, meaning it should try for at most 3 seconds and fail fast with a timeout rather than hang forever. However, I am not sure whether this argument is supported by Peewee/Playhouse/Psycopg2. Can anyone confirm?
Furthermore, where can I find all the URL parameters supported by Peewee/Playhouse/Psycopg2?
The psycopg2 doc links in turn to the libpq list of supported parameters:
https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-PARAMKEYWORDS
connect_timeout is supported by both peewee and psycopg2:
>>> from playhouse.db_url import *
>>> db = connect('postgresql://../peewee_test?connect_timeout=3')
>>> conn = db.connection()
>>> conn.get_dsn_parameters()
{'user': 'postgres',
 'passfile': '...',
 'channel_binding': 'prefer',
 'connect_timeout': '3',  # Our connect timeout
 'dbname': 'peewee_test',
 'host': 'localhost',
 'port': '5432',
 ...}
Peewee passes the parameters, including arbitrary ones like connect_timeout, back to the constructor of the DB-API Connection class.
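To make the idea of "URL parameters become connection arguments" concrete, here is a minimal sketch that uses only the standard library. It is not Peewee's actual implementation (Peewee's own parsing may differ in details such as type conversion); the function name url_to_connect_kwargs and the example URL are hypothetical.

```python
from urllib.parse import parse_qsl, urlparse


def url_to_connect_kwargs(url):
    """Split a database URL into keyword arguments for a connect() call."""
    parsed = urlparse(url)
    kwargs = {
        "database": parsed.path.lstrip("/"),
        "user": parsed.username,
        "password": parsed.password,
        "host": parsed.hostname,
        "port": parsed.port,
    }
    # Every query-string parameter is forwarded as an extra keyword argument.
    # Values stay strings here, e.g. connect_timeout becomes '3'.
    kwargs.update(parse_qsl(parsed.query))
    return kwargs


if __name__ == "__main__":
    print(url_to_connect_kwargs(
        "postgresql://username:secret@localhost:5432/dbname?connect_timeout=3"
    ))
```

Any keyword that libpq understands (see the parameter list linked above) can be forwarded in the same way.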

Python alternative to R RJDBC connection to SQL

I have R code which connects to a Vertica database using the RJDBC driver. The code is as follows:
library(RJDBC)
# for vertica save
user = "myuser"
pwd = "mypwd"
driver <- JDBC(driverClass="com.vertica.jdbc.Driver", classPath="Path to the driver")
connection <- dbConnect(driver, databasewithport, user, pwd)
sql_code = paste("SELECT .....")
mydata = dbGetQuery(connection, sql_code)
I am searching for a solution that does the same thing in Python. I found the following link, but I do not understand which example to use or what else to do. As I understand it, there is no need to connect through the RJDBC driver here. Could you help me find a solution that gives the same output as the R version?
The code below works well; however, the data is retrieved one value at a time, and to get another value I need to change the index in cur.fetchone()[ANYNUMBER]. How can I get a data frame from the SQL query?
import vertica_python
conn_info = {'host': '127.0.0.1',
             'port': 5433,
             'user': 'some_user',
             'password': 'some_password',
             'database': 'vdb',
             'connection_load_balance': True}
# Server enables load balancing
with vertica_python.connect(**conn_info) as conn:
    cur = conn.cursor()
    cur.execute("SELECT NODE_NAME FROM V_MONITOR.CURRENT_SESSION")
    print("Client connects to primary node:", cur.fetchone()[0])
    cur.execute("SELECT SET_LOAD_BALANCE_POLICY('ROUNDROBIN')")
First of all you will need to install the vertica-python package:
pip install vertica-python
Next, you need to create a connection and perform the query. When retrieving the query results, you can (1) load them all or (2) process them one by one.
import vertica_python
conn_info = {'host': '127.0.0.1',
             'port': 5433,
             'user': 'myuser',
             'password': 'mypass',
             'database': 'vdb',
             'connection_load_balance': True}
with vertica_python.connect(**conn_info) as conn:
    cur = conn.cursor()
    cur.execute("SELECT NODE_NAME FROM V_MONITOR.CURRENT_SESSION")
    # (1) If you want to load all the results in-memory
    data = cur.fetchall()
    print(data)
    # (2) If you want to process one by one (run the query again first,
    # because fetchall() above has already consumed the result set)
    cur.execute("SELECT NODE_NAME FROM V_MONITOR.CURRENT_SESSION")
    for row in cur.iterate():
        print(row)
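To get something closer to what dbGetQuery returns in R, the rows can be combined with the column names from cursor.description and handed to pandas. The sketch below assumes the driver follows DB-API 2.0 (the first item of each description entry is the column name) and that pandas is installed for the last step. The helper names fetch_table and to_dataframe are hypothetical, and sqlite3 is used only as a stand-in so the example runs without a Vertica server.

```python
import sqlite3


def fetch_table(cursor):
    """Return (column_names, rows) for the query last executed on a DB-API cursor."""
    columns = [col[0] for col in cursor.description]
    rows = cursor.fetchall()
    return columns, rows


def to_dataframe(cursor):
    """Build a pandas DataFrame from the cursor (requires pandas to be installed)."""
    import pandas as pd  # imported lazily so fetch_table works without pandas

    columns, rows = fetch_table(cursor)
    return pd.DataFrame([list(row) for row in rows], columns=columns)


if __name__ == "__main__":
    # sqlite3 is only a stand-in here for the Vertica connection.
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("SELECT 'node01' AS node_name, 5433 AS port")
    print(fetch_table(cur))
    conn.close()
```

With the Vertica code above, the call would be to_dataframe(cur) right after cur.execute(...), inside the with block.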

How to connect to Oracle-RAC using SCAN in python?

I use the cx_Oracle module to connect to a standalone Oracle server as follows:
import cx_Oracle
CONN_INFO = {
    'host': 'xxx.xx.xxx.x',
    'port': 12345,
    'user': 'user_name',
    'psw': 'your_password',
    'service': 'abc.xyz.com',
}
CONN_STR = '{user}/{psw}@{host}:{port}/{service}'.format(**CONN_INFO)
connection = cx_Oracle.connect(CONN_STR)
But as the SCAN IP does not have a machine and its own username/password, how do we connect?
As described in the documentation, you can simply use the name defined in tnsnames.ora.
Say your RAC tnsnames entry is called MAXIMIR; then you can connect with
con = cx_Oracle.connect("my_usr", "my_pwd", "MAXIMIR", encoding="UTF-8")
Alternatively, you may pass the whole connection string in a dsn variable:
dsn = """(DESCRIPTION=
            (FAILOVER=on)
            (ADDRESS_LIST=
              (ADDRESS=(PROTOCOL=tcp)(HOST=scan1)(PORT=1521))
              (ADDRESS=(PROTOCOL=tcp)(HOST=scan2)(PORT=1521)))
            (CONNECT_DATA=(SERVICE_NAME=MAXIMIR)))"""
connection = cx_Oracle.connect("my_usr", "my_pwd", dsn, encoding="UTF-8")
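If the list of SCAN hosts lives in configuration rather than in a hard-coded string, the same descriptor can be assembled programmatically. This is only a sketch: build_rac_dsn is a hypothetical helper, and the host names, port and service name are the placeholders from the answer above.

```python
def build_rac_dsn(hosts, service_name, port=1521):
    """Build a connect descriptor with one ADDRESS entry per SCAN host."""
    addresses = "".join(
        f"(ADDRESS=(PROTOCOL=tcp)(HOST={host})(PORT={port}))" for host in hosts
    )
    return (
        "(DESCRIPTION=(FAILOVER=on)"
        f"(ADDRESS_LIST={addresses})"
        f"(CONNECT_DATA=(SERVICE_NAME={service_name})))"
    )


if __name__ == "__main__":
    print(build_rac_dsn(["scan1", "scan2"], "MAXIMIR"))
```

The resulting string would be passed as the dsn argument of cx_Oracle.connect, exactly like the literal version.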

How many connections a Python Django App opens by default?

I have a couple of services which query objects from the database.
Event.objects.filter
Connection.objects.filter
and other methods to retrieve different objects from a MySQL database.
I come from a JVM background, and I know that to set up a JDBC connection you need a connector. A programmer can open a connection, query the database and close the connection. A programmer can also use Hibernate, which handles connections according to its configuration. It is also possible to use pooled connections, so that connections are not closed and removed but kept in the pool until they are needed.
However, I checked my team's Python Django code and did not find how the db connection is configured. The only thing I found is the following, which does not configure connections.
# Database
# https://docs.djangoproject.com/en/1.11/ref/settings/#databases
try:
    import database_password
    DATABASES = {
        'default': {
            'ENGINE': 'django.db.backends.mysql',
            'NAME': "mydb",
            'USER': 'user',
            'PASSWORD': database_password.password,
            'HOST': '10.138.67.149',
            'PORT': '3306'
        }
    }
Each thread maintains its own connection. See the docs for full details.
PostgreSQL + PgBouncer (connection pooler) + Django is a common setup. I'm not sure whether there's a similar connection pooler you could use with MySQL.
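On the configuration side, the DATABASES entry itself is where connection lifetime is controlled. By default Django opens a connection on the first query of a request and closes it when the request ends (CONN_MAX_AGE defaults to 0). Setting CONN_MAX_AGE to a number of seconds keeps the connection open for reuse, and None keeps it open indefinitely. This gives persistent connections per thread, not a pool. The fragment below is a sketch: the values mirror the question, and DB_PASSWORD is an assumed environment variable name.

```python
import os

# Hypothetical settings fragment; the values are placeholders taken from the question.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "mydb",
        "USER": "user",
        # DB_PASSWORD is an assumed environment variable name.
        "PASSWORD": os.environ.get("DB_PASSWORD", ""),
        "HOST": "10.138.67.149",
        "PORT": "3306",
        # Keep each connection open for up to 60 seconds and reuse it across requests.
        "CONN_MAX_AGE": 60,
    }
}
```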

Python: decryption failed or bad record mac when calling from Thread

I'm getting a "decryption failed or bad record mac" error in this code-fragment:
conn = psycopg2.connect(...)
cursor = conn.cursor()
cursor.execute("SELECT id, ip FROM schema.table;")
rows = cursor.fetchall()
cursor.close()
conn.commit()
conn.close()
This is called in the run() method of a Thread, several times in a while(True) loop.
I'm just opening a connection to my PostgreSQL database using the psycopg2 driver.
Any idea how safe it is to open db connections inside threads in Python?
I don't know what is raising this error.
OK, it looks like I've fixed the problem. I was creating too many connections, and it seems I was running out of memory or something.
I gathered all the queries and now call cursor.execute(...) once with one huge query, instead of performing hundreds of small queries/connections.
conn = psycopg2.connect(...)
cursor = conn.cursor()
cursor.execute("SELECT id, ip FROM schema.table;")
rows = cursor.fetchall()
cursor.close()
conn.commit()
conn.close()
conn = None
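As an illustration of "one big query instead of hundreds of small ones", the sketch below looks up many ids with a single IN (...) query on one connection. It uses sqlite3 as a stand-in so it runs anywhere; the hosts table and the fetch_ips helper are hypothetical. With psycopg2 the placeholder would be %s rather than ?, and passing a list to WHERE id = ANY(%s) is another option.

```python
import sqlite3


def fetch_ips(conn, ids):
    """Look up many ids with a single query instead of one query per id."""
    ids = list(ids)
    if not ids:
        return {}
    # One "?" placeholder per id; the values are still passed as parameters.
    placeholders = ", ".join("?" for _ in ids)
    cur = conn.cursor()
    cur.execute(f"SELECT id, ip FROM hosts WHERE id IN ({placeholders})", ids)
    result = dict(cur.fetchall())
    cur.close()
    return result


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hosts (id INTEGER PRIMARY KEY, ip TEXT)")
    conn.executemany("INSERT INTO hosts VALUES (?, ?)", [(1, "10.0.0.1"), (2, "10.0.0.2")])
    print(fetch_ips(conn, [1, 2]))
    conn.close()
```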
The cause of this issue could be that too many processes (multiprocessing) were trying to access Postgres at the same time, and it was not able to handle that. I was using Django and Postgres on Beanstalk.
Adding 'OPTIONS': {'sslmode': 'disable'} in the database config helped.
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': ...,
        'USER': ...,
        'PASSWORD': ...,
        'HOST': ...,
        'PORT': '5432',
        'OPTIONS': {
            'sslmode': 'disable',
        },
    }
}
