Multiple Python scripts running with their own MSAccess database - conflicts - python

I have a script that creates an Access database, populates it with data and queries, and performs a compact and repair.
The script gets called from the command line via a .bat file, and I need multiple of these scripts running concurrently.
I'm getting an error where it thinks the current database is the database opened by the other script that is running concurrently.
I think I either need to update the code so that it creates a separate Access instance for each script (which I think it already does), or change it so that it doesn't need to use the OpenCurrentDatabase() method, but I don't know what alternative I have. I can't find answers via Google.
import logging
import win32com.client

if bool(config.query_create_dict):
    logging.info("Creating and Executing Queries")
    try:
        oApp = win32com.client.Dispatch("Access.Application")
        # CREATE QUERIES
        oApp.OpenCurrentDatabase(config.access_db_filepath)
        currentdb = oApp.CurrentDb()
        for order_id, query_dict in config.query_create_dict.items():
            name = query_dict["Name"]
            sql = query_dict["SQL"]
            # replace #valuation_date with month end
            sql = sql.replace("#valuation_date", "{}".format(config.valuation_date.strftime("%Y-%m-%d")))
            logging.info("Creating query: {}".format(name))
            currentdb.CreateQueryDef(name, sql)
        # EXECUTE QUERIES
        for order_id, name in config.query_execute_dict.items():
            logging.info("Running query: {}".format(name))
            currentdb.Execute(name)
        currentdb = None
        oApp.DoCmd.CloseDatabase()  # parentheses so the COM method is actually called
    except Exception as e:
        logging.error(e)
        raise e
    finally:
        currentdb = None
        oApp.Quit()  # parentheses needed here too
        oApp = None
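One possible direction (an assumption, not something confirmed in the original post): win32com.client.Dispatch may attach to an Access instance that is already running, whereas win32com.client.DispatchEx asks COM to start a brand-new instance, so each script gets its own "current database". A minimal sketch, reusing the config object from the question:

import win32com.client

# DispatchEx forces a new, dedicated Access process for this script
# instead of reusing one that another script may already have started.
oApp = win32com.client.DispatchEx("Access.Application")
try:
    oApp.OpenCurrentDatabase(config.access_db_filepath)
    currentdb = oApp.CurrentDb()
    # ... create and execute queries as in the snippet above ...
finally:
    oApp.CloseCurrentDatabase()
    oApp.Quit()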

Related

Accessing SQLite DB /w python and getting malformed DBs

I have some Python code that copies a SQLite db across SFTP. However, it is a highly active db, so a lot of the time I run into a malformed db. I'm thinking of these possible options, but I don't know how to implement them because I am newer to Python:
1. An alternate method of getting the sqlite db copied?
2. Maybe there is a way to query the sqlite file directly on the device? I'm not sure that would work, since SQLite is more of a local db and I don't know how I could query it remotely the way I could with MySQL etc.
3. Create a retry loop? I could call the function again in the exception handler, but I'm not sure how to retry the rest of the code (a sketch of this appears after the code below).
Also, I'm thinking the malformed db issue could possibly occur in other sections too? Maybe I need to run a PRAGMA quick_check?
This is commonly what I am seeing. The other catch is why I am seeing it as often as I am, because if I load the sqlite file on my main machine, the queries run fine:
(venv) dulanic@mediaserver:/opt/python_scripts/rpi$ cd /opt/python_scripts/rpi ; /usr/bin/env /opt/python_scripts/rpi/venv/bin/python /home/dulanic/.vscode-server/extensions/ms-python.python-2021.2.636928669/pythonFiles/lib/python/debugpy/launcher 37599 -- /opt/python_scripts/rpi/rpdb.py
An error occurred: database disk image is malformed
This is my current code:
#!/usr/bin/env python3
import psycopg2, sqlite3, sys, paramiko, os, socket, time

scpuser = os.getenv('scpuser')
scppw = os.getenv('scppw')
sqdb = os.getenv('sqdb')
sqlike = os.getenv('sqlike')
pgdb = os.getenv('pgdb')
pguser = os.getenv('pguser')
pgpswd = os.getenv('pgpswd')
pghost = os.getenv('pghost')
pgport = os.getenv('pgport')
pgschema = os.getenv('pgschema')
database = r"./pihole.db"
pihole = socket.gethostbyname('pi.hole')
tabnames = []
tabgrab = ''

def pullsqlite():
    sftp.get('/etc/pihole/pihole-FTL.db', 'pihole.db')
    sftp.close()

# SFTP pull config
ssh_client = paramiko.SSHClient()
ssh_client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh_client.connect(hostname=pihole, username=scpuser, password=scppw)
sftp = ssh_client.open_sftp()
# Pull SQLite
pullsqlite()
# Load sqlite tables to list
consq = sqlite3.connect(sqdb)
cursq = consq.cursor()
cursq.execute(f"SELECT name FROM sqlite_master WHERE type='table' AND name in ({sqlike})")
tabgrab = cursq.fetchall()
# postgres connection
conpg = psycopg2.connect(database=pgdb, user=pguser, password=pgpswd,
                         host=pghost, port=pgport)
# Load data to postgres from sqlite
for item in tabgrab:
    tabnames.append(item[0])
start = time.perf_counter()
for table in tabnames:
    curpg = conpg.cursor()
    if table == 'queries':
        curpg.execute(f"SELECT max(id) FROM {table};")
        max_id = curpg.fetchone()[0]
        cursq.execute(f"SELECT * FROM {table} where id > {max_id};")
    else:
        cursq.execute(f"SELECT * FROM {table};")
    try:
        rows = cursq.fetchall()
    except sqlite3.Error as e:
        print("An error occurred:", e.args[0])
    colcount = len(rows[0])
    pholder = ('%s,' * colcount)[:-1]
    try:
        curpg.execute(f"SET search_path TO {pgschema};")
        curpg.executemany(f"INSERT INTO {table} VALUES ({pholder}) ON CONFLICT DO NOTHING;", rows)
        conpg.commit()
        print(f'Inserted {len(rows)} rows into {table}')
    except psycopg2.DatabaseError as e:
        print(f'Error {e}')
        sys.exit(1)
if 'start' in locals():
    elapsed = time.perf_counter() - start
    print(f'Time {elapsed:0.4}')
consq.close()
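A possible sketch of the retry idea (option 3 above) combined with SQLite's PRAGMA quick_check: re-pull the file and only proceed once the check reports 'ok'. The pull_and_check helper, the attempt count, and the sleep are assumptions, not part of the original script; it would replace the pullsqlite() call and the sqlite3.connect(sqdb) line, and it relies on the sftp session staying open between attempts.

import sqlite3, time

def pull_and_check(sftp, remote_path, local_path, max_attempts=3):
    """Re-pull the db over sftp until PRAGMA quick_check reports 'ok'."""
    for attempt in range(1, max_attempts + 1):
        sftp.get(remote_path, local_path)
        con = sqlite3.connect(local_path)
        try:
            if con.execute("PRAGMA quick_check;").fetchone()[0] == 'ok':
                return con  # healthy copy; reuse this connection below
        except sqlite3.DatabaseError as e:
            print(f"Attempt {attempt}: {e}")
        con.close()
        time.sleep(5)  # give the busy source db a moment before retrying
    raise RuntimeError("Could not obtain an intact copy of the SQLite db")

# usage, in place of pullsqlite() and sqlite3.connect(sqdb):
consq = pull_and_check(sftp, '/etc/pihole/pihole-FTL.db', 'pihole.db')
cursq = consq.cursor()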

Python rq: handle success or failure of jobs

I have a fairly basic (so far) queue set up in my app:
Job 1 (backup): back up the SQL table I'm about to replace
Job 2 (update): do the actual table drop/update
very simplified code:
from rq import Queue
from rq.decorators import job

@job('backup')
def backup(db, table, conn_str):
    backup_sql = "SELECT * INTO {}.dbo.{}_backup from {}.dbo.{}".format(db, table, db, table)

@job('update')
def update(db, table, conn_str, keys, data):
    truncate_sql = "TRUNCATE TABLE {}.dbo.{}".format(db, table)
    sql_cursor.execute(truncate_sql)
    for sql_row in data:
        sql = "INSERT INTO {}.dbo.{} ({}) values ({})".format(db, table, ",".join(keys), ",".join(["?"] * len(sql_row)))
        sql_cursor.execute(sql, sql_row)
    sql_cursor.commit()

def update_data():
    ...
    update_queue = Queue('update', connection=redis_conn)
    backup_job = update_queue.enqueue('backup', db, table, conn_str, result_ttl=current_app.config['RESULT_TTL'],)
    update_job = update_queue.enqueue('update', db, table, conn_str, result_ttl=current_app.config['RESULT_TTL'],)
What I'd like to do, is find a way to watch the update. If it fails, I want to run a job to restore the backup created in the backup job. If it's successful, I want to run a different job to simply remove the backup.
What's the right way to go about this? I'm pretty new to rq and am looking around in the docs, but haven't found either a way to poll update for success/failure or an idiomatic way to handle either outcome.
One option is to create a third job, called "checker" for example, which will decide what to do based on the status of the "update" job. For that, you have to specify a dependency relationship.
depends_on specifies another job (or job id) that must complete before
this job will be queued.
def checker(*args, **kwargs):
    pass

checker_job = update_queue.enqueue('checker', *args, depends_on=update_job.id, result_ttl=current_app.config['RESULT_TTL'])
Then check the status of the dependency inside "checker", and based on that status restore the backup or delete it:
import rq

def checker(*args, **kwargs):
    job = rq.get_current_job()
    update_job = job.dependency
    if update_job.status == 'failed':
        # do the stuff here
        pass
    else:  # or elif
        # do the stuff here
        pass
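Applied to the backup/update pair above, the checker might look like the sketch below. restore_backup and remove_backup are hypothetical helpers standing in for "copy {table}_backup back over {table}" and "drop {table}_backup"; they are not part of rq or the original question.

import rq

def checker(db, table, conn_str):
    job = rq.get_current_job()
    update_job = job.dependency  # the 'update' job this checker was enqueued after
    if update_job.get_status() == 'failed':
        restore_backup(db, table, conn_str)   # hypothetical helper
    else:
        remove_backup(db, table, conn_str)    # hypothetical helper

# inside update_data(), after enqueueing update_job:
checker_job = update_queue.enqueue(checker, db, table, conn_str,
                                   depends_on=update_job,
                                   result_ttl=current_app.config['RESULT_TTL'])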

Google BigQuery - python client - creating/managing jobs

I'm new to the BigQuery world... I'm using the Python google.cloud package and I simply need to run a query from Python on a BigQuery table and print the results. This is the part of the query function which creates a query job.
from time import sleep

def test():
    query = "SELECT * FROM " + dataset_name + '.' + table_name
    job = bigquery_client.run_async_query('test-job', query)
    job.begin()
    retry_count = 100
    while retry_count > 0 and job.state != 'DONE':
        retry_count -= 1
        sleep(10)
        job.reload()  # API call
    print(job.state)
    print(job.ended)
If I run the test() function multiple times, I get the error:
google.api.core.exceptions.Conflict: 409 POST https://www.googleapis.com/bigquery/v2/projects/myprocject/jobs:
Already Exists: Job myprocject:test-job
Since I have to run the test() function multiple times, do I have to delete the job named 'test-job' each time or do I have to assign a new job-name (e.g. a random one or datetime-based) each time?
do I have to delete the job named 'test-job' each time
You cannot delete a job. The Jobs collection stores your project's complete job history, but availability is only guaranteed for jobs created in the past six months. The best you can do is to request automatic deletion of jobs that are more than 50 days old, for which you should contact support.
or do I have to assign a new job-name (e.g. a random one or datetime-based) each time?
Yes, this is the way to go.
As a side recommendation, we usually do it like:
import uuid
job_name = str(uuid.uuid4())
job = bigquery_client.run_async_query(job_name, query)
Note that this is already automatic if you run a synchronous query.
Also, you don't have to manage the polling for job completion yourself (as of version 0.27.0); if you want, you can use it like:
job = bigquery_client.run_async_query(job_name, query)
job_result = job.result()
query_result = job_result.query_results()
data = list(query_result.fetch_data())
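Putting the two suggestions together, the question's test() function might look like the following sketch (dataset_name, table_name and bigquery_client are assumed to exist exactly as in the question):

import uuid

def test():
    query = "SELECT * FROM " + dataset_name + '.' + table_name
    job_name = str(uuid.uuid4())  # a fresh name per run avoids the 409 Already Exists error
    job = bigquery_client.run_async_query(job_name, query)
    job_result = job.result()                 # waits for the job to finish
    query_result = job_result.query_results()
    data = list(query_result.fetch_data())
    print(data)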

How to generate a RethinkDB changefeed for all tables in a database

I'm testing an API which inserts or deletes data in multiple tables of a RethinkDB database. In order to monitor what is happening to the database while using the API, I would like to print the changes in all its tables.
Here is some 'pseudo-code' of what I'm trying to achieve:
import rethinkdb as r

# Prior to running this script, run "rethinkdb --port-offset 1" at the command line
conn = r.connect('localhost', 28016)

if 'test' in r.db_list().run(conn):
    r.db_drop('test').run(conn)
r.db_create('test').run(conn)

r.table_create('table1').run(conn)
r.table_create('table2').run(conn)

feed = r.table('table1' and 'table2').changes().run(conn)
for document in feed:
    print document
Prior to running this script, I would run rethinkdb --port-offset 1 to initialize the RethinkDB database.
Once this script is running, I'd like to insert data into either table1 or table2 (using, for example, the web UI at localhost:8081) and see the changes printed in the terminal running the script. This appears not to work, however,
because r.table('table1' and 'table2') is probably not a valid ReQL query.
How can I monitor changes in both tables?
You can follow multiple changefeeds in a single query using r.union:
r.union(
    r.table('table1').changes(),
    r.table('table2').changes()
).run(conn)
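For completeness, a minimal sketch of consuming the combined feed, in the same Python 2 style as the question:

feed = r.union(
    r.table('table1').changes(),
    r.table('table2').changes()
).run(conn)

for document in feed:
    print document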
I ended up running the changefeeds for each table in a separate thread:
import rethinkdb as r
import threading

# Prior to running this script, run "rethinkdb --port-offset 1" at the command line
conn = r.connect('localhost', 28016)

def clear_test_database():
    '''Clear the contents of the "test" database by dropping and re-creating it.'''
    if 'test' in r.db_list().run(conn):
        r.db_drop('test').run(conn)
    r.db_create('test').run(conn)

clear_test_database()

def monitor_changes(table_name, conn):
    feed = r.table(table_name).changes().run(conn)
    for document in feed:
        print document

tables = ['table1', 'table2']
for table in tables:
    conn = r.connect('localhost', 28016)
    r.table_create(table).run(conn)
    thread = threading.Thread(target=monitor_changes, args=(table, conn))
    thread.start()
Note that I re-define the conn connection object within the for-loop as these objects are not thread-safe.
To test the method, I opened the web UI at localhost:8081 and ran an insert command against the tables. In the Sublime runner I see the changes being printed every time I press the "Run" button. This works whether I choose table1 or table2 in the insert command.

Close SQLAlchemy connection

I have the following function in python:
from sqlalchemy import create_engine, MetaData, Table

def add_odm_object(obj, table_name, primary_key, unique_column):
    db = create_engine('mysql+pymysql://root:@127.0.0.1/mydb')
    metadata = MetaData(db)
    t = Table(table_name, metadata, autoload=True)
    s = t.select(t.c[unique_column] == obj[unique_column])
    rs = s.execute()
    r = rs.fetchone()
    if not r:
        i = t.insert()
        i_res = i.execute(obj)
        v_id = i_res.inserted_primary_key[0]
        return v_id
    else:
        return r[primary_key]
This function checks whether the object obj is in the database, and if it is not found, it saves it to the DB. Now, I have a problem. I call the above function in a loop many times, and after a few hundred iterations I get an error: user root has exceeded the max_user_connections resource (current value: 30). I tried to search for answers, and for example the question How to close sqlalchemy connection in MySQL recommends creating a conn = db.connect() object, where db is the engine, and calling conn.close() after my query is completed.
But where should I open and close the connection in my code? I am not working with the connection directly; I'm using the Table() and MetaData() constructs.
The engine is an expensive-to-create factory for database connections. Your application should call create_engine() exactly once per database server.
Similarly, the MetaData and Table objects describe a fixed schema object within a known database. These are also configurational constructs that in most cases are created once, just like classes, in a module.
In this case, your function seems to want to load up tables dynamically, which is fine; the MetaData object acts as a registry, which has the convenience feature that it will give you back an existing table if it already exists.
Within a Python function and especially within a loop, for best performance you typically want to refer to a single database connection only.
Taking these things into account, your module might look like:
from sqlalchemy import create_engine, MetaData, Table

# module level variable. can be initialized later,
# but generally just want to create this once.
db = create_engine('mysql+pymysql://root:@127.0.0.1/mydb')

# module level MetaData collection.
metadata = MetaData()

def add_odm_object(obj, table_name, primary_key, unique_column):
    with db.begin() as connection:
        # will load table_name exactly once, then store it persistently
        # within the above MetaData
        t = Table(table_name, metadata, autoload=True, autoload_with=connection)
        s = t.select(t.c[unique_column] == obj[unique_column])
        rs = connection.execute(s)
        r = rs.fetchone()
        if not r:
            i_res = connection.execute(t.insert(), obj)
            v_id = i_res.inserted_primary_key[0]
            return v_id
        else:
            return r[primary_key]
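A brief usage sketch, with an illustrative records list and table/column names that are assumptions rather than part of the question: because the engine and MetaData now live at module level, calling the function in a loop reuses one connection pool instead of creating a new engine per call.

# illustrative data; "products", "id" and "sku" are assumed names
records = [
    {"name": "widget", "sku": "A-1"},
    {"name": "gadget", "sku": "A-2"},
]

for rec in records:
    v_id = add_odm_object(rec, "products", "id", "sku")
    print(v_id)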
