Possible threading issue with pyodbc - python

I have an existing system where an Oracle database is populated with metadata by a series of Python files. There are around 500 of these files, and the current method of running them one at a time takes around an hour to complete.
To cut down on this runtime, I've tried threading the individual files and running them concurrently, but I've been getting the error
pyodbc.IntegrityError: ('23000', '[23000] [Oracle][ODBC][Ora]ORA-00001: unique constraint (DB.PK_TABLE_NAME) violated\n (1) (SQLExecDirectW)')
with a traceback to the following call:
File "C:\file.py", line 295, in ExecuteSql
cursor.execute(Sql)
Can anyone shed any light on this for me by any chance? This doesn't seem to be happening if a file which has thrown the error is then run individually, which leads me to suspect this is an access issue where two files are trying to write to the DB at once. I hope this is not the case, as that will likely veto this approach entirely.

I eventually realised that the issue was coming from the way that the SQL submitted to the database was being constructed.
The ID for the table was being generated by a "GetNext()" function, which got the current max ID from the table and incremented it by one. This was failing when multiple files were run at the same time: they each read the same max ID and then tried to insert using the same generated value.
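For anyone hitting the same thing, replacing the MAX(id)+1 pattern with a database sequence lets Oracle hand out the IDs atomically, so concurrent sessions can no longer collide. A minimal sketch, where the DSN, sequence, and table names are placeholders rather than anything from the original system:
import pyodbc

connection = pyodbc.connect("DSN=oracle_dsn")  # placeholder DSN
cursor = connection.cursor()

# Each call to NEXTVAL returns a distinct value, even across concurrent
# sessions, so the unique constraint can no longer be violated.
cursor.execute(
    "INSERT INTO db.some_table (id, some_column) "
    "VALUES (db.some_table_seq.NEXTVAL, ?)",
    ("some metadata",),
)
connection.commit()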

Automate a manual task using Python

I have a question and hope someone can direct me in the right direction. Basically, every week I have to run a query (SSMS) to get a table containing some information (date, clientnumber, clientID, orderid, etc.), and then I copy all the information from that table and paste it into a folder as a CSV file. It takes me about 15 minutes to do all this, but I am wondering: can I automate this, and if so, how? Can I also schedule it so it runs by itself every week? I believe we live in a technological era and this should be done without human input, so I hope I can find someone here willing to show me how to do it using Python.
Many thanks for considering my request.
This should be pretty simple to automate:
Use a database adapter which can work with your database; for MSSQL the one delivered by pyodbc will be fine.
Within the script, connect to the database, perform the query, and parse the output.
Save the parsed output to a .csv file (you can use the csv Python module).
Run the script as a periodic task using cron or schtasks if you work on Linux or Windows respectively.
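Put together, a minimal sketch of the query-and-save part might look like this; the DSN, query, and output path below are placeholders, not something from your setup:
import csv
import pyodbc

connection = pyodbc.connect("DSN=mssql_dsn")  # placeholder DSN for your SQL Server
cursor = connection.cursor()
cursor.execute("SELECT date, clientnumber, clientID, orderid FROM dbo.orders")  # placeholder query

with open(r"C:\reports\weekly_orders.csv", "w", newline="") as f:  # placeholder path
    writer = csv.writer(f)
    writer.writerow([column[0] for column in cursor.description])  # header row
    writer.writerows(cursor.fetchall())

connection.close()
For the scheduling part, point a schtasks/Task Scheduler entry (or a cron job on Linux) at the script and it will run weekly without any human input.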
Please note that your question is too broad, and shows no research effort.
You will find that Python can do the tasks you desire.
There are many different ways to interact with SQL servers, depending on your implementation. I suggest you learn Python+SQL using the built-in sqlite3 library. You will want to save your query as a string and pass it into an SQL connection manager of your choice; this depends on your server setup, and there are many different SQL packages for Python.
You can use pandas for parsing the data and saving it to a .csv file (the method is literally called to_csv).
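A rough sketch of that pandas route, again with placeholder connection details and query (newer pandas versions may warn that they prefer an SQLAlchemy connection, but it works for a quick export):
import pandas as pd
import pyodbc

connection = pyodbc.connect("DSN=mssql_dsn")  # placeholder DSN
df = pd.read_sql("SELECT date, clientnumber, clientID, orderid FROM dbo.orders", connection)
df.to_csv(r"C:\reports\weekly_orders.csv", index=False)  # placeholder path
connection.close()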
Python does have many libraries for scheduling tasks, but I suggest you hold off for a while. Develop your code so that it can be run manually, which will still be much faster and easier than doing it without Python. Once you know your code works, you can easily implement a scheduler. The downside is that your program will always need to be running, and you will need to keep checking that it is running. Personally, I would keep it restricted to manually running the script; you could compile it to an .exe and bind it to a hotkey if you need the accessibility.

Importing huge .sql script file (30GB) with only inserts

I need to import some SQL scripts generated in SSMS (generate scripts). These scripts only contain INSERTS.
So far I have managed to import almost everything using DbUp (https://dbup.readthedocs.io/en/latest/). My problem is with the larger files; in this case I have two, one of 2GB and one of 30GB.
The 2GB one I imported using BigSqlRunner (https://github.com/kevinly1989/BigSqlRunner).
For the 30GB one I've tried everything (PowerShell, split, etc.) and I'm not succeeding; it always gives a memory error, and I can't find anything to help me split the file into multiple smaller files...
I'm asking for your help if you know of a better way or a solution for this.
The goal is to migrate data from one database (PRODUCTION) to another, empty database (PRODUCTION but not used). I am doing it through generated scripts (SSMS) and then executing the scripts on the target database (for safety, since the source is production and I don't want to be reading it line by line while importing the data into the target database).
I am open to other solutions that may exist such as SSIS (SQL Server Integration Services), Python, PowerShell, C#, etc... but I have to be careful not to impact the production database when reading the data from the tables.
Update: I managed to solve it through SSIS. I made a source and a destination connection and used the source query with WITH (NOLOCK); it's running and already halfway through. It's not the fastest, but it's working. Thank you all for your help.
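For anyone who still needs the pure-Python splitting route mentioned in the question, the memory error usually comes from reading the whole file at once; streaming it line by line and cutting at GO batch separators avoids that. A rough sketch under that assumption (file names, chunk size, and encoding are placeholders; SSMS sometimes writes UTF-16, so adjust accordingly):
lines_per_chunk = 100_000  # rough size of each output file; split only at GO boundaries
chunk, part = [], 0

with open("huge_inserts.sql", encoding="utf-8-sig") as source:  # placeholder file name
    for line in source:
        chunk.append(line)
        # SSMS-generated scripts separate batches with GO, so it is a safe cut point
        if line.strip().upper() == "GO" and len(chunk) >= lines_per_chunk:
            part += 1
            with open(f"chunk_{part:04d}.sql", "w", encoding="utf-8") as out:
                out.writelines(chunk)
            chunk = []

if chunk:  # write whatever is left over
    part += 1
    with open(f"chunk_{part:04d}.sql", "w", encoding="utf-8") as out:
        out.writelines(chunk)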

Calling sql server stored procedure using python pymssql

I am using pymssql to execute an MS SQL stored procedure from Python. When I try to execute a stored procedure, it does not seem to get executed. The code completes without any error, but upon verifying I can see the procedure was not really executed. What baffles me is that usual queries like SELECT and similar ones are working. What might be missing here? I have tried the two ways below. The stored procedure does not have any parameters or arguments.
cursor.execute("""exec procedurename""")
and
cursor.callproc('procedurename',())
EDIT: The procedure loads a table with some latest data. When I execute the proc locally, it loads the table with the latest data, but the latest data is not loaded when it is run from Python using pymssql.
Thanks to AlwaysLearning for providing the crucial clue for fixing the issue: I added connection.commit() after the procedure call and it started working!
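For reference, the working pattern ends up looking roughly like this (server details and procedure name are placeholders):
import pymssql

connection = pymssql.connect(server="myserver", user="user", password="password", database="mydb")
cursor = connection.cursor()
cursor.callproc('procedurename', ())
connection.commit()  # without this, the procedure's changes are rolled back when the connection closes
connection.close()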

Impala open connection in python

I'm after a way of querying Impala through Python which enables you to keep a connection open and pass queries to it.
I can connect quite happily to Impala using this sort of code:
import subprocess
sql = 'some sort of sql statement;'
cmds = ['impala-shell','-k','-B','-i','impala.company.corp','-q', sql]
out,err = subprocess.Popen(cmds, stderr=subprocess.PIPE, stdout=subprocess.PIPE).communicate()
print(out.decode())
print(err.decode())
I can also switch out the -q and sql for -f and a file with sql statements as per the documentation here.
When I'm running this for multiple SQL statements, the name node it uses is the same for all the queries, and it will stop if there is a failure in the code (unless I use the option to continue); this is all expected.
What I'm trying to get to is where I can run a query or two, check the results using some python logic and then continue if it meets my criteria.
I have tried splitting up my code into individual queries using sqlparse and running them one by one. This works well in isolation, but if one statement is drop table if exists x; and the next one is create table x (blah string);, and x did actually exist, then because the second statement runs on a different node, the metadata change from the drop hasn't reached that node yet, and it fails with a "table x already exists" or similar error.
As well as getting round this metadata issue, I'd think it would just make more sense to keep a connection open to Impala while I run all the statements, but I'm struggling to work this out.
Does anyone have any code that has this functionality?
You may want to look at impyla, the Impala/Hive Python client, if you haven't done so already.
As for the second part of your question, using Impala's SYNC_DDL query option will guarantee that DDL changes are propagated across impalads before the next statement is executed.
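A minimal sketch of both suggestions combined, assuming impyla is installed and the cluster uses Kerberos (the -k flag in your impala-shell call suggests it does); the port and table names are placeholders:
from impala.dbapi import connect

connection = connect(host="impala.company.corp", port=21050, auth_mechanism="GSSAPI")
cursor = connection.cursor()

cursor.execute("SET SYNC_DDL=1")  # each DDL returns only once all impalads see the change
cursor.execute("DROP TABLE IF EXISTS x")
cursor.execute("CREATE TABLE x (blah STRING)")  # no longer races the drop above

cursor.execute("SELECT count(*) FROM x")
print(cursor.fetchall())  # check results in Python and decide whether to continue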

Lock and unlock database access - database is locked

I have three programs running, one of which iterates over a table in my database non-stop (over and over again in a loop), just reading from it, using a SELECT statement.
The other programs have a line where they insert a row into the table and a line where they delete it. The problem is that I often get the error sqlite3.OperationalError: database is locked.
I'm trying to find a solution, but I don't understand the exact source of the problem (is reading and writing at the same time what makes this error occur? Or the writing and deleting? Maybe neither combination is supposed to work).
Either way, I'm looking for a solution. If it were a single program, I could coordinate the database I/O with mutexes and other multithreading tools, but it's not. How can I wait until the database is unlocked for reading/writing/deleting without using too much CPU?
You need to switch databases.
I would use the following:
PostgreSQL as the database
psycopg2 as the driver
The syntax is fairly similar to SQLite's, and the migration shouldn't be too hard for you.
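A rough sketch of what the reader and writer programs look like after the switch; connection details are placeholders, and psycopg2's %s parameter style replaces sqlite3's ?:
import psycopg2

connection = psycopg2.connect(host="localhost", dbname="mydb", user="me", password="secret")
cursor = connection.cursor()

# Readers no longer block on writers: PostgreSQL uses row-level locking and MVCC.
cursor.execute("SELECT * FROM my_table")
rows = cursor.fetchall()

cursor.execute("INSERT INTO my_table (col) VALUES (%s)", ("value",))
cursor.execute("DELETE FROM my_table WHERE col = %s", ("value",))
connection.commit()
connection.close()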
