I am creating a program for analyzing and generating queries. I was curious whether there is a method within SQLite that lets me query the time taken for a query to execute. I cannot modify my install in any way, so this method needs to work out of the box. I am writing my tool in Python, and although I could use a timer class to time execution, that approach will not work (or return consistent timings) when I am connecting to remote machines.
From within the sqlite3 command-line program you can do:
.timer ON
select * from my_table;
This will print the CPU time (and, in recent versions, the elapsed wall-clock time) taken for the query.
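If you need to capture that timing from Python without modifying the install, one option is to drive the sqlite3 command-line program as a subprocess and let .timer do the measuring. A minimal sketch, assuming a local database file and a placeholder query:

import subprocess

db_path = "my.db"                    # placeholder database path
sql = "select * from my_table;"      # placeholder query

# Feed the dot-command plus the query to the sqlite3 CLI on stdin.
# With ".timer on", the CLI prints a timing line after each statement
# (e.g. "Run Time: real ... user ... sys ..." in recent versions).
proc = subprocess.run(
    ["sqlite3", db_path],
    input=f".timer on\n{sql}\n",
    capture_output=True,
    text=True,
)
print(proc.stdout)

This won't solve the remote-machine case by itself, but the same idea works anywhere you can invoke the CLI.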
Currently, the approach I take is:
clearing the rows in the table using Python,
fetching the output of the view in Python and storing the result in a DataFrame,
appending the data to the table using df.to_sql in Python,
scheduling this script to run every day at a specified time (with Prefect).
I find this method unappealing for the following reasons:
This method is external to the database, hence it involves latency.
This method is subject to various dependencies, like the SQL connector I am using for Python and the scheduler (Prefect), where debugging can get tricky if I have more than 10 tables.
Is there a better way / package / tool to automate the process with the least dependencies and latency?
Have you tried Prefect 2 already? Regarding the load process, you may consider loading data to a temp table and merging from there -- by doing that in SQL, it might be faster and easier to troubleshoot. dbt is also a tool you can consider, and you can orchestrate dbt with prefect using the prefect-dbt package: https://github.com/PrefectHQ/prefect-dbt
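A minimal sketch of the temp-table-and-merge idea, assuming a SQLAlchemy engine and hypothetical view/table names (my_view, my_table, my_table_stage):

import pandas as pd
import sqlalchemy as sa

# Hypothetical connection URL; substitute your own database.
engine = sa.create_engine("postgresql://user:pass@host/db")

# Pull the view's output into a DataFrame, as before.
df = pd.read_sql("SELECT * FROM my_view", engine)

with engine.begin() as conn:
    # Stage the fresh rows first...
    df.to_sql("my_table_stage", conn, if_exists="replace", index=False)
    # ...then clear and reload the target inside one transaction, so the
    # merge happens in SQL and the target is never left half-empty.
    conn.execute(sa.text("DELETE FROM my_table"))
    conn.execute(sa.text("INSERT INTO my_table SELECT * FROM my_table_stage"))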
I have a question and hope someone can point me in the right direction. Basically, every week I have to run a query (SSMS) to get a table containing some information (date, clientnumber, clientID, orderid, etc.), and then I copy all the information in that table and paste it in a folder as a CSV file. It takes me about 15 minutes to do all this, but I am thinking: can I automate it, and if yes, how, and can I also schedule it so it runs by itself every week? I believe we live in a technological era and this should be done without human input, so I hope I can find someone here willing to show me how to do it using Python.
Many thanks for considering my request.
This should be pretty simple to automate (a minimal sketch follows the steps below):
Use some database adapter which can work with your database; for MSSQL the one delivered by pyodbc will be fine,
Within the script, connect to the database, perform the query, parse the output,
Save the parsed output to a .csv file (you can use Python's csv module),
Run the script as a periodic task using cron/schtasks if you work on Linux/Windows respectively.
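Putting those steps together, a minimal sketch assuming pyodbc with the Microsoft ODBC driver and hypothetical server, query, and output path:

import csv
import pyodbc

# Hypothetical connection string; adjust driver, server, and database.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes;"
)
cursor = conn.cursor()
cursor.execute("SELECT date, clientnumber, clientID, orderid FROM orders")

with open(r"C:\reports\weekly_orders.csv", "w", newline="") as f:
    writer = csv.writer(f)
    # Header row comes straight from the cursor metadata.
    writer.writerow([col[0] for col in cursor.description])
    writer.writerows(cursor.fetchall())

conn.close()

Point a weekly Task Scheduler (schtasks) or cron entry at the script and the export runs without human input.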
Please note that your question is too broad, and shows no research effort.
You will find that Python can do the tasks you desire.
There are many different ways to interact with SQL servers, depending on your implementation. I suggest you learn Python+SQL using the built-in sqlite3 library. You will want to save your query as a string and pass it into an SQL connection manager of your choice; this depends on your server setup, and there are many different SQL packages for Python.
You can use pandas for parsing the data and saving it to a .csv file (via a method literally called to_csv).
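For instance, a minimal sketch with the built-in sqlite3 module and pandas (file names, table, and query are hypothetical):

import sqlite3
import pandas as pd

query = "SELECT date, clientnumber, clientID, orderid FROM orders"

# Run the query and load the rows straight into a DataFrame...
with sqlite3.connect("local.db") as conn:
    df = pd.read_sql_query(query, conn)

# ...then write it out; to_csv handles the formatting.
df.to_csv("weekly_orders.csv", index=False)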
Python does have many libraries for scheduling tasks, but I suggest you hold off for a while. Develop your code in a way that it can be run manually, which will still be much faster/easier than without Python. Once you know your code works, you can easily implement a scheduler. The downside is that your program will always need to be running, and you will need to keep checking to see if it is running. Personally, I would keep it restricted to manually running the script; you could compile it to an .exe and bind it to a hotkey if you need the accessibility.
I'm after a way of querying Impala through Python which enables you to keep a connection open and pass queries to it.
I can connect quite happily to Impala using this sort of code:
import subprocess

sql = 'some sort of sql statement;'

# Build the impala-shell call: -k = Kerberos auth, -B = plain (non-pretty)
# output, -i = the impalad host to connect to, -q = the query to run.
cmds = ['impala-shell','-k','-B','-i','impala.company.corp','-q', sql]

# Run the shell as a child process and capture its stdout/stderr.
out,err = subprocess.Popen(cmds, stderr=subprocess.PIPE, stdout=subprocess.PIPE).communicate()
print(out.decode())
print(err.decode())
I can also switch out the -q and sql for -f and a file with sql statements as per the documentation here.
When I'm running this for multiple SQL statements, the name node it uses is the same for all the queries, and it will stop if there is a failure in the code (unless I use the option to continue); this is all expected.
What I'm trying to get to is where I can run a query or two, check the results using some python logic and then continue if it meets my criteria.
I have tried splitting up my code into individual queries using sqlparse and running them one by one. This works well in isolation, but if one statement is a drop table if exists x; and the next is create table x (blah string);, then, if x did actually exist, the second statement runs on a different node that the metadata change from the drop hasn't reached yet, and it fails with a table x already exists or similar error.
I'd think that, as well as getting round this metadata issue, it would just make more sense to keep a connection open to Impala whilst I run all the statements, but I'm struggling to work this out.
Does anyone have any code that has this functionality?
You may wanna look at impyla, the Impala/Hive Python client, if you haven't done so already.
As for the second part of your question, using Impala's SYNC_DDL option will guarantee that DDL changes are propagated across impalads before the next DDL is executed.
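A minimal sketch with impyla, keeping a single session open (host, port, and Kerberos auth are assumptions based on your -k flag; 21050 is the usual HiveServer2 port):

from impala.dbapi import connect

conn = connect(host='impala.company.corp', port=21050, auth_mechanism='GSSAPI')
cur = conn.cursor()

# One open session means every statement goes through the same coordinator,
# and SYNC_DDL makes DDL changes visible cluster-wide before returning.
cur.execute('SET SYNC_DDL=1')
cur.execute('DROP TABLE IF EXISTS x')
cur.execute('CREATE TABLE x (blah STRING)')

# Check results in Python before deciding whether to continue.
cur.execute('SELECT count(*) FROM x')
print(cur.fetchone())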
I'm connecting to MySQL with the MySQLdb module. I don't want to use Python's time functions: I want to know how long the query ran within MySQL, i.e. the number I see after I've run a query within MySQL directly.
I do see a thread where this is addressed as something one could eventually dig down to, but I was hoping that since MySQL reports that number, the Python connection would have picked it up somewhere.
Maybe this helps:
SET profiling = 1;
Run your query;
SHOW PROFILES;
See here: http://dev.mysql.com/doc/refman/5.7/en/show-profile.html
Because the above commands will be removed in a future version, the Performance Schema can be used instead: http://dev.mysql.com/doc/refman/5.7/en/performance-schema.html and http://dev.mysql.com/doc/refman/5.7/en/performance-schema-query-profiling.html.
The above links give more details on query profiling using the Performance Schema.
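If you want to pick that server-side number up through MySQLdb, a minimal sketch (connection parameters are placeholders):

import MySQLdb

conn = MySQLdb.connect(host='localhost', user='me', passwd='secret', db='test')
cur = conn.cursor()

# Enable profiling for this session, then run the query to be timed.
cur.execute('SET profiling = 1')
cur.execute('SELECT * FROM my_table')
cur.fetchall()

# SHOW PROFILES returns (Query_ID, Duration, Query) rows; Duration is
# the time the query took inside MySQL, in seconds.
cur.execute('SHOW PROFILES')
for query_id, duration, query in cur.fetchall():
    print(query_id, duration, query)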
I am trying to figure out an elegant way to share a variable between a Windows Forms app and a Python script running in the background. The variable would be used solely to update a progress bar in the Windows form based on the long-running process in the Python script. More specifically, a Windows timer will fire every n seconds, check the variable, then update the progress bar value. Sound stupid enough yet? I'll try to explain the need for this below.
I have a Windows app that lets a user define a number of parameters to fire off a long-running process (a Python script). Without getting into unnecessary detail, this long-running process will insert many (100k+) records into a SQLite database over a significant period of time. In order to make the Python script as performant as possible, I don't call commit on the SQLite database until the very end of the script. Trying to query the SQLite database from the Windows app (via System.Data.SQLite) before the commit occurs always yields 0 records, regardless of how far along the process is.
The Windows app will know how many total records will be inserted by the Python process, so determining progress will be straightforward enough, assuming I can get access to a record count in the Python script.
I know I could do this with a text file, but is there any better way?
The easiest solution is probably to just have the Python script print to stdout: say, each time an item is processed, print a line with a number representing how many items have been processed (or a percentage). Then have the forms application read the output line by line, updating the progress bar based on that information.
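A minimal sketch of the script side (the loop body and total count stand in for your real insert logic):

TOTAL = 100_000  # the record count the Windows app already knows

def do_insert(i):
    pass  # placeholder for the real sqlite insert

for i in range(1, TOTAL + 1):
    do_insert(i)
    if i % 1000 == 0 or i == TOTAL:
        # Flush so the WinForms host sees each line immediately,
        # not only when the pipe buffer fills up.
        print(i, flush=True)

On the WinForms side, start the process with RedirectStandardOutput enabled and update the progress bar from each line read.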
If you use IronPython instead, you can create the Form-with-progress-bar in Python and manipulate it directly.
Alternatively, your WinForms app can host the script and share variables.