I'm connecting to MySQL with the MySQLdb module. I don't want to use Python's time functions: I want to know how long the query ran within MySQL, i.e. the number I see after I've run a query within MySQL directly.
I do see a thread where this is addressed as something one could eventually dig down to, but I was hoping that since MySQL reports that number, the Python connection would have picked it up somewhere.
Maybe this helps?
SET profiling = 1;
Run your query;
SHOW PROFILES;
See here: http://dev.mysql.com/doc/refman/5.7/en/show-profile.html
Because the commands above will be removed in a future version, the Performance Schema can be used instead: http://dev.mysql.com/doc/refman/5.7/en/performance-schema.html and http://dev.mysql.com/doc/refman/5.7/en/performance-schema-query-profiling.html.
The links above have more details on query profiling using the Performance Schema.
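From Python, the same three statements can be sent over the existing connection. The following is only a sketch: it assumes a DB-API cursor such as the one returned by MySQLdb's connection.cursor(), that SHOW PROFILES returns rows of (Query_ID, Duration, Query), and that the driver sends no extra statements in between. The helper name mysql_query_duration is made up for illustration.

```python
def mysql_query_duration(cursor, sql, params=None):
    """Run sql with profiling enabled.

    Returns (rows, duration), where duration is the number of seconds
    MySQL itself reports for the statement in SHOW PROFILES.
    """
    cursor.execute("SET profiling = 1")
    if params is None:
        cursor.execute(sql)
    else:
        cursor.execute(sql, params)
    rows = cursor.fetchall()
    # Assumption: each profile row is (Query_ID, Duration, Query) and the
    # last row belongs to the statement that was just executed.
    cursor.execute("SHOW PROFILES")
    profiles = cursor.fetchall()
    cursor.execute("SET profiling = 0")
    return rows, float(profiles[-1][1])
```

The duration comes from the server, so it does not include network or Python overhead.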
Related
I have a question and hope someone can point me in the right direction. Basically, every week I have to run a query (in SSMS) to get a table containing some information (date, clientnumber, clientID, orderid, etc.), and then I copy all the information in that table and paste it into a folder as a CSV file. It takes me about 15 minutes to do all this, so I am wondering whether I can automate it and, if so, how. Can I also schedule it so that it runs by itself every week? I believe we live in a technological era and this should be done without human input, so I hope I can find someone here willing to show me how to do it using Python.
Many thanks for considering my request.
This should be pretty simple to automate:
Use a database adapter that works with your database; for MSSQL, pyodbc will be fine.
Within the script, connect to the database, run the query, and parse the output.
Save the parsed output to a .csv file (you can use Python's csv module).
Run the script as a periodic task, using cron on Linux or schtasks on Windows.
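The steps above can be sketched roughly as follows. This is a minimal outline that assumes any DB-API connection (for SQL Server, one obtained from pyodbc.connect with your own connection string); the function name export_query_to_csv is hypothetical.

```python
import csv


def export_query_to_csv(conn, sql, csv_path, batch_size=1000):
    """Run sql on a DB-API connection and write the result to csv_path.

    Returns the number of data rows written (the header is not counted).
    """
    cursor = conn.cursor()
    cursor.execute(sql)
    # cursor.description has one entry per column; item 0 is the column name.
    header = [column[0] for column in cursor.description]
    written = 0
    # newline="" lets the csv module control line endings itself.
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        while True:
            rows = cursor.fetchmany(batch_size)
            if not rows:
                break
            writer.writerows(rows)
            written += len(rows)
    cursor.close()
    return written
```

Once a script calling this function works when run by hand, the same script can be registered with cron or schtasks without changes.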
Please note that your question is too broad, and shows no research effort.
You will find that Python can do the tasks you desire.
There are many different ways to interact with SQL servers, depending on your implementation. I suggest you learn Python+SQL using the built-in sqlite3 library. You will want to save your query as a string and pass it into an SQL connection manager of your choice; which one depends on your server setup, as there are many different SQL packages for Python.
You can use pandas to parse the data and save it to a .csv file (the method is literally called to_csv).
Python does have many libraries for scheduling tasks, but I suggest you hold off for a while. Develop your code so that it can be run manually, which will still be much faster and easier than doing it without Python. Once you know your code works, you can easily add a scheduler. The downside of scheduling from within Python is that your program will always need to be running, and you will need to keep checking that it is. Personally, I would stick to running the script manually; you could compile it to an .exe and bind it to a hotkey if you need the accessibility.
I'm after a way of querying Impala through Python which enables you to keep a connection open and pass queries to it.
I can connect quite happily to Impala using this sort of code:
import subprocess
sql = 'some sort of sql statement;'
cmds = ['impala-shell','-k','-B','-i','impala.company.corp','-q', sql]
out,err = subprocess.Popen(cmds, stderr=subprocess.PIPE, stdout=subprocess.PIPE).communicate()
print(out.decode())
print(err.decode())
I can also switch out the -q and sql for -f and a file with sql statements as per the documentation here.
When I'm running this for multiple sql statements, the name node it uses is the same for all the queries, and it will stop if there is a failure in the code (unless I use the option to continue). This is all expected.
What I'm trying to get to is where I can run a query or two, check the results using some python logic and then continue if it meets my criteria.
I have tried splitting up my code into individual queries using sqlparse and running them one by one. This works well in isolation, but suppose one statement is drop table if exists x; and the next one is create table x (blah string);. If x did actually exist, the second statement runs on a different node that the metadata change from the drop hasn't reached yet, and it fails with a "table x already exists" or similar error.
I'd think as well as getting round this metadata issue it would just make more sense to keep a connection open to impala whilst I run all the statements but I'm struggling to work this out.
Does anyone have any code that has this functionality?
You may wanna look at impyla, the Impala/Hive python client, if you haven't done so already.
As for the second part of your question, using Impala's SYNC_DDL option will guarantee that DDL changes are propagated across impalads before the next DDL is executed.
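One way to get the "run, check, continue" flow from the question is to keep a single DB-API cursor open and gate each statement on a Python check. This is a sketch only: the helper run_with_checks and its check callback are hypothetical, and the impyla usage in the trailing comment is an untested assumption with placeholder host and port values.

```python
def run_with_checks(cursor, statements, check=None):
    """Execute statements in order on one open DB-API cursor.

    After each statement, check(statement, rows) is called if given;
    execution stops as soon as it returns False.
    Returns the list of statements that were executed.
    """
    executed = []
    for statement in statements:
        cursor.execute(statement)
        # cursor.description is empty/None when the statement returned no result set.
        rows = cursor.fetchall() if cursor.description else []
        executed.append(statement)
        if check is not None and not check(statement, rows):
            break
    return executed


# Assumed usage with impyla (not run here; host and port are placeholders):
#   from impala.dbapi import connect
#   cursor = connect(host="impala.company.corp", port=21050).cursor()
#   cursor.execute("SET SYNC_DDL=1")
#   run_with_checks(cursor, ["drop table if exists x",
#                            "create table x (blah string)"])
```

Because every statement goes through the same cursor, the Python logic sees each result before the next statement is sent.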
I used to be able to run this from Python using a simple execute statement, which inserts the values 1 and 2 into A and B respectively. But starting last week, I get no error, yet nothing happens in my database. No flag, nothing: 1 and 2 don't get inserted or replaced into my table.
connect.execute("REPLACE INTO TABLE(A,B) VALUES(1,2)")
I finally found an article saying that I need commit() if I have lost the connection to the server. So I have added:
connect.execute("REPLACE INTO TABLE(A,B) VALUES(1,2)")
connect.commit()
Now it works, but I just want to understand it a little: why do I need this if I know my connection did not get lost?
New to python - Thanks.
This isn't a Python or ODBC issue, it's a relational database issue.
Relational databases generally work in terms of transactions: any time you change something, a transaction is started and is not ended until you either commit or rollback. This allows you to make several changes serially that appear in the database simultaneously (when the commit is issued). It also allows you to abort the entire transaction as a unit if something goes awry (via rollback), rather than having to explicitly undo each of the changes you've made.
You can make this functionality transparent by turning auto-commit on, in which case a commit will be issued after each statement, but this is generally considered a poor practice.
Not committing puts all your queries into one transaction, which is safer (and possibly better performance-wise) when the queries are related to each other. What if the power goes out between two queries that don't make sense independently, for instance when transferring money from one account to another using two update queries?
You can set autocommit to true if you don't want this behavior, but there aren't many reasons to do that.
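The money-transfer case can be illustrated with Python's built-in sqlite3 module, used here only as a stand-in for whichever database sits behind the connection in the question. It is a small sketch; the accounts table, the transfer function, and the fail_midway flag are invented for the illustration.

```python
import sqlite3


def transfer(conn, src, dst, amount, fail_midway=False):
    """Move amount from src to dst as a single transaction."""
    try:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
        )
        if fail_midway:
            # Simulates a failure between the two related updates.
            raise RuntimeError("interrupted between the two updates")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
        )
        conn.commit()    # both updates are made permanent together
    except Exception:
        conn.rollback()  # neither update is kept
        raise


def balances(conn):
    """Return {name: balance} for all accounts."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

transfer(conn, "alice", "bob", 30)
print(balances(conn))  # {'alice': 70, 'bob': 30}
```

Calling transfer with fail_midway=True raises after the first update, and the rollback leaves both balances as they were before the call.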
I am creating a program for analyzing and generating queries. I was curious whether there currently exists a method within SQLite such that I could query the time taken for a query to process. I am unable to modify my install in any way, so this method needs to work out of the box. I am writing my tool in Python, and although I guess I could use the timer class to time execution, this method will not work (that is, return a consistent timing) when I am connecting to remote machines.
From within the sqlite3 command-line program you can do:
.timer ON
select * from my_table;
This will print the CPU time taken for the query (recent versions also print the elapsed real time).
How can a function be run "in a transaction" (see http://code.google.com/appengine/docs/python/datastore/functions.html#run_in_transaction) with MySQL and SQLite (and other RDBMSs, if anyone knows how)?
EDIT: I want to do so in Python, but a way to do it in other programming languages would also be okay.
That depends on the RDBMS you use, what programming language you're writing in, and what library you use to connect.
With MySQL, you issue a START TRANSACTION statement before the queries you want to run in a transaction, and a COMMIT statement to commit that transaction.
http://dev.mysql.com/doc/refman/5.0/en/commit.html
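In Python, a helper in the spirit of App Engine's run_in_transaction can be built on the DB-API commit() and rollback() methods, which both MySQLdb and sqlite3 connections provide. This is a rough sketch, exercised here only against sqlite3; the helper and the items table are illustrative, and on MySQL it assumes a transactional engine such as InnoDB with autocommit left off.

```python
import sqlite3


def run_in_transaction(conn, func, *args, **kwargs):
    """Call func(conn, *args, **kwargs); commit on success, roll back on error."""
    try:
        result = func(conn, *args, **kwargs)
    except Exception:
        conn.rollback()
        raise
    conn.commit()
    return result


def add_two_rows(conn, fail=False):
    """Example unit of work: two inserts that should succeed or fail together."""
    cur = conn.cursor()
    cur.execute("INSERT INTO items (name) VALUES ('first')")
    if fail:
        raise ValueError("abort before the second insert")
    cur.execute("INSERT INTO items (name) VALUES ('second')")
    return 2


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT)")
conn.commit()
```

With this in place, run_in_transaction(conn, add_two_rows) stores both rows, while run_in_transaction(conn, add_two_rows, fail=True) re-raises the error and stores neither.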