How to read SQL query output in Python

I am using sqlplus from my Python code to connect to the database, execute a query, and read the results. Can anyone help me read the data from stdout?
My code is like this:
import os

stdout = os.popen(cmd)
for line in stdout:
    print line
stdout.close()
But in the output, the column headers repeat every few rows, like this:
Name ID
---- ---
AB 1
AC 2
AD 3
Name ID
---- ---
BC 1
BD 2
Is it possible to control this so that the header is not repeated and appears only once, at the beginning?

What you are doing:
Launching a standalone program which queries the database and prints the results to stdout
Reading the stdout of that program and thinking about parsing it.
What you should be doing:
Using a database API in Python.
This page contains a list of Oracle DB APIs you could use: https://wiki.python.org/moin/Oracle
Many benefits will come from using a real API to query the database, such as better opportunities to handle errors, probably better performance, and future maintainers of your code not being upset with you.
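For example, with the cx_Oracle driver (one of the libraries listed on that wiki page), a minimal sketch might look like the following; the credentials, DSN, table, and column names are placeholders, not anything taken from the question:
import cx_Oracle  # one of the Oracle drivers listed on the wiki page above

# Placeholder credentials and DSN -- substitute your own.
conn = cx_Oracle.connect("scott", "tiger", "dbhost.example.com/orclpdb1")
try:
    cur = conn.cursor()
    # Placeholder query; rows come back as Python tuples, with no headers to strip.
    cur.execute("SELECT name, id FROM mytable")
    for row in cur:
        print(row)
finally:
    conn.close()
With a driver like this there is no text output to parse at all, so the repeated-header problem disappears.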


Why is my code producing a different SQLite file between my local installation and GitHub Action?

I have a Python module to handle my SQLite database, and this module provides two functions:
init_database() creates the database file with all the CREATE TABLE statements; it is executed if the database does not exist yet.
upgrade_database() updates the schema of the database (ALTER TABLE and other changes); it is executed when migrating from an older version of my program.
To ensure this critical part of the application works as expected, I wrote some tests for pytest to check that, once these functions are executed, we get exactly the expected content.
My test to check the init part looks like this:
import hashlib
from os import path, remove

def test_init_database():
    # PATH_DIRNAME is a constant with the path where the files are stored
    db_path = f"{PATH_DIRNAME}/files/database.db"
    if path.exists(db_path):
        # Delete the file if it exists (happens if the test has already been run)
        remove(db_path)
    get_db(db_path).init_database()
    with open(f"{PATH_DIRNAME}/files/database/database.sha256", "r") as file:
        # The file may contain a blank line at the end, we just take the first one
        # This is the hash sum we should get if the file is correctly generated
        expected_sum = file.read().split("\n")[0]
    with open(db_path, "rb") as file:
        # Compute the SHA-256 hash sum of the generated database file, and compare it
        assert hashlib.sha256(file.read()).hexdigest() == expected_sum
If I run this test locally, it passes with no problem. But if I run it on GitHub Action, it fails on the assertion, because the hash is different.
Then I configured my GH Actions workflow to upload the generated files as an artifact, so I could check them myself, and it looks like there is a subtle difference between the file generated in my local environment and the one generated in my workflow:
$ xxd gha_workflow/database.db > gha_workflow.hex
$ xxd local/database.db > local.hex
$ diff local.hex gha_workflow.hex
7c7
< 00000060: 002e 5f1d 0d0f f800 0a0b ac00 0f6a 0fc7 .._..........j..
---
> 00000060: 002e 5f1c 0d0f f800 0a0b ac00 0f6a 0fc7 .._..........j..
Note that the fourth byte on that line is different (1d vs 1c).
What could cause this difference? Am I doing my test wrong?
Depending on the SQLite version or build options, your resulting database may vary in its format. For example, the library version number and the page size can change, and that may alter the physical format of your database file. (Bytes 96-99 of the SQLite database header store the library's version number, which is exactly where your diff shows the change.) Depending on what you're trying to achieve, you may want to compare a logical representation of your schema and content instead.
You can check these documentation pages for more information on the file format and the build options:
https://www.sqlite.org/compile.html
https://www.sqlite.org/formatchng.html
https://www.sqlite.org/fileformat2.html
Based on the advice given in comments, I think I have found a better way to test my database schema without depending on my file's metadata.
Instead of computing the SHA-256 sum, I get the table schema and compare it to the fields I expect to see if the database is well formed.
To get the schema, SQLite provides the following syntax:
PRAGMA table_info('my_table');
I can run it with Python like any regular SQL query, and it will return a list of tuples that contain, for each field:
the position
the field name
the field type
0 if it is nullable, 1 otherwise
the default value (or None if there isn't one)
1 if it is a primary key, 0 otherwise
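As an illustration, a schema-based test along these lines could look like the sketch below; the 'users' table and its columns are invented for the example, and the real tables would be the ones init_database() creates:
import sqlite3

def table_schema(db_path, table):
    # Each returned row is (cid, name, type, notnull, dflt_value, pk) for one column.
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(f"PRAGMA table_info('{table}')").fetchall()
    finally:
        conn.close()

def test_init_database_schema():
    # Hypothetical 'users' table -- replace with the tables your schema really defines.
    expected = [
        (0, "id", "INTEGER", 1, None, 1),
        (1, "name", "TEXT", 0, None, 0),
    ]
    assert table_schema("files/database.db", "users") == expected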
Thanks for helping me to understand my mistake!

Can't query any data from SQLite - Python

The problem
I just created my first SQLite database and don't know anything about SQLite. It took over 2 days to make, and during the process I was able to run
connectioncursor.execute("""SELECT * FROM tracks""")
and see the data appear in IDLE.
Now that I've finished, I wanted to look at the data, but I'm not getting any results back. I don't know enough about SQLite to troubleshoot this: I don't know what to check or how to check it. When I look online, I only see more examples of how to use SELECT, but that won't help if I can't see any of the data I spent days trying to insert.
Here is the code:
import sqlite3
connecttosql = sqlite3.connect('musicdatabase.db')
connectioncursor = connecttosql.cursor()
connectioncursor.execute("""SELECT * FROM tracks""")
rows = connectioncursor.fetchone()
connecttosql.commit()
connecttosql.close()
Result
============ RESTART: \\VGMSTATION\testing scripts\sqlite test.py ============
>>>
#in other words, nothing appears
The musicdatabase.db file is in the same folder and is 123 MB in size. I feel a little desperate, as I just want to see the data I have spent days inserting.
Is there any program, or any code, or any way that I can see my data?
Thank you for your time (also if I am missing something please let me know)
To answer-ify my comment:
You're not printing rows (in an interactive >>> shell, it would get printed implicitly), so nothing appears.
As a shortcut, you can just call execute on the connection:
import sqlite3
conn = sqlite3.connect('musicdatabase.db')
cur = conn.execute("SELECT * FROM tracks LIMIT 1")
row = cur.fetchone()
print(row)
Since you're not modifying data, no need to .commit() (and .close() will be called automatically for you on program end anyway).
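If you want to see every row rather than just the first, you can iterate over the cursor (or call fetchall()); a quick sketch:
import sqlite3

conn = sqlite3.connect('musicdatabase.db')
cur = conn.execute("SELECT * FROM tracks")
for row in cur:      # the cursor is iterable, yielding one tuple per row
    print(row)
conn.close()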

Should I use python classes for a MySQL database insert program?

I have created a database to store NGS sequencing results. It consists of 17 tables to store all of the information. The results are stored in spreadsheets, which I parse with Python (2.7) into variables, and then use the MySQLdb package to insert the data into the database. I mainly use functions to gather the information I need into variables, then write a loop in which I call each function followed by a try: block to do the insert. Here is a simple example:
def sample_processor(file):
    my_file = open(file, 'r+')
    samples = []
    for line in my_file:
        # ...get info...
        samples.append(line[0])
    return samples

samples = sample_processor('path/to/file')

for sample in samples:
    try:
        samsql = "INSERT IGNORE INTO sample (sample_id, diagnosis, screening) VALUES ("
        samsql = samsql + "'" + sample + "','" + sam_screen_dict.get(sample) + "')"
        cursor.execute(samsql)
        db.commit()
    except Exception as e:
        db.rollback()
        print("Something went wrong inserting data into the sample table: %s" % (e))
*sam_screen_dict is a dictionary I made with another function.
This is a simple table to insert into, but many of the others draw on different dictionaries to make sure the correct results are uploaded. However, I was wondering whether there would be a more robust way to do this using a class.
For example, my sample_id has an associated screening attribute in the sample table, so this is easy to do with one dictionary. I have more complex junction tables, such as the one in which the sample_id, experiment_id and found mutation are stored alongside other data. Would it be a good idea to create a class for such a table, calling on a simple 'sample' class to inherit from? That way I would always know that the results being inserted are for the correct sample/experiment etc.
Also, using classes, could I write rules for each attribute so that if the source spreadsheet is for some reason incorrect, the row will not be inserted into the database?
E.g. sample_id is in the format A123/16, so the class would check that the first character is 'A' and that sample_id[-3] is always '/'. I know I could write these checks as functions, but I feel it would take up a lot of space and time writing so many if statements; if the rules are stored once in a class, that would be a lot better.
Has anybody done anything similar using classes to pass through their variables to test that they are correct before it gets to the insert stage and an error is created?
I am new to python classes and understand the basics, still trying to get to grips with them so a point in the right direction would be great - as would any help on how to go about actually writing the code for a python class that would be used to make a more robust database insertion program.
With 17 tables, you may end up with about 17 classes.
A simpler option is to use web.py's db module:
https://github.com/webpy/webpy/blob/master/web/db.py - you would only need to modify a little code.
Then you can use the web.py API (http://webpy.org/docs/0.3/api#web.db) to finish the job.
Hope it's useful for you.
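For the validation idea raised in the question, here is a minimal sketch of what a class-based check could look like before the insert; the ID pattern, the columns, and the cursor handling are assumptions for illustration, not the asker's real schema:
import re

class Sample(object):
    """Validate a sample before it is inserted (illustrative only)."""
    ID_PATTERN = re.compile(r'^A\d{3}/\d{2}$')  # e.g. 'A123/16' -- assumed format

    def __init__(self, sample_id, screening):
        if not self.ID_PATTERN.match(sample_id):
            raise ValueError("Bad sample_id: %r" % sample_id)
        if screening is None:
            raise ValueError("Missing screening value for %r" % sample_id)
        self.sample_id = sample_id
        self.screening = screening

    def insert(self, cursor):
        # A parameterised query avoids the quoting bugs of string concatenation.
        cursor.execute(
            "INSERT IGNORE INTO sample (sample_id, screening) VALUES (%s, %s)",
            (self.sample_id, self.screening),
        )
The insert loop then becomes a try/except around Sample(...).insert(cursor), so malformed rows are rejected before any SQL runs.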

Running multiple stored procedures using pypyodbc giving incomplete results

I'm running a relatively simple python script that is meant to read a text file that has a series of stored procedures - one per line. The script should run the stored procedure on the first line, move to the second line, run the stored procedure on the second line, etc etc. Running these stored procedures should populate a particular table.
So my problem is that these procedures aren't populating the table with all of the results that they should be. For example, if my file looks like
exec myproc 'data1';
exec myproc 'data2';
Where myproc 'data1' should populate this other table with about ~100 records, and myproc 'data2' should populate this other table with an additional ~50 records. Instead, I end up with about 9 results total - 5 from the first proc, 4 from the second.
I know the procedures work, because if I run the same SQL file (with the procs) through OSQL, I get the correct ~150 records in the other table, so obviously it's something to do with my script.
Here's the code I'm running to do this:
import pypyodbc

conn = pypyodbc.connect(CONN_STR.format("{SQL Server}", server, database, user, password))
conn.autoCommit = True

procsFile = openFile('otherfile.txt', 'r+')

# loop through each proc (line) in the file
for proc in procsFile:
    # run the procedure
    curs = conn.cursor()
    curs.execute(proc)
    # commit results
    conn.commit()
    curs.close();

conn.close();
procsFile.close();
I'm thinking this has something to do with the procedures not committing... or something? Frankly, I don't really understand why only 5 of ~100 records would be committed. Any help or advice would be much appreciated.
There are a couple of things to check. One is whether data1 is meant to be the literal string 'data1' or the value of a variable named data1. If you want the string 'data1', then you will have to add quotes around it, so your string to execute would look like this:
exec_string = 'exec my_proc \'data1\';'
In your case you turn auto-commit ON and also manually commit for the entire connection.
I would comment out the auto-commit line:
#conn.autoCommit = True
And then commit on the cursor instead of calling conn.commit():
curs.commit()
As a one-liner:
conn.cursor().execute('exec myproc \'data1\';').commit()
Also, the semicolons (;) at the end of your Python lines are unnecessary and may be doing something weird in your for loop. (Keep the SQL ones, though.)
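Putting those suggestions together, the loop might look something like the sketch below; it keeps the asker's CONN_STR, server, database, user and password variables and assumes each line of the file is one complete statement:
import pypyodbc

conn = pypyodbc.connect(CONN_STR.format("{SQL Server}", server, database, user, password))

with open('otherfile.txt', 'r') as procs_file:
    for proc in procs_file:
        proc = proc.strip()
        if not proc:
            continue            # skip blank lines
        curs = conn.cursor()
        curs.execute(proc)
        curs.commit()           # commit on the cursor, as suggested above
        curs.close()

conn.close()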

bq.py Not Paging Results

We're working on writing a wrapper for bq.py and are having some problems with result sets larger than 100k rows. It seems like in the past this has worked fine (we had related problems with Google BigQuery Incomplete Query Replies on Odd Attempts). Perhaps I'm not understanding the limits explained on the doc page?
For instance:
#!/bin/bash
for i in `seq 99999 100002`;
do
    bq query -q --nouse_cache --max_rows 99999999 "SELECT id, FROM [publicdata:samples.wikipedia] LIMIT $i" > $i.txt
    j=$(cat $i.txt | wc -l)
    echo "Limit $i Returned $j Rows"
done
Yields (note there are 4 lines of formatting):
Limit 99999 Returned 100003 Rows
Limit 100000 Returned 100004 Rows
Limit 100001 Returned 100004 Rows
Limit 100002 Returned 100004 Rows
In our wrapper, we directly access the API:
while row_count < total_rows:
    data = client.apiclient.tabledata().list(maxResults=total_rows - row_count,
                                             pageToken=page_token,
                                             **table_dict).execute()

    # If there are more results than will fit on a page,
    # you will receive a token for the next page
    page_token = data.get('pageToken', None)

    # How many rows are there across all pages?
    total_rows = min(total_rows, int(data['totalRows']))  # Changed to use get(data[rows],0)

    raw_page = data.get('rows', [])
We would expect to get a token in this case, but none is returned.
Sorry it took me a little while to get back to you.
I was able to identify a bug that exists server-side; you would end up seeing this with the Java client as well as the Python client. We're planning on pushing a fix out this coming week. Your client should start to behave correctly as soon as that happens.
BTW, I'm not sure if you knew this already or not but there's a whole standalone python client that you can use to access the API from python as well. I thought that might be a bit more convenient for you than the client that's distributed as part of bq.py. You'll find a link to it on this page:
https://developers.google.com/bigquery/client-libraries
I can reproduce the behavior you're seeing with the bq command-line. That seems like a bug, I'll see what I can do to fix it.
One thing I did notice about the data you're querying: you're selecting only the id field and capping the number of rows at around 100,000. That produces only about 1 MB of data, so the server would likely not paginate the results. Selecting a larger amount of data will force the server to paginate, since it will not be able to return all the results in a single response. If you did a SELECT * on 100,000 rows of samples.wikipedia you'd be getting back roughly 50 MB, which should be enough to start to see some pagination happen.
Are you seeing too few results come back from the python client as well or were you surprised that no page_token was returned for your samples.wikipedia query?
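For reference, a paging loop that follows pageToken until it runs out might look like the sketch below; it reuses the client and table_dict names from the question's snippet and assumes they are already set up:
# Keep requesting pages until no pageToken comes back, accumulating rows as we go.
rows = []
page_token = None
while True:
    data = client.apiclient.tabledata().list(maxResults=100000,
                                             pageToken=page_token,
                                             **table_dict).execute()
    rows.extend(data.get('rows', []))
    page_token = data.get('pageToken')
    if not page_token:   # no token means this was the last page
        break
print("Fetched %d rows" % len(rows))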
