The problem
I just created my first SQLite database and don't know anything about SQLite. It took over 2 days to make, and during the process I was able to run
connectioncursor.execute("""SELECT * FROM tracks""")
and see the data appear in IDLE.
Now that I have finished, I wanted to look at the data, but I am not getting any results back. I don't know enough about SQLite to troubleshoot this: I don't know what to check, or how to check it. When I look online, I only see more examples of how to use SELECT, but that won't help if I can't see any of the data I spent days trying to insert.
Here is the code:
import sqlite3
connecttosql = sqlite3.connect('musicdatabase.db')
connectioncursor = connecttosql.cursor()
connectioncursor.execute("""SELECT * FROM tracks""")
rows = connectioncursor.fetchone()
connecttosql.commit()
connecttosql.close()
Result
============ RESTART: \\VGMSTATION\testing scripts\sqlite test.py ============
>>>
#in other words, nothing appears
The musicdatabase.db file is in the same folder and it's 123 MB in size. I feel a little desperate, as I just want to see the data I have spent days trying to insert.
Is there any program, or any code, or any way that I can see my data?
Thank you for your time (also if I am missing something please let me know)
To answer-ify my comment:
You're not printing rows (in an interactive >>> shell, it would get printed implicitly), so nothing appears.
As a shortcut, you can just call execute on the connection:
import sqlite3
conn = sqlite3.connect('musicdatabase.db')
cur = conn.execute("SELECT * FROM tracks LIMIT 1")
row = cur.fetchone()
print(row)
Since you're not modifying data, no need to .commit() (and .close() will be called automatically for you on program end anyway).
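If you want to sanity-check that the data really made it in, a minimal sketch along these lines (assuming the table is indeed called tracks) prints the row count and the first few rows:
import sqlite3

conn = sqlite3.connect('musicdatabase.db')

# how many rows are in the table?
(count,) = conn.execute("SELECT COUNT(*) FROM tracks").fetchone()
print(count, "rows in tracks")

# peek at the first five rows
for row in conn.execute("SELECT * FROM tracks LIMIT 5"):
    print(row)

conn.close()
Outside of Python, the sqlite3 command-line shell or a GUI tool such as DB Browser for SQLite can also open musicdatabase.db directly if you just want to browse the data.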
Related
I am trying to use the sqlite3 module in python to do a database lookup in a table that only has one column. The column contains phone numbers in the format of:
9545551212
???5551212
Here's what I am running in python:
import sqlite3
cti = '/home/user/test/cti.db'
conn = sqlite3.connect(cti)
c = conn.cursor()
c.execute('select * from ani_table_1 where number = 9545551212')
<sqlite3.Cursor object at 0x7f6b435316c0>
When I run that exact same select statement in sqlite3 I get the expected result:
sqlite> select * from ani_table_1 where number = 9545551212;
9545551212
I'm using python 3.6.5 and sqlite 3.7.17
What have I got wrong in my code? Any help is much appreciated.
You didn't iterate over the result. The sqlite3 command-line tool is not the same thing as Python code; the command-line tool always prints the results, because that is its job as an interactive client.
When accessing a database from code, however, the library can't assume you want to print all the rows out to the end user (which could flood you with a large result set); you may have wanted to do something else with the data instead.
So you need to loop over the cursor and print each row:
c.execute('select * from ani_table_1 where number = 9545551212')
for row in c:
    print(*row, sep='\t')
You may want to familiarise yourself with how the Python database API standard works; search around for a good tutorial. At a glance, this specific tutorial looks like it covers the most important basics.
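For instance, a minimal sketch of the usual pattern, using a parameter placeholder rather than pasting the number into the SQL, and fetching the rows explicitly (table and column names taken from the question):
import sqlite3

conn = sqlite3.connect('/home/user/test/cti.db')
c = conn.cursor()

# '?' is sqlite3's placeholder; the value is passed separately and escaped for you
c.execute('select * from ani_table_1 where number = ?', (9545551212,))

for row in c.fetchall():   # or simply iterate over c as shown above
    print(*row, sep='\t')

conn.close()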
I'm new to both MongoDB and pyMongo, and am having some performance issues regarding cursors.
TL;DR: Any operation I try to perform using a cursor takes about a second.
Long version
I have a small database, which I bulkloaded. Each entry has 3 fields:
dom: domain name (unique)
date: date, YYYYMMDD
flag: string
I've loaded about 1.9 million entries, without incident, and quite quickly.
I created a hash index on the dom field.
Now, I want to grab certain records by the domain field, and update them, using a Python program.
That's where the problem lies.
I'm using the latest MongoDB, and the latest pyMongo.
A stripped-down program:
import pymongo
from pymongo import MongoClient

client = MongoClient()                       # connection details omitted in the original
db = client.myindexname
posts = db.posts

print list(db.profiles.index_information())  # shows hash index is present

for k in newdomainlist.keys():               # iterate list of domains to check
    ret = posts.find({"dom": k})             # this runs fine, and quickly
    # 'ret' is a cursor
    print ret                                # this runs quickly
    # Here's the problem
    print ret.count()                        # this takes about a second. why?
If I just 'print ret', the speed is fine. However, if I try to reference anything in the cursor, the speed drops to the floor - I can do about 1 operation per second.
In this case, I'm just trying to see if ret.count() returns '0' (we don't have this domain) or '1' (we have it already).
I've tried adding a batch_size(10000) to the find, without it helping.
I DO have the Python C extensions loaded.
What the heck am I doing wrong?
thanks
It turned out that I'd created my hashed index on the wrong collection ('profiles' rather than 'posts'). Chalk it up to MongoDB inexperience. We can close this one now, or delete it entirely.
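For anyone landing here later: a minimal sketch of creating the hashed index on the collection that is actually queried (the dom field of posts), assuming the client/database names from the question:
import pymongo
from pymongo import MongoClient

client = MongoClient()                      # assuming a default local MongoDB
db = client.myindexname
posts = db.posts

# hashed index on the field used in the find() filter
posts.create_index([("dom", pymongo.HASHED)])

print(posts.index_information())            # should now list dom_hashed
For a simple equality lookup like {"dom": k}, a regular ascending index on dom would work just as well.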
My set-up:
1. MySQL server.
2. Host running a Python script.
(1) and (2) are different machines on the network.
The python script generates data which must be stored in a MySQL-database.
I use this (example-)code to achieve that:
import MySQLdb as mdb

def sqldata(date, result):
    con = mdb.connect('sql.lan', 'demouser', 'demo', 'demo')
    with con:
        cur = con.cursor()
        cur.execute('INSERT INTO tabel(titel, nummer) VALUES(%s, %s)', (date, result))
The script generates one data point approximately every minute, which means a new connection is opened and closed every minute. I'm wondering if it would be a better idea to open the connection at the start of the script and only close it when the script terminates, effectively leaving the connection open indefinitely.
This then obviously begs the question how to handle/recover when the SQL-server "leaves" the network (e.g. due to a reboot) for a while.
While typing my question, this question appeared in the "Similar Questions" section. It is, however, from 2008 and possibly outdated, and the 4 answers it received seem to contradict each other.
What are the current insights in this matter?
Well, the referenced answer has a point, but maybe doesn't answer all your questions. I can't provide a full running Python script for you here, but let me explain how I would go about it:
Rule 1: Generally, most MySQL functions return values that you should always check, so that you can react to unwanted behavior.
Rule 2: Open a connection at the beginning of your script and use this one and only connection throughout your script.
Obviously you could check whether there is an existing connection in your sqldata function, and if not, assign a new one to the global con object.
if not con:
    con = mdb.connect('sql.lan', 'demouser', 'demo', 'demo')
And if there is a connection already, you could check its "up status" by performing a simple query with a fixed, expected result, to see whether the SQL server is still running.
if con:
    cur = con.cursor()
    cur.execute('SELECT COUNT(*) FROM tabel')
    if cur.fetchone() is not None:
        ....
Basically you could even skip that query, because if you don't get a cursor back, and you check that before using it, then you already know whether the server is alive or not.
So CHECK, CHECK and CHECK. You should check everything you get back from a function to have good error handling. Using a connection or a cursor without checking it first can leave you talking to a None object and crash your script.
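A minimal sketch of that check-and-reconnect idea (assuming MySQLdb imported as mdb, and that a failed ping simply means we should reconnect):
import MySQLdb as mdb

con = None

def get_connection():
    # return a live connection, reconnecting if the server went away
    global con
    if con is not None:
        try:
            con.ping()      # cheap round trip to see if the server is still there
            return con
        except mdb.OperationalError:
            con = None      # connection is dead, fall through and reconnect
    con = mdb.connect('sql.lan', 'demouser', 'demo', 'demo')
    return con

def sqldata(date, result):
    c = get_connection()
    cur = c.cursor()
    cur.execute('INSERT INTO tabel(titel, nummer) VALUES(%s, %s)', (date, result))
    c.commit()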
And the last BIG HINT I can give you is to use multi-row inserts. You can insert hundreds of rows with a single call, since the values all end up comma-separated in one INSERT statement:
# consider result would be filled like this
result = [("First Song", 1), ("Second Song", 2), ("Third Song", 3)]
# then this will insert 3 rows with one call
returned = cur.executemany('INSERT INTO tabel (titel, nummer) VALUES (%s, %s)', result)
# effectively the server executes
# INSERT INTO tabel (titel, nummer) VALUES ("First Song",1),("Second Song",2),("Third Song",3)
# and now you can check returned (the number of affected rows) for errors
if returned:
    ....
I am using Python's built-in sqlite3 module to access a database. My query executes a join between a table of 150,000 entries and a table of 40,000 entries, and the result contains about 150,000 entries again. If I execute the query in the SQLite Manager it takes a few seconds, but if I execute the same query from Python, it has not finished after a minute. Here is the code I use:
from collections import defaultdict

cursor = self._connection.cursor()
annotationList = cursor.execute("SELECT PrimaryId, GOId " +
                                "FROM Proteins, Annotations " +
                                "WHERE Proteins.Id = Annotations.ProteinId")
annotations = defaultdict(list)
for protein, goterm in annotationList:
    annotations[protein].append(goterm)
I did the fetchall just to measure the execution time. Does anyone have an explanation for the huge difference in performance? I am using Python 2.6.1 on Mac OS X 10.6.4.
I implemented the join manually, and this works much faster. The code looks like this:
cursor = self._connection.cursor()
proteinList = cursor.execute("SELECT Id, PrimaryId FROM Proteins ").fetchall()
annotationList = cursor.execute("SELECT ProteinId, GOId FROM Annotations").fetchall()
proteins = dict(proteinList)
annotations = defaultdict(list)
for protein, goterm in annotationList:
    annotations[proteins[protein]].append(goterm)
So when I fetch the tables myself and then do the join in Python, it takes about 2 seconds. The code above takes forever. Am I missing something here?
I tried the same with apsw, and it works just fine (the code does not need to be changed at all), and the performance is great. I'm still wondering why this is so slow with the sqlite3 module.
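For reference, the apsw version looks almost identical; something like this (the database file name here is just a placeholder):
import apsw
from collections import defaultdict

connection = apsw.Connection('proteins.db')   # placeholder file name
cursor = connection.cursor()

annotations = defaultdict(list)
# apsw's execute returns the cursor itself, which can be iterated directly
for protein, goterm in cursor.execute(
        "SELECT PrimaryId, GOId FROM Proteins, Annotations "
        "WHERE Proteins.Id = Annotations.ProteinId"):
    annotations[protein].append(goterm)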
There is a discussion about it here: http://www.mail-archive.com/python-list#python.org/msg253067.html
It seems that there is a performance bottleneck in the sqlite3 module. There is some advice on how to make your queries faster:
make sure that you do have indices on the join columns
use pysqlite
You haven't posted the schema of the tables in question, but I think there might be a problem with indexes, specifically not having an index on Proteins.Id or Annotations.ProteinId (or both).
Create the SQLite indexes like this
CREATE INDEX IF NOT EXISTS index_Proteins_Id ON Proteins (Id)
CREATE INDEX IF NOT EXISTS index_Annotations_ProteinId ON Annotations (ProteinId)
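If you prefer to create them from Python, a minimal sketch using the connection object from the question's code:
cursor = self._connection.cursor()
cursor.execute("CREATE INDEX IF NOT EXISTS index_Proteins_Id ON Proteins (Id)")
cursor.execute("CREATE INDEX IF NOT EXISTS index_Annotations_ProteinId ON Annotations (ProteinId)")
self._connection.commit()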
I wanted to update this because I am noticing the same issue and it is now 2022...
In my own application I am using python3 and sqlite3 to do some data wrangling on large databases (>100000 rows * >200 columns). In particular, I have noticed that my 3-table inner join clocks in at around 12 minutes of run time in Python, whereas running the same join query in sqlite3 from the CLI runs in ~100 seconds. All the join predicates are properly indexed, and EXPLAIN QUERY PLAN indicates that the added time is most likely because I am using SELECT *, which is a necessary evil in my particular context.
The performance discrepancy caused me to pull my hair out all night until I realized there is a quick fix from here: Running a Sqlite3 Script from Command Line. This is definitely a workaround at best, but I have research due so this is my fix.
Write out the query to an .sql file (I am using f-strings to pass variables in so I used an example with {foo} here)
fi = open("filename.sql", "w")
fi.write(f"CREATE TABLE {Foo} AS SELECT * FROM Table1 INNER JOIN Table2 ON Table2.KeyColumn = Table1.KeyColumn INNER JOIN Table3 ON Table3.KeyColumn = Table1.KeyColumn;")
fi.close()
Run os.system from inside python and send the .sql file to sqlite3
os.system(f"sqlite3 {database} < filename.sql")
Make sure you close any open connection before running this so you don't end up locked out, and note that you'll have to re-instantiate any connection objects afterward if you're going back to working with sqlite within Python.
Hope this helps and if anyone has figured the source of this out, please link to it!
I've got a very large SQLite table with over 500,000 rows and about 15 columns (mostly floats). I want to transfer data from the SQLite DB to a Django app (which could be backed by many RDBMSs, but Postgres in my case). Everything works OK, but as the iteration continues, memory usage for the Python process grows by 2-3 MB a second. I've tried using 'del' to delete the EVEMapDenormalize and row objects at the end of each iteration, but the bloat continues. Here's an excerpt, any ideas?
class Importer_mapDenormalize(SQLImporter):
    def run_importer(self, conn):
        c = conn.cursor()
        for row in c.execute('select * from mapDenormalize'):
            mapdenorm, created = EVEMapDenormalize.objects.get_or_create(id=row['itemID'])
            mapdenorm.x = row['x']
            mapdenorm.y = row['y']
            mapdenorm.z = row['z']
            if row['typeID']:
                mapdenorm.type = EVEInventoryType.objects.get(id=row['typeID'])
            if row['groupID']:
                mapdenorm.group = EVEInventoryGroup.objects.get(id=row['groupID'])
            if row['solarSystemID']:
                mapdenorm.solar_system = EVESolarSystem.objects.get(id=row['solarSystemID'])
            if row['constellationID']:
                mapdenorm.constellation = EVEConstellation.objects.get(id=row['constellationID'])
            if row['regionID']:
                mapdenorm.region = EVERegion.objects.get(id=row['regionID'])
            mapdenorm.save()
        c.close()
I'm not at all interested in wrapping this SQLite DB with the Django ORM. I'd just really like to figure out how to get the data transferred without sucking all of my RAM.
Silly me, this was addressed in the Django FAQ.
Needed to clear the DB query cache while in DEBUG mode.
from django import db
db.reset_queries()
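In the importer above, that would look something like this (clearing the query log every 1000 rows is an arbitrary choice, not a magic number):
from django import db

for i, row in enumerate(c.execute('select * from mapDenormalize')):
    # ... get_or_create / save as before ...
    if i % 1000 == 0:
        db.reset_queries()   # drop Django's debug query log so it can't grow without bound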
I think a select * from mapDenormalize and loading the whole result into memory will always be a bad idea. My advice is to split the script into chunks: use LIMIT to get the data in portions.
Get the first portion, work with it, close the cursor, and then get the next portion.
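A minimal sketch of that chunked approach (purely illustrative; the chunk size is arbitrary and the per-row processing is left as in the importer above):
CHUNK = 10000
offset = 0
while True:
    rows = c.execute('select * from mapDenormalize LIMIT ? OFFSET ?',
                     (CHUNK, offset)).fetchall()
    if not rows:
        break
    for row in rows:
        pass   # process the row as before
    offset += CHUNK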