I have a 250GB SQLite database file on an SSD and need to search one of its tables for a specific value.
I wrote a script in Python to perform the lookup; this SQL statement is similar to the one I use:
SELECT table FROM database WHERE table like X'003485FAd480'.
I am comparing hex values stored in a table against a given hex value. I am running this from the Anaconda command prompt and am not sure it is the best route.
My question: are there any recommendations or tools that could help speed up the lookup?
Thanks!
LIKE converts both operands into strings, so it might not work correctly if a value contains zero bytes or bytes that are not valid in the UTF-8 encoding.
To compare for equality, use =:
SELECT ... FROM MyTable WHERE MyColumn = x'003485FAD480';
This search can be sped up with an index on the lookup column; if you do not already have a primary key or unique constraint on this column, you can create an index manually:
CREATE INDEX MyLittleIndex ON MyTable(MyColumn);
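In Python's built-in sqlite3 module, the same equality lookup can be written with a bound BLOB parameter. A minimal sketch, assuming the placeholder names MyTable/MyColumn and an assumed file name huge.db:

import sqlite3

# Sketch only: the table, column, and file names are placeholders.
conn = sqlite3.connect('huge.db')
conn.execute('CREATE INDEX IF NOT EXISTS MyLittleIndex ON MyTable(MyColumn)')

needle = bytes.fromhex('003485FAD480')  # the blob literal as raw bytes
row = conn.execute('SELECT MyColumn FROM MyTable WHERE MyColumn = ?',
                   (needle,)).fetchone()
print('found' if row else 'not found')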
I don't know if this is what you're looking for; you mentioned using Python. If you're searching for different values that live in Python, have you thought about writing two functions: one to search the database, and one to compare those results and do something with them?
import pyodbc

def queryFunction():
    cnxn = pyodbc.connect('DRIVER={SQLite3 ODBC Driver};SERVER=localhost;DATABASE=test.db;Trusted_connection=yes')  # for production use only
    cursor = cnxn.cursor()
    cursor.execute("SELECT table FROM database")
    for row in cursor.fetchall():
        yield str(row.table)

def compareFunction(row):
    search = '003485FAd480'
    if row == search:
        print('Yes')
    else:
        print('No')
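Wiring the two together is then a short loop (hypothetical usage; the generator yields one value per row):

for row in queryFunction():
    compareFunction(row)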
I am using Python and I would like to have a list of IDs stored on disk, preserving some of the functionality of a set (that is, efficiently checking whether an ID is contained). To this end, I think using the SQLite library is a wise decision (at least, that is my impression after googling and stacking a bit). However, I am a beginner in the SQL world and could not find any post explaining what I am looking for.
How can I store IDs (strings) in SQLite and later check if a specific ID appears or not in the database?
import sqlite3
id1 = 'abc'
id2 = 'def'
# Initialization of the database
define_database()
# Update the database by inserting a new ID
insert_in_database(id1)
insert_in_database(id2)
# Check if the specified ID is contained in the database (returns a Boolean)
check_if_exists_in_database(id1)
PS: I am aware of the sqlite3 library.
Thanks!
Just use a table with a single column. This column must be indexed (explicitly, or by making it the primary key) for lookups over large data to be efficient:
import sqlite3

db = sqlite3.connect('...filename...')

def define_database():
    db.execute('CREATE TABLE IF NOT EXISTS MyStuff(id PRIMARY KEY)')
(Use a WITHOUT ROWID table if your Python version is recent enough to have a modern version of the SQLite library.)
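A sketch of that variant (WITHOUT ROWID needs SQLite 3.8.2 or newer; the running library version is available as sqlite3.sqlite_version):

def define_database():
    # The ids live directly in the primary-key B-tree;
    # no separate rowid lookup is needed.
    db.execute('CREATE TABLE IF NOT EXISTS MyStuff(id TEXT PRIMARY KEY) WITHOUT ROWID')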
Inserting is done with standard SQL:
def insert_in_database(value):
    db.execute('INSERT INTO MyStuff(id) VALUES(?)', [value])
To check whether a value exists, just try to read its row:
def check_if_exists_in_database(value):
    for row in db.execute('SELECT 1 FROM MyStuff WHERE id = ?', [value]):
        return True   # at least one matching row
    return False      # the query returned no rows
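Hypothetical usage of the three helpers, mirroring the pseudocode in the question (remember to commit if the data should survive the process):

define_database()
insert_in_database('abc')
insert_in_database('def')
db.commit()                                # persist the inserts
print(check_if_exists_in_database('abc'))  # True
print(check_if_exists_in_database('xyz'))  # False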
Hello StackEx community.
I am implementing a relational database using SQLite interfaced with Python. My table consists of 5 attributes with around a million tuples.
To avoid a large number of database queries, I wish to execute a single query that updates two attributes of multiple tuples. The updated values depend on each tuple's primary key value and so are different for each tuple.
I am trying something like the following in Python 2.7:
stmt= 'UPDATE Users SET Userid (?,?), Neighbours (?,?) WHERE Username IN (?,?)'
cursor.execute(stmt, [(_id1, _Ngbr1, _name1), (_id2, _Ngbr2, _name2)])
In other words, I am trying to update the rows that have Primary Keys _name1 and _name2 by substituting the Neighbours and Userid columns with corresponding values. The execution of the two statements returns the following error:
OperationalError: near "(": syntax error
I am reluctant to use executemany() because I want to reduce the number of trips across the database.
I have been struggling with this issue for a couple of hours now but couldn't figure out either the error or an alternative on the web. Please help.
Thanks in advance.
If the column that is used to look up the rows to update is properly indexed, then executing multiple UPDATE statements is likely to be more efficient than a single statement, because in the latter case the database would probably need to scan all rows.
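For comparison, the multiple-statement route is a single call with executemany (a sketch reusing the question's variable names):

cursor.executemany(
    'UPDATE Users SET Userid = ?, Neighbours = ? WHERE Username = ?',
    [(_id1, _Ngbr1, _name1), (_id2, _Ngbr2, _name2)])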
Anyway, if you really want to do this, you can use CASE expressions (and explicitly numbered parameters, to avoid duplicates):
UPDATE Users
SET Userid = CASE Username
             WHEN ?5 THEN ?1
             WHEN ?6 THEN ?2
             END,
    Neighbours = CASE Username
                 WHEN ?5 THEN ?3
                 WHEN ?6 THEN ?4
                 END
WHERE Username IN (?5, ?6);
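Run from Python's sqlite3 module, the six numbered parameters bind by position, so the values tuple is ordered ?1 through ?6. A sketch using the question's placeholder variables:

stmt = '''UPDATE Users
          SET Userid = CASE Username WHEN ?5 THEN ?1 WHEN ?6 THEN ?2 END,
              Neighbours = CASE Username WHEN ?5 THEN ?3 WHEN ?6 THEN ?4 END
          WHERE Username IN (?5, ?6)'''
# (?1, ?2, ?3, ?4, ?5, ?6) = (_id1, _id2, _Ngbr1, _Ngbr2, _name1, _name2)
cursor.execute(stmt, (_id1, _id2, _Ngbr1, _Ngbr2, _name1, _name2))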
I'm trying to return a hard coded value in my SQL query, but when running the query using pyodbc, random records return '\x0e' instead of the hard coded value (in this case '16'). If I run the query on the server (MS SQL Server 2008), the query returns all the correct results and values.
The beginning of the query looks like this:
My SQL Code:
Select '"16","' + S.ShipNum + '","'
My python code:
cursor.execute("""Select '\"16\",\"' + SS.ShipNum + '\",\"'
Is there another way to guarantee a value is returned from a query?
\016 is the octal representation of \x0e.
So I would think it has more to do with the way you are escaping your double quotes. In your Python you are actually getting \16 and not "16" as you desire.
Maybe you should try a prepared statement:
ps = db.prepare("SELECT 16")
ps()
returns:
[(16,)]
Additional examples can be seen here:
[http://python.projects.pgfoundry.org/docs/0.8/driver.html#parameterized-statements]
You can see all of the ascii and other character sets here
[http://donsnotes.com/tech/charsets/ascii.html]
It looks like you're trying to create a comma-delimited, quoted string representation of the row. Don't try to do this in the database query; string formatting isn't one of T-SQL's strengths.
Pass the static value using a parameter, then join the row values. Using sys.databases for the example:
params = ("Some value",)
sql = "SELECT ?, name, user_access_desc FROM sys.databases"
for row in cursor.execute(sql):
print(','.join('"{0}"'.format(column) for column in row))
Hi, I'm trying to check whether I already have an entry in a MySQL table through Python. The key I use to check whether there is already an entry is called PID, and it uses the ascii_bin collation. My problem is that when I try something like...
q = """select * from table_name where PID = '%s'"""%("Hello")
db = MySQLdb.connect("xxxx", "xxxx", "xxxx","temp",cursorclass=MySQLdb.cursors.DictCursor)
cursor = db.cursor()
res = cursor.execute(q)
rowOne = cursor.fetchone() #fetches the row where pid = "hello"
rowOne ends up being the row where pid = hello. However, when I use SQLyog and execute the query, it properly prints out the row where pid = Hello (that is, it behaves as a case-sensitive query). I'm looking for a way to get the MySQLdb module to do the same, as a lot of my code already uses this module.
By default string comparison operations in MySQL are not case sensitive (or accent sensitive): http://dev.mysql.com/doc/refman/5.0/en/case-sensitivity.html
You have two easy options, though. First, use COLLATE:
For example, if you are comparing a column and a string that both have the latin1 character set, you can use the COLLATE operator to cause either operand to have the latin1_general_cs or latin1_bin collation:
or set the column to be case sensitive:
If you want a column always to be treated in case-sensitive fashion, declare it with a case sensitive or binary collation. See Section 13.1.10, “CREATE TABLE Syntax”.
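A sketch of the COLLATE option, assuming the column's character set is latin1 as in the quoted documentation (for an ascii column like the one in the question, ascii_bin would be the matching binary collation):

# Sketch: force a case-sensitive comparison, with the value passed
# as a parameter rather than interpolated into the query string.
q = "SELECT * FROM table_name WHERE PID = %s COLLATE latin1_bin"
cursor.execute(q, ("Hello",))
rowOne = cursor.fetchone()  # now only the exact-case match is returned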
I am having trouble finding out if I can even do this. Basically, I have a csv file that looks like the following:
1111,804442232,1
1112,312908721,1
1113,A*2434,1
1114,A*512343128760987,1
1115,3512748,1
1116,1111,1
1117,1234,1
This is imported into an SQLite database in memory for manipulation. I will be importing multiple files into this database after some manipulation. SQLite lets me keep constraints on the tables and receive errors where needed, without writing extra functions to check each constraint as I would if I used arrays in Python. The first thing I want to do is prepend field2 wherever the field2 string matches an entry in field1.
For example, in the above data, field2 of entry 6 matches field1 of entry 1. In this case I would like to prepend field2 of entry 6 with '555'.
If this is not possible, I believe I could make do with a regex and just apply it to every row with 4 digits in field2... though I have yet to successfully get REGEXP working with Python/SQLite; it always throws me an error.
I am working within Python, using sqlite3 to connect to and manipulate my SQLite database.
EDIT: I am looking for a method to manipulate the resultant tables which reside in a sqlite database rather than manipulating just the csv data. The data above is just a simple representation of what is contained in the files I am working with. Would it be better to work with arrays containing the data from the csv files? These files have 10,000+ entries and about 20-30 columns.
If you must do it in SQLite, how about this:
First, get the column names of the table by running the following and parsing the result:

import sqlite3

def get_columns(table_name, cursor):
    cursor.execute('PRAGMA table_info(%s)' % table_name)
    return [row[1] for row in cursor]

conn = sqlite3.connect('test.db')
columns = get_columns('test_table', conn.cursor())
For each of those columns, run the following update, which does the prepending:

def prepend(table, column, reference, prefix, cursor):
    # Prefix every value in `column` that also appears in `reference`.
    query = '''
        UPDATE %s
        SET %s = '%s' || %s
        WHERE %s IN (SELECT %s FROM %s)
    ''' % (table, column, prefix, column, column, reference, table)
    cursor.execute(query)

reference = 'field1'
for column in columns:
    if column != reference:
        prepend('test_table', column, reference, '555', conn.cursor())
Note that this is expensive: O(n^2) for each column you want to do it for.
As per your edit and Nathan's answer, it might be better to simply work with Python's built-in data structures. You can always insert the data into SQLite afterwards.
10,000 entries is not really much, so it might not matter in the end. It all depends on your reason for requiring it to be done in SQLite (which we don't have much visibility of).
There is no need to use regular expressions to do this; just throw the contents of the first column into a set, then iterate through the rows and update the second field.
first_col_values = set(row[0] for row in rows)
for row in rows:
    if row[1] in first_col_values:
        row[1] = '555' + row[1]
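A hypothetical end-to-end sketch (the file name and column count are assumptions; csv.reader yields lists, so the rows can be modified in place):

import csv
import sqlite3

with open('data.csv', newline='') as f:
    rows = list(csv.reader(f))

# Prefix field2 wherever its value also appears in field1.
first_col_values = set(row[0] for row in rows)
for row in rows:
    if row[1] in first_col_values:
        row[1] = '555' + row[1]

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE test_table(field1, field2, field3)')
conn.executemany('INSERT INTO test_table VALUES (?, ?, ?)', rows)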
So... I found the answer to my own question after a ridiculous amount of searching and trial and error. My unfamiliarity with SQL had me stumped, and I was trying all kinds of crazy things. In the end, this was the simple kind of solution I was looking for:
prefix="555"
cur.execute("UPDATE table SET field2 = %s || field2 WHERE field2 IN (SELECT field1 FROM table)"% (prefix))
I kept the small amount of Python in there, but what I was looking for was the SQL statement. Not sure why nobody else came up with something that simple =/. Unsatisfied with the answers so far, I had been searching far and wide for this simple line >_<.
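For completeness, the same statement with a bound parameter instead of string formatting (a sketch; my_table stands in for the real table name, since table itself is a reserved word in SQLite):

cur.execute("UPDATE my_table SET field2 = ? || field2 "
            "WHERE field2 IN (SELECT field1 FROM my_table)",
            (prefix,))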