Storing Python Lists in SQL Database

I have a SQL database that I store python lists in. Currently I convert the list to a string and then insert it into the database (using sqlite3) i.e.
foo = [1,2,3]
foo = str(foo)
#Establish connection with database code here and get cursor 'cur'
cur.execute("INSERT INTO Table VALUES(?, ?)", (uniqueKey, foo,))
It seems strange to convert my list to a string first; is there a better way to do this?

Replace your (key, listdata) table with (key, index, listitem). The unique key for the table becomes (key, index) instead of just key, and you'll want to ensure as a consistency condition that the set of indexes in the table for any given key is contiguous starting from 0.
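For example, a minimal sqlite3 sketch of that layout (table and column names are illustrative; idx is used because INDEX is a reserved word in SQL):
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('''CREATE TABLE list_items (
                   key      INTEGER,
                   idx      INTEGER,
                   listitem INTEGER,
                   PRIMARY KEY (key, idx))''')

foo = [1, 2, 3]
uniqueKey = 42
cur.executemany('INSERT INTO list_items VALUES (?, ?, ?)',
                [(uniqueKey, i, item) for i, item in enumerate(foo)])

# reassemble the list, relying on the indexes being contiguous from 0
cur.execute('SELECT listitem FROM list_items WHERE key = ? ORDER BY idx',
            (uniqueKey,))
foo_again = [row[0] for row in cur.fetchall()]  # [1, 2, 3]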
You may or may not also need to distinguish between a key whose list is empty and a key that doesn't exist at all. One way is to have two tables (one of lists, and one of their elements), so that an empty but existing list is naturally represented as a row in the lists table with no corresponding rows in the elements table. Another way is just to fudge it and say that a row with index=null implies that the list for that key is empty.
Note that this is worthwhile if (and probably only if) you want to act on the elements of the list using SQL (for example writing a query to pull the last element of every list in the table). If you don't need to do that, then it's not completely unreasonable to treat your lists as opaque data in the DB. You're just losing the ability for the DB to "understand" it.
The remaining question then is how best to serialize/deserialize the list. str/eval does the job, but is a little worrying. You might consider json.dumps / json.loads, which for a list of integers is the same string format but with more safety restrictions in the parser. Or you could use a more compact binary representation if space is an issue.
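A round trip with json might look like this (using a hypothetical table lists with columns key and data):
import json

foo = [1, 2, 3]
cur.execute('INSERT INTO lists VALUES (?, ?)', (uniqueKey, json.dumps(foo)))

cur.execute('SELECT data FROM lists WHERE key = ?', (uniqueKey,))
foo = json.loads(cur.fetchone()[0])  # back to [1, 2, 3]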

Two ways:
1. Normalize the tables: set up a separate table for the list values, so you get something like TABLE list(id) and TABLE list_values(list_id, value).
2. Serialize the list and put it in a single column, e.g. as JSON or XML (this is not considered good practice in SQL).

Related

make function memory efficient or store data somewhere else to avoid memory error

I currently have a for loop which is finding and storing combinations in a list. The possible combinations are very large and I need to be able to access the combos.
can I use an empty relational db like SQLite to store my list on a disk instead of using list = []?
Essentially what I am asking is whether there is a db equivalent to list = [] that I can use to store the combinations generated via my script?
Edit:
SQLite is not a must. Any DB will work if it can accomplish my task.
Here is the exact function that is causing me so much trouble. Maybe there is a better solution in general.
Idea - Could I insert the list into the database on each loop and then empty the list? Basically, create a list on each loop, send that list to PostgreSQL and then empty the list in Python to keep the RAM usage down?
from itertools import combinations

def permute(set1, set2):
    set1_combos = list(combinations(set1, 2))
    set2_combos = list(combinations(set2, 8))
    full_sets = []
    for i in set1_combos:
        for j in set2_combos:
            full_sets.append(i + j)
    return full_sets
Ok, a few ideas
My first thought was: why do you explode the combinations objects into lists? But of course, since we have two nested for loops, the iterator in the inner loop would be consumed on the first iteration of the outer loop if it were not converted to a list.
However, you don't need to explode both objects: you can explode just the smaller one. For instance, if both sets have 50 elements, the combinations of 2 elements number 1225, with a memsize (if the items are integers) of about 120 bytes each, i.e. 147KB, while the combinations of 8 elements number 5.36e+08, with a memsize of about 336 bytes each, i.e. 180GB. So the first thing is: keep the larger combo set as a combinations object and iterate over it in the outer loop. By the way, this will also be much faster.
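A minimal sketch of that reordering (also turned into a generator, so the full result is never held in memory; note the results come out in a different order than with the original nested loops):
from itertools import combinations

def permute(set1, set2):
    set1_combos = list(combinations(set1, 2))  # the smaller set: explode it
    for j in combinations(set2, 8):            # the larger set: stays lazy
        for i in set1_combos:
            yield i + j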
Now the database part. I assume a relational DBMS, be it SQLite or anything.
You want to create a table with a single column defined. Each row of your table will contain one final combination. Instead of appending each combination to a list, you will insert it in the table.
Now the question is, how do you need to access the data you created? Do you just need to iterate over the final combos sequentially, or do you need to query them, for instance finding all the combos which contain one specific value?
In the latter case, you'll want to define your column as the Primary Key, so your queries will be efficient; otherwise, you will save space on disk by using an auto-incrementing integer as the PK (SQLite will create one for you if you don't explicitly define a PK, as will a few other DBMSs).
One final note: the insert phase may be painfully slow if you don't take some specific measures: check this very interesting SO post for details. In short, with a few optimizations they were able to go from 85 to over 96K inserts per second.
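A hedged sketch of that insert phase (table and column names are illustrative, the combination tuples are stored via repr as plain text, and set1 and set2 are assumed from the question), applying the two main optimizations from that post, a single transaction and executemany:
import sqlite3
from itertools import combinations

conn = sqlite3.connect('combos.db')
conn.execute('CREATE TABLE IF NOT EXISTS combos (combo TEXT)')

set1_combos = list(combinations(set1, 2))  # the smaller set, exploded
with conn:  # one transaction around the whole insert phase
    for j in combinations(set2, 8):        # the larger set, consumed lazily
        conn.executemany('INSERT INTO combos VALUES (?)',
                         ((repr(i + j),) for i in set1_combos))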
EDIT: iterating over the saved data
Once we have the data in the DB, iterating over them could be as simple as:
mycursor.execute('SELECT * FROM <table> WHERE <conditions>')
for combo in mycursor.fetchall():
    print(combo)  # or do what you need
But if your conditions don't filter away most of the rows you will meet the same memory issue we started with. A first step could be using fetchmany() or even fetchone() instead of fetchall() but still you may have a problem with the size of the query result set.
So you will probably need to read from the DB a chunk of data at a time, exploiting the LIMIT and OFFSET parameters in your SELECT. The final result may be something like:
chunk_size = 1000  # or whatever number fits your case
offset = 0
while True:
    mycursor.execute(f'SELECT * FROM <table> WHERE <conditions> '
                     f'ORDER BY <primarykey> LIMIT {chunk_size} OFFSET {offset}')
    chunk = mycursor.fetchall()
    if not chunk:
        break
    for combo in chunk:
        print(combo)  # or do what you need
    offset += chunk_size
Note that you will usually need the ORDER BY clause to ensure rows are returned as you expect them, and not in a random manner.
I don't believe SQLite has a built in array data type. Other DBMSs, such as PostgreSQL, do.
For SQLite, a good recommendation by another user on this site to obtain an array in SQLite can be found here: How to store array in one column in Sqlite3?
Another solution can be found here: https://sqlite.org/forum/info/99a33767e8a07e59
In either case, yes it is possible to have a DBMS like SQLite store an array (list) type. However, it may require a little setup depending on the DBMS.
Edit: If you're having memory issues, have you thought about storing your data as a string and accessing the portions of the string you need when you need it?

How can I check if a list in a list comprehension inside a dictionary comprehension is empty?

I'm currently using list comprehension inside dictionary comprehension to detect changes between 2 dictionaries with lists as values.
The code looks something like this:
detectedChanges = {table: [field for field in tableDict[table] if field not in fieldBlackList] for table in modifiedTableDict if table not in tableBlackList}
This will create a dictionary where each entry maps a table name to the list of changes associated with it.
The problem I'm getting is that although this code works, the resulting structure detectedChanges is filled with entries that contain only a table name and an empty list (meaning that no changes were detected).
I'm currently doing a second sweep through the dictionary to remove these entries, but I would like to avoid putting them in the dictionary in the first place.
Basically, if I could somehow do a length check over [field for field in tableDict[table]] I could validate it before creating the key:value entry.
Is there way to do this with the current method I'm using?
Although dict comprehensions are cool, they should not be misused. The following code is not much longer and it can be kept on a narrow screen as well:
detectedChanges = {}
for table, fields in modifiedTableDict.iteritems():
    if table not in tableBlackList:
        good_fields = [field for field in fields
                       if field not in fieldBlackList]
        if good_fields:
            detectedChanges[table] = good_fields
Just an addition to eumiro's answer. Please use their answer first as it is more readable. However, if I'm not mistaken comprehensions are in general faster, so there is one use case, but ONLY IF THIS IS A BOTTLENECK IN YOUR CODE. I cannot emphasize that enough.
detectedChanges = {table: [field for field in tableDict[table]
                           if field not in fieldBlackList]
                   for table in modifiedTableDict
                   if table not in tableBlackList
                   if set(tableDict[table]) - set(fieldBlackList)}
Notice how ugly this is. I enjoy doing things like this to get a better understanding of Python, and due to the fact that I have had things like this be bottlenecks before. However, you should always use profiling before trying to solve issues that may not exist.
The addition to your code [...] if set(tableDict[table])-set(fieldBlackList) [...] creates a set of the entries in the current table, and a set of the blacklisted fields and gets the entries that are in the current table but not the blacklist. Empty sets evaluate to False causing the comprehension to ignore that table, the same as if it were in the tableBlackList variable. To make it more explicit, one could compare the result to an empty set or check whether it has a value in it at all.
Also, prefer the following for speed:
detectedChanges = {table: [field for field in fields
                           if field not in fieldBlackList]
                   for table, fields in modifiedTableDict.iteritems()
                   if table not in tableBlackList
                   if set(fields) - set(fieldBlackList)}

python adding unique items to a huge table

I have a very large list of items (10M+) that must be put in a table with three columns (Item_ID,Item_name,Item_count)
The items in the table must be unique.
We are adding the items one by one.
When each new item is added, we need to check:
if it is on the table, update its count +1, and retrieve its ID
if not on the table, insert it in the table, assign it an ID and set its count to 1
I have tried different database implementations (MySQL and sqlite, python shelve, and my own flat file implementation), but the problem is always the same: the more rows there are in the table, the more lookup operations are needed (for a table of 10,000 rows, at least 10,000*10,000 lookups will be needed for the following 10,000 items).
Indexing the database may sound like a good idea for optimization, but my understanding is that indexing is done after the bulk of the data is inserted, not updated with each insertion.
So, how can we add such large number of items into a table the way described?
You can use set() to check whether the item is already in the list.
I'm assuming you have a list of lists (w = [[id, name, count], [id, name, count], ...]):
names = set(e[1] for e in w)    # just the names, for fast membership tests
if item_name in names:          # the item is already in the list
    for entry in w:
        if entry[1] == item_name:
            entry[2] += 1       # count + 1
            item_id = entry[0]  # retrieve its ID
            break
else:
    # to set the ID, take the last item's ID and add 1
    new_id = w[-1][0] + 1 if w else 1
    w.append([new_id, item_name, 1])
If you have the list in your database it's the same; just use queries to get and set the info.
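For the database case, a minimal sqlite3 sketch (table and column names are illustrative): with a UNIQUE constraint on the name, the database maintains an index that is updated on every insert, so each lookup stays fast however many rows the table has:
import sqlite3

conn = sqlite3.connect('items.db')
cur = conn.cursor()
cur.execute('''CREATE TABLE IF NOT EXISTS items (
                   item_id    INTEGER PRIMARY KEY,
                   item_name  TEXT UNIQUE,
                   item_count INTEGER NOT NULL DEFAULT 0)''')

def add_item(name):
    # a no-op if the name already exists, thanks to the UNIQUE constraint
    cur.execute('INSERT OR IGNORE INTO items (item_name) VALUES (?)', (name,))
    cur.execute('UPDATE items SET item_count = item_count + 1 WHERE item_name = ?',
                (name,))
    cur.execute('SELECT item_id FROM items WHERE item_name = ?', (name,))
    return cur.fetchone()[0]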

Issues with arrays in django views

I am fetching data from the database, which is stored in arrays, and I have to match the output of this array against a string. But the array outputs the result in Unicode format ((u'aviesta',)), so it does not match the string.
My code.
# fblike is an array in which the output of the query is stored
for i in fblike:
    if i == "Aviesta":
        like = 1
return render_to_response('showroom.html')
I have also tried to encode this as variable.encode('utf8'), but that only encodes a specific element of the array, such as i[0].encode('utf8'), and I do not know which element of the array has aviesta as a value. Thus I need to encode the whole array, but I don't know how to do that.
Updated:
In views.py I use
cursor = connection.cursor()
cursor.execute("SELECT name FROM django_facebook_facebooklike WHERE user_id = %s", [request.user.id])
rowfb = cursor.fetchall()
return render_to_response('showroom.html', {'rowfb': rowfb})
and print the {{rowfb}} variable in my template. The resulting array is
((u'Mukesh Chapagain',), (u'Ghrix Technologies Private Limited',), (u'FirstLALimo',), (u'Aviesta',), (u'Awkward Group',), (u'FB.Canvas.setDoneLoading',), (u'99recharge',), (u'AllThingsCustomized.com',), (u'celebrity aviesta',), (u'FTC',))
So please suggest me some way so that i can match the elements of array with the given string.
Thanks
Firstly, you should have posted the code as an update to your question, rather than a comment.
Secondly, I have no idea why you are accessing the data via a manual SQL query, rather than using Django's ORM. If you had done it the normal way, you would not be having this problem.
Finally, your problem has nothing to do with encodings. Your data is as follows (reposted for clarity):
((u'Mukesh Chapagain',), (u'Ghrix Technologies Private Limited',), (u'FirstLALimo',), (u'Aviesta',), (u'Awkward Group',), (u'FB.Canvas.setDoneLoading',), (u'99recharge',), (u'AllThingsCustomized.com',), (u'celebrity aviesta',), (u'FTC',))
This is a tuple of tuples. Each row of data is represented by a tuple, and in turn each column within that row is a tuple. In your case, since you're only selecting one column, you have a tuple of single-element tuples. That means, in each iteration of your loop, you have a tuple, not a string.
This would work:
for i in fblike:
    if i[0] == "Aviesta":
        like = 1
but to be honest, you would be better off going and doing a simple Python tutorial, and then going back to the Django tutorial and learning how to do queries via the ORM.
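For instance, a hedged ORM version (assuming a FacebookLike model with user_id and name fields, matching the table queried above) might look like:
from django_facebook.models import FacebookLike  # assumed import path

if FacebookLike.objects.filter(user_id=request.user.id, name="Aviesta").exists():
    like = 1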
I don't know if your question has anything to do with arrays.
If you only need to find whether the given string is in your result, you could simply do
if ("Aviesta",) in fblike:  # each row is a one-element tuple
    like += 1

Python result from mysql db does not match actual value: comparison of values

I have a program in which I compare hash values generated from my code with the ones in my MySQL database.
So, after generating the check values I have:
hash to be compared: 78ff0103440dcea01f36438a71bdf28f
hash value from db: (('78ff0103440dcea01f36438a71bdf28f',),)
The hash value from the DB was output through using something like:
db_hash.fetchone()
that's why it includes the (('',),) symbols.
But I've tried appending the same symbols to the hash to be compared and it still won't compare equal.
I'm baffled because it's only supposed to be a simple comparison:
if hash == result:
    # do some code
else:
    # do some code
If you have an idea on what this is please answer :)
Python's MySQL adapter returns rows as tuples of values - and in this case, you're receiving a full result set of one row, with one column. To get the value of that, just do:
dbResult # this is (('78ff0103440dcea01f36438a71bdf28f',),)
dbResult[0][0] # this is '78ff0103440dcea01f36438a71bdf28f'
Of course, if your query was different (or returned no rows) this would throw an error. You should ideally be checking the number of rows returned (len(dbResult)) first. The number of columns in each row will be consistent.
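Putting that together, a guarded comparison might look like this (a minimal sketch using the names from the question):
db_result = db_hash.fetchall()  # e.g. (('78ff0103440dcea01f36438a71bdf28f',),)
if len(db_result) == 1 and db_result[0][0] == hash:
    # do some code
else:
    # do some code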
