I need to store a Python set in a database so I can access it later. What's the best way to go about this? My initial plan was to use a TextField on my model and store the set as a comma- or pipe-delimited string; when I need it back in my app, I could rebuild the set by calling split on the string. Obviously, if there is a simple way to serialize the set so that I get a set back when I load it, that would be best.
If your database is better at storing blobs of binary data, you can pickle your set. In Python 2, pickle's default protocol produced ASCII text, so the result could also go in a text column and might be better than the delimited string approach anyway; in Python 3, pickle.dumps returns bytes, so use a binary column. Just call pickle.dumps(your_set) when saving and unpickled = pickle.loads(database_string) later.
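A minimal sketch of that round trip, assuming Python 3 and a binary column to hold the result (the variable names are only for illustration):

```python
import pickle

tags = {"red", "green", "blue"}

# In Python 3, pickle.dumps returns bytes, so this belongs in a binary column.
blob = pickle.dumps(tags)

# Later, after reading the same bytes back from the database:
restored = pickle.loads(blob)
```

Only unpickle data you wrote yourself: pickle.loads can execute arbitrary code when given untrusted input.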
There are a number of options here, depending on what kind of data you wish to store in the set.
If it's regular integers, CommaSeparatedIntegerField might work fine, although it often feels like a clumsy storage method to me.
If it's other kinds of Python objects, you can try pickling it before saving it to the database, and unpickling it when you load it again. That seems like a good approach.
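If you want something human-readable in your database though, you could even JSON-encode it into a TextField, as long as the data consists only of JSON-compatible values (strings, numbers, booleans, None, lists, and dicts). JSON has no set type, so the set itself has to be converted to a list first.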
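For the JSON option, a small sketch assuming the set holds integers (the name user_ids is made up for the example). The set is turned into a sorted list so that the stored text is stable and readable:

```python
import json

user_ids = {3, 1, 2}

# JSON has no set type, so encode a sorted list instead.
encoded = json.dumps(sorted(user_ids))  # text suitable for a TextField

# Rebuild the set when loading.
decoded = set(json.loads(encoded))
```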
Redis natively stores sets (as well as other data structures such as lists, dicts, and queues) and provides set operations, and it's rocket fast too. I find it's the Swiss Army knife for Python development.
I know it's not a relational database per se, but it does solve this problem very concisely.
What about CommaSeparatedIntegerField?
If you need another type (strings, for example), you can create your own field that works like CommaSeparatedIntegerField but stores strings (as long as they contain no commas).
Or, if you need another type, probably a better way of doing it: keep a dictionary that maps integers to your values.
I have a City model and fixture data with a list of cities, and I currently clean up the names for URLs in the view and template before loading them. So I do the following in a template to get a URL like this: http://newcity.domain.com.
<a href="http://{{ city.name|lower|cut:" " }}.{{ SITE_URL}}">
The actual city.name is "New City"
Would it be better if I stored already cleaned data (newcity) in a new column (short_name) on MySQL db and just use city.short_name on templates and views?
This seems very opinion-oriented. Is it faster? The only way to know for sure is to measure. Is it faster to a degree that you care about? Probably not. In any event, it's better not to make schema design decisions based on performance unless you've observed measurably bad performance.
All other things being equal, it is generally best to store the data in different columns. It's easier to join it in controller or template code than it is to separate it out into its pieces.
Storing the short name in a MySQL database requires I/O, and I/O is always slow. For such an easy transformation of the data, it should be faster to keep things as they are and avoid the extra database I/O.
If you really want to know the difference, use timeit (https://docs.python.org/2/library/timeit.html); accessing a database is probably much slower.
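A rough sketch of such a measurement. It only times the in-Python cleanup, and short_name is a hypothetical helper that mirrors the template's lower and cut:" " filters; timing the database lookup would need your actual setup:

```python
import timeit

def short_name(name):
    # Same effect as {{ city.name|lower|cut:" " }} in the template.
    return name.lower().replace(" ", "")

runs = 100_000
elapsed = timeit.timeit(lambda: short_name("New City"), number=runs)
print(f"{elapsed / runs * 1e6:.3f} microseconds per call")
```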
It really depends. If you have a fixed list of cities, just hardcode them (unless there are so many that they would strain your server's resources, which I don't think is the case here). Otherwise you need some kind of persistent store for the cities, and a database will come in handy.
Following an earlier question I asked here (Most appropriate way to combine features of a class to another?), I got an answer that I have finally grown to understand. In short, what I intend to do now is have a bunch of dictionaries, each of which will look somewhat like this:
{ "url": "http://....", "parser": SomeParserClass }
More properties might be added later, but their values will be either strings or other classes.
Now my question is: what's the best way to save these objects?
I came up with 3 solutions, but I'm not sure which one is best or whether there are other, more acceptable solutions.
Use pickle. While it seems efficient, it would make editing any of these dictionaries a pain, since they are saved in a binary format.
Save each dictionary in a separate module and import these modules dynamically from a single directory. Each module would have either a function that returns the dictionary or a specially named variable that holds it, so I could call it from my loading code. This seems the easiest to edit but doesn't sound very efficient or Pythonic.
Use some sort of database like MongoDB or Riak to save these objects. My problems with this one are editing, which is doable but doesn't sound like fun, and the fact that while the former two options can correctly save my parser class inside the dictionary, I have no idea how these databases serialize or 'pickle' such objects.
As you can see, my main concerns are how easy it would be to edit them, the efficiency of saving and retrieving the data (though not a huge concern, since I only have a couple of hundred of these), and the correctness of the solution.
So, any thoughts?
Thank you in advance for any help you might be able to provide.
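Use JSON. It supports Python dictionaries of strings and numbers and can be easily edited by hand. A class such as the parser cannot be encoded directly, so it would have to be stored by name.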
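One way this could look, as a sketch rather than a definitive design: the class is written to JSON by name and mapped back through a registry. PARSERS, dump_entry, load_entry, and the example.com URL are all hypothetical; only SomeParserClass comes from the question.

```python
import json

class SomeParserClass:
    """Stand-in for the parser class from the question."""

# Hypothetical registry: maps the name stored in JSON back to the class.
PARSERS = {"SomeParserClass": SomeParserClass}

def dump_entry(entry):
    # Replace the class object with its name so the dict is JSON-compatible.
    return json.dumps({"url": entry["url"], "parser": entry["parser"].__name__})

def load_entry(text):
    data = json.loads(text)
    data["parser"] = PARSERS[data["parser"]]  # KeyError for unknown names
    return data

entry = {"url": "http://example.com", "parser": SomeParserClass}
text = dump_entry(entry)
restored = load_entry(text)
```

The stored text stays editable by hand, and a typo in a parser name fails loudly at load time instead of silently producing a wrong object.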
You can try shelve. It's built on top of pickle and lets you serialize objects and associate them with string keys.
Because it is based on dbm, it will only access key/values as you need them. So if you only need to access a few items from a large dictionary, shelve may be a better choice than json, which has to load the entire JSON file into a dictionary first.
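A minimal shelve sketch, assuming Python 3. The file lives in a temporary directory and the keys and values are invented for the example:

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), "parsers")

# Each value is pickled and stored under its string key.
with shelve.open(path) as db:
    db["site_a"] = {"url": "http://example.com/a", "retries": 3}
    db["site_b"] = {"url": "http://example.com/b", "retries": 1}

# Reopening the shelf and reading one key only unpickles that entry.
with shelve.open(path) as db:
    site_a = db["site_a"]
```

Note that mutating a value in place (for example db["site_a"]["retries"] = 5) is not written back unless the shelf is opened with writeback=True; reassigning the whole value always works.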
When designing a database structure, I often get stuck on the same question: which is more effective, storing data in pickled format in a column of the same table, or creating an additional table and then using JOIN?
Which path should I follow? Any advice?
For example:
There is a table of Customers, containing fields like Name, Address
Now, for managing Orders (each customer can have many), you can either create an Order table or store the orders in serialized form in a separate column of the Customers table itself.
It's usually better to create separate tables. If you go with pickling and later find you want to query the data in a different way, it could be difficult.
See Database normalization.
Usually it's best to keep your data normalized (i.e. create more tables). Storing data 'pickled', as you say, is acceptable when you don't need to perform relational operations on it.
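To make the normalized option concrete, here is a small sketch using SQLite from the standard library. The column names and sample rows are assumptions for the example; only the Customers/Orders split comes from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, address TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        item TEXT,
        total REAL
    );
    INSERT INTO customers VALUES (1, 'Alice', '1 Main St'), (2, 'Bob', '2 Side St');
    INSERT INTO orders VALUES (1, 1, 'book', 12.5), (2, 1, 'pen', 2.0), (3, 2, 'lamp', 30.0);
""")

# Per-customer order count and total: easy with a JOIN, but it would
# require unpickling every row if orders were stored as a blob.
rows = conn.execute("""
    SELECT c.name, COUNT(o.id), SUM(o.total)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id
    ORDER BY c.name
""").fetchall()
```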
Mixing SQL databases and pickling seems to be asking for trouble. I'd go with either sticking all data in the SQL database or using only pickling, in the form of the ZODB, which is a Python-only OO database that is pretty damn awesome.
Mixing makes sense sometimes, but is usually just more trouble than it's worth.
I agree with Mchi, there is no problem storing "pickled" data if you don't need to search or do relational type operations.
Denormalisation is also an important tool that can scale up database performance when applied correctly.
It's probably a better idea to use JSON instead of pickles. It only uses a little more space, and makes it possible to use the database from languages other than Python.
I agree with @Lennart Regebro. You should probably work out whether you need a relational DB or an OODB. If an RDBMS is your choice, I would suggest you stick with more tables. IMHO, pickling may have issues with scalability. If pickling is what you want, you should look at ZODB. It is pretty good and supports caching etc. for better performance.
I have some things that do not need to be indexed or searched (game configurations), so I was thinking of storing JSON in a BLOB. Is this a good idea at all? Or are there alternatives?
If you need to query based on the values within the JSON, it would be better to store the values separately.
If you are just loading a set of configurations like you say you are doing, storing the JSON directly in the database works great and is a very easy solution.
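A sketch of that approach using SQLite and a TEXT column. The table name, column names, and configuration keys are assumptions made up for the example:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE game_config (name TEXT PRIMARY KEY, body TEXT)")

config = {"difficulty": "hard", "lives": 3, "levels": ["forest", "cave"]}

# Store the whole configuration as one JSON document.
conn.execute(
    "INSERT INTO game_config VALUES (?, ?)", ("default", json.dumps(config))
)

# Load it back by name and decode it in the application.
(body,) = conn.execute(
    "SELECT body FROM game_config WHERE name = ?", ("default",)
).fetchone()
loaded = json.loads(body)
```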
No different than people storing XML snippets in a database (that doesn't have XML support). Don't see any harm in it, if it really doesn't need to be searched at the DB level. And the great thing about JSON is how parseable it is.
I don't see why not. As a related real-world example, WordPress stores serialized PHP arrays as a single value in many instances.
I think it's better to serialize your XML. If you are using Python, cPickle (plain pickle in Python 3) is a good choice.
I am working on a project on Info Retrieval.
I have made a Full Inverted Index using Hadoop/Python.
Hadoop outputs the index as (word,documentlist) pairs which are written on the file.
For quick access, I have created a dictionary (hash table) from the above file.
My question is: how do I store such an index on disk so that it also has quick access time?
At present I am storing the dictionary using python pickle module and loading from it
but it brings the whole of index into memory at once (or does it?).
Please suggest an efficient way of storing and searching through the index.
My dictionary structure is as follows (using nested dictionaries)
{word : {doc1:[locations], doc2:[locations], ....}}
so that I can get the documents containing a word by
dictionary[word].keys() ... and so on.
shelve
At present I am storing the dictionary using python pickle module and loading from it but it brings the whole of index into memory at once (or does it?).
Yes it does bring it all in.
Is that a problem? If it's not an actual problem, then stick with it.
If it's a problem, what kind of problem do you have? Too slow? Too fast? Too colorful? Too much memory used? What problem do you have?
I would use Lucene. Why reinvent the wheel?
Just store it in a string like this:
<entry1>,<entry2>,<entry3>,...,<entryN>
If <entry*> contains ',' character, use some other delimiter like '\t'.
This is smaller in size than an equivalent pickled string.
If you want to load it, just do:
L = s.split(delimiter)
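The same idea as a pair of helpers, as a sketch: pack and unpack are hypothetical names, and the entries are assumed to be strings. The extra checks cover two easy mistakes, an entry that contains the delimiter and an empty string (which split turns into [''] rather than []):

```python
def pack(entries, delimiter=","):
    # Refuse entries that would be split apart when loading.
    for entry in entries:
        if delimiter in entry:
            raise ValueError(f"entry {entry!r} contains the delimiter")
    return delimiter.join(entries)

def unpack(s, delimiter=","):
    # "".split(",") returns [""], so handle the empty case explicitly.
    return s.split(delimiter) if s else []

s = pack(["doc1", "doc2", "doc3"])
L = unpack(s)
```

Everything comes back as a string, so integers or other types have to be converted again after loading.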
You could store the repr() of the dictionary and use that to re-create it.
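A short sketch of the repr() route, assuming the dictionary contains only literals (strings, numbers, lists, dicts). ast.literal_eval is used for loading instead of eval, because it only accepts literals and will not run arbitrary code:

```python
import ast

index = {"python": {"doc1": [0, 14], "doc2": [7]}}

text = repr(index)                  # store this string
restored = ast.literal_eval(text)   # safe re-creation from the string
```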
If it's taking a long time to load or using too much memory, you might need a database. There are many you might use; I would probably start with SQLite. Then your problem is "reduced" ;-) to simply formulating the right query to get what you need out of the database. This way you will only load what you need.
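As a sketch of what the SQLite version might look like for the nested index from the question: one row per (word, document, location), with an index on the word. The table layout, the sample data, and the documents_for helper are assumptions for the example:

```python
import sqlite3

index = {
    "python": {"doc1": [0, 14], "doc2": [7]},
    "hadoop": {"doc2": [3]},
}

conn = sqlite3.connect(":memory:")  # use a file path to keep it on disk
conn.execute("CREATE TABLE postings (word TEXT, doc TEXT, location INTEGER)")
conn.execute("CREATE INDEX idx_postings_word ON postings (word)")

# Flatten {word: {doc: [locations]}} into rows.
rows = [
    (word, doc, loc)
    for word, docs in index.items()
    for doc, locations in docs.items()
    for loc in locations
]
conn.executemany("INSERT INTO postings VALUES (?, ?, ?)", rows)

def documents_for(word):
    # Only the rows for this word are read, not the whole index.
    cursor = conn.execute(
        "SELECT DISTINCT doc FROM postings WHERE word = ? ORDER BY doc", (word,)
    )
    return [doc for (doc,) in cursor]
```

documents_for("python") plays the role of dictionary[word].keys() from the question.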
I am using anydbm (the dbm module in Python 3) for that purpose. Anydbm provides the same dictionary-like interface, except that it allows only strings as keys and values. But this is not a real constraint, since you can use cPickle's loads/dumps to store more complex structures in the index.
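A sketch of the same idea in Python 3, where anydbm became dbm and cPickle became pickle. The file path and sample postings are made up for the example:

```python
import dbm
import os
import pickle
import tempfile

path = os.path.join(tempfile.mkdtemp(), "index")

# Each word maps to its pickled {doc: [locations]} dictionary.
with dbm.open(path, "c") as db:
    db["python"] = pickle.dumps({"doc1": [0, 14], "doc2": [7]})
    db["hadoop"] = pickle.dumps({"doc2": [3]})

# Looking up one word reads and unpickles only that entry.
with dbm.open(path, "r") as db:
    postings = pickle.loads(db["python"])
```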