I have some things that do not need to be indexed or searched (game configurations), so I was thinking of storing JSON in a BLOB column. Is this a good idea at all? Or are there alternatives?
If you need to query based on the values within the JSON, it would be better to store the values separately.
If you are just loading a set of configurations like you say you are doing, storing the JSON directly in the database works great and is a very easy solution.
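A minimal sketch of that approach, using SQLite for brevity and a hypothetical `game_config` table (any RDBMS with a TEXT or BLOB column works the same way):

```python
import json
import sqlite3

# Hypothetical schema: one row per game configuration, JSON stored as TEXT.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE game_config (id INTEGER PRIMARY KEY, data TEXT)")

config = {"difficulty": "hard", "fullscreen": True, "keys": {"jump": "space"}}
conn.execute("INSERT INTO game_config (data) VALUES (?)", (json.dumps(config),))

# Load it back and parse in the application layer; the database never
# needs to understand the JSON.
row = conn.execute("SELECT data FROM game_config WHERE id = 1").fetchone()
loaded = json.loads(row[0])
```

The database treats the column as an opaque string; all structure lives in your application code.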
No different than people storing XML snippets in a database (that doesn't have XML support). Don't see any harm in it, if it really doesn't need to be searched at the DB level. And the great thing about JSON is how parseable it is.
I don't see why not. As a related real-world example, WordPress stores serialized PHP arrays as a single value in many instances.
I think it's better to serialize your data. If you are using Python, cPickle is a good choice.
Related
Here is my situation. I'm using Python, Django and MySQL for a web project.
I have several tables for form posting, whose fields may change dynamically. Here is an example.
For example, a table called Article currently has three fields: id INT, title VARCHAR(50), and author VARCHAR(20). It should be able to store other values dynamically in the future, like source VARCHAR(100) or something else.
How can I implement this gracefully? Is MySQL able to handle it? Either way, I don't want to give up MySQL entirely, since I'm not really familiar with NoSQL databases, and it could be risky to change the technology stack in the middle of development.
Any ideas welcome. Thanks in advance!
You might be interested in this post about FriendFeed's schemaless SQL approach.
Loosely:
Store documents in JSON, extracting the ID as a column but no other columns
Create new tables for any indexes you require
Populate the indexes via code
There are several drawbacks to this approach, such as the indexes not necessarily reflecting the actual data. You'll also need to hack up Django's ORM pretty heavily. Depending on your requirements, you might be able to keep some of your fields as pure DB columns and store others as JSON.
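A rough sketch of the FriendFeed-style scheme, with hypothetical table and field names, using SQLite for brevity:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# One table holds the documents; only the ID is a real column.
conn.execute("CREATE TABLE entities (id INTEGER PRIMARY KEY, body TEXT)")
# A separate table per index you need, maintained by application code.
conn.execute("CREATE TABLE index_author (author TEXT, entity_id INTEGER)")

def save_article(doc):
    cur = conn.execute("INSERT INTO entities (body) VALUES (?)",
                       (json.dumps(doc),))
    entity_id = cur.lastrowid
    # Populate the index via code, not schema; new fields need no ALTER TABLE.
    if "author" in doc:
        conn.execute("INSERT INTO index_author VALUES (?, ?)",
                     (doc["author"], entity_id))
    return entity_id

save_article({"title": "Hello", "author": "alice", "source": "feed"})

# Query via the index table, then fetch the document bodies.
rows = conn.execute(
    "SELECT e.body FROM entities e JOIN index_author i ON e.id = i.entity_id "
    "WHERE i.author = ?", ("alice",)).fetchall()
docs = [json.loads(r[0]) for r in rows]
```

The trade-off is exactly the drawback mentioned above: if the index-maintenance code has a bug, the index silently drifts from the documents.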
I've never actually used it, but django-not-eav looks like the tool for the job.
"This app attempts the impossible: implement a bad idea the right way." I already love it :)
That said, this question sounds like a "rethink your approach" situation, for sure. But yes, sometimes that is simply not an option...
Following an earlier question I asked here (Most appropriate way to combine features of a class to another?), I got an answer that I've finally grown to understand. In short, what I intend to do now is have a bunch of dictionaries, each of which will look somewhat like this:
{ "url": "http://....", "parser": SomeParserClass }
More properties might be added later, but they will include either strings or other classes.
Now my question is: what's the best way to save these objects?
I thought up of 3 solutions, not sure which one is the best and if there are any other more acceptable solutions.
Use pickle. While it seems efficient, it would make editing any of these dictionaries a pain, since it's saved in a binary format.
Save each dictionary in a separate module and import these modules dynamically from a single directory. Each module would either have a function that returns the dictionary, or a specially crafted variable name to hold it, so I could access it from my loading code. This seems the easiest to edit, but doesn't sound very efficient or Pythonic.
Use some sort of database like MongoDB or Riak to save these objects. My problem with this one is partly editing, which is doable but doesn't sound like fun, and partly that while the former two options can correctly save my parser class inside the dictionary, I have no idea how these databases would serialize or 'pickle' such objects.
As you can see, my main concerns are how easy it would be to edit them, the efficiency of saving and retrieving the data (though not a huge concern, since I only have a couple hundred of these), and the correctness of the solution.
So, any thoughts?
Thank you in advance for any help you might be able to provide.
Use JSON. It supports python dictionaries and can be easily edited.
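One wrinkle: JSON cannot encode a class object like SomeParserClass directly. A common workaround, sketched here with a hypothetical `PARSERS` registry, is to store the class's name and resolve it on load:

```python
import json

class SomeParserClass:
    pass

# Hypothetical registry mapping names to classes; JSON stores only the name.
PARSERS = {"SomeParserClass": SomeParserClass}

entry = {"url": "http://example.com", "parser": "SomeParserClass"}
text = json.dumps(entry, indent=2)   # human-editable text on disk

loaded = json.loads(text)
parser_cls = PARSERS[loaded["parser"]]  # resolve the name back to the class
```

This keeps the files trivially editable while still letting the application recover real class objects.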
You can try shelve. It's built on top of pickle and lets you serialize objects and associate them with string keys.
Because it is based on dbm, it will only access key/values as you need them. So if you only need to access a few items from a large dictionary, shelve may be a better choice than json, which has to load the entire JSON file into a dictionary first.
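A minimal sketch of shelve, with hypothetical keys and a temporary file path:

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), "entries")

# Write entries keyed by string; values can be arbitrary picklable objects.
with shelve.open(path) as db:
    db["entry1"] = {"url": "http://example.com", "parser": None}

# Reopen later: only the keys you touch are read from disk.
with shelve.open(path) as db:
    entry = db["entry1"]
```

Each lookup unpickles just that one value, which is what makes shelve attractive for large collections.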
I have a need to store a python set in a database for accessing later. What's the best way to go about doing this? My initial plan was to use a textfield on my model and just store the set as a comma or pipe delimited string, then when I need to pull it back out for use in my app I could initialize a set by calling split on the string. Obviously if there is a simple way to serialize the set to store it in the db so I can pull it back out as a set when I need to use it later that would be best.
If your database is better at storing blobs of binary data, you can pickle your set. Note that pickle's protocol 0 outputs ASCII text, while the newer protocols (the default in Python 3) are binary, so pick the protocol to match your column type. Just pickle.dumps(your_set) to store, and unpickled = pickle.loads(database_string) later.
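A minimal round-trip, assuming the column accepts whatever bytes the chosen protocol produces:

```python
import pickle

s = {1, 2, 3}
blob = pickle.dumps(s)          # bytes, suitable for a BLOB column
restored = pickle.loads(blob)   # comes back as a real set, no split() needed
```

Unlike the delimited-string approach, this preserves element types (ints stay ints) with no parsing code of your own.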
There are a number of options here, depending on what kind of data you wish to store in the set.
If it's regular integers, CommaSeparatedIntegerField might work fine, although it often feels like a clumsy storage method to me.
If it's other kinds of Python objects, you can try pickling it before saving it to the database, and unpickling it when you load it again. That seems like a good approach.
If you want something human-readable in your database though, you could even JSON-encode it into a TextField, as long as the data you're storing doesn't include Python objects.
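Since JSON has no set type, the usual trick is to convert to a list on the way in and back to a set on the way out; a minimal sketch:

```python
import json

s = {"red", "green", "blue"}
# JSON has no set type, so serialize a list (sorted for stable output).
text = json.dumps(sorted(s))
restored = set(json.loads(text))
```

The stored value stays human-readable in the database, which is the point of this approach.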
Redis natively stores sets (as well as other data structures: lists, hashes, sorted sets) and provides set operations, and it's rocket fast too. I find it's the Swiss Army knife of Python development.
I know it's not a relational database per se, but it does solve this problem very concisely.
What about CommaSeparatedIntegerField?
If you need other type (string for example) you can create your own field which would work like CommaSeparatedIntegerField but will use strings (without commas).
Or, if you need other type, probably a better way of doing it: have a dictionary which maps integers to your values.
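The serialization logic such a custom field would need is just join/split, independent of Django itself; a minimal sketch, assuming the stored strings contain no commas:

```python
# Hypothetical helpers mirroring CommaSeparatedIntegerField's storage format,
# but for strings (assumes the values themselves contain no commas).
def dump_set(values):
    return ",".join(sorted(values))

def load_set(text):
    return set(text.split(",")) if text else set()

stored = dump_set({"red", "green"})
restored = load_set(stored)
```

Wrapping these two functions in a custom field's `get_prep_value`/`to_python` pair is all the Django-specific work required.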
Many times while designing a database structure, I get stuck on the question: what would be more effective, storing data in pickled format in a column of the same table, or creating an additional table and then using JOIN?
Which path should be followed? Any advice?
For example:
There is a table of Customers, containing fields like Name, Address
Now, for managing Orders (each customer can have many), you can either create an Orders table or store the orders in serialized form in a separate column of the Customers table.
It's usually better to create separate tables. If you go with pickling and later find you want to query the data in a different way, it could be difficult.
See Database normalization.
Usually it's best to keep your data normalized (i.e. create more tables). Storing data 'pickled' as you say, is acceptable, when you don't need to perform relational operations on them.
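A minimal sketch of the normalized design from the question, with hypothetical columns, showing the kind of query that a serialized orders column would make difficult:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, address TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        item TEXT
    );
""")
conn.execute("INSERT INTO customers VALUES (1, 'Alice', '1 Main St')")
conn.execute("INSERT INTO orders VALUES (1, 1, 'widget')")
conn.execute("INSERT INTO orders VALUES (2, 1, 'gadget')")

# A JOIN answers questions a pickled column cannot without unpickling
# every row in Python, e.g. per-customer order counts.
rows = conn.execute("""
    SELECT c.name, COUNT(o.id)
    FROM customers c JOIN orders o ON o.customer_id = c.id
    GROUP BY c.id
""").fetchall()
```

With a pickled column, that aggregate would require loading and deserializing every customer row in application code.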
Mixing SQL databases and pickling seems to ask for trouble. I'd go with either sticking all data in the SQL databases or using only pickling, in the form of the ZODB, which is a Python only OO database that is pretty damn awesome.
Mixing makes sense sometimes, but is usually just more trouble than it's worth.
I agree with Mchi, there is no problem storing "pickled" data if you don't need to search or do relational type operations.
Denormalisation is also an important tool that can scale up database performance when applied correctly.
It's probably a better idea to use JSON instead of pickles. It only uses a little more space, and makes it possible to use the database from languages other than Python.
I agree with @Lennart Regebro. You should probably decide whether you need a relational DB or an OODB. If an RDBMS is your choice, I would suggest you stick with more tables. IMHO, pickling may have scalability issues. If an OODB is what you want, you should look at ZODB. It is pretty good and supports caching etc. for better performance.
I am working on a project on Info Retrieval.
I have made a Full Inverted Index using Hadoop/Python.
Hadoop outputs the index as (word,documentlist) pairs which are written on the file.
For a quick access, I have created a dictionary(hashtable) using the above file.
My question is, how do I store such an index on disk that also has quick access time.
At present I am storing the dictionary using Python's pickle module and loading from it,
but doing so brings the whole index into memory at once (or does it?).
Please suggest an efficient way of storing and searching through the index.
My dictionary structure is as follows (using nested dictionaries)
{word : {doc1:[locations], doc2:[locations], ....}}
so that I can get the documents containing a word by
dictionary[word].keys() ... and so on.
shelve
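A minimal sketch of shelve applied to the nested-dictionary index structure above, using a temporary path; only the postings you actually look up are read from disk:

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), "inverted_index")

# Build the index once; shelve keys must be strings, values any
# picklable object, so the per-word postings dict fits naturally.
with shelve.open(path) as index:
    index["hadoop"] = {"doc1": [3, 17], "doc2": [5]}

# Later lookups unpickle only the requested word's postings,
# not the whole index.
with shelve.open(path) as index:
    postings = index["hadoop"]
    docs = list(postings.keys())
```

The access pattern stays exactly `dictionary[word].keys()`, as in the question, but without loading every word into memory.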
At present I am storing the dictionary using python pickle module and loading from it but it brings the whole of index into memory at once (or does it?).
Yes it does bring it all in.
Is that a problem? If it's not an actual problem, then stick with it.
If it's a problem, what kind of problem do you have? Too slow? Too fast? Too colorful? Too much memory used? What problem do you have?
I would use Lucene. Why reinvent the wheel?
Just store it in a string like this:
<entry1>,<entry2>,<entry3>,...,<entryN>
If <entry*> contains the ',' character, use some other delimiter, like '\t'.
This is smaller in size than an equivalent pickled string.
If you want to load it, just do:
L = s.split(delimiter)
You could store the repr() of the dictionary and re-create it later with ast.literal_eval (safer than eval).
If it's taking a long time to load or using too much memory, you might need a database. There are many you might use; I would probably start with SQLite. Then your problem is "reduced" ;-) to simply formulating the right query to get what you need out of the database. This way you will only load what you need.
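A minimal SQLite sketch for the inverted index, with a hypothetical postings table, that loads only one word's postings per query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path for on-disk storage
conn.execute("CREATE TABLE postings (word TEXT, doc TEXT, location INTEGER)")
conn.execute("CREATE INDEX idx_word ON postings (word)")

rows = [("hadoop", "doc1", 3), ("hadoop", "doc1", 17), ("hadoop", "doc2", 5)]
conn.executemany("INSERT INTO postings VALUES (?, ?, ?)", rows)

# Fetch only the documents for one word instead of unpickling the whole index.
docs = sorted({r[0] for r in conn.execute(
    "SELECT DISTINCT doc FROM postings WHERE word = ?", ("hadoop",))})
```

The index on `word` keeps lookups fast while the bulk of the data stays on disk.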
I am using anydbm for that purpose. Anydbm provides the same dictionary-like interface, except it allows only strings as keys and values. But this is not a constraint, since you can use cPickle's loads/dumps to store more complex structures in the index.
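A minimal sketch of that combination (anydbm was renamed to dbm in Python 3, where pickle replaces cPickle):

```python
import dbm
import os
import pickle
import tempfile

path = os.path.join(tempfile.mkdtemp(), "index")

# dbm values must be bytes/strings, so pickle the postings dict before storing.
with dbm.open(path, "c") as db:
    db["python"] = pickle.dumps({"doc1": [4, 9]})

# Each lookup reads and unpickles only one word's postings.
with dbm.open(path, "r") as db:
    postings = pickle.loads(db["python"])
```

Like shelve (which wraps exactly this pattern), it avoids loading the whole index into memory.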