What is the most efficient graph data structure in Python? [closed] - python

I need to be able to manipulate a large (10^7 nodes) graph in python. The data corresponding to each node/edge is minimal, say, a small number of strings. What is the most efficient, in terms of memory and speed, way of doing this?
A dict of dicts is more flexible and simpler to implement, but I intuitively expect a list of lists to be faster. The list option would also require that I keep the data separate from the structure, while dicts would allow for something of the sort:
graph[I][J]["Property"]="value"
What would you suggest?
Yes, I should have been a bit clearer on what I mean by efficiency. In this particular case I mean it in terms of random access retrieval.
Loading the data into memory isn't a huge problem; that's done once and for all. The time-consuming part is visiting the nodes so I can extract the information and measure the metrics I'm interested in.
I hadn't considered making each node a class (the properties are the same for all nodes), but it seems like that would add an extra layer of overhead? I was hoping someone would have some direct experience with a similar case that they could share. After all, graphs are one of the most common abstractions in CS.
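For concreteness, here is a minimal sketch of the two layouts being weighed (the names and the example property are illustrative, not from any library):

# Option 1: dict of dicts -- node labels as keys, per-edge properties stored inline
graph = {}
graph.setdefault("I", {})["J"] = {"Property": "value"}   # edge I -> J

# Option 2: list of lists -- integer node ids, properties kept in a separate mapping
n = 4
adjacency = [[] for _ in range(n)]    # adjacency[i] holds the neighbour ids of node i
edge_props = {}                       # (i, j) -> {"Property": "value"}
adjacency[0].append(1)
edge_props[(0, 1)] = {"Property": "value"}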

I would strongly advocate you look at NetworkX. It's a battle-tested war horse and the first tool most 'research' types reach for when they need to do analysis of network-based data. I have manipulated graphs with hundreds of thousands of edges without problem on a notebook. It's feature-rich and very easy to use. You will find yourself focusing more on the problem at hand than on the details of the underlying implementation.
Example of Erdős-Rényi random graph generation and analysis
"""
Create an G{n,m} random graph with n nodes and m edges
and report some properties.
This graph is sometimes called the Erdős-Rényi graph
but is different from G{n,p} or binomial_graph which is also
sometimes called the Erdős-Rényi graph.
"""
__author__ = """Aric Hagberg (hagberg#lanl.gov)"""
__credits__ = """"""
# Copyright (C) 2004-2006 by
# Aric Hagberg
# Dan Schult
# Pieter Swart
# Distributed under the terms of the GNU Lesser General Public License
# http://www.gnu.org/copyleft/lesser.html
import sys
import networkx as nx

n = 10  # 10 nodes
m = 20  # 20 edges
G = nx.gnm_random_graph(n, m)

# some properties
print("node degree clustering")
for v in nx.nodes(G):
    print(v, nx.degree(G, v), nx.clustering(G, v))

# print the adjacency list to the terminal
for line in nx.generate_adjlist(G):
    print(line)
Visualizations are also straightforward:
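A minimal sketch of what that looks like (assuming matplotlib is installed; nx.draw uses a spring layout by default):

import matplotlib.pyplot as plt
import networkx as nx

G = nx.gnm_random_graph(10, 20)
nx.draw(G, with_labels=True)
plt.show()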
More visualization: http://jonschull.blogspot.com/2008/08/graph-visualization.html

Even though this question is now quite old, I think it is worthwhile to mention my own Python module for graph manipulation, called graph-tool. It is very efficient, since the data structures and algorithms are implemented in C++ with template metaprogramming, using the Boost Graph Library. Therefore its performance (both in memory usage and runtime) is comparable to a pure C++ library, and can be orders of magnitude better than typical Python code, without sacrificing ease of use. I use it myself constantly to work with very large graphs.
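A rough sketch of basic graph-tool usage, to give a flavour of the API (the property name here is made up):

from graph_tool.all import Graph

g = Graph(directed=False)
label = g.new_vertex_property("string")   # per-vertex string data, stored on the C++ side

v1, v2 = g.add_vertex(), g.add_vertex()
label[v1], label[v2] = "A", "B"
g.add_edge(v1, v2)

print(g.num_vertices(), g.num_edges())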

As already mentioned, NetworkX is very good, with another option being igraph. Both modules have most (if not all) of the analysis tools you're likely to need, and both libraries are routinely used with large networks.
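For comparison, a minimal igraph sketch (assuming python-igraph is installed; the sizes are arbitrary):

import igraph as ig

g = ig.Graph.Erdos_Renyi(n=1000, m=5000)   # random graph: 1000 nodes, 5000 edges
print(g.vcount(), g.ecount())
print(max(g.degree()))                     # highest degree in the graph
print(g.pagerank()[:5])                    # PageRank scores of the first five vertices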

A dictionary may also carry overhead, depending on the actual implementation. A hash table usually starts with some prime number of available slots, even though you might only use a couple of them.
Judging by your example, "Property", would you be better off with a class approach for the final level and real properties? Or do the names of the properties change a lot from node to node?
I'd say that what "efficient" means depends on a lot of things, like:
speed of updates (insert, update, delete)
speed of random access retrieval
speed of sequential retrieval
memory used
I think you'll find that a data structure that is speedy will generally consume more memory than one that is slow. This isn't always the case, but most data structures seem to follow it.
A dictionary might be easy to use and give you relatively uniform, fast access, but it will most likely use more memory than, as you suggest, lists. Lists, however, generally carry more overhead when you insert data into them, unless you preallocate X nodes, in which case they will again use more memory.
My suggestion, in general, would be to just use the method that seems the most natural to you, and then do a "stress test" of the system, adding a substantial amount of data to it and see if it becomes a problem.
You might also consider adding a layer of abstraction to your system, so that you don't have to change the programming interface if you later on need to change the internal data structure.
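A thin wrapper along those lines might look like this (a sketch with made-up names; the dict of dicts inside could later be swapped for something else without touching the callers):

class PropertyGraph:
    """Minimal facade over the storage choice."""

    def __init__(self):
        self._adj = {}      # node -> {neighbour -> {property -> value}}

    def add_edge(self, u, v, **props):
        self._adj.setdefault(u, {})[v] = dict(props)
        self._adj.setdefault(v, {})

    def neighbors(self, u):
        return self._adj.get(u, {}).keys()

    def edge_property(self, u, v, name):
        return self._adj[u][v][name]


g = PropertyGraph()
g.add_edge("I", "J", Property="value")
print(g.edge_property("I", "J", "Property"))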

As I understand it, random access is constant time for both Python's dicts and lists; the difference is that with lists you can only do random access by integer index. I'm assuming that you need to look up a node by its label, so you want a dict of dicts.
However, on the performance front, loading it into memory may not be a problem, but if you use too much you'll end up swapping to disk, which will kill the performance of even Python's highly efficient dicts. Try to keep memory usage down as much as possible. Also, RAM is amazingly cheap right now; if you do this kind of thing a lot, there's no reason not to have at least 4GB.
If you'd like advice on keeping memory usage down, give some more information about the kind of information you're tracking for each node.

Making a class-based structure would probably have more overhead than the dict-based structure, since in Python class instances store their attributes in a dict under the hood anyway.
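A quick way to see that overhead for yourself (sys.getsizeof only measures the per-instance dict itself, not its contents; __slots__ is the usual way to avoid it):

import sys

class Node:
    def __init__(self, label):
        self.label = label

n = Node("A")
print(n.__dict__)                    # instance attributes live in this dict
print(sys.getsizeof(n.__dict__))     # the dict has its own footprint per node

class SlottedNode:
    __slots__ = ("label",)           # no per-instance dict at all
    def __init__(self, label):
        self.label = label

print(hasattr(SlottedNode("A"), "__dict__"))   # False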

Without doubt NetworkX is the best-supported graph library for Python so far. It comes with utilities like helper functions, data structures and algorithms, random sequence generators, decorators, Cuthill-McKee ordering, and context managers.
NetworkX is great because it works for graphs, digraphs, and multigraphs. It can write graphs in multiple formats: adjacency list, multiline adjacency list, edge list, GEXF, GML. It also works with Pickle, GraphML, JSON, SparseGraph6, etc.
It has implementations of various ready-made algorithms, including:
Approximation, Bipartite, Boundary, Centrality, Clique, Clustering, Coloring, Components, Connectivity, Cycles, Directed Acyclic Graphs,
Distance Measures, Dominating Sets, Eulerian, Isomorphism, Link Analysis, Link Prediction, Matching, Minimum Spanning Tree, Rich Club, Shortest Paths, Traversal, Tree.
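A couple of these in action, as a minimal sketch:

import networkx as nx

G = nx.gnm_random_graph(100, 300, seed=42)

print(nx.number_connected_components(G))            # Components
print(sorted(nx.pagerank(G).items())[:3])           # Link Analysis
largest = max(nx.connected_components(G), key=len)
print(nx.average_shortest_path_length(G.subgraph(largest)))   # Shortest Paths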

Related

Is the Python object mapping that I'm doing for neo4j too naive?

I'm looking for some general advice on how to either re-write application code to be non-naive, or whether to abandon neo4j for another data storage model. This is not only "subjective", as it relates significantly to specific, correct usage of the neo4j driver in Python and why it performs the way it does with my code.
Background:
My team and I have been using neo4j to store graph-friendly data that is initially stored in Python objects. Originally, we were advised by a local/in-house expert to use neo4j, as it seemed to fit our data storage and manipulation/querying requirements. The data are always specific instances of a set of carefully-constructed ontologies. For example (pseudo-data):
Superclass1 -contains-> SubclassA
Superclass1 -implements->SubclassB
Superclass1 -isAssociatedWith-> Superclass2
SubclassB -hasColor-> Color1
Color1 -hasLabel-> string::"Red"
...and so on, to create some rather involved and verbose hierarchies.
For prototyping, we were storing these data as sequences of grammatical triples (subject->verb/predicate->object) using RDFLib, and using RDFLib's graph-generator to construct a graph.
Now, since this information is just a complicated hierarchy, we just store it in some custom Python objects. We also do this in order to provide an easy API to other devs who need to interface with our core service. We hand them a Python library that is our object model and let them populate it with data, or we populate it and hand it to them for easy reading, and they do what they want with it.
To store these objects permanently, and to hopefully accelerate the writing and reading (querying/filtering) of these data, we've built custom object-mapping code that utilizes the official neo4j python driver to write and read these Python objects, recursively, to/from a neo4j database.
The Problem:
For large and complicated data sets (e.g. 15k+ nodes and 15k+ relations), the object-relational mapping (ORM) portion of our code is too slow and scales poorly. But neither I nor my colleague is an expert in databases or neo4j. I think we're being naive about how to accomplish this ORM. We began to wonder whether it even made sense to use neo4j, when more traditional ORMs (e.g. SQLAlchemy) might just be a better choice.
For example, the ORM commit algorithm we have now is a recursive function that commits an object like this (pseudo code):
def commit(object):
    for childstr in object:                       # For each child object
        child = getattr(object, childstr)         # Get the actual object
        if isinstance(child, <our object base type>):    # Open transaction, make nodes and relationship
            with session.begin_transaction() as tx:
                <construct Cypher query with:
                    MERGE object (make object node)
                    MERGE child (make its child node)
                    MERGE object-[]->child (create relation)
                >
                tx.run(<all 3 MERGEs>)
            commit(child)                         # Recursively write the child and its children to neo4j
Is it naive to do it like this? Would an OGM library like Py2neo's OGM be better, despite ours being customized? I've seen this and similar questions that recommend this or that OGM method, but in this article, it says not to use OGMs at all.
Must we really just implement every method and benchmark for performance? It seems like there must be some best practices (other than using the batch IMPORT, which doesn't fit our use cases). We've read through articles like those linked and seen the various tips on writing better queries, but it seems better to step back and examine the case more generally before attempting to optimize code line by line. Although it's clear that we can improve the ORM algorithm to some degree.
Does it make sense to write and read large, deep hierarchical objects to/from neo4j using a recursive strategy like this? Is there something in Cypher, or the neo4j drivers that we're missing? Or is it better to use something like Py2neo's OGM? Is it best to just abandon neo4j altogether? The benefits of neo4j and Cypher are difficult to ignore, and our data does seem to fit well in a graph. Thanks.
It's hard to know without looking at all the code and knowing the class hierarchy, but at the moment I'd hazard a guess that your code is slow in the OGM bit because every relationship is created in its own transaction. So you're doing a huge number of transactions for a larger graph which is going to slow things down.
I'd suggest for an initial import where you're creating every class/object, rather than just adding a new one or editing the relationships for one class, that you use your class inspectors to simply create a graph representation of the data, and then use Cypher to construct it in a lot fewer transactions in Neo4J. Using some basic topological graph theory you could then optimise it by reducing the number of lookups you need to do, too.
You can create a NetworkX MultiDiGraph in your python code to model the structure of your classes. From there on in there are a few different strategies to put the data into Neo4J - I also just found this but have no idea about whether it works or how efficient it is.
The most efficient way to query to import your graph will depend on the topology of the graph, and whether it is cyclical or not. Some options are below.
1. Create the Graph in Two Sets of Queries
Run one query for every node label to create every node, and then another to create every edge between every combination of node labels (the efficiency of this will depend on how many different node labels you're using).
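A rough sketch of that idea with the official Python driver (the connection details, the Entity label, and the uid property are placeholders; Cypher cannot parameterize relationship types, so the type is fixed here and the original verb is kept as a property):

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

nodes = [{"uid": n} for n in ("Superclass1", "SubclassA", "SubclassB")]
edges = [{"src": "Superclass1", "dst": "SubclassA", "rel": "contains"}]

with driver.session() as session:
    # One query creates every node of a given label in a single transaction...
    session.run("UNWIND $rows AS row MERGE (:Entity {uid: row.uid})", rows=nodes)
    # ...and one more creates every edge between them.
    session.run(
        "UNWIND $rows AS row "
        "MATCH (a:Entity {uid: row.src}), (b:Entity {uid: row.dst}) "
        "MERGE (a)-[:RELATES_TO {kind: row.rel}]->(b)",
        rows=edges)

driver.close()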
2. Starting from the topologically highest or lowest point in the graph, create the graph as a series of paths
If you have lots of different edge labels and node labels, this might involve writing a lot of Cypher logic combining UNWIND and the conditional-FOREACH trick (FOREACH (_ IN CASE WHEN r.label = 'SomeLabel' THEN [1] ELSE [] END | CREATE (n:SomeLabel {node_unique_id: x}) ...)), but if the graph is very hierarchical you could also use Python to keep track of which nodes already have all their lower nodes and relationships created, and then use that knowledge to limit the size of the paths that get sent to Neo4J in a query.
3. Use APOC to import the whole graph
Another option, which may or may not fit your use case and may or may not be more performant would be to export the graph to GraphML using NetworkX and then use the APOC GraphML import tool.
Again, it's hard to offer a precise solution without seeing all your data, but I hope this is somewhat useful as a steer in the right direction! Happy to help / answer any other questions based on more data.
There is a lot going on here so I'll try to address this in smaller questions
Would an OGM library like Py2neo's OGM be better
With any ORM/OGM library, the reality is that you can always get better performance by bypassing it and delving into the belly of the beast. That is not really the ORM's entire job, though. An ORM is meant to save you time and effort by making relatively efficient DB use easy.
So it depends, if you want best performance, skip the ORM, and invest your time working on as low a level as you can (*Requires advanced low level knowledge of the beast you are working with, and a lot of your time). Otherwise, an ORM library is usually your best bet.
Our code is too slow, and scales poorly
Databases are complex. If at all possible, I would recommend bringing someone (or several someones) on board to be a company-wide database admin/expert. (This is harder when you don't already have one who can vet whether new hires actually know what they are talking about.)
Assuming that is not an option, here are some things to consider.
IO is expensive. Especially over the network. Minimize data that has to be sent in either direction. (This is why you page return results. Only return the data you need, as you actually need it)
Caveat to that, creating request connections is very expensive. Minimize calls to the DB. (Have fun balancing the two ^_^) (Note: ORMs usually have built in mechanics to only commit what has changed)
Get to the data you want fast. Create indexes in the database to vastly improve fetch speed. The more unique and consistent the id is, the better.
Caveat: indexes have to be updated on writes that alter a value in them. So indexes reduce write speed and eat more memory to gain read speed. Minimize indexes. (See the small sketch after this list.)
Transactions are a memory operation. Committing a transaction is a disk IO operation. This is why batch jobs are far more efficient.
Caveat, Memory isn't infinite. Keep your jobs a reasonable size.
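As a small illustration of the index point, a single index on the lookup key (syntax for Neo4j 4.x and later; the label and property names are placeholders) turns repeated MERGE/MATCH lookups into index hits instead of scans:

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    session.run("CREATE INDEX entity_uid IF NOT EXISTS FOR (n:Entity) ON (n.uid)")
driver.close()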
As you can probably tell, scaling DB operations to production levels is not fun. It's too easy to burn yourself over-optimizing on any axis, and this is just surface level over simplifications.
For prototyping, we were storing these data as sequences of grammatical triples
Less a question and more a statement, but different types of databases have different strengths and weaknesses. Schema-less DBs are more specialized for cache stores; graph DBs are specialized for querying based on relationships (edges); relational DBs are specialized for fetching/updating records (tables); and triplestores are more specialized for, well, triples (RDF). (Etc.; there are more types.)
I mention this because it sounds like your data might be mostly "write once, read many". In this case, you probably actually should be using a triplestore. You can use any DB type for anything, but picking the best DB requires you to know how you use your data, and how that use could possibly evolve.
Must we really just implement every method and benchmark for performance?
Well, this is part of why stored procedures are so important. ORMs help abstract this part, and having an in house domain expert would really help. It could just be that you are pushing the limits of what 1 machine can do. Maybe you just need to upgrade to a cluster; or maybe you have horrible code inefficiencies that have you touching a node 10k times in 1 save operation when no (or 1) value changed. To be honest though, bench-marking doesn't do much unless you know what you are looking for. For example, usually the difference between 5 hours and 0.5 seconds could be as simple as creating 1 index.
(To be fair, while buying bigger and better database servers/clusters may be the inefficient solution, it is sometimes the most cost-effective compared to the salary of one database admin. So, again, it depends on your priorities. And I'm sure your boss would probably prioritize differently from what you'd like.)
TL;DR
You should hire a domain expert to help you.
If that is not an option, go to the bookstore (or Google), pick up Databases for Dummies (or some hands-on online database tutorial classes), and become the domain expert yourself. (Which you can then use to boost your worth to the company.)
If you don't have time for that, probably your only saving grace would be to just upgrade your hardware to solve the problem with brute force. (*As long as growth isn't exponential)

Python - go beyond RAM limits?

I'm trying to analyze text, but my Mac's RAM is only 8 GB, and the RidgeRegressor just stops after a while with Killed: 9. I reckon this is because it needs more memory.
Is there a way to disable the stack size limiter so that the algorithm could use some kind of swap memory?
You will need to do it manually.
There are probably two different core-problems here:
A: holding your training-data
B: training the regressor
For A, you can try numpy's memmap which abstracts swapping away.
As an alternative, consider preparing your data to HDF5 or some DB. For HDF5, you can use h5py or pytables, both allowing numpy-like usage.
For B: it's a good idea to use some out-of-core ready algorithm. In scikit-learn those are the ones supporting partial_fit.
Keep in mind that this training process decomposes into at least two new concerns:
Memory efficiency: swapping is slow, so you don't want to use something which holds N^2 auxiliary memory during learning.
Efficient convergence.
The algorithms in the link above should be okay for both.
SGDRegressor can be parameterized to resemble RidgeRegression.
Also: you might need to call partial_fit manually, obeying the rules of the algorithm (often some kind of random ordering is needed for the convergence proofs). The problem with abstracting swapping away is: if your regressor does a permutation in each epoch without knowing how costly that is, you might be in trouble!
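A rough sketch of how those pieces fit together (the file names and shapes are placeholders for data you have already written to disk; SGDRegressor with penalty='l2' is the usual stand-in for ridge regression):

import numpy as np
from sklearn.linear_model import SGDRegressor

n_samples, n_features, chunk = 1_000_000, 100, 10_000

# Memory-mapped arrays: the OS pages the data in and out instead of holding it all in RAM.
X = np.memmap("X.dat", dtype="float32", mode="r", shape=(n_samples, n_features))
y = np.memmap("y.dat", dtype="float32", mode="r", shape=(n_samples,))

reg = SGDRegressor(penalty="l2", alpha=1e-4)
for epoch in range(5):
    # shuffle the chunk order each epoch, as the convergence arguments usually assume
    for i in np.random.permutation(n_samples // chunk):
        sl = slice(i * chunk, (i + 1) * chunk)
        reg.partial_fit(X[sl], y[sl])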
Because the problem itself is quite hard, there are some special libraries built for this, while sklearn needs some more manual work as explained. One of the most extreme ones (a lot of crazy tricks) might be vowpal_wabbit (where IO is often the bottleneck!). Of course there are other popular libs like pyspark, serving a slightly different purpose (distributed computing).

Can I separate Python set update into several threads?

I'm doing some brute-force computing and putting the results into a set all_data. Computing chunks of data gives a list of numbers new_data, which I want to add to the big set: all_data.update(new_data). Now while the computational part is easily made parallel by means of multiprocessing.Pool.map, the update part is slow.
Obviously, there is a problem if one has two identical elements in new_data that are absent from all_data, and tries to add them at the same moment. But if we assume new_data to be a set as well, is there still a problem? The only problem I can see is the way sets are organized in memory, so the question is:
Is there a way to organize a set structure that allows simultaneous addition of elements? If yes, is it realized in Python?
In pure Python, no. Due to the GIL, all Python code (including manipulations of built-in structures) is effectively single-threaded. The very existence of the GIL is justified by eliminating the need to lock access to them.
Even the result fetching in multiprocessing.Pool.map is already sequential.
The ParallelProgramming article in the SciPy wiki outlines related options for parallel code, but I didn't find anything there directly about off-the-shelf concurrent data structures.
It does, however, mention a few C extensions under "Sophisticated parallelization" which do support parallel I/O.
Note that a set is actually a hash table which by its very nature cannot be updated in parallel (even with fine-grained locking, sub-operations of each type (look-up, insertion incl. collision resolution) have to be sequenced). So you need to either
replace it with some other data organization that can be parallelized better, and/or
speed up the update operations, e.g. by using shared memory
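In practice the usual pattern is therefore to keep the set in the parent process and merge worker results as they arrive, for example (a sketch; compute_chunk stands in for the brute-force step):

from multiprocessing import Pool

def compute_chunk(seed):
    # placeholder for the real brute-force computation
    return {seed * k % 1000 for k in range(100)}

if __name__ == "__main__":
    all_data = set()
    with Pool() as pool:
        # imap_unordered streams results back as each worker finishes its chunk
        for new_data in pool.imap_unordered(compute_chunk, range(50)):
            all_data.update(new_data)
    print(len(all_data))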

What scalability issues are associated with NetworkX?

I'm interested in network analysis on large networks with millions of nodes and tens of millions of edges. I want to be able to do things like parse networks from many formats, find connected components, detect communities, and run centrality measures like PageRank.
I am attracted to NetworkX because it has a nice api, good documentation, and has been under active development for years. Plus because it is in python, it should be quick to develop with.
In a recent presentation (the slides are available on github here), it was claimed that:
Unlike many other tools, NX is designed to handle data on a scale
relevant to modern problems...Most of the core algorithms in NX rely on extremely fast legacy code.
The presentation also states that the base algorithms of NetworkX are implemented in C/Fortran.
However, looking at the source code, it looks like NetworkX is mostly written in python. I am not too familiar with the source code, but I am aware of a couple of examples where NetworkX uses numpy to do heavy lifting (which in turn uses C/Fortran to do linear algebra). For example, the file networkx/networkx/algorithms/centrality/eigenvector.py uses numpy to calculate eigenvectors.
Does anyone know if this strategy of calling an optimized library like numpy is really prevalent throughout NetworkX, or if just a few algorithms do it? Also can anyone describe other scalability issues associated with NetworkX?
Reply from NetworkX Lead Programmer
I posed this question on the NetworkX mailing list, and Aric Hagberg replied:
The data structures used in NetworkX are appropriate for scaling to
large problems (e.g. the data structure is an adjacency list). The
algorithms have various scaling properties but some of the ones you
mention are usable (e.g. PageRank, connected components, are linear
complexity in the number of edges).
At this point NetworkX is pure Python code. The adjacency structure
is encoded with Python dictionaries which provides great flexibility
at the expense of memory and computational speed. Large graphs will
take a lot of memory and you will eventually run out.
NetworkX does use NumPy and SciPy for algorithms that are primarily
based on linear algebra. In that case the graph is represented
(copied) as an adjacency matrix using either NumPy matrices or SciPy
sparse matrices. Those algorithms can benefit from the legacy C and
FORTRAN code that is used under the hood in NumPy and SciPY.
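You can see the copy the reply describes directly (a sketch; the exact sparse type returned depends on the NetworkX and SciPy versions):

import networkx as nx

G = nx.gnm_random_graph(1000, 5000)
A = nx.adjacency_matrix(G)      # SciPy sparse adjacency structure: a copy of the graph
print(A.shape, A.nnz)
pr = nx.pagerank(G)             # one of the algorithms that works on a sparse matrix internally (in recent versions)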
This is an old question, but I think it is worth mentioning that graph-tool has a very similar functionality to NetworkX, but it is implemented in C++ with templates (using the Boost Graph Library), and hence is much faster (up to two orders of magnitude) and uses much less memory.
Disclaimer: I'm the author of graph-tool.
Your big issue will be memory. Python simply cannot handle tens of millions of objects without jumping through hoops in your class implementation: the memory overhead of that many objects is too high, you hit 2 GB, and 32-bit code won't work. There are ways around it, such as using slots, arrays, or NumPy. It should be OK, because networkx was written for performance, but if a few things don't work, I would check your memory usage.
As for scaling, algorithms are basically the only thing that matters with graphs. Graph algorithms tend to have really ugly scaling if they are done wrong, and they are just as likely to be done right in Python as any other language.
The fact that NetworkX is mostly written in Python does not mean that it is not scalable, nor does it claim perfection. There is always a trade-off. If you throw more money at your "machines", you'll get as much scalability as you want, plus the benefits of using a Pythonic graph library.
If not, there are other solutions (here and here), which may consume less memory (benchmark and see; I think igraph is fully C-backed, so it will), but you may miss the Pythonic feel of NX.

Memory efficiency: One large dictionary or a dictionary of smaller dictionaries?

I'm writing an application in Python (2.6) that requires me to use a dictionary as a data store.
I am curious as to whether or not it is more memory efficient to have one large dictionary, or to break that down into many (much) smaller dictionaries, then have an "index" dictionary that contains a reference to all the smaller dictionaries.
I know there is a lot of overhead in general with lists and dictionaries. I read somewhere that Python internally allocates enough space to hold the dictionary/list's number of items to the power of 2.
I'm new enough to Python that I'm not sure whether there are other unexpected internal complexities/surprises like that, not apparent to the average user, that I should take into consideration.
One of the difficulties is knowing how the power-of-2 system counts "items". Is each key:value pair counted as one item? That seems important to know, because if you have a 100-item monolithic dictionary, then space for 100^2 items would be allocated. If you have 100 single-item dictionaries (one key:value pair each), then each dictionary would only allocate 1^2 (i.e. no extra allocation)?
Any clearly laid out information would be very helpful!
Three suggestions:
Use one dictionary.
It's easier, it's more straightforward, and someone else has already optimized this problem for you. Until you've actually measured your code and traced a performance problem to this part of it, you have no reason not to do the simple, straightforward thing.
Optimize later.
If you are really worried about performance, then abstract the problem: make a class to wrap whatever lookup mechanism you end up using, and write your code to use this class. You can change the implementation later if you find you need some other data structure for greater performance.
Read up on hash tables.
Dictionaries are hash tables, and if you are worried about their time or space overhead, you should read up on how they're implemented. This is basic computer science. The short of it is that hash tables are:
average case O(1) lookup time
O(n) space (Expect about 2n, depending on various parameters)
I do not know where you read that they were O(n^2) space, but if they were, then they would not be in widespread, practical use as they are in most languages today. There are two advantages to these nice properties of hash tables:
O(1) lookup time implies that you will not pay a cost in lookup time for having a larger dictionary, as lookup time doesn't depend on size.
O(n) space implies that you don't gain much of anything from breaking your dictionary up into smaller pieces. Space scales linearly with number of elements, so lots of small dictionaries will not take up significantly less space than one large one or vice versa. This would not be true if they were O(n^2) space, but lucky for you, they're not.
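You can check the linear scaling directly (sys.getsizeof reports only the dict's own table, not the keys and values it references):

import sys

for n in (10, 100, 1_000, 10_000, 100_000):
    d = {i: None for i in range(n)}
    print(n, sys.getsizeof(d))       # grows roughly linearly, nothing like n**2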
Here are some more resources that might help:
The Wikipedia article on Hash Tables gives a great listing of the various lookup and allocation schemes used in hashtables.
The GNU Scheme documentation has a nice discussion of how much space you can expect hashtables to take up, including a formal discussion of why "the amount of space used by the hash table is proportional to the number of associations in the table". This might interest you.
Here are some things you might consider if you find you actually need to optimize your dictionary implementation:
Here is the C source code for Python's dictionaries, in case you want ALL the details. There's copious documentation in here:
dictobject.h
dictobject.c
Here is a python implementation of that, in case you don't like reading C.
(Thanks to Ben Peterson)
The Java Hashtable class docs talk a bit about how load factors work, and how they affect the space your hash takes up. Note there's a tradeoff between your load factor and how frequently you need to rehash. Rehashes can be costly.
If you're using Python, you really shouldn't be worrying about this sort of thing in the first place. Just build your data structure the way it best suits your needs, not the computer's.
This smacks of premature optimization, not performance improvement. Profile your code if something is actually bottlenecking, but until then, just let Python do what it does and focus on the actual programming task, and not the underlying mechanics.
"Simple" is generally better than "clever", especially if you have no tested reason to go beyond "simple". And anyway "Memory efficient" is an ambiguous term, and there are tradeoffs, when you consider persisting, serializing, cacheing, swapping, and a whole bunch of other stuff that someone else has already thought through so that in most cases you don't need to.
Think "Simplest way to handle it properly" optimize much later.
Premature optimization bla bla, don't do it bla bla.
I think you're mistaken about what the power-of-two extra allocation does. I think it's just a multiplier of two, i.e. x*2, not x^2.
I've seen this question a few times on various python mailing lists.
With regards to memory, here's a paraphrased version of one such discussion (the post in question wanted to store hundreds of millions integers):
A set() is more space efficient than a dict(), if you just want to test for membership
gmpy has a bitvector type class for storing dense sets of integers
Dicts are kept between 50% and 30% empty, and an entry is about ~12 bytes (though the true amount will vary by platform a bit).
So, the fewer objects you have, the less memory you're going to be using, and the fewer lookups you're going to do (since you'll have to lookup in the index, then a second lookup in the actual value).
Like others said, profile to see your bottlenecks. Keeping a membership set() and a value dict() might be faster, but you'll be using more memory.
I'd also suggest reposting this to a python specific list, such as comp.lang.python, which is full of much more knowledgeable people than myself who would give you all sorts of useful information.
If your dictionary is so big that it does not fit into memory, you might want to have a look at ZODB, a very mature object database for Python.
The 'root' of the db has the same interface as a dictionary, and you don't need to load the whole data structure into memory at once; e.g., you can iterate over only a portion of the structure by providing start and end keys.
It also provides transactions and versioning.
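A minimal sketch of that interface (the file name and keys are placeholders; a BTree is what gives you the start/end-key iteration):

import transaction
import ZODB, ZODB.FileStorage
from BTrees.OOBTree import OOBTree

storage = ZODB.FileStorage.FileStorage("data.fs")
db = ZODB.DB(storage)
conn = db.open()
root = conn.root()                      # behaves like a dictionary

if "index" not in root:
    root["index"] = OOBTree()
root["index"]["some-key"] = {"Property": "value"}
transaction.commit()

# Iterate over only a slice of the keys, without loading everything.
for key, value in root["index"].items(min="a", max="m"):
    print(key, value)

conn.close()
db.close()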
Honestly, you won't be able to tell the difference either way, in terms of either performance or memory usage. Unless you're dealing with tens of millions of items or more, the performance or memory impact is just noise.
From the way you worded your second sentence, it sounds like the one big dictionary is your first inclination, and matches more closely with the problem you're trying to solve. If that's true, go with that. What you'll find about Python is that the solutions that everyone considers 'right' nearly always turn out to be those that are as clear and simple as possible.
Oftentimes, dictionaries of dictionaries are useful for reasons other than performance; i.e., they allow you to store context information about the data without having extra fields on the objects themselves, and they make querying subsets of the data faster.
In terms of memory usage, it would stand to reason that one large dictionary will use less RAM than multiple smaller ones. Remember, if you're nesting dictionaries, each additional layer of nesting will roughly double the number of dictionaries you need to allocate.
In terms of query speed, multiple dicts will take longer due to the increased number of lookups required.
So I think the only way to answer this question is for you to profile your own code. However, my suggestion is to use the method that makes your code the cleanest and easiest to maintain. Of all the features of Python, dictionaries are probably the most heavily tweaked for optimal performance.
