I'm wondering what are the pros/cons of nested dicts versus hashing a tuple in Python?
The context is a small script to assign "workshops" to attendees at a conference.
Each workshop has several attributes (e.g. week, day, subject etc.). We use four of these attributes in order to assign workshops to each attendee - i.e. each delegate will also have a week/day/subject etc.
Currently, I'm using a nested dictionary to store my workshops:
workshops = defaultdict(lambda: defaultdict(lambda: defaultdict(lambda: defaultdict(dict))))
with open(input_filename) as input_file:
    workshop_reader = csv.DictReader(input_file, dialect='excel')
    for row in workshop_reader:
        workshops[row['Week ']][row['Stream']][row['Strand']][row['Day']] = row
return workshops
I can then use the above data-structure to assign each attendee their workshop.
The issue is that later I need to iterate through every workshop and assign an ID (this ID is stored in a different system), which necessitates unwrapping the structure layer by layer.
First question - is there some other way of creating a secondary index to the same values, using a string (workshop name) as a key? I.e. I'd still have the four-level nested dicts, but I could also search for individual entries based on just a name.
Secondly - how can I achieve a similar effect using tuples as the key? Are there any advantages you can think of using this approach? Would it be much cleaner, or easier to use? (This whole unwrapping thing is a bit of a pain, and I don't think it's very approachable).
Or are there any other data structures that you could recommend that might be superior/easier to access/manipulate?
Thanks,
Victor
class Workshop(object):
    def __init__(self, week, stream, strand, day, name):
        self.week = week
        self.stream = stream
        self.day = day
        self.strand = strand
        self.name = name

...
for row in workshop_reader:
    workshop = Workshop(...)
    workshops[workshop.name] = workshop  # key by the workshop's own name, not the literal string 'name'
That works only if the name is unique across workshops (that is, no two workshops share a name).
Also, you'll be able to easily assign an ID to each Workshop object in the workshops dictionary.
Generally, if you need to nest dictionaries, you should use classes instead. Nested dictionaries get complicated to keep track of when you are debugging. With classes you can tell different types of dictionaries apart, so it gets easier. :)
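On the tuple-key part of the question, here is a minimal sketch of a flat dictionary keyed by a (week, stream, strand, day) tuple, plus a secondary index by name that points at the same row objects. The column names (including "Name"), the sample rows and the function name load_workshops are all assumptions made for illustration, not taken from the real CSV.

```python
import csv
import io

# Hypothetical sample data; the real file's column names may differ.
SAMPLE = """Week,Stream,Strand,Day,Name
1,A,Maths,Mon,Algebra basics
1,A,Maths,Tue,Geometry
2,B,Art,Mon,Watercolour
"""

def load_workshops(csv_file):
    by_key = {}   # (week, stream, strand, day) -> row
    by_name = {}  # workshop name -> the very same row object
    for row in csv.DictReader(csv_file):
        key = (row["Week"], row["Stream"], row["Strand"], row["Day"])
        by_key[key] = row
        by_name[row["Name"]] = row  # assumes names are unique
    return by_key, by_name

by_key, by_name = load_workshops(io.StringIO(SAMPLE))

# Flat iteration: no layer-by-layer unwrapping is needed to assign IDs.
for workshop_id, row in enumerate(by_key.values(), start=1):
    row["ID"] = workshop_id

print(by_key[("1", "A", "Maths", "Tue")]["Name"])
print(by_name["Watercolour"]["ID"])
```

Because both dictionaries hold references to the same row, an ID assigned through one index is visible through the other. The trade-off is that a tuple key makes it harder to ask for "everything in week 1" without scanning all keys, which the nested layout gives you directly.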
Related
I have the following code that works perfectly. It searches a txt file for an ID number, and if it exists, returns the first and last name.
full listing: https://repl.it/Jau3/0
import csv
#==========Search by ID number. Return Just the Name Fields for the Student
with open("studentinfo.txt","r") as f:
    studentfileReader=csv.reader(f)
    id=input("Enter Id:")
    for row in studentfileReader:
        for field in row:
            if field==id:
                currentindex=row.index(id)
                print(row[currentindex+1]+" "+row[currentindex+2])
File contents
001,Joe,Bloggs,Test1:99,Test2:100,Test3:33
002,Ash,Smith,Test1:22,Test2:63,Test3:99
For teaching and learning purposes, I would like to know if there are any other methods to achieve the same thing (elegant, simple, pythonic) or if perhaps this is indeed the best solution?
My question arises from the fact that it seems possible that there may be an inbuilt method or some function that more efficiently retrieves the current index and searches for fields.....perhaps not though.
Thanks in advance for the discussion and any explanations that I will accept as answers.
If the list keeps this format you could access the field of the row by index to condense it a bit.
for row in studentfileReader:
    if row[0]==id:
        print(row[1]+" "+row[2])
It also avoids a false match when the ID appears somewhere other than the first field, e.g. "Test1:002".
I don't really know if there is such a thing as a "pythonic" way of finding a record by a matching key, but here is an example that adds a couple of interesting things over your own example and the other answers, like the use of generators and comprehensions. Besides, what is more pythonic than a one-liner?
any is a Python built-in; it might interest you to know that it exists, since it tells you whether any record matches the ID you are searching for.
with open("studentinfo.txt","r") as f:
    sid=input("Enter Id:")
    print(any(line.split(",")[0] == sid for line in f))
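Since any only reports whether a match exists, a related built-in, next, can return the matching row itself. The following is a minimal sketch under the assumption that the ID is always the first field; the function name find_name is made up, and the sample rows are the two lines from the question, read from a string instead of studentinfo.txt.

```python
import csv
import io

# The two sample rows from the question's file contents.
SAMPLE = (
    "001,Joe,Bloggs,Test1:99,Test2:100,Test3:33\n"
    "002,Ash,Smith,Test1:22,Test2:63,Test3:99\n"
)

def find_name(lines, sid):
    """Return 'First Last' for the first row whose ID matches, else None."""
    rows = csv.reader(lines)
    # next() stops at the first match; None is the default if nothing matches.
    match = next((row for row in rows if row and row[0] == sid), None)
    if match is None:
        return None
    return match[1] + " " + match[2]

print(find_name(io.StringIO(SAMPLE), "002"))
print(find_name(io.StringIO(SAMPLE), "999"))
```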
You should probably consider using csv.DictReader for this usage, since you have tabular data with consistent columns.
If you only want to retrieve data once then you can simply iterate through the file until the first occurrence of the desired id, as follows;
import csv

def search_by_student_id(id):
    with open('studentinfo.txt','r') as f:
        # Column order in the file is: id, first name, surname, then the test results.
        reader = csv.DictReader(f, ['id', 'first_name', 'surname'],
                                restkey='results')
        for line in reader:
            if line['id'] == id:
                return line['first_name'], line['surname']

print(search_by_student_id('001'))
# ('Joe', 'Bloggs')
If however, you plan on looking up entries from this data multiple times it would pay to create a dictionary, which is more expensive to create, but significantly reduces lookup times. Then you could look up data like this;
def build_student_id_dict():
    with open('studentinfo.txt','r') as f:
        # Column order in the file is: id, first name, surname, then the test results.
        reader = csv.DictReader(f, ['id', 'first_name', 'surname'],
                                restkey='results')
        student_id_dict = {}
        for line in reader:
            student_id_dict[line['id']] = line['first_name'], line['surname']
    return student_id_dict

student_by_id_dict = build_student_id_dict()
print(student_by_id_dict['002'])
# ('Ash', 'Smith')
You could read it into a list or, even better in terms of lookup time, a dictionary, and then simply use the following:
if id in l or if id in d (l or d being the list / dictionary respectively)
An interesting discussion however, on whether this would be the simplest method, or if your existing solution is.
Dictionaries:
# retrieve the value for a particular key
value = d[key]
A note on time complexity and efficiency in the use of dictionaries:
Python mappings must be able to, given a particular key object, determine which (if any) value object is associated with a given key. One simple approach would be to store a LIST of (key, value) pairs, and then search the list sequentially every time a value was requested. Immediately you can see that this would be very slow with a large number of items - in complexity terms, this algorithm would be O(n), where n is referring to the number of items in the mapping.
Python's dictionary is the answer here, although it isn't always the best solution - the implementation reduces the average complexity of dictionary lookups to O(1) by requiring that key objects provide a "hash" function. In your case and because structurally the data you are dealing with is not terribly complex, it may be easiest to stick to your existing solution, although a dictionary should certainly be considered if it is time efficiency you are after.
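To make the O(n) versus O(1) contrast concrete, here is a small sketch that does the same lookup both ways. The records (including the third one) and the name lookup_sequential are made up for illustration; the step counter is only there to show how the sequential search grows with the position of the key.

```python
# Hypothetical (key, value) pairs standing in for the student records.
pairs = [("001", "Joe Bloggs"), ("002", "Ash Smith"), ("003", "Sam Jones")]

def lookup_sequential(pairs, key):
    """O(n): scan the list until the key is found, counting comparisons."""
    for steps, (k, v) in enumerate(pairs, start=1):
        if k == key:
            return v, steps
    raise KeyError(key)

# Building the dict costs one pass; each later lookup hashes the key
# instead of scanning, which is O(1) on average.
students = dict(pairs)

print(lookup_sequential(pairs, "003"))
print(students["003"])
```

For a two-line file the difference is irrelevant; it starts to matter when the same data is searched many times.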
The task I have at hand is to parse a large text file (several 100K rows) and accumulate some statistics, which will then be visualized in plots. Each row contains results of some prior analysis.
I wrote a custom class to define the objects that are to be accumulated. The class contains 2 string fields, 3 sets and 2 integer counters. As such there is an __init__(self, name) which initializes a new object with name and empty fields, and a method called addRow() which adds information into the object. The sets accumulate data to be associated with this object and the counters keep track of a couple of conditions.
My original idea was to iterate over the rows of the file and call a method like parseRow() in main
reader = csv.reader(f)
acc = {} # or set()
for row in reader:
    parseRow(row,acc)
which would look something like:
def parseRow(row, acc):
    if row[id] not in acc: # row[id] is the column where the object names/ids are
        a = MyObj(row[id])
    else:
        a = acc.get(row[id]) # or equivalent
    a.addRow(...)
The issue here is that the accumulating collection acc cannot be a set since sets are apparently not indexable in Python. Edit: for clarification, by indexable I didn't mean getting the nth element but rather being able to retrieve a specific element.
One workaround would be to have a dict that has {obj_name : obj} mapping but it feels like an ugly solution. Considering the elegance of the language otherwise, I guess there is a better solution to this. It's surely not a particularly rare situation...
Any suggestions?
You could also try an ordered set, which is a set that also keeps its elements in order.
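For what it's worth, the {obj_name: obj} mapping mentioned in the question is the usual way to retrieve a specific accumulator by name. Below is a minimal sketch of that approach. The MyObj fields are cut down and hypothetical (the real class has more sets and counters), and it assumes the object name sits in the first column of each row.

```python
import csv
import io

class MyObj:
    """Hypothetical, cut-down accumulator; the real class has more fields."""
    def __init__(self, name):
        self.name = name
        self.values = set()
        self.row_count = 0

    def addRow(self, value):
        self.values.add(value)
        self.row_count += 1

def parseRow(row, acc):
    name = row[0]  # assumption: the object name/id is in the first column
    obj = acc.get(name)
    if obj is None:
        # First time this name is seen: create the object and remember it.
        obj = acc[name] = MyObj(name)
    obj.addRow(row[1])

acc = {}
for row in csv.reader(io.StringIO("a,x\nb,y\na,z\na,x\n")):
    parseRow(row, acc)

print(sorted(acc["a"].values), acc["a"].row_count)
```

When only the objects are needed afterwards, acc.values() gives them back without the names.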
First of all thank you for taking the time to look at my problem. Rather than simply describing the solution I have in mind for the problem I have to solve, I thought it best to outline the problem also in order to enable alternative solution ideas to be suggested. It is more than likely that there is a better way to achieve this solution.
The problem I have:
I generate lists of names with associated scores, ranks and other values. These lists are generated daily but have to change as the day progresses, because some names need to be removed. Currently the lists are produced in Excel-based sheets, which contain the following data in the following format:
(Unique List Title)
(Unique Name in list),(Rank),(Score),(Calculated Numeric Value)
(Unique Name in list),(Rank),(Score),(Calculated Numeric Value)
(Unique Name in list),(Rank),(Score),(Calculated Numeric Value)
(Unique List Title)
(Unique Name in list),(Rank),(Score),(Calculated Numeric Value)
(Unique Name in list),(Rank),(Score),(Calculated Numeric Value)
(Unique Name in list),(Rank),(Score),(Calculated Numeric Value)
(Unique Name in list),(Rank),(Score),(Calculated Numeric Value)
For example;
Mrs Dodgsons class
Rosie,1,123.8,5
James,2,122.6,7
Chris,3,120.4,12
Dr Clements class
Hannah,1,126.9,2.56
Gill,2,124.54,6.89
Jack,3,122.04,15.62
Jamie,4,121.09,20.91
Now what I have is a separate list of users who need removing from the above excel generated lists (don't worry the final product of this little project is not to re-save a modified excel doc), this list is generated via a web scraper which is updated every two minutes. The method I currently perceive as a potentially viable solution to this problem is to use a piece of code which saves each list in the CSV as a SET (if this is possible) then upon finding a Unique Name it would then delete them from the set/s in which they occur.
My questions to the python forum are;
Is the proposed methodology viable with regard to producing multiple uniquely named SETs (up to 60 per day)?
Is there a better method of achieving the same result?
Any help or comments would be greatly appreciated
Best regards AEA
It will probably be easier for you to use dictionaries rather than sets, as dictionaries, unlike sets, provide a natural way of associating items of data with each member of a collection.
Here's one approach, in which the data for each class is stored in a dictionary, each key of which is a student's name, and the values of which are lists with the scores, etc., of each student:
data = {
    "Mrs Dodgson": {
        "Rosie": [1,123.8,5],
        "James": [2,122.6,7],
        "Chris": [3,120.4,12]
    },
    "Dr Clement": {
        "Hannah": [1,126.9,2.56],
        "Gill": [2,124.54,6.89],
        "Jack": [3,122.04,15.62],
        "Jamie": [4,121.09,20.91]
    }
}
to_remove = ["Jamie", "Rosie"]
# Mrs. Dodgson's class data, initially.
print(data["Mrs Dodgson"])
# Now remove the student data.
for cls_data in data.values():
    for student in to_remove:
        try:
            del cls_data[student]
        except KeyError:
            pass
print(data["Mrs Dodgson"])
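The same dictionary-of-dictionaries layout can be built straight from the text format shown in the question rather than typed in by hand. This is only a sketch: it assumes that a row with a single field is a list title and that every other row has exactly four fields, and the function name parse_lists is made up. The sample is the example data from the question.

```python
import csv
import io

SAMPLE = """Mrs Dodgsons class
Rosie,1,123.8,5
James,2,122.6,7
Chris,3,120.4,12
Dr Clements class
Hannah,1,126.9,2.56
Gill,2,124.54,6.89
Jack,3,122.04,15.62
Jamie,4,121.09,20.91
"""

def parse_lists(lines):
    data = {}
    current = None
    for row in csv.reader(lines):
        if not row:
            continue                      # skip blank lines
        if len(row) == 1:                 # assumption: one field means a title
            current = data.setdefault(row[0], {})
        else:
            name, rank, score, value = row
            current[name] = [int(rank), float(score), float(value)]
    return data

data = parse_lists(io.StringIO(SAMPLE))

# Remove scraped names from every list; pop() with a default never raises.
for members in data.values():
    for name in ("Jamie", "Rosie"):
        members.pop(name, None)

print(sorted(data["Mrs Dodgsons class"]))
print(sorted(data["Dr Clements class"]))
```

Using pop(name, None) is an alternative to the try/except KeyError pattern and behaves the same when a name is absent.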
I'm a little confused about some of the data structures in Python.
Could any expert give some rules of thumb to help me get out of this mess?
They are all covered in:
Python - Data Structures
List - when you have data that has some order
Tuple - when ordered data is to be immutable
Dictionary - when data is related by key - value pairs
My take on the most important concepts regarding list/tuple/dict:
List - When you have a collection of items and may want to add/remove items, rearrange their order, and so on.
Tuple - When you have a collection of items and do NOT want to add/remove items, or rearrange their order. Realizing the usefulness of this comes with experience.
Dictionary - When you want to map certain keys to certain values, like a dictionary of words. The typical use case is when you have some kind of identifier (the key) such as a person's name:
>>> addresses = {}
>>> addresses['john'] = 'somewhere' # Set john's address
>>> print("John's address is", addresses['john']) # Retrieve it
I am learning Python for a class now, and we just covered tuples as one of the data types. I read the Wikipedia page on it, but, I could not figure out where such a data type would be useful in practice. Can I have some examples, perhaps in Python, where an immutable set of numbers would be needed? How is this different from a list?
Tuples are used whenever you want to return multiple results from a function.
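A small sketch of the multiple-return case (the function name min_max is made up for illustration): the function returns one tuple, and the caller unpacks it into separate names.

```python
def min_max(values):
    # The comma builds a tuple, so this returns a single (smallest, largest) pair.
    return min(values), max(values)

result = min_max([3, 1, 4, 1, 5])
lowest, highest = result  # tuple unpacking

print(result)
print(lowest, highest)
```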
Since they're immutable, they can be used as keys for a dictionary (lists can't).
Tuples make good dictionary keys when you need to combine more than one piece of data into your key and don't feel like making a class for it.
a = {}
a[(1,2,"bob")] = "hello!"
a[("Hello","en-US")] = "Hi There!"
I've used this feature primarily to create a dictionary with keys that are coordinates of the vertices of a mesh. However, in my particular case, the exact comparison of the floats involved worked fine which might not always be true for your purposes [in which case I'd probably convert your incoming floats to some kind of fixed-point integer]
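The fixed-point idea mentioned above could look roughly like this. It is a sketch, not the code from the mesh project: the function name grid_key and the scale of 1000 (three decimal places) are assumptions chosen for the example.

```python
def grid_key(x, y, z, scale=1000):
    """Turn float coordinates into a tuple of integers usable as a dict key.

    scale is a hypothetical choice: 1000 keeps three decimal places.
    """
    return (round(x * scale), round(y * scale), round(z * scale))

vertices = {}
vertices[grid_key(0.1 + 0.2, 0.0, 1.0)] = "v0"

# Exact float comparison fails because 0.1 + 0.2 is not exactly 0.3 ...
print((0.1 + 0.2, 0.0, 1.0) == (0.3, 0.0, 1.0))
# ... but both coordinates land on the same integer key.
print(grid_key(0.3, 0.0, 1.0) in vertices)
```

Coordinates that sit very close to a rounding boundary can still fall into different cells, so the scale has to be picked with the data's precision in mind.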
The best way to think about it is:
A tuple is a record whose fields don't have names.
You use a tuple instead of a record when you can't be bothered to specify the field names.
So instead of writing things like:
person = {"name": "Sam", "age": 42}
name, age = person["name"], person["age"]
Or the even more verbose:
class Person:
def __init__(self, name, age):
self.name = name
self.age = age
person = Person("Sam", 42)
name, age = person.name, person.age
You can just write:
person = ("Sam", 42)
name, age = person
This is useful when you want to pass around a record that has only a couple of fields, or a record that is only used in a few places. In that case specifying a whole new record type with field names (in Python, you'd use an object or a dictionary, as above) could be too verbose.
Tuples originate from the world of functional programming (Haskell, OCaml, Elm, F#, etc.), where they are commonly used for this purpose. Unlike Python, most functional programming languages are statically typed (a variable can only hold one type of value, and that type is determined at compile time). Static typing makes the role of tuples more obvious. For example, in the Elm language:
type alias Person = (String, Int)
person : Person
person = ("Sam", 42)
This highlights the fact that a particular type of tuple is always supposed to have a fixed number of fields in a fixed order, and each of those fields is always supposed to be of the same type. In this example, a person is always a tuple of two fields, one is a string and the other is an integer.
The above is in stark contrast to lists, which are supposed to be variable length (the number of items is normally different in each list, and you write functions to add and remove items) and each item in the list is normally of the same type. For example, you'd have one list of people and another list of addresses - you would not mix people and addresses in the same list. Whereas mixing different types of data inside the same tuple is the whole point of tuples. Fields in a tuple are usually of different types (but not always - e.g. you could have a (Float, Float, Float) tuple to represent x,y,z coordinates).
Tuples and lists are often nested. It's common to have a list of tuples. You could have a list of Person tuples just as well as a list of Person objects. You can also have a tuple field whose value is a list. For example, if you have an address book where one person can have multiple addresses, you could have a tuple of type (Person, [String]). The [String] type is commonly used in functional programming languages to denote a list of strings. In Python, you wouldn't write down the type, but you could use tuples like that in exactly the same manner, putting a Person object in the first field of a tuple and a list of strings in its second field.
In Python, confusion arises because the language does not enforce any of these practices that are enforced by the compiler in statically typed functional languages. In those languages, you cannot mix different kinds of tuples. For example, you cannot return a (String, String) tuple from a function whose type says that it returns a (String, Integer) tuple. You also cannot return a list when the type says you plan to return a tuple, and vice versa. Lists are used strictly for growing collections of items, and tuples strictly for fixed-size records. Python doesn't stop you from breaking any of these rules if you want to.
In Python, a list is sometimes converted into a tuple for use as a dictionary key, because Python dictionary keys need to be immutable (i.e. constant) values, whereas Python lists are mutable (you can add and remove items at any time). This is a workaround for a particular limitation in Python, not a property of tuples as a computer science concept.
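A quick sketch of that conversion, with made-up variable names: using the list directly as a key raises TypeError, while the tuple built from it works.

```python
path = ["usr", "local", "bin"]  # hypothetical example data
cache = {}

try:
    cache[path] = 1             # lists are unhashable, so this fails
except TypeError as exc:
    error = str(exc)

cache[tuple(path)] = 1          # the tuple copy is hashable

print(error)
print(cache[("usr", "local", "bin")])
```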
So in Python, lists are mutable and tuples are immutable. But this is just a design choice, not an intrinsic property of lists and tuples in computer science. You could just as well have immutable lists and mutable tuples.
In Python (using the default CPython implementation), tuples are also faster than objects or dictionaries for most purposes, so they are occasionally used for that reason, even when naming the fields using an object or dictionary would be clearer.
Finally, to make it even more obvious that tuples are intended to be another kind of record (not another kind of list), Python also has named tuples:
from collections import namedtuple
Person = namedtuple("Person", "name age")
person = Person("Sam", 42)
name, age = person.name, person.age
This is often the best choice - shorter than defining a new class, but the meaning of the fields is more obvious than when using normal tuples whose fields don't have names.
Immutable lists are highly useful for many purposes, but the topic is far too complex to answer here. The main point is that things that cannot change are easier to reason about than things that can change. Most software bugs come from things changing in unexpected ways, so restricting the ways in which they can change is a good way to eliminate bugs. If you are interested, I recommend reading a tutorial for a functional programming language such as Elm, Haskell or Clojure (Elm is the friendliest). The designers of those languages considered immutability so useful that all lists are immutable there. (Instead of changing a list to add or remove an item, you make a new list with the item added or removed. Immutability guarantees that the old copy of the list can never change, so the compiler and runtime can make the code perform well by re-using parts of the old list in the new one and garbage-collecting the left-over parts when they are no longer needed.)
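In Python the same "make a new one instead of changing the old one" style can be imitated with tuples. This is only an illustration with made-up names; unlike the functional languages mentioned above, Python copies the elements rather than sharing structure between the old and new tuple.

```python
original = ("a", "b", "c")

try:
    original[0] = "z"           # tuples do not support item assignment
except TypeError:
    changed_in_place = False
else:
    changed_in_place = True

# "Adding" and "removing" produce new tuples; the original is untouched.
with_item = original + ("d",)
without_b = tuple(x for x in original if x != "b")

print(changed_in_place)
print(original, with_item, without_b)
```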
I like this explanation.
Basically, you should use tuples when there's a constant structure (the 1st position always holds one type of value and the second another, and so forth), and lists should be used for lists of homogeneous values.
Of course there are always exceptions, but this is a good general guideline.
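As a rough illustration of that guideline (the names and ages are made up): each tuple is a fixed-shape record of (name, age), while the list holding them is a homogeneous collection that can grow.

```python
# Each tuple has a constant structure: (name, age).
people = [("Sam", 42), ("Alex", 35)]

# The list is the part that changes size.
people.append(("Kim", 29))

names = [name for name, age in people]
oldest = max(people, key=lambda person: person[1])

print(names)
print(oldest)
```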
Tuples and lists have the same uses in general. Immutable data types in general have many benefits, mostly about concurrency issues.
So, when you have lists that are not volatile in nature and you need to guarantee that no consumer is altering it, you may use a tuple.
Typical examples are fixed data in an application like company divisions, categories, etc. If this data changes, typically a single producer rebuilds the tuple.
I find them useful when you always deal with two or more objects as a set.
A tuple is a sequence of values. The values can be any type, and they are indexed by integers, so in that respect tuples are a lot like lists. The most important difference is that tuples are immutable.
A tuple is a comma-separated list of values:
t = 'p', 'q', 'r', 's', 't'
It is good practice to enclose tuples in parentheses:
t = ('p', 'q', 'r', 's', 't')
A list can always replace a tuple, with respect to functionality (except, apparently, as keys in a dict). However, a tuple can make things go faster. The same is true for, for example, immutable strings in Java -- when will you ever need to be unable to alter your strings? Never!
I just read a decent discussion on limiting what you can do in order to make better programs; Why Why Functional Programming Matters Matters
A tuple is useful for storing multiple values. As you note, a tuple is just like a list that is immutable, i.e. once created you cannot add/remove/swap elements.
One benefit of being immutable is that because the tuple is fixed size it allows the run-time to perform certain optimizations. This is particularly beneficial when a tuple is used in the context of a return value or a parameter to a function.
Use Tuple
If your data should not or does not need to be changed.
Tuples are faster than lists. We should use a Tuple instead of a List if we are defining a constant set of values and all we are ever going to do with it is iterate through it.
If we need an array of elements to be used as dictionary keys, we can use Tuples. As Lists are mutable, they can never be used as dictionary keys.
Furthermore, Tuples are immutable, whereas Lists are mutable. By the same token, Tuples are fixed size in nature, whereas Lists are dynamic.
a_tuple = tuple(range(1000))
a_list = list(range(1000))
a_tuple.__sizeof__() # 8024 bytes
a_list.__sizeof__() # 9088 bytes
more information :
https://jerrynsh.com/tuples-vs-lists-vs-sets-in-python/
In addition to the places where they're syntactically required like the string % operation and for multiple return values, I use tuples as a form of lightweight classes. For example, suppose you have an object that passes out an opaque cookie to a caller from one method which is then passed into another method. A tuple is a good way to pack multiple values into that cookie without having to define a separate class to contain them.
I try to be judicious about this particular use, though. If the cookies are used liberally throughout the code, it's better to create a class because it helps document their use. If they are only used in one place (e.g. one pair of methods) then I might use a tuple. In any case, because it's Python you can start with a tuple and then change it to an instance of a custom class without having to change any code in the caller.
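A sketch of the opaque-cookie pattern described above. The Scanner class and its method names are entirely hypothetical; the point is only that the caller stores and hands back the cookie without ever looking inside it, so the tuple could later be swapped for a class instance without touching the calling code.

```python
class Scanner:
    """Hypothetical object that hands out an opaque cookie."""

    def __init__(self, text):
        self._text = text

    def mark(self, pos):
        # The cookie packs two values; callers should not look inside it.
        return (pos, len(self._text))

    def slice_from(self, cookie):
        # Only this class unpacks the cookie.
        pos, end = cookie
        return self._text[pos:end]

scanner = Scanner("hello world")
cookie = scanner.mark(6)           # caller just keeps the cookie ...
print(scanner.slice_from(cookie))  # ... and passes it back later
```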
Tuples are used in :
places where you want your sequence of elements to be immutable
in tuple assignments
a,b=1,2
in variable length arguments
def add(*arg): #arg is a tuple
    return sum(arg)