I am trying to create a small server type application and have a question regarding organizing data with dicts. Right now I am grouping the data using the connection socket (mainly to verify where it's coming from and for sending data back out). Something like this: connected[socket] = account_data. Basically, each connected person will have account data. Since certain fields will be used a lot for comparing and checking information, such as an account ID, I want to speed things up with another dict.
For example: to find an accountID with the above method, I would have to loop over every connection in connected, look at the accountID in each account_data, and compare it. This seems like a slow way to do it. If I could create a dict and use the accountID as the key, I think it could speed things up a little. The problem is that I plan on using 3 different dicts, each keyed differently. Some data may change frequently, and it seems like a hassle to update every single dict whenever information changes. Is there any way to link them together?
Maybe an easier way of trying to explain what I am asking is:
You have Dict A, Dict B, Dict C, and Data. Dict A, B, and C all contain the same Data. I want it so that if something changes in Data, the Data in Dict A, B, and C all change. I can of course always do dict A = data, dict B = data, etc., but that would get repetitive in the code after a while. I know the data is set once the dict is created, so I'm not really sure if there is a solution to this. I am just looking for advice on the best way to organize data in this situation.
First off, the data needn't be replicated. You can have 3 dictionaries, each using a different key but holding a reference to the same object as its value.
That way you only need to change the value object once and the change will be visible through all the dictionaries (or, more precisely, since the dictionaries only store a reference, they are always up to date).
Next you need to ensure "referential integrity": if a particular record is deleted, the corresponding entry needs to be deleted in all 3 dictionaries, and if a record is modified so that one of its keys changes, the record needs to be removed from the affected dictionaries and re-added under the new key. This can be done with a class that holds all 3 dictionaries and has Add(), Remove() and (if applicable) Update() methods.
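A minimal sketch of such a wrapper class, assuming each record is a plain dict with "accountid" and "username" fields. The class name, the method names and the third index by username are all made up for illustration; adapt them to whatever keys you actually need.

```python
class AccountIndex:
    """Keeps three lookup dicts in sync. Every dict stores a reference
    to the same record object, never a copy."""

    def __init__(self):
        self.by_socket = {}
        self.by_id = {}
        self.by_name = {}

    def add(self, sock, account):
        # One record, three keys
        self.by_socket[sock] = account
        self.by_id[account["accountid"]] = account
        self.by_name[account["username"]] = account

    def remove(self, sock):
        # Drop the record from every index so no stale entry survives
        account = self.by_socket.pop(sock)
        del self.by_id[account["accountid"]]
        del self.by_name[account["username"]]
        return account

    def rename(self, account_id, new_username):
        # The username is a key, so the record must be re-filed under it
        account = self.by_id[account_id]
        del self.by_name[account["username"]]
        account["username"] = new_username
        self.by_name[new_username] = account


index = AccountIndex()
sock = object()  # stand-in for a real socket object
index.add(sock, {"accountid": 42, "username": "alice"})
index.rename(42, "alicia")
print(index.by_socket[sock]["username"])
```

Fields that are not used as keys can simply be changed on the record itself; only key fields need a method like rename().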
Just do something like:
connected[socket] = accountids[account_data.accountid] = account_data
assuming account_data is a mutable object with attributes, this will reference that same object as a value in both dicts, with different keys of course. It doesn't have to be in one statement, i.e.:
connected[socket] = account_data
accountids[account_data.accountid] = account_data
the multiple assignments in the same statement are just a convenience; what makes it work the way you want is that Python universally operates by "object reference" (in assignments, argument passing, return statements, and so on).
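To convince yourself that both dicts really hold the same object, here is a small self-contained check. The Account class and the placeholder socket are assumptions made up for the demo; only the assignment line comes from the snippet above.

```python
class Account:
    """Hypothetical account record with plain attributes."""

    def __init__(self, accountid, username):
        self.accountid = accountid
        self.username = username


connected = {}
accountids = {}
sock = object()  # stand-in for a real socket object

account_data = Account(7, "bob")
connected[sock] = accountids[account_data.accountid] = account_data

# Change the record through one dict...
accountids[7].username = "robert"

# ...and read it back through the other: same object, same data
same_object = connected[sock] is accountids[7]
print(same_object, connected[sock].username)
```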
If you have references to dictionaries, an update to the dictionary will be reflected to everything with a reference.
A customer connects and retains a socket, sock. You load his account and stick it in connections[sock]. Then you keep a dictionary of account IDs (the other way) with references to the accounts, accounts[account_id]. Let's try that...
connected = {}
accounts = {}

def load_account(acct):
    return db_magic(acct)  # Grab a dictionary from the DB

def somebody_connected(sck, acct):
    # No "global" needed: we mutate the dicts, we never rebind the names
    account = load_account(acct)
    connected[sck] = account  # Now we have it by socket
    accounts[account["accountid"]] = account  # Now we have it by account ID
Since we assigned account to two different places, any change to that dictionary (in either structure) will be reflected in the other. So...
def update_username(acct_id, new_username):
    accounts[acct_id]["username"] = new_username

def what_is_my_username(sck):
    # In response to GIMME_USERNAME; sockets send bytes in Python 3
    sck.send(connected[sck]["username"].encode())
The change we execute in update_username will automatically be picked up when we do the sck.send, because the reference is exactly the same.
Maybe one of the publish/subscribe modules for Python can help you here?
See this question.
class Websites:
    default = 'https://google.com'
    spotify = 'https://spotify.com'
    facebook = 'https://facebook.com'
    twitter = 'https://twitter.com'
[...]
import random
from websites import Websites
random_website = random.choice(list(vars(Websites).values()))
browser.get(random_website) # This line fails like 30-50% of the time
Note that I am purposefully not using a dictionary here, because I would like to use the random value to get the key.
Debugging, I've found that it will randomly get set to something like this:
random_website = {getset_descriptor} <attribute '__dict__' of 'Websites' objects>
I'm really not sure why it wouldn't be working, because I've tested all of the URLs multiple times.
Also note that this application uses threads -- there are multiple instances of this application (usually 4) and at any given time roughly 1-2 fail, in case that might matter. I'm still very new to Python and Selenium (and still not that experienced in coding, honestly). Please let me know if I can provide more information that might be helpful.
vars(Websites) is a dictionary, but it holds more than the URLs you defined.
If you print it, you will see entries such as __module__, __dict__, __weakref__ and __doc__ alongside your own attributes. They are there because type adds some default entries to a class's __dict__ when the class is created. random.choice sometimes picks one of those instead of a URL, which is the getset_descriptor you saw in the debugger. All of the automatically added names are dunders.
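You can see the extra entries for yourself with a cut-down version of the class. The exact set of automatic names varies a little between Python versions, so this sketch only separates dunder names from the rest rather than listing them.

```python
class Websites:
    default = 'https://google.com'
    spotify = 'https://spotify.com'


all_keys = list(vars(Websites))

# Names that Python added on its own all start with a double underscore
extra_keys = [k for k in all_keys if k.startswith('__')]
own_keys = [k for k in all_keys if not k.startswith('__')]

print(extra_keys)
print(own_keys)
```

Every value behind one of the extra keys is a candidate for random.choice, which is why the failure is intermittent.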
You therefore have two options:
Persist in your current course and filter for dunders:
items = [v for k, v in vars(Websites).items() if not k.startswith('__')]
This is not the recommended approach.
Just use a normal dictionary that only contains what you want. You're calling values on it either way. If you don't really need the labels, just use a list.
You laid out your thought process pretty clearly in the comments to Mad Physicist's answer, so you just need a mini-tutorial in Python.
A. How to store the data. As suggested above, a dictionary is probably the best way, with strings as keys and values. You probably want to deal with strings, not with variables.
websites = {'default':'https://www.google.com', 'stack':'https://stackoverflow.com'}
B. You can get a list of all the keys in the dictionary, or a list of (key, value) pairs called tuples. This means replacing vars(Websites).values() with websites.keys() or websites.items(). Then your code will give you a random key or item.
C. If you chose keys(), then you can just print the random thing you got, and use that key to get the corresponding value from the dictionary.
random_key = random.choice(list(websites.keys())) # random.choice needs a sequence, so wrap the view in list()
print(random_key)
random_website = websites[random_key] # this is how you get values using keys
D. If you chose items(), then the tuple you got is basically an immutable list (you can't change or assign to either value). You can pull them out by specifying an index in the list. Python indices start at 0.
random_item = random.choice(list(websites.items())) # list() for the same reason as with keys()
print(random_item) # something like ('default', 'https://www.google.com')
random_key = random_item[0] # this is how you choose an item from a list or tuple
print(random_key)
random_website = random_item[1]
Say I need to use some value from a Python dictionary several times in one piece of code. What are the best practices?
Access dictionary once, store value in some temporary variable and use this variable:
value = d['my_key']
do_some_work(value)
do_some_other_work(value)
and_again(value)
or access the dictionary every time I need this value:
do_some_work(d['my_key'])
do_some_other_work(d['my_key'])
and_again(d['my_key'])
The first approach leads to more readable function calls, in particular when the dictionary key is long or not self-explanatory. However, readers will always have to check where the variable came from unless they are willing to trust its name blindly. So why not access the dictionary directly?
Personally, I use both approaches according to the use case. If the key or dictionary names are long or not sufficiently self-explanatory, I create a temporary variable. Otherwise, I access the dictionary directly when calling the functions.
For a dict, the average time complexity of accessing an item is constant, O(1); see Python Time Complexity.
So, I wouldn't expect much difference in performance.
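If you want numbers for your own machine, a rough timeit comparison is easy to run. This is only a sketch: the dictionary contents, the function names and the repeat count are made up, and the timings will differ between runs and Python versions.

```python
import timeit

d = {'my_key': 123}


def lookup_each_time():
    # Three separate dictionary lookups
    return d['my_key'] + d['my_key'] + d['my_key']


def lookup_once():
    # One lookup, then reuse the local variable
    value = d['my_key']
    return value + value + value


t_each = timeit.timeit(lookup_each_time, number=100_000)
t_once = timeit.timeit(lookup_once, number=100_000)
print(f"each time: {t_each:.4f}s, once: {t_once:.4f}s")
```

Whatever gap shows up is likely to be dwarfed by the real work done in do_some_work() and friends, so readability is usually the better guide.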
Question: What are the pros and cons of writing an __init__ that takes a collection directly as an argument, rather than unpacking its contents?
Context: I'm writing a class to process data from several fields in a database table. I iterate through some large (~100 million rows) query result, passing one row at a time to a class that performs the processing. Each row is retrieved from the database as a tuple (or optionally, as a dictionary).
Discussion: Assume I'm interested in exactly three fields, but what gets passed into my class depends on the query, and the query is written by the user. The most basic approach might be one of the following:
class Direct:
    def __init__(self, names):
        self.names = names

class Simple:
    def __init__(self, names):
        self.name1 = names[0]
        self.name2 = names[1]
        self.name3 = names[2]

class Unpack:
    def __init__(self, names):
        self.name1, self.name2, self.name3 = names
Here are some examples of rows that might be passed to a new instance:
good = ('Simon', 'Marie', 'Kent') # Exactly what we want
bad1 = ('Simon', 'Marie', 'Kent', '10 Main St') # Extra field(s) behind
bad2 = ('15', 'Simon', 'Marie', 'Kent') # Extra field(s) in front
bad3 = ('Simon', 'Marie') # Forgot a field
When faced with the above, Direct always runs (at least to this point) but is very likely to be buggy (GIGO). It takes one argument and assigns it exactly as given, so this could be a tuple or list of any size, None, a function reference, etc. This is the most quick-and-dirty way I can think of to initialize the object, but I feel like the class should complain immediately when I give it data it's clearly not designed to handle.
Simple handles bad1 correctly, is buggy when given bad2, and throws an error when given bad3. It's convenient to be able to effectively truncate the inputs from bad1 but not worth the bugs that would come from bad2. This one feels naive and inconsistent.
Unpack seems like the safest approach, because it throws an error in all three "bad" cases. The last thing we want to do is silently fill our database with bad information, right? It takes the tuple directly, but allows me to identify its contents as distinct attributes instead of forcing me to keep referring to indices, and complains if the tuple is the wrong size.
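The behaviour described for Simple and Unpack can be checked with a short script. The try_build helper and the results table are scaffolding invented for this demo; the two classes and the four rows are the ones from the question.

```python
class Simple:
    def __init__(self, names):
        self.name1 = names[0]
        self.name2 = names[1]
        self.name3 = names[2]


class Unpack:
    def __init__(self, names):
        self.name1, self.name2, self.name3 = names


rows = {
    "good": ('Simon', 'Marie', 'Kent'),
    "bad1": ('Simon', 'Marie', 'Kent', '10 Main St'),
    "bad2": ('15', 'Simon', 'Marie', 'Kent'),
    "bad3": ('Simon', 'Marie'),
}


def try_build(cls, row):
    """Return the stored names, or the name of the exception raised."""
    try:
        obj = cls(row)
    except (IndexError, ValueError) as exc:
        return type(exc).__name__
    return (obj.name1, obj.name2, obj.name3)


results = {
    (cls.__name__, label): try_build(cls, row)
    for cls in (Simple, Unpack)
    for label, row in rows.items()
}

for key, outcome in results.items():
    print(key, outcome)
```

Note how Simple quietly stores '15' as name1 for bad2, while Unpack refuses every row that does not have exactly three items.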
On the other hand, why pass a collection at all? Since I know I always want three fields, I can define __init__ to explicitly accept three arguments, and unpack the collection using the *-operator as I pass it to the new object:
class Explicit:
    def __init__(self, name1, name2, name3):
        self.name1 = name1
        self.name2 = name2
        self.name3 = name3

names = ('Guy', 'Rose', 'Deb')
e = Explicit(*names)
The only differences I see are that the __init__ definition is a bit more verbose and we raise TypeError instead of ValueError when the tuple is the wrong size. Philosophically, it seems to make sense that if we are taking some group of data (a row of a query) and examining its parts (three fields), we should pass a group of data (the tuple) but store its parts (the three attributes). So Unpack would be better.
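A quick way to confirm the TypeError behaviour of star-unpacking into explicit parameters; error_for is a hypothetical helper written only for this check.

```python
class Explicit:
    def __init__(self, name1, name2, name3):
        self.name1 = name1
        self.name2 = name2
        self.name3 = name3


def error_for(row):
    """Return the exception name raised by Explicit(*row), or None."""
    try:
        Explicit(*row)
    except TypeError as exc:
        return type(exc).__name__
    return None


print(error_for(('Simon', 'Marie', 'Kent')))                # right size
print(error_for(('Simon', 'Marie')))                        # too short
print(error_for(('Simon', 'Marie', 'Kent', '10 Main St')))  # too long
```

The check happens at the call site, before __init__ runs, so the traceback points at the line that passed the wrong row.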
If I wanted to accept an indeterminate number of fields, rather than always three, I still have the choice to pass the tuple directly or use arbitrary argument lists (*args, **kwargs) and *-operator unpacking. So I'm left wondering, is this a completely neutral style decision?
This question is probably best answered by trying out the different approaches and seeing what makes the most sense to you and is the most easily understood by others reading your code.
Now that I have the benefit of more experience, I'd ask myself, how do I plan to access these values?
When I access any one of the values in this collection, am I likely to be using most or all of the values in that same subroutine or section of code? If so, the "Direct" approach is a good choice; it's the most compact and it lets me think about the collection as a collection until the point that I absolutely need to pay attention to what's inside.
On the other hand, if I'm using some values here and some values there, I don't want to have to constantly remember which index to access, or add verbosity in the form of dictionary keys, when I could just refer directly to the values using separately named attributes. I would probably avoid the "Direct" approach in this case, so that I only have to think about the fact that there's a collection when the class is first initialized.
Each of the remaining approaches involves splitting the collection up into different attributes, and I think the clear winner here is the "Explicit" approach. The "Simple" and "Unpack" approaches share a hidden dependency on the order of the collection, without offering any real advantage.
I was thinking about parts of my class APIs, and one thing that came up was the following:
Should I use a tuple/list of equal attributes or should I use several attributes, e.g. let's say I've got a Controller class which reads several thermometers.
class Controller(object):
    def __init__(self):
        self.temperature1 = Thermometer()
        self.temperature2 = Thermometer()
        self.temperature3 = Thermometer()
        self.temperature4 = Thermometer()
vs.
class Controller(object):
    def __init__(self):
        self.temperature = tuple(Thermometer() for _ in range(4))
Is there a best practice for when to use which style?
(Let's assume the number of Thermometers will not be changed, otherwise choosing the second style with a list would be obvious.)
A tuple or list, 100%. variable1, variable2, etc... is a really common anti-pattern.
Think about how you will code later: it's likely you'll want to do similar things to these items. In a data structure, you can loop over them to perform operations; with numbered variable names, you'll have to do it manually. Not only that, but it makes it easier to add more values, it makes your code more generic and therefore more reusable, and it means you can add new values mid-execution easily.
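As a rough illustration of the looping argument: the Thermometer below is a fake that returns random readings, and read_all() and average() are made-up method names, since the question never shows the real class.

```python
import random


class Thermometer:
    """Hypothetical stand-in for real hardware: returns a random reading."""

    def read(self):
        return round(random.uniform(15.0, 30.0), 1)


class Controller:
    def __init__(self, count=4):
        # One attribute holding all sensors instead of temperature1..4
        self.temperatures = tuple(Thermometer() for _ in range(count))

    def read_all(self):
        # The same operation on every sensor is a single loop
        return [t.read() for t in self.temperatures]

    def average(self):
        readings = self.read_all()
        return sum(readings) / len(readings)


controller = Controller()
print(controller.read_all())
print(controller.average())
```

With four numbered attributes, both methods would need every name spelled out, and adding a fifth sensor would mean editing each of them.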
Why make the assumption the number will not be changed? More often than not, assumptions like that end up being wrong. Regardless, you can already see that the second example exemplifies the do not repeat yourself idiom that is central to clear, efficient code.
Even if you had more relevant names, e.g. cpu_temperature and hdd_temperature, I would say that if you ever see yourself performing the same operations on them, you want a data structure, not lots of variables. In this case, a dictionary:
temperatures = {
    "cpu": ...,
    "hdd": ...,
    ...
}
The main thing is that by storing the data in a data structure, you are giving the software the information about the grouping you are providing. If you just give them the variable names, you are only telling the programmer(s) - and if they are numbered, then you are not even really telling the programmer(s) what they are.
Another option is to store them as a dictionary:
{1: temp1, 2: temp2}
The most important thing in deciding how to store data is conveying the data's meaning. If these items are essentially the same information in slightly different contexts, then they should be grouped (in terms of data type) to convey that, i.e. they should be stored as either a tuple or a dictionary.
Note: if you use a tuple and then later insert more data, e.g. a temp0 at the beginning, then there could be backwards-compatibility issues wherever you've grabbed individual values by index. (With a dictionary, temp[1] will always return temp1.)
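The index-shift problem from the note is easy to reproduce. The strings here are placeholders standing in for thermometer objects.

```python
temps_tuple = ("temp1", "temp2")
temps_dict = {1: "temp1", 2: "temp2"}

# Later, a new sensor is added "in front" of the existing ones
temps_tuple = ("temp0",) + temps_tuple
temps_dict[0] = "temp0"

# Old code that used index 0 to mean temp1 now silently gets temp0
print(temps_tuple[0])

# The dictionary key still points at the same sensor
print(temps_dict[1])
```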
How can I store values in a list without specifying index numbers?
For example
outcomeHornFive = 5
someList = []
someList.append(outcomeHornFive)
instead of doing this,
someList[0] # to reference horn five outcome
How can I do something like this? The reason is there are many items that I need to reference within the list and I just think it's really inconvenient to keep track of which index is what.
someList.hornFive
You can use another data structure if you'd like to reference things by attribute access (or otherwise via a name).
You can put them in a dict, or create a class, or do something else. It depends what kind of other interaction you want to have with that object.
(P.S., we call those lists, not arrays).
Instead of using a list you can use a dictionary.
See data types in the python documentation.
A dictionary allows you to lookup a value using a key:
my_dict["HornFive"] = 20
You cannot and you shouldn't. If you could do that, how would you refer to the list itself? And you will need to refer to the list itself.
The reason is there are many items that I need to reference within the list and I just think it's really inconvenient to keep track of which index is what.
You'll need to do something of that ilk anyway, no matter how you organize your data. If you had separate variables, you'd need to know which variable stores what. If you had your way with this, you'd still need to know that a bare someList refers to "horn five" and not to, say, "horn six".
One advantage of lists and dicts is that you can factor out this knowledge and write generic code. A dictionary, or even a custom class (if there is a finite number of semantically distinct attributes, and you'd never have to use it as a collection), may help with the readability by giving it an actual name instead of a numeric index.
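Here is a sketch of both options side by side. The hornSix entry and its value are invented to have more than one item; a dataclass is just one convenient way to write the "custom class" variant.

```python
from dataclasses import dataclass

# Option 1: a dict, for when you still want to treat the values as a collection
outcomes = {"hornFive": 5, "hornSix": 6}
total = sum(outcomes.values())  # generic code still works


# Option 2: a small class, for a fixed set of semantically distinct fields
@dataclass
class Outcomes:
    hornFive: int
    hornSix: int


named = Outcomes(hornFive=5, hornSix=6)

print(outcomes["hornFive"], total)
print(named.hornFive)
```

The dict keeps the ability to loop over everything; the class gives you the someList.hornFive style of access asked for in the question.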
referenced from http://parand.com/say/index.php/2008/10/13/access-python-dictionary-keys-as-properties/
Say you want to access the values of your dictionary via the dot notation instead of the dictionary syntax. That is, you have:
d = {'name':'Joe', 'mood':'grumpy'}
And you want to get at “name” and “mood” via
d.name
d.mood
instead of the usual
d['name']
d['mood']
Why would you want to do this? Maybe you’re fond of the Javascript Way. Or you find it more aesthetic. In my case I need to have the same piece of code deal with items that are either instances of Django models or plain dictionaries, so I need to provide a uniform way of getting at the attributes.
Turns out it’s pretty simple:
class DictObj(object):
    def __init__(self, d):
        self.d = d

    def __getattr__(self, m):
        # Only called when normal attribute lookup fails
        return self.d.get(m, None)

d = DictObj(d)
print(d.name)  # prints Joe
print(d.mood)  # prints grumpy
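One thing to be aware of: because the wrapper uses get(m, None), a misspelled attribute quietly returns None instead of failing. If you would rather get an error, a stricter variant could look like the sketch below. StrictDictObj is a name made up here, not something from the linked post.

```python
class StrictDictObj:
    """Dot access to dict keys; unknown names raise AttributeError."""

    def __init__(self, d):
        self._d = d

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        # Refuse private names so a missing _d can never recurse.
        if name.startswith('_'):
            raise AttributeError(name)
        try:
            return self._d[name]
        except KeyError:
            raise AttributeError(name) from None


person = StrictDictObj({'name': 'Joe', 'mood': 'grumpy'})
print(person.name)
print(hasattr(person, 'age'))
```

Raising AttributeError also makes hasattr() and getattr() with a default behave the way they do for ordinary objects.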