I need a data structure of this form:
test_id1: resultValue, timeChecked
test_id2: resultValue, timeChecked
test_id3: resultValue, timeChecked
test_id4: resultValue, timeChecked
...
Until now I have just used a dictionary mapping each id to its result, but I would also like to store the time I checked it.
What would be the best way to do this? Can I make the value a tuple? Or is a new class better suited in this case?
What would my class need to look like to accommodate the above?
One lightweight alternative to making a class that both 1) retains the simplicity of a tuple and 2) allows named (as opposed to positional) field access is namedtuple.
from collections import namedtuple
Record = namedtuple("Record", ["resultValue", "timeChecked"])
s = {"test_id1": Record("res1", "time1"), "test_id2": Record("res2", "time2")}
You can now use the values in this dict as if you had a class with the resultValue and timeChecked fields defined...
>>> s["test_id1"].resultValue
'res1'
...or as simple tuples:
>>> a, b = s["test_id1"]
>>> print(a, b)
res1 time1
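As a rough sketch of how this could look with real timestamps: the helper record_result and the results dict below are names made up for illustration, and UTC time is an assumption. Because namedtuples are immutable, changing a stored result means storing a new Record, for which _replace is handy.

```python
from collections import namedtuple
from datetime import datetime, timezone

Record = namedtuple("Record", ["resultValue", "timeChecked"])

results = {}


def record_result(test_id, value):
    """Store value for test_id together with the current UTC time."""
    results[test_id] = Record(value, datetime.now(timezone.utc))


record_result("test_id1", "pass")
record_result("test_id2", "fail")

# namedtuples are immutable: _replace returns a new Record with one field
# changed (timeChecked is carried over unchanged here).
results["test_id2"] = results["test_id2"]._replace(resultValue="pass")

print(results["test_id1"].resultValue)
print(results["test_id2"].resultValue)
```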
Can I make the value a tuple?
Yes, and I'm not really sure why you don't want to. If it's to make look-ups easier to read, you can even make the value another dictionary.
{"test_id1" : (resultValue, timeChecked)}
or
{"test_id1" : {"resultValue": resultValue, "timeChecked" : timeChecked }}
I think a whole new class is overkill for this situation.
You could also use a pandas DataFrame for your whole structure. It's probably waaaay overkill unless you're planning to do some serious data processing, but it's a nice thing to know about and directly matches your problem.
I have a dictionary which has IP address ranges as Keys (used to de-duplicate in a previous step) and certain objects as values. Here's an example
Part of the dictionary sresult:
10.102.152.64-10.102.152.95 object1:object3
10.102.158.0-10.102.158.255 object2:object5:object4
10.102.158.0-10.102.158.31 object3:object4
10.102.159.0-10.102.255.255 object6
There are tens of thousands of lines, and I want to sort them (correctly) by the IP address in the keys.
I tried splitting the key based on the range separator - to get a single IP address that can be sorted as follows:
ips={}
for key in sresult:
    if '-' in key:
        l = key.split('-')[0]
        ips[l] = key
    else:
        ips[1] = key
And then using code found on another post, sorting by IP address and then looking up the values in the original dictionary:
sips = sorted(ipaddress.ip_address(line.strip()) for line in ips)
for x in sips:
    print("SRC: "+ips[str(x)], "OBJECT: "+" :".join(list(set(sresult[ips[str(x)]]))), sep=",")
The problem I have encountered is that when I split the original range and add the sorted first IPs as new keys in another dictionary, I de-duplicate again losing lines of data - lines 2 & 3 in the example
line 1 10.102.152.64 -10.102.152.95
line 2 10.102.158.0 -10.102.158.255
line 3 10.102.158.0 -10.102.158.31
line 4 10.102.159.0 -10.102.255.25
becomes
line 1 10.102.152.64 -10.102.152.95
line 3 10.102.158.0 -10.102.158.31
line 4 10.102.159.0 -10.102.255.25
So upon rebuilding the original dictionary using the IP address sorted keys, I have lost data
Can anyone help please?
EDIT This post now consists of three parts:
1) A bit of information about dictionaries that you will need in order to understand the rest.
2) An analysis of your code, and how you could fix it without using any other Python features.
3) What I would consider the best solution to the problem, in detail.
1) Dictionaries
Python dictionaries are not sorted by key. (Before Python 3.7 they did not guarantee any order at all; since 3.7 they preserve insertion order.) If I have a dictionary like this:
dictionary = {"one": 1, "two": 2}
And I loop through dictionary.items(), the order I get depends on how the dictionary was built, not on any ordering of the keys themselves.
Every Python dictionary implicitly has two lists associated with it: a list of its keys and a list of its values. You can get them like this:
print(list(dictionary.keys()))
print(list(dictionary.values()))
These lists do have an ordering, so they can be sorted. Doing so won't change the original dictionary.
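A small sketch of that last point, assuming Python 3.7 or later (where a dict keeps insertion order); the variable names are only for illustration:

```python
dictionary = {"two": 2, "one": 1}

sorted_keys = sorted(dictionary.keys())   # a new, sorted list of the keys
original_order = list(dictionary.keys())  # the dict itself is untouched

print(sorted_keys)
print(original_order)
```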
2) Your Code
What you realised is that in your case you only want to sort according to the first IP address in your dictionary's keys. Therefore, the strategy that you adopted is roughly as follows:
1) Build a new dictionary, where the keys are only this first part.
2) Get that list of keys from the dictionary.
3) Sort that list of keys.
4) Query the original dictionary for the values.
This approach will, as you noticed, fail at step 1: as soon as you build the new dictionary with truncated keys, you lose the ability to differentiate between keys that differ only at the end. Every dictionary key must be unique, so a later entry overwrites an earlier one with the same truncated key.
A better strategy would be:
1) Build a function which can represent your "full" IP address ranges as ip_address objects.
2) Sort the list of dictionary keys (original dictionary, don't make a new one).
3) Query the dictionary in order.
Let's look at how we could change your code to implement step 1.
import ipaddress
def represent(full_ip):
    # Stylistic note, never use o or l as variable names.
    # They look just like 0 and 1.
    # split() returns the whole string when there is no '-',
    # so keys holding a single address work too.
    first_part = full_ip.split('-')[0]
    return ipaddress.ip_address(first_part.strip())
Now that we have a way to represent the full IP addresses, we can sort them according to this shortened version, without having to actually change the keys at all. All we have to do is tell Python's sorted function how we want each item to be represented, using the key parameter (NB, this key parameter has nothing to do with a key in a dictionary. They just both happen to be called key.):
# Another stylistic note, always use .keys() when looping over dictionary keys. Explicit is better than implicit.
sips = sorted(sresult.keys(), key=represent)
The ipaddress module is part of the standard library (Python 3.3+), so there should be no problems up to here. The remainder of your code can be used almost as is: sips now holds the original keys, so look up sresult[x] directly instead of going through ips.
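Put together, the approach could look roughly like the sketch below. The sample data mirrors the question; storing the values as lists of object names is an assumption, and range_key is a hypothetical variant of the key function that also uses the end address to break ties between ranges sharing the same start.

```python
import ipaddress


def range_key(ip_range):
    """Return (start, end) as ip_address objects for 'a.b.c.d-e.f.g.h'."""
    start, _, end = ip_range.partition("-")
    start_ip = ipaddress.ip_address(start.strip())
    # A key without '-' is treated as a single address.
    end_ip = ipaddress.ip_address(end.strip()) if end else start_ip
    return (start_ip, end_ip)


# Assumed shape of the data: range string -> list of object names.
sresult = {
    "10.102.159.0-10.102.255.255": ["object6"],
    "10.102.158.0-10.102.158.255": ["object2", "object5", "object4"],
    "10.102.152.64-10.102.152.95": ["object1", "object3"],
    "10.102.158.0-10.102.158.31": ["object3", "object4"],
}

# Sort the original keys; no second dictionary, so nothing is de-duplicated.
sorted_keys = sorted(sresult, key=range_key)

for key in sorted_keys:
    print("SRC: " + key, "OBJECT: " + ":".join(sresult[key]), sep=",")
```

Sorting the keys in place of building a truncated-key dictionary keeps all four entries, including the two ranges that start at 10.102.158.0.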
3) The best solution
Whenever you are dealing with sorting something, it's always easiest to think about a much simpler problem: given two items, how would I compare them? Python gives us a way to do this. What we have to do is implement two data model methods called
__lt__
and
__eq__
(Sorting only ever compares items with "<", so __lt__ is the method it needs.) Let's try doing that:
import ipaddress
class IPAddress:
    def __init__(self, ip_address):
        self.ip_address = ip_address # This will be the full IP address
    def __lt__(self, other):
        """ Is this object less than the other one?"""
        # First, let's find the first parts of the ip addresses
        this_first_ip = self.ip_address.split("-")[0]
        other_first_ip = other.ip_address.split("-")[0]
        # Now let's put them into the ipaddress module
        this_object = ipaddress.ip_address(this_first_ip.strip())
        other_object = ipaddress.ip_address(other_first_ip.strip())
        return this_object < other_object
    def __eq__(self, other):
        """Are the two objects equal?"""
        return self.ip_address == other.ip_address
Cool, we have a class. Now, the data model methods will automatically be invoked any time I use "<" or "==". Let's check that it is working:
test_ip_1 = IPAddress("10.102.152.64-10.102.152.95")
test_ip_2 = IPAddress("10.102.158.0-10.102.158.255")
print(test_ip_1 < test_ip_2)
Now, the beauty of these data model methods is that Python's "sort" and "sorted" will use them as well:
dictionary_keys = sresult.keys()
dictionary_key_objects = [IPAddress(key) for key in dictionary_keys]
sorted_dictionary_key_objects = sorted(dictionary_key_objects)
# According to your latest comment, the line below is what you are missing
sorted_dictionary_keys = [obj.ip_address for obj in sorted_dictionary_key_objects]
And now you can do:
for key in sorted_dictionary_keys:
    print(key)
    print(sresult[key])
The Python data model is almost the defining feature of Python. I'd recommend reading about it.
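One possible shortcut, not part of the answer above: functools.total_ordering fills in the remaining comparison operators from __eq__ plus one ordering method. A minimal sketch follows, with the class name IPRange chosen here for illustration and the start address parsed once in __init__ rather than on every comparison.

```python
import functools
import ipaddress


@functools.total_ordering
class IPRange:
    """Wraps a key such as '10.0.0.1-10.0.0.9' and orders by its first address."""

    def __init__(self, text):
        self.text = text
        # Parse once so comparisons stay cheap while sorting.
        self.start = ipaddress.ip_address(text.split("-")[0].strip())

    def __eq__(self, other):
        # Equality here means "same start address", to stay consistent with __lt__.
        return self.start == other.start

    def __lt__(self, other):
        return self.start < other.start


keys = ["10.102.159.0-10.102.255.255", "10.102.152.64-10.102.152.95"]
ordered = [r.text for r in sorted(IPRange(k) for k in keys)]
print(ordered)
```

With the decorator in place, "<=", ">" and ">=" work as well, even though only __eq__ and __lt__ are written out.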
I have hundreds of dataframes, say named df1, ..., df250, and I need to build a list from one column of each of them. I usually do this manually, but today there is too much data and doing it by hand is too prone to mistakes.
Here's what I did
list1 = df1['customer_id'].tolist()
list2 = df2['customer_id'].tolist()
..
list250 = df250['customer_id'].tolist()
This is so manual; is there an easier way to do it?
The easier way is to take a step back and make sure you put your dataframes in a collection such as a list or dict. You can then perform operations easily in a scalable way.
For example:
dfs = {1: df1, 2: df2, 3: df3, ... , 250: df250}
lists = {k: v['customer_id'].tolist() for k, v in dfs.items()}
You can then access the results as lists[1], lists[2], etc.
There are other benefits. For example, you are no longer polluting the namespace, you save the effort of explicitly defining variable names, you can easily store and transport related collections of objects.
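As a hedged sketch of that idea, the frames can be put into the dictionary at the moment they are created, so df1 ... df250 never need to exist as separate names. How the real dataframes are loaded is not shown in the question, so the small frames below are made-up stand-ins.

```python
import pandas as pd

# Hypothetical stand-ins for the real dataframes; in practice each value
# would come from wherever df1..df250 are loaded today.
frames = {
    i: pd.DataFrame({"customer_id": [i * 10 + 1, i * 10 + 2]})
    for i in range(1, 4)
}

# One comprehension replaces the 250 hand-written lines.
lists = {k: df["customer_id"].tolist() for k, df in frames.items()}

print(lists[1])
print(lists[3])
```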
Using the exec function enables you to execute Python code stored in a string:
for i in range(1,251):
    s = "list"+str(i)+" = df"+str(i)+"['customer_id'].tolist()"
    exec(s)
I'd use the following code. In this case there's no need to manually create a list of DataFrames.
cust_lists = {'list{}'.format(i): globals()['df{}'.format(i)]['customer_id'].tolist()
for i in range(1, 251)}
Now you can access your lists from the cust_lists dict by name, like this:
`cust_lists['list1']`
or
`list1`
When I wish to both retrieve and replace a value in a dict, I naively write:
old_value = my_dict['key']
my_dict['key'] = new_value
But that's two lookups for 'key' in the my_dict hash table, and I'm sure that only one is necessary.
How do I get the same behaviour with only one lookup?
Does Python automatically JIT-optimize this away?
[EDIT]: I am aware that Python dict lookups are cheap and that the performance gain would be negligible unless my_dict is huge or the operation is done billions of times a millisecond.
I am just curious about this apparently-basic feature being implemented or not in python, like an old_value = my_dict.retrieve_and_replace('key', new_value).
Storing a reference rather than a value in the dict will do what you want. This isn't intended to be an elegant demonstration, just a simple one:
>>> class MyMutableObject(object):
...     pass
...
>>> my_dict = {}
>>> m = MyMutableObject()
>>> m.value = "old_value"
>>> my_dict["k"] = m
Now when you want to change my_dict["k"] to a new value but remember the old one, with a single lookup on "k":
>>> m2 = my_dict["k"]
>>> m2.value
'old_value'
>>> m2.value = 'new_value'
It's up to you to decide if the price this pays in complexity is worth the time saving of one dictionary lookup. Dereferencing m2.value and assigning it afresh will cost 2 more dictionary lookups under the hood.
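Wrapped up as a function, the same idea might look like the sketch below. Box and retrieve_and_replace are hypothetical names (the latter borrowed from the question), and whether this is actually faster than two plain lookups is not measured here.

```python
class Box:
    """Tiny mutable holder; __slots__ avoids a per-instance __dict__."""

    __slots__ = ("value",)

    def __init__(self, value):
        self.value = value


def retrieve_and_replace(mapping, key, new_value):
    """Return the old boxed value for key and store new_value in its place."""
    box = mapping[key]  # the only lookup performed on the mapping itself
    old_value = box.value
    box.value = new_value
    return old_value


my_dict = {"key": Box("old_value")}
previous = retrieve_and_replace(my_dict, "key", "new_value")
print(previous, my_dict["key"].value)
```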
I have a list of strings, say something like:
listofstuff = ['string1', 'string2', 'string3', ...]
I have created a custom class for what I want to do. All I want now is to create a bunch of objects of that class, named after the strings in my list. How can I do this?
So I have something like:
for object in listofstuff:
    object = classthing(inputs)
But it doesn't work. How do I do this?
EDIT: Maybe I should clarify. I have an input file that can change, and in said input file is a list of names. I want to create a bunch of class objects that are all called the names in the list.
So someone gives me a list like
stuff = ['car1', 'car2', 'car3']
and I now want to create a bunch of new Car objects, each one called car1, car2, etc. So that later I can do things like car1.calculate_price() or whatever.
EDIT 2: Sorry for all the edits, but I also wanted to share something. In what I am trying to do, objects are grouped together in specific ways, but ways that aren't obvious to the user. So it would be like 'car1_car23_car4'. So I wanted, if I asked the user, which car do you want to pick? And they chose car4, it would create an object instead named car1_car23_car4, instead of car4.
Creating names dynamically is not the right approach. It is very easy to lose track of them, to make more or fewer than you need, or to accidentally overwrite an existing name.
A better approach would be to make a dictionary where the keys are your strings from listofstuff and the values are instances of your class. You can use a dict comprehension and write something like:
dct = {name: classthing(inputs) for name in listofstuff}
Below is a demonstration [1] of what this does:
>>> class classthing: # This represents your class
...     def __init__(self, name):
...         self.name = name
...
>>> listofstuff = ['Joe', 'Bob', 'Mary']
>>>
>>> dct = {name: classthing(name) for name in listofstuff}
>>> dct # dct now holds all the data you need
{'Mary': <__main__.classthing object at 0x0205A6D0>, 'Joe': <__main__.classthing object at 0x0205A690>, 'Bob': <__main__.classthing object at 0x0205A6B0>}
>>>
>>> # Each value in dct is an individual instance of your class
>>> dct['Joe'].name
'Joe'
>>> dct['Bob'].name
'Bob'
>>> dct['Mary'].name
'Mary'
>>>
[1] For the sake of the demonstration, I replaced inputs with name. You do not have to do this in your real code though.
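For the grouped names described in the question's second edit, a second dictionary can map each individual name to its group. This is only a sketch: the Car class, the sample group names and member_to_group are all made up for illustration.

```python
class Car:
    """Stand-in for the asker's class; only keeps its (group) name."""

    def __init__(self, name):
        self.name = name


# Hypothetical group names read from the input file.
groups = ["car1_car23_car4", "car2_car5"]

# One object per group, keyed by the group name.
cars = {group: Car(group) for group in groups}

# Reverse lookup: individual car name -> the group it belongs to.
member_to_group = {
    member: group for group in groups for member in group.split("_")
}

# The user picks "car4"; the object for its whole group is returned.
picked = cars[member_to_group["car4"]]
print(picked.name)
```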
Assuming your strings are in a list called listofstrings, this creates a corresponding list of objects constructed from the strings (also assuming the __init__ method for the class expects one string argument):
listofobjects = [classthing(s) for s in listofstrings]
If that's what you're looking for, read further about list comprehensions.
While this answers your question, the other answer is probably a better way of doing it.
I have a python dictionary whose keys are strings and the values are objects.
For instance, an object with one string and one int
class DictItem:
    def __init__(self, field1, field2):
        self.field1 = str(field1)
        self.field2 = int(field2)
and the dictionary:
myDict = dict()
myDict["sampleKey1"] = DictItem("test1", 1)
myDict["sampleKey2"] = DictItem("test2", 2)
myDict["sampleKey3"] = DictItem("test3", 3)
Which is the best/most efficient way to get the dictionary entries that have the "field2" field >= 2?
The idea is creating a "sub-dictionary" (a list would do too) only with the entries in which field2 >= 2 (in the example would be like):
{
    "sampleKey2": {
        "field1" : "test2",
        "field2": 2
    },
    "sampleKey3": {
        "field1" : "test3",
        "field2": 3
    }
}
Is there a better way than walking through all the dictionary elements and check for the condition? Maybe using itemgetters, and lambda functions?
Thank you!
P.S.: I am using Python2.4, just in case it's relevant
To make a dict from your dict,
subdict = dict((k, v) for k, v in myDict.iteritems() if v.field2 >= 2)
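To wrap that sub-dictionary in a list (note that this yields a list holding a single dict, not a list of items):
mySubList = [dict((k,v) for k,v in myDict.iteritems() if v.field2 >= 2)]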
Documentation:
list-comprehensions, iteritems()
You should keep your various records (that is, "DictItem" instances) inside a list.
A generator expression or list comprehension can then filter your desired results with ease.
data = [
    DictItem("test1", 1),
    DictItem("test2", 2),
    DictItem("test3", 3),
    DictItem("test4", 4),
]
and then:
results = [item for item in data if item.field2 >= 2]
This, of course, creates a linear filter. If you need better than linear speed for some of your queries, the container object for the records (in this case a "list") should be a specialized class able to create indexes of the data therein, much like a DBMS does with its table indexes. This can be done easily by deriving a class from "list" and overriding the "append", "insert", "__getitem__", "__delitem__" and "pop" methods.
If you need this for a high profile application, I'd suggest taking a look at some of the Object Oriented DB systems for Python out there, like ZODB and others.
The idea is creating a "sub-dictionary" (a list would do too)
If you want a list you could use filter (or itertools.ifilter):
result_list = filter(lambda x: x.field2 >= 2, myDict.values())
'Most efficient' is going to depend on how often the dictionary contents change compared to how often you are doing the lookup.
If the dictionary changes often and you do the lookup less often then the most efficient method will be walking through iteritems and selecting the objects that match the criteria, using the code that Adam Bernier posted.
If the dictionary does not change much and you do lots of lookups then it may be faster to make one or more inverse dictionaries, e.g. one mapping the "field2" values to a list of objects that have that value.
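A minimal sketch of such an inverse dictionary, written for current Python 3 rather than the 2.4 mentioned in the question. The names my_dict and by_field2 are chosen here for illustration, and the index has to be rebuilt or updated whenever my_dict changes.

```python
from collections import defaultdict


class DictItem:
    def __init__(self, field1, field2):
        self.field1 = str(field1)
        self.field2 = int(field2)


my_dict = {
    "sampleKey1": DictItem("test1", 1),
    "sampleKey2": DictItem("test2", 2),
    "sampleKey3": DictItem("test3", 3),
}

# Inverse index: field2 value -> keys of the entries holding that value.
by_field2 = defaultdict(list)
for key, item in my_dict.items():
    by_field2[item.field2].append(key)

# The query now walks the distinct field2 values, not every entry.
matching = {
    key: my_dict[key]
    for value, keys in by_field2.items()
    if value >= 2
    for key in keys
}
print(sorted(matching))
```

This pays off mainly when many entries share few distinct field2 values; with all-distinct values the walk is as long as the plain filter.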
Alternatively, if you are going to be doing complex queries you could put all the data into an in-memory SQLite database and let SQL sort it out, perhaps via an ORM such as SQLAlchemy.
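The in-memory database route could be sketched with the standard library's sqlite3 module alone (no ORM). The table and column names below are assumptions made for the example.

```python
import sqlite3

# ":memory:" creates a database that lives only as long as the connection.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (key TEXT PRIMARY KEY, field1 TEXT, field2 INTEGER)"
)
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [
        ("sampleKey1", "test1", 1),
        ("sampleKey2", "test2", 2),
        ("sampleKey3", "test3", 3),
    ],
)

# Parameterised query for the entries with field2 >= 2.
rows = conn.execute(
    "SELECT key, field1, field2 FROM items WHERE field2 >= ? ORDER BY key",
    (2,),
).fetchall()
conn.close()

print(rows)
```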