looking for patterns in a dictionary and make a new dictionary - python

I have a list of all the combinations of sequences that can be made with 'K'
and 'M', with lengths from 6 to 18. So I have all the combinations,
including "KKKKKK" to "MMMMMMMMMMMMMMMMMM".
I also have a dictionary in which the keys are ids and the values are
long sequences made not only of K and M but also of some other
characters that are not important to me.
A small example:
com = ["KKKKKK", "KKKKKM", ......, "MMMMMMMMMMMMMMMMMM"]
li = {id1: "KKKKKKHKJASGKKKMOOGBMMMMMMMMMMMMMMMMMM",
id2:"MMKFJDFKFGKJJJJFKKKKKMJKJHFKKKKKK"}
I want to find the different combinations in the values of the li
dictionary and make a new dictionary in which the keys are the ids from
li and the values are lists containing the combinations found in the
corresponding li values. For the small example, the output would look
like this:
results = {id1: ["KKKKKK", "MMMMMMMMMMMMMMMMMM"], id2: ["KKKKKM", "KKKKKK"] }
I wrote the following code, but it did not give me what I want:
results = {}
for i in com:
    if i in li.values():
        results[li.keys()] = [i]

You can use re.findall() (after import re) within a dictionary comprehension:
In [11]: {k: re.findall(r'(?:K|M){6,18}', v) for k, v in li.items()}
Out[11]: {'id1': ['KKKKKK', 'MMMMMMMMMMMMMMM'], 'id2': ['KKKKKM', 'KKKKKK']}
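r'(?:K|M){6,18}' is a regular expression that matches any run of K and/or M characters with a length of 6 to 18. Because the quantifier is greedy, each match is the longest run available (capped at 18 characters), so shorter combinations nested inside a longer run are not reported separately.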
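A self-contained sketch of the regex approach. Assumptions: the ids are strings, and the sample values are the question's example split into pieces (with the long M run written as "M" * 18) so the K/M runs are easy to count. The character class [KM] is equivalent to (?:K|M). The last call illustrates the greedy behaviour on a run longer than 18.

```python
import re

# Any run of K/M characters of length 6 to 18. The quantifier is greedy,
# so each match is as long as possible (capped at 18 characters).
PATTERN = re.compile(r"[KM]{6,18}")

# Sample data modelled on the question, split up so the runs are easy to count.
li = {
    "id1": "KKKKKK" + "HKJASGKKKMOOGB" + "M" * 18,
    "id2": "MMKFJDFKFGKJJJJF" + "KKKKKM" + "JKJHF" + "KKKKKK",
}

results = {k: PATTERN.findall(v) for k, v in li.items()}
print(results)

# A 20-character run: the first 18 characters form one match, and the
# 2 left over are too short for a second match, so they are dropped.
print(PATTERN.findall("K" * 20))
```

If runs longer than 18 can occur in your data, decide whether truncation like this is acceptable before relying on the regex.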

The problem is here: if i in li.values():. This line checks whether any of the dictionary's values equals the current combination. Instead, you want this:
for v in li.values():
    if i in v:
This checks whether each of the dict's values contains the current combination.
Also, the line results[li.keys()] = [i] tries to map all of the dict's keys to a new list (in Python 3 it actually raises a TypeError, because a keys view is not hashable). There are two problems with that: first, you want to map only the relevant key; second, you want to add to the current list, not replace it with a new one.
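Putting both fixes together, a minimal sketch: loop over the (key, value) pairs so the matching key is known, and append to a per-key list with dict.setdefault. Assumptions: com holds only three of the combinations for illustration, and li is the question's sample split into pieces so the runs are easy to count. Unlike the regex answer, this reports every listed combination that occurs anywhere in a value, including ones nested inside a longer run, and ids with no match get no entry.

```python
# A few of the combinations, for illustration (the real list has all of them).
com = ["KKKKKK", "KKKKKM", "M" * 18]

li = {
    "id1": "KKKKKK" + "HKJASGKKKMOOGB" + "M" * 18,
    "id2": "MMKFJDFKFGKJJJJF" + "KKKKKM" + "JKJHF" + "KKKKKK",
}

results = {}
for key, seq in li.items():
    for combo in com:
        if combo in seq:  # substring test against this id's sequence only
            # Create the list on first use, then append to it.
            results.setdefault(key, []).append(combo)

print(results)
```

With the full list of roughly half a million combinations, this does one substring scan per combination per id, so the regex approach is likely to be much faster.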

Related

How to convert the list into a list of dictionaries?

I have a list like this.
ls = ['Size:10,color:red,', 'Size:10,color: blue,']
I want to convert the list into this format.
[{'Size':'10','color':'red'}, {'Size':'10','color': 'blue'}]
What I have tried is:
[dict([pair.split(":", 1)]) for pair in ls]
# It gave me output like this.
[{'Size': '10,color:red,'}, {'Size': '10,color: blue,'}]
This method works if the list looks like ['color:blue,'], but it doesn't work properly with the list above.
We can see that for pair in ls in your list comprehension is already doubtful, because elements of ls are not pairs. Each element actually contains a sequence of pairs.
There will be two loops needed here, one to iterate the outer list, and then another one to iterate within each value, since those values are actually strings consisting of multiple fields.
While this is possible with a nested list comprehension, it will be easier (and more readable) to break the problem down into simpler parts rather than trying to fit it all into one line.
result = []
for text in ls:
    d = {}
    pairs = text.strip(",").split(",")
    for pair in pairs:
        key, val = pair.split(":")
        d[key] = val.strip()
    result.append(d)
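For comparison, here is a sketch of the nested comprehension the answer mentions, showing the same two loops: the outer one over the strings, the inner one over the "key:value" fields. It uses split(":", 1) so a colon inside a value would be kept, and it strips the keys as well as the values (a small difference from the loop above).

```python
ls = ['Size:10,color:red,', 'Size:10,color: blue,']

result = [
    # Inner loop: one dict per string, built from its "key:value" fields.
    {key.strip(): val.strip()
     for key, val in (pair.split(":", 1) for pair in text.strip(",").split(","))}
    # Outer loop: one entry per string in ls.
    for text in ls
]
print(result)  # [{'Size': '10', 'color': 'red'}, {'Size': '10', 'color': 'blue'}]
```

It produces the same result on this input, but the explicit loops are easier to read and debug.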

How to compare a python dictionary key with a part of another dictionary's key? something like a .contains() function

Most of my small-scale project worked fine using dictionaries, so changing it now would basically mean starting over.
Let's say I have two different dictionaries(dict1 and dict2).
One being:
{'the dog': 3, 'dog jumped': 4, 'jumped up': 1, 'up onto': 8, 'onto me': 13}
Second one being:
{'up': 12, 'dog': 22, 'jumped': 33}
I want to find every key of the first dictionary whose first word equals a key of the second one. The two dictionaries don't have the same length, as in the example. Once I find them, I want to divide their values.
So what I want to do, sketched in a bit of Java, is:
for (int i = 0; i < dict1.length(); i++) {
    for (int j = 0; j < dict2.length(); j++) {
        if (dict1[i].contains(dict2[j] + " ")) { // not sure if this works, but this
                                                 // would theoretically remove the
                                                 // possibility of the word being the
                                                 // second part of the 2 word element
            dict1[i] / dict2[j];
        }
    }
}
What I've tried so far is trying to make 4 different lists. A list for dict1 keys, a list for dict1 values and the same for dict2. Then I've realized I don't even know how to check if dict2 has any similar elements to dict1.
I've tried making an extra value in the dictionary (a sort of index), so it would kind of get me somewhere, but as it turns out dict2.keys() isn't iterable either. Which would in turn have me believe using 4 different lists and trying to compare it somehow using that is very wrong.
Dictionaries don't have any facilities at all to handle parts of keys. Keys are opaque objects. They are either there or not there.
So yes, you would loop over all the keys in the first dictionary, extract the first word, and then test if the other dictionary has that first word as a key:
for key, dict1_value in dict1.items():
    first_word = key.split()[0]  # split on whitespace, take the first result
    if first_word in dict2:
        dict2_value = dict2[first_word]
        print(dict1_value / dict2_value)
So this takes every key in dict1, splits off the first word, and tests if that word is a key in dict2. If it is, get the values and print the result.
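A runnable version of that loop on the question's two dictionaries, as a sketch. It collects the ratios in a dictionary instead of printing them, and it assumes Python 3, where / is true division (in Python 2, 8 / 12 with ints would give 0).

```python
dict1 = {'the dog': 3, 'dog jumped': 4, 'jumped up': 1, 'up onto': 8, 'onto me': 13}
dict2 = {'up': 12, 'dog': 22, 'jumped': 33}

ratios = {}
for key, dict1_value in dict1.items():
    first_word = key.split()[0]  # e.g. 'dog jumped' -> 'dog'
    if first_word in dict2:      # hashed key lookup, no inner loop needed
        ratios[key] = dict1_value / dict2[first_word]

print(ratios)
```

'the dog' and 'onto me' are skipped because 'the' and 'onto' are not keys of dict2.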
If you need to test those first words more often, you could make this a bit more efficient by first building another structure: an index from first words to whole keys. Simply store the first word of every key of the first dictionary in a new dictionary:
first_to_keys = {}
for key in dict1:
    first_word = key.split()[0]
    # add key to a set for first_word (and create the set if there is none yet)
    first_to_keys.setdefault(first_word, set()).add(key)
Now first_to_keys is a dictionary of first words pointing to sets of keys (so if the same first word appears more than once, you get all the full keys, not just one of them). Build this index once, and update it each time you add keys to or remove keys from dict1, so that it stays up to date.
Now you can compare that mapping to the other dictionary:
for matching in first_to_keys.keys() & dict2.keys():
    dict2_value = dict2[matching]
    for dict1_key in first_to_keys[matching]:
        dict1_value = dict1[dict1_key]
        print(dict1_value / dict2_value)
This uses the keys from two dictionaries as sets; the dict.keys() object is a dictionary view that lets you apply set operations. & gives you the intersection of the two dictionary key sets, so all keys that are present in both.
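To make the set operation concrete, here is a sketch that builds the index for the question's sample data and shows what the & of the two key views contains. Iteration is wrapped in sorted() only to make the printed order deterministic.

```python
dict1 = {'the dog': 3, 'dog jumped': 4, 'jumped up': 1, 'up onto': 8, 'onto me': 13}
dict2 = {'up': 12, 'dog': 22, 'jumped': 33}

# Index: first word -> set of full dict1 keys starting with that word.
first_to_keys = {}
for key in dict1:
    first_to_keys.setdefault(key.split()[0], set()).add(key)

# Keys views support set operations; & keeps keys present in both.
common = first_to_keys.keys() & dict2.keys()
print(sorted(common))  # ['dog', 'jumped', 'up']

for matching in sorted(common):
    dict2_value = dict2[matching]
    for dict1_key in sorted(first_to_keys[matching]):
        print(dict1_key, dict1[dict1_key] / dict2_value)
```

The result of & is an ordinary set, so 'the' and 'onto' (present only in the index) are excluded without any explicit membership test.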
You only need to use this second option if you need to get at those first words more often. It gives you a quick path in the other direction, so you could loop over dict2, and quickly go back to the first dictionary again.
Here's a solution using the str.startswith method:
for phrase, val1 in dict1.items():
    for word, val2 in dict2.items():
        if phrase.startswith(word + " "):  # the space stops "up" from matching "upon ..."
            print(val1 / val2)
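The trailing space in the startswith test matters, as the question's pseudocode already anticipated with dict2[j] + " ". A small sketch comparing the two forms; "upon me" is a hypothetical key that is not in the question's data.

```python
dict1 = {"up onto": 8, "upon me": 5}  # "upon me" is hypothetical
dict2 = {"up": 12}

# Without the space, "up" also matches keys whose first word merely starts with "up".
loose = [p for p in dict1 for w in dict2 if p.startswith(w)]
# With the space, only keys whose first word is exactly "up" match.
strict = [p for p in dict1 for w in dict2 if p.startswith(w + " ")]

print(loose)   # ['up onto', 'upon me']
print(strict)  # ['up onto']
```

Note that the strict form would miss a single-word key such as "up" in dict1; if that can occur, comparing phrase.split()[0] == word is safer. Either way, this nested loop is O(len(dict1) * len(dict2)), whereas the first-word lookup above is linear in dict1.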

Using suffixes for dictionary searching

P.S.: The duplicate questions suggested so far are about prefixes (thanks for those anyway).
This question is about suffixes.
With the dictionary
dic={"abcd":2, "bbcd":2, "abgg":2}
is it possible to search the dictionary using a suffix of the keys? I.e., given "bcd", it would return the two entries
{"abcd":2, "bbcd":2}
One possible way:
dic1 = {}
for k, v in dic.items():
    if k.endswith("bcd"):
        dic1[k] = v
Is it possible to do it more efficiently?
For a small problem set, you can do it with a simple list comprehension (this collects the matching values):
suffixed = [v for k, v in dic.items() if k.endswith("bcd")]
However, that means doing a suffix check on every item in the dictionary every time you query. If that's slow on big data sets, you can build a second dictionary, keyed by suffix, to act as an index. You'd have to do a one-time pre-pass:
suffixes = dict([(k[-3:], []) for k in dic])
for k in dic:
    suffixes[k[-3:]].append(dic[k])
That gives you all the values for each three-character suffix. You could store the keys instead of the values in the same way and then chain to a lookup in dic.
In any event, hashed lookups of dictionary keys are very cheap, so it's best to cache your data in a dictionary with the keys you want (i.e., suffixes) rather than looping over every key doing string comparisons.
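The pre-pass above only indexes three-character suffixes. A sketch of the same idea for suffixes of any length, assuming the index fits in memory: every suffix of every key is stored, which costs n suffix strings per key of length n (about n*(n+1)/2 characters), and the index must be rebuilt or updated whenever dic changes. lookup_suffix is a hypothetical helper name.

```python
from collections import defaultdict

dic = {"abcd": 2, "bbcd": 2, "abgg": 2}

# Map every non-empty suffix of every key to the set of keys ending with it.
suffix_index = defaultdict(set)
for k in dic:
    for i in range(len(k)):
        suffix_index[k[i:]].add(k)

def lookup_suffix(suffix):
    """Return the entries of dic whose key ends with suffix (hypothetical helper)."""
    # .get() avoids inserting empty sets into the defaultdict for misses.
    return {k: dic[k] for k in suffix_index.get(suffix, ())}

print(lookup_suffix("bcd"))  # {'abcd': 2, 'bbcd': 2} (order may vary)
```

Each query is then a single hash lookup plus the cost of building the result.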

What's the fastest way to identify the 'name' of a dictionary that contains a specific key-value pair?

I'd like to identify the dictionary within the following list that contains the key-value pair 'Keya':'123a', which is ID1 in this case.
lst = {ID1:{'Keya':'123a','Keyb':456,'Keyc':789},ID2:{'Keya':'132a','Keyb':654,'Keyc':987},ID3:{'Keya':'5433a','Keyb':222,'Keyc':333},ID4:{'Keya':'444a','Keyb':777,'Keyc':666}}
It's safe to assume all dictionaries have the same keys, but different values.
I currently have the following to identify which dictionary has the value '123a' for the key 'Keya', but is there a shorter and faster way?
DictionaryNames = map(lambda Dict: str(Dict),lst)
Dictionaries = [i[1] for i in lst.items()]
Dictionaries = map(lambda Dict: str(Dict),Dictionaries)
Dict = filter(lambda item:'123a' in item,Dictionaries)
val = DictionaryNames[Dictionaries.index(Dict[0])]
return val
If you actually had a list of dictionaries, this would be:
next(d for d in list_o_dicts if d[key]==value)
Since you actually have a dictionary of dictionaries, and you want the key associated with the dictionary, it's:
next(k for k, d in dict_o_dicts.items() if d[key]==value)
This returns the first match. If you're absolutely sure there is exactly one, or if you don't care which one you get when there are several, and if you're happy with a StopIteration exception if you were wrong and there isn't one, that's exactly what you want.
If you need all matching values, just do the same with a list comprehension:
[k for k, d in dict_o_dicts.items() if d[key]==value]
That list can of course have 0, 1, or 17 values.
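If a StopIteration is not acceptable, next() takes a second argument that is returned when the generator is exhausted. A sketch with the question's data, assuming the bare names ID1 to ID4 are strings; find_name and find_all_names are hypothetical helper names. d.get(key) is used instead of d[key] so an inner dict lacking the key is skipped rather than raising KeyError.

```python
lst = {
    "ID1": {"Keya": "123a", "Keyb": 456, "Keyc": 789},
    "ID2": {"Keya": "132a", "Keyb": 654, "Keyc": 987},
    "ID3": {"Keya": "5433a", "Keyb": 222, "Keyc": 333},
    "ID4": {"Keya": "444a", "Keyb": 777, "Keyc": 666},
}

def find_name(dict_o_dicts, key, value, default=None):
    """Name of the first inner dict with d[key] == value, or default if none."""
    # next() returns default instead of raising StopIteration; the search
    # stops at the first match.
    return next((name for name, d in dict_o_dicts.items() if d.get(key) == value), default)

def find_all_names(dict_o_dicts, key, value):
    """Names of every inner dict with d[key] == value (possibly empty)."""
    return [name for name, d in dict_o_dicts.items() if d.get(key) == value]

print(find_name(lst, "Keya", "123a"))       # ID1
print(find_name(lst, "Keya", "no-such"))    # None
print(find_all_names(lst, "Keya", "123a"))  # ['ID1']
```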
You can just do [name for name, d in lst.items() if d['Keya']=='123a'] to get a list of the names of all the dictionaries in lst that have that value for that key. If you know there is only one, you can get it with [name for name, d in lst.items() if d['Keya']=='123a'][0]. (As Andy mentions in a comment, your name lst is misleading, since lst is actually a dictionary of dictionaries, not a list.)
Since you want the fastest approach, you should short-circuit your search as soon as you find the data you are after. Iterating through the whole dictionary is not necessary, nor is producing any temporary list:
for key, data in lst.items():
    if data['Keya'] == '123a':
        return key  # or break if not in a function
A different way to do this is to use the appropriate data structure: keep a "reverse map" of key-value pairs to names. If your dictionary of dictionaries is static after being built, you can build the reverse dictionary like this:
revdict = {(key, value): name
           for name, subdict in dictodicts.items()
           for key, value in subdict.items()}
If not, you just need to add revdict[key, value] = name for each d[name][key] = value statement and build them up in parallel.
Either way, to find the name of the dict that maps key to value, it's just:
revdict[key, value]
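A sketch of the reverse map on the question's data, plus a hypothetical set_value helper for the "build them up in parallel" case. Assumptions: the IDs are strings, every inner value is hashable (strings and ints here), and when two inner dicts share the same key/value pair the comprehension keeps the name seen last.

```python
lst = {
    "ID1": {"Keya": "123a", "Keyb": 456, "Keyc": 789},
    "ID2": {"Keya": "132a", "Keyb": 654, "Keyc": 987},
    "ID3": {"Keya": "5433a", "Keyb": 222, "Keyc": 333},
    "ID4": {"Keya": "444a", "Keyb": 777, "Keyc": 666},
}

# (key, value) -> name of the inner dict holding that pair.
revdict = {(key, value): name
           for name, subdict in lst.items()
           for key, value in subdict.items()}
print(revdict["Keya", "123a"])  # ID1

def set_value(dictodicts, revdict, name, key, value):
    """Hypothetical helper: do dictodicts[name][key] = value and keep revdict in sync."""
    inner = dictodicts.setdefault(name, {})
    if key in inner and revdict.get((key, inner[key])) == name:
        # Drop the reverse entry for the value being overwritten.
        del revdict[key, inner[key]]
    inner[key] = value
    revdict[key, value] = name

set_value(lst, revdict, "ID1", "Keya", "123b")
print(revdict["Keya", "123b"])      # ID1
print(("Keya", "123a") in revdict)  # False
```

Each lookup is then a single hash lookup instead of a scan over all the inner dictionaries.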
For (a whole lot) more information (than you actually want), and some sample code for wrapping things up in different ways… I dug up an unfinished blog post, considered editing it, and decided to not bother and just clicked Publish instead, so: Reverse dictionary lookup and more, on beyond z.

searching a value in a list and outputting its key

I have a dictionary in which each key has a list as its value, and those lists are of different sizes. I populated the keys and values using add and set (to avoid duplicates). If I output my dictionary, the output is:
blizzard set(['00:13:e8:17:9f:25', '00:21:6a:33:81:50', '58:bc:27:13:37:c9', '00:19:d2:33:ad:9d'])
alpha_jian set(['00:13:e8:17:9f:25'])
Here, blizzard and alpha_jian are two keys in my dictionary.
Now, I have another text file which has two columns, like:
00:21:6a:33:81:50 45
00:13:e8:17:9f:25 59
As you can see, the first column items are one of the entries in each list of my dictionary. For example, 00:21:6a:33:81:50 belongs to the key 'blizzard' and 00:13:e8:17:9f:25 belongs to the key 'alpha_jian'.
What I want is to go through the first-column items in my text file and, if an entry is found in the dictionary, find its corresponding key, find the length of the corresponding list in the dictionary, and add them to a new dictionary, say newDict.
For example, 00:21:6a:33:81:50 belongs to blizzard. Hence, the newDict entry will be:
newDict[blizzard] = 4  # since the blizzard key corresponds to a list of length 4
This is the code I expected to do this task:
newDict = dict()
# myDict is present with entries like specified above
with open("input.txt") as f:
    for line in f:
        fields = line.split("\t")
        for key, value in myDict.items():
            if fields[0] == #Some Expression:
                newDict[key] = len(value)
print(newDict)
Here, my question is what #Some Expression should be in my code above. If the values were not lists, this would be very easy. But how do I search in lists? Thanks in advance.
You are looking for the in operator:
if fields[0] in value:
But this isn't a very efficient method, as it involves scanning the dict values over and over.
You can make a temporary data structure to help:
helper_dict = {k: v for v, x in myDict.items() for k in x}
So your code becomes:
helper_dict = {k: v for v, x in myDict.items() for k in x}
newDict = {}
with open("input.txt") as f:
    for line in f:
        fields = line.split()  # splits on any whitespace, tabs or spaces
        if not fields:
            continue  # skip blank lines
        key = fields[0]
        if key in helper_dict:
            newDict[helper_dict[key]] = len(myDict[helper_dict[key]])
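One caveat with helper_dict: it maps each address to a single key, so an address that appears in several sets keeps only whichever key was processed last. In the question's own output, 00:13:e8:17:9f:25 is in both blizzard's and alpha_jian's sets. A sketch that maps each address to all of its owners instead. Assumptions: the file lines are passed in as an iterable (in real use, the open file object), and owners and count_sizes are hypothetical names.

```python
from collections import defaultdict

myDict = {
    "blizzard": {"00:13:e8:17:9f:25", "00:21:6a:33:81:50",
                 "58:bc:27:13:37:c9", "00:19:d2:33:ad:9d"},
    "alpha_jian": {"00:13:e8:17:9f:25"},
}

# Invert myDict: address -> set of keys whose set contains that address.
owners = defaultdict(set)
for name, macs in myDict.items():
    for mac in macs:
        owners[mac].add(name)

def count_sizes(lines):
    """Map each key that owns a first-column address to the size of its set."""
    newDict = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        for name in owners.get(fields[0], ()):
            newDict[name] = len(myDict[name])
    return newDict

# In real use: with open("input.txt") as f: newDict = count_sizes(f)
sample_lines = ["00:21:6a:33:81:50 45\n", "00:13:e8:17:9f:25 59\n"]
print(count_sizes(sample_lines))
```

If each address really belongs to exactly one key, the simpler helper_dict is fine; the set-valued index only costs a little extra memory.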
Doesn't
if fields[0] in value:
solve your problem? Or have I misunderstood your question?
Looks like
if fields[0] in value:
should do the trick, i.e. check whether the field is a member of the set (this also works for lists, but is slower, at least if the lists are large).
(note that lists and sets are two different things; one is an ordered container that can contain multiple copies of the same value, the other an unordered container that can contain only one copy of each value.)
You may also want to add a break after the newdict assignment, so you don't keep checking all the other dictionary entries.
if fields[0] in value: should do the trick given that from what you say above every value in the dictionary is a set, whether of length 1 or greater.
It would probably be more efficient, before you start, to build a new dictionary with keys like '00:13:e8:17:9f:25' (assuming these are unique) whose values are the number of entries in their set; that way you avoid recalculating this repeatedly. Obviously, if the list isn't that long, it doesn't make much difference.
