Lookup for a key in dictionary with... regular expressions? - python

I have a dictionary with the following structure: each key is a link between a source and a destination, and each value is an instance of a wire object.
wire_dict = { source1_destination1_1 : object,
              source1_destination1_2 : object,
              source2_destination1_3 : object,
              source2_destination1_4 : object,
              source2_destination2_1 : object,
              source2_destination2_2 : object }
Let's suppose that I only have a destination value, and with that I want to find, perhaps with regular expressions, the key that contains destination1_1. As you can see, the same source can have several destinations, but different sources cannot share a destination. So I want to find the key that ends with the destination.
Since the wire_dict could contain a lot of key-value entries, please tell me how this approach can affect the performance of the application. Perhaps I should create another dictionary only for the relationship between source and destination?
UPDATE: I changed the dictionary to use tuples as keys:
wire_dict = { ('source1','destination1_1') : object1,
              ('source1','destination1_2') : object2,
              ('source2','destination1_3') : object3,
              ('source2','destination1_4') : object4,
              ('source2','destination2_1') : object5,
              ('source2','destination2_2') : object6 }
The logic of the application is the same. A destination cannot have more than one source, so only one match should be found when a destination is provided.

Searching for a string across dict keys takes linear time with standard Python dictionaries. It can be done with dict.keys() and the re module, as @avim helpfully pointed out.
For the second concern, instead of string keys, how about having tuples as keys:
{(begin, end): connection_object}
It won't speed anything up (the search is likely to stay linear), but it lets the code express the logic you want more clearly.
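With tuple keys, the linear search mentioned above might look like the following sketch. The string values stand in for the wire objects, and find_by_destination is a hypothetical helper name, not something from the question.

```python
wire_dict = {
    ('source1', 'destination1_1'): 'object1',
    ('source1', 'destination1_2'): 'object2',
    ('source2', 'destination1_3'): 'object3',
}


def find_by_destination(wires, destination):
    """Return the (key, wire) pair whose key ends in `destination`, or None.

    This still scans the keys one by one, so it is O(n) in the dict size.
    """
    return next(
        ((key, wire) for key, wire in wires.items() if key[1] == destination),
        None,
    )


match = find_by_destination(wire_dict, 'destination1_2')
missing = find_by_destination(wire_dict, 'destination9_9')
```

Comparing the second tuple element directly avoids any string parsing, which is the readability gain the tuple keys are meant to give.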

import re
wire_dict = {'source1_destination1_1' : 'object1',
             'source1_destination1_2' : 'object2',
             'source2_destination1_3' : 'object3',
             'source2_destination1_4' : 'object4',
             'source2_destination2_1' : 'object5',
             'source2_destination2_2' : 'object6' }
pattern = 'source1_destination1_1'
# Collect the values of all keys that match the pattern (linear scan).
print([value for key, value in wire_dict.items() if re.search(pattern, key)])
Output:
['object1']

It's easy to run over all dict keys and find the ones that match your pattern, but it's slow for big dicts.
I think you need another dict with keys matching your destinations (as you thought).
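A second dictionary keyed by destination, as suggested above, could be built once and then queried in constant time. This is only a sketch: it assumes the tuple keys from the question's update, uses strings in place of the wire objects, and the name by_destination is made up for illustration.

```python
wire_dict = {
    ('source1', 'destination1_1'): 'object1',
    ('source1', 'destination1_2'): 'object2',
    ('source2', 'destination1_3'): 'object3',
    ('source2', 'destination1_4'): 'object4',
    ('source2', 'destination2_1'): 'object5',
    ('source2', 'destination2_2'): 'object6',
}

# Build the index once: O(n). The question states that each destination
# has exactly one source, so no entry overwrites another.
by_destination = {dst: (src, wire) for (src, dst), wire in wire_dict.items()}

# Each lookup afterwards is a plain dict access: O(1) on average.
source, wire = by_destination['destination2_1']
```

The trade-off is that the index has to be kept in sync whenever wire_dict changes.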

You just need str.endswith and to iterate over the dict checking each key.
print([k for k in wire_dict if k.endswith("destination1_1")])
If there is only ever one match, use next with a generator expression:
k = next((k for k in wire_dict if k.endswith("destination1_1")),"")
If you want the value use wire_dict.get(k) in case there is no match and you get an empty string returned from the next call.
In [18]: k = next((k for k in wire_dict if k.endswith("destination1_1")),"")
In [19]: wire_dict[k]
Out[19]: object
In [20]: k
Out[20]: 'source1_destination1_1'
You should also never use dict.keys in python2 unless you actually want a list. You can simply iterate over the dict object to access each key efficiently.

Object oriented programming my friend
class Uberdict:
    def __init__(self, source, destination, obj):
        self.source, self.destination, self.obj = source, destination, obj

    def has_destination(self, destination):
        # True or False
        return self.destination == destination

    def has_source(self, source):
        return self.source == source

wire_object_list = [
    # list of the objects
]
# how to create them (some_source, some_destination and some_instance are placeholders)
example_obj = Uberdict(some_source, some_destination, some_instance)
wire_object_list.append(example_obj)
# filter
example_destination = 'some destination'
filtered_list = [item for item in wire_object_list if item.has_destination(example_destination)]
This is only pseudocode; it could have errors.

Related

Python: apply wildcard match to keys being read from dictionary

This is for a script I'm running in Blender, but the question pertains to the Python part of it. It's not specific to Blender.
The script is originally from this answer, and it replaces a given material (the key) with its newer equivalent (the value).
Here's the code:
import bpy
objects = bpy.context.selected_objects
mat_dict = {
"SOLID-WHITE": "Sld_WHITE",
"SOLID-BLACK": "Sld_BLACK",
"SOLID-BLUE": "Sld_BLUE"
}
for obj in objects:
    for slot in obj.material_slots:
        slot.material = bpy.data.materials[mat_dict[slot.material.name]]
The snag is, how to handle duplicates when the scene may have not only objects with the material "SOLID-WHITE", but also "SOLID-WHITE.001", "SOLID-WHITE.002", and so on.
I was looking at this answer to a question about wildcards in Python and it seems fnmatch might be well-suited for this task.
I've tried working fnmatch into the last line of the code. I've also tried wrapping the dictionary keys with it (very WET, I know). Neither of these approaches has worked.
How can I run a wildcard match on each dictionary key?
So for example, whether an object has "SOLID-WHITE" or "SOLID-WHITE"-dot-some-number, it will still be replaced with "Sld_WHITE"?
I have no clue about Blender so I'm not sure if I'm getting the problem right, but how about the following?
mat_dict = {
    "SOLID-WHITE": "Sld_WHITE",
    "SOLID-BLACK": "Sld_BLACK",
    "SOLID-BLUE": "Sld_BLUE"
}
def get_new_material(old_material):
    for k, v in mat_dict.items():
        # .split(".")[0] extracts the part to the left of the dot (if there is one)
        if old_material.split(".")[0] == k:
            return v
    return old_material
for obj in objects:
    for slot in obj.material_slots:
        new_material = get_new_material(slot.material.name)
        slot.material = bpy.data.materials[new_material]
Instead of the .split(".")[0] you could also use re.match by storing regexes as keys in your dictionary. As you noticed in the comment, startswith could match too much, and the same would be the case for fnmatch.
Examples of the above function in action:
In [3]: get_new_material("SOLID-WHITE.001")
Out[3]: 'Sld_WHITE'
In [4]: get_new_material("SOLID-WHITE")
Out[4]: 'Sld_WHITE'
In [5]: get_new_material("SOLID-BLACK")
Out[5]: 'Sld_BLACK'
In [6]: get_new_material("test")
Out[6]: 'test'
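The regex-keys variant mentioned above might be sketched as follows. The patterns are an assumption about how duplicate names look (a base name optionally followed by a dot and digits), and mat_patterns and get_new_material_re are hypothetical names chosen so they don't clash with the code above. The sketch uses fullmatch rather than match so that a name such as "SOLID-WHITER" is not accepted.

```python
import re

# Keys are compiled patterns: the base name, optionally followed by ".<digits>".
mat_patterns = {
    re.compile(r"SOLID-WHITE(\.\d+)?"): "Sld_WHITE",
    re.compile(r"SOLID-BLACK(\.\d+)?"): "Sld_BLACK",
    re.compile(r"SOLID-BLUE(\.\d+)?"): "Sld_BLUE",
}


def get_new_material_re(old_material):
    """Return the replacement name, or the input unchanged if nothing matches."""
    for pattern, new_material in mat_patterns.items():
        if pattern.fullmatch(old_material):
            return new_material
    return old_material
```

Like the split-based version, this is a linear scan over the dictionary for each lookup.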
There are two ways you can approach this.
You can make a smart dictionary that matches vague names, or you can change the key that is used to look up a color.
Here is an example of the first approach using fnmatch.
This approach changes the lookup time complexity from O(1) to O(n) when a color contains a number. It extends UserDict with a __missing__ method, which gets called if the key is not found in the dictionary and compares every key with the given key using fnmatch.
from collections import UserDict
import fnmatch
import bpy
objects = bpy.context.selected_objects
class Colors(UserDict):
    def __missing__(self, key):
        for color in self.keys():
            if fnmatch.fnmatch(key, color + "*"):
                return self[color]
        raise KeyError(f"could not match {key}")
mat_dict = Colors({
    "SOLID-WHITE": "Sld_WHITE",
    "SOLID-BLACK": "Sld_BLACK",
    "SOLID-BLUE": "Sld_BLUE"
})
for obj in objects:
    for slot in obj.material_slots:
        slot.material = bpy.data.materials[mat_dict[slot.material.name]]
Here is an example of the second approach using regex.
import re
import bpy
objects = bpy.context.selected_objects
mat_dict = {
    "SOLID-WHITE": "Sld_WHITE",
    "SOLID-BLACK": "Sld_BLACK",
    "SOLID-BLUE": "Sld_BLUE"
}
pattern = re.compile(r"([A-Z\-]+)(?:\.\d+)?")
# matches any number of capital letters and dashes
# can be followed by a dot followed by any number of digits
# this pattern can match the following strings
# ["AAAAA", "----", "AA-AA.00005"]
for obj in objects:
    for slot in obj.material_slots:
        match = pattern.fullmatch(slot.material.name)
        if match:
            slot.material = bpy.data.materials[mat_dict[match.group(1)]]
        else:
            slot.material = bpy.data.materials[mat_dict[slot.material.name]]

Simplifying the code to a dictionary comprehension

In a directory images, images are named like - 1_foo.png, 2_foo.png, 14_foo.png, etc.
The images are OCR'd and the text extract is stored in a dict by the code below -
data_dict = {}
for i in os.listdir(images):
    if str(i[1]) != '_':
        k = str(i[:2])  # Get first two characters of image name and use as 'key'
    else:
        k = str(i[:1])  # Get first character of image name and use as 'key'
    # Initiates a list for each key and allows storing multiple entries
    data_dict.setdefault(k, [])
    data_dict[k].append(pytesseract.image_to_string(i))
The code performs as expected.
The images can have varying numbers in their name ranging from 1 to 99.
Can this be reduced to a dictionary comprehension?
No. Each iteration in a dict comprehension assigns a value to a key; it cannot update an existing value list. Dict comprehensions aren't always better--the code you wrote seems good enough. Although maybe you could write
data_dict = {}
for i in os.listdir(images):
    k = i.partition("_")[0]
    image_string = pytesseract.image_to_string(i)
    data_dict.setdefault(k, []).append(image_string)
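Another common way to write the same grouping loop is with collections.defaultdict. The sketch below uses a hypothetical fake_ocr function in place of pytesseract.image_to_string and a hard-coded list of names in place of os.listdir, so that it runs without any images.

```python
from collections import defaultdict


def fake_ocr(filename):
    # Stand-in for pytesseract.image_to_string (assumption for this sketch).
    return "text from " + filename


# Stand-in for os.listdir(images).
names = ["1_foo.png", "2_foo.png", "14_foo.png", "1_bar.png"]

# Missing keys are created with an empty list on first access.
data_dict = defaultdict(list)
for name in names:
    key = name.partition("_")[0]  # the part of the name before the first "_"
    data_dict[key].append(fake_ocr(name))
```

This is still a loop rather than a comprehension, since each step updates an existing list.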
Yes. Here's one way, but I wouldn't recommend it:
{k: d.setdefault(k, []).append(pytesseract.image_to_string(i)) or d[k]
 for d in [{}]
 for k, i in ((i.split('_')[0], i) for i in names)}
That might be as clean as I can make it, and it's still bad. Better use a normal loop, especially a clean one like Dennis's.
Slight variation (if I do the abuse once, I might as well do it twice):
{k: d.setdefault(k, []).append(pytesseract.image_to_string(i)) or d[k]
 for d in [{}]
 for i in names
 for k in i.split('_')[:1]}
Edit: kaya3 now posted a good one using a dict comprehension. I'd recommend that over mine as well. Mine are really just the dirty results of me being like "Someone said it can't be done? Challenge accepted!".
In this case itertools.groupby can be useful; you can group the filenames by the numeric part. But making it work is not easy, because the groups have to be contiguous in the sequence.
That means before we can use groupby, we need to sort using a key function which extracts the numeric part. That's the same key function we want to group by, so it makes sense to write the key function separately.
from itertools import groupby
def image_key(image):
    return str(image).partition('_')[0]
images = ['1_foo.png', '2_foo.png', '3_bar.png', '1_baz.png']
result = {
    k: list(v)
    for k, v in groupby(sorted(images, key=image_key), key=image_key)
}
# {'1': ['1_foo.png', '1_baz.png'],
#  '2': ['2_foo.png'],
#  '3': ['3_bar.png']}
Replace list(v) with list(map(pytesseract.image_to_string, v)) for your use-case.

Unable to print value from inside array

I have an array which has multiple objects inside it, and I want to parse it to get the value of bar in each object.
This is a mock of the array/object I am trying to parse to get the values from.
[
    {
        foo: [
            {
                bar: 50,
                crow: true
            }
        ]
    }
    ...
    {}
    ...
]
So far this is as far as I have come on the code and admittedly it isn't very far as I've been back tracking as things go wrong and worse.
for foo in response:
    print(foo)
    foo_list = list(foo)
    print(foo_list[0])
This outputs the whole value of foo, then it outputs only the key foo.
PS I know this is a repeat question, please only mark as so if the repeat one has the exact format I am looking for.
You want something along the lines of
something[0]['foo'][0]['bar']
The '[' means you need to access an element of a list. The [0] gets the first element of a list.
The '{' means you need to access an entry of a dictionary. The ['foo'] means get the value of key 'foo'.
Your sample above has a dict in a list in a dict in a list.
You should also not scrimp when it comes to error checking... The above could easily be:
sentinel = object()  # unique object for checking default from get()
bar = None
if isinstance(something, list) and len(something) == 1:
    elem = something[0]
    if isinstance(elem, dict):
        val = elem.get('foo', sentinel)
        if val is not sentinel:
            # and so on and so on...
            pass
The other thing to mention is that sometimes you don't care about the name of the key. Instead of saying d.get(key), which requires knowing the key, you can say list(d.values())[0] (in Python 2, d.values()[0] works directly).
So, the top could also be:
list(list(something[0].values())[0][0].values())[0]
Again, don't skimp on the error checking...
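The access pattern and the checks described above might be combined as in the sketch below. The sample data mirrors the mock in the question, and bar_values is a hypothetical helper name.

```python
response = [
    {'foo': [{'bar': 50, 'crow': True}]},
    {},                                        # entry without a 'foo' key
    {'foo': [{'bar': 7, 'crow': False}, {'crow': True}]},
]


def bar_values(items):
    """Collect every 'bar' value found under a 'foo' list, skipping bad entries."""
    found = []
    for entry in items:
        if not isinstance(entry, dict):
            continue
        # A missing 'foo' key is treated as an empty list.
        for inner in entry.get('foo', []):
            if isinstance(inner, dict) and 'bar' in inner:
                found.append(inner['bar'])
    return found


print(bar_values(response))  # prints [50, 7]
```

Entries that lack foo or bar are skipped silently here; raising an error instead may be the better choice if such entries indicate bad input.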
I'm going to assume you want the inner values. Try this code:
response = [{'foo': [ { 'bar': 50, 'crow': True } ] }]
for i in range(len(response)):
    for key in response[i].keys():
        for d in response[i][key]:
            print(list(d.values()))
Basically, what's happening here is that we first iterate through the outer list ([]), then iterate over the keys of each dictionary in it, then iterate over each item in the list stored under that key (here foo), and finally print a list of the inner dictionary's values. You could also just grab the whole dict as d.
Also, if this is JSON, you're going to want to parse it first with the json lib. See https://docs.python.org/3/library/json.html for more information on that.

python3 list comprehension using a dictionary doesn't always return the same list

EDIT:
As requested, here is the problem I am trying to solve:
I have files in a directory that do not have extensions. Based on the output of the "file" command I want to assign the corresponding extension. So my dictionary is assigning strings that can be in this output, to the extensions (eg "ASCII": "txt") but can't know if it will be the exact output. For example :
$ file my_file
myfile: ASCII text
# I want to change the extension of myfile to myfile.txt
That is what the code I wrote was designed to do, but maybe there are better solutions
I have a dictionary of strings whose keys share common substrings:
MY_D = {
"first_k": "first_v",
"sec_k": "sec_v",
"some_key_name_that_includes_others": "value1",
"some_other_key": "value2",
...
"last_k": "last_v"
}
Sometimes, I have to fetch a value, if the search word is within the key. I use this code:
key_I_am_looking_for = "some"
value = list(v for k, v in MY_D.items() if key_I_am_looking_for in k)[-1]
But when the key I am looking for appears in multiple possible keys, the value I get is not always the same, in this example it can be either "value1" or "value2".
I noticed the list returned, is not always ordered the same way.
Is there a way I can make this return always the value that corresponds to the longest key matched (here would be "value1")?
Before Python 3.7, dicts were unordered by specification, meaning code should not rely on any internal order of a dictionary.
The workaround is simply to sort the output of your dict iteration. In your snippet, that can be done by replacing the call to list with a call to sorted:
value = sorted(v for k, v in MY_D.items() if key_I_am_looking_for in k)[-1]
The longest key in all matches. A sorted with lambda may help.
This gets each match in a list of lists and uses sorted and key=lambda... on the length of the 1st item in each list (which is the saved key name). It will sort shortest key name to longest. Get the last list and the last item of that list to get the value of value1.
MY_D = {
    "first_k": "first_v",
    "sec_k": "sec_v",
    "some_key_name_that_includes_others": "value1",
    "some_other_key": "value2",
    "last_k": "last_v"
}
key_I_am_looking_for = "some"
# List of lists of matches.
value = list([k, v] for k, v in MY_D.items() if key_I_am_looking_for in k)
# Sort by index 0 of each inner list.
key_len = sorted(value, key=lambda l: len(l[0]))
# Last list is longest index 0 so get last item of that list.
print('longest:', key_len[-1][-1])
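The same result can probably be obtained without sorting, by asking max for the matching key with the greatest length. This is a sketch using the data from the answer above; longest_match_value is a hypothetical name.

```python
MY_D = {
    "first_k": "first_v",
    "sec_k": "sec_v",
    "some_key_name_that_includes_others": "value1",
    "some_other_key": "value2",
    "last_k": "last_v",
}


def longest_match_value(mapping, needle):
    """Return the value whose key contains `needle` and is longest, or None."""
    matches = [k for k in mapping if needle in k]
    if not matches:
        return None
    # max with key=len picks the longest key; ties go to the first one seen.
    return mapping[max(matches, key=len)]


print('longest:', longest_match_value(MY_D, "some"))  # prints longest: value1
```

Returning None for no match avoids the IndexError that indexing an empty list with [-1] would raise.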

cannot coerce type 'closure' to vector of type 'character' while using hash package in R

I am trying to convert a piece of Python code to R. In Python, a dictionary within a dictionary is used, so I am trying to use the hash package in R.
Python code:
titles = {
    'NAME': {
        'exact': ['NAME'],
        'partial': []
    },
    'Dt': {
        'exact': ['Dt'],
        'partial': []
    },
    'CC': {
        'exact': [],
        'partial': []
    }
}
And the R code is,
library(hash)
titles = hash(("NAME" = list("exact"=list('NAME'),"partial"=list())),
("Dt" = list("exact"=list('Dt'),"partial"=list())),
("CC" = list("exact"=list(),"partial"=list())))
But when I try to use this code with the hash environment, I get the error below.
Error in as.vector(x, "character") :
cannot coerce type 'closure' to vector of type 'character'
When I replace hash with list, it works fine. But I am using key/value pairs (the hash package) mainly because I have to work with the inner dictionary, that is, change the inner dictionary values based on the outer dictionary keys. Any idea why I am getting this error, or is there an alternative approach?
Updating below to make the question still more clear.
To explain it further, I am creating it as key/value pairs(hash package) mainly because I am going to use the below logic on the dictionaries which heavily use key/value pairs. I am not sure if this can be easily done in R list without key/value pairs.
another_dict = {}
multiples_dict = {}
adj_title = 'Dt'
for outer_key, outer_value in titles.iteritems():
    for exact in outer_value['exact']:
        if exact == adj_title:
            another_dict[actual_title] = outer_key
            multiples_dict[outer_key] = multiples
    for partial in outer_value['partial']:
        if partial in adj_title:
            another_dict[actual_title] = outer_key
            multiples_dict[outer_key] = multiples
Thanks in advance.
You need to get rid of the parens surrounding each of the key/value pairs, as in:
library(hash)
titles = hash("NAME" = list("exact"=list('NAME'),"partial"=list()),
"Dt" = list("exact"=list('Dt'),"partial"=list()),
"CC" = list("exact"=list(),"partial"=list()))
When you include the parens, as in hash( (a=b) ), the object (a=b) is passed as an expression and not as a key/value pair.
