Performing string transformations on buildbot build properties? - python

Is there a good way to perform string transformations on a property or source stamp attribute before using it in an Interpolate? We use slashes in our branch names, and I need to transform the slashes into dashes so I can use them in filenames.
That is, say I have the branch "feature/fix-all-the-things", accessible as Interpolate("%(prop:branch)s") or Interpolate("%(src::branch)s"). I would like to be able to transform it to "feature-fix-all-the-things" for some interpolations. Obviously, it needs to remain in its original form for selecting the appropriate branch from source control.

It turned out, I just needed to subclass Interpolate:
import re
from buildbot.process.properties import Interpolate

class InterpolateReplace(Interpolate):
    """Interpolate with regex replacements.

    This takes an additional argument, `patterns`, which is a list of
    dictionaries containing the keys "search" and "replace", corresponding to
    the `pattern` and `repl` arguments to `re.sub()`.
    """
    def __init__(self, fmtstring, patterns, *args, **kwargs):
        Interpolate.__init__(self, fmtstring, *args, **kwargs)
        self._patterns = patterns

    def _sub(self, s):
        for pattern in self._patterns:
            search = pattern['search']
            replace = pattern['replace']
            s = re.sub(search, replace, s)
        return s

    def getRenderingFor(self, props):
        props = props.getProperties()
        if self.args:
            d = props.render(self.args)
            d.addCallback(lambda args:
                          self._sub(self.fmtstring % tuple(args)))
            return d
        else:
            d = props.render(self.interpolations)
            d.addCallback(lambda res:
                          self._sub(self.fmtstring % res))
            return d

It looks like there's a newer, easier way to do this since buildbot v0.9.0 with Transform:
filename = util.Transform(
    lambda p: p.replace('/', '-'),
    util.Property('branch')
)


How do you associate metadata or annotations to a python function or method?

I am looking to build fairly detailed annotations for methods in a Python class, to be used in troubleshooting, documentation, tooltips for a user interface, etc. However, it's not clear how I can keep these annotations associated with the functions.
For context, this is a feature engineering class, so two example methods might be:
def create_feature_momentum(self):
    return self.data['mass'] * self.data['velocity']

def create_feature_kinetic_energy(self):
    return 0.5 * self.data['mass'] * self.data['velocity'].pow(2)
For example:
It'd be good to tell easily what core features were used in each engineered feature.
It'd be good to track arbitrary metadata about each method
It'd be good to embed non-string data as metadata about each function. Eg. some example calculations on sample dataframes.
So far I've been manually creating docstrings like:
def create_feature_kinetic_energy(self) -> pd.Series:
    '''Calculate the non-relativistic kinetic energy.

    Depends on: ['mass', 'velocity']
    Supports NaN Values: False
    Unit: Energy (J)
    Example:
        self.data = pd.DataFrame({'mass': [0, 1, 2], 'velocity': [0, 1, 2]})
        self.create_feature_kinetic_energy()
        >>> pd.Series([0, 0.5, 4])
    '''
    return 0.5 * self.data['mass'] * self.data['velocity'].pow(2)
And then I'm using regex to get the data about a function by inspecting the __doc__ attribute. However, is there a better place than __doc__ where I could store information about a function? In the example above, it's fairly easy to parse the Depends on list, but in my use case it'd be good to also embed some example data as dataframes somehow (and I think writing them as markdown in the docstring would be hard).
Any ideas?
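For the docstring route specifically, one option is to pull the `Depends on` line out with a regex and parse it with `ast.literal_eval` instead of hand-rolled string handling. A sketch (the helper name is made up):

```python
import ast
import re

def get_dependencies(func):
    """Extract the "Depends on" list from a function's docstring."""
    match = re.search(r"Depends on:\s*(\[.*?\])", func.__doc__ or "")
    # literal_eval safely parses the Python-list literal in the docstring
    return ast.literal_eval(match.group(1)) if match else []

def create_feature_kinetic_energy():
    '''Calculate the non-relativistic kinetic energy.
    Depends on: ['mass', 'velocity']
    '''

print(get_dependencies(create_feature_kinetic_energy))  # ['mass', 'velocity']
```

This keeps the docstring human-readable while still giving structured data to tooling, though it only works for values that are valid Python literals.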
I ended up writing a class as follows:
class ScubaDiver(pd.DataFrame):
    accessed = None

    def __getitem__(self, key):
        if self.accessed is None:
            self.accessed = set()
        self.accessed.add(key)
        return pd.Series(dtype=float)

    @property
    def columns(self):
        return list(self.accessed)
The way my code is written, I can do this:
sd = ScubaDiver()
foo(sd)
sd.columns
and sd.columns contains all the columns accessed by foo.
Though this might not work in your codebase.
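The same dependency-tracking idea can be sketched without subclassing DataFrame, which sidesteps the class-attribute and pandas-internals pitfalls; a hypothetical minimal recorder:

```python
class AccessRecorder:
    """Duck-types a column store and records every key a function looks up."""
    def __init__(self):
        self.accessed = set()

    def __getitem__(self, key):
        self.accessed.add(key)
        return 1.0  # dummy value so arithmetic in the feature code still runs

def create_feature_momentum(data):
    return data['mass'] * data['velocity']

recorder = AccessRecorder()
create_feature_momentum(recorder)
print(sorted(recorder.accessed))  # ['mass', 'velocity']
```

This only discovers columns on the code paths actually executed, so branchy feature functions may need several probe runs.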
I also wrote this decorator:
def add_note(notes: dict):
    '''Adds k:v pairs to a .notes attribute.'''
    def _(f):
        if not hasattr(f, 'notes'):
            f.notes = {}
        f.notes |= notes  # dict merge (Python 3.9+)
        return f
    return _
You can use it as follows:
@add_note({'Units': 'J', 'Relativity': False})
def create_feature_kinetic_energy(self):
    return 0.5 * self.data['mass'] * self.data['velocity'].pow(2)
and then you can do:
create_feature_kinetic_energy.notes['Units'] # J

How to parse string replacement fields in a string in python?

Python has this concept of string replacement fields such as mystr1 = "{replaceme} other text..." where {replaceme} (the replacement field) can be easily formatted via statements such as mystr1.format(replaceme="yay!").
I often work with large strings and sometimes do not know all of the replacement fields in advance. Resolving them manually is not too bad when there are one or two, but sometimes there are dozens, and it would be nice if Python had a function similar to dict.keys() for this.
How does one parse the string replacement fields in a string in Python?
In lieu of answers from the community I wrote a helper function below to spit out the replacement fields to a dict which I can then simply update the values to what I want and format the string.
Is there a better way or built in way to do this?
cool_string = """{a}yo{b}ho{c}ho{d}and{e}a{f}bottle{g}of{h}rum{i}{j}{k}{l}{m}{n}{o}{p}{q}{r}{s}{t}{u}{v}{w}{x}{y}{z}"""
def parse_keys_string(s, keys=None):
    if keys is None:  # avoid sharing a mutable default dict between calls
        keys = {}
    try:
        print(s.format(**keys)[:0])
        return keys
    except KeyError as e:
        print("Adding Key:", e)
        e = str(e).replace("'", "")
        keys[e] = e
        parse_keys_string(s, keys)
        return keys

cool_string_replacement_fields_dict = parse_keys_string(cool_string)

# set replacement field values
i = 1
for k, v in cool_string_replacement_fields_dict.items():
    cool_string_replacement_fields_dict[k] = i
    i = i + 1

# format the string with desired values...
cool_string_formatted = cool_string.format(**cool_string_replacement_fields_dict)
print(cool_string_formatted)
print(cool_string_formatted)
I came up with the following:
class NumfillinDict(dict):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.i = -1

    def __missing__(self, key):  # optionally one could have logic based on key
        self.i += 1
        return f"({self.i})"

cool_string = ("{a}yo{b}ho{c}ho{d}and{e}a{f}bottle{g}of{h}rum{i}\n"
               "{j}{k}{l}{m}{n}{o}{p}{q}{r}{s}{t}{u}{v}{w}{x}{y}{z}")

dt = NumfillinDict(notneeded='something', b=' -=actuallyIknowb<=- ')
filled_string = cool_string.format_map(dt)
print(filled_string)
It works a bit like a defaultdict by filling in missing key-value pairs using the __missing__ method.
Result:
(0)yo -=actuallyIknowb<=- ho(1)ho(2)and(3)a(4)bottle(5)of(6)rum(7)
(8)(9)(10)(11)(12)(13)(14)(15)(16)(17)(18)(19)(20)(21)(22)(23)(24)
Inspired by: Format string unused named arguments
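For what it's worth, the standard library can enumerate replacement fields directly: `string.Formatter().parse()` yields a `(literal_text, field_name, format_spec, conversion)` tuple per fragment, so no try/except loop is needed:

```python
from string import Formatter

def get_fields(s):
    # field_name is None for fragments that are pure literal text
    return [name for _, name, _, _ in Formatter().parse(s) if name is not None]

print(get_fields("{a}yo{b}ho{c}"))  # ['a', 'b', 'c']
print(get_fields("no fields"))      # []
```

From there, building the keys dict is a one-liner: `dict.fromkeys(get_fields(cool_string))`.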

list of functions Python

I have a list of patterns:
patterns_trees = [response.css("#Header").xpath("//a/img/@src"),
                  response.css("#HEADER").xpath("//a/img/@src"),
                  response.xpath("//header//a/img/@src"),
                  response.xpath("//a[@href='" + response.url + "/']/img/@src"),
                  response.xpath("//a[@href='/']/img/@src"),
                  ]
After I traverse it and find the right pattern I have to send the pattern as an argument to a callback function
for pattern_tree in patterns_trees:
    ...
    pattern_response = scrapy.Request(..., ..., meta={"pattern_tree": pattern_tree.extract_first()})
By doing this I get the extracted value, not the pattern itself.
THINGS I TRIED:
I tried isolating the patterns in a separate class, but I still have the problem that I cannot store them as patterns, only as their extracted values.
I tried to save them as strings, and maybe I could make that work, but: what is the most efficient way of storing a list of functions?
UPDATE: Possible solution but too hardcoded and it's too problematic when I want to add more patterns:
def patter_0(response):
    return response.css("#Header").xpath("//a/img/@src")

def patter_1(response):
    return response.css("#HEADER").xpath("//a/img/@src")

.....

class patternTrees:
    patterns = [patter_0, ..., patter_n]

    def length_patterns(self):
        return len(self.patterns)
If you're willing to consider reformatting your list of operations, then this is a somewhat neat solution. I've changed the list of operations to a list of tuples. Each tuple contains (a ref to) the appropriate function, and another tuple consisting of arguments.
It's fairly easy to add new operations to the list: just specify what function to use, and the appropriate arguments.
If you want to use the result from one operation as an argument in the next: You will have to return the value from execute() and process it in the for loop.
I've replaced the calls to response with prints() so that you can test it easily.
def response_css_ARG_xpath_ARG(args):
    return "response.css(\"%s\").xpath(\"%s\")" % (args[0], args[1])
    # return response.css(args[0]).xpath(args[1])

def response_xpath_ARG(arg):
    return "response.xpath(\"%s\")" % (arg)
    # return response.xpath(arg)

def execute(function, args):
    response = function(args)
    # do whatever with response
    return response

response_url = "https://whatever.com"

patterns_trees = [(response_css_ARG_xpath_ARG, ("#Header", "//a/img/@src")),
                  (response_css_ARG_xpath_ARG, ("#HEADER", "//a/img/@src")),
                  (response_xpath_ARG, ("//header//a/img/@src")),
                  (response_xpath_ARG, ("//a[@href='" + response_url + "/']/img/@src")),
                  (response_xpath_ARG, ("//a[@href='/']/img/@src"))]

for pattern_tree in patterns_trees:
    print(execute(pattern_tree[0], pattern_tree[1]))
Note that execute() can be omitted, depending on whether you need to process the result or not. Without it, you may just call the function directly from the loop:
for pattern_tree in patterns_trees:
    print(pattern_tree[0](pattern_tree[1]))
Not sure I understand what you're trying to do, but could you make your list a list of lambda functions like so:
patterns_trees = [
    lambda response: response.css("#Header").xpath("//a/img/@src"),
    ...
]
And then, in your loop:
for pattern_tree in patterns_trees:
    intermediate_response = scrapy.Request(...)  # without meta kwarg
    pattern_response = pattern_tree(intermediate_response)
Or does leaving the meta away have an impact on the response object?
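Another way to defer the calls without writing one wrapper function per pattern is functools.partial, which bundles a function and its arguments into a single callable. A sketch with a stand-in for the real scrapy response, so it runs on its own:

```python
from functools import partial

def fake_xpath(selector):
    # Stand-in for response.xpath(); the real scrapy call would go here
    return "xpath result for %s" % selector

# Each entry is a zero-argument callable with its selector baked in
patterns = [
    partial(fake_xpath, "//header//a/img/@src"),
    partial(fake_xpath, "//a[@href='/']/img/@src"),
]

for pattern in patterns:
    print(pattern())  # calls fake_xpath with the stored selector
```

Adding a new pattern is then just one more `partial(...)` line, which addresses the "too hardcoded" concern above.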

How to extract substrings from a masked Python string?

I'm writing an HTTP Request Handler with intuitive routing. My goal is to be able to apply a decorator to a function which states the HTTP method being used as well as the path to be listened on for executing the decorated function. Here's a sample of this implementation:
@route_handler("GET", "/personnel")
def retrievePersonnel():
    return personnelDB.retrieveAll()
However, I also want to be able to add variables to the path. For example, /personnel/3 would fetch a personnel with an ID of 3. The way I want to go about doing this is providing a sort of 'variable mask' to the path passed into the route_handler. A new example would be:
@route_handler("GET", "/personnel/{ID}")
def retrievePersonnelByID(ID):
    return personnelDB.retrieveByID(ID)
The decorator's purpose would be to compare the path literal (/personnel/3 for example) with the path 'mask' (/personnel/{ID}) and pass the 3 into the decorated function. I'm assuming the solution would be to compare the two strings, keep the differences, and place the difference in the literal into a variable named after the difference in the mask (minus the curly braces). But then I'd also have to check to see if the literal matches the mask minus the {} variable catchers...
tl;dr - is there a way to do
stringMask("/personnel/{ID}", "/personnel/5") -> True, {"ID": 5}
stringMask("/personnel/{ID}", "/flowers/5") -> False, {}
stringMask("/personnel/{ID}", "/personnel") -> False, {}
Since I'm guessing there isn't really an easy solution to this, I'm gonna post the solution I did. I was hoping there would be something I could do in a few lines, but oh well ¯\_(ツ)_/¯
def checkPath(self, mask):
    mask_parts = mask[1:].split("/")
    path_parts = self.path[1:].rstrip("/").split("/")
    if len(mask_parts) != len(path_parts):
        self.url_vars = {}
        return False
    vars = {}
    for i in range(len(mask_parts)):
        if mask_parts[i][0] == "{":
            vars[mask_parts[i][1:-1]] = path_parts[i]
        else:
            if mask_parts[i] != path_parts[i]:
                self.url_vars = {}
                return False
    self.url_vars = vars  # save extracted variables
    return True
A mask is just a string like one of the ones below:
/resource
/resource/{ID}
/group/{name}/resource/{ID}
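An alternative to the split-and-compare loop is to compile the mask into a regex with named groups, so matching and extraction happen in one call. A sketch (it assumes the literal parts of the mask contain no regex metacharacters):

```python
import re

def string_mask(mask, path):
    # Turn "/personnel/{ID}" into r"/personnel/(?P<ID>[^/]+)"
    pattern = re.sub(r'\{(\w+)\}', r'(?P<\1>[^/]+)', mask)
    match = re.fullmatch(pattern, path.rstrip('/'))
    return (True, match.groupdict()) if match else (False, {})

print(string_mask("/personnel/{ID}", "/personnel/5"))  # (True, {'ID': '5'})
print(string_mask("/personnel/{ID}", "/flowers/5"))    # (False, {})
print(string_mask("/personnel/{ID}", "/personnel"))    # (False, {})
```

Masks like /group/{name}/resource/{ID} work unchanged, since every {var} becomes its own named group.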

How can I apply a prefix to dictionary access?

I'm imitating the behavior of the ConfigParser module to write a highly specialized parser that exploits some well-defined structure in the configuration files for a particular application I work with. Several sections of the config file contain hundreds of variable and routine mappings prefixed with either Variable_ or Routine_, like this:
[Map.PRD]
Variable_FOO=LOC1
Variable_BAR=LOC2
Routine_FOO=LOC3
Routine_BAR=LOC4
...
[Map.SHD]
Variable_FOO=LOC1
Variable_BAR=LOC2
Routine_FOO=LOC3
Routine_BAR=LOC4
...
I'd like to maintain the basic structure of ConfigParser where each section is stored as a single dictionary, so users would still have access to the classic syntax:
config.content['Mappings']['Variable_FOO'] = 'LOC1'
but also be able to use a simplified API that drills down to this section:
config.vmapping('PRD')['FOO'] = 'LOC1'
config.vmapping('PRD')['BAR'] = 'LOC2'
config.rmapping('PRD')['FOO'] = 'LOC3'
config.rmapping('PRD')['BAR'] = 'LOC4'
Currently I'm implementing this by storing each section in a special subclass of dict to which I've added a prefix attribute. The variable and routine properties of the parser set the prefix attribute of the dict-like object to 'Variable_' or 'Routine_', and overridden __getitem__ and __setitem__ methods glue the prefix together with the key to access the appropriate item. It's working, but it involves a lot of boilerplate to implement all the associated niceties, like supporting iteration.
I suppose my ideal solution would be to dispense with the subclassed dict and have the variable and routine properties somehow present a "view" of the plain dict object underneath without the prefixes.
Update
Here's the solution I implemented, largely based on @abarnet's answer:
class MappingDict(object):
    def __init__(self, prefix, d):
        self.prefix, self.d = prefix, d

    def prefixify(self, name):
        return '{}_{}'.format(self.prefix, name)

    def __getitem__(self, name):
        name = self.prefixify(name)
        return self.d.__getitem__(name)

    def __setitem__(self, name, value):
        name = self.prefixify(name)
        return self.d.__setitem__(name, value)

    def __delitem__(self, name):
        name = self.prefixify(name)
        return self.d.__delitem__(name)

    def __iter__(self):
        return (key.partition('_')[-1] for key in self.d
                if key.startswith(self.prefix))

    def __repr__(self):
        # delegate to the underlying dict; this class does not inherit from dict
        return 'MappingDict({!r})'.format(self.d)
import re

class MyParser(object):
    SECTCRE = re.compile(r'\[(?P<header>[^]]+)\]')

    def __init__(self, filename):
        self.filename = filename
        self.content = {}
        lines = [x.strip() for x in open(filename).read().splitlines()
                 if x.strip()]
        for line in lines:
            match = re.match(self.SECTCRE, line)
            if match:
                section = match.group('header')
                self.content[section] = {}
            else:
                key, sep, value = line.partition('=')
                self.content[section][key] = value

    def write(self, filename):
        fp = open(filename, 'w')
        # sectionsort and cpfsort are custom sort-key functions defined elsewhere
        for section in sorted(self.content, key=sectionsort):
            fp.write("[%s]\n" % section)
            for key in sorted(self.content[section], key=cpfsort):
                value = str(self.content[section][key])
                fp.write("%s\n" % '='.join([key, value]))
            fp.write("\n")
        fp.close()

    def vmapping(self, nsp):
        section = 'Map.{}'.format(nsp)
        return MappingDict('Variable', self.content[section])

    def rmapping(self, nsp):
        section = 'Map.{}'.format(nsp)
        return MappingDict('Routine', self.content[section])
It's used like this:
config = MyParser('myfile.cfg')
vmap = config.vmapping('PRD')
vmap['FOO'] = 'LOC5'
vmap['BAR'] = 'LOC6'
config.write('newfile.cfg')
The resulting newfile.cfg reflects the LOC5 and LOC6 changes.
I don't think you want inheritance here. You end up with two separate dict objects which you have to create on load and then paste back together on save…
If that's acceptable, you don't even need to bother with the prefixing during normal operations; just do the prefixing while saving, like this:
class Config(object):
    def save(self):
        merged = {'variable_{}'.format(key): value for key, value
                  in self.variable_dict.items()}
        merged.update({'routine_{}'.format(key): value for key, value
                       in self.routine_dict.items()})
        # now save merged
If you want that merged object to be visible at all times, but don't expect it to be accessed very often, make it a @property.
If you want to access the merged dictionary regularly, at the same time you're accessing the two sub-dictionaries, then yes, you want a view:
I suppose my ideal solution would be to dispense with the subclassed dict and have the global and routine properties somehow present a "view" of the plain dict object underneath without the prefixes.
This is going to be very hard to do with inheritance. Certainly not with inheritance from dict; inheritance from builtins.dict_items might work if you're using Python 3, but it still seems like a stretch.
But with delegation, it's easy. Each sub-dictionary just holds a reference to the parent dict:
class PrefixedDict(object):
    def __init__(self, prefix, d):
        self.prefix, self.d = prefix, d

    def prefixify(self, key):
        return '{}_{}'.format(self.prefix, key)

    def __getitem__(self, key):
        return self.d.__getitem__(self.prefixify(key))

    def __setitem__(self, key, value):
        return self.d.__setitem__(self.prefixify(key), value)

    def __delitem__(self, key):
        return self.d.__delitem__(self.prefixify(key))

    def __iter__(self):
        return (key[len(self.prefix) + 1:]  # skip the prefix and the underscore
                for key in self.d
                if key.startswith(self.prefix))
You don't get any of the dict methods for free that way—but that's a good thing, because they were mostly incorrect anyway, right? Explicitly delegate the ones you want. (If you do have some you want to pass through as-is, use __getattr__ for that.)
Besides being conceptually simpler and harder to screw up by accidentally forgetting to override something, this also means that PrefixedDict can work with any type of mapping, not just a dict.
So, no matter which way you go, where and how do these objects get created?
The easy answer is that they're attributes that you create when you construct a Config:
def __init__(self):
    self.d = {}
    self.variable = PrefixedDict('Variable', self.d)
    self.routine = PrefixedDict('Routine', self.d)
If this needs to be dynamic (e.g., there can be an arbitrary set of prefixes), create them at load time:
def load(self):
    # load up self.d
    prefixes = set(key.split('_')[0] for key in self.d)
    for prefix in prefixes:
        setattr(self, prefix, PrefixedDict(prefix, self.d))
If you want to be able to create them on the fly (so config.newprefix['foo'] = 3 adds 'Newprefix_foo'), you can do this instead:
def __getattr__(self, name):
    return PrefixedDict(name.title(), self.d)
But once you're using dynamic attributes, you really have to question whether it isn't cleaner to use dictionary (item) syntax instead, like config['newprefix']['foo']. For one thing, that would actually let you call one of the sub-dictionaries 'global', as in your original question…
Or you can first build the dictionary syntax, use what's usually referred to as an attrdict (search ActiveState recipes and PyPI for 3000 implementations…), which lets you automatically make config.newprefix mean config['newprefix'], so you can use attribute syntax when you have valid identifiers, but fall back to dictionary syntax when you don't.
There are a couple of options for how to proceed.
The simplest might be to use nested dictionaries, so Variable_FOO becomes config["variable"]["FOO"]. You might want to use a defaultdict(dict) for the outer dictionary so you don't need to worry about initializing the inner ones when you add the first value to them.
Another option would be to use tuple keys in a single dictionary. That is, Variable_FOO would become config[("variable", "FOO")]. This is easy to do with code, since you can simply assign to config[tuple(some_string.split("_"))]. Though, I suppose you could also just use the unsplit string as your key in this case.
A final approach allows you to use the syntax you want (where Variable_FOO is accessed as config.Variable["FOO"]), by using __getattr__ and a defaultdict behind the scenes:
from collections import defaultdict

class Config(object):
    def __init__(self):
        self._attrdicts = defaultdict(dict)

    def __getattr__(self, name):
        return self._attrdicts[name]
You could extend this with behavior for __setattr__ and __delattr__, but it's probably not necessary. The only serious limitation to this approach (given the original version of the question) is that the attribute names (like Variable) must be legal Python identifiers. You can't use strings with leading numbers, Python keywords (like global), or strings containing whitespace characters.
A downside to this approach is that it's a bit more difficult to use programmatically (by, for instance, your config-file parser). To read a value of Variable_FOO and save it to config.Variable["FOO"], you'll probably need to use the built-in getattr function, like this:
name, value = line.split("=")
prefix, suffix = name.split("_")
getattr(config, prefix)[suffix] = value
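Putting the defaultdict approach together with that parsing snippet, a runnable sketch (file I/O omitted; the input lines are hard-coded here):

```python
from collections import defaultdict

class Config(object):
    def __init__(self):
        self._attrdicts = defaultdict(dict)

    def __getattr__(self, name):
        # only called for attributes not found normally, so any
        # prefix name transparently becomes an inner dict
        return self._attrdicts[name]

config = Config()
for line in ["Variable_FOO=LOC1", "Routine_FOO=LOC3"]:
    name, value = line.split("=")
    prefix, suffix = name.split("_")
    getattr(config, prefix)[suffix] = value

print(config.Variable["FOO"])  # LOC1
print(config.Routine["FOO"])   # LOC3
```

Note that assigning self._attrdicts in __init__ is safe: __getattr__ is only invoked when normal attribute lookup fails.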
