list of functions Python - python

I have a list of patterns:
patterns_trees = [
    response.css("#Header").xpath("//a/img/@src"),
    response.css("#HEADER").xpath("//a/img/@src"),
    response.xpath("//header//a/img/@src"),
    response.xpath("//a[@href='" + response.url + "/']/img/@src"),
    response.xpath("//a[@href='/']/img/@src"),
]
After I traverse it and find the right pattern, I have to send that pattern as an argument to a callback function:
for pattern_tree in patterns_trees:
    ...
    pattern_response = scrapy.Request(...,..., meta={"pattern_tree": pattern_tree.extract_first()})
By doing this I pass the value the pattern extracted, not the pattern itself.
THINGS I TRIED:
I tried isolating the patterns in a separate class, but I still have the problem that I cannot store them as patterns, only as values.
I tried to save them as strings and maybe I can make it work but
What is the most efficient way of storing a list of functions?
UPDATE: A possible solution, but it is too hardcoded and becomes a problem when I want to add more patterns:
def patter_0(response):
    return response.css("#Header").xpath("//a/img/@src")
def patter_1(response):
    return response.css("#HEADER").xpath("//a/img/@src")
.....
class patternTrees:
    patterns = [patter_0,...,patter_n]
    def length_patterns(self):
        return len(self.patterns)

If you're willing to consider reformatting your list of operations, then this is a somewhat neat solution. I've changed the list of operations to a list of tuples. Each tuple contains (a ref to) the appropriate function, and another tuple consisting of arguments.
It's fairly easy to add new operations to the list: just specify what function to use, and the appropriate arguments.
If you want to use the result from one operation as an argument in the next, you will have to return the value from execute() and process it in the for loop.
I've replaced the calls on response with strings that get printed, so that you can test it easily.
def response_css_ARG_xpath_ARG(args):
    return "response.css(\"%s\").xpath(\"%s\")" % (args[0], args[1])
    # return response.css(args[0]).xpath(args[1])
def response_xpath_ARG(arg):
    return "response.xpath(\"%s\")" % (arg)
    # return response.xpath(arg)
def execute(function, args):
    response = function(args)
    # do whatever with response
    return response
response_url = "https://whatever.com"
patterns_trees = [(response_css_ARG_xpath_ARG, ("#Header", "//a/img/@src")),
                  (response_css_ARG_xpath_ARG, ("#HEADER", "//a/img/@src")),
                  (response_xpath_ARG, ("//header//a/img/@src")),
                  (response_xpath_ARG, ("//a[@href='" + response_url + "/" + "']/img/@src")),
                  (response_xpath_ARG, ("//a[@href='/']/img/@src"))]
for pattern_tree in patterns_trees:
    print(execute(pattern_tree[0], pattern_tree[1]))
Note that execute() can be omitted, depending on whether you need to process the result or not. Without it, you can call the function directly from the loop:
for pattern_tree in patterns_trees:
    print(pattern_tree[0](pattern_tree[1]))

Not sure I understand what you're trying to do, but could you make your list a list of lambda functions like so:
patterns_trees = [
    lambda response: response.css("#Header").xpath("//a/img/@src"),
    ...
]
And then, in your loop:
for pattern_tree in patterns_trees:
    intermediate_response = scrapy.Request(...) # without meta kwarg
    pattern_response = pattern_tree(intermediate_response)
Or does leaving out the meta argument have an impact on the response object?
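A minimal sketch of how the list of lambdas could be combined with a callback: keep the patterns as callables and hand over only the index of the one that matched. FakeResponse and FakeSelection are hypothetical stand-ins so the snippet runs without Scrapy, and PATTERNS and first_matching_pattern are illustrative names, not part of the original code.

```python
class FakeSelection:
    """Hypothetical stand-in for a selector result; only extract_first() is mimicked."""

    def __init__(self, values):
        self._values = values

    def extract_first(self):
        return self._values[0] if self._values else None


class FakeResponse:
    """Hypothetical stand-in for a response; maps an XPath string to canned matches."""

    def __init__(self, matches):
        self._matches = matches

    def xpath(self, query):
        return FakeSelection(self._matches.get(query, []))


# Each entry is a function of the response, so nothing is evaluated until it is called.
PATTERNS = [
    lambda response: response.xpath("//header//a/img/@src"),
    lambda response: response.xpath("//a[@href='/']/img/@src"),
]


def first_matching_pattern(response):
    """Return the index of the first pattern that yields a value, or None."""
    for index, pattern in enumerate(PATTERNS):
        if pattern(response).extract_first() is not None:
            return index
    return None


response = FakeResponse({"//a[@href='/']/img/@src": ["/logo.png"]})
pattern_index = first_matching_pattern(response)
print(pattern_index, PATTERNS[pattern_index](response).extract_first())
```

With a real Scrapy response, the index (a plain integer) could presumably be passed as meta={"pattern_index": pattern_index} and read back from response.meta in the callback, which then re-applies PATTERNS[pattern_index] to its own response.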

Related

How to write multiple conditions in one line

Is there a way to add multiple conditions in a single line? This is not working at the moment, but it would work if the operations were split into separate statements.
Just to mention, I tried and/or and it still did not work.
user_input = remove_punct(user_input), remove_spaces(user_input), user_input.lower
return user_input
Just nest all the operations! Your approach doesn't work as a tuple is being created on the right hand side and the value of user_input doesn't get updated.
Try this
user_input = remove_punct(remove_spaces(user_input.lower()))
Edit:
As pointed out by @S3DEV, the above solution assumes that the functions remove_punct and remove_spaces return the updated value of the input after performing the operation.
You could create a list of the functions you want to use, and then iterate over them, storing the result back into the input string. Alternatively, if your functions just use string methods, you can keep chaining them. Lastly, if you want to chain them like string methods but need your own operations, you could write your own class.
def remove_space(some_string: str):
    return some_string.strip()
def remove_punct(some_string: str):
    return some_string.replace("!", "")
def clean(some_string):
    functions = remove_space, remove_punct, str.lower
    for fun in functions:
        some_string = fun(some_string)
    return some_string
def clean2(some_string: str):
    return some_string.strip().replace("!", "").lower()
print(clean(" hello world! "))
print(clean2(" hello world! "))

How to extract substrings from a masked Python string?

I'm writing an HTTP Request Handler with intuitive routing. My goal is to be able to apply a decorator to a function which states the HTTP method being used as well as the path to be listened on for executing the decorated function. Here's a sample of this implementation:
@route_handler("GET", "/personnel")
def retrievePersonnel():
    return personnelDB.retrieveAll()
However, I also want to be able to add variables to the path. For example, /personnel/3 would fetch a personnel with an ID of 3. The way I want to go about doing this is providing a sort of 'variable mask' to the path passed into the route_handler. A new example would be:
@route_handler("GET", "/personnel/{ID}")
def retrievePersonnelByID(ID):
    return personnelDB.retrieveByID(ID)
The decorator's purpose would be to compare the path literal (/personnel/3 for example) with the path 'mask' (/personnel/{ID}) and pass the 3 into the decorated function. I'm assuming the solution would be to compare the two strings, keep the differences, and place the difference in the literal into a variable named after the difference in the mask (minus the curly braces). But then I'd also have to check to see if the literal matches the mask minus the {} variable catchers...
tl;dr - is there a way to do
stringMask("/personnel/{ID}", "/personnel/5") -> True, {"ID": 5}
stringMask("/personnel/{ID}", "/flowers/5") -> False, {}
stringMask("/personnel/{ID}", "/personnel") -> False, {}
Since I'm guessing there isn't really an easy solution to this, I'm gonna post the solution I did. I was hoping there would be something I could do in a few lines, but oh well ¯_(ツ)_/¯
def checkPath(self, mask):
    mask_parts = mask[1:].split("/")
    path_parts = self.path[1:].rstrip("/").split("/")
    if len(mask_parts) != len(path_parts):
        self.url_vars = {}
        return False
    url_vars = {}
    for mask_part, path_part in zip(mask_parts, path_parts):
        if mask_part.startswith("{"):
            # "{ID}" in the mask captures the matching path segment under the key "ID"
            url_vars[mask_part[1:-1]] = path_part
        elif mask_part != path_part:
            self.url_vars = {}
            return False
    self.url_vars = url_vars # save extracted variables
    return True
A mask is just a string like one of the ones below:
/resource
/resource/{ID}
/group/{name}/resource/{ID}
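For comparison, here is a short regex-based sketch of the stringMask function from the question. It is only one possible approach, not the method used above: the name string_mask and the choice to let a placeholder match any run of characters other than "/" are assumptions. Captured values come back as strings, so "5" would still need converting to 5 if an integer is wanted.

```python
import re


def string_mask(mask, path):
    """Match path against mask; return (matched, variables)."""
    # Splitting on {name} with a capturing group interleaves literal text
    # (even indexes) with placeholder names (odd indexes).
    pieces = re.split(r"\{(\w+)\}", mask)
    pattern = ""
    for i, piece in enumerate(pieces):
        if i % 2:
            # Placeholder: capture one path segment under its name.
            pattern += "(?P<%s>[^/]+)" % piece
        else:
            # Literal text: escape it so it is matched character for character.
            pattern += re.escape(piece)
    match = re.fullmatch(pattern, path)
    if match is None:
        return False, {}
    return True, match.groupdict()


print(string_mask("/personnel/{ID}", "/personnel/5"))
print(string_mask("/personnel/{ID}", "/flowers/5"))
print(string_mask("/group/{name}/resource/{ID}", "/group/admins/resource/7"))
```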

Is there a "Pythonic" way of creating a list with conditional items?

I've got this block of code in a real Django function. If certain conditions are met, items are added to the list.
ret = []
if self.taken():
    ret.append('taken')
if self.suggested():
    ret.append('suggested')
#.... many more conditions and appends...
return ret
It's very functional. You know what it does, and that's great...
But I've learned to appreciate the beauty of list and dict comprehensions.
Is there a more Pythonic way of phrasing this construct, perhaps that initialises and populates the array in one blow?
Create a mapping dictionary:
self.map_dict = {'taken': self.taken,
                 'suggested': self.suggested,
                 'foo': self.bar}
[x for x in ['taken', 'suggested', 'foo'] if self.map_dict.get(x, lambda: False)()]
Related: Most efficient way of making an if-elif-elif-else statement when the else is done the most?
Not a big improvement, but I'll mention it:
def populate():
    if self.taken():
        yield 'taken'
    if self.suggested():
        yield 'suggested'
ret = list(populate())
Can we do better? I'm skeptical. Clearly there's a need of using another syntax than a list literal, because we no longer have the "1 expression = 1 element in result" invariant.
Edit:
There's a pattern to our data, and it's a list of (condition, value) pairs. We might try to exploit it using:
[value
 for condition, value
 in [(self.taken(), 'taken'),
     (self.suggested(), 'suggested')]
 if condition]
but this still is a restriction for how you describe your logic, still has the nasty side effect of evaluating all values no matter the condition (unless you throw in a ton of lambdas), and I can't really see it as an improvement over what we've started with.
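When the values are plain constants, as in the question, only the conditions need deferring, and bound methods can be stored directly without lambdas. A rough sketch, assuming a hypothetical Slot class in place of the real Django object (the class, its constructor, and the labels method are made up for illustration):

```python
class Slot:
    """Hypothetical stand-in for the Django object in the question."""

    def __init__(self, taken, suggested):
        self._taken = taken
        self._suggested = suggested

    def taken(self):
        return self._taken

    def suggested(self):
        return self._suggested

    def labels(self):
        # Store the bound methods themselves, not their results, so each
        # check is called only when the comprehension reaches it.
        checks = [
            (self.taken, 'taken'),
            (self.suggested, 'suggested'),
        ]
        return [label for check, label in checks if check()]


print(Slot(taken=True, suggested=False).labels())
```

If a value were expensive to build, it would still need its own callable, which is the "ton of lambdas" trade-off mentioned above.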
For this very specific example, I could do:
return [x for x in ['taken', 'suggested', ...] if getattr(self, x)()]
But again, this only works where the item and method it calls to check have the same name, ie for my exact code. It could be adapted but it's a bit crusty. I'm very open to other solutions!
I don't know why we are appending strings that match the function names, but if this is a general pattern, we can use that. Functions have a __name__ attribute and I think it always contains what you want in the list.
So how about:
return [fn.__name__ for fn in (self.taken, self.suggested, foo, bar, baz) if fn()]
If I understand the problem correctly, this works just as well for non-member functions as for member functions.
EDIT:
Okay, let's add a mapping dictionary. And split out the function names into a tuple or list.
fns_to_check = (self.taken, self.suggested, foo, bar, baz)
# This holds only the exceptions; if a function isn't in here,
# we will use the .__name__ attribute.
fn_name_map = {foo: 'alternate', bar: 'other'}
def fn_name(fn):
    """Return name from exceptions map, or .__name__ if not in map"""
    return fn_name_map.get(fn, fn.__name__)
return [fn_name(fn) for fn in fns_to_check if fn()]
You could also just use @hcwhsa's mapping dictionary answer. The main difference here is I'm suggesting just mapping the exceptions.
In another instance (where a value will be defined but might be None - a Django model's fields in my case), I've found that just adding them and filtering works:
return list(filter(None, [self.user, self.partner]))
If either of those is None, it will be removed from the list. It's a little more work than just checking, but still a fairly easy way of cleaning the output without writing a book. (In Python 3, filter returns an iterator, hence the list() call.)
One option is to have a "sentinel"-style object to take the place of list entries that fail the corresponding condition. Then a function can be defined to filter out the missing items:
# sentinel indicating a list element that should be skipped
Skip = object()
def drop_missing(itr):
    """returns an iterator yielding all but Skip objects from the given itr"""
    return filter(lambda v: v is not Skip, itr)
With this simple machinery, we come reasonably close to list-comprehension style syntax:
return drop_missing([
    'taken' if self.taken() else Skip,
    'suggested' if self.suggested() else Skip,
    100 if self.full else Skip,
    # many other values and conditions
])
ret = [
    *('taken' for _i in range(1) if self.taken()),
    *('suggested' for _i in range(1) if self.suggested()),
]
The idea is to use a generator expression that yields either the single item 'taken', if self.taken() is true, or nothing, if self.taken() is false, and then unpack it into the list.

how to make my own mapping type in python

I have created a class MyClass that contains a lot of simulation data. The class groups simulation results for different simulations that have a similar structure. The results can be retrieved with a MyClass.get(foo) method. It returns a dictionary with simulationID/array pairs, array being the value of foo for each simulation.
Now I want to implement a method in my class to apply any function to all the arrays for foo. It should return a dictionary with simulationID/function(foo) pairs.
For a function that does not need additional arguments, I found the following solution very satisfying (comments always welcome :-) ):
def apply(self, function, variable):
    result = {}
    for k, v in self.get(variable).items():
        result[k] = function(v)
    return result
However, for a function requiring additional arguments I don't see how to do it in an elegant way. A typical operation would be the integration of foo with bar as x-values like np.trapz(foo, x=bar), where both foo and bar can be retrieved with MyClass.get(...)
I was thinking in this direction:
def apply(self, function_call):
    """
    function_call should be a string with the complete expression to evaluate
    eg: MyClass.apply('np.trapz(QHeat, time)')
    """
    result = {}
    for SID in self.simulations:
        result[SID] = eval(function_call, locals=...)
    return result
The problem is that I don't know how to pass the locals mapping object. Or maybe I'm looking in the wrong direction. Thanks in advance for your help.
Roel
You have two ways. The first is to use functools.partial:
foo = self.get('foo')
bar = self.get('bar')
callable = functools.partial(func, foo, x=bar)
self.apply(callable, variable)
The second approach uses the same technique as partial: define a function that accepts an arbitrary argument list:
def apply(self, function, variable, *args, **kwds):
    result = {}
    for k, v in self.get(variable).items():
        result[k] = function(v, *args, **kwds)
    return result
Note that in both cases the function signature remains unchanged. I don't know which one I'd choose, maybe the first, but I don't know the context you are working in.
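To make the two options concrete, here is a small runnable sketch. The Simulations class, its data layout, and scaled_sum are made up for illustration and only mimic the get()/apply() shape described in the question; the extra argument here is the same for every simulation.

```python
import functools


class Simulations:
    """Hypothetical stand-in for MyClass: variable name -> {simulation ID: values}."""

    def __init__(self, data):
        self._data = data

    def get(self, variable):
        return self._data[variable]

    def apply(self, function, variable, *args, **kwds):
        # Extra positional and keyword arguments are forwarded unchanged.
        return {sid: function(values, *args, **kwds)
                for sid, values in self.get(variable).items()}


def scaled_sum(values, factor=1):
    return factor * sum(values)


sims = Simulations({"QHeat": {"sim1": [1, 2, 3], "sim2": [4, 5]}})

# Option 1: bind the extra argument up front with functools.partial.
via_partial = sims.apply(functools.partial(scaled_sum, factor=2), "QHeat")

# Option 2: let apply() forward the extra argument.
via_kwargs = sims.apply(scaled_sum, "QHeat", factor=2)

print(via_partial)
print(via_kwargs)
```

Both calls give the same dictionary. Neither covers the case where the extra argument itself differs per simulation.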
I tried to recreate (the relevant part of) the class structure the way I am guessing it is set up on your side (it's always handy if you can provide a simplified code example for people to play/test).
What I think you are trying to do is translate variable names to variables that are obtained from within the class and then use those variables in a function that was passed in as well. In addition to that since each variable is actually a dictionary of values with a key (SID), you want the result to be a dictionary of results with the function applied to each of the arguments.
class test:
    def get(self, name):
        if name == "valA":
            return {"1": "valA1", "2": "valA2", "3": "valA3"}
        elif name == "valB":
            return {"1": "valB1", "2": "valB2", "3": "valB3"}
    def apply(self, function, **kwargs):
        # map each function argument name to the per-simulation data it refers to
        arg_dict = {fun_arg: self.get(sim_args) for fun_arg, sim_args in kwargs.items()}
        result = {}
        # take the SIDs from the first argument's data
        for SID in arg_dict[next(iter(arg_dict))]:
            fun_kwargs = {fun_arg: sim_dict[SID] for fun_arg, sim_dict in arg_dict.items()}
            result[SID] = function(**fun_kwargs)
        return result
def joinstrings(string_a, string_b):
    return string_a + string_b
my_test = test()
result = my_test.apply(joinstrings, string_a="valA", string_b="valB")
print(result)
So the apply method gets an argument dictionary, gets the class specific data for each of the arguments and creates a new argument dictionary with those (arg_dict).
The SID keys are obtained from this arg_dict and for each of those, a function result is calculated and added to the result dictionary.
The result is:
{'1': 'valA1valB1', '2': 'valA2valB2', '3': 'valA3valB3'}
The code can be altered in many ways, but I thought this would be the most readable. It is of course possible to join the dictionaries instead of using the SID's from the first element etc.

Python Newbie: Returning Multiple Int/String Results in Python

I have a function that has several outputs, all of which "native", i.e. integers and strings. For example, let's say I have a function that analyzes a string, and finds both the number of words and the average length of a word.
In C/C++ I would use & to pass 2 parameters to the function by reference. In Python I'm not sure what's the right solution, because integers and strings are not passed by reference but by value (at least this is what I understand from trial-and-error), so the following code won't work:
def analyze(string, number_of_words, average_length):
    ... do some analysis ...
    number_of_words = ...
    average_length = ...
If I do the above, the values outside the scope of the function don't change. What I currently do is use a dictionary like so:
def analyze(string, result):
    ... do some analysis ...
    result['number_of_words'] = ...
    result['average_length'] = ...
And I use the function like this:
s = "hello goodbye"
result = {}
analyze(s, result)
However, that does not feel right. What's the correct Pythonic way to achieve this? Please note I'm referring only to cases where the function returns 2-3 results, not tens of results. Also, I'm a complete newbie to Python, so I know I may be missing something trivial here...
Thanks
Python has a return statement, which allows you to do the following:
def func(input):
    # do calculation on input
    return result
s = "hello goodbye"
res = func(s) # res now a result dictionary
But you don't need to have a result dictionary at all; you can return a few values like so:
def func(input):
    # do work
    return length, something_else # one might be an integer another string, etc.
s = "hello goodbye"
length, something = func(s)
If you return the variables in your function like this:
def analyze(s, num_words, avg_length):
    # do something
    return s, num_words, avg_length
Then you can call it like this to update the parameters that were passed:
s, num_words, avg_length = analyze(s, num_words, avg_length)
But, for your example function, this would be better:
def analyze(s):
    # do something
    return num_words, avg_length
In Python you don't modify parameters in the C/C++ way (passing them by reference or through a pointer and modifying them in place). One reason is that string objects are immutable in Python. The right thing to do is to return the modified parameters in a tuple (as SilentGhost suggested) and rebind the variables to the new values.
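A minimal runnable version of the tuple-returning approach, using the word-count example from the question. Splitting on whitespace is an assumption about what counts as a word.

```python
def analyze(text):
    """Return (number_of_words, average_length) for the given text."""
    words = text.split()  # assumption: words are separated by whitespace
    number_of_words = len(words)
    # Guard against empty input to avoid dividing by zero.
    average_length = sum(len(w) for w in words) / number_of_words if words else 0.0
    return number_of_words, average_length


# The returned tuple is unpacked into two names at the call site.
number_of_words, average_length = analyze("hello goodbye")
print(number_of_words, average_length)
```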
If you need to use method arguments in both directions, you can encapsulate the arguments in a class, pass an instance to the method, and let the method read and update its attributes.
