I'm learning Python. Could anyone help me break down what this function does, step by step?
def label(self, index, *args):
    """
    Label each axes one at a time
    args are of the form <label 1>,...,<label n>
    APIPARAM: chxl
    """
    self.data['labels'].append(
        str('%s:|%s'%(index, '|'.join(map(str,args)) )).replace('None','')
    )
    return self.parent
It's a good idea to change the formatting before trying to understand what it does:
def label(self, index, *args):
    """
    Label each axes one at a time
    args are of the form <label 1>,...,<label n>
    APIPARAM: chxl
    """
    self.data['labels'].append(
        str( '%s:|%s' % \
            ( index, '|'.join( map( str,args ) ) )
        ).replace( 'None', '' )
    )
    return self.parent
So:
it appends something to the self.data['labels'] list. We know this because append() is a method of list objects.
This something is a string such that:
the string is of the form xxx:|yyy
xxx is replaced with the value of the argument index
yyy is replaced with all the other arguments converted to strings (map(str, args)) and joined with the | character (join(...)), resulting in something like 'a|b|None|c'
every occurrence of None in the string above is replaced with an empty string, and the result is appended to the list
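As a quick check of the steps above, here is a minimal sketch that pulls the expression out into a standalone function. The name build_label is hypothetical (the original is a method that appends to self.data['labels']); only the string-building logic is reproduced, one step per line.

```python
def build_label(index, *args):
    """Standalone copy of the expression from label(); the name is hypothetical."""
    joined = '|'.join(map(str, args))        # e.g. 'a|b|None|c'
    formatted = '%s:|%s' % (index, joined)   # e.g. '0:|a|b|None|c'
    return formatted.replace('None', '')     # e.g. '0:|a|b||c'


print(build_label(0, 'a', 'b', None, 'c'))  # 0:|a|b||c
```

Note the empty slot between the two | characters where None used to be.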
EDIT:
As @abarnert pointed out, it might be good to explain what *args means and why it is later used as args, so here it goes.
*args (an asterisk plus an arbitrary name) means "any number of positional arguments, available inside the function as the tuple args". One can also use **kwargs (note the two asterisks), which accepts keyword arguments, i.e. ones passed to the function in the form foo = bar, where foo is the name of the argument and bar is its value, rather than just bar.
As said above, the names args and kwargs are arbitrary; one could just as well use *potatoes or **potatoes, but args and kwargs are the convention in Python (sometimes people also use **kw instead of **kwargs, but the meaning is the same: any number of positional and any number of keyword arguments respectively).
Both are used when the number of arguments the function/method should accept is not known beforehand. Consider, for example, a function which processes the names of party guests: one may not know how many there will be, so defining the following function makes sense:
def add_party_guests( *guests ):
    for guest in guests:
        do_some_processing( guest )
Then both calls below are valid:
add_party_guests( 'John' )
add_party_guests( 'Beth', 'Tim', 'Fred' )
This is also explained in this SO post: https://stackoverflow.com/a/287101/680238
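To make the explanation of *args and **kwargs concrete, here is a small sketch. The function name describe and the sample values are made up for illustration; it simply returns what it received so the two containers can be inspected.

```python
def describe(*args, **kwargs):
    """Return what the function received; the name is illustrative only."""
    return args, kwargs


positional, keyword = describe('Beth', 'Tim', host='Fred')
print(positional)  # ('Beth', 'Tim')
print(keyword)     # {'host': 'Fred'}

# The same asterisks unpack a sequence or a dict at the call site.
guests = ['John', 'Beth']
options = {'host': 'Fred'}
print(describe(*guests, **options))  # (('John', 'Beth'), {'host': 'Fred'})
```

Inside the function, args is a tuple and kwargs is a dict, whatever names are chosen for them.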
I assume the confusing lines are:
self.data['labels'].append(
    str('%s:|%s'%(index, '|'.join(map(str,args)) )).replace('None','')
)
Those can be formatted more clearly to aid reading:
self.data['labels'].append(
    str('%s:|%s' % (
        index,
        '|'.join(map(str, args))
    )).replace('None', '')
)
But can be better rewritten as:
self.data['labels'].append(  # append to the list at `self.data['labels']`
    '%s:|%s' % (  # a string of the format X:|Y
        index,  # where X is the index
        '|'.join(  # and Y is a list joined with '|'s
            str(arg) if arg is not None else  # with each item in the list
            '' for arg in args  # being its string representation
        )
    )
)
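The two versions are not quite equivalent, which may be part of why the rewrite is preferable. The sketch below puts both side by side as plain functions; the names label_with_replace and label_with_check are hypothetical, and the sample label is chosen only to show the difference.

```python
def label_with_replace(index, *args):
    # Original approach: stringify everything, then strip the text 'None'.
    return ('%s:|%s' % (index, '|'.join(map(str, args)))).replace('None', '')


def label_with_check(index, *args):
    # Rewritten approach: only a real None value becomes an empty string.
    return '%s:|%s' % (
        index,
        '|'.join('' if arg is None else str(arg) for arg in args),
    )


print(label_with_replace(0, 'Nonetheless', None))  # 0:|theless|
print(label_with_check(0, 'Nonetheless', None))    # 0:|Nonetheless|
```

The replace-based version removes the text None wherever it appears, including inside a label. The two also differ when index itself is None: the original blanks it out, while the rewrite leaves the text 'None' in place.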
*args collects any extra positional arguments into a tuple called args. self.data['labels'] looks to be a list, and .append adds an item to it. The item appended is the string returned by the outermost call, replace. To work out what that string is, start inside the parentheses and work your way out. map(str, args) converts all the args to strings. '|'.join(...) takes the output of map and joins it into a single string of the general pattern elem1|elem2|elem3... The format string '%s:|%s' is then applied: the first %s is replaced by the value of index, the second by the string produced by '|'.join. replace is then called on this string, replacing all occurrences of 'None' with ''. Finally, the method returns self.parent.
QUESTIONS
This is a long post, so I will highlight my main two questions now before giving details:
How can one succinctly allow for optional matched parentheses/brackets around an expression?
How does one properly parse the content of nested_expr? This answer suggests that this function is not quite appropriate for this, and infix_notation is better, but that doesn't seem to fit my use case (I don't think).
DETAILS
I am working on a grammar to parse prolog strings. The data I have involves a lot of optional brackets or parentheses.
For example, both predicate([arg1, arg2, arg3]) and predicate(arg1, arg2, arg3) are legal and appear in the data.
My full grammar is a little complicated, and likely could be cleaned up, but I will paste it here for reproducibility. I have a couple versions of the grammar as I found new data that I had to account for. The first one works with the following example string:
pred(Var, arg_name1:arg#arg_type, arg_name2:(sub_arg1, sub_arg2))
For some visual clarity, I am turning the parsed strings into graphs, so this is what this one should look like:
Note that arg_name2:(sub_arg1, sub_arg2) is slightly idiosyncratic syntax: the things inside the parens are supposed to be thought of as having an AND operator between them. The only thing indicating this is that the wrapped expression appears "naked" (i.e. it has no predicate name of its own; it's just some values lumped together with parens).
VERSION 1: works on the above string
# GRAMMAR VER 1
predication = pp.Forward()
join_predication = pp.Forward()
entity = pp.Forward()
args_list = pp.Forward()
# atoms are used either as predicate names or bottom level argument values
# str_atoms are just quoted strings which may also appear as arguments
atom = pp.Word(pp.alphanums + '_' + '.')
str_atom = pp.QuotedString("'")
# TYPICAL ARGUMENT: arg_name:ARG_VALUE, where the ARG_VALUE may be an entity, join_predication, predication, or just an atom.
# Note that the arg_name is optional and may not always appear
# EXAMPLES:
# with name: pred(arg1:val1, arg2:val2)
# without name: pred(val1, val2)
argument = pp.Group(pp.Opt(atom("arg_name") + pp.Suppress(":")) + (entity | join_predication | predication | atom("arg_value") | str_atom("arg_value")))
# List of arguments
args_list = pp.Opt(pp.Suppress("[")) + pp.delimitedList(argument) + pp.Opt(pp.Suppress("]"))
# As in the example string above, sometimes predications are grouped together in parentheses and are meant to be understood as having an AND operator between them when evaluating the truth of both together
# EXAMPLE: pred(arg1:(sub_pred1, subpred2))
# I am just treating it as an args_list inside obligatory parentheses
join_predication <<= pp.Group(pp.Suppress("(") + args_list("args_list") + pp.Suppress(")"))("join_predication")
# pred_name with optional arguments (though I've never seen one without arguments, just in case)
predication <<= pp.Group(atom("pred_name") + pp.Suppress("(") + pp.Opt(args_list)("args_list") + pp.Suppress(")"))("predication")
# ent_name with optional arguments and a #type
entity <<= (pp.Group(((atom("ent_name")
+ pp.Suppress("(") + pp.Opt(args_list)("args_list") + pp.Suppress(")"))
| str_atom("ent_name") | atom("ent_name"))
+ pp.Suppress("#") + atom("type"))("entity"))
# starter symbol
lf_fragment = entity | join_predication | predication
Although this works, I came across another very similar string which used brackets instead of parentheses for a join_predication:
pred(Var, arg_name1:arg#arg_type, arg_name2:[sub_arg1, sub_arg2])
This broke my parser, seemingly because brackets are used in other places and are often optional: since I do nothing to enforce that an opening bracket goes with its closing one, a bracket can be matched by the wrong parser element. I thought to turn to nested_expr for this, but that caused further problems: as mentioned in this answer, parsing the elements inside a nested_expr doesn't work very well, and I have lost a lot of the substructure I need for the graphs I'm building.
VERSION 2: using nested_expr
# only including those expressions that have been changed
# args_list might not have brackets
args_list = pp.nested_expr("[", "]", pp.delimitedList(argument)) | pp.delimitedList(argument)
# join_predication is an args_list with obligatory wrapping parens/brackets
join_predication <<= pp.nested_expr("(", ")", args_list("args_list"))("join_predication") | pp.nested_expr("[", "]", args_list("args_list"))("join_predication")
I likely need to ensure matching for predication and entity, but haven't for now.
Using the above grammar, I can parse both example strings, but I lose the named structure that I had before.
In the original grammar, parse_results['predication']['args_list'] was a list of every argument, exactly as I expected. In the new grammar, it only contains the first argument, Var, in the example strings.
I have a list of patterns:
patterns_trees = [response.css("#Header").xpath("//a/img/@src"),
                  response.css("#HEADER").xpath("//a/img/@src"),
                  response.xpath("//header//a/img/@src"),
                  response.xpath("//a[@href='"+response.url+'/'+"']/img/@src"),
                  response.xpath("//a[@href='/']/img/@src")
                  ]
After I traverse it and find the right pattern, I have to send the pattern as an argument to a callback function:
for pattern_tree in patterns_trees:
    ...
    pattern_response = scrapy.Request(...,..., meta={"pattern_tree": pattern_tree.extract_first()})
By doing this I get the value extracted by the pattern, not the pattern itself.
THINGS I TRIED:
I tried isolating the patterns in a separate class but still I have the problem that I can not store them as pattern but as values.
I tried to save them as strings and maybe I can make it work but
What is the most efficient way of storing a list of functions?
UPDATE: Possible solution but too hardcoded and it's too problematic when I want to add more patterns:
def patter_0(response):
    return response.css("#Header").xpath("//a/img/@src")
def patter_1(response):
    return response.css("#HEADER").xpath("//a/img/@src")
.....
class patternTrees:
    patterns = [patter_0,...,patter_n]
    def length_patterns(self):
        return len(self.patterns)
If you're willing to consider reformatting your list of operations, then this is a somewhat neat solution. I've changed the list of operations to a list of tuples. Each tuple contains (a ref to) the appropriate function, and another tuple consisting of arguments.
It's fairly easy to add new operations to the list: just specify what function to use, and the appropriate arguments.
If you want to use the result from one operation as an argument in the next, you will have to return the value from execute() and process it in the for loop.
I've replaced the calls on response with strings that are printed, so that you can test it easily.
def response_css_ARG_xpath_ARG(args):
    return "response.css(\"%s\").xpath(\"%s\")" % (args[0],args[1])
    #return response.css(args[0]).xpath(args[1])
def response_xpath_ARG(arg):
    return "response.xpath(\"%s\")" % (arg)
    #return response.xpath(arg)
def execute(function, args):
    response = function(args)
    # do whatever with response
    return response
response_url = "https://whatever.com"
patterns_trees = [(response_css_ARG_xpath_ARG, ("#Header", "//a/img/@src")),
                  (response_css_ARG_xpath_ARG, ("#HEADER", "//a/img/@src")),
                  (response_xpath_ARG, ("//header//a/img/@src")),
                  (response_xpath_ARG, ("//a[@href='"+response_url+"/"+"']/img/@src")),
                  (response_xpath_ARG, ("//a[@href='/']/img/@src"))]
for pattern_tree in patterns_trees:
    print(execute(pattern_tree[0], pattern_tree[1]))
Note that execute() can be omitted, depending on whether you need to process the result. Without it, you can call the function directly from the loop:
for pattern_tree in patterns_trees:
    print(pattern_tree[0](pattern_tree[1]))
Not sure I understand what you're trying to do, but could you make your list a list of lambda functions like so:
patterns_trees = [
    lambda response : response.css("#Header").xpath("//a/img/@src"),
    ...
]
And then, in your loop:
for pattern_tree in patterns_trees:
    intermediate_response = scrapy.Request(...) # without meta kwarg
    pattern_response = pattern_tree(intermediate_response)
Or does leaving out meta have an impact on the response object?
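The idea of storing functions rather than their results can be tried without scrapy at all. In the sketch below, FakeResponse is a hypothetical stand-in that only records which css/xpath calls were made on it; it is not part of scrapy, and a real response object would return selectors instead.

```python
class FakeResponse:
    """Hypothetical stand-in for a scrapy response; it only records the calls made on it."""

    def __init__(self, trail=()):
        self.trail = tuple(trail)

    def css(self, query):
        return FakeResponse(self.trail + (('css', query),))

    def xpath(self, query):
        return FakeResponse(self.trail + (('xpath', query),))


# Each entry is a function awaiting a response, not the result of a query.
patterns_trees = [
    lambda response: response.css("#Header").xpath("//a/img/@src"),
    lambda response: response.xpath("//header//a/img/@src"),
]

response = FakeResponse()
for pattern_tree in patterns_trees:
    # Nothing is evaluated until the stored function is called here.
    print(pattern_tree(response).trail)
```

Because each list entry is a callable, the same pattern can be applied later to any response, and adding a pattern means adding one line to the list.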
In Python we can do this:
def myFun1(one = '1', two = '2'):
    ...
Then we can call the function and pass the arguments by their name:
myFun1(two = 'two', one = 'one')
Also, we can do this:
def myFun2(**kwargs):
    print(kwargs.get('one', 'nothing here'))
myFun2(one='one')
So I was wondering if it is possible to combine both methods like:
def myFun3(name, lname, **other_info):
    ...
myFun3(lname='Someone', name='myName', city='cityName', otherInfo='blah')
In general what combinations can we do?
Thanks and sorry for my silly question.
The general idea is:
def func(arg1, arg2, ..., kwarg1=default, kwarg2=default, ..., *args, **kwargs):
    ...
You can use as many of those as you want. The * and ** will 'soak up' any remaining values not otherwise accounted for.
Parameters without defaults can still be passed by keyword, but in a call positional arguments can't follow keyword arguments, and in a definition non-default parameters can't follow default ones.
Note Python 3 also adds the ability to specify keyword-only arguments by having them after *:
def func(arg1, arg2, *args, kwonlyarg=default):
    ...
You can also use * alone (def func(a1, a2, *, kw=d):), which means that no extra positional arguments are captured, but anything after it is keyword-only.
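A short Python 3 sketch of the bare * form follows. The function connect and its parameters are invented for the example; the point is only that timeout cannot be supplied positionally.

```python
def connect(host, port, *, timeout=10):
    """Hypothetical function: timeout sits after the bare * and is keyword-only."""
    return host, port, timeout


print(connect('example.org', 80, timeout=5))  # ('example.org', 80, 5)

try:
    connect('example.org', 80, 5)  # a third positional argument is rejected
except TypeError as exc:
    print('TypeError:', exc)
```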
So, if you are in 3.x, you could produce the behaviour you want with:
def myFun3(*, name, lname, **other_info):
    ...
Which would allow calling with name and lname as keyword-only.
Note this is an unusual interface, which may be annoying to the user - I would only use it in very specific use cases.
In 2.x, you would need to emulate this manually by parsing **kwargs.
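One way the manual approach might look is sketched below. This is an assumption about what "parsing **kwargs" means in practice, not the only way to do it: the required names are popped out of the dict and a TypeError is raised if one is missing. The name my_fun3 is a renamed stand-in for myFun3.

```python
def my_fun3(**kwargs):
    """Sketch of keyword-only 'name' and 'lname' without the Python 3 syntax."""
    try:
        name = kwargs.pop('name')
        lname = kwargs.pop('lname')
    except KeyError as exc:
        raise TypeError('missing required keyword argument: %s' % exc)
    other_info = kwargs  # whatever is left over
    return name, lname, other_info


print(my_fun3(lname='Someone', name='myName', city='cityName'))
# ('myName', 'Someone', {'city': 'cityName'})
```

Since the signature accepts only **kwargs, any positional argument is rejected by Python itself with a TypeError.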
You can declare named arguments alongside **kwargs. If a keyword in the call matches a named parameter, the value goes to that parameter; otherwise it is collected in the kwargs dictionary.
def add(a=1, b=2,**c):
    res = a+b
    for items in c:
        res = res + c[items]
    print(res)
add(2,3)
5
add(b=4, a =3)
7
add(a =1,b=2,c=3,d=4)
10
It's possible at least for Python 2.7. Keyword arguments get assigned to positional parameters by name, so you can do
In [34]: def func(name, lname, **kwargs):
   ....:     print 'name='+name, 'lname='+lname
   ....:     print kwargs
   ....:
In [35]: func(lname='lname_val', name='name_val', city='cityName', otherInfo='blah')
name=name_val lname=lname_val
{'city': 'cityName', 'otherInfo': 'blah'}
The official docs put it this way:
"If keyword arguments are present, they are first converted to positional arguments, as follows. First, a list of unfilled slots is created for the formal parameters. If there are N positional arguments, they are placed in the first N slots. Next, for each keyword argument, the identifier is used to determine the corresponding slot (if the identifier is the same as the first formal parameter name, the first slot is used, and so on). If the slot is already filled, a TypeError exception is raised. Otherwise, the value of the argument is placed in the slot, filling it (even if the expression is None, it fills the slot)."
https://docs.python.org/2/reference/expressions.html#calls
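The slot-filling rule quoted above can be observed directly. The sketch below reuses the func signature from this answer but returns the values instead of printing them (so it runs the same way under Python 2 and 3); the argument values are arbitrary.

```python
def func(name, lname, **kwargs):
    return name, lname, kwargs


# Keywords fill the matching named slots; the rest lands in kwargs.
print(func(lname='lname_val', name='name_val', city='cityName'))
# ('name_val', 'lname_val', {'city': 'cityName'})

# 'first' fills the name slot positionally, so name= finds it already filled.
try:
    func('first', name='second', lname='x')
except TypeError as exc:
    print('TypeError:', exc)
```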
In my exercise, I have to use **kwargs to print the arguments passed to my function in alphabetical order.
Here is what I have for now:
def afficher(**kwargs):
    if kwargs is not None:
        for i in kwargs:
            print (i)
afficher(helpme=7,plz=10)
prints:
plz
helpme
My concerns are:
I'd like them printed in alphabetical order
I'd like them printed as:
helpme = 7
plz = 10
Thanks in advance !
Plain dictionaries are not sorted, as their keys are hashed and end up stored accordingly.
In your case you can scan the sorted keys of the kwargs dictionary
for i in sorted(kwargs):
So the code becomes
def afficher(**kwargs):
    for i in sorted(kwargs):
        print ('{}={}'.format(i,kwargs[i]))
afficher(helpme=7,plz=10)
which always produces
helpme=7
plz=10
Note: as you can see, I have removed the if, which I presume you inserted to check whether any keyword arguments were passed to the function.
In that case kwargs will be an empty dictionary {}, not None, so there is no need to guard the iteration.
When you have **something as a function argument, inside the function body something will be a dict that maps the keywords to their values.
To iterate over the sorted keys you would do:
def afficher(**kwargs):
    for k,v in sorted(kwargs.items()):
        print('{}={}'.format(k,v))
Note that when no keyword arguments are passed, kwargs={} and kwargs.items()=[]. The for loop will exit immediately, so there is no reason to check it. Also, you wouldn't check it with if kwargs is None, since {} is not None. You would check whether it is empty instead, for example with if not kwargs, because an empty dict is falsy.
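To illustrate the empty-dictionary point, here is a variant that returns the formatted lines instead of printing them, which makes the result easier to inspect. The placeholder text for the empty case is made up for the example.

```python
def afficher(**kwargs):
    """Return the formatted lines instead of printing, to make the result easy to inspect."""
    if not kwargs:  # an empty dict is falsy; kwargs is never None here
        return ['(no keyword arguments)']
    return ['{} = {}'.format(key, value) for key, value in sorted(kwargs.items())]


print(afficher())                  # ['(no keyword arguments)']
print(afficher(plz=10, helpme=7))  # ['helpme = 7', 'plz = 10']
```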