Suppose I have a function that is hard-coded to make a substring lowercase, when instances of that substring are found in a larger string, e.g.:
import re
import sys

def process_record(header, needle, pattern):
    sequence = needle
    for idx in [m.start() for m in re.finditer(pattern, needle)]:
        offset = idx + len(pattern)
        sequence = sequence[:idx] + needle[idx:offset].lower() + sequence[offset:]
    sys.stdout.write('%s\n%s\n' % (header, sequence))
This works fine, e.g.:
>>> process_record('>foo', 'ABCDEF', 'BCD')
>foo
AbcdEF
What I'd like to do is generalize this, to pass in a string function (lower, in this case, but it could be any function of a primitive type or class) as a parameter. Something like:
def process_record(header, needle, pattern, fn):
    sequence = needle
    for idx in [m.start() for m in re.finditer(pattern, needle)]:
        offset = idx + len(pattern)
        sequence = sequence[:idx] + needle[idx:offset].fn() + sequence[offset:]
    sys.stdout.write('%s\n%s\n' % (header, sequence))
This doesn't work (which is why I'm asking the question), but hopefully this demonstrates the idea, to try to generalize what the function does in a way that is readable.
One option I suppose is to write a helper function that wraps stringInstance.lower() and passes copies of strings around, which is inefficient and clumsy. I'm hoping there's a more elegant approach that Python experts know about.
With C, for instance, I'd pass a pointer to the function I want to run as a parameter to process_record(), and run the function pointer directly on the variable of interest.
What is the syntax for doing the same when using string primitive functions (or similar on primitive or other classes) in Python?
In general, use this approach:
def call_fn(arg, fn):
    return fn(arg)

call_fn('FOO', str.lower)  # 'foo'
An instance method in Python takes the instance (conventionally named self) as its first argument. By calling the method as an attribute of the class, you supply that argument explicitly.
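To make that concrete, here is a small sketch (the variable names are only for illustration) showing that calling the method through the class is equivalent to calling it on the instance, and that call_fn accepts any one-argument callable:

```python
# Calling a method through its class passes the instance explicitly as `self`.
text = 'FOO'
via_instance = text.lower()   # the usual bound-method call
via_class = str.lower(text)   # same method, instance passed by hand

def call_fn(arg, fn):
    # fn can be any callable that accepts one argument
    return fn(arg)

results = [call_fn('FOO', str.lower), call_fn('foo', str.upper), call_fn('foo', len)]
print(via_instance, via_class, results)
```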
Your example is a little complex, so I would break this into two different questions:
1) How can you provide functions as arguments?
Functions are objects like everything else, and can be passed around as expected, e.g.:
def apply(val, func):
    # e.g. ("X", str.lower) -> "x"
    #      ("X", lambda x: x * 2) -> "XX"
    return func(val)
In your example, you might do
def process_record(..., func):
    ...
    sequence = ... func(needle[idx:offset]) ...
    ...
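Filled in, that outline might look like the sketch below. Two things here are my own choices rather than part of the question: the function returns the formatted record instead of writing to stdout, and it uses the match's own end offset instead of idx + len(pattern).

```python
import re

def process_record(header, needle, pattern, fn):
    """Apply fn to every match of pattern in needle; return the formatted record."""
    sequence = needle
    for m in re.finditer(pattern, needle):
        start, end = m.start(), m.end()
        # Assumes fn returns a string of the same length as its input;
        # otherwise the offsets of later matches would no longer line up.
        sequence = sequence[:start] + fn(needle[start:end]) + sequence[end:]
    return '%s\n%s\n' % (header, sequence)

print(process_record('>foo', 'ABCDEF', 'BCD', str.lower), end='')
```

Any callable that maps a string to a string of equal length can be passed as fn, for example str.upper or str.swapcase.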
An alternative method that I wouldn't recommend would be something like
def apply_by_name(val, method_name):
    # e.g. ("X", "lower") -> "x"
    return getattr(val, method_name)()
2) How can I apply an effect to each match of a regular expression in a string?
For this I would recommend re.sub, which accepts either a string or a function as the replacement.
>>> re.sub('[aeiou]', '!', 'the quick brown fox')
'th! q!!ck br!wn f!x'
def foo(match):
    v = match.group()
    if v == 'i':
        return '!!!!!!!'
    elif v in 'eo':
        return v * 2
    else:
        return v.upper()
>>> re.sub('[aeiou]', foo, 'the quick brown fox')
'thee qU!!!!!!!ck broown foox'
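Putting the two parts together, a minimal sketch for the original problem could look like this (apply_to_matches is a name I made up for illustration):

```python
import re

def apply_to_matches(pattern, fn, text):
    # re.sub calls the replacement function once per match and passes it a
    # match object; the lambda adapts a plain str -> str function to that.
    return re.sub(pattern, lambda m: fn(m.group()), text)

print(apply_to_matches('BCD', str.lower, 'ABCDEF'))
```

Unlike slicing by offsets, this also copes with functions that change the length of the matched text, because re.sub rebuilds the string itself.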
Hope this helps!
I am currently working on the following problem:
I want to write a function that accepts comparisons as arguments, but I want to prevent these comparisons from being evaluated before they reach the function (which leads to potentially unexpected results).
To be more precise, I have the following challenge (first the original one in PySpark, then a similar, more general one):
def test_func(*comparisons):
    for comparison in comparisons:
        left_hand = comparison[0]
        right_hand = comparison[2]
        comparison = comparison[1]
Example 1:
test_func(F.col('a') == F.col('b'))
left_hand -> F.col('a')
right_hand -> F.col('b')
comparison -> ==
Example 2:
test_func(1 <=> 2)
left_hand -> 1
right_hand -> 2
comparison -> <=>
Right now, the equation/parameter is evaluated before it reaches the function, i.e. I have problems splitting the equation into its individual parts.
Is it even possible to do this?
Python's operator module exposes the operators as functions:
from operator import eq

def test_func(*comparison):
    left_hand = comparison[0]
    right_hand = comparison[2]
    comparison = comparison[1]  # reassigned last, after both operands are read

test_func(F.col('a'), eq, F.col('b'))
The variables would be (remember they would still be local to test_func):
left_hand = F.col('a')
comparison = eq -> operator.eq
right_hand = F.col('b')
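The same idea works without Spark. Here is a small sketch with plain integers; compare_parts is a hypothetical name, and taking three named parameters instead of *comparison is my own simplification:

```python
import operator

def compare_parts(left_hand, comparison, right_hand):
    # Nothing has been evaluated yet: the three parts arrive separately,
    # and the function decides when (or whether) to apply the operator.
    return comparison(left_hand, right_hand)

print(compare_parts(1, operator.le, 2))
print(compare_parts(3, operator.eq, 4))
```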
As a quick proof of concept:
>>> import operator
>>> class Col:
...     def __init__(self, col):
...         self.col = col
...
...     def __eq__(self, other):
...         return self, operator.eq, other
...
>>> Col('a') == Col('b')
(<__main__.Col object at 0x11134d5b0>, <built-in function eq>, <__main__.Col object at 0x11147cbe0>)
>>> lh, comp, rh = Col('a') == Col('b')
>>> comp(lh.col, rh.col)
False
You'll need to overload all special methods for all operators you want to support, and return the equivalent operator function (or whatever you want, perhaps '==', or a custom object).
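As a rough sketch of what that could look like for all six rich comparisons: the Comparison namedtuple is my own invention for illustration, and this Col is a toy class, not the PySpark one.

```python
import operator
from collections import namedtuple

# Hypothetical container for an unevaluated comparison.
Comparison = namedtuple('Comparison', 'left op right')

class Col:
    def __init__(self, col):
        self.col = col

    # Each rich-comparison hook returns a description of the comparison
    # instead of a boolean.
    def __eq__(self, other):
        return Comparison(self, operator.eq, other)

    def __ne__(self, other):
        return Comparison(self, operator.ne, other)

    def __lt__(self, other):
        return Comparison(self, operator.lt, other)

    def __le__(self, other):
        return Comparison(self, operator.le, other)

    def __gt__(self, other):
        return Comparison(self, operator.gt, other)

    def __ge__(self, other):
        return Comparison(self, operator.ge, other)

    # Defining __eq__ disables the default hash; restore identity hashing.
    __hash__ = object.__hash__

cmp = Col(1) < Col(2)
print(cmp.op is operator.lt, cmp.op(cmp.left.col, cmp.right.col))
```

A receiving function can then read cmp.left, cmp.op and cmp.right and apply the operator only when it wants to.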
Is it possible to unpack elements in python and pass them directly into several functions without assigning them into a variable first?
e. g.
def my_function():
    return (1, 2)
# Not sure how the syntax would look like?
(function_1(#first element here), function_2(#second element here)) <= my_function()
You can avoid assigning the result to a variable, for example by calling the function twice, but that only makes sense if the function is pure. I cannot think of a useful example, though, and I am curious why you would want to do this.
There is a way to achieve that goal, but it requires writing your own helper function.
Here is a simple approach.
In my example, a function called dissolve_args_to_fns accepts the functions and a tuple that holds the input for each function.
dissolve_args_to_fns implementation
from typing import Tuple, Any
from collections.abc import Iterable

def dissolve_args_to_fns(*fns, inputs: Tuple[Any, ...]):
    # Raise an error if the number of functions and the number of inputs differ
    if len(fns) != len(inputs):
        raise ValueError("The number of functions doesn't match the number of inputs")
    # Holds the output corresponding to each function
    outputs = []
    for i, fn in enumerate(fns):
        # Individual input for each function
        inp = inputs[i]
        # If the input is a non-string iterable, unpack it as multiple arguments
        # (strings are iterable too, but are passed through as a single value)
        if isinstance(inp, Iterable) and not isinstance(inp, (str, bytes)):
            fn_out = fn(*inp)
        else:
            fn_out = fn(inp)
        outputs.append(fn_out)
    # Return the outputs only if at least one function returned a value
    # This extra checking step is not necessary
    if any(map(lambda x: x is not None, outputs)):
        return outputs
Now that the function is done, we can test it.
Below are 3 custom functions, some of which return a value and others which don't:
def show(value):
    print("Here is", value)

def blink(value, blink_count: int = 2):
    print(f" *blink* {value}" * blink_count)

def full_name(first_name, last_name) -> str:
    return "%s %s" % (first_name, last_name)
I'll also use the built-in sum function to show how widely this implementation can be used:
_, name, _, _sum = dissolve_args_to_fns(show, full_name, blink, sum, inputs=(1, ("Mike", "Tyson"), 2, ([10, 5],)))
print("My name is", name)
print("Sum is:", _sum)
Well that's it. This simple function now works like magic.
Happy coding.
PS: As you can see, this simple implementation doesn't handle keyword arguments, but feel free to hack the code as you please.
The following is to the effect of what you described in your further comments:
list1 = []
list2 = []

def my_function():
    return (1, 2)

def function_1(x1):
    list1.append(x1)

def function_2(x2):
    list2.append(x2)

lam = lambda x: (function_1(x[0]), function_2(x[1]))
lam(my_function())
Verification:
>>> print(list1)
[1]
>>> print(list2)
[2]
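If the functions return values instead of appending to lists, a similar effect can be had with zip. This is only a sketch, and the bodies of function_1 and function_2 are made up for illustration:

```python
def my_function():
    return (1, 2)

def function_1(x):
    return x * 10

def function_2(x):
    return x + 100

# Pair each function with the element at the same position and call it.
results = tuple(fn(arg) for fn, arg in zip((function_1, function_2), my_function()))
print(results)
```

The tuple returned by my_function() is never bound to a name; zip consumes it directly.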
I have a file with a lot of lines like this
f(a, b)
f(abc, def)
f(a, f(u, i))
...
and I was asked to write a program in Python that would translate the strings into the following format:
a+b
abc+def
a+(u+i)
...
Rule: f(a, b) -> a+b
The approach I am following right now uses eval functions:
def f(a, b):
    return "({0} + {1})".format(a,b)
eval("f(f('b','a'),'c')")
which returns
'((b + a) + c)'
However, as you can see, I need to put the letters as strings so that the eval function does not throw me a NameError when I run it.
Is there any way that will allow me to get the same behavior out of the eval function but without declaring the letters as strings?
eval is overkill here. This is just a simple string-processing exercise:
replace the first 'f(' and the last ')' with ''
replace all remaining 'f(' with '('
replace all ', ' with '+'
and you're done.
This assumes that the only time the characters 'f(' appear next to each other is when they represent a call to the function f.
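Those three steps could be sketched like this (translate is a hypothetical name, and the sketch assumes every line really has the form 'f(..., ...)' with ', ' between the arguments):

```python
def translate(line):
    # Assumes the line has the form 'f(..., ...)' with ', ' between arguments.
    s = line.strip()
    s = s[len('f('):-len(')')]   # drop the first 'f(' and the last ')'
    s = s.replace('f(', '(')     # nested calls become parenthesised groups
    return s.replace(', ', '+')  # argument separators become '+'

for line in ['f(a, b)', 'f(abc, def)', 'f(a, f(u, i))']:
    print(translate(line))
```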
Yes, you can. The key is to use a mapping which returns the key itself (as a string) when it is missing.
>>> class Mdict(dict):
...     def __missing__(self, k):
...         return k
...
>>> eval('foo + bar', Mdict())
'foobar'
Of course, the general caveats about eval apply -- Please don't use it unless you trust the input completely.
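Applied to the f from the question, a sketch might look like the following. Passing the mapping as the locals argument (with an empty globals dict) is my own choice here, since eval documents that locals may be any mapping; the same trust caveat applies.

```python
class Mdict(dict):
    def __missing__(self, key):
        # Unknown names evaluate to their own spelling.
        return key

def f(a, b):
    return "({0} + {1})".format(a, b)

# f is found in the mapping; b, a and c are missing, so they become strings.
result = eval("f(f(b, a), c)", {}, Mdict(f=f))
print(result)
```

Note that Python keywords still cannot be used as names this way, so a line such as f(abc, def) raises a SyntaxError before the mapping is ever consulted.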
You could use the shlex module to give yourself a nice token stack and then parse it as a sort of push down automaton.
>>> import shlex
>>> def parsef(tokens):
...     ftok = tokens.get_token()       # there's no point to naming these tokens
...     oparentok = tokens.get_token()  # unless you want to assert correct syntax
...     lefttok = tokens.get_token()
...     if 'f' == lefttok:
...         tokens.push_token(lefttok)
...         lefttok = "("+parsef(tokens)+")"
...     commatok = tokens.get_token()
...     righttok = tokens.get_token()
...     if 'f' == righttok:
...         tokens.push_token(righttok)
...         righttok = "("+parsef(tokens)+")"
...     cparentok = tokens.get_token()
...     return lefttok+"+"+righttok
...
>>> def parseline(line):
...     return parsef(shlex.shlex(line.strip()))
...
>>> parseline('f(a, b)')
'a+b'
>>> parseline('f(abc, def)')
'abc+def'
>>> parseline('f(a, f(u, i))')
'a+(u+i)'
Note that this assumes you are getting correct syntax.
Learning about classes in python. I want the difference between two strings, a sort of subtraction. eg:
a = "abcdef"
b = "abcde"
c = a - b
This would give the output f.
I was looking at this class and I am new to this so would like some clarification on how it works.
class MyStr(str):
    def __init__(self, val):
        return str.__init__(self, val)

    def __sub__(self, other):
        if self.count(other) > 0:
            return self.replace(other, '', 1)
        else:
            return self
and this will work in the following way:
>>> a = MyStr('thethethethethe')
>>> b = a - 'the'
>>> a
'thethethethethe'
>>> b
'thethethethe'
>>> b = a - 2 * 'the'
>>> b
'thethethe'
So a string is passed to the class and the constructor is called __init__. This runs the constructor and an object is returned, which contains the value of the string? Then a new subtraction function is created, so that when you use - with the MyStr object it is just defining how subtract works with that class? When sub is called with a string, count is used to check if that string is a substring of the object created. If that is the case, the first occurrence of the passed string is removed. Is this understanding correct?
Edit: basically this class could be reduced to:
class MyStr(str):
    def __sub__(self, other):
        return self.replace(other, '', 1)
Yes, your understanding is entirely correct.
Python will call a .__sub__() method if present on the left-hand operand; if not, a corresponding .__rsub__() method on the right-hand operand can also hook into the operation.
See emulating numeric types for a list of hooks Python supports for providing more arithmetic operators.
Note that the .count() call is redundant; .replace() will not fail if the other string is not present; the whole function could be simplified to:
def __sub__(self, other):
    return self.replace(other, '', 1)
The reverse version would be:
def __rsub__(self, other):
    return other.replace(self, '', 1)
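Combined into one class, a minimal sketch could look like this. Wrapping the results in MyStr so that subtractions can be chained is my own addition, not something the original class does:

```python
class MyStr(str):
    # No __init__ is needed: str is immutable, so the value is set when the
    # object is created.
    def __sub__(self, other):
        # MyStr - str: remove the first occurrence of other from self.
        return MyStr(self.replace(other, '', 1))

    def __rsub__(self, other):
        # str - MyStr: remove the first occurrence of self from other.
        return MyStr(other.replace(self, '', 1))

a = MyStr('abcdef')
print(a - 'abcde')
print('abcdef' - MyStr('abc'))
print(a - 'abc' - 'f')
```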
Is it possible to have a list be evaluated lazily in Python?
For example
a = 1
list = [a]
print(list)
# [1]
a = 2
print(list)
# [1]
If the list was set to evaluate lazily then the final line would be [2]
The concept of "lazy" evaluation normally comes with functional languages -- but in those you could not reassign two different values to the same identifier, so, not even there could your example be reproduced.
The point is not about laziness at all -- it is that using an identifier is guaranteed to be identical to getting a reference to the same value that identifier is referencing, and re-assigning an identifier, a bare name, to a different value, is guaranteed to make the identifier refer to a different value from then on. The reference to the first value (object) is not lost.
Consider a similar example where re-assignment to a bare name is not in play, but rather any other kind of mutation (for a mutable object, of course -- numbers and strings are immutable), including an assignment to something else than a bare name:
>>> a = [1]
>>> list = [a]
>>> print(list)
[[1]]
>>> a[:] = [2]
>>> print(list)
[[2]]
Since there is no a = ... that reassigns the bare name a, but rather an a[:] = ... that reassigns a's contents, it's trivially easy to make Python as "lazy" as you wish (and indeed it would take some effort to make it "eager"!-)... if laziness vs eagerness had anything to do with either of these cases (which it doesn't;-).
Just be aware of the perfectly simple semantics of "assigning to a bare name" (vs assigning to anything else, which can be variously tweaked and controlled by using your own types appropriately), and the optical illusion of "lazy vs eager" might hopefully vanish;-)
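Both cases side by side, as a small sketch (the names items, nested and the *_result variables are only for illustration):

```python
a = 1
items = [a]
a = 2                 # rebinds the name a; items still holds the object 1
rebind_result = list(items)

b = [1]
nested = [b]
b[:] = [2]            # mutates the object that nested refers to
mutate_result = list(nested)

print(rebind_result, mutate_result)
```

The first list is unaffected because only the name was rebound; the second reflects the change because the object it contains was modified in place.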
Came across this post when looking for a genuine lazy list implementation, but it sounded like a fun thing to try and work out.
The following implementation does basically what was originally asked for:
from collections.abc import Sequence

class LazyClosureSequence(Sequence):
    def __init__(self, get_items):
        self._get_items = get_items

    def __getitem__(self, i):
        return self._get_items()[i]

    def __len__(self):
        return len(self._get_items())

    def __repr__(self):
        return repr(self._get_items())
You use it like this:
>>> a = 1
>>> l = LazyClosureSequence(lambda: [a])
>>> print(l)
[1]
>>> a = 2
>>> print(l)
[2]
This is obviously horrible.
Python is not really very lazy in general.
You can use generators to emulate lazy data structures (like infinite lists, et cetera), but as far as things like using normal list syntax, et cetera, you're not going to have laziness.
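For instance, a generator can stand in for an infinite list as long as you only ever take a finite piece of it. A minimal sketch (squares is just an illustrative name):

```python
from itertools import count, islice

def squares():
    # Yields 0, 1, 4, 9, ... one value at a time, on demand.
    for n in count():
        yield n * n

first_five = list(islice(squares(), 5))
print(first_five)
```

Nothing beyond the first five values is ever computed, but you give up indexing and normal list syntax.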
Here is a read-only lazy list that needs only a predefined length and a cache-update function:
import copy
import operator
from collections.abc import Sequence
from functools import partialmethod
from typing import Any, Dict, Iterator, Union

# Placeholder for the element type; this snippet does not define Value itself.
Value = Any

def _cmp_list(a: list, b: list, op, if_eq: bool, if_long_a: bool) -> bool:
    """utility to implement gt|ge|lt|le class operators"""
    if a is b:
        return if_eq
    for ia, ib in zip(a, b):
        if ia == ib:
            continue
        return op(ia, ib)
    la, lb = len(a), len(b)
    if la == lb:
        return if_eq
    if la > lb:
        return if_long_a
    return not if_long_a

class LazyListView(Sequence):
    def __init__(self, length):
        self._range = range(length)
        self._cache: Dict[int, Value] = {}

    def __len__(self) -> int:
        return len(self._range)

    def __getitem__(self, ix: Union[int, slice]) -> Value:
        length = len(self)
        if isinstance(ix, slice):
            clone = copy.copy(self)
            clone._range = self._range[slice(*ix.indices(length))]  # slicing
            return clone
        else:
            if ix < 0:
                ix += len(self)  # negative indices count from the end
            if not (0 <= ix < length):
                raise IndexError(f"list index {ix} out of range [0, {length})")
            if ix not in self._cache:
                ...  # update cache
            return self._cache[ix]

    def __iter__(self) -> Iterator[Value]:
        for i, _row_ix in enumerate(self._range):
            yield self[i]

    # _eq_list is not shown in this snippet; it must be defined alongside _cmp_list
    __eq__ = _eq_list
    __gt__ = partialmethod(_cmp_list, op=operator.gt, if_eq=False, if_long_a=True)
    __ge__ = partialmethod(_cmp_list, op=operator.ge, if_eq=True, if_long_a=True)
    __le__ = partialmethod(_cmp_list, op=operator.le, if_eq=True, if_long_a=False)
    __lt__ = partialmethod(_cmp_list, op=operator.lt, if_eq=False, if_long_a=False)

    def __add__(self, other):
        """BREAKS laziness and returns a plain-list"""
        return list(self) + other

    def __mul__(self, factor):
        """BREAKS laziness and returns a plain-list"""
        return list(self) * factor

    __radd__ = __add__
    __rmul__ = __mul__
Note that this class is also discussed in this SO question.