Replacing special characters in a string - python

I'm trying to unit test a Python function, but it doesn't seem to replace any of the characters, even though the function looks like it should work.
Error message:
E AssertionError: assert 'TE/ST-' == 'AEOEAA_TE_ST_'
E - æøå TE/ST-
E + AEOEAA_TE_ST_
Function:
class Formatter(object):
    @classmethod
    def string(self, string):
        new_string = string.upper()
        # split cases
        new_string.replace(' ', '_')
        new_string.replace('-', '_')
        new_string.replace('/', '_')
        # chars
        new_string.replace('Ø', 'OE')
        new_string.replace('Å', 'AA')
        new_string.replace('Æ', 'AE')
        return new_string
Test:
def test_formatter():
    test = Formatter.string('æøå te/st-')
    assert test.decode('utf-8') == 'AEOEAA_TE_ST_'

str.replace does not work in place: strings are immutable, so it returns a new string that you must assign back to the variable, otherwise the change is lost. As an example, consider:
In [315]: string = 'æøå te/st-'.upper()
Now, call .replace:
In [316]: string.replace('Ø', 'OE')
Out[316]: 'ÆOEÅ TE/ST-'
In [317]: string
Out[317]: 'ÆØÅ TE/ST-'
No change. Try assigning it back now:
In [318]: string = string.replace('Ø', 'OE')
In [319]: string
Out[319]: 'ÆOEÅ TE/ST-'
As a faster alternative, consider str.translate. On Python 3, you can pass it a dictionary of replacements built with str.maketrans (you cannot do this with str on Python 2).
class Formatter(object):
    @classmethod
    def string(self, strn):
        tab = dict.fromkeys(' -/', '_')
        tab.update({'Ø' : 'OE', 'Å' : 'AA', 'Æ' : 'AE'})
        return strn.upper().translate(str.maketrans(tab))
For python2, you could choose to stick with str.replace.
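For comparison, here is a minimal sketch of the replace-based version with every result assigned back, written for Python 3. The names REPLACEMENTS and format_string are illustrative and not part of the original code.

```python
# Sketch only: REPLACEMENTS and format_string are hypothetical names.
REPLACEMENTS = [
    (' ', '_'), ('-', '_'), ('/', '_'),      # split characters
    ('Ø', 'OE'), ('Å', 'AA'), ('Æ', 'AE'),   # special characters
]


def format_string(text):
    new_string = text.upper()
    for old, new in REPLACEMENTS:
        # str.replace returns a new string, so rebind the name each time
        new_string = new_string.replace(old, new)
    return new_string


print(format_string('æøå te/st-'))  # AEOEAA_TE_ST_
```

Keeping the pairs in one table means a new replacement is a one-line change rather than another replace call.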

Related

Extend str class in Python and modify its attribute value [duplicate]

Do you know of a Python library which provides mutable strings? Google returned surprisingly few results. The only usable library I found is http://code.google.com/p/gapbuffer/ which is in C but I would prefer it to be written in pure Python.
Edit: Thanks for the responses but I'm after an efficient library. That is, ''.join(list) might work but I was hoping for something more optimized. Also, it has to support the usual stuff regular strings do, like regex and unicode.
In Python, the built-in mutable sequence type for bytes is bytearray.
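A minimal sketch of what that looks like on Python 3, assuming the text is ASCII or that you are content to work at the byte level (a non-ASCII character occupies several bytes in UTF-8, and regular expressions need bytes patterns). The sample text is arbitrary.

```python
# Sketch only: the sample text is arbitrary.
buf = bytearray('abcdefghijklmn', 'utf-8')

# Slice assignment mutates the buffer in place.
buf[5:10] = buf[5:10].upper()[::-1]

# A bytearray can also grow or shrink.
buf.extend(b'!')
del buf[0]

text = buf.decode('utf-8')
print(text)  # bcdeJIHGFklmn!
```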
This lets you change characters in a string buffer efficiently, although you can't change the string's length (Python 2 session):
>>> import ctypes
>>> a = 'abcdefghijklmn'
>>> mutable = ctypes.create_string_buffer(a)
>>> mutable[5:10] = ''.join( reversed(list(mutable[5:10].upper())) )
>>> a = mutable.value
>>> print `a, type(a)`
('abcdeJIHGFklmn', <type 'str'>)
class MutableString(object):
    def __init__(self, data):
        self.data = list(data)
    def __repr__(self):
        return "".join(self.data)
    def __setitem__(self, index, value):
        self.data[index] = value
    def __getitem__(self, index):
        if type(index) == slice:
            return "".join(self.data[index])
        return self.data[index]
    def __delitem__(self, index):
        del self.data[index]
    def __add__(self, other):
        self.data.extend(list(other))
    def __len__(self):
        return len(self.data)
    ...
and so on, and so forth.
You could also subclass StringIO, buffer, or bytearray.
How about simply sub-classing list (the prime example for mutability in Python)?
class CharList(list):
    def __init__(self, s):
        list.__init__(self, s)
    @property
    def list(self):
        return list(self)
    @property
    def string(self):
        return "".join(self)
    def __setitem__(self, key, value):
        if isinstance(key, int) and len(value) != 1:
            cls = type(self).__name__
            raise ValueError("attempt to assign sequence of size {} to {} item of size 1".format(len(value), cls))
        super(CharList, self).__setitem__(key, value)
    def __str__(self):
        return self.string
    def __repr__(self):
        cls = type(self).__name__
        return "{}(\'{}\')".format(cls, self.string)
This only joins the list back to a string if you want to print it or actively ask for the string representation.
Mutating and extending are trivial, and the user knows how to do it already since it's just a list.
Example usage:
s = "te_st"
c = CharList(s)
c[1:3] = "oa"
c += "er"
print c # prints "toaster"
print c.list # prints ['t', 'o', 'a', 's', 't', 'e', 'r']
The following is fixed, see update below.
There's one (solvable) caveat: There's no check (yet) that each element is indeed a character. It will at least fail printing for everything but strings. However, those can be joined and may cause weird situations like this: [see code example below]
With the custom __setitem__, assigning a string of length != 1 to a CharList item raises a ValueError. Anything else can still be assigned freely, but printing then raises "TypeError: sequence item n: expected string, X found" because of the str.join() operation. If that's not good enough, further checks can easily be added (potentially also to __setslice__), or the base class can be switched to collections.MutableSequence (collections.abc.MutableSequence on Python 3), although performance might differ.
s = "test"
c = CharList(s)
c[1] = "oa"
# with custom __setitem__ a ValueError is raised here!
# without custom __setitem__, we could go on:
c += "er"
print c # prints "toaster"
# this looks right until here, but:
print c.list # prints ['t', 'oa', 's', 't', 'e', 'r']
Efficient mutable strings in Python are arrays.
Python 3 example for a unicode string, using array.array from the standard library:
>>> import array, re
>>> ua = array.array('u', 'teststring12')
>>> ua[-2:] = array.array('u', '345')
>>> ua
array('u', 'teststring345')
>>> re.search('string.*', ua.tounicode()).group()
'string345'
bytearray is predefined for bytes and is more automatic regarding conversion and compatibility.
You can also consider memoryview / buffer, numpy arrays, mmap and multiprocessing.shared_memory for certain cases.
The FIFOStr package in pypi supports pattern matching and mutable strings. This may or may not be exactly what is wanted but was created as part of a pattern parser for a serial port (the chars are added one char at a time from left or right - see docs). It is derived from deque.
from fifostr import FIFOStr
myString = FIFOStr("this is a test")
myString.head(4) == "this" #true
myString[2] = 'u'
myString.head(4) == "thus" #true
(full disclosure I'm the author of FIFOstr)
Just do this
string = "big"
string = list(string)
string[0] = string[0].upper()
string = "".join(string)
print(string)
# Output: Big

A decorator to replace specific string in function to another string

Is it possible to do something like this?
def my_func():
    my_list = ['abc', 'def', 'ghi']
    my_str = 'abc'
If I pass A to the decorator, it should replace 'abc' with 'xxx' inside the function; if I pass B, it should replace it with 'yyy':
@decorator('abc')
def my_func():
    my_list = ['xxx', 'def', 'ghi']
    my_str = 'xxx'
I don't know whether this is possible or not.
How can I do this?
You can write a decorator that uses ast.NodeTransformer to replace every string node equal to the target value with the given replacement in the function's AST:
import ast
import inspect
from textwrap import dedent
class Replace(ast.NodeTransformer):
    def __init__(self, target, replacement):
        self.target = target
        self.replacement = replacement
    def visit_Str(self, node):
        if node.s == self.target:
            node.s = self.replacement
        return node
    # remove 'replace' from the function's decorator list to avoid re-decorating during exec
    def visit_FunctionDef(self, node):
        node.decorator_list = [
            decorator for decorator in node.decorator_list
            if not isinstance(decorator, ast.Call) or decorator.func.id != 'replace'
        ]
        self.generic_visit(node)
        return node
def replace(target, repl):
    def decorator(func):
        tree = Replace(target, repl).visit(ast.parse(dedent(inspect.getsource(func))))
        ast.fix_missing_locations(tree)
        scope = {}
        exec(compile(tree, inspect.getfile(func), "exec"), func.__globals__, scope)
        return scope[func.__name__]
    return decorator
so that:
@replace('abc', 'xxx')
def my_func():
    my_list = ['abc', 'def', 'ghi']
    my_str = 'abc'
    print(my_list, my_str)
my_func()
outputs:
['xxx', 'def', 'ghi'] xxx
Demo: https://repl.it/@blhsing/ValuableLimeVideogames
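On Python 3.8 and later, string literals are parsed as ast.Constant nodes and visit_Str is deprecated, so the transformer may need a visit_Constant method instead. The following is a minimal sketch under that assumption. To stay self-contained it transforms a source string directly rather than going through inspect.getsource, and the names ReplaceConstant, SOURCE and namespace are illustrative.

```python
import ast


class ReplaceConstant(ast.NodeTransformer):
    """Hypothetical variant of the transformer for ast.Constant nodes."""

    def __init__(self, target, replacement):
        self.target = target
        self.replacement = replacement

    def visit_Constant(self, node):
        # Only touch string constants that equal the target exactly.
        if isinstance(node.value, str) and node.value == self.target:
            return ast.copy_location(ast.Constant(value=self.replacement), node)
        return node


SOURCE = '''
def my_func():
    my_list = ['abc', 'def', 'ghi']
    my_str = 'abc'
    return my_list, my_str
'''

tree = ReplaceConstant('abc', 'xxx').visit(ast.parse(SOURCE))
ast.fix_missing_locations(tree)

namespace = {}
exec(compile(tree, '<ast-demo>', 'exec'), namespace)
result = namespace['my_func']()
print(result)  # (['xxx', 'def', 'ghi'], 'xxx')
```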
It's possible to replace the code object of a function with a decorator thereby replacing constants inside of it, but it's unlikely to be the solution you actually want to go with.
This would get unmaintainable at nearly any scale as you'd have large sections looking like:
# Note, this is python 3.x specific
CodeType(
    argcount, # integer
    kwonlyargcount, # integer
    nlocals, # integer
    stacksize, # integer
    flags, # integer
    codestring, # bytes
    consts, # tuple
    names, # tuple
    varnames, # tuple
    filename, # string
    name, # string
    firstlineno, # integer
    lnotab, # bytes
    freevars, # tuple
    cellvars, # tuple
)
Here you would need to copy each field from the original code object, modifying the ones you want to change.
A better solution to this sort of problem would be to allow passing the string to the function as a parameter. If you needed the function to later be callable without the string present, you could use a partial (see functools.partial)
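A minimal sketch of the parameter-plus-partial approach. The parameter name token and the return statement are assumptions added for illustration; the original function neither takes arguments nor returns anything.

```python
from functools import partial


def my_func(token='abc'):
    # token is a hypothetical parameter standing in for the hard-coded string
    my_list = [token, 'def', 'ghi']
    my_str = token
    return my_list, my_str


# Pre-bind the replacement so callers need not pass it every time.
my_func_xxx = partial(my_func, 'xxx')

print(my_func())      # (['abc', 'def', 'ghi'], 'abc')
print(my_func_xxx())  # (['xxx', 'def', 'ghi'], 'xxx')
```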
A decorator can't change the logic inside your function. It can only operate on the arguments passed in or on whatever the function returns. In your case, you could use a decorator that post-processes the return value.

Remove characters from string

I need a function remove() that removes characters from a string.
This was my first approach:
def remove(self, string, index):
    return string[0:index] + string[index + 1:]
def remove_indexes(self, string, indexes):
    for index in indexes:
        string = self.remove(string, index)
    return string
I pass the indexes I want to remove in a list, but once I remove a character, all the following indexes shift.
Is there a more pythonic way to do this? It would be preferable to implement it like this:
"hello".remove([1, 2])
I don't know about a "pythonic" way, but you can achieve this. If you can ensure that the indexes passed to remove_indexes are always sorted in ascending order, iterate over them in reverse, so that removing a character does not shift the positions still to be removed:
def remove_indexes(self, string, indexes):
    for index in reversed(indexes):
        string = self.remove(string, index)
    return string
If you can't ensure that, sort them in descending order first:
def remove_indexes(self, string, indexes):
    for index in sorted(indexes, reverse=True):
        string = self.remove(string, index)
    return string
I think the code below will work for you. It skips the indexes you want to remove and returns the string formed by joining the remaining characters.
def remove_indexes(string,indexes):
    return "".join([string[i] for i in range(len(string)) if i not in indexes])
remove_indexes("hello",[1,2])
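A small variation on the same idea, sketched under the assumption that the index list may be long or contain duplicates: converting it to a set makes each membership test constant-time, and enumerate avoids indexing into the string.

```python
def remove_indexes(string, indexes):
    # A set gives O(1) membership tests; order and duplicates do not matter.
    drop = set(indexes)
    return "".join(ch for i, ch in enumerate(string) if i not in drop)


print(remove_indexes("hello", [1, 2]))  # hlo
```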
The most pythonic way would be to use regular expressions. The danger with your indexing approach is that the string you are passing in may have variable length, and therefore you would be removing parts of the string unintentionally.
Let's say you wanted to remove all numbers from a string:
import re
s = "This is a string with s0m3 numb3rs in it1 !"
num_reg = re.compile(r"\d+") # matches runs of one or more digits 0-9
re.sub(num_reg , "**", s) # substitute each run of digits in `s` with "**"
# result: "This is a string with s**m** numb**rs in it** !"
This way, you define a general expression that may appear regularly in a string (a "regular expression" or regex), and you can quickly and reliably replace all instances of that regex in the string.
You cannot add an attribute to a built-in type; you will get an error like this:
TypeError: can't set attributes of built-in/extension type 'str'
You can create a class that inherits from str and add the method there:
class String(str):
    def remove(self, index):
        if isinstance(index, list):
            # order the index to remove the biggest first
            for i in sorted(index, reverse=True):
                self = self.remove(i)
            return self
        return String(self[0:index] + self[index + 1:])
s = String("hello")
print(s.remove([0, 1]))
If you want the change to happen in place, you need to create a new type, for example:
class String:
    def __init__(self, value):
        self._str = value
    def __getattr__(self, item):
        """ delegate to str"""
        return getattr(self._str, item)
    def __getitem__(self, item):
        """ support slicing"""
        return String(self._str[item])
    def remove(self, index):
        indexes = index if isinstance(index, list) else [index]
        # order the index to remove the biggest first
        for i in sorted(indexes, reverse=True):
            self._str = self._str[0:i] + self._str[i + 1:]
        # change in place should return None
        return None
    def __str__(self):
        return str(self._str)
    def __repr__(self):
        return repr(self._str)
s = String("hello")
s.remove([0, 1])
print(s.upper()) # delegate to str class
print(s[:1]) # support slicing
print(list(x for x in s)) # it's iterable
It is still missing other magic methods needed to act like a real str, such as __add__, __mul__, and so on.
If you want a class that behaves like str but has a remove method that changes the instance itself, you need to create your own mutable type, because str is an immutable type. In the first version, self = self.remove(i) does not change the caller's variable: it only rebinds the local name self to another object, while the reference s still points to the object created by String("hello").

How to emulate a C-style function pointer with Python functions

Suppose I have a function that is hard-coded to make a substring lowercase, when instances of that substring are found in a larger string, e.g.:
def process_record(header, needle, pattern):
    sequence = needle
    for idx in [m.start() for m in re.finditer(pattern, needle)]:
        offset = idx + len(pattern)
        sequence = sequence[:idx] + needle[idx:offset].lower() + sequence[offset:]
    sys.stdout.write('%s\n%s\n' % (header, sequence))
This works fine, e.g.:
>>> process_record('>foo', 'ABCDEF', 'BCD')
>foo
AbcdEF
What I'd like to do is generalize this, to pass in a string function (lower, in this case, but it could be any function of a primitive type or class) as a parameter. Something like:
def process_record(header, needle, pattern, fn):
    sequence = needle
    for idx in [m.start() for m in re.finditer(pattern, needle)]:
        offset = idx + len(pattern)
        sequence = sequence[:idx] + needle[idx:offset].fn() + sequence[offset:]
    sys.stdout.write('%s\n%s\n' % (header, sequence))
This doesn't work (which is why I'm asking the question), but hopefully this demonstrates the idea, to try to generalize what the function does in a way that is readable.
One option I suppose is to write a helper function that wraps stringInstance.lower() and passes copies of strings around, which is inefficient and clumsy. I'm hoping there's a more elegant approach that Python experts know about.
With C, for instance, I'd pass a pointer to the function I want to run as a parameter to process_record(), and run the function pointer directly on the variable of interest.
What is the syntax for doing the same when using string primitive functions (or similar on primitive or other classes) in Python?
In general, use this approach:
def call_fn(arg, fn):
    return fn(arg)
call_fn('FOO', str.lower) # 'foo'
An instance method in Python always takes the instance (conventionally named self) as its first argument. By calling the method as an attribute of the class, you pass that argument explicitly.
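A minimal sketch of the point about self, for Python 3. It shows that the bound and unbound calls are equivalent, and uses operator.methodcaller from the standard library for the case where the method needs extra arguments. The name strip_dashes is illustrative.

```python
from operator import methodcaller

bound = "FOO".lower()        # usual method call
unbound = str.lower("FOO")   # same call, with self passed explicitly


def call_fn(arg, fn):
    return fn(arg)


# methodcaller builds a callable that invokes the named method with fixed arguments.
strip_dashes = methodcaller("strip", "-")

results = [call_fn("FOO", str.lower), call_fn("--foo--", strip_dashes)]
print(bound, unbound, results)  # foo foo ['foo', 'foo']
```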
Your example is a little complex, so I would break this into two different questions:
1) How can you provide functions as arguments?
Functions are objects like everything else, and can be passed around as expected, e.g.:
def apply(val, func):
    # e.g. ("X", str.lower) -> "x"
    # ("X", lambda x: x * 2) -> "XX"
    return func(val)
In your example, you might do
def process_record(..., func):
    ...
    sequence = ... func(needle[idx:offset]) ...
    ...
An alternative method that I wouldn't recommend would be something like
def apply_by_name(val, method_name):
    # e.g. ("X", "lower") -> "x"
    return getattr(val, method_name)()
2) How can I apply an effect to each match of a regular expression in a string?
For this I would recommend re.sub, which accepts either a string or a function as the replacement.
>>> re.sub('[aeiou]', '!', 'the quick brown fox')
'th! q!!ck br!wn f!x'
def foo(match):
    v = match.group()
    if v == 'i': return '!!!!!!!'
    elif v in 'eo': return v * 2
    else: return v.upper()
>>> re.sub('[aeiou]', foo, 'the quick brown fox')
'thee qU!!!!!!!ck broown foox'
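Putting the two parts together for the original problem might look like the sketch below. The name transform_matches is hypothetical, and re.escape is an assumption: it treats the pattern as a literal substring, which matches how the question uses len(pattern). Drop it if you need a real regular expression.

```python
import re


def transform_matches(text, pattern, fn):
    # Apply fn to every occurrence of pattern and splice the result back in.
    # re.escape treats pattern as a literal substring (an assumption, see above).
    return re.sub(re.escape(pattern), lambda match: fn(match.group()), text)


print(transform_matches('ABCDEF', 'BCD', str.lower))  # AbcdEF
```

Because re.sub builds the result in one pass, there is no need to track indexes and offsets by hand.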
Hope this helps!

using the eval function in Python to translate strings

I have a file with a lot of lines like this
f(a, b)
f(abc, def)
f(a, f(u, i))
...
and I was asked to write a program in Python that would translate the strings into the following format:
a+b
abc+def
a+(u+i)
...
Rule: f(a, b) -> a+b
The approach I am following right now uses eval functions:
def f(a, b):
    return "({0} + {1})".format(a,b)
eval("f(f('b','a'),'c')")
which returns
'((b + a) + c)'
However, as you can see, I need to put the letters as strings so that the eval function does not throw me a NameError when I run it.
Is there any way that will allow me to get the same behavior out of the eval function but without declaring the letters as strings?
eval is overkill here. This is just a simple string-processing exercise:
replace the first 'f(' and the last ')' with ''
replace all remaining 'f(' with '('
replace all ', ' with '+'
and you're done.
This assumes that the characters 'f(' only appear next to each other when they represent a call to the function f.
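The steps above can be sketched as follows. The function name translate is illustrative, and the code assumes well-formed input: each line starts with 'f(' and ends with ')', and arguments are separated by a comma and a single space.

```python
def translate(line):
    line = line.strip()
    # Drop the first "f(" and the last ")" (assumes well-formed input).
    inner = line[len("f("):-1]
    # Remaining calls become parenthesised groups; separators become "+".
    return inner.replace("f(", "(").replace(", ", "+")


for sample in ["f(a, b)", "f(abc, def)", "f(a, f(u, i))"]:
    print(translate(sample))
# a+b
# abc+def
# a+(u+i)
```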
Yes, you can. The key is to use a mapping that returns the missing key itself (the name, as a string) whenever a lookup fails.
>>> class Mdict(dict):
...     def __missing__(self, k):
...         return k
...
>>> eval('foo + bar', Mdict())
'foobar'
Of course, the general caveats about eval apply -- Please don't use it unless you trust the input completely.
You could use the shlex module to give yourself a nice token stack and then parse it as a sort of push down automaton.
>>> import shlex
>>> def parsef(tokens):
...     ftok = tokens.get_token()       # there's no point to naming these tokens
...     oparentok = tokens.get_token()  # unless you want to assert correct syntax
...     lefttok = tokens.get_token()
...     if 'f' == lefttok:
...         tokens.push_token(lefttok)
...         lefttok = "("+parsef(tokens)+")"
...     commatok = tokens.get_token()
...     righttok = tokens.get_token()
...     if 'f' == righttok:
...         tokens.push_token(righttok)
...         righttok = "("+parsef(tokens)+")"
...     cparentok = tokens.get_token()
...     return lefttok+"+"+righttok
...
>>> def parseline(line):
...     return parsef(shlex.shlex(line.strip()))
...
>>> parseline('f(a, b)')
'a+b'
>>> parseline('f(abc, def)')
'abc+def'
>>> parseline('f(a, f(u, i))')
'a+(u+i)'
Note that this assumes you are getting correct syntax.
