Slicing strings in str.format - python

I want to achieve the following with str.format:
x,y = 1234,5678
print str(x)[2:] + str(y)[:2]
The only way I was able to do it was:
print '{0}{1}'.format(str(x)[2:],str(y)[:2])
Now, this is an example; what I really have is a long and messy string, so I want to put the slicing inside the {}. I've studied the docs, but I can't figure out the correct syntax. My question is: is it possible to slice strings inside a replacement field?

No, you can't apply slicing to strings inside the replacement field.
You'll need to refer to the Format Specification Mini-Language; it defines what is possible. This mini language defines how you format the referenced value (the part after the : in the replacement field syntax).
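The mini-language does have one slicing-like feature for strings: for non-numeric types the precision sets the maximum field size, so `.2` keeps only the first two characters. A small sketch showing that this covers the `str(y)[:2]` half of the question. The `[2:]` half still has no format-spec equivalent, and the argument must already be a string, because an integer rejects a precision:

```python
x, y = 1234, 5678

# For str values, ".2" means "use at most 2 characters of the field"
head = '{0:.2}'.format(str(y))        # same as str(y)[:2]

# The [2:] part still has to be done before formatting
combined = '{0}{1:.2}'.format(str(x)[2:], str(y))

print(head)      # 56
print(combined)  # 3456
```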

You could do something like this.
NOTE
This is a rough example and should not be considered complete and tested. But I think it shows you a way to start getting where you want to be.
import string
class SliceFormatter(string.Formatter):
    def get_value(self, key, args, kwds):
        # Plain positional fields such as {0} arrive as ints, which can't contain '|'
        if not isinstance(key, int) and '|' in key:
            try:
                key, indexes = key.split('|')
                indexes = map(int, indexes.split(','))
                if key.isdigit():
                    return args[int(key)][slice(*indexes)]
                return kwds[key][slice(*indexes)]
            except KeyError:
                return kwds.get(key, 'Missing')
        return super(SliceFormatter, self).get_value(key, args, kwds)
phrase = "Hello {name|0,5}, nice to meet you. I am {name|6,9}. That is {0|0,4}."
fmt = SliceFormatter()
print(fmt.format(phrase, "JeffJeffJeff", name="Larry Bob"))
OUTPUT
Hello Larry, nice to meet you. I am Bob. That is Jeff.
NOTE 2
There is no support for slicing like [:5] or [6:], but I think that would be easy enough to implement as well. Also there is no error checking for slice indexes out of range, etc.

You can use a run-time evaluated "f" string. Python f-strings support slicing and don't use a "mini-language" like the formatter. The full power of a python expression is available within each curly-brace of an f-string. Unfortunately there is no string.feval() function ... imo there should be (languages should not have magic abilities that are not provided to the user).
You also can't add one to the string type, because the built-in python types cannot be modified/expanded.
See https://stackoverflow.com/a/49884004/627042 for an example of a run-time evaluated f-string.
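A minimal sketch of such a run-time evaluated f-string, applied to the question's example. The helper name `feval` is hypothetical (there is no such built-in), and because it is plain `eval()`, it assumes the template is trusted. Passing the names explicitly, instead of `globals()`, keeps the set of visible variables small:

```python
def feval(template, **namespace):
    """Evaluate `template` as if it were an f-string literal.

    Hypothetical helper, not a built-in. It uses eval(), so only pass
    templates you wrote yourself, never user input.
    """
    # repr() yields a quoted string literal; prefixing 'f' makes it f-string source
    source = 'f' + repr(template)
    # Empty globals (eval inserts __builtins__) plus the given names as locals
    return eval(source, {}, namespace)


x, y = 1234, 5678
print(feval('{str(x)[2:]}{str(y)[:2]}', x=x, y=y))      # 3456
print(feval("the key is {d['key']}", d={'key': 'val'}))  # the key is val
```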

To answer your question directly: no, slicing is not supported by built-in str formatting. However, there is a workaround in case (runtime-evaluated) f-strings don't fit your needs.
Workaround
The previous answers that extend string.Formatter are not completely right, since overriding get_value is not the correct way to add a slicing mechanism to string.Formatter.
import string
def transform_to_slice_index(val: str):
    if val == "_":
        return None
    else:
        return int(val)
class SliceFormatter(string.Formatter):
    def get_field(self, field_name, args, kwargs):
        slice_operator = None
        if isinstance(field_name, str) and '|' in field_name:
            field_name, slice_indexes = field_name.split('|')
            slice_indexes = map(transform_to_slice_index,
                                slice_indexes.split(','))
            slice_operator = slice(*slice_indexes)
        obj, first = super().get_field(field_name, args, kwargs)
        if slice_operator is not None:
            obj = obj[slice_operator]
        return obj, first
Explanation
get_value is called inside get_field, and it is used ONLY to look up the top-level name in the args and kwargs passed to vformat(). Attribute and item access (.attr and [index]) is done in get_field itself. Thus, the slice should be applied after super().get_field has returned the desired obj.
With that said, overriding get_value means slicing no longer works once the field name traverses attributes or indexes. You can see the error in this example:
WrongSliceFormatter().format("{foo.bar[0]|1,3}", foo=foo)
>> ValueError: "Only '.' or '[' may follow ']' in format field specifier"
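To see the difference, here is a self-contained Python 3 sketch of the get_field-based formatter from this answer, restated so it runs on its own, applied to the traversal example that breaks the get_value approach. `SimpleNamespace` is only a stand-in for the undefined `foo` object, and the `"_"` placeholder for an omitted bound follows the answer's `transform_to_slice_index`:

```python
import string
from types import SimpleNamespace


def transform_to_slice_index(val):
    # "_" stands for an omitted slice bound, as in the answer above
    return None if val == '_' else int(val)


class SliceFormatter(string.Formatter):
    def get_field(self, field_name, args, kwargs):
        slice_operator = None
        if '|' in field_name:
            field_name, slice_indexes = field_name.split('|')
            slice_operator = slice(*map(transform_to_slice_index,
                                        slice_indexes.split(',')))
        # Let the base class resolve e.g. "foo.bar[0]" first...
        obj, first = super().get_field(field_name, args, kwargs)
        # ...then slice whatever object that traversal produced
        if slice_operator is not None:
            obj = obj[slice_operator]
        return obj, first


foo = SimpleNamespace(bar=['abcdef'])   # stand-in for the example's foo
fmt = SliceFormatter()

print(fmt.format("{foo.bar[0]|1,3}", foo=foo))               # bc
print(fmt.format("{0|_,2}:{0|2,_}", "12345678"))             # 12:345678
print(repr(fmt.format("{name|0,5:>7}", name="Larry Bob")))   # '  Larry'
```

The last line shows that a normal format spec after `:` still applies to the sliced value, since `Formatter.parse()` splits the spec off before `get_field` is called.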

This solved my slicing problem quite nicely. However, I also wanted to elide values. For example, 'AVeryLongStringValue', which I might want to fit into a 10-character field, could be truncated to '...ngValue'. So I extended your example to support slicing, eliding, and normal formatting all in one. This is what I came up with.
import string
try:
    from urllib import unquote  # Python 2
except ImportError:
    from urllib.parse import unquote  # Python 3
class SliceElideFormatter(string.Formatter):
    """An extended string formatter that provides key specifiers that allow
    string values to be sliced and elided if they exceed a length limit. The
    additional formats are optional and can be combined with normal python
    formatting. So the whole syntax looks like:
        key[|slice-options][$elide-options][:normal-options]
    Where slice options consist of a '|' character to begin a slice request,
    followed by slice indexes separated by commas. Thus {FOO|5,} requests
    everything after the 5th element.
    The elide options consist of a '$' character followed by an integer max
    field width, followed by '<', '^', or '>' for pre, centered, or post
    eliding, followed by the eliding string. Thus {FOO$10<-} would display
    the last 9 characters of a string longer than 10 characters with a '-'
    prefix.
    Slicing and eliding can be combined. For example, given a dict of
    {'FOO': 'centeredtextvalue'} and a format string of
    '{FOO|1,-1$11^%2E%2E%2E}', the result would be 'ente...valu'. The slice
    spec removes the first and last characters, and the elide spec center
    elides the remaining value with '...'. The '...' value must be URL-encoded
    since '.' is an existing special format character.
    """
    def get_value(self, key, args, kwds):
        """Called by string.Formatter for each format key found in the format
        string. The key is checked for the presence of a slice or elide
        introducer character. If one or both are found, the slice and/or
        elide spec is extracted, parsed, and applied to the value found with
        the remaining key string.
        Arguments:
        key, A format key string possibly containing slice or elide specs
        args, Format values list tuple
        kwds, Format values keyword dictionary
        """
        if isinstance(key, int):
            # Plain positional fields such as {0} arrive as ints
            return super(SliceElideFormatter, self).get_value(key, args, kwds)
        sspec = espec = None
        if '|' in key:
            key, sspec = key.split('|')
            if '$' in sspec:
                sspec, espec = sspec.split('$')
        elif '$' in key:
            key, espec = key.split('$')
        value = args[int(key)] if key.isdigit() else kwds[key]
        if sspec:
            sindices = [int(sdx) if sdx else None
                        for sdx in sspec.split(',')]
            value = value[slice(*sindices)]
        if espec:
            espec = unquote(espec)
            if '<' in espec:
                value = self._prefix_elide_value(espec, value)
            elif '>' in espec:
                value = self._postfix_elide_value(espec, value)
            elif '^' in espec:
                value = self._center_elide_value(espec, value)
            else:
                raise ValueError('invalid eliding option %r' % espec)
        if sspec or espec:
            return value
        return super(SliceElideFormatter, self).get_value(key, args, kwds)
    def _center_elide_value(self, elidespec, value):
        """Return center elided value if it exceeds the elide length.
        Arguments:
        elidespec, The elide spec field extracted from key
        value, Value obtained from remaining key to possibly be elided
        """
        elidelen, elidetxt = elidespec.split('^')
        elen, vlen = int(elidelen), len(value)
        if vlen > elen:
            tlen = len(elidetxt)
            return value[:(elen-tlen)//2] + elidetxt + value[-(elen-tlen)//2:]
        return value
    def _postfix_elide_value(self, elidespec, value):
        """Return postfix elided value if it exceeds the elide length.
        Arguments:
        elidespec, The elide spec field extracted from key
        value, Value obtained from remaining key to possibly be elided
        """
        elidelen, elidetxt = elidespec.split('>')
        elen, vlen = int(elidelen), len(value)
        if vlen > elen:
            tlen = len(elidetxt)
            return value[:(elen-tlen)] + elidetxt
        return value
    def _prefix_elide_value(self, elidespec, value):
        """Return prefix elided value if it exceeds the elide length.
        Arguments:
        elidespec, The elide spec field extracted from key
        value, Value obtained from remaining key to possibly be elided
        """
        elidelen, elidetxt = elidespec.split('<')
        elen, vlen = int(elidelen), len(value)
        if vlen > elen:
            tlen = len(elidetxt)
            return elidetxt + value[-(elen-tlen):]
        return value
As an example, all three format specs can be combined to clip the value's first and last characters, center-elide the slice to a 10-character value, and finally right-justify it in a 12-character field as follows:
sefmtr = SliceElideFormatter()
data = { 'CNT':'centeredtextvalue' }
fmt = '{CNT|1,-1$10^**:>12}'
print('%r' % sefmtr.format(fmt, **data))
Output: '  ente**valu'. Sharing in case anyone else is interested. Thanks much.
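The center-elide arithmetic is the least obvious part of the class. Here it is as a standalone function (`center_elide` is a hypothetical name, not part of the class above), assuming `width` is at least `len(marker)`. It computes the tail length explicitly, because the class's `value[-(elen-tlen)//2:]` turns into `value[0:]`, the whole string, when the elide width equals the marker length:

```python
def center_elide(value, width, marker):
    """Keep the head and tail of value so the result is at most width
    characters, with marker in the middle. Assumes width >= len(marker)."""
    if len(value) <= width:
        return value
    keep = width - len(marker)      # characters of value that survive
    head = keep // 2                # the tail gets the extra char when odd
    tail = keep - head
    # value[len(value) - tail:] is empty when tail == 0 (value[-0:] is not)
    return value[:head] + marker + value[len(value) - tail:]


print(repr(center_elide('enteredtextvalu', 10, '**')))   # 'ente**valu'
print(repr(center_elide('enteredtextvalu', 11, '...')))  # 'ente...valu'
```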

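I tried doing it in Python 3.9 and it works well (note that the slice is applied to the argument before it is passed to format(), not inside the replacement field):
x="nowpossible"
print("slicing is possible {}".format(x[0:3]))
output
slicing is possible now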
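Unlike str.format(), an f-string (Python 3.6+) evaluates a full expression inside each pair of braces, so the slice from the original question can live inside the replacement field itself. A minimal sketch, assuming the variables are in scope where the string is written:

```python
x, y = 1234, 5678

# The slices are evaluated inside the replacement fields themselves
result = f'{str(x)[2:]}{str(y)[:2]}'
print(result)  # 3456
```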

Related

How to parse string replacement fields in a string in python?

Python has this concept of string replacement fields such as mystr1 = "{replaceme} other text..." where {replaceme} (the replacement field) can be easily formatted via statements such as mystr1.format(replaceme="yay!").
So I often work with large strings where I don't know all of the replacement fields. Resolving them manually is not too bad if there are one or two, but sometimes there are dozens, and it would be nice if Python had a function similar to dict.keys().
How does one parse string replacement fields in a string in Python?
In lieu of answers from the community, I wrote the helper function below to collect the replacement fields into a dict, whose values I can then simply update to what I want before formatting the string.
Is there a better way or built in way to do this?
cool_string = """{a}yo{b}ho{c}ho{d}and{e}a{f}bottle{g}of{h}rum{i}{j}{k}{l}{m}{n}{o}{p}{q}{r}{s}{t}{u}{v}{w}{x}{y}{z}"""
def parse_keys_string(s, keys=None):
    # Default to None: a mutable {} default would be shared between calls
    if keys is None:
        keys = {}
    try:
        print(s.format(**keys)[:0])
        return keys
    except KeyError as e:
        print("Adding Key:", e)
        e = str(e).replace("'", "")
        keys[e] = e
        parse_keys_string(s, keys)
        return keys
cool_string_replacement_fields_dict = parse_keys_string(cool_string)
#set replacement field values
i = 1
for k, v in cool_string_replacement_fields_dict.items():
    cool_string_replacement_fields_dict[k] = i
    i = i + 1
#format the string with desired values...
cool_string_formatted = cool_string.format(**cool_string_replacement_fields_dict)
print(cool_string_formatted)
I came up with the following:
class NumfillinDict(dict):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.i = -1
    def __missing__(self, key):  # optionally one could have logic based on key
        self.i += 1
        return f"({self.i})"
cool_string = ("{a}yo{b}ho{c}ho{d}and{e}a{f}bottle{g}of{h}rum{i}\n"
               "{j}{k}{l}{m}{n}{o}{p}{q}{r}{s}{t}{u}{v}{w}{x}{y}{z}")
dt = NumfillinDict(notneeded='something', b=' -=actuallyIknowb<=- ')
filled_string = cool_string.format_map(dt)
print(filled_string)
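It works a bit like a defaultdict: format_map() looks each field up in the dict, and for any missing key dict calls the __missing__ method, which supplies a numbered placeholder (unlike defaultdict, it does not store the new key-value pair).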
Result:
(0)yo -=actuallyIknowb<=- ho(1)ho(2)and(3)a(4)bottle(5)of(6)rum(7)
(8)(9)(10)(11)(12)(13)(14)(15)(16)(17)(18)(19)(20)(21)(22)(23)(24)
Inspired by: Format string unused named arguments
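For the "built-in way" part of the question, the standard library's `string.Formatter().parse()` already splits a template into `(literal_text, field_name, format_spec, conversion)` tuples, so the field names can be collected without formatting anything. A small sketch; the helper name `replacement_fields` is hypothetical, and note that it returns compound names such as `'x.attr'` or `'y[0]'` whole, and an empty string for auto-numbered `{}` fields:

```python
from string import Formatter


def replacement_fields(template):
    """Return the field names used in template, in order, without duplicates."""
    names = []
    # field_name is None for literal-only chunks (including escaped braces)
    for _, field_name, _, _ in Formatter().parse(template):
        if field_name is not None and field_name not in names:
            names.append(field_name)
    return names


cool_string = "{a}yo{b}ho{c}ho{a!r:>5}"
print(replacement_fields(cool_string))  # ['a', 'b', 'c']
```

`dict.fromkeys(replacement_fields(s))` then gives a dict with the same keys as the one the question builds, ready for its values to be filled in.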

Python: Split String and convert to other type

I have a function which could receive a string formatted like this:
"true"^^<http://www.w3.org/2001/XMLSchema#boolean>
"100"^^<http://www.w3.org/2001/XMLSchema#int>
Now I want to split the string on the ^^ characters and convert the first part of the string based on the second part. I also want to remove the " characters before converting.
This is the code I use for this:
def getValue(tObject):
    toReturn = tObject.split("^^")
    if len(toReturn) == 2:
        if toReturn[1] == "<http://www.w3.org/2001/XMLSchema#boolean>":
            return bool(toReturn[0].replace('"', ""))
        elif toReturn[1] == "<http://www.w3.org/2001/XMLSchema#int>":
            return int(toReturn[0].replace('"', ""))
    return None
But I'm not so happy with it. Is there a more elegant (Pythonic) way to achieve this?
You can use a regex to check that the given value is valid, and to retrieve both the value to cast and the way to cast it:
import re
PATTERN = re.compile(r'"(.*)"\^\^<http:.*#(\w+)>')
types = {"boolean": bool, "int": int}
def getValue(value):
    m = PATTERN.fullmatch(value)
    return types[m.group(2)](m.group(1)) if m else None
Instead of if len(...), you could just try to unpack the result and catch a ValueError. Then you can use a dict for the types, and str.strip instead of str.replace:
types = {'boolean': bool, 'int': int}
def getValue(tObject):
    try:
        value, type_hint = tObject.split('^^')
    except ValueError:
        return None
    else:
        return types[type_hint.rstrip('>').rsplit('#', 1)[1]](value.strip('"'))
Firstly, you could remove return None, since the function returns None by default.
Secondly, you could use toReturn[1].endswith("boolean>") to match the end of the string, instead of matching the whole string with toReturn[1] == "<http://www.w3.org/2001/XMLSchema#boolean>". The same goes for the int string.
Thirdly, you could store the cleaned-up value in one variable before the if..elif, so you don't have to compute it separately in each branch.
Code:
def getValue(tObject):
    toReturn = tObject.split("^^")
    if len(toReturn) == 2:
        return_value = toReturn[0].replace('"', "")
        if toReturn[1].endswith("boolean>"):
            return bool(return_value)
        elif toReturn[1].endswith("int>"):
            return int(return_value)
This might not be much of a logic improvement, but the code does look less cluttered now. If you want more terse, "pythonic" ways of solving this problem, the other answers might be more suitable.
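One caveat applies to the question's code and to every answer above: `bool()` of any non-empty string is `True`, so a `"false"^^<...#boolean>` value comes out as `True`. A minimal sketch of a stricter converter; `to_bool` is a hypothetical helper that accepts the xsd:boolean lexical forms (`'true'`, `'false'`, `'1'`, `'0'`) and could replace `bool` in the `types` dicts above:

```python
def to_bool(text):
    """Map an xsd:boolean lexical value to a Python bool.
    bool(text) can't be used: any non-empty string is truthy."""
    cleaned = text.strip()
    if cleaned in ('true', '1'):
        return True
    if cleaned in ('false', '0'):
        return False
    raise ValueError('not an xsd:boolean value: %r' % text)


print(bool('false'))      # True  (the pitfall)
print(to_bool('false'))   # False
```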

Trigger f-string parse on python string in variable

This question comes from handling Jupyter magics, but can be expressed in a simpler way. Given a string s = "the key is {d['key']}" and a dictionary d = {'key': 'val'}, we want to parse the string.
The old method would be .format(), which raises an error here: it doesn't handle quoted dictionary keys.
"the key is {d['key']}".format(d=d) # ERROR
I thought the only way around was to transform the dictionary to an object (explained here or here).
"the key is {d.key}".format(d=obj(d))
But Martijn explained nicely that you can simply leave out the quotes to get it working:
"the key is {d[key]}".format(d=d)
Still, the new f-string method does handle dictionary keys in an intuitive Python manner:
f"the key is {d['key']}"
It also handles function and method calls, something .format cannot handle either:
f"this means {d['key'].lower()}"
Although we now know that you can do it with .format, I am still wondering about the original question: given s and d, how do you force an f-string parse of s? I added another example with a method call inside the curly brackets, which .format cannot handle but an f-string can.
Is there some function .fstring() or method available? What does Python use internally?
String formatting can handle most string dictionary keys just fine, but you need to remove the quotes:
"the key is {d[key]}".format(d=d)
Demo:
>>> d = {'key': 'val'}
>>> "the key is {d[key]}".format(d=d)
'the key is val'
str.format() syntax isn't quite the same thing as Python expression syntax (which is what f-strings mostly support).
From the Format String Syntax documentation:
field_name ::= arg_name ("." attribute_name | "[" element_index "]")*
[...]
element_index ::= digit+ | index_string
index_string ::= <any source character except "]"> +
and
[A]n expression of the form '[index]' does an index lookup using __getitem__()
The syntax is limited, in that it will convert any digit-only strings into an integer, and everything else is always interpreted as a string (though you could use nested {} placeholders to dynamically interpolate a key value from another variable).
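A few quick checks of that rule with plain str.format and no helpers: a digit-only index becomes an int, and anything else is used as a string key without quotes. One consequence is that a negative list index such as `{l[-1]}` is looked up as the string `'-1'` and fails on a list:

```python
# Digit-only index strings are converted to int...
print('{d[0]}'.format(d={0: 'int key'}))       # int key
print('{l[1]}'.format(l=['a', 'b']))           # b
# ...anything else is used as an unquoted str key
print('{d[key]}'.format(d={'key': 'val'}))     # val
print('{d[-1]}'.format(d={'-1': 'str key'}))   # str key
```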
If you must support arbitrary expressions the same way f-strings do, and you do not take template strings from untrusted sources (this part is important), then you could parse out the field name components and use the eval() function to evaluate the values before outputting the final string:
from string import Formatter
_conversions = {'a': ascii, 'r': repr, 's': str}
def evaluate_template_expressions(template, globals_=None):
    if globals_ is None:
        globals_ = globals()
    result = []
    parts = Formatter().parse(template)
    for literal_text, field_name, format_spec, conversion in parts:
        if literal_text:
            result.append(literal_text)
        if not field_name:
            continue
        value = eval(field_name, globals_)
        if conversion:
            value = _conversions[conversion](value)
        # Always call format() so non-string values (e.g. ints) become text;
        # format(value, '') behaves like str(value) for most types
        result.append(format(value, format_spec))
    return ''.join(result)
Now the quotes are accepted:
>>> s = "the key is {d['key']}"
>>> d = {'key': 'val'}
>>> evaluate_template_expressions(s)
'the key is val'
Essentially, you can do the same with eval(f'f{s!r}', globals()), but the above might give you some more control over what expressions you might want to support.
[G]iven s and d, how do you force a f'string' parse of s? Is there some function or method available?
This can be done... using eval. But beware eval!
>>> eval('f' + repr(s))
'the key is val'
The repr is there to escape any quotes and to wrap s itself with quotes.
If you are aware of which variables to format (d in this case), opt for Martijn's answer of doing str.format. The above solution should be your last resort due to the dangers of eval.

Python type-hint friendly type that constrains possible values

I want a python type-hint friendly way to create a Type that has constrained range of values.
For example, a URL Type based on type str that would only accept strings that look like an "http" URL.
# this code is made up and will not compile
class URL(typing.NewType('_URL', str)):
    def __init__(self, value: str, *args, **kwargs):
        if not (value.startswith('http://') or value.startswith('https://')):
            raise ValueError('string is not an acceptable URL')
overriding built-in immutable types works well
overriding str; http URL strings
Here is an example overriding str. This does not require the typing module but still works with type-hinting.
This str derived class asserts the initialized string looks like an http URL string.
class URL(str):
    def __new__(cls, *value):
        if value:
            v0 = value[0]
            if type(v0) is not str:
                raise TypeError('Unexpected type for URL: "%s"' % type(v0))
            if not (v0.startswith('http://') or v0.startswith('https://')):
                raise ValueError('Passed string value "%s" is not an'
                                 ' "http*://" URL' % (v0,))
        # else allow no argument to be passed. This allows an "empty" URL instance, e.g. `URL()`
        # `URL()` evaluates False
        return str.__new__(cls, *value)
This results in a class that will only allow some strings. Otherwise, it behaves like an immutable str instance.
# these are okay
URL()
URL('http://example.com')
URL('https://example.com')
URL('https://')
# these raise ValueError
URL('example') # ValueError: Passed string value "example" is not an "http*://" URL
URL('') # ValueError: Passed string value "" is not an "http*://" URL
# these evaluate as you would expect
for url in (URL(),  # 'False'
            URL('https://'),  # 'True'
            URL('https://example.com'),  # 'True'
            ):
    print('True') if url else print('False')
(update: later on I found the purl Python library)
Another example,
overriding int; constrained integer range Number
This int derived class only allows values 1 through 9 inclusive.
This has a special feature, too. In case an instance is initialized with nothing (Number()) then that value equates to 0 (this behavior is derived from the int class). In that case, the __str__ should be a '.' (program requirement).
class Number(int):
    """integer type with constraints; part of a Sudoku game"""
    MIN = 1  # minimum
    MAX = 9  # maximum
    def __new__(cls, *value):
        if value:
            v0 = int(value[0])
            if not (cls.MIN <= v0 <= cls.MAX):
                raise ValueError('Bad value "%s" is not acceptable in'
                                 ' Sudoku' % (v0,))
        # else:
        #   allow no argument to be passed. This allows an "empty" Number instance that
        #   evaluates False, e.g. `Number()`
        return int.__new__(cls, *value)
    def __str__(self):
        """print the Number accounting for an "empty" value"""
        if self == 0:
            return '.'
        return int.__str__(self)
This ensures errant inputs are handled sooner rather than later. Otherwise, it behaves just like an int.
# these are okay
Number(1)
Number(9)
Number('9')
# this will evaluate True, just like an int
Number(9) == int(9)
Number('9') == int(9)
Number('9') == float(9)
# this is okay, it will evaluate False
Number()
print('True') if Number() else print('False') # 'False'
# these raise ValueError
Number(0) # ValueError: Bad value "0" is not acceptable in Sudoku
Number(11) # ValueError: Bad value "11" is not acceptable in Sudoku
Number('11') # ValueError: Bad value "11" is not acceptable in Sudoku
And the special "feature"
print(Number(1)) # '1' (expected)
print(Number()) # '.' (special feature)
Technique for inheriting immutable types is derived from this SO answer.
Subclassing builtin types can lead to some odd cases (consider code which checks exactly type(...) is str)
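To make those odd cases concrete, here is a hypothetical, cut-down version of the URL str subclass from the previous answer (validation only). An exact type check rejects it, and str operators and methods return plain str objects, so results silently drop out of the URL type without being re-validated:

```python
class URL(str):
    """Hypothetical cut-down version of the str subclass above."""
    def __new__(cls, value):
        if not value.startswith(('http://', 'https://')):
            raise ValueError('not an http(s) URL: %r' % value)
        return super().__new__(cls, value)


u = URL('https://example.com')
print(type(u) is str)           # False: an exact type check rejects it
print(isinstance(u, str))       # True
joined = u + '/path'
print(type(joined) is URL)      # False: str operators return plain str
changed = u.replace('https://', 'ftp://')
print(type(changed) is URL, changed)  # False ftp://example.com (never validated)
```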
Here is a pure-typing approach which is typesafe and fully preserves the type of your strings:
from typing import NewType
_Url = NewType('_Url', str)
def URL(s: str) -> _Url:
    if not s.startswith('https://'):
        raise AssertionError(s)
    return _Url(s)
print(type(URL('https://example.com')) is str)  # prints `True`
The approach here "hides" the runtime checking behind a function that looks like a constructor from an API perspective, but in reality is just a tiny type. (I couldn't find a canonical reference for "tiny types"; this appears to be the best resource I could find.)

How to pass "random" amount of variables not all of them exist

I have a method to validate input:
def validate_user_input(*args):
    for item in args:
        if not re.match('^[a-zA-Z0-9_-]+$', item):
            ...
And I'm calling it like this:
validate_user_input(var1, var2, ..., var7)
But those are generated from user input, and some of those can be missing. What would be the proper way to do that, without creating tons of if statements?
Variables are assigned from a json input like so, and json input might not have some of the needed properties:
var1 = request.json.get('var1')
I assume they are <class 'NoneType'>
Here's the error: TypeError: expected string or buffer
If your request.json object is a dict or dict-like, you can just pass a default value as the second argument to get.
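A small sketch of that suggestion, assuming request.json behaves like a plain dict (a literal dict named `payload` stands in for it here). With `''` as the default, `re.match` no longer receives `None`, so the TypeError goes away; a missing value simply fails the pattern:

```python
import re

PATTERN = re.compile(r'^[a-zA-Z0-9_-]+$')

# Stand-in for request.json; assumed to be a plain dict here
payload = {'var1': 'abc_123'}

var1 = payload.get('var1', '')   # present: 'abc_123'
var2 = payload.get('var2', '')   # missing: '' instead of None

# re.match on None would raise TypeError; '' simply fails to match
results = [bool(PATTERN.match(v)) for v in (var1, var2)]
print(results)  # [True, False]
```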
If I understand correctly you are generating var_ variables by request.json.get('var_') which will either return a string which you want to validate or None if the field was missing.
If this is the case then you can just add a special case to validate_user_input for a None value:
def validate_user_input(*args):
    for item in args:
        if item is None:
            continue  # this is acceptable, don't do anything with it
        elif not re.match('^[a-zA-Z0-9_-]+$', item):
            ...
Or it may make more sense to store all of the values you are interested in in a dictionary:
wanted_keys = {'var1', 'var2', 'var3'}
## set intersection works in python3
present_keys = wanted_keys & request.json.keys()
## or for python 2 use a basic list comp
#present_keys = [key for key in request.json.keys() if key in wanted_keys]
actual_data = {key: request.json[key] for key in present_keys}
Then you would unpack actual_data.values() as the argument list to validate_user_input, i.e. validate_user_input(*actual_data.values()).
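Putting that together, a sketch with a plain dict standing in for request.json and a hypothetical `validate_user_input` body (the question elides what happens on a failed match, so this version just returns whether all values pass):

```python
import re


def validate_user_input(*args):
    """Hypothetical body: True only if every value matches the pattern."""
    return all(re.match(r'^[a-zA-Z0-9_-]+$', item) for item in args)


# Stand-in for request.json (assumption: a plain dict)
payload = {'var1': 'abc', 'var3': 'x_y', 'unrelated': '!!'}

wanted_keys = {'var1', 'var2', 'var3'}
present_keys = wanted_keys & payload.keys()   # set ops on dict key views (Python 3)
actual_data = {key: payload[key] for key in present_keys}

print(sorted(actual_data))                          # ['var1', 'var3']
print(validate_user_input(*actual_data.values()))   # True
```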
If it really is possible that some var-variables are undefined when you call validate_user_input, why not just initialize them all (e.g. to the empty string '' so that your regex fails) before actually defining them?
