Python string formatting when string contains "%s" without escaping - python

When formatting a string, my string may contain a modulo "%" that I do not wish to have converted. I can escape the string and change each "%" to "%%" as a workaround.
e.g.,
'Day old bread, 50%% sale %s' % 'today!'
output:
'Day old bread, 50% sale today!'
But are there any alternatives to escaping? I was hoping that using a dict would make it so Python would ignore any non-keyword conversions.
e.g.,
'Day old bread, 50% sale %(when)s' % {'when': 'today'}
but Python still sees the first modulo % and gives a:
TypeError: not enough arguments for format string

You could (and should) use the new string .format() method (if you have Python 2.6 or higher) instead:
"Day old bread, 50% sale {0}".format("today")
The manual describes this under "Format String Syntax".
The docs at the time also said that the old % formatting would eventually be removed from the language, although that will surely take some time. The new formatting methods are way more powerful, so that's a Good Thing.
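As a quick check of the point above, here is a minimal sketch (Python 3 syntax; the variable names are only illustrative) showing that a literal % needs doubling only with the % operator, while str.format and f-strings leave it alone:

```python
when = "today"

# %-formatting: a literal percent sign must be written as %%.
old_style = "Day old bread, 50%% sale %s" % when

# str.format: % has no special meaning, only braces do.
new_style = "Day old bread, 50% sale {0}".format(when)

# f-string (Python 3.6+): same rule as str.format.
f_style = f"Day old bread, 50% sale {when}"

print(old_style)
print(new_style == old_style and f_style == old_style)
```

All three spellings should produce the same text, so the choice mostly comes down to which escaping rule you would rather live with.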

Not really - escaping your % signs is the price you pay for using string formatting. You could use string concatenation instead: 'Day old bread, 50% sale ' + whichday if that helps...

Escaping a '%' as '%%' is not a workaround. If you use string formatting, that is the way to represent a '%' sign. If you don't want that, you can always do something like:
print "Day old bread, 50% sale " + "today"
i.e., not using formatting.
Please note that when using string concatenation, be sure that the variable is a string (and not e.g. None) or use str(varName). Otherwise you get an error like "TypeError: cannot concatenate 'str' and 'NoneType' objects".
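To illustrate the warning about non-string values, here is a small sketch in Python 3 syntax. The variable names are made up for the example, and the exact error message differs between Python versions, so it is not shown:

```python
whichday = None

try:
    message = "Day old bread, 50% sale " + whichday
except TypeError as exc:
    # Concatenating str and None is rejected rather than converted.
    message = None
    error = exc

# Converting explicitly avoids the error (the value is rendered as 'None').
safe_message = "Day old bread, 50% sale " + str(whichday)
print(safe_message)
```

Whether printing the word None is what you want is a separate question, but at least str() makes the conversion explicit.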

You can use regular expressions to replace % with %% wherever the % is not followed by (
import re
def format_with_dict(template, dictionary):
    # Double every % that does not open a %(name) placeholder,
    # including a % at the very end of the string.
    escaped = re.sub(r"%(?!\()", "%%", template)
    return escaped % dictionary
This way:
print format_with_dict('Day old bread, 50% sale %(when)s', {'when': 'today'})
Will output:
Day old bread, 50% sale today
This method is useful to avoid "not enough arguments for format string" errors.

Related

Can modulo (%s) be placed in string when first creating a variable and then used later on?

I'm quite new to python, so forgive me if this is a silly question. I know how to use the modulo in strings in this fashion
Me = "I'm %s and I like to %s" % ('Mike', 'code')
However, through my searching I haven't found an answer to whether or not it's possible to hardcode modulos into a string, then take advantage of it later.
Example:
REPO_MENU = {'Issues': 'Api.github/repo/%s/branch/%s/issues',
             'Pull Requests': 'Api.github/repo/%s/branch/%s/pull_requests',
             'Commits': 'Api.github/repo/%s/branch/%s/commits',
             '<FILTER>: Branch': 'Api.github/repo/%s/branch/%s'
            }
for key, value in REPO_MENU.items():
    print value % ('Beta', 'master')
Will that format work? Is it good practice to use this method? I feel it could be beneficial in a lot of situations.
This does work. You can also use the format function, which works well. For example:
menu1 = {'start': 'hello_{0}_{1}',
         'end': 'goodbye_{0}_{1}'}
menu2 = {'start': 'hello_%s_%s',
         'end': 'goodbye_%s_%s'}
for key, value in menu1.items():
    print value.format('john', 'smith')
for key, value in menu2.items():
    print value % ('john', 'smith')
% is an operator like any other; when its left-hand operand is a string, it attempts to replace various placeholders with values from its right-hand operand. It doesn't matter if the left-hand operand is a string literal or a more complex expression, as long as it evaluates to a string.
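Since the left-hand operand can be any expression that evaluates to a string, the templates can also use named placeholders and be filled in from a dict later. A minimal sketch in Python 3 syntax; the build_urls helper and the shortened menu are hypothetical, not part of the question's code:

```python
# Hypothetical URL templates, stored once and filled in later.
REPO_MENU = {
    "Issues": "Api.github/repo/%(repo)s/branch/%(branch)s/issues",
    "Commits": "Api.github/repo/%(repo)s/branch/%(branch)s/commits",
}


def build_urls(repo, branch):
    """Apply the % operator to each stored template."""
    values = {"repo": repo, "branch": branch}
    return {name: template % values for name, template in REPO_MENU.items()}


urls = build_urls("Beta", "master")
print(urls["Issues"])
```

Named placeholders cost a little more typing, but each template then documents which value goes where.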
As the other answers have noted, you can definitely perform the string-modulo operation multiple times on the same string. However, if you are using Python 3.6 or later (and if you can, you definitely SHOULD!), I suggest that you use f-strings rather than the string-modulo or .format. They are faster, easier to read, and very convenient:
A formatted string literal or f-string is a string literal that is prefixed with 'f' or 'F'. These strings may contain replacement fields, which are expressions delimited by curly braces {}. While other string literals always have a constant value, formatted strings are really expressions evaluated at run time.
Because an f-string is evaluated immediately, at the point where it appears, it cannot be stored as a template and filled in later the way a %-string or a .format string can. To reuse one, wrap it in a function so that it is re-evaluated on each call.
E.g.:
>>> value = lambda: f'A {flower.lower()} by any name would smell as sweet.'
>>> flower = 'ROSE'
>>> print(value())
A rose by any name would smell as sweet.
>>> flower = 'Petunia'
>>> print(value())
A petunia by any name would smell as sweet.
>>> flower = 'Ferrari'
>>> print(value())
A ferrari by any name would smell as sweet.
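A module that contains an f-string fails to compile on older interpreters, so a version check only helps if it runs before that module is imported. You can put the following in a module that contains no f-strings itself (for example the entry-point script) as a helpful alert for other users (or future-you):
import sys
if sys.version_info < (3, 6):
    raise SystemExit('Python 3.6+ required.')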

Are there other ways to format strings other than comma, percent, plus sign?

I've been looking around and I've been unable to find a definitive answer to this question: what's the recommended way to print variables in Python?
So far, I've seen three ways: using commas, using percent signs, or using plus signs:
>>> a = "hello"
>>> b = "world"
>>> print a, "to the", b
hello to the world
>>> print "%s to the %s" % (a, b)
hello to the world
>>> print a + " to the " + b
hello to the world
Each method seems to have its pros and cons.
Commas let you write the variable directly and add spaces, and they automatically perform a string conversion if needed. But I seem to remember that good coding practice says it's best to separate your variables from your text.
Percent signs allow that, though they require a tuple when there's more than one variable, and you have to write the type of the variable (though it seems able to convert even if the variable type isn't the same, like printing a number with %s).
Plus signs seem to be the "worst", as they mix variables and text and don't convert on the fly; though maybe it is necessary to have more control over your variables from time to time.
I've looked around and it seems some of those methods may be obsolete nowadays. Since they all seem to work and each have their pros and cons, I'm wondering: is there a recommended method, or do they all depend on the context?
Including the values from identifiers inside a string is called string formatting. You can handle string formatting in different ways with various pros and cons.
Using string concatenation (+)
Con: You must manually convert objects to strings
Pro: The objects appear where you want to place them in the string
Con: The final layout may not be clear due to breaking the string literal
Using template strings (i.e. $bash-style substitution):
Pro: You may be familiar with shell variable expansion
Pro: Conversion to string is done automatically
Pro: Final layout is clear.
Con: You cannot specify how to perform the conversion
Using %-style formatting:
Pro: similar to formatting with C's printf.
Pro: conversions are done for you
Pro: you can specify different type of conversions, with some options (e.g. precision for floats)
Pro: The final layout is clear
Pro: You can also specify the name of the elements to substitute as in: %(name)s.
Con: You cannot customize handling of format specifiers.
Con: There are some corner cases that can puzzle you. To avoid them you should always use either tuple or dict as argument.
Using str.format:
All the pros of %-style formatting (except that it is not similar to printf)
Similar to .NET String.Format
Pro: You can manually specify numbered fields which allows you to use a positional argument multiple times
Pro: More options in the format specifiers
Pro: You can customize the formatting specifiers in custom types
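The template-string option in the list above is probably the least familiar one, so here is a minimal sketch of it using string.Template from the standard library (Python 3 syntax; the example strings are just for illustration). Note that a literal % needs no escaping there:

```python
from string import Template

# $name marks a placeholder; a literal percent sign needs no escaping.
offer = Template("Day old bread, 50% sale $when")
print(offer.substitute(when="today"))

# safe_substitute leaves unknown placeholders in place instead of raising KeyError.
partial = Template("$greeting, $name").safe_substitute(greeting="Hello")
print(partial)
```

The trade-off is the one listed above: the values are simply converted with str(), with no control over width or precision.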
The commas do not do string formatting. They are part of the print statement syntax.
They have a softspace "feature" which is gone in python3 since print is a function now:
>>> print 'something\t', 'other'
something       other
>>> print 'something\tother'
something       other
Note how the above outputs are exactly equivalent even though the first one used comma.
This is because the comma doesn't introduce whitespace in certain situations (e.g. right after a tab or a newline).
In python3 this doesn't happen:
>>> print('something\t', 'other')
something        other
>>> print('something\tother') # note the difference in spacing.
something       other
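If you want the old behaviour back in Python 3, the sep argument of print controls what goes between the items. A small sketch that captures the output in an in-memory buffer so the difference is visible (the buffer is only there for the demonstration):

```python
import io

buffer = io.StringIO()

# Default separator: a space is always inserted, even after a tab.
print("something\t", "other", file=buffer)

# sep="" suppresses the separator, giving the old Python 2 result.
print("something\t", "other", sep="", file=buffer)

lines = buffer.getvalue().splitlines()
print(lines)
```

The first captured line contains a tab followed by a space, the second only the tab.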
Since Python 2.6 the preferred way of doing string formatting is using the str.format method. It was meant to replace the %-style formatting, even though currently there are no plans (and I don't think there ever will be) to remove %-style formatting.
string.format() basics
Here are a couple of examples of basic string substitution. The {} is the placeholder for the substituted value. If no format is specified, the value is inserted and formatted as a string.
s1 = "so much depends upon {}".format("a red wheel barrow")
s2 = "glazed with {} water beside the {} chickens".format("rain", "white")
You can also refer to the arguments by numeric position and reorder them in the string. This gives some flexibility when formatting: if you made a mistake in the order, you can correct it without shuffling all the variables around.
s1 = " {0} is better than {1} ".format("emacs", "vim")
s2 = " {1} is better than {0} ".format("emacs", "vim")
The format() method offers a fair number of additional features and capabilities. Here are a few useful tips and tricks using .format().
Named Arguments
You can use the new string format as a templating engine and use named arguments, instead of requiring a strict order.
madlib = " I {verb} the {object} off the {place} ".format(verb="took", object="cheese", place="table")
>>> I took the cheese off the table
Reuse Same Variable Multiple Times
The % formatter requires a strict ordering of variables. The .format() method allows you to put them in any order, as we saw above in the basics, and also allows for reuse.
line = "Oh {0}, {0}! wherefore art thou {0}?".format("Romeo")
>>> Oh Romeo, Romeo! wherefore art thou Romeo?
Use Format as a Function
You can use .format as a function which allows for some separation of text and formatting from code. For example at the beginning of your program you could include all your formats and then use later. This also could be a nice way to handle internationalization which not only requires different text but often requires different formats for numbers.
email_f = "Your email address was {email}".format
print(email_f(email="bob@example.com"))
Escaping Braces
If you need to use braces when using str.format(), just double up
print(" The {} set is often represented as {{0}} ".format("empty"))
>>> The empty set is often represented as {0}
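The tips above mention that numbers often need their own formats. As a rough sketch of what the format specifier after the colon can do (Python 3 syntax; the values and variable names are made up for the example):

```python
price = 1234.5
ratio = 0.256

# Thousands separator and two decimal places.
price_text = "{:,.2f}".format(price)
# Percentage with one decimal place.
ratio_text = "{:.1%}".format(ratio)
# Right-align in a field of width 8, padding with dots.
padded_text = "{:.>8}".format("abc")

print(price_text)   # 1,234.50
print(ratio_text)   # 25.6%
print(padded_text)  # .....abc
```

The same specifiers work inside f-strings, so nothing here is tied to the .format method itself.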
The question is whether you want to print variables (case 1) or output formatted text (case 2). Case 1 is good and easy to use, mostly for debug output.
If you want to say something in a defined way, formatting is the better choice. '+' is not the Pythonic way of string manipulation.
An alternative to % is "{0} to the {1}".format(a,b) and is the preferred way of formatting since Python 3.
Depends a bit on which version.
Python 2 will be simply:
print 'string'
print 345
print 'string'+(str(345))
print ''
Python 3 requires parentheses (wish it didn't personally)
print ('string')
print (345)
print ('string'+(str(345)))
Also the most foolproof method to do it is to convert everything into a variable:
a = 'string'
b = 345
c = str(345)
d = a + c

Inserting string variables into print in python

I know of two ways to format a string:
print 'Hi {}'.format(name)
print 'Hi %s' % name
What are the relative dis/advantages of using either?
I also know both can efficiently handle multiple parameters like
print 'Hi %s you have %d cars' % (name, num_cars)
and
print 'Hi {0} and {1}'.format('Nick', 'Joe')
There is not really any difference between the two string formatting solutions.
{} is usually referred to as "new-style" and %s is "old string formatting", but old style formatting isn't going away any time soon.
The new style formatting isn't supported everywhere yet though:
logger.debug("Message %s", 123) # Works
logger.debug("Message {}", 123) # Does not work.
Nevertheless, I'd recommend using .format. It's more feature-complete, but there is not a huge difference anyway.
It's mostly a question of personal taste.
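To make the logging remark concrete, here is a minimal sketch in Python 3 syntax. The logger name and the in-memory stream are assumptions made for the demonstration, not anything the logging module requires:

```python
import io
import logging

stream = io.StringIO()
logger = logging.getLogger("format_demo")  # hypothetical logger name
logger.setLevel(logging.DEBUG)
logger.propagate = False
logger.addHandler(logging.StreamHandler(stream))

# logging applies % to the message and its arguments itself.
logger.debug("Message %s", 123)

# A brace template has to be formatted before it is passed in.
logger.debug("Message {}".format(456))

print(stream.getvalue())
```

Passing the arguments separately, as in the first call, also means the string is only built if the message is actually going to be emitted.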
I use the "old-style" so I can recursively build strings with strings. Consider...
'%s%s%s'
...this represents any possible string combination you can have. When I'm building an output string of N size inputs, the above lets me recursively go down each root and return up.
An example usage is my Search Query testing (Quality Assurance). Starting with %s I can make any possible query.

Python string interpolation and tuples

Which of these Python string interpolations is proper (not a trick question)?
'%s' % my_string
'%s' % (my_string)
'%s' % (my_string, )
If it varies by version, please summarize.
Old format
The first one is the most common one. The third has unnecessary parenthesis and doesn't help the legibility if you've only one object you want to use in your format-string. The second is just plain silly, because that's not even a tuple.
New format
Starting with Python 2.6, there's a new and recommended way of formatting strings using the .format method.
In your case you would use:
'{}'.format(my_string)
The advantage of the new format syntax is that you can specify more advanced formatting in the format string.
All three of these are equivalent, as long as my_string is not itself a tuple.
The first two are exactly equivalent, in fact. Putting brackets around something does not make it a tuple: putting a comma after it does that. So the second one evaluates to the first.
The third is also valid: using a tuple is the normal way of doing string substitution, but as a special case Python allows a single value if there is only one element to be substituted.
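One case where the three forms stop being interchangeable is when the value happens to be a tuple. A short sketch in Python 3 syntax (my_tuple is a made-up name for the example):

```python
my_string = "plain text"
my_tuple = ("a", "b")

# All three spellings agree for an ordinary string.
results = ["%s" % my_string, "%s" % (my_string), "%s" % (my_string,)]

# A tuple on the right-hand side is unpacked as the argument list,
# so the bare form fails when the value itself is a tuple.
try:
    bare = "%s" % my_tuple
except TypeError:
    bare = None

# Wrapping the value in a one-element tuple always treats it as one argument.
wrapped = "%s" % (my_tuple,)
print(results, bare, wrapped)
```

So if you do not control the type of the value, the one-element tuple form is the defensive choice.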

better way to pass data to print in python

I was going through http://web2py.com/book/default/chapter/02 and found this:
>>> print 'number is ' + str(3)
number is 3
>>> print 'number is %s' % (3)
number is 3
>>> print 'number is %(number)s' % dict(number=3)
number is 3
The book says that the last notation is more explicit and less error prone, and is to be preferred.
I am wondering what is the advantage of using the last notation.. will it not have a performance overhead?
>>> print 'number is ' + str(3)
number is 3
This is definitely the worst solution and might cause you problems if you make the beginner mistake "Value of obj: " + obj where obj is not a string or unicode object. With many concatenations it's not readable at all; it's similar to something like echo "<p>Hello ".$username."!</p>"; in PHP (this can get arbitrarily ugly).
print 'number is %s' % (3)
number is 3
Now that is much better. Instead of a hard-to-read concatenation, you see the output format immediately. Coming back to the beginner mistake of outputting values, you can do print "Value of obj: %r" % obj, for example. I personally prefer this in most cases. But note that you cannot use it in gettext-translated strings if you have multiple format specifiers because the order might change in other languages.
As you forgot to mention it here, you can also use the new string formatting method which is similar:
>>> "number is {0}".format(3)
'number is 3'
Next, dict lookup:
>>> print 'number is %(number)s' % dict(number=3)
number is 3
As said before, gettext-translated strings might change the order of positional format specifiers, so this option is the best when working with translations. The performance drop should be negligible - if your program is not all about formatting strings.
As with the positional formatting, you can also do it in the new style:
>>> "number is {number}".format(number=3)
'number is 3'
It's hard to tell which one to take. I recommend you to use positional arguments with the % notation for simple strings and dict lookup formatting for translated strings.
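To show why named placeholders matter for translations, here is a sketch in Python 3 syntax. The TRANSLATIONS dict is only a stand-in for a real gettext catalogue, and the language codes and strings are hypothetical:

```python
# Stand-in for a gettext catalogue; the keys and strings are hypothetical.
TRANSLATIONS = {
    "en": "%(user)s sent %(count)s files",
    "xx": "%(count)s files were sent by %(user)s",  # different word order
}


def render(language, user, count):
    """Fill the named placeholders, whatever order the translation uses."""
    return TRANSLATIONS[language] % {"user": user, "count": count}


print(render("en", "Ann", 3))
print(render("xx", "Ann", 3))
```

With positional %s placeholders, the second template would silently swap the two values.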
I can think of a few differences.
The first example is cumbersome, to me, if more than one variable is involved. I cannot speak to any performance penalty there. See the additional arguments below.
The second example is position-dependent, and it is easy to swap positions and cause errors. It also does not tell you anything about the variables.
In the third example, the position of the variables is not important because you use a dictionary. This makes it elegant, as it does not rely on positional ordering of the variables.
See the example below:
>>> print 'number is %s %s' % (3,4)
number is 3 4
>>> print 'number is %s %s' % (4,3)
number is 4 3
>>> print 'number is %(number)s %(two)s' % dict(number=3, two=4)
number is 3 4
>>> print 'number is %(number)s %(two)s' % dict(two=4, number=3)
number is 3 4
>>>
Also another part of discussion on this
"+" is the string concatenation operator.
"%" is string formatting.
In this trivial case, string formatting accomplishes the same result as concatenation. Unlike string formatting, string concatenation only works when everything is already a string. So if you forget to convert your variables to strings, concatenation will raise an error.
[Edit: My answer was biased towards templating since the question came from web2py where templates are so commonly involved]
As Ryan says below, the concatenation is faster than formatting.
Suggestion is
Use the first form - concatenation, if you are concatenating just two strings
Use the second form if there are only a few variables. You can readily see the positions and deal with them
Use the third form when you are doing templating i.e. formatting a large piece of string with variable data. The dictionary form helps in providing meaning to variables inside the large piece of text.
I am wondering what is the advantage of using the last notation..
Hm, as you said, the last notation is really more explicit and actually is less error prone.
will it not have a performance overhead?
It will have little performance overhead, but it's minor if compared with data fetching from DB or network connections.
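If you want to see the overhead for yourself, a rough timing sketch with timeit could look like this (Python 3 syntax; the labels and the repeat count are arbitrary choices, and the numbers depend on your interpreter and machine, so none are shown):

```python
import timeit

# The three notations from the question, using a variable n so that
# the interpreter cannot precompute the result at compile time.
candidates = {
    "concatenation": "'number is ' + str(n)",
    "positional": "'number is %s' % (n)",
    "dict lookup": "'number is %(number)s' % dict(number=n)",
}

timings = {
    label: timeit.timeit(stmt, setup="n = 3", number=100_000)
    for label, stmt in candidates.items()
}

for label, seconds in timings.items():
    print(f"{label:>13}: {seconds:.4f} s per 100,000 runs")
```

Treat the output as a rough comparison only; as said above, any difference is small next to database or network access.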
It's a bad, unjustified piece of advice.
The third method is cumbersome, violates DRY, and is error prone, except if:
You are writing a framework that doesn't have control over the format string. For example, the logging module, web2py, or gettext.
The format string is extremely long.
The format string is read from a config file.
The problem with the third method should be obvious when you consider that foo appears three times in this code: "%(foo)s" % dict(foo=foo). This is error prone. Most programs should not use the third method, unless they know they need to.
The second method is the simplest method, and is what you generally use in most programs. It is best used when the format string is immediate, e.g. 'values: %s %s %s' % (a, b, c) instead of taken from a variable, e.g. fmt % (a, b, c).
The first method, concatenation, is almost never useful, except perhaps if you're building a string in a loop:
s = ''
for x in l:
    s += str(x)
however, in that case, it's generally better and faster to use str.join():
s = ''.join(str(x) for x in l)
