Variants of string concatenation? - python

Out of the following two variants (with or without plus-sign between) of string literal concatenation:
What's the preferred way?
What's the difference?
When should one or the other be used?
Should none of them ever be used, and if so, why?
Is join preferred?
Code:
>>> # variant 1. Plus
>>> 'A'+'B'
'AB'
>>> # variant 2. Just a blank space
>>> 'A' 'B'
'AB'
>>> # They seem to be equal
>>> 'A'+'B' == 'A' 'B'
True

Juxtaposing works only for string literals:
>>> 'A' 'B'
'AB'
If you work with string objects:
>>> a = 'A'
>>> b = 'B'
you need to use a different method:
>>> a b
a b
^
SyntaxError: invalid syntax
>>> a + b
'AB'
The + is a bit more obvious than just putting literals next to each other.
One use of the first method is to split long texts over several lines, keeping
indentation in the source code:
>>> a = 5
>>> if a == 5:
...     text = ('This is a long string'
...             ' that I can continue on the next line.')
...
>>> text
'This is a long string that I can continue on the next line.'
''.join() is the preferred way to concatenate many strings, for example the items of a list:
>>> ''.join(['A', 'B', 'C', 'D'])
'ABCD'

The variant without + is resolved while the code is being parsed. I guess it exists to let you write long strings across multiple lines more nicely in your code, so you can do:
test = "This is a line that is " \
"too long to fit nicely on the screen."
When it's possible, I would use the non-+ version, because the byte code then contains only the resulting string, with no sign of concatenation left.
When you use +, you have two strings in your code and the concatenation is executed at runtime, unless the interpreter optimizes it (CPython, for instance, folds constant expressions such as 'A' + 'B' at compile time).
Obviously, you cannot do:
a = 'A'
ba = 'B' a
Which one is faster? The no-+ version, because it is done before even executing the script.
+ vs join: If you have a lot of elements, join is preferred because it is optimized to handle many elements. Using + to concatenate multiple strings creates a lot of intermediate results in memory, while join doesn't.
If you're going to concat just a couple of elements I guess + is better as it's more readable.
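The compile-time claim can be checked by looking at the constants stored in a compiled snippet. This is only a sketch, assuming CPython; the helper name `constants_of` is made up for the illustration, and whether `'A' + 'B'` is folded as well is an implementation detail that may differ between versions.

```python
def constants_of(source):
    """Return the constants stored in the code object compiled from `source`."""
    return compile(source, "<example>", "exec").co_consts

# Adjacent literals are merged before any bytecode runs,
# so the joined string shows up as a single constant.
juxtaposed = constants_of("s = 'A' 'B'")
print(juxtaposed)

# With +, the result depends on the interpreter's optimizer
# (CPython usually folds constant expressions like this one too).
added = constants_of("s = 'A' + 'B'")
print(added)
```

If the second tuple also contains 'AB', the interpreter folded the addition and there is no runtime difference between the two spellings for literals.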


How to print multiple letters in a string when combining two prompts [duplicate]

I want to get a new string from the third character to the end of the string, e.g. myString[2:end]. If omitting the second part means 'to the end', and if you omit the first part, does it start from the start?
>>> x = "Hello World!"
>>> x[2:]
'llo World!'
>>> x[:2]
'He'
>>> x[:-2]
'Hello Worl'
>>> x[-2:]
'd!'
>>> x[2:-2]
'llo Worl'
Python calls this concept "slicing" and it works on more than just strings. Take a look here for a comprehensive introduction.
Just for completeness as nobody else has mentioned it. The third parameter to an array slice is a step. So reversing a string is as simple as:
some_string[::-1]
Or selecting alternate characters would be:
"H-e-l-l-o- -W-o-r-l-d"[::2] # outputs "Hello World"
The ability to step forwards and backwards through the string maintains consistency with being able to array slice from the start or end.
Substr() normally (i.e. PHP and Perl) works this way:
s = Substr(s, beginning, LENGTH)
So the parameters are beginning and LENGTH.
But Python's behaviour is different; it expects beginning and one after END (!). This is difficult to spot by beginners. So the correct replacement for Substr(s, beginning, LENGTH) is
s = s[ beginning : beginning + LENGTH]
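The replacement above can be wrapped in a small helper. A minimal sketch; the name `substr` is hypothetical, and it does not try to reproduce how PHP or Perl treat negative arguments.

```python
def substr(s, beginning, length):
    """Return `length` characters of `s`, starting at index `beginning`."""
    return s[beginning:beginning + length]

print(substr("Hello World!", 6, 5))  # World
```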
A common way to achieve this is by string slicing.
MyString[a:b] gives you a substring from index a to (b - 1).
One example seems to be missing here: full (shallow) copy.
>>> x = "Hello World!"
>>> x
'Hello World!'
>>> x[:]
'Hello World!'
>>> x==x[:]
True
>>>
The slice [:] is a common idiom for creating a shallow copy of a sequence type such as a list (for an immutable string it does not need to create a new object). See "Python list slice syntax used for no obvious reason".
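What "shallow" means is easier to see with a nested list than with a string. A short sketch, with variable names chosen only for this example:

```python
original = [1, [2, 3]]
copy = original[:]

print(copy == original)        # True: same contents
print(copy is original)        # False: the outer list is a new object
print(copy[1] is original[1])  # True: the inner list is shared
```

Because the inner list is shared, appending to `copy[1]` would also change `original[1]`.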
Is there a way to substring a string in Python, to get a new string from the 3rd character to the end of the string?
Maybe like myString[2:end]?
Yes, this actually works if you assign, or bind, the name end to the constant singleton None:
>>> end = None
>>> myString = '1234567890'
>>> myString[2:end]
'34567890'
Slice notation has 3 important arguments:
start
stop
step
Their defaults when not given are None - but we can pass them explicitly:
>>> stop = step = None
>>> start = 2
>>> myString[start:stop:step]
'34567890'
If leaving the second part means 'till the end', if you leave the first part, does it start from the start?
Yes, for example:
>>> start = None
>>> stop = 2
>>> myString[start:stop:step]
'12'
Note that we include start in the slice, but we only go up to, and not including, stop.
When step is None, by default the slice uses 1 for the step. If you step with a negative integer, Python is smart enough to go from the end to the beginning.
>>> myString[::-1]
'0987654321'
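The defaults that Python fills in for None can be inspected with the `indices` method of a slice object, which returns the concrete (start, stop, step) for a sequence of a given length. A small sketch, assuming the ten-character string used above:

```python
length = 10  # len('1234567890')

print(slice(2, None).indices(length))         # (2, 10, 1)
print(slice(None, 2).indices(length))         # (0, 2, 1)
print(slice(None, None, -1).indices(length))  # (9, -1, -1)
```

The last line shows how a negative step flips the defaults: the slice starts at the last index and runs down past index 0.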
I explain slice notation in great detail in my answer to Explain slice notation Question.
I would like to add two points to the discussion:
You can use None instead of an empty space to specify "from the start" or "to the end":
'abcde'[2:None] == 'abcde'[2:] == 'cde'
This is particularly helpful in functions, where you can't provide an empty space as an argument:
def substring(s, start, end):
    """Return the part of string `s` from index `start` up to,
    but not including, index `end`.
    Examples
    --------
    >>> substring('abcde', 0, 3)
    'abc'
    >>> substring('abcde', 1, None)
    'bcde'
    """
    return s[start:end]
Python has slice objects:
idx = slice(2, None)
'abcde'[idx] == 'abcde'[2:] == 'cde'
You've got it right there except for "end". It's called slice notation. Your example should read:
new_sub_string = myString[2:]
If you leave out the second parameter it is implicitly the end of the string.
text = "StackOverflow"
# Using Python slicing, you can get different subsets of the above string
# Reverse of the string
text[::-1] # 'wolfrevOkcatS'
# First five characters
text[:5] # 'Stack'
# Last five characters
text[-5:] # 'rflow'
# 3rd character to the fifth character
text[2:5] # 'ack'
# Characters at even positions (counting from 1)
text[1::2] # 'tcOefo'
If myString contains an account number that begins at offset 6 and has length 9, then you can extract the account number this way: acct = myString[6:][:9].
If the OP accepts that, they might want to try, in an experimental fashion,
myString[2:][:999999]
It works - no error is raised, and no default 'string padding' occurs.
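The difference between slicing and indexing past the end can be shown in a few lines. A sketch with a made-up variable name `my_string`:

```python
my_string = "1234567890"

# A slice that runs past the end is clamped to the string's length.
print(my_string[2:][:999999])  # 34567890
print(repr(my_string[50:60]))  # '' (empty string, no error)

# Indexing a single position out of range does raise.
try:
    my_string[50]
except IndexError as exc:
    print("IndexError:", exc)
```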
Well, I got a situation where I needed to translate a PHP script to Python, and it had many usages of substr(string, beginning, LENGTH).
If I chose Python's string[beginning:end] I'd have to calculate a lot of end indexes, so the easier way was to use string[beginning:][:length], it saved me a lot of trouble.
>>> str1 = 'There you are'
>>> str1[:]
'There you are'
>>> str1[1:]
'here you are'
>>> # Every second character, skipping one in between
>>> str1[::2]
'Teeyuae'
>>> # Stepping backwards from the end and stopping before index -2 gives only the last character
>>> str1[:-2:-1]
'e'
>>> # Using a slice object
>>> str1='There you are'
>>> s1=slice(2,6)
>>> str1[s1]
'ere '
Maybe I missed it, but I couldn't find a complete answer on this page to the original question(s) because variables are not further discussed here. So I had to go on searching.
Since I'm not yet allowed to comment, let me add my conclusion here. I'm sure I was not the only one interested in it when accessing this page:
>>> myString = 'Hello World'
>>> end = 5
>>> myString[2:end]
'llo'
If you leave out the first part, you get
>>> myString[:end]
'Hello'
And if you leave out the : in the middle as well, you get the simplest substring, a single character: the one at index 5 (counting starts at 0, so it is the blank in this case):
>>> myString[end]
' '
Hardcoded indexes can quickly become a mess.
To avoid that, Python offers the built-in slice() object.
string = "my company has 1000$ on profit, but I lost 500$ gambling."
Suppose we want to know how much money I have left.
Normal solution:
final = int(string[15:19]) - int(string[43:46])
print(final)  # 500
Using slices:
EARNINGS = slice(15, 19)
LOSSES = slice(43, 46)
final = int(string[EARNINGS]) - int(string[LOSSES])
print(final)  # 500
With named slices you gain readability.
a = "Helloo"
print(a[:-1])
In the above code, [:-1] takes everything from the start up to, but not including, the last character.
Output:
Hello
Note: a[:-1] is the same as a[0:-1] and a[0:len(a)-1].
a = "I Am Siva"
print(a[2:])
Output:
Am Siva
In the above code, a[2:] takes everything from index 2 to the last element.
Remember that a slice with upper limit x stops at index x-1, and that the indexes of a list or string always start at 0.
I have a simple solution that uses a for loop to find a given substring in a string.
Let's say we have two string variables,
main_string = "lullaby"
match_string = "ll"
If you want to check whether the given match string exists in the main string, you can do this,
match_string_len = len(match_string)
for index, value in enumerate(main_string):
    sub_string = main_string[index:match_string_len+index]
    if sub_string == match_string:
        print("match string found in main string")
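For comparison, the same check can be written with the built-in `in` operator, and `str.find` gives the position of the first match. A brief sketch reusing the two variables from above:

```python
main_string = "lullaby"
match_string = "ll"

print(match_string in main_string)     # True
print(main_string.find(match_string))  # 2
print(main_string.find("zz"))          # -1 when there is no match
```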

Python set constructor syntax

Does anyone know the justification for this confusing set construction syntax? I spent a day unable to find this bug because I missed a comma in constructing a set.
> {1 2}
SyntaxError: invalid syntax # This makes sense.
> {'a' 'b'} = set(['ab']) # This does not.
That's not a set construction syntax thing. You're running into implicit string literal concatenation, a confusing and surprising corner of the language:
>>> 'a' 'b'
'ab'
If you write two string literals next to each other, they're implicitly combined into one string. (This only works with literals; str(3) str([]) is a syntax error, not '3[]'.)
This has nothing to do with sets.
Two string literals separated by whitespace are considered one string literal.
rationale = ('This is quite useful when you need to construct '
'a long literal without useless "+" and without '
'the indentation and newlines which triple-quotes bring.')
Do you mean
>>> {'a' 'b'} == set(['ab'])
True
?
That's just because 2 strings are concatenated to 1 string:
>>> type('a' 'b')
<class 'str'>
>>> len('a' 'b')
2
>>> print('a' 'b')
ab
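The effect of the missing comma on the set itself can be checked directly. A small sketch with illustrative names:

```python
with_comma = {'a', 'b'}
without_comma = {'a' 'b'}  # the two literals merge into 'ab'

print(len(with_comma))          # 2
print(len(without_comma))       # 1
print(without_comma == {'ab'})  # True
```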

How do I strip a string given a list of unwanted characters? Python

Is there a way to pass in a list instead of a char to str.strip() in python? I have been doing it this way:
unwanted = [c for c in '!##$%^&*(FGHJKmn']
s = 'FFFFoFob*&%ar**^'
for u in unwanted:
    s = s.strip(u)
print(s)
Desired output (this output is correct, but there should be a more elegant way than how I'm coding it above):
oFob*&%ar
Strip and friends take a string representing a set of characters, so you can skip the loop:
>>> s = 'FFFFoFob*&%ar**^'
>>> s.strip('!##$%^&*(FGHJKmn')
'oFob*&%ar'
(The downside is that a call like fn.rstrip(".png") seems to work for many filenames but is not really correct: it strips any trailing run of the characters ".", "p", "n" and "g", not the suffix ".png".)
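The rstrip pitfall is easy to reproduce. In the sketch below, `strip_suffix` is a hypothetical helper written for this example, not part of the standard library:

```python
def strip_suffix(name, suffix):
    """Remove `suffix` from the end of `name` only if it is really there."""
    if suffix and name.endswith(suffix):
        return name[:-len(suffix)]
    return name

print("photo.png".rstrip(".png"))        # photo (looks right by accident)
print("pong.png".rstrip(".png"))         # po (too much removed)
print(strip_suffix("pong.png", ".png"))  # pong
```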
Since you do not want to delete characters from the middle, you can just use strip():
>>> 'FFFFoFob*&%ar**^'.strip('!##$%^&*(FGHJKmn')
'oFob*&%ar'
Otherwise, use str.translate() (this two-argument form is Python 2 only):
>>> 'FFFFoFob*&%ar**^'.translate(None, '!##$%^&*(FGHJKmn')
'oobar'
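In Python 3, `str.translate` takes a single table argument, so the deletion set has to be built with `str.maketrans` first. A sketch of the equivalent call; the variable names are only for illustration:

```python
s = 'FFFFoFob*&%ar**^'
unwanted = '!##$%^&*(FGHJKmn'

# The third argument of str.maketrans lists the characters to delete.
table = str.maketrans('', '', unwanted)
print(s.translate(table))  # oobar
```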

Python String Concatenation - concatenating '\n'

I am new to Python and need help understanding two problems I am running into with string concatenation. I am aware that strings can be concatenated using the + symbol like so.
>>> 'a' + 'b'
'ab'
However, I just recently found out (by accident, while fiddling around) that you do not even need the + symbol to concatenate strings, which leads to my first problem: how and why is this possible?
>>> print 'a' 'b'
ab
Furthermore, I understand that the '\n' string produces a newline. But when I use it in conjunction with my first problem, I get the following.
>>> print '\n' 'a'*7
a
a
a
a
a
a
a
So my second problem arises: why do I get 7 new lines of the letter 'a'? In other words, shouldn't the repeater symbol, *, repeat only the letter 'a' 7 times, as follows?
>>> print 'a'*7
aaaaaaa
Please help me clarify what is going on.
When "a" "b" is turned into "ab", this isn't the same as concatenating the strings with +. When the Python source code is being read, adjacent string literals are automatically joined for convenience.
This isn't a normal operation, which is why it isn't following the order of operations you expect for + and *.
print '\n' 'a'*7
is actually interpreted the same as
print '\na'*7
and not as
print '\n' + 'a'*7
Python concatenates adjacent string literals when you do not separate them with a comma:
>>> print 'a' 'b'
ab
>>> print 'a', 'b'
a b
So you are actually printing '\na' 7 times.
I'm not sure what you mean by "how is it possible". You write a rule: two strings next to each other get concatenated. Then you implement it in the parser. Why? Because it allows you to conveniently do things like this:
re.findall('(<?=(foo))'  # The first part of a complicated regexp
           '>asdas s'    # The next part
           '[^asd]'      # The last part
           )
That way, you can describe just what you're doing.
When you do A * B + C, the computer always does A times B first, then adds C, because multiplication comes before addition.
When you do string concatenation by putting the string literals next to each other, and multiplication, the special string concatenation comes first. This means '\n' 'a' * 7 is the same as ('\n' 'a') * 7, so the string you're repeating is '\na'.
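The grouping described above can be confirmed by comparing the results. A short sketch in Python 3 syntax, with names picked for this example:

```python
implicit = '\n' 'a' * 7    # literals are joined first, then repeated
explicit = '\n' + 'a' * 7  # * binds tighter than +

print(implicit == '\na' * 7)    # True
print(explicit == '\naaaaaaa')  # True
print(implicit == explicit)     # False
```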
You've probably already realised that relying on the implicit concatenation of adjacent strings is sometimes problematic. Also, concatenating with the + operator is not efficient. It's not noticeable if joining only a few small strings, but it is very noticeable at scale.
Be explicit about it; use ''.join()
print '\n'.join(['a'*7])

Replacing reoccuring characters in strings in Python 3.1

Is it possible to replace a single character inside a string that occurs many times?
Input:
Sentence=("This is an Example. Thxs code is not what I'm having problems with.") #Example input
^
Sentence=("This is an Example. This code is not what I'm having problems with.") #Desired output
Replace the 'x' in "Thxs" with an i, without replacing the x in "Example".
You can do it by including some context:
s = s.replace("Thxs", "This")
Alternatively you can keep a list of words that you don't wish to replace:
import re
whitelist = ['example', 'explanation']
def replace_except_whitelist(m):
    # m is a match object for one word
    s = m.group()
    if s in whitelist:
        return s
    return s.replace('x', 'i')
s = 'Thxs example'
result = re.sub(r"\w+", replace_except_whitelist, s)
print(result)
Output:
This example
Sure, but you essentially have to build up a new string out of the parts you want:
>>> s = "This is an Example. Thxs code is not what I'm having problems with."
>>> s[22]
'x'
>>> s[:22] + "i" + s[23:]
"This is an Example. This code is not what I'm having problems with."
For information about the notation used here, see good primer for python slice notation.
If you know whether you want to replace the first occurrence of x, or the second, or the third, or the last, you can combine str.find (or str.rfind if you wish to start from the end of the string) with slicing and str.replace. Call str.find with the character you wish to replace as many times as needed to reach the position just before the occurrence you want (for the specific sentence you suggest, just once), then slice the string in two and replace only one occurrence in the second slice.
An example is worth a thousand words, or so they say. In the following, I assume you want to substitute the (n+1)th occurrence of the character.
>>> s = "This is an Example. Thxs code is not what I'm having problems with."
>>> n = 1
>>> pos = 0
>>> for i in range(n):
...     pos = s.find('x', pos) + 1
...
>>> s[:pos] + s[pos:].replace('x', 'i', 1)
"This is an Example. This code is not what I'm having problems with."
Note that you need to add an offset to pos, otherwise you will replace the occurrence of x you have just found.
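The same idea can be packaged as a function. This is a sketch under the assumption that occurrences are counted from 0 and that a string with too few occurrences should be returned unchanged; the name `replace_nth` is made up for the example.

```python
def replace_nth(s, old, new, n):
    """Replace the n-th (0-based) occurrence of `old` in `s` with `new`."""
    pos = -1
    for _ in range(n + 1):
        # Search again, starting just after the previous hit.
        pos = s.find(old, pos + 1)
        if pos == -1:
            return s  # fewer than n + 1 occurrences: nothing to replace
    return s[:pos] + new + s[pos + len(old):]

sentence = "This is an Example. Thxs code is not what I'm having problems with."
print(replace_nth(sentence, 'x', 'i', 1))
```

Here `n = 1` skips the x in "Example" and replaces the one in "Thxs".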
