functioning of replace() - python

print("abc".replace("", "|"))  # Explain this
# |a|b|c|
print("".replace("", "abc"))
# abc
print("".replace("", "abc", 3))
# No output. Why? Is this a bug?
I really can't understand these lines. Please explain them briefly.

In the first line you're replacing every empty string with |. An empty string matches before each character and once more at the end, so the output should be, and is, |a|b|c|. If your string were a b c, the output would be |a| |b| |c|.
As for the last line, where you expected abcabcabc: the replace function replaces, it does not multiply. If that is what you want, do the replacement first and then repeat the result three times:
print("".replace("", "abc")*3)
The output is now abcabcabc.
As for your code, you're telling the Python interpreter to find up to three '' and replace them with 'abc'. Your string contains only one '', so I first assumed that the empty result was intended behaviour and not a bug.
Edit
I searched a bit more and found that this was in fact a bug, tracked as Issue 28029 in the Python bug tracker, and still present in Python 3.8. I checked again with Python 3.9 IDLE and it now works fine:
print(''.replace('', 'abc', 3))
abc

According to doc (help(str.replace)):
replace(self, old, new, count=-1, /)
Return a copy with all occurrences of substring old replaced by new.
count
Maximum number of occurrences to replace.
-1 (the default value) means replace all occurrences.
If the optional argument count is given, only the first count occurrences are
replaced.
Basically, you're setting a limit on the number of occurrences to be replaced.
As for the first example:
it seems that print("abc".replace("","|")) falls into a special case that the Python developers handle deliberately; see the zero-length special case in the CPython source.
Going from Python to C, you can read there that when the string to be replaced has length 0, the function stringlib_replace_interleave is called.
The comment in the source gives this example:
/* insert the 'to' bytes everywhere. */
/* >>> b"Python".replace(b"", b".") */
/* b'.P.y.t.h.o.n.' */
For all the other cases, you can check the string replace implementation.
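The zero-length case described above can be imitated in pure Python. This is only an illustrative sketch, not CPython's actual implementation, and the helper name interleave is made up for the example. It assumes that an empty pattern matches before each character and once at the end, and that count is an upper limit.

```python
def interleave(text, new, count=-1):
    """Mimic text.replace("", new, count) for the empty-pattern case."""
    # An empty pattern matches before each character and once at the end.
    slots = len(text) + 1
    if count < 0 or count > slots:
        count = slots
    pieces = []
    for i in range(count):
        pieces.append(new)
        if i < len(text):
            pieces.append(text[i])
    # Whatever was not reached because of the limit is copied unchanged.
    pieces.append(text[count:])
    return "".join(pieces)


print(interleave("abc", "|"))      # |a|b|c|
print(interleave("abc", "|", 2))   # |a|bc
print(interleave("", "abc"))       # abc
```

The first three results agree with str.replace on any Python 3 version; only the combination of an empty string with an explicit count differs between versions before and after 3.9.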

Python range().count() is like "contains()" (?)

I'm looking for verification of the following:
The count(i) method of a Python range() object counts the number of occurrences of the value i in the range it is called on, and thus returns either 0 or 1, nothing else.
I found no hint of anything else in the docs I read or in the runs I made, but I still want to verify, since it seems a bit odd.
It is because range() objects conform to the Sequence ABC, and that ABC has a count() method.
So it is there just for completeness' sake, so that the object qualifies as a sequence.
Also see the following link, which lists the methods range has because it is part of the Sequence ABC: https://docs.python.org/3/library/stdtypes.html#typesseq
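A short check of the behaviour described above, as a sketch; the particular range and values are arbitrary choices for the example.

```python
from collections.abc import Sequence

r = range(0, 10, 2)               # 0, 2, 4, 6, 8

print(isinstance(r, Sequence))    # True: range counts as a Sequence
print(r.count(4))                 # 1: 4 is in the range
print(r.count(5))                 # 0: 5 is skipped by the step
print(4 in r)                     # True: the usual way to ask "contains?"
print([1, 2, 2, 3].count(2))      # 2: other sequences can hold duplicates
```

Since a range never repeats a value, count() can only ever report 0 or 1 there, while the same method on a list or tuple can return more.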

Find word/pattern/string in txt/xml-file and add incremental number

I'm using TextWrangler, which falls short when it comes to adding numbers to replacements.
I have an XML file with several strings containing the words:
generatoritem id="Outline Text"
I need to add an incrementing number at the end of each substitution, like so:
generatoritem id="Outline Text1"
So I need to replace 'Outline Text' with 'Outline Text' and an incrementing number.
I found an answer to a similar question, typed this into TextWrangler and hit Check Syntax:
perl -ple 's/"Outline Text"/$n++/e' /path/of/file.xml
Plenty of errors. So I need this nice one-liner explained to me. Or perhaps a new one, or a Python script?
-p makes perl read your file(s) one line at a time, and for each line it will execute the script and then emit the line as modified by your script. Note that there is an implicit use of a variable called $_ - it is used as the variable holding the line being read, it's also the default target for s/// and it's the default source for the print after each line.
You won't need the -l (the l in the middle of -ple) for the task you describe, so I won't bother going into it. Remove it.
The final flag -e (the e at the end of -ple) introduces your 'script' from the command line itself - ie allowing a quick and small use of perl without a source file.
Onto the guts of your script: it is wrong for the purpose you describe, and as an attempt it's also a bit unsafe.
If you want to change "Outline Text" into something else, your current script replaces ALL of it with $n, which is not what you describe you want. A simple way to do exactly what you ask for is
s/(id="Outline Text)(")/$1 . $n++ . $2/eg;
This matches the exact text you want, and notice that I'm also matching id= for extra certainty in case OTHER parts of your file contain "Outline Text" - don't laugh, it can happen!
By putting ( ) around parts of the pattern, those bits are saved in variables known as $1, $2 etc. I am then using these in the replacement part. The . operator glues the pieces together, giving your result.
The /e at the end of the s/// means that the replacement is treated as a Perl expression, not just a plain replacement string. I've also added g which makes it match more than once on a line - you may have more than one interesting id= on a line in the input file, so be ready for it.
One final point. You seem to want the numbering to start from 1, so replace $n++ with ++$n. With that change, the variable $n starts out empty (effectively zero), is incremented to 1 (then 2, 3, and so on), and THEN its value is used.
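Since the question also asks about a Python script, here is a rough equivalent of the substitution above. It is a sketch under assumptions: the function name number_outline_ids and the sample text are invented for illustration, and the numbering starts at 1 as the question seems to want.

```python
import itertools
import re

# Capture the text before and after the spot where the number goes,
# mirroring the two groups in the Perl substitution.
PATTERN = re.compile(r'(id="Outline Text)(")')


def number_outline_ids(xml_text, start=1):
    counter = itertools.count(start)

    def add_number(match):
        # Called once per match, so the counter advances once per id.
        return match.group(1) + str(next(counter)) + match.group(2)

    return PATTERN.sub(add_number, xml_text)


sample = (
    '<generatoritem id="Outline Text">\n'
    '<generatoritem id="Outline Text">\n'
)
print(number_outline_ids(sample))
```

To process a real file, read its contents into a string, pass it through the function and write the result back out.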

How to split a string of Python source code into Python "statements"?

Given a string s containing (syntactically valid) Python source code, how can I split s into an array whose elements are the strings corresponding to the Python "statements" in s?
I put scare-quotes around "statements" because this term does not capture exactly what I'm looking for. Rather than trying to come up with a more accurate wording, here's an example. Compare the following two ipython interactions:
In [1]: if 1 > 0:
......:     pass
......:
In [2]: if 1 > 0
  File "<ipython-input-1082-0b411f095922>", line 1
    if 1 > 0
            ^
SyntaxError: invalid syntax
In the first interaction, after the first [RETURN] statement, ipython processes the input if 1 > 0: without objection, even though it is still incomplete (i.e. it is not a full Python statement). In contrast, in the second interaction, the input is not only incomplete (in this sense), but also not acceptable to ipython.
As a second, more complete example, suppose the file foo.py contains the following Python source code:
def print_vertically(s):
    '''A pretty useless procedure.
    Prints the characters in its argument one per line.
    '''
    for c in s:
        print c
greeting = ('hello '
            'world'.
            upper())
print_vertically(greeting)
Now, if I ran the following snippet, featuring the desired split_python_source function:
src = open('foo.py').read()
for i, s in enumerate(split_python_source(src)):
    print '%d. >>>%s<<<' % (i, s)
the output would look like this:
0. >>>def print_vertically(s):<<<
1. >>>    '''A pretty useless procedure.
    Prints the characters in its argument one per line.
    '''<<<
2. >>>    for c in s:<<<
3. >>>        print c<<<
4. >>>greeting = ('hello '
            'world'.
            upper())<<<
5. >>>print_vertically(greeting)<<<
As you can see, in this splitting, for c in s: (for example) gets assigned to its own item, rather than being part of some "compound statement."
In fact, I don't have a very precise specification for how the splitting should be done, as long as it is done "at the joints" (like ipython does).
I'm not familiar with the internals of the Python lexer (though almost certainly many people on SO are :), but my guess is that you're basically looking for lines, with one important exception : paired open-close delimiters that can span multiple lines.
As a quick and dirty first pass, you might be able to start with something that splits a piece of code on newlines, and then you could merge successive lines that are found to contain paired delimiters -- parentheses (), braces {}, brackets [], and quotes '', ''' ''' are the ones that come to mind.
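Instead of pairing delimiters by hand, one option is to let the standard tokenize module find the joints: it emits a NEWLINE token at the end of every logical line, and lines continued inside brackets or triple-quoted strings do not get one. The following is a sketch under assumptions, not a complete answer: the function name split_logical_lines is made up, comment-only and blank lines are dropped, and a line holding several statements separated by ; stays in one piece.

```python
import io
import tokenize

# Token types that never start a logical line.
_SKIP = {tokenize.NL, tokenize.COMMENT, tokenize.INDENT,
         tokenize.DEDENT, tokenize.ENDMARKER}


def split_logical_lines(source):
    lines = source.splitlines()
    chunks = []
    start_row = None
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NEWLINE:
            # End of a logical line: take every physical line it covers.
            if start_row is not None:
                chunks.append("\n".join(lines[start_row - 1:tok.start[0]]))
                start_row = None
        elif tok.type not in _SKIP and start_row is None:
            start_row = tok.start[0]
    return chunks


SAMPLE = (
    "def print_vertically(s):\n"
    "    '''A pretty useless procedure.\n"
    "    Prints the characters in its argument one per line.\n"
    "    '''\n"
    "    for c in s:\n"
    "        print(c)\n"
    "greeting = ('hello '\n"
    "            'world'.\n"
    "            upper())\n"
    "print_vertically(greeting)\n"
)

for i, chunk in enumerate(split_logical_lines(SAMPLE)):
    print('%d. >>>%s<<<' % (i, chunk))
```

The sample uses print(c) so that it is also valid Python 3, but the tokenizer only looks at tokens, so the splitting does not depend on that.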

Python: 2.6 and 3.1 string matching inconsistencies

I wrote my module in Python 3.1.2, but now I have to validate it for 2.6.4.
I'm not going to post all my code since it may cause confusion.
Brief explanation:
I'm writing an XML parser (my first interaction with XML) that creates objects from the XML file. There are a lot of objects, so I have a 'unit test' that manually scans the XML and tries to find a matching object. It will print out anything that doesn't have a match.
I open the XML file and use a simple 'for' loop to read line-by-line through the file. If I match a regular expression for an 'application' (XML has different 'application' nodes), then I add it to my dictionary, d, as the key. I perform a lxml.etree.xpath() query on the title and store it as the value.
After I go through the whole thing, I iterate through my dictionary, d, and try to match the key to my value (I have to use the get() method from my 'application' class). Any time a mismatch is found, I print the key and title.
Python 3.1.2 has all matching items in the dictionary, so nothing is printed. In 2.6.4, every single value is printed (~600 in all). I can't figure out why my string comparisons aren't working.
Without further ado, here's the relevant code:
for i in d:
    if i[1:-2] != d[i].get('id'):
        print('X%sX Y%sY' % (i[1:-3], d[i].get('id')))
I slice the strings because the strings are different. Where the key would be "9626-2008olympics_Prod-SH"\n the value would be 9626-2008olympics_Prod-SH, so I have to cut the quotes and newline. I also added the Xs and Ys to the print statements to make sure that there weren't any whitespace issues.
Here is an example line of output:
X9626-2008olympics_Prod-SHX Y9626-2008olympics_Prod-SHY
Remember to ignore the Xs and Ys. Those strings are identical. I don't understand why Python2 can't match them.
Edit:
So the problem seems to be the way that I am slicing.
In Python3,
if i[1:-2] != d[i].get('id'):
this comparison works fine.
In Python2,
if i[1:-3] != d[i].get('id'):
I have to change the offset by one.
Why would strings need different offsets? The only possible thing that I can think of is that Python2 treats a newline as two characters (i.e. '\' + 'n').
Edit 2:
Updated with requested repr() information.
I added a small amount of code to produce the repr() info from the "2008olympics" example above. I have not done any slicing. It actually looks like it might not be a unicode issue. There is now a "\r" character.
Python2:
'"9626-2008olympics_Prod-SH"\r\n'
'9626-2008olympics_Prod-SH'
Python3:
'"9626-2008olympics_Prod-SH"\n'
'9626-2008olympics_Prod-SH'
Looks like this file was created/modified on Windows. Is there a way in Python2 to automatically suppress '\r'?
You are printing i[1:-3] but comparing i[1:-2] in the loop.
Very Important Question
Why are you writing code to parse XML when lxml will do all that for you? The point of unit tests is to test your code, not to ensure that the libraries you are using work!
Russell Borogrove is right.
Python 3 reads text files with universal newlines by default, so the Windows line ending "\r\n" reaches my code as the single character "\n". That's why my offset of [1:-2] worked in 3: I needed to eliminate three characters, the two quotes and \n.
In Python 2, the line ending is kept as the two characters "\r\n", meaning I have to eliminate four characters and use [1:-3].
I just added a manual check for the Python major version.
Here is the fixed code:
import sys

for i in d:
    # The keys in d contain quotes and a line ending which need
    # to be removed. In v3 the line ending is 1 char ("\n") and
    # in v2 it is 2 chars ("\r\n").
    if sys.version_info[0] < 3:
        if i[1:-3] != d[i].get('id'):
            print('%s %s' % (i[1:-3], d[i].get('id')))
    else:
        if i[1:-2] != d[i].get('id'):
            print('%s %s' % (i[1:-2], d[i].get('id')))
Thanks for the responses everyone! I appreciate your help.
repr() and %r format are your friends ... they show you (for basic types like str/unicode/bytes) exactly what you've got, including type.
Instead of
print('X%sX Y%sY' % (i[1:-3], d[i].get('id')))
do
print('%r %r' % (i, d[i].get('id')))
Note leaving off the [1:-3] so that you can see what is in i before you slice it.
Update after comment "You are perfectly right about comparing the wrong slice. However, once I change it, python2.6 works, but python3 has the problem now (i.e. it doesn't match any objects)":
How are you opening the file (two answers please, for Python 2 and 3). Are you running on Windows? Have you tried getting the repr() as I suggested?
Update after actual input finally provided by OP:
If, as it appears, your input file was created on Windows (lines are separated by "\r\n"), you can read Windows and Unix text files portably by using the "universal newlines" option: open('datafile.txt', 'rU') on Python 2. Universal newlines mode is the default in Python 3. Note that the Python 3 docs of the time say that you can use 'rU' in Python 3 as well, which would save you having to test which Python version you are using; the 'U' flag has since been deprecated and was removed in Python 3.11, where a plain open() does the same job.
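The effect of universal newlines can be seen with a throwaway file. This is a sketch for Python 3 only; the temporary file and the variable names are just for the demonstration.

```python
import os
import tempfile

# Write a small file with a Windows line ending, in binary mode so that
# nothing is translated on the way out.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as handle:
    handle.write(b'"9626-2008olympics_Prod-SH"\r\n')

# Default text mode (newline=None) turns "\r\n" into "\n" while reading.
with open(path) as handle:
    translated = handle.readline()

# newline="" switches the translation off, so "\r\n" survives.
with open(path, newline="") as handle:
    untranslated = handle.readline()

os.remove(path)

print(repr(translated))    # '"9626-2008olympics_Prod-SH"\n'
print(repr(untranslated))  # '"9626-2008olympics_Prod-SH"\r\n'
```

The two repr() lines correspond to the Python 3 and Python 2 outputs shown in the question.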
I don't understand exactly what you're doing, but would you try using strip() instead of slicing and see whether it helps? The second strip('"') removes the surrounding quotes, which a plain strip() leaves in place.
for i in d:
    stripped = i.strip().strip('"')
    if stripped != d[i].get('id'):
        print('X%sX Y%sY' % (stripped, d[i].get('id')))
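Building on the strip() suggestion, a helper that removes both the line ending and the surrounding quotes gives the same result whichever line ending the file used, so no version check or slice offset is needed. This is a sketch; the name clean_key is made up for the example.

```python
def clean_key(raw):
    # Drop trailing "\r" and "\n" first, then the surrounding double quotes.
    return raw.rstrip("\r\n").strip('"')


windows_key = '"9626-2008olympics_Prod-SH"\r\n'
unix_key = '"9626-2008olympics_Prod-SH"\n'

print(clean_key(windows_key))                         # 9626-2008olympics_Prod-SH
print(clean_key(windows_key) == clean_key(unix_key))  # True
```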

What's the Ruby equivalent of Python's output[:-1]?

In Python, if I want to get all of a string except its last character, I do:
output = 'stackoverflow'
print(output[:-1])
What's the Ruby equivalent?
I don't want to get too nitpicky, but if you want to be more like Python's approach, rather than doing "StackOverflow"[0..-2] you can do "StackOverflow"[0...-1] for the same result.
In Ruby, a range with three dots excludes its right endpoint, whereas a range with two dots includes it. So, for string slicing, the three-dot form is a bit closer to Python's syntax.
Your current Ruby doesn't do what you describe: it cuts off the last character, but it also reverses the string.
The closest equivalent to the Python snippet would be
output = 'stackoverflow'
puts output[0...-1]
You originally used .. instead of ... (which would work if you did output[0..-2]); the former is closed–closed, the latter closed–open. Slices, like most everything else in Python, are closed–open.
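For comparison, the closed–open convention on the Python side looks like this (a small illustration only):

```python
output = 'stackoverflow'

# The stop index is excluded, so -1 (the last character) is left out.
print(output[:-1])                 # stackoverflo
# Same slice with explicit bounds: indices 0 up to, not including, 12.
print(output[0:len(output) - 1])   # stackoverflo
# range() follows the same closed-open convention.
print(list(range(0, 3)))           # [0, 1, 2]
```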
"stackoverflow"[0..-2] will return "stackoverflo"
If all you want to do is remove the last character of the string, you can use the 'chop' method as well:
puts output.chop
or
puts output.chop!
If you only want to remove the last character, you can also do
output.chop
