_='_=%r;print (_%%_) ';print (_%_)
(Edit: I have received your input and fixed the code; thanks for the correction.)
This is the shortest quine you can write in Python (I'm told). A quine is a program that outputs its own source code.
Can someone explain this line of code to me as if I know nothing about Python? I use Python 3.x by the way.
What I'm looking for is a character-by-character explanation of what's going on.
Thanks.
As pointed out in the comments, the correct quine is _='_=%r;print (_%%_) ';print (_%_). Using this, let's begin:
The ; separates two statements on one line, so the following:
_='_=%r;print (_%%_) ';print (_%_)
is equivalent to:
_='_=%r;print (_%%_) '
print (_%_)
In the first line, _ is a valid variable name, and it is assigned the string '_=%r;print (_%%_) '.
Using Python's string formatting, we can inject variables into strings in a printf-like fashion:
>>> name = 'GNU'
>>> print('%s is Not Unix'%name)
GNU is Not Unix
>>> print('%r is Not Unix'%name)
'GNU' is Not Unix
%s converts the object to a string with str(), while %r converts any object to its representation with the repr() function.
Now imagine you want to print a % as well, producing a string such as GNU is Not Unix %. If you try the following:
>>> print('%s is Not Unix %'%name)
You will end up with a ValueError, so you would have to escape the % with another %:
>>> print('%s is Not Unix %%'%name)
GNU is Not Unix %
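The three rules above (%s uses str(), %r uses repr(), %% is a literal percent sign) can be checked directly. This is a Python 3 sketch that reuses the name variable from the example.

```python
name = 'GNU'

# %s inserts str(name); %r inserts repr(name), which adds the quotes.
with_s = '%s is Not Unix' % name
with_r = '%r is Not Unix' % name

# '%%' is the escape for a literal percent sign and consumes no argument.
with_percent = '%s is Not Unix %%' % name

# A lone trailing '%' starts an incomplete conversion specifier.
try:
    '%s is Not Unix %' % name
    error = None
except ValueError as exc:
    error = exc

print(with_s)                # GNU is Not Unix
print(with_r)                # 'GNU' is Not Unix
print(with_percent)          # GNU is Not Unix %
print(type(error).__name__)  # ValueError
```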
Back to the original code: when you evaluate _%_, the %r in '_=%r;print (_%%_) ' is replaced with repr(_), which is the string itself wrapped in quotes, and %% becomes a single % because the first % acts as an escape character. Printing the whole result gives you:
_='_=%r;print (_%%_) ';print (_%_)
which is an exact replica of the code that produced it in the first place, i.e. a quine.
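Breaking the one-liner into its pieces makes each substitution visible. One detail worth noticing: the string literal ends with a space before its closing quote, so the printed result ends with a trailing space that is easy to miss on screen (the sketch prints repr(result) to make it visible).

```python
# The string assigned in the one-liner, unchanged.
_ = '_=%r;print (_%%_) '

# The %r slot receives repr(_): the same text wrapped in single quotes.
quoted = repr(_)

# Formatting _ with itself fills %r with quoted and turns %% into %.
result = _ % _

print(quoted)        # '_=%r;print (_%%_) '
print(repr(result))  # "_='_=%r;print (_%%_) ';print (_%_) "
```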
Related
s='s=%r;print(s%%s)';print(s%s)
I understand that % replaces something in a string with s, but what exactly gets replaced?
Even more intriguing: why does print(s%%s) become print(s%s) automatically after s is substituted into itself?
The "%%" you see in that code is a "conversion specifier" for the older printf-style of string formatting.
Most conversion specifiers tell Python how to convert an argument that is passed into the % format operator (for instance, "%d" says to convert the next argument to a decimal integer before inserting it into the string).
"%%" is different, because it directly converts to a single "%" character without consuming an argument. This conversion is needed in the format string specification, since otherwise any "%" would be taken as the first part of some other code and there would be no easy way to produce a string containing a percent sign.
The code you show is a quine (a program that produces its own code as its output). When it runs print(s%s), it does a string formatting operation where both the format string, and the single argument are the same string, s.
The "%r" in the string is a conversion specifier that does a repr of its argument. repr on a string produces the string with quotes around it. This is where the quoted string comes from in the output.
The "%%" produces the % operator that appears between the two s's in the print call. If only one "%" were included in s, you'd get an error about the formatting operation expecting a second argument (since %s is another conversion specifier).
print('% %s' % '')  # wrong
print('%% %s' % '')  # correct: prints '% '
Think of \\ and \ in string literals: doubling the character is how you write a single literal one.
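To check the quine claim mechanically, here is a small Python 3 sketch. run_and_capture is a hypothetical helper name; it relies on exec() plus contextlib.redirect_stdout to collect what the one-liner prints. The second half shows the TypeError you get when s contains a single % instead of %%.

```python
import contextlib
import io


def run_and_capture(source):
    """Hypothetical helper: exec source in a fresh namespace, return its stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})
    return buffer.getvalue()


source = "s='s=%r;print(s%%s)';print(s%s)"
output = run_and_capture(source)

# print() adds a trailing newline; the rest should be the program text itself.
print(output == source + "\n")  # True

# With a single % the string has two specifiers (%r and %s) but one argument.
broken = 's=%r;print(s%s)'
try:
    broken % broken
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)  # not enough arguments for format string
```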
I had to port my Python script from Python 3 to Python 2, and since then I've had problems parsing special characters with ElementTree.
This is a piece of my XML:
<account number="89890000" type="Kostnad" taxCode="597" vatCode="">Avsättning egenavgifter</account>
This is the output when I parse this row:
('account:', '89890000', 'AccountType:', 'Kostnad', 'Name:', 'Avs\xc3\xa4ttning egenavgifter')
So it seems to be a problem with the character "ä".
This is how I do it in the code:
sys.setdefaultencoding( "UTF-8" )
xmltree = ET()
xmltree.parse("xxxx.xml")
printAccountPlan(xmltree)
def printAccountPlan(xmltree):
    # i is one <account> element; the code that binds it is not shown
    print("account:",str(i.attrib['number']), "AccountType:",str(i.attrib['type']),"Name:",str(i.text))
Does anyone have an idea how to get ElementTree to parse the character "ä", so that the result looks like this:
('account:', '89890000', 'AccountType:', 'Kostnad', 'Name:', 'Avsättning egenavgifter')
You're running into two separate differences between Python 2 and Python 3 at the same time, which is why you're getting unexpected results.
The first difference is one you're probably already aware of: Python's print statement in version 2 became a print function in version 3. That change is creating a special circumstance in your case, which I'll get to a little later. But briefly, this is the difference in how 'print' works:
In Python 3:
>>> # Two arguments 'Hi' and 'there' get passed to the function 'print'.
>>> # They are concatenated with a space separator and printed.
>>> print('Hi', 'there')
Hi there
In Python 2:
>>> # 'print' is a statement which doesn't need parentheses.
>>> # The parentheses instead create a tuple containing two elements
>>> # 'Hi' and 'there'. This tuple is then printed.
>>> print('Hi', 'there')
('Hi', 'there')
The second problem in your case is that tuples print themselves by calling repr() on each of their elements. In Python 3, repr() of a str shows printable non-ASCII characters such as "ä" as-is. But in Python 2, repr() uses escape sequences for any byte values which fall outside the printable ASCII range (e.g., values larger than 127). This is why you're seeing them.
You may decide to resolve this issue, or not, depending on what your goal is with your code. The representation of a tuple in Python 2 uses escape characters because it's not designed to be displayed to an end-user. It's more for your internal convenience as a developer, for troubleshooting and similar tasks. If you're simply printing it for yourself, then you may not need to change a thing, because Python is showing you that the encoded bytes for that non-ASCII character are correctly there in your string. If you do want to display something to the end-user which has the format of how tuples look, then one way to do it (which retains correct printing of unicode) is to manually create the formatting, like this:
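You can reproduce the effect in Python 3 by putting UTF-8 bytes into a tuple: bytes is the closest Python 3 counterpart of a Python 2 str, and its repr escapes the two bytes that encode "ä" (\xc3\xa4) just as in the question's output. This is a sketch for illustration, not the asker's Python 2 code.

```python
# Python 3: the same text as str and as UTF-8 bytes, each inside a tuple.
name = 'Avsättning egenavgifter'
encoded = name.encode('utf-8')   # bytes, comparable to a Python 2 str

text_repr = repr(('Name:', name))
bytes_repr = repr(('Name:', encoded))

print(bytes_repr)  # ('Name:', b'Avs\xc3\xa4ttning egenavgifter')
print(text_repr)   # ('Name:', 'Avsättning egenavgifter')
```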
def printAccountPlan(xmltree):
    # i is an <account> element, as in the question's code
    data = (i.attrib['number'], i.attrib['type'], i.text)
    print "('account:', '%s', 'AccountType:', '%s', 'Name:', '%s')" % data
# Produces this:
# ('account:', '89890000', 'AccountType:', 'Kostnad', 'Name:', 'Avsättning egenavgifter')
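For comparison, here is a Python 3 sketch of the same manual formatting, parsing the single <account> element from the question with xml.etree.ElementTree.fromstring. The variable names are illustrative; in Python 3 the element's text is already a str, so "ä" comes through unescaped.

```python
import xml.etree.ElementTree as ET

# The <account> element from the question, parsed from a str.
xml_text = ('<account number="89890000" type="Kostnad" taxCode="597" '
            'vatCode="">Avsättning egenavgifter</account>')
account = ET.fromstring(xml_text)

# Build the tuple-like line by hand instead of printing a real tuple.
data = (account.attrib['number'], account.attrib['type'], account.text)
line = "('account:', '%s', 'AccountType:', '%s', 'Name:', '%s')" % data
print(line)
```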
I'm just starting to fool around with formatting the output of a print statement.
The examples I've seen have a % after the format list and before the arguments, like this:
>>> a=123
>>> print "%d" % a
123
What is the meaning of the % and more important, why is it necessary?
It's the string formatting operator. It tells Python to look at the string on the left and build a new string where the %-sequences are replaced with formatted versions of the values from the right-hand side of the operator.
It's not "necessary"; you can print values directly:
>>> print a
123
But it's nice to have printf()-style formatting available, and this is how you do it in Python.
As pointed out in a comment, note that the string formatting operator is not connected to print in any way, it's an operator just like any other. You can format a value into a string without printing it:
>>> a = 123
>>> padded = "%05d" % a
>>> padded
'00123'
In Python, the % operator is implemented by calling the method __mod__ on the left-hand operand, falling back to __rmod__ on the right-hand operand if __mod__ is not defined or returns NotImplemented. So what you have written is equivalent to
a = 123
print "%d".__mod__(a)
Python's string classes simply implement __mod__ to do string formatting.
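To make the __mod__/__rmod__ dispatch visible, here is a sketch with a hypothetical class, Tagged, that simply reports which hook Python called (Python 3 syntax).

```python
class Tagged:
    """Hypothetical class that reports which % hook Python called."""

    def __init__(self, label):
        self.label = label

    def __mod__(self, other):
        # Called for Tagged(...) % other
        return '__mod__(%s, %r)' % (self.label, other)

    def __rmod__(self, other):
        # Called for other % Tagged(...) when other's __mod__ returns NotImplemented
        return '__rmod__(%s, %r)' % (self.label, other)


print("%d".__mod__(123))  # 123
print(Tagged('t') % 5)    # __mod__(t, 5)
print(5 % Tagged('t'))    # __rmod__(t, 5)
```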
Also note that this style of string formatting is referred to in the documentation as "old string formatting"; going forward, we should use the new-style string formatting described here: http://docs.python.org/library/stdtypes.html#str.format
For example:
>>> a=123
>>> print "{0}".format(a)
123
See Format String Syntax for a description of the various formatting options that can be specified in format strings.
This method of string formatting is the new standard in Python 3.0, and should be preferred to the % formatting described in String Formatting Operations in new code.
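Side by side, the zero-padding example from earlier in printf style and in the newer syntaxes. The auto-numbered form needs Python 2.7 or 3.1+, and the f-string line (a later addition, not covered by the quoted docs) needs Python 3.6+.

```python
a = 123

old_style = "%05d" % a            # printf-style
new_style = "{0:05d}".format(a)   # str.format with an explicit index
auto_index = "{:05d}".format(a)   # index omitted (Python 2.7 / 3.1+)
f_string = f"{a:05d}"             # f-string (Python 3.6+)

print(old_style, new_style, auto_index, f_string)  # 00123 00123 00123 00123
```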
I wrote my module in Python 3.1.2, but now I have to validate it for 2.6.4.
I'm not going to post all my code since it may cause confusion.
Brief explanation:
I'm writing an XML parser (my first interaction with XML) that creates objects from the XML file. There are a lot of objects, so I have a 'unit test' that manually scans the XML and tries to find a matching object. It prints out anything that doesn't have a match.
I open the XML file and use a simple 'for' loop to read line-by-line through the file. If I match a regular expression for an 'application' (the XML has different 'application' nodes), then I add it to my dictionary, d, as the key. I perform an lxml.etree.xpath() query on the title and store it as the value.
After I go through the whole thing, I iterate through my dictionary, d, and try to match the key to my value (I have to use the get() method from my 'application' class). Any time a mismatch is found, I print the key and title.
Python 3.1.2 has all matching items in the dictionary, so nothing is printed. In 2.6.4, every single value (~600 in all) is printed. I can't figure out why my string comparisons aren't working.
Without further ado, here's the relevant code:
for i in d:
    if i[1:-2] != d[i].get('id'):
        print('X%sX Y%sY' % (i[1:-3], d[i].get('id')))
I slice the strings because the key and the value differ: where the key is "9626-2008olympics_Prod-SH"\n, the value is 9626-2008olympics_Prod-SH, so I have to cut off the quotes and the newline. I also added the Xs and Ys to the print statements to make sure there weren't any whitespace issues.
Here is an example line of output:
X9626-2008olympics_Prod-SHX Y9626-2008olympics_Prod-SHY
Remember to ignore the Xs and Ys. Those strings are identical. I don't understand why Python2 can't match them.
Edit:
So the problem seems to be the way that I am slicing.
In Python3,
if i[1:-2] != d[i].get('id'):
this comparison works fine.
In Python2,
if i[1:-3] != d[i].get('id'):
I have to change the offset by one.
Why would strings need different offsets? The only possible thing that I can think of is that Python2 treats a newline as two characters (i.e. '\' + 'n').
Edit 2:
Updated with requested repr() information.
I added a small amount of code to produce the repr() info for the "2008olympics" example above. I have not done any slicing. It actually looks like it might not be a unicode issue: there is now a "\r" character.
Python2:
'"9626-2008olympics_Prod-SH"\r\n'
'9626-2008olympics_Prod-SH'
Python3:
'"9626-2008olympics_Prod-SH"\n'
'9626-2008olympics_Prod-SH'
Looks like this file was created/modified on Windows. Is there a way in Python2 to automatically suppress '\r'?
You are printing i[1:-3] but comparing i[1:-2] in the loop.
Very Important Question
Why are you writing code to parse XML when lxml will do all that for you? The point of unit tests is to test your code, not to ensure that the libraries you are using work!
Russell Borogove is right.
Python 3 defaults to unicode, and the newline character is correctly interpreted as one character. That's why my offset of [1:-2] worked in 3, where I needed to eliminate three characters (", ", and \n).
In Python 2, the newline is being interpreted as two characters, meaning I have to eliminate four characters and use [1:-3].
I just added a manual check for the Python major version.
Here is the fixed code:
import sys

for i in d:
    # The keys in d contain quotes and a newline which need
    # to be removed. In v3, newline = 1 char and in v2,
    # newline = 2 char.
    if sys.version_info[0] < 3:
        if i[1:-3] != d[i].get('id'):
            print('%s %s' % (i[1:-3], d[i].get('id')))
    else:
        if i[1:-2] != d[i].get('id'):
            print('%s %s' % (i[1:-2], d[i].get('id')))
Thanks for the responses everyone! I appreciate your help.
repr() and %r format are your friends ... they show you (for basic types like str/unicode/bytes) exactly what you've got, including type.
Instead of
print('X%sX Y%sY' % (i[1:-3], d[i].get('id')))
do
print('%r %r' % (i, d[i].get('id')))
Note leaving off the [1:-3] so that you can see what is in i before you slice it.
Update after comment "You are perfectly right about comparing the wrong slice. However, once I change it, python2.6 works, but python3 has the problem now (i.e. it doesn't match any objects)":
How are you opening the file? (Two answers please, for Python 2 and 3.) Are you running on Windows? Have you tried getting the repr() as I suggested?
Update after actual input finally provided by OP:
If, as it appears, your input file was created on Windows (lines are separated by "\r\n"), you can read Windows and *x text files portably by using the "universal newlines" option: open('datafile.txt', 'rU') on Python 2 (see the documentation for open()). Universal newlines mode is the default in Python 3. Note that the Python 3 docs say that you can use 'rU' in Python 3 as well, which would save you from having to test which Python version you are using; be aware, though, that the 'U' mode was later deprecated and was removed in Python 3.11, where the default text mode already gives you universal newlines.
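A small Python 3 sketch of what universal newlines does with a Windows-style file: the default text mode (newline=None) translates "\r\n" to "\n", while newline='' keeps the line endings untouched. The temporary file and its second line are made up for the demonstration.

```python
import os
import tempfile

# Write two Windows-style (CRLF) lines as raw bytes to a temporary file.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(b'"9626-2008olympics_Prod-SH"\r\nsecond line\r\n')

try:
    # Default text mode (newline=None): universal newlines, '\r\n' -> '\n'.
    with open(path) as f:
        translated = f.readlines()
    # newline='' disables the translation, so '\r\n' survives.
    with open(path, newline='') as f:
        raw = f.readlines()
finally:
    os.remove(path)

print(repr(translated[0]))  # '"9626-2008olympics_Prod-SH"\n'
print(repr(raw[0]))         # '"9626-2008olympics_Prod-SH"\r\n'
```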
I don't understand exactly what you're doing, but could you try using strip() instead of slicing and see whether it helps?
for i in d:
    # strip() removes the trailing '\r\n' or '\n'; strip('"') then drops the quotes
    stripped = i.strip().strip('"')
    if stripped != d[i].get('id'):
        print('X%sX Y%sY' % (stripped, d[i].get('id')))
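Using the repr() values from the question's Edit 2, this sketch compares slicing with stripping; clean_key is a hypothetical helper name. Plain strip() removes the line ending but not the quotes, hence the extra strip('"').

```python
value = '9626-2008olympics_Prod-SH'
crlf_key = '"9626-2008olympics_Prod-SH"\r\n'  # as read by Python 2 (Edit 2)
lf_key = '"9626-2008olympics_Prod-SH"\n'      # as read by Python 3 (Edit 2)


def clean_key(key):
    """Hypothetical helper: drop surrounding whitespace (CR/LF too), then quotes."""
    return key.strip().strip('"')


print(repr(crlf_key[1:-2]))  # '9626-2008olympics_Prod-SH"'  (closing quote left over)
print(repr(lf_key[1:-2]))    # '9626-2008olympics_Prod-SH'
print(clean_key(crlf_key) == value, clean_key(lf_key) == value)  # True True
```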
I'm coming from a C# background, and I do this:
Console.Write("some text" + integerValue);
So the integer automatically gets converted to a string and it outputs.
In python I get an error when I do:
print 'hello' + 10
Do I have to convert to a string every time?
How would I do this in python?
String.Format("www.someurl.com/{0}/blah.html", 100);
I'm beginning to really like python, thanks for all your help!
From Python 2.6:
>>> "www.someurl.com/{0}/blah.html".format(100)
'www.someurl.com/100/blah.html'
To support older environments, the % operator has a similar role:
>>> "www.someurl.com/%d/blah.html" % 100
'www.someurl.com/100/blah.html'
If you would like to support named arguments, you can pass a dict.
>>> url_args = {'num' : 100 }
>>> "www.someurl.com/%(num)d/blah.html" % url_args
'www.someurl.com/100/blah.html'
In general, when types need to be mixed, I recommend string formatting:
>>> '%d: %s' % (1, 'string formatting',)
'1: string formatting'
String formatting coerces objects into strings by using their __str__ methods.[*] There is much more detailed documentation available on Python string formatting in the docs. This behaviour is different in Python 3+, as all strings are unicode.
If you have a list or tuple of strings, the join method is quite convenient. It applies a separator between all elements of the iterable.
>>> ' '.join(['2:', 'list', 'of', 'strings'])
'2: list of strings'
If you ever need to support a legacy environment (e.g. Python < 2.5), you should generally avoid string concatenation. See the article referenced in the comments.
[*] Unicode strings use the __unicode__ method.
>>> u'3: %s' % ':)'
u'3: :)'
>>> "www.someurl.com/{0}/blah.html".format(100)
'www.someurl.com/100/blah.html'
You can omit the 0 in Python 2.7 and 3.1+.
In addition to string formatting, you can always print like this:
print "hello", 10
This works because those are separate arguments, and print converts non-string arguments to strings (and inserts a space between them).
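For readers on Python 3, where print is a function, here is a rough equivalent of the options discussed in this thread; the f-string line assumes Python 3.6 or newer.

```python
value = 10

# "hello" + value raises TypeError in Python 3 as well.
try:
    "hello" + value
    concat_failed = False
except TypeError:
    concat_failed = True

joined = "hello " + str(value)   # explicit conversion
formatted = "hello %d" % value   # printf-style formatting
f_string = f"hello {value}"      # f-string, Python 3.6+

print("hello", value)                   # hello 10
print(joined == formatted == f_string)  # True
```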
For string formatting that includes different types of values, use the % to insert the value into a string:
>>> intvalu = 10
>>> print "hello %i"%intvalu
hello 10
>>>
So in your example:
>>> print "www.someurl.com/%i/blah.html"%100
www.someurl.com/100/blah.html
In this example I'm using %i as the stand-in; which one you use depends on the type of the value you are inserting. %s would be for strings. There is a list of conversion types in the Python documentation.
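A few common conversion types, as a quick Python 3 sketch; this is only a sample, not the full list from the documentation.

```python
examples = {
    "%i": "%i" % 100,          # signed integer (same as %d)
    "%d": "%d" % 100,          # signed integer
    "%s": "%s" % [1, 2],       # str() of any object
    "%r": "%r" % "text",       # repr() of any object
    "%x": "%x" % 255,          # lowercase hexadecimal
    "%.2f": "%.2f" % 3.14159,  # fixed-point, two decimals
}

for spec, result in examples.items():
    print(spec, "->", result)
```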