f.write vs print >> f - python

There are at least two ways to write to a file in python:
f = open(file, 'w')
f.write(string)
or
f = open(file, 'w')
print >> f, string # in python 2
print(string, file=f) # in python 3
Is there a difference between the two? Is either one more Pythonic? I'm trying to write a bunch of HTML to a file, so I need a bunch of write/print statements throughout my file (but I don't need a templating engine).

print does things file.write doesn't, allowing you to skip string formatting for some basic things.
It inserts spaces between arguments and appends the line terminator.
print "a", "b" # prints something like "a b\n"
It calls the __str__ or __repr__ special methods of an object to convert it to a string.
print 1 # prints something like "1\n"
You would have to manually do these things if you used file.write instead of print.
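As a rough illustration of those two steps, here is a sketch that reproduces print's default behaviour using write alone. It is written for Python 3, the helper name print_with_write is made up for this example, and it only mimics the default separator (a space) and terminator (a newline).

```python
import io


def print_with_write(f, *values):
    """Hypothetical helper: approximate print(*values, file=f) using f.write."""
    # str() is what print applies to each argument before writing it.
    text = " ".join(str(v) for v in values)
    f.write(text + "\n")


# Compare both approaches on in-memory text files.
via_print = io.StringIO()
print("a", "b", 1, file=via_print)

via_write = io.StringIO()
print_with_write(via_write, "a", "b", 1)
```

Both buffers end up holding the same text, which is the manual work print saves you.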

I disagree somewhat with several of the opinions expressed here, that print >> f is redundant and should be avoided in favour of f.write.
print and file.write are quite different operations. file.write just directly writes a string to a file. print is more like "render values to stdout as text". Naturally, the result of rendering a string as text is just the string, so print >> f, my_string and f.write(my_string) are nearly interchangeable (except for the addition of a newline). But your choice between file.write and print should normally be based on what you're doing; are you writing a string to a file, or are you rendering values to a file?
Sure, print is not strictly necessary, in that you can implement it with file.write. But then file.write is not strictly necessary, because you can implement it with the operations in os for dealing with file descriptors. Really they're operations on different levels, and you should use whichever is most appropriate for your use (normally the level other nearby code is working on, or the highest level that doesn't get in your way).
I do feel that the print >> f syntax is fairly horrible, and is a really good example of why print should have been a function all along. This is much improved in Python 3. But even if you're writing Python 2 code that you're planning to port to Python 3, it is much easier to convert print >> f, thing1, thing2, thing3, ... to print(thing1, thing2, thing3, file=f) than it is to convert the circumlocution where you roll your own code to do the equivalent of print's rendering and then call f.write(text). I'm pretty sure the semi-automatic converter from Python 2 to Python 3 will even do the conversion for you, which it couldn't possibly do if you avoid the print >> f form.
Bottom line: use print to render values to stdout (or to a file). Use f.write to write text to a file.
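To make the "different levels" point concrete, here is a small Python 3 sketch that writes one line at each level: os.write on a file descriptor, f.write on a file object, and print. The temporary file and the strings are placeholders chosen for this example.

```python
import os
import tempfile

# Create a scratch file; the path is arbitrary and only used for this demo.
fd, path = tempfile.mkstemp(suffix=".txt")

# Lowest level: raw bytes written to a file descriptor.
os.write(fd, b"written with os.write\n")
os.close(fd)

with open(path, "a") as f:
    # Middle level: write an already-built string.
    f.write("written with f.write\n")
    # Highest level: render arbitrary values as text.
    print("rendered with print:", 1, 2.5, None, file=f)

with open(path) as f:
    lines = f.read().splitlines()

os.remove(path)
```

Only the print call accepts non-string values directly; the other two need the text (or bytes) prepared beforehand.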

Agree with @agf.
I prefer print(..., file=f) because of its flexibility.
with open('test.txt', 'w') as f:
    print('str', file=f)
    print(1, 20, file=f)
It is also easy to convert existing print statements to this form.
write accepts only strings.
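A quick way to see that last point, sketched for Python 3 with an in-memory buffer standing in for a real file:

```python
import io

buf = io.StringIO()

# print converts each argument with str() before writing.
print(1, 20, file=buf)

# write does no conversion; a non-string argument raises TypeError.
try:
    buf.write(20)
    write_failed = False
except TypeError:
    write_failed = True

# Converting explicitly makes write work.
buf.write(str(20) + "\n")
```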

You should not do either of those things. The most Pythonic thing to do is use the Python 3 print function (as opposed to the Python 2 print statement):
f = open(file, 'w')
print(string, file=f)
Of course the ideal way to do this is to just use Python 3. But if you're stuck using Python 2 you can turn it on using a future statement at the top of the file:
from __future__ import print_function
Note that this changes print in other ways, most obviously in that you need to add brackets around its arguments. But the changes are all improvements, which is the whole reason for the change in Python 3. While you're at it, consider using all the future statements to get as many backported improvements from Python 3 as possible.
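One of those improvements is that the separator and line terminator become ordinary keyword arguments. The sketch below is shown for Python 3, where no future statement is needed; the HTML fragment and the in-memory buffer are just placeholders for this example.

```python
import io

buf = io.StringIO()

# sep and end replace the default " " separator and "\n" terminator.
print("<li>", "item", "</li>", sep="", file=buf)
print("a", "b", "c", sep=", ", end="\n", file=buf)

result = buf.getvalue()
```

With sep="" the pieces of markup are joined without stray spaces, which can be handy when emitting HTML.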

The documentation on print might help explain this: print statement (Python 2.7 documentation).
print, by default, prints to standard output, which in fact is a "file-like" object (sys.stdout). The standard output itself has a write() method. Using print >> f seems to be an unnecessary abstraction.
Also, it seems too verbose to me. f.write() is fine.

Bottom line: do use file.write when writing to files.
The ">>" idiom for printing was borrowed from C++ in Python's early days and is rather unpythonic, so much so that it no longer exists in Python 3.x. There, print is a function instead of a statement and can still write to a file, but with no special syntax for that.
As @agf points out in his answer, using "print" to write to a file does more than simply calling write: it automatically calls str(obj) to get a string representation of the object, whereas .write requires that a (byte) string be passed as its parameter. In the Python world, "explicit is better than implicit", which is one more reason to go with file.write instead.

This is the preferred way, using a context manager:
with open(file, 'w') as f:
    f.write(string)
On Python 2 I prefer file.write because the >> syntax is gone in Python 3. On Python 3 you might prefer the print function instead, but note that it does some extra things (for example, it automatically converts numbers to strings for you).

Related

Writing unicode symbols to files (as opposed to unicode code)

I'm new to python and unicode is starting to give me headaches.
Currently I write to file like this:
import codecs
my_string = "马/馬"
f = codecs.open(local_filepath, encoding='utf-8', mode='w+')
f.write(my_string)
f.close()
And when I open the file with, e.g., Gedit, I see something like this:
\u9a6c/\u99ac\tm\u01ce
While I'd like to see exactly what I've written:
马/馬
I've tried a few different variations, like writing my_string.decode() or my_string.encode('utf-8') instead of just my_string. I know those two methods are opposites, but I was not sure which one I needed. Neither worked anyway.
If I manually write these symbols to a text file, then read the file with Python, re-write what I've just read back to the same file and save, the symbols get turned into codes like \u9a6c. Not sure if this is important; I figured I'd just mention it to help identify the problem.
Edit: the strings came from the repr method of SQLAlchemy objects, which turned out to be where the problem lay. I didn't mention it because it didn't occur to me that it could be related to the problem. Thanks again for your help!
From the comments it is now clear you are using either the repr() function or calling the object.__repr__() method directly.
Don't do that. You are writing debugging information to your file:
>>> my_string = u"马/馬"
>>> print repr(my_string)
u'\u9a6c/\u99ac'
The value produced is meant to be pastable back into a Python session so you can re-produce the exact same value, and as such it is ASCII-safe (so it can be used in Python 2 source code without encoding issues).
From the repr() documentation:
For many types, this function makes an attempt to return a string that would yield an object with the same value when passed to eval(), otherwise the representation is a string enclosed in angle brackets that contains the name of the type of the object together with additional information often including the name and address of the object.
Write the Unicode objects to your file directly instead; codecs.open() handles encoding to UTF-8 correctly if you do.
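The difference can be sketched in Python 3 as follows. Note the assumptions: in Python 3, repr() no longer escapes non-ASCII characters, so ascii() is used here to reproduce the escaped form shown above, and the built-in open() with an encoding argument takes the place of codecs.open(). The temporary file is just a placeholder.

```python
import os
import tempfile

my_string = "马/馬"

# ascii() gives the escaped, debugging-oriented form in Python 3
# (Python 2's repr() behaved this way for unicode objects).
escaped = ascii(my_string)

fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)

# Write the text itself; open() encodes it as UTF-8.
with open(path, "w", encoding="utf-8") as f:
    f.write(my_string)

# Read the raw bytes back to see what actually landed on disk.
with open(path, "rb") as f:
    raw_bytes = f.read()

os.remove(path)
```

The file contains the UTF-8 bytes of the characters themselves, while the escaped string is what you would get by writing the debugging representation instead.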

Change default encoding only for printing

Is there a way I can change the default encoding in Python only for printing?
Can the behaviour of the print statement be changed in general?
I don't want to do it with sys.setdefaultencoding(), because this would change it for the whole script, but I don't know if every module I use supports unicode...
I know I could do it with print u'äöü'.encode('utf-8') but it would be awful to use it every time...
Any suggestions?
While I don't think you can do it just for printing (using print explicitly), you probably can accomplish what you want using
import codecs, sys
sys.stdout = codecs.getwriter("utf-8")(sys.stdout)
This changes the encoding for all "normal" program output. If you're not familiar with them, you may want to read this article on standard streams.
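To see what the wrapper does without touching the real sys.stdout, here is a sketch that wraps an in-memory byte stream instead. The io.BytesIO object is an assumption made for the demo; it stands in for a byte-oriented stdout.

```python
import codecs
import io

# Stand-in for a byte-oriented stdout (an assumption made for this demo).
raw = io.BytesIO()

# codecs.getwriter returns a StreamWriter class; calling it wraps the stream.
utf8_out = codecs.getwriter("utf-8")(raw)
utf8_out.write(u"äöü\n")

encoded = raw.getvalue()
```

Everything written through the wrapper is encoded to UTF-8 before it reaches the underlying stream. On Python 3.7 and later, sys.stdout.reconfigure(encoding="utf-8") is another option for the real stdout.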

Reading a line from standard input in Python

What (if any) are the differences between the following two methods of reading a line from standard input: raw_input() and sys.stdin.readline()? And in which cases is one of these methods preferable over the other?
raw_input() takes an optional prompt argument. It also strips the trailing newline character from the string it returns, and supports history features if the readline module is loaded.
readline() takes an optional size argument, does not strip the trailing newline character and does not support history whatsoever.
Since they don't do the same thing, they're not really interchangeable. I personally prefer using raw_input() to fetch user input, and readline() to read lines out of a file.
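The newline difference can be sketched without any keyboard input by reading from an in-memory stream. This is a Python 3 sketch; both helper names are made up for the example, and read_like_raw_input only imitates the newline stripping, not the prompt or history support.

```python
import io


def read_like_readline(stream):
    """Return the next line as-is, trailing newline included."""
    return stream.readline()


def read_like_raw_input(stream):
    """Hypothetical helper: return the next line without its trailing newline."""
    line = stream.readline()
    return line[:-1] if line.endswith("\n") else line


# io.StringIO stands in for sys.stdin so the example needs no keyboard input.
fake_stdin = io.StringIO("first\nsecond\n")
kept = read_like_readline(fake_stdin)
stripped = read_like_raw_input(fake_stdin)
```

Another difference worth knowing: at end of input, raw_input() raises EOFError, while readline() returns an empty string.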
"However, from the point of view of many Python beginners and educators, the use of sys.stdin.readline() presents the following problems:
Compared to the name "raw_input", the name "sys.stdin.readline()" is clunky and inelegant.
The names "sys" and "stdin" have no meaning for most beginners, who are mainly interested in what the function does, and not where in the package structure it is located. The lack of meaning also makes it difficult to remember: is it "sys.stdin.readline()", or " stdin.sys.readline()"? To a programming novice, there is not any obvious reason to prefer one over the other. In contrast, functions simple and direct names like print, input, and raw_input, and open are easier to remember." from here: http://www.python.org/dev/peps/pep-3111/

Shortest Python Quine?

Python 2.x (30 bytes):
_='_=%r;print _%%_';print _%_
Python 3.x (32 bytes)
_='_=%r;print(_%%_)';print(_%_)
Is this the shortest possible Python quine, or can it be done better? This one seems to improve on all the entries on The Quine Page.
I'm not counting the trivial 'empty' program.
I'm just going to leave this here (save as exceptionQuine.py):
  File "exceptionQuine.py", line 1
    File "exceptionQuine.py", line 1
    ^
IndentationError: unexpected indent
Technically, the shortest Python quine is the empty file. Apart from this trivial case:
Since Python's print automatically appends a newline, the quine is actually _='_=%r;print _%%_';print _%_\n (where \n represents a single newline character in the file).
Both
print open(__file__).read()
and anything involving import are not valid quines, because a quine by definition cannot take any input. Reading an external file is considered taking input, and thus a quine cannot read a file -- including itself.
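The point about the trailing newline can be checked mechanically. The sketch below runs the Python 3 variant of the quine from the question and compares its output with its source; the variable names are chosen for this example.

```python
import contextlib
import io

# The Python 3 quine from the question, without a trailing newline.
source = "_='_=%r;print(_%%_)';print(_%_)"

captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    # Run the quine in a fresh namespace and capture what it prints.
    exec(source, {})

output = captured.getvalue()
```

The output equals the source plus one newline, which is why the file itself has to end with a newline for the program to be an exact quine.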
For the record, technically speaking, the shortest possible quine in python is a blank file, but that is sort of cheating too.
In a slightly non-literal approach, taking 'shortest' to mean short in terms of the number of statements as well as just the character count, I have one here that doesn't include any semicolons.
print(lambda x:x+str((x,)))('print(lambda x:x+str((x,)))',)
In my mind this contends, because it's all one function, whereas others are multiple. Does anyone have a shorter one like this?
Edit: User flornquake made the following improvement (backticks for repr() to replace str() and shave off 6 characters):
print(lambda x:x+`(x,)`)('print(lambda x:x+`(x,)`)',)
Even shorter:
print(__file__[:-3])
And name the file print(__file__[:-3]).py (Source)
Edit: actually,
print(__file__)
named print(__file__) works too.
Python 3.8
exec(s:='print("exec(s:=%r)"%s)')
Here is another similar to postylem's answer.
Python 3.6:
print((lambda s:s%s)('print((lambda s:s%%s)(%r))'))
Python 2.7:
print(lambda s:s%s)('print(lambda s:s%%s)(%r)')
I would say:
print open(__file__).read()
As of Python 3.8 I have a new quine! I'm quite proud of it because until now I have never created my own. I drew inspiration from _='_=%r;print(_%%_)';print(_%_), but made it into a single function (with only 2 more characters). It uses the new walrus operator.
print((_:='print((_:=%r)%%_)')%_)
This one is the least cryptic; the core is a.format(a)
a="a={1}{0}{1};print(a.format(a,chr(34)))";print(a.format(a,chr(34)))
I am strictly against your solution.
The formatting operator % is definitely too advanced a high-level language feature. If such constructs are allowed, I would say that import must be allowed as well. Then I can construct a shorter quine by introducing some other high-level language construct (which, BTW, is much less powerful than the % operator, so it is less advanced):
Here is a Unix shell script creating such a quine.py file and checking it really works:
echo 'import x' > quine.py
echo "print 'import x'" > x.py
python quine.py | cmp - quine.py; echo $?
outputs 0
Yes, that's cheating, like using %. Sorry.

Awk, bash or python for converting a regular file?

I have a text file with lots of lines and with this structure:
[('name_1a',
'name_1b',
value_1),
('name_2a',
'name_2b',
value_2),
.....
.....
('name_XXXa',
'name_XXXb',
value_XXX)]
I would like to convert it to:
name_1a, name_1b, value_1
name_2a, name_2b, value_2
......
name_XXXa, name_XXXb, value_XXX
I wonder what would be the best way, whether awk, python or bash.
Thanks
Jose
Have you tried evaluating it in Python? It looks like a list of tuples to me.
eval(your_string)
Note, it's massively unsafe! If there's code in there to delete your hard disk, evaluating it will run that code!
I would like to use Python:
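with open('filename.txt') as f:
    lines = f.readlines()
n = len(lines)  # assumed to be a multiple of 3
for i in range(0, n, 3):
    # Strip quotes, brackets, parentheses, commas and whitespace from each field.
    name1 = lines[i].strip("'(),[] \n\r")
    name2 = lines[i+1].strip("'(),[] \n\r")
    value = lines[i+2].strip("'(),[] \n\r")
    print(', '.join((name1, name2, value)))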
It looks like legal Python. You might be able to just import it as a module and then write it back out after formatting it.
Oh boy, here is a job for ast.literal_eval
(literal_eval is safer than eval, since it restricts the input string to literals such as strings, numbers, tuples, lists, dicts, booleans and None):
import ast
filename = 'in'
with open(filename, 'r') as f:
    contents = f.read()
data = ast.literal_eval(contents)
for elt in data:
    print(', '.join(map(str, elt)))
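To illustrate the safety claim, here is a sketch that parses a small inline sample and then feeds literal_eval something that is not a literal. The sample data uses integers in place of the value_N placeholders, which is an assumption about what the real file contains.

```python
import ast

text = "[('name_1a',\n 'name_1b',\n 1),\n ('name_2a',\n 'name_2b',\n 2)]"

# literal_eval only accepts Python literals, so this parses to a list of tuples.
data = ast.literal_eval(text)

# Anything that is not a literal, such as a function call, is rejected
# instead of being executed.
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True

rows = [', '.join(map(str, elt)) for elt in data]
```

With eval(), the second string would have been executed; with literal_eval it raises ValueError before anything runs.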
Here's one way to do it with (g)awk:
$ awk -vRS=")," ' { gsub(/\n|[\047\]\[)(]/,"") } 1' file
name_1a,name_1b,value_1
name_2a,name_2b,value_2
name_XXXa,name_XXXb,value_XXX
Awk is typically line oriented, and bash is a shell with a limited number of string manipulation functions. It really depends on where your strength as a programmer lies, but all other things being equal, I would choose Python.
Did you ever consider that by redirecting the time it took to post this on SO, you could have had it done?
"AWK is a language for processing files of text. A file is treated as a sequence of records, and by default each line is a record. Each line is broken up into a sequence of fields, so we can think of the first word in a line as the first field, the second word as the second field, and so on. An AWK program is of a sequence of pattern-action statements. AWK reads the input a line at a time. A line is scanned for each pattern in the program, and for each pattern that matches, the associated action is executed." - Alfred V. Aho[2]
Asking what's the best language for doing a given task is a very different question from, say, asking: 'what's the best way of doing a given task in a particular language'. The first, which is what you're asking, is in most cases entirely subjective.
Since this is a fairly simple task, I would suggest going with what you know (unless you're doing this for learning purposes, which I doubt).
If you know any of the languages you suggested, go ahead and solve this in a matter of minutes. If you know none of them, now enters the subjective part, I would suggest learning Python, since it's so much more fun than the other 2 ;)
If the values are legal Python values, you can take advantage of eval(), since your data is a legal Python data structure. The following would work if the values are integers; otherwise you might have to massage the print call a bit:
input = """[('name_1a',
'name_1b',
1),
('name_2a',
'name_2b',
2),
('name_XXXa',
'name_XXXb',
3)]"""
for e in eval(input):
    print '%s,%s,%d' % e
P.S. using eval() is quite controversial since it will execute any valid python code that you pass into it, so take care.
