Change default encoding only for printing - python

Is there a way I can change the default encoding in Python only for printing?
Can the behaviour of the print statement be changed in general?
I don't want to do it with sys.setdefaultencoding(), because this would change it for the whole script, but I don't know if every module I use supports unicode...
I know I could do it with print u'äöü'.encode('utf-8'), but it would be awful to use that every time...
Any suggestions?

While I don't think you can do it just for printing (using print explicitly), you probably can accomplish what you want using
import codecs
import sys
sys.stdout = codecs.getwriter("utf-8")(sys.stdout)
This changes the encoding for all "normal" program output. If you're not familiar with standard streams, you may want to read this article on them.
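In Python 3 the same idea still works, but the writer has to wrap the underlying binary buffer (sys.stdout.buffer) rather than sys.stdout itself, because sys.stdout is already a text stream; on Python 3.7+ sys.stdout.reconfigure(encoding="utf-8") is another option. The sketch below is a Python 3 illustration that uses an in-memory io.BytesIO in place of the real stdout so the encoded bytes can be inspected; utf8_writer is a hypothetical helper name.

```python
import codecs
import io


def utf8_writer(byte_stream):
    """Wrap a binary stream so that any str written to it is encoded as UTF-8."""
    return codecs.getwriter("utf-8")(byte_stream)


# An in-memory byte buffer stands in for sys.stdout.buffer here.
buf = io.BytesIO()
out = utf8_writer(buf)

# print() hands str objects to out.write(); the StreamWriter encodes them
# before they reach the underlying byte buffer.
print("äöü", file=out)

encoded = buf.getvalue()  # the raw bytes that would have gone to the terminal
```

With a real terminal, the equivalent assignment would be sys.stdout = utf8_writer(sys.stdout.buffer).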

Related

Formatting text that is meant to be replaced

This is a rather generic question, but I have a textfile that I want to edit using a script.
What are some ways to format text, so that it will visually stand out but still be recognized by my script?
It works fine when I use text_to_be_replaced, but it is hard to find when you have a large file.
I tried searching, and it seems that the common ways are:
%text_to_be_replaced%
<text_to_be_replaced>
$(text_to_be_replaced)
But maybe there is a commonly used/widely accepted way to format text for visibility?
The language the script is written in is Python, if that matters... but I'm looking for a more-or-less generic solution that will work 90% of the time.
I'm not aware of any generic standard here, but if it's meant to be replaced, you can use the new string formatting method as follows:
string = 'some text {add_text_here} some more text'
Then to replace it when you need to:
value = 'formatted'
string = string.format(add_text_here=value)
Now print it out:
>>> string
'some text formatted some more text'
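If a whole file uses such {placeholders}, one approach is to read its text once and pass all replacement values in a dict with format(**values). Two things worth knowing, since the text being edited may contain ordinary braces: str.format treats {{ and }} as escaped literal braces, and a placeholder with no matching value raises KeyError. A Python 3 sketch; the template text and the values dict are invented for illustration.

```python
template = "Title: {title}\nBody: {body}\nLiteral braces: {{not a placeholder}}\n"

# Hypothetical replacement values; in practice these come from the script.
values = {"title": "Report", "body": "All good"}

result = template.format(**values)

# A placeholder without a value is reported as a KeyError naming the field.
try:
    "{missing}".format(**values)
except KeyError as exc:
    missing_key = exc.args[0]
```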
In fact, this is quite neat, as the curly {brackets} around the text that needs to be replaced may also make it stand out a little.
At first I thought that {{curly braces}} would be fine, but then I went with $ALLCAPS.
First of all, caps really stand out, while lowercase may be confused with the rest of the code.
And while it $REALLYSTANDSOUT, it shouldn't cause any problems, since it's just a "bookmark" in a text file, and will be replaced with the appropriate stuff determined by the script.
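The $ALLCAPS convention happens to match Python's standard string.Template class, which replaces $identifier (or ${identifier}) placeholders. substitute() raises KeyError when a placeholder has no value, while safe_substitute() leaves it in place, which can be handy when a file is filled in over several passes. A Python 3 sketch with made-up placeholder names:

```python
from string import Template

text = Template("Dear $NAME,\nyour order $ORDER_ID has shipped.\n")

# substitute() requires a value for every placeholder.
filled = text.substitute(NAME="Alice", ORDER_ID="1234")

# safe_substitute() leaves unknown placeholders untouched instead of raising.
partial = text.safe_substitute(NAME="Alice")
```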

Writing unicode symbols to files (as opposed to unicode code)

I'm new to python and unicode is starting to give me headaches.
Currently I write to file like this:
import codecs
my_string = u"马/馬"
f = codecs.open(local_filepath, encoding='utf-8', mode='w+')
f.write(my_string)
f.close()
And when I open the file with, e.g., Gedit, I see something like this:
\u9a6c/\u99ac\tm\u01ce
While I'd like to see exactly what I've written:
马/馬
I've tried a few different variations, like writing my_string.decode() or my_string.encode('utf-8') instead of just my_string. I know those two methods are opposites, but I was not sure which one I needed. Neither worked anyway.
If I manually write these symbols to a text file, then read the file with Python, write what I've just read back to the same file and save, the symbols get turned into the code \u9a6c. Not sure if this is important; I figured I'd mention it to help identify the problem.
Edit: the strings came from the repr method of SQLAlchemy objects, which turned out to be where the problem lay. I didn't mention it because it just didn't occur to me that it could be related to the problem. Thanks again for your help!
From the comments it is now clear you are using either the repr() function or calling the object.__repr__() method directly.
Don't do that. You are writing debugging information to your file:
>>> my_string = u"马/馬"
>>> print repr(my_string)
u'\u9a6c/\u99ac'
The value produced is meant to be pastable back into a Python session so you can re-produce the exact same value, and as such it is ASCII-safe (so it can be used in Python 2 source code without encoding issues).
From the repr() documentation:
For many types, this function makes an attempt to return a string that would yield an object with the same value when passed to eval(), otherwise the representation is a string enclosed in angle brackets that contains the name of the type of the object together with additional information often including the name and address of the object.
Write the Unicode objects to your file directly instead; codecs.open() handles encoding to UTF-8 correctly if you do.
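In Python 3 the usual way is the built-in open() with an encoding argument. Note that Python 3's repr() no longer escapes printable non-ASCII characters; ascii() produces the escaped form that Python 2's repr() gave for unicode objects. This sketch writes to a temporary file (the path is only for illustration) and reads the raw bytes back.

```python
import os
import tempfile

my_string = "马/馬"

# ascii() gives the escaped, ASCII-only debugging form (like Python 2's repr()
# of a unicode object); this is NOT what should end up in the file.
debug_form = ascii(my_string)
print(debug_form)  # '\u9a6c/\u99ac'

# Write the text itself; open() encodes it to UTF-8 on the way out.
path = os.path.join(tempfile.mkdtemp(), "out.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write(my_string)

# Reading the raw bytes back shows real UTF-8, not \u escapes.
with open(path, "rb") as f:
    raw_bytes = f.read()
```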

^H ^? in python

Some terminals will send ^? as backspace, some other terminals will send ^H.
Most of the terminals can be configured to change their behavior.
I do not want to deal with all the possible combinations, but I would like to accept both ^? and ^H as a backspace from Python.
By doing
os.system("stty erase '^?'")
I accept the first option, and with
os.system("stty erase '^H'")
I accept the second one, but then the first is no longer available.
I would like to use
raw_input("userinput>>")
to grab the input.
The only way I was able to figure out is to implement my own shell that works not on "raw based input" but on "char based input".
Any better (and quicker) idea?
The built-in function raw_input() (or input() in Python 3) will automatically use the readline library once you have imported it. This gives you a nice, full-featured line editor, and it is probably your best bet on platforms where it is available, as long as you don't mind Readline having a contagious licence (GPL).
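A minimal Python 3 sketch of that approach: importing readline is enough for input() to pick up line editing, and the import is wrapped in try/except because the module is not available on every platform (standard Windows builds, for example). Whether both ^H and ^? then act as backspace is decided by readline's key bindings and the terminal settings, so treat that as an assumption to check on your own terminal. read_user_line is a hypothetical helper name.

```python
try:
    import readline  # imported only for its side effect on input()
except ImportError:  # e.g. platforms without GNU readline or libedit
    readline = None


def read_user_line(prompt="userinput>> "):
    """Read one line; when readline is loaded, input() uses its line editor."""
    return input(prompt)
```

Calling read_user_line() then plays the role of the raw_input("userinput>>") call from the question.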
I'm not sure I understand your question exactly. IMO, you need a way to read line-based text (including some special characters) from the console into your program.
Whatever method you use, if this character has a special meaning in different consoles, you will face a console-specific (not only system-specific) problem: all text typed into the console is stored in a buffer first, then shown on screen, and finally processed and sent to your program. Another way around this problem is to use a raw, line-obtaining console environment.
You can add a special method (a decorator) that wraps raw_input() or a similar input method to process the special characters.
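Once that question is solved, this snippet can deal with the input:
def pre():
    textline = raw_input()
    # ^? (DEL, "\x7f") should be replaced with ^H (BS, "\x08").
    # str.replace() returns a new string, so the result must be assigned.
    textline = textline.replace("\x7f", "\x08")
    return textline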
For more speed, calling some OS-dependent system function might be an idea. In practice, though, I/O in Python is fast enough for common jobs.
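If a line has already been read and may still contain literal ^H ("\x08") or ^? ("\x7f") characters, for instance because the terminal's erase setting covers only one of them, both can be interpreted after the fact. This is a hypothetical Python 3 post-processing helper, not something raw_input() or input() does itself, and it assumes the stray control characters actually reach the program.

```python
BACKSPACE_CHARS = {"\x08", "\x7f"}  # ^H (BS) and ^? (DEL)


def apply_backspaces(line):
    """Treat both ^H and ^? in an already-read line as 'delete previous char'."""
    out = []
    for ch in line:
        if ch in BACKSPACE_CHARS:
            if out:  # ignore a backspace at the very start of the line
                out.pop()
        else:
            out.append(ch)
    return "".join(out)
```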
To fix ^? for erase, run stty erase ^H.

f.write vs print >> f

There are at least two ways to write to a file in python:
f = open(file, 'w')
f.write(string)
or
f = open(file, 'w')
print >> f, string # in python 2
print(string, file=f) # in python 3
Is there a difference between the two? Or is either one more Pythonic? I'm trying to write a bunch of HTML to a file, so I need a bunch of write/print statements throughout my file (but I don't need a templating engine).
print does things file.write doesn't, allowing you to skip string formatting for some basic things.
It inserts spaces between arguments and appends the line terminator.
print "a", "b" # prints something like "a b\n"
It calls the __str__ or __repr__ special methods of an object to convert it to a string.
print 1 # prints something like "1\n"
You would have to manually do these things if you used file.write instead of print.
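To make the difference concrete, here is a Python 3 sketch that writes the same values both ways into in-memory io.StringIO buffers standing in for a real file.

```python
import io

values = ("a", "b", 1)

# print(): converts each value with str(), separates with spaces, adds "\n".
printed = io.StringIO()
print(*values, file=printed)

# f.write(): the same result has to be built by hand, because write() takes one str.
written = io.StringIO()
written.write(" ".join(str(v) for v in values) + "\n")
```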
I disagree somewhat with several of the opinions expressed here, that print >> f is redundant and should be avoided in favour of f.write.
print and file.write are quite different operations. file.write just directly writes a string to a file. print is more like "render values to stdout as text". Naturally, the result of rendering a string as text is just the string, so print >> f, my_string and f.write(my_string) are nearly interchangeable (except for the addition of a newline). But your choice between file.write and print should normally be based on what you're doing; are you writing a string to a file, or are you rendering values to a file?
Sure, print is not strictly necessary, in that you can implement it with file.write. But then file.write is not strictly necessary either, because you can implement it with the operations in os for dealing with file descriptors. Really, they're operations on different levels, and you should use whichever is most appropriate for your use (normally the level other nearby code is working on, or the highest level that doesn't get in your way).
I do feel that the print >> f syntax is fairly horrible, and is a really good example of why print should have been a function all along. This is much improved in Python 3. But even if you're writing Python 2 code that you're planning to port to Python 3, it is much easier to convert print >> f, thing1, thing2, thing3, ... to print(thing1, thing2, thing3, file=f) than it is to convert the circumlocution where you roll your own code to do the equivalent of print's rendering and then call f.write(text). I'm pretty sure the semi-automatic converter from Python 2 to Python 3 will even do the conversion for you, which it couldn't possibly do if you avoid the print >> f form.
Bottom line: use print to render values to stdout (or to a file). Use f.write to write text to a file.
Agree with @agf.
I prefer print(..., file=f) because of its flexibility.
with open('test.txt', 'w') as f:
    print('str', file=f)
    print(1, 20, file=f)
It is also easy to convert existing print statements.
write accepts only strings.
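Part of that flexibility is the sep and end keyword arguments of the Python 3 print() function, whereas write() accepts only strings. A small sketch, again using io.StringIO in place of a real file:

```python
import io

f = io.StringIO()  # stands in for a file opened with open(..., "w")

# print() accepts any objects and lets you change the separator and line ending.
print(1, 20, sep=",", end=";\n", file=f)
print("str", file=f)

# write() only accepts str in text mode; anything else raises TypeError.
try:
    f.write(1)
except TypeError:
    write_rejected_int = True
else:
    write_rejected_int = False
```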
You should not do either of those things. The most Pythonic thing to do is use the Python 3 print function (as opposed to the Python 2 print statement):
f = open(file, 'w')
print(string, file=f)
Of course the ideal way to do this is to just use Python 3. But if you're stuck using Python 2 you can turn it on using a future statement at the top of the file:
from __future__ import print_function
Note that this changes print in other ways, most obviously in that you need to add parentheses around its arguments. But the changes are all improvements, which is the whole reason for the change in Python 3. While you're at it, consider using all the future statements to get as many backported improvements from Python 3 as possible.
The documentation on print might help explain this: print statement (Python 2.7 documentation).
print, by default, prints to standard output, which in fact is a "file-like" object (sys.stdout). The standard output itself has a write() method. Using print >> f seems to be an unnecessary abstraction.
Also, it seems too verbose to me. f.write() is fine.
Bottom line: do use file.write when writing to files.
The ">>" idiom for printing was borrowed from C++ in Python's early days, and is rather unpythonic, so much so that it no longer exists in Python 3.x, where print is a function instead of a statement and can be used to write to a file, but with no special syntax for that.
As @agf points out in his answer, using "print" to write to a file does more than simply calling write: it automatically calls str(obj) to get a string representation of the object, whereas .write requires a (byte) string to be passed as a parameter. In the Python world "explicit is better than implicit", so that is one more reason to go with file.write instead.
This is the preferred way, using context managers:
with open(file, 'w') as f:
    f.write(string)
On Python 2 I prefer file.write because the >> syntax is deprecated. For Python 3 you might prefer to use the print function instead, which, you should note, does some extra things (for example, it automatically converts numbers to strings for you).

In Python what's the best way to emulate Perl's __END__?

Am I correct in thinking that Python doesn't have a direct equivalent for Perl's __END__?
print "Perl...\n";
__END__
End of code. I can put anything I want here.
One thought that occurred to me was to use a triple-quoted string. Is there a better way to achieve this in Python?
print "Python..."
"""
End of code. I can put anything I want here.
"""
The __END__ block in Perl dates from a time when programmers had to work with data from the outside world and liked to keep examples of it in the program itself.
Hard to imagine, I know.
It was useful, for example, if you had a moving target like a hardware log file whose messages mutated due to firmware updates, where you wanted to compare old and new versions of the line, or to keep notes not strictly related to the program's operations ("Code seems slow on day x of month every month"), or, as mentioned above, a reference set of data to run the program against. Telcos are an example of an industry where this was a frequent requirement.
Lastly, Python's cult-like restrictiveness seems to have a real and tiresome effect on the mindset of its advocates: if your only response to a question is "Why would you want to do that when you could do X?" when X is not as useful, please keep quiet++.
The triple-quote form you suggested will still create a python string, whereas Perl's parser simply ignores anything after __END__. You can't write:
"""
I can put anything in here...
Anything!
"""
import os
os.system("rm -rf /")
Comments are more suitable in my opinion.
#__END__
#Whatever I write here will be ignored
#Woohoo !
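If the text after the marker should also be readable by the script (Perl exposes what follows __END__ through its DATA filehandle), one possible sketch builds on this commented #__END__ convention: read the source and return whatever follows the marker line, minus the leading "#". read_end_section and the exact "#__END__" marker are hypothetical choices, not a Python feature; this is Python 3.

```python
def read_end_section(source, marker="#__END__"):
    """Return the lines after `marker`, with one leading '#' stripped from each.

    `marker` must appear on a line by itself, exactly as given.
    """
    lines = source.splitlines()
    try:
        start = lines.index(marker) + 1
    except ValueError:
        return []  # no marker found: nothing after the "end" of the code
    return [line[1:] if line.startswith("#") else line for line in lines[start:]]


example_source = (
    'print("Python...")\n'
    "#__END__\n"
    "#Whatever I write here will be ignored\n"
    "#Woohoo !\n"
)
notes = read_end_section(example_source)
```

In a real script you would pass it the script's own source, for example open(__file__, encoding="utf-8").read(), which assumes the script is run from a .py file on disk.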
What you're asking for does not exist.
Proof: http://www.mail-archive.com/python-list#python.org/msg156396.html
A simple solution is to escape any " as \" and use a normal multi-line string; see the official docs: http://docs.python.org/tutorial/introduction.html#strings
( Also, atexit doesn't work: http://www.mail-archive.com/python-list#python.org/msg156364.html )
Hm, what about sys.exit(0)? (Assuming you import sys above it, of course.)
As to why it would be useful, sometimes I sit down to do a substantial rewrite of something and want to mark my "good up to this point" place.
By using sys.exit(0) in a temporary manner, I know nothing below that point will get executed, therefore if there's a problem (e.g., server error) I know it had to be above that point.
I like it slightly better than commenting out the rest of the file, partly because commenting gives more chances to make a mistake and accidentally uncomment something (a stray key press at the beginning of a line), and also because it seems better to insert one line (which will later be removed) than to modify many lines that will then have to be un-modified later.
But yeah, this is splitting hairs; commenting works great too... assuming your editor supports easily commenting out a region, of course; if not, sys.exit(0) all the way!
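One detail behind the sys.exit(0) trick: it works by raising the SystemExit exception rather than stopping the interpreter on the spot, so finally blocks still run, and an overly broad except clause (a bare except: or except BaseException:) around it could swallow it. The Python 3 sketch below catches SystemExit deliberately, only to show that the line after the marker never runs.

```python
import sys

executed = []

try:
    executed.append("code above the marker")
    sys.exit(0)  # temporary "good up to this point" marker
    executed.append("code below the marker")  # never reached
except SystemExit as exc:
    # sys.exit() raises SystemExit; catching it here is only for demonstration.
    exit_code = exc.code
```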
I use __END__ all the time, for several of the reasons given. I've been doing it for so long now that I put it in by force of habit (usually preceded by an exit('0');), along with BEGIN {} / END {} routines. It is a shame that Python doesn't have an equivalent, but I just comment out the lines at the bottom: extraneous, but that's about what you get with one-way-to-rule-them-all languages.
Python does not have a direct equivalent to this.
Why do you want it? It doesn't sound like a really great thing to have when there are more consistent ways, like putting the text at the end as comments. (That's how we include arbitrary text in Python source files; triple-quoted strings are for making multi-line strings, not for non-code-related text.)
Your editor should be able to make using many lines of comments easy for you.
