It appears that Python's io.StringIO adds an extra newline at the end when I'm calling its getvalue method.
For the following code:
import io
s = io.StringIO()
s.write("1\n2\n3\n4\n5\n")
res = s.getvalue()
s.close()
print(res)
What I'm getting is this, with an extra newline at the end:
1
2
3
4
5
I checked the output with a hex editor, and I'm sure there's an extra newline.
The documentation says:
The newline argument works like that of TextIOWrapper. The default is to consider only \n characters as ends of lines and to do no newline translation. If newline is set to None, newlines are written as \n on all platforms, but universal newline decoding is still performed when reading.
And I don't recall the write method appending a newline per call.
So why is it adding newlines? I'm writing a script, so I would like to make sure that its behavior is consistent.
StringIO isn't doing this; print is.
print prints all its arguments, separated by sep (by default a space), and ending with end (by default a newline). You can suppress that by doing:
print(res, end="")
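To see where the extra newline really comes from, here is a small Python 3 sketch that compares getvalue() with what print() actually emits. It captures print's output in a second StringIO through print's file= parameter; the variable names are just for illustration.

```python
import io

s = io.StringIO()
s.write("1\n2\n3\n4\n5\n")
res = s.getvalue()
s.close()

# getvalue() returns exactly what was written: five "\n", no extra one
print(repr(res))  # '1\n2\n3\n4\n5\n'

# Capture what print() sends out by pointing file= at another StringIO
with_default_end = io.StringIO()
print(res, file=with_default_end)    # default end="\n" appends a sixth newline

suppressed = io.StringIO()
print(res, end="", file=suppressed)  # nothing appended
```

with_default_end ends up holding res plus one "\n", while suppressed holds exactly res, so the StringIO contents are consistent and only print's end argument differs.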
Related
What is the difference between print("\n") and print("\5")?
I tried the following in a Python shell.
Why does print("\5") output a newline?
>>> print("\n")
>>> print("\5")
>>>
But when I tried:
print("\4")
print("\6")
it prints what looks like binary data.
Whenever you use print in Python, it puts a newline at the end. What you should pay attention to is how many newlines are in the output.
"\5" is just a character (it's the control character ENQ in ASCII; while it is technically non-printable, my terminal renders it as ♣); printing it outputs whatever your terminal uses to render it, followed by a newline. print("") outputs one newline; print("\n"), by contrast, outputs two.
If your terminal can't or won't render \5 (it is a non-printable character, after all), print("\5") will look the same as print("").
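A quick way to count the newlines without relying on how the terminal renders control characters is to capture print's output and look at its repr(). This is a minimal sketch; printed() is a hypothetical helper, not part of Python.

```python
import io

def printed(*args, **kwargs):
    # Hypothetical helper: return exactly the text print() would write
    buf = io.StringIO()
    print(*args, file=buf, **kwargs)
    return buf.getvalue()

print(repr(printed("\n")))  # '\n\n'   (your newline plus print's own)
print(repr(printed("\5")))  # '\x05\n' (ENQ plus print's newline)
print(repr(printed("")))    # '\n'
```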
I am reading the files in a folder in Python. I want to print each file's contents separated by a single empty line.
So, at the end of the for loop I am adding print("\n"), which adds two empty lines after each file's contents. How can I resolve this problem?
print()
will print a single newline in Python 3 (in Python 2, use a bare print statement; print() there would print an empty tuple, ()).
The docs for print() describe this behavior (notice the end parameter), and this question discusses disabling it.
Because print automatically adds a newline, you don't have to add one manually; just call it with an empty string:
print("")
From help(print) (I think you're using Python 3):
print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)
Prints the values to a stream, or to sys.stdout by default.
Optional keyword arguments:
file: a file-like object (stream); defaults to the current sys.stdout.
sep: string inserted between values, default a space.
end: string appended after the last value, default a newline.
flush: whether to forcibly flush the stream.
So print()'s default end argument is \n, which means you don't need to add a \n yourself as in print('\n'); that prints two newlines. Just use print().
By the way, if you're using Python 2, use a bare print statement.
print already appends a \n, so you don't need to add one yourself.
Or, if you want to be really explicit, use
import sys
sys.stdout.write('\n')
The write method doesn't append a line break, so this is arguably more intuitive than an empty print.
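Putting the answers above together, here is a Python 3 sketch of the original task. print_files is a hypothetical helper, and it assumes the folder holds only regular text files. Each file's trailing newlines are stripped so that a file ending in "\n" does not produce an extra blank line, and a bare print() goes between files.

```python
import os
import tempfile

def print_files(folder):
    # Hypothetical helper: print every file in folder, one empty line between them
    for i, name in enumerate(sorted(os.listdir(folder))):  # sorted for a stable order
        if i > 0:
            print()  # exactly one empty line; print("\n") would give two
        with open(os.path.join(folder, name)) as f:
            # rstrip so a trailing "\n" in the file doesn't add another blank line
            print(f.read().rstrip("\n"))

# Demo with two throwaway files
with tempfile.TemporaryDirectory() as folder:
    for name, text in [("a.txt", "One\nTwo\n"), ("b.txt", "Three\n")]:
        with open(os.path.join(folder, name), "w") as f:
            f.write(text)
    print_files(folder)
```

Printing the separator before every file except the first avoids a dangling blank line after the last file.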
I have a JSON file with several keys. I want to take the value of one of the keys and write that string to a file. The string is originally unicode, so I do s.encode('utf-8').
There is another key in that JSON whose value I write to another file (this is a machine learning task: I write the original string to one file and the features to another). The problem is that, in the end, the file with the unicode strings has more lines (as counted by wc -l) than the other, and this misleads my tool: it crashes saying the sizes are not the same.
Code for reference:
for line in input_file:
    j = json.loads(line)
    text = j['text']
    label = j[t]
    output_file.write(str(label) + '\t' + text.encode('utf-8') + '\n')
    norm_file.write(j['normalized'].encode('utf-8') + '\n')
The difference when using wc -l:
16862965
This is the number of lines I expect, but what I get is
16878681
which is higher. So I wrote a script to see how many output labels there actually are:
import sys
c = 0
with open(sys.argv[1]) as input_file:
    for line in input_file:
        p = line.split('\t')
        if p[0] not in ("good", "bad"):
            print p
        else:
            c += 1
print c
And, lo and behold, the count is 16862965, which means the extra lines are wrong. I printed them out and got a bunch of bare newline characters ('\n'). So I guess my question is: what am I missing when dealing with Unicode like this?
Should I have stripped all leading and trailing whitespace (not that there is any in the string)?
JSON strings can't contain literal newlines, e.g.:
not_a_json_string = '"\n"' # in Python source
json.loads(not_a_json_string) # raises ValueError
but they can contain escaped newlines:
json_string = r'"\n"' # raw-string literal (== '"\\n"')
s = json.loads(json_string)
i.e., the original text (json_string) has no newline in it (it has a backslash followed by an n character: two characters), but the parsed result does contain a newline: '\n' in s is True.
That is why the example:
for line in file:
    d = json.loads(line)
    print(d['key'])
may print more lines than the file contains.
It is unrelated to utf-8.
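Here is a minimal Python 3 sketch of that effect on a single record, plus one possible way to keep one output line per input line. one_line is a hypothetical helper, and collapsing line breaks into spaces is an assumption about what the task tolerates; re-escaping the value with json.dumps() would be another option.

```python
import json

# One physical line of JSON: the raw string keeps a backslash followed by
# "n", so the line itself contains no real newline character.
line = r'{"text": "first\nsecond", "label": "good"}'
assert "\n" not in line

text = json.loads(line)["text"]
print(text.count("\n"))   # 1: parsing turned the escape into a real newline

def one_line(s):
    # Hypothetical helper: join on every boundary str.splitlines() recognizes
    # ("\n", "\r\n", "\r", and Unicode ones such as U+2028), so one JSON
    # record always produces exactly one output line
    return " ".join(s.splitlines())

print(one_line(text))     # first second
```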
In general, there could also be an issue with non-native newlines, e.g., b'\r\r\n\n', or an issue with Unicode newlines such as u'"\u2028"' (U+2028 LINE SEPARATOR).
Run the same check you ran on the written files, but before you write them, to see how many values get flagged. And make sure those values don't have '\\n' in them; that may be skewing your count.
For more detail, see J.F.'s answer.
Notes unrelated to your error:
(a) When JSON is parsed with loads(), strings are already unicode objects:
>>> a = '{"b":1}'
>>> json.loads(a)['b']
1
>>> json.loads(a).keys()
[u'b']
>>> type(json.loads(a).keys()[0])
<type 'unicode'>
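So str(label) in the file write should be either just label or unicode(label). You shouldn't need to encode text and j['normalized'] when you write them to the file. Instead, set the file encoding to 'utf-8' when you open it (for example with io.open(path, 'w', encoding='utf-8')).
(b) By the way, consider using format() in the write operations: if any of label, text or j['normalized'] is None, the + operator raises a TypeError, whereas format() simply renders it as 'None'. (join() also fails on None unless each value is converted with unicode() first.)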
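Notes (a) and (b) could look like this in Python 3. This is a sketch under assumptions: the "label" key is hypothetical (the question reads j[t], and t isn't shown), and in-memory StringIO objects stand in for the real files so the example is self-contained.

```python
import io
import json

# Stand-ins for the real files (assumption): in practice use
# open(path, encoding="utf-8") for reading and open(path, "w", encoding="utf-8")
# for writing, so no manual .encode() is needed.
input_file = io.StringIO(
    '{"text": "caf\\u00e9", "label": "good", "normalized": "cafe"}\n'
    '{"text": "ok", "label": null, "normalized": "ok"}\n'
)
output_file = io.StringIO()
norm_file = io.StringIO()

for line in input_file:
    j = json.loads(line)
    # format() renders None as "None" instead of raising TypeError like "+" does
    output_file.write("{}\t{}\n".format(j["label"], j["text"]))
    norm_file.write("{}\n".format(j["normalized"]))
```

This fixes only the encoding and None handling. A value that itself contains a newline would still split one record across two output lines.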
I've got a Python script that prints out a file to the shell:
print open(lPath).read()
If I pass in the path to a file with the following contents (without the > markers; they're just here so the newlines are visible):
> One
> Two
>
I get the following output:
> One
> Two
>
>
Where's that extra newline coming from? I'm running the script with bash on an Ubuntu system.
Use
print open(lPath).read(), # notice the comma at the end.
print adds a newline. If you end the print statement with a comma, it'll add a space instead.
If you don't need any of the special features of print, you can use:
import sys
sys.stdout.write(open(lPath).read())
If you switch to Python 3, or use from __future__ import print_function on Python 2.6+, you can use the end argument to stop the print function from adding a newline.
print(open(lPath).read(), end='')
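A Python 3 sketch of both approaches, which also closes the file with a with block (print(open(lPath).read()) leaves closing to the garbage collector). show() is a hypothetical helper, and the sample file is created in a temporary directory just for the demo.

```python
import os
import sys
import tempfile

def show(path, use_write=False):
    # Hypothetical helper; "with" closes the file when done
    with open(path) as f:
        content = f.read()
    if use_write:
        sys.stdout.write(content)   # writes exactly what was read
    else:
        print(content, end="")      # end="" suppresses print's extra "\n"

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "sample.txt")
    with open(path, "w") as f:
        f.write("One\nTwo\n")       # the file already ends with a newline
    show(path)                      # One, Two, and no blank line after
    show(path, use_write=True)      # same output
```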
Maybe you should write:
print open(lPath).read(),
(notice trailing comma at the end).
This will prevent print from placing a newline at the end of its output.
In Python when I do
print "Line 1 is"
print "big"
The output I get is
Line 1 is
big
Where does the newline come from? And how do I print both on the same line using two print statements?
print adds a newline by default. To avoid this, use a trailing ,:
print "Line 1 is",
print "big"
The , will still yield a space. To avoid the space as well, either concatenate your strings and use a single print statement, or use sys.stdout.write() instead.
From the documentation:
A '\n' character is written at the end, unless the print statement ends with a comma. This is the only action if the statement contains just the keyword print.
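In Python 3, where print is a function, the trailing comma has no effect, and the end and sep keyword arguments do this job instead. A short sketch; the output is captured in a StringIO only so it can be checked, and dropping file=buf prints to the terminal as usual.

```python
import io

buf = io.StringIO()  # capture output; remove file=buf to print normally

# Python 3 counterpart of the trailing comma: end=" " replaces the default "\n"
print("Line 1 is", end=" ", file=buf)
print("big", file=buf)

# Or a single call: sep (default " ") goes between the arguments
print("Line 1 is", "big", file=buf)

# end="" adds nothing at all, like sys.stdout.write()
print("Line 1 is ", end="", file=buf)
print("big", file=buf)
```

All three variants produce the line "Line 1 is big".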
If you need full control of the bytes written to the output, you might want to use sys.stdout directly:
import sys
sys.stdout.write("Line 1 is ")
sys.stdout.write("big!\n")
When you are not outputting a newline (\n), you will need to call flush explicitly so that your data isn't left in the buffer, like so:
sys.stdout.flush()
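A typical case is a progress indicator that grows on a single line. This is a minimal sketch: show_progress is a hypothetical helper, time.sleep stands in for real work, and the stream parameter exists only so the output can be redirected. On Python 3.3+, print(".", end="", flush=True) does the write and the flush in one call.

```python
import sys
import time

def show_progress(steps, stream=sys.stdout):
    # Hypothetical helper: print "Working" and then one dot per step on a single line
    stream.write("Working")
    stream.flush()
    for _ in range(steps):
        time.sleep(0.01)      # stand-in for real work
        stream.write(".")     # no "\n", so a line-buffered terminal would hold it back
        stream.flush()        # ...unless we flush explicitly
    stream.write(" done\n")   # finish the line

show_progress(3)
```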
This is standard behavior; use print "foo", (note the trailing comma).