Python adds a new line at the end of opened file - python

While writing some tests, i got stuck on some weird behavior. I finally narrow the problem to file opening. For instance, this one, my.dat:
one line
two lines and no final line break
I then ran that python code:
with open('my.dat') as fd:
assert fd.read()[-1] == '\n'
With both python 3 and 2, this code does not raise any AssertError.
My question is: why forcing the presence of line jump at the end of files ?

It works here. Are you 100% sure that there is not actually a newline at the end of your file? Because many text editors (atom, notepad++) automatically add newlines at the end of a file.
>>> open('a.txt', 'w').write('blabla')
6
>>> open('a.txt', 'r').read()
'blabla'
>>> assert open('a.txt', 'r').read()[-1] == '\n'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AssertionError

Related

Reading gzipped text file line-by-line for processing in python 3.2.6

I'm a complete newbie when it comes to python, but I've been tasked with trying to get a piece of code running on a machine which has a different version of python (3.2.6) than that which the code was originally built for.
I've come across an issue with reading in a gzipped-text file line-by-line (and processing it depending on the first character). The code (which obviously is written in python > 3.2.6) is
for line in gzip.open(input[0], 'rt'):
if line[:1] != '>':
out.write(line)
continue
chromname = match2chrom(line[1:-1])
seqname = line[1:].split()[0]
print('>{}'.format(chromname), file=out)
print('{}\t{}'.format(seqname, chromname), file=mappingout)
(for those who know, this strips gzipped FASTA genome files into headers (with ">" at start) and sequences, and processes the lines into two different files depending on this)
I have found https://bugs.python.org/issue13989, which states that mode 'rt' cannot be used for gzip.open in python-3.2 and to use something along the lines of:
import io
with io.TextIOWrapper(gzip.open(input[0], "r")) as fin:
for line in fin:
if line[:1] != '>':
out.write(line)
continue
chromname = match2chrom(line[1:-1])
seqname = line[1:].split()[0]
print('>{}'.format(chromname), file=out)
print('{}\t{}'.format(seqname, chromname), file=mappingout)
but the above code does not work:
UnsupportedOperation in line <4> of /path/to/python_file.py:
read1
How can I rewrite this routine to give out exactly what I want - reading the gzip file line-by-line into the variable "line" and processing based on the first character?
EDIT: traceback from the first version of this routine is (python 3.2.6):
Mode rt not supported
File "/path/to/python_file.py", line 79, in __process_genome_sequences
File "/opt/python-3.2.6/lib/python3.2/gzip.py", line 46, in open
File "/opt/python-3.2.6/lib/python3.2/gzip.py", line 157, in __init__
Traceback from the second version is:
UnsupportedOperation in line 81 of /path/to/python_file.py:
read1
File "/path/to/python_file.py", line 81, in __process_genome_sequences
with no further traceback (the extra two lines in the line count are the import io and with io.TextIOWrapper(gzip.open(input[0], "r")) as fin: lines
I have actually appeared to have solved the problem.
In the end I had to use shell("gunzip {input[0]}") to ensure that the gunzipped file could be read in in text mode, and then read in the resulting file using
for line in open(' *< resulting file >* ','r'):
if line[:1] != '>':
out.write(line)
continue
chromname = match2chrom(line[1:-1])
seqname = line[1:].split()[0]
print('>{}'.format(chromname), file=out)
print('{}\t{}'.format(seqname, chromname), file=mappingout)

Error: ValueError: invalid literal for int() with base 10: ''

I have the value 1028 in the file build_ver.txt ,getting the below error while running the following script,script is trying to increment the count by 1 and write the value back to the file..please suggest how to overcome this?
with open(r'\\Network\Build_ver\build_ver.txt','w+') as f:
value = int(f.read())
f.seek(0)
f.write(str(value + 1))
Error:-
Traceback (most recent call last):
File "build_ver.py", line 2, in <module>
value = int(f.read())
ValueError: invalid literal for int() with base 10: ''
This is what opening a file in w+ mode does:
w+
Open for reading and writing. The file is created if it does
not exist, otherwise it is truncated. The stream is positioned at the beginning of the file.
Emphasis mine. Your file is empty, read() will give you an empty string.
Perhaps you want to open in r+ mode?
You can also use fileinput to modify the file "in-place":
import fileinput
for line in fileinput.input('\\Network\Build_ver\build_ver.txt', inplace=True):
print str(int(line) + 1)
Everything printed inside the loop is written back to the file.

Read filenames from a textfile in python (double backslash issue)

I am trying to read a list of files from a text file. I am using the following code to do that:
filelist = input("Please Enter the filelist: ")
flist = open (os.path.normpath(filelist),"r")
fname = []
for curline in flist:
# check if its a coment - do comment parsing in this if block
if curline.startswith('#'):
continue
fname.append(os.path.normpath(curline));
flist.close() #close the list file
# read the slave files 100MB at a time to generate stokes vectors
tmp = fname[0].rstrip()
t = np.fromfile(tmp,dtype='float',count=100*1000)
This works perfectly fine and I get the following array:
'H:\\Shaunak\\TerraSAR_X- Sep2012-Glacier_Velocity_Gangotri\\NEST_oregistration\\Glacier_coreg_Cnv\\i_HH_mst_08Oct2012.bin\n'
'H:\\Shaunak\\TerraSAR_X- Sep2012-Glacier_Velocity_Gangotri\\NEST_oregistration\\Glacier_coreg_Cnv\\i_HH_mst_08Oct2012.bin\n'
'H:\\Shaunak\\TerraSAR_X- Sep2012-Glacier_Velocity_Gangotri\\NEST_oregistration\\Glacier_coreg_Cnv\\q_HH_slv3_08Oct2012.bin\n'
'H:\\Shaunak\\TerraSAR_X- Sep2012-Glacier_Velocity_Gangotri\\NEST_oregistration\\Glacier_coreg_Cnv\\q_VV_slv3_08Oct2012.bin'
The problem is that the '\' charecter is escaped and there is a trailing '\n' in the strings. I used the str.rstrip() to get rid of the '\n' - this works, but leaves the problem of the two back slashes.
I have used the following approaches to try getting rid of these:
Used the codecs.unicode_escape_decode() but I get this error:
UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in position 56-57: malformed \N character escape. Clearly this is not the right approach because I just want to decode the backslashed, not the rest of the string.
This does not work either: tmp = fname[0].rstrip().replace(r'\\','\\');
Is there no way to make readline() read a raw string?
UPDATE:
Basically I have a text file with 4 file names I would like to open and read data from in python. The text file contains:
H:\Shaunak\TerraSAR_X-Sep2012-Glacier_Velocity_Gangotri\NEST_oregistration\Glacier_coreg_Cnv\i_HH_mst_08Oct2012.bin
H:\Shaunak\TerraSAR_X-Sep2012-Glacier_Velocity_Gangotri\NEST_oregistration\Glacier_coreg_Cnv\i_HH_mst_08Oct2012.bin
H:\Shaunak\TerraSAR_X-Sep2012-Glacier_Velocity_Gangotri\NEST_oregistration\Glacier_coreg_Cnv\q_HH_slv3_08Oct2012.bin
H:\Shaunak\TerraSAR_X-Sep2012-Glacier_Velocity_Gangotri\NEST_oregistration\Glacier_coreg_Cnv\q_VV_slv3_08Oct2012.bin
I would like to open each file one by one and read 100MBs of data from them.
When I use this command:np.fromfile(flist[0],dtype='float',count=100) I get this error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: 'H:\\Shaunak\\TerraSAR_X-Sep2012-Glacier_Velocity_Gangotri\\NEST_oregistration\\Glacier_coreg_Cnv\\i_HH_mst_08Oct2012.bin'
Update
Full Traceback:
Please Enter the filelist: H:/Shaunak/TerraSAR_X- Sep2012-Glacier_Velocity_Gangotri/NEST_oregistration/Glacier_coreg_Cnv/filelist.txt
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "G:\WinPython-32bit-3.3.2.3\python-3.3.2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 581, in runfile
execfile(filename, namespace)
File "G:\WinPython-32bit-3.3.2.3\python-3.3.2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 41, in execfile
exec(compile(open(filename).read(), filename, 'exec'), namespace)
File "H:/Shaunak/Programs/Arnab_glacier_vel/Stokes_generation_2.py", line 28, in <module>
t = np.fromfile(tmp,dtype='float',count=100*1000)
FileNotFoundError: [Errno 2] No such file or directory: 'H:\\Shaunak\\TerraSAR_X-Sep2012-Glacier_Velocity_Gangotri\\NEST_oregistration\\Glacier_coreg_Cnv\\i_HH_mst_08Oct2012.bin'
>>>
As #volcano stated, double slash is only an internal representation. If you print it, they're gone. The same if you write it to files, there will only be one '\'.
>>> string_with_double_backslash = "Here is a double backslash: \\"
>>> print(string_with_double_backslash)
Here is a double backslash: \
try this:
a_escaped = 'attachment; filename="Nuovo Cinema Paradiso 1988 Director\\\'s Cut"'
a_unescaped = codecs.getdecoder("unicode_escape")(a)[0]
yielding:
'attachment; filename="Nuovo Cinema Paradiso 1988 Director\'s Cut"'

Readlines() vs. Read() - Can I convert data back and forth between the two functions?

I have a module file containing the following functions:
def replace(filename):
match = re.sub(r'[^\s^\w]risk', 'risk', filename)
return match
def count_words(newstring):
from collections import defaultdict
word_dict=defaultdict(int)
for line in newstring:
words=line.lower().split()
for word in words:
word_dict[word]+=1
for word in word_dict:
if'risk'==word:
return word, word_dict[word]
when I do this in IDLE:
>>> mylist = open('C:\\Users\\ahn_133\\Desktop\\Python Project\\test10.txt').read()
>>> newstrings=replace(mylist)
>>> newone=count_words(newstrings)
test10.txt just contains words for testing like:
#
risk risky riskier risk. risk?
#
I get the following error:
Traceback (most recent call last):
File "<pyshell#134>", line 1, in <module>
newPH = replace(newPassage)
File "C:\Users\ahn_133\Desktop\Python Project\text_modules.py", line 56, in replace
match = re.sub(r'[^\s^\w]risk', 'risk', filename)
File "C:\Python27\lib\re.py", line 151, in sub
return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or buffer
Is there anyway to run both functions without saving newstrings into a file, opening it using readlines(), and then running count_words function?
To run a module, you just do python modulename.py or python.exe modulename.py - or just double click the icon.
But i guess your problem really isn't what your question title states, so you really should learn how to debug python

where is the call to encode the string or force the string to need to be encoded in this file?

I know this may seem rude or mean or unpolite, but I need some help to try to figure out why I cant call window.loadPvmFile("f:\games#DD.ATC3.Root\common\models\a300\amu\dummy.pvm") exactly like that as a string. Instead of doing that, it gives me a traceback error:
Traceback (most recent call last):
File "F:\Python Apps\pvmViewer_v1_1.py", line 415, in <module>
window.loadPvmFile("f:\games\#DD.ATC3.Root\common\models\a300\amu\dummy.pvm")
File "F:\Python Apps\pvmViewer_v1_1.py", line 392, in loadPvmFile
file1 = open(path, "rb")
IOError: [Errno 22] invalid mode ('rb') or filename:
'f:\\games\\#DD.ATC3.Root\\common\\models\x07300\x07mu\\dummy.pvm'
Also notice, that in the traceback error, the file path is different. When I try a path that has no letters in it except for the drive letter and filename, it throws this error:
Traceback (most recent call last):
File "F:\Python Apps\pvmViewer_v1_1.py", line 416, in <module>
loadPvmFile('f:\0\0\dummy.pvm')
File "F:\Python Apps\pvmViewer_v1_1.py", line 393, in loadPvmFile
file1 = open(path, "r")
TypeError: file() argument 1 must be encoded string without NULL bytes, not str
I have searched for the place that the encode function is called or where the argument is encoded and cant find it. Flat out, I am out of ideas, frustrated and I have nowhere else to go. The source code can be found here: PVM VIEWER
Also note that you will not be able to run this code and load a pvm file and that I am using portable python 2.7.3! Thanks for everyone's time and effort!
\a and \0 are escape sequences. Use r'' (or R'') around the string to mark it as a raw string.
window.loadPvmFile(r"f:\games#DD.ATC3.Root\common\models\a300\amu\dummy.pvm")

Categories

Resources