Python failing to read lines properly - python

I'm supposed to open a file, read it line per line and display the lines out.
Here's the code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os
import re
in_path = "../vas_output/Glyph/20140623-FLYOUT_mins_cleaned.csv"
out_path = "../vas_gender/Glyph/"
csv_read_line = open(in_path, "rb").read().split("\n")
line_number = 0
for line in csv_read_line:
line_number+=1
print str(line_number) + line
Here's the contents of the input file:
12345^67890^abcedefg
random^test^subject
this^sucks^crap
And here's the result:
this^sucks^crapjectfg
Some weird combo of all three. In addition to this, the result of line_number is missing. Printing out the result of len(csv_read_line) outputs 1, for some reason, no matter how many is in the input file. Changing the split type from \n to ^ gives the expected output, though, so I'm assuming the problem is probably with the input file.
I'm using a Mac, and did both the python code and the input file (on Sublime Text) on the Mac itself.
Am I missing something?

You seem to be splitting on "\n" which isn't necessary, and could be incorrect depending on the line terminators used in the input file. Python includes functionality to iterate over the lines of a file one at a time. The advantages are that it will worry about processing line terminators in a portable way, as well as not requiring the entire file to be held in memory at once.
Further, note that you are opening the file in binary mode (the b character in your mode string) when you actually intend to read the file as text. This can cause problems similar to the one you are experiencing.
Also, you do not close the file when you are done with it. In this case that isn't a problem, but you should get in the habit of using with blocks when possible to make sure the file gets closed at the earliest possible time.
Try this:
with open(in_path, "r") as f:
line_number = 0
for line in f:
line_number += 1
print str(line_number) + line.rstrip('\r\n')

So your example just works for me.
But then, i just copied your text into a text editor on linux, and did it that way, so any carriage returns will have been wiped out.
Try this code though:
import os
in_path = "input.txt"
with open(in_path, "rb") as inputFile:
for lineNumber, line in enumerate(inputFile):
print lineNumber, line.strip()
It's a little cleaner, and the for line in file style deals with line breaks for you in a system independent way - Python's open has universal newline support.

I'd try the following Pythonic code:
#!/usr/bin/env python
in_path = "../vas_output/Glyph/20140623-FLYOUT_mins_cleaned.csv"
out_path = "../vas_gender/Glyph/"
with open(in_path, 'rb') as f:
for i, line in enumerate(f):
print(str(i) + line)

There are several improvements that can be made here to make it more idiomatic python.
import csv
in_path = "../vas_output/Glyph/20140623-FLYOUT_mins_cleaned.csv"
out_path = "../vas_gender/Glyph/"
#Lets open the file and make sure that it closes when we unindent
with open(in_path,"rb") as input_file:
#Create a csv reader object that will parse the input for us
reader = csv.reader(input_file,delimiter="^")
#Enumerate over the rows (these will be lists of strings) and keep track of
#of the line number using python's built in enumerate function
for line_num, row in enumerate(reader):
#You can process whatever you would like here. But for now we will just
#print out what you were originally printing
print str(line_num) + "^".join(row)

Related

python loop won't iterate on second pass

When I run the following in the Python IDLE Shell:
f = open(r"H:\Test\test.csv", "rb")
for line in f:
print line
#this works fine
however, when I run the following for a second time:
for line in f:
print line
#this does nothing
This does not work because you've already seeked to the end of the file the first time. You need to rewind (using .seek(0)) or re-open your file.
Some other pointers:
Python has a very good csv module. Do not attempt to implement CSV parsing yourself unless doing so as an educational exercise.
You probably want to open your file in 'rU' mode, not 'rb'. 'rU' is universal newline mode, which will deal with source files coming from platforms with different line endings for you.
Use with when working with file objects, since it will cleanup the handles for you even in the case of errors. Ex:
.
with open(r"H:\Test\test.csv", "rU") as f:
for line in f:
...
You can read the data from the file in a variable, and then you can iterate over this data any no. of times you want to in your script. This is better than doing seek back and forth.
f = open(r"H:\Test\test.csv", "rb")
data = f.readlines()
for line in data:
print line
for line in data:
print line
Output:
# This is test.csv
Line1,This is line 1, there are, some numbers here,321423423
Line2,This is line2 , there are some characters here,sdfdsfdsf
# This is test.csv
Line1,This is line 1, there are, some numbers here,321423423
Line2,This is line2 , there are some characters here,sdfdsfdsf
Because you've gone all the way through the CSV file, and the iterator is exhausted. You'll need to re-open it before the second loop.

Why can't I read file after my for loops python using ubuntu

For some reason after my for loops I am not able to read the ouput text file.
For example :
for line in a:
name = (x)
f = open('name','w')
for line in b:
get = (something)
f.write(get)
for line in c:
get2 = (something2)
f.write(get2)
(the below works if the above is commented out only)
f1 = open(name, 'r')
for line in f1:
print line
If I comment out the loops, I am able to read the file and print the contents.
I am very new to coding and guessing this is something obvious that I am missing.However I can't seem to figure it out. I have used google but the more I read the more I feel I am missing something. Any advice is appreciated.
#bernie is right in his comment above. The problem is that when you do open(..., 'w'), the file is rewritten to be blank, but Python/the OS doesn't actually write out the things you write to disk until its buffer is filled up or you call close(). (This delay helps speed things up, because writing to disk is slow.) You can also call flush() to force this without closing the file.
The with statement bernie referred to would look like this:
with open('name', 'w') as f:
for line in b:
b.write(...)
for line in c:
b.write(...)
# f is closed now that we're leaving the with block
# and any writes are actually written out to the file
with open('name', 'r') as f:
for line in f:
print line
If you're using Python 2.5 rather than 2.6 or 2.7, you'll have to do from __future__ import with_statement at the top of your file.

Changing contents of a file - Python

So I have a program which runs. This is part of the code:
FileName = 'Numberdata.dat'
NumberFile = open(FileName, 'r')
for Line in NumberFile:
if Line == '4':
print('1')
else:
print('9')
NumberFile.close()
A pretty pointless thing to do, yes, but I'm just doing it to enhance my understanding. However, this code doesn't work. The file remains as it is and the 4's are not replaced by 1's and everything else isn't replaced by 9's, they merely stay the same. Where am I going wrong?
Numberdata.dat is "444666444666444888111000444"
It is now:
FileName = 'Binarydata.dat'
BinaryFile = open(FileName, 'w')
for character in BinaryFile:
if charcter == '0':
NumberFile.write('')
else:
NumberFile.write('#')
BinaryFile.close()
You need to build up a string and write it to the file.
FileName = 'Numberdata.dat'
NumberFileHandle = open(FileName, 'r')
newFileString = ""
for Line in NumberFileHandle:
for char in line: # this will work for any number of lines.
if char == '4':
newFileString += "1"
elif char == '\n':
newFileString += char
else:
newFileString += "9"
NumberFileHandle.close()
NumberFileHandle = open(FileName, 'w')
NumberFileHandle.write(newFileString)
NumberFileHandle.close()
First, Line will never equal 4 because each line read from the file includes the newline character at the end. Try if Line.strip() == '4'. This will remove all white space from the beginning and end of the line.
Edit: I just saw your edit... naturally, if you have all your numbers on one line, the line will never equal 4. You probably want to read the file a character at a time, not a line at a time.
Second, you're not writing to any file, so naturally the file won't be getting changed. You will run into difficulty changing a file as you read it (since you have to figure out how to back up to the same place you just read from), so the usual practice is to read from one file and write to a different one.
Because you need to write to the file as well.
with open(FileName, 'w') as f:
f.write(...)
Right now you are just reading and manipulating the data, but you're not writing them back.
At the end you'll need to reopen your file in write mode and write to it.
If you're looking for references, take a look at theopen() documentation and at the Reading and Writing Files section of the Python Tutorial.
Edit: You shouldn't read and write at the same time from the same file. You could either, write to a temp file and at the end call shutil.move(), or load and manipulate your data and then re-open your original file in write mode and write them back.
You are not sending any output to the data, you are simply printing 1 and 9 to stdout which is usually the terminal or interpreter.
If you want to write to the file you have to use open again with w.
eg.
out = open(FileName, 'w')
you can also use
print >>out, '1'
Then you can call out.write('1') for example.
Also it is a better idea to read the file first if you want to overwrite and write after.
According to your comment:
Numberdata is just a load of numbers all one line. Maybe that's where I'm going wrong? It is "444666444666444888111000444"
I can tell you that the for cycle, iterate over lines and not over chars. There is a logic error.
Moreover, you have to write the file, as Rik Poggi said (just rember to open it in write mode)
A few things:
The r flag to open indicates read-only mode. This obviously won't let you write to the file.
print() outputs things to the screen. What you really want to do is output to the file. Have you read the Python File I/O tutorial?
for line in file_handle: loops through files one line at a time. Thus, if line == '4' will only be true if the line consists of a single character, 4, all on its own.
If you want to loop over characters in a string, then do something like for character in line:.
Modifying bits of a file "in place" is a bit harder than you think.
This is because if you insert data into the middle of a file, the rest of the data has to shuffle over to make room - this is really slow because everything after your insertion has to be rewritten.
In theory, a one-byte for one-byte replacement can be done fast, but in general people don't want to replace byte-for-byte, so this is an advanced feature. (See seek().) The usual approach is to just write out a whole new file.
Because print doesn't write to your file.
You have to open the file and read it, modify the string you obtain creating a new string, open again the file and write it again.
FileName = 'Numberdata.dat'
NumberFile = open(FileName, 'r')
data = NumberFile.read()
NumberFile.close()
dl = data.split('\n')
for i in range(len(dl)):
if dl[i] =='4':
dl[i] = '1'
else:
dl[i] = '9'
NumberFile = open(FileName, 'w')
NumberFile.write('\n'.join(dl))
NumberFile.close()
Try in this way. There are for sure different methods but this seems to be the most "linear" to me =)

Python Overwriting files after parsing

I'm new to Python, and I need to do a parsing exercise. I got a file, and I need to parse it (just the headers), but after the process, i need to keep the file the same format, the same extension, and at the same place in disk, but only with the differences of new headers..
I tried this code...
for line in open ('/home/name/db/str/dir/numbers/str.phy'):
if line.startswith('ENS'):
linepars = re.sub ('ENS([A-Z]+)0+([0-9]{6})','\\1\\2',line)
print linepars
..and it does the job, but I don't know how to "overwrite" the file with the new parsing.
The easiest way, but not the most efficient (by far, and especially for long files) would be to rewrite the complete file.
You could do this by opening a second file handle and rewriting each line, except in the case of the header, you'd write the parsed header. For example,
fr = open('/home/name/db/str/dir/numbers/str.phy')
fw = open('/home/name/db/str/dir/numbers/str.phy.parsed', 'w') # Name this whatever makes sense
for line in fr:
if line.startswith('ENS'):
linepars = re.sub ('ENS([A-Z]+)0+([0-9]{6})','\\1\\2',line)
fw.write(linepars)
else:
fw.write(line)
fw.close()
fr.close()
EDIT: Note that this does not use readlines(), so its more memory efficient. It also does not store every output line, but only one at a time, writing it to file immediately.
Just as a cool trick, you could use the with statement on the input file to avoid having to close it (Python 2.5+):
fw = open('/home/name/db/str/dir/numbers/str.phy.parsed', 'w') # Name this whatever makes sense
with open('/home/name/db/str/dir/numbers/str.phy') as fr:
for line in fr:
if line.startswith('ENS'):
linepars = re.sub ('ENS([A-Z]+)0+([0-9]{6})','\\1\\2',line)
fw.write(linepars)
else:
fw.write(line)
fw.close()
P.S. Welcome :-)
As others are saying here, you want to open a file and use that file object's .write() method.
The best approach would be to open an additional file for writing:
import os
current_cfg = open(...)
parsed_cfg = open(..., 'w')
for line in current_cfg:
new_line = parse(line)
print new_line
parsed.cfg.write(new_line + '\n')
current_cfg.close()
parsed_cfg.close()
os.rename(....) # Rename old file to backup name
os.rename(....) # Rename new file into place
Additionally I'd suggest looking at the tempfile module and use one of its methods for either naming your new file or opening/creating it. Personally I'd favor putting the new file in the same directory as the existing file to ensure that os.rename will work atomically (the configuration file named will be guaranteed to either point at the old file or the new file; in no case would it point at a partially written/copied file).
The following code DOES the job.
I mean it DOES overwrite the file ON ONESELF; that's what the OP asked for. That's possible because the transformations are only removing characters, so the file's pointer fo that writes is always BEHIND the file's pointer fi that reads.
import re
regx = re.compile('\AENS([A-Z]+)0+([0-9]{6})')
with open('bomo.phy','rb+') as fi, open('bomo.phy','rb+') as fo:
fo.writelines(regx.sub('\\1\\2',line) for line in fi)
I think that the writing isn't performed by the operating system one line at a time but through a buffer. So several lines are read before a pool of transformed lines are written. That's what I think.
newlines = []
for line in open ('/home/name/db/str/dir/numbers/str.phy').readlines():
if line.startswith('ENS'):
linepars = re.sub ('ENS([A-Z]+)0+([0-9]{6})','\\1\\2',line)
newlines.append( linepars )
open ('/home/name/db/str/dir/numbers/str.phy', 'w').write('\n'.join(newlines))
(sidenote: Of course if you are working with large files, you should be aware that the level of optimization required may depend on your situation. Python by nature is very non-lazily-evaluated. The following solution is not a good choice if you are parsing large files, such as database dumps or logs, but a few tweaks such as nesting the with clauses and using lazy generators or a line-by-line algorithm can allow O(1)-memory behavior.)
targetFile = '/home/name/db/str/dir/numbers/str.phy'
def replaceIfHeader(line):
if line.startswith('ENS'):
return re.sub('ENS([A-Z]+)0+([0-9]{6})','\\1\\2',line)
else:
return line
with open(targetFile, 'r') as f:
newText = '\n'.join(replaceIfHeader(line) for line in f)
try:
# make backup of targetFile
with open(targetFile, 'w') as f:
f.write(newText)
except:
# error encountered, do something to inform user where backup of targetFile is
edit: thanks to Jeff for suggestion

Replace a word in a file

I am new to Python programming...
I have a .txt file....... It looks like..
0,Salary,14000
0,Bonus,5000
0,gift,6000
I want to to replace the first '0' value to '1' in each line. How can I do this? Any one can help me.... With sample code..
Thanks in advance.
Nimmyliji
I know that you're asking about Python, but forgive me for suggesting that perhaps a different tool is better for the job. :) It's a one-liner via sed:
sed 's/^0,/1,/' yourtextfile.txt > output.txt
This applies the regex /^0,/ (which matches any 0, that occurs at the beginning of a line) to each line and replaces the matched text with 1, instead. The output is directed into the file output.txt specified.
inFile = open("old.txt", "r")
outFile = open("new.txt", "w")
for line in inFile:
outFile.write(",".join(["1"] + (line.split(","))[1:]))
inFile.close()
outFile.close()
If you would like something more general, take a look to Python csv module. It contains utilities for processing comma-separated values (abbreviated as csv) in files. But it can work with arbitrary delimiter, not only comma. So as you sample is obviously a csv file, you can use it as follows:
import csv
reader = csv.reader(open("old.txt"))
writer = csv.writer(open("new.txt", "w"))
writer.writerows(["1"] + line[1:] for line in reader)
To overwrite original file with new one:
import os
os.remove("old.txt")
os.rename("new.txt", "old.txt")
I think that writing to new file and then renaming it is more fault-tolerant and less likely corrupt your data than direct overwriting of source file. Imagine, that your program raised an exception while source file was already read to memory and reopened for writing. So you would lose original data and your new data wouldn't be saved because of program crash. In my case, I only lose new data while preserving original.
o=open("output.txt","w")
for line in open("file"):
s=line.split(",")
s[0]="1"
o.write(','.join(s))
o.close()
Or you can use fileinput with in place edit
import fileinput
for line in fileinput.FileInput("file",inplace=1):
s=line.split(",")
s[0]="1"
print ','.join(s)
f = open(filepath,'r')
data = f.readlines()
f.close()
edited = []
for line in data:
edited.append( '1'+line[1:] )
f = open(filepath,'w')
f.writelines(edited)
f.flush()
f.close()
Or in Python 2.5+:
with open(filepath,'r') as f:
data = f.readlines()
with open(outfilepath, 'w') as f:
for line in data:
f.write( '1' + line[1:] )
This should do it. I wouldn't recommend it for a truly big file though ;-)
What is going on (ex 1):
1: Open the file in read mode
2,3: Read all the lines into a list (each line is a separate index) and close the file.
4,5,6: Iterate over the list constructing a new list where each line has the first character replaced by a 1. The line[1:] slices the string from index 1 onward. We concatenate the 1 with the truncated list.
7,8,9: Reopen the file in write mode, write the list to the file (overwrite), flush the buffer, and close the file handle.
In Ex. 2:
I use the with statement that lets the file handle closing itself, but do essentially the same thing.

Categories

Resources