Printing every line of the CSV file using readlines() - python

I am trying to print each line of a csv file with a count of the line being printed.
with open('Polly re-records.csv', 'r', encoding='ISO-8859-1') as file:  # file1 path
    ct = 0
    while True:
        ct += 1
        if file.readline():
            print(file.readline(), ct)
        else:
            break  # break when reaching empty line
For the above code I am getting the following output:
lg1_1,"Now lets play a game. In this game, you need to find the odd one out.",,,,,,,,,,,,,,,,,,,,,,,,
479
sc_2_1,Youve also learned the strong wordsigns and know how to use them as wordsigns. ,,,,,,,,,,,,,,,,,,,,,,,,
480
So instead of ct starting from 1, the first value in my output is 479, which shouldn't be possible unless the if statement has executed 478 times.
What should I change, or what is the logical flaw preventing the print statement from executing?

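import csv

with open("data.csv", "r", newline="") as file:
    csvreader = csv.reader(file)
    header = next(csvreader)  # skip the header row
    # csv.reader is an iterator: it has no len() and cannot be indexed,
    # so let enumerate() supply the counter.
    for x, row in enumerate(csvreader, start=1):
        print(row, x)
enumerate also works directly on the file object if you do not need CSV parsing.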

It would probably be easier to use one of Python's built-in functions, such as enumerate():
with open("Polly re-records.csv", "r") as file_in:
    for row_number, row in enumerate(file_in, start=1):
        print(row_number, row.strip("\n"))
If you would rather keep the structure of your code, the issue is that you are calling readline() twice per iteration and discarding half the results: the call in the if test reads a line that is never printed.
with open('Polly re-records.csv', 'r', encoding='ISO-8859-1') as file:  # file1 path
    ct = 0
    while True:
        ct += 1
        row = file.readline()  # read once and reuse the result
        if row:
            print(row.rstrip('\n'), ct)
        else:
            break  # readline() returns '' only at end of file
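To see the effect in isolation, here is a small sketch that uses io.StringIO as a stand-in for the file (the sample rows and the function names buggy and fixed are made up for illustration). Each call to readline() consumes one line, so testing one call and printing another only ever shows every second line.

```python
import io

SAMPLE = "row1\nrow2\nrow3\nrow4\n"  # made-up data standing in for the CSV file


def buggy(stream):
    """Calls readline() twice per pass, as in the question."""
    printed = []
    ct = 0
    while True:
        ct += 1
        if stream.readline():  # this line is read and thrown away
            printed.append((stream.readline().rstrip("\n"), ct))
        else:
            break
    return printed


def fixed(stream):
    """Calls readline() once per pass and reuses the result."""
    printed = []
    ct = 0
    while True:
        row = stream.readline()
        if not row:  # readline() returns '' only at end of file
            break
        ct += 1
        printed.append((row.rstrip("\n"), ct))
    return printed


print(buggy(io.StringIO(SAMPLE)))  # only row2 and row4 appear
print(fixed(io.StringIO(SAMPLE)))  # all four rows, numbered from 1
```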

Python offers useful built-in functions for your use case. For example, enumerate yields a consecutive count for each item in an iterable.
with open('Polly re-records.csv', 'r', encoding='ISO-8859-1') as file:
    for line_number, line in enumerate(file, start=1):
        print(line, line_number)
As drdalle noted, it might also be a better idea to use the built-in csv module, as it will also handle CSV quoting (e.g. multi-line cells containing \n inside quotes, or other escaped values).
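A minimal sketch of that difference, assuming a made-up sample in which one quoted cell contains a newline (numbered_rows is a hypothetical helper name). Counting with enumerate over csv.reader numbers CSV records, not physical lines:

```python
import csv
import io

# Made-up sample: the second record has a quoted cell containing a newline,
# so the text has 4 physical lines but only 3 CSV records.
SAMPLE = 'id,text\nlg1_1,"Find the\nodd one out."\nsc_2_1,Plain text\n'


def numbered_rows(stream):
    """Return (row_number, row) pairs, counting CSV records rather than raw lines."""
    reader = csv.reader(stream)
    return list(enumerate(reader, start=1))


# newline="" mirrors the open(..., newline="") the csv docs recommend for real files.
for number, row in numbered_rows(io.StringIO(SAMPLE, newline="")):
    print(number, row)
```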

Related

Python 3.4.3: Iterating over each line and each character in each line in a text file

I have to write a program that iterates over each line in a text file and then over each character in each line in order to count the number of entries in each line.
Here is a segment of the text file:
N00000031,B,,D,D,C,B,D,A,A,C,D,C,A,B,A,C,B,C,A,C,C,A,B,D,D,D,B,A,B,A,C,B,,,C,A,A,B,D,D
N00000032,B,A,D,D,C,B,D,A,C,C,D,,A,A,A,C,B,D,A,C,,A,B,D,D
N00000033,B,A,D,D,C,,D,A,C,B,D,B,A,B,C,C,C,D,A,C,A,,B,D,D
N00000034,B,,D,,C,B,A,A,C,C,D,B,A,,A,C,B,A,B,C,A,,B,D,D
The first and last lines are "unusable lines" because they contain the wrong number of entries (more or fewer than 25). I would like to count the number of unusable lines in the file.
Here is my code:
for line in file:
    answers=line.split(",")
    i=0
    for i in answers:
        i+=1
unusable_line=0
for line in file:
    if i!=26:
        unusable_line+=1
print("Unusable lines in the file:", unusable_line)
I tried using this method as well:
alldata=file.read()
for line in file:
    student=alldata.split("\n")
    answer=student.split(",")
My problem is each variable I create doesn't exist when I try to run the program. I get a "students" is not defined error.
I know my coding is awful but I'm a beginner. Sorry!!! Thank you and any help at all is appreciated!!!
A simplified version of your method using a list, len() and an if condition:
Code:
unusable_line = 0
for line in file:
    answers = line.strip().split(",")
    if len(answers) != 26:  # one ID plus 25 answers
        unusable_line += 1
print("Unusable lines in the file:", unusable_line)
Notes:
First I create a variable, unusable_line, to store the count of unusable lines.
Then I iterate over the lines of the file object.
Then I split each line at "," to create a list.
Then I check whether the length of the list differs from 26 (the ID plus 25 answers). If so, I increment the unusable_line variable.
Finally I print it.
You could use something like this and wrap it in a function. You don't need to re-iterate over the items in the line: str.split() returns a list of the elements, and you can count them with len().
my_file = open('temp.txt', 'r')
lines_count = usable = unusable = 0
for line in my_file:
    lines_count += 1
    if len(line.split(',')) == 26:
        usable += 1
    else:
        unusable += 1
my_file.close()
print("Processed %d lines, %d usable and %d unusable" % (lines_count, usable, unusable))
You can do it much shorter:
with open('my_file.txt') as fobj:
    unusable = sum(1 for line in fobj if len(line.split(',')) != 26)
The line with open('my_file.txt') as fobj: opens the file for reading and closes it automatically after leaving the indented block. I use a generator expression to go through all lines and count those whose number of fields differs from 26.
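Wrapped into a function, the same idea might look like the sketch below. The names count_unusable, make_line and EXPECTED_FIELDS are made up for illustration, and the sample lines are generated rather than taken from the real file. Note that a blank answer (two commas in a row) still counts as a field.

```python
import io

EXPECTED_FIELDS = 26  # assumption taken from the answers above: one ID plus 25 answers


def count_unusable(lines, expected=EXPECTED_FIELDS):
    """Count lines whose number of comma-separated fields differs from `expected`."""
    return sum(1 for line in lines if len(line.rstrip("\n").split(",")) != expected)


def make_line(student_id, answers):
    """Build one made-up input line for the demonstration."""
    return student_id + "," + ",".join(answers) + "\n"


sample = io.StringIO(
    make_line("N00000031", ["A"] * 40)                 # too many answers
    + make_line("N00000032", ["B"] * 25)               # valid
    + make_line("N00000033", ["C", ""] + ["D"] * 23)   # valid: a blank answer still counts
    + make_line("N00000034", ["A"] * 24)               # too few answers
)
print("Unusable lines in the file:", count_unusable(sample))
```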

Improving the speed of a python script

I have an input file containing a list of strings.
I am iterating through every fourth line starting on line two.
From each of these lines I make a new string from the first and last 6 characters and put this in an output file only if that new string is unique.
The code I wrote to do this works, but I am working with very large deep sequencing files, and it has been running for a day without making much progress. So I'm looking for any suggestions to make this much faster if possible. Thanks.
def method():
    target = open(output_file, 'w')
    with open(input_file, 'r') as f:
        lineCharsList = []
        for line in f:
            #Make string from first and last 6 characters of a line
            lineChars = line[0:6]+line[145:151]
            if not (lineChars in lineCharsList):
                lineCharsList.append(lineChars)
                target.write(lineChars + '\n') #If string is unique, write to output file
            for skip in range(3): #Used to step through four lines at a time
                try:
                    check = line #Check for additional lines in file
                    next(f)
                except StopIteration:
                    break
    target.close()
Try defining lineCharsList as a set instead of a list:
lineCharsList = set()
...
lineCharsList.add(lineChars)
That'll improve the performance of the in operator. Also, if memory isn't a problem at all, you might want to accumulate all the output in a list and write it all at the end, instead of performing multiple write() operations.
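A rough sketch of both suggestions together, using made-up 151-character reads (unique_keys is a hypothetical helper, not the asker's function). The set gives average constant-time membership tests, where a list has to be scanned from the start on every lookup, and the keys are collected in a list so they can be written out in one go.

```python
def unique_keys(lines):
    """Return first+last-6-character keys in first-seen order, without duplicates."""
    seen = set()   # fast membership test, unlike scanning a list
    output = []    # accumulated so the caller can write everything at once
    for line in lines:
        key = line[0:6] + line[145:151]
        if key not in seen:
            seen.add(key)
            output.append(key)
    return output


# Made-up 151-character reads; the second one repeats the first key.
reads = [
    "AAAAAA" + "N" * 139 + "TTTTTT\n",
    "AAAAAA" + "G" * 139 + "TTTTTT\n",
    "CCCCCC" + "N" * 139 + "GGGGGG\n",
]
keys = unique_keys(reads)
print(keys)
```

The result could then be written with a single call such as target.write("\n".join(keys) + "\n"), at the cost of holding all unique keys in memory.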
You can use itertools.islice (https://docs.python.org/2/library/itertools.html#itertools.islice):
import itertools

def method():
    with open(input_file, 'r') as inf, open(output_file, 'w') as ouf:
        seen = set()
        # start=1, step=4: every fourth line, beginning with line two
        for line in itertools.islice(inf, 1, None, 4):
            line = line.rstrip('\n')  # keep the newline out of the last 6 characters
            s = line[:6] + line[-6:]
            if s not in seen:
                seen.add(s)
                ouf.write("{}\n".format(s))
Besides using set as Oscar suggested, you can also use islice to skip lines rather than use a for loop.
As stated in this post, islice preprocesses the iterator in C, so it should be much faster than using a plain vanilla python for loop.
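For reference, a tiny sketch of how islice's start and step arguments map onto "every fourth line starting on line two". The list of strings is a made-up stand-in for the file; any iterable of lines behaves the same way.

```python
import itertools

# Twelve made-up lines, numbered from 1 the way a person would count them.
lines = ["line %d" % n for n in range(1, 13)]

# islice counts from 0, so start=1 is the second line; step=4 takes every fourth one.
every_fourth_from_two = list(itertools.islice(lines, 1, None, 4))
print(every_fourth_from_two)
```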
Try replacing
lineChars = line[0:6]+line[145:151]
with
lineChars = ''.join([line[0:6], line[145:151]])
as it can be more efficient, depending on the circumstances.

Counting rows from a text file

I am trying to create a Python code that will count the number of rows in a text file. However, it will not recognize the for statement as it gives me errors on the indentation below it.
#*********************end of sec 1*****************************
item=open("back.txt","r")
i=0
countlist=[line.strip() for line in item]#separate lines
    i=1+i
print i
item.close()
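print(len(list(open("some_file.txt"))))
There is no need for with or for closing the file here: nothing keeps a reference to the file handle, so CPython's reference counting should close it as soon as the expression finishes. Other Python implementations may close it later.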
Try this:
with open("back.txt", "r") as backfile:
    lines = backfile.readlines()
print(len(lines))
with open("back.txt") as f:
    print(len(f.readlines()))
Try this:
for line in item:
    i=i+1
with open('back.txt', 'r') as item:
    nbr = sum(1 for i in item)
This is a generator expression, so it should not keep much unneeded data in memory.
The for is enclosed in the brackets of the list comprehension, so i=1+i is not within the scope of a for loop. Therefore, i=1+i should not be indented.
You don't actually have a for loop. You have a list comprehension. A list comprehension doesn't allow or require an indented block of statements to execute on each iteration; it just builds a list. If you only want the number of rows, you want this:
with open(filename) as f:
    rowcount = sum(1 for line in f)
This closes the file when you're done, even on Python implementations other than CPython, and it doesn't store the whole file in memory, so it works on huge files. If you need an actual list of the rows, you want this:
with open(filename) as f:
    rows = [line.strip() for line in f]
rowcount = len(rows)
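As a runnable sketch of the counting version, the snippet below writes a small throwaway file first so that it is self-contained. The temporary directory, the file contents and the count_rows name are all assumptions made for the demonstration.

```python
import os
import tempfile


def count_rows(path):
    """Count lines lazily; the with block closes the file even if an error occurs."""
    with open(path) as f:
        return sum(1 for _ in f)


# Create a throwaway three-line file so the example can run anywhere.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "back.txt")
    with open(path, "w") as f:
        f.write("first\nsecond\nthird\n")
    row_count = count_rows(path)

print(row_count)
```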

Python I/O Index out of range, not an off-by-one error (I think)

I have this simple code which is really just to help me understand how Python I/O works:
inFile = open("inFile.txt",'r')
outFile = open("outFile.txt",'w')
lines = inFile.readlines()
first = True
for line in lines:
    if first == True:
        outFile.write(line) #always print the header
        first = False
        continue
    nums = line.split()
    outFile.write(nums[3] + "\n") #print the 4th column of each row
outFile.close()
My input file is something like this:
#header
34.2 3.42 64.56 54.43 3.45
4.53 65.6 5.743 34.52 56.4
4.53 90.8 53.45 134.5 4.58
5.76 53.9 89.43 54.33 3.45
The output prints out into the file just as it should but I also get the error:
outFile.write(nums[3] + "\n")
IndexError: list index out of range
I'm assuming this is because it has continued to read the next line although there is no longer any data?
Others have already answered your question. Here is a better way to "always print out the file header", avoiding testing for first at every iteration:
with open('inFile.txt', 'r') as inFile, open('outFile.txt', 'w') as outFile:
    outFile.write(inFile.readline()) #always print the header
    for line in inFile:
        nums = line.split()
        if len(nums) >= 4: #Checks to make sure a fourth column exists.
            outFile.write(nums[3] + "\n") #print the 4th column of each row
A couple things are going on here:
with open('inFile.txt', 'r') as inFile, open('outFile.txt', 'w') as outFile:
The with expression is a convenient way to open files because it automatically closes the files even if an exception occurs and the with block exits early.
Note: In Python 2.6, you will need to use two with statements, as support for multiple contexts was not added until 2.7. e.g:
with open(somefile, 'r') as f:
    with open(someotherfile, 'w') as g:
        pass #code here.
outFile.write(inFile.readline()) #always print the header
The file object is an iterator that gets consumed. When readline() is called, the buffer position advances forwards and the first line is returned.
for line in inFile:
As mentioned before, the file object is an iterator, so you can use it directly in a for loop.
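A short sketch of that consumption behaviour, using io.StringIO as a stand-in for the opened file (the contents are copied from the sample input in the question). After readline() has taken the header, the for loop simply carries on from the second line.

```python
import io

# Stand-in for open('inFile.txt'); same shape as the question's input.
in_file = io.StringIO(
    "#header\n"
    "34.2 3.42 64.56 54.43 3.45\n"
    "4.53 65.6 5.743 34.52 56.4\n"
)

header = in_file.readline()  # consumes the first line only
remaining = [line.split()[3] for line in in_file]  # iteration resumes at line two

print(repr(header))
print(remaining)
```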
The error shows that in your source code you have the following line:
outFile.write(nums[6] + "\n")
Note that the 6 there is different from the 3 you show in your question. You may have two different versions of the file.
It fails because nums is the result of splitting a line and in your case it contains only 5 elements:
for line in lines:
    # ...
    # line is for example "34.2 3.42 64.56 54.43 3.45"
    nums = line.split()
    print(len(nums))
You can't index past the end of a list.
You also may have an error in your code: you write the header, then split it and write one element from it. You probably want an if/else:
for line in lines:
    if first:
        first = False
        # do something with the header
    else:
        pass  # do something with the other lines
Or you could just handle the header separately before you enter the loop.
The problem is that you are processing the "header line" just like the rest of the data. I.e., even though you identify the header line, you don't skip its processing. I.e., you don't avoid split()'ing it further down in the loop which causes the run-time error.
To fix your problem simply insert a continue as shown:
first = True
for line in lines:
    if first == True:
        outFile.write(line) #always print the header
        first = False
        continue ## skip the rest of the loop and start from the top
    nums = line.split()
    ...
that will bypass the rest of the loop and all will work as it should.
The output file outFile.txt will contain:
#header
54.43
34.52
134.5
54.33
The second problem turned out to be blank lines at the end of the input file: split() on a blank line returns an empty list, so nums[3] raises IndexError.
Notes: You could restructure your code, but if you are not interested in doing that, the simple fix above lets you keep all of your present code, and only requires the addition of the one line. As mentioned in other posts, it's worth looking into using with to manage your open files as it will also close them for you when you are done or an exception is encountered.
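To illustrate the blank-line case, here is a small sketch with made-up trailing blank lines (fourth_column is a hypothetical helper). str.split() with no arguments returns an empty list for a line that contains only whitespace, so a length check avoids indexing past the end.

```python
def fourth_column(lines):
    """Collect the 4th whitespace-separated field, skipping lines that are too short."""
    values = []
    for line in lines:
        nums = line.split()
        if len(nums) >= 4:  # blank or short lines are skipped instead of raising
            values.append(nums[3])
    return values


sample = [
    "34.2 3.42 64.56 54.43 3.45\n",
    "4.53 65.6 5.743 34.52 56.4\n",
    "\n",      # blank line at the end of the file
    "   \n",   # whitespace-only line
]

blank_fields = "\n".split()
print(blank_fields)            # an empty list, so blank_fields[3] would fail
print(fourth_column(sample))
```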

Elegant way to skip first line when using python fileinput module?

Is there an elegant way of skipping the first line of a file when using the Python fileinput module?
I have a data file with nicely formatted data, but the first line is a header. Using fileinput, I would have to include a check and discard the line if it does not seem to contain data.
The problem is that it would apply the same check to the rest of the file.
With read() you can open the file, read the first line, and then loop over the rest of the file. Is there a similar trick with fileinput?
Is there an elegant way to skip processing of the first line?
Example code:
import fileinput
# how to skip first line elegantly?
for line in fileinput.input(["file.dat"]):
    data = process_line(line)
    output(data)
lines = iter(fileinput.input(["file.dat"]))
next(lines) # extract and discard first line
for line in lines:
    data = process_line(line)
    output(data)
or use the itertools.islice way if you prefer
import itertools
finput = fileinput.input(["file.dat"])
lines = itertools.islice(finput, 1, None) # cuts off first line
dataset = (process_line(line) for line in lines)
results = [output(data) for data in dataset]
Since fileinput, islice and the generator expression are all lazy, no intermediate list of lines is built; only the final results list is materialized.
The fileinput module contains a bunch of handy functions, one of which seems to do exactly what you're looking for:
for line in fileinput.input(["file.dat"]):
    if not fileinput.isfirstline():
        data = process_line(line)
        output(data)
fileinput module documentation
It's right in the docs: http://docs.python.org/library/fileinput.html#fileinput.isfirstline
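One detail worth knowing: isfirstline() is true for the first line of each file, so with several input files every header is skipped. The sketch below assumes two throwaway files created in a temporary directory (the file names, contents and the data_lines helper are made up) and uses the method on the FileInput object rather than the module-level function.

```python
import fileinput
import os
import tempfile


def data_lines(paths):
    """Return all lines from the given files except the first line of each file."""
    rows = []
    with fileinput.input(files=paths) as stream:
        for line in stream:
            if stream.isfirstline():  # true for line 1 of every file, not only the first
                continue
            rows.append(line.rstrip("\n"))
    return rows


# Build two small files, each starting with a header line.
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name in ("a.dat", "b.dat"):
        path = os.path.join(tmp, name)
        with open(path, "w") as f:
            f.write("header\n%s-1\n%s-2\n" % (name, name))
        paths.append(path)
    rows = data_lines(paths)

print(rows)
```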
One option is to use openhook:
The openhook, when given, must be a function that takes two arguments,
filename and mode, and returns an accordingly opened file-like object.
You cannot use inplace and openhook together.
One could create a helper function, skip_header, and use it as the openhook, something like:
import fileinput

files = ['file_1', 'file_2']

def skip_header(filename, mode):
    f = open(filename, mode)
    next(f, None)  # discard the header; the default avoids StopIteration on an empty file
    return f

for line in fileinput.input(files=files, openhook=skip_header):
    pass  # do something
Do two loops where the first one calls break immediately.
with fileinput.input(files=files, mode='r', inplace=True) as f:
    for line in f:
        # add print() here if you only want to empty the line
        break
    for line in f:
        process(line)
Let's say you want to remove or empty all of the first 5 lines.
with fileinput.input(files=files, mode='r', inplace=True) as f:
    for line in f:
        # add print() here if you only want to empty the first 5 lines
        if f.filelineno() == 5:
            break
    for line in f:
        process(line)
But if you only want to get rid of the first line, just use next before the loop to remove the first line.
with fileinput.input(files=files, mode='r', inplace=True) as f:
    next(f)
    for line in f:
        process(line)
with open(file) as j: #open file as j
    for i in j.readlines()[1:]: #start reading j from second line.
        pass #handle line i here
