Python: How to know the number of rows in a text file - python

I want to know the number of rows in a text file.
How can I do this?

If iterating over a file:
    for line_no, line in enumerate(f, start=1):
        ...
or, if counting the lines in a file (f):
    count = sum(1 for line in f)

f = open('textfile.txt', 'rb')
len(f.readlines())
The readlines() method returns a list in which each element is one line of textfile.txt.

f = open("file.text")
count = sum(1 for line in f)
which is equivalent to
count = 0
for line in f:
    count += 1

As @Dan D. said, you can use enumerate() on the open file. By default counting starts at 0, so if you want the line count to start at 1 (or something else), pass the start argument to enumerate(). Also, it's considered poor practice to use "file" as a variable name, since it shadows the Python 2 built-in of that name. Thus, try something like:
for line_no, line in enumerate(open(file_name), start=1):
    print(line_no, line)
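The approaches above can be wrapped in one small helper. This is a minimal sketch rather than a definitive implementation: the name count_lines and the throwaway demo file are made up for illustration.

```python
import os
import tempfile


def count_lines(path):
    """Return the number of lines in the text file at `path`."""
    # The with block closes the file even if an error occurs.
    with open(path) as f:
        # The generator yields 1 per line, so no list of lines is built.
        return sum(1 for _ in f)


# Demo on a temporary file with made-up contents.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\nsecond\nthird\n")

line_total = count_lines(tmp.name)
os.remove(tmp.name)
print(line_total)
```

Because the generator expression never stores the lines, this should also be usable on files too large to hold in memory.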

Related

Changing the contents of a text file and making a new file with same format

I have a big text file with a lot of parts. Every part has 4 lines and next part starts immediately after the last part.
The first line of each part starts with #, the 2nd line is a sequence of characters, the 3rd line is a + and the 4th line is again a sequence of characters.
Small example:
#M00872:462:000000000-D47VR:1:1101:15294:1338 1:N:0:ACATCG
TGCTCGGTGTATGTAAACTTCCGACTTCAACTGTATAGGGATCCAATTTTGACAAAATATTAACGCTTATCGATAAAATTTTGAATTTTGTAACTTGTTTTTGTAATTCTTTAGTTTGTATGTCTGTTGCTATTATGTCTACTATTCTTTCCCCTGCACTGTACCCCCCAATCCCCCCTTTTCTTTTAAAAGTTAACCGATACCGTCGAGATCCGTTCACTAATCGAACGGATCTGTCTCTGTCTCTCTC
+
BAABBADBBBFFGGGGGGGGGGGGGGGHHGHHGH55FB3A3GGH3ADG5FAAFEGHHFFEFHD5AEG1EF511F1?GFH3#BFADGD55F?#GFHFGGFCGG/GHGHHHHHHHDBG4E?FB?BGHHHHHHHHHHHHHHHHHFHHHHHHHHHGHGHGHHHHHFHHHHHGGGGHHHHGGGGHHHHHHHGHGHHHHHHFGHCFGGGHGGGGGGGGFGGEGBFGGGGGGGGGFGGGGFFB9/BFFFFFFFFFF/
I want to change the 2nd and the 4th line of each part and make a new file with similar structure (4 lines for each part). In fact I want to keep the 1st 65 characters (in lines 2 and 4) and remove the rest of characters. The expected output for the small example would look like this:
#M00872:462:000000000-D47VR:1:1101:15294:1338 1:N:0:ACATCG
TGCTCGGTGTATGTAAACTTCCGACTTCAACTGTATAGGGATCCAATTTTGACAAAATATTAACG
+
BAABBADBBBFFGGGGGGGGGGGGGGGHHGHHGH55FB3A3GGH3ADG5FAAFEGHHFFEFHD5A
I wrote the following code:
infile = open("file.fastq", "r")
new_line=[]
for line_number in len(infile.readlines()):
    if line_number ==2 or line_number ==4:
        new_line.append(infile[line_number])
with open('out_file.fastq', 'w') as f:
    for item in new_line:
        f.write("%s\n" % item)
but it does not return what I want. How can I fix it to get the expected output?
This code will achieve what you want:
from itertools import islice
with open('bio.txt', 'r') as infile, open('mod_bio.txt', 'w') as f:
    while True:
        # read the next part (up to 4 lines)
        lines_gen = list(islice(infile, 4))
        if not lines_gen:
            break
        a, b, c, d = lines_gen
        # keep the first 65 characters of lines 2 and 4
        b = b.rstrip('\n')[:65] + '\n'
        d = d.rstrip('\n')[:65] + '\n'
        f.write(a + b + c + d)
How does it work?
We first read the file 4 lines at a time with islice, as you describe.
Then we unpack those lines into a, b, c, d and slice the 2nd and 4th. Finally we join the four strings and write them to the new file.
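The chunking idea described above can be tried without touching any real file. The sketch below is only an illustration: the helper name trim_records, the keep parameter, and the in-memory sample are assumptions, and it works on any pair of file-like objects.

```python
import io
from itertools import islice


def trim_records(infile, outfile, keep=65):
    """Copy 4-line records, cutting lines 2 and 4 to `keep` characters."""
    while True:
        record = list(islice(infile, 4))  # next 4 lines, or fewer at EOF
        if len(record) < 4:
            break  # end of input (an incomplete trailing record is dropped)
        header, seq, plus, qual = (line.rstrip("\n") for line in record)
        outfile.write("\n".join([header, seq[:keep], plus, qual[:keep]]) + "\n")


# Made-up record with short lines, trimmed to 3 characters for readability.
source = io.StringIO("#read1\nACGTACGT\n+\nFFFFGGGG\n")
target = io.StringIO()
trim_records(source, target, keep=3)
print(target.getvalue())
```

Stripping the newline before slicing means a line shorter than the limit does not pick up a second newline.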
I think some itertools.cycle could be nice here:
import itertools
with open("transformed.file.fastq", "w+") as output_file:
    with open("file.fastq", "r") as input_file:
        for i in itertools.cycle((1,2,3,4)):
            line = input_file.readline().strip()
            if not line:
                break
            if i in (2,4):
                line = line[:65]
            output_file.write("{}\n".format(line))
readlines() returns a list of the lines in your file, so you don't need to build a separate list new_line. Iterate over the index-value pairs of that list and modify the values at the positions you want.
Modifying your code, try this:
with open("file.fastq", "r") as infile:
    new_lines = infile.readlines()
for i, t in enumerate(new_lines):
    # lines 2 and 4 of every 4-line part have index 1 and 3 modulo 4
    if i % 4 in (1, 3):
        # cut to 65 characters and put the newline back
        new_lines[i] = t.rstrip('\n')[:65] + '\n'
with open('out_file.fastq', 'w') as f:
    for item in new_lines:
        f.write("%s" % item)

Read in every line that starts with a certain character from a file

I am trying to read in every line in a file that starts with 'X:'. I don't want to read the 'X:' itself, just the rest of the line that follows.
with open("hnr1.abc","r") as file: f = file.read()
id = []
for line in f:
    if line.startswith("X:"):
        id.append(f.line[2:])
print(id)
It doesn't have any errors but it doesn't print anything out.
Try this:
with open("hnr1.abc","r") as fi:
    id = []
    for ln in fi:
        if ln.startswith("X:"):
            id.append(ln[2:])
print(id)
Don't use names like file or line.
Note that the append just uses the item name (ln), not an attribute of the file.
Because the file was pre-read into memory as one string, the for loop was accessing the data by character, not by line.
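That last point is easy to check. Here is a small sketch that uses io.StringIO as a stand-in for the real file (the sample text is invented):

```python
import io

text = "X:1\nT:Some title\nX:2\n"

# Iterating over the string returned by read() yields single characters.
chars = [piece for piece in io.StringIO(text).read()]

# Iterating over the file object itself yields whole lines.
lines = [piece for piece in io.StringIO(text)]

print(chars[:3])
print(lines)
```

No single character can start with the two-character string "X:", which is why the original loop never appended anything.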
for line in f:
    search = line.split()
    if search and search[0].startswith("X:"):
        storagearray.extend(search)
That should give you a list of all the lines you want, but they'll be split into separate words. Also, you'll need to define storagearray (for example, storagearray = []) before the loop above. It's an inelegant solution, as I'm a learner myself, but it should do the job!
edit: If you want to output the lines, simply use Python's built-in print function:
print(storagearray)
Read every line in the file (for loop).
Select the lines that start with X:.
Slice the line: ln[0:] keeps the whole line, including the leading X:.
Print the lines that begin with X:.
for ln in input_file:
    if ln.startswith('X:'):
        X_ln = ln[0:]
        print(X_ln)
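If the prefix itself should be dropped, as the question asks, slice from the end of the prefix instead. A minimal sketch; the function name values_after_prefix and the sample lines are hypothetical:

```python
def values_after_prefix(lines, prefix="X:"):
    """Return the text after `prefix` for every line that starts with it."""
    return [
        line[len(prefix):].strip()  # drop the prefix and surrounding whitespace
        for line in lines
        if line.startswith(prefix)
    ]


sample = ["X:1\n", "T:First tune\n", "X: 2\n", "K:G\n"]
ids = values_after_prefix(sample)
print(ids)
```

Using len(prefix) instead of a hard-coded 2 keeps the slice correct if the prefix is changed.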

python delete specific line and re-assign the line number

I would like to delete a specific line and reassign the line numbers:
e.g.:
0,abc,def
1,ghi,jkl
2,mno,pqr
3,stu,vwx
What I want: if line 1 is the line that needs to be deleted, then the
output should be:
0,abc,def
1,mno,pqr
2,stu,vwx
What I have done so far:
f = open(file, 'r')
lines = f.readlines()
f.close()
f = open(file, 'w')
for line in lines:
    if line.rsplit(',')[0] != 'line#':
        f.write(line)
f.close()
The lines above can delete a specific line#, but I don't know how to rewrite the line number before the first ','.
Here is a function that will do the job.
def removeLine(n, file):
    f = open(file, "r+")
    d = f.readlines()
    f.seek(0)
    for i in range(len(d)):
        if i > n:
            # renumber lines after the deleted one; the final argument 1
            # limits the replacement to the leading number
            f.write(d[i].replace(d[i].split(",")[0], str(i - 1), 1))
        elif i != n:
            f.write(d[i])
    f.truncate()
    f.close()
Here the parameters n and file are the index of the line you wish to delete and the file path, respectively.
This assumes the line numbers are written in each line, as implied by your example input.
If the line number is not included at the beginning of each line, as some other answers have assumed, simply remove the first if branch (and change the elif to if):
        if i > n:
            f.write(d[i].replace(d[i].split(",")[0], str(i - 1), 1))
I noticed that your account wasn't created in the past few hours, so I figure that there's no harm in giving you the benefit of the doubt. You will really have more fun on StackOverflow if you spend the time to learn its culture.
I wrote a solution that fits your question's criteria on a file that's already written (you mentioned that you're opening a text file), so I assume it's a CSV.
I figured that I'd answer your question differently than the other solutions that implement the CSV reader library and use a temporary file.
import re
numline_csv = re.compile(r"\d+,")
# substitute your actual file opening here
so_31195910 = """
0,abc,def
1,ghi,jkl
2,mno,pqr
3,stu,vwx
"""
so = so_31195910.splitlines()
# this could be an input or whatever you need
delete_line = 1
line_bank = []
for l in so:
    if l and not l.startswith(str(delete_line)+','):
        print(l)
        # split off the leading number only
        l = re.split(numline_csv, l, maxsplit=1)
        line_bank.append(l[1])
so = []
for i,l in enumerate(line_bank):
    so.append("%s,%s" % (i,l))
And the output:
>>> so
['0,abc,def', '1,mno,pqr', '2,stu,vwx']
In order to get a line number for each line, you should use the enumerate function:
for line_index, line in enumerate(lines):
    # line_index is 0 for the first line, 1 for the 2nd line, etc.
    ...
In order to separate the first element of the string from the rest of the string, I suggest passing a value for maxsplit to the split method.
>>> '0,abc,def'.split(',')
['0', 'abc', 'def']
>>> '0,abc,def'.split(',',1)
['0', 'abc,def']
>>>
Once you have those two, it's just a matter of concatenating line_index to split(',',1)[1].
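Putting those two pieces together might look like the sketch below. The function name renumber_without is made up, and it assumes every line starts with its number followed by a comma.

```python
def renumber_without(lines, delete_index):
    """Drop the line at `delete_index` and renumber the rest from 0."""
    kept = [line for i, line in enumerate(lines) if i != delete_index]
    # split(',', 1) separates the old number from the rest of the line.
    return ["%d,%s" % (i, line.split(",", 1)[1]) for i, line in enumerate(kept)]


rows = ["0,abc,def", "1,ghi,jkl", "2,mno,pqr", "3,stu,vwx"]
result = renumber_without(rows, 1)
print(result)
```

Filtering first and numbering afterwards means the new numbers come from enumerate, so no arithmetic on the old numbers is needed.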

Iterate file name with counter

I'm splitting a file based on a string, and would like to have the output file names be numbered.
This is what I have so far:
outputfile = open("output.seq")
outputfileContent = outputfile.read()
outputfileList = outputfileContent.split(">")
for count, line in enumerate(f):
    for items in outputfileList:
        seqInfoFile = open('%f.dat', 'w')
        seqInfoFile.write(str(items))
I'm not sure where to define f.
Thanks for any help!
Assuming I haven't misunderstood you, you don't need f at all: enumerate the list you already have.
outputfile = open("output.seq")
outputfileContent = outputfile.read()
outputfileList = outputfileContent.split(">")
for count, content in enumerate(outputfileList, 1):
    with open("output_%s.dat" % count, "w") as output:
        output.write(content)
It would seem that if you want to associate every item in the output file list with a file titled as its index, you should do something like this:
for i in range(len(outputfileList)):
    # the with block closes each output file after writing
    with open(str(i) + '.dat', 'w') as seqInfoFile:
        seqInfoFile.write(str(outputfileList[i]))
It's not quite as elegant as an iterator, but the other option is to determine the number by making a call to outputfileList.index(items) each time.
Open output.seq and write its first line (split at >) into the file 1.dat, the second into 2.dat, and so on:
with open("output.seq") as fi:
    for count, line in enumerate(fi, 1):
        with open('{0}.dat'.format(count), 'w') as fo:
            fo.writelines(line.split('>'))
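For a self-contained check of the numbering, here is a sketch that splits a string on '>' and writes each non-empty piece to a numbered file in a temporary directory. The function name write_numbered and the sample content are assumptions. Unlike the answers above, it skips empty pieces, such as the one before a leading '>'.

```python
import os
import tempfile


def write_numbered(content, out_dir, sep=">"):
    """Write each non-empty piece of `content` to 1.dat, 2.dat, ... in out_dir."""
    pieces = [p for p in content.split(sep) if p.strip()]
    names = []
    for count, piece in enumerate(pieces, 1):
        name = "%d.dat" % count
        with open(os.path.join(out_dir, name), "w") as out:
            out.write(piece)
        names.append(name)
    return names


with tempfile.TemporaryDirectory() as tmp_dir:
    created = write_numbered(">seq1\nACGT\n>seq2\nTTGA\n", tmp_dir)
    on_disk = sorted(os.listdir(tmp_dir))
print(created)
```

Passing 1 as the second argument to enumerate is what makes the file names start at 1.dat rather than 0.dat.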

Count number of lines in a txt file with Python excluding blank lines

I wish to count the number of lines in a .txt file which looks something like this:
apple
orange
pear

hippo
donkey
Where there are blank lines used to separate blocks. The result I'm looking for, based on the above sample, is five (lines).
How can I achieve this?
As a bonus, it would be nice to know how many blocks/paragraphs there are. So, based on the above example, that would be two blocks.
non_blank_count = 0
with open('data.txt') as infp:
    for line in infp:
        if line.strip():
            non_blank_count += 1
print('number of non-blank lines found %d' % non_blank_count)
UPDATE: Re-read the question, OP wants to count non-blank lines .. (sigh .. thanks @RanRag).
(I need a break from the computer ...)
A short way to count the number of non-blank lines could be:
with open('data.txt', 'r') as f:
    lines = f.readlines()
num_lines = len([l for l in lines if l.strip(' \n') != ''])
I am surprised to see that there isn't a clean pythonic answer yet (as of Jan 1, 2019). Many of the other answers create unnecessary lists, count in a non-pythonic way, loop over the lines of the file in a non-pythonic way, do not close the file properly, do unnecessary things, assume that the end of line character can only be '\n', or have other smaller issues.
Here is my suggested solution:
with open('myfile.txt') as f:
    line_count = sum(1 for line in f if line.strip())
The question does not define what a blank line is. My definition: a line is blank if and only if line.strip() returns the empty string. This may or may not be your definition of a blank line.
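The choice of definition matters when a file contains whitespace-only lines or stray carriage returns. A short sketch comparing the two tests used in the answers here (the sample lines are invented):

```python
lines = ["apple\n", "\n", "   \n", "\r\n", "pear\n"]

# Definition 1: a line is blank if strip() leaves nothing.
by_strip = sum(1 for line in lines if line.strip())

# Definition 2: a line is blank only if it is exactly a newline character.
by_newline = sum(1 for line in lines if line != "\n")

print(by_strip, by_newline)
```

Under the second definition, the lines "   \n" and "\r\n" are counted as non-blank, so the two totals differ.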
sum([1 for i in open("file_name","r").readlines() if i.strip()])
Assuming the blank lines contain only the newline character, it should be faster to avoid calling str.strip, which creates a new string, and instead check whether the line contains only whitespace using str.isspace:
with open('data.txt') as f:
    non_blank_lines = sum(not line.isspace() for line in f)
Demo:
from io import StringIO
s = '''apple
orange
pear

hippo
donkey'''
non_blank_lines = sum(not line.isspace() for line in StringIO(s))
# 5
You can further use str.isspace with itertools.groupby to count the number of contiguous lines/blocks in the file:
from itertools import groupby
no_paragraphs = sum(k for k, _ in groupby(StringIO(s), lambda x: not x.isspace()))
print(no_paragraphs)
# 2
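Both counts can also be obtained in one pass. A sketch that treats a line as blank when it contains only whitespace; the function name count_lines_and_blocks is hypothetical:

```python
from io import StringIO
from itertools import groupby


def count_lines_and_blocks(lines):
    """Return (non-blank line count, number of blocks of non-blank lines)."""
    line_count = 0
    block_count = 0
    # groupby yields one group per run of consecutive blank or non-blank lines.
    for has_text, group in groupby(lines, key=lambda line: bool(line.strip())):
        if has_text:
            block_count += 1
            line_count += sum(1 for _ in group)
    return line_count, block_count


sample = StringIO("apple\norange\npear\n\nhippo\ndonkey\n")
totals = count_lines_and_blocks(sample)
print(totals)
```

Since groupby only merges adjacent items with the same key, several blank lines in a row still count as a single separator.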
Non-blank lines counter:
lines_counter = 0
with open('test_file.txt') as f:
    for line in f:
        if line != '\n':
            lines_counter += 1
Blocks counter:
para_counter = 0
prev = '\n'
with open('test_file.txt') as f:
    for line in f:
        if line != '\n' and prev == '\n':
            para_counter += 1
        prev = line
This bit of Python code should solve your problem:
with open('data.txt', 'r') as f:
    lines = len(list(filter(lambda x: x.strip(), f)))
This is how I would've done it:
f = open("file.txt")
l = [x for x in f.readlines() if x != "\n"]
print(len(l))
readlines() will make a list of all the lines in the file and then you can just take those lines that have at least something in them.
Looks pretty straightforward to me!
Pretty straightforward, I believe:
f = open('path','r')
count = 0
for lines in f:
    if lines.strip():
        count += 1
print(count)
My one liner would be
print(sum(1 for line in open(path_to_file,'r') if line.strip()))
