How to count empty lines in a Python file

I would like to print the total number of empty lines in a file using Python. I have been trying:
f = open('file.txt','r')
for line in f:
    if (line.split()) == 0:
but I am not able to get the proper output.
I have also tried printing it this way, but it prints the value as 0, and I am not sure what is wrong with the code:
print "\nblank lines are",(sum(line.isspace() for line in fname))
It prints:
blank lines are 0
There are 7 lines in the file.
There are 46 characters in the file.
There are 8 words in the file.

Since the empty string is a falsy value, you may use .strip():
for line in f:
    if not line.strip():
        ...
The above also counts lines that contain only whitespace as empty.
If you want only completely empty lines, you may want to use this instead:
if line in ['\r\n', '\n']:
    ...
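To compare the two checks, here is a small sketch that counts blank lines both ways on an in-memory file. The function name count_blank, its whitespace_counts parameter and the sample text are hypothetical.

```python
import io

def count_blank(f, whitespace_counts=True):
    """Count blank lines in an open text file.

    With whitespace_counts=True, lines holding only spaces/tabs count too;
    otherwise only lines that are exactly a line break count.
    """
    count = 0
    for line in f:
        if whitespace_counts:
            if not line.strip():
                count += 1
        elif line in ('\r\n', '\n'):
            count += 1
    return count

sample = "first\n\n   \nlast\n"
print(count_blank(io.StringIO(sample)))                           # 2
print(count_blank(io.StringIO(sample), whitespace_counts=False))  # 1
```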

Please use a context manager (with statement) to open files:
with open('file.txt') as f:
    print(sum(line.isspace() for line in f))
line.isspace() returns True (== 1) if the line is non-empty and contains only whitespace characters, and False (== 0) otherwise. Therefore, sum(line.isspace() for line in f) returns the number of lines that are considered empty.
line.split() always returns a list, so comparing it with 0 is never true. Both
if line.split() == []:
and
if not line.split():
would work.
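A quick check of why the question's comparison with 0 never succeeds while the two tests above do (the sample line is made up):

```python
blank = "   \n"
words = blank.split()

print(words)            # []
print(words == 0)       # False: a list never equals an int
print(words == [])      # True
print(not words)        # True: an empty list is falsy
print(len(words) == 0)  # True: comparing the length also works
```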

FILE_NAME = 'file.txt'
empty_line_count = 0
with open(FILE_NAME, 'r') as fh:
    for line in fh:
        # split() breaks the line into a list of words. If the line is
        # empty or contains only whitespace, split() returns an empty
        # list, and `== []` checks whether the list is empty.
        if line.split() == []:
            empty_line_count += 1
print('Empty Line Count : ', empty_line_count)

Related

Python - Remove all the lines starting with word/string present in a list

I am trying to parse a huge 50K-line file from which I have to remove any line that starts with a word present in a predefined list.
Currently I have tried the code below, but the output file (DB12_NEW) is not as desired:
rem = ['remove', 'remove1', 'remove2'....., 'removen']
inputFile = open(r"C:\file", "r")
outputFile = open(r"C:\file_12", "w")
lines = inputFile.readlines()
inputFile.close()
for line in lines:
    for i in rem:
        if line.startswith(i):
            outputFile.write('\n')
        else:
            outputFile.write(line)
I am getting the same file as output that I initially put in: the script is not removing the lines that start with any of the strings present in the list.
Can you please help me understand how to achieve this?
Use a tuple instead of a list for str.startswith: it accepts a tuple of prefixes (but not a list) and returns True if the string starts with any of them.
# rem = ['remove', 'rem-ove', 'rem ove']
rem = ('remove', 'rem-ove', 'rem ove')
with open('DB12', 'r') as inputFile, open('DB12_NEW', 'w') as outputFile:
    for line in inputFile:
        if not line.startswith(rem):
            outputFile.write(line)
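A self-contained sketch of the tuple approach, using an in-memory list of lines instead of the DB12 files (the sample data is made up):

```python
rem = ['remove', 'rem-ove', 'rem ove']
lines = ["remove this\n", "keep this\n", "rem ove this too\n", "also keep\n"]

# str.startswith accepts a tuple of prefixes, so convert the list once
prefixes = tuple(rem)
kept = [line for line in lines if not line.startswith(prefixes)]

print(kept)  # ['keep this\n', 'also keep\n']
```

Passing the list itself, as in line.startswith(rem), raises a TypeError.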
Currently you check whether the line starts with a word from the remove list one word at a time, and you write something on every single check. For example:
If the line starts with "rem ABCDF..." and your loop checks whether it starts with 'remove', the if-statement is false and the line is written to your output file.
You could try something like this:
remove = ['remove', 'rem-ove', 'rem', 'rem ove' ...... 'n']
with open(r"C:\DB12", "r") as inputFile, open(r"C:\DB12_NEW", "w") as outputFile:
    # file objects have no splitlines(); iterate over the file directly
    for line in inputFile:
        if not any(line.startswith(i) for i in remove):
            outputFile.write(line)
any() is a built-in function, not a keyword: it returns True as soon as one element is true, and False only if all elements are false.
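To make the explanation concrete, this sketch replays the question's nested loop for a single made-up line and records what it would write, then applies any() to the same line:

```python
rem = ['remove', 'remove1', 'remove2']
line = "remove1 some data\n"

# The question's inner loop: one write per prefix that is checked
written = []
for i in rem:
    if line.startswith(i):
        written.append('\n')
    else:
        written.append(line)
print(written)  # ['\n', '\n', 'remove1 some data\n']
# The line is still written, because it does not start with 'remove2'

# any() looks at all prefixes before deciding, once per line
keep = not any(line.startswith(i) for i in rem)
print(keep)  # False
```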
Sometimes this can be caused by leading or trailing whitespace.
Try stripping it off with strip() before checking:
rem = [x.strip() for x in rem]
lines = [line.strip() for line in lines]
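For example, a line with leading spaces does not start with the prefix until it is stripped (the sample line is hypothetical); lstrip() is enough when only leading whitespace matters:

```python
rem = ('remove',)
line = "   remove me\n"

print(line.startswith(rem))           # False: the leading spaces get in the way
print(line.lstrip().startswith(rem))  # True once leading whitespace is removed
```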

Why is my if statement not working and just outputting the else, everything works till there? [duplicate]

In Python, calling e.g. temp = open(filename,'r').readlines() results in a list in which each element is a line from the file. However, these strings have a newline character at the end, which I don't want.
How can I get the data without the newlines?
You can read the whole file and split lines using str.splitlines:
temp = file.read().splitlines()
Or you can strip the newline by hand:
temp = [line[:-1] for line in file]
Note: this last solution only works if the file ends with a newline, otherwise the last line will lose a character.
This assumption is true in most cases (especially for files created by text editors, which often do add an ending newline anyway).
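If you want to avoid this, you can add a newline at the end of the file:
# binary mode: Python 3 text files don't allow nonzero seeks relative to the end
with open(the_file, 'rb+') as f:
    f.seek(-1, 2)  # go to the last byte of the file (fails on an empty file)
    if f.read(1) != b'\n':
        # add missing newline if not already present
        f.write(b'\n')
    f.seek(0)  # rewind in both cases before reading
    lines = [line[:-1].decode() for line in f]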
Or a simpler alternative is to strip the newline instead:
[line.rstrip('\n') for line in file]
Or even, although pretty unreadable:
[line[:-(line[-1] == '\n') or len(line)+1] for line in file]
This exploits the fact that the return value of or isn't a boolean but one of its operands: the first one if it is truthy, otherwise the second.
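A quick comparison of the per-line approaches on a file whose last line has no trailing newline, using io.StringIO as a stand-in for the file. The helper name trim is hypothetical; it just wraps the one-liner above.

```python
import io

data = "first\nsecond"  # the last line has no trailing newline

def trim(line):
    # -(True) == -1 cuts the newline; -(False) == 0 is falsy, so `or`
    # falls back to len(line) + 1 and the slice keeps the whole line
    return line[:-(line[-1] == '\n') or len(line) + 1]

sliced = [line[:-1] for line in io.StringIO(data)]
stripped = [line.rstrip('\n') for line in io.StringIO(data)]
tricky = [trim(line) for line in io.StringIO(data)]

print(sliced)    # ['first', 'secon']  (last character lost)
print(stripped)  # ['first', 'second']
print(tricky)    # ['first', 'second']
print(0 or 99)   # 99: `or` returned its second operand
```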
The readlines method is actually equivalent to:
def readlines(self):
    lines = []
    for line in iter(self.readline, ''):
        lines.append(line)
    return lines
# or equivalently
def readlines(self):
    lines = []
    while True:
        line = self.readline()
        if not line:
            break
        lines.append(line)
    return lines
Since readline() keeps the newline, readlines() keeps it too.
Note: for symmetry with readlines(), the writelines() method does not add line endings, so f2.writelines(f.readlines()) produces an exact copy of f in f2.
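To check the symmetry claim, here is a sketch using io.StringIO objects as stand-ins for f and f2 (the sample text is made up):

```python
import io

original = "one\ntwo\nthree"
f = io.StringIO(original)
f2 = io.StringIO()

lines = f.readlines()
print(lines)  # ['one\n', 'two\n', 'three']

# writelines() adds nothing between items, so the copy is exact
f2.writelines(lines)
print(f2.getvalue() == original)  # True
```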
temp = open(filename,'r').read().split('\n')
Note that if the file ends with a newline, this leaves an empty string as the last element.
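For a file that ends with a newline, split('\n') and splitlines() differ by one trailing element (sample text made up):

```python
text = "a\nb\n"  # a typical file that ends with a newline

by_split = text.split('\n')
by_splitlines = text.splitlines()

print(by_split)       # ['a', 'b', '']
print(by_splitlines)  # ['a', 'b']
```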
Read the file one line at a time and remove unwanted characters from the end of each string with str.rstrip(chars):
with open(filename, 'r') as fileobj:
    for row in fileobj:
        print(row.rstrip('\n'))
See also str.strip([chars]) and str.lstrip([chars]).
I think this is the best option.
temp = [line.strip() for line in file.readlines()]
Note that strip() also removes leading and trailing whitespace, not only the newline.
temp = open(filename,'r').read().splitlines()
My preferred one-liner -- if you don't count from pathlib import Path :)
from pathlib import Path
lines = Path(filename).read_text().splitlines()
This closes the file automatically, so there is no need for with open().
Added in Python 3.5.
https://docs.python.org/3/library/pathlib.html#pathlib.Path.read_text
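A runnable version of the pathlib one-liner, using a temporary directory so it does not depend on an existing file (the file name is made up):

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.txt"  # made-up file name
    path.write_text("alpha\nbeta\n")
    lines = path.read_text().splitlines()  # opens and closes the file internally

print(lines)  # ['alpha', 'beta']
```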
Try this:
with open("url.txt", "r") as u:
    url = u.read().replace('\n', '')
print(url)
This joins all the lines into a single string.
To get rid of trailing end-of-line (\n) characters and of empty list values (''), try:
with open(path_sample, "r") as f:
    lines = [line.rstrip('\n') for line in f if line.strip() != '']
You can easily read the file as a list using a list comprehension:
with open("foo.txt", 'r') as f:
    lst = [row.rstrip('\n') for row in f]
my_file = open("first_file.txt", "r")
for line in my_file.readlines():
    if line[-1:] == "\n":
        print(line[:-1])
    else:
        print(line)
my_file.close()
This script takes the lines from file and saves each one in file2, without its newline and with ,0 appended.
with open("temp.txt", "r") as file, open("res.txt", "w") as file2:
    for line in file:
        file2.write(f"{line.splitlines()[0]},0\n")
If you look at line, its value is data\n, so splitlines() turns it into a list (['data']) and [0] picks out its only element, data.
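The splitlines()[0] step can be checked on a single string; rstrip('\n') is shown as an alternative that gives the same result for such a line (the sample value is hypothetical):

```python
line = "data\n"  # what one iteration over the file yields

print(line.splitlines())            # ['data']
print(f"{line.splitlines()[0]},0")  # data,0

# rstrip('\n') produces the same text without building a list
print(line.rstrip("\n") + ",0")     # data,0
```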
import csv
with open(filename, newline='') as f:
    csvreader = csv.reader(f)
    for line in csvreader:
        print(line[0])
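csv.reader already strips the line endings from each row. A minimal check with io.StringIO standing in for the file (sample data made up; for real files the csv documentation recommends opening them with newline=''):

```python
import csv
import io

f = io.StringIO("a,1\nb,2\n")  # stand-in for the open file
rows = list(csv.reader(f))
print(rows)  # [['a', '1'], ['b', '2']]
```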

Locate a specific line in a file based on user input then delete a specific number of lines

I'm trying to delete specific lines in a text file. The way I need to go about it is by prompting the user to input a string (a phrase that should exist in the file); the file is then searched and, if the string is there, the data on that line and the line number are both stored.
After the phrase has been found, that line and the five following lines are printed out. Now I have to figure out how to delete those six lines without changing any other text in the file, which is my issue lol.
Any ideas as to how I can delete those six lines?
This was my latest attempt to delete the lines:
file = open('C:\\test\\example.txt', 'a')
locate = "example string"
for i, line in enumerate(file):
    if locate in line:
        line[i] = line.strip()
        i = i+1
        line[i] = line.strip()
        i = i+1
        line[i] = line.strip()
        i = i+1
        line[i] = line.strip()
        i = i + 1
        line[i] = line.strip()
        i = i+1
        line[i] = line.strip()
        break
I would not usually consider it desirable to overwrite the source file: what if the user does something by mistake? If your project allows, I would write the changes out to a new file.
with open('source.txt', 'r') as ifile:
    with open('output.txt', 'w') as ofile:
        locate = "example string"
        skip_next = 0
        for line in ifile:
            if locate in line:
                # skip this line and the 5 that follow (6 lines in total)
                skip_next = 5
                print(line.rstrip('\n'))
            elif skip_next > 0:
                print(line.rstrip('\n'))
                skip_next -= 1
            else:
                ofile.write(line)
This is also robust to finding the phrase multiple times - it will just start counting lines to remove again.
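The skip-counting idea can also be written as a pure function over a list of lines, which makes it easy to test. The name remove_block and its count parameter (the matching line plus the lines after it, 6 in the question) are hypothetical.

```python
def remove_block(lines, locate, count=6):
    """Return (kept, removed): each line containing `locate` is removed
    together with the count - 1 lines that follow it."""
    kept, removed = [], []
    skip = 0
    for line in lines:
        if locate in line:
            skip = count  # start (or restart) counting at every match
        if skip > 0:
            removed.append(line)
            skip -= 1
        else:
            kept.append(line)
    return kept, removed

lines = [f"line {n}\n" for n in range(10)]
lines[2] = "example string\n"
kept, removed = remove_block(lines, "example string")
print(len(kept), len(removed))  # 4 6
```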
You can find the occurrences, copy the list items between the occurrences to a new list and then save the new list into the file.
_newData = []
_linesToSkip = 6  # the matching line plus the five lines after it
with open('data.txt', 'r') as _file:
    data = _file.read().splitlines()
occurrences = [i for i, x in enumerate(data) if "example string" in x]
_lastOcurrence = 0
for ocurrence in occurrences:
    _newData.extend(data[_lastOcurrence : ocurrence])
    _lastOcurrence = ocurrence + _linesToSkip
_newData.extend(data[_lastOcurrence:])
# Save new data into the file
with open('data.txt', 'w') as _file:
    _file.writelines(line + '\n' for line in _newData)
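The answers here either write to a separate output file or overwrite the original directly. A common middle ground, sketched here as an alternative that none of the answers uses, is to write the result into a temporary file in the same directory and then move it over the original with os.replace(), so the original is only replaced once the new content is complete. The helper name rewrite_without is hypothetical.

```python
import os
import tempfile

def rewrite_without(path, locate, count=6):
    """Rewrite `path` in place, dropping each line that contains `locate`
    together with the count - 1 lines that follow it."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the original so os.replace()
    # does not have to move it across file systems
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, 'w') as dst, open(path) as src:
            skip = 0
            for line in src:
                if locate in line:
                    skip = count  # restart the count at every match
                if skip > 0:
                    skip -= 1
                else:
                    dst.write(line)
        # Only now, with the new content complete, swap it in
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)
        raise

# Usage (hypothetical file name):
# rewrite_without('example.txt', 'example string')
```

Note that tempfile.mkstemp() creates the file readable and writable only by the current user, so the rewritten file may end up with different permissions than the original.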
There are a couple of points that you clearly misunderstand here:
.strip() removes whitespace or given characters:
>>> print(str.strip.__doc__)
S.strip([chars]) -> str
Return a copy of the string S with leading and trailing
whitespace removed.
If chars is given and not None, remove characters in chars instead.
incrementing i inside the loop doesn't change which items the loop visits:
>>> for i, _ in enumerate('ignore me'):
...     print(i)
...     i += 10
...
0
1
2
3
4
5
6
7
8
You're assigning to the ith element of the line, which raises an exception (one that you neglected to tell us about):
>>> line = 'some text'
>>> line[i] = line.strip()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment
Ultimately...
You have to write to a file if you want to change its contents. Writing to a file that you're reading from is tricky business. Writing to an alternative file, or just storing the file in memory if it's small enough is a much healthier approach.
search_string = 'example'
lines = []
with open('/tmp/fnord.txt', 'r+') as f:  # `r+` so we can read *and* write to the file
    for line in f:
        # rstrip('\n') rather than strip(), so the kept lines are not altered
        line = line.rstrip('\n')
        if search_string in line:
            print(line)
            for _ in range(5):
                print(next(f).rstrip('\n'))
        else:
            lines.append(line)
    f.seek(0)  # back to the beginning!
    f.truncate()  # goodbye, original lines
    for line in lines:
        print(line, file=f)  # python2 requires `from __future__ import print_function`
There is a fatal flaw in this approach, though: if the sought-after line is among the last five lines of the file, next(f) runs out of lines and raises StopIteration. I'll leave that as an exercise for the reader.
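One possible answer to that exercise, assuming that showing fewer than five following lines near the end of the file is acceptable: give next() a default so it returns None instead of raising StopIteration. io.StringIO stands in for the real file and the sample text is made up.

```python
import io

f = io.StringIO("a\nb\nexample\nc\n")  # the match is the second-to-last line
search_string = 'example'
lines = []   # lines to keep
shown = []   # the match plus up to five following lines
for line in f:
    line = line.rstrip('\n')
    if search_string in line:
        shown.append(line)
        for _ in range(5):
            following = next(f, None)  # None at end of file, no StopIteration
            if following is None:
                break
            shown.append(following.rstrip('\n'))
    else:
        lines.append(line)

print(shown)  # ['example', 'c']
print(lines)  # ['a', 'b']
```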
You are opening your file with 'a', which is for appending: the file is not readable in that mode, so iterating over it raises an error. Also, you are not closing your file (bad habit). str.strip() does not delete the line; by default it removes leading and trailing whitespace. Also, this would usually be done in a loop.
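To see the 'a' problem directly, this sketch creates a throwaway file in a temporary directory and tries to iterate over it in append mode, as the question's loop does:

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")  # hypothetical demo file
    with open(path, "w") as f:
        f.write("example string\n")

    with open(path, "a") as f:
        print(f.readable())  # False: 'a' opens the file for writing only
        try:
            for line in f:  # the question's loop fails here
                pass
            raised = False
        except io.UnsupportedOperation:
            raised = True

print(raised)  # True
```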
This should get you started:
locate = "example string"
n = 0
with open('example.txt', 'r+') as f:
    for i, line in enumerate(f):
        if locate in line:
            n = 6
        if n:
            print(line, end='')
            n -= 1
print("done")
Edit:
Read-modify-write solution:
locate = "example string"
filename = 'example.txt'
removelines = 5
with open(filename) as f:
    lines = f.readlines()
with open(filename, 'w') as f:
    n = 0
    for line in lines:
        if locate in line:
            n = removelines + 1
        if n:
            n -= 1
        else:
            f.write(line)

Return First Letter of Line in File

I am trying to pull the first letter of every line in a file, then print those letters to a new file. I am working step by step, so I first wrote code that pulls the first letter of every line; however, when I added the code to read a specific file, it appears that it is not properly iterating over the entire file's content. Does anyone know why my for loop is not iterating? Or perhaps the issue is that it is iterating but not properly adding the letters to 'lines'?
def secret2(m):
    infile = open(m, 'r')
    text = infile.read()
    for line in text:
        lines = text[0]
        for i in range(len(text)):
            if text[i] == '\n':
                lines += text[i+1]
            print(lines)
            return(lines)
    m.close()
Output:
>>> secret2('file.txt')
A
'A'
>>>
Proper output would be:
>>> secret2('file.txt')
'ALICE'
>>>
Your code is iterating over the characters instead of the lines. You could print the first character of each line with the following code:
def secret2(m):
    with open(m) as infile:
        # skip blank lines, whose first character would be '\n'
        print(''.join(line[0] for line in infile if line.strip()))
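The same generator expression can return the string instead of printing it, which matches the 'ALICE' output the question expects. A sketch with io.StringIO standing in for file.txt (the function name first_letters and the sample lines are made up):

```python
import io

def first_letters(f):
    """Join the first character of every non-blank line of an open file."""
    return ''.join(line[0] for line in f if line.strip())

sample = io.StringIO("Apple\nLemon\nIris\nCherry\nEgg\n")
print(first_letters(sample))  # ALICE
```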
You want to treat each line as a single unit, so use readlines() instead of read(). Your code should then be:
def secret2(m):
    with open(m, 'r') as infile:
        text = infile.readlines()
    for j in text:
        print(j[0])
You can use this:
def get_1st_chr(your_file, id_line):
    # the with statement closes the file, so no explicit f.close() is needed
    with open(your_file) as f:
        text_splitted = f.read().splitlines()
    return text_splitted[id_line][0]
Or, if you want the first character of each of the first nb_lines lines:
def get_1st_chr(your_file, nb_lines):
    with open(your_file) as f:
        text_splitted = f.read().splitlines()
    for i in range(nb_lines):
        print(text_splitted[i][0])
Of course, you could replace 0 with the index of the character you want to print.

Finding the last element (digit) on every line in a file (Python)

I'm reading from a text file and I need to find the last element (digit) on each line. I don't understand why this code isn't working: I have tried it on a regular string, but it doesn't seem to apply in this case.
f = open("file.txt", "r")
result = 0
for line in f:
    string = str(f.read())
    if string[-1:].isdigit() == True:
        result = int(string[-1:])
    else:
        result = 40
print(result)
f.close()
The file file.txt only contains the line
81 First line32
so the code should print out 2 as a result, but I only get 40, as the first condition never becomes true. What am I doing wrong?
This line is extraneous:
string = str(f.read())
You don't need to read from your file, and will actually move the file pointer by doing so, causing all sorts of issues. You're already reading with this:
for line in f:
Thus, what you want is:
for line in f:
    if line[-1:].isdigit() == True:
        result = int(line[-1:])
    else:
        result = 40
This is explained in the documentation.
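A quick way to see the effect of the extra f.read() is an in-memory file; io.StringIO stands in for file.txt here and the sample text is made up:

```python
import io

f = io.StringIO("first line\nsecond line\nthird line\n")
iterations = 0
for line in f:
    iterations += 1
    rest = f.read()    # reads everything after the current line
    print(repr(line))  # 'first line\n'
    print(repr(rest))  # 'second line\nthird line\n'

# f.read() moved the position to the end, so the loop ran only once
print(iterations)  # 1
```

With the question's one-line file, f.read() returns an empty string, so the isdigit() check fails and the result is 40.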
You have an f.read() too many. This is all you need:
f = open("file.txt", "r")
result = 0
for line in f:
    if line[-1:].isdigit():
        result = int(line[-1:])
    else:
        result = 40
print(result)
f.close()
Also, if string[-1:].isdigit() == True: can be simplified to if line[-1:].isdigit():
You may also want to use line.strip() to get rid of newlines, or else the check will fail.
f = open("file.txt", "r")
result = 0
for line in f:
    l = line.strip()
    if l[-1:].isdigit():
        result = int(l[-1:])
    else:
        result = 40
print(result)
f.close()
The problem is that the last character in the line is the end-of-line character. Use .strip() to remove it (it will also remove other leading and trailing whitespace).
with open("file.txt", "r") as f:
    for line in f:
        # [-1:] rather than [-1], so a blank line gives '' instead of an IndexError
        lastchar = line.strip()[-1:]
        if lastchar.isdigit():
            result = int(lastchar)
        else:
            result = 40
        print(result)
This prints 2, as you requested, for the one-line file from your question:
81 First line32
It will also work for multiple lines, printing the result for each line.
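The same logic can be wrapped in a small function and checked on sample lines. The function name last_digit and its default of 40 (mirroring the question) are assumptions for illustration; it uses rstrip() since only the end of the line matters here.

```python
def last_digit(line, default=40):
    """Return the last character of the line as an int if it is a digit.

    Trailing whitespace (including the newline) is removed first, and
    [-1:] instead of [-1] avoids an IndexError on blank lines.
    """
    last = line.rstrip()[-1:]
    return int(last) if last.isdigit() else default

print(last_digit("81 First line32\n"))  # 2
print(last_digit("no digit here\n"))    # 40
print(last_digit("\n"))                 # 40
```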
