Count how many full stops a text file contains in Python

I would like to write some code that opens and reads a text file and tells me how many "." (full stops) it contains.
I have something like this, but I don't know what to do next:
f = open( "mustang.txt", "r" )
a = []
for line in f:

with open('mustang.txt') as f:
    s = sum(line.count(".") for line in f)

Assuming there is absolutely no danger of your file being so large it will cause your computer to run out of memory (for instance, in a production environment where users can select arbitrary files, you may not wish to use this method):
f = open("mustang.txt", "r")
count = f.read().count('.')
f.close()
print count
More properly:
with open("mustang.txt", "r") as f:
count = f.read().count('.')
print count

I'd do it like so:
with open('mustang.txt', 'r') as handle:
    count = handle.read().count('.')
If your file isn't too big, just load it into memory as a string and count the dots.

with open('mustang.txt') as f:
    fullstops = 0
    for line in f:
        fullstops += line.count('.')

This will work:
with open('mustangused.txt') as inf:
    count = 0
    for line in inf:
        count += line.count('.')
print 'found %d periods in file.' % count

You can also do it with a regular expression:
import re
with open('filename.txt', 'r') as f:
    c = re.findall(r'\.+', f.read())
    if c:
        print len(c)
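Note that the pattern \.+ matches runs of consecutive dots, so len(c) counts runs rather than individual full stops. If every dot should be counted, a pattern without the + quantifier works; a minimal sketch, reusing the same hypothetical filename:
import re

with open('filename.txt') as f:
    text = f.read()

# r'\.' matches each full stop on its own, so len() gives the total count
print(len(re.findall(r'\.', text)))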

Related

Removing duplicates from a text file using Python

I have this text file and let's say it contains 10 lines.
Bye
Hi
2
3
4
5
Hi
Bye
7
Hi
Every time "Hi" or "Bye" appears I want it removed, except for the first time it appears.
My current code is (yes, filename is actually pointing to a file, I just didn't include it here):
text_file = open(filename)
for i, line in enumerate(text_file):
    if i == 0:
        var_Line1 = line
    if i == 1:
        var_Line2 = line
    if i > 1:
        if line == var_Line2:
            del line
text_file.close()
It does detect the duplicates, but it takes a very long time given the number of lines, and I'm not sure how to delete them and save the result as well.
You could use dict.fromkeys to remove duplicates and preserve order efficiently:
with open(filename, "r") as f:
lines = dict.fromkeys(f.readlines())
with open(filename, "w") as f:
f.writelines(lines)
Idea from Raymond Hettinger
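For reference, a minimal sketch of what dict.fromkeys does to the duplicated lines (assuming Python 3.7+, where dicts preserve insertion order):
lines = ["Bye\n", "Hi\n", "2\n", "Hi\n", "Bye\n"]

# dict.fromkeys keeps only the first occurrence of each key, in order
deduped = dict.fromkeys(lines)
print(list(deduped))  # ['Bye\n', 'Hi\n', '2\n']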
Using a set & some basic filtering logic:
with open('test.txt') as f:
    seen = set()  # keep track of the lines already seen
    deduped = []
    for line in f:
        line = line.rstrip()
        if line not in seen:  # if not seen already, keep the line for the result
            deduped.append(line)
            seen.add(line)

# re-write the file with the de-duplicated lines
with open('test.txt', 'w') as f:
    f.writelines([l + '\n' for l in deduped])

Python program to number rows

I have a file with data like this:
>1_DL_2021.1123
>2_DL_2021.1206
>3_DL_2021.1202
>3_DL_2021.1214
>4_DL_2021.1214
>4_DL_2021.1214
>6_DL_2021.1214
>7_DL_2021.1214
>8_DL_2021.1214
As you can see, the data is not numbered properly and needs to be renumbered.
What I'm aiming for is this:
>1_DL_2021.1123
>2_DL_2021.1206
>3_DL_2021.1202
>4_DL_2021.1214
>5_DL_2021.1214
>6_DL_2021.1214
>7_DL_2021.1214
>8_DL_2021.1214
>9_DL_2021.1214
The file has a lot of other content between these lines starting with the > sign; I want only the > lines affected.
Could someone please help me out with this?
There are 563 such lines, so doing it manually is out of the question.
So, assuming the input data file is "input.txt", you can achieve what you want with this:
import re

with open("input.txt", "r") as f:
    a = f.readlines()

regex = re.compile(r"^>\d+_DL_2021\.\d+\n$")
counter = 1
for i, line in enumerate(a):
    if regex.match(line):
        tokens = line.split("_")
        tokens[0] = f">{counter}"
        a[i] = "_".join(tokens)
        counter += 1

with open("input.txt", "w") as f:
    f.writelines(a)
What it does is search for lines matching the regex ^>\d+_DL_2021\.\d+\n$, split each one by _, replace the first (0th) element with the running counter, increment the counter by 1, and continue; at the end it writes the updated strings back to "input.txt".
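For reference, a minimal sketch of what happens to a single line under that approach (the sample line is taken from the data above, and the counter value is arbitrary):
import re

regex = re.compile(r"^>\d+_DL_2021\.\d+\n$")
line = ">3_DL_2021.1214\n"
counter = 4

if regex.match(line):
    tokens = line.split("_")         # ['>3', 'DL', '2021.1214\n']
    tokens[0] = f">{counter}"        # replace the old number with the running counter
    print("_".join(tokens), end="")  # >4_DL_2021.1214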
sudden_appearance already provided a good answer.
In case you don't like regex too much, you can use this code instead:
new_lines = []
with open('test_file.txt', 'r') as f:
    c = 1
    for line in f:
        if line[0] == '>':
            after_dash = line.split('_', 1)[1]
            new_line = '>' + str(c) + '_' + after_dash
            c += 1
            new_lines.append(new_line)
        else:
            new_lines.append(line)

with open('test_file.txt', 'w') as f:
    f.writelines(new_lines)
You can also have a look at this split tutorial for more information about how to use split.
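For reference, a minimal sketch of how split() with a maxsplit of 1 behaves on one of the lines above:
line = ">3_DL_2021.1214"

# maxsplit=1 splits only at the first underscore and keeps the rest intact
print(line.split("_", 1))  # ['>3', 'DL_2021.1214']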

Deleting n lines after a specific line of a file in Python

I am trying to remove a specific number of lines from a file. These lines always occur after a specific comment line. Anyways, talk is cheap, here is an example of what I have.
FILE: --
randomstuff
randomstuff2
randomstuff3
# my comment
extrastuff
randomstuff2
extrastuff2
#some other comment
randomstuff4
So, I am trying to remove the section after # my comment. Perhaps there is some way to delete a line in r+ mode?
Here is what I have so far:
with open(file_name, 'a+') as f:
    for line in f:
        if line == my_comment_text:
            f.seek(len(my_comment_text)*-1, 1)  # move cursor back to beginning of line
            counter = 4
        if counter > 0:
            del(line)  # is there a way to do this?
Not exactly sure how to do this. How do I remove a specific line? I have looked at this possible dup and can't quite figure out how to do it that way either. The answer recommends you read the file and then re-write it. The problem with that is they are checking for a specific line when they write. I can't do that exactly, plus I don't like the idea of storing the entire file's contents in memory; that would eat up a lot of memory with a large file (since every line has to be stored, rather than one at a time).
Any ideas?
You can use the fileinput module for this and open the file in inplace=True mode to allow in-place modification:
import fileinput

counter = 0
for line in fileinput.input('inp.txt', inplace=True):
    if not counter:
        if line.startswith('# my comment'):
            counter = 4
        else:
            print line,
    else:
        counter -= 1
Edit per your comment "Or until a blank line is found":
import fileinput

ignore = False
for line in fileinput.input('inp.txt', inplace=True):
    if not ignore:
        if line.startswith('# my comment'):
            ignore = True
        else:
            print line,
    if ignore and line.isspace():
        ignore = False
You can make a small modification to your code and stream the content from one file to the other very easily.
with open(file_name, 'r') as f:
    with open(second_file_name, 'w') as w:
        counter = 0
        for line in f:
            if line == my_comment_text:
                counter = 3
            elif counter > 0:
                counter -= 1
            else:
                w.write(line)
I like the answer from @Ashwini. I was also working on a solution, and something like this should work if you are OK with writing a new file with the filtered lines:
def rewriteByRemovingSomeLines(inputFile, outputFile):
    # First pass: record the indices of the lines to drop
    unDesiredLines = []
    count = 0
    skipping = False
    fhIn = open(inputFile, 'r')
    line = fhIn.readline()
    while line:
        if line.startswith('#I'):
            unDesiredLines.append(count)
            skipping = True
            while skipping:
                line = fhIn.readline()
                count = count + 1
                if line == '\n' or line.startswith('#'):
                    skipping = False
                else:
                    unDesiredLines.append(count)
        count = count + 1
        line = fhIn.readline()
    fhIn.close()

    # Second pass: write the desired lines to a new file
    fhIn = open(inputFile, 'r')
    count = 0
    fhOut = open(outputFile, 'w')
    for line in fhIn:
        if not (count in unDesiredLines):
            fhOut.write(line)
        count = count + 1
    fhIn.close()
    fhOut.close()
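A call to the function might look like this (the file names are hypothetical):
# Hypothetical file names; adjust to the actual input and output paths
rewriteByRemovingSomeLines('inp.txt', 'inp_filtered.txt')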

Taking numbers from a file and finding the average when the number of lines changes?

I am trying to make a program which grabs a list of numbers from a file (which could change in length and size), and then prints out the total of all the numbers and the average. I had no problems doing this when I had a set number of lines to read, but I'm confused about the 'proper' way to do it when the line count changes every run.
This is my work-in-progress code. I read around a bit and found the correct (?) way of looping through the file to find the length, but I'm not sure how to implement it, since it currently throws some type of IO error. Thanks for the help!
def main():
    filename = input("Enter file name (name.txt):")
    try:
        file = open(filename, "r")
    except IOError:
        print("Error opening file!")
    totalLines = totalLineGet(filename)
    results = []
    for x in range(totalLines):
        results.append(getLineNumber(x+1, file))
    print("Total = ", numTotal)
    print("Average = ", numAvg)

def totalLineGet(_filename):
    count = 0
    _file = open(_filename, "r")
    for x in open(_file):
        count += 1
    return count

def getLineNumber(linetoget, _file):
    try:
        intNumber = int(number = _file.readline())
    except ValueError:
        print("Error in file data!")
    return intNumber

main()
I'm not sure what you want to do... but you should be able to get the answer in one pass.
You can use enumerate() to number an iterable object, in this case a file, if you need to know the item/line number count.
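For reference, a minimal sketch of enumerate() over a file (the filename is a placeholder):
with open("numbers.txt") as f:
    for line_number, line in enumerate(f, start=1):
        # line_number counts the lines as they are read
        print("%d: %s" % (line_number, line.rstrip()))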
Assuming a single int() per line:
with open(filename, "r") as in_f:
numbers = []
for line in in_f:
line = line.strip() # remove whitespace
if line: # make sure there is something there
number_on_line = int(line)
numbers.append(number_on_line)
sum_of_numbers = sum(numbers)
avg_of_numbers = sum(numbers)/len(numbers)
If this is CSV data, you should look into using the csv module; it will split the line into rows/columns for you.
import csv

filename = "somefile"
with open(filename, "rb") as in_f:  # <-- notice "rb" is used
    reader = csv.reader(in_f)
    for row in reader:
        for col in row:
            # do stuff
            ...
A simple solution, doing what you want...
filename = 'tmp.txt'
f = open(filename)
s, totnum = 0, 0
for line_number, line in enumerate(f):
    nums = map(int, line.split())
    s += sum(nums)
    totnum += len(nums)
print "numbers:", totnum, "average:", 1.0*s/totnum
This assumes your file only has numbers on each line and not other characters; otherwise you'll get a ValueError.
list_of_numbers = []
with open('somefile.txt') as f:
    for line in f:
        if line.strip():  # this skips blank lines
            list_of_numbers.append(int(line.strip()))

print 'Total ', len(list_of_numbers)
print 'Average ', 1.0*sum(list_of_numbers)/len(list_of_numbers)
There are some good answers regarding how to do what you want. As for the IO error, the input() built-in attempts to evaluate the user's input which is both dangerous and not what you want.
Try using the raw_input() built-in. That returns the user's input as a string. For fun, try running your script and giving it __name__ as the filename and see what happens.
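A minimal sketch of the difference, assuming Python 2 (where raw_input() exists):
# Python 2: input() evaluates whatever the user types; raw_input() does not
filename = raw_input("Enter file name (name.txt): ")  # always returned as a plain string
f = open(filename, "r")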

Counting lines, words, and characters within a text file using Python

I'm having a bit of a rough time laying out how I would count certain elements within a text file using Python. I'm a few months into Python and I'm familiar with the following functions:
raw_input
open
split
len
print
rsplit()
Here's my code so far:
fname = "feed.txt"
fname = open('feed.txt', 'r')
num_lines = 0
num_words = 0
num_chars = 0
for line in feed:
    lines = line.split('\n')
At this point I'm not sure what to do next. I feel the most logical way to approach it would be to first count the lines, count the words within each line, and then count the number of characters within each word. But one of the issues I ran into was trying to perform all of the necessary functions at once, without having to re-open the file to perform each function separately.
Try this:
fname = "feed.txt"
num_lines = 0
num_words = 0
num_chars = 0
with open(fname, 'r') as f:
for line in f:
words = line.split()
num_lines += 1
num_words += len(words)
num_chars += len(line)
Back to your code:
fname = "feed.txt"
fname = open('feed.txt', 'r')
What's the point of this? fname is first a string and then a file object. You don't really use the string defined in the first line, and you should use one variable for one thing only: either a string or a file object.
for line in feed:
    lines = line.split('\n')
line is one line from the file. It does not make sense to split('\n') it.
Functions that might be helpful:
open("file").read() which reads the contents of the whole file at once
'string'.splitlines() which splits a string into a list of lines (without the line-break characters)
By using len() and those functions you could accomplish what you're doing.
fname = "feed.txt"
feed = open(fname, 'r')
num_lines = len(feed.splitlines())
num_words = 0
num_chars = 0
for line in lines:
num_words += len(line.split())
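For reference, a minimal sketch of what splitlines() returns for a small sample string:
text = "Hi\nBye\n\n2\n"

# splitlines() drops the newline characters; an empty line becomes ''
print(text.splitlines())  # ['Hi', 'Bye', '', '2']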
file__IO = input('\nEnter file name here to analyze, with path:: ')
with open(file__IO, 'r') as f:
    data = f.read()
    line = data.splitlines()
    words = data.split()
    spaces = data.split(" ")
    charc = (len(data) - len(spaces))
    print('\n Line number ::', len(line), '\n Words number ::', len(words), '\n Spaces ::', len(spaces), '\n Characters ::', charc)
I tried this code & it works as expected.
One way I like is this one, though it may only be good for small files:
import re

with open(fileName, 'r') as content_file:
    content = content_file.read()
    lineCount = len(re.split("\n", content))
    words = re.split(r"\W+", content.lower())
To count words, there are two ways. If you don't care about repetition, you can just do
words_count = len(words)
If you want the count of each word, you can do
import collections

words_count = collections.Counter(words)  # count the occurrence of each word
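A minimal sketch of what the Counter gives you (the sample words are made up):
import collections

words = ["hi", "bye", "hi", "hi"]
words_count = collections.Counter(words)

print(words_count["hi"])           # 3
print(words_count.most_common(1))  # [('hi', 3)]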
