Python to count commas in a text file - python

I am trying to count the commas between entries in a text file so I can use the number of commas to find the number of entries to come up with the average. Unfortunately it comes up with commacount of zero.
file = open("inputs.txt", "r")
line = file.read()
commaCount = 0
for line in file:
    for char in line:
        if char == ',':
            commaCount+=1
commacount2 = (multiply(commaCount,2))
total = sum(int(num) for num in line.strip(',').split(','))
print(commaCount)
print(commacount2)
print("Your average for all inputs is" + str(divide(total,commacount2)))

You have already consumed the file iterator with line = file.read() so you are not iterating over anything. You should forget read and iterate over the file object itself:
with open("inputs.txt", "r") as f:
    count = sum(line.count(",") for line in f)
    # f.seek(0)
    # use the lines again
If you want to move the pointer back to the start so you can iterate again, you can call f.seek(0), but I am not sure what total = sum(int(num) for num in line.strip(',').split(',')) is meant to do.
Once you call .read or .readlines you have moved the pointer to the end of the file, so unless you call f.seek(0) you cannot iterate over the lines again. You are basically doing:
In [8]: iterator = iter((1,2,3))
In [9]: list(iterator) # consume
Out[9]: [1, 2, 3]
In [10]: list(iterator) # empty
Out[10]: []
In [11]: list(iterator).count(1)
Out[11]: 0
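The same exhaustion can be reproduced with a file-like object. This is a minimal sketch that uses io.StringIO with made-up contents as a stand-in for the opened file (an assumption for illustration, so that no file on disk is needed):

```python
import io

# io.StringIO stands in for open("inputs.txt"); the contents are made up.
f = io.StringIO("1,2,3\n4,5,6\n")

data = f.read()  # reads everything and leaves the position at the end
count_after_read = sum(line.count(",") for line in f)  # nothing left to iterate

f.seek(0)  # rewind to the start of the file
count_after_seek = sum(line.count(",") for line in f)

print(count_after_read)
print(count_after_seek)
```

The first count is zero because the loop has nothing to iterate over; after seek(0) the commas are counted as expected.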
If you have a comma-separated file of integers you can use the csv module. The length of each row gives you the number of elements in it, and you can map the strings to ints and sum the row values:
import csv
with open("inputs.txt") as f:
    r = csv.reader(f) # create rows split on commas
    sm = 0
    com_count = 0
    for row in r:
        com_count += len(row) # "1,2,3"
        sm += sum(map(int,row))
To match the comma count it would actually be com_count += len(row) - 1, but if you want the number of elements then counting commas is not the correct approach: "1,2,3".count(",") == 2, but there are three elements.
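Putting the pieces together, the average the question asks for could be computed roughly as follows. This is only a sketch: the function name average_of_entries is hypothetical, the sample data is made up, and it assumes every non-empty field is an integer.

```python
import csv
import io

def average_of_entries(f):
    """Return (entry_count, total, average) for comma-separated integers."""
    count = 0
    total = 0
    for row in csv.reader(f):
        # Skip empty fields, such as those left by a trailing comma.
        values = [int(field) for field in row if field.strip()]
        count += len(values)
        total += sum(values)
    average = total / count if count else 0.0
    return count, total, average

# io.StringIO stands in for open("inputs.txt"); the numbers are made up.
sample = io.StringIO("1,2,3\n4,5,6\n")
print(average_of_entries(sample))
```

Counting the parsed elements rather than the commas avoids the off-by-one described above.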

This should help you get started. It gives you the number of commas in a text file; if you wrap it in a loop you can use it for all the files you have.
with open('inputs.txt', 'r') as f:
    numCommas = f.read().count(',')
print(numCommas)

Related

How to find and sum up all with the matching words on a list?

EmpRecords=[1,'Angelo','Fabregas','South','City',
2,'Fabian','Fabregas','North','City',
3,'Griffin','De Leon','West','City',
4,'John','Doe','East','City',
5,'Jane','Doe','Southville','Town']
Output should be something like:
Enter word to search: Doe
Same words: 2
How do I do this? I should also clarify that EmpRecords is actually just a text File that is converted into a list.
so it's actually:
EmpRecords='''1,Angelo,Fabregas,South,City;
2,Fabian,Fabregas,North,City;
3,Griffin,De Leon,West,City;
4,John,Doe,East,City;
5,Jane,Doe,Southville,Town'''
Maybe this has something to do with finding the matching words?
Assuming you want to search for any word separated by comma and each line is a separate item:
Since your actual records are separated by ";" you need to create a nested list as below:
>>> record_groups = EmpRecords.split(";")
>>> final_groups = [each_group.split(",") for each_group in record_groups]
Later you can search through list items for the given word:
>>> word = "Doe"
>>> counter = 0
>>> for each_entry in final_groups:
...     if word in each_entry:
...         counter += 1
...
>>> print(counter)
APPROACH 2:
If it is already in a file you can directly open line by line and search:
word = "Doe"
counter = 0
with open("input.txt") as fd:
    for line in fd:
        if word in line.strip().split(","):
            counter += 1
print(counter)
If you want to read from the file and count, you can use a loop.
import csv
with open('records.txt') as csvfile:
    linereader = csv.reader(csvfile, delimiter=',')
    count = 0
    target_value = 'Doe'
    for row in linereader:
        if row[2] == target_value:
            count += 1
print("Count: ",count)
You may need to remove the semicolon (;) from the last field if you will be using the data.
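For the string form of the data, where records are separated by ";" and fields by ",", the search can be sketched as below. The function name count_records_with is hypothetical, and the sample text is a cleaned-up copy of the records from the question.

```python
def count_records_with(word, text):
    """Count ';'-separated records that contain `word` as a whole field."""
    count = 0
    for record in text.split(";"):
        # strip() removes the newline left at the start of each record
        fields = [field.strip() for field in record.split(",")]
        if word in fields:
            count += 1
    return count

emp_records = """1,Angelo,Fabregas,South,City;
2,Fabian,Fabregas,North,City;
3,Griffin,De Leon,West,City;
4,John,Doe,East,City;
5,Jane,Doe,Southville,Town"""

print("Same words:", count_records_with("Doe", emp_records))
```

Comparing against whole fields means that searching for "South" does not also match "Southville".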

Is there a way to read a line backwards in Python?

I'm trying to write a program that counts the number of N's at the end of a string.
I have a file containing many lines of unique sequences and I want to measure how often a sequence ends with N, and how long the run of N's is. For example, the file input will look like this:
NTGTGTAATAGATTTTACTTTTGCCTTTAAGCCCAAGGTCCTGGACTTGAAACATCCAAGGGATGGAAAATGCCGTATAACNN
NAAAGTCTACCAATTATACTTAGTGTGAAGAGGTGGGAGTTAAATATGACTTCCATTAATAGTTTCATTGTTTGGAAAACAGN
NTACGTTTAGTAGAGACAGTGTCTTGCTATGTTGCCCAGGCTGGTCTCAAACTCCTGAGCTCTAGCAAGCCTTCCACCTCNNN
NTAATCCAACTAACTAAAAATAAAAAGATTCAAATAGGTACAGAAAACAATGAAGGTGTAGAGGTGAGAAATCAACAGGANNN
Ideally, the code will read through the file, line by line and count how often a line ends with 'N'.
Then, if a line ends with N, it should read each character backwards to see how long the string of N's is. This information will be used to calculate the percentage of lines ending in N, as well as the mean, mode, median and range of N strings.
Here is what I have so far.
filename = 'N_strings_test.txt'
n_strings = 0
n_string_len = []
with open(filename, 'r') as in_f_obj:
    line_count = 0
    for line in in_f_obj:
        line_count += 1
        base_seq = line.rstrip()
        if base_seq[-1] == 'N':
            n_strings += 1
            if base_seq[-2] == 'N':
                n_string_len.append(int(2))
            else:
                n_string_len.append(int(1))
print(line_count)
print(n_strings)
print(n_string_len)
All I'm getting is an index out of range error, but I don't understand why. Also, what I have so far is limited to 2 characters.
I want to try and write this for myself, so I don't want to import any modules.
Thanks.
You will probably get the IndexError because your file has empty lines!
Two sound approaches. First the generic one: iterate the line in reverse using reversed():
line = line.rstrip()
count = 0
for c in reversed(line):
    if c != 'N':
        break
    count += 1
# count will now contain the number of N characters from the end
Another, even easier approach, which works on stripped copies of the string, is to rstrip() all whitespace, get the length, and then rstrip() all Ns. The number of trailing Ns is the difference in lengths:
without_whitespace = line.rstrip()
without_ns = without_whitespace.rstrip('N')
count = len(without_whitespace) - len(without_ns)
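Building on the length-difference idea, the percentage and mean asked for in the question could be gathered without importing anything. This is a sketch only: the function names trailing_n_count and summarize are hypothetical, the sample lines are made up, and blank lines are skipped rather than counted.

```python
def trailing_n_count(line):
    """Number of consecutive 'N' characters at the end of the line."""
    stripped = line.rstrip()
    return len(stripped) - len(stripped.rstrip("N"))

def summarize(lines):
    """Return (percent of non-empty lines ending in N, mean run length, runs)."""
    runs = []
    total = 0
    for line in lines:
        if not line.strip():
            continue  # blank lines caused the IndexError in the question
        total += 1
        n = trailing_n_count(line)
        if n:
            runs.append(n)
    percent = 100.0 * len(runs) / total if total else 0.0
    mean = sum(runs) / len(runs) if runs else 0.0
    return percent, mean, runs

# Made-up sample; in practice `lines` would be the open file object.
sample = ["ACGTNN\n", "ACGT\n", "\n", "NACGN\n", "TTNNN\n"]
print(summarize(sample))
```

The list of run lengths that is returned can then be sorted to obtain the median, mode and range.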
This code:
Reads the file line by line.
Reverses each line and lstrips it. Reversing is not necessary, but it makes things more natural.
Checks the first character of the reversed line (the last character of the original) and, if it is N, counts the line.
Keeps reading that line for as long as the run of N's continues.
n_string_count, n_string_len, line_count = 0, [], 0
with open('file.txt', 'r') as input_file:
    for line in input_file:
        line_count += 1
        line = line[::-1].lstrip()
        if line:
            if line[0] == 'N':
                n_string_count += 1
                consecutive_n = 1
                while consecutive_n < len(line) and line[consecutive_n] == 'N':
                    consecutive_n += 1
                n_string_len.append(consecutive_n)
print(line_count)
print(n_string_count)
print(n_string_len)

I want to write a function which prints a sum

I just started learning Python a few weeks ago and I want to write a function which opens a file, counts and adds up the characters in each line and prints that those equal the total number of characters in the file.
For example, given a file test1.txt:
lineLengths('test1.txt')
The output should be:
15+20+23+24+0=82 (+0 optional)
This is what I have so far:
def lineLengths(filename):
    f=open(filename)
    lines=f.readlines()
    f.close()
    answer=[]
    for aline in lines:
        count=len(aline)
It does what I want it to do, but I don't know how to include all of the numbers added together when the function prints.
If you only want to print the sum of the length of each line, you can do it like so:
def lineLengths(filename):
    with open(filename) as f:
        answer = []
        for aline in f:
            answer.append(len(aline))
    print("%s = %s" % ("+".join(str(c) for c in answer), sum(answer)))
If you however also need to track lengths of all the individual lines, you can append the length for each line in your answer list by using the append method and then print the sum by using sum(answer)
Try this :
f=open(filename)
mylist = f.read().splitlines()
sum([len(i) for i in mylist])
Simple as this:
sum(map(len, open(filename)))
open(filename) returns an iterator that passes through each line, each of which is run through the len function, and the results are summed.
Once you read lines from file you can count sum using:
sum([len(aline) for aline in lines])
Separate your problem into functions: one responsible for reading the lines, one for formatting the length of each line, and one for returning the formatted sum.
def read_file(file):
    with open(file) as f:
        lines = f.readlines()
    return lines
def format_line_sum(lines):
    lines_in_str = []
    for line in lines:
        lines_in_str.append(str(len(line)))
    return "+".join(lines_in_str)
def lines_length(file):
    lines = read_file(file)
    total_sum = 0
    for line in lines:
        total_sum += len(line)
    return format_line_sum(lines) + "=" + str(total_sum)
And to use:
print(lines_length('file1.txt'))
Assuming your output is literal, something like this should work.
You can use python sum() function when you figure out how to add numbers to the list
def lineLengths(filename):
    with open(filename) as f:
        line_lengths = [len(l.rstrip()) for l in f]
    summ = '+'.join(map(str, line_lengths)) # can only join strings
    return sum(line_lengths), summ
total_chars, summ = lineLengths(filename)
print("{} = {}".format(summ, total_chars))
This should have the output you want : x+y+z=a
def lineLengths(filename):
    count=[]
    with open(filename) as f: #this is an easier way to open/close a file
        for line in f:
            count.append(len(line))
    print('+'.join(str(x) for x in count) + "=" + str(sum(count)))
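The answers above differ in one detail: len(line) on a line read from a file includes the trailing newline, while len(line.rstrip()) does not. Here is a minimal sketch of that difference, using io.StringIO with made-up contents as a stand-in for the file; the function name format_line_lengths and its keep_newline parameter are hypothetical.

```python
import io

def format_line_lengths(lines, keep_newline=False):
    """Return a string of the form 'a+b+c=total' for an iterable of lines."""
    if keep_newline:
        lengths = [len(line) for line in lines]
    else:
        lengths = [len(line.rstrip("\n")) for line in lines]
    return "{}={}".format("+".join(map(str, lengths)), sum(lengths))

text = "hello\nworld!\n\nabc\n"
print(format_line_lengths(io.StringIO(text)))                     # newline excluded
print(format_line_lengths(io.StringIO(text), keep_newline=True))  # newline included
```

Which variant matches the expected 15+20+23+24+0=82 depends on whether the newline characters in test1.txt are meant to be counted.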

How can I print lines of a file that are specified by a list of numbers in Python?

I want to open a dictionary file that has one word on each line and pull out specific lines, which are specified by a list of line numbers.
At the end I need to print a complete sentence on one line, with a space between the words:
N = ['19','85','45','14']
file = open("DICTIONARY", "r")
my_sentence = #?????????
print my_sentence
If your DICTIONARY is not too big (i.e. it can fit in memory):
N = [19,85,45,14]
with open("DICTIONARY", "r") as f:
    words = f.readlines()
my_sentence = " ".join([words[i].strip() for i in N])
EDIT: A small clarification, the original post didn't use space to join the words, I've changed the code to include it. You can also use ",".join(...) if you need to separate the words by a comma, or any other separator you might need. Also, keep in mind that this code uses zero-based line index so the first line of your DICTIONARY would be 0, the second would be 1, etc.
UPDATE:: If your dictionary is too big for your memory, or you just want to consume as little memory as possible (if that's the case, why would you go for Python in the first place? ;)) you can only 'extract' the words you're interested in:
N = [19, 85, 45, 14]
words = {}
word_indexes = set(N)
counter = 0
with open("DICTIONARY", "r") as f:
    for line in f:
        if counter in word_indexes:
            words[counter] = line.strip()
        counter += 1
my_sentence = " ".join([words[i] for i in N])
You can use linecache.getline to get the specific line numbers you want:
import linecache
sentence = []
for line_number in N:
    word = linecache.getline('DICTIONARY',line_number)
    sentence.append(word.strip('\n'))
sentence = " ".join(sentence)
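One thing to watch when mixing these approaches is that indexing the result of readlines() is zero-based, while linecache.getline counts lines from 1 and returns an empty string for a line number that does not exist. A small sketch of the difference, using a temporary file with made-up words in place of DICTIONARY:

```python
import linecache
import os
import tempfile

# Hypothetical word list written to a temporary file for the demonstration.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("alpha\nbeta\ngamma\ndelta\n")
    path = tmp.name

with open(path) as f:
    words = f.readlines()

zero_based = words[1].strip()                    # second line via list index 1
one_based = linecache.getline(path, 2).strip()   # second line via line number 2
out_of_range = linecache.getline(path, 99)       # '' rather than an exception

linecache.clearcache()
os.remove(path)

print(zero_based, one_based, repr(out_of_range))
```

Both lookups return the same word, so the numbers in N need to be shifted by one when switching from one approach to the other.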
Here's a simple one with a more basic approach:
n = ['2','4','7','11']
file = open("DICTIONARY")
counter = 1 # 1 if you're gonna count lines in DICTIONARY
            # from 1, else 0 is used
output = ""
for line in file:
    line = line.rstrip() # rstrip() method to delete \n character,
                         # if not used, print ends with every
                         # word from a new line
    if str(counter) in n:
        output += line + " "
    counter += 1
print(output[:-1]) # slicing is used for a white space deletion
                   # after last word in string (optional)

count suffixes appearing in the word file

I have a Python program which reads through a wordlist file and uses the endswith() method to check for word endings that match the suffixes given in another file.
The suffixes to check for are saved in the list suffixList[].
The counts are kept in suffixCount[].
The following is my code:
fd = open(filename, 'r')
print 'Suffixes: '
x = len(suffixList)
for line in fd:
    for wordp in range(0,x):
        if word.endswith(suffixList[wordp]):
            suffixCount[wordp] = suffixCount[wordp]+1
for output in range(0,x):
    print "%-6s %10i"%(prefixList[output], prefixCount[output])
fd.close()
The output is this :
Suffixes:
able 0
ible 0
ation 0
the program is unable to reach this loop :
if word.endswith(suffixList[wordp]):
You need to strip the newline:
word = ln.rstrip().lower()
The words are coming from a file so each line ends with a newline character. You are then trying to use endswith which always fails as none of your suffixes end with a newline.
I would also change the function to return the values you want:
def store_roots(start, end):
    with open("rootsPrefixesSuffixes.txt") as fs:
        lst = [line.split()[0] for line in map(str.strip, fs)
               if '#' not in line and line]
        return lst, dict.fromkeys(lst[start:end], 0)
lst, sfx_dict = store_roots(22, 30) # List, SuffixList
Then slice from the end and see if the substring is in the dict:
with open('longWordList.txt') as fd:
    print('Suffixes: ')
    # lengths of the longest and shortest suffixes
    mx, mn = len(max(sfx_dict, key=len)), len(min(sfx_dict, key=len))
    for ln in map(str.rstrip, fd):
        # try each suffix length once, from longest to shortest
        for i in range(min(mx, len(ln)), mn - 1, -1):
            suf = ln[-i:]
            if suf in sfx_dict:
                sfx_dict[suf] += 1
for k, v in sfx_dict.items():
    print("Suffix = {} Count = {}".format(k, v))
Slicing the end of the string incrementally should be faster than checking every suffix, especially if you have numerous suffixes of the same length. At most it does mx - mn + 1 dict lookups per word, so if you had 20 four-character suffixes you would only need to check the dict once: only one substring of length n can match at a time, so a single slice and lookup covers every suffix of length n.
You could use a Counter to count the occurrences of suffix:
from collections import Counter
with open("rootsPrefixesSuffixes.txt") as fp:
    List = [line.strip() for line in fp if line and '#' not in line]
suffixes = List[22:30] # ?
with open('longWordList.txt') as fp:
    c = Counter(s for word in fp for s in suffixes if word.rstrip().lower().endswith(s))
print(c)
Note: add .split()[0] if there are more than one words per line you want to ignore, otherwise this is unnecessary.
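To see the effect of stripping the newline without needing the two files, the counting step can be tried on an in-memory list. This is a sketch only: the function name count_suffixes is hypothetical, the words are made up, and the suffixes are the three shown in the question's output.

```python
from collections import Counter

def count_suffixes(words, suffixes):
    """Count how many words end with each suffix, ignoring case and newlines."""
    counts = Counter({s: 0 for s in suffixes})  # keep suffixes with zero matches
    for raw in words:
        word = raw.strip().lower()  # drop the trailing newline before comparing
        for s in suffixes:
            if word.endswith(s):
                counts[s] += 1
    return counts

# Made-up lines, each ending in a newline as they would when read from a file.
words = ["readable\n", "Visible\n", "nation\n", "creation\n", "cat\n"]
suffix_list = ["able", "ible", "ation"]
counts = count_suffixes(words, suffix_list)
for suffix in suffix_list:
    print("%-6s %10i" % (suffix, counts[suffix]))
```

Without the strip() call every endswith test would fail, which is the all-zero output shown in the question.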
