Count consecutive occurrences of values in a .txt file - python

I have a .txt file that has two words repeating on separate lines.
Here is an example (the actual file is about 80,000 lines long):
ANS
ANS
ANS
AUT
AUT
AUT
AUT
ANS
ANS
ANS
ANS
ANS
I am trying to develop some Python code to count the runs of consecutive identical lines and return the length of each run. So for this example I would like to write [3,4,5] to another .txt file.
word = "100011010"
count = 1
length = ""
for i in range(1, len(word)):
    if word[i-1] == word[i]:
        count += 1
    else:
        length += word[i-1] + " repeats " + str(count) + ", "
        count = 1
length += ("and " + word[i] + " repeats " + str(count))
print(length)
The code above does this for a string. Is there a way to do the same with a list, such as the lines of the file?

You can read the entire file like this:
content = []
with open('/path/to/file.txt', 'r') as file:
    content = file.readlines()
    # Maybe you want to strip the newlines instead:
    # content = [line.strip() for line in file.readlines()]
Now content is a list with all the lines of the file.
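def count_consecutive_lines(lines):
    if not lines:
        return ''
    counter = 1
    output = ''
    for index in range(1, len(lines)):
        if lines[index] != lines[index-1]:
            # a run just ended: report the previous line, not the new one
            output += '{} repeats {} times.\n'.format(lines[index-1].strip(), counter)
            counter = 1
        else:
            counter += 1
    # report the final run, which the loop never reaches
    output += '{} repeats {} times.\n'.format(lines[-1].strip(), counter)
    return output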
And call it like this:
print(count_consecutive_lines(content))
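If you want the list [3, 4, 5] itself rather than a text report, the index-based loop from the question carries over to lists unchanged. Here is a minimal sketch (the name run_lengths is hypothetical) that works for any sequence, whether a string or a list of lines. Strip the lines first so that a missing newline on the last line does not split the final run:

```python
def run_lengths(seq):
    """Return the length of each run of equal, adjacent items in seq.

    Works for any indexable sequence: a string, or a list such as the
    (stripped) lines read from a file.
    """
    if not seq:
        return []
    counts = []
    count = 1
    for i in range(1, len(seq)):
        if seq[i] == seq[i - 1]:
            count += 1            # still inside the current run
        else:
            counts.append(count)  # run ended: record its length
            count = 1
    counts.append(count)          # record the final run
    return counts


print(run_lengths("100011010"))   # [1, 3, 2, 1, 1, 1]
lines = ["ANS"] * 3 + ["AUT"] * 4 + ["ANS"] * 5
print(run_lengths(lines))         # [3, 4, 5]
```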

An answer that doesn't load the whole file into memory:
last = None
count = 0
result = []
with open('sample.txt', 'rb') as f:
    for line in f:
        line = line.strip()
        if line == last:
            count = count + 1
        else:
            if count > 0:
                result.append(count)
            count = 1
            last = line
result.append(count)
print(result)
Result:
[3, 4, 5]
UPDATE
The list contains integers, and str.join only accepts strings, so convert each number first (outFile is a file you have opened for writing):
outFile.write('\n'.join(str(n) for n in result))
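Putting the streaming idea and the file output together, here is a minimal sketch that swaps the manual loop for itertools.groupby, which groups adjacent equal items. The function name write_run_lengths and both paths are hypothetical:

```python
from itertools import groupby


def write_run_lengths(in_path, out_path):
    """Write the length of each run of identical lines to out_path, one per line."""
    with open(in_path) as src:
        # groupby pulls lines from the file lazily and groups adjacent equal
        # keys, so the input is never loaded into memory all at once
        stripped = (line.strip() for line in src)
        counts = [sum(1 for _ in group) for _, group in groupby(stripped)]
    with open(out_path, "w") as dst:
        dst.write("\n".join(str(n) for n in counts))
    return counts
```

For example, write_run_lengths('sample.txt', 'runs.txt') would return [3, 4, 5] for the sample above and write those numbers to runs.txt, one per line.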

If you only need the total number of times each word appears (not the length of each consecutive run), you can read the file into a list and count each unique value:
with open("./sample.txt", 'r') as fl:
    fl_list = [line.strip() for line in fl]
unique_data = set(fl_list)
for unique in unique_data:
    print("%s - count: %s" % (unique, fl_list.count(unique)))
# output (order may vary):
ANS - count: 8
AUT - count: 4
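Because list.count rescans the whole list for every distinct value, that approach gets slow on a large file with many different words. collections.Counter produces the same totals in a single pass. Here is a minimal sketch (the name total_counts is hypothetical). Like the answer above, it counts totals, not consecutive runs:

```python
from collections import Counter


def total_counts(lines):
    """Count how often each stripped line occurs anywhere in lines (totals, not runs)."""
    return Counter(line.strip() for line in lines)


sample = ["ANS\n"] * 3 + ["AUT\n"] * 4 + ["ANS\n"] * 5
for word, n in total_counts(sample).items():
    print("%s - count: %s" % (word, n))
# ANS - count: 8
# AUT - count: 4
```

An open file works as the argument too, since iterating a file yields its lines.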

Open your file and read it to count:
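l = []
last = ''
with open('data.txt', 'r') as f:
    data = f.readlines()
for line in data:
    words = line.split()
    if not words:            # skip blank lines
        continue
    if words[0] == last:
        l[-1] = l[-1] + 1    # same word as the previous line: extend the run
    else:
        l.append(1)          # a different word starts a new run
        last = words[0]
print(l)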

Here is your expected output :)
with open("./sample.txt", 'r') as fl:
    word = [line.strip() for line in fl]
count = 1
length = []
for i in range(1, len(word)):
    if word[i-1] == word[i]:
        count += 1
    else:
        length.append(count)
        count = 1
length.append(count)
print(length)
# output, as you expect:
[3, 4, 5]

Related

Programming a sum function in python?

How do I sum numbers attached to words in a text file in Python, without splitting them into separate digits? (Example: "a23 B55" should give 78.)
That's what I did, but it's not quite right:
def rixum(file_name):
    f = open(file_name, 'r')
    line = f.readline()
    temp = line.split()
    res = []
    for word in temp:
        i = 0
        while i < len(word)-1:  # i is never incremented, so for longer words this never ends
            if word[i].isdigit():
                res.append(int(word[i:]))
    print(sum(res))
    f.close()
    return sum(res)
This worked for me:
import re
string = 'F43 n90 i625'
def summ_numbers(string):
    return sum(int(num) for num in re.findall(r'\d+', string))
print(summ_numbers(string))
Output:
758
You don't really need to build a list - you can simply accumulate the values as you go along (line by line):
def rixum(filename):
    with open(filename) as data:
        for line in data:
            total = 0
            for token in line.split():
                for i, c in enumerate(token):
                    if c.isdigit():
                        total += int(token[i:])
                        break
            print(total)
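The first answer works on a single string, and the second prints one total per line. If you want one total for the whole file, here is a minimal sketch that applies the same \d+ regex line by line (sum_numbers_in_file is a hypothetical name). Unlike the token[i:] loop, the regex also copes with a token like a2b3 (giving 2 + 3) instead of raising ValueError:

```python
import re


def sum_numbers_in_file(path):
    """Add up every run of digits in the file, e.g. "a23 B55" contributes 23 + 55."""
    total = 0
    with open(path) as data:
        for line in data:
            # \d+ matches whole digit runs, so "a23" gives 23 rather than 2 and 3
            total += sum(int(num) for num in re.findall(r"\d+", line))
    return total
```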

I have 3 lines in a `.txt` file, each containing 3 numbers, and I want to add 1 to those numbers depending on the user input

So basically I have a file with 3 lines and each line has 3 numbers
7,2,1
10,0,0
2,8,0
Then, depending on the user input, I want to add 1 to one of the numbers on the line.
if user_input == 1:
    +1 to line1Number1
elif user_input == 2:
    +1 to line1Number2
elif user_input == 3:
    +1 to line1Number3
else:
    print("error")
You can do something like this:
user_input = int(input())
with open('t.txt', 'r') as f:
    lines = f.readlines()
for c, l in enumerate(lines):
    # enumerate counts from 0, so an input of 0 selects the first line
    if c == user_input:
        # note: this adds 1 to every number on the chosen line
        lst = [int(x) + 1 for x in l.split(',')]
        print(lst)
With an input of 2 this prints:
[3, 9, 1]
with open(filename, "r") as txtr:
    data = txtr.readlines()
data = [x.split(",") for x in data]
for i in range(len(data)):
    for j in range(len(data[i])):
        data[i][j] = int(data[i][j])
data now has 3 lists with 3 numbers each.
if user_input == 1:
    data[0] = [x+1 for x in data[0]]
Just do the same for the other inputs.
To save it back to the text file (the numbers must be converted back to strings before joining):
ndata = [",".join(str(n) for n in x) for x in data]
nndata = "\n".join(ndata)
with open(filename, "w") as txtw:
    txtw.write(nndata)
Another way of doing this (explanations in the comments). It reads from in.txt and writes to out.txt.
# ask the user which column to update
update_column = int(input("Column to update 1,2 or 3?"))
# open the input file for reading and the output file for writing
with open("in.txt", "r") as f, open("out.txt", "w") as of:
    # for every line in the text file
    for line in f:
        # make the numbers into an integer list (remove newline, split by comma, convert to int)
        new_line = list(map(int, line.strip().split(",")))
        # add one to the number in the chosen column (columns are 1-based, list indices 0-based)
        new_line[update_column - 1] += 1
        # write the numbers back comma-separated, followed by a newline
        of.write(",".join(map(str, new_line)) + "\n")
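None of these answers changes only the first line as the pseudocode asks. The first two add 1 to every number on the chosen line, and the last updates the chosen column on every line. Here is a minimal sketch of the literal request, assuming the file is small enough to read into memory and should be overwritten in place (increment_first_line is a hypothetical name):

```python
def increment_first_line(path, user_input):
    """Add 1 to number user_input (1, 2 or 3) on the first line of a comma-separated file."""
    with open(path) as f:
        # parse every non-blank line into a list of ints
        rows = [[int(x) for x in line.strip().split(",")] for line in f if line.strip()]
    if user_input not in (1, 2, 3):
        print("error")
        return rows
    rows[0][user_input - 1] += 1  # user input is 1-based, list indices are 0-based
    with open(path, "w") as f:
        f.write("\n".join(",".join(str(n) for n in row) for row in rows) + "\n")
    return rows
```

With the example file, increment_first_line('t.txt', 2) would turn the first line into 7,3,1 and leave the other two lines unchanged.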

Appending specific words to a list from a file in Python

I am writing a program that reads from a file of 50,000 words and needs to get the percentage of words that do not have the letter 'e' in them. I can get the program to print all the words without e's, but I want to append them to a list so that I can get the number of elements in the list. What I have now gives me 0 every time I run it. It does print the total number of lines correctly. Sorry, I am not the best at Python.
f = open("hardwords.txt")
def has_no_e(f):
    words = []
    sum_words = len(words)
    total = sum(1 for s in f)
    print(total)
    print(sum_words)
    letter = 'e'
    for line in f:
        for l in letter:
            if l in line:
                break
        else:
            words.append(line)
has_no_e(f)
You don't need to collect the words, just count them.
Untested:
total = 0
without_e = 0
with open("hardwords.txt") as f:
    for line in f:
        total = total + 1
        if 'e' not in line:
            without_e = without_e + 1
percentage = 100.0 * without_e / total
What about this:
def has_no_e(path):
    with open(path, "r") as f:
        words = [word.strip() for line in f for word in line.strip().split(',')]
    words_without_e = [word for word in words if 'e' not in word]
    print(len(words), words)
    print(len(words_without_e), words_without_e)
has_no_e("hardwords.txt")
Now you just need to calculate the percentage.
This does exactly that:
def has_no_e(path):
    total_words = 0
    words_without_e = 0
    with open(path, "r") as f:
        for line in f:
            words = line.lower().split()
            total_words += len(words)
            words_without_e += sum("e" not in w for w in words)
    return (float(words_without_e) / total_words) * 100
This is a possible way to do it:
with open(r'G:\Tmp\demo.txt', 'r') as f:
    total = 0
    count = 0
    for line in f:
        words = line.split()
        total = total + len(words)
        count = count + len([w for w in words if 'e' not in w])
print('Total words: {0}, without e: {1}'.format(total, count))
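The function in the question prints 0 for two reasons: sum_words is computed before anything is appended, and total = sum(1 for s in f) exhausts the file, so the for line in f loop that follows sees no lines. Here is a minimal sketch that keeps the list, as the question wanted, assuming one word per line and treating an uppercase E as an e:

```python
def has_no_e(path):
    """Return (words without 'e', percentage of words without 'e')."""
    with open(path) as f:
        # read the file only once: a second pass over the same file object
        # would yield no lines, which is why the original loop appended nothing
        words = [line.strip() for line in f if line.strip()]
    no_e = [w for w in words if "e" not in w.lower()]
    percentage = 100.0 * len(no_e) / len(words) if words else 0.0
    return no_e, percentage
```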

Count lines after line with specific character

I have a file which contains this data:
>P136
FCF#0.73
FCF#0.66
FCF#0.86
>P129
FCF#0.72
>P142
>P144
>P134
FCF#0.70
FCF#0.82
I need to count the number of lines that follow each line containing ">", using the ">" line as the label. For this example the output should be:
>P136 3
>P129 1
>P134 2
Any ideas?
Use a dictionary to store the count for each header line, and increment the count every time a line does not start with >:
counts = {}
current = None
with open(filename) as fo:
    for line in fo:
        if line.startswith('>'):
            current = line.strip()
            counts[current] = 0
        else:
            counts[current] += 1
Then simply loop over the dictionary and print the counts, skipping headers with no lines after them:
for entry, count in counts.items():
    if count:
        print('{} {:2d}'.format(entry, count))
You could even just print the number every time you find a new section:
count = 0
current = None
with open(filename) as fo:
    for line in fo:
        if line.startswith('>'):
            if current and count:
                print('{} {:2d}'.format(current, count))
            current = line.strip()
            count = 0
        else:
            count += 1
if current and count:
    print('{} {:2d}'.format(current, count))
But then you cannot easily reuse the counts for other work.
In one line, just to show that we can:
s=""">P136
FCF#0.73
FCF#0.66
FCF#0.86
>P129
FCF#0.72
>P142
>P144
>P134
FCF#0.70
FCF#0.82
"""
First variant:
print([(i.split("\n")[0], len(i.split("\n")[1:]) - 1) for i in s.split(">") if i if len(i.split("\n")[1:]) - 1 > 0])
Using re:
import re
print([(block.split("\n")[0], sum(1 for m in re.finditer("#", block))) for block in s.split(">")])
This is a simple solution that attempts to be minimalistic.
with open(filename) as f:
    def printcc(current, count):
        if current is not None and count > 0:
            print(current.strip(), count)
    current = None
    count = 0
    for line in f:
        if line[0] == '>':
            printcc(current, count)
            current = line
            count = 0
        else:
            count += 1
    printcc(current, count)
In case you actually want all lines that contain a > character, use '>' in line as your condition. If you're targeting Python 2.x, use print current.strip(), count because having the outer parentheses will print a two-tuple.
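If you want the counts back as data rather than printed output, here is a minimal sketch that returns a list of (header, count) pairs in file order and drops empty sections to match the expected output. The function name count_after_headers and the skip_empty flag are hypothetical. It accepts any iterable of lines, so you can pass an open file or s.splitlines():

```python
def count_after_headers(lines, skip_empty=True):
    """Return (header, count) pairs: the number of lines after each '>' line."""
    results = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            results.append([line, 0])  # start a new section
        elif line and results:
            results[-1][1] += 1        # count a data line in the current section
    return [(h, n) for h, n in results if n > 0 or not skip_empty]


data = ">P136\nFCF#0.73\nFCF#0.66\nFCF#0.86\n>P129\nFCF#0.72\n>P142\n>P144\n>P134\nFCF#0.70\nFCF#0.82\n"
for header, n in count_after_headers(data.splitlines()):
    print(header, n)
# >P136 3
# >P129 1
# >P134 2
```

Blank lines and any lines before the first header are ignored.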

How to add specific lines from a file into List in Python?

I have an input file:
3
PPP
TTT
QPQ

TQT
QTT
PQP

QQQ
TXT
PRP
I want to read this file and group these cases into proper boards.
To read the count (number of boards) I have this code:
board = []
count = ''
def readcount():
    fp = open("input.txt")
    for i, line in enumerate(fp):
        if i == 0:
            count = int(line)  # note: this sets a local variable, not the global count
            break
    fp.close()
But I have no idea how to parse blocks like this into a list:
TQT
QTT
PQP
I tried this:
def readboard():
    fp = open('input.txt')
    for c in (1, count):  # To run the loop over the total no. of boards available
        for k in (c+1, c+3):  # To group the boards into board[]
            board[c].append(fp.readlines)
But that's the wrong way. I know the basics of lists, but here I am not able to parse the file.
These boards are on lines 2 to 4, 6 to 8, and so on. How do I get them into lists?
I want to parse the file into the count and the boards so that I can process them further.
Any suggestions?
I don't know if I understand your desired outcome. I think you want a list of lists.
Assuming that you want boards to be [[data,data,data],[data,data,data],[data,data,data]], you need to define how to parse your input file. Specifically:
line 1 is the count number
data is entered per line
boards are separated by white space.
If that is the case, this should parse your file correctly:
board = []
count = 0
currentBoard = 0
with open('input.txt') as fp:
    for i, line in enumerate(fp):
        line = line.rstrip('\n')
        if i == 0:
            count = int(line)       # first line holds the number of boards
            board.append([])
        elif len(line) == 0:
            currentBoard += 1       # a blank line starts the next board
            board.append([])
        else:                       # this line has board data
            board[currentBoard].append(line)
import pprint
pprint.pprint(board)
If my assumptions are wrong, this can be modified to accommodate them.
Personally, I would use a dictionary (or OrderedDict) and get the count from len(board):
from collections import OrderedDict
import pprint
currentBoard = 0
board = OrderedDict()
board[currentBoard] = []
with open('input.txt') as fp:
    lines = fp.read().splitlines()
for line in lines[1:]:
    if len(line) == 0:
        currentBoard += 1
        board[currentBoard] = []
    else:
        board[currentBoard].append(line)
count = len(board)
print(count)
pprint.pprint(board)
If you just want to take specific line numbers (counting from 0) and put them into a list:
line_nums = [3, 4, 5, 1]
with open('input.txt') as fp:
    # lines come out in file order, not in the order of line_nums
    selected = [line for i, line in enumerate(fp) if i in line_nums]
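The answers above depend on blank lines between boards. If those separators may be missing, here is a minimal sketch that uses the count on line 1 instead, assuming every board has the same number of rows (parse_boards is a hypothetical name). Pass it an open file, e.g. with open('input.txt') as fp: count, boards = parse_boards(fp).

```python
def parse_boards(lines):
    """Split an iterable of lines into (count, boards).

    Assumes the first non-blank line is the number of boards and that every
    board has the same number of rows; blank separator lines are ignored.
    """
    rows = [line.strip() for line in lines if line.strip()]
    count = int(rows[0])
    data = rows[1:]
    if count <= 0 or len(data) % count != 0:
        raise ValueError("board rows do not divide evenly into %d boards" % count)
    size = len(data) // count  # rows per board
    boards = [data[i * size:(i + 1) * size] for i in range(count)]
    return count, boards


sample = ["3", "PPP", "TTT", "QPQ", "", "TQT", "QTT", "PQP", "", "QQQ", "TXT", "PRP"]
count, boards = parse_boards(sample)
print(count)   # 3
print(boards)  # [['PPP', 'TTT', 'QPQ'], ['TQT', 'QTT', 'PQP'], ['QQQ', 'TXT', 'PRP']]
```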
