Difficulty Iterating through Dictionary to Search .txt for Values in Dictionary - python

The goal of my code is to convert an Excel spreadsheet into a dictionary, use that dictionary to search a .txt file for each string, and print the number of times each string occurs in the text. What I'm having trouble with is iterating through the dictionary and getting counts for all of its values.
I've tried to enumerate and iterate through the values using for loops, but I still end up getting counts for "Carla" only instead of getting counts for all Dictionary items.
Dict = {}
for row in range(1, dictionary.max_row+1):
    for col in range(1, 2):
        cell_value = dictionary.cell(row=row, column=col).value
        Dict[cell_value] = dictionary.cell(row=row, column=1).value
def searchtxt():
    count = 0
    with open('26440384.txt', 'r') as f:
        for key in Dict.values():
            print(key)
            for line in f:
                count += line.count(str(key))
                print(count)
                count = 0
searchtxt()
RETURNS:
Carla
6
God
radiation
I get the code to print out all items of the dictionary, but it only counts the number of times "Carla" is present in the text. I want the code to return this:
Carla
6
God
4
radiation
3
s/p Klaas' Edits:
def searchtxt():
    count = 0
    with open('26440384.txt', 'r') as f:
        for key in Dict.values():
            print(key)
            lineList = [line.rstrip('\n') for line in open('26440384.txt', 'r')]
            for key in lineList:
                count += lineList.count(str(key))
                print(count)
                count = 0
searchtxt()
RETURNS:
Carla
1
God
1
radiation
1
SOLUTION:
def searchtxt():
    count = 0
    with open('26440384.txt', 'r') as f:
        for key in Dict.values():
            print(key)
            for line in f:
                count += line.count(str(key))
                print(count)
                count = 0
            f.seek(0)
searchtxt()

The problem is that you read the file once, which leaves the file pointer at the end of the file. The next time you reach the section
for line in f:
    count += line.count(str(key))
    print(count)
    count = 0
there are no more lines to read, because you are already at the end.
If the file isn't too big (or you're not worried about memory), I would read the file into a list first and then loop through that list:
lineList = [line.rstrip('\n') for line in open(fileName)]
So rather than for line in f, you would write for line in lineList: and so on.
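As a minimal sketch of the read-once idea, the lines can be held in a list and every search term counted against that list, so the file pointer never matters. The function name count_terms and the sample lines are hypothetical stand-ins for the real file and dictionary values.

```python
def count_terms(lines, terms):
    """Return {term: total occurrences of term across all lines}."""
    counts = {}
    for term in terms:
        needle = str(term)  # spreadsheet cells may not be strings
        counts[term] = sum(line.count(needle) for line in lines)
    return counts

# Hypothetical stand-in for the lines read from the .txt file
sample_lines = ["Carla met Carla's friend.\n", "God and radiation.\n", "Carla again.\n"]
totals = count_terms(sample_lines, ["Carla", "God", "radiation"])
for term, total in totals.items():
    print(term)
    print(total)
```

With a real file, the list could be built once with something like lines = f.readlines() inside the with block and then passed in together with Dict.values().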

Related

How to find all instances of list values(ex: [1,2,3]) in a file at a specific index

I want to find out a list of elements in a file at a specific index.
For ex, below are the contents of the file "temp.txt"
line_0 1
line_1 2
line_2 3
line_3 4
line_4 1
line_5 1
line_6 2
line_7 1
line_8 2
line_9 3
line_10 4
Now, I need to find out the list of values [1,2,3] occurring in sequence at column 2 of each line in above file.
Output should look like below:
line_2 3
line_9 3
I have tried the logic below, but it is somehow not working:
inf = open("temp.txt", "rt")
count = 0
pos = 0
ListSeq = ["1","2","3"]
for line_no, line in enumerate(inf):
    arr = line.split()
    if len(arr) > 1:
        if count == 1 :
            pos = line_no
        if ListSeq[count] == arr[1] :
            count += 1
        elif count > 0 :
            inf.seek(pos)
            line_no = pos
            count = 0
        else :
            count = 0
        if count >= 3 :
            print(line)
            count = 0
Can somebody help me find the issue with the above code? A different approach that gives the correct output is also fine.
Your code is flawed. Most prominent bug: trying to seek in a text file using line number is never going to work: you have to use byte offset for that. Even if you did that, it would be wrong because you're iterating on the lines, so you shouldn't attempt to change file pointer while doing that.
My approach:
The idea is to "transpose" your file to work with vertical vectors, find the sequence in the 2nd vertical vector, and use the found index to extract data on the first vertical vector.
Split each line to get text and number, then zip the results to get two vectors: one of text and one of numbers.
At this point, one list contains ["line_0","line_1",...] and the other one contains ["1","2","3","4",...]
Find the indexes of the sequence in the number list, and print the couple txt/number when found.
code:
with open("temp.txt") as f:
    sequence = ('1','2','3')
    txt,nums = list(zip(*(l.split()[:2] for l in f))) # [:2] in case there are more columns
for i in range(len(nums)-len(sequence)+1):
    if nums[i:i+len(sequence)]==sequence:
        print("{} {}".format(txt[i+2],nums[i+2])) # i+2 is the last line of the 3-item match
result:
line_2 3
line_9 3
The last for loop can be replaced by a list comprehension to generate the tuples:
result = [(txt[i+2],nums[i+2]) for i in range(len(nums)-len(sequence)+1) if nums[i:i+len(sequence)]==sequence]
result:
[('line_2', '3'), ('line_9', '3')]
Generalizing for any sequence and any column.
sequence = ['1','2','3']
col = 1
with open(filename, 'r') as infile:
    idx = 0
    for _i, line in enumerate(infile):
        if line.strip().split()[col] == sequence[idx]:
            if idx == len(sequence)-1:
                print(line)
                idx = 0
            else:
                idx += 1
        else:
            idx = 0
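One caveat with the streaming loop above: when a partial match breaks, the current line is not re-checked as a possible start, so an input such as 1, 1, 2, 3 would be missed. A sliding window over the most recent column values avoids that. The sketch below is one way to do it, not the answerer's method; the function name find_sequence_ends is hypothetical, and unlike the loop above it also reports overlapping matches.

```python
from collections import deque

def find_sequence_ends(lines, sequence, col=1):
    """Yield each line whose column value completes `sequence`."""
    window = deque(maxlen=len(sequence))  # most recent column values
    target = list(sequence)
    for line in lines:
        fields = line.split()
        if len(fields) <= col:
            window.clear()  # a malformed line breaks any run in progress
            continue
        window.append(fields[col])
        if list(window) == target:
            yield line.rstrip("\n")

# Same values as temp.txt in the question: 1 2 3 4 1 1 2 1 2 3 4
sample = ["line_%d %s\n" % (i, v) for i, v in enumerate("12341121234")]
matches = list(find_sequence_ends(sample, ["1", "2", "3"]))
print(matches)
```

Because the function accepts any iterable of lines, an open file object can be passed directly without loading the whole file into memory.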

Count consecutive occurrences of values in a .txt file

I have a .txt file that has two words repeating in separate lines.
Here is an example. (the actual one is about 80,000 lines long)
ANS
ANS
ANS
AUT
AUT
AUT
AUT
ANS
ANS
ANS
ANS
ANS
I am trying to develop some Python code to count the consecutive lines and return the number of times they repeat. So for this example I would like to return [3,4,5] to another .txt file
word="100011010"
count=1
length=""
for i in range(1, len(word)):
    if word[i-1] == word[i]:
        count += 1
    else:
        length += word[i-1]+" repeats "+str(count)+", "
        count=1
length += ("and "+word[i]+" repeats "+str(count))
print (length)
The concept is similar to the above code for a string. Is there a way to do this with a list?
You can read the entire file like this:
content = []
with open('/path/to/file.txt', 'r') as file:
    content = file.readlines()
    # Or, to strip the lines, use this instead:
    # content = [line.strip() for line in file.readlines()]
Here you have a list with all the lines of the file.
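def count_consecutive_lines(lines):
    if not lines:
        return ''
    counter = 1
    output = ''
    for index in range(1, len(lines)):
        if lines[index] != lines[index-1]:
            # A run just ended: report the previous value and its length
            output += '{} repeats {} times.\n'.format(lines[index-1], counter)
            counter = 1
        else:
            counter += 1
    # Report the final run, which the loop never closes
    output += '{} repeats {} times.\n'.format(lines[-1], counter)
    return output
And call it like this:
print(count_consecutive_lines(content))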
An answer that doesn't load the whole file into memory:
last = None
count = 0
result = []
with open('sample.txt', 'r') as f:
    for line in f:
        line = line.strip()
        if line == last:
            count = count + 1
        else:
            if count > 0:
                result.append(count)
            count = 1
            last = line
result.append(count)
print(result)
Result:
[3, 4, 5]
UPDATE
The list contains integers, and you can only join strings, so you will have to convert them:
outFile.write('\n'.join(str(n) for n in result))
You can try converting the file data into a list and following the approach below. Note that it counts the total occurrences of each value, not the consecutive runs:
with open("./sample.txt", 'r') as fl:
    fl_list = [line.strip() for line in fl]
unique_data = set(fl_list)
for unique in unique_data:
    print("%s - count: %s" % (unique, fl_list.count(unique)))
#output:
ANS - count: 8
AUT - count: 4
Open your file and read it to count:
l=[]
last=''
with open('data.txt', 'r') as f:
    data = f.readlines()
    for line in data:
        words = line.split()
        if not words:
            continue  # skip blank lines
        if words[0]==last:
            l[-1]=l[-1]+1
        else:
            # a new run starts: record it and remember its word
            l.append(1)
            last=words[0]
Here is your expected output :)
with open("./sample.txt", 'r') as fl:
    # strip newlines so a final line without one still matches its run
    word = [line.strip() for line in fl]
count=1
length=[]
for i in range(1, len(word)):
    if word[i-1] == word[i]:
        count += 1
    else:
        length.append(count)
        count=1
length.append(count)
print (length)
#output as you expect:
[3, 4, 5]
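The hand-written loops in the answers above can also be expressed with itertools.groupby from the standard library, which groups consecutive equal items. This is a minimal sketch of that alternative; the function name run_lengths and the output file name are hypothetical.

```python
from itertools import groupby

def run_lengths(lines):
    """Return the length of each run of identical consecutive lines."""
    stripped = (line.strip() for line in lines)
    return [sum(1 for _ in group) for _, group in groupby(stripped)]

# Same data as the example in the question; the last line has no newline
sample = ["ANS\n"] * 3 + ["AUT\n"] * 4 + ["ANS\n"] * 4 + ["ANS"]
result = run_lengths(sample)
print(result)  # [3, 4, 5]

# Writing the counts out, one per line (the file name is hypothetical):
# with open("run_lengths.txt", "w") as out_file:
#     out_file.write("\n".join(str(n) for n in result))
```

An open file object can be passed as lines, so an 80,000-line file is processed one line at a time.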

Troubleshooting after combining two Python scripts

Here is my first code. Using this code I extracted a list of (6800) random elements and saved my results as a text file. (The file that this code is reading from has over 10,000 lines so every time I run it, I get a new set of random elements).
import random
with open('filename.txt') as fin:
    lines = fin.readlines()
random.shuffle(lines)
for i, line in enumerate(lines):
    if i >= 0 and i < 6800:
        print(line, end='')
Here is my second code. Using the text file saved in the previous step, I compare it to another text file. The result is a count, which varies from run to run (e.g. 2390 or 4325).
import csv
with open ("SavedTextFile_RandomElements.txt") as f:
    dict1 = {}
    r = csv.reader(f,delimiter="\t")
    for row in r:
        a, b, v = row
        dict1.setdefault((a,b),[]).append(v)
#for key in dict1:
    #print(key[0])
    #print(key[1])
    #print(d[key][0]])
with open ("filename2.txt") as f:
    dict2 = {}
    r = csv.reader(f,delimiter="\t")
    for row in r:
        a, b, v = row
        dict2.setdefault((a,b),[]).append(v)
#for key in dict2:
    #print(key[0])
count = 0
for key1 in dict1:
    for key2 in dict2:
        if (key1[0] == key2[0]) and abs((float(key1[1].split(" ")[0])) - (float(key2[1].split(" ")[0]))) < 10000:
            count += 1
print(count)
I decided to combine the two, because I want to skip the extracting and saving process and just have the first code run straight into the second having the random elements read automatically. Here is the combined two:
import csv
import random
with open('filename.txt') as fin:
    lines = fin.readlines()
    random.shuffle(lines)
    str_o = " "
    for i, line in enumerate(lines):
        if i >= 0 and i < 6800:
            str_o += line
    r = str_o
    dict1 = {}
    r = csv.reader(fin,delimiter="\t")
    for row in r:
        a, b, v = row
        dict1.setdefault((a,b),[]).append(v)
with open ("filename2.txt") as f:
    dict2 = {}
    r = csv.reader(f,delimiter="\t")
    for row in r:
        a, b, v = row
        dict2.setdefault((a,b),[]).append(v)
count = 0
for key1 in dict1:
    for key2 in dict2:
        if (key1[0] == key2[0]) and abs((float(key1[1].split(" ")[0])) - (float(key2[1].split(" ")[0]))) < 1000:
            count += 1
print(count)
However, now when I run the code, I always get a count of 0, even if I change (less than one thousand):
< 1000:
to, for example (less than ten thousand):
< 10000:
I should only get a count of zero when I write less than zero:
< 0:
But no matter what number I put in, I always get zero. I went wrong somewhere. Can you help me figure out where? I am happy to clarify anything.
[EDIT]
Both of my files are in the following format:
1 10045 0.120559958
1 157465 0.590642951
1 222471 0.947959795
1 222473 0.083341617
1 222541 0.054014337
1 222588 0.060296547
You wanted to combine the two codes, so you really don't need to read from SavedTextFile_RandomElements.txt, right?
In that case you need to read from somewhere, and I think you intended to store those in variable 'r'. But you overwrote 'r' using this:
r = csv.reader(fin,delimiter="\t")
BTW, was there a typo there 2 lines above that line? You didn't have any file open statement for 'fin'. The above combined code must have not been able to run properly (exception thrown).
To fix, simply remove the csv.reader line, like so (and reduce the indentation starting at dict1 = {}):
r = str_o
dict1 = {}
for row in r:
    a, b, v = row.split()
    dict1.setdefault((a,b),[]).append(v)
EDIT: another issue, causing ValueError exception
I missed this earlier. You are concatenating the whole file contents into a single string, but later on you loop over r to read each line and break it into a, b, v.
The unpacking error comes from this loop: because you are looping over a single string, each iteration gives you one character instead of one line.
To fix this, you just need a single list 'r'; there is no need for the string:
r = []
for i, line in enumerate(lines):
    if i >= 0 and i < 6800:
        r.append(line)
dict1 = {}
for row in r:
    a, b, v = row.split()
    dict1.setdefault((a,b),[]).append(v)
EDIT: Reading 'row' or line into variables
Since each line of the input file is a string of whitespace-separated fields, you need to split the 'row' variable:
a, b, v = row.split()
Can't tell where exactly you went wrong. In your combined code:
You create str_o and assign it to r but you never use it
A few lines later you assign a csv.reader to r - it is hard to tell from your indentation whether this is still within the with block.
You want to be doing something like this (I didn't use the csv module):
import collections, random
d1 = collections.defaultdict(list)
d2 = collections.defaultdict(list)
with open('filename.txt') as fin:
    lines = fin.readlines()
# pick 6800 random lines (the file must have at least that many)
lines = random.sample(lines, 6800)
for line in lines:
    line = line.strip()
    try:
        a, b, v = line.split('\t')
        d1[(a,b)].append(v)
    except ValueError as e:
        print('Error:' + line)
with open ("filename2.txt") as f:
    for line in f:
        line = line.strip()
        try:
            a, b, v = line.split('\t')
            d2[(a,b)].append(v)
        except ValueError as e:
            print('Error:' + line)
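The code above stops before the counting step from the question. As a rough sketch of that step, the nested loop over both dictionaries can be replaced by grouping the second file's positions by their first column and using binary search, assuming the second field of every key parses as a number. The function name count_close_pairs and the sample keys are hypothetical; the result should match the question's nested loop when the dictionaries' keys are passed in.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

def count_close_pairs(keys1, keys2, window=10000):
    """Count (key1, key2) pairs that share the first field and whose
    second fields differ by strictly less than `window`."""
    positions = defaultdict(list)
    for a, b in keys2:
        positions[a].append(float(b))
    for values in positions.values():
        values.sort()

    total = 0
    for a, b in keys1:
        values = positions.get(a, [])
        pos = float(b)
        # count positions p with pos - window < p < pos + window
        lo = bisect_right(values, pos - window)
        hi = bisect_left(values, pos + window)
        total += max(0, hi - lo)
    return total

# Hypothetical keys in the (a, b) form used by d1 and d2
keys1 = [("1", "10045"), ("1", "222471")]
keys2 = [("1", "157465"), ("1", "222473"), ("1", "222541"),
         ("1", "222588"), ("2", "10045")]
print(count_close_pairs(keys1, keys2, window=10000))
```

Iterating a dictionary yields its keys, so count_close_pairs(d1, d2) works directly with the two defaultdicts built above.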

Count lines after line with specific character

I have a file which contains this data:
>P136
FCF#0.73
FCF#0.66
FCF#0.86
>P129
FCF#0.72
>P142
>P144
>P134
FCF#0.70
FCF#0.82
And I need to count the number of lines after a line containing ">" , but keeping the ">" line as reference, for this example the output should be:
>P136 3
>P129 1
>P134 2
Any ideas?
Use a dictionary to store the count per section, and every time a line does not start with >, increment the count:
counts = {}
current = None
with open(filename) as fo:
    for line in fo:
        if line.startswith('>'):
            current = line.strip()
            counts[current] = 0
        else:
            counts[current] += 1
Then simply loop and print the counts, skipping sections with no lines:
for entry, count in counts.items():
    if count:
        print('{} {:2d}'.format(entry, count))
You could even just print the number every time you find a new section:
count = 0
current = None
with open(filename) as fo:
    for line in fo:
        if line.startswith('>'):
            if current and count:
                print('{} {:2d}'.format(current, count))
            current = line.strip()
            count = 0
        else:
            count += 1
# print the last section, which no later '>' line closes
if current and count:
    print('{} {:2d}'.format(current, count))
but you cannot then easily re-purpose the counts for other work.
In one line, just to show that we can:
s=""">P136
FCF#0.73
FCF#0.66
FCF#0.86
>P129
FCF#0.72
>P142
>P144
>P134
FCF#0.70
FCF#0.82
"""
First variant:
print([(i.split("\n")[0],len(i.split("\n")[1:])-1) for i in s.split(">") if i if len(i.split("\n")[1:])-1>0])
Using re:
import re
print([(block.split("\n")[0],sum(1 for m in re.finditer("#", block))) for block in s.split(">")])
This is a simple solution that attempts to be minimalistic.
with open(filename) as f:
    def printcc(current, count):
        if current is not None and count > 0:
            print(current.strip(), count)
    current = None
    count = 0
    for line in f:
        if line[0] == '>':
            printcc(current, count)
            current = line
            count = 0
        else:
            count += 1
    printcc(current, count)
In case you actually want all lines that contain a > character, use '>' in line as your condition. If you're targeting Python 2.x, use print current.strip(), count because having the outer parentheses will print a two-tuple.

How to add specific lines from a file into List in Python?

I have an input file:
3
PPP
TTT
QPQ

TQT
QTT
PQP

QQQ
TXT
PRP
I want to read this file and group these cases into proper boards.
To read the count (number of boards) I have this code:
board = []
count =''
def readcount():
    fp = open("input.txt")
    for i, line in enumerate(fp):
        if i == 0:
            count = int(line)
            break
    fp.close()
But I don't have any idea how to parse these blocks into a list:
TQT
QTT
PQP
I tried using
def readboard():
    fp = open('input.txt')
    for c in (1, count): # To Run loop to total no. of boards available
        for k in (c+1, c+3): #To group the boards into board[]
            board[c].append(fp.readlines)
But that is the wrong way. I know the basics of lists, but here I am not able to parse the file.
These boards are in lines 2 to 4, 6 to 8 and so on. How do I get them into lists?
I want to parse these into a count and boards so that I can process them further.
Please suggest.
I don't know if I understand your desired outcome. I think you want a list of lists.
Assuming that you want boards to be:
[[data,data,data],[data,data,data],[data,data,data]], then you would need to define how to parse your input file... specifically:
line 1 is the count number
data is entered per line
boards are separated by white space.
If that is the case, this should parse your files correctly:
board = []
count = 0
currentBoard = 0
fp = open('input.txt')
for i,line in enumerate(fp.readlines()):
    line = line.rstrip('\n')
    if i == 0:
        count = int(line) # the first line holds the number of boards
        board.append([])
    else:
        if len(line) == 0: # a blank line starts the next board
            currentBoard += 1
            board.append([])
        else: #this has board data
            board[currentBoard].append(line)
fp.close()
import pprint
pprint.pprint(board)
If my assumptions are wrong, then this can be modified to accommodate.
Personally, I would use a dictionary (or ordered dict) and get the count from len(boards):
from collections import OrderedDict
currentBoard = 0
board = OrderedDict()
board[currentBoard] = []
fp = open('input.txt')
lines = fp.readlines()
fp.close()
for line in lines[1:]:
    line = line.rstrip('\n')
    if len(line) == 0: # a blank line starts the next board
        currentBoard += 1
        board[currentBoard] = []
    else:
        board[currentBoard].append(line)
count = len(board)
print(count)
import pprint
pprint.pprint(board)
If you just want to take specific line numbers (counted from 0) and put them into a list:
line_nums = [3, 4, 5, 1]
fp = open('input.txt')
selected = [line for i, line in enumerate(fp) if i in line_nums]
fp.close()
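For the board-grouping question itself, here is a small sketch that returns the count together with a list of boards, assuming the first line is the count and boards are separated by blank lines as described above. The function name parse_boards is hypothetical, and the sample string stands in for the contents of input.txt.

```python
def parse_boards(text):
    """Split `text` into (count, boards); each board is a list of rows."""
    lines = text.splitlines()
    count = int(lines[0])      # first line: number of boards
    boards, current = [], []
    for line in lines[1:]:
        row = line.strip()
        if row:
            current.append(row)
        elif current:          # blank line closes the board in progress
            boards.append(current)
            current = []
    if current:                # last board has no trailing blank line
        boards.append(current)
    return count, boards

# Stand-in for open('input.txt').read()
sample = "3\nPPP\nTTT\nQPQ\n\nTQT\nQTT\nPQP\n\nQQQ\nTXT\nPRP\n"
count, boards = parse_boards(sample)
print(count, boards)
```

Comparing count with len(boards) afterwards is a cheap check that the file matched the assumed layout.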
