I'm trying to count the different characters in two individual strings using an if-else statement in a for-loop. However, it never counts the different characters.
for char in range(len(f1CurrentLine)): # Compare line by line
    if f1CurrentLine[char] != f2CurrentLine[char]: # If the lines have different characters
        print("Unmatched characters: ", count, ":", char)
        diffCharCount = diffCharCount + 1 # add 1 to the difference counter
        count = count + 1
        text1Count = text1Count + len(f1CurrentLine)
        text2Count = text2Count + len(f2CurrentLine)
        return CharByChar(count=count, text2Count=text2Count, text1Count=text1Count,
                          diffCharCount=diffCharCount) # return difference count
    else:
        print("Characters matched in line:", count, ". Moving to next line.")
        text1Count = text1Count + len(f1CurrentLine)
        text2Count = text2Count + len(f2CurrentLine)
        count = count + 1
        return CharByChar(count, diffCharCount=diffCharCount, text1Count=text1Count,
                          text2Count=text2Count,
                          diffLineCount=diffLineCount)
I have two files with the following in them
File 1:
1 Hello World
2 bazzle
3 foobar
File 2:
1 Hello world
2 bazzle
3 fooBar
It should return 2 different characters, but it does not. If you want to take a look at the entire function I have linked it here: Pastebin. Hopefully you can see something I have missed.
Your code is too complicated for this sort of application. I've tried my best to understand the code and I've come up with a better solution.
# Difference counters
diffLineCount = diffCharCount = 0

with open("file1.txt") as text1, open("file2.txt") as text2:
    # Iterate through both files line by line, numbering lines from 1
    for line_num, (line1, line2) in enumerate(zip(text1, text2), start=1):
        if line1 == "\n" or line2 == "\n":
            continue # If either line is blank, go to next line
        if len(line1) != len(line2): # If lines are of different length
            diffLineCount += 1
            continue # Go to next line
        # Iterate through both lines character by character
        for c1, c2 in zip(line1.strip(), line2.strip()):
            if c1 != c2: # If they do not match
                print("Unmatched characters: ", line_num, ":", c1)
                diffCharCount += 1

    # Go back to the beginning of each file
    text1.seek(0)
    text2.seek(0)

    # Print the stats (the files are closed automatically when the with block ends)
    print("Number of characters in the first file: ", len(text1.read()))
    print("Number of characters in the second file: ", len(text2.read()))
    print("Number of characters that do not match in lines of the same length: ", diffCharCount)
    print("Number of lines that are not the same length: ", diffLineCount)
I hope you understand how this works and are able to make it fit your needs specifically. Good luck!
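As a sketch of the same idea packaged as a function, the comparison can be run on two lists of lines, which makes it easy to try without files. The name count_differences and its return shape are illustrative assumptions, not part of the original code.

```python
def count_differences(lines1, lines2):
    """Return (diff_chars, diff_lines) for two sequences of text lines.

    diff_chars counts mismatching characters in line pairs of equal length;
    diff_lines counts line pairs whose lengths differ.
    """
    diff_chars = 0
    diff_lines = 0
    for line1, line2 in zip(lines1, lines2):
        # Ignore trailing newlines so the last line compares fairly
        line1 = line1.rstrip("\n")
        line2 = line2.rstrip("\n")
        if len(line1) != len(line2):
            diff_lines += 1
            continue
        # The whole line is scanned before moving on; nothing returns early
        diff_chars += sum(1 for c1, c2 in zip(line1, line2) if c1 != c2)
    return diff_chars, diff_lines


file1 = ["Hello World\n", "bazzle\n", "foobar\n"]
file2 = ["Hello world\n", "bazzle\n", "fooBar\n"]
print(count_differences(file1, file2))  # (2, 0)
```

With the sample files from the question this reports the two differing characters (W/w and b/B) and no lines of different length.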
Unlike the other answer, I edited your own code so that you can see what was going wrong. I agree with the other answer that you should reorganize your code anyway, because it is more complex than it needs to be.
text1 = open("file1.txt")
text2 = open("file2.txt")

def CharByChar(count, diffCharCount, text1Count, text2Count, diffLineCount):
    """
    This function compares two files character by character and prints the number of characters that are different
    :param count: What line of the file the program is comparing
    :param diffCharCount: The sum of different characters
    :param text1Count: Sum of characters in file 1
    :param text2Count: Sum of characters in file 2
    :param diffLineCount: Sum of different lines
    """
    # see comment below for strip removal
    f1CurrentLine = text1.readline()
    f2CurrentLine = text2.readline()
    while f1CurrentLine != '' or f2CurrentLine != '':
        count = count + 1
        print(f1CurrentLine)
        print(f2CurrentLine)
        #if f1CurrentLine != '' or f2CurrentLine != '':
        if len(f1CurrentLine) != len(f2CurrentLine): # If the line lengths are not equal return the line number
            print("Lines are a different length. The line number is: ", count)
            diffLineCount = diffLineCount + 1
            count = count + 1
            #text1Count = text1Count + len(f1CurrentLine)
            #text2Count = text2Count + len(f2CurrentLine)
            # return CharByChar(count)
        elif len(f1CurrentLine) == len(f2CurrentLine): # If the lines lengths are equal
            for char in range(len(f1CurrentLine)): # Compare line by line
                print(char)
                if f1CurrentLine[char] != f2CurrentLine[char]: # If the lines have different characters
                    print("Unmatched characters: ", count, ":", char)
                    diffCharCount = diffCharCount + 1 # add 1 to the difference counter
                    #count = count + 1
                    text1Count = text1Count + len(f1CurrentLine)
                    text2Count = text2Count + len(f2CurrentLine)
                    # return CharByChar(count=count, text2Count=text2Count, text1Count=text1Count,diffCharCount=diffCharCount) # return difference count
                else:
                    print("Characters matched in line:", count, ". Moving to next char.")
                    #text1Count = text1Count + len(f1CurrentLine)
                    #text2Count = text2Count + len(f2CurrentLine)
                    #count = count + 1
                    #return CharByChar(count, diffCharCount=diffCharCount, text1Count=text1Count,text2Count=text2Count,diffLineCount=diffLineCount)
        #elif len(f1CurrentLine) == 0 or len(f2CurrentLine) == 0:
            #print(count, "lines are not matching")
            #diffLineCount = diffLineCount + 1
            #return CharByChar(diffLineCount=diffLineCount)
        else:
            print("Something else happened!")
        f1CurrentLine = text1.readline()
        f2CurrentLine = text2.readline()
    print("Number of characters in the first file: ", text1Count)
    print("number of characters in the second file: ", text2Count)
    print("Number of characters that do not match in lines of the same length: ", diffCharCount)
    print("Number of lines that are not the same length: ", diffLineCount)

def main():
    "Calls the primary function"
    CharByChar(count=0, diffCharCount=0, text1Count=0, text2Count=0, diffLineCount=0)
    input("Hit enter to close the program...")

main() #Runs this bad boi
I think the general problem is that CharByChar() is organized to scan all the lines in the file [which this solution keeps], but you then call the same function again at the end of every character check, so each call returns after comparing a single character.
Some parts have no reason to be there: for example, you set count in main() when calling CharByChar() and then create a branch with if(count == 0). You can cut this out and the code will look cleaner.
Some variables should be removed as well to keep the code as clean as possible: you never use text1Count and text2Count.
You enter the while loop on a condition and the next if has the same condition: if you enter the while you also enter the if [or neither of them], so you can cut one of them out.
I suggest removing the branch with if len(f1CurrentLine) == 0 or len(f2CurrentLine) == 0, because both files can have length 0 for the same line, and then the lines would be equal [see the example below].
I suggest removing the strip(): with it, a blank line in the middle of a file becomes an empty string and the while loop stops early, e.g.
1 Hello
3 foobar
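To illustrate the first point above, here is a minimal sketch (both function names are hypothetical) contrasting a loop whose branches both return on the first iteration with one that scans the whole line before returning.

```python
def count_diffs_returning_early(a, b):
    """Mimics the original structure: both branches return inside the loop."""
    diffs = 0
    for i in range(len(a)):
        if a[i] != b[i]:
            diffs += 1
            return diffs  # leaves the loop after the first character
        else:
            return diffs  # also leaves the loop after the first character
    return diffs


def count_diffs_full_scan(a, b):
    """Only returns once every character pair has been compared."""
    diffs = 0
    for i in range(len(a)):
        if a[i] != b[i]:
            diffs += 1
    return diffs


print(count_diffs_returning_early("Hello World", "Hello world"))  # 0
print(count_diffs_full_scan("Hello World", "Hello world"))        # 1
```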
I have a file in which I have to count the number of words in each line, but there is a catch: whatever comes between ' ' or " " should be counted as a single word.
Example file:
TopLevel
DISPLAY "In TopLevel. Starting to run program"
PERFORM OneLevelDown
DISPLAY "Back in TopLevel."
STOP RUN.
For the above file the count of words in each line has to be as below:
Line: 1 has: 1 words
Line: 2 has: 2 words
Line: 3 has: 2 words
Line: 4 has: 2 words
Line: 5 has: 2 words
But I am getting as below:
Line: 1 has: 1 words
Line: 2 has: 7 words
Line: 3 has: 2 words
Line: 4 has: 4 words
Line: 5 has: 2 words
from os import listdir
from os.path import isfile, join
srch_dir = r'C:\Users\sagrawal\Desktop\File'
onlyfiles = [srch_dir+'\\'+f for f in listdir(srch_dir) if isfile(join(srch_dir, f))]
for i in onlyfiles:
    index = 0
    with open(i,mode='r') as file:
        lst = file.readlines()
        for line in lst:
            cnt = 0
            index += 1
            linewrds=line.split()
            for lwrd in linewrds:
                if lwrd:
                    cnt = cnt +1
            print('Line:',index,'has:',cnt,' words')
If you only have this simple format (no nested quotes or escaped quotes), you could use a simple regex:
lines = '''TopLevel
DISPLAY "In TopLevel. Starting to run program"
PERFORM OneLevelDown
DISPLAY "Back in TopLevel."
STOP RUN.'''.split('\n')
import re
counts = [len(re.findall(r'\'.*?\'|".*?"|\S+', l))
for l in lines]
# [1, 2, 2, 2, 2]
If not, you have to write a parser.
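As a rough sketch of what such a parser could look like, the function below walks the line one character at a time and tracks whether it is inside a quoted region. The name count_words_quoted, the rule that a backslash escapes the next character, and the choice to treat an unclosed quote as running to the end of the line are all assumptions made for illustration.

```python
def count_words_quoted(line):
    """Count whitespace-separated words, keeping quoted text inside one word."""
    count = 0
    in_word = False      # True while we are inside a word (quoted or not)
    quote_char = None    # the quote that opened the current quoted region, if any
    escaped = False      # True when the previous character was a backslash

    for ch in line:
        if quote_char is None and not escaped and ch.isspace():
            in_word = False          # whitespace outside quotes ends the word
            continue
        if not in_word:
            in_word = True           # first character of a new word
            count += 1
        if escaped:
            escaped = False          # this character was escaped; take it literally
        elif ch == "\\":
            escaped = True
        elif quote_char is None and ch in "'\"":
            quote_char = ch          # opening quote
        elif ch == quote_char:
            quote_char = None        # matching closing quote
    return count


lines = [
    'TopLevel',
    'DISPLAY "In TopLevel. Starting to run program"',
    'PERFORM OneLevelDown',
    'DISPLAY "Back in TopLevel."',
    'STOP RUN.',
]
print([count_words_quoted(l) for l in lines])  # [1, 2, 2, 2, 2]
```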
If you are looking for a non-regex solution, here is my approach:
# A simple function that counts the words in each line
def count_words(line):
    # Replace quoted sections first (see the next function)
    line = manage_quotes(line)
    # split() with no arguments treats runs of spaces as one separator,
    # so there are no empty words to filter out
    words = line.split()
    return len(words)

# This function replaces every quoted section with a single placeholder word
def manage_quotes(line):
    # We do not mind the escaped quotes; they are like a simple char
    # Since the changes are local, we can replace them in line
    line = line.replace("\\\"", "Q").replace("\\\'", "q")
    # All words between 2 quotes act as one word, so we replace them with 1 simple word; we start with `"`
    # This loop finds all double-quoted sections in one line
    while True:
        i1 = line.find("\"")
        if i1 == -1: # No `"` anymore
            break
        i2 = line.find("\"", i1 + 1) # Search after the previous one
        if i2 == -1: # What shall we do with unpaired quotes???
            # raise Exception()
            break
        line = line[:i1] + "QUOTE" + line[i2 + 1:]
    # Now search for `'`
    while True:
        i1 = line.find("\'")
        if i1 == -1: # No `'` anymore
            break
        i2 = line.find("\'", i1 + 1) # Search after the previous one
        if i2 == -1: # What shall we do with unpaired quotes???
            # raise Exception()
            break
        line = line[:i1] + "quote" + line[i2 + 1:]
    return line
This is how the method works. For example, take a line like this: DISPLAY "Part One \'Test1\'" AND 'Part Two \"Test2\"'
First, we replace the escaped quotes:
DISPLAY "Part One qTest1q" AND 'Part Two QTest2Q'
Then we replace the double-quoted sections:
DISPLAY QUOTE AND 'Part Two QTest2Q'
Then the single-quoted ones:
DISPLAY QUOTE AND quote
And now we count the words, which gives 4.
You can solve this without regex if you keep a marker that tells you whether you are inside a quoted area or not.
str.split() - splits at whitespace, returns a list
str.startswith()
str.endswith() - take a string (or a tuple of strings) and return True if the string starts/ends with (any of) them
Code:
# create input file
name = "file.txt"
with open(name, "w") as f:
    f.write("""TopLevel
DISPLAY "In TopLevel. Starting to run program"
PERFORM OneLevelDown
DISPLAY "Back in TopLevel."
STOP RUN.""")

# for testing later
expected = [(1,1),(2,2),(3,2),(4,2),(5,2)] # 1 based line/word count

# program that counts words
counted = []
with open(name) as f:
    for line_nr, content in enumerate(f,1): # 1 based line count
        splt = content.split()
        in_quotation = []
        line_count = 0
        for word in splt:
            if not in_quotation:
                line_count += 1 # only increments if list empty
            if word.startswith(("'",'"')):
                in_quotation.append(word[0])
            # check both quote characters, and only pop if a quotation is open
            if in_quotation and word.endswith(("'",'"')):
                in_quotation.pop()
        counted.append((line_nr, line_count))

print(expected)
print(counted)
print("Identical: ", all(a == expected[i] for i,a in enumerate(counted)))
Output:
[(1, 1), (2, 2), (3, 2), (4, 2), (5, 2)]
[(1, 1), (2, 2), (3, 2), (4, 2), (5, 2)]
Identical: True
You can tinker with the code. Currently it does not behave well if a quote character stands alone as its own word (surrounded by spaces): it cannot tell whether a quotation starts or ends there, and both tests are True.
It seems that the code in the question does not treat ' or " specially.
Here is how the Python documentation describes str.split:
If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace. Consequently, splitting an empty string or a string consisting of just whitespace with a None separator returns [].
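A few quick calls illustrate the behavior described in that passage, and why quoted text is split apart. The variable names are only for illustration.

```python
# No separator: runs of whitespace collapse and no empty strings are produced,
# but quotes get no special treatment, so 'a b' is split in two
no_sep = "  DISPLAY   'a b'  ".split()
print(no_sep)          # ['DISPLAY', "'a", "b'"]

# Empty or whitespace-only strings give an empty list
print("".split())      # []
print("   ".split())   # []

# An explicit separator keeps the empty strings between consecutive spaces
with_sep = " a  b".split(" ")
print(with_sep)        # ['', 'a', '', 'b']
```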
Code:
input_str = '''TopLevel
DISPLAY "In TopLevel. Starting to run program"
PERFORM OneLevelDown
DISPLAY "Back in TopLevel."
STOP RUN.'''
input_str_list = input_str.split('\n')
print(input_str_list)
def get_trick_word(s: str):
    num = 0
    quotation_list = []
    previous_char = ' '
    for c in s:
        need_set_previous = False
        if c == ' ' and len(quotation_list) == 0 and previous_char != ' ':
            num = num + 1
        else:
            has_quato = len(quotation_list)
            if c == '\'' or c == '"':
                if len(quotation_list) != 0 and quotation_list[-1] == c:
                    quotation_list.pop()
                else:
                    quotation_list.append(c)
            if has_quato and len(quotation_list) == 0:
                num = num + 1
                need_set_previous = True
        previous_char = c
        if need_set_previous:
            previous_char = ' '
    if previous_char != ' ' and len(quotation_list) == 0:
        num = num + 1
    return num
result = [get_trick_word(s) for s in input_str_list]
print(result)
And the result is:
# ['TopLevel ', ' DISPLAY "In TopLevel. Starting to run program" ', ' PERFORM OneLevelDown ', ' DISPLAY "Back in TopLevel." ', ' STOP RUN.']
# [1, 2, 2, 2, 2]
For example: string = aaaacccc, then I need the output to be 4a4c. Is there a way to do this without using any advanced methods, such as libraries or functions?
Also, if someone knows how to do the reverse, turning "4a4c" into aaaacccc, that would be great to know.
This will do the work in one iteration.
Keep two temporary variables, one for the current character and one for the count of that character, plus one variable for the result.
Iterate through the string and keep increasing the count while the character matches the previous one.
If it doesn't match, append the count and the character to the result, then reset the character and the count.
At the end, append the last character and its count to the result. Done!
input_str = "aaaacccc"
if input_str.isalpha():
    current_str = input_str[0]
    count = 0
    final_string = ""
    for i in input_str:
        if i==current_str:
            count+=1
        else:
            final_string+=str(count)+current_str
            current_str = i
            count = 1
    final_string+=str(count)+current_str
    print (final_string)
Here is another solution; I have also included a rough reverse operation, as you mentioned in your post. Both run in O(n) and are fairly simple to understand. The encode is basically identical to the one posted by Akanasha, who was just a bit faster in posting while I was writing decode().
def encode(x):
    if not x.isalpha():
        raise ValueError()
    output = ""
    current_l = x[0]
    counter = 0
    for pos in x:
        if current_l != pos:
            output += str(counter) + current_l
            counter = 1
            current_l = pos
        else:
            counter += 1
    return output + str(counter) + current_l

def decode(x):
    output = ""
    i = 0
    while i < len(x):
        if x[i].isnumeric():
            n = i + 1
            while x[n].isnumeric():
                n += 1
            output += int(x[i:n])*x[n]
            i = n
        i += 1
    return output

test = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaasasggggggbbbbdd"
test1 = encode(test)
print(test1)
test2 = decode(test1)
print(test2)
print(test == test2)
Yes, you do not need any libraries:
list1 = list("aaaacccc")
letters = []
for i in list1:
    if i not in letters:
        letters.append(i)
string = ""
for i in letters:
    string += str(list1.count(i))
    string+=str(i)
print(string)
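Basically, it loops through the list, finds the unique letters, and then prints each letter's count followed by the letter itself. Note that this counts every occurrence of a letter in the whole string, so it matches the requested output only when each letter appears in a single run. Reversing works along the same lines: for each count and letter pair, output the letter that many times.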
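The difference only shows up when a letter occurs in more than one run. A small sketch, with hypothetical function names, comparing the counting approach above with a run-by-run encoder like the ones in the other answers:

```python
def encode_by_total_count(text):
    """Count each distinct letter over the whole string (the approach above)."""
    seen = []
    for ch in text:
        if ch not in seen:
            seen.append(ch)
    return "".join(str(text.count(ch)) + ch for ch in seen)


def encode_by_runs(text):
    """Count consecutive repeats only (run-length encoding)."""
    if not text:
        return ""
    result = ""
    current = text[0]
    count = 0
    for ch in text:
        if ch == current:
            count += 1
        else:
            result += str(count) + current
            current = ch
            count = 1
    return result + str(count) + current


print(encode_by_total_count("aaaacccc"))  # 4a4c
print(encode_by_runs("aaaacccc"))         # 4a4c
print(encode_by_total_count("aabaa"))     # 4a1b
print(encode_by_runs("aabaa"))            # 2a1b2a
```

Only the run-by-run result can be decoded back to the original string, because the total counts lose the order of the runs.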
I tried to construct my own string.find() method/function in Python. I did this for a computer science class I'm in.
Basically, this program opens a text file, gets user input for the text to search for in the file, and outputs the line number on which the string resides, or outputs 'not found' if the string doesn't exist in the file.
However, this takes about 34 seconds to complete 250,000 lines of XML.
Where is the bottleneck in my code? I made this in C# and C++ as well, and this runs in about 0.3 seconds for 4.5 million lines. I also performed this same search using the built-in string.find() from Python, and this takes around 4 seconds for 250,000 lines of XML. So, I'm trying to understand why my version is so slow.
https://github.com/zach323/Python/blob/master/XML_Finder.py
fhand = open('C:\\Users\\User\\filename')
import time
str = input('Enter string you would like to locate: ') #string to be located in file
start = time.time()
delta_time = 0
def find(str):
    time.sleep(0.01)
    found_str ='' #initialize placeholder for found string
    next_index = 0 #index for comparison checking
    line_count = 1
    for line in fhand: #each line in file
        line_count = line_count +1
        for letter in line: #each letter in line
            if letter == str[next_index]: #compare current letter index to beginning index of string you want to find
                found_str += letter #if a match, concatenate to string placeholder
                #print(found_str) #print for visualization of inline search per iteration
                next_index = next_index + 1
                if found_str == str: #if complete match is found, break out of loop.
                    print('Result is: ', found_str, ' on line %s '%(line_count))
                    print (line)
                    return found_str #return string to function caller
                    break
            else:
                #if a match was found but the next_index match was False, reset the indexes and try again.
                next_index=0 # reset indext back to zero
                found_str = '' #reset string back to empty
    if found_str == str:
        print(line)
if str != "":
    result = find(str)
    delta_time = time.time() - start
    print(result)
    print('Seconds elapsed: ', delta_time)
else:
    print('sorry, empty string')
Try this:
with open(filename) as f:
    for row in f:
        if string in row:
            print(row)
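A slightly fuller sketch of the same idea that also reports the line number, which is what the original program prints. The function name find_line and its return convention (a 1-based line number, or None when nothing matches) are assumptions for illustration, and the temporary file only stands in for the real XML file.

```python
import os
import tempfile


def find_line(path, needle):
    """Return the 1-based number of the first line containing needle, or None."""
    with open(path) as f:
        for line_number, line in enumerate(f, start=1):
            # The `in` operator does the substring search inside the
            # interpreter rather than in a Python-level character loop
            if needle in line:
                return line_number
    return None


# Small demonstration on a temporary file
with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False) as tmp:
    tmp.write("<root>\n  <item>alpha</item>\n  <item>beta</item>\n</root>\n")
    demo_path = tmp.name

print(find_line(demo_path, "beta"))     # 3
print(find_line(demo_path, "missing"))  # None
os.remove(demo_path)
```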
The following code runs on a text file of size comparable to the size of your file. Your code doesn't run too slowly on my computer.
fhand = open('test3.txt')
import time
string = input('Enter string you would like to locate: ') #string to be located in file
start = time.time()
delta_time = 0
def find(string):
    next_index_to_match = 0
    sl = len(string)
    ct = 0
    for line in fhand: #each line in file
        ct += 1
        for letter in line: #each letter in line
            if letter == string[next_index_to_match]: #compare current letter index to beginning index of string you want to find
                # print(line)
                next_index_to_match += 1
                if sl == next_index_to_match: #if complete match is found, break out of loop.
                    print('Result is: ', string, ' on line %s '%(ct))
                    print (line)
                    return True
            else:
                #if a match was found but the next_index match was False, reset the indexes and try again.
                next_index_to_match=0 # reset indext back to zero
    return False
if string != "":
    find(string)
    delta_time = time.time() - start
    print('Seconds elapsed: ', delta_time)
else:
    print('sorry, empty string')
I want to write code that counts the number of words in a given sentence using character comparison. Below is the code I have written; I am not allowed to use utilities like split(). Could you please point out where I am making mistakes? I am a novice in Python and am trying to figure out how to do character-by-character comparison to get simple counts of words, lines, and strings without using built-in utilities.
Input Sentence : I am XYZ
Input_Sentence = raw_input("Enter your sentence: ")
print Input_Sentence
count = 0
i=0
while(Input_Sentence[i] != "\n"):
    if(Input_Sentence[i] == ' '):
        count=count+1
        i+=1
    else:
        i+=1
print ('Number of Words in a given sentence is :' +str(count))
First, I wouldn't use a while loop in this context. Why not use a for loop?
for char in Input_Sentence:
With this you iterate over every character.
Then you can use the rest of your code and check:
    if char == ' ':
# initialize the counter
word_count = 0
last_space_index = 0
# loop through each character in the sentence (assuming Input_Sentence is a string)
for i, x in enumerate(Input_Sentence): # enumerate to get the index of the character
    # if a space is found (or newline character for end of sentence)
    if x in (' ', '\n'):
        word_count += 1 # increment the counter
        last_space_index = i # set the index of the last space found
if len(Input_Sentence) > (last_space_index + 1): # check if we are at the end of the sentence (this is in case the word does not end with a newline character or a space)
    word_count += 1
# print the total number of words
print 'Number of words:', word_count
The following avoids miscounting if there is a space at the beginning or the end of the sentence.
Input_Sentence = raw_input("Enter your sentence: ")
print Input_Sentence
count = 0
sentence_length = len(Input_Sentence)
for i in range(sentence_length):
    if Input_Sentence[i] == " ":
        if i not in (0, sentence_length - 1):
            count += 1
count += 1
print "There are %s words in the sentence \"%s\"." % (count, Input_Sentence)
You may use try-except syntax.
In your code you used while(Input_Sentence[i] != "\n") to find where the sentence ends. If you print the output at every step just before i += 1, like this:
...
while(Input_Sentence[i] != "\n"):
    ...
        print i,Input_Sentence[i]
        i+=1
    else:
        print i,Input_Sentence[i],'*'
        i+=1
...
you can see for yourself that the output is something like this:
Enter your sentence: Python is good
Python is good
0 P *
1 y *
2 t *
3 h *
4 o *
5 n *
6
7 i *
8 s *
9
10 g *
11 o *
12 o *
13 d *
Traceback (most recent call last):
File "prog8.py", line 19, in <module>
while(Input_Sentence[i] != "\n"):
IndexError: string index out of range
This means that the code you have written works fine up to the length of the input sentence. After that, when i is increased by 1 and the code has to check whether Input_Sentence[i] != "\n", it raises an IndexError, because raw_input does not include a trailing newline in the string it returns. This problem can be handled with Python's exception handling: when an exception occurs inside the try block, the rest of that block is skipped and the except block is executed instead.
Input_Sentence = raw_input("Enter your sentence: ")
print Input_Sentence
count = 0
i=0
try:
    while (Input_Sentence[i] != "\n"):
        if (Input_Sentence[i] == ' '):
            count=count+1
            i+=1
        else:
            i+=1
except IndexError: # reached the end of the sentence
    count = count+1
print ('Number of Words in a given sentence is :' +str(count))
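As an alternative that needs no exception handling, the loop can simply iterate over the characters and count the places where a word begins. This is a minimal sketch in Python 3 syntax (the code in this thread is Python 2), and the function name count_words is an assumption for illustration.

```python
def count_words(sentence):
    """Count words by scanning characters; a word starts at a non-space after a space."""
    count = 0
    previous_was_space = True   # treat the start of the string like a space
    for ch in sentence:
        is_space = ch in (" ", "\t", "\n")
        if not is_space and previous_was_space:
            count += 1          # first character of a new word
        previous_was_space = is_space
    return count


print(count_words("I am XYZ"))             # 3
print(count_words("  Python  is good "))   # 3
print(count_words(""))                     # 0
```

Counting word starts rather than spaces means leading, trailing, and repeated spaces do not change the result.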
I am trying to count the number of times 1,2,3,...,9 appear at the beginning of each number in a file. This is how my code goes:
DECIMAL_NUM='123456789'
def main():
    #get the file name from the user
    file_name=str(input("Enter a file name: "))
    #open the file to read
    input_file= open(str(file_name),'r')
    #reads the first line of the file
    line=input_file.readline().strip()
    one=0
    two=0
    three=0
    four=0
    five=0
    six=0
    seven=0
    eight=0
    nine=0
    i=0
    while line!="":
        if line[0]==DECIMAL_NUM[0]:
            one+=1
        elif line[0]==DECIMAL_NUM[1]:
            two+=1
        elif line[0]==DECIMAL_NUM[2]:
            three+=1
        elif line[0]==DECIMAL_NUM[3]:
            four+=1
        elif line[0]==DECIMAL_NUM[4]:
            five+=1
        elif line[0]==DECIMAL_NUM[5]:
            six+=1
        elif line[0]==DECIMAL_NUM[6]:
            seven+=1
        elif line[0]==DECIMAL_NUM[7]:
            eight+=1
        elif line[0]==DECIMAL_NUM[8]:
            nine+=1
        line=input_file.readline().strip()
        i+=1
    input_file.close()
    print(one)
    print(two)
main()
I am also counting how many numbers there are in the file, so that I can calculate the percentage of appearance of each digit. I think my code is a little wordy and there might be a better way to do it. The input file has the following numbers:
1292
1076
188040
1579
3510
2597
3783
64690
For some reason, I am getting the number of times 1 is appearing as 1, when it should be 5. Could someone please give me some pointers? Thanks
Here is one way of approaching this task:
# Get non-empty lines from input file:
relevant_lines = [line for line in open(file_name).readlines() if line.strip()]
# Count them:
num_lines = len(relevant_lines)
from collections import defaultdict
# If a key does not exist in a defaultdict when adding a value for it,
# it will be added with a default value for the given data type
# (0 in case of int):
d = defaultdict(int)
# Iterate through lines; get first character of line
# and increment counter for this character by one in defaultdict:
for line in relevant_lines:
    d[line[0]] += 1
# Print results:
for key, value in d.items():
    print(key + ' appears ' + str(value) + ' times in file.')
If you are not allowed to use dicts, here's how to fix your code:
DECIMAL_NUM='123456789'
def main():
    # Get file name from user
    file_name = input("Enter a file name: ")
    # Open the file to read, and get a list of all lines:
    lines = open(file_name, 'r').readlines()
    one = 0
    two = 0
    three = 0
    four = 0
    five = 0
    six = 0
    seven = 0
    eight = 0
    nine = 0
    for line in lines:
        if line.strip(): # Check if line is not empty
            if line[0] == DECIMAL_NUM[0]:
                one += 1
            elif line[0] == DECIMAL_NUM[1]:
                two += 1
            elif line[0] == DECIMAL_NUM[2]:
                three += 1
            elif line[0] == DECIMAL_NUM[3]:
                four += 1
            elif line[0] == DECIMAL_NUM[4]:
                five += 1
            elif line[0] == DECIMAL_NUM[5]:
                six += 1
            elif line[0] == DECIMAL_NUM[6]:
                seven += 1
            elif line[0] == DECIMAL_NUM[7]:
                eight += 1
            elif line[0] == DECIMAL_NUM[8]:
                nine += 1
    print(one)
    print(two)
main()
Your code is fine. It's your data file that's giving you problems. Remove the blank lines and your program should give you the right results.
1292
1076
188040
1579
3510
2597
3783
64690
After the first line is processed, the next line is read. But that's a blank line, so after strip() it is an empty string and your while loop ends.
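If editing the data file is not an option, the loop can skip blank lines instead of stopping at them. The following is a minimal sketch; the function name leading_digit_counts is hypothetical, and the blank entries in sample are an assumption about how the file is laid out.

```python
def leading_digit_counts(lines):
    """Return a list where index d holds how many lines start with digit d (1-9)."""
    counts = [0] * 10                 # index 0 is unused
    for line in lines:
        line = line.strip()
        if not line:
            continue                  # skip blank lines instead of stopping
        first = line[0]
        if first in "123456789":
            counts[int(first)] += 1
    return counts


sample = ["1292\n", "\n", "1076\n", "\n", "188040\n", "1579\n",
          "3510\n", "2597\n", "3783\n", "64690\n"]
counts = leading_digit_counts(sample)
total = sum(counts)
for digit in range(1, 10):
    # Percentage of numbers that start with this digit
    print(digit, counts[digit], "{:.1f}%".format(100 * counts[digit] / total))
```

Keeping the nine counters in one list also removes the long if/elif chain, and passing an open file object instead of a list works the same way because files iterate line by line.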