Split lines into words task - linear search method

Split lines into words task - linear search method - python

I've been given a task to split lines into words and then split the lines up based on spaces and newlines. I've came up with an incomplete solution as it will not print the last word. I can only use linear search hence the basic approach.
line = raw_input()
while line != "end":
i = 0
while i < len(line):
i = 0
while i < len(line) and line[i] == " ":
i = i + 1
j = i
while line[j] != " ":
j = j + 1
print line[i:j]
line = line[j:]
line = raw_input()

I understand your problem this is somewhat similar to the Hackerearth problem
See this example to clarify your concept
y=list()
1). y=map(int,raw_input().split()) # for storing integer in list
2). y=map(str,raw_input().split()) # for storing characters of string in list
#then use this to print the value
for i in y:
print i #this loop will print one by one value
#working example of this code is #for Second part
>>baby is cute #input
>>['baby','is','cute'] #output
>> #now according to your problem line is breaks into words
If you find it useful Thumbs up

Related

IndexError: string index out of range. While-Loop goes one round too much

I know there are other problems similar to this but I can't seem to figure out how to fix it. I've implemented some "tracking"-code to understand where it goes wrong. The problem is obviously the same for other parts of the code as well depending on what is run through the tokenize function. I know the problem exists because of the end += 1 in the while-loops and doesn't stop/continue correctly. After the last letter/number/symbol is read it should add it to words but instead it tries to go one further step and creates this error. Tried numerous if's and tries things but my coding is to weak to solve it properly. Any other comments of the code in general is much appreciated as well. I had a draft that was working earlier but I accidentally deleted that draft when I was supposed to polish it and move it to another document...
def tokenize(lines):
words = []
for line in lines:
print("new line")
start = 0
while start-1 < len(line):
print(start)
print("start")
while line[start].isspace() == True:
print("remove space")
start += 1
end = start
while line[end].isspace() == True:
print("remove space")
end += 1
if line[end].isalpha() == True:
while line[end].isalpha() == True:
print("letter")
end += 1
elif line[end].isdigit() == True:
while line[end].isdigit() == True:
print("number")
end += 1
else:
print("symbol")
end += 1
words.append(line[start:end].lower())
print(line[start:end] + " - adds to words")
start = end
print(len(line))
print(words)
return words
tokenize([" all .. 12 foas d 12 9"])

The main issue is that you have to check your indexing bounds in every part of the code where indexing variables might have changed. This includes both start and end variables as they are independently incremented within your code.
I also cut out areas of your code which were unnecessary and considered mainly duplicate code and untidy logic, which you have to avoid in every program you write. This also makes it easier to debug, maintain, and understand your program. Always make sure your logic gets as straightforward as possible before you start writing any code.
def tokenize(lines):
words = []
for line in lines:
print("new line")
start = 0
# start, as an index, is allowed in the range [0, len(line) - 1]
# so use either *start < len(line)* or *start <= len(line) - 1* as they are equivalent
while start < len(line):
print(start)
print("start")
# going forward, watch not to overstep again
while start < len(line) and line[start].isspace():
print("remove space")
start += 1
end = start
# whatever variable you use as an index, you have to make sure
# it will be within bounds; as you go forward to capture
# non-space symbols, you should also stop before the string finishes.
while end < len(line) and not line[end].isspace():
if line[end].isalpha():
print("letter")
elif line[end].isdigit():
print("number")
else:
print("symbol")
end += 1
words.append(line[start:end].lower())
print(line[start:end] + " - adds to words")
start = end
print(len(line))
print(words)
UPDATE:
It seems the OP is trying to keep the non-alphanumeric symbols as separate tokens. I suggest not doing this in a single passing. You can first split the normal way and then go over each word again to split by symbols (and retain symbols). This will keep your code simpler and easier to read. I'm going to use regex split for the second step:
import re
greeting = "Hey, how are you doing?"
# get rid of spaces
tokens = greeting.split()
result = []
for w in tokens:
# '[^\d\w]' will match symbol characters (non-digit and non-alpha)
# parentheses will capture the delimiters (the symbols) as tokens in the final list
for x in re.split("([^\d\w]+)", w):
if x:
result.append(x)
print(result)
##### or use list comprehension to achieve this in a single line
result = [x for w in greeting.split() for x in re.split("([^\d\w]+)", w) if x != ""]
print(result)

How to group consecutive letters in a string in Python?

For example: string = aaaacccc, then I need the output to be 4a4c. Is there a way to do this without using any advanced methods, such as libraries or functions?
Also, if someone knows how to do the reverse: turning "4a4c: into aaaacccc, that would be great to know.

This will do the work in one iteration
Keep two temp variable one for current character, another for count of that character and one variable for the result.
Just iterate through the string and keep increasing the count if it matches with the previous one.
If it doesn't then update the result with count and value of character and update the character and count.
At last add the last character and the count to the result. Done!
input_str = "aaaacccc"
if input_str.isalpha():
current_str = input_str[0]
count = 0
final_string = ""
for i in input_str:
if i==current_str:
count+=1
else:
final_string+=str(count)+current_str
current_str = i
count = 1
final_string+=str(count)+current_str
print (final_string)

Another solution and I included even a patchwork reverse operation like you mentioned in your post. Both run in O(n) and are fairly simple to understand. The encode is basically identical one posted by Akanasha, he was just a bit faster in posting his answer while i was writing the decode().
def encode(x):
if not x.isalpha():
raise ValueError()
output = ""
current_l = x[0]
counter = 0
for pos in x:
if current_l != pos:
output += str(counter) + current_l
counter = 1
current_l = pos
else:
counter += 1
return output + str(counter) + current_l
def decode(x):
output = ""
i = 0
while i < len(x):
if x[i].isnumeric():
n = i + 1
while x[n].isnumeric():
n += 1
output += int(x[i:n])*x[n]
i = n
i += 1
return output
test = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaasasggggggbbbbdd"
test1 = encode(test)
print(test1)
test2 = decode(test1)
print(test2)
print(test == test2)

yes, you do not need any libraries:
list1 = list("aaaacccc")
letters = []
for i in list1:
if i not in letters:
letters.append(i)
string = ""
for i in letters:
string += str(list1.count(i))
string+=str(i)
print(string)
Basically, it loops through the list, finds the unique letters and then prints the count with the letter itself. Reversing would be the same function, just print the amount.

If-else statement not functioning properly in for loop

I'm trying to count the different characters in two individual strings using an if-else statement in a for-loop. However, it never counts the different characters.
for char in range(len(f1CurrentLine)): # Compare line by line
if f1CurrentLine[char] != f2CurrentLine[char]: # If the lines have different characters
print("Unmatched characters: ", count, ":", char)
diffCharCount = diffCharCount + 1 # add 1 to the difference counter
count = count + 1
text1Count = text1Count + len(f1CurrentLine)
text2Count = text2Count + len(f2CurrentLine)
return CharByChar(count=count, text2Count=text2Count, text1Count=text1Count,
diffCharCount=diffCharCount) # return difference count
else:
print("Characters matched in line:", count, ". Moving to next line.")
text1Count = text1Count + len(f1CurrentLine)
text2Count = text2Count + len(f2CurrentLine)
count = count + 1
return CharByChar(count, diffCharCount=diffCharCount, text1Count=text1Count,
text2Count=text2Count,
diffLineCount=diffLineCount)
I have two files with the following in them
File 1:
1 Hello World
2 bazzle
3 foobar
File 2:
1 Hello world
2 bazzle
3 fooBar
It should return 2 different characters, but it does not. If you want to take a look at the entire function I have linked it here: Pastebin. Hopefully you can see something I have missed.

Your code is too complicated for this sort of application. I've tried my best to understand the code and I've come up with a better solution.
text1 = open("file1.txt")
text2 = open("file2.txt")
# Difference variables
diffLineCount = diffCharCount = line_num = 0
# Iterate through both files line by line
for line1, line2 in zip(text1.readlines(), text2.readlines()):
if line1 == "\n" or line2 == "\n": continue # If newline, go to next line
if len(line1) != len(line2): # If lines are of different length
diffLineCount += 1
continue # Go to next line
for c1, c2 in zip(list(line1.strip()), list(line2.strip())): # Iterate through both lines character by character
if c1 != c2: # If they do not match
print("Unmatched characters: ", line_num, ":", c1)
diffCharCount += 1
line_num += 1
# Goes back to the beginning of each file
text1.seek(0)
text2.seek(0)
# Prints the stats
print("Number of characters in the first file: ", len(text1.read()))
print("number of characters in the second file: ", len(text2.read()))
print("Number of characters that do not match in lines of the same length: ", diffCharCount)
print("Number of lines that are not the same length: ", diffLineCount)
# Closes the files
text1.close()
text2.close()
I hope you understand how this works and are able to make it fit your needs specifically. Good luck!

Unlike the other solution I edited your code, so that you can understand what was going wrong. I agree with him that you should anyway organize better your code because it is complex
text1 = open("file1.txt")
text2 = open("file2.txt")
def CharByChar(count, diffCharCount, text1Count, text2Count, diffLineCount):
"""
This function compares two files character by character and prints the number of characters that are different
:param count: What line of the file the program is comparing
:param diffCharCount: The sum of different characters
:param text1Count: Sum of characters in file 1
:param text2Count: Sum of characters in file 2
:param diffLineCount: Sum of different lines
"""
# see comment below for strip removal
f1CurrentLine = text1.readline()
f2CurrentLine = text2.readline()
while f1CurrentLine != '' or f2CurrentLine != '':
count = count + 1
print(f1CurrentLine)
print(f2CurrentLine)
#if f1CurrentLine != '' or f2CurrentLine != '':
if len(f1CurrentLine) != len(f2CurrentLine): # If the line lengths are not equal return the line number
print("Lines are a different length. The line number is: ", count)
diffLineCount = diffLineCount + 1
count = count + 1
#text1Count = text1Count + len(f1CurrentLine)
#text2Count = text2Count + len(f2CurrentLine)
# return CharByChar(count)
elif len(f1CurrentLine) == len(f2CurrentLine): # If the lines lengths are equal
for char in range(len(f1CurrentLine)): # Compare line by line
print(char)
if f1CurrentLine[char] != f2CurrentLine[char]: # If the lines have different characters
print("Unmatched characters: ", count, ":", char)
diffCharCount = diffCharCount + 1 # add 1 to the difference counter
#count = count + 1
text1Count = text1Count + len(f1CurrentLine)
text2Count = text2Count + len(f2CurrentLine)
# return CharByChar(count=count, text2Count=text2Count, text1Count=text1Count,diffCharCount=diffCharCount) # return difference count
else:
print("Characters matched in line:", count, ". Moving to next char.")
#text1Count = text1Count + len(f1CurrentLine)
#text2Count = text2Count + len(f2CurrentLine)
#count = count + 1
#return CharByChar(count, diffCharCount=diffCharCount, text1Count=text1Count,text2Count=text2Count,diffLineCount=diffLineCount)
#elif len(f1CurrentLine) == 0 or len(f2CurrentLine) == 0:
#print(count, "lines are not matching")
#diffLineCount = diffLineCount + 1
#return CharByChar(diffLineCount=diffLineCount)
else:
print("Something else happened!")
f1CurrentLine = text1.readline()
f2CurrentLine = text2.readline()
print("Number of characters in the first file: ", text1Count)
print("number of characters in the second file: ", text2Count)
print("Number of characters that do not match in lines of the same length: ", diffCharCount)
print("Number of lines that are not the same length: ", diffLineCount)
def main():
"Calls the primary function"
CharByChar(count=0, diffCharCount=0, text1Count=0, text2Count=0, diffLineCount=0)
input("Hit enter to close the program...")
main() #Runs this bad boi
I think the general trouble is organizing your CharByChar() function to scan all the lines in the file [which is something we maintain in this solution] but then asking to call the same function at then end of every character check
some parts have no reasons to be there: for example you set count in the main when calling CharByChar() and then you create a branch with if(count == 0). You can cut this out, the code will look cleaner
some variables as well should be removed to keep the code as clean as possible: you never use text1Count and text2Count
you enter with a condition on the while and the next if has the same condition: if you entered the while you will enter also the if [or none of them] so you can cut one of them out
I suggest you to remove the branch with if len(f1CurrentLine) == 0 or len(f2CurrentLine) == 0 because both the files can have length 0 for the same line and then the lines would be equal [see the very next example below]
I suggest you to remove the strip() to avoid troubles to interrupt the check earlier for files where you have newlines in the middle, e.g.
1 Hello
3 foobar

how can I print lines of a file that specefied by a list of numbers Python?

I open a dictionary and pull specific lines the lines will be specified using a list and at the end i need to print a complete sentence in one line.
I want to open a dictionary that has a word in each line
then print a sentence in one line with a space between the words:
N = ['19','85','45','14']
file = open("DICTIONARY", "r")
my_sentence = #?????????
print my_sentence

If your DICTIONARY is not too big (i.e. can fit your memory):
N = [19,85,45,14]
with open("DICTIONARY", "r") as f:
words = f.readlines()
my_sentence = " ".join([words[i].strip() for i in N])
EDIT: A small clarification, the original post didn't use space to join the words, I've changed the code to include it. You can also use ",".join(...) if you need to separate the words by a comma, or any other separator you might need. Also, keep in mind that this code uses zero-based line index so the first line of your DICTIONARY would be 0, the second would be 1, etc.
UPDATE:: If your dictionary is too big for your memory, or you just want to consume as little memory as possible (if that's the case, why would you go for Python in the first place? ;)) you can only 'extract' the words you're interested in:
N = [19, 85, 45, 14]
words = {}
word_indexes = set(N)
counter = 0
with open("DICTIONARY", "r") as f:
for line in f:
if counter in word_indexes:
words[counter] = line.strip()
counter += 1
my_sentence = " ".join([words[i] for i in N])

you can use linecache.getline to get specific line numbers you want:
import linecache
sentence = []
for line_number in N:
word = linecache.getline('DICTIONARY',line_number)
sentence.append(word.strip('\n'))
sentence = " ".join(sentence)

Here's a simple one with more basic approach:
n = ['2','4','7','11']
file = open("DICTIONARY")
counter = 1 # 1 if you're gonna count lines in DICTIONARY
# from 1, else 0 is used
output = ""
for line in file:
line = line.rstrip() # rstrip() method to delete \n character,
# if not used, print ends with every
# word from a new line
if str(counter) in n:
output += line + " "
counter += 1
print output[:-1] # slicing is used for a white space deletion
# after last word in string (optional)

How to compare sentence character by character in python?

I want to write a code to count number of words in a given sentence by using character comparison and below is the code I have written as I am not allowed to use some fancy utilities like split(), etc. So, could you please guide me where am I making mistakes' I am a novice in python and currently trying to fiigure out how to do charactery by character comparison so as to find out simple counts of words, lines, strings withous using built in utitilites. So, kindly guide me about it.
Input Sentence : I am XYZ
Input_Sentence = raw_input("Enter your sentence: ")
print Input_Sentence
count = 0
i=0
while(Input_Sentence[i] != "\n"):
if(Input_Sentence[i] == ' '):
count=count+1
i+=1
else:
i+=1
print ('Number of Words in a given sentence is :' +str(count))

At first I wouldn't use a while loop in this context. Why not using a for loop?
for char in Input_sentence:
With this you iterate over every letter.
Then you can use the rest of you code and check:
if char == ' ':

# initialize the counter
word_count = 0
last_space_index = 0
# loop through each character in the sentence (assuming Input_Sentence is a string)
for i, x in enumerate(Input_Sentence): # enumerate to get the index of the character
# if a space is found (or newline character for end of sentence)
if x in (' ', '\n'):
word_count += 1 # increment the counter
last_space_index = i # set the index of the last space found
if len(Input_Sentence) > (last_space_index + 1): # check if we are at the end of the sentence (this is in case the word does not end with a newline character or a space)
word_count += 1
# print the total number of words
print 'Number of words:', word_count

The following will avoid errors if there's an space at the beginning or the end of the sentence.
Input_Sentence = raw_input("Enter your sentence: ")
print Input_Sentence
count = 0
sentence_length = len(Input_Sentence)
for i in range(sentence_length):
if Input_Sentence[i] == " ":
if i not in (0, sentence_length - 1):
count += 1
count += 1
print "There are %s words in the sentence \"%s\"." % (count, Input_Sentence)

You may use try-except syntax.
In your code you used while(Input_Sentence[i] != "\n") to find when the sentence comes to an end. If you just print the output at every step before i+ = 1 like this:
...
while(Input_Sentence[i] != "\n"):
...
print i,Input_Sentence[i]
i+=1
else:
print i,Input_Sentence[i],'*'
i+=1
...
you can see for yourself that the output is something like this:
Enter your sentence: Python is good
Python is good
0 P *
1 y *
2 t *
3 h *
4 o *
5 n *
6
7 i *
8 s *
9
10 g *
11 o *
12 o *
13 d *
Traceback (most recent call last):
File "prog8.py", line 19, in <module>
while(Input_Sentence[i] != "\n"):
IndexError: string index out of range
which means that the code that you have written works fine upto the length of the input sentence. After that when i is increased by 1 and it is demanded of the code to check if Input_Sentence[i] == "\n" it gives IndexError. This problem can be overcome by using exception handling tools of Python. Which leaves the option to neglect the block inside try if it is an exception and execute the block within except instead.
Input_Sentence = raw_input("Enter your sentence: ")
print Input_Sentence
count = 0
i=0
try:
while (Input_Sentence[i] != "\n"):
if (Input_Sentence[i] == ' '):
count=count+1
i+=1
else:
i+=1
except:
count = count+1
print ('Number of Words in a given sentence is :' +str(count))

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Split lines into words task - linear search method - python

Related

IndexError: string index out of range. While-Loop goes one round too much

How to group consecutive letters in a string in Python?

If-else statement not functioning properly in for loop

how can I print lines of a file that specefied by a list of numbers Python?

How to compare sentence character by character in python?

Categories

Resources