Trouble splitting text without using split() - python

Write a function splitText(text), where text is a string, that returns the list of words obtained by splitting the string text.
See example below:
sampleText = "As Python's creator, I'd like to say a few words about its origins."
splitText(sampleText)
['As', 'Python', 's', 'creator', 'I', 'd', 'like', 'to', 'say', 'a', 'few', 'words', 'about', 'its', 'origins']
You must NOT use the method split() from the str type; however, other methods from the class are allowed. You must not use a Python library such as string.py.
This is my code:
def split(text):
    final_lst = ""
    length = len(text)
    for x in range(length):
        if text[x].isalpha() == True:
            final_lst = final_lst + text[x]
        else:
            final_lst = final_lst + ", "
    final_len = len(final_lst)
    for a in range(final_len):
        if final_lst[:a] == " " or final_lst[:a] == "":
            final_lst = "'" + final_lst[a]
        if final_lst[a:] == " " or final_lst[a:] == ", ":
            final_lst = final_lst[a] + "'"
        elif final_lst[:a].isalpha() or final_lst[a:].isalpha():
            final_lst[a]
    print(final_lst)
split(sampleText)
When I run it I get this:
'A
I've tried lots of things to solve this, without success.

First of all, your function name is wrong. You have split(text) and the exercise specifically calls for splitText(text). If your class is graded automatically, for example by a program that just loads your code and tries to run splitText(), you'll fail.
Next, this would be a good time for you to learn that a string is an iterable object in Python. You don't have to use an index - just iterate through the characters directly.
for ch in text:
Next, as @Evert pointed out, you are trying to build a list, not a string. So use the correct Python syntax:
final_list = []
Next, let's think about how you can process one character at a time and get this done. When you see a character, you can determine whether it is, or is not, an alphabetic character. You need one more piece of information: what were you doing before?
If you are in a "word", and you get "more word", you can just append it.
If you are in a "word", and you get "not a word", you have reached the end of the word and should add it to your list.
If you are in "not a word", and you get "not a word", you can just ignore it.
If you are in "not a word", and you get "word", that's the start of a new word.
Now, how can you tell whether you are in a word or not? Simple. Keep a word variable.
def splitText(text):
    """Split text on any non-alphabetic character, return list of words."""
    final_list = []
    word = ''
    for ch in text:
        if word: # Empty string is false!
            if ch.isalpha():
                word += ch
            else:
                final_list.append(word)
                word = ''
        else:
            if ch.isalpha():
                word += ch
            else:
                # still not alpha.
                pass
    # Handle end-of-text with word still going
    if word:
        final_list.append(word)
    return final_list
sampleText = "As Python's creator, I'd like to say a few words about its origins."
print(splitText(sampleText))
Output is:
['As', 'Python', 's', 'creator', 'I', 'd', 'like', 'to', 'say', 'a', 'few', 'words', 'about', 'its', 'origins']
Next, if you sit and stare at it for a while, you'll realize that you can combine some of the cases. It boils down nicely: try turning it inside out by moving the outer if to the inside, and see what you get.
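One way the four cases can collapse, as a minimal sketch (this is just one possible simplification, not necessarily the form the exercise expects): test the character first, and only look at word when a non-letter arrives.

```python
def splitText(text):
    """Split text on any non-alphabetic character, return list of words."""
    final_list = []
    word = ''
    for ch in text:
        if ch.isalpha():
            # In a word or starting one: either way, extend it.
            word += ch
        elif word:
            # A non-letter just ended a word: store it and reset.
            final_list.append(word)
            word = ''
        # A non-letter while not in a word needs no action.
    # Handle end-of-text with a word still going.
    if word:
        final_list.append(word)
    return final_list


sampleText = "As Python's creator, I'd like to say a few words about its origins."
print(splitText(sampleText))
```

The "in a word, more word" and "not in a word, start of word" cases share one branch because appending to an empty string and appending to a partial word are the same operation.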

To me, it looks like you are overcomplicating things. Basically, all you need to do is go through the text character by character, combining the characters into words; when you reach a space, you end the current word and add it to the result list. After you run out of text, you just return the list. Note that this splits on spaces only, so punctuation stays attached to the words.
def splittext(text):
    result = []
    word = ""
    for i in text:
        if i != " ":
            word += i
        else:
            result.append(word)
            word = ""
    result.append(word)
    return result

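This should work:
sampleText = 'As Python\'s creator, I\'d like to say a few words about its origins.'
def split(text):
    result = []
    temp = ""
    length = len(text)
    for x in range(length):
        if text[x].isalpha():
            temp = temp + text[x]
        elif temp:
            # A non-letter ends the current word; skip empty words.
            result.append(temp)
            temp = ""
    if temp:
        # Keep a word that runs to the end of the text.
        result.append(temp)
    print(result)
split(sampleText)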

Can you cheat with regular expressions?
import re
sampleText = "As Python's creator, I'd like to say a few words about its origins."
result = re.findall(r'\w+', sampleText)
>>> result
['As', 'Python', 's', 'creator', 'I', 'd', 'like', 'to', 'say', 'a', 'few', 'words', 'about', 'its', 'origins']
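Note that \w also matches digits and underscores, so it agrees with the isalpha() approach only for text like the sample. A hedged sketch that restricts the match to ASCII letters (the name letters_only is made up for illustration):

```python
import re

sampleText = "As Python's creator, I'd like to say a few words about its origins."

# [A-Za-z]+ matches runs of ASCII letters only; \w+ would also keep digits and "_".
letters_only = re.findall(r'[A-Za-z]+', sampleText)
print(letters_only)

# The difference shows up on text that contains digits or underscores.
print(re.findall(r'\w+', "py_3 rocks"))        # ['py_3', 'rocks']
print(re.findall(r'[A-Za-z]+', "py_3 rocks"))  # ['py', 'rocks']
```

Unlike isalpha(), the [A-Za-z] class ignores accented and other non-ASCII letters, so which one fits depends on the input you expect.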

def stringSplitter(string):
    words = []
    current_word = ""
    for x in range(len(string)):
        if string[x] == " ":
            words.append(current_word)
            current_word = ""
        else:
            current_word += string[x]
    # Add the last word, which has no space after it.
    words.append(current_word)
    return words

Related

Replacing character in string doesn't do anything

I have a list like this,
['Therefore', 'allowance' ,'(#)', 't(o)o', 'perfectly', 'gentleman', '(##)' ,'su(p)posing', 'man', 'his', 'now']
Expected output:
['Therefore', 'allowance' ,'(#)', 'too', 'perfectly', 'gentleman', '(##)' ,'supposing', 'man', 'his', 'now']
Removing the brackets is easy by using .replace(), but I don't want to remove the brackets from strings (#) and (##).
my code:
ch = "()"
for w in li:
    if w in ["(#)", "(##)"]:
        print(w)
    else:
        for c in ch:
            w.replace(c, "")
        print(w)
but this doesn't remove the brackets from the words.
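The loop has no visible effect because Python strings are immutable: w.replace(c, "") returns a new string, and here that result is thrown away. A minimal sketch of the fix that keeps the question's names (the list name cleaned is hypothetical):

```python
li = ['Therefore', 'allowance', '(#)', 't(o)o', 'perfectly', 'gentleman',
      '(##)', 'su(p)posing', 'man', 'his', 'now']

ch = "()"
cleaned = []  # hypothetical name for the result list
for w in li:
    if w not in ["(#)", "(##)"]:
        for c in ch:
            # str.replace returns a new string; rebind w to keep the result.
            w = w.replace(c, "")
    cleaned.append(w)

print(cleaned)
```

Rebinding w only changes the loop variable, not li itself, which is why the cleaned words are collected in a separate list.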
You can use re.sub. In particular, note that it can take a function as repl parameter. The function takes a match object, and returns the desired replacement based on the information the match object has (e.g., m.group(1)).
import re
lst = ['Therefore', 'allowance', '(#)', 't(o)o', 'perfectly', 'gentleman', '(##)', 'su(p)posing', 'man', 'his', 'now']
def remove_paren(m):
    return m.group(0) if m.group(1) in ('#', '##') else m.group(1)
output = [re.sub(r"\((.*?)\)", remove_paren, word) for word in lst]
print(output) # ['Therefore', 'allowance', '(#)', 'too', 'perfectly', 'gentleman', '(##)', 'supposing', 'man', 'his', 'now']
def removeparanthesis(s):
    a=''
    for i in s:
        if i not in '()':
            a+=i
    return a
a = ['Therefore', 'allowance' , '(#)' , 't(o)o' , 'perfectly' , 'gentleman' , '(##)' , 'su(p)posing', 'man', 'his', 'now']
b=[]
for i in a:
    if i == '(#)' or i == '(##)':
        b.append(i)
    else:
        b.append(removeparanthesis(i))
print(b)
# I just created a function to remove the parentheses, applied to every item except (#) and (##).
Give this a try!
Here, I define another empty list and loop over the original list, appending each word to it after removing the characters we don't need.
As you can see, there are two loops. In the inner one, we go through each character; whenever we encounter a ( or ), we skip it and continue building the new word.
To keep (#) and (##), we skip the inner loop for them, but we must not forget to append them to the new list.
li = ["Therefore", "allowance", "(#)", "t(o)o" , "perfectly", "gentleman", "(##)", "su(p)posing", "man", "his", "now"]
new_li = []
for index, w in enumerate(li):
    if w in ["(#)", "(##)"]:
        new_li.append(w)
        continue
    new_word = ""
    for c in w:
        if c == "(" or c == ")":
            continue
        new_word = new_word + c
    new_li.append(new_word)
print(new_li)

Return a list of words that contain a letter

I wanna return a list of words containing a letter disregarding its case.
Say I have sentence = "Anyone who has never made a mistake has never tried anything new"; then f(sentence, 'a') would return
['Anyone', 'has', 'made', 'a', 'mistake', 'has', 'anything']
This is what i have
import re
def f(string, match):
    string_list = string.split()
    match_list = []
    for word in string_list:
        if match in word:
            match_list.append(word)
    return match_list
You don't need re. Use str.casefold:
[w for w in sentence.split() if "a" in w.casefold()]
Output:
['Anyone', 'has', 'made', 'a', 'mistake', 'has', 'anything']
You can use string splitting for it, if there is no punctuation.
match_list = [s for s in sentence.split(' ') if 'a' in s.lower()]
Here's another variation :
sentence = 'Anyone who has never made a mistake has never tried anything new'
def f(string, match):
    match_list = []
    for word in string.split():
        if match in word.lower():
            match_list.append(word)
    return match_list
print(f(sentence, 'a'))
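The versions above lowercase only the word, so an uppercase search letter such as 'A' would match nothing. A small sketch that normalizes both sides with str.casefold (the function name words_containing is hypothetical):

```python
def words_containing(sentence, letter):
    """Return the words of sentence that contain letter, ignoring case."""
    needle = letter.casefold()
    return [word for word in sentence.split() if needle in word.casefold()]


sentence = "Anyone who has never made a mistake has never tried anything new"
print(words_containing(sentence, "A"))
```

The original words are returned unchanged; casefolding is applied only for the comparison.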

How can you output a list in a more English form (like "a, b and c")?

I created a list but, when printing, I need to add the 'and' right before the last item in the list. Example:
mylist = ['me', 'you', 'him', 'her']
When I print out the list I want it to look like:
me, you, him and her.
I don't want the ', [ or ] to show.
I'm currently using:
mylist = ['me', 'you', 'him', 'her']
print(','.join(mylist))
but the output is me,you,him,her. I need it to show me, you, him and her.
Using str.join twice with rsplit:
mylist = ['me', 'you', 'him', 'her']
new_str = ' and '.join(', '.join(mylist).rsplit(', ', 1))
print(new_str)
Output:
me, you, him and her
This also works with an empty or single-element list:
new_str = ' and '.join(', '.join([]).rsplit(', ', 1))
print(new_str)
# (prints an empty string)
new_str = ' and '.join(', '.join(['me']).rsplit(', ', 1))
print(new_str)
# me
I'm a huge fan of explicitness, so I might write this like:
def human_list(items):
    # Empty list? Empty string.
    if not items:
        return ''
    # One-item list? Return that item.
    if len(items) == 1:
        return items[0]
    # For everything else, join all items *before* the last one with commas,
    # then add ' and {last_item}' to the end.
    return ', '.join(items[:-1]) + ' and ' + items[-1]
# Demonstrate that this works the way we want
assert human_list([]) == ''
assert human_list(['spam']) == 'spam'
assert human_list(['spam', 'eggs']) == 'spam and eggs'
assert human_list(['one', 'two', 'three']) == 'one, two and three'
assert human_list(['knife', 'fork', 'bottle', 'a cork']) == 'knife, fork, bottle and a cork'
You can do something like this:
mylist = ['me', 'you', 'him', 'her']
length = len(mylist)
for i,j in enumerate(mylist):
    if i == length-2:
        print(j,'and ',end='')
    elif i == length-1:
        print(j,end="")
    else:
        print(j,end=', ')
The below is a simple method if you don't want to go with slicing etc. It will allow you to reuse the functionality implemented (function calling) and also you can easily change the logic inside.
Note: If list is empty, a blank string will be returned
def get_string(l):
    s = ""
    index = 0
    length = len(l)
    while index < length:
        word = l[index]
        if index == length - 1:
            s += 'and ' + word
        else:
            s += word + ", "
        index += 1
    return s.strip()
# Try
mylist = ['me', 'you', 'him', 'her']
print(get_string(mylist)) # me, you, him, and her
A helper function is probably a good way to go since it centralises control at one point, meaning you can fix bugs or make improvements easily (such as handling edge cases like empty lists). It also makes the main code easier to read since it simply contains something like readableList(myList).
The following function is all you need:
def readableList(pList):
    if len(pList) == 0: return ""
    if len(pList) == 1: return pList[0]
    return ", ".join(pList[:-1]) + ' and ' + pList[-1]
For a test harness, you can use something like:
for myList in [['me', 'you', 'him', 'her'], ['one', 'two'], ['one'], []]:
    print("{} -> '{}'".format(myList, readableList(myList)))
which gives the output:
['me', 'you', 'him', 'her'] -> 'me, you, him and her'
['one', 'two'] -> 'one and two'
['one'] -> 'one'
[] -> ''
Note that those quotes to the right of -> are added by my test harness just so you can see what the string is (no trailing spaces, showing empty strings, etc). As per your requirements, they do not come from the readableList function itself.
To add an element before the last element you can do this
last_element = mylist.pop()
mylist.append(' and ')
mylist.append(last_element)
my_string = ', '.join(mylist[:-2]) + mylist[-2] + mylist[-1]
print(my_string)
or
mylist.insert(-1, ' and ')
my_string = ', '.join(mylist[:-2]) + mylist[-2] + mylist[-1]
print(my_string)
But a better answer as given by LoMaPh in the comments is:
', '.join(mylist[:-1]) + ' and ' + mylist[-1]

why my code does not decode the encrypted string based on the dictionary?

I have a dictionary with keys and values that represent letters.
for example a simple one :
DICT_CODE = {'b' : 'g', 'n' :'a', 'p' : 'o', 'x' : 'd', 't' : 'y'}
I've received an encrypted code and turned the string into a list, where each item is a word. I need to solve it, according to the items in the dictionary.
an example for a code is :
words_list = ["bppx","xnt!"] # "good day!"
I've tried to solve it by using double for loops, as here:
for word in words_list:
    for char in word:
        if char in string.letters:
            word = word.replace(char, DICT_CODE.get(char))
print words_list
expected output -> ["good","day!"]
output -> ["bppx","xnt!"]
It does not work at all: the characters stay the same and the code is still undecoded.
I don't understand why it isn't working, if someone has time to look and try to help me and see whats wrong, or even suggest a better way (that works).
Changing the word variable inside the for loop does not change the string inside words_list. You would need to remember the index and update the element at that index (and get the word from the index) -
for i, word in enumerate(words_list):
    for char in word:
        if char in string.letters:
            words_list[i] = words_list[i].replace(char, DICT_CODE.get(char))
Demo -
>>> import string
>>> words_list = ["bppx","xnt!"]
>>> DICT_CODE = {'b' : 'g', 'n' :'a', 'p' : 'o', 'x' : 'd', 't' : 'y'}
>>> for i, word in enumerate(words_list):
...     for char in word:
...         if char in string.letters:
...             words_list[i] = words_list[i].replace(char, DICT_CODE.get(char))
...
>>> words_list
['good', 'day!']
But an easier way for you would be to use str.translate (along with string.maketrans). Example -
table = string.maketrans('bnpxt','gaody') # First argument: characters in your original string; second argument: what they map to.
for i, word in enumerate(words_list):
    words_list[i] = word.translate(table)
Demo -
>>> import string
>>> table = string.maketrans('bnpxt','gaody') #This creates the translation table
>>> words_list = ["bppx","xnt!"]
>>> for i, word in enumerate(words_list):
...     words_list[i] = word.translate(table)
...
>>> print words_list
['good', 'day!']
The same using a list comprehension -
words_list[:] = [word.translate(table) for word in words_list]
Demo -
>>> words_list = ["bppx","xnt!"]
>>> table = string.maketrans('bnpxt','gaody')
>>> words_list[:] = [word.translate(table) for word in words_list]
>>> words_list
['good', 'day!']
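The snippets above are Python 2: string.letters and string.maketrans no longer exist in Python 3. A minimal Python 3 sketch, assuming the same DICT_CODE from the question (the name decoded is made up here), uses str.maketrans, which accepts a dict whose keys are single characters:

```python
DICT_CODE = {'b': 'g', 'n': 'a', 'p': 'o', 'x': 'd', 't': 'y'}
words_list = ["bppx", "xnt!"]

# str.maketrans accepts a dict whose keys are single characters.
table = str.maketrans(DICT_CODE)

# translate maps every character in one pass; unmapped ones like "!" are kept.
decoded = [word.translate(table) for word in words_list]
print(decoded)  # ['good', 'day!']
```

Because translate looks at each original character exactly once, it also avoids a pitfall of chained replace calls, where an already decoded letter can be replaced a second time if it happens to be a key in the dictionary as well.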
Your problem is that you don't actually modify the original list.
for i, word in enumerate(words_list):
    for char in word:
        if char in string.letters:
            word = word.replace(char, DICT_CODE.get(char))
    words_list[i] = word
print words_list
['good', 'day!']
As mentioned in the comments by @marmeladze, print words_list will print the words_list which you declared above.
What you want, is something like this:
DICT_CODE = {'b' : 'g', 'n' :'a', 'p' : 'o', 'x' : 'd', 't' : 'y', '!': '!'}
words_list = ["bppx","xnt!"]
decoded_list = []
for word in words_list:
    for char in word:
        word = word.replace(char, DICT_CODE.get(char))
    decoded_list.append(word)
print decoded_list
Output
['good', 'day!']
Hope this helps.

Python: Appending to a list

I'm working on a definition tester (you enter words, their parts of speech, and synonyms for each, and it tests you on them). The problem I have is with the part that gets the word:
    def get_word(): # this is in another function, that's why it is indented
        import easygui as eg
        word_info = eg.multenterbox(msg = 'Enter in the following information about each word.'
                                    , title = 'Definition Tester'
                                    , fields = ['Word: ', 'Part of Speech: ', 'Synonyms (separated by spaces): ']
                                    , values = []
                                    )
        return word_info
    for i in range(n):
        my_list = get_word()
        print my_list # for testing
        word, pOS, synonyms = my_list[0], my_list[1], my_list[2]
        word = word.capitalize()
        synonyms = synonyms.split(', ')
        words_list += word
        print word # for testing
        test_dict[word] = [pOS, synonyms]
    print words_list # for testing
However, words_list ends up being the word(s) after the list(word) function is applied to them--- I'm not sure why.
For example: if the only word was 'word', words_list turns out to be ['w', 'o', 'r', 'd']. If there were two words ('dog', 'cat'), words_list turns out to be ['d', 'o', 'g', 'c', 'a', 't'].
Here is my input (into get_word()): Word: 'myword', Part of Speech: 'n', Synonyms: 'synonym, definition'.
This is the output I get:
['myword', 'n', 'synonym, definition']
Myword
['M', 'y', 'w', 'o', 'r', 'd'] # Why does this happen?
This is the only thing wrong with my program... If I could get some input on how to fix this and what is wrong, it would be much appreciated. Thanks!
It's because of this line:
words_list += word
+= on a list extends it with every element of another iterable. Python strings are iterables of characters, so you are adding each character to the list as its own element.
You want this:
words_list.append(word)
which is for adding a single element to the end.
After messing around with it, I figured out the problem myself, so I thought I should put it here for anyone who has something similar:
Instead of doing words_list += word, it should be: words_list.append(word).
Or, which is what I did, you can do: words_list += [word]. Now [word] is a one-element list, so += adds the whole word to the list as a single element.
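For anyone who wants to see the difference side by side, here is a short sketch (the list names chars and words are only for illustration):

```python
chars = []
chars += "dog"        # extends the list with each character of the string
print(chars)          # ['d', 'o', 'g']

words = []
words.append("dog")   # adds the string as one element
words += ["cat"]      # extends with the single element of a one-item list
print(words)          # ['dog', 'cat']
```

In other words, += on a list behaves like list.extend, so whatever is on the right-hand side gets iterated over.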
