unusable array when passing a txt file into array - python

So I need to read all the elements (they are all strings) from a txt file into a list to use further. I get this kind of output:
['mzm\n', 'vur\n', 'bmc\n', 'irl\n']
but then I get:
KeyError: '\n' because of these '\n' characters.
Is it possible to read all the strings into a list so that the output is ['mzm', 'vur', 'bmc', 'irl']?
This is for my radix sort algorithm.
import re


def main():
    with open('Array.txt') as my_file:
        words = my_file.readlines()
    # check_max_word_size, set_same_size and radix_sort are my own helpers (not shown)
    max_size = check_max_word_size(words)
    new_list = set_same_size(words, max_size)
    new_list = radix_sort(new_list, max_size-1, 0)
    # Remove the dots previously added to the words
    index = 0
    for word in new_list:
        new_list[index] = re.sub('[.]', '', word)
        index += 1
    # Print the final ordered list, all lower case
    print(new_list)


if __name__ == '__main__':
    main()

You can strip off the trailing newlines in word like this:
new_list[index]= re.sub('[.]', '', word.rstrip())

The characters '\n' come from the file, as readlines() keeps them.
You can remove the characters '\n' like this:
words = [w.strip('\n') for w in words]
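A minimal sketch of another option, assuming the file holds one word per line: str.splitlines() drops the line endings for you, so nothing has to be stripped afterwards. The helper name read_words and the demo file contents are illustrative only.

```python
import os
import tempfile


def read_words(path):
    """Return the lines of a text file without their trailing newlines."""
    with open(path) as my_file:
        # splitlines() splits on line boundaries and discards the '\n' characters
        return my_file.read().splitlines()


# Demo with a throwaway file standing in for Array.txt (hypothetical contents)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("mzm\nvur\nbmc\nirl\n")

words = read_words(tmp.name)
os.remove(tmp.name)
print(words)  # ['mzm', 'vur', 'bmc', 'irl']
```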

Related

Ignore words with '

I need to count all unique five-letter words in a txt file and ignore any word with an apostrophe ('). I'm new to Python, so I'm quite confused about how to get just the five-letter words, and I'm not sure how to ignore words that contain an apostrophe.
What I wrote so far seems to work for filtering the unique words, but not for keeping just the five-letter words.
with open("names.txt", 'r') as f:  # open the file
    words = f.read().lower().split()  # read the contents into a string, lowercase it and split it into a list of words
print(words)
unique_words = set(words)  # get unique words
print(len(unique_words))
for w in unique_words:
    if len(w) == 5:
        print(unique_words)
    else:
        pass
Your code looks good. I think the only bit you did wrong was to print(unique_words) instead of print(w) when you found a word w of length 5.
To ignore the words containing ' you can add this condition:
for w in unique_words:
    if len(w) == 5 and "'" not in w:
        print(w)
BTW, you don't need the else: pass branch at all; the loop body can simply end after the if.
This should do the trick
with open("names.txt", 'r') as f:  # open the file
    words = f.read().lower().split()  # read the contents into a string, lowercase it and split it into a list of words
print(words)
unique_words = set()  # create an empty set
for w in words:
    if len(w) == 5 and "'" not in w:
        unique_words.add(w)  # add words to the set
print(len(unique_words))
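A sketch that wraps the same fix in a function so it can be tried without a file. The name five_letter_words is hypothetical, and it assumes words are separated by whitespace: punctuation such as commas stays attached to a word, exactly as with split() in the code above.

```python
def five_letter_words(text):
    """Return the unique five-letter words in text, skipping any word with an apostrophe."""
    words = text.lower().split()  # same normalisation as the question's code
    return {w for w in words if len(w) == 5 and "'" not in w}


sample = "Apple apple can't Mango shan't grape kiwi"
result = five_letter_words(sample)
print(sorted(result))  # ['apple', 'grape', 'mango']
print(len(result))     # 3
```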

String Index Out of Range Issue - Python

I am trying to make a lossy text compression program that removes all vowels from the input, except when the vowel is the first letter of a word. I keep getting a "string index out of range" error on line 6. Please help!
text = str(input('Message: '))
text = (' ' + text)
for i in range(0, len(text)):
    i = i + 1
    if str(text[i-1]) != ' ':  #LINE 6
        text = text.replace('a', '')
        text = text.replace('e', '')
        text = text.replace('i', '')
        text = text.replace('o', '')
        text = text.replace('u', '')
print(text)
As busybear notes, the loop isn't necessary: your replacements don't depend on i.
Here's how I'd do it:
def strip_vowels(s):  # Remove all vowels from a string
    for v in 'aeiou':
        s = s.replace(v, '')
    return s


def compress_word(s):
    if not s: return ''  # Needed to avoid an out-of-range error on the empty string
    return s[0] + strip_vowels(s[1:])  # Strip vowels from all but the first letter


def compress_text(s):  # Apply to each word
    words = s.split(' ')
    new_words = [compress_word(w) for w in words]
    return ' '.join(new_words)
When you replace letters with an empty string, your text gets shorter, so indices up to the original len(text) go out of range as soon as any letter has been removed. Note, however, that replace() replaces all occurrences in the string, so a loop isn't even necessary.
If you do want to use the loop, an alternative is to just record the indices of the letters to remove while going through the loop, then remove them after the loop is complete.
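The index-tracking idea could look like the sketch below. It reuses the question's trick of prepending a space, assumes lowercase ASCII vowels only, and treats a vowel as word-initial when the character before it is a space; the name compress is hypothetical.

```python
def compress(message):
    """Drop vowels except when a vowel is the first letter of a word."""
    text = ' ' + message  # leading space so the first word also follows a space
    to_remove = set()
    # First pass: only record positions; the string is never modified inside the loop
    for i in range(1, len(text)):
        if text[i] in 'aeiou' and text[i - 1] != ' ':
            to_remove.add(i)
    # Second pass: rebuild the string, skipping the recorded positions
    kept = [ch for i, ch in enumerate(text) if i not in to_remove]
    return ''.join(kept)[1:]  # drop the helper space again


print(compress('hello apple orange'))  # hll appl orng
```

Because the loop bound is computed from a string that is never shortened, no index can go out of range.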
Shortening your string by replacing any character with "" means that once you remove a character, the len(text) used in your iterator is longer than the actual string. There are plenty of alternative solutions. For example:
text_list = list(text)
for i in range(1, len(text_list)):
    # remove a vowel unless it starts a word (i.e. it follows a space)
    if text_list[i] in "aeiou" and text_list[i - 1] != " ":
        text_list[i] = ""
text = "".join(text_list)
By turning your string into a list of its component characters, you can remove characters while keeping the list length fixed (since empty elements are allowed), then rejoin them.
Be sure to account for special cases, such as len(text)<2.

Sorting words in a text file (with parameters) and writing them into a new file with Python

I have a file.txt with thousands of words, and I need to create new files from it based on certain parameters, sorting the words in a certain way.
Assuming the user imports the proper libraries when they test, what is wrong with my code? (There are 3 separate functions.)
For the first, I must take the words containing certain letters, sort them lexicographically, and write them into a new file list.txt.
def getSortedContain(s,ifile,ofile):
    toWrite = ""
    toWrites = ""
    for line in ifile:
        word = line[:-1]
        if s in word:
            toWrite += word + "\n"
    newList = []
    newList.append(toWrite)
    newList.sort()
    for h in newList:
        toWrites += h
    ofile.write(toWrites[:-1])
The second is similar, but it must keep the words that do NOT contain the input string and sort them in reverse lexicographic order.
def getReverseSortedNotContain(s,ifile,ofile):
    toWrite = ""
    toWrites = ""
    for line in ifile:
        word = line[:-1]
        if s not in word:
            toWrite += word + "\n"
    newList = []
    newList.append(toWrite)
    newList.sort()
    newList.reverse()
    for h in newList:
        toWrites += h
    ofile.write(toWrites[:-1])
For the third, I must keep the words with a certain number of characters and sort them lexicographically starting from the last character of each word.
def getRhymeSortedCount(n, ifile, ofile):
    toWrite = ""
    for line in ifile:
        word = line[:-1]  #gets rid of \n
        if len(word) == n:
            toWrite += word + "\n"
    reversetoWrite = toWrite[::-1]
    newList = []
    newList.append(toWrite)
    newList.sort()
    newList.reverse()
    for h in newList:
        toWrites += h
    reversetoWrite = toWrites[::-1]
    ofile.write(reversetoWrites[:-1])
Could someone please point me in the right direction for these? Right now they are not sorting as they're supposed to.
There is a lot of stuff that is unclear here so I'll try my best to clean this up.
You're concatenating the words into one big string and then appending that one big string to a list, so the sort runs on a 1-element list and does nothing. Instead, put each word into the list as its own element and then sort that list.
I.e., for your first function, do the following:
def getSortedContain(s, ifile, ofile):
    words = [line.rstrip('\n') for line in ifile]  # one word per line, newline removed
    words = [word for word in words if s in word]
    words.sort()
    ofile.write("\n".join(words))
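The same approach covers the other two functions. This sketch assumes each line of ifile holds one word, and it reads the "rhyme" ordering as sorting on the reversed word (last character first, then the second-to-last, and so on); strip_lines is a hypothetical helper, and io.StringIO stands in for real files in the demo.

```python
import io


def strip_lines(ifile):
    """Return the words from a file object, one per line, without newlines."""
    return [line.rstrip('\n') for line in ifile]


def getSortedContain(s, ifile, ofile):
    words = sorted(w for w in strip_lines(ifile) if s in w)
    ofile.write("\n".join(words))


def getReverseSortedNotContain(s, ifile, ofile):
    # reverse=True gives reverse lexicographic order
    words = sorted((w for w in strip_lines(ifile) if s not in w), reverse=True)
    ofile.write("\n".join(words))


def getRhymeSortedCount(n, ifile, ofile):
    # Keep words of length n, ordered by their reversed spelling ("rhyme" order)
    words = sorted((w for w in strip_lines(ifile) if len(w) == n),
                   key=lambda w: w[::-1])
    ofile.write("\n".join(words))


source = "cat\nbat\ndog\nbird\ncow\n"
out = io.StringIO()
getRhymeSortedCount(3, io.StringIO(source), out)
print(out.getvalue().split("\n"))  # ['dog', 'bat', 'cat', 'cow']
```

Using sorted() with reverse= and key= avoids building a big string first, which is what kept the original versions from sorting anything.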

Sorting an ofile in lexicographical order

I made a function that takes the words containing a string 's' from a file 'ifile' and writes them to 'ofile'.
It works perfectly. I just need help sorting the words that are now in ofile in lexicographical order.
Here's my code:
def getListContain(s,ifile,file):
    newline= ''
    for word in ifile:
        if word.find(s) != -1:
            ofile.write(newline + word.strip())
            newline = '\n'
To sort a list of strings in lexicographical order you simply do:
listOfStrings.sort()
So in your case it would be:
def getListContain(s, ifile, ofile):
    listOfStrings = []
    for word in ifile:
        if word.find(s) != -1:
            listOfStrings.append(word.strip() + '\n')
    listOfStrings.sort()
    for item in listOfStrings:
        ofile.write(item)  # Your signature named this parameter 'file' but the body used 'ofile', so I renamed the parameter.
1- Append your words to a list.
2- Sort the list.
3- Write the list to the file.
word_list = []
for word in ifile:
    if word.find(s) != -1:
        word_list.append(word.strip())
word_list = sorted(word_list)
ofile.write("\n".join(word_list))
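Both answers rely on Python's default string comparison, which orders strings by Unicode code point, so every uppercase ASCII letter sorts before every lowercase one. If the word file mixes cases, a key function such as str.lower gives dictionary-style order; whether that is what's wanted here is an assumption, and the sample words are made up.

```python
words = ["banana", "apple", "Cherry"]

# Default: compares code points, and 'C' (67) < 'a' (97)
print(sorted(words))                 # ['Cherry', 'apple', 'banana']

# Case-insensitive: compare lowercased copies, but keep the original strings
print(sorted(words, key=str.lower))  # ['apple', 'banana', 'Cherry']
```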

count suffixes appearing in the word file

I have a Python program that reads through a word-list file and uses the endswith() method to check each word for the suffixes given in another file.
The suffixes to check for are saved in the list suffixList[].
The counts are kept in suffixCount[].
The following is my code:
fd = open(filename, 'r')
print 'Suffixes: '
x = len(suffixList)
for line in fd:
    for wordp in range(0,x):
        if word.endswith(suffixList[wordp]):
            suffixCount[wordp] = suffixCount[wordp]+1
for output in range(0,x):
    print "%-6s %10i"%(prefixList[output], prefixCount[output])
fd.close()
The output is this :
Suffixes:
able 0
ible 0
ation 0
The program never gets inside this condition:
if word.endswith(suffixList[wordp]):
You need to strip the newline:
word = ln.rstrip().lower()
The words come from a file, so each line ends with a newline character. You are then using endswith, which always fails because none of your suffixes ends with a newline.
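A small sketch of that point, plus the question's counting loop with the newline removed. The words and suffixes are made-up sample data and count_suffixes is a hypothetical helper; like the answer's rstrip().lower() line, it lowercases each word before testing.

```python
def count_suffixes(lines, suffixes):
    """Count, for each suffix, how many lines end with it (after stripping the newline)."""
    counts = [0] * len(suffixes)
    for line in lines:
        word = line.rstrip().lower()  # remove the trailing '\n' before testing
        for i, suffix in enumerate(suffixes):
            if word.endswith(suffix):
                counts[i] += 1
    return counts


print("capable\n".endswith("able"))           # False: the newline is the last character
print("capable\n".rstrip().endswith("able"))  # True

lines = ["capable\n", "visible\n", "Nation\n", "table\n"]
suffixes = ["able", "ible", "ation"]
for suffix, count in zip(suffixes, count_suffixes(lines, suffixes)):
    print("%-6s %10i" % (suffix, count))
```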
I would also change the function to return the values you want:
def store_roots(start, end):
    with open("rootsPrefixesSuffixes.txt") as fs:
        lst = [line.split()[0] for line in map(str.strip, fs)
               if '#' not in line and line]
    return lst, dict.fromkeys(lst[start:end], 0)


lst, sfx_dict = store_roots(22, 30)  # List, SuffixList
Then slice from the end and see if the substring is in the dict:
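with open('longWordList.txt') as fd:
    print('Suffixes: ')
    mx = len(max(sfx_dict, key=len))  # length of the longest suffix
    mn = len(min(sfx_dict, key=len))  # length of the shortest suffix
    for ln in map(str.rstrip, fd):
        # one slice and one dict lookup per possible suffix length
        for i in range(mx, mn - 1, -1):
            suf = ln[-i:]
            # skip words shorter than i, otherwise the whole word would be checked repeatedly
            if len(suf) == i and suf in sfx_dict:
                sfx_dict[suf] += 1
for k, v in sfx_dict.items():
    print("Suffix = {} Count = {}".format(k, v))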
Slicing the end of the string should be faster than calling endswith for every suffix, especially if you have numerous suffixes of the same length. It does at most mx - mn + 1 slices and lookups per word, so if you had 20 four-character suffixes you would only need to check the dict once: a word has only one ending of length n, so a single slice and lookup tests all n-length suffixes at the same time.
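A self-contained sketch of that slicing idea, with the dict built from an in-memory suffix list instead of rootsPrefixesSuffixes.txt (the sample suffixes and words are made up, and count_suffixes_by_slicing is a hypothetical name). It loops over the distinct suffix lengths actually present rather than every length between the shortest and longest, using one slice and one dict lookup per length.

```python
def count_suffixes_by_slicing(words, suffixes):
    counts = dict.fromkeys(suffixes, 0)
    # Only the distinct lengths matter: a word has exactly one ending of each length
    lengths = sorted({len(s) for s in counts}, reverse=True)
    for word in words:
        word = word.rstrip()
        for n in lengths:
            if len(word) < n:
                continue  # word too short to carry a suffix of this length
            tail = word[-n:]
            if tail in counts:
                counts[tail] += 1
    return counts


suffixes = ["able", "ible", "ment", "ation", "tion"]
words = ["capable\n", "visible\n", "nation\n", "payment\n", "ment\n"]
result = count_suffixes_by_slicing(words, suffixes)
print(result)
```

Like endswith, this counts overlapping suffixes separately, so "nation" adds to both "ation" and "tion".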
You could use a Counter to count the occurrences of each suffix:
from collections import Counter

with open("rootsPrefixesSuffixes.txt") as fp:
    List = [line.strip() for line in fp if line.strip() and '#' not in line]
suffixes = List[22:30]  # ?
with open('longWordList.txt') as fp:
    c = Counter(s for word in fp for s in suffixes if word.rstrip().lower().endswith(s))
print(c)
Note: add .split()[0] (i.e. line.split()[0]) if a line contains more than one word and you only want the first; otherwise it is unnecessary.
