Recreating a sentence from a list of indexed words - python

I am having a little trouble figuring out how to recreate the sentence:
"Believe in the me, that believes in you!"
(a little cringe for those who have watched Gurren Lagann...) while using the indexes that I obtained by enumerating the list of words:
['believe', 'in', 'the', 'me', 'that', 'believes', 'you']
This list was built with .split() and .lower(), with punctuation removed, in a previous bit of code I made to create the words list file and the index list file.
When indexed, these are the words in their enumerated form:
(1, 'believe')
(2, 'in')
(3, 'the')
(4, 'me')
(5, 'that')
(6, 'believes')
(7, 'you')
That is all I have so far; I have been searching for a solution, but none have worked for my code. Here is the whole thing so far:
with open("Words list file 2.txt", 'r') as File:
    contain = File.read()

contain = contain.replace("[", "")
contain = contain.replace("]", "")
contain = contain.replace(",", "")
contain = contain.replace("'", "")
contain = contain.split()

print("The original file reads:")  # prints to tell the user the original file
print(contain)
for i in enumerate(contain, start=1):
    print(i)

You may join the strings in the list using the join method:
>>> my_list = ['believe', 'in', 'the', 'me', 'that', 'believes', 'you']
>>> ' '.join(my_list)
'believe in the me that believes you'
# ^ missing "in"
But this will result in a string with the "in" missing after "believes". If you want to make a new string based on the index of words in the previous list, you may use a temporary list to store the indices (note these are 0-based, unlike your enumerate output, which started at 1) and then join a generator expression:
>>> temp_list = [0, 1, 2, 3, 4, 5, 1, 6]
>>> ' '.join(my_list[i] for i in temp_list)
'believe in the me that believes in you'

' '.join(['believe', 'in', 'the', 'me', 'that', 'believes', 'you'])
Not sure what the original file contains or what the contain variable holds when loaded from the file. Please show that.
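If the goal is to rebuild the sentence from the 1-based enumerate output shown in the question, a minimal sketch could look like the following. The index sequence here is an assumption, since the question's index list file is not shown:

```python
# Hypothetical sketch: the index sequence is assumed, since the question's
# index list file isn't shown. Indices are 1-based to match the
# enumerate(contain, start=1) output, so we subtract 1 when indexing.
words = ['believe', 'in', 'the', 'me', 'that', 'believes', 'you']
index_sequence = [1, 2, 3, 4, 5, 6, 2, 7]
sentence = ' '.join(words[i - 1] for i in index_sequence)
print(sentence)  # believe in the me that believes in you
```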


Stemmer function that takes a string and returns the stems of each word in a list

I am trying to create this function, which takes a string as input and returns a list containing the stem of each word in the string. The problem is that, using a nested for loop, the words in the string are appended multiple times to the list. Is there a way to avoid this?
def stemmer(text):
    stemmed_string = []
    res = text.split()
    suffixes = ('ed', 'ly', 'ing')
    for word in res:
        for i in range(len(suffixes)):
            if word.endswith(suffixes[i]):
                stemmed_string.append(word[:-len(suffixes[i])])
            elif len(word) > 8:
                stemmed_string.append(word[:8])
            else:
                stemmed_string.append(word)
    return stemmed_string
If I call the function on this text ('I have a dog that is barking') this is the output:
['I',
'I',
'I',
'have',
'have',
'have',
'a',
'a',
'a',
'dog',
'dog',
'dog',
'that',
'that',
'that',
'is',
'is',
'is',
'barking',
'barking',
'bark']
You are appending something in each round of the loop over suffixes. To avoid the problem, don't do that.
It's not clear if you want to add the shortest possible string out of a set of candidates, or how to handle stacked suffixes. Here's a version which always strips as much as possible.
def stemmer(text):
    stemmed_string = []
    suffixes = ('ed', 'ly', 'ing')
    for word in text.split():
        for suffix in suffixes:
            if word.endswith(suffix):
                word = word[:-len(suffix)]
        stemmed_string.append(word)
    return stemmed_string
Notice the fixed syntax for looping over a list, too.
This will reduce "sparingly" to "spar", etc.
Like every naïve stemmer, this will also do stupid things with words like "sly" and "thing".
Demo: https://ideone.com/a7FqBp
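A quick check of the claims above, using the fixed function (the sample inputs are illustrative):

```python
def stemmer(text):
    stemmed_string = []
    suffixes = ('ed', 'ly', 'ing')
    for word in text.split():
        for suffix in suffixes:
            # strip every matching suffix in order, so stacked
            # suffixes like "-ing" + "-ly" are both removed
            if word.endswith(suffix):
                word = word[:-len(suffix)]
        stemmed_string.append(word)
    return stemmed_string

print(stemmer('I have a dog that is barking'))
# ['I', 'have', 'a', 'dog', 'that', 'is', 'bark']
print(stemmer('sparingly sly'))
# ['spar', 's'] - stacked suffixes and short words both get mangled
```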

Replacing numbers in a list of lists with corresponding lines from a text file

I have a big text file like this (every word on its own line):
this
is
my
text
and
it
should
be
awesome
.
And I have also a list like this:
index_list = [[1,2,3,4,5],[6,7,8],[9,10]]
Now I want to replace every element of each sublist with the line at that index in my text file, so the expected answer would be:
new_list = [['this', 'is', 'my', 'text', 'and'], ['it', 'should', 'be'], ['awesome', '.']]
I tried a nasty workaround with two for loops with a range function that was way too complicated (so I thought). Then I tried it with linecache.getline, but that also has some issues:
import linecache
new_list = []
for l in index_list:
    for j in l:
        new_list.append(linecache.getline('text_list', j))
This produces only one big list, which I don't want. Also, after every word I get a bad \n, which I do not get when I open the file with b = open('text_list', 'r').read().splitlines(), but I don't know how to implement this in my replace function (or create one, rather) so I don't get [['this\n', 'is\n', etc.
You are very close. Just use a temp list and then append that to the main list. Also, you can use str.strip to remove the newline char.
Ex:
import linecache
new_list = []
index_list = [[1,2,3,4,5],[6,7,8],[9,10]]
for l in index_list:
    temp = []  # temp list
    for j in l:
        temp.append(linecache.getline('text_list', j).strip())
    new_list.append(temp)  # append to main list
You could use iter to do this, as long as your text_list has exactly as many elements as sum(map(len, index_list)):
text_list = ['this', 'is', 'my', 'text', 'and', 'it', 'should', 'be', 'awesome', '.']
index_list = [[1,2,3,4,5],[6,7,8],[9,10]]
text_list_iter = iter(text_list)
texts = [[next(text_list_iter) for _ in index] for index in index_list]
Output
[['this', 'is', 'my', 'text', 'and'], ['it', 'should', 'be'], ['awesome', '.']]
But I am not sure if this is what you wanted to do. Maybe I am assuming some sort of ordering of index_list. The other answer I can think of is this list comprehension
texts_ = [[text_list[i-1] for i in l] for l in index_list]
Output
[['this', 'is', 'my', 'text', 'and'], ['it', 'should', 'be'], ['awesome', '.']]
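As the question notes, read().splitlines() already drops the trailing newlines, so a variant that reads the file once (instead of calling linecache per line) is also possible. A sketch, which writes a small stand-in file under the question's name 'text_list' so it is self-contained:

```python
# Create a small stand-in for the question's 'text_list' file.
with open('text_list', 'w') as f:
    f.write('\n'.join(['this', 'is', 'my', 'text', 'and',
                       'it', 'should', 'be', 'awesome', '.']))

# Read the file once; splitlines() drops the trailing '\n' of each line.
with open('text_list') as f:
    lines = f.read().splitlines()

index_list = [[1, 2, 3, 4, 5], [6, 7, 8], [9, 10]]
# Line numbers are 1-based, list indices 0-based, hence the i - 1.
new_list = [[lines[i - 1] for i in sub] for sub in index_list]
print(new_list)
# [['this', 'is', 'my', 'text', 'and'], ['it', 'should', 'be'], ['awesome', '.']]
```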

How to skip a line in my current program? [duplicate]

This question already has answers here:
Printing list elements on separate lines in Python
(10 answers)
Closed 4 years ago.
I've got this code. It removes the stopwords (in the stopwords.py file) from yelp.py:
def remove_stop(text, stopwords):
    disallowed = set(stopwords)
    return [word for word in text if word not in disallowed]

text = open('yelp.py','r').read().split()
stopwords = open('stopwords.py','r').read().split()
print(remove_stop(text, stopwords))
Currently, the output is a very long string.
I want the output to skip a line after every word in the yelp.py file.
How do I do that? Can somebody help, please?
The current output is ['near', 'best', "I've", 'ever', 'price', 'good', 'deal.', 'For', 'less', '6', 'dollars', 'person', 'get', 'pizza', 'salad', 'want.', 'If', 'looking', 'super', 'high', 'quality', 'pizza', "I'd", 'recommend', 'going', 'elsewhere', 'looking', 'decent', 'pizza', 'great', 'price,', 'go', 'here.']
How do I get it to skip a line?
Once you have collected your output, a list l, you can print it as
print(*l, sep="\n")
where the * operator unpacks the list. Each element is used as a separate argument to the function.
Moreover, with the sep named argument you can customize the separator between items.
Full updated code:
def remove_stop(text, stopwords):
    disallowed = set(stopwords)
    return [word for word in text if word not in disallowed]

text = open('yelp.py','r').read().split()
stopwords = open('stopwords.py','r').read().split()
output = remove_stop(text, stopwords)
print(*output, sep="\n")
When you print a list you get one long line as output: [0, 1, 2, 3, 4, 5, ...]. Instead of printing the list, you could iterate over it:
for e in my_list:
    print(e)
and you will get a newline after each element in the list.
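Equivalently, you can join the list into a single newline-separated string before printing (the word list below is just a stand-in for the filtered output):

```python
# Same effect as print(*output, sep="\n"): one word per line.
output = ['near', 'best', 'price']  # stand-in for the filtered word list
print('\n'.join(output))
# near
# best
# price
```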

Turn a list, that is in another list, into a string, then reverse the string

I'm new to programming in Python (and programming in general) and we were asked to develop a function to encrypt a string by rearranging the text. We were given this as a test:
encrypt('THE PRICE OF FREEDOM IS ETERNAL VIGILENCE', 5)
'SI MODEERF FO ECIRP EHT ECNELIGIV LANRETE'
We have to make sure it works for any string of any length though. I got as far as this before getting stuck:
## Define encrypt
def encrypt(text, encrypt_value):
    ## Split string into list
    text_list = text.split()
    ## Group text_list according to encrypt_value
    split_list = [text_list[index:index + encrypt_value]
                  for index in xrange(0, len(text_list), encrypt_value)]
If I printed the result now, this would give me:
encrypt("I got a jar of dirt and you don't HA", 3)
[['I', 'got', 'a'], ['jar', 'of', 'dirt'], ['and', 'you', "don't"], ['HA']]
So I need to combine each of the lists in the list into a string (which I think is ' '.join(text)?), reverse it with [::-1], before joining the whole thing together into one string. But how in the world do I do that?
To combine your elements, you can try using reduce (in Python 3, import it first with from functools import reduce):
l = [['I', 'got', 'a'], ['jar', 'of', 'dirt'], ['and', 'you', "don't"], ['HA']]
str = reduce(lambda prev,cur: prev+' '+reduce(lambda subprev,word: subprev+' '+word,cur, ''), l, '')
It will result in:
" I got a jar of dirt and you don't HA"
If you want to remove the extra spaces:
str.replace('  ', ' ').strip()
This reduce use can be easily modified to reverse each sublist right before combining their elements:
str = reduce(lambda prev,cur: prev+' '+reduce(lambda subprev,word: subprev+' '+word,cur[::-1], ''), l, '')
Or to reverse the combined substrings just before joining all together:
str = reduce(lambda prev,cur: prev+' '+reduce(lambda subprev,word: subprev+' '+word,cur, '')[::-1], l, '')
You can do what you're looking for fairly simply with a few nested list comprehensions.
For example, you already have
split_list = [['I', 'got', 'a'], ['jar', 'of', 'dirt'], ['and', 'you', "don't"], ['HA']]
What you want now is to reverse each triplet of words with a list comprehension, e.g. like so:
reversed_sublists = [sublist[::-1] for sublist in split_list]
# [['a', 'got', 'I'], ['dirt', 'of', 'jar'], ["don't", 'you', 'and'], ['HA']]
Then reverse each string in each sublist
reversed_strings = [[substr[::-1] for substr in sublist] for sublist in split_list]
# [['a', 'tog', 'I'], ['trid', 'fo', 'raj'], ["t'nod", 'uoy', 'dna'], ['AH']]
And then join them all up, as you said, with ' '.join(), e.g.
' '.join([' '.join(sublist) for sublist in reversed_strings])
// "a tog I trid fo raj t'nod uoy dna AH"
But nothing says you can't just do all those things at the same time with some nesting:
' '.join([' '.join([substring[::-1] for substring in sublist[::-1]]) for sublist in split_list])
// "a tog I trid fo raj t'nod uoy dna AH"
I personally prefer the aesthetic of this (and the fact you don't need to go back to strip spaces), but I'm not sure whether it performs better than Pablo's solution.
b = [['I', 'got', 'a'], ['jar', 'of', 'dirt'], ['and', 'you', "don't"], ['HA']]
print "".join([j[::-1]+' ' for i in b for j in reversed(i)])
a tog I trid fo raj t'nod uoy dna AH
Is this what you wanted...
Is there any reason you are trying to do it in one list comprehension?
It's probably easier to conceptualize (and implement) by breaking it down into parts:
def encrypt(text, encrypt_value):
    reversed_words = [w[::-1] for w in text.split()]
    rearranged_words = reversed_words[encrypt_value:] + reversed_words[:encrypt_value]
    return ' '.join(rearranged_words[::-1])
Example output:
In [6]: encrypt('THE PRICE OF FREEDOM IS ETERNAL VIGILENCE', 5)
Out[6]: 'SI MODEERF FO ECIRP EHT ECNELIGIV LANRETE'

compare words in two lists in python

I would appreciate someone's help on this probably simple matter: I have a long list of words in the form ['word', 'another', 'word', 'and', 'yet', 'another']. I want to compare these words to a list that I specify, checking whether my target words are contained in the first list or not.
I would like to output which of my "search" words are contained in the first list and how many times they appear. I tried something like list(set(a).intersection(set(b))) - but it splits up the words and compares letters instead.
How can I write in a list of words to compare with the existing long list? And how can I output the matches and their frequencies? Thank you so much for your time and help.
>>> lst = ['word', 'another', 'word', 'and', 'yet', 'another']
>>> search = ['word', 'and', 'but']
>>> [(w, lst.count(w)) for w in set(lst) if w in search]
[('and', 1), ('word', 2)]
This code basically iterates through the unique elements of lst, and if the element is in the search list, it adds the word, along with its number of occurrences, to the resulting list.
Preprocess your list of words with a Counter:
from collections import Counter
a = ['word', 'another', 'word', 'and', 'yet', 'another']
c = Counter(a)
# c == Counter({'word': 2, 'another': 2, 'and': 1, 'yet': 1})
Now you can iterate over your new list of words and check whether each is contained in this Counter dictionary; the value gives you its number of appearances in the original list:
words = ['word', 'no', 'another']
for w in words:
    print w, c.get(w, 0)
which prints:
word 2
no 0
another 2
or output it in a list:
[(w, c.get(w, 0)) for w in words]
# returns [('word', 2), ('no', 0), ('another', 2)]
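If you only want the words that actually occur in the long list (closer in spirit to the set-intersection attempt), you can filter while looking them up in the Counter. A small sketch using the same sample data:

```python
from collections import Counter

a = ['word', 'another', 'word', 'and', 'yet', 'another']
c = Counter(a)
search = ['word', 'no', 'another']
# Keep only the search words that appear at least once.
found = {w: c[w] for w in search if w in c}
print(found)  # {'word': 2, 'another': 2}
```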
