I've got this code, which removes the stopwords (listed in stopwords.py) from the words of yelp.py:
def remove_stop(text, stopwords):
    disallowed = set(stopwords)
    return [word for word in text if word not in disallowed]

text = open('yelp.py', 'r').read().split()
stopwords = open('stopwords.py', 'r').read().split()
print(remove_stop(text, stopwords))
Currently, the output is one very long list printed on a single line. I want the output to put each remaining word from yelp.py on its own line. How do I do that? Can somebody please help?
The current output is ['near', 'best', "I've", 'ever', 'price', 'good', 'deal.', 'For', 'less', '6', 'dollars', 'person', 'get', 'pizza', 'salad', 'want.', 'If', 'looking', 'super', 'high', 'quality', 'pizza', "I'd", 'recommend', 'going', 'elsewhere', 'looking', 'decent', 'pizza', 'great', 'price,', 'go', 'here.']
How do I get it to print one word per line?
Once you have collected your output, a list l, you can print it as
print(*l, sep="\n")
where the * operator unpacks the list, so each element is passed to print as a separate argument. With the sep keyword argument you can customize the separator printed between items.
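For example, these two calls are equivalent:
words = ['near', 'best', 'ever']
print(*words, sep="\n")  # same as print('near', 'best', 'ever', sep="\n")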
Full updated code:
def remove_stop(text, stopwords):
    disallowed = set(stopwords)
    return [word for word in text if word not in disallowed]

text = open('yelp.py', 'r').read().split()
stopwords = open('stopwords.py', 'r').read().split()
output = remove_stop(text, stopwords)
print(*output, sep="\n")
When you print a list you get one long line of output, like [0, 1, 2, 3, 4, 5, ...]. Instead of printing the list itself, you can iterate over it:
for e in my_list:
    print(e)
and you will get a newline after each element in the list.
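Equivalently, you can build the whole string first with str.join (a standard alternative, not part of the original answer); note that join requires the elements to be strings:
print("\n".join(my_list))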
I have a string like:
'hi', 'what', 'are', 'are', 'what', 'hi'
I want to remove a specific repeated word. For example:
'hi', 'what', 'are', 'are', 'what'
Here, I am removing only the repeated occurrences of 'hi' and keeping the rest of the repeated words. How can I do this using regex?
Regex is for searching text. You have structured data here, so regex is unnecessary.
def remove_all_but_first(iterable, removeword='hi'):
    remove = False
    for word in iterable:
        if word == removeword:
            if remove:
                continue
            else:
                remove = True
        yield word
Note that this will return an iterator, not a list. Cast the result to list if you need it to remain a list.
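For example, with the list from the question:
words = ['hi', 'what', 'are', 'are', 'what', 'hi']
print(list(remove_all_but_first(words, 'hi')))
# ['hi', 'what', 'are', 'are', 'what']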
You can do this:
import re

s = "['hi', 'what', 'are', 'are', 'what', 'hi']"
# Convert the string to a list: drop the first and last char, remove ' and spaces.
s = s[1:-1].replace("'", '').replace(' ', '').split(',')
remove = 'hi'
# Store the index of the first occurrence so we can re-add the word after removing all occurrences.
firstIndex = s.index(remove)
# Regex to remove all occurrences of the word.
regex = re.compile(r'(' + remove + ')', flags=re.IGNORECASE)
op = regex.sub("", '|'.join(s)).split('|')
# Clean up the list by removing empty items.
while "" in op:
    op.remove("")
# Re-insert the removed word at the index of its first occurrence.
op.insert(firstIndex, remove)
print(str(op))
You don't need regex for that. Convert the string to a list, find the index of the first occurrence of the word, and filter it out of the rest of the list:
import ast

lst = "['hi', 'what', 'are', 'are', 'what', 'hi']"
lst = ast.literal_eval(lst)
word = 'hi'
index = lst.index(word) + 1
lst = lst[:index] + [x for x in lst[index:] if x != word]
print(lst)  # ['hi', 'what', 'are', 'are', 'what']
I have a list of lists of sentences and I want to pad all sentences so that they are the same length. I was able to do this, but I am trying to find the most optimal way and to challenge myself.
max_length = max(len(sent) for sent in sents)
list_length = len(sents)
sents_padded = [[pad_token for i in range(max_length)] for j in range(list_length)]
for i, sent in enumerate(sents):
    sents_padded[i][0:len(sent)] = sent
and I used the inputs:
sents = [["Hello","World"],["Where","are","you"],["I","am","doing","fine"]]
pad_token = "Hi"
Is my method efficient, or are there better ways to do it?
itertools provides this (in Python 3) with zip_longest, which pads as it iterates. You can transpose the result back with zip(*...), and pass it to list if you prefer a list over an iterator.
import itertools
from pprint import pprint

sents = [["Hello", "World"], ["Where", "are", "you"], ["I", "am", "doing", "fine"]]
pad_token = "Hi"

# zip_longest transposes and pads; the outer zip transposes back. Converting each
# row to a list (rather than the tuples zip yields) matches the output below.
padded = [list(row) for row in zip(*itertools.zip_longest(*sents, fillvalue=pad_token))]
pprint(padded)
[['Hello', 'World', 'Hi', 'Hi'],
['Where', 'are', 'you', 'Hi'],
['I', 'am', 'doing', 'fine']]
Here is how you can use str.ljust() to pad each string, using max() with key=len to find the length to pad to:
lst = ['Hello World', 'Good day!', 'How are you?']
l = len(max(lst, key=len))       # The length of the longest sentence
lst = [s.ljust(l) for s in lst]  # Pad each sentence to length l
print(lst)
Output:
['Hello World ',
'Good day! ',
'How are you?']
Assumption:
The output should be the same as the OP's output (i.e. the same number of words in each sublist).
Inputs:
sents = [["Hello","World"],["Where","are","you"],["I","am","doing","fine"]]
pad_token = "Hi"
The following one-liner (after computing max_length as in the question) produces the same output as the OP's code:
max_length = max(len(sent) for sent in sents)
sents_padded = [sent + [pad_token] * (max_length - len(sent)) for sent in sents]
print(sents_padded)
# [['Hello', 'World', 'Hi', 'Hi'], ['Where', 'are', 'you', 'Hi'], ['I', 'am', 'doing', 'fine']]
This seemed to be faster when I timed it:
maxi = 0
for sent in sents:
    if len(sent) > maxi:
        maxi = len(sent)

for sent in sents:
    while len(sent) < maxi:
        sent.append(pad_token)

print(sents)
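If you want to verify the timing yourself, a minimal timeit harness (my addition, not part of the original answer) could look like this; deepcopy keeps each run from operating on already-padded input, since the loop pads the sublists in place:
import copy
import timeit

sents_template = [["Hello", "World"], ["Where", "are", "you"], ["I", "am", "doing", "fine"]]
pad_token = "Hi"

def pad_in_place(sents):
    # Find the longest sublist, then append pad_token until every sublist matches it.
    maxi = max(len(sent) for sent in sents)
    for sent in sents:
        while len(sent) < maxi:
            sent.append(pad_token)
    return sents

print(timeit.timeit(lambda: pad_in_place(copy.deepcopy(sents_template)), number=100000))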
I have a big text file like this (no blank lines in between; every word is on its own line):
this
is
my
text
and
it
should
be
awesome
.
And I also have a list like this:
index_list = [[1,2,3,4,5],[6,7,8],[9,10]]
Now I want to replace every element of each sublist with the line of my text file at that index, so the expected result would be:
new_list = [['this', 'is', 'my', 'text', 'and'], ['it', 'should', 'be'], ['awesome', '.']]
I tried a nasty workaround with two for loops and a range function that was way too complicated. Then I tried linecache.getline, but that also has some issues:
import linecache

new_list = []
for l in index_list:
    for j in l:
        new_list.append(linecache.getline('text_list', j))
This produces only one big list, which I don't want. Also, after every word I get a trailing \n, which I do not get when I open the file with b = open('text_list', 'r').read().splitlines(); but I don't know how to build that into my function, so I keep getting [['this\n', 'is\n', etc.
You are very close. Just use a temp list and then append it to the main list. Also, you can use str.strip to remove the newline character.
Ex:
import linecache

new_list = []
index_list = [[1, 2, 3, 4, 5], [6, 7, 8], [9, 10]]
for l in index_list:
    temp = []  # Temp list
    for j in l:
        temp.append(linecache.getline('text_list', j).strip())
    new_list.append(temp)  # Append to the main list
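With the sample file from the question, printing new_list then gives the nested list the OP asked for:
print(new_list)
# [['this', 'is', 'my', 'text', 'and'], ['it', 'should', 'be'], ['awesome', '.']]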
You could use iter to do this, as long as your text_list has exactly as many elements as sum(map(len, index_list)):
text_list = ['this', 'is', 'my', 'text', 'and', 'it', 'should', 'be', 'awesome', '.']
index_list = [[1,2,3,4,5],[6,7,8],[9,10]]
text_list_iter = iter(text_list)
texts = [[next(text_list_iter) for _ in index] for index in index_list]
Output
[['this', 'is', 'my', 'text', 'and'], ['it', 'should', 'be'], ['awesome', '.']]
But I am not sure if this is what you wanted to do; maybe I am assuming some sort of ordering of index_list. The other answer I can think of is this list comprehension:
texts_ = [[text_list[i - 1] for i in l] for l in index_list]  # i - 1: the line numbers are 1-based
Output
[['this', 'is', 'my', 'text', 'and'], ['it', 'should', 'be'], ['awesome', '.']]
The Python code below reads 'resting-place' as one word.
The modified list shows up as: ['This', 'is', 'my', 'resting-place.']
I want it to show as: ['This', 'is', 'my', 'resting', 'place']
That would give me a total of 5 words instead of 4 in the modified list.
original = 'This is my resting-place.'
modified = original.split()
print(modified)

numWords = 0
for word in modified:
    numWords += 1
print('Total words are:', numWords)
Output is:
Total words are: 4
I want the output to have 5 words.
To count the number of words in a sentence, treating - as a separator between two words, without splitting:
>>> original = 'This is my resting-place.'
>>> sum(map(original.strip().count, [' ','-'])) + 1
5
Here is the code; note that splitting on spaces alone still gives 4, since 'resting-place.' counts as a single word:
>>> s = 'This is my resting-place.'
>>> len(s.split(" "))
4
You can use regex:
import re

original = 'This is my resting-place.'
print(re.split(r"\s+|-", original))
Output:
['This', 'is', 'my', 'resting', 'place.']
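The trailing period is still attached to 'place.'. If you want the five bare words, one option (my addition, not part of the original answer) is re.findall with a word pattern:
import re

original = 'This is my resting-place.'
words = re.findall(r"[A-Za-z]+", original)
print(words)       # ['This', 'is', 'my', 'resting', 'place']
print(len(words))  # 5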
I think you will find what you want in this article. It shows how to create a function that takes multiple separators for splitting a string; in your case, you'll be able to split on that extra character:
http://code.activestate.com/recipes/577616-split-strings-w-multiple-separators/
here is an example of the final result
>>> s = 'thing1,thing2/thing3-thing4'
>>> tsplit(s, (',', '/', '-'))
['thing1', 'thing2', 'thing3', 'thing4']
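The recipe's tsplit itself is not reproduced here; a minimal sketch of such a multi-separator split (my reconstruction, matching only the call above) could normalize every separator to the first one and then split once:
def tsplit(s, seps):
    # Replace every separator with the first one, then split on it.
    first = seps[0]
    for sep in seps[1:]:
        s = s.replace(sep, first)
    return s.split(first)

print(tsplit('thing1,thing2/thing3-thing4', (',', '/', '-')))
# ['thing1', 'thing2', 'thing3', 'thing4']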
I am having a little trouble figuring out how to recreate the sentence:
"Believe in the me, that believes in you!"
(a little cringe for those who have watched Gurren Lagann...) while using the indexes that I obtained by enumerating the list of words:
['believe', 'in', 'the', 'me', 'that', 'believes', 'you']
This list was built with .split() and .lower(), with punctuation removed, in a previous bit of code I wrote to create the word-list file and the index-list file.
When indexed, these are the words in their enumerated form:
(1, 'believe')
(2, 'in')
(3, 'the')
(4, 'me')
(5, 'that')
(6, 'believes')
(7, 'you')
That is all I have so far; I have been searching for a solution, but none have worked for my code. Here is the whole thing so far:
with open("Words list file 2.txt", 'r') as File:
contain = File.read()
contain = contain.replace("[", "")
contain = contain.replace("]", "")
contain = contain.replace(",", "")
contain = contain.replace("'", "")
contain = contain.split()
print("The orginal file reads:")#prints to tell the user the orginal file
print(contain)
for i in enumerate(contain, start = 1):
print(i)
You may join the strings in the list using the join method:
>>> my_list = ['believe', 'in', 'the', 'me', 'that', 'believes', 'you']
>>> ' '.join(my_list)
'believe in the me that believes you'
# ^ missing "in"
But this results in a string with the second "in" missing after "believes". If you want to make a new string based on the indices of words in the previous list, you may use a temporary list to store the indices and then join a generator expression:
>>> temp_list = [0, 1, 2, 3, 4, 5, 1, 6]
>>> ' '.join(my_list[i] for i in temp_list)
'believe in the me that believes in you'
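If you would rather derive such an index list from the original sentence than write it by hand, one rough sketch (my addition; it assumes the sentence is normalized the same way the word list was) uses list.index:
sentence = "Believe in the me, that believes in you!"
my_list = ['believe', 'in', 'the', 'me', 'that', 'believes', 'you']

# Normalize the sentence the same way the word list was built.
cleaned = sentence.lower().replace(',', '').replace('!', '').split()
temp_list = [my_list.index(word) for word in cleaned]
print(temp_list)                                # [0, 1, 2, 3, 4, 5, 1, 6]
print(' '.join(my_list[i] for i in temp_list))  # believe in the me that believes in you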
' '.join(['believe', 'in', 'the', 'me', 'that', 'believes', 'you'])
Not sure what the original file contains or what the contain variable holds once loaded from the file. Please show that.