Expanding stack with list comprehensions [duplicate] - python

This question already has answers here:
What does "list comprehension" and similar mean? How does it work and how can I use it?
(5 answers)
List comprehension returning values plus [None, None, None], why? [duplicate]
(4 answers)
Appending item to lists within a list comprehension
(7 answers)
List comprehension output is None [duplicate]
(3 answers)
Closed 5 years ago.
I want to put the unique items from one list into another list, i.e. eliminate duplicate items. When I do it the longer way it works; see for example:
>>> new_list = []
>>> a = ['It', 'is', 'the', 'east', 'and', 'Juliet', 'is', 'the', 'sun']
>>> for word in a:
...     if word not in new_list:
...         new_list.append(word)
...
>>> new_list
['It', 'is', 'the', 'east', 'and', 'Juliet', 'sun']
But when I try to accomplish this using a list comprehension in a single line, each iteration returns None:
>>> new_list = []
>>> a = ['It', 'is', 'the', 'east', 'and', 'Juliet', 'is', 'the', 'sun']
>>> new_list = [new_list.append(word) for word in a if word not in new_list]
Can someone please help me understand what's going wrong in the list comprehension?
Thanks in advance,
Umesh

List comprehensions provide a concise way to create lists. Common
applications are to make new lists where each element is the result of
some operations applied to each member of another sequence or
iterable, or to create a subsequence of those elements that satisfy a
certain condition.
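The key point: a comprehension builds its result out of the value of the expression on the left-hand side, so an expression that only has a side effect and returns None gives you a list of Nones. A quick illustration (my own, not from the original answer):
>>> words = ['a', 'b', 'a']
>>> [w.upper() for w in words]   # the expression's value is what gets collected
['A', 'B', 'A']
>>> [print(w) for w in words]    # print() returns None, so the result is all Nones
a
b
a
[None, None, None]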
Maybe you can try this:
>>> new_list = []
>>> a = ['It', 'is', 'the', 'east', 'and', 'Juliet', 'is', 'the', 'sun']
>>> unused = [new_list.append(word) for word in a if word not in new_list]
>>> new_list
['It', 'is', 'the', 'east', 'and', 'Juliet', 'sun']
>>> unused
[None, None, None, None, None, None, None]
Notice:
list.append() modifies the list in place and always returns None, so the comprehension collects a None for each word it appends.
Another way: you can use a set to remove the duplicate items (this does not preserve the original order):
>>> a = ['It', 'is', 'the', 'east', 'and', 'Juliet', 'is', 'the', 'sun']
>>> list(set(a))
['and', 'sun', 'is', 'It', 'the', 'east', 'Juliet']
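If you want both a single expression and the original order, one common idiom (a sketch of my own, not part of the original answer) is to keep the word itself as the expression and use a helper set for the membership test:
>>> a = ['It', 'is', 'the', 'east', 'and', 'Juliet', 'is', 'the', 'sun']
>>> seen = set()
>>> # set.add() returns None, so "and not seen.add(word)" records the word while keeping the test True
>>> new_list = [word for word in a if word not in seen and not seen.add(word)]
>>> new_list
['It', 'is', 'the', 'east', 'and', 'Juliet', 'sun']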

If you want a unique list of words, you can use set().
list(set(a))
# returns:
# ['It', 'is', 'east', 'and', 'the', 'sun', 'Juliet']
If the order is important, try:
new_list = []
for word in a:
    if word not in new_list:
        new_list.append(word)
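An order-preserving one-liner is also possible; a sketch, assuming Python 3.7+ where dicts keep insertion order:
# dict.fromkeys() keeps only the first occurrence of each key, in order
new_list = list(dict.fromkeys(a))
# ['It', 'is', 'the', 'east', 'and', 'Juliet', 'sun']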

Related

beginner issue with python : how to make one list of separated lines in a file in python

I have an issue as a beginner that made me exhausted trying to solve it so many times/ways, but I still feel stuck. The problem is that I have a small file that I read in Python, and I have to make a list of all the lines so I can sort it in alphabetical order. But when I try to make it into a list, it makes a separate list for each line.
Here is the way I tried to solve it:
file = open("romeo.txt")
for line in file:
    words = line.split()
    unique = list()
    if words not in unique:
        unique.extend(words)
    unique.sort()
    print(unique)
output:
['But', 'breaks', 'light', 'soft', 'through', 'what', 'window', 'yonder']
['It', 'Juliet', 'and', 'east', 'is', 'is', 'sun', 'the', 'the']
['Arise', 'and', 'envious', 'fair', 'kill', 'moon', 'sun', 'the']
['Who', 'already', 'and', 'grief', 'is', 'pale', 'sick', 'with']
To have a list of all the lines you can simply use:
with open(your_file, 'r') as f:
    data = [''.join(x.split('\n')) for x in f.readlines()]  # a simple list comprehension to strip the '\n' at the end of each line
In data you have each line of the file as an element. To sort the list, use sorted():
new_list = sorted(data)
Now new_list is the sorted list.
There is also a built-in function for it:
lines_of_files = open("filename.txt").readlines()
This returns a list with one entry per line in the file. Hope this solves your question.
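Putting the two answers together for romeo.txt, a minimal sketch (assuming you want one sorted list of lines with the trailing newlines stripped):
with open("romeo.txt") as f:
    lines = [line.rstrip('\n') for line in f]  # one clean string per line
print(sorted(lines))  # every line of the file in a single, alphabetically sorted list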

Find words that do not match a list in a list

Would like to find words in a list that do not match words in a master list.
Code is:
master = ['This', 'is', 'a', 'pond', 'full', 'of', 'good', 'words']
dontfindme = ['po', 'go', 'a']
Expected result is:
['This', 'is', 'full', 'of', 'words']
Can do:
list(set(master).difference(set([m for m in master for df in dontfindme if df in m])))
...but it screws up the order.
Is there a better way using just list comprehension?
master = ['This', 'is', 'a', 'pond', 'full', 'of', 'good', 'words']
dontfindme = ['po', 'go', 'a']
result = [x for x in master if all(item not in x for item in dontfindme)]
print(result)
Gives:
['This', 'is', 'full', 'of', 'words']
You can use the filter() Python built-in.
filter(function, iterable)
Construct an iterator from those elements of iterable for which function returns true. iterable may be either a sequence, a container which supports iteration, or an iterator. If function is None, the identity function is assumed, that is, all elements of iterable that are false are removed.
Note that filter(function, iterable) is equivalent to the generator expression (item for item in iterable if function(item)) if function is not None and (item for item in iterable if item) if function is None.
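To make the quoted equivalence concrete with the data from the question (a sketch, not part of the original answer):
master = ['This', 'is', 'a', 'pond', 'full', 'of', 'good', 'words']
dontfindme = ['po', 'go', 'a']

def keep(x):
    return all(item not in x for item in dontfindme)

print(list(filter(keep, master)))       # ['This', 'is', 'full', 'of', 'words']
print([x for x in master if keep(x)])   # the equivalent comprehension form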
def _filter():
    master = ['This', 'is', 'a', 'pond', 'full', 'of', 'good', 'words']
    dontfindme = ['po', 'go', 'a']
    return list(filter(lambda x: all([item not in x for item in dontfindme]), master))

if __name__ == '__main__':
    print(_filter())
Output:
['This', 'is', 'full', 'of', 'words']

removing common words from a text file

I am trying to remove common words from a text. For example, the sentence:
"It is not a commonplace river, but on the contrary is in all ways remarkable."
I want to turn it into just unique words. This means removing "it", "but", "a" etc. I have a text file that has all the common words and another text file that contains a paragraph. How can I delete the common words in the paragraph text file?
For example:
['It', 'is', 'not', 'a', 'commonplace', 'river', 'but', 'on', 'the', 'contrary', 'is', 'in', 'all', 'ways', 'remarkable']
How do I remove the common words from the file efficiently? I have a text file called common.txt that lists all the common words. How do I use that list to remove the matching words in the sentence above? The end output I want is:
['commonplace', 'river', 'contrary', 'remarkable']
Does that make sense?
Thanks.
You would want to use set objects in Python.
If order and number of occurrences are not important:
str_list = ['It', 'is', 'not', 'a', 'commonplace', 'river', 'but', 'on', 'the', 'contrary', 'is', 'in', 'all', 'ways', 'remarkable']
common_words = ['It', 'is', 'not', 'a', 'but', 'on', 'the', 'in', 'all', 'ways','other_words']
set(str_list) - set(common_words)
>>> {'contrary', 'commonplace', 'river', 'remarkable'}
If both are important:
#Using "set" is so much faster
common_set = set(common_words)
[s for s in str_list if not s in common_set]
>>> ['commonplace', 'river', 'contrary', 'remarkable']
Here's an example that you can use:
l = text.replace(",", "").replace(".", "").split(" ")
occurs = {}
for word in l:
    occurs[word] = l.count(word)

resultx = ''
for word in occurs.keys():
    if occurs[word] < 3:
        resultx += word + " "
resultx = resultx[:-1]
You can replace 3 with whatever threshold you think is suitable, or base it on the average using:
sum(occurs.values()) / len(occurs)
Additionally, if you want it to be case-insensitive, change the first line to:
l = text.replace(",","").replace(".","").lower().split(" ")
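Putting the threshold suggestion together, a rough sketch (text is assumed to hold the paragraph as a plain string, as in the snippet above):
text = "It is not a commonplace river, but on the contrary is in all ways remarkable."
words = text.replace(",", "").replace(".", "").lower().split()
occurs = {word: words.count(word) for word in words}

# keep only the words that occur no more often than the average count
average = sum(occurs.values()) / len(occurs)
rare_words = [word for word in occurs if occurs[word] <= average]
print(rare_words)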
The simplest method would be to just read() your common.txt and then use a list comprehension, taking only the words that are not in the file we read:
with open('common.txt') as f:
    content = f.read().split()  # split into individual words so the membership test matches whole words
s = ['It', 'is', 'not', 'a', 'commonplace', 'river', 'but', 'on', 'the', 'contrary', 'is', 'in', 'all', 'ways', 'remarkable']
res = [i for i in s if i not in content]
print(res)
# ['commonplace', 'river', 'contrary', 'remarkable']
filter also works here
res = list(filter(lambda x: x not in content, s))

finding gappy sublists within a larger list

Let's say I have a list like this:
[['she', 'is', 'a', 'student'],
['she', 'is', 'a', 'lawer'],
['she', 'is', 'a', 'great', 'student'],
['i', 'am', 'a', 'teacher'],
['she', 'is', 'a', 'very', 'very', 'exceptionally', 'good', 'student']]
Now I have a list like this:
['she', 'is', 'student']
I want to query the larger list with this one, and return all the lists that contain the words within the query list in the same order. There might be gaps, but the order should be the same. How can I do that? I tried using the in operator but I don't get the desired output.
If all you care about is that the words appear in order somewhere in the array, you can use a collections.deque and popleft to iterate through the list; if the deque is emptied, you have found a valid match:
from collections import deque

def find_gappy(arr, m):
    dq = deque(m)
    for word in arr:
        if word == dq[0]:
            dq.popleft()
            if not dq:
                return True
    return False
By comparing each word in arr with the first element of dq, we know that when we find a match, it has been found in the correct order, and then we popleft, so we now are comparing with the next element in the deque.
To filter your initial list, you can use a simple list comprehension that filters based on the result of find_gappy:
matches = ['she', 'is', 'student']
x = [i for i in x if find_gappy(i, matches)]
# [['she', 'is', 'a', 'student'], ['she', 'is', 'a', 'great', 'student'], ['she', 'is', 'a', 'very', 'very', 'exceptionally', 'good', 'student']]
You can compare two lists, with a function like this one. The way it works is it loops through your shorter list, and every time it finds the next word in the long list, cuts off the first part of the longer list at that point. If it can't find the word it returns false.
def is_sub_sequence(long_list, short_list):
    for word in short_list:
        if word in long_list:
            i = long_list.index(word)
            long_list = long_list[i + 1:]
        else:
            return False
    return True
Now that you have a function to tell you whether a list is of the desired type, you can filter out all the lists you need from the 'list of lists' using a list comprehension like the following:
a = [['she', 'is', 'a', 'student'],
['she', 'is', 'a', 'lawer'],
['she', 'is', 'a', 'great', 'student'],
['i', 'am', 'a', 'teacher'],
['she', 'is', 'a', 'very', 'very', 'exceptionally', 'good', 'student']]
b = ['she', 'is', 'student']
filtered = [x for x in a if is_sub_sequence(x,b)]
The list filtered will include only the lists of the desired type.
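For the lists above, this gives the same result as the deque approach (my own check, not shown in the original answer):
print(filtered)
# [['she', 'is', 'a', 'student'],
#  ['she', 'is', 'a', 'great', 'student'],
#  ['she', 'is', 'a', 'very', 'very', 'exceptionally', 'good', 'student']]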

Python append words to a list from file

I'm writing a program to read text from a file into a list and split it into a list of words using the split function. For each word, I need to check if it is already in the list; if not, I need to add it to the list using the append function.
The desired output is:
['Arise', 'But', 'It', 'Juliet', 'Who', 'already', 'and', 'breaks', 'east', 'envious', 'fair', 'grief', 'is', 'kill', 'light', 'moon', 'pale', 'sick', 'soft', 'sun', 'the', 'through', 'what', 'window', 'with', 'yonder']
My output is :
[['But', 'soft', 'what', 'light', 'through', 'yonder', 'window', 'breaks', 'It', 'is', 'the', 'east', 'and', 'Juliet', 'is', 'the', 'sun', 'Arise', 'fair', 'sun', 'and', 'kill', 'the', 'envious', 'moon', 'Who', 'is', 'already', 'sick', 'and', 'pale', 'with', 'grief']]
I have been trying to sort it and remove the double square brackets "[[" and "]]" at the beginning and end, but I'm not able to do so. And for some reason the sort() function does not seem to work.
Please let me know where I am making a mistake.
word_list = []
word_list = [open('romeo.txt').read().split()]
for item in word_list:
    if item in word_list:
        continue
    else:
        word_list.append(item)
word_list.sort()
print word_list
Remove brackets
word_list = open('romeo.txt').read().split()
Use two separate variables. Also, str.split() returns a list so no need to put [] around it:
word_list = []
word_list2 = open('romeo.txt').read().split()
for item in word_list2:
    if item in word_list:
        continue
    else:
        word_list.append(item)
word_list.sort()
print word_list
At the moment you're checking if item in word_list:, which will always be true because item is from word_list. Make item iterate from another list.
If order doesn't matter, it's a one-liner:
uniq_words = set(open('romeo.txt').read().split())
If order matters, then:
uniq_words = []
for word in open('romeo.txt').read().split():
    if word not in uniq_words:
        uniq_words.append(word)
If you want to sort, then take the first approach and use sorted().
The statement open('romeo.txt').read().split() returns a list already, so remove the [ ] from [open('romeo.txt').read().split()].
If I say:
word = "Hello\nPeter"
s_word = [word.split()]  # prints [['Hello', 'Peter']]
But:
s_word = word.split()  # prints ['Hello', 'Peter']
split() returns a list, so there is no need to put square brackets around open(...).read().split(). To remove duplicates, use a set:
word_list = sorted(set(open('romeo.txt').read().split()))
print word_list
