Python looping through lists - python

I have a list called:
word_list_pet_image = [['beagle', '01125.jpg'], ['saint', 'bernard', '08010.jpg']]
There is more data in this list, but I kept it short. I am trying to iterate through this list and check whether each word contains only alphabetical characters. If it does, I want to append the word to a new list called
pet_labels = []
So far I have:
word_list_pet_image = []
for word in low_pet_image:
    word_list_pet_image.append(word.split("_"))
for word in word_list_pet_image:
    if word.isalpha():
        pet_labels.append(word)
print(pet_labels)
For example, I am trying to put the word beagle into the list pet_labels but skip 01125.jpg. See below.
pet_labels = ['beagle', 'saint bernard']
I am getting an AttributeError:
AttributeError: 'list' object has no attribute 'isalpha'
I am sure it has to do with me not iterating through the list properly.

It looks like you are trying to join alphabetical words in each sublist. A list comprehension would be effective here.
word_list = [['beagle', '01125.jpg'], ['saint', 'bernard', '08010.jpg']]
pet_labels = [' '.join(w for w in l if w.isalpha()) for l in word_list]
print(pet_labels)  # ['beagle', 'saint bernard']

You have a list of lists, so the brute-force method would be to nest loops, like:
for pair in word_list_pet_image:
    for word in pair:
        if word.isalpha():
            pass  # append word to the new list here
Another option might be a single for loop that indexes into each sublist (note that this only checks the first element):
for word in word_list_pet_image:
    if word[0].isalpha():
        pass  # append word[0] to the new list here
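For reference, here is a minimal sketch of the nested-loop approach written out in full. It assumes that multi-word names should be joined with a space, as in the desired output; the name alpha_words is introduced only for this illustration.

```python
# Sample data copied from the question.
word_list_pet_image = [['beagle', '01125.jpg'], ['saint', 'bernard', '08010.jpg']]

pet_labels = []
for parts in word_list_pet_image:
    # Collect only the purely alphabetic pieces of this sublist.
    alpha_words = []
    for word in parts:
        if word.isalpha():
            alpha_words.append(word)
    # Join multi-word names such as "saint bernard" back together.
    pet_labels.append(' '.join(alpha_words))

print(pet_labels)  # ['beagle', 'saint bernard']
```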

word_list = [['beagle', '01125.jpg'], ['saint', 'bernard', '08010.jpg']]
Why not use a list comprehension? (This only works if the non-alphabetic element is always the last one in each sublist.)
pet_labels = [' '.join(l[:-1]) for l in word_list]

word_list_pet_image.append(word.split("_"))
.split() returns a list, so word_list_pet_image itself contains lists, not plain words. That is why calling .isalpha() on each of its elements raises the AttributeError.
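A quick way to see this, as a small sketch. The filename below is hypothetical and only assumes that the original names use underscores as separators.

```python
# Hypothetical filename used only for illustration.
filename = 'saint_bernard_08010.jpg'

parts = filename.split('_')
print(parts)        # ['saint', 'bernard', '08010.jpg']
print(type(parts))  # <class 'list'>

# Lists have no isalpha method; the individual strings inside them do.
print(hasattr(parts, 'isalpha'))     # False
print([p.isalpha() for p in parts])  # [True, True, False]
```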

Related

how to compare substring in a list and fetch the whole string if its present using python [duplicate]

I have a list containing strings
list=["geeks for geeks","string1","string2","geeks"]
I want to check whether "geek" is present in the list and, if it is, fetch the whole string from the list.
if geek in list:
output :
geeks for geeks
geeks
How can I achieve this?
Sounds like you want a list comprehension.
lst = ["geeks for geeks", "string1", "string2", "geeks"]
lst2 = [s for s in lst if "geek" in s]
# ['geeks for geeks', 'geeks']
You may wish to use str.lower to account for different capitalizations of "geek".
lst2 = [s for s in lst if "geek" in s.lower()]
list is a Python built-in name (not a reserved word), so please avoid shadowing it and use another variable name such as lis or words. You can try this:
lis = ["geeks for geeks", "string1", "string2", "geeks"]
for word in lis:
    if "geek" in word:
        print(word)
The output is
geeks for geeks
geeks
Or if you want to append to a new list
words = []
for word in lis:
    if "geek" in word:
        words.append(word)
print(words)
Or using a list comprehension (thanks @Chris!):
[word for word in lis if "geek" in word]
The output is a list
['geeks for geeks', 'geeks']
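To illustrate why reusing the name list for a variable is risky, here is a small sketch. The function demo is hypothetical and keeps the shadowing confined to its own scope.

```python
def demo():
    # Inside this function, `list` now refers to our data, not the built-in type.
    list = ["geeks for geeks", "string1", "string2", "geeks"]
    try:
        return list("abc")  # tries to call a list object
    except TypeError as exc:
        return str(exc)

print(demo())       # 'list' object is not callable
print(list("abc"))  # ['a', 'b', 'c'] (the built-in is untouched outside demo)
```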

Filtering a list of strings

The problem has the following parameters:
Write a function called only_a that takes in a list of strings and returns a list of strings that contain the letter 'a'.
This is code I am trying to use:
def only_a(starting_list):
    i = 'a'
    final_list = ()
    for char in starting_list:
        if i in starting_list:
            final_list = final_list + starting_list.append[0]
    return final_list
t_string_list = ['I like apples', 'I like tea', 'I don\'t', 'Never']
print(only_a(t_string_list))
I keep getting () as a result.
You need to change your solution slightly to make it work:
def only_a(starting_list):
    i = 'a'
    final_list = []
    for char in starting_list:
        if i in char:
            final_list.append(char)
    return final_list
t_string_list = ['I like apples', 'I like tea', 'I don\'t', 'Never']
print(only_a(t_string_list))
There is some confusion of types here.
() is a tuple and [] is a list.
if i in starting_list: checks whether 'a' is itself an element of the list, not whether it occurs inside one of the strings. That is never true here, so the body of the if never runs and the function returns the empty tuple it started with. That is why you are getting ().
If the line final_list = final_list + starting_list.append[0] did run, it would raise a TypeError: append is a method, so it has to be called with parentheses and cannot be subscripted with [0]. anotherlist.append(alist[0]) appends the value at index 0 of alist to anotherlist.
[0] only gets you the value at index 0 of a string, tuple, or list.
for char in starting_list: suggests that you are thinking of the list as a string, but each item it yields is a whole string, not a character.
So my suggestion is to explore the different types in Python.
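As a compact illustration of the membership point, here is a sketch that rewrites only_a as a list comprehension. The parameter name strings is my own choice, and the check is case-sensitive.

```python
def only_a(strings):
    """Return the strings that contain a lowercase 'a'."""
    return [s for s in strings if 'a' in s]

t_string_list = ['I like apples', 'I like tea', "I don't", 'Never']

# 'a' is not an element of the list, but it is inside some of its strings.
print('a' in t_string_list)     # False
print('a' in t_string_list[0])  # True
print(only_a(t_string_list))    # ['I like apples', 'I like tea']
```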

Python removing word if subset of other word in list

A simple puzzle but I cannot wrap my head around it:
In words:
I have a list of words. If a word in my list is a "subset" of another value in the list, remove it.
Input: ['car', 'car-10', 'truck-20']
Output: ['car-10', 'truck-20']
We have removed 'car' because it is a subset of 'car-10'. 'car-10' is not a subset of 'car'
Input: ['car', 'car-10', 'car-100']
Output: ['car-100']
We have removed 'car' and 'car-10' because they are subsets of 'car-100'.
The case I am really trying to solve doesn't use numbers:
Input: ['car-strong', 'car', 'truck-weak']
Output: ['car-strong', 'truck-weak']
We might have 'truck', 'bananas', 'apple', and things would be 'apple-10'.
Note that the "type" (car, truck, apple etc) is always the beginning of the word.
The typical list to parse is around 5-10 elements long. (Brute-forceable, I guess?)
But there are around 200,000 of these short lists to "clean", which is also an issue.
Brute force:
l = ['car', 'car-10', 'truck-20']
remove_me = [x for x in l
             if any(y.startswith(x) for y in l if x != y)]
result = [x for x in l if x not in remove_me]
For better performance, sort the list alphabetically so that candidate "supersets" are found faster, e.g. along the lines of the question "Python: Remove elements from the list which are prefix of other".
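The sorting idea could be sketched roughly as follows. The function name drop_prefixes is hypothetical, and the sketch assumes that "subset" means "prefix", as stated in the question. For lists of only 5-10 elements the gain over brute force is likely small.

```python
def drop_prefixes(words):
    """Return the words that are not a prefix of any other word in the list."""
    ordered = sorted(set(words))
    # After sorting, if any word starts with `a`, the word right after `a`
    # does too, so comparing neighbours is enough.
    prefixes = {a for a, b in zip(ordered, ordered[1:]) if b.startswith(a)}
    # Filter the original list so the input order is preserved.
    return [w for w in words if w not in prefixes]

print(drop_prefixes(['car', 'car-10', 'truck-20']))        # ['car-10', 'truck-20']
print(drop_prefixes(['car', 'car-10', 'car-100']))         # ['car-100']
print(drop_prefixes(['car-strong', 'car', 'truck-weak']))  # ['car-strong', 'truck-weak']
```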
This is a solution that should work for all kinds of input formats (it checks for substrings anywhere, not just prefixes):
words = ['car-strong', 'car', 'truck-weak']
delete = []
for idx, word in enumerate(words):
    for idx2, other in enumerate(words):
        if word in other and idx != idx2:
            delete.append(word)
            break  # one match is enough; avoids removing the same word twice
for word in delete:
    words.remove(word)
print(words)

Extracting all words starting with a certain character

I have a list of lists in which I store sentences as strings. What I want to do is to get only the words starting with #. In order to do that, I split the sentences into words and am now trying to pick only the words that start with # and exclude all the other words.
# to create the empty list:
lst = []
# to iterate through the columns:
for i in range(0, len(df)):
    lst.append(df['col1'][i].split())
If I am not mistaken, you just need a flat list containing all words starting with a particular character. For that I would employ list flattening (via itertools):
import itertools
first = 'f' #look for words starting with f letter
nested_list = [['This is first sentence'],['This is following sentence']]
flat_list = list(itertools.chain.from_iterable(nested_list))
nested_words = [i.split(' ') for i in flat_list]
words = list(itertools.chain.from_iterable(nested_words))
lst = [i for i in words if i[0]==first]
print(lst) #output: ['first', 'following']
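Adapted to the # case from the question, the same idea might look like the sketch below. The sentences are made up and merely stand in for the values of df['col1'].

```python
import itertools

# Hypothetical sentences standing in for the values of df['col1'].
sentences = ['loving the #sunset at the #beach', 'no tags here', '#python rocks']

# Split each sentence into words, then flatten into a single iterable.
words = itertools.chain.from_iterable(s.split() for s in sentences)

# startswith is safe even for empty strings, unlike indexing with [0].
hashtags = [w for w in words if w.startswith('#')]
print(hashtags)  # ['#sunset', '#beach', '#python']
```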

How to return the count of words from a list of words that appear in a list of lists?

I have a very large list of strings like this:
list_strings = ['storm', 'squall', 'overcloud',...,'cloud_up', 'cloud_over', 'plague', 'blight', 'fog_up', 'haze']
and a very large list of lists like this:
lis_of_lis = [['the storm was good blight'],['this is overcloud'],...,['there was a plague stormicide']]
How can I return a list with the number of words from list_strings that appear in each sub-list of lis_of_lis? For instance, for the above example the desired output is: [2,1,1]
For example:
['storm', 'squall', 'overcloud',...,'cloud_up', 'cloud_over', 'plague', 'blight', 'fog_up', 'haze']
['the storm was good blight']
The count is 2, since storm and blight appear in the first sublist of lis_of_lis.
['storm', 'squall', 'overcloud',...,'cloud_up', 'cloud_over', 'plague', 'blight', 'fog_up', 'haze']
['this is overcloud stormicide']
The count is 1, since overcloud appears in the second sublist of lis_of_lis;
stormicide does not count because it does not appear in list_strings.
['storm', 'squall', 'overcloud',...,'cloud_up', 'cloud_over', 'plague', 'blight', 'fog_up', 'haze']
['there was a plague']
The count is 1, since plague appears in the third sublist of lis_of_lis.
Hence the desired output is [2,1,1].
The problem with all the answers is that they count every substring match inside a word instead of only full words.
You can use the sum function within a list comprehension:
[sum(1 for i in list_strings if i in sub[0]) for sub in lis_of_lis]
result = []
for sentence in lis_of_lis:
    result.append(0)
    for word in list_strings:
        if word in sentence[0]:
            result[-1] += 1
print(result)
which is the long version of
result = [sum(1 for word in list_strings if word in sentence[0]) for sentence in lis_of_lis]
This will return [2,2,1] for your example.
If you want only whole words, add spaces before and after the words / sentences:
result = []
for sentence in lis_of_lis:
    result.append(0)
    for word in list_strings:
        if ' '+word+' ' in ' '+sentence[0]+' ':
            result[-1] += 1
print(result)
or short version:
result = [sum(1 for word in list_strings if ' '+word+' ' in ' '+sentence[0]+' ') for sentence in lis_of_lis]
This will return [2,1,1] for your example.
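As an alternative to padding with spaces, whole-word matching can also be sketched with sets. This assumes the words in each sentence are separated by whitespace and carry no punctuation; the sample data is a shortened version of the question's lists with the elided entries left out, and the name vocabulary is my own.

```python
# Shortened sample data based on the question; the elided entries are omitted.
list_strings = ['storm', 'squall', 'overcloud', 'cloud_up', 'cloud_over',
                'plague', 'blight', 'fog_up', 'haze']
lis_of_lis = [['the storm was good blight'],
              ['this is overcloud stormicide'],
              ['there was a plague']]

vocabulary = set(list_strings)  # set membership tests are fast

# For each sentence, count the distinct vocabulary words among its tokens.
result = [len(vocabulary & set(sentence[0].split())) for sentence in lis_of_lis]
print(result)  # [2, 1, 1]
```

Because each sentence is split once and looked up against a set, this avoids scanning every sentence string for every word, which may matter when both lists are very large.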
This creates a dictionary with the words in list_strings as keys and the values starting at 0. It then iterates through lis_of_lis, splits each phrase into a list of words, iterates through those, and checks whether each one is in the dictionary. If it is, 1 is added to the corresponding value.
word_count = dict()
for word in list_strings:
    word_count[word] = 0
for phrase in lis_of_lis:
    words_in_phrase = phrase[0].split()  # each sublist holds one sentence string
    for word in words_in_phrase:
        if word in word_count:
            word_count[word] += 1
This will create a dictionary with the words as keys and the frequencies as values. I'll leave it to you to get the correct output out of that data structure.
