Extracting all words starting with a certain character

I have a list of lists in which I store sentences as strings. I want to get only the words starting with #. To do that, I split the sentences into words, and I am now trying to pick only the words that start with # and exclude all the others.
# to create the empty list:
lst = []
# to iterate through the columns:
for i in range(0,len(df)):
    lst.append(df['col1'][i].split())

If I am not mistaken, you just need a flat list containing all words that start with a particular character. For that I would use list flattening (via itertools):
import itertools
first = 'f' #look for words starting with f letter
nested_list = [['This is first sentence'],['This is following sentence']]
flat_list = list(itertools.chain.from_iterable(nested_list))
nested_words = [i.split() for i in flat_list] #split() without arguments never yields empty strings
words = list(itertools.chain.from_iterable(nested_words))
lst = [i for i in words if i.startswith(first)]
print(lst) #output: ['first', 'following']
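
For the case in the question (words starting with #), a minimal sketch along the same lines could look like the following. The sample sentences and the name `sentences` are assumptions standing in for the values of df['col1'], which the question does not show.

```python
# Hypothetical sample data standing in for the values of df['col1'].
sentences = ["Loving the #sunset at the #beach", "No tags here", "#python is fun"]

prefix = "#"
# One flat comprehension: walk every sentence, split it into words,
# and keep only the words that start with the prefix.
tagged = [word for s in sentences for word in s.split() if word.startswith(prefix)]
print(tagged)
```

Using startswith instead of indexing word[0] avoids an IndexError if an empty string ever ends up in the list.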

Related

How to compare reverse strings in list of strings with the original list of strings in python?

Take a string as input and check whether any word in it matches the reverse of another word in the same string. If so, print that word; otherwise print $.
I split the string, put the words in a list, and then reversed each word. After that, I was not able to compare the two lists.
str = input()
x = str.split()
for i in x: # printing i shows the words in the list
    str1 = i[::-1] # printing str1 shows the reverse of words in a new list
    # now how to check if any word of the new list matches to any word of the old list
    if(i==str):
        print(i)
        break
    else:
        print('$')
Input: suman is a si boy.
Output: is ( since reverse of 'is' is present in the same string)

You almost have it. You just need to add another loop that compares each word against each reversed word. Try the following:
str = input()
x = str.split()
for i in x:
    str1 = i[::-1]
    for j in x: # <-- this is the new nested loop you are missing
        if j == str1: # compare each inverted word against each regular word
            if len(str1) > 1: # Potential condition if you would like to not include single letter words
                print(i)
Update
To only print the first occurrence of a match, you could, in the second loop, only check the elements that come after. We can do this by keeping track of the index:
str = input()
x = str.split()
for index, i in enumerate(x):
    str1 = i[::-1]
    for j in x[index+1:]: # <-- only consider words that are ahead
        if j == str1:
            if len(str1) > 1:
                print(i)
Note that I used index+1 in order to not consider single word palindromes a match.
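
The question also asks for $ when nothing matches. A sketch of that, wrapping the same look-ahead idea in a function, might be as follows. The function name `first_reversed_match` is made up for illustration.

```python
def first_reversed_match(sentence):
    """Return the first word whose reverse appears later in the sentence, else '$'."""
    words = sentence.split()
    for index, word in enumerate(words):
        # Skip single-letter words and only look at the words that come after.
        if len(word) > 1 and word[::-1] in words[index + 1:]:
            return word
    return "$"

print(first_reversed_match("suman is a si boy"))
print(first_reversed_match("no match here"))
```

Returning from the function on the first hit replaces the break/else bookkeeping from the question.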

a = 'suman is a si boy'
# Construct the list of words
words = a.split(' ')
# Construct the list of reversed words
reversed_words = [word[::-1] for word in words]
# Get an intersection of these lists converted to sets
print(set(words) & set(reversed_words))
will print (the order of elements in a set may vary):
{'si', 'is', 'a'}

Another way to do this is just in a list comprehension:
string = 'suman is a si boy'
output = [x for x in string.split() if x[::-1] in string.split()]
print(output)
The split on string creates a list split on spaces. Then the word is included only if the reverse is in the string.
Output is:
['is', 'a', 'si']
One note: you have a variable named str. It is best not to do that, because str is a Python built-in type and shadowing it could cause other issues in your code later on.
If you only want words more than one letter long, you can do:
string = 'suman is a si boy'
output = [x for x in string.split() if x[::-1] in string.split() and len(x) > 1]
print(output)
this gives:
['is', 'si']
Final Answer...
And for the final thought, in order to get just the 'is':
string = 'suman is a si boy'
seen = []
output = [x for x in string.split() if x[::-1] not in seen and not seen.append(x) and x[::-1] in string.split() and len(x) > 1]
print(output)
output is:
['is']
BUT, this is not necessarily a good way to do it, I don't believe. Basically you are storing information in seen during the list comprehension AND referencing that same list. :)
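
If the side effect inside the comprehension feels uncomfortable, a plain loop with an explicit seen set is one possible alternative. This is only a sketch that mirrors the comprehension above. The function name `reversed_pairs` is an assumption.

```python
def reversed_pairs(sentence):
    """Keep a word if its reverse occurs in the sentence and was not seen earlier."""
    words = sentence.split()
    present = set(words)   # fast membership test for "is the reverse in the sentence?"
    seen = set()           # words already visited
    result = []
    for word in words:
        rev = word[::-1]
        if len(word) > 1 and rev in present and rev not in seen:
            result.append(word)
        seen.add(word)
    return result

print(reversed_pairs("suman is a si boy"))
```

Like the comprehension, this treats a multi-letter palindrome as a match with itself.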

This answer does not report 'a', and it reports the pair 'is'/'si' only once (as 'is').
str = input() #get input string
x = str.split() #returns list of words
y = [] #list of words
while len(x) > 0 :
    a = x.pop(0) #removes first item from list and returns it, then assigns it to a
    if a[::-1] in x: #checks if the reversed word is in the list of words
        #the list doesn't contain that word anymore so 'a' that doesn't show twice wouldn't be returned
        #and 'is' that is present with 'si' will be evaluated once
        y.append(a)
print(y) # ['is']

Python looping through lists

I have a list called:
word_list_pet_image = [['beagle', '01125.jpg'], ['saint', 'bernard', '08010.jpg']]
There is more data in this list, but I kept it short. I am trying to iterate through this list and check whether each word consists only of alphabetical characters. If it does, I want to append the word to a new list called
pet_labels = []
So far I have:
word_list_pet_image = []
for word in low_pet_image:
    word_list_pet_image.append(word.split("_"))
for word in word_list_pet_image:
    if word.isalpha():
        pet_labels.append(word)
print(pet_labels)
For example I am trying to put the word beagle into the list pet_labels, but skip 01125.jpg. see below.
pet_labels = ['beagles', 'Saint Bernard']
I am getting an AttributeError:
AttributeError: 'list' object has no attribute 'isalpha'
I am sure it has to do with me not iterating through the list properly.

It looks like you are trying to join alphabetical words in each sublist. A list comprehension would be effective here.
word_list = [['beagle', '01125.jpg'], ['saint', 'bernard', '08010.jpg']]
pet_labels = [' '.join(w for w in l if w.isalpha()) for l in word_list]
print(pet_labels) # ['beagle', 'saint bernard']

You have a list of lists, so the brute-force method would be to nest loops, like:
for pair in word_list_pet_image:
    for word in pair:
        if word.isalpha():
            pet_labels.append(word) #append to list
Another option might be a single for loop that indexes into each sub-list (note that this only looks at the first element):
for word in word_list_pet_image:
    if word[0].isalpha():
        pet_labels.append(word[0]) #append to list

word_list = [['beagle', '01125.jpg'], ['saint', 'bernard', '08010.jpg']]
Why not use a list comprehension? (This works only if the non-alphabetical element is always the last one.)
pet_labels = [' '.join(l[:-1]) for l in word_list]

word_list_pet_image.append(word.split("_"))
.split() returns lists, so word_list_pet_image itself contains lists, not plain words.
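
Putting the pieces together, a minimal end-to-end sketch could look like this. The question does not show low_pet_image, so the file names below are assumed examples.

```python
# Hypothetical file names; the real contents of low_pet_image are not shown.
low_pet_image = ["beagle_01125.jpg", "saint_bernard_08010.jpg"]

pet_labels = []
for name in low_pet_image:
    parts = name.split("_")                    # e.g. ['saint', 'bernard', '08010.jpg']
    words = [p for p in parts if p.isalpha()]  # isalpha() is called on each string, not on the list
    pet_labels.append(" ".join(words))

print(pet_labels)
```

The inner comprehension is what avoids the AttributeError: isalpha is a string method, so it has to be applied to the elements of each sub-list.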

How can I create a list that is created using the indexes and items of two other lists?

I have two lists. One is made up of positions from a sentence and the other is made up of words that make up the sentence. I want to recreate the variable sentence using poslist and wordlist.
recreate = []
sentence = "This and that, and this and this."
poslist = [1, 2, 3, 2, 4, 2, 5]
wordlist = ['This', 'and', 'that', 'this', 'this.']
I wanted to use a for loop to go through poslist and if the item in poslist was equal to the position of a word in wordlist it would append it to a new list, recreating the original list. My first try was:
for index in poslist:
recreate.append(wordlist[index])
print (recreate)
I had to convert the lists to strings in order to write them to a text file. When I tried splitting them again and using the code shown above, it did not work. The error said that list indices must be integers or slices, not list. I would like a solution to this problem. Thank you.
The list of words is obtained using:
sentence = input("Enter a sentence >>") #asking the user for an input
sentence_lower = sentence.lower() #making the sentence lower case
wordlist = [] #creating a empty list
sentencelist = sentence.split() #making the sentence into a list
for word in sentencelist: #for loop iterating over the sentence as a list
    if word not in wordlist:
        wordlist.append(word)
txtfile = open ("part1.txt", "wt")
for word in wordlist:
    txtfile.write(word +"\n")
txtfile.close()
txtfile = open ("part1.txt", "rt")
for item in txtfile:
    print (item)
txtfile.close()
print (wordlist)
And the positions are obtained using:
poslist = []
textfile = open ("part2.txt", "wt")
for word in sentencelist:
    poslist.append([position + 1 for position, i in enumerate(wordlist) if i == word])
print (poslist)
str1 = " ".join(str(x) for x in poslist)
textfile = open ("part2.txt", "wt")
textfile.write (str1)
textfile.close()

Lists are 0-indexed (the first item has index 0, the second index 1, ...), so you have to subtract 1 from the indexes if you want to use "human" indexes in poslist:
for index in poslist:
    recreate.append(wordlist[index-1])
print (recreate)
Afterwards, you can glue them together again with spaces and write them to a file:
with open("thefile.txt", "w") as f:
    f.write(" ".join(recreate))
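
Because the positions come back from the text file as strings, they need to be converted to integers before they can be used as indexes. A sketch of the full round trip, assuming 1-based positions as in the question and a temporary file in place of part2.txt, might look like this:

```python
import os
import tempfile

sentence = "This and that, and this and this."
words = sentence.split()

# Unique words in order of first appearance.
wordlist = []
for word in words:
    if word not in wordlist:
        wordlist.append(word)

# One integer per word (1-based), not a list per word.
poslist = [wordlist.index(word) + 1 for word in words]

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "part2.txt")   # stand-in for the question's part2.txt
    with open(path, "w") as f:
        f.write(" ".join(str(p) for p in poslist))
    with open(path) as f:
        loaded = [int(p) for p in f.read().split()]   # strings back to integers

recreated = " ".join(wordlist[p - 1] for p in loaded)
print(recreated)
```

Here the word list comes from sentence.split(), so punctuation stays attached to the words ('that,' rather than 'that').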

First, your code can be simplified to:
sentence = input("Enter a sentence >>") #asking the user for an input
sentence_lower = sentence.lower() #making the sentence lower case
wordlist = [] #creating a empty list
sentencelist = sentence.split() #making the sentence into a list
with open ("part1.txt", "wt") as txtfile:
    for word in sentencelist: #for loop iterating over the sentence as a list
        if word not in wordlist:
            wordlist.append(word)
            txtfile.write(word +"\n")
poslist = [wordlist.index(word) for word in sentencelist]
print (poslist)
str1 = " ".join(str(x) for x in poslist)
with open ("part2.txt", "wt") as textfile:
    textfile.write (str1)
In your original code, poslist was a list of lists instead of a list of integers.
Then, if you want to reconstruct your sentence from poslist (which is now a list of int and not a list of lists as in the code you provided) and wordlist, you can do the following:
sentence = ' '.join(wordlist[pos] for pos in poslist)

You can also do it using a generator expression and the string join method:
sentence = ' '.join(wordlist[pos-1] for pos in poslist if 0 < pos <= len(wordlist))
# with the lists from the question: 'This and that and this and this.'

You can use operator.itemgetter() for this.
from operator import itemgetter
poslist = [0, 1, 2, 1, 3, 1, 4]
wordlist = ['This', 'and', 'that', 'this', 'this.']
print(' '.join(itemgetter(*poslist)(wordlist)))
Note that I had to subtract one from all of the items in poslist, as Python is a zero-indexed language. If you need to programmatically change poslist, you could do poslist = (n - 1 for n in poslist) right after you declare it.

Python intersection of 2 UNICODE tuple/List

I am trying to search for words in a file and append the resulting words from each line to a list. Then I want to find the words common to the two lists, list_1 and list_2. But I get this error:
TypeError: unhashable type: 'list'
# -*- coding: utf-8 -*-
import re
list_1 = []
list_2 = []
datafile = open(filename)
for line1 in datafile:
    if '1st word to be searched' in line1:
        s = line1
        left, right = re.findall(r'(\S+\s+\S+)\s+1stWordToBeSearched\s+(\S+\s+\S+)', s)[0]
        set1 = {left, right}
        list_1.extend([left,right])
        list_1 = list(list_1)
datafile1 = open(filename)
for line2 in datafile1:
    if ' 2nd word to be searched' in line2:
        s = line2
        left, right = re.findall(r'(\S+\s+\S+)\s+2ndWordTbeSearched\s+(\S+\s+\S+)', s)[0]
        set2 = {left, right}
        list_2.extend([left,right])
        list_2 = list(list_2)
result = set1.intersection(set2)
print (result)
In the first for loop, findall searches for sentences containing the word "number", then finds the words to the left and right of "number" and creates a list:
list_1 = [of, a, of, elements]
In the second for loop, findall searches for the word "modern", takes the words to its left and right, and creates a second list:
list_2 = [of, all, elements, are]
The file: Essays can consist of a number of elements, including literary criticism, political manifestos, learned arguments, observations of daily life, recollections, and reflections of the author of all modern elements are written in prose, but works in verse have been dubbed essays.
Once list_1 and list_2 are obtained, I want the words common to both.
Please note the file is NOT an English file. It is in a different language.

You have a list inside a list there, and a list cannot be an element of a set. Build the set from the flat list instead:
result = set(list_1).intersection(list_2)
set([]) # OK
set([[],[]]) # fails, because a list can't be hashed

You append one list object to each of your lists:
list_1.append([left,right])
and
list_2.append([left,right])
This gives you [[left, right]] for both lists, so you are trying to put the nested [left, right] list into a set as one element.
Normally, if you wanted to add multiple elements to an existing list, you'd use list.extend():
list_1.extend([left, right])
However, since your lists were empty in the first place and all you wanted to do was create a set intersection, you could just produce sets from those two elements in one step:
left, right = re.findall(r'(\S+\s+\S+)\s+1stWordToBeSearched\s+(\S+\s+\S+)', s)[0]
set1 = {left, right}
left, right = re.findall(r'(\S+\s+\S+)\s+2ndWordToBeSearched\s+(\S+\s+\S+)', s)[0]
set2 = {left, right}
result = set1.intersection(set2)
Note that you are ignoring all but the first two words! You are using [0] to take the first result of the findall() list here.
If you wanted to create an intersection of all the words, you could use a set comprehension to extract all the words into a set:
set1 = {word for matched in re.findall(r'(\S+\s+\S+)\s+1stWordToBeSearched\s+(\S+\s+\S+)', s)
        for word in matched}
set2 = {word for matched in re.findall(r'(\S+\s+\S+)\s+2ndWordToBeSearched\s+(\S+\s+\S+)', s)
        for word in matched}
result = set1.intersection(set2)
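
Note that each regex group captures a two-word phrase, so to intersect individual words the phrases still have to be split. A self-contained sketch on a shortened excerpt of the sample text follows. The helper name `neighbour_words` is made up, and stripping ASCII punctuation is an assumption that may not suit a non-English file.

```python
import re
import string

# Shortened excerpt of the sample text from the question.
text = ("Essays can consist of a number of elements, including literary criticism, "
        "and reflections of the author of all modern elements are written in prose.")

def neighbour_words(text, keyword):
    """Hypothetical helper: the words in the two-word phrases on either side of keyword."""
    pattern = r'(\S+\s+\S+)\s+' + re.escape(keyword) + r'\s+(\S+\s+\S+)'
    return {word.strip(string.punctuation)    # drop ASCII punctuation such as the comma in 'elements,'
            for matched in re.findall(pattern, text)
            for phrase in matched
            for word in phrase.split()}

set1 = neighbour_words(text, "number")
set2 = neighbour_words(text, "modern")
result = set1 & set2
print(result)
```

Both sets hold plain strings, which are hashable, so the intersection works without the TypeError.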

How to return the count of words from a list of words that appear in a list of lists?

I have a very large list of strings like this:
list_strings = ['storm', 'squall', 'overcloud',...,'cloud_up', 'cloud_over', 'plague', 'blight', 'fog_up', 'haze']
and a very large list of lists like this:
lis_of_lis = [['the storm was good blight'],['this is overcloud'],...,['there was a plague stormicide']]
How can I return a list with the count of words from list_strings that appear in each sub-list of lis_of_lis? For the above example, the desired output is: [2,1,1]
For example:
['storm', 'squall', 'overcloud',...,'cloud_up', 'cloud_over', 'plague', 'blight', 'fog_up', 'haze']
['the storm was good blight']
The count is 2, since storm and blight appear in this sublist of lis_of_lis.
['storm', 'squall', 'overcloud',...,'cloud_up', 'cloud_over', 'plague', 'blight', 'fog_up', 'haze']
['this is overcloud stormicide']
The count is 1, since only overcloud appears in this sublist.
stormicide is not counted, because it does not appear in the first list (list_strings).
['storm', 'squall', 'overcloud',...,'cloud_up', 'cloud_over', 'plague', 'blight', 'fog_up', 'haze']
['there was a plague']
The count is 1, since plague appears in this sublist.
Hence the desired output is [2,1,1].
The problem with all the answers is that they count substrings inside a word instead of matching the full word.

You can use the sum function within a list comprehension (note that this matches substrings, not whole words):
[sum(1 for i in list_strings if i in sub[0]) for sub in lis_of_lis]

result = []
for sentence in lis_of_lis:
    result.append(0)
    for word in list_strings:
        if word in sentence[0]:
            result[-1]+=1
print(result)
which is the long version of
result = [sum(1 for word in list_strings if word in sentence[0]) for sentence in lis_of_lis]
This will return [2,2,1] for your example.
If you want only whole words, add spaces before and after the words / sentences:
result = []
for sentence in lis_of_lis:
    result.append(0)
    for word in list_strings:
        if ' '+word+' ' in ' '+sentence[0]+' ':
            result[-1]+=1
print(result)
or short version:
result = [sum(1 for word in list_strings if ' '+word+' ' in ' '+sentence[0]+' ') for sentence in lis_of_lis]
This will return [2,1,1] for your example.
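
Since both lists are described as very large, another way to get whole-word matching is to split each sentence and intersect it with a set built from the word list. This is a sketch on a shortened version of the example data. The elided entries from the question are simply left out, and `vocabulary` is a name introduced here.

```python
# Shortened example data; the entries elided with "..." in the question are omitted.
list_strings = ['storm', 'squall', 'overcloud', 'plague', 'blight', 'haze']
lis_of_lis = [['the storm was good blight'],
              ['this is overcloud stormicide'],
              ['there was a plague']]

vocabulary = set(list_strings)   # set lookups avoid rescanning the whole word list

# For each sentence, count the distinct vocabulary words that occur as whole tokens.
counts = [len(vocabulary & set(sub[0].split())) for sub in lis_of_lis]
print(counts)
```

Splitting on whitespace means 'stormicide' is a single token and no longer matches 'storm'. Punctuation attached to a word would still prevent a match.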

This creates a dictionary with the words in list_strings as keys, and the values starting at 0. It then iterates through lis_of_lis, splits each phrase into a list of words, iterates through that, and checks whether each word is in the dictionary. If it is, 1 is added to the corresponding value.
word_count = dict()
for word in list_strings:
    word_count[word] = 0
for phrase in lis_of_lis:
    words_in_phrase = phrase[0].split() # each sub-list holds one sentence string
    for word in words_in_phrase:
        if word in word_count:
            word_count[word] += 1
This will create a dictionary with the words as keys, and the frequency as values. I'll leave it to you to get the correct output out of that data structure.
