This is the input given: "John plays chess and l u d o". I want the output to be in the format given below:
John plays chess and ludo.
I have tried a regular expression for removing spaces, but it doesn't work for me.
import re
sentence='John plays chess and l u d o'
sentence = re.sub(r"\s+", "", sentence, flags=re.UNICODE)
print(sentence)
I expected the output to be John plays chess and ludo.
But the output I got is Johnplayschessandludo
This should work! In essence, the solution extracts the single characters out of the sentence, makes them into a word, and joins that word back onto the rest of the sentence.
s = 'John plays chess and l u d o'
chars = []
idx = 0
# Get the word which is divided into single characters
while idx < len(s) - 1:
    # This will get the single characters surrounded by single spaces
    # (idx == 0 covers a single character at the very start of the string)
    if (idx == 0 or s[idx-1] == ' ') and s[idx].isalpha() and s[idx+1] == ' ':
        chars.append(s[idx])
    idx += 1
# This gets the single character if it is present as the last item
if s[len(s)-2] == ' ' and s[len(s)-1].isalpha():
    chars.append(s[len(s)-1])
# Create the word out of the single characters
join_word = ''.join(chars)
# Get the other words
old_words = [item for item in s.split() if len(item) > 1]
# Form the final string
res = ' '.join(old_words + [join_word])
print(res)
The output will then look like
John plays chess and ludo
The above code won't maintain the sequence of words while solving the problem.
For example, try entering the sentence "John plays c h e s s and ludo".
Try this instead if the text contains a word split into whitespace-separated single characters at any position:
sentence = "John plays c h e s s and ludo"
sentence_list = sentence.split()
index = [index for index, item in enumerate(sentence_list) if len(item) == 1]
join_word = "".join([item for item in sentence_list if len(item) == 1])
if index:
    # Remove all but one of the single-character entries...
    list(map(lambda x: sentence_list.pop(index[0]), index[:-1]))
    # ...and place the joined word into the remaining slot.
    sentence_list[index[0]] = join_word
    sentence = " ".join(sentence_list)
print(sentence)
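For comparison, a single regex with boundary lookarounds can collapse only the runs of spaced-out single letters while leaving normal words alone. This is a sketch (not one of the answers above); the helper name join_single_letters is made up for illustration:

```python
import re

def join_single_letters(text):
    # Match a run of two or more single letters separated by single spaces,
    # where each letter is bounded by whitespace (or the string edge),
    # and delete the spaces inside the matched run.
    pattern = r'(?<!\S)(?:\w (?=\w(?!\S)))+\w(?!\S)'
    return re.sub(pattern, lambda m: m.group(0).replace(' ', ''), text)

print(join_single_letters('John plays chess and l u d o'))
print(join_single_letters('John plays c h e s s and ludo'))
```

Because the substitution only touches the matched run, it preserves word order regardless of where the spaced-out word appears in the sentence.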
I tried using this code that I found online:
K = sentences
m = [len(i.split()) for i in K]
lengthorder = sorted(K, key=len, reverse=True)
#print(lengthorder)
#print("\n")
list1 = lengthorder
str1 = '\n'.join(list1)
print(str1)
print('\n')
Sentence1 = "We have developed speed, but we have shut ourselves in"
res = len(Sentence1.split())
print ("The longest sentence in this text contains" + ' ' + str(res) + ' ' + "words.")
Sentence2 = "More than cleverness we need kindness and gentleness"
res = len(Sentence2.split())
print ("The second longest sentence in this text contains" + ' ' + str(res) + ' ' + "words.")
Sentence3 = "Machinery that gives abundance has left us in want"
res = len(Sentence3.split())
print ("The third longest sentence in this text contains" + ' ' + str(res) + ' ' + "words.")
but it doesn't sort the sentences by word count; it sorts them by character length instead (as if measured in cm).
You can simply iterate through the different sentences and split them into words like this:
text = " We have developed speed. but we have. shut ourselves in Machinery that. gives abundance has left us in want Our knowledge has made us cynical Our cleverness, hard and unkind We think too much and feel too little More than machinery we need humanity More than cleverness we need kindness and gentleness"
# split into sentences
text2array = text.split(".")
i = 0
# iterate through the sentences and split them into words
for sentence in text2array:
    text2array[i] = sentence.split(" ")
    i += 1
# sort the sentences by word count
text2array.sort(key=len, reverse=True)
i = 0
# iterate through the sentences and print them to screen
for sentence in text2array:
    i += 1
    sentenceOut = ""
    for word in sentence:
        sentenceOut += " " + word
    sentenceOut += "."
    print("the nr " + str(i) + " longest sentence is" + sentenceOut)
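For comparison, the loop above can be condensed with sorted and a word-count key, which keeps each sentence intact as a string while ordering by number of words. A sketch, using a shortened version of the text:

```python
# Sketch: sort sentences by word count without building parallel arrays.
text = ("We have developed speed. but we have. shut ourselves in "
        "Machinery that. gives abundance has left us in want")

# Split on '.', trim whitespace, and drop empty fragments.
sentences = [s.strip() for s in text.split(".") if s.strip()]

# Sort by the number of whitespace-separated words, longest first.
by_word_count = sorted(sentences, key=lambda s: len(s.split()), reverse=True)

for rank, sentence in enumerate(by_word_count, start=1):
    print("the nr " + str(rank) + " longest sentence is " + sentence + ".")
```

Using key=lambda s: len(s.split()) is what makes this sort by word count rather than by character length.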
You can define a function that uses a regex to count the words in a given sentence:
import re

def get_word_count(sentence: str) -> int:
    return len(re.findall(r"\w+", sentence))
Assuming you already have a list of sentences, you can iterate the list and pass each sentence to the word count function then store each sentence and its word count in a dictionary:
sentences = [
    "Assume that this sentence has one word. Really?",
    "Assume that this sentence has more words than all sentences in this list. Obviously!",
    "Assume that this sentence has more than one word. Duh!",
]
word_count_dict = {}
for sentence in sentences:
    word_count_dict[sentence] = get_word_count(sentence)
At this point, the word_count_dict contains sentences as keys and their associated word count as values.
You can then sort word_count_dict by values:
sorted_word_count_dict = dict(
    sorted(word_count_dict.items(), key=lambda item: item[1], reverse=True)
)
Here's the full snippet:
import re

def get_word_count(sentence: str) -> int:
    return len(re.findall(r"\w+", sentence))

sentences = [
    "Assume that this sentence has one word. Really?",
    "Assume that this sentence has more words than all sentences in this list. Obviously!",
    "Assume that this sentence has more than one word. Duh!",
]
word_count_dict = {}
for sentence in sentences:
    word_count_dict[sentence] = get_word_count(sentence)
sorted_word_count_dict = dict(
    sorted(word_count_dict.items(), key=lambda item: item[1], reverse=True)
)
print(sorted_word_count_dict)
Let's assume that your sentences are already separated and there is no need to detect sentence boundaries.
So we have a list of sentences. Then we need to calculate the length of each sentence by word count. The basic way is to split on spaces, since each space separates two words in a sentence.
list_of_sen = ['We have developed speed, but we have shut ourselves in','Machinery that gives abundance has left us in want Our knowledge has made us cynical Our cleverness', 'hard and unkind We think too much and feel too little More than machinery we need humanity More than cleverness we need kindness and gentleness']
sen_len = [len(i.split()) for i in list_of_sen]
sen_len = sorted(sen_len, reverse=True)
for index, count in enumerate(sen_len):
    print(f'The {index+1} longest sentence in this text contains {count} words')
But if your sentences are not separated, we first need to detect the end of each sentence and then split them. Your sample data does not contain any punctuation that could be used to separate sentences. If we assume that your data does have punctuation, the code below can help.
see this question
from nltk import tokenize
p = "Good morning Dr. Adams. The patient is waiting for you in room number 3."
tokenize.sent_tokenize(p)
I want to write the first letter of every item while the line breaks stay the same, but when I turn the list into a string it's written on one line, like "I w t w t f l o e i w l s s". I want the output to look like "I w t \n w t f l \n o e i \n w l \n s s".
r = '''I want to
write the first letter
of every item
while linebreak
stay same'''
list_of_words = r.split()
m = [x[0] for x in list_of_words]
string = ' '.join([str(item) for item in m])
print(string)
What you are doing is splitting all the lines in a single go, so you lose the information about each line. You need to create a list of lists to preserve the line information.
When you provide no argument, split splits on any whitespace, which includes both ' ' and '\n'.
r = '''I want to
write the first letter
of every item
while linebreak
stay same'''
list_of_words = [i.split() for i in r.split('\n')]
m = [[y[0] for y in x] for x in list_of_words]
string = '\n'.join([' '.join(x) for x in m])
print(string)
I w t
w t f l
o e i
w l
s s
Via regexp
r = '''I want to
write the first letter
of every item
while linebreak
stay same'''
import re
string = re.sub(r"(.)\S+(\s)", r"\1\2", r + " ")[:-1]
print(string)
Output:
I t
w t f l
o e i
w l
s s
What you're doing is getting the first letter from each word of the list and then joining them. You are not keeping track of the \n characters in the string.
You could do this instead.
You could do this instead.
list_of_words = r.split('\n')
m = [[x[0] for x in y.split()] for y in list_of_words]
for i in m:
    string = ' '.join(i)
    print(string)
Output
I w t
w t f l
o e i
w l
s s
Here is a solution using a while loop:
r = '''I want to
write the first letter
of every item
while linebreak
stay same'''
total_lines = len(r.splitlines())
line_no = 0
while line_no < total_lines:
    words_line = r.splitlines()[line_no]
    list_of_words = words_line.split()
    m = [x[0] for x in list_of_words]
    print(' '.join([str(item) for item in m]))
    line_no = line_no + 1
Since many valid methods have already been provided, here's a concise way of doing the same task without str.split(), which creates unnecessary intermediate lists in memory (not that that is a problem in this case).
This method takes advantage of str.isspace() to deliver the whole set of instructions in one line:
string = "".join([string[i] for i in range(len(string)) if string[i].isspace() or string[i-1].isspace() or i == 0])
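Running it on the multi-line string from the question shows that the line breaks survive, since '\n' characters satisfy isspace() and are therefore kept:

```python
string = '''I want to
write the first letter
of every item
while linebreak
stay same'''

# Keep a character if it is whitespace, follows whitespace, or starts the string.
result = "".join(string[i] for i in range(len(string))
                 if string[i].isspace() or string[i - 1].isspace() or i == 0)
print(result)
```

Each line collapses to its words' first letters, and the newlines pass through untouched.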
I have a list with values, say list = ['Copper', 'Wood', 'Glass', 'Metal']
string = 'Wood table with small glass center,little bit of metal'
I need to search whether specific values are present in my string, but I should ignore the less prominent values like glass and metal based on nearby words.
I tried re.findall and I got the output Wood, Glass, Metal. How do I ignore 'Glass' and 'Metal' in this case by using nearby keywords such as 'small' and 'little'?
Expected Output = [Wood]
My understanding: What I understand from your question is that you are trying to remove values from the list that follow words such as 'small' and 'little'.
Code:
lst = ['Copper', 'Wood', 'Glass', 'Metal']
string = 'Wood table with small glass center,little bit of metal'
keywords = ['small', 'little']
punc = '''!()-[]{};:'"\, <>./?##$%^&*_~'''
for ele in string:
    if ele in punc:
        string = string.replace(ele, " ")
lst = [stringg.lower() for stringg in lst]
string = string.lower()
lst = [word for word in lst if word.lower() in string.lower()]
words_lst = string.split(' ')
final = []
count = 0
for elem in lst:
    count = 0
    index = words_lst.index(elem)
    slice_index = index - 4 if index - 4 >= 0 else 0
    range_lst = words_lst[slice_index:index + 1]
    for keyword in keywords:
        if keyword not in range_lst and elem not in final:
            count += 1
    if count == len(keywords):
        final.append(elem)
Output:
>>> final
['wood']
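A more compact version of the same idea: reject a material if a size keyword appears in the few words before its mention. This is a sketch, and the four-word window is the same assumption the answer above makes:

```python
import re

lst = ['Copper', 'Wood', 'Glass', 'Metal']
string = 'Wood table with small glass center,little bit of metal'
keywords = ['small', 'little']

# Replace punctuation with spaces and lowercase everything before tokenising.
words = re.sub(r"[^\w\s]", " ", string).lower().split()

final = []
for material in (m.lower() for m in lst):
    if material not in words:
        continue
    idx = words.index(material)
    # Assumed window: the four words before the material mention.
    window = words[max(0, idx - 4):idx]
    if not any(keyword in window for keyword in keywords):
        final.append(material)

print(final)
```

Tokenising once with a regex avoids the character-by-character punctuation loop, and the window check reads as a single any() test per material.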
I have a list of sentences. I need to find the start phrase and end phrase in each sentence, if present, and get the middle element. If the middle element has more than one word, it should be skipped, moving on to the next occurrence.
list of sentences
para_list = [["hello sir good afternoon calling to you from news18 curious"],["a pleasant welcome from enws18 team what can i"], ["hi a good afternoon sir from news18"]]
start phrase
to_find_s =['good','afternoon']
end phrase
to_find_l = ['from','news18']
Code
for i, w in enumerate(para_list):
    l = [sentence for sentence in w if all(word in w for word in to_find_s)]
    if l:
        m = [sentence for sentence in w if all(word in w for word in to_find_l)]
Output
I am getting the sentences in which the phrases are present, but I am not able to get the middle term.
Expected Output
list = ['sir']  # from the last list. There would not be any item from the first list, as it has two words in between: 'to you'
The following function will do the work.
def find_middle_phrase(para_list, to_find_s, to_find_l):
    output_list = []
    start_phrase, end_phrase = " ".join(to_find_s), " ".join(to_find_l)
    for para in para_list:
        para_string = para[0]
        if para_string.find(start_phrase) != -1 and para_string.find(end_phrase) != -1:
            required_phrase_starting_index = para_string.index(start_phrase) + len(start_phrase)
            required_phrase_ending_index = para_string.index(end_phrase)
            required_output_string = para_string[required_phrase_starting_index:required_phrase_ending_index].strip()
            # keep only single-word middles
            if required_output_string.find(" ") == -1:
                output_list.append(required_output_string)
    return output_list
EXAMPLE:
para_list = [["hello sir good afternoon calling to you from news18 curious"],["a pleasant welcome from enws18 team what can i"], ["hi a good afternoon sir from news18"]]
to_find_s =['good','afternoon']
to_find_l = ['from','news18']
expected_output = find_middle_phrase(para_list, to_find_s, to_find_l)
print(expected_output)
Got output:
['sir']
This produces the expected output:
for sentence in para_list:
    words = sentence[0].split()
    # stop early enough that words[i+4] is always in bounds
    for i in range(len(words) - 4):
        if words[i] == to_find_s[0] and words[i+1] == to_find_s[1]:
            if words[i+3] == to_find_l[0] and words[i+4] == to_find_l[1]:
                m = words[i+2]
                print(m)
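For what it's worth, a regular expression can express the same constraint (exactly one word between the two phrases) directly. This is a sketch, not part of the answers above:

```python
import re

para_list = [["hello sir good afternoon calling to you from news18 curious"],
             ["a pleasant welcome from enws18 team what can i"],
             ["hi a good afternoon sir from news18"]]
to_find_s = ['good', 'afternoon']
to_find_l = ['from', 'news18']

# Exactly one whitespace-free token (\S+) between the start and end phrases.
pattern = re.compile(re.escape(" ".join(to_find_s))
                     + r" (\S+) "
                     + re.escape(" ".join(to_find_l)))

middles = [m.group(1) for para in para_list
           for m in [pattern.search(para[0])] if m]
print(middles)
```

Because \S+ cannot cross a space, multi-word middles like "calling to you" fail to match and are skipped automatically.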
I wanted to create a program which asks the user to enter a list of positions in a plain text file, saves the positions the user entered in the text file as a list, then asks the user to enter the word each position represents (in the same order as the list of positions), and re-creates the sentence. However, when I run this code:
import subprocess
subprocess.Popen(["notepad","list_of_numbers.txt"])
p =open("list_of_numbers.txt","r")
l = p.read()
p.close()
positions = list(l)
subprocess.Popen(["notepad","list_of_words.txt"])
s = open("list_of_words.txt","r")
s.read()
s.close()
sentence = str(s)
print (sentence)
mapping = {}
words = sentence.split()
for (position, word) in zip(positions, words):
    mapping[position] = word
output = [mapping[position] for position in positions]
print(' '.join(output))
and I enter
1 2 3 4 5 1 2 3 4 5
as the list of positions,
and this as the list of words:
this is a repeated sentence
the output should be:
this is a repeated sentence this is a repeated sentence
but i get
"key error:3"
Im think they problem is i didnt store the list of position into a list properly but im not sure. Can somebody help me?
Try this one:
import subprocess
subprocess.Popen(["notepad", "list_of_numbers.txt"]).wait()  # wait until Notepad is closed so the file is saved
with open("list_of_numbers.txt", "r") as p:
    l = p.read()
positions = l.split()  # see, you had to create the list by splitting the string
subprocess.Popen(["notepad", "list_of_words.txt"]).wait()
with open("list_of_words.txt", "r") as s:
    sentence = s.read()  # you had to assign the string to a variable
print(sentence)
mapping = {}
words = sentence.split()
for position, word in zip(positions, words):
    mapping[position] = word
output = [mapping[position] for position in positions]
print(' '.join(output))
but it could also be
import subprocess
subprocess.Popen(["notepad", "list_of_numbers.txt"]).wait()
with open("list_of_numbers.txt", "r") as pos_file:
    positions = pos_file.read().split()
subprocess.Popen(["notepad", "list_of_words.txt"]).wait()
with open("list_of_words.txt", "r") as sentence_file:
    words = sentence_file.read().split()
mapping = dict(zip(positions, words))
output = [mapping[position] for position in positions]
print(' '.join(output))
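Alternatively, the positions can be treated as 1-based indices into the word list, which sidesteps the dictionary altogether. A sketch with the file contents inlined as strings for illustration:

```python
# Contents of list_of_numbers.txt and list_of_words.txt, inlined for illustration.
positions_text = "1 2 3 4 5 1 2 3 4 5"
words_text = "this is a repeated sentence"

positions = [int(p) for p in positions_text.split()]
words = words_text.split()

# Position n selects the nth word (1-based), so repeated positions repeat words.
output = " ".join(words[p - 1] for p in positions)
print(output)  # this is a repeated sentence this is a repeated sentence
```

Converting the positions to int also avoids the original KeyError, which came from using raw characters of the file as dictionary keys.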