Take the containing string array after split in Python [closed]

If I have some text like:
text= " First sentence. Second sentence. Third sentence."
And after this I split by '.':
new_split = text.split('.')
I will receive: ['First sentence', Second sentence','Third sentence']
How could I print entire second sentence if I call it?
Like:
if 'second' in new_split : print (new_split[GET SECOND SENTENCE])
I would like to know how to get entire 'second sentence' if I know in my split there exists a sentence that contains my keyword.

To find the index of the first sentence in a list of sentences that contains a given substring:
i = next(i for i, sentence in enumerate(sentences) if word in sentence)
The same thing written as a plain loop:
for i, sentence in enumerate(sentences):
    if word in sentence:
        break
else:
    i = None  # the word is not in any of the sentences
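A minimal sketch tying this back to the question's data; the default argument to next() avoids a StopIteration when no sentence matches (the variable names here are just for illustration):
text = " First sentence. Second sentence. Third sentence."
sentences = text.split('.')
word = 'Second'
match = next((s for s in sentences if word in s), None)
if match is not None:
    print(match.strip())  # prints: Second sentence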

text= " First sentence. Second sentence. Third sentence."
[print(i) for i in text.split('.') if "second" in i.lower()]
which prints:
Second sentence
Above is the shortest way I could think of doing it in terms of lines, but you can just as easily do it with a for loop rather than a list comprehension:
for sentence in text.split('.'):
    if "second" in sentence.lower():
        print(sentence)

You can try this:
text= " First sentence. Second sentence. Third sentence."
new_text = [i for i in text.split('.') if "second" in i.lower()][0]
Output:
' Second sentence'
Note that indexing with [0] raises an IndexError if no part of the split contains the keyword.

This only works if you already know that the sentence you want is at index 1 of the split:
if new_split[1] != '':
    print(new_split[1])

Related

How do you put the words of a large text file into a list [closed]

I have a large text file (a whole book) and I want to extract the words from the text file into a list.
If you care about removing punctuation:
import re
testString = 'Word test! Hello, how are you?'
wordList = re.findall(r'\w+', testString)
print(wordList)
['Word', 'test', 'Hello', 'how', 'are', 'you']
Otherwise:
testString = 'Word test! Hello, how are you?'
wordList = testString.split()
print(wordList)
['Word', 'test!', 'Hello,', 'how', 'are', 'you?']
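A hedged sketch applying the same idea to a file on disk; the filename book.txt is just a placeholder, and this assumes the whole file fits in memory:
import re

with open('book.txt', encoding='utf-8') as f:  # hypothetical filename
    text = f.read()

word_list = re.findall(r'\w+', text)
print(word_list[:10])  # first ten words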
First you need to open the file:
file = open('yourfile.fmt')
After opening, simply read all the lines:
lines = file.readlines()
After that, join the lines into one string with a space in between them, stripping each line to remove extra whitespace (and remember to close the file, or open it with a with statement):
string = ' '.join(line.strip() for line in lines)
Finally you can split your string and get the words:
words = string.split()
If you are using this for AI or data science, there are packages that can make your life easier, like sklearn and nltk.
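For example, a minimal sketch with nltk, assuming nltk is installed and the tokenizer data has been downloaded:
import nltk

nltk.download('punkt')  # one-time download of the tokenizer data (newer nltk versions may ask for 'punkt_tab')

with open('yourfile.fmt', encoding='utf-8') as f:
    text = f.read()

words = nltk.word_tokenize(text)
print(words[:10])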

How to split this list into words & find most common top 5 words & tags in python? [closed]

s = ['how are you r',
     'many many happy returns of the day',
     'lets go for a walf']
I have tried the following:
from collections import Counter
split_it =[i.split('\t')[0] for i in s]
Counter = Counter(split_it)
most_occur = Counter.most_common(5)
print(most_occur)
The output shows sentence occurrences, not words, and I have no idea how to write code for the tags. How do I get the top 5 tags and words?
When you split each sentence you get a list of lists, and you are then counting whole sentences rather than words. You should flatten it. Also, you should be splitting on spaces, not tabs:
s = ['how are you r',
     'many many happy returns of the day',
     'lets go for a walf']
from collections import Counter
flattened = [item for sentence in s for item in sentence.split()]
counter = Counter(flattened)
most_occur = counter.most_common(5)
print(most_occur)
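On CPython 3.7+, where most_common orders equal counts by first appearance, this should print:
[('many', 2), ('how', 1), ('are', 1), ('you', 1), ('r', 1)]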
To get only the words you could iterate the result and take only the term part of the tuple:
print([t for t,n in most_occur])
Not sure what you mean about tags. I don't see any tags in your example.
Your i.split('\t')[0] splits each sentence on tabs ('\t') and keeps only the first piece, but s doesn't contain any tabs, so each piece is the whole sentence.
You can split on whitespace with [i.split() for i in s], though you will then get a list of lists.
My advice is to break this up into steps: look at what split_it returns and evaluate whether that was what you expected.
If you only want words and are not interested in sentences, then you need to break the sentences apart into words and put all those words into another list.
In explicit terms:
my_new_list = []
for sent in s:
    words = sent.split()
    for word in words:
        my_new_list.append(word)
counts = Counter(my_new_list).most_common(5)
print(counts)
This will give you individual word counts.

Special characters to the end of sentence [closed]

For a random string such as:
H!i I am f.rom G3?ermany
how can I move all the special characters to the end of the word, for instance:
Hi! I am from. Germany3?
You can try this one:
s = "H!i I am f.rom G3?ermany"
l = []
for i in s.split():
    k = [j for j in i if j.isalpha()]
    for m in i:
        if not m.isalpha():
            k.append(m)
    l.append(''.join(k))
print(' '.join(l))
It will output:
Hi! I am from. Germany3?
In Python 2.x you can do it in a single line like:
k = ' '.join([filter(str.isalpha, i) + ''.join([j for j in i if not j.isalpha()]) for i in s.split()])
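In Python 3, filter() returns an iterator rather than a string, so a hedged one-line equivalent could join both parts explicitly:
k = ' '.join(''.join(j for j in i if j.isalpha()) + ''.join(j for j in i if not j.isalpha()) for i in s.split())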
I'm defining a special character as anything that's not a-z, A-Z, or whitespace.
You can split the string into words, use regex to find the special characters in each word, remove them, add them back to the end of the word, then join the words together to create your new string:
import re
string = "H!i I am f.rom G3?ermany"
words = string.split(' ')
pattern = re.compile(r'[^a-zA-Z\s]')
new = ' '.join([re.sub(pattern, '', w) + ''.join(pattern.findall(w)) for w in words])
That will turn H!i I am f.rom G3?ermany into Hi! I am from. Germany3?

Python Find a word in a list [closed]

Hello, I have to write a program that identifies all of the positions where a word occurs in a list, but when I run my program it doesn't output anything.
Here's my code:
sentence =("ASK NOT WHAT YOUR CONTRY CAN DO FOR ASK WHAT YOU CAN DO FOR YOUR CONTRY") #This is a list
print (sentence)
text = input("Choose a word from the sentence above")#this prints out some text with an input
sentence = sentence.split(" ")# This splits the list
text = text.upper ()# this makes the text in capital letters
def lookfor ():
if text in sentence:
for i in sentence:
value = sentence.index(sentence)
print ("Your word has been found in the sentence at the position", value + "and", value )
else:
print ("The word that you have typed is not found in the sentence.")
Thank you
To answer your question, nothing is happening because you aren't calling the function.
There's a lot of work remaining on your function, but here are some general tips:
1) index() only finds the first instance of an element in a list
2) You can't be sure that a word is in your sentence exactly twice
3) Use descriptive variable names. For example, for word in sentence makes a lot more sense intuitively.
You can do something like this:
sentence =("ASK NOT WHAT YOUR CONTRY CAN DO FOR ASK WHAT YOU CAN DO FOR YOUR CONTRY") #This is a list
print (sentence)
text = raw_input("Choose a word from the sentence above: ")#this prints out some text with an input
sentence = sentence.split(" ")# This splits the list
text = text.upper ()# this makes the text in capital letters
def lookfor (text):
indexes = [ idx+1 for word, idx in zip(sentence, range(0,len(sentence))) if text == word ]
print ("Your word has been found in the sentence at these positions", indexes )
if not indexes:
print ("The word that you have typed is not found in the sentence.")
lookfor(text)
Example:
ASK NOT WHAT YOUR CONTRY CAN DO FOR ASK WHAT YOU CAN DO FOR YOUR CONTRY
Choose a word from the sentence above: for
('Your word has been found in the sentence at these positions', [8, 14])
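For comparison, a hedged Python 3 sketch of the same idea using enumerate, which reads more clearly than pairing zip with range:
sentence = "ASK NOT WHAT YOUR CONTRY CAN DO FOR ASK WHAT YOU CAN DO FOR YOUR CONTRY"
words = sentence.split(" ")
target = input("Choose a word from the sentence above: ").upper()
positions = [idx + 1 for idx, word in enumerate(words) if word == target]
if positions:
    print("Your word has been found at these positions:", positions)
else:
    print("The word that you have typed is not found in the sentence.")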

Elongated word check in sentence [closed]

I want to check whether a sentence contains elongated words, for example soooo, toooo, thaaatttt, etc. I don't know what the user might type, because I have a list of sentences which may or may not have elongated words. How do I check that in Python? I am new to Python.
Try this:
import re
s1 = "This has no long words"
s2 = "This has oooone long word"
def has_long(sentence):
    elong = re.compile(r"([a-zA-Z])\1{2,}")
    return bool(elong.search(sentence))
print(has_long(s1))
False
print(has_long(s2))
True
@HughBothwell had a good idea. As far as I know, there is not a single English word that has the same letter repeated three consecutive times. So, you can search for words that do this:
>>> from re import search
>>> mystr = "word word soooo word tooo thaaatttt word"
>>> [x for x in mystr.split() if search(r'(?i)([a-z])\1\1+', x)]
['soooo', 'tooo', 'thaaatttt']
>>>
Any you find will be elongated words.
Well, you can make a list of every elongated word logically possible. Then loop through the words in the sentence then the words in the list to find elongated words.
sentence = "Hoow arre you doing?"
elongated = ["hoow",'arre','youu','yoou','meee'] #You will need to have a much larger list
for word in sentence:
word = word.lower()
for e_word in elongated:
if e_word == word:
print "Found an elongated word!"
If you wanted to do what Hugh Bothwell said, then:
sentence = "Hooow arrre you doooing?"
elongations = ["aaa","ooo","rrr","bbb","ccc"]#continue for all the letters
for word in sentence:
for x in elongations:
if x in word.lower():
print '"'+word+'" is an elongated word'
You need to have a reference of valid English words available. On *NIX systems, you could use /etc/share/dict/words or /usr/share/dict/words or equivalent and store all the words into a set object.
Then, you'll want to check, for every word in a sentence,
That the word is not itself a valid word (i.e., word not in all_words); and
That, when you shorten all consecutive sequences to one or two letters, the new word is a valid word.
Here's one way you might try to extract all of the possibilities:
import re
import itertools

regex = re.compile(r'(\w)\1\1')
all_words = set(get_all_words())  # get_all_words() is a placeholder for loading your dictionary

def without_elongations(word):
    if regex.search(word) is None:
        return [word]  # nothing left to shorten
    replacing_with_one_letter = regex.sub(r'\1', word, count=1)
    replacing_with_two_letters = regex.sub(r'\1\1', word, count=1)
    return list(itertools.chain(
        without_elongations(replacing_with_one_letter),
        without_elongations(replacing_with_two_letters),
    ))

for word in sentence.split():
    if word not in all_words:
        if any(w in all_words for w in without_elongations(word)):
            print('%(word)s is elongated' % {'word': word})
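A hedged sketch of the missing loader, assuming a *NIX word list at /usr/share/dict/words (the path and its contents vary by system):
def get_all_words():
    # Placeholder loader for the dictionary referenced above.
    with open('/usr/share/dict/words', encoding='utf-8') as f:
        return {line.strip().lower() for line in f}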
