Splitting only long words in string


I have some random string, let's say :
s = "This string has some verylongwordsneededtosplit"
I'm trying to write a function trunc_string(string, len) that takes the string to operate on and 'len', the number of chars after which long words will be split.
The result should look something like this:
str = trunc_string(s, 10)
str = "This string has some verylongwo rdsneededt osplit"
For now I have something like this :
def truncate_long_words(s, num):
    """Splits long words in string"""
    words = s.split()
    for word in words:
        if len(word) > num:
            split_words = list(word)
After this part I have the long word as a list of chars. Now I need to:
join 'num' chars together in some temporary word_part list
join all word_parts into one word
join this word with the rest of the words that weren't long enough to be split.
Should I do it in a similar way to this?
counter = 0
for char in split_words:
    word_part.append(char)
    counter = counter + 1
    if counter == num:
And here I should somehow join all the word_part pieces together to create the word, and so on.

def split_word(word, length=10):
    return (word[n:n+length] for n in range(0, len(word), length))

string = "This string has some verylongwordsneededtosplit"
print([item for word in string.split() for item in split_word(word)])
# ['This', 'string', 'has', 'some', 'verylongwo', 'rdsneededt', 'osplit']
Note: it's a bad idea to name your string str. It shadows the built-in type.
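To get back a single string, as the question asks, the pieces can be re-joined with spaces. A minimal sketch, assuming Python 3 and that runs of whitespace may be collapsed to single spaces; trunc_string is the name used in the question and split_word is the helper from this answer:

```python
def split_word(word, length=10):
    """Yield consecutive pieces of `word`, each at most `length` chars long."""
    return (word[n:n + length] for n in range(0, len(word), length))


def trunc_string(text, length=10):
    """Split only the long words, then join everything back with spaces."""
    return " ".join(
        piece for word in text.split() for piece in split_word(word, length)
    )


s = "This string has some verylongwordsneededtosplit"
print(trunc_string(s, 10))
# This string has some verylongwo rdsneededt osplit
```

Words of length 10 or less produce a single piece, so they pass through unchanged.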

An option is the textwrap module, which wraps the whole text to a given width (so short words are grouped onto lines as well):
http://docs.python.org/2/library/textwrap.html
Example usage:
>>> import textwrap
>>> s = "This string has some verylongwordsneededtosplit"
>>> lines = textwrap.wrap(s, width=10)
>>> for line in lines: print(line)
...
This
string has
some veryl
ongwordsne
ededtospli
t
>>>
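If only the long words should be broken, one possible variation is to wrap each word on its own. This is a sketch under the assumption that words are whitespace-separated; trunc_string is the name from the question, and break_on_hyphens=False is chosen here so that hyphenated words are also cut at exactly the given width:

```python
import textwrap


def trunc_string(text, width=10):
    """Wrap each word separately so only over-long words get split."""
    pieces = []
    for word in text.split():
        # A word that already fits comes back as a one-element list.
        pieces.extend(textwrap.wrap(word, width=width, break_on_hyphens=False))
    return " ".join(pieces)


s = "This string has some verylongwordsneededtosplit"
print(trunc_string(s, 10))
# This string has some verylongwo rdsneededt osplit
```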

Why not use a generator:
def truncate_long_words(s, num):
    """Splits long words in string"""
    words = s.split()
    for word in words:
        if len(word) > num:
            for i in range(0, len(word), num):
                yield word[i:i+num]
        else:
            yield word

for t in truncate_long_words(s, 10):
    print(t)

Abusing regex:
import re

def trunc_string(s, num):
    return re.sub(r"(\w{%d}\B)" % num, r"\1 ", s)

assert "This string has some verylongwo rdsneededt osplit" == trunc_string("This string has some verylongwordsneededtosplit", 10)
(Edit: adopted simplification by Brian. Thanks. But I kept the \B to avoid adding a space when the word is exactly 10 characters long.)
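The effect of \B is easiest to see on words that are exactly at, or just over, the limit. A small sketch, assuming Python 3; the sample words are made up for illustration:

```python
import re


def trunc_string(s, num):
    # \w{num} grabs num word characters; \B lets the match succeed only
    # when another word character follows, i.e. the word continues.
    return re.sub(r"(\w{%d})\B" % num, r"\1 ", s)


print(trunc_string("abcdefghij", 10))   # exactly 10 chars: unchanged
print(trunc_string("abcdefghijk", 10))  # 11 chars: abcdefghij k

# Without \B, a trailing space is added after a word of exactly 10 chars.
print(repr(re.sub(r"(\w{10})", r"\1 ", "abcdefghij")))  # 'abcdefghij '
```

Because \w matches only letters, digits and underscores, punctuation inside a token ends the run being counted.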

Related

How to solve the string indices must be integers problem in a for loop for capitalizing every word in a string

I hope everyone is safe.
I am trying to go over a string and capitalize the first letter of every word in the string.
I know I can use .title() but
a) I want to figure out how to use capitalize or something else in this case (the basics), and
b) the strings in the tests have some words with an apostrophe ('), which confuses .title() and makes it capitalize the letter after the (').
def to_jaden_case(string):
    appended_string = ''
    word = len(string.split())
    for word in string:
        new_word = string[word].capitalize()
        appended_string += str(new_word)
    return appended_string
The problem is that the interpreter gives me "TypeError: string indices must be integers", even though I have an integer in 'word'. Any help?
Thanks!

You are doing some strange things in the code.
First, you split the string just to count the number of words, but don't store it to manipulate the words after that.
Second, when iterating a string with a for in, what you get are the characters of the string, not the words.
I have made a small snippet to help you do what you desire:
def first_letter_of_word_upper(string, exclusions=["a", "the"]):
    words = string.split()
    for i, w in enumerate(words):
        if w not in exclusions:
            words[i] = w[0].upper() + w[1:]
    return " ".join(words)

test = first_letter_of_word_upper("miguel angelo santos bicudo")
test2 = first_letter_of_word_upper("doing a bunch of things", ["a", "of"])
print(test)
print(test2)
Notes:
I assigned the result of splitting the string to a variable so it can be used in the loop.
As a bonus, I included a list that lets you exclude words you don't want to capitalize.
I reuse the same list of split words to build the result, and then join based on that list. This is an efficient way to do it.
Also, I show some useful Python tricks. First is enumerate(iterable), which returns tuples (i, j) where i is the positional index and j is the value at that position. Second, I use w[1:] to get the substring of the current word that starts at character index 1 and goes all the way to the end of the string. Third is the use of optional parameters in the function's argument list. Really useful things to learn, if you didn't know them already. =)

You have a logical error in your code:
You used word = len(string.split()), which serves no purpose. There is also an issue in the for loop logic.
Try this:
def to_jaden_case(string):
    appended_string = ''
    word_list = string.split()
    for i in range(len(word_list)):
        new_word = word_list[i].capitalize()
        appended_string += str(new_word) + " "
    return appended_string.rstrip()

from re import findall

def capitalize_words(string):
    words = findall(r'\w+[\']*\w+', string)
    for word in words:
        string = string.replace(word, word.capitalize())
    return string
This just grabs all the words in the string, then replaces those words in the original string. The characters inside the [ ] are included in the word as well. Note that the pattern needs at least two word characters, so a one-letter word such as "a" is not matched.

You are using a string as an index into another string: word is a string (a single character), and accessing string[word] causes the error.
def to_jaden_case(string):
    appended_string = ''
    for word in string.split():
        new_word = word.capitalize()
        appended_string += new_word + ' '
    return appended_string.rstrip()
A simple solution using map():
def to_jaden_case(string):
    return ' '.join(map(str.capitalize, string.split()))
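The apostrophe problem from the question can be checked directly. A short comparison, assuming Python 3; the sample sentence is made up, and string.capwords is a standard-library helper that splits, capitalizes each word and re-joins:

```python
import string

quote = "most trees aren't blue"

# str.title() restarts capitalization after any non-letter, including '
print(quote.title())                                     # Most Trees Aren'T Blue

# Capitalizing whitespace-separated words leaves the apostrophe alone.
print(" ".join(w.capitalize() for w in quote.split()))   # Most Trees Aren't Blue
print(string.capwords(quote))                            # Most Trees Aren't Blue
```

Note that capitalize() also lower-cases the rest of each word, which may or may not be wanted.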

In for word in string:, word iterates over the characters in string. What you want to do is something like this:
def to_jaden_case(string):
    appended_string = ''
    splitted_string = string.split()
    for word in splitted_string:
        new_word = word.capitalize()
        appended_string += new_word
    return appended_string
The output for to_jaden_case("abc def ghi") is now "AbcDefGhi", which is CamelCase. I suppose you actually want "Abc Def Ghi". To achieve that, you must do:
def to_jaden_case(string):
    appended_string = ''
    splitted_string = string.split()
    for word in splitted_string:
        new_word = word.capitalize()
        appended_string += new_word + " "
    return appended_string[:-1]  # removes the last space.

In your code, word is a character of string, not an index, so you can't use string[word]. You can correct this by modifying your loop or by using word instead of string[word].
So your rectified code will be:
def to_jaden_case(string):
    appended_string = ''
    for word in range(len(string)):
        new_word = string[word].capitalize()
        appended_string += str(new_word)
    return appended_string
Here I changed the third line from for word in string to for word in range(len(string)), which gives you the index of each character so you can use it.
I also removed the split line, because it's unnecessary.
Note that this removes the TypeError but capitalizes every single character, because the loop still walks over characters rather than words.

Is there any way to simplify this list comprehension with if statement method?

I am trying to take a string and remove vowels from words with more than 4 characters.
Is there a more efficient way to write this code?
(1) Make an array from string.
(2) Loop through an array and remove vowels from strings with more than 4 characters.
(3) Join strings in array to form new string.
Thanks!
def abbreviate_sentence(sent):
    split_string = sent.split()
    for words in split_string:
        abbrev = [words.replace("a", "").replace("e", "").replace("i", "").replace("o", "").replace("u", "")
                  if len(words) > 4 else words for words in split_string]
    sentence = " ".join(abbrev)
    return sentence

print(abbreviate_sentence("follow the yellow brick road"))  # => "fllw the yllw brck road"

You can drop the outer for loop because the comprehension already iterates. You can also replace the chained replace calls with another list comprehension nested inside the existing one.
# for words in split_string: <- This line is not required
vowels = 'aeiou'
abbrev = [''.join([x for x in words if x.lower() not in vowels]) if len(words) > 4 else words for words in split_string]
sentence = " ".join(abbrev)
return sentence
Or move the word-building part into a new function, which probably improves readability:
def form_word(words):
    vowels = 'aeiou'
    return ''.join([x for x in words if x.lower() not in vowels])

def abbreviate_sentence(sent):
    split_string = sent.split()
    abbrev = [form_word(words) if len(words) > 4 else words for words in split_string]
    sentence = " ".join(abbrev)
    return sentence

Austin's solution or the one below should both work. I don't think either is much more efficient computationally than what you have now, so I'd focus on readability and ease of reasoning.
def abbreviate_sentence(sent):
    abbrev = []
    for word in sent.split():
        if len(word) > 4:
            abbrev.append(word.replace("a", "").replace("e", "").replace("i", "").replace("o", "").replace("u", ""))
        else:
            abbrev.append(word)
    return " ".join(abbrev)

print(abbreviate_sentence("follow the yellow brick road"))
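Another way to avoid the chain of replace calls is a translation table. This is a sketch, assuming Python 3; REMOVE_VOWELS is a name introduced here for illustration, and it deletes only the lowercase vowels, like the code in the question:

```python
# Translation table that deletes the lowercase vowels.
REMOVE_VOWELS = str.maketrans("", "", "aeiou")


def abbreviate_sentence(sent):
    """Drop vowels from words longer than 4 characters."""
    return " ".join(
        word.translate(REMOVE_VOWELS) if len(word) > 4 else word
        for word in sent.split()
    )


print(abbreviate_sentence("follow the yellow brick road"))
# fllw the yllw brck road
```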

How to print the following code's output on one line using join in Python? Currently, for a two-word input, I get the output on two lines

def cat_latin_word(text):
    """ convert the string in another form
    """
    constant = "bcdfghjklmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ"
    for word in text.split():
        if word[0] in constant:
            word = (str(word)[-1:] + str(word)[:4] + "eeoow")
        else:
            word = (str(word) + "eeoow")
        print(word)

def main():
    """ converts"""
    text = input("Enter a sentence ")
    cat_latin_word(text)

main()

A few pointers:
Converting your code to "one line" doesn't make it better.
No need to type out all consonants; use the string module, and use a set for O(1) lookup complexity.
Use formatted string literals (Python 3.6+) for more readable and efficient code.
No need to use str on variables which are already strings.
For a single line, you can use a list comprehension with a conditional expression and ' '.join.
Here's a working example:
from string import ascii_lowercase, ascii_uppercase

def cat_latin_word(text):
    consonants = (set(ascii_lowercase) | set(ascii_uppercase)) - set('aeiouAEIOU')
    print(' '.join([f'{word}eeoow' if word[0] not in consonants else
                    f'{word[-1:]}{word[:4]}eeoow' for word in text.split()]))

text = input("Enter a sentence ")
cat_latin_word(text)

You may collect all the words in a list, or use print() in a different way.
Example:
print(word, end="\t")
Here I set the keyword argument end to '\t' (the default is '\n').
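As a small illustration of the end argument, here is a sketch that prints a list of words on one line, assuming Python 3; print_on_one_line and the sample words are hypothetical and not part of the question's code:

```python
def print_on_one_line(words):
    for word in words:
        print(word, end=" ")  # a space instead of the default newline
    print()                   # finish the line once at the end


print_on_one_line(["first", "second"])

# Unpacking the list gives a similar result: print separates with spaces.
print(*["first", "second"])
```

The loop version leaves a trailing space before the newline; the unpacking version does not.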

I simply edited your code to return the results as words separated by spaces.
def cat_latin_word(text):
    constant = "bcdfghjklmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ"
    result = []
    for word in text.split():
        if word[0] in constant:
            word = (str(word)[-1:] + str(word)[:4] + "eeoow")
            result.append(word)
        else:
            word = (str(word) + "eeoow")
            result.append(word)
    return ' '.join(result)

def main():
    text = 'ankit jaiswal'
    print(cat_latin_word(text))

main()

Removing words containing digits from a given string

I'm trying to write a simple program that removes all words containing digits from a received string.
Here is my current implementation:
import re

def checkio(text):
    text = text.replace(",", " ").replace(".", " ").replace("!", " ").replace("?", " ").lower()
    counter = 0
    words = text.split()
    print(words)
    for each in words:
        if bool(re.search(r'\d', each)):
            words.remove(each)
    print(words)

checkio("1a4 4ad, d89dfsfaj.")
However, when I execute this program, I get the following output:
['1a4', '4ad', 'd89dfsfaj']
['4ad']
I can't figure out why '4ad' is printed in the second line as it contains digits and should have been removed from the list. Any ideas?

Assuming that your regular expression does what you want, you can do this to avoid removing while iterating.
import re

def checkio(text):
    text = re.sub(r'[,.?!]', ' ', text).lower()
    words = [w for w in text.split() if not re.search(r'\d', w)]
    print(words)  # prints [] in this case
Also, note that I simplified your text = text.replace(...) line.
Additionally, if you do not need to reuse your text variable, you can use regex to split it directly (on whitespace as well as punctuation).
import re

def checkio(text):
    words = [w for w in re.split(r'[\s,.?!]+', text.lower()) if w and not re.search(r'\d', w)]
    print(words)  # prints [] in this case

If the words consist only of letters and digits, why not use isalpha() instead of a regex? It is true only when every character is a letter, so it keeps exactly the words without digits:
In [1695]: x = ['1a4', '4ad', 'd89dfsfaj']
In [1696]: [word for word in x if word.isalpha()]
Out[1696]: []
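A regex-free check that looks for digits directly might look like the following. This is only a sketch, assuming Python 3; words_without_digits is a hypothetical name, and stripping string.punctuation from both ends of each token is an assumption about how punctuation should be handled:

```python
import string


def words_without_digits(text):
    # Trim surrounding punctuation from each whitespace-separated token.
    tokens = (token.strip(string.punctuation) for token in text.lower().split())
    # Keep non-empty tokens in which no character is a digit.
    return [t for t in tokens if t and not any(ch.isdigit() for ch in t)]


print(words_without_digits("1a4 4ad, d89dfsfaj."))     # []
print(words_without_digits("hello w0rld, good day!"))  # ['hello', 'good', 'day']
```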

This is possible using re.sub, re.search and a list comprehension.
>>> import re
>>> def checkio(s):
...     print([i for i in re.sub(r'[.,!?]', '', s.lower()).split() if not re.search(r'\d', i)])
...
>>> checkio("1a4 4ad, d89dfsfaj.")
[]
>>> checkio("1a4 ?ad, d89dfsfaj.")
['ad']

What happens here is that the list is modified while it is being iterated over: you are deleting elements while traversing the list.
At the first iteration we have words = ['1a4', '4ad', 'd89dfsfaj']. Since '1a4' contains a digit, we remove it.
Now words = ['4ad', 'd89dfsfaj']. However, at the second iteration the current word is 'd89dfsfaj', and we remove it. We skip '4ad' because it is now at index 0 while the for loop's internal position is already at 1.
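The skipping can be reproduced in isolation and compared with building a new list instead. A minimal sketch, assuming Python 3; the two function names are made up for illustration:

```python
import re


def remove_in_place(words):
    # Buggy on purpose: removing shifts later items left, so the loop
    # skips the element that moves into the slot just vacated.
    for each in words:
        if re.search(r"\d", each):
            words.remove(each)
    return words


def filter_to_new_list(words):
    # Builds a new list, so the list being iterated is never modified.
    return [w for w in words if not re.search(r"\d", w)]


print(remove_in_place(["1a4", "4ad", "d89dfsfaj"]))     # ['4ad']
print(filter_to_new_list(["1a4", "4ad", "d89dfsfaj"]))  # []
```

Iterating over a copy (for each in words[:]) while removing from the original list is another common way to avoid the skip.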

How do I print words with only 1 vowel?

My code so far is below, but I'm so lost that it doesn't do anything close to what I want:
vowels = 'a','e','i','o','u','y'
#Consider 'y' as a vowel
input = input("Enter a sentence: ")
words = input.split()
if vowels == words[0]:
print(words)
So for an input like this:
"this is a really weird test"
I want it to only print:
this, is, a, test
because they each contain only 1 vowel.

Try this:
vowels = set(('a','e','i','o','u','y'))

def count_vowels(word):
    return sum(letter in vowels for letter in word)

my_string = "this is a really weird test"

def get_words(my_string):
    for word in my_string.split():
        if count_vowels(word) == 1:
            print(word)
Result:
>>> get_words(my_string)
this
is
a
test

Here's another option:
import re

words = 'This sentence contains a bunch of cool words'
for word in words.split():
    if len(re.findall('[aeiouy]', word)) == 1:
        print(word)
Output:
This
a
bunch
of
words

You can translate all the vowels to a single vowel and count that vowel:
trans = str.maketrans('aeiouy', 'aaaaaa')
strs = 'this is a really weird test'
print([word for word in strs.split() if word.translate(trans).count('a') == 1])

>>> s = "this is a really weird test"
>>> [w for w in s.split() if len(w) - len(w.translate(str.maketrans("", "", "aeiouy"))) == 1]
['this', 'is', 'a', 'test']
Not sure whether words with no vowels should be included. If so, just replace == 1 with < 2.

You may use one for loop to collect the substrings into a string array, closing off a substring whenever the next character is a space.
Then, for each substring, check whether it contains exactly one vowel (a, e, i, o, u); if so, add it to another array.
After that, concatenate all the strings from that second array, separated by commas and spaces.
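The steps described in this answer can be sketched as follows, assuming Python 3. The function name is hypothetical, and 'y' is counted as a vowel because the question asks for that:

```python
VOWELS = set("aeiouy")  # 'y' is treated as a vowel, as in the question


def words_with_one_vowel(sentence):
    # Step 1: collect substrings, closing one off at every space.
    words, current = [], ""
    for ch in sentence:
        if ch == " ":
            if current:
                words.append(current)
            current = ""
        else:
            current += ch
    if current:
        words.append(current)

    # Step 2: keep the substrings that contain exactly one vowel.
    selected = [w for w in words if sum(ch in VOWELS for ch in w.lower()) == 1]

    # Step 3: join the kept substrings with commas and spaces.
    return ", ".join(selected)


print(words_with_one_vowel("this is a really weird test"))
# this, is, a, test
```

In practice sentence.split() does the first step in one call; the explicit loop is kept only to mirror the description.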

Try this:
vowels = ('a','e','i','o','u','y')
words = input('Enter a sentence ').split()
interesting = [word for word in words if sum(1 for char in word if char in vowels) == 1]

I found so much nice code here, and I want to show my ugly one:
v = 'aoeuiy'
o = 'oooooo'
sentence = 'i found so much nice code here'
words = sentence.split()
trans = str.maketrans(v, o)
for word in words:
    if not word.translate(trans).count('o') > 1:
        print(word)

I find your lack of regex disturbing.
Here's a plain regex-only solution:
import re

text = "this is a really weird test"
words = re.findall(r"\b[^aeiouy\W]*[aeiouy][^aeiouy\W]*\b", text)
print(words)
