I have been looking for an answer to this for a while but keep finding answers about stripping a specific string from a list.
Let's say this is my list of strings
stringList = ["cat\n","dog\n","bird\n","rat\n","snake\n"]
But all list items contain a new line character (\n)
How can I remove this from all the strings within the list?
Use a list comprehension with rstrip():
stringList = ["cat\n","dog\n","bird\n","rat\n","snake\n"]
output = [x.rstrip() for x in stringList]
print(output) # ['cat', 'dog', 'bird', 'rat', 'snake']
If you really want to target a single newline character only at the end of each string, then we can get more precise with re.sub:
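import re
stringList = ["cat\n","dog\n","bird\n","rat\n","snake\n"]
# \Z anchors at the very end of the string; $ can also match just before a final newline
output = [re.sub(r'\n\Z', '', x) for x in stringList]
print(output) # ['cat', 'dog', 'bird', 'rat', 'snake']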
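To make the difference between these options concrete, here is a small sketch comparing three ways of trimming the end of a string. The sample strings are made up for illustration, and str.removesuffix is assumed to be available (it was added in Python 3.9).

```python
samples = ["cat\n", "dog \n\n", "bird"]

# rstrip() with no argument removes ALL trailing whitespace (spaces, tabs, newlines)
all_whitespace = [s.rstrip() for s in samples]

# rstrip("\n") removes only trailing newlines, but as many as there are
newlines_only = [s.rstrip("\n") for s in samples]

# removesuffix("\n") (Python 3.9+) removes at most one trailing newline
one_newline = [s.removesuffix("\n") for s in samples]

print(all_whitespace)  # ['cat', 'dog', 'bird']
print(newlines_only)   # ['cat', 'dog ', 'bird']
print(one_newline)     # ['cat', 'dog \n', 'bird']
```

For the list in the question all three give the same result; they only differ when a string ends in extra whitespace or more than one newline.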
By applying the method strip (or rstrip) to all terms of the list with map
out = list(map(str.strip, stringList))
print(out)
or with a more rudimentary check and slice
strip_char = '\n'
out = [s[:-len(strip_char)] if s.endswith(strip_char) else s for s in stringList]
print(out)
You can loop over the list, use an if to check whether each string contains a newline character, and replace that character with an empty string. Strings without a newline are kept unchanged:
stringList = ["cat\n","dog\n","bird\n","rat\n","snake\n"]
nlist = []
for string in stringList:
    if "\n" in string:
        nlist.append(string.replace("\n", ""))
    else:
        nlist.append(string)  # keep strings that have no newline
print(nlist)
You could also use map() along with str.rstrip:
>>> string_list = ['cat\n', 'dog\n', 'bird\n', 'rat\n', 'snake\n']
>>> new_string_list = list(map(str.rstrip, string_list))
>>> new_string_list
['cat', 'dog', 'bird', 'rat', 'snake']
Related
I have a list and would like to print all words after 4th position using python and each word after the 3rd position will be suffixed with ".com"
Example
my_list = ['apple', 'ball', 'cat', 'dog', 'egg', 'fish', 'rat']
From the above I would like to print the value from 'egg' onwards, i.e: egg.com, fish.com, rat.com
Just do this:
for i in my_list[4:]:  # index 4 is 'egg', the fifth item
    print(i + '.com')
That is it.
Code
def get_words(lst, word):
    ' Returns string of words starting from a particular word in list lst '
    # Use lst.index to find index of word in list
    # slice (i.e. lst[lst.index(word):]) for sublist of words from word in list
    # list comprehension to add '.com' to each word starting at index
    # join to concatenate words
    if word in lst:
        return ', '.join([x + '.com' for x in lst[lst.index(word):]])
Usage
my_list = ['apple', 'ball', 'cat', 'dog', 'egg', 'fish', 'rat']
print(get_words(my_list, 'egg')) # egg.com, fish.com, rat.com
print(get_words(my_list, 'dog')) # dog.com, egg.com, fish.com, rat.com
print(get_words(my_list, 'pig')) # None
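If the starting point is known as a position and not as a word, the same idea can be sketched with a plain slice. The helper name add_suffix_from and its parameters are hypothetical, chosen only for this illustration.

```python
def add_suffix_from(items, start, suffix=".com"):
    """Return the items from index `start` onwards, each with `suffix` appended.

    Hypothetical helper for illustration; `start` is a zero-based index.
    """
    return [item + suffix for item in items[start:]]


my_list = ['apple', 'ball', 'cat', 'dog', 'egg', 'fish', 'rat']
result = add_suffix_from(my_list, 4)
print(', '.join(result))  # egg.com, fish.com, rat.com
```

A start index past the end of the list simply gives an empty list, because slicing never raises an IndexError.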
I've a String as Input like
input = 'apple&&bat&&&cat&&dog&elephant'
and I want to reverse the order of the words while the special characters remain in their original positions.
Output - 'elephant&&dog&&&cat&&bat&apple'
I don't know which approach to take to solve this problem, but here is what I have tried.
With this I get the words in reverse order, but I don't know how to put the '&' characters back in their respective positions.
input = 'apple&&bat&&&cat&&dog&elephant'
ab = input.split('&')[::-1]
print(ab)
output
['elephant', 'dog', '', 'cat', '', '', 'bat', '', 'apple']
But my output should be
'elephant&&dog&&&cat&&bat&apple'
First get separate lists of the words and special marks using re module:
In [2]: import re
In [4]: words = re.findall(r'\w+', input)
In [6]: words
Out[6]: ['apple', 'bat', 'cat', 'dog', 'elephant']
In [7]: special = re.findall(r'\W+', input)
In [8]: special
Out[8]: ['&&', '&&&', '&&', '&']
Then reverse the words list:
In [11]: rwords = words[::-1]
In [12]: rwords
Out[12]: ['elephant', 'dog', 'cat', 'bat', 'apple']
Finally, combine each word with the corresponding mark. Please note that I expand the special list by one empty string to make the lists the same length. The final operation is one line of code:
In [15]: ''.join(w + s for w, s in zip(rwords, special + ['']))
Out[15]: 'elephant&&dog&&&cat&&bat&apple'
Here is one solution to the problem that uses only the basic concepts. It navigates the split list from both the left and the right and swaps each pair of encountered words.
s = 'apple&&bat&&&cat&&dog&elephant'
words = s.split('&')
idx_left = 0
idx_right = len(words) - 1
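while idx_left < idx_right:
    # Skip empty strings (separator positions) without letting the indices cross
    while idx_left < idx_right and not words[idx_left]:
        idx_left += 1
    while idx_left < idx_right and not words[idx_right]:
        idx_right -= 1
    words[idx_left], words[idx_right] = words[idx_right], words[idx_left] # Swap words
    idx_left += 1
    idx_right -= 1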
output = '&'.join(words)
The result is
'elephant&&dog&&&cat&&bat&apple'
Another more advanced approach is to use groupby and list slicing:
from itertools import groupby
# Split the input string into the list
# ['apple', '&&', 'bat', '&&&', 'cat', '&&', 'dog', '&', 'elephant']
words = [''.join(g) for _, g in groupby(s, lambda c: c == '&')]
n = len(words)
words[::2] = words[n-n%2::-2] # Swapping (assume the input string does not start with a separator string)
output = ''.join(words)
Another regex solution:
>>> import re
>>> # Extract the "words" from the string.
>>> words = re.findall(r'\w+', s)
>>> words
['apple', 'bat', 'cat', 'dog', 'elephant']
>>> # Replace the words with formatting placeholders ( {} )
>>> # then format the resulting string with the words in
>>> # reverse order
>>> re.sub(r'\w+', '{}', s).format(*reversed(words))
'elephant&&dog&&&cat&&bat&apple'
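Several of the approaches above assume that the string starts and ends with a word. As a rough sketch of one way to relax that assumption, re.split with a capturing group keeps the separators in the result, so words and separators can be rebuilt in place. The function name reverse_words is made up for this example.

```python
import re


def reverse_words(text):
    """Reverse the order of the words while keeping every separator run in place."""
    # With a capturing group, re.split keeps the separators:
    # even indices hold words (possibly '' at the ends), odd indices hold separators.
    tokens = re.split(r'(\W+)', text)
    remaining = reversed([t for t in tokens[::2] if t])
    rebuilt = []
    for index, token in enumerate(tokens):
        if index % 2 == 0 and token:
            rebuilt.append(next(remaining))  # fill each word slot from the reversed words
        else:
            rebuilt.append(token)  # separator or empty edge token, kept as-is
    return ''.join(rebuilt)


print(reverse_words('apple&&bat&&&cat&&dog&elephant'))  # elephant&&dog&&&cat&&bat&apple
print(reverse_words('&&a&b&'))  # &&b&a&
```

Unlike the str.format variant, this sketch does not treat literal braces in the input specially.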
I would like to do some word filtering (extracting only items in 'keyword' list that exist in 'whitelist').
Here is my code so far:
whitelist = ['Cat', 'Dog', 'Cow']
keyword = ['Cat, Cow, Horse', 'Bird, Whale, Dog', 'Pig, Chicken', 'Tiger, Cat']
keyword_filter = []
for word in whitelist:
    for i in range(len(keyword)):
        if word in keyword[i]:
            keyword_filter.append(word)
        else: pass
I want to remove every word except for 'Cat', 'Dog', and 'Cow' (which are in the
'whitelist') so that the result ('keyword_filter' list) will look like this:
['Cat, Cow', 'Dog', '', 'Cat']
However, I got the result something like this:
['Cat', 'Cat', 'Dog', 'Cow']
I would sincerely appreciate if you can give some advice.
You need to split each string in the list and check whether each word in the split is contained in the whitelist. Then rejoin the whitelisted words after filtering:
whitelist = {'Cat', 'Dog', 'Cow'}
filtered = []
for words in keyword:
    filtered.append(', '.join(w for w in words.split(', ') if w in whitelist))
print(filtered)
# ['Cat, Cow', 'Dog', '', 'Cat']
Better to make whitelist a set to improve the performance for lookup of each word.
You could also use re.findall to find the parts of each string that match a whitelisted word, and then rejoin the matches:
import re
pattern = re.compile(r',?\s?Cat|,?\s?Dog|,?\s?Cow')
# lstrip drops the leading ', ' left over when the first word was filtered out
filtered = [''.join(pattern.findall(words)).lstrip(', ') for words in keyword]
Try this:
whitelist = ['Cat', 'Dog', 'Cow']
keyword = ['Cat, Cow, Horse', 'Bird, Whale, Dog', 'Pig, Chicken', 'Tiger, Cat']
keyword_filter = []
for word in keyword:
    whitelistedWords = []
    for w in word.split(', '):
        if w in whitelist:
            whitelistedWords.append(w)
    #print(whitelistedWords)
    keyword_filter.append( ', '.join(whitelistedWords) )
print(keyword_filter)
Simple list comprehension:
whitelist = ['Cat', 'Dog', 'Cow']
keyword = ['Cat, Cow, Horse', 'Bird, Whale, Dog', 'Pig, Chicken', 'Tiger, Cat']
keyword_filter = [', '.join(w for w in k.split(', ') if w in whitelist) for k in keyword]
print(keyword_filter)
The output:
['Cat, Cow', 'Dog', '', 'Cat']
Since you want to preserve the order of your keyword list, you'll want to have that as the outermost loop.
for phrase in keyword:
Now you need to split up the phrase into its actual words and determine if those words are in the whitelist. Then you need to put the words back together. You can do this in one line.
    filtered = ", ".join(word for word in phrase.split(", ") if word in whitelist)
Breakdown: phrase.split(", ") gives you a list of the strings that were separated by ", " in the original string, i.e. the words you care about. word for word in ... if word in whitelist is a generator expression. It yields each word in ..., in this case phrase.split(", "), that meets the condition word in whitelist. Finally, ", ".join(...) gives you a string made up of every element of ... connected by ", ".
Lastly, you need to put the newly filtered string into your list of filtered strings.
    keyword_filter.append(filtered)
As a side note, I agree with others that you should use a set for your collection of whitelisted words. It has much faster lookup time. However, for a minuscule list of words like this example you won't notice a performance difference.
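Putting the steps above together, the whole loop might look like the following sketch. It reuses the names from the question and makes whitelist a set, as suggested.

```python
whitelist = {'Cat', 'Dog', 'Cow'}  # a set, for fast membership tests
keyword = ['Cat, Cow, Horse', 'Bird, Whale, Dog', 'Pig, Chicken', 'Tiger, Cat']

keyword_filter = []
for phrase in keyword:  # outer loop over phrases preserves their order
    # keep only the whitelisted words, then glue them back together
    filtered = ", ".join(word for word in phrase.split(", ") if word in whitelist)
    keyword_filter.append(filtered)

print(keyword_filter)  # ['Cat, Cow', 'Dog', '', 'Cat']
```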
You could use regex:
import re
whitelist = ['Cat', 'Dog', 'Cow']
keyword = ['Cat, Cow, Horse', 'Bird, Whale, Dog', 'Pig, Chicken', 'Tiger, Cat']
keyword_filter = []
for words in keyword:
    match = re.findall(r'(' + '|'.join(whitelist) + r')[,\s]*', words)
    keyword_filter.append(', '.join(match))
print(keyword_filter)
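One caveat with building a pattern straight from the whitelist is that it also matches inside longer words, for example 'Cat' inside 'Caterpillar'. A possible variant, shown here only as a sketch, adds \b word boundaries and passes each entry through re.escape. The extra 'Caterpillar, Cow' entry is made up to show the difference.

```python
import re

whitelist = ['Cat', 'Dog', 'Cow']
keyword = ['Cat, Cow, Horse', 'Bird, Whale, Dog', 'Pig, Chicken', 'Tiger, Cat',
           'Caterpillar, Cow']  # last entry added for illustration

# \b requires a word boundary on both sides, so 'Cat' does not match 'Caterpillar'.
# re.escape guards against whitelist entries that contain regex metacharacters.
pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, whitelist)) + r')\b')

keyword_filter = [', '.join(pattern.findall(words)) for words in keyword]
print(keyword_filter)  # ['Cat, Cow', 'Dog', '', 'Cat', 'Cow']
```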
Right now I have a list of for example
data = ['dog','cat','a','aa','aac','bbb','bcca','ffffff']
I want to remove the words with repeated letters. In this example, the words to remove are
'aa','aac','bbb','bcca','ffffff'
Maybe import re?
Thanks to this thread: Regex to determine if string is a single repeating character
Here is the re version, but I would stick to PM2 ring and Tameem's solutions if the task was as simple as this:
import re
data = ['dog','cat','a','aa','aac','bbb','bcca','ffffff']
[i for i in data if not re.search(r'^(.)\1+$', i)]
Output
['dog', 'cat', 'a', 'aac', 'bcca']
And the other:
import re
data = ['dog','cat','a','aa','aac','bbb','bcca','ffffff']
[i for i in data if not re.search(r'((\w)\2{1,})', i)]
Output
['dog', 'cat', 'a']
A loop is the way to go. Forget about sets, as they do not work for words with repeated letters.
Here is a method you can use to determine if word is valid in a single loop:
def is_valid(word):
    last_char = None
    for i in word:
        if i == last_char:
            return False
        last_char = i
    return True
Example
In [28]: is_valid('dogo')
Out[28]: True
In [29]: is_valid('doo')
Out[29]: False
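To apply this check to the list from the question, the function can be used as the condition of a list comprehension. The sketch below repeats is_valid so that it runs on its own; note that it only rejects consecutive repeats.

```python
def is_valid(word):
    """Return False if any character is immediately followed by the same character."""
    last_char = None
    for char in word:
        if char == last_char:
            return False
        last_char = char
    return True


data = ['dog', 'cat', 'a', 'aa', 'aac', 'bbb', 'bcca', 'ffffff']
filtered = [word for word in data if is_valid(word)]
print(filtered)  # ['dog', 'cat', 'a']
```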
The original version of this question wanted to drop words that consist entirely of repetitions of a single character. An efficient way to do this is to use sets. We convert each word to a set, and if it consists of only a single character the length of that set will be 1. If that's the case, we can drop that word, unless the original word consisted of a single character.
data = ['dog','cat','a','aa','aac','bbb','bcca','ffffff']
newdata = [s for s in data if len(s) == 1 or len(set(s)) != 1]
print(newdata)
output
['dog', 'cat', 'a', 'aac', 'bcca']
Here's code for the new version of your question, where you want to drop words that contain any repeated characters. This one's simpler, because we don't need to make a special test for one-character words.
data = ['dog','cat','a','aa','aac','bbb','bcca','ffffff']
newdata = [s for s in data if len(set(s)) == len(s)]
print(newdata)
output
['dog', 'cat', 'a']
If the repetitions have to be consecutive, we can handle that using groupby.
from itertools import groupby
data = ['dog','cat','a','aa','aac','bbb','bcca','ffffff', 'abab', 'wow']
newdata = [s for s in data if max(len(list(g)) for _, g in groupby(s)) == 1]
print(newdata)
output
['dog', 'cat', 'a', 'abab', 'wow']
Here's a way to check if there are consecutive repeated characters:
def has_consecutive_repeated_letters(word):
    return any(c1 == c2 for c1, c2 in zip(word, word[1:]))
You can then use a list comprehension to filter your list:
words = ['dog','cat','a','aa','aac','bbb','bcca','ffffff', 'abab', 'wow']
[word for word in words if not has_consecutive_repeated_letters(word)]
# ['dog', 'cat', 'a', 'abab', 'wow']
One line is all it takes :)
data = ['dog','cat','a','aa','aac','bbb','bcca','ffffff']
data = [value for value in data if(len(set(value))!=1 or len(value) ==1)]
print(data)
Output
['dog', 'cat', 'a', 'aac', 'bcca']
I have the following string
myString = "cat(50),dog(60),pig(70)"
I am trying to convert the string above into a 2D array.
The result I want to get is
myResult = [['cat', 50], ['dog', 60], ['pig', 70]]
I already know the way to solve by using the legacy string method but it is quite complicated. So I don't want to use this approach.
# Legacy approach
# 1. Split string by ","
# 2. Run loop and split string by "(" => got the <name of animal>
# 3. Got the number by exclude ")".
Any suggestions would be appreciated.
You can use the re.findall method:
>>> import re
>>> re.findall(r'(\w+)\((\d+)\)', myString)
[('cat', '50'), ('dog', '60'), ('pig', '70')]
If you want a list of lists as noticed by RomanPerekhrest convert it with a list comprehension:
>>> [list(t) for t in re.findall(r'(\w+)\((\d+)\)', myString)]
[['cat', '50'], ['dog', '60'], ['pig', '70']]
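Note that re.findall returns the numbers as strings, while the desired result in the question has integers. A small sketch of one way to convert them is to unpack each match and call int on the second group:

```python
import re

myString = "cat(50),dog(60),pig(70)"

# Each match is a (name, digits) tuple; int() turns the digits into a number
myResult = [[name, int(number)]
            for name, number in re.findall(r'(\w+)\((\d+)\)', myString)]

print(myResult)  # [['cat', 50], ['dog', 60], ['pig', 70]]
```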
Alternative solution using re.split() function:
import re
myString = "cat(50),dog(60),pig(70)"
result = [re.split(r'[)(]', i)[:-1] for i in myString.split(',')]
print(result)
The output:
[['cat', '50'], ['dog', '60'], ['pig', '70']]
r'[)(]' - pattern, treats parentheses as delimiters for splitting
[:-1] - slice containing all items except the last one (which is an empty string '')