This question already has answers here:
How to methodically join two lists?
(4 answers)
Closed 1 year ago.
I have a function that takes in a string, which is then divided into two strings, a and b. I want to perform some logic that will take the first word from a, append it to an array, then take the first word from b and append it to the same array. I'd like to loop through both strings until the array contains every other word from both strings. For example:
Start with
a = 'Hello are today'
b = 'how you'
End with
x = ['Hello', 'how', 'are', 'you', 'today']
Here's a one-liner that does it:
>>> import itertools
>>> [word for t in itertools.zip_longest(a.split(), b.split()) for word in t if word]
['Hello', 'how', 'are', 'you', 'today']
Using a for loop to iterate over the strings and add the words if they are within range:
def concate_strings(a, b):
    a = a.split()
    b = b.split()
    x = []
    for i in range(max(len(a), len(b))):
        if i < len(a):
            x.append(a[i])
        if i < len(b):
            x.append(b[i])
    return x
if __name__ == "__main__":
    a = 'Hello are today'
    b = 'how you'
    print(concate_strings(a, b))
Output:
['Hello', 'how', 'are', 'you', 'today']
Alternate solution:
You may have more than two strings and want to interleave them all. In that case, you can pass a list of strings to the function and append each string's i-th word whenever i is within range of that string's word count.
def concate_strings_multiple(strings):
    # Split each string once up front instead of re-splitting on every iteration
    split_strings = [s.split() for s in strings]
    x = []
    for i in range(max(len(words) for words in split_strings)):
        for words in split_strings:
            if i < len(words):
                x.append(words[i])
    return x
if __name__ == "__main__":
    a = 'Hello are today'
    b = 'how you'
    print(concate_strings_multiple([a, b]))
Output:
['Hello', 'how', 'are', 'you', 'today']
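The multi-string case closely matches the roundrobin recipe from the itertools documentation. Here is a sketch along those lines (the function name interleave_words is mine, not from the thread); filtering on None, the default fillvalue, rather than truthiness avoids accidentally dropping falsy items:

```python
import itertools

def interleave_words(strings):
    # Split each string into words, then take one word per string in turn;
    # zip_longest pads exhausted lists with None, which we filter out.
    word_lists = [s.split() for s in strings]
    return [w for w in itertools.chain.from_iterable(
                itertools.zip_longest(*word_lists))
            if w is not None]

print(interleave_words(['Hello are today', 'how you']))
# ['Hello', 'how', 'are', 'you', 'today']
```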
This question already has answers here:
Removing duplicates in lists
(56 answers)
Closed 1 year ago.
I have a string like :
'hi', 'what', 'are', 'are', 'what', 'hi'
I want to remove a specific repeated word. For example:
'hi', 'what', 'are', 'are', 'what'
Here, I am removing only the repeated word 'hi', and keeping the rest of the repeated words.
How to do this using regex?
Regex is used for text search. You have structured data, so this is unnecessary.
def remove_all_but_first(iterable, removeword='hi'):
    remove = False
    for word in iterable:
        if word == removeword:
            if remove:
                continue
            else:
                remove = True
        yield word
Note that this will return an iterator, not a list. Cast the result to list if you need it to remain a list.
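Used on the question's list, for example (the generator is repeated here so the example runs on its own):

```python
def remove_all_but_first(iterable, removeword='hi'):
    remove = False
    for word in iterable:
        if word == removeword:
            if remove:
                continue  # skip every repeat of removeword
            else:
                remove = True
        yield word

words = ['hi', 'what', 'are', 'are', 'what', 'hi']
print(list(remove_all_but_first(words)))
# ['hi', 'what', 'are', 'are', 'what']
```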
You can do this:
import re

s = "['hi', 'what', 'are', 'are', 'what', 'hi']"
# Convert the string to a list: drop the first and last char, remove quotes and spaces
s = s[1:-1].replace("'", '').replace(' ', '').split(',')
remove = 'hi'
# Store the index of the first occurrence so we can re-add the word after removing all occurrences
firstIndex = s.index(remove)
# Regex to remove all occurrences of a word
regex = re.compile(r'(' + remove + ')', flags=re.IGNORECASE)
op = regex.sub("", '|'.join(s)).split('|')
# Clean up the list by removing empty items
while "" in op:
    op.remove("")
# Re-insert the removed word at the index of its first occurrence
op.insert(firstIndex, remove)
print(str(op))
You don't need regex for that. Convert the string to a list with ast.literal_eval, then find the index of the first occurrence of the word and filter it out of a slice of the rest of the list:
import ast

lst = "['hi', 'what', 'are', 'are', 'what', 'hi']"
lst = ast.literal_eval(lst)
word = 'hi'
index = lst.index(word) + 1
lst = lst[:index] + [x for x in lst[index:] if x != word]
print(lst)  # ['hi', 'what', 'are', 'are', 'what']
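A compact alternative keeps the first occurrence by index using enumerate; a sketch:

```python
lst = ['hi', 'what', 'are', 'are', 'what', 'hi']
word = 'hi'
first = lst.index(word)
# Keep a word unless it is the target appearing at a later index
result = [w for i, w in enumerate(lst) if w != word or i == first]
print(result)
# ['hi', 'what', 'are', 'are', 'what']
```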
This question already has answers here:
Some built-in to pad a list in python
(14 answers)
Finding length of the longest list in an irregular list of lists
(10 answers)
Closed 6 months ago.
I have a list of lists of sentences and I want to pad all sentences so that they are of the same length.
I was able to do this, but I am trying to find the most optimal way to do things and challenge myself.
max_length = max(len(sent) for sent in sents)
list_length = len(sents)
sents_padded = [[pad_token for i in range(max_length)] for j in range(list_length)]
for i, sent in enumerate(sents):
    sents_padded[i][0:len(sent)] = sent
and I used the inputs:
sents = [["Hello","World"],["Where","are","you"],["I","am","doing","fine"]]
pad_token = "Hi"
Is my method an efficient way to do it, or are there better ways to do it?
itertools provides this (in Python 3) with zip_longest, which transposes and pads the lists; transposing back with zip(*) restores the original orientation, and you can pass the result to list if you prefer that over an iterator.
import itertools
from pprint import pprint
sents = [["Hello","World"],["Where","are","you"],["I","am","doing","fine"]]
pad_token = "Hi"
padded = [list(row) for row in zip(*itertools.zip_longest(*sents, fillvalue=pad_token))]
pprint(padded)
[['Hello', 'World', 'Hi', 'Hi'],
['Where', 'are', 'you', 'Hi'],
['I', 'am', 'doing', 'fine']]
Here is how you can use str.ljust() to pad each string, and use max() with a key of len to find the number in which to pad each string:
lst = ['Hello World', 'Good day!', 'How are you?']
l = len(max(lst, key=len))  # The length of the longest sentence
lst = [s.ljust(l) for s in lst]  # Pad each sentence to length l
print(lst)
Output:
['Hello World ',
 'Good day!   ',
 'How are you?']
Assumption:
The output should be the same as OP output (i.e. same number of words in each sublist).
Inputs:
sents = [["Hello","World"],["Where","are","you"],["I","am","doing","fine"]]
pad_token = "Hi"
The following one-liner produces the same output as the OP's code (with max_length computed as in the question):
max_length = max(len(sent) for sent in sents)
sents_padded = [sent + [pad_token] * (max_length - len(sent)) for sent in sents]
print(sents_padded)
# [['Hello', 'World', 'Hi', 'Hi'], ['Where', 'are', 'you', 'Hi'], ['I', 'am', 'doing', 'fine']]
This seemed to be faster when I timed it:
maxi = 0
for sent in sents:
    if sent.__len__() > maxi:
        maxi = sent.__len__()
for sent in sents:
    while sent.__len__() < maxi:
        sent.append(pad_token)
print(sents)
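Speed claims like this can be checked with timeit. A minimal sketch comparing the comprehension answer with the in-place loop (the function names are just labels, not from the thread; each run copies the input so both start from unpadded lists):

```python
import timeit

sents_src = [["Hello", "World"], ["Where", "are", "you"], ["I", "am", "doing", "fine"]]
pad_token = "Hi"

def pad_comprehension():
    sents = [list(s) for s in sents_src]
    max_length = max(len(sent) for sent in sents)
    return [sent + [pad_token] * (max_length - len(sent)) for sent in sents]

def pad_in_place():
    sents = [list(s) for s in sents_src]
    maxi = max(len(sent) for sent in sents)
    for sent in sents:
        while len(sent) < maxi:
            sent.append(pad_token)
    return sents

print(timeit.timeit(pad_comprehension, number=1000))
print(timeit.timeit(pad_in_place, number=1000))
```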
This question already has answers here:
How to concatenate (join) items in a list to a single string
(11 answers)
Closed 3 years ago.
a = ['Hi How are you', 'i am doing fine', 'how about you']
Here a is a list of sentences. I need something like this using python.
result = [Hi How are you i am doing fine how about you]
The following code does this:
result = " ".join(a)
You can use str.join(iterable), which takes an iterable such as a list and returns a string that is the concatenation of the strings in the iterable.
>>> [' '.join(a)]
['Hi How are you i am doing fine how about you']
In case you want to split the sentences further like the title suggests:
a = ['Hi How are you', 'i am doing fine', 'how about you']
words = [w for x in a for w in x.split()]
print(words)
Gives:
['Hi', 'How', 'are', 'you', 'i', 'am', 'doing', 'fine', 'how', 'about', 'you']
This uses a nested list comprehension.
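An equivalent spelling joins the sentences first and splits once, which some may find easier to read:

```python
a = ['Hi How are you', 'i am doing fine', 'how about you']
# Join into one string, then split on whitespace to get individual words
words = ' '.join(a).split()
print(words)
# ['Hi', 'How', 'are', 'you', 'i', 'am', 'doing', 'fine', 'how', 'about', 'you']
```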
This question already has answers here:
Split Strings into words with multiple word boundary delimiters
(31 answers)
Closed 4 years ago.
The python code below reads 'resting-place' as one word.
The modified list shows up as: ['This', 'is', 'my', 'resting-place.']
I want it to show as: ['This', 'is', 'my', 'resting', 'place']
That would give me a total of 5 words instead of 4 in the modified list.
original = 'This is my resting-place.'
modified = original.split()
print(modified)
numWords = 0
for word in modified:
    numWords += 1
print('Total words are:', numWords)
Output is:
Total words are: 4
I want the output to have 5 words.
To count the number of words in a sentence, treating - as separating two words, without splitting:
>>> original = 'This is my resting-place.'
>>> sum(map(original.strip().count, [' ','-'])) + 1
5
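An alternative is to count the words directly, assuming a word is a maximal run of alphanumeric characters (the same pattern used in a later answer in this compilation):

```python
import re

original = 'This is my resting-place.'
# Each run of alphanumerics counts as one word, so the hyphen splits two words
print(len(re.findall(r"[a-zA-Z0-9]+", original)))
# 5
```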
Here is the code (note that splitting on spaces alone counts 'resting-place.' as one word):
s = 'This is my resting-place.'
len(s.split(" "))
4
You can use regex:
import re

original = 'This is my resting-place.'
print(re.split(r"\s+|-", original))
Output:
['This', 'is', 'my', 'resting', 'place.']
I think you will find what you want in this article; it shows how to create a function that takes multiple separators for splitting a string, so in your case you'll be able to split on that extra character:
http://code.activestate.com/recipes/577616-split-strings-w-multiple-separators/
Here is an example of the final result:
>>> s = 'thing1,thing2/thing3-thing4'
>>> tsplit(s, (',', '/', '-'))
['thing1', 'thing2', 'thing3', 'thing4']
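In case the link goes away, a minimal sketch of such a multi-separator split (the linked recipe's actual implementation may differ; this version just reproduces the behavior shown above):

```python
import re

def tsplit(s, seps):
    # Build an alternation of the escaped separators and split on it;
    # re.escape keeps characters like '-' or '/' from being read as regex syntax
    pattern = '|'.join(re.escape(sep) for sep in seps)
    return [part for part in re.split(pattern, s) if part]

print(tsplit('thing1,thing2/thing3-thing4', (',', '/', '-')))
# ['thing1', 'thing2', 'thing3', 'thing4']
```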
I have to match all the alphanumeric words from a text.
>>> import re
>>> text = "hello world!! how are you?"
>>> final_list = re.findall(r"[a-zA-Z0-9]+", text)
>>> final_list
['hello', 'world', 'how', 'are', 'you']
>>>
This is fine, but I also have a few words to negate, i.e. words that shouldn't be in my final list.
>>> negate_words = ['world', 'other', 'words']
A bad way to do it:
>>> negate_str = '|'.join(negate_words)
>>> list(filter(lambda x: not re.match(negate_str, x), final_list))
['hello', 'how', 'are', 'you']
But I can save a loop if my very first regex pattern can be changed to consider negation of those words. I found negation of characters, but I have words to negate; I also found regex lookbehind in other questions, but that doesn't help either.
Can it be done using python re?
Update
My text can span a few hundred lines, and the list of negate_words can be lengthy too.
Considering this, is using regex for such a task correct in the first place? Any suggestions?
I don't think there is a clean way to do this using regular expressions. The closest I could find was a bit ugly and not exactly what you wanted:
>>> re.findall(r"\b(?:world|other|words)|([a-zA-Z0-9]+)\b", text)
['hello', '', 'how', 'are', 'you']
Why not use Python's sets instead? They are very fast:
>>> list(set(final_list) - set(negate_words))
['hello', 'how', 'are', 'you']
If order is important, see the reply from glglgl below. His list comprehension version is very readable. Here's a fast but less readable equivalent using itertools (filterfalse was named ifilterfalse in Python 2):
>>> negate_words_set = set(negate_words)
>>> list(itertools.filterfalse(negate_words_set.__contains__, final_list))
['hello', 'how', 'are', 'you']
Another alternative is to build up the word list in a single pass using re.finditer:
>>> negate_words_set = set(negate_words)
>>> result = []
>>> for mo in re.finditer(r"[a-zA-Z0-9]+", text):
...     word = mo.group()
...     if word not in negate_words_set:
...         result.append(word)
...
>>> result
['hello', 'how', 'are', 'you']
It may be worth trying pyparsing for this:
>>> from pyparsing import *
>>> negate_words = ['world', 'other', 'words']
>>> parser = OneOrMore(Suppress(oneOf(negate_words)) ^ Word(alphanums)).ignore(CharsNotIn(alphanums))
>>> parser.parseString('hello world!! how are you?').asList()
['hello', 'how', 'are', 'you']
Note that oneOf(negate_words) must be before Word(alphanums) to make sure that it matches earlier.
Edit: Just for the fun of it, I repeated the exercise using lepl (also an interesting parsing library)
>>> from lepl import *
>>> negate_words = ['world', 'other', 'words']
>>> parser = OneOrMore(~Or(*negate_words) | Word(Letter() | Digit()) | ~Any())
>>> parser.parse('hello world!! how are you?')
['hello', 'how', 'are', 'you']
Don't ask too much of regex; think of generators instead.
import re
unwanted = ('world', 'other', 'words')
text = "hello world!! how are you?"
gen = (m.group() for m in re.finditer("[a-zA-Z0-9]+",text))
li = [w for w in gen if w not in unwanted]
A generator expression could also be used instead of the list li.
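For instance, the final list comprehension can itself become a generator expression, keeping the whole pipeline lazy until it is consumed:

```python
import re

unwanted = {'world', 'other', 'words'}
text = "hello world!! how are you?"

# Chain two generators: one yields matches, one filters them.
# Nothing is materialized until list() consumes the pipeline.
gen = (m.group() for m in re.finditer(r"[a-zA-Z0-9]+", text))
result = list(w for w in gen if w not in unwanted)
print(result)
# ['hello', 'how', 'are', 'you']
```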