I created a list but, when printing, I need to add the 'and' right before the last item in the list. Example:
mylist = ['me', 'you', 'him', 'her']
When I print out the list I want it to look like:
me, you, him and her.
I don't want the ', [ or ] to show.
I'm currently using:
mylist = ['me', 'you', 'him', 'her']
print(','.join(mylist))
but the output is me,you,him,her. I need it to show me, you, him and her.
Using str.join twice with rsplit:
mylist = ['me', 'you', 'him', 'her']
new_str = ' and '.join(', '.join(mylist).rsplit(', ', 1))
print(new_str)
Output:
me, you, him and her
This also works with an empty or single-element list:
new_str = ' and '.join(', '.join([]).rsplit(', ', 1))
print(repr(new_str))
# ''
new_str = ' and '.join(', '.join(['me']).rsplit(', ', 1))
print(new_str)
# me
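Here is a minimal sketch that spells out the one-liner's intermediate steps. It also shows one caveat the answer doesn't mention: rsplit looks for the last ', ' in the joined string, so an item that itself contains ', ' confuses it. The helper name join_with_and is hypothetical.

```python
def join_with_and(items):
    # Hypothetical helper wrapping the one-liner above.
    joined = ', '.join(items)            # 'me, you, him, her'
    head_tail = joined.rsplit(', ', 1)   # ['me, you, him', 'her']
    return ' and '.join(head_tail)       # 'me, you, him and her'

print(join_with_and(['me', 'you', 'him', 'her']))  # me, you, him and her

# Caveat: the last ', ' can come from inside an item rather than between items.
print(join_with_and(['a', 'b, c']))  # a, b and c  (you probably wanted: a and b, c)
```

If items may contain ', ', the slicing-based answers below avoid this problem because they never re-split the joined string.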
I'm a huge fan of explicitness, so I might write it like this:
def human_list(items):
    # Empty list? Empty string.
    if not items:
        return ''

    # One-item list? Return that item.
    if len(items) == 1:
        return items[0]

    # For everything else, join all items *before* the last one with commas,
    # then add ' and {last_item}' to the end.
    return ', '.join(items[:-1]) + ' and ' + items[-1]

# Demonstrate that this works the way we want
assert human_list([]) == ''
assert human_list(['spam']) == 'spam'
assert human_list(['spam', 'eggs']) == 'spam and eggs'
assert human_list(['one', 'two', 'three']) == 'one, two and three'
assert human_list(['knife', 'fork', 'bottle', 'a cork']) == 'knife, fork, bottle and a cork'
You can do something like this:
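mylist = ['me', 'you', 'him', 'her']
length = len(mylist)
for i, j in enumerate(mylist):
    if i == length - 2:
        # second-to-last item: follow it with 'and'
        print(j, 'and ', end='')
    elif i == length - 1:
        # last item: nothing after it
        print(j, end='')
    else:
        print(j, end=', ')
print()  # finish the line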
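If you need the result as a string rather than printed output, the same index checks can collect the pieces into a list and join them once at the end. This is a minimal sketch; the function name join_with_and_loop is hypothetical.

```python
def join_with_and_loop(items):
    # Hypothetical helper: same index logic as the loop above, but returns a string.
    length = len(items)
    parts = []
    for i, item in enumerate(items):
        parts.append(item)
        if i == length - 2:
            parts.append(' and ')  # between the last two items
        elif i < length - 2:
            parts.append(', ')     # between any earlier pair
    return ''.join(parts)

print(join_with_and_loop(['me', 'you', 'him', 'her']))  # me, you, him and her
```

Building a list and calling ''.join once avoids printing piece by piece, and the empty and one-item cases fall out naturally because neither separator branch fires.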
The following is a simple method if you'd rather not use slicing. Putting it in a function lets you reuse it, and you can easily change the logic inside.
Note: if the list is empty, an empty string is returned.
def get_string(l):
    s = ""
    index = 0
    length = len(l)
    if length == 1:
        return l[0]  # a single word needs no 'and' in front of it
    while index < length:
        word = l[index]
        if index == length - 1:
            s += 'and ' + word
        else:
            s += word + ", "
        index += 1
    return s.strip()

# Try
mylist = ['me', 'you', 'him', 'her']
print(get_string(mylist))  # me, you, him, and her
A helper function is probably a good way to go since it centralises control at one point, meaning you can fix bugs or make improvements easily (such as handling edge cases like empty lists). It also makes the main code easier to read since it simply contains something like readableList(myList).
The following function is all you need:
def readableList(pList):
    if len(pList) == 0: return ""
    if len(pList) == 1: return pList[0]
    return ", ".join(pList[:-1]) + ' and ' + pList[-1]
For a test harness, you can use something like:
for myList in [['me', 'you', 'him', 'her'], ['one', 'two'], ['one'], []]:
    print("{} -> '{}'".format(myList, readableList(myList)))
which gives the output:
['me', 'you', 'him', 'her'] -> 'me, you, him and her'
['one', 'two'] -> 'one and two'
['one'] -> 'one'
[] -> ''
Note that those quotes to the right of -> are added by my test harness just so you can see what the string is (no trailing spaces, showing empty strings, etc). As per your requirements, they do not come from the readableList function itself.
To add an element before the last element, you can do this:
last_element = mylist.pop()
mylist.append(' and ')
mylist.append(last_element)
my_string = ', '.join(mylist[:-2]) + mylist[-2] + mylist[-1]
print(my_string)
or, starting again from the original list:
mylist.insert(-1, ' and ')
my_string = ', '.join(mylist[:-2]) + mylist[-2] + mylist[-1]
print(my_string)
But a better answer as given by LoMaPh in the comments is:
', '.join(mylist[:-1]) + ' and ' + mylist[-1]
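One practical difference between these variants: pop/append and insert change mylist itself, so running them twice (or using mylist afterwards) gives surprising results, while the slicing one-liner leaves the list untouched. A quick check:

```python
mylist = ['me', 'you', 'him', 'her']

# Slicing builds a new string and leaves mylist unchanged.
s1 = ', '.join(mylist[:-1]) + ' and ' + mylist[-1]
print(mylist)  # ['me', 'you', 'him', 'her']

# insert() modifies the list in place.
mylist.insert(-1, ' and ')
s2 = ', '.join(mylist[:-2]) + mylist[-2] + mylist[-1]
print(mylist)  # ['me', 'you', 'him', ' and ', 'her']

print(s1 == s2)  # True
```

Note that mylist[-1] raises IndexError on an empty list, which is why the function-based answers above check for that case first.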
Is it possible to split a Python string (a sentence) so that the whitespace between words is kept in the output, attached to the end of each word?
For example:
given_string = 'This is my string!'
output = ['This ', 'is ', 'my ', 'string!']
I avoid regexes most of the time, but here it makes it really simple:
import re
given_string = 'This is my string!'
res = re.findall(r'\w+\W?', given_string)
# res ['This ', 'is ', 'my ', 'string!']
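Note that \W? keeps at most one non-word character after each word, so repeated punctuation or runs of several spaces lose characters. If the goal is to attach all of the following whitespace to each word, a pattern based on \S+\s* (a run of non-whitespace, then any whitespace) may fit better. This is a swapped-in variant, not the answer's pattern:

```python
import re

given_string = 'Hi!!  there, friend'

# Answer's pattern: at most one non-word character after each word.
answer_style = re.findall(r'\w+\W?', given_string)
print(answer_style)  # ['Hi!', 'there,', 'friend']

# Variant: each run of non-whitespace plus all whitespace that follows it.
variant = re.findall(r'\S+\s*', given_string)
print(variant)  # ['Hi!!  ', 'there, ', 'friend']
```

Joining the variant's output gives back the original string, as long as the string doesn't start with whitespace.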
Maybe this will help?
>>> given_string = 'This is my string!'
>>> l = given_string.split(' ')
>>> l = [item + ' ' for item in l[:-1]] + l[-1:]
>>> l
['This ', 'is ', 'my ', 'string!']
Just split and add the whitespace back:
a = " "
output = [e + a for e in given_string.split(a) if e]
output[-1] = output[-1][:-1]
The last line removes the extra space appended after the final word (so 'string! ' becomes 'string!').
This question was closed as a duplicate of: Preserve whitespaces when using split() and join() in python
I want to split strings based on whitespace and punctuation, but the whitespace and punctuation should still be in the result.
For example:
Input: text = "This is a text; this is another text.,."
Output: ['This', ' ', 'is', ' ', 'a', ' ', 'text', '; ', 'this', ' ', 'is', ' ', 'another', ' ', 'text', '.,.']
Here is what I'm currently doing:
import string

def classify(b):
    """
    Classify a character.
    """
    separators = string.whitespace + string.punctuation
    if (b in separators):
        return "separator"
    else:
        return "letter"

def tokenize(text):
    """
    Split strings to words, but do not remove white space.
    The input must be of type str, not bytes
    """
    if (len(text) == 0):
        return []
    current_word = "" + text[0]
    previous_mode = classify(text[0])  # mode of the first character
    offset = 1
    results = []
    while offset < len(text):
        current_mode = classify(text[offset])
        if current_mode == previous_mode:
            current_word += text[offset]
        else:
            results.append(current_word)
            current_word = text[offset]
        previous_mode = current_mode
        offset += 1
    results.append(current_word)
    return results
It works, but it's so C-style. Is there a better way in Python?
You can use a regular expression:
import re
re.split(r'([\s.,;()]+)', text)
This splits on arbitrary-width whitespace (including tabs and newlines) plus a selection of punctuation characters, and by grouping the split text you tell re.split() to include it in the output:
>>> import re
>>> text = "This is a text; this is another text.,."
>>> re.split(r'([\s.,;()]+)', text)
['This', ' ', 'is', ' ', 'a', ' ', 'text', '; ', 'this', ' ', 'is', ' ', 'another', ' ', 'text', '.,.', '']
If you only wanted to match spaces (and not other whitespace), replace \s with a space:
>>> re.split('([ .,;()]+)', text)
['This', ' ', 'is', ' ', 'a', ' ', 'text', '; ', 'this', ' ', 'is', ' ', 'another', ' ', 'text', '.,.', '']
Note the extra trailing empty string; a split always has a head and a tail, so text starting or ending in a split group will always have an extra empty string at the start or end. This is easily removed.
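For example, a simple filter drops the empty head/tail strings. Because the + already merges adjacent separators, empty strings can only appear at the ends, so filtering out falsy items is enough:

```python
import re

text = "This is a text; this is another text.,."
# Keep the captured separators, then drop any empty strings from the ends.
parts = [part for part in re.split(r'([\s.,;()]+)', text) if part]
print(parts)
```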
I have a string with this pattern: a character from [' ', '.', '#'] followed by a space, repeated.
For example: "# .   #"
I want to split this string on the space separator (to get ['#', '.', ' ', '#']), but the problem is that a space is also one of the characters, so split(" ") doesn't work.
How can I do this?
There's no need to use comprehensions here - you can just use a stepping slice:
>>> text = "# .   #"
>>> text[::2]
'#. #'
>>> list(text[::2])
['#', '.', ' ', '#']
yourString = "# .   #"
result = []
chars = iter(yourString)
for c in chars:
    result.append(c)   # keep the payload character
    next(chars, None)  # skip the single delimiter space that follows it
Assuming exactly one space delimiter after each character, the following works as well:
s = "# .   #"
result = []
for index, c in enumerate(s):
    if index % 2 == 0:
        result.append(c)
If your string always has a (char,space,char,space,...) sequence, you can do:
old_string = '# # # .   #'
new_list = [old_string[x] for x in range(0, len(old_string), 2)]
print(new_list)
# ['#', '#', '#', '.', ' ', '#']
I need to split strings of data using each character from string.punctuation and string.whitespace as a separator.
Furthermore, I need for the separators to remain in the output list, in between the items they separated in the string.
For example,
"Now is the winter of our discontent"
should output:
['Now', ' ', 'is', ' ', 'the', ' ', 'winter', ' ', 'of', ' ', 'our', ' ', 'discontent']
I'm not sure how to do this without resorting to an orgy of nested loops, which is unacceptably slow. How can I do it?
A different non-regex approach from the others:
>>> import string
>>> from itertools import groupby
>>>
>>> special = set(string.punctuation + string.whitespace)
>>> s = "One two three tab\ttabandspace\t end"
>>>
>>> split_combined = [''.join(g) for k, g in groupby(s, lambda c: c in special)]
>>> split_combined
['One', ' ', 'two', ' ', 'three', ' ', 'tab', '\t', 'tabandspace', '\t ', 'end']
>>> split_separated = [''.join(g) for k, g in groupby(s, lambda c: c if c in special else False)]
>>> split_separated
['One', ' ', 'two', ' ', 'three', ' ', 'tab', '\t', 'tabandspace', '\t', ' ', 'end']
Could use dict.fromkeys and .get instead of the lambda, I guess.
[edit]
Some explanation:
groupby accepts two arguments, an iterable and an (optional) key function. It loops through the iterable and groups consecutive elements that share the same key-function value:
>>> groupby("sentence", lambda c: c in 'nt')
<itertools.groupby object at 0x9805af4>
>>> [(k, list(g)) for k,g in groupby("sentence", lambda c: c in 'nt')]
[(False, ['s', 'e']), (True, ['n', 't']), (False, ['e']), (True, ['n']), (False, ['c', 'e'])]
where terms with contiguous values of the keyfunction are grouped together. (This is a common source of bugs, actually -- people forget that they have to sort by the keyfunc first if they want to group terms which might not be sequential.)
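To see the point about sorting, here are the same letters grouped with and without sorting by the key first:

```python
from itertools import groupby

key = lambda c: c in 'nt'

# Unsorted: only adjacent items with the same key end up in one group.
unsorted_groups = [(k, ''.join(g)) for k, g in groupby("sentence", key)]
print(unsorted_groups)
# [(False, 'se'), (True, 'nt'), (False, 'e'), (True, 'n'), (False, 'ce')]

# Sorted by the same key first: one group per distinct key value.
sorted_groups = [(k, ''.join(g)) for k, g in groupby(sorted("sentence", key=key), key)]
print(sorted_groups)
# [(False, 'seece'), (True, 'ntn')]
```

For tokenizing, the adjacency-only behaviour is exactly what you want, since sorting would scramble the text.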
As @JonClements guessed, what I had in mind was:
>>> special = dict.fromkeys(string.punctuation + string.whitespace, True)
>>> s = "One two three tab\ttabandspace\t end"
>>> [''.join(g) for k,g in groupby(s, special.get)]
['One', ' ', 'two', ' ', 'three', ' ', 'tab', '\t', 'tabandspace', '\t ', 'end']
for the case where we combine the separators. .get returns None if the key isn't in the dict, so every non-separator character gets the same key, None.
import re
import string
p = re.compile("[^{0}]+|[{0}]+".format(re.escape(
string.punctuation + string.whitespace)))
print(p.findall("Now is the winter of our discontent"))
I'm no big fan of using regexps for all problems, but I don't think you have much choice in this if you want it fast and short.
I'll explain the regexp since you're not familiar with it:
[...] means any of the characters inside the square brackets
[^...] means any of the characters not inside the square brackets
+ behind means one or more of the previous thing
x|y means to match either x or y
So the regexp matches one or more characters that are either all punctuation/whitespace or all not punctuation/whitespace. The findall method returns all non-overlapping matches of the pattern.
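Because each alternative ends in +, a run of mixed separators such as ', ' comes back as a single token. A quick check on input that contains punctuation:

```python
import re
import string

seps = re.escape(string.punctuation + string.whitespace)
p = re.compile("[^{0}]+|[{0}]+".format(seps))

print(p.findall("Now, really - it is"))
# ['Now', ', ', 'really', ' - ', 'it', ' ', 'is']
```

Since every character belongs to one of the two classes, findall covers the whole string and no empty strings appear in the result.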
Try this:
import re
import string
re.split('([' + re.escape(string.punctuation + string.whitespace) + ']+)', "Now is the winter of our discontent")
Explanation from the Python documentation:
If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list.
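Here is a minimal illustration of that rule, using a simplified separator class of just space and comma. With a capturing group the separators appear in the result. Without one, or with a non-capturing (?:...) group, they are dropped:

```python
import re

s = "Now, is the"
without = re.split(r'[ ,]+', s)
with_group = re.split(r'([ ,]+)', s)
non_capturing = re.split(r'(?:[ ,]+)', s)

print(without)        # ['Now', 'is', 'the']
print(with_group)     # ['Now', ', ', 'is', ' ', 'the']
print(non_capturing)  # ['Now', 'is', 'the']
```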
Solution in linear (O(n)) time:
Let's say you have a string:
original = "a, b...c d"
First convert all separators to spaces:
import itertools
import string

splitters = string.punctuation + string.whitespace
trans = str.maketrans(splitters, ' ' * len(splitters))  # Python 3; Python 2 used string.maketrans
s = original.translate(trans)
Now s == 'a  b   c d' (each separator character replaced by a space). Next, use itertools.groupby to alternate between runs of spaces and non-spaces, slicing each run's length out of the original string:
result = []
position = 0
for _, letters in itertools.groupby(s, lambda c: c == ' '):
    letter_count = len(list(letters))
    result.append(original[position:position + letter_count])
    position += letter_count
Now result == ['a', ', ', 'b', '...', 'c', ' ', 'd'], which is what you need.
My take:
from string import whitespace, punctuation
import re
pattern = re.escape(whitespace + punctuation)
print(re.split('([' + pattern + '])', 'now is the winter of'))
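Since this pattern has no + after the character class, each separator is split off individually, so adjacent separators produce empty strings between them. Adding + merges them into one token:

```python
import re
from string import whitespace, punctuation

pattern = re.escape(whitespace + punctuation)

single = re.split('([' + pattern + '])', 'winter, of')
merged = re.split('([' + pattern + ']+)', 'winter, of')

print(single)  # ['winter', ',', '', ' ', 'of']
print(merged)  # ['winter', ', ', 'of']
```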
Depending on the text you are dealing with, you may be able to simplify your concept of delimiters to "anything other than letters and numbers". If this will work, you can use the following regex solution:
re.findall(r'[a-zA-Z\d]+|[^a-zA-Z\d]', text)
This assumes that you want to split on each individual delimiter character even if they occur consecutively, so 'foo..bar' would become ['foo', '.', '.', 'bar']. If instead you expect ['foo', '..', 'bar'], use [a-zA-Z\d]+|[^a-zA-Z\d]+ (only difference is adding + at the very end).
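Here are both variants side by side on the 'foo..bar' example. The last line shows one consequence of the ASCII-only letter class: accented letters are treated as delimiters.

```python
import re

each = re.findall(r'[a-zA-Z\d]+|[^a-zA-Z\d]', 'foo..bar')
combined = re.findall(r'[a-zA-Z\d]+|[^a-zA-Z\d]+', 'foo..bar')
accented = re.findall(r'[a-zA-Z\d]+|[^a-zA-Z\d]+', 'café')

print(each)      # ['foo', '.', '.', 'bar']
print(combined)  # ['foo', '..', 'bar']
print(accented)  # ['caf', 'é']
```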
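from functools import reduce
from string import punctuation, whitespace
s = "..test. and stuff"
# Pad each punctuation character with spaces so split() separates it.
f = lambda acc, c: acc + ' ' + c + ' ' if c in punctuation else acc + c
# The '' initial value makes reduce pass the first character of each word through f too.
l = sum([reduce(f, word, '').split() for word in s.split()], [])
print(l)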
For any arbitrary collection of separators:
def separate(myStr, seps):
    answer = []
    temp = []
    for char in myStr:
        if char in seps:
            answer.append(''.join(temp))
            answer.append(char)
            temp = []
        else:
            temp.append(char)
    answer.append(''.join(temp))
    return answer
In [4]: print(separate("Now is the winter of our discontent", set(' ')))
['Now', ' ', 'is', ' ', 'the', ' ', 'winter', ' ', 'of', ' ', 'our', ' ', 'discontent']
In [5]: print(separate("Now, really - it is the winter of our discontent", set(' ,-')))
['Now', ',', '', ' ', 'really', ' ', '', '-', '', ' ', 'it', ' ', 'is', ' ', 'the', ' ', 'winter', ' ', 'of', ' ', 'our', ' ', 'discontent']
Hope this helps
from itertools import chain, cycle
s = "Now is the winter of our discontent"
words = s.split()
wordsWithWhitespace = list(chain.from_iterable(zip(words, cycle([" "]))))
# result : ['Now', ' ', 'is', ' ', 'the', ' ', 'winter', ' ', 'of', ' ', 'our', ' ', 'discontent', ' ']
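The result ends with an extra ' ' after the last word. If that is unwanted, one option is to drop the final element. Note also that s.split() collapses runs of whitespace, so the output always uses single spaces regardless of the input:

```python
from itertools import chain, cycle

s = "Now is the winter of our discontent"
words = s.split()
# Interleave words with ' ', then drop the trailing separator.
with_ws = list(chain.from_iterable(zip(words, cycle([" "]))))[:-1]
print(with_ws)
```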