I am trying to "merge" a string (a) and a list of strings (b):
a = '1234'
b = ['+', '-', '']
to get the desired output (c):
c = '1+2-34'
The characters of the output alternate between coming from the string and coming from the list. The list will always contain one element fewer than the string has characters. I was wondering what the fastest way to do this is.
What I have so far is the following:
c = a[0]
for i in range(len(b)):
    c += b[i] + a[1:][i]
print(c)  # prints -> 1+2-34
But I have a feeling there is a better way to do this.
You can use itertools.zip_longest to zip the two sequences and keep iterating after the shorter one runs out. From that point on, zip_longest yields None in place of the missing list element, so for those positions just emit the remaining characters of the string.
>>> from itertools import chain
>>> from itertools import zip_longest
>>> ''.join(i+j if j else i for i,j in zip_longest(a, b))
'1+2-34'
As @deceze suggested in the comments, you can also pass a fillvalue argument to zip_longest so that it fills in empty strings instead of None. I'd suggest his method, since it's a bit more readable.
>>> ''.join(i+j for i,j in zip_longest(a, b, fillvalue=''))
'1+2-34'
A further optimization, suggested by @ShadowRanger, is to avoid the temporary string concatenations (i+j) and flatten the pairs with itertools.chain.from_iterable instead:
>>> ''.join(chain.from_iterable(zip_longest(a, b, fillvalue='')))
'1+2-34'
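Another option, not taken from the answers above, is slice assignment into a pre-sized list: the string's characters go into the even slots and the list's elements into the odd slots. This is a minimal sketch that assumes, as the question states, that len(b) == len(a) - 1; the name interleave is just for illustration.

```python
def interleave(a, b):
    """Alternate the characters of `a` with the elements of `b`.

    Assumes len(b) == len(a) - 1, as in the question.
    """
    if len(b) != len(a) - 1:
        raise ValueError("b must have exactly one element fewer than a")
    parts = [None] * (len(a) + len(b))
    parts[::2] = a   # characters of a fill positions 0, 2, 4, ...
    parts[1::2] = b  # elements of b fill positions 1, 3, 5, ...
    return ''.join(parts)


print(interleave('1234', ['+', '-', '']))  # 1+2-34
```

Extended-slice assignment requires the right-hand side to have exactly as many items as the slice, which is why the length check comes first.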
I'm trying to create a list from a permutation of a str object. However the resultant list has duplicates. I have the following code:
from itertools import permutations
a = permutations('144')
b = [''.join(i) for i in a]
print(b)
What am I doing wrong? I'm getting the following:
['144', '144', '414', '441', '414', '441']
You're not doing anything wrong. That is the expected result, because your input string contains duplicate characters.
If all you are interested in are the distinct permutations, pass them through set. If you NEED a list, pass the set through list again.
Example:
from itertools import permutations
a = permutations('144')
b = set(''.join(i) for i in a)  # note that I've removed the square brackets
c = list(b)  # `a` is an iterator and is already exhausted, so convert the set instead
print(b)
print(c)
Protip: use generator expressions wherever possible
You have duplicate elements (characters) in your string. The function permutations will not distinguish between them.
You will have no duplicates if you take the permutations of the set of the characters. Note, though, that the set keeps only one '4', so you get the permutations of '14' rather than of '144':
from itertools import permutations
a = permutations(set('144'))
b = [''.join(i) for i in a]
print(b)
Or, if you want the distinct permutations of the full string, duplicate characters included, take the set of the permutations:
from itertools import permutations
a = permutations('144')
b = [''.join(i) for i in set(a)]
print(b)
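A set does not guarantee any particular order. If you want the distinct permutations in the order permutations() first produces them, one option (not from the answers above) is dict.fromkeys, since dicts preserve insertion order from Python 3.7 on. A minimal sketch:

```python
from itertools import permutations

# dict.fromkeys keeps only the first occurrence of each key, in insertion
# order, so duplicates are dropped without scrambling the original order.
unique_perms = list(dict.fromkeys(''.join(p) for p in permutations('144')))
print(unique_perms)  # ['144', '414', '441']
```

Like the set-based versions, this still generates all n! permutations before discarding duplicates, so it does not help with strings that have many repeated characters.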
Python programs are often short and concise: what usually requires a bunch of lines in other programming languages (that I know of) can be accomplished in a line or two in Python.
One such program I am trying to write extracts every other letter from a string.
I have this working code, but I'm wondering whether a more concise way is possible:
>>> s
'abcdefg'
>>> b = ""
>>> for i in range(len(s)):
...     if (i%2)==0:
...         b+=s[i]
...
>>> b
'aceg'
Use Python's slice notation (see the Stack Overflow question "Explain Python's slice notation"):
>>> 'abcdefg'[::2]
'aceg'
The format for slice notation is [start:stop:step]. Leaving start and stop empty covers the whole string, so [::2] tells Python to step through it two characters at a time, which returns every other character.
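A few more slices of the same string show how start, stop and step combine. Omitted values default to the beginning and end of the string, the stop index is excluded, and a negative step walks backwards:

```python
s = 'abcdefg'

print(s[::2])    # every other character from the start: aceg
print(s[1::2])   # every other character starting at index 1: bdf
print(s[0:5:2])  # indices 0, 2, 4 (stop index 5 is excluded): ace
print(s[::-1])   # a negative step walks backwards: gfedcba
```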
The right way to do this is to just slice the string, as in the other answers.
But if you want a more concise way to write your code, which will work for similar problems that aren't as simple as slicing, there are two tricks: comprehensions, and the enumerate function.
First, this loop:
for i in range(len(foo)):
    value = foo[i]
    print(i, value)  # do something with value and i
… can be written as:
for i, value in enumerate(foo):
    print(i, value)  # do something with value and i
So, in your case:
b = ""
for i, c in enumerate(s):
    if (i%2)==0:
        b+=c
Next, any loop that starts with an empty object, goes through an iterable (string, list, iterator, etc.), and puts values into a new iterable, possibly running the values through an if filter or an expression that transforms them, can be turned into a comprehension very easily.
While Python has comprehensions for lists, sets, dicts, and iterators, it doesn't have comprehensions for strings—but str.join solves that.
So, putting it together:
b = "".join(c for i, c in enumerate(s) if i%2 == 0)
Not nearly as concise or readable as b = s[::2]… but a lot better than what you started with—and the same idea works when you want to do more complicated things, like if i%2 and i%3 (which doesn't map to any obvious slice), or doubling each letter with c*2 (which could be done by zipping together two slices, but that's not immediately obvious), etc.
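To make the last point concrete, here is a sketch of two such "harder" cases using the same enumerate-plus-join pattern; the variable names are just for illustration:

```python
s = 'abcdefg'

# Keep characters whose index is divisible by neither 2 nor 3
# (indices 1 and 5 here); no single slice expresses this.
odd_non_triple = "".join(c for i, c in enumerate(s) if i % 2 and i % 3)
print(odd_non_triple)  # bf

# Transform each kept character in the expression part: double every
# character at an even index.
doubled = "".join(c * 2 for i, c in enumerate(s) if i % 2 == 0)
print(doubled)  # aacceegg
```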
Here is another example both for list and string:
sentence = "The quick brown fox jumped over the lazy dog."
sentence[::2]
Here we are saying: Take the entire string from the beginning to the end and return every 2nd character.
Would return the following:
'Teqikbonfxjme vrtelz o.'
You can do the same for a list:
colors = ["red", "orange", "yellow", "green", "blue"]
colors[1:4]
would return:
['orange', 'yellow', 'green']
The way I read a slice such as colors[1:4] is:
Start at index 1 (remember that the first position is index 0) and stop BEFORE index 4.
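The same start/stop/step rules apply to lists, including negative indices, which count from the end. A small sketch with an illustrative list:

```python
colors = ["red", "orange", "yellow", "green", "blue"]

print(colors[1:4])  # indices 1, 2, 3: ['orange', 'yellow', 'green']
print(colors[::2])  # every 2nd element: ['red', 'yellow', 'blue']
print(colors[-2:])  # the last two elements: ['green', 'blue']
print(colors[:-1])  # everything except the last element
```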
You could try using a slice and join:
>>> k = list(s)
>>> "".join(k[::2])
'aceg'
Practically, slicing is the best way to go. However, there are also ways you could improve your existing code, not by making it shorter, but by making it more Pythonic:
>>> s
'abcdefg'
>>> b = []
>>> for index, value in enumerate(s):
...     if index % 2 == 0:
...         b.append(value)
...
>>> b = "".join(b)
or even better:
>>> b = "".join(value for index, value in enumerate(s) if index % 2 == 0)
This can be easily extended to more complicated conditions:
>>> b = "".join(value for index, value in enumerate(s) if index % 2 == index % 3 == 0)
Suppose I have this list:
lis = ['a','b','c','d']
If I do 'x'.join(lis) the result is:
'axbxcxd'
What would be a clean, simple way to get this output?
'xaxbxcxdx'
I could write a helper function:
def joiner(s, it):
    return s+s.join(it)+s
and call it like joiner('x',lis) which returns xaxbxcxdx, but it doesn't look as clean as it could be. Is there a better way to get this result?
>>> s = 'x'
>>> '{1}{0}{1}'.format(s.join(lis), s)
'xaxbxcxdx'
You can join a list that begins and ends with an empty string:
>>> 'x'.join(['', *lis, ''])
'xaxbxcxdx'
You can use an f-string:
s = 'x'
f'{s}{s.join(lis)}{s}'
In Python 3.8 and later you can also use the walrus operator:
f"{(s:='x')}{s.join(lis)}{s}"
or
(s:='x') + s.join(lis) + s
You can use str.replace() to interleave the characters (this only works when every element of the list is a single character):
>>> lis = ['a','b','c','d']
>>> ''.join(lis).replace('', 'x')
'xaxbxcxdx'
On the other hand, your original solution (or a trivial modification with string formatting) is IMO actually pretty clean and readable.
You may also do it as
'x'.join([''] + lis + [''])
But I'm not sure if it's cleaner.
On an empty list it produces only one separator, instead of the two that the helper in the question produces.
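The empty-list difference is easy to check. A minimal comparison sketch: joiner is the helper from the question, and pad_join is a hypothetical name for the list-padding variant above.

```python
def joiner(s, it):
    # Helper from the question: separator + joined items + separator.
    return s + s.join(it) + s


def pad_join(s, items):
    # Pad the list with empty strings so join puts separators at both ends.
    return s.join([''] + list(items) + [''])


lis = ['a', 'b', 'c', 'd']
print(joiner('x', lis), pad_join('x', lis))  # xaxbxcxdx xaxbxcxdx

# The two only differ for an empty list:
print(joiner('x', []))    # xx
print(pad_join('x', []))  # x
```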
A generator can interleave the characters, and the result can be joined without having to create intermediate strings.
L = list('abcd')

def mixer(chars, insert):
    yield insert
    for char in chars:
        yield char
        yield insert

result = ''.join(mixer(L, 'x'))  # -> 'xaxbxcxdx'
While it isn't a one-liner, I think it is clean and simple, unlike these itertools creations that I came up with initially:
from itertools import repeat, starmap, zip_longest
from operator import add

# L must have a len, so doesn't work with generators
''.join(a for b in zip_longest(repeat('x', len(L) + 1),
                               L, fillvalue='')
        for a in b)
# As above, and worse still creates lots of intermediate strings
''.join(starmap(add, zip_longest(repeat('x', len(L) + 1), L, fillvalue='')))
Arguably there is a much simpler approach to be found here - just add the character before and after your join function. Others have suggested f-strings, which is a fancy way of achieving the same thing. String concatenation is also fine:
lis = ['a','b','c','d']
lis_str = 'x' + 'x'.join(lis) + 'x'
If the separator is long and you don't want to repeat it multiple times, put it in a variable and do the same thing:
lis = ['a','b','c','d']
join_str = 'x-marks-the-spot'
lis_str = join_str + join_str.join(lis) + join_str
What is the easiest way to sort a list of strings with digits at the end, where some have 3 digits and some have 4?
>>> list = ['asdf123', 'asdf1234', 'asdf111', 'asdf124']
>>> list.sort()
>>> print(list)
['asdf111', 'asdf123', 'asdf1234', 'asdf124']
It should put the 1234 one at the end. Is there an easy way to do this?
is there an easy way to do this?
Yes
You can use the natsort module.
>>> from natsort import natsorted
>>> natsorted(['asdf123', 'asdf1234', 'asdf111', 'asdf124'])
['asdf111', 'asdf123', 'asdf124', 'asdf1234']
Full disclosure, I am the package's author.
is there an easy way to do this?
No
It's perfectly unclear what the real rules are. The "some have 3 digits and some have 4" isn't really a very precise or complete specification. All your examples show 4 letters in front of the digits. Is this always true?
import re

key_pat = re.compile(r"^(\D+)(\d+)$")

def key(item):
    m = key_pat.match(item)
    return m.group(1), int(m.group(2))
That key function might do what you want. Or it might be too complex. Or maybe the pattern is really r"^(.*)(\d{3,4})$" or maybe the rules are even more obscure.
>>> data= ['asdf123', 'asdf1234', 'asdf111', 'asdf124']
>>> data.sort( key=key )
>>> data
['asdf111', 'asdf123', 'asdf124', 'asdf1234']
What you're probably describing is called a Natural Sort, or a Human Sort. If you're using Python, you can borrow from Ned's implementation.
The algorithm for a natural sort is approximately as follows:
Split each value into alphabetical "chunks" and numerical "chunks"
Sort by the first chunk of each value
If the chunk is alphabetical, sort it as usual
If the chunk is numerical, sort by the numerical value represented
Take the values that have the same first chunk and sort them by the second chunk
And so on
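The steps above can be sketched with re.split. Splitting on a capturing group (\d+) keeps the digit runs, and the result always alternates text, digits, text, and so on, starting with a (possibly empty) text chunk. Converting every odd-position chunk to int therefore gives keys whose elements line up by type, so Python 3 never has to compare an int with a str. This is a minimal sketch, not Ned's code; the name natural_key is just for illustration.

```python
import re


def natural_key(s):
    """Split s into text and digit chunks; digit chunks compare numerically."""
    # re.split with a capturing group returns [text, digits, text, ..., text],
    # so the chunks at odd positions are always the digit runs.
    return [int(chunk) if i % 2 else chunk
            for i, chunk in enumerate(re.split(r'(\d+)', s))]


data = ['asdf123', 'asdf1234', 'asdf111', 'asdf124', 'file10', 'file9']
print(sorted(data, key=natural_key))
# ['asdf111', 'asdf123', 'asdf124', 'asdf1234', 'file9', 'file10']
```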
l = ['asdf123', 'asdf1234', 'asdf111', 'asdf124']
l.sort(key=lambda x: int(x[4:]))  # Python 3 has no cmp= argument; compare the digits after the 4-letter prefix
You need a key function. You're willing to specify 3 or 4 digits at the end and I have a feeling that you want them to compare numerically.
sorted(list_, key=lambda s: (s[:-4], int(s[-4:])) if s[-4] in '0123456789' else (s[:-3], int(s[-3:])))
Without the lambda and conditional expression that's
def key(s):
    if s[-4] in '0123456789':
        return (s[:-4], int(s[-4:]))
    else:
        return (s[:-3], int(s[-3:]))

sorted(list_, key=key)
This just takes advantage of the fact that tuples sort by the first element, then the second. So because the key function is called to get a value to compare, the elements will now be compared like the tuples returned by the key function. For example, 'asdfbad123' will compare to 'asd7890' as ('asdfbad', 123) compares to ('asd', 7890). If the last 3 characters of a string aren't in fact digits, you'll get a ValueError which is perfectly appropriate given the fact that you passed it data that doesn't fit the specs it was designed for.
The issue is that the sorting is alphabetical here, since these are strings: they are compared character by character, and the first differing character decides the order.
>>> 'a1234' < 'a124'  # at index 3, '3' is less than '4'
True
You will need to do a numeric sort to get the desired output.
>>> x = ['asdf123', 'asdf1234', 'asdf111', 'asdf124']
>>> y = [ int(t[4:]) for t in x]
>>> z = sorted(y)
>>> z
[111, 123, 124, 1234]
>>> l = ['asdf'+str(t) for t in z]
>>> l
['asdf111', 'asdf123', 'asdf124', 'asdf1234']
>>>
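The same numeric ordering can be had without stripping and re-adding the prefix, by passing the conversion as a key so the original strings are sorted directly. A sketch that, like the session above, assumes every item is a 4-character prefix followed only by digits:

```python
x = ['asdf123', 'asdf1234', 'asdf111', 'asdf124']

# Sort the original strings, comparing them by the integer after index 4.
# Assumes a fixed 4-character prefix, as in the example data.
result = sorted(x, key=lambda t: int(t[4:]))
print(result)  # ['asdf111', 'asdf123', 'asdf124', 'asdf1234']
```

Unlike rebuilding the items with 'asdf' + str(t), this keeps each item's actual prefix, although the prefix still plays no part in the ordering.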
L.sort(key=lambda s:int(''.join(filter(str.isdigit,s[-4:]))))
Rather than splitting each line myself, I ask Python to do it for me with re.findall():
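import re
import sys

def SortKey(line):
    result = []
    for part in re.findall(r'\D+|\d+', line):
        try:
            result.append(int(part, 10))
        except (TypeError, ValueError) as _:
            result.append(part)
    # Note: in Python 3, if one line starts with digits and another with letters, comparing their keys raises TypeError (int vs str)
    return result

print(''.join(sorted(sys.stdin.readlines(), key=SortKey)), end='')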
Is there a pythonic way to insert an element into every 2nd element in a string?
I have a string: 'aabbccdd' and I want the end result to be 'aa-bb-cc-dd'.
I am not sure how I would go about doing that.
>>> s = 'aabbccdd'
>>> '-'.join(s[i:i+2] for i in range(0, len(s), 2))
'aa-bb-cc-dd'
Assuming the string's length is always even:
>>> s = '12345678'
>>> t = iter(s)
>>> '-'.join(a+b for a,b in zip(t, t))
'12-34-56-78'
The t can also be eliminated with
>>> '-'.join(a+b for a,b in zip(s[::2], s[1::2]))
'12-34-56-78'
The algorithm is to group the string into pairs, then join them with the - character.
The code works like this. First, the string is split into the characters at even positions and those at odd positions.
>>> s[::2], s[1::2]
('1357', '2468')
Then the zip function is used to combine them into an iterable of tuples.
>>> list( zip(s[::2], s[1::2]) )
[('1', '2'), ('3', '4'), ('5', '6'), ('7', '8')]
But tuples aren't what we want; we need a list of strings. That is the job of the list comprehension:
>>> [a+b for a,b in zip(s[::2], s[1::2])]
['12', '34', '56', '78']
Finally we use str.join() to combine the list.
>>> '-'.join(a+b for a,b in zip(s[::2], s[1::2]))
'12-34-56-78'
The first piece of code is the same idea, but consumes less memory if the string is long.
If you want to preserve the last character when the string has an odd length, you can modify KennyTM's answer to use itertools.zip_longest (called izip_longest in Python 2):
>>> s = "aabbccd"
>>> from itertools import zip_longest
>>> '-'.join(a+b for a,b in zip_longest(s[::2], s[1::2], fillvalue=""))
'aa-bb-cc-d'
or
>>> t = iter(s)
>>> '-'.join(a+b for a,b in zip_longest(t, t, fillvalue=""))
'aa-bb-cc-d'
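The iterator version can be wrapped in a small reusable helper for Python 3. A minimal sketch; the name pair_join and its sep parameter are just for illustration:

```python
from itertools import zip_longest


def pair_join(s, sep='-'):
    """Join consecutive pairs of characters with sep, keeping an odd last char."""
    it = iter(s)
    # Passing the same iterator twice makes zip_longest pull two characters
    # per tuple; fillvalue='' pads the final tuple when len(s) is odd.
    return sep.join(a + b for a, b in zip_longest(it, it, fillvalue=''))


print(pair_join('aabbccdd'))       # aa-bb-cc-dd
print(pair_join('aabbccd'))        # aa-bb-cc-d
print(pair_join('12345678', ':'))  # 12:34:56:78
```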
I tend to rely on a regular expression for this, as it seems less verbose and is usually faster than all the alternatives. Aside from having to face down the conventional wisdom regarding regular expressions, I'm not sure there's a drawback.
>>> import re
>>> s = 'aabbccdd'
>>> '-'.join(re.findall('..', s))
'aa-bb-cc-dd'
This version is strict about actual pairs though:
>>> t = s + 'e'
>>> '-'.join(re.findall('..', t))
'aa-bb-cc-dd'
... so with a tweak you can be tolerant of odd-length strings:
>>> '-'.join(re.findall('..?', t))
'aa-bb-cc-dd-e'
Usually you're doing this more than once, so maybe get a head start by creating a shortcut ahead of time:
import re
PAIRS = re.compile('..').findall
out = '-'.join(PAIRS(src))
Or what I would use in real code:
def rejoined(src, sep='-', _split=re.compile('..').findall):
    return sep.join(_split(src))
>>> rejoined('aabbccdd', sep=':')
'aa:bb:cc:dd'
I use something like this from time to time to create MAC address representations from 6-byte binary input:
>>> addr = b'\xdc\xf7\x09\x11\xa0\x49'
>>> rejoined(addr[::-1].hex(), sep=':')
'49:a0:11:09:f7:dc'
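For the MAC-address case specifically, bytes.hex() accepts a separator argument on Python 3.8 and later, so no regular expression is needed. A minimal sketch with the same input, assuming Python 3.8+:

```python
addr = b'\xdc\xf7\x09\x11\xa0\x49'

# bytes.hex(sep) (Python 3.8+) puts sep between the hex digits of consecutive bytes.
print(addr.hex(':'))        # dc:f7:09:11:a0:49
print(addr[::-1].hex(':'))  # 49:a0:11:09:f7:dc
```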
Here is a list comprehension that uses enumerate to put '-' before every character at an even, non-zero index; an odd last character ends up in a group of its own:
for s in ['aabbccdd', 'aabbccdde']:
    print(''.join([char if not ind or ind % 2 else '-' + char
                   for ind, char in enumerate(s)]))
""" Output:
aa-bb-cc-dd
aa-bb-cc-dd-e
"""
This one-liner does the trick. It will drop the last character if your string has an odd number of characters.
"-".join([''.join(item) for item in zip(mystring1[::2],mystring1[1::2])])
As PEP8 states:
Do not rely on CPython's efficient implementation of in-place string concatenation for statements in the form a += b or a = a + b. This optimization is fragile even in CPython (it only works for some types) and isn't present at all in implementations that don't use refcounting.
A Pythonic way of doing this that avoids this kind of concatenation, and also lets you join iterables other than strings, could be:
':'.join(f'{s[i:i+2]}' for i in range(0, len(s), 2))
And another more functional-like way could be:
':'.join(map('{}{}'.format, *(s[::2], s[1::2])))
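This second approach has a particular feature (or bug): it only joins complete pairs of letters, so an odd trailing character is dropped:
>>> s = 'abcdefghij'
>>> ':'.join(map('{}{}'.format, *(s[::2], s[1::2])))
'ab:cd:ef:gh:ij'
and:
>>> s = 'abcdefghi'
>>> ':'.join(map('{}{}'.format, *(s[::2], s[1::2])))
'ab:cd:ef:gh'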
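Side by side, the difference from the range/slice version is clear. A minimal sketch running both expressions on the same odd-length string:

```python
s = 'abcdefghi'

# range/slice version: the last slice s[8:10] is just 'i', so it is kept
by_slices = ':'.join(s[i:i+2] for i in range(0, len(s), 2))

# map version: map stops at the shorter of s[::2] ('acegi') and s[1::2] ('bdfh'),
# so the unpaired 'i' is dropped
by_map = ':'.join(map('{}{}'.format, s[::2], s[1::2]))

print(by_slices)  # ab:cd:ef:gh:i
print(by_map)     # ab:cd:ef:gh
```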