python sort strings with digits at the end

What is the easiest way to sort a list of strings with digits at the end, where some have 3 digits and some have 4?
>>> list = ['asdf123', 'asdf1234', 'asdf111', 'asdf124']
>>> list.sort()
>>> print list
['asdf111', 'asdf123', 'asdf1234', 'asdf124']
This should put the 1234 one at the end. Is there an easy way to do this?

is there an easy way to do this?
Yes
You can use the natsort module.
>>> from natsort import natsorted
>>> natsorted(['asdf123', 'asdf1234', 'asdf111', 'asdf124'])
['asdf111', 'asdf123', 'asdf124', 'asdf1234']
Full disclosure, I am the package's author.

is there an easy way to do this?
No
It's unclear what the real rules are. "Some have 3 digits and some have 4" isn't a very precise or complete specification. All your examples show 4 letters in front of the digits. Is this always true?
import re
key_pat = re.compile(r"^(\D+)(\d+)$")
def key(item):
    m = key_pat.match(item)
    return m.group(1), int(m.group(2))
That key function might do what you want. Or it might be too complex. Or maybe the pattern is really r"^(.*)(\d{3,4})$" or maybe the rules are even more obscure.
>>> data= ['asdf123', 'asdf1234', 'asdf111', 'asdf124']
>>> data.sort( key=key )
>>> data
['asdf111', 'asdf123', 'asdf124', 'asdf1234']

What you're probably describing is called a Natural Sort, or a Human Sort. If you're using Python, you can borrow from Ned's implementation.
The algorithm for a natural sort is approximately as follows:
Split each value into alphabetical "chunks" and numerical "chunks"
Sort by the first chunk of each value
If the chunk is alphabetical, sort it as usual
If the chunk is numerical, sort by the numerical value represented
Take the values that have the same first chunk and sort them by the second chunk
And so on
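The steps above can be sketched as a key function. This is a minimal sketch, not Ned's implementation: the name natural_key is made up for illustration, and it assumes the values are plain strings with unsigned integer runs (no signs, decimals or locale-aware ordering).

```python
import re

def natural_key(text):
    """Split text into alternating non-digit and digit chunks.

    Digit chunks become ints so they compare by numeric value.
    """
    chunks = re.split(r'(\d+)', text)
    # With a capturing group, re.split puts non-digit chunks at even
    # indexes and digit chunks at odd indexes, so the types line up
    # position by position when two keys are compared.
    return [int(c) if i % 2 else c for i, c in enumerate(chunks)]

names = ['asdf123', 'asdf1234', 'asdf111', 'asdf124']
print(sorted(names, key=natural_key))
# ['asdf111', 'asdf123', 'asdf124', 'asdf1234']
```

Because lists compare element by element, sorting on this key handles the "first chunk, then second chunk, and so on" part without any extra code.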

l = ['asdf123', 'asdf1234', 'asdf111', 'asdf124']
l.sort(cmp=lambda x, y: cmp(int(x[4:]), int(y[4:])))
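Note that the cmp argument and the cmp() builtin only exist in Python 2. A rough Python 3 equivalent, assuming the same fixed 4-letter prefix as the one-liner above, might look like this (compare_suffix is a name invented for the sketch):

```python
from functools import cmp_to_key

items = ['asdf123', 'asdf1234', 'asdf111', 'asdf124']

# Simplest translation: sort by the integer after the 4-letter prefix.
by_key = sorted(items, key=lambda s: int(s[4:]))

# If a comparison function is really needed, wrap it with cmp_to_key.
def compare_suffix(x, y):
    a, b = int(x[4:]), int(y[4:])
    return (a > b) - (a < b)  # stands in for the removed cmp() builtin

by_cmp = sorted(items, key=cmp_to_key(compare_suffix))

print(by_key)
# ['asdf111', 'asdf123', 'asdf124', 'asdf1234']
```

The key= form is usually preferable, since the key is computed once per item rather than once per comparison.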

You need a key function. You're willing to specify 3 or 4 digits at the end and I have a feeling that you want them to compare numerically.
sorted(list_, key=lambda s: (s[:-4], int(s[-4:])) if s[-4] in '0123456789' else (s[:-3], int(s[-3:])))
Without the lambda and conditional expression that's
def key(s):
    if s[-4] in '0123456789':
        return (s[:-4], int(s[-4:]))
    else:
        return (s[:-3], int(s[-3:]))
sorted(list_, key=key)
This just takes advantage of the fact that tuples sort by the first element, then the second. So because the key function is called to get a value to compare, the elements will now be compared like the tuples returned by the key function. For example, 'asdfbad123' will compare to 'asd7890' as ('asdfbad', 123) compares to ('asd', 7890). If the last 3 characters of a string aren't in fact digits, you'll get a ValueError which is perfectly appropriate given the fact that you passed it data that doesn't fit the specs it was designed for.
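To see the tuple ordering in isolation, here is a small check using the pairs from the paragraph above. It is only an illustration of how Python compares tuples; the variable names are made up.

```python
# Tuples compare element by element: the first elements decide unless
# they are equal, in which case the second elements are compared.
shorter_prefix = ('asd', 7890)
longer_prefix = ('asdfbad', 123)
print(shorter_prefix < longer_prefix)    # True: 'asd' sorts before 'asdfbad'

same_prefix_small = ('asdf', 124)
same_prefix_large = ('asdf', 1234)
print(same_prefix_small < same_prefix_large)  # True: prefixes tie, 124 < 1234
```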

The issue is that the sort is alphabetical here, since the items are strings. The strings are compared character by character, and the first differing position decides the order.
>>> 'a1234' < 'a124'  # at the first differing position, '3' is less than '4'
True
>>>
You will need to do a numeric sort to get the desired output.
>>> x = ['asdf123', 'asdf1234', 'asdf111', 'asdf124']
>>> y = [ int(t[4:]) for t in x]
>>> z = sorted(y)
>>> z
[111, 123, 124, 1234]
>>> l = ['asdf'+str(t) for t in z]
>>> l
['asdf111', 'asdf123', 'asdf124', 'asdf1234']
>>>

L.sort(key=lambda s:int(''.join(filter(str.isdigit,s[-4:]))))

Rather than splitting each line myself, I ask Python to do it for me with re.findall():
import re
import sys
def SortKey(line):
    result = []
    for part in re.findall(r'\D+|\d+', line):
        try:
            result.append(int(part, 10))
        except (TypeError, ValueError) as _:
            result.append(part)
    return result
print ''.join(sorted(sys.stdin.readlines(), key=SortKey)),

Related

How to change the index of an element in a list/array to another position/index without deleting/changing the original element and its value

For example, let's say I have a list as below:
list = ['list4','this1','my3','is2'] or [1,6,'one','six']
So now I want to change the index of each element to match the number or make sense as I see fit (needn't be number) like so, (basically change the index of the element to wherever I want)
list = ['this1','is2','my3','list4'] or ['one',1,'six',6]
how do I do this whether there be numbers or not ?
Please help, Thanks in advance.
If you don't want to use regex and learn its mini-language, use this simpler method:
list1 = ['list4','this1', 'he5re', 'my3','is2']
def mySort(string):
    if any(char.isdigit() for char in string):  # check whether there is a digit in the string
        # build the list of digits and return the first one (only one number per string is expected)
        return [float(char) for char in string if char.isdigit()][0]
list1.sort(key = mySort)
print(list1)
Inspired by this answer: https://stackoverflow.com/a/4289557/11101156
For the first one, it is easy:
>>> lst = ['list4','this1','my3','is2']
>>> lst = sorted(lst, key=lambda x:int(x[-1]))
>>> lst
['this1', 'is2', 'my3', 'list4']
But this assumes each item is a string and that the last character of each item is numeric. It also only works as long as the numeric part of each item is a single digit; otherwise it breaks. For the second one, you need to define "how you see fit" in order to sort it by some logic.
If there are multiple numeric characters:
>>> import re
>>> lst = ['lis22t4','th2is21','my3','is2']
>>> sorted(lst, key=lambda x:int(re.search(r'\d+$', x).group(0)))
['is2', 'my3', 'lis22t4', 'th2is21']
But you can always do:
>>> lst = [1,6,'one','six']
>>> lst = [lst[2], lst[0], lst[3], lst[1]]
>>> lst
['one', 1, 'six', 6]
Also, don't use python built-ins as variable names. list is a bad variable name.
If you just want to move element in position 'y' to position 'x' of a list, you can try this one-liner, using pop and insert:
lst.insert(x, lst.pop(y))
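A short worked example of the pop/insert idiom, wrapped in a helper for clarity. The function name move and the sample positions are assumptions made for this sketch; note that the target index is interpreted after the element has been removed.

```python
def move(items, src, dst):
    """Move the element at index src to index dst, in place."""
    items.insert(dst, items.pop(src))
    return items

lst = ['list4', 'this1', 'my3', 'is2']
move(lst, 1, 0)   # take 'this1' from index 1 and put it at index 0
print(lst)
# ['this1', 'list4', 'my3', 'is2']
```

No element is copied or changed; only its position in the list differs afterwards.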
If you know the order in which you want to rearrange the indexes, you can write simple code:
old_list= ['list4','this1','my3','is2']
order = [1, 3, 2, 0]
new_list = [old_list[idx] for idx in order]
If you can write your logic as a function, you can use sorted() and pass your function name as a key:
old_list= ['list4','this1','my3','is2']
def extract_number(string):
    digits = ''.join([c for c in string if c.isdigit()])
    return int(digits)
new_list = sorted(old_list, key = extract_number)
In this case the list is sorted by the number formed by combining the digits found in each string.
a = [1,2,3,4]
def rep(s, l, ab):
    id = l.index(s)
    q = s
    del(l[id])
    l.insert(ab, q)
    return l
l = rep(a[0], a, 2)
print(l)
Hope you like this
It's much simpler

sorting list of a sentence and number

I have checked several of the answers on how to sort lists in python, but I can't figure this one out.
Let's say I have a list like this:
['Today is a good day,1', 'yesterday was a strange day,2', 'feeling hopeful,3']
Is there a way to sort by the number after each sentence?
I am trying to learn this stuff on my own, so I tried stuff like:
def sortMyList(string):
    return len(string)-1
sortedList = sorted(MyList, key=sortMyList())
But of course it doesn't work because sortMyList expects one parameter.
Since no one has commented on your coding attempts so far:
def sortMyList(string):
    return len(string)-1
sortedList = sorted(MyList, key=sortMyList())
You are on your way, but there are a few issues. First, the key argument expects a function. That function should be sortMyList. sortMyList() would be the result of calling a function - and besides, your function has a parameter (as it should), so calling it with no arguments wouldn't work. Just refer to the function itself.
sortedList = sorted(MyList, key=sortMyList)
Next, you need to tell sorted what is actually being compared when you compare two strings. len(string)-1 gets the length of the string and subtracts one. This would have the effect of sorting the strings by their length, which isn't what you're looking for. You want the character in the string at that index, so sorted will look at all those characters to form a basis for comparison.
def sortMyList(string):
    return string[len(string)-1]
Next, you can use a negative index instead of calculating the length of the string, to directly get the last character:
def sortMyList(string):
    return string[-1]
Next, we'd like to handle multi-digit numbers. It looks like there's a comma right before the number, so we'll split on that, starting from the right (in case the sentence itself has a comma). We only need the first split, so we'll specify a maxsplit of 1:
def sortMyList(string):
    return string.rsplit(',', maxsplit=1)[1]
This will run into a problem: these "numbers" are actually still strings, so when you compare them, it will do so alphabetically, putting "10" before "2" and so on. To fix this, turn the number into an integer before returning it:
def sortMyList(string):
    return int(string.rsplit(',', maxsplit=1)[1])
Putting it all together:
def sortMyList(string):
    return int(string.rsplit(',', maxsplit=1)[1])
sortedList = sorted(MyList, key=sortMyList)
You can do this
>>> sorted(l, key=lambda x : int(x.split(',')[-1]))
['Today is a good day,1', 'yesterday was a strange day,2', 'feeling hopeful,3']
>>>
This would also work if you happen to have numbers in your string that have more than one digit
>>> l = ['Today is a good day,12', 'yesterday was a strange day,21', 'feeling hopeful,23']
>>> sorted(l, key=lambda x : int(x.split(',')[1]))
['Today is a good day,12', 'yesterday was a strange day,21', 'feeling hopeful,23'] # still works
>>> sorted(l, key=lambda x : x[-1])
['yesterday was a strange day,21', 'Today is a good day,12', 'feeling hopeful,23'] # doesn't work in this scenario
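One caveat with split(',')[1]: it picks the second comma-separated field, which is only the number when the sentence itself contains no comma. A hedged sketch of the difference, using made-up sample rows and a hypothetical helper name trailing_number:

```python
rows = ['Well, yesterday was odd,10', 'Today is a good day,2']

def trailing_number(row):
    # Split from the right, at most once, so commas inside the
    # sentence are ignored and only the final field is parsed.
    return int(row.rsplit(',', 1)[1])

print(sorted(rows, key=trailing_number))
# ['Today is a good day,2', 'Well, yesterday was odd,10']

# By contrast, split(',')[1] returns ' yesterday was odd' for the first
# row, and int() raises ValueError on that.
try:
    int(rows[0].split(',')[1])
    split_failed = False
except ValueError:
    split_failed = True
print(split_failed)  # True
```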
This worked for me:
sorted(myList, key=lambda x: x[-1])
If you need to go into double digits:
sorted(myList, key=lambda x: int(x.split(',')[1]))

Incorporate string with list entries - alternating

So SO, I am trying to "merge" a string (a) and a list of strings (b):
a = '1234'
b = ['+', '-', '']
to get the desired output (c):
c = '1+2-34'
The characters in the desired output string alternate in terms of origin between string and list. Also, the list will always contain one element less than characters in the string. I was wondering what the fastest way to do this is.
What I have so far is the following:
c = a[0]
for i in range(len(b)):
    c += b[i] + a[1:][i]
print(c) # prints -> 1+2-34
But I kind of feel like there is a better way to do this.
You can use itertools.zip_longest to zip the two sequences, then keep iterating even after the shorter sequence ran out of characters. If you run out of characters, you'll start getting None back, so just consume the rest of the numerical characters.
>>> from itertools import chain
>>> from itertools import zip_longest
>>> ''.join(i+j if j else i for i,j in zip_longest(a, b))
'1+2-34'
As @deceze suggested in the comments, you can also pass a fillvalue argument to zip_longest which will insert empty strings. I'd suggest his method since it's a bit more readable.
>>> ''.join(i+j for i,j in zip_longest(a, b, fillvalue=''))
'1+2-34'
A further optimization suggested by @ShadowRanger is to remove the temporary string concatenations (i+j) and replace those with an itertools.chain.from_iterable call instead:
>>> ''.join(chain.from_iterable(zip_longest(a, b, fillvalue='')))
'1+2-34'

Python comparing two strings

Is there a function to compare how many characters two strings (of the same length) differ by? I mean only substitutions. For example, AAA would differ from AAT by 1 character.
This will work:
>>> str1 = "AAA"
>>> str2 = "AAT"
>>> sum(1 for x,y in enumerate(str1) if str2[x] != y)
1
>>> str1 = "AAABBBCCC"
>>> str2 = "ABCABCABC"
>>> sum(1 for x,y in enumerate(str1) if str2[x] != y)
6
>>>
The above solution uses sum, enumerate, and a generator expression.
Because True can evaluate to 1, you could even do:
>>> str1 = "AAA"
>>> str2 = "AAT"
>>> sum(str2[x] != y for x,y in enumerate(str1))
1
>>>
But I personally prefer the first solution because it is clearer.
This is a nice use case for the zip function!
def count_substitutions(s1, s2):
    return sum(x != y for (x, y) in zip(s1, s2))
Usage:
>>> count_substitutions('AAA', 'AAT')
1
From the docs:
zip(...)
zip(seq1 [, seq2 [...]]) -> [(seq1[0], seq2[0] ...), (...)]
Return a list of tuples, where each tuple contains the i-th element
from each of the argument sequences. The returned list is truncated
in length to the length of the shortest argument sequence.
Building on what poke said I would suggest the jellyfish package. It has several distance measures like what you are asking for. Example from the documentation:
IN [1]: jellyfish.damerau_levenshtein_distance('jellyfish', 'jellyfihs')
OUT[1]: 1
or using your example:
IN [2]: jellyfish.damerau_levenshtein_distance('AAA','AAT')
OUT[2]: 1
This will work for many different string lengths and should be able to handle most of what you throw at it.
Similar to simon's answer, but you don't have to zip things in order to just call a function on the resulting tuples because that's what map does anyway (and itertools.imap in Python 2). And there's a handy function for != in operator. Hence:
import operator
sum(map(operator.ne, s1, s2))
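For completeness, a self-contained version of the map/operator.ne idea. The function name hamming and the explicit length check are additions for this sketch; the question only asks about strings of equal length, and without the check map would silently stop at the shorter string.

```python
import operator

def hamming(s1, s2):
    """Count the positions at which two equal-length strings differ."""
    if len(s1) != len(s2):
        raise ValueError('strings must have the same length')
    # operator.ne(a, b) is a != b; True counts as 1 when summed.
    return sum(map(operator.ne, s1, s2))

print(hamming('AAA', 'AAT'))              # 1
print(hamming('AAABBBCCC', 'ABCABCABC'))  # 6
```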

How to insert a character after every 2 characters in a string

Is there a pythonic way to insert an element into every 2nd element in a string?
I have a string: 'aabbccdd' and I want the end result to be 'aa-bb-cc-dd'.
I am not sure how I would go about doing that.
>>> s = 'aabbccdd'
>>> '-'.join(s[i:i+2] for i in range(0, len(s), 2))
'aa-bb-cc-dd'
Assume the string's length is always an even number,
>>> s = '12345678'
>>> t = iter(s)
>>> '-'.join(a+b for a,b in zip(t, t))
'12-34-56-78'
The t can also be eliminated with
>>> '-'.join(a+b for a,b in zip(s[::2], s[1::2]))
'12-34-56-78'
The algorithm is to group the string into pairs, then join them with the - character.
The code works like this. Firstly, the string is split into the characters at even indexes and the characters at odd indexes.
>>> s[::2], s[1::2]
('1357', '2468')
Then the zip function is used to combine them into an iterable of tuples.
>>> list( zip(s[::2], s[1::2]) )
[('1', '2'), ('3', '4'), ('5', '6'), ('7', '8')]
But tuples aren't what we want. This should be a list of strings, which is the purpose of the list comprehension:
>>> [a+b for a,b in zip(s[::2], s[1::2])]
['12', '34', '56', '78']
Finally we use str.join() to combine the list.
>>> '-'.join(a+b for a,b in zip(s[::2], s[1::2]))
'12-34-56-78'
The first piece of code is the same idea, but consumes less memory if the string is long.
If you want to preserve the last character if the string has an odd length, then you can modify KennyTM's answer to use itertools.izip_longest:
>>> s = "aabbccd"
>>> from itertools import izip_longest
>>> '-'.join(a+b for a,b in izip_longest(s[::2], s[1::2], fillvalue=""))
'aa-bb-cc-d'
or
>>> t = iter(s)
>>> '-'.join(a+b for a,b in izip_longest(t, t, fillvalue=""))
'aa-bb-cc-d'
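itertools.izip_longest only exists in Python 2; in Python 3 the same function is named itertools.zip_longest. A sketch of the iterator-based variant for Python 3, with a made-up helper name and a group size fixed at two:

```python
from itertools import zip_longest

def join_pairs(s, sep='-'):
    """Join consecutive pairs of characters, keeping a trailing odd one."""
    t = iter(s)
    # Passing the same iterator twice makes zip_longest pull two
    # consecutive characters per tuple; fillvalue pads the last pair.
    return sep.join(a + b for a, b in zip_longest(t, t, fillvalue=''))

print(join_pairs('aabbccd'))   # aa-bb-cc-d
print(join_pairs('aabbccdd'))  # aa-bb-cc-dd
```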
I tend to rely on a regular expression for this, as it seems less verbose and is usually faster than all the alternatives. Aside from having to face down the conventional wisdom regarding regular expressions, I'm not sure there's a drawback.
>>> import re
>>> s = 'aabbccdd'
>>> '-'.join(re.findall('..', s))
'aa-bb-cc-dd'
This version is strict about actual pairs though:
>>> t = s + 'e'
>>> '-'.join(re.findall('..', t))
'aa-bb-cc-dd'
... so with a tweak you can be tolerant of odd-length strings:
>>> '-'.join(re.findall('..?', t))
'aa-bb-cc-dd-e'
Usually you're doing this more than once, so maybe get a head start by creating a shortcut ahead of time:
PAIRS = re.compile('..').findall
out = '-'.join(PAIRS(src))  # src is the input string ("in" is a reserved word in Python)
Or what I would use in real code:
def rejoined(src, sep='-', _split=re.compile('..').findall):
    return sep.join(_split(src))
>>> rejoined('aabbccdd', sep=':')
'aa:bb:cc:dd'
I use something like this from time to time to create MAC address representations from 6-byte binary input:
>>> addr = b'\xdc\xf7\x09\x11\xa0\x49'
>>> rejoined(addr[::-1].hex(), sep=':')
'49:a0:11:09:f7:dc'
Here is a list-comprehension approach that picks each value by the index modulo 2; an odd last character ends up in a group of its own:
for s in ['aabbccdd', 'aabbccdde']:
    print(''.join([char if not ind or ind % 2 else '-' + char
                   for ind, char in enumerate(s)]))
""" Output:
aa-bb-cc-dd
aa-bb-cc-dd-e
"""
This one-liner does the trick. It will drop the last character if your string has an odd number of characters.
"-".join([''.join(item) for item in zip(mystring1[::2],mystring1[1::2])])
As PEP8 states:
Do not rely on CPython's efficient implementation of in-place string concatenation for statements in the form a += b or a = a + b. This optimization is fragile even in CPython (it only works for some types) and isn't present at all in implementations that don't use refcounting.
A pythonic way of doing this that avoids this kind of concatenation, and allows you to join iterables other than strings could be:
':'.join(f'{s[i:i+2]}' for i in range(0, len(s), 2))
And another more functional-like way could be:
':'.join(map('{}{}'.format, *(s[::2], s[1::2])))
This second approach has a particular feature (or bug) of only joining pairs of letters. So:
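>>> s = 'abcdefghij'
>>> ':'.join(map('{}{}'.format, *(s[::2], s[1::2])))
'ab:cd:ef:gh:ij'
and:
>>> s = 'abcdefghi'
>>> ':'.join(map('{}{}'.format, *(s[::2], s[1::2])))
'ab:cd:ef:gh'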
