python max function with mixed strings and numbers

Could someone explain to me why the following code :
li = [u'ansible-1.1.tar.gz', u'ansible-1.2.1.tar.gz', u'ansible-1.2.2.tar.gz', u'ansible-1.2.3.tar.gz',
u'ansible-1.2.tar.gz', u'ansible-1.3.0.tar.gz', u'ansible-1.3.1.tar.gz', u'ansible-1.3.2.tar.gz',
u'ansible-1.3.3.tar.gz', u'ansible-1.3.4.tar.gz', u'ansible-1.4.1.tar.gz', u'ansible-1.4.2.tar.gz',
u'ansible-1.4.3.tar.gz', u'ansible-1.4.4.tar.gz', u'ansible-1.4.tar.gz']
print(max(li))
returns :
ansible-1.4.tar.gz
Thank you
PS: It returns 1.4.4 when there are only numbers (1.4, 1.4.4, etc)

Because they are compared lexicographically:
>>> ord('t'), ord('4')
(116, 52)
>>> 't' > '4'
True
>>> 'ansible-1.4.tar.gz' > 'ansible-1.4.4.tar.gz'
True
To get ansible-1.4.4.tar.gz as the result, you need to pass a key function.
For example:
>>> li = [u'ansible-1.1.tar.gz', u'ansible-1.2.1.tar.gz', u'ansible-1.2.2.tar.gz', u'ansible-1.2.3.tar.gz',
... u'ansible-1.2.tar.gz', u'ansible-1.3.0.tar.gz', u'ansible-1.3.1.tar.gz', u'ansible-1.3.2.tar.gz',
... u'ansible-1.3.3.tar.gz', u'ansible-1.3.4.tar.gz', u'ansible-1.4.1.tar.gz', u'ansible-1.4.2.tar.gz',
... u'ansible-1.4.3.tar.gz', u'ansible-1.4.4.tar.gz', u'ansible-1.4.tar.gz']
>>>
>>> import re
>>> def get_version(fn):
...     return list(map(int, re.findall(r'\d+', fn)))
...
>>> get_version(u'ansible-1.4.4.tar.gz')
[1, 4, 4]
>>> max(li, key=get_version)
'ansible-1.4.4.tar.gz'
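
One caveat with a key like get_version: it collects every run of digits in the file name, so a digit in the package name itself would end up in the version. A minimal sketch of a stricter key follows, assuming the files are always named <name>-<dotted version>.tar.gz. The names VERSION_RE and version_key are made up for this illustration.

```python
import re

# Assumption: file names look like "<name>-<dotted version>.tar.gz".
VERSION_RE = re.compile(r"-(\d+(?:\.\d+)*)\.tar\.gz$")


def version_key(filename):
    """Return the version part of a file name as a tuple of ints."""
    match = VERSION_RE.search(filename)
    if match is None:
        raise ValueError("no version found in %r" % filename)
    # Tuples compare element by element, so (1, 4) < (1, 4, 4) < (1, 10).
    return tuple(int(part) for part in match.group(1).split("."))


files = ["ansible-1.4.tar.gz", "ansible-1.4.4.tar.gz", "ansible-1.3.4.tar.gz"]
newest = max(files, key=version_key)
print(newest)
```

Because the key is a tuple of integers, 1.10 sorts after 1.9, which plain string comparison would get wrong.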

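Here is another option. The pkg_resources module, which ships with setuptools rather than with Python itself, has a parse_version function. Recent setuptools releases deprecate pkg_resources, and their parse_version may reject strings that are not valid versions, so this works as shown only on older installations.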
>>> from pkg_resources import parse_version
>>> max(li, key=parse_version)
u'ansible-1.4.4.tar.gz'
>>>

Related

Sort sets of objects by numbers in python

I have a sequence of lists such as :
>>> result
[['Human_COII_1000-4566_hsp', 'Human_COII_500-789_hsp', 'Human_COII_100-300_hsp'], ['Human_COI_100-300_hsp', 'Human_COI_500-789_hsp', 'Human_COI_1000-4566_hsp']]
and within each list I would like to sort the items by their number-number range, to get:
[['Human_COII_100-300_hsp', 'Human_COII_500-789_hsp', 'Human_COII_1000-4566_hsp'], ['Human_COI_100-300_hsp', 'Human_COI_500-789_hsp', 'Human_COI_1000-4566_hsp']]
I tried:
for i in result:
    sorted(i)
but the order is not the one I wanted.

You could build new sorted lists with a list comprehension:
>>> import re
>>> x
[set(['Human_COII_1000-4566_hsp', 'Human_COII_100-300_hsp', 'Human_COII_500-789_hsp']), set(['Human_COI_100-300_hsp', 'Human_COI_500-789_hsp', 'Human_COI_1000-4566_hsp'])]
>>>
>>> # for python2
>>> [sorted(y, key=lambda item: map(int, re.findall(r'\d+', item))) for y in x]
[['Human_COII_100-300_hsp', 'Human_COII_500-789_hsp', 'Human_COII_1000-4566_hsp'], ['Human_COI_100-300_hsp', 'Human_COI_500-789_hsp', 'Human_COI_1000-4566_hsp']]
>>>
>>> # python3
>>> [sorted(y, key=lambda item: tuple(map(int, re.findall(r'\d+', item)))) for y in x]
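
Note also that sorted(i) in the question returns a new list and leaves i unchanged, so the loop there throws its result away. A small sketch of the same idea with a named key function, applied to the lists from the question (range_key is a name chosen here for illustration):

```python
import re


def range_key(name):
    """Return the numbers found in `name` as a tuple of ints, e.g. (1000, 4566)."""
    return tuple(int(n) for n in re.findall(r"\d+", name))


result = [
    ["Human_COII_1000-4566_hsp", "Human_COII_500-789_hsp", "Human_COII_100-300_hsp"],
    ["Human_COI_100-300_hsp", "Human_COI_500-789_hsp", "Human_COI_1000-4566_hsp"],
]

# Build new sorted lists; use group.sort(key=range_key) to sort in place instead.
sorted_result = [sorted(group, key=range_key) for group in result]
print(sorted_result)
```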

Very weird rruleset behavior

Using the latest dateutil available on pip, I'm getting strange time and ordering-dependent behavior when calling the count method using a recurring DAILY rrule.
>>> import dateutil
>>> dateutil.__version__
'2.4.2'
>>> from dateutil import rrule
>>> import datetime
>>> rules = rrule.rruleset()
>>> rules.rrule(rrule.rrule(rrule.DAILY, until=datetime.datetime(2038,1,1,0,0,0)))
>>> rules.count()
8179
>>> rules.exrule(rrule.rrule(rrule.DAILY, until=datetime.datetime(2038,1,1,0,0,0)))
>>> rules.count()
8179 # ??? Expected 0
>>> rules = rrule.rruleset()
>>> rules.exrule(rrule.rrule(rrule.DAILY, until=datetime.datetime(2038,1,1,0,0,0)))
>>> rules.rrule(rrule.rrule(rrule.DAILY, until=datetime.datetime(2038,1,1,0,0,0)))
>>> rules.count()
8179 # ??? Expected 0
>>> rules = rrule.rruleset()
>>> rules.exrule(rrule.rrule(rrule.DAILY, until=datetime.datetime(2038,1,1,0,0,0)))
>>> rules.count()
0
>>> rules.rrule(rrule.rrule(rrule.DAILY, until=datetime.datetime(2038,1,1,0,0,0)))
>>> rules.count()
0 # Now its working???
>>> rules = rrule.rruleset()
>>> rules.exrule(rrule.rrule(rrule.DAILY, until=datetime.datetime(2038,1,1,0,0,0)))
>>> rules.rrule(rrule.rrule(rrule.DAILY, until=datetime.datetime(2038,1,1,0,0,0)))
>>> rules.count()
8179 # ??? Expected 0
>>> rules = rrule.rruleset()
>>> rules.count()
0
>>> rules.rrule(rrule.rrule(rrule.DAILY, until=datetime.datetime(2038,1,1,0,0,0)))
>>> rules.count()
0 # WHAT???
>>> rules.count()
0
>>> rules = rrule.rruleset()
>>> rules.rrule(rrule.rrule(rrule.DAILY, until=datetime.datetime(2038,1,1,0,0,0)))
>>> rules.count()
8179 # IM DONE... WTF

The answer is simple: you did not pass the dtstart parameter when creating the rrule objects. When dtstart is omitted, it defaults to datetime.datetime.now(), that is, the current time at the moment the rule is created, including its time-of-day components.
Hence, when you first create the ruleset using -
>>> rules = rrule.rruleset()
>>> rules.rrule(rrule.rrule(rrule.DAILY, until=datetime.datetime(2038,1,1,0,0,0)))
>>> rules.count()
8179
You got entries starting at the time of day at which you created that rule.
When you run this some time later -
rules.exrule(rrule.rrule(rrule.DAILY, until=datetime.datetime(2038,1,1,0,0,0)))
you create another rrule.rrule object that starts at the new current time, so its occurrences do not coincide with those of the rule already in rules, and nothing gets excluded.
To fix the issue, pass the same dtstart to both rules so that they start at the same time.
Example -
>>> now = datetime.datetime.now()
>>> rules = rrule.rruleset()
>>> rules.rrule(rrule.rrule(rrule.DAILY, until=datetime.datetime(2038,1,1,0,0,0), dtstart=datetime.datetime(now.year,now.month,now.day,0,0,0)))
>>> rules.count()
8179
>>> rules.exrule(rrule.rrule(rrule.DAILY, until=datetime.datetime(2038,1,1,0,0,0), dtstart=datetime.datetime(now.year,now.month,now.day,0,0,0)))
>>> l3 = list(rules)
>>> len(l3)
0
>>> rules.count()
0
A similar issue occurs throughout your other examples.
Beyond that, I think there is an issue in the dateutil code: it caches the count (length) of the ruleset the first time you call count(), and the length is only recalculated when you iterate over the set again.
The issue is in the rrulebase class, which is the base class for rruleset. The code is (source - https://github.com/dateutil/dateutil/blob/master/dateutil/rrule.py) -
def count(self):
    """ Returns the number of recurrences in this set. It will have go
        trough the whole recurrence, if this hasn't been done before. """
    if self._len is None:
        for x in self:
            pass
    return self._len
So if you had already called .count(), it keeps returning the same cached value even after you apply exrule().
I am not 100% sure whether this is a bug or intended behavior; most probably it is a bug.
I have opened an issue for this.
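
The caching behavior described above can be reproduced with a toy model that uses only the standard library. CachedCountSet and its include/exclude methods are hypothetical stand-ins written for this illustration, not dateutil code; only the shape of count() follows the quoted method.

```python
class CachedCountSet:
    """Toy stand-in for a rule set that caches its length (not dateutil code)."""

    def __init__(self):
        self._items = set()
        self._excluded = set()
        self._len = None  # cached length, filled in by a full iteration

    def include(self, values):
        self._items.update(values)

    def exclude(self, values):
        self._excluded.update(values)

    def __iter__(self):
        total = 0
        for value in sorted(self._items - self._excluded):
            total += 1
            yield value
        self._len = total  # the cache is refreshed only after a full iteration

    def count(self):
        # Same shape as the quoted count(): iterate only if nothing is cached.
        if self._len is None:
            for _ in self:
                pass
        return self._len


rules = CachedCountSet()
rules.include(range(5))
first = rules.count()      # 5, and the value is now cached
rules.exclude(range(5))
stale = rules.count()      # still 5: adding the exclusion did not reset the cache
fresh = len(list(rules))   # 0: iterating recomputes the real contents
after = rules.count()      # 0: the iteration refreshed the cache
print(first, stale, fresh, after)
```

Under this model, counting with len(list(rules)) sidesteps the stale cache, which matches the list(rules) step in the example above.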

Regex how to check last 4 numbers from long number

I would like to check only the last 4 digits of a number with Python.
For example, given the following numbers, I want to check whether the last four digits start with 10 or 02:
201600001057 ( I want to get 1057)
201600000216 ( I want to get 0216)
Thanks in advance

Why would you use regex for this?
last4 = str(number)[-4:]
if last4.startswith(('10', '02')):
    print("yes, actually")

You can do it without regexp
>>> s="201600001057"
>>> s[-4:]
"1057"
>>> s[-4:].isdigit()
True
>>> s="201600001057a"
>>> s[-4:].isdigit()
False

(?=(?:10|02))\d{4}$
This should do it. See demo.
http://regex101.com/r/kP4pZ2/18
import re
print(re.findall(r"(?=(?:10|02))\d{4}$", x, re.M))
x is your string.

You could use re.search. It matches only if the last four digits start with 10 or 02. (re.match would not work with this pattern as written, because it only matches at the start of the string.)
>>> import re
>>> s = "201600001057"
>>> s1 = "201600000216"
>>> re.search(r'(?:10|02)\d{2}$', s)
<_sre.SRE_Match object at 0x7fdbb2b6d3d8>
>>> re.search(r'(?:10|02)\d{2}$', s).group()
'1057'
>>> re.search(r'(?:10|02)\d{2}$', s1).group()
'0216'
>>> if re.search(r'(?:10|02)\d{2}$', s1):
...     print('Matches')
...
Matches
>>> if re.search(r'(?:10|02)\d{2}$', s):
...     print('Matches')
...
Matches

The findall function in the re module can be used to extract the last four digits:
>>> import re
>>> x="201600001057"
>>> re.findall(r'\d{4}$', x)
['1057']
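
The non-regex answers can be combined into one helper that both extracts the last four digits and checks the prefix. This is only a sketch; last_four_if_prefixed and its prefixes parameter are names chosen for this example.

```python
def last_four_if_prefixed(number, prefixes=("10", "02")):
    """Return the last four digits as a string if they start with one of
    `prefixes`, otherwise None. Accepts an int or a string of digits."""
    last4 = str(number)[-4:]
    # str.startswith accepts a tuple of alternatives.
    if len(last4) == 4 and last4.isdigit() and last4.startswith(prefixes):
        return last4
    return None


print(last_four_if_prefixed(201600001057))
print(last_four_if_prefixed("201600000216"))
```

Returning the digits as a string keeps a leading zero such as the one in 0216, which an int would lose.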

(Python) Splitting string only on single instance of delimiter

I'm trying to extract numeric values from text strings that use dashes as delimiters, but also to indicate negative values:
"1.3" # [1.3]
"1.3-2-3.9" # [1.3, 2, 3.9]
"1.3-2--3.9" # [1.3, 2, -3.9]
"-1.3-2--3.9" # [-1.3, 2, -3.9]
At the moment, I'm manually checking for the "--" sequence, but this seems really ugly and prone to breaking.
def get_values(text):
    return map(lambda s: s.replace('n', '-'), text.replace('--', '-n').split('-'))
I've tried a few different approaches, using both the str.split() function and re.findall(), but none of them have quite worked.
For example, the following pattern should match all the valid strings, but I'm not sure how to use it with findall:
r"^-?\d(\.\d*)?(--?\d(\.\d*)?)*$"
Is there a general way to do this that I'm not seeing? Thanks!

You can try to split with this pattern with a lookbehind:
(?<=[0-9])-
(A hyphen preceded by a digit)
>>> import re
>>> re.split(r'(?<=[0-9])-', text)
With this condition, the split never happens at the start of the string or right after another hyphen, so a hyphen that acts as a minus sign is left in place.
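
Since the goal in the question is numeric values, the lookbehind split can be followed by a float conversion. A minimal sketch, assuming every number in the input ends in a digit (so an input like "1.-2" is not handled); parse_values is a name chosen for this example.

```python
import re


def parse_values(text):
    """Split on hyphens that follow a digit and convert each piece to float."""
    parts = re.split(r"(?<=[0-9])-", text)
    return [float(part) for part in parts]


for sample in ["1.3", "1.3-2-3.9", "1.3-2--3.9", "-1.3-2--3.9"]:
    print(sample, parse_values(sample))
```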

@CasimiretHippolyte has given a very elegant regex solution, but I would like to point out that you can do this pretty succinctly with just a list comprehension, iter, and next:
>>> def get_values(text):
...     it = iter(text.split("-"))
...     return [x or "-"+next(it) for x in it]
...
>>> get_values("1.3")
['1.3']
>>> get_values("1.3-2-3.9")
['1.3', '2', '3.9']
>>> get_values("1.3-2--3.9")
['1.3', '2', '-3.9']
>>> get_values("-1.3-2--3.9")
['-1.3', '2', '-3.9']
>>>
Also, if you use timeit.timeit, you will see that this solution is quite a bit faster than using Regex:
>>> from timeit import timeit
>>>
>>> # With Regex
>>> def get_values(text):
...     import re
...     return re.split('(?<=[0-9])-', text)
...
>>> timeit('get_values("-1.3-2--3.9")', 'from __main__ import get_values')
9.999720634885165
>>>
>>> # Without Regex
>>> def get_values(text):
...     it = iter(text.split("-"))
...     return [x or "-"+next(it) for x in it]
...
>>> timeit('get_values("-1.3-2--3.9")', 'from __main__ import get_values')
4.145546989910741
>>>

Python Regular expression repeat

I have a string like this
--x123-09827--x456-9908872--x789-267504
I am trying to get all value like
123:09827
456:9908872
789:267504
I've tried (--x([0-9]+)-([0-9])+)+
but it only gives me the last pair. I am testing it in Python:
>>> import re
>>> x = "--x123-09827--x456-9908872--x789-267504"
>>> p = "(--x([0-9]+)-([0-9]+))+"
>>> re.match(p,x)
>>> re.match(p,x).groups()
('--x789-267504', '789', '267504')
How should I write with nested repeat pattern?
Thanks a lot!
David

Code it like this:
import re
x = "--x123-09827--x456-9908872--x789-267504"
p = r"--x(?:[0-9]+)-(?:[0-9]+)"
print(re.findall(p, x))

Just use the .findall method instead, it makes the expression simpler.
>>> import re
>>> x = "--x123-09827--x456-9908872--x789-267504"
>>> r = re.compile(r"--x(\d+)-(\d+)")
>>> r.findall(x)
[('123', '09827'), ('456', '9908872'), ('789', '267504')]
You can also use .finditer which might be helpful for longer strings.
>>> [m.groups() for m in r.finditer(x)]
[('123', '09827'), ('456', '9908872'), ('789', '267504')]
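
The reason the original pattern returned only the last pair is that a capturing group inside a repetition keeps just its final match. Matching one pair at a time with findall avoids this, and the tuples can then be joined into the key:value form asked for in the question. A short sketch (PAIR and format_pairs are names chosen for this example):

```python
import re

# One "--x<digits>-<digits>" pair per match; findall returns a tuple per pair.
PAIR = re.compile(r"--x(\d+)-(\d+)")


def format_pairs(text):
    """Return each pair in `text` as a 'key:value' string."""
    return ["{}:{}".format(key, value) for key, value in PAIR.findall(text)]


x = "--x123-09827--x456-9908872--x789-267504"
for line in format_pairs(x):
    print(line)
```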

Use re.finditer or re.findall. Then you don't need the extra pair of parentheses that wrap the entire expression. For example,
>>> import re
>>> x = "--x123-09827--x456-9908872--x789-267504"
>>> p = "--x([0-9]+)-([0-9]+)"
>>> for m in re.finditer(p,x):
...     print('{0} {1}'.format(m.group(1),m.group(2)))

Try this:
p='--x([0-9]+)-([0-9]+)'
re.findall(p,x)

No need to use regex :
>>> "--x123-09827--x456-9908872--x789-267504".replace('--x',' ').replace('-',':').strip()
'123:09827 456:9908872 789:267504'

You don't need regular expressions for this. Here is a simple one-liner, non-regex solution:
>>> input = "--x123-09827--x456-9908872--x789-267504"
>>> [ x.replace("-", ":") for x in input.split("--x")[1:] ]
['123:09827', '456:9908872', '789:267504']
If this is an exercise on regex, here is a solution that uses the repetition (technically), though the findall(...) solution may be preferred:
>>> import re
>>> input = "--x123-09827--x456-9908872--x789-267504"
>>> regex = '--x(.+)'
>>> [ x.replace("-", ":") for x in re.match(regex*3, input).groups() ]
['123:09827', '456:9908872', '789:267504']
