Very weird rruleset behavior - python

Using the latest dateutil available on pip, I'm getting strange time- and order-dependent behavior when calling the count method on an rruleset that contains a recurring DAILY rrule.
>>> import dateutil
>>> dateutil.__version__
'2.4.2'
>>> from dateutil import rrule
>>> import datetime
>>> rules = rrule.rruleset()
>>> rules.rrule(rrule.rrule(rrule.DAILY, until=datetime.datetime(2038,1,1,0,0,0)))
>>> rules.count()
8179
>>> rules.exrule(rrule.rrule(rrule.DAILY, until=datetime.datetime(2038,1,1,0,0,0)))
>>> rules.count()
8179 # ??? Expected 0
>>> rules = rrule.rruleset()
>>> rules.exrule(rrule.rrule(rrule.DAILY, until=datetime.datetime(2038,1,1,0,0,0)))
>>> rules.rrule(rrule.rrule(rrule.DAILY, until=datetime.datetime(2038,1,1,0,0,0)))
>>> rules.count()
8179 # ??? Expected 0
>>> rules = rrule.rruleset()
>>> rules.exrule(rrule.rrule(rrule.DAILY, until=datetime.datetime(2038,1,1,0,0,0)))
>>> rules.count()
0
>>> rules.rrule(rrule.rrule(rrule.DAILY, until=datetime.datetime(2038,1,1,0,0,0)))
>>> rules.count()
0 # Now its working???
>>> rules = rrule.rruleset()
>>> rules.exrule(rrule.rrule(rrule.DAILY, until=datetime.datetime(2038,1,1,0,0,0)))
>>> rules.rrule(rrule.rrule(rrule.DAILY, until=datetime.datetime(2038,1,1,0,0,0)))
>>> rules.count()
8179 # ??? Expected 0
>>> rules = rrule.rruleset()
>>> rules.count()
0
>>> rules.rrule(rrule.rrule(rrule.DAILY, until=datetime.datetime(2038,1,1,0,0,0)))
>>> rules.count()
0 # WHAT???
>>> rules.count()
0
>>> rules = rrule.rruleset()
>>> rules.rrule(rrule.rrule(rrule.DAILY, until=datetime.datetime(2038,1,1,0,0,0)))
>>> rules.count()
8179 # IM DONE... WTF

The answer is simple: you did not pass the dtstart parameter when creating the rrule. When it is omitted, it defaults to datetime.datetime.now(), the current time at the moment the rule is created (truncated to whole seconds).
Hence, when you first create the ruleset using -
>>> rules = rrule.rruleset()
>>> rules.rrule(rrule.rrule(rrule.DAILY, until=datetime.datetime(2038,1,1,0,0,0)))
>>> rules.count()
8179
You got entries that start at the time the rule was created, down to the second.
After some time, when you then try -
rules.exrule(rrule.rrule(rrule.DAILY, until=datetime.datetime(2038,1,1,0,0,0)))
You are creating another rrule.rrule object that starts at the new current time, so its occurrences are not the same as those of the rule you previously added to rules.
To fix the issue, pass the same dtstart to both rules so that they start at the same time.
Example -
>>> now = datetime.datetime.now()
>>> rules = rrule.rruleset()
>>> rules.rrule(rrule.rrule(rrule.DAILY, until=datetime.datetime(2038,1,1,0,0,0), dtstart=datetime.datetime(now.year,now.month,now.day,0,0,0)))
>>> rules.count()
8179
>>> rules.exrule(rrule.rrule(rrule.DAILY, until=datetime.datetime(2038,1,1,0,0,0), dtstart=datetime.datetime(now.year,now.month,now.day,0,0,0)))
>>> l3 = list(rules)
>>> len(l3)
0
>>> rules.count()
0
The same issue occurs throughout your other examples.
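The mismatch between two rules created at different moments can be reproduced with the standard library alone. The sketch below is a simplified stand-in for a DAILY rule: the helper daily_occurrences and the fixed start times are hypothetical and not part of dateutil. Two series that start a few seconds apart share no instants, so one cannot exclude the other.

```python
import datetime


def daily_occurrences(start, until):
    """Hypothetical stand-in for a DAILY rule: start, start + 1 day, ... up to until."""
    current = start
    while current <= until:
        yield current
        current += datetime.timedelta(days=1)


until = datetime.datetime(2038, 1, 1)
# Assumed example start; in the question this would be "now" at creation time.
first_start = datetime.datetime(2015, 8, 1, 12, 0, 0)
# Simulates creating the second rule five seconds later.
second_start = first_start + datetime.timedelta(seconds=5)

included = set(daily_occurrences(first_start, until))
excluded_late = set(daily_occurrences(second_start, until))
excluded_same = set(daily_occurrences(first_start, until))

remaining_with_offset = included - excluded_late  # nothing is removed
remaining_same_start = included - excluded_same   # everything is removed
```

With the five-second offset every occurrence survives the exclusion; with an identical start none does, which is the effect of passing the same dtstart to both rules.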
Given the above, I think there is also an issue in the dateutil code: it caches the count (length) of the ruleset the first time you call count(), and the correct length is only recalculated when you iterate over the set again.
The issue is in the rrulebase class, which is the base class for rruleset. The relevant code (source - https://github.com/dateutil/dateutil/blob/master/dateutil/rrule.py) is -
def count(self):
    """ Returns the number of recurrences in this set. It will have go
        trough the whole recurrence, if this hasn't been done before. """
    if self._len is None:
        for x in self:
            pass
    return self._len
So if you had already called .count(), it keeps returning the same cached count even after you apply exrule().
I am not 100% sure whether this is a bug or intended behavior, but most probably it is a bug.
I have opened an issue for this.
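To make the caching effect concrete, here is a minimal sketch of the same pattern in isolation. CachedLengthSet is a hypothetical class written for illustration; it is not dateutil code, and it only assumes the behavior described above (the length is cached by count() and refreshed by a full iteration).

```python
class CachedLengthSet:
    """Hypothetical stand-in that mimics the caching pattern; not dateutil code."""

    def __init__(self):
        self._items = []
        self._excluded = set()
        self._len = None  # cached length, filled in by a full iteration

    def add(self, item):
        self._items.append(item)

    def exclude(self, item):
        self._excluded.add(item)

    def __iter__(self):
        total = 0
        for item in self._items:
            if item not in self._excluded:
                total += 1
                yield item
        self._len = total  # the cache is refreshed only after a full iteration

    def count(self):
        if self._len is None:
            for _ in self:
                pass
        return self._len


s = CachedLengthSet()
s.add(1)
s.add(2)
first = s.count()            # iterates once and caches the length
s.exclude(1)
s.exclude(2)
stale = s.count()            # still the cached value
fresh = len(list(s))         # a full iteration recomputes the length
after_iteration = s.count()  # the cache now reflects the exclusions
```

Under that assumption, counting with len(list(rules)) instead of rules.count() would sidestep the stale value, which matches the order of calls in the dtstart example above.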

Related

Trying to find latest date in an array using max but getting different results

I am trying to find the latest date in an array using Python's max, but I'm getting inconsistent results.
For this input the result is as expected:
>>> a = ["10-09-1988","20-10-1999"]
>>> max(a)
'20-10-1999'
since 20-10-1999 is the latest date.
But for this one:
>>> a = ["10-10-1999","20-10-1988"]
>>> max(a)
'20-10-1988'
The expected output is
10-10-1999
since 10-10-1999 is the latest date, but I'm getting 20-10-1988.
How can I get the latest date from a list of strings in Python (the date format is dd-mm-yyyy)?
For the list above I want 10-10-1999 as the output.
You need to use key to convert each string to a datetime object:
>>> import datetime
>>> a = ["10-09-1988","20-10-1999"]
>>> max(a, key=lambda x: datetime.datetime.strptime(x, "%d-%m-%Y"))
'20-10-1999'
In order to compare them, you need to convert the date represented as string to datetime object.
>>> from datetime import datetime
>>> a = ["10-10-1999","20-10-1988"]
>>> for i, date in enumerate(a):
...     a[i] = datetime.strptime(date, "%d-%m-%Y")
...
>>> a
[datetime.datetime(1999, 10, 10, 0, 0), datetime.datetime(1988, 10, 20, 0, 0)]
>>> max(a)
datetime.datetime(1999, 10, 10, 0, 0)
You can convert the strings to datetime objects, or use the following code, which compares the (year, month, day) slices of each string directly:
>>> a = ["10-09-1988","20-02-1989","20-09-1988"]
>>> max(a,key=lambda x:(x[6:10],x[3:5],x[:2]))
'20-02-1989'
It is more efficient than converting each string to a datetime object:
>>> import timeit
>>> s = """
... a=['28-07-2002', '12-02-1976', '23-10-1967', '27-04-1913', '05-06-1901', '06-12-1964', '04-12-1982', '03-04-1929', '07-02-1943', '03-08-1955']
... import datetime
... max(a, key=lambda x: datetime.datetime.strptime(x, "%d-%m-%Y"))
... """
>>> timeit.timeit(stmt=s,number=10000)
1.0200450420379639
>>> s = """
... a=['28-07-2002', '12-02-1976', '23-10-1967', '27-04-1913', '05-06-1901', '06-12-1964', '04-12-1982', '03-04-1929', '07-02-1943', '03-08-1955']
... import datetime
... max(a,key=lambda x:(x[6:10],x[3:5],x[:2]))
... """
>>> timeit.timeit(stmt=s,number=10000)
0.04339408874511719
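The slicing key above relies on every date being zero-padded to exactly dd-mm-yyyy. As a middle ground, here is a sketch that splits on the hyphen and compares integers, so an unpadded day or month still sorts correctly. date_key is a hypothetical helper name, and unlike strptime it does not check that the date is valid.

```python
def date_key(text):
    """Turn 'dd-mm-yyyy' into a (year, month, day) tuple of ints for comparison."""
    day, month, year = (int(part) for part in text.split("-"))
    return (year, month, day)


dates = ["10-10-1999", "20-10-1988", "5-11-1999"]
latest = max(dates, key=date_key)
```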

python max function with mixed strings and numbers

Could someone explain to me why the following code :
li = [u'ansible-1.1.tar.gz', u'ansible-1.2.1.tar.gz', u'ansible-1.2.2.tar.gz', u'ansible-1.2.3.tar.gz',
u'ansible-1.2.tar.gz', u'ansible-1.3.0.tar.gz', u'ansible-1.3.1.tar.gz', u'ansible-1.3.2.tar.gz',
u'ansible-1.3.3.tar.gz', u'ansible-1.3.4.tar.gz', u'ansible-1.4.1.tar.gz', u'ansible-1.4.2.tar.gz',
u'ansible-1.4.3.tar.gz', u'ansible-1.4.4.tar.gz', u'ansible-1.4.tar.gz']
print(max(li))
returns :
ansible-1.4.tar.gz
Thank you
PS: It returns 1.4.4 when there are only numbers (1.4, 1.4.4, etc)
Because they are compared lexicographically:
>>> ord('t'), ord('4')
(116, 52)
>>> 't' > '4'
True
>>> 'ansible-1.4.tar.gz' > 'ansible-1.4.4.tar.gz'
True
To get ansible-1.4.4.tar.gz as the result, you need to pass a key function.
For example:
>>> li = [u'ansible-1.1.tar.gz', u'ansible-1.2.1.tar.gz', u'ansible-1.2.2.tar.gz', u'ansible-1.2.3.tar.gz',
... u'ansible-1.2.tar.gz', u'ansible-1.3.0.tar.gz', u'ansible-1.3.1.tar.gz', u'ansible-1.3.2.tar.gz',
... u'ansible-1.3.3.tar.gz', u'ansible-1.3.4.tar.gz', u'ansible-1.4.1.tar.gz', u'ansible-1.4.2.tar.gz',
... u'ansible-1.4.3.tar.gz', u'ansible-1.4.4.tar.gz', u'ansible-1.4.tar.gz']
>>>
>>> import re
>>> def get_version(fn):
...     return list(map(int, re.findall(r'\d+', fn)))
...
>>> get_version(u'ansible-1.4.4.tar.gz')
[1, 4, 4]
>>> max(li, key=get_version)
'ansible-1.4.4.tar.gz'
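Note that get_version collects every number in the filename, so a digit in the package name would end up in the key. A possible variant, assuming the version is the first dotted number that follows a hyphen, is sketched below; version_key and the sample filenames are made up for illustration.

```python
import re


def version_key(filename):
    """Hypothetical helper: parse the first dotted number that follows a hyphen."""
    match = re.search(r"-(\d+(?:\.\d+)*)", filename)
    if match is None:
        return ()  # no version found; sorts before any real version
    return tuple(int(part) for part in match.group(1).split("."))


files = ["ansible-1.4.tar.gz", "ansible-1.4.4.tar.gz", "ansible-1.10.tar.gz"]
newest = max(files, key=version_key)
```

Comparing tuples of integers also ranks 1.10 above 1.4.4, which plain string comparison would not.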
Here is another good solution.
setuptools ships a module called pkg_resources, which provides a parse_version function:
>>> from pkg_resources import parse_version
>>> max(li, key=parse_version)
u'ansible-1.4.4.tar.gz'
>>>

Split string into tuple (Upper,lower) 'ABCDefgh' . Python 2.7.6

my_string = 'ABCDefgh'
desired = ('ABCD','efgh')
The only way I can think of doing this is to loop over the string, check each character individually, build up the two strings, and then create the tuple. Is there a more efficient way to do this?
It will always be in the format UPPERlower.
import re
print re.split("([A-Z]+)",my_string)[1:]
Simple way (two passes):
>>> import itertools
>>> my_string = 'ABCDefgh'
>>> desired = (''.join(itertools.takewhile(lambda c:c.isupper(), my_string)), ''.join(itertools.dropwhile(lambda c:c.isupper(), my_string)))
>>> desired
('ABCD', 'efgh')
Efficient way (one pass):
>>> my_string = 'ABCDefgh'
>>> uppers = []
>>> done = False
>>> i = 0
>>> while not done and i < len(my_string):
...     c = my_string[i]
...     if c.isupper():
...         uppers.append(c)
...         i += 1
...     else:
...         done = True
...
>>> lowers = my_string[i:]
>>> desired = (''.join(uppers), lowers)
>>> desired
('ABCD', 'efgh')
Because I throw itertools.groupby at everything:
>>> my_string = 'ABCDefgh'
>>> from itertools import groupby
>>> [''.join(g) for k,g in groupby(my_string, str.isupper)]
['ABCD', 'efgh']
(A little overpowered here, but scales up to more complicated problems nicely.)
my_string='ABCDefg'
import re
desired = (re.search('[A-Z]+',my_string).group(0),re.search('[a-z]+',my_string).group(0))
print desired
A more robust approach without using re:
>>> import string
>>> txt = "ABCeUiioualfjNLkdD"
>>> tup = (''.join([char for char in txt if char in string.ascii_uppercase]),
...        ''.join([char for char in txt if char not in string.ascii_uppercase]))
>>> tup
('ABCUNLD', 'eiioualfjkd')
Using char not in string.ascii_uppercase instead of char in string.ascii_lowercase means that you never lose data if your string contains non-letters. That can save you from hard-to-trace errors 20 function calls later, when the truncated input is finally rejected.
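For the exact UPPERlower format in the question, a single anchored regular expression can also do the split in one step while keeping everything after the uppercase run. This is only a sketch: split_upper_prefix is a hypothetical name, and the pattern covers ASCII uppercase letters only.

```python
import re


def split_upper_prefix(text):
    """Return (leading ASCII-uppercase run, everything after it)."""
    # Both groups may be empty, so the match always succeeds.
    match = re.match(r"([A-Z]*)(.*)", text, re.DOTALL)
    return match.groups()


desired = split_upper_prefix("ABCDefgh")
```

Because the second group takes whatever remains, digits or punctuation after the uppercase prefix are kept rather than dropped.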

Python Regular expression repeat

I have a string like this
--x123-09827--x456-9908872--x789-267504
I am trying to get all the values, like
123:09827
456:9908872
789:267504
I've tried (--x([0-9]+)-([0-9]+))+
but it only gives me the last pair. I am testing it in Python:
>>> import re
>>> x = "--x123-09827--x456-9908872--x789-267504"
>>> p = "(--x([0-9]+)-([0-9]+))+"
>>> re.match(p,x)
>>> re.match(p,x).groups()
('--x789-267504', '789', '267504')
How should I write with nested repeat pattern?
Thanks a lot!
David
Code it like this:
x = "--x123-09827--x456-9908872--x789-267504"
p = "--x(?:[0-9]+)-(?:[0-9]+)"
print re.findall(p,x)
Just use the .findall method instead; it makes the expression simpler.
>>> import re
>>> x = "--x123-09827--x456-9908872--x789-267504"
>>> r = re.compile(r"--x(\d+)-(\d+)")
>>> r.findall(x)
[('123', '09827'), ('456', '9908872'), ('789', '267504')]
You can also use .finditer which might be helpful for longer strings.
>>> [m.groups() for m in r.finditer(x)]
[('123', '09827'), ('456', '9908872'), ('789', '267504')]
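A capturing group inside a repetition keeps only its last match, which is why the original pattern returned just the final pair. Matching one pair at a time avoids that, and the tuples can then be joined into the key:value form the question asks for. A short sketch (the variable names are chosen for illustration):

```python
import re

text = "--x123-09827--x456-9908872--x789-267504"
pair_pattern = re.compile(r"--x(\d+)-(\d+)")

# Each match contributes one (key, value) tuple; join the two parts with ':'.
lines = ["{0}:{1}".format(key, value) for key, value in pair_pattern.findall(text)]
```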
Use re.finditer or re.findall. Then you don't need the extra pair of parentheses that wrap the entire expression. For example,
>>> import re
>>> x = "--x123-09827--x456-9908872--x789-267504"
>>> p = "--x([0-9]+)-([0-9]+)"
>>> for m in re.finditer(p,x):
...     print('{0} {1}'.format(m.group(1),m.group(2)))
...
Try this:
p='--x([0-9]+)-([0-9]+)'
re.findall(p,x)
No need to use a regex:
>>> "--x123-09827--x456-9908872--x789-267504".replace('--x',' ').replace('-',':').strip()
'123:09827 456:9908872 789:267504'
You don't need regular expressions for this. Here is a simple one-liner, non-regex solution:
>>> input = "--x123-09827--x456-9908872--x789-267504"
>>> [ x.replace("-", ":") for x in input.split("--x")[1:] ]
['123:09827', '456:9908872', '789:267504']
If this is an exercise on regex, here is a solution that uses the repetition (technically), though the findall(...) solution may be preferred:
>>> import re
>>> input = "--x123-09827--x456-9908872--x789-267504"
>>> regex = '--x(.+)'
>>> [ x.replace("-", ":") for x in re.match(regex*3, input).groups() ]
['123:09827', '456:9908872', '789:267504']

Move an entire element in with lxml.etree

Within lxml, is it possible, given an element, to move the entire thing elsewhere in the XML document without having to read all of its children and recreate it? My best example would be changing parents. I've rummaged around the docs a bit but haven't had much luck. Thanks in advance!
.append, .insert and other operations move the element by default:
>>> from lxml import etree
>>> tree = etree.XML('<a><b><c/></b><d><e><f/></e></d></a>')
>>> node_b = tree.xpath('/a/b')[0]
>>> node_d = tree.xpath('/a/d')[0]
>>> node_d.append(node_b)
>>> etree.tostring(tree) # complete 'b'-branch is now under 'd', after 'e'
'<a><d><e><f/></e><b><c/></b></d></a>'
>>> node_f = tree.xpath('/a/d/e/f')[0] # Nothing stops us from moving it again
>>> node_f.append(node_b) # Now 'b' and its child are under 'f'
>>> etree.tostring(tree)
'<a><d><e><f><b><c/></b></f></e></d></a>'
Be careful when moving nodes that have tail text. In lxml, tail text belongs to the node and moves around with it. (Also, when you delete a node, its tail text is deleted as well.)
>>> tree = etree.XML('<a><b><c/></b>TAIL<d><e><f/></e></d></a>')
>>> node_b = tree.xpath('/a/b')[0]
>>> node_d = tree.xpath('/a/d')[0]
>>> node_d.append(node_b)
>>> etree.tostring(tree)
'<a><d><e><f/></e><b><c/></b>TAIL</d></a>'
Sometimes that is the desired effect, but sometimes you will need something like this:
>>> tree = etree.XML('<a><b><c/></b>TAIL<d><e><f/></e></d></a>')
>>> node_b = tree.xpath('/a/b')[0]
>>> node_d = tree.xpath('/a/d')[0]
>>> node_a = tree.xpath('/a')[0]
>>> # Manually move text
>>> node_a.text = node_b.tail
>>> node_b.tail = None
>>> node_d.append(node_b)
>>> etree.tostring(tree)  # Now the TAIL text stays in its old place
'<a>TAIL<d><e><f/></e><b><c/></b></d></a>'
You could use .append(), .insert() methods to add a subelement to the existing element:
>>> from lxml import etree
>>> from_ = etree.fromstring("<from/>")
>>> to = etree.fromstring("<to/>")
>>> to.append(from_)
>>> etree.tostring(to)
'<to><from/></to>'
