I have an array like the one below:
constants = ['(1,2)', '(1,5,1)', '1']
I would like to transform it into the following:
constants = [(1,2), 1, 2, 3, 4, 5, 1]
To do this, I tried the following operations:
from ast import literal_eval
import numpy as np
constants = literal_eval(str(constants).replace("'",""))
constants = [(np.arange(*i) if len(i)==3 else i) if isinstance(i, tuple) else i for i in constants]
And the output was:
constants = [(1, 2), array([1, 2, 3, 4]), 1]
This is not the expected result, and I'm stuck at this step. The question is: how can I merge the array into its parent list?
This is one approach.
Demo:
from ast import literal_eval
constants = ['(1,2)', '(1,5,1)', '1']
res = []
for i in constants:
    val = literal_eval(i)  # Convert to a Python object
    if isinstance(val, tuple):  # Check if the element is a tuple
        if len(val) == 3:  # Check if the tuple has 3 elements
            val = list(val)
            val[1] += 1  # Make the end point inclusive
            res.extend(range(*val))
            continue
    res.append(val)
print(res)
Output:
[(1, 2), 1, 2, 3, 4, 5, 1]
I'm going to assume that this question is very literal, and that you always want to transform this:
constants = ['(a, b)', '(x, y, z)', 'i']
into this:
transformed = [(a,b), x, x+z, x+2*z, ..., y, i]
such that the second tuple is a range from x to y with step z. So your final transformed array is the first element, then the range defined by your second element, and then your last element. The easiest way to do this is simply step-by-step:
constants = ['(a, b)', '(x, y, z)', 'i']
literals = [eval(k) for k in constants] # get rid of the strings
part1 = [literals[0]] # individually make each of the three parts of your list
part2 = list(range(literals[1][0], literals[1][1] + 1, literals[1][2])) # or, if you don't need to include y, you could just do range(*literals[1])
part3 = [literals[2]]
transformed = part1 + part2 + part3
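The same idea can be sketched without relying on the position of each element. This is only an illustration under some assumptions: the helper name expand_constants is made up here, every 3-tuple is treated as (start, stop, step) with an inclusive stop, and the step is assumed to be positive.

```python
from ast import literal_eval


def expand_constants(constants):
    """Parse each string and expand any 3-tuple (start, stop, step)
    into an inclusive range; everything else is kept as-is."""
    result = []
    for text in constants:
        value = literal_eval(text)  # safe parsing of literals only
        if isinstance(value, tuple) and len(value) == 3:
            start, stop, step = value
            # stop + 1 makes the end point inclusive (assumes step > 0)
            result.extend(range(start, stop + 1, step))
        else:
            result.append(value)
    return result


print(expand_constants(['(1,2)', '(1,5,1)', '1']))
# [(1, 2), 1, 2, 3, 4, 5, 1]
```

Using literal_eval instead of eval means only plain literals are accepted, which is usually what you want when the strings come from outside your program.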
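I propose the following (assuming constants has already been converted with literal_eval, as in your code):
res = []
for cst in constants:
    if isinstance(cst, tuple) and len(cst) == 3:
        # add the range to the list (cst[1] + 1 so that the end point is included)
        res.extend(range(cst[0], cst[1] + 1, cst[2]))
    else:
        res.append(cst)
res then holds the result you want.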
There may be a more elegant way to solve it.
The code below handles the parsing described above:
from ast import literal_eval
constants = ['(1,2)', '(1,5,1)', '1']
processed = []
for index, c in enumerate(constants):
    parsed = literal_eval(c)
    if isinstance(parsed, (tuple, list)) and index != 0:
        processed.extend(range(1, max(parsed) + 1))
    else:
        processed.append(parsed)
print(processed)  # [(1, 2), 1, 2, 3, 4, 5, 1]
I have strings in the format "1-3 6:10-11 7-9" and from them I want to create number sets as follows {1,2,3,6,10,11,7,8,9}.
For creating the set from the range of numbers, I have the following code:
def create_set(src):
    lset = []
    if len(src) > 0:
        pos = src.find('-')
        if pos != -1:
            first = int(src[:pos])
            last = int(src[pos+1:])
        else:
            return [int(src)]  # Only one number
        for j in range(first, last+1):
            lset.append(j)
    return set(lset)
But I cannot figure out how to correctly treat the ':' when it appears in the string. Can someone help me?
Thanks in advance!
EDIT: By the way, is there a more compact way of parsing such strings, perhaps using regular expressions?
Something like this might work for you:
s = '1-3 6:10-11 7-9'
s = s.replace(':', ' ')
lset = set()
fs = s.split()
for f in fs:
    r = f.split('-')
    if len(r) == 1:
        # add a single number
        lset.add(int(r[0]))
    else:
        # add a range of numbers (inclusive of the endpoints)
        lset |= set(range(int(r[0]), int(r[1])+1))
print(lset)
"EDIT: By the way, is there a more compact way of parsing such strings, perhaps using regular expressions?"
Perhaps a cleaner (and slightly more efficient) way:
import re
import itertools
s = '1-3 6:10-11 7-9'
allGroups = re.findall(r"(\d+)(?:-(\d+)|:)", s)
expanded = [range(int(x), (int(x) if y == '' else int(y)) + 1) for x, y in allGroups]
print {x for x in itertools.chain.from_iterable(expanded)}
Explanations:
Match all strings like 'a-b' or 'a:' and return a list of (a, b) and (a, '') pairs respectively:
allGroups = re.findall(r"(\d+)(?:-(\d+)|:)", s)
This produces:
[('1', '3'), ('6', ''), ('10', '11'), ('7', '9')]
Using a list comprehension, expand each (x, y) pair into the full list of numbers in range(x, y + 1), taking care to handle the (x, '') case as range(x, x + 1):
expanded = [range(int(x), (int(x) if y == '' else int(y)) + 1) for x, y in allGroups]
This produces (in Python 2, where range returns a list):
[[1, 2, 3], [6], [10, 11], [7, 8, 9]]
Use itertools.chain.from_iterable() to transform the list of lists into a single iterable which is iterated by a set comprehension into the final set:
print {x for x in itertools.chain.from_iterable(expanded)}
This produces:
set([1, 2, 3, 6, 7, 8, 9, 10, 11])
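For readers on Python 3, here is a minimal sketch of the same parsing idea. It is not the answer's exact regex: the dash part is made optional so that a lone number is also matched, and ':' and spaces are simply treated as separators. The function name parse_ranges is an assumption made for this example.

```python
import re


def parse_ranges(text):
    """Return the set of integers described by tokens such as
    '1-3' (inclusive range) or '6' (single number)."""
    numbers = set()
    # Each match is (first, last); last is '' when there is no '-b' part.
    for first, last in re.findall(r"(\d+)(?:-(\d+))?", text):
        start = int(first)
        stop = int(last) if last else start
        numbers.update(range(start, stop + 1))
    return numbers


print(sorted(parse_ranges("1-3 6:10-11 7-9")))
# [1, 2, 3, 6, 7, 8, 9, 10, 11]
```

Sets have no defined order, so the result is sorted here only to make the printed output predictable.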
Suppose I have the list
[7,7,7,7,3,1,5,5,1,4]
I would like to remove duplicates and count them while preserving the order of the list. To remove duplicates while preserving the order, I use the function
def unique(seq, idfun=None):
    # order preserving
    if idfun is None:
        def idfun(x): return x
    seen = {}
    result = []
    for item in seq:
        marker = idfun(item)
        if marker in seen: continue
        seen[marker] = 1
        result.append(item)
    return result
which gives me the output
[7,3,1,5,1,4]
but the output I want (the counts may appear in the final list) is:
[7,3,3,1,5,2,4]
7 is written because it is the first item in the list; each following item is then checked to see whether it differs from the previous one. If it does, the occurrences of the same item are counted until a new one is found, and then the procedure repeats. Could anyone more skilled than me give me a hint on how to get the desired output listed above? Thank you in advance.
Perhaps something like this?
>>> from itertools import groupby
>>> lst = [7,7,7,7,3,1,5,5,1,4]
>>> seen = set()
>>> out = []
>>> for k, g in groupby(lst):
...     if k not in seen:
...         length = sum(1 for _ in g)
...         if length > 1:
...             out.extend([k, length])
...         else:
...             out.append(k)
...         seen.add(k)
...
>>> out
[7, 4, 3, 1, 5, 2, 4]
Update:
As per your comment I guess you wanted something like this:
>>> out = []
>>> for k, g in groupby(lst):
...     length = sum(1 for _ in g)
...     if length > 1:
...         out.extend([k, length])
...     else:
...         out.append(k)
...
>>> out
[7, 4, 3, 1, 5, 2, 1, 4]
Try this:
import collections as c
lst = [7,7,7,7,3,1,5,5,1,4]
result = c.OrderedDict()
for el in lst:
    if el not in result:
        result[el] = 1
    else:
        result[el] = result[el] + 1
print(result)
prints out: OrderedDict([(7, 4), (3, 1), (1, 2), (5, 2), (4, 1)])
It gives a dictionary though. For a list, use:
lstresult = []
for el in result:
    lstresult.append(el)
    if result[el] > 1:
        lstresult.append(result[el] - 1)
It doesn't match your desired output, but your desired output also seems to be a somewhat mangled version of what it is trying to represent.
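If mixing values and counts in one flat list turns out to be too ambiguous, one alternative is to keep explicit (value, run length) pairs. The sketch below is only an illustration of that representation, not the output format the question asked for, and the function name run_lengths is made up for this example.

```python
from itertools import groupby


def run_lengths(seq):
    """Return (value, count) pairs for each run of consecutive equal items."""
    return [(key, sum(1 for _ in group)) for key, group in groupby(seq)]


lst = [7, 7, 7, 7, 3, 1, 5, 5, 1, 4]
print(run_lengths(lst))
# [(7, 4), (3, 1), (1, 1), (5, 2), (1, 1), (4, 1)]
```

Because every value carries its own count, a reader never has to guess whether a number in the list is an item or a count.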
Here is my list:
liPos = [(2,5),(8,9),(18,22)]
The first item of each tuple is the starting position and the second is the ending position.
Then I have a string like this:
s = "I hope that I will find an answer to my question!"
Now, considering my liPos list, I want to format the string by removing the chars between each starting and ending position (and including the surrounding numbers) provided in the tuples. Here is the result that I want:
"I tt I will an answer to my question!"
So basically, I want to remove the chars between 2 and 5 (including 2 and 5), then between 8,9 (including 8 and 9) and finally between 18,22 (including 18 and 22).
Any suggestion?
This assumes that liPos is already sorted; if it is not, use sorted(liPos, reverse=True) in the for loop.
liPos = [(2,5),(8,9),(18,22)]
s = "I hope that I will find an answer to my question!"
for begin, end in reversed(liPos):
    s = s[:begin] + s[end+1:]
print(s)
Here is an alternative method that constructs a new list of slice tuples to include, and then joins only those portions of the string.
from itertools import chain, izip_longest  # Python 2; in Python 3 the function is itertools.zip_longest
# second slice index needs to be increased by one, do that when creating liPos
liPos = [(a, b+1) for a, b in liPos]
result = "".join(s[b:e] for b, e in izip_longest(*[iter(chain([0], *liPos))]*2))
To make this slightly easier to understand, here are the slices generated by izip_longest:
>>> list(izip_longest(*[iter(chain([0], *liPos))]*2))
[(0, 2), (6, 8), (10, 18), (23, None)]
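As a rough Python 3 sketch of the same slicing trick, the function below uses itertools.zip_longest in place of izip_longest. The name remove_spans is an assumption for this example, and the spans are assumed to be sorted, non-overlapping, and inclusive at both ends.

```python
from itertools import chain, zip_longest


def remove_spans(s, spans):
    """Remove inclusive (start, end) index spans from s.
    Assumes spans are sorted and do not overlap."""
    # Convert inclusive ends to half-open slice ends.
    half_open = [(a, b + 1) for a, b in spans]
    # Flatten to 0, a1, b1, a2, b2, ... and pair consecutive values,
    # which yields the slices to KEEP; the last one ends with None.
    bounds = iter(chain([0], *half_open))
    return "".join(s[b:e] for b, e in zip_longest(bounds, bounds))


liPos = [(2, 5), (8, 9), (18, 22)]
s = "I hope that I will find an answer to my question!"
print(repr(remove_spans(s, liPos)))
# 'I  tt I will an answer to my question!'
```

Note that the result contains two spaces after the first "I", because the spaces at indices 1 and 6 are both kept.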
liPos = [(2,5),(8,9),(18,22)]
s = "I hope that I will find an answer to my question!"
exclusions = set().union(* (set(range(t[0], t[1]+1)) for t in liPos) )
pruned = ''.join(c for i,c in enumerate(s) if i not in exclusions)
print pruned
Here is one compact possibility:
"".join(s[i] for i in range(len(s)) if not any(start <= i <= end for start, end in liPos))
This ... is a quick stab at the problem. There may be a better way, but it's a start at least.
>>> liPos = [(2,5),(8,9),(18,22)]
>>>
>>> toRemove = [i for x, y in liPos for i in range(x, y + 1)]
>>>
>>> toRemove
[2, 3, 4, 5, 8, 9, 18, 19, 20, 21, 22]
>>>
>>> s = "I hope that I will find an answer to my question!"
>>>
>>> s2 = ''.join([c for i, c in enumerate(s) if i not in toRemove])
>>>
>>> s2
'I  tt I will an answer to my question!'
Given 2 lists:
a = [3,4,5,5,5,6]
b = [1,3,4,4,5,5,6,7]
I want to find the "overlap":
c = [3,4,5,5,6]
I'd also like to extract the "remainder": the parts of a and b that are not in c.
a_remainder = [5,]
b_remainder = [1,4,7,]
Note:
a has three 5's in it and b has two.
b has two 4's in it and a has one.
The resultant list c should have two 5's (limited by list b) and one 4 (limited by list a).
This gives me what I want, but I can't help but think there's a much better way.
import copy
a = [3,4,5,5,5,6]
b = [1,3,4,4,5,5,6,7]
c = []
for elem in copy.deepcopy(a):
    if elem in b:
        a.pop(a.index(elem))
        c.append(b.pop(b.index(elem)))
# now a and b both contain the "remainders" and c contains the "overlap"
On another note, what is a more accurate name for what I'm asking for than "overlap" and "remainder"?
collections.Counter, available since Python 2.7, can be used to implement multisets that do exactly what you want.
import collections
a = [3,4,5,5,5,6]
b = [1,3,4,4,5,5,6,7]
a_multiset = collections.Counter(a)
b_multiset = collections.Counter(b)
overlap = list((a_multiset & b_multiset).elements())
a_remainder = list((a_multiset - b_multiset).elements())
b_remainder = list((b_multiset - a_multiset).elements())
print overlap, a_remainder, b_remainder
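The multiset approach can be wrapped in a small Python 3 function, sketched below. The function name split_overlap is made up for this example; the Counter operators & and - are the standard multiset intersection and difference.

```python
from collections import Counter


def split_overlap(a, b):
    """Return (overlap, a_remainder, b_remainder) treating a and b as multisets."""
    ca, cb = Counter(a), Counter(b)
    overlap = list((ca & cb).elements())      # minimum of the two counts
    a_remainder = list((ca - cb).elements())  # what a has beyond b
    b_remainder = list((cb - ca).elements())  # what b has beyond a
    return overlap, a_remainder, b_remainder


a = [3, 4, 5, 5, 5, 6]
b = [1, 3, 4, 4, 5, 5, 6, 7]
overlap, a_rest, b_rest = split_overlap(a, b)
print(sorted(overlap), sorted(a_rest), sorted(b_rest))
# [3, 4, 5, 5, 6] [5] [1, 4, 7]
```

The results are sorted before printing so the example does not depend on the order in which Counter yields its elements.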
Use Python sets (note that sets discard duplicates, so repeated elements are not counted):
intersection = set(a) & set(b)
a_remainder = set(a) - set(b)
b_remainder = set(b) - set(a)
In the language of sets, overlap is 'intersection' and remainder is 'set difference'. If you had distinct items, you wouldn't have to do these operations yourself; check out http://docs.python.org/library/sets.html if you're interested.
Since we're not working with distinct elements, your approach is reasonable. If you wanted this to run faster, you could create a dictionary for each list that maps each number to how many times it occurs in that list (e.g., in a, 3->1, 4->1, 5->3, etc.). You would then iterate through a, check whether each element still has a positive count in b's dictionary, and if so decrement that count and add the element to the new list.
Untested code, but this is the idea
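def add_or_update(counts, value):
    if value in counts:
        counts[value] += 1
    else:
        counts[value] = 1

b_dict = dict()
for b_elem in b:
    add_or_update(b_dict, b_elem)

intersect = []
a_diff = []
for a_elem in a:
    if a_elem in b_dict and b_dict[a_elem] > 0:
        b_dict[a_elem] -= 1  # use up one occurrence from b
        intersect.append(a_elem)
    else:
        a_diff.append(a_elem)  # left over in a

# whatever counts remain in b_dict are the left-over elements of b
b_diff = []
for k, v in b_dict.items():
    for i in range(v):
        b_diff.append(k)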
OK, verbose, but kind of cool (similar in spirit to the collections.Counter idea, but more home-made):
import itertools as it
flatten = it.chain.from_iterable
sorted(
    v for u, v in
    set(flatten(enumerate(g)
                for k, g in it.groupby(a))).intersection(
        set(flatten(enumerate(g)
                    for k, g in it.groupby(b))))
)
The basic idea is to make each of the lists into a new list which attaches a counter to each object, numbered to account for duplicates, so that you can then use set operations on these tuples after all.
To be slightly less verbose:
aa = set(flatten(enumerate(g) for k, g in it.groupby(a)))
bb = set(flatten(enumerate(g) for k, g in it.groupby(b)))
# aa = set([(0, 3), (0, 4), (0, 5), (0, 6), (1, 5), (2, 5)])
# bb = set([(0, 1), (0, 3), (0, 4), (0, 5), (0, 6), (0, 7), (1, 4), (1, 5)])
cc = aa.intersection(bb)
# cc = set([(0, 3), (0, 4), (0, 5), (0, 6), (1, 5)])
c = sorted(v for u,v in cc)
# c = [3, 4, 5, 5, 6]
groupby -- produces a list of lists containing identical elements
(but because of the syntax needs the g for k,g in it.groupby(a) to extract each list)
enumerate -- appends a counter to each element of each sublist
flatten -- create a single list
set -- convert to a set
intersection -- find the common elements
sorted(v for u,v in cc) -- get rid of the counters and sort the result
Finally, I'm not sure what you mean by the remainders; it seems like it ought to be my aa-cc and bb-cc but I don't know where you get a_remainder = [4]:
sorted(v for u,v in aa-cc)
# [5]
sorted(v for u,v in bb-cc)
# [1, 4, 7]
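One caveat with groupby is that it only groups consecutive equal items, so the approach above relies on a and b being sorted. A possible variation, sketched here under that observation, tags each element with its occurrence number using a running counter, which also works for unsorted input. The helper name tag_occurrences is an assumption for this example.

```python
from collections import defaultdict


def tag_occurrences(seq):
    """Return a set of (occurrence_index, value) pairs, where
    occurrence_index counts how often value has appeared before."""
    seen = defaultdict(int)
    tagged = set()
    for value in seq:
        tagged.add((seen[value], value))
        seen[value] += 1
    return tagged


a = [3, 4, 5, 5, 5, 6]
b = [1, 3, 4, 4, 5, 5, 6, 7]
aa, bb = tag_occurrences(a), tag_occurrences(b)

c = sorted(v for _, v in aa & bb)            # overlap
a_remainder = sorted(v for _, v in aa - bb)  # only in a
b_remainder = sorted(v for _, v in bb - aa)  # only in b
print(c, a_remainder, b_remainder)
# [3, 4, 5, 5, 6] [5] [1, 4, 7]
```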
A response from kerio in #python on freenode:
[ i for i in itertools.chain.from_iterable([k] * v for k, v in \
(Counter(a) & Counter(b)).iteritems())
]
Try difflib.SequenceMatcher(), "a flexible class for comparing pairs of sequences of any type"...
A quick try:
import difflib
a = [3,4,5,5,5,6]
b = [1,3,4,4,5,5,6,7]
sm = difflib.SequenceMatcher(None, a, b)
c = []
a_remainder = []
b_remainder = []
for tag, i1, i2, j1, j2 in sm.get_opcodes():
    if tag == 'replace':
        a_remainder.extend(a[i1:i2])
        b_remainder.extend(b[j1:j2])
    elif tag == 'delete':
        a_remainder.extend(a[i1:i2])
    elif tag == 'insert':
        b_remainder.extend(b[j1:j2])
    elif tag == 'equal':
        c.extend(a[i1:i2])
And now...
>>> print c
[3, 4, 5, 5, 6]
>>> print a_remainder
[5]
>>> print b_remainder
[1, 4, 7]
Aset = set(a)
Bset = set(b)
a_remainder = Aset.difference(Bset)
b_remainder = Bset.difference(Aset)
c = Aset.intersection(Bset)
But if you need c to keep duplicates, and order is important to you, you may want to look at the longest common subsequence problem.
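For reference, the longest common subsequence can be computed with a standard dynamic-programming table. The following is a minimal textbook-style sketch, not something taken from this answer, and the function name longest_common_subsequence is chosen just for this example.

```python
def longest_common_subsequence(a, b):
    """Return one longest common subsequence of the sequences a and b."""
    n, m = len(a), len(b)
    # table[i][j] holds the LCS length of a[i:] and b[j:]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk through the table from the start to rebuild the subsequence.
    result = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return result


a = [3, 4, 5, 5, 5, 6]
b = [1, 3, 4, 4, 5, 5, 6, 7]
print(longest_common_subsequence(a, b))
# [3, 4, 5, 5, 6]
```

This runs in O(len(a) * len(b)) time and memory, which is fine for short lists but may be too slow for very long ones.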
I don't think you should actually use this solution, but I took this opportunity to practice with lambda functions and here is what I came up with :)
a = [3,4,5,5,5,6]
b = [1,3,4,4,5,5,6,7]
dedup = lambda x: [set(x)] if len(set(x)) == len(x) else [set(x)] + dedup([x[i] for i in range(1, len(x)) if x[i] == x[i-1]])
default_set = lambda x: (set() if x[0] is None else x[0], set() if x[1] is None else x[1])
deduped = map(default_set, map(None, dedup(a), dedup(b)))
get_result = lambda f: reduce(lambda x, y: list(x) + list(y), map(lambda x: f(x[0], x[1]), deduped))
c = get_result(lambda x, y: x.intersection(y)) # [3, 4, 5, 6, 5]
a_remainder = get_result(lambda x, y: x.difference(y)) # [5]
b_remainder = get_result(lambda x, y: y.difference(x)) # [1, 7, 4]
I'm pretty sure izip_longest would have simplified this a bit (wouldn't have needed the default_set lambda), but I was testing this with Python 2.5.
Here are some of the intermediate values used in the calculation in case anyone wants to understand this:
dedup(a) = [set([3, 4, 5, 6]), set([5]), set([5])]
dedup(b) = [set([1, 3, 4, 5, 6, 7]), set([4, 5])]
deduped = [(set([3, 4, 5, 6]), set([1, 3, 4, 5, 6, 7])), (set([5]), set([4, 5])), (set([5]), set([]))]