Suppose I have the following string:
trend = '(A|B|C)_STRING'
I want to expand this to:
A_STRING
B_STRING
C_STRING
The OR condition can be anywhere in the string, e.g. STRING_(A|B)_STRING_(C|D)
would expand to
STRING_A_STRING_C
STRING_B_STRING_C
STRING_A_STRING_D
STRING_B_STRING_D
I also want to cover the case of an empty conditional:
(|A_)STRING would expand to:
A_STRING
STRING
Here's what I've tried so far:
def expandOr(trend):
    parenBegin = trend.index('(') + 1
    parenEnd = trend.index(')')
    orExpression = trend[parenBegin:parenEnd]
    originalTrend = trend[0:parenBegin - 1]
    expandedOrList = []
    for oe in orExpression.split("|"):
        expandedOrList.append(originalTrend + oe)
But this is obviously not working.
Is there any easy way to do this using regex?
Here's a pretty clean way. You'll have fun figuring out how it works :-)
def expander(s):
    import re
    from itertools import product
    pat = r"\(([^)]*)\)"
    pieces = re.split(pat, s)
    pieces = [piece.split("|") for piece in pieces]
    for p in product(*pieces):
        yield "".join(p)
Then:
for s in ('(A|B|C)_STRING',
          '(|A_)STRING',
          'STRING_(A|B)_STRING_(C|D)'):
    print(s, "->")
    for t in expander(s):
        print(" ", t)
displays:
(A|B|C)_STRING ->
  A_STRING
  B_STRING
  C_STRING
(|A_)STRING ->
  STRING
  A_STRING
STRING_(A|B)_STRING_(C|D) ->
  STRING_A_STRING_C
  STRING_A_STRING_D
  STRING_B_STRING_C
  STRING_B_STRING_D
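For anyone puzzling over how expander works, here is a minimal sketch of its intermediate values. It relies on re.split keeping the text matched by a capturing group in the result list, so literal pieces and OR groups alternate. The names choices and expanded are only illustrative, and nested parentheses are assumed not to occur.

```python
import re
from itertools import product

s = 'STRING_(A|B)_STRING_(C|D)'

# Splitting on a pattern with a capturing group keeps the captured text,
# so the list alternates between literal text and the inside of each group.
pieces = re.split(r"\(([^)]*)\)", s)
# pieces == ['STRING_', 'A|B', '_STRING_', 'C|D', '']

# Literal pieces contain no "|", so they become one-element lists;
# OR groups become lists of their alternatives.
choices = [piece.split("|") for piece in pieces]
# choices == [['STRING_'], ['A', 'B'], ['_STRING_'], ['C', 'D'], ['']]

# The Cartesian product picks one alternative from every position.
expanded = ["".join(combo) for combo in product(*choices)]
print(expanded)
```

An empty alternative such as (|A_) simply contributes an empty string to the product, which is why that case needs no special handling.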
import exrex
trend = '(A|B|C)_STRING'
trend2 = 'STRING_(A|B)_STRING_(C|D)'
>>> list(exrex.generate(trend))
[u'A_STRING', u'B_STRING', u'C_STRING']
>>> list(exrex.generate(trend2))
[u'STRING_A_STRING_C', u'STRING_A_STRING_D', u'STRING_B_STRING_C', u'STRING_B_STRING_D']
I would do this to extract the groups:
def extract_groups(trend):
    l_parens = [i for i,c in enumerate(trend) if c == '(']
    r_parens = [i for i,c in enumerate(trend) if c == ')']
    assert len(l_parens) == len(r_parens)
    return [trend[l+1:r].split('|') for l,r in zip(l_parens,r_parens)]
And then you can evaluate the product of those extracted groups using itertools.product:
expr = 'STRING_(A|B)_STRING_(C|D)'
from itertools import product
list(product(*extract_groups(expr)))
Out[92]: [('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D')]
Now it's just a question of splicing those back onto your original expression. I'll use re for that :)
# Python 3.3+
import re
def _gen(it):
    yield from it
p = re.compile(r'\(.*?\)')
for tup in product(*extract_groups(expr)):
    gen = _gen(tup)
    print(p.sub(lambda x: next(gen), expr))
STRING_A_STRING_C
STRING_A_STRING_D
STRING_B_STRING_C
STRING_B_STRING_D
There's probably a more readable way to get re.sub to sequentially substitute things from an iterable, but this is what came off the top of my head.
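One possibly more readable variant, offered as a sketch rather than the definitive way: build a plain iterator over each combination and let the replacement callback pull the next value from it. The helper name expand is hypothetical, and the sketch assumes the parenthesised groups are not nested.

```python
import re
from itertools import product

GROUP = re.compile(r'\(([^)]*)\)')

def expand(expr):
    """Return every expansion of the (a|b|...) groups in expr."""
    # findall with one capturing group returns the text inside each group.
    groups = [inner.split('|') for inner in GROUP.findall(expr)]
    results = []
    for combo in product(*groups):
        values = iter(combo)
        # Each match consumes the next value of the current combination.
        results.append(GROUP.sub(lambda match: next(values), expr))
    return results

print(expand('STRING_(A|B)_STRING_(C|D)'))
print(expand('(|A_)STRING'))
```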
It is easy to achieve with sre_yield module:
>>> import sre_yield
>>> trend = '(A|B|C)_STRING'
>>> strings = list(sre_yield.AllStrings(trend))
>>> print(strings)
['A_STRING', 'B_STRING', 'C_STRING']
The goal of sre_yield is to efficiently generate all values that can match a given regular expression, or count possible matches efficiently... It does this by walking the tree as constructed by sre_parse (same thing used internally by the re module), and constructing chained/repeating iterators as appropriate. There may be duplicate results, depending on your input string though -- these are cases that sre_parse did not optimize.
I have the string below, and I want to extract the lists, dicts, and plain variables from it.
How can I split this string into that format?
s = 'list_c=[1,2],a=3,b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}'
import re
m1 = re.findall(r'(?=.*,)(.*?=\[.+?\],?)', s)
for i in m1:
    print('m1:', i)
Only the first result is correct.
Does anyone know how to do this?
m1: list_c=[1,2],
m1: a=3,b=1.3,c=abch,list_a=[1,2],
Split on '=' instead; then you can work with each variable name and its value.
You still need to handle type casting for the values (regex, split, or try/except with casting may help).
Also, as others have commented, using a dict may make this easier to handle.
s = 'list_c=[1,2],a=3,b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}'
al = s.split('=')
var_l = [al[0]]
value_l = []
for a in al[1:-1]:
    var_l.append(a.split(',')[-1])
    value_l.append(','.join(a.split(',')[:-1]))
value_l.append(al[-1])
output = dict(zip(var_l, value_l))
print(output)
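The type casting mentioned above could be sketched with ast.literal_eval, which safely evaluates Python literals such as numbers and lists. This is only one possible approach: the helper name cast is made up for illustration, and values that are not valid literals (like abch or {a:2,b:3}, whose keys are bare names) are assumed to be acceptable as plain strings.

```python
import ast

def cast(value):
    """Turn a string into a Python value when it is a valid literal."""
    try:
        return ast.literal_eval(value)
    except (ValueError, SyntaxError):
        # Not a literal (e.g. a bare identifier): keep the original string.
        return value

# Same shape as the dict built by the splitting code above.
raw = {'list_c': '[1,2]', 'a': '3', 'b': '1.3', 'c': 'abch', 'dict_a': '{a:2,b:3}'}
typed = {name: cast(value) for name, value in raw.items()}
print(typed)
```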
You may have better luck if you more or less explicitly describe the right-hand side expressions: numbers, lists, dictionaries, and identifiers:
re.findall(r"([^=]+)=" # LHS and assignment operator
+r"([+-]?\d+(?:\.\d+)?|" # Numbers
+r"[+-]?\d+\.|" # More numbers
+r"\[[^]]+\]|" # Lists
+r"{[^}]+}|" # Dictionaries
+r"[a-zA-Z_][a-zA-Z_\d]*)", # Idents
s)
# [('list_c', '[1,2]'), ('a', '3'), ('b', '1.3'), ('c', 'abch'),
# ('list_a', '[1,2]'), ('dict_a', '{a:2,b:3}')]
The complete solution looks like this:
import re
from pprint import pprint
s = 'list_c=[1,2],a=3,b=1.3,c=abch,list_a=[1],Save,Record,dict_a={a:2,b:3}'
m1 = re.findall(r"([^=]+)=" # LHS and assignment operator
+r"([+-]?\d+(?:\.\d+)?|" # Numbers
+r"[+-]?\d+\.|" # More numbers
+r"\[[^]]+\]|" # Lists
+r"{[^}]+}|" # Dictionaries
+r"[a-zA-Z_][a-zA-Z_\d]*)", # Idents
s)
temp_d = {}
for i,j in m1:
    temp = i.strip(',').split(',')
    if len(temp)>1:
        for k in temp[:-1]:
            temp_d[k]=''
        temp_d[temp[-1]] = j
    else:
        temp_d[temp[0]] = j
pprint(temp_d)
The output is:
{'Record': '',
'Save': '',
'a': '3',
'b': '1.3',
'c': 'abch',
'dict_a': '{a:2,b:3}',
'list_a': '[1]',
'list_c': '[1,2]'}
Instead of picking out the types, you can start by capturing the identifiers. Here's a regex that captures all the identifiers in the string (for lowercase only, but see note):
import re
regex = re.compile(r'([a-z]|_)+=')
# Note: to also allow uppercase letters and digits in names, use r'([a-z]|[A-Z]|[0-9]|_)+='
cases = [x.group() for x in re.finditer(regex, s)]
This gives a list of all the identifiers in the string:
['list_c=', 'a=', 'b=', 'c=', 'list_a=', 'dict_a=']
We can now define a function that uses the above list to chop up s sequentially:
def chop(mystr, mylist):
    temp = mystr.partition(mylist[0])[2]
    cut = temp.find(mylist[1])  # strip leading bits
    return temp[cut:], mylist[1:]
mystr = s[:]
temp = [mystr]
mylist = cases[:]
while len(mylist) > 1:
    mystr, mylist = chop(mystr, mylist)
    temp.append(mystr)
This (convoluted) slicing operation gives this list of strings:
['list_c=[1,2],a=3,b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}',
'a=3,b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}',
'b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}',
'c=abch,list_a=[1,2],dict_a={a:2,b:3}',
'list_a=[1,2],dict_a={a:2,b:3}',
'dict_a={a:2,b:3}']
Now cut off the ends using each successive entry:
result = []
for x in range(len(temp) - 1):
    cut = temp[x].find(temp[x+1]) - 1  # -1 to remove commas
    result.append(temp[x][:cut])
result.append(temp.pop())  # get the last item
Now we have the full list:
['list_c=[1,2]', 'a=3', 'b=1.3', 'c=abch', 'list_a=[1,2]', 'dict_a={a:2,b:3}']
Each element is easily parsable into key:value pairs (and is also executable via exec).
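As a small sketch of that last step, each element can be split on its first '=' to build a dict without resorting to exec. The values stay strings here; the variable name pairs is just illustrative.

```python
result = ['list_c=[1,2]', 'a=3', 'b=1.3', 'c=abch', 'list_a=[1,2]', 'dict_a={a:2,b:3}']

# Split on the first '=' only, so a value containing '=' would stay intact.
pairs = dict(item.split('=', 1) for item in result)
print(pairs)
```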
I have just come across an interesting interview style type of question which I couldn't get my head around.
Basically, given a number to alphabet mapping such that [1:A, 2:B, 3:C ...], print out all possible combinations.
For instance "123" will generate [ABC, LC, AW] since it can be separated into 12,3 and 1,23.
I'm thinking it has to be some type of recursive function where it checks with windows of size 1 and 2 and appending to a previous result if it's a valid letter mapping.
If anyone can formulate some pseudo/python code that'd be much appreciated.
So I managed to hack together an answer, it's not as pythonic as I'd like and there may be some redundancies, but it works with the 123 example to output ABC,AW, and LC.
I'll probably clean it up tomorrow (or if someone wants to clean it up), just posting it in case someone is also working on it and is wondering.
import string
def num_to_alphabet(numbers, ans=""):
    if not numbers:
        print(ans)
        return
    numbers = str(numbers)
    window = numbers[:2]
    alph = string.ascii_uppercase
    ans = ans[:]
    ans2 = ans[:]
    try:
        if window[0]:
            # single-digit window
            val = int(numbers[0]) - 1
            if alph[val]:
                ans += alph[val]
                num_to_alphabet(numbers[1:], ans)
        if window[1]:
            # two-digit window; IndexError ends this branch if it is missing or > 26
            val = int(window) - 1
            if alph[val]:
                ans2 += alph[val]
                if len(window) > 1:
                    num_to_alphabet(numbers[2:], ans2)
                else:
                    num_to_alphabet(numbers[1:], ans2)
    except IndexError:
        pass
It is as simple as building a tree.
Suppose you are given "1261".
Construct a tree with the whole number as the root.
Each node has two children (left, right): the left child always consumes one digit (the direct map) and the right child consumes two digits (the combo map). For the number 1261 the expansion goes like this:
1261 ->
(1(261) ,12(61)) -> 1 is the left node (direct map -> A), 12 is the right node (combo map 1,2 -> L)
(A(261) , L(61)) ->
(A(2(61),26(1))) ,L(6(1)) ->
(A(B(6(1)),Z(1)) ,L(F(1))) ->
(A(B(F(1)),Z(A)) ,L(F(A))) ->
(A(B(F(A)),Z(A)) ,L(F(A)))
Now you have all the leaf nodes.
Printing every path from the root to a leaf gives you all possible combinations, which is what you asked for. In this case:
ABFA , AZA , LFA
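The tree walk described above can be sketched recursively without building the tree explicitly: each call tries the one-digit branch and the two-digit branch and collects the finished root-to-leaf paths. The function name decode_paths is hypothetical, and the sketch assumes a string of digits where a branch starting with 0 is simply invalid.

```python
from string import ascii_uppercase

def decode_paths(digits, prefix=""):
    """Return all letter strings for digits, one per root-to-leaf path."""
    if not digits:
        return [prefix]  # reached a leaf: the prefix is a complete path
    paths = []
    one = int(digits[0])
    if one >= 1:
        # Left child: map a single digit (1-9) directly.
        paths += decode_paths(digits[1:], prefix + ascii_uppercase[one - 1])
    if len(digits) >= 2 and digits[0] != '0':
        two = int(digits[:2])
        if two <= 26:
            # Right child: map a two-digit combination (10-26).
            paths += decode_paths(digits[2:], prefix + ascii_uppercase[two - 1])
    return paths

print(decode_paths("1261"))
```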
from string import ascii_uppercase
# The mapping from the question: '1' -> 'A', '2' -> 'B', ..., '26' -> 'Z'
charMap = {str(i): c for i, c in enumerate(ascii_uppercase, 1)}
def getNodes(s):
    # Return every way to split s into tokens that are keys of charMap
    if len(s) == 0:
        return [[]]
    results = []
    if s[:1] in charMap:
        results += [[s[:1]] + rest for rest in getNodes(s[1:])]
    if len(s) >= 2 and s[:2] in charMap:
        results += [[s[:2]] + rest for rest in getNodes(s[2:])]
    return results
def mapout(nodes):
    cArray = []
    for x in nodes:
        cx = ''
        for y in x:
            cx = cx + charMap.get(y)
        cArray.append(cx)
    return cArray
res = getNodes('12345')
print(mapout(res))
Untested, but I believe this is along the lines of what you're looking for.
The following answer recursively tries all possibilities at the current position (there are more than two!) and goes on with the remainder of the string. That's it.
from string import ascii_uppercase
def alpha_combinations(s):
    if len(s) == 0:
        yield ""
        return
    for size in range(1, len(s) + 1):
        v = int(s[:size])
        if v > 26:
            break
        if v > 0:
            c = ascii_uppercase[v - 1]
            for ac in alpha_combinations(s[size:]):
                yield c + ac
print(list(alpha_combinations(input())))
It expects a number as a string. It gives correct output for 101010 (['AAJ', 'AJJ', 'JAJ', 'JJJ']). (I think some of the other solutions don't handle zeroes correctly.)
So, I wanted to tackle this as well, since it’s actually a cool problem. So here goes my solution:
If we ignore the translations to strings for now, we are essentially looking for partitions of a set. So for the input 123 we have a set {1, 2, 3} and are looking for partitions. But of those partitions, only those are interesting which maintain the original order of the input. So we are actually not talking about a set in the end (where order doesn’t matter).
Anyway, I called this “ordered partition”—I don’t know if there actually exists a term for it. And we can generate those ordered partitions easily using recursion:
def orderedPartitions(s):
    if len(s) == 0:
        yield []
        return
    for i in range(1, len(s)+1):
        for p in orderedPartitions(s[i:]):
            yield [s[:i]] + p
For a string input '123', this gives us the following partitions, which is exactly what we are looking for:
['1', '2', '3']
['1', '23']
['12', '3']
['123']
Now, to get back to the original problem which is asking for translations to strings, all we need to do is check each of those partitions, if they contain only valid numbers, i.e. 1 to 26. And if that is the case, translate those numbers and return the resulting string.
import string
def alphaCombinations(s):
    for partition in orderedPartitions(str(s)):
        # get the numbers
        p = list(map(int, partition))
        # skip invalid numbers
        if list(filter(lambda x: x < 1 or x > 26, p)):
            continue
        # yield translated string
        yield ''.join(map(lambda i: string.ascii_uppercase[i - 1], p))
And it works:
>>> list(alphaCombinations(123))
['ABC', 'AW', 'LC']
>>> list(alphaCombinations(1234))
['ABCD', 'AWD', 'LCD']
>>> list(alphaCombinations(4567))
['DEFG']
I still am not sure of the description, but this Python script first partitions the num into its 'breaks' then tries each break member as a whole as an index into its corresponding character; then converts each digit of the member into letters of a word. Both contributions are shown before showing the sum total of all conversions to letters/words for the num "123"
>>> import string
>>> mapping ={str(n):ch for n,ch in zip(range(1,27), string.ascii_uppercase)}
>>> num = '123'
>>> [[num[:i], num[i:]] for i in range(len(num)+1)]
[['', '123'], ['1', '23'], ['12', '3'], ['123', '']]
>>> breaks = set(part for part in sum(([num[:i], num[i:]] for i in range(len(num)+1)), []) if part)
>>> breaks
{'123', '12', '3', '1', '23'}
>>> as_a_whole = [mapping[p] for p in breaks if p in mapping]
>>> as_a_whole
['L', 'C', 'A', 'W']
>>> by_char = [''.join(mapping[n] for n in p) for p in breaks]
>>> by_char
['ABC', 'AB', 'C', 'A', 'BC']
>>> everything = sorted(set(as_a_whole + by_char))
>>> everything
['A', 'AB', 'ABC', 'BC', 'C', 'L', 'W']
>>>
With a structure like this
hapts = [('1|2', '1|2'), ('3|4', '3|4')]
I need to zip it (sort of...) to get the following:
end = ['1|1', '2|2', '3|3', '4|4']
I started working with the following code:
zipped=[]
for i in hapts:
    tete = zip(i[0][0], i[1][0])
    zipped.extend(tete)
    some = zip(i[0][2], i[1][2])
    zipped.extend(some)
... and got it zipped like this:
zipped = [('1', '1'), ('2', '2'), ('3', '3'), ('4', '4')]
Any suggestions on how to continue? Furthermore, I'm sure there should be a more elegant way to do this, but it is hard to give Google an accurate definition of the question ;)
Thx!
You are very close to solving this, I would argue the best solution here is a simple str.join() in a list comprehension:
["|".join(values) for values in zipped]
This also has the bonus of working nicely with (potentially) more values, without modification.
If you wanted tuples (which is not what your requested output shows, as brackets don't make a tuple, a comma does), then it is trivial to add that in:
[("|".join(values), ) for values in zipped]
Also note that zipped can be produced more efficiently too:
>>> import itertools
>>> zipped = itertools.chain.from_iterable(zip(*[part.split("|") for part in group]) for group in hapts)
>>> ["|".join(values) for values in zipped]
['1|1', '2|2', '3|3', '4|4']
And to show what I meant before about handling more values elegantly:
>>> hapts = [('1|2|3', '1|2|3', '1|2|3'), ('3|4|5', '3|4|5', '3|4|5')]
>>> zipped = itertools.chain.from_iterable(zip(*[part.split("|") for part in group]) for group in hapts)
>>> ["|".join(values) for values in zipped]
['1|1|1', '2|2|2', '3|3|3', '3|3|3', '4|4|4', '5|5|5']
The problem in this context is to
unfold the list
reformat it
fold it
Here is how you may approach the problem:
>>> from itertools import chain
>>> reformat = lambda t: map('|'.join, zip(*(e.split("|") for e in t)))
>>> list(chain(*(reformat(t) for t in hapts)))
['1|1', '2|2', '3|3', '4|4']
If you would rather build on the zipped output your own code already produces, you don't need any of the above.
Just rescan it and join each pair with "|":
>>> ['{}|{}'.format(*t) for t in zipped]
['1|1', '2|2', '3|3', '4|4']
Note: the parentheses are redundant in your output.
Your code basically works, but here's a more elegant way to do it.
First define a transposition function that takes an entry of hapts and flips it:
>>> from itertools import chain
>>> transpose = lambda tup: list(zip(*(y.split("|") for y in tup)))
Then map that function over hapts:
>>> list(map(transpose, hapts))
[[('1', '1'), ('2', '2')], [('3', '3'), ('4', '4')]]
and then if you want to flatten this into one list
>>> y = list(chain.from_iterable(map(transpose, hapts)))
>>> y
[('1', '1'), ('2', '2'), ('3', '3'), ('4', '4')]
Finally, to join it back up into strings again:
>>> list(map("|".join, y))
['1|1', '2|2', '3|3', '4|4']
end = []
for groups in hapts:
    end.extend('|'.join(regrouped) for regrouped in zip(*[group.split('|') for group in groups]))
This should also continue to work with n-length groups of n-length pipe-delimited characters, and n-length groups of groups, though it will truncate the regrouped values to the shortest group of characters in each group of character groups.
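To illustrate the truncation mentioned above, here is a minimal sketch with groups of unequal length. If dropping the extra values is not wanted, itertools.zip_longest is one alternative. The sample tuple and the choice of an empty string as fill value are assumptions made for the example.

```python
from itertools import zip_longest

groups = ('1|2|3', '1|2')  # the second group is one value shorter
parts = [group.split('|') for group in groups]

# zip stops at the shortest group, so the trailing '3' is dropped.
truncated = ['|'.join(t) for t in zip(*parts)]

# zip_longest pads the shorter group instead of truncating.
padded = ['|'.join(t) for t in zip_longest(*parts, fillvalue='')]

print(truncated)
print(padded)
```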
I've looked at several other SO questions (and google'd tons) that are 'similar'-ish to this, but none of them seem to fit my question right.
I am trying to make a non fixed length, unique text string, only containing characters in a string I specify. E.g. made up of capital and lower case a-zA-Z characters. (for this example I use only a, b, and c lower case)
Something like this (broken code below)
def next(index, validCharacters = 'abc'):
    return uniqueShortAsPossibleString
The index argument would be an index (integer) that relate to a text string, for instance:
next(1) == 'a'
next(2) == 'b'
next(3) == 'c'
next(4) == 'aa'
next(5) == 'ab'
next(6) == 'ac'
next(7) == 'ba'
next(8) == 'bb'
next(9) == 'bc'
next(10) == 'ca'
next(11) == 'cb'
next(12) == 'cc'
And so forth. The string:
Must be unique, I'll be using it as an identifier, and it can only be a-zA-Z chars
As short as possible, with lower index numbers being shortest (see above examples)
Contain only the characters specified in the given argument string validCharacters
In conclusion, how could I write the next() function to relate an integer index value to an unique short string with the characters specified?
P.S. I'm new to SO, this site has helped me tons throughout the years, and while I've never made an account or asked a question (till now), I really hope I've done an okay job explaining what I'm trying to accomplish with this.
What you are trying to do is write the parameter of the next function in another base.
Let's suppose validCharacters contains k characters: then the job of the next function will be to transform parameter p into base k by using the characters in validCharacters.
In your example, you can write the numbers in base 3 and then associate each digit with one letter:
next(1) -> 1 -> 'a'
next(2) -> 2 -> 'b'
next(4) -> 11 -> 'aa'
next(7) -> 21 -> 'ba'
And so forth.
With this method, you can call next(x) without knowing or computing any next(x-i), which you can't do with iterative methods.
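A minimal sketch of this idea follows. Because the numbering starts at 1 and there is no zero digit (1 -> 'a', 4 -> 'aa'), the conversion subtracts 1 before each division, which is sometimes called bijective base-k numeration. The function name index_to_string is made up here to avoid shadowing the built-in next.

```python
def index_to_string(index, valid_characters='abc'):
    """Map 1 -> 'a', 2 -> 'b', 3 -> 'c', 4 -> 'aa', ... for the given characters."""
    if index < 1:
        raise ValueError("index must be >= 1")
    base = len(valid_characters)
    digits = []
    while index > 0:
        # Shift by one so that digits run from 1..base instead of 0..base-1.
        index, remainder = divmod(index - 1, base)
        digits.append(valid_characters[remainder])
    return ''.join(reversed(digits))

print([index_to_string(i) for i in range(1, 13)])
```

No earlier value has to be computed to get a later one, so index_to_string(1000000) costs only a handful of divisions.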
You're trying to convert a number to a number in another base, but using arbitrary characters for the digits of that base.
import string
chars = string.ascii_lowercase + string.ascii_uppercase
def identifier(x, chars):
    output = []
    base = len(chars)
    while x:
        output.append(chars[x % base])
        x //= base
    return ''.join(reversed(output))
print(identifier(1, chars))
This lets you jump to any position, you're counting so the identifiers are totally unique, and it is easy to use any character set of any length (of two or more), and lower numbers give shorter identifiers.
itertools can always give you obfuscated one-liner iterators:
from itertools import permutations, chain
chars = 'abc'
a = chain(*(permutations(chars, i) for i in range(1, len(chars) + 1)))
Basically, this code creates an iterator that chains together all permutations of chars of lengths 1, 2, ..., len(chars).
The output of for x in a: print(x) is:
('a',)
('b',)
('c',)
('a', 'b')
('a', 'c')
('b', 'a')
('b', 'c')
('c', 'a')
('c', 'b')
('a', 'b', 'c')
('a', 'c', 'b')
('b', 'a', 'c')
('b', 'c', 'a')
('c', 'a', 'b')
('c', 'b', 'a')
You can't really "associate" the index with anything, but the following is a generator that yields strings in the kind of order you're asking for:
from itertools import combinations_with_replacement
def uniquenames(chars):
    for i in range(1, len(chars)):
        for j in combinations_with_replacement(chars, i):
            yield ''.join(j)
print(list(uniquenames('abc')))
# ['a', 'b', 'c', 'aa', 'ab', 'ac', 'bb', 'bc', 'cc']
As far as I understood we shouldn't specify maximum length of output string. So range is not enough:
>>> from itertools import combinations_with_replacement, count
>>> def u(chars):
...     for i in count(1):
...         for k in combinations_with_replacement(chars, i):
...             yield "".join(k)
...
>>> g = u("abc")
>>> next(g)
'a'
>>> next(g)
'b'
>>> next(g)
'c'
>>> next(g)
'aa'
>>> next(g)
'ab'
>>> next(g)
'ac'
>>> next(g)
'bb'
>>> next(g)
'bc'
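Note that combinations_with_replacement never yields 'ba', 'ca' or 'cb', which the list in the question does contain. A sketch of the same unbounded generator using itertools.product instead produces every string of each length. The name unique_names is only illustrative.

```python
from itertools import count, islice, product

def unique_names(chars):
    """Yield every string over chars, shortest first, without an upper bound."""
    for length in count(1):
        # product with repeat=length enumerates all strings of that length.
        for combo in product(chars, repeat=length):
            yield ''.join(combo)

# Take the first twelve names, matching the indices 1..12 from the question.
print(list(islice(unique_names('abc'), 12)))
```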
So it seems like you are trying to enumerate through all the strings generated by the language {'a','b','c'}. This can be done using finite state automata (though you don't want to do that). One simple way to enumerate through the language is to start with a list and append all the strings of length 1 in order (so a then b then c). Then append each letter in the alphabet to each string of length n-1. This will keep it in order as long as you append all the letters in the alphabet to a given string before moving on to the lexicographically next string.
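The approach just described can be sketched with a queue: emit a string, then append every one-letter extension of it to the back of the queue. This is only one way to realise the description; the function name enumerate_language is hypothetical.

```python
from collections import deque
from itertools import islice

def enumerate_language(alphabet):
    """Yield all non-empty strings over alphabet in length-then-alphabet order."""
    queue = deque(alphabet)  # start with the strings of length 1
    while queue:
        current = queue.popleft()
        yield current
        # Extend the current string with every letter before moving on,
        # which keeps the queue ordered.
        for letter in alphabet:
            queue.append(current + letter)

print(list(islice(enumerate_language('abc'), 12)))
```

The queue grows as the generator runs, so this trades memory for simplicity compared with computing each string directly from its index.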