I'm trying to generate an array of strings (or any other data structure that might be more useful for my task, but I can't think of anything else) in Python.
The program I'm working on has several sets of radio buttons. For example a set of "Block"/"Alternate" and "Single"/"Duplicate".
Examples on how the array of strings should look when they are activated:
Block, Single:
list = ['A', 'B', 'C', '1', '2', '3']
Alternating, Single:
list = ['A', '1', 'B', '2', 'C', '3']
Alternating, Duplicate:
list = ['A', 'A', '1', '1', 'B', 'B', '2', '2', 'C', 'C', '3', '3']
Those are only several examples, the program has way more, but the concept is the same.
I need to read this array of strings and use it as a schema of sorts to further select some data from my Pandas Dataframe.
How would I go about generating this array without writing an if clause for every single possible combination?
I tried to solve this using Jupyter Notebook and some widgets for the user interaction. I don't know how you collect this information, but I am using the following to replicate the same behaviour.
In [1]:
import ipywidgets as widgets  # Widgets for user interactions
## Change the list as you want
my_list = ['A', 'B', 'C', '1', '2', '3']
single_duplicate = widgets.ToggleButtons(
    description='Do you want to Duplicate the list ?',
    options=['Single', 'Duplicate'],
    value='Single',
    style={'description_width': 'initial'}
)
single_duplicate
Out [1]:
In [2]:
block_alter = widgets.ToggleButtons(
    description='Do you want to Alternate the list?',
    options=['Block', 'Alternate'],
    value='Block',
    style={'description_width': 'initial'}
)
block_alter
Out [2]:
Then define the function to manipulate the list with the user input
In [3]:
def get_array(single_duplicate, block_alter, my_list):
    temporary_list = []
    ## Answer to Single or Duplicate
    if single_duplicate == 'Duplicate':
        for elem in my_list:
            temporary_list.append(elem)
            temporary_list.append(elem)
    else:
        temporary_list = my_list
    ## Answer to Block or Alternate?
    new_list = []
    if block_alter == 'Alternate':
        half = len(temporary_list) // 2
        for i in range(half):
            new_list.append(temporary_list[i])
            new_list.append(temporary_list[i + half])
    else:
        new_list = temporary_list
    return new_list
Use get_array(single_duplicate.value, block_alter.value, my_list) to see the output for the user's choices.
Here are some examples:
In [3]:
print(get_array('Single', 'Block', my_list))
print(get_array('Duplicate', 'Block', my_list))
print(get_array('Single', 'Alternate', my_list))
print(get_array('Duplicate', 'Alternate', my_list))
Out [3]:
['A', 'B', 'C', '1', '2', '3']
['A', 'A', 'B', 'B', 'C', 'C', '1', '1', '2', '2', '3', '3']
['A', '1', 'B', '2', 'C', '3']
['A', '1', 'A', '1', 'B', '2', 'B', '2', 'C', '3', 'C', '3']
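Note that with both 'Duplicate' and 'Alternate' selected, get_array duplicates before interleaving, which gives ['A', '1', 'A', '1', ...] rather than the ['A', 'A', '1', '1', ...] shown in the question. A minimal sketch that interleaves first and duplicates afterwards, assuming the list always consists of two halves of equal length; `build_schema` is a hypothetical name, not part of the answer above:

```python
def build_schema(items, single_duplicate, block_alter):
    """Reorder first, then repeat, so duplicated entries stay adjacent."""
    result = list(items)  # copy so the caller's list is never modified
    if block_alter == 'Alternate':
        half = len(result) // 2  # assumption: two halves of equal length
        result = [x for pair in zip(result[:half], result[half:]) for x in pair]
    if single_duplicate == 'Duplicate':
        result = [x for x in result for _ in range(2)]
    return result

my_list = ['A', 'B', 'C', '1', '2', '3']
print(build_schema(my_list, 'Duplicate', 'Alternate'))
# ['A', 'A', '1', '1', 'B', 'B', '2', '2', 'C', 'C', '3', '3']
```

Each further radio-button group could be handled the same way, as one more independent step applied to the result, so the number of branches grows with the number of groups rather than with the number of combinations.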
What is the best way to split this string into a list of runs of identical consecutive digits?
My solution :
>>> str
> '2223334441214844'
>>> filter(None, re.split("(0+)|(1+)|(2+)|(3+)|(4+)|(5+)|(6+)|(7+)|(8+)|(9+)", str))
> ['222', '333', '444', '1', '2', '1', '4', '8', '44']
The more flexible way would be to use itertools.groupby which is made to match consecutive groups in iterables:
>>> s = '2223334441214844'
>>> import itertools
>>> [''.join(group) for key, group in itertools.groupby(s)]
['222', '333', '444', '1', '2', '1', '4', '8', '44']
The key would be the single key that is being grouped on (in your case, the digit). And the group is an iterable of all the items in the group. Since the source iterable is a string, each item is a character, so in order to get back the fully combined group, we need to join the characters back together.
You could also repeat the key for the length of the group to get this output:
>>> [key * len(list(group)) for key, group in itertools.groupby(s)]
['222', '333', '444', '1', '2', '1', '4', '8', '44']
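Because groupby hands back the key and the group separately, the same loop can also produce (character, run length) pairs. A small sketch; the name `runs` is only for illustration:

```python
from itertools import groupby

s = '2223334441214844'
# Each group is a one-shot iterator, so count it while it is still current.
runs = [(key, sum(1 for _ in group)) for key, group in groupby(s)]
print(runs)
# [('2', 3), ('3', 3), ('4', 3), ('1', 1), ('2', 1), ('1', 1), ('4', 1), ('8', 1), ('4', 2)]

# Rebuilding the substrings from the pairs gives the same list as above.
print([key * count for key, count in runs])
# ['222', '333', '444', '1', '2', '1', '4', '8', '44']
```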
If you wanted to use regular expressions, you could make use of backreferences to find consecutive characters without having to specify them explicitly:
>>> re.findall('((.)\\2*)', s)
[('222', '2'), ('333', '3'), ('444', '4'), ('1', '1'), ('2', '2'), ('1', '1'), ('4', '4'), ('8', '8'), ('44', '4')]
For finding consecutive characters in a string, this is essentially what groupby does. You can then keep only the combined match (the first item of each tuple) to get the desired result:
>>> [x for x, *_ in re.findall('((.)\\2*)', s)]
['222', '333', '444', '1', '2', '1', '4', '8', '44']
One solution without regex (that is not specific to digits) would be to use itertools.groupby():
>>> from itertools import groupby
>>> s = '2223334441214844'
>>> [''.join(g) for _, g in groupby(s)]
['222', '333', '444', '1', '2', '1', '4', '8', '44']
If you only need to extract consecutive identical digits, you may use a matching approach using r'(\d)\1*' regex:
import re
s='2223334441214844'
print([x.group() for x in re.finditer(r'(\d)\1*', s)])
# => ['222', '333', '444', '1', '2', '1', '4', '8', '44']
Here,
(\d) - matches and captures into Group 1 any digit
\1* - a backreference to Group 1 matching the same value, 0+ repetitions.
This solution can be customized to match any specific consecutive chars (instead of \d, you may use \S - non-whitespace, \w - word, [a-fA-F] - a specific set, etc.). If you replace \d with . and use re.DOTALL modifier, it will work as the itertools solutions posted above.
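The last point can be checked directly. A short sketch comparing the `.` plus re.DOTALL variant with groupby on a string that contains newlines; the sample string is made up for illustration:

```python
import re
from itertools import groupby

s = 'aa\n\nbb 11'
# With re.DOTALL, "." also matches a newline, so runs of "\n" are grouped too.
regex_runs = [m.group() for m in re.finditer(r'(.)\1*', s, flags=re.DOTALL)]
groupby_runs = [''.join(g) for _, g in groupby(s)]
print(regex_runs)                  # ['aa', '\n\n', 'bb', ' ', '11']
print(regex_runs == groupby_runs)  # True
```

Without re.DOTALL the newlines would simply be skipped by the regex, so the two results would differ for this input.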
Use a capture group and backreference.
import re
s = '2223334441214844'  # renamed from str to avoid shadowing the built-in
print([i[0] for i in re.findall(r'((\d)\2*)', s)])
\2 matches whatever the (\d) capture group matched. The list comprehension is needed because when the RE contains capture groups, findall returns a list of the capture groups, not the whole match. So we need an extra group to get the whole match, and then need to extract that group from the result.
What about without importing any external module?
You can create your own logic in pure Python without importing any module. Here is a recursive approach:
string_1='2223334441214844'
list_2=[i for i in string_1]
def con(list_1):
    group = []
    if not list_1:
        return 0
    else:
        track=list_1[0]
        for j,i in enumerate(list_1):
            if i==track[0]:
                group.append(i)
            else:
                print(group)
                return con(list_1[j:])
        return group
print(con(list_2))
output:
['2', '2', '2']
['3', '3', '3']
['4', '4', '4']
['1']
['2']
['1']
['4']
['8']
['4', '4']
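The recursive version above prints each group and returns only the last one. If a list of joined strings is wanted instead, a plain loop is enough; a sketch with a hypothetical helper name `split_runs`, still without any imports:

```python
def split_runs(text):
    """Return the runs of identical consecutive characters in text."""
    runs = []
    for char in text:
        if runs and runs[-1][0] == char:
            runs[-1] += char   # same character: extend the current run
        else:
            runs.append(char)  # different character: start a new run
    return runs

print(split_runs('2223334441214844'))
# ['222', '333', '444', '1', '2', '1', '4', '8', '44']
```

The loop also avoids Python's recursion limit, which the recursive approach could hit on inputs with a very large number of runs.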
What's the best way to filter out some subsets from a generator? For example, I have a string "1023" and want to produce all possible ways of splitting it into consecutive groups of digits. All combinations would be:
['1', '0', '2', '3']
['1', '0', '23']
['1', '02', '3']
['1', '023']
['10', '2', '3']
['10', '23']
['102', '3']
['1023']
I am not interested in a subset that contains a leading 0 on any of the items, so the valid ones are:
['1', '0', '2', '3']
['1', '0', '23']
['10', '2', '3']
['10', '23']
['102', '3']
['1023']
I have two questions.
1) If using a generator, what's the best way to filter out the subsets with leading zeroes? Currently, I generate all combinations, then loop through them afterwards and only continue if the subset is valid. For simplicity I am only printing the subset in the sample code. If the generated sequence is very long or contains a lot of invalid subsets, looping through all of it is largely wasted work. Is there a way to stop the generator when it sees an invalid item (one with a leading zero) and leave it out of 'allCombinations'?
2) If the above doesn't exist, what's a better way to generate these combinations (disregarding combinations with leading zeroes)?
Code using a generator:
import itertools
def isValid(subset):  ## DIGITS WITH LEADING 0 IS NOT VALID
    valid = True
    for num in subset:
        if num[0] == '0' and len(num) > 1:
            valid = False
            break
    return valid
def get_combinations(source, comb):
    res = ""
    for x, action in zip(source, comb + (0,)):
        res += x
        if action == 0:
            yield res
            res = ""
digits = "1023"
allCombinations = [list(get_combinations(digits, c)) for c in itertools.product((0, 1), repeat=len(digits) - 1)]
for subset in allCombinations:  ## LOOPS THROUGH THE ENTIRE GENERATOR
    if isValid(subset):
        print(subset)
Filtering on a simple condition like "no leading zeros" can be done more efficiently while the combinations are being built:
def generate_pieces(input_string, predicate):
    if input_string:
        if predicate(input_string):
            yield [input_string]
        for item_size in range(1, len(input_string)+1):
            item = input_string[:item_size]
            if not predicate(item):
                continue
            rest = input_string[item_size:]
            for rest_piece in generate_pieces(rest, predicate):
                yield [item] + rest_piece
Generating every combination of cuts, so long it's not even funny:
>>> list(generate_pieces('10002', lambda x: True))
[['10002'], ['1', '0002'], ['1', '0', '002'], ['1', '0', '0', '02'], ['1', '0', '0', '0', '2'], ['1', '0', '00', '2'], ['1', '00', '02'], ['1', '00', '0', '2'], ['1', '000', '2'], ['10', '002'], ['10', '0', '02'], ['10', '0', '0', '2'], ['10', '00', '2'], ['100', '02'], ['100', '0', '2'], ['1000', '2']]
Only those where no fragment has leading zeros:
>>> list(generate_pieces('10002', lambda x: not x.startswith('0')))
[['10002'], ['1000', '2']]
Substrings that start with a zero were never considered for the recursive step.
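Note that `not x.startswith('0')` also rejects a lone '0', which the question treats as valid. A sketch of a predicate that allows it, reusing generate_pieces from above (repeated here so the snippet runs on its own); `no_leading_zero` is a name chosen for illustration:

```python
def generate_pieces(input_string, predicate):
    # Same generator as above, repeated so this snippet is self-contained.
    if input_string:
        if predicate(input_string):
            yield [input_string]
        for item_size in range(1, len(input_string) + 1):
            item = input_string[:item_size]
            if not predicate(item):
                continue  # prune: nothing built on this piece is generated
            rest = input_string[item_size:]
            for rest_piece in generate_pieces(rest, predicate):
                yield [item] + rest_piece

def no_leading_zero(piece):
    # A single '0' is fine; '02' or '023' is not.
    return piece == '0' or not piece.startswith('0')

for pieces in generate_pieces('1023', no_leading_zero):
    print(pieces)
```

This yields the six valid splits listed in the question, and because the result is a generator, the caller can stop consuming it at any point without the remaining splits ever being built.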
One common solution is to try filtering just before using yield. I have given you an example of filtering just before yield:
import itertools
def my_gen(my_string):
    # Create combinations
    for length in range(len(my_string)):
        for my_tuple in itertools.combinations(my_string, length+1):
            # This is the string you would like to output
            output_string = "".join(my_tuple)
            # filter here:
            if output_string[0] != '0':
                yield output_string
my_string = '1023'
print(list(my_gen(my_string)))
EDIT: Added in a generator comprehension alternative
import itertools
my_string = '1023'
my_gen = ("".join(my_tuple) for length in range(len(my_string))
          for my_tuple in itertools.combinations(my_string, length+1)
          if my_tuple[0] != '0')
I have this string:
string = '9x3420aAbD8'
How can I turn string into:
'023489ADabx'
What function would I use?
You can just use the built-in function sorted to sort the string lexicographically. It takes an iterable, sorts its elements, and returns a new sorted list. Per the documentation:
sorted(iterable[, key][, reverse])
Return a new sorted list from the items in iterable.
You can apply that like so:
>>> ''.join(sorted(string))
'023489ADabx'
Since sorted returns a list, like so:
>>> sorted(string)
['0', '2', '3', '4', '8', '9', 'A', 'D', 'a', 'b', 'x']
Just join them together to create the desired string.
You can use the sorted() function to sort the string, but this will return a list
sorted(string)
['0', '2', '3', '4', '8', '9', 'A', 'D', 'a', 'b', 'x']
to turn this back into a string you just have to join it, which is commonly done using ''.join()
So, all together:
sorted_string = ''.join(sorted(string))
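The digits-then-uppercase-then-lowercase order falls out of sorted comparing characters by their code points. A short sketch; the case-insensitive variant with key=str.lower is an extra that the question did not ask for:

```python
string = '9x3420aAbD8'

# Characters compare by code point: '0'-'9' < 'A'-'Z' < 'a'-'z'.
print([ord(c) for c in '9Aa'])      # [57, 65, 97]
print(''.join(sorted(string)))      # 023489ADabx

# key=str.lower ignores case; the sort is stable, so 'a' stays before 'A'.
print(''.join(sorted(string, key=str.lower)))  # 023489aAbDx
```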
I'm trying to extract numbers and both previous and following characters (excluding digits and whitespaces) of a string. The expected return of the function is a list of tuples, with each tuple having the shape:
(previous_sequence, number, next_sequence)
For example:
string = '200gr T34S'
my_func(string)
>>[('', '200', 'gr'), ('T', '34', 'S')]
My first iteration was to use:
import re
def my_func(string):
    res_obj = re.findall(r'([^\d\s]+)?(\d+)([^\d\s]+)?', string)
    return res_obj
But this function doesn't do what I expect when I pass a string like '2AB3'. I would like the output to be [('','2','AB'), ('AB','3','')], but instead I get [('','2','AB'), ('','3','')], because 'AB' was already consumed by the previous match.
How could I fix this?
Since the numbers themselves never overlap, a single trailing lookahead assertion should be all you need.
Something like ([^\d\s]+)?(\d+)(?=([^\d\s]+)?)
Use ([^\d\s]*)(\d+)(?=([^\d\s]*)) instead if you want an unmatched side to be the empty string rather than NULL.
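The trailing-assertion patterns above can be tried directly in Python. A minimal sketch: because the following characters are only looked at, not consumed, they remain available as the prefix of the next match. The variable names are chosen for illustration:

```python
import re

pattern = re.compile(r'([^\d\s]*)(\d+)(?=([^\d\s]*))')
print(pattern.findall('200gr T34S'))  # [('', '200', 'gr'), ('T', '34', 'S')]
print(pattern.findall('2AB3'))        # [('', '2', 'AB'), ('AB', '3', '')]

# With the "+)?" variant, a side that did not match is None in the match
# object, although findall would still report it as an empty string.
optional = re.compile(r'([^\d\s]+)?(\d+)(?=([^\d\s]+)?)')
print([m.groups() for m in optional.finditer('2AB3')])
# [(None, '2', 'AB'), ('AB', '3', None)]
```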
Instead of combining the + and ? quantifiers you can simply use *:
>>> re.findall(r'([^\d\s]*)(\d+)([^\d\s]*)',string)
[('', '200', 'gr'), ('T', '34', 'S')]
But if you mean to match overlapping strings, you can use a positive lookahead to find all the overlapping matches:
>>> re.findall(r'(?=([^\d\s]*)(\d+)([^\d\s]*))','2AB3')
[('', '2', 'AB'), ('AB', '3', ''), ('B', '3', ''), ('', '3', '')]
Another way can be using regex and functions!
import re
#'200gr T34S' '2AB3'
def s(x):
    tmp=[]
    d = re.split(r'\s+|(\d+)',x)
    d = ['' if v is None else v for v in d] #replace None with ''
    t_ = [i for i in d if len(i)>0]
    digits = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
    nms = [i for i in t_ if i[0] in digits]
    for i in nms:
        if d.index(i)==0:
            tmp.append(('',i,d[d.index(i)+1]))
        elif d.index(i)==len(d)-1:
            tmp.append((d[d.index(i)-1],i,''))
        else:
            tmp.append((d[d.index(i)-1],i,d[d.index(i)+1]))
    return tmp
print(s('2AB3'))
Prints-
[('', '2', 'AB'), ('AB', '3', '')]