Random data generator matching a regex in python - python

In python, I am looking for python code which I can use to create random data matching any regex. For example, if the regex is
\d{1,100}
I want a sequence of random digits whose length is random between 1 and 100 (equally distributed).
There are some 'regex inverters' available (see here) which compute ALL possible matches, which is not what I want and which is extremely impractical. The example above has more than 10^100 possible matches, which can never be stored in a list. I just need a function that returns one match at random.
Maybe there is a package already available which can be used to accomplish this? I need a function that creates a matching string for ANY regex, not just the given one, but for maybe 100 different regexes. I cannot code each of them myself; I want the function to analyze the pattern and return a matching string.

If the expressions you match do not have any "advanced" features, like look-ahead or look-behind, then you can parse them yourself and build a proper generator.
Treat each part of the regex as a function returning something (e.g., between 1 and 100 digits) and glue them together at the top:
import random
from string import digits, ascii_uppercase, ascii_letters

def joiner(*items):
    # strictly, this should return a lambda like the other functions
    return ''.join(item() for item in items)

def roll(item, n1, n2=None):
    # repeat item between n1 and n2 times (exactly n1 times if n2 is omitted)
    n2 = n2 or n1
    return lambda: ''.join(item() for _ in range(random.randint(n1, n2)))

def rand(collection):
    # pick one random element of collection
    return lambda: random.choice(collection)

# this is a generator for /\d{1,10}:[A-Z]{5}/
print(joiner(roll(rand(digits), 1, 10),
             rand(':'),
             roll(rand(ascii_uppercase), 5)))

# [A-C]{2}\d{2,20}#\w{10,1000} (\w is approximated by ASCII letters here)
print(joiner(roll(rand('ABC'), 2),
             roll(rand(digits), 2, 20),
             rand('#'),
             roll(rand(ascii_letters), 10, 1000)))
Parsing the regex would be another question, so this solution is not universal, but maybe it's sufficient.
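The "parse it yourself" step can be sketched for a very small regex subset. This is only an illustrative sketch, not a general solution: the helper names (parse_atom, parse_quantifier, random_match) are made up for this example, and it assumes the pattern contains nothing but literal characters, \d, \w, simple character classes such as [A-Z], and {n} or {m,n} quantifiers. The repetition count is drawn uniformly between the two bounds.

```python
import random
import re
import string

# Alphabets for the two supported escapes (ASCII only, an assumption of this sketch).
ESCAPES = {'d': string.digits,
           'w': string.ascii_letters + string.digits + '_'}

def parse_atom(pattern, i):
    """Return (alphabet, next_index) for the atom starting at pattern[i]."""
    ch = pattern[i]
    if ch == '\\':
        nxt = pattern[i + 1]
        # Unknown escapes such as \. are treated as the literal character.
        return ESCAPES.get(nxt, nxt), i + 2
    if ch == '[':
        end = pattern.index(']', i)
        body = pattern[i + 1:end]
        chars = []
        j = 0
        while j < len(body):
            if j + 2 < len(body) and body[j + 1] == '-':
                # Expand a range like A-Z into its characters.
                chars.extend(chr(c) for c in range(ord(body[j]), ord(body[j + 2]) + 1))
                j += 3
            else:
                chars.append(body[j])
                j += 1
        return ''.join(chars), end + 1
    return ch, i + 1

def parse_quantifier(pattern, i):
    """Return (low, high, next_index); defaults to exactly one repetition."""
    if i < len(pattern) and pattern[i] == '{':
        end = pattern.index('}', i)
        parts = pattern[i + 1:end].split(',')
        return int(parts[0]), int(parts[-1]), end + 1
    return 1, 1, i

def random_match(pattern):
    """Build one random string matching the (restricted) pattern."""
    out = []
    i = 0
    while i < len(pattern):
        alphabet, i = parse_atom(pattern, i)
        low, high, i = parse_quantifier(pattern, i)
        count = random.randint(low, high)  # uniform over the allowed lengths
        out.extend(random.choice(alphabet) for _ in range(count))
    return ''.join(out)

pattern = r'\d{1,10}:[A-Z]{5}'
sample = random_match(pattern)
print(sample, bool(re.fullmatch(pattern, sample)))
```

Alternation, groups, open-ended quantifiers such as {1,} and the * and + operators are deliberately left out; supporting them would need a real parser.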

Two Python libraries can do this: sre-yield and Hypothesis.
sre-yield
sre-yield will generate all values matching a given regular expression. It uses SRE, Python's default regular expression engine.
For example,
import sre_yield
list(sre_yield.AllStrings('[a-z]oo$'))
['aoo', 'boo', 'coo', 'doo', 'eoo', 'foo', 'goo', 'hoo', 'ioo', 'joo', 'koo', 'loo', 'moo', 'noo', 'ooo', 'poo', 'qoo', 'roo', 'soo', 'too', 'uoo', 'voo', 'woo', 'xoo', 'yoo', 'zoo']
For decimal numbers,
list(sre_yield.AllStrings(r'\d{1,2}'))
['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '00', '01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '30', '31', '32', '33', '34', '35', '36', '37', '38', '39', '40', '41', '42', '43', '44', '45', '46', '47', '48', '49', '50', '51', '52', '53', '54', '55', '56', '57', '58', '59', '60', '61', '62', '63', '64', '65', '66', '67', '68', '69', '70', '71', '72', '73', '74', '75', '76', '77', '78', '79', '80', '81', '82', '83', '84', '85', '86', '87', '88', '89', '90', '91', '92', '93', '94', '95', '96', '97', '98', '99']
Hypothesis
The unit test library Hypothesis will generate random matching examples. It is also built using SRE.
import hypothesis
g=hypothesis.strategies.from_regex(r'^[A-Z][a-z]{4}$')
g.example()
with output such as:
'Gssov', 'Lmsud', 'Ixnoy'
For decimal numbers
d=hypothesis.strategies.from_regex(r'^[0-9]{1,2}$')
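will output one- or two-digit decimal numbers such as 65, 7, 67, although not evenly distributed. Using \d instead of [0-9] yielded unprintable strings, most likely because in Python 3 \d also matches non-ASCII Unicode digits.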
Note: use begin and end anchors to prevent extraneous characters.

From this answer
You could try using python to call this perl module:
https://metacpan.org/module/String::Random

Related

filter python list based on values in a different list

I think this is an entry level computer science 101 course question about algorithms and data structures.
I have a list:
VAV_ID_list = ['36','38','21','29','31','25','9','13','14','19','30','8','26','6','34','11','12028','20','27','15','12032','23','16','24','37','39','12033','10']
How can I filter out the values in VAV_ID_exclude_list from VAV_ID_list?
VAV_ID_exclude_list = ['36','38','21','29','31','25','9','13','14','19','30','8','26','6']
The code below obviously doesn't do anything; any tips are greatly appreciated.
filtered_VAV_ID_list = [zone for zone in VAV_ID_list if zone == 36]
print(filtered_VAV_ID_list)
This is what you want
list2= [zone for zone in VAV_ID_list if zone not in VAV_ID_exclude_list]
You can do it in multiple ways:
This is the most straightforward way.
>>> [i for i in VAV_ID_list if i not in VAV_ID_exclude_list]
['34', '11', '12028', '20', '27', '15', '12032', '23', '16', '24', '37', '39', '12033', '10']
You can even use sets if the order is not important and you don't have duplicates.
>>> list(set(VAV_ID_list) - set(VAV_ID_exclude_list))
['24', '11', '39', '27', '20', '23', '12033', '12032', '16', '37', '34', '15', '12028', '10']
for el in VAV_ID_list:
    if el not in VAV_ID_exclude_list:
        print(el)
I think this will do it.
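The list comprehension and the set approach from the answers above can be combined: converting only the exclude list to a set keeps the original order and duplicates of VAV_ID_list while making each membership test fast. A minimal sketch (the name exclude is introduced here for illustration):

```python
VAV_ID_list = ['36', '38', '21', '29', '31', '25', '9', '13', '14', '19',
               '30', '8', '26', '6', '34', '11', '12028', '20', '27', '15',
               '12032', '23', '16', '24', '37', '39', '12033', '10']
VAV_ID_exclude_list = ['36', '38', '21', '29', '31', '25', '9', '13',
                       '14', '19', '30', '8', '26', '6']

# Set membership is a hash lookup instead of a scan through the list.
exclude = set(VAV_ID_exclude_list)

# The comprehension walks VAV_ID_list in order, so the order is preserved.
filtered_VAV_ID_list = [zone for zone in VAV_ID_list if zone not in exclude]
print(filtered_VAV_ID_list)
```

For lists this small the difference is negligible; it matters mainly when the exclude list grows large.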

How to automate single line code, with changes in input?

I have created a variable named 'j' which has some values and I want my code to pick one value at a time and execute.
I tried writing code but it does not work. I'm sharing my code; please tell me where it can be improved.
j = ['0', '1', '3', '4', '6', '7', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '30', '31', '32', '33', '34', '35', '36', '37', '38', '39', '40', '41', '42', '43', '44', '45', '46', '47', '48', '49', '50', '51', '52', '53', '54', '55', '56', '57', '58', '59', '60', '61', '62', '63', '64', '65', '66', '67']
for i in j:
labels('i') = mne.read_labels_from_annot('sub-CC721377_T1w', parc='aparc', subjects_dir=subjects_dir)['i']
done
According to the MNE documentation, the function read_labels_from_annot returns a list of labels.
Thus, instead of indexing the result with ...)[0] at the end, you should just capture the entire list:
labels = mne.read_labels_from_annot(...)
This would capture a list of labels, rather than a single label, which would have the effect of 'indexing at the end "[0]" from 0 - 67'.
You asked about adding all the results together into a label_all variable. You didn't specify (and I don't know anything about the MNE package), so it's not clear: do the labels ever repeat? Is it possible that "lab123" will occur in every input file? If so, should label_all store multiple copies of the same value, or just the unique label names?
I think something like this is what you're after:
import mne

def get_labels_for_subject(sub, *, hemi='both', parc='aparc', **kwargs):
    """Get MNE labels for a given subject. **kwargs allows passing named
    parameters like subjects_dir, regexp, and others that default to None."""
    labels = mne.read_labels_from_annot(sub, hemi=hemi, parc=parc, **kwargs)
    return labels

# List of all the subjects
subjects = [
    'sub-CC721377_T1w',
    'sub-next???',
]

label_all = []
for s in subjects:
    label_all.extend(get_labels_for_subject(s, subjects_dir='.'))
print("Got labels:", label_all)
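The loop in the question fails for two plain Python reasons: labels('i') = ... is not valid assignment syntax, and 'i' in quotes is the literal string "i", not the loop variable. A minimal sketch of the intended pattern follows, using a made-up stand-in list (all_labels) in place of the list that mne.read_labels_from_annot returns, so that it runs without MNE or any data files:

```python
# Stand-in for the list returned by mne.read_labels_from_annot(...);
# the real call returns label objects, these strings are only placeholders.
all_labels = ['label-%d' % n for n in range(68)]

# The indices from the question are strings, so convert them before indexing.
j = ['0', '1', '3', '4', '6', '7', '9', '10']

# Read the list once, then pick the wanted entries; a dict keeps each
# picked label reachable under the index it came from.
labels = {i: all_labels[int(i)] for i in j}

for i in j:
    print(i, labels[i])
```

Reading the annotation once and indexing the resulting list avoids re-reading the file on every iteration of the loop.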

Python set remove Non Numeric values

I have a set like the one below:
set(['Virtual', '120', 'P', '130', '90', '250', '100', '10', 'Mar', 'indicates', '18', '50', '40', '1', '|'])
How do I remove all non-numeric values?
Output expected:
set(['120', '130', '90', '250', '100', '10','18', '50', '40', '1'])
You can create a new set:
number_set = set()
for item in old_set:
    try:
        # note: this stores the values as ints, not as the original strings
        number_set.add(int(item))
    except ValueError:
        print("Not a number")
print(number_set)
You can also try removing all non-numeric objects from the set in place (iterate over a copy, because a set must not change size while it is being iterated):
for item in list(old_set):
    try:
        int(item)
    except ValueError:
        old_set.remove(item)
You can use a filter to clean your set:
s = set(['Virtual', '120', 'P', '130', '90', '250', '100', '10', 'Mar', 'indicates', '18', '50', '40', '1', '|'])
def isInt(text):
    """Returns True for a text that is convertible to int() else False."""
    try:
        _ = int(text)
        return True
    except ValueError:
        return False

# apply filter:
filteredSet = set(filter(isInt, s))
print(filteredSet)
Output:
{'18', '90', '130', '120', '40', '50', '10', '1', '100', '250'}
This output looks different from the one you expected, but that is just how Python 3 prints a set; sets are unordered, so the element order is arbitrary.
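For strings that contain only plain digits, as in this question, a set comprehension with str.isdigit is a shorter alternative to the try/except approaches above. This is a sketch under that assumption: str.isdigit rejects signed values such as '-5' and accepts a few Unicode characters (for example '²') that int() cannot parse, so it is not a drop-in replacement for the int()-based check in general.

```python
s = {'Virtual', '120', 'P', '130', '90', '250', '100', '10',
     'Mar', 'indicates', '18', '50', '40', '1', '|'}

# Keep only the strings made up entirely of digit characters.
numeric_only = {item for item in s if item.isdigit()}

# Sorting by numeric value gives a stable, readable display order.
print(sorted(numeric_only, key=int))
```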

How do I use re.findall without splitting numbers at the decimal point?

I need to extract the numbers from this string :
str="((8,52),(30,52),2,0.5)"
If I use re.findall('\d+',str)
I will get:
['8', '52', '30', '52', '2', '0', '5']
There is a problem with 0.5
How do I keep 0.5 together to get:
['8', '52', '30', '52', '2', '0.5']
re.findall(r"\d+\.\d+|\d+", str)
The first alternative in the regular expression matches numbers with digits on both sides of a decimal point, and the second alternative matches whole numbers.
Use a standard numeric pattern: (?:\d+(?:\.\d*)?|\.\d+)
It covers all cases:
5
5.
5.1
.1
re.findall(r'\d+\.?\d*',str)
output:
['8', '52', '30', '52', '2', '0.5']
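To see the difference on the string from the question, the sketch below runs the original pattern next to a variant with an optional fractional part, then converts the matches to numbers. The pattern \d+(?:\.\d+)? is a small variation chosen for this example (it gives the same result as the alternation above on this input), and the variable is named text rather than str to avoid shadowing the built-in.

```python
import re

text = "((8,52),(30,52),2,0.5)"

# \d+ alone stops at the decimal point, so 0.5 is split in two.
integers_only = re.findall(r'\d+', text)
print(integers_only)   # ['8', '52', '30', '52', '2', '0', '5']

# An optional non-capturing group keeps the fractional part attached.
with_decimals = re.findall(r'\d+(?:\.\d+)?', text)
print(with_decimals)   # ['8', '52', '30', '52', '2', '0.5']

# Convert the matched strings to int or float as appropriate.
values = [float(tok) if '.' in tok else int(tok) for tok in with_decimals]
print(values)          # [8, 52, 30, 52, 2, 0.5]
```

The group must be non-capturing, (?:...), because re.findall returns only the group contents when the pattern contains a capturing group.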

Python set manipulation: the simple script does not output the right set after union or difference?

all_tags = ['24', '02', '26', '03', '33', '32', '31', '30', '29', '68', '11']
ref_tag = str('24')
union_tags = set(all_tags) | set(ref_tag)
left_tags = set(all_tags) - set(ref_tag)
print(union_tags)
print(left_tags)
The above is simple code in which I expect the elements of union_tags to be the same as those in all_tags. However, the result is
set(['24', '02', '26', '03', '33', '32', '31', '30', '29', '68', '2', '4', '11'])
union_tags instead contains two extra elements, '2' and '4', which I think result from splitting the string '24'.
Likewise, left_tags should exclude the element '24'. However, the result still contains '24'.
Please let me know why. I use Python 2.7 as the interpreter.
The set function accepts an iterable of hashable items and converts it to a set object. Since strings are iterable, passing the string '24' to set converts it to the following set of characters:
{'2', '4'}
As a result, the union of this set with all_tags contains the items '2' and '4', and the difference only tries to remove '2' and '4' (neither of which is in all_tags), so '24' remains.
If you want to put '24' in a set as one item, you can use {} to create the expected set:
>>> ref_tag = {'24'}
>>> ref_tag
set(['24'])
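Applied to the script from the question, the fix looks like the sketch below: the string stays a plain string and is wrapped in a set literal only where a set is needed. It is written for Python 3 (set literals also work in Python 2.7, where only the printed representation differs).

```python
all_tags = ['24', '02', '26', '03', '33', '32', '31', '30', '29', '68', '11']
ref_tag = '24'

# set() iterates over its argument, so a string is split into characters.
print(set(ref_tag) == {'2', '4'})   # True

# A set literal holds the whole string as a single element.
union_tags = set(all_tags) | {ref_tag}
left_tags = set(all_tags) - {ref_tag}

print(union_tags == set(all_tags))  # True: '24' was already present
print('24' in left_tags)            # False: '24' has been removed
```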
