re to identify range from string

How can I write a regular expression that turns a string into a list of numbers? For example, given strings such as:
value = '88-94'
value = '88 to 94'
value = '88'
value = '88-94, 96-108'
outcome should be:
[88, 89, 90, 91, 92, 93, 94]
[88, 89, 90, 91, 92, 93, 94]
[88]
[88, 89, 90, 91, 92, 93, 94, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108]
The programming language is Python 2.7.
Here is a working solution for Python 2.7 using the regex module, but the last case (a single value) has to be handled as a separate case:
>>> import regex
>>> m = regex.match(r"(?:(?P<digits>\d+).(?P<digits>\d+))", "88-94")
>>> a = m.captures("digits")
>>> a
['88', '94']
>>> m = regex.match(r"(?:(?P<digits>\d+).(?P<digits>\d+))", "88 94")
>>> a = m.captures("digits")
>>> a
['88', '94']
>>> range(int(a[0]), int(a[1])+1)
[88, 89, 90, 91, 92, 93, 94]
>>>
Here is a solution that addresses the cases above, but what about '88-94, 96-98' etc.?
>>> import re
>>> a = map(int, re.findall(r'\d+', '88-94'))
>>> range(a[0], a[-1]+1)
[88, 89, 90, 91, 92, 93, 94]
>>> a = map(int, re.findall(r'\d+', '88 94'))
>>> range(a[0], a[-1]+1)
[88, 89, 90, 91, 92, 93, 94]
>>> a = map(int, re.findall(r'\d+', '88'))
>>> range(a[0], a[-1]+1)
[88]
>>>
A solution that covers almost all cases:
>>> import re
>>> a = map(int, re.findall(r'\d+', '88-94, 96-108'))
>>> c = zip(a[::2], a[1::2])
>>> [m for k in [range(i,j+1) for i, j in c] for m in k]
[88, 89, 90, 91, 92, 93, 94, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108]
>>> a = map(int, re.findall(r'\d+', '88-94, 96-108, 125 129'))
>>> c = zip(a[::2], a[1::2])
>>> [m for k in [range(i,j+1) for i, j in c] for m in k]
[88, 89, 90, 91, 92, 93, 94, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 125, 126, 127, 128, 129]
>>> a = map(int, re.findall(r'\d+', '88-94, 96-108, 125 129, 132 to 136'))
>>> c = zip(a[::2], a[1::2])
>>> [m for k in [range(i,j+1) for i, j in c] for m in k]
[88, 89, 90, 91, 92, 93, 94, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 125, 126, 127, 128, 129, 132, 133, 134, 135, 136]
>>>
Can anyone suggest a reason for the downvotes or the votes to close?
Any help will be appreciated, including suggestions on how to improve the question. I am not asking for alternative solutions: I know how to split and loop, and how to use re to extract the digits and loop. My question is how to do it with re in a single statement, if possible. The answer could be "no", but that does not make the question off-topic.

import re
def get_numbers(value):
    value = re.sub(r'^(\d+)$', r'\1-\1', value) # '88' -> '88-88'
    start, stop = map(int, re.findall(r'\d+', value))
    return range(start, stop+1)
print get_numbers('88-94')
print get_numbers('88 to 94')
print get_numbers('88')
output:
[88, 89, 90, 91, 92, 93, 94]
[88, 89, 90, 91, 92, 93, 94]
[88]
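The function above covers the single-range cases. As a hedged sketch of how the comma-separated case from the question might be handled with one pattern, the Python 3 code below uses a hypothetical helper named expand_ranges (not part of any answer here) and assumes that ranges are written with "-" or "to":

```python
import re

# One number, optionally followed by "-" or "to" and a second number.
RANGE_RE = re.compile(r'(\d+)(?:\s*(?:-|to)\s*(\d+))?')


def expand_ranges(value):
    """Expand a string such as '88-94, 96-108' into a flat list of ints."""
    numbers = []
    for start, stop in RANGE_RE.findall(value):
        first = int(start)
        # A lone value like '88' leaves the second group empty.
        last = int(stop) if stop else first
        numbers.extend(range(first, last + 1))
    return numbers


print(expand_ranges('88-94, 96-108'))
```

Under these assumptions a space-separated pair such as '125 129' is read as two single values rather than as a range, so that form would need an extra alternative in the pattern.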

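start, stop = map(int, mystring.split("-"))
range(start, stop + 1)
No need for regex when the string is a single hyphenated range such as '88-94'. The stop value needs the + 1 because range excludes its upper bound.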

Related

Sorting Algorithm output at end of pass 3

Given the following initially unsorted list:
[77, 101, 40, 43, 81, 129, 85, 144]
Which sorting algorithm produces the following list at the end of Pass Number 3? Is it Bubble, Insertion or Selection?
[40, 43, 77, 81, 85, 101, 129, 144]
Can someone give me a clue on how I can solve this please.

Insertion sort would move at most 3 items in 3 passes, leaving the first 4 items in order and the rest unchanged. Selection sort would affect only the positions of the first 3 items and of the 3 smallest (or greatest) items. Only Bubble sort would swap other items around. The movements of 40 and 129 are telltale signs that point to a Bubble sort.
Note that this may be a trick question, because all numbers that need to be shifted are at most 2 positions off except 2 of them (101 and 129, which are the 2nd and 3rd largest and would end up in their right places after 2 passes). A Bubble sort with an early-exit check finishes its swaps in 2 passes, and a 3rd pass, if it runs at all, only confirms that nothing is left to swap. So, depending on how passes are counted, the answer could be "none of them".
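One way to check this reasoning is to run exactly three passes of each algorithm and compare the results with the target list. The following is a minimal Python 3 sketch; the helper names (bubble_passes, insertion_passes, selection_passes) are made up for illustration, and the Bubble sort here deliberately has no early exit so that a 3rd pass always runs:

```python
def bubble_passes(items, passes):
    """Run a fixed number of bubble sort passes (no early exit)."""
    data = list(items)
    n = len(data)
    for i in range(passes):
        for j in range(n - i - 1):
            if data[j] > data[j + 1]:
                data[j], data[j + 1] = data[j + 1], data[j]
    return data


def insertion_passes(items, passes):
    """Insert the items at indices 1..passes into the sorted prefix."""
    data = list(items)
    for i in range(1, min(passes, len(data) - 1) + 1):
        key = data[i]
        j = i - 1
        while j >= 0 and data[j] > key:
            data[j + 1] = data[j]
            j -= 1
        data[j + 1] = key
    return data


def selection_passes(items, passes):
    """Move the smallest remaining item to the front, once per pass."""
    data = list(items)
    for step in range(min(passes, len(data))):
        min_idx = min(range(step, len(data)), key=data.__getitem__)
        data[step], data[min_idx] = data[min_idx], data[step]
    return data


unsorted = [77, 101, 40, 43, 81, 129, 85, 144]
target = [40, 43, 77, 81, 85, 101, 129, 144]
for name, run in [("bubble", bubble_passes),
                  ("insertion", insertion_passes),
                  ("selection", selection_passes)]:
    print(name, run(unsorted, 3) == target)
```

With these definitions only the Bubble sort variant reaches the target list after three passes.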

Insertion sort:
def insertion_sort(array):
    for i in range(1, len(array)):
        key_item = array[i]
        j = i - 1
        while j >= 0 and array[j] > key_item:
            array[j + 1] = array[j]
            j -= 1
        array[j + 1] = key_item
        print("Step",i,":",array)
    return array
data=[77, 101, 40, 43, 81, 129, 85, 144]
insertion_sort(data)
Output:
Step 1 : [77, 101, 40, 43, 81, 129, 85, 144]
Step 2 : [40, 77, 101, 43, 81, 129, 85, 144]
Step 3 : [40, 43, 77, 101, 81, 129, 85, 144]
Step 4 : [40, 43, 77, 81, 101, 129, 85, 144]
Step 5 : [40, 43, 77, 81, 101, 129, 85, 144]
Step 6 : [40, 43, 77, 81, 85, 101, 129, 144]
Step 7 : [40, 43, 77, 81, 85, 101, 129, 144]
Bubble sort:
def bubble_sort(array):
    n = len(array)
    for i in range(n):
        already_sorted = True
        for j in range(n - i - 1):
            if array[j] > array[j + 1]:
                array[j], array[j + 1] = array[j + 1], array[j]
                already_sorted = False
        if already_sorted:
            break
        print("Step:",n-j-1)
        print(array)
    return array
data = [77, 101, 40, 43, 81, 129, 85, 144]
bubble_sort(data)
Output:
Step: 1
[77, 40, 43, 81, 101, 85, 129, 144]
Step: 2
[40, 43, 77, 81, 85, 101, 129, 144]
Selection Sort:
def selectionSort(array, size):
    for step in range(size):
        min_idx = step
        for i in range(step + 1, size):
            if array[i] < array[min_idx]:
                min_idx = i
        (array[step], array[min_idx]) = (array[min_idx], array[step])
        print("step",step+1,":",end="")
        print(array)
data = [77, 101, 40, 43, 81, 129, 85, 144]
size = len(data)
selectionSort(data, size)
Output:
step 1 :[40, 101, 77, 43, 81, 129, 85, 144]
step 2 :[40, 43, 77, 101, 81, 129, 85, 144]
step 3 :[40, 43, 77, 101, 81, 129, 85, 144]
step 4 :[40, 43, 77, 81, 101, 129, 85, 144]
step 5 :[40, 43, 77, 81, 85, 129, 101, 144]
step 6 :[40, 43, 77, 81, 85, 101, 129, 144]
step 7 :[40, 43, 77, 81, 85, 101, 129, 144]
step 8 :[40, 43, 77, 81, 85, 101, 129, 144]
For more guidance on how these algorithms run, see the link below:
https://realpython.com/sorting-algorithms-python/

Bin sequential values in list

I have the list:
new_maks = [75, 76, 77, 78, 79, 80, 81, 85, 86, 87, 88, 89, 91]
I want to bin the elements into runs in which each element is exactly 1 greater than the previous one. My initial idea is to initialize two lists, bin_start and bin_end, and iterate through new_maks to check for sequential values.
bin_start = []
bin_end = []
counter = 0
for i in range(len(new_maks)):
    if new_maks[i] == new_maks[0]:
        bin_start.append(new_maks[i])
    elif (new_maks[i] - new_maks[i-1]) ==1:
        try:
            bin_end[counter] = new_maks[i]
        except IndexError:  # no end recorded yet for the current bin
            bin_end.append(new_maks[i])
    elif (new_maks[i] - new_maks[i-1]) >1:
        if new_maks[i] != new_maks[-1]:
            bin_start.append(new_maks[i])
            counter +=1
Which produces the desired result of:
bin_start= [75, 85]
bin_end = [81, 89]
Is there a simpler/vectorized way to achieve this result?

Here's an approach that uses NumPy tools for performance -
import numpy as np
def start_stop_seq1(a):
    m = np.r_[False,np.diff(a)==1,False]
    return a[m[:-1]!=m[1:]].reshape(-1,2).T
Sample run -
In [34]: a # input array
Out[34]:
array([ 75, 76, 77, 78, 79, 80, 81, 85, 86, 87, 88, 89, 91,
92, 93, 100, 101, 110])
In [35]: start_stop_seq1(a)
Out[35]:
array([[ 75, 85, 91, 100],
[ 81, 89, 93, 101]])
Alternative #1 : One liner with one more np.diff
We can go one step further to achieve compactness -
In [43]: a[np.diff(np.r_[False,np.diff(a)==1,False])].reshape(-1,2).T
Out[43]:
array([[ 75, 85, 91, 100],
[ 81, 89, 93, 101]])

A simpler way could be to use groupby and count:
from itertools import groupby, count
counter = count(1)
new_mask = [75, 76, 77, 78, 79, 80, 81, 85, 86, 87, 88, 89]
generator = ((first, last) for key, (first, *_, last) in groupby(new_mask, key=lambda val: val - next(counter)))
bin_start, bin_end = zip(*generator)
print(bin_start)
print(bin_end)
Output
(75, 85)
(81, 89)
This is based on an old itertools recipe. If you fancy pandas, you could do something like this:
import pandas as pd
new_mask = [75, 76, 77, 78, 79, 80, 81, 85, 86, 87, 88, 89]
s = pd.Series(data=new_mask)
result = s.groupby(s.values - s.index).agg(['first', 'last'])
bin_start, bin_end = zip(*result.itertuples(index=False))
print(bin_start)
print(bin_end)
Again, this is based on the principle that consecutive values increasing by 1 all have the same difference from a running sequence. As mentioned in the linked documentation:
The key to the solution is differencing with a range so that
consecutive numbers all appear in same group.
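The same differencing idea can also be written with enumerate supplying the running index. This is only a sketch in Python 3, with a hypothetical function name consecutive_runs; unlike the unpacking used above, it also copes with runs that contain a single value (such as the trailing 91 in the question's list):

```python
from itertools import groupby


def consecutive_runs(values):
    """Return (start, end) pairs for each run of values increasing by 1."""
    runs = []
    # value - index is constant within a run of consecutive integers.
    for _, group in groupby(enumerate(values), key=lambda pair: pair[1] - pair[0]):
        members = [value for _, value in group]
        runs.append((members[0], members[-1]))
    return runs


new_maks = [75, 76, 77, 78, 79, 80, 81, 85, 86, 87, 88, 89, 91]
print(consecutive_runs(new_maks))
```

Whether a one-element run such as (91, 91) should be reported or dropped depends on the use case; the original loop in the question drops it.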

Python nested list function to return scores for students' exams

I am trying to write a function named studentGrades(gradeList) that takes a nested list and returns a list with the average score for each student.
An example would be:
grades= [['Student','Quiz 1','Quiz 2','Quiz 3','Final'],
['John', 100, 90, 80, 90],
['McVay', 88, 99, 11, 15],
['Rita', 45, 56, 67, 89],
['Ketan', 59, 61, 67, 32],
['Saranya', 73, 79, 83, 45],
['Min', 89, 97, 101, 100]]
studentGrades(grades)
# Sample output below
>>> [90, 53, 64, 54, 70, 96]
I don't know how to do this using a nested loop. Any help or guidance is appreciated.

In case you need a one-liner:
[int(sum(i[1:])/len(i[1:])) for i in grades[1:]]
Output:
[90, 53, 64, 54, 70, 96]
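Since the question asks specifically about a nested loop, the one-liner can be unrolled as follows. This is a sketch in Python 3 that assumes every student row has the name first and at least one numeric score after it, and that averages should be truncated to integers as in the sample output:

```python
def studentGrades(gradeList):
    """Return the integer average score for each student row."""
    averages = []
    for row in gradeList[1:]:      # outer loop: skip the header row
        total = 0
        count = 0
        for score in row[1:]:      # inner loop: skip the student's name
            total += score
            count += 1
        averages.append(total // count)
    return averages


grades = [['Student', 'Quiz 1', 'Quiz 2', 'Quiz 3', 'Final'],
          ['John', 100, 90, 80, 90],
          ['McVay', 88, 99, 11, 15],
          ['Rita', 45, 56, 67, 89],
          ['Ketan', 59, 61, 67, 32],
          ['Saranya', 73, 79, 83, 45],
          ['Min', 89, 97, 101, 100]]
print(studentGrades(grades))
```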

As I suggested in the comments, you can also use a dictionary for that. Dictionaries are basically made for situations like these and if you want to use your data any further, they might prove beneficial.
You would first have to convert your current structure, which I assumed is fixed in the sense that you have a "header" in grades, and then lists of the form [name,points..]. You can do:
grades= [['Student','Quiz 1','Quiz 2','Quiz 3','Final'],
['John', 100, 90, 80, 90],
['McVay', 88, 99, 11, 15],
['Rita', 45, 56, 67, 89],
['Ketan', 59, 61, 67, 32],
['Saranya', 73, 79, 83, 45],
['Min', 89, 97, 101, 100]]
gradedict = {}
for rows in grades[1:]:
    gradedict[rows[0]] = rows[1:]
This initializes the dictionary. Then the following function:
def studentGrades(gradedict):
    avgdict = {}
    for key,lst in gradedict.items():
        avgdict[key] = sum(lst)/len(lst)
    return avgdict
returns another dictionary with the corresponding averages. You could loop through the names, i.e. the keys, of either of those to print that. For example:
gradedict = studentGrades(gradedict)
for x in gradedict:
    print("student ",x, " achieved an average of: ",gradedict[x])
This just shows how to access the elements. You can of course also loop through keys and items, as I did in the function.
Hope this helps.

You can do something like this:
def studentGrades(a):
    # Floor division keeps the averages as integers on both Python 2 and 3.
    # return [sum(k[1:])//(len(k)-1) for k in a[1:]]
    # Or
    l = len(a[0]) - 1 # how many quizzes, thanks to @Mad Physicist
    return [sum(k[1:])//l for k in a[1:]]
grades= [['Student','Quiz 1','Quiz 2','Quiz 3','Final'],
['John', 100, 90, 80, 90],
['McVay', 88, 99, 11, 15],
['Rita', 45, 56, 67, 89],
['Ketan', 59, 61, 67, 32],
['Saranya', 73, 79, 83, 45],
['Min', 89, 97, 101, 100]]
final = studentGrades(grades)
print(final)
Output:
[90, 53, 64, 54, 70, 96]

Why not just use pandas?
df = pd.DataFrame(grades).set_index(0).iloc[1:].mean(axis=1).astype(int)
Output:
John 90
McVay 53
Rita 64
Ketan 54
Saranya 70
Min 96
or
list(df.values)
Output:
[90, 53, 64, 54, 70, 96]

grades= [['Student','Quiz 1','Quiz 2','Quiz 3','Final'],
['John', 100, 90, 80, 90],
['McVay', 88, 99, 11, 15],
['Rita', 45, 56, 67, 89],
['Ketan', 59, 61, 67, 32],
['Saranya', 73, 79, 83, 45],
['Min', 89, 97, 101, 100]]
def average(grades):
    for m in grades[1:]:
        t = m[1:]
        l = len(t)
        s = sum(t)
        yield s//l
list(average(grades))

Produce pandas Series of numpy.arrays from DataFrame in parallel with dask

I've got a pandas DataFrame with a column, containing images as numpy 2D arrays.
I need to have a Series or DataFrame with their histograms, again in a single column, in parallel with dask.
Sample code:
import numpy as np
import pandas as pd
import dask.dataframe as dd
def func(data):
    result = np.histogram(data.image.ravel(), bins=128)[0]
    return result
n = 10
df = pd.DataFrame({'image': [(np.random.random((60, 24)) * 255).astype(np.uint8) for i in np.arange(n)],
'n1': np.arange(n),
'n2': np.arange(n) * 2,
'n3': np.arange(n) * 4
}
)
print 'DataFrame\n', df
hists = pd.Series([func(r[1]) for r in df.iterrows()])
# MAX_PROCESSORS = 4
# ddf = dd.from_pandas(df, npartitions=MAX_PROCESSORS)
# hists = ddf.apply(func, axis=1, meta=pd.Series(name='data', dtype=np.ndarray)).compute()
print 'Histograms \n', hists
Desired output
DataFrame
image n1 n2 n3
0 [[51, 254, 167, 61, 230, 135, 40, 194, 101, 24... 0 0 0
1 [[178, 130, 204, 196, 80, 97, 61, 51, 195, 38,... 1 2 4
2 [[122, 126, 47, 31, 208, 130, 85, 189, 57, 227... 2 4 8
3 [[185, 141, 206, 233, 9, 157, 152, 128, 129, 1... 3 6 12
4 [[131, 6, 95, 23, 31, 182, 42, 136, 46, 118, 2... 4 8 16
5 [[111, 89, 173, 139, 42, 131, 7, 9, 160, 130, ... 5 10 20
6 [[197, 223, 15, 40, 30, 210, 145, 182, 74, 203... 6 12 24
7 [[161, 87, 44, 198, 195, 153, 16, 195, 100, 22... 7 14 28
8 [[0, 158, 60, 217, 164, 109, 136, 237, 49, 25,... 8 16 32
9 [[222, 64, 64, 37, 142, 124, 173, 234, 88, 40,... 9 18 36
Histograms
0 [81, 87, 80, 94, 99, 79, 86, 90, 90, 113, 96, ...
1 [93, 76, 103, 83, 76, 101, 85, 83, 96, 92, 87,...
2 [84, 93, 87, 113, 83, 83, 89, 89, 114, 92, 86,...
3 [98, 101, 95, 111, 77, 92, 106, 72, 91, 100, 9...
4 [95, 96, 87, 82, 89, 87, 99, 82, 70, 93, 76, 9...
5 [77, 94, 95, 85, 82, 90, 77, 92, 87, 89, 94, 7...
6 [73, 86, 81, 91, 91, 82, 96, 94, 112, 95, 74, ...
7 [88, 89, 87, 88, 76, 95, 96, 98, 108, 96, 92, ...
8 [83, 84, 76, 88, 96, 112, 89, 80, 93, 94, 98, ...
9 [91, 78, 85, 98, 105, 75, 83, 66, 79, 86, 109,...
The commented lines call dask.DataFrame.apply. When I uncomment them, I get the exception dask.async.ValueError: Shape of passed values is (3, 128), indices imply (3, 4)
And here is the exception stack:
File "C:\Anaconda\envs\MBA\lib\site-packages\dask\base.py", line 94, in compute
(result,) = compute(self, traverse=False, **kwargs)
File "C:\Anaconda\envs\MBA\lib\site-packages\dask\base.py", line 201, in compute
results = get(dsk, keys, **kwargs)
File "C:\Anaconda\envs\MBA\lib\site-packages\dask\threaded.py", line 76, in get
**kwargs)
File "C:\Anaconda\envs\MBA\lib\site-packages\dask\async.py", line 500, in get_async
raise(remote_exception(res, tb))
dask.async.ValueError: Shape of passed values is (3, 128), indices imply (3, 4)
How can I overcome it?
My goal is to process this data frame in parallel.

map_partitions was the answer. After several days of experiments and time measurements, I arrived at the following code. It gives a 2-4x speedup compared to list comprehensions or generator expressions wrapping pandas.DataFrame.itertuples.
def func(data, additional_args):
    # Placeholder: apply the filter to data.image here (this step is elided in the original).
    filtered = data.image
    result = np.histogram(filtered)
    return result
def func_partition(data, additional_args):
    # Forward the extra argument to func for every row of the partition.
    result = data.apply(func, args=(additional_args, ), axis=1)
    return result
if __name__ == '__main__':
    # bfilter and MAX_PROCESSORS are assumed to be defined elsewhere.
    dask.set_options(get=dask.multiprocessing.get)
    n = 30000
    df = pd.DataFrame({'image': [(np.random.random((180, 64)) * 255).astype(np.uint8) for i in np.arange(n)],
                       'n1': np.arange(n),
                       'n2': np.arange(n) * 2,
                       'n3': np.arange(n) * 4
                       }
                      )
    ddf = dd.from_pandas(df, npartitions=MAX_PROCESSORS)
    dhists = ddf.map_partitions(func_partition, bfilter, meta=pd.Series(dtype=np.ndarray))
    print 'Delayed dhists = \n', dhists
    hists = pd.Series(dhists.compute())

One-liner to calculate multiples of a certain number between two values in python

I have the following code to do what the title says:
def multiples(small, large, multiple):
    multiples = []
    for k in range(small, large+1):
        if k % multiple == 0:
            multiples.append(k)
    return multiples
What it outputs:
>>> multiples(39, 51, 12)
[48]
>>> multiples(39, 51, 11)
[44]
>>> multiples(39, 51, 10)
[40, 50]
>>> multiples(39, 51, 9)
[45]
>>> multiples(39, 51, 8)
[40, 48]
>>> multiples(39, 51, 7)
[42, 49]
>>> multiples(39, 51, 6)
[42, 48]
>>> multiples(39, 51, 5)
[40, 45, 50]
>>> multiples(39, 51, 4)
[40, 44, 48]
>>> multiples(39, 51, 3)
[39, 42, 45, 48, 51]
>>> multiples(39, 51, 2)
[40, 42, 44, 46, 48, 50]
However, this is a lot of code to write, and I was looking for a pythonic one-liner to do what this does. Is there anything out there?

Just change your code to a List Comprehension, like this
return [k for k in range(small, large+1) if k % multiple == 0]
If you are just going to iterate through the results, then you can simply return a generator expression, like this
return (k for k in xrange(small, large+1) if k % multiple == 0)
If you really want to get all the multiples as a list, then you can convert that to a list like this
list(multiples(39, 51, 12))

You can do it as:
def get_multiples(low, high, num):
    return [i for i in range(low,high+1) if i%num==0]
Examples:
>>> print get_multiples(4, 345, 56)
[56, 112, 168, 224, 280, 336]
>>> print get_multiples(39, 51, 2)
[40, 42, 44, 46, 48, 50]
>>> print get_multiples(2, 1234, 43)
[43, 86, 129, 172, 215, 258, 301, 344, 387, 430, 473, 516, 559, 602, 645, 688, 731, 774, 817, 860, 903, 946, 989, 1032, 1075, 1118, 1161, 1204]

range((small+multiple-1)//multiple * multiple, large+1, multiple)
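The expression above avoids testing every number: it rounds small up to the nearest multiple and then steps by multiple. A short sketch of the reasoning in Python 3, assuming multiple is a positive integer (the variable name first is introduced here for illustration):

```python
def multiples(small, large, multiple):
    """List the multiples of `multiple` in the closed range [small, large]."""
    # Ceiling division: the smallest q with q * multiple >= small.
    first = (small + multiple - 1) // multiple * multiple
    # Step directly from one multiple to the next; large + 1 makes the end inclusive.
    return list(range(first, large + 1, multiple))


print(multiples(39, 51, 12))
print(multiples(39, 51, 3))
```

For example, with small=39 and multiple=12 the first value is (39 + 11) // 12 * 12 = 48, so no candidate below 48 is ever visited.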

Perfect application for a generator expression:
>>> sm=31
>>> lg=51
>>> mult=5
>>> (m for m in xrange(sm,lg+1) if not m%mult)
<generator object <genexpr> at 0x101e3f2d0>
>>> list(_)
[35, 40, 45, 50]
If on Python3+, use range instead of xrange...
