This is my file
7377.0 Angebot: [100, 28, 176, 6, 73, 133, 77, 137, 174, 104, 191, 97, 156, 148, 164, 56, 107, 91, 177, 84, 161, 197, 90, 105, 41, 126, 12, 76, 25, 129, 135, 149, 85, 145, 110, 48, 53, 89, 122, 5, 121, 45, 141, 49, 165, 128, 167, 109, 75, 147, 168, 142, 93, 13, 44, 22, 120, 65, 139, 171, 87, 70, 184, 132, 158, 152, 144, 47, 16, 94, 74, 138, 66, 72, 82, 60, 59, 169, 194, 185, 71, 46, 119, 7, 86, 79, 190, 188, 101, 31, 14, 157, 117, 113, 124, 103, 125, 51, 112, 182, 29, 166, 78, 134, 33, 11, 155, 32, 57, 30, 175, 187, 92, 178, 127, 189, 180, 199, 160, 27, 21, 58, 62, 192, 198, 173, 68, 23, 136, 193, 106, 159, 83, 116, 102, 9, 96, 181, 99, 17, 38, 114, 10, 111, 143, 1, 200, 26, 24, 39, 15, 18, 172, 130, 63, 69, 55, 3, 183, 195, 88, 67, 34, 2, 150, 35, 64, 163, 140, 4, 36, 196, 50, 131, 118, 8, 162, 81, 154, 20, 42, 170, 98, 52, 186, 146, 179, 54, 80, 95, 153, 43, 61, 40, 151, 123, 115, 108, 19, 37]
The structure of every line looks like this
Double<SPACE>Angebot:<Space>[...,...]
I want to read the complete file into a pandas DataFrame, but this attempt did not work:
import pandas as pd
df = pd.read_csv('results.txt', delimiter = " ")
df.head()
Example file:
855.0 Angebot: [5,1,2,3,4]
8895.0 Angebot: [5,8,9,6,4]
225.0 Angebot: [5,14,2,5,4]
7485.0 Angebot: [5,18,94,51]
I could not build such a DataFrame directly in pandas, so I wrote an example text file by hand, which you would have to save yourself. If someone can tell me a better way to provide the example, I would be happy to hear it.
What I want is something like
Double | Array
855.0 | [5,1,2,3,4]
8895.0 | [5,8,9,6,4]
225.0 | [5,14,2,5,4]
7485.0 | [5,18,94,51]
How can I read the complete text file with pandas and store it in a DataFrame so that I get the output shown above?
What I also tried
import pandas as pd
df = pd.read_csv('results.txt', delimiter = " Angebot: ")
df.head()
What I got
<ipython-input-13-9994bd826634>:2: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.
df = pd.read_csv('results.txt', delimiter = " Angebot: ")
You can pass engine='python' to pd.read_csv:
df = pd.read_csv('results.txt', header=None, sep='Angebot:', engine='python', names=['Double', 'Array'])
OUTPUT:
Double Array
0 855.0 [5,1,2,3,4]
1 8895.0 [5,8,9,6,4]
2 225.0 [5,14,2,5,4]
3 7485.0 [5,18,94,51]
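You can use the separator \s+ and ast.literal_eval to turn the "Array" strings into actual lists. Note that \s+ splits on every run of whitespace, so this only works when the lists contain no spaces after the commas, as in the example file: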
import pandas as pd
import io
from ast import literal_eval
file_txt = io.StringIO(
"""855.0 Angebot: [5,1,2,3,4]
8895.0 Angebot: [5,8,9,6,4]
225.0 Angebot: [5,14,2,5,4]
7485.0 Angebot: [5,18,94,51]"""
)
dataf = pd.read_csv(file_txt, sep=r"\s+", usecols=[0, 2], names=['Double', 'Array'], converters={2: literal_eval})
dataf
I think this could work for you:
import pandas as pd
import numpy as np
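my_list = list()
with open("file.txt", "r") as f:
    for line in f:
        l = line.strip()
        # The first token is the float; the text after 'Angebot: ' is kept
        # as a string wrapped in a 0-d array, not parsed into numbers.
        my_list.append([float(l.split()[0]), np.array(l.split('Angebot: ')[1])])
pd.DataFrame(my_list, columns=['float', 'array'])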
Result:
float array
0 855.0 [5,1,2,3,4]
1 8895.0 [5,8,9,6,4]
2 225.0 [5,14,2,5,4]
3 7485.0 [5,18,94,51]
Input file:
855.0 Angebot: [5,1,2,3,4]
8895.0 Angebot: [5,8,9,6,4]
225.0 Angebot: [5,14,2,5,4]
7485.0 Angebot: [5,18,94,51]
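If the array column should contain numeric NumPy arrays rather than the bracketed text, each line can be parsed explicitly. This is only a sketch under the assumption that every line follows the "Double Angebot: [...]" layout; the helper name parse_line and the in-memory sample are hypothetical.

```python
import io
from ast import literal_eval

import numpy as np
import pandas as pd


def parse_line(line):
    """Split one 'Double Angebot: [...]' line into [float, integer array]."""
    number, _, array_text = line.strip().partition(" Angebot: ")
    return [float(number), np.array(literal_eval(array_text))]


# Stand-in for the input file (assumption).
sample = io.StringIO(
    "855.0 Angebot: [5,1,2,3,4]\n"
    "7485.0 Angebot: [5, 18, 94, 51]\n"
)

# Skip blank lines so a trailing newline does not break the parser.
rows = [parse_line(line) for line in sample if line.strip()]
df = pd.DataFrame(rows, columns=["float", "array"])
print(df)
```

Each cell of the array column is then a real ndarray, so expressions such as df["array"].iloc[0].sum() work directly.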
My Code
import numpy as np
import pandas as pd
print('-' * 50)
filename = r'''C:\Users\Computer\Documents\Python Scripts\weather.txt'''
df = pd.read_csv(filename)
pd.set_option('display.max_columns', None)
print (df.describe())
print (df.record_high)
My data
month, avg_high, avg_low, record_high, record_low, avg_percipitation
Jan, 58, 42, 74, 22, 2.95
Feb, 61, 45, 78, 26, 3.02
Mar, 65, 48, 84, 25, 2.34
Apr, 67, 50, 92, 28, 1.02
May, 71, 53, 98, 35, 0.48
Jun, 75, 56, 107, 41, 0.11
Jul, 77, 58, 105, 44, 0.0
Aug, 77, 59, 102, 43, 0.03
Sep, 77, 57, 103, 40, 0.17
Oct, 73, 54, 96, 34, 0.81
Nov, 64, 48, 84, 30, 1.7
Dec, 58, 42, 73, 21, 2.56
When I run it, I get AttributeError: 'DataFrame' object has no attribute 'record_high', even though that column is clearly there. Does anyone have a solution?
The header row in your data has a space after each comma, so pandas reads the column names with a leading space. Try accessing the column with df[' record_high'].
If that works, run
df.columns = df.columns.str.strip()
after you read in df. You should then be able to access
df['record_high']
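The effect can be reproduced on a small in-memory sample. The sketch below assumes a shortened version of the weather data and shows two possible fixes: stripping the column names after reading, or passing skipinitialspace=True to pd.read_csv so that the spaces after the delimiter are dropped while parsing. The variable names are illustrative.

```python
import io

import pandas as pd

# Shortened stand-in for weather.txt (assumption): note the space after each comma.
raw = "month, avg_high, record_high\nJan, 58, 74\nFeb, 61, 78\n"

# Default parsing keeps the space as part of every column name except the first.
df = pd.read_csv(io.StringIO(raw))
print(list(df.columns))

# Fix 1: strip the names after reading.
stripped = df.copy()
stripped.columns = stripped.columns.str.strip()

# Fix 2: let the parser skip spaces that follow the delimiter.
clean = pd.read_csv(io.StringIO(raw), skipinitialspace=True)
print(clean.record_high)
```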
I have a database called _ucDB which has 262 rows of data that look something like this:
indexID matchID order userClean
1 21 0 dirty
1 32 1 dirty
1 145 2 dirty
4 5 3 clean
4 43 4 dirty
4 180 5 dirty
4 184 6 dirty
6 7 7 clean
6 13 8 dirty
6 93 9 dirty
6 132 10 dirty
6 153 11 dirty
6 172 12 dirty
6 196 13 dirty
8 9 14 clean
8 171 15 dirty
12 13 16 clean
12 93 17 dirty
12 132 18 dirty
12 153 19 dirty
12 181 20 dirty
12 196 21 dirty
I have a list of probabilities that is a list of 131 values that look like this:
[0.99966824, 0.96239686, 0.99911624, 0.28857997, 0.003755328, 0.0046950155, 0.0044651907, 0.0047618235, 0.23484087, 0.962187, 3.0091974e-22, 8.1519043e-22, 0.9905359, 0.00011853044, 4.4233568e-14, 7.127504e-07, 1.864812e-17, 0.99703133, 3.17426e-16, 0.50278896, 0.55311096, 1.159942e-05, 0.53562385, 0.16331102, 1.5920829e-06, 7.9792744e-07, 5.823995e-07, 0.284861, 0.46748465, 0.46383706, 0.25041214, 0.99107516, 1.5370236e-11, 0.8576025, 0.0010161225, 0.58321816, 0.76292366, 0.00010934622, 0.72824544, 0.38391674, 0.0097409785, 4.3164547e-08, 1.7280547e-05, 0.7246928, 5.9006602e-08, 5.0709765e-05, 0.978512, 3.5036015e-12, 1.5390156e-11, 0.6185394, 0.017997066, 0.00023294186, 0.13520418, 6.6481048e-06, 0.00015752365, 7.000092e-06, 7.17631e-06, 0.07471306, 0.0015149566, 0.0012117986, 2.0014808e-12, 0.0013824155, 0.040859833, 0.14533857, 0.9288511, 4.464196e-09, 0.07058981, 0.8535712, 0.81062424, 3.734015e-05, 0.22207999, 4.903828e-21, 0.08622761, 0.041497793, 0.018137224, 0.019342968, 0.015368458, 0.41454336, 0.08082744, 0.004606869, 0.0035861062, 0.002696093, 0.8877732, 2.1275096e-06, 6.6134373e-07, 0.0008052338, 0.42654076, 0.17369142, 0.3299104, 1.858753e-18, 0.7474273, 0.14151353, 0.0010253238, 5.308538e-06, 3.493124e-06, 0.00033286438, 0.8685754, 0.7645787, 0.701938, 0.3150338, 2.9346756e-08, 7.83391e-12, 3.4358197e-10, 1.960794e-11, 8.5792645e-17, 0.9964175, 1.3673732e-14, 2.3826202e-14, 7.9876345e-14, 2.4482112e-14, 4.786919e-16, 0.15512297, 0.41997427, 0.25056317, 0.4547511, 0.29294935, 0.29281262, 1.3639165e-06, 2.9399953e-06, 0.6283169, 0.48729306, 6.892901e-06, 3.1108675e-06, 0.009136838, 2.9103248e-10, 5.8614324e-12, 0.6969736, 0.6400705, 0.0028972547, 0.27473485, 0.42833236]
And lastly I have another column in a database containing 131 values that is called _deMeta['evalID'] which looks like this:
[3, 14, 16, 27, 44, 46, 50, 61, 63, 70, 74, 81, 90, 126, 130, 154, 166, 177, 183, 197, 210, 220, 223, 226, 235, 252, 253, 261, 10, 19, 21, 25, 26, 30, 31, 32, 36, 37, 38, 41, 43, 45, 47, 49, 51, 52, 54, 55, 56, 57, 58, 59, 62, 65, 68, 73, 76, 77, 78, 79, 82, 83, 86, 88, 89, 92, 93, 94, 96, 101, 106, 107, 108, 110, 112, 116, 123, 124, 125, 127, 128, 131, 132, 134, 135, 140, 143, 144, 147, 148, 156, 157, 158, 162, 169, 172, 173, 175, 176, 181, 184, 185, 187, 191, 193, 198, 199, 201, 202, 203, 204, 205, 209, 212, 215, 216, 217, 218, 224, 225, 227, 230, 231, 233, 237, 238, 240, 245, 247, 257, 258]
Each probability is the probability that the corresponding row is clean, and its ID is the 'evalID' at the same position. For example, the first probability, 0.99966824, belongs to the first entry of _deMeta['evalID'], which is 3. That value matches order 3 in _ucDB, which is the fourth row of _ucDB.
I want to create a new database called _newucDB that adds in another column called 'Probability' and that reflects the probability of the order.
For example, if the code correctly matches the first probability of evalID 3 to order 3, the new database should look like this:
indexID matchID order userClean Probability
1 21 0 dirty
1 32 1 dirty
1 145 2 dirty
4 5 3 clean 0.99966824
4 43 4 dirty
4 180 5 dirty
4 184 6 dirty
6 7 7 clean
Note that not all rows would have a probability value. Rows without the probability value should be left blank. Thanks!
I am assuming that you will read the data into Python as a pandas DataFrame:
new_data
indexID matchID order userClean
1 21 0 dirty
1 32 1 dirty
1 145 2 dirty
4 5 3 clean
4 43 4 dirty
4 180 5 dirty
4 184 6 dirty
6 7 7 clean
6 13 8 dirty
6 93 9 dirty
6 132 10 dirty
6 153 11 dirty
6 172 12 dirty
6 196 13 dirty
8 9 14 clean
8 171 15 dirty
12 13 16 clean
12 93 17 dirty
12 132 18 dirty
12 153 19 dirty
12 181 20 dirty
12 196 21 dirty
Code
l_prob = [0.99966824, 0.96239686, 0.99911624, 0.28857997, 0.003755328, 0.0046950155, 0.0044651907, 0.0047618235, 0.23484087, 0.962187, 3.0091974e-22, 8.1519043e-22, 0.9905359, 0.00011853044, 4.4233568e-14, 7.127504e-07, 1.864812e-17, 0.99703133, 3.17426e-16, 0.50278896, 0.55311096, 1.159942e-05, 0.53562385, 0.16331102, 1.5920829e-06, 7.9792744e-07, 5.823995e-07, 0.284861, 0.46748465, 0.46383706, 0.25041214, 0.99107516, 1.5370236e-11, 0.8576025, 0.0010161225, 0.58321816, 0.76292366, 0.00010934622, 0.72824544, 0.38391674, 0.0097409785, 4.3164547e-08, 1.7280547e-05, 0.7246928, 5.9006602e-08, 5.0709765e-05, 0.978512, 3.5036015e-12, 1.5390156e-11, 0.6185394, 0.017997066, 0.00023294186, 0.13520418, 6.6481048e-06, 0.00015752365, 7.000092e-06, 7.17631e-06, 0.07471306, 0.0015149566, 0.0012117986, 2.0014808e-12, 0.0013824155, 0.040859833, 0.14533857, 0.9288511, 4.464196e-09, 0.07058981, 0.8535712, 0.81062424, 3.734015e-05, 0.22207999, 4.903828e-21, 0.08622761, 0.041497793, 0.018137224, 0.019342968, 0.015368458, 0.41454336, 0.08082744, 0.004606869, 0.0035861062, 0.002696093, 0.8877732, 2.1275096e-06, 6.6134373e-07, 0.0008052338, 0.42654076, 0.17369142, 0.3299104, 1.858753e-18, 0.7474273, 0.14151353, 0.0010253238, 5.308538e-06, 3.493124e-06, 0.00033286438, 0.8685754, 0.7645787, 0.701938, 0.3150338, 2.9346756e-08, 7.83391e-12, 3.4358197e-10, 1.960794e-11, 8.5792645e-17, 0.9964175, 1.3673732e-14, 2.3826202e-14, 7.9876345e-14, 2.4482112e-14, 4.786919e-16, 0.15512297, 0.41997427, 0.25056317, 0.4547511, 0.29294935, 0.29281262, 1.3639165e-06, 2.9399953e-06, 0.6283169, 0.48729306, 6.892901e-06, 3.1108675e-06, 0.009136838, 2.9103248e-10, 5.8614324e-12, 0.6969736, 0.6400705, 0.0028972547, 0.27473485, 0.42833236]
eval_id = [3, 14, 16, 27, 44, 46, 50, 61, 63, 70, 74, 81, 90, 126, 130, 154, 166, 177, 183, 197, 210, 220, 223, 226, 235, 252, 253, 261, 10, 19, 21, 25, 26, 30, 31, 32, 36, 37, 38, 41, 43, 45, 47, 49, 51, 52, 54, 55, 56, 57, 58, 59, 62, 65, 68, 73, 76, 77, 78, 79, 82, 83, 86, 88, 89, 92, 93, 94, 96, 101, 106, 107, 108, 110, 112, 116, 123, 124, 125, 127, 128, 131, 132, 134, 135, 140, 143, 144, 147, 148, 156, 157, 158, 162, 169, 172, 173, 175, 176, 181, 184, 185, 187, 191, 193, 198, 199, 201, 202, 203, 204, 205, 209, 212, 215, 216, 217, 218, 224, 225, 227, 230, 231, 233, 237, 238, 240, 245, 247, 257, 258]
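new_data['probability'] = ''
order = list(map(int, new_data['order']))
for i in range(len(eval_id)):
    try:
        pos = order.index(eval_id[i])
    except ValueError:
        # This evalID has no matching order value; leave the row blank.
        continue
    # .loc avoids chained assignment, which may write to a temporary copy.
    new_data.loc[new_data.index[pos], 'probability'] = l_prob[i]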
Another Approach
new_data['order'] = list(map(int, new_data['order']))
temp_data = pd.DataFrame()
temp_data['order'] = eval_id
temp_data['probability'] = l_prob
pd.merge(new_data, temp_data[['order','probability']], how='left' ,on='order')
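A third option, sketched here on a shortened copy of the data, is to build a dictionary from evalID to probability and map it onto the order column. The five sample rows and the name lookup are assumptions for illustration; rows without a match receive NaN, which can be turned into blanks with fillna('') if needed.

```python
import pandas as pd

# Shortened stand-ins for _ucDB, _deMeta['evalID'] and the probability list (assumption).
new_data = pd.DataFrame({
    "indexID": [1, 1, 1, 4, 4],
    "matchID": [21, 32, 145, 5, 43],
    "order": [0, 1, 2, 3, 4],
    "userClean": ["dirty", "dirty", "dirty", "clean", "dirty"],
})
eval_id = [3, 14, 16]
l_prob = [0.99966824, 0.96239686, 0.99911624]

# evalID -> probability; Series.map looks up every order value in this dict.
lookup = dict(zip(eval_id, l_prob))
new_data["Probability"] = new_data["order"].map(lookup)
print(new_data)
```

Unlike the loop, this does one dictionary lookup per row instead of one list search per evalID.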
I've got a pandas DataFrame with a column, containing images as numpy 2D arrays.
I need a Series or DataFrame holding their histograms, again in a single column, computed in parallel with dask.
Sample code:
import numpy as np
import pandas as pd
import dask.dataframe as dd
def func(data):
    result = np.histogram(data.image.ravel(), bins=128)[0]
    return result

n = 10
df = pd.DataFrame({'image': [(np.random.random((60, 24)) * 255).astype(np.uint8) for i in np.arange(n)],
                   'n1': np.arange(n),
                   'n2': np.arange(n) * 2,
                   'n3': np.arange(n) * 4
                   }
                  )
print('DataFrame\n', df)
hists = pd.Series([func(r[1]) for r in df.iterrows()])
# MAX_PROCESSORS = 4
# ddf = dd.from_pandas(df, npartitions=MAX_PROCESSORS)
# hists = ddf.apply(func, axis=1, meta=pd.Series(name='data', dtype=np.ndarray)).compute()
print('Histograms \n', hists)
Desired output
DataFrame
image n1 n2 n3
0 [[51, 254, 167, 61, 230, 135, 40, 194, 101, 24... 0 0 0
1 [[178, 130, 204, 196, 80, 97, 61, 51, 195, 38,... 1 2 4
2 [[122, 126, 47, 31, 208, 130, 85, 189, 57, 227... 2 4 8
3 [[185, 141, 206, 233, 9, 157, 152, 128, 129, 1... 3 6 12
4 [[131, 6, 95, 23, 31, 182, 42, 136, 46, 118, 2... 4 8 16
5 [[111, 89, 173, 139, 42, 131, 7, 9, 160, 130, ... 5 10 20
6 [[197, 223, 15, 40, 30, 210, 145, 182, 74, 203... 6 12 24
7 [[161, 87, 44, 198, 195, 153, 16, 195, 100, 22... 7 14 28
8 [[0, 158, 60, 217, 164, 109, 136, 237, 49, 25,... 8 16 32
9 [[222, 64, 64, 37, 142, 124, 173, 234, 88, 40,... 9 18 36
Histograms
0 [81, 87, 80, 94, 99, 79, 86, 90, 90, 113, 96, ...
1 [93, 76, 103, 83, 76, 101, 85, 83, 96, 92, 87,...
2 [84, 93, 87, 113, 83, 83, 89, 89, 114, 92, 86,...
3 [98, 101, 95, 111, 77, 92, 106, 72, 91, 100, 9...
4 [95, 96, 87, 82, 89, 87, 99, 82, 70, 93, 76, 9...
5 [77, 94, 95, 85, 82, 90, 77, 92, 87, 89, 94, 7...
6 [73, 86, 81, 91, 91, 82, 96, 94, 112, 95, 74, ...
7 [88, 89, 87, 88, 76, 95, 96, 98, 108, 96, 92, ...
8 [83, 84, 76, 88, 96, 112, 89, 80, 93, 94, 98, ...
9 [91, 78, 85, 98, 105, 75, 83, 66, 79, 86, 109,...
The commented lines call dask.DataFrame.apply. If I uncomment them, I get the exception dask.async.ValueError: Shape of passed values is (3, 128), indices imply (3, 4)
And here is the exception stack:
File "C:\Anaconda\envs\MBA\lib\site-packages\dask\base.py", line 94, in compute
(result,) = compute(self, traverse=False, **kwargs)
File "C:\Anaconda\envs\MBA\lib\site-packages\dask\base.py", line 201, in compute
results = get(dsk, keys, **kwargs)
File "C:\Anaconda\envs\MBA\lib\site-packages\dask\threaded.py", line 76, in get
**kwargs)
File "C:\Anaconda\envs\MBA\lib\site-packages\dask\async.py", line 500, in get_async
raise(remote_exception(res, tb))
dask.async.ValueError: Shape of passed values is (3, 128), indices imply (3, 4)
How can I overcome it?
My goal is to process this data frame in parallel.
map_partitions was the answer. After several days of experiments and time measurements, I arrived at the following code. It gives a 2-4x speedup compared to list comprehensions or generator expressions wrapping pandas.DataFrame.itertuples.
# bsifilter and MAX_PROCESSORS are defined elsewhere in the original program.
def func(data, bsifilter):
    filtered = data.image  # placeholder: apply the filter to data.image here
    result = np.histogram(filtered)
    return result

def func_partition(data, bsifilter):
    # Each partition is a plain pandas DataFrame, so a row-wise apply works here.
    result = data.apply(func, args=(bsifilter, ), axis=1)
    return result

if __name__ == '__main__':
    # Older dask API; newer releases use dask.config.set(scheduler='processes').
    dask.set_options(get=dask.multiprocessing.get)
    n = 30000
    df = pd.DataFrame({'image': [(np.random.random((180, 64)) * 255).astype(np.uint8) for i in np.arange(n)],
                       'n1': np.arange(n),
                       'n2': np.arange(n) * 2,
                       'n3': np.arange(n) * 4
                       }
                      )
    ddf = dd.from_pandas(df, npartitions=MAX_PROCESSORS)
    dhists = ddf.map_partitions(func_partition, bsifilter, meta=pd.Series(dtype=np.ndarray))
    print('Delayed dhists = \n', dhists)
    hists = pd.Series(dhists.compute())
I have a python list question:
Input:
l=[2, 5, 6, 7, 10, 11, 12, 19, 20, 26, 28, 33, 34, 45, 46, 47, 50, 57, 59, 64, 67, 77, 79, 87, 93, 97, 106, 110, 111, 113, 115, 120, 125, 126, 133, 135, 142, 148, 160, 166, 169, 176, 202, 228, 234, 253, 274, 365, 433, 435, 436, 468, 476, 529, 570, 575, 577, 581, 614, 766, 813, 944, 1058, 1079, 1245, 1363, 1389, 1428, 1758, 2129, 2336, 2402, 2405, 2576, 3013, 3993, 7687, 8142, 8455, 8456]
Now I want to mark these numbers in a [0]*10000 list, so that the beginning looks like:
Output:
lp=[0,1,0,0,1,...]
The second and fifth elements are marked since they appeared in the input.
lp = [0] * 10000
for index in l:
    lp[index - 1] = 1
You could use the following list comprehension
lp = [1 if i in l else 0 for i in range(1, 10001)]
Since l could be long, though, I'd recommend converting it to a set first, so that each membership test takes constant time on average:
set_l = set(l)
lp = [1 if i in set_l else 0 for i in range(1, 10001)]
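One detail worth guarding against in the loop version: a value of 0 would write to lp[-1] and silently mark the last element, and a value above 10000 would raise IndexError. The sketch below wraps the loop in a hypothetical helper, mark_positions, that skips out-of-range values; the short input list is an assumption for illustration.

```python
def mark_positions(values, size=10000):
    """Return a list of `size` flags; flag k-1 is 1 when k appears in `values`."""
    flags = [0] * size
    for value in values:
        if 1 <= value <= size:  # skip values that would fall outside the list
            flags[value - 1] = 1
    return flags


l = [2, 5, 6, 7, 10]
lp = mark_positions(l)
print(lp[:10])  # [0, 1, 0, 0, 1, 1, 1, 0, 0, 1]
```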
How can I write a regular expression that turns a string into a list of numbers? For example, given the strings:
value = '88-94'
value = '88 to 94'
value = '88'
value = '88-94, 96-108'
the outcome should be:
[88, 89, 90, 91, 92, 93, 94]
[88, 89, 90, 91, 92, 93, 94]
[88]
[88, 89, 90, 91, 92, 93, 94, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108]
The programming language is Python 2.7.
Here is a working solution for Python 2.7 using the regex module, but the last case (a single value) has to be handled separately:
>>> import regex
>>> m = regex.match(r"(?:(?P<digits>\d+).(?P<digits>\d+))", "88-94")
>>> a = m.captures("digits")
>>> a
['88', '94']
>>> m = regex.match(r"(?:(?P<digits>\d+).(?P<digits>\d+))", "88 94")
>>> a = m.captures("digits")
>>> a
['88', '94']
>>> range(int(a[0]), int(a[1])+1)
[88, 89, 90, 91, 92, 93, 94]
>>>
Here is a solution that addresses the cases above, but what about 88-94, 96-98, etc.?
>>> import re
>>> a = map(int, re.findall(r'\d+', '88-94'))
>>> range(a[0], a[-1]+1)
[88, 89, 90, 91, 92, 93, 94]
>>> a = map(int, re.findall(r'\d+', '88 94'))
>>> range(a[0], a[-1]+1)
[88, 89, 90, 91, 92, 93, 94]
>>> a = map(int, re.findall(r'\d+', '88'))
>>> range(a[0], a[-1]+1)
[88]
>>>
A solution that covers almost all cases:
>>> import re
>>> a = map(int, re.findall(r'\d+', '88-94, 96-108'))
>>> c = zip(a[::2], a[1::2])
>>> [m for k in [range(i,j+1) for i, j in c] for m in k]
[88, 89, 90, 91, 92, 93, 94, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108]
>>> a = map(int, re.findall(r'\d+', '88-94, 96-108, 125 129'))
>>> c = zip(a[::2], a[1::2])
>>> [m for k in [range(i,j+1) for i, j in c] for m in k]
[88, 89, 90, 91, 92, 93, 94, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 125, 126, 127, 128, 129]
>>> a = map(int, re.findall(r'\d+', '88-94, 96-108, 125 129, 132 to 136'))
>>> c = zip(a[::2], a[1::2])
>>> [m for k in [range(i,j+1) for i, j in c] for m in k]
[88, 89, 90, 91, 92, 93, 94, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 125, 126, 127, 128, 129, 132, 133, 134, 135, 136]
>>>
Can anyone suggest a reason for the downvotes or close votes?
Any help is appreciated, including suggestions on how to improve the question. I am not asking for alternative solutions: I know how to split and loop, and how to use re to extract the digits and loop. My question is whether this can be done with re in a single statement. The answer may be no, but that does not make the question off-topic.
import re
def get_numbers(value):
    value = re.sub(r'^(\d+)$', r'\1-\1', value)  # '88' -> '88-88'
    start, stop = map(int, re.findall(r'\d+', value))
    return range(start, stop+1)

print get_numbers('88-94')
print get_numbers('88 to 94')
print get_numbers('88')
output:
[88, 89, 90, 91, 92, 93, 94]
[88, 89, 90, 91, 92, 93, 94]
[88]
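The function above handles a single range only; an input such as '88-94, 96-108' yields four numbers and fails at the unpacking step. The following sketch extends the idea to several ranges in one findall call. It is written for Python 3 (where range is lazy, hence the explicit list), and the pattern, which accepts "-" or "to" between the two numbers, is an assumption about the allowed separators rather than part of the original answer.

```python
import re


def get_numbers(value):
    """Expand strings like '88-94, 96-108' into a flat list of ints."""
    numbers = []
    # Each match is a pair 'a-b' or 'a to b', or a lone number 'a'
    # (in which case the second group is an empty string).
    for start, stop in re.findall(r"(\d+)(?:\s*(?:-|to)\s*(\d+))?", value):
        first = int(start)
        last = int(stop) if stop else first  # a lone number is a range of one
        numbers.extend(range(first, last + 1))
    return numbers


print(get_numbers("88-94"))
print(get_numbers("88 to 94"))
print(get_numbers("88"))
print(get_numbers("88-94, 96-108"))
```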
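start, stop = map(int, mystring.split("-"))
range(start, stop + 1)
No need for regex when the separator is always a hyphen. Note that range excludes its upper bound, hence the + 1.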