Weighted mean with Numpy in file txt - python

Good morning,
I have this space-separated txt file:
drive-------------------------------------------------------------------------k='6' seq nelsa s length: 4044778
# 51 1
# 64 1
# 65 1
# 67 2
# 70 1
# 72 1
# 73 1
# 77 1
# 79 1
# 86 1
# 88 2
# 89 1
# 92 1
# 94 1
# 95 1
# 96 2
# 100 1
# 103 1
# 105 1
# 108 1
# 112 1
# 119 1
# 123 1
# 126 1
# 127 1
# 129 1
# 130 1
# 133 3
# 134 1
# 135 1
# 138 2
# 139 1
# 140 1
# 141 1
# 142 1
# 143 1
# 144 2
# 145 1
# 148 2
# 150 3
# 151 1
# 152 1
# 153 1
# 154 1
# 155 1
# 156 1
# 157 1
# 159 3
# 160 1
# 161 1
# 162 1
# 163 1
# 164 1
# 165 2
# 167 2
# 168 1
# 169 1
# 170 1
# 172 2
# 173 1
# 174 1
# 175 1
-------------------------------------------------------------------------k='7' seq nelsa s length: 4044778
# 4 1
# 5 1
# 8 1
# 9 1
# 10 3
# 11 3
# 12 4
# 13 7
# 14 6
# 15 5
# 16 11
# 17 7
# 18 14
# 19 8
# 20 15
# 21 13
# 22 10
# 23 6
# 24 22
# 25 14
# 26 19
# 27 17
# 28 20
# 29 25
# 30 15
# 31 22
# 32 18
# 33 23
# 34 30
# 35 24
# 36 35
# 37 39
# 38 27
# 39 33
# 40 36
# 41 34
# 42 40
# 43 43
# 44 44
# 45 44
# 46 43
# 47 50
# 48 51
# 49 54
# 50 55
# 51 44
# 52 49
# 53 56
# 54 35
# 55 52
# 56 47
# 57 48
# 58 65
# 59 56
# 60 53
# 61 54
# 62 66
# 63 47
# 64 61
# 65 50
# 66 46
# 67 69
# 68 65
# 69 66
# 70 59
# 71 59
# 72 55
# 73 73
# 74 91
# 75 73
# 76 56
# 77 66
# 78 63
# 79 67
# 80 78
# 81 51
# 82 69
# 83 60
# 84 64
# 85 73
# 86 58
# 87 60
# 88 64
# 89 73
# 90 63
# 91 65
# 92 59
# 93 69
# 94 67
# 95 73
# 96 50
# 97 53
# 98 68
# 99 65
# 100 63
# 101 55
# 102 73
# 103 76
# 104 66
# 105 70
# 106 75
# 107 66
# 108 56
# 109 49
# 110 68
# 111 52
# 112 66
# 113 67
# 114 66
# 115 52
# 116 61
# 117 59
# 118 65
# 119 67
# 120 56
# 121 60
# 122 64
# 123 53
# 124 59
# 125 66
# 126 58
# 127 77
# 128 51
# 129 67
# 130 53
# 131 56
# 132 62
# 133 64
# 134 56
# 135 42
# 136 71
# 137 57
# 138 53
# 139 52
# 140 65
# 141 59
# 142 61
# 143 60
# 144 64
For each k, I have to calculate with Python the max, the weighted mean and the min over the rows from the first # to the last # before the next -----k='' separator.
How can I tell Python to do the calculation from # to # for each k?
This is the code I've written so far:
#! /usr/bin/python
import numpy
import csv
file = open('Acetobacterium_woodii_DSM_1030.mdistr', 'rb')
data = csv.reader(file, delimiter=' ')
table = [row for row in data]
a = table[10]
print a[2]
It's just an example.
Thank you in advance.

An easy solution is the following:
#! /usr/bin/python
import numpy as np
with open('Acetobacterium_woodii_DSM_1030.mdistr', 'r') as f:
    # read file
    lines = f.read().splitlines()
values = []
weights = []
for line in lines:
    if line.startswith("-"):
        # print statistics and reset variables
        # (skip empty blocks, e.g. when the file starts with a separator)
        if values:
            print("Min: %d, Max: %d, Avg: %.2f" % (min(values), max(values),
                                                   np.average(values,
                                                              weights=weights)))
        values = []
        weights = []
    elif line.startswith("#"):
        row = line.split()
        values.append(int(row[1]))
        weights.append(int(row[2]))
# print statistics before quitting
print("Min: %d, Max: %d, Avg: %.2f" % (min(values), max(values),
                                       np.average(values,
                                                  weights=weights)))

I'm sure there is a more elegant answer out there, but this works:
#parse data
with open('data.txt') as f:
    lines = f.readlines()
alldata = []
kdata = []
for i,line in enumerate(lines):
    if len(line) > 1:
        if line[0] =='#':
            x = int(line.split(' ')[1])
            y = int(line.split(' ')[2][:-1])
            kdata.append([x,y])
        else:
            if i !=0:
                alldata.append([kval,kdata])
            ptr = line.find('k=')+3
            kval = int(line[ptr:ptr+1])
            kdata=[]
alldata.append([kval,kdata])
#analyse data
for kblock in alldata:
    kval = kblock[0]
    sumx = sum([x for z in kblock[1:] for (x,y) in z])
    sumy = sum([y for z in kblock[1:] for (x,y) in z])
    sumxy = sum([x*y for z in kblock[1:] for (x,y) in z])
    mean = sumxy/sumy
    print('The mean of k-{0} is :{1}'.format(kval,mean))
numpy would have been my first choice, but if your data is of variable size then this becomes more difficult. Hope this helps.
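As a possible variant of the two answers above, here is a minimal sketch that collects the rows of each block under its k value and returns the statistics instead of printing them. The function name block_stats and the sample lines are made up for illustration, and it assumes every block header contains k='<number>' as in the file shown in the question.

```python
import re
import numpy as np

def block_stats(lines):
    """Return {k: (min, max, weighted_mean)} for every k block (hypothetical helper)."""
    blocks = {}       # k -> ([values], [weights])
    current = None    # lists of the block currently being read
    for line in lines:
        header = re.search(r"k='(\d+)'", line)   # assumed header pattern
        if header:
            current = blocks.setdefault(int(header.group(1)), ([], []))
        elif line.startswith("#") and current is not None:
            _, value, weight = line.split()
            current[0].append(int(value))
            current[1].append(int(weight))
    return {k: (min(v), max(v), float(np.average(v, weights=w)))
            for k, (v, w) in blocks.items() if v}

# Shortened, made-up sample in the same layout as the file in the question
sample = [
    "-----k='6' seq nelsa s length: 4044778",
    "# 51 1",
    "# 64 1",
    "# 67 2",
    "-----k='7' seq nelsa s length: 4044778",
    "# 4 1",
    "# 10 3",
]

for k, (lo, hi, mean) in sorted(block_stats(sample).items()):
    print("k=%d  min=%d  max=%d  weighted mean=%.2f" % (k, lo, hi, mean))
```

To run it on the real file, the lines could be read with open(...).read().splitlines() as in the first answer.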

Related

Python, how can I execute the output in tabular form: ten code-symbol pairs in each line?

def display_code_ascii():
    for i in range(32, 128):
        print(chr(i))
print(display_code_ascii())
This is my code. The output is:
!
"
#
$
%
&
'
(
)
*
+
,
-
.
/
0
1
2
3
4
5
6
7
8
9
:
;
<
=
>
?
@
A
B
C
D
E
But I want to print in a console like this:
32 is 33 is ! 34 is " 35 is # 36 is $ 37 is % 38 is & 39 is ' 40 is ( 41 is )
42 is * 43 is + 44 is , 45 is - 46 is . 47 is / 48 is 0 49 is 1 50 is 2 51 is 3
52 is 4 53 is 5 54 is 6 55 is 7 56 is 8 57 is 9 58 is : 59 is ; 60 is < 61 is =
62 is > 63 is ? 64 is @ 65 is A 66 is B 67 is C 68 is D 69 is E 70 is F 71 is G
72 is H 73 is I 74 is J 75 is K 76 is L 77 is M 78 is N 79 is O 80 is P 81 is Q
82 is R 83 is S 84 is T 85 is U 86 is V 87 is W 88 is X 89 is Y 90 is Z 91 is [
92 is \ 93 is ] 94 is ^ 95 is _ 96 is ` 97 is a 98 is b 99 is c 100 is d 101 is e
102 is f 103 is g 104 is h 105 is i 106 is j 107 is k 108 is l 109 is m 110 is n 111 is o
112 is p 113 is q 114 is r 115 is s 116 is t 117 is u 118 is v 119 is w 120 is x 121 is y
122 is z 123 is { 124 is | 125 is } 126 is ~ 127 is None
# set chunk_size which is number of initial elements to handle per each outputted line
chunk_size = 10
def format_element(x):
    x = chr(x)
    return x if x != '\x7f' else "None"
# prepare initial list elements with output strings
ll = [f"{x} is {format_element(x)}" for x in range(32, 128)]
# split list into chunks using chunk_size
ll = [ll[i:i+chunk_size] for i in range(len(ll))[::chunk_size]]
# join inner lists into output lines strings
ll = [" ".join(x) for x in ll]
# print each line separately
for i in ll:
    print(i)
Output:
32 is 33 is ! 34 is " 35 is # 36 is $ 37 is % 38 is & 39 is ' 40 is ( 41 is )
42 is * 43 is + 44 is , 45 is - 46 is . 47 is / 48 is 0 49 is 1 50 is 2 51 is 3
52 is 4 53 is 5 54 is 6 55 is 7 56 is 8 57 is 9 58 is : 59 is ; 60 is < 61 is =
62 is > 63 is ? 64 is @ 65 is A 66 is B 67 is C 68 is D 69 is E 70 is F 71 is G
72 is H 73 is I 74 is J 75 is K 76 is L 77 is M 78 is N 79 is O 80 is P 81 is Q
82 is R 83 is S 84 is T 85 is U 86 is V 87 is W 88 is X 89 is Y 90 is Z 91 is [
92 is \ 93 is ] 94 is ^ 95 is _ 96 is ` 97 is a 98 is b 99 is c 100 is d 101 is e
102 is f 103 is g 104 is h 105 is i 106 is j 107 is k 108 is l 109 is m 110 is n 111 is o
112 is p 113 is q 114 is r 115 is s 116 is t 117 is u 118 is v 119 is w 120 is x 121 is y
122 is z 123 is { 124 is | 125 is } 126 is ~ 127 is None

From Matlab to Python Code [z,index]=sort(abs(z));

I am trying to convert code from MATLAB to Python.
Can you please help me convert this code?
In the MATLAB code, z is a list of length 121:
z= 7.0502 5.8030 4.4657 3.0404 1.5416 0 -1.5416 -3.0404 -4.4657
-5.8030 -7.0502 7.5944 6.3059 4.8990 3.3662 1.7189 0 -1.7189 -3.3662 -4.8990 -6.3059 -7.5944 8.2427 6.9282 5.4611 3.8122 1.9735 0 -1.9735 -3.8122 -5.4611 -6.9282 -8.2427 9.0135 7.7027 6.2075 4.4590 2.3803 0 -2.3803 -4.4590 -6.2075 -7.7027 -9.0135 9.9185 8.6576 7.2038 5.4466 3.1530 0 -3.1530 -5.4466 -7.2038 -8.6576 -9.9185 10.9545 9.7980 8.4853 6.9282 4.8990 0 -4.8990 -6.9282 -8.4853 -9.7980 -10.9545 12.0986 11.0885 9.9947 8.8128 7.6119 -6.9282 -7.6119 -8.8128 -9.9947 -11.0885 -12.0986 13.3133 12.4632 11.5988 10.7649 10.0829 -9.7980 -10.0829 -10.7649 -11.5988 -12.4632 -13.3133 14.5583 13.8564 13.1842 12.5910 12.1612 -12.0000 -12.1612 -12.5910 -13.1842 -13.8564 -14.5583 15.8011 15.2238 14.6969 14.2594 13.9626 -13.8564 -13.9626 -14.2594 -14.6969 -15.2238 -15.8011 17.0207 16.5431 16.1227 15.7875 15.5684 -15.4919 -15.5684 -15.7875 -16.1227 -16.5431 -17.0207
MATLAB code: [z,index]=sort(abs(z));
After running this code:
z = 0 0 0 0 0 0 1.5416 1.5416 1.7189 1.7189 1.9735 1.9735 2.3803 2.3803 3.0404 3.0404 3.1530 3.1530 3.3662 3.3662 3.8122 3.8122 4.4590 4.4590 4.4657 4.4657 4.8990 4.8990 4.8990 4.8990 5.4466 5.4466 5.4611 5.4611 5.8030 5.8030 6.2075 6.2075 6.3059 6.3059 6.9282 6.9282 6.9282 6.9282 6.9282 7.0502 7.0502 7.2038 7.2038 7.5944 7.5944 7.6119 7.6119 7.7027 7.7027 8.2427 8.2427 8.4853 8.4853 8.6576 8.6576 8.8128 8.8128 9.0135 9.0135 9.7980 9.7980 9.7980 9.9185 9.9185 9.9947 9.9947 10.0829 10.0829 10.7649 10.7649 10.9545 10.9545 11.0885 11.0885 11.5988 11.5988 12.0000 12.0986 12.0986 12.1612 12.1612 12.4632 12.4632 12.5910
12.5910 13.1842 13.1842 13.3133 13.3133 13.8564 13.8564 13.8564 13.9626 13.9626 14.2594 14.2594 14.5583 14.5583 14.6969 14.6969 15.2238 15.2238 15.4919 15.5684 15.5684 15.7875 15.7875 15.8011 15.8011 16.1227 16.1227 16.5431 16.5431 17.0207 17.0207
and index is
index = 6 17 28 39 50 61 5 7 16 18 27 29 38 40 4 8 49 51 15 19 26 30 37 41 3 9 14 20 60 62 48 52 25 31 2 10 36 42 13 21 24 32 59 63 72 1 11 47 53 12 22 71 73 35 43 23 33 58 64 46 54 70 74 34 44 57 65 83 45 55 69 75 82 84 81 85 56 66 68 76 80 86 94 67 77 93 95 79 87 92 96 91 97 78 88 90 98 105 104 106 103 107 89 99 102 108 101 109 116 115 117 114 118 100 110 113 119 112 120 111 121
So what is the equivalent of [z,index] in Python?
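Do you need to return the index? If you don't, you could use:
new_list = sorted(map(abs, z))
If you do, you can build the index as well:
index = sorted(range(len(z)), key=lambda k: abs(z[k]))
where new_list is the sorted output and z is the original list. Note that index is 0-based here, whereas MATLAB's index is 1-based.
EDIT:
Try that now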
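If NumPy is available, the same result can probably be obtained with argsort. The sketch below uses a short made-up array rather than the 121 values from the question. It asks for a stable sort so that equal magnitudes keep their original order, which as far as I know is also how MATLAB's sort treats ties. The name index_matlab is only there to show the 1-based values.

```python
import numpy as np

# Short made-up input; the question's z has 121 values
z = np.array([7.0502, -1.5416, 0.0, 1.5416, -7.0502])

magnitudes = np.abs(z)
# kind="stable" keeps tied magnitudes in their original order
index = np.argsort(magnitudes, kind="stable")
z_sorted = magnitudes[index]

# MATLAB counts from 1, NumPy from 0
index_matlab = index + 1

print(z_sorted)
print(index_matlab)
```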

Grayscale image array rotation for data augmentation

I am trying to rotate some images for data augmentation to train a network for an image segmentation task. After a lot of searching, the best candidate for rotating each image and its corresponding mask seemed to be the scipy.ndimage.rotate function. The problem is that the mask array contains only the pixel values 0 and 255 before rotation, but after rotation it contains values across the whole range from 0 to 255, while I expect it to still contain only 0 and 255.
Here is the code:
from scipy.ndimage import rotate
import numpy as np
sample = dataset[1]
print(np.unique(sample['image']))
print(np.unique(sample['mask']))
print(sample['image'].shape)
print(sample['mask'].shape)
rot_image = rotate(sample['image'], 60, reshape = False)
rot_mask = rotate(sample['mask'], 60, reshape = False)
print(np.unique(rot_image))
print(np.unique(rot_mask))
print(rot_image.shape)
print(rot_mask.shape)
Here are the results:
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53
54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89
90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107
108 109 110 111 112 113 114 115 118 119 120 121 125 139]
[ 0 255]
(512, 512, 1)
(512, 512, 1)
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53
54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89
90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107
108 109 110 111 112 113 115 117 118 124 125 132 135]
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 17 18 20
24 25 26 28 31 34 35 38 39 41 42 43 45 46 48 49 50 51
52 58 59 62 66 67 68 73 75 76 79 80 82 85 86 88 90 96
98 101 108 109 111 114 116 118 119 123 124 125 127 128 130 138 140 142
146 148 151 156 157 158 161 164 165 166 168 169 176 180 184 185 188 189
194 196 197 198 199 201 203 204 205 207 208 210 211 213 216 217 218 219
220 221 222 225 228 229 230 231 233 234 235 237 239 240 241 242 243 244
245 246 247 248 249 250 251 252 253 254 255]
(512, 512, 1)
(512, 512, 1)
Rotating an image array seems like a simple problem, but I have been searching for days and haven't found a solution. I am confused about how to prevent the mask values (0 and 255) from spreading over the whole 0 to 255 range after rotation. I want something like this:
x = np.unique(sample['mask'])
rot_mask = rotate(sample['mask'], 30, reshape = False)
x_rot = np.unique(rot_mask)
print(np.unique(x - x_rot))
[ 0]
Since you are using NumPy arrays to represent images, why not use NumPy functions? The library has all sorts of array manipulations. Try the rot90 function.
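A small sketch of the rot90 suggestion, using a made-up 4x4x1 mask instead of the asker's dataset. Because rot90 only moves pixels around and never interpolates, the set of values in the mask should stay the same.

```python
import numpy as np

# Made-up binary mask with the same (H, W, 1) layout as in the question
mask = np.zeros((4, 4, 1), dtype=np.uint8)
mask[0, :2, 0] = 255

# Rotate by 90 degrees in the image plane (axes 0 and 1); no interpolation is involved
rot_mask = np.rot90(mask, k=1, axes=(0, 1))

print(np.unique(mask))
print(np.unique(rot_mask))
print(rot_mask.shape)
```

Note that rot90 only handles multiples of 90 degrees. For arbitrary angles such as 60, scipy.ndimage.rotate uses spline interpolation by default (order=3), which is where the intermediate values come from. Passing order=0 (nearest neighbour) for the mask should keep it limited to 0 and 255.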

Transpose .csv File: changing Header Time Stamps to Line TimeStamp

My Data Looks Like this:
statnr datum ele h01 h02 h03 h04 h05 h06 h07 h08 h09 h10 h11 h12 h13 h14 h15 h16 h17 h18 h19 h20 h21 h22 h23 h24
----------- ----------- --- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------
20101 20020401 D6K 103 126 115 114 105 101 118 118 130 129 126 128 132 133 131 130 130 131 130 130 125 117 122 124
20101 20020402 D6K 126 118 119 120 114 111 107 119 124 126 122 130 130 130 128 128 126 119 129 134 132 127 112 118
........
20101 20150909 D6K 72 82 75 76 82 93 91 96 99 101 108 108 103 100 94 90 82 92 88 79 77 89 94 92
20101 20020401 FLP 54 61 58 61 66 67 65 56 47 46 40 40 39 32 34 34 37 43 45 45 50 54 59 63
20101 20020402 FLP 64 61 67 66 68 69 67 56 50 46 42 39 33 32 33 34 39 48 55 58 61 62 65 68
........
20101 20150909 FLP 93 95 92 94 94 96 95 92 90 84 87 75 81 75 75 74 83 87 89 96 94 92 91 94
20101 20070906 GSE 32700 0 0 0 0 0 3 10 17 30 28 27 37 44 37 25 16 5 1 0 0 0 0 0
20101 20070907 GSE 0 0 0 0 0 0 11 48 72 107 257 264 290 216 255 178 122 57 6 0 0 0 0 0
........
20101 20150909 GSE 0 0 0 0 0 1 17 51 71 118 82 200 116 130 142 156 48 15 1 0 0 0 0 0
20101 20020101 SUV 0 0 0 0 0 0 0 0 9 10 10 10 10 10 10 10 2 0 0 0 0 0 0 0
........
20101 20150909 SUV 0 0 0 0 0 0 0 0 0 1 0 5 1 4 4 9 2 0 0 0 0 0 0 0
20101 20020401 TEX 30 18 21 18 9 10 18 42 69 91 114 117 126 135 133 127 114 87 58 47 39 33 27 24
........
20101 20150909 TEX 50 46 48 50 50 49 57 67 77 85 80 111 95 100 101 92 74 67 59 53 49 49 49 47
20101 20020401 QVX 6 10 9 8 13 25 19 15 16 19 24 24 19 23 24 22 24 23 19 13 12 16 16 18
........
20101 20150909 QVX 40 42 37 34 30 34 22 22 27 31 26 28 37 38 42 43 52 54 59 81 80 69 78 60
As you can see, it is a huge sheet with a statnr column and a date column (datum); ele stands for the parameter, and h01 to h24 are, as you can imagine, the hours.
I need to convert this sheet to the format of the other files I'm working with (for plotting and processing reasons).
I'm currently trying to bring this file into this format:
Date Time D6K FLP GSE SUV TEX QVX
01.04.2002 01:00 103 54 0 30 6
.....
09.09.2015 23:59 92 94 0 0 47 60
So what I'm trying to do is:
1) Get rid of row[0] (statnr)
2) Switch the header with row[2] so that all parameters are in the header, and link them to the new date/time format in the lines
3) Convert the time format from %Y%m%d to %d.%m.%Y %H:%M
Since I'm new to Python and coding, I thought I'd ask whether there's a package that deals with this kind of problem, and whether there's a general term for it (switching the header with lines). Thanks to Peter Wood, I changed the title to "Transpose".
Thanks for any suggestions.
For clarification:
The ........ indicates that I left some rows out.
The ----------- are in the file.
Because you may have missing data, this isn't a simple case of transposing blocks. I think what you need to do is read the input file into a data structure from which you can then look up the values as required to generate your output. In Python you can use a dictionary whose key is a tuple of your element type, date, and hour:
mydict = {}
with open(r'F:\myfile.txt') as f:
    z = f.readline() # discard headings
    z = f.readline() # discard row of dashes
    for line in f:
        fields = line.split()
        date = fields[1]
        ele = fields[2]
        for hour, value in enumerate(fields[3:27]):
            mydict[(ele, date, hour)] = value
Now you have all the data in a big dictionary that's addressable by ele, date and hour. I'm going to guess that the ele values are fixed and you can hardcode them, but you'll want to build a list of the unique dates you actually found in the input file, and put them in ascending order:
dateset=set()
for k in mydict.keys():
    dateset.add(k[1])
dates=list(dateset)
dates.sort()
Now you're ready to build your output file.
for date in dates:
    for hour in range(24):
        # hour is an int, so convert it before joining it to the string
        output = date + '\t' + str(hour)
        for ele in ['D6K', 'FLP', 'GSE', 'SUV', 'TEX', 'QVX']:
            output = output + '\t' + mydict.get((ele, date, hour), '')
        print(output)
Using the get method on the dictionary allows you to specify a default value to be returned if the key you supplied isn't in the dictionary.
I haven't dealt with the date formatting (note that 'hour' ranges from 0 to 23), or writing the output to a file, but the above should get you going.
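For the date formatting that was left open, one possible sketch uses the standard datetime module. The helper name format_timestamp is made up, and it assumes that column hNN marks the end of hour NN, so h01 becomes 01:00 and h24 rolls over to 00:00 of the next day. If the asker's other files really use 23:59 for the last hour, that case would need special handling.

```python
from datetime import datetime, timedelta

def format_timestamp(date, hour):
    """Hypothetical helper: date is 'YYYYMMDD', hour is the 0-based index used in mydict."""
    start_of_day = datetime.strptime(date, "%Y%m%d")
    # Assumption: index 0 is column h01, which marks 01:00
    stamp = start_of_day + timedelta(hours=hour + 1)
    return stamp.strftime("%d.%m.%Y %H:%M")

print(format_timestamp("20020401", 0))    # first hour of the first sample row
print(format_timestamp("20150909", 23))   # last hour rolls over to the next day
```

In the output loop above, format_timestamp(date, hour) could replace the date + '\t' + str(hour) part.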

Python Nested Loop For Sequence of Several Rows

How do I use a loop nested within a loop to create this output:
100
101 102
103 104 105
106 107 108 109
You can do this using a while loop with certain other variables:
>>> st, end, length = 100, 110, 1
>>> while st < end:
...     print (" ".join(map(lambda x: "%3d" % x, range(st,st+length))))
...     st += length
...     length += 1
...
100
101 102
103 104 105
106 107 108 109
Note that I have used a lambda instead of str in the map, this is so that different width numbers don't break up the indentation.
You can similarly base this on the width of the final line as below (in which case there is no need to keep track of the end variable I was doing above):
>>> st, width = 1, 1
>>> while width <= 15:
...     print (" ".join(map(lambda x: "%3d" % x, range(st, st + width))))
...     st += width
...     width += 1
...
  1
  2   3
  4   5   6
  7   8   9  10
 11  12  13  14  15
 16  17  18  19  20  21
 22  23  24  25  26  27  28
 29  30  31  32  33  34  35  36
 37  38  39  40  41  42  43  44  45
 46  47  48  49  50  51  52  53  54  55
 56  57  58  59  60  61  62  63  64  65  66
 67  68  69  70  71  72  73  74  75  76  77  78
 79  80  81  82  83  84  85  86  87  88  89  90  91
 92  93  94  95  96  97  98  99 100 101 102 103 104 105
106 107 108 109 110 111 112 113 114 115 116 117 118 119 120
Something like this???
start = 100
lines = 5
for i in range(lines):
    stop = start + i + 1
    # print the numbers of this row on one line
    for j in range(start, stop):
        print(j, end=" ")
    start = stop
    print()
Just use two-level for loop:
x = 100
for i in range(1, 5):
    for j in range(i):
        print(x, end=" ")
        x += 1
    print("")
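The same nested-loop idea can also be wrapped in a function that builds the rows first and prints them afterwards, which makes the result easier to check. This is only a sketch; the name triangle_rows and its parameters are made up for illustration.

```python
def triangle_rows(start, n_rows):
    """Return rows of consecutive integers: row 1 has 1 number, row 2 has 2, and so on."""
    rows = []
    value = start
    for row_length in range(1, n_rows + 1):   # outer loop: one pass per row
        row = []
        for _ in range(row_length):           # inner loop: fill the current row
            row.append(value)
            value += 1
        rows.append(row)
    return rows

for row in triangle_rows(100, 4):
    print(" ".join(str(n) for n in row))
```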
