Different result for manual calculation of cosine distance and scipy calculation - python

I'm trying to calculate the cosine distance between 2 vectors, once via scipy and once manually. As a reference I tried to use the first and second answers out of this thread:
Scipy code:
distance = dist.cosine(v1, v2)
Self-made code:
distance = np.inner(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
When printing both results, however, they are vastly different; the first one is the result of my code and the second one of scipy:
1.1443029291450655e-05
0.369629880560304
I am aware that I am technically missing a 1- before my own calculation to match the one in scipy, but even if I added it, the result wouldn't even be close to matching.
Why are the results so different? Where did I go wrong?
Edit:
Many suggest that the 1- is what's missing; however, as seen below, the results are not even in the same range.
My self-made code's result is (depending on whether the 1- is added or not) in the range x < 0.01 or x > 0.99.
Scipy's result, however, is roughly in the range 0.1 < x < 0.9.
I have added 2 example inputs and corresponding results at the bottom.
Self-made result
2.0613046661121477e-05
Scipy result
0.17675768302346695
EXAMPLE INPUT
v1
[ 97 99 104 109 105 101 100 98 103 115 122 127 136 137 143 146 151 157
171 175 178 185 198 207 213 215 220 92 93 97 98 89 79 76 77 85
95 102 110 118 126 127 131 141 151 162 164 180 184 191 204 212 214 215
85 89 90 79 66 56 57 61 64 65 68 74 83 88 97 112 124 129
140 151 160 167 177 187 193 193 200 80 80 73 62 51 46 46 50 50
46 43 47 51 58 68 78 83 91 102 111 123 137 150 156 164 170 179
73 66 55 44 38 33 34 37 40 38 34 33 31 34 37 38 45 51
55 70 83 95 99 115 130 136 137 57 48 41 29 23 27 32 30 31
29 30 26 20 17 18 17 19 23 28 32 36 47 48 59 82 97 108
55 44 32 24 26 29 30 26 25 22 21 19 16 18 15 13 14 15
16 18 20 22 27 29 40 54 77 44 33 24 25 26 26 30 29 27
25 24 19 17 19 18 18 16 15 16 16 16 14 19 22 25 35 58
36 31 24 24 30 36 42 50 52 50 51 51 51 46 39 35 30 23
23 20 19 17 17 19 23 26 38 35 31 28 35 49 57 64 73 80
86 91 89 92 95 85 70 57 48 39 34 30 28 24 24 27 30 30
35 35 45 56 71 81 91 102 103 108 114 114 117 127 126 114 101 84
68 62 56 52 45 43 46 45 41 44 51 62 73 84 92 101 109 113
117 127 134 139 146 151 152 145 131 115 104 99 94 84 70 65 64 61
49 59 65 72 82 90 103 110 117 123 132 143 155 166 173 179 176 167
160 144 132 119 108 97 87 80 75 49 58 66 75 86 94 106 116 121
125 132 142 155 177 187 188 190 183 182 176 162 149 132 120 111 105 99
59 64 73 84 93 102 108 120 127 128 132 141 159 177 188 187 186 187
189 183 176 168 154 140 131 122 113 66 71 79 87 98 103 108 121 129
135 136 142 156 165 172 176 180 181 177 168 164 166 159 152 149 143 131
68 77 86 92 101 106 109 117 123 131 130 137 143 142 147 153 155 154
146 142 143 148 148 148 149 148 141 72 79 87 93 98 102 106 111 116
124 125 129 131 134 141 147 145 140 132 125 124 128 131 127 128 133 134
72 78 85 91 95 94 101 105 111 119 124 130 131 136 139 139 135 128
120 114 114 115 115 113 113 114 120 74 76 85 91 91 91 95 100 108
111 105 95 87 78 68 57 48 40 33 39 46 63 84 96 101 103 108
74 76 83 85 83 83 81 79 68 51 35 28 33 36 18 7 6 43
93 24 4 8 26 53 79 96 101 77 76 78 78 74 64 47 30 20
23 38 69 108 91 24 7 3 79 167 38 7 11 34 61 50 74 98
75 74 72 65 51 33 21 25 42 63 83 123 152 106 24 8 8 16
22 13 12 16 64 138 102 53 88 75 69 62 50 42 38 35 49 79
97 108 145 156 115 37 9 9 8 10 11 13 25 90 170 163 92 67
80 69 58 56 60 55 48 62 79 95 114 137 149 137 78 19 11 9
9 12 20 48 128 183 177 178 110 83 73 65 67 71 63 55 69 84
92 102 119 127 129 114 59 28 29 49 34 47 94 137 141 130 141 100
91 83 78 81 86 75 63 72 88 96 97 96 99 110 109 102 96 109
131 108 116 121 111 108 113 109 101 105 96 93 92 96 90 75 77 88
95 101 103 107 112 109 111 123 144 140 138 144 141 125 119 126 134 137
115 104 100 98 99 96 87 87 92 103 114 120 129 128 127 127 134 148
143 138 133 122 113 106 110 127 146]
v2
[101 103 103 101 95 88 81 81 81 80 74 73 79 87 90 92 98 100
97 91 85 70 59 51 40 31 26 106 109 107 105 98 90 84 82 80
78 73 73 80 87 87 90 98 101 97 90 83 73 62 50 40 32 26
112 113 111 105 98 87 86 83 80 78 75 75 81 87 89 94 99 99
96 89 81 72 59 49 38 31 29 121 118 113 107 97 89 88 83 80
77 73 74 78 82 84 90 96 93 91 90 84 71 57 45 38 29 26
125 122 115 108 98 88 86 86 82 77 73 72 75 76 78 85 93 93
87 83 78 68 59 44 37 28 24 125 123 116 106 99 92 90 85 79
73 71 71 70 72 77 83 86 87 81 74 67 62 55 43 36 29 23
129 120 110 102 99 94 92 85 76 69 67 65 66 68 75 77 76 75
75 71 63 59 54 41 35 28 22 128 115 108 103 102 101 93 80 73
67 66 63 62 64 68 70 68 67 68 65 58 51 48 40 34 29 24
127 118 114 110 114 107 87 71 64 61 60 62 62 62 64 68 68 68
66 64 59 50 45 40 34 30 25 129 129 125 123 126 104 75 56 53
54 56 61 62 61 62 67 68 70 66 64 62 56 47 40 35 30 26
135 135 134 141 131 89 55 45 50 53 58 64 62 62 68 72 73 74
76 73 66 62 54 44 37 32 27 144 145 151 150 122 68 45 46 52
55 61 65 65 67 73 79 84 89 86 81 73 70 60 50 44 34 28
148 157 162 149 104 52 47 53 57 62 66 69 71 75 83 90 95 94
91 86 83 80 71 61 56 45 33 157 164 163 140 83 51 57 61 65
72 76 78 83 87 97 102 104 104 102 99 93 92 85 73 66 57 44
157 165 161 127 67 60 66 71 79 84 85 90 96 99 110 122 127 125
120 120 109 102 93 82 76 69 57 159 170 154 110 67 69 75 87 97
98 94 99 107 112 121 140 160 158 144 135 124 115 104 93 83 74 61
158 155 146 97 76 83 91 104 114 113 104 103 106 116 126 141 174 183
166 148 128 115 104 96 85 75 63 157 147 135 96 83 97 108 117 129
126 117 111 105 111 121 140 163 168 156 144 128 116 106 95 81 72 67
158 146 129 103 93 109 117 124 130 125 113 106 106 108 117 136 145 148
140 132 126 120 113 97 76 72 77 154 139 124 108 100 112 117 108 97
93 73 61 80 103 114 130 139 138 130 123 122 119 111 93 74 71 89
141 133 123 109 97 103 107 95 100 102 83 79 87 99 108 120 127 126
121 115 116 111 101 84 65 63 84 133 129 116 104 92 90 101 106 115
111 100 102 109 117 113 113 122 122 117 110 107 105 94 75 60 63 78
126 119 106 100 92 80 78 87 94 94 94 96 110 129 135 123 117 112
108 103 103 103 95 77 59 54 66 119 106 98 95 90 83 77 74 73
70 75 89 110 130 141 140 129 116 110 109 113 115 105 80 52 36 41
105 99 96 92 89 87 83 81 77 72 76 91 110 125 140 150 138 130
126 126 131 126 109 81 47 36 47 99 97 95 95 91 88 84 84 82
79 84 97 114 126 138 148 141 135 139 141 141 132 110 83 54 45 54
99 97 96 95 90 84 83 87 86 85 89 98 114 122 133 140 136 131
139 144 139 124 106 83 63 53 57 99 98 97 95 89 86 86 86 87
89 92 98 108 114 127 128 122 121 123 128 129 121 101 82 67 57 58
100 97 93 92 91 92 88 85 90 92 95 102 109 112 120 117 114 118
116 120 124 115 98 84 72 63 59]

scipy defines cosine distance as 1 minus the dot product of the two vectors over the product of their norms (the square roots of the sums of the squared elements). It might be easier to see the formula on Wikipedia than to read my description of it.
Your code seems to be missing the "1 -" part.
import random
import numpy as np
from scipy.spatial import distance as dist

a = [random.randint(1, 100) for _ in range(100)]
b = [random.randint(1, 100) for _ in range(100)]
1 - np.inner(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
# 0.2412...
dist.cosine(a, b)
# 0.241286...

Related

How to create a heatmap with different color ranges for the sum column?

I've got my pd.DataFrame ready.
Communication Services Consumer Discretionary Consumer Staples Energy Financials Health Care Industrials Materials Real Estate Technology Utilities Sum
Date
2020-09-15 61 65 39 3 36 53 68 89 74 43 53 584
2020-09-14 50 70 39 7 54 45 67 92 64 28 53 569
2020-09-11 38 54 30 0 28 27 46 82 25 18 28 376
2020-09-10 30 52 24 0 16 19 30 67 32 12 25 307
2020-09-09 50 57 36 0 33 30 52 71 51 30 42 452
2020-09-08 34 55 21 0 24 16 24 46 48 12 25 305
2020-09-04 53 59 51 3 66 32 47 71 74 35 28 519
2020-09-03 57 67 57 0 48 40 49 82 80 52 32 564
2020-09-02 73 85 78 3 80 74 94 89 87 94 64 821
2020-09-01 69 78 54 3 54 51 79 85 51 77 14 615
2020-08-31 76 73 78 7 50 61 75 64 54 70 21 629
2020-08-28 92 81 75 30 81 48 86 89 77 76 17 752
2020-08-27 88 77 81 11 83 53 82 82 70 64 14 705
2020-08-26 92 81 75 11 46 43 79 89 45 69 7 637
2020-08-25 92 86 78 23 65 45 82 82 64 64 21 702
2020-08-24 92 88 90 38 62 38 90 75 54 61 39 727
2020-08-21 80 78 69 11 28 37 71 50 45 49 17 535
2020-08-20 84 72 63 11 34 45 78 57 45 57 17 563
2020-08-19 80 83 81 34 48 56 84 71 29 60 35 661
2020-08-18 88 88 90 53 48 62 91 71 70 64 42 767
2020-08-17 80 95 87 80 69 62 94 78 77 63 42 827
2020-08-14 84 100 90 80 83 56 94 78 64 57 42 828
2020-08-13 88 98 87 69 81 56 95 78 64 66 57 839
2020-08-12 73 96 87 96 83 58 98 75 90 63 64 883
2020-08-11 73 86 72 84 89 50 95 78 77 53 46 803
2020-08-10 80 93 87 88 83 53 93 78 90 64 82 891
2020-08-07 69 81 84 65 84 58 91 60 83 71 89 835
2020-08-06 73 80 81 73 60 53 84 57 54 78 67 760
2020-08-05 69 81 87 73 68 69 89 64 51 78 64 793
2020-08-04 80 63 87 73 46 66 64 53 67 81 85 765
2020-08-03 69 55 78 50 60 74 68 42 51 81 78 706
2020-07-31 65 62 78 42 60 61 64 46 58 74 92 702
2020-07-30 65 62 75 34 65 74 71 50 64 61 89 710
2020-07-29 73 78 90 88 90 87 79 85 70 64 85 889
2020-07-28 46 67 81 38 71 72 78 85 61 47 89 735
2020-07-27 61 78 90 61 86 75 76 96 32 74 75 804
2020-07-24 80 77 87 73 87 72 83 100 32 56 96 843
2020-07-23 84 81 90 73 90 85 91 100 38 73 96 901
2020-07-22 88 90 90 84 92 93 94 100 45 90 96 962
2020-07-21 76 91 93 96 92 93 93 100 25 85 92 936
2020-07-20 65 81 81 34 62 91 84 96 32 87 89 802
2020-07-17 76 86 93 38 65 95 91 96 51 77 100 868
2020-07-16 80 90 93 50 81 93 89 96 22 70 85 849
2020-07-15 80 96 87 53 78 95 91 96 45 76 75 872
2020-07-14 69 59 81 23 53 82 73 96 25 60 82 703
2020-07-13 57 34 69 0 46 54 56 71 9 43 75 514
2020-07-10 61 44 66 0 43 59 39 60 35 66 64 537
2020-07-09 46 31 42 0 18 61 36 32 32 61 46 405
2020-07-08 50 42 57 3 34 67 50 46 25 61 57 492
2020-07-07 53 34 60 0 18 66 43 75 22 50 46 467
2020-07-06 50 52 54 7 30 75 64 89 41 76 53 591
Now I'd like to plot a heatmap by using matplotlib. The resulting heatmap should look something like this:
For the inner part (columns besides "sum"), if the value is above 50, then the color should be green, and the color should be darker for the largest values. Same logic for the values below 50.
For the "sum" column, the threshold is 550. How can I achieve the gradual change in color?
A standard sns.diverging_palette(20, 145) has white in the center. Possible hue values are 20 for red and 145 for green.
vmin= will then set the numeric value corresponding to red and vmax= for the value corresponding to green. The value in the center will be white.
You need to create 2 separate heatmaps as they have different color ranges. The ax= keyword tells on which subplot the heatmap should be created. The colorbars can be left out: the numbers inside the cells already indicate the correspondence.
A newline character in the label names helps to better use the available space.
from matplotlib import pyplot as plt
import seaborn as sns
import pandas as pd

# df = pd.read_csv(...)
# df.set_index('Date', inplace=True)
column_labels = [col.replace(' ', '\n') for col in df.columns[:-1]]
fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(12, 10),
                               gridspec_kw={'width_ratios': [10, 1], 'wspace': 0.02, 'bottom': 0.14})
cmap = sns.diverging_palette(20, 145)
sns.heatmap(df[df.columns[:-1]], cmap=cmap, vmin=0, vmax=100, annot=True, fmt='.0f', annot_kws={'fontsize': 10},
            lw=0.6, xticklabels=column_labels, cbar=False, ax=ax1)
sns.heatmap(df[df.columns[-1:]], cmap=cmap, vmin=0, vmax=1100, annot=True, fmt='.0f', annot_kws={'fontsize': 10},
            lw=0.6, yticklabels=[], cbar=False, ax=ax2)
ax2.set_ylabel('')
ax2.tick_params(axis='x', labelrotation=90)
plt.show()
plt.figure(figsize=(15, 15))
sns.heatmap(data, annot=True, cmap="YlGnBu", linewidths=.5)
Is that what you are looking for? If you also want to set a range on the values, you can use the vmin and vmax parameters.

read_csv() doesn't work with StreamingBody object where python engine is required

I am encountering a problem while using pandas read_csv(). Data is being read in from S3 as a StreamingBody object, and I noticed it works only when the C engine parser is used (per the pandas [documentation][1], skipfooter only works with the Python engine parser).
Has anyone encountered a similar issue before, or does anyone have advice on how to solve it? Thanks.
The following is how I reproduced this issue.
import boto3
import s3fs
import pandas as pd
s3 = boto3.client("s3")
response = s3.get_object(Bucket="df-raw-869771", Key="csv/customer-01.csv")
pd.read_csv(response["Body"])
customer_id|store_id|first_name|last_name|email|address_id|activebool|dw_insert_date|dw_update_date|active
0 9|2|MARGARET|MOORE|MARGARET.MOORE#sakilacustom...
1 13|2|KAREN|JACKSON|KAREN.JACKSON#sakilacustome...
2 17|1|DONNA|THOMPSON|DONNA.THOMPSON#sakilacusto...
3 21|1|MICHELLE|CLARK|MICHELLE.CLARK#sakilacusto...
4 25|1|DEBORAH|WALKER|DEBORAH.WALKER#sakilacusto...
... ...
1188 587|1|SERGIO|STANFIELD|SERGIO.STANFIELD#sakila...
1189 591|1|KENT|ARSENAULT|KENT.ARSENAULT#sakilacust...
1190 595|1|TERRENCE|GUNDERSON|TERRENCE.GUNDERSON#sa...
1191 599|2|AUSTIN|CINTRON|AUSTIN.CINTRON#sakilacust...
1192 4|2|BARBARA|JONES|BARBARA#sakilacustomer.org|8...
[1193 rows x 1 columns]
If I pass in the skipfooter argument:
import boto3
import s3fs
import pandas as pd
s3 = boto3.client("s3")
response = s3.get_object(Bucket="df-raw-869771", Key="csv/customer-01.csv")
pd.read_csv(response["Body"], skipfooter=1)
__main__:1: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support skipfooter; you can avoid this warning by specifying engine='python'.
99 117 115 116 111 109 101 114 95 105 100 124 115.1 116.1 111.1 ... 114.28 103.11 124.112 53.6 55.4 124.113 65.51 99.26 116.33 105.31 118.13 101.35 124.114 49.70 51.21
0 45 49 49 45 50 48 49 57 124 124 49 10 53 55 124 ... 69 83 84 124 69 68 78 65 46 87 69 83 84 64 115
1 97 107 105 108 97 99 117 115 116 111 109 101 114 46 111 ... 76 68 73 78 69 46 80 69 82 75 73 78 83 64 115
2 97 107 105 108 97 99 117 115 116 111 109 101 114 46 111 ... 73 76 76 73 65 77 83 79 78 64 115 97 107 105 108
3 97 99 117 115 116 111 109 101 114 46 111 114 103 124 50 ... 114 103 124 50 55 48 124 65 99 116 105 118 101 124 49
4 51 45 49 49 45 50 48 49 57 124 124 49 10 50 54 ... 115 97 107 105 108 97 99 117 115 116 111 109 101 114 46
.. .. ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
84 99 116 105 118 101 124 49 51 45 49 49 45 50 48 49 ... 51 45 49 49 45 50 48 49 57 124 124 49 10 51 56
85 51 124 49 124 77 65 82 84 73 78 124 66 65 76 69 ... 51 53 124 50 124 82 73 67 75 89 124 83 72 69 76
86 66 89 124 82 73 67 75 89 46 83 72 69 76 66 89 ... 88 84 69 82 124 72 69 67 84 79 82 46 80 79 73
87 78 68 69 88 84 69 82 64 115 97 107 105 108 97 99 ... 103 124 53 52 53 124 65 99 116 105 118 101 124 49 51
88 45 49 49 45 50 48 49 57 124 124 49 10 53 52 51 ... 116 111 109 101 114 46 111 114 103 124 53 57 55 124 65
[89 rows x 1024 columns]
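One workaround worth trying (a sketch, not tested against S3; the pipe-delimited layout and footer are assumptions based on the output above) is to buffer the streamed bytes into a seekable object first, so the Python engine that skipfooter requires can consume it like a regular file:

```python
import io
import pandas as pd

# Simulated pipe-delimited payload with a trailing footer line to skip;
# for S3 this would be: buffer = io.BytesIO(response["Body"].read())
raw = b"customer_id|first_name\n9|MARGARET\n13|KAREN\nFOOTER\n"
buffer = io.BytesIO(raw)

# The seekable buffer works with the python engine, unlike StreamingBody.
df = pd.read_csv(buffer, sep="|", skipfooter=1, engine="python")
print(df)
```

Reading the body into memory first trades some RAM for compatibility; for large objects, downloading to a temporary file would work the same way.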

Grayscale image array rotation for data augmentation

I am trying to rotate some images for data augmentation, to train a network for an image-segmentation task. After a lot of searching, the best candidate for rotating each image and its corresponding mask was the scipy.ndimage.rotate function. The problem is that after rotating the mask image's numpy array (which contains only the pixel values 0 and 255), the rotated mask contains all values from 0 to 255, while I expect it to still contain only 0 and 255.
Here is the code:
from scipy.ndimage import rotate
import numpy as np
sample = dataset[1]
print(np.unique(sample['image']))
print(np.unique(sample['mask']))
print(sample['image'].shape)
print(sample['mask'].shape)
rot_image = rotate(sample['image'], 60, reshape = False)
rot_mask = rotate(sample['mask'], 60, reshape = False)
print(np.unique(rot_image))
print(np.unique(rot_mask))
print(rot_image.shape)
print(rot_mask.shape)
Here are the results:
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53
54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89
90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107
108 109 110 111 112 113 114 115 118 119 120 121 125 139]
[ 0 255]
(512, 512, 1)
(512, 512, 1)
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53
54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89
90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107
108 109 110 111 112 113 115 117 118 124 125 132 135]
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 17 18 20
24 25 26 28 31 34 35 38 39 41 42 43 45 46 48 49 50 51
52 58 59 62 66 67 68 73 75 76 79 80 82 85 86 88 90 96
98 101 108 109 111 114 116 118 119 123 124 125 127 128 130 138 140 142
146 148 151 156 157 158 161 164 165 166 168 169 176 180 184 185 188 189
194 196 197 198 199 201 203 204 205 207 208 210 211 213 216 217 218 219
220 221 222 225 228 229 230 231 233 234 235 237 239 240 241 242 243 244
245 246 247 248 249 250 251 252 253 254 255]
(512, 512, 1)
(512, 512, 1)
Rotating an image array seems like a simple problem, but I've been searching for days and didn't find any solution. I am really confused about how to prevent the mask array values (0 and 255) from taking on all values from 0 to 255 after rotation. I mean something like this:
x = np.unique(sample['mask'])
rot_mask = rotate(sample['mask'], 30, reshape = False)
x_rot = np.unique(rot_mask)
print(np.unique(x - x_rot))
[ 0]
Since you are using numpy arrays to represent images, why not use numpy functions? This library has all sorts of array manipulations. Try the rot90 function.
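As a sketch (the mask shape and values below are hypothetical, mirroring the question's 0/255 mask): np.rot90 only moves pixels, without any interpolation, so the set of values is preserved exactly:

```python
import numpy as np

# Hypothetical binary mask containing only 0 and 255, like the question's.
mask = np.zeros((8, 8), dtype=np.uint8)
mask[2:6, 3:5] = 255

rot = np.rot90(mask)  # exact 90-degree rotation; pixels are only moved
print(np.unique(rot))
```

Note that np.rot90 only handles multiples of 90 degrees; for arbitrary angles such as the 60 degrees in the question, passing order=0 (nearest-neighbour) to scipy.ndimage.rotate likewise avoids interpolated in-between values.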

re.findall seems to have side effects

I'm rather new to Python and interested in the imaging library (Pillow). I wrote a program that collects data from images and writes it to a file. When retrieving the data from the file, I use regular expressions (re.findall()) to find and separate the data, then a print statement to see the data in the terminal (Linux).
The part that creates and writes to the file is:
fout = open('pic_data', 'a')
fout.write(' '.join(map(str, FN1)))
fout.write('\n')
fout.write(' '.join(map(str, X1)))
fout.write('\n')
fout.close()
The data file contains:
/img-col/new-one/pic1_picture-file_1280.jpg
19 16 7 197 161 127 38 28 18 180 119 90 202 124 102 215 151 116 255 235 208 252 216 192 244 208 174 84 36 26 193 158 126 170 118 81
/img-col/new-one/pic2_picture-file_500.jpg
108 74 64 51 40 38 152 116 102 155 121 109 165 133 120 198 171 160 255 255 253 119 85 73 26 28 23 26 26 24 90 67 61 28 30 27
/img-col/new-one/pic3_picture-file_500.jpg
169 126 91 37 24 18 170 115 74 158 117 87 196 130 96 201 136 104 207 135 110 200 136 101 160 110 61 188 121 92 196 174 153 176 158 144
/img-col/new-one/pic4_picture-file_500.jpg
108 74 64 51 40 38 152 116 102 155 121 109 165 133 120 198 171 160 255 255 253 119 85 73 26 28 23 26 26 24 90 67 61 28 30 27
/img-col/new-one/pic5_picture-file_1280.jpg
19 16 7 197 161 127 38 28 18 180 119 90 202 124 102 215 151 116 255 235 208 252 216 192 244 208 174 84 36 26 193 158 126 170 118 81
The code that opens the file and prints it to the terminal is:
import re
from numpy import *

file = open('pic_data', 'r')
for line in file.readlines():
    line = line.strip('\n')
    #print line, type(line)
    a1 = re.findall('^\/.*\.jpg', line, flags=0)
    print a1, 'number 1 - a1'
    a2 = re.findall('^\d.*\d$', line, flags=0)
    print a2, 'number 2 - a2'
    x = array(a2)
    print x, 'number 3 - x'
This is where the misbehavior happens: when printing to the terminal, there are empty holes in the data. This is what the output looks like:
grumpy#grumpy-desktop ~ $ python /home/grumpy/z-working-stuff/py-2.py
['/img-col/new-one/pic1_picture-file_1280.jpg'] number 1 - a1
[] number 2 - a2
[] number 3 - x
[] number 1 - a1
['19 16 7 197 161 127 38 28 18 180 119 90 202 124 102 215 151 116 255 235 208 252 216 192 244 208 174 84 36 26 193 158 126 170 118 81'] number 2 - a2
[ '19 16 7 197 161 127 38 28 18 180 119 90 202 124 102 215 151 116 255 235 208 252 216 192 244 208 174 84 36 26 193 158 126 170 118 81'] number 3 - x
['/img-col/new-one//img-col/new-one/pic2_picture-file_500.jpg'] number 1 - a1
[] number 2 - a2
[] number 3 - x
[] number 1 - a1
['108 74 64 51 40 38 152 116 102 155 121 109 165 133 120 198 171 160 255 255 253 119 85 73 26 28 23 26 26 24 90 67 61 28 30 27'] number 2 - a2
[ '108 74 64 51 40 38 152 116 102 155 121 109 165 133 120 198 171 160 255 255 253 119 85 73 26 28 23 26 26 24 90 67 61 28 30 27'] number 3 - x
['/img-col/new-one//img-col/new-one/pic3_picture-file_500.jpg'] number 1 - a1
[] number 2 - a2
[] number 3 - x
[] number 1 - a1
['169 126 91 37 24 18 170 115 74 158 117 87 196 130 96 201 136 104 207 135 110 200 136 101 160 110 61 188 121 92 196 174 153 176 158 144'] number 2 - a2
[ '169 126 91 37 24 18 170 115 74 158 117 87 196 130 96 201 136 104 207 135 110 200 136 101 160 110 61 188 121 92 196 174 153 176 158 144'] number 3 - x
['/img-col/new-one//img-col/new-one/pic4_picture-file_500.jpg'] number 1 - a1
[] number 2 - a2
[] number 3 - x
[] number 1 - a1
['108 74 64 51 40 38 152 116 102 155 121 109 165 133 120 198 171 160 255 255 253 119 85 73 26 28 23 26 26 24 90 67 61 28 30 27'] number 2 - a2
[ '108 74 64 51 40 38 152 116 102 155 121 109 165 133 120 198 171 160 255 255 253 119 85 73 26 28 23 26 26 24 90 67 61 28 30 27'] number 3 - x
['/img-col/new-one//img-col/new-one/pic5_picture-file_1280.jpg'] number 1 - a1
[] number 2 - a2
[] number 3 - x
[] number 1 - a1
['19 16 7 197 161 127 38 28 18 180 119 90 202 124 102 215 151 116 255 235 208 252 216 192 244 208 174 84 36 26 193 158 126 170 118 81'] number 2 - a2
[ '19 16 7 197 161 127 38 28 18 180 119 90 202 124 102 215 151 116 255 235 208 252 216 192 244 208 174 84 36 26 193 158 126 170 118 81'] number 3 - x
grumpy#grumpy-desktop ~ $
Technically all the data is present, but with holes: the empty square brackets, which should not be empty.
[] number 2 - a2
[] number 3 - x
[] number 1 - a1
The 'number # - X' labels are for debugging, to show which print produced which line.
The problem is that I am running a correlation (coef = corrcoef(x, y)) on the data, and the module chokes on the empty data sets. Where are these empty matches coming from, and how can I avoid or ignore them?
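A minimal sketch of what is happening (and one way to skip the empty matches): each line of the file is either a path or a row of numbers, so exactly one of the two patterns matches and the other findall legitimately returns an empty list. Testing which pattern matched before collecting avoids the holes. The regexes mirror the question's; the sample lines are hypothetical:

```python
import re

# Hypothetical lines mirroring the data file's alternating format.
lines = [
    "/img-col/new-one/pic1_picture-file_1280.jpg",
    "19 16 7 197 161 127",
]

paths, rows = [], []
for line in lines:
    if re.match(r'^/.*\.jpg$', line):
        paths.append(line)          # a path line never matches the digit pattern
    elif re.match(r'^\d.*\d$', line):
        rows.append([int(v) for v in line.split()])

print(paths)
print(rows)
```

Collecting paths and number rows into separate lists this way means corrcoef never sees an empty match.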

Array reshape not mapping correctly to numpy meshgrid

I have a long 121-element array where the data is stored in ascending order, and I want to reshape it to an 11x11 matrix, so I use the NumPy reshape command:
Z = data.attributevalue[2,time,axial,:]
Z = np.reshape(Z, (int(math.sqrt(datacount)), int(math.sqrt(datacount))))
The data should be oriented in a Cartesian plane, and I create the mesh grid with the following:
x = np.arange(1.75, 12.5, 1)
y = np.arange(1.75, 12.5, 1)
X,Y = np.meshgrid(x, y)
The issue is that the rows of Z are in the wrong order, so the data that should be in the last row of the matrix ends up in the first and vice versa. I want to rearrange it so the rows are filled in the proper manner. The starting array Z is assembled in the arrangement [datapoint #1, datapoint #2, ..., datapoint #N]. Datapoint #1 should be in the top left and the last point in the bottom right. Is there a simple way of accomplishing this, or do I have to write a function to change the order of the rows?
My plot statement is the following:
surf = self.ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap=cm.jet,
                            linewidth=1, antialiased=True)
***UPDATE***
I tried populating the initial array backwards, and still no luck. I changed the orientation of the axis to the following:
y = np.arange(12.5, 1, -1)
This flipped the data, but my axis labels are wrong, so it is not a real solution to my issue. Any ideas?
It is possible that your original array does not look like a 1x121 array. The following code block shows how you reshape an array from 1x121 to 11x11.
import numpy as np
A = np.arange(1,122)
print A
print A.reshape((11,11))
Gives:
[ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121]
[[ 1 2 3 4 5 6 7 8 9 10 11]
[ 12 13 14 15 16 17 18 19 20 21 22]
[ 23 24 25 26 27 28 29 30 31 32 33]
[ 34 35 36 37 38 39 40 41 42 43 44]
[ 45 46 47 48 49 50 51 52 53 54 55]
[ 56 57 58 59 60 61 62 63 64 65 66]
[ 67 68 69 70 71 72 73 74 75 76 77]
[ 78 79 80 81 82 83 84 85 86 87 88]
[ 89 90 91 92 93 94 95 96 97 98 99]
[100 101 102 103 104 105 106 107 108 109 110]
[111 112 113 114 115 116 117 118 119 120 121]]
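If the rows come out in reversed order relative to the mesh grid, a minimal sketch (reusing the same 11x11 demo array) is to reverse the row order after reshaping with np.flipud, which leaves the x/y axes untouched:

```python
import numpy as np

A = np.arange(1, 122).reshape((11, 11))

# np.flipud reverses the row order (equivalent to A[::-1, :]),
# so the last reshaped row becomes the first.
B = np.flipud(A)
print(B[0])
print(B[-1])
```

This flips only the Z data, so the axis labels produced by meshgrid stay correct.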
