I have a table "tempcc" of values with X, Y geographic coordinates (I don't know how to attach files here; there are 86 rows in my CSV):
X Y Temp
0 35.268 55.618 1.065389
1 35.230 55.682 1.119160
2 35.508 55.690 1.026214
3 35.482 55.652 1.007834
4 35.289 55.664 1.087598
5 35.239 55.655 1.099459
6 35.345 55.662 1.066117
7 35.402 55.649 1.035958
8 35.506 55.643 0.991939
9 35.526 55.688 1.018137
10 35.541 55.695 1.017870
11 35.471 55.682 1.033929
12 35.573 55.668 0.985559
13 35.547 55.651 0.982335
14 35.425 55.671 1.042975
15 35.505 55.675 1.016236
16 35.600 55.681 0.985532
17 35.458 55.717 1.063691
18 35.538 55.720 1.037523
19 35.230 55.726 1.146047
20 35.606 55.707 1.003364
21 35.582 55.700 1.006711
22 35.350 55.696 1.087173
23 35.309 55.677 1.088988
24 35.563 55.687 1.003785
25 35.510 55.764 1.079220
26 35.334 55.736 1.119026
27 35.429 55.745 1.093300
28 35.366 55.752 1.119061
29 35.501 55.745 1.068676
.. ... ... ...
56 35.472 55.800 1.117183
57 35.538 55.855 1.134721
58 35.507 55.834 1.129712
59 35.256 55.845 1.211969
60 35.338 55.823 1.174397
61 35.404 55.835 1.162387
62 35.460 55.826 1.138965
63 35.497 55.831 1.130774
64 35.469 55.844 1.148516
65 35.371 55.510 0.945187
66 35.378 55.545 0.969400
67 35.456 55.502 0.902285
68 35.429 55.517 0.925932
69 35.367 55.710 1.090652
70 35.431 55.490 0.903296
71 35.284 55.606 1.051335
72 35.234 55.634 1.088135
73 35.284 55.591 1.041181
74 35.354 55.587 1.010446
75 35.332 55.581 1.015004
76 35.356 55.606 1.023234
77 35.311 55.545 0.997468
78 35.307 55.575 1.020845
79 35.363 55.645 1.047831
80 35.401 55.628 1.021373
81 35.340 55.629 1.045491
82 35.440 55.643 1.017227
83 35.293 55.630 1.063910
84 35.370 55.623 1.029797
85 35.238 55.601 1.065699
I am trying to create isolines with:
import numpy as np
from numpy import meshgrid, linspace
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt

data = tempcc
m = Basemap(lat_0=np.mean(tempcc['Y'].values),
            lon_0=np.mean(tempcc['X'].values),
            llcrnrlon=35, llcrnrlat=55.3,
            urcrnrlon=35.9, urcrnrlat=56.0, resolution='l')
x = linspace(m.llcrnrlon, m.urcrnrlon, data.shape[1])
y = linspace(m.llcrnrlat, m.urcrnrlat, data.shape[0])
xx, yy = meshgrid(x, y)
m.contour(xx, yy, data, latlon=True)
# plt.legend()
m.scatter(tempcc['X'].values, tempcc['Y'].values, latlon=True)
# m.contour(x, y, data, latlon=True)
But I can't get it to work correctly, although everything seems to be in place. As far as I understand, I have to build a 2D matrix of values, where index i corresponds to latitude and j to longitude, but I can't find an example.
The result I get is below: as you can see, the region is correct, but the interpolation is not good.
What's the matter? Which parameter have I forgotten?
You could use a Triangulation and then call tricontour() instead of contour().
import matplotlib.pyplot as plt
from matplotlib.tri import Triangulation
from mpl_toolkits.basemap import Basemap
import numpy as np

m = Basemap(lat_0=np.mean(tempcc['Y'].values),
            lon_0=np.mean(tempcc['X'].values),
            llcrnrlon=35, llcrnrlat=55.3,
            urcrnrlon=35.9, urcrnrlat=56.0, resolution='l')

# triangulate the scattered points and contour directly on the triangulation
triMesh = Triangulation(tempcc['X'].values, tempcc['Y'].values)
tctr = m.tricontour(triMesh, tempcc['Temp'].values,
                    levels=np.linspace(min(tempcc['Temp'].values),
                                       max(tempcc['Temp'].values), 7),
                    latlon=True)
plt.show()
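Alternatively, if you want the regular 2D grid you describe, the scattered values can first be interpolated onto one and then contoured as in your original attempt. A minimal sketch, assuming scipy is available (the grid size and interpolation method are illustrative, not from the original post):

import numpy as np
from scipy.interpolate import griddata

# build a regular lon/lat grid over the map extent
grid_x, grid_y = np.meshgrid(np.linspace(35.0, 35.9, 100),
                             np.linspace(55.3, 56.0, 100))
# interpolate the scattered temperatures onto that grid
grid_temp = griddata((tempcc['X'].values, tempcc['Y'].values),
                     tempcc['Temp'].values,
                     (grid_x, grid_y), method='linear')
m.contour(grid_x, grid_y, grid_temp, latlon=True)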
I'm working on a project involving railway tracks, and I'm trying to find an algorithm that can detect curves (left/right) or straight lines based on time-series GPS coordinates.
The data contains latitude, longitude, and altitude values along with many different sensor readings of a vehicle in a specific range of time.
Example dataframe of a curve looks as follows:
latitude longitude altitude
1 43.46724 -5.823470 145.0
2 43.46726 -5.823653 145.2
3 43.46728 -5.823837 145.4
4 43.46730 -5.824022 145.6
5 43.46730 -5.824022 145.6
6 43.46734 -5.824394 146.0
7 43.46738 -5.824768 146.3
8 43.46738 -5.824768 146.3
9 43.46742 -5.825146 146.7
10 43.46742 -5.825146 146.7
11 43.46746 -5.825527 147.1
12 43.46746 -5.825527 147.1
13 43.46750 -5.825910 147.3
14 43.46751 -5.826103 147.4
15 43.46753 -5.826295 147.6
16 43.46753 -5.826489 147.8
17 43.46753 -5.826685 148.1
18 43.46753 -5.826878 148.2
19 43.46752 -5.827073 148.4
20 43.46750 -5.827266 148.6
21 43.46748 -5.827458 148.9
22 43.46744 -5.827650 149.2
23 43.46741 -5.827839 149.5
24 43.46736 -5.828029 149.7
25 43.46731 -5.828212 150.1
26 43.46726 -5.828393 150.4
27 43.46720 -5.828572 150.5
28 43.46713 -5.828746 150.8
29 43.46706 -5.828914 151.0
30 43.46698 -5.829078 151.2
31 43.46690 -5.829237 151.4
32 43.46681 -5.829392 151.6
33 43.46671 -5.829540 151.8
34 43.46661 -5.829680 152.0
35 43.46650 -5.829816 152.2
36 43.46639 -5.829945 152.4
37 43.46628 -5.830066 152.4
38 43.46616 -5.830180 152.4
39 43.46604 -5.830287 152.5
40 43.46591 -5.830384 152.6
41 43.46579 -5.830472 152.8
42 43.46566 -5.830552 152.9
43 43.46552 -5.830623 153.2
44 43.46539 -5.830687 153.4
45 43.46526 -5.830745 153.6
46 43.46512 -5.830795 153.8
47 43.46499 -5.830838 153.9
48 43.46485 -5.830871 153.9
49 43.46471 -5.830895 154.0
50 43.46458 -5.830911 154.2
51 43.46445 -5.830919 154.3
52 43.46432 -5.830914 154.7
53 43.46418 -5.830896 155.1
54 43.46406 -5.830874 155.6
55 43.46393 -5.830842 155.9
56 43.46381 -5.830803 156.0
57 43.46368 -5.830755 155.5
58 43.46356 -5.830700 155.3
59 43.46332 -5.830575 155.1
I found out about spline interpolation in this old post asking the same question and decided to apply it to my problem:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.interpolate import make_interp_spline

# read csv file with pandas
df = pd.read_csv("Curvas/Curva_2.csv")
# forward-fill gaps in the latitude and longitude columns
df['latitude'].fillna(method='ffill', inplace=True)
df['longitude'].fillna(method='ffill', inplace=True)
# plot the raw data
# df.plot(x='longitude', y='latitude', style='o')
# plt.show()
# using longitude and latitude data, use spline interpolation to create a new curve
x = df['longitude']
y = df['latitude']
xnew = np.linspace(x.min(), x.max(), x.shape[0])
ynew = make_interp_spline(xnew, y)(x)
plt.plot(xnew, ynew, zorder=2)
plt.show()
## Error results using different coordinates/routes
## Curve_1 → Left (e = 0.04818886515888465)
## Curve_2 → Left (e = 0.019459215874292113)
## Straight_1 → Straight (e = 0.03839597167971931)
I've calculated the error between the interpolated points and the real ones, but I'm not quite sure how to proceed next or what threshold to use to figure out the direction.
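One heuristic that may help here (my own suggestion, not from the original post, and it skips the spline entirely): the sign of the 2D cross product between successive direction vectors tells you whether the path bends left or right. A minimal sketch, where the threshold value is purely illustrative and would need tuning to your data:

import numpy as np

def classify_turn(lon, lat, straight_threshold=1e-7):
    # direction vectors between consecutive GPS fixes
    pts = np.column_stack([lon, lat])
    v = np.diff(pts, axis=0)
    # z-component of the 2D cross product of successive direction vectors:
    # positive = turning left (counter-clockwise), negative = turning right
    cross = v[:-1, 0] * v[1:, 1] - v[:-1, 1] * v[1:, 0]
    total = cross.sum()
    if abs(total) < straight_threshold:
        return 'Straight'
    return 'Left' if total > 0 else 'Right'

print(classify_turn(df['longitude'].values, df['latitude'].values))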
What I tried was this:
import numpy as np

def test_random(nr_selections, n, prob):
    selected = np.random.choice(n, size=nr_selections, replace=False, p=prob)
    print(str(nr_selections) + ': ' + str(selected))

n = 100
prob = np.random.choice(100, n)
prob = prob / np.sum(prob)  # only for demonstration purposes
for i in np.arange(10, 100, 10):
    np.random.seed(123)
    test_random(i, n, prob)
The result was:
10: [68 32 25 54 72 45 96 67 49 40]
20: [68 32 25 54 72 45 96 67 49 40 36 74 46 7 21 20 53 65 89 77]
30: [68 32 25 54 72 45 96 67 49 40 36 74 46 7 21 20 53 62 86 60 35 37 8 48
52 47 31 92 95 56]
40: ...
Contrary to my expectation and hope, the 30 numbers selected do not contain all of the 20 numbers. I also tried using numpy.random.default_rng, but that only took me further from my desired output. I have also simplified the original problem somewhat in the example above. Any help would be greatly appreciated. Thank you!
Edit for clarification: I do not want to generate all the sequences in one loop (as in the example above) but rather use the related sequences in different runs of the same program (ideally without storing them anywhere).
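A minimal sketch of one possible workaround (my own suggestion, not from the original post): with a fixed seed, draw a single weighted permutation of all n items and take prefixes of it. Prefixes of the same permutation are nested by construction and reproducible across runs without being stored anywhere.

import numpy as np

def nested_selection(nr_selections, n, prob, seed=123):
    rng = np.random.default_rng(seed)
    # draw one full weighted permutation of all n items...
    full_order = rng.choice(n, size=n, replace=False, p=prob)
    # ...and return its first nr_selections entries (a nested prefix)
    return full_order[:nr_selections]

n = 100
weights = np.random.default_rng(0).random(n)   # demonstration weights only
weights /= weights.sum()
print(nested_selection(10, n, weights))
print(nested_selection(20, n, weights))        # its first 10 entries equal the call above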
I am new to using Python.
I am trying to graph two variables, Y1 and Y2 (on a secondary y-axis), against the date on the x-axis, from a CSV file.
I think my main problem is converting the date column from the CSV.
Also, is it possible to save the three graphs separately according to the ID (A, B, C)? Thanks a lot.
I have added the CSV file that I have and an image of the figure that I am looking for.
Thanks a lot for your advice.
ID date Y1 Y2
A 40480 136 83
A 41234 173 23
A 41395 180 29
A 41458 124 60
A 41861 158 27
A 42441 152 26
A 43009 155 51
A 43198 154 38
B 40409 185 71
B 40612 156 36
B 40628 165 39
B 40989 139 77
B 41346 138 20
B 41558 132 85
B 41872 157 58
B 41992 120 91
B 42245 139 43
B 42397 131 34
B 42745 114 68
C 40711 110 68
C 40837 156 38
C 40946 110 63
C 41186 161 46
C 41243 187 20
C 41494 122 55
C 41970 103 19
C 42183 148 78
C 42247 115 33
C 42435 132 92
C 42720 187 43
C 43228 127 28
C 43426 183 45
Try the matplotlib library; if I understood you right, it should work.
from mpl_toolkits import mplot3d
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

fig = plt.figure()
ax = plt.axes(projection='3d')

# Data for a three-dimensional line (date, y1 and y2 taken from your columns)
zaxis = y1
xaxis = date
yaxis = y2
ax.plot3D(xaxis, yaxis, zaxis, 'red')

# Data for three-dimensional scattered points
zdat = y1
xdat = date
ydat = y2
ax.scatter3D(xdat, ydat, zdat, c=xdat, cmap='Greens')
If I understand you correctly, you are looking for three separate graphs for ID=A, ID=B, ID=C. Here is how you could get that:
import pandas as pd
import pylab as plt

data = pd.read_csv('data.dat', sep='\t')  # read your data file; you might have a different name here

for i, (label, subset) in enumerate(data.groupby('ID')):
    plt.subplot(131 + i)
    plt.plot(subset['date'], subset['Y1'])
    plt.plot(subset['date'], subset['Y2'], 'o')
    plt.title('ID: {}'.format(label))

plt.show()
Note that this treats your dates as integers (same as in the datafile).
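If the integers in the date column are Excel-style serial day numbers (that is an assumption on my part), they can be converted to real dates, and Y2 can go on a secondary y-axis with twinx(); saving one figure per ID then follows naturally. A minimal sketch building on the code above:

import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_csv('data.dat', sep='\t')
# assuming the integers are Excel-style serial day numbers
data['date'] = pd.to_datetime(data['date'], unit='D', origin='1899-12-30')

for label, subset in data.groupby('ID'):
    fig, ax1 = plt.subplots()
    ax2 = ax1.twinx()                                # secondary y-axis for Y2
    ax1.plot(subset['date'], subset['Y1'], 'b-')
    ax2.plot(subset['date'], subset['Y2'], 'go')
    ax1.set_ylabel('Y1')
    ax2.set_ylabel('Y2')
    ax1.set_title('ID: {}'.format(label))
    fig.autofmt_xdate()                              # tilt the date labels
    fig.savefig('plot_{}.png'.format(label))         # one file per ID
plt.show()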
I'm trying to return a list of lists of the vertical, horizontal, and diagonal nearest neighbors of every item in a 2D numpy array.
import numpy as np
import copy

tilemap = np.arange(99).reshape(11, 9)
print(tilemap)

def get_neighbor(pos, array):
    x = copy.deepcopy(pos[0])
    y = copy.deepcopy(pos[1])
    grid = copy.deepcopy(array)
    split = []
    split.append([grid[y-1][x-1]])
    split.append([grid[y-1][x]])
    split.append([grid[y-1][x+1]])
    split.append([grid[y][x-1]])
    split.append([grid[y][x+1]])
    split.append([grid[y+1][x-1]])
    split.append([grid[y+1][x]])
    split.append([grid[y+1][x+1]])
    print("\n Neighbors of ITEM[{}]\n {}".format(grid[y][x], split))

cordinates = [5, 6]
get_neighbor(pos=cordinates, array=tilemap)
I would want a list like this, for the first item (0):
[[1], [12], [13],
[1, 2], [12, 24], [13, 26],
[1, 2, 3], [12, 24, 36], [13, 26, 39], ...
until it completely reaches the boundaries, then it proceeds to the second item (1)
and keeps adding to the list. If there is a neighbor above, it should be added too.
MY RESULT
[[ 0 1 2 3 4 5 6 7 8]
[ 9 10 11 12 13 14 15 16 17]
[18 19 20 21 22 23 24 25 26]
[27 28 29 30 31 32 33 34 35]
[36 37 38 39 40 41 42 43 44]
[45 46 47 48 49 50 51 52 53]
[54 55 56 57 58 59 60 61 62]
[63 64 65 66 67 68 69 70 71]
[72 73 74 75 76 77 78 79 80]
[81 82 83 84 85 86 87 88 89]
[90 91 92 93 94 95 96 97 98]]
Neighbors of ITEM[59]
[[49], [50], [51], [58], [60], [67], [68], [69]]
Alright, what about using a function like this? It takes the array, your target index, and the "radius" of the elements to be included.
import numpy as np

def get_idx_adj(arr, idx, radius):
    num_rows, num_cols = arr.shape
    idx_row, idx_col = idx
    # clamp the window to the array bounds in each direction
    slice_1 = np.s_[max(0, idx_row - radius):min(num_rows, idx_row + radius + 1)]
    slice_2 = np.s_[max(0, idx_col - radius):min(num_cols, idx_col + radius + 1)]
    return arr[slice_1, slice_2]
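For example, with the tilemap from the question (element 59 sits at row 6, column 5), a quick usage check might look like this:

print(get_idx_adj(tilemap, (6, 5), 1))
# [[49 50 51]
#  [58 59 60]
#  [67 68 69]]
print(get_idx_adj(tilemap, (6, 5), 2))  # radius 2 widens the window, still clamped at the edges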
I'm currently trying to find the best way to transform the index of the element, so that the function can be used on its own output successively to get all the subarrays of various sizes.
I might be doing something wrong, but I can't figure out what it is. I'm trying to reproduce some results from a real estate dataset from Baton Rouge, LA. The original code is written in WinBUGS here. There are some minor differences between the dataset used in the link above and the one I'm using right now; however, I think they are not significant. This is the code:
import pymc as pm, pandas as pd, numpy as np
from scipy.spatial.distance import pdist, squareform
from numpy.linalg import inv

# Loading dataset
df = pd.read_table('http://pastebin.com/raw.php?i=41us4HVj', sep=' ')

# Setting priors
beta = pm.Normal('beta', 0.0, 0.1, size=3)
mu = pm.Lambda('mu', lambda b=beta:
               b[0] + b[1]*df['LivingArea']/1000.0 + b[2]*df['Age'])
tau = pm.Gamma('tau', 0.1, 0.1)
phi = pm.Uniform('phi', 0.1, 10)

# Building a distance matrix from the coordinates
A = squareform(pdist(np.array(list(zip(df['Latitude'], df['Longitude'])))))

# Using the powered exponential to obtain a precision matrix
precision = pm.Lambda('exp', lambda u=A, tau=tau, phi=phi, kappa=1:
                      inv((1/tau)*np.exp(-(phi*u)**kappa)))
If I inspect the value of mu, I get this:
mu.value
Out[2]:
0 24.568272
1 2.909063
2 -2.778916
3 28.206696
4 -0.270571
5 -2.865153
6 14.158162
7 31.466438
8 44.681351
9 22.191397
10 -6.412350
11 11.709424
12 25.453254
13 24.366674
14 34.711048
...
55 24.625763
56 21.763089
57 65.108136
58 15.428714
59 20.992329
60 36.384037
61 16.730507
62 23.021763
63 54.887747
64 30.612696
65 52.685840
66 59.612372
67 18.822422
68 18.940658
69 72.678188
Length: 70, dtype: float64
However, after running MvNormal, the value of mu is changed:
w = pm.MvNormal('w', mu, precision)
mu.value
Out[4]:
0 -107.913779
1 -1.243466
2 8.283926
3 26.412651
4 1.806728
5 -1.300734
6 -80.657396
7 71.614343
8 -3.817774
9 -10.283683
10 -3.804962
11 8.639403
12 18.927553
13 -10.004095
14 -37.431770
...
55 88.612179
56 18.011459
57 -7.421157
58 7.974531
59 -3.697444
60 -17.520367
61 36.453531
62 -39.235745
63 -6.701737
64 68.672902
65 -44.040923
66 11.826075
67 -21.995198
68 -15.886362
69 4.653335
Length: 70, dtype: float64
By the way, this only happens to mu. The precision variable remains the same.
Did I make a mistake?
UPDATE:
I have already filed an issue on GitHub. After further inspection, the culprit seems to be the pd.Series object used inside the mu variable. If I convert it to an array or remove the Series, mu does not change after calling MvNormal.
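A minimal sketch of that workaround (assuming PyMC2, as in the code above): pass plain numpy arrays into the Lambda, so the deterministic's value is not a pandas Series that downstream nodes can mutate in place.

living_area = df['LivingArea'].values / 1000.0   # plain ndarrays instead of pandas Series
age = df['Age'].values

mu = pm.Lambda('mu', lambda b=beta:
               b[0] + b[1]*living_area + b[2]*age)
w = pm.MvNormal('w', mu, precision)
# mu.value now keeps the values computed from beta after w is created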
Thanks!