How to make nested, grouped boxplots with groupby in Python

I have a dataset of more than 50 features that correspond to specific movements during leg rehabilitation. I compare the group that used our rehabilitation device with a group recovering without it. Each group includes patients with 3 diagnoses, and I want to compare boxplots of before (red boxplot) and after (blue boxplot) for each diagnosis.
Control group data:
dataKONTR
Row DG DKK ... LOS_DCL_LB LOS_DCL_L LOS_DCL_LF
0 Williams1 distorze 0.0 ... 63 57 78
1 Williams2 distorze 0.0 ... 91 68 67
2 Norton1 LCA 1.0 ... 58 90 64
3 Norton2 LCA 1.0 ... 29 91 87
4 Chavender1 distorze 1.0 ... 61 56 75
5 Chavender2 distorze 1.0 ... 54 74 80
6 Bendis1 distorze 1.0 ... 32 57 97
7 Bendis2 distorze 1.0 ... 55 69 79
8 Shawn1 AS 1.0 ... 15 74 75
9 Shawn2 AS 1.0 ... 67 86 79
10 Cichy1 LCA 0.0 ... 45 83 80
This is the snippet I was using and the output I am getting.
temp = "c:/Users/novos/ŠKOLA/Statistika/data Mariana/%s.xlsx"
dataKU = pd.read_excel(temp % "VestlabEXP_KU", engine = "openpyxl", skipfooter= 85) # patients using our rehabilitation tool
dataKONTR = pd.read_excel(temp % "VestlabEXP_kontr", engine = "openpyxl", skipfooter=51) # control group
dataKU_diag = dataKU.dropna()
dataKONTR_diag = dataKONTR.dropna()
dataKUBefore = dataKU_diag[dataKU_diag['Row'].str.contains("1")] # Patients data ending with 1 are before rehab
dataKUAfter = dataKU_diag[dataKU_diag['Row'].str.contains("2")] # Patients data ending with 2 are after rehab
dataKONTRBefore = dataKONTR_diag[dataKONTR_diag['Row'].str.contains("1")]
dataKONTRAfter = dataKONTR_diag[dataKONTR_diag['Row'].str.contains("2")]
b1 = dataKUBefore.boxplot(column=list(dataKUBefore.filter(regex='LOS_RT')), by="DG", rot = 45, color=dict(boxes='r', whiskers='r', medians='r', caps='r'),layout=(2,4),return_type='axes')
plt.ylim(0.5, 1.5)
plt.suptitle("Before, KU")
b2 = dataKUAfter.boxplot(column=list(dataKUAfter.filter(regex='LOS_RT')), by="DG", rot = 45, color=dict(boxes='b', whiskers='b', medians='b', caps='b'),layout=(2,4),return_type='axes')
# dataKUPredP
plt.suptitle("After, KU")
plt.ylim(0.5, 1.5)
plt.show()
Output is in two figures (red boxplot is all the "before rehab" data and blue boxplot is all the "after rehab")
Can you help me place the red and blue boxplots next to each other for each diagnosis?
Thank you for any help.

You can try to concatenate the data into a single dataframe:
dataKUPlot = pd.concat({
    'Before': dataKUBefore,
    'After': dataKUAfter,
}, names=['When'])
You should see an additional index level named When in the output.
Using the example data you posted it looks like this:
>>> pd.concat({'Before': df, 'After': df}, names=['When'])
Row DG DKK ... LOS_DCL_LB LOS_DCL_L LOS_DCL_LF
When
Before 0 Williams1 distorze 0.0 ... 63 57 78
1 Williams2 distorze 0.0 ... 91 68 67
2 Norton1 LCA 1.0 ... 58 90 64
3 Norton2 LCA 1.0 ... 29 91 87
4 Chavender1 distorze 1.0 ... 61 56 75
After 0 Williams1 distorze 0.0 ... 63 57 78
1 Williams2 distorze 0.0 ... 91 68 67
2 Norton1 LCA 1.0 ... 58 90 64
3 Norton2 LCA 1.0 ... 29 91 87
4 Chavender1 distorze 1.0 ... 61 56 75
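As a quick self-contained check (with toy frames standing in for dataKUBefore/dataKUAfter, which are made up here), you can verify that the dictionary keys become a named outer index level:

```python
import pandas as pd

# Toy stand-ins for the Before/After frames from the question
before = pd.DataFrame({'DG': ['LCA', 'AS'], 'LOS_RT_A': [58, 15]})
after = pd.DataFrame({'DG': ['LCA', 'AS'], 'LOS_RT_A': [91, 67]})

combined = pd.concat({'Before': before, 'After': after}, names=['When'])

# The dictionary keys become an outer index level named 'When'
print(combined.index.names)
```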
Then you can plot all of the boxes with a single command and thus on the same plots, by modifying the by grouper:
dataKUPlot.boxplot(column=dataKUPlot.filter(regex='LOS_RT').columns.to_list(), by=['DG', 'When'], rot=45, layout=(2, 4), return_type='axes')
I believe that’s the only “simple” way, though I’m afraid the result looks a little cluttered:
Any other way implies manual plotting with matplotlib, which also gives better control. For example, iterate over all desired columns:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

fig, axes = plt.subplots(nrows=2, ncols=3, sharey=True, sharex=True)
pos = 1 + np.arange(max(dataKUBefore['DG'].nunique(), dataKUAfter['DG'].nunique()))
redboxes = {f'{x}props': dict(color='r') for x in ['box', 'whisker', 'median', 'cap']}
blueboxes = {f'{x}props': dict(color='b') for x in ['box', 'whisker', 'median', 'cap']}
ax_it = axes.flat
for colname, ax in zip(dataKUBefore.filter(regex='LOS_RT').columns, ax_it):
    # Making a dataframe here to ensure the same ordering
    show = pd.DataFrame({
        'before': dataKUBefore[colname].groupby(dataKUBefore['DG']).agg(list),
        'after': dataKUAfter[colname].groupby(dataKUAfter['DG']).agg(list),
    })
    ax.boxplot(show['before'].values, positions=pos - .15, **redboxes)
    ax.boxplot(show['after'].values, positions=pos + .15, **blueboxes)
    ax.set_xticks(pos)
    ax.set_xticklabels(show.index, rotation=45)
    ax.set_title(colname)
    ax.grid(axis='both')
# Hide remaining axes:
for ax in ax_it:
    ax.axis('off')

You could add a new column to separate 'Before' and 'After'. Seaborn's boxplots can use that new column as hue. sns.catplot(kind='box', ...) creates a grid of boxplots:
import seaborn as sns
import pandas as pd
import numpy as np
names = ['Adams', 'Arthur', 'Buchanan', 'Buren', 'Bush', 'Carter', 'Cleveland', 'Clinton', 'Coolidge', 'Eisenhower', 'Fillmore', 'Ford', 'Garfield', 'Grant', 'Harding', 'Harrison', 'Hayes', 'Hoover', 'Jackson', 'Jefferson', 'Johnson', 'Kennedy', 'Lincoln', 'Madison', 'McKinley', 'Monroe', 'Nixon', 'Obama', 'Pierce', 'Polk', 'Reagan', 'Roosevelt', 'Taft', 'Taylor', 'Truman', 'Trump', 'Tyler', 'Washington', 'Wilson']
rows = np.array([(name + '1', name + '2') for name in names]).flatten()
dataKONTR = pd.DataFrame({'Row': rows,
                          'DG': np.random.choice(['AS', 'Distorze', 'LCA'], len(rows)),
                          'LOS_RT_A': np.random.randint(15, 100, len(rows)),
                          'LOS_RT_B': np.random.randint(15, 100, len(rows)),
                          'LOS_RT_C': np.random.randint(15, 100, len(rows)),
                          'LOS_RT_D': np.random.randint(15, 100, len(rows)),
                          'LOS_RT_E': np.random.randint(15, 100, len(rows)),
                          'LOS_RT_F': np.random.randint(15, 100, len(rows))})
dataKONTR = dataKONTR.dropna()
dataKONTR['When'] = ['Before' if r[-1] == '1' else 'After' for r in dataKONTR['Row']]
cols = [c for c in dataKONTR.columns if 'LOS_RT' in c]
df_long = dataKONTR.melt(value_vars=cols, var_name='Which', value_name='Value', id_vars=['When', 'DG'])
g = sns.catplot(kind='box', data=df_long, x='DG', col='Which', col_wrap=3, y='Value', hue='When')
g.set_axis_labels('', '') # remove the x and y labels
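A quick way to sanity-check the melt step: the long frame should have one row per (original row, measured column) pair. A small sketch with made-up data:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
demo = pd.DataFrame({
    'When': rng.choice(['Before', 'After'], 12),
    'DG': rng.choice(['AS', 'Distorze', 'LCA'], 12),
    'LOS_RT_A': rng.integers(15, 100, 12),
    'LOS_RT_B': rng.integers(15, 100, 12),
})
cols = ['LOS_RT_A', 'LOS_RT_B']
df_long = demo.melt(value_vars=cols, var_name='Which', value_name='Value',
                    id_vars=['When', 'DG'])
# 12 rows x 2 measured columns -> 24 long rows; columns: When, DG, Which, Value
print(df_long.shape)
```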


Seaborn figure with multiple axis (year) and month on x-axis

I'm trying to get familiar with seaborn. I want to create one or both of the figures below (bar plot & line plot): 12 months on the x-axis and 3 years, each with its own line or bar color.
Here is the script that creates the data, with the resulting data shown in comments.
#!/usr/bin/env python3
import random as rd
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
rd.seed(0)
a = pd.DataFrame({
    'Y': [2016]*12 + [2017]*12 + [2018]*12,
    'M': list(range(1, 13)) * 3,
    'n': rd.choices(range(100), k=36)
})
print(a)
# Y M n
# 0 2016 1 84
# 1 2016 2 75
# 2 2016 3 42
# ...
# 21 2017 10 72
# 22 2017 11 89
# 23 2017 12 68
# 24 2018 1 47
# 25 2018 2 10
# ...
# 34 2018 11 54
# 35 2018 12 1
b = a.pivot_table(columns='M', index='Y')
print(b)
# n
# M 1 2 3 4 5 6 7 8 9 10 11 12
# Y
# 2016 84 75 42 25 51 40 78 30 47 58 90 50
# 2017 28 75 61 25 90 98 81 90 31 72 89 68
# 2018 47 10 43 61 91 96 47 86 26 80 54 1
I'm not even sure which form of the dataframe (a or b or something else) I should use here.
What I tried
I assume that in seaborn terms it is a countplot() I want. Maybe I am wrong?
>>> sns.countplot(data=a)
<AxesSubplot:ylabel='count'>
>>> plt.show()
The result is meaningless.
And I don't know how I could feed the pivoted dataframe b to seaborn.
You could do the first plot with a relplot, using hue as a categorical grouping variable:
sns.relplot(data=a, x='M', y='n', hue='Y', kind='line')
I'd use these colour and size settings to make it more similar to the plot you wanted:
sns.relplot(data=a, x='M', y='n', hue='Y', kind='line', palette='pastel', height=3, aspect=3)
The equivalent axes-level code would be sns.lineplot(data=a, x='M', y='n', hue='Y', palette='pastel')
Your second plot can be done with catplot:
sns.catplot(kind='bar', data=a, x='M', y='n', hue='Y')
Or the axes-level function sns.barplot. In that case let's move the default legend location:
sns.barplot(data=a, x='M', y='n', hue='Y')
plt.legend(bbox_to_anchor=(1.05, 1))
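As for feeding the pivoted frame b to a plot: pandas' own plotting can consume it directly, without seaborn (a sketch; transposing puts the months on the x-axis and gives one bar colour per year):

```python
import random as rd
import pandas as pd
import matplotlib
matplotlib.use('Agg')  # non-interactive backend for this sketch
import matplotlib.pyplot as plt

rd.seed(0)
a = pd.DataFrame({
    'Y': [2016] * 12 + [2017] * 12 + [2018] * 12,
    'M': list(range(1, 13)) * 3,
    'n': rd.choices(range(100), k=36),
})
b = a.pivot_table(columns='M', index='Y')

# b['n'] is a 3x12 year-by-month table; transpose it so months form the x-axis
ax = b['n'].T.plot(kind='bar')
print(b['n'].shape)
```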

How to detect curves and straight paths using GPS coordinates?

I'm working on a project involving railway tracks and I'm trying to find an algorithm that can detect curves (left/right) or straight segments based on time-series GPS coordinates.
The data contains latitude, longitude, and altitude values along with many different sensor readings of a vehicle in a specific range of time.
Example dataframe of a curve looks as follows:
latitude longitude altitude
1 43.46724 -5.823470 145.0
2 43.46726 -5.823653 145.2
3 43.46728 -5.823837 145.4
4 43.46730 -5.824022 145.6
5 43.46730 -5.824022 145.6
6 43.46734 -5.824394 146.0
7 43.46738 -5.824768 146.3
8 43.46738 -5.824768 146.3
9 43.46742 -5.825146 146.7
10 43.46742 -5.825146 146.7
11 43.46746 -5.825527 147.1
12 43.46746 -5.825527 147.1
13 43.46750 -5.825910 147.3
14 43.46751 -5.826103 147.4
15 43.46753 -5.826295 147.6
16 43.46753 -5.826489 147.8
17 43.46753 -5.826685 148.1
18 43.46753 -5.826878 148.2
19 43.46752 -5.827073 148.4
20 43.46750 -5.827266 148.6
21 43.46748 -5.827458 148.9
22 43.46744 -5.827650 149.2
23 43.46741 -5.827839 149.5
24 43.46736 -5.828029 149.7
25 43.46731 -5.828212 150.1
26 43.46726 -5.828393 150.4
27 43.46720 -5.828572 150.5
28 43.46713 -5.828746 150.8
29 43.46706 -5.828914 151.0
30 43.46698 -5.829078 151.2
31 43.46690 -5.829237 151.4
32 43.46681 -5.829392 151.6
33 43.46671 -5.829540 151.8
34 43.46661 -5.829680 152.0
35 43.46650 -5.829816 152.2
36 43.46639 -5.829945 152.4
37 43.46628 -5.830066 152.4
38 43.46616 -5.830180 152.4
39 43.46604 -5.830287 152.5
40 43.46591 -5.830384 152.6
41 43.46579 -5.830472 152.8
42 43.46566 -5.830552 152.9
43 43.46552 -5.830623 153.2
44 43.46539 -5.830687 153.4
45 43.46526 -5.830745 153.6
46 43.46512 -5.830795 153.8
47 43.46499 -5.830838 153.9
48 43.46485 -5.830871 153.9
49 43.46471 -5.830895 154.0
50 43.46458 -5.830911 154.2
51 43.46445 -5.830919 154.3
52 43.46432 -5.830914 154.7
53 43.46418 -5.830896 155.1
54 43.46406 -5.830874 155.6
55 43.46393 -5.830842 155.9
56 43.46381 -5.830803 156.0
57 43.46368 -5.830755 155.5
58 43.46356 -5.830700 155.3
59 43.46332 -5.830575 155.1
I found out about spline interpolation in this old post asking the same question and decided to apply it to my problem:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.interpolate import make_interp_spline

# read csv file with pandas
df = pd.read_csv("Curvas/Curva_2.csv")
# forward-fill gaps in the latitude and longitude columns
df['latitude'].fillna(method='ffill', inplace=True)
df['longitude'].fillna(method='ffill', inplace=True)
# plot the data
# df.plot(x='longitude', y='latitude', style='o')
# plt.show()
# using longitude and latitude data, use spline interpolation to create a new curve
x = df['longitude']
y = df['latitude']
xnew = np.linspace(x.min(), x.max(), x.shape[0])
ynew = make_interp_spline(x, y)(xnew)  # fit the spline on (x, y), evaluate at xnew
plt.plot(xnew, ynew, zorder=2)
plt.show()
## Error results using different coordinates/routes
## Curve_1 → Left (e = 0.04818886515888465)
## Curve_2 → Left (e = 0.019459215874292113)
## Straight_1 → Straight (e = 0.03839597167971931)
I've calculated the error between the interpolated points and the real ones but I'm not quite sure how to proceed next or what threshold to use to figure out the direction.
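One alternative to thresholding the interpolation error is to look at the sign of the net heading change along the track: a consistently clockwise change in bearing suggests a right-hand curve, counter-clockwise a left-hand one. A rough sketch (the 10-degree threshold is a made-up value you would tune on your data):

```python
import numpy as np

def classify_direction(lats, lons, threshold_deg=10.0):
    """Classify a GPS track as 'Left', 'Right' or 'Straight' from
    the net change in heading between consecutive points."""
    lats = np.radians(np.asarray(lats, dtype=float))
    lons = np.radians(np.asarray(lons, dtype=float))
    dlon = np.diff(lons)
    # Forward azimuth (bearing) between consecutive fixes
    y = np.sin(dlon) * np.cos(lats[1:])
    x = (np.cos(lats[:-1]) * np.sin(lats[1:])
         - np.sin(lats[:-1]) * np.cos(lats[1:]) * np.cos(dlon))
    bearings = np.degrees(np.arctan2(y, x))
    # Net signed turn between first and last segment, wrapped to (-180, 180]
    turn = (bearings[-1] - bearings[0] + 180) % 360 - 180
    if turn > threshold_deg:
        return 'Right'
    if turn < -threshold_deg:
        return 'Left'
    return 'Straight'

# Tiny synthetic track: heading swings from east towards south-east,
# i.e. a clockwise (right-hand) turn
lats = [43.4672, 43.4672, 43.4671, 43.4669]
lons = [-5.8235, -5.8230, -5.8226, -5.8224]
print(classify_direction(lats, lons))
```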

How to do complex calculations in a pandas dataframe

sample dataframe:
df = pd.DataFrame({'sales': ['2020-01','2020-02','2020-03','2020-04','2020-05','2020-06'],
                   '2020-01': [24,42,18,68,24,30],
                   '2020-02': [24,42,18,68,24,30],
                   '2020-03': [64,24,70,70,88,57],
                   '2020-04': [22,11,44,3,5,78],
                   '2020-05': [11,35,74,12,69,51]})
I want to compute df['L2'] below.
I studied pandas rolling, groupby, etc., but cannot solve it.
Please read the L2 formula and give me your opinion.
L2 formula
L2(Jan-20) = 24
-------------------
sales 2020-01
0 2020-01 24
-------------------
L2(Feb-20) = 132 (sum of below 2x2 matrix)
sales 2020-01 2020-02
0 2020-01 24 24
1 2020-02 42 42
-------------------
L2(Mar-20) = 154 (sum of 2x2 matrix)
sales 2020-02 2020-03
0 2020-02 42 24
1 2020-03 18 70
-------------------
L2(Apr-20) = 187 (sum of below 2x2 matrix)
sales 2020-03 2020-04
0 2020-03 70 44
1 2020-04 70 3
output
Unnamed: 0 sales Jan-20 Feb-20 Mar-20 Apr-20 May-20 L2 L3
0 0 Jan-20 24 24 64 22 11 24 24
1 1 Feb-20 42 42 24 11 35 132 132
2 2 Mar-20 18 18 70 44 74 154 326
3 3 Apr-20 68 68 70 3 12 187 350
4 4 May-20 24 24 88 5 69 89 545
5 5 Jun-20 30 30 57 78 51 203 433
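In other words, L2 is the sum of a 2x2 window that slides down the diagonal, with the column window clamped at the final month. A short NumPy sketch using the numbers from the question:

```python
import numpy as np

# The monthly value matrix from the question (rows = sales months)
vals = np.array([[24, 24, 64, 22, 11],
                 [42, 42, 24, 11, 35],
                 [18, 18, 70, 44, 74],
                 [68, 68, 70,  3, 12],
                 [24, 24, 88,  5, 69],
                 [30, 30, 57, 78, 51]])

n_rows, n_cols = vals.shape
L2 = []
for k in range(n_rows):
    if k == 0:
        L2.append(int(vals[0, 0]))      # single cell for the first month
    else:
        c = min(k, n_cols - 1)          # clamp when the columns run out
        L2.append(int(vals[k-1:k+1, c-1:c+1].sum()))
print(L2)
```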
Values = f.values[:, 1:]
L2 = []
RANGE = Values.shape[0]
for a in range(RANGE):
    if a == 0:
        result = Values[a, a]
    else:
        if Values[a-1:a+1, a-1:a+1].shape == (2, 1):
            result = np.sum(Values[a-1:a+1, a-2:a])
        else:
            result = np.sum(Values[a-1:a+1, a-1:a+1])
    L2.append(result)
print(L2)
L2 output: [24, 132, 154, 187, 89, 203]
f["L2"]=L2
where f is the dataframe built in the full script below (there named df):
import pandas as pd
import numpy as np
# make a dataset
df = pd.DataFrame({'sales': ['2020-01','2020-02','2020-03','2020-04','2020-05','2020-06'],
                   '2020-01': [24,42,18,68,24,30],
                   '2020-02': [24,42,18,68,24,30],
                   '2020-03': [64,24,70,70,88,57],
                   '2020-04': [22,11,44,3,5,78],
                   '2020-05': [11,35,74,12,69,51]})
print(df)
# datawork(L2)
for i in range(0, df.shape[0]):
    if i == 0:
        df.loc[i, 'L2'] = df.loc[i, '2020-01']
    else:
        if i != df.shape[0]-1:
            df.loc[i, 'L2'] = df.iloc[i-1:i+1, i:i+2].sum().sum()
        if i == df.shape[0]-1:
            df.loc[i, 'L2'] = df.iloc[i-1:i+1, i-1:i+1].sum().sum()
print(df)
# sales 2020-01 2020-02 2020-03 2020-04 2020-05 L2
#0 2020-01 24 24 64 22 11 24.0
#1 2020-02 42 42 24 11 35 132.0
#2 2020-03 18 18 70 44 74 154.0
#3 2020-04 68 68 70 3 12 187.0
#4 2020-05 24 24 88 5 69 89.0
#5 2020-06 30 30 57 78 51 203.0
I tried another method.
This method reshapes to long format (melt in Python), but I applied it twice, because the time frequency of sales and the other columns in df is monthly rather than daily, so I did a second reshape to create an integer column corresponding to the monthly dates.
(I have used Stata more often than Python; in Stata I only need to reshape long once, since it understands monthly time frequency, and reshaping is much easier there than in pandas.)
If you are interested, take a look:
# 00.module
import pandas as pd
import numpy as np
from order import order # https://stackoverflow.com/a/68464246/16478699
# 0.make a dataset
df = pd.DataFrame({'sales': ['2020-01', '2020-02', '2020-03', '2020-04', '2020-05', '2020-06'],
                   '2020-01': [24, 42, 18, 68, 24, 30],
                   '2020-02': [24, 42, 18, 68, 24, 30],
                   '2020-03': [64, 24, 70, 70, 88, 57],
                   '2020-04': [22, 11, 44, 3, 5, 78],
                   '2020-05': [11, 35, 74, 12, 69, 51]})
df.to_stata('dataset.dta', version=119, write_index=False)
print(df)
# 1.reshape long(in python: melt)
t = list(df.columns)
t.remove('sales')
df_long = df.melt(id_vars='sales', value_vars=t, var_name='var', value_name='val')
df_long['id'] = list(range(1, df_long.shape[0] + 1)) # make id for another resape long
print(df_long)
# 2.another reshape long(in python: melt, reason: make int(col name: tid) corresponding to monthly date of sales and monthly columns in df)
df_long2 = df_long.melt(id_vars=['id', 'val'], value_vars=['sales', 'var'])
df_long2['tid'] = df_long2['value'].apply(lambda x: 1 + list(df_long2.value.unique()).index(x))
print(df_long2)
# 3.back to wide form with tid(in python: pd.pivot)
df_wide = pd.pivot(df_long2, index=['id', 'val'], columns='variable', values=['value', 'tid'])
df_wide.columns = df_wide.columns.map(lambda x: x[1] if x[0] == 'value' else f'{x[0]}_{x[1]}') # change multiindex columns name into just normal columns name
df_wide = df_wide.reset_index()
print(df_wide)
# 4.make values of L2
for i in df_wide.tid_sales.unique():
    if list(df_wide.tid_sales.unique()).index(i) + 1 == len(df_wide.tid_sales.unique()):
        df_wide.loc[df_wide['tid_sales'] == i, 'L2'] = df_wide.loc[
            ((df_wide['tid_sales'] == i) | (df_wide['tid_sales'] == i - 1))
            & ((df_wide['tid_var'] == i - 1) | (df_wide['tid_var'] == i - 2)), 'val'].sum()
    else:
        df_wide.loc[df_wide['tid_sales'] == i, 'L2'] = df_wide.loc[
            ((df_wide['tid_sales'] == i) | (df_wide['tid_sales'] == i - 1))
            & ((df_wide['tid_var'] == i) | (df_wide['tid_var'] == i - 1)), 'val'].sum()
print(df_wide)
# 5.back to shape of df with L2(reshape wide, in python: pd.pivot)
df_final = df_wide.drop(columns=df.filter(regex='^tid')) # no more columns starting with tid needed
df_final = pd.pivot(df_final, index=['sales', 'L2'], columns='var', values='val').reset_index()
df_final = order(df_final, 'L2', f_or_l='last') # order function is made by me
print(df_final)

Normalizing a color map for plotting a Confusion Matrix with ConfusionMatrixDisplay from Sklearn

I am trying to create a color map for my 10x10 confusion matrix, which is provided by sklearn. I would like to customize the color map so that it is normalized between [0, 1], but I have had no success. I am trying to use ax_ and matplotlib.colors.Normalize, but I am struggling to get anything to work, since ConfusionMatrixDisplay is a sklearn object that creates its plot differently from a usual matplotlib plot.
My code is the following:
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix
train_confuse_matrix = confusion_matrix(y_true = ytrain, y_pred = y_train_pred_labels)
print(train_confuse_matrix)
cm_display = ConfusionMatrixDisplay(train_confuse_matrix, display_labels = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck'])
print(cm_display)
cm_display.plot(cmap = 'Greens')
plt.show()
plt.clf()
[[3289 56 84 18 55 7 83 61 48 252]
[ 2 3733 0 1 2 1 16 1 3 220]
[ 81 15 3365 64 81 64 273 18 6 17]
[ 17 37 71 3015 127 223 414 44 6 64]
[ 3 1 43 27 3659 24 225 35 0 3]
[ 5 23 38 334 138 3109 224 80 4 25]
[ 3 1 19 10 12 7 3946 1 1 5]
[ 4 7 38 69 154 53 89 3615 2 27]
[ 62 67 12 7 25 3 62 4 3595 153]
[ 2 30 1 2 4 0 15 2 0 3957]]
Let's try imshow and annotate manually:
import numpy as np
import matplotlib.pyplot as plt

# normalize each row by its own total to get per-class accuracies
accuracies = conf_mat / conf_mat.sum(axis=1, keepdims=True)
fig, ax = plt.subplots(figsize=(10, 8))
cb = ax.imshow(accuracies, cmap='Greens')
plt.xticks(range(len(classes)), classes, rotation=90)
plt.yticks(range(len(classes)), classes)
for i in range(len(classes)):
    for j in range(len(classes)):
        color = 'green' if accuracies[i, j] < 0.5 else 'white'
        ax.annotate(f'{conf_mat[i, j]}', (i, j),
                    color=color, va='center', ha='center')
plt.colorbar(cb, ax=ax)
plt.show()
Output:
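One detail worth noting when computing the accuracies: the row sums need keepdims=True (or an explicit reshape) so that NumPy broadcasting divides each row by its own total rather than by the wrong one. A tiny illustration with made-up numbers:

```python
import numpy as np

conf_mat = np.array([[8, 2],
                     [1, 19]])

# Without keepdims, the (2,) vector of row sums broadcasts along the last
# axis, so entry [i, j] gets divided by row j's total instead of row i's
wrong = conf_mat / conf_mat.sum(axis=1)

# With keepdims, the (2, 1) column of row sums divides each row correctly
accuracies = conf_mat / conf_mat.sum(axis=1, keepdims=True)
print(accuracies)
```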
I would comment on the great answer above by @quang-hoang, but I do not have enough reputation.
The annotation position needs to be swapped to (j, i), since imshow puts the column index on the x-axis and the row index on the y-axis.
Code:
import numpy as np
import matplotlib.pyplot as plt

classes = ['A', 'B', 'C']
accuracies = np.random.random((3, 3))
fig, ax = plt.subplots(figsize=(10, 8))
cb = ax.imshow(accuracies, cmap='Greens')
plt.xticks(range(len(classes)), classes, rotation=90)
plt.yticks(range(len(classes)), classes)
for i in range(len(classes)):
    for j in range(len(classes)):
        color = 'green' if accuracies[i, j] < 0.5 else 'white'
        ax.annotate(f'{accuracies[i, j]:.2f}', (j, i),
                    color=color, va='center', ha='center')
plt.colorbar(cb, ax=ax)
plt.show()
Output

How to create contours over points with Basemap?

I have a table "tempcc" of values with x, y geographic coordinates (I don't know how to attach files here; there are 86 rows in my csv):
X Y Temp
0 35.268 55.618 1.065389
1 35.230 55.682 1.119160
2 35.508 55.690 1.026214
3 35.482 55.652 1.007834
4 35.289 55.664 1.087598
5 35.239 55.655 1.099459
6 35.345 55.662 1.066117
7 35.402 55.649 1.035958
8 35.506 55.643 0.991939
9 35.526 55.688 1.018137
10 35.541 55.695 1.017870
11 35.471 55.682 1.033929
12 35.573 55.668 0.985559
13 35.547 55.651 0.982335
14 35.425 55.671 1.042975
15 35.505 55.675 1.016236
16 35.600 55.681 0.985532
17 35.458 55.717 1.063691
18 35.538 55.720 1.037523
19 35.230 55.726 1.146047
20 35.606 55.707 1.003364
21 35.582 55.700 1.006711
22 35.350 55.696 1.087173
23 35.309 55.677 1.088988
24 35.563 55.687 1.003785
25 35.510 55.764 1.079220
26 35.334 55.736 1.119026
27 35.429 55.745 1.093300
28 35.366 55.752 1.119061
29 35.501 55.745 1.068676
.. ... ... ...
56 35.472 55.800 1.117183
57 35.538 55.855 1.134721
58 35.507 55.834 1.129712
59 35.256 55.845 1.211969
60 35.338 55.823 1.174397
61 35.404 55.835 1.162387
62 35.460 55.826 1.138965
63 35.497 55.831 1.130774
64 35.469 55.844 1.148516
65 35.371 55.510 0.945187
66 35.378 55.545 0.969400
67 35.456 55.502 0.902285
68 35.429 55.517 0.925932
69 35.367 55.710 1.090652
70 35.431 55.490 0.903296
71 35.284 55.606 1.051335
72 35.234 55.634 1.088135
73 35.284 55.591 1.041181
74 35.354 55.587 1.010446
75 35.332 55.581 1.015004
76 35.356 55.606 1.023234
77 35.311 55.545 0.997468
78 35.307 55.575 1.020845
79 35.363 55.645 1.047831
80 35.401 55.628 1.021373
81 35.340 55.629 1.045491
82 35.440 55.643 1.017227
83 35.293 55.630 1.063910
84 35.370 55.623 1.029797
85 35.238 55.601 1.065699
I try to create isolines with:
import numpy as np
from numpy import meshgrid, linspace
from mpl_toolkits.basemap import Basemap

data = tempcc
m = Basemap(lat_0=np.mean(tempcc['Y'].values),
            lon_0=np.mean(tempcc['X'].values),
            llcrnrlon=35, llcrnrlat=55.3,
            urcrnrlon=35.9, urcrnrlat=56.0, resolution='l')
x = linspace(m.llcrnrlon, m.urcrnrlon, data.shape[1])
y = linspace(m.llcrnrlat, m.urcrnrlat, data.shape[0])
xx, yy = meshgrid(x, y)
m.contour(xx, yy, data, latlon=True)
# plt.legend()
m.scatter(tempcc['X'].values, tempcc['Y'].values, latlon=True)
# m.contour(x, y, data, latlon=True)
But I can't get it to work correctly, although everything seems fine. As far as I understand, I have to make a 2D matrix of values, where i is lat and j is lon, but I can't find an example.
The result I get is shown below: the region is correct, but the interpolation is not good.
What's the matter? Which parameter have I forgotten?
You could use a Triangulation and then call tricontour() instead of contour():
import matplotlib.pyplot as plt
from matplotlib.tri import Triangulation
from mpl_toolkits.basemap import Basemap
import numpy as np

m = Basemap(lat_0=np.mean(tempcc['Y'].values),
            lon_0=np.mean(tempcc['X'].values),
            llcrnrlon=35, llcrnrlat=55.3,
            urcrnrlon=35.9, urcrnrlat=56.0, resolution='l')
triMesh = Triangulation(tempcc['X'].values, tempcc['Y'].values)
tctr = m.tricontour(triMesh, tempcc['Temp'].values,
                    levels=np.linspace(min(tempcc['Temp'].values),
                                       max(tempcc['Temp'].values), 7),
                    latlon=True)
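The same Triangulation + tricontour pattern also works with plain matplotlib, without Basemap (a self-contained sketch on synthetic points, since the original csv isn't available here):

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # non-interactive backend for this sketch
import matplotlib.pyplot as plt
from matplotlib.tri import Triangulation

rng = np.random.default_rng(42)
lon = 35.0 + rng.random(86) * 0.9       # 86 scattered x coordinates
lat = 55.3 + rng.random(86) * 0.7       # 86 scattered y coordinates
temp = 1.0 + 0.3 * (lat - 55.3)         # synthetic smooth temperature field

tri = Triangulation(lon, lat)           # Delaunay triangulation of the points
fig, ax = plt.subplots()
cs = ax.tricontour(tri, temp, levels=np.linspace(temp.min(), temp.max(), 7))
ax.plot(lon, lat, '.', ms=3)            # overlay the sample points
```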
