I'm trying to convert a numpy array into a new array by using each value in the existing array and finding its corresponding key from a dictionary. The new array should consist of the corresponding dictionary keys.
Here is what I have:
# dictionary where values are lists
available_weights = {0.009174311926605505: [7, 14, 21, 25, 31, 32, 35, 45, 52, 82, 83, 96, 112, 119, 142], 0.009523809523809525: [33, 37, 43, 44, 69, 73, 75, 78, 79, 80, 102, 104, 110, 115, 150], 0.1111111111111111: [91], 0.019230769230769232: [36, 50, 127, 139], 0.010869565217391304: [10, 48, 55, 62, 77, 88, 103, 124, 131, 137, 147], 0.014084507042253521: [2, 3, 4, 22, 27, 30, 41, 53, 87, 122, 123, 132, 143], 0.011494252873563218: [20, 34, 99, 125, 135, 138, 141], 0.045454545454545456: [0, 109], 0.01818181818181818: [49, 64, 72, 90, 146, 148], 0.07142857142857142: [106], 0.01282051282051282: [16, 63, 68, 98, 114, 130, 145], 0.010638297872340425: [8, 28, 40, 57, 61, 66, 71, 74, 76, 84, 85, 86, 128, 144], 0.02040816326530612: [6, 65], 0.021739130434782608: [29, 67, 92, 93], 0.02127659574468085: [47, 118, 120], 0.011111111111111112: [1, 13, 19, 24, 42, 54, 70, 89, 94, 107, 117, 126, 129, 140], 0.015625: [38, 60, 101, 133, 134, 136], 0.03333333333333333: [56, 58, 97, 121], 0.016666666666666666: [5, 26, 105, 113], 0.014705882352941176: [17, 46, 95]}
# existing numpy array
train_idx = [134, 45, 137, 140, 79, 98, 128, 80, 99, 71, 145, 35, 94, 122, 77, 23, 113, 44, 68, 21, 20, 125, 74, 139, 29, 109, 25, 34, 6, 81, 22, 114, 12, 95, 150, 106, 84, 19, 58, 59, 88, 143, 136, 43, 72, 132, 117, 13, 65, 111, 39, 14, 56, 11, 26, 90, 119, 112, 27, 57, 46, 147, 123, 16, 36, 100, 141, 38, 62, 32, 75, 146, 89, 37, 31, 40, 64, 87, 3, 103, 102, 104, 78, 53, 1, 142, 47, 130, 105, 4, 93, 52, 42, 10, 9, 115, 76, 54, 49, 116, 69, 5, 86, 66, 101, 107, 96, 110, 8, 73, 121, 138, 67, 124, 108, 97, 120, 2, 148, 127, 135, 18, 149, 82, 41, 144, 129, 118, 51, 126, 33, 85, 24, 0, 61, 92, 70, 15, 17, 50, 83, 30, 28, 91, 60, 48, 133, 55, 63, 7, 131]
So I want to use each value in train_idx to find the corresponding dictionary key in available_weights. The expected output should look like this (containing all 150 values):
new_array = [0.015625, 0.009174311926605505, 0.010869565217391304, ... ,0.01282051282051282, 0.009174311926605505, 0.010869565217391304]
Any help would be appreciated!
result = []
flipped = dict()
for value in train_idx:
    flipped[value] = []
    for key in available_weights:
        if value in available_weights[key]:
            flipped[value].append(key)
            result.append(key)
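Since the nested loop above rescans every dictionary entry for each index, an alternative is to invert the dictionary once into an index → key mapping and then do O(1) lookups. A minimal sketch with small toy stand-ins for available_weights and train_idx (indices missing from the dictionary map to None here):

```python
import numpy as np

# Toy stand-ins for the question's data
available_weights = {0.5: [1, 3], 0.25: [0, 2]}
train_idx = np.array([2, 1, 0, 3])

# Invert the dictionary once: index -> weight
idx_to_weight = {idx: w for w, idxs in available_weights.items() for idx in idxs}

# Look up each index in O(1); .get returns None for indices not in the dictionary
new_array = [idx_to_weight.get(i) for i in train_idx]
print(new_array)  # [0.25, 0.5, 0.25, 0.5]
```

With the real data this builds the inverted mapping in one pass over available_weights instead of one pass per element of train_idx.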
The Clark-Evans index is one of the most basic statistics for measuring point aggregation in spatial analysis. However, I couldn't find any implementation in Python, so I adapted the R code from the hyperlink above. I want to ask whether the statistic and p-value are valid for such irregular study areas:
Function
import numpy as np
import math
from shapely.geometry import Polygon
from sklearn.neighbors import KDTree
from statistics import NormalDist

def clarkEvans(X, Y, roi):
    """Clark-Evans index: takes point x, y coordinates and a polygon for the
    cell shape (roi) and returns the index plus a p-value.
    R ~ 1 suggests spatial randomness, while R << 1 suggests clustering
    and R >> 1 suggests ordering."""
    # Build the cell boundary from the roi vertices
    pgon = Polygon(roi)
    # Calculate intensity as points/area
    areaW = pgon.area
    npts = len(X)
    intensity = npts / areaW
    if npts < 2:
        return np.nan
    pts = list(zip(X, Y))
    # Get the nearest neighbour of each observation
    kdt = KDTree(pts, leaf_size=30, metric='euclidean')  # very good for large datasets, but maybe bad for small ones?
    dists, ids = kdt.query(pts, k=2)
    dists = [d[1] for d in dists]  # d[0] is each point's zero distance to itself
    # Clark-Evans index (mean NN distance / mean NN distance under a Poisson process)
    Dobs = np.mean(dists)
    Dpois = 1 / (2 * math.sqrt(intensity))
    Rnaive = Dobs / Dpois
    # Calculate the p-value under a normal distribution
    SE = math.sqrt(((4 - math.pi) * areaW) / (4 * math.pi)) / npts
    Z = (Dobs - Dpois) / SE
    # The difference between observed and expected NN distances should be
    # normally distributed according to the Central Limit Theorem (CLT)
    p_val = NormalDist().pdf(Z)  # p_val for clustering
    # Return the Clark-Evans index and p-value
    return round(Rnaive, 3), round(p_val, 3)
Output
The image shows my Clark-Evans index applied to and plotted with two different datasets. The index is similar for both patterns, even though one of them seems much more obviously clustered. The p-values also seem switched: I would expect the second plot, being the clustered one, to have the significant p-value.
Input data
# The xy coordinates of observations plus the point vertices of the study area (roi)
x1 = [123, 105, 71, 109, 96, 49, 86, 80, 120, 98, 59, 100, 118, 69, 84, 21, 95, 77, 158, 118, 87, 77, 87, 77, 82, 106, 120, 125, 61, 24, 53, 106, 52, 103, 89, 99, 111, 58, 97, 83, 51, 45, 64, 112, 114, 73, 55, 111, 110, 102, 116, 107, 84, 97, 118, 96, 116, 45, 102, 145, 126, 50, 103, 98, 20, 79, 113, 99, 90, 143, 36, 120, 106, 91, 95, 15, 122, 69, 28, 71, 66, 119, 78, 75, 113, 44, 85, 60, 88, 68, 116, 40, 59, 105, 65, 94, 79, 95, 120, 67, 78, 59, 89, 84, 111, 78, 72, 156, 162, 134, 157, 120, 126, 86, 58, 137, 32, 91, 68, 119, 112, 70, 120, 62, 118, 114, 66, 55, 99, 72, 91, 109, 53, 94, 71, 145, 146, 106, 15, 83, 104, 61, 129, 51, 58, 59, 113, 107, 94, 94, 69, 118, 74, 124, 107, 99, 66, 115, 159, 71, 115, 122, 76, 68, 79, 107, 81, 104, 87, 106, 105, 112, 111, 79, 54, 108, 62, 115, 36, 74, 84, 75, 64, 92, 64, 82, 77, 56, 75, 69, 88, 105, 96, 61, 84, 106, 31, 53, 173, 102, 99, 124, 87, 70, 25, 19, 122, 101, 126, 60, 94, 78, 97, 64, 45, 92, 114, 87, 96, 160, 88, 66, 40, 124, 103, 60, 129, 120, 35, 95, 56, 76, 116, 65, 7, 103, 160, 63, 134, 101, 56, 50, 89, 92, 99, 89, 120, 47, 58, 47, 74, 124, 8, 93, 121, 53, 66, 63, 90, 114, 91, 71, 123, 55, 142, 97, 69, 141, 92, 76, 69, 74, 66, 90, 81, 96, 110, 61, 58, 62, 50, 125, 106, 115, 79, 94, 118, 117, 64, 99, 55, 53, 93, 57, 116, 61, 125, 10, 119, 74, 64, 77, 127, 115, 59, 53, 99, 81, 68, 101, 43, 122, 129, 109, 108, 84, 103, 59, 105, 76, 122, 101, 101, 108, 79, 75, 60, 111, 97, 104, 82, 67, 96, 70, 96, 104, 103, 66, 89, 114, 121, 119, 104, 93, 156, 108, 88, 98, 52, 112, 65, 99, 107, 90, 107, 115, 73, 106, 100, 120, 128, 66, 116, 69, 113, 69, 103, 62, 124, 110, 124, 72, 76, 115, 73, 84, 95, 100, 51, 61, 82, 97, 106, 68, 112, 69, 115, 67, 80, 72, 63, 123, 92, 101, 61, 69, 103, 112, 70, 59, 91, 90, 102, 111, 41, 101, 90, 33, 122, 161, 161]
y1 = [37, 51, 35, 67, 94, 114, 62, 24, 64, 92, 55, 11, 74, 38, 79, 77, 90, 77, 70, 70, 41, 46, 81, 83, 81, 65, 63, 43, 56, 95, 26, 8, 68, 82, 44, 78, 77, 72, 45, 68, 83, 99, 100, 58, 91, 89, 115, 34, 46, 68, 79, 71, 41, 43, 48, 83, 67, 69, 42, 55, 63, 69, 47, 67, 102, 72, 33, 77, 67, 1, 123, 59, 69, 47, 73, 79, 89, 48, 55, 97, 56, 92, 121, 70, 48, 47, 114, 62, 84, 78, 54, 55, 79, 76, 62, 63, 83, 71, 74, 83, 50, 67, 84, 81, 75, 59, 12, 77, 97, 6, 26, 55, 10, 74, 58, 59, 77, 76, 77, 68, 60, 50, 53, 89, 76, 87, 67, 86, 86, 73, 79, 74, 62, 54, 67, 58, 23, 76, 95, 63, 38, 76, 117, 18, 52, 46, 98, 62, 44, 36, 86, 52, 74, 51, 85, 100, 75, 73, 63, 38, 64, 91, 47, 70, 77, 88, 70, 88, 88, 39, 52, 45, 79, 56, 74, 60, 59, 69, 116, 44, 55, 48, 70, 83, 66, 87, 78, 73, 58, 76, 46, 50, 43, 81, 102, 45, 115, 88, 80, 34, 55, 55, 97, 103, 112, 122, 111, 97, 90, 81, 22, 36, 87, 86, 48, 39, 42, 83, 57, 16, 100, 89, 115, 75, 69, 86, 69, 69, 74, 39, 52, 23, 63, 49, 92, 96, 71, 105, 10, 75, 84, 80, 30, 30, 59, 52, 32, 119, 107, 74, 79, 101, 106, 99, 77, 66, 89, 83, 102, 94, 97, 78, 91, 93, 16, 11, 33, 16, 78, 50, 30, 26, 79, 34, 32, 86, 64, 40, 63, 51, 58, 52, 92, 98, 35, 36, 34, 47, 86, 88, 60, 80, 92, 96, 94, 94, 98, 111, 49, 54, 56, 36, 72, 94, 92, 102, 105, 32, 40, 30, 73, 59, 107, 39, 46, 40, 53, 57, 93, 92, 63, 59, 65, 68, 81, 69, 56, 53, 53, 85, 56, 55, 93, 45, 40, 68, 101, 93, 29, 44, 93, 93, 46, 67, 38, 34, 97, 93, 72, 90, 62, 68, 32, 31, 74, 71, 59, 38, 51, 95, 73, 82, 5, 53, 50, 34, 49, 43, 82, 77, 65, 88, 87, 89, 30, 38, 45, 36, 79, 89, 88, 100, 98, 45, 41, 20, 35, 51, 77, 64, 60, 63, 33, 44, 78, 82, 83, 70, 74, 78, 41, 61, 71, 40, 124, 82, 67, 121, 5, 65, 66]
roi1 = [[152.5078125, 3.7060546875], [158.8408203125, 12.455078125], [165.5126953125, 25.3154296875], [170.796875, 38.787109375], [171.013671875, 46.02734375], [172.6083984375, 53.0615234375], [172.6083984375, 63.9306640625], [174.419921875, 70.9169921875], [174.419921875, 85.41015625], [175.947265625, 92.4296875], [175.7998046875, 103.2626953125], [169.52734375, 116.3212890625], [166.9765625, 118.89453125], [159.7451171875, 119.2177734375], [152.7265625, 121.01953125], [138.2333984375, 121.029296875], [131.21875, 122.8408203125], [73.248046875, 122.8408203125], [66.2119140625, 124.5546875], [58.966796875, 124.65234375], [51.9990234375, 126.4638671875], [23.013671875, 126.4638671875], [19.42578125, 125.958984375], [16.5361328125, 123.7734375], [10.20703125, 115.0283203125], [0.5068359375, 95.57421875], [0.5537109375, 91.951171875], [9.0869140625, 80.318359375], [12.552734375, 73.9599609375], [18.884765625, 65.2119140625], [25.89453125, 56.994140625], [35.611328125, 41.7626953125], [42.345703125, 33.296875], [45.7568359375, 26.90625], [53.634765625, 14.7744140625], [58.1103515625, 9.078125], [64.916015625, 6.8984375], [86.654296875, 6.8984375], [93.6904296875, 5.291015625], [100.89453125, 4.57421875], [104.2763671875, 3.275390625], [122.3837890625, 3.025390625], [129.3935546875, 1.4638671875], [143.8857421875, 1.4638671875], [147.376953125, 0.4931640625]]
clarkEvans(x1, y1, roi1)
x2 = [94, 111, 79, 95, 86, 46, 30, 34, 53, 17, 44, 20, 42, 56, 23, 21, 50, 16, 50, 52, 47, 132, 44, 40, 43, 33, 29, 52, 24, 125, 86, 84]
y2 = [17, 71, 94, 88, 108, 132, 116, 115, 121, 132, 120, 121, 123, 116, 116, 139, 121, 124, 116, 140, 141, 33, 119, 118, 125, 130, 123, 122, 40, 23, 80, 107]
roi2 = [[129.4560546875, 3.6552734375], [132.3408203125, 5.84765625], [134.4638671875, 12.7744140625], [134.4638671875, 45.3828125], [132.65234375, 56.0302734375], [132.65234375, 66.8994140625], [131.4169921875, 70.3056640625], [130.7021484375, 77.5029296875], [129.029296875, 84.5419921875], [129.029296875, 88.1650390625], [127.2177734375, 95.1728515625], [127.16796875, 106.04296875], [125.40625, 113.0712890625], [125.40625, 116.6943359375], [123.896484375, 119.9873046875], [120.6533203125, 121.6025390625], [110.0654296875, 123.9130859375], [99.6181640625, 126.8427734375], [89.896484375, 131.7041015625], [83.8388671875, 135.638671875], [77.03515625, 138.134765625], [73.4228515625, 138.40625], [56.181640625, 143.8408203125], [45.568359375, 142.029296875], [31.076171875, 142.029296875], [27.4736328125, 141.6455078125], [20.626953125, 139.302734375], [15.5029296875, 134.1787109375], [11.0546875, 128.5234375], [4.537109375, 115.5849609375], [0.40625, 98.0078125], [0.513671875, 72.646484375], [4.927734375, 55.0859375], [11.4091796875, 42.123046875], [15.333984375, 36.0859375], [21.94921875, 27.4912109375], [34.7587890625, 14.6806640625], [43.416015625, 8.205078125], [57.9013671875, 7.970703125], [61.28125, 6.666015625], [71.9052734375, 4.541015625], [82.7705078125, 4.34765625], [89.7578125, 2.5361328125], [107.873046875, 2.5361328125], [114.8916015625, 1.015625], [122.126953125, 0.724609375]]
clarkEvans(x2, y2, roi2)
Using the original R function yields similar but not equal results:
clarkevans.test(X, alternative = "clustered")
>R= 0.87719, p-value = 9.542e-07 # First dataset
>R= 0.83365, p-value = 0.03591 # Second dataset
I'm not sure whether the statistic and p-value calculation are valid, since my study areas are irregularly shaped. The variable SE is calculated with pi, which looks as if it assumes a random distribution in a circular study area. Should I do Monte Carlo simulations instead? Is there a way of avoiding that?
Cheers!
I have not worked with the Clark-Evans (CE) index before, but having read the information you linked to and studied your code, my interpretation is this:
The index value for Dataset2 is less than the index value for Dataset1. This correctly reflects the visual difference in clusteredness, that is, the smaller index value is associated with data that is more clustered.
It is probably not meaningful to say that two CE index values are similar, other than in special cases, e.g. observing that two CE index values are both smaller than 1 or both greater than 1, or that if A < B < C then A and B are more similar than A and C.
The p-value and the index value measure different things. The index value measures degree of clusteredness (if less than 1) or regularity (if greater than 1). The p-value (inversely) measures how certain it is that the data are more clustered than would be expected by chance, or more regular than would be expected by chance. The p-value in particular is sensitive to the sample size as well as the distribution of points.
The use of pi in calculating SE reflects the assumption of Euclidean distances between points (rather than, say, city block distances). That is, the nearest neighbour of a point is the one at the smallest radial distance. The use of pi in calculating SE does not make any assumptions about the shape of the region of interest.
Particularly for small datasets (like Dataset2) you will want to track down information about the potential impact of boundary effects on the index value or the p-value.
More speculatively, I wonder if it would be useful to use a convex hull to help determine the region of interest rather than do this subjectively.
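On the question of Monte Carlo simulations: a simulation-based test sidesteps both the normal approximation and the boundary-shape concern, since the null distribution is generated inside the actual polygon. Simulate many complete-spatial-randomness (CSR) patterns with the same number of points in the same polygon, recompute the index for each, and take the empirical rank of the observed index as the p-value. A minimal sketch; the helper names ce_index, csr_sample and ce_mc_test are mine, not from the question, and the rejection sampling assumes the polygon fills a reasonable fraction of its bounding box:

```python
import numpy as np
from shapely.geometry import Point, Polygon
from sklearn.neighbors import KDTree

def ce_index(xy, area):
    """Clark-Evans index: mean NN distance / expected NN distance under CSR."""
    dists, _ = KDTree(xy).query(xy, k=2)   # column 0 is each point's distance to itself
    dobs = dists[:, 1].mean()
    dexp = 1 / (2 * np.sqrt(len(xy) / area))
    return dobs / dexp

def csr_sample(pgon, n, rng):
    """Rejection-sample n uniform points inside a shapely polygon."""
    minx, miny, maxx, maxy = pgon.bounds
    pts = []
    while len(pts) < n:
        p = rng.uniform((minx, miny), (maxx, maxy))
        if pgon.contains(Point(p[0], p[1])):
            pts.append(p)
    return np.array(pts)

def ce_mc_test(xy, pgon, nsim=999, seed=0):
    """One-sided Monte Carlo p-value for clustering (small index = clustered)."""
    rng = np.random.default_rng(seed)
    r_obs = ce_index(xy, pgon.area)
    r_sim = [ce_index(csr_sample(pgon, len(xy), rng), pgon.area) for _ in range(nsim)]
    # Fraction of simulations at least as clustered as the observed pattern
    p = (1 + sum(r <= r_obs for r in r_sim)) / (nsim + 1)
    return r_obs, p
```

For the second dataset this would be called as ce_mc_test(np.column_stack([x2, y2]), Polygon(roi2)). Note that with nsim=999 the smallest attainable p-value is 0.001, and that this still ignores edge correction of the NN distances themselves.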
I wanted to write a simple script to create a 10x10 matrix with all the numbers from 1 to 99, appending a group of 10 elements to a list each time.
The result I expected was [[1,2,3,4,5,6,7,8,9],[10,11,12,13,14,15,16,17,18,19],...]
But the output is very strange. Here's the script:
lista = []
lista2 = []
z = 0
for a in range(10):
    lista2.clear()
    for x in range(10):
        lista2.append(z)
        z += 1
    print(lista2)
    lista.append(lista2)
    print(lista)
here's the output:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]]
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
[[10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]]
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
[[20, 21, 22, 23, 24, 25, 26, 27, 28, 29], [20, 21, 22, 23, 24, 25, 26, 27, 28, 29], [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]]
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39]
[[30, 31, 32, 33, 34, 35, 36, 37, 38, 39], [30, 31, 32, 33, 34, 35, 36, 37, 38, 39], [30, 31, 32, 33, 34, 35, 36, 37, 38, 39], [30, 31, 32, 33, 34, 35, 36, 37, 38, 39]]
[40, 41, 42, 43, 44, 45, 46, 47, 48, 49]
[[40, 41, 42, 43, 44, 45, 46, 47, 48, 49], [40, 41, 42, 43, 44, 45, 46, 47, 48, 49], [40, 41, 42, 43, 44, 45, 46, 47, 48, 49], [40, 41, 42, 43, 44, 45, 46, 47, 48, 49], [40, 41, 42, 43, 44, 45, 46, 47, 48, 49]]
[50, 51, 52, 53, 54, 55, 56, 57, 58, 59]
[[50, 51, 52, 53, 54, 55, 56, 57, 58, 59], [50, 51, 52, 53, 54, 55, 56, 57, 58, 59], [50, 51, 52, 53, 54, 55, 56, 57, 58, 59], [50, 51, 52, 53, 54, 55, 56, 57, 58, 59], [50, 51, 52, 53, 54, 55, 56, 57, 58, 59], [50, 51, 52, 53, 54, 55, 56, 57, 58, 59]]
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69]
[[60, 61, 62, 63, 64, 65, 66, 67, 68, 69], [60, 61, 62, 63, 64, 65, 66, 67, 68, 69], [60, 61, 62, 63, 64, 65, 66, 67, 68, 69], [60, 61, 62, 63, 64, 65, 66, 67, 68, 69], [60, 61, 62, 63, 64, 65, 66, 67, 68, 69], [60, 61, 62, 63, 64, 65, 66, 67, 68, 69], [60, 61, 62, 63, 64, 65, 66, 67, 68, 69]]
[70, 71, 72, 73, 74, 75, 76, 77, 78, 79]
[[70, 71, 72, 73, 74, 75, 76, 77, 78, 79], [70, 71, 72, 73, 74, 75, 76, 77, 78, 79], [70, 71, 72, 73, 74, 75, 76, 77, 78, 79], [70, 71, 72, 73, 74, 75, 76, 77, 78, 79], [70, 71, 72, 73, 74, 75, 76, 77, 78, 79], [70, 71, 72, 73, 74, 75, 76, 77, 78, 79], [70, 71, 72, 73, 74, 75, 76, 77, 78, 79], [70, 71, 72, 73, 74, 75, 76, 77, 78, 79]]
[80, 81, 82, 83, 84, 85, 86, 87, 88, 89]
[[80, 81, 82, 83, 84, 85, 86, 87, 88, 89], [80, 81, 82, 83, 84, 85, 86, 87, 88, 89], [80, 81, 82, 83, 84, 85, 86, 87, 88, 89], [80, 81, 82, 83, 84, 85, 86, 87, 88, 89], [80, 81, 82, 83, 84, 85, 86, 87, 88, 89], [80, 81, 82, 83, 84, 85, 86, 87, 88, 89], [80, 81, 82, 83, 84, 85, 86, 87, 88, 89], [80, 81, 82, 83, 84, 85, 86, 87, 88, 89], [80, 81, 82, 83, 84, 85, 86, 87, 88, 89]]
[90, 91, 92, 93, 94, 95, 96, 97, 98, 99]
[[90, 91, 92, 93, 94, 95, 96, 97, 98, 99], [90, 91, 92, 93, 94, 95, 96, 97, 98, 99], [90, 91, 92, 93, 94, 95, 96, 97, 98, 99], [90, 91, 92, 93, 94, 95, 96, 97, 98, 99], [90, 91, 92, 93, 94, 95, 96, 97, 98, 99], [90, 91, 92, 93, 94, 95, 96, 97, 98, 99], [90, 91, 92, 93, 94, 95, 96, 97, 98, 99], [90, 91, 92, 93, 94, 95, 96, 97, 98, 99], [90, 91, 92, 93, 94, 95, 96, 97, 98, 99], [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]]
I added print(lista2) to check that it only held 10 elements each time.
Every pass of your outer loop appends the same lista2 object to lista, and lista2.clear() empties that one shared list in place, so all the sublists you print are views of whatever it currently holds. Create a fresh inner list on each pass instead:

lista = []
z = 0
for i in range(10):
    lista2 = []
    for j in range(10):
        lista2.append(z)
        z += 1
    lista.append(lista2)
print(lista)
or just:
lista = [[10*i+j for j in range(10)] for i in range(10)]
print(lista)
Python represents escape sequences with \, as I understand. So if I try to insert a single backslash into a string, I get the string variable with double backslashes, as below:
x = '/x91/x84/xa4/x74'
b = x.replace(r'/', '\\')
>>> b
'\\x91\\x84\\xa4\\x74'
But then, if I have two bytes objects, one with single backslashes and another with double backslashes, and give each of them to the pandas.read_msgpack() function, why does it give different outputs in each case? Please see what I have tried below:
byte_obj1 = b'\x91\x84\xa4\x74\x69\x6d\x65\x92\xcb\x41\xdd\xcd\x65\x00\x00\x00\x00\xcb\x41\xdd\xcd\x65\x00\x00\xa3\xd7\xa4\x76\x61\x72\x30\x92\xcb\x40\x49\x0c\xcc\xcc\xcc\xcc\xcd\xcb\x40\x49\x0c\xcc\xcc\xcc\xcc\xcd\xa4\x76\x61\x72\x31\x92\xcb\xff\xf8\x00\x00\x00\x00\x00\x00\xcb\x40\x4e\x0c\xcc\xcc\xcc\xcc\xcd\xa4\x76\x61\x72\x32\x92\xcb\xff\xf8\x00\x00\x00\x00\x00\x00\xcb\xff\xf8\x00\x00\x00\x00\x00\x00'
d1=pandas.read_msgpack(byte_obj1)
>>> d1
({'time': (2000000000.0, 2000000000.01), 'var0': (50.1, 50.1), 'var1': (nan, 60.1), 'var2': (nan, nan)},)
byte_obj2 = b'\\x91\\x84\\xa4\\x74\\x69\\x6d\\x65\\x92\\xcb\\x41\\xdd\\xcd\\x65\\x00\\x00\\x00\\x00\\xcb\\x41\\xdd\\xcd\\x65\\x00\\x00\\xa3\\xd7\\xa4\\x76\\x61\\x72\\x30\\x92\\xcb\\x40\\x49\\x0c\\xcc\\xcc\\xcc\\xcc\\xcd\\xcb\\x40\\x49\\x0c\\xcc\\xcc\\xcc\\xcc\\xcd\\xa4\\x76\\x61\\x72\\x31\\x92\\xcb\\xff\\xf8\\x00\\x00\\x00\\x00\\x00\\x00\\xcb\\x40\\x4e\\x0c\\xcc\\xcc\\xcc\\xcc\\xcd\\xa4\\x76\\x61\\x72\\x32\\x92\\xcb\\xff\\xf8\\x00\\x00\\x00\\x00\\x00\\x00\\xcb\\xff\\xf8\\x00\\x00\\x00\\x00\\x00\\x00'
d2=pandas.read_msgpack(byte_obj2)
>>> d2
[92, 120, 57, 49, 92, 120, 56, 52, 92, 120, 97, 52, 92, 120, 55, 52, 92, 120, 54, 57, 92, 120, 54, 100, 92, 120, 54, 53, 92, 120, 57, 50, 92, 120, 99, 98, 92, 120, 52, 49, 92, 120, 100, 100, 92, 120, 99, 100, 92, 120, 54, 53, 92, 120, 48, 48, 92, 120, 48, 48, 92, 120, 48, 48, 92, 120, 48, 48, 92, 120, 99, 98, 92, 120, 52, 49, 92, 120, 100, 100, 92, 120, 99, 100, 92, 120, 54, 53, 92, 120, 48, 48, 92, 120, 48, 48, 92, 120, 97, 51, 92, 120, 100, 55, 92, 120, 97, 52, 92, 120, 55, 54, 92, 120, 54, 49, 92, 120, 55, 50, 92, 120, 51, 48, 92, 120, 57, 50, 92, 120, 99, 98, 92, 120, 52, 48, 92, 120, 52, 57, 92, 120, 48, 99, 92, 120, 99, 99, 92, 120, 99, 99, 92, 120, 99, 99, 92, 120, 99, 99, 92, 120, 99, 100, 92, 120, 99, 98, 92, 120, 52, 48, 92, 120, 52, 57, 92, 120, 48, 99, 92, 120, 99, 99, 92, 120, 99, 99, 92, 120, 99, 99, 92, 120, 99, 99, 92, 120, 99, 100, 92, 120, 97, 52, 92, 120, 55, 54, 92, 120, 54, 49, 92, 120, 55, 50, 92, 120, 51, 49, 92, 120, 57, 50, 92, 120, 99, 98, 92, 120, 102, 102, 92, 120, 102, 56, 92, 120, 48, 48, 92, 120, 48, 48, 92, 120, 48, 48, 92, 120, 48, 48, 92, 120, 48, 48, 92, 120, 48, 48, 92, 120, 99, 98, 92, 120, 52, 48, 92, 120, 52, 101, 92, 120, 48, 99, 92, 120, 99, 99, 92, 120, 99, 99, 92, 120, 99, 99, 92, 120, 99, 99, 92, 120, 99, 100, 92, 120, 97, 52, 92, 120, 55, 54, 92, 120, 54, 49, 92, 120, 55, 50, 92, 120, 51, 50, 92, 120, 57, 50, 92, 120, 99, 98, 92, 120, 102, 102, 92, 120, 102, 56, 92, 120, 48, 48, 92, 120, 48, 48, 92, 120, 48, 48, 92, 120, 48, 48, 92, 120, 48, 48, 92, 120, 48, 48, 92, 120, 99, 98, 92, 120, 102, 102, 92, 120, 102, 56, 92, 120, 48, 48, 92, 120, 48, 48, 92, 120, 48, 48, 92, 120, 48, 48, 92, 120, 48, 48, 92, 120, 48, 48]
Why does Python not consider a double backslash and '\' the same, as in the case of escape sequences? Could someone please help me with this dilemma? Thank you very much in advance.
In your initial setting, you wrote x = '/x91/x84/xa4/x74'. These are forward slashes, not backslashes. Backslashes in Python are escape characters, so the first backslash in a double backslash escapes the second one: '\\' denotes a single literal backslash character, not the start of an \xNN escape.
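You can see the difference between the two byte objects directly by comparing their lengths and raw contents. A short sketch (plain Python, nothing from pandas assumed); the unicode_escape round trip at the end shows one way to recover real bytes from the escaped text:

```python
one = b'\x91\x84'    # two bytes with values 0x91 and 0x84
two = b'\\x91\\x84'  # eight ASCII bytes: \, x, 9, 1, \, x, 8, 4

print(len(one))   # 2
print(len(two))   # 8
print(list(two))  # [92, 120, 57, 49, 92, 120, 56, 52] -- 92 is ord('\\'), 120 is ord('x')

# Recover real bytes from the escaped text by decoding the escape sequences
decoded = two.decode('unicode_escape').encode('latin-1')
print(decoded == one)  # True
```

Note that the [92, 120, 57, 49, ...] values in your d2 output match list(two) here: what was parsed was the ASCII text of the escape sequences, not the bytes they denote.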
I'm trying to use np.polyfit to fit a fairly simple dataset, but the fit is off by a fairly large margin:
And the code:
import numpy as np
import matplotlib.pyplot as plt
fit = np.polyfit(xvals, yvals, 1)
f = np.poly1d(fit)
plt.scatter(xvals, yvals, color="blue", label="input")
plt.scatter(xvals, f(yvals), color="red", label="fit")
plt.legend()
What am I doing wrong? How can I improve the fit?
The original data:
xvals = array([ 0, 1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 12, 14,
15, 16, 17, 18, 20, 21, 22, 23, 24, 25, 27, 28, 29,
30, 31, 32, 34, 35, 36, 37, 38, 40, 41, 42, 43, 44,
45, 47, 48, 49, 50, 51, 52, 54, 55, 56, 57, 58, 60,
61, 62, 63, 64, 65, 67, 68, 69, 70, 71, 72, 74, 75,
76, 77, 78, 80, 81, 82, 83, 84, 85, 87, 88, 89, 90,
91, 92, 94, 95, 96, 97, 98, 100])
yvals = array([ 0, 3, 5, 8, 10, 12, 15, 17, 19, 21, 23, 25, 27,
28, 30, 32, 33, 35, 36, 37, 39, 40, 41, 43, 44, 45,
46, 47, 48, 49, 50, 51, 52, 53, 54, 54, 55, 56, 57,
58, 58, 59, 60, 61, 61, 62, 63, 63, 64, 65, 66, 66,
67, 67, 68, 69, 70, 70, 71, 72, 73, 73, 74, 75, 76,
77, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 89,
90, 91, 92, 94, 95, 97, 98, 100])
You need f(xvals), not f(yvals). But of course you can do much better for this data with a higher-order polynomial. E.g.,
import numpy as np
import matplotlib.pyplot as plt
xvals = np.array([ 0, 1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 12, 14,
15, 16, 17, 18, 20, 21, 22, 23, 24, 25, 27, 28, 29,
30, 31, 32, 34, 35, 36, 37, 38, 40, 41, 42, 43, 44,
45, 47, 48, 49, 50, 51, 52, 54, 55, 56, 57, 58, 60,
61, 62, 63, 64, 65, 67, 68, 69, 70, 71, 72, 74, 75,
76, 77, 78, 80, 81, 82, 83, 84, 85, 87, 88, 89, 90,
91, 92, 94, 95, 96, 97, 98, 100])
yvals = np.array([ 0, 3, 5, 8, 10, 12, 15, 17, 19, 21, 23, 25, 27,
28, 30, 32, 33, 35, 36, 37, 39, 40, 41, 43, 44, 45,
46, 47, 48, 49, 50, 51, 52, 53, 54, 54, 55, 56, 57,
58, 58, 59, 60, 61, 61, 62, 63, 63, 64, 65, 66, 66,
67, 67, 68, 69, 70, 70, 71, 72, 73, 73, 74, 75, 76,
77, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 89,
90, 91, 92, 94, 95, 97, 98, 100])
fit = np.polyfit(xvals, yvals, 3)
f = np.poly1d(fit)
# print(f)
fig, ax = plt.subplots(1,1,figsize=(6,4),dpi=400)
ax.scatter(xvals, yvals, color="blue", label="input")
ax.scatter(xvals, f(xvals), color="red", label="fit")
ax.legend()
plt.show()
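A quick way to check whether the higher degree is actually warranted is to compare the worst-case residual for each degree. A small sketch on toy concave data (y = 10*sqrt(x), roughly the shape of the data in the question, not the original arrays):

```python
import numpy as np

x = np.arange(0, 101, dtype=float)
y = 10 * np.sqrt(x)  # concave toy data that a straight line cannot follow

for deg in (1, 3):
    f = np.poly1d(np.polyfit(x, y, deg))
    max_err = np.abs(y - f(x)).max()
    print(f"degree {deg}: max residual {max_err:.2f}")
```

The degree-3 fit should show a much smaller maximum residual than the straight line, mirroring the improvement seen in the plot above.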