Got the following code
import pandas as pd
import plotly.graph_objects as go
import numpy as np
df = pd.read_csv('https://raw.githubusercontent.com/tiago-peres/immersion/master/Platforms_dataset.csv')
fig = px.scatter_3d(df, x='Functionality ', y='Accessibility', z='Immersion', color='Platforms')
grey = [[0,'#C0C0C0'],[1,'#C0C0C0']]
zero_pt = pd.Series([0])
z = zero_pt.append(df['Immersion'], ignore_index = True).reset_index(drop = True)
y = zero_pt.append(df['Accessibility'], ignore_index = True).reset_index(drop = True)
x = zero_pt.append(df['Functionality '], ignore_index = True).reset_index(drop = True)
length_data = len(z)
z_plane_pos = 66.5*np.ones((length_data,length_data))
fig.add_trace(go.Surface(x=x, y=y, z=z_plane_pos, colorscale=grey, showscale=False))
fig.add_trace(go.Surface(x=x.apply(lambda x: 15.69), y=y, z = np.array([z]*length_data), colorscale= grey, showscale=False))
fig.add_trace(go.Surface(x=x, y= y.apply(lambda x: 55), z = np.array([z]*length_data).transpose(), colorscale=grey, showscale=False))
fig.update_layout(scene = dict(
xaxis = dict(nticks=4, range=[0,31.38],),
yaxis = dict(nticks=4, range=[0,110],),
zaxis = dict(nticks=4, range=[0,133],),),
legend_orientation="h",margin=dict(l=0, r=0, b=0, t=0))
that can be opened in Google Colab which produces the following output
As you can see, the planes are not filling up the entire axis space, they should respect the axis range. In other words, the planes
z=66.5 - should exist between [0, 31.38] in x and [0, 110] in y
x=15.59 - should exist between [0, 110] in y and [0, 133] in z
y=55 - should exist between [0, 31.38] in x and [0, 133] in z
How can that be done?
With this new adjustment,
import plotly.express as px
import pandas as pd
import plotly.graph_objects as go
import numpy as np
df = pd.read_csv('https://raw.githubusercontent.com/tiago-peres/immersion/master/Platforms_dataset.csv')
fig = px.scatter_3d(df, x='Functionality ', y='Accessibility', z='Immersion', color='Platforms')
grey = [[0,'#C0C0C0'],[1,'#C0C0C0']]
zero_pt = pd.Series([0])
z1 = np.arange(0, 134, 1)
print(z1)
y1 = np.arange(0, 111, 1)
print(z1)
x1 = np.arange(0, 32.38, 1)
print(z1)
z = zero_pt.append(df['Immersion'], ignore_index = True).reset_index(drop = True)
y = zero_pt.append(df['Accessibility'], ignore_index = True).reset_index(drop = True)
x = zero_pt.append(df['Functionality '], ignore_index = True).reset_index(drop = True)
print(zero_pt)
print(z)
test1 = pd.Series([133])
test = z.append(test1)
length_data1 = len(z1)
z_plane_pos = 66.5*np.ones((length_data1,length_data1))
length_data2 = len(y1)
y_plane_pos = 55*np.ones((length_data2,length_data2))
length_data3 = len(x1)
x_plane_pos = 15.69*np.ones((length_data3,length_data3))
fig.add_trace(go.Surface(x=x1, y=y1, z=z_plane_pos, colorscale=grey, showscale=False))
fig.add_trace(go.Surface(x=x.apply(lambda x: 15.69), y=y1, z = np.array([test]*length_data1), colorscale= grey, showscale=False))
fig.add_trace(go.Surface(x=x1, y= y.apply(lambda x: 55), z = np.array([test]*length_data1).transpose(), colorscale=grey, showscale=False))
fig.update_layout(scene = dict(
xaxis = dict(nticks=4, range=[0,31.38],),
yaxis = dict(nticks=4, range=[0,110],),
zaxis = dict(nticks=4, range=[0,133],),),
legend_orientation="h",margin=dict(l=0, r=0, b=0, t=0))
nearly having the job done but the planes x=15.59 and y=55 aren't going to the maximum 133 in Immersion
The issue was that the arrays I was plotting weren't the right shapes.
By properly splitting the bit where created the input arrays and the plotting, was able to discover this, and consequently made input arrays (for plotting) of the right size and appropriate content.
import plotly.express as px
import pandas as pd
import plotly.graph_objects as go
import numpy as np
df = pd.read_csv('https://raw.githubusercontent.com/tiago-peres/immersion/master/Platforms_dataset.csv')
zero_pt = pd.Series([0])
z1 = np.arange(0, 134, 1)
y1 = np.arange(0, 111, 1)
x1 = np.arange(0, 32.38, 1)
z = zero_pt.append(df['Immersion'], ignore_index = True).reset_index(drop = True)
y = zero_pt.append(df['Accessibility'], ignore_index = True).reset_index(drop = True)
x = zero_pt.append(df['Functionality '], ignore_index = True).reset_index(drop = True)
test1 = pd.Series([133])
test = z.append(test1)
length_data1 = len(z1)
z_plane_pos = 66.5*np.ones((length_data1,length_data1))
length_data2 = len(y1)
y_plane_pos = 55*np.ones((length_data2,length_data2))
length_data3 = len(x1)
x_plane_pos = 15.69*np.ones((length_data3,length_data3))
xvals = x.apply(lambda x: 15.69)
xvals2 = x1
yvals = y1
yvals2 = y.apply(lambda x: 55)
zvals = np.zeros((len(yvals), len(xvals)))
zvals[:, -1] = 133 # np.array([test]*length_data2)
zvals2 = np.zeros((len(yvals2), len(xvals2)))
zvals2[-1, :] = 133
fig = px.scatter_3d(df, x='Functionality ', y='Accessibility', z='Immersion', color='Platforms')
grey = [[0,'#C0C0C0'],[1,'#C0C0C0']]
fig.add_trace(go.Surface(x=x1, y=y1, z=z_plane_pos, colorscale=grey, showscale=False))
fig.add_trace(go.Surface(x=xvals, y=yvals, z = zvals, colorscale= grey, showscale=False))
fig.add_trace(go.Surface(x=xvals2, y=yvals2, z = zvals2, colorscale=grey, showscale=False))
fig.update_layout(scene = dict(
xaxis = dict(nticks=4, range=[0,31.38],),
yaxis = dict(nticks=4, range=[0,110],),
zaxis = dict(nticks=4, range=[0,133],),),
legend_orientation="h",margin=dict(l=0, r=0, b=0, t=0))
See it here.
Related
The following plots two separate scatterplots using Plotly. I want to combine the points from each subplot into a single legend. However, if I plot the figure as is, there are some duplicate entries. On the other hand, if I hide a legend from a certain subplot, not all entries are displayed.
df = pd.DataFrame({'Type' : ['1','1','1','1','1','2','2','2','2','2'],
'Category' : ['A','D','D','D','F','B','D','A','D','E']
})
df['Color'] = df['Category'].map(dict(zip(df['Category'].unique(),
px.colors.qualitative.Dark24[:len(df['Category'].unique())])))
df = pd.concat([df]*10, ignore_index = True)
df['Lat'] = np.random.randint(0, 20, 100)
df['Lon'] = np.random.randint(0, 20, 100)
Color = df['Color'].unique()
Category = df['Category'].unique()
cats = dict(zip(Color, Category))
df_type_1 = df[df['Type'] == '1'].copy()
df_type_2 = df[df['Type'] == '2'].copy()
fig = make_subplots(
rows = 1,
cols = 2,
specs = [[{"type": "scattermapbox"}, {"type": "scattermapbox"}]],
vertical_spacing = 0.05,
horizontal_spacing = 0.05
)
for c in df_type_1['Color'].unique():
df_color = df_type_1[df_type_1['Color'] == c]
fig.add_trace(go.Scattermapbox(
lat = df_color['Lat'],
lon = df_color['Lon'],
mode = 'markers',
name = cats[c],
marker = dict(color = c),
opacity = 0.8,
#legendgroup = 'group2',
#showlegend = True,
),
row = 1,
col = 1
)
for c in df_type_2['Color'].unique():
df_color = df_type_2[df_type_2['Color'] == c]
fig.add_trace(go.Scattermapbox(
lat = df_color['Lat'],
lon = df_color['Lon'],
mode = 'markers',
name = cats[c],
marker = dict(color = c),
opacity = 0.8,
#legendgroup = 'group2',
#showlegend = False,
),
row = 1,
col = 2
)
fig.update_layout(height = 600, width = 800, margin = dict(l = 10, r = 10, t = 30, b = 10));
fig.update_layout(mapbox1 = dict(zoom = 2, style = 'carto-positron'),
mapbox2 = dict(zoom = 2, style = 'carto-positron'),
)
fig.show()
output: duplicate entries
if I use showlegend = False on either subplot, then the legend will not show all applicable entries.
output: (subplot 2 showlegend = False)
The best way to remove duplicate legends at this time is to use set() to remove duplicates from the created legend and update it with that content. I am saving this as a snippet. I am getting the snippet from this answer. I have also changed the method to use the color information set in the columns. I have also redesigned it so that it can be created in a single loop process without creating an extra data frame.
fig = make_subplots(
rows = 1,
cols = 2,
specs = [[{"type": "scattermapbox"}, {"type": "scattermapbox"}]],
vertical_spacing = 0.05,
horizontal_spacing = 0.05
)
for t in df['Type'].unique():
dff = df.query('Type ==#t')
for c in dff['Category'].unique():
dffc = dff.query('Category == #c')
fig.add_trace(go.Scattermapbox(
lat = dffc['Lat'],
lon = dffc['Lon'],
mode = 'markers',
name = c,
marker = dict(color = dffc['Color']),
opacity = 0.8,
),
row = 1,
col = int(t)
)
fig.update_layout(height = 600, width = 800, margin = dict(l = 10, r = 10, t = 30, b = 10));
fig.update_layout(mapbox1 = dict(zoom = 2, style = 'carto-positron'),
mapbox2 = dict(zoom = 2, style = 'carto-positron'),
)
names = set()
fig.for_each_trace(
lambda trace:
trace.update(showlegend=False)
if (trace.name in names) else names.add(trace.name))
fig.show()
how can I add a linar regression to this bokeh?, I have trouble with this, and dont know how to add to the figure the lr (don't know how to add to the curdoc expression). I've seen other posts, but havent found the way to add it to the bokeh. Please, help me with this showing how to add that line to the figure.
import pandas as pd
from bokeh.layouts import column, row
from bokeh.models import Select
from bokeh.palettes import Spectral5
from bokeh.plotting import curdoc, figure
from bokeh.sampledata.autompg import autompg_clean as df
df = df.copy()
SIZES = list(range(6, 22, 3))
COLORS = Spectral5
N_SIZES = len(SIZES)
N_COLORS = len(COLORS)
# data cleanup
df.cyl = df.cyl.astype(str)
df.yr = df.yr.astype(str)
del df['name']
columns = sorted(df.columns)
discrete = [x for x in columns if df[x].dtype == object]
continuous = [x for x in columns if x not in discrete]
def create_figure():
xs = df[x.value].values
ys = df[y.value].values
x_title = x.value.title()
y_title = y.value.title()
kw = dict()
if x.value in discrete:
kw['x_range'] = sorted(set(xs))
if y.value in discrete:
kw['y_range'] = sorted(set(ys))
kw['title'] = "%s vs %s" % (x_title, y_title)
p = figure(height=600, width=800, tools='pan,box_zoom,hover,reset', **kw)
p.xaxis.axis_label = x_title
p.yaxis.axis_label = y_title
if x.value in discrete:
p.xaxis.major_label_orientation = pd.np.pi / 4
sz = 9
if size.value != 'None':
if len(set(df[size.value])) > N_SIZES:
groups = pd.qcut(df[size.value].values, N_SIZES, duplicates='drop')
else:
groups = pd.Categorical(df[size.value])
sz = [SIZES[xx] for xx in groups.codes]
c = "#31AADE"
if color.value != 'None':
if len(set(df[color.value])) > N_COLORS:
groups = pd.qcut(df[color.value].values, N_COLORS, duplicates='drop')
else:
groups = pd.Categorical(df[color.value])
c = [COLORS[xx] for xx in groups.codes]
p.circle(x=xs, y=ys, color=c, size=sz, line_color="white", alpha=0.6, hover_color='white', hover_alpha=0.5)
return p
def update(attr, old, new):
layout.children[1] = create_figure()
x = Select(title='X-Axis', value='mpg', options=columns)
x.on_change('value', update)
y = Select(title='Y-Axis', value='hp', options=columns)
y.on_change('value', update)
size = Select(title='Size', value='None', options=['None'] + continuous)
size.on_change('value', update)
color = Select(title='Color', value='None', options=['None'] + continuous)
color.on_change('value', update)
controls = column(x, y, color, size, width=200)
layout = row(controls, create_figure())
curdoc().add_root(layout)
curdoc().title = "Crossfilter"
Dear Python programmers,
I am currently working with curve_fit from scipy inorder to find out what correlation the x and y data have with echouter. However, the curve fit becomes really weird even when I fit a simple lineair formule towards it. I've tried changing the array to a numpy array at the: def func(x, a, b, c): "Fit functie" return a * np.asarray(x) + b part but it still gives me a graph that looks like a 3 year old who scratched with some red pencil.
One thing I do remember is sorting the values of massflows and rms_smote from low to high. Which you can view above the def func(x, a, b, c) bit. Since the curve_fit was giving me a fit. Yet also kinda scratched out as if you're sketching when the values ware unsorted. I don't know if curve_fit considers data differently if it's sorted or not.
If you need any more information, let me know :) Any suggestion is welcome!
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from scipy.stats import linregress
from scipy.optimize import curve_fit
data_15 = pd.read_csv(r"C:\Users\Thomas\Documents\Pythondata\2022-01-15_SMOTERapport.csv", header= 0, sep=';', decimal=',')
data_06 = pd.read_csv(r"C:\Users\Thomas\Documents\Pythondata\2022-02-06_SMOTERapport.csv", header= 0, sep=';', decimal=',')
data_10 = pd.read_csv(r"C:\Users\Thomas\Documents\Pythondata\2022-02-10_SMOTERapport.csv", header= 0, sep=';', decimal=',')
speed_15 = data_15['SPEED_ACT']
speed_06 = data_06['SPEED_ACT']
speed_10 = data_10['SPEED_ACT']
"Data filter 01_15"
filter = [i for i, e in enumerate(speed_15) if e >= 80]
s_15 = pd.DataFrame(data_15)
speed15 = s_15.filter(items = filter, axis=0)
speed15.reset_index(drop=True, inplace=True)
temp15 = speed15['TP_SMOTE']
foutmetingen2 = [i for i, e in enumerate(temp15) if e < 180]
speed15 = speed15.drop(foutmetingen2)
tp_strip15 = speed15['TP_AMBIENT']
tp_target15 = speed15['TP_TARGET']
tp_smote15 = speed15['TP_SMOTE']
v_15 = speed15['SPEED_ACT']
width15 = speed15['STRIP_WIDTH']
thickness15 = speed15['STRIP_THICKNESS']
power15 = speed15['POWER_INVERTER_PRE']
voltage15 = speed15['VOLTAGE_INVERTER_PRE']
"Data filter 02_06"
filter = [i for i, e in enumerate(speed_06) if e >= 80]
s_06 = pd.DataFrame(data_06)
speed06 = s_06.filter(items = filter, axis=0)
speed06.reset_index(drop=True, inplace=True)
temp06 = speed06['TP_SMOTE']
foutmetingen2 = [i for i, e in enumerate(temp06) if e < 180]
speed06 = speed06.drop(foutmetingen2)
tp_strip06 = speed06['TP_AMBIENT']
tp_target06 = speed06['TP_TARGET']
tp_smote06 = speed06['TP_SMOTE']
v_06 = speed06['SPEED_ACT']
width06 = speed06['STRIP_WIDTH']
thickness06 = speed06['STRIP_THICKNESS']
power06 = speed06['POWER_INVERTER_PRE']
voltage06 = speed06['VOLTAGE_INVERTER_PRE']
"Data filter 02_10"
filter = [i for i, e in enumerate(speed_10) if e >= 80]
s_10 = pd.DataFrame(data_10)
speed10 = s_10.filter(items = filter, axis=0)
speed10.reset_index(drop=True, inplace=True)
temp_01 = speed10['TP_SMOTE']
foutmetingen2 = [i for i, e in enumerate(temp_01) if e < 180]
speed10 = speed10.drop(foutmetingen2)
tp_strip10 = speed10['TP_AMBIENT']
tp_target10 = speed10['TP_TARGET']
tp_smote10 = speed10['TP_SMOTE']
v_10 = speed10['SPEED_ACT']
width10 = speed10['STRIP_WIDTH']
thickness10 = speed10['STRIP_THICKNESS']
power10 = speed10['POWER_INVERTER_PRE']
voltage10 = speed10['VOLTAGE_INVERTER_PRE']
"Constanten"
widthmax = 1253
Kra = 0.002033636
Kosc = 0.073086272
Pnominal = 2200
meting_15 = np.arange(0, len(speed15), 1)
meting_06 = np.arange(0, len(speed06), 1)
meting_10 = np.arange(0, len(speed10), 1)
cp = 480
rho = 7850
"---------------------------------------------------------------------"
def temp(power, speed, width, thickness, tp_strip, tp_target, tp_smote,
voltage):
"Berekende temperatuur vergelijken met target temperatuur"
massflow = (speed/60)*width*10**-3*thickness*10**-3*rho
LossesRA = Kra*Pnominal*(width/widthmax)
LossesOSC = Kosc*Pnominal*(voltage/100)**2
Plosses = (LossesRA + LossesOSC)
power_nl = (power/100)*Pnominal - Plosses
temp_c = ((power_nl*1000)/(massflow*cp)) + tp_strip
verschil_t = (temp_c/tp_target)*100-100
verschil_smote = (temp_c/tp_smote)*100-100
return temp_c, verschil_t, verschil_smote, massflow
temp_15 = temp(power15, v_15, width15, thickness15, tp_strip15, tp_target15,
tp_smote15, voltage15)
temp_06 = temp(power06, v_06, width06, thickness06, tp_strip06, tp_target06,
tp_smote06, voltage06)
temp_10 = temp(power10, v_10, width10, thickness10, tp_strip10, tp_target10,
tp_smote10, voltage10)
"---------------------------------------------------------------------"
def rms(Temperatuurberekend, TemperatuurGemeten):
"De Root Mean Square berekenen tussen berekend en gemeten data"
rootmeansquare = (TemperatuurGemeten - Temperatuurberekend)
rootmeansquare_totaal = np.sum(rootmeansquare)
rootmeansquare_gem = rootmeansquare_totaal/len(rootmeansquare)
return rootmeansquare, rootmeansquare_totaal, rootmeansquare_gem
rms_tp_smote15 = (rms(temp_15[0], tp_smote15))
rms_tp_smote06 = (rms(temp_06[0], tp_smote06))
rms_tp_smote10 = (rms(temp_10[0], tp_smote10))
"----------------------------------------------------------------------"
massflows = [np.sum(temp_06[3])/len(temp_06[3]), np.sum(temp_15[3])/
len(temp_15[3]), np.sum(temp_10[3])/len(temp_10[3])]
rms_smote = [rms_tp_smote06[2], rms_tp_smote10[2], rms_tp_smote15[2]]
rms_tp_smote_pre = np.append(rms_tp_smote15[0].tolist(),
rms_tp_smote06[0].tolist())
rms_tp_smote = np.append(rms_tp_smote_pre, rms_tp_smote10[0].tolist())
massflow_pre = np.append(temp_15[3].tolist(), temp_06[3].tolist())
massflow = np.append(massflow_pre, temp_10[3].tolist())
massflow_sort = np.sort(massflow)
rms_tp_smote_sort = [x for _, x in sorted(zip(massflow, rms_tp_smote))]
a,b,r,p, s_a= linregress (massflows,rms_smote)
print('RC: ' ,a ,'\n','std: ', s_a , '\n', 'Offset: ', b)
def func(x, a, b, c):
"Fit functie"
return a * np.asarray(x) + b
popt, pcov = curve_fit(func, massflow_sort, rms_tp_smote_sort)
popt
functie = func(massflow_sort, *popt)
sns.set_theme(style='whitegrid')
fig, axs = plt.subplots(2, figsize=(10, 10))
axs[0].plot(massflows, rms_smote, label='Temp afwijking als f(massflow)')
axs[0].plot ([massflows[0] ,massflows[len (massflows) -1]] ,
[a*massflows [0]+b,a*massflows[len (massflows) -1]+b] ,
label ='trendlijn')
axs[0].set(xlabel='Mass flow ($kg/s$)',
ylabel='Temperatuur afwijking gem ($\u00b0C$)', title='Met Verliezen')
axs[0].legend(loc='upper right')
axs[1].plot(massflow_sort, rms_tp_smote_sort, 'o', label='Temp/Massflow 01-15')
#axs[1].plot(temp_06[3], rms_tp_smote06[0], 'o', label='Temp/Massflow 02-06')
#axs[1].plot(temp_10[3], rms_tp_smote10[0], 'o', label='Temp/Massflow 02-10')
axs[1].plot(massflow, func(massflow_sort, *popt), 'r-',
label='fit: a=%5.3f, b=%5.3f, c=%5.3f' % tuple(popt))
axs[1].set(xlabel='Mass flow ($kg/s$)',
ylabel='Temperatuur afwijking gem ($\u00b0C$)')
axs[1].legend(loc='upper right')
print("Gemiddelde verschil temperatuur smote: ", rms_tp_smote15[1])
print("Gemiddelde uitwijking temperatuur smote: ", rms_tp_smote15[2])
For this example, I modified the dataset and code from https://plotly.com/python/map-subplots-and-small-multiples/ to add a column to plot it in colors.
What I want to do here and in my dataset is to add a legend with a color scale.
Here the range is the same (0-9) among maps, so, a general legend or a legend for each subplot would work. Here the general legend is wrong.
Related: Plotly contour subplots each having their own colorbar
import plotly.graph_objects as go
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/1962_2006_walmart_store_openings.csv')
df.head()
# new column
import numpy as np
df["counts"] = np.random.choice(range(0,10),df.shape[0])
data = []
layout = dict(
title = 'New Walmart Stores per year 1962-2006<br>\
Source: <a href="http://www.econ.umn.edu/~holmes/data/WalMart/index.html">\
University of Minnesota</a>',
# showlegend = False,
autosize = False,
width = 1000,
height = 900,
hovermode = False,
legend = dict(
x=0.7,
y=-0.1,
bgcolor="rgba(255, 255, 255, 0)",
font = dict( size=11 ),
)
)
years = df['YEAR'].unique()
for i in range(len(years)):
geo_key = 'geo'+str(i+1) if i != 0 else 'geo'
lons = list(df[ df['YEAR'] == years[i] ]['LON'])
lats = list(df[ df['YEAR'] == years[i] ]['LAT'])
mycolor = list(df[ df['YEAR'] == years[i] ]['counts']) # new
# Walmart store data
data.append(
dict(
type = 'scattergeo',
showlegend=False,
lon = lons,
lat = lats,
geo = geo_key,
name = int(years[i]),
marker = dict(
color = mycolor, # new
#color = "rgb(0, 0, 255)",
opacity = 0.5
,showscale=True # new
)
)
)
# Year markers
data.append(
dict(
type = 'scattergeo',
showlegend = False,
lon = [-78],
lat = [47],
geo = geo_key,
text = [years[i]],
mode = 'text',
)
)
layout[geo_key] = dict(
scope = 'usa',
showland = True,
landcolor = 'rgb(229, 229, 229)',
showcountries = False,
domain = dict( x = [], y = [] ),
subunitcolor = "rgb(255, 255, 255)",
)
z = 0
COLS = 5
ROWS = 9
for y in reversed(range(ROWS)):
for x in range(COLS):
geo_key = 'geo'+str(z+1) if z != 0 else 'geo'
layout[geo_key]['domain']['x'] = [float(x)/float(COLS), float(x+1)/float(COLS)]
layout[geo_key]['domain']['y'] = [float(y)/float(ROWS), float(y+1)/float(ROWS)]
z=z+1
if z > 42:
break
fig = go.Figure(data=data, layout=layout)
fig.update_layout(width=800)
config = {'staticPlot': True}
fig.show(config=config)
I had to build the coordinates for each colorbar:
import plotly.graph_objects as go
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/1962_2006_walmart_store_openings.csv')
df.head()
z1 = z = 0
COLS = 5
ROWS = 9
mylist = [[],[]]
for y in reversed(range(ROWS)):
for x in range(COLS):
mylist[0].append((float(x+1)/float(COLS))-.02)
mylist[1].append((float(y+1)/float(ROWS))-.05)
z1=z1+1
if z1 > 42:
break
# new column
import numpy as np
df["counts"] = np.random.choice(range(0,10),df.shape[0])
data = []
layout = dict(
title = 'New Walmart Stores per year 1962-2006<br>\
Source: <a href="http://www.econ.umn.edu/~holmes/data/WalMart/index.html">\
University of Minnesota</a>',
# showlegend = False,
autosize = False,
width = 1000,
height = 900,
hovermode = False,
legend = dict(
x=0.7,
y=-0.1,
bgcolor="rgba(255, 255, 255, 0)",
font = dict( size=11 ),
)
)
years = df['YEAR'].unique()
for i in range(len(years)):
geo_key = 'geo'+str(i+1) if i != 0 else 'geo'
lons = list(df[ df['YEAR'] == years[i] ]['LON'])
lats = list(df[ df['YEAR'] == years[i] ]['LAT'])
mycolor = list(df[ df['YEAR'] == years[i] ]['counts']) # new
# Walmart store data
data.append(
dict(
type = 'scattergeo',
showlegend=False,
lon = lons,
lat = lats,
geo = geo_key,
name = int(years[i]),
marker = dict(
color = mycolor, # new
#color = "rgb(0, 0, 255)",
opacity = 0.5,
showscale=True
,colorbar=dict(len=0.1
, x=mylist[0][i]
, y=mylist[1][i]
,thickness=5
)
)
)
)
# Year markers
data.append(
dict(
type = 'scattergeo',
showlegend = False,
lon = [-78],
lat = [47],
geo = geo_key,
text = [years[i]],
mode = 'text',
)
)
layout[geo_key] = dict(
scope = 'usa',
showland = True,
landcolor = 'rgb(229, 229, 229)',
showcountries = False,
domain = dict( x = [], y = [] ),
subunitcolor = "rgb(255, 255, 255)",
)
for y in reversed(range(ROWS)):
for x in range(COLS):
geo_key = 'geo'+str(z+1) if z != 0 else 'geo'
layout[geo_key]['domain']['x'] = [float(x)/float(COLS), float(x+1)/float(COLS)]
layout[geo_key]['domain']['y'] = [float(y)/float(ROWS), float(y+1)/float(ROWS)]
z=z+1
if z > 42:
break
fig = go.Figure(data=data, layout=layout)
fig.update_layout(width=800)
config = {'staticPlot': True}
fig.show(config=config)
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
data = pd.read_csv('D:\ history/segment.csv')
data = pd.DataFrame(data)
data = data.sort_values(['Prob_score'], ascending=[False])
one = len(data)
actualpaid_overall = len(data.loc[data['paidstatus'] == 1])
data_split = np.array_split(data, 10)
data1 = data_split[0]
actualpaid_ten = len(data1.loc[data1['paidstatus'] == 1])
percent_ten = actualpaid_ten/actualpaid_overall
data2 = data_split[1]
actualpaid_twenty = len(data2.loc[data2['paidstatus'] == 1])
percent_twenty = (actualpaid_twenty/actualpaid_overall) + percent_ten
data3 = data_split[2]
actualpaid_thirty = len(data3.loc[data3['paidstatus'] == 1])
percent_thirty = (actualpaid_thirty/actualpaid_overall) + percent_twenty
data4 = data_split[3]
actualpaid_forty = len(data4.loc[data4['paidstatus'] == 1])
percent_forty = (actualpaid_forty/actualpaid_overall) + percent_thirty
data5 = data_split[4]
actualpaid_fifty = len(data5.loc[data5['paidstatus'] == 1])
percent_fifty = (actualpaid_fifty/actualpaid_overall) + percent_forty
data6 = data_split[5]
actualpaid_sixty = len(data6.loc[data6['paidstatus'] == 1])
percent_sixty = (actualpaid_sixty/actualpaid_overall) + percent_fifty
data7 = data_split[6]
actualpaid_seventy = len(data7.loc[data7['paidstatus'] == 1])
percent_seventy = (actualpaid_seventy/actualpaid_overall) + percent_sixty
data8 = data_split[7]
actualpaid_eighty = len(data8.loc[data8['paidstatus'] == 1])
percent_eighty = (actualpaid_eighty/actualpaid_overall) + percent_seventy
data9 = data_split[8]
actualpaid_ninenty = len(data9.loc[data9['paidstatus'] == 1])
percent_ninenty = (actualpaid_ninenty/actualpaid_overall) + percent_eighty
data10 = data_split[9]
actualpaid_hundred = len(data10.loc[data10['paidstatus'] == 1])
percent_hundred = (actualpaid_hundred/actualpaid_overall) + percent_ninenty
array_x = [10,20,30,40,50,60,70,80,90,100]
array_y = [ percent_ten, percent_twenty, percent_thirty, percent_forty,percent_fifty, percent_sixty, percent_seventy, percent_eighty, percent_ninenty, percent_hundred]
plt.xlabel(' Base')
plt.ylabel(' percent')
ax = plt.plot(array_x,array_y)
plt.minorticks_on()
plt.grid(which='major', linestyle='-', linewidth=0.5, color='0.1')
plt.grid( which='both', axis = 'both', linewidth=0.5,color='0.75')
The above is my python code i have splitted my dataframe into 10 equal sections and plotted the graph but I'm not satisfied with this i have two concerns:
array_x = [10,20,30,40,50,60,70,80,90,100] in this line of code i have manually taken the x values, is there any possible way to process automatically as i have taken split(data,10) it should show 10 array values
As we can see the whole data1,2,3,4...10 is being repeated again and again is there a solution to write this in a function or loop.
Any help with codes will be appreciated. Thanks
I believe you need list comprehension and for count is possible use simplier way - sum of boolean mask, True values are processes like 1, then convert list to numpy array and use numpy.cumsum:
data = pd.read_csv('D:\ history/segment.csv')
data = data.sort_values('Prob_score', ascending=False)
one = len(data)
actualpaid_overall = (data['paidstatus'] == 1).sum()
data_split = np.array_split(data, 10)
x = [len(x) for x in data_split]
y = [(x['paidstatus'] == 1).sum()/actualpaid_overall for x in data_split]
array_x = np.cumsum(np.array(x))
array_y = np.cumsum(np.array(y))
plt.xlabel(' Base')
plt.ylabel(' percent')
ax = plt.plot(array_x,array_y)
plt.minorticks_on()
plt.grid(which='major', linestyle='-', linewidth=0.5, color='0.1')
plt.grid( which='both', axis = 'both', linewidth=0.5,color='0.75')
Sample:
np.random.seed(2019)
N = 1000
data = pd.DataFrame({'paidstatus':np.random.randint(3, size=N),
'Prob_score':np.random.randint(100, size=N)})
#print (data)
data = data.sort_values(['Prob_score'], ascending=[False])
actualpaid_overall = (data['paidstatus'] == 1).sum()
data_split = np.array_split(data, 10)
x = [len(x) for x in data_split]
y = [(x['paidstatus'] == 1).sum()/actualpaid_overall for x in data_split]
array_x = np.cumsum(np.array(x))
array_y = np.cumsum(np.array(y))
print (array_x)
[ 100 200 300 400 500 600 700 800 900 1000]
print (array_y)
[0.09118541 0.18844985 0.27963526 0.38601824 0.49848024 0.61702128
0.72036474 0.81155015 0.9331307 1. ]