creating timesliced array in numpy - python

I want to create a numpy array.
T = 200
I want to create an array from 0 to 199, in which each value will be divided by 200.
l = [0, 1/200, 2/200, ...]
Does NumPy have a method for this calculation?

Alternatively one can use linspace:
>>> np.linspace(0, 1., 200, endpoint=False)
array([ 0. , 0.005, 0.01 , 0.015, 0.02 , 0.025, 0.03 , 0.035,
0.04 , 0.045, 0.05 , 0.055, 0.06 , 0.065, 0.07 , 0.075,
...
0.92 , 0.925, 0.93 , 0.935, 0.94 , 0.945, 0.95 , 0.955,
0.96 , 0.965, 0.97 , 0.975, 0.98 , 0.985, 0.99 , 0.995])

Use np.arange:
>>> import numpy as np
>>> np.arange(200, dtype=float) / 200  # np.float was removed in NumPy 1.24; use the built-in float
array([ 0. , 0.005, 0.01 , 0.015, 0.02 , 0.025, 0.03 , 0.035,
0.04 , 0.045, 0.05 , 0.055, 0.06 , 0.065, 0.07 , 0.075,
0.08 , 0.085, 0.09 , 0.095, 0.1 , 0.105, 0.11 , 0.115,
...
0.88 , 0.885, 0.89 , 0.895, 0.9 , 0.905, 0.91 , 0.915,
0.92 , 0.925, 0.93 , 0.935, 0.94 , 0.945, 0.95 , 0.955,
0.96 , 0.965, 0.97 , 0.975, 0.98 , 0.985, 0.99 , 0.995])

T = 200.0
l = [x / float(T) for x in range(200)]

import numpy as np
T = 200
np.linspace(0.0, 1.0 - 1.0 / float(T), T)
Personally I prefer linspace for creating evenly spaced arrays in general. It is more complex in this case as the endpoint depends on the number of points T.
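The approaches above should all produce the same values. A minimal sketch to check this, assuming T = 200 as in the question (the variable names by_arange, by_linspace and by_endpoint are illustrative only):

```python
import numpy as np

T = 200  # number of time slices, as in the question

# arange yields the integers 0..T-1; dividing by T scales them into [0, 1)
by_arange = np.arange(T) / T

# linspace with endpoint=False leaves out 1.0, so the step is 1/T
by_linspace = np.linspace(0.0, 1.0, T, endpoint=False)

# linspace with the last value computed explicitly, as in the final answer
by_endpoint = np.linspace(0.0, 1.0 - 1.0 / T, T)

print(np.allclose(by_arange, by_linspace))
print(np.allclose(by_arange, by_endpoint))
```

Passing endpoint=False avoids computing the last value by hand, which is the complication mentioned in the final answer.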


How can I get value from dataframe/matrix into tuple of list

I have a matrix that stores values like the table below:
             play_tv  play_series  Null  purchase  Conversion
Start           0.02         0.03  0.04      0.05        0.06
play_series     0.07         0.08  0.09      0.10        0.11
play_tv         0.12         0.13  0.14      0.15        0.16
Null            0.17         0.18  0.19      0.20        0.21
purchase        0.22         0.23  0.24      0.25        0.26
Conversion      0.27         0.28  0.29      0.30        0.31
and I have a dataframe like this:
session_id  path                                    path_pair
T01         [Start, play_series, Null]              [(Start, play_series), (play_series, Null)]
T02         [Start, play_tv, purchase, Conversion]  [(Start, play_tv), (play_tv, purchase), (purchase, Conversion)]
I want to look up each pair in the matrix and either replace the column path_pair or create a new column in my current dataframe. The result should be a list of values. How can I do that?
[(Start, play_series), (play_series, Null)] -> [0.03, 0.09]
[(Start, play_tv), (play_tv, purchase), (purchase, Conversion)] -> [0.02, 0.15, 0.26]
The result I want:
session_id  path                                    path_pair
T01         [Start, play_series, Null]              [0.03, 0.09]
T02         [Start, play_tv, purchase, Conversion]  [0.02, 0.15, 0.26]
The script I tried for getting a single value from the matrix:
trans_matrix[trans_matrix.index=="Start"]["play_series"].values[0]
Given your input:
import pandas as pd
df1 = pd.DataFrame({'play_tv': [0.02, 0.07, 0.12, 0.17, 0.22, 0.27],
'play_series': [0.03, 0.08, 0.13, 0.18, 0.23, 0.28],
'Null': [0.04, 0.09, 0.14, 0.19, 0.24, 0.29],
'purchase': [0.05, 0.1, 0.15, 0.2, 0.25, 0.3],
'Conversion': [0.06, 0.11, 0.16, 0.21, 0.26, 0.31]},
index=['Start','play_series','play_tv','Null','purchase','Conversion'])
df2 = pd.DataFrame({'session_id': ['T01', 'T02'],
'path': [['Start', 'play_series', 'Null'],
['Start', 'play_tv', 'purchase', 'Conversion']],
'path_pair': [[('Start', 'play_series'),( 'play_series', 'Null')],
[('Start', 'play_tv'),('play_tv', 'purchase'),('purchase', 'Conversion')]]})
You can update df2 by applying a function to column 'path_pair' that looks up values in df1:
df2['path_pair'] = df2['path_pair'].apply(lambda lst: [df1.loc[x,y] for (x,y) in lst])
Output:
session_id path path_pair
0 T01 [Start, play_series, Null] [0.03, 0.09]
1 T02 [Start, play_tv, purchase, Conversion] [0.02, 0.15, 0.26]
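Since each path_pair is just the consecutive pairs of path, the lookup can also be driven from the path column directly. A sketch under that assumption, reusing the data from the question; the helper name transition_values and the new column name path_values are hypothetical:

```python
import pandas as pd

# Transition matrix from the question: rows are the "from" step, columns the "to" step
df1 = pd.DataFrame(
    {'play_tv': [0.02, 0.07, 0.12, 0.17, 0.22, 0.27],
     'play_series': [0.03, 0.08, 0.13, 0.18, 0.23, 0.28],
     'Null': [0.04, 0.09, 0.14, 0.19, 0.24, 0.29],
     'purchase': [0.05, 0.1, 0.15, 0.2, 0.25, 0.3],
     'Conversion': [0.06, 0.11, 0.16, 0.21, 0.26, 0.31]},
    index=['Start', 'play_series', 'play_tv', 'Null', 'purchase', 'Conversion'])

df2 = pd.DataFrame(
    {'session_id': ['T01', 'T02'],
     'path': [['Start', 'play_series', 'Null'],
              ['Start', 'play_tv', 'purchase', 'Conversion']]})


def transition_values(path):
    """Return the matrix value for each consecutive (from, to) pair in path."""
    # zip(path, path[1:]) pairs every step with the step that follows it
    return [df1.at[src, dst] for src, dst in zip(path, path[1:])]


df2['path_values'] = df2['path'].apply(transition_values)
print(df2)
```

Here df1.at is used for scalar access by row and column label; df1.loc[src, dst] as in the answer above would work the same way.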

CVXPY, least-squares Optimization, wrong constraint formulation

First of all, I'm sorry if my question doesn't make sense; I am new to the CVXPY library and I don't understand everything yet.
I am trying to solve a minimization problem that I thought would be easy to handle.
I have a matrix S of dimensions (9,7) with known coefficients, B of dimensions (1,7) with known coefficients, and Alpha of dimensions (1,7), which I need to find, subject to these constraints:
Alpha must be positive
The sum of all the coefficients of Alpha must be equal to 1
I need to optimize Alpha such that S @ Alpha - B = 0.
I discovered CVXPY and thought least square optimization was perfect for this issue.
This is the code I wrote :
Alpha = cp.Variable(7)
objective = cp.Minimize(cp.sum_squares(S @ Alpha - B))
constraints = [0 <= Alpha, Alpha<=1, np.sum(Alpha.value)==1]
prob = cp.Problem(objective, constraints)
result = prob.solve()
print(Alpha.value)
With
S= np.array([[0.03,0.02,0.072,0.051,0.058,0.0495,0.021 ],
[0.0295, 0.025 , 0.1 , 0.045 , 0.064 , 0.055 , 0.032 ],
[0.02 , 0.018 , 0.16 , 0.032 , 0.054 , 0.064 , 0.025 ],
[0.0195, 0.03 , 0.144 , 0.027 , 0.04 , 0.06 , 0.04 ],
[0.02 , 0.0315, 0.156 , 0.0295 ,0.027 , 0.0615 ,0.05 ],
[0.021 , 0.033 , 0.168 , 0.03 , 0.0265 ,0.063 , 0.09 ],
[0.02 , 0.05 , 0.28 , 0.039 , 0.035 , 0.055 , 0.04 ],
[0.021 , 0.03 , 0.22 , 0.0305, 0.0255, 0.057 , 0.009 ],
[0.0195, 0.008 , 0.2 , 0.021 , 0.01 , 0.048 , 0.0495]])
B=np.array([0.1015, 0.0888, 0.0911, 0.0901, 0.0945, 0.0909, 0.078 , 0.0913,
0.0845])
My issue is the following:
Without the constraint np.sum(Alpha.value)==1, the code gives me results, but when I add the constraint, Alpha.value is
None
I presume the formulation is wrong, but I have no idea how to write it another way.
Or maybe the problem doesn't have a solution?
Thank you for your time.
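Use just sum(Alpha) == 1 (or, equivalently, cp.sum(Alpha) == 1). Alpha.value is None until the problem has been solved, so np.sum(Alpha.value) == 1 is evaluated by Python before CVXPY ever sees it and does not express a constraint on the variable. You are not supposed to use NumPy functions in CVXPY expressions; use the CVXPY functions listed at https://www.cvxpy.org/tutorial/functions/index.html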

Using seaborn to plot pre-grouped line data

I have data that I have pre-grouped. Specifically they are PR-curves for 3 different classes and I want to plot them on the same axes:
import numpy as np
data_groups = {
'ap=0.16: cat_3 (4/19)': {
'precision': np.array([0. , 0. , 0. , 0. , 0.2 ,
0.16666667, 0.14285714, 0.25 , 0.22222222, 0.2 ,
0.18181818, 0.16666667, 0.15384615, 0.14285714, 0.13333333,
0.21052632], dtype=np.float64),
'recall': np.array([0. , 0. , 0. , 0. , 0.25, 0.25, 0.25, 0.5 , 0.5 , 0.5 , 0.5 ,
0.5 , 0.5 , 0.5 , 0.5 , 1. ], dtype=np.float64),
},
'ap=0.20: cat_1 (3/19)': {
'precision': np.array([0. , 0.5 , 0.33333333, 0.25 , 0.2 ,
0.16666667, 0.14285714, 0.25 , 0.22222222, 0.2 ,
0.18181818, 0.16666667, 0.15384615, 0.14285714, 0.13333333,
0.15789474], dtype=np.float64),
'recall': np.array([0. , 0.33333333, 0.33333333, 0.33333333, 0.33333333,
0.33333333, 0.33333333, 0.66666667, 0.66666667, 0.66666667,
0.66666667, 0.66666667, 0.66666667, 0.66666667, 0.66666667,
1. ], dtype=np.float64),
},
'ap=0.54: cat_2 (8/19)': {
'precision': np.array([0. , 0.5 , 0.33333333, 0.5 , 0.6 ,
0.66666667, 0.71428571, 0.75 , 0.66666667, 0.6 ,
0.63636364, 0.58333333, 0.53846154, 0.5 , 0.46666667,
0.42105263], dtype=np.float64),
'recall': np.array([0. , 0.125, 0.125, 0.25 , 0.375, 0.5 , 0.625, 0.75 , 0.75 ,
0.75 , 0.875, 0.875, 0.875, 0.875, 0.875, 1. ], dtype=np.float64),
},
}
I would like to use seaborn to plot these multiple lines in a single plot, but to do so I seem to need to transform this grouped data into a single long-form pandas table.
import pandas as pd
longform = []
for key, subdata in data_groups.items():
    subdata = pd.DataFrame.from_dict(subdata)
    subdata['label'] = key
    longform.append(subdata)
data = pd.concat(longform)
Which effectively duplicates this "label" attribute for each item in the list:
recall precision label
0 0.000000 0.000000 ap=0.54: cat_2 (8/19)
1 0.125000 0.500000 ap=0.54: cat_2 (8/19)
2 0.125000 0.333333 ap=0.54: cat_2 (8/19)
...
0 0.000000 0.000000 ap=0.20: cat_1 (3/19)
1 0.333333 0.500000 ap=0.20: cat_1 (3/19)
2 0.333333 0.333333 ap=0.20: cat_1 (3/19)
3 0.333333 0.250000 ap=0.20: cat_1 (3/19)
...
0 0.000000 0.000000 ap=0.16: cat_3 (4/19)
1 0.000000 0.000000 ap=0.16: cat_3 (4/19)
2 0.000000 0.000000 ap=0.16: cat_3 (4/19)
At which point I can plot it:
import seaborn as sns
sns.lineplot(
data=data, x='recall', y='precision',
hue='label', style='label')
But I was wondering if there is a more efficient way to send the pre-grouped data into seaborn. I would like to avoid duplicating the "label" attribute, and I imagine seaborn must effectively invert the pd.concat operation I just performed.
In the data structures accepted by seaborn (https://seaborn.pydata.org/tutorial/data_structure.html) they only mention this long-form (which I understand pretty well) and wide-form data (which makes much less sense to me).
This pre-grouped data isn't a wide-form variant right? I just want to verify that performing the extra concat is currently the only way to do this.
You don't have to send all the data to seaborn at once. You can plot line by line, and the lines will still appear on the same plot. Seaborn accepts NumPy arrays directly for x and y, so you can pass each group to the plotting function separately and it still works:
from matplotlib import pyplot as plt
import seaborn as sns
for key, subdata in data_groups.items():
    sns.lineplot(x=subdata['recall'], y=subdata['precision'], label=key)
plt.show()
Of course you need to take care of extra styling yourself, such as legend position and confidence intervals, but essentially this plots each group directly without converting it to a dataframe first.
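If a long-form table is still wanted, for example to keep using hue and style, pd.concat can attach the group key itself when given a dict, which avoids the explicit loop (the label is still repeated per row in the result). A sketch with small made-up arrays standing in for the PR-curves; the keys cat_a and cat_b and all values are illustrative only:

```python
import numpy as np
import pandas as pd

# Made-up stand-in for data_groups, with the same nesting as in the question
data_groups = {
    'cat_a': {'precision': np.array([0.0, 0.5, 0.4]),
              'recall': np.array([0.0, 0.5, 1.0])},
    'cat_b': {'precision': np.array([0.0, 1.0, 0.6]),
              'recall': np.array([0.0, 0.3, 1.0])},
}

# One DataFrame per group; the dict keys become the outer index level
frames = {key: pd.DataFrame(sub) for key, sub in data_groups.items()}

# names=['label'] names that outer level, and reset_index turns it into a column
data = pd.concat(frames, names=['label']).reset_index(level='label')
print(data)
```

The resulting frame has the same label, precision and recall columns as the hand-built one, so it can be passed to sns.lineplot with hue='label' in the same way.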

Time Series with Pandas, Python, and Plotly

I'm trying to create a data visualization that's essentially a time series chart. I have to use pandas, Python, and Plotly, and I'm stuck on how to actually label the dates. Right now, the x labels are just integers from 1 to 60, and when you hover over the chart, you get that integer instead of the date.
I'm pulling values from a Google spreadsheet, and for now, I'd like to avoid parsing CSV files.
I'd really like some help on how to label x as dates! Here's what I have so far:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import bpr
%matplotlib inline
import chart_studio.plotly as pl
import plotly.express as px
import plotly.graph_objects as go
f = open("../credentials.txt")
u = f.readline()
plotly_user = str(u[:-1])
k = f.readline()
plotly_api_key = str(k)
pl.sign_in(username = plotly_user, api_key = plotly_api_key)
rand_x = np.arange(61)
rand_x = np.flip(rand_x)
rand_y = np.array([0.91 , 1 , 1.24 , 1.25 , 1.4 , 1.36 , 1.72 , 1.3 , 1.29 , 1.17 , 1.57 , 1.95 , 2.2 , 2.07 , 2.03 , 2.14 , 1.96 , 1.87 , 1.25 , 1.34 , 1.13 , 1.31 , 1.35 , 1.54 , 1.38 , 1.53 , 1.5 , 1.32 , 1.26 , 1.4 , 1.89 , 1.55 , 1.98 , 1.75 , 1.14 , 0.57 , 0.51 , 0.41 , 0.24 , 0.16 , 0.08 , -0.1 , -0.24 , -0.05 , -0.15 , 0.34 , 0.23 , 0.15 , 0.12 , -0.09 , 0.13 , 0.24 , 0.22 , 0.34 , 0.01 , -0.08 , -0.27 , -0.6 , -0.17 , 0.28 , 0.38])
test_data = pd.DataFrame(columns=['X', 'Y'])
test_data['X'] = rand_x
test_data['Y'] = rand_y
test_data.head()
def create_line_plot(data, x, y, chart_title="Rate by Date", labels_dict={}, c=["indianred"]):
    fig = px.line(
        data,
        x = x,
        y = y,
        title = chart_title,
        labels = labels_dict,
        color_discrete_sequence = c
    )
    fig.show()
    return fig
fig = create_line_plot(test_data, 'X', 'Y', labels_dict={'X': 'Date', 'Y': 'Rate (%)'})
"Right now, the x labels are just integers from 1 to 60, and when you hover over the chart, you get that integer instead of the date."
This happens because you are passing rand_x as the x values, and rand_x is an array of integers. Setting labels_dict={'X': 'Date', 'Y': 'Rate (%)'} only changes the label text to Date; it does not convert the values. What you need to do is pass an array of datetime values as x. For example:
rand_x = np.array(['2020-01-01','2020-01-02','2020-01-03'], dtype='datetime64')
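For the 61 points in the question, the dates do not have to be typed out by hand. The sketch below assumes that the integers in X count days before the last observation and uses a hypothetical end date of 2020-03-01. Both are assumptions, since the question does not say what the integers mean, and the Y values are placeholders.

```python
import numpy as np
import pandas as pd

# Hypothetical date of the most recent data point; replace with the real one
end_date = pd.Timestamp('2020-03-01')

# 60, 59, ..., 0 as in the question, read here as "days before end_date"
rand_x = np.flip(np.arange(61))

# Placeholder values standing in for the 61 rates from the question
rand_y = np.linspace(0.0, 1.0, 61)

test_data = pd.DataFrame({
    # Subtracting day offsets from a Timestamp yields datetime64 values
    'X': end_date - pd.to_timedelta(rand_x, unit='D'),
    'Y': rand_y,
})

print(test_data.dtypes)
print(test_data.head())
```

With X stored as datetimes, passing test_data to the create_line_plot function from the question should give date ticks and date hover text without further changes.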

Inadvertently Shifting Plots in Matplotlib

I've got some weird behaviour in matplotlib that I couldn't explain, and I was wondering if someone could see what was going on. What's essentially happening is that I'm trying to place what used to be two figures into one. I do so by creating two GridSpec objects, one for the left half of the figure and the other for the right. I draw the left hand side and add a colorbar, but when I select my first subplot on the right hand side, the figure on the left shifts to the right under the colorbar. If you try executing the example code excluding the last two lines, you will see what you expect, but if you execute the entirety of it, the plot on the left shifts. What's going on?
import matplotlib.gridspec as gridspec
import numpy as np
import pylab as pl
scores = np.array([[ 0.32 , 0.32 , 0.32 , 0.32 , 0.32 ,
0.32 , 0.32 , 0.32 , 0.32 ],
[ 0.32 , 0.32 , 0.32 , 0.49333333, 0.85333333,
0.92666667, 0.32 , 0.32 , 0.32 ],
[ 0.32 , 0.32 , 0.51333333, 0.87333333, 0.96 ,
0.95333333, 0.89333333, 0.44 , 0.34 ],
[ 0.32 , 0.51333333, 0.88 , 0.96 , 0.96666667,
0.95333333, 0.90666667, 0.47333333, 0.34 ],
[ 0.51333333, 0.88 , 0.96 , 0.96 , 0.96 ,
0.96 , 0.90666667, 0.47333333, 0.34 ],
[ 0.88 , 0.96 , 0.96 , 0.96 , 0.94666667,
0.96 , 0.90666667, 0.47333333, 0.34 ],
[ 0.96 , 0.96 , 0.96666667, 0.96 , 0.94 ,
0.96 , 0.90666667, 0.47333333, 0.34 ],
[ 0.96 , 0.96666667, 0.96666667, 0.94666667, 0.94 ,
0.96 , 0.90666667, 0.47333333, 0.34 ],
[ 0.96666667, 0.97333333, 0.96 , 0.94666667, 0.94 ,
0.96 , 0.90666667, 0.47333333, 0.34 ],
[ 0.96666667, 0.96666667, 0.96666667, 0.94666667, 0.94 ,
0.96 , 0.90666667, 0.47333333, 0.34 ],
[ 0.95333333, 0.96 , 0.96666667, 0.94666667, 0.94 ,
0.96 , 0.90666667, 0.47333333, 0.34 ]])
C_range = 10.0 ** np.arange(-2, 9)
gamma_range = 10.0 ** np.arange(-5, 4)
pl.figure(0, figsize=(16,6))
gs = gridspec.GridSpec(1,1)
gs.update(left=0.05, right=0.45, bottom=0.15, top=0.95)
pl.subplot(gs[0,0])
pl.imshow(scores, interpolation='nearest', cmap='nipy_spectral')  # 'spectral' was renamed 'nipy_spectral' in newer Matplotlib
pl.xlabel('gamma')
pl.ylabel('C')
pl.colorbar()
pl.xticks(np.arange(len(gamma_range)), gamma_range, rotation=45)
pl.yticks(np.arange(len(C_range)), C_range)
gs = gridspec.GridSpec(3,3)
gs.update(left=0.5, right=0.95, bottom=0.05, top=0.95)
pl.subplot(gs[0,0]) # here's where the shift happens
You can create the colorbar after the line marked "# here's where the shift happens":
pl.figure(0, figsize=(16,6))
gs = gridspec.GridSpec(1,1)
gs.update(left=0.05, right=0.45, bottom=0.15, top=0.95)
ax = pl.subplot(gs[0,0]) # save the axes to ax
pl.imshow(scores, interpolation='nearest', cmap='nipy_spectral')  # 'spectral' was renamed 'nipy_spectral' in newer Matplotlib
pl.xlabel('gamma')
pl.ylabel('C')
pl.xticks(np.arange(len(gamma_range)), gamma_range, rotation=45)
pl.yticks(np.arange(len(C_range)), C_range)
gs = gridspec.GridSpec(3,3)
gs.update(left=0.5, right=0.95, bottom=0.05, top=0.95)
pl.subplot(gs[0,0]) # here's where the shift happens
pl.colorbar(ax=ax) # create colorbar for ax
pl.show()
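Another option, not taken from the answer above, is to give the colorbar its own axes inside the left-hand GridSpec so that Matplotlib never has to resize the image axes to make room for it. A minimal sketch using the object-oriented interface; the random scores array, the 20:1 width ratio and the names gs_left, gs_right, ax_img and cax are all illustrative assumptions:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere; drop for interactive use
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import gridspec

# Placeholder data with the same shape as the scores array in the question
scores = np.random.default_rng(0).random((11, 9))

fig = plt.figure(figsize=(16, 6))

# Left half: one wide column for the image and one narrow column for the colorbar
gs_left = gridspec.GridSpec(1, 2, width_ratios=[20, 1],
                            left=0.05, right=0.45, bottom=0.15, top=0.95)
ax_img = fig.add_subplot(gs_left[0, 0])
cax = fig.add_subplot(gs_left[0, 1])

im = ax_img.imshow(scores, interpolation='nearest', cmap='nipy_spectral')
ax_img.set_xlabel('gamma')
ax_img.set_ylabel('C')

# cax= draws the colorbar into the dedicated axes instead of stealing space from ax_img
cbar = fig.colorbar(im, cax=cax)

pos_before = ax_img.get_position(original=True).bounds

# Right half: the 3x3 grid from the question
gs_right = gridspec.GridSpec(3, 3, left=0.5, right=0.95, bottom=0.05, top=0.95)
ax_right = fig.add_subplot(gs_right[0, 0])

pos_after = ax_img.get_position(original=True).bounds
print(pos_before == pos_after)
```

Because every axes is placed explicitly by a GridSpec, the order in which the colorbar and the right-hand subplots are created should no longer matter.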
