I've got some weird behaviour in matplotlib that I can't explain, and I was wondering if someone could see what's going on. I'm trying to combine what used to be two figures into one. I do so by creating two GridSpec objects, one for the left half of the figure and one for the right. I draw the left-hand side and add a colorbar, but when I select my first subplot on the right-hand side, the plot on the left shifts to the right, underneath the colorbar. If you execute the example code without the last two lines, you will see what you expect, but if you execute all of it, the plot on the left shifts. What's going on?
import matplotlib.gridspec as gridspec
import numpy as np
import pylab as pl
scores = np.array([[ 0.32 , 0.32 , 0.32 , 0.32 , 0.32 ,
0.32 , 0.32 , 0.32 , 0.32 ],
[ 0.32 , 0.32 , 0.32 , 0.49333333, 0.85333333,
0.92666667, 0.32 , 0.32 , 0.32 ],
[ 0.32 , 0.32 , 0.51333333, 0.87333333, 0.96 ,
0.95333333, 0.89333333, 0.44 , 0.34 ],
[ 0.32 , 0.51333333, 0.88 , 0.96 , 0.96666667,
0.95333333, 0.90666667, 0.47333333, 0.34 ],
[ 0.51333333, 0.88 , 0.96 , 0.96 , 0.96 ,
0.96 , 0.90666667, 0.47333333, 0.34 ],
[ 0.88 , 0.96 , 0.96 , 0.96 , 0.94666667,
0.96 , 0.90666667, 0.47333333, 0.34 ],
[ 0.96 , 0.96 , 0.96666667, 0.96 , 0.94 ,
0.96 , 0.90666667, 0.47333333, 0.34 ],
[ 0.96 , 0.96666667, 0.96666667, 0.94666667, 0.94 ,
0.96 , 0.90666667, 0.47333333, 0.34 ],
[ 0.96666667, 0.97333333, 0.96 , 0.94666667, 0.94 ,
0.96 , 0.90666667, 0.47333333, 0.34 ],
[ 0.96666667, 0.96666667, 0.96666667, 0.94666667, 0.94 ,
0.96 , 0.90666667, 0.47333333, 0.34 ],
[ 0.95333333, 0.96 , 0.96666667, 0.94666667, 0.94 ,
0.96 , 0.90666667, 0.47333333, 0.34 ]])
C_range = 10.0 ** np.arange(-2, 9)
gamma_range = 10.0 ** np.arange(-5, 4)
pl.figure(0, figsize=(16,6))
gs = gridspec.GridSpec(1,1)
gs.update(left=0.05, right=0.45, bottom=0.15, top=0.95)
pl.subplot(gs[0,0])
pl.imshow(scores, interpolation='nearest', cmap=pl.cm.spectral)
pl.colorbar()
pl.xticks(np.arange(len(gamma_range)), gamma_range, rotation=45)
pl.yticks(np.arange(len(C_range)), C_range)
gs = gridspec.GridSpec(3,3)
gs.update(left=0.5, right=0.95, bottom=0.05, top=0.95)
pl.subplot(gs[0,0]) # here's where the shift happens
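You can create the colorbar after the line marked "here's where the shift happens". Save the left-hand axes in a variable first, then pass it to colorbar once the right-hand subplot exists: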
pl.figure(0, figsize=(16,6))
gs = gridspec.GridSpec(1,1)
gs.update(left=0.05, right=0.45, bottom=0.15, top=0.95)
ax = pl.subplot(gs[0,0]) # save the axes to ax
pl.imshow(scores, interpolation='nearest', cmap=pl.cm.spectral)
pl.xticks(np.arange(len(gamma_range)), gamma_range, rotation=45)
pl.yticks(np.arange(len(C_range)), C_range)
gs = gridspec.GridSpec(3,3)
gs.update(left=0.5, right=0.95, bottom=0.05, top=0.95)
pl.subplot(gs[0,0]) # here's where the shift happens
pl.colorbar(ax=ax) # create colorbar for ax
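The same ordering can also be sketched with matplotlib's object-oriented interface. This is only an illustration under assumptions: the random scores array is a stand-in for the real data, the Agg backend is chosen so that it runs without a display, and the colormap name nipy_spectral is used because newer matplotlib releases no longer ship one called spectral.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.gridspec as gridspec
import matplotlib.pyplot as plt
import numpy as np

# Stand-in data with the same shape as the scores array in the question.
scores = np.random.default_rng(0).random((11, 9))

fig = plt.figure(figsize=(16, 6))
gs_left = gridspec.GridSpec(1, 1, left=0.05, right=0.45, bottom=0.15, top=0.95)
gs_right = gridspec.GridSpec(3, 3, left=0.5, right=0.95, bottom=0.05, top=0.95)

# Create every axes first and keep explicit references to them.
ax_left = fig.add_subplot(gs_left[0, 0])
im = ax_left.imshow(scores, interpolation="nearest", cmap="nipy_spectral")
ax_right = fig.add_subplot(gs_right[0, 0])

# Only now attach the colorbar, naming the image and the axes it belongs to.
cbar = fig.colorbar(im, ax=ax_left)
```

Keeping references such as ax_left and im avoids relying on whichever axes happens to be "current" when the colorbar is created.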
I'm in the process of decompiling a hex string made up of analog values. I was told that it consists of 4 byte hex per number.
#0-799 Load data, 4 bytes each
#800-1599 Position data, 4 bytes each
I'm trying to decode it but can't seem to get the results that were given to me. I'm wondering if perhaps my scaling functions are incorrect for the unsigned 32-bit to float conversion. I realize it could be the engineering units (new max, new min) that it is being scaled to, but I want to rule out that I am doing something incorrectly.
Here is my code
import matplotlib.pyplot as plt
#this hex string is from the sample given
hex_string = '00706A450090574500F0484500C0394500802D45001027450050284500E0304500603E4500204D450010594500C05E4500505C4500F051450060414500C02D4500E01A4500C00C4500C0064500000B4500D019450060314500E04D4500006A450008804500A08545007884450030794500205F4500E03F4500D0214500F00A450080FF440070024500D0124500202E450060504500F0744500D08B4500009B4500A0A7450040B24500F8BB4500F0C54500D0D0450080DC450028E8450068F2450000FA450028FE4500D8FE4500F0FC4500B8F9450088F6450068F44500D8F34500C8F44500B8F6450028F94500A0FB4500F0FD45000C0046002401460058024600B803460038054600B8064600F4074600980846005C08460010074600B40446009001460050FC450040F6450048F2450068F1450020F4450040FA45006C01460044064600BC0A4600F80D4600280F4600CC0D4600B409460010034600C0F44500D0E04500F8CB4500C8B7450090A5450038964500308A45009881450060784500B072450010704500206E4500706A45000000005C8F0240D7A380408FC2CD409A9911413D0A3F4152B86E4152B89041F628AC41713DCA41A470EB413D0A08427B141C42CDCC314285EB4842C3F56042B81E79420A578842856B93423D8A9D4214AEA6420000AF42ECD1B6425C8FBE423D8AC6425C0FCF425238D8420000E2423333EC42E17AF64252380043ECD104439AD9084333330C43D7E30E43CD0C114348E11243299C1443146E1643D7631843146E1A43856B1C43A4301E43B89E1F43F6A82043B85E214329DC21431F45224385AB22430A1723435278234314AE23439A9923437B1423437B1422435C8F20437B941E4333331C438F821943CD8C16430A5713439AD90F433D0A0C4366E60743A4700343C375FD4214AEF34252B8E94252B8DF4271BDD542CDCCCB4285EBC1427B14B8428F42AE42A470A44200809A42CD4C9042F6A88542B81E75427B145E420AD74642F628304285EB1A4252B807429A99ED410000D0418FC2B5410AD79D410AD787413D0A6741713D4241A4702141EC5104413333D3400000A040CDCC5C4000000040295C4F3F0AD7A33D00000000'
#Signed Integer: A 16-bit signed integer ranging from -32,768 to +32,767
#Unsigned Integer: A 16-bit unsigned integer ranging from 0 to 65535.
#signed 32 bit int range is -2147483648 to 2147483647
#unsigned 32 int range is 0 to 4294967295
#sample hex string: 5C8F0240
#Converted 32 bit equivalent: 1552876096
#Card hex string byte layout Description
#0-799 Load data, 4 bytes each
#800-1599 Position data, 4 bytes each
#create a new function that scales the 32 bit unsigned integer position value into a float value. Assuming a stroke range of 0-168 inches and that the integer value is unsigned 32 bit
def u32int_pos_to_float(u32int_value):
    OldValue = u32int_value
    OldMin = 0
    OldMax = 4294967295
    NewMin = 0
    NewMax = 168
    NewValue = (((OldValue - OldMin) * (NewMax - NewMin)) / (OldMax - OldMin)) + NewMin
    return NewValue
#create a new function that scales the 32 bit unsigned integer load value into a float value. Assuming a load cell of 0-30000 lbs and that the integer value is unsigned 32 bit
def u32int_load_to_float(u32int_value):
    OldValue = u32int_value
    OldMin = 0
    OldMax = 4294967295
    NewMin = 0
    NewMax = 30000
    NewValue = (((OldValue - OldMin) * (NewMax - NewMin)) / (OldMax - OldMin)) + NewMin
    return NewValue
#Card hex string byte layout Description
#0-799 Load data, 4 bytes each
#800-1599 Position data, 4 bytes each
#A byte (or octet) is 8 bits so is always represented by 2 Hex characters in the range 00 to FF
#4 bytes = 32 bits
#Find the middle index of the hex_string and split the string into two halves
print(' ')
print('-----this is the start of the hex string conversion logic-----')
print(' ')
print('The hex string is: ' + hex_string)
print(' ')
print('hex string length: ' + str(len(hex_string)))
middle_of_String = int(len(hex_string)/2)
print('middle of string is:',middle_of_String,)
lastloadinstring = middle_of_String
startposinstring = middle_of_String
hex_load_string = hex_string[:lastloadinstring]
hex_pos_string = hex_string[startposinstring:]
print(' ')
print('----Start of the hexadecimal load and position lists from Hex String dividing in half----')
print(' ')
print('hex_load_string length:',len(hex_load_string))
print('hex_pos_string length:',len(hex_pos_string))
#parse the hex strings into 4 byte chunks
hex_load_list = [hex_load_string[i:i+8] for i in range(0, len(hex_load_string), 8)]
hex_pos_list = [hex_pos_string[i:i+8] for i in range(0, len(hex_pos_string), 8)]
print(' ')
print('---start of the hexadecimal load and position 4 byte "chunks" list----')
print('----Note from developer: 0-799 Load data and 800-1599 Position data are 4 bytes each----')
print(' ')
print('hex_load_list length:',len(hex_load_list))
print('hex_pos_list length:',len(hex_pos_list))
#convert each hex chunk to 32 bit unsigned integer
load_list_int = []
pos_list_int = []
for i in range(0, len(hex_load_list)):
    load_list_int.append(int(hex_load_list[i], 16))
    pos_list_int.append(int(hex_pos_list[i], 16))
print(' ')
print('----start of the load and position unsigned 32 bit integer list----')
print(' ')
print('load_list_int length:',len(load_list_int))
print('pos_list_int length:',len(pos_list_int))
#using the new function, convert the 32 bit unsigned integers to a new list of floats
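load_list_float = []
for i in range(0, len(load_list_int)):
    load_list_float.append(u32int_load_to_float(load_list_int[i]))
pos_list_float = []
for i in range(0, len(pos_list_int)):
    pos_list_float.append(u32int_pos_to_float(pos_list_int[i]))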
print(' ')
print('----start of the load and position floating value lists----')
print('--these are scaled using the functions at the top of the code. All scaled as unsigned 32 bit integers---')
print('---engineering units for scaling in function comments-----')
print(' ')
print('load_list_float length:',len(load_list_float))
print('pos_list_float length:',len(pos_list_float))
#create a scatter plot of the position and load data
plt.scatter(pos_list_float, load_list_float)
plt.xlabel('Position (inches)')
plt.ylabel('Load (lbs)')
plt.title('Position vs Load')
The following are the expected results:
position load
I'm a newbie at Python and have been staring at this for too long, so any help would be appreciated.
Your most likely error is a little-endian big-endian misinterpretation. Looking at the structure of the data it's much more likely that these are little-endian values. Beyond that, you haven't shown realistic-enough code, and a complete-enough specification to pinpoint exactly why your actual data don't match the sample; but at least the endianness fix produces results that are more plausible.
To even get into the neighbourhood of the expected values, you need to replace your scales (i.e. 30,000) with inferred values from linear regression. This does not produce exactly the expected results, but again, they're in the neighbourhood.
Don't scatterplot. Since these data vary somewhat continuously it's important to see the relationship of readings over time.
import struct
import matplotlib.pyplot as plt
hex_string = '...'  # paste the sample hex string from the question here
# Card hex string byte layout description
# 0-799 Load data, 4 bytes each
# 800-1599 Position data, 4 bytes each
uints = struct.unpack('<200I', bytearray.fromhex(hex_string))
floats = [x/(1 << 32) for x in uints]
load = [x*173611 - 32993 for x in floats[:100]]
pos = [x*63 - 15.6437 for x in floats[100:]]
print(f'{"pos":>8} {"load":>8}')
for p, l in zip(pos, load):
print(f'{p:8.2f} {l:8.2f}')
# Omit 0-valued position endpoints
plt.plot(pos[1:-1], load[1:-1])
plt.xlabel('Position (inches)')
plt.ylabel('Load (lbs)')
plt.title('Position vs Load')
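To make the endianness point concrete, here is a small sketch that reads the sample chunk 5C8F0240 from the question's comments as an unsigned 32-bit integer in both byte orders. It uses only the standard struct module; the variable names are illustrative.

```python
import struct

# The sample 4-byte chunk quoted in the question's comments.
sample = bytes.fromhex("5C8F0240")

# '>I' reads the bytes most-significant first, which is what int(hex, 16) does.
big_endian = struct.unpack(">I", sample)[0]
# '<I' reads the same bytes least-significant first.
little_endian = struct.unpack("<I", sample)[0]

print("big-endian:   ", big_endian)
print("little-endian:", little_endian)
```

The big-endian reading reproduces the 1552876096 quoted in the question, so the original code was effectively assuming big-endian data.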
For anyone encountering the same problem: when converting data types in Python, I think it's good practice to convert the original data to binary first and THEN to the target format. In my case I was going directly from the original type to the target type, which caused all sorts of problems because of the assumptions Python makes along the way. I suppose that's both a good and a bad thing. Anyway, use the struct and binascii libraries for this conversion. Below is my code. Also refer to the relevant sections of the Python documentation.
import struct as st
import binascii as b
#this function is used to convert the hex string to a position and load list of floating points
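def hex_str_to_pl_lists(hex_string):
    #convert the hex string to binary
    hex_string_bin = b.unhexlify(hex_string)
    #unpack the binary string into a list of floats in little endian format
    hex_string_float = st.unpack('<' + 'f' * (len(hex_string_bin) // 4), hex_string_bin)
    #assumed completion: per the documented layout, the first half is load data and the second half is position data
    middle = len(hex_string_float) // 2
    return list(hex_string_float[middle:]), list(hex_string_float[:middle])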
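As a quick check of the float interpretation, the sketch below unpacks two chunks taken from the sample hex string as little-endian 32-bit floats: the very first load chunk (00706A45) and the chunk quoted in the question's comments (5C8F0240). No scaling functions are involved; the names load_value and pos_value are just illustrative.

```python
import struct

# Two 4-byte chunks copied from the sample hex string in the question.
chunks = bytes.fromhex("00706A45" + "5C8F0240")

# '<2f' means: little-endian, two IEEE 754 single-precision floats.
load_value, pos_value = struct.unpack("<2f", chunks)

print(load_value)           # a plausible load reading
print(round(pos_value, 2))  # a plausible position reading
```

Both values already look like engineering units, which suggests the data are raw floats rather than integers that need rescaling.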
First of all, I'm sorry if my question doesn't make sense; I am new to the CVXPY library and I don't understand everything yet.
I am trying to solve a minimization problem that I thought would be easy to handle.
I have a matrix S of dimensions (9,7) with known coefficients, B of dimensions (1,7) with known coefficients, and Alpha of dimensions (1,7), which I need to find, subject to these constraints:
Alpha must be positive
The sum of all the coefficients of Alpha must be equal to 1
I need to optimize Alpha such that S @ Alpha - B = 0.
I discovered CVXPY and thought least-squares optimization was perfect for this problem.
This is the code I wrote:
Alpha = cp.Variable(7)
objective = cp.Minimize(cp.sum_squares(S @ Alpha - B))
constraints = [0 <= Alpha, Alpha<=1, np.sum(Alpha.value)==1]
prob = cp.Problem(objective, constraints)
result = prob.solve()
S= np.array([[0.03,0.02,0.072,0.051,0.058,0.0495,0.021 ],
[0.0295, 0.025 , 0.1 , 0.045 , 0.064 , 0.055 , 0.032 ],
[0.02 , 0.018 , 0.16 , 0.032 , 0.054 , 0.064 , 0.025 ],
[0.0195, 0.03 , 0.144 , 0.027 , 0.04 , 0.06 , 0.04 ],
[0.02 , 0.0315, 0.156 , 0.0295 ,0.027 , 0.0615 ,0.05 ],
[0.021 , 0.033 , 0.168 , 0.03 , 0.0265 ,0.063 , 0.09 ],
[0.02 , 0.05 , 0.28 , 0.039 , 0.035 , 0.055 , 0.04 ],
[0.021 , 0.03 , 0.22 , 0.0305, 0.0255, 0.057 , 0.009 ],
[0.0195, 0.008 , 0.2 , 0.021 , 0.01 , 0.048 , 0.0495]])
B=np.array([0.1015, 0.0888, 0.0911, 0.0901, 0.0945, 0.0909, 0.078 , 0.0913,
My issue is the following one :
Without the constraint np.sum(Alpha.value)==1, the code gives me results; but when I add the constraint it returns me
I presume the formulation is wrong, but I have no idea how to write it another way.
Or maybe the problem doesn't have a solution?
Thank you for your time
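Use just sum(Alpha) == 1 (or cp.sum(Alpha) == 1). Alpha.value is a plain numeric value, not a CVXPY expression, and it is None until the problem has been solved, so np.sum(Alpha.value)==1 does not define a constraint on the variable. You are not supposed to use numpy functions in CVXPY expressions; you must use the CVXPY functions listed in https://www.cvxpy.org/tutorial/functions/index.html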
I'm trying to create a data visualization that's essentially a time series chart. But I have to use pandas, Python, and Plotly, and I'm stuck on how to actually label the dates. Right now, the x labels are just integers from 1 to 60, and when you hover over the chart, you get that integer instead of the date.
I'm pulling values from a Google spreadsheet, and for now, I'd like to avoid parsing csv things.
I'd really like some help on how to label x as dates! Here's what I have so far:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import bpr
%matplotlib inline
import chart_studio.plotly as pl
import plotly.express as px
import plotly.graph_objects as go
f = open("../credentials.txt")
u = f.readline()
plotly_user = str(u[:-1])
k = f.readline()
plotly_api_key = str(k)
pl.sign_in(username = plotly_user, api_key = plotly_api_key)
rand_x = np.arange(61)
rand_x = np.flip(rand_x)
rand_y = np.array([0.91 , 1 , 1.24 , 1.25 , 1.4 , 1.36 , 1.72 , 1.3 , 1.29 , 1.17 , 1.57 , 1.95 , 2.2 , 2.07 , 2.03 , 2.14 , 1.96 , 1.87 , 1.25 , 1.34 , 1.13 , 1.31 , 1.35 , 1.54 , 1.38 , 1.53 , 1.5 , 1.32 , 1.26 , 1.4 , 1.89 , 1.55 , 1.98 , 1.75 , 1.14 , 0.57 , 0.51 , 0.41 , 0.24 , 0.16 , 0.08 , -0.1 , -0.24 , -0.05 , -0.15 , 0.34 , 0.23 , 0.15 , 0.12 , -0.09 , 0.13 , 0.24 , 0.22 , 0.34 , 0.01 , -0.08 , -0.27 , -0.6 , -0.17 , 0.28 , 0.38])
test_data = pd.DataFrame(columns=['X', 'Y'])
test_data['X'] = rand_x
test_data['Y'] = rand_y
def create_line_plot(data, x, y, chart_title="Rate by Date", labels_dict={}, c=["indianred"]):
    fig = px.line(
        data,
        x = x,
        y = y,
        title = chart_title,
        labels = labels_dict,
        color_discrete_sequence = c
    )
    return fig
fig = create_line_plot(test_data, 'X', 'Y', labels_dict={'X': 'Date', 'Y': 'Rate (%)'})
Right now, the x labels are just integers from 1 to 60, and when you hover over the chart, you get that integer instead of the date.
This happens because you are passing rand_x as the x values, and rand_x is an array of integers. Setting labels_dict={'X': 'Date', 'Y': 'Rate (%)'} only changes the label text shown for x to Date; it does not change the values. What you need to do is pass an array of datetime values as x. For example:
rand_x = np.array(['2020-01-01','2020-01-02','2020-01-03'], dtype='datetime64')
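For a full series of 61 points, typing the dates by hand is impractical, so one option is to generate them with pandas. The sketch below assumes daily data ending on a made-up date (2020-03-01) and uses stand-in y values; both are placeholders for the real spreadsheet contents.

```python
import numpy as np
import pandas as pd

# Stand-in for the 61 rate values from the question.
rand_y = np.linspace(0.0, 1.0, 61)

# Hypothetical date range: one point per day, ending on 2020-03-01.
dates = pd.date_range(end="2020-03-01", periods=len(rand_y), freq="D")

# The question's x values run from newest to oldest, so reverse the dates too.
test_data = pd.DataFrame({"X": dates[::-1], "Y": rand_y})

print(test_data.dtypes)
print(test_data.head(3))
```

Because the X column now has a datetime dtype, passing test_data to px.line with x='X' should give a date axis and dates in the hover text.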
I want to create a numpy array.
T = 200
I want to create an array from 0 to 199, in which each value will be divided by 200.
l = [0, 1/200, 2/200, ...]
Does NumPy have a method for this kind of calculation?
Alternatively one can use linspace:
>>> np.linspace(0, 1., 200, endpoint=False)
array([ 0. , 0.005, 0.01 , 0.015, 0.02 , 0.025, 0.03 , 0.035,
0.04 , 0.045, 0.05 , 0.055, 0.06 , 0.065, 0.07 , 0.075,
0.92 , 0.925, 0.93 , 0.935, 0.94 , 0.945, 0.95 , 0.955,
0.96 , 0.965, 0.97 , 0.975, 0.98 , 0.985, 0.99 , 0.995])
Use np.arange:
>>> import numpy as np
>>> np.arange(200, dtype=float)/200
array([ 0. , 0.005, 0.01 , 0.015, 0.02 , 0.025, 0.03 , 0.035,
0.04 , 0.045, 0.05 , 0.055, 0.06 , 0.065, 0.07 , 0.075,
0.08 , 0.085, 0.09 , 0.095, 0.1 , 0.105, 0.11 , 0.115,
0.88 , 0.885, 0.89 , 0.895, 0.9 , 0.905, 0.91 , 0.915,
0.92 , 0.925, 0.93 , 0.935, 0.94 , 0.945, 0.95 , 0.955,
0.96 , 0.965, 0.97 , 0.975, 0.98 , 0.985, 0.99 , 0.995])
T = 200.0
l = [x / float(T) for x in range(200)]
import numpy as np
T = 200
np.linspace(0.0, 1.0 - 1.0 / float(T), T)
Personally I prefer linspace for creating evenly spaced arrays in general. It is more complex in this case as the endpoint depends on the number of points T.
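The approaches above can be compared directly. The following sketch builds the array by division and with linspace using endpoint=False, then checks that they agree; the variable names are only illustrative.

```python
import numpy as np

T = 200

# Integer range divided by T: 0/200, 1/200, ..., 199/200.
by_division = np.arange(T) / T

# T evenly spaced points in [0, 1), so the endpoint 1.0 is excluded.
by_linspace = np.linspace(0.0, 1.0, T, endpoint=False)

print(np.allclose(by_division, by_linspace))
print(by_division[:4], by_division[-1])
```

With endpoint=False there is no need to compute 1.0 - 1.0 / T by hand for the upper bound.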
I have a dataframe of 16k records with multiple groups of countries and other fields. I have produced an initial output of the data that looks like the snippet below. Now I need to do some data cleansing and manipulation: remove skews or outliers and replace them with a value based on certain rules.
For example, in the data below, how could I identify the skewed points (any value greater than 1) and replace them with the average of the next two records, or with the previous record if there are no later records in that group?
So in the dataframe below I would like to replace the Bill%4 value of 1.21 for IT week1 with the average of week2 and week3 for IT, which is 0.81.
Are there any tricks for this?
Country Week Bill%1 Bill%2 Bill%3 Bill%4 Bill%5 Bill%6
IT week1 0.94 0.88 0.85 1.21 0.77 0.75
IT week2 0.93 0.88 1.25 0.80 0.77 0.72
IT week3 0.94 1.33 0.85 0.82 0.76 0.76
IT week4 1.39 0.89 0.86 0.80 0.80 0.76
FR week1 0.92 0.86 0.82 1.18 0.75 0.73
FR week2 0.91 0.86 1.22 0.78 0.75 0.71
FR week3 0.92 1.29 0.83 0.80 0.75 0.75
FR week4 1.35 0.87 0.84 0.78 0.78 0.74
I don't know of any built-ins to do this, but you should be able to customize this to meet your needs, no?
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.rand(10,5),columns=list('ABCDE'))
df.index = list('abcdeflght')
# Define cutoff value
cutoff = 0.90
for col in df.columns:
    # Identify index locations above cutoff
    outliers = df[col][df[col] > cutoff]
    # Browse through outliers and average according to index location
    for idx in outliers.index:
        # Get index location
        loc = df.index.get_loc(idx)
        # If not one of last two values in dataframe
        if loc < df.shape[0] - 2:
            # Average the next two values
            df.loc[idx, col] = np.mean(df[col].iloc[loc+1:loc+3])
        else:
            # Otherwise average two earlier values
            df.loc[idx, col] = np.mean(df[col].iloc[loc-3:loc-1])
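Adapting this to the rule in the question (values above 1, handled separately per country) might look like the sketch below. It is one possible reading of the rule, not a built-in: replace_outliers is a hypothetical helper, only two of the Bill% columns are copied from the sample, and when a single later record exists the helper simply uses that record.

```python
import pandas as pd

def replace_outliers(values, cutoff=1.0):
    """Replace entries above cutoff within one group's column."""
    original = values.to_numpy(dtype=float)
    cleaned = original.copy()
    for i, value in enumerate(original):
        if value <= cutoff:
            continue
        later = original[i + 1:i + 3]  # up to two following records
        if later.size > 0:
            cleaned[i] = later.mean()
        elif i > 0:
            cleaned[i] = cleaned[i - 1]  # no later record: use the previous one
    return pd.Series(cleaned, index=values.index)

# Two columns copied from the sample table in the question.
df = pd.DataFrame({
    "Country": ["IT"] * 4 + ["FR"] * 4,
    "Week": ["week1", "week2", "week3", "week4"] * 2,
    "Bill%1": [0.94, 0.93, 0.94, 1.39, 0.92, 0.91, 0.92, 1.35],
    "Bill%4": [1.21, 0.80, 0.82, 0.80, 1.18, 0.78, 0.80, 0.78],
})

# Apply the rule column by column, never looking across country boundaries.
for col in ["Bill%1", "Bill%4"]:
    df[col] = df.groupby("Country")[col].transform(replace_outliers)

print(df)
```

Grouping by Country keeps the "next two records" lookup from reaching into another country's rows, which the plain positional loop above does not guarantee.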