Pyplot not plotting marker for detected peaks - python

I'm writing a Python script that plots a candlestick chart with x markers indicating peak candlesticks. The data is a series of USD/JPY rates read with pandas.read_csv() from a CSV file provided by the Oanda API. The result of pandas.DataFrame.head() is as follows:
time close open high low volume
0 2016/08/19 06:00:00 100.256 99.919 100.471 99.887 30965
1 2016/08/22 06:00:00 100.335 100.832 100.944 100.221 32920
2 2016/08/23 06:00:00 100.253 100.339 100.405 99.950 26069
3 2016/08/24 06:00:00 100.460 100.270 100.619 100.104 22340
4 2016/08/25 06:00:00 100.546 100.464 100.627 100.314 17224
While the candlestick chart itself is displayed properly (although it needs some formatting), I don't see any markers on it.
What I expect is something like an example graph output shown on the scipy.signal.find_peaks document, only it is a candlestick chart instead of a line graph.
Here is my code:
import sys
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import find_peaks
import mpl_finance
df = pd.read_csv(sys.argv[1])
opens = df['open']
highs = df['high']
lows = df['low']
closes = df['close']
indices = find_peaks(highs)[0]
fig = plt.figure(figsize=(12, 4))
ax1 = fig.add_subplot(1, 1, 1)
mpl_finance.candlestick2_ohlc(ax1, opens, highs, lows, closes, width=4, colorup='k', colordown='r', alpha=0.75)
ax1.plot(x=indices, y=[highs[j] for j in indices], fmt="x", label="peak highs")
ax1.grid()
plt.show()
I suspected that either the x or the y argument to ax1.plot() was empty, but the pdb debugger shows that neither is:
-> ax1.plot(x=indices, y=[highs[j] for j in indices], fmt="x", label="peak highs")
(Pdb) indices
array([ 1, 10, 15, 18, 23, 25, 29, 34, 39, 47, 50, 59, 66,
70, 74, 76, 78, 81, 84, 87, 92, 95, 99, 101, 107, 113,
118, 126, 130, 138, 143, 145, 158, 161, 164, 170, 172, 176, 182,
186, 196, 203, 208, 215, 220, 222, 226, 230, 233, 237, 241, 246,
248, 256, 261, 263, 267, 282, 286, 290, 293, 296, 304, 306, 308,
310, 313, 316, 322, 331, 336, 342, 349, 352, 359, 367, 369, 373,
378, 382, 391, 395, 400, 403, 405, 411, 416, 422, 425, 428, 438,
441, 444, 447, 450, 454, 459, 466, 471, 473, 477, 485, 493, 497],
dtype=int32)
(Pdb) [highs[j] for j in indices]
[100.944, 104.33, 103.07, 103.367, 102.79799999999999, 101.258, 101.851, 104.17399999999999, 104.64299999999999, 104.882, 105.544, 106.95700000000001, 111.375, 113.911, 114.837, 114.78399999999999, 114.415, 116.134, 118.676, 118.251, 117.822, 118.624, 117.54299999999999, 116.89, 115.634, 115.38600000000001, 113.538, 114.962, 113.787, 114.765, 115.512, 115.2, 112.213, 111.48, 111.587, 109.23299999999999, 109.5, 111.79, 113.05799999999999, 114.39299999999999, 112.135, 111.721, 110.823, 111.8, 112.47399999999999, 112.935, 113.696, 114.505, 113.583, 112.429, 112.21600000000001, 110.99, 111.05799999999999, 110.95700000000001, 109.833, 109.85600000000001, 110.678, 112.72399999999999, 113.264, 113.20200000000001, 113.446, 112.834, 113.589, 114.10700000000001, 114.25, 114.462, 114.288, 114.742, 113.91799999999999, 111.70100000000001, 113.095, 113.758, 113.64399999999999, 113.398, 113.39299999999999, 111.49, 111.23200000000001, 109.77799999999999, 110.491, 109.79, 107.912, 107.685, 106.47, 107.06200000000001, 107.305, 106.65, 107.01799999999999, 107.499, 107.405, 107.788, 109.552, 110.044, 109.406, 110.02600000000001, 110.461, 111.40299999999999, 109.84899999999999, 110.275, 110.85799999999999, 110.91, 110.765, 111.14399999999999, 112.80799999999999, 113.18700000000001]
Could anyone give me a possible solution or an explanation of the cause?
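A likely cause, offered as a sketch rather than a confirmed diagnosis: as far as I know, Axes.plot() takes the x data, the y data and the format string as positional arguments only. Passing them as x=, y= and fmt= keywords means no data reaches the plot, and depending on the matplotlib version the call either draws nothing or raises a TypeError. The example below uses a small synthetic highs array and hard-coded peak indices as stand-ins for the CSV data and for find_peaks(highs)[0]; those values are assumptions for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for df['high'] (assumption: the real values come from the CSV)
highs = np.array([100.4, 100.9, 100.4, 100.6, 100.7, 100.5, 101.2, 101.0])
# Stand-in for find_peaks(highs)[0]; these are the local maxima of the array above
indices = np.array([1, 4, 6])

fig, ax1 = plt.subplots(figsize=(12, 4))
ax1.plot(highs, color="0.7")  # placeholder line where the candlesticks would be drawn

# x, y and the format string "x" are passed positionally, not as keywords
(peak_markers,) = ax1.plot(indices, highs[indices], "x", label="peak highs")
ax1.legend()
```

With a pandas Series instead of a NumPy array, highs.iloc[indices] selects the same rows by position.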

Related

Candlestick chart add_trace(mode="markers") gives wrong output

I'm currently building a financial dashboard with dash and plotly. I added the following candlestick chart to my dashboard:
candlestick_chart = go.Figure(data=[go.Candlestick(x=financial_data["Date"],
open=financial_data['Open'],
high=financial_data['High'],
low=financial_data['Low'],
close=financial_data['Close'])])
Which returns the expected result:
I would like to be able to highlight specific candlesticks (e.g. with a marker)
I tried to achieve this with the add_trace function and the following code:
candlestick_chart.add_trace(
    go.Scatter(
        x=["2020-07-01"],
        y=["350"],
        mode="markers",
        marker=dict(symbol="6")
    )
)
But this ruins the chart.
Why does that happen? How can I fix this?
EDIT: ADDED DATASOURCE
I got the data from https://finance.yahoo.com/quote/SPY/history?p=SPY with Time period set to max.
I parsed the data the following way:
start = "2000-01-01"
end = "2021-01-01"
# Get a pandas dataframe
datapath = ('D:\\Programmieren\\trading_bot\\etf_data\\SPY.csv')
financial_data = pd.read_csv(datapath,
parse_dates=True,
index_col=0)
financial_data= financial_data.loc[start:end]
# Process data
financial_data = financial_data["2020-06-01":"2021-01-01"]
financial_data.reset_index(inplace=True)
EDIT2: SYSTEM AND VERSIONS
My packages have the following versions:
print(pd.__version__) # 1.2.3
print(plotly.__version__) # 4.14.3
And I am working with:
Windows 10 Home (64-Bit)
Python 3.9
Python 3.8 doesn't work either
This could be regarded as a version issue, but the core problem is that you've defined your y-value as a list of strings with ["350"] instead of a number like [350] in:
candlestick_chart.add_trace(
    go.Scatter(
        x=["2020-07-01"],
        y=["350"],
        mode="markers",
        marker=dict(symbol="6")
    )
)
Different versions of plotly seem to handle this differently. Simply remove the quotation marks to let Plotly interpret the value as a number instead to produce this:
Complete code with sample data
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import pandas as pd
# data
df = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/finance-charts-apple.csv')
df=df.tail(10)
# set up figure with values not high and not low
# include candlestick with rangeselector
fig = go.Figure(go.Candlestick(x=df['Date'],
open=df['AAPL.Open'], high=df['AAPL.High'],
low=df['AAPL.Low'], close=df['AAPL.Close']))
fig.add_trace(
go.Scatter(
x=["2017-02-10"],
y=[135],
mode="markers+text",
marker=dict(symbol='triangle-down-open', size = 12),
# text = 'important',
# textposition = 'middle right'
)
)
fig.show()
When I run your code I get the following error:
ValueError:
Invalid value of type 'builtins.str' received for the 'symbol' property of scatter.marker
Received value: '6'
The 'symbol' property is an enumeration that may be specified as:
- One of the following enumeration values:
[0, 'circle', 100, 'circle-open', 200, 'circle-dot', 300,
'circle-open-dot', 1, 'square', 101, 'square-open', 201,
'square-dot', 301, 'square-open-dot', 2, 'diamond', 102,
'diamond-open', 202, 'diamond-dot', 302,
'diamond-open-dot', 3, 'cross', 103, 'cross-open', 203,
'cross-dot', 303, 'cross-open-dot', 4, 'x', 104, 'x-open',
204, 'x-dot', 304, 'x-open-dot', 5, 'triangle-up', 105,
'triangle-up-open', 205, 'triangle-up-dot', 305,
'triangle-up-open-dot', 6, 'triangle-down', 106,
'triangle-down-open', 206, 'triangle-down-dot', 306,
'triangle-down-open-dot', 7, 'triangle-left', 107,
'triangle-left-open', 207, 'triangle-left-dot', 307,
'triangle-left-open-dot', 8, 'triangle-right', 108,
'triangle-right-open', 208, 'triangle-right-dot', 308,
'triangle-right-open-dot', 9, 'triangle-ne', 109,
'triangle-ne-open', 209, 'triangle-ne-dot', 309,
'triangle-ne-open-dot', 10, 'triangle-se', 110,
'triangle-se-open', 210, 'triangle-se-dot', 310,
'triangle-se-open-dot', 11, 'triangle-sw', 111,
'triangle-sw-open', 211, 'triangle-sw-dot', 311,
'triangle-sw-open-dot', 12, 'triangle-nw', 112,
'triangle-nw-open', 212, 'triangle-nw-dot', 312,
'triangle-nw-open-dot', 13, 'pentagon', 113,
'pentagon-open', 213, 'pentagon-dot', 313,
'pentagon-open-dot', 14, 'hexagon', 114, 'hexagon-open',
214, 'hexagon-dot', 314, 'hexagon-open-dot', 15,
'hexagon2', 115, 'hexagon2-open', 215, 'hexagon2-dot',
315, 'hexagon2-open-dot', 16, 'octagon', 116,
'octagon-open', 216, 'octagon-dot', 316,
'octagon-open-dot', 17, 'star', 117, 'star-open', 217,
'star-dot', 317, 'star-open-dot', 18, 'hexagram', 118,
'hexagram-open', 218, 'hexagram-dot', 318,
'hexagram-open-dot', 19, 'star-triangle-up', 119,
'star-triangle-up-open', 219, 'star-triangle-up-dot', 319,
'star-triangle-up-open-dot', 20, 'star-triangle-down',
120, 'star-triangle-down-open', 220,
'star-triangle-down-dot', 320,
'star-triangle-down-open-dot', 21, 'star-square', 121,
'star-square-open', 221, 'star-square-dot', 321,
'star-square-open-dot', 22, 'star-diamond', 122,
'star-diamond-open', 222, 'star-diamond-dot', 322,
'star-diamond-open-dot', 23, 'diamond-tall', 123,
'diamond-tall-open', 223, 'diamond-tall-dot', 323,
'diamond-tall-open-dot', 24, 'diamond-wide', 124,
'diamond-wide-open', 224, 'diamond-wide-dot', 324,
'diamond-wide-open-dot', 25, 'hourglass', 125,
'hourglass-open', 26, 'bowtie', 126, 'bowtie-open', 27,
'circle-cross', 127, 'circle-cross-open', 28, 'circle-x',
128, 'circle-x-open', 29, 'square-cross', 129,
'square-cross-open', 30, 'square-x', 130, 'square-x-open',
31, 'diamond-cross', 131, 'diamond-cross-open', 32,
'diamond-x', 132, 'diamond-x-open', 33, 'cross-thin', 133,
'cross-thin-open', 34, 'x-thin', 134, 'x-thin-open', 35,
'asterisk', 135, 'asterisk-open', 36, 'hash', 136,
'hash-open', 236, 'hash-dot', 336, 'hash-open-dot', 37,
'y-up', 137, 'y-up-open', 38, 'y-down', 138,
'y-down-open', 39, 'y-left', 139, 'y-left-open', 40,
'y-right', 140, 'y-right-open', 41, 'line-ew', 141,
'line-ew-open', 42, 'line-ns', 142, 'line-ns-open', 43,
'line-ne', 143, 'line-ne-open', 44, 'line-nw', 144,
'line-nw-open']
- A tuple, list, or one-dimensional numpy array of the above
To fix this, I passed one of the values listed in the error message as the symbol, for example marker=dict(symbol='triangle-down-open'). Note that the list contains the integer 6 ('triangle-down') but not the string '6'. With that change I got a graph like this:
The code for the graphs is:
candlestick_chart = go.Figure(data=[go.Candlestick(x=financial_data["Date"],
open=financial_data['Open'],
high=financial_data['High'],
low=financial_data['Low'],
close=financial_data['Close'])])
candlestick_chart.add_trace(
go.Scatter(
x=["2020-07-01"],
y=["350"],
mode="markers",
marker=dict(symbol='triangle-down-open')
)
)
candlestick_chart.show()

How to manually reproject from a specific projection to lat/lon

I have an array from Euro-Cordex data which has a rotated pole projection from a Netcdf file:
grid_mapping_name: rotated_latitude_longitude
grid_north_pole_latitude: 39.25
grid_north_pole_longitude: -162.0
float64 rlon(rlon)
standard_name: grid_longitude
long_name: longitude in rotated pole grid
units: degrees
axis: X
unlimited dimensions:
current shape = (424,)
filling on, default _FillValue of 9.969209968386869e+36 used),
('rlat', <class 'netCDF4._netCDF4.Variable'>
float64 rlat(rlat)
standard_name: grid_latitude
long_name: latitude in rotated pole grid
units: degrees
axis: Y
unlimited dimensions:
current shape = (412,)
The dimensions are rlon (424) and rlat (412). I used some code to convert these rotated lat/lons into regular lat/lons. Now I have two matrices of shape (412, 424): the first holds the longitude coordinates and the second holds the latitude coordinates.
Now, I want to convert the initial image (412, 424) to an image with the extents that I want: min lon 25, max lon 45, min lat 35, max lat 43.
lats = np.empty((len(rlat), len(rlon)))
lons = np.empty((len(rlat), len(rlon)))
for j in range(len(rlon)):
    for i in range(len(rlat)):
        lons[i, j] = unrot_lon(rlat[i], rlon[j], 39.25, -162.0)
        lats[i, j] = unrot_lat(rlat[i], rlon[j], 39.25, -162.0)
a = lons<=45
aa = lons>=25
aaa = a*aa
b = lats<=43
bb = lats>=35
bbb = b*bb
c = bbb*aaa
The last matrix (c) is a boolean matrix which shows the pixels that I am interested according to the extents that I defined:
Now, I want to do two things that I fail in both:
First, I would like to plot this image with the boundaries on a basemap. For that I located llcrnrlon, llcrnrlat, urcrnrlon and urcrnrlat based on the boolean matrix and a bit of guesswork:
llcrlon = 25.02#ok
llcrlat = np.nanmin(lats[c])# ok
urcrlon = np.nanmax(lons[c])#ok
urcrlat = np.nanmax(lats[np.where(lons==urcrlon)])#ok
Then I used the following code to plot the image on a basemap:
lonss = np.linspace(np.min(lons[c]), np.max(lons[c]), (424-306+1))
latss = np.linspace(np.min(lats[c]), np.max(lats[c]), (170-73+1))
pl.figure(dpi = 250)
map = Basemap(projection='rotpole',llcrnrlon=llcrlon,llcrnrlat=llcrlat,urcrnrlon=urcrlon,urcrnrlat=urcrlat,resolution='i', o_lat_p = 39.25, o_lon_p =-162., lon_0=35, lat_0=45)
map.drawcoastlines()
map.drawstates()
parallels = np.arange(35,43,2.) #
meridians = np.arange(25,45,2.) #
map.drawparallels(parallels,labels=[1,0,0,0],fontsize=10)
map.drawmeridians(meridians,labels=[0,0,0,1],fontsize=10)
lons, lats = np.meshgrid(lonss, latss)
x, y = map(lons, lats)
mapp = map.pcolormesh(x,y,WTD[73:170, 306:])
The map does not fit the basemap projection well, and I would like to find out what is wrong.
Second, I would like to reproject this map to regular lat/lon. For that, I use the following code to define a new grid:
targ_lons = np.linspace(25, 45, 170)
targ_lats = np.linspace(43, 35, 70)
T_Map = np.empty((len(targ_lats), len(targ_lons)))
T_Map[:] = np.nan
Then I compute the differences between the lon/lat matrices I produced at the beginning and my newly defined grid, and try to use the indices where the difference is below a threshold to fill in the new gridded image.
for i in range(len(targ_lons)):
    for j in range(len(targ_lats)):
        lon_extr = np.where(abs(lons-targ_lons[i])<0.01)
        lat_extr = np.where(abs(lats-targ_lats[j])<0.01)
So here, if we have i=0 and j=0,
then:
lon_extr = (array([ 7, 16, 25, 34, 35, 43, 44, 53, 63, 72, 73, 82, 83, 92, 93, 102, 103, 112, 113, 122, 123, 133, 143, 153, 154, 164,
174, 175, 185, 195, 196, 206, 217, 227, 238, 248, 259, 269, 280,
290, 300, 321, 331, 341, 360, 370, 389], dtype=int64),
array([320, 319, 318, 317, 317, 316, 316, 315, 314, 313, 313, 312, 312,
311, 311, 310, 310, 309, 309, 308, 308, 307, 306, 305, 305, 304,
303, 303, 302, 301, 301, 300, 299, 298, 297, 296, 295, 294, 293,
292, 291, 289, 288, 287, 285, 284, 282], dtype=int64))
and
lat_extr=(array([143, 143, 143, 143, 143, 143, 143, 143, 143, 143, 143, 143, 143,
143, 143, 143, 143, 144, 144, 144, 144, 144, 144, 145, 145, 145,
145, 146, 146, 146, 146, 147, 147, 147, 148, 148, 149, 149, 150,
150, 151, 151, 152, 152, 153, 153, 154, 154, 155, 156, 156, 157,
157, 158, 158, 159, 159, 160, 160, 161, 162, 162, 163, 164, 164,
165, 167, 168, 168, 169, 169, 170, 170, 171, 174, 175, 177, 178,
180, 181, 183, 186, 190, 191, 192, 204, 205, 210, 214], dtype=int64),
array([251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263,
264, 265, 266, 267, 227, 228, 229, 289, 290, 291, 214, 215, 303,
304, 204, 205, 313, 314, 196, 321, 322, 189, 329, 182, 336, 176,
342, 170, 348, 165, 353, 160, 358, 155, 363, 150, 146, 372, 142,
376, 138, 380, 134, 384, 130, 388, 126, 123, 395, 119, 116, 402,
405, 106, 103, 415, 100, 418, 97, 421, 94, 86, 83, 78, 75,
70, 68, 63, 56, 47, 45, 43, 19, 17, 8, 1], dtype=int64))
Now, I need to be able to pull out the common coordinates and fill in T_Map. I'm confused at this point. Is there a function or an easy way to pull out the common lat/lon from these two arrays?
The problem was solved. I used the longitude and latitude matrices to find the nearest pixels (closer than the resolution, which is 0.11 degrees in this case) and fill in the newly defined grid. I hope this helps others who have a similar problem:
#(45-25)*111/12.5
#(43-35)*110/12.5
targ_lons = np.linspace(25, 45, 170)
targ_lats = np.linspace(43, 35, 70)
T_Map = np.empty((len(targ_lats), len(targ_lons)))
T_Map[:] = np.nan
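for i in range(len(targ_lons)):
    for j in range(len(targ_lats)):
        # source pixels whose longitude is within 0.1 degrees of the target longitude
        lon_extr = np.where(abs(lons-targ_lons[i])<0.1)
        # of those, the pixels whose latitude is within 0.1 degrees of the target latitude
        lat_extr = np.where(abs(lats[lon_extr]-targ_lats[j])<0.1)
        if len(lat_extr[0])>0:
            # take the first match and look up its position in the full grid
            point_to_extract = np.where(lats == lats[lon_extr][lat_extr][0])
            T_Map[j, i] = (WTD[point_to_extract])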
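The same nearest-pixel idea can be sketched more compactly with np.argmin. This is a minimal illustration, not the code used above: regrid_nearest and its max_dist parameter are hypothetical names, the distance is measured in plain degrees (no cos(latitude) scaling), and the 3x3 source grid is synthetic.

```python
import numpy as np

def regrid_nearest(lons, lats, values, targ_lons, targ_lats, max_dist=0.1):
    """Fill a regular lon/lat grid with the nearest source pixel (hypothetical helper)."""
    out = np.full((len(targ_lats), len(targ_lons)), np.nan)
    for j, tlat in enumerate(targ_lats):
        for i, tlon in enumerate(targ_lons):
            # squared distance in degrees from this target cell to every source pixel
            d2 = (lons - tlon) ** 2 + (lats - tlat) ** 2
            k = np.argmin(d2)  # flat index of the closest source pixel
            if d2.flat[k] <= max_dist ** 2:
                out[j, i] = values.flat[k]
    return out

# Tiny synthetic example: a 3x3 source grid that happens to be regular already
src_lons, src_lats = np.meshgrid([25.0, 26.0, 27.0], [35.0, 36.0, 37.0])
src_vals = np.arange(9.0).reshape(3, 3)

result = regrid_nearest(src_lons, src_lats, src_vals,
                        targ_lons=[25.0, 27.0, 30.0], targ_lats=[37.0, 35.0])
print(result)
```

Target cells with no source pixel within max_dist stay NaN, which mirrors the threshold test in the loop above. For a 412 x 424 grid this double loop is slow; a spatial index would be the usual next step.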

Remove content of a list out of other list

I have some code that creates a list of numbers from 1 to 407. What I want to do is remove the numbers in the "ultimate" and "super_rare" lists from the "common" list. How can I do that? This is the general code I have.
import random
def common(x):
    list = []
    for i in range(1,x+1):
        list.append(i)
    return (list)
cid = common(407)
ultimate = [404, 200, 212, 15, 329, 214, 406, 259, 126, 160, 343, 180, 169, 297, 226, 305, 250, 373, 142, 357, 181, 113, 149, 399, 287, 341, 37, 284, 41, 328, 400, 217, 253, 204, 290, 18, 174, 36, 310, 303, 6, 108, 47, 298, 130]
super_rare = [183, 349, 134, 69, 103, 342, 83, 380, 93, 56, 86, 95, 147, 161, 403, 197, 215, 312, 375, 359, 263, 221, 340, 102, 153, 234, 54, 7, 238, 193, 90, 367, 197, 397, 33, 366, 334, 222, 394, 371, 313, 83, 276, 35, 351, 83, 347, 170, 57, 201, 137, 188, 179, 170, 65, 107, 234, 48, 2, 85, 74, 221, 23, 171, 101, 377, 63, 248, 102, 272, 129, 276, 86, 88, 51, 197, 248, 202, 244, 153, 138, 101, 330, 68, 368, 292, 340, 315, 185, 219, 381, 89, 274, 175, 385, 19, 257, 313, 191, 211]
def new_list(cid, ultimate):
    new_list = []
    for i in range(len(cid)):
        new_list.append(cid[i])
    for i in range(len(ultimate)):
        new_list.remove(ultimate[i])
    return (new_list)
#print (new_list(cid, ultimate))
cid_mod0 = new_list(cid, ultimate)
cid_mod1 = new_list(cid_mod0, super_rare)
print (cid_mod0)
Most of the prints and whatnot are just tries to see if it's working.
I recommend using sets for this. You can check if an item is in a set in constant time. For example:
import random
def common(x):
    return list(range(1, x + 1))
cid = common(407)
ultimate = { 404, 200, ... }
super_rare = { 183, 349, ... }
def list_difference(l, s):
    return [ elem for elem in l if elem not in s ]
cid_mod0 = list_difference(cid, ultimate)
cid_mod1 = list_difference(cid_mod0, super_rare)
If you don't care about the order of your resulting list you can use a set for that as well for a bit more convenience:
import random
def common(x):
    return list(range(1, x + 1))
cid = set(common(407))
ultimate = { 404, 200, ... }
super_rare = { 183, 349, ... }
cid_mod0 = cid - ultimate
cid_mod1 = cid_mod0 - super_rare
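For a version that runs as-is, here is a minimal sketch of the order-preserving approach with short made-up lists in place of the full ones. The values are placeholders; only the pattern matters.

```python
def list_difference(items, excluded):
    """Return the items that are not in excluded, keeping the order of items."""
    excluded = set(excluded)  # one-time conversion; lookups are then O(1) on average
    return [item for item in items if item not in excluded]

cid = list(range(1, 11))
ultimate = [4, 2, 9]
super_rare = [7, 3, 7, 3]  # duplicates are harmless once converted to a set

remaining = list_difference(list_difference(cid, ultimate), super_rare)
print(remaining)  # [1, 5, 6, 8, 10]
```

Unlike list.remove(), which raises ValueError when asked to remove a value a second time, this handles the repeated numbers in super_rare without any special casing.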
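Use this loop to remove the elements of cid that are in the super_rare or ultimate lists. It iterates over a copy (cid[:]), because deleting from a list while enumerating it skips the element that follows each deletion:
for cnum in cid[:]:
    if cnum in ultimate or cnum in super_rare:
        cid.remove(cnum)
print(cid)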
The loop assumes you have a list named cid that is already established.
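One thing to watch with loops like this: deleting by index while enumerating the same list shifts the later items left, so the item right after each deletion is never examined. That matters here because the lists contain neighbouring numbers (6 is in ultimate and 7 is in super_rare). A small sketch with made-up data:

```python
excluded = {6, 7}

# Buggy: after del buggy[x], the next item slides into position x
# and the loop moves on to x + 1 without ever checking it.
buggy = list(range(1, 11))
for x, cnum in enumerate(buggy):
    if cnum in excluded:
        del buggy[x]

# Safe: iterate over a copy and remove from the original.
safe = list(range(1, 11))
for cnum in safe[:]:
    if cnum in excluded:
        safe.remove(cnum)

print(buggy)  # [1, 2, 3, 4, 5, 7, 8, 9, 10]  (7 survived)
print(safe)   # [1, 2, 3, 4, 5, 8, 9, 10]
```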
If you want to keep the original order of cid, you could use OrderedDict to convert cid into an ordered dict object and then remove the keys you don't want. The code would be something like:
from random import choices, seed
from collections import OrderedDict
seed(123)
ultimate = [404, 200, 212, 15, 329, 214, 406, 259, 126, 160, 343, 180, 169, 297, 226, 305, 250, 373, 142, 357, 181, 113, 149, 399, 287, 341, 37, 284, 41, 328, 400, 217, 253, 204, 290, 18, 174, 36, 310, 303, 6, 108, 47, 298, 130]
super_rare = [183, 349, 134, 69, 103, 342, 83, 380, 93, 56, 86, 95, 147, 161, 403, 197, 215, 312, 375, 359, 263, 221, 340, 102, 153, 234, 54, 7, 238, 193, 90, 367, 197, 397, 33, 366, 334, 222, 394, 371, 313, 83, 276, 35, 351, 83, 347, 170, 57, 201, 137, 188, 179, 170, 65, 107, 234, 48, 2, 85, 74, 221, 23, 171, 101, 377, 63, 248, 102, 272, 129, 276, 86, 88, 51, 197, 248, 202, 244, 153, 138, 101, 330, 68, 368, 292, 340, 315, 185, 219, 381, 89, 274, 175, 385, 19, 257, 313, 191, 211]
cid = OrderedDict.fromkeys(choices(range(407), k=407))
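# map() is lazy in Python 3, so pop in an explicit loop; the None default skips missing keys
for key in set(ultimate + super_rare):
    cid.pop(key, None)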
result = cid.keys()
If you don't need the original order, you could convert cid to a plain dict; removing a key from a hash map is very fast. The code would be something like:
cid = dict.fromkeys(range(1, 408))
# map() is lazy in Python 3, so pop in an explicit loop
for key in set(ultimate + super_rare):
    cid.pop(key, None)
result = cid.keys()
Apart from the dictionary method, you can also try to convert everything into a set variable like the following:
result = set(range(1, 408)) - set(ultimate) - set(super_rare)
Hope it helps.
You can create your target list that includes all the numbers from the target range, but without those numbers from ultimate and super_rare list, by list comprehension:
my_filtered_list = [i for i in range(1, 408) if i not in ultimate and i not in super_rare]
print(my_filtered_list)
Make a union set of both sets of numbers you want to exclude.
>>> su = set(ultimate) | set(super_rare)
Then filter the input list based on whether the value is not present in the set.
>>> list(filter(lambda i: i not in su, cid))
[1, 3, 4, 5, 8, 9, 10, 11, 12, 13, 14, 16, 17, 20, 21, 22, 24, 25, 26, 27, 28,
29, 30, 31, 32, 34, 38, 39, 40, 42, 43, 44, 45, 46, 49, 50, 52, 53, 55, 58, 59,
60, 61, 62, 64, 66, 67, 70, 71, 72, 73, 75, 76, 77, 78, 79, 80, 81, 82, 84, 87,
91, 92, 94, 96, 97, 98, 99, 100, 104, 105, 106, 109, 110, 111, 112, 114, 115,
116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 127, 128, 131, 132, 133, 135,
136, 139, 140, 141, 143, 144, 145, 146, 148, 150, 151, 152, 154, 155, 156, 157,
158, 159, 162, 163, 164, 165, 166, 167, 168, 172, 173, 176, 177, 178, 182, 184,
186, 187, 189, 190, 192, 194, 195, 196, 198, 199, 203, 205, 206, 207, 208, 209,
210, 213, 216, 218, 220, 223, 224, 225, 227, 228, 229, 230, 231, 232, 233, 235,
236, 237, 239, 240, 241, 242, 243, 245, 246, 247, 249, 251, 252, 254, 255, 256,
258, 260, 261, 262, 264, 265, 266, 267, 268, 269, 270, 271, 273, 275, 277, 278,
279, 280, 281, 282, 283, 285, 286, 288, 289, 291, 293, 294, 295, 296, 299, 300,
301, 302, 304, 306, 307, 308, 309, 311, 314, 316, 317, 318, 319, 320, 321, 322,
323, 324, 325, 326, 327, 331, 332, 333, 335, 336, 337, 338, 339, 344, 345, 346,
348, 350, 352, 353, 354, 355, 356, 358, 360, 361, 362, 363, 364, 365, 369, 370,
372, 374, 376, 378, 379, 382, 383, 384, 386, 387, 388, 389, 390, 391, 392, 393,
395, 396, 398, 401, 402, 405, 407]
If you don't want to use filter, just use a list comprehension.
>>> [v for v in cid if v not in su]
You could also do the whole thing with sets like
>>> list(set(cid) - (set(ultimate) | set(super_rare)))
Others have suggested this already (I take no credit). Sets are unordered, so the result is not guaranteed to come back in the original order. It seems to be OK on my py2 and py3, but that is an implementation detail; doing the last step as a list comprehension guarantees the order and won't need converting back to a list as a final step.
If you want to see the changes in the original list, you can just assign back to the original variable.
cid = [v for v in cid if v not in su]
This assigns a different list to the same variable though, so other holders of references to the original list won't see the changes. You can call id(cid) before and after the assignment to see that it's a different list.
If you wanted to assign back to the exact same list instance you can use
cid[:] = [v for v in cid if v not in su]
and the id will remain the same.
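The difference between rebinding the name and assigning to a slice can be seen with a second reference to the list. This is a small sketch with made-up values; alias is a hypothetical name standing in for any other holder of the list.

```python
su = {2, 4}

cid = [1, 2, 3, 4, 5]
alias = cid  # a second reference to the same list object

# Rebinding: cid now names a brand-new list, alias still holds the old one
cid = [v for v in cid if v not in su]
rebound_is_same = cid is alias
alias_after_rebind = list(alias)

# Slice assignment: the existing list object is modified in place
cid = alias
cid[:] = [v for v in cid if v not in su]
sliced_is_same = cid is alias

print(rebound_is_same, alias_after_rebind)  # False [1, 2, 3, 4, 5]
print(sliced_is_same, alias)                # True [1, 3, 5]
```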

How to add a legend to matplotlib scatter plot

I'm attempting to plot a PCA and one of the colours is label 1 and the other should be label 2. When I want to add a legend with ax1.legend() I only get the label for the blue dot or no label at all. How can I add the legend with the correct labels for both the blue and purple dots?
sns.set(style = 'darkgrid')
fig, ax1 = sns.plt.subplots()
x1, x2 = X_bar[:,0], X_bar[:,1]
ax1.scatter(x1, x2, 100, edgecolors='none', c = colors)
fig.set_figheight(8)
fig.set_figwidth(15)
It looks like you are plotting points that alternate between two colours. As per the answer to the question "subsampling every nth entry in a numpy array", you can use NumPy's array slicing to plot two separate arrays, then add the legend as normal.
For some sample data:
import numpy as np
import numpy.random as nprnd
import matplotlib.pyplot as plt
A = nprnd.randint(1000, size=100)
A.shape = (50,2)
x1, x2 = np.sort(A[:,0], axis=0), np.sort(A[:,1], axis=0)
x1
Out[50]:
array([ 46, 63, 84, 96, 118, 127, 137, 142, 181, 187, 187, 207, 210,
238, 238, 330, 334, 335, 346, 346, 350, 392, 400, 426, 467, 531,
550, 567, 569, 572, 583, 625, 637, 661, 671, 677, 698, 713, 777,
796, 837, 850, 866, 868, 874, 890, 919, 972, 992, 993])
x2
Out[51]:
array([ 2, 44, 49, 51, 72, 84, 86, 118, 120, 133, 150, 155, 156,
159, 199, 202, 250, 281, 289, 317, 317, 386, 405, 414, 427, 461,
507, 510, 543, 552, 553, 555, 559, 576, 618, 622, 633, 647, 665,
672, 682, 685, 745, 767, 776, 802, 808, 813, 847, 973])
labels=['blue','red']
fig, ax1 = plt.subplots()
ax1.scatter(x1[0::2], x2[0::2], 100, edgecolors='none', c='red', label = 'red')
ax1.scatter(x1[1::2], x2[1::2], 100, edgecolors='none', c='black', label = 'black')
plt.legend()
plt.show()
For your code, you can do:
sns.set(style = 'darkgrid')
fig, ax1 = sns.plt.subplots()
x1, x2 = X_bar[:,0], X_bar[:,1]
ax1.scatter(x1[0::2], x2[0::2], 100, edgecolors='none', c = colors[0], label='one')
ax1.scatter(x1[1::2], x2[1::2], 100, edgecolors='none', c = colors[1], label='two')
fig.set_figheight(8)
fig.set_figwidth(15)
plt.legend()
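If the two groups do not strictly alternate, the same idea works with a boolean mask per class: one scatter call per label, each with its own label argument. This is a sketch under assumptions: X_bar is replaced by random data, and y is a hypothetical array of class labels (1 or 2) that the original code does not show.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
X_bar = rng.normal(size=(50, 2))   # stand-in for the PCA scores
y = rng.integers(1, 3, size=50)    # hypothetical class labels: 1 or 2

fig, ax1 = plt.subplots(figsize=(15, 8))
for value, colour in [(1, "blue"), (2, "purple")]:
    mask = y == value              # rows that belong to this class
    ax1.scatter(X_bar[mask, 0], X_bar[mask, 1], s=100,
                edgecolors="none", c=colour, label=f"label {value}")
ax1.legend()

legend_labels = ax1.get_legend_handles_labels()[1]
```

Each scatter call contributes one legend entry, so the legend shows both colours regardless of how the points are ordered.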

Matplotlib: colorbar breaks when using PySAL natural breaks

I'm making a choropleth map based on this tutorial.
But instead of splitting the data into equal intervals, like this:
bins = np.linspace(values.min(), values.max(), 7)
I'm using PySAL's Jenks natural breaks because my data is unevenly distributed:
from pysal.esda.mapclassify import Natural_Breaks as nb
# values is a pandas Series
breaks = nb( values, initial=150, k = 7)
This makes the map colors look good, but it messes up the legend:
So I tried assigning Jenks colors to the map, and equal intervals to the legend, but this happens:
The colorbar is assigned the right tick labels, but at the wrong position. So my question is: how can I get the colorbar to be equal intervals but the tick labels to be the Natural Breaks values in the right position?
Here's the pertinent code for the legend:
import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
from pysal.esda.mapclassify import Natural_Breaks as nb
values = pd.Series([71664, 65456, 60378, 50128, 46618, 44028, 42642, 41237, 35300, 34891, 34848, 33089, 29964, 25193, 25088, 23879, 23458, 18149, 16537, 15576, 15235, 14741, 11981, 11963, 11616, 10280, 9723, 9720, 9709, 9659, 9649, 9631, 9369, 8345, 8211, 7809, 7758, 7119, 7034, 6979, 6455, 5861, 5580, 5498, 5469, 5448, 5317, 4749, 4498, 4254, 4152, 3876, 3861, 3836, 3813, 3786, 3655, 3582, 3475, 2922, 2870, 2866, 2849, 2634, 2598, 2185, 1950, 1924, 1886, 1879, 1794, 1756, 1702, 1700, 1637, 1632, 1524, 1505, 1453, 1415, 1396, 1345, 1327, 1306, 1250, 1125, 1084, 1079, 1025, 976, 920, 903, 877, 868, 842, 815, 803, 799, 799, 792, 762, 725, 718, 714, 710, 660, 654, 647, 617, 616, 611, 600, 588, 572, 572, 567, 547, 536, 522, 482, 463, 439, 434, 428, 419, 415, 412, 410, 395, 390, 389, 386, 375, 374, 370, 345, 338, 325, 324, 285, 276, 272, 250, 236, 229, 227, 226, 216, 213, 209, 203, 200, 186, 186, 182, 182, 175, 173, 170, 169, 164, 164, 159, 155, 153, 148, 147, 140, 131, 129, 127, 127, 126, 124, 119, 117, 115, 114, 111, 109, 105, 103, 101, 97, 90, 89, 89, 85, 84, 77, 76, 74, 72, 71, 70, 70, 69, 62, 61, 61, 60, 57, 54, 53, 53, 51, 50, 50, 48, 44, 43, 42, 35, 34, 30, 29, 26, 23, 20, 19, 16, 15, 15, 12, 11, 9, 8, 8, 5, 3, 1])
num_colors = 7
# Jenks natural breaks for colormap
breaks = nb( values, initial=150, k = num_colors - 1)
bins = breaks.bins
# Orange-Red colormap
cm = plt.get_cmap('OrRd')
scheme = cm(1.*np.arange(num_colors)/num_colors)
fig = plt.figure(figsize=(19, 7))
ax_legend = fig.add_axes([0.35, 0.15, 0.3, 0.03], zorder=3)
cmap = mpl.colors.ListedColormap(scheme)
# Round legend ticks to nearest 100
legend_bins = np.around(bins, decimals = -2)
# Split colormap into equal intervals
legend_colors = np.linspace(values.min(), values.max(), num_colors)
cb = mpl.colorbar.ColorbarBase(ax_legend,
cmap=cmap,
ticks=legend_bins,
boundaries=legend_colors,
orientation='horizontal' )
After much wrestling, I found the answer. It's all about setting the ticks and boundaries parameters to the same thing, i.e. the bins. Then set the ticks to legend_colors.
The relevant bit to make it work is:
cb = mpl.colorbar.ColorbarBase(ax_legend,
cmap=cmap,
ticks=bins,
boundaries=bins,
orientation='horizontal' )
cb.set_ticks(legend_colors[1:])
