Specify color points depending on conditions [duplicate] - python

This question already has answers here:
How to change outliers to some other colors in a scatter plot
(2 answers)
Closed 5 years ago.
I have two numpy arrays, x and y, with 7000 elements each. I want to make a scatter plot of them giving each point a different color depending on these conditions:
-BLACK if x[i]<10.
-RED if x[i]>=10 and y[i]<=-0.5
-BLUE if x[i]>=10 and y[i]>-0.5
I tried creating a list of the same length as the data with the color I want to assign to each point and then plot the data with a loop, but it takes me a long time to run it. Here's my code:
import numpy as np
import matplotlib.pyplot as plt
#color list with same length as the data
col=[]
for i in range(0,len(x)):
    if x[i]<10:
        col.append('k')
    elif x[i]>=10 and y[i]<=-0.5:
        col.append('r')
    else:
        col.append('b')
#scatter plot
for i in range(len(x)):
    plt.scatter(x[i],y[i],c=col[i],s=5, linewidth=0)
#add horizontal line and invert y-axis
plt.gca().invert_yaxis()
plt.axhline(y=-0.5,linewidth=2,c='k')
Before that, I tried creating the same color list in the same way, but plotting the data without the loop:
#scatter plot
plt.scatter(x,y,c=col,s=5, linewidth=0)
Even though this plots the data much, much faster than using the for loop, some of the scattered points appear with the wrong color. Why does plotting the data without a loop give some points an incorrect color?
I also tried defining three sets of data, one for each color, and adding them to the plot separately. But this is not what I am looking for.
Is there a way to pass scatter a list of the colors I want for each point, so that I don't need the for loop?
PS: This is the plot I get when I don't use the for loop (wrong one):
And this one when I use the for loop (correct):

This can be done using numpy.where. Since I do not know your exact x and y values, I will have to use some fake data:
import numpy as np
import matplotlib.pyplot as plt
#generate some fake data
x = np.random.random(10000)*10
y = np.random.random(10000)*10
col = np.where(x<1,'k',np.where(y<5,'b','r'))
plt.scatter(x, y, c=col, s=5, linewidth=0)
plt.show()
This produces the plot below:
The line col = np.where(x<1,'k',np.where(y<5,'b','r')) is the important one. It produces a NumPy array of the same size as x and y, filled with 'k', 'b' or 'r' according to the conditions: where x is less than 1 the entry is 'k'; otherwise, where y is less than 5 it is 'b'; and where neither condition is met it is 'r'. Because scatter accepts this array for c, you do not have to use a loop to plot your graph.
For your specific data you will have to change the values in the conditions of np.where.
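Applied to the thresholds in the question, the nested np.where might look like the sketch below. The helper name point_colours is hypothetical and the sample points are made up; only np.where itself comes from the answer above.

```python
import numpy as np

def point_colours(x, y):
    """Return one colour code per point, following the question's three rules."""
    x = np.asarray(x)
    y = np.asarray(y)
    # Black where x < 10; otherwise red where y <= -0.5 and blue where y > -0.5.
    return np.where(x < 10, 'k', np.where(y <= -0.5, 'r', 'b'))

# Made-up sample points, one per rule.
col = point_colours([5, 12, 12], [0.0, -1.0, 0.0])
print(col)
```

The resulting array can be passed straight to plt.scatter(x, y, c=col, s=5, linewidth=0).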

Related

Python Scatter plot with matrix input. Having trouble getting number of columns showing on x axis, then a dot for each value in each column

I'm making a bar chart and a scatter plot. The bar chart takes a vector as an input. I plotted the values on the x-axis, and the number of times they repeat on the y-axis. I did this by converting the vector to a list and using .count(). That worked great and was relatively straightforward.
As for the scatter plot, the input is going to be a matrix of any dimensions. The idea is to have the columns of the matrix show up on the x-axis as 1, 2, 3, 4, etc., depending on how many columns the matrix has. Each column holds many different numbers, and I would like all of them displayed as dots or stars above the relevant column index; e.g. column #3 consists of the values 6, 2, 8, 5, 9, 5 going down, and I would like a dot for each of them going up the y-axis directly above the number 3 on the x-axis. I have tried different approaches: in some the dots show up but in the wrong places, in others the x-axis is completely off, even though I used .len(0,:) which prints out the correct number of columns but doesn't chart it.
My latest attempt which now doesn't even show the dots or stars:
import numpy as np # Import NumPy
import matplotlib.pyplot as plt # Import the matplotlib.pyplot module
vector = np.array([[-3,7,12,4,0o2,7,-3],[7,7,12,4,0o2,4,12],[12,-3,4,10,12,4,-3],[10,12,4,0o3,7,10,12]])
x = len(vector[0,:])
print(x)#vector[0,:]
y = vector[:,0]
plt.plot(x, y, "r.") # Scatter plot with blue stars
plt.title("Scatter plot") # Set the title of the graph
plt.xlabel("Column #") # Set the x-axis label
plt.ylabel("Occurences of values for each column") # Set the y-axis label
plt.xlim([1,len(vector[0,:])]) # Set the limits of the x-axis
plt.ylim([-5,15]) # Set the limits of the y-axis
plt.show(vector)
The matrix shown at the top is just one I made up for the purpose of testing, the idea is that it should work for any given matrix which is imported.
I tried the above pasted code which is the closest I have gotten as it actually prints the amount of columns it has, but it doesn't show them on the plot. I haven't gotten to a point where it actually plots the points above the columns on y axis yet, only in completely wrong positions in a previous version.
import numpy as np # Import NumPy
import matplotlib.pyplot as plt # Import the matplotlib.pyplot module
vector = np.array([[-3,7,12,4,0o2,7,-3],
[7,7,12,4,0o2,4,12],
[12,-3,4,10,12,4,-3],
[10,12,4,0o3,7,10,12]])
rows, columns = vector.shape
plt.title("Scatter plot") # Set the title of the graph
plt.xlabel("Column #") # Set the x-axis label
plt.ylabel("Occurences of values for each column") # Set the y-axis label
plt.xlim([1,columns]) # Set the limits of the x-axis
plt.ylim([-5,15]) # Set the limits of the y-axis
for i in range(1, columns+1):
    y = vector[:,i-1]
    x = [i] * rows
    plt.plot(x, y, "r.")
plt.show()
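The loop above issues one plot call per column. For a large matrix, the same coordinates could also be built in one go. The following is a sketch of that alternative, not part of the answer above, and the helper name column_scatter_coords is made up for illustration.

```python
import numpy as np

def column_scatter_coords(matrix):
    """Return (x, y) where x is the 1-based column number of each matrix entry."""
    matrix = np.asarray(matrix)
    rows, columns = matrix.shape
    # ravel() walks the matrix row by row, so the column numbers 1..columns
    # repeat once per row, which is exactly what np.tile produces.
    x = np.tile(np.arange(1, columns + 1), rows)
    y = matrix.ravel()
    return x, y

# Small made-up matrix: 2 rows, 3 columns.
x, y = column_scatter_coords([[6, 2, 8], [5, 9, 5]])
print(x, y)
```

A single plt.plot(x, y, "r.") call then draws every value above its column number.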

Matplotlib - how to prevent graph from backtracking [duplicate]

This question already has answers here:
Matplotlib plotting in wrong order
(6 answers)
Closed 3 years ago.
I am new to Matplotlib. How do I prevent my graph from backtracking? Note the hook in the upper right of my graph. The x-axis values are strings and the y-axis values are floats.
Here is my data:
['192.168.0.1', 2.568]
['96.120.96.153', 14.139]
['96.110.232.169', 10.505]
['162.151.49.133', 11.446]
['68.86.90.225', 24.335]
['68.86.84.226', 23.631]
['68.86.83.94', 29.011]
['173.167.58.162', 35.688]
['209.58.57.17', 162.768]
['64.86.79.2', 187.42]
['64.86.21.104', 162.461]
['63.243.205.1', 166.525]
['120.29.217.66', 156.898]
['209.58.86.143', 156.785]
['120.29.217.66', 181.599]
And the corresponding code:
import matplotlib.pyplot as plt
# x axis values
a = []
# corresponding y axis values
b = []
for k in range(15):
    a.append(dataArray[k][0])
    b.append(dataArray[k][1])
# plotting the points
plt.plot(a, b)
# naming the x axis
plt.xlabel('Hop Addresses')
# naming the y axis
plt.ylabel('average time (ms)')
# giving a title to my graph
plt.title('Time vs. Hops')
# function to show the plot
plt.show()
The issue you're having is that plt.plot() draws a line from one data point to the next. Because the x values are strings, matplotlib treats them as categories, so the repeated address 120.29.217.66 is drawn at the same x position both times and the line doubles back to it.
It is evident from the data that each timestep is independent of the others, but the order of the IP addresses is important. plt.plot() tends to suggest a trend. Since the data is not a trend but a series of independent events, a different type of graph might be more appropriate. I suggest a bar graph in this case. And since the x-axis labels are bunched up, a horizontal bar chart would be even better.
You'll notice in the graph below that 120.29.217.66 appears twice, once for each hop. To account for this, plot against the index in the list rather than the IP address, and then replace the y-axis tick labels.
import matplotlib.pyplot as plt
data = [
['192.168.0.1', 2.568],
['96.120.96.153', 14.139],
['96.110.232.169', 10.505],
['162.151.49.133', 11.446],
['68.86.90.225', 24.335],
['68.86.84.226', 23.631],
['68.86.83.94', 29.011],
['173.167.58.162', 35.688],
['209.58.57.17', 162.768],
['64.86.79.2', 187.42],
['64.86.21.104', 162.461],
['63.243.205.1', 166.525],
['120.29.217.66', 156.898],
['209.58.86.143', 156.785],
['120.29.217.66', 181.599],
]
idxs = range(len(data))
ips = [i[0] for i in data]
times = [i[1] for i in data]
plt.barh(idxs, times) # plot times vs the index of the array
plt.ylabel('Hop Addresses')
plt.xlabel('average time (ms)')
plt.title('Time vs. Hops')
plt.yticks(idxs, ips) # Replace tick labels with IP Addresses
plt.tight_layout()
plt.show()
Since we tend to read from top to bottom, you can always flip the y-axis.
plt.gca().invert_yaxis()
A line connects points in sequence. You need the data to be in the correct order, for the line graph to make sense. We can also just use the index as the 'x' value. For instance, for your data as below
data = [['192.168.0.1', 2.568],
['96.120.96.153', 14.139],
['96.110.232.169', 10.505],
['162.151.49.133', 11.446],
['68.86.90.225', 24.335],
['68.86.84.226', 23.631],
['68.86.83.94', 29.011],
['173.167.58.162', 35.688],
['209.58.57.17', 162.768],
['64.86.79.2', 187.42],
['64.86.21.104', 162.461],
['63.243.205.1', 166.525],
['120.29.217.66', 156.898],
['209.58.86.143', 156.785],
['120.29.217.66', 181.599]]
we can plot only the delays, in the same order provided by traceroute. Then the graph will be correct.
from matplotlib import pyplot as plt
plt.plot([i[1] for i in data])
plt.show()
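Plotting only the delays drops the addresses from the axis. If the labels are still wanted on a line plot, one possible sketch is to separate positions from labels first; the helper name hop_positions is hypothetical, and the shortened data list is only there to keep the example small.

```python
def hop_positions(data):
    """Split [address, delay] pairs into plot positions, tick labels and delays."""
    positions = list(range(len(data)))      # one distinct x position per hop
    labels = [address for address, _ in data]
    delays = [delay for _, delay in data]
    return positions, labels, delays

# Three hops from the question, including the address that appears twice.
sample = [['120.29.217.66', 156.898],
          ['209.58.86.143', 156.785],
          ['120.29.217.66', 181.599]]
positions, labels, delays = hop_positions(sample)
print(positions, labels, delays)
```

Calling plt.plot(positions, delays) followed by plt.xticks(positions, labels, rotation=90) would then keep the repeated address at two separate positions, so the line no longer doubles back.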

How to style/format point markers in Plotly 3D scatterplot?

I am unsure how to customize scatterplot marker styles in Plotly scatterplots.
Specifically, I have a column predictions that is 0 or 1 (1 represents an unexpected value) and even though I used the symbol parameter in px.scatter_3d to indicate the unexpected value through varying point shape (diamond for 1 and circle for 0), the difference is very subtle and I want it to be more dramatic. I was envisioning something like below (doesn't need to be exactly this), but something along the lines of the diamond shaped points have a different outline colors or an additional shape/bubble around it. How would I do this?
Additionally, I have a set column which can take on one of two values, set A or set B. I used the color parameter inside px.scatter_3d and made that equal to set so the points are colored according to which set they came from. While it is doing what I asked, I don't want the colors to be blue and red, but any two colors I specify. How would I be able to do this (let's say I want the colors to be blue and orange instead)? Thank you so much!
Here is the code I used:
fig = px.scatter_3d(X_combined, x='x', y='y', z='z',
color='set', symbol='predictions', opacity=0.7)
fig.update_traces(marker=dict(size=12,
line=dict(width=5,
color='Black')),
selector=dict(mode='markers'))
You can use multiple go.Scatter3d() statements and gather them in a list to format each and every segment or extreme values more or less exactly as you'd like. This can be a bit more demanding than using px.scatter_3d(), but it will give you more control. The following plot is produced by the snippet below:
Plot:
Code:
import plotly.graph_objects as go
import numpy as np
import pandas as pd
# sample data
t = np.linspace(0, 10, 50)
x, y, z = np.cos(t), np.sin(t), t
# plotly data
data=[go.Scatter3d(x=[x[2]], y=[y[2]], z=[z[2]],mode='markers', marker=dict(size=20), opacity=0.8),
go.Scatter3d(x=[x[26]], y=[y[26]], z=[z[26]],mode='markers', marker=dict(size=30), opacity=0.3),
go.Scatter3d(x=x, y=y, z=z,mode='markers')]
fig = go.Figure(data)
fig.show()
How you identify the different segments, whether by max or min values, will be entirely up to you. Anyway, I hope this approach will be useful!
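For the second part of the question (choosing the two colours), a lighter alternative to separate go.Scatter3d traces may be the color_discrete_map and symbol_map arguments of Plotly Express. The sketch below only builds the keyword arguments; the helper name scatter_style_kwargs is hypothetical, and it assumes the set column holds exactly the strings 'set A' and 'set B' and that predictions holds the integers 0 and 1.

```python
def scatter_style_kwargs(colour_a="blue", colour_b="orange"):
    """Build styling keyword arguments for px.scatter_3d (hypothetical helper)."""
    return {
        # Assumption: the 'set' column contains exactly these two strings.
        "color_discrete_map": {"set A": colour_a, "set B": colour_b},
        # Assumption: 'predictions' contains the integers 0 and 1.
        # An open diamond is one way to make the unexpected points stand out.
        "symbol_map": {0: "circle", 1: "diamond-open"},
    }

style = scatter_style_kwargs()
print(style)
```

These could then be unpacked into the original call, for example px.scatter_3d(X_combined, x='x', y='y', z='z', color='set', symbol='predictions', opacity=0.7, **style).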

Use a list to determine matplotlib colours

I am making a basic program using matplotlib which graphs a large number of points, and calculates a value to colour those points. My issue is that as the number of points gets very large, the time it takes to individually plot each point through a for loop also gets very large. Is there any way I can use one plot statement and specify a list to use the colours for each individual point? As an example,
Current method:
colours = [(1,0,0),(0,1,0),(0,1,1)] #The length of these lists is usually in the thousands
x = [0,1,2]
y = [2,1,0]
for i in range(len(colours)):
    plot([x[i]],[y[i]],'o', color = colours[i])
Whereas what I would like to use would be something more like:
plot(x,y,'o', color=colours)
Which would use each colour for each point. Is there any better way to approach this than a for loop?
You do not want to use plot, but scatter: its c argument accepts a sequence of colours, one per point.
import matplotlib.pyplot as plt
colours = [(1,0,0),(0,1,0),(0,1,1)]
x = [0,1,2]
y = [2,1,0]
plt.scatter(x,y, c=colours)
plt.show()
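Since the question computes a value for each point and turns it into a colour, that conversion can be vectorised as well. The following is only a sketch: the helper name values_to_rgb and the red-to-blue ramp are assumptions for illustration, not something taken from the question.

```python
import numpy as np

def values_to_rgb(values):
    """Map each value to an (r, g, b) row: lowest value red, highest value blue."""
    v = np.asarray(values, dtype=float)
    span = v.max() - v.min()
    # Normalise to [0, 1]; if all values are equal, fall back to zeros.
    t = (v - v.min()) / span if span > 0 else np.zeros_like(v)
    return np.column_stack([1 - t, np.zeros_like(t), t])

# Made-up values for three points.
rgb = values_to_rgb([0, 5, 10])
print(rgb)
```

The (N, 3) array can be handed to scatter in one call, as in plt.scatter(x, y, c=rgb).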

matplotlib: manually change yaxis values to differ from the actual value (NOT: change ticks!) [duplicate]

I am trying to plot data and a function with matplotlib 2.0 under Python 2.7.
The x values of the function evolve with time: x first decreases to a certain value, then increases again.
If the function is plotted against time, it looks like this plot of data against time:
I need the same x-axis evolution when plotting against the real x values. Unfortunately, as the x values are the same for both parts, before and after, both sets of values are mixed together. This gives me the wrong data plot:
In this example it means I need the x-axis to start at 2.4, decrease to 1.0, then increase again to 2.4. I swear I found before that this is possible, but unfortunately I can't find a trace of it again.
A matplotlib axis is by default linearly increasing. More importantly, there must be an injective mapping of the number line to the axis units. So changing the data range is not really an option (at least when the aim is to keep things simple).
It would hence be good to keep the original numbers and only change the ticks and ticklabels on the axis. E.g. you could use a FuncFormatter to map the original numbers to
np.abs(x-tp)+tp
where tp would be the turning point.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker
x = np.linspace(-10,20,151)
y = np.exp(-(x-5)**2/19.)
plt.plot(x,y)
tp = 5
fmt = lambda x,pos:"{:g}".format(np.abs(x-tp)+tp)
plt.gca().xaxis.set_major_formatter(matplotlib.ticker.FuncFormatter(fmt))
plt.show()
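The formatter only changes the labels, so the data still has to be plotted against a coordinate that increases monotonically. One way to get such a coordinate, sketched here under the assumption that x decreases to a single minimum and then increases, is to mirror the decreasing branch about the turning point. The helper name unfold is hypothetical.

```python
import numpy as np

def unfold(x):
    """Mirror the decreasing branch of x about its minimum.

    Returns (u, tp): u increases monotonically, and abs(u - tp) + tp
    recovers the original x values for use as tick labels.
    """
    x = np.asarray(x, dtype=float)
    k = int(np.argmin(x))          # index of the turning point
    tp = x[k]
    u = x.copy()
    u[:k] = 2 * tp - x[:k]         # reflect everything before the turning point
    return u, tp

# Made-up x values following the question's 2.4 -> 1.0 -> 2.4 pattern.
u, tp = unfold([2.4, 1.7, 1.0, 1.7, 2.4])
print(u, tp)
```

Plotting y against u and using the FuncFormatter above with this tp would then label the axis with the real x values on both sides of the turning point.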
One option would be to use two axes and plot your two timespans separately, one on each axes.
For instance, if you have the following data:
import numpy as np
import matplotlib.pyplot as plt
myX = np.linspace(1,2.4,100)
myY1 = -1*myX
myY2 = -0.5*myX-0.5
plt.plot(myX,myY1, c='b')
plt.plot(myX,myY2, c='g')
you can instead create two subplots with a shared y-axis and no space between the two axes, plot each time span independently, and finally adjust the limits of one of your x-axes to reverse the order of the points:
fig, (ax1,ax2) = plt.subplots(1,2, gridspec_kw={'wspace':0}, sharey=True)
ax1.plot(myX,myY1, c='b')
ax2.plot(myX,myY2, c='g')
ax1.set_xlim((2.4,1))
ax2.set_xlim((1,2.4))
