Scatter plot only lines that meet the condition in a specific column

I would like to scatter plot a number of stations from a txt file to a map, using cartopy:
import numpy as np

def ReadData(FileName,typee,delimee):
    # Read a fixed-width text file; delimee holds the width of each field
    return np.genfromtxt(FileName, dtype=typee, delimiter=delimee, encoding='latin-1')
MyTypes = ("|U11","float","float","float","|U1","|U2","|U1","|U29")
MyDelimiters = [11,9,10,7,1,2,1,29] # station ID, lat, lon (-180 to 180), elevation (m), blank, Country code, blank, Name
RawData = ReadData("stations.txt",MyTypes,MyDelimiters)
stations.txt:
01001099999 70.9330 -8.6670 9.0 NO JAN MAYEN(NOR-NAVY) 0100109
01001599999 61.3830 5.8670 327.0 NO BRINGELAND 0100159
01003099999 77.0000 15.5000 12.0 NO HORNSUND 0100309
01008099999 78.2460 15.4660 26.8 SV LONGYEAR 0100809
01010099999 69.2930 16.1440 13.1 NO ANDOYA 0101009
2nd column represents the latitudes, 3rd column the longitudes, 4th column the elevation.
StationListID = np.array(RawData['f0'])
StationListLat = np.array(RawData['f1'])
StationListLon = np.array(RawData['f2'])
StationListElev = np.array(RawData['f3'])
I use:
import matplotlib.pyplot as plt
import cartopy.crs as crs
plt.scatter(x=StationListLon, y=StationListLat,
            color="dodgerblue",
            s=1,
            alpha=0.5,
            transform=crs.PlateCarree())
If the elevation < 0, I would like to have black dots, for > 5 green, for > 10 red and for > 15 blue dots. Where do I set the if conditions or group the lines?

Modules and data:
import numpy as np
import pandas as pd
import io
RawData = pd.read_csv(io.StringIO("""
x f1 f2 f3 stations
01001099999 70.9330 -8.6670 9.0 NOJANMAYEN(NOR-NAVY)
01001599999 61.3830 5.8670 327.0 NOBRINGELAND
01003099999 77.0000 15.5000 12.0 NOHORNSUND
01008099999 78.2460 15.4660 26.8 SVLONGYEAR
01010099999 69.2930 16.1440 13.1 NOANDOYA
"""), sep=r"\s+", engine="python")
StationListLat = np.array(RawData['f1'])
StationListLon = np.array(RawData['f2'])
StationListElev = np.array(RawData['f3'])
You could first make labels that signify the colors using pd.cut.
color_labels = ['black', 'yellow', 'green', 'red', 'blue']
cut_bins = [-500, 0, 5, 10, 15, 500]
RawData['colors'] = pd.cut(RawData['f3'], bins=cut_bins, labels=color_labels)
Then you could use these labels to color the dots. Note that you did not specify a color for values between 0 and 5; I assigned yellow to that range.
As you can see, I left the crs part out because, if I am not mistaken, it is not directly relevant to this problem.
import matplotlib.pyplot as plt
plt.scatter(x=StationListLon, y=StationListLat,
            color=RawData['colors'],
            s=20,
            alpha=0.5)
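The same binning can also be sketched with plain NumPy, which may be convenient when the elevations are already in an array such as StationListElev. This is only a minimal sketch: the helper name elevation_colors is hypothetical, and the edges and the yellow color for the 0 to 5 range simply follow the choices made in the answer above.

```python
import numpy as np

# Bin edges and colors follow the answer above; yellow for 0-5 is that answer's choice.
edges = np.array([0, 5, 10, 15])
colors = np.array(["black", "yellow", "green", "red", "blue"])

def elevation_colors(elev):
    """Return one color name per elevation value."""
    # right=True puts a value equal to an edge into the lower bin,
    # which matches the default behaviour of pd.cut.
    idx = np.digitize(elev, edges, right=True)
    return colors[idx]

elev = np.array([9.0, 327.0, 12.0, 26.8, 13.1, -3.0])
print(elevation_colors(elev).tolist())
# ['green', 'blue', 'red', 'blue', 'red', 'black']
```

The resulting array could be passed as color=elevation_colors(StationListElev) to plt.scatter. To plot only the rows that meet one condition, a boolean mask such as StationListElev > 15 can be used to index the longitude and latitude arrays before plotting.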

Related

Different colors for each bubbles on map on basis of date

I have a dataset with the columns 'latitudine', 'longitudine' and 'created_at'.
'created_at' has a format such as 24/11/2019 01:00:00. It contains only two dates, 24 and 25 November 2019, at different hours.
I used this script to draw a map with bubbles of different radii, but all bubbles have the same color (red). Is it possible to get one color per date (in this case two colors, one for 24 November and one for 25 November)?
This is the dataset: dataset1
import pandas as pd
# Load the dataset into a pandas dataframe.
df = pd.read_csv("autostrada_a_6.csv", delimiter=';', on_bad_lines='skip')
import folium
locations = df.groupby(by=['latitudine','longitudine'])\
    .count()['created_at']\
    .sort_values(ascending=False)
locations = locations.to_frame('value')
# Make an empty map
m = folium.Map(location=[df['latitudine'].mean(), df['longitudine'].mean()], tiles="Stamen Toner", zoom_start=8)
def get_radius(freq):
    if freq < 5:
        return 5
    elif freq < 15:
        return 15
    elif freq < 257:
        return 45
for i,row in locations.iterrows():
    #print(i,row)
    folium.CircleMarker(
        location=[i[0], i[1]],
        radius=get_radius(row[0]),
        color='crimson',
        fill=True,
        fill_color='crimson'
    ).add_to(m)
m
Alternatively, I tried another script, but I have a problem with it: I would like the marker size (in this case 's') to be based on the count:
# Basemap library
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
# Set the dimension of the figure
plt.rcParams["figure.figsize"]=15,10;
# Make the background map
m=Basemap(llcrnrlon=-180, llcrnrlat=-65, urcrnrlon=180, urcrnrlat=80, projection='merc');
m.drawmapboundary(fill_color='#A6CAE0', linewidth=0);
m.fillcontinents(color='grey', alpha=0.3);
m.drawcoastlines(linewidth=0.1, color="white");
locations = df.groupby(by=['latitudine','longitudine'])\
    .count()['created_at']\
    .sort_values(ascending=False)
locations = locations.to_frame('value')
# Make the background map
m=Basemap(llcrnrlon=-180, llcrnrlat=-65, urcrnrlon=180, urcrnrlat=80)
m.drawmapboundary(fill_color='#A6CAE0', linewidth=0)
m.fillcontinents(color='grey', alpha=0.3)
m.drawcoastlines(linewidth=0.1, color="white")
# prepare a color for each point depending on the continent.
df['label'] = pd.factorize(data['created_at'])[0]
# Add a point per position
m.scatter(
    x=data['homelon'],
    y=data['homelat'],
    s=data['n']/6,
    alpha=0.4,
    c=data['label'],
    cmap="Set1"
)
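One possible way to get one color per date is sketched below. It assumes that 'created_at' always follows the DD/MM/YYYY HH:MM:SS format shown in the question; the palette and the helper name color_for_dates are hypothetical and not part of the original scripts.

```python
from datetime import datetime

# Hypothetical palette; any color names accepted by the plotting library would do.
PALETTE = ["crimson", "royalblue", "seagreen", "orange"]

def color_for_dates(timestamps, fmt="%d/%m/%Y %H:%M:%S"):
    """Map each timestamp string to a color chosen by its calendar date."""
    dates = [datetime.strptime(ts, fmt).date() for ts in timestamps]
    # Assign palette entries to the distinct dates in chronological order.
    lookup = {d: PALETTE[i % len(PALETTE)] for i, d in enumerate(sorted(set(dates)))}
    return [lookup[d] for d in dates]

sample = ["24/11/2019 01:00:00", "24/11/2019 13:30:00", "25/11/2019 02:15:00"]
print(color_for_dates(sample))
# ['crimson', 'crimson', 'royalblue']
```

The colors could then be supplied per marker, for example as color= and fill_color= in folium.CircleMarker. That would probably also require keeping the date as part of the groupby, so that each location and date pair gets its own row.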

Markers on seaborn line plot in python

New here so putting hyperlinks. My dataframe looks like this.
HR ICULOS SepsisLabel PatientID
100.3 1 0 1
117.0 2 0 1
103.9 3 0 1
104.7 4 0 1
102.0 5 0 1
88.1 6 0 1
Access the whole file here. What I wanted is to add a marker on the HR graph based on SepsisLabel (See the file). E.g., at ICULOS = 249, Sepsis Label changed from 0 to 1. I wanted to show that at this point on graph, sepsis label changed. I was able to calculate the position using this code:
mark = dummy.loc[dummy['SepsisLabel'] == 1, 'ICULOS'].iloc[0]
print("The ICULOS where SepsisLabel changes from 0 to 1 is:", mark)
Output: The ICULOS where SepsisLabel changes from 0 to 1 is: 249
I plotted the graph using this code:
plt.figure(figsize=(15,6))
ax = plt.gca()
ax.set_title("Patient ID = 1")
ax.set_xlabel('ICULOS')
ax.set_ylabel('HR Readings')
sns.lineplot(ax=ax,
             x="ICULOS",
             y="HR",
             data=dummy,
             marker = '^',
             markersize=5,
             markeredgewidth=1,
             markeredgecolor='black',
             markevery=mark)
plt.show()
This is what I got: Graph. The marker was supposed to be on position 249 only. But it is also on position 0. Why is it happening? Can someone help me out?
Thanks.

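Working with markevery can be tricky in this case, as it strongly depends on there being exactly one entry for each patient and each ICULOS. Also, an integer markevery=N marks every N-th point starting from the first one (index 0), which is why a marker also appears at the start of the line; a list such as markevery=[N] marks only the point at index N.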
Here is an alternative approach, using an explicit scatter plot to draw the marker:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
df = pd.DataFrame({'HR': np.random.randn(200).cumsum() + 60,
                   'ICULOS': np.tile(np.arange(1, 101), 2),
                   'SepsisLabel': np.random.binomial(2, 0.05, 200),
                   'PatientID': np.repeat([1, 2], 100)})
for patient_id in [1, 2]:
    dummy = df[df['PatientID'] == patient_id]
    fig, ax = plt.subplots(figsize=(15, 6))
    ax.set_title(f"Patient ID = {patient_id}")
    ax.set_xlabel('ICULOS')
    ax.set_ylabel('HR Readings')
    sns.lineplot(ax=ax,
                 x="ICULOS",
                 y="HR",
                 data=dummy)
    x = dummy[dummy['SepsisLabel'] == 1]["ICULOS"].values[0]
    y = dummy[dummy['SepsisLabel'] == 1]["HR"].values[0]
    ax.scatter(x=x,
               y=y,
               marker='^',
               s=5,
               linewidth=1,
               edgecolor='black')
    ax.text(x, y, str(x) + '\n', ha='center', va='center', color='red')
plt.show()
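To see why the plot in the question also got a marker at position 0, the indexing rule that markevery follows can be imitated with plain Python slicing. This is only an illustration and not matplotlib's own code; the helper name marked_indices and the total of 300 points are hypothetical.

```python
def marked_indices(n_points, markevery):
    """Mimic which point indices get a marker for an int or list markevery."""
    if isinstance(markevery, int):
        # An integer N selects every N-th point, starting at index 0.
        return list(range(n_points))[::markevery]
    # A list of integers selects exactly those indices.
    return [i for i in markevery if 0 <= i < n_points]

print(marked_indices(300, 249))    # [0, 249]
print(marked_indices(300, [249]))  # [249]
```

Note that markevery counts positions in the plotted data, not ICULOS values, so the position of the row where SepsisLabel first equals 1 would be needed rather than the ICULOS value itself.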
For your new question, here is an example of how to convert the 'ICULOS' column to pandas dates. The example uses the date 20210101 to correspond with ICULOS == 1. You probably have a different starting date for each patient.
df_fb = pd.DataFrame()
df_fb['Y'] = df['HR']
df_fb['DS'] = pd.to_datetime('20210101') + pd.to_timedelta(df['ICULOS'] - 1, unit='D')

Histogram from Table

Table
Hi, I'm trying to make a histogram from the table above; my code is below.
def histograms(t):
    salaries = t.column('Salary')
    salary_bins = np.arange(min(salaries), max(salaries)+1000, 1000)
    t.hist('Salary', bins=salary_bins, unit='$')
histograms(full_data)
But it's not showing properly. Can you help me?
Histogram

The bins argument of a histogram can be either an integer, giving the number of equal-width bins into which the data range is divided, or a sequence of bin edges.
Let's say you have a sample dataframe of salaries like this:
import pandas as pd
sample_dataframe = pd.DataFrame({'name':['joe','jill','martin','emily','frank','john','sue','sally','sam'],
                                 'salary':[105324,65002,98314,24480,55000,62000,75000,79000,32000]})
#output:
name salary
0 joe 105324
1 jill 65002
2 martin 98314
3 emily 24480
4 frank 55000
5 john 62000
6 sue 75000
7 sally 79000
8 sam 32000
If you want to plot a histogram where the salaries will be distributed in 10 bins and you want to stick with your function, you can do:
import matplotlib.pyplot as plt
def histograms(t):
    plt.hist(t.salary, bins = 10, color = 'orange', edgecolor = 'black')
    plt.xlabel('Salary')
    plt.ylabel('Count')
    plt.show()
histograms(sample_dataframe)
If you want the x-axis ticks to reflect the boundaries of the 10 bins, you can add this line:
import numpy as np
plt.xticks(np.linspace(min(t.salary), max(t.salary), 11), rotation = 45)
Finally to show the y-ticks as integers, you add these lines:
from matplotlib.ticker import MaxNLocator
plt.gca().yaxis.set_major_locator(MaxNLocator(integer=True))
The final function looks like this:
def histograms(t):
    plt.hist(t.salary, bins = 10, color = 'orange', edgecolor = 'black')
    plt.xlabel('Salary')
    plt.ylabel('Count')
    plt.gca().yaxis.set_major_locator(MaxNLocator(integer=True))
    plt.xticks(np.linspace(min(t.salary), max(t.salary), 11), rotation = 45)
    plt.show()

Is this what you are looking for?
import matplotlib.pyplot as plt
def histograms(t):
    _min = min(t['salary'])
    _max = max(t['salary'])
    bins = int((_max - _min) / 1000) # dividing the salary range in bins of 1000 each
    plt.hist(t['salary'], bins = bins)
    plt.show()
histograms(df)
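As a complement to both answers, bins of a fixed width can also be given as explicit edges. The sketch below reuses the sample salaries from the first answer and assumes a bin width of 10000, which is an arbitrary choice for illustration; np.histogram is used only to show the resulting counts without drawing anything.

```python
import numpy as np

salaries = np.array([105324, 65002, 98314, 24480, 55000, 62000, 75000, 79000, 32000])
width = 10000  # hypothetical bin width

# Round the data range outward to multiples of the bin width.
start = (salaries.min() // width) * width
stop = -(-salaries.max() // width) * width  # ceiling division
edges = np.arange(start, stop + width, width)

counts, _ = np.histogram(salaries, bins=edges)
print(edges.tolist())
# [20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000, 100000, 110000]
print(counts.tolist())
# [1, 1, 0, 1, 2, 2, 0, 1, 1]
```

The same edges array could then be passed as bins=edges to plt.hist, so that every bar covers exactly one interval of the chosen width.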

Multicolor scatter plot legend in Python

I have some basic car engine size, horsepower and body type data (sample shown below)
body-style engine-size horsepower
0 convertible 130 111.0
2 hatchback 152 154.0
3 sedan 109 102.0
7 wagon 136 110.0
69 hardtop 183 123.0
From this I made a scatter plot with horsepower on the x axis and engine size on the y axis, using body-style as a color scheme to differentiate body classes.
I also used the "compression ratio" of each car from a separate dataframe to set the point size.
This worked out well, except I can't display color legends for my plot. Help needed, as I'm a beginner.
Here's my code:
dict = {'convertible':'red' , 'hatchback':'blue' , 'sedan':'purple' , 'wagon':'yellow' , 'hardtop':'green'}
wtf["colour column"] = wtf["body-style"].map(dict)
wtf["comp_ratio_size"] = df['compression-ratio'].apply ( lambda x : x*x)
fig = plt.figure(figsize=(8,8),dpi=75)
ax = fig.gca()
plt.scatter(wtf['engine-size'],wtf['horsepower'],c=wtf["colour column"],s=wtf['comp_ratio_size'],alpha=0.4)
ax.set_xlabel('horsepower')
ax.set_ylabel("engine-size")
ax.legend()

In matplotlib, you can easily generate custom legends. In your example, just retrieve the color-label combinations from your dictionary and create custom patches for the legend:
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import numpy as np
import pandas as pd
#this part just recreates your dataset
wtf = pd.read_csv("test.csv", sep=r"\s+")
col_dict = {'convertible':'red' , 'hatchback':'blue' , 'sedan':'purple' , 'wagon':'yellow' , 'hardtop':'green'}
wtf["colour_column"] = wtf["body-style"].map(col_dict)
wtf["comp_ratio_size"] = np.square(wtf["horsepower"] - wtf["engine-size"])
fig = plt.figure(figsize=(8,8),dpi=75)
ax = fig.gca()
#horsepower on the x axis and engine size on the y axis, matching the axis labels
ax.scatter(wtf['horsepower'],wtf['engine-size'],c=wtf["colour_column"],s=wtf['comp_ratio_size'],alpha=0.4)
ax.set_xlabel('horsepower')
ax.set_ylabel("engine size")
#retrieve values from color dictionary and attribute it to corresponding labels
leg_el = [mpatches.Patch(facecolor = value, edgecolor = "black", label = key, alpha = 0.4) for key, value in col_dict.items()]
ax.legend(handles = leg_el)
plt.show()

Matplotlib not setting x,y axis ticks at regular interval to dataset maximum

Having a bit of trouble with matplotlib and setting the x and y axis ticks.
I have a dataFrame of VO2 and VCO2 data that I want to scatter as x=VO2 and y=VCO2.
It is breath-by-breath (i.e. raw) data so there are >400 data points.
When I do, I get x/y labels for all 400 data points. Hence I have been trying to set the x/y tick intervals to start at 0.0 and increase by 0.5 up to the max value of the respective dataset. It needs to be a 'dynamic' plot (i.e. the x,y ticks change based on the max values of the data), as it is being incorporated into a larger program.
Current code:
import csv
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
def import_data(fpath):
    with open(fpath, 'r') as f:
        fdata = csv.reader(f)
        fdata = list(fdata)
    df = pd.DataFrame(fdata[1:], columns=fdata[0])
    df.rename(columns=lambda i: i.replace('\'', ''), inplace=True)
    df.columns = [i.lower() for i in df.columns]
    df.t = pd.to_timedelta(df.t).dt.total_seconds()
    df.rename(columns={'t': 't_culm'}, inplace=True)
    df['t'] = df.t_culm - df.t_culm.min()
    df.set_index('t', inplace=True)
    return df
# import data
data = import_data('_data/mpl_data.csv')
if 'vo2' in data and 'vco2' in data:
    # set x, y variables
    x, y = data.vo2, data.vco2
    # setup figure
    fig, ax = plt.subplots(1, 1)
    ax.scatter(x, y)
    ax.set_xticks(np.arange(0.0, round(float(x.max())) + 0.5, 0.5))
    ax.set_yticks(np.arange(0.0, round(float(y.max())) + 0.5, 0.5))
    ax.set_title('V-Slope')
    ax.set_xlabel('V\'O2 (L.min-1)')
    ax.set_ylabel('V\'CO2 (L.min-1)')
    plt.show()
Output:
This isn't what I was expecting - the max x and y values are 4.5 and 5.0, respectively. This has restricted the plot to showing the first 10/11 tick labels.
e.g. zoomed in
Can anyone shed some light on what I am doing wrong?
EDIT:
.head() and .tail() of the dataframe below, can download the .csv file here
t_culm vo2 vco2 ve/vo2 ve/vco2 petco2
t
0.0 121.0 1.39106827 1.17594794 23.27565131 27.53354881 36.28618
2.0 123.0 1.651732905 1.340260757 21.9176962 27.01129598 39.94186
4.0 125.0 1.661732959 1.304045507 21.89457686 27.90005396 40.41926
7.0 128.0 1.74404014 1.424610552 22.36329263 27.37764362 41.9804
9.0 130.0 1.86141934 1.449644274 20.25671442 26.0106846 41.56256
-------------------------------------------------------------------------
685.0 806.0 4.055829465 4.565977319 31.68991968 28.14926598 137.91915
686.0 807.0 4.144702364 4.70023799 32.09388958 28.30061377 142.30906
686.0 807.0 4.11506417 4.617204706 31.28707468 27.88447301 138.67856
687.0 808.0 4.03808704 4.647029051 31.82453689 27.65428161 137.60385
688.0 809.0 3.808044776 4.323517481 31.20141883 27.48142005 127.28648
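One likely cause, offered as an assumption because the full CSV is not shown here: csv.reader returns every field as a string, so the vo2 and vco2 columns hold text rather than numbers, and matplotlib treats string data as categories with one tick label per point. The minimal sketch below uses two rows from the table above to show the strings and a float conversion; the helper name half_step_ticks is hypothetical.

```python
import csv
import io
import math

# Two rows taken from the question's data; csv.reader yields every field as a string.
raw = "vo2,vco2\n1.39106827,1.17594794\n4.144702364,4.70023799\n"
rows = list(csv.reader(io.StringIO(raw)))
header, body = rows[0], rows[1:]
print(type(body[0][0]).__name__)  # str

# Convert to float before plotting or taking the maximum.
vo2 = [float(r[header.index("vo2")]) for r in body]

def half_step_ticks(values, step=0.5):
    """Ticks from 0.0 up to the first multiple of step at or above max(values)."""
    n = math.ceil(max(values) / step)
    return [i * step for i in range(n + 1)]

print(half_step_ticks(vo2))
# [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
```

In the pandas version, the equivalent step would be converting the columns with pd.to_numeric or astype(float) before calling ax.scatter, or letting pd.read_csv parse the file so that numeric columns are inferred.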
