Visualize scatter plot with labels on each point - python

I have a dataset with the longitude, latitude, city name, and coronavirus status of each city.
I want to label each point with its city name, but I don't want to call plt.text() once per point.
Here is the code I use to create the dataset:
import pandas as pd
jabar = [
['Depok',-6.385589,106.830711,'sedang',600],
['Tasikmalaya',-7.319563,108.202972,'sedang',600],
['Ciamis',-7.3299,108.3323,'sedang',600],
['Kuningan',-7.0138,108.5701,'sedang',600],
['Bogor',-6.497641,106.828224,'sedang',600],
['Bogor',-6.595038,106.816635,'sedang',600],
['Cirebon',-6.737246,108.550659,'sedang',600],
['Majalengka',-6.8364,108.2274,'sedang',600],
['Sumedang',-6.8381,107.9275,'sedang',600],
['Indramayu',-6.327583,108.324936,'sedang',600],
['Subang',-6.571589,107.758736,'sedang',600],
['Purwakarta',-6.538681,107.449944,'sedang',600],
['Karawang',-6.3227,107.3376,'sedang',600],
['Bekasi',-6.241586,106.992416,'sedang',600],
['Pangandaran',-7.6833,108.6500,'sedang',600],
['Sukabumi',-6.923700,106.928726,'sedang',600],
['Cimahi',-6.8841,107.5413,'sedang',600],
['Banjar',-7.374585,108.558189,'sedang',600],
['Cianjur',-6.734679,107.041252,'sedang',600],
['Bandung',-6.914864,107.608238,'tinggi',1000],
['Bandung',-6.905977,107.613144,'tinggi',1000],
['Bandung',-6.914744,107.609810,'tinggi',1000],
['Garut',-7.227906,107.908699,'sedang',600],
['Bandung Barat',-7.025253,107.519760,'sedang',600]]
features=['City','longitude','latitude','status','status_size']
risk_map = pd.DataFrame(jabar, columns=features)
And here is the code I wrote to plot the points and label them:
import matplotlib.pyplot as plt
plt.figure(figsize=(14,8))
plt.scatter(risk_map['latitude'],risk_map['longitude'], c='orange',
s=risk_map['status_size'], label='Risk region')
plt.title('Peta Sebaran Covid-19', fontsize=20)
plt.text(-7.227906,107.908699,'Garut')
plt.show()
I actually have two datasets besides the one above; the other is the confirmed-positive-cases-covid-region data, which has more than 500,000 points.
I merge the two datasets to get the risk regions, but I run into trouble when I try to label each point.
The plt.text() call above is just an example of labelling a single point. Writing one call per point like that is not feasible, and my computer froze and the screen went blank after I ran that code.
Does anyone have an idea how to label every point in the plot above?
Thanks in advance.

Plotly's mapbox support makes this very simple.
Your longitude and latitude values are reversed: the column named longitude holds latitudes and vice versa. In the code sample below I've swapped them when passing them to Plotly.
import plotly.express as px
import pandas as pd
jabar = [
['Depok',-6.385589,106.830711,'sedang',600],
['Tasikmalaya',-7.319563,108.202972,'sedang',600],
['Ciamis',-7.3299,108.3323,'sedang',600],
['Kuningan',-7.0138,108.5701,'sedang',600],
['Bogor',-6.497641,106.828224,'sedang',600],
['Bogor',-6.595038,106.816635,'sedang',600],
['Cirebon',-6.737246,108.550659,'sedang',600],
['Majalengka',-6.8364,108.2274,'sedang',600],
['Sumedang',-6.8381,107.9275,'sedang',600],
['Indramayu',-6.327583,108.324936,'sedang',600],
['Subang',-6.571589,107.758736,'sedang',600],
['Purwakarta',-6.538681,107.449944,'sedang',600],
['Karawang',-6.3227,107.3376,'sedang',600],
['Bekasi',-6.241586,106.992416,'sedang',600],
['Pangandaran',-7.6833,108.6500,'sedang',600],
['Sukabumi',-6.923700,106.928726,'sedang',600],
['Cimahi',-6.8841,107.5413,'sedang',600],
['Banjar',-7.374585,108.558189,'sedang',600],
['Cianjur',-6.734679,107.041252,'sedang',600],
['Bandung',-6.914864,107.608238,'tinggi',1000],
['Bandung',-6.905977,107.613144,'tinggi',1000],
['Bandung',-6.914744,107.609810,'tinggi',1000],
['Garut',-7.227906,107.908699,'sedang',600],
['Bandung Barat',-7.025253,107.519760,'sedang',600]]
features=['City','longitude','latitude','status','status_size']
risk_map = pd.DataFrame(jabar, columns=features)
fig = px.scatter_mapbox(risk_map, lon="latitude", lat="longitude",
color="status", hover_name="City",size="status_size"
)
fig.update_layout(mapbox={"style":"carto-positron"})
fig.show()
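If you would rather stay in matplotlib, as in the question, the labels can be added in a loop instead of one plt.text() call per line of code. The following is only a minimal sketch under a few assumptions: it uses a handful of rows from the question, names the columns so that latitude and longitude match the values they hold, and places one label per city at the mean position of that city's points (so a merged dataset with many points per city would still get only one label per city).

```python
import matplotlib.pyplot as plt
import pandas as pd

# A few rows from the question. The numbers are (latitude, longitude),
# so the columns are named accordingly here (an assumption of this sketch).
rows = [
    ["Depok", -6.385589, 106.830711, "sedang", 600],
    ["Bogor", -6.497641, 106.828224, "sedang", 600],
    ["Bogor", -6.595038, 106.816635, "sedang", 600],
    ["Bandung", -6.914864, 107.608238, "tinggi", 1000],
    ["Bandung", -6.905977, 107.613144, "tinggi", 1000],
    ["Garut", -7.227906, 107.908699, "sedang", 600],
]
risk_map = pd.DataFrame(
    rows, columns=["City", "latitude", "longitude", "status", "status_size"]
)

fig, ax = plt.subplots(figsize=(14, 8))
ax.scatter(risk_map["longitude"], risk_map["latitude"], c="orange",
           s=risk_map["status_size"], label="Risk region")

# One label per city, placed at the mean position of that city's points.
centres = risk_map.groupby("City", as_index=False)[["longitude", "latitude"]].mean()
for row in centres.itertuples(index=False):
    # Offset the text a few points so it does not sit on top of the marker.
    ax.annotate(row.City, (row.longitude, row.latitude),
                xytext=(5, 5), textcoords="offset points")

ax.set_title("Peta Sebaran Covid-19", fontsize=20)
ax.legend()
# Call plt.show() or fig.savefig("map.png") to render the figure.
```

Labelling per city rather than per row keeps the number of text objects small, which matters once the data grows to hundreds of thousands of points.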

Related

How to plot multiple layers with Geoframes in python?

Context: I have two data frames that I read with pandas from .csv files. One of them (dfevents) has latitude and longitude fields; the other (dfplacedetails) has multiple points that form a polygon. I'm using the "intersects" method to check where the points of the first data frame cross the polygons of the other one. That works fine, but when I try to plot both layers together it is just not possible: they plot separately.
My code is as follows:
# Libraries
from matplotlib import pyplot as plt
import geopandas as gp
import pandas as pd
# Creating data frames
dfevents = pd.read_csv (r'C:\Users\alan_\Desktop\TAT\Inputs\Get Events\Get_Events.csv')
print(dfevents)
dfplacedetails = pd.read_csv (r'C:\Users\alan_\Desktop\TAT\Inputs\Get Place Details\Get_Place_Details.csv')
print(dfplacedetails)
# Convert the columns to proper geometries
dfevents['point'] = gp.GeoSeries.from_xy(dfevents.longitude, dfevents.latitude)
dfplacedetails['polygon'] = gp.GeoSeries.from_wkt('POLYGON' + dfplacedetails.polygon)
# Make them GeoDataFrames
dfevents = gp.GeoDataFrame(dfevents, geometry='point')
dfplacedetails = gp.GeoDataFrame(dfplacedetails, geometry='polygon')
# Output (It works fine)
dfout = dfevents.intersects(dfplacedetails)
print(dfout)
# Plot
fig, ax =plt.subplots(figsize =(20,10))
dfplacedetails.plot(ax=ax, color='blue')
dfevents.plot(ax=ax, color='red',markersize=10)
ax.set_axis_on()
The result I get when I plot as shown in the code above is as follows:
But when I plot the two layers separately, each one plots fine:
Is there any way to plot both of them in the same image?
Thanks for your help!
By the way, I'm using Visual Studio Code.

Scatterplot with plotly vs pyplot / different approach in data table needed?

I'm trying to create a scatterplot in plotly, but have some difficulties. I think I need to rearrange my data table to be able to work with it, but am not sure.
This is how my data table looks:
[image: table structure]
The "Average Price" is the "real" data and the prices in the "Predictions" column are what my model predicted.
I want to display it in a scatterplot, showing both the predicted and real prices as dots, like this:
[image: scatterplot created through matplotlib]
I created this with pyplot:
plt.scatter(x_axis, result['Average Price'], label='Real')
plt.scatter(x_axis, result['Predictions'], label='Predictions')
plt.xlabel('YYYY-MM-DD')
plt.ylabel('Average Price')
plt.legend(loc='lower right')
plt.show()
However, I wanted to do the same with plotly, which I can't seem to figure out. I have no problems with one column, but don't know how to access both. Do I need to rearrange the table so that I have all prices (predicted and real) in one column and an additional column labeling the data as "real" or "predicted"?
chart_model = px.scatter(result, x='YYYY-MM-DD', y='Predictions', title='Predictions')
chart_model.update_layout(title_x=0.5, plot_bgcolor='#ecf0f1', yaxis_title='Average Price Predicted',
font_color='#2c3e50')
chart_model.update_traces(marker=dict(color='blue'))
Thanks in advance for any tips on how to proceed!
I have simulated a dataframe with the same structure as in your question.
I have used pandas melt() to reshape it inline into a long dataframe, which is then simple to use with Plotly.
import pandas as pd
import numpy as np
import plotly.express as px
# simulate data frame
df = pd.DataFrame(
    {
        "YYYY-MM-DD": pd.date_range("4-jan-2015", freq="7D", periods=300),
        "Average Price": np.random.uniform(1.2, 1.4, 300),
    }
).pipe(
    lambda d: d.assign(
        Predictions=d["Average Price"] * np.random.uniform(0.9, 1.1, 300)
    )
)
# simple inline restructure of data frame
px.scatter(df.set_index("YYYY-MM-DD").melt(ignore_index=False), y="value", color="variable")
Alternative:
Just move the date into the index and list the columns to be plotted.
px.scatter(df.set_index("YYYY-MM-DD"), y=["Average Price", "Predictions"])
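To make the reshaping step more concrete, here is a small sketch of what melt() produces. It is exactly the layout the question asks about: all prices in one column, plus a column saying whether each price is real or predicted. The three rows of prices are made up for illustration, and the names Series and Price are arbitrary choices, not anything Plotly requires.

```python
import pandas as pd

# Tiny stand-in for the `result` table from the question (values are made up).
result = pd.DataFrame(
    {
        "YYYY-MM-DD": pd.to_datetime(["2015-01-04", "2015-01-11", "2015-01-18"]),
        "Average Price": [1.22, 1.24, 1.17],
        "Predictions": [1.20, 1.26, 1.19],
    }
)

# Wide -> long: one row per (date, series) pair.
long_df = result.melt(
    id_vars="YYYY-MM-DD",
    value_vars=["Average Price", "Predictions"],
    var_name="Series",   # holds "Average Price" or "Predictions"
    value_name="Price",  # holds the corresponding number
)
print(long_df)

# The long table can then be passed to Plotly Express, e.g.
# px.scatter(long_df, x="YYYY-MM-DD", y="Price", color="Series")
```

With explicit id_vars the date stays a regular column, so it can be passed as x directly instead of relying on the index.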

plotly rendering bug with python 3d plot

I am using plotly and python to visualize 3D data and I encountered a strange phenomenon when plotting some data. The following code visualizes an array of shape (3, 20), with one row each for the x, y and z directions.
import numpy as np
import plotly.io as pio
import plotly.graph_objects as go
data = np.array([
[4.41568822e+05, 4.41568474e+05, 4.41567958e+05, 4.41567603e+05,
4.41567249e+05, 4.41566952e+05, 4.41566619e+05, 4.41566324e+05,
4.41566021e+05, 4.41565737e+05, 4.41565435e+05, 4.41565098e+05,
4.41564807e+05, 4.41564472e+05, 4.41564121e+05, 4.41563860e+05,
4.41563538e+05, 4.41563226e+05, 4.41562933e+05, 4.41562641e+05],
[5.71148897e+06, 5.71148909e+06, 5.71148928e+06, 5.71148942e+06,
5.71148955e+06, 5.71148967e+06, 5.71148981e+06, 5.71148993e+06,
5.71149006e+06, 5.71149019e+06, 5.71149032e+06, 5.71149047e+06,
5.71149060e+06, 5.71149076e+06, 5.71149093e+06, 5.71149106e+06,
5.71149122e+06, 5.71149137e+06, 5.71149153e+06, 5.71149168e+06],
[1.86559470e+02, 1.86547226e+02, 1.86529120e+02, 1.86516642e+02,
1.86504156e+02, 1.86493615e+02, 1.86481706e+02, 1.86471064e+02,
1.86460026e+02, 1.86449593e+02, 1.86438417e+02, 1.86425803e+02,
1.86414828e+02, 1.86402073e+02, 1.86388572e+02, 1.86378511e+02,
1.86366018e+02, 1.86353893e+02, 1.86342497e+02, 1.86331154e+02]])
fig = go.Figure(data=[go.Scatter3d(x=data[0,:], y=data[1,:], z=data[2,:],
                                   mode='markers',
                                   marker=dict(size=3),
                                   )])
pio.show(fig, renderer='browser')
I have compared the result (top) below with a matplotlib plot of the same data (bottom).
The points that represent a relatively straight line are represented in Plotly in steps rather than in a line, and I don't really understand why.
Can someone explain to me why the points are displayed like this in plotly and how I can fix this problem?
I appreciate any help I can get!
It looks like an issue with how Plotly is interpreting your numpy array. Your arrays look to be nested which would explain why instead of plotly drawing the plot as a continuous line they are being rendered as steps.
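Another explanation worth checking is numeric precision rather than array nesting. Plotly's 3D traces are rendered with WebGL, and the assumption in this sketch is that coordinates end up as 32-bit floats on the way to the GPU. Near 5.7e6 a 32-bit float can only resolve steps of 0.5, while the y values in the question change by roughly 0.1 to 0.2 per point, so neighbouring points would collapse onto the same value and produce a staircase. The snippet below only demonstrates that rounding effect with NumPy on the y row from the question; it does not call Plotly.

```python
import numpy as np

# Second row (y) of `data` from the question.
y = np.array([
    5.71148897e+06, 5.71148909e+06, 5.71148928e+06, 5.71148942e+06,
    5.71148955e+06, 5.71148967e+06, 5.71148981e+06, 5.71148993e+06,
    5.71149006e+06, 5.71149019e+06, 5.71149032e+06, 5.71149047e+06,
    5.71149060e+06, 5.71149076e+06, 5.71149093e+06, 5.71149106e+06,
    5.71149122e+06, 5.71149137e+06, 5.71149153e+06, 5.71149168e+06,
])

# Smallest step a 32-bit float can represent at this magnitude.
step = np.spacing(np.float32(y[0]))
print("float32 resolution near y:", step)

# Casting directly loses the small differences between neighbouring points.
y32 = y.astype(np.float32)
print("distinct values as float64:", len(np.unique(y)))
print("distinct values as float32:", len(np.unique(y32)))

# Possible workaround: subtract an offset in float64 first, then plot the
# shifted values (and mention the offset in the axis title).
y_shifted = y - y.min()
y_shifted32 = y_shifted.astype(np.float32)
print("distinct shifted values as float32:", len(np.unique(y_shifted32)))
```

If this is indeed the cause, passing `data[0,:] - data[0,:].min()` and `data[1,:] - data[1,:].min()` to go.Scatter3d should remove the steps.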

Python stats and visualization

I am new to Python and am currently working on a set of real estate data from Redfin.
Currently my data looks like this:
There are many different neighborhoods in the dataset. I would like to:
get the average homes_sold per month (the date field was cut out of the screenshot) per neighborhood
graph the above using only the neighborhoods I wish to use (about 4).
Any help is greatly appreciated.
As I understand it, you have several values of homes sold per month for each neighborhood and you want to take their average. If so, try this code (substitute your own data):
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
# Jupyter magic; remove the next line when running as a plain script
%matplotlib inline
data = pd.DataFrame({'neighborhood':['n1','n1','n2','n3','n3','n4','n5'],'homes_sold per month':[5,7,2,6,4,1,5],'something_else':[5,3,3,5,5,5,5]})
neighborhoods_to_plot = ['n1','n2','n4','n5'] #provide here a list you want to plot
plot = pd.DataFrame()
for n in neighborhoods_to_plot:
    # mean of 'homes_sold per month' over all rows of this neighborhood
    plot.at[n,'homes_sold per month'] = data.loc[data['neighborhood']==n]['homes_sold per month'].mean()
plot.index.name = 'neighborhood'
plt.figure(figsize=(4,3),dpi=300,tight_layout=True)
sns.barplot(x=plot.index,y=plot['homes_sold per month'],data=plot)
plt.savefig('graph.png', bbox_inches='tight')
Okay, so I am going to assume that you are using Pandas and Matplotlib to handle this data. To get the average number of homes sold per neighborhood you just need to do:
import pandas as pd
mean_number_of_homes_sold = data[['neighborhood','homes_sold']].groupby('neighborhood').agg('mean')
In order to get the information plotted with only the neighborhoods you want you will need something like this
import pandas as pd
import matplotlib.pyplot as plt
#fill this list with strings representing the names of the data you need plotted
neighborhoods_to_plot = ['Albany Park', 'Tinley Park']
data_to_graph = data[data.neighborhood.isin(neighborhoods_to_plot)]
fig, ax = plt.subplots()
# pass ax so the scatter is drawn on the figure that is saved below
data_to_graph.plot(kind='scatter', x='avg_sale_to_list', y ='inventory_mom', ax=ax)
ax.set(title='Relationship between time to sale from listing and inventory momentum for selected neighborhoods')
fig.savefig('neighborhood.png', transparent=False, dpi=300, bbox_inches="tight")
You can obviously change which data is graphed or the type of graph but this should give you a decent starting point.
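Neither snippet above averages per month, which is the first thing the question asks for. A minimal sketch of that step follows. The date column was cut out of the screenshot, so the name period_begin is purely an assumption, and the sample numbers are invented.

```python
import pandas as pd

# Invented sample data; "period_begin" is a hypothetical name for the date column.
data = pd.DataFrame(
    {
        "neighborhood": ["Albany Park"] * 3 + ["Tinley Park"] * 3 + ["Other"] * 2,
        "period_begin": pd.to_datetime(
            ["2020-01-05", "2020-01-20", "2020-02-10",
             "2020-01-07", "2020-02-03", "2020-02-25",
             "2020-01-15", "2020-02-15"]
        ),
        "homes_sold": [5, 7, 2, 6, 4, 1, 5, 3],
    }
)

neighborhoods_to_plot = ["Albany Park", "Tinley Park"]
subset = data[data["neighborhood"].isin(neighborhoods_to_plot)]

# Average homes_sold per neighborhood and calendar month,
# then pivot so each neighborhood becomes its own column.
monthly_mean = (
    subset.groupby(["neighborhood", subset["period_begin"].dt.to_period("M")])["homes_sold"]
    .mean()
    .unstack("neighborhood")
)
print(monthly_mean)

# One group of bars per month, one bar per neighborhood (needs matplotlib installed).
ax = monthly_mean.plot(kind="bar")
ax.set_ylabel("average homes_sold")
```

Filtering before grouping keeps the result limited to the chosen neighborhoods, so the plot needs no further selection.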

Reordering heatmap from seaborn using column info from additional text file

I wrote a python script to read in a distance matrix that was provided via a CSV text file. This distance matrix shows the difference between different animal species, and I'm trying to sort them in different ways (diet, family, genus, etc.) using data from another CSV file that just has one row of ordering information. The code I used is here:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as mp
dietCols = pd.read_csv("label_diet.txt", header=None)
df = pd.read_csv("distance_matrix.txt", header=None)
ax = sns.heatmap(df)
fig = ax.get_figure()
fig.savefig("fig1.png")
mp.clf()
dfDiet = pd.read_csv("distance_matrix.txt", header=None, names=dietCols)
ax2 = sns.heatmap(dfDiet, linewidths=0)
fig2 = ax2.get_figure()
fig2.savefig("fig2.png")
mp.clf()
When plotting the distance matrix, the original graph looks like this:
However, when the additional naming information is read from the text file, the graph produced only has one column and looks like this:
You can see the matrix data is being used as row labeling, and I'm not sure why that would be. Some of the rows provided have no values so they're listed as "NaN", so I'm not sure if that would be causing a problem. Is there any easy way to order this distance matrix using an exterior file? Any help would be appreciated!
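One possible approach, sketched below under several assumptions: read the label file into a flat list (rather than passing the whole DataFrame as names), attach the labels to both axes of the matrix, and reorder rows and columns with the same permutation. The two in-memory strings stand in for distance_matrix.txt and label_diet.txt, the values are invented, and filling empty labels with "unknown" is just one way to handle the missing entries.

```python
import io

import numpy as np
import pandas as pd

# Stand-ins for distance_matrix.txt and label_diet.txt (invented values).
matrix_file = io.StringIO("0,3,1\n3,0,2\n1,2,0\n")
label_file = io.StringIO("herbivore,carnivore,herbivore\n")

df = pd.read_csv(matrix_file, header=None)

# The label file has a single row: take that row as a plain list of strings.
labels = (
    pd.read_csv(label_file, header=None)
    .iloc[0]
    .fillna("unknown")  # assumption: group unlabeled species together
    .astype(str)
    .tolist()
)

# A distance matrix is square, so the same labels apply to rows and columns.
df.index = labels
df.columns = labels

# Stable sort keeps the original relative order within each label group.
order = np.argsort(labels, kind="stable")
df_sorted = df.iloc[order, order]
print(df_sorted)

# df_sorted can then be passed to seaborn, e.g. sns.heatmap(df_sorted)
```

Applying the same permutation to rows and columns keeps the matrix symmetric, so each group shows up as a block along the diagonal of the heatmap.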
