Python: plot a heatmap from a CSV pixel file with pandas - python

I would like to plot a heatmap from a CSV file that contains pixel positions. The file looks like this:
0 0 8.400000e+01
1 0 8.500000e+01
2 0 8.700000e+01
3 0 8.500000e+01
4 0 9.400000e+01
5 0 7.700000e+01
6 0 8.000000e+01
7 0 8.300000e+01
8 0 8.900000e+01
9 0 8.500000e+01
10 0 8.300000e+01
I tried writing a few lines of Python, but they raise an error. I suspect the third column is being read as strings. Is there any way to plot this kind of file?
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
path_to_csv= "/run/media/test.txt"
df= pd.read_csv(path_to_csv ,sep='\t')
plt.imshow(df,cmap='hot',interpolation='nearest')
plt.show(df)
I also tried seaborn, without success.
Here is the error returned:
TypeError: Image data of dtype object cannot be converted to float

You can set dtype=float as a keyword argument of pandas.read_csv:
df = pd.read_csv(path_to_csv, sep='\t', dtype=float)
Or use pandas.DataFrame.astype:
plt.imshow(df.astype(float), cmap='hot', interpolation='nearest', aspect='auto')
plt.show()
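Note that the sample in the question is separated by spaces and stores one pixel per row (x, y, value), so converting the dtype alone may not produce a 2-D image. A minimal sketch, assuming the three columns mean x, y and intensity (the column names, the helper show_heatmap, and the two-row sample below are made up for illustration), reads the data with a whitespace separator and pivots it into a grid:

```python
import io
import pandas as pd

# Made-up sample in the same "x y value" layout as the question's file.
sample = """0 0 8.400000e+01
1 0 8.500000e+01
2 0 8.700000e+01
0 1 8.500000e+01
1 1 9.400000e+01
2 1 7.700000e+01
"""

# sep=r"\s+" splits on any run of whitespace; header=None because there is no header row.
pixels = pd.read_csv(io.StringIO(sample), sep=r"\s+", header=None,
                     names=["x", "y", "value"])

# One row per y and one column per x: a 2-D table that imshow can draw.
grid = pixels.pivot(index="y", columns="x", values="value")


def show_heatmap(grid):
    """Draw the pivoted grid (hypothetical helper; matplotlib is imported lazily)."""
    import matplotlib.pyplot as plt
    plt.imshow(grid.to_numpy(), cmap="hot", interpolation="nearest", aspect="auto")
    plt.colorbar()
    plt.show()
```

Calling show_heatmap(grid) displays the image. With a real file, pass its path to read_csv in place of the StringIO object.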

Related

pie chart drawing for a specific column in pandas python

I have a dataframe df with many columns. df["house_electricity"] contains values such as 1, 0, or blank/NA. I want to plot this column as a pie chart showing the percentages of only 1 and 0. I also want a second pie chart showing the percentages of 1, 0, and blank/NA together.
customer_id  house_electricity  house_refrigerator
cid01        0                  0
cid02        1                  na
cid03                           1
cid04        1
cid05        na                 0
# I wrote the following, but it didn't give me the expected result
import pandas as pd
import matplotlib.pyplot as plt
df=pd.read_csv("my_file.csv")
df_col=df.columns
df["house_electricity"].plot(kind="pie")
For a dataframe such as
import numpy as np
import pandas as pd
df = pd.DataFrame({'a':[1,0,np.nan,1,1,1,'',0,0,np.nan]})
df
a
0 1
1 0
2 NaN
3 1
4 1
5 1
6
7 0
8 0
9 NaN
The code below plots one slice per distinct value, including NaN and the empty string:
df["a"].value_counts(dropna=False).plot(kind="pie")
If you want to combine NA and empty values, replace the empty values with np.nan first, then plot:
df["a"].replace("", np.nan).value_counts(dropna=False).plot(kind="pie")
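The question asks for percentages both with and without the missing values. A minimal sketch of that, assuming the column has already had its empty strings replaced by NaN (the helper plot_both is hypothetical), uses value_counts with normalize=True and toggles dropna:

```python
import numpy as np
import pandas as pd

# Same values as the example column above, with the empty string already treated as missing.
s = pd.Series([1, 0, np.nan, 1, 1, 1, np.nan, 0, 0, np.nan], name="house_electricity")

# Percentages over the known values only (NaN is dropped by default).
known_only = s.value_counts(normalize=True) * 100

# Percentages with the missing values counted as their own category.
with_missing = s.value_counts(normalize=True, dropna=False) * 100


def plot_both(known_only, with_missing):
    """Draw the two pie charts side by side (hypothetical helper)."""
    import matplotlib.pyplot as plt
    fig, axes = plt.subplots(1, 2)
    known_only.plot.pie(ax=axes[0], autopct="%1.1f%%")
    with_missing.plot.pie(ax=axes[1], autopct="%1.1f%%")
    plt.show()
```

Because the slices are labelled from the index of each result, the labels stay matched to the right counts regardless of how value_counts orders them.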
Alternatively, the following code generates a pie chart with three slices (0, 1, and NaN):
import pandas as pd
import matplotlib.pyplot as plt
data = {'customer_id': ['cid01', 'cid02', 'cid03', 'cid04', 'cid05'],
'house_electricity': [0, 1, None, 1, None],
'house_refrigerator': [0, None, 1, None, 0]}
df = pd.DataFrame(data)
counts = df['house_electricity'].value_counts(dropna=False)
# take the labels from the index so they always match the order of the counts
counts.plot.pie(autopct='%1.1f%%', labels=[str(v) for v in counts.index], shadow=True)
plt.title('Percentage distribution of house_electricity column')
plt.axis('equal')
plt.show()

How to create a bipartite graph from a csv file

I am trying to create a bipartite graph from an Excel file that looks similar to this:
xyz pqr tsu
abc -1 1 -2
def -2 -1 2
ghj 2 -1 1
To begin with, I tried the following:
import networkx as nx
import matplotlib.pyplot as plt
import pandas as pd
import xlrd
import numpy as np
from numpy import genfromtxt
df = pd.read_csv (r'C:\Users\Dragos\Desktop\networkx project\proiect.csv')
G=nx.read_edgelist('proiect.csv', create_using=nx.Graph(), nodetype=str)
nx.draw(G)
plt.show()
But I keep getting the error Failed to convert edge data (['wage,carbon', 'tax,imigration,healthcare,voting,drugs,dc', 'statehood,abortion,UBI,wealthtax']) to dictionary.
Right now I'm at a loss and not sure how to proceed.
First, load the data into a pandas dataframe:
df = pd.read_csv(path, sep=',')
Then create a new dataframe with one row per edge, in this format:
>>> df
weight cost 0 b
0 4 7 A D
1 7 1 B A
2 10 9 C E
# from_pandas_dataframe was removed in newer NetworkX releases; from_pandas_edgelist replaces it
G = nx.from_pandas_edgelist(df, 0, 'b', ['weight', 'cost'])
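The reshaping step above can be sketched as follows. This is only an illustration under assumptions: the matrix from the question is read from an inline string, the column names row_node, col_node and weight are made up, and build_bipartite is a hypothetical helper that needs NetworkX installed.

```python
import io
import pandas as pd

# The matrix from the question: row labels on the left, column labels on top.
table = """xyz pqr tsu
abc -1 1 -2
def -2 -1 2
ghj 2 -1 1
"""

# The header has one field fewer than the data rows, so pandas uses the first column as the index.
matrix = pd.read_csv(io.StringIO(table), sep=r"\s+")

# Turn the matrix into one row per (row label, column label, value) edge.
edges = (matrix.rename_axis("row_node")
               .reset_index()
               .melt(id_vars="row_node", var_name="col_node", value_name="weight"))


def build_bipartite(edges):
    """Build a weighted bipartite graph from the edge table (hypothetical helper)."""
    import networkx as nx
    G = nx.Graph()
    G.add_nodes_from(edges["row_node"].unique(), bipartite=0)
    G.add_nodes_from(edges["col_node"].unique(), bipartite=1)
    G.add_weighted_edges_from(edges.itertuples(index=False, name=None))
    return G
```

The bipartite node attribute follows the convention NetworkX uses to tell the two node sets apart, which its bipartite layout and algorithms can then rely on.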

Matplotlib: Adding the DataFrame values to the plot

I would like to print the DataFrame beside the plot. What would be a Pythonic way to do that?
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame({'Age':[21,22,23,24,25,26,27,28,29,30],'Count':[4,1,3,7,2,3,5,1,1,5]})
print(df)
Age Count
0 21 4
1 22 1
2 23 3
3 24 7
4 25 2
5 26 3
6 27 5
7 28 1
8 29 1
9 30 5
plt.rcParams['figure.figsize']=(10,6)
fig,ax = plt.subplots()
font_used={'fontname':'pristina', 'color':'Black'}
ax.set_ylabel('Count',fontsize=20,**font_used)
ax.set_xlabel('Age',fontsize=20,**font_used)
plt.plot(df['Age'],df['Count'])
I would like a graph like this, with the DataFrame's plotted values printed alongside. How can I do that?
You can use ax.text to add the DataFrame to the plot. DataFrames have a .to_string method which makes formatting nice. Supply index=False to remove the row index.
plt.rcParams['figure.figsize']=(10, 6)
fig,ax = plt.subplots()
font_used={'fontname':'pristina', 'color':'Black'}
ax.set_ylabel('Count',fontsize=20,**font_used)
ax.set_xlabel('Age',fontsize=20,**font_used)
# Adjust to where you want.
ax.text(x=28.5, y=4.5, s=df.to_string(index=False))
plt.plot(df['Age'],df['Count'])
plt.show()
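The x and y values in the ax.text call above are in data coordinates, so they have to be re-tuned whenever the data changes. A minimal sketch of one alternative, assuming it is acceptable to place the text relative to the axes instead (plot_with_table is a hypothetical helper, and the 0.75 and 1.02 offsets are arbitrary choices):

```python
import pandas as pd

df = pd.DataFrame({'Age': [21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
                   'Count': [4, 1, 3, 7, 2, 3, 5, 1, 1, 5]})

# One header line followed by one line per row, without the row index.
table_text = df.to_string(index=False)


def plot_with_table(df, table_text):
    """Plot Count against Age and print the table to the right of the axes."""
    import matplotlib.pyplot as plt
    fig, ax = plt.subplots(figsize=(10, 6))
    ax.plot(df['Age'], df['Count'])
    # Leave room on the right-hand side of the figure for the text.
    fig.subplots_adjust(right=0.75)
    # transform=ax.transAxes: (1.02, 0.5) is just right of the axes, vertically centred.
    ax.text(1.02, 0.5, table_text, transform=ax.transAxes,
            va='center', family='monospace')
    plt.show()
```

A monospace font keeps the columns produced by to_string aligned.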
Another option is to draw a table with ax.table():
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame({'Age':[21,22,23,24,25,26,27,28,29,30],'Count':[4,1,3,7,2,3,5,1,1,5]})
plt.rcParams['figure.figsize']=(10,15)
fig,ax = plt.subplots()
plt.subplots_adjust(left=0.1, right=0.85, top=0.9, bottom=0.1)
font_used={'fontname':'pristina', 'color':'Black'}
ax.set_ylabel('Count',fontsize=20,**font_used)
ax.set_xlabel('Age',fontsize=20,**font_used)
plt.plot(df['Age'],df['Count'])
# cellText must be a 2D list: one inner list per table row
ax.table(cellText=[[str(c)] for c in df['Count']],
         rowLabels=df['Age'].map(str).tolist(),
         colWidths=[0.2],
         loc='right')
plt.show()
This approach draws a table, with cell borders, next to the plot. Just make sure to leave room for it with subplots_adjust().
Pandas also has a to_html method; you can use it and place the resulting HTML next to the plot. What are you placing the graph and DataFrame into?
df.to_html()

Plotly Bubble chart from pandas crosstab

How can I plot a bubble chart from a dataframe that has been created from a pandas crosstab of another dataframe?
Imports:
import plotly as py
import plotly.graph_objects as go
from plotly.subplots import make_subplots
The crosstab was created using:
df = pd.crosstab(raw_data['Speed'], raw_data['Height'].fillna('n/a'))
The df contains mostly zeros. Wherever a nonzero number appears, I want a point whose size is controlled by that value. I want the index values on the x axis and the column names on the y axis.
The df would look something like:
10 20 30 40 50
1000 0 0 0 0 5
1100 0 0 0 7 0
1200 1 0 3 0 0
1300 0 0 0 0 0
1400 5 0 0 0 0
I’ve tried using scatter and Scatter like this:
fig.add_trace(go.Scatter(x=df.index.values, y=df.columns.values, size=df.values,
mode='lines'),
row=1, col=3)
This returned a TypeError: 'Module' object not callable.
Any help is really appreciated. Thanks
UPDATE
The answers below are close to what I ended up with; the main difference is that I reference 'Speed' in the melt line:
df = df.reset_index()
df = df.melt(id_vars="Speed")
df = df.rename(columns={"index":"Engine Speed",
"variable":"Height",
"value":"Count"})
df = df[df!=0].dropna()
scale=1000
fig.add_trace(go.Scatter(x=df["Speed"], y=df["Height"],mode='markers',marker_size=df["Count"]/scale),
row=1, col=3)
This works; however, my main problem now is that the dataset is huge and Plotly is really struggling to deal with it.
Update 2
Using Scattergl allows Plotly to deal with the large dataset very well!
If this is the case, you can use plotly.express. This is very similar to @Erik's answer but shouldn't raise errors.
import pandas as pd
import plotly.express as px
from io import StringIO
txt = """
10 20 30 40 50
1000 0 0 0 0 5
1100 0 0 0 7 0
1200 1 0 3 0 0
1300 0 0 0 0 0
1400 5 0 0 0 0
"""
df = pd.read_csv(StringIO(txt), sep=r"\s+")  # delim_whitespace is deprecated in recent pandas
df = df.reset_index()\
.melt(id_vars="index")\
.rename(columns={"index":"Speed",
"variable":"Height",
"value":"Count"})
fig = px.scatter(df, x="Speed", y="Height",size="Count")
fig.show()
UPDATE
If you get an error, please check your pandas version with pd.__version__ and run the following line by line
df = pd.read_csv(StringIO(txt), sep=r"\s+")
df = df.reset_index()
df = df.melt(id_vars="index")
df = df.rename(columns={"index":"Speed",
"variable":"Height",
"value":"Count"})
and report in which line it breaks.
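To see what the reset_index / melt / rename chain produces without involving Plotly, here is a small sketch that rebuilds the example table directly as a DataFrame. The variable names wide, long_df and nonzero are illustrative only.

```python
import pandas as pd

# The example crosstab from the question, built by hand: index = Speed, columns = Height.
wide = pd.DataFrame(
    {"10": [0, 0, 1, 0, 5],
     "20": [0, 0, 0, 0, 0],
     "30": [0, 0, 3, 0, 0],
     "40": [0, 7, 0, 0, 0],
     "50": [5, 0, 0, 0, 0]},
    index=[1000, 1100, 1200, 1300, 1400],
)

# reset_index turns the unnamed index into a column called "index";
# melt then stacks the remaining columns into "variable"/"value" pairs.
long_df = (wide.reset_index()
               .melt(id_vars="index")
               .rename(columns={"index": "Speed",
                                "variable": "Height",
                                "value": "Count"}))

# Only the nonzero cells need a bubble.
nonzero = long_df[long_df["Count"] > 0]
```

Filtering out the zero rows before plotting also reduces the number of points the figure has to handle, which may help with large crosstabs.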
I recommend using tidy format to represent your data. A dataframe is tidy if and only if:
Each row is an observation
Each column is a variable
Each value must have its own cell
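To create a tidier dataframe, you can do:
df = pd.crosstab(raw_data["Speed"], raw_data["Height"])
df = df.reset_index()
df = df.melt(id_vars="Speed", var_name="Height", value_name="Counts")
# keep only the Speed/Height combinations that actually occur
df = df[df["Counts"] > 0].sort_values(["Speed", "Height"]).reset_index(drop=True)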
Speed Height Counts
0 1000 10 2
1 1100 20 1
2 1200 10 1
3 1200 30 1
4 1300 40 1
5 1400 50 1
The next step is to do the actual plotting.
# when scale is increased bubbles will become larger
scale = 10
# create the scatter plot
scatter = go.Scatter(
x=df.Speed,
y=df.Height,
marker_size=df.Counts*scale,
mode='markers')
fig = go.Figure(scatter)
fig.show()
This will create a plot as shown below.

Python Plotting: Heatmap from dataframe with fixed colors in case of strings

I'm trying to visualise a large (pandas) dataframe in Python as a heatmap. This dataframe has two types of variables: strings ("Absent" or "Unknown") and floats.
I want the heatmap to show cells with "Absent" in black and "Unknown" in red, and the rest of the dataframe as a normal heatmap, with the floats in a scale of greens.
I can do this easily in Excel with conditional formatting of cells, but I can't find any help online to do this with Python either with matplotlib, seaborn, ggplot. What am I missing?
Thank you for your time.
You could use cmap_custom.set_under('red') and cmap_custom.set_over('black') to apply custom colors to values below vmin and above vmax:
import numpy as np
import matplotlib.pyplot as plt
import mpl_toolkits.axes_grid1 as axes_grid1
import pandas as pd
# make a random DataFrame
np.random.seed(1)
arr = np.random.choice(['Absent', 'Unknown']+list(range(10)), size=(5,7))
df = pd.DataFrame(arr)
# find the largest and smallest finite values
finite_values = pd.to_numeric(list(set(np.unique(df.values))
.difference(['Absent', 'Unknown'])))
vmin, vmax = finite_values.min(), finite_values.max()
# change Absent and Unknown to numeric values
df2 = df.replace({'Absent': vmax+1, 'Unknown': vmin-1})
# make sure the values are numeric
for col in df2:
    df2[col] = pd.to_numeric(df2[col])
fig, ax = plt.subplots()
cmap_custom = plt.get_cmap('Greens').copy()  # copy so the registered colormap is left untouched
cmap_custom.set_under('red')
cmap_custom.set_over('black')
im = plt.imshow(df2, interpolation='nearest', cmap = cmap_custom,
vmin=vmin, vmax=vmax)
# add a colorbar (https://stackoverflow.com/a/18195921/190597)
divider = axes_grid1.make_axes_locatable(ax)
cax = divider.append_axes("right", size="5%", pad=0.05)
plt.colorbar(im, cax=cax, extend='both')
plt.show()
The DataFrame
In [117]: df
Out[117]:
0 1 2 3 4 5 6
0 3 9 6 7 9 3 Absent
1 Absent Unknown 5 4 7 0 2
2 3 0 2 9 8 0 2
3 5 5 7 Unknown 5 Absent 4
4 7 7 5 4 7 Unknown Absent
becomes
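The conversion step in the answer above (replace the two strings with out-of-range numbers, then make every column numeric) can also be sketched with pd.to_numeric(errors="coerce") and mask. This is an alternative formulation under assumptions, not the answer's own code; the small frame and the names numeric and encoded are made up for illustration.

```python
import pandas as pd

# Made-up frame mixing floats with the two marker strings.
df = pd.DataFrame({"a": [3.0, "Absent", 5.0],
                   "b": ["Unknown", 2.5, 7.0]})

# Strings become NaN, so min/max see only the real measurements.
numeric = df.apply(pd.to_numeric, errors="coerce")
vmin, vmax = numeric.min().min(), numeric.max().max()

# Push "Absent" above vmax and "Unknown" below vmin, so that a colormap's
# set_over / set_under colors apply to exactly those cells.
encoded = (numeric.mask(df == "Absent", vmax + 1)
                  .mask(df == "Unknown", vmin - 1))
```

The encoded frame can then be passed to plt.imshow together with vmin=vmin and vmax=vmax, as in the answer's plotting code.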
