I'm trying to create a heatmap and I am following the following question:
Making heatmap from pandas DataFrame
My dataframe looks like the following picture:
I tried the following code:
years = ["1860","1870", "1880","1890","1900","1910","1920","1930","1940","1950","1960","1970","1980","1990","2000"]
kantons = ["AG","AI","AR","BE","BL","BS","FR","GE","GL","GR","JU","LU","NE","NW","OW","SG","SH","SO","SZ","TG","TI","UR","VD","VS","ZG","ZH"]
df = pd(abs(dfYears), index=years, columns=kantons)
which gives the exception that:
"AG" can not be used as float
So I thought if I need to drop the index column which is not possible.
Any suggestions?
When replicating similar data, you can do:
import pandas as pd
import numpy as np
years = ["1860","1870", "1880","1890","1900","1910","1920","1930","1940","1950","1960","1970","1980","1990","2000"]
kantons = ["AG","AI","AR","BE","BL","BS","FR","GE","GL","GR","JU","LU","NE","NW","OW","SG","SH","SO","SZ","TG","TI","UR","VD","VS","ZG","ZH"]
df = pd.DataFrame(np.random.randint(low=10000, high=200000, size=(15, 26)), index=years, columns=kantons)
df.style.background_gradient(cmap='Reds')
Pandas has some Builtin Styles for the most common visualization needs. .background_gradient function is a simple way for highlighting cells based on their values. cmap parameter determines the color map based on the matplotlib colormaps.
This question already has an answer here:
Prevent scientific notation
(1 answer)
Closed 1 year ago.
I'm trying to create an histogram made of data I got as homework.
when I'm trying to plot it, values on the x axis are different (0.0-1.0) from those in the actual dataset (20,000 - 1,000,000).
How do I get the range of actual values from my data to be displayed on the x axis of the histogram instead?
My code:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv('okcupid_profiles.csv')
df = df[df['income'] != -1]
income_histogram = sns.distplot(df['income'], bins=40)
income_histogram
the histogram I've created
Thanks
The values displayed in the x-axis are the same on the dataset, if you can see in the bottom right corner there is 1e6, that mean :
0.1 * 1e6 == 100,000
I'm using the below code to get Segment and Year in x-axis and Final_Sales in y-axis but it is throwing me an error.
CODE
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
%matplotlib inline
order = pd.read_excel("Sample.xls", sheet_name = "Orders")
order["Year"] = pd.DatetimeIndex(order["Order Date"]).year
result = order.groupby(["Year", "Segment"]).agg(Final_Sales=("Sales", sum)).reset_index()
bar = plt.bar(x = result["Segment","Year"], height = result["Final_Sales"])
ERROR
Can someone help me to correct my code to see the output as below.
Required Output
Try to add another pair of brackets - result[["Segment","Year"]],
What you tried to do is to retrieve column named - "Segment","Year",
But actually what are you trying to do is to retrieve a list of columns - ["Segment","Year"].
There are several problems with your code:
When using several columns to index a dataframe you want to pass a list of columns to [] (see the docs) as follows :
result[["Segment","Year"]]
From the figure you provide it looks like you want to use year as hue. matplotlib.barplot doesn't have a hue argument, you would have to build it manually as described here. Instead you can use seaborn library that you are already importing anyway (see https://seaborn.pydata.org/generated/seaborn.barplot.html):
sns.barplot(x = 'Segment', y = 'Final_Sales', hue = 'Year', data = result)
I am using the standard pandas.df.plot() function to plot two columns in a dataframe. For some reason, the x-axis values and the xlabel are not visible! There seem to be no options to turn them on in the function either (see https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.plot.html).
Does someone know what is going on, and how to correct it?
import matplotlib.cm as cm
import pandas as pd
ax1 = df.plot.scatter(x='t', y='hlReference', c='STEP_STRENGTH', cmap=cm.autumn);
gives this:
This is a bug with Jupyter notebooks displaying pandas scatterplots that have a colorscale displayed while using Matplotlib as the plotting backend.
#june-skeeter has a solution in the answers that works. Alternatively, pass sharex=False to df.plot.scatter and you don't need to create subplots.
import matplotlib.cm as cm
import pandas as pd
X = np.random.rand(10,3)
df = pd.DataFrame(X,columns=['t','hlReference', 'STEP_STRENGTH'])
df.plot.scatter(
x='t',
y='hlReference',
c='STEP_STRENGTH',
cmap=cm.autumn,
sharex=False
)
See discussion in this closed pandas issues. Which references the above solution in a related SO answer.
Still an issue with pandas v1.1.0. You can track the issue here: https://github.com/pandas-dev/pandas/issues/36064
Create your axes instance first and then send it as an argument to the plot()
import matplotlib.cm as cm
import pandas as pd
X = np.random.rand(10,3)
df = pd.DataFrame(X,columns=['t','hlReference', 'STEP_STRENGTH'])
fig,ax1=plt.subplots()
df.plot.scatter(x='t', y='hlReference', c='STEP_STRENGTH', cmap=cm.autumn,ax=ax1)
I have a pandas dataframe of all tweets about a sporting event. The tweets are arranged by date and now include a polarity rating using the afinn sentiment library.
What I'd like to do is create a matplotlib boxplot chart for each day in the range. Unfortunately, I'm a coding newb, and I'm stuck.
Here is my code:
import numpy
import matplotlib.pyplot as plt
%matplotlib inline
boxplot_maker = lambda x: plt.boxplot()
#creating a function that I will run against a specific column in the df
Event_df["P Score"].map(boxplot_maker)
#using the .map function and my boxplot maker to the column in question
Unfortunately, this doesn't work. I get the following error:
TypeError: boxplot() missing 1 required positional argument: 'x'