I have a pandas dataframe which looks like this:
Country Sold
Japan 3432
Japan 4364
Korea 2231
India 1130
India 2342
USA 4333
USA 2356
USA 3423
I want to plot graphs using this dataframe. The plot will have country names on the x-axis and the mean/sum of Sold for each country on the y-axis. I know I can compute the mean/sum using groupby like this:
df.groupby('Country')['Sold'].mean()
I know that a plotly histogram can compute these values directly and plot them. But I want to do the same with other plotly chart types, such as a bar chart, to make the graph more interactive. I am having a hard time figuring it out. Can anyone please help me out?
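One possible approach, sketched here under assumptions: aggregate with groupby first, then hand the small summary table to plotly express. The helper name make_bar is hypothetical, and plotly is imported inside it only so that the aggregation step can be run and checked on its own.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Country": ["Japan", "Japan", "Korea", "India", "India", "USA", "USA", "USA"],
    "Sold": [3432, 4364, 2231, 1130, 2342, 4333, 2356, 3423],
})

# Aggregate first, then turn the Country index back into a regular column
# so it can be referenced by name when plotting.
summary = df.groupby("Country")["Sold"].agg(["mean", "sum"]).reset_index()


def make_bar(stat="mean"):
    """Hypothetical helper: bar chart of one aggregated column ('mean' or 'sum')."""
    import plotly.express as px  # imported lazily; the aggregation above does not need it
    return px.bar(summary, x="Country", y=stat)
```

Calling make_bar("sum").show() should open the interactive chart. Because the aggregation is done in pandas, the same summary table can be passed to other chart types as well.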
I have the following data frame:
year tradevalueus partner
0 1989 26065 Algeria
1 1989 12345 Albania
2 1991 178144 Argentina
3 1991 44384 Bhutan
4 1990 1756844 Bulgaria
5 1990 57088556 Myanmar
I want a bar graph with year on the x-axis and one bar per trade partner. With the data above, that means 3 years on the x-axis with 2 bars for each year, where the bar height is the tradevalueus variable and each bar is labelled by the partner column. I have checked df.plot.bar() and other Stack Overflow posts about bar graphs, but they don't give the output I want. Any pointers would be greatly appreciated.
Thanks!
You can either pivot the table and plot:
df.pivot(index='year',columns='partner',values='tradevalueus').plot.bar()
Or use seaborn:
import seaborn as sns
sns.barplot(x='year', y='tradevalueus', hue='partner', data=df, dodge=True)
I have a pandas dataframe which looks like this:
Country
Japan
Japan
Korea
India
India
USA
USA
USA
I need to count how often each unique value occurs in the Country column, convert the counts to percentages, and put the countries on the x-axis and the percentages on the y-axis of a plotly bar chart. Can anyone show me how to do it?
Use value_counts:
df.Country.value_counts(normalize=True)
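Note that normalize=True returns fractions between 0 and 1, not percentages. A minimal sketch of how the result could be turned into a table for a bar chart follows; the column name Percent and the helper make_bar are assumptions for illustration, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {"Country": ["Japan", "Japan", "Korea", "India", "India", "USA", "USA", "USA"]}
)

# Fractions -> percentages, then a two-column frame that plotting functions can use.
pct = (
    df["Country"]
    .value_counts(normalize=True)
    .mul(100)
    .rename_axis("Country")          # name the index explicitly (pandas versions differ here)
    .reset_index(name="Percent")     # "Percent" is an assumed column name
)


def make_bar():
    """Hypothetical helper: countries on the x-axis, percentages on the y-axis."""
    import plotly.express as px  # imported lazily; the table above does not need it
    return px.bar(pct, x="Country", y="Percent")
```

make_bar().show() should then display the chart.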
I've been trying to create a heatmap with seaborn. The dataframe I use is: https://raw.githubusercontent.com/resbaz/r-novice-gapminder-files/master/data/gapminder-FiveYearData.csv
The dataset has 6 columns, namely: country, year, pop, continent, lifeExp and gdpPercap. I want to create a pivot table with year along the x-axis, continent along the y-axis and lifeExp in the cells, then plot it as a heatmap.
The first thing I did was pivot the dataframe using this code:
df1 = pd.read_csv('https://raw.githubusercontent.com/resbaz/r-novice-gapminder-files/master/data/gapminder-FiveYearData.csv')
df2 = df1.pivot('year','continent','lifeExp')
but got an error.
So I changed my code to:
df = pd.read_csv('https://raw.githubusercontent.com/resbaz/r-novice-gapminder-files/master/data/gapminder-FiveYearData.csv')
print(df.head())
df2 = df.pivot_table(values= 'lifeExp', index=['year', 'continent'])
print(df2)
and the output of df2 is like this
                 lifeExp
year continent
1952 Africa    39.135500
     Americas  53.279840
     Asia      46.314394
     Europe    64.408500
     Oceania   69.255000
1957 Africa    41.266346
     Americas  55.960280
     Asia      49.318544
     Europe    66.703067
     Oceania   70.295000
.....
and when I tried to plot it with seaborn
sns.heatmap(df2)
the lifeExp values don't fill the heatmap cells the way I want.
How can I fix this?
-- Hi ebuzz168,
It looks to me like you have set both 'year' and 'continent' as index and nothing as column. Looking at the documentation the function call should look like this:
# rows (index) end up on the y-axis, columns on the x-axis; aggfunc='mean' needs no numpy import
table = df.pivot_table(values='lifeExp', index='continent', columns='year', aggfunc='mean')
sns.heatmap(table)
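The question does not show the error message, so the following is only a guess at the cause. Plain pivot cannot reshape data that has several rows per (year, continent) pair, which is the case here because each continent contains many countries. Recent pandas versions also require keyword arguments for pivot. The sketch below uses made-up toy rows in the same layout to show the difference between pivot and pivot_table.

```python
import pandas as pd

# Made-up toy rows in the gapminder layout: two countries per (year, continent) pair.
toy = pd.DataFrame({
    "country": ["A", "B", "C", "D", "A", "B", "C", "D"],
    "continent": ["Asia", "Asia", "Europe", "Europe"] * 2,
    "year": [1952] * 4 + [1957] * 4,
    "lifeExp": [40.0, 50.0, 60.0, 70.0, 42.0, 52.0, 62.0, 72.0],
})

# pivot refuses to reshape because (year, continent) pairs are not unique.
try:
    toy.pivot(index="year", columns="continent", values="lifeExp")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table aggregates the duplicates instead (here: the mean per cell).
# index -> heatmap rows (y-axis), columns -> heatmap columns (x-axis).
table = toy.pivot_table(
    values="lifeExp", index="continent", columns="year", aggfunc="mean"
)
```

The resulting table has one row per continent and one column per year, which is the 2D shape sns.heatmap expects.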
I have a dataframe with multiple repeating series, over time.
I want to create visualizations that compare series over time, as well as with one another.
What is the best way to structure my data to accomplish this?
So far I have been trying to make smaller dataframes from this, either by dropping years or by selecting only one series, using a variety of indexes, lists or series calls that refer to the multiple years (.Series, .loc, .drop, etc.).
I always seem to encounter the same issues when making the actual graphs; usually relating to the years.
My best result has been making simple bar graphs with countries on the x axis and GDP from only 2018 on the Y axis.
I would like to eventually be able to have countries represented by color with 3D plotly graphs, wherein a series like GDP is Z (depth), Years are Y, and some other series like GNI could be X.
For now I am just aiming to make a scatterplot.
I am also fine with using matplotlib, seaborn, or whatever makes the most sense here.
Columns: [country, series, 1994, 1995, 1996, etc..]
Country  Series   1994         1995         1996  ...
USA      GDP      3.12         4.13
USA      Export%  25.5         32
USA      GNI      867,123,111  989,666,123
UK       GDP      2.87         etc.
UK       Export%  43.1
UK       GNI      981,125,555
China    GDP      5.98
China    Export%  NaN
China    GNI      787,123,447
...
df1 = df.loc[df['series']== 'GDP']
time = df1['1994':'1996']
gdp_time = px.scatter(df1, x = time, y= 'series', color="country")
gdp_time.show()
#Desired Graph
gdp_time = px.scatter(df1, x = years, y= GDP, color= Countries)
gdp_time.show()
I find it hard to believe that I can't simply create a series that references the multiple year columns as a single 'time'.
What am I missing?
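One common way to get a single 'time' column from a wide layout like this is to reshape it to long format with melt. The sketch below is only an illustration under assumptions: most of the numbers are made up, and the names wide, long, tidy and make_scatter are hypothetical.

```python
import pandas as pd

# Wide layout as described above: one column per year (values partly made up).
wide = pd.DataFrame({
    "country": ["USA", "USA", "UK", "UK"],
    "series": ["GDP", "GNI", "GDP", "GNI"],
    "1994": [3.12, 867.0, 2.87, 981.0],
    "1995": [4.13, 989.0, 3.01, 995.0],
})

# Long layout: the year columns collapse into a single "year" column,
# giving one row per (country, series, year).
long = wide.melt(id_vars=["country", "series"], var_name="year", value_name="value")
long["year"] = long["year"].astype(int)

# Optional second step: one column per series, handy for plotting GDP against GNI.
tidy = (
    long.pivot_table(index=["country", "year"], columns="series", values="value")
    .reset_index()
)


def make_scatter():
    """Hypothetical helper: GDP over time, one color per country."""
    import plotly.express as px  # imported lazily; the reshaping above does not need it
    gdp = long[long["series"] == "GDP"]
    return px.scatter(gdp, x="year", y="value", color="country")
```

With the long frame, year is an ordinary column, so it can be used directly as an axis. The tidy frame has GDP and GNI as separate columns, which should suit a 3D scatter with year, GDP and GNI as the three axes.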
I have a pandas dataframe which looks like this:
A B
1 USA Y
3 USA Y
4 USA N
5 India Y
8 India N
12 USA N
14 USA Y
19 USA Y
I want to make a countplot for this dataframe. That is, the plot will have country names on X-axis and the counts for each category on Y-axis. I know I can do this in seaborn like this:
sns.countplot(x='A', data=df, hue='B')
But this will not be an interactive plot. I want to achieve the same thing in plotly but I am having a hard time figuring it out. Can anyone please help me out?
Using plotly 3 you can do something like this:
from plotly import graph_objs as go
fig = go.Figure()
for name, group in df.groupby('B'):
    trace = go.Histogram()
    trace.name = name
    trace.x = group['A']
    fig.add_trace(trace)
You can also change other properties, such as the colors, by setting the trace.marker.color attribute.
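As an alternative sketch (not the method from the answer above), the counts can be computed explicitly in pandas and drawn as grouped bars with plotly express. The column name count and the helper make_countplot are assumptions for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "A": ["USA", "USA", "USA", "India", "India", "USA", "USA", "USA"],
        "B": ["Y", "Y", "N", "Y", "N", "N", "Y", "Y"],
    },
    index=[1, 3, 4, 5, 8, 12, 14, 19],
)

# One row per (country, category) pair with the number of occurrences.
counts = df.groupby(["A", "B"]).size().reset_index(name="count")


def make_countplot():
    """Hypothetical helper: grouped bars, similar to sns.countplot(x='A', hue='B')."""
    import plotly.express as px  # imported lazily; the counting above does not need it
    return px.bar(counts, x="A", y="count", color="B", barmode="group")
```

make_countplot().show() should render the bars side by side for each country.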