How to plot DataFrames in Python

I'm trying to plot a DataFrame, but I'm not getting the result I need. Below is an example of what I'm trying to do and what I currently get. (I'm new to Python.)
import pandas as pd
import matplotlib.pyplot as plt
my_data = {1965:{'a':52, 'b':54, 'c':67, 'd':45},
1966:{'a':34, 'b':34, 'c':35, 'd':76},
1967:{'a':56, 'b':56, 'c':54, 'd':34}}
df = pd.DataFrame(my_data)
df.plot( style=[])
plt.show()
I'm getting the following graph, but what I need is the years on the x-axis and one line for each of the labels currently on the x-axis (a, b, c, d). Thanks for your help!

import pandas as pd
import matplotlib.pyplot as plt
my_data = {1965:{'a':52, 'b':54, 'c':67, 'd':45},
1966:{'a':34, 'b':34, 'c':35, 'd':76},
1967:{'a':56, 'b':56, 'c':54, 'd':34}}
df = pd.DataFrame(my_data)
df.T.plot( kind='bar') # or df.T.plot.bar()
plt.show()
Updates:
If this is what you want:
df = pd.DataFrame(my_data)
df.columns=[str(x) for x in df.columns] # convert year numerical values to str
df.T.plot()
plt.show()

you can do it this way:
import matplotlib.ticker as mtick  # provides FormatStrFormatter
ax = df.T.plot(linewidth=2.5)
plt.locator_params(nbins=len(df.columns))
ax.xaxis.set_major_formatter(mtick.FormatStrFormatter('%4d'))
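
For reference, here is a minimal self-contained sketch of the line-plot variant, using the data from the question. The name by_year and the explicit set_xticks call are illustrative choices, not part of the answer above; pinning the ticks to the index is just one way to avoid fractional year labels.

```python
import pandas as pd
import matplotlib.pyplot as plt

my_data = {1965: {'a': 52, 'b': 54, 'c': 67, 'd': 45},
           1966: {'a': 34, 'b': 34, 'c': 35, 'd': 76},
           1967: {'a': 56, 'b': 56, 'c': 54, 'd': 34}}

df = pd.DataFrame(my_data)  # rows: a..d, columns: years
by_year = df.T              # rows: years, columns: a..d

# DataFrame.plot draws one line per column, with the index on the x-axis.
ax = by_year.plot(marker='o')
ax.set_xticks(list(by_year.index))  # one tick per year
ax.set_xlabel('year')
# Call plt.show() to display the figure interactively.
```

Transposing first is what moves the years onto the x-axis: pandas always plots the index horizontally and the columns as separate series.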

Related

Transposing x and y axes with matplotlib and pandas

I'm trying to use a bar chart to visualize my csv data. The data looks like this:
question,count_1,count_2,count_3,count_4,count_5
Q1,0,0,6,0,0
Q2,6,0,0,0,0
Q3,3,2,1,0,0
Q4,0,0,6,0,0
Q5,6,0,0,0,0
Q6,0,6,0,0,0
Q7,6,0,0,0,0
Q8,0,0,0,5,1
Q9,1,4,0,0,1
Q10,0,0,1,5,0
Here is my code
import pandas as pd
import csv
import matplotlib.pyplot as plt
df = pd.read_csv('example.csv')
ax = df.set_index(['question']).plot.bar(stacked=True)
ax.legend(loc='best')
plt.show()
Which gives me:
What I'm trying to do is flip the x and y axes. I want the bars to be horizontal and y axis to be the questions. I tried to transpose my data frame using:
ax = df.set_index(['question']).T.plot.bar(stacked=True)
but that gives me:
which is not what I want. Can anyone help?
To get horizontal bars (that is, to flip the x and y axes), use barh (horizontal bar) instead of bar. The code would be:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('example.csv')
ax = df.set_index(['question']).plot.barh(stacked=True)
ax.legend(loc='best')
plt.show()
Output plot
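
A self-contained sketch of the same idea, with the CSV from the question embedded as a string so it runs without example.csv. The io.StringIO wrapper and the optional invert_yaxis call are additions for illustration, not part of the answer.

```python
import io

import pandas as pd
import matplotlib.pyplot as plt

csv_text = """question,count_1,count_2,count_3,count_4,count_5
Q1,0,0,6,0,0
Q2,6,0,0,0,0
Q3,3,2,1,0,0
Q4,0,0,6,0,0
Q5,6,0,0,0,0
Q6,0,6,0,0,0
Q7,6,0,0,0,0
Q8,0,0,0,5,1
Q9,1,4,0,0,1
Q10,0,0,1,5,0
"""

df = pd.read_csv(io.StringIO(csv_text))

# barh keeps one bar per question but draws it horizontally,
# so the questions end up on the y-axis.
ax = df.set_index('question').plot.barh(stacked=True)
ax.invert_yaxis()  # optional: put Q1 at the top instead of the bottom
ax.legend(loc='best')
# Call plt.show() to display the figure interactively.
```

Note that transposing the frame is not needed here: .T swaps which labels become bars and which become stacked segments, while barh only changes the bar orientation.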

Seaborn xaxis with large timeline

I have around 4475 rows of csv data like below:
,Time,Values,Size
0,1900-01-01 23:11:30.368,2,
1,1900-01-01 23:11:30.372,2,
2,1900-01-01 23:11:30.372,2,
3,1900-01-01 23:11:30.372,2,
4,1900-01-01 23:11:30.376,2,
5,1900-01-01 23:11:30.380,,
6,1900-01-01 23:11:30.380,,
7,1900-01-01 23:11:30.380,,
8,1900-01-01 23:11:30.380,,321
9,1900-01-01 23:11:30.380,,111
.
.
4474,1900-01-01 23:11:32.588,,
When I create a simple seaborn lineplot with the code below, it draws a continuous line, but my 'Values' column has many empty/NaN values that should show up as gaps in the chart. How can I do that?
from datetime import datetime
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.read_csv("Data.csv")
sns.set(rc={'figure.figsize':(13,4)})
ax = sns.lineplot(x="Time", y="Values", data=df)
ax.set(xlabel='Time', ylabel='Values')
plt.xticks(rotation=90)
plt.tight_layout()
plt.show()
As reported in this answer:
I've looked at the source code and it looks like lineplot drops nans from the DataFrame before plotting. So unfortunately it's not possible to do it properly.
So the easiest approach is to use matplotlib instead of seaborn.
In the code below I generate a dataframe like yours, with about 20% missing values in the 'Values' column, and use matplotlib to draw the plot:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# 'ms' is the millisecond frequency alias (the older alias 'L' is deprecated in recent pandas)
df = pd.DataFrame({'Time': pd.date_range(start = '1900-01-01 23:11:30', end = '1900-01-01 23:11:30.1', freq = 'ms')})
df['Values'] = np.random.randint(low = 2, high = 10, size = len(df))
df['Values'] = df['Values'].mask(np.random.random(df['Values'].shape) < 0.2)
fig, ax = plt.subplots(figsize = (13, 4))
ax.plot(df['Time'], df['Values'])
ax.set(xlabel = 'Time', ylabel = 'Values')
plt.xticks(rotation = 90)
plt.tight_layout()
plt.show()
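
The same approach applied to the first ten rows shown in the question, as a rough sketch. The embedded string stands in for Data.csv, and index_col=0 and parse_dates are assumptions about how the file should be read, since the question does not say.

```python
import io

import pandas as pd
import matplotlib.pyplot as plt

csv_text = """,Time,Values,Size
0,1900-01-01 23:11:30.368,2,
1,1900-01-01 23:11:30.372,2,
2,1900-01-01 23:11:30.372,2,
3,1900-01-01 23:11:30.372,2,
4,1900-01-01 23:11:30.376,2,
5,1900-01-01 23:11:30.380,,
6,1900-01-01 23:11:30.380,,
7,1900-01-01 23:11:30.380,,
8,1900-01-01 23:11:30.380,,321
9,1900-01-01 23:11:30.380,,111
"""

# Empty fields are read as NaN; 'Time' is parsed into datetime values.
df = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=['Time'])

fig, ax = plt.subplots(figsize=(13, 4))
# matplotlib does not connect points across NaN, so missing values show as gaps.
(line,) = ax.plot(df['Time'], df['Values'], marker='.')
ax.set(xlabel='Time', ylabel='Values')
ax.tick_params(axis='x', labelrotation=90)
fig.tight_layout()
# Call plt.show() to display the figure interactively.
```

The marker is there so that isolated valid points, which have no neighbour to connect to, remain visible.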

Pandas plot a repeating dataframe issue

I'm having a problem plotting a pandas DataFrame whose x-axis range repeats every 17 points. The plot doesn't start a new line after each repetition. How can I fix this?
import pandas as pd
from matplotlib import pyplot as plt
df = pd.read_excel('BS.xlsx')
plt.plot(df.BZ, df.energy)
plt.show()
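Based on the df provided, you can try the following:
import pandas as pd
from matplotlib import pyplot as plt
df = pd.read_excel('BS.xlsx')
# Label each block of 17 consecutive rows with its own group number
df['range'] = df.index // 17
ax = plt.axes()
# Draw every block as a separate line on the same axes
for _, group in df.groupby('range'):
    group.plot(x='BZ', y='energy', legend=False, ax=ax)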
plt.show()
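
Since BS.xlsx is not available, here is a sketch of the same grouping idea on synthetic data. The names points_per_sweep, n_sweeps and sweep, as well as the generated values, are made up for illustration; only the BZ and energy column names come from the question.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

points_per_sweep = 17  # the x range repeats after this many rows
n_sweeps = 3           # hypothetical number of repetitions

# Synthetic stand-in for the spreadsheet: BZ runs 0..1 three times in a row.
bz = np.tile(np.linspace(0.0, 1.0, points_per_sweep), n_sweeps)
energy = np.repeat(np.arange(n_sweeps), points_per_sweep) + bz ** 2
df = pd.DataFrame({'BZ': bz, 'energy': energy})

# Integer division of the row number gives 0 for rows 0-16, 1 for rows 17-33, ...
df['sweep'] = df.index // points_per_sweep

fig, ax = plt.subplots()
for sweep_id, sweep in df.groupby('sweep'):
    # One plot call per group, so no line is drawn back to the start of the range.
    ax.plot(sweep['BZ'], sweep['energy'], label=f'sweep {sweep_id}')
ax.set(xlabel='BZ', ylabel='energy')
ax.legend()
# Call plt.show() to display the figure interactively.
```

This relies on the default 0..n-1 integer index; if the frame has a different index, np.arange(len(df)) // points_per_sweep would be a safer way to build the group label.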

Python matplotlib - plot box plots from 2 different data frames

Hello,
I'm trying to plot a box plot combining columns from two different data frames. Help please :)
This is the code:
import pandas as pd
from numpy import random
#Generating the data frame
df1 = pd.DataFrame(data = random.randn(5,2), columns = ['W','Y'])
df2 = pd.DataFrame(data = random.randn(5,2), columns = ['X','Y'])
print(df1.head())
print('\n')
print(df2.head())
This is the output:
This is what I want to get:
The following should give you what you want:
import matplotlib.pyplot as plt
fig, ax = plt.subplots(1, 1)
ax.boxplot([df1['Y'], df2['Y']], positions=[1, 2])
ax.set_xticklabels(['W', 'X'])
ax.set_ylabel('Y')
This gave me the plot below (which I think is what you were aiming for):
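
An alternative sketch, not from the answer above: collect the columns to compare into one DataFrame with pd.concat and let pandas draw the boxes. The labels 'df1' and 'df2' and the seeded generator are illustrative assumptions.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)  # seeded only so the example is repeatable
df1 = pd.DataFrame(rng.standard_normal((5, 2)), columns=['W', 'Y'])
df2 = pd.DataFrame(rng.standard_normal((5, 2)), columns=['X', 'Y'])

# The dict keys become the column names of the combined frame.
combined = pd.concat({'df1': df1['Y'], 'df2': df2['Y']}, axis=1)

fig, ax = plt.subplots()
combined.boxplot(ax=ax)  # one box per column
ax.set_ylabel('Y')
# Call plt.show() to display the figure interactively.
```

Naming the boxes after the frame each column came from keeps the tick labels unambiguous when both columns share the name 'Y'.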

Python Change axis on Multi Histogram plot

I have a pandas dataframe df for which I plot a multi-histogram as follows:
df.hist(bins=20)
This gives me a result that looks like this (yes, this example is ugly since there is only one data point per histogram, sorry):
I have a subplot for each numerical column of my dataframe.
Now I want all my histograms to have an x-axis between 0 and 1. I saw that the hist() function takes an ax parameter, but I can't get it to work.
How can I do that?
EDIT:
Here is a minimal example:
import pandas as pd
import matplotlib.pyplot as plt
myArray = [(0,0,0,0,0.5,0,0,0,1),(0,0,0,0,0.5,0,0,0,1)]
myColumns = ['col1','col2','col3','co4','col5','col6','col7','col8','col9']
df = pd.DataFrame(myArray,columns=myColumns)
print(df)
df.hist(bins=20)
plt.show()
Here is a solution that works, but is certainly not ideal:
import pandas as pd
import matplotlib.pyplot as plt
myArray = [(0,0,0,0,0.5,0,0,0,1),(0,0,0,0,0.5,0,0,0,1)]
myColumns = ['col1','col2','col3','co4','col5','col6','col7','col8','col9']
df = pd.DataFrame(myArray,columns=myColumns)
print(df)
ax = df.hist(bins=20)  # 2D array of Axes, one per column
for x in ax:
    for y in x:
        y.set_xlim(0,1)
plt.show()
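
A slightly more compact sketch of the same fix. It assumes that extra keyword arguments to DataFrame.hist are forwarded to matplotlib's hist, so range=(0, 1) spreads the 20 bins over 0..1 in every subplot; the names rows, columns and axes are illustrative.

```python
import pandas as pd
import matplotlib.pyplot as plt

rows = [(0, 0, 0, 0, 0.5, 0, 0, 0, 1), (0, 0, 0, 0, 0.5, 0, 0, 0, 1)]
columns = ['col1', 'col2', 'col3', 'co4', 'col5', 'col6', 'col7', 'col8', 'col9']
df = pd.DataFrame(rows, columns=columns)

# range=(0, 1) fixes the bin edges; it does not by itself fix the axis limits.
axes = df.hist(bins=20, range=(0, 1))

# df.hist returns an array of Axes; ravel() flattens it so one loop is enough.
for ax in axes.ravel():
    ax.set_xlim(0, 1)
# Call plt.show() to display the figure interactively.
```

Setting both the bin range and the axis limits keeps the bars comparable across subplots, because every histogram then uses the same 20 bin edges.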
