Plotting a histogram for categorical data


I have a dataset with two columns, as follows:
index Year
0 5 <2012
1 8 >=2012
2 9 >=2012
3 10 <2012
4 15 <2012
... ... ...
171 387 >=2012
172 390 <2012
173 398 <2012
174 403 >=2012
175 409 <2012
I would like to plot it as a histogram. I tried:
plt.style.use('ggplot')
df.groupby(['Year'])\
.Year.count().unstack().plot.bar(legend=True)
plt.show()
but I got an error:
AttributeError: 'CategoricalIndex' object has no attribute 'remove_unused_levels'
I think this is because I am using categorical values. Any help would be appreciated.

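Try the following. Grouping by a single column already gives a one-level index, so there is nothing to unstack(); that call is what fails. Count the rows in each group and plot the counts directly:
import matplotlib.pyplot as plt
plt.style.use('ggplot')
df.groupby(["Year"])["Year"].agg("count").plot.bar();
Alternatively:
plt.hist(df["Year"]);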
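A minimal sketch of the same idea using value_counts(), which counts each category in one step. It uses the first five rows shown in the question as sample data (the real df is assumed to have the same Year column) and saves the figure to year_counts.png, a file name chosen only for illustration; in a notebook you would call plt.show() instead.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# First five rows from the question, with Year stored as a categorical column
df = pd.DataFrame(
    {"index": [5, 8, 9, 10, 15],
     "Year": ["<2012", ">=2012", ">=2012", "<2012", "<2012"]}
)
df["Year"] = df["Year"].astype("category")

# One entry per category with its number of rows; sort_index() keeps category order.
# No unstack() is needed because this is already a one-level index.
counts = df["Year"].value_counts().sort_index()

plt.style.use("ggplot")
ax = counts.plot.bar(rot=0)
ax.set_xlabel("Year")
ax.set_ylabel("count")
ax.figure.savefig("year_counts.png")
```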

Related

Problem animating polar plots from measured data

Problem
I'm trying to animate a polar plot of measured temperature data from a cylinder using the plotly.express function line_polar. The dataset has 6 radial values (columns #1 - #6) over 10 rows (column Time), distributed around the polar plot. I can't get it to animate and get the following error:
Error
ValueError: All arguments should have the same length. The length of column argument df[animation_frame] is 10, whereas the length of previously-processed arguments ['r', 'theta'] is 6
According to the help for the parameter "animation_frame" it should be specified as following:
animation_frame (str or int or Series or array-like) – Either a name of a column in data_frame, or a pandas Series or array_like object. Values from this column or array_like are used to assign marks to animation frames.
I'm a bit stumped, because I don't see why this shouldn't work: other use cases seem to use multi-dimensional data with equal numbers of rows.
[Figure: example of the polar plot for t=1]
Dataset:
Time  #1   #2   #3   #4   #5   #6
1     175  176  179  182  178  173
2     174  175  179  184  178  172
3     175  176  178  183  179  174
4     173  174  178  184  179  174
5     173  174  177  185  180  175
6     173  174  177  185  180  175
7     172  173  176  186  181  176
8     172  173  176  186  181  176
9     171  172  175  187  182  177
10    171  172  175  187  182  177
Code:
import pandas as pd
import plotly.express as px
df = pd.read_excel('TempData.xlsx')
sensor = ["0", "60", "120", "180", "240","300"]
radial_all = ['#1', '#2', '#3', '#4', '#5', '#6']
fig = px.line_polar(df, r=radial_all, theta=sensor, line_close=True,
color_discrete_sequence=px.colors.sequential.Plasma_r, template="plotly_dark", animation_frame="Time")
fig.update_polars(radialaxis_range=[160, 190])
fig.update_polars(radialaxis_rangemode="normal")
fig.update_polars(radialaxis=dict(tickvals = [150, 160, 170, 180, 190, 200]))
Thanks in advance!

I found the solution to this problem. It's also possible with scatterpolar, but I recommend line_polar from plotly express; it's much more elegant and easier. You need to reshape the data from wide to long format using the pandas function melt(). In long format each row holds one (Time, Sensor, Temperature) observation, so r, theta and animation_frame all have the same length, and each value in the "Time" column becomes one animation step. The following links have helpful info:
Pandas - reshaping-by-melt
pandas.melt()
Resulting code:
import plotly.express as px
import pandas as pd
df = pd.read_excel('TempData.xlsx')
df_1 = df.melt(id_vars=['Time'], var_name="Sensor", value_name="Temperature",
value_vars=['#1', '#2', '#3', '#4','#5','#6'])
fig = px.line_polar(df_1, r="Temperature", theta="Sensor", line_close=True,
line_shape="linear", direction="clockwise",
color_discrete_sequence=px.colors.sequential.Plasma_r, template="plotly_dark",
animation_frame="Time")
fig.show()
[Figure: resulting animated polar plot]
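The reshaping step can be checked on its own, without plotly. The sketch below types in the question's table instead of reading TempData.xlsx and melts it. The extra Angle column is an assumption based on the question's sensor list (0, 60, ..., 300 degrees for #1 to #6); it could be passed as theta if numeric angles are wanted instead of the sensor labels.

```python
import pandas as pd

# The question's dataset, typed in directly instead of read from TempData.xlsx
df = pd.DataFrame({
    "Time": list(range(1, 11)),
    "#1": [175, 174, 175, 173, 173, 173, 172, 172, 171, 171],
    "#2": [176, 175, 176, 174, 174, 174, 173, 173, 172, 172],
    "#3": [179, 179, 178, 178, 177, 177, 176, 176, 175, 175],
    "#4": [182, 184, 183, 184, 185, 185, 186, 186, 187, 187],
    "#5": [178, 178, 179, 179, 180, 180, 181, 181, 182, 182],
    "#6": [173, 172, 174, 174, 175, 175, 176, 176, 177, 177],
})

# Wide -> long: one row per (Time, Sensor) pair
long_df = df.melt(id_vars=["Time"], var_name="Sensor", value_name="Temperature",
                  value_vars=["#1", "#2", "#3", "#4", "#5", "#6"])

# Assumed sensor positions, taken from the question's `sensor` list
angles = {f"#{i + 1}": 60 * i for i in range(6)}
long_df["Angle"] = long_df["Sensor"].map(angles)

# Each Time value (one animation frame) should now have six rows
frame_sizes = long_df.groupby("Time").size()
print(long_df.head())
```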

Plot moving average with data [duplicate]

This question already has answers here:
Moving Average Pandas
(4 answers)
Closed 2 years ago.
I am trying to calculate and plot a moving average along with the data it is calculated from:
def movingAvg(df):
window_size = 7
i = 0
moving_averages = []
while i < len(df) - window_size + 1:
current_window = df[i : i + window_size]
window_average = current_window.mean()
moving_averages.append(window_average)
i += 1
return moving_averages
dates = df_valid['dateTime']
startDay = dates.iloc[0]
lastDay = dates.iloc[-1]
fig, ax = plt.subplots(figsize=(20, 10))
ax.autoscale()
#plt.xlim(startDay, lastDay)
df_valid.sedentaryActivityMins.reset_index(drop=True, inplace=True)
df_moving = pd.DataFrame(movingAvg(df_valid['sedentaryActivityMins']))
df_nan = [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan]
df_nan = pd.DataFrame(df_nan)
df_moving = pd.concat([df_nan, df_moving])
plt.plot(df_valid.sedentaryActivityMins)
plt.plot(df_moving)
#plt.show()
But as the moving average uses a window of 7 values, the list of moving averages is 7 items short, and therefore the two plots do not line up correctly.
I tried putting 7 "NaN" into the moving average list, but those are ignored when I plot.
The plot is as follows:
[plot image]
But I would like the orange line to start 7 steps ahead, so it looks like this:
[plot image]
df_valid.sedentaryActivityMins.head(40)
0 608
1 494
2 579
3 586
4 404
5 750
6 573
7 466
8 389
9 604
10 351
11 553
12 768
13 572
14 616
15 522
16 675
17 607
18 229
19 529
20 746
21 646
22 625
23 590
24 572
25 462
26 708
27 662
28 649
29 626
30 485
31 509
32 561
33 664
34 517
35 587
36 602
37 601
38 495
39 352
Name: sedentaryActivityMins, dtype: int64
Any ideas as to how?
Thanks in advance!

When you concatenate, the indexes are kept as they are: df_nan has index 0-6 and df_moving also starts at 0, so the NaN rows share their index labels with the first 7 moving averages and the line is not shifted. Either reset the index after the concat or pass ignore_index=True:
df_moving = pd.concat([df_nan, df_moving], ignore_index=True)
plt.plot(df_valid.sedentaryActivityMins)
plt.plot(df_moving)
This gives the expected output.
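As an alternative to the hand-written loop, pandas' rolling() computes the moving average and keeps the original index, so no NaN padding or index fix-up is needed. A window of 7 is first full at the 7th row, so the result starts with 6 NaN values, and matplotlib simply leaves those points out. If you want each average placed on the day after its window, as the 7-NaN padding does, append .shift(1). The sketch uses the first 10 values from the question; the output file name is illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# First 10 values of df_valid.sedentaryActivityMins from the question
s = pd.Series([608, 494, 579, 586, 404, 750, 573, 466, 389, 604],
              name="sedentaryActivityMins")

# Trailing mean over 7 values; same index as s, NaN until the window is full
moving = s.rolling(window=7).mean()

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(s.index, s, label="data")
ax.plot(moving.index, moving, label="7-value moving average")
ax.legend()
fig.savefig("moving_average.png")
```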

Need help using pyplot to plot multiple line charts in same plot for whitespace formatted data

28 121 106 112 134
42 123 114 115 135
56 130 118 124 138
42 123 114 115 135
63 132 126 131 141 (and 14 more rows...)
Basically, each row has 5 points that need to be plotted as a line graph along equidistant x values.
Even plotting just 5 rows would be good enough. So far this is my approach, but it displays a blank plot:
for i in range(20):
print()
for j in range(5):
print(int(mat[0].split()[j]),end=' ')
plt.plot(j,int(mat[i].split()[j]),'r')
plt.show()
I checked, and mat[i].split()[j] returns the correct number for each row, but nothing gets plotted. I don't want to deal with DataFrames yet since the data is so simple.
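The plot is blank because each plt.plot(j, y, 'r') call receives a single point: 'r' only sets the color (red) with the default solid line and no marker, and a line through one point has nothing to draw. A minimal sketch that plots each row in one call, assuming mat is a list of whitespace-separated strings (one per row, as from reading the file line by line) and using the five rows shown above:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# The five rows shown in the question, one string per row
mat = [
    "28 121 106 112 134",
    "42 123 114 115 135",
    "56 130 118 124 138",
    "42 123 114 115 135",
    "63 132 126 131 141",
]

# Parse each row into a list of ints
rows = [[int(v) for v in line.split()] for line in mat]
x = list(range(len(rows[0])))  # equidistant x positions 0..4

fig, ax = plt.subplots()
for i, row in enumerate(rows):
    # All 5 points of a row in one call, so a line connects them
    ax.plot(x, row, marker="o", label=f"row {i}")
ax.legend()
fig.savefig("rows.png")
```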

Given a simple pandas Series, what's a simple way to create a histogram (bar plot) of it?

I have a supersimple Series like this:
hour
0 438
1 444
2 351
3 402
4 473
5 498
6 440
7 431
8 259
9 11
11 52
12 62
13 77
14 55
22 40
23 162
Name: value, dtype: int64
It's just a count of the number of observations of something in a given hour. How could this be plotted quickly and easily as a histogram in a Jupyter notebook? The first bin would be from 0 to 1 hours (00:00 to 01:00), the second bin would be from 1 to 2 hours (01:00 to 02:00) and so on.

If you need a standard bar plot:
In [8]: import matplotlib
...: matplotlib.style.use('ggplot')
...:
In [9]: s.plot.bar(rot=0, grid=True, width=1, alpha=0.7)
Out[9]: <matplotlib.axes._subplots.AxesSubplot at 0xaaab7f0>
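s.plot.bar() places one bar per index entry at evenly spaced positions, so the missing hours (10 and 15 to 21) do not show up as gaps. If each bar should cover a real hour interval (0 to 1, 1 to 2, ...), one option is to reindex to all 24 hours and draw with matplotlib's bar() using align="edge", so each bar's left edge sits on its hour. A sketch with the question's Series (the output file name is illustrative):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# The question's Series: observation counts per hour; hours 10 and 15-21 are absent
s = pd.Series(
    [438, 444, 351, 402, 473, 498, 440, 431, 259, 11, 52, 62, 77, 55, 40, 162],
    index=pd.Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14, 22, 23], name="hour"),
    name="value",
)

# Give every hour of the day a bin, with 0 where there were no observations
full = s.reindex(range(24), fill_value=0)

fig, ax = plt.subplots()
# align="edge": the bar for hour h spans h to h + 1
ax.bar(full.index, full.values, width=1, align="edge", edgecolor="black")
ax.set_xticks(range(0, 25, 2))
ax.set_xlabel("hour")
ax.set_ylabel("count")
fig.savefig("hourly_counts.png")
```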

Python plot data against date

I have a dataset:
A B C D yearweek
0 245 95 60 30 2014-48
1 245 15 70 25 2014-49
2 150 275 385 175 2014-50
3 100 260 170 335 2014-51
4 580 925 535 2590 2015-02
5 630 126 485 2115 2015-03
6 425 90 905 1085 2015-04
7 210 670 655 945 2015-05
How to plot each value against 'yearweek'?
I tried for example:
import matplotlib.pyplot as plt
import pandas as pd
new = pd.DataFrame([df['A'].values, df['yearweek'].values])
plt.plot(new)
but it doesn't work and shows
ValueError: could not convert string to float: '2014-48'
Then I tried this:
plt.scatter(df['Total'], df['yearweek'])
turns out:
ValueError: could not convert string to float: '2015-37'
Does this mean the type of yearweek is the problem? How can I fix it?
Or is it possible to convert the index into dates?

The best solution I see is to calculate the date yourself and store it in a new datetime column. Then you can plot it easily.
import datetime
df['date'] = df['yearweek'].map(lambda x: datetime.datetime.strptime(x, "%Y-%W") + datetime.timedelta(days=7*(int(x.split('-')[1])-1)))
df.plot('date', 'A')
strptime ignores %W unless a weekday is also given, so strptime(x, "%Y-%W") returns January 1 of that year. From there I go forward 7*(week-1) days to get the date.
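A variant of the same idea that lets strptime use the week number directly: appending a weekday ("-1", Monday) to each string makes %W take effect, so no manual timedelta is needed. This assumes the week numbers follow Python's %W convention (weeks start on Monday, week 1 begins on the first Monday of the year); if they are ISO weeks, the format "%G-%V-%u" would be the one to use instead. The sketch rebuilds the question's table and plots all four columns against the parsed dates (the output file name is illustrative).

```python
import datetime

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# The question's dataset
df = pd.DataFrame({
    "A": [245, 245, 150, 100, 580, 630, 425, 210],
    "B": [95, 15, 275, 260, 925, 126, 90, 670],
    "C": [60, 70, 385, 170, 535, 485, 905, 655],
    "D": [30, 25, 175, 335, 2590, 2115, 1085, 945],
    "yearweek": ["2014-48", "2014-49", "2014-50", "2014-51",
                 "2015-02", "2015-03", "2015-04", "2015-05"],
})

# "-1" selects Monday via %w; with a weekday present, strptime honours %W
df["date"] = pd.to_datetime(
    df["yearweek"].map(lambda yw: datetime.datetime.strptime(yw + "-1", "%Y-%W-%w"))
)
df = df.set_index("date")

# One line per column, with real dates on the x-axis
ax = df[["A", "B", "C", "D"]].plot(marker="o")
ax.figure.savefig("yearweek.png")
```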

As of pandas 0.20.x, you can use DataFrame.plot() to generate the required plots. It uses matplotlib under the hood:
import pandas as pd
data = pd.read_csv('Your_Dataset.csv')
data.plot(x='yearweek', y=['A'])
Here, yearweek becomes the x-axis and A the y-axis. y can be a list, so you can plot several columns at once (for example y=['A', 'B', 'C', 'D']); x must be a single column.
Note: If it still doesn't look right, parse the yearweek column into proper dates and try again.
