Making line chart of MIS using python and pandas - python

I have the following dataframe with dates as the index:
            Apples  Oranges  Strawberries
07-13-2020       1        5            10
07-14-2020       1       17             4
I have to make a line chart of the above dataframe with the number of fruits on the y axis and the dates on the x axis.
df.plot(x=df.index,y=["Apples","Oranges","Strawberries"],kind="line") is not working.
How can I fix it?

Try converting your pandas index to datetime format and try again as below:
df.index = pd.to_datetime(df.index, format='%m-%d-%Y')
df.plot(kind="line")

That's df.plot.line(). The index is automatically the x axis, and the columns are the groups.

df.plot.line()
You can also learn a dedicated plotting library for this, such as Matplotlib: https://matplotlib.org/
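To tie the answers above together, here is a minimal sketch using the sample values from the question. It assumes the dates start out as plain strings, and it selects the non-interactive `Agg` backend only so that it runs without a display; the axis labels are illustrative. The original call most likely fails because `x=` expects a column label, not the index object itself.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import pandas as pd

# Sample data copied from the question; the dates are plain strings at first.
df = pd.DataFrame(
    {"Apples": [1, 1], "Oranges": [5, 17], "Strawberries": [10, 4]},
    index=["07-13-2020", "07-14-2020"],
)

# Parse the index into real datetimes so the x axis is treated as dates.
df.index = pd.to_datetime(df.index, format="%m-%d-%Y")

# No x= or y= needed: the index becomes the x axis, each column becomes a line.
ax = df.plot.line()
ax.set_xlabel("Date")
ax.set_ylabel("Number of fruits")
```

In an interactive session, calling `plt.show()` (after `import matplotlib.pyplot as plt`) or `ax.figure.savefig(...)` would display or save the chart.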

Related

Count category occurrence in pandas data row

I have a fairly simple question but somehow could not find the answer.
My pandas dataframe looks like this:
0 1 2 3 ....
fruit apple apple banana apple ....
county .... .... .... .... ....
Basically, I want to count the different fruit types and plot them in a bar plot, with the x axis being the categories and the y axis being the number of occurrences.
I tried df["fruit"].value_counts() with .plot, but I always get a KeyError, as "fruit" doesn't seem to be a valid key.
Thanks.
Dataframes follow a tabular format where the convention is to have features as columns and entries as rows. So you need to transpose your dataframe. After that df["fruit"] will give what you expect.
I believe that fruit is in your dataframe index. If so, use:
df.loc['fruit'].value_counts().plot.bar()
Fairly straightforward: transposing turns the rows into columns, so now you can use:
df.T["fruit"].value_counts()
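A small sketch of both suggestions, under the assumption that "fruit" and "county" are row labels as the answers above guess. The county values are made up for illustration, since the question elides them.

```python
import pandas as pd

# Hypothetical data shaped like the question: "fruit" and "county" are row labels.
df = pd.DataFrame(
    [["apple", "apple", "banana", "apple"],
     ["A", "B", "A", "C"]],  # placeholder county values, not from the question
    index=["fruit", "county"],
)

# df["fruit"] raises KeyError here, because [] looks up column labels (0, 1, 2, 3).
# Selecting the row by label works:
counts_by_row = df.loc["fruit"].value_counts()

# Transposing first makes "fruit" a column, so the usual column lookup works too.
counts_by_col = df.T["fruit"].value_counts()

print(counts_by_row)
```

Appending `.plot.bar()` to either result should give the bar plot the question asks for.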

Plot numbers from different years which are different columns python

I have the following df:
Country 2013 2014 2015 2016 2017
0 USA 40 30 20 30 30
1 Chile 1 2 4 6 1
So I need to plot the total Infected (the numbers in each year column) over time, per year.
So I did:
grid = sns.FacetGrid(data=df, col="Country", col_wrap=5, hue="Country")
grid.map(plt.plot,)
But this is not going to work because each year is a column and I cannot pass that to the grid.map
Any ideas on how to do this?
Not sure exactly what kind of plot you wanted, but this is one way I got around your problem:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'Country': ['USA', 'Chile'],
                   '2013': [40, 1],
                   '2014': [30, 2],
                   '2015': [20, 4],
                   '2016': [30, 6],
                   '2017': [30, 1]})
df = df.T  # This will transpose our df: see https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.transpose.html
df.columns = df.iloc[0]  # Set row [0] as our header
df.drop(['Country'], inplace=True, axis=0)  # Drop row [0] since we don't want it.
Right now, this is what our df looks like:
Country USA Chile
2013     40     1
2014     30     2
2015     20     4
2016     30     6
2017     30     1
From our df we can call:
df.plot.bar()
plt.xticks(rotation=0)
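And we get the desired plot (image not included here): a bar chart with the years on the x axis and one bar per country in each year.
P.S. I can't post pictures yet, but please take a look at the links Stack Overflow provides for them.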
This code is one way of solving it, but you can definitely approach it with a different method. Remember that the plot is based on matplotlib, so you can customize it accordingly.
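If the goal is to keep using a seaborn grid as in the question, one possible alternative is to reshape the year columns into rows first. The sketch below only does the reshaping with `DataFrame.melt`; the column names "Year" and "Infected" are chosen for illustration and do not come from the question.

```python
import pandas as pd

df = pd.DataFrame({"Country": ["USA", "Chile"],
                   "2013": [40, 1], "2014": [30, 2], "2015": [20, 4],
                   "2016": [30, 6], "2017": [30, 1]})

# Turn each year column into its own row: one (Country, Year, Infected) per line.
# "Year" and "Infected" are hypothetical names picked for this sketch.
long_df = df.melt(id_vars="Country", var_name="Year", value_name="Infected")

# The year labels were column names (strings); make them numeric for plotting.
long_df["Year"] = long_df["Year"].astype(int)

print(long_df.head())
```

With the data in this long layout, each year is a value rather than a column, so something like `sns.FacetGrid(long_df, col="Country", hue="Country").map(plt.plot, "Year", "Infected")` should be able to draw one panel per country, assuming seaborn and matplotlib are installed.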

Why does repeating a pd.Series not work as expected?

I have just started working with Python 3.7 and I am trying to create a series, e.g. from 0 to 23, and repeat it. Using
rep1 = pd.Series(range(24))
I figured out how to make the first 24 values, and I wanted to "copy-paste" it many times so that the final series is the original 5 times, one after the other. But rep = pd.Series.repeat(rep1, 5) gives me a result that looks like this, which is not what I want:
0 0 0 0 0 1 1 1 1 1 2 2 2 2 2 ...
What I am looking for is the 0-23 range repeated multiple times. Any advice?
You can try this:
pd.concat([rep1]*5)
This will repeat your series 5 times.
Another solution using numpy.tile:
import numpy as np
rep = pd.Series(np.tile(rep1, 5))
If you want the repeated Series as one data object then use a pandas DataFrame for this. A DataFrame is multiple pandas Series in one object, sharing an index.
First I create a Python list that contains the range 0-23 five times.
Then I put this into a DataFrame and optionally transpose so that I have the rows going down rather than across in this example.
import pandas as pd
lst = [list(range(0,24))] * 5
rep = pd.DataFrame(lst).transpose()
You could also generate your Series directly from a list.
rep = pd.Series(list(range(24))*5)
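The sketch below puts the element-wise `repeat` next to two of the whole-series approaches suggested above, so the difference is visible. One detail worth noting: `pd.concat` keeps the original labels 0-23 for each copy, so `ignore_index=True` is added here (an addition for this sketch, not part of the answer) to get a fresh 0-119 index.

```python
import numpy as np
import pandas as pd

rep1 = pd.Series(range(24))

# Series.repeat duplicates each element in place: 0, 0, 0, 0, 0, 1, 1, ...
elementwise = rep1.repeat(5)

# concat repeats the whole series; ignore_index=True renumbers the result 0..119
# instead of keeping the labels 0..23 five times over.
tiled_concat = pd.concat([rep1] * 5, ignore_index=True)

# np.tile repeats the underlying values; the new Series gets a fresh 0..119 index.
tiled_numpy = pd.Series(np.tile(rep1, 5))

print(tiled_concat.iloc[22:26].tolist())  # values around the first wrap-around
```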

Convert categorical data into various columns for plotting in pandas

I'm new to python and pandas but I've used R in the past for data analysis. I have a simple dataset:
df.head()
Sequence Level Count
1 Easy 5
1 Medium 7
1 Hard 9
I would like to convert this to:
Sequence Easy Medium Hard
1 5 7 9
In R, I could simply do this by using the reshape2 package. In python it seems like one of my options is to create dummy variables using get_dummies but that would still generate multiple rows for the same Sequence in my case. Is there an easy way of achieving my resultset?
I'm finally trying to plot it using:
import matplotlib.pyplot as plt
df.plot(kind='bar', stacked=True)
plt.show()
Any help would be appreciated.
You could use pandas pivot_table:
In [1436]: pd.pivot_table(df, index='Sequence', columns='Level', values='Count')
Out[1436]:
Level     Easy  Hard  Medium
Sequence
1            5     9       7
Then you could plot it:
df1 = pd.pivot_table(df, index='Sequence', columns='Level', values='Count')
df1.plot(kind='bar', stacked=True)
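A self-contained sketch of the pivot step, using the three rows shown in the question. Note that `pivot_table` aggregates with the mean by default, which makes no difference here because each Sequence/Level pair occurs once. The column reordering at the end is an optional extra, not part of the answer above.

```python
import pandas as pd

# The three rows from df.head() in the question.
df = pd.DataFrame({"Sequence": [1, 1, 1],
                   "Level": ["Easy", "Medium", "Hard"],
                   "Count": [5, 7, 9]})

# One row per Sequence, one column per Level, cells filled from Count.
wide = pd.pivot_table(df, index="Sequence", columns="Level", values="Count")

# Columns come out alphabetically (Easy, Hard, Medium); reorder them if the
# Easy/Medium/Hard order from the question matters for the plot.
wide = wide[["Easy", "Medium", "Hard"]]

print(wide)
```

Calling `wide.plot(kind='bar', stacked=True)` on the result should then stack the three levels in that order.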

Python Pandas - Don't sort bar graph on y axis values

I am a beginner in Python. I have a Series with a date and a count of some observations, as below:
Date Count
2003 10
2005 50
2015 12
2004 12
2003 15
2008 10
2004 05
I wanted to plot a graph to find out the count against the year with a Bar graph (x axis as year and y axis being count). I am using the below code
import pandas as pd
pd.value_counts(sfdf.Date_year).plot(kind='bar')
I am getting a bar graph which is automatically sorted on the count, so I am not able to clearly visualize how the count is distributed over the years. Is there any way to stop sorting the bar graph on the count and instead sort on the x axis values (i.e. the year)?
I know this is an old question, but in case someone is still looking for another answer:
I solved this by adding .sort_index(axis=0)
So, instead of this:
pd.value_counts(sfdf.Date_year).plot(kind='bar')
you can write this:
pd.value_counts(sfdf.Date_year).sort_index(axis=0).plot(kind='bar')
Hope this helps.
The following code uses groupby() to join the multiple instances of the same year together, and then calls sum() on the groupby() object to add up their counts. By default groupby() moves the grouping column into the dataframe index and sorts the group keys, but just in case, sort_index(axis=0) will sort the index explicitly. All that then remains is to plot. All in one line:
df = pd.DataFrame([[2003,10],[2005,50],[2015,12],[2004,12],[2003,15],[2008,10],[2004,5]],columns=['Date','Count'])
df.groupby('Date').sum().sort_index(axis=0).plot(kind='bar')
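The two answers compute different things, which the sketch below makes explicit using the sample data from the question: `value_counts` counts how many rows each year has, while `groupby(...).sum()` adds up the Count column per year. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame([[2003, 10], [2005, 50], [2015, 12], [2004, 12],
                   [2003, 15], [2008, 10], [2004, 5]],
                  columns=["Date", "Count"])

# Number of rows per year, ordered by year rather than by frequency.
rows_per_year = df["Date"].value_counts().sort_index()

# Total of the Count column per year; groupby sorts the group keys by default.
total_per_year = df.groupby("Date")["Count"].sum()

print(rows_per_year)
print(total_per_year)
```

Either result can be passed on to `.plot(kind='bar')`, and the bars should then appear in year order along the x axis.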
