In extension to my previous question
I can plot the heat map with Seaborn just fine, and thanks to the suggestions I can also get annotations. But now I see a new problem.
Input File
Nos,Place,Way,Name,00:00:00,12:00:00
123,London,Air,Apollo,342,972
123,London,Rail,Beta,2352,342
123,Paris,Bus,Beta,545,353
345,Paris,Bus,Rava,652,974
345,Rome,Bus,Rava,2325,56
345,London,Air,Rava,2532,9853
567,Paris,Air,Apollo,545,544
567,Rome,Rail,Apollo,5454,5
876,Japan,Rail,Apollo,644,54
876,Japan,Bus,Beta,45,57
876,Japan,Bus,Beta,40,57
876,Japan,Bus,Beta,40,57
Program:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
sns.set()
df = pd.read_csv('heat_map_data.csv')
df3 = df.copy()
for c in ['Place','Name']:
    df3[c] = df3[c].astype('category')
sns.heatmap(df3.pivot_table(index='Place', columns='Name', values='00:00:00' ),annot=True, fmt='.1f' )
plt.show()
If I use fmt='d' I get an error about float values, so I changed to fmt='f', and I get the value from the desired column.
But when the same axis values repeat, the values from the desired column are not added up. Is there any solution for that?
As it is seen in the input file
876,Japan,Bus,Beta,45,57
876,Japan,Bus,Beta,40,57
876,Japan,Bus,Beta,40,57
These 3 rows repeat, and their values should be shown as a sum:
the cell that represents Japan and Beta should be annotated with 125, but instead it shows 41.7. How do I achieve that? Also, is it possible to give two values as the annotation?
My second question: in the pivot I am currently passing values='00:00:00', but I need it to read the last column from the file dynamically.
You can use the aggfunc keyword passing in a dict:
aggfunc : function, default numpy.mean, or list of functions
If list of functions passed, the resulting pivot table will have hierarchical columns whose top level are the function names (inferred from the function objects themselves)
sns.heatmap(df3.pivot_table(index='Place', columns='Name',
values='00:00:00',aggfunc={'00:00:00':np.sum}), annot=True, fmt='.1f')
Which outputs a heatmap (image not reproduced here) whose Japan/Beta cell is annotated with the sum, 125.0.
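For the second part of the question (reading the last column dynamically), here is a minimal sketch, assuming the value column to plot is always the last one in the file. df.columns[-1] gives its name, and aggfunc='sum' adds up repeated Place/Name pairs. The helper name pivot_sum is made up for illustration, and the CSV is inlined through io.StringIO only so the example is self-contained.

```python
import io
import pandas as pd

# The sample input from the question, inlined so the sketch runs on its own.
csv_text = """Nos,Place,Way,Name,00:00:00,12:00:00
123,London,Air,Apollo,342,972
123,London,Rail,Beta,2352,342
123,Paris,Bus,Beta,545,353
345,Paris,Bus,Rava,652,974
345,Rome,Bus,Rava,2325,56
345,London,Air,Rava,2532,9853
567,Paris,Air,Apollo,545,544
567,Rome,Rail,Apollo,5454,5
876,Japan,Rail,Apollo,644,54
876,Japan,Bus,Beta,45,57
876,Japan,Bus,Beta,40,57
876,Japan,Bus,Beta,40,57
"""
df = pd.read_csv(io.StringIO(csv_text))


def pivot_sum(frame, value_col):
    """Sum value_col over every (Place, Name) pair (hypothetical helper)."""
    return frame.pivot_table(index="Place", columns="Name",
                             values=value_col, aggfunc="sum")


first_table = pivot_sum(df, "00:00:00")

# Pick the value column by position instead of hard-coding its name.
last_col = df.columns[-1]
last_table = pivot_sum(df, last_col)

print(first_table.loc["Japan", "Beta"])
print(last_col)
```

Passing last_table to sns.heatmap(..., annot=True, fmt='.1f') should then plot the last column. Pairs with no rows (for example Japan/Rava) come out as NaN, which makes the table float-valued; that is probably why fmt='d' is rejected.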
My Goal
Display a bar chart showing the names and durations of the first 30 Netflix shows from a .csv file
Relevant Code after Trial & Error
names = pd.read_csv("netflix_titles.csv", nrows=31, usecols=[2])
durations = pd.read_csv("netflix_titles.csv", nrows=31, usecols=[9])
durations[['duration']] = durations[['duration']].astype(int)
Then I plot it.
plt.bar(names,durations)
plt.title("Show Durations")
plt.xlabel("Name of Shows")
plt.ylabel("Durations (In Minutes)")
plt.show()
I read 31 rows because the first row is the header. durations is converted to integers because the values in that column are read as strings (or something else) and wouldn't work with matplotlib.
Error Message
TypeError: unhashable type: 'numpy.ndarray'
I don't think Numpy applies with what I'm trying to do, so I'm at a dead end here.
This was able to print out a bar chart for the first 31 values
dataset = pd.read_csv("netflix_titles.csv")
names = dataset['title'].head(31)
durations = dataset['duration'].head(31)
plt.bar(names,durations)
plt.title("Show Durations")
plt.xlabel("Name of Shows")
plt.ylabel("Durations (In Minutes)")
plt.show()
The problem is that you are making two different DataFrames from the csv file and trying to plot them against each other. While this is possible, a much simpler approach is to create a single DataFrame from the selected columns and rows of the csv file and then plot it, as demonstrated below:
import pandas as pd
from matplotlib import pyplot as plt
df = pd.read_csv("netflix_titles.csv", nrows=31, usecols=[2,9])
df.columns = ['name', 'duration']
df['duration'] = df['duration'].astype(int)
df.set_index('name', inplace=True)
df.plot(kind = 'bar')
plt.title("Show Durations")
plt.xlabel("Name of Shows")
plt.ylabel("Durations (In Minutes)")
plt.show()
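The TypeError in the question most likely arises because read_csv with usecols returns a DataFrame even when only one column is selected, so plt.bar receives two-dimensional objects instead of one-dimensional sequences. A small sketch with made-up data (the rows below are hypothetical, not taken from netflix_titles.csv) shows the difference, and also that nrows counts data rows only, not the header:

```python
import io
import pandas as pd

# Hypothetical stand-in for the real file; only the shape of the data matters.
csv_text = "show_id,title,duration\ns1,Alpha,90\ns2,Beta,105\ns3,Gamma,78\n"

# Selecting a single column with usecols still yields a 2-D DataFrame ...
names_df = pd.read_csv(io.StringIO(csv_text), usecols=[1])
# ... while indexing by column name yields a 1-D Series.
names = names_df["title"]

# nrows counts data rows only; the header line is not part of the count.
first_two = pd.read_csv(io.StringIO(csv_text), nrows=2,
                        usecols=["title", "duration"])

print(type(names_df).__name__, names_df.shape)
print(type(names).__name__, names.shape)
print(len(first_two))
```

With one-dimensional inputs such as first_two['title'] and first_two['duration'], plt.bar should accept the data directly. To get exactly 30 shows, nrows=30 would be the value to use.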
I have a dataframe with just two columns, Date, and ClosingPrice. I am trying to plot them using df.plot() but keep getting this error:
ValueError: view limit minimum -36785.37852 is less than 1 and is an invalid Matplotlib date value. This often happens if you pass a non-datetime value to an axis that has datetime units
I have found documentation about this from matplotlib but that says how to make sure that the format is datetime. Here is code that I have to make sure the format is datetime and also printing the data type for each column before attempting to plot.
df.Date = pd.to_datetime(df.Date)
print(df['ClosingPrice'].dtypes)
print(df['Date'].dtypes)
The output of these print statements is:
float64
datetime64[ns]
I am not sure what the problem is since I am verifying the data type before plotting. Here is also what the first few rows of the data set look like:
        Date  ClosingPrice
0 2013-09-10       64.7010
1 2013-09-11       61.1784
2 2013-09-12       61.8298
3 2013-09-13       60.8108
4 2013-09-16       58.8776
5 2013-09-17       59.5577
6 2013-09-18       60.7821
7 2013-09-19       61.7788
Any help is appreciated.
EDIT 2, after seeing more people ending up here: to be clear for people new to Python, you should first import pandas for the code below to work:
import pandas as pd
EDIT 1: (short quick answer)
If you don't want to drop your original index (this makes sense after reading the original, longer answer below) you could:
df[['Date','ClosingPrice']].plot('Date', figsize=(15,8))
Original and long answer:
Try setting your index as your Datetime column first:
df.set_index('Date', inplace=True, drop=True)
Just to be sure, try setting the index dtype (edit: this probably won't be needed, as you did it previously):
df.index = pd.to_datetime(df.index)
And then plot it
df.plot()
If this solves the issue, it's because when you call .plot() on a DataFrame object, the X axis is automatically the DataFrame's index.
If your DataFrame had a DatetimeIndex and 2 other columns (say ['Currency','pct_change_1']) and you wanted to plot just one of them (maybe pct_change_1) you could:
# single [ ] transforms the column into series, double [[ ]] into DataFrame
df[['pct_change_1']].plot(figsize=(15,8))
Here figsize=(15,8) sets the size of the plot (width, height) in inches.
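The two approaches above can be put side by side in a small sketch that uses the first rows from the question. The Agg backend line is an assumption added only so the code runs without a display; it can be dropped in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import pandas as pd

df = pd.DataFrame({
    "Date": ["2013-09-10", "2013-09-11", "2013-09-12", "2013-09-13"],
    "ClosingPrice": [64.7010, 61.1784, 61.8298, 60.8108],
})
df["Date"] = pd.to_datetime(df["Date"])

# Option 1: keep the default integer index and name the x column explicitly.
ax_column = df.plot(x="Date", y="ClosingPrice", figsize=(15, 8))

# Option 2: move Date into the index; DataFrame.plot() then uses it as x axis.
indexed = df.set_index("Date")
ax_index = indexed.plot(figsize=(15, 8))
```

Option 1 leaves df untouched, which is handy when the integer index is still needed elsewhere; option 2 is shorter when the frame is only used for time-series work.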
Here is a simple solution:
import pandas as pd
my_dict = {'Date': ['2013-09-10', '2013-09-11', '2013-09-12', '2013-09-13',
                    '2013-09-16', '2013-09-17', '2013-09-18', '2013-09-19'],
           'ClosingPrice': [64.7010, 61.1784, 61.8298, 60.8108,
                            58.8776, 59.5577, 60.7821, 61.7788]}
df = pd.DataFrame(my_dict)
# Use the Date column as the index so that it becomes the x axis
df.set_index('Date', inplace=True)
df.plot()
I have a dataframe with a lot of missing values which looks like this:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
date = pd.date_range(start='2003/01/01', end='2005/12/31')
df = pd.DataFrame({'date':date, })
Assign missing values to columns:
df = pd.DataFrame(np.nan, index=date, columns=['A', 'B'])
Add some actual values throughout to illustrate what my data actually looks like
df.loc['2003-01-10', 'B'] = 50
df.loc['2003-01-15', 'A'] = 70
df.loc['2003-06-10', 'B'] = 45
df.loc['2003-07-15', 'A'] = 55
df.loc['2004-01-01', 'B'] = 20
df.loc['2004-01-05', 'A'] = 30
df.loc['2004-05-01', 'B'] = 25
df.loc['2004-06-05', 'A'] = 35
df.loc['2005-01-01', 'B'] = 40
df.loc['2005-01-05', 'A'] = 35
Plot the data
df.plot(style = '-o')
This plot looks like this:
So you can see that I have specified that it be a line plot using the style = '-o' command, and it shows up correctly in the legend, but the dots are not joined by lines on the graph. When I plot it with no style specification I get a blank graph.
Any help would be greatly appreciated. Thank you.
I assume this is due to the NaNs in your data set; the data is simply not tidy. I assumed pandas could figure this out just by using stack, but that doesn't work either. It is also a bit inconvenient that, for any given date, the two values are never both defined (maybe one could use interpolate here). However, what works is simply:
df['A'].dropna().plot()
df['B'].dropna().plot()
in a single Jupyter notebook cell. Both plots will be drawn on the same axes there.
Interpolate works, but looks a bit different due to the scaling:
pd.concat([df['A'].interpolate(),
df['B'].interpolate()], axis=1).plot()
note that here the legend is created directly. I was too lazy to overwrite the old df.
Tweaking interpolate a bit, and realizing that it's already a DataFrame method, one could also do:
df.interpolate(limit_area='inside').plot()
for qualitatively the dropna result, or
df.interpolate().plot()
for the concat result.
You have a lot of NaN values in your dataframe, so it can't draw a line (the actual points don't directly follow each other).
What you can do is drop the nan values like this:
df.B.dropna().plot()
df.A.dropna().plot()
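To see what dropna and interpolate actually do to the data before plotting, here is a small sketch that rebuilds the frame from the question and compares the three variants. The variable names a_points, inside and filled are made up for illustration.

```python
import numpy as np
import pandas as pd

date = pd.date_range(start="2003-01-01", end="2005-12-31")
df = pd.DataFrame(np.nan, index=date, columns=["A", "B"])

# The sparse observations from the question.
observations = {
    "A": [("2003-01-15", 70), ("2003-07-15", 55), ("2004-01-05", 30),
          ("2004-06-05", 35), ("2005-01-05", 35)],
    "B": [("2003-01-10", 50), ("2003-06-10", 45), ("2004-01-01", 20),
          ("2004-05-01", 25), ("2005-01-01", 40)],
}
for column, points in observations.items():
    for day, value in points:
        df.loc[pd.Timestamp(day), column] = value

# dropna keeps only the real observations, so consecutive points are adjacent
# and a line can be drawn between them.
a_points = df["A"].dropna()

# limit_area='inside' fills only the gaps between observations.
inside = df.interpolate(limit_area="inside")

# The default also carries the last observation forward to the end of the index.
filled = df.interpolate()

print(len(a_points))
print(inside["A"].notna().sum(), filled["A"].notna().sum())
```

This is why the default interpolate plot shows a flat tail after the last observation, while the limit_area='inside' plot stops where the data stop.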
I created a random dataFrame simulating the dataset tips from seaborn:
import numpy as np
import pandas as pd
time = ['day','night']
sex = ['female','male']
smoker = ['yes','no']
for t in range(0,len(time)):
    for s in range(0,len(sex)):
        for sm in range(0,len(smoker)):
            randomarray = np.random.rand(10)*10
            if t == 0 and s == 0 and sm == 0:
                df = pd.DataFrame(index=np.arange(0,len(randomarray)),columns=["total_bill","time","sex","smoker"])
                L = 0
                for i in range(0,len(randomarray)):
                    df.loc[i] = [randomarray[i], time[t], sex[s], smoker[sm]]
                    L = L + 1
            else:
                for i in range(0,len(randomarray)):
                    df.loc[i+L] = [randomarray[i], time[t], sex[s], smoker[sm]]
                    L = L + 1
My dataFrame df has, for each column, the same type of class as the dataFrame tips from seaborn's dataset:
tips = sns.load_dataset("tips")
type(tips["total_bill"][0])
type(tips["time"][0])
numpy.float64
str
And so on for the other columns. Same as my dataFrame:
type(df["total_bill"][0])
type(df["time"][0])
numpy.float64
str
However, when I try to use seaborn's violinplot or factorplot following the documentation:
g = sns.factorplot(x="sex", y="total_bill", hue="smoker", col="time", data=df, kind="violin", split=True, size=4, aspect=.7);
I have no problems if I use the dataFrame tips, but when I use my dataFrame I get:
AttributeError: 'float' object has no attribute 'shape'
I imagine this is an issue with the way I pass the array into the dataFrame, but I couldn't find the problem, since every report of this AttributeError I found online says it occurs because the data is not the same type of class, and as shown above my dataFrame has the same types as the one in seaborn's documentation.
Any suggestions?
I had the same problem and was looking for a solution, but did not find the answer I needed, so I guess providing an answer here may help people like me.
The problem here is that the type of df.total_bill is object instead of float.
So the solution is to change it to float before passing the dataframe to seaborn:
df.total_bill = df.total_bill.astype(float)
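A short sketch of why the element type check in the question can be misleading: a frame created empty and filled row by row keeps object columns, even when every stored element is a float. The three-row frame below is a cut-down, hypothetical version of the question's loop.

```python
import numpy as np
import pandas as pd

values = np.random.rand(3) * 10

# Created without data, so every column starts out with dtype object.
df = pd.DataFrame(index=np.arange(len(values)), columns=["total_bill", "time"])
for i in range(len(values)):
    df.loc[i] = [values[i], "day"]

# Check the dtype of the whole column, not the type of one element.
dtype_before = df["total_bill"].dtype

df["total_bill"] = df["total_bill"].astype(float)
dtype_after = df["total_bill"].dtype

print(dtype_before, dtype_after)
```

Inspecting df.dtypes (rather than type(df[col][0])) is usually the quicker way to spot this kind of mismatch.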
This is a rather unusual way of creating a dataframe. The resulting dataframe also has some very strange properties, e.g. it has a length of 50 but the last index is 88. I'm not going into debugging these nested loops. Instead, I would propose to create the dataframe from some numpy array, e.g. like
import numpy as np
import pandas as pd
time = ['day','night']
sex = ['female','male']
smoker = ['yes','no']
data = np.repeat(np.stack(np.meshgrid(time, sex, smoker), -1).reshape(-1,3), 10, axis=0)
df = pd.DataFrame(data, columns=["time","sex","smoker"])
df["total_bill"] = np.random.rand(len(df))*10
Then also plotting works fine:
g = sns.factorplot(x="sex", y="total_bill", hue="smoker", col="time", data=df,
kind="violin", size=4, aspect=.7)
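As a quick check that a frame built this way has the properties the plot needs, the sketch below rebuilds it and inspects the index, the group sizes and the dtype. The names combos and group_sizes are made up for illustration.

```python
import numpy as np
import pandas as pd

time = ["day", "night"]
sex = ["female", "male"]
smoker = ["yes", "no"]

# All 8 (time, sex, smoker) combinations, one per row ...
combos = np.stack(np.meshgrid(time, sex, smoker), -1).reshape(-1, 3)
# ... each repeated 10 times.
data = np.repeat(combos, 10, axis=0)

df = pd.DataFrame(data, columns=["time", "sex", "smoker"])
df["total_bill"] = np.random.rand(len(df)) * 10

group_sizes = df.groupby(["time", "sex", "smoker"]).size()

print(len(df), df.index[-1])
print(df["total_bill"].dtype)
```

Note that in current seaborn releases factorplot is named catplot and its size argument is named height, so the plotting call would become sns.catplot(..., kind="violin", height=4, aspect=.7).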
Convert the data type of your variable from object to say float/int.
I had a different issue in my code that produced the same error:
'str' object has no attribute 'get'
In my seaborn call I had written ...data='df'..., but df is an object and should not be in quotes. Once I removed the quotes, my program worked perfectly. I made this mistake, as someone else might, because the x= and y= parameters are given in quotes (they name columns of the dataframe).
I'm trying to create an interactive plot using plotly and having trouble ordering the X axis. Here's the code I'm using:
import plotly.plotly as py
import cufflinks as cf
import pandas as pd
import plotly.tools as tls
tls.set_credentials_file(username='ladeeda', api_key='ladeeda')
cf.set_config_file(offline=False, world_readable=True, theme='pearl')
StudentModalityRetention[StudentModalityRetention['schoolyearsemester'] == 'Sem3']\
.iplot(kind='bubble', x='branch', y='retention', size='active_users', text='active_users',
xTitle='', yTitle='Retention',
filename='cufflinks/Sem3ModalityRetention')
and here's the plot that is generated:
I would like to arrange the X axis in descending order of the Y values. In other words, I would like the bubble with the highest Y value to appear first, and so on.
Any help would be much appreciated.
A simple and efficient way to achieve your goal is to sort the pandas dataframe in descending order according to your requirements and then use iplot to plot the graph, which will give you the desired result. Here is a short example:
(yourdataframe
    .sort_values(by='retention',   # column whose values determine the order
                 axis=0,           # sort the rows
                 ascending=False)  # descending; without inplace=True this returns a new, sorted dataframe that can be chained
    .iplot(kind='bubble', x='branch', y='retention', size='active_users', text='active_users',
           xTitle='', yTitle='Retention',
           filename='cufflinks/Sem3ModalityRetention'))
Hope that helps you:))
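One detail worth checking when chaining a plot call after sort_values: with inplace=True the method returns None, so nothing can be chained onto it. A minimal sketch with made-up numbers (only the column names come from the question):

```python
import pandas as pd

# Hypothetical data using the column names from the question.
frame = pd.DataFrame({
    "branch": ["North", "South", "East"],
    "retention": [0.42, 0.77, 0.58],
    "active_users": [120, 340, 95],
})

# Returns a new, sorted frame; the original is left untouched.
sorted_frame = frame.sort_values(by="retention", ascending=False)

# With inplace=True the frame is sorted in place and the call returns None,
# so a chained .iplot(...) would fail.
in_place_result = frame.copy().sort_values(by="retention", ascending=False,
                                           inplace=True)

print(list(sorted_frame["branch"]))
print(in_place_result)
```

Whether the plotted x axis follows the row order can depend on the plotting library and axis type, so it is worth confirming on the actual chart.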