I want to plot the ages of people who are more than 0.5 years old.
When I enter the data in Python and build the DataFrame myself, my code works:
import pandas as pd
data = {'age': [0.62,0.84,0.78,0.80,0.70,0.25,0.32,0.86,0.75],
'gender': [1,0,0,0,1,0,0,1,0],
'LOS': [0.11,0.37,0.23,-0.02,0.19,0.27,0.37,0.31,0.21],
'WBS': [9.42,4.40,6.80,9.30,5.30,5.90,3.10,4.10,12.07],
'HB': [22.44,10.40,15.60,15.10,11.30,10.60,12.50,10.40,14.10],
'Nothrophil': [70.43,88.40,76.50,87,82,87.59,15.40,77,88]}
df = pd.DataFrame(data, index=[0,1,2,3,4,5,6,7,8])
old = df.query('age > 0.5')
import matplotlib.pyplot as plt
plt.plot(old.age)
plt.show()
But when I use a CSV file to build the DataFrame, the code doesn't work:
import pandas as pd
df = pd.read_csv(r'F:\HCSE\sample_data1.csv', sep=';')
old = df.query('age > 0.5')
import matplotlib.pyplot as plt
plt.plot(old.age)
plt.show()
How can I do the same thing with a CSV file?
One more question: is it possible to draw a scatter plot with only one argument?
For example, I want a scatter plot of the people who are more than 0.5 years old (the Y axis is the age and the X axis is the row number in the CSV file), with different colors for different genders. How can I do that?
Thanks a lot.
but when I use a csv file to form my data-frame, the code doesn't work:
Please share the error message so that we can see what is going wrong.
Is it possible to draw a scatter plot with only one argument?
As an example I want to draw a scatter plot of people who are more than 0.5 years old (Y axis is the age and the X axis is the number of data points or number of rows in the csv file) and I want to use different colors for different genders. How can I do it?
Yes. Please refer to the code below.
colors = ['b' if gender == 1 else 'r' for gender in df.loc[df['age'] >0.5].gender]
df.loc[df['age'] > 0.5].reset_index().plot.scatter('index', 'age', color=colors)
You can also do this very easily using seaborn's lmplot.
import seaborn as sns
sns.lmplot(x="index", y="age", data=df.loc[df['age'] > 0.5].reset_index(), hue="gender", fit_reg=False)
Note that the hue argument colors the points by gender. Hope this helps with the visualization.
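If a legend is wanted as well, one option is to draw each gender as its own scatter series in plain matplotlib. The following is only a sketch: it reuses the sample age and gender values from the question, and the color per gender code and the legend labels are assumptions.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Sample values copied from the question
df = pd.DataFrame({
    'age': [0.62, 0.84, 0.78, 0.80, 0.70, 0.25, 0.32, 0.86, 0.75],
    'gender': [1, 0, 0, 0, 1, 0, 0, 1, 0],
})

# Keep people older than 0.5 and renumber the remaining rows 0..n-1
old = df[df['age'] > 0.5].reset_index(drop=True)

palette = {0: 'r', 1: 'b'}  # assumed color for each gender code

fig, ax = plt.subplots()
for gender, group in old.groupby('gender'):
    # x is the row number, y is the age; one series per gender
    # gives one legend entry per gender
    ax.scatter(group.index, group['age'],
               color=palette[gender], label=f'gender {gender}')
ax.set_xlabel('row number')
ax.set_ylabel('age')
ax.legend()
```

Calling plt.show() afterwards displays the figure.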
For the scatter plot, you could simply do:
colors = ['b' if gender == 1 else 'r' for gender in old.gender]
plt.scatter(range(len(old.age)), old.age, color = colors)
plt.show()
About the query: can you post your .csv file? It works with my data.
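Without the error message this is only a guess, but one common cause is that the column names in the file differ from the ones used in the query, for example a header line such as Age;Gender;... while the query refers to age. The sketch below assumes such a capitalized header and uses a small in-memory stand-in for the file; normalizing the column names makes the original query work unchanged.

```python
import io
import pandas as pd

# Stand-in for the CSV file; the capitalized header is an assumption
csv_text = """Age;Gender;LOS
0.62;1;0.11
0.84;0;0.37
0.25;0;0.27
"""

df = pd.read_csv(io.StringIO(csv_text), sep=';')
print(df.columns.tolist())  # check what the columns are actually called

# Strip stray whitespace and lower-case the names so 'age' matches
df.columns = df.columns.str.strip().str.lower()

old = df.query('age > 0.5')
print(old)
```

With the real file, replace io.StringIO(csv_text) with the file path. Printing df.columns first shows whether a name mismatch is really the problem.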
I'm trying to plot a histogram from different columns of an imported CSV file (data_dict). I am trying to solve the question below. The axes appear when I run the code below, but the plots do not. How would I go about plotting these? Many thanks.
Question
Write your code to plot a histogram of number of accidents by age for females and males separately. Use 10-year bins. Plot both distributions on the same plot.
gender1 = np.array(data_dict['Gender'])
age1 = np.array(data_dict['Age'])
age_females = age1[np.where(gender1 == 'Female')]
age_males = age1[np.where(gender1 == 'Male')]
plt.hist(age_males,label='Males',alpha=0.5)
plt.hist(age_females,label='Females',alpha=0.5)
plt.legend()
plt.title('Histogram of Accidents by Age and Genders')
plt.xlabel('Age')
plt.ylabel('Accidents')
plt.xticks(ticks=np.arange(10,110,step=10),labels=(10,20,30,40,50,60,70,80,90,100))
print
To me the code looks all right. I ran the following:
import numpy as np
import matplotlib.pyplot as plt
a = np.array([i for i in range(50)])
b = np.array([i for i in range(50,100)])
plt.hist(a,label='Males',alpha=0.5)
plt.hist(b,label='Females',alpha=0.5)
plt.legend()
plt.title('Histogram of Accidents by Age and Genders')
plt.xlabel('Age')
plt.ylabel('Accidents')
plt.xticks(ticks=np.arange(10,110,step=10),labels=(10,20,30,40,50,60,70,80,90,100))
plt.show()
and got this plot:
Can you reproduce this picture? If so, are you sure your age_females and age_males arrays contain the required data?
EDIT based on comment:
Well, that depends on what format your dictionary actually contains. Try to get your arrays into this format:
gender1 = np.array(['male', 'male', 'male', 'female', 'female'])
age1 = np.array([22,25,23,40,60])
age_females = age1[np.where(gender1=='female')]
age_males = age1[np.where(gender1=='male')]
While there are more elegant ways to do the indexing, this should work if you get whatever comes out of the dictionary to this array form.
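One of the more elegant ways to do the indexing is a boolean mask, and explicit bin edges give the 10-year bins the exercise asks for. This is a sketch on the small example arrays above; the 0 to 100 age range for the bins is an assumption.

```python
import numpy as np
import matplotlib.pyplot as plt

gender1 = np.array(['male', 'male', 'male', 'female', 'female'])
age1 = np.array([22, 25, 23, 40, 60])

# A boolean mask selects the matching ages directly, no np.where needed
age_males = age1[gender1 == 'male']
age_females = age1[gender1 == 'female']

# Bin edges every 10 years: 0, 10, ..., 100 (assumed age range)
bins = np.arange(0, 101, 10)

fig, ax = plt.subplots()
# Both histograms share the same axes and the same bin edges
male_counts, _, _ = ax.hist(age_males, bins=bins, label='Males', alpha=0.5)
female_counts, _, _ = ax.hist(age_females, bins=bins, label='Females', alpha=0.5)
ax.set_xticks(bins)
ax.set_xlabel('Age')
ax.set_ylabel('Accidents')
ax.legend()
```

Passing the same bins to both calls keeps the two distributions comparable, because otherwise each call picks its own bin edges from its own data.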
I have a DataFrame like this, and I want to make a scatter plot from it.
I want to plot Value1, but wherever Value2 drops below 0.6 I want the corresponding Value1 points marked in red; otherwise the default color is fine.
Any suggestions?
Add another column with color information:
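# Flag rows where Value2 is below 0.6 (1) versus the rest (0)
df['color'] = (df.Value2 < 0.6).astype(int)
# plot.scatter needs a column name for x, so turn the index into a column first
# (reset_index() names that column 'index' when the index has no name)
df.reset_index().plot.scatter(x='index', y='Value1', c='color', cmap='jet')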
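A colormap gives whatever colors sit at its two ends. If the points below the threshold should be exactly red and the rest should keep matplotlib's default blue, one alternative is to build a list of color names. The sketch below uses made-up Value1 and Value2 numbers, since the original data is not shown.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up data standing in for the asker's DataFrame
df = pd.DataFrame({
    'Value1': [1.0, 2.5, 1.8, 3.2, 2.1],
    'Value2': [0.9, 0.5, 0.7, 0.3, 0.6],
})

# 'red' where Value2 < 0.6, matplotlib's default blue elsewhere
colors = np.where(df['Value2'] < 0.6, 'red', 'tab:blue').tolist()

fig, ax = plt.subplots()
# x is the row index, y is Value1, one color per point
ax.scatter(df.index, df['Value1'], c=colors)
ax.set_xlabel('index')
ax.set_ylabel('Value1')
```

Note that a Value2 of exactly 0.6 is not "below 0.6", so that point keeps the default color.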
I use seaborn's lmplot (a more advanced scatter plot tool) for that.
You can add a new column named "Category" to your spreadsheet file. It's very easy to categorize values in Excel or OpenOffice
(something like: if cell_value < 0.6 then "low", if cell_value > 0.6 then "high").
So your test data should look like this:
Then you can import the data in Python (I use Anaconda 3.5 with Spyder and Python 3.6). I saved the file in .txt format, but any other format (.csv etc.) is possible.
#Import libraries
import seaborn as sns
import pandas as pd
import numpy as np
import os
# Open data.txt, which is stored in this directory
os.chdir(r'C:\Users\DarthVader\Desktop\Graph')
# Read the rows into a list, splitting each line on semicolons
data = []
with open('data.txt') as f:
    for line in f:
        data.append(line.strip().split(';'))
#Convert list as dataframe for plot purposes
df = pd.DataFrame(data, columns = ['ID', 'Value', 'Value2','Category'])
# Drop the first row, which holds the header (.copy() avoids a SettingWithCopyWarning below)
df2 = df.iloc[1:].copy()
# Convert the columns to be plotted to numeric types
df2[['Value','Value2']] = df2[['Value','Value2']].apply(pd.to_numeric)
#Make plot with red color with values below 0.6 and green color with values above 0.6
sns.lmplot( x="Value", y="Value2", data=df2, fit_reg=False, hue='Category', legend=False, palette=dict(high="#2ecc71", low="#e74c3c"))
Your output should look like this.
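The Category column can also be created in pandas instead of in the spreadsheet, and pd.read_csv can parse the semicolon-separated file in one step. The sketch below uses a small in-memory stand-in for data.txt with made-up numbers. Treating a value of exactly 0.6 as "high" is an assumption, since the spreadsheet rule above does not say.

```python
import io
import numpy as np
import pandas as pd

# Stand-in for data.txt: semicolon-separated with a header row (made-up values)
text = """ID;Value;Value2
1;0.10;0.90
2;0.35;0.45
3;0.60;0.75
4;0.80;0.30
"""

# read_csv handles the header and the numeric conversion itself
df = pd.read_csv(io.StringIO(text), sep=';')

# 'low' below 0.6, 'high' otherwise (exactly 0.6 counts as 'high' here)
df['Category'] = np.where(df['Value2'] < 0.6, 'low', 'high')
print(df)
```

The resulting DataFrame can be passed to sns.lmplot with hue='Category' in the same way as df2 above.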
This is the code I've written:
import pandas as pd
import matplotlib.pyplot as plt
data1 = pd.read_csv(r'F:\HCSE\sample_data1.csv', sep=';')
colnames = data1.columns
plt.plot(data1.iloc[:,0],data1.iloc[:,2],'bs')
plt.ylabel(colnames[2])
plt.xlabel(colnames[0])
plt.show()
This is the data I have used:
Age;Gender;LOS;WBC;HB;Nothrophil
0.62;1;0.11;9.42;22.44;70.43
0.84;0;0.37;4.4;10.4;88.4
0.78;0;0.23;6.8;15.6;76.5
0.8;0;-0.02;9.3;15.1;87
0.7;1;0.19;5.3;11.3;82
0.25;0;0.27;5.9;10.6;87.59
0.32;0;0.37;3.1;12.5;15.4
0.86;1;0.31;4.1;10.4;77
0.75;0;0.21;12.07;14.1;88
Finally, I have drawn the chart which can be found in the link here.
My question is: how can I use different colors for different sexes (for example, male=red and female=blue)?
Thanks in advance
I think you're looking for something like this:
cols = {0: 'red', 1: 'blue'}
plt.scatter(data1.Age, data1.LOS, c=data1.Gender.map(cols))
With your dataframe as it is, you could use the built-in df.plot.scatter() function and pass Gender to the color keyword:
data1.plot.scatter(
    'Age', 'LOS',
    c='Gender', cmap='RdBu',
    edgecolor='None', s=45)
Note that I've also removed the black borders around each point and slightly increased the size.
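Neither call above adds a legend saying which color is which. One way to get one is to build proxy legend handles from the same color dictionary. This is a sketch using the Age, Gender and LOS columns from the question; the legend labels are placeholders, because the question does not say which gender code means male and which means female.

```python
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.patches import Patch

# Age, Gender and LOS columns copied from the question's data
data1 = pd.DataFrame({
    'Age': [0.62, 0.84, 0.78, 0.80, 0.70, 0.25, 0.32, 0.86, 0.75],
    'Gender': [1, 0, 0, 0, 1, 0, 0, 1, 0],
    'LOS': [0.11, 0.37, 0.23, -0.02, 0.19, 0.27, 0.37, 0.31, 0.21],
})

cols = {0: 'red', 1: 'blue'}
names = {0: 'Gender 0', 1: 'Gender 1'}  # placeholder labels (assumption)

# One color name per row, looked up from the Gender code
point_colors = data1['Gender'].map(cols).tolist()

fig, ax = plt.subplots()
ax.scatter(data1['Age'], data1['LOS'], c=point_colors)
ax.set_xlabel('Age')
ax.set_ylabel('LOS')

# Proxy patches: one legend entry per gender code, in the matching color
handles = [Patch(color=cols[code], label=names[code]) for code in cols]
ax.legend(handles=handles)
```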
I am new to pandas and its related libraries. With the following code I can make a scatter plot of my 'class' column in the 'Month' vs 'Amount' plane. Because I have more than one class, I would like to use colors to distinguish the classes and show a legend in the figure.
My first attempt below draws the dots of each class in a different color, but it does not generate the right legend. The second attempt, by contrast, places a legend, but its labels are wrong: only the first letter of each class name appears. Moreover, the second attempt produces as many figures as there are classes. I would like to know how to correct both attempts. Any ideas or suggestions? Thanks in advance.
P.S. I wanted to use
colors = itertools.cycle(['gold','blue','red','chocolate','mediumpurple','dodgerblue'])
as well, so that I could choose the colors myself, but I could not get it to work.
Attempts:
import pandas as pd
import numpy as np
import random
from matplotlib import pyplot as plt
import matplotlib.cm as cm
np.random.seed(176)
random.seed(16)
df = pd.DataFrame({'class': random.sample(['living room','dining room','kitchen','car','bathroom','office']*10, k=25),
'Amount': np.random.sample(25)*100,
'Year': random.sample(list(range(2010,2018))*50, k=25),
'Month': random.sample(list(range(1,12))*100, k=25)})
print(df.head(25))
print(df['class'].unique())
for cls1 in df['class'].unique():
    test1 = pd.pivot_table(df[df['class']==cls1], index=['class', 'Month', 'Year'], values=['Amount'])
    print(test1)
colors = cm.rainbow(np.linspace(0,2,len(df['class'].unique())))
fig, ax = plt.subplots(figsize=(8,6))
for cls1,c in zip(df['class'].unique(),colors):
    # SCATTER PLOT
    test = pd.pivot_table(df[df['class']==cls1], index=['class', 'Month', 'Year'], values=['Amount'], aggfunc=np.sum).reset_index()
    test.plot(kind='scatter', x='Month',y='Amount', figsize=(16,6),stacked=False,ax=ax,color=c,s=50).legend(df['class'].unique(),scatterpoints=1,loc='upper left',ncol=3,fontsize=10.5)
plt.show()
for cls2,c in zip(df['class'].unique(),colors):
    # SCATTER PLOT
    test = pd.pivot_table(df[df['class']==cls2], index=['class', 'Month', 'Year'], values=['Amount'], aggfunc=np.sum).reset_index()
    test.plot(kind='scatter', x='Month',y='Amount', figsize=(16,6),stacked=False,color=c,s=50).legend(cls2,scatterpoints=1,loc='upper left',ncol=3,fontsize=10.5)
plt.show()
Up-to-date code
I would like to show the output of the following code as a scatter plot.
for cls1 in df['class'].unique():
    test3 = pd.pivot_table(df[df['class']==cls1], index=['class', 'Month'], values=['Amount'], aggfunc=np.sum)
    print(test3)
Unlike above, each class now appears only once per month, thanks to the sum over Amount.
Here is my attempt:
for cls2 in df['class'].unique():
    test2 = pd.pivot_table(df[df['class']==cls2], index=['class','Year'], values=['Amount'], aggfunc=np.sum).reset_index()
    print(test2)
    sns.lmplot(x='Year', y='Amount', data=test2, hue='class', palette='hls', fit_reg=False, size=5, aspect=5/3, legend_out=False, scatter_kws={"s": 70})
plt.show()
This gives me one plot per class. Apart from the first one (class=car), which shows different colors, the others seem to be OK. Even so, I would like to have a single plot with all classes.
After Marvin Taschenberger's helpful answer, here is the up-to-date result:
I get a white dot instead of a colored one, and the legend is in a different place in the figure than in yours. Moreover, the year labels are not displayed correctly. Why?
An easy way to work around your problem (unfortunately without solving it in your own code) is to let seaborn do the heavy lifting with this single line:
sns.lmplot(x='Month', y='Amount', data=df, hue='class', palette='hls', fit_reg=False, height=8, aspect=5/3, legend_out=False)
You could also plug in other colors for palette. (The size argument was renamed to height in seaborn 0.9; on older versions use size=8.)
EDIT: how about this, then:
import pandas as pd
import numpy as np
import random
from matplotlib import pyplot as plt
import seaborn as sns
np.random.seed(176)
random.seed(16)
df = pd.DataFrame({'class': random.sample(['living room','dining room','kitchen','car','bathroom','office']*10, k=25),
'Amount': np.random.sample(25)*100,
'Year': random.sample(list(range(2010,2018))*50, k=25),
'Month': random.sample(list(range(1,12))*100, k=25)})
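# Sum Amount per (class, Year); aggfunc='sum' is equivalent to np.sum here
frame = pd.pivot_table(df, index=['class','Year'], values=['Amount'], aggfunc='sum').reset_index()
# height replaces the size argument used by seaborn versions before 0.9
sns.lmplot(x='Year', y='Amount', data=frame, hue='class', palette='hls', fit_reg=False, height=5, aspect=5/3, legend_out=False, scatter_kws={"s": 70})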
plt.show()
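For choosing the colors by hand, as the question asks with itertools.cycle, a plain matplotlib loop over the classes is one option: every class is drawn on the same axes with its own color and legend label. This is a sketch on a small made-up DataFrame rather than the random data above. Setting the x ticks to the years that occur is an assumption about what "year labels" should look like.

```python
import itertools
import pandas as pd
import matplotlib.pyplot as plt

# Small made-up data set with the same column names as in the question
df = pd.DataFrame({
    'class': ['kitchen', 'car', 'kitchen', 'office', 'car', 'office'],
    'Year': [2010, 2010, 2011, 2011, 2012, 2012],
    'Amount': [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})

# One row per (class, Year) with the summed Amount
totals = df.groupby(['class', 'Year'], as_index=False)['Amount'].sum()

colors = itertools.cycle(['gold', 'blue', 'red', 'chocolate',
                          'mediumpurple', 'dodgerblue'])

fig, ax = plt.subplots(figsize=(8, 6))
# zip stops when the classes run out; cycle repeats colors if there are more classes
for (name, group), color in zip(totals.groupby('class'), colors):
    ax.scatter(group['Year'], group['Amount'], color=color, s=50, label=name)

# Whole-number ticks, one per year that occurs in the data
ax.set_xticks(sorted(totals['Year'].unique()))
ax.set_xlabel('Year')
ax.set_ylabel('Amount')
ax.legend(loc='upper left', ncol=3, scatterpoints=1)
```

Because each scatter call carries its own label, the legend shows the full class names rather than single letters.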
male[['Gender','Age']].plot(kind='hist', x='Gender', y='Age', bins=50)
female[['Gender','Age']].plot(kind='hist', x='Gender', y='Age', bins=50)
So basically, I used data from a file to create two histograms based on gender and age. I separated the data by gender at the start so that I could plot each one. Now I'm having a hard time putting the two histograms together.
As mentioned in the comments, you can use matplotlib for this task. I haven't figured out how to plot two histograms using pandas, though (I would like to see how people have done that).
import matplotlib.pyplot as plt
import random
# example data
age = [random.randint(20, 40) for _ in range(100)]
sex = [random.choice(['M', 'F']) for _ in range(100)]
# pass one list of ages per gender, with a matching list of colors
plt.hist([[a for a, s in zip(age, sex) if s=='M'],
          [a for a, s in zip(age, sex) if s=='F']],
         color=['b','r'], alpha=0.5, bins=10)
plt.show()
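As for doing this with pandas itself, one possible approach is to group by gender and draw each group's histogram on the same axes. The sketch below uses made-up data with the Gender and Age column names from the question; the bin edges are an assumption.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up data with the column names used in the question
df = pd.DataFrame({
    'Gender': ['M', 'F', 'M', 'F', 'M', 'F', 'M'],
    'Age': [23, 31, 35, 28, 22, 39, 30],
})

bins = [20, 25, 30, 35, 40]  # shared bin edges (assumed)

fig, ax = plt.subplots()
# Iterating over the grouped column yields (gender, Series of ages)
for gender, ages in df.groupby('Gender')['Age']:
    # Passing the same ax puts both histograms in one plot
    ages.plot.hist(ax=ax, bins=bins, alpha=0.5, label=gender)
ax.set_xlabel('Age')
ax.legend()
```

Calling plt.show() afterwards displays the combined figure.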
Consider converting the DataFrames to a two-column NumPy array, since matplotlib's hist works with that structure rather than with two pandas DataFrames of different lengths that contain non-numeric columns. Pandas' join is used to bind the two columns, MaleAge and FemaleAge.
Here, the Gender indicator is dropped and the series are labeled manually according to the column order.
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
...
# RESET INDEX AND RENAME COLUMN AFTER SUBSETTING
male = df2[df2['Gender'] == "M"].reset_index(drop=True).rename(columns={'Age':'MaleAge'})
female = df2[df2['Gender'] == "F"].reset_index(drop=True).rename(columns={'Age':'FemaleAge'})
# OUTER JOIN TO ACHIEVE SAME LENGTH
gendermat = np.array(male[['MaleAge']].join(female[['FemaleAge']], how='outer'))
plt.hist(gendermat, bins=50, label=['male', 'female'])
plt.legend(loc='upper right')
plt.show()
plt.clf()
plt.close()