Plot and annotate from DataFrame with MultiIndex and multiple columns [closed] - python

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 5 years ago.
I have a Pandas DataFrame that contains multiple columns and a MultiIndex. I would like to plot the data from two columns ("Total" and "Sold") as separate line charts and use the values from a third column, "Percentage", as the annotation text for the points on the "Sold" chart.
What is the best way to do it? Any advice and suggestions will be greatly appreciated.
import pandas as pd

# data is a dict
data = {'Department': ['Furniture', 'Furniture', 'Furniture',
                       'Gifts', 'Gifts', 'Gifts'],
        'Month': ['May', 'June', 'July', 'May', 'June', 'July'],
        'Total': [2086, 1740, 1900, 984, 662, 574],
        'Sold': [201, 225, 307, 126, 143, 72],
        'Percentage': [10, 13, 16, 13, 22, 13]
        }
# DataFrame() turns the dict into a DataFrame
# Set up MultiIndex
df = pd.DataFrame(data)
df.set_index(['Department', 'Month'], inplace=True)
df
(Image: the resulting DataFrame)
# Plot departments
departments = df.index.get_level_values(0).unique()
for department in departments:
    # .ix was removed in pandas 1.0; .loc selects the department's rows
    ax = df.loc[department].plot(title=department, y=['Total', 'Sold'],
                                 xlim=(-1.0, 3.0))
(Image: plots from the DataFrame)

You could achieve this in several ways.
I will mention just a couple of the most straightforward ones. This is not meant to be complete, and there may well be easier ways to do it.
One way is to use the Axes method text.
In your case, it would be:
ii = [0, 1, 2]  # x positions of the month labels: pandas plots a non-numeric index at 0, 1, 2, ...
for department in departments:
    ax = df.loc[department].plot(title=department, y=['Total', 'Sold'], xlim=(-1.0, 3.0))
    for c, month in enumerate(df.loc[department].index):  # in your case ['May', 'June', 'July']
        ax.text(ii[c], df.loc[(department, month), 'Sold'],
                str(df.loc[(department, month), 'Percentage']) + '%')
The other method uses annotate. Keeping the same outer loop over departments as above, you would replace the inner loop that calls ax.text with something like:
for c, month in enumerate(df.loc[department].index):
    ax.annotate(str(df.loc[(department, month), 'Percentage']) + '%',
                (ii[c], df.loc[(department, month), 'Sold']),
                xytext=(0, 0),
                textcoords='offset points')
Of course you can tweak positions, font size, etc.
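As a minimal self-contained sketch of such tweaks (assuming a recent pandas, where .ix no longer exists, so .loc is used), the helper below plots one department and labels each Sold point with its Percentage, nudged 5 points upward and centered, in a smaller font. The helper name plot_department and the offset and font-size values are arbitrary choices for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

data = {'Department': ['Furniture', 'Furniture', 'Furniture', 'Gifts', 'Gifts', 'Gifts'],
        'Month': ['May', 'June', 'July', 'May', 'June', 'July'],
        'Total': [2086, 1740, 1900, 984, 662, 574],
        'Sold': [201, 225, 307, 126, 143, 72],
        'Percentage': [10, 13, 16, 13, 22, 13]}
df = pd.DataFrame(data).set_index(['Department', 'Month'])


def plot_department(df, department):
    """Hypothetical helper: plot Total and Sold for one department, label Sold with Percentage."""
    sub = df.loc[department]  # drops the Department level, leaving Month as the index
    fig, ax = plt.subplots()
    sub.plot(ax=ax, title=department, y=['Total', 'Sold'], xlim=(-1.0, len(sub)))
    # A non-numeric index is drawn at x = 0, 1, 2, ..., so the row position is the x coordinate
    for x, (sold, pct) in enumerate(zip(sub['Sold'], sub['Percentage'])):
        ax.annotate(f'{pct}%', (x, sold),
                    xytext=(0, 5), textcoords='offset points',  # 5 points above the marker
                    ha='center', fontsize=8)
    return ax


axes = [plot_department(df, d) for d in df.index.get_level_values('Department').unique()]
# plt.show()  # uncomment to display the figures outside a notebook
```

Computing the x position from the row order avoids hard-coding a list such as [0, 1, 2], so the same code works for any number of months.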
For an intro to annotations, please consult the official webpage:
Matplotlib annotations
Here are the resulting plots I get:

Related

Pandas Plot: Plotting the freq each person visits the park each month [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 5 days ago.
I have a dataframe grouped by month. I want to plot a graph with the x-axis showing each person ID and the y-axis showing the frequency of trips to the park each month. I would like each month to be a different color. The dataframe consists of index, month, id and freq columns. The frequency is the number of times a person visited the park in a given month.
I've come up with the following two graphs, but I can't figure out how to adjust them to display exactly what I'm looking for.
Grouped by Month, using it as a group key
Grouped by both Month and Camera ID
I'm looking for a graph that will show a similar output. The dataset contains nearly a thousand people (y-axis), so I'm open to suggestions for a better format.
Desired Product
You can use seaborn.barplot with hue='month', so that each month gets its own color.
I have created dummy data based on your desired output.
Data Creation Code
import pandas as pd
data = {'person_id': ['1', '1', '2', '2', '3', '3', '4', '4', '1', '3', '3', '4', '4', '1'],
        'month': ['Jan', 'March', 'Feb', 'June', 'Jan', 'July', 'April', 'Dec', 'Dec', 'Feb', 'Oct', 'Oct', 'Nov', 'May'],
        'frequency': [1, 1, 3, 23, 23, 34, 23, 24, 123, 34, 324, 245, 123, 34]}
df = pd.DataFrame(data)
df
Data Looks Like This
The code for the resulting output would be:
import seaborn as sns
import matplotlib.pyplot as plt
plt.figure(figsize=(10,8))
sns.barplot(data=df,x='person_id',y='frequency',hue='month')
plt.xlabel("Person ID")
plt.ylabel("Frequency")
plt.title("Person ID v/s Frequency")
plt.show()
The Output
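The question describes freq as the number of visits per person per month. As a sketch, assuming a hypothetical raw log named visits with one row per park visit (the column names person_id and month are illustrative), the counts can be computed with groupby(...).size() and drawn with plain pandas by pivoting the months into columns, which gives one colored bar per month:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical raw log: one row per park visit
visits = pd.DataFrame({
    'person_id': ['1', '1', '1', '2', '2', '3', '3', '3', '3'],
    'month':     ['Jan', 'Jan', 'Feb', 'Jan', 'Feb', 'Feb', 'Feb', 'Feb', 'Mar'],
})

# Count visits per (person, month); this plays the role of the "freq" column
freq = visits.groupby(['person_id', 'month']).size().reset_index(name='freq')

# Pivot so each month becomes a column; pandas then draws one colored bar per month
month_order = ['Jan', 'Feb', 'Mar']  # calendar order, not alphabetical
wide = (freq.pivot(index='person_id', columns='month', values='freq')
            .reindex(columns=month_order)
            .fillna(0))

ax = wide.plot(kind='bar', figsize=(10, 6), rot=0)
ax.set_xlabel('Person ID')
ax.set_ylabel('Frequency')
# plt.show()
```

With nearly a thousand people the bars become very thin; plotting a slice such as wide.head(50) at a time is one option.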

Plot time series chart with dates as multiple lines [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 months ago.
I have a pandas dataframe in the following format:
date ticks value
The ticks vary from 1 to 12 for each date, and there are corresponding values in the value column.
I want to plot a time series line chart where the x-axis represents the ticks from 1 to 12, the y-axis represents the value, and there are multiple lines on the chart, one for each date. How can I achieve this using pandas or another library such as matplotlib?
Use:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# making sample df
df = pd.DataFrame({'date': ['2020']*12 + ['2019']*12, 'ticks': list(range(1, 13))*2, 'value': np.random.randint(1, 100, 24)})
# one row per date, with 'ticks' and 'value' collected into lists
g = df.groupby('date').agg(list).reset_index()
for i, row in g.iterrows():
    plt.plot(row['ticks'], row['value'], label=row['date'])
plt.legend()
plt.show()
Output:
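An alternative sketch (not part of the original answer) skips the groupby/list step: pivot so that each date becomes a column, then let pandas draw one line per column against the ticks. It uses the same kind of sample data, with a seeded random generator added only so the example is reproducible:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)  # seed only for reproducibility
df = pd.DataFrame({'date': ['2020'] * 12 + ['2019'] * 12,
                   'ticks': list(range(1, 13)) * 2,
                   'value': rng.integers(1, 100, 24)})

# Rows become ticks 1..12, columns become dates, cells hold the values
wide = df.pivot(index='ticks', columns='date', values='value')

ax = wide.plot()  # one line per date, x-axis = ticks, legend built from the column names
ax.set_xlabel('ticks')
ax.set_ylabel('value')
# plt.show()
```

pivot requires each (ticks, date) pair to occur only once; if the real data has duplicates, pivot_table with an aggregation function would be needed instead.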

Timeseries data to plot [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 1 year ago.
I have data consisting of dates with times and the industry recorded on each date. For example, the data would be something like this:
I want to plot the dates grouped by month, showing which industries occurred most often in each month.
How can I do that?
Your problem seems to be that you have two different data types, which makes creating a graph difficult. However, you can reformat the data into the types you need, which makes creating the graph you intend much easier. Something like this should work for what you want:
import pandas as pd
import matplotlib.pyplot as plt
data = pd.DataFrame(
    [{'date_raised': pd.to_datetime('2016-01-01 00:00:00'), 'primary_industry': 'Real Estate'},
     {'date_raised': pd.to_datetime('2016-01-10 04:00:00'), 'primary_industry': 'IT Solutions'},
     {'date_raised': pd.to_datetime('2016-01-04 04:00:00'), 'primary_industry': 'Multimedia'},
     {'date_raised': pd.to_datetime('2016-01-05 04:00:00'), 'primary_industry': 'Technology'},
     {'date_raised': pd.to_datetime('2016-01-09 04:00:00'), 'primary_industry': 'Technology'}]
)
#Group data for monthly occurrences (one row per month, one column per industry)
months = data['date_raised'].dt.strftime('%B')
result = data.groupby(months)['primary_industry'].value_counts().unstack(level=1, fill_value=0)
#groupby sorts month names alphabetically, so restore chronological order
result = result.reindex(months.loc[data['date_raised'].sort_values().index].unique())
result.index.name = None #Remove index name "date_raised"
result.columns.names = [None] #Remove series name "primary_industry"
#Plot data
ax = result.plot(kind='bar', use_index=True, rot=1)
ax.set_xlabel('Month')
ax.set_ylabel('Total Occurrences')
plt.show()
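The question also asks which industries occurred most in each month. A sketch using pd.crosstab, keyed by year-month strings ('%Y-%m') so that months sort chronologically and the same month in different years stays separate; idxmax(axis=1) then picks the most frequent industry per month (on a tie it returns the first column, i.e. the alphabetically first industry). It uses the same five sample rows as above:

```python
import pandas as pd
import matplotlib.pyplot as plt

data = pd.DataFrame({
    'date_raised': pd.to_datetime(['2016-01-01 00:00:00', '2016-01-10 04:00:00',
                                   '2016-01-04 04:00:00', '2016-01-05 04:00:00',
                                   '2016-01-09 04:00:00']),
    'primary_industry': ['Real Estate', 'IT Solutions', 'Multimedia',
                         'Technology', 'Technology'],
})

# Rows: year-month; columns: industry; cells: number of occurrences
month = data['date_raised'].dt.strftime('%Y-%m')
counts = pd.crosstab(month, data['primary_industry'])

# Most frequent industry in each month
top_industry = counts.idxmax(axis=1)

ax = counts.plot(kind='bar', rot=0)
ax.set_xlabel('Month')
ax.set_ylabel('Total Occurrences')
# plt.show()
```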

How to plot this kind of graph in Python or R? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I have searched for how to plot this graph with matplotlib or ggplot, but I couldn't figure out how to make it.
The figure is from Nature 500(7463):415-421, August 2013.
I want to plot the data as dots, with a mark for the median, to show the distribution.
Many thanks for any help!
This question is really about how to research the literature. So let's do that.
Here's the article in PubMed. It's also freely available at PubMed Central. There, we find supplementary data files in XLS format. The file with data closest to what we need is this XLS file. Unfortunately, exploration reveals that it contains only 8 distinct tissue types, whereas Figure 1 contains 30. So we cannot reproduce that figure from the data. This is not uncommon in science.
However, the figure caption points us to this article, which contains a similar figure. The data are available in this XLS file.
I downloaded that file, opened it in Excel and saved it in the latest XLSX format. Now we can read it into R, assuming the file is in Downloads:
library(tidyverse)
library(readxl)
tableS2 <- read_excel("~/Downloads/NIHMS471461-supplement-3.xlsx",
sheet = "Table S2")
Now we read the figure caption:
Each dot corresponds to a tumor-normal pair, with vertical position indicating the total frequency of somatic mutations in the exome. Tumor types are ordered by their median somatic mutation frequency...
In our file, the pairs correspond to name, total frequency is n_coding_mutations and somatic mutation frequency is coding_mutation_rate. So we want to:
group by tumor_type
calculate the median of coding_mutation_rate
order the values of n_coding_mutations within tumor_type
order tumor_type by median coding_mutation_rate
And then plot the ordered total frequencies versus sample, grouped by the ordered tumor types.
Which might look something like this:
tableS2 %>%
group_by(tumor_type) %>%
mutate(median_n = median(n_coding_mutations)) %>%
arrange(tumor_type, coding_mutation_rate) %>%
mutate(idx = row_number()) %>%
arrange(median_n) %>%
ungroup() %>%
mutate(tumor_type = factor(tumor_type,
levels = unique(tumor_type))) %>%
ggplot(aes(idx, n_coding_mutations)) +
geom_point() +
facet_grid(~tumor_type,
switch = "x") +
scale_y_log10() +
geom_hline(aes(yintercept = median_n),
color = "red") +
theme_minimal() +
theme(strip.text.x = element_text(angle = 90),
axis.title.x = element_blank(),
axis.text.x = element_blank())
The resulting plot looks pretty close to the original figure.
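For the Python side of the question, here is a rough matplotlib sketch of the same idea. It is not a port of the R code above: it uses a single value column both for sorting the dots within each group and for ordering the groups by median. The function name sorted_dot_plot and the toy data are hypothetical, used only to show the shape of the output.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def sorted_dot_plot(df, group_col, value_col, ax=None):
    """One slot per group: sorted values as dots plus a red line at the group median.
    Groups are ordered left to right by increasing median."""
    medians = df.groupby(group_col)[value_col].median().sort_values()
    if ax is None:
        _, ax = plt.subplots(figsize=(10, 4))
    for i, (group, median) in enumerate(medians.items()):
        values = np.sort(df.loc[df[group_col] == group, value_col].to_numpy())
        # Spread the sorted values evenly across a slot of width 0.8 centered on x = i
        x = i - 0.4 + 0.8 * (np.arange(len(values)) + 0.5) / len(values)
        ax.scatter(x, values, s=8, color='black')
        ax.hlines(median, i - 0.4, i + 0.4, colors='red')
    ax.set_yscale('log')  # like scale_y_log10(); values must be positive
    ax.set_xticks(range(len(medians)))
    ax.set_xticklabels(medians.index, rotation=90)
    return ax, list(medians.index)


# Toy data only, not from the paper
toy = pd.DataFrame({'tumor_type': ['A', 'A', 'A', 'B', 'B', 'C'],
                    'n_coding_mutations': [100, 10, 50, 5, 7, 1000]})
ax, order = sorted_dot_plot(toy, 'tumor_type', 'n_coding_mutations')
# plt.show()
```

For the real data, pd.read_excel(path, sheet_name='Table S2'), with path pointing at the downloaded XLSX file, should load the same table (an Excel reader such as openpyxl must be installed), after which the function can be called with 'tumor_type' and 'n_coding_mutations'.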

Python method to display dataframe rows with least common column string [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
I have a dataframe with 3 columns (department, sales, region), and I want to write a method to display all rows that are from the least common region. Then I need to write another method to count the frequency of the departments that are represented in the least common region. No idea how to do this.
Functions would be unnecessary: pandas already has implementations that accomplish what you want! Suppose I had the following CSV file, test.csv:
department,sales,region
sales,26,midwest
finance,45,midwest
tech,69,west
finance,43,east
hr,20,east
sales,34,east
If I'm understanding you correctly, I would obtain a DataFrame representing the least common region like so:
import pandas as pd
df = pd.read_csv('test.csv')
counts = df['region'].value_counts()
least_common = counts[counts == counts.min()].index[0]
least_common_df = df.loc[df['region'] == least_common]
least_common_df is now:
department sales region
2 tech 69 west
As for obtaining the department frequency for the least common region, I'll leave that up to you. (I've already shown you how to get the frequency for region.)
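For completeness, a sketch of that last step: value_counts on the department column of the least-common-region rows. To keep it self-contained, the sample CSV is read from a string via io.StringIO instead of test.csv. It also uses isin rather than index[0], so that if several regions tie for the minimum count, all of them are kept (a small deviation from the code above).

```python
import io
import pandas as pd

csv_text = """department,sales,region
sales,26,midwest
finance,45,midwest
tech,69,west
finance,43,east
hr,20,east
sales,34,east
"""
df = pd.read_csv(io.StringIO(csv_text))

counts = df['region'].value_counts()
# All regions that share the minimum count (ties are kept)
least_common_regions = counts[counts == counts.min()].index
least_common_df = df[df['region'].isin(least_common_regions)]

# Frequency of each department within the least common region(s)
dept_freq = least_common_df['department'].value_counts()
print(dept_freq)
```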
