Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 1 year ago.
I have data consisting of dates with times and the industries that occurred on those dates. For example, the data would be something like this:
I want to plot the dates by month, showing which industries occurred the most during each month.
How can I do that?
Your problem seems to be that you have two different data types, which makes creating a graph difficult. However, you can reformat the data into the types you want, which makes creating the graph you intend much easier. Something like this should work for what you want.
import pandas as pd
import matplotlib.pyplot as plt
data = pd.DataFrame(
[{'date_raised':pd.to_datetime('2016-01-01 00:00:00'),'primary_industry':'Real Estate'},
{'date_raised':pd.to_datetime('2016-01-10 04:00:00'),'primary_industry':'IT Solutions'},
{'date_raised':pd.to_datetime('2016-01-04 04:00:00'),'primary_industry':'Multimedia'},
{'date_raised':pd.to_datetime('2016-01-05 04:00:00'),'primary_industry':'Technology'},
{'date_raised':pd.to_datetime('2016-01-09 04:00:00'),'primary_industry':'Technology'}]
)
#Group data for monthly occurrences
result = data.sort_values('date_raised').groupby([data['date_raised'].dt.strftime('%B')])['primary_industry'].value_counts().unstack(level=1)
result.index.name = None #Remove index name "date_raised"
result.columns.names = [None] #Remove series name "primary_industry"
#Plot data
ax = result.plot(kind='bar', use_index=True, rot=0)  # rot=0 keeps the month labels horizontal
ax.set_xlabel('Month')
ax.set_ylabel('Total Occurrences')
plt.show()
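One caveat with grouping on month names: groupby sorts the names alphabetically, so once the data spans several months the bars will not appear in calendar order. Below is a minimal sketch, assuming the same column names as above, that groups on a monthly period instead and also picks out the most frequent industry per month. The sample rows and the names `counts` and `top_per_month` are illustrative, not taken from the question.

```python
import pandas as pd

# Illustrative sample spanning two months (not the asker's real data)
data = pd.DataFrame({
    'date_raised': pd.to_datetime([
        '2016-01-01 00:00:00', '2016-01-05 04:00:00', '2016-01-09 04:00:00',
        '2016-02-03 04:00:00', '2016-02-10 04:00:00', '2016-02-11 04:00:00',
    ]),
    'primary_industry': ['Real Estate', 'Technology', 'Technology',
                         'Multimedia', 'Multimedia', 'Technology'],
})

# Group on a monthly Period so the months sort chronologically
month = data['date_raised'].dt.to_period('M')
counts = (data.groupby([month, 'primary_industry'])
              .size()
              .unstack(fill_value=0))

# Most frequent industry in each month (ties resolve to the first column)
top_per_month = counts.idxmax(axis=1)
print(top_per_month)
```

Calling `counts.plot(kind='bar')` on this table should give the same grouped bar chart as above, with the months in chronological order.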
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 months ago.
I have a pandas DataFrame in the following format:
date ticks value
The ticks vary from 1 to 12 for each date, and there are corresponding values in the value column.
I want to plot a line chart where the x-axis represents ticks from 1 to 12, the y-axis represents value, and there are multiple lines on the chart, each line representing a different date. How can I achieve this using pandas or any other library like matplotlib?
Use:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# making sample df
df = pd.DataFrame({'date':['2020']*12+['2019']*12, 'ticks': list(range(1, 13))*2, 'value': np.random.randint(1,100,24)})
# collect each date's ticks and values into lists
g = df.groupby('date').agg(list).reset_index()
# draw one line per date
for i, row in g.iterrows():
    plt.plot(row['ticks'], row['value'], label=row['date'])
plt.legend()
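As an alternative to the loop above, the same chart can be produced by reshaping the data so that each date becomes a column. This is only a sketch on the same kind of sample data; the name `wide` is illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Same shape of sample data as above: 12 ticks for each of two dates
df = pd.DataFrame({
    'date': ['2020'] * 12 + ['2019'] * 12,
    'ticks': list(range(1, 13)) * 2,
    'value': np.random.randint(1, 100, 24),
})

# One row per tick, one column per date
wide = df.pivot(index='ticks', columns='date', values='value')

# DataFrame.plot draws one line per column and adds the legend
ax = wide.plot()
ax.set_xlabel('ticks')
ax.set_ylabel('value')
```

Call `plt.show()` afterwards to display the figure when running outside a notebook.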
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I have a 1000*8 dataset in which each column represents the price of a stock over time, so there are 8 stocks. I want to draw 8 boxplots, one per stock, in a loop in Python to examine the extreme values. Could you please tell me how I can do that?
As a quick alternative to using matplotlib directly, Pandas has a reasonable boxplot function that could be used.
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(1000, 8), columns=list('ABCDEFGH'))
df.boxplot(column=list(df.columns))
Edit: I just realised your question asked to do this in a loop.
import matplotlib.pyplot as plt
for c in df.columns:
    fig, ax = plt.subplots()
    df.boxplot(column=c, ax=ax)  # draw on the axes just created
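If eight separate figures are inconvenient, the loop can instead fill a grid of subplots in a single figure. The sketch below uses the same random stand-in data and also counts the points that fall outside 1.5 times the interquartile range, which is the default whisker rule matplotlib boxplots use. The 2 x 4 layout and the name `outliers` are illustrative choices.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(1000, 8), columns=list('ABCDEFGH'))

# 2 x 4 grid: one axes per stock
fig, axes = plt.subplots(2, 4, figsize=(12, 6))
for ax, c in zip(axes.flat, df.columns):
    df.boxplot(column=c, ax=ax)
fig.tight_layout()

# Count values beyond 1.5 * IQR from the quartiles, per column
q1, q3 = df.quantile(0.25), df.quantile(0.75)
iqr = q3 - q1
outliers = ((df < q1 - 1.5 * iqr) | (df > q3 + 1.5 * iqr)).sum()
print(outliers)
```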
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 5 years ago.
I have a big dataset of 83000 rows with dates and values. I want to plot the moving average of the values against time, but my graph is not clear, as you can see in the images. How can I adjust the graph to make it clearer? Is there another way to plot such a big dataset? When I look at this graph, many lines overlap each other and they don't convey much.
(I generally use matplotlib and seaborn libraries for Python)
Given this dataframe:
df.head()
complete mid_c mid_h mid_l mid_o time
0 True 0.80936 0.80943 0.80936 0.80943 2018-01-31 09:54:10+00:00
1 True 0.80942 0.80942 0.80937 0.80937 2018-01-31 09:54:20+00:00
2 True 0.80946 0.80946 0.80946 0.80946 2018-01-31 09:54:25+00:00
3 True 0.80942 0.80942 0.80940 0.80940 2018-01-31 09:54:30+00:00
4 True 0.80944 0.80944 0.80944 0.80944 2018-01-31 09:54:35+00:00
Create a 50-period moving average:
df['ma'] = df.mid_c.rolling(window=50).mean()
Plot it:
import matplotlib.pyplot as plt
df.plot('time', ['mid_c', 'ma'])
plt.show()
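With 83000 points, a 50-row window still leaves a very dense line. One option is to downsample before plotting. The sketch below is an assumption-laden example: the data is synthetic (one price every 5 seconds, named like the frame above), and the hourly bin size and 24-point window are arbitrary choices to tune for the real data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the 83000-row frame: one price every 5 seconds
rng = np.random.default_rng(0)
n = 83_000
df = pd.DataFrame({
    'time': pd.date_range('2018-01-31 09:54:10', periods=n,
                          freq=pd.Timedelta(seconds=5), tz='UTC'),
    'mid_c': 0.809 + rng.normal(0, 1e-5, n).cumsum(),
})

# Downsample to one mean value per hour before plotting
hourly = (df.set_index('time')['mid_c']
            .resample(pd.Timedelta(hours=1))
            .mean()
            .to_frame())

# Moving average over 24 hourly points
hourly['ma'] = hourly['mid_c'].rolling(window=24).mean()
print(len(df), '->', len(hourly), 'points')
```

`hourly.plot(y=['mid_c', 'ma'])` then draws far fewer points, which usually makes the trend easier to read.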
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 5 years ago.
I have a pandas DataFrame that contains multiple columns and a MultiIndex. I would like to plot data from two columns ("Total" and "Sold") as separate line charts and use the values from a third column, "Percentage", as the annotation text for the points on the "Sold" chart.
What is the best way to do it? Any advice and suggestions will be greatly appreciated.
#data is a dict
data = { 'Department': ['Furniture','Furniture','Furniture',
'Gifts','Gifts','Gifts'],
'Month':['May','June','July','May','June','July'],
'Total':[2086,1740,1900,984,662,574],
'Sold':[201,225,307,126,143,72],
'Percentage':[10, 13, 16, 13, 22, 13]
}
import pandas as pd
# DataFrame() turns the dict into a DataFrame
df = pd.DataFrame(data)
# Set up the MultiIndex
df.set_index(['Department', 'Month'], inplace=True)
df
DataFrame
# Plot departments (.loc replaces .ix, which was removed in pandas 1.0)
departments = df.index.get_level_values(0).unique()
for department in departments:
    ax = df.loc[department].plot(title=department, y=['Total', 'Sold'],
                                 xlim=(-1.0, 3.0))
Plot from DataFrame
You could achieve this in different ways.
I will mention just a couple of the most straightforward ones, without aiming to be complete; there may well be easier ways to do it.
One way uses the text method.
In your case, that would be:
ii = [0, 1, 2]  # x positions of the month labels in your plot; I leave it to you to derive these automatically
for department in departments:
    ax = df.loc[department].plot(title=department, y=['Total', 'Sold'], xlim=(-1.0, 3.0))
    # the months of this department, in your case ['May', 'June', 'July']
    for c, months in enumerate(df.loc[department].index):
        ax.text(ii[c], df.loc[department]['Sold'].iloc[c], str(df.loc[department]['Percentage'].iloc[c]) + '%')
The other method uses annotate. Leaving out the for loops shown above, you would replace the call to ax.text with something like:
ax.annotate(str(df.loc[department]['Percentage'][months]) + '%',
            (ii[c], df.loc[department]['Sold'][months]),
            xytext=(0, 0),
            textcoords='offset points')
Of course you can tweak positions, font size, etc.
For an intro to annotations, please consult the official webpage:
Matplotlib annotations
Here are the resulting plots I get:
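For reference, here is a self-contained sketch that puts the pieces together on the data from the question, using annotate. It relies on pandas drawing a string index at integer positions 0, 1, 2, so the x positions come from enumerate rather than a hard-coded list. The 5-point vertical offset, the centered alignment and the `axes` dict are illustrative additions.

```python
import matplotlib.pyplot as plt
import pandas as pd

data = {
    'Department': ['Furniture', 'Furniture', 'Furniture',
                   'Gifts', 'Gifts', 'Gifts'],
    'Month': ['May', 'June', 'July', 'May', 'June', 'July'],
    'Total': [2086, 1740, 1900, 984, 662, 574],
    'Sold': [201, 225, 307, 126, 143, 72],
    'Percentage': [10, 13, 16, 13, 22, 13],
}
df = pd.DataFrame(data).set_index(['Department', 'Month'])

axes = {}  # keep each department's axes for later tweaking
for department in df.index.get_level_values(0).unique():
    dept = df.loc[department]  # rows of one department, indexed by Month
    ax = dept.plot(title=department, y=['Total', 'Sold'], xlim=(-1.0, 3.0))
    # x is the integer position at which pandas drew this month
    for x, (month, row) in enumerate(dept.iterrows()):
        ax.annotate(f"{row['Percentage']}%", (x, row['Sold']),
                    xytext=(0, 5), textcoords='offset points', ha='center')
    axes[department] = ax
```

Call `plt.show()` afterwards to display both figures.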
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
I have a dataframe with 3 columns (department, sales, region), and I want to write a method to display all rows that are from the least common region. Then I need to write another method to count the frequency of the departments that are represented in the least common region. No idea how to do this.
Writing your own functions would be unnecessary: pandas already has implementations to accomplish what you want! Suppose I had the following csv file, test.csv...
department,sales,region
sales,26,midwest
finance,45,midwest
tech,69,west
finance,43,east
hr,20,east
sales,34,east
If I'm understanding you correctly, I would obtain a DataFrame representing the least common region like so:
import pandas as pd
df = pd.read_csv('test.csv')
counts = df['region'].value_counts()
least_common = counts[counts == counts.min()].index[0]
least_common_df = df.loc[df['region'] == least_common]
least_common_df is now:
department sales region
2 tech 69 west
As for obtaining the department frequency for the least common region, I'll leave that up to you. (I've already shown you how to get the frequency for region.)
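For readers who want to check their result, one possible approach is to reuse value_counts on the department column of the filtered frame. This sketch reads the same sample CSV from an in-memory string instead of test.csv so that it runs on its own; the name `department_counts` is illustrative, and idxmin returns only the first region when several tie for least common.

```python
import io

import pandas as pd

csv_text = """department,sales,region
sales,26,midwest
finance,45,midwest
tech,69,west
finance,43,east
hr,20,east
sales,34,east
"""
df = pd.read_csv(io.StringIO(csv_text))

# Region with the fewest rows (first one found if there is a tie)
least_common = df['region'].value_counts().idxmin()
least_common_df = df[df['region'] == least_common]

# Frequency of each department within that region
department_counts = least_common_df['department'].value_counts()
print(department_counts)
```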