Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 7 months ago.
Improve this question
I have a pandas DataFrame in the following format:
date ticks value
The ticks run from 1 to 12 for each date, and the value column holds the corresponding values.
I want to plot a line chart where the x-axis shows the ticks from 1 to 12, the y-axis shows value, and there is one line per date. How can I achieve this using pandas or another library such as matplotlib?
Use:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# making sample df
df = pd.DataFrame({'date': ['2020']*12 + ['2019']*12, 'ticks': list(range(1, 13))*2, 'value': np.random.randint(1, 100, 24)})
# collect the ticks and values of each date into lists, one row per date
g = df.groupby('date').agg(list).reset_index()
# draw one line per date
for i, row in g.iterrows():
    plt.plot(row['ticks'], row['value'], label=row['date'])
plt.legend()
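As an alternative sketch (not part of the answer above), the same chart can be drawn by reshaping the frame so that each date becomes its own column and letting pandas plot one line per column. The helper name plot_by_date is made up for illustration, and the code assumes the columns are called date, ticks and value as in the question, with at most one row per (date, ticks) pair.

```python
import pandas as pd


def plot_by_date(df):
    """Draw one line per date with ticks on the x-axis (hypothetical helper)."""
    # Reshape long -> wide: rows are ticks, columns are dates.
    # pivot raises an error if a (ticks, date) pair occurs more than once.
    wide = df.pivot(index='ticks', columns='date', values='value')
    # DataFrame.plot draws one line per column and builds the legend itself.
    ax = wide.plot(xticks=list(wide.index))
    ax.set_xlabel('ticks')
    ax.set_ylabel('value')
    return wide, ax
```

Calling plot_by_date(df) and then matplotlib.pyplot.show() should display the chart; the returned wide frame is handy for checking the reshaped data.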
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 1 year ago.
I have data consisting of timestamps and the industry recorded at each timestamp. For example, the data looks something like this:
I want to plot the dates grouped by month, showing which industries occurred most often in each month.
How can I do that?
Your problem seems to be that you have two different data types, which makes creating a graph difficult. However, you can convert the data to the types you need, which makes plotting it the way you intend much easier. Something like this should work for what you want:
import pandas as pd
import matplotlib.pyplot as plt
data = pd.DataFrame(
[{'date_raised':pd.to_datetime('2016-01-01 00:00:00'),'primary_industry':'Real Estate'},
{'date_raised':pd.to_datetime('2016-01-10 04:00:00'),'primary_industry':'IT Solutions'},
{'date_raised':pd.to_datetime('2016-01-04 04:00:00'),'primary_industry':'Multimedia'},
{'date_raised':pd.to_datetime('2016-01-05 04:00:00'),'primary_industry':'Technology'},
{'date_raised':pd.to_datetime('2016-01-09 04:00:00'),'primary_industry':'Technology'}]
)
#Group data for monthly occurrences
result = data.sort_values('date_raised').groupby([data['date_raised'].dt.strftime('%B')])['primary_industry'].value_counts().unstack(level=1)
result.index.name = None #Remove index name "date_raised"
result.columns.names = [None] #Remove series name "primary_industry"
#Plot data
ax = result.plot(kind='bar',use_index=True,rot=1)
ax.set_xlabel('Month')
ax.set_ylabel('Total Occurrences')
plt.show()
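One thing to watch with month names from strftime('%B') is that they sort alphabetically rather than by calendar order. A minimal sketch of a variation, assuming the same date_raised and primary_industry columns, groups on a 'YYYY-MM' string instead (which sorts chronologically) and also reports the most frequent industry per month. The function name monthly_industry_counts is an assumption for illustration.

```python
import pandas as pd


def monthly_industry_counts(data):
    """Count industries per month and find the most frequent one (hypothetical helper)."""
    # 'YYYY-MM' strings sort in calendar order, unlike month names.
    month = data['date_raised'].dt.strftime('%Y-%m').rename('month')
    # Rows are months, columns are industries, cells are occurrence counts.
    counts = pd.crosstab(month, data['primary_industry'])
    # Most frequent industry per month; on a tie the first column wins.
    top = counts.idxmax(axis=1)
    return counts, top
```

The counts frame can be passed to counts.plot(kind='bar') in the same way as result above.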
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
Date,hrs,Count,Status
2018-01-02,4,15,SFZ
2018-01-03,5,16,ACZ
2018-01-04,3,14,SFZ
2018-01-05,5,15,SFZ
2018-01-06,5,18,ACZ
This is a fraction of the data I've been working on. The actual data is in the same format, with around 1000 entries for each date. I am taking start_date and end_date as inputs from the user. In this case, consider:
start_date:2018-01-02
end_date:2018-01-06
So I have to display the totals of hrs and Count within the selected date range in the output. I also want to do it using an @app.callback in Dash (plot.ly). Can someone help, please?
Use Series.between to build a boolean mask, filter the rows with DataFrame.loc while selecting the hrs and Count columns, and then sum:
df = df.loc[df['Date'].between('2018-01-02','2018-01-06'), ['hrs','Count']].sum()
print (df)
hrs 22
Count 78
dtype: int64
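Since the dates come from user input, it may help to wrap the filtering in a small function that the body of a Dash callback could call. The sketch below is only an assumption about how that could look: the name totals_between is invented here, the Dash wiring itself is not shown, and the Date column is parsed to datetimes so that the comparison does not depend on string formatting.

```python
import pandas as pd


def totals_between(df, start_date, end_date):
    """Sum hrs and Count for rows whose Date is within [start_date, end_date] (hypothetical helper)."""
    # Parse both the column and the user input so the comparison is on real dates.
    dates = pd.to_datetime(df['Date'])
    mask = dates.between(pd.Timestamp(start_date), pd.Timestamp(end_date))
    totals = df.loc[mask, ['hrs', 'Count']].sum()
    # Plain ints are easier to format into text for display.
    return {'hrs': int(totals['hrs']), 'Count': int(totals['Count'])}
```

Unlike the one-liner above, this leaves df itself unchanged, so the function can be called again with a different date range.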
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I have a 1000×8 dataset in which each column represents the price of a stock over time, so there are 8 stocks. I want to draw 8 boxplots, one per stock, in a loop in Python to examine the extreme values. Could you please tell me how I can do that?
As a quick alternative to using matplotlib directly, Pandas has a reasonable boxplot function that could be used.
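import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(np.random.randn(1000, 8), columns=list('ABCDEFGH'))
df.boxplot(column=list(df.columns))
Edit: I just realised your question asked to do this in a loop.
for c in df.columns:
    fig, ax = plt.subplots()  # one new figure per stock
    df.boxplot(column=c, ax=ax)  # draw the boxplot on that figure's axes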
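To examine the extreme values numerically as well as visually, one option is to count the points that fall outside the usual 1.5 × IQR whisker range, which is the default rule matplotlib boxplots use to mark fliers. The following is a rough sketch; the function name count_outliers and the whis parameter are chosen here for illustration.

```python
import pandas as pd


def count_outliers(df, whis=1.5):
    """Count values beyond whis * IQR from the quartiles, per column (hypothetical helper)."""
    q1 = df.quantile(0.25)
    q3 = df.quantile(0.75)
    iqr = q3 - q1
    # Comparing a DataFrame with a Series aligns the Series on the column names.
    outside = (df < q1 - whis * iqr) | (df > q3 + whis * iqr)
    return outside.sum()
```

The result is a Series indexed by column name, so count_outliers(df)['A'] gives the number of flagged points for stock A.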
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 5 years ago.
I have a big dataset of 83,000 rows with dates and values. I want to plot a moving average of the value against time, but my graph is not clear, as you can see in the images. How can I adjust the graph to make it clearer? Is there another way to plot such a big dataset? In this graph so many lines are drawn on top of each other that they don't convey much.
(I generally use matplotlib and seaborn libraries for Python)
Given this dataframe:
df.head()
complete mid_c mid_h mid_l mid_o time
0 True 0.80936 0.80943 0.80936 0.80943 2018-01-31 09:54:10+00:00
1 True 0.80942 0.80942 0.80937 0.80937 2018-01-31 09:54:20+00:00
2 True 0.80946 0.80946 0.80946 0.80946 2018-01-31 09:54:25+00:00
3 True 0.80942 0.80942 0.80940 0.80940 2018-01-31 09:54:30+00:00
4 True 0.80944 0.80944 0.80944 0.80944 2018-01-31 09:54:35+00:00
Create a 50-period moving average:
df['ma'] = df.mid_c.rolling(window=50).mean()
Plot it:
import matplotlib.pyplot as plt
df.plot('time', ['mid_c', 'ma'])
plt.show()
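If the plot is still too dense with tens of thousands of points, another option is to downsample before plotting. The sketch below assumes the time column has a datetime dtype and averages mid_c into fixed time bins with resample before computing the moving average. The function name, the '1min' bin size and min_periods=1 are illustrative choices, not something taken from the question.

```python
import pandas as pd


def downsample_with_ma(df, rule='1min', window=50):
    """Average mid_c per time bin, then add a moving average (hypothetical helper)."""
    # One row per time bin instead of one row per tick.
    binned = df.set_index('time')['mid_c'].resample(rule).mean()
    out = binned.to_frame('mid_c')
    # min_periods=1 avoids leading NaNs; drop it to match rolling() above exactly.
    out['ma'] = out['mid_c'].rolling(window=window, min_periods=1).mean()
    return out
```

The result is indexed by time, so downsample_with_ma(df).plot() puts time on the x-axis directly.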
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 5 years ago.
I have a pandas DataFrame that contains multiple columns and a MultiIndex. I would like to plot data from two columns (“Total” and “Sold”) as separate lines on a line chart and use the values from the third column, “Percentage”, as the annotation text for the points on the “Sold” line.
What is the best way to do it? Any advice and suggestions will be greatly appreciated.
import pandas as pd

# data is a dict
data = { 'Department': ['Furniture','Furniture','Furniture',
'Gifts','Gifts','Gifts'],
'Month':['May','June','July','May','June','July'],
'Total':[2086,1740,1900,984,662,574],
'Sold':[201,225,307,126,143,72],
'Percentage':[10, 13, 16, 13, 22, 13]
}
# DataFrame() turns the dict into a DataFrame
df = pd.DataFrame(data)
# Set up the MultiIndex
df.set_index(['Department', 'Month'], inplace=True)
df
DataFrame
# Plot departments
departments = df.index.get_level_values(0).unique()
for department in departments:
    # .loc replaces .ix, which was removed in pandas 1.0
    ax = df.loc[department].plot(title=department, y=['Total', 'Sold'],
                                 xlim=(-1.0, 3.0))
Plot from DataFrame
You could achieve this in different ways.
I will just mention a couple of the most straightforward ones. This is not meant to be complete, and there may well be easier ways to do it.
One way uses the text method.
In your case that would be:
ii = [0, 1, 2]  # the x locations of the month labels, according to your plotting... I leave it to you to automate this or find a way to retrieve them
for department in departments:
    sub = df.loc[department]  # .loc replaces .ix, which was removed in pandas 1.0
    ax = sub.plot(title=department, y=['Total', 'Sold'], xlim=(-1.0, 3.0))
    for c, month in enumerate(sub.index):  # in your case the months are ['May', 'June', 'July']
        ax.text(ii[c], sub['Sold'][month], str(sub['Percentage'][month]) + '%')
The other method uses annotate. Leaving out the for loops shown above, you would replace the call to ax.text with something like:
        ax.annotate(str(sub['Percentage'][month]) + '%',
                    (ii[c], sub['Sold'][month]),
                    xytext=(0, 0),
                    textcoords='offset points')
Of course you can tweak positions, font size, etc.
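Putting the pieces together, a self-contained sketch of the annotate approach could look like the following. The function name annotate_sold and the 5-point vertical offset are choices made here for illustration. The code assumes the frame is indexed by (Department, Month) as in the question and relies on pandas placing the string month labels at x positions 0, 1, 2 on a line plot.

```python
import pandas as pd


def annotate_sold(df, department):
    """Plot Total and Sold for one department and label each Sold point (hypothetical helper)."""
    sub = df.loc[department]  # rows for one department, indexed by Month
    ax = sub.plot(title=department, y=['Total', 'Sold'], xlim=(-1.0, 3.0))
    labels = []
    for x, (month, row) in enumerate(sub.iterrows()):
        label = str(int(row['Percentage'])) + '%'
        # Place the text 5 points above the data point, centred horizontally.
        ax.annotate(label, (x, row['Sold']), xytext=(0, 5),
                    textcoords='offset points', ha='center')
        labels.append(label)
    return ax, labels
```

Calling annotate_sold(df, department) for each entry in departments, followed by matplotlib.pyplot.show(), should produce one annotated figure per department.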
For an intro to annotations, please consult the official webpage:
Matplotlib annotations
Here are the resulting plots I get: