Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I have a 1000*8 dataset in which each column represents the price of one stock over time, so there are 8 stocks. I want to draw 8 boxplots, one per stock, in a loop in Python to examine the extreme values. Could you please tell me how I can do that?
As a quick alternative to using matplotlib directly, Pandas has a reasonable boxplot function that could be used.
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(1000, 8), columns=list('ABCDEFGH'))
df.boxplot(column=list(df.columns))
Edit: I just realised your question asked to do this in a loop.
import matplotlib.pyplot as plt
for c in df.columns:
    fig, ax = plt.subplots()  # one figure per stock
    df.boxplot(column=c, ax=ax)
plt.show()
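Since the goal is to examine extreme values, it may also help to list the points a boxplot would draw as outliers. The following is a minimal sketch, assuming the default whisker rule of matplotlib boxplots (1.5 times the interquartile range beyond the quartiles); the random data, the seed, and the injected spike are illustrative only and not part of the original question.

```python
import numpy as np
import pandas as pd

# Illustrative data only: 1000 rows, 8 columns, with one artificial spike
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((1000, 8)), columns=list('ABCDEFGH'))
df.loc[0, 'A'] = 100.0

# Quartiles and interquartile range per column
q1 = df.quantile(0.25)
q3 = df.quantile(0.75)
iqr = q3 - q1

# True where a value lies beyond the 1.5 * IQR whiskers
outliers = (df < q1 - 1.5 * iqr) | (df > q3 + 1.5 * iqr)

# Number of flagged values in each column
print(outliers.sum())
```

Indexing with the mask, for example df['A'][outliers['A']], returns the flagged values for a single stock.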
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 days ago.
Currently I have the following Excel table that I am importing into pandas. I am trying to transpose the data so that Name 1, Name 2, etc. become the index column and the dates become the headers. After that, I want to create a new table with this layout that selects the most recent data point per name, based on the dates.
transposed_df = df.transpose()
print (transposed_df)
transposed_df.set_index('Name', inplace=True)
latest_date_col_index = df.idxmax(axis=1)
latest_data = df.lookup(df.index, latest_date_col_index)
df_latest = pd.DataFrame(latest_data, index=df.index, columns=['Latest Data'])
print(df_latest)
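Note that DataFrame.lookup was deprecated in pandas 1.2 and removed in pandas 2.0, so the attempt above fails on current versions. One possible approach is sketched below, assuming the imported table has one row per date and one column per name, with missing entries stored as NaN; the sample values are hypothetical because the original table is not shown.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Excel table: one row per date, one column per name
df = pd.DataFrame(
    {
        "Name 1": [1.0, 2.0, np.nan],
        "Name 2": [4.0, np.nan, np.nan],
        "Name 3": [7.0, 8.0, 9.0],
    },
    index=pd.to_datetime(["2024-01-01", "2024-02-01", "2024-03-01"]),
)

# Names become the index, dates become the (chronologically sorted) headers
transposed_df = df.T.sort_index(axis=1)

# Date of the last non-missing entry for each name
latest_dates = transposed_df.apply(lambda row: row.last_valid_index(), axis=1)

# Forward-fill along each row so the last column holds the most recent value
latest_values = transposed_df.ffill(axis=1).iloc[:, -1]

df_latest = pd.DataFrame({"Latest Date": latest_dates, "Latest Data": latest_values})
print(df_latest)
```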
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 months ago.
I have a pandas DataFrame in the following format:
date ticks value
The ticks run from 1 to 12 for each date, with a corresponding value in the value column.
I want to plot a line chart where the x-axis represents the ticks from 1 to 12, the y-axis represents the value, and there are multiple lines on the chart, one per date. How can I achieve this using pandas or another library such as matplotlib?
Use:
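import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# Build a sample DataFrame: two dates, ticks 1 to 12 for each
df = pd.DataFrame({'date':['2020']*12+['2019']*12, 'ticks': list(range(1, 13))*2, 'value': np.random.randint(1,100,24)})
# Collect the ticks and values of each date into lists
g = df.groupby('date').agg(list).reset_index()
# Draw one line per date
for i, row in g.iterrows():
    plt.plot(row['ticks'], row['value'], label=row['date'])
plt.legend()
plt.show()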
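An alternative that avoids the explicit loop is to reshape the data to wide form and let pandas draw one line per column. This is a minimal sketch, assuming each (date, ticks) pair occurs only once; the sample data and the seed are illustrative only.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Illustrative data in the same long format: date, ticks, value
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'date': ['2020'] * 12 + ['2019'] * 12,
    'ticks': list(range(1, 13)) * 2,
    'value': rng.integers(1, 100, 24),
})

# Reshape to wide form: one row per tick, one column per date
wide = df.pivot(index='ticks', columns='date', values='value')

# DataFrame.plot draws one line per column and adds a legend
ax = wide.plot()
ax.set_xlabel('ticks')
ax.set_ylabel('value')
```

Call plt.show() afterwards when running outside a notebook.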
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 1 year ago.
I have data consisting of dates with times and the industries that occurred on those dates. For example the data would be something like this:
I want to plot the dates grouped by month, showing which industries occurred most often in each month.
How can I do that?
Your problem seems to be that you have two different data types, which makes creating a graph difficult. However, you can convert the data to the types you need, which makes plotting it the way you intend much easier. Something like this should work for what you want.
import pandas as pd
import matplotlib.pyplot as plt
data = pd.DataFrame(
[{'date_raised':pd.to_datetime('2016-01-01 00:00:00'),'primary_industry':'Real Estate'},
{'date_raised':pd.to_datetime('2016-01-10 04:00:00'),'primary_industry':'IT Solutions'},
{'date_raised':pd.to_datetime('2016-01-04 04:00:00'),'primary_industry':'Multimedia'},
{'date_raised':pd.to_datetime('2016-01-05 04:00:00'),'primary_industry':'Technology'},
{'date_raised':pd.to_datetime('2016-01-09 04:00:00'),'primary_industry':'Technology'}]
)
#Group data for monthly occurrences
result = data.sort_values('date_raised').groupby([data['date_raised'].dt.strftime('%B')])['primary_industry'].value_counts().unstack(level=1)
result.index.name = None #Remove index name "date_raised"
result.columns.names = [None] #Remove series name "primary_industry"
#Plot data
ax = result.plot(kind='bar',use_index=True,rot=1)
ax.set_xlabel('Month')
ax.set_ylabel('Total Occurrences')
plt.show()
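If only the single most frequent industry per month is needed, the counts can be reduced further. The following is a rough sketch using pd.crosstab and idxmax; the two-month sample data is hypothetical, and ties are resolved in favour of the first column in alphabetical order.

```python
import pandas as pd

# Hypothetical data spanning two months
data = pd.DataFrame({
    'date_raised': pd.to_datetime([
        '2016-01-01', '2016-01-05', '2016-01-09',
        '2016-02-02', '2016-02-10', '2016-02-20',
    ]),
    'primary_industry': [
        'Real Estate', 'Technology', 'Technology',
        'IT Solutions', 'IT Solutions', 'Multimedia',
    ],
})

# Count occurrences of each industry per year-month
month = data['date_raised'].dt.strftime('%Y-%m')
counts = pd.crosstab(month, data['primary_industry'])

# Column label with the highest count in each row, i.e. the top industry per month
top_industry = counts.idxmax(axis=1)
print(top_industry)
```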
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
I have created a single stacked bar chart, but I want the bars to be clustered, like in the picture.
I am wondering whether this is possible.
link to picture
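import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame(dict(Subsidy=[3, 3, 3],
                       Bonus=[1, 1, 1],
                       Expense=[2, 2, 2]),
                  list('ABC'))
# Stacked Subsidy + Bonus bar, drawn to the left of each tick (position=1)
ax = df[['Subsidy', 'Bonus']].plot.bar(stacked=True, position=1,
                                       width=.2, ylim=[0, 8], color=['orange', 'red'])
# Expense bar on the same axes, drawn to the right of each tick (position=0)
df[['Expense']].plot.bar(ax=ax, position=0, width=.2, color=['green'])
plt.show()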
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I have a data frame with 1000 rows and 10 columns.
3 of these columns are 'total_2013', 'total_2014' and 'total_2015'
I would like to create a new column, containing the average of total over these 3 years for each row, but ignoring any 0 values.
If you are using pandas:
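Use DataFrame.mean with its skipna parameter.
First replace 0 with NaN and assign the result back, since replace returns a new object rather than modifying df in place:
import numpy as np
columns = ['total_2013', 'total_2014', 'total_2015']
df[columns] = df[columns].replace(0, np.nan)
Then compute the mean:
df["total"] = df[columns].mean(
    axis=1,       # average across the columns, giving one value per row
    skipna=True   # ignore NaN values
)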
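To keep the zeros in the original columns untouched, the masking can be applied to a temporary copy instead. This is a small sketch on made-up numbers; only the three column names come from the question.

```python
import pandas as pd

# Illustrative data only
df = pd.DataFrame({
    'total_2013': [10, 0, 3],
    'total_2014': [0, 0, 6],
    'total_2015': [20, 0, 9],
})
columns = ['total_2013', 'total_2014', 'total_2015']

# mask() replaces zeros with NaN in a temporary copy; mean() then skips them
df['total'] = df[columns].mask(df[columns] == 0).mean(axis=1)
print(df)
```

A row whose three totals are all 0 has nothing left to average, so its result is NaN.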