Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 5 years ago.
I have a large dataset of 83,000 rows with dates and values. I want to plot a moving average of the values against time, but the graph is not clear, as you can see in the images. How can I adjust the graph to make it clearer? Is there a better way to plot a dataset this large? In this graph, many lines are drawn on top of each other, so it is hard to read anything from them.
(I generally use the matplotlib and seaborn libraries for Python.)
Given this dataframe:
df.head()
   complete    mid_c    mid_h    mid_l    mid_o                      time
0      True  0.80936  0.80943  0.80936  0.80943 2018-01-31 09:54:10+00:00
1      True  0.80942  0.80942  0.80937  0.80937 2018-01-31 09:54:20+00:00
2      True  0.80946  0.80946  0.80946  0.80946 2018-01-31 09:54:25+00:00
3      True  0.80942  0.80942  0.80940  0.80940 2018-01-31 09:54:30+00:00
4      True  0.80944  0.80944  0.80944  0.80944 2018-01-31 09:54:35+00:00
Create a 50-period moving average:
df['ma'] = df.mid_c.rolling(window=50).mean()
Plot it:
import matplotlib.pyplot as plt
df.plot('time', ['mid_c', 'ma'])
plt.show()
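A further option, which is not part of the answer above, is to reduce the number of points before plotting by resampling to a coarser time step. The following is a minimal sketch under stated assumptions: the data is synthetic, and the "1h" bin size and the 24-point window are arbitrary illustrative choices, not values from the question.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the question's data (assumption): 83,000 rows, 5 seconds apart.
rng = np.random.default_rng(0)
n = 83_000
times = pd.date_range("2018-01-31 09:54:10", periods=n, freq="5s", tz="UTC")
df = pd.DataFrame({"time": times, "mid_c": 0.809 + rng.normal(0, 1e-5, n).cumsum()})

# Aggregate to one point per hour; the bin size is an arbitrary choice.
hourly = df.set_index("time")["mid_c"].resample("1h").mean().to_frame()

# Moving average over the downsampled series (24 hourly points per window).
hourly["ma"] = hourly["mid_c"].rolling(window=24).mean()
```

Calling hourly.plot(y=['mid_c', 'ma']) would then draw roughly a hundred points per line instead of 83,000, which tends to remove most of the overplotting.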
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 months ago.
I have a pandas dataframe in the following format:
date ticks value
The ticks vary from 1 to 12 for each date, and the corresponding values are in the value column.
I want to plot a line chart where the x-axis represents the ticks from 1 to 12, the y-axis represents value, and there is one line per date. How can I achieve this using pandas or another library such as matplotlib?
Use:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# Make a sample df
df = pd.DataFrame({'date':['2020']*12+['2019']*12, 'ticks': list(range(1, 13))*2, 'value': np.random.randint(1,100,24)})
# Collect the ticks and values of each date into lists
g = df.groupby('date').agg(list).reset_index()
# Draw one line per date
for i, row in g.iterrows():
    plt.plot(row['ticks'], row['value'], label=row['date'])
plt.legend()
plt.show()
Output: (plot image not included)
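An alternative to the explicit loop is to reshape the frame so that each date becomes its own column, since pandas draws one line per column. This is a sketch that assumes every (ticks, date) pair occurs exactly once, as in the sample data; the name wide is only illustrative.

```python
import numpy as np
import pandas as pd

# Same sample layout as above: 12 ticks for each of two dates.
df = pd.DataFrame({
    "date": ["2020"] * 12 + ["2019"] * 12,
    "ticks": list(range(1, 13)) * 2,
    "value": np.random.randint(1, 100, 24),
})

# One row per tick, one column per date.
wide = df.pivot(index="ticks", columns="date", values="value")
```

Calling wide.plot() would then put the ticks on the x-axis and draw one labelled line per date.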
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 1 year ago.
I have data consisting of dates with times and the industries that occurred on those dates. For example, the data would look something like this:
I want to plot the dates grouped by month, showing which industries occurred most often in each month.
How can I do that?
Your problem seems to be that you have two different data types, which makes creating a graph difficult. However, you can reformat the data into the types you need, which makes creating the graph you want much easier. Something like this should work for what you want.
import pandas as pd
import matplotlib.pyplot as plt
data = pd.DataFrame(
[{'date_raised':pd.to_datetime('2016-01-01 00:00:00'),'primary_industry':'Real Estate'},
{'date_raised':pd.to_datetime('2016-01-10 04:00:00'),'primary_industry':'IT Solutions'},
{'date_raised':pd.to_datetime('2016-01-04 04:00:00'),'primary_industry':'Multimedia'},
{'date_raised':pd.to_datetime('2016-01-05 04:00:00'),'primary_industry':'Technology'},
{'date_raised':pd.to_datetime('2016-01-09 04:00:00'),'primary_industry':'Technology'}]
)
#Group data for monthly occurrences
result = data.sort_values('date_raised').groupby([data['date_raised'].dt.strftime('%B')])['primary_industry'].value_counts().unstack(level=1)
result.index.name = None #Remove index name "date_raised"
result.columns.names = [None] #Remove series name "primary_industry"
#Plot data
ax = result.plot(kind='bar',use_index=True,rot=1)
ax.set_xlabel('Month')
ax.set_ylabel('Total Occurrences')
plt.show()
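One caveat with grouping on month names is that the groups sort alphabetically, so April would come before January once the data spans several months. A possible variation, sketched here with made-up rows (the dates and industries below are hypothetical), groups on the month number instead and also picks the most frequent industry per month.

```python
import pandas as pd

# Hypothetical rows spanning two months; not the question's data.
data = pd.DataFrame({
    "date_raised": pd.to_datetime([
        "2016-01-01 00:00:00", "2016-01-10 04:00:00",
        "2016-01-05 04:00:00", "2016-01-09 04:00:00",
        "2016-04-04 04:00:00", "2016-04-12 04:00:00", "2016-04-20 04:00:00",
    ]),
    "primary_industry": [
        "Real Estate", "IT Solutions", "Technology", "Technology",
        "Multimedia", "Multimedia", "Technology",
    ],
})

# Rows are month numbers (calendar order), columns are industries, cells are counts.
counts = pd.crosstab(data["date_raised"].dt.month, data["primary_industry"])
counts.index.name = "month"

# The most frequent industry in each month.
top = counts.idxmax(axis=1)
```

counts can be passed to the same bar plot as above with counts.plot(kind='bar'), and the months then appear in calendar order.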
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
I'm using pandas and I have a dataset containing 20 columns and 65 rows. I'm trying to measure the completeness of the data, so I want to compute the percentage of NaN values relative to the whole dataset. For example, the output I need is: The percentage of NaNs in the dataset is: 40%
I've counted the number of NaNs by doing the following: comp_df.isna().sum().sum() and got a result of 776. But, I don't know what to do next.
Use:
import numpy as np
import pandas as pd
comp_df = pd.DataFrame(dict(a=[np.nan,1,1],
                            b=[np.nan,np.nan,np.nan]))
print(comp_df)
     a   b
0  NaN NaN
1  1.0 NaN
2  1.0 NaN
Starting from your solution, you can divide by DataFrame.size, which is the total number of values:
print(comp_df.isna().sum().sum() / comp_df.size * 100)
66.66666666666666
Or reshape the values to a Series with DataFrame.stack and use mean, which is sum/count by definition:
print(comp_df.isna().stack().mean() * 100)
66.66666666666666
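To get the sentence the question asks for, and a per-column breakdown as well, something along these lines may help. It is a sketch on the same toy frame; the names pct and per_column are illustrative, and rounding to a whole percent is an assumption about the desired format.

```python
import numpy as np
import pandas as pd

comp_df = pd.DataFrame(dict(a=[np.nan, 1, 1],
                            b=[np.nan, np.nan, np.nan]))

# Overall share of missing cells: the mean of a boolean array is the fraction of True values.
pct = comp_df.isna().to_numpy().mean() * 100

# Share of missing values in each column.
per_column = comp_df.isna().mean() * 100

print(f"The percentage of NaNs in the dataset is: {pct:.0f}%")
```

The per-column Series shows which columns drive the overall figure, which is often more useful for judging completeness than a single number.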
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I have a 1000×8 dataset in which each column holds the price of one stock over time, so there are 8 stocks. I want to draw 8 boxplots, one per stock, in a loop in Python to examine the extreme values. Could you please tell me how I can do that?
As a quick alternative to using matplotlib directly, pandas has a reasonable boxplot function that could be used.
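import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(1000, 8), columns=list('ABCDEFGH'))
df.boxplot(column=list(df.columns))
Edit: I just realised your question asked to do this in a loop.
for c in df.columns:
    # One figure per stock; draw the boxplot on that figure's axes
    fig, ax = plt.subplots()
    df.boxplot(column=c, ax=ax)
plt.show()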
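If the goal is to list the extreme values rather than only see them, the points a boxplot draws beyond its whiskers can be approximated numerically. The sketch below assumes the common 1.5 × IQR rule, which is matplotlib's default whisker setting, and uses random data in place of the real prices; is_extreme and extreme_counts are illustrative names.

```python
import numpy as np
import pandas as pd

# Random stand-in for the 1000 x 8 price table (assumption).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((1000, 8)), columns=list("ABCDEFGH"))

# Quartiles and interquartile range, computed per column.
q1 = df.quantile(0.25)
q3 = df.quantile(0.75)
iqr = q3 - q1

# True where a value lies more than 1.5 * IQR outside the quartiles.
is_extreme = (df < q1 - 1.5 * iqr) | (df > q3 + 1.5 * iqr)

# Number of flagged values per stock.
extreme_counts = is_extreme.sum()
```

df[is_extreme['A']] would then return the rows where stock A is flagged.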
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
I have a dataframe with 3 columns (department, sales, region), and I want to write a method to display all rows that are from the least common region. Then I need to write another method to count the frequency of the departments that are represented in the least common region. No idea how to do this.
Functions would be unnecessary: pandas already has implementations that accomplish what you want! Suppose I had the following csv file, test.csv...
department,sales,region
sales,26,midwest
finance,45,midwest
tech,69,west
finance,43,east
hr,20,east
sales,34,east
If I'm understanding you correctly, I would obtain a DataFrame representing the least common region like so:
import pandas as pd
df = pd.read_csv('test.csv')
counts = df['region'].value_counts()
least_common = counts[counts == counts.min()].index[0]
least_common_df = df.loc[df['region'] == least_common]
least_common_df is now:
department sales region
2 tech 69 west
As for obtaining the department frequency for the least common region, I'll leave that up to you. (I've already shown you how to get the frequency for region.)
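For completeness, one way the remaining step might look is sketched below. It reads the same rows from an in-memory string instead of test.csv so that it runs on its own, and it uses idxmin as a shorter equivalent of the filtering above; department_counts is an illustrative name.

```python
import io

import pandas as pd

# The same rows as test.csv, held in memory so the sketch is self-contained.
csv_text = """department,sales,region
sales,26,midwest
finance,45,midwest
tech,69,west
finance,43,east
hr,20,east
sales,34,east
"""
df = pd.read_csv(io.StringIO(csv_text))

# Region with the fewest rows (the first one, if several are tied).
counts = df["region"].value_counts()
least_common = counts.idxmin()
least_common_df = df[df["region"] == least_common]

# Frequency of each department within that region.
department_counts = least_common_df["department"].value_counts()
```

With this data the least common region is west, and department_counts contains a single entry, tech, with a count of 1.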