Plotting N number of Graphs - python

I have a DataFrame that looks like this:
              spot  total
date_delivery
2016-06-21       x     20
2016-07-25       x     22
2016-08-14       x     25
2016-09-11       y     16
2016-10-16       y     10
The index of the DataFrame is in datetime format. I want to create a simple graph for each unique spot that shows its total over time. I am having trouble writing a loop that does this and saves each figure. Keep in mind that while there are only 2 spots in this DataFrame, the real one has many, many more.

Append spot to the index, group by spot, and then plot:
df.set_index('spot', append=True).groupby(level='spot').plot(kind='bar')
For your example you'll get two bar graphs, one for x and one for y, right below each other (but you can customize that).
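If you also want to save each plot to its own file, a minimal sketch of the loop (the file names, titles, and figure handling are my own choices):
import matplotlib.pyplot as plt

for spot, group in df.groupby('spot'):
    ax = group.plot(y='total', kind='bar', title=f'spot {spot}')
    ax.figure.savefig(f'spot_{spot}.png')
    plt.close(ax.figure)  # free the figure before the next iteration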

Calculating the cumulative bottom values for a stacked bar chart when the length of the array varies

I am looking to plot a stacked bar chart of options open interest. I am looking to do exactly the same thing as the person in this post: https://medium.com/@txlian13/webscraping-options-data-with-python-and-yfinance-e4deb0124613 . I have the data from the same source and I have it arranged the same way.
My problem is that I can't find a way to calculate the bottom argument, so all the bars start at y=0 instead of on top of the previous bar's height.
I tried this code, among other options, but have not managed to make it work (exp is a list of all possible expiration dates for the options):
bottom = np.zeros(12)  # using 12 because I am testing with the same stock, so I know my first array needs 12 entries to match the number of strikes for the first date
for i in exp:
    z = option_chain.loc[option_chain['expirationDate'] == i]
    zx = z['strike']
    zy = z['openInterest']
    # here bottom prints as an array of 0s, so the next series is plotted from 0
    plt.bar(zx, zy, label=i, alpha=0.7, bottom=bottom)
    bottom += zy
    # printing bottom again here shows the 12 correct open-interest values,
    # but the next iteration raises "ValueError: shape mismatch: objects cannot be broadcast to a single shape"
So my problem is that the strikes (my x values) change with every iteration: for example, my first iteration has 12 values for x and the second one has 9.
Is there a way to keep a running bottom array that changes with my x? I also realize this leads to another problem: how to match the x's so each bar gets added on top of the correct strike.
One way I was thinking of doing this is to find which date has the most strikes and use that as my base, but the problem is that it is not a given that the date with the most strikes contains all the strikes present in the other dates.
If the problem can be easily fixed with another plotting package, I have no issue using that. I am a finance graduate just trying to learn Python, so I have only used matplotlib, as it's the one with the most learning materials out there.

Found how to do it: I used pandas to group by strike and expirationDate and sum openInterest; then, after a couple of hours of scratching my head, I learned what .unstack() does and used that. Unstacking moves the expiration dates into columns, so pandas aligns every strike across all dates and computes the bottoms for me:
y = option_chain.groupby(['strike', 'expirationDate'])['openInterest'].sum().unstack(level=-1)
y.plot.bar(stacked=True)
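For anyone adapting this, here is a minimal, self-contained sketch of the same pipeline with made-up numbers (the fillna(0) is my addition, since strikes missing for a given date come out of unstack() as NaN):
import pandas as pd
import matplotlib.pyplot as plt

# toy option_chain: one row per (strike, expirationDate) with its open interest
option_chain = pd.DataFrame({
    'strike':         [90, 100, 110, 100, 110],
    'expirationDate': ['2021-01-15', '2021-01-15', '2021-01-15', '2021-02-19', '2021-02-19'],
    'openInterest':   [120, 300, 80, 150, 60],
})

y = (option_chain
     .groupby(['strike', 'expirationDate'])['openInterest']
     .sum()
     .unstack(level=-1)   # expirations become columns, aligned on strike
     .fillna(0))          # strikes absent for a date would otherwise be NaN

y.plot.bar(stacked=True)  # pandas computes the bottoms for you
plt.xlabel('strike')
plt.ylabel('open interest')
plt.tight_layout()
plt.savefig('open_interest.png')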

multiple boxplot in subplots in python

I have 18 individual np.arrays, each containing 30 numbers with a similar range (so the subplots can use sharey=True).
I want to create boxplots for all 18 arrays in a figure of 1 row and 4 columns, so each subplot contains a few of the arrays.
How do I do this?
When I try it, everything ends up in one plot; the red sketch on my screenshot shows what I want it to look like.
I got this solved! Since it's only 1 row, plt.subplots returns a 1-D array of axes, so I should use
axes[num]
instead of
axes[num, 0]
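A minimal sketch of what that looks like end to end (the random stand-in data and the 5/5/5/3 grouping are my own assumptions):
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
arrays = [rng.normal(loc=5, scale=1, size=30) for _ in range(18)]  # stand-in data

fig, axes = plt.subplots(1, 4, figsize=(12, 4), sharey=True)  # 1 row -> axes is 1-D
for num, ax in enumerate(axes):
    chunk = arrays[num * 5:(num + 1) * 5]  # a few arrays per subplot (the last one gets 3)
    ax.boxplot(chunk)
plt.tight_layout()
plt.show()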

Any existing methods to find a drop in a noisy time series?

I have a time series (array of values) and I would like to find the starting points where a long drop in values begins (at least X consecutive values going down). For example:
Having a list of values
[1,2,3,4,3,4,5,4,3,4,5,4,3,2,1,2,3,2,3,4,3,4,5,6,7,8]
I would like to find a drop of at least 5 consecutive values. So in this case I would find the segment 5,4,3,2,1.
However, in a real scenario, there is noise in the data, so the actual drop includes a lot of little ups and downs.
I could write an algorithm for this. But I was wondering whether there is an existing library or standard signal processing method for this type of analysis.
You can do this pretty easily with pandas (which I know you have). Convert your list to a Series, then use groupby on a cumulative sum that increments at every rise to label each run of consecutively non-increasing values, and keep the runs of length at least 5:
v = pd.Series([1,2,3,4,3,4,5,4,3,4,5,4,3,2,1,2,3,2,3,4,3,4,5,6,7,8])
# v.diff().gt(0) marks every rise; cumsum() turns the marks into run ids, so
# consecutive non-increasing values share an id; transform('size') is the run
# length, and ge(5) keeps only values inside runs of at least 5
v[v.groupby(v.diff().gt(0).cumsum()).transform('size').ge(5)]
10 5
11 4
12 3
13 2
14 1
dtype: int64
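If the real series is noisy, one standard signal-processing trick is to smooth it first (e.g. with a centered rolling mean) so the little ups and downs don't break the runs; the window size below is an assumption you would tune to your data:
import pandas as pd

v = pd.Series([1,2,3,4,3,4,5,4,3,4,5,4,3,2,1,2,3,2,3,4,3,4,5,6,7,8])

# smooth first, then detect long non-increasing runs on the smoothed series
smooth = v.rolling(window=3, center=True, min_periods=1).mean()
run_id = smooth.diff().gt(0).cumsum()               # new id at every rise
run_len = smooth.groupby(run_id).transform('size')  # length of each run
print(v[run_len.ge(5)])                             # original values inside long drops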

Python Pandas - Don't sort bar graph on y axis values

I am a beginner in Python. I have a Series with Date and the count of some observation, as below:
Date  Count
2003     10
2005     50
2015     12
2004     12
2003     15
2008     10
2004     05
I want to plot a bar graph of the count against the year (x-axis: year, y-axis: count). I am using the code below:
import pandas as pd
pd.value_counts(sfdf.Date_year).plot(kind='bar')
The bar graph I get is automatically sorted by count, so I cannot clearly see how the count is distributed over the years. Is there a way to stop sorting the bars by count and instead sort them by the x-axis values (i.e., year)?
I know this is an old question, but in case someone is still looking for another answer:
I solved this by adding .sort_index(axis=0).
So, instead of this:
pd.value_counts(sfdf.Date_year).plot(kind='bar')
you can write this:
pd.value_counts(sfdf.Date_year).sort_index(axis=0).plot(kind='bar')
Hope this helps.
The following code uses groupby() to join the multiple instances of the same year together, and then calls sum() on the groupby() object to sum them up. By default groupby() pushes the grouped column into the index and sorts it, but just in case, sort_index() will sort the index explicitly. All that then remains is to plot:
df = pd.DataFrame([[2003,10],[2005,50],[2015,12],[2004,12],[2003,15],[2008,10],[2004,5]], columns=['Date','Count'])
df.groupby('Date').sum().sort_index().plot(kind='bar')
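As a side note, on recent pandas versions the top-level pd.value_counts is deprecated in favour of the Series method, so an equivalent one-liner (assuming, as in the question, an sfdf with a Date_year column) would be:
sfdf['Date_year'].value_counts().sort_index().plot(kind='bar')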

random sampling with pandas dataframe

I'm relatively new to pandas (and Python... and programming) and I'm trying to do a Monte Carlo simulation, but I have not been able to find a solution that runs in a reasonable amount of time.
The data is stored in a DataFrame called YTDSales, which holds sales per day, per product:
Date        Product_A  Product_B  Product_C  Product_D  ...  Product_XX
01/01/2014       1000        300         70      34500  ...         780
02/01/2014        400        400         70         20  ...          10
03/01/2014       1110        400       1170         60  ...          50
04/01/2014         20        320          0      71300  ...          10
...
15/10/2014       1000        300         70      34500  ...        5000
What I want to do is simulate different scenarios by filling the rest of the year (from October 15 to year end) using the historical distribution each product had. For example, with the data presented I would like to fill the rest of the year with sales between 20 and 1100.
What I've done is the following:
# create the range of "future" dates
last_historical = YTDSales.index.max()
year_end = dt.datetime(2014, 12, 30)
DatesEOY = pd.date_range(start=last_historical, end=year_end).shift(1)

# function that draws a random sales number per product, between its min and max
f = lambda x: np.random.randint(x.min(), x.max())

# create all the "future" dates and fill them with the output of f
for i in DatesEOY:
    YTDSales.loc[i] = YTDSales.apply(f)
The solution works, but takes about 3 seconds, which is a lot if I plan to run 1,000 iterations... Is there a way to avoid iterating?
Thanks
Use the size option for np.random.randint to get a sample of the needed size all at once.
One approach I would consider is, briefly, as follows.
Allocate the space you'll need in a new DataFrame that has index values from DatesEOY, columns from the original DataFrame, and all NaN values; then concatenate it onto the original data.
Now that you know the length of each random sample you'll need, use the extra size keyword of numpy.random.randint to sample all at once, per column, instead of looping.
Overwrite the NaN placeholder rows with this batch sample.
Here's what this could look like:
new_df = pd.DataFrame(index=DatesEOY, columns=YTDSales.columns)
num_to_sample = len(new_df)

# x is a (column name, Series) pair from items(), so x[1] is the column's data
f = lambda x: np.random.randint(x[1].min(), x[1].max(), num_to_sample)

output = pd.concat([YTDSales, new_df], axis=0)
output.iloc[len(YTDSales):] = np.asarray(list(map(f, YTDSales.items()))).T
Along the way, I chose to make a totally new DataFrame by concatenating the old one with the new "placeholder" one. This could obviously be inefficient for very large data.
Another way to approach it is setting with enlargement, as in your for-loop solution.
I did not play around with that approach long enough to figure out how to "enlarge" batches of indexes all at once. But if you figure that out, you can just "enlarge" the original DataFrame with all-NaN rows (at the index values from DatesEOY) and then apply the function above to YTDSales directly, without bringing output into it at all.
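A fully vectorized sketch of the same idea, done per column (it assumes YTDSales and DatesEOY as defined in the question, and keeps np.random.randint's exclusive upper bound, matching the original lambda):
import numpy as np
import pandas as pd

# draw the whole end-of-year sample for each product in a single call
sampled = {col: np.random.randint(YTDSales[col].min(),
                                  YTDSales[col].max(),
                                  size=len(DatesEOY))
           for col in YTDSales.columns}
future = pd.DataFrame(sampled, index=DatesEOY)
output = pd.concat([YTDSales, future])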
