I have the following code in Python to create a treemap, which should show the changes in asset values resulting from changing stock prices. However, only the positive asset changes are displayed; the negative ones are all left out. See result:
Treemap:
Underlying dataframe (dfWeek):
This is my code:
treemap = px.treemap(dfWeek, path=["Ticker", "Asset Change", "Price Change %"],
                     values="Asset Change", color="Price Change %",
                     color_continuous_scale='RdBu', color_continuous_midpoint=0,
                     title="Weekly Changes in Stocks")
treemap.update_layout(margin=dict(t=50, l=25, r=25, b=25))
treemap.show()
How can I update the code to also get the negative asset changes displayed?
Thanks for your help!
In Plotly, as in other tools such as Power BI or Tableau, the area of each rectangle in the treemap is proportional to its value relative to the sum of all values.
For example, values [10, 10, 5] would occupy [40%, 40%, 20%] of the area.
A negative area can't exist, so you need a workaround if it makes sense for your case:
Create a new column in the dataframe with the absolute value of Asset Change.
dfWeek["Abs Asset Change"] = dfWeek["Asset Change"].abs()
For the values use Abs Asset Change instead:
fig = px.treemap(data_frame=dfWeek,
                 path=["Ticker", "Asset Change", "Price Change %"],
                 values="Abs Asset Change",
                 color="Price Change %",
                 color_continuous_scale='RdBu',
                 color_continuous_midpoint=0,
                 title="Weekly Changes in Stocks")
I checked it with some dummy values and it worked. Let me know if you have any issues.
I want my plot to retrieve data from one dataframe, but when hovering over the data points I want it to incorporate data from both dataframes.
example:
which results from
fig = px.scatter(X_reduced_df, x='EXTRACTION_DATE_SAMPLE', y='score_IF',
                 color='anomaly_class_IF',
                 hover_data=['score_IF', 'delta',
                             'omicron',
                             'NH4-N (titr.)(W)_Zusatzparameter-ESRI',
                             'P ges. filtr. (photom.)(W) mA_Zusatzparameter-ESRI',
                             'BSB5 (mit Verd) DIN 1899_Zusatzparameter-ESRI',
                             'N org. (Ber. TKN)(W)_Zusatzparameter-ESRI'],
                 # range_x=['2015-12-01', '2016-01-15'],
                 title="Default Display with Gaps")
fig.show()
Here I want the value "delta" to be associated with additional info on "delta" from another dataframe, i.e. I want "delta = 0, add info", where the add info is a list, a dataframe column, or similar. (It's a list of names associated with a double, like:
column name: delta
column entries:
gamma: 1.2
alpha: 1.3
...
)
Basically it's a correlation matrix, and I want the correlations associated with each entry to be displayed.
The second dataframe, the correlation matrix, regrettably does not have the same columns as the original dataframe, hence it is not joinable. I want the column names to be associated with the additional info. I thought about categories, but I cannot see how that would help produce a compact piece of additional info.
I also do not want to meddle with the column names (as in forcing a rename that embeds the additional info).
The plotly library only allows for one dataframe as input, right? How can I add my additional info the way I described?
I need to display only unique values on the x-axis, but it is showing all the values in a specific column of the csv-file. Any suggestions on how to fix this?
df = pd.read_csv('//media//HOTEL MANAGEMENT.csv')
df.plot('Room_Type', 'Charges', color='g')
plt.show()
My assumption is that you are looking to plot some aggregated data, e.g. either:
The total charges per room type, or
The average charge per room type, or
The minimum/maximum charge per room type.
If so, you could do something like:
df = pd.read_csv('//media//HOTEL MANAGEMENT.csv')
# And use any of the following:
df.groupby('Room_Type')['Charges'].sum().plot(color='g')
df.groupby('Room_Type')['Charges'].mean().plot(color='g')
df.groupby('Room_Type')['Charges'].min().plot(color='g')
df.groupby('Room_Type')['Charges'].max().plot(color='g')
Seeing that the x-axis values may not necessarily be sequential, a comparative bar graph could be another way to plot this.
df.groupby('Room_Type')['Charges'].mean().plot.bar(color=['r','g'])
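As a minimal self-contained sketch (using made-up room data in place of the CSV, whose contents aren't shown), the groupby produces exactly one entry per unique room type, which is what ends up on the x-axis:

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt

# Dummy data standing in for HOTEL MANAGEMENT.csv
df = pd.DataFrame({
    "Room_Type": ["Single", "Double", "Single", "Suite", "Double"],
    "Charges": [100, 150, 120, 300, 170],
})

# One row per unique room type; the index becomes the x-axis
avg = df.groupby("Room_Type")["Charges"].mean()
avg.plot.bar(color="g")
plt.tight_layout()
# plt.show()
```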
I have a 'city' column which has more than 1000 unique entries. (The entries are integers for some reason and are currently assigned float type.)
I tried df['city'].value_counts()/len(df) to get their frequencies. It returned a table; the first few values were 0.12, .4, .4, .3, ...
I'm a complete beginner, so I'm not sure how to use this information to assign everything in, say, the bottom 10th percentile to 'other'.
I want to reduce the unique city values from 1000 to something like 10, so I can later use get_dummies on this.
Let's go through the logic of expected actions:
Count frequencies for every city
Calculate the cutoff for the bottom 10% (the 10th percentile of the frequencies)
Find the cities with frequencies at or below that cutoff
Change them to 'other'
You started in the right direction. To get frequencies for every city:
city_freq = (df['city'].value_counts())/df.shape[0]
We want to find the bottom 10%. We use pandas' quantile to do it:
bottom_decile = city_freq.quantile(q=0.1)
Now bottom_decile is a float representing the cutoff that separates the bottom 10% from the rest. Cities with frequencies at or below that cutoff:
less_freq_cities = city_freq[city_freq <= bottom_decile]
less_freq_cities will hold the entries for those cities. To change their value in 'df' to "other" (note: select the 'city' column in .loc, otherwise the whole row gets overwritten):
df.loc[df["city"].isin(less_freq_cities.index), "city"] = "other"
complete code:
city_freq = df['city'].value_counts() / df.shape[0]
bottom_decile = city_freq.quantile(q=0.1)
less_freq_cities = city_freq[city_freq <= bottom_decile]
df.loc[df["city"].isin(less_freq_cities.index), "city"] = "other"
This is how you replace 10% (or whatever you want, just change q param in quantile) to a value of your choice.
EDIT:
As suggested in a comment, to get normalized frequencies it's better to use
city_freq = df['city'].value_counts(normalize=True) instead of dividing by the shape. But actually, we don't need normalized frequencies: pandas' quantile works even if they aren't normalized, so we can use
city_freq = df['city'].value_counts() and it will still work.
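Putting the steps above together as a runnable sketch with dummy data (real data would have ~1000 cities; three suffice to show the mechanics):

```python
import pandas as pd

# Dummy data: 'c' is rare, 'a' and 'b' are common
df = pd.DataFrame({"city": ["a"] * 5 + ["b"] * 4 + ["c"]})

city_freq = df["city"].value_counts(normalize=True)   # a: 0.5, b: 0.4, c: 0.1
bottom_decile = city_freq.quantile(q=0.1)             # cutoff for the bottom 10%

# Replace only the 'city' value (not the whole row) for the rare cities
rare = city_freq[city_freq <= bottom_decile].index
df.loc[df["city"].isin(rare), "city"] = "other"
```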
I have stock data that contains OHLC attributes, and I want to make an RSI indicator plot calculated from the close values. Because the stock data is sorted by date, the dates must be converted to numbers using date2num. But when I plot the RSI values calculated from the close attribute, the lines overlap.
I thought the length of the RSI result might not match the length of the dates, but len(rsi) == len(df['date']) shows they are the same. Then I tried replacing the dates on the x-axis with a list of numbers made by range(0, len(df['date'])), and the plot showed what I expected.
# get data
df = df.tail(1000)
# convert date
df['date'] = pd.to_datetime(df['date'])
df['date'] = df['date'].apply(mdates.date2num)
# make indicator with TA-Lib
rsi = ta.RSI(df['close'], timeperiod=14)
# plot rsi indicator
ax1.plot(df['date'], rsi)
ax2.plot(range(0, len(df['date'])), rsi)
# show chart
plt.show()
I expect the output using the x-axis date to be the same as the x-axis list of numbers
Image that shows the difference
It seems that matplotlib chooses the x-ticks to display (when chosen automatically) to show "round" numbers. So in your case of integers, a tick every 200; in your case of dates, every two months.
You seem to expect the dates to follow the same tick steps as the integers, but this will cause the graph to show arbitrary dates in the middle of the month, which isn't a good default behavior.
If that's the behavior you want, try something of this sort:
rng = range(len(df['date']))
ax2.plot(rng, rsi)  # Same as in your example
ax2.set_xlim((rng[0], rng[-1]))  # Make sure no ticks fall outside the data range
ticks = [int(t) for t in ax2.get_xticks()]  # Tick positions are floats; iloc needs ints
ax2.set_xticklabels(df['date'].iloc[ticks])  # Show the respective dates at the integer positions
This behavior can of course be reversed if you wish to show numbers instead of dates, using the same ticks as the dates, but I'll leave that to you.
After trying several times, I found the core of the problem: the data is not recorded on weekends, so there are gaps in the dates. The matplotlib date x-axis still allocates space for the weekends even though there is no data on those days, so the line plot overlaps.
I haven't found a proper solution yet, so for the time being I use the list of numbers.
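One common sketch for this weekend-gap problem (with dummy business-day data standing in for the stock data): plot against an integer index so the points are evenly spaced, then label a subset of tick positions with their actual dates.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

# Dummy business-day data (weekends are missing, as with stock data)
dates = pd.bdate_range("2024-01-01", periods=30)
close = pd.Series(np.linspace(100, 110, len(dates)))

fig, ax = plt.subplots()
x = np.arange(len(dates))   # evenly spaced positions: no weekend gaps
ax.plot(x, close)

# Label every 5th position with its actual date
step = 5
ax.set_xticks(x[::step])
ax.set_xticklabels(dates[::step].strftime("%Y-%m-%d"), rotation=45)
fig.tight_layout()
# plt.show()
```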
I have two columns, customer_id and revenue, and I'm trying to figure out how to use matplotlib (or seaborn) to create a histogram/bar/column chart that has an aggregated overflow column on the right. Every time I change the range, it just cuts off the values above my max range. Instead, I want a bin that counts the instances above that max value.
For the example chart linked below, if I define my range as 0-1558, I want there to be a column that counts the instances of all values $1558 and above and displays that as a column.
Example Chart
Cap the values above the limit:
df.loc[df['revenue'] > limit, 'revenue'] = limit
Now, plot the histogram.
Same concept as @DYZ, but my code ended up being:
df.loc[df.revenue > limit, 'revenue'] = limit  # .ix is deprecated (removed in pandas 1.0); .loc is the replacement
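A minimal sketch of the capping approach with dummy data (Series.clip is an equivalent one-liner): everything above the limit collapses into the last bin.

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

# Dummy revenue values, some above the chosen limit
df = pd.DataFrame({"revenue": [100, 500, 1200, 1600, 2500, 9000]})
limit = 1558

# Cap values above the limit so they all land in the rightmost bin
capped = df["revenue"].clip(upper=limit)

plt.hist(capped, bins=10, range=(0, limit))
# plt.show()
```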