I have a data set with prices of 6 major stocks (e.g., Google, Amazon).
My plan is to create a plot showing the percent change, pct_change(), of a column named close_value.
As you can see, my ticker_symbol column has dtype object. Plotting raised a string error, so I converted the column to float, but then I lost all the ticker names. The plot call I executed was returns.close_value.plot().
How can I keep the stock names when plotting?
Data display
Data info
Does this work?
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Create Sample DataFrame
df1 = pd.DataFrame({'day_date': ['2020-05-28', '2020-05-27', '2020-05-26', '2020-05-22'],
'ticker_symbol': ['AAPL', 'AAPL','TSLA','TSLA'],
'close_value': [318, 400, 500, 450]})
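# Convert to Timestamp format
df1['day_date'] = pd.to_datetime(df1['day_date'])
# Sort chronologically within each ticker so each change is relative to the previous day
df1 = df1.sort_values(['ticker_symbol', 'day_date'])
# Store % change in a new column, computed per ticker so different stocks are never compared
df1['pct_change_close_value'] = df1.groupby('ticker_symbol')['close_value'].pct_change()
# Fill null values (the first row of each ticker) with 0
df1['pct_change_close_value'] = df1['pct_change_close_value'].fillna(0)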
# Display
display(df1)
# Check Data types of columns
display(df1.dtypes)
# Use Seaborn to plot
sns.lineplot(data = df1, x = 'day_date', y = 'pct_change_close_value', hue = 'ticker_symbol')
You just need to set hue='ticker_symbol' in the seaborn plot call; seaborn then draws one line per ticker and adds the names to the legend.
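If you would rather stay with the plain pandas plot call from the question, one possible alternative is to reshape the data so that every ticker gets its own column. The sketch below uses made-up prices and assumes the column names day_date, ticker_symbol and close_value from the question; it is an illustration, not the only way to do this.

```python
import pandas as pd

# Hypothetical sample data in the same long format as the question
prices = pd.DataFrame({
    "day_date": pd.to_datetime(["2020-05-26", "2020-05-27", "2020-05-28"] * 2),
    "ticker_symbol": ["AAPL"] * 3 + ["TSLA"] * 3,
    "close_value": [100.0, 110.0, 99.0, 200.0, 210.0, 231.0],
})

# One column per ticker, one row per date
wide = prices.pivot(index="day_date", columns="ticker_symbol", values="close_value")

# pct_change works column by column, so tickers are never mixed
returns = wide.sort_index().pct_change()

# returns.plot() would now draw one labelled line per ticker (requires matplotlib)
```

Because the ticker names become column labels, they end up in the legend automatically and ticker_symbol never has to be converted to a number.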
In my dataset I have a categorical column named 'Type' (containing, e.g., INVOICE, IPC, IP) and a 'Date' column containing dates (e.g., 2014-02-01).
How can I plot these two?
On the x axis I want the date.
On the y axis I want a line for each type (e.g., INVOICE) showing its trend.
I'm not sure what you mean by "plot and show trend". One way is to count the occurrences, as @QuangHoang suggested, and plot them as a heatmap, something like below. If you want something different, please expand on your question.
import pandas as pd
import numpy as np
import seaborn as sns
dates = pd.date_range(start='1/1/2018', periods=5, freq='3M')[np.random.randint(0,5,20)]
# Named "types" to avoid shadowing the built-in type()
types = np.random.choice(['INVOICE','IPC','IP'],20)
df = pd.DataFrame({'dates':dates ,'type':types})
tab = pd.crosstab(df['type'],df['dates'].dt.strftime('%d-%m-%Y'))
n = np.unique(tab.values)
cmap = sns.color_palette("BuGn_r",len(n))
sns.heatmap(tab,cmap=cmap)
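If a line per type over time is closer to what the question asks for, the same counting idea can be turned around so that dates form the index. This is a minimal sketch on made-up records, assuming the column names 'Date' and 'Type' from the question:

```python
import pandas as pd

# Hypothetical records; real data would come from the asker's dataset
records = pd.DataFrame({
    "Date": pd.to_datetime([
        "2014-02-01", "2014-02-01", "2014-03-01",
        "2014-03-01", "2014-03-01", "2014-04-01",
    ]),
    "Type": ["INVOICE", "IPC", "INVOICE", "INVOICE", "IP", "IPC"],
})

# Rows are dates, columns are types, values are how often each type occurs
counts = pd.crosstab(records["Date"], records["Type"])

# counts.plot() would draw one line per type with dates on the x axis
# (requires matplotlib)
```

Dates on which a type does not occur get a count of 0, so every line covers the full date range.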
I want to plot two bar graphs side by side using matplotlib/seaborn to compare the confirmed Covid-19 cases of two countries, Italy and India. However, after trying many methods I couldn't get it to work. The confirmed cases of the two countries come from two different data frames.
Data source
I want to plot 'Dates' column on x-axis and 'Confirmed cases count' on y-axis.
Attaching images of my code for reference.
P.S: I am new to data visualization and pandas too.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv('https://raw.githubusercontent.com/datasets/covid-19/master/data/countries-aggregated.csv', parse_dates = ['Date'])
df.head(5)
ind_cnfd = df[['Date', 'Country', 'Confirmed']]
ind_cnfd = ind_cnfd[ind_cnfd['Country']=='India']
italy_cnfd = df[['Date', 'Country', 'Confirmed']]
italy_cnfd = italy_cnfd[italy_cnfd['Country'] == 'Italy']
Expected output kind of this:
With dates on x-axis and confirmed cases on y-axis
Here's an example of what you can put together using pandas and matplotlib; I didn't use seaborn. Feel free to play around with the axes settings, spacing, and so on by looking through the matplotlib documentation. Note that the only plotting import you need to run this code from your notebook is import matplotlib.pyplot as plt.
You can optionally display the confirmed cases on a log-based y scale with the line: plt.yscale('log')
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.read_csv('https://raw.githubusercontent.com/datasets/covid-19/master/data/countries-aggregated.csv',
parse_dates = ['Date'])
# select the Date, Country, Confirmed features from country, with reset of index
ind_cnfd = df[df.Country == 'India']
ind_cnfd = ind_cnfd[['Date', 'Confirmed']].reset_index(drop = True)
ind_cnfd = ind_cnfd.rename(columns = {'Confirmed': 'Confirmed Cases in India'})
italy_cnfd = df[df.Country == 'Italy']
italy_cnfd = italy_cnfd[['Date', 'Confirmed']].reset_index(drop = True)
italy_cnfd = italy_cnfd.rename(columns = {'Confirmed': 'Confirmed Cases in Italy'})
# combine dataframes together, turn the date column into the index
df_cnfd = pd.concat([ind_cnfd.drop(columns = 'Date'), italy_cnfd], axis = 1)
df_cnfd['Date'] = df_cnfd['Date'].dt.date
df_cnfd.set_index('Date', inplace=True)
# make a grouped bar plot time series
ax = df_cnfd.plot.bar()
# show every other tick label
for label in ax.xaxis.get_ticklabels()[::2]:
    label.set_visible(False)
# add titles, axis labels
plt.suptitle("Confirmed COVID-19 Cases over Time", fontsize = 15)
plt.xlabel("Dates")
plt.ylabel("Number of Confirmed Cases")
plt.tight_layout()
# plt.yscale('log')
plt.show()
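The concat step above lines the two countries up by row position, which works here because both have one row per date in the same order. If the two frames might cover different dates, a possible alternative is to merge on the Date column instead. The sketch below uses small made-up numbers (not real case counts), and the suffix names are my own choice:

```python
import pandas as pd

# Made-up values purely for illustration, not real case counts
india = pd.DataFrame({
    "Date": pd.to_datetime(["2020-03-01", "2020-03-02", "2020-03-03"]),
    "Confirmed": [1, 2, 3],
})
italy = pd.DataFrame({
    "Date": pd.to_datetime(["2020-03-02", "2020-03-03", "2020-03-04"]),
    "Confirmed": [10, 20, 30],
})

# An outer merge keeps every date found in either frame and aligns rows by date
combined = (
    india.merge(italy, on="Date", how="outer", suffixes=("_India", "_Italy"))
    .set_index("Date")
    .sort_index()
)

# combined.plot.bar() would draw the two countries as grouped bars per date
# (requires matplotlib)
```

Dates missing from one country show up as NaN and are simply drawn without a bar.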
Is there any pandas way to "link" a dataframe column name with a nice description for that name?
See the following snippet, where I have a dataframe with two columns: the weight in kg and the height in meters of ten people.
When I create the dataframe I use this syntax
df = pd.DataFrame({'a':[1,2,3],'b':[4,5,6]})
but when creating the dataframe I would like to "attach" a nice description to column name a and some LaTeX (e.g. $b_0$) to column name b, so that all the graph items that automatically use those names (legend, tick labels, axis labels and so on) look nice to the user.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
sz = 10
bmi = np.random.normal(25,0.1,sz)
h = np.random.normal(70*2.54/100,4*2.54/100,sz)
w = bmi*h**2
df = pd.DataFrame({'height_m':h,'weight_kg':w})
ax1 = df.plot.scatter(x='height_m',y='weight_kg')
plt.savefig('raw.png')
ax2 = df.plot.scatter(x='height_m',y='weight_kg')
ax2.set_xlabel('$h_0$, Altezza/m')
ax2.set_ylabel('$p_0$, Peso/kg')
plt.savefig('publishable.png')
plt.show()
This is the raw picture straight from pandas:
This is the picture I would like to get, but without having to modify the plot myself by calling set_xlabel, set_ylabel and so on:
You can name your DataFrame correctly from the beginning and plot the dataframe accessing df.columns:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
sz = 10
bmi = np.random.normal(25,0.1,sz)
h = np.random.normal(70*2.54/100,4*2.54/100,sz)
w = bmi*h**2
df = pd.DataFrame({'$h_0$, Altezza/m':h,'$p_0$, Peso/kg':w})
df.plot.scatter(x=df.columns[0], y=df.columns[1])
plt.savefig('publishable.png')
plt.show()
Plus, if you are using Jupyter Notebook / Jupyter Lab, the LaTeX in the column names is rendered correctly when the DataFrame is displayed.
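If you prefer to keep short column names in your code, a possible variation is to store the descriptions in a plain dict and rename only at plot time. As far as I know pandas has no dedicated feature for column descriptions, so the labels dict below is just a convention of this sketch, and the sample values are made up:

```python
import pandas as pd

# Hypothetical mapping from short column names to display labels
labels = {
    "height_m": "$h_0$, Altezza/m",
    "weight_kg": "$p_0$, Peso/kg",
}

df = pd.DataFrame({"height_m": [1.70, 1.80], "weight_kg": [72.0, 81.0]})

# rename returns a copy, so the short names stay available for calculations
pretty = df.rename(columns=labels)

# pretty.plot.scatter(x=labels["height_m"], y=labels["weight_kg"])
# would then pick up the display labels for both axes (requires matplotlib)
```

The original df is left untouched, so the rest of the code can keep using df['height_m'] and df['weight_kg'].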
I want to reduce the number of x-axis labels, because I'm using datetime information and the labels take up so much space along the axis that they become unreadable.
So I think I need some way to scale them.
dates = pd.read_csv("EURUSDtest.csv")
dates = dates["Date"]+" " + dates["Time"]
plt.title("EUR/USD")
plt.plot(dates, data_pred)
plt.xticks(rotation="vertical")
plt.tick_params(labelsize=10)
plt.plot(forecasting)
The problem...
IIUC: You need to convert the dates column to pandas datetime type by calling pd.to_datetime.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# To reproduce the issue you have, let's create a date column as strings
df = pd.DataFrame({"Dates":pd.date_range(start='2018-1-1', end='2019-1-1', freq='15min').strftime("%m-%d-%Y %H-%M-%S")})
# Convert the date strings to datetime, stating the format explicitly so parsing is unambiguous
df["Dates"] = pd.to_datetime(df["Dates"], format="%m-%d-%Y %H-%M-%S")
# Add column to assign some dummy values
df = df.assign(VAL=np.linspace(10, 110, len(df)))
# Plot the graph
# Now the graph automatically adjusts the XLIM based on the size of the graph
plt.title("eur/usd")
plt.plot(df["Dates"], df["VAL"])
plt.xticks(rotation="vertical")
plt.show()
However, if you need finer control over the x-axis limits and tick labels, have a look at the matplotlib documentation on date ticks (the matplotlib.dates module).
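As a rough illustration of that kind of control, the sketch below places one tick every two months and formats it compactly. The data is made up, and the two-month interval and the "%b %Y" format are arbitrary choices for this example:

```python
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
import pandas as pd

# Made-up daily series standing in for the EUR/USD data
dates = pd.date_range("2018-01-01", periods=200, freq="D")
values = np.linspace(10, 110, len(dates))

fig, ax = plt.subplots()
ax.plot(dates, values)

# One tick every second month, labelled like "Jan 2018"
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=2))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %Y"))

# Restrict the visible range to the data
ax.set_xlim(dates[0], dates[-1])

# Rotate and right-align the date labels so they do not overlap
fig.autofmt_xdate()
```

Call plt.show() afterwards to display the figure. This only works as intended when the x values are real datetimes, which is why the conversion with pd.to_datetime matters.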
I am trying to get an output from a dataframe that shows a stacked horizontal bar chart with a table to the left of it. The relevant data is as follows:
import pandas as pd
import matplotlib.pyplot as plt
cols = ['metric','target','daily_avg','days_green','days_yellow','days_red']
vals = ['Volume',338.65,106.81,63,2,1]
OutDict = dict(zip(cols,vals))
# DataFrame.append was removed in pandas 2.0, so build the frame directly from the dict
df = pd.DataFrame([OutDict], columns = cols)
I'd like to get something similar to what's in the following: Python Matplotlib how to get table only. I can get the stacked bar chart:
df[['days_green','days_yellow','days_red']].plot.barh(stacked=True)
Adding the keyword argument table=True puts a table below the chart. How do I get the axis to either display the df as a table or add one next to the chart? Also, the DataFrame will eventually have more than one row, but if I can get it to work for one row then I should be able to get it to work for n rows.
Thanks in advance.
Unfortunately using the pandas.plot method you won't be able to do this. The docs for the table parameter state:
If True, draw a table using the data in the DataFrame and the data will be transposed to meet matplotlib’s default layout. If a Series or DataFrame is passed, use passed data to draw a table.
So you will have to use matplotlib directly to get this done. One option is to create 2 subplots; one for your table and one for your chart. Then you can add the table and modify it as you see fit.
import matplotlib.pyplot as plt
import pandas as pd
cols = ['metric','target','daily_avg','days_green','days_yellow','days_red']
vals = ['Volume',338.65,106.81,63,2,1]
OutDict = dict(zip(cols,vals))
# DataFrame.append was removed in pandas 2.0, so build the frame directly from the dict
df = pd.DataFrame([OutDict], columns = cols)
fig, (ax1, ax2) = plt.subplots(1, 2)
df[['days_green','days_yellow','days_red']].plot.barh(stacked=True, ax=ax2)
ax1.table(cellText=df[['days_green','days_yellow','days_red']].values, colLabels=['days_green', 'days_yellow', 'days_red'], loc='center')
ax1.axis('off')
plt.show()
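For more than one row, the same two-subplot idea might look like the sketch below. The first row is the one from the question, the second row is made up for illustration, and the width ratio and the choice of table columns are arbitrary assumptions:

```python
import matplotlib.pyplot as plt
import pandas as pd

# First row from the question; the second row is a made-up example
df = pd.DataFrame({
    "metric": ["Volume", "Hypothetical metric"],
    "target": [338.65, 100.0],
    "daily_avg": [106.81, 90.0],
    "days_green": [63, 40],
    "days_yellow": [2, 20],
    "days_red": [1, 6],
})
day_cols = ["days_green", "days_yellow", "days_red"]
table_cols = ["metric", "target", "daily_avg"]

# Narrow axis on the left for the table, wider axis on the right for the chart
fig, (ax_table, ax_bar) = plt.subplots(1, 2, gridspec_kw={"width_ratios": [1, 2]})

# One stacked horizontal bar per metric
df.set_index("metric")[day_cols].plot.barh(stacked=True, ax=ax_bar)
# barh draws the first row at the bottom; invert so it matches the table order
ax_bar.invert_yaxis()

# Table cells are passed as strings; the table axis itself is hidden
table = ax_table.table(
    cellText=df[table_cols].astype(str).values,
    colLabels=table_cols,
    loc="center",
)
ax_table.axis("off")
```

Call plt.show() to display the result. The table rows will not line up exactly with the bars without further tuning of the cell heights, so treat this as a starting point.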