Set x-axis intervals(ticks) for graph of Pandas DataFrame - python

I'm trying to set the ticks (time-steps) of the x-axis on my matplotlib graph of a Pandas DataFrame. My goal is to use the first column of the DataFrame as the ticks, but I haven't been successful so far.
My attempts so far have included:
Attempt 1:
#See 'xticks'
data_df[header_names[1]].plot(ax=ax, title="Roehrig Shock Data", style="-o", legend=True, xticks=data_df[header_names[0]])
Attempt 2:
ax.xaxis.set_ticks(data_df[header_names[0]])
header_names is just a list of the column header names and the dataframe is as follows:
Compression Velocity Compression Force
1 0.000213 6.810879
2 0.025055 140.693200
3 0.050146 158.401500
4 0.075816 171.050200
5 0.101011 178.639500
6 0.126681 186.228800
7 0.150925 191.288300
8 0.176597 198.877500
9 0.202269 203.937000
10 0.227466 208.996500
11 0.252663 214.056000
And here is the data in CSV format:
Compression Velocity,Compression Force
0.0002126891606,6.810879
0.025055073079999997,140.6932
0.050145696,158.4015
0.07581600279999999,171.0502
0.1010109232,178.6395
0.12668120459999999,186.2288
0.1509253776,191.2883
0.1765969798,198.8775
0.2022691662,203.937
0.2274659662,208.9965
0.2526627408,214.056
And here is an implementation of reading and plotting the graph:
data_df = pd.read_csv(file).astype(float)
fig = Figure()
ax = fig.add_subplot(111)
ax.set_xlabel("Velocity (m/sec)")
ax.set_ylabel("Force (N)")
data_df[header_names[1]].plot(ax=ax, title="Roehrig Shock Data", style="-o", legend=True)
The current graph looks like:
The x-axis is currently the number of rows in the dataframe (e.g. 12) rather than the actual values within the first column.
Is there a way to use the data from the first column in the dataframe to set as the ticks/intervals/time-steps of the x-axis?

This works for me:
data_df.plot(x='Compression Velocity', y='Compression Force', xticks=data_df['Compression Velocity'])
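For reference, here is a minimal self-contained sketch of that call. It inlines the first few rows of the CSV from the question and assumes a headless matplotlib backend; the `rot=45` argument and the variable names `x_col` and `y_col` are my own additions, not part of the original answer.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# First rows of the CSV from the question, inlined to keep the example self-contained.
csv_text = """Compression Velocity,Compression Force
0.0002126891606,6.810879
0.025055073079999997,140.6932
0.050145696,158.4015
0.07581600279999999,171.0502
0.1010109232,178.6395
"""
data_df = pd.read_csv(io.StringIO(csv_text)).astype(float)
x_col, y_col = data_df.columns[:2]

fig, ax = plt.subplots()
# x= selects the column plotted along the x-axis; xticks= places a tick at each of its values.
data_df.plot(x=x_col, y=y_col, xticks=data_df[x_col], style="-o",
             rot=45, ax=ax, title="Roehrig Shock Data")
ax.set_xlabel("Velocity (m/sec)")
ax.set_ylabel("Force (N)")
```

Passing `x=` is what replaces the row numbers on the x-axis with the velocity values; `xticks=` only controls where the tick marks go.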

Related

Categorical data visualization - scatter plot with multiple X using Pandas and Seaborn

I spent many hours looking for tips how to create categorical plot using Seaborn and Pandas having several Xs to be added on x-axis, but I have not found the solution.
For specified columns from excel (for example: S1_1, S1_2, S1_3) I would like to create one scatterplot with readings - it means for each column header 9 measurements are expected. Please refer to the image to see the data structure in excel. I was unable to find the right function.
I tried with the following code, but this is not what I wanted to achieve.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_excel("panda.xlsx")
dfx = pd.DataFrame({"CHAR": ["S1_1","S1_2","S1_3"]})
sns.stripplot(x=dfx['CHAR'],y=df['S1_1'],color='black')
sns.stripplot(x=dfx['CHAR'],y=df['S1_2'],color='black')
sns.stripplot(x=dfx['CHAR'],y=df['S1_3'],color='black')
plt.xlabel('X Axis')
plt.ylabel('Y Axis')
plt.show()
Expected vs obtained plot:
You're overthinking things. You don't need to call stripplot separately for each column. I generated new random data since you didn't share yours in a copy-and-pastable form, but stripplot will basically do what I think you want with a very short invocation.
> print(df)
S1 S2 S3 S4
0 0.314097 0.678525 0.228356 0.770293
1 0.207790 0.739484 0.965662 0.604426
2 0.975562 0.959384 0.088162 0.265529
3 0.616823 0.902795 0.015561 0.662020
4 0.210507 0.287713 0.660347 0.763312
5 0.763505 0.381314 0.759422 0.257578
6 0.707832 0.912063 0.774681 0.534284
7 0.996891 0.258103 0.313047 0.729142
8 0.121308 0.797310 0.286265 0.757299
> sns.stripplot(data=df[["S1", "S2", "S3"]], color='black')
> plt.xlabel("X Axis")
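If you prefer to name the x and y variables explicitly, the same data can be reshaped to long form first. This is only a sketch: the readings are made up, and the names `long_df`, `CHAR` and `reading` are illustrative choices of mine.

```python
import pandas as pd

# Made-up readings; the column names follow the question.
df = pd.DataFrame({
    "S1_1": [0.31, 0.21, 0.98],
    "S1_2": [0.68, 0.74, 0.96],
    "S1_3": [0.23, 0.97, 0.09],
})

# One row per (column name, reading) pair instead of one column per series.
long_df = df.melt(var_name="CHAR", value_name="reading")
print(long_df)

# long_df could then be drawn with:
#   sns.stripplot(data=long_df, x="CHAR", y="reading", color="black")
```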

Multiple Seaborn Heatmaps from Pandas Dataframe

I have a Pandas dataframe that looks like this:
store_id days times rating
100 monday '1:00pm - 3:00pm' 0
100 monday '3:00pm - 6:00pm' 1
100 monday '6:00pm - 9:00pm' 2
...
store n
Where there are ~60 stores and the ratings range from 0 - 2. I would like to create a 6x5 grid of Seaborn heatmaps, with one heatmap per store. I would like the x-axis to be the days and the y-axis to be the times.
I tried this:
f, axes = plt.subplots(5,6)
i=0
for store in df['store_id']:
    sns.heatmap(data=df[df['store_id']==store]['rating'], ax=axes[i])
    i+=1
This creates the 5x6 grid, but generates an error ('Inconsistent shape between the condition and the input...'). What's the best way to do this?
For a heatmap, you need to pivot your data so that the days become columns (x-axis) and the times become the index (y-axis):
f, axes = plt.subplots(5,6)
# flatten axes for looping
axes = axes.ravel()
# use groupby to extract data faster
for ax, (store, data) in zip(axes, df.groupby('store_id')):
    pivot = data.pivot_table(index='times', columns='days', values='rating')
    sns.heatmap(data=pivot, ax=ax)
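To see what each heatmap receives, here is a small pandas-only sketch of the pivot step. The monday rows come from the question; the tuesday rows and the `pivots` dictionary are assumptions added for illustration.

```python
import pandas as pd

times = ["1:00pm - 3:00pm", "3:00pm - 6:00pm", "6:00pm - 9:00pm"]
df = pd.DataFrame({
    "store_id": [100] * 6,
    "days": ["monday"] * 3 + ["tuesday"] * 3,  # tuesday rows are made up
    "times": times * 2,
    "rating": [0, 1, 2, 2, 0, 1],
})

# One 2-D table per store: rows are time slots, columns are days.
pivots = {
    store: data.pivot_table(index="times", columns="days", values="rating")
    for store, data in df.groupby("store_id")
}
print(pivots[100])
```

Each value in `pivots` is a two-dimensional table, which is the shape `sns.heatmap` expects; the original attempt passed a one-dimensional `rating` column instead.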

Stacked Area Chart in Python

I'm working on an assignment from school, and have run into a snag when it comes to my stacked area chart.
The data is fairly simple: 4 columns that look similar to this:
Series id  Year  Period  Value
LNS140000  1948  M01     3.4
I'm trying to create a stacked area chart using Year as my x and Value as my y and breaking it up over Period.
#Stacked area chart still using unemployment data
x = d.Year
y = d.Value
plt.stackplot(x, y, labels = d['Period'])
plt.legend(d['Period'], loc = 'upper left')
plt.show()
However, when I do it like this it only picks up M01 and there are M01-M12. Any thoughts on how I can make this work?
You need to preprocess your data a little before passing them to the stackplot function. I took a look at this link to work on an example that could be suitable for your case.
Since I've only seen one row of your data, I added some made-up values to the dataset.
import pandas as pd
import matplotlib.pyplot as plt
dd=[[1948,'M01',3.4],[1948,'M02',2.5],[1948,'M03',1.6],
[1949,'M01',4.3],[1949,'M02',6.7],[1949,'M03',7.8]]
d=pd.DataFrame(dd,columns=['Year','Period','Value'])
years=d.Year.unique()
periods=d.Period.unique()
#Now group them per period, but in year sequence
d.sort_values(by='Year',inplace=True) # to ensure entire dataset is ordered
pds=[]
for p in periods:
    pds.append(d[d.Period==p]['Value'].values)
plt.stackplot(years,pds,labels=periods)
plt.legend(loc='upper left')
plt.show()
Is that what you want?
So I was able to use Seaborn to help out. First I did a pivot table
df = d.pivot(index = 'Year',
columns = 'Period',
values = 'Value')
df
Then I set up seaborn
plt.style.use('seaborn')
sns.set_style("white")
sns.set_theme(style = "ticks")
df.plot.area(figsize = (20,9))
plt.title("Unemployment by Year and Month\n", fontsize = 22, loc = 'left')
plt.ylabel("Values", fontsize = 22)
plt.xlabel("Year", fontsize = 22)
It seems to me that the problem you are having relates to the formatting of the data. Look how the values are formatted in this matplotlib example. I would try grouping the data by period, or pivoting it into the correct format, and then plotting again.
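As a rough sketch of the pivot-then-plot idea with plain matplotlib: the six rows are the made-up values from the earlier answer, the names `wide` and `layers` are mine, and a headless backend is assumed.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

rows = [[1948, "M01", 3.4], [1948, "M02", 2.5], [1948, "M03", 1.6],
        [1949, "M01", 4.3], [1949, "M02", 6.7], [1949, "M03", 7.8]]
d = pd.DataFrame(rows, columns=["Year", "Period", "Value"])

# Wide format: one row per year, one column per period.
wide = d.pivot(index="Year", columns="Period", values="Value")

fig, ax = plt.subplots()
# stackplot expects one sequence of y-values per layer, so pass the periods as rows.
layers = ax.stackplot(wide.index, wide.T.to_numpy(), labels=list(wide.columns))
ax.legend(loc="upper left")
```

The original call passed a single `Value` column, so there was only one layer to stack; after the pivot there is one layer per period.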

Reading excel with Python Pandas and isolating columns/rows to plot

I am using Python pandas read_excel to create a histogram or line plot. I would like to read in the entire file. It is a large file and I only want to plot certain values from it. I know how to use skiprows and parse_cols in read_excel, but if I do this, it does not read a part of the file that I need to use for the axis labels. I also do not know how to tell it what to plot for the x-values and what to plot for the y-values. Here's what I have:
df=pd.read_excel('JanRain.xlsx',parse_cols="C:BD")
years=df[0]
precip=df[31:32]
df.plot.bar()
I want the x-axis to be row 1 of the Excel file (years), and I want each bar in the bar graph to be one of the values on row 31 of the Excel file. I'm not sure how to isolate this. Would it be easier to read with pandas and then plot with matplotlib?
Here is a sample of the Excel file. The first row is years and the second column is days of the month (this file is only for 1 month):
Here's how I would plot the data in row 31 of a large dataframe, setting row 0 as the x-axis. (updated answer)
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
Create a random array with 32 rows and 10 columns:
df = pd.DataFrame(np.random.rand(320).reshape(32,10), columns=range(64,74), index=range(1,33))
df.to_excel(r"D:\data\data.xlsx")
Read only the columns and rows that you want using "usecols" (called "parse_cols" in older pandas versions) and "skiprows". The first column in this example is the dataframe index.
# load desired columns and rows into a dataframe
# in this method, I first make a list of all skipped_rows
desired_cols = [0] + list(range(2,9))
skipped_rows = list(range(1,33))
skipped_rows.remove(31)
df = pd.read_excel(r"D:\data\data.xlsx", index_col=0, usecols=desired_cols, skiprows=skipped_rows)
Currently this yields a dataframe with only one row.
65 66 67 68 69 70 71
31 0.310933 0.606858 0.12442 0.988441 0.821966 0.213625 0.254897
Isolate only the row that you want to plot, giving a pandas.Series with the original column headers as the index:
ser = df.loc[31, :]
Plot the series.
fig, ax = plt.subplots()
ser.plot(ax=ax)
ax.set_xlabel("year")
ax.set_ylabel("precipitation")
fig, ax = plt.subplots()
ser.plot(kind="bar", ax=ax)
ax.set_xlabel("year")
ax.set_ylabel("precipitation")
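If the whole sheet is read into memory instead, the same row can be picked out afterwards with `.loc`. The sketch below uses an in-memory stand-in for the spreadsheet (deterministic numbers rather than real rainfall data), so no Excel file is needed to run it.

```python
import numpy as np
import pandas as pd

# Stand-in for the spreadsheet: columns play the role of years, the index is the day of month.
df = pd.DataFrame(np.arange(32 * 10).reshape(32, 10),
                  columns=range(64, 74), index=range(1, 33))

# Label-based selection: row 31, columns 65 through 71 (both ends inclusive).
ser = df.loc[31, 65:71]
print(ser)

# ser.plot(kind="bar") would then draw one bar per column label.
```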

Plotting timestamps in matplotlib

I have a pandas dataframe which contains a column called "order.timestamp" - a list of timestamps for a set of occurrences.
I would like to plot these timestamps on the x-axis of a matplotlib plot and have the dates, hours, seconds etc display as I zoom in. Is this possible?
I have tried using datetime.strptime:
date_format = '%Y-%m-%dT%H:%M:%S.%fZ'
for i in range(0, len(small_data)) :
    b = datetime.strptime(small_data["order.timestamp"].iloc[i],date_format)
    small_data = small_data.set_value(i, "order.timestamp", b)
Which re-creates the column "order.timestamp" in my pandas dataframe. The column now contains entries like:
2017-01-01 12:50:06.902000
However, if I now try to plot as normal:
fig = plt.figure()
plt.plot(small_data["order.timestamp"], small_data["y_values"])
plt.show()
I see an error
ValueError: ordinal must be >= 1
Any help greatly appreciated!
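No answer is recorded for this question here. One thing that may be worth trying is converting the whole column in a single call with `pd.to_datetime` rather than looping row by row. This is only a sketch: the raw strings and `y_values` below are made up to match the format string in the question, and it has not been checked against the error above.

```python
import pandas as pd

date_format = "%Y-%m-%dT%H:%M:%S.%fZ"

# Made-up rows shaped like the column described in the question.
small_data = pd.DataFrame({
    "order.timestamp": ["2017-01-01T12:50:06.902000Z",
                        "2017-01-01T12:55:10.120000Z",
                        "2017-01-01T13:02:45.000500Z"],
    "y_values": [1.0, 2.5, 1.7],
})

# Vectorised conversion: the column becomes a proper datetime column in one step.
small_data["order.timestamp"] = pd.to_datetime(
    small_data["order.timestamp"], format=date_format, utc=True
)
print(small_data.dtypes)

# The converted column can then be passed to plt.plot(...) as the x-values.
```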
