I have a Pandas DataFrame of measurements:
,Fp076,Fp084,Fp092,Fp099,Fp107,Fp115,Fp122,Fp130,Fp143,Fp151,Fp158,Fp166,Fp174,Fp181,Fp189,Fp197,Fp204,Fp212,Fp220,Fp227
0,0.531743,0.512256,0.427771,0.444216,0.332228,0.296139,0.202653,0.298724,0.341529,0.276829,0.24803,0.278406,0.345853,0.317384,0.32032,0.179936,0.205871,0.495948,0.167417,0.097147
1,-0.032964,0.047469,0.128079,0.142839,0.253755,0.165963,0.210111,0.239816,0.162333,0.115085,0.129781,0.134795,0.09575,0.243093,0.10684,0.195201,0.143984,0.266312,0.198049,0.084467
2,0.459728,0.541346,0.830889,0.368135,0.407241,0.499617,0.383159,0.507517,0.409411,0.325441,0.305605,0.378738,0.342981,0.43766,0.295844,0.228164,0.276319,0.226467,0.375678,0.219189
3,2.6838,2.394591,2.493416,0.874906,2.113343,1.812258,1.667047,1.779347,1.515663,1.620196,1.539494,1.63528,1.555373,1.471318,1.610067,1.507087,1.467174,1.458346,1.681998,1.14625
4,0.368415,0.435004,0.155035,0.161064,0.180133,0.202117,0.142981,0.138321,0.122557,0.099213,0.098213,0.062174,0.123664,0.2051,0.167415,0.185133,0.127677,0.037875,0.156252,0.015579
5,0.213577,0.187244,0.274151,0.173572,0.296122,0.308341,0.164578,0.159559,0.318383,0.181329,0.260223,0.257395,0.241779,0.292731,0.244476,0.187523,0.247331,0.293338,0.323894,0.179478
6,0.096093,0.140454,0.067185,6.441058,0.016797,0.141757,0.181792,0.13692,0.204091,0.180182,0.149626,0.220342,0.179286,0.276316,0.104531,0.20343,0.045161,-0.004546,0.045833,0.193849
7,0.286467,0.086673,-0.106538,-0.261802,0.16964,0.182858,0.062774,0.20471,0.040105,0.086975,0.211068,0.182423,0.098721,0.077085,0.102986,0.129935,0.130571,0.176024,0.154079,0.102391
8,0.480631,0.714554,0.858241,0.746666,0.555411,0.452689,0.337912,0.333942,0.269359,0.221312,0.09818,0.226218,0.287361,0.209858,0.222951,0.207584,0.258397,0.026713,0.162048,0.149924
9,1.055405,0.638777,0.468793,0.41544,0.559187,0.471218,0.493805,0.544716,0.412903,0.412182,0.51041,0.383991,0.351397,0.383201,0.368308,0.237954,0.330242,0.262648,0.425204,0.434928
10,1.116658,0.737544,0.854376,-0.004434,0.419419,0.35921,0.377095,0.273815,0.258913,0.290614,0.271843,0.321572,0.234764,0.298931,0.206039,0.192746,0.200727,0.132419,0.229914,0.159857
11,-0.004305,0.052289,0.275035,-0.849414,0.104146,0.185819,0.128376,0.136433,0.091787,0.149753,0.107246,0.081407,0.118816,0.117434,0.169153,0.108273,0.205751,0.145238,0.153086,0.114278
12,0.836223,0.323901,0.269564,0.364082,0.343695,0.386785,0.24881,0.307267,0.222634,0.214189,0.12167,0.251107,0.134083,0.284545,0.175479,0.221877,0.184749,0.225089,0.205388,0.214972
where each row holds the flux measurements at the frequencies in the header (76, 84, 92, 99, ... MHz). I'm trying to plot a line graph of the flux measurements for a row. Since the frequencies in the header are not evenly spaced, I've tried this:
import numpy as np
import matplotlib.pyplot as plt

f = np.array([76,84,92,99,107,115,122,130,143,151,158,166,174,181,189,197,204,212,220,227])
y1 = [0.531743,0.512256,0.427771,0.444216,0.332228,0.296139,0.202653,0.298724,0.341529,0.276829,0.24803,0.278406,0.345853,0.317384,0.32032,0.179936,0.205871,0.495948,0.167417,0.097147]
y2 = [-0.032964,0.047469,0.128079,0.142839,0.253755,0.165963,0.210111,0.239816,0.162333,0.115085,0.129781,0.134795,0.09575,0.243093,0.10684,0.195201,0.143984,0.266312,0.198049,0.084467]
y3 = [0.459728,0.541346,0.830889,0.368135,0.407241,0.499617,0.383159,0.507517,0.409411,0.325441,0.305605,0.378738,0.342981,0.43766,0.295844,0.228164,0.276319,0.226467,0.375678,0.219189]
fig, ax = plt.subplots()
ax.scatter(f, y1, label = r'$\alpha = -0.37$')
ax.plot(f, y1)
ax.scatter(f, y2, label = r'$\alpha = NaN$')
ax.plot(f, y2)
ax.scatter(f, y3, label = r'$\alpha = -0.75$')
ax.plot(f, y3)
ax.set_xlabel('Frequency (MHz)')
ax.set_ylabel('Flux (Jy/beam)')
ax.grid(which = 'both', axis = 'both')
ax.legend()  # display the labels passed to scatter above
This just copy-pastes the first three rows of data into lists. The resulting plot is basically what I want, but what's a better way to do it?
There are many ways to solve this, but the simplest I can think of is to transpose your DataFrame and then use seaborn to plot all the columns:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
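# convert your sample data; data_ is the CSV text from the question as a single string
data = [[e for e in row.split(',') if e] for row in data_.split("\n") if row]
columns = data[0]
# create the `x` axis: strip the 'Fp' prefix and convert the headers to integers
columns = [int(col.replace('Fp','')) for col in columns]
# the header row starts with an empty field, so add a name for the row-number column
columns = ['index'] + columns
data = data[1:]
df = pd.DataFrame(data=data, columns=columns)
df = df.drop(columns=['index'])
df = df.astype('float')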
If you already have the DataFrame with the original Fp headers (that is, without the int(col.replace('Fp','')) step), you can transform its columns the same way:
df.columns = [int(col.replace('Fp','')) for col in df.columns]
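The same renaming can also be written with the pandas string accessor. This is a minimal sketch, assuming every header is 'Fp' followed by digits. The three-column frame is a cut-down stand-in using the first values of the question's data.

```python
import pandas as pd

# Cut-down stand-in: first row, first three frequencies from the question
df = pd.DataFrame({"Fp076": [0.531743], "Fp084": [0.512256], "Fp092": [0.427771]})

# Strip the literal 'Fp' prefix, then cast the remaining digits to integers (MHz)
df.columns = df.columns.str.replace("Fp", "", regex=False).astype(int)

print(df.columns.tolist())
```

Integer headers matter here because they later become the x values of the plot, so the uneven frequency spacing is drawn to scale.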
Once this is done, you can transpose the DataFrame so that the frequencies become the index and each original row becomes a column:
# transpose your data
df_ = df.T
# plot your data
plt.figure(figsize=(15,8))
sns.lineplot(data=df_)
plt.title('Flux vs. frequency')
plt.xlabel('Frequency (MHz)')
plt.ylabel('Flux (Jy/beam)')
This plots one line per row of the original data. You can adjust the plot to your liking, but this would be the simplest way. Tip: lean on seaborn's or pandas' built-in plotting methods as much as possible for aggregated plots like this.
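For comparison with the seaborn approach above, here is a minimal pandas/matplotlib-only sketch that stays closer to the plot in the question: it loops over the rows instead of copying each one into its own list. It assumes all headers look like Fp076. The frame is a cut-down stand-in (first three rows, first four frequencies of the question's data), and the "row N" legend labels are placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Cut-down stand-in: first three rows and first four frequencies from the question
df = pd.DataFrame(
    {
        "Fp076": [0.531743, -0.032964, 0.459728],
        "Fp084": [0.512256, 0.047469, 0.541346],
        "Fp092": [0.427771, 0.128079, 0.830889],
        "Fp099": [0.444216, 0.142839, 0.368135],
    }
)

# Frequencies in MHz, taken from the headers
freqs = [int(col.replace("Fp", "")) for col in df.columns]

fig, ax = plt.subplots()
for idx, row in df.iterrows():
    # marker="o" draws the points and the connecting line in one call
    ax.plot(freqs, row.to_numpy(), marker="o", label=f"row {idx}")
ax.set_xlabel("Frequency (MHz)")
ax.set_ylabel("Flux (Jy/beam)")
ax.grid(which="both", axis="both")
ax.legend()
```

Passing the integer frequencies as x keeps the uneven spacing between channels, which was the reason for building the f array by hand in the question.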
This question already has answers here:
Plotting a time series?
(2 answers)
How to draw vertical lines on a given plot
(6 answers)
Closed 1 year ago.
I am trying to mark dates (N=50) as vertical lines on a 5-year time series chart, and I'm having trouble working out how to iterate over them in a for loop. Below is the chart I'm trying to plot the dates on.
[Image: chart I'm plotting the dates on]
Currently, I am trying:
for date in dataframe_with_dates.DATE:
    plt.axvline(x = date, color = 'g')
plt.show()
and I'm receiving an error of:
Failed to convert value(s) to axis units: 'DATE'
I'm not sure if this has something to do with the dtype being datetime, or if I need to try another approach, but any advice/guidance is greatly appreciated!
Thank you!
This is what I am trying to accomplish: Example image
EDIT: Code to produce the plot
def plot_df(df_1, x, y, title = '', xlabel = 'DATE', ylabel = 'VALUE', dpi = 100):
    plt.figure(figsize = (25,5), dpi = dpi)
    plt.plot(x, y, color = 'tab:red')
    plt.gca().set(title = title, xlabel = xlabel, ylabel = ylabel)
    plt.show()

plot_df(df_VIX, x = df_VIX.DATE, y = df_VIX.AVG_VALUE, title = 'Daily VIX since 1990')
data_test = [['2016-01-04', 22.48, 23.36, 20.67, 20.70, 21.8025],
['2016-01-05', 20.75, 21.06, 19.25, 19.34, 20.1],
['2016-01-06', 21.67, 21.86, 19.8, 20.59, 20.98],
['2016-01-07', 23.22, 25.86, 22.4, 24.99, 24.1175],
['2016-01-08', 22.96, 25.86, 22.40, 24.99, 24.89]]
df_test = pd.DataFrame(data_test, columns = ['DATE','OPEN','HIGH','LOW','CLOSE', 'AVG_VALUE'])
df_test['DATE'] = pd.to_datetime(df_test['DATE'])
This will reproduce a sample of the exact data I'm using.
I think this is what you want:
df_test.plot(x='DATE', y='OPEN')
Replace y='OPEN' with another column name to plot that column instead. pandas formats the date x-axis automatically, similar to the figure you showed.
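The vertical markers from the question can be drawn on the same axes as the series. This is a minimal sketch using matplotlib directly, as the question's plot_df does. It assumes the dates to mark are already datetime values. The event_dates name and the two dates in it are hypothetical, and only the DATE and OPEN columns of df_test are reproduced.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# DATE and OPEN columns from the question's df_test
df_test = pd.DataFrame(
    {
        "DATE": ["2016-01-04", "2016-01-05", "2016-01-06", "2016-01-07", "2016-01-08"],
        "OPEN": [22.48, 20.75, 21.67, 23.22, 22.96],
    }
)
df_test["DATE"] = pd.to_datetime(df_test["DATE"])

# Hypothetical dates to mark; in the question these come from another DataFrame
event_dates = pd.to_datetime(["2016-01-05", "2016-01-07"])

fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(df_test["DATE"], df_test["OPEN"], color="tab:red")
for date in event_dates:
    # One green vertical line per date, on the same axes as the series
    ax.axvline(x=date, color="g")
ax.set(xlabel="DATE", ylabel="OPEN")
```

Calling axvline on the Axes object that holds the series, rather than on whichever figure happens to be current, keeps the markers and the data on the same date axis.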