How to plot some months in years of dataframe - python

Using this code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
pd.options.display.float_format = '{:.2f}'.format
a = pd.read_csv(r'C:\Users\Leonardo\Desktop\TRABALHO\dadosboias\MARINHA_TRATADO\Cabo Frio\boia_1\cabofrio.csv', na_values=['-9999.0'])
a.index = pd.to_datetime(a[['Year', 'Month', 'Day', 'Hour', 'Minute']])
pd.options.mode.chained_assignment = None
The output is something like this:
index wspd wdir gust hs
2009-06-24 15:21:00 1.4669884357700003 9.0 2.03121475722 nan
2009-06-24 16:21:00 1.4669884357700003 34.0 2.03121475722 nan
2009-06-24 17:21:00 0.677071585741 127.0 1.35414317148 nan
2009-06-24 18:21:00 0.22569052858000002 146.0 0.902762114322 nan
... ... ... ...
2013-02-10 17:21:00 nan nan nan nan
And doing a simple plotting with plt.plot(a.hs, 'r.') the output is this:
As can be seeable the dataframe has a lot of missing data in "hs" column. The main objective is to plot just the periods with data. In the image you can see that 2012-03 to 2013-3 have a lot of good data of "hs", so the objective is to plot this period and get something like this:
I Would be thankful if someone could help.

You can just select the relevant range, e.g.
a.loc['2012-03-01':'2013-03-01', 'hs'].plot()

Related

How to combine sensor data for plotting

I'm testing a light sensor for sensitivity. I now have data that I would like to plot.
The sensor has 24 levels of sensitivity
I'm only testing 0,6,12,18 and 23
On the x-axes: PWM value, range 0-65000
My goal is to plot from a dataframe using with plotly.
My question is:
How can I combine the data (as shown below) into a dataframe for plotting?
EDIT: The link to my csv files: https://filetransfer.io/data-package/QwzFzT8O
Also below: my code so far
Thanks!
def main_code():
data = pd.DataFrame(columns=['PWM','sens_00','sens_06','sens_12','sens_18','sens_23'])
sens_00 = pd.read_csv('sens_00.csv', sep=';')
sens_06 = pd.read_csv('sens_06.csv', sep=';')
sens_12 = pd.read_csv('sens_12.csv', sep=';')
sens_18 = pd.read_csv('sens_18.csv', sep=';')
sens_23 = pd.read_csv('sens_23.csv', sep=';')
print(data)
print(sens_23)
import plotly.express as px
import pandas as pd
if __name__ == '__main__':
main_code()
#Dawid's answer is fine, but it does not produce a complete dataframe (so you can do more than just plotting), and contains too much redundancy.
Below is a better way to concatenate the multiple csv files.
Then plotting is just a single call.
Reading csv files into a single dataframe:
from pathlib import Path
import pandas as pd
def read_dataframes(data_root: Path):
# It can be turned into a single line
# but keeping it more readable here
dataframes = []
for fpath in data_root.glob("*.csv"):
df = pd.read_csv(fpath, sep=";")
df = df[["pwm", "lux"]]
df = df.rename({"lux": fpath.stem}, axis="columns")
df = df.set_index("pwm")
dataframes.append(df)
return pd.concat(dataframes)
data_root = Path("data")
df = read_dataframes(data_root)
df
sens_06 sens_18 sens_12 sens_23 sens_00
pwm
100 0.00000 NaN NaN NaN NaN
200 1.36435 NaN NaN NaN NaN
300 6.06451 NaN NaN NaN NaN
400 12.60010 NaN NaN NaN NaN
500 20.03770 NaN NaN NaN NaN
... ... ... ... ... ...
64700 NaN NaN NaN NaN 5276.74
64800 NaN NaN NaN NaN 5282.29
64900 NaN NaN NaN NaN 5290.45
65000 NaN NaN NaN NaN 5296.63
65000 NaN NaN NaN NaN 5296.57
[2098 rows x 5 columns]
Plotting:
df.plot(backend="plotly") # equivalent to px.line(df)
Here is my suggestion. You have two columns in each file, and you need to use unique column names to keep both columns. All files are loaded and appended to the empty DataFrame called data. To generate a plot with all columns, you need to specify it by fig.add_scatter. The code:
import pandas as pd
import plotly.express as px
def main_code():
data = pd.DataFrame()
for filename in ['sens_00', 'sens_06', 'sens_12', 'sens_18', 'sens_23']:
data[['{}-PWM'.format(filename), '{}-LUX'.format(filename)]] = pd.read_csv('{}.csv'.format(filename), sep=';')
print(data)
fig = px.line(data_frame=data, x=data['sens_00-PWM'], y=data['sens_00-LUX'])
for filename in ['sens_06', 'sens_12', 'sens_18', 'sens_23']:
fig.add_scatter(x=data['{}-PWM'.format(filename)], y=data['{}-LUX'.format(filename)], mode='lines')
fig.show()
if __name__ == '__main__':
main_code()
Based on the suggestion by #Dawid
This is what I was going for.

Python Seaborn Lineplot

I am new to Python and have a question regarding a lineplot.
I have a data set which I would like to display as a Seaborn lineplot.
In this dataset I have 3 categories which should be on the Y axis. I have no data for an X axis, but I want to use the index.
Unfortunately I did not get it right. I would like to use it like the Excel picture.
The columns are also of different lengths.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv("Testdata.csv", delimiter= ";")
df
Double Single Triple
0 50.579652 24.498143 60.954680
1 53.313919 24.497490 60.494626
2 54.174343 24.490651 60.052566
3 56.622435 24.485605 59.622501
4 59.656155 26.201791 59.199581
... ... ... ...
410 NaN NaN 75.478118
411 NaN NaN 73.780804
412 NaN NaN 72.716096
413 NaN NaN 72.468472
414 NaN NaN 71.179819
How do I do that?
I appreciate your help.
First melt your columns and then use hue parameter to plot each line:
fig, ax = pyplot.subplots(figsize=(10, 10))
ax =seaborn.lineplot(
data= df.melt(id_vars='index').rename(columns=str.title),
x= 'index',
y= 'value',
hue='varaible'
)

Python Pandas totals and dates

Im sorry for not posting the data but it wouldn't really help. The thing is a need to make a graph and I have a csv file full of information organised by date. It has 'Cases' 'Deaths' 'Recoveries' 'Critical' 'Hospitalized' 'States' as categories. It goes in order by date and has the amount of cases,deaths,recoveries per day of each state. How do I sum this categories to make a graph that shows how the total is increasing? I really have no idea how to start so I can't post my data. Below are some numbers that try to explain what I have.
0 2020-02-20 1 Andalucía NaN NaN NaN
1 2020-02-20 2 Aragón NaN NaN NaN
2 2020-02-20 3 Asturias NaN NaN NaN
3 2020-02-20 4 Baleares 1.0 NaN NaN
4 2020-02-20 5 Canarias 1.0 NaN NaN
.. ... ... ... ... ... ...
888 2020-04-06 19 Melilla 92.0 40.0 3.0
889 2020-04-06 14 Murcia 1283.0 500.0 84.0
890 2020-04-06 15 Navarra 3355.0 1488.0 124.0
891 2020-04-06 16 País Vasco 9021.0 4856.0 417.0
892 2020-04-06 17 La Rioja 2846.0 918.0 66.0
It's unclear exactly what you mean by "sum this categories". I'm assuming you mean that for each date, you want to sum the values across all different regions to come up with the total values for Spain?
In which case, you will want to groupby date, then .sum() the columns (you can drop the States category.
grouped_df = df.groupby("date")["Cases", "Deaths", ...].sum()
grouped_df.set_index("date").plot()
This snippet will probably not work directly, you may need to reformat the dates etc. But should be enough to get you started.
I think you are looking for groupby followed by a cumsum not including dates.
columns_to_group = ['Cases', 'Deaths',
'Recoveries', 'Critical', 'Hospitalized', 'date']
new_columns = ['Cases_sum', 'Deaths_sum',
'Recoveries_sum', 'Critical_sum', 'Hospitalized_sum']
df_grouped = df[columns_to_group].groupby('date').sum().reset_index()
For plotting seaborn provides an easy functions:
import seaborn as sns
df_melted = df_grouped.melt(id_vars=["date"])
sns.lineplot(data=df_melted, x='date', y = 'value', hue='variable')

Plot the graph excluding missing values in pandas or matplotlib

I am new to time-series programming with Pandas.
Here is the sample data:
date 2 3 4 ShiftedPrice
0 2017-11-05 09:20:01.134 2123.0 12.23 34.12 300.0
1 2017-11-05 09:20:01.789 2133.0 32.43 45.62 330.0
2 2017-11-05 09:20:02.238 2423.0 35.43 55.62 NaN
3 2017-11-05 09:20:02.567 3423.0 65.43 56.62 NaN
4 2017-11-05 09:20:02.948 2463.0 45.43 58.62 NaN
I would like to plot date vs ShiftedPrice for all pairs where there is no missing values for ShiftedPrice column. You can assume that there will absolutely be no missing values in data column. So please do help me in this.
You can first remove NaNs rows by dropna:
df = df.dropna(subset=['ShiftedPrice'])
And then plot:
df.plot(x='date', y='ShiftedPrice')
All together:
df.dropna(subset=['ShiftedPrice']).plot(x='date', y='ShiftedPrice')

Remove interpolation Time series plot for missing values

I'm trying to plot a time series data but I have some problems.
I'm using this code:
from matplotlib import pyplot as plt
plt.figure('Fig')
plt.plot(data.index,data.Colum,'g', linewidth=2.0,label='Data')
And I get this:
But I dont want the interpolation between missing values!
How can I achieve this?
Since you are using pandas you could do something like this:
import pandas as pd
import matplotlib.pyplot as plt
pd.np.random.seed(1234)
idx = pd.date_range(end=datetime.today().date(), periods=10, freq='D')
vals = pd.Series(pd.np.random.randint(1, 10, size=idx.size), index=idx)
vals.iloc[4:8] = pd.np.nan
print vals
Here is an example of a column from a DataFrame with DatetimeIndex
2016-03-29 4.0
2016-03-30 7.0
2016-03-31 6.0
2016-04-01 5.0
2016-04-02 NaN
2016-04-03 NaN
2016-04-04 NaN
2016-04-05 NaN
2016-04-06 9.0
2016-04-07 1.0
Freq: D, dtype: float64
To plot it without dates where data is NaN you could do something like this:
fig, ax = plt.subplots()
ax.plot(range(vals.dropna().size), vals.dropna())
ax.set_xticklabels(vals.dropna().index.date.tolist());
fig.autofmt_xdate()
Which should produce a plot like this:
The trick here is to replace the dates with some range of values that do not trigger matplotlib's internal date processing when you call .plot method.
Later, when the plotting is done, replace the ticklabels with actual dates.
Optionally, call .autofmt_xdate() to make labels readable.

Categories

Resources