How to use matplotlib to plot line charts - python

I use pandas to read my CSV file and turn two columns into arrays that serve as the independent and dependent variables respectively.
(Screenshot: reading the data, converting the columns to arrays, and assigning the values.)
Then, when I use matplotlib.pyplot to plot the line chart, I get the error: 'numpy.ndarray' object has no attribute 'find'.
import numpy as np
import matplotlib.pyplot as plt
plt.plot(x,y)

The problem is probably with your dtypes. Assuming your data are in df, check df.dtypes. The columns you are trying to plot must be numeric (float, int, bool).
I guess that at least one of the columns you are plotting has object dtype. Try to find out why (maybe missing values were read as some sort of string, or everything is just treated as a string) and convert it to the correct type with astype, e.g.
df['float_col'] = df['float_col'].astype(np.float64)
Edit:
If you are trying to plot dates, make sure that the dtype is actually a date type, i.e. datetime64[ns], and use matplotlib's dedicated plot_date method.
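To make the dtype check above concrete, here is a minimal sketch. The CSV text, the column names x and y, and the placeholder string "missing" are made up for illustration; they are not from the question. It shows one way to find which entries force a column to be read as text before converting it.

```python
import io
import pandas as pd

# Hypothetical data: one non-numeric placeholder in the y column.
csv_text = "x,y\n1,2.5\n2,missing\n3,4.0\n"
df = pd.read_csv(io.StringIO(csv_text))

# Because of the placeholder, y is read as text rather than as floats.
was_numeric = pd.api.types.is_numeric_dtype(df["y"])

# Coerce to numbers; anything that cannot be parsed becomes NaN.
numeric_y = pd.to_numeric(df["y"], errors="coerce")

# Entries that were present in the file but failed to parse are the culprits.
offenders = df.loc[numeric_y.isna() & df["y"].notna(), "y"]

# Replace the text column with the numeric one so it can be plotted.
df["y"] = numeric_y
```

Once the offending values are known, you can decide whether to coerce them to NaN as above or to clean the file itself and then use astype.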

Related

Why does not Seaborn Relplot print datetime value on x-axis?

I'm trying to solve a Kaggle Competition to get deeper into data science knowledge. I'm dealing with an issue with seaborn library. I'm trying to plot a distribution of a feature along the date but the relplot function is not able to print the datetime value. On the output, I see a big black box instead of values.
Here is my plotting code:
rainfall_types = list(auser.loc[:,1:])
grid = sns.relplot(x='Date', y=rainfall_types[0], kind="line", data=auser);
grid.fig.autofmt_xdate()
Here are the seaborn.relplot output and the head of my dataset.
I found the error. In practice, when you use pandas.read_csv(dataset) and the dataset contains a datetime column, pandas stores that column with object dtype and Python sees each value as a str (string). So when you plot them, matplotlib cannot display them correctly.
To avoid this behaviour, convert the values to datetime objects when reading the file:
df = pandas.read_csv(dataset, parse_dates=['Column_Date'])
This tells pandas that the column with the key 'Column_Date' holds dates and must be converted to datetime objects.
If you want, you can also use the date column as the index of your dataframe, to speed up analysis along time. To do so, add the argument index_col='Column_Date' to your read_csv call.
I hope you will find it helpful.
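A small sketch of the difference, using made-up CSV text and a hypothetical Rainfall column (only the name Column_Date comes from the answer above):

```python
import io
import pandas as pd

# Hypothetical data standing in for the real dataset.
csv_text = (
    "Column_Date,Rainfall\n"
    "2020-01-01,0.0\n"
    "2020-01-02,3.2\n"
    "2020-01-03,1.1\n"
)

# Without parse_dates the date column stays as text.
raw = pd.read_csv(io.StringIO(csv_text))

# With parse_dates and index_col the dates become a DatetimeIndex.
parsed = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["Column_Date"],
    index_col="Column_Date",
)

# A datetime index allows label-based slicing by date strings.
last_two_days = parsed.loc["2020-01-02":"2020-01-03"]
```

With the parsed frame, plotting functions receive real datetime values for the x-axis instead of one text label per row.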

ValueError: Could not interpret input 'index' when using index with seaborn lineplot

I want to use the index of a pandas DataFrame as the x values for a seaborn plot. However, this raises a ValueError.
A small test example:
import pandas as pd
import seaborn as sns
sns.lineplot(x='index',y='test',hue='test2',data=pd.DataFrame({'test':range(9),'test2':range(9)}))
It raises:
ValueError: Could not interpret input 'index'
Is it not possible to use the index as x values? What am I doing wrong?
Python 2.7, seaborn 0.9
I would prefer to do it this way. You need to remove hue, as I assume it serves a different purpose that doesn't apply to your current DataFrame because you have a single line. Visit the official docs for more info.
df=pd.DataFrame({'test':range(9),'test2':range(9)})
sns.lineplot(x=df.index, y='test', data=df)
You would need to make sure the string you provide to the x argument is actually a column in your dataframe. The easiest solution to achieve that is to reset the index of the dataframe to convert the index to a column.
sns.lineplot(x='index', y='test', data=pd.DataFrame({'test':range(9),'test2':range(9)}).reset_index())
I know it's an old question, and maybe this wasn't around back then, but there's a much simpler way to achieve this:
If you just pass a series from a dataframe as the 'data' parameter, seaborn will automatically use the index as the x values.
sns.lineplot(data=df.column1)
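As a sketch of what reset_index does to the column names in the reset_index approach above: the index name 'step' is a hypothetical choice, and the seaborn calls are left as comments so the snippet only needs pandas.

```python
import pandas as pd

df = pd.DataFrame({"test": range(9), "test2": range(9)})

# Unnamed index: reset_index() adds a column literally called 'index'.
flat = df.reset_index()

# Named index: the new column takes the index name instead of 'index'.
named = df.rename_axis("step").reset_index()

# Either frame can then be passed to seaborn with a column-name string, e.g.
#   sns.lineplot(x="index", y="test", data=flat)
#   sns.lineplot(x="step", y="test", data=named)
```

So if your index has a name, pass that name as x rather than 'index'.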

Python matplotlib.dates.date2num: converting numpy array to matplotlib datetimes

I am trying to plot a custom chart with datetime axis. My understanding is that matplotlib requires a float format which is days since epoch. So, I want to convert a numpy array to the float epoch as required by matplotlib.
The datetime values are stored in a numpy array called t:
In [235]: t
Out[235]: array(['2008-12-01T00:00:59.000000000-0800',
'2008-12-01T00:00:59.000000000-0800',
'2008-12-01T00:00:59.000000000-0800',
'2008-12-01T00:09:26.000000000-0800',
'2008-12-01T00:09:41.000000000-0800'], dtype='datetime64[ns]')
Apparently, matplotlib.dates.date2num only accepts a sequence of python datetimes as input (not numpy datetimes arrays):
import matplotlib.dates as dates
plt_dates = dates.date2num(t)
raises AttributeError: 'numpy.datetime64' object has no attribute 'toordinal'
How should I resolve this issue? I hope to have a solution that works for all types of numpy.datetime like object.
My best workaround (which I am not sure to be correct) is not to use date2num at all. Instead, I try to use the following:
z = np.array([0]).astype(t.dtype)
plt_dates = (t - z)/ np.timedelta64(1,'D')
Even if this solution is correct, it would be nicer to use library functions instead of manual ad hoc workarounds.
For a quick fix (if t is a pandas DatetimeIndex, which provides to_pydatetime; a plain numpy array does not), use:
import matplotlib.dates as dates
plt_dates = dates.date2num(t.to_pydatetime())
or:
import matplotlib.dates as dates
plt_dates = dates.date2num(list(t))
It seems the latest (matplotlib.__version__ '2.1.0') does not like numpy arrays... Edit: In my case, after checking the source code, the problem seems to be that the latest matplotlib.cbook cannot create an iterable from the numpy array and thinks the array is a number.
For similar but slightly more complex problems, check http://stackoverflow.com/questions/13703720/converting-between-datetime-timestamp-and-datetime64, possibly "Why do I get "python int too large to convert to C long" errors when I use matplotlib's DateFormatter to format dates on the x axis?", and maybe "matplotlib plot_date AttributeError: 'numpy.datetime64' object has no attribute 'toordinal'" (if someone answers).
Edit: someone answered, and their code using to_pydatetime() seems best. See also "pandas 0.21.0 Timestamp compatibility issue with matplotlib", though that did not work in my case (possibly because of Python 2).
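For comparison, here is a sketch of three routes side by side. It assumes a reasonably recent matplotlib (one where date2num accepts datetime64 arrays directly and matplotlib.dates.get_epoch exists, which I believe means 3.3 or later), and it drops the timezone offsets from the sample values in the question.

```python
import numpy as np
import matplotlib.dates as mdates

t = np.array(
    ["2008-12-01T00:00:59", "2008-12-01T00:09:26", "2008-12-01T00:09:41"],
    dtype="datetime64[ns]",
)

# Route 1: recent matplotlib versions take the datetime64 array as is.
direct = mdates.date2num(t)

# Route 2: convert to Python datetimes first. Cast to microseconds before
# tolist(), because nanosecond values would come back as plain integers.
py_datetimes = t.astype("datetime64[us]").tolist()
via_python = mdates.date2num(py_datetimes)

# Route 3: the manual workaround from the question, measured from
# matplotlib's configured epoch rather than a hard-coded zero.
epoch = np.datetime64(mdates.get_epoch())
manual = (t - epoch) / np.timedelta64(1, "D")
```

All three should agree to within floating-point precision. On older matplotlib versions only route 2 is likely to work.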

Converting pandas dataframe to numeric; seaborn can't plot

I'm trying to create some charts using weather data, pandas, and seaborn. I'm having trouble using lmplot (or any other seaborn plot function for that matter), though. I'm being told it can't concatenate str and float objects, but I used convert_objects(convert_numeric=True) beforehand, so I'm not sure what the issue is, and when I just print the dataframe I don't see anything wrong, per se.
import numpy as np
import pandas as pd
import seaborn as sns
new.convert_objects(convert_numeric=True)
sns.lmplot("AvgSpeed", "Max5Speed", new)
Some of the examples of unwanted placeholder characters that I saw in the few non-numeric spaces just glancing through the dataset were "M", " ", "-", "null", and some other random strings. Would any of these cause a problem for convert_objects? Does seaborn know to ignore NaN? I don't know what's wrong. Thanks for the help.
You need to assign the result back to the variable, because convert_objects returns a new DataFrame:
new = new.convert_objects(convert_numeric=True)
See the docs
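convert_objects is deprecated as of version 0.21.0; use to_numeric instead. Passing errors='coerce' turns values that cannot be parsed, such as the placeholder strings in the question, into NaN:
new['AvgSpeed'] = pd.to_numeric(new['AvgSpeed'], errors='coerce')
If you have multiple columns:
new = new.apply(pd.to_numeric, errors='coerce')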
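A minimal sketch of the multi-column case with placeholder strings like the ones mentioned in the question. The values are invented; only the column names AvgSpeed and Max5Speed and the placeholders "M", "-" and "null" come from the question.

```python
import pandas as pd

# Hypothetical weather data with non-numeric placeholders mixed in.
new = pd.DataFrame({
    "AvgSpeed": ["5.1", "M", "6.3", "7.0", "8.2"],
    "Max5Speed": ["12", "14", "-", "null", "17"],
})

# Convert every column; unparseable entries become NaN instead of raising.
cleaned = new.apply(pd.to_numeric, errors="coerce")

# Optionally keep only the rows where both measurements are present.
complete_rows = cleaned.dropna()
```

Dropping incomplete rows yourself before plotting avoids depending on how a particular plotting function treats NaN.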

Matplotlib/Genfromtxt: Multiple plots against time, skipping missing data points, from .csv

I've been able to import and plot multiple columns of data against the same x axis (time) with legends, from csv files using genfromtxt as shown in this link:
Matplotlib: Import and plot multiple time series with legends direct from .csv
The above simple example works fine if all cells in the csv file contain data. However some of my cells have missing data, and some of the parameters (columns) only include data points every e.g. second or third time increment.
I want to plot all the parameters on the same time axis as previously; and if one or more data points in a column are missing, I want the plot function to skip the missing data points for that parameter and only draw lines between the points that are available for that parameter.
Further, I'm trying to find a generic solution which will automatically plot in the above style directly from the csv file for any number of columns, time points, missing data points etc., when these are not known in advance.
I've tried using the genfromtxt options missing_values and filling_values, as shown in my non-working example below; however I want to skip the missing data points rather than assign them the value '0'; and in any case with this approach I seem to get "ValueError: could not convert string to float" when missing data points are encountered.
Plotting multiple parameters against time on the same plot, whilst dealing with occasional or regularly skipped values must be a pretty common problem for the scientific community.
I'd be very grateful for any suggestions for an elegant solution using genfromtxt.
Non-working code and demo data below. Many thanks in anticipation.
Demo data ('DemoData.csv'):
Time,Parameter_1,Parameter_2,Parameter_3
0,10,12,11
1,20,,
2,25,23,
3,30,,30
import numpy as np
import matplotlib.pyplot as plt
arr = np.genfromtxt('DemoData.csv', delimiter=',', dtype=None, missing_values='', filling_values = 0)
names = (arr[0])
for n in range(1, len(names)):
    plt.plot(arr[1:,0], arr[1:,n], label=names[n])
plt.legend()
plt.show()
I think that if you set usemask=True in your genfromtxt call, it will do what you want. You probably don't want filling_values set either:
arr = np.genfromtxt('DemoData.csv', delimiter=',', dtype=None, missing_values='', usemask=True)
You can then plot using something like this:
for n in range(1, len(names)):
    plt.plot(arr[1:,0][np.logical_not(arr[1:,n].mask)], arr[1:,n].compressed())
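An alternative sketch that stays with genfromtxt but swaps the masked array for a different technique: reading the header with names=True and filtering out NaN values. It relies on genfromtxt's default float dtype turning empty fields into NaN. The demo data from the question is embedded as a string so the snippet runs on its own.

```python
import io
import numpy as np

csv_text = """Time,Parameter_1,Parameter_2,Parameter_3
0,10,12,11
1,20,,
2,25,23,
3,30,,30
"""

# names=True reads the header row as field names; empty cells become NaN.
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", names=True)

# First field is the time axis; every other field is a parameter to plot.
time_col, *param_cols = data.dtype.names

# For each parameter keep only the rows where a value is present.
series = {}
for name in param_cols:
    present = ~np.isnan(data[name])
    series[name] = (data[time_col][present], data[name][present])
```

Each (x, y) pair in series can then be passed to plt.plot(x, y, label=name), so lines are drawn only between the points that exist for that parameter, for any number of columns.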
