could not convert string to float: '12-31' - python

I need to plot 2 lines with minimum and maximum temperature per day
My dataframe looks like this:
Date min max min2015 max2015
0 01-01 -160 156 -133 11
1 01-02 -267 139 -122 39
2 01-03 -267 133 -67 39
3 01-04 -261 106 -88 44
I formatted the date column with day and month only, not year, because it holds the MIN and MAX temperatures across 2004-2014; that's why the year is not present.
So I tried to plot like this:
fig, ax = plt.subplots(figsize=(10, 6))
axb = ax.twinx()
# Same as above
ax.set_xlabel('Date')
ax.set_ylabel('Temp')
ax.set_title('Min and Max temperature 2004-2014')
ax.grid(True)
# Plotting on the first y-axis
ax.plot(new_df.Date, new_df['min'], color='tab:orange', label='Min')
ax.plot(new_df.Date, new_df['max'], color='tab:olive', label='Max')
But I get this:
ValueError: could not convert string to float: '12-31'

The 'plot' function cannot interpret these strings as numbers; you can use the 'plot_date' function instead.
plt.xlabel('Date')
plt.ylabel('Temp')
plt.title('Min and Max temperature 2004-2014')
plt.grid(True)
plt.plot_date(['01-01', '01-02', '01-03'], [13, 15, 12], color='tab:orange', label='Min')
plt.show()
Then you get a plot of these points.
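An alternative that avoids plot_date (which recent Matplotlib releases discourage in favour of plot with datetime values) is to turn the 'MM-DD' strings into real dates first. The following is a minimal sketch, not the original answer's method: the placeholder year 2015 is an assumption chosen only so the strings parse, and the frame holds just the first rows shown in the question.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# First rows of the dataframe from the question.
new_df = pd.DataFrame({
    "Date": ["01-01", "01-02", "01-03", "01-04"],
    "min": [-160, -267, -267, -261],
    "max": [156, 139, 133, 106],
})

# Prepend an arbitrary placeholder year (an assumption) so the strings
# can be parsed as dates; the year is hidden again by the formatter below.
dates = pd.to_datetime("2015-" + new_df["Date"], format="%Y-%m-%d")

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(dates, new_df["min"], color="tab:orange", label="Min")
ax.plot(dates, new_df["max"], color="tab:olive", label="Max")

# Show only month and day on the tick labels.
ax.xaxis.set_major_formatter(mdates.DateFormatter("%m-%d"))
ax.set_xlabel("Date")
ax.set_ylabel("Temp")
ax.set_title("Min and Max temperature 2004-2014")
ax.grid(True)
ax.legend()
# Call plt.show() to display the figure.
```

If the data contains '02-29', the placeholder year would need to be a leap year for the parse to succeed.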

Related

How to scatterplot 2 list of arrays with owns colors?

I have the following variables:
data = ['10 20 10 36 30 33',
'100 50 50 30 60 27 70 24',
'300 1000 80 21 90 18 100 15 110 12 120 9',
'30 90 130 6 140 3']
data = [e.split() for e in data]
time = [np.array((time[2::2]), dtype=int) for time in data]
concentration = [(np.array((concentration[3::2]), dtype=int)) for concentration in data]
I want to plot the variables time (x-value) and concentration (y-value) in a scatter plot, where each person's concentration has a different colour. The time and concentration variables are lists of arrays, where each array in the list represents a new person.
So I wrote the following code to plot my variables:
plt.scatter(time, concentration, color = 'plt.rainbow')
plt.title('Concentration pr. time')
plt.legend(loc='upper right')
plt.xlabel('Time')
plt.ylabel('Concentration')
plt.grid(True)
plt.show()
But this code does not work; I get the following error:
ValueError: 'color' kwarg must be an color or sequence of color specs. For a sequence of values to be color-mapped, use the 'c' argument instead.
How do I plot my arrays, each with its own colour?
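One possible approach, offered as a sketch rather than an accepted answer: the error arises because 'plt.rainbow' is a string, not a colour. Calling scatter once per person and sampling one colour per person from the "rainbow" colormap gives each array its own colour. The "Person N" legend labels are an assumption for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

data = ['10 20 10 36 30 33',
        '100 50 50 30 60 27 70 24',
        '300 1000 80 21 90 18 100 15 110 12 120 9',
        '30 90 130 6 140 3']
data = [e.split() for e in data]
time = [np.array(row[2::2], dtype=int) for row in data]
concentration = [np.array(row[3::2], dtype=int) for row in data]

# Sample one RGBA colour per person from the "rainbow" colormap.
colors = plt.get_cmap("rainbow")(np.linspace(0, 1, len(time)))

fig, ax = plt.subplots()
for i, (t, c) in enumerate(zip(time, concentration)):
    # One scatter call per person so each gets a single colour and legend entry.
    ax.scatter(t, c, color=colors[i], label=f"Person {i + 1}")

ax.set_title("Concentration pr. time")
ax.set_xlabel("Time")
ax.set_ylabel("Concentration")
ax.legend(loc="upper right")
ax.grid(True)
# Call plt.show() to display the figure.
```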

plot data with different scale on same y axis on subplots

I have a dataframe whose columns are on different scales, and I am trying to plot them with subplots, something like this:
raw_data = {'strike_date': ['2019-10-31', '2019-11-31','2019-12-31','2020-01-31', '2020-02-31'],
'strike': [100.00, 113.00, 125.00, 126.00, 135.00],
'lastPrice': [42, 32, 36, 18, 23],
'volume': [4, 24, 31, 2, 3],
'openInterest': [166, 0, 0, 62, 12]}
ploty_df = pd.DataFrame(raw_data, columns = ['strike_date', 'strike', 'lastPrice', 'volume', 'openInterest'])
ploty_df
strike_date strike lastPrice volume openInterest
0 2019-10-31 100.0 42 4 166
1 2019-11-31 113.0 32 24 0
2 2019-12-31 125.0 36 31 0
3 2020-01-31 126.0 18 2 62
4 2020-02-31 135.0 23 3 12
This is what I have tried so far with twinx. As you can see, the output is flat, without any scale difference between strike and volume.
fig, ax = plt.subplots()
fig.subplots_adjust(right=0.75)
mm = ax.twinx()
yy = ax.twinx()
for col in ploty_df.columns:
    mm.plot(ploty_df.index,ploty_df[[col]],label=col)
mm.set_ylabel('volume')
yy.set_ylabel('strike')
yy.spines["right"].set_position(("axes", 1.2))
yy.set_ylim(mm.get_ylim()[0]*12, mm.get_ylim()[1]*12)
plt.tick_params(axis='both', which='major', labelsize=16)
handles, labels = mm.get_legend_handles_labels()
mm.legend(fontsize=14, loc=6)
plt.show()
and the output
The main problem with your script is that you are generating three axes but only plotting on one of them. Think of each axes as a separate object with its own y-scale, y-limits and so on. For example, when you call fig, ax = plt.subplots() you generate the first axes, which you call ax (this is the standard y-axis with the scale on the left side of your plot). To plot something on this axes you should call ax.plot(), but in your case you are plotting everything on the axes that you called mm.
I think you should go through the matplotlib documentation to understand these concepts better. For plotting on multiple y-axes, I would recommend having a look at this example.
Below is a basic example that plots your data on three different y-axes. You can take it as a starting point to produce the graph you are looking for.
# convert the index of your dataframe to datetime
ploty_df.index = pd.DatetimeIndex(ploty_df.strike_date)
fig, ax = plt.subplots(figsize=(15, 7))
fig.subplots_adjust(right=0.75)
# first y-axis (left): strike
l1, = ax.plot(ploty_df['strike'], 'r')
ax.set_ylabel('Strike')
# second y-axis (right): lastPrice
ax2 = ax.twinx()
l2, = ax2.plot(ploty_df['lastPrice'], 'g')
ax2.set_ylabel('lastPrice')
# third y-axis (right, shifted outwards): volume
ax3 = ax.twinx()
l3, = ax3.plot(ploty_df['volume'], 'b')
ax3.set_ylabel('volume')
ax3.spines["right"].set_position(("axes", 1.2))
ax3.spines["right"].set_visible(True)
ax.legend((l1, l2, l3), ('Strike', 'lastPrice', 'volume'), loc='center left')
Here is the result:
P.S. Your example dataframe contains nonexistent dates (such as 31 February 2020), so you have to correct them before you can convert the index to datetime.
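To find which strike_date values are invalid before converting the index, one option is to parse them with errors="coerce", which turns unparseable dates into NaT. This is a small sketch using the strike_date values from the question; the variable names are illustrative.

```python
import pandas as pd

strike_dates = ['2019-10-31', '2019-11-31', '2019-12-31',
                '2020-01-31', '2020-02-31']

# Invalid calendar dates become NaT instead of raising an error.
parsed = pd.to_datetime(pd.Series(strike_dates),
                        format="%Y-%m-%d", errors="coerce")

# Collect the original strings that failed to parse.
invalid = [d for d, ok in zip(strike_dates, parsed.notna()) if not ok]
print(invalid)  # ['2019-11-31', '2020-02-31']
```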

Labels at the end of curves (matplotlib-seaborn) [duplicate]

This question already has answers here:
How to annotate end of lines using python and matplotlib?
(3 answers)
Closed 4 years ago.
I have multiple data frames in this format:
year count cum_sum
2001 5 5
2002 15 20
2003 14 34
2004 21 55
2005 44 99
2006 37 136
2007 55 191
2008 69 260
2009 133 393
2010 94 487
2011 133 620
2012 141 761
2013 206 967
2014 243 1210
2015 336 1546
2016 278 1824
2017 285 2109
2018 178 2287
I generated a plot of the cumulative sums with the following code:
fig, ax = plt.subplots(figsize=(12,8))
sns.pointplot(x="year", y="cum_sum", data=china_papers_by_year_sorted, color='red')
sns.pointplot(x="year", y="cum_sum", data=usa_papers_by_year_sorted, color='blue')
sns.pointplot(x="year", y="cum_sum", data=korea_papers_by_year_sorted, color='lightblue')
sns.pointplot(x="year", y="cum_sum", data=japan_papers_by_year_sorted, color='yellow')
sns.pointplot(x="year", y="cum_sum", data=brazil_papers_by_year_sorted, color='green')
ax.set_ylim([0,2000])
ax.set_ylabel("Cumulative frequency")
fig.text(x = 0.91, y = 0.76, s = "China", color = "red", weight = "bold") #Here I have had to indicate manually x and y coordinates
fig.text(x = 0.91, y = 0.72, s = "South Korea", color = "lightblue", weight = "bold") #Here I have had to indicate manually x and y coordinates
plt.show()
The problem is that the method for adding text to the plot does not recognize the data coordinates, so I have had to set the label coordinates for each dataframe manually (see "China" and "South Korea"). Is there a cleverer way of doing it? I have seen an example using the ".last_valid_index()" method, but since the data coordinates are not being recognized, it does not work.
You don't need to make repeated calls to pointplot and add labels manually. Instead, add a country column to each data frame, combine the data frames, and then plot cumulative sum vs. year using country as the hue:
# Add a country label to dataframe itself
china_papers_by_year_sorted['country'] = 'China'
usa_papers_by_year_sorted['country'] = 'USA'
korea_papers_by_year_sorted['country'] = 'Korea'
japan_papers_by_year_sorted['country'] = 'Japan'
brazil_papers_by_year_sorted['country'] = 'Brazil'
# List of dataframes with same columns
frames = [china_papers_by_year_sorted, usa_papers_by_year_sorted,
korea_papers_by_year_sorted, japan_papers_by_year_sorted,
brazil_papers_by_year_sorted]
# Combine into one dataframe
result = pd.concat(frames)
# Plot.. hue will make country name a label
ax = sns.pointplot(x="year", y="cum_sum", hue="country", data=result)
ax.set_ylim([0,2000])
ax.set_ylabel("Cumulative frequency")
plt.show()
Edit: if you want to annotate the lines themselves instead of using the legend, the answers to this existing question show how to annotate the end of lines.
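As a rough sketch of end-of-line annotation in data coordinates: plotting with plain Matplotlib (rather than pointplot, whose x-axis is categorical) lets ax.annotate place a label at the last data point of each line. The frame below holds the last rows of the table from the question; the name "Country A" is a placeholder, since the question does not say which country the table belongs to.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Last rows of the table from the question; the country name is a placeholder.
frames = {
    "Country A": pd.DataFrame({"year": [2015, 2016, 2017, 2018],
                               "cum_sum": [1546, 1824, 2109, 2287]}),
}

fig, ax = plt.subplots(figsize=(12, 8))
end_labels = []
for name, frame in frames.items():
    (line,) = ax.plot(frame["year"], frame["cum_sum"], marker="o")
    last = frame.iloc[-1]
    # Anchor the label at the last data point, shifted 5 points to the right.
    end_labels.append(
        ax.annotate(name,
                    xy=(last["year"], last["cum_sum"]),
                    xytext=(5, 0), textcoords="offset points",
                    color=line.get_color(), weight="bold", va="center")
    )

ax.set_ylabel("Cumulative frequency")
# Call plt.show() to display the figure.
```

Adding more entries to the frames dictionary would label each line in its own colour without any hand-tuned figure coordinates.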

Plotting graph with categorical axes

I have the following dataframe, and I am aiming to plot both the max and min data on the same graph, using Month_Day as the x-axis but only printing 'Jan', 'Feb', 'Mar', etc.
Month_Day max min
0 Jan-01 243 86
1 Jan-02 230 90
2 Jan-03 233 104
3 Jan-04 220 73
4 Jan-05 224 71
But once I include the dates, it raises an error.
dates = pd.date_range('1/1/2015','31/12/2015', freq='D')
plt.plot(tmax, '-r', tmin, '-b')
#plt.plot(dates, tmax, '-r', dates, tmin, '-b') <- this is the line i plot dates as axis
plt.fill_between(range(len(tmin)), tmin, tmax, facecolor='gray', alpha=0.25)
plt.grid(True)
gives the error:
error: ordinal must be >= 1
You could use xaxis.set_major_formatter().
Here's a simple example of this:
import datetime
import random
import matplotlib.pyplot as plt
import matplotlib.dates as mdate
# make up some data
x = [datetime.datetime.now() + datetime.timedelta(days=i) for i in range(180)]
y = [i+random.gauss(0,1) for i,_ in enumerate(x)]
p1 = plt.subplot(211)
p1.xaxis.set_major_formatter(mdate.DateFormatter('%b', None))
# plot
plt.plot(x,y)
# beautify the x-labels
plt.gcf().autofmt_xdate()
plt.show()
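Applied to the question's setup, a sketch might look like the following. One plausible cause of the error (an assumption, since the full traceback is not shown) is that fill_between received integer x values from range(len(tmin)) while the lines were plotted against dates; passing the same dates to every call keeps the axis consistent. The constant tmin and tmax arrays are placeholders for the real series.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

dates = pd.date_range("2015-01-01", "2015-12-31", freq="D")

# Placeholder values standing in for the real tmin/tmax series.
tmin = np.full(len(dates), 80.0)
tmax = np.full(len(dates), 230.0)

fig, ax = plt.subplots()
ax.plot(dates, tmax, "-r")
ax.plot(dates, tmin, "-b")
# Use the same date values for the shaded band as for the lines.
ax.fill_between(dates, tmin, tmax, facecolor="gray", alpha=0.25)

# One tick per month, labelled 'Jan', 'Feb', ...
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b"))
ax.grid(True)
# Call plt.show() to display the figure.
```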

Scatter plot with custom ticks

I want to do a scatter plot of a wavelength (float) in y-axis and spectral class (list of character/string) in x-axis, labels = ['B','A','F','G','K','M']. Data are saved in pandas dataframe, df.
df['Spec Type Index']
0 NaN
1 A
2 G
. .
. .
167 K
168 NaN
169 G
Then,
df['Disk Major Axis "']
0 4.30
1 4.50
2 22.00
. .
. .
167 1.32
168 0.28
169 25.00
Thus, I thought this should be done simply with
plt.scatter(df['Spec Type Index'], df['Disk Major Axis "'])
But I get this annoying error
could not convert string to float: 'G'
After fixing this, I want to make custom xticks as follows. However, how can I
labels = ['B','A','F','G','K','M']
ticks = np.arange(len(labels))
plt.xticks(ticks, labels)
First, I think you have to map those strings to integers so that matplotlib can decide where to place the points.
labels = ['B','A','F','G','K','M']
mapping = {'B': 0,'A': 1,'F': 2,'G': 3,'K': 4,'M': 5}
df = df.replace({'Spec Type Index': mapping})
Then plot the scatter,
fig, ax = plt.subplots()
ax.scatter(df['Spec Type Index'], df['Disk Major Axis "'])
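Finally, set the tick positions to match the mapped integers and label them,
ax.set_xticks(range(len(labels)))
ax.set_xticklabels(labels)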
