Making separate plots with unique identifiers in Python using CSV file - python

I have a CSV file where one column has a unique identifier (a,b,c...) and I would like to make separate plots based on this identifier (so a separate line on the same graph for a,b and so forth).
SSID Time RSSI
0 a 13:14:42 -33
1 a 13:14:46 -30
2 a 13:14:49 -31
3 a 13:14:52 -31
4 a 13:14:55 -35
.. ... ... ...
64 b 13:15:43 -58
65 b 13:15:46 -56
66 b 13:15:50 -65
67 b 13:15:53 -52
68 b 13:15:57 -65
What I've written plots every point together in one line, but how can I plot them on the same graph, but have them separated based on the identifier?
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
temp = np.genfromtxt('file.csv', delimiter=',')
plt.figure()
plt.plot(temp)
plt.show()
Thank you!

Reshape the data so that each SSID becomes its own column, then call the pandas plot() method:
import io
import pandas as pd
df = pd.read_csv(io.StringIO(""" SSID Time RSSI
0 a 13:14:42 -33
1 a 13:14:46 -30
2 a 13:14:49 -31
3 a 13:14:52 -31
4 a 13:14:55 -35
64 b 13:15:43 -58
65 b 13:15:46 -56
66 b 13:15:50 -65
67 b 13:15:53 -52
68 b 13:15:57 -65"""), sep=r"\s+")
import matplotlib.pyplot as plt
fig, ax = plt.subplots(1, figsize=[10,6])
df.set_index(["SSID","Time"]).unstack(0).droplevel(0,1).plot(ax=ax)
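An alternative that avoids reshaping is to loop over the groups and draw one line per identifier. The following is only a sketch: it assumes a comma-separated file with the columns SSID, Time and RSSI (the sample rows are copied from the question), and the helper name plot_by_ssid is made up for illustration.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# A few rows copied from the question; a real script would read the CSV file.
# ASSUMPTION: the file is comma-separated and has no extra index column.
SAMPLE = """SSID,Time,RSSI
a,13:14:42,-33
a,13:14:46,-30
a,13:14:49,-31
b,13:15:43,-58
b,13:15:46,-56
b,13:15:50,-65
"""


def plot_by_ssid(df):
    """Draw one RSSI-versus-time line per SSID on a single Axes."""
    fig, ax = plt.subplots(figsize=(10, 6))
    for ssid, group in df.groupby("SSID"):
        ax.plot(group["Time"], group["RSSI"], marker="o", label=ssid)
    ax.set_xlabel("Time")
    ax.set_ylabel("RSSI")
    ax.legend(title="SSID")
    fig.autofmt_xdate()
    return fig, ax


df = pd.read_csv(io.StringIO(SAMPLE))
# Parse the clock times so the x axis is a real time axis, not text labels.
df["Time"] = pd.to_datetime(df["Time"], format="%H:%M:%S")
fig, ax = plot_by_ssid(df)
```

Call plt.show() afterwards to display the figure, or fig.savefig(...) to write it to a file.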

Related

Seaborn figure with multiple axis (year) and month on x-axis

I am trying to get familiar with seaborn. I want to create one or both of these figures (a bar plot and a line plot). The x-axis shows the 12 months, and each of the 3 years has its own line or bar color.
This is the script that creates the data; the data itself is shown in the comments.
#!/usr/bin/env python3
import random as rd
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
rd.seed(0)
a = pd.DataFrame({
'Y': [2016]*12 + [2017]*12 + [2018]*12,
'M': list(range(1, 13)) * 3,
'n': rd.choices(range(100), k=36)
})
print(a)
# Y M n
# 0 2016 1 84
# 1 2016 2 75
# 2 2016 3 42
# ...
# 21 2017 10 72
# 22 2017 11 89
# 23 2017 12 68
# 24 2018 1 47
# 25 2018 2 10
# ...
# 34 2018 11 54
# 35 2018 12 1
b = a.pivot_table(columns='M', index='Y')
print(b)
# n
# M 1 2 3 4 5 6 7 8 9 10 11 12
# Y
# 2016 84 75 42 25 51 40 78 30 47 58 90 50
# 2017 28 75 61 25 90 98 81 90 31 72 89 68
# 2018 47 10 43 61 91 96 47 86 26 80 54 1
I'm not even sure which form of the dataframe (a, b, or something else) I should use here.
What I tried
I assume that in seaborn terms it is a countplot() that I want. Maybe I am wrong?
>>> sns.countplot(data=a)
<AxesSubplot:ylabel='count'>
>>> plt.show()
The result is meaningless.
I don't know how I could add the pivoted dataframe b to seaborn.
You could do the first plot with a relplot, using hue as a categorical grouping variable:
sns.relplot(data=a, x='M', y='n', hue='Y', kind='line')
I'd use these colour and size settings to make it more similar to the plot you wanted:
sns.relplot(data=a, x='M', y='n', hue='Y', kind='line', palette='pastel', height=3, aspect=3)
The equivalent axes-level code would be sns.lineplot(data=a, x='M', y='n', hue='Y', palette='pastel')
Your second plot can be done with catplot:
sns.catplot(kind='bar', data=a, x='M', y='n', hue='Y')
Or the axes-level function sns.barplot. In that case let's move the default legend location:
sns.barplot(data=a, x='M', y='n', hue='Y')
plt.legend(bbox_to_anchor=(1.05, 1))

Dates from CSV files: how can I graph them?

I am new to Python.
I am trying to plot two variables, Y1 and Y2 (Y2 on a secondary y-axis), against the date on the x-axis, using data from a CSV file.
I think my main problem is converting the dates in the CSV.
Also, is it possible to save the three graphs separately, one per ID (A, B, C)?
I have added the CSV data that I have and an image of the figure that I am looking for.
Thanks a lot for your advice.
ID date Y1 Y2
A 40480 136 83
A 41234 173 23
A 41395 180 29
A 41458 124 60
A 41861 158 27
A 42441 152 26
A 43009 155 51
A 43198 154 38
B 40409 185 71
B 40612 156 36
B 40628 165 39
B 40989 139 77
B 41346 138 20
B 41558 132 85
B 41872 157 58
B 41992 120 91
B 42245 139 43
B 42397 131 34
B 42745 114 68
C 40711 110 68
C 40837 156 38
C 40946 110 63
C 41186 161 46
C 41243 187 20
C 41494 122 55
C 41970 103 19
C 42183 148 78
C 42247 115 33
C 42435 132 92
C 42720 187 43
C 43228 127 28
C 43426 183 45
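Try the matplotlib library; if I understood you correctly, a 3D plot along these lines should work.
import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d  # registers the '3d' projection on older matplotlib
# In a Jupyter notebook, also run: %matplotlib inline
# Assumption: the data sits in a tab-separated file named data.dat
data = pd.read_csv('data.dat', sep='\t')
date, y1, y2 = data['date'], data['Y1'], data['Y2']
fig = plt.figure()
ax = plt.axes(projection='3d')
# Data for a three-dimensional line
zaxis = y1
xaxis = date
yaxis = y2
ax.plot3D(xaxis, yaxis, zaxis, 'red')
# Data for three-dimensional scattered points
zdat = y1
xdat = date
ydat = y2
ax.scatter3D(xdat, ydat, zdat, c=xdat, cmap='Greens')
plt.show()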
If I understand you correctly, you are looking for three separate graphs for ID=A, ID=B, ID=C. Here is how you could get that:
import pandas as pd
import pylab as plt
data = pd.read_csv('data.dat', sep='\t') # read your datafile, you might have a different name here
for i, (label, subset) in enumerate(data.groupby('ID')):
    plt.subplot(131+i)
    plt.plot(subset['date'], subset['Y1'])
    plt.plot(subset['date'], subset['Y2'], 'o')
    plt.title('ID: {}'.format(label))
plt.show()
Note that this treats your dates as integers (same as in the datafile).
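If the integers in the date column are Excel serial day numbers (an assumption; the values in the question are consistent with dates between 2010 and 2018), they can be converted to real dates, and Y2 can go on a secondary y axis created with twinx(). The sketch below uses three rows per ID copied from the question; the helper name plot_one_id is made up for illustration.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# Three rows per ID copied from the question (whitespace-separated here).
SAMPLE = """ID date Y1 Y2
A 40480 136 83
A 41234 173 23
A 41395 180 29
B 40409 185 71
B 40612 156 36
B 40628 165 39
C 40711 110 68
C 40837 156 38
C 40946 110 63
"""

data = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+")
# ASSUMPTION: the integers are Excel serial day numbers (day 0 = 1899-12-30).
data["date"] = pd.to_datetime(data["date"], unit="D", origin="1899-12-30")


def plot_one_id(label, subset):
    """Plot Y1 on the left y axis and Y2 on a secondary (right) y axis."""
    fig, ax1 = plt.subplots()
    ax2 = ax1.twinx()  # second y axis sharing the same x axis
    ax1.plot(subset["date"], subset["Y1"], color="tab:blue")
    ax2.plot(subset["date"], subset["Y2"], "o", color="tab:red")
    ax1.set_ylabel("Y1")
    ax2.set_ylabel("Y2")
    ax1.set_title("ID: {}".format(label))
    fig.autofmt_xdate()
    return fig


# One separate figure per ID.
figures = {label: plot_one_id(label, subset)
           for label, subset in data.groupby("ID")}
```

Each figure can then be written to its own file, for example with figures["A"].savefig("plot_A.png"), where the file name is only a placeholder.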

How to create contours over points with Basemap?

I have a table "tempcc" of values with x, y geographic coordinates (I don't know how to attach files here; my CSV has 86 rows):
X Y Temp
0 35.268 55.618 1.065389
1 35.230 55.682 1.119160
2 35.508 55.690 1.026214
3 35.482 55.652 1.007834
4 35.289 55.664 1.087598
5 35.239 55.655 1.099459
6 35.345 55.662 1.066117
7 35.402 55.649 1.035958
8 35.506 55.643 0.991939
9 35.526 55.688 1.018137
10 35.541 55.695 1.017870
11 35.471 55.682 1.033929
12 35.573 55.668 0.985559
13 35.547 55.651 0.982335
14 35.425 55.671 1.042975
15 35.505 55.675 1.016236
16 35.600 55.681 0.985532
17 35.458 55.717 1.063691
18 35.538 55.720 1.037523
19 35.230 55.726 1.146047
20 35.606 55.707 1.003364
21 35.582 55.700 1.006711
22 35.350 55.696 1.087173
23 35.309 55.677 1.088988
24 35.563 55.687 1.003785
25 35.510 55.764 1.079220
26 35.334 55.736 1.119026
27 35.429 55.745 1.093300
28 35.366 55.752 1.119061
29 35.501 55.745 1.068676
.. ... ... ...
56 35.472 55.800 1.117183
57 35.538 55.855 1.134721
58 35.507 55.834 1.129712
59 35.256 55.845 1.211969
60 35.338 55.823 1.174397
61 35.404 55.835 1.162387
62 35.460 55.826 1.138965
63 35.497 55.831 1.130774
64 35.469 55.844 1.148516
65 35.371 55.510 0.945187
66 35.378 55.545 0.969400
67 35.456 55.502 0.902285
68 35.429 55.517 0.925932
69 35.367 55.710 1.090652
70 35.431 55.490 0.903296
71 35.284 55.606 1.051335
72 35.234 55.634 1.088135
73 35.284 55.591 1.041181
74 35.354 55.587 1.010446
75 35.332 55.581 1.015004
76 35.356 55.606 1.023234
77 35.311 55.545 0.997468
78 35.307 55.575 1.020845
79 35.363 55.645 1.047831
80 35.401 55.628 1.021373
81 35.340 55.629 1.045491
82 35.440 55.643 1.017227
83 35.293 55.630 1.063910
84 35.370 55.623 1.029797
85 35.238 55.601 1.065699
I try to create isolines with:
from numpy import meshgrid,linspace
data=tempcc
m = Basemap(lat_0 = np.mean(tempcc['Y'].values),\
lon_0 = np.mean(tempcc['X'].values),\
llcrnrlon=35,llcrnrlat=55.3, \
urcrnrlon=35.9, urcrnrlat=56.0, resolution='l')
x = linspace(m.llcrnrlon, m.urcrnrlon, data.shape[1])
y = linspace(m.llcrnrlat, m.urcrnrlat, data.shape[0])
xx, yy = meshgrid(x, y)
m.contour(xx, yy, data,latlon=True)
#pt.legend()
m.scatter(tempcc['X'].values, tempcc['Y'].values, latlon=True)
#m.contour(x,y,data,latlon=True)
But I can't get it to work correctly, although everything seems to be fine. As far as I understand, I have to make a 2D matrix of values, where i is the latitude and j is the longitude, but I can't find an example.
The result I get:
As you can see, the region is correct, but the interpolation is not good.
What is wrong? Which parameter have I forgotten?
You could use a Triangulation and then call tricontour() instead of contour():
import matplotlib.pyplot as plt
from matplotlib.tri import Triangulation
from mpl_toolkits.basemap import Basemap
import numpy as np
# tempcc is the DataFrame from the question, with columns X, Y and Temp
m = Basemap(lat_0=np.mean(tempcc['Y'].values),
            lon_0=np.mean(tempcc['X'].values),
            llcrnrlon=35, llcrnrlat=55.3,
            urcrnrlon=35.9, urcrnrlat=56.0, resolution='l')
# Triangulate the scattered points, then draw 7 evenly spaced contour levels
triMesh = Triangulation(tempcc['X'].values, tempcc['Y'].values)
tctr = m.tricontour(triMesh, tempcc['Temp'].values,
                    levels=np.linspace(min(tempcc['Temp'].values),
                                       max(tempcc['Temp'].values), 7),
                    latlon=True)
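If a regular 2D matrix of values is really wanted, as the question suggests, another option is to interpolate the scattered points onto a grid first. The following is a rough sketch using scipy.interpolate.griddata and plain matplotlib instead of Basemap; it uses only the first ten rows of the table, and the 50 x 50 grid resolution is an arbitrary choice.

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy.interpolate import griddata

# The first ten rows of the table in the question: (X, Y) and Temp.
points = np.array([
    [35.268, 55.618], [35.230, 55.682], [35.508, 55.690], [35.482, 55.652],
    [35.289, 55.664], [35.239, 55.655], [35.345, 55.662], [35.402, 55.649],
    [35.506, 55.643], [35.526, 55.688],
])
temp = np.array([1.065389, 1.119160, 1.026214, 1.007834, 1.087598,
                 1.099459, 1.066117, 1.035958, 0.991939, 1.018137])

# Regular grid spanning the data; 50 x 50 is an arbitrary resolution.
lon = np.linspace(points[:, 0].min(), points[:, 0].max(), 50)
lat = np.linspace(points[:, 1].min(), points[:, 1].max(), 50)
lon2d, lat2d = np.meshgrid(lon, lat)

# Linear interpolation; cells outside the convex hull of the points become NaN.
temp2d = griddata(points, temp, (lon2d, lat2d), method="linear")

fig, ax = plt.subplots()
contours = ax.contour(lon2d, lat2d, temp2d, levels=7)
ax.scatter(points[:, 0], points[:, 1], c="k", s=10)
```

The gridded arrays lon2d, lat2d and temp2d have the shapes that a contour() call expects, so they should also be usable with the Basemap contour call from the question in place of the raw DataFrame.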

Unable to index x-axis of bokeh line chart with timestamps

I have been trying to make a bokeh line chart, however I am running into the issue of indexing the x-axis with a column of time stamps from my pandas data frame. Currently my data frame looks like this:
TMAX TMIN TAVG DAY NUM
2007-04-30 65 46 55.5 2007-04-30 1
2007-05-01 75 45 60.0 2007-05-01 2
2007-05-02 66 52 59.0 2007-05-02 3
2007-05-03 65 43 54.0 2007-05-03 4
2007-05-04 61 45 53.0 2007-05-04 5
2007-05-05 65 43 54.0 2007-05-05 6
2007-05-06 77 51 64.0 2007-05-06 7
2007-05-07 89 66 77.5 2007-05-07 8
2007-05-08 91 56 73.5 2007-05-08 9
2007-05-09 83 48 65.5 2007-05-09 10
2007-05-10 68 47 57.5 2007-05-10 11
2007-05-11 65 46 55.5 2007-05-11 12
2007-05-12 63 43 53.0 2007-05-12 13
2007-05-13 65 46 55.5 2007-05-13 14
2007-05-14 71 46 58.5 2007-05-14 15
....
[3592 rows x 5 columns]
I want to index the line plot with the values of the "DAY" column, however, I get an error no matter the approach I take. The documentation for line plots says that "x (str or list(str), optional) – specifies variable(s) to use for x axis". My code is as follows:
xyvalues = np.array([df['TAVG'], df_reg['ry'], df['DAY']])
regr = Line(data=xyvalues, x='DAY', title="Linear Regression of Data", ylabel="Average Daily Temperature", xlabel="Number of Days")
output_file("regression.html")
show(regr)
This gives me the error "TypeError: Cannot compare type 'Timestamp' with type 'float64'". I have tried converting it to float, but it doesn't seem to have an effect. Any help would be much appreciated. The df_reg['ry'] is data from a linear regression data frame.
Documentation for line graphs can be found here: http://docs.bokeh.org/en/latest/docs/reference/charts.html#line
Inside Line, you need to pass a pandas data frame to the data argument in order to be able to refer to your variable DAY for the x axis ticks. Here I create a new pandas DataFrame from the other two:
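import pandas as pd
from bokeh.charts import Line, output_file, show  # legacy bokeh.charts API, as used in the question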
df2 = pd.DataFrame(data=dict(TAVG=df['TAVG'], ry=df_reg['ry'], DAY=df['DAY']))
regr = Line(data=df2, x='DAY',
title="Linear Regression of Data",
ylabel="Average Daily Temperature",
xlabel="Number of Days")
output_file("regression.html")
show(regr)
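As far as I know, the bokeh.charts module used above is no longer part of current Bokeh releases. A roughly equivalent plot with the bokeh.plotting interface might look like the sketch below; it uses only the first five rows of the table in the question, and the axis labels are illustrative.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# First five rows of the table in the question.
df = pd.DataFrame({
    "DAY": pd.to_datetime(["2007-04-30", "2007-05-01", "2007-05-02",
                           "2007-05-03", "2007-05-04"]),
    "TAVG": [55.5, 60.0, 59.0, 54.0, 53.0],
})

# Wrapping the DataFrame lets glyphs refer to its columns by name.
source = ColumnDataSource(df)

# x_axis_type="datetime" makes Bokeh format the x ticks as dates.
p = figure(x_axis_type="datetime", title="Average Daily Temperature",
           x_axis_label="Day", y_axis_label="Average Daily Temperature")
p.line(x="DAY", y="TAVG", source=source)
```

To write and open the HTML file as in the question, import output_file and show from bokeh.plotting and call output_file("regression.html") followed by show(p).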

Adjusting y-lim Scale in the Plot (matplotlib, pandas) to Achieve Same Scale for Both Plots

I have a dataframe which looks something like this:
AgeGroups Factor Cancer Frequency
0 00-05 B Yes 223
1 00-05 A No 108
2 00-05 A Yes 0
3 00-05 B No 6575
4 11-15 B Yes 143
5 11-15 A No 5
6 11-15 A Yes 1
7 11-15 B No 3669
8 16-20 B Yes 395
9 16-20 A No 28
10 16-20 A Yes 1
11 16-20 B No 6174
12 21-25 B Yes 624
13 21-25 A No 80
14 21-25 A Yes 2
15 21-25 B No 8173
16 26-30 B Yes 968
17 26-30 A No 110
18 26-30 A Yes 2
19 26-30 B No 9143
20 31-35 B Yes 1225
21 31-35 A No 171
22 31-35 A Yes 5
23 31-35 B No 9046
24 36-40 B Yes 1475
25 36-40 A No 338
26 36-40 A Yes 21
27 36-40 B No 8883
28 41-45 B Yes 2533
29 41-45 A No 782
.. ... ... ... ...
54 71-75 A Yes 2441
55 71-75 B No 15992
56 76-80 B Yes 4614
57 76-80 A No 5634
58 76-80 A Yes 1525
59 76-80 B No 10531
60 81-85 B Yes 1869
61 81-85 A No 2893
62 81-85 A Yes 702
63 81-85 B No 5692
64 86-90 B Yes 699
65 86-90 A No 1398
66 86-90 A Yes 239
67 86-90 B No 3081
68 91-95 B Yes 157
69 91-95 A No 350
70 91-95 A Yes 47
71 91-95 B No 1107
72 96-100 B Yes 31
73 96-100 A No 35
74 96-100 A Yes 2
75 96-100 B No 230
76 >100 B Yes 5
77 >100 A No 1
78 >100 A Yes 1
79 >100 B No 30
80 06-10 B Yes 112
81 06-10 A No 6
82 06-10 A Yes 0
83 06-10 B No 2191
with the code:
by_factor = counts.groupby(level='Factor')
k = by_factor.ngroups
fig, axes = plt.subplots(1, k, sharex=True, sharey=False, figsize=(15, 8))
for i, (gname, grp) in enumerate(by_factor):
    grp.xs(gname, level='Factor').plot.bar(
        stacked=True, rot=45, ax=axes[i], title=gname)
fig.tight_layout()
I got a beautiful chart, which looks like this:
This did what I was looking for until I realized that I wanted the same y-axis scale in both charts. In the right chart, 'B', the y-axis goes up to 25000, whereas in chart 'A' it goes up to 10000. Can anyone suggest the best approach to get the same scale on both charts?
I tried:
plt.ylim([0,25000])
which changed nothing in chart 'A', because it basically only changes the y-axis of chart 'B'.
I would highly appreciate any suggestion to achieve same scale for both plots.
Set the ylim min and max values for every axis in a loop:
for ax in axes: ax.set_ylim([0,25000])
You may of course simply call .set_ylim() on the respective axes. The drawback is that you need to know in advance which limits to set.
The following solutions do not have this requirement:
sharey
In your code you explicitly set sharey=False. If you change it to True, you get a shared y-axis. You can then use plt.ylim([0,25000]) to limit the axes, but you don't have to, since they are shared and will adjust automatically.
Minimal example:
import matplotlib.pyplot as plt
fig, (ax, ax2) = plt.subplots(ncols=2, sharex=True, sharey=True)
ax.plot([1,3,2])
ax2.plot([2,3,1])
plt.show()
As can be seen the ticklabels of the shared axes are hidden, which might be desirable in many cases.
join shared axes
Having two axes, you can make them share the same scale using
import matplotlib.pyplot as plt
fig, (ax, ax2) = plt.subplots(ncols=2, sharex=True, sharey=False)
ax.get_shared_y_axes().join(ax, ax2)  # join the y axes (newer Matplotlib versions offer ax2.sharey(ax) for this)
ax.plot([1,3,2])
ax2.plot([2,3,1])
plt.show()
Here the ticklabels stay visible. If you don't want that you can turn them off via ax2.set_yticklabels([]).
Using plt.ylim() only adjusts the current axes, i.e. the last subplot that was plotted. In order to change the y limits of a specific plot you need to use ax.set_ylim().
So in your case it would be
axes[0].set_ylim(0,25000)
axes[1].set_ylim(0,25000)
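If the limits are not known in advance, they can also be read back from the axes after plotting. This is only a sketch with made-up demo values: it takes the widest automatically chosen range and applies it to both subplots.

```python
import matplotlib.pyplot as plt

fig, (ax, ax2) = plt.subplots(ncols=2, sharex=True, sharey=False)
ax.plot([1, 3, 2])
ax2.plot([20, 30, 10])

# Take the widest automatically chosen range and apply it to both axes.
axes = [ax, ax2]
ymin = min(a.get_ylim()[0] for a in axes)
ymax = max(a.get_ylim()[1] for a in axes)
for a in axes:
    a.set_ylim(ymin, ymax)
```

Unlike sharey=True, this keeps the tick labels on both subplots, but the limits are not kept in sync if more data is plotted later.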
