How to make sure matplotlib line graph is correct? - python

I need to plot an accurate line graph with matplotlib, but I only get a y=x graph, and the y-axis tick values are jumbled up.
import numpy as np
import matplotlib.pyplot as plt
title = "Number of Flats Constructed"
data = np.genfromtxt('C:\data/flats-constructed-by-housing-and-development-board-annual.csv',
skip_header=1,
dtype=[('year','i8'),('flats_constructed','U50')], delimiter=",",
missing_values=['na','-'],filling_values=[0])
x = data['year']
y = data['flats_constructed']
plt.title('No. of Flats Constructed over the Years')
#plt.plot(data['year'], data['flats_constructed'])
plt.plot(x, y)
plt.show()
I received a y=x graph instead of a jagged graph reflecting the values.
Actual output
Sample of expected output

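Your mistake is in ('flats_constructed','U50'). U50 reads the column as strings, so matplotlib treats every value as a category label and places the labels in the order they appear. That is why you get a straight y=x line and unsorted y-axis ticks.
Declare the column as ('flats_constructed','i8') instead, so the values are read as integers.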
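from io import StringIO
import numpy as np
import matplotlib.pyplot as plt
# Inline sample data standing in for the CSV file
s = StringIO(u"1977,30498\n1978,264946\n1979,54666\n1980,54666")
# Both columns are read as 64-bit integers
data = np.genfromtxt(s, dtype=[('myint','i8'),('myfloat','i8')], delimiter=",",skip_header=0)
print(data)
plt.plot(data['myint'],data['myfloat'])
plt.show()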
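To see why the dtype matters, here is a minimal sketch that reads the same text twice, once with 'U50' and once with 'i8'. The inline CSV text, the value 9876, and the helper name load are made up for illustration; they are not from the original file.

```python
from io import StringIO

import numpy as np

# Hypothetical sample standing in for the real CSV file.
csv_text = "year,flats_constructed\n1977,30498\n1978,264946\n1979,54666\n1980,9876\n"


def load(value_dtype):
    """Read the sample with the given dtype for the second column."""
    return np.genfromtxt(
        StringIO(csv_text),
        delimiter=",",
        skip_header=1,
        dtype=[("year", "i8"), ("flats_constructed", value_dtype)],
    )


as_text = load("U50")  # second column kept as strings
as_int = load("i8")    # second column parsed as integers

# Strings compare character by character, integers compare by value,
# so the "largest" entry differs between the two arrays.
print(max(as_text["flats_constructed"]))
print(as_int["flats_constructed"].max())
```

Passing the integer column to plt.plot gives a numeric y-axis, whereas the string column is plotted as categories.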

Related

Pandas dataframe plot 's' argument

I have the following statement and I really don't understand the s= part. I know it sets the area of the markers, but is it taking the data from pop_2007 and raising it to 1^6 to create the area?
df.plot(kind='scatter', x='gdp_2007', y='lifeExp_2007', s=df['pop_2007']/1e6)
I'm trying to better understand the s= argument and how it controls the area in a plot.
The 's' parameter in the pandas DataFrame plot function sets the size of the markers in your scatter plot (matplotlib interprets it as the marker area in points squared). See the two outputs below, where I change the 's' value from 1 to 100. Note that 1e6 is scientific notation for 1,000,000, not a power of 1. So right now your plot takes each value in the df['pop_2007'] column and divides it by 1e6 to get the marker size; a population of 50 million gives s=50.
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('data.csv')
# s=1: every marker gets the same very small size
df.plot(kind = 'scatter', x = 'Duration', y = 'Maxpulse', s=1)
plt.show()
Plot with s=1
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('data.csv')
# s=100: every marker gets the same, much larger size
df.plot(kind = 'scatter', x = 'Duration', y = 'Maxpulse', s=100)
plt.show()
Plot with s=100
Test it out here: https://www.w3schools.com/python/pandas/trypandas.asp?filename=demo_pandas_plot_scatter2
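As a rough sketch of the per-point case from the question, the snippet below passes a whole column divided by 1e6 as s and then reads the sizes back from the drawn markers. The three rows of data are invented for illustration; only the column names come from the question.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up values; only the column names match the question.
df = pd.DataFrame({
    "gdp_2007": [1000.0, 10000.0, 40000.0],
    "lifeExp_2007": [50.0, 70.0, 80.0],
    "pop_2007": [5e6, 50e6, 300e6],
})

# One marker size per row: the population expressed in millions.
sizes = df["pop_2007"] / 1e6

ax = df.plot(kind="scatter", x="gdp_2007", y="lifeExp_2007", s=sizes)

# The scatter markers live in the first collection on the axes;
# get_sizes() returns the size that was applied to each marker.
drawn_sizes = ax.collections[0].get_sizes()
print(list(drawn_sizes))
```

Call plt.show() afterwards to display the figure; the most populous row gets the largest marker.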

Too many xticks in the histogram

I am trying to read data from a txt file containing 200 values and then trying to plot a histogram based on the data. However, there are too many values on the xticks, and I do not know how to fix this.
My code:
import math
import numpy as np
import matplotlib.pyplot as plt
f=open('LedData.rtf',"r")
lines=f.readlines()
result=[]
for x in lines:
    result.append(x.split(',')[1])
f.close()
plt.hist(result)
plt.xticks(rotation = 'vertical')
plt.show()
Histogram : Here is an image for reference.
Thanks for any help.
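One likely cause, assuming the second field on each line is numeric: split() returns strings, and matplotlib treats strings as categories, drawing one tick per distinct value. A minimal sketch of converting to float before calling hist follows. The sample lines are invented, since the contents of LedData.rtf are not shown.

```python
import matplotlib.pyplot as plt

# Invented sample lines in the assumed "index,value" layout.
lines = ["0,1.52\n", "1,1.48\n", "2,1.61\n", "3,1.55\n", "4,1.49\n"]

# Convert the second field of every non-empty line to a number.
values = [float(line.split(",")[1]) for line in lines if line.strip()]

fig, ax = plt.subplots()
# With numeric input, hist bins the data and picks a handful of ticks itself.
counts, bin_edges, _ = ax.hist(values, bins=10)
ax.set_xlabel("value")
ax.set_ylabel("count")
```

Call plt.show() to display the result. If the real file contains non-numeric lines (an .rtf file may carry formatting markup), those would need to be filtered out first.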

Displaying Colormap/legend with x,y,z plot and fourth variable

I'm using Pandas and am very new to programming. I'm plotting Energy Deposited (eDep) as a function of its x, y and z positions. So far I have been successful in getting it to plot, but I can't get the colormap (colorbar) to show beside my scatter plot! Any help is much appreciated.
%matplotlib inline
import pandas as pd
import numpy as np
IncubatorBelow = "./Analysis.Test.csv"
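# Skip malformed rows (on_bad_lines requires pandas 1.3 or newer)
df = pd.read_csv(IncubatorBelow, sep = ',', names=['Name','TrackID','ParentID','xPos','yPos','zPos','eDep','DeltaE','Einit','EventID'],low_memory=False,on_bad_lines='skip')
# Strip the literal parentheses around the coordinates
df["xPos"] = df["xPos"].str.replace("(","", regex=False)
df["zPos"] = df["zPos"].str.replace(")","", regex=False)
# sort_values returns a new frame, so keep the result
df = df.sort_values(by='Name', ascending=False)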
df.dropna(how='any',axis=0,subset=['Name','TrackID','ParentID','xPos','yPos','zPos','eDep','DeltaE','Einit','EventID'], inplace=True)
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
df['xPos'] = df['xPos'].astype(float)
df['yPos'] = df['yPos'].astype(float)
df['zPos'] = df['zPos'].astype(float)
#df10[df10['Name'].str.contains("e-")]
threedee = plt.figure().add_subplot(projection='3d')
threedee.scatter(df["xPos"], df["yPos"], df["zPos"], c=df["eDep"], cmap=plt.cm.coolwarm)
threedee.set_xlabel("x(mm)")
threedee.set_ylabel("y(mm)")
threedee.set_zlabel("z(mm)")
plt.show()
Here's what the plot looks like!
It's from a particle physics simulation using GEANT4. The actual files are extremely large (3.7 GB, which I've split into chunks of roughly 40 MB), and this plot only represents a small fraction of the data.
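A colorbar is usually attached by keeping the object that scatter returns and handing it to fig.colorbar. The sketch below uses random numbers in place of the GEANT4 data, so the arrays x_pos, y_pos, z_pos and e_dep are placeholders rather than the real columns.

```python
import numpy as np
import matplotlib.pyplot as plt

# Random placeholder data standing in for xPos, yPos, zPos and eDep.
rng = np.random.default_rng(0)
x_pos, y_pos, z_pos = rng.normal(size=(3, 200))
e_dep = rng.random(200)

fig = plt.figure()
ax = fig.add_subplot(projection="3d")

# Keep the return value: it carries the colormap and the value range.
points = ax.scatter(x_pos, y_pos, z_pos, c=e_dep, cmap="coolwarm")
ax.set_xlabel("x(mm)")
ax.set_ylabel("y(mm)")
ax.set_zlabel("z(mm)")

# Draw the colorbar next to the 3D axes, using the scatter's colors.
cbar = fig.colorbar(points, ax=ax, shrink=0.7, pad=0.1, label="eDep")
```

Call plt.show() to display the figure. With the question's DataFrame, the four arrays would be replaced by df["xPos"], df["yPos"], df["zPos"] and df["eDep"].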

Contour plot from csv file with row being axis

I am trying to make a contour plot from a csv file. I would like the first column to be the x axis, the first row (which has values) to be the y axis, and the rest of the matrix to be what is contoured; see the basic example in the figure below.
Simple table example
What I am really struggling with is getting that first row to be the y axis, and then defining that set of values so that they can be passed to the contourf function. Any help would be very much appreciated, as I am very new to Python and really don't know where to start with this problem.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import csv
import pandas as pd
import numpy as np
from csv import reader
from matplotlib import cm
f = pd.read_csv('/trialforplot.csv',dayfirst=True,index_col=0)
x = f.head()
y = f.columns
X,Y = np.meshgrid(x,y)
z=(x,y)
z=np.array(z)
Z=z.reshape((len(x),len(y)))
plt.contour(Y,X,Z)
plt.colorbar()
plt.xlabel('Time')
plt.ylabel('Particle Size')
plt.show()
I'm stuck at defining the z values and getting my contour plot plotting.
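One possible way to set this up, sketched under the assumption that the table is purely numeric: read the file with index_col=0, take x from the index, y from the column headers, and Z from the remaining values. The inline CSV text is invented, and it uses plain numbers for the first column even though the real file may hold dates.

```python
from io import StringIO

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Invented table: first column is time, header row holds particle sizes.
csv_text = """time,10,20,30
0,1.0,2.0,3.0
1,2.0,3.0,4.0
2,3.0,5.0,6.0
3,4.0,6.0,9.0
"""

table = pd.read_csv(StringIO(csv_text), index_col=0)

x = table.index.to_numpy(dtype=float)             # first column -> x axis
y = np.array([float(c) for c in table.columns])   # header row -> y axis
# contourf expects Z with shape (len(y), len(x)), hence the transpose.
Z = table.to_numpy().T

fig, ax = plt.subplots()
filled = ax.contourf(x, y, Z)
fig.colorbar(filled, ax=ax)
ax.set_xlabel("Time")
ax.set_ylabel("Particle Size")
```

Call plt.show() to display the plot. No meshgrid is needed here, because contourf accepts 1-D x and y arrays as long as their lengths match the shape of Z.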

Python Matplotlib Plotting CSV data, formatting date X label

My data looks as follows:
2012021305, 65217
2012021306, 82418
2012021307, 71316
2012021308, 66833
2012021309, 69406
2012021310, 76422
2012021311, 94188
2012021312, 111817
2012021313, 127002
2012021314, 141099
2012021315, 147830
2012021316, 136330
2012021317, 122252
2012021318, 118619
2012021319, 115763
2012021320, 121393
2012021321, 130022
2012021322, 137658
2012021323, 139363
The first column is the date in YYYYMMDDHH format. I'm trying to graph the data using the csv2rec function from matplotlib.mlab. I can get the data to graph, but the x axis and labels are not showing up the way that I expect them to.
import matplotlib
matplotlib.use('Agg')
from matplotlib.mlab import csv2rec
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from pylab import *
output_image_name='plot1.png'
input_filename="data.log"
input = open(input_filename, 'r')
input.close()
data = csv2rec(input_filename, names=['time', 'count'])
rcParams['figure.figsize'] = 10, 5
rcParams['font.size'] = 8
fig = plt.figure()
plt.plot(data['time'], data['count'])
ax = fig.add_subplot(111)
ax.plot(data['time'], data['count'])
hours = mdates.HourLocator()
fmt = mdates.DateFormatter('%Y%M%D%H')
ax.xaxis.set_major_locator(hours)
ax.xaxis.set_major_formatter(fmt)
ax.grid()
plt.ylabel("Count")
plt.title("Count Log Per Hour")
fig.autofmt_xdate(bottom=0.2, rotation=90, ha='left')
plt.savefig(output_image_name)
I assume this has something to do with the date format. Any suggestions?
You need to convert the x-values to datetime objects.
Something like:
from datetime import datetime
time_vec = [datetime.strptime(str(x),'%Y%m%d%H') for x in data['time']]
plot(time_vec,data['count'])
Currently, you are telling Python to format integers (2012021305) as dates, which it does not know how to do, so it returns an empty string (although I suspect that you are getting errors raised someplace).
You should also check your format string: in '%Y%M%D%H', %M means minutes and %D is not the day of the month. Month and day are %m and %d.
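Put together, the conversion and the axis formatting might look like the sketch below. It parses a few rows copied from the question directly from a list instead of using csv2rec, and the output label format '%Y-%m-%d %H:00' is just one possible choice.

```python
from datetime import datetime

import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# A few rows from the question, inlined instead of read from data.log.
rows = [
    "2012021305, 65217",
    "2012021306, 82418",
    "2012021307, 71316",
    "2012021308, 66833",
]

times, counts = [], []
for row in rows:
    stamp, count = row.split(",")
    # %Y%m%d%H = four-digit year, month, day of month, hour (24h clock)
    times.append(datetime.strptime(stamp.strip(), "%Y%m%d%H"))
    counts.append(int(count))

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(times, counts)

# One tick per hour, labelled with a readable date and hour.
ax.xaxis.set_major_locator(mdates.HourLocator(interval=1))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d %H:00"))
ax.set_ylabel("Count")
ax.set_title("Count Log Per Hour")
ax.grid()
fig.autofmt_xdate(rotation=90, ha="center")
```

Call plt.savefig or plt.show() afterwards. Because the x-values are real datetime objects, the hour locator and date formatter have something meaningful to work with.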
