I am using pandas to import a CSV into my notebook, and I replaced every blank (missing) value with a blank space. When I plot the data with plt.plot, the graph comes out with a mass of black lines along the x and y axes. Below is my code and graph:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
%matplotlib inline
apo2data = pd.read_csv('/Users/lilyloyer/Desktop/Apo2excel.csv')
apo2data.isnull()
data = apo2data.fillna(" ")
teff=data['Teff (K)']
grav=data['logg_seis']
plt.plot(teff, grav, 'ro')
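One likely cause, assuming the two columns are otherwise numeric: fillna(" ") mixes strings into them, so matplotlib treats every value as a category and draws one tick label per value, which piles up into black bands along the axes. A minimal sketch of the alternative is to keep the blanks as NaN (or coerce them back) and plot only complete rows. The CSV text below is a made-up stand-in for Apo2excel.csv.

```python
from io import StringIO

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for Apo2excel.csv, with two blank cells.
csv_text = """Teff (K),logg_seis
4800,2.45
,2.61
5010,
4920,2.53
"""

apo2data = pd.read_csv(StringIO(csv_text))

# This is the step from the question: it mixes " " strings into the columns.
filled = apo2data.fillna(" ")

# Coerce anything non-numeric back to NaN, then drop incomplete rows.
clean = (
    filled[["Teff (K)", "logg_seis"]]
    .apply(pd.to_numeric, errors="coerce")
    .dropna()
)

fig, ax = plt.subplots()
ax.plot(clean["Teff (K)"], clean["logg_seis"], "ro")
ax.set_xlabel("Teff (K)")
ax.set_ylabel("logg_seis")
```

If the fillna(" ") step is simply dropped, read_csv already leaves the blanks as NaN and the coercion is not needed.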
I have the following code in which I read CSV files and get a graph plotted:
import numpy as np
import matplotlib.pyplot as plt
import scipy.odr
from scipy.interpolate import interp1d
plt.rcParams["figure.figsize"] = (15,10)
def readPV(filename="HE3.csv",d=32.5e-3):
    t=np.genfromtxt(fname=filename,delimiter=',',skip_header=1, usecols=0)
    P=np.genfromtxt(fname=filename,delimiter=',',skip_header=1, usecols=1)
    V=np.genfromtxt(fname=filename,delimiter=',',skip_header=1, usecols=2,filling_values=np.nan)
    V=V*np.pi*(d/2)**2
    Vi= interp1d(t[~np.isnan(V)],V[~np.isnan(V)],fill_value="extrapolate")
    V=Vi(t)
    return P,V,t
P,V,t=readPV(filename="HE3.csv")
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
ax.plot(V,P,'ko')
ax.set_xlabel("Volume")
ax.set_ylabel("Pressure")
plt.show()
From this code, the following graph is made:
The CSV file has several data points in one column, separated by commas; I want to know how to pick a range of columns to read, instead of all of them.
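One way to read only a range of columns is to pass a sequence of column indices to usecols. This is a sketch, not the only option; the CSV text and the two extra columns are invented for illustration.

```python
from io import StringIO

import numpy as np

# Hypothetical file contents: three columns of interest plus two extras.
csv_text = """t,P,V,extra1,extra2
0.0,101.3,1.0,9,9
1.0,99.8,1.1,9,9
2.0,98.1,,9,9
"""

# usecols takes a sequence of column indices; here columns 0, 1 and 2.
data = np.genfromtxt(
    StringIO(csv_text),
    delimiter=",",
    skip_header=1,
    usecols=tuple(range(0, 3)),
    filling_values=np.nan,
)

# Each row of data.T is one of the selected columns.
t, P, V = data.T
```

This also reads the file once instead of once per column.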
I have a graph, and I would like to make one of my lines a different color.
I tried the approach matplotlib recommends, but that just made me print two separate graphs.
import numpy as np
import pandas as pd
import seaborn as sns
data = pd.read_csv("C:\\Users\\Nathan\\Downloads\\markouts_changed_maskedNEW.csv")
data.columns = ["Yeet","Yeet1","Yeet 2","Yeet 3","Yeet 4","Yeet 7","Exchange 5","Yeet Average","Intelligent Yeet"]
mpg = data[data.columns]
mpg.plot(color='green', linewidth=2.5)
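A possible approach, sketched under the assumption that all lines are drawn on one Axes: DataFrame.plot returns that Axes, and each line carries its column name as a label, so a single line can be recolored afterwards without a second figure. The small DataFrame and the choice of "Yeet Average" as the highlighted column are made up for illustration.

```python
import pandas as pd

# Hypothetical data standing in for the CSV in the question.
data = pd.DataFrame(
    {
        "Yeet": [1, 2, 3],
        "Yeet Average": [2, 2, 2],
        "Intelligent Yeet": [3, 1, 2],
    }
)

# Draw every column in green on a single Axes.
ax = data.plot(color="green", linewidth=2.5)

# Recolor only the line whose label matches the chosen column.
for line in ax.get_lines():
    if line.get_label() == "Yeet Average":
        line.set_color("red")

# Rebuild the legend so it picks up the new color.
ax.legend()
```

In a script, call plt.show() from matplotlib.pyplot afterwards to display the figure.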
I need to plot an accurate line graph with matplotlib, but I only get a y=x graph, and the y-axis tick values are jumbled up.
import numpy as np
import matplotlib.pyplot as plt
title = "Number of Flats Constructed"
data = np.genfromtxt(r'C:\data/flats-constructed-by-housing-and-development-board-annual.csv',
                     skip_header=1,
                     dtype=[('year','i8'),('flats_constructed','U50')], delimiter=",",
                     missing_values=['na','-'],filling_values=[0])
x = data['year']
y = data['flats_constructed']
plt.title('No. of Flats Constructed over the Years')
#plt.plot(data['year'], data['flats_constructed'])
plt.plot(x, y)
plt.show()
I received a y=x graph instead of a jagged graph reflecting the values.
Actual output
Sample of expected output
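Your mistake is at ('flats_constructed','U50'). U50 tells genfromtxt to read that column as strings, so matplotlib treats the values as category labels and places them in the order they appear, which gives the y=x line and the jumbled ticks.
Give it as ('flats_constructed','i8') instead, so the values are read as integers.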
from io import StringIO
import numpy as np
import matplotlib.pyplot as plt
s = StringIO(u"1977,30498\n1978,264946\n1979,54666\n1980,54666")
data = np.genfromtxt(s, dtype=[('myint','i8'),('myfloat','i8')], delimiter=",",skip_header=0)
data
plt.plot(data['myint'],data['myfloat'])
plt.show()
I am trying to make a contour plot from a CSV file. I would like the first column to be the x axis, the first row (which has values) to be the y axis, and the rest of the matrix to be what is contoured; see the basic example in the figure below.
Simple table example
What I am really struggling with is getting that first row to be the y axis, and then defining that set of values so that they can be passed to the contourf function. Any help would be very much appreciated, as I am very new to Python and really don't know where to start with this problem.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import csv
import pandas as pd
from csv import reader
from matplotlib import cm
f = pd.read_csv('/trialforplot.csv',dayfirst=True,index_col=0)
x = f.head()
y = f.columns
X,Y = np.meshgrid(x,y)
z=(x,y)
z=np.array(z)
Z=z.reshape((len(x),len(y)))
plt.contour(Y,X,Z)
plt.colorbar()
plt.xlabel('Time')
plt.ylabel('Particle Size')
plt.show()
I'm stuck at defining the z values and getting my contour plot plotting.
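A minimal sketch of one way to set this up, assuming the first column holds plain numeric x values and the header row holds numeric y values (the table below is invented, and date parsing is left out): with index_col=0 the index becomes x, the column labels become y, and the DataFrame body is already the Z matrix.

```python
from io import StringIO

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical table: first column is time, header row is particle size.
csv_text = """time,10,20,50,100
0,1.0,2.0,3.0,2.0
1,1.5,2.5,3.5,2.5
2,2.0,3.0,4.0,3.0
"""

f = pd.read_csv(StringIO(csv_text), index_col=0)

x = f.index.to_numpy()                   # first column -> x axis
y = f.columns.astype(float).to_numpy()   # header row -> y axis
Z = f.to_numpy()                         # shape (len(x), len(y))

# meshgrid(y, x) returns arrays shaped like Z: rows follow x, columns follow y.
Y, X = np.meshgrid(y, x)

fig, ax = plt.subplots()
cs = ax.contourf(X, Y, Z)
fig.colorbar(cs, ax=ax)
ax.set_xlabel("Time")
ax.set_ylabel("Particle Size")
```

The key point is that Z comes straight from the table values rather than from reshaping x and y.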
I am trying to plot a simple distplot using pandas and seaborn to understand the density of the dataset.
Input
#Car,45
#photo,4
#movie,6
#life,1
#Horse,14
#Pets,20
#run,67
#picture,89
The dataset has more than 10K rows and no headers, and I am trying to use the second column (index 1).
Code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv('keyword.csv', delimiter=',', header=None, usecols=[1])
#print df
sns.distplot(df)
plt.show()
There is no error, and I can print the input column, but the distplot takes ages to compute and freezes my screen. Any suggestions to speed up the process?
Edit 1: As suggested in the comments below, I changed from pandas.read_csv to np.loadtxt, and now I get an error.
Code:
import numpy as np
from numpy import log as log
import matplotlib.pyplot as plt
import seaborn as sns
import pandas
df = np.loadtxt('keyword.csv', delimiter=',', usecols=(1), unpack=True)
sns.kdeplot(df)
sns.distplot(df)
plt.show()
Error:
Traceback (most recent call last):
  File "0_distplot_csv.py", line 7, in <module>
    df = np.loadtxt('keyword.csv', delimiter=',', usecols=(1), unpack=True)
  File "/usr/local/lib/python2.7/dist-packages/numpy/lib/npyio.py", line 726, in loadtxt
    usecols = list(usecols)
TypeError: 'int' object is not iterable
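The traceback points at usecols=(1): parentheses alone do not make a tuple, so (1) is just the integer 1, and this NumPy version tries to call list() on it. A one-element tuple needs a trailing comma. The sketch below uses made-up rows without the leading # so that the default comment handling does not interfere. Newer NumPy releases also accept a bare integer here.

```python
from io import StringIO

import numpy as np

# Hypothetical sample rows (no leading '#').
csv_text = "Car,45\nphoto,4\nmovie,6\n"

# (1) is the int 1; (1,) is a one-element tuple.
not_a_tuple = (1)
one_tuple = (1,)

# Only column 1 is parsed, so the text in column 0 is never converted.
values = np.loadtxt(StringIO(csv_text), delimiter=",", usecols=one_tuple)
```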
Edit 2: I tried the suggestions from the comment section:
sns.distplot(df[1])
This behaves the same as described initially: the screen freezes for ages.
sns.distplot(df[1].values)
I see strange behavior in this case.
When the input is
Car,45
photo,4
movie,6
life,1
Horse,14
Pets,20
run,67
picture,89
it does plot, but when the input is as below
#Car,45
#photo,4
#movie,6
#life,1
#Horse,14
#Pets,20
#run,67
#picture,89
it freezes the entire screen again and does nothing.
I tried passing comments=None, thinking the rows might be read as comments, but it looks like pandas does not use a comments argument.
Thank you
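On the comments point in the question above: pandas.read_csv has a comment parameter (singular) rather than comments, and it defaults to None, so rows that start with # are kept as ordinary data. The sketch below only demonstrates that parsing behavior on made-up rows; it does not explain the freeze.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample rows that start with '#'.
hash_rows = "#Car,45\n#photo,4\n#movie,6\n"

# Default comment=None: the '#' rows are parsed like any other row.
df = pd.read_csv(StringIO(hash_rows), header=None, usecols=[1])

# With comment="#", a row that starts with '#' is skipped entirely.
mixed_rows = "Car,45\n#photo,4\nmovie,6\n"
kept = pd.read_csv(StringIO(mixed_rows), header=None, comment="#")
```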
After several attempts and a lot of searching online, I finally got what I was looking for. The code loads the data by column number when there are no headers, and it also reads the rows that start with #.
Code:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# comments=None keeps the rows that start with '#'; the text column is read as nan
data = np.genfromtxt('keyword.csv', delimiter=',', comments=None)
d0=data[:,1]
# Plot a kernel density estimate of the second column
# (newer seaborn releases deprecate bw in favor of bw_method / bw_adjust)
sns.kdeplot(np.array(d0), color='b', bw=0.5, marker='o', label='keyword')
plt.legend(loc='upper right')
plt.xlabel('Freq(x)')
plt.ylabel('pdf(x)')
#plt.gca().set_xscale("log")
#plt.gca().set_yscale("log")
plt.show()
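As a quick check of the loading step in the answer above, the sketch below runs np.genfromtxt with comments=None on a few made-up rows. The text column cannot be converted to a float and comes back as nan, while the numeric column is read normally.

```python
from io import StringIO

import numpy as np

# Hypothetical sample rows that start with '#'.
csv_text = "#Car,45\n#photo,4\n#movie,6\n"

# comments=None stops genfromtxt from discarding everything after '#'.
data = np.genfromtxt(StringIO(csv_text), delimiter=",", comments=None)

# Column 0 (text) becomes nan; column 1 holds the counts.
d0 = data[:, 1]
```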