How to read only part of a CSV file? - python

I have the following code in which I read CSV files and get a graph plotted:
import numpy as np
import matplotlib.pyplot as plt
import scipy.odr
from scipy.interpolate import interp1d
plt.rcParams["figure.figsize"] = (15,10)
def readPV(filename="HE3.csv",d=32.5e-3):
    t=np.genfromtxt(fname=filename,delimiter=',',skip_header=1, usecols=0)
    P=np.genfromtxt(fname=filename,delimiter=',',skip_header=1, usecols=1)
    V=np.genfromtxt(fname=filename,delimiter=',',skip_header=1, usecols=2,filling_values=np.nan)
    V=V*np.pi*(d/2)**2
    Vi= interp1d(t[~np.isnan(V)],V[~np.isnan(V)],fill_value="extrapolate")
    V=Vi(t)
    return P,V,t
P,V,t=readPV(filename="HE3.csv")
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
ax.plot(V,P,'ko')
ax.set_xlabel("Volume")
ax.set_ylabel("Pressure")
plt.show()
From this code, the following graph is made:
The CSV file has many comma-separated data points in each column. How can I read only a chosen range of the file instead of all of it?
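One possible approach, sketched under assumptions: np.genfromtxt accepts skip_header, max_rows and usecols, which together limit both the rows and the columns that are parsed. The CSV text below is a made-up stand-in for HE3.csv, and the row and column choices are arbitrary.

```python
import io
import numpy as np

# Hypothetical stand-in for HE3.csv: a header row, then t, P, V columns.
csv_text = """t,P,V
0.0,101.0,5.0
1.0,102.0,
2.0,103.0,4.0
3.0,104.0,3.5
4.0,105.0,3.0
"""

# Parse only columns 0 and 1, skip the header plus the first data row,
# and stop after three rows.
t, P = np.genfromtxt(
    io.StringIO(csv_text),
    delimiter=",",
    skip_header=2,   # header line + first data row
    max_rows=3,      # read at most three rows after the skipped ones
    usecols=(0, 1),  # only these columns are parsed
    unpack=True,     # return one array per column
)
print(t, P)
```

If the whole file fits in memory, reading everything and slicing the arrays afterwards (for example t[10:50]) is an equally reasonable option.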

Related

A boxplot with lines connecting data points in python

I am trying to connect lines based on a specific relationship associated with the points. In this example the lines would connect the players by which court they played in. I can create the basic structure but haven't figured out a reasonably simple way to create this added feature.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df_dict={'court':[1,1,2,2,3,3,4,4],
'player':['Bob','Ian','Bob','Ian','Bob','Ian','Ian','Bob'],
'score':[6,8,12,15,8,16,11,13],
'win':['no','yes','no','yes','no','yes','no','yes']}
df=pd.DataFrame.from_dict(df_dict)
ax = sns.boxplot(x='score',y='player',data=df)
ax = sns.swarmplot(x='score',y='player',hue='win',data=df,s=10,palette=['red','green'])
plt.show()
This code generates the following plot minus the gray lines that I am after.
You can use lineplot here:
sns.lineplot(
    data=df, x="score", y="player", units="court",
    color=".7", estimator=None
)
Alternatively, convert each player name to an integer flag and use it as the y-axis value, then loop over the courts and draw one line connecting the points for each court.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df_dict={'court':[1,1,2,2,3,3,4,4],
'player':['Bob','Ian','Bob','Ian','Bob','Ian','Ian','Bob'],
'score':[6,8,12,15,8,16,11,13],
'win':['no','yes','no','yes','no','yes','no','yes']}
df=pd.DataFrame.from_dict(df_dict)
ax = sns.boxplot(x='score',y='player',data=df)
ax = sns.swarmplot(x='score',y='player',hue='win',data=df,s=10,palette=['red','green'])
df['flg'] = df['player'].apply(lambda x: 0 if x == 'Bob' else 1)
for i in df.court.unique():
    dfq = df.query('court == @i').reset_index()
    ax.plot(dfq['score'], dfq['flg'], 'g-')
plt.show()
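The flag idea above can also be sketched without hard-coding the player names. This is only an illustration: the positions mapping and the ypos column are hypothetical names, and the mapping assumes that the categorical axis orders players by first appearance, which is worth checking against the actual plot.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'court': [1, 1, 2, 2, 3, 3, 4, 4],
    'player': ['Bob', 'Ian', 'Bob', 'Ian', 'Bob', 'Ian', 'Ian', 'Bob'],
    'score': [6, 8, 12, 15, 8, 16, 11, 13],
})

# Map each player to a y position in order of first appearance
# (assumption: this matches the order used on the categorical axis).
positions = {name: i for i, name in enumerate(df['player'].unique())}
df['ypos'] = df['player'].map(positions)

fig, ax = plt.subplots()
# One gray line per court, drawn behind any other artists.
for court, group in df.groupby('court'):
    ax.plot(group['score'], group['ypos'], color='0.7', zorder=0)
ax.set_yticks(list(positions.values()))
ax.set_yticklabels(list(positions.keys()))
```

Passing the same ax to the boxplot and swarmplot calls would overlay them on these lines; call plt.show() to display the result.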

CSV file matplotlib.pyplot graphing error

I am using pandas to import a CSV into my notebook, and I replaced every blank value with a blank space. When I use plt.plot to graph the data, the x and y axes are covered with a bunch of black lines. Below is my code and graph:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
%matplotlib inline
apo2data = pd.read_csv('/Users/lilyloyer/Desktop/Apo2excel.csv')
apo2data.isnull()
data = apo2data.fillna(" ")
teff=data['Teff (K)']
grav=data['logg_seis']
plt.plot(teff, grav, 'ro')
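A likely cause, offered as a guess rather than a confirmed diagnosis: filling missing values with " " turns the numeric columns into object columns, so matplotlib treats every value as a category and draws one tick per value. The sketch below uses made-up numbers in place of the CSV and drops the incomplete rows instead of filling them.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for the two CSV columns named in the question.
raw = pd.DataFrame({
    'Teff (K)': [4800.0, np.nan, 5100.0, 4950.0],
    'logg_seis': [2.4, 2.6, np.nan, 3.1],
})

# Filling with a string mixes floats and text in the same column.
filled = raw.fillna(" ")
print(filled.dtypes)

# Keeping the columns numeric: drop rows where either value is missing.
clean = raw.dropna(subset=['Teff (K)', 'logg_seis'])

# If the data has already been filled, coerce it back to numbers.
recovered = pd.to_numeric(filled['Teff (K)'], errors='coerce')

fig, ax = plt.subplots()
ax.plot(clean['Teff (K)'], clean['logg_seis'], 'ro')
```

matplotlib also skips NaN values on its own, so plotting the original columns without any fillna call is another option.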

How to change the space between histograms in pandas

I'm currently using df.hist(alpha = .5), but all of the subplots are too close to each other, like this:
Histograms
What is the best way to change the space between them?
Or is it better to plot each one in a separate .png file?
One simple way is to set figsize and call pyplot.tight_layout. An example follows.
Without adjustment:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(6400)
.reshape((100, 64)), columns=['col_{}'.format(i) for i in range(64)])
df.hist(alpha=0.5)
plt.show()
You will get a crowded figure like the one you showed:
In contrast, if you add figsize (with arbitrary size) and pyplot.tight_layout like below:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(6400)
.reshape((100, 64)), columns=['col_{}'.format(i) for i in range(64)])
df.hist(alpha=0.5, figsize=(20, 10))
plt.tight_layout()
plt.show()
In this case you will get a more evenly spaced view:
Hope this helps.
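If tight_layout does not give enough control, the gaps can also be set explicitly. This is a minimal sketch, assuming df.hist returns an array of Axes that share one Figure; the hspace and wspace values and the smaller 9-column frame are arbitrary choices for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Smaller frame than in the answer above, just to keep the example quick.
df = pd.DataFrame(np.random.randn(100, 9),
                  columns=['col_{}'.format(i) for i in range(9)])

axes = df.hist(alpha=0.5, figsize=(12, 9))

# All histograms live on the same figure; fetch it from any of the axes.
fig = axes.flat[0].get_figure()

# hspace / wspace are fractions of the average axes height / width.
fig.subplots_adjust(hspace=0.5, wspace=0.3)
```

Call plt.show() or fig.savefig(...) afterwards as usual.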

Contour plot from csv file with row being axis

I am trying to make a contour plot from a CSV file. I would like the first column to be the x axis, the first row (which has values) to be the y axis, and the rest of the matrix to be what is contoured; see the basic example in the figure below.
Simple table example
What I am really struggling with is getting that first row to be the y axis, and then defining that set of values so that they can be passed to the contourf function. Any help would be very much appreciated, as I am very new to Python and really don't know where to start with this problem.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import csv
import pandas as pd
from csv import reader
from matplotlib import cm
f = pd.read_csv('/trialforplot.csv',dayfirst=True,index_col=0)
x = f.head()
y = f.columns
X,Y = np.meshgrid(x,y)
z=(x,y)
z=np.array(z)
Z=z.reshape((len(x),len(y)))
plt.contour(Y,X,Z)
plt.colorbar()
plt.xlabel('Time')
plt.ylabel('Particle Size')
plt.show()
I'm stuck at defining the z values and getting my contour plot plotting.
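One way the z values might be defined, sketched under assumptions: with index_col=0 the first column becomes the DataFrame index (x), the header row becomes the column labels (y), and the remaining values are the matrix to contour. The CSV text is a made-up stand-in for trialforplot.csv, and the index is assumed to be numeric rather than dates.

```python
import io
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for trialforplot.csv: the first column is x (time),
# the first row holds the y values (particle size).
csv_text = """time,10,20,30
0,1.0,2.0,3.0
1,2.0,3.0,4.0
2,3.0,4.0,5.0
3,4.0,5.0,6.0
"""

f = pd.read_csv(io.StringIO(csv_text), index_col=0)

x = f.index.to_numpy(dtype=float)        # first column, shape (4,)
y = f.columns.astype(float).to_numpy()   # header strings -> floats, shape (3,)
Z = f.to_numpy()                         # data matrix, shape (4, 3)

# meshgrid returns arrays of shape (len(y), len(x)), so Z is transposed.
X, Y = np.meshgrid(x, y)

fig, ax = plt.subplots()
cs = ax.contourf(X, Y, Z.T)
fig.colorbar(cs, ax=ax)
ax.set_xlabel('Time')
ax.set_ylabel('Particle Size')
```

The transpose is the step that is easy to miss: rows of the file run along x, while contourf expects rows of Z to run along y.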

add legend to numpy array in matplot lib

I am plotting 2D numpy arrays using
import numpy as np
import matplotlib.pyplot as plt
x = np.array([1,2,3])
y = np.array([[2,2.2,3],[1,5,1]])
plt.plot(x,y.T[:,:])
plt.legend()
plt.show()
I want a legend that tells which line belongs to which row. Of course, I realize I can't give the lines meaningful names, but I need some sort of unique label for each line without running through a loop.
import numpy as np
import matplotlib.pyplot as plt
import uuid
x = np.array([1,2,3])
y = np.array([[2,2.2,3],[1,5,1]])
fig, ax = plt.subplots()
lines = ax.plot(x,y.T[:,:])
ax.legend(lines, [str(uuid.uuid4())[:6] for j in range(len(lines))])
plt.show()
(This is off of the current mpl master branch with a preview of the 2.0 default styles)
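Since the goal is to tell which line belongs to which row, a variation on the snippet above could label each line with its row index instead of a random string. This is just a sketch; the "row N" label format is an arbitrary choice.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.array([1, 2, 3])
y = np.array([[2, 2.2, 3], [1, 5, 1]])

fig, ax = plt.subplots()
# Plotting the transpose draws one line per row of y.
lines = ax.plot(x, y.T)

# One label per line, in the same order as the rows of y.
labels = ['row {}'.format(i) for i in range(len(lines))]
legend = ax.legend(lines, labels)
```

ax.plot returns the lines in column order of the array it is given, so line i here corresponds to y[i].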
