I have 5 csv files that I am trying to put into one graph in python. In the first column of each csv file, all of the numbers are the same, and I want to treat these as the x values for each csv file in the graph. However, there are two more columns in each csv file (to make 3 columns total), but I just want to graph the second column as the 'y-values' for each csv file on the same graph, and ideally get 5 different lines, one for each file. Does anyone have any ideas on how I could do this?
I have already uploaded my files to the variable file_list
Read the first file and build a list of lists in which each inner list holds the two columns (x and y) of one row. Then read the other files one by one and append each file's y value to the inner list at the corresponding index.
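A minimal sketch of that list-of-lists approach, using only the standard csv module. It assumes the files have no header row, hold numeric values, and all contain the same number of rows in the same x order; the function name collect_columns is hypothetical.

```python
import csv


def collect_columns(file_list):
    """Return rows of [x, y_file1, y_file2, ...] built from the given CSV files.

    Assumes column 0 (x) is identical in every file and that no file
    has a header row or blank lines.
    """
    rows = []
    for file_index, path in enumerate(file_list):
        with open(path, newline="") as handle:
            for row_index, record in enumerate(csv.reader(handle)):
                if file_index == 0:
                    # First file: start each row with its x and y values.
                    rows.append([float(record[0]), float(record[1])])
                else:
                    # Later files: append only the y value to the matching row.
                    rows[row_index].append(float(record[1]))
    return rows
```

Transposing the result with zip(*rows) gives the x series followed by one y series per file, and each y series can then be passed to plt.plot together with x.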
You can simply call plot more than once. Assuming you have run from matplotlib.pyplot import plot, you can repeat the same x values or use different ones, and it will still work. Here is an example:
from pathlib import Path
import pandas as pd
import matplotlib.pyplot as plt
files = list(Path("/path/to/folder/with/csvs").glob("*.csv"))
fig, ax = plt.subplots(figsize=(10, 10))
x_col, y_col = "x_column_name", "y_column_name"
for file in files:
    file_name = file.stem
    df = pd.read_csv(file)
    df.plot(x=x_col, y=y_col, ax=ax, label=file_name, legend=True)
fig # If using a jupyter notebook, and you've run a cell with %matplotlib inline
Assuming your files are named File0.csv, File1.csv, File2.csv, File3.csv, File4.csv, you can loop over them, ignore the third column values and plot the x and y values. The following code works for files with exactly 3 columns:
import numpy as np
import matplotlib.pyplot as plt
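for i in range(5):
    # delimiter=',' is needed for comma-separated files; loadtxt splits on whitespace by default
    x, y, _ = np.loadtxt('File%s.csv' % i, delimiter=',', unpack=True)
    plt.plot(x, y, label='File %s' % i)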
plt.legend()
plt.show()
Related
I want to read CSV files from a directory, plot each one, and be able to click the arrow button to step from one plot to the next. I also want to specify which columns to plot and set the title, as I do in the code below.
I am able to read the csv file and plot a single plot with specific columns but I am not sure how to do it with multiple. I've tried glob but it didn't work, I do not want to concatenate them to a single csv file. I have provided my code below. Any help would be appreciated. Thank you.
import pandas as pd
import matplotlib.pyplot as plt
cols_in = [1, 3]
col_name = ['Time (s)', 'Band (mb)']
df = pd.read_csv("/user/Desktop/TestNum1.csv", usecols=cols_in, names=col_name, header=None)
fig, ax = plt.subplots()
my_scatter_plot = ax.scatter(df["Time (s)"], df["Band (mb)"])
ax.set_xlabel("Time (s)")
ax.set_ylabel("Band (mb)")
ax.set_title("TestNum1")
plt.show()
You just need to add a for loop over all the files and use glob to collect them.
For example,
import pandas as pd
import matplotlib.pyplot as plt
import glob
cols_in = [1, 3]
col_name = ['Time (s)', 'Band (mb)']
# Select all CSV files on Desktop
files = glob.glob("/user/Desktop/*.csv")
for file in files:
    df = pd.read_csv(file, usecols=cols_in, names=col_name, header=None)
    fig, ax = plt.subplots()
    my_scatter_plot = ax.scatter(df["Time (s)"], df["Band (mb)"])
    ax.set_xlabel("Time (s)")
    ax.set_ylabel("Band (mb)")
    ax.set_title("TestNum1")
    plt.show()
Keeping plt.show() inside the for loop will ensure each plot is plotted. It should be pretty easy to search for 'How to add a title to a plot in python' for answers to your other questions.
I have two files, named "data1.dat" and "data2.dat". I want to take first column of "data1.dat" as xlabel and third column of "data2.dat" as ylabel and make a plot.
How can I do that?
Help please.
You can read both files and store the required column data in NumPy arrays as follows:
import numpy as np
import matplotlib.pyplot as plt
with open('data1.dat', 'r') as f1:
    x = np.genfromtxt(f1)  # assumes data1.dat has a single column
with open('data2.dat', 'r') as f2:
    y = np.genfromtxt(f2)
y = y[:, 2]  # keep only the third column
# plot
plt.figure()
plt.plot(x,y)
plt.show()
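If data1.dat has more than one column, the usecols argument of numpy.genfromtxt can pick the wanted column from each file directly. A minimal sketch, assuming whitespace-delimited numeric files with the same number of rows; the helper name load_xy and its parameters are hypothetical.

```python
import numpy as np


def load_xy(x_path, y_path, x_col=0, y_col=2):
    """Load one zero-based column from each whitespace-delimited file."""
    x = np.genfromtxt(x_path, usecols=x_col)
    y = np.genfromtxt(y_path, usecols=y_col)
    if x.shape != y.shape:
        # Plotting needs one y value per x value.
        raise ValueError("columns differ in length: %s vs %s" % (x.shape, y.shape))
    return x, y
```

The defaults select the first column of the first file and the third column of the second, so plt.plot(*load_xy('data1.dat', 'data2.dat')) would draw the requested line.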
From all the rows of a csv file, I want to keep only two arithmetic values from each row and use them as X-Y pairs for a plot I want to make and later to "feed" them on the code I wrote to cluster them. Any help?
You can use numpy.genfromtxt to only load specific columns from a csv file, using delimiter=',' and the usecols kwarg to select which columns to read.
For example:
import numpy as np
import matplotlib.pyplot as plt
# Create a dummy csv file
from io import StringIO  # Python 3; in Python 2 this was "from StringIO import StringIO"
mycsv = StringIO("""
1.,2.,3.,9.
3.,4.,2.,4.
8.,3.,4.,1.
1.,6.,3.,4.
""")
# Read csv using genfromtxt. Select only the second and fourth columns.
x, y = np.genfromtxt(mycsv, usecols=(1, 3), unpack=True, delimiter=',')
plt.plot(x, y, 'ko')
plt.show()
You can use Python's csv module and list indexing to extract the data and store it in lists.
I find this website's tutorial very enlightening: https://pythonprogramming.net/reading-csv-files-python-3/
You can use the plt.scatter() method to plot the data.
import matplotlib.pyplot as plt
plt.scatter(x,y) # x and y being 2 lists of the coordinates of your values
plt.show()
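The scatter call above expects x and y to exist already. A minimal sketch of building them with the csv module, assuming a comma-delimited file; the function name read_xy and the default column indices are hypothetical, and rows that cannot be parsed as numbers (such as a header) are simply skipped.

```python
import csv


def read_xy(path, x_index=0, y_index=1):
    """Return two lists of floats taken from the chosen zero-based columns."""
    xs, ys = [], []
    with open(path, newline="") as handle:
        for record in csv.reader(handle):
            try:
                x_value = float(record[x_index])
                y_value = float(record[y_index])
            except (IndexError, ValueError):
                continue  # skip blank, short, or non-numeric rows
            xs.append(x_value)
            ys.append(y_value)
    return xs, ys
```

The two returned lists can be passed straight to plt.scatter(x, y), and the same pairs can later be fed to the clustering code.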
I have a csv file which contains two columns where first column is fruit name and second column is count and I need to plot histogram using this csv as input to the code below. How do I make it possible. I just have to show first 20 entries where fruit names will be x axis and count will be y axis from entire csv file of 100 lines.
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv('data.csv', header = None ,quoting=2)
data.hist(bins=10)
plt.xlim([0,100])
plt.ylim([50,500])
plt.title("Data")
plt.xlabel("fruits")
plt.ylabel("Frequency")
plt.show()
I edited the above program to plot a bar chart -
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv('data.csv', sep=',',header=None)
data.values
print data
plt.bar(data[:,0], data[:,1], color='g')
plt.ylabel('Frequency')
plt.xlabel('Words')
plt.title('Title')
plt.show()
but this gives me an error 'Unhashable Type '. Can anyone help on this.
You can use pandas' built-in plotting, but you need to specify that the first column is the index:
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv('data.csv', sep=',',header=None, index_col =0)
data.plot(kind='bar')
plt.ylabel('Frequency')
plt.xlabel('Words')
plt.title('Title')
plt.show()
If you need to use matplotlib, it may be easier to convert the array to a dictionary using data.to_dict() and extract the data to numpy array or something.
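Another route to plain matplotlib is positional indexing. The expression data[:, 0] is NumPy-style indexing, which a DataFrame does not accept; data.iloc selects by position instead. A minimal sketch that also keeps only the first 20 rows, as the question asks. The helper name first_rows and the column names "label" and "count" are hypothetical.

```python
import pandas as pd


def first_rows(csv_path, n=20):
    """Return the labels and counts from the first n rows of a two-column CSV."""
    data = pd.read_csv(csv_path, header=None, names=["label", "count"])
    head = data.iloc[:n]  # positional row slice; data[:, 0] is not valid on a DataFrame
    return head["label"].tolist(), head["count"].tolist()
```

With labels, counts = first_rows('data.csv'), a call such as plt.bar(labels, counts, color='g') would draw one bar per fruit name.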
When plotting with matplotlib, I have to edit my script every time my input file has a different number of columns and rows. Below I have posted the script, which handles 5 columns. But if I have a file with 7 columns and I want to plot the 1st column against the 7th, I have to edit my code again, for example: c0[7], c7=float(elements[7]), C7.append(c7), etc. Is there a way to automate this, so I won't have to keep changing my code each time I have a different number of rows and columns? Thank you.
As input parameters, you can have your data file and provide which columns you want to plot for example (1st col against 6th one). Script will take care of number of columns by itself.
import matplotlib.pyplot as plt
import numpy as np
import matplotlib
infile = open("ewcd.txt","r")
data = infile.readlines()
C0=[]
C1=[]
C2=[]
C3=[]
C4=[]
for line in data:
    elements = line.split()
    try:
        c0 = float(elements[0])
        c1 = float(elements[1])
        c2 = float(elements[2])
        c3 = float(elements[3])
        c4 = float(elements[4])
        C0.append(c0)
        C1.append(c1)
        C2.append(c2)
        C3.append(c3)
        C4.append(c4)
    except IndexError:
        pass
fig, ax = plt.subplots()
plt.yscale('log')
plt.tick_params(axis='both', which='major', labelsize=13)
plt.plot(C0,C1,'b-')
plt.plot(C0,C2,'g-')
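One way to avoid a hand-written list per column is to pass the wanted column indices as a parameter. A minimal sketch, assuming a whitespace-delimited file as in the script above; the function name read_columns is hypothetical, and unlike the original loop it also skips lines containing non-numeric fields.

```python
def read_columns(path, wanted):
    """Read the zero-based columns listed in `wanted` from a whitespace-delimited file.

    Returns a dict mapping each requested column index to a list of floats.
    """
    columns = {index: [] for index in wanted}
    with open(path) as infile:
        for line in infile:
            elements = line.split()
            try:
                # Parse every requested field before storing anything,
                # so a bad line never leaves the lists with unequal lengths.
                values = [float(elements[index]) for index in wanted]
            except (IndexError, ValueError):
                continue  # skip short or non-numeric lines
            for index, value in zip(wanted, values):
                columns[index].append(value)
    return columns
```

For a 7-column file, cols = read_columns("ewcd.txt", [0, 6]) followed by plt.plot(cols[0], cols[6], 'b-') plots the 1st column against the 7th without any change to the reading code. The index list could equally come from command-line arguments.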