How to plot specific data from a CSV file with matplotlib - python

From all the rows of a CSV file, I want to keep only two numeric values from each row and use them as X-Y pairs for a plot. Later I want to feed them to the code I wrote to cluster them. Any help?

You can use numpy.genfromtxt to only load specific columns from a csv file, using delimiter=',' and the usecols kwarg to select which columns to read.
For example:
import numpy as np
import matplotlib.pyplot as plt
# Create a dummy csv file
from io import StringIO  # Python 3 (in Python 2 this was: from StringIO import StringIO)
mycsv = StringIO("""
1.,2.,3.,9.
3.,4.,2.,4.
8.,3.,4.,1.
1.,6.,3.,4.
""")
# Read csv using genfromtxt. Select only the second and fourth columns (indices 1 and 3).
x, y = np.genfromtxt(mycsv, usecols=(1, 3), unpack=True, delimiter=',')
plt.plot(x, y, 'ko')
plt.show()

You can use Python's csv module and list indexing to extract the data and store it in lists.
I find this website's tutorial very enlightening: https://pythonprogramming.net/reading-csv-files-python-3/
You can use the plt.scatter() method to plot the data.
import matplotlib.pyplot as plt
plt.scatter(x,y) # x and y being 2 lists of the coordinates of your values
plt.show()
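As a minimal sketch of the csv-module approach described above: the snippet below reads an in-memory CSV (a stand-in for a real file, reusing the sample values from the first answer), keeps the second and fourth value of each row, and stores them in two lists. The column indices, the sample data and the file name mentioned in the comment are assumptions for illustration.

```python
import csv
from io import StringIO

# In-memory stand-in for a real CSV file (assumed sample data).
# With a file on disk you would use something like:
#   open("your_file.csv", newline="")   # hypothetical file name
csv_file = StringIO("1.,2.,3.,9.\n3.,4.,2.,4.\n8.,3.,4.,1.\n1.,6.,3.,4.\n")

x, y = [], []
for row in csv.reader(csv_file):
    if not row:  # skip blank lines
        continue
    x.append(float(row[1]))  # second column
    y.append(float(row[3]))  # fourth column

print(x)
print(y)
```

The lists x and y can then be passed to plt.scatter(x, y) as in the snippet above, or handed to the clustering code.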

Related

how to plot a single column graph of a CSV file in python?

I am trying to plot a graph of the training data from the website linked below. The training data has many columns and many rows, but I want to plot it column by column.
I managed to write working code that prints a single column, but I do not know how to plot that column. In the code below, the last two lines are my attempt to plot the single column, but it does not work. Can anyone help me plot that column successfully?
https://archive.ics.uci.edu/ml/datasets/Parkinson+Speech+Dataset+with++Multiple+Types+of+Sound+Recordings
import csv
import matplotlib.pyplot as plt
with open("C://Users/RichardStone/Pycharm/Projects/train_data.csv", "r") as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    for lines in csv_reader:
        print(lines[1])
plt.plot(lines[1])
plt.show()
Why not read data into a pandas dataframe and then plot it using matplotlib?
Something like this should work:
import pandas as pd
import matplotlib.pyplot as plt
file_path = r"path\to\file"  # raw string, so \t and \f are not read as escape sequences
df = pd.read_csv(file_path)
for column in df.columns:
    print(df[column])
    # one figure per column
    plt.figure()
    plt.title(column)
    plt.plot(df[column])
plt.show()
If you're using matplotlib, you've already got numpy, so you could do something like this:
import csv
import matplotlib.pyplot as plt
import numpy
with open('C://Users/RichardStone/Pycharm/Projects/train_data.csv', 'r') as csvfile:
    reader = csv.reader(csvfile)
    # convert strings to numbers, collect everything in a list of lists
    data_list = [[float(item) for item in row] for row in reader if row]
# convert to numpy array for convenient indexing
data = numpy.array(data_list)
column_index = 1
plt.plot(data[:, column_index])
# or plt.plot(data) to show all columns
# or plt.scatter(data[:, 0], data[:, 1]) for a scatter plot
plt.show()

Graphing multiple csv lists into one graph in python

I have 5 csv files that I am trying to put into one graph in python. In the first column of each csv file, all of the numbers are the same, and I want to treat these as the x values for each csv file in the graph. However, there are two more columns in each csv file (to make 3 columns total), but I just want to graph the second column as the 'y-values' for each csv file on the same graph, and ideally get 5 different lines, one for each file. Does anyone have any ideas on how I could do this?
I have already uploaded my files to the variable file_list
Read the first file and build a list of lists in which each inner list holds the two columns (x and y) of one row. Then read the other files one by one and append each file's y value to the inner list at the corresponding index.
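The approach above can be sketched roughly as follows. The file contents are in-memory stand-ins (hypothetical data, three columns each), and the names sources, table, x_values and y_series are illustrative, not taken from the question.

```python
import csv
from io import StringIO

# Hypothetical stand-ins for the CSV files; each has 3 columns and shares
# the same x values in the first column.
sources = [
    StringIO("0,1.0,9\n1,2.0,9\n2,3.0,9\n"),
    StringIO("0,1.5,7\n1,2.5,7\n2,3.5,7\n"),
    StringIO("0,0.5,5\n1,0.7,5\n2,0.9,5\n"),
]

table = []  # one entry per row: [x, y_file0, y_file1, ...]
for file_index, source in enumerate(sources):
    for row_index, row in enumerate(csv.reader(source)):
        if file_index == 0:
            # First file: start each entry with its x and y values.
            table.append([float(row[0]), float(row[1])])
        else:
            # Later files: append only the y value to the matching entry.
            table[row_index].append(float(row[1]))

x_values = [entry[0] for entry in table]
# Transpose the y part so that there is one series per file.
y_series = [list(series) for series in zip(*(entry[1:] for entry in table))]

print(x_values)
print(y_series)
```

Each series in y_series can then be drawn with its own plt.plot(x_values, series) call to get one line per file.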
You can simply call plot more than once; each call adds another line to the same axes. This assumes you have imported it, e.g. from matplotlib.pyplot import plot. You can repeat the same x values or use different ones, and it will still work. Here is an example:
from pathlib import Path
import pandas as pd
import matplotlib.pyplot as plt
files = list(Path("/path/to/folder/with/csvs").glob("*.csv"))
fig, ax = plt.subplots(figsize=(10, 10))
x_col, y_col = "x_column_name", "y_column_name"
for file in files:
    file_name = file.stem
    df = pd.read_csv(file)
    df.plot(x=x_col, y=y_col, ax=ax, label=file_name, legend=True)
fig # If using a jupyter notebook, and you've run a cell with %matplotlib inline
Assuming your files are named File0.csv, File1.csv, File2.csv, File3.csv, File4.csv, you can loop over them, ignore the third column values and plot the x and y values. The following pseudo code will work for 3 columns
import numpy as np
import matplotlib.pyplot as plt
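for i in range(5):
    # delimiter=',' is required for comma-separated files; the third column is discarded
    x, y, _ = np.loadtxt('File%s.csv' % i, delimiter=',', unpack=True)
    plt.plot(x, y, label='File %s' % i)
plt.legend()
plt.show()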

python plotting from two different files

I have two files, named "data1.dat" and "data2.dat". I want to take first column of "data1.dat" as xlabel and third column of "data2.dat" as ylabel and make a plot.
How can I do that?
Help please.
You can read both files and store the required column data in numpy arrays as follows :
import numpy as np
import matplotlib.pyplot as plt
with open('data1.dat', 'r') as f1:
    x = np.genfromtxt(f1)  # assumes data1.dat has a single column; otherwise take x[:, 0]
with open('data2.dat', 'r') as f2:
    y = np.genfromtxt(f2)
y = y[:, 2]  # keep only the third column
# plot
plt.figure()
plt.plot(x,y)
plt.show()

How to plot a CSV file with python? My plot comes up blank

I have the code below that seems to run without issues until I try to plot it. A blank plot will show when asked to plot.
import numpy as np
import matplotlib.pyplot as plt
data = np.genfromtxt('/home/oem/Documents/620157.csv', delimiter=',', skip_header=01, skip_footer=01, names=['x', 'y'])
plt.plot(data,'o-')
plt.show()
I'm not sure what your data looks like, but I believe you need to do something like this:
data = np.genfromtxt('/home/oem/Documents/620157.csv',
                     delimiter=',',
                     skip_header=1,
                     skip_footer=1)
name, x, y, a, b = zip(*data)
plt.plot(x, y, 'o-')
As per your comment, the data is currently an array containing tuples of the station name and the x and y data. Using zip with the * symbol assigns them back to individual variables which can then be used for plotting.
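To illustrate what zip(*data) does, here is a small sketch with made-up rows. The station name and the numbers are hypothetical, chosen only to mirror the five-field layout assumed in the answer.

```python
# Hypothetical rows: (station name, x, y, a, b)
rows = [
    ("ST01", 1.0, 10.0, 0.1, 5),
    ("ST01", 2.0, 12.5, 0.2, 6),
    ("ST01", 3.0, 11.0, 0.3, 7),
]

# zip(*rows) regroups the values by position, i.e. it turns rows into columns.
name, x, y, a, b = zip(*rows)

print(x)  # (1.0, 2.0, 3.0)
print(y)  # (10.0, 12.5, 11.0)
```

Each resulting variable is a tuple holding one column, which is the form plt.plot(x, y, 'o-') expects.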

plot histogram in python using csv file as input

I have a csv file which contains two columns where first column is fruit name and second column is count and I need to plot histogram using this csv as input to the code below. How do I make it possible. I just have to show first 20 entries where fruit names will be x axis and count will be y axis from entire csv file of 100 lines.
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv('data.csv', header = None ,quoting=2)
data.hist(bins=10)
plt.xlim([0,100])
plt.ylim([50,500])
plt.title("Data")
plt.xlabel("fruits")
plt.ylabel("Frequency")
plt.show()
I edited the above program to plot a bar chart -
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv('data.csv', sep=',',header=None)
data.values
print data
plt.bar(data[:,0], data[:,1], color='g')
plt.ylabel('Frequency')
plt.xlabel('Words')
plt.title('Title')
plt.show()
but this gives me an error 'Unhashable Type '. Can anyone help on this.
You can use pandas' built-in plot, but you need to specify that the first column is the index:
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv('data.csv', sep=',',header=None, index_col =0)
data.plot(kind='bar')
plt.ylabel('Frequency')
plt.xlabel('Words')
plt.title('Title')
plt.show()
If you need to use matplotlib directly, it may be easier to convert the DataFrame to a dictionary using data.to_dict() and extract the data into a numpy array or similar.
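For the plain-matplotlib route, the names and counts first have to be pulled out as two sequences. The sketch below does this with the standard csv module rather than data.to_dict() (a swapped-in technique, not the one the answer proposes) and keeps only the first few rows, as the question asks. The fruit data is made up, and first_rows and max_rows are hypothetical names.

```python
import csv
from io import StringIO
from itertools import islice


def first_rows(csv_file, max_rows):
    """Return (names, counts) for the first max_rows non-blank rows of a two-column CSV."""
    non_blank = (row for row in csv.reader(csv_file) if row)
    names, counts = [], []
    for row in islice(non_blank, max_rows):
        names.append(row[0])        # first column: fruit name
        counts.append(int(row[1]))  # second column: count
    return names, counts


# Made-up sample data; a real file would be opened with open('data.csv', newline='')
sample = StringIO("apple,120\nbanana,95\ncherry,300\ndate,60\n")
names, counts = first_rows(sample, max_rows=3)
print(names, counts)
```

With max_rows=20, the two lists can be passed to plt.bar(names, counts, color='g') to show the first 20 entries.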
