How to get labels with numpy loadtxt? - python

I have a data file in the form of
Col0 Col1 Col2
2015 1 4
2016 2 3
The data is float, and I use numpy's loadtxt to make an ndarray. However, I need to skip the label row to get an array of the data. How can I build the ndarray from the data while still reading the labels?
import numpy as np
import matplotlib.pyplot as plt
data = np.loadtxt("data.csv", skiprows=1)
# I need to skip the first row in reading the data but still get the labels.
x= data[:,0]
a= data[:,1]
b= data[:,2]
plt.xlabel(COL0) # Reading the COL0 value from the file.
plt.ylabel(COL1) # Reading the COL1 value from the file.
plt.plot(x,a)
NOTE: The labels (column titles) are unknown in the script. The script should be generic to work with any input file of the same structure.

With genfromtxt and names=True, the header row becomes the field names of a structured array, available as the tuple data.dtype.names. You can index the data by name, and get an individual name into a variable with data.dtype.names[n], where n is the column index.
import numpy as np
import matplotlib.pyplot as plt
data = np.genfromtxt('data.csv', names=True)
x = data[data.dtype.names[0]] # In this case this equals data['Col0'].
a = data[data.dtype.names[1]]
b = data[data.dtype.names[2]]
plt.figure()
plt.plot(x, a)
plt.xlabel(data.dtype.names[0])
plt.ylabel(data.dtype.names[1])
plt.show()
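If you want to stay with loadtxt itself, one option is to read the header line yourself and let loadtxt skip it. A minimal sketch, assuming the whitespace-separated layout shown in the question; load_with_labels is a hypothetical helper name, and the temporary file only stands in for data.csv:

```python
import os
import tempfile

import numpy as np
import matplotlib.pyplot as plt


def load_with_labels(path):
    """Return (labels, data): header names from line 1, floats from the rest."""
    with open(path) as f:
        labels = f.readline().split()  # whitespace-separated header row
    data = np.loadtxt(path, skiprows=1, ndmin=2)  # skip header, always 2-D
    return labels, data


# Write the sample from the question to a temporary file for the demo
sample = "Col0 Col1 Col2\n2015 1 4\n2016 2 3\n"
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

labels, data = load_with_labels(path)
os.remove(path)

x, a = data[:, 0], data[:, 1]
plt.plot(x, a)
plt.xlabel(labels[0])  # label text comes from the file, not the script
plt.ylabel(labels[1])
plt.show()
```

The file is opened twice here for simplicity; the labels never need to be known in the script.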

This is not really an answer to the actual question, but I feel you might be interested in knowing how to do the same with pandas instead of numpy.
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("data.csv", sep=r"\s+")  # whitespace-separated; delim_whitespace=True is deprecated in pandas 2.2
df.set_index(df.columns[0]).plot()
plt.show()
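would result in a line plot of the remaining columns against the first one, with a legend and an x-axis label taken from the header.
As can be seen, there is no need to know any column name, and the plot is labeled automatically.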
Of course, the data can also be plotted with matplotlib directly:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("data.csv", sep=r"\s+")
x = df[df.columns[0]]
a = df[df.columns[1]]
b = df[df.columns[2]]
plt.figure()
plt.plot(x, a)
plt.xlabel(df.columns[0])
plt.ylabel(df.columns[1])
plt.show()

Related

Trying to resolve problem in pandas-python

I have one question. I have point cloud data and need to read and plot the points. If anyone can help me, I would be very thankful. I am using Python (pandas, matplotlib, ...) and have all the X, Y, Z values, but I don't know how to plot them to get a 3D plot. The values are taken from point cloud data with 170 rows and 254 combinations of x, y, z, I, N values.
https://datalore.jetbrains.com/notebook/n9MPhjVrtrIoU1buWmQuDh/MT7MrS1buzmbD7VSDqhGqu/
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d
import pandas as pd
df1 = pd.read_csv('cloud.txt',delimiter='\t')
pd.set_option('display.max_columns', None)
df1 = df1.apply (pd.to_numeric, errors='coerce')
#cloud.dropna()
df1.fillna(0,axis=0,inplace=True)
df2=df1.iloc[:,:-1]
df2.head(170)
kolone=[]
i=1
while i<6:
    kolone.append(i)
    i=i+1
display(kolone)
c=[]
columns=kolone*224
c=c+columns
df2.columns=c
display(df2)
# Reading the points: column 1 is the x value, column 2 the y value and
# column 3 the z value. Columns 4 and 5 are intensity and noise values and
# are not important for this.
# The first row is replaced with the column numbering: values 1,2,3,4,5
# stand for the x,y,z,I,N values.
x=df2[1]
y=df2[2]
z=df2[3]
r=[]
i=1
while i<225:
    r.append(i)
    i=i+1
#print(r)
x.columns=r
display(x)
#Reading x coordinates--224 values of x
i=1
p=[]
while i<225:
    p.append(i)
    i=i+1
#print(p)
y.columns=p
display(y)
#Reading y coordinates--224 values of y
i=1
q=[]
while i<225:
    q.append(i)
    i=i+1
#print(q)
z.columns=q
display(z)
#Reading z coordinates--224 values of z
It is a bit upsetting that you haven't tried anything at all yet. The documentation page for matplotlib's 3D scatter plot includes a complete example.
There is no point in going to all that trouble to assign column names. Indeed, there is really no point in using pandas at all for this; you could read the CSV directly into a numpy array. However, assuming you have a dataframe with unnamed columns, it's still pretty easy.
In this code, I create a 50x3 array of random integers, then I pull the columns as lists and pass them to scatter. You ought to be able to adapt this to your own code.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
data = np.random.randint( 256, size=(50,3))
df = pd.DataFrame(data)
x = df[0].tolist()
y = df[1].tolist()
z = df[2].tolist()
fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.scatter( x, y, z )
plt.show()
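If the wide frame really repeats its columns in x, y, z, I, N order, as the comments in the question describe, positional slicing with iloc can stand in for the renaming loops. A rough sketch, assuming that five-column layout and using random numbers (170 rows, 224 points per row) in place of cloud.txt; split_xyz and FIELDS_PER_POINT are hypothetical names:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

FIELDS_PER_POINT = 5  # assumed layout: x, y, z, I, N repeated across the columns


def split_xyz(df):
    """Return flat x, y, z arrays from a wide frame whose columns cycle x, y, z, I, N."""
    x = df.iloc[:, 0::FIELDS_PER_POINT].to_numpy().ravel()
    y = df.iloc[:, 1::FIELDS_PER_POINT].to_numpy().ravel()
    z = df.iloc[:, 2::FIELDS_PER_POINT].to_numpy().ravel()
    return x, y, z


# Synthetic stand-in for cloud.txt: 170 rows, 224 points per row, 5 fields each
rng = np.random.default_rng(0)
df2 = pd.DataFrame(rng.random((170, 224 * FIELDS_PER_POINT)))

x, y, z = split_xyz(df2)

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.scatter(x, y, z, s=1)  # small markers, since there are many points
plt.show()
```

Because every point is flattened into one array per axis, a single scatter call draws the whole cloud; the intensity and noise columns are simply never selected.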

Plotting a CSV-file with time using matplotlib

I have recently started a project where I need to evaluate and plot data using Python. The CSV files I have to plot are structured like this:
date,ch1,ch2,ch3,date2
11:56:20.149766,0.909257531,0.909420371,1.140183687, 13:56:20.149980
11:56:20.154008,0.895447016,0.895601869,1.122751355, 13:56:20.154197
11:56:20.157245,0.881764293,0.881911397,1.105638862, 13:56:20.157404
11:56:20.160590,-0.009178977,-0.000108901,-1.486875653, 13:56:20.160750
11:56:20.190473,-1.473576546,-1.477073431,-1.846657276, 13:56:20.190605
11:56:20.193810,-1.460405469,-1.463766813,-1.8300246, 13:56:20.193933
11:56:20.197139,-1.447362065,-1.450844049,-1.813711882, 13:56:20.197262
11:56:20.200480,-1.434574604,-1.437921286,-1.797878742, 13:56:20.200604
11:56:20.203803,-1.422042727,-1.425382376,-1.782045603, 13:56:20.203926
11:56:20.207136,-1.40951097,-1.412971258,-1.7663728, 13:56:20.207258
11:56:20.210472,-0.436505407,-0.438260257,-0.54675138, 13:56:20.210595
11:56:20.213804,0.953246772,0.953690529,1.19551909, 13:56:20.213921
11:56:20.217136,0.93815738,0.938464701,1.176487565, 13:56:20.217252
11:56:20.220472,0.923707485,0.924006522,1.158255577, 13:56:20.220590
11:56:20.223807,0.909385324,0.909676254,1.140343547, 13:56:20.223922
11:56:20.227132,0.895447016,0.895729899,1.122911215, 13:56:20.227248
11:56:20.230466,0.881892085,0.882039428,1.105798721, 13:56:20.230582
I can already read the file and print it using pandas:
df = pd.read_csv (r'F:\Schule\HTL\Diplomarbeit\aw_python\datei_meas.csv')
print (df)
But now I want to plot the file using matplotlib. The first column, date, should be on the x-axis, and columns 2, 3 and 4 should be the y-values of separate graphs.
I hope someone can help me with my problem.
Kind regards
Matthias
Edit:
This is what I have tried in order to convert the date column into a format matplotlib can read:
import matplotlib.pyplot as plt
import numpy as np
import mplcursors
import pandas as pd
import matplotlib.dates as mdates
df = pd.read_csv (r'F:\Schule\HTL\Diplomarbeit\aw_python\datei_meas.csv')
print (df)
x_list = df.date
y = df.ch1
x = mdates.date2num(x_list)
plt.scatter(x,y)
plt.show()
And this is the occurring error message:
d = d.astype('datetime64[us]')
ValueError: Error parsing datetime string " 11:56:20.149766" at position 3
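The traceback suggests the raw time strings are being handed to date2num, which tries to convert them to datetime64 and cannot parse them. One way around it is to let pandas parse the times with pd.to_datetime before plotting. A minimal sketch, assuming the column names shown above and embedding the first rows of the sample instead of reading the F:\ path:

```python
import io

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# First rows of the sample file; replace io.StringIO(CSV) with the real path
CSV = """date,ch1,ch2,ch3,date2
11:56:20.149766,0.909257531,0.909420371,1.140183687, 13:56:20.149980
11:56:20.154008,0.895447016,0.895601869,1.122751355, 13:56:20.154197
11:56:20.157245,0.881764293,0.881911397,1.105638862, 13:56:20.157404
11:56:20.160590,-0.009178977,-0.000108901,-1.486875653, 13:56:20.160750
"""

# skipinitialspace drops the blank after each comma (e.g. before date2)
df = pd.read_csv(io.StringIO(CSV), skipinitialspace=True)
# Strip stray spaces, then parse time-of-day strings into datetime64 values
df["date"] = pd.to_datetime(df["date"].str.strip(), format="%H:%M:%S.%f")

fig, ax = plt.subplots()
for col in ["ch1", "ch2", "ch3"]:
    ax.plot(df["date"], df[col], label=col)  # one line per channel
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M:%S.%f"))
ax.set_xlabel("date")
ax.legend()
fig.autofmt_xdate()  # rotate the long time labels so they don't overlap
plt.show()
```

Since the file only contains times, pandas attaches a placeholder date; the formatter hides it so only the time of day appears on the axis.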

Plotting from panda frame

This seems like a very simple thing, but I can't manage it. I have a pandas DataFrame like this http://prntscr.com/ko8lyd and I now want to plot one column on the x-axis and another column on the y-axis. Here is what I tried:
import matplotlib.pyplot as plt
x = ATR_7
y = Vysledek
plt.scatter(x,y)
plt.show()
This is the error I am getting:
<ipython-input-116-5ead5868ec87> in <module>()
1 import matplotlib.pyplot as plt
----> 2 x = ATR_7
3 y = Vysledek
4 plt.scatter(x,y)
5 plt.show()
Where am I going wrong?
You just need:
df.plot.scatter('ATR_7','Vysledek')
where df is the name of your dataframe. There's no need to call matplotlib directly.
You are trying to use undefined variables. ATR_7 is the name of a column inside your dataframe; it is not a variable that the rest of your code knows about.
Try something like:
df.plot.scatter(x='ATR_7', y='Vysledek')
assuming your dataframe name is df
If you want to use matplotlib directly, extract the x and y values from the dataframe (for example as lists) and pass them to plt.scatter:
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import style
style.use('ggplot')
%matplotlib inline
x = list(df['ATR_7']) # set x axis by creating a list
y = list(df['Vysledek']) # set y axis by creating a list
plt.scatter(x,y)
It seems there were two issues in your code. First, the column names were not in quotes, so Python has no way of knowing they are strings (the column names here are strings). Second, the easiest way to plot columns of a dataframe is to use the pandas plotting functions. You are trying to draw a scatter plot with matplotlib, which needs the actual data, not a bare column name, unless you also pass data=.
First, let's load the modules and create the data:
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
d = {'ATR_7' : pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
'Vysledek' : pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
Then, you can either use pandas plotting as in
x = 'ATR_7'
y = 'Vysledek'
df.plot.scatter(x,y)
Or plain-old matplotlib plotting as in
x = df['ATR_7']
y = df['Vysledek']
plt.scatter(x,y)
Scatter does not know which data to use. You need to provide it with the data.
x = "ATR_7"
y = "Vysledek"
plt.scatter(x,y, data=df)
under the assumption that df is your dataframe and has columns named "ATR_7" and "Vysledek".

Box Plot of Many Pandas Dataframes

I have three dataframes, each containing 17 sets of data with groups A, B, and C, as shown in the following code snippet:
import pandas as pd
import numpy as np
data1 = pd.DataFrame(np.random.rand(17,3), columns=['A','B','C'])
data2 = pd.DataFrame(np.random.rand(17,3)+0.2, columns=['A','B','C'])
data3 = pd.DataFrame(np.random.rand(17,3)+0.4, columns=['A','B','C'])
I would like to plot a box plot to compare the three groups as shown in the figure below
I am trying to make the plot using seaborn's boxplot as follows:
import seaborn as sns
sns.boxplot(data1, groupby='A','B','C')
but obviously this does not work. Can someone please help?
Consider assigning an indicator like Location to distinguish your three sets of data. Then concatenate all three and melt the data to retrieve one value column, one Letter categorical column, and one Location column, all inputs into sns.boxplot:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
data1 = pd.DataFrame(np.random.rand(17,3), columns=['A','B','C']).assign(Location=1)
data2 = pd.DataFrame(np.random.rand(17,3)+0.2, columns=['A','B','C']).assign(Location=2)
data3 = pd.DataFrame(np.random.rand(17,3)+0.4, columns=['A','B','C']).assign(Location=3)
cdf = pd.concat([data1, data2, data3])
mdf = pd.melt(cdf, id_vars=['Location'], var_name='Letter')
print(mdf.head())
# Location Letter value
# 0 1 A 0.223565
# 1 1 A 0.515797
# 2 1 A 0.377588
# 3 1 A 0.687614
# 4 1 A 0.094116
ax = sns.boxplot(x="Location", y="value", hue="Letter", data=mdf)
plt.show()

plot histogram in python using csv file as input

I have a CSV file with two columns: the first is the fruit name and the second is the count. I need to plot a histogram using this CSV as input to the code below. How can I do this? I only need to show the first 20 entries of the 100-line file, with fruit names on the x-axis and counts on the y-axis.
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv('data.csv', header = None ,quoting=2)
data.hist(bins=10)
plt.xlim([0,100])
plt.ylim([50,500])
plt.title("Data")
plt.xlabel("fruits")
plt.ylabel("Frequency")
plt.show()
I edited the above program to plot a bar chart:
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv('data.csv', sep=',',header=None)
data.values
print(data)
plt.bar(data[:,0], data[:,1], color='g')
plt.ylabel('Frequency')
plt.xlabel('Words')
plt.title('Title')
plt.show()
but this gives me an 'unhashable type' error. Can anyone help with this?
You can use pandas' built-in plotting, although you need to specify that the first column is the index:
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv('data.csv', sep=',',header=None, index_col =0)
data.plot(kind='bar')
plt.ylabel('Frequency')
plt.xlabel('Words')
plt.title('Title')
plt.show()
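If you need to use matplotlib directly, note that the error comes from indexing the DataFrame NumPy-style with data[:,0]. Select the columns with data.iloc[:, 0] and data.iloc[:, 1] instead, or convert the data first, for example to a dictionary with data.to_dict() or to a NumPy array with data.to_numpy().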
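A minimal sketch of the plain-matplotlib route, limited to the first 20 rows as the question asks. It assumes a headerless two-column file; the column names fruit and count and the inline sample rows are hypothetical stand-ins for data.csv:

```python
import io

import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for data.csv: fruit name, count, no header row
CSV = "apple,120\nbanana,80\ncherry,45\ndate,300\nelderberry,60\n"

data = pd.read_csv(io.StringIO(CSV), header=None, names=["fruit", "count"])
top = data.head(20)  # keep only the first 20 rows

fig, ax = plt.subplots()
# Select columns by label (or with data.iloc[:, 0]), not NumPy-style data[:, 0]
ax.bar(top["fruit"], top["count"], color="g")
ax.set_xlabel("Fruits")
ax.set_ylabel("Count")
ax.set_title("Title")
ax.tick_params(axis="x", labelrotation=45)  # long names stay readable
plt.show()
```

Strictly speaking this is a bar chart of precomputed counts rather than a histogram; hist() would bin the counts themselves, which is not what the question describes.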
