Taking data from specific columns in a dataset - python

I need to take data from only 3 columns in my dataset. How do I do this? I am trying to make a correlation graph. This is my code:
import matplotlib.pyplot as plt
import pandas as pd
crimedata = pd.read_csv('MasterFileCSV.csv')
crime_df = pd.DataFrame(crimedata)
plt.matshow(crime_df.corr())
plt.show()
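One way to restrict the correlation to three columns is to index the DataFrame with a list of column labels before calling corr(). The sketch below is only an illustration: the column names are hypothetical (the question does not list them), and a small synthetic DataFrame stands in for MasterFileCSV.csv so that the example runs on its own. If the other columns are never needed, passing the same list as usecols to pd.read_csv should also work.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for MasterFileCSV.csv; all column names are hypothetical.
rng = np.random.default_rng(0)
crime_df = pd.DataFrame({
    "burglary": rng.normal(size=50),
    "robbery": rng.normal(size=50),
    "assault": rng.normal(size=50),
    "population": rng.normal(size=50),
})

# Keep only the three columns of interest by passing a list of labels.
wanted = ["burglary", "robbery", "assault"]
subset = crime_df[wanted]

# Correlation matrix of the selected columns only (3 x 3).
corr = subset.corr()

# Draw the matrix and label both axes with the column names.
fig, ax = plt.subplots()
image = ax.matshow(corr)
ax.set_xticks(range(len(wanted)))
ax.set_xticklabels(wanted, rotation=45, ha="left")
ax.set_yticks(range(len(wanted)))
ax.set_yticklabels(wanted)
fig.colorbar(image)
```

In an interactive session, calling plt.show() afterwards displays the figure.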

Related

Plot Correlation Table imported from excel with Python

I am trying to plot a correlation matrix (already calculated) in Python. The table looks like this:
And I would like it to look like this:
I am using the following code in Python:
import seaborn as sn
import matplotlib.pyplot as plt
import pandas as pd
data =pd.read_excel('/Desktop/wetchimp_global/corr/correlation_matrix.xlsx')
df = pd.DataFrame(data)
print (df)
corrMatrix = data.corr()
print (corrMatrix)
sn.heatmap(corrMatrix, annot=True)
plt.show()
Note that the matrix is ready and I don't want to calculate the correlation again, but I have not managed to avoid that. Any suggestions?
You are recalculating the correlation with the following line:
corrMatrix = data.corr()
You then go on to utilize this recalculated variable in the heatmap here:
sn.heatmap(corrMatrix, annot=True)
plt.show()
To resolve this, pass the data read from Excel (data, or df, which holds the same contents as data) to the heatmap instead of the recalculated corrMatrix. All the code you should need is:
import seaborn as sn
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_excel('/Desktop/wetchimp_global/corr/correlation_matrix.xlsx')
sn.heatmap(data, annot=True)
plt.show()
Note, however, that this assumes your data is already in a form the heatmap can use, as you suggest. Since we do not have access to your data, we cannot confirm that.
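A common reason a ready-made matrix is not yet "heatmap ready" is that its first column holds the row names as text. A minimal sketch of one way to handle that is to load the name column as the index, so that only numbers remain in the body of the DataFrame. The three-row matrix and its values below are invented for illustration, and read_csv on an in-memory string stands in for read_excel, which accepts the same index_col argument.

```python
import io
import pandas as pd

# Hypothetical miniature of a precomputed correlation matrix.
# The first column holds the row names; the values are made up.
raw = io.StringIO(
    "name,CLC,GIEMS,GLWD\n"
    "CLC,1.0,0.8,0.6\n"
    "GIEMS,0.8,1.0,0.7\n"
    "GLWD,0.6,0.7,1.0\n"
)

# index_col=0 turns the name column into the row index,
# leaving a purely numeric square table.
corr_matrix = pd.read_csv(raw, index_col=0)

# Sanity checks before plotting: square, and rows labelled like columns.
is_square = corr_matrix.shape[0] == corr_matrix.shape[1]
labels_match = list(corr_matrix.index) == list(corr_matrix.columns)

# No .corr() call is needed; the table can be plotted as it is, e.g.
#   sn.heatmap(corr_matrix, annot=True)
```

With the names in the index, the heatmap should pick up the row and column labels without a separate yticklabels list.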
I deleted the first column (the names) and added the labels back afterwards, so the code is as below:
import seaborn as sn
import matplotlib.pyplot as plt
import pandas as pd
data =pd.read_excel('/Users/yousefalbuhaisi/Desktop/wetchimp_global/corr/correlation_matrix.xlsx')
fig, ax = plt.subplots(dpi=150)
y_axis_labels = ['CLC','GIEMS','GLWD','LPX_BERN','LPJ_WSL','LPJ_WHyME','SDGVM','DLEM','ORCHIDEE','CLM4ME']
sn.heatmap(data,yticklabels=y_axis_labels, annot=True)
plt.show()
and the results are:

How do I find covariance and correlation?

I have 2 data sets saved in a CSV file, with the column names "avg" and "hu". I want to find the covariance and correlation values of these two data sets. I tried it with some simple code, but I got an error every time. What am I doing wrong?
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data=pd.read_csv("80hucov.csv")
avg=data["avg"]
hu=data["hu"]
data = np.array(["avg, hu"])
covMatrix = np.cov(data,bias=True)
print (covMatrix)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data=pd.read_csv("80hucov.csv")
data = {'A': ["avg"],
'B': ["hu"],}
df = pd.DataFrame(data,columns=['A','B'])
covMatrix = pd.DataFrame.cov(df)
sn.heatmap(covMatrix, annot=True, fmt='g')
plt.show()
It seems you need to change how you define the array.
Currently you have:
data = np.array(["avg, hu"])
You can do:
data_array = data[['avg', 'hu']].to_numpy()
I recommend using different names for different objects within your code. In your example you use "data" for both your DataFrame and your array.
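Putting this together, here is a small sketch of computing both the covariance and the correlation. The numbers are made up and stand in for 80hucov.csv; only the column names "avg" and "hu" come from the question. One detail to watch: to_numpy() on two columns gives an array with one row per observation, so np.cov and np.corrcoef need rowvar=False (or a transposed array) to treat the columns as the variables.

```python
import numpy as np
import pandas as pd

# Made-up values standing in for 80hucov.csv; only the column names are real.
df = pd.DataFrame({
    "avg": [1.0, 2.0, 3.0, 4.0, 5.0],
    "hu": [2.0, 4.0, 6.0, 8.0, 10.0],
})

# Shape (n_rows, 2): one row per observation, one column per variable.
values = df[["avg", "hu"]].to_numpy()

# rowvar=False tells NumPy that the columns are the variables.
# bias=True divides by N (population covariance), as in the question.
cov_np = np.cov(values, rowvar=False, bias=True)
corr_np = np.corrcoef(values, rowvar=False)

# The pandas equivalents work on the DataFrame directly.
# DataFrame.cov() divides by N - 1 (sample covariance) by default.
cov_pd = df[["avg", "hu"]].cov()
corr_pd = df[["avg", "hu"]].corr()

print(cov_np)
print(corr_pd)
```

The NumPy and pandas covariance values differ here only because of the N versus N - 1 divisor; the correlation is the same either way.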

Working with Electrodermal data from Empatica E4 - how to plot with time?

I'm working with electrodermal data imported from an Empatica E4. I want to create descriptives, z-score the data and then plot it. I've managed to get this far with the code below:
# Import packages
import pandas as pd
# Download data
df = pd.read_csv("EDA.csv")
# Plot it
df.plot()
import pandas as pd
from scipy.stats import zscore
df = pd.DataFrame(pd.read_csv('EDA.csv', sep=','))
print(df.describe())
df = df.apply(zscore) # Normalization
print(df.describe())
print (df)
import matplotlib.pyplot as plt
plt.plot(df)
Here's my output:
Descriptives
Z SCORE plot
I want to change the x axis so that it shows time rather than the data point number. What I'm stuck on is how to read in the EDA.csv data at its 4 Hz sample rate and include that in my plot.
Thanks in advance!
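A possible approach, sketched under an assumption about the file layout: that the first row of EDA.csv is the session start time as a Unix timestamp, the second row is the sample rate in Hz, and the samples follow from the third row on. If your file is laid out differently, the two header reads need adjusting. The sample values below are invented, and an in-memory string stands in for the file.

```python
import io
import numpy as np
import pandas as pd

# Invented contents in the assumed layout:
# row 1 = start time (Unix seconds), row 2 = sample rate (Hz), rest = samples.
raw = io.StringIO(
    "1600000000.0\n4.0\n0.10\n0.12\n0.15\n0.11\n0.09\n0.13\n0.14\n0.16\n"
)

# Read the two header rows on their own.
header = pd.read_csv(raw, header=None, nrows=2)
start_time = float(header.iloc[0, 0])
sample_rate = float(header.iloc[1, 0])

# Rewind and read the samples, skipping the two header rows.
raw.seek(0)
eda = pd.read_csv(raw, header=None, skiprows=2, names=["eda"])

# Sample i was taken i / sample_rate seconds after the start.
seconds = np.arange(len(eda)) / sample_rate
eda["seconds"] = seconds

# z-score with the population standard deviation (scipy's zscore default).
eda["z"] = (eda["eda"] - eda["eda"].mean()) / eda["eda"].std(ddof=0)

# Optional: use wall-clock timestamps as the index.
eda.index = pd.to_datetime(start_time + seconds, unit="s")
```

From here, eda.plot(x="seconds", y="z") should put elapsed seconds on the x axis, and eda["z"].plot() should use the timestamp index instead.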

Histogram for multiple dataframes with different sizes in Pandas?

I am trying to generate a histogram with multiple series and a legend (example). The problem is that the DataFrames have different lengths. The following code would have worked if the sizes (30 and 10 in this example) were the same. Is there a way to still generate a histogram that lets me compare multiple data series?
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
orig = pd.DataFrame(np.random.random(30))
short = pd.DataFrame(np.random.random(10))
combine = pd.DataFrame({'orig' : orig, 'short' : short})
plt.figure()
h = combine.plot(kind='hist', logy=True)
f = h.get_figure()
f.savefig('figures/combined.png')
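One possible way around the length mismatch, offered as a sketch and not the only option: hold each data set as a named Series and join them with pd.concat(axis=1). The shorter column is padded with NaN, which the histogram plot should skip. The seeded random generator and the alpha and bins settings are choices made for this example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Two series of different lengths; the names become the legend entries.
orig = pd.Series(rng.random(30), name="orig")
short = pd.Series(rng.random(10), name="short")

# concat aligns on the index, so rows 10..29 of "short" are filled with NaN.
combine = pd.concat([orig, short], axis=1)

# Both series share the same bins; alpha keeps overlapping bars visible.
ax = combine.plot(kind="hist", logy=True, alpha=0.5, bins=10)
fig = ax.get_figure()
```

Calling fig.savefig(...) afterwards writes the figure to disk, provided the target directory exists.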

How can I show True/False graph on Python

I have a CSV file and I want to show its data on a graph. It has date, place and status columns, but I don't need place, so I fetch the data like this.
And it goes on like this
Here is my code. How can I get a graph with 1-0 values according to the date value? Which method should I use? Thanks
import pandas as pd
import matplotlib.pyplot as plt

rows_list = []
# header=None labels the columns 0, 1, 2 (date, place, status)
df = pd.read_csv('filepath', header=None, parse_dates=True)
for row in df.iterrows():
    # row is an (index, Series) pair, so row[1] holds the values
    if row[1][1] == 'Beweging in de living':
        if row[1][2] == 'OPEN':
            rows_list.append([row[1][0], '1'])
        else:
            rows_list.append([row[1][0], '0'])
df2 = pd.DataFrame(rows_list)
df3 = df2.set_index(0)
print(df3)
plt.plot(df3)
plt.show()
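A sketch of one way to get a numeric 1/0 series indexed by date without the row loop. The sample rows are invented: the second place name and the "CLOSED" status are assumptions, since the question only shows 'Beweging in de living' and 'OPEN'. The column names date, place and status are labels chosen here for the three unnamed columns.

```python
import io
import pandas as pd

# Invented rows in the described layout: date, place, status (no header row).
raw = io.StringIO(
    "2015-01-01 08:00:00,Beweging in de living,OPEN\n"
    "2015-01-01 08:05:00,Voordeur,CLOSED\n"
    "2015-01-01 08:10:00,Beweging in de living,CLOSED\n"
    "2015-01-01 08:15:00,Beweging in de living,OPEN\n"
)

df = pd.read_csv(
    raw,
    header=None,
    names=["date", "place", "status"],
    parse_dates=["date"],
)

# Keep only the rows for the sensor of interest.
living = df[df["place"] == "Beweging in de living"]

# True/False for OPEN, converted to integers 1/0 and indexed by date.
state = living.set_index("date")["status"].eq("OPEN").astype(int)

print(state)
```

Because the values are integers, not the strings '1' and '0', they plot on a numeric axis; state.plot(drawstyle="steps-post") should draw the series as a step line, which suits an on/off signal.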
