Getting wrong readings when trying to plot CSV file using pandas - python

My csv file looks like the following:
As you can see, there are 7 comma-separated columns. I have spent hours trying to read and plot the first column (the one starting with 31364) using the following code:
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv('test.csv', sep=',', header=None, names=['colA','colB','colC','colD','colE','colF','colG'])
y = df['colA']
plt.plot(y)
But the code produces this plot, which does not match the data at all:
I'm using Spyder with Anaconda. What could be the problem?

Is column A all values in the 31,000 range? You're not plotting the whole file.
Edit: I don't know what result you're looking for. In your code, the first column of your CSV is used as the index of the DataFrame: because each line ends with a comma, pandas sees one more field than the seven names you pass and uses the first one as the index. (After reading the CSV, enter df at the Python prompt to see what your dataset looks like.)
If you don't want the first CSV column used as the index, pass index_col=False to read_csv.
Also, it's not a good idea to end lines in a CSV with the delimiter (a comma in this case).
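A minimal in-memory sketch of the trailing-comma effect. The two rows below are hypothetical sample data (only the 31364 start value comes from the question), and io.StringIO stands in for test.csv:

```python
import io
import pandas as pd

# Hypothetical sample: 7 values per row, each line ending with a trailing comma
raw = "31364,1,2,3,4,5,6,\n31365,7,8,9,10,11,12,\n"
names = ['colA', 'colB', 'colC', 'colD', 'colE', 'colF', 'colG']

# Default: 8 fields but only 7 names, so pandas uses the first field as the index
df_bad = pd.read_csv(io.StringIO(raw), header=None, names=names)
print(df_bad['colA'].tolist())  # values from the second field, not the first

# index_col=False stops pandas from using the first column as the index
df_good = pd.read_csv(io.StringIO(raw), header=None, names=names, index_col=False)
print(df_good['colA'].tolist())  # [31364, 31365]
```

With index_col=False, df_good['colA'] can be passed to plt.plot as in the question.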

Related

Seaborn Pairplot with Dataframe vs CSV

I have a DataFrame in a Jupyter notebook and call pairplot on it to plot each variable against the others.
import seaborn as sns
sns.pairplot(df_merge)
Here is the resulting pairplot.
However, it plots the data incorrectly and does not look good. But when I export this DataFrame to a CSV and then read it back into the program as a DataFrame:
import pandas as pd
import seaborn as sns
df_merge.to_csv('dataframe.csv')
x = pd.read_csv('dataframe.csv')
sns.pairplot(x)
seaborn plots it fine and the correlations between variables are visible, but I get an extra column called Unnamed that I don't need.
Does anyone know what could cause this issue and how I can fix it without exporting the DataFrame to a CSV?
When you do:
df_merge.to_csv('dataframe.csv')
you also write the index of df_merge, which has no name. Then
x = pd.read_csv('dataframe.csv')
reads that index back as a column called Unnamed: 0. To fix this, either save the DataFrame without its index:
df_merge.to_csv('dataframe.csv', index=False)
x = pd.read_csv('dataframe.csv')
or read the CSV back with its first column as the index:
df_merge.to_csv('dataframe.csv')
x = pd.read_csv('dataframe.csv', index_col=[0])
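A quick in-memory check of both behaviours, assuming a small hypothetical DataFrame in place of df_merge and an io.StringIO buffer in place of dataframe.csv:

```python
import io
import pandas as pd

# Hypothetical stand-in for df_merge
df_merge = pd.DataFrame({"x": [1.0, 2.0], "y": [3.0, 4.0]})

# Writing with the default index=True adds an unnamed first column
buf = io.StringIO()
df_merge.to_csv(buf)
buf.seek(0)
with_index = pd.read_csv(buf)
print(with_index.columns.tolist())  # ['Unnamed: 0', 'x', 'y']

# Reading the same text with index_col=0 turns that column back into the index
buf.seek(0)
restored = pd.read_csv(buf, index_col=0)
print(restored.columns.tolist())  # ['x', 'y']
```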
I figured out the real issue: after the round trip through CSV, the values in the DataFrame had dtype float64, whereas in my original DataFrame they were all object dtype. Converting all the numerical columns to float before plotting solved the problem.
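The CSV round trip can be skipped by converting the columns directly. A minimal sketch, assuming every column of df_merge holds numeric values stored as object dtype (the sample frame is hypothetical); if some columns are genuinely non-numeric, astype(float) raises and pd.to_numeric(..., errors='coerce') on just the numeric columns is an alternative:

```python
import pandas as pd

# Hypothetical df_merge whose numbers are stored with object dtype
df_merge = pd.DataFrame({
    "gdp": pd.Series([1.5, 2.0, 2.5], dtype=object),
    "cpi": pd.Series(["100.1", "101.3", "102.0"], dtype=object),
})
print(df_merge.dtypes)  # both columns: object

# Convert every column to float64, like the values read back by read_csv
df_numeric = df_merge.astype(float)
print(df_numeric.dtypes)  # both columns: float64

# df_numeric can then be passed straight to seaborn:
# import seaborn as sns
# sns.pairplot(df_numeric)
```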

Pandas cannot load the proper column of the CSV File

I have been having problems importing specific columns of a CSV file. I need to import the Longitude and Latitude columns of the dataset (Fig. 1).
But in Spyder, the Variable Explorer shows the wrong values for the variable (Fig. 2). It looks like my expected column of values is showing up inside the Index column. How do I fix this, i.e. how do I import these columns correctly?
When I click the resize button below the Variable Explorer window, the Index column expands and shows something like Fig. 3.
The code I am using:
import pandas as pd
import numpy as np
dataset = pd.read_csv('dataset.csv',error_bad_lines=False)
X=dataset.loc[:,['latitude','longitude']]
I suggest making a list of column names and reading the CSV like this:
colnames = ["latitude", "longitude",...]
dataset = pd.read_csv('dataset.csv', names=colnames, index_col=0)
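# index_col=0 uses the first column of the file as the index
# and if you must skip malformed lines (error_bad_lines=False before pandas 1.3,
# replaced by on_bad_lines='skip' from pandas 1.3 onward)...
dataset = pd.read_csv('dataset.csv', names=colnames, index_col=0, on_bad_lines='skip')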
When you set error_bad_lines=False, you are telling pandas to skip lines it cannot parse instead of raising an error. The error you got before was telling you exactly what was going wrong:
"Error tokenizing data. C error: Expected 62 fields in line 8, saw 65"
It means some lines have more fields than there are headers, which causes the misalignment when you tell pandas to ignore the problem. You should clean your data by removing the extra columns, or import just the specific columns you need using the headers, as the other answer suggests.
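A minimal sketch of reading the file while skipping malformed lines and then selecting the two coordinate columns by name. The data is hypothetical and held in io.StringIO, and on_bad_lines requires pandas 1.3 or later. Skipping throws the malformed rows away, so it is worth comparing row counts before trusting the result:

```python
import io
import pandas as pd

# Hypothetical data: the last line has one field more than the header
raw = (
    "id,latitude,longitude\n"
    "1,52.5,13.25\n"
    "2,48.125,11.5\n"
    "3,50.0,8.75,extra\n"
)

# on_bad_lines="skip" drops lines with too many fields instead of raising
dataset = pd.read_csv(io.StringIO(raw), on_bad_lines="skip")

# Select the two coordinate columns by their header names
X = dataset.loc[:, ["latitude", "longitude"]]
print(X)
print(f"kept {len(dataset)} of 3 data lines")
```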

Problems with creating a CSV file using Excel

I have some data in an Excel file that I would like to analyze using Python. I started by creating a CSV file using this guide.
So I created a CSV (Comma delimited) file containing the following data:
I wrote a few lines of code in Python using Spyder:
import pandas
colnames = ['GDP', 'Unemployment', 'CPI', 'HousePricing']
data = pandas.read_csv('Dane_2.csv', names = colnames)
GDP = data.GDP.tolist()
print(GDP)
The output is not what I expected:
It is easy to see that the output differs a lot from the figures in the GDP column. I would appreciate any tips or hints that help me deal with this problem.
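It looks like the GDP column contains the decimal parts of the first column in the .csv file together with the first digits of the second column. Either something is wrong with the .csv you created or, more probably, you need to specify the separator in the pandas.read_csv call: Excel's "CSV (Comma delimited)" format uses the system list separator, which is a semicolon in locales where the comma is the decimal mark. If your numbers use a comma as the decimal mark, also pass decimal=','. Passing header=None makes it explicit that the first line is data rather than a header (this is already the default when names is given).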
Try this:
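import pandas
colnames = ['GDP', 'Unemployment', 'CPI', 'HousePricing']
# sep=';' for semicolon-separated fields; decimal=',' if the numbers use a decimal comma
data = pandas.read_csv('Dane_2.csv', names=colnames, header=None, sep=';', decimal=',')
GDP = data.GDP.tolist()
print(GDP)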
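To check the fix without the original file, here is a sketch on hypothetical rows in the style Excel writes in decimal-comma locales (semicolons between fields, commas as decimal marks). The decimal=',' argument only matters if the numbers really are written that way:

```python
import io
import pandas

colnames = ['GDP', 'Unemployment', 'CPI', 'HousePricing']

# Hypothetical rows: ';' separates fields, ',' is the decimal mark
raw = "2,5;7,5;1,5;350\n3,25;6,75;2,0;362\n"

data = pandas.read_csv(io.StringIO(raw), names=colnames, header=None,
                       sep=';', decimal=',')
GDP = data.GDP.tolist()
print(GDP)  # [2.5, 3.25]
```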

Unable to get correct output from tsv file using pandas

I have a TSV file which I am trying to read with pandas. The first two rows of the file are of no use and need to be ignored. However, the output has only two columns: the first is named Index and the second is named after a seemingly random row from the file.
import pandas as pd
data = pd.read_csv('zahlen.csv', sep='\t', skiprows=2)
Please refer to the screenshot below.
The second column name, shown in bold black, is one of the rows from the file. Moreover, using '\t' as the delimiter does not split the values into separate columns. I am using the Spyder IDE. Am I doing something wrong here?
Try this:
data = pd.read_table('zahlen.csv', header=None, skiprows=2)
read_table() uses a tab as its default separator, so it is convenient for TSV files; otherwise it behaves like read_csv(sep='\t'). The important part is header=None, which makes the first remaining row data instead of a header.
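A small in-memory sketch showing what header=None changes, assuming a hypothetical file with two junk lines followed by tab-separated numbers:

```python
import io
import pandas as pd

# Hypothetical file: two lines to skip, then tab-separated values
raw = "junk line 1\njunk line 2\n1\t2\t3\n4\t5\t6\n"

# Default header=0: the first row after skiprows becomes the column names
with_header = pd.read_table(io.StringIO(raw), skiprows=2)
print(with_header.columns.tolist())  # ['1', '2', '3']

# header=None: every remaining row is data and columns are numbered 0, 1, 2
data = pd.read_table(io.StringIO(raw), header=None, skiprows=2)
print(data)
```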

How to plot from .dat file with multiple columns and rows separated with tab spaces

I have 12 columns and around 50 rows in my .dat file, with values separated by tabs. How can I plot the first column against the 12th? I tried the following, but I get an error saying the number of columns is wrong at line 42.
import numpy as np
from matplotlib import pyplot as plt
data = np.loadtxt('filep.dat')  # splits on any whitespace, including tabs
X = data[:, 0]   # first column (indexing is zero-based)
Y = data[:, 11]  # 12th column
plt.plot(X, Y, ':ro')
plt.show()
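The loading code in the question is correct! If np.loadtxt fails, it's because your data is not organized the way you think it is or because you have missing values somewhere in your data. Note also that indexing is zero-based, so the first column is data[:, 0] and the 12th is data[:, 11].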
You could try numpy.genfromtxt(...), which has more options than np.loadtxt for handling missing or bad data.
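A minimal sketch with hypothetical tab-separated data in which one row has an empty field and another has too few columns. genfromtxt fills the empty field with nan, and invalid_raise=False makes it skip the short row (emitting a warning) instead of raising:

```python
import io
import numpy as np

# Hypothetical data: row 2 has an empty field, row 3 has only two columns
raw = "1\t2\t3\n4\t\t6\n7\t8\n10\t11\t12\n"

# delimiter="\t" keeps empty fields, which become nan;
# invalid_raise=False skips rows whose column count does not match
data = np.genfromtxt(io.StringIO(raw), delimiter="\t", invalid_raise=False)
print(data.shape)  # (3, 3)

# Columns are then selected exactly as with loadtxt
X = data[:, 0]
Y = data[:, -1]
```

X and Y can be passed to plt.plot as in the question; rows containing nan simply leave gaps in a line plot.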
