I have an Excel sheet with 15 rows and 1445 columns (24*60 + 5). The 1440 (24*60) columns contain time series data.
I have the following Python code.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from matplotlib.backends.backend_pdf import PdfPages
a=pd.read_csv('test.csv')
print('a.size {}'.format(len(a.axes[0])))
print('a.size {}'.format(len(a.axes[1])))
for x in a.iterrows():
    x[1][4:].plot(label=str(x[1][0])+str(x[1][1])+str(x[1][2])+str(x[1][3]))
I get the following output.
a.size 15
a.size 1024
For some reason the number of columns is being truncated to 1024. Is that a limitation of the machine I am running on, or is it something else? How do I get around this limitation?
Some spreadsheet viewers limit the number of columns they display. For example, I have a CSV file with 4097 columns, but LibreOffice shows only 1024 of them.
However, the CSV file itself usually contains all the columns. To check that the exported CSV file has the right column count, open it in any text editor. If the count is wrong there too, the problem is in the code that exported the CSV.
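The same check can be done from Python by counting the fields on each raw line and comparing that with what pandas loads. This is only a minimal sketch: the helper names raw_column_counts and loaded_column_count are hypothetical, and a comma-delimited file is assumed.

```python
import csv

import pandas as pd


def raw_column_counts(path, delimiter=","):
    """Return the set of distinct field counts found across all lines of the file."""
    with open(path, newline="") as handle:
        return {len(row) for row in csv.reader(handle, delimiter=delimiter)}


def loaded_column_count(path):
    """Return the number of columns pandas sees after parsing the same file."""
    return pd.read_csv(path).shape[1]
```

If raw_column_counts('test.csv') contains only 1024, the file was already truncated when it was exported. If it contains 1445 and pandas reports the same number, only the viewer was hiding columns.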
Related
I have a feather datafile that weighs approximately 300 MB; call it df.ftr. I can read it with Pandas using the following command:
import pandas as pd
df = pd.read_feather('df.ftr')
However, this dataset contains over 21 million rows and its size overflows my local computer's memory. What I would like to do is to read only the first 1 million rows.
If it were a df.h5 file, I would read it using the stop argument of the read_hdf method:
import pandas as pd
df = pd.read_hdf('df.h5', 'table', stop=1000000)
However, after checking the read_feather() documentation, there seems to be no argument that produces the same effect.
How can it be done?
My code is:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
dataset= pd.read_csv('libro1.csv')
In my Excel file I have 60 rows and 14 columns,
but pandas shows me a DataFrame of shape (59, 1).
Pandas parses the first row as a header, so 59 rows is correct in your case. You can disable this with the header=None parameter.
Regarding the columns: your CSV file probably uses a non-standard delimiter such as \t. Pandas assumes a comma by default. Open the file in a simple text editor, check the delimiter, and set the sep parameter if it is not a comma.
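As a small illustration of both points, here is a sketch using made-up semicolon-separated data held in memory. The sample text and the variable names are assumptions for demonstration only, and the real file may use a different delimiter.

```python
import io

import pandas as pd

# Hypothetical sample: three lines, three fields each, separated by semicolons.
raw = "a;b;c\n1;2;3\n4;5;6\n"

# Default settings: comma delimiter and first line used as the header,
# so everything lands in a single column and one row is consumed as names.
default = pd.read_csv(io.StringIO(raw))

# Explicit delimiter and no header row: all lines and fields are kept as data.
fixed = pd.read_csv(io.StringIO(raw), sep=";", header=None)

print(default.shape)  # (2, 1)
print(fixed.shape)    # (3, 3)
```

For the file in the question, the equivalent call would be pd.read_csv('libro1.csv', sep=..., header=None), with sep set to whatever delimiter the text editor reveals.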
Is this truly a .csv file, or is it an Excel (.xls / .xlsx) file?
If it is an Excel file, you should open it with read_excel rather than read_csv.
I am trying to import a txt file which has around 56 columns of different data types.
A few columns have values with leading zeros (prefix 000), which are no longer visible once the data has been imported.
I am also getting the error message "specify dtype option on reading or set low_memory=false".
Values in certain columns have changed to "NaN" and "4.40578e+01", which is not correct.
I want the data to be imported and displayed correctly.
This is the code that I am using:
from os import path
import numpy as np
import pandas as pd
df=pd.read_csv(r"C:\Users\abc\desktop\file.txt",sep=",")
df.head()
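A common way to keep leading zeros is to tell pandas to read the affected columns as text through the dtype parameter. Passing low_memory=False makes pandas infer each column's type from the whole file rather than chunk by chunk. The sketch below uses invented in-memory data, and the column names id, code and amount are hypothetical.

```python
import io

import pandas as pd

# Hypothetical sample: "code" has leading zeros, "amount" has one empty cell.
raw = "id,code,amount\n1,00042,44.0578\n2,00007,\n"

# Default type inference turns "00042" into the integer 42.
as_guessed = pd.read_csv(io.StringIO(raw))

# Reading the column as text keeps the value exactly as written in the file.
as_text = pd.read_csv(io.StringIO(raw), dtype={"code": str}, low_memory=False)

print(as_guessed["code"].tolist())  # [42, 7]
print(as_text["code"].tolist())     # ['00042', '00007']
```

The empty amount cell shows up as NaN, which is how pandas marks a missing value. A display such as 4.40578e+01 is the same number (44.0578) printed in scientific notation.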
I am new to Python, pandas, etc., and I was asked to import and plot an Excel file. The file contains 180 rows and 15 columns, and I have to plot each column against the first one, which is time, giving 14 different graphs in total. I would like some help with writing the script. Thanks in advance.
The function you are looking for is pandas.read_excel.
It returns a DataFrame object from which you can access your data in Python. Make sure your Excel file is well formatted.
import pandas as pd
# Load data
df = pd.read_excel('myfile.xlsx')
Check out these packages and functions; you'll find some code on their websites that you can tailor to your needs.
Some useful code:
Read_excel
import pandas as pd
df = pd.read_excel('your_file.xlsx')
The code above reads an Excel file into Python and stores it as a DataFrame named df.
Matplotlib
import matplotlib.pyplot as plt
plt.plot(df['column - x axis'], df['column - y axis'])
plt.savefig('your_plot_image.png')
plt.show()
This is a basic example of making a plot with matplotlib and saving it as your_plot_image.png. You have to replace column - x axis and column - y axis with the desired columns from your file.
For cleaning data and some basics regarding DataFrames have a look at this package: Pandas
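To get one graph per column, the single plt.plot call can be wrapped in a loop over every column except the first. The following is a rough sketch, assuming the time values are in the first column of the DataFrame. The function name plot_each_column_against_first is hypothetical.

```python
import matplotlib.pyplot as plt


def plot_each_column_against_first(df):
    """Create one figure per column, each plotted against the first column."""
    x_name = df.columns[0]
    figures = []
    for y_name in df.columns[1:]:
        fig, ax = plt.subplots()
        ax.plot(df[x_name], df[y_name])
        ax.set_xlabel(str(x_name))
        ax.set_ylabel(str(y_name))
        figures.append(fig)
    return figures
```

With a 15-column DataFrame loaded by pd.read_excel, this returns 14 figures. Each one can be saved with fig.savefig(...), or all of them can be displayed with plt.show().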
I have 12 columns in my .dat file, with around 50 rows. The values are separated by tabs. How can I plot the first column against the 12th column? I tried the code below, but I get an error saying there is a wrong number of columns at line 42.
import numpy as np
from matplotlib import pyplot as plt
data=np.loadtxt('filep.dat')
plt.plot(data[:,1],data[:,2],'bo')
X=data[:,1]
Y=data[:,2]
plt.plot(X,Y,':ro')
plt.show()
The code in the question is correct! If it doesn't work, it's because your data is not organized the way you think it is or because you have missing values somewhere in your data.
You could try numpy.genfromtxt(...), which has more options for filtering bad data than np.loadtxt.
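As a sketch of what that could look like, the example below parses made-up tab-separated text in which one field is empty and one line is too short. The sample rows and the helper name load_tab_separated are assumptions, and the real file may be malformed in a different way.

```python
import io

import numpy as np


def load_tab_separated(source):
    """Parse tab-separated numbers, skipping lines with the wrong field count."""
    # invalid_raise=False makes genfromtxt warn and skip bad lines instead of failing;
    # empty fields are filled with nan.
    return np.genfromtxt(source, delimiter="\t", invalid_raise=False)


rows = [
    "\t".join(str(v) for v in range(12)),                           # complete line
    "\t".join("" if i == 2 else str(10 + i) for i in range(12)),    # one empty field
    "\t".join(str(v) for v in range(11)),                           # only 11 fields
    "\t".join(str(20 + v) for v in range(12)),                      # complete line
]
data = load_tab_separated(io.StringIO("\n".join(rows)))

x = data[:, 0]    # first column
y = data[:, 11]   # 12th column
print(data.shape)  # (3, 12)
```

With indexing starting at zero, the first and 12th columns are data[:, 0] and data[:, 11], so plt.plot(x, y, ':ro') would draw the requested graph.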