Issue with reading a text file in pandas - python

I am trying to import a txt file that has around 56 columns of different data types.
A few columns have values with a 000 prefix (leading zeros), which are no longer visible once the data has been imported.
I am also getting the error message "specify dtype option on reading or set low_memory=False".
Values in certain columns have changed to "NaN" and "4.40578e+01", which is not correct.
I want the data to be imported and displayed correctly.
This is the code that I am using:
from os import path
import numpy as np
import pandas as pd
df=pd.read_csv(r"C:\Users\abc\desktop\file.txt",sep=",")
df.head()
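A minimal sketch of one way to approach these symptoms, assuming the leading zeros belong to identifier-like columns that should stay text. The column names and sample values here are hypothetical stand-ins for the real file. Reading everything with dtype=str stops pandas from guessing types, which keeps leading zeros, avoids scientific notation, and leaves nothing for the mixed-type warning to complain about; numeric columns are then converted explicitly.

```python
import io
import pandas as pd

# Hypothetical stand-in for the text file: 'code' carries leading zeros
sample = io.StringIO("code,amount,note\n00012,44.0578,ok\n00007,3.5,\n")

# dtype=str reads every column as text, so leading zeros are preserved.
# keep_default_na=False keeps empty fields as '' instead of NaN.
df = pd.read_csv(sample, sep=",", dtype=str, keep_default_na=False)

# Convert only the columns that are truly numeric
df["amount"] = pd.to_numeric(df["amount"])

print(df)
```

If only a few columns need protecting, dtype also accepts a dict that maps column names to types, for example dtype={"code": str}.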

Related

When I import my csv to Python it shows me only a size 1

My code is:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
dataset= pd.read_csv('libro1.csv')
In Excel the file has 60 rows and 14 columns,
but pandas shows me a DataFrame of size (59,1).
Pandas parses the first row as a header, so 59 rows is correct in your case. You can disable this with the header=None parameter.
Regarding the columns: your csv file probably uses a non-standard delimiter such as \t. Pandas assumes a comma by default. Open the file in a simple text editor, check the delimiter, and set the sep parameter if it is not a comma.
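A small sketch of the effect of both parameters. The semicolon-separated text below is a made-up stand-in for the real file, which may use a different delimiter.

```python
import io
import pandas as pd

# Hypothetical file contents: 3 lines, 3 fields each, separated by semicolons
raw = "a;b;c\n1;2;3\n4;5;6\n"

# Default settings: comma assumed, so each line becomes a single column
default = pd.read_csv(io.StringIO(raw))

# Correct delimiter: 3 columns, first line used as the header
with_sep = pd.read_csv(io.StringIO(raw), sep=";")

# Correct delimiter and no header: every line is treated as data
no_header = pd.read_csv(io.StringIO(raw), sep=";", header=None)

print(default.shape, with_sep.shape, no_header.shape)
# (2, 1) (2, 3) (3, 3)
```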
Is this truly a .csv file, or is it an Excel (.xls / .xlsx) file?
If it is an Excel file, you should open it with read_excel rather than read_csv.

Python: Reading tdms files using Python npTDMS and creating a Pandas dataframe

I'm able to read a LabVIEW .tdms file using the Python npTDMS package, and I can read the metadata and sample data for groups and channels as well.
However, the file has timestamp values with the year '9999', so I get the following error while converting to a pandas dataframe:
OutOfBoundsDatetime: Out of bounds nanosecond timestamp:.
I went through the documentation at
https://nptdms.readthedocs.io/en/stable/apireference.html#nptdms.TdmsFile.as_dataframe; however, I couldn't find an option to deal with this situation.
Passing errors='coerce' when calling as_dataframe() didn't work either. Any pointers or directions for reading the .tdms file into a pandas dataframe, given this data, would be very helpful.
Changing the data at the source is not an option.
Code snippet to read tdms file:
import numpy as np
import pandas as pd
from nptdms import TdmsFile as td
tdms_file = td.read(<tdms file name>)
tdms_file_df = tdms_file.as_dataframe()
Error while creating a pandas dataframe

Why can't I extract a single column using pandas?

I have a (theoretically) simple task. I need to pull out a single column of 4000ish names from a table and use it in another table.
I'm trying to extract the column using pandas and I have no idea what is going wrong. It keeps flagging an error:
TypeError: string indices must be integers
import pandas as pd
file ="table.xlsx"
data = file['Locus tag']
print(data)
You have only assigned the file name to a variable; the file itself is never loaded. Because file is a plain string, file['Locus tag'] tries to index a string, which raises the TypeError. Load the workbook with the pandas read_excel function first. After that you can read the data, extract the column, and so on.
Sample Code
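import pandas as pd
import os
# Raw string so the backslash is not treated as an escape character
full_path = os.path.realpath(r"C:\Car_sales.xlsx")
p = os.path.dirname(full_path)      # folder that contains the workbook
name = os.path.basename(full_path)  # file name only
path = os.path.join(p, name)
Z = pd.read_excel(path)
Z.head()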
Sample Code
import pandas as pd
df = pd.read_excel("add the path")
df.head()
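Once the file is loaded into a DataFrame, the column can be selected by name. The sketch below uses a small in-memory DataFrame as a stand-in for the result of pd.read_excel, so it runs without the workbook; the locus tags and the second column are made up for illustration.

```python
import pandas as pd

# In-memory stand-in for the result of pd.read_excel("table.xlsx");
# the values below are invented for illustration.
df = pd.DataFrame({
    "Locus tag": ["ABC_0001", "ABC_0002", "ABC_0003"],
    "Product": ["kinase", "transporter", "hypothetical protein"],
})

names = df["Locus tag"]      # a pandas Series holding the single column
names_list = names.tolist()  # plain Python list, if that is easier to reuse

# Reusing the column in another table
other = pd.DataFrame({"Locus tag": names})
print(other.head())
```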

python/pandas "Kernel died, restarting" while loading a csv file

While trying to load a big csv file (150 MB) I get the error "Kernel died, restarting". The only code that I use is the following:
import pandas as pd
from pprint import pprint
from pathlib import Path
from datetime import date
import numpy as np
import matplotlib.pyplot as plt
basedaily = pd.read_csv('combined_csv.csv')
It used to work before, but I do not know why it is not working anymore. I tried to fix it using engine="python" as follows:
basedaily = pd.read_csv('combined_csv.csv', engine='python')
But it gives me an "execution aborted" error.
Any help would be welcome!
Thanks in advance!
You probably got this error because of a lack of memory. You can split your data into several data frames, do your work on each one, and then merge them again. Below is some code that you may find useful:
import pandas as pd
# the number of rows in each data frame
# you can put any value here according to your situation
chunksize = 1000
# the list that contains all the dataframes
list_of_dataframes = []
for df in pd.read_csv('combined_csv.csv', chunksize=chunksize):
    # process your data frame here
    # then add the current data frame into the list
    list_of_dataframes.append(df)
# if you want all the dataframes together, here it is
result = pd.concat(list_of_dataframes)
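Note that concatenating every chunk at the end needs roughly as much memory as reading the file in one go. If the goal is a summary rather than the full table, a possible variation is to reduce each chunk before keeping it. The sketch below assumes hypothetical columns named day and value and uses in-memory text in place of combined_csv.csv.

```python
import io
import pandas as pd

# Hypothetical stand-in for 'combined_csv.csv'
raw = "day,value\n" + "".join(f"{i % 3},{i}\n" for i in range(10))

partial_sums = []
for chunk in pd.read_csv(io.StringIO(raw), chunksize=4):
    # Keep only the small per-chunk summary, not the chunk itself
    partial_sums.append(chunk.groupby("day")["value"].sum())

# Combine the per-chunk summaries into the final totals per day
totals = pd.concat(partial_sums).groupby(level=0).sum()
print(totals)
```

Only the per-chunk sums stay in memory, so the peak usage depends on chunksize rather than on the size of the file.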

No of `columns` in pandas DataFrame limited to 1024

I have an Excel sheet with 15 rows and 1445 columns (24*60 + 5 columns). The 1440 (24*60) columns contain time series data.
I have the following Python code.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from matplotlib.backends.backend_pdf import PdfPages
a=pd.read_csv('test.csv')
print('a.size {}'.format(len(a.axes[0])))
print('a.size {}'.format(len(a.axes[1])))
for x in a.iterrows():
    x[1][4:].plot(label=str(x[1][0])+str(x[1][1])+str(x[1][2])+str(x[1][3]))
I get the following output.
a.size 15
a.size 1024
For some reason the number of columns is being truncated to 1024. Is that a limitation of the machine that I am running on, or is it something else? How do I get around this limitation?
Some spreadsheet viewers limit the number of columns they display. For example, I have a CSV file with 4097 columns that shows only 1024 columns when viewed in LibreOffice.
However, the CSV file itself usually has all the columns. To make sure the exported CSV file has the proper column count, open it in any text editor. If there is a mismatch, then there is a problem with the code that exported the CSV.
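The same check can be scripted instead of done by eye. This is a sketch that counts the fields on every line with the standard csv module and compares the result with what pandas parsed; the generated text is a hypothetical stand-in for test.csv.

```python
import csv
import io
import pandas as pd

# Hypothetical stand-in for 'test.csv': a header plus 2 rows of 1445 fields
n_cols = 1445
header = ",".join(f"c{i}" for i in range(n_cols))
row = ",".join("0" for _ in range(n_cols))
raw = f"{header}\n{row}\n{row}\n"

# Count the fields on each line of the raw text, independent of pandas
field_counts = {len(fields) for fields in csv.reader(io.StringIO(raw))}

# Compare with the number of columns pandas parsed
a = pd.read_csv(io.StringIO(raw))
print(field_counts, a.shape[1])
```

If the raw file already reports 1024 fields per line, the truncation happened when the CSV was exported, not when pandas read it.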
