Python crashing when using error_bad_lines=False in Pandas DataFrame - python

Whenever I load data from a CSV into a pandas DataFrame and use:
error_bad_lines=False
it gives a Segmentation fault: 11 error and crashes every time.
Here is the call:
df = pandas.read_csv(filename, error_bad_lines=False)

I got the problem fixed. The file format had somehow changed, so pandas could not parse it properly, which is what caused the crash.
Sorry, guys.
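For reference, newer pandas versions (1.3+) replace error_bad_lines=False with on_bad_lines="skip", which drops malformed rows instead of raising. A minimal sketch, using a deliberately malformed in-memory CSV (the data here is invented for illustration):

```python
import io
import pandas as pd

# Hypothetical CSV where the third row has an extra field.
raw = "a,b\n1,2\n3,4,5\n6,7\n"

# pandas >= 1.3: on_bad_lines="skip" replaces error_bad_lines=False;
# the malformed row "3,4,5" is silently dropped.
df = pd.read_csv(io.StringIO(raw), on_bad_lines="skip")
print(df)
```

This keeps only the rows with the expected two fields, so silently dropped data is the trade-off to be aware of.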

Related

Jupyter Notebook - Pandas

I am new to Jupyter, NumPy, and pandas. I looked for a solution online but could not find anything that resolves the error.
I am trying to load a .csv file, but I get an error every time I try a suggested fix. I also tried uploading the file to the Jupyter notebook so I could use it directly, but my system responds that the file is not there. I converted the file from .txt to .csv, assuming that was the problem, but it still would not load directly. So I decided to use the full path, but I still have problems.
data = pd.read_csv(r'C:/Users/kharm/Dropbox/Jupyter/Assignment/AutoInsurSweden.csv', header=None)
data.head()
I got the error:
ParserError: Error tokenizing data. C error: Expected 1 field in line 12, saw 2
If I modify to:
data = pd.read_csv(r'C:/Users/kharm/Dropbox/Jupyter/Assignment/AutoInsurSweden.csv', header=None, error_bad_lines=False )
data.head()
or
data = pd.read_csv(r'C:/Users/kharm/Dropbox/Jupyter/Assignment/AutoInsurSweden.csv', header=None, sep='\n')
data.head()
That error suggests the problem is with the data file itself, not your code: on line 12 of the CSV there appears to be an extra data field.
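A quick way to confirm this is to scan the raw file and report any line whose field count differs from the rest. A sketch with invented data standing in for AutoInsurSweden.csv (line 3 here plays the role of the bad line 12 in the question):

```python
import io

# Invented stand-in for the CSV; real code would use open(path).
raw = "108,392.5\n19,46.2\n13,15.7,EXTRA\n"

expected_fields = 2
for lineno, line in enumerate(io.StringIO(raw), start=1):
    fields = line.rstrip("\n").split(",")
    if len(fields) != expected_fields:
        print(f"line {lineno} has {len(fields)} fields: {line.rstrip()}")
```

Once you know which lines are malformed, you can decide whether to fix the file or skip those rows at read time.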

How to export data to csv/excel from clipboard with python

I have been trying to export a table to a CSV file. The table is copied to the clipboard and is ready to be pasted into a CSV (at least manually).
I have seen that pandas can read whatever is in the clipboard and assign it to a DataFrame, so I tried this code:
df = pd.read_clipboard()
df
df.to_csv('data.csv')
However, I got this error:
pandas.errors.ParserError: Expected 10 fields in line 5, saw 16. Error could possibly be due to
quotes being ignored when a multi-char delimiter is used.
I have been looking for a solution or an alternative but failed.
Thanks in advance!
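Since pd.read_clipboard forwards its keyword arguments to pd.read_csv, passing an explicit single-character separator (tables copied from most applications are tab-separated) may avoid the multi-char-delimiter quoting problem. A sketch that simulates the clipboard contents with StringIO (the sample table is invented; in the real case you would call pd.read_clipboard(sep="\t") instead):

```python
import io
import pandas as pd

# Invented stand-in for clipboard contents; copied tables are
# usually tab-separated.
clipboard_text = "name\tvalue\nalpha\t1\nbeta\t2\n"

# Real usage: df = pd.read_clipboard(sep="\t")
df = pd.read_csv(io.StringIO(clipboard_text), sep="\t")
df.to_csv("data.csv", index=False)
```

If the copied table uses some other delimiter, inspect the raw text first and set sep accordingly.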

I am getting an error expected <class 'openpyxl.styles.fills.Fill'> reading an excel file with pandas read_excel

I am trying to read an excel file with pandas read_excel function, but I keep getting the following error:
expected <class 'openpyxl.styles.fills.Fill'>
The exact code I typed is:
corrosion_df=pd.read_excel('Corrosion.xlsx')
I already double-checked the filename and it is correct. The file is also saved in the correct directory. I don't know what's going wrong, because I have used this method many times and it has always worked. Thank you very much in advance.
I had the same issue, but I found that when I made some changes to the spreadsheet and resaved it, the problem stopped.
I think the answer here is the most helpful:
Error when trying to use module load_workbook from openpyxl
My data was also being autogenerated by another site, so I'm assuming there is some slight corruption in their process. I'm adding a CSV option to my project just to give an alternative.
The only way was to manually open the file, save it, and load it again.
My workaround is to convert the file using LibreOffice. I ran this command line in my Jupyter notebook:
!libreoffice --convert-to xls 'my_file.xlsx'
This creates a new file named my_file.xls, which can now be opened with pandas:
import pandas as pd
df = pd.read_excel('my_file.xls')
I had the same problem. I just resaved the Excel file.

trying to parse parquet file into pandas dataframe

As stated in the title, I am trying to parse a parquet file into a pandas DataFrame, but I always get the error from the screenshot below. I also switched from VS Code to Sublime because VS Code did not accept the pyarrow import even though it was installed (picture). The line above also gives the same error.
Thanks in advance, guys.
Edit: I have now tried the following, which leads to the following error: Screenshot
This could resolve your problem:
df = pd.read_parquet(path=your_file_path)

How to write a dataframe in pyspark having null values to CSV

I'm using the below code to write to a CSV file.
df.coalesce(1).write.format("com.databricks.spark.csv").option("header", "true").option("nullValue"," ").save("/home/user/test_table/")
when I execute it, I'm getting the following error:
java.lang.UnsupportedOperationException: CSV data source does not support null data type.
Could anyone please help?
I had the same problem (though without the nullValue option) and solved it by using the fillna method.
I also realised that fillna was not working on the _corrupt_record column, so I dropped it, since I didn't need it:
df = df.drop('_corrupt_record')
df = df.fillna("")
df.write.option('header', 'true').format('csv').save('file_csv')
