Jupyter Notebook - Pandas - python

I am new to Jupyter, NumPy, and pandas. I looked for a solution online but could not find anything that resolves the error.
I am trying to load a .csv file, but every fix I try produces another error. I also tried uploading the file to the Jupyter notebook so I could reference it directly, but my system reports that the file is not there. I converted the file from .txt to .csv assuming that was the problem, but I still can't load it directly. So I decided to use the full path, but I still have problems.
data = pd.read_csv(r'C:/Users/kharm/Dropbox/Jupyter/Assignment/AutoInsurSweden.csv', header=None)
data.head()
I got the error:
ParserError: Error tokenizing data. C error: Expected 1 field in line 12, saw 2
If I modify to:
data = pd.read_csv(r'C:/Users/kharm/Dropbox/Jupyter/Assignment/AutoInsurSweden.csv', header=None, error_bad_lines=False )
data.head()
or
data = pd.read_csv(r'C:/Users/kharm/Dropbox/Jupyter/Assignment/AutoInsurSweden.csv', header=None, sep='\n')
data.head()

That error suggests the problem is with the data file itself, not your code: on line 12 of the CSV there appears to be an extra data field (one more delimiter than the parser expects).
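If you simply want pandas to skip the malformed rows instead of failing, something like this may work; this is a minimal sketch, and on_bad_lines='skip' is the newer pandas spelling (older versions used error_bad_lines/warn_bad_lines):
import pandas as pd

# skip any row that has more fields than expected instead of raising a ParserError
data = pd.read_csv(r'C:/Users/kharm/Dropbox/Jupyter/Assignment/AutoInsurSweden.csv',
                   header=None, on_bad_lines='skip')
data.head()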

Related

How to export data to csv/excel from clipboard with python

I have been trying to export a table to a CSV file. The table is copied to the clipboard and is ready to be pasted into a CSV (at least manually).
I have seen that pandas can read whatever is in the clipboard and assign it to a DataFrame, so I tried this code.
df = pd.read_clipboard()
df
df.to_csv('data.csv')
However, I got this error:
pandas.errors.ParserError: Expected 10 fields in line 5, saw 16. Error could possibly be due to
quotes being ignored when a multi-char delimiter is used.
I have been looking for a solution or an alternative, but have failed so far.
Thanks in advance!
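One thing that sometimes helps with that error is telling read_clipboard explicitly which single-character delimiter to use, since the message suggests a multi-character separator was inferred. A minimal sketch (the tab separator is an assumption about how the table was copied):
import pandas as pd

# read_clipboard passes its keyword arguments on to read_csv, so sep works here too
df = pd.read_clipboard(sep='\t')
df.to_csv('data.csv', index=False)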

trying to parse parquet file into pandas dataframe

As stated above, I am trying to parse a parquet file into a pandas DataFrame, but I always get the error from the screenshot below. I also switched from VS Code to Sublime because VS Code did not accept the pyarrow import even though it was there (picture). The line above also gives the same error.
Thanks in advance, guys.
Edit: I now tried the following, which led to the following error: Screenshot
This could resolve your problem:
df = pd.read_parquet(path=your_file_path)
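If the pyarrow import itself is what fails, it may also be worth forcing the engine explicitly, or switching to fastparquet if that package is installed. A minimal sketch (the file path is a placeholder):
import pandas as pd

# force the pyarrow engine; swap in engine='fastparquet' if pyarrow cannot be imported
df = pd.read_parquet('your_file.parquet', engine='pyarrow')
print(df.head())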

Parsing error when attempting to import unique ASCII file into pandas dataframe

I have attached my data file. When I try to import it into a pandas DataFrame, I get the following error:
ParserError: Expected 6 fields in line 342, saw 12. Error could possibly be due to quotes being ignored when a multi-char delimiter is used.
Here is my code:
df = pd.read_csv('/content/drive/My Drive/MAT 395/RaivathariSinghania/ALDalumina.gr', header=None, delim_whitespace=True, engine="python", skiprows=136)
Any help getting around this error would be appreciated; I've been looking all over the internet and haven't found a solution yet.
File is here: https://drive.google.com/file/d/1_nV3fNvuV8xfJR-Gmv159cKGuKcQ6NOp/view?usp=sharing
Pandas doesn't support the .gr extension, as far as I know.
.gr files are primarily associated with XGMML (eXtensible Graph Markup and Modeling Language).
I don't think you can load .gr files directly into pandas.
This link may help you: https://gist.github.com/informationsea/4284956
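If the file really does turn out to be XGMML (which is plain XML), a rough sketch of pulling the node attributes into a DataFrame could look like this; the file name and the assumption that the data sits in <node> elements are both guesses:
import xml.etree.ElementTree as ET
import pandas as pd

# parse the XML and collect the attributes of every <node> element
tree = ET.parse('ALDalumina.gr')  # only works if the file is valid XML
nodes = [elem.attrib for elem in tree.getroot().iter() if elem.tag.endswith('node')]
df = pd.DataFrame(nodes)
print(df.head())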

How can i show my csv data file in jupyter notebook using pyspark

I am working on a big-data CSV dataset. I need to read it in a Jupyter notebook using pyspark. My data is about 4+ million records (540,000 rows and 7 columns). What can I do to show my whole dataset printed?
I tried to use a pandas DataFrame, but it shows an error as in the attached screenshot. Then I tried to change the encoding type, and it gives SyntaxError: unexpected EOF while parsing. Can you please help me?
For the last screenshot, I think you are missing the way files are read in Python using the with handler. If your data is in a JSON file you can read it as follows:
import json

with open('data_file.json', encoding='utf-8') as data_file:
    data = json.loads(data_file.read())
Note that it is 'data_file.json' (a string) and not data_file.json. The same logic holds for the CSV example.
If it is in a CSV file, that's pretty straightforward:
file = pd.read_csv('data_file.csv')
Try removing the encoding parameter from your CSV reading step.
I would not recommend using a notebook to read such a huge file, even if you are using pyspark for that. Consider using only a portion of the file for visualization in a notebook and then switching to another platform.
Hope it helps
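For the pyspark route the question asks about, a minimal sketch might look like this (the file name, header row, and schema inference are assumptions; show() deliberately prints only a sample, since printing all 540,000 rows in a notebook is rarely practical):
from pyspark.sql import SparkSession

# start (or reuse) a local Spark session and read the CSV lazily
spark = SparkSession.builder.appName("csv-preview").getOrCreate()
sdf = spark.read.csv('data_file.csv', header=True, inferSchema=True)
sdf.printSchema()
sdf.show(20, truncate=False)  # preview the first 20 rows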

Error tokenizing data while uploading CSV file into Pandas Dataframe

I have an 8GB CSV file that contains information about companies created in France.
When I try to load it in Python using pandas.read_csv, I get various types of errors; I believe a combination of 3 factors causes the problem:
The size of the file (8GB)
The French characters in the cells (like “é”)
The fact that this CSV file is organized like an Excel file: the fields are separated into columns, just like an XLS file
When I tried to import the file using:
import pandas as pd
df = pd.read_csv(r'C:\..\data.csv')
I got the following error: OSError: Initializing from file failed
Then, to rule out the size issue, I copied the file (data.csv) and kept only the first 25 rows, saving it as a much lighter file (data2.csv):
df = pd.read_csv(r'C:\..\data2.csv')
I get the same OSError: Initializing from file failed error.
After some research, I tried the following code with data2.csv:
df = pd.read_csv(r'C:\..\data2.csv', sep="\t", encoding="latin")
This time the import works, but in a weird format, like this: https://imgur.com/a/y6WJHC5. All fields end up in the same column.
So even with the size problem eliminated, it doesn't read the CSV file properly. And I still need to work with the main file, data.csv. So I tried the same code on the initial file:
df = pd.read_csv(r'C:\..\data.csv', sep="\t", encoding="latin")
I get: ParserError: Error tokenizing data. C error: out of memory
What is the proper way to read this data.csv?
Thank you,
From your image it looks like the file is separated by semi-colons (;). Try using ";" as the sep in the read_csv function.
Pandas reads the CSV into RAM, and an 8GB file can easily exhaust it; try reading the file in chunks. See this answer.
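Putting both suggestions together, a minimal sketch of chunked reading could look like this (the semicolon separator, the latin-1 encoding, and the per-chunk processing are assumptions to adapt; the truncated path is kept from the question):
import pandas as pd

# process the 8GB file in 100,000-row pieces instead of loading it all at once
row_counts = []
for chunk in pd.read_csv(r'C:\..\data.csv', sep=';', encoding='latin-1', chunksize=100_000):
    # filter or aggregate each chunk here and keep only what you need
    row_counts.append(len(chunk))

print(sum(row_counts), 'rows read in total')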
