I am attempting to read a CSV via pandas. The code runs fine, but the result does not properly separate the data into columns. Below is my code:
df = pd.read_csv('data.csv', encoding='utf-16', sep='\\', error_bad_lines=False)
df.loc[:3]
When I run this the output looks something like this:
Anything I can do to adjust this? All help is appreciated!
Just use \t as the sep argument when reading the CSV file:
import pandas as pd
import io
data="""id\tname\temail
1\tJohn\tjohn#example.com
2\tJoe\tjoe#example.com
"""
df = pd.read_csv(io.StringIO(data),sep="\t")
id name email
1 John john#example.com
2 Joe joe#example.com
You don't need io here; it's just for the example.
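Applied to the call in the question, a sketch might look like this (assuming the file really is UTF-16 encoded and tab-separated; note that error_bad_lines has been deprecated in favour of on_bad_lines since pandas 1.3):
import pandas as pd

# Assumes the file is UTF-16 encoded and tab-separated
df = pd.read_csv('data.csv', encoding='utf-16', sep='\t', on_bad_lines='skip')
df.loc[:3]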
I have a dataset from the State Security Department in my county that has some problems.
I can't read the records at all from the CSV file that is made available; it brings up only empty records. When I convert the file to XLSX, it does get read.
I would like to know if there is any possible solution to the above problem.
The dataset is available at: here or here.
I tried the code below, but I only get nulls, except for the first row in the first column:
df = pd.read_csv('mensal_ss.csv', sep=';', names=cols, encoding='latin1')
Thank you!
If you try utf-16 as the encoding, it seems to work. However, note that the year rows complicate the parsing, so you may need some extra manipulation of the CSV to work around that, depending on what you want to do with the data:
df = pd.read_csv('mensal_ss.csv', sep=';', encoding='utf-16')
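A minimal sketch of that extra manipulation, assuming (hypothetically) that the year rows leave every column other than the first empty:
import pandas as pd

df = pd.read_csv('mensal_ss.csv', sep=';', encoding='utf-16')
# Hypothetical cleanup: drop rows where all columns except the first are empty,
# which is assumed here to be how the year separator rows appear
df = df.dropna(how='all', subset=df.columns[1:])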
Try using 'utf-16-le':
import pandas as pd
df = pd.read_csv('mensal_ss.csv', sep=';', encoding='utf-16-le')
print(df.head())
I want to read a CSV file that has more than 200M rows, and the data looks like this:
id_1     date         id_2
Hf23R    01-01-2005   M9R34
There are no nulls in the data and no special characters. I tried reading it the basic way, but it crashes my computer every time I run the code, so I cannot do any analysis on it. Is there a way to read it via pandas more efficiently?
Here is how I read it:
import pandas as pd
path = "data.csv"
df = pd.read_csv(path)
I used Dask and it works perfectly:
import dask.dataframe as dd
df = dd.read_csv('data.csv')
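One thing to keep in mind is that Dask is lazy: dd.read_csv only builds a task graph, and the data is not actually read until you call .compute() (or .head(), which reads just the first partition). A small usage sketch against the same data.csv, using the id_2 column from the question:
import dask.dataframe as dd

df = dd.read_csv('data.csv')
print(df.head())                               # reads only the first partition
counts = df['id_2'].value_counts().compute()   # runs the full computation out of core
print(counts)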
Note: I have since solved this problem, as shown below.
I can use to_csv to write to stdout in python / pandas. Something like this works fine:
final_df.to_csv(sys.stdout, index=False)
I would like to read in an actual Excel file (not a CSV): input xlsx, output CSV. I have this:
bls_df = pd.read_excel(sys.stdin, sheet_name="MSA_dl", index_col=None)
But that doesn't seem to work. Is it possible to do what I'm trying and, if so, how does one do it?
Notes:
The actual input file is "MSA_M2018_dl.xlsx" which is in the zip file https://www.bls.gov/oes/special.requests/oesm18ma.zip.
I download and extract the datafile like this:
curl -o oesm18ma.zip 'https://www.bls.gov/oes/special.requests/oesm18ma.zip'
7z x oesm18ma.zip
I have solved the problem as follows, with script test01.py that reads from stdin and writes to stdout. NOTE the use of sys.stdin.buffer in the read_excel() call.
import sys
import os
import pandas as pd
BLS_DF = pd.read_excel(sys.stdin.buffer, sheet_name="MSA_dl", index_col=None)
BLS_DF.to_csv(sys.stdout, index=False)
I invoke this as:
cat MSA_M2018_dl.xlsx | python3 test01.py
This is a small test program to illustrate the idea while removing complexity. It's not the actual program I'm working on.
Based on this answer, a possibility would be:
import sys
import pandas as pd
import io
csv = ""
for line in sys.stdin:
    csv += line
df = pd.read_csv(io.StringIO(csv))
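A slightly more compact variant of the same idea reads all of stdin in one go; note that this only works for text CSV on stdin, not for a binary format like xlsx, which is why the accepted approach above uses sys.stdin.buffer:
import sys
import io
import pandas as pd

# Read everything from stdin at once instead of concatenating line by line
df = pd.read_csv(io.StringIO(sys.stdin.read()))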
Currently I am getting the error below, and I have tried out the following posts:
Solution 1
Solution 2
But I am not able to get the error resolved. My Python code is below:
import pandas as pd
testdata = pd.read_csv(file_name, header=None, delim_whitespace=True)
I tried to print the value in testdata but it doesn't show any output.
The following is my CSV file:
Firstly, pass your filename to read_csv as a string, and make sure the file is either in the working directory or that you give the correct file path.
import pandas as pd
testdata = pd.read_csv("filename.csv", header=None, delim_whitespace=True)
If that does not work, post some information about the environment you are using.
First, you probably don't need header=None as you seem to have headers in the file.
Also try removing the blank line between the headers and the first line of data.
Check and double check your file name.
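Putting those suggestions together, a minimal sketch (with filename.csv standing in for your actual file; delim_whitespace has been deprecated in newer pandas in favour of sep=r'\s+', which behaves the same way):
import pandas as pd

# 'filename.csv' is a placeholder; sep=r'\s+' splits on any run of whitespace,
# the same behaviour delim_whitespace=True used to provide
testdata = pd.read_csv('filename.csv', sep=r'\s+')
print(testdata.head())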
I have a CSV file that contains some data with these column names:
"PERIODE"
"IAS_brut"
"IAS_lissé"
"Incidence_Sentinelles"
I have a problem with the third one, "IAS_lissé", which is misinterpreted by the pd.read_csv() method and comes back as �.
What is that character?
It's causing a bug in my Flask application; is there a way to read that column another way, without modifying the file?
In [1]: import pandas as pd
In [2]: pd.read_csv("Openhealth_S-Grippal.csv",delimiter=";").columns
Out[2]: Index([u'PERIODE', u'IAS_brut', u'IAS_liss�', u'Incidence_Sentinelles'], dtype='object')
I found the same problem with Spanish and solved it with the "latin1" encoding:
import pandas as pd
pd.read_csv("Openhealth_S-Grippal.csv",delimiter=";", encoding='latin1')
Hope it helps!
You can change the encoding parameter of read_csv; see the pandas docs here. The Python standard encodings are listed here.
I believe for your example you can use the utf-8 encoding (assuming that your language is French).
df = pd.read_csv("Openhealth_S-Grippal.csv", delimiter=";", encoding='utf-8')
Here's an example showing some sample output. All I did was make a csv file with one column, using the problem characters.
df = pd.read_csv('sample.csv', encoding='utf-8')
Output:
IAS_lissé
0 1
1 2
2 3
Try using:
import pandas as pd
df = pd.read_csv('file_name.csv', encoding='utf-8-sig')
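The reason utf-8-sig can matter: CSV files exported from Excel often start with a UTF-8 byte-order mark, which plain utf-8 leaves glued to the first column name. A small self-contained check using a throwaway file (bom_example.csv is hypothetical, not your data):
import pandas as pd

# Write a tiny CSV with a BOM, the way Excel's "CSV UTF-8" export does
with open('bom_example.csv', 'w', encoding='utf-8-sig') as f:
    f.write('IAS_lissé\n1\n2\n3\n')

df = pd.read_csv('bom_example.csv', encoding='utf-8-sig')
print(df.columns)  # Index(['IAS_lissé'], dtype='object') -- no stray BOM character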
Using utf-8 didn't work for me. E.g. this piece of code:
import pandas as pd

bla = pd.DataFrame(data=[1, 2])
bla.to_csv('funkyNamé , things.csv')
blabla = pd.read_csv('funkyNamé , things.csv', delimiter=";", encoding='utf-8')
blabla
Ultimately returned: OSError: Initializing from file failed
I know you said you didn't want to modify the file. If you meant the file content vs the filename, I would rename the file to something without an accent, read the csv file under its new name, then reset the filename back to its original name.
import os
import pandas as pd

originalfilepath = r'C:\Users\myself\funkyNamé , things.csv'
originalfolder = r'C:\Users\myself'
os.rename(originalfilepath, originalfolder + '\\tempName.csv')
df = pd.read_csv(originalfolder + '\\tempName.csv', encoding='ISO-8859-1')
os.rename(originalfolder + '\\tempName.csv', originalfilepath)
If you did mean "without modifying the filename", my apologies for not being helpful to you, and I hope this helps someone else.
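That said, another option that avoids renaming entirely is to open the file yourself and pass the file object to read_csv, since pandas accepts any file-like object (a sketch, reusing the path and encoding from above):
import pandas as pd

# Open the awkwardly named file ourselves and hand the handle to pandas,
# so the filename itself never has to be handled by the CSV parser
with open(r'C:\Users\myself\funkyNamé , things.csv', encoding='ISO-8859-1') as f:
    df = pd.read_csv(f)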