Pandas.read_csv() with special characters (accents) in column names � - python

I have a csv file that contains some data with columns names:
"PERIODE"
"IAS_brut"
"IAS_lissé"
"Incidence_Sentinelles"
I have a problem with the third one "IAS_lissé" which is misinterpreted by pd.read_csv() method and returned as �.
What is that character?
Because it's causing a bug in my Flask application, is there a way to read that column some other way without modifying the file?
In [1]: import pandas as pd
In [2]: pd.read_csv("Openhealth_S-Grippal.csv",delimiter=";").columns
Out[2]: Index([u'PERIODE', u'IAS_brut', u'IAS_liss�', u'Incidence_Sentinelles'], dtype='object')

I found the same problem with Spanish and solved it with the "latin1" encoding:
import pandas as pd
pd.read_csv("Openhealth_S-Grippal.csv",delimiter=";", encoding='latin1')
Hope it helps!
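For context on the character itself: � is U+FFFD, the Unicode replacement character, which a decoder emits when it meets bytes that are invalid in the encoding it assumes. A minimal sketch, assuming the file was saved as Latin-1 (the header bytes below are built to mirror the question, not read from the real file):

```python
# Header line as it would be stored in a Latin-1 (ISO-8859-1) file.
raw = "PERIODE;IAS_brut;IAS_lissé;Incidence_Sentinelles".encode("latin-1")

# Decoding as UTF-8 cannot interpret the lone 0xE9 byte ("é" in Latin-1),
# so with errors="replace" it is turned into U+FFFD.
as_utf8 = raw.decode("utf-8", errors="replace").split(";")

# Decoding with the encoding the bytes were written in recovers the name.
as_latin1 = raw.decode("latin-1").split(";")

print(repr(as_utf8[2]))
print(repr(as_latin1[2]))
```

This is why passing encoding='latin1' to read_csv restores the column name, provided the file really is Latin-1.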

You can change the encoding parameter for read_csv; see the pandas doc here. The Python standard encodings are listed here.
I believe for your example you can use the utf-8 encoding (assuming that your language is French).
df = pd.read_csv("Openhealth_S-Grippal.csv", delimiter=";", encoding='utf-8')
Here's an example showing some sample output. All I did was make a csv file with one column, using the problem characters.
df = pd.read_csv('sample.csv', encoding='utf-8')
Output:
IAS_lissé
0 1
1 2
2 3

Try using:
import pandas as pd
df = pd.read_csv('file_name.csv', encoding='utf-8-sig')
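The 'utf-8-sig' codec matters when the file begins with a byte order mark (BOM), which some Windows tools add when saving UTF-8. A short sketch with an in-memory header; the bytes are illustrative and not taken from the question's file:

```python
import codecs

# A UTF-8 header line that begins with a BOM.
raw = codecs.BOM_UTF8 + "IAS_lissé;PERIODE".encode("utf-8")

# Plain utf-8 keeps the BOM as an invisible character on the first name.
plain = raw.decode("utf-8").split(";")

# utf-8-sig strips the BOM if present (and is harmless if it is absent).
stripped = raw.decode("utf-8-sig").split(";")

print(repr(plain[0]))
print(repr(stripped[0]))
```

If the first column name looks right when printed but fails on lookup, a leftover BOM like this is one possible cause.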

Using utf-8 didn't work for me. E.g. this piece of code:
bla = pd.DataFrame(data = [1, 2])
bla.to_csv('funkyNamé , things.csv')
blabla = pd.read_csv('funkyNamé , things.csv', delimiter=";", encoding='utf-8')
blabla
Ultimately returned: OSError: Initializing from file failed
I know you said you didn't want to modify the file. If you meant the file content vs the filename, I would rename the file to something without an accent, read the csv file under its new name, then reset the filename back to its original name.
import os
import pandas as pd

originalfolder = r'C:\Users\myself'
originalfilepath = originalfolder + "\\funkyNamé , things.csv"
temppath = originalfolder + "\\tempName.csv"
os.rename(originalfilepath, temppath)  # temporarily drop the accent
df = pd.read_csv(temppath, encoding='ISO-8859-1')
os.rename(temppath, originalfilepath)  # restore the original name
If you did mean "without modifying the filename", my apologies for not being helpful to you, and I hope this helps someone else.
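An alternative that avoids renaming, offered as a sketch rather than a confirmed fix for the OSError above: open the file in Python and hand the file object to read_csv, so pandas does not have to resolve the accented path itself. The temporary folder and file contents below exist only for the demonstration.

```python
import os
import tempfile

import pandas as pd

# Create a throwaway CSV whose name contains an accent and a comma.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "funkyNamé , things.csv")
pd.DataFrame({"value": [1, 2]}).to_csv(path, index=False)

# Let Python open the file; read_csv accepts any file-like object.
with open(path, encoding="utf-8", newline="") as handle:
    df = pd.read_csv(handle)

print(df)
```

The encoding passed to open() should match how the file was written; 'utf-8' here is an assumption that holds only because the sketch writes the file itself.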

Related

Cannot read content in CSV File in Pandas

I have a dataset from the State Security Department in my county that has some problems.
I can't read the records at all from the file that is made available in CSV, bringing up only empty records. When I convert the file to XLSX it does get read.
I would like to know if there is any possible solution to the above problem.
The dataset is available at: here or here.
I tried the code below, but i only get nulls, except for the first row in the first column:
df = pd.read_csv('mensal_ss.csv', sep=';', names=cols, encoding='latin1')
Thank you!
If you use utf-16 as the encoding, it seems to work. Note, however, that the year rows complicate the parsing, so depending on what you want to do with the data, you may need some extra manipulation of the CSV to work around them:
df = pd.read_csv('mensal_ss.csv', sep=';', encoding='utf-16')
Try using 'utf-16-le':
import pandas as pd
df = pd.read_csv('mensal_ss.csv', sep=';', encoding='utf-16-le')
print(df.head())
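A file that yields mostly nulls under latin1 but reads with utf-16 can often be recognised from its first two bytes. A rough sketch of checking for a UTF-16 byte order mark before choosing the encoding; `guess_utf16` is a hypothetical helper, and the sample bytes stand in for the real file:

```python
import codecs

def guess_utf16(first_bytes):
    """Return 'utf-16' if the data starts with a UTF-16 BOM, else None."""
    if first_bytes.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    return None

# Stand-in for the first bytes of a little-endian UTF-16 file with a BOM.
sample = codecs.BOM_UTF16_LE + "ano;mes".encode("utf-16-le")

print(guess_utf16(sample))
print(repr(sample.decode("utf-16")))     # BOM consumed by the codec
print(repr(sample.decode("utf-16-le")))  # BOM kept as '\ufeff'
```

With 'utf-16' the BOM is consumed; with 'utf-16-le' it stays in the text, where it can end up attached to the first column name. The check is only a heuristic: a UTF-16 file without a BOM would not be detected.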

How to write text files with sys.args[] as part of the filename?

I am pretty new to Python and I am trying to filter some rows in a dataframe based on whether they contain strings or not. I want the script to automatically use the input name to save the filtered dataframe on a text file.
Suppose I read my file with python3 code.py input.txt and my code looks like this:
#!/usr/bin/python3
import pandas as pd
import sys
data = pd.read_csv(sys.argv[1], sep='\t', header=0)
selectedcols = data['Func.refGene']
selectedrows = selectedcols.str.contains("exonic|splicing")
selecteddata = data[selectedrows]
selecteddata.to_csv(f'{sys.argv[1][:-4]}_exonic.splicing.txt', index=None, sep='\t', mode = 'a')
Where 'Func.refGene' is the column I want to search through for the strings "exonic" and "splicing". I have written this code and it worked before, but now I try to run it and the following error occurs:
File "code.py", line 12
selecteddata.to_csv(f'{sys.argv[1][:-4]}_exonic.splicing.txt', index=None, sep='\t', mode = 'a')
^
SyntaxError: invalid syntax
Would anyone know what could be wrong? I have searched for this syntax and haven't had any success.
For Python versions below 3.6, try this:
selecteddata.to_csv('{0}_exonic.splicing.txt'.format(sys.argv[1][:-4]), index=None, sep='\t', mode = 'a')
f-strings are supported only from Python 3.6 onward: https://docs.python.org/3/whatsnew/3.6.html#pep-498-formatted-string-literals
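As a side note, slicing off the last four characters assumes a three-letter extension. A sketch that builds the name with os.path.splitext and str.format, which also runs on interpreters older than 3.6; `output_name` is a hypothetical helper, not part of the original script:

```python
import os

def output_name(input_path, suffix="_exonic.splicing.txt"):
    """Build the output filename from the input path without an f-string."""
    stem, _ext = os.path.splitext(input_path)
    return "{0}{1}".format(stem, suffix)

print(output_name("input.txt"))
print(output_name("sample.tsv.gz"))
```

In the script, the result of output_name(sys.argv[1]) would take the place of the formatted string passed to to_csv.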

Dataframe Read CSV - Delimiter Not Working

I am attempting to read a CSV via Pandas. The code works fine but the result does not properly separate the data. Below is my code:
df = pd.read_csv('data.csv', encoding='utf-16', sep='\\', error_bad_lines=False)
df.loc[:3]
When I run this the output looks something like this:
Anything I can do to adjust this? All help is appreciated!
Just use \t as the sep argument when reading the CSV file:
import pandas as pd
import io
data="""id\tname\temail
1\tJohn\tjohn#example.com
2\tJoe\tjoe#example.com
"""
df = pd.read_csv(io.StringIO(data),sep="\t")
id name email
1 John john#example.com
2 Joe joe#example.com
You don't need io; it is only used here to build the example.

write a Pandas dataframe to a .txt file

My code writes a txt file with lots of data. I'm trying to print a pandas dataframe into that txt file as part of my code but can't use .write() as that only accepts strings.
How do I take a pandas dataframe, stored as DF1 for example, and print it in the file?
I've seen similar questions but those are aimed at creating a txt file solely for the dataframe, I would just like my dataframe to appear in a txt file
Use the to_string method, and then you can use write with the file opened in append mode ('a'):
tfile = open('test.txt', 'a')
tfile.write(df.to_string())
tfile.close()
Sample Data
import pandas as pd
import numpy as np
df = pd.DataFrame({'id': np.arange(1,6,1),
'val': list('ABCDE')})
test.txt
This line of text was here before.
Code
tfile = open('test.txt', 'a')
tfile.write(df.to_string())
tfile.close()
Output: test.txt
This line of text was here before.
id val
0 1 A
1 2 B
2 3 C
3 4 D
4 5 E
Pandas DataFrames have to_string(), to_json() and to_csv() methods that may be helpful to you, see:
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_string.html
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_csv.html
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_json.html
Example of writing a string to a text file. Use the 'w' flag to overwrite the file and 'a' to append to it.
example_string = df1.to_string()
output_file = open('file.txt','a')
output_file.write(example_string)
output_file.close()
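The same append can also be written with a with block, which closes the file even if the write raises. A minimal sketch; the frame, the temporary path, and the pre-existing line are placeholders:

```python
import os
import tempfile

import pandas as pd

df1 = pd.DataFrame({"id": [1, 2, 3], "val": list("ABC")})
path = os.path.join(tempfile.mkdtemp(), "file.txt")

# Pre-existing content that should survive the append.
with open(path, "w") as output_file:
    output_file.write("This line of text was here before.\n")

# Mode 'a' appends; the context manager closes the file afterwards.
with open(path, "a") as output_file:
    output_file.write(df1.to_string())
    output_file.write("\n")

with open(path) as output_file:
    content = output_file.read()
print(content)
```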
If you are only looking to put certain information in the text file, you can do that using either pandas or json methods to select it, etc. and see the docs links above as well.
Before the OP commented about appending, I originally wrote an example using json. The json module has a dump() method to help write to a file. However, in most cases it's not the ideal format to keep appending output to, compared with csv or txt. In case it's useful to anyone:
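import json
filename = 'file.json'
with open(filename, 'w') as file:
    # to_json() already returns a JSON string; parse it first so dump()
    # does not encode it a second time
    json.dump(json.loads(df1.to_json()), file)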

EmptyDataError: No columns to parse from file

I am currently getting the error below, and I have tried the solutions in these posts:
Solution 1
Solution 2
But I am not able to get the error resolved. My python code is as below:
import pandas as pd
testdata = pd.read_csv(file_name, header=None, delim_whitespace=True)
I tried to print the value in testdata but it doesn't show any output.
The following is my csvfile:
Firstly, declare your filename inside testdata as a string, and make sure it is either in the local directory, or that you have the correct filepath.
import pandas as pd
testdata = pd.read_csv("filename.csv", header=None, delim_whitespace=True)
If that does not work, post some information about the environment you are using.
First, you probably don't need header=None as you seem to have headers in the file.
Also try removing the blank line between the headers and the first line of data.
Check and double check your file name.
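Since EmptyDataError is what pandas raises when it finds nothing to parse, it can help to inspect the file before reading it. A small sketch using only the standard library; `check_csv_path` is a hypothetical helper, and the paths are created just for the demonstration:

```python
import os
import tempfile

def check_csv_path(path):
    """Report whether a path is missing, empty, or has content to parse."""
    if not os.path.isfile(path):
        return "missing"
    if os.path.getsize(path) == 0:
        return "empty"
    return "ok"

folder = tempfile.mkdtemp()
empty_path = os.path.join(folder, "empty.csv")
open(empty_path, "w").close()  # create a zero-byte file

print(check_csv_path(os.path.join(folder, "nope.csv")))
print(check_csv_path(empty_path))
```

A file that contains only blank lines would still pass this size check, so it narrows the problem down rather than ruling it out.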
