Only half of my CSV is encoded

I'm importing a CSV file into pandas. The first few names are decoded correctly, but further down the accented characters turn into symbols. It's a fairly large file with almost 200 names. Is there anything I can do to fix this?
import sys
import codecs
import pandas as pd
import numpy as np
import matplotlib.pylab as plt
#%matplotlib inline
from matplotlib.pylab import rcParams
sys.stdout = codecs.getwriter( "ISO-8859-1" )( sys.stdout.detach() )
rcParams['figure.figsize'] = 15, 6
data = pd.read_csv('IndNames.csv', encoding='ISO-8859-1')
pd.get_option("display.max_rows")
pd.set_option('expand_frame_repr', False)
pd.set_option('display.height', 500)
data.align(data, axis=1)
print(data.head(n=182))
Example: "José" shows up as "JosÃ©".
Edit: ftfy does not work on DataFrames.
Edit 1: I can't pin down the problem. When I save the data to a CSV file everything looks normal, but when I read it back with pd.read_csv the accents are garbled again.

sys.stdout = codecs.getwriter( "UTF-8" )( sys.stdout.detach() )
A simple fix. I don't know why it didn't work when I tried it before, but this did the trick.
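For context, the symptom in the question ("José" shown as "JosÃ©") is what UTF-8 bytes look like when they are decoded as ISO-8859-1. Here is a minimal sketch of that mechanism using only the standard library. The sample string and variable names are illustrative assumptions, not taken from the asker's file.

```python
# Illustrative only: a made-up sample value, not the asker's data.
original = "José"

# Encoding as UTF-8 turns "é" into the two bytes 0xC3 0xA9.
utf8_bytes = original.encode("utf-8")

# Decoding those bytes as ISO-8859-1 maps each byte to its own character.
garbled = utf8_bytes.decode("ISO-8859-1")   # "JosÃ©"

# Decoding with the codec that was used for encoding restores the text.
restored = utf8_bytes.decode("utf-8")       # "José"

# Text garbled this way can be repaired by reversing the wrong decode step.
repaired = garbled.encode("ISO-8859-1").decode("utf-8")  # "José"
```

If the CSV is in fact UTF-8 (an assumption, since the question does not say), passing encoding='utf-8' to pd.read_csv and writing to a UTF-8 stdout, as in the answer above, keeps the reading and printing steps consistent.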

Related

Is there a way to download a sample CSV file

I used a sample of a CSV file to build some tables in a Jupyter notebook. I now need to download that sample as a CSV file so I can look at it in Excel. Is there a way to do that?
I need to download lf, if possible.
Here is my code:
import warnings
warnings.filterwarnings("ignore")
import numpy as np
import pandas as pd
import io
import requests
df = pd.read_csv("diamonds.csv")
lf = df.sample(5000, random_state=999)
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
plt.style.use("seaborn")
lf.sample(5000, random_state=999)

The sample needs to be a DataFrame before you can export it (the result of df.sample() already is one). Then call:
dframe.to_csv("file_name.csv")
Let me know if it works.

Answer from here:
import urllib.request
urllib.request.urlretrieve("http://jupyter.com/diamond.csv", "diamond.csv")

If by "download" you mean exporting the DataFrame to a spreadsheet format, pandas has functions for that:
import pandas as pd
df = pd.read_csv("diamond.csv")
# do your stuff
df.to_csv("diamond2.csv") # export to CSV under a different name
df.to_csv("folder/diamond2.csv") # export to CSV inside an existing folder
df.to_excel("diamond2.xlsx") # export to Excel
The file will appear in the same directory as your Jupyter notebook.
You can also specify the directory:
df.to_csv('D:/folder/diamond.csv')
To check your current working directory, you can use:
import os
print(os.getcwd())
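Applied to the question's lf, the export could look like the sketch below. It uses a small made-up DataFrame in place of diamonds.csv and writes to a temporary folder. The column names, output file name, and index=False (which leaves the row index out of the file) are assumptions for illustration.

```python
import os
import tempfile

import pandas as pd

# Stand-in for the diamonds data; the real file is not available here.
df = pd.DataFrame({"carat": [0.2, 0.3, 0.4, 0.5], "price": [300, 400, 500, 600]})

# df.sample returns a DataFrame, so it can be exported directly.
lf = df.sample(2, random_state=999)

# Write the sample to a CSV file that Excel can open.
out_path = os.path.join(tempfile.mkdtemp(), "diamonds_sample.csv")
lf.to_csv(out_path, index=False)

# Read it back to confirm that the file holds the sampled rows.
round_trip = pd.read_csv(out_path)
print(round_trip.shape)
```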

How do i import datasets in Python?

I am trying to import some datasets in my code. I need help because I have tried many tutorials and web pages and I am still getting errors. I use the Spyder IDE and Python 3.7:
import numpy as np
import pandas as pd
import tensorflow as tf
import os
dts1=pd.read_csv(r"C:\Users\Cucu\Desktop\sample_submission.csv")
dts1

This works for me. If you are still experiencing errors, please post them.
import pandas as pd
# Read data from 'sample_submission.csv' using its absolute path
# (read_csv also has parameters to control delimiters, rows, and column names)
data = pd.read_csv(r"C:\Users\Cucu\Desktop\sample_submission.csv")
# Preview the first 5 lines of the loaded data
print(data.head())

Try one of these other ways of writing the path:
pd.read_csv("C:\\Users\\Cucu\\Desktop\\sample_submission.csv")
pd.read_csv("C:/Users/Cucu/Desktop/sample_submission.csv")
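As a quick check that these spellings refer to the same file, the sketch below compares them using only the standard library. PureWindowsPath is used so that the comparison follows Windows rules on any operating system. The variable names are illustrative.

```python
from pathlib import PureWindowsPath

# The path from the question, written three ways.
raw_path = r"C:\Users\Cucu\Desktop\sample_submission.csv"         # raw string
escaped_path = "C:\\Users\\Cucu\\Desktop\\sample_submission.csv"  # escaped backslashes
forward_path = "C:/Users/Cucu/Desktop/sample_submission.csv"      # forward slashes

# The raw and escaped spellings produce the identical string.
same_string = raw_path == escaped_path

# On Windows, forward slashes and backslashes are both path separators.
same_location = PureWindowsPath(forward_path) == PureWindowsPath(raw_path)

print(same_string, same_location)  # True True
```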

python/pandas "Kernel died, restarting" while loading a csv file

While trying to load a large CSV file (150 MB), I get the error "Kernel died, restarting". The only code that I use is the following:
import pandas as pd
from pprint import pprint
from pathlib import Path
from datetime import date
import numpy as np
import matplotlib.pyplot as plt
basedaily = pd.read_csv('combined_csv.csv')
It used to work, and I do not know why it no longer does. I tried to fix it using engine="python" as follows:
basedaily = pd.read_csv('combined_csv.csv', engine='python')
But that gives me an "execution aborted" error.
Any help would be welcome!
Thanks in advance!

You may be getting this error because of a lack of memory. You can split the data into several DataFrames, do your work on each, and then merge them again. Below is some code you may find useful:
import pandas as pd
# the number of row in each data frame
# you can put any value here according to your situation
chunksize = 1000
# the list that contains all the dataframes
list_of_dataframes = []
for df in pd.read_csv('combined_csv.csv', chunksize=chunksize):
    # process your data frame here
    # then add the current data frame into the list
    list_of_dataframes.append(df)
# if you want all the dataframes together, here it is
result = pd.concat(list_of_dataframes)
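Note that concatenating every chunk rebuilds the full table in memory. If memory really is the limit, one option is to reduce each chunk inside the loop and keep only the running result. Here is a minimal sketch, with inline CSV text standing in for combined_csv.csv and hypothetical column names id and value:

```python
import io

import pandas as pd

# Stand-in for the large file: ten rows of made-up data.
csv_text = "id,value\n" + "\n".join(f"{i},{i * 10}" for i in range(10))

total = 0
row_count = 0

# With chunksize set, read_csv yields one DataFrame per chunk of rows.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    # Keep only the aggregate, not the chunk itself.
    total += int(chunk["value"].sum())
    row_count += len(chunk)

print(row_count, total)  # 10 450
```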

CSV read error in Python using Panda

I want to read a CSV file:
import pandas as pd
import numpy as np
import matplotlib as plt
from pandas import DataFrame
df = pd.read_csv(r'C:\Andy\DataScience\python\Loan_Prediction\Train.csv')
df.head(10)
But I get the error below:
IOError: File Train.csv does not exist
However, the file does exist at that location.

If you use backslashes, remember that the backslash is an escape character in Python string literals. You must either escape every instance or use a raw string (the r prefix), but not both. With escaped backslashes:
import pandas as pd
import numpy as np
import matplotlib as plt
from pandas import DataFrame
df = pd.read_csv('C:\\Andy\\DataScience\\python\\Loan_Prediction\\Train.csv')
df.head(10)

read_csv could not find the path to the CSV file. Try writing the path with forward slashes:
import pandas as pd
import numpy as np
import matplotlib as plt
from pandas import DataFrame
df = pd.read_csv('C:/Andy/DataScience/python/Loan_Prediction/Train.csv')
If that still gives an error, you can try doubling the slashes:
df = pd.read_csv('C://Andy//DataScience//python//Loan_Prediction//Train.csv')
df.head(10)
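When the path looks right but the error persists, it can help to check what Python actually sees in that folder before calling read_csv. The helper below is a hypothetical sketch (describe_path is not a pandas function), demonstrated on a temporary folder. The doubled extension in the demo is only one possible cause, not a confirmed diagnosis of the question.

```python
import tempfile
from pathlib import Path


def describe_path(path_str):
    """Return a short diagnostic for a path before handing it to read_csv."""
    path = Path(path_str)
    if path.is_file():
        return f"found: {path.name}"
    if path.parent.is_dir():
        # The folder exists, so the file name itself is probably wrong.
        names = sorted(p.name for p in path.parent.iterdir())
        return f"missing file; folder contains: {names}"
    return f"missing folder: {path.parent}"


# Demo: the file was saved with a doubled extension, a common surprise
# when the file manager hides known extensions.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "Train.csv.csv").write_text("a,b\n1,2\n")
    wrong_name = describe_path(str(Path(tmp, "Train.csv")))
    right_name = describe_path(str(Path(tmp, "Train.csv.csv")))

print(wrong_name)
print(right_name)
```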

How do I put a file path variable into pandas.read_csv?

I tried to apply it through os.environ like so:
import os
import pandas as pd
os.environ["FILE"] = "File001"
df = pd.read_csv('/path/$FILErawdata.csv/')
But pandas doesn't recognize $FILE and instead reports that $FILErawdata.csv was not found.
Is there an alternative way to do this?

New Answer:
If you like string interpolation, Python 3.6 and later support f-strings:
import os
import pandas as pd
filename = "File001"
df = pd.read_csv(f'/path/{filename}rawdata.csv')
Old Answer:
Python doesn't use variables the way shell scripts do. Variables don't get automatically inserted into strings.
To do this, you have to create a string with the variable inside.
Try this:
import os
import pandas as pd
filename = "File001"
df = pd.read_csv('/path/' + filename + 'rawdata.csv')

df = pd.read_csv('/path/%(FILE)srawdata.csv' % os.environ)
I suspect you need to remove the trailing '/'.
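If the value has to come from an environment variable, as in the question, the sketch below builds the path in three ways and compares the results. os.path.expandvars is part of the standard library. The path itself is the placeholder from the question, and the variable names are illustrative.

```python
import os

os.environ["FILE"] = "File001"

# f-string using the value read from the environment.
from_fstring = f"/path/{os.environ['FILE']}rawdata.csv"

# %-formatting with os.environ as the mapping, as in the answer above.
from_percent = "/path/%(FILE)srawdata.csv" % os.environ

# Shell-style expansion; the braces mark where the variable name ends.
from_expandvars = os.path.expandvars("/path/${FILE}rawdata.csv")

# Without braces the name is read as "FILErawdata", which is not set,
# so the text is left unchanged.
unexpanded = os.path.expandvars("/path/$FILErawdata.csv")

print(from_fstring)
print(from_percent)
print(from_expandvars)
print(unexpanded)
```

Any of the first three strings can then be passed to pd.read_csv.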
