Import csv file using pandas and split the resulting strings - python

I want to read this csv file in pandas as a DataFrame. Then I would like to split the resulting strings on colons.
I import using:
df_r = pd.read_csv("report.csv", sep=";|,", engine="python")
Then split using:
for c in df_r:
    if df_r[c].dtype == "object":
        df_r[c] = df_r[c].str.split(':')
But I get the following error:
ValueError: could not convert string to float: '"\x001\x000\x00'
Any idea what I am doing wrong?
Edit:
The error actually shows when I try to convert one of the strings to a float
print(float(df_r["Laptime"].iloc[0][2]))

I ran your code and everything works fine. You can try catching the error, printing the row that shows the strange behaviour, and inspecting it manually.
Is that the entire dump you are using? I saw that you are assigning the csv to the variable a and using df_r afterwards, so I think you are doing something else in between.
If the csv file is complete, be aware that the last line is empty and creates a row full of NaNs. You may want to read the csv with skipfooter=1.
a = pd.read_csv('report.csv', sep=";|,", engine="python", skipfooter=1)
Edit:
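You can convert it to a float by removing the NUL characters first. The value is a str, so the arguments to replace must be str as well, not bytes. The leading quote shown in the error message has to be stripped too:
print(float(df_r["Laptime"].iloc[0][2].replace('\x00', '').strip('"')))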
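A NUL character after every digit, as in the error message, is typical of UTF-16 text read as a single-byte encoding. That the file is UTF-16 is only an assumption here; the question does not say so. If it is, passing encoding="utf-16" to read_csv may avoid the cleanup entirely. A minimal sketch with made-up sample data standing in for report.csv (the Driver column and the lap times are hypothetical):

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data; the real report.csv is not shown in the question.
sample = "Driver;Laptime\nA;1:10:500\nB;1:11:250\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "report_utf16.csv")
    # Write the sample as UTF-16 to imitate the assumed encoding of the file.
    with open(path, "w", encoding="utf-16", newline="") as f:
        f.write(sample)
    # Decoding with the matching encoding leaves no NUL characters behind.
    df_r = pd.read_csv(path, sep=";", encoding="utf-16")

# Split the lap times on colons and convert one piece to a float.
parts = df_r["Laptime"].str.split(":")
seconds = float(parts.iloc[0][1])
```

If the real file turns out to use a different encoding, the same pattern applies with that encoding name.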

Related

String to pandas dataframe Name too long error

I have a comma-separated string. When I export it to a CSV file and then load the resulting CSV in pandas, I get the desired result. But when I try to load the string directly into a pandas DataFrame, I get the error "Filename too big". Please help me solve it.
I found the error. It only showed that behaviour for large strings.
I used
import io
import pandas as pd
df = pd.read_csv(io.StringIO(str), sep=',', engine='python')
That solved the issue. Here str is the name of the variable holding the string.
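The likely reason this works: read_csv treats a plain string argument as a file path or URL, so a long CSV string ends up being used as a file name. Wrapping the text in io.StringIO gives pandas a file-like object instead. A small sketch with hypothetical data, using csv_text as the variable name so the built-in str is not shadowed:

```python
import io

import pandas as pd

# Hypothetical CSV content held in memory as a string.
csv_text = "id,name\n1,alpha\n2,beta\n"

# StringIO makes the string look like an open text file to read_csv.
df = pd.read_csv(io.StringIO(csv_text), sep=",")
```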

Problems working with a column of a CSV file

I am reading a CSV file using this piece of code:
import pandas as pd
import os
#Open document (.csv format)
path=os.getcwd()
database=pd.read_csv(path+"/mydocument.csv",delimiter=';',header=0)
#Date in the required format
size=len(database.Date)
I get the following error: 'DataFrame' object has no attribute 'Date'
As you can see in the image, the first column of mydocument.csv is called Date. This is weird because I used this same procedure to work with this document, and it worked.
Try using delimiter=','. It must be a comma.
(Can't post comments yet, but think I can help).
The explanation for your problem is, simply, that you don't have a column called Date. Has pandas interpreted Date as an index column?
If your Date is spelled correctly (no trailing whitespace or something else that might confuse things), then try this:
pd.read_csv(path+"/mydocument.csv",delimiter=';',header=0, index_col=False)
or if that fails this:
pd.read_csv(path+"/mydocument.csv",delimiter=';',header=0, index_col=None)
In my experience, I've had pandas unexpectedly infer an index_col.
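Before changing parameters, it can help to print the column names pandas actually parsed. The sketch below uses hypothetical data with a trailing space in the header (the Value column is made up) to show two of the causes mentioned above: a delimiter that does not match the file, and whitespace in the column name.

```python
import io

import pandas as pd

# Hypothetical file content; note the trailing space after "Date".
raw = "Date ;Value\n2020-01-01;1\n2020-01-02;2\n"

# A delimiter that does not match the file yields one fused column.
wrong = pd.read_csv(io.StringIO(raw), delimiter=",", header=0)
wrong_columns = wrong.columns.tolist()

# With the matching delimiter the columns split, but the name is "Date ".
database = pd.read_csv(io.StringIO(raw), delimiter=";", header=0)
raw_columns = database.columns.tolist()

# Stripping whitespace from the header makes database.Date available.
database.columns = database.columns.str.strip()
size = len(database.Date)
```

Printing wrong_columns and raw_columns shows at a glance which of the two cases applies.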

Saving DataFrame to csv but output cells type becomes number instead of text

import pandas as pd
check = pd.read_csv('1.csv')
nocheck = check['CUSIP'].str[:-1]
nocheck = nocheck.to_frame()
nocheck['CUSIP'] = nocheck['CUSIP'].astype(str)
nocheck.to_csv('NoCheck.csv')
This works but while writing the csv, a value for an identifier like 0003418 (type = str) converts to 3418 (type = general) when the csv file is opened in Excel. How do I avoid this?
I couldn't find a dupe for this question, so I'll post my comment as a solution.
This is an Excel issue, not a python error. Excel autoformats numeric columns to remove leading 0's. You can "fix" this by forcing pandas to quote when writing:
import csv
# insert pandas code from question here
# use csv.QUOTE_ALL when writing CSV.
nocheck.to_csv('NoCheck.csv', quoting=csv.QUOTE_ALL)
Note that this will actually put quotes around each value in your CSV. It will render the way you want in Excel, but you may run into issues if you try to read the file some other way.
Another solution is to write the CSV without quoting, and change the cell format in Excel to "General" instead of "Numeric".
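Independently of Excel, the leading zeros can already be lost when the file is read, because read_csv infers a numeric type for all-digit columns. A sketch, assuming the CUSIP values are all digits (the sample identifiers are hypothetical), that reads the column as text and writes it back fully quoted:

```python
import csv
import io

import pandas as pd

# Hypothetical identifiers; the contents of 1.csv are not shown in the question.
raw = "CUSIP\n00034189\n12345678\n"

# dtype=str keeps the leading zeros while reading.
check = pd.read_csv(io.StringIO(raw), dtype={"CUSIP": str})

# Drop the last character of each identifier, as in the question.
nocheck = check["CUSIP"].str[:-1].to_frame()

# Write every field in quotes; a StringIO stands in for 'NoCheck.csv'.
out = io.StringIO()
nocheck.to_csv(out, index=False, quoting=csv.QUOTE_ALL)
lines = out.getvalue().splitlines()
```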

Convert dataframe column into integers from within loop

I'm trying to loop through a folder of CSVs, read each one into a dataframe, and convert certain columns to integers before passing them to a Django model. Here is my code:
import glob
import pandas as pd
path = 'DIV1FCS_2017/*/*'
for fname in glob.glob(path):
    df = pd.read_csv(fname)
    df['Number'].apply(pd.to_numeric)
I am receiving the following: ValueError: Unable to parse string
Does anybody know if I can convert a column of strings into integers using pd.to_numeric from within a loop? Outside of the loop it seems to work properly.
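I think you probably have some non-numeric data stored in your dataframe, and that's what's causing the error.
You can examine your data and make sure everything's fine. In the meantime, you can also call pd.to_numeric(df['Number'], errors="coerce"), which turns values that cannot be parsed into NaN instead of raising. (errors="ignore" also exists, but it is deprecated in recent pandas versions.)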
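To find the offending values, one option is to coerce the column and look at the rows that became NaN. A minimal sketch with hypothetical data in place of the CSV files; note that the result of the conversion has to be assigned back, which the loop in the question does not do:

```python
import pandas as pd

# Hypothetical column containing one value that is not a number.
df = pd.DataFrame({"Number": ["12", "7", "n/a", "45"]})

# Unparseable strings become NaN instead of raising ValueError.
converted = pd.to_numeric(df["Number"], errors="coerce")

# Rows that were present in the input but failed to convert.
bad_rows = df[converted.isna() & df["Number"].notna()]

# Assign the result back; the nullable Int64 dtype keeps integers despite gaps.
df["Number"] = converted.astype("Int64")
```

Inside the glob loop, printing fname together with bad_rows would show which file contains the bad data.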

Python: Using Pandas library. How to keep quotes on text?

I'm using the following Python code with the Pandas library. The purpose of the code is to join 2 CSV files, and it works as expected. In the CSV files all the values are within "". When using the Pandas library they disappear. I wonder what I can do to keep them? I have read the documentation and tried lots of options but can't seem to get it right.
Any help is much appreciated.
Code:
import pandas
csv1 = pandas.read_csv('WS-Produktlista-2015-01-25.csv', quotechar='"',comment='"')
csv2 = pandas.read_csv('WS-Prislista-2015-01-25.csv', quotechar='"', comment='"')
merged = csv1.merge(csv2, on='id')
merged.to_csv("output.csv", index=False)
Instead of getting a line like this:
"1","Cologne","4711","4711","100ml",
I'm getting:
1,Cologne,4711,4711,100ml,
EDIT:
I have now found the problem. My files contain a header with 16 columns. The data lines contain 16 values separated by ",".
I just found that some lines contain values within "" that themselves contain ",". This is confusing the parser. Instead of the expected 15 commas, it finds 18. One example below:
"23210","Cosmetic","Lancome","Eyes Virtuose Palette Makeup","7,2g","W","Decorative range","5x1,2g Eye Shadow + 1,2g Powder","http://image.jpg","","3660732000104","","No","","1","1"
How can I make the parser ignore the commas within ""?
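For what it's worth, read_csv already treats a comma inside a quoted field as data when quotechar='"' is in effect, which is the default. The comment='"' argument in the code above looks like a probable culprit, since it tells the parser to discard everything after the first quote on a line. A sketch under those assumptions, with shortened hypothetical rows (the price column and its value are made up), that also keeps the quotes in the output:

```python
import csv
import io

import pandas as pd

# Shortened, hypothetical stand-ins for the two input files.
products = io.StringIO(
    '"id","name","size"\n'
    '"23210","Eyes Virtuose Palette Makeup","7,2g"\n'
)
prices = io.StringIO('"id","price"\n"23210","19,90"\n')

# No comment= argument; dtype=str keeps every value as text.
csv1 = pd.read_csv(products, quotechar='"', dtype=str)
csv2 = pd.read_csv(prices, quotechar='"', dtype=str)
merged = csv1.merge(csv2, on="id")

# QUOTE_ALL writes every field in double quotes again.
out = io.StringIO()
merged.to_csv(out, index=False, quoting=csv.QUOTE_ALL)
lines = out.getvalue().splitlines()
```

With real files, the StringIO objects would be replaced by the file names used in the question.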
