I have a comma-separated string. When I export it to a CSV file and then load the resulting CSV in pandas, I get the desired result. But when I try to load the string directly into a pandas DataFrame, I get the error "Filename too big". Please help me solve it.
I found the cause: it only happened for large strings. When read_csv receives a plain string, it treats it as a file path, so a long string of CSV data is rejected as a file name that is too long.
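I used:
import io
import pandas as pd
df = pd.read_csv(io.StringIO(csv_string), sep=',', engine='python')
That solved the issue. Here csv_string is the variable holding the data (I originally named it str, which shadows Python's built-in str type).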
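For reference, here is a minimal self-contained version of that fix. The CSV text in csv_string is made up for illustration. Wrapping it in io.StringIO gives read_csv a file-like object, so it parses the text instead of treating it as a path; the sep=',' and engine='python' arguments are not needed for data like this, since a comma is already the default separator.

```python
import io
import pandas as pd

# Hypothetical CSV data held in memory as a plain string.
csv_string = "name,age\nAlice,30\nBob,25\n"

# StringIO turns the string into a file-like object, so read_csv parses
# its contents instead of interpreting the string as a file name.
df = pd.read_csv(io.StringIO(csv_string))
print(df)
```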
I am using pandas to convert a CSV file to Parquet. Below is the code; it is straightforward.
import pandas as pd
df = pd.read_csv('path/xxxx.csv')
print(df)
df.to_parquet('path/xxxx.parquet')
Problem
When a string contains a comma, for example David,Johnson, I get an error saying there is a problem in the data.
If I remove the comma, the CSV file converts to Parquet.
Any suggestions? I need help.
Thanks
Madhu
Quoting the question: "If I remove the comma, the CSV file converts to Parquet."
Do you need to keep the comma in the name of the file? Otherwise you can replace it, e.g. input = 'David,Johnson' and output = input.replace(',', '_'). I don't think it is generally good practice to have commas in file names.
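If instead the comma is inside the data rather than in the file name (as the David,Johnson example in the question suggests), a likely cause is that the field is not wrapped in double quotes. This is a minimal sketch assuming the error is a tokenizing error of that kind; the inline data and column names are hypothetical stand-ins for path/xxxx.csv.

```python
import io
import pandas as pd

# Hypothetical data: the name field contains a comma and is wrapped in quotes,
# so read_csv (default quotechar='"') keeps it as a single value.
quoted = 'id,name,city\n1,"David,Johnson",Boston\n'
df = pd.read_csv(io.StringIO(quoted))
print(df)

# Unquoted version: the second data row has one field more than the header,
# which the default C parser reports as a ParserError.
unquoted = 'id,name,city\n1,Ann,Paris\n2,David,Johnson,Boston\n'
try:
    pd.read_csv(io.StringIO(unquoted))
    raised = False
except pd.errors.ParserError as exc:
    raised = True
    print("ParserError:", exc)
```

So the fix is usually on the side that writes the CSV: quote any field that can contain the separator, and the file can then be converted with to_parquet as before.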
I am reading a CSV file using this piece of code:
import pandas as pd
import os
# Open the document (.csv format)
path = os.getcwd()
database = pd.read_csv(os.path.join(path, "mydocument.csv"), delimiter=';', header=0)
# Date in the required format
size = len(database.Date)
I get the following error: 'DataFrame' object has no attribute 'Date'
As you can see in the image, the first column of mydocument.csv is called Date. This is strange, because I have used this same procedure with this document before and it worked.
Try using delimiter=','. The file may actually be comma-separated rather than semicolon-separated.
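To see why a wrong delimiter produces exactly this error, here is a small sketch with hypothetical comma-separated data read with delimiter=';'. pandas finds no ';' in the text, so the whole header becomes a single column named 'Date,Value' and there is no Date attribute.

```python
import io
import pandas as pd

# Hypothetical comma-separated content standing in for mydocument.csv.
data = "Date,Value\n2017-01-01,1\n2017-01-02,2\n"

# Wrong delimiter: each line is one field, so the only column is 'Date,Value'.
wrong = pd.read_csv(io.StringIO(data), delimiter=";")
print(wrong.columns.tolist())

# Matching delimiter: Date and Value are separate columns.
right = pd.read_csv(io.StringIO(data), delimiter=",")
print(right.columns.tolist())
```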
(I can't post comments yet, but I think I can help.)
The explanation for your problem is simply that you don't have a column called Date. Has pandas interpreted Date as an index column?
If Date is spelled correctly (no trailing whitespace or anything else that might confuse things), try this:
pd.read_csv(path+"/mydocument.csv",delimiter=';',header=0, index_col=False)
or if that fails this:
pd.read_csv(path+"/mydocument.csv",delimiter=';',header=0, index_col=None)
In my experience, I've had pandas unexpectedly infer an index_col.
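Both situations can be reproduced with hypothetical in-memory data. In the first case every data row ends with a stray ';', so rows have one more field than the header and pandas uses the first column as the index, shifting the remaining columns; index_col=False turns that inference off. The second case shows a header with trailing whitespace, fixed here by stripping the column names.

```python
import io
import pandas as pd

# Hypothetical file: each data row ends with a stray ';'.
data = "Date;Value\n2017-01-01;1;\n2017-01-02;2;\n"

# Rows have more fields than the header, so the dates become the index
# and the header names shift onto the remaining columns.
inferred = pd.read_csv(io.StringIO(data), delimiter=";", header=0)
print(inferred)

# index_col=False disables that inference; Date stays an ordinary column.
fixed = pd.read_csv(io.StringIO(data), delimiter=";", header=0, index_col=False)
print(fixed)

# A trailing space in the header yields a column named 'Date ', not 'Date'.
spaced = pd.read_csv(io.StringIO("Date ;Value\n2017-01-01;1\n"), delimiter=";")
spaced.columns = spaced.columns.str.strip()
print(spaced.Date)
```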
I'm trying to loop through a folder of CSV files, load each one into a DataFrame, and convert certain columns to integers before passing them to a Django model. Here is my code:
import glob
import pandas as pd
path = 'DIV1FCS_2017/*/*'
for fname in glob.glob(path):
    df = pd.read_csv(fname)
    # to_numeric returns a new Series, so assign it back to the column
    df['Number'] = pd.to_numeric(df['Number'])
I receive the following error: ValueError: Unable to parse string
Does anybody know if I can convert a column of strings into integers using pd.to_numeric from within a loop? Outside of the loop it seems to work properly.
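I think you probably have some non-numeric data stored in your DataFrame, and that's what's causing the error.
You can examine your data and make sure everything is fine. In the meantime, you can call pd.to_numeric(df['Number'], errors="ignore") to ignore errors for now. Note that errors="ignore" returns the column unchanged if any value fails to parse, and it is deprecated as of pandas 2.2; errors="coerce", which turns unparseable values into NaN, is usually the better choice.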
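A sketch of both steps at once: convert with errors="coerce" and report which values failed, so you can inspect the offending rows. The helper name to_int_column and the sample values are hypothetical; inside the question's loop it would be called as df = to_int_column(pd.read_csv(fname), 'Number').

```python
import pandas as pd

def to_int_column(df, column):
    """Convert df[column] to integers, printing values that fail to parse."""
    converted = pd.to_numeric(df[column], errors="coerce")
    # Values that were present but became NaN could not be parsed.
    bad = df.loc[converted.isna() & df[column].notna(), column]
    if not bad.empty:
        print(f"Unparseable values in {column!r}:", bad.tolist())
    # "Int64" is pandas' nullable integer dtype, so the NaN rows are kept as <NA>.
    df[column] = converted.astype("Int64")
    return df

# Hypothetical data standing in for one of the CSV files.
df = pd.DataFrame({"Number": ["1", "2", "12a", "4"]})
df = to_int_column(df, "Number")
print(df)
```

The nullable Int64 dtype is a deliberate choice here: a plain int64 column cannot hold NaN, so astype(int) would fail on the coerced rows.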
I want to read this CSV file into a pandas DataFrame, and then split the resulting strings on colons.
I read it using:
df_r = pd.read_csv("report.csv", sep=";|,", engine="python")
Then split using:
for c in df_r:
    if df_r[c].dtype == "object":
        df_r[c] = df_r[c].str.split(':')
But I get the following error:
ValueError: could not convert string to float: '"\x001\x000\x00'
Any idea what I am doing wrong?
Edit:
The error actually appears when I try to convert one of the strings to a float:
print(float(df_r["Laptime"].iloc[0][2]))
I ran your code and everything works fine. You can try catching the error, printing the row that behaves strangely, and inspecting it manually.
Is that the entire dump you are using? I noticed that you assign the CSV to the variable a and then use df_r, so I think you are doing something else in between.
If that is the complete CSV file, be aware that the last line is empty and creates a row full of NaNs. You may want to read the CSV with skipfooter=1 (supported only by engine="python", which you are already using):
a = pd.read_csv('report.csv', sep=";|,", engine="python", skipfooter=1)
Edit:
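You can convert it to float by removing the null characters and the stray double quote first. The elements produced by str.split are str objects, so replace needs str arguments rather than bytes:
print(float(df_r["Laptime"].iloc[0][2].replace('\x00', '').strip('"')))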
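The pattern in the error message, a \x00 after every character, is typical of a UTF-16 encoded file being decoded as UTF-8. If that is the case for report.csv (an assumption; check how the file was exported), passing encoding="utf-16" to read_csv avoids the null characters altogether. A sketch with hypothetical in-memory data; the Driver column and the lap-time format are made up.

```python
import io
import pandas as pd

# Hypothetical report with a lap time in h:mm:ss form, encoded as UTF-16
# (str.encode("utf-16") also writes a byte-order mark).
raw = "Driver;Laptime\nA;0:01:30.5\n".encode("utf-16")

# With the matching encoding the bytes are decoded correctly, so no \x00
# characters end up inside the parsed strings.
df_r = pd.read_csv(io.BytesIO(raw), sep=";", encoding="utf-16")
parts = df_r["Laptime"].str.split(":")
print(float(parts.iloc[0][2]))  # seconds part of the first lap time
```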
I'm using the following Python code with the pandas library. Its purpose is to join two CSV files, and it works as expected. In the CSV files all values are enclosed in double quotes, but when I use pandas the quotes disappear. What can I do to keep them? I have read the documentation and tried lots of options, but can't seem to get it right.
Any help is much appreciated.
Code:
import pandas
csv1 = pandas.read_csv('WS-Produktlista-2015-01-25.csv', quotechar='"',comment='"')
csv2 = pandas.read_csv('WS-Prislista-2015-01-25.csv', quotechar='"', comment='"')
merged = csv1.merge(csv2, on='id')
merged.to_csv("output.csv", index=False)
Instead of getting a line like this:
"1","Cologne","4711","4711","100ml",
I'm getting:
1,Cologne,4711,4711,100ml,
EDIT:
I have now found the problem. My files contain a header with 16 columns, and each data line contains 16 values separated by commas.
However, some lines contain quoted values that themselves contain commas, and this confuses the parser: instead of the expected 15 commas, it finds 18. For example:
"23210","Cosmetic","Lancome","Eyes Virtuose Palette Makeup","7,2g","W","Decorative range","5x1,2g Eye Shadow + 1,2g Powder","http://image.jpg","","3660732000104","","No","","1","1"
How can I make the parser ignore commas inside double quotes?
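A sketch of one way to handle both issues, under the assumption that the files look like the example line. read_csv already treats commas inside double quotes as part of the value through its default quotechar='"'; the comment='"' argument in the code above works against that, because comment makes the parser ignore the rest of a line from that character onward. The quotes disappear on reading because they are CSV syntax rather than data, and csv.QUOTE_ALL writes them back. The inline data below is hypothetical and stands in for the WS-*.csv files.

```python
import csv
import io
import pandas as pd

# Hypothetical stand-ins for the two input files: every value is quoted
# and some quoted values contain commas.
products = '"id","name","size"\n"23210","Eyes Virtuose Palette","7,2g"\n"4711","Cologne","100ml"\n'
prices = '"id","price"\n"23210","199"\n"4711","49"\n'

# No comment= argument; the default quotechar keeps "7,2g" as one field.
# dtype=str keeps every value exactly as written (no numeric conversion).
csv1 = pd.read_csv(io.StringIO(products), dtype=str)
csv2 = pd.read_csv(io.StringIO(prices), dtype=str)
merged = csv1.merge(csv2, on="id")

# QUOTE_ALL wraps every field, including the header, in double quotes.
out = io.StringIO()
merged.to_csv(out, index=False, quoting=csv.QUOTE_ALL)
print(out.getvalue())
```

With real files, the same arguments apply: read with pandas.read_csv('WS-Produktlista-2015-01-25.csv', dtype=str) and write with merged.to_csv("output.csv", index=False, quoting=csv.QUOTE_ALL).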