I am trying to read the following .csv file, but I want to read each column of it. However, usecols is not working as it is giving the following error:
ValueError: Usecols do not match columns, columns expected but not found: ['sources', 'RMS']
this is how I am reading it:
train=pd.read_csv("parameters.csv", usecols = ['sources','RMS'])
And this is my csv file:
how can I read each column of this file?
edit: I had an unclosed " but the problem persists
If you want to read every column, just remove the usecols argument, like so: train=pd.read_csv("parameters.csv"). I'm guessing usecols fails because your column names have spaces after them in the file, so the actual name of one of your columns is something like 'sources ' (with a trailing space).
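A minimal sketch of how to check that guess and work around it. The inline data below is made up, since the real parameters.csv is not shown; it only assumes the header carries stray spaces around the names.

```python
import io
import pandas as pd

# Hypothetical stand-in for parameters.csv with stray spaces in the header.
raw = "sources , RMS ,extra\na,0.1,x\nb,0.2,y\n"

# Read only the header row to see the labels exactly as pandas sees them.
labels = list(pd.read_csv(io.StringIO(raw), nrows=0).columns)

# usecols also accepts a callable, so the names can be compared after stripping.
wanted = {"sources", "RMS"}
train = pd.read_csv(io.StringIO(raw), usecols=lambda name: name.strip() in wanted)

# Normalise the labels so train['sources'] works afterwards.
train.columns = train.columns.str.strip()
```

Printing labels with repr() makes trailing spaces visible, which a plain print of the DataFrame tends to hide.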
It might be an error on your part, but you have unclosed quotation marks in your code snippet.
If the order of the columns will always be the same, you can also use an integer-list with usecols
df = pd.read_csv('file.csv', usecols=[0, 4])  # selects only the columns at positions 0 and 4
Related
I am trying to print a single column name and the corresponding values for that column in Python from a CSV file using pandas. I am able to print the column names, but when I then try to print just one of the columns with the following code:
import pandas as pd
df = pd.read_csv('pokemon_data.csv')
print(df['name'])
I then get these errors:
Updated: I see that the error is a "key error" however a key named "Name" should exist as it does when I run:
print(df.columns)
with output:
Index(['#,Name,Type 1,Type 2,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary'], dtype='object')
"KeyError : name" It seems like you don't have a name column.
To extend my comment below your question - your next issue is that the output of print(df.columns), which is Index(['#,Name,Type 1,Type 2,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary'], dtype='object'), is indicative that you only have 1 column.
The name of that 1 column is '#,Name,Type 1,Type 2,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary'. Maybe there is an issue with your .csv file, or perhaps adjusting the read_csv settings (see the read_csv docs) may help, for example changing the separator/delimiter to whatever your .csv file is actually using.
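As an illustration of the single-column symptom, here is a small sketch. It assumes a file that is delimited by semicolons, which is only a hypothetical cause; the real pokemon_data.csv may be broken in a different way (for example, each line wrapped in quotes).

```python
import io
import pandas as pd

# Made-up data using ';' as the delimiter instead of ','.
raw = "#;Name;Type 1;HP\n1;Bulbasaur;Grass;45\n4;Charmander;Fire;39\n"

# With the default sep=',' the whole header becomes one column label.
wrong = pd.read_csv(io.StringIO(raw))

# Passing the delimiter the file really uses splits the columns properly.
right = pd.read_csv(io.StringIO(raw), sep=";")

# Column labels are case-sensitive: 'Name' exists, 'name' does not.
names = right["Name"].tolist()
```

Checking df.shape or len(df.columns) right after loading is a quick way to spot this before indexing by name.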
I tried to get some basic statistics for the columns in my csv file, but apparently I can't even get the contents of the columns in my output.
I tried data['columnname']
import pandas as p
data = p.read_csv('Amazon.csv',delimiter='~}',na_values='nan')
data.columns
data['Title']
I expect to get the contents of 'Title' in my output
Without knowing the exact format of the .csv it's a bit hard, but hopefully something like this helps:
data = pd.read_csv("Amazon.csv", ... , header=0)
Setting header=0 will read the first line of the file and make it the column names.
If names aren't defined by the .csv you can either add them to the first line and use header=0 or use names=<array-like>.
data = pd.read_csv("Amazon.csv", ... , names=['Title',...,'LastCol'])
See: Pandas Docs for read_csv
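To make the two options concrete, here is a small sketch with made-up data. It uses a plain comma delimiter and invented column names rather than the real Amazon.csv layout, which is not shown.

```python
import io
import pandas as pd

# Case 1: the first line of the file already holds the column names.
with_header = "Title,Price\nBook A,10\nBook B,12\n"
df_header = pd.read_csv(io.StringIO(with_header), header=0)

# Case 2: the file has no header line, so the names are supplied explicitly.
no_header = "Book A,10\nBook B,12\n"
df_names = pd.read_csv(io.StringIO(no_header), names=["Title", "Price"])

# Either way the column can now be selected by its label.
titles = df_names["Title"].tolist()
```

When names is given without header, pandas treats every line as data, so the first row is not consumed as a header.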
I'm importing an .xlsx file with pd.read_excel(). I received this file as a CSV file and used Excel to separate it by comma, so I get a proper .xlsx file with columns etc. Six of the dataframe columns have a number as header (e.g. 5030, 5031,...). When I want to change the column name with df = df.rename(columns={...}), this does not work. Also, df["5030"] does not work; it throws an error: KeyError:'5030'. The same code works for columns which have regular/non-integer names.
However, when I import the raw .csv file with pd.read_csv(), all the code above does work. I can just rename column names. The df's do look exactly the same when imported with both techniques, but apparently something is different.
It is not a serious issue, as I can change the column names to non-integers manually in Excel, but I'm very curious about what the underlying "problem" is here and how these two functions operate differently.
Thanks!
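One plausible explanation, assuming Excel stored those headers as numbers: read_excel() would then return integer column labels, while read_csv() returns header labels as strings, so "5030" and 5030 are different keys even though they print the same. The sketch below builds such a frame by hand instead of reading a workbook, and the new name sensor_a is purely illustrative.

```python
import pandas as pd

# Hand-built stand-in for a frame whose numeric headers came in as integers.
df = pd.DataFrame({5030: [1, 2], 5031: [3, 4], "name": ["a", "b"]})

# The integer label is present, the string label is not.
has_int_label = 5030 in df.columns
has_str_label = "5030" in df.columns

# Converting every label to str makes the string-based code work again.
df.columns = df.columns.map(str)
renamed = df.rename(columns={"5030": "sensor_a"})
```

Inspecting df.columns.tolist() shows the difference directly, because integers appear without quotes there.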
I have a tsv file which I am trying to read with pandas. The first two rows of the file are of no use and need to be ignored. However, the output I get has only two columns. The name of the first column is Index and the name of the second column is a random row from the csv file.
import pandas as pd
data = pd.read_csv('zahlen.csv', sep='\t', skiprows=2)
Please refer to the screenshot below.
The second column name is in bold black, and it is one of the rows from the file. Moreover, using '\t' as the delimiter does not separate the values into different columns. I am using the Spyder IDE for this. Am I doing something wrong here?
Try this:
data = pd.read_table('zahlen.csv', header=None, skiprows=2)
read_table() uses the same parser as read_csv() but defaults to a tab delimiter, which makes it a natural fit for tsv files. header=None then makes pandas treat the first remaining row as data instead of as the header.
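A short sketch of that call on made-up tab-separated text (the real zahlen.csv is not shown, so the two junk lines and the numbers are assumptions):

```python
import io
import pandas as pd

# Two unwanted lines followed by tab-separated rows without a header.
raw = "junk line 1\njunk line 2\n1\t2\t3\n4\t5\t6\n"

# read_table defaults to sep='\t'; skiprows drops the first two lines.
data = pd.read_table(io.StringIO(raw), header=None, skiprows=2)

# The equivalent read_csv call needs the tab separator spelled out.
same = pd.read_csv(io.StringIO(raw), sep="\t", header=None, skiprows=2)
```

If the values still land in a single column, the file probably is not tab-separated after all, and opening it in a text editor that shows whitespace can confirm which delimiter it uses.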
I'm using the following Python code with the Pandas library. The purpose of the code is to join 2 CSV files, and it works as expected. In the CSV files all the values are within "". When using the Pandas library they disappear. I wonder what I can do to keep them? I have read the documentation and tried lots of options but can't seem to get it right.
Any help is much appreciated.
Code:
import pandas
csv1 = pandas.read_csv('WS-Produktlista-2015-01-25.csv', quotechar='"',comment='"')
csv2 = pandas.read_csv('WS-Prislista-2015-01-25.csv', quotechar='"', comment='"')
merged = csv1.merge(csv2, on='id')
merged.to_csv("output.csv", index=False)
Instead of getting a line like this:
"1","Cologne","4711","4711","100ml",
I'm getting:
1,Cologne,4711,4711,100ml,
EDIT:
I have now found the problem. My files contain a header with 16 columns. The data lines contain 16 values separated with ",".
I just found that some lines contain values within "" that themselves contain ",". This is confusing the parser. Instead of the expected 15 commas, it finds 18. One example below:
"23210","Cosmetic","Lancome","Eyes Virtuose Palette Makeup",**"7,2g"**,"W","Decorative range","5x**1,2**g Eye Shadow + **1,2**g Powder","http://image.jpg","","3660732000104","","No","","1","1"
How can I make the parser ignore the comma sign within ""?
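A possible approach, sketched on made-up data with fewer columns than the real files: with the default quotechar='"', read_csv already keeps commas that sit inside quoted values, so the likely culprit is comment='"', which tells the parser to discard the rest of a line after a quote character. Dropping that argument and writing with quoting=csv.QUOTE_ALL should bring the quotes back in the output. The dtype=str and keep_default_na=False settings are assumptions added here so that values such as "4711" and "" survive unchanged.

```python
import csv
import io
import pandas as pd

# Hypothetical miniature versions of the product and price lists.
products = '"id","name","size"\n"1","Cologne","100ml"\n"23210","Eyes Palette","7,2g"\n'
prices = '"id","price"\n"1","4711"\n"23210","39,5"\n'

# No comment= argument; read everything as text and keep empty strings as-is.
csv1 = pd.read_csv(io.StringIO(products), dtype=str, keep_default_na=False)
csv2 = pd.read_csv(io.StringIO(prices), dtype=str, keep_default_na=False)

merged = csv1.merge(csv2, on="id")

# QUOTE_ALL wraps every field in quotes when writing.
out = io.StringIO()
merged.to_csv(out, index=False, quoting=csv.QUOTE_ALL)
result = out.getvalue()
```

With real files, the same arguments apply when passing file names instead of StringIO objects, for example merged.to_csv("output.csv", index=False, quoting=csv.QUOTE_ALL).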