I am trying to print a single column name and the corresponding values for that column in Python from a CSV file using pandas. I am able to print the column names, but when I then try to print just one of the columns with the following code:
import pandas as pd
df = pd.read_csv('pokemon_data.csv')
print(df['name'])
I then get these errors:
Update: I see that the error is a KeyError; however, a key named "Name" should exist, as it does when I run:
print(df.columns)
with output:
Index(['#,Name,Type 1,Type 2,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary'], dtype='object')
"KeyError: name" means you don't have a column called name. Column labels are case-sensitive, so even a "Name" column would not match df['name'].
To extend my comment below your question: your next issue is that the output of print(df.columns), which is Index(['#,Name,Type 1,Type 2,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary'], dtype='object'), indicates that you have only 1 column.
The name of that 1 column is '#,Name,Type 1,Type 2,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary'. Maybe there is an issue with your .csv file, or adjusting the read_csv settings (see the pandas read_csv documentation) may help; for example, change the separator/delimiter to whatever your .csv file is actually using.
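A minimal sketch of the separator mismatch described above, assuming a small hypothetical sample in place of pokemon_data.csv (read from an in-memory io.StringIO buffer). With the wrong separator every line stays in one field, so the whole header becomes a single column name; with the matching separator the columns split, and lookup by label is case-sensitive.

```python
import io
import pandas as pd

# Hypothetical sample standing in for pokemon_data.csv
raw = "#,Name,Type 1,HP\n1,Bulbasaur,Grass,45\n4,Charmander,Fire,39\n"

# Wrong separator: nothing is split, so there is a single column
wrong = pd.read_csv(io.StringIO(raw), sep=";")
print(wrong.shape[1])       # 1
print(list(wrong.columns))  # ['#,Name,Type 1,HP']

# Matching separator: the header is split into separate columns
df = pd.read_csv(io.StringIO(raw), sep=",")
print(list(df.columns))     # ['#', 'Name', 'Type 1', 'HP']

# Labels are case-sensitive: 'Name' works, 'name' raises KeyError
print(df["Name"].tolist())  # ['Bulbasaur', 'Charmander']
try:
    df["name"]
except KeyError as exc:
    print("KeyError:", exc)
```

If print(df.columns) shows one long comma-joined name like the one in the question, the separator pandas used did not match the file.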
Related
I have been trying to rename columns in a CSV file that I am working on in Google Colab. The same line of code works for one column name but not for the other.
import pandas as pd
import numpy as np
data = pd.read_csv("Daily Bike Sharing.csv",
index_col="dteday",
parse_dates=True)
dataset = data.loc[:,["cnt","holiday","workingday","weathersit",
"temp","atemp","hum","windspeed"]]
dataset = dataset.rename(columns={'cnt' : 'y'})
dataset = dataset.rename(columns={"dteday" : 'ds'})
dataset.head(1)
The image below shows the dataframe called data.
The image below shows dataset.
This image shows the final output I get when I try to rename the columns.
The column name "dteday" is not getting renamed, but "cnt" is renamed to "y" by the same code. Can someone help me out? I have been racking my brain over this for some time now.
That's because you set dteday as your index when reading in the CSV, whereas cnt is an ordinary column, and rename(columns=...) only renames columns. Avoid the index_col argument in read_csv, include "dteday" in the list of columns you select with loc, and after renaming run dataset = dataset.set_index('ds').
An alternative in which only your penultimate line (trying to rename the index) would need to be changed:
dataset.index.names = ['ds']
You can remove index_col from the read statement, include 'dteday' in your dataset, and then change the column name. You can make that column the index later using df.set_index.
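The difference between renaming a column and renaming the index can be sketched as follows, assuming a tiny hypothetical extract of the bike-sharing data in an io.StringIO buffer. rename(columns=...) only touches column labels, so a "dteday" key is silently ignored once dteday is the index; rename_axis (or assigning to index.names) renames the index instead.

```python
import io
import pandas as pd

# Hypothetical stand-in for "Daily Bike Sharing.csv"
raw = "dteday,cnt,holiday\n2011-01-01,985,0\n2011-01-02,801,0\n"

data = pd.read_csv(io.StringIO(raw), index_col="dteday", parse_dates=True)

# "dteday" is now the index name, not a column label, so that key is ignored
before = data.rename(columns={"cnt": "y", "dteday": "ds"})
print(list(before.columns))  # ['y', 'holiday']
print(before.index.name)     # dteday

# Option 1: rename the index axis itself
option1 = before.rename_axis("ds")
print(option1.index.name)    # ds

# Option 2: keep dteday as a column while renaming, then make it the index
option2 = (
    pd.read_csv(io.StringIO(raw), parse_dates=["dteday"])
    .rename(columns={"cnt": "y", "dteday": "ds"})
    .set_index("ds")
)
print(option2.index.name)    # ds
```

Option 1 keeps the original read_csv call; option 2 matches the approach of renaming first and calling set_index afterwards.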
I am reading a CSV file using this piece of code:
import pandas as pd
import os
#Open document (.csv format)
path=os.getcwd()
database=pd.read_csv(path+"/mydocument.csv",delimiter=';',header=0)
# Date in the required format
size=len(database.Date)
I get the following error: AttributeError: 'DataFrame' object has no attribute 'Date'
As you can see in the image, the first column of mydocument.csv is called Date. This is strange because I have used this same procedure on this document before, and it worked.
Try using delimiter=','. The file may actually be comma-separated rather than semicolon-separated.
(I can't post comments yet, but I think I can help.)
The explanation for your problem is simply that you don't have a column called Date. Has pandas interpreted Date as an index column?
If Date is spelled correctly (no trailing whitespace or anything else that might confuse things), then try this:
pd.read_csv(path+"/mydocument.csv",delimiter=';',header=0, index_col=False)
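or, if that fails, this (note that index_col=None is already the read_csv default, so it behaves the same as leaving the argument out):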
pd.read_csv(path+"/mydocument.csv",delimiter=';',header=0, index_col=None)
In my experience, I've had pandas unexpectedly infer an index_col.
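The index inference mentioned above happens when data rows have one more field than the header (for example, because every row ends with a trailing delimiter); pandas then treats the first field as the row labels. A sketch with a hypothetical in-memory file, which also shows how stray whitespace in a header name hides a column from attribute access such as database.Date:

```python
import io
import pandas as pd

# Hypothetical file: each data row ends with a trailing ";", so rows have one
# more field than the header and pandas uses the first field as the index
raw = "Date;Value\n2020-01-01;5;\n2020-01-02;7;\n"

inferred = pd.read_csv(io.StringIO(raw), delimiter=";", header=0)
print(inferred.index.tolist())    # ['2020-01-01', '2020-01-02'] (dates became the index)
print(inferred["Date"].tolist())  # [5, 7] (values shifted one column left)

# index_col=False stops pandas from using the first column as the index
fixed = pd.read_csv(io.StringIO(raw), delimiter=";", header=0, index_col=False)
print(fixed["Date"].tolist())     # ['2020-01-01', '2020-01-02']

# A trailing space in a header name also breaks attribute access
spaced = pd.read_csv(io.StringIO("Date ;Value\n2020-01-01;5\n"), delimiter=";")
print(list(spaced.columns))       # ['Date ', 'Value']
spaced.columns = spaced.columns.str.strip()
print(spaced.Date.tolist())       # ['2020-01-01']
```

Printing list(database.columns) is a quick way to see which of these cases applies.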
I am trying to read the following .csv file, and I want to read each of its columns. However, usecols is not working; it gives the following error:
ValueError: Usecols do not match columns, columns expected but not found: ['sources', 'RMS']
This is how I am reading it:
train=pd.read_csv("parameters.csv", usecols = ['sources','RMS'])
And this is my csv file:
How can I read each column of this file?
Edit: I had an unclosed quotation mark, but the problem persists.
If you want to read every column, just remove the usecols part: train=pd.read_csv("parameters.csv"). My guess is that it doesn't work because your column names have spaces after them, so the actual name of one of your columns is something like "sources " (with a trailing space).
It might be an error on your part, but you have unclosed quotation marks in your code snippet.
If the order of the columns will always be the same, you can also pass a list of integer positions to usecols:
df = pd.read_csv('file.csv', usecols=[0, 4])  # selects only the columns at positions 0 and 4
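If the header names really do carry trailing spaces, as guessed above, one way to keep selecting by name rather than by position is to pass a callable to usecols, which pandas evaluates against each column name. A sketch with a hypothetical in-memory stand-in for parameters.csv:

```python
import io
import pandas as pd

# Hypothetical stand-in for parameters.csv: header names have trailing spaces
raw = "sources ,RMS ,other\nA,0.5,x\nB,0.7,y\n"

# The real labels include the spaces, which is why usecols=['sources', 'RMS'] fails
print(list(pd.read_csv(io.StringIO(raw)).columns))  # ['sources ', 'RMS ', 'other']

# A callable usecols is applied to each column name; keep those that match once stripped
wanted = {"sources", "RMS"}
train = pd.read_csv(io.StringIO(raw), usecols=lambda name: name.strip() in wanted)

# Clean the labels so later lookups like train['RMS'] work
train.columns = train.columns.str.strip()
print(train)
```

Unlike integer positions, this keeps working if the columns are reordered, as long as the names stay the same.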
This is my first question, so sorry in advance if I explain anything poorly.
I'm coding in Python 2.7.
I wrote a .xlsx (Excel) file (it could just as well have been a .xls; I don't need macros or VBA at this point). The Excel file looks like this:
Each value is identified by the name of its column and the name of its row. For example, I have a column named "Curve 1" and a row named "Number of extremum", so in that cell I wrote "1" because Curve 1 has one extremum.
I want to read this value so I can manipulate it in a Python script.
I know I can use the xlrd module's open_workbook, put the values of row 1 ("Number of extremum") in a list, and then take only the first one (corresponding to the column "Curve 1" and so to the value "1" I want), but this isn't what I would like.
Instead, I would like to access the "1" cell by giving the Python script only the strings "Curve 1" and "Number of extremum", so that it reads the cell where that row and column meet and returns its value, "1". Is this possible?
I want to do this because the Excel file will change over time and cells could be moved. If I access a cell by its position (like row 1, column 1), I would have a problem whenever a column or row is added at that position. I would like not to have to edit the Python script every time the .xlsx file is edited.
Thank you very much.
Pandas is a popular third-party library for reading and writing datasets. You can use pd.DataFrame.at for efficient scalar access via row and column labels:
import pandas as pd
# read file, using the first column (the row names) as the index
df = pd.read_excel('file.xlsx', index_col=0)
# extract value
val = df.at['N of extremum', 'Curve 1']
This is easy using pandas. To obtain the cell you want, use loc, which lets you specify the row and column labels directly (read the file with index_col=0 so that the row names become the index).
import pandas
df = pandas.read_excel('test.xlsx', index_col=0)
df.loc['N of extremum', 'Curve 1']
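Both answers rely on the row names being the DataFrame index. A sketch using a small hypothetical DataFrame in place of the Excel file (so no Excel engine is needed; the row label "N of zeros", the column "Name", and all values are invented for illustration) shows set_index turning the first column into row labels, and the label lookup still finding the same cell after a row is added above it, which is what the question asks for.

```python
import pandas as pd

# Hypothetical data shaped like the spreadsheet; the first column holds the row names
sheet = pd.DataFrame({
    "Name": ["N of extremum", "N of zeros"],
    "Curve 1": [1, 3],
    "Curve 2": [2, 0],
})

# Use the row names as the index (read_excel(..., index_col=0) does this when reading)
table = sheet.set_index("Name")
value = table.at["N of extremum", "Curve 1"]
print(value)  # 1

# Insert a new row above it: the label-based lookup still finds the same cell
new_row = pd.DataFrame({"Curve 1": [9], "Curve 2": [9]}, index=["New row"])
grown = pd.concat([new_row, table])
print(grown.at["N of extremum", "Curve 1"])   # 1
print(grown.loc["N of extremum", "Curve 1"])  # 1
```

.at is limited to a single cell, while .loc also accepts lists or slices of labels.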
I have a TSV file which I am trying to read with pandas. The first two rows of the file are of no use and need to be ignored. However, the output I get has only two columns: the first is named Index, and the second is named after a seemingly random row from the file.
import pandas as pd
data = pd.read_csv('zahlen.csv', sep='\t', skiprows=2)
Please refer to the screenshot below.
The second column name, shown in bold black, is one of the rows from the file. Moreover, using '\t' as the delimiter does not split the values into separate columns. I am using the Spyder IDE. Am I doing something wrong here?
Try this:
data = pd.read_table('zahlen.csv', header=None, skiprows=2)
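read_table() uses a tab as its default separator, which makes it convenient for TSV files; otherwise it behaves like read_csv() (read_table(f) is equivalent to read_csv(f, sep='\t')). Then header=None makes the first remaining row data instead of the header.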
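A sketch of how skiprows and header interact, assuming a hypothetical tab-separated sample in an io.StringIO buffer instead of zahlen.csv. With the default header, the first row left after skipping becomes the column names; with header=None every remaining row is kept as data and the columns are numbered from 0. If the values still end up in a single column, the file may not actually be tab-separated, and printing repr() of a raw line shows the real separator.

```python
import io
import pandas as pd

# Hypothetical TSV: two junk lines, then a header line, then data
raw = "exported by tool\nversion 2\nA\tB\tC\n1\t2\t3\n4\t5\t6\n"

# Default header: after skipping 2 lines, the A/B/C line becomes the column names
named = pd.read_table(io.StringIO(raw), skiprows=2)
print(list(named.columns))  # ['A', 'B', 'C']

# header=None: every remaining line is data, columns are labelled 0, 1, 2
unnamed = pd.read_table(io.StringIO(raw), skiprows=2, header=None)
print(unnamed.shape)  # (3, 3)

# read_table is read_csv with a tab as the default separator
same = pd.read_csv(io.StringIO(raw), sep="\t", skiprows=2)
print(same.equals(named))  # True

# If splitting fails, inspect a raw line to see which separator is really used
print(repr(raw.splitlines()[2]))
```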