I have a CSV column heading that contains a line break:
"Submission S
tatus"
csv headers:
Unit,Publication ID,Title,"Submission S
tatus",Notes,Name,User ID
How can I refer to this when reading into the dataframe with the usecols parameter (or alternatively when renaming at a later stage)?
I have tried:
df = pd.read_csv('myfile.csv', usecols=['Submission S\ntatus'])
error: Usecols do not match columns, columns expected but not found
df = pd.read_csv('myfile.csv', usecols=['Submission S\rtatus'])
error: Usecols do not match columns, columns expected but not found
df = pd.read_csv('myfile.csv', usecols=['Submission S
tatus'])
error: SyntaxError: EOL while scanning string literal
How should I be referring to this column?
This is not the answer you wanted, but I hope it helps if a workaround is acceptable.
df = pd.read_csv('myfile.csv', usecols=[n])  # n is the position of the column in the file
df.rename(columns={df.columns[0]: "new column name"}, inplace=True)  # only one column was read, so rename position 0
You can also read the CSV file in the usual way:
import pandas as pd
df = pd.read_csv(csv_file)
saved_column = df.column_name
You can save the column names with
colnames = df.columns
and later replace the name of the problematic column with a meaningful one.
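If you prefer to match the awkward header itself rather than its position, here is a minimal sketch; it assumes the header really contains an embedded line break and that renaming it to a clean name is acceptable:
import pandas as pd
# Read only the header row and find the column whose name, with line breaks removed, is "Submission Status"
header = pd.read_csv('myfile.csv', nrows=0).columns
target = [c for c in header if c.replace('\n', '').replace('\r', '') == 'Submission Status']
df = pd.read_csv('myfile.csv', usecols=target)
df = df.rename(columns={target[0]: 'Submission Status'})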
I want x to be all the columns except the "churn" column.
But when I do the below I get the "['churn'] not found in axis" error, even though I can see the column name when I run print(list(df.columns)).
Here is my code:
import pandas as pd
import numpy as np
df = pd.read_csv("/Users/utkusenel/Documents/Data Analyzing/data.csv", header=0)
print(df.head())
print(df.columns)
print(len(df.columns))
x = df.drop(["churn"], axis=1) ## this is the part it gives the error
I am adding a snippet of my dataset as well:
account_length;area_code;international_plan;voice_mail_plan;number_vmail_messages;total_day_minutes;total_day_calls;total_day_charge;total_eve_minutes;total_eve_calls;total_eve_charge;total_night_minutes;total_night_calls;total_night_charge;total_intl_minutes;total_intl_calls;total_intl_charge;number_customer_service_calls;churn;
1;KS;128;area_code_415;no;yes;25;265.1;110;45.07;197.4;99;16.78;244.7;91;11.01;10;3;2.7;1;no
2;OH;107;area_code_415;no;yes;26;161.6;123;27.47;195.5;103;16.62;254.4;103;11.45;13.7;3;3.7;1;no
3;NJ;137;area_code_415;no;no;0;243.4;114;41.38;121.2;110;10.3;162.6;104;7.32;12.2;5;3.29;0;no
I see that your data snippet is separated with ';' (semicolon). If that is what your actual data looks like, then your CSV is probably being read incorrectly. Try adding sep=';' to the read_csv call.
df = pd.read_csv("/Users/utkusenel/Documents/Data Analyzing/data.csv", header=0, sep=';')
I also suggest printing df.columns again and checking whether there is leading or trailing whitespace in the name of the churn column.
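A minimal sketch combining both suggestions (the path is taken from the question; the strip step only matters if the header really has stray spaces):
import pandas as pd
df = pd.read_csv("/Users/utkusenel/Documents/Data Analyzing/data.csv", header=0, sep=';')
df.columns = df.columns.str.strip()  # remove any leading/trailing whitespace from the header names
x = df.drop(columns=['churn'])       # 'churn' is now found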
I'm reading .txt files in a directory and want to drop columns that contain a certain string.
for file in glob.iglob(files + '.txt', recursive=True):
    cols = list(pd.read_csv(file, nrows=1))
    df = pd.read_csv(file, header=0, skiprows=0, skipfooter=0,
                     usecols=[i for i in cols if i.str.contains['TRIVIAL|EASY'] == False])
When I do this I'm getting:
df = pd.read_csv(file, header=0, skiprows=0, skipfooter=0, usecols=[i for i in cols if i.str.contains['PASS']==True])
AttributeError: 'str' object has no attribute 'str'
Which part do I need to fix? I could not figure it out.
Without reading the header separately, you can pass a callable to usecols that keeps a column only if neither 'EASY' nor 'TRIVIAL' appears in its name:
exclu = ['EASY', 'TRIVIAL'] # Any substring in this list excludes a column
usecols = lambda x: not any(substr in x for substr in exclu)
df = pd.read_csv('test.csv', usecols=usecols)
print(df)
HARD MEDIUM
0 2 4
1 6 8
2 1 1
Sample Data: test.csv
TRIVIAL,HARD,EASYfoo,MEDIUM
1,2,3,4
5,6,7,8
1,1,1,1
There are a few issues in your code: each i in the list comprehension is a plain Python string, so it has no .str accessor (hence the AttributeError), and str.contains is a method that has to be called with parentheses, not indexed with square brackets. Using regex instead:
import re
cols = pd.read_csv(file, nrows=1)
cols_to_use = [i for i in cols.columns if not re.search('TRIVIAL|EASY', i)]
df = pd.read_csv(file, header=0, skiprows=0, skipfooter=0, usecols=cols_to_use)
I have a multi-header Excel sheet without any index column. When I read it with pandas, it treats the first column as an index. I want pandas to create a default index instead of treating the first column as one. Any help would be appreciated.
I tried the code below:
df = pd.read_excel(file, header=[1,2], sheetname= "Ratings Inputs", parse_cols ="A:AA", index_col=None)
From my tests, read_excel seems broken with a multi-line header: when index_col is absent or None, it behaves as if it were 0.
You have two possible workarounds here.
Use reset_index as suggested by @mounaim:
df = pd.read_excel(file, header=[1,2], sheetname= "Ratings Inputs",
parse_cols ="A:AA", index_col=None).reset_index()
This is almost correct, except that the two header cells of the first column end up as the names of the MultiIndex df.columns, and the first column itself is named ('index', ''). So you must re-create the columns:
df.columns = pd.MultiIndex.from_tuples([tuple(df.columns.names)]
+ list(df.columns)[1:])
Alternatively, read the headers separately:
head = pd.read_excel('3x3.xlsx', header=None, sheetname= "Ratings Inputs",
parse_cols ="A:AA", skiprows=1, nrows=2)
df = pd.read_excel(file, header=2, sheetname= "Ratings Inputs",
parse_cols ="A:AA", index_col=None).reset_index()
df.columns = pd.MultiIndex.from_tuples(list(head.transpose().to_records(index=False)))
Have you tried reset_index()?
your_data_frame.reset_index(drop=True, inplace=True)
The question is quite self-explanatory. Is there any way to read the time series data from the CSV file while skipping the first column?
I tried this code:
df = pd.read_csv("occupancyrates.csv", delimiter = ',')
df = df[:,1:]
print(df)
But this is throwing an error:
"TypeError: unhashable type: 'slice'"
If you know the name of the column just do:
df = pd.read_csv("occupancyrates.csv") # no need to use the delimiter = ','
df = df.drop(['your_column_to_drop'], axis=1)
print(df)
df = pd.read_csv("occupancyrates.csv")
df.pop('column_name')
A DataFrame is like a dictionary, where column names are keys and the column contents are values. For example:
d = dict(a=1,b=2)
d.pop('a')
Now if you print d, the output will be
{'b': 2}
This is what I have done above to remove a column from the DataFrame.
This way you do not need to assign the result back to the DataFrame like in the other answer(s).
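A small sketch of the same idea on a DataFrame (the column names here are made up for illustration):
import pandas as pd
df = pd.DataFrame({'date': ['2020-01', '2020-02'], 'occupancy': [0.8, 0.9]})
removed = df.pop('date')  # modifies df in place and returns the removed column as a Series
print(df.columns)         # Index(['occupancy'], dtype='object')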
df = df.iloc[:, 1:]
This way you don't even need to specify inplace=True anywhere.
The simplest way to delete the first column should be:
del df[df.columns[0]]
or
df.pop(df.columns[0])
I want to read from a CSV file using pandas read_csv. The CSV file doesn't have column names. When I use pandas to read the CSV file, the first row is set as columns by default. But when I use df.columns = ['ID', 'CODE'], the first row is gone. I want to add, not replace.
df = pd.read_csv(CSV)
df
a 55000G707270
0 b 5l0000D35270
1 c 5l0000D63630
2 d 5l0000G45630
3 e 5l000G191200
4 f 55000G703240
df.columns=['ID','CODE']
df
ID CODE
0 b 5l0000D35270
1 c 5l0000D63630
2 d 5l0000G45630
3 e 5l000G191200
4 f 55000G703240
I think you need the names parameter in read_csv:
df = pd.read_csv(CSV, names=['ID','CODE'])
names : array-like, default None
List of column names to use. If file contains no header row, then you should explicitly pass header=None. Duplicates in this list are not allowed unless mangle_dupe_cols=True, which is the default.
You may pass the column names at the time of reading the CSV file itself:
df = pd.read_csv(csv_path, names = ["ID", "CODE"])
Use the names argument in the function call to supply the column names yourself:
df = pd.read_csv(CSV, names=['ID','CODE'])
You need both header=None and names=['ID','CODE'], because there are no column names/labels/headers in your CSV file:
df = pd.read_csv(CSV, header=None, names=['ID','CODE'])
The reason an extra index column appears is that to_csv() writes the index by default, so you can either disable the index when saving your CSV:
df.to_csv('file.csv', index=False)
or you can specify an index column when reading:
df = pd.read_csv('file.csv', index_col=0)
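A minimal sketch of the full round trip (file names are illustrative):
import pandas as pd
# Read a headerless CSV and supply the column names yourself
df = pd.read_csv('data_without_header.csv', header=None, names=['ID', 'CODE'])
# Write it back without the extra index column
df.to_csv('data_with_header.csv', index=False)
# Reading it again now picks up 'ID' and 'CODE' from the file's header row
df2 = pd.read_csv('data_with_header.csv')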