Usecols do not match columns, columns expected but not found csv issue - python

My code for getting all column values from the Excel file is given below:
import pandas as pd
from pandas import ExcelWriter
from pandas import ExcelFile
col_list = ["storeName"]
df = pd.read_csv('/home/preety/Downloads/my_store.xlsx',usecols=col_list)
print("Column headings:")
print(df['storeName'])
The error I am getting:
File "/var/www/html/fulfilment-admin/venv/lib/python3.8/site-packages/pandas/io/parsers.py", line 1232, in _validate_usecols_names
raise ValueError(
ValueError: Usecols do not match columns, columns expected but not found: ['CategoryName']
My Excel sheet is given below: [screenshot]
What I exactly want is to get all store_code values in a list, but when I try to get them it returns the error above. I don't know what I am doing wrong here; can anyone please help me with this? Thanks in advance.

To avoid this kind of error, specify the separator argument sep=, as shown here: https://stackoverflow.com/a/55514024/12385909
Also note that your file is an .xlsx, so it should be read with pd.read_excel(), not pd.read_csv().
For those who have trouble with usecols= in the read_excel() function: you need to specify Excel column letters there, e.g. usecols="A:E" or usecols="A,C,E:F".
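A minimal sketch of the sep= point, using made-up in-memory data: if the file is semicolon-delimited, the whole header is parsed as a single column and usecols fails with exactly this error.

```python
import io

import pandas as pd

# Hypothetical semicolon-delimited data; without sep=";" the header
# "storeName;storeCode" would become one column named
# 'storeName;storeCode', and usecols=["storeName"] would raise
# "Usecols do not match columns, columns expected but not found".
raw = "storeName;storeCode\nAlpha;S1\nBeta;S2\n"

df = pd.read_csv(io.StringIO(raw), sep=";", usecols=["storeName"])
print(df["storeName"].tolist())  # ['Alpha', 'Beta']
```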

import pandas as pd
from pandas import ExcelWriter
from pandas import ExcelFile
col_list = ["StoreName"]
df = pd.read_csv('/home/preety/Downloads/my_store.xlsx',usecols=col_list)
print("Column headings:")
print(df['StoreName'])
The title of your column contains a capital "S", so pandas is unable to locate "storeName" because it doesn't exist.
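A minimal sketch of the case-sensitivity issue, using made-up in-memory CSV data:

```python
import io

import pandas as pd

# Made-up data whose header is "StoreName" with a capital S.
raw = "StoreName,City\nAlpha,Pune\nBeta,Delhi\n"

# usecols is case-sensitive: usecols=["storeName"] would raise
# ValueError: Usecols do not match columns, columns expected but not found
df = pd.read_csv(io.StringIO(raw), usecols=["StoreName"])
print(df["StoreName"].tolist())  # ['Alpha', 'Beta']
```

Checking print(df.columns) on an unrestricted read is the quickest way to see the exact spelling pandas expects.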

how to determine the shape of .tsv file through python

I have a .tsv file that looks like this: [.tsv file structure shown in MS Excel]
I want to determine its shape through Python. How can I do that?
I wrote this code:
import pandas as pd
df = pd.read_csv(path/to/.tsv)
df.shape
and it outputs:
(13596, 1)
But clearly the shape conflicts with the image that I provided. What am I doing wrong?
You need to specify how the data is delimited when using pd.read_csv (unless it is comma separated)
df = pd.read_csv(path/to/.tsv, sep = '\t')
Should load the data correctly.
See: https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html
Edit: looking at your data you should also specify header=None because you don't have a header row. Ideally also supply a list of column names using the names parameter of pd.read_csv
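A short sketch of all three points together (sep='\t', header=None, and names=), using made-up tab-separated data in memory:

```python
import io

import pandas as pd

# Tab-separated data with no header row (values are made up).
raw = "1\tcat\t0.5\n2\tdog\t0.7\n"

# sep='\t' splits on tabs, header=None stops pandas from treating the
# first data row as column labels, and names= supplies labels instead.
df = pd.read_csv(io.StringIO(raw), sep="\t", header=None,
                 names=["id", "label", "score"])
print(df.shape)  # (2, 3)
```

Without sep='\t' each whole line would be one field and the shape would come out as (2, 1), which is exactly the symptom above.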
The issue is that you are missing the separator argument.
import pandas as pd
df = pd.read_csv("data/test.txt")
print(df.shape)
Output: (2, 1)
import pandas as pd
df = pd.read_csv("data/test.txt", sep='\t')
print(df.shape)
Output: (2, 3)
So please add sep='\t' to your read_csv call.
Also, if you have a header, you can pass header=0:
pd.read_csv("data/test.txt", sep='\t', header=0)
Please let me know if it helps.

Python: parsing string values in a CSV with pandas

I am new to Python and I am trying to read a CSV file using pandas, but I have a bit of a problem within my CSV file.
I have strings which contain commas at the end, and this creates an undesired column towards the end, as shown.
This is the raw CSV:
For example, on line 14, the green string value ends with a comma and creates a new column, which then gives me parsing errors when using this:
import pandas as pd
pd.read_csv("data.csv")
ParserError: Error tokenizing data. C error: Expected 6 fields in line 8, saw 7
Is there a way I can clean up this and merge the last two columns?
You can use np.where to fill APP from the last column where APP is missing, then drop the last column.
import pandas as pd
import numpy as np
df = pd.read_csv("data.csv")
df['APP'] = np.where(df['APP'].isna(), df.iloc[:, -1], df['APP'])
df = df.iloc[:, :-1]
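A minimal sketch of the whole round trip, with hypothetical column names and made-up data: supplying names= with one spare column lets the ragged rows load (instead of raising ParserError), and np.where then merges the spill back.

```python
import io

import numpy as np
import pandas as pd

# Made-up file: the second data row has a trailing comma inside a field,
# so it splits into four fields instead of three.
raw = "id,desc,APP\n1,foo,ok\n2,bar,,ok2\n"

# names= with one spare column ("spill") accommodates the extra field;
# skiprows=1 skips the original three-column header.
df = pd.read_csv(io.StringIO(raw), names=["id", "desc", "APP", "spill"],
                 skiprows=1)

# Where APP came out empty, the real value landed in the spill column.
df["APP"] = np.where(df["APP"].isna(), df["spill"], df["APP"])
df = df.drop(columns="spill")
print(df["APP"].tolist())  # ['ok', 'ok2']
```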

Unable to read a column of an excel by Column Name using Pandas

[Excel sheet screenshot]
I want to read the values of the column 'Site Name', but in this sheet the location of this column is not fixed.
I tried,
df = pd.read_excel('TestFile.xlsx', sheet_name='List of problematic Sites', usecols=['Site Name'])
but got a ValueError:
ValueError: Usecols do not match columns, columns expected but not found: ['RBS Name']
The output should be, List of RBS=['TestSite1', 'TestSite2',........]
Try reading the Excel columns like this:
import pandas as pd
from pandas import ExcelWriter
from pandas import ExcelFile
df = pd.read_excel('File.xlsx', sheet_name='Sheet1')
for i in df.index:
    print(df['Site Name'][i])
You can first check the dataframe without specifying column names while reading the Excel file.
Then try to read the column names.
The code is as below:
import pandas as pd
df = pd.read_excel('TestFile.xlsx', sheet_name='List of problematic Sites')
print(df.head())
print(df.columns)
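Since the header row's position is not fixed, one option is to read everything with header=None, locate the row that holds 'Site Name', and promote it to column labels. A sketch with a made-up stand-in frame (with a real file, df_raw would come from pd.read_excel('TestFile.xlsx', header=None)):

```python
import pandas as pd

# Stand-in for a sheet whose header row is not the first row.
df_raw = pd.DataFrame([
    ["Weekly report", None],
    ["Site Name", "RBS Name"],
    ["TestSite1", "RBS1"],
    ["TestSite2", "RBS2"],
])

# Find the index of the row that contains the 'Site Name' header ...
header_row = df_raw.apply(lambda r: (r == "Site Name").any(), axis=1).idxmax()

# ... then promote it to column labels and keep only the rows below it.
df = df_raw.iloc[header_row + 1:].reset_index(drop=True)
df.columns = list(df_raw.iloc[header_row])
print(df["Site Name"].tolist())  # ['TestSite1', 'TestSite2']
```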

Unique Values Excel Column, no missing info in rows - Python

Currently self-teaching Python and running into some issues. My challenge requires me to count the number of unique values in a column of an Excel spreadsheet, considering only the rows that have no missing values. Here is what I've got so far, but I can't seem to get it to work:
import xlrd
import pandas as pd
workbook = xlrd.open_workbook("*name of excel spreadsheet*")
worksheet = workbook.sheet_by_name("*name of specific sheet*")
pd.value_counts(df.*name of specific column*)
s = pd.value_counts(df.*name of specific column*)
s1 = pd.Series({'nunique': len(s), 'unique values': s.index.tolist()})
s.append(s1)
print(s)
Thanks in advance for any help.
Use the built-in unique() to find the unique values in a column.
Sharing an example with you:
import pandas as pd
df=pd.DataFrame(columns=["a","b"])
df["a"]=[1,3,3,3,4]
df["b"]=[1,2,2,3,4]
print(df["a"].unique())
will give the following result:
[1 3 4]
So you can store the values in a variable if you like, with:
l_of_unique_vals=df["a"].unique()
and find its length or do anything else you like with it.
df = pd.read_excel("nameoffile.xlsx", sheet_name=name_of_sheet_you_are_loading)
#in the line above we are reading the file in a pandas dataframe and giving it a name df
df["column you want to find vals from"].unique()
First you can use pandas read_excel and then unique, as @Inder suggested.
import pandas as pd
df = pd.read_excel('name_of_your_file.xlsx')
print(df['column_name'].unique())
See more here.
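Since the challenge asks for unique values among rows with no missing data, dropna() plus nunique() covers both halves. A sketch with a made-up frame standing in for the spreadsheet:

```python
import pandas as pd

# Made-up data; None marks the missing cells.
df = pd.DataFrame({"store": ["A", "B", "A", None, "C"],
                   "qty":   [1, 2, 3, 4, None]})

complete = df.dropna()            # keep only rows with no missing values
n = complete["store"].nunique()   # number of distinct values in that column
print(n)  # 2
```

Here rows 3 and 4 are dropped because each has a missing cell, leaving stores ['A', 'B', 'A'] and therefore 2 unique values.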

How to import all fields from xls as strings into a Pandas dataframe?

I am trying to import a file from xlsx into a Python Pandas dataframe. I would like to prevent fields/columns being interpreted as integers and thus losing leading zeros or other desired heterogenous formatting.
So for an Excel sheet with 100 columns, I would do the following, using a dict comprehension with range(100):
import pandas as pd
filename = 'C:\DemoFile.xlsx'
fields = {col: str for col in range(100)}
df = pd.read_excel(filename, sheet_name=0, converters=fields)
These import files do have a varying number of columns all the time, and I am looking to handle this differently than changing the range manually all the time.
Does somebody have any further suggestions or alternatives for reading Excel files into a dataframe and treating all fields as strings by default?
Many thanks!
Try this:
xl = pd.ExcelFile(r'C:\DemoFile.xlsx')
ncols = xl.book.sheet_by_index(0).ncols
df = xl.parse(0, converters={i : str for i in range(ncols)})
UPDATE:
In [261]: type(xl)
Out[261]: pandas.io.excel.ExcelFile
In [262]: type(xl.book)
Out[262]: xlrd.book.Book
Use dtype=str when calling .read_excel()
import pandas as pd
filename = 'C:\DemoFile.xlsx'
df = pd.read_excel(filename, dtype=str)
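A quick sketch of what dtype=str buys you, demonstrated on made-up in-memory CSV data (the same argument works for pd.read_excel):

```python
import io

import pandas as pd

# Made-up data where 'zip' has leading zeros that integer parsing
# would destroy.
raw = "id,zip\n1,00123\n2,04567\n"

# dtype=str keeps every field as text.
df = pd.read_csv(io.StringIO(raw), dtype=str)
print(df["zip"].tolist())  # ['00123', '04567']
```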
The usual solution is:
1. read in one row of data just to get the column names and the number of columns
2. create the dictionary automatically, where each column has a string type
3. re-read the full data using the dictionary created at step 2.
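A sketch of those three steps, shown on made-up in-memory CSV data for brevity (pd.read_excel accepts the same nrows= and converters= arguments):

```python
import io

import pandas as pd

raw = "id,zip\n1,00123\n2,04567\n"  # made-up data with leading zeros

# Step 1: read only the header row to learn the column names.
cols = pd.read_csv(io.StringIO(raw), nrows=0).columns

# Step 2: build the converters dict automatically.
converters = {c: str for c in cols}

# Step 3: re-read the full data with that dict.
df = pd.read_csv(io.StringIO(raw), converters=converters)
print(df["zip"].tolist())  # ['00123', '04567']
```

This adapts itself to any number of columns, so the dict never has to be edited by hand.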
