Python code for word search in excel - python

I need a code that can search for the specific word in excel file. In the specific columns and I want it to output with columns letter and rows number and sheet name? I have started but don't know further:
from xlrd import open_workbook
book = open_workbook("excel1.xlsx")
for sheet in book.sheets():
its needs to print row number, column letter, sheet name? also if you can use pandas instead xlrd it will be great.

An alternative is you may use Pandas library to do this. Pandas works on data frames i.e. tabular data, so have rows and columns. You need to specify your needs according to the dataset in Pandas.
import pandas as pd
df = pd.read_excel('filename.xls')
df[df['col_name'].str.contains('ABC')].head()
df.query('col_name == ["words"]').head()
df[df['Column'] >= 'Your_search_word'].head()
etc. You can search more on the documentation of Pandas http://pbpython.com/excel-pandas-comp-2.html
Note: Pandas merge all sheets together to create one data frame in a tabular structure that can make things easier to search.

Related

What code should I use in extracting specific column (with specific data) from a csv file to python. It can be either pandas or numpy

please see attached photo
here's the image
I only need to import a specific column with conditions(such as specific data found in that column). And also, I only need to remove unnecessary columns. dropping them takes too much code. What specific code or syntax is applicable?
How to get a column from pandas dataframe is answered in Read specific columns from a csv file with csv module?
To quote:
Pandas is spectacular for dealing with csv files, and the following
code would be all you need to read a csv and save an entire column
into a variable:
import pandas as pd
df = pd.read_csv(csv_file)
saved_column = df.column_name #you can also use df['column_name']
So in your case, you just save the the filtered data frame in a new variable.
This means you do newdf = data.loc[...... and then use the code snippet from above to extract the column you desire, for example newdf.continent

I want to sort data present in excel file in sheet with respect to column. (The excel file has multiple sheets)

Excel Data
This is the data I've in an excel file. There are 10 sheets containing different data and I want to sort data present in each sheet by the 'BA_Rank' column in descending order.
After sorting the data, I've to write the sorted data in an excel file.
(for eg. the data which was present in sheet1 of the unsorted sheet should be written in sheet1 of the sorted list and so on...)
If I remove the heading from the first row, I can use the pandas (sort_values()) function to sort the data present in the first sheet and save it in another list.
like this
import pandas as pd
import xlrd
doc = xlrd.open_workbook('without_sort.xlsx')
xl = pd.read_excel('without_sort.xlsx')
length = doc.nsheets
#print(length)
#for i in range(0,length):
#sheet = xl.parse(i)
result = xl.sort_values('BA_Rank', ascending = False)
result.to_excel('SortedData.xlsx')
print(result)
So is there any way I can sort the data without removing the header file from the first row?
and how can I iterate between sheets so as to sort the data present in multiple sheets?
(Note: All the sheets contain the same columns and I need to sort every sheet using 'BA_Rank' in descending order.)
First input, you don't need to call xlrd when using pandas, it's done under the hood.
Secondly, the read_excel method its REALLY smart. You can (imo should) define the sheet you're pulling data from. You can also set up lines to skip, inform where the header line is or to ignore it (and then set column names manually). Check the docs, it's quite extensive.
If this "10 sheets" it's merely anecdotal, you could use something like xlrd to extract the workbook's sheet quantity and work by index (or extract names directly).
The sorting looks right to me.
Finally, if you wanna save it all in the same workbook, I would use openpyxl or some similar library (there are many others, like pyexcelerate for large files).
This procedure pretty much always looks like:
Create/Open destination file (often it's the same method)
Write down data, sheet by sheet
Close/Save file
If the data is to be writen all on the same sheet, pd.concat([all_dataframes]).to_excel("path_to_store") should get it done

Read Excel sheet table (Listobject) into python with pandas

There are multiple ways to read excel data into python.
Pandas provides aslo an API for writing and reading
import pandas as pd
from pandas import ExcelWriter
from pandas import ExcelFile
df = pd.read_excel('File.xlsx', sheetname='Sheet1')
That works fine.
BUT: What is the way to access the tables of every sheet directly into a pandas dataframe??
The above picture shows a sheet including a table SEPARATED THAN CELL (1,1).
Moreover the sheet might include several tables (listobjects in VBA).
I can not find anywhere the way to read them into pandas.
Note1: It is not possible to modify the workbook to bring all the tables towards cell(1,1).
Note2: I would like to use just pandas (if it is possible) and minimize the need to import other libraries. But it there is no other way I am ready to use other lybray. In any case I could not manage with xlwings for instance.
here it looks like its possible to parse the excel file, but no soilution is provided for tables, just for complete sheets.
The documentation of pandas does not seem to offer that possibility.
Thanks.
You can use xlwings, great package for working with excel files in python.
This is for a single table, but it is pretty trivial to use xlwings collections (App>books>sheets>tables) to iterate over all tables. Tables are ofcourse listobjects.
import xlwings
import pandas
with xlwings.App() as App:
_ = App.books.open('my.xlsx')
rng = App.books['my.xlsx'].sheets['mysheet'].tables['mytablename'].range
df: pandas.DataFrame = rng.expand().options(pandas.DataFrame).value
I understand that this question has been marked solved already, but I found an article that provides a much more robust solution:
Full Post
I suppose a newer version of this library supports better visibility of the workbook structure. Here is a summary:
Load the workbook using the load_workbook function from openpyxl
Then, you are able to access the sheets within, which contains collection of List-Objects (Tables) in excel.
Once you gain access to the tables, you are able to get to the range addresses of those tables.
Finally they loop through the ranges and create a pandas data-frame from it.
This is a nicer solution as it gives us the ability to loop through all the sheets and tables in a workbook.
Here is a way to parse one table, howver it's need you to know some informations on the seet parsed.
df = pd.read_excel("file.xlsx", usecols="B:I", index_col=3)
print(df)
Not elegant and work only if one table is present inside the sheet, but that a first step:
import pandas as pd
import string
letter = list(string.ascii_uppercase)
df1 = pd.read_excel("file.xlsx")
def get_start_column(df):
for i, column in enumerate(df.columns):
if df[column].first_valid_index():
return letter[i]
def get_last_column(df):
columns = df.columns
len_column = len(columns)
for i, column in enumerate(columns):
if df[column].first_valid_index():
return letter[len_column - i]
def get_first_row(df):
for index, row in df.iterrows():
if not row.isnull().values.all():
return index + 1
def usecols(df):
start = get_start_column(df)
end = get_last_column(df)
return f"{start}:{end}"
df = pd.read_excel("file.xlsx", usecols=usecols(df1), header=get_first_row(df1))
print(df)

Using pandas to replace data in excel sheet

I tried to come up with a way to copy data from a sheet in an excel file as
import pandas as pd
origionalFile = pd.ExcelFile('AnnualReport-V5.0.xlsx')
Transfers = pd.read_excel(origionalFile, 'Sheet1')
I have another excel file, which named 'AnnualReport-V6.0.xlsx', it has existing data in the sheet named 'Transfers', I tried to use the dataframe I created easily on to replace data in the sheet 'Transfers' in 'AnnualReport-V6.0.xlsx' from column B, leave column A as it is.
I did a few searches, the closest to what I want is this
Modifying an excel sheet in a excel book with pandas
but it does not allow me the keep column A in the original sheet (column A has some equations I do want to keep them), any idea how to do it? Thanks
Would reading column A and inserting it to the fresh data you want to write solve your problem?

Methods of reading in and creating a list/array with excel information from excel

Imagine I am given two columns: a,b,c,d,e,f and e,f,g,h,i,j (commas indicating a new row in the column)
How could I read in such information from excel and put it in an two separate arrays? I would like to manipulate this info and read it off later. as part of an output.
You have a few choices here. If your data is rectangular, starts from A1, etc. you should just use pandas.read_excel:
import pandas as pd
df = pd.read_excel("/path/to/excel/file", sheetname = "My Sheet Name")
print(df["column1"].values)
print(df["column2"].values)
If your data is a little messier, and the read_excel options aren't enough to get your data, then I think your only choice will be to use something a little lower level like the fantastic xlrd module (read the quickstart on the README)

Categories

Resources