openpyxl - specify column to read from - python

I have searched for how to use openpyxl to read values from a column, but instead of specifying the column by a letter like A, B, or C, I want to specify the column by a header string and read the rows below that string, ignoring the ones above.
I have a few different spreadsheets that contain the same information but not in exactly the same layout.
The spreadsheets all contain columns with first and last name, a unique ID for the contractor's account, and some other confidential info. In this case I am trying to read the rows under the column named "ID" instead of specifying the column as A and the rows as 13-47.
Is there a way to do this? I haven't found anything on reading values without specifying columns by their letter or rows by their number.

First, build a dictionary that maps each column title to its column letter.
Then use the column letter to iterate over the values. I handled this scenario as follows:
from openpyxl.utils import column_index_from_string, get_column_letter

# Method to get a map of column name (header text) to column letter
# Argument to this method is: - Row number of header row
def ref_col_name_letter_map(self, _header_row_num):
    # Create an empty dictionary
    col_name_letter_map = dict()
    # Iterate over the values of the header row,
    # counting columns from 1 (column A)
    for _col_idx, _col_cells in enumerate(
            self.my_base_active_ws.iter_cols(min_row=_header_row_num,
                                             max_row=_header_row_num,
                                             values_only=True),
            start=1):
        col_name_letter_map[_col_cells[0]] = get_column_letter(_col_idx)
    # Return type is dictionary
    return col_name_letter_map

##################

# Method to get column values for a given column name
# Arguments to this method are:- Row number for header row & column name
def get_specific_col_val_by_col_name_in_active_ws(self, _header_row_num, _col_name):
    # Build the column name -> column letter map once
    _col_name_letter_map = self.ref_col_name_letter_map(_header_row_num)
    # Check if provided column name exists in the worksheet
    # (the method implicitly returns None when it does not)
    if _col_name in _col_name_letter_map:
        # Convert the column letter to its 1-based column index
        _col_idx = column_index_from_string(_col_name_letter_map[_col_name])
        # Fetch the values from the column provided, skipping the header row
        return [_col_value[0] for _col_value in self.my_base_active_ws.iter_rows(
            min_row=_header_row_num + 1, min_col=_col_idx, max_col=_col_idx,
            values_only=True)]
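
When the header row number is not known in advance, as in the question, one option is to scan the rows for the header text and collect the values below it. The following is only a minimal sketch: the function name values_below_header and the sample rows are made up for illustration, and the function works on plain row tuples of the kind that ws.iter_rows(values_only=True) yields, so it can be tried without a workbook.

```python
def values_below_header(rows, header):
    """Return the values found below the first cell equal to `header`.

    `rows` is any iterable of row tuples, for example the result of
    ws.iter_rows(values_only=True) on an openpyxl worksheet.
    """
    col_idx = None
    values = []
    for row in rows:
        if col_idx is None:
            # Still searching: look for the header text anywhere in this row
            if header in row:
                col_idx = row.index(header)
            continue
        # Header already found: collect the cell in the same column
        values.append(row[col_idx])
    if col_idx is None:
        raise KeyError(f"header {header!r} not found")
    return values


# Made-up rows, shaped like the output of ws.iter_rows(values_only=True)
sample_rows = [
    ("Contractor report", None, None),
    (None, None, None),
    ("First", "Last", "ID"),
    ("Ada", "Lovelace", 101),
    ("Alan", "Turing", 102),
]
ids = values_below_header(sample_rows, "ID")
```

With a real worksheet, the call would presumably look like values_below_header(ws.iter_rows(values_only=True), "ID"). Rows above the header are skipped automatically, so neither the column letter nor the starting row number has to be hard-coded.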

Related

Creating a new dataframe based on rows of an existing dataframe which contain only specific characters

In Python I am trying to create a new dataframe by appending all rows which do not contain certain characters in a certain column of another dataframe. Afterwards I want to turn the generated list of results into a dataframe.
However, the result is only a one-column dataframe and does not include all the columns of the first dataframe for the rows which do not contain those characters, which is what I need.
Does anybody have a suggestion on how to include all the columns in the new dataframe?
%%time
newlist = []
for row in old_dataframe['column']:
    if row != (r'^[^\s]'):
        newlist.append(row)
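
A possible direction, sketched under assumptions: the loop above compares each value with the pattern text itself instead of testing the pattern, and it only collects values from one column. A boolean mask built with Series.str.contains keeps whole rows instead. The sample data below is made up, and the sketch assumes the goal is to keep the rows whose value does not match the pattern.

```python
import pandas as pd

# Made-up stand-in for old_dataframe; the "other" column is hypothetical
old_dataframe = pd.DataFrame({
    "column": ["abc", " def", "ghi", "\tjkl"],
    "other": [1, 2, 3, 4],
})

# True where the value matches the pattern (here: starts with non-whitespace)
mask = old_dataframe["column"].str.contains(r"^[^\s]", na=False)

# Keep every column of the rows that do NOT match the pattern
new_dataframe = old_dataframe[~mask]
```

Because the mask selects rows of the original dataframe, all of its columns are carried over without building an intermediate list.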

Pandas Find Index Value

I'm trying to find the index value for "Rental Income" in the dataframe below. The dataframe is loaded from an Excel document and needs to be cleaned.
Is there a way to find the index value for "Rental Income" without giving the column name or row name? The formatting is different in every Excel file, so the column and row names change with each file. But if I can search the full dataframe at once, I can use the result as the anchor point.
Try where followed by stack, replacing 'A1' with the value you are searching for:
out = df.where(df.eq('A1')).stack()
idx = out.index.get_level_values(0)
col = out.index.get_level_values(1)
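
As a small illustration of the same idea applied to the "Rental Income" search, here is a sketch on a made-up dataframe (the column names c0 and c1 are hypothetical). The extra dropna() call is an addition to the answer's code, so that the result does not depend on whether stack() drops missing values in the installed pandas version.

```python
import pandas as pd

# Made-up messy sheet; only the "Rental Income" cell matters here
df = pd.DataFrame({
    "c0": ["Report", None, "Rental Income"],
    "c1": [None, "2020", 1500],
})

# Blank out every cell that is not the search value, then flatten
out = df.where(df.eq("Rental Income")).stack().dropna()

# Each remaining entry is indexed by (row label, column label)
row_label = out.index.get_level_values(0)[0]
col_label = out.index.get_level_values(1)[0]
```

If the value occurs more than once, out holds one entry per occurrence, so taking element [0] picks the first match in row order.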

How to remove duplicates of column by giving appropriate value to its label in the row?

I have an excel table as follows (contains more data than displayed):
The first column contains ids, the second column contains labels, and the header row contains the unique labels without any repetition.
I need to remove the duplicate ids by putting the value 1 in the appropriate label column for each row.
The expected output is:
You can use groupby and pivot from the pandas package in Python.
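
Since the original tables are not shown, the following is only a guess at what that could look like. It assumes two columns named id and label (hypothetical names) and uses groupby followed by unstack, which pivots the label level into columns.

```python
import pandas as pd

# Made-up long-format data with repeated ids
df = pd.DataFrame({
    "id": [1, 1, 2, 3, 3, 3],
    "label": ["a", "b", "a", "c", "c", "b"],
})

# Count each (id, label) pair, pivot labels into columns,
# then cap counts at 1 so repeated pairs still give a 0/1 flag
wide = (
    df.groupby(["id", "label"])
      .size()
      .unstack(fill_value=0)
      .clip(upper=1)
)
```

The result has one row per id and one column per unique label, with 1 where the id carried that label and 0 otherwise.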

Creating a new column from existing column in the data frame based on the criteria

Create a new column based on the criteria below. The data type of column A is string.
If a given text exists in the column A string, create column B containing only that text.
e.g. Column A: This product is used for pipefitter
Desired: if column A contains pipefitter
Column B = pipefitter
"""df["New_text"] = []
def update_text(selected_text):
    for text in df['Activity Name']:
        if selected_text in text:
            df['New_text'] = df['New_text'].append('text')"""
So, if I understand correctly, you have a column Activity Name in your dataframe and you want to write a function that returns the subset of the dataframe whose rows contain a particular keyword.
If my understanding is right, then:
def update_text(df, selected_text):
    # Keep only the rows whose "Activity Name" contains the keyword
    return df[df["Activity Name"].str.contains(selected_text)]
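
If the goal is instead the new column described in the question, a sketch along the following lines might work. The sample rows are made up, and regex=False and na=False are added here as assumptions so that the keyword is matched literally and missing values count as no match.

```python
import pandas as pd

# Made-up data using the column name from the question
df = pd.DataFrame({
    "Activity Name": [
        "This product is used for pipefitter",
        "Welding rods",
        None,
    ],
})

keyword = "pipefitter"

# True where the activity text contains the keyword
mask = df["Activity Name"].str.contains(keyword, regex=False, na=False)

# Start with an empty column, then fill in the keyword where it was found
df["New_text"] = None
df.loc[mask, "New_text"] = keyword
```

Rows without the keyword keep a missing value in New_text, which makes them easy to filter out later with dropna or notna.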

Filter Excel Spreadsheet to obtain cell value with Python

I have a GUI (shown below) and with it I want to extract a specific IP address from an Excel spreadsheet which contains IP addresses (~1200 rows). I cannot find an example of how to search and filter the spreadsheet to achieve what I require.
In my spreadsheet I want to:
Search column E for the value I enter in the GUI, e.g. K11, which will narrow it down to ~10 rows. I then want to search column C for the string "Telephone", which will narrow it down to 2 rows. I then want to extract the contents of these 2 rows in column B and assign each of them to a variable.
Using the solution provided:
#Filter rows by column E (Station name) for results with cell = Asset No. (i.e "G04")
xlsx_filter1 = IP_Plan.index[IP_Plan['Station name'] == IPP2.get()].tolist()
IP_Plan=IP_Plan.loc[xlsx_filter1]
#Filter rows by column C (Type) for results with cell = Device Type (i.e "IP Telephone - Norphonic N-K1")
xlsx_filter2 = IP_Plan.index[IP_Plan['Type'] == "IP Telephone - Norphonic N-K1"].tolist()
IP_Plan=IP_Plan.loc[xlsx_filter2]
#File cell value by column B (IP address)
Output_IP_Address = IP_Plan["IP address"]
print(Output_IP_Address)
Produces this output upon the print command
I would like to use these two IP addresses in my program, so I want to get the values out of the result without the index and assign them to separate variables. How do I do this?
Output_IP_Address1 =
Output_IP_Address2 =
I need this so I can display the variables as Labels in the GUI (the GUI picture shows 00.000.000.0 as an example) and use the variables in my ping code to test and return the result.
IP_Display_Nac = Label(IPP, text=Output_IP_Address1, anchor=W)
IP_Display_Tow = Label(IPP, text=Output_IP_Address2, anchor=W)
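
One way to get the two values without the index, shown here as a sketch on made-up data: Series.tolist() returns plain Python values, which can then be unpacked into the two variables. The addresses and row labels below are invented for illustration.

```python
import pandas as pd

# Made-up stand-in for the filtered IP_Plan from the question
IP_Plan = pd.DataFrame(
    {"IP address": ["10.1.1.10", "10.1.1.11"]},
    index=[412, 873],  # arbitrary row labels left over from filtering
)

# tolist() drops the index and returns plain Python strings;
# unpacking assumes exactly two rows remain after filtering
Output_IP_Address1, Output_IP_Address2 = IP_Plan["IP address"].tolist()
```

The unpacking raises a ValueError if the filter leaves more or fewer than two rows, which may be a useful check before the values reach the Label widgets.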
Try using the pandas library to import the Excel file, like so:
import pandas as pd
df = pd.read_excel("nameOfYourExcelFile.xlsx")
The variable df is now a so-called DataFrame object, which is similar to an Excel table. Here is a small head start on how to work with pandas DataFrames:
df.head()
This gives you the first few rows of the dataframe, so you can check its structure and the names of the columns, for example.
df["Name of the desired Column"] # gives you the complete desired column as a pandas Series
This only applies if your columns have headers, like this:
In this example df["a"] would contain the values [5,9,2,3].
If you have no headers in your file, then just import the Excel file like so:
df = pd.read_excel("nameOfYourExcelFile.xlsx", header=None)
and refer to your columns by a numbered index starting at 0, so column A would be df[0], column E would be df[4], and so on.
Other helpful functions:
df.iloc[1,:] # gives you the complete row with index 1
df.iloc[1,2] # gives you the item in row with index 1 and column with index 2
list(df) # gives you a list of all column header, in case you are in doubt which to take
Now here is some example code showing how you could achieve your result:
indices_first_check = df.index[df['Name of your column E'] == "K11"].tolist()
This gives you a list of the index labels of all rows where the value in column E is "K11".
Then you can slice all other rows off. Use loc rather than iloc here, because the list holds index labels, not positions:
df = df.loc[indices_first_check,:]
Now get the indices for the "Telephone":
indices_second_check = df.index[df['Name of your column C'] == "Telephone"].tolist()
df = df.loc[indices_second_check,:]
Now you have your 2 desired values in a list:
list_desired_values = list(df["Name of your column B"])
Hope this helps.
edit: Changed the last line so that the DataFrame column (a pandas Series object) is cast to a list.
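
The two filtering steps can also be combined into a single boolean mask, which avoids the intermediate index lists. This is a sketch on made-up data; the column names "Station name", "Type" and "IP address" are taken from the question's follow-up code, and the values are invented.

```python
import pandas as pd

# Made-up rows using the column names from the question
df = pd.DataFrame({
    "IP address": ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"],
    "Type": ["Telephone", "Camera", "Telephone", "Telephone"],
    "Station name": ["K11", "K11", "K11", "G04"],
})

# Both conditions must hold; the parentheses are required around each one
mask = (df["Station name"] == "K11") & (df["Type"] == "Telephone")

# Select the matching rows and the one column of interest in a single step
list_desired_values = df.loc[mask, "IP address"].tolist()
```

Selecting with loc and a mask leaves the original df untouched, so the same dataframe can be filtered again for a different station.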
