Pandas Find Index Value - python

I'm trying to find the index value for "Rental Income" in the dataframe below. The dataframe is being uploaded from an Excel document and needs to be cleaned.
Is there a way to find the index value for "Rental Income" without giving the column name or row name? The formatting is different in every Excel file, so the column and row names change with each file. But if I can search the full dataframe at once, I can use the reference as the anchor point.

Try with where, then stack:
# Keep only the cells equal to the search value, then stack to drop the NaNs;
# the remaining MultiIndex holds the row and column labels of each match
out = df.where(df.eq('Rental Income')).stack()
idx = out.index.get_level_values(0)  # row label(s)
col = out.index.get_level_values(1)  # column label(s)
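For example, a minimal sketch (the frame below is a hypothetical stand-in for the uploaded Excel data):
import pandas as pd

# Hypothetical stand-in for the uploaded sheet
df = pd.DataFrame([['x', 'y', 'z'],
                   ['a', 'Rental Income', 'b']])

out = df.where(df.eq('Rental Income')).stack()
print(out.index.get_level_values(0)[0])  # 1 -- the row label
print(out.index.get_level_values(1)[0])  # 1 -- the column label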

Related

How do I make a new dataframe with data from a previous dataframe

I have the data shown in the screenshot. I want to create a new DataFrame whose column headers are the values in the Forces column in the screenshot, and I want the respective values to be listed under each column. I have tried indexing each variable and creating a new DataFrame, but that hasn't worked. Could I get some help?
I tried indexing and creating a new DataFrame, but when I index the variables I get a single value as opposed to a list of values.
One way to do this is by filtering your dataframe each time on a specific value of the column Forces:
import pandas as pd

# One column per unique force, each holding that force's list of values
columns = df['Forces'].unique()
dict_of_column_value = {}
for col in columns:
    dict_of_column_value[col] = list(df[df['Forces'] == col].Values)
pd.DataFrame(dict_of_column_value)
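Applied to a small hypothetical frame standing in for the screenshot:
import pandas as pd

df = pd.DataFrame({'Forces': ['Fx', 'Fy', 'Fx', 'Fy'],
                   'Values': [1.0, 2.0, 3.0, 4.0]})

wide = pd.DataFrame({col: list(df[df['Forces'] == col].Values)
                     for col in df['Forces'].unique()})
print(wide)
#     Fx   Fy
# 0  1.0  2.0
# 1  3.0  4.0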

openpyxl - specify column to read from

I have searched on how to use openpyxl to read values from a column, but instead of specifying the column by a letter like A, B, C, I am trying to specify which column to read from based on a string, reading the rows below the specified string and essentially ignoring the ones above.
I have a few different spreadsheets that follow a similar format but are not exactly the same, meaning they contain the same information but not in the same layout.
The spreadsheets all contain columns with first and last name, a unique ID for the contractor's account, as well as some other confidential info. In this case I am trying to read the rows under the column named "ID" instead of specifying the column as A and the rows as 13-47.
Is there a way to do this? I haven't found anything on reading values without specifying columns by their letter or rows by their number.
First create a map from column index to column title using a dictionary.
Then use that map to find which column to iterate for values. I handled this scenario as follows:
# Method to get a map of column index to column name.
# Argument: the row number of the header row.
def ref_col_idx_name_map(self, _header_row_num):
    # Create an empty dictionary
    col_idx_name_map = dict()
    # Iterate over the cells of the header row,
    # numbering the columns from 1 as openpyxl does
    for _col_idx, _col_cells in enumerate(
            self.my_base_active_ws.iter_cols(min_row=_header_row_num,
                                             max_row=_header_row_num,
                                             values_only=True),
            start=1):
        col_idx_name_map[_col_idx] = _col_cells[0]
    # Return type is dictionary
    return col_idx_name_map
##################
# Method to get column values for a given column name.
# Arguments: the row number of the header row & the column name.
def get_specific_col_val_by_col_name_in_active_ws(self, _header_row_num, _col_name):
    col_idx_name_map = self.ref_col_idx_name_map(_header_row_num)
    # Check if the provided column name exists in the worksheet
    if _col_name in col_idx_name_map.values():
        # Invert the map to look up the column index for the given name
        _col_idx = {name: idx for idx, name in col_idx_name_map.items()}[_col_name]
        # Fetch the values from that column, skipping the header row
        return [_col_value[0] for _col_value in self.my_base_active_ws.iter_rows(
            min_row=_header_row_num + 1,
            min_col=_col_idx,
            max_col=_col_idx,
            values_only=True)]
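For an end-to-end picture, a minimal standalone sketch of the same idea; the file name, sheet name, and header row number below are hypothetical:
from openpyxl import load_workbook

ws = load_workbook("contractors.xlsx")["Sheet1"]

# Read the header row (assumed to be row 1) and find the "ID" column
header = next(ws.iter_rows(min_row=1, max_row=1, values_only=True))
col_idx = header.index("ID") + 1  # openpyxl columns are 1-based

# Collect every value below the header in that column
ids = [row[0] for row in ws.iter_rows(min_row=2, min_col=col_idx,
                                      max_col=col_idx, values_only=True)]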

Creating a new dataframe based on rows of an existing dataframe which contain only specific characters

In Python I am trying to create a new dataframe by appending all rows which do not contain certain characters in a certain column of another dataframe. Afterwards I want to turn the generated list containing the results into a dataframe.
However, this result is only a one-column dataframe and does not include all the columns of the first dataframe (for the rows which do not contain those characters, which is what I need).
Does anybody have a suggestion on how to add all the columns to a new dataframe?
%%time
newlist = []
for row in old_dataframe['column']:
    if row != (r'^[^\s]'):
        newlist.append(row)
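Note that row != (r'^[^\s]') compares each value to the pattern as a literal string rather than applying the regex. A sketch of the usual pandas approach, assuming the intent is to drop rows whose 'column' value matches that pattern: build a boolean mask with str.contains and filter the whole dataframe, which keeps every column rather than just the one being tested.
import pandas as pd

# Hypothetical stand-in for the asker's old_dataframe
old_dataframe = pd.DataFrame({'column': ['abc', ' leading space', 'def'],
                              'other': [1, 2, 3]})

# True where 'column' matches the pattern from the question
mask = old_dataframe['column'].str.contains(r'^[^\s]', na=False)
# ~mask keeps the non-matching rows, with all columns intact
new_dataframe = old_dataframe[~mask]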

Cleaning dataframe- assign value in one cell to column

I am reading multiple CSV files from a folder into a dataframe. I loop for all the files in the folder and then concat the dataframes to obtain the final dataframe.
However the CSV file has one summary row from which I want to extract the date, and then add as a new column for all the rows in that csv/dataframe.
df = pd.read_csv(f, header=None,
                 names=['Inverter', 'Day Yield', 'month Yield', 'Year Yield',
                        'SpecificYieldDay', 'SYMth', 'SYYear', 'Power'],
                 sep=';', **kwargs)
df['date'] = df.loc[[0], ['Day Yield']]
df
I expect the ['date'] column to be filled with the date for that file for all the rows in that particular csv, but it gets filled correctly only for the first row.
Refer to the image of the dataframe. I want all the rows of the 'date' column to show 7/25/2019 instead of only the first row.
I have also added an example of one of the csv files I am reading from.
If I understood correctly, the value that you want to add as a new column for all rows is the one in df.loc[0, 'Day Yield'].
If that is correct you can do the following:
# Pull the date out of the summary cell as a scalar; assign then
# broadcasts it to every row of the frame
df = df.assign(date=df.loc[0, 'Day Yield'])
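Putting it together with the loop over the folder described in the question (the glob pattern is a hypothetical placeholder):
import glob
import pandas as pd

cols = ['Inverter', 'Day Yield', 'month Yield', 'Year Yield',
        'SpecificYieldDay', 'SYMth', 'SYYear', 'Power']

frames = []
for f in glob.glob('data/*.csv'):  # hypothetical folder
    df = pd.read_csv(f, header=None, names=cols, sep=';')
    df = df.assign(date=df.loc[0, 'Day Yield'])  # broadcast the summary date
    frames.append(df)

final = pd.concat(frames, ignore_index=True)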

Search entire excel sheet with Pandas for word(s)

I am trying to essentially replicate the Find function (control-f) in Python with Pandas. I want to search an entire sheet (all rows and columns) to see if any of the cells on the sheet contain a word and then print out the row in which the word was found. I'd like to do this across multiple sheets as well.
I've imported the sheet:
pdTestDataframe = pd.read_excel(TestFile, sheet_name="Sheet Name",
                                keep_default_na=False, na_values=[""])
And tried to create a list of columns that I could use to index into the values of all of the cells, but it's still excluding many of the cells in the sheet. The attempted code is below.
columnList = []
for i, data in enumerate(pdTestDataframe.columns):
    columnList.append(pdTestDataframe.columns[i])
    for j, data1 in enumerate(pdTestDataframe.index):
        print(pdTestDataframe[columnList[i]][j])
I want to make sure that no matter the formatting of the excel sheet, all cells with data inside can be searched for the word(s). Would love any help I can get!
Pandas has a different way of thinking about this. Just calling df[df.text_column.str.contains('whatever')] will show you all the rows in which the text is contained in one specific column. To search the entire dataframe, you can use:
import numpy as np

# One True/False column per dataframe column, one entry per row;
# the r"\^" pattern (a literal ^) is just an example search term
mask = np.column_stack([df[col].str.contains(r"\^", na=False) for col in df])
df.loc[mask.any(axis=1)]
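To cover multiple sheets as well, a hedged variant: sheet_name=None loads every sheet into a dict of dataframes (the search word below is a placeholder; TestFile is the file from the question).
import numpy as np
import pandas as pd

word = "Rental"  # hypothetical search term

sheets = pd.read_excel(TestFile, sheet_name=None,
                       keep_default_na=False, na_values=[""])
for name, df in sheets.items():
    # astype(str) lets str.contains work on non-text columns too
    mask = np.column_stack([df[col].astype(str).str.contains(word, na=False)
                            for col in df])
    hits = df.loc[mask.any(axis=1)]
    if not hits.empty:
        print(name)
        print(hits)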
