Python merge 'n' cells in excel based on condition

I have an Excel sheet with values in row 1, from column 1 to column 15. Each cell value ends with a number.
I would like to create a second row that merges cells based on that trailing number and puts the corresponding text in the merged cell. The values must keep the same order as in row 1.
For example, A1=ABC3, B1=ABC5, C1=ABC4, and so on. In row 2 I would like to merge the first 3 cells and place ABC3 there, then merge the next 5 cells in the same row and place ABC5, then merge the next 4 cells and place ABC4, and so on. Any thoughts on how to implement this?

This can be accomplished with the openpyxl module. If you're not familiar with it yet, then doing some of the tutorials would be a good start.
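A minimal sketch of how this might look with openpyxl, assuming the values sit in row 1 of the active sheet and every value ends in a positive integer. The names plan_merges and write_merged_row are made up for this example; only load_workbook, ws.cell, ws.merge_cells and wb.save are openpyxl calls.

```python
import re


def plan_merges(values, first_column=1):
    """Return (start_column, end_column, text) for each source value.

    The width of each block is the integer at the end of the text,
    e.g. "ABC3" -> 3 columns. Column numbers are 1-based, as in openpyxl.
    """
    plan = []
    start = first_column
    for text in values:
        match = re.search(r"(\d+)$", str(text))
        if match is None or int(match.group(1)) < 1:
            raise ValueError(f"no positive trailing number in {text!r}")
        width = int(match.group(1))
        plan.append((start, start + width - 1, text))
        start += width  # the next block begins right after this one
    return plan


def write_merged_row(path, source_row=1, target_row=2):
    """Read source_row and write the merged blocks into target_row."""
    # Imported here so plan_merges can be tried without openpyxl installed.
    from openpyxl import load_workbook

    wb = load_workbook(path)
    ws = wb.active  # assumption: the data is on the active sheet
    values = [cell.value for cell in ws[source_row] if cell.value is not None]
    for start, end, text in plan_merges(values):
        # The text goes in the left-most cell of the block.
        ws.cell(row=target_row, column=start, value=text)
        if end > start:  # a width of 1 needs no merge
            ws.merge_cells(start_row=target_row, start_column=start,
                           end_row=target_row, end_column=end)
    wb.save(path)


print(plan_merges(["ABC3", "ABC5", "ABC4"]))
```

For the three values in the question this plans columns 1-3 for ABC3, 4-8 for ABC5 and 9-12 for ABC4, so the order of row 1 is preserved.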

Related

Create a column and add data from another column into the original using python

I want to create 3 columns using Python. The most important is Column C. Row 1 in Column C should start at 2, the same as Column A. Each following row adds the next value from Column A to the previous value in Column C, so row 2 is 2+3 = 5. This repeats until the last number. Is this possible in Python? I can't use Excel because I was told to do it in Python, but I don't know how. I know how to create and enter the inputs for Columns A and B. Does anyone know how to do this?
j_t = list(map(int, input("Column A: ").split(",")))
d_t = list(map(int, input("Column B: ").split(",")))
# to store the job time and date in dictionaries
dict_spt = {}
dict_edd = {}
for i in range(len(j_t)):
    dict_spt[j_t[i]] = d_t[i]
    dict_edd[d_t[i]] = j_t[i]
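The Column C described above is a running total of Column A, so one possible sketch uses itertools.accumulate from the standard library. The function name running_total and the sample numbers are only for illustration.

```python
from itertools import accumulate


def running_total(column_a):
    """Return Column C: each entry is the sum of Column A up to that row."""
    return list(accumulate(column_a))


# Hypothetical Column A values; the question only gives the first two (2, 3).
column_a = [2, 3, 4, 1]
column_c = running_total(column_a)
print(column_c)  # [2, 5, 9, 10]
```

With the inputs from the snippet above, running_total(j_t) would give Column C directly.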

How to extract values based on column header in excel?

I have an Excel file and need the highlighted values collected into a single column, with the rest deleted. Because the dates in the rows and in the column headers do not line up, I am not able to extract them. The file should make clear which values I need; it is only a sample of my data.
The dates in column A2:A17 are continuous but a few of them repeat, whereas the dates in the header row (D1:K1) do not repeat, so values for the same date appear one directly below the other.
How can I get these values into one column?
Is there a way to highlight the values where the row date and the column date match? The sample data was highlighted manually, but my full dataset is too large to highlight by hand.
I could also use the colour coding to get the required values.
Following is the file I am attaching here
https://docs.google.com/spreadsheets/d/1-xBMKRP1_toA_Ky8mKxCKAFi4uQ8YWJq/edit?usp=sharing&ouid=110042758694954349181&rtpof=true&sd=true
Please visit the link and help me to find the solution.
Thank you

I'm not clear what those values in columns D to K are.
If only the shaded ones matter and they can be derived from the Latitude and Longitude for each row separately:
Insert a column titled "Row", say in A, and populate it 1,2,3...
I think you also want a column E which is whatever the calculation you currently have in D-K. Is this "Distance"?
Then create a Pivot Table on rows A to E and you can do anything you are likely to need: https://support.microsoft.com/en-us/office/create-a-pivottable-to-analyze-worksheet-data-a9a84538-bfe9-40a9-a8e9-f99134456576
Use Dates as Column Labels, Row numbers as Row Labels, and Sum of "Distance" as Values.

Identify groups and grouped rows in Excel file

I need to identify the different groups in Excel files and the rows inside these groups (more precisely, I need the value of the first cell of the main row under which other rows are grouped).
Below is an example of the file structure (I've collapsed the groups, but they are expanded when I receive the files):
I know how to create new groups using openpyxl or xlwt, and I'm familiar with both openpyxl and xlrd, but I'm unable to find anything in the API that covers this requirement.
So, is this possible using Python, and if so, which part of the openpyxl or xlrd API should I use?

You should be able to do this using the worksheet's row_dimensions. This returns an object accessible like a dict where the keys are the row numbers of the sheet. outline_level will have a non-zero value for each depth of grouping, or 0 if the row is not part of a group.
So, if you had a sheet where rows 2 and 3 were a group, and rows 5 and 6 were another group, iterating through row_dimensions would look like this:
>>> for row in range(ws.min_row, ws.max_row + 1):
...     print(f"row {row} is in group {ws.row_dimensions[row].outline_level}")
...
row 1 is in group 0
row 2 is in group 1
row 3 is in group 1
row 4 is in group 0
row 5 is in group 1
row 6 is in group 1
I should point out that there's some weirdness with accessing the information. My original solution was this:
>>> for row_num, row_data in ws.row_dimensions.items():
...     print(f"row {row_num} is group {row_data.outline_level}")
...
row 2 is group 1
row 3 is group 1
row 4 is group 0
row 5 is group 1
row 6 is group 1
Notice that row 1 is missing. It wasn't part of row_dimensions until I manually accessed it as row_dimensions[1] and then it appeared. I don't know how to explain that, but the first approach is probably better as it specifically iterates from the first to last row.
The same process applies to column groups through column_dimensions, except that it must be keyed using column letter(s), e.g. ws.column_dimensions["A"].outline_level.
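To get from outline levels to the "main row" the question asks about, here is a rough sketch. It assumes the main row is the nearest row above a grouped row that has a lower outline level (Excel can also place summary rows below the detail, in which case this would not apply). The names find_group_parents and read_group_parents are invented for this example.

```python
def find_group_parents(levels):
    """Map each grouped row to the row it is grouped under.

    levels: list of (row_number, outline_level) pairs in sheet order.
    Assumption: the parent is the closest preceding row whose outline
    level is lower than the row's own level.
    """
    parents = {}
    stack = []  # candidate parents, outline levels strictly increasing
    for row, level in levels:
        # Rows at the same or a deeper level cannot be this row's parent.
        while stack and stack[-1][1] >= level:
            stack.pop()
        if level > 0 and stack:
            parents[row] = stack[-1][0]
        stack.append((row, level))
    return parents


def read_group_parents(path):
    """Return {grouped_row: first-cell value of its main row} for a file."""
    # Imported here so find_group_parents can be tried without openpyxl.
    from openpyxl import load_workbook

    ws = load_workbook(path).active  # assumption: groups are on the active sheet
    levels = [(r, ws.row_dimensions[r].outline_level or 0)
              for r in range(ws.min_row, ws.max_row + 1)]
    return {child: ws.cell(row=parent, column=1).value
            for child, parent in find_group_parents(levels).items()}


# Outline levels shaped like the six-row example above.
example = [(1, 0), (2, 1), (3, 1), (4, 0), (5, 1), (6, 1)]
print(find_group_parents(example))  # {2: 1, 3: 1, 5: 4, 6: 4}
```

The stack also handles nested groups, since a level-2 row is attached to the closest level-1 row above it.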

Pandas Dataframes - Get the dataframe's overall top 5 values and their row and column labels, not by column or by row

I have a pandas DataFrame, let's say its named "df", with numerical values inside it in all columns (floats). I want to retrieve the top 5 highest absolute values from the dataframe, together with their row and column labels.
I've seen suggestions like:
df.abs().stack().nlargest(5)
but the stack method doesn't seem to keep the row and column labels for all elements: it lists one axis and then, for each element, lists the other axis, with a blank before it. I need the value and the names of BOTH the column and the row.
I know I can do this by iterating over each column, then each row inside it, and appending to 3 lists: one with row names, one with column names and a third with the values. I would then copy the values list into a fourth list of absolute values, use that list to get the positions of the 5 highest values, and use those positions to index the first 3 lists, which gives the row name, column name and value. There must be a better, more compact and more pythonic way, but I seriously cannot find it anywhere, and I am usually good at googling my issues away.

The suggested solution does keep the row and column labels: stack() moves them into the index of the result, so they are not lost.
A simple example where the appropriate names are reattached:
import numpy as np
import pandas as pd
df = pd.DataFrame({'a': np.random.random(100), 'b': np.random.random(100)})
df.abs().stack().nlargest(5).rename('value').rename_axis(['row', 'column']).reset_index()
Result:
   row column     value
0   87      a  0.958382
1   49      a  0.953590
2   55      a  0.952150
3   31      b  0.949763
4    4      b  0.931452

How to add rows to a specific location in a pandas DataFrame?

I am trying to add rows where there is a gap between month_count values. For example, row 0 has month_count = 0 and row 1 has month_count = 7. How can I add 6 extra rows with month counts 1, 2, 3, 4, 5 and 6? The same applies between row 3 and row 4, where I would like to add 2 extra rows with month_count 10 and 11. What is the best way to go about this?

One way to do this would be to iterate over all of the rows and rebuild the DataFrame with the missing rows inserted. Pandas does not support inserting rows directly at a given position; however, you can put together a solution using pd.concat():
import pandas as pd

def pandas_insert(df, idx, row_contents):
    top = df.iloc[:idx]   # rows before the insertion point
    bot = df.iloc[idx:]   # rows from the insertion point onwards
    inserted = pd.concat([top, row_contents, bot], ignore_index=True)
    return inserted
Here row_contents should be a DataFrame with one (or more) rows. We use ignore_index=True so that the index of the new DataFrame is relabeled 0, 1, …, n-2, n-1.
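If the goal is only to fill the gaps in month_count, a different technique from the concat approach above may be simpler: reindex the frame on the full range of month counts. This is a sketch that assumes month_count holds unique integers; the function name fill_month_gaps and the value column are hypothetical, since the original data was posted as an image.

```python
import pandas as pd


def fill_month_gaps(df, col="month_count"):
    """Add one row for every missing integer between min and max of col.

    Assumption: the values in col are unique integers. Other columns in
    the added rows are left as NaN.
    """
    full_range = range(int(df[col].min()), int(df[col].max()) + 1)
    return (
        df.set_index(col)
          .reindex(full_range)   # missing month counts become new rows
          .rename_axis(col)      # keep the column name after reindexing
          .reset_index()
    )


# Hypothetical data with gaps like the ones described in the question.
df = pd.DataFrame({"month_count": [0, 7, 8, 9, 12],
                   "value": [1.0, 2.0, 3.0, 4.0, 5.0]})
filled = fill_month_gaps(df)
print(filled)
```

The new rows could then be filled with fillna or ffill, depending on what the missing months should contain.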
