Custom header to pandas data frame while writing to excel - python

I have a simple data frame with a few columns
df = pd.DataFrame([[1, 2], [1, 3], [4, 6]], columns=['A', 'B'])
A B
0 1 2
1 1 3
2 4 6
what I am trying to achieve is writing to Excel with a custom header, which can be multi-line.
So the output in Excel would be
| App Input |      |
| --------- | ---- |
| A         | B    |
| data      | data |
| 1         | 2    |
| 1         | 3    |
| 4         | 6    |
Any ideas how I can achieve this? I was thinking of a MultiIndex, but I don't think it will work since it's not a true multi-index.

Since headers in Excel are just cells containing string values, you can simply precede the "real" values in the columns with some textual rows that, together with the dataframe's column names, form the multi-line header you want.
For example, you could use the following values to get the desired result:
df = pd.DataFrame([['', ''], ['A', 'B'], ['data', 'data'], [1, 2], [1, 3], [4, 6]], columns=['App Input', ''])
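As a minimal sketch of the full round trip, the padding rows can also be built from the original frame programmatically; the final to_excel call is left commented out, and the filename output.xlsx is just an example:

```python
import pandas as pd

# Original data
df = pd.DataFrame([[1, 2], [1, 3], [4, 6]], columns=['A', 'B'])

# Prepend the extra header rows (a blank line, the 'A'/'B' labels,
# and the 'data' line), then relabel the columns so 'App Input'
# becomes the top header cell.
header_rows = pd.DataFrame([['', ''], ['A', 'B'], ['data', 'data']],
                           columns=df.columns)
out = pd.concat([header_rows, df], ignore_index=True)
out.columns = ['App Input', '']

print(out.shape)  # (6, 2)
# out.to_excel('output.xlsx', index=False)  # writes the multi-line header layout
```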

Related

How can I insert an excel formula using pandas when I save it as an excel file?

I have the following dataframe with multiple cols and rows,
A | B | C | D   | E | ...
2 | b | c | NaN | 1 |
3 | c | b | NaN | 0 |
4 | b | b | NaN | 1 |
...
Is there a way, using Python, to add Excel formulas (for some columns) to an output Excel file, as in the example below?
For instance, I want to be able to have the output something like this,
=SUM(A0:A2) | | | | =SUM(E0:E2)
A | B | C | D | E
0 2 | b | c | =IF(B0=C0, "Yes", "No") | 1
1 3 | c | b | =IF(B1=C1, "Yes", "No") | 0
2 4 | b | b | =IF(B2=C2, "Yes", "No") | 1
...
Final output,
9 | | | | 2
A | B | C | D | E
0 2 | b | c | No | 1
1 3 | c | b | No | 0
2 4 | b | b | Yes | 1
...
I want to add formulas to the final output Excel file so that if there are any changes in the values of some columns, the other columns are also updated in the Excel file in real time. For instance,
15 | | | | 3
A | B | C | D | E
0 2 | b | b | Yes | 1
1 9 | c | b | No | 1
2 4 | b | b | Yes | 1
...
If I change the value of, for instance, A1 from 3 to 9, the sum of the column changes to 15; when I change the value of C0 from "c" to "b", its corresponding row value D0 changes from "No" to "Yes". The same applies to column E.
I know you can use the xlsxwriter library to write the formulas, but I am not able to figure out how to add them in the manner stated in the example above.
Any help would be really appreciated, thanks in advance!
You're best off writing any formulas you wish to keep via xlsxwriter rather than pandas.
You would use pandas if you only wanted to export the computed result; since you want to preserve the formulas, add them when you write the spreadsheet.
The code below will write out the dataframe and the formulas to an xlsx file called test.xlsx.
import xlsxwriter
import pandas as pd
from numpy import nan

data = [[2, 'b', 'c', nan, 1], [3, 'c', 'b', nan, 0], [4, 'b', 'b', nan, 1]]
df = pd.DataFrame(data=data, columns=['A', 'B', 'C', 'D', 'E'])

## Send the values to a list so we can iterate over the rows, allowing row:column matching in the formulas ##
values = df.values.tolist()

## Create the workbook ##
workbook = xlsxwriter.Workbook('test.xlsx')
worksheet = workbook.add_worksheet()

row = 0
col = 0
## Iterate over the data extracted from the df, generating the cell formula for 'D' on each iteration ##
for line in values:
    d = f'=IF(B{row + 1}=C{row + 1}, "Yes", "No")'
    a, b, c, _, e = line
    ## Write the cells into the spreadsheet ##
    worksheet.write(row, col, a)
    worksheet.write(row, col + 1, b)
    worksheet.write(row, col + 2, c)
    worksheet.write(row, col + 3, d)
    worksheet.write(row, col + 4, e)
    row += 1

## Write the column totals to the bottom row, using the row counter as the stop point ##
worksheet.write(row, 0, f'=SUM(A1:A{row})')
worksheet.write(row, 4, f'=SUM(E1:E{row})')
workbook.close()

Creating dictionary from excel file (pandas dataframe)

I've got an Excel file / pandas dataframe looking like this:
+------+--------+
| ID | 2nd ID |
+------+--------+
| ID_1 | R_1 |
| ID_1 | R_2 |
| ID_2 | R_3 |
| ID_3 | |
| ID_4 | R_4 |
| ID_5 | |
+------+--------+
How can I transform it into a Python dictionary? I want my result to be like:
{'ID_1':['R_1','R_2'],'ID_2':['R_3'],'ID_3':[],'ID_4':['R_4'],'ID_5':[]}
What should I do to obtain it?
If you need to remove missing values (for IDs that have no second value), use Series.dropna in a lambda function inside GroupBy.apply:
d = df.groupby('ID')['2nd ID'].apply(lambda x: x.dropna().tolist()).to_dict()
print(d)
{'ID_1': ['R_1', 'R_2'], 'ID_2': ['R_3'], 'ID_3': [], 'ID_4': ['R_4'], 'ID_5': []}
Or use the fact that np.nan == np.nan returns False in a list comprehension to filter out the missing values (see also the warning in the docs for a fuller explanation):
d = df.groupby('ID')['2nd ID'].apply(lambda x: [y for y in x if y == y]).to_dict()
If you need to remove empty strings instead:
d = df.groupby('ID')['2nd ID'].apply(lambda x: [y for y in x if y != '']).to_dict()
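To reproduce this on the question's data, here is a quick sketch building the frame by hand, with the blank cells represented as NaN:

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame; the blank '2nd ID' cells become NaN.
df = pd.DataFrame({
    'ID': ['ID_1', 'ID_1', 'ID_2', 'ID_3', 'ID_4', 'ID_5'],
    '2nd ID': ['R_1', 'R_2', 'R_3', np.nan, 'R_4', np.nan],
})

# Group by ID and collect the non-missing second IDs into lists.
d = df.groupby('ID')['2nd ID'].apply(lambda x: x.dropna().tolist()).to_dict()
print(d)
# {'ID_1': ['R_1', 'R_2'], 'ID_2': ['R_3'], 'ID_3': [], 'ID_4': ['R_4'], 'ID_5': []}
```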
Apply a function over the rows of the dataframe which appends each value to your dict. apply is not in-place, so the dictionary is filled as a side effect. Note that dict.fromkeys(df.ID.unique(), []) would give every key the same shared list, so build the dict with a comprehension instead:
d = {k: [] for k in df.ID.unique()}
def func(x):
    d[x.ID].append(x["2nd ID"])
# will return a series of Nones
df.apply(func, axis=1)
Edit:
I asked on Gitter and @gurukiran07 gave me an answer. What you are trying to do is the reverse of the explode function:
s = pd.Series([[1, 2, 3], [4, 5]])
0 [1, 2, 3]
1 [4, 5]
dtype: object
exploded = s.explode()
0 1
0 2
0 3
1 4
1 5
dtype: object
exploded.groupby(level=0).agg(list)
0 [1, 2, 3]
1 [4, 5]
dtype: object

Remove pandas columns based on list

I have a list:
my_list = ['a', 'b']
and a pandas dataframe:
d = {'a': [1, 2], 'b': [3, 4], 'c': [1, 2], 'd': [3, 4]}
df = pd.DataFrame(data=d)
What can I do to remove the columns in df based on the list my_list, in this case removing columns a and b?
This is very simple:
df = df.drop(columns=my_list)
drop removes columns by specifying a list of column names
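If some names in my_list might not exist in the frame, drop also accepts errors='ignore' so the missing names are skipped rather than raising a KeyError. A small sketch:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [1, 2], 'd': [3, 4]})
my_list = ['a', 'b', 'z']  # 'z' does not exist in df

# errors='ignore' silently skips names that are not columns.
df = df.drop(columns=my_list, errors='ignore')
print(list(df.columns))
# ['c', 'd']
```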
Here is a concise approach using df.pop in a list comprehension (note that pop removes the columns in place): [df.pop(x) for x in my_list]
my_list = ['a', 'b']
d = {'a': [1, 2], 'b': [3, 4], 'c': [1, 2], 'd': [3, 4]}
df = pd.DataFrame(data=d)
print(df.to_markdown())
| | a | b | c | d |
|---:|----:|----:|----:|----:|
| 0 | 1 | 3 | 1 | 3 |
| 1 | 2 | 4 | 2 | 4 |
[df.pop(x) for x in my_list]
print(df.to_markdown())
| | c | d |
|---:|----:|----:|
| 0 | 1 | 3 |
| 1 | 2 | 4 |
You can select required columns as well:
cols_of_interest = ['c', 'd']
df = df[cols_of_interest]
If you have a range of columns to drop, for example positions 2 up to 8, you can use:
df = df.drop(df.iloc[:, 2:8].columns, axis=1)
(drop is not in place, so assign the result back.)
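A quick sketch of the positional variant on the question's frame, here dropping the columns at positions 2 and 3 ('c' and 'd'):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [1, 2], 'd': [3, 4]})

# Drop by position: columns 2 up to (but not including) 4.
df = df.drop(df.columns[2:4], axis=1)
print(list(df.columns))
# ['a', 'b']
```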

Getting unknown columns from list to dataframe

I have a list containing 119,554 other lists. All the inner lists have the same length of 334. I needed to convert the list into a dataframe using df2 = pd.DataFrame(df). The result shows (119554, 706).
I don't know why there are additional columns added. It should be (119554, 33) if I'm not wrong.
Any suggestions? Thanks!
Let's say you have the following lists within another list:
lst = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9],
       [10, 11, 12]]
You can see that the outer list contains 4 lists, each of length 3. (Plain Python lists do not have a shape attribute.)
You can convert this list into a pandas dataframe in the following way:
df = pd.DataFrame(lst)
The dataframe looks like this:
| | 0 | 1 | 2 |
|---:|----:|----:|----:|
| 0 | 1 | 2 | 3 |
| 1 | 4 | 5 | 6 |
| 2 | 7 | 8 | 9 |
| 3 | 10 | 11 | 12 |
The shape of the dataframe is:
print(df.shape)
>> (4, 3)
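The mismatch between the expected and actual width is usually caused by inner lists of unequal length: pandas makes the frame as wide as the longest inner list and pads the shorter rows with NaN. A small sketch with made-up data to illustrate:

```python
import pandas as pd

# Two inner lists of different lengths: the longest one sets the column count.
ragged = [[1, 2, 3], [4, 5, 6, 7, 8]]
df = pd.DataFrame(ragged)

print(df.shape)  # (2, 5): five columns even though the first row has three values
print(df.isna().sum().sum())  # 2 missing cells padded in for the shorter row
```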

How do I remove the last row in the printed list?

The question asked of me is to do the following
Print the 2-dimensional list mult_table by row and column. Hint: Use nested loops. Sample output for the given program:
1 | 2 | 3
2 | 4 | 6
3 | 6 | 9
So far I have this:
mult_table = [
    [1, 2, 3],
    [2, 4, 6],
    [3, 6, 9]
]
for row in mult_table:
    for cell in row:
        print(cell, end=' | ')
    print()
The output this gives me is:
1 | 2 | 3 |
2 | 4 | 6 |
3 | 6 | 9 |
I need to know how I can remove the trailing | that is printed after the last column.
Thank you for your help in advance.
You can use the str.join method instead of always printing a pipe as the ending character:
for row in mult_table:
    print(' | '.join(map(str, row)))
Or you can use the sep parameter of print:
for row in mult_table:
    print(*row, sep=' | ')
