How to sort data in an xlsx column with openpyxl? - python

I would like to save sorted data from A to Z by column C in my Excel file.
My code:
### EXCEL
# Set column names
A = 'SURNAME'
B = 'NAME'
C = 'sAMAccountName'
# Set worksheet
wb = Workbook() # create excel worksheet
ws_01 = wb.active # Grab the active worksheet
ws_01.title = "all inf" # Set the title of the worksheet
# Set first row
title_row = 1
ws_01.cell(title_row, 1, A) # cell(row, col, value)
ws_01.cell(title_row, 2, B)
ws_01.cell(title_row, 3, C)
data_row = 2
for user in retrieved_users:
    attributes = user['attributes']
    sAMAccountName = attributes['sAMAccountName']
    if(user_validation(sAMAccountName) == True):
        A = attributes['sn']
        B = attributes['givenName']
        C = sAMAccountName
        ws_01.cell(data_row, 1, str(A))
        ws_01.cell(data_row, 2, str(B))
        ws_01.cell(data_row, 3, str(C))
        data_row = data_row + 1
# Save it in an Excel file
decoded_users_all_inf = root_path + reports_dir + users_all_inf_excel_file
wb.save(decoded_users_all_inf)
What should I add to my code, and where, to achieve this?

If you want to sort retrieved_users, then you can use the built-in list.sort with a key to access the sAMAccountName.
retrieved_users = [
{"attributes": {"sn": "a", "givenName": "Alice", "sAMAccountName": "z"}},
{"attributes": {"sn": "b", "givenName": "Bob", "sAMAccountName": "x"}},
{"attributes": {"sn": "c", "givenName": "Charlie", "sAMAccountName": "y"}},
]
retrieved_users.sort(key=lambda d: d["attributes"]["sAMAccountName"])
After sorting, retrieved_users contains:
[{'attributes': {'sn': 'b', 'givenName': 'Bob', 'sAMAccountName': 'x'}},
{'attributes': {'sn': 'c', 'givenName': 'Charlie', 'sAMAccountName': 'y'}},
{'attributes': {'sn': 'a', 'givenName': 'Alice', 'sAMAccountName': 'z'}}]
On another note, you can do ws.append(row) to append entire rows at a time rather than doing ws.cell(row, col, value) three times:
wb = Workbook()
ws = wb.active
ws.append(('SURNAME', 'NAME', 'sAMAccountName'))
is equivalent to
wb = Workbook()
ws = wb.active
ws.cell(1, 1, 'SURNAME')
ws.cell(1, 2, 'NAME')
ws.cell(1, 3, 'sAMAccountName')
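Putting the two suggestions together, a minimal sketch could build the sorted rows first and append them afterwards. The names build_rows and is_valid are hypothetical: is_valid stands in for the question's user_validation function, and the case-insensitive sort is an assumption about what "A to Z" should mean.

```python
def build_rows(users, is_valid=lambda name: True):
    """Return (surname, given name, account) tuples sorted by account name.

    `is_valid` is a hypothetical stand-in for the question's user_validation().
    """
    rows = []
    for user in users:
        attrs = user["attributes"]
        account = str(attrs["sAMAccountName"])
        if is_valid(account):
            rows.append((str(attrs["sn"]), str(attrs["givenName"]), account))
    # Sort by the third field (column C), ignoring case.
    rows.sort(key=lambda row: row[2].lower())
    return rows


retrieved_users = [
    {"attributes": {"sn": "a", "givenName": "Alice", "sAMAccountName": "z"}},
    {"attributes": {"sn": "b", "givenName": "Bob", "sAMAccountName": "x"}},
    {"attributes": {"sn": "c", "givenName": "Charlie", "sAMAccountName": "y"}},
]
rows = build_rows(retrieved_users)
print(rows)
```

Each tuple in rows can then be written with ws.append(row) once the header row has been appended.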

Related

Python using loop to update the cells in excel

The dataframe is created with the Join_Date and Name
data = {'Join_Date': ['2023-01', '2023-01', '2023-02', '2023-03'],
'Name': ['Tom', 'Amy', 'Peter', 'Nick']}
df = pd.DataFrame(data)
I have split the df by Join_Date. Can each group be written to Excel, one date after another, using a for loop?
df_split = [df[df['Join_Date'] == i] for i in df['Join_Date'].unique()]
Expected result:
You can use the ExcelWriter class in pandas:
import pandas as pd
import xlsxwriter
data = {'Join_Date': ['2023-01', '2023-01', '2023-02', '2023-03'],
'Name': ['Tom', 'Amy', 'Peter', 'Nick']}
df = pd.DataFrame(data)
df_split = [df[df['Join_Date'] == i] for i in df['Join_Date'].unique()]
writer = pd.ExcelWriter("example.xlsx", engine='xlsxwriter')
skip_rows = 0
for df in df_split:
    df.to_excel(writer, sheet_name='Sheet1', startcol=2, startrow=2+skip_rows, index=False)
    skip_rows += df.shape[0]+2
writer.close()
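The row bookkeeping in that loop can be checked without writing a file. The sketch below only reproduces the arithmetic; start_rows, first_row and gap are hypothetical names, and the "+2" in the answer is read here as one header row plus one blank separator row per group.

```python
def start_rows(group_sizes, first_row=2, gap=1):
    """Return the 0-based startrow for each group's block.

    Each block takes one header row plus one row per record,
    and `gap` blank rows separate consecutive blocks.
    """
    rows = []
    current = first_row
    for size in group_sizes:
        rows.append(current)
        current += 1 + size + gap  # header + data rows + blank rows
    return rows


# Group sizes for 2023-01, 2023-02 and 2023-03 in the example data.
print(start_rows([2, 1, 1]))
```

With the example data this gives start rows 2, 6 and 9, the same values that 2+skip_rows takes in the loop.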
You can use pandas methods to do this, like so. (You can add an empty line if you really need it.)
import pandas as pd
data = {'Join_Date': ['2023-01', '2023-01', '2023-02', '2023-03'],
'Name': ['Tom', 'Amy', 'Peter', 'Nick']}
df = pd.DataFrame(data)
def add_header(x):
    x.loc[-1] = 'Join_date', 'Name'
    return x.sort_index().reset_index(drop=True)
df_split = df.groupby(['Join_Date'], group_keys=False)
df_group = df_split.apply(add_header)
df_group.to_excel('output.xlsx', index=False, header=False)
You can add the empty line by editing the add_header function like this:
def add_header(x):
    x.loc[-1] = ' ', ' '
    x = x.sort_index().reset_index(drop=True)
    x.loc[0.5] = 'Join_date', 'Name'
    x = x.sort_index().reset_index(drop=True)
    return x

JSON or CSV from list in python

I have code that converts my category tree to a list of paths, and I want to convert that list to CSV or JSON. Each item in the list can contain a different number of ids, as shown below.
def paths(tree):
    tree_name = next(iter(tree.keys()))
    if tree_name == 'children':
        for child in tree['children']:
            for descendant in paths(child):
                yield (tree['id'],) + descendant
    else:
        yield (tree['id'],)
pprint.pprint(list(paths(tree)))
Output
[(461123, 1010022280, 10222044, 2222871,2222890),
(461123, 129893, 119894, 1110100250),
(461123, 98943, 944894, 9893445),
(461123, 9844495)]
Is there any way to improve my code, or is there other code that converts the list to JSON like the output below?
The output should look like this:
{
    {
        "column1": "462312",
        "column2": "1010022280",
        "column3": "10222044",
        "column4": "2222871",
        "column5": "2222890"
    },
    {
        "column1": "461123",
        "column2": "129893",
        "column3": "119894",
        "column4": "1110100250"
    }
    and so on...
}
For CSV, it should look like this (there can be up to 10 columns):
column1  column2  column3  column4
461123   129893   119894   1110100250
461123   129893   119894
The following code converts the list of tuples to a list of dicts, which you can then convert to JSON; the second function writes the data to a CSV file.
import csv
data = [(461123, 1010022280, 10222044, 2222871,2222890),
        (461123, 129893, 119894, 1110100250),
        (461123, 98943, 944894, 9893445),
        (461123, 9844495)]
def convert_to_list_of_dicts(data):
    output_list = []
    for i in data:
        data_dict = {}
        for count,j in enumerate(i) :
            data_dict["column" + str(count+1)] = j
        output_list.append(data_dict)
    return output_list
# print(convert_to_list_of_dicts(data))
def convert_to_csv(data):
    # find the widest row so the header covers every column
    max_column_num = 0
    for i in data:
        if len(i) > max_column_num:
            max_column_num = len(i)
    columns = ["column" + str(i+1) for i in range(max_column_num)]
    newdata = [tuple(columns)]
    for tup in data:
        newdata.append(tup)
    with open('output.csv', 'w', newline='') as csvfile:
        writer = csv.writer(csvfile)
        writer.writerows(newdata)
# convert_to_csv(data)
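The answer stops short of the JSON step, so here is a minimal sketch of one way to finish it, under a few assumptions: a JSON array is used for the outer container (the nested braces in the question are not valid JSON), the ids are emitted as strings as in the question's sample, short CSV rows are padded with empty fields, and an in-memory buffer stands in for the output file.

```python
import csv
import io
import json

data = [
    (461123, 1010022280, 10222044, 2222871, 2222890),
    (461123, 129893, 119894, 1110100250),
    (461123, 98943, 944894, 9893445),
    (461123, 9844495),
]

# JSON: one object per path, with the ids as strings as in the question.
records = [
    {f"column{i}": str(value) for i, value in enumerate(path, start=1)}
    for path in data
]
json_text = json.dumps(records, indent=2)

# CSV: pad short rows with empty strings so every row has the same width.
width = max(len(path) for path in data)
buffer = io.StringIO()  # in-memory stand-in for an output file
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow([f"column{i}" for i in range(1, width + 1)])
for path in data:
    writer.writerow(list(path) + [""] * (width - len(path)))
csv_text = buffer.getvalue()

print(json_text)
print(csv_text)
```

To write real files instead, json.dump(records, f) and csv.writer(f) accept an open file object in place of the string and the buffer.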

How to make dictionary key-value with CSV file?

I have a CSV file with 8 columns that looks like this. I want to build a dictionary whose keys are the department names and whose values are the number of times each department appears.
See the CSV file image:
Example of the printed output:
{'Japanese': 1,
'physiology': 1,
'economics': 2,
'business': 1,
'biology': 1,
'software': 9,
'smart iot': 1,
'medical': 1,
'big data': 2}
So I wrote the code below, but it doesn't work.
names = ['japanese', 'psychology', 'economics', 'buiseness', 'biology', 'medical', 'software', 'smart IOT', 'big data', 'electronics', 'contents IT']
names_count = len(names)
unique_dept_names = set()
unique_dept_names = {name:[] for name in names}
with open('fake_student_records.csv', mode='r', encoding='utf-8') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    for row in csv_reader:
        for name in names:
            unique_dept_names[name].append(row['department'])
number = {name:0 for name in names}
for name in names:
    number[name] = names_count
print(number)
You could use collections.Counter:
import csv
import collections
departments = collections.Counter()
with open('fake_student_records.csv') as input_file:
    next(input_file)  # skip the first row, since it holds the column names
    for row in csv.reader(input_file, delimiter=','):
        departments[row[2]] += 1
print('Number of Japanese departments: ' + str(departments['Japanese']))
print(departments.most_common())
Example output:
Number of Japanese departments: 3
[('Japanese', 3), ('business', 2), ('economics', 1),...]
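If the column is better addressed by name than by position, a similar count can be sketched with csv.DictReader, which the question already uses. The sample rows below are made up for illustration and stand in for fake_student_records.csv; only the 'department' column name comes from the question.

```python
import collections
import csv
import io

# Hypothetical sample standing in for fake_student_records.csv;
# only the 'department' column name is taken from the question.
sample = io.StringIO(
    "name,department\n"
    "Aiko,software\n"
    "Ben,economics\n"
    "Chen,software\n"
    "Dana,big data\n"
)

# Count how often each department value occurs across the rows.
counts = collections.Counter(row["department"] for row in csv.DictReader(sample))
department_counts = dict(counts)  # plain dict, as in the desired output
print(department_counts)
```

Because the counter discovers the department names from the file, no hard-coded names list is needed.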

Pandas xlsxwriter to write dataframe to excel and implementing column-width and border related formatting

Background:
I am using Pandas and have a dataframe 'df' which I intend to write into an Excel sheet. I use the code below and get the output Excel sheet as shown in attached snapshot 'Present.JPG':
import pandas as pd
import xlsxwriter
writer = pd.ExcelWriter('output.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1')
writer.save()
Problem description:
I would like to write the dataframe to Excel and incorporate the following changes.
1) Get rid of the first column indicating the index
2) Implement text wrapping on all columns (to auto-size each column width)
3) Sketch thick border A1 to C4, D1 to F4 and column G
Eventually, I would like the Excel sheet to look like the snapshot 'Desired.JPG':
What I have tried so far:
I tried the following commands, but they overwrite the cell contents when applying the border. Furthermore, I cannot figure out how to extend the border (and text wrapping) beyond a single cell.
writer = pd.ExcelWriter("output.xlsx", engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1')
workbook=writer.book
worksheet= writer.sheets['Sheet1']
full_border = workbook.add_format({"border":1,"border_color": "#000000"})
link_format = workbook.add_format({'text_wrap': True})
worksheet.write("D3", None, full_border)
worksheet.write("E1", None, link_format)
writer.save()
I'm a little late to the party but here is what you were looking for:
import xlsxwriter
import pandas as pd
df = pd.DataFrame({
    'Class': ['A', 'A', 'A'],
    'Type': ['Mary', 'John', 'Michael'],
    'JoinDate YYYY-MM-DD': ['2018-12-12', '2018-12-12', '2018-12-15'],
    'Weight': [150, 139, 162],
    'Height': [166.4, 160, 143],
    'Marks': [96, 89, 71],
    'LastDate YYYY-MM-DD': ['2020-01-17', '2020-01-17', '2020-01-17']
})
with pd.ExcelWriter('output.xlsx', engine='xlsxwriter') as writer:
    # remove the index by setting the kwarg 'index' to False
    df.to_excel(excel_writer=writer, sheet_name='Sheet1', index=False)
    workbook = writer.book
    worksheet = writer.sheets['Sheet1']
    # dynamically set column width
    for i, col in enumerate(df.columns):
        column_len = max(df[col].astype(str).str.len().max(), len(col) + 2)
        worksheet.set_column(i, i, column_len)
    # wrap the text in all cells
    wrap_format = workbook.add_format({'text_wrap': True, 'align': 'center'})
    worksheet.set_column(0, len(df.columns) - 1, cell_format=wrap_format)
    # mimic the default pandas header format for use later
    hdr_fmt = workbook.add_format({
        'bold': True,
        'border': 1,
        'text_wrap': True,
        'align': 'center'
    })
    def update_format(curr_frmt, new_prprty, wrkbk):
        """
        Update a cell's existing format with new properties
        """
        new_frmt = curr_frmt.__dict__.copy()
        for k, v in new_prprty.items():
            new_frmt[k] = v
        new_frmt = {
            k: v
            for k, v in new_frmt.items()
            if (v != 0) and (v is not None) and (v != {}) and (k != 'escapes')
        }
        return wrkbk.add_format(new_frmt)
    # create new border formats
    header_right_thick = update_format(hdr_fmt, {'right': 2}, workbook)
    normal_right_thick = update_format(wrap_format, {'right': 2}, workbook)
    normal_bottom_thick = update_format(wrap_format, {'bottom': 2}, workbook)
    normal_corner_thick = update_format(wrap_format, {
        'right': 2,
        'bottom': 2
    }, workbook)
    # list the 0-based indices where you want bold vertical border lines
    vert_indices = [2, 5, 6]
    # create vertical bold border lines
    for i in vert_indices:
        # header vertical bold line
        worksheet.conditional_format(0, i, 0, i, {
            'type': 'formula',
            'criteria': 'True',
            'format': header_right_thick
        })
        # body vertical bold line
        worksheet.conditional_format(1, i,
                                     len(df.index) - 1, i, {
            'type': 'formula',
            'criteria': 'True',
            'format': normal_right_thick
        })
        # bottom corner bold lines
        worksheet.conditional_format(len(df.index), i, len(df.index), i, {
            'type': 'formula',
            'criteria': 'True',
            'format': normal_corner_thick
        })
    # create bottom bold border line
    for i in [i for i in range(len(df.columns) - 1) if i not in vert_indices]:
        worksheet.conditional_format(len(df.index), i, len(df.index), i, {
            'type': 'formula',
            'criteria': 'True',
            'format': normal_bottom_thick
        })

How can I put excel data to the dictionary?

I want to put Excel data into a dictionary.
The Excel sheet is:
views.py is:
#coding:utf-8
from django.shortcuts import render
import xlrd
book3 = xlrd.open_workbook('./data/excel.xlsx')
sheet3 = book3.sheet_by_index(0)
large_item = None
data_dict = {}
for row_index in range(1,sheet3.nrows):
    rows3 = sheet3.row_values(row_index)
    large_item = rows3[1] or large_item
    data_dict = rows3
Now when I run print(data_dict), ['', '4', '10', 'Karen', ''] is shown. Before, I wrote data_dict.extend(rows3) in place of data_dict = rows3, but that raised an error because dict has no extend method. My ideal output is
data_dict = {
    1: {
        user_id: 1,
        name_id: 1,
        name: Blear,
        age: 40,
        man: false,
        employee: leader,
    },
    2: {
        user_id: 2,
        name_id: 5,
        ・
        ・
        ・
    },
    ・
    ・
    ・
}
How should I write to achieve my goal?
Your problem is this line:
data_dict = rows3
This doesn't add rows3 to data_dict; it rebinds data_dict to rows3. So after the loop, data_dict is simply the last row.
To add an element to a dict you need to do this:
data_dict[KEY] = VALUE
Your key will be the row index.
Now, you want another dict as the VALUE:
{
    user_id: 1,
    name_id: 1,
    name: Blear,
    age: 40,
    man: false,
    employee: leader,
}
So for each row you need to construct this dict, using the headers and the cell values.
I haven't tested this code; it's just to give you an idea of how to do it.
#coding:utf-8
from django.shortcuts import render
import xlrd
book3 = xlrd.open_workbook('./data/excel.xlsx')
sheet3 = book3.sheet_by_index(0)
headers = sheet3.row_values(0)
large_item = None
data_dict = {}
for row_index in range(1,sheet3.nrows):
    rows3 = sheet3.row_values(row_index)
    large_item = rows3[1] or large_item
    # Create dict with headers and row values
    row_data = {}
    for idx_col,value in enumerate(rows3):
        header_value = headers[idx_col]
        # Skip columns with an empty header (column A in your example)
        if header_value:
            row_data[headers[idx_col]] = value
    # Add row_data to your data_dict, keyed by the row index
    data_dict[row_index] = row_data
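The same idea can be tried without an Excel file. In this minimal sketch the headers and rows lists are hypothetical stand-ins for what sheet3.row_values() would return (the second row's name and age are invented), and zip pairs each header with its cell value.

```python
# Hypothetical rows standing in for sheet3.row_values(i); the first
# column is empty, as in the question's sheet.
headers = ["", "user_id", "name_id", "name", "age"]
rows = [
    ["", 1, 1, "Blear", 40],
    ["", 2, 5, "Tom", 23],
]

data_dict = {}
for row_index, row in enumerate(rows, start=1):
    # Pair each header with its cell and skip columns without a header.
    data_dict[row_index] = {
        header: value for header, value in zip(headers, row) if header
    }

print(data_dict)
```

Each iteration assigns to data_dict[row_index] instead of rebinding data_dict, so every row is kept.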
You can use Python's pandas library for an easy solution:
import pandas as pd
xls = pd.ExcelFile('your_excel_file.xls')
df = xls.parse(xls.sheet_names[0])
df.to_dict()
