Changing and saving xlsb file in pandas

So I have code like this:
# Install the reader first (shell command, not Python): pip install pyxlsb
import numpy as np
import pandas as pd
import openpyxl as xl
import csv as cs
import xlwings as xw
import pyxlsb as pyx
from pyxlsb import open_workbook

# path points to the .xlsb workbook and path1 to the .xlsx workbook
rows = []
with open_workbook(path) as wb:
    with wb.get_sheet('XYZ') as XYZ:
        for row in XYZ.rows():
            rows.append([item.v for item in row])
# The first row holds the column names
df1 = pd.DataFrame(rows[1:], columns=rows[0])
# read_excel already returns a DataFrame, so no extra conversion is needed
df2 = pd.read_excel(path1)
This code works so far, but I cannot find a way to delete all rows in sheet XYZ (in the .xlsb file), insert all the data from df2 (read from the .xlsx file), and then save everything back to the same .xlsb file (path).
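pyxlsb only reads .xlsb files, so the write step needs a different tool. One possible route is to let Excel itself do the saving through xlwings, which the question already imports. The following is an untested sketch under that assumption: it requires a local Excel installation, and the function name replace_sheet_contents and its parameters are hypothetical names chosen for illustration.

```python
def replace_sheet_contents(xlsb_path, sheet_name, df):
    """Overwrite the values of one sheet in an existing workbook with df."""
    # Imported here because xlwings needs a local Excel installation.
    import xlwings as xw

    app = xw.App(visible=False)
    try:
        wb = app.books.open(xlsb_path)
        sheet = wb.sheets[sheet_name]
        # Remove the existing cell values but keep the formatting.
        sheet.clear_contents()
        # Write the header row and the data rows starting at A1, without the index.
        sheet.range("A1").options(index=False).value = df
        # Saving without a new path keeps the existing file and its format.
        wb.save()
        wb.close()
    finally:
        app.quit()
```

With the names from the question, the call would look like replace_sheet_contents(path, 'XYZ', df2).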

Related

Appending dataframes from json files in a for loop

I am trying to iterate through json files in a folder and append them all into one pandas dataframe.
If I run
import pandas as pd
import numpy as np
import json
from pandas.io.json import json_normalize
import os

directory_in_str = 'building_data'
directory = os.fsencode(directory_in_str)
df_all = pd.DataFrame()
with open("building_data/rooms.json") as file:
    data = json.load(file)
    df = json_normalize(data['rooms'])
    df_all.append(df, ignore_index=True)
I get a dataframe with the data from the one file. When I turn this into a for loop, I have tried
import pandas as pd
import numpy as np
import json
from pandas.io.json import json_normalize
import os

directory_in_str = 'building_data'
directory = os.fsencode(directory_in_str)
df_all = pd.DataFrame()
for file in os.listdir(directory):
    filename = os.fsdecode(file)
    with open(directory_in_str + '/' + filename) as f:
        data = json.load(f)
        df = json_normalize(data['rooms'])
        df_all.append(df, ignore_index=True)
print(df_all)
This returns an empty dataframe. Does anyone know why this is happening? If I print df before appending it, it prints the correct values, so I am not sure why it is not being appended.
Thank you!

Instead of appending the next DataFrame, I would try to join them like this:
if df_all.empty:
    df_all = df
else:
    df_all = df_all.join(df)
When joining DataFrames, you can specify what they are joined on (the index or a specific key column) as well as how (the default is 'left'). Note that join places the columns side by side rather than stacking rows, and it raises an error on overlapping column names unless lsuffix or rsuffix is given.
See the documentation for pandas.DataFrame.join.

In these instances I load everything from the JSON files into a list, adding each file's records to that list. Then I pass the list to pandas.DataFrame.from_records.
In this case the source would become something like...
import json
import os
import pandas as pd

directory_in_str = 'building_data'
json_data = []
for filename in os.listdir(directory_in_str):
    with open(os.path.join(directory_in_str, filename)) as file:
        data = json.load(file)
    # Assumes data['rooms'] is a list of flat record dicts
    json_data.extend(data['rooms'])
df_all = pd.DataFrame.from_records(json_data)
print(df_all)
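The empty result in the question comes from DataFrame.append returning a new DataFrame instead of modifying df_all in place; the method was also removed in pandas 2.0. A minimal sketch of the usual alternative is to collect the per-file frames in a list and combine them once with pd.concat. The files list below is a hypothetical stand-in for the parsed contents of the JSON files, and the name and area fields are invented for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for the parsed contents of two JSON files.
files = [
    {"rooms": [{"name": "A101", "area": 20}, {"name": "A102", "area": 35}]},
    {"rooms": [{"name": "B201", "area": 50}]},
]

frames = []
for data in files:
    # json_normalize flattens each file's list of room records into a frame.
    frames.append(pd.json_normalize(data["rooms"]))

# A single concat at the end stacks the rows and renumbers the index.
df_all = pd.concat(frames, ignore_index=True)
print(df_all)
```

Concatenating once after the loop also avoids copying the accumulated data on every iteration.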

How to add one more column to my file (modifiedFlights.csv) from another file (original.csv)

I want to add a names column to my file (modifiedFlights.csv) from another file (original.csv), which has a names column. The goal is to add the names to modifiedFlights.csv by matching on the hashes column, which is present in both files. But I am not able to do so.
import os
import glob
from pathlib import Path
import pandas as pd
import pandas
import csv
import numpy as np
from pandas import DataFrame
import sys, argparse, csv

hashes = pd.read_csv(r'C:\Users\Sajid\Desktop\original.csv', usecols=[0])  # hashes in original.csv
names = pd.read_csv(r'C:\Users\Sajid\Desktop\original.csv', usecols=[1])
this = pd.read_csv(r'C:\Users\Sajid\Desktop\csv files\modifiedFlights.csv', usecols=[4])  # hashes in modifiedFlights.csv
for i in hashes:
    for y in this:
        if i == y:
            results_row = pd.read_csv(r'C:\Users\Sajid\Desktop\original.csv', usecols=[1], userows=[i])
with open(r'C:\Users\Sajid\Desktop\csv files\modifiedFlights.csv', 'r') as csvinput:
    with open(r'C:\Users\Sajid\Desktop\out.csv', 'w') as csvoutput:
        writer = csv.writer(csvoutput)
        for row in csv.reader(csvinput):
            writer.writerow(row + [results_row])
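Matching rows on a shared key column is what DataFrame.merge does, so the nested loops may not be needed. The sketch below uses small in-memory frames as hypothetical stand-ins for the two CSV files; it assumes the key column is called hashes in both and that the flight column and all values are invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for original.csv (hashes plus names).
original = pd.DataFrame({
    "hashes": ["h1", "h2", "h3"],
    "names": ["Alice", "Bob", "Carol"],
})

# Hypothetical stand-in for modifiedFlights.csv (has hashes but no names).
modified = pd.DataFrame({
    "flight": ["F1", "F2", "F3"],
    "hashes": ["h3", "h1", "h9"],
})

# A left merge keeps every row of modified and adds the matching name;
# hashes without a match in original get NaN in the names column.
merged = modified.merge(original[["hashes", "names"]], on="hashes", how="left")
print(merged)
```

With the real files, the two frames would come from pd.read_csv and the result could be written out with merged.to_csv(..., index=False).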

How to read specific columns in an xlsb in Python

I'm trying to read spreadsheets in an xlsb file in Python, and I've used the code below to do so. I found the code on Stack Overflow, and I'm sure that it reads every column in each row of a spreadsheet and appends it to a dataframe. How can I modify this code so that it only reads and appends specific columns of the spreadsheet? For example, I only want to append the data in columns B through D to my dataframe.
Any help would be appreciated.
import pandas as pd
from pyxlsb import open_workbook as open_xlsb

df = []
with open_xlsb('some.xlsb') as wb:
    with wb.get_sheet(1) as sheet:
        for row in sheet.rows():
            df.append([item.v for item in row])
df = pd.DataFrame(df[1:], columns=df[0])

pyxlsb has no option to select columns when opening a sheet, but it is doable with the help of xlwings.
import pandas as pd
import xlwings as xw

# xlwings drives Excel itself, so Excel must be installed
wb = xw.Book(r"W:\path\filename.xlsb")
Data = wb.sheets[0].range('B:D').value
# Creates a dataframe using the first list of elements as columns
Data_df = pd.DataFrame(Data[1:], columns=Data[0])

Just do:
import pandas as pd
from pyxlsb import open_workbook as open_xlsb

df = []
with open_xlsb('some.xlsb') as wb:
    with wb.get_sheet(1) as sheet:
        for row in sheet.rows():
            # Keep only columns B through D (zero-based indices 1 to 3)
            df.append([item.v for item in row if 1 <= item.c <= 3])
df = pd.DataFrame(df[1:], columns=df[0])
item.c is the zero-based column index, so columns B through D correspond to indices 1 to 3.
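The column filter can be wrapped in a small helper so that the range is not hard-coded. This is a sketch only: the Cell namedtuple is a hypothetical stand-in for the cell objects pyxlsb yields (which expose r, c and v), and rows_to_frame is a name chosen for illustration.

```python
from collections import namedtuple

import pandas as pd

# Hypothetical stand-in for pyxlsb cells: row index, column index, value.
Cell = namedtuple("Cell", ["r", "c", "v"])


def rows_to_frame(rows, first_col, last_col):
    """Build a DataFrame from cells whose zero-based column index is in range."""
    data = [
        [cell.v for cell in row if first_col <= cell.c <= last_col]
        for row in rows
    ]
    # The first row supplies the column names.
    return pd.DataFrame(data[1:], columns=data[0])


sample_rows = [
    [Cell(0, 0, "id"), Cell(0, 1, "b"), Cell(0, 2, "c"), Cell(0, 3, "d"), Cell(0, 4, "e")],
    [Cell(1, 0, 1), Cell(1, 1, 10), Cell(1, 2, 20), Cell(1, 3, 30), Cell(1, 4, 40)],
]

# Columns B through D are zero-based indices 1 to 3.
df_bd = rows_to_frame(sample_rows, 1, 3)
print(df_bd)
```

With a real workbook, sheet.rows() would be passed in place of sample_rows. On pandas 1.0 or later, pd.read_excel('some.xlsb', engine='pyxlsb', usecols='B:D') should give a similar result in a single call.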

Excel Integration- How to append a dataframe in a specific range of cells with python Openpyxl

I'm trying to append a pandas dataframe to an xlsm file using Python's openpyxl module.
The problem is that I can write to the Excel file, but only in the first column, whereas I want to write into the blank cells of that sheet.
Could anyone point out where my syntax is incorrect?
Below is an image of the table and my Python code
import pandas as pd
from pandas import read_excel, read_csv
import openpyxl as px
from openpyxl import Workbook, load_workbook, cell
import numpy as np
from openpyxl.compat import range
from openpyxl.utils.dataframe import dataframe_to_rows

a = load_workbook(r"C:\Users\45050393\Documents\libro_vacio.xlsm", keep_vba = True)
df = pd.DataFrame({1 : [23, 34, 56, 78, 89, 12, 48]})
print(df)
ws = a.active
hoja_a_marcar = a.get_sheet_by_name("Sheet1")
startcol = ws["H2":"H8"]
hoja_a_marcar.cell(row = 3, column = 4).value
for r in dataframe_to_rows(df, index = False, header = False):
    for rows in startcol:
        ws.append(r)
print("HECHO")
a.save(filename= r"C:\Users\45050393\Documents\libro_vacio.xlsm")
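ws.append always adds a new row below the existing data and fills it from column A, which would explain why the values land in the first column. To target a specific range, each value can be written to an explicit row and column with ws.cell. The sketch below uses a new in-memory workbook instead of the .xlsm file, and the start position H2 is taken from the range in the question.

```python
import pandas as pd
from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows

# In-memory workbook used as a stand-in for the .xlsm file.
wb = Workbook()
ws = wb.active

df = pd.DataFrame({1: [23, 34, 56, 78, 89, 12, 48]})

# Top-left cell of the target range: H2 is row 2, column 8.
start_row, start_col = 2, 8

rows = dataframe_to_rows(df, index=False, header=False)
for r_offset, row in enumerate(rows):
    for c_offset, value in enumerate(row):
        # Write each value at an explicit position instead of appending.
        ws.cell(row=start_row + r_offset, column=start_col + c_offset, value=value)

print([c.value for c in ws["H"][1:8]])
```

For the real file, the workbook would be opened with load_workbook(..., keep_vba=True) as in the question and saved afterwards with save.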

Merge multiple excel sheet to one sheet

I have an xls file which contains multiple sheets, and I want to merge all of these sheets into a single sheet.
import numpy as np
import pandas as pd
import glob
import os
import xlrd
df = pd.concat(map(pd.read_excel, glob.glob(os.path.join('', "bank.xls"))))
I tried this and got a warning:
WARNING *** file size (25526815) not 512 + multiple of sector size (512)
and nothing happened.
I want to concatenate all of the sheets.

This works for me (just tested).
import pandas as pd

input_file = 'C:\\your_path\\Book1.xlsx'
output_file = 'C:\\your_path\\BookFinal.xlsx'
# sheet_name=None reads every sheet into a dict of DataFrames keyed by sheet name
df = pd.read_excel(input_file, sheet_name=None)
all_df = []
for key in df.keys():
    all_df.append(df[key])
data_concatenated = pd.concat(all_df, axis=0, ignore_index=True)
# The context manager saves and closes the file on exit
with pd.ExcelWriter(output_file) as writer:
    data_concatenated.to_excel(writer, sheet_name='merged', index=False)
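When the sheets are stacked, it can be useful to record which sheet each row came from. Passing the dict of DataFrames directly to pd.concat uses the sheet names as an extra index level, which can then be turned into a column. A minimal sketch, with a hand-built dict as a hypothetical stand-in for the result of reading all sheets (the sheet names and the amount column are invented):

```python
import pandas as pd

# Hypothetical stand-in for a dict of DataFrames keyed by sheet name.
sheets = {
    "Jan": pd.DataFrame({"amount": [1, 2]}),
    "Feb": pd.DataFrame({"amount": [3]}),
}

merged = (
    pd.concat(sheets, names=["sheet", "row"])  # dict keys become the outer index level
    .reset_index(level="sheet")                # move the sheet name into a column
    .reset_index(drop=True)                    # discard the per-sheet row numbers
)
print(merged)
```

The resulting frame can be written to a single sheet with to_excel in the same way as the concatenated data.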
