I need to write a pandas DataFrame (df_new, in my case) into an xlsb file that contains some formulas. I am stuck on the code below and do not know what to do next:
from pyxlsb import open_workbook
with open_workbook('NiSource SLA_xlsb.xlsb') as wb:
    with wb.get_sheet("SL Dump") as sheet:
        pass  # stuck here: how do I write df_new into this sheet?
Can anyone suggest how to write a DataFrame into an xlsb file?
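You could try reading the xlsb file as a DataFrame and then concatenating the two.
import pandas as pd
# Reading .xlsb needs the pyxlsb engine; formulas come back as their computed values
originaldf = pd.read_excel('./NiSource SLA_xlsb.xlsb', engine='pyxlsb')
twodflist = [originaldf, df_new]
existingdf = pd.concat(twodflist)
# reset_index returns a new DataFrame, so assign the result
existingdf = existingdf.reset_index(drop=True)
# pandas has no writer for .xlsb, so save the result as .xlsx instead
existingdf.to_excel(r'PATH\filename.xlsx')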
Change the path to wherever you want the output to go to and change filename to what you want the output to be named. Let me know if this works.
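When concatenating like this, note that `reset_index` returns a new DataFrame rather than changing the existing one, so its result has to be assigned. A minimal sketch, using two made-up frames as stand-ins for `originaldf` and `df_new` (the column names and values are purely illustrative):

```python
import pandas as pd

# Hypothetical stand-ins for the sheet contents and the new data.
originaldf = pd.DataFrame({"ticket": [1, 2], "status": ["open", "closed"]})
df_new = pd.DataFrame({"ticket": [3], "status": ["open"]})

combined = pd.concat([originaldf, df_new])
print(combined.index.tolist())  # index labels from both frames are kept as-is

# reset_index returns a new frame, so assign the result back.
combined = combined.reset_index(drop=True)
print(combined.index.tolist())
```

Passing `ignore_index=True` to `pd.concat` should give the same renumbered index in a single step.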
I'm currently working on a script that converts a JSON file to CSV. The script works, but I need to modify it so that the converted CSV has a proper layout, with rows and columns. What do I need to add or change in my script?
import pandas as pd
df = pd.read_json(r'/home/admin/myfile.json')
df.to_csv(r'/home/admin/xml/myfileSample.csv', index=None, sep=":")
Taking your code as a reference, you can try:
df.to_csv(r'/home/admin/xml/myfileSample.csv', encoding='utf-8', header=True, index=None, sep=":")
This could be useful.
import pandas as pd
df_json=pd.read_json("input_file.json")
df_json.head()
df_json.to_csv("output_file.csv",index=False)
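If the JSON contains nested objects, `read_json` may leave whole dictionaries inside single cells, which is one common reason the resulting CSV does not look like proper rows and columns. A sketch of one way around that, assuming a hypothetical list of records with a nested `address` field (the data and field names are invented for illustration) and using `pandas.json_normalize` to flatten them:

```python
import io
import json

import pandas as pd

# Hypothetical input; a real file will have different fields.
raw = (
    '[{"id": 1, "name": "Ann", "address": {"city": "Oslo", "zip": "0150"}},'
    ' {"id": 2, "name": "Bob", "address": {"city": "Rome", "zip": "00100"}}]'
)

records = json.loads(raw)

# Nested keys become dotted column names such as "address.city".
flat = pd.json_normalize(records)

# Write to an in-memory buffer here; pass a file path instead for a real file.
buffer = io.StringIO()
flat.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Whether this applies depends on how the JSON file is actually structured, which the question does not show.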
Your code is fine. Just change to_csv to to_excel, give the output file an Excel extension, and it should work:
import pandas as pd
df = pd.read_json(r'/home/admin/myfile.json')
# to_excel has no sep argument and needs an Excel extension such as .xlsx
df.to_excel(r'/home/admin/xml/myfileSample.xlsx', index=False)
Learn more about the to_excel function of pandas here:
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_excel.html
I have been able to generate several CSV files through an API. Now I am trying to combine all the CSVs into a single master file so that I can work on it, but it does not work. The code below is what I have attempted. What am I doing wrong?
import glob
import pandas as pd
from pandas import read_csv
master_df = pd.DataFrame()
for file in files:
    df = read_csv(file)
    master_df = pd.concat([master_df, df])
    del df
master_df.to_csv("./master_df.csv", index=False)
Although it is hard to tell what the precise problem is without more information (e.g., the error message and your pandas version), I believe it is that in the first iteration, master_df and df do not have the same columns: master_df is an empty DataFrame, whereas df has whatever columns are in your CSV. If this is indeed the problem, then I'd suggest storing all your DataFrames (each of which represents one CSV file) in a single list and then concatenating them in one call. Like so:
import pandas as pd
df_list = [pd.read_csv(file) for file in files]
pd.concat(df_list, sort=False).to_csv("./master_df.csv", index=False)
Don't have time to find/generate a set of CSV files and test this right now, but am fairly sure this should do the job (assuming pandas version 0.23 or compatible).
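Neither snippet shows where `files` comes from, so here is a self-contained sketch of the whole flow. It assumes the CSVs live in one folder and share the same columns; the folder, file names, and data are made up for the example, and `glob.glob` is used to collect the paths:

```python
import glob
import os
import tempfile

import pandas as pd

# Create two small CSV files in a temporary folder (hypothetical data).
folder = tempfile.mkdtemp()
pd.DataFrame({"id": [1, 2], "value": [10, 20]}).to_csv(
    os.path.join(folder, "part1.csv"), index=False
)
pd.DataFrame({"id": [3], "value": [30]}).to_csv(
    os.path.join(folder, "part2.csv"), index=False
)

# Collect the paths; sorted() makes the row order reproducible.
files = sorted(glob.glob(os.path.join(folder, "*.csv")))

# Read every file, then concatenate once.
df_list = [pd.read_csv(file) for file in files]
master_df = pd.concat(df_list, ignore_index=True, sort=False)

# Note: writing the master file into the same folder means a later
# glob over "*.csv" would pick it up as well.
master_df.to_csv(os.path.join(folder, "master_df.csv"), index=False)
print(master_df)
```

Concatenating once at the end also avoids copying the growing frame on every loop iteration.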
I have a series of CSV files in a specific folder on my computer. I need to write Python code that picks up those CSV files and saves them as XLSX in another designated folder on my drive. In each file, columns L, M, and N are formatted as Date, and columns AA and AF are formatted as Number. The other columns can be stored as Text or General.
Here is the code I got stuck on:
from openpyxl import Workbook
import csv
wb = Workbook()
ws = wb.active
with open('test.csv', 'r') as f:
    for row in csv.reader(f):
        ws.append(row)
wb.save('name.xlsx')
Using pandas this task should be quite simple.
import pandas as pd
df = pd.read_csv('test.csv')
df.to_excel('test.xlsx')
You can do that for any number of files by changing the strings to the appropriate filenames.
Edit
I am not sure whether you can save each column with the desired type directly. You may be able to change that using another package, or even pandas itself. In pandas you can call pd.to_datetime or pd.to_numeric on a Series to change its type. You can also specify dtype when importing. Hope that helps!
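A short sketch of those conversions, assuming a hypothetical CSV with an `order_date` column and an `amount` column (both names and the data are invented here, standing in for the real date and number columns):

```python
import io

import pandas as pd

# Hypothetical CSV: one date-like column, one numeric-like column, one text column.
csv_text = "order_date,amount,note\n2021-01-05,12.5,first\n2021-02-10,7,second\n"

# Read everything as text first, then convert only the selected columns.
df = pd.read_csv(io.StringIO(csv_text), dtype=str)
df["order_date"] = pd.to_datetime(df["order_date"])
df["amount"] = pd.to_numeric(df["amount"])

print(df.dtypes)
```

As far as I know, columns converted this way are written by `to_excel` as date and number cells rather than text, but it is worth opening the resulting file to confirm the formatting.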
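The solution should be something like this:
import os
import pandas as pd
dpath = 'path/to/folder'
frames = []  # one DataFrame per CSV file
for filename in os.listdir(dpath):
    if not filename.endswith('.csv'):
        continue  # skip anything that is not a CSV file
    df = pd.read_csv(os.path.join(dpath, filename))
    df = df.loc[:, 'a':'b']  # select required columns based on your requirement
    df["a"] = pd.to_numeric(df["a"])  # convert the column's datatype based on your need
    frames.append(df)
# Combine everything once and write a single workbook
df1 = pd.concat(frames, ignore_index=True)
df1.to_excel('test.xlsx')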
I have some data in an Excel file that I would like to analyze using Python. I started by creating a CSV file using this guide.
Thus I have created a CSV (Comma delimited) file filled with the following data:
I wrote a few lines of code in Python using Spyder:
import pandas
colnames = ['GDP', 'Unemployment', 'CPI', 'HousePricing']
data = pandas.read_csv('Dane_2.csv', names = colnames)
GDP = data.GDP.tolist()
print(GDP)
The output is not what I expected:
It can be easily seen that the output differs a lot from the figures in GDP column. I will appreciate any tips or hints which will help to deal with my problem.
It seems the GDP column contains the decimal values from the first column in the .csv file together with the first digits of the second column. Either there is something wrong with the .csv you created or, more probably, you need to specify the separator in the pandas.read_csv call. You can also pass header=None to state explicitly that the first line of the file is data rather than column names. pandas already assumes this when names is given, so if the file does start with its own header row, use header=0 instead so that colnames replaces it.
Try this:
import pandas
colnames = ['GDP', 'Unemployment', 'CPI', 'HousePricing']
data = pandas.read_csv('Dane_2.csv', names = colnames, header=None, sep=';')
GDP = data.GDP.tolist()
print(GDP)
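The symptom described (decimals of one column joined with the leading digits of the next) suggests the file may also use a comma as the decimal mark, which is common when Excel exports with a semicolon separator. That is an assumption, since the file itself is not shown. A sketch with made-up values, adding `decimal=','` to the call:

```python
import io

import pandas

# Hypothetical file contents in the semicolon / decimal-comma style.
csv_text = "1,5;4,25;0,5;100,0\n2,5;3,75;0,25;101,5\n"
colnames = ['GDP', 'Unemployment', 'CPI', 'HousePricing']

data = pandas.read_csv(
    io.StringIO(csv_text),
    names=colnames,
    header=None,
    sep=';',       # columns are separated by semicolons
    decimal=',',   # numbers use a comma as the decimal mark
)
GDP = data.GDP.tolist()
print(GDP)
```

If the real file uses a period as the decimal mark, `sep=';'` alone is enough and `decimal` can be left at its default.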
I have a large DataFrame in a CSV file, sample1, and from it I have to generate a new CSV file containing only the first 100 rows. I have written code for it, but I am getting a KeyError: the label [100] is not in the index.
This is what I have tried. Any help would be appreciated.
import pandas as pd
data_frame = pd.read_csv("C:/users/raju/sample1.csv")
data_frame1 = data_frame[:100]
data_frame.to_csv("C:/users/raju/sample.csv")
The correct syntax is with iloc:
data_frame.iloc[:100]
A more efficient way to do it is to use the nrows argument, whose purpose is exactly to read only a portion of a file. This way you avoid wasting time and resources parsing rows you do not need:
import pandas as pd
data_frame = pd.read_csv("C:/users/raju/sample1.csv", nrows=100)  # nrows counts data rows; the header line is not included
data_frame.to_csv("C:/users/raju/sample.csv")
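To check how `nrows` counts, here is a small sketch on an in-memory CSV (the column names and the 250 generated rows are made up). It reads 100 rows with `nrows` and compares them with `iloc[:100]` on the fully loaded frame:

```python
import io

import pandas as pd

# Hypothetical CSV with one header line and 250 data rows.
csv_text = "id,value\n" + "".join(f"{i},{i * 2}\n" for i in range(250))

# nrows counts data rows only; the header line is read in addition.
first_100 = pd.read_csv(io.StringIO(csv_text), nrows=100)
print(len(first_100))

# Same rows taken from the fully loaded frame by position.
full = pd.read_csv(io.StringIO(csv_text))
same = full.iloc[:100]
print(first_100.equals(same))
```

Passing `index=False` to `to_csv` keeps the row index out of the output file if it is not wanted there.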