I have a series of CSV files in a specific folder on my computer. I need to write Python code to pick up those CSV files and save them into another designated folder on my drive as XLSX. In each file, columns L, M, and N should be formatted as dates, and columns AA and AF as numbers. The other columns can be stored as text or General.
Here is some code I got stuck at:
from openpyxl import Workbook
import csv

wb = Workbook()
ws = wb.active

with open('test.csv', 'r') as f:
    for row in csv.reader(f):
        ws.append(row)

wb.save('name.xlsx')
Using pandas this task should be quite simple.
import pandas as pd
df = pd.read_csv('test.csv')
df.to_excel('test.xlsx')
You can do that for any amount of files by changing the strings to the appropriate filenames.
Edit
I am not sure if you can save the columns with the desired types directly. You may be able to change that using another package, or even pandas itself. In pandas you can call pd.to_datetime or pd.to_numeric on a Series to change its type, or specify dtype when importing. Hope that helps!
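For instance, a hedged sketch of that conversion (the column headers here are invented placeholders, not the real ones from the files):

```python
import pandas as pd
from io import StringIO

# Toy CSV standing in for one of the real files (headers are made up)
csv_text = "start_date,amount,note\n2021-05-22,3.5,hello\n2021-05-23,4,world\n"

df = pd.read_csv(StringIO(csv_text), dtype=str)      # everything starts as text
df["start_date"] = pd.to_datetime(df["start_date"])  # now a real datetime column
df["amount"] = pd.to_numeric(df["amount"])           # now a real numeric column

# df.to_excel("typed.xlsx", index=False) would then write date/number cells, not text
```

The same pattern should apply to columns L, M, N (dates) and AA, AF (numbers) once you know their actual header names, or you can address them positionally through df.columns.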
the solution should be something like this
import os
import pandas as pd

dpath = 'path/to/folder'
frames = []
for filename in os.listdir(dpath):
    df = pd.read_csv(os.path.join(dpath, filename))
    df = df[['a', 'b']]  # select required columns based on your requirement
    df['a'] = pd.to_numeric(df['a'])  # convert the column's datatype based on your need
    frames.append(df)

df1 = pd.concat(frames)
df1.to_excel('test.xlsx')
Related
I am trying to collect multiple CSV files into one Excel workbook, keeping the name of each CSV file as its sheet name, but the loop does not save a sheet for each step and I only get the last sheet.
for i in range(0, len(dir)):
    for filee in os.listdir(dir):
        if filee.endswith(".csv"):
            file_path = os.path.join(dir, filee)
            df = pd.read_csv(file_path, on_bad_lines='skip')
            df.to_excel("output.xlsx", sheet_name=filee, index=False)
    i = i + 1
I have tried ExcelWriter but the file got an error. Could anyone help fix this problem?
Regards
As posted, this code raises a SyntaxError (really an IndentationError) because the first for loop has no properly indented body. Assuming the indentation was simply lost in the post, the real problem is in the loop body: for each .csv file, the loop reads it into a pandas.DataFrame and writes it to output.xlsx. You overwrite that file on every iteration, which is why you only see the last sheet.
Please have a look at this link: Add worksheet to existing Excel file with pandas
Usually the problem is the type of the sheet name. For example, in df.to_excel("Output.xlsx", sheet_name='1'), if I don't put the 1 in quotation marks I get an error. It must always be of str type.
For example, I have the following csv files in my Google Colab files:
With the following code, I first put all of them in df and then transfer them to the Excel file (in separate sheets).
import pandas as pd

df = {}
for i in range(1, 5):
    df[i] = pd.read_csv('sample_data/file' + str(i) + '.csv')

with pd.ExcelWriter('output.xlsx') as writer:
    for i in range(1, 5):
        df[i].to_excel(writer, sheet_name=str(i))
It works fine for me and I don't get any errors.
You can use a dict comprehension to store all the dataframes and file names from each CSV, then pass it to a function. Unpack the dict with a list comprehension and write each dataframe to its own sheet.
from pathlib import Path
import pandas as pd

path = "/path/to/csv/files"

def write_sheets(file_map: dict) -> None:
    with pd.ExcelWriter(f"{path}/output.xlsx", engine="xlsxwriter") as writer:
        [df.to_excel(writer, sheet_name=sheet_name, index=False) for sheet_name, df in file_map.items()]

file_mapping = {Path(file).stem: pd.read_csv(file) for file in Path(path).glob("*.csv")}
write_sheets(file_mapping)
Python 3.8.5 Pandas 1.1.3
I'm using the following to loop through json files and create csv files:
import glob
import json
import pandas as pd

def stuff():
    results_list = []
    for filepath in glob.iglob('/Users/me/data/*.json'):
        filename = str(filepath)
        with open(filepath, 'r') as file:
            data = json.load(file)
        df = pd.json_normalize(data, 'main')
        df.to_csv(filename + '.csv')
        results_list.append(data)
    return results_list
The format of the resulting csv files fits my requirements exactly without having to pass any additional params to the to_csv method - when viewing the csv file in Excel, row 1 is the keys as the headers, and column 1 is the index numbers. Exactly what I need. Cell A1 is blank.
One final step that I need to accomplish is to write the filename variable value to the csv file. Ideally I'd like to put it in cell A1, if possible. Can I accomplish this solely with to_csv or am I going to need to get into csv.writer world?
You can exploit the index name for that purpose:
df.rename_axis('somename').to_csv()
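For example (the filename value here is hypothetical):

```python
import pandas as pd

# Toy frame standing in for one of the normalized JSON frames
df = pd.DataFrame({'a': [1, 2]})
filename = '/Users/me/data/file1.json'  # hypothetical filename value

# Naming the index puts the filename into cell A1 of the CSV
csv_text = df.rename_axis(filename).to_csv()
print(csv_text.splitlines()[0])  # → /Users/me/data/file1.json,a
```

The rest of the layout is unchanged: row 1 keeps the column headers and column 1 keeps the index numbers, so only the previously blank A1 cell is filled.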
I need to capture the date from multiple CSV filenames and add that date to each file as a new column using Python. I have code that works well with Excel files, and I am trying to do exactly the same with CSV files. If someone could help me that would be much appreciated.
Filenames are as following...
Scan_05-22-2021.csv
Scan_05-23-2021.csv
Scan_05-24-2021.csv and so on..
Excel code that works..
import openpyexcel
import os
import pandas as pd
import glob
import csv
from openpyexcel import load_workbook

path_to_xls = os.getcwd()  # or r'<path>'
for xls in os.listdir('C:\Python'):
    if xls.endswith(".csv") or xls.endswith(".xlsx"):
        f = load_workbook(filename=xls)
        sheet = f.active
        # Change here the name of the new column
        sheet.cell(row=1, column=25).value = "DateTest"
        for i in range(sheet.max_row - 1):
            # takes the date from the filename and dumps/overwrites it in column 25
            sheet.cell(row=i + 2, column=25).value = xls.split('_')[1][:-5]
        f.save(xls)
        f.close()
You should be able to do this with pandas
use pd.read_csv to load the files as DataFrames
you can use the iterrows method to go over rows
and simply append to the new file.
this cheatsheet could be of use
Good luck!
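To make that concrete, here is a possible pandas sketch under the question's filename scheme (it writes a sample Scan_*.csv first so it is self-contained; swap in your real folder and skip the sample step):

```python
import glob
import os
import pandas as pd

# Create a sample file matching the question's naming scheme (demo only)
with open('Scan_05-22-2021.csv', 'w') as f:
    f.write('a,b\n1,2\n')

for path in glob.glob('Scan_*.csv'):
    date_str = os.path.basename(path).split('_')[1][:-4]  # strip '.csv' -> '05-22-2021'
    df = pd.read_csv(path)
    df['DateTest'] = date_str      # add the date captured from the filename
    df.to_csv(path, index=False)   # write back (or into another folder)
```

This mirrors what the openpyexcel version does (a "DateTest" column filled with the filename's date), without going cell by cell.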
For the past few days I've been trying to do a relatively simple task but I'd always encounter some errors so I'd really appreciate some help on this. Here goes:
I have an Excel file which contains a specific column (Column F) that has a list of IDs.
What I want to do is for the program to read this excel file and allow the user to input any of the IDs they would like.
When the user types in one of the IDs, I want the program to return all of the IDs that contain the text the user entered, and then export those IDs to a new, separate Excel file where they are displayed in one column, one ID per row.
Here's my code so far; I've tried using arrays and stuff but nothing seems to be working for me :/
import pandas as pd
import numpy as np
import re
import xlrd
import os.path
import xlsxwriter
import openpyxl as xl
from pandas import ExcelWriter
from openpyxl import load_workbook
# LOAD EXCEL TO DATAFRAME
xls = pd.ExcelFile('N:/TEST/TEST UTILIZATION/IA 2020/Dev/SCS-FT-IE-Report.xlsm')
df = pd.read_excel(xls, 'FT')
# GET USER INPUT (USE AD1852 AS EXAMPLE)
value = input("Enter a Part ID:\n")
print(f'You entered {value}\n\n')
i = 0
x = df.loc[i, "MFG Device"]
df2 = np.array(['', 'MFG Device', 'Loadboard Group', 'Socket Group', 'ChangeKit Group'])

for i in range(17367):
    # x = df.loc[i, "MFG Device"]
    if value in x:
        df = np.array[x]
        df2.append(df)
    i += 1

print(df2)

# create excel writer object
writer = pd.ExcelWriter('N:/TEST/TEST UTILIZATION/IA 2020/Dev/output.xlsx')
# write dataframe to excel
df2.to_excel(writer)
# save the excel
writer.save()
print('DataFrame is written successfully to Excel File.')
Any help would be appreciated, thanks in advance! :)
It looks like you're doing much more than you need to do. Rather than monkeying around with xlsxwriter, pandas.DataFrame.to_excel is your friend.
Just do
df2.to_excel("output.xlsx")
You don't need xlsxwriter; simply df.to_excel() would work. In your code df2 is a numpy array. First convert it into a pandas DataFrame with the required shape (index and columns) before writing it to Excel.
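As a minimal sketch of that suggestion, with a toy frame standing in for the real 'FT' sheet and pandas doing the substring match instead of the manual loop:

```python
import pandas as pd

# Toy data in place of the real 'MFG Device' column (values are invented)
df = pd.DataFrame({'MFG Device': ['AD1852', 'AD1852WBSTZ', 'OP27']})
value = 'AD1852'  # the user's input

# Keep every ID containing the input; the result is a DataFrame, not a numpy array
matches = df[df['MFG Device'].str.contains(value, na=False)]
print(matches['MFG Device'].tolist())  # → ['AD1852', 'AD1852WBSTZ']
```

matches.to_excel('output.xlsx', index=False) then writes the result directly, with no ExcelWriter or xlsxwriter needed.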
I want to delete specific rows (rows 0 to 33) from every single .csv file in my directory, but there are 224 separate .csv files that need this done. I would be happy if you could show me how to do this with one piece of code.
I think you can use glob and pandas to do this quite easily. I'm not sure if you want to write over your original files, which is something I never recommend, so be careful: this code will do that.
import os
import glob
import pandas as pd

os.chdir(r'yourdir')
allFiles = glob.glob("*.csv")  # match your csvs
for file in allFiles:
    df = pd.read_csv(file)
    df = df.iloc[33:]  # keep rows from row 34 onwards
    df.to_csv(file, index=False)
    print(f"{file} has removed rows 0-33")
or something along those lines..
This is a simple combination of two separate tasks.
First, you need to loop through all the csv files in a folder. See this StackOverflow answer for how to do that.
Next, within that loop, for each file, you need to modify the csv by removing rows. See this answer for how to read a csv, write a csv, and omit certain rows based on a condition.
One final aspect is that you want to omit certain line numbers. A good way to do this is with the enumerate function.
So code such as this will give you the line numbers.
import csv

with open('first.csv', 'r', newline='') as infile, open('first_edit.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    for i, row in enumerate(csv.reader(infile)):
        if i > 33:  # skip rows 0-33
            writer.writerow(row)
Iterate over CSV files and use Pandas to remove the top 34 rows of each file then save it to an output directory.
Try this code after installing pandas:
from pathlib import Path
import pandas as pd

source_dir = Path('path/to/source/directory')
output_dir = Path('path/to/output/directory')

for file in source_dir.glob('*.csv'):
    df = pd.read_csv(file)
    df.drop(df.head(34).index, inplace=True)
    df.to_csv(output_dir.joinpath(file.name), index=False)