Suppose I have a directory containing many .csv files, and I have a Python script that reads a single CSV file, runs an algorithm on it, and stores the output in another CSV file. I now need to update the script so that it scans the directory and stores the output for every CSV file inside it in a separate CSV file.
import pandas as pd
import statistics as st
import csv

data = pd.read_csv('1mb.csv')
x_or = list(range(len(data['Main Avg Power (mW)'])))
y_or = list(data['Main Avg Power (mW)'])
time = list(data['Time (s)'])
rt = 5000
i = time[rt]
k = i
tlist = []
for i in time:
    tlist.append(y_or[rt])
    rt += 1
    if i - k > 4:
        break
idp = st.mean(tlist)
sidp = st.stdev(tlist)
newlist = []
imax = max(tlist)
imin = min(min(tlist), idp - sidp)
while imax >= y_or[rt] >= imin - 1:
    newlist.append(y_or[rt])
    rt += 1
print(rt, "Mean idle power:", st.mean(newlist), "mW")
midp = st.mean(newlist)
with open('new_1pp.csv', 'w', newline='') as f:
    thewriter = csv.writer(f)
    thewriter.writerow(['Idle Power(mW)'])
    thewriter.writerow([midp])
This is the code I have written. Please update it as required by the problem.
In your code you can use glob to list all the CSV files in a directory, then read them in one at a time, pass each one through whatever algorithm you have, and write the results out again, e.g.:
import glob
import os

# set the name of the directory you want to list the files in
csvdir = 'my_directory'

# get a list of all CSV files, assuming they have a '.csv' suffix
csvfiles = glob.glob(os.path.join(csvdir, '*.csv'))

# loop over all the files and run your algorithm
for csvfile in csvfiles:
    # read the csvfile using your current code
    # apply your algorithm
    # output a new file (e.g. with the same name as before, but with '_new' added)
    newfile = os.path.splitext(csvfile)[0] + '_new.csv'
    # save to 'newfile' using your current code
Does that help?
Update:
From the comments and the updated question, does the following code help?
import pandas as pd
import statistics as st
import csv
import glob
import os

# get list of CSV files from current directory
csvfiles = glob.glob('*.csv')

for csvfile in csvfiles:
    data = pd.read_csv(csvfile)
    x_or = list(range(len(data['Main Avg Power (mW)'])))
    y_or = list(data['Main Avg Power (mW)'])
    time = list(data['Time (s)'])
    rt = 5000
    i = time[rt]
    k = i
    tlist = []
    for i in time:
        tlist.append(y_or[rt])
        rt += 1
        if i - k > 4:
            break
    idp = st.mean(tlist)
    sidp = st.stdev(tlist)
    newlist = []
    imax = max(tlist)
    imin = min(min(tlist), idp - sidp)
    while imax >= y_or[rt] >= imin - 1:
        newlist.append(y_or[rt])
        rt += 1
    print(rt, "Mean idle power:", st.mean(newlist), "mW")
    midp = st.mean(newlist)
    # create new file name using the old one and adding '_new'
    newfile = os.path.splitext(csvfile)[0] + '_new.csv'
    with open(newfile, 'w', newline='') as f:
        thewriter = csv.writer(f)
        thewriter.writerow(['Idle Power(mW)'])
        thewriter.writerow([midp])
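One caveat: the output files written above also end in .csv, so re-running the script would pick them up as inputs. A minimal guard (a sketch, assuming the '_new' suffix used above) is to filter them out when building the file list:

# exclude output files from a previous run, which also match '*.csv'
csvfiles = [f for f in glob.glob('*.csv') if not f.endswith('_new.csv')]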
Related
Still quite new to this and am struggling.
I have a directory of a few hundred text files; each file has thousands of lines of information in it.
Some lines contain one number, some contain many.
example:
39 312.000000 168.871795
100.835446
101.800298
102.414406
104.491999
108.855079
107.384008
103.608815
I need to pull all of the information from each text file. I want the name of the text file (minus the '.txt') to be in the first column, with all the other information following it to complete the row (regardless of its layout within the file).
import pandas as pd
import os

data = '/path/to/data/'
path = '/other/directory/path/'
lst = ['list of files needed']

for dirpath, dirs, subj in os.walk(data):
    while i <= 5:  # currently being used to break before iterating through entire directory, to check it's working
        with open(dirpath + lst[i], 'r') as file:
            info = file.read().replace('\n', '')  # txt file onto one line
            corpus.append(lst[i] + ' ')  # begin list with txt file name
            corpus.append(info)  # add file contents to list after file name
            output = ''.join(corpus)  # get out of list format
            output.split()
            i += 1
            df = pd.read_table(output, lineterminator=',')
            df.to_csv(path + 'testing.csv')
        if i > 5:
            break
Currently, this prints Errno 2 (no such file or directory), then goes on to print the contents of the first file and no others, but does not save anything to CSV.
This also seems horribly convoluted and I'm sure there's another way of doing it.
I also suspect the lineterminator will not force each new text file onto a new row, so any suggestions there would be appreciated.
desired output:
file1 39 312.000 168.871
file2 72 317.212 173.526
You are already loading os and pandas, so you can take advantage of their functionality (listdir, path, DataFrame, concat, and to_csv) and drastically reduce your code's complexity.
import os
import pandas as pd

data = 'data/'
path = 'output/'
files = os.listdir(data)
output = pd.DataFrame()

for file in files:
    file_name = os.path.splitext(file)[0]
    with open(os.path.join(data, file)) as f:
        info = [float(x) for x in f.read().split()]
    # print(info)
    df = pd.DataFrame(info, columns=[file_name], index=range(len(info)))
    output = pd.concat([output, df], axis=1)

output = output.T
print(output)
output.to_csv(path + 'testing.csv', index=False)
I would double-check that your data folder only has txt files, and maybe add a check for txt files to the code, as sketched below.
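A one-line filter would do (a sketch, reusing the data variable from the code above):

# keep only files with a '.txt' extension, ignoring anything else in the folder
files = [f for f in os.listdir(data) if f.endswith('.txt')]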
This got less elegant as I learned about the requirements. If you want to flip the columns and rows, just take out the output = output.T line; .T transposes the dataframe.
I need help with my Python code.
The goal is:
read in between 100 and 200 CSV files that are in a folder
copy a variable from position (2,2) in each CSV file
create the sum of all values of column 17 in every CSV
transfer the values into a dataframe
create a new Excel file
transfer the dataframe into the Excel file
My attempt was the following code:
# import necessary libraries
import pandas as pd
import os
import glob

# use glob to get all the csv files
# in the folder
path = os.getcwd()
csv_files = glob.glob(os.path.join(path, "*.csv"))

# loop over the list of csv files
for f in csv_files:
    # read the csv file
    df = pd.read_csv(f, sep=';', skiprows=2, usecols=[2, 16], header=None)
    # ID
    ID = (df.loc[2][2])
    # sum of col. 16
    dat_Verbr = df[16].sum()
    # data in single dataframe
    df4 = pd.DataFrame({'SIM-Karte': ID, 'Datenverbrauch': dat_Verbr}, index=[0, 1, 2, 3, 4, 5])

# Specify the name of the excel file
file_name = 'Auswertung.xlsx'
# saving the excelsheet
concatenated.to_excel(file_name)
print(' record successfully exported into Excel File')
Unfortunately, it doesn't work. The problem is that only the first ID and first sum are imported into the Excel file.
How can I work with the index to build a single dataframe? I don't know the exact number of CSV files, only that it's somewhere between 100 and 200.
I'm a beginner with Python. Can someone help me please?
You can use the updated code below. One assumption I made is that there is data in all rows 1 through 16. If your file has just ;;;;... in the first row, read_csv sometimes makes a mistake. Also, since you are using skiprows=1, it will not add the value in row 1, column 17 if one is present; you will need to change the code if that needs to be included. The rest I have corrected/changed so the code works. Note that in to_excel I have used index=False, as I didn't think you need the index to be added; remove it if you want to see the index as well.
# use glob to get all the csv files
# in the folder
import os, glob
import pandas as pd

path = os.getcwd()
csv_files = glob.glob(os.path.join(path, "*.csv"))

# data in single dataframe
df4 = pd.DataFrame(columns=['SIM-Karte', 'Datenverbrauch'])

# loop over the list of csv files
for f in csv_files:
    # read the csv file
    df = pd.read_csv(f, sep=';', skiprows=1, usecols=[1, 16], header=None)
    # ID
    ID = df.iloc[0][1]
    # sum of col. 16
    dat_Verbr = df[16].sum()
    # append one row per file
    df4.loc[len(df4.index)] = [ID, dat_Verbr]

# Specify the name of the excel file
file_name = 'Auswertung.xlsx'
# saving the excelsheet
df4.to_excel(file_name, index=False)
print('records successfully exported into Excel file')
Output Excel file (I had 3 files in the folder):
I have many CSV files in one subfolder, say data. Each of these .csv files contains a date column.
430001.csv, 43001(1).csv,43001(2).csv,..........,43001(110).csv etc.
I want to rename all the files in the folder according to the date in a column of each CSV file.
Desired output:
430001-1980.csv, 43001-1981.csv,43001-1985.csv,..........,43001-2010.csv etc.
I tried to follow the steps advised in:
Renaming multiple csv files
Still could not get the desired output.
Any help would be highly appreciated.
Thanks!
You can loop through them, extract the date to create a new filename, and then save it.
# packages to import
import os
import pandas as pd
import glob
import sys

data_p = "Directory with your data"
output_p = "Directory where you want to save your output"

retval = os.getcwd()
print(retval)  # see in which folder you are
os.chdir(data_p)  # move to the folder with your data
os.getcwd()

filenames = sorted(glob.glob('*.csv'))
fnames = list(filenames)  # get the names of all your files
# print(fnames)

for f in range(len(fnames)):
    print(f'fname: {fnames[f]}\n')
    pfile = pd.read_csv(fnames[f], delimiter=",")  # read in file

    # extract filename
    filename = fnames[f]
    parts = filename.split(".")  # giving you the number in the file name and 'csv'
    only_id = parts[0].split("(")  # in case there is a bracket included

    # get date from your file
    filedate = pfile["date"][0]  # assuming this is on the first row
    filedate = str(filedate)

    # get new filename
    newfilename = only_id[0] + "-" + filedate + "." + parts[1]

    # save your file (don't put a slash at the end of your directories on top)
    pfile.to_csv(output_p + "/" + newfilename, index=False, header=True)
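If you don't need to rewrite the file contents, a lighter-weight alternative (a sketch under the same assumptions: a 'date' column with the date on the first row, and names like 43001(1).csv) is to rename each file in place with os.rename:

import glob
import os
import pandas as pd

for filename in sorted(glob.glob('*.csv')):
    only_id = filename.split(".")[0].split("(")[0]    # drop '.csv' and any '(n)' part
    filedate = str(pd.read_csv(filename)["date"][0])  # date taken from the first row
    os.rename(filename, only_id + "-" + filedate + ".csv")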
Hello!
I would like to combine many CSV files (the total number will oscillate around 120-150) horizontally into one CSV file by taking one column from each file (in this case the column called "grid"). All of the files have the same columns and number of rows (they are constructed the same) and are stored in the same directory. I've tried the csv module and pandas. I don't want to list all 120 files by hand; I need a script that does it automatically. I'm stuck and I have no ideas...
Some input CSV files (data) and CSV file (merged) which I would like to get:
https://www.dropbox.com/transfer/AAAAAHClI5b6TPzcmW2dmuUBaX9zoSKYD1ZrFV87cFQIn3PARD9oiXQ
This is how my code looks when I use the csv module:
import os
import glob
import csv

os.chdir('\csv_files_direction')
extension = 'csv'
files = [i for i in glob.glob('*.{}'.format(extension))]
out_merg = ('\merged_csv_file_direction')

with open(out_merg, 'wt') as out:
    writer = csv.writer(out)
    for file in files:
        with open(file) as csvfile:
            data = csv.reader(csvfile, delimiter=';')
            result = []
            for row in data:
                a = row[3]  # column which I need
                result.append(a)
Using this code I receive values only from the last CSV; the rest are missing. As a result I would like to have one specific column from each CSV file in the directory.
And with pandas:
import os
import glob
import pandas as pd
import csv
os.chdir('\csv_files_direction')
extension = 'csv'
files = [i for i in glob.glob('*.{}'.format(extension))]
out_merg = ('\merged_csv_file_direction')
in_names = [pd.read_csv(f, delimiter=';', usecols = ['grid']) for f in files]
Using pandas I receive the data from all the CSVs as a list of dataframes, which can be indexed, e.g. in_names[1].
I confess that this is my first try with pandas and I have no idea what my next step should be.
I will really appreciate any help!
Thanks in advance,
Mateusz
For the csv part, I think you need another list defined OUTSIDE the loop.
Something like:
import os
import sys
dirname = os.path.dirname(os.path.realpath('__file__'))
import glob
import csv

extension = 'csv'
files = [i for i in glob.glob('*.{}'.format(extension))]
out_merg = ('merged_csv_file_direction')

result = []
with open(out_merg, 'wt', newline='') as out:
    writer = csv.writer(out)
    for file in files:
        with open(file) as csvfile:
            data = csv.reader(csvfile, delimiter=';')
            col = []
            for row in data:
                a = row[3]  # column which I need
                col.append(a)
            result.append(col)
    # write the collected columns side by side, one row at a time
    writer.writerows(zip(*result))
NOTE: I have also changed the way the script locates the folder. Now you can run the file directly in the folder that contains the two folders (one to take the data from and the other to save the data to).
Regarding the pandas part, you can create a loop again. This time you need to concat the dataframes that you have created with in_names = [pd.read_csv(f, delimiter=';', usecols=['grid']) for f in files].
I think you can use:
import os
import glob
import pandas as pd
import csv

os.chdir('\csv_files_direction')
extension = 'csv'
files = [i for i in glob.glob('*.{}'.format(extension))]
out_merg = ('\merged_csv_file_direction')

in_names = [pd.read_csv(f, delimiter=';', usecols=['grid']) for f in files]
# concatenate column-wise (axis=1) to place the 'grid' columns side by side
result = pd.concat(in_names, axis=1)
result.to_csv(out_merg, index=False)
Tell me if it works
So far, my code to read from text files and export to Excel is:
import glob

data = {}
for infile in glob.glob("*.txt"):
    with open(infile) as inf:
        data[infile] = [l[:-1] for l in inf]

with open("summary.xls", "w") as outf:
    outf.write("\t".join(data.keys()) + "\n")
    for sublst in zip(*data.values()):
        outf.write("\t".join(sublst) + "\n")
The goal with this was to reach all of the text files in a specific folder.
However, when I run it, Excel gives me an error saying:
"File cannot be opened because: Invalid at the top level of the document. Line 1, Position 1. outputgooderr.txt outputbaderr.txt fixed_inv.txt"
Note: outputgooderr.txt, outputbaderr.txt, and fixed_inv.txt are the names of the text files I wish to export to Excel, one file per sheet.
When I only have one file for the program to read, it is able to extract the data. Unfortunately, this is not what I would like since I have multiple files.
Please let me know of any ways I can combat this. I am very much so a beginner in programming in general and would appreciate any advice! Thank you.
If you're not opposed to having the output Excel file be a .xlsx rather than a .xls, I'd recommend making use of some of the features of pandas, in particular pandas.read_csv() and DataFrame.to_excel().
I've provided a fully reproducible example of how you might go about doing this. Please note that I create 2 .txt files in the first 3 lines for the test.
import pandas as pd
import numpy as np
import glob

# Creating a dataframe and saving as test_1.txt/test_2.txt in the current directory;
# feel free to remove the next 3 lines if you want to test in your own directory
df = pd.DataFrame(np.random.randn(10, 3), columns=list('ABC'))
df.to_csv('test_1.txt', index=False)
df.to_csv('test_2.txt', index=False)

txt_list = []    # empty list
sheet_list = []  # empty list

# a for loop through filenames matching a specified pattern (.txt) in the current directory
for infile in glob.glob("*.txt"):
    outfile = infile.replace('.txt', '')  # removing '.txt' for excel sheet names
    sheet_list.append(outfile)  # appending the excel sheet name to sheet_list
    txt_list.append(infile)     # appending the '...txt' filename to txt_list

writer = pd.ExcelWriter('summary.xlsx', engine='xlsxwriter')

# a for loop through all elements in txt_list
for i in range(0, len(txt_list)):
    df = pd.read_csv('%s' % (txt_list[i]))  # reading element from txt_list at index = i
    df.to_excel(writer, sheet_name='%s' % (sheet_list[i]), index=False)  # writing a sheet named from sheet_list at index = i

writer.save()
Output example: