I am trying to create an importable module to delete a range of columns (specifically columns 73-177 in the file I am working with). I am starting from this file I/O code, which removes a single column based on its field name, and I want to modify it to delete columns 73-177 from a csv file. What do I need to do to accomplish this?
def removeColumns(num1, num2, inputFILE, FileName):
    inPUTfile = open(inputFILE, 'r')
    outPUTfile = open(FileName, 'w')
    line = inPUTfile.readline()
    # Delete specified column: find its index in the header by field name
    lineList = line.rstrip('\n').split('\t')
    removeCOL = "Calendar-Year"
    i = 0
    while lineList[i] != removeCOL:
        i = i + 1
    lineList.pop(i)  # remove this field from the header list
    # write the modified header to the output file
    outPUTfile.write('\t'.join(lineList) + '\n')
    for line in inPUTfile:  # remove field i from each remaining line
        lineList = line.rstrip('\n').split('\t')  # convert to a list
        lineList.pop(i)  # remove the field from the list
        outPUTfile.write('\t'.join(lineList) + '\n')  # write the modified line
    inPUTfile.close()  # close the input file
    outPUTfile.close()  # close the output file
    return FileName
I realize that you asked how to modify the original code, but honestly I think it'd be easier to understand how to do it a different way. Python has a useful csv module which handles a lot of the work for you. Something like:
import csv

remove_from = 2
remove_to = 5

with open("to_delete.csv", "rb") as fp_in, open("newfile.csv", "wb") as fp_out:
    reader = csv.reader(fp_in, delimiter="\t")
    writer = csv.writer(fp_out, delimiter="\t")
    for row in reader:
        del row[remove_from:remove_to]
        writer.writerow(row)
will turn
$ cat to_delete.csv
a b c d e f g
1 2 3 4 5 6 7
8 9 10 11 12 13 14
15 16 17 18 19 20 21
into
$ cat newfile.csv
a b f g
1 2 6 7
8 9 13 14
15 16 20 21
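For the original question's range, columns 73-177 (counting from 1, inclusive) correspond to the Python slice [72:177], since list indices start at 0 and slice ends are exclusive. A runnable sketch of the same idea against an in-memory 200-column row (the data here is made up purely for illustration):

```python
import csv
import io

# columns 73-177 (1-based, inclusive) -> 0-based slice [72:177]
remove_from = 72   # 1-based column 73
remove_to = 177    # slice end is exclusive, so column 177 is the last removed

# stand-in for a real tab-delimited file: one row with 200 named columns
src = io.StringIO("\t".join("c{}".format(n) for n in range(1, 201)) + "\n")
out = io.StringIO()

writer = csv.writer(out, delimiter="\t")
for row in csv.reader(src, delimiter="\t"):
    del row[remove_from:remove_to]
    writer.writerow(row)

fields = out.getvalue().strip().split("\t")
print(len(fields))   # 95 columns remain: c1..c72 and c178..c200
```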
So, my program is in Python. If I have a spreadsheet in .csv format, I want to delete every row that does not contain '80' and move all the remaining rows into a new .csv. I have heard pandas is very useful for this type of data manipulation, but that you need certain software installed on your computer for it to run correctly. I am not sure whether that is true, but I am willing to try pandas as well.
A B C D
5 11 20 6
6 1 80 11
40 4 5 6
11 30 16 8
180 2 3 19
7 20 13 74
80 4 11 22
The result should be a new .csv file that looks like this. Note that if a row has an 80 anywhere in it, it stays; otherwise I want it deleted. In the example there is a row 180 2 3 19; because there is an 80 in 180, I still want that whole row to stay.
A B C D
6 1 80 11
180 2 3 19
80 4 11 22
So far this is my code, and I know it's NOT near complete. I am just testing my code to see if I can copy all the rows to a new file. Like I said, I want to target any row that doesn't have an 80 in it and delete it. The next part I need to do is isolate all the rows with 80 and keep them. If anyone has any tutorials or resources for editing .csv files, that would be greatly appreciated.
import csv

outfile = open("TESTSHEET_editted.csv", 'w')  # 'w' is write mode; this is the new file being created
with open('TESTSHEET.csv', 'r') as openfile:  # 'r' is read mode
    reader = csv.reader(openfile, delimiter=',')  # ',' is already the default delimiter
    header = next(reader)  # if I didn't have the A B C D header, this line would not be needed
    for row in reader:  # is there any easier way to write this?
        row1 = row[0]
        row2 = row[1]
        row3 = row[2]
        row4 = row[3]
        new_line = "{}, {}, {}, {}\n".format(row1, row2, row3, row4)
        outfile.write(new_line)
outfile.close()  # the new csv file must be closed before other programs can use it
import csv

b = []  # list to hold each row of the csv
with open('TESTSHEET.csv', 'r') as openfile:
    reader = csv.reader(openfile)
    for row in reader:
        b.append(row)

header, data = b[0], b[1:]
# keep a row only if "80" appears somewhere in its joined text --
# this also keeps "180", as the question requires
kept = [row for row in data if "80" in " ".join(row)]

# write the header plus the kept rows to the new csv file
with open("TESTSHEET_Edited.csv", "w") as f:
    writer = csv.writer(f)
    writer.writerow(header)
    writer.writerows(kept)
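Since the asker mentions pandas: the same filter is short there, and pandas needs nothing beyond a pip install. A sketch with the sample data inline so it runs as-is (in practice you would read "TESTSHEET.csv" instead):

```python
import io
import pandas as pd

data = io.StringIO("""A,B,C,D
5,11,20,6
6,1,80,11
40,4,5,6
11,30,16,8
180,2,3,19
7,20,13,74
80,4,11,22""")

df = pd.read_csv(data)  # in practice: pd.read_csv("TESTSHEET.csv")
# keep rows where any cell, viewed as text, contains the substring "80"
mask = df.astype(str).apply(lambda row: row.str.contains("80").any(), axis=1)
kept = df[mask]
print(kept)
```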
I have a tsv file and I'd like to delete the rows where the third value is not c:

1 7 c
5 2 q
4 5 a
5 0 c

So far I know how to separate the values; I just don't know how to delete a row based on the third tabbed value:

for i, line in enumerate(read_tsv):
    first = read_tsv[i][0]
    second = read_tsv[i][1]
    letter = read_tsv[i][2]
    if i == 2:  # incomplete -- this is where I'm stuck

I'd like it to look like this:
1 7 c
5 0 c
You can open the file, read/iterate it to filter out the unwanted rows, then open it in write mode and write the filtered data back:
import csv

with open('filename.tsv', 'r') as f:
    reader = csv.reader(f, delimiter='\t')
    data = [row for row in reader if row[2] == 'c']

with open('filename.tsv', 'w') as f:
    writer = csv.writer(f, delimiter='\t')
    writer.writerows(data)
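If the file is large, or you want to keep the original intact, you can also filter while streaming to a second file instead of rewriting in place. A sketch (the file names here, and the sample data it writes first so it runs end to end, are made up):

```python
import csv

# build a small sample tsv so the sketch is runnable end to end
with open('sample.tsv', 'w', newline='') as f:
    csv.writer(f, delimiter='\t').writerows(
        [['1', '7', 'c'], ['5', '2', 'q'], ['4', '5', 'a'], ['5', '0', 'c']])

# stream-filter: copy only rows whose third field is 'c' to a new file
with open('sample.tsv', newline='') as fin, \
        open('filtered.tsv', 'w', newline='') as fout:
    writer = csv.writer(fout, delimiter='\t')
    for row in csv.reader(fin, delimiter='\t'):
        if len(row) > 2 and row[2] == 'c':
            writer.writerow(row)
```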
I am reading multiple csv files and combining them into one csv file. The desired outcome of the combined data looks like the following:
0 4 6 8 10 12
1 2 5 4 2 1
5 3 0 1 5 10
....
In the following code, I intend the columns to be named 0, 4, 6, 8, 10, 12.
for indx, file in enumerate(files_File1):
    if file.endswith('csv'):  # reading csv files in the designated folder
        filepath = os.path.join(folder_File1, file)
        current = pd.read_csv(filepath, header=None)
        if indx == 0:
            mydata_File1 = current.copy()
            mydata_File1.columns.values[1] = 4
            print(mydata_File1.columns.values)
        else:
            mydata_File1[2*indx+4] = current.iloc[:, 1]
            print(mydata_File1.columns.values)
But instead, the outcome looks like this, with an extra column named 2:
0 4 2 6 8 10 12
1 2 5 4 2 1
5 3 0 1 5 10
....
I am not quite sure what causes the column named "2".
Any idea?
If there is some reason you need pandas, then this will work. Your code references mydata_File1.columns.values, which holds the names of the columns, not the values in them. If this doesn't answer your question, please provide a more complete example per @juanpa.arrivillaga's comment.
#! python3
import os
import pandas as pd
import glob

folder_File1 = r"C:\Users\Public\Documents\Python\CombineCSVFiles"
csv_only = r"\*.csv"
files_File1 = glob.glob(f'{folder_File1}{csv_only}')
new_csv = f'{folder_File1}\\newcsv.csv'

mydata_File1 = []
for indx, file in enumerate(files_File1):
    if file == new_csv:  # skip the combined output file if it already exists
        pass
    else:
        current = pd.read_csv(file, header=None)  # read each csv in the folder
        print(current)
        if indx == 0:
            mydata_File1 = current.copy()
        else:
            mydata_File1 = mydata_File1.append(current, ignore_index=True)
        print(mydata_File1.values)
mydata_File1.to_csv(new_csv)
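As for the stray column: pd.read_csv(..., header=None) labels the two columns 0 and 1, and assigning into mydata_File1.columns.values[1] mutates the index's backing array without reliably updating the index itself. The supported way is rename; a minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({0: [1, 5], 1: [2, 3]})  # what read_csv(header=None) gives
df = df.rename(columns={1: 4})             # rename column 1 to 4 explicitly
print(list(df.columns))                    # [0, 4]
```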
If you are really just trying to combine .csv files, there's no need for pandas.
#! python3
import glob

folder_File1 = r"C:\Users\Public\Documents\Python\CombineCSVFiles"
csv_only = r"\*.csv"
files_File1 = glob.glob(f'{folder_File1}{csv_only}')
new_csv = f'{folder_File1}\\newcsv.csv'

lines = []
for file in files_File1:
    with open(file) as filein:
        if filein.name == new_csv:  # don't re-read the combined output file
            pass
        else:
            for line in filein:
                line = line.strip()  # or some other preprocessing
                lines.append(line)   # storing everything in memory!

with open(new_csv, 'w') as out_file:
    out_file.writelines(line + u'\n' for line in lines)
I wrote a script that finds the maximum value in a log file. I can then write the max value to another file.
My problem:
a. how do I run it against all the files in a directory?
b. how do I write "Filename" + "Max Value" into one file?
here's my code:
import re

a = 'NA total MB July.csv'
b = 'Total.csv'

with open(a, 'r') as f1:
    with open(b, 'w') as f2:
        header = f1.readline()
        data = f1.readlines()
        pattern = re.compile(",|\s")
        maxMB = []
        for line in data:
            parts = pattern.split(line)
            #print "Log line split", parts  # shows the split line
            mbCount = parts[2]  # index of the number
            mbint = float(mbCount)
            maxMB.append(mbint)  # builds a list of all MB values
        #print "MAX: ", maxMB
        highest = max(maxMB)
        print highest
        f2.write(str(highest))  # writes the highest value to the file
Here's my file's output:
167.94
What I'm looking to see in Total.csv is
NA total MB July : 167.94
NA total MB August: 123.45
...
for all the csv files within a folder.
Can't quite figure out how to make this work without processing 1 file at a time and manually changing the file name. Any help to this n00b would be greatly appreciated. THANKS!
You can use os.listdir() to grab files in current directory.
import os

files = [f for f in os.listdir('.') if os.path.isfile(f)]
for f in files:
    # do something with each file
You can open the Total.csv file in 'ab' mode so that you can append all the max values into that one file only.

with open('Total.csv', 'ab') as out:
    writer = csv.writer(out, delimiter=',', quotechar='"', quoting=csv.QUOTE_ALL)
    row = (filename, highest)  # fill in the file name and its max value
    writer.writerow(row)
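Putting the pieces together: a sketch that scans the current directory for .csv logs, takes the max of the third field in each line (the same index the original script uses), and writes "filename : max" lines to Total.csv. Files that don't match the expected layout are skipped, and the sample file written first is only there so the sketch runs end to end:

```python
import os
import re

# a tiny stand-in log in the format the original script expects
with open('NA_total_MB_July.csv', 'w') as f:
    f.write('site region MB\n')
    f.write('a b 10.5\n')
    f.write('a b 167.94\n')

pattern = re.compile(r",|\s")
with open('Total.csv', 'w') as out:
    for name in sorted(os.listdir('.')):
        if not name.endswith('.csv') or name == 'Total.csv':
            continue
        try:
            with open(name) as f:
                next(f)  # skip the header line
                values = [float(pattern.split(line)[2])
                          for line in f if line.strip()]
            out.write('{} : {}\n'.format(name, max(values)))
        except (ValueError, IndexError, StopIteration):
            continue  # not in the expected format; skip it
```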
I have a CSV file with hundreds of rows, and I would like to select and export every 3 rows to a new CSV file, with each output CSV file named after the first row of the selection.
For example in the following CSV file....
1980 10 12
1 2 3 4 5 6 7
4 6 8 1 0 8 6
1981 10 12
2 4 9 7 5 4 1
8 9 3 8 3 7 3
I would like to select the first 3 rows and export them to a new CSV named "1980 10 12" (based on the first row), then select the next 3 rows and export them to a new CSV named "1981 10 12" (based on the first row of the next 3 rows). I would like to do this using Python.
Using the csv module, plus itertools.islice() to select 3 rows each time:
import csv
import os.path
from itertools import islice
with open(inputfilename, 'rb') as infh:
    reader = csv.reader(infh)
    for row in reader:
        filename = row[0].replace(' ', '_') + '.csv'
        filename = os.path.join(directory, filename)
        with open(filename, 'wb') as outfh:
            writer = csv.writer(outfh)
            writer.writerow(row)
            writer.writerows(islice(reader, 2))
The writer.writerows(islice(reader, 2)) line takes the next 2 rows from the reader, copying them across to the writer CSV, after writing the current row (with the date) to the output file first.
You may need to adjust the delimiter argument for the csv.reader() and csv.writer() objects; the default is a comma, but you didn't specify the exact format and perhaps you need to set it to a '\t' tab instead.
If you are using Python 3, open the files with 'r' and 'w' text mode, and set newline='' for both; open(inputfilename, 'r', newline='') and open(filename, 'w', newline='').
import csv

with open("in.csv") as f:
    reader = csv.reader(f)
    chunks = []
    for ind, row in enumerate(reader, 1):
        chunks.append(row)
        if ind % 3 == 0:  # every three rows, create a file named after the first row
            with open("{}.csv".format(chunks[0][0].strip()), "w") as f1:
                wr = csv.writer(f1)
                wr.writerows(chunks)  # write all three rows
            chunks = []  # reset chunks to an empty list
Using slight iterator trickery:
with open('in.csv', 'r') as infh:
    for block in zip(*[infh]*3):
        filename = block[0].strip() + '.csv'
        with open(filename, 'w') as outfh:
            outfh.writelines(block)
On Python 2.X you would use itertools.izip. The docs actually mention izip(*[iter(s)]*n) as an idiom for clustering a data series.
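That clustering idiom generalizes into a small helper if you ever need groups of a different size (a sketch; leftover items that don't fill a group are silently dropped):

```python
def grouper(iterable, n):
    # zip n references to the same iterator, so each zip step
    # consumes n consecutive items from it
    args = [iter(iterable)] * n
    return zip(*args)

print(list(grouper("abcdefg", 3)))  # [('a', 'b', 'c'), ('d', 'e', 'f')]
```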