Python: Looping through multiple csv files and making multiple new csv files

I am starting out in Python, and I am looking at csv files.
Basically my situation is this:
I have coordinates X, Y, Z in a csv.
X Y Z
1 1 1
2 2 2
3 3 3
I want to go through the file, add a user-defined offset value to all Z values, and write a new file with the edited Z values.
Here is my code so far, which I think is right:
import csv

# list of lists we store all data in
allCoords = []
# get offset from user
offset = int(input("Enter an offset value: "))
# read all values into memory
with open('in.csv', 'r') as inFile:  # input csv file
    reader = csv.reader(inFile, delimiter=',')
    for row in reader:
        # do not add the first row to the list
        if row[0] != "X":
            # create a new coord list
            coord = []
            # get a row and put it into new list
            coord.append(int(row[0]))
            coord.append(int(row[1]))
            coord.append(int(row[2]) + offset)
            # add list to list of lists
            allCoords.append(coord)
# write all values into new csv file
with open("out.csv", "w", newline="") as f:
    writer = csv.writer(f)
    firstRow = ['X', 'Y', 'Z']
    allCoords.insert(0, firstRow)
    writer.writerows(allCoords)
But now comes the hard part. How would I go about going through a bunch of csv files (in the same location) and producing a new file for each of them?
I am hoping to have something like this: "filename.csv" turns into "filename_offset.csv", using the original file name as the start of the new filename and appending "_offset" to the end.
I think I need to use "os." functions, but I am not sure how, so any explanation would be much appreciated along with the code! :)
Sorry if I didn't make much sense; let me know if I need to explain more clearly. :)
Thanks a bunch! :)

shutil.copy2(src, dst)
Similar to shutil.copy(), but metadata is copied as well.
shutil
The glob module finds all the pathnames matching a specified pattern
according to the rules used by the Unix shell. No tilde expansion is
done, but *, ?, and character ranges expressed with [] will be correctly matched.
glob
import glob
import os
import shutil

cvs_DIR = 'cvs_DIR'  # directory containing the csv files
# glob already returns each path with cvs_DIR prepended
files = glob.glob(os.path.join(cvs_DIR, '*.csv'))
for oldName in files:
    try:
        # "name.csv" becomes "name_offset.csv" in the same directory
        newName = os.path.splitext(oldName)[0] + '_offset.csv'
        shutil.copy2(oldName, newName)
    except (shutil.Error, OSError) as e:
        print('Error: {}'.format(e))
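Copying alone does not change the Z values, so here is a minimal sketch of how the glob loop and the offset logic from the question might be combined. It assumes Python 3 and comma-separated files with an X,Y,Z header row; the function names offset_csv and offset_all are made up for this illustration and do not come from the question or the answer above.

```python
import csv
import glob
import os


def offset_csv(src_path, dst_path, offset):
    """Copy src_path to dst_path, adding offset to the third (Z) column."""
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader, None)  # first row (X, Y, Z) is copied unchanged
        if header is not None:
            writer.writerow(header)
        for row in reader:
            if not row:  # skip blank lines
                continue
            writer.writerow([row[0], row[1], int(row[2]) + offset])


def offset_all(directory, offset):
    """Write <name>_offset.csv for every <name>.csv found in directory."""
    written = []
    for src_path in sorted(glob.glob(os.path.join(directory, "*.csv"))):
        base, ext = os.path.splitext(src_path)
        if base.endswith("_offset"):
            continue  # do not re-process output from an earlier run
        dst_path = base + "_offset" + ext
        offset_csv(src_path, dst_path, offset)
        written.append(dst_path)
    return written
```

A call such as offset_all(".", int(input("Enter an offset value: "))) would then process every csv file in the current directory. Files that already end in "_offset" are skipped so that running the script twice does not produce "name_offset_offset.csv".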

BTW, you can write ...
for row in reader:
    if row[0] == "X":
        break
for row in reader:
    coord = []
    ...
... instead of ...
for row in reader:
    if row[0] != "X":
        coord = []
        ...
This stops checking for 'X'es after the first line.
It works because you don't work with a real list here but with a self-consuming iterator, which you can stop and resume.
See also: Detecting if an iterator will be consumed.
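A related way to use the same property is to consume the header with next() before the loop starts. The sketch below assumes Python 3; the helper name read_coords and the in-memory sample data are illustrative only.

```python
import csv
import io


def read_coords(lines):
    """Return the data rows as lists of ints, skipping the first (header) row."""
    reader = csv.reader(lines)
    next(reader, None)  # consume the header once; the loop resumes at the next row
    return [[int(value) for value in row] for row in reader if row]


sample = io.StringIO("X,Y,Z\n1,1,1\n2,2,2\n")
coords = read_coords(sample)
```

Passing None as the second argument to next() avoids a StopIteration error when the input is empty.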

Related

How to print rows from a list (CSV) using the results of xrange?

I'm trying to figure out how to print rows from a list using a user-supplied xrange. I'm trying not to abbreviate my code because I'm still learning and want to follow the logic clearly. What I want is to print a range of rows based on the arguments the user provides. I know I have some of the order wrong, but I'm stuck on how to type it out... Here is what I have so far:
import csv
import sys
sourcef = sys.argv[1] # Source file path/name
destf = sys.argv[2] # Destination file path/name
linestart = int(sys.argv[3]) # Starting row to delete
lineend = int(sys.argv[4]) # Ending row to delete
with open (sourcef, 'rb') as file1, open (destf, 'wb') as file2:
    reader = csv.reader(file1)
    writer = csv.writer(file2)
    row = list(reader)
    for i in xrange(linestart, lineend):
        print i

What you want is a slice object, not xrange:
...
rows = list(reader)
for x in rows[slice(linestart, lineend)]:
    print x
which is the more verbose form of:
...
rows = list(reader)
for x in rows[linestart:lineend]:
    print x
You may not need to materialise the reader iterator as a list, especially in cases where you only need a small portion of the rows. In that case, you can use itertools.islice.
from itertools import islice
...
for x in islice(reader, linestart, lineend):
    print x
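For readers on Python 3, the islice approach could look like the sketch below. The helper name rows_in_range and the sample data are assumptions made for illustration. Note that the indices are 0-based and the end index is excluded, just as with list slicing.

```python
import csv
import io
from itertools import islice


def rows_in_range(lines, start, end):
    """Return rows start..end-1 (0-based) without reading past row end-1."""
    return list(islice(csv.reader(lines), start, end))


sample = io.StringIO("a,1\nb,2\nc,3\nd,4\n")
selected = rows_in_range(sample, 1, 3)
for row in selected:
    print(row)
```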

python: adding a zero if my value is less then 3 digits long

I have a csv file that needs a zero added in front of a number if it's less than 4 digits long.
I only have to update a particular row:
import csv
f = open('csvpatpos.csv')
csv_f = csv.reader(f)
for row in csv_f:
    print row[5]
Then I want to parse through that row and add a 0 to the front of any number that is shorter than 4 digits, and write the adjusted data to a new csv file.

You want to use string formatting for these things:
>>> '{:04}'.format(99)
'0099'
Format String Syntax documentation

When you think about parsing, you need to think about either regex or pyparsing. In this case, a regex handles the parsing quite easily.
But that's not all: once you are able to parse the numbers, you need to zero-fill them. For that purpose, use str.format to pad and justify the string accordingly.
Consider your string
st = "parse through that row and add a 0 to the front of any number that is shorter than 4 digits."
For the string above, you can do something like this:
Implementation
import re
parts = re.split(r"(\d+)", st)
''.join("{:>04}".format(elem) if elem.isdigit() else elem for elem in parts)
Output
'parse through that row and add a 0000 to the front of any number that is shorter than 0004 digits.'
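The same idea can be written as a single substitution. This is only a sketch of an alternative, not the method used above: it swaps re.split plus str.format for re.sub with str.zfill, and the function name zero_pad_numbers is invented for the example.

```python
import re


def zero_pad_numbers(text, width=4):
    """Left-pad every run of digits in text with zeros up to `width` characters."""
    return re.sub(r"\d+", lambda match: match.group().zfill(width), text)


padded = zero_pad_numbers("row 7 has 42 items and 12345 bytes")
print(padded)
```

Numbers that already have four or more digits are left unchanged, because zfill never truncates.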

The following code reads the given csv file, iterates through each row and each item in each row, and writes the result to a new csv file.
import csv
import os
f = open('csvpatpos.csv')
# open temp .csv file for output
out = open('csvtemp.csv','w')
csv_f = csv.reader(f)
for row in csv_f:
    # create a temporary list for this row
    temp_row = []
    # iterate through all of the items in the row
    for item in row:
        # add the zero filled value of each temporary item to the list
        temp_row.append(item.zfill(4))
    # join the current temporary list with commas and write it to the out file
    out.write(','.join(temp_row) + '\n')
out.close()
f.close()
Your results will be in csvtemp.csv. If you want to save the data with the original filename, just add the following code to the end of the script:
# remove original file
os.remove('csvpatpos.csv')
# rename temp file to original file name
os.rename('csvtemp.csv','csvpatpos.csv')
Pythonic Version
The code above is very verbose in order to make it understandable. Here is the code refactored to make it more Pythonic:
import csv
new_rows = []
with open('csvpatpos.csv','r') as f:
    csv_f = csv.reader(f)
    for row in csv_f:
        row = [ x.zfill(4) for x in row ]
        new_rows.append(row)
with open('csvpatpos.csv','wb') as f:
    csv_f = csv.writer(f)
    csv_f.writerows(new_rows)

Will leave you with two hints:
s = "486"
s.isdigit() == True
for finding what things are numbers.
And
s = "486"
s.zfill(4) == "0486"
for filling in zeroes.
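Combining the two hints, a minimal sketch (Python 3 assumed) that pads only one column might look like this. The function name pad_column, the column index and the sample data are placeholders; in the question the column of interest is row[5].

```python
import csv
import io


def pad_column(rows, index, width=4):
    """Yield rows with the digit-only value at `index` left-padded with zeros."""
    for row in rows:
        row = list(row)
        if index < len(row) and row[index].isdigit():
            row[index] = row[index].zfill(width)
        yield row


source = io.StringIO("id,code\n1,42\n2,12345\n3,n/a\n")
padded = list(pad_column(csv.reader(source), index=1))
```

The isdigit() check leaves headers and non-numeric values such as "n/a" untouched.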

CSV parsing in Python

I want to parse a csv file which is in the following format:
Test Environment INFO for 1 line.
Test,TestName1,
TestAttribute1-1,TestAttribute1-2,TestAttribute1-3
TestAttributeValue1-1,TestAttributeValue1-2,TestAttributeValue1-3
Test,TestName2,
TestAttribute2-1,TestAttribute2-2,TestAttribute2-3
TestAttributeValue2-1,TestAttributeValue2-2,TestAttributeValue2-3
Test,TestName3,
TestAttribute3-1,TestAttribute3-2,TestAttribute3-3
TestAttributeValue3-1,TestAttributeValue3-2,TestAttributeValue3-3
Test,TestName4,
TestAttribute4-1,TestAttribute4-2,TestAttribute4-3
TestAttributeValue4-1-1,TestAttributeValue4-1-2,TestAttributeValue4-1-3
TestAttributeValue4-2-1,TestAttributeValue4-2-2,TestAttributeValue4-2-3
TestAttributeValue4-3-1,TestAttributeValue4-3-2,TestAttributeValue4-3-3
and would like to turn this into a tab-separated format like the following:
TestName1
TestAttribute1-1 TestAttributeValue1-1
TestAttribute1-2 TestAttributeValue1-2
TestAttribute1-3 TestAttributeValue1-3
TestName2
TestAttribute2-1 TestAttributeValue2-1
TestAttribute2-2 TestAttributeValue2-2
TestAttribute2-3 TestAttributeValue2-3
TestName3
TestAttribute3-1 TestAttributeValue3-1
TestAttribute3-2 TestAttributeValue3-2
TestAttribute3-3 TestAttributeValue3-3
TestName4
TestAttribute4-1 TestAttributeValue4-1-1 TestAttributeValue4-2-1 TestAttributeValue4-3-1
TestAttribute4-2 TestAttributeValue4-1-2 TestAttributeValue4-2-2 TestAttributeValue4-3-2
TestAttribute4-3 TestAttributeValue4-1-3 TestAttributeValue4-2-3 TestAttributeValue4-3-3
Number of TestAttributes vary from test to test. For some tests there are only 3 values, for some others 7, etc. Also as in TestName4 example, some tests are executed more than once and hence each execution has its own TestAttributeValue line. (in the example testname4 is executed 3 times, hence we have 3 value lines)
I am new to python and do not have much knowledge but would like to parse the csv file with python. I checked 'csv' library of python and could not be sure whether it will be enough for me or shall I write my own string parser? Could you please help me?
Best

I'd use a solution using the itertools.groupby function and the csv module. Please have a close look at the documentation of itertools -- you can use it more often than you think!
I've used blank lines to differentiate the datasets, and this approach uses lazy evaluation, storing only one dataset in memory at a time:
import csv
from itertools import groupby
with open('my_data.csv') as ifile, open('my_out_data.csv', 'wb') as ofile:
    # Use the csv module to handle reading and writing of delimited files.
    reader = csv.reader(ifile)
    writer = csv.writer(ofile, delimiter='\t')
    # Skip info line
    next(reader)
    # Group datasets by the condition if len(row) > 0 or not, then filter
    # out all empty lines
    for group in (v for k, v in groupby(reader, lambda x: bool(len(x))) if k):
        test_data = list(group)
        # Write header
        writer.writerow([test_data[0][1]])
        # Write transposed data
        writer.writerows(zip(*test_data[1:]))
        # Write blank line
        writer.writerow([])
Output, given that the supplied data is stored in my_data.csv:
TestName1
TestAttribute1-1 TestAttributeValue1-1
TestAttribute1-2 TestAttributeValue1-2
TestAttribute1-3 TestAttributeValue1-3
TestName2
TestAttribute2-1 TestAttributeValue2-1
TestAttribute2-2 TestAttributeValue2-2
TestAttribute2-3 TestAttributeValue2-3
TestName3
TestAttribute3-1 TestAttributeValue3-1
TestAttribute3-2 TestAttributeValue3-2
TestAttribute3-3 TestAttributeValue3-3
TestName4
TestAttribute4-1 TestAttributeValue4-1-1 TestAttributeValue4-2-1 TestAttributeValue4-3-1
TestAttribute4-2 TestAttributeValue4-1-2 TestAttributeValue4-2-2 TestAttributeValue4-3-2
TestAttribute4-3 TestAttributeValue4-1-3 TestAttributeValue4-2-3 TestAttributeValue4-3-3
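The transposition step, zip(*rows), is the part that turns one attribute line plus several value lines into one line per attribute. A small stand-alone illustration using the TestName4 rows from the question (Python 3 assumed; the variable names are chosen for this example):

```python
section = [
    ["TestAttribute4-1", "TestAttribute4-2", "TestAttribute4-3"],
    ["TestAttributeValue4-1-1", "TestAttributeValue4-1-2", "TestAttributeValue4-1-3"],
    ["TestAttributeValue4-2-1", "TestAttributeValue4-2-2", "TestAttributeValue4-2-3"],
]

# zip(*section) pairs up the first items of every row, then the second items, ...
transposed = [list(column) for column in zip(*section)]
for row in transposed:
    print("\t".join(row))
```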

The following does what you want, and only holds one section in memory at a time (which saves memory for a large file). Replace in_path and out_path with the input and output file paths respectively:
import csv

def print_section(section, f_out):
    if len(section) > 0:
        # find maximum column length
        max_len = max([len(col) for col in section])
        # build and print each row
        for i in range(max_len):
            f_out.write('\t'.join([col[i] if len(col) > i else '' for col in section]) + '\n')
        f_out.write('\n')

# open both files, then wrap the input file in a csv reader
with open(in_path, 'r') as f_raw, open(out_path, 'w') as f_out:
    f_in = csv.reader(f_raw)
    next(f_in)  # skip the info line
    section = []
    for line in f_in:
        # test for new "Test" section
        if len(line) == 3 and line[0] == 'Test' and line[2] == '':
            # write previous section data
            print_section(section, f_out)
            # reset section
            section = []
            # write new section header
            f_out.write(line[1] + '\n')
        else:
            # add line to section
            section.append(line)
    # print the last section
    print_section(section, f_out)
Note that you'll want to change 'Test' in the line[0] == 'Test' condition to the correct word for indicating the header line.
The basic idea here is that we collect each section into a list of lists, then write that list of lists back out using a list comprehension to transpose it (adding blank elements when the columns are uneven).
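The "transpose and pad uneven columns" step can also be expressed with itertools.zip_longest. This is a sketch of an alternative to the comprehension above, assuming Python 3; the name transpose_padded is made up for the example.

```python
from itertools import zip_longest


def transpose_padded(rows, fill=""):
    """Transpose a list of rows, padding shorter rows with `fill`."""
    return [list(column) for column in zip_longest(*rows, fillvalue=fill)]


uneven = [["A", "B", "C"], ["1", "2"]]
result = transpose_padded(uneven)
for row in result:
    print("\t".join(row))
```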

How to add a data point to an already existing .csv (with Python)?

I am a complete newb at this and I have the following script..
It writes some random data to .csv. My end goal is to keep this preexisting .csv but add ONE random generated datapoint to the beginning of this csv in a separate Python script.
Completely new at this -- not sure how to go about doing this. Thanks for your help.
output = [a,b]
d = csv.writer(csvfile, delimiter=',', quotechar='|',
quoting=csv.QUOTE_MINIMAL)
d.writerow(output)

Are you sure you are trying to add it to the start of the file? I would expect you to add it to the end or, if you do want it at the beginning, at least to put it after the header row, which is ['name', 'value'].
As it stands, your current script has several errors when I try to run it myself, so I can help you out a bit there.
The directory string doesn't work because of the slashes. It will work if you add an r in front (for raw string) like so: r'C:/Users/AMB/Documents/Aptana Studio 3 Workspace/RAVE/RAVE/resources/csv/temperature.csv'
You don't need to import json or logging if this is the entirety of your code.
Inside your for loop you redefine the temperature writer, which is unnecessary; your definition at the start is good enough.
You have an extra comma in the line output = [timeperiod, temp,]
Moving on to a script that inserts a single data point. This script reads in your existing file and inserts a new line (you would use random values; I used 1 for time and 2 for value) on the second line, which is beneath the header. Let me know if this isn't what you are looking for.
directory = r"C:/Users/AMB/Documents/Aptana Studio 3 Workspace/RAVE/RAVE/resources/csv/temperature.csv"
with open(directory, 'r') as csvfile:
    s = csvfile.readlines()
time = 1
value = 2
s.insert(2, '%(time)d,%(value)d\n\n' % \
    {'time': time, "value": value})
with open(directory, 'w') as csvfile:
    csvfile.writelines(s)
This next section is in response to your more detailed question in the comments:
import csv
import random
directory = r"C:\Users\snorwood\Desktop\temperature.csv"
# Open the file
with open(directory, 'r') as csvfile:
    s = csvfile.readlines()
# This array will store your data
data = []
# This for loop converts the data read from the text file into integer values in your data set
for i, point in enumerate(s[1:]):
    seperatedPoint = point.strip("\n").split(",")
    if len(seperatedPoint) == 2:
        data.append([int(dataPoint) for dataPoint in seperatedPoint])
# Loop through your animation numberOfLoops times
numberOfLoops = 100
for i in range(numberOfLoops):
    if len(data) == 0:
        break
    newTime = data[-1][0] + 1 # An int that is one higher than the current last time value
    del data[0] # Deletes the first data point
    newRandomValue = 2
    data.append([newTime, newRandomValue]) # Adds the new data point to the end of the array
    # Insert your drawing code here
# Write the data back into the text file
with open(directory, 'w') as csvfile: # opens the file for writing
    temperature = csv.writer(csvfile, delimiter=',', quotechar='|', quoting=csv.QUOTE_MINIMAL) # The object that knows how to write to files
    temperature.writerow(["name", "values"]) # Write the header row
    for point in data: # Loop through the points stored in data
        temperature.writerow(point) # Write current point in set
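If the goal is only to place one new row directly beneath the header, the csv module can do the reading and writing instead of readlines and string formatting. The following is a minimal sketch under that assumption (Python 3); the helper name insert_after_header and the sample values are illustrative and not part of the original script.

```python
import csv
import io


def insert_after_header(rows, new_row):
    """Return a new list of rows with new_row placed directly below the header."""
    rows = list(rows)
    if not rows:
        return [list(new_row)]
    return [rows[0], list(new_row)] + rows[1:]


existing = io.StringIO("name,value\n1,20\n2,21\n")
updated = insert_after_header(csv.reader(existing), [0, 19])

out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(updated)
print(out.getvalue())
```

With a real file, the rows would be read inside one with open(path, newline="") block and written back in a second block opened with mode "w", since the whole file has to be rewritten to insert a line anywhere but the end.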

Python- Read from Multiple Files

I have 125 data files containing two columns and 21 rows of data. Please see the image below:
and I'd like to import them into a single .csv file (as 250 columns and 21 rows).
I am fairly new to Python, but this is the code I have been advised to use:
import glob
Results = [open(f) for f in glob.glob("*.data")]
fout = open("res.csv", 'w')
for row in range(21):
    for f in Results:
        fout.write( f.readline().strip() )
        fout.write(',')
    fout.write('\n')
fout.close()
However, there is a slight problem with the code, as I only get 125 columns (i.e. the force and displacement columns are written in one column). Please refer to the image below:
I'd very much appreciate it if anyone could help me with this!

import glob
results = [open(f) for f in glob.glob("*.data")]
sep = ","
# Uncomment if your Excel formats decimal numbers like 3,14 instead of 3.14
# sep = ";"
with open("res.csv", 'w') as fout:
    for row in range(21):
        iterator = (f.readline().strip().replace("\t", sep) for f in results)
        line = sep.join(iterator)
        fout.write("{0}\n".format(line))
To explain what went wrong with your code: your source files use tab as a field separator, but your code uses comma to separate the lines it reads from those files. If your Excel uses period as a decimal separator, it uses comma as the default field separator. The whitespace is ignored unless enclosed in quotes, and you see the result.
If you use the text import feature of Excel (Data ribbon => From Text) you can ask it to consider both comma and tab as valid field separators, and then I'm pretty sure your original output would work too.
In contrast, the above code should produce a file that will open correctly when double clicked.

You don't need to write your own program to do this, in python or otherwise. You can use an existing unix command (if you are in that environment):
paste *.data > res.csv

Try this:
import glob, csv
from itertools import cycle, islice
def roundrobin(*iterables):
    "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
    # Recipe credited to George Sakkis
    pending = len(iterables)
    nexts = cycle(iter(it).next for it in iterables)
    while pending:
        try:
            for next in nexts:
                yield next()
        except StopIteration:
            pending -= 1
            nexts = cycle(islice(nexts, pending))
Results = [open(f).readlines() for f in glob.glob("*.data")]
fout = csv.writer(open("res.csv", 'wb'), dialect="excel")
row = []
# roundrobin yields line 1 of every file, then line 2 of every file, and so on
for line, c in zip(roundrobin(*Results), cycle(range(len(Results)))):
    # append the fields of this file's line to the row being built
    row.extend(line.split())
    # once the last file has contributed, the row is complete
    if c == len(Results) - 1:
        fout.writerow(row)
        row = []
del fout
It loops over each line of your input files and stitches them together as one row, which the csv library writes in the listed dialect.

I suggest getting used to the csv module. The reason is that if the data is not that simple (simple strings in headings, and then numbers only), it is difficult to implement everything again yourself. Try the following:
import csv
import glob
import os
datapath = './data'
resultpath = './result'
if not os.path.isdir(resultpath):
    os.makedirs(resultpath)
# Initialize the empty rows. It does not check how many rows are
# in the file.
rows = []
# Read data from the files to the above matrix.
for fname in glob.glob(os.path.join(datapath, '*.data')):
    with open(fname, 'rb') as f:
        reader = csv.reader(f)
        for n, row in enumerate(reader):
            if len(rows) < n+1:
                rows.append([]) # add another row
            rows[n].extend(row) # append the elements from the file
# Write the data from memory to the result file.
fname = os.path.join(resultpath, 'result.csv')
with open(fname, 'wb') as f:
    writer = csv.writer(f)
    for row in rows:
        writer.writerow(row)
The with construct for opening a file can be replaced by the pair:
f = open(fname, 'wb')
...
f.close()
The csv.reader and csv.writer are simply wrappers that parse or compose the lines of the file. The documentation says the file must be opened in binary mode (this applies to Python 2; in Python 3 the file is opened in text mode with newline='' instead).
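For comparison, here is a sketch of the same side-by-side merge for Python 3 that reads the files in parallel instead of building the whole matrix in memory. It assumes the input files are tab-separated, as described in an earlier answer; the function name merge_side_by_side is hypothetical.

```python
import csv
from contextlib import ExitStack


def merge_side_by_side(paths, out_path, delimiter="\t"):
    """Join the rows of several delimited files column-wise into one CSV."""
    with ExitStack() as stack:
        # ExitStack closes every opened file when the block ends
        readers = [
            csv.reader(stack.enter_context(open(p, newline="")), delimiter=delimiter)
            for p in paths
        ]
        out = stack.enter_context(open(out_path, "w", newline=""))
        writer = csv.writer(out)
        # zip stops at the shortest file, so every output row is complete
        for rows in zip(*readers):
            writer.writerow([cell for row in rows for cell in row])
```

It could be called as merge_side_by_side(sorted(glob.glob("*.data")), "res.csv") after importing glob. Sorting the paths keeps the column order stable between runs, since glob does not guarantee any particular order.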
