Combining columns of multiple files into one - Python


I am trying to write a simple script that imports a specific column from multiple data files (CSV-like files, but with no extension) and exports them all to one file, with the file name in each column header. I tried this solution (the code below, by shaktimaan), which seems to do almost exactly that, but I ran into some difficulties. First, I keep getting an ''expected str, bytes or os.PathLike object, not list'' error, and I am not sure what I am doing wrong. I am also not sure whether the file_names variable should contain file names or file paths, or whether I should use a different function to import the files, because my files don't have a .csv extension in their names.
Thank you for your help,
Šimon
import csv
# List of your files
file_names = ['file1', 'file2']
# Output list of generator objects
o_data = []
# Open files in the succession and
# store the file_name as the first
# element followed by the elements of
# the third column.
for afile in file_names:
    file_h = open(afile)
    a_list = []
    a_list.append(afile)
    csv_reader = csv.reader(file_h, delimiter=' ')
    for row in csv_reader:
        a_list.append(row[2])
    # Convert the list to a generator object
    o_data.append((n for n in a_list))
    file_h.close()
# Use zip and csv writer to iterate
# through the generator objects and
# write out to the output file
# (the with block closes the file on exit)
with open('output', 'w') as op_file:
    csv_writer = csv.writer(op_file, delimiter=' ')
    for row in list(zip(*o_data)):
        csv_writer.writerow(row)
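The error message quoted in the question is what open() raises when it is handed a whole list instead of a single name, so it may be that the list itself was passed to open() somewhere. open() accepts either a bare file name (resolved against the current working directory) or a full path, and it does not care whether the name has an extension. The following is a minimal sketch rather than a definitive fix: it assumes space-delimited files and the third column (index 2), as in the quoted code, and the function name combine_column and its parameters are hypothetical.

```python
import csv
from itertools import zip_longest
from pathlib import Path


def combine_column(file_names, out_path, column=2, delimiter=" "):
    """Write one column from each input file side by side, headed by the file name."""
    columns = []
    for name in file_names:
        # open() takes ONE path per call; passing the whole list raises
        # "expected str, bytes or os.PathLike object, not list".
        with open(name, newline="") as handle:
            reader = csv.reader(handle, delimiter=delimiter)
            # Skip blank lines and rows too short to have the requested column.
            values = [row[column] for row in reader if len(row) > column]
        # First cell of each output column is the file name (without directories).
        columns.append([Path(name).name] + values)

    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter=delimiter)
        # zip_longest pads shorter columns with "" instead of truncating
        # everything to the shortest file, which plain zip() would do.
        writer.writerows(zip_longest(*columns, fillvalue=""))
```

Calling combine_column(['file1', 'file2'], 'output') would then produce one output column per input file.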

Related

Using a loop to open and process txt files with csv

I have data (mixed text and numbers in txt files) and I'd like to write a for loop that creates a list of lists, such that I can process the data from all the files using fewer lines.
So far I have written this:
import csv
path = (some path...)
files = [path + 'file1.txt', path + 'file2.txt', path + 'file3.txt', ...]
for i in files:
    with open(i, 'r') as j:
        Reader = csv.reader(j)
        List = [List for List in Reader]
I think I overwrite List on every iteration instead of creating a nested list, since I end up with a Reader of size 1 and a List whose dimensions match just one of the files.
My questions:
Given that the files may contain different numbers of lines, is this the right approach to save some lines of code? (What could be done better?)
I think the problem is in [List for List in Reader]. Is there a way to change it so I don't overwrite List? Something like adding to List?

You can use the list append() method to add to an existing list. Since csv.reader instances are iterable objects, you can convert each one to a list and pass that to the method, as shown below:
import csv
from pathlib import Path
path = Path('./')
filenames = ['in_file1.txt', 'in_file2.txt'] # etc ...
List = []
for filename in filenames:
    with open(path / filename, 'r', newline='') as file:
        List.append(list(csv.reader(file)))
print(List)
Update
An even more succinct way to do it would be to use something called a "list comprehension":
import csv
from pathlib import Path
path = Path('./')
filenames = ['in_file1.txt', 'in_file2.txt'] # etc ...
List = [list(csv.reader(open(path / filename, 'r', newline='')))
        for filename in filenames]
print(List)
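One thing to keep in mind with the one-line version is that the files are opened without a with block, so closing them is left to the interpreter. A possible middle ground, sketched here with the hypothetical helper names read_rows and read_all, keeps the comprehension but moves the file handling into a small function:

```python
import csv
from pathlib import Path


def read_rows(path):
    """Return all rows of one CSV file as a list of lists."""
    # The with block guarantees the file is closed before the function returns.
    with open(path, newline="") as handle:
        return list(csv.reader(handle))


def read_all(directory, filenames):
    """Return one list of rows per file, in the order the names were given."""
    return [read_rows(Path(directory) / name) for name in filenames]
```

Files with different numbers of lines are not a problem here, because each file gets its own inner list.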

Yes, use .append():
import csv
path = (some path...)
files = [path+x for x in ['FILESLIST']]
List = []  # create the outer list once, before the loop
for i in files:
    with open(i, 'r') as j:
        Reader = csv.reader(j)
        List.append([L for L in Reader])

Why is my code resulting in a horizontal list instead of a vertical one?

I'm writing a script to pull in all the file names from a directory, modify each name, and then output the names to an Excel file. Right now I'm simply focused on getting the list of file names written to a .csv file. When I do this, the file names appear in a horizontal list, one item in each column, instead of a vertical list, one item in each row under a header.
I tried both writer.writerow(files) and for i in files: writer.writerow(i). The first gave me the current horizontal output, while the second wrote the list vertically but split each name into one character per cell.
import os
import csv
path = "C:\\Users\\[REST OF PATH]\\"
files = []
csv_filename = "python_list_test.csv"
#r = root, d = directory, f = files
''' * for r,d,f, in os.walk(path):
    for file in f:
        if '.txt' in file:
            files.append(os.path.join(r,file))
for f in files:
    print(f)
'''
for r,d,f in os.walk(path):
    for name in f:
        if '.pdf' in name:
            files.append(name)
for i in files:
    print(i)
with open(csv_filename, mode='w',newline='') as c:
    writer = csv.writer(c)
    writer.writerow(['File Name',])
    writer.writerow(files)
I expected the code to give me a single-column list with each row being the next item. When I print with the second method I mentioned earlier (for i in files: ...), it looks perfect, but writing to a .csv file splits the names into separate characters.

You just need to change the last line:
writer.writerows([f] for f in files)
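The reason is that writerow() treats whatever it receives as one row and writes each element of it as a cell. A list of names therefore becomes one wide row, and a single string becomes a row of characters. Wrapping each name in its own one-element list gives one row per name. The small demonstration below writes to an in-memory buffer so it can be run anywhere; the file names and the helper render are made up for illustration.

```python
import csv
import io

files = ["a.pdf", "b.pdf"]  # made-up names for the demonstration


def render(write):
    """Run `write` against a csv.writer on an in-memory buffer and return the text."""
    buffer = io.StringIO()
    write(csv.writer(buffer, lineterminator="\n"))
    return buffer.getvalue()


# One call with the whole list: a single row, one name per column.
horizontal = render(lambda w: w.writerow(files))
# -> "a.pdf,b.pdf\n"

# One call per string: each string is iterated character by character.
per_char = render(lambda w: [w.writerow(name) for name in files])
# -> "a,.,p,d,f\nb,.,p,d,f\n"

# One single-element list per name: one name per row.
vertical = render(lambda w: w.writerows([name] for name in files))
# -> "a.pdf\nb.pdf\n"
```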

Assign csv files to a collection of dictionaries (list) with file name as the keys and file content as the values

I have a problem with an iteration process in Python. I've tried and searched for solutions, but I think this is beyond my ability (FYI, I've been writing code for one month).
The case:
Let's say I have 3 CSV files (in reality there are 350): file_1.csv, file_2.csv and file_3.csv. I have already written the code that collects all of the file names into a single list.
Each CSV contains a single column with many rows.
For example:
# the actual CSV looks more like this:
# for file_1.csv:
value_1
value_2
value_3
Below is not the actual CSV content (I mean, I have converted each file into an array/series):
file_1.csv --> [['value_1'],['value_2'],['value_3']]
file_2.csv --> [['value_4'],['value_5']]
file_3.csv --> [['value_6']]
# The first step is done: storing the CSV file names in a list, so they can be read and used by the csv functions.
filename = ['file_1.csv', 'file_2.csv', 'file_3.csv']
I want the result as a list:
# assigning an empty list
result = []
Desired result
print (result)
out:
[{'keys': 'file_1', 'values': 'value_1, value_2, value_3'},
 {'keys': 'file_2', 'values': 'value_4, value_5'},
 {'keys': 'file_3', 'values': 'value_6'}]
Note that the keys in the result no longer contain '.csv' at the end of the file name. Note also that the CSV values (previously a list of lists, or a series) become one single comma-separated string.
Any help is appreciated. Thank you very much.

I'd like to answer this to the best of my ability (I'm a newbie too).
Step 1: Reading those 350 filenames
(if you haven't figured it out already, you could use the glob module for this step)
Define the directory where the files are placed, let's say 'C:\Test'
import glob
directory = "C:/Test"
files = sorted(glob.glob(directory + "/*.csv"))
This will collect the paths of all the CSV files in the directory.
Step 2: Reading CSV files and mapping them to dictionaries
import os
result = []
for file in files:
    filename = str(os.path.basename(file).split('.')[0])  # removes the CSV extension from the filename
    with open(file, 'r') as infile:
        tempvalue = []
        tempdict = {}
        print(filename)
        for line in infile.readlines():
            tempvalue.append(line.strip())  # strips the lines and adds them to a list of temporary values
        value = ",".join(tempvalue)  # converts the temp list to a string
        tempdict[filename] = value  # Assigns the filename as key and the contents as value to a temporary dictionary
    result.append(tempdict)  # Adds the new temp dictionary for each file to the result list
print(result)
This piece of code should work (though someone else might share shorter, more Pythonic code).

Since it seems that the contents of the files are already pretty much in the format you need (bar the line endings) and you have the names of the 350 files in a list, there isn't a huge amount of processing you need to do. It's mainly a question of reading the contents of each file, and stripping the newline characters.
For example:
import os
result = []
filenames = ['file_1.csv', 'file_2.csv', 'file_3.csv']
for name in filenames:
    # Set the filename minus extension as 'keys'
    file_data = {'keys': os.path.basename(name).split('.')[0]}
    with open(name) as f:
        # Read the entire file
        contents = f.read()
    # Drop the line endings, join the lines with ', ', and set as 'values'
    file_data['values'] = ', '.join(contents.splitlines())
    result.append(file_data)
print(result)
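The same idea can also be written with pathlib, where Path.stem gives the file name without its final extension. This is only a sketch under the question's assumptions (one value per line, 'keys' and 'values' as the dictionary keys); the function names file_to_record and files_to_records are hypothetical.

```python
from pathlib import Path


def file_to_record(path):
    """Build {'keys': <name without extension>, 'values': <lines joined by ', '>}."""
    path = Path(path)
    # Strip whitespace from every line and ignore blank lines.
    lines = [line.strip() for line in path.read_text().splitlines() if line.strip()]
    return {"keys": path.stem, "values": ", ".join(lines)}


def files_to_records(filenames):
    """Return one record per file, in the order the names were given."""
    return [file_to_record(name) for name in filenames]
```

With the three example files from the question, files_to_records(filename) should give a list in the desired shape.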

Combining csv headers with corresponding file paths into new file

I am not sure how to "crack" the following Python-nut. So I was hoping that some of you more experienced Python'ers could push me in the right direction.
What I got:
Several directories containing many csv files
For instance:
/home/Date/Data1 /home/Date/Data2 /home/Date/Data3/sub1 /home/Date/Data3/sub2
What I want:
A file containing the "splitted" path for each file, followed by the variables (=row/headers) of the corresponding file. Like this:
home /t Date /t Data1 /t "variable1" "variable2" "variable3" ...
home /t Date /t Data2 /t "variable1" "variable2" "variable3" ...
home /t Date /t Data3 /t sub1 /t "variable1" "variable2" "variable3" ...
home /t Date /t Data3 /t sub2 /t "variable1" "variable2" "variable3" ...
Where am I right now?: The first step was to figure out how to print out the first row (the variables) of a single csv file (I used a test.txt file for testing)
# print out variables of a single file:
import csv
with open("test.txt") as f:
    reader = csv.reader(f)
    i = next(reader)
    print(i)
The second step was to figure out how to print the paths of the csv files in the directories, including subfolders. This is what I ended up with:
import os
# Getting the current work directory (cwd)
thisdir = os.getcwd()
# r=root, d=directories, f = files
for r, d, f in os.walk(thisdir):
    for file in f:
        if ".csv" in file:
            print(os.path.join(r, file))
Prints:
/home/Date/Data1/file1.csv
/home/Date/Data1/file2.csv
/home/Date/Data2/file1.csv
/home/Date/Data2/file2.csv
/home/Date/Data2/file3.csv
/home/Date/Data3/sub1/file1.csv
/home/Date/Data3/sub2/file1.csv
/home/Date/Data3/sub2/file2.csv
Where am I stuck? I am struggling to figure out how to proceed from here. Any ideas or approaches pointing in the right direction are greatly appreciated!
Cheers, B
##### UPDATE #####
Inspired by Tim Pietzcker's useful comments I have gotten a long way (Thanks Tim!).
But I could not get the output.write & join part to work, so the code is slightly different. The new issue is how to "merge" the two lists as two separate columns with a comma as the delimiter (I want to create a csv file). Since I am stuck yet again, I wanted to see whether the experienced Python'ers in here have any good suggestions.
#!/usr/bin/python
import os
import csv
thisdir = os.getcwd()
csvfiles = []  # collects the path of every CSV file found
# Extract file-paths and append them to "csvfiles"
for r, d, f in os.walk(thisdir): # r=root, d=directories, f = files
    for file in f:
        if ".csv" in file:
            csvfiles.append(os.path.join(r, file))
# get each file-path on new line + convert to list of str
filepath = "\n".join(["".join(sub) for sub in csvfiles])
filepath = filepath.replace(".csv", "") # remove .csv
filepath = filepath.replace("/", ",") # replace / with ,
Results in:
,home,Date,Data1,file1
,home,Date,Data1,file2
,home,Date,Data1,file3
... and so on
Then on to the headers:
# Create header-extraction function:
def get_csv_headers(filename):
    with open(filename, newline='') as f:
        reader = csv.reader(f)
        return next(reader)
# Create empty list for headers
headers=[]
# Extract headers with the function and append them to "headers" list
for l in csvfiles:
    headers.append(get_csv_headers(l))
# Create file with headers
headers = "\n".join(["".join(sublst) for sublst in headers]) # new lines + str conversion
headers = headers.replace(";", ",") # replace ; with ,
Results in:
variable1,variable2,variable3
variable1,variable2,variable3,variable4,variable5,variable6
variable1,variable2,variable3,variable4
and so on..
What I want now: a csv like this:
home,Date,Data1,file1,variable1,variable2,variable3
home,Date,Data1,file2,variable1,variable2,variable3,variable4,variable5,variable6
home,Date,Data1,file3, variable1,variable2,variable3,variable4
For instance:
with open('text.csv', 'w') as f:
    writer = csv.writer(f, delimiter=',')
    writer.writerows(zip(filepath,headers))
resulted in:
",",v
h,a
o,r
m,i,
e,a
and so on..
Any ideas and pushes in the right direction are very welcome!

About your edit: I would recommend against transforming everything into strings that early in the process. It makes much more sense keeping the data in a structured format and allow the modules designed to handle structured data to do the rest. So your program might look something like this:
#!/usr/bin/python
import os
import csv
thisdir = os.getcwd()
csvfiles = []  # collects the path of every CSV file found
# Extract file-paths and append them to "csvfiles"
for r, d, f in os.walk(thisdir): # r=root, d=directories, f = files
    for file in f:
        if ".csv" in file:
            csvfiles.append(os.path.join(r, file))
This (taken directly from your question) leaves you with a list of CSV filenames.
Now let's read those files. From the script in your question it seems that your CSV files are actually semicolon-separated, not comma-separated. This is common in Europe (because the comma is needed as a decimal point), but Python needs to be told that:
# Create header-extraction function:
def get_csv_headers(filename):
    with open(filename, newline='') as f:
        reader = csv.reader(f, delimiter=";") # semicolon-separated files!
        return next(reader)
# Create empty list for headers
headers=[]
# Extract headers with the function and append them to "headers" list
for l in csvfiles:
    headers.append(get_csv_headers(l))
Now headers is a list containing many sub-lists (which contain all the headers as separate items, just as we need them).
Let's not try to put everything on a single line; better keep it readable:
with open('text.csv', 'w', newline="") as f:
    writer = csv.writer(f, delimiter=',') # maybe use semicolon again??
    for path, header in zip(csvfiles, headers):
        # os.sep is "/" on Linux/macOS and "\\" on Windows
        writer.writerow(path.split(os.sep) + header)
If all your paths start with a separator (as absolute paths like /home/Date/... do), you could also use
        writer.writerow(path.split(os.sep)[1:] + header)
to avoid the empty field at the start of each line.
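Putting the pieces together, the whole job could look roughly like the sketch below. It is an illustration under a few assumptions rather than a drop-in script: the input files are taken to be semicolon-separated (as discussed above), the .csv extension is dropped from the last path component as in the question's desired output, and the function names find_csv_files, read_header and write_index are hypothetical.

```python
import csv
import os


def find_csv_files(top):
    """Yield the path of every .csv file below `top`, in a stable order."""
    for root, dirs, files in os.walk(top):
        dirs.sort()  # sorting in place makes os.walk visit subfolders alphabetically
        for name in sorted(files):
            if name.endswith(".csv"):
                yield os.path.join(root, name)


def read_header(path, delimiter=";"):
    """Return the first row of a CSV file, or [] if the file is empty."""
    with open(path, newline="") as handle:
        return next(csv.reader(handle, delimiter=delimiter), [])


def write_index(top, out_path, delimiter=";"):
    """Write one row per CSV file: its path components followed by its headers."""
    # Collect the paths first, so the output file is not picked up
    # if it happens to be created below `top`.
    paths = list(find_csv_files(top))
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        for path in paths:
            stem = os.path.splitext(path)[0]  # drop the ".csv" extension
            # Skip empty components such as the one before a leading separator.
            parts = [part for part in stem.split(os.sep) if part]
            writer.writerow(parts + read_header(path, delimiter))
```

Keeping the path components and headers as lists until the final writerow() call is what avoids the character-by-character output shown in the question.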

This looks promising; you've already done most of the work.
What I would do is
Collect all your CSV filenames in a list. So instead of printing the filenames, create an empty list (csvfiles=[]) before the os.walk() loop and do something like csvfiles.append(os.path.join(r, file)).
Then, iterate over those filenames, passing each to the routine that's currently used to read test.txt. If you place that in a function, it could look like this:
def get_csv_headers(filename):
    with open(filename, newline="") as f:
        reader = csv.reader(f)
        return next(reader)
Now, you can write the split filename to a new file and add the headers. I'm questioning your file format a bit - why separate part of the line by tabs and the rest by spaces (and quotes)? If you insist on doing it like this, you could use something like
output.write("\t".join(filename.split(os.sep)))  # os.sep is "/" or "\\", depending on the platform
output.write("\t")
output.write(" ".join(['"{}"'.format(header) for header in get_csv_headers(filename)]))
but you might want to rethink this approach. A standard format like JSON might be more readable and portable.
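As a rough illustration of the JSON idea, the sketch below stores each file as an object with its path components and its headers. The record layout ("path" and "headers" keys) and the function name headers_to_json are assumptions made for this example, not something the question asks for.

```python
import json
import os


def headers_to_json(headers_by_path, out_path):
    """Write a JSON list with one {"path": [...], "headers": [...]} object per file."""
    records = [
        {
            # Split the path into components and drop empty ones
            # (for example the one before a leading separator).
            "path": [part for part in path.split(os.sep) if part],
            "headers": headers,
        }
        for path, headers in headers_by_path.items()
    ]
    with open(out_path, "w") as out:
        json.dump(records, out, indent=2)
    return records
```

The input here is a dictionary mapping each file path to the header list returned by get_csv_headers().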

getting a specific row of multiple csv's and writing to a new csv

Have had a good search but can't quite find what I'm looking for. I have a number of csv files printed by a CFD simulation. The goal of my python script was to:
get the final row of each csv and
add the rows to a new file with the filename added to the start of each row
Currently I have
if file.endswith(".csv"):
    with open(file, 'r') as f:
        tsrNum = file.translate(None, '.csv')
        print(tsrNum + ', ' + ', '.join(list(csv.reader(f))[-1]))
This prints the correct values to the terminal, but I have to manually copy and paste them into a new file.
Can somebody help with the last step? I'm not familiar enough with the syntax of python, certainly on my to-do list once I finish this CFD project as so far it's been fantastic when I've managed to implement it correctly. I tried using loops and csv.dictWriter, but with little success.
EDIT
I couldn't get the posted solution working. Here's the code a guy helped me make
import csv
import os
import glob
# get a list of the input files, all CSVs in current folder
infiles = glob.glob("*.csv")
# open output file
ofile = open('outputFile.csv', "w")
# column names
fieldnames = ['parameter','time', 'cp', 'cd']
# access it as a dictionary for easy access
writer = csv.DictWriter(ofile, fieldnames=fieldnames)
# output file header
writer.writeheader()
# iterate through list of input files
for ifilename in infiles:
    # open input file
    ifile = open(ifilename, "rb+")
    # access it as a dictionary for easy access
    reader = csv.DictReader(ifile)
    # get the rows in reverse order
    rows = list(reader)
    rows.reverse()
    # get the last row
    row = rows[0]
    # output row to output csv
    writer.writerow({'parameter': ifilename.translate(None, '.csv'), 'time': row['time'], 'cp': row['cp'], 'cd': row['cd']})
    # close input file
    ifile.close()
# close output file
ofile.close()

Split your problem into smaller pieces:
looping over directory
getting last line
writing to your new csv.
I have tried to be very verbose, so you could try something like this:
import os

def get_last_line_of_this_file(filename):
    line = ''  # returned unchanged if the file is empty
    with open(filename) as f:
        for line in f:
            pass
    return line

def get_all_csv_filenames(directory):
    for filename in os.listdir(directory):
        if filename.endswith('.csv'):
            # os.listdir() returns bare names, so join them with the directory
            yield os.path.join(directory, filename)

def write_your_new_csv_file(new_filename):
    with open(new_filename, 'w') as writer:
        for filename in get_all_csv_filenames('now the path to your dir'):
            last_line = get_last_line_of_this_file(filename)
            writer.write(filename + ' ' + last_line)

if __name__ == '__main__':
    write_your_new_csv_file('your_created_filename.csv')
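For Python 3, a variant that goes through the csv module might look like the sketch below. Two points differ from the code in the question. The file name is shortened with os.path.splitext(), because translate(None, '.csv') in Python 2 deletes every '.', 'c', 's' and 'v' character rather than just the extension. The output file is also skipped explicitly, since it may match the same *.csv pattern. The function names last_row and collect_last_rows are hypothetical.

```python
import csv
import glob
import os


def last_row(path):
    """Return the final non-empty row of a CSV file, or None if there is none."""
    last = None
    with open(path, newline="") as handle:
        for row in csv.reader(handle):
            if row:  # csv.reader yields [] for blank lines
                last = row
    return last


def collect_last_rows(pattern, out_path):
    """Write '<file name without extension>, <last row>' for every matching file."""
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        for path in sorted(glob.glob(pattern)):
            # The output file already exists at this point and may match the pattern.
            if os.path.abspath(path) == os.path.abspath(out_path):
                continue
            row = last_row(path)
            if row is not None:
                name = os.path.splitext(os.path.basename(path))[0]
                writer.writerow([name] + row)
```

Calling collect_last_rows('*.csv', 'outputFile.csv') from the folder that holds the simulation output would then replace the manual copy-and-paste step.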
