i have a problem with the iteration process in python, I've tried and search the solutions, but i think this more complex than my capability (fyi, I've been writing code for 1 month).
The case:
Let say i have 3 csv files (the actual is 350 files), they are file_1.csv, file_2.csv, file_3.csv. I've done the iteration process/algorithm to create all of the filenames in into single list.
each csv contains single column with so many rows.
i.e.
#actual cvs much more like this:
# for file_1.csv:
value_1
value_2
value_3
Below is not the actual csv content (i mean i have converted them into array/series)
file_1.csv --> [['value_1'],['value_2'],['value_3']]
file_2.csv --> [['value_4'],['value_5']]
file_3.csv --> [['value_6']]
#first step was done, storing csv files name to a list, so it can be read and use in csv function.
filename = ['file_1.csv', 'file_2.csv', 'file_3.csv']
I want the result as a list:
#assigning a empty list
result = []
Desired result
print (result)
out:
[{'keys': 'file_1', 'values': 'value_1, value_2, value_3'},
{'keys': 'file_2', 'values': 'value_4, value_5'}
{'keys': 'file_3', 'values': 'value_6'}]
See above that the result's keys are no more containing ('.csv') at the end of file name, they are all replaced. And note that csv values (previously as a list of list or series) become one single string - separated with comma.
Any help is appreciated, Thank you very much
I'd like to answer this to the best of my capacity (I'm a newbie too).
Step1: Reading those 350 filenames
(if you've not figured out already, you could use glob module for this step)
Define the directory where the files are placed, let's say 'C:\Test'
directory = "C:/Test"
import glob
filename = sorted (glob.glob(directory, + "/*.csv"))
This will read all the 'CSV' files in the directory.
Step2: Reading CSV files and mapping them to dictionaries
result = []
import os
for file in files:
filename = str (os.path.basename(file).split('.')[0]) # removes the CSV extension from the filename
with open (file, 'r') as infile:
tempvalue = []
tempdict = {}
print (filename)
for line in infile.readlines():
tempvalue.append(line.strip()) # strips the lines and adds them to a list of temporary values
value = ",".join(tempvalue) # converts the temp list to a string
tempdict[filename] = value # Assigns the filename as key and the contents as value to a temporary dictionary
result.append(tempdict) # Adds the new temp dictionary for each file to the result list
print (result)
This piece of code should work (though there might be a smaller and more pythonic code someone else might share).
Since it seems that the contents of the files is already pretty much in the format you need them (bar the line endings) and you have the names of the 350 files in a list, there isn't a huge amount of processing you need to do. It's mainly a question of reading the contents of each file, and stripping the newline characters.
For example:
import os
result = []
filenames = ['file_1.csv', 'file_2.csv', 'file_3.csv']
for name in filenames:
# Set the filename minus extension as 'keys'
file_data = {'keys': os.path.basename(name).split('.')[0]}
with open(name) as f:
# Read the entire file
contents = f.read()
# Strip the line endings (and trailing comma), and set as 'values'
file_data['values'] = contents.replace(os.linesep, ' ').rstrip(',')
result.append(file_data)
print(result)
I'm trying to create a list of text files from a directory so I can extract key data from them, however, the list my function returns also contains a list of the file pathways as the first item of the list. I've tried del full_text[0] which didn't work, as well as any other value, and also the remove function. Any ideas as to why this might be happening?
Thanks
import glob
file_paths = []
file_paths.extend(glob.glob("C:\Users\12342255\PycharmProjects\Sequence diagrams\*"))
matching_txt = [s for s in file_paths if ".txt" in s]
print matching_txt
full_text = []
def fulltext():
for file in matching_txt:
f = open(file, "r")
ftext = f.read()
all_seqs = ftext.split("title ")
print all_seqs
full_text.append(fulltext())
print full_text
You can use slicing to get rid of the first element - full_text[1:]. This creates a copy of the list. Otherwise, you can full_text.pop(0) and resume using full_text
I see at least two ways to do so:
1) you can create a new list from first position e.g. newList = oldList[1:]
2) use remove method - full_text.remove(full_text[0])
I have to write a script that lists all the text files in a directory, then counts the number of lines in each file and then gives you the max amount, the minimum amount and average.
so far I have this:
import glob
import os
def file_len(fname):
with open(fname) as f:
for i, l in enumerate(f, start = 1):
pass
return i
files = glob.glob("/home/seb/Learning/*.txt")
print files
length = []
for file in files:
file_len(file)
length.append(i)
print length
As you (and me) could expect it works up until
length.append(i)
because i is not identified - I thought it was worth a shot though.
My question would be, how can I use the return of the function to append it to a list?
You need to assign the return value of file_len(file) to a variable:
flength = file_len(file)
length.append(flength)
The name i is a local name in file_len and not visible outside of the function, but the function does return the value.
Im getting this error and i have no idea what it means, i can get the program to print the files from there values but its just a long incoherent now im trying to get it to print it in an organized manor and thats where the issues arise.
import os
def listfiles (path):
files = []
for dirName, subdirList, fileList in os.walk(path):
dir = dirName.replace(path, '')
for fname in fileList:
files.append(os.path.join(dir, fname))
return files
a = input('Enter a primary file path: ')
b = input('Enter a secondary file path: ')
x = listfiles(a)
y = llistfiles(b)
files_only_x = set(x) - set (y)
files_only_y = set(y) - set (x)
this next line of code is where python is saying the error is
for dirName, subdirList, fileList in files_only_x:
print ('Directory: %s' % dirName)
for fname in fileList:
print ('\%s' % fname)
Your files_only_x is a set of single values; your listfiles() function returns a list of strings, not of tuples with 3 values:
for fname in files_only_x:
print ('\\%s' % fname)
You built files as a list of strings, therefore the loop in your 2nd code block is wrong as it suggests files is list of 3-value tuples.
Look at the data flow:
You call listfiles() with a path. It collects all files below that path in a list.
(BTW, IMHO dir = dirName.replace(path, '') is dangerous. What happens if path is lib/ and you encouter a sub path lib/misc/collected/lib/whatever? While this path males not much sense, it might have been created...)
You return this list from listfiles() and then convert them into sets.
If you try to iterate over these sets, you get one path per iteration step.
I compare two text files and print out the results to a 3rd file. I am trying to make it so the script i'm running would iterate over all of the folders that have two text files in them, in the CWD of the script.
What i have so far:
import os
import glob
path = './'
for infile in glob.glob( os.path.join(path, '*.*') ):
print('current file is: ' + infile)
with open (f1+'.txt', 'r') as fin1, open(f2+'.txt', 'r') as fin2:
Would this be a good way to start the iteration process?
It's not the most clear code but it gets the job done. However, i'm pretty sure i need to take the logic out of the read / write methods but i'm not sure where to start.
What i'm basically trying to do is have a script iterate over all of the folders in its CWD, open each folder, compare the two text files inside, write a 3rd text file to the same folder, then move on to the next.
Another method i have tried is as follows:
import os
rootDir = 'C:\\Python27\\test'
for dirName, subdirList, fileList in os.walk(rootDir):
print('Found directory: %s' % dirName)
for fname in fileList:
print('\t%s' % fname)
And this outputs the following (to give you a better example of the file structure:
Found directory: C:\Python27\test
test.py
Found directory: C:\Python27\test\asdd
asd1.txt
asd2.txt
Found directory: C:\Python27\test\chro
ch1.txt
ch2.txt
Found directory: C:\Python27\test\hway
hw1.txt
hw2.txt
Would it be wise to put the compare logic under the for fname in fileList? How do i make sure it compares the two text files inside the specific folder and not with other fnames in the fileList?
This is the full code that i am trying to add this functionality into. I appologize for the Frankenstein nature of it but i am still working on a refined version but it does not work yet.
from collections import defaultdict
from operator import itemgetter
from itertools import groupby
from collections import deque
import os
class avs_auto:
def load_and_compare(self, input_file1, input_file2, output_file1, output_file2, result_file):
self.load(input_file1, input_file2, output_file1, output_file2)
self.compare(output_file1, output_file2)
self.final(result_file)
def load(self, fileIn1, fileIn2, fileOut1, fileOut2):
with open(fileIn1+'.txt') as fin1, open(fileIn2+'.txt') as fin2:
frame_rects = defaultdict(list)
for row in (map(str, line.split()) for line in fin1):
id, frame, rect = row[0], row[2], [row[3],row[4],row[5],row[6]]
frame_rects[frame].append(id)
frame_rects[frame].append(rect)
frame_rects2 = defaultdict(list)
for row in (map(str, line.split()) for line in fin2):
id, frame, rect = row[0], row[2], [row[3],row[4],row[5],row[6]]
frame_rects2[frame].append(id)
frame_rects2[frame].append(rect)
with open(fileOut1+'.txt', 'w') as fout1, open(fileOut2+'.txt', 'w') as fout2:
for frame, rects in sorted(frame_rects.iteritems()):
fout1.write('{{{}:{}}}\n'.format(frame, rects))
for frame, rects in sorted(frame_rects2.iteritems()):
fout2.write('{{{}:{}}}\n'.format(frame, rects))
def compare(self, fileOut1, fileOut2):
with open(fileOut1+'.txt', 'r') as fin1:
with open(fileOut2+'.txt', 'r') as fin2:
lines1 = fin1.readlines()
lines2 = fin2.readlines()
diff_lines = [l.strip() for l in lines1 if l not in lines2]
diffs = defaultdict(list)
with open(fileOut1+'x'+fileOut2+'.txt', 'w') as result_file:
for line in diff_lines:
d = eval(line)
for k in d:
list_ids = d[k]
for i in range(0, len(d[k]), 2):
diffs[d[k][i]].append(k)
for id_ in diffs:
diffs[id_].sort()
for k, g in groupby(enumerate(diffs[id_]), lambda (i, x): i - x):
group = map(itemgetter(1), g)
result_file.write('{0} {1} {2}\n'.format(id_, group[0], group[-1]))
def final(self, result_file):
with open(result_file+'.txt', 'r') as fin:
lines = (line.split() for line in fin)
for k, g in groupby(lines, itemgetter(0)):
fst = next(g)
lst = next(iter(deque(g, 1)), fst)
with open('final/{}.avs'.format(k), 'w') as fout:
fout.write('video0=ImageSource("old\%06d.jpeg", {}-3, {}+3, 15)\n'.format(fst[1], lst[2]))
fout.write('video1=ImageSource("new\%06d.jpeg", {}-3, {}+3, 15)\n'.format(fst[1], lst[2]))
fout.write('video0=BilinearResize(video0,640,480)\n')
fout.write('video1=BilinearResize(video1,640,480)\n')
fout.write('StackHorizontal(video0,video1)\n')
fout.write('Subtitle("ID: {}", font="arial", size=30, align=8)'.format(k))
using the load_and_compare() function, i define two input text files, two output text files, a file for the comparison results and a final phase that writes many files for all of the differences.
What i am trying to do is have this whole class run on the current working directory and go through every sub folder, compare the two text files, and write everything into the same folder, specifically the final() results.
You can indeed use os.walk(), since that already separates the directories from the files. You only need the directories it returns, because that's where you're looking for your 2 specific files.
You could also use os.listdir() but that returns directories as well files in the same list, so you would have to check for directories yourself.
Either way, once you have the directories, you iterate over them (for subdir in dirnames) and join the various path components you have: The dirpath, the subdir name that you got from iterating over the list and your filename.
Assuming there are also some directories that don't have the specific 2 files, it's a good idea to wrap the open() calls in a try..except block and thus ignore the directories where one of the files (or both of them) doesn't exist.
Finally, if you used os.walk(), you can easily choose if you only want to go into directories one level deep or walk the whole depth of the tree. In the former case, you just clear the dirnames list by dirnames[:] = []. Note that dirnames = [] wouldn't work, since that would just create a new empty list and put that reference into the variable instead of clearing the old list.
Replace the print("do something ...") with your program logic.
#!/usr/bin/env python
import errno
import os
f1 = "test1"
f2 = "test2"
path = "."
for dirpath, dirnames, _ in os.walk(path):
for subdir in dirnames:
filepath1, filepath2 = [os.path.join(dirpath, subdir, f + ".txt") for f in f1, f2]
try:
with open(filepath1, 'r') as fin1, open(filepath2, 'r') as fin2:
print("do something with " + str(fin1) + " and " + str(fin2))
except IOError as e:
# ignore directiories that don't contain the 2 files
if e.errno != errno.ENOENT:
# reraise exception if different from "file or directory doesn't exist"
raise
# comment the next line out if you want to traverse all subsubdirectories
dirnames[:] = []
Edit:
Based on your comments, I hope I understand your question better now.
Try the following code snippet instead. The overall structure stays the same, only now I'm using the returned filenames of os.walk(). Unfortunately, that would also make it harder to do something like "go only into the subdirectories 1 level deep", so I hope walking the tree recursively is fine with you. If not, I'll have to add a little code to later.
#!/usr/bin/env python
import fnmatch
import os
filter_pattern = "*.txt"
path = "."
for dirpath, dirnames, filenames in os.walk(path):
# comment this out if you don't want to filter
filenames = [fn for fn in filenames if fnmatch.fnmatch(fn, filter_pattern)]
if len(filenames) == 2:
# comment this out if you don't want the 2 filenames to be sorted
filenames.sort(key=str.lower)
filepath1, filepath2 = [os.path.join(dirpath, fn) for fn in filenames]
with open(filepath1, 'r') as fin1, open(filepath2, 'r') as fin2:
print("do something with " + str(fin1) + " and " + str(fin2))
I'm still not really sure what your program logic does, so you will have to interface the two yourself.
However, I noticed that you're adding the ".txt" extension to the file name explicitly all over your code, so depending on how you are going to use the snippet, you might or might not need to remove the ".txt" extension first before handing the filenames over. That would be achieved by inserting the following line after or before the sort:
filenames = [os.path.splitext(fn)[0] for fn in filenames]
Also, I still don't understand why you're using eval(). Do the text files contain python code? In any case, eval() should be avoided and be replaced by code that's more specific to the task at hand.
If it's a list of comma separated strings, use line.split(",") instead.
If there might be whitespace before or after the comma, use [word.strip() for word in line.split(",")] instead.
If it's a list of comma separated integers, use [int(num) for num in line.split(",")] instead - for floats it works analogously.
etc.