Python: automating text data files into CSV
I am trying to automate a process for a specific folder that contains multiple text files following the same data format/structure. In the text files, the data is separated by commas. I want to be able to output all of these text files into one cumulative CSV file. This is what I currently have; I seem to be stuck because of my lack of Python knowledge.
from collections import defaultdict
import glob

def get_site_files():
    sites = defaultdict(list)
    for fname in glob.glob('*.txt'):
        csv_out = csv.writer(open('out.csv', 'w'), delimiter=',')
        f = open('myfile.txt')
        for line in f:
            vals = line.split(',')
            csv_out.writerow()
        f.close()
EDIT, to bring up the comments: I want to make sure that all of the text files are read, not just myfile.txt.
Also, if I could combine them all into one large .txt file and then turn that into a CSV, that would be great too; I'm just not sure of the exact way to do this.
Just a little bit of reordering of your code.
import csv
import glob

def get_site_files():
    # newline='' lets the csv module manage line endings itself
    with open('out.csv', 'w', newline='') as out_file:
        csv_out = csv.writer(out_file, delimiter=',')
        for fname in glob.glob('*.txt'):
            with open(fname) as f:
                for line in f:
                    # strip the trailing newline so it doesn't end up in the last field
                    vals = line.rstrip('\n').split(',')
                    csv_out.writerow(vals)

get_site_files()
But since they are all in the same format you can just concatenate them:
import glob

with open('out.csv', 'w') as fout:
    for fname in glob.glob('*.txt'):
        with open(fname, 'r') as fin:
            fout.write(fin.read())
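One caveat with plain concatenation: if a source file doesn't end in a newline, its last line will run into the first line of the next file, so you may want a fout.write('\n') after each file.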
You could also try a different way:
I used os.listdir() once. That gives you a list of all the files in your directory. In combination with os.path.join you can manage all *.csv files in a certain directory.
Some additional information can be found in the reference: os and os.path.
So I would just loop through all the files in the directory (searching for those ending in ".csv"); for each of them, store each line in a list as a string, split the strings on the column delimiter, change "," to "." in the resulting strings, and concatenate the strings again. Afterwards, push each line of the list to the output file you wish to use; a sketch of this approach follows below.
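A sketch of that approach (the directory name, output name, and ';' column delimiter are assumptions for illustration, not from the answer):

import os

directory = 'data'  # assumed input directory
out_lines = []

for filename in os.listdir(directory):
    if filename.endswith('.csv'):  # search for files ending in ".csv"
        with open(os.path.join(directory, filename)) as f:
            for line in f:
                fields = line.rstrip('\n').split(';')  # split on the assumed column delimiter
                fields = [field.replace(',', '.') for field in fields]  # change "," to "."
                out_lines.append(';'.join(fields))  # concatenate the strings again

with open('output.csv', 'w') as out_file:  # push each line to the output file
    out_file.write('\n'.join(out_lines) + '\n')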
For newbies, I highly recommend the Python standard library reference for an overview of Python's full functionality ;)
Hope that helps ;)
I adapted the code above and got working code that converts all the CSV files in a folder into one text file, appending all the CSV files. Works great.
import glob
import csv

def get_site_files():
    # newline='' lets the csv module manage line endings itself
    with open('out.txt', 'w', newline='') as out_file:
        csv_out = csv.writer(out_file, delimiter=',')
        for fname in glob.glob('*.csv'):
            with open(fname) as f:
                for line in f:
                    # strip the trailing newline so it doesn't end up in the last field
                    vals = line.rstrip('\n').split(',')
                    csv_out.writerow(vals)

get_site_files()
Related
Extracting individual elements from CSV file using python
I have the task of converting the data available in CSV files into JSON format. I did the same earlier for .txt files using .readlines(). However, I cannot find any suitable methods for CSV. Right now I am converting the .csv files into .txt and then running the operations. I have tried:

with open(file, 'r') as in_file:  # file has .csv extension
    Lines = in_file.readlines()
    out_filename_a = vid_name + "_" + ".json"
    for line in Lines:
        raw_list = line.strip().split(";")

The above code generates the desired outputs, but somehow the iteration does not work properly. I have also tried:

import csv

with open('X:\data.csv', 'rt') as f:
    data = csv.reader(f)
    for row in data:
        print(row)

The generated output looks like ['Programming language; Designed by; Appeared; Extension'], which is not really useful for me, as it is a single element in the list and I need the individual elements extracted from the output.
If I understand you correctly, your file contains this string:

['Programming language; Designed by; Appeared; Extension']

To parse it, you can use this example:

from ast import literal_eval

with open("your_file.txt", "r") as f_in:
    for line in map(str.strip, f_in):
        if line == "":
            continue
        line = literal_eval(line)
        for item in map(str.strip, line[0].split(";")):
            print(item)

Prints:

Programming language
Designed by
Appeared
Extension
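If the underlying .csv is actually plain semicolon-delimited text (rather than a file of Python list literals), a simpler route is to tell csv.reader the delimiter directly. A sketch, with an illustrative file name:

import csv

# Sketch: assumes the file is ordinary semicolon-delimited text;
# 'data.csv' is an illustrative name, not from the question.
with open('data.csv', newline='') as f:
    for row in csv.reader(f, delimiter=';'):
        print([field.strip() for field in row])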
Replacing Delimiter In CSV Files with Python
I have a folder with several CSV files. These files all contain the box-drawing double vertical and horizontal character as the delimiter. I am trying to import all these files into Python, change that delimiter to a pipe, and then save the new files into another location. The code I currently have runs without any errors but doesn't actually do anything. Any suggestions?

import os
import pandas as pd

directory = 'Y:/Data'
dirlist = os.listdir(directory)
file_dict = {}
x = 0

for filename in dirlist:
    if filename.endswith('.csv'):
        file_dict[x] = pd.read_csv(filename)
        column = file_dict[x].columns[0]
        file_dict[x] = file_dict[x][column].str.replace('╬', '|')
        file_dict[x].to_csv("python/file{}.csv".format(x))
        x += 1

(A picture of sample data was attached here.)
Instead of directly replacing occurrences with the new character (which may replace escaped occurrences of the character as well), we can just use built-in functionality in the csv library to read the file for us, and then write it again:

import csv

with open('myfile.csv', newline='') as infile, open('outfile.csv', 'w', newline='') as outfile:
    reader = csv.reader(infile, delimiter='╬')
    writer = csv.writer(outfile, delimiter='|')
    for row in reader:
        writer.writerow(row)

Adapted from the docs.
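Note the newline='' arguments: the csv module's documentation recommends opening files that way so the reader and writer can handle line endings themselves.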
with open(filename) as i:
    with open(filename + '.new', 'w+') as o:
        for line in i.readlines():
            o.write(line.replace('╬', '|'))

or, skip the Python and use sed from your terminal:

$ sed -i 's/╬/|/g' *.csv

Assuming the original delimiter doesn't appear in any escaped strings, this should be slightly faster than using the regular csv module. Pandas seems to do some filesystem voodoo when reading CSVs, so I wouldn't be too surprised if it is just as fast. sed will almost certainly beat them both by far.
os.walk-ing through directory to read and write all the CSVs
I have a bunch of folders and sub-folders with CSVs that have quotation marks that I need to get rid of, so I'm trying to build a script that iterates through and performs the operation on all CSVs. Below is the code I have. It correctly identifies what is and is not a CSV, and it re-writes them all, but it's writing blank data in, not the row data without the quotation marks. I know that this is happening around lines 14-19, but I don't know what to do.

import csv
import os

rootDir = '.'

for dirName, subDirList, fileList in os.walk(rootDir):
    print('Found directory: %s' % dirName)
    for fname in fileList:
        # Check if it's a .csv first
        if fname.endswith('.csv'):
            input = csv.reader(open(fname, 'r'))
            output = open(fname, 'w')
            with output:
                writer = csv.writer(output)
                for row in input:
                    writer.writerow(row)
        # Skip if not a .csv
        else:
            print 'Not a .csv!!'
The problem is here:

input = csv.reader(open(fname, 'r'))
output = open(fname, 'w')

As soon as you do that second open in 'w' mode, it erases the file, so your input is looping over an empty file. One way to fix this is to read the whole file into memory, and only then erase the whole file and rewrite it:

input = csv.reader(open(fname, 'r'))
contents = list(input)
output = open(fname, 'w')
with output:
    writer = csv.writer(output)
    for row in contents:
        writer.writerow(row)

You can simplify this quite a bit:

with open(fname, 'r') as infile:
    contents = list(csv.reader(infile))
with open(fname, 'w') as outfile:
    csv.writer(outfile).writerows(contents)

Alternatively, you can write to a temporary file as you go, and then move the temporary file on top of the original file. This is a bit more complicated, but it has a major advantage: if you hit an error (or someone turns off the computer) in the middle of writing, you still have the old file and can start over, instead of having 43% of the new file while all your data is lost:

import os
import tempfile

dname = os.path.dirname(fname)
with open(fname, 'r') as infile, tempfile.NamedTemporaryFile('w', dir=dname, delete=False) as outfile:
    writer = csv.writer(outfile)
    for row in csv.reader(infile):
        writer.writerow(row)
os.replace(outfile.name, fname)

If you're not using Python 3.3+, you don't have os.replace. On Unix you can just use os.rename instead, but on Windows it's a pain to get this right, and you probably want to look for a third-party library on PyPI. (I haven't used any of them, but if you're using Windows XP/2003 or later and Python 2.6/3.2 or later, pyosreplace looks promising.)
Python changing Comma Delimitation CSV
NEWBIE USING PYTHON (2.7.9): When I export a gzipped file to a CSV using:

myData = gzip.open('file.gz.DONE', 'rb')
myFile = open('output.csv', 'wb')
with myFile:
    writer = csv.writer(myFile)
    writer.writerows(myData)
print("Writing complete")

it is printing in the CSV with a comma delimiter between every character, e.g.

S,V,R,","2,1,4,0,",",2,0,1,6,1,1,3,8,0,4,",",5,0,5,0,1,3,4,2,0,6,4,7,3,6,4,",",",",2,0,0,0,5,6,5,9,2,9,6,7,4,",",2,0,0,7,2,4,5,2,3,5,",",0,0,0,2,"," I,V,E,",",",",",",E,N,",",N,/,A,",",0,4,2,1,4,4,9,3,7,0,",":,I,R,_,",",N,/,A,",",U,N,A,N,S,W,",",",",",",","," "
S,V,R,",",4,7,3,3,5,5,",",2,0,5,7,",",5,0,5,0,1,4,5,0,1,6,4,8,6,3,7,",",",",2,0,0,0,5,5,3,9,2,9,2,8,0,",",2,0,4,4,1,0,8,3,7,8,",",0,0,0,2,"," I,V,E,",",",",",",E,N,",",N,/,A,",",0,4,4,7,3,3,5,4,5,5,",",,:,I,R,_,",",N,/,A,",",U,N,A,N,S,W,",",",",",",",","

How do I get rid of the commas so that it is exported with the correct fields? e.g.

SVR,2144370,20161804,50501342364,,565929674,2007245235,0002,1,PPDAP,PPLUS,DEACTIVE,,,EN,N/A,214370,:IR_,N/A,,,,,
SVR,473455,208082557,14501648637,,2000553929280,2044108378,0002,1,3G,CODAP,INACTIVE,,,EN,N/A,35455,:IR_,N/A,,,,,
You are only opening the gzip file. I think you are expecting the opened file to act automatically like an iterator, which it does; however, each item is a text string. writerows expects an iterable whose items are arrays of values to write with comma separation. Thus, given an iterable whose items are strings, and given that a string is an array of characters, you get the result you found. Since you didn't mention what the gzip data lines really contain, I can't guess how to parse the lines into an array of reasonable chunks. But assuming a function called split_line appropriate to that data, you could do:

with gzip.open('file.gz.DONE', 'rb') as gzip_f:
    data = [split_line(l) for l in gzip_f]

with open('output.csv', 'wb') as myFile:
    writer = csv.writer(myFile)
    writer.writerows(data)
    print("Writing complete")

Of course at this point, doing it row by row and combining the with blocks makes sense. See https://docs.python.org/2/library/csv.html
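For the sample shown in the question, which already looks comma-separated, a hypothetical split_line could be as simple as:

def split_line(line):
    # Hypothetical helper (not from the original answer): drop the
    # trailing newline and split the record on commas.
    return line.rstrip('\r\n').split(',')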
I think it's simply because gzip.open() gives you a file-like object, but csvwriter.writerows() needs a list of lists of strings to do its work. But I don't understand why you want to use the csv module. It looks like you only want to extract the content of the gzip file and save it in an output file, uncompressed. You could do that like this:

import gzip

input_file_name = 'file.gz.DONE'
output_file_name = 'output.csv'

with gzip.open(input_file_name, 'rt') as input_file:
    with open(output_file_name, 'wt') as output_file:
        for line in input_file:
            output_file.write(line)

print("Writing complete")

If you want to use the csv module because you're not sure your input data is properly formatted (and you want an error message right away), you could then do:

import gzip
import csv

input_file_name = 'file.gz.DONE'
output_file_name = 'output.csv'

with gzip.open(input_file_name, 'rt', newline='') as input_file:
    reader_csv = csv.reader(input_file)
    with open(output_file_name, 'wt', newline='') as output_file:
        writer_csv = csv.writer(output_file)
        writer_csv.writerows(reader_csv)

print("Writing complete")

Is that what you were trying to do? It's difficult to guess because we don't have the input file. If it's not what you want, could you clarify what you want?
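Note the 'rt' mode in gzip.open: it defaults to binary ('rb'), and the Python 3 csv module expects text, so opening in text mode matters here.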
Since I now have the information that the gzipped file is itself comma-separated values, it simplifies thus:

with gzip.open('file.gz.DONE', 'rb') as gzip_f, open('output.csv', 'wb') as myFile:
    myFile.write(gzip_f.read())

In other words, it is just a roundabout gunzip to another file.
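For large inputs, a streaming copy avoids holding the whole decompressed file in memory. A sketch using the standard library's shutil.copyfileobj:

import gzip
import shutil

# Sketch: stream-decompress in chunks instead of reading everything at once.
with gzip.open('file.gz.DONE', 'rb') as gzip_f, open('output.csv', 'wb') as out_f:
    shutil.copyfileobj(gzip_f, out_f)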
Read CSV lines and save each as a separate txt file, named after the line
I have some problems with simple code. I have a CSV file with one column and hundreds of rows. I would like code that reads each line of the CSV and saves it as a separate txt file. What is important: the txt files should be named after the line that was read. Example: 1. Adam 2. Doroty 3. Pablo will give me adam.txt, doroty.txt and pablo.txt files. Please, help.
This should do what you need on Python 3.6+:

with open('file.csv') as f:  # Open the file with hundreds of rows
    for name in f.read().split('\n'):  # Get a list of all names
        with open(f'{name.strip()}.txt', 'w') as s:  # Create a file per name
            pass
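One caveat: a trailing newline in the file yields an empty name, which would create a file literally named .txt. A sketch with a guard (the .lower() matching the lowercase names in the question's example is an assumption):

with open('file.csv') as f:
    for name in f.read().split('\n'):
        name = name.strip()
        if not name:  # skip blank lines and the trailing newline
            continue
        with open(f'{name.lower()}.txt', 'w') as s:  # lowercase name assumed from the example
            pass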
Alternatively, you can use the built-in csv library to avoid any complications with parsing CSV files:

import csv

with open('names.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        file_name = '{0}.txt'.format(row['first_name'])
        with open(file_name, 'w') as f:
            pass
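Note that DictReader treats the first row of the file as a header, so this assumes the CSV has a header row and that the column is named first_name.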