Using Python to Merge Single Line .dat Files into one .csv file - python

I am a beginner in the programming world and would like some tips on how to solve a challenge.
Right now I have ~10,000 .dat files, each with a single line following this structure:
Attribute1=Value&Attribute2=Value&Attribute3=Value...AttributeN=Value
I have been trying to use python and the CSV library to convert these .dat files into a single .csv file.
So far I have been able to write something that reads all the files, stores the contents of each file on a new line, and substitutes the "&" with ",". But since Attribute1, Attribute2, ..., AttributeN are exactly the same for every file, I would like to turn them into column headers and remove them from every other line.
Any tips on how to go about that?
Thank you!

Since you are a beginner, I prepared some code that works, and is at the same time very easy to understand.
I assume that you have all the files in the folder called 'input'. The code beneath should be in a script file next to the folder.
Keep in mind that this code should be used to understand how a problem like this can be solved. Optimisations and sanity checks have been left out intentionally.
You might additionally want to check what happens when a value is missing in some line, what happens when an attribute is missing, what happens with corrupted input, etc. :)
Good luck!
import os

# this function splits the attribute=value pairs into two lists:
# the first list holds all the attributes,
# the second list holds all the values
def getAttributesAndValues(line):
    attributes = []
    values = []
    # first we split the input over the '&'
    attributeValues = line.strip().split('&')
    for attrVal in attributeValues:
        # we split the attribute=value over the '=' sign
        # the attribute goes to split[0], the value goes to split[1]
        split = attrVal.split('=')
        attributes.append(split[0])
        values.append(split[1])
    # return the attributes list and values list
    return attributes, values

# test the function using the line beneath so you understand how it works
# line = "Attribute1=Value&Attribute2=Value&Attribute3=Value&AttributeN=Value"
# print(getAttributesAndValues(line))

# this function appends a single input file to the output file
def writeToCsv(inFile, wfile="outFile.csv", delim=","):
    # check whether the output file is empty *before* opening it for
    # appending, so the header is written exactly once
    writeHeader = not os.path.exists(wfile) or os.path.getsize(wfile) == 0
    with open(inFile, 'r') as f_in, open(wfile, 'a') as f_out:
        # loop through every line in the file and write its values
        for line in f_in:
            header, values = getAttributesAndValues(line)
            # we write the header only if the output file was empty
            if writeHeader:
                f_out.write(delim.join(header) + "\n")
                writeHeader = False
            # we write the values
            f_out.write(delim.join(values) + "\n")

# read all the file names in the input folder
# (os.listdir does not return '.' or '..', so nothing needs skipping)
allInputFiles = os.listdir('input/')
# loop through all the files and write their values to the csv file
for singleFile in allInputFiles:
    writeToCsv('input/' + singleFile)
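To illustrate one of the sanity checks mentioned above, here is a minimal sketch of a more defensive version of the parsing function; the function name `getAttributesAndValuesSafe` and the sample line are made up for illustration:

```python
# hypothetical, more defensive variant of the parser:
# skip malformed attribute=value pairs instead of crashing on them
def getAttributesAndValuesSafe(line):
    attributes, values = [], []
    for attrVal in line.strip().split('&'):
        parts = attrVal.split('=', 1)  # split only on the first '='
        if len(parts) != 2:
            continue  # malformed pair (no '='): skip it
        attributes.append(parts[0])
        values.append(parts[1])
    return attributes, values

print(getAttributesAndValuesSafe("A=1&broken&B=2"))
# the malformed "broken" pair is ignored: (['A', 'B'], ['1', '2'])
```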

but since the Attribute1,Attribute2...AttributeN are exactly the same
for every file, I would like to make them into column headers and
remove them from every other line.
input = 'Attribute1=Value1&Attribute2=Value2&Attribute3=Value3'
Once, for the first file:
','.join(k for (k,v) in map(lambda s: s.split('='), input.split('&')))
for each file's content:
','.join(v for (k,v) in map(lambda s: s.split('='), input.split('&')))
Maybe you need to trim the strings additionally; I don't know how clean your input is.
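Putting those two expressions together, a rough end-to-end sketch might look like this; the `demo_input` folder, file names, and values are made up so the snippet runs as-is, so swap in your real folder:

```python
import glob
import os

# create a throwaway folder with two made-up .dat files
os.makedirs('demo_input', exist_ok=True)
for name, line in [('a.dat', 'Attribute1=1&Attribute2=2'),
                   ('b.dat', 'Attribute1=3&Attribute2=4')]:
    with open(os.path.join('demo_input', name), 'w') as f:
        f.write(line)

rows = []
for i, path in enumerate(sorted(glob.glob('demo_input/*.dat'))):
    with open(path) as f:
        content = f.read().strip()
    pairs = list(map(lambda s: s.split('='), content.split('&')))
    if i == 0:
        # header once, from the first file only
        rows.append(','.join(k for (k, v) in pairs))
    rows.append(','.join(v for (k, v) in pairs))

csv_text = '\n'.join(rows)
print(csv_text)
```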

Put the dat files in a folder called myDats. Put this script next to the myDats folder; it will create output.csv alongside them. [That is, you will have output.csv, myDats, and mergeDats.py in the same folder]
mergeDats.py
import csv
import os

rows = []
for file in os.listdir('myDats'):
    with open("myDats/" + file, "r") as f:
        line = f.readlines()[0].strip()
    # build one dictionary per .dat file
    row = {}
    for pair in line.split("&"):
        key, value = pair.split("=")
        row[key] = value
    rows.append(row)

# newline='' avoids blank lines on Windows (use 'wb' in Python 2.x)
with open('output.csv', 'w', newline='') as output:
    w = csv.DictWriter(output, rows[0].keys())
    w.writeheader()
    w.writerows(rows)

Related

Weird appending to Json in python

I need to append a new line in 9000 JSON files, so I want to automate that. I need to put the new line between the "name" and "description" entries, but every time I try, I get a weird result.
sample file
I tried to search for how to do it but didn't find any good results.
Problem solved.
Basically, I understood that I can store all the lines in a list and rewrite the file.
Now I open the file, store the data, add my text from a string to the list, and rewrite the file with the text.
# The files I need to work with have numbers as names,
# which makes the process easy
fileIndex = 1
# We will use that number in the string NLink
Number = 1
while fileIndex < 6:
    # build the line we want to insert
    NLink = f'  "Link" : "https://Link_placeHolder/{Number}.jpg",\n'
    # open the original file for reading, to store the lines
    with open(f"C:\\Cteste\\Original\\{fileIndex}.json", "r") as f:
        # create a list with all the lines of the file
        lines = f.readlines()
    # open a new file for writing
    with open(f"C:\\Cteste\\New\\{fileIndex}.json", "w") as f2:
        # insert the link at a specific position in the list,
        # in my case index 2
        lines.insert(2, NLink)
        # write everything to the new file
        for i in lines:
            f2.write(i)
    # add to index
    fileIndex += 1
    # add to number
    Number += 1
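A hedged alternative worth knowing: rather than inserting a raw line at a fixed index, you could parse the JSON, rebuild the dictionary with the new key between "name" and "description", and dump it back. The key names and URL here are placeholders:

```python
import json

doc = json.loads('{"name": "sample", "description": "demo"}')
new_doc = {}
for key, value in doc.items():
    new_doc[key] = value
    if key == "name":
        # insert the new entry right after "name"
        new_doc["Link"] = "https://Link_placeHolder/1.jpg"

# dicts keep insertion order in Python 3.7+
print(json.dumps(new_doc, indent=4))
```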

Multiple passes through csv.reader in python

I'm trying to implement a nested "for" loop to search across CSV files: a 'name' found in one CSV file is then searched for in the other file. Here is a code example:
import csv
import re

# Open the input files
with open("Authentication.csv", "r") as citiAuthen:
    with open("Authorization.csv", "r") as citiAuthor:
        # Set up the CSV readers and process the headers
        csvAuthen = csv.reader(citiAuthen, quoting=csv.QUOTE_ALL, skipinitialspace=True)
        headerAuthen = next(csvAuthen)
        userIndex = headerAuthen.index("'USERNAME'")
        statusIndex = headerAuthen.index("'STATUS'")

        csvAuthor = csv.reader(citiAuthor)
        headerAuthor = next(csvAuthor)
        userAuthorIndex = headerAuthor.index("'USERNAME'")
        iseAuthorIndex = headerAuthor.index("'ISE_NODE'")

        # Make an empty list
        userList = []
        usrNumber = 0
        # Loop through the authen file and build a list of users
        for row in csvAuthen:
            user = row[userIndex]
            #status = row[statusIndex]
            #if status == "'Pass'":
            for rowAuthor in csvAuthor:
                userAuthor = rowAuthor[userAuthorIndex]
                print(userAuthor)
What is happening is that "print(userAuthor)" makes just one pass, while it should make as many passes as there are rows in csvAuthen.
What am I doing wrong? Any help is really appreciated.
You're reading both files line-by-line from storage. When you search csvAuthor the first time, if the value you are searching for is not found, the file pointer remains at the end of the file after the search. The next search will then start at the end of the file and return immediately. You would need to reset the file pointer to the beginning of the file before each search. It is probably better just to read both files into memory before you start searching them.
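A self-contained sketch of that read-into-memory approach, using in-memory sample data in place of the real CSV files so the snippet runs as-is:

```python
import csv
import io

# sample data standing in for Authentication.csv / Authorization.csv
authen_text = "'USERNAME','STATUS'\n'alice','Pass'\n'bob','Pass'\n"
author_text = "'USERNAME','ISE_NODE'\n'alice','node1'\n'bob','node2'\n"

# read the authorization rows into a list ONCE, before the nested loop
csvAuthor = csv.reader(io.StringIO(author_text))
headerAuthor = next(csvAuthor)
userAuthorIndex = headerAuthor.index("'USERNAME'")
authorRows = list(csvAuthor)  # in memory; can be iterated many times

csvAuthen = csv.reader(io.StringIO(authen_text), quoting=csv.QUOTE_ALL,
                       skipinitialspace=True)
headerAuthen = next(csvAuthen)
userIndex = headerAuthen.index("'USERNAME'")

matches = []
for row in csvAuthen:
    user = row[userIndex]
    # the inner loop restarts from the first row on every outer pass
    for rowAuthor in authorRows:
        if rowAuthor[userAuthorIndex] == user:
            matches.append((user, rowAuthor[1]))

print(matches)
```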

How to append the data from a text file to a list only from a given starting point rather than appending the full file using python

I want to append data from a text file to a list, starting from a given point. The starting-point string can be anywhere in the file, and I want to append the data from that point onward. I tried using the startswith method:
list1 = []
TextFile = "txtfile.txt"
# open the file for data processing
with open(TextFile, 'rt', encoding="utf8") as IpFile:
    for i, j in enumerate(IpFile):
        if j.startswith("Starting point"):
            list1.append(str(j).strip())
        i += 1
but it only appends the starting-point line itself. I want to append all the data from the starting point onward. How do I do that?
Use a boolean variable
list1 = []
TextFile = "txtfile.txt"
doAppend = False
# open the file for data processing
with open(TextFile, 'rt', encoding="utf8") as IpFile:
    for i, j in enumerate(IpFile):
        if j.startswith("Starting point"):
            doAppend = True
        if doAppend:
            list1.append(str(j).strip())
You could do it without a bool as well by breaking the for loop and then reading the rest of the file.
list1 = []
text_file = "txtfile.txt"
# rt is the default so there's no need to specify it
with open(text_file, encoding="utf8") as ip_file:
    for line in ip_file:
        if line.startswith("Starting point"):
            break
    # read the rest of the file
    remainder = ip_file.read()
    # extend the list with the rest of the file, split on newlines
    list1.extend(remainder.split('\n'))
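For what it's worth, the two approaches differ in one detail: the flag version keeps the "Starting point" line itself, while the break-then-read version starts after it. A quick self-contained check with made-up sample text:

```python
import io

sample_text = "header\nStarting point A\ndata 1\ndata 2\n"

# flag version: keeps the marker line
flag_result = []
do_append = False
for line in io.StringIO(sample_text):
    if line.startswith("Starting point"):
        do_append = True
    if do_append:
        flag_result.append(line.strip())

# break-then-read version: skips the marker line
break_result = []
fh = io.StringIO(sample_text)
for line in fh:
    if line.startswith("Starting point"):
        break
break_result.extend(fh.read().split('\n'))

print(flag_result)   # ['Starting point A', 'data 1', 'data 2']
print(break_result)  # ['data 1', 'data 2', '']
```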

How would I append two lists already in a txt file?

I want to be able to create two lists: date and data.
import time
date = [time.strftime("%Y/%m/%d %I:%M%p")]
data = []
x = input()
data.append(x)
with open("RapData.txt", "a") as output:
    output.write(str(date))
    output.write(str(data))
This code makes the two lists and saves everything on one line in the txt file, like this if run twice:
['2017/06/28 02:15PM']['x']['2017/06/28 02:15PM']['x']
and I want it to be:
['2017/06/28 02:15PM']['2017/06/28 02:15PM']
['x']['x']
You need to write the newline character to the file as well:
import time
date = [time.strftime("%Y/%m/%d %I:%M%p")]
data = [input()]
with open("RapData.txt", "a") as f:
    f.write(str(date))
    f.write('\n')
    f.write(str(data))
To achieve what you are asking for, you can't use append (as append adds items to the end of the file).
You would need to read the data into a local variable and write it out to the file again:
open("RapData.txt","r")
... read code...
open("RapData.txt","w")
... write code..
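One hedged sketch of that read-then-rewrite idea, keeping all the dates on the first line and all the data on the second; the helper name `add_entry` is made up:

```python
import time

def add_entry(path, new_date, new_value):
    # read the existing two lines, or start fresh if the file is missing
    try:
        with open(path) as f:
            lines = f.read().split('\n')
    except FileNotFoundError:
        lines = ['', '']
    # append the new items to their respective lines, then rewrite the file
    lines[0] += str([new_date])
    lines[1] += str([new_value])
    with open(path, 'w') as f:
        f.write('\n'.join(lines))

add_entry("RapData.txt", time.strftime("%Y/%m/%d %I:%M%p"), "x")
```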

Replace a particular text from a CSV file without knowing the text

I need to replace a particular value in a text file, without knowing the value of the string to be replaced. All I know is the line number, and the location within that line, of the value that has to be replaced. This has to be done using Python 2.7.
For example, the file is like this:
a,b,c,s,d,f
s,d,f,g,d,f
a,d,s,f,g,r
a,s,d,f,e,c
My code goes like this:
point_file = open(pointfile, 'r+')
read_lines = point_file.readlines()
arr1 = []
for i in range(len(read_lines)):
    arr1.append(read_lines[i])
Now I need to replace
arr1[3].split(',')[3]
How to do that?
Edit
I do not wish to achieve this by writing a temporary copy file and then overwriting the existing file. I need to edit the value in place in the existing file.
OK, so I'm assuming fields can have any value (or the following can be shortened considerably by clever substitution tricks).
from __future__ import print_function

target = (3, 3)  # Coordinates of the replaced value
new_val = 'X'    # New value for the replaced cells

with open('src.txt') as f_src:
    data = f_src.read().strip()
table = [line.split(',') for line in data.split('\n')]
old_val = table[target[0]][target[1]]
new_data = '\n'.join(
    ','.join(
        new_val if cell == old_val else cell
        for cell in row)
    for row in table)
with open('tgt.txt', 'w') as f_tgt:
    print(new_data, file=f_tgt)
My test src.txt:
a,b,c,s,d,f
s,d,f,g,d,f
a,d,s,f,g,r
a,s,d,f,e,c
My output tgt.txt:
a,b,c,s,d,X
s,d,X,g,d,X
a,d,s,X,g,r
a,s,d,X,e,c
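Note that this replaces every cell whose value equals the old one, as the output above shows. If you want to change only the single cell at the target coordinates, a variant (with the same sample data inlined so it runs as-is) could be:

```python
# sample data inlined in place of reading src.txt
data = "a,b,c,s,d,f\ns,d,f,g,d,f\na,d,s,f,g,r\na,s,d,f,e,c"
target = (3, 3)   # coordinates of the one cell to replace
new_val = 'X'

table = [line.split(',') for line in data.split('\n')]
table[target[0]][target[1]] = new_val  # touch only that one cell
result = '\n'.join(','.join(row) for row in table)
print(result)
```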
Try like this. Read the data as a csv file and turn it into a list of lists. You can then alter the value at the required index [3][3] and write back to another csv.
import csv

with open('indata.csv') as f:
    lines = [line for line in csv.reader(f)]
# change the required value
lines[3][3] = 'X'
with open('outdata.csv', 'w') as fout:
    csv.writer(fout).writerows(lines)
[Solved] It turns out that one cannot edit one particular value in the file without overwriting the whole file. This could be memory-intensive in cases where I have a lot of data to be saved in the same file. So, I replaced saving the data in a file with saving the data in an array.
