Replace a particular text from a CSV file without knowing the text - python

I need to replace a particular value in a text file without knowing the string to be replaced. All I know is the line number and the position of the value within that line. This has to be done using Python 2.7
for example, the file is like this:
a,b,c,s,d,f
s,d,f,g,d,f
a,d,s,f,g,r
a,s,d,f,e,c
My code goes like this:
point_file = open(pointfile, 'r+')
read_lines = point_file.readlines()
arr1 = []
for i in range(len(read_lines)):
    arr1.append(read_lines[i])
Now I need to replace
arr1[3].split(',')[3]
How do I do that?
Edit
I do not wish to achieve this using a temporary copy file and then overwriting the existing file. I need to edit the value in place in the existing file.
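For what it's worth, a file on disk can only be patched truly in place when the replacement is exactly the same length as the original field; otherwise the rest of the file has to be rewritten. A minimal sketch of the same-length case (the function name and layout are my own; it assumes Unix-style '\n' newlines):

```python
def replace_field_in_place(path, row, col, new_val):
    """Overwrite field (row, col) on disk; only works for same-length values."""
    with open(path, 'rb+') as f:
        lines = f.readlines()
        fields = lines[row].rstrip(b'\n').split(b',')
        new_bytes = new_val.encode()
        if len(new_bytes) != len(fields[col]):
            raise ValueError('in-place edit needs a same-length replacement')
        # byte offset = all preceding lines, plus preceding fields and their commas
        offset = (sum(len(l) for l in lines[:row])
                  + sum(len(p) + 1 for p in fields[:col]))
        f.seek(offset)
        f.write(new_bytes)
```

Binary mode is used because Python 3 only allows seeking to arbitrary offsets on binary file objects.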

OK, so I'm assuming fields can have any value (or the following can be shortened considerably by clever substitution tricks).
from __future__ import print_function

target = (3, 3)  # coordinates of the replaced value
new_val = 'X'    # new value for the replaced cells

with open('src.txt') as f_src:
    data = f_src.read().strip()
table = [line.split(',') for line in data.split('\n')]
old_val = table[target[0]][target[1]]
new_data = '\n'.join(
    ','.join(
        new_val if cell == old_val else cell
        for cell in row)
    for row in table)
with open('tgt.txt', 'w') as f_tgt:
    print(new_data, file=f_tgt)
My test src.txt:
a,b,c,s,d,f
s,d,f,g,d,f
a,d,s,f,g,r
a,s,d,f,e,c
My output tgt.txt:
a,b,c,s,d,X
s,d,X,g,d,X
a,d,s,X,g,r
a,s,d,X,e,c

Try it like this: read the data as a csv file and turn it into a list of lists. You can then alter the value at the required index [3][3] and write back to another csv.
import csv
with open('indata.csv') as f:
    lines = [line for line in csv.reader(f)]

# change the required value
lines[3][3] = 'X'

with open('outdata.csv', 'w') as fout:
    csv.writer(fout).writerows(lines)
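Since the question's edit asks to keep everything in one file, the same pattern can write back to the path it read from; the whole file is held in memory between the two open calls, so nothing is lost by the truncation. A Python 3 sketch (the function name is mine):

```python
import csv

def set_cell(path, row, col, value):
    # read the whole file into memory first
    with open(path, newline='') as f:
        lines = list(csv.reader(f))
    lines[row][col] = value
    # reopening the same path with 'w' truncates it, then we rewrite everything
    with open(path, 'w', newline='') as f:
        csv.writer(f).writerows(lines)
```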

[Solved] It turns out that one cannot edit a single value in a file without rewriting the whole file. This could be memory-intensive in cases where I have a lot of data saved in the same file, so I replaced saving the data in a file with keeping it in an array.


How to add/modify a specific item in a list in a CSV file
1,2,5,x
1,5,7,x
6,5,9,x
How do I add the second and third item of each row and save the result in x of each row in Python?
This solution opens the csv file, changes the values in memory and then writes the changes back to disk.
r = csv.reader(open('/tmp/test.csv')) # Here your csv file
lines = list(r)
Content of lines (note that csv.reader yields strings):
[['1', '2', '5', 'x'],  # the 'x' value is to be replaced
 ['1', '5', '7', 'x'],
 ['6', '5', '9', 'x']]
Modifying the x value for the first row (converting to int so the numbers are added rather than the strings concatenated):
lines[0][3] = int(lines[0][1]) + int(lines[0][2])
Content of lines:
[['1', '2', '5', 7],  # value changed here
 ['1', '5', '7', 'x'],
 ['6', '5', '9', 'x']]
Now we only have to write it back to a file
writer = csv.writer(open('/tmp/output.csv', 'w'))
writer.writerows(lines)
You need to follow the same logic for all the other rows, ideally using a for loop, e.g.
for i in range(len(lines)):
    lines[i][3] = int(lines[i][1]) + int(lines[i][2])
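Putting the whole answer together, a self-contained Python 3 sketch (an in-memory string stands in for /tmp/test.csv):

```python
import csv
import io

src = io.StringIO('1,2,5,x\n1,5,7,x\n6,5,9,x\n')
lines = list(csv.reader(src))

for row in lines:
    # csv fields come back as strings, so convert before adding
    row[3] = int(row[1]) + int(row[2])

out = io.StringIO()
csv.writer(out).writerows(lines)
print(out.getvalue())
```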

Check for key in csv file if key is matched then add data into different rows of matched column using python

I have csv file like below
I need to search for a key, and then some values should be added to that key's column. For example, I need to search for folder and add some values to the folder column; in the same way, I need to search for name and add some values to the name column.
so the final output looks like below
I have tried the following, but it doesn't work for me:
import csv
list1 = [['ab', 'cd', 'ed']]
with open('1.csv', 'a') as f_csv:
    data_to_write_list1 = zip(*list1)
    writer = csv.writer(f_csv, delimiter=',', dialect='excel')
    writer.writerows(data_to_write_list1)
If you want to only use built-in methods, you can get the first row of a file (in the case of a CSV file like yours, the headers) like this:
>>> with open('file_you_need.csv', 'r') as f:
...     file = f.readline()
In your case the variable file would then be (supposing the delimiter is ","):
folder,name,service
You can now do file.split(",") (replacing "," with whatever your delimiter is, if needed) and you'll get back a list of headers. You can then create a list of lists where each inner list is a row of your file, or use a dictionary to link new entries to each header, and write back to the file. Supposing you go with a list of lists:
with open('file_you_need.csv', 'w') as f:
    for fields in listoflists:
        row = ""
        for i, el in enumerate(fields):
            if i != len(fields) - 1:
                row += el + ","
            else:
                row += el
        f.write(row + "\n")
As others have mentioned, you could also use Pandas and DataFrames to make it cleaner, but I don't think this is too hard to grasp.
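As a hedged alternative to building each row by hand, the csv module takes care of delimiters and quoting for you. This sketch writes to an in-memory buffer, and the data rows are made up for illustration (only the header names come from the question):

```python
import csv
import io

header = ['folder', 'name', 'service']
rows = [['ab', 'cd', 'ed'], ['fg', 'hi', 'jk']]  # placeholder values

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(header)   # header line first
writer.writerows(rows)    # then one line per row
print(buf.getvalue())
```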

Parsing a text file with line breaks in python

I have a text file with about 20 entries. They look like this:
~
England
Link: http://imgur.com/foobar.jpg
Capital: London
~
Iceland
Link: http://imgur.com/foobar2.jpg
Capital: Reykjavik
...
etc.
I would like to take these entries and turn them into a CSV.
There is a '~' separating each entry. I'm scratching my head trying to figure out how to go through the file line by line and create the CSV values for each country. Can anyone give me a clue on how to go about this?
Use the libraries, Luke :)
I'm assuming your data is well formatted. Most real-world data isn't. So, here goes a solution.
Assuming content holds the whole file as a string:
>>> content.split('~')
['\nEngland\nLink: http://imgur.com/foobar.jpg\nCapital: London\n', '\nIceland\nLink: http://imgur.com/foobar2.jpg\nCapital: Reykjavik\n', '\nEngland\nLink: http://imgur.com/foobar.jpg\nCapital: London\n', '\nIceland\nLink: http://imgur.com/foobar2.jpg\nCapital: Reykjavik\n']
For writing the CSV, Python has standard library functions.
>>> import csv
>>> csvfile = open('foo.csv', 'wb')
>>> fieldnames = ['Country', 'Link', 'Capital']
>>> writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
>>> for entry in entries:
...     cols = entry.strip().splitlines()
...     writer.writerow({'Country': cols[0], 'Link': cols[1].split(': ')[1], 'Capital': cols[2].split(': ')[1]})
...
...
If your data is more semi structured or badly formatted, consider using a library like PyParsing.
Edit:
The second column contains URLs, so the split needs care:
>>> cols[1]
'Link: http://imgur.com/foobar2.jpg'
>>> cols[1].split(':')[1]
' http'
>>> cols[1].split(': ')[1]
'http://imgur.com/foobar2.jpg'
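Putting the split and the DictWriter together, a self-contained Python 3 sketch (the sample text mirrors the question, and an in-memory buffer stands in for foo.csv):

```python
import csv
import io

text = """~
England
Link: http://imgur.com/foobar.jpg
Capital: London
~
Iceland
Link: http://imgur.com/foobar2.jpg
Capital: Reykjavik
"""

# split on '~' and drop the empty chunk before the first separator
entries = [e for e in text.split('~') if e.strip()]

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=['Country', 'Link', 'Capital'])
writer.writeheader()
for entry in entries:
    country, link, capital = entry.strip().splitlines()
    writer.writerow({
        'Country': country,
        'Link': link.split(': ', 1)[1],      # split once so the URL's ':' survives
        'Capital': capital.split(': ', 1)[1],
    })
print(out.getvalue())
```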
The way that I would do it is with the open() function:
f = open('NameOfFile.extensionType', 'a+')
where "a+" is append mode: the file will not be overwritten, new data is appended to the end, and the file is created if it does not exist. You could also use "r" to open the file read-only, or "r+" to open an existing file for both reading and writing.
After that I would use a for loop like this:
data = []
tmp = []
for line in f:
    line = line.strip()  # strip() returns a new string; it does not modify line in place
    if line == '~':
        if tmp:
            data.append(tmp)
        tmp = []
    else:
        tmp.append(line)
if tmp:
    data.append(tmp)  # don't forget the entry after the last '~'
Now you have all of the data stored in a list, but you could also reformat it as a class object using a slightly different algorithm.
I have never edited CSV files using python, but I believe you can use a loop like this to add the data:
f2 = open('CSVfileName.csv', 'w')  # can change "w" for other needs, e.g. "a+"
for entry in data:
    for subentry in entry:
        f2.write(str(subentry) + '\n')  # '\n' starts a new line
From my knowledge of CSV that loop would create a single column of all of the data. At the end remember to close the files in order to save the changes:
f.close()
f2.close()
You could combine the two loops into one in order to save space, but for the sake of explanation I have not.

Using Python to Merge Single Line .dat Files into one .csv file

I am a beginner in the programming world and would like some tips on how to solve a challenge.
Right now I have ~10 000 .dat files each with a single line following this structure:
Attribute1=Value&Attribute2=Value&Attribute3=Value...AttibuteN=Value
I have been trying to use python and the CSV library to convert these .dat files into a single .csv file.
So far I was able to write something that reads all the files, stores the contents of each file on a new line and substitutes the "&" with ",", but since Attribute1, Attribute2, ..., AttributeN are exactly the same for every file, I would like to make them into column headers and remove them from every other line.
Any tips on how to go about that?
Thank you!
Since you are a beginner, I prepared some code that works, and is at the same time very easy to understand.
I assume that you have all the files in the folder called 'input'. The code beneath should be in a script file next to the folder.
Keep in mind that this code should be used to understand how a problem like this can be solved. Optimisations and sanity checks have been left out intentionally.
You might want to check additionally what happens when a value is missing in some line, what happens when an attribute is missing, what happens with a corrupted input etc.. :)
Good luck!
import os

# this function splits the attribute=value pairs into two lists:
# the first list holds all the attributes,
# the second list holds all the values
def getAttributesAndValues(line):
    attributes = []
    values = []
    # first we split the input over the &
    attributeValues = line.split('&')
    for attrVal in attributeValues:
        # we split the attribute=value over the '=' sign:
        # the attribute goes to split[0], the value goes to split[1]
        split = attrVal.split('=')
        attributes.append(split[0])
        values.append(split[1])
    # return the attributes list and values list
    return attributes, values

# test the function using the line beneath so you understand how it works
# line = "Attribute1=Value&Attribute2=Value&Attribute3=Value&AttibuteN=Value"
# print getAttributesAndValues(line)

# this function appends a single input file to the output file
def writeToCsv(inFile='', wfile="outFile.csv", delim=","):
    f_in = open(inFile, 'r')  # only reading the input file
    f_out = open(wfile, 'a')  # the output file is opened for appending
    # write the header only if the output file is still empty
    # (the original version tried to read from the append-mode handle,
    # which does not work reliably)
    writeHeader = os.path.getsize(wfile) == 0
    # loop through every line in the file and write its values
    for line in f_in.readlines():
        header, values = getAttributesAndValues(line.strip())
        if writeHeader:
            f_out.write(delim.join(header) + "\n")
            writeHeader = False
        # we write the values
        f_out.write(delim.join(values) + "\n")
    f_in.close()
    f_out.close()

# read all the file names in the input folder
# (os.listdir does not include '.' or '..', so nothing needs to be skipped)
allInputFiles = os.listdir('input/')
# loop through all the files and write their values to the csv file
for singleFile in allInputFiles:
    writeToCsv('input/' + singleFile)
but since the Attribute1,Attribute2...AttributeN are exactly the same
for every file, I would like to make them into column headers and
remove them from every other line.
line = 'Attribute1=Value1&Attribute2=Value2&Attribute3=Value3'
Once, for the first file only (the keys become the header):
','.join(k for (k, v) in map(lambda s: s.split('='), line.split('&')))
Then for each file's content (the values become a row):
','.join(v for (k, v) in map(lambda s: s.split('='), line.split('&')))
Maybe you need to trim the strings additionally; don't know how clean your input is.
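A runnable sketch of that idea, with two made-up sample lines standing in for the file contents (the header comes from the first line only, then every line contributes one row of values):

```python
samples = [
    'Attribute1=1&Attribute2=2&Attribute3=3',
    'Attribute1=4&Attribute2=5&Attribute3=6',
]

def pairs(line):
    # split 'a=b&c=d' into [['a', 'b'], ['c', 'd']]
    return [s.split('=', 1) for s in line.split('&')]

rows = [','.join(k for k, _ in pairs(samples[0]))]  # header, once
for line in samples:
    rows.append(','.join(v for _, v in pairs(line)))  # values, per line
print('\n'.join(rows))
```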
Put the .dat files in a folder called myDats and put this script, named mergeDats.py, next to that folder. The script creates temp.txt and output.csv itself, so you will end up with output.csv, myDats, and mergeDats.py in the same folder.
mergeDats.py
import csv
import os

# flatten every .dat file into one key=value pair per line in a temp file
g = open("temp.txt", "w")
for file in os.listdir('myDats'):
    f = open("myDats/" + file, "r")
    tempData = f.readlines()[0].strip()
    tempData = tempData.replace("&", "\n")
    g.write(tempData + "\n")
    f.close()
g.close()

h = open("temp.txt", "r")  # read back the same file we just wrote
arr = h.read().split("\n")
h.close()
my_dict = {}
for x in arr:
    if not x:
        continue  # skip blank lines
    temp2 = x.split("=")
    my_dict[temp2[0]] = temp2[1]

with open('output.csv', 'w') as output:  # use 'wb' in Python 2.x
    w = csv.DictWriter(output, my_dict.keys())
    w.writeheader()
    w.writerow(my_dict)
Note that this keeps only the last value seen for each attribute, so output.csv ends up with the header plus a single data row.

Extracting variable names and data from csv file using Python

I have a csv file that has each line formatted with the line name followed by 11 pieces of data. Here is an example of a line.
CW1,0,-0.38,2.04,1.34,0.76,1.07,0.98,0.81,0.92,0.70,0.64
There are 12 lines in total, each with a unique name and data.
What I would like to do is extract the first cell from each line and use that to name the corresponding data, either as a variable equal to a list containing that line's data, or maybe as a dictionary, with the first cell being the key.
I am new to working with inputting files, so the farthest I have gotten is to read the file in using the stock solution in the documentation
import csv
path = r'data.csv'
with open(path, 'rb') as csvFile:
    reader = csv.reader(csvFile, delimiter=' ')
    for row in reader:
        print(row[0])
I am failing to figure out how to assign each row to a new variable, especially when I am not sure what the variable names will be (this is because the csv file will be created by a user other than myself).
The destination for this data is a tool that I have written. It accepts lists as input such as...
CW1 = [0,-0.38,2.04,1.34,0.76,1.07,0.98,0.81,0.92,0.70,0.64]
so this would be the ideal end solution. If it is easier, and considered better to have the output of the file read be in another format, I can certainly re-write my tool to work with that data type.
As Scironic said in their answer, it is best to use a dict for this sort of thing.
However, be aware that dict objects did not preserve any "order" before Python 3.7 - on older versions the order of the rows will be lost if you use one. If this is a problem, you can use an OrderedDict instead (which is just what it sounds like: a dict that "remembers" the order of its contents):
import csv
from collections import OrderedDict as od

data = od()  # an ordered dict object remembers the order in the csv file
with open(path, 'rb') as csvFile:
    reader = csv.reader(csvFile, delimiter=',')
    for row in reader:
        data[row[0]] = row[1:]  # slice the row into 0 (first item) and 1: (the rest)
Now if you go looping through your data object, the contents will be in the same order as in the csv file:
for d in data.values():
    myspecialtool(*d)
You need to use a dict for these kinds of things (dynamic variables):
import csv
path = r'data.csv'
data = {}
with open(path, 'rb') as csvFile:
    reader = csv.reader(csvFile, delimiter=',')
    for row in reader:
        data[row[0]] = row[1:]
Dicts are especially useful for dynamic variables and are the best way to store things like this. To access a row you just need:
data['CW1']
This solution also means that if you add any extra rows in with new names, you won't have to change anything.
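Since the downstream tool expects numbers and csv hands back strings, it may also help to convert while building the dict. A small Python 3 sketch (the sample row is copied from the question):

```python
import csv
import io

sample = 'CW1,0,-0.38,2.04,1.34,0.76,1.07,0.98,0.81,0.92,0.70,0.64\n'

data = {}
for row in csv.reader(io.StringIO(sample)):
    data[row[0]] = [float(v) for v in row[1:]]  # convert the values to numbers

print(data['CW1'][:3])  # [0.0, -0.38, 2.04]
```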
If you are desperate to have the variable names in the global namespace rather than inside a dict, use exec (N.B. if any of this uses input from outside sources, using exec/eval can be highly dangerous (rm * level), so make sure all input is controlled and understood by yourself).
with open(path, 'rb') as csvFile:
    reader = csv.reader(csvFile, delimiter=',')
    for row in reader:
        exec("{} = {}".format(row[0], row[1:]))
In Python, you can use slicing: row[1:] contains every element of the row except the first, so you could do:
>>> d = {}
>>> with open("f") as f:
...     c = csv.reader(f, delimiter=',')
...     for r in c:
...         d[r[0]] = map(int, r[1:])
...
>>> d
{'var1': [1, 3, 1], 'var2': [3, 0, -1]}
Regarding variable variables, check How do I do variable variables in Python? or How to get a variable name as a string in Python?. I would stick to dictionary though.
An alternative to using the proper csv library could be as follows:
path = r'data.csv'
csvRows = open(path, "r").readlines()
dataRows = [[float(col) for col in row.rstrip("\n").split(",")[1:]] for row in csvRows]
for dataRow in dataRows:  # where dataRow is a list of numbers
    print dataRow
You could then call your function where the print statement is.
This reads the whole file in and produces a list of lines with trailing newlines. It then removes each newline, splits each row into a list of strings, skips the initial name column and calls float() on each entry, resulting in a list of lists. Note that it discards the first column entirely, so it depends on how important that column is to you.
