Python split and csv; Modification of existing Python Script - python

Another user was kind enough to help me with a script that reads in a file, removes the '::' separators and moves the property names into a header row:
(I am reposting it as it may be useful to someone in this form; my question follows.)
import csv

with open('infile', "rb") as fin, open('outfile', "wb") as fout:
    reader = csv.reader(fin)
    writer = csv.writer(fout)
    for i, line in enumerate(reader):
        split = [item.split("::") for item in line if item.strip()]
        if not split: # blank line
            continue
        keys, vals = zip(*split)
        if i == 0:
            # first line: write header
            writer.writerow(keys)
        writer.writerow(vals)
I was not aware that the last column of this file had the following text at the end:
{StartPoint::7858.35924983374[%2C]1703.69341358077[%2C]-3.075},{EndPoint::7822.85045874375[%2C]1730.80294308742[%2C]-3.53962362760298}
How do I modify this existing code to take the above and:
1. remove the brackets { }
2. convert the '[%2C]' to a ',' - making it comma delim like the rest of the file
3. Produce 'Xa Ya Za' and 'Xb Yb Zb' as headers for the values liberated in #2
The above text is the input file. Output from the original script produces this:
{StartPoint,EndPoint}
7858.35924983374[%2C]1703.69341358077[%2C]-3.075, 7822.85045874375[%2C]1730.80294308742[%2C]-3.53962362760298}
Is it possible to insert a simple strip command in there?
Thanks, I appreciate your guidance - I am a Python newbie

It seems like you're looking for the replace method on strings: http://docs.python.org/2/library/stdtypes.html#str.replace
Perhaps something like this after the zip:
# strip the braces from both ends, then split the coordinates apart
keys = [key.strip('{}') for key in keys]
vals = [val.strip('{}').split('[%2C]') for val in vals]
csv.writer expects each row to be a flat sequence of fields, and after the split vals is a list of lists, so it has to be flattened before it is written. Perhaps something like this would flatten it out:
vals = [item for sublist in vals for item in sublist]
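The three steps asked for (drop the braces, split on '[%2C]', emit X/Y/Z headers) could be combined roughly as follows. This is only a sketch: the helper name `expand_point_columns` is hypothetical, and it assumes that point columns can be recognised by '[%2C]' in the value, that each has exactly three coordinates, and that they are lettered a, b, ... in order of appearance.

```python
def expand_point_columns(keys, vals):
    """Expand '{Name::x[%2C]y[%2C]z}' columns into three X/Y/Z columns each.

    keys and vals are the tuples produced by zip(*split) in the script above.
    Assumption: every value containing '[%2C]' holds exactly three coordinates.
    """
    out_keys, out_vals = [], []
    letter = ord("a")
    for key, val in zip(keys, vals):
        key = key.strip("{}")   # remove the braces around the key
        val = val.strip("{}")   # ... and around the value
        if "[%2C]" in val:
            suffix = chr(letter)    # 'a' for the first point, 'b' for the next
            letter += 1
            out_keys.extend(axis + suffix for axis in "XYZ")
            out_vals.extend(val.split("[%2C]"))
        else:
            out_keys.append(key)
            out_vals.append(val)
    return out_keys, out_vals


# Shortened sample values, shaped like the ones in the question.
keys = ("MinimumCover", "{StartPoint", "{EndPoint")
vals = ("0", "7858.35[%2C]1703.69[%2C]-3.075}", "7822.85[%2C]1730.80[%2C]-3.54}")
new_keys, new_vals = expand_point_columns(keys, vals)
print(new_keys)
print(new_vals)
```

In the original loop, the returned lists would be passed to writer.writerow() in place of keys and vals.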

Related

Sort file by key

I am learning Python 3 and I'm having issues completing this task. I am given a file with a string on each line. I have to sort its content by the string located between the first hyphen and the second hyphen and write the sorted content into a different file. This is what I have tried so far, but nothing gets sorted:
def sort_keys(path, input, output):
    list = []
    with open(path+'\\'+input, 'r') as f:
        for line in f:
            if line.count('-') >= 1:
                list.append(line)
    sorted(list, key = lambda s: s.split("-")[1])
    with open(path + "\\"+ output, 'w') as o:
        for line in list:
            o.write(line)

sort_keys("C:\\Users\\Daniel\\Desktop", "sample.txt", "results.txt")
This is the input file: https://pastebin.com/j8r8fZP6
Question 1: What am I doing wrong with the sorting? I've used it to sort the words of a sentence by their last letter and it worked fine, but here I don't know what I am doing wrong.
Question 2: I feel that reading the input file into a list, sorting the list and then writing that content out afterwards is not very efficient. What is the "pythonic" way of doing it?
Question 3: Do you know any good exercises to learn working with files + folders in Python 3?
Kind regards
Your sorting is fine. The problem is that sorted() returns a list, rather than altering the one provided. It's also much easier to use list comprehensions to read the file:
def sort_keys(path, infile, outfile):
    with open(path+'\\'+infile, 'r') as f:
        inputlines = [line.strip() for line in f.readlines() if "-" in line]
    outputlines = sorted(inputlines, key=lambda s: s.split("-")[1])
    with open(path + "\\" + outfile, 'w') as o:
        for line in outputlines:
            o.write(line + "\n")

sort_keys("C:\\Users\\Daniel\\Desktop", "sample.txt", "results.txt")
I also changed a few variable names, for legibility's sake.
EDIT: I understand that the list can also be sorted in place (list.sort(key=...)); however, this way seems more readable to me.
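To make the difference concrete, here is a small sketch with made-up sample lines: sorted() builds a new list and leaves its argument alone, while list.sort() reorders the list in place and returns None.

```python
lines = ["x-b-1", "y-a-2", "z-c-3"]   # made-up sample data


def second_field(s):
    # the text between the first and the second hyphen
    return s.split("-")[1]


new_list = sorted(lines, key=second_field)            # a new, sorted list
still_original = lines == ["x-b-1", "y-a-2", "z-c-3"]  # True: input untouched

return_value = lines.sort(key=second_field)            # sorts in place
print(new_list, still_original, return_value)
```

This is why the call in the question had no visible effect: its result was never assigned to anything.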
First, your data has a couple lines without hyphens. Is that a typo? Or do you need to deal with those lines? If it is NOT a typo and those lines are supposed to be part of the data, how should they be handled?
I'm going to assume those lines are typos and ignore them for now.
Second, do you need to write out the whole line, with the lines ordered by the second group of characters (the part between the first two hyphens)? If that's the case...
First, read in the file:
f = open('./text.txt', 'r')
There are a couple ways to go from here, but let's clean up the file contents a little and make a list object:
l = [i.replace("\n","") for i in f]
This will create a list l with all the newline characters removed. This particular way of creating the list is called a list comprehension. You can do the exact same thing with the following code:
l = []
for i in f:
    l.append(i.replace("\n",""))
Now let's create a dictionary with the key as the 2nd group and the value as the whole line. Again, there are some lines with no hyphens, so we are going to just skip those for now with a simple try/except block:
d = {}
for i in l:
    try:
        d[i.split("-")[1]] = i
    except IndexError:
        pass
Now, here things can get slightly tricky. It depends on how you want to approach the problem. Dictionaries are not sorted by key in Python (since Python 3.7 they preserve insertion order, but that is not the same thing), so there is not a really good way to simply sort the dictionary. ONE way (not necessarily the BEST way) is to create a sorted list of the dictionary keys:
s = sorted([k for k, v in d.items()])
Again, I used a list comprehension here, but you can rewrite that line to do the exact same thing here:
s = []
for k, v in d.items():
    s.append(k)
s = sorted(s)
Now, we can write the dictionary back to a file by iterating through the dictionary using the sorted list. To see what I mean, let's print out the dictionary one value at a time using the sorted list as the keys:
for i in s:
    print(d[i])
But instead of printing, we will now append the line to a file:
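o = open('./out.txt', 'a')
for i in s:
    o.write(d[i] + "\n")
o.close()
Here the + "\n" part is needed because the newlines were stripped when the list was built; if you keep the original line endings, leave it out. Also note that 'a' appends to whatever out.txt already contains, so running the script twice duplicates the output. Opening the file once with 'w' and writing every line before closing it works just as well and starts from an empty file; 'w' only loses lines if you reopen the file for each line.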
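One thing to keep in mind with the dictionary approach: two lines that share the same middle group share the same key, so only the last of them survives. A sketch that avoids this by sorting the lines themselves is shown below; the function name `sort_lines_by_second_field` and the sample data are made up for illustration.

```python
def sort_lines_by_second_field(lines):
    """Return the hyphenated lines, sorted by the text between the first two hyphens."""
    cleaned = [line.rstrip("\n") for line in lines]
    keyed = [line for line in cleaned if "-" in line]   # skip lines without a hyphen
    # sorted() is stable, so lines with equal keys keep their original order
    return sorted(keyed, key=lambda line: line.split("-")[1])


sample = ["a-pear-1\n", "no hyphen here\n", "b-apple-2\n", "c-pear-3\n"]
result = sort_lines_by_second_field(sample)
print(result)
```

Because an open file is itself an iterable of lines, the same function can be called with the file object from a `with open(...)` block.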

Simple parsing and sorting data from file

Sorry if this has already been answered before; the searches I have done have not been helpful.
I have a file that stores data as such:
name,number
(Although perhaps not relevant to the question, I will have to add entries to this file. I know how to do this.)
My question is for the pythonic(?) way of analyzing the data and sorting it in ascending order. So if the file was:
alex,30
bob,20
and I have to add the entry
carol, 25
The file should be rewritten as
bob,20
carol,25
alex,30
My first attempt was to store the entire file as a string (by read()) and then split by lines to get a list of strings, procedurally split those strings by a comma, and then create a new list of scores then sort that, but this doesn't seem right and fails because I don't have a way to go "back" once I have the order of scores.
I am unable to use libraries for this program.
Edit:
My first attempt I did not test because all it manages to do is sort a list of the scores; I don't know of a way to get the "entries" back.
file = open("scores.txt" , "r")
data = file.read()
list_data = data.split()
data.append([name,score])
for i in range(len(list_data)):
    list_scores = list_scores.append(list_data[i][1])
list_scores = sorted(list_scores)
As you can see, this gives me an ascending list of scores, but I do not know where to go from here in order to sort the list of name, score entries.
You will just have to write the sorted entries back to some file, using some basic string formatting:
with open('scores.txt') as f_in, open('file_out.txt', 'w') as f_out:
    entries = [(x, int(y)) for x, y in (line.strip().split(',') for line in f_in)]
    entries.append(('carol', 25))
    entries.sort(key=lambda e: e[1])
    for x, y in entries:
        f_out.write('{},{}\n'.format(x, y))
I'm going to assume you're capable of putting your data into a .csv file in the following format:
Name,Number
John,20
Jane,25
Then you can use csv.DictReader to read this into dictionaries, one per row:
import csv
with open('name_age.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    rows = list(reader)
and append a row to the file using
with open('name_age.csv', 'a', newline='') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=['Name', 'Number'])
    writer.writerow({'Name':'Carol','Number':25})
You can then sort the rows using the functions in Python's built-in operator module, such as operator.itemgetter.
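One way to read the sorting step is to use operator.itemgetter as the sort key. The sketch below assumes the Name,Number layout from above and reads from an in-memory string instead of a real file so that it is self-contained; the numbers are converted to int first, because the csv module returns every field as a string.

```python
import csv
import io
from operator import itemgetter

# Stand-in for open('name_age.csv', newline=''); the data is made up.
data = io.StringIO("Name,Number\nJohn,30\nJane,20\n")

rows = list(csv.DictReader(data))
rows.append({"Name": "Carol", "Number": "25"})

for row in rows:
    row["Number"] = int(row["Number"])   # compare as numbers, not as text

rows.sort(key=itemgetter("Number"))      # ascending by the Number column
names = [row["Name"] for row in rows]
print(names)
```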
This is a function that will take a filename and sort the file for you:
def sort_file(filename):
    f = open(filename, 'r')
    text = f.read()
    f.close()
    lines = [i.split(',') for i in text.splitlines()]
    # sort numerically on the second column (as strings, '100' would sort before '20')
    lines.sort(key=lambda x: int(x[1]))
    lines = [','.join(i) for i in lines]
    text = '\n'.join(lines)
    f = open(filename, 'w')
    f.write(text)
    f.close()

Concatenate row values using python

I am new to Python and have been stuck on this topic for 2 days. I tried looking for a basic answer but couldn't find one, so I finally decided to ask my question.
I want to concatenate the values of only the first two rows of my csv file (if possible with the help of built-in modules).
Any kind of help would be appreciated. Thanks in advance.
Below is my sample csv file without headers:
1,Suraj,Bangalore
2,Ahuja,Karnataka
3,Rishabh,Bangalore
Desired Output:
1 2,Suraj Ahuja,Bangalore Karnataka
3,Rishabh,Bangalore
Just create a csv.reader object (and a csv.writer object). Then use next() on the first 2 rows and zip them together (using list comprehension) to match the items.
Then process the rest of the file normally.
import csv

with open("file.csv") as fr, open("output.csv","w",newline='') as fw:
    cr=csv.reader(fr)
    cw=csv.writer(fw)
    title_row = [" ".join(z) for z in zip(next(cr),next(cr))]
    cw.writerow(title_row)
    # dump the rest as-is (the reader continues from the third row)
    cw.writerows(cr)
(you'll get an exception if the file has only 1 row of course)
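If short files are possible, the exception can be avoided by giving next() a default value. The following is only a sketch: the generator name `merge_first_two_rows` is made up, and an in-memory string stands in for the real file.

```python
import csv
import io


def merge_first_two_rows(rows):
    """Yield rows with the first two joined cell by cell; short input passes through."""
    rows = iter(rows)
    first = next(rows, None)
    if first is None:
        return              # empty input: nothing to yield
    second = next(rows, None)
    if second is None:
        yield first         # only one row: nothing to merge with
        return
    yield [" ".join(pair) for pair in zip(first, second)]
    yield from rows         # the remaining rows, unchanged


source = io.StringIO("1,Suraj,Bangalore\n2,Ahuja,Karnataka\n3,Rishabh,Bangalore\n")
merged = list(merge_first_two_rows(csv.reader(source)))
print(merged)
```

The result can be handed straight to csv.writer's writerows().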
You can use zip() for your first 2 lines like below:
with open('f.csv') as f:
    lines = f.readlines()

res = ""
for i, j in zip(lines[0].strip().split(','), lines[1].strip().split(',')):
    res += "{} {},".format(i, j)
print(res.rstrip(','))
for line in lines[2:]:
    # drop the line's own newline so print() does not add a blank line
    print(line.rstrip('\n'))
Output:
1 2,Suraj Ahuja,Bangalore Karnataka
3,Rishabh,Bangalore
with open('file2', 'r') as f, open('file2_new', 'w') as f2:
    lines = [a.split(',') for a in f.read().splitlines() if a.strip()]
    newl2 = [[' '.join(x) for x in zip(lines[0], lines[1])]] + lines[2:]
    for a in newl2:
        # join with a bare comma to match the desired output
        f2.write(','.join(a)+'\n')

python csv replace listitem

I have the following output from a csv file:
word1|word2|word3|word4|word5|word6|01:12|word8
word1|word2|word3|word4|word5|word6|03:12|word8
word1|word2|word3|word4|word5|word6|01:12|word8
What I need to do is change the time string so that it looks like this: 00:01:12.
My idea is to extract list item [7] and add "00:" as a string to the front.
import csv
with open('temp', 'r') as f:
    reader = csv.reader(f, delimiter="|")
    for row in reader:
        fixed_time = (str("00:") + row[7])
        begin = row[:6]
        end = row[:8]
        print begin + fixed_time +end
I get this error message:
TypeError: can only concatenate list (not "str") to list.
I also had a look at this post:
how to change [1,2,3,4] to '1234' using python
I need to know if my approach to the solution is the right way. Maybe I need to use split or something else for this.
Thanks for any help.
The line that's throwing the exception is
print begin + fixed_time +end
because begin and end are both lists and fixed_time is a string. Whenever you take a slice of a list (that's the row[:6] and row[:8] parts), a list is returned. If you just want to print it out, you can do
print begin, fixed_time, end
and you won't get an error.
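If you would rather keep the concatenation, wrapping the string in a list makes all three operands lists. A small sketch, using one sample line from the question (note that the time is the seventh column, i.e. index 6, and that the tail slice has to start at index 7):

```python
row = "word1|word2|word3|word4|word5|word6|01:12|word8".split("|")

fixed_time = "00:" + row[6]   # the time is the 7th column, index 6
begin = row[:6]               # everything before the time
end = row[7:]                 # everything after the time

# list + list + list: the string is wrapped in a one-element list
new_row = begin + [fixed_time] + end
print("|".join(new_row))
```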
Corrected code:
I'm opening a new file for writing (I'm calling it 'final', but you can call it whatever you want), and I'm just writing everything to it with the one modification. It's easiest to just change the one element of the list that has the line (row[6] here), and use '|'.join to write a pipe character between each column.
import csv
with open('temp', 'r') as f, open('final', 'w') as fw:
    reader = csv.reader(f, delimiter="|")
    for row in reader:
        # just change the element in the row to have the extra zeros
        row[6] = '00:' + row[6]
        # write the row back out, separated by | characters, and a new line.
        fw.write('|'.join(row) + '\n')
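As an alternative to joining by hand, csv.writer accepts the same delimiter argument as csv.reader, which also takes care of quoting should a field ever contain a pipe character. A sketch under that assumption, using in-memory strings in place of the 'temp' and 'final' files and shortened sample data:

```python
import csv
import io

# Stand-ins for the input and output files.
source = io.StringIO("w1|w2|w3|w4|w5|w6|01:12|w8\nw1|w2|w3|w4|w5|w6|03:12|w8\n")
target = io.StringIO()

reader = csv.reader(source, delimiter="|")
writer = csv.writer(target, delimiter="|", lineterminator="\n")
for row in reader:
    row[6] = "00:" + row[6]   # prefix the time column
    writer.writerow(row)

output = target.getvalue()
print(output, end="")
```

With real files in Python 3, open both with newline='' as the csv documentation recommends.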
You can use a regex for that:
>>> txt = """\
... word1|word2|word3|word4|word5|word6|01:12|word8
... word1|word2|word3|word4|word5|word6|03:12|word8
... word1|word2|word3|word4|word5|word6|01:12|word8"""
>>> import re
>>> print(re.sub(r'\|(\d\d:\d\d)\|', r'|00:\1|', txt))
word1|word2|word3|word4|word5|word6|00:01:12|word8
word1|word2|word3|word4|word5|word6|00:03:12|word8
word1|word2|word3|word4|word5|word6|00:01:12|word8

Python CSV read-> write; remove and replace PLUS: end of line is JSON format

I am having problems getting my Python script to do what I want. It does not appear to be modifying my file.
I want to:
Read in a *.csv file that has the following format
PropertyName::PropertyValue,…,PropertyName::PropertyValue,{ExtPropertyName::ExtPropertyValue},…,{ExtPropertyName:: ExtPropertyValue}
I want to remove PropertyName:: and leave behind just a column of the PropertyValue
I want to add a header line
I was trying to start by replacing the :: values with a comma, but can't seem to get this to work:
fin = csv.reader(open('infile', 'rb'), delimiter=',')
fout = open('outfile', 'w')
for row in fin:
    fout.write(','.join(','.join(item.split()) for item in row) + '::')
fout.close()
Any advice, whether on my first step problem, or to a bigger picture resolution is always appreciated. Thanks.
UPDATE/EDIT asked for by a person nice enough to review for me!
Here is the first line of the *.csv file (INPUT)
InnerDiameterOrWidth::0.1,InnerHeight::0.1,Length2dCenterToCenter::44.6743867864386,Length3dCenterToCenter::44.6768028159989,Length2dToInsideEdge::44.2678260053526,Length3dToInsideEdge::44.2717800813466,Length2dToOutsideEdge::44.6743867864386,Length3dToOutsideEdge::44.6768028159989,MinimumCover::0,MaximumCover::0,StartConnection::ImmxGisUtilityNetworkCommon.Connection,
In a perfect world here is what I would like my text file to look like (OUTPUT)
InnerDiameterOrWidth, InnerHeight, Length2dCenterToCenter,,,,,,,,,,,
0.1,0.1,44.6743867864386
So: one header line, with the values in columns below it.
UPDATED JSON Info
The end of each line has JSON formatted text:
{StartPoint::7858.35924983374[%2C]1703.69341358077[%2C]-3.075},{EndPoint::7822.85045874375[%2C]1730.80294308742[%2C]-3.53962362760298}
which I need to split into X Y Z and X Y Z, with headers.
Maybe something like this (assuming that each line has the same keys, and in the same order):
import csv

with open("diam.csv", "rb") as fin, open("diam_out.csv", "wb") as fout:
    reader = csv.reader(fin)
    writer = csv.writer(fout)
    for i, line in enumerate(reader):
        split = [item.split("::") for item in line if item.strip()]
        if not split: # blank line
            continue
        keys, vals = zip(*split)
        if i == 0:
            # first line: write header
            writer.writerow(keys)
        writer.writerow(vals)
which produces
localhost-2:coding $ cat diam_out.csv
InnerDiameterOrWidth,InnerHeight,Length2dCenterToCenter,Length3dCenterToCenter,Length2dToInsideEdge,Length3dToInsideEdge,Length2dToOutsideEdge,Length3dToOutsideEdge,MinimumCover,MaximumCover,StartConnection
0.1,0.1,44.6743867864386,44.6768028159989,44.2678260053526,44.2717800813466,44.6743867864386,44.6768028159989,0,0,ImmxGisUtilityNetworkCommon.Connection
I think most of that code should make sense, except maybe the zip(*split) trick: that basically transposes a sequence, i.e.
>>> s = [['a','1'],['b','2']]
>>> zip(*s)
[('a', 'b'), ('1', '2')]
so that the elements are now grouped together by their index (the first ones are all together, the second, etc.)
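The interactive session above is from Python 2, where zip() returns a list. In Python 3 it returns a lazy iterator, so it has to be wrapped in list() to be displayed, although unpacking it into two names works the same way in both versions. A short sketch:

```python
s = [["a", "1"], ["b", "2"]]

transposed = list(zip(*s))   # list() is needed in Python 3 to see the pairs
keys, vals = zip(*s)         # unpacking consumes the iterator directly

print(transposed)
print(keys, vals)
```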
