I have a csv file that looks like this:
123456,456789,12345,123.45,123456
123456,456789,12345,123.45,123456
123456,456789,12345,123.45,123456
I am extremely new to Python programming, but I'm learning and finding Python to be very useful. Basically, I want the output to look like this:
123456 456789 12345 123.45 123456
123456 456789 12345 123.45 123456
123456 456789 12345 123.45 123456
Basically, I want all fields right-justified, with a fixed width. There are no headings in the CSV file.
Here's the code I have tried so far and like I said, I'm very new to Python:
import csv
with open('test.csv') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=',')
    for row in spamreader:
        print(', '.join(row))
        with open('test2.txt', 'wb') as f:
            writer = csv.writer(f)
            writer.writerows(f)
Any help would be greatly appreciated. Thank you in advance.
OK, you have a bunch of problems with your code:
Your indentation is wrong. Indentation is one of the basic concepts of Python, since it decides which statements belong to which block; search the web and read up on it if you're not sure what I mean.
The part that opens 'test2.txt' is inside the spamreader loop, so the file is re-opened and truncated for every row in 'test.csv'.
You are trying to write the file to itself with writer.writerows(f) (remember, f is the file you are writing to).
You are using a csv.writer to write lines to a .txt file.
You want spacing between the items, but you're not adding it anywhere in your code.
So to sum up all those problems, here's a fixed example, which is really not that far away from your code as it is:
import csv

res = []
# start a loop to collect the data
with open('test.csv', newline='') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=',')
    for row in spamreader:
        line = '\t'.join(row) + '\r\n'  # \n is the line break; \r is so Notepad loves you too
        res.append(line)
# now, outside the loop, we can do this
# (newline='' writes '\r\n' exactly as given; 'wb' would require bytes in Python 3)
with open('test2.txt', 'w', newline='') as f:
    f.writelines(res)
EDIT
If you want to control the spacing, you can use the str.ljust method like this:
line = ''.ljust(2).join(row)
This makes sure there are 2 spaces between each item: ''.ljust(2) is simply a string of two spaces, which join then puts between the fields. A space is the default fill character, but you can pass a second argument to ljust to use a different one:
line = ''.ljust(5, '-').join(row)
then each line would look like this:
123456-----456789-----12345-----123.45-----123456
And thanks to Philippe T., who mentioned this in the comments.
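The ljust trick only controls the gap between fields; the question actually asks for right-justified, fixed-width columns. A minimal sketch of that with str.rjust, assuming each column should be as wide as its longest value (column_widths and right_justify_row are hypothetical helper names, and the second sample row is made-up data just to show the padding):

```python
def column_widths(rows):
    """Width of each column = length of its longest value (assumes equal-length rows)."""
    return [max(len(row[i]) for row in rows) for i in range(len(rows[0]))]


def right_justify_row(row, widths, sep=' '):
    """Pad each field on the left to its column width, then join with sep."""
    return sep.join(field.rjust(width) for field, width in zip(row, widths))


rows = [
    ['123456', '456789', '12345', '123.45', '123456'],
    ['1', '22', '333', '4.5', '66'],  # hypothetical shorter values to show the padding
]
widths = column_widths(rows)
for row in rows:
    print(right_justify_row(row, widths))
```

An f-string such as f'{field:>{width}}' pads the same way as field.rjust(width). To write a file instead of printing, add '\n' to each joined line.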
2nd Edit
If you want a different separator after each column, you need to predefine it. The best way is to create a list with one item per column of your CSV file, each item being the string that goes after that column, and the last one being the line ending (which is convenient, because ''.join doesn't add one by itself). Then zip it with your row. Say you want a tab after the first column, then two spaces between each of the other columns. Then your code would look like this:
spacing = ['\t', '  ', '  ', '  ', '\r\n']
# ... the same code from before ...
line = ''.join([j for i in zip(row, spacing) for j in i])
# ... rest of the code ...
The list comprehension is a bit convoluted, but think about it like this:
for i in zip(row, spacing):  # zip yields ==> (item1, '\t'), (item2, '  '), ...
    for j in i:              # e.g. i == (item1, '\t')
        j                    # so j is each element of that tuple in turn
With the list comprehension, this produces [item1, '\t', item2, '  ', ...]. Join that together and that's it.
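If the nested comprehension is hard to read, itertools.chain.from_iterable flattens the (field, separator) pairs the same way. A small sketch (interleave is a hypothetical helper name; the sample row is the one from the question):

```python
from itertools import chain


def interleave(row, spacing):
    """Join each field with the separator that follows it (same result as the comprehension)."""
    # zip -> (field, sep) pairs; chain.from_iterable flattens them into one sequence
    return ''.join(chain.from_iterable(zip(row, spacing)))


spacing = ['\t', '  ', '  ', '  ', '\r\n']
row = ['123456', '456789', '12345', '123.45', '123456']
line = interleave(row, spacing)
print(repr(line))
```

One thing to watch: zip stops at the shorter of the two lists, so if spacing has fewer entries than the row has fields, the extra fields are silently dropped.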
Try this:
import csv

with open('data.csv', newline='') as fin, open('out.txt', 'w', newline='') as fout:
    data = csv.reader(fin, delimiter=',')
    resl = csv.writer(fout, delimiter='\t')
    resl.writerows(data)
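To see what the csv.writer version produces without touching any files, here is a sketch that uses io.StringIO in place of the two files (csv_to_tsv is a hypothetical name). Note that csv.writer ends each row with '\r\n' by default; its lineterminator parameter changes that:

```python
import csv
import io


def csv_to_tsv(text, lineterminator='\r\n'):
    """Re-write comma-separated text as tab-separated text via csv.reader/csv.writer."""
    out = io.StringIO()  # stands in for a file opened with open(..., 'w', newline='')
    writer = csv.writer(out, delimiter='\t', lineterminator=lineterminator)
    writer.writerows(csv.reader(io.StringIO(text)))
    return out.getvalue()


sample = "123456,456789,12345,123.45,123456\n123456,456789,12345,123.45,123456\n"
print(repr(csv_to_tsv(sample)))
```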
Related
I have a text file with about 20 entries. They look like this:
~
England
Link: http://imgur.com/foobar.jpg
Capital: London
~
Iceland
Link: http://imgur.com/foobar2.jpg
Capital: Reykjavik
...
etc.
I would like to take these entries and turn them into a CSV.
There is a '~' separating each entry. I'm scratching my head trying to figure out how to go through it line by line and create the CSV values for each country. Can anyone give me a clue on how to go about this?
Use the libraries, Luke :)
I'm assuming your data is well formatted; most real-world data isn't. So here goes a solution, with content holding the whole text of the file:
>>> entries = content.split('~')
>>> entries
['\nEngland\nLink: http://imgur.com/foobar.jpg\nCapital: London\n', '\nIceland\nLink: http://imgur.com/foobar2.jpg\nCapital: Reykjavik\n', '\nEngland\nLink: http://imgur.com/foobar.jpg\nCapital: London\n', '\nIceland\nLink: http://imgur.com/foobar2.jpg\nCapital: Reykjavik\n']
For writing the CSV, Python's standard library has the csv module.
>>> import csv
>>> csvfile = open('foo.csv', 'w', newline='')
>>> fieldnames = ['Country', 'Link', 'Capital']
>>> writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
>>> for entry in entries:
...     cols = entry.strip().splitlines()
...     if not cols:  # skip empty pieces, e.g. before the first '~'
...         continue
...     writer.writerow({'Country': cols[0], 'Link': cols[1].split(': ')[1], 'Capital': cols[2].split(': ')[1]})
...
>>> csvfile.close()
If your data is more semi-structured or badly formatted, consider using a library like PyParsing.
Edit:
Second column contains URLs, so we need to handle the splits well.
>>> cols[1]
'Link: http://imgur.com/foobar2.jpg'
>>> cols[1].split(':')[1]
' http'
>>> cols[1].split(': ')[1]
'http://imgur.com/foobar2.jpg'
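Putting those steps together, here is a self-contained sketch that takes the whole text as a string instead of reading a file (entries_to_csv is a hypothetical name, and it assumes every entry has exactly the three lines shown). Splitting only on the first ': ' with split(': ', 1) keeps the rest of the value intact:

```python
import csv
import io


def entries_to_csv(content):
    """Turn '~'-separated country blocks into CSV text with a header row."""
    out = io.StringIO()  # stands in for an output file opened with newline=''
    writer = csv.DictWriter(out, fieldnames=['Country', 'Link', 'Capital'],
                            lineterminator='\n')
    writer.writeheader()
    for entry in content.split('~'):
        cols = entry.strip().splitlines()
        if not cols:  # the piece before the first '~' is empty
            continue
        # split on the first ': ' only, so anything after it stays intact
        link = cols[1].split(': ', 1)[1]
        capital = cols[2].split(': ', 1)[1]
        writer.writerow({'Country': cols[0], 'Link': link, 'Capital': capital})
    return out.getvalue()


sample = """~
England
Link: http://imgur.com/foobar.jpg
Capital: London
~
Iceland
Link: http://imgur.com/foobar2.jpg
Capital: Reykjavik
"""
print(entries_to_csv(sample), end='')
```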
The way I would do it is to open the file with the open() function:
f = open('NameOfFile.extensionType', 'r')
Mode "r" opens the file for reading, which is all the loop below needs. Mode "a+" (append and read) would not overwrite the file, but it starts with the file position at the end, so you would have to call f.seek(0) before looping over it. The "+" adds the other kind of access (reading plus writing); it does not decide whether a missing file gets created: "w" and "a" create the file if it does not exist, while "r" and "r+" raise FileNotFoundError.
After that I would use a for loop like this:
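data = []
tmp = []
for line in f:
    line = line.strip()  # strip() returns a new string without the trailing newline
    if line == '~':
        if tmp:  # skip the empty group before the first '~'
            data.append(tmp)
        tmp = []
    elif line:  # ignore blank lines
        tmp.append(line)
if tmp:  # the last entry has no '~' after it
    data.append(tmp)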
Now you have all of the data stored in a list, but you could also reformat it as a class object using a slightly different algorithm.
I have never edited CSV files using python, but I believe you can use a loop like this to add the data:
f2 = open('CSVfileName.csv', 'w')  # can change "w" for other needs, e.g. "a+"
for entry in data:
    for subentry in entry:
        f2.write(str(subentry) + '\n')  # use '\n' to start a new line
From my knowledge of CSV that loop would create a single column of all of the data. At the end remember to close the files in order to save the changes:
f.close()
f2.close()
You could combine the two loops into one in order to save space, but for the sake of explanation I have not.
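If you would rather have one CSV row per entry instead of a single column, the csv module can write the data list directly. A sketch, with io.StringIO standing in for the output file and data filled in by hand from the sample entries (in the real script it comes from the grouping loop above):

```python
import csv
import io

# one list of lines per entry, in the shape the grouping loop builds
data = [
    ['England', 'Link: http://imgur.com/foobar.jpg', 'Capital: London'],
    ['Iceland', 'Link: http://imgur.com/foobar2.jpg', 'Capital: Reykjavik'],
]

out = io.StringIO()  # stands in for open('CSVfileName.csv', 'w', newline='')
writer = csv.writer(out, lineterminator='\n')
writer.writerows(data)  # one CSV row per entry, one column per line of the entry
print(out.getvalue(), end='')
```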
I have been trying to read/write values (lists) in a .txt file and use them later, but I can't find a function or anything else that lets me use these values as lists rather than strings; the readline function doesn't help.
Also, I don't want to use multiple text files to make up one list.
Example:
v=[]
f = open("test.txt","r+",-1)
f.seek(0)
v.append(f.readline())
print(v)
In test.txt:
cat, dog, dinosaur, elephant
cheese, hotdog, pizza, sushi
101, 23, 58, 23
I'm expecting the list v = [cat, dog, dinosaur, elephant], with each item at a separate index, but with this code (which is totally wrong) I get this instead:
v = ['cat,dog,dinosaur,elephant'], which is not what I want.
Sounds like you want to read it as comma separated values.
Try the following
import csv
with open('test.txt', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        print(row)
I believe that will put you on the right track. For more information about how the csv parser works, have a look at the docs
https://docs.python.org/3/library/csv.html
To me, it looks like you're trying to read a file and split it on commas.
This can be accomplished by
with open("test.txt") as f:
    v = f.read().split(",")
print(v)
It should output
['cat', ' dog', ' dinosaur', ' elephant\ncheese', ...]
And so forth.
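Note that the split(",") answer keeps the space in front of most items and merges the end of one line with the start of the next. A sketch that gives one clean list per line, using io.StringIO to stand in for test.txt (with a real file, use open('test.txt', newline='') instead). Every value comes back as a string, so numbers need an explicit int():

```python
import csv
import io

text = "cat, dog, dinosaur, elephant\ncheese, hotdog, pizza, sushi\n101, 23, 58, 23\n"

# Option 1: csv.reader; skipinitialspace=True drops the space after each comma
rows = list(csv.reader(io.StringIO(text), skipinitialspace=True))

# Option 2: plain str.split per line, stripping whitespace around each item
rows2 = [[item.strip() for item in line.split(',')] for line in text.splitlines()]

numbers = [int(x) for x in rows[2]]  # convert the numeric line explicitly
print(rows[0], numbers)
```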
This seems a very basic question, but I am new to python, and after spending a long time trying to find a solution on my own, I thought it's time to ask some more advanced people!
So, I have a file (sample):
ENSMUSG00000098737 95734911 95734973 3 miRNA
ENSMUSG00000077677 101186764 101186867 4 snRNA
ENSMUSG00000092727 68990574 68990678 11 miRNA
ENSMUSG00000088009 83405631 83405764 14 snoRNA
ENSMUSG00000028255 145003817 145032776 3 protein_coding
ENSMUSG00000028255 145003817 145032776 3 processed_transcript
ENSMUSG00000028255 145003817 145032776 3 processed_transcript
ENSMUSG00000098481 38086202 38086317 13 miRNA
ENSMUSG00000097075 126971720 126976098 7 lincRNA
ENSMUSG00000097075 126971720 126976098 7 lincRNA
and I need to write a new file with all the same information, but sorted by the first column.
What I use so far is :
lines = open(my_file, 'r').readlines()
output = open("intermediate_alphabetical_order.txt", 'w')
for line in sorted(lines, key=itemgetter(0)):
    output.write(line)
output.close()
It doesn't raise any error, but it writes the output file exactly the same as the input file.
I know it is certainly a very basic mistake, but it would be amazing if some of you could tell me what I'm doing wrong!
Thanks a lot!
Edit
I am having trouble with the way I open the file, so the answers concerning already opened arrays don't really help.
The problem you're having is that you're not turning each line into a list. When you read in the file, you're just getting the whole line as a string. You're then sorting by the first character of each line, and this is always the same character in your input, 'E'.
To just sort by the first column, you need to split the first block off and just read that section. So your key should be this:
for line in sorted(lines, key=lambda line: line.split()[0]):
split will turn your line into a list, and then the first column is taken from that list.
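A quick way to see the difference, using three of the sample lines: itemgetter(0) applied to a string returns its first character, while the split-based key returns the whole first column.

```python
from operator import itemgetter

lines = [
    "ENSMUSG00000098737 95734911 95734973 3 miRNA\n",
    "ENSMUSG00000077677 101186764 101186867 4 snRNA\n",
    "ENSMUSG00000028255 145003817 145032776 3 protein_coding\n",
]

first_chars = {itemgetter(0)(line) for line in lines}  # every key is the same letter
by_first_char = sorted(lines, key=itemgetter(0))
by_first_column = sorted(lines, key=lambda line: line.split()[0])
print(first_chars)
print(''.join(by_first_column), end='')
```

Because every first-character key ties and sorted() is stable, the itemgetter(0) version returns the lines in their original order, which is exactly the symptom in the question.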
If your input file is tab-separated, you can also use the csv module.
import csv
from operator import itemgetter
with open("t.txt", newline="") as f:
    reader = csv.reader(f, delimiter="\t")
    for line in sorted(reader, key=itemgetter(0)):
        print(line)
This sorts by the first column. Change the number in
key=itemgetter(0)
to sort by a different column.
Same idea as SuperBiasedMan's, but I prefer this approach: if you want another way of sorting (for example, if the first column matches, sort by the second, then the third, etc.), it is more easily implemented:
with open(my_file) as f:
    lines = [line.split(' ') for line in f]

output = open("result.txt", 'w')
# lists compare element by element, so this sorts by column 0, then 1, then 2, ...
for line in sorted(lines):
    output.write(' '.join(line))  # the last field still carries its '\n'
output.close()
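One caveat when sorting split lines: numeric columns are still strings, so '101186764' sorts before '38086202'. A sketch of a multi-column key that converts the start position to an integer and falls back to the gene ID on ties (start_then_id is a hypothetical name; the column positions are taken from the sample data):

```python
def start_then_id(line):
    """Key: start position (2nd column) as an int, then the gene ID (1st column)."""
    fields = line.split()
    return (int(fields[1]), fields[0])


lines = [
    "ENSMUSG00000098737 95734911 95734973 3 miRNA\n",
    "ENSMUSG00000077677 101186764 101186867 4 snRNA\n",
    "ENSMUSG00000092727 68990574 68990678 11 miRNA\n",
    "ENSMUSG00000098481 38086202 38086317 13 miRNA\n",
]

as_text = sorted(lines, key=lambda line: line.split()[1])  # string comparison
as_numbers = sorted(lines, key=start_then_id)              # numeric comparison
for line in as_numbers:
    print(line, end='')
```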
You can write a function that takes a filename, delimiter and column to sort by, using csv.reader to parse the file:
from operator import itemgetter
import csv

def sort_by(fle, col, delim):
    with open(fle, newline='') as f:
        r = csv.reader(f, delimiter=delim)  # the keyword is delimiter, not delim
        for row in sorted(r, key=itemgetter(col)):
            yield row

for row in sort_by("your_file", 2, "\t"):
    print(row)
You can do this quickly with pandas as follows, with the data file set up exactly as you show it (i.e., with variable spaces as separators):
import pandas as pd

df = pd.read_csv('csvdata.csv', sep=' ', skipinitialspace=True, header=None)
df = df.sort_values(by=[0])  # DataFrame.sort() was removed in later pandas versions
df.to_csv('sorted_csvdata.csv', header=False, index=False)
Just to check the result:
with open('sorted_csvdata.csv', 'r') as f:
    print(f.read())
ENSMUSG00000028255,145003817,145032776,3,protein_coding
ENSMUSG00000028255,145003817,145032776,3,processed_transcript
ENSMUSG00000028255,145003817,145032776,3,processed_transcript
ENSMUSG00000077677,101186764,101186867,4,snRNA
ENSMUSG00000088009,83405631,83405764,14,snoRNA
ENSMUSG00000092727,68990574,68990678,11,miRNA
ENSMUSG00000097075,126971720,126976098,7,lincRNA
ENSMUSG00000097075,126971720,126976098,7,lincRNA
ENSMUSG00000098481,38086202,38086317,13,miRNA
ENSMUSG00000098737,95734911,95734973,3,miRNA
You can sort by multiple columns by adding more column labels to the list passed as by=[...].
Here is another option, similar to some of the ideas above. Basically, mysort is a function that does the custom sorting for you, which here is based on the first whitespace-separated column of each line:
def mysort(line):
    return line.split()[0]

with open("records.txt", "r") as f:
    text = f.readlines()

for line in sorted(text, key=mysort):
    print(line, end='')  # each line already ends with '\n'
I have the following output from a CSV file:
word1|word2|word3|word4|word5|word6|01:12|word8
word1|word2|word3|word4|word5|word6|03:12|word8
word1|word2|word3|word4|word5|word6|01:12|word8
What I need to do is change the time string to look like 00:01:12.
My idea is to extract list item [7] and add "00:" as a string to the front.
import csv
with open('temp', 'r') as f:
    reader = csv.reader(f, delimiter="|")
    for row in reader:
        fixed_time = (str("00:") + row[7])
        begin = row[:6]
        end = row[:8]
        print begin + fixed_time +end
I get this error message:
TypeError: can only concatenate list (not "str") to list.
I also had a look at this post:
how to change [1,2,3,4] to '1234' using python
I need to know if my approach to the solution is the right way. Maybe I need to use split or something else for this.
Thanks for any help.
The line that's throwing the exception is
print begin + fixed_time +end
because begin and end are both lists and fixed_time is a string. Whenever you take a slice of a list (that's the row[:6] and row[:8] parts), a list is returned. If you just want to print it out, you can do
print begin, fixed_time, end
and you won't get an error.
Corrected code:
I'm opening a new file for writing (I'm calling it 'final', but you can call it whatever you want), and writing everything to it with the one modification. It's easiest to change just the one element of the list that holds the time (row[6] here, since list indexes start at 0), and use '|'.join to put a pipe character between the columns.
import csv

with open('temp', 'r') as f, open('final', 'w') as fw:
    reader = csv.reader(f, delimiter="|")
    for row in reader:
        # just change the element in the row to have the extra zeros
        row[6] = '00:' + row[6]
        # write the row back out, separated by | characters, and a new line
        fw.write('|'.join(row) + '\n')
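If any field could itself contain a '|', writing the rows back with csv.writer (delimiter='|') quotes that field instead of silently adding a column. A sketch, with io.StringIO standing in for the 'temp' and 'final' files and a hypothetical pad_time helper:

```python
import csv
import io


def pad_time(row, index=6):
    """Return a copy of row with '00:' prepended to the mm:ss value at `index`."""
    fixed = list(row)
    fixed[index] = '00:' + fixed[index]
    return fixed


text = "word1|word2|word3|word4|word5|word6|01:12|word8\n"
out = io.StringIO()  # stands in for open('final', 'w', newline='')
writer = csv.writer(out, delimiter='|', lineterminator='\n')
for row in csv.reader(io.StringIO(text), delimiter='|'):
    writer.writerow(pad_time(row))
print(out.getvalue(), end='')
```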
You can also use a regex for that:
>>> txt = """\
... word1|word2|word3|word4|word5|word6|01:12|word8
... word1|word2|word3|word4|word5|word6|03:12|word8
... word1|word2|word3|word4|word5|word6|01:12|word8"""
>>> import re
>>> print(re.sub(r'\|(\d\d:\d\d)\|', r'|00:\1|', txt))
word1|word2|word3|word4|word5|word6|00:01:12|word8
word1|word2|word3|word4|word5|word6|00:03:12|word8
word1|word2|word3|word4|word5|word6|00:01:12|word8
I am having problems getting my Python script to do what I want. It does not appear to be modifying my file.
I want to:
Read in a *.csv file that has the following format
PropertyName::PropertyValue,…,PropertyName::PropertyValue,{ExtPropertyName::ExtPropertyValue},…,{ExtPropertyName:: ExtPropertyValue}
I want to remove PropertyName:: and leave behind just a column of the PropertyValue
I want to add a header line
I was trying to step through by replacing the :: values with a comma, but can't seem to get this to work:
fin = csv.reader(open('infile', 'rb'), delimiter=',')
fout = open('outfile', 'w')
for row in fin:
    fout.write(','.join(','.join(item.split()) for item in row) + '::')
fout.close()
Any advice, whether on my first step problem, or to a bigger picture resolution is always appreciated. Thanks.
UPDATE/EDIT asked for by a person nice enough to review for me!
Here is the first line of the *.csv file (INPUT)
InnerDiameterOrWidth::0.1,InnerHeight::0.1,Length2dCenterToCenter::44.6743867864386,Length3dCenterToCenter::44.6768028159989,Length2dToInsideEdge::44.2678260053526,Length3dToInsideEdge::44.2717800813466,Length2dToOutsideEdge::44.6743867864386,Length3dToOutsideEdge::44.6768028159989,MinimumCover::0,MaximumCover::0,StartConnection::ImmxGisUtilityNetworkCommon.Connection,
In a perfect world here is what I would like my text file to look like (OUTPUT)
InnerDiameterOrWidth, InnerHeight, Length2dCenterToCenter,,,,,,,,,,,
0.1,0.1,44.6743867864386
So: one header line, with the values in columns below it.
UPDATED JSON Info
The end of each line has JSON formatted text:
{StartPoint::7858.35924983374[%2C]1703.69341358077[%2C]-3.075},{EndPoint::7822.85045874375[%2C]1730.80294308742[%2C]-3.53962362760298}
Which I need to split into X, Y, Z (start) and X, Y, Z (end) columns, with headers.
Maybe something like this (assuming that each line has the same keys, and in the same order):
import csv

# newline='' is what the csv module expects for files in Python 3
with open("diam.csv", newline="") as fin, open("diam_out.csv", "w", newline="") as fout:
    reader = csv.reader(fin)
    writer = csv.writer(fout)
    for i, line in enumerate(reader):
        split = [item.split("::") for item in line if item.strip()]
        if not split:  # blank line
            continue
        keys, vals = zip(*split)
        if i == 0:
            # first line: write header
            writer.writerow(keys)
        writer.writerow(vals)
which produces
localhost-2:coding $ cat diam_out.csv
InnerDiameterOrWidth,InnerHeight,Length2dCenterToCenter,Length3dCenterToCenter,Length2dToInsideEdge,Length3dToInsideEdge,Length2dToOutsideEdge,Length3dToOutsideEdge,MinimumCover,MaximumCover,StartConnection
0.1,0.1,44.6743867864386,44.6768028159989,44.2678260053526,44.2717800813466,44.6743867864386,44.6768028159989,0,0,ImmxGisUtilityNetworkCommon.Connection
I think most of that code should make sense, except maybe the zip(*split) trick: that basically transposes a sequence, i.e.
>>> s = [['a','1'],['b','2']]
>>> list(zip(*s))
[('a', 'b'), ('1', '2')]
so that the elements are now grouped together by their index (the first ones are all together, the second, etc.)
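The answer doesn't cover the StartPoint/EndPoint items from the update. Because their commas are escaped as [%2C], csv.reader keeps each one as a single field, so they can be split afterwards. A sketch, assuming the {Name::x[%2C]y[%2C]z} shape shown in the question (point_columns and the X/Y/Z header suffixes are hypothetical choices):

```python
def point_columns(item):
    """Split '{StartPoint::x[%2C]y[%2C]z}' into (['StartPointX', ...], ['x', 'y', 'z'])."""
    name, _, value = item.strip('{}').partition('::')
    coords = value.split('[%2C]')  # '[%2C]' looks like a URL-escaped comma
    headers = [name + axis for axis in ('X', 'Y', 'Z')]
    return headers, coords


item = '{StartPoint::7858.35924983374[%2C]1703.69341358077[%2C]-3.075}'
headers, coords = point_columns(item)
print(headers, coords)
```

In the loop above, items starting with '{' would need to be handled like this before the generic split("::"), which would otherwise produce a key of '{StartPoint'.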