I have two lists:
x = [['a','b','c'], ['d','e','f'], ['g','h','i']]
y = [['j','k','l'], ['m','n','o'], ['p','q','r']]
I'd like to write lists x and y to a CSV file such that it reads in columns:
Col 1:
a
b
c
Col 2:
j
k
l
Col 3:
d
e
f
Col 4:
m
n
o
etc. I'm not really sure how to do this.
You can use zip to do the transpose and csv to create your output file, e.g.:
x = [['a','b','c'], ['d','e','f'], ['g','h','i']]
y = [['j','k','l'], ['m','n','o'], ['p','q','r']]
from itertools import chain
import csv
res = zip(*list(chain.from_iterable(zip(x, y))))
# 'wb' is the Python 2 mode; on Python 3 use open('yourfile.csv', 'w', newline='')
with open(r'yourfile.csv', 'wb') as fout:
    csvout = csv.writer(fout)
    csvout.writerows(res)
If you have unequal lengths, then you may wish to look at itertools.izip_longest and specify a suitable fillvalue= instead of using the builtin zip
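As a rough illustration of the unequal-length case, here is a minimal sketch assuming Python 3, where izip_longest is named itertools.zip_longest. The shortened sublists and the empty-string fill value are illustrative choices, not part of the original question, and the output goes to an in-memory buffer instead of a real file.

```python
import csv
import io
from itertools import chain, zip_longest

# Illustrative data: some sublists are deliberately shorter than others.
x = [['a', 'b', 'c'], ['d', 'e'], ['g', 'h', 'i']]
y = [['j', 'k', 'l'], ['m', 'n', 'o'], ['p']]

# Interleave the sublists: x[0], y[0], x[1], y[1], ...
columns = list(chain.from_iterable(zip(x, y)))

# Transpose the columns into rows, padding short columns with ''.
rows = list(zip_longest(*columns, fillvalue=''))

# Write to an in-memory buffer; for a real file on Python 3 use
# open('yourfile.csv', 'w', newline='') instead.
buf = io.StringIO()
csv.writer(buf, lineterminator='\n').writerows(rows)
csv_text = buf.getvalue()
print(csv_text)
```

With the builtin zip, the rows would instead be cut off at the length of the shortest column.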
So I have this list called data_lst:
[1,2,3,4]
[3,2,4,1]
[4,3,1,2]
and I want my output to be the numbers by columns, like
[1,3,4]
[2,2,3]
[3,4,1]
[4,1,2]
I have attempted to do this so far...
f = []
for x in data_lst:
    for w in range(0,len(x)-1):
        f.append(data_lst[w])
print(f)
However, the output I'm getting is the same as the input I provided, which is..
[1,2,3,4]
[3,2,4,1]
[4,3,1,2]
What should I change in my code?
The operation is called transpose of a matrix.
You can do it in a variety of ways, for example with numpy or with zip.
This is just a solution modifying your code.
data_lst = [
    [1,2,3,4],
    [3,2,4,1],
    [4,3,1,2]
]
rowLen = len(data_lst)
colLen = len(data_lst[0])
# Pre-allocate the result: one row per input column
f = [[0 for _ in range(rowLen)] for _ in range(colLen)]
for c in range(colLen):
    for r in range(rowLen):
        f[c][r] = data_lst[r][c]
print(f)
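For comparison, the zip approach mentioned above can be sketched as follows. This is a minimal example assuming every row has the same length; the name transposed is just illustrative.

```python
data_lst = [
    [1, 2, 3, 4],
    [3, 2, 4, 1],
    [4, 3, 1, 2],
]

# zip(*data_lst) pairs up the i-th element of every row, i.e. the columns.
# Each column comes back as a tuple, so convert it to a list.
transposed = [list(col) for col in zip(*data_lst)]

for row in transposed:
    print(row)
```

If the rows had different lengths, zip would silently stop at the shortest one.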
I am trying to compare two csv files to look for common values in column 1.
import csv
f_d1 = open('test1.csv')
f_d2 = open('test2.csv')
f1_csv = csv.reader(f_d1)
f2_csv = csv.reader(f_d2)
for x,y in zip(f1_csv,f2_csv):
    print(x,y)
I am trying to compare x[0] with y[0]. I am fairly new to Python and trying to find the most Pythonic way to achieve this. Here are the CSV files.
test1.csv
Hadrosaurus,1.2
Struthiomimus,0.92
Velociraptor,1.0
Triceratops,0.87
Euoplocephalus,1.6
Stegosaurus,1.4
Tyrannosaurus Rex,2.5
test2.csv
Euoplocephalus,1.87
Stegosaurus,1.9
Tyrannosaurus Rex,5.76
Hadrosaurus,1.4
Deinonychus,1.21
Struthiomimus,1.34
Velociraptor,2.72
I believe you're looking for the set intersection:
import csv
f_d1 = open('test1.csv')
f_d2 = open('test2.csv')
f1_csv = csv.reader(f_d1)
f2_csv = csv.reader(f_d2)
x = set([item[0] for item in f1_csv])
y = set([item[0] for item in f2_csv])
print(x & y)
Assuming that the files are not prohibitively large, you can read both of them with a CSV reader, convert the first columns to sets, and calculate the set intersection:
import csv
with open('test1.csv') as f:
    set1 = set(x[0] for x in csv.reader(f))
with open('test2.csv') as f:
    set2 = set(x[0] for x in csv.reader(f))
print(set1 & set2)
#{'Hadrosaurus', 'Euoplocephalus', 'Tyrannosaurus Rex', 'Struthiomimus',
# 'Velociraptor', 'Stegosaurus'}
I added a line to test whether the numerical values in each row are the same. You can modify this to test whether, for instance, the values are within some distance of each other:
import csv
f_d1 = open('test1.csv')
f_d2 = open('test2.csv')
f1_csv = csv.reader(f_d1)
f2_csv = csv.reader(f_d2)
for x,y in zip(f1_csv,f2_csv):
    if x[1] == y[1]:
        print('they match!')
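The "within some distance" variation might look like the sketch below. Note that it is not the same technique as the zip loop above: it matches rows by name rather than by position, since the two files list the animals in different orders. The tolerance value, the load helper, and the in-memory stand-ins for the two files are all assumptions made for illustration.

```python
import csv
import io

# In-memory stand-ins for test1.csv and test2.csv (a subset of the question's data).
test1 = io.StringIO("Hadrosaurus,1.2\nStegosaurus,1.4\nTriceratops,0.87\n")
test2 = io.StringIO("Stegosaurus,1.9\nHadrosaurus,1.4\nDeinonychus,1.21\n")

def load(f):
    """Map the first column (name) to the second column parsed as a float."""
    return {row[0]: float(row[1]) for row in csv.reader(f) if len(row) >= 2}

vals1 = load(test1)
vals2 = load(test2)

tolerance = 0.25  # hypothetical threshold, pick whatever suits the data

# Names present in both files whose values differ by at most `tolerance`.
close = sorted(
    name for name in vals1.keys() & vals2.keys()
    if abs(vals1[name] - vals2[name]) <= tolerance
)
print(close)
```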
Take advantage of defaultdict in Python: iterate over both files, collect the values for each name in a dictionary, and keep only the names that have more than one value, like this:
from collections import defaultdict
d = defaultdict(list)
for row in f1_csv:
    d[row[0]].append(row[1])
for row in f2_csv:
    d[row[0]].append(row[1])
d = {k: d[k] for k in d if len(d[k]) > 1}
print(d)
Output:
{'Hadrosaurus': ['1.2', '1.4'], 'Struthiomimus': ['0.92', '1.34'], 'Velociraptor': ['1.0', '2.72'],
'Euoplocephalus': ['1.6', '1.87'], 'Stegosaurus': ['1.4', '1.9'], 'Tyrannosaurus Rex': ['2.5', '5.76']}
I have a tab-delimited text file with two columns. I need to find a way to print all values that “hit” each other on one line.
For example, my input looks like this:
A B
A C
A D
B C
B D
C D
B E
D E
B F
C F
F G
F H
H I
K L
My desired output should look like this:
A B C D
B D E
B C F
F G H
H I
K L
My actual data file is much larger than this if that makes any difference. I would prefer to do this in Unix or Python where possible.
Can anybody help?
Thanks in advance!
Is there no way to get the input file as a .csv? The delimiters would be easier to parse.
If that isn't possible, try the following example:
from itertools import groupby
from operator import itemgetter
# Store the file's rows in a list of lists
cleaned = []
with open('example.txt') as txtfile:
    for line in txtfile:
        cleaned.append(line.split())
# Group by the first element of each row (groupby only merges adjacent rows,
# so the input must already be ordered by the first column)
for elt, items in groupby(cleaned, itemgetter(0)):
    row = [elt]
    for item in items:
        row.append(item[1])
    print(row)
Hope it helps you.
Solution using a .csv file:
from itertools import groupby
from operator import itemgetter
import csv
cleaned = []
# Python 3 style open; on Python 2 use open('example.csv', 'rb')
with open('example.csv', newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter='\t')
    for row in reader:
        cleaned.append(row)
# Group by the first element of each row
for elt, items in groupby(cleaned, itemgetter(0)):
    row = [elt]
    for item in items:
        row.append(item[1])
    print(row)
So far all I've been able to find are questions relating to merging lists together.
I'm looking to take 3 lists (2 lists and one 2D list)
list1 = [[1,11,12],[2,21,22],[3,31,32]]
list2 = [4,5,6]
list3 = [7,8,9]
And write these values to a CSV file resulting in the following // Desired Output
Row1,Row2,Row3
1,4,7
2,5,8
3,6,9
11,12 etc shouldn't be written to the CSV file.
Code:
f = open('file.csv', 'wb')
fileWriter = csv.writer(f)
fileWriter.writerow(['Row1', 'Row2', 'Row3']) # Write header row
Attempt:
listlen = len(list2)
list1len = len(list1)
for i in range(listlen):
    for j in range(listlen):
        for k in range(list1len):
            fileWriter.writerow([list1[k][0],list2[i],list3[j]])
Results:
Row1,Row2,Row3
1,4,7
2,4,7
3,4,7
Attempt:
for A,B,C in list1:
    for X in list2:
        for Y in list3:
            tempA = A
            tempX = X
            tempY = Y
            fileWriter.writerow([tempA,tempX,tempY])
Results:
Row1,Row2,Row3
1,4,7
1,4,8
1,4,9
etc.
The current code (both attempts) is iterating through all the values in list2 and list3 for each single digit value in list1. All I've managed to do between the two attempts is change the order of the numbers that are written out.
What I'd like to do is write out the 1st values of each list, then the 2nd etc. to give the desired output.
How could I adjust this to give the desired output / are there better alternatives?
Python 2.7 only
Added the solution I created from Abhijit's answer:
result = zip(zip(*list1)[0], list2, list3)
for row in result:
    tempRow = row
    fileWriter.writerow(tempRow)
I assume you already know how to write a CSV file, and that the part you are wondering about is an elegant way to access the data.
A nifty way to access the data is to use the built-in zip. zip([iterable, ...]) can transpose data, and that is what you need here:
>>> zip(zip(*list1)[0], list2, list3)
[(1, 4, 7), (2, 5, 8), (3, 6, 9)]
Unfortunately, the target result you are contemplating is formatted (column-aligned) data rather than a CSV, so even if you write the CSV as tab-delimited, the result may not look the way you want:
>>> result = [['row1','row2','row3']] + zip(zip(*list1)[0], list2, list3)
>>> with open('file.csv', 'wb') as fin:
        writer = csv.writer(fin, delimiter = '\t')
        writer.writerows(result)
row1 row2 row3
1 4 7
2 5 8
3 6 9
One option is to forego the idea of csv and instead write the data as a formatted string using str.format.
>>> header = ['row1','row2','row3']
>>> result = zip(zip(*list1)[0], list2, list3)
>>> format_string = "{:<10}{:<10}{:<10}"
>>> with open('file.csv', 'wb') as fin:
        fin.write(format_string.format(*header))
        fin.write('\n')
        for row in result:
            fin.write(format_string.format(*row))
            fin.write('\n')
row1 row2 row3
1 4 7
2 5 8
3 6 9
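The zip-based transpose from this answer can also be sketched for Python 3, although the question itself is restricted to Python 2.7. There zip returns an iterator, so zip(*list1)[0] no longer works, and the first column is taken with a list comprehension instead. Writing to an in-memory buffer is only an assumption to keep the sketch self-contained; a real file would be opened with open('file.csv', 'w', newline='').

```python
import csv
import io

list1 = [[1, 11, 12], [2, 21, 22], [3, 31, 32]]
list2 = [4, 5, 6]
list3 = [7, 8, 9]

# Take only the first element of each sublist in list1.
first_col = [row[0] for row in list1]

# Pair up the i-th elements of the three sequences: one tuple per output row.
rows = list(zip(first_col, list2, list3))

buf = io.StringIO()
writer = csv.writer(buf, lineterminator='\n')
writer.writerow(['Row1', 'Row2', 'Row3'])  # header row
writer.writerows(rows)
csv_text = buf.getvalue()
print(csv_text)
```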
I have an "asin.txt" document:
in,Huawei1,DE
out,Huawei2,UK
out,Huawei3,none
in,Huawei4,FR
in,Huawei5,none
in,Huawei6,none
out,Huawei7,IT
I open this file and build two OrderedDicts:
import csv
from collections import OrderedDict
reader = csv.reader(open('asin.txt','r'),delimiter=',')
reader1 = csv.reader(open('asin.txt','r'),delimiter=',')
d = OrderedDict((row[0], row[1].strip()) for row in reader)
d1 = OrderedDict((row[1], row[2].strip()) for row in reader1)
Then I want to create variables (a,b,c,d) so if we take the first line of the asin.txt it should be like: a = in; b = Huawei1; c = Huawei1; d = DE. To do this I'm using a "for" loop:
from itertools import izip
for (a, b), (c, d) in izip(d.items(), d1.items()): # here
    try:
        .......
It worked before, but now, for some reason, it prints an error:
d = OrderedDict((row[0], row[1].strip()) for row in reader)
IndexError: list index out of range
How do I fix that?
Probably you have a row in your textfile which does not have at least two fields delimited by ",". E.g.:
in,Huawei1
Try to find the solution along these lines:
d = OrderedDict((row[0], row[1].strip()) for row in reader if len(row) >= 2)
or
l = []
for row in reader:
    if len(row) >= 2:
        # append takes a single argument, so pass the pair as a tuple
        l.append((row[0], row[1].strip()))
d = OrderedDict(l)
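To locate the offending rows rather than just skipping them, something like the following sketch may help. It assumes the culprit is a blank or short line (csv.reader yields an empty list for a blank line); the sample text and the names good, skipped, and pairs are made up for illustration.

```python
import csv
import io
from collections import OrderedDict

# Illustrative stand-in for asin.txt with a blank line and a short row.
sample = io.StringIO("in,Huawei1,DE\n\nout,Huawei2\nin,Huawei4,FR\n")

good = []
skipped = []
reader = csv.reader(sample, delimiter=',')
for row in reader:
    if len(row) < 3:
        # reader.line_num is the number of source lines read so far.
        skipped.append((reader.line_num, row))
        continue
    good.append(row)

# Same mapping as d1 in the question, built only from complete rows.
pairs = OrderedDict((row[1], row[2].strip()) for row in good)

print(skipped)
print(pairs)
```

Printing skipped shows which line numbers in the file need to be fixed.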