Get the first row in multiple files in Python

I'm trying to iterate over files to get the first line of each and put it in a database.
The problem is that I can't figure out how to read only one row.
Right now my code runs over all the rows, while I only need one row per file.
The files look like this:
batch_name sample_barcode pool_barcode pool_type pooling_volume_ul pooling_comments
NIPT20200304 0101002253 PT2129764 A 2.0
NIPT20200304 0109011474 PT2129764 A 17.66
And my code is currently this:
pools = []
for files in library:
    with open(files, 'r') as f:
        next(f)
        reader = csv.reader(f, delimiter='\t')
        for row in reader:
            print("row: ", row)
            pools.append(row)
print("pools", pools)
With for row in reader, I get all the rows, and with row[0] I only get the first column but still all the rows. I tried f.readline().rstrip(), but then I don't know where to put the delimiter, as "\t" shows up in the pools variable instead of a space.

I got what I wanted thanks with this:
# read over the pool_report files and get the first line
pools = []
for files in library:
    with open(files, 'r') as f:
        next(f)
        data = f.readline().strip()
        values = data.split()
        pools.append(values)
print("pools: ", pools)

Try to call next(reader) instead of next(f):
pools = []
for files in library:
    with open(files, 'r') as f:
        reader = csv.reader(f, delimiter='\t')
        next(reader, None)
        for row in reader:
            print("row: ", row)
            pools.append(row)
print("pools", pools)

You don't need the last for loop; call next(reader) once instead, like below:
pools = []
for files in library:
    with open(files, 'r') as f:
        next(f)
        reader = csv.reader(f, delimiter='\t')
        pools.append(next(reader))
print("pools", pools)
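For reference, the approaches above can be combined into one small helper. This is only a sketch, assuming tab-delimited files with a single header line; the function name first_data_rows and its paths argument are hypothetical and not part of the original code.

```python
import csv


def first_data_rows(paths, delimiter="\t"):
    """Return the first data row (the line after the header) of each file."""
    rows = []
    for path in paths:
        with open(path, newline="") as f:
            reader = csv.reader(f, delimiter=delimiter)
            next(reader, None)        # skip the header; the default avoids StopIteration
            row = next(reader, None)  # first data row, or None if the file has no data
            if row is not None:
                rows.append(row)
    return rows
```

Passing a default to next() means a file that is empty or contains only a header is skipped instead of raising StopIteration.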

Related

Compare two CSV files and write difference in the same file as an extra column in python

Hey intelligent community,
I need a little bit of help because I think I can't see the wood for the trees.
I have two CSV files that look like this:
Name,Number
AAC;2.2.3
AAF;2.4.4
ZCX;3.5.2
Name,Number
AAC;2.2.3
AAF;2.4.4
ZCX;3.5.5
I would like to compare both files and then write any changes like this:
Name,Number,Changes
AAC;2.2.3
AAF;2.4.4
ZCX;5.5.5;change: 3.5.2
So on every line where the number differs, I want to add the change as a new column at the end of the line.
The files are formatted the same way but sometimes have a new row, which is why I think I have to map the keys.
I got this far, but now I am lost:
Python 3.10.9
import csv
# Reading the first csv and set mapping
with open('test1.csv', 'r') as csvfile:
    reader = csv.reader(csvfile)
    rows = list(reader)
    file1_dict = {row[1]: row[0] for row in rows}
# Reading the second csv and set mapping
with open('test2.csv', 'r') as csvfile:
    reader = csv.reader(csvfile)
    rows = list(reader)
    file2_dict = {row[1]: row[0] for row in rows}
# comparing the keys and find the diff
for k in file1_dict:
    if file1_dict[k] != file2_dict[k]:
        file1_dict[k] = file2_dict[k]
        for row in rows:
            if row[1] == k:
                row.append(file2_dict[k])
# write the csv (not sure how to add the word "change:")
with open('test1.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerows(rows)
If I try this, I don't get a new column; it just "updates" the CSV file with the same columns.
For example, this code gives me the differing rows, but I am not able to add them to the existing file and row.
with open('test1.csv') as fin1:
    with open('test2.csv') as fin2:
        read1 = csv.reader(fin1)
        read2 = csv.reader(fin2)
        diff_rows = (row1 for row1, row2 in zip(read1, read2) if row1 != row2)
        with open('test3.csv', 'w') as fout:
            writer = csv.writer(fout)
            writer.writerows(diff_rows)
Does someone have any tips or help for my problem? I have read many answers on here but can't figure it out.
Thanks a lot.
@bigkeefer
Thanks for your answer. I tried to change it to use the ; delimiter, but it gives a "list index out of range" error.
with open('test3.csv', 'r') as file1:
    reader = csv.reader(file1, delimiter=';')
    rows = list(reader)[1:]
    file1_dict = {row[0]: row[1] for row in rows}
with open('test4.csv', 'r') as file2:
    reader = csv.reader(file2, delimiter=';')
    rows = list(reader)[1:]
    file2_dict = {row[0]: row[1] for row in rows}
new_file = ["Name;Number;Changes\n"]
with open('output.csv', 'w') as nf:
    for key, value in file1_dict.items():
        if value != file2_dict[key]:
            new_file.append(f"{key};{file2_dict[key]};change: {value}\n")
        else:
            new_file.append(f"{key};{value}\n")
    nf.writelines(new_file)
You will need to adapt this to overwrite your first file etcetera, as you mentioned above, but I've left it like this for your testing purposes. Hopefully this will help you in some way.
I've assumed you've actually got the headers above in each file. If not, remove the slicing on the list creations, and change the new_file variable assignment to an empty list ([]).
with open('f1.csv', 'r') as file1:
    reader = csv.reader(file1, delimiter=";")
    rows = list(reader)[1:]
    file1_dict = {row[0]: row[1] for row in rows if row}
with open('f2.csv', 'r') as file2:
    reader = csv.reader(file2, delimiter=";")
    rows = list(reader)[1:]
    file2_dict = {row[0]: row[1] for row in rows if row}
new_file = ["Name,Number,Changes\n"]
for key, value in file1_dict.items():
    if value != file2_dict[key]:
        new_file.append(f"{key};{file2_dict[key]};change: {value}\n")
    else:
        new_file.append(f"{key};{value}\n")
with open('new.csv', 'w') as nf:
    nf.writelines(new_file)
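A variant of the same idea, written with csv.writer and tolerant of rows that exist in only one file, might look like the sketch below. It assumes ;-delimited files with a header line, as in the answer above. The names read_mapping and write_changes are hypothetical. Unlike the answer, it iterates over the second (newer) file, so rows that were added there are kept and rows that were removed are dropped. A "list index out of range" error is likely caused by rows with fewer than two fields (for example blank lines, or a wrong delimiter), so such rows are skipped here.

```python
import csv


def read_mapping(path, delimiter=";"):
    """Map the first column to the second, skipping the header and short rows."""
    with open(path, newline="") as f:
        reader = csv.reader(f, delimiter=delimiter)
        next(reader, None)  # skip the header line
        return {row[0]: row[1] for row in reader if len(row) >= 2}


def write_changes(old_path, new_path, out_path, delimiter=";"):
    """Write the new values, adding 'change: <old value>' where a value differs."""
    old = read_mapping(old_path, delimiter)
    new = read_mapping(new_path, delimiter)
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f, delimiter=delimiter)
        writer.writerow(["Name", "Number", "Changes"])
        for name, value in new.items():
            previous = old.get(name)  # None when the row only exists in new_path
            if previous is not None and previous != value:
                writer.writerow([name, value, f"change: {previous}"])
            else:
                writer.writerow([name, value])
```

Writing to a separate output path first, for example write_changes('f1.csv', 'f2.csv', 'new.csv'), keeps the original files intact while testing.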

Iterating over DictReader variable [duplicate]

I open a file and read it with csv.DictReader. I iterate over it twice, but the second time nothing is printed. Why is this, and how can I make it work?
with open('MySpreadsheet.csv', 'rU') as wb:
    reader = csv.DictReader(wb, dialect=csv.excel)
    for row in reader:
        print row
    for row in reader:
        print 'XXXXX'
        # XXXXX is not printed
You read the entire file the first time you iterated, so there is nothing left to read the second time. Since you don't appear to be using the csv data the second time, it would be simpler to count the number of rows and just iterate over that range the second time.
import csv
with open('MySpreadsheet.csv', newline='') as f:
    reader = csv.DictReader(f, dialect=csv.excel)
    row_count = 0
    for row in reader:
        row_count += 1  # count the rows during the first pass
        print(row)
for i in range(row_count):
    print('Stack Overflow')
If you need to iterate over the raw csv data again, it's simple to open the file again. Most likely, you should be iterating over some data you stored the first time, rather than reading the file again.
with open('MySpreadsheet.csv', newline='') as f:
    reader = csv.DictReader(f, dialect=csv.excel)
    for row in reader:
        print(row)
with open('MySpreadsheet.csv', newline='') as f:
    reader = csv.DictReader(f, dialect=csv.excel)
    for row in reader:
        print('Stack Overflow')
If you don't want to open the file again, you can seek to the beginning, skip the header, and iterate again.
with open('MySpreadsheet.csv', newline='') as f:
    reader = csv.DictReader(f, dialect=csv.excel)
    for row in reader:
        print(row)
    f.seek(0)
    next(reader)  # skip the header line, which would otherwise be read as a data row
    for row in reader:
        print('Stack Overflow')
You can create a list of dictionaries, each dictionary representing a row in your file, and then count the length of the list, or use list indexing to print each dictionary item.
Something like:
import csv
with open('YourCsv.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    rowslist = list(reader)
for i in range(len(rowslist)):
    print(rowslist[i])
Add a wb.seek(0) (which goes back to the start of the file) and a next(reader) (which skips the header row) before your second loop.
You can try storing the rows in a list and then printing them from there:
input_csv = []
with open('YourCsv.csv', 'r', encoding='UTF-8') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        input_csv.append(row)
for row in input_csv:
    print(row)
for row in input_csv:
    print(row)
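The behaviour described in these answers can be reproduced without a file on disk. The following sketch uses io.StringIO with made-up sample data (the names and scores are purely illustrative) to show that a DictReader is exhausted after one pass, while a list built from it can be iterated as often as needed.

```python
import csv
import io

# Made-up sample data, for illustration only
data = "name,score\nann,1\nbob,2\n"

reader = csv.DictReader(io.StringIO(data))
first_pass = [row["name"] for row in reader]
second_pass = [row["name"] for row in reader]  # the reader is already exhausted

rows = list(csv.DictReader(io.StringIO(data)))  # materialize the rows once
names_a = [row["name"] for row in rows]
names_b = [row["name"] for row in rows]  # a list can be iterated repeatedly
```

Here second_pass ends up empty, while names_a and names_b hold the same values. Materializing into a list trades memory for convenience, so for very large files reopening the file may be the better choice.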

Leaving just the first column in the csv file using python

I am trying to keep just the first column of a CSV file, but my code does not seem to work and I cannot find a working solution.
def leavethefirstcolumn(filename):
    with open(filename) as f, open('out.csv', "w") as out:
        reader = csv.reader(f)
        for row in reader:
            out.write(row[0])
import csv
def leavethefirstcolumn(filename):
    with open(filename) as file, open('out.csv', "w") as out:
        reader = csv.reader(file)
        for row in reader:
            out.write(row[0] + "\n")
# example of using the function
leavethefirstcolumn("in.csv")
You are calling csv.reader(file), while on the previous line you wrote with open(filename) as f instead of with open(filename) as file.
Also, when you write to out, you should add a newline character '\n'.
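An alternative to appending '\n' by hand is to let csv.writer produce the output lines. This is a sketch rather than the answerer's method; the function name keep_first_column and the explicit out_path parameter are hypothetical.

```python
import csv


def keep_first_column(in_path, out_path):
    """Copy only the first field of every non-empty row to out_path."""
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            if row:  # csv.reader yields [] for blank lines, so skip those
                writer.writerow([row[0]])
```

Using csv.writer also quotes a first field that itself contains a comma, which a plain out.write(row[0] + "\n") would not do.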

reading data from specified location

import csv
# Get high temperatures from file.
filename = 'sitka_weather_07-2014.csv'
with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)
    highs = []
    for row in reader:
        highs.append(row[1])
    print(highs)
I encountered the code above while learning about extracting and reading data.
I didn’t quite get the usage of next():
header_row = next(reader)
The book explains that because we have already read the header row, the loop will begin at the second line, where the actual data begins.
What should we do if we need to read from the third line? Is the following right?
with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)
    row_1 = next(reader)
    highs = []
    for row in reader:
        highs.append(row[2])
    print(highs)
The question might be frivolous, but I’m very confused
The next function advances the reader by one row, so yes, in this code segment
header_row = next(reader)
row_1 = next(reader)
highs = []
for row in reader:
    highs.append(row[2])
print(highs)
the loop does start from the third line, though it's not the best way to do it. Note that row[2] selects the third column of each row, not the third line.
If you want to access rows directly, try this instead:
with open(filename) as f:
    reader = csv.reader(f)
    rows = list(reader)
print(rows[2])  # this will get you the third row
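If the goal is to start from a given line without loading the whole file into a list, itertools.islice is another option. The sketch below uses io.StringIO with made-up sample data instead of the Sitka weather file, so the column names and values are illustrative assumptions only.

```python
import csv
import io
from itertools import islice

# Made-up sample data standing in for the weather file
data = "date,high\n2014-07-01,64\n2014-07-02,71\n2014-07-03,64\n"

reader = csv.reader(io.StringIO(data))
# islice(reader, 2, None) skips the first two rows (the header and the first data row)
highs = [row[1] for row in islice(reader, 2, None)]
```

Changing the 2 to any other number skips that many rows, which avoids one next(reader) call per skipped line.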

How to write a matrix (read from one file) to another csv file in python with a specific output format

I have a csv file which has data in matrix format a sample of which is shown below:
index,col1,col2,col3,col4,col5,col6
col1_1,1,0.005744233,0.013118052,-0.003772589,0.004284689
col2_1,0.005744233,1,-0.013269414,-0.007132092,0.013950261
col3_1,0.013118052,-0.013269414,1,-0.014029249,-0.00199437
col4_1,-0.003772589,-0.007132092,-0.014029249,1,0.022569309
col5_1,0.004284689,0.013950261,-0.00199437,0.022569309,1
Now I want to read the data in this file and write it to another CSV file, but the format I need is this:
col1_1,value,col1
col1_1,value,col2
col1_1,value,col3
.
.
.
col2_1,value,col1
col2_1,value,col2
.
.
.
So basically, each output line holds the row label from the first column, followed by the value, followed by the column name from the first row.
I wrote this code, but it writes the wrong format:
reader = csv.reader(open(IN_FILE, "r"), delimiter=',')
writer = csv.writer(open(OUT_FILE, "w"), delimiter=',')
with open(IN_FILE) as infile:
    with open(OUT_FILE, "w") as outfile:
        reader = csv.reader(infile, delimiter=",")
        writer = csv.writer(outfile, delimiter=",")
        writer.writerow(next(reader))
        for line in reader:
            writer.writerow([line[0],line[1]])
How can I do this in python?
Try this:
with open(IN_FILE) as infile:
    with open(OUT_FILE, "w") as outfile:
        reader = csv.reader(infile, delimiter=",")
        writer = csv.writer(outfile, delimiter=",")
        first_row = None
        for line in reader:
            if first_row is None:
                first_row = line  # remember the header row with the column names
            else:
                for index, col in enumerate(first_row[1:]):
                    writer.writerow([line[0], line[index + 1], col])
This seems to work, although your test data appears to be missing a 'col6' value in each row.
The problem with your initial code was that it wasn't looping through each column of the rows.
If your file includes the column and row indices as I assume, this should do it.
old_data = list(reader)  # materialize the rows so they can be indexed
new_data = []
for row in range(1, len(old_data)):  # skip the header row
    for col in range(1, len(old_data[row])):  # skip the index column
        new_data.append([old_data[row][0], old_data[row][col], old_data[0][col]])
writer.writerows(new_data)
csv_file.close()
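Both answers can be condensed into one function. The sketch below assumes the layout shown in the question: a header row of column names and a row label in the first field of every following line. The name matrix_to_long is hypothetical. It pairs values with column names using zip, so a header that lists more columns than a row has values (as in the sample data, where 'col6' has no values) does not raise an IndexError.

```python
import csv


def matrix_to_long(in_path, out_path):
    """Write one 'row_label,value,column_name' line per matrix cell."""
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader, None)  # e.g. ['index', 'col1', 'col2', ...]
        if header is None:
            return  # empty input file, nothing to write
        for line in reader:
            # zip stops at the shorter sequence, so extra header names are ignored
            for col_name, value in zip(header[1:], line[1:]):
                writer.writerow([line[0], value, col_name])
```

A call such as matrix_to_long(IN_FILE, OUT_FILE) would then produce the requested three-column layout.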
