How to dynamically join two CSV files? - python

I have two CSV files and I was thinking about combining them via Python, to practice my skills, and it turned out to be much more difficult than I ever imagined...
A short summary of my problem: I feel like my code should be correct, but the edited CSV file turns out not to be what I expected.
One file, which I named chrM_location.csv, is the file I want to edit. The other file, named chrM_genes.csv, is the file I use as a reference. (Both files were shown as screenshots in the original post; mocked-up versions of each appear in the answers below.)
There are a few other columns but I'm not using them at the moment. The first few rows have subject "CDS", then there is a blank row, followed by a few other rows with subject "exon", then another blank row, followed by some "gene" rows (and a few others).
What I am trying to do: read the first file row by row, focus on the number in the second column (42 for the first row, not counting the header), and see if it falls in the range given by columns 4-5 of file two (also read row by row). If it does, I record the information of that corresponding row and paste it back into the first file at the end of the row; if not, I skip it.
Below is my code. I set out to run everything through the CDS section first, so I wrote a function refcds(). It returns:
whether or not the value is in range;
if in range, a list of the information I want to paste into the first file.
Everything works fine for the main part of the code: I have the list final[] containing all the information for that row, so supposedly I only need to paste it onto that row and overwrite what was there before. I used print(final) to check the info and it looks exactly like what I want it to be.
But this is what the result looks like (shown as a screenshot in the original post):
I have no idea why a new row is inserted, and why some rows are pasted together, when column 2 is supposed to run from small to large by value.
Similar things happened in other places as well.
Thank you so much for your help! I'm running out of ideas... No error messages are given and I can't figure out what went wrong.
import csv
from csv import reader
from csv import writer

mylist = []
a = 0
final = []

def refcds(value):
    mylist = []
    with open("chrM_genes.csv", "r") as infile:
        r = csv.reader(infile)
        for rows in r:
            for i in range(0, 12):
                if value >= rows[3] and value <= rows[4]:
                    mylist = ["CDS", rows[3], rows[4], int(int(value) - int(rows[3]) + 1)]
                    return 0, mylist
                else:
                    return 1, []

with open('chrM_location.csv', 'r+') as myfile:
    csv_reader = csv.reader(myfile)
    csv_writer = csv.writer(myfile)
    for row in csv_reader:
        if row[1] != 'POS':
            final = []
            a, mylist = refcds(row[1])
            if a == 0:
                lista = [row[0], row[1], row[2], row[3], row[4], row[5]]
                final.extend(lista)
                final.extend(mylist)
                csv_writer.writerow(final)
            if a == 1:
                pass
        if row[1] == 'END':
            break
myfile.close()

If I understand correctly, your code is trying to read and write the same file at the same time.
csv_reader = csv.reader(myfile)
csv_writer = csv.writer(myfile)
I haven't tried your code, but I'm pretty sure this is going to cause weird stuff to happen... (If you refactor to output to a third file, do you still see the same issue?)
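For reference, a minimal sketch of that refactor, reading from one file and writing to a separate output file (the output filename and the annotate placeholder are made up; the real transformation would be the question's refcds lookup):

import csv

def annotate(row):
    # placeholder: the real logic would call refcds() and extend matching rows
    return row

with open('chrM_location.csv', newline='') as src, \
        open('chrM_location_annotated.csv', 'w', newline='') as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst)
    for row in reader:
        writer.writerow(annotate(row))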

I think the problem is that you have your reader and writer set to the same file—I have no idea what that does. A much cleaner solution is to accumulate your modified rows in the read loop, then once you're out of the read loop (and have closed the file), open the same file for writing (not appending) and write your accumulated rows.
I've made the one big change that fixes the problem.
You also said you were trying to improve your Python, so I made some other changes that are more pythonic.
import csv

# Return a matched list, or return None
def refcds(value):
    with open('chrM_genes.csv', 'r', newline='') as infile:
        reader = csv.reader(infile)
        for row in reader:
            if value >= row[3] and value <= row[4]:
                computed = int(value) - int(row[3]) + 1  # probably negative??
                mylist = ['CDS', row[3], row[4], computed]
                return mylist
    # if we get here, we've evaluated every row and didn't already return (because of a match)
    return None

# Accumulate rows here
final_rows = []
with open('chrM_location.csv', 'r', newline='') as myfile:
    reader = csv.reader(myfile)
    # next(reader)  # if you know your file has a header
    for row in reader:
        # Handle unusual conditions first...
        if row[1] == 'POS':
            continue  # skip header??
        if row[1] == 'END':
            break
        # ...and if not met, do the desired work
        mylist = refcds(row[1])
        if mylist is not None:
            # no need to declare an empty list and then extend it;
            # just create it with initial items...
            final = row[0:6]  # slice notation: the end index is non-inclusive, so this takes columns 0-5
            final.extend(mylist)
            final_rows.append(final)

# Write accumulated rows here
with open('final.csv', 'w', newline='') as finalfile:
    writer = csv.writer(finalfile)
    writer.writerows(final_rows)
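One thing worth flagging, and probably the source of those "negative??" values: value, row[3], and row[4] are all strings here, so the range check compares them lexicographically, and '42' does sort between '3307' and '4262'. If the columns are numeric positions, comparing as integers avoids this (a small sketch of the numeric check, under that assumption):

pos = int(value)
start = int(row[3])
end = int(row[4])
if start <= pos <= end:
    computed = pos - start + 1  # non-negative whenever pos is actually in range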
I also tried to figure out the whole thing, and came up with the following.
I think you want to look up rows of chrM_genes by Subject and compare a POS (from chrM_location) against the Start and End bounds for each gene; if POS is within the range of Start and End, return the chrM_genes data and fill in some empty cells already in chrM_location.
My first step would be to create a data structure from chrM_genes, since we'll be reading from that over and over again. Reading a bit into your problem, I can see the need to "filter" the results by subject ('CDS', 'exon', etc.), but I'm not sure of this. Still, I'm going to index this data structure by subject:
import csv
from collections import defaultdict

# This will create a dictionary, where subject will be the key
# and the value will be a list (of chrM (gene) rows)
chrM_rows_by_subject = defaultdict(list)

# Fill the data structure
with open('chrM_genes.csv', newline='') as f:
    reader = csv.reader(f)
    next(reader)  # read (skip) header
    subject_col = 2
    for row in reader:
        # you mentioned empty rows that divide subjects, so skip empty rows
        if row == []:
            continue
        subject = row[subject_col]
        chrM_rows_by_subject[subject].append(row)
I mocked up chrM_genes.csv (and added a header, to try and clarify the structure):
Col1,Col2,Subject,Start,End
chrM,ENSEMBL,CDS,3307,4262
chrM,ENSEMBL,CDS,4470,5511
chrM,ENSEMBL,CDS,5904,7445
chrM,ENSEMBL,CDS,7586,8266
chrM,ENSEMBL,exon,100,200
chrM,ENSEMBL,exon,300,400
chrM,ENSEMBL,exon,700,750
Just printing the data structure to get an idea of what it's doing:
import pprint
pprint.pprint(chrM_rows_by_subject)
yields:
defaultdict(<class 'list'>,
            {'CDS': [['chrM', 'ENSEMBL', 'CDS', '3307', '4262'],
                     ['chrM', 'ENSEMBL', 'CDS', '4470', '5511'],
                     ...
                    ],
             'exon': [['chrM', 'ENSEMBL', 'exon', '100', '200'],
                      ['chrM', 'ENSEMBL', 'exon', '300', '400'],
                      ...
                     ]})
Next, I want a function to match a row by subject and POS:
# Return a row that matches `subject` with `pos` between Start and End; or return None.
def match_gene_row(subject, pos):
rows = chrM_rows_by_subject[subject]
pos = int(pos)
start_col = 3
end_col = 4
for row in rows:
start = row[start_col])
end = row[end_col])
if pos >= start and pos <= end:
# return just the data we want...
return row
# or return nothing at all
return None
If I run these commands to test:
print(match_gene_row('CDS', '42'))
print(match_gene_row('CDS', '4200'))
print(match_gene_row('CDS', '7586'))
print(match_gene_row('exon', '500'))
print(match_gene_row('exon', '399'))
I get:
['chrM', 'ENSEMBL', 'CDS', '3307', '4262']
['chrM', 'ENSEMBL', 'CDS', '3307', '4262']
['chrM', 'ENSEMBL', 'CDS', '7586', '8266']
None # exon: 500
['chrM', 'ENSEMBL', 'exon', '300', '400']
Read chrM_location.csv, and build a list of rows with matching gene data.
final_rows = []  # accumulate all rows here, for writing later

with open('chrM_location.csv', newline='') as f:
    reader = csv.reader(f)

    # Modify header
    header = next(reader)
    header.extend(['CDS', 'Start', 'End', 'sequence_cc'])
    final_rows.append(header)

    # Read rows and match to genes
    pos_column = 1
    for row in reader:
        pos = row[pos_column]
        matched_row = match_gene_row('CDS', pos)  # hard-coded to CDS
        if matched_row is not None:
            subj, start, end = matched_row[2:5]
            computed = str(int(pos) - int(start) + 1)  # this is coming out negative??
            row.extend([subj, start, end, computed])
        final_rows.append(row)
Finally, write.
with open('final.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(final_rows)
I mocked up chrM_location.csv:
name,POS,id,Ref,ALT,Frequency
chrM,41,.,C,T,0.002498
chrM,42,rs377245343,T,TC,0.001562
chrM,55,.,TA,T,0.00406
chrM,55,.,T,C,0.001874
When I run the whole thing, I get a final.csv that looks like this:
name,POS,id,Ref,ALT,Frequency,CDS,Start,End,sequence_cc
chrM,41,.,C,T,0.002498,CDS,3307,4262,-3265
chrM,42,rs377245343,T,TC,0.001562,CDS,3307,4262,-3264
chrM,55,.,TA,T,0.00406,CDS,4470,5511,-4414
chrM,55,.,T,C,0.001874,CDS,4470,5511,-4414
I put this all together in a Gist.

Related

Read csv file with empty lines

Analysis software I'm using outputs many groups of results in one CSV file and separates the groups with two empty lines.
I would like to break the results into groups so that I can then analyse them separately.
I'm sure there is a built-in function in Python (or one of its libraries) that does this; I tried this piece of code that I found somewhere, but it doesn't seem to work:
import csv
results = open('03_12_velocity_y.csv').read().split("\n\n")
# Feed first csv.reader
first_csv = csv.reader(results[0], delimiter=',')
# Feed second csv.reader
second_csv = csv.reader(results[1], delimiter=',')
Update:
The original code actually works, but my Python skills are pretty limited and I did not implement it properly.
The .split('\n\n\n') method does work, but csv.reader returns an iterator object; to get the data into a list (or something similar), you need to iterate through all the rows and append them to the list.
I then used Pandas to remove the header and convert the scientific-notation values to float. The code is below. Thanks everyone for the help.
import csv
import pandas as pd

# Open the csv file, read it and split it when it encounters 2 empty lines (\n\n\n)
results = open('03_12_velocity_y.csv').read().split('\n\n\n')

# Create csv.reader objects that are used to iterate over rows in a csv file
# Define the output - create an empty multi-dimensional list
output1 = [[], []]

# Iterate through the rows in the csv file and append the data to the empty list
# Feed first csv.reader
csv_reader1 = csv.reader(results[0].splitlines(), delimiter=',')
for row in csv_reader1:
    output1.append(row)

df = pd.DataFrame(output1)
# remove first 7 rows of data (the start position of the slice is always included)
df = df.iloc[7:]
# Convert all data from string to float
df = df.astype(float)
If your row counts are inconsistent across groups, you'll need a little state machine to check when you're between groups and do something with the last group.
#!/usr/bin/env python3
import csv

def write_group(group, i):
    with open(f"group_{i}.csv", "w", newline="") as out_f:
        csv.writer(out_f).writerows(group)

with open("input.csv", newline="") as f:
    reader = csv.reader(f)
    group_i = 1
    group = []
    last_row = []
    for row in reader:
        if row == [] and last_row == [] and group != []:
            write_group(group, group_i)
            group = []
            group_i += 1
            continue
        if row == []:
            last_row = row
            continue
        group.append(row)
        last_row = row

# flush remaining group
if group != []:
    write_group(group, group_i)
I mocked up this sample CSV (groups separated by two empty lines):
g1r1c1,g1r1c2,g1r1c3
g1r2c1,g1r2c2,g1r2c3
g1r3c1,g1r3c2,g1r3c3


g2r1c1,g2r1c2,g2r1c3
g2r2c1,g2r2c2,g2r2c3


g3r1c1,g3r1c2,g3r1c3
g3r2c1,g3r2c2,g3r2c3
g3r3c1,g3r3c2,g3r3c3
g3r4c1,g3r4c2,g3r4c3
g3r5c1,g3r5c2,g3r5c3
And when I run the program above I get three CSV files:
group_1.csv
g1r1c1,g1r1c2,g1r1c3
g1r2c1,g1r2c2,g1r2c3
g1r3c1,g1r3c2,g1r3c3
group_2.csv
g2r1c1,g2r1c2,g2r1c3
g2r2c1,g2r2c2,g2r2c3
group_3.csv
g3r1c1,g3r1c2,g3r1c3
g3r2c1,g3r2c2,g3r2c3
g3r3c1,g3r3c2,g3r3c3
g3r4c1,g3r4c2,g3r4c3
g3r5c1,g3r5c2,g3r5c3
If your row counts are consistent, you can do this with fairly vanilla Python or using the Pandas library.
Vanilla Python
Define your group size and the size of the break (in "rows") between groups.
Loop over all the rows adding each row to a group accumulator.
When the group accumulator reaches the pre-defined group size, do something with it, reset the accumulator, and then skip break-size rows.
Here, I'm writing each group to its own numbered file:
import csv

group_sz = 5
break_sz = 2

def write_group(group, i):
    with open(f"group_{i}.csv", "w", newline="") as f_out:
        csv.writer(f_out).writerows(group)

with open("input.csv", newline="") as f_in:
    reader = csv.reader(f_in)
    group_i = 1
    group = []
    for row in reader:
        group.append(row)
        if len(group) == group_sz:
            write_group(group, group_i)
            group_i += 1
            group = []
            # skip break-size rows between groups
            for _ in range(break_sz):
                try:
                    next(reader)
                except StopIteration:
                    # gracefully ignore an expected StopIteration (at the end of the file)
                    break
group_1.csv
g1r1c1,g1r1c2,g1r1c3
g1r2c1,g1r2c2,g1r2c3
g1r3c1,g1r3c2,g1r3c3
g1r4c1,g1r4c2,g1r4c3
g1r5c1,g1r5c2,g1r5c3
With Pandas
I'm new to Pandas, and learning this as I go, but it looks like Pandas will automatically trim blank rows/records from a chunk of data (read_csv skips blank lines by default, via its skip_blank_lines option).
With that in mind, all you need to do is specify the size of your group, and tell Pandas to read your CSV file in "iterator mode", where you can ask for a chunk (your group size) of records at a time:
import pandas as pd

group_sz = 5
with pd.read_csv("input.csv", header=None, iterator=True) as reader:
    i = 1
    while True:
        try:
            df = reader.get_chunk(group_sz)
        except StopIteration:
            break
        df.to_csv(f"group_{i}.csv")
        i += 1
Pandas adds an index ("ID") column and a default header when it writes out the CSV:
group_1.csv
,0,1,2
0,g1r1c1,g1r1c2,g1r1c3
1,g1r2c1,g1r2c2,g1r2c3
2,g1r3c1,g1r3c2,g1r3c3
3,g1r4c1,g1r4c2,g1r4c3
4,g1r5c1,g1r5c2,g1r5c3
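If you don't want the index column or the numeric header in the output, to_csv accepts index and header flags:

df.to_csv(f"group_{i}.csv", index=False, header=False)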
Try this out with your data:
import pandas as pd

# csv file name to be read in
in_csv = 'input.csv'

# get the number of lines of the csv file to be read
number_lines = sum(1 for row in open(in_csv))

# size of rows of data to write to the csv,
# you can change the row size according to your need
rowsize = 500

# start looping through data, writing it to a new file for each set
for i in range(1, number_lines, rowsize):
    df = pd.read_csv(in_csv,
                     header=None,
                     nrows=rowsize,  # number of rows to read at each loop
                     skiprows=i)     # skip rows that have already been read
    # csv to write data to a new file with indexed name: input_1.csv etc.
    out_csv = 'input' + str(i) + '.csv'
    df.to_csv(out_csv,
              index=False,
              header=False,
              mode='a')  # append data to csv file
I updated the question with the final details that answered my question.

find the maximum value in multiple fields within a single column in a file using python

I have multiple files, tab-delimited and formatted as shown below. The bing column contains multiple values. The file is named file1.txt:
#searchE google yahoo bing
1 0 2 h=1;d=-0.2;f=0.5;i=3
1 0 2 h=2;d=-0.6;f=1.2;i=1
What I am trying to do is find the maximum value of d under the bing column. For this case the max will be -0.2, and I want to print out the entire row with all the headers and values. I also wish to have the filename from which the row originated printed at the start of the row. My final file will contain something like:
#searchE google yahoo bing
file1.txt 1 0 2 h=1;d=-0.2;f=0.5;i=3
I am currently stuck and not even sure how to proceed. This is what I have so far:
def main(infile, outFile):
    firstfile = []
    rIndex = 0
    cIndex = 0
    ignore = 1
    prefix = ""
    with open(infile) as f:
        for line in f:
            rows = line.split("\t")
            if rows[0] == "#searchE":
                ignore = 0
            elif ignore == 1:
                prefix += line
            if ignore == 0:
                for i in range(len(rows)):
                    rows[i] = rows[i].strip()
                    if i == 4 and 'd=' in rows[i]:
                        return
I am learning; I think I am on the right path but still have a long way to go. If you could include some explanation with an answer to help me learn, I would appreciate it. Thank you in advance.
Using the standard library's csv module and max with a key function, and assuming the fourth column is as regular as you present it:
import csv

with open(infile) as f:
    reader = csv.reader(f, delimiter='\t')
    next(reader)  # only if you have to skip headers
    final_row = max(reader, key=lambda r: float(r[3].split(';')[1].split('=')[1]))

with open(outfile, 'w', newline='') as f2:
    writer = csv.writer(f2, delimiter='\t')
    writer.writerow([infile] + final_row)
reader is an iterator producing lists of strings. Each string is a field in a csv row. The key function of the max call takes the 4th field ([3]) of a row, splits it on ';', takes the second token of the resulting list ('d=x'), splits again, this time on '=', takes the second part of that split and finally turns it into a float. This float is used to determine the row with the max value: final_row.
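To make that chain concrete, here it is applied to one field from the sample data, along with a slightly more defensive variant that parses the field into a dict first (useful if the order of the h/d/f/i entries ever changes):

field = 'h=1;d=-0.2;f=0.5;i=3'
print(field.split(';'))                          # ['h=1', 'd=-0.2', 'f=0.5', 'i=3']
print(field.split(';')[1])                       # 'd=-0.2'
print(float(field.split(';')[1].split('=')[1]))  # -0.2

# order-independent alternative
pairs = dict(p.split('=') for p in field.split(';'))
print(float(pairs['d']))                         # -0.2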

Code swap. How would I swap the value of one CSV file column to another?

I have two CSV files. The first file (state_abbreviations.csv) has only state abbreviations and their full state names side by side (shown as an image in the original post); the second file (test.csv) has the state abbreviations with additional info.
I want to replace each state abbreviation in test.csv with its associated full state name from the first file.
My approach was to read each file, build a dict from the first file (state_abbreviations.csv), then read the second file (test.csv) and, wherever an abbreviation matches the first file, replace it with the full name.
Any help is appreciated.
import csv

state_initials = "state_abbr"
state_names = "state_name"
state_file = open("state_abbreviations.csv", "r")
state_reader = csv.reader(state_file)

headers = None
final_state_initial = []
for row in state_reader:
    if not headers:
        headers = []
        for i, col in enumerate(row):
            if col in state_initials:
                headers.append(i)
    else:
        final_state_initial.append(row[0])
print(final_state_initial)

headers = None
final_state_abbre = []
for row in state_reader:
    if not headers:
        headers = []
        for i, col in enumerate(row):
            if col in state_initials:
                headers.append(i)
    else:
        final_state_abbre.append(row[1])
print(final_state_abbre)

state_dictionary = dict(zip(final_state_initial, final_state_abbre))
print(state_dictionary)
You almost got it (the approach, that is): building a dict out of the abbreviations is the easiest way to do this:
with open("state_abbreviations.csv", "r") as f:
# you can use csv.DictReader() instead but lets strive for performance
reader = csv.reader(f)
next(reader) # skip the header
# assuming the first column holds the abbreviation, second the full state name
state_map = {state[0]: state[1] for state in reader}
Now you have state_map containing a map of all your state abbreviations; for example, state_map["FL"] contains "Florida".
To replace the values in your test.csv, though, you'll either have to load the whole file into memory, parse it, do the replacement and save it, or create a temporary file, stream-write the changes to it, and then overwrite the original file with the temporary file. Assuming that test.csv is not too big to fit into your memory, the first approach is much simpler:
with open("test.csv", "r+U") as f: # open the file in read-write mode
# again, you can use csv.DictReader() for convenience, but this is significantly faster
reader = csv.reader(f)
header = next(reader) # get the header
rows = [] # hold our rows
if "state" in header: # proceed only if `state` column is found in the header
state_index = header.index("state") # find the state column index
for row in reader: # read the CSV row by row
current_state = row[state_index] # get the abbreviated state value
# replace the abbreviation if it exists in our state_map
row[state_index] = state_map.get(current_state, current_state)
rows.append(row) # append the processed row to our `rows` list
# now lets overwrite the file with updated data
f.seek(0) # seek to the file begining
f.truncate() # truncate the rest of the content
writer = csv.writer(f) # create a CSV writer
writer.writerow(header) # write back the header
writer.writerows(rows) # write our modified rows
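For completeness, a rough sketch of the temporary-file variant could look like this (same assumed `state` column; tempfile and os.replace are standard library):

import csv
import os
import tempfile

state_map = {"FL": "Florida"}  # in practice, built from state_abbreviations.csv as above

# stream-write the modified rows to a temporary file, then swap it into place
with open("test.csv", newline="") as src, \
        tempfile.NamedTemporaryFile("w", newline="", delete=False,
                                    dir=".", suffix=".tmp") as tmp:
    reader = csv.reader(src)
    writer = csv.writer(tmp)
    header = next(reader)
    state_index = header.index("state")
    writer.writerow(header)
    for row in reader:
        row[state_index] = state_map.get(row[state_index], row[state_index])
        writer.writerow(row)

os.replace(tmp.name, "test.csv")  # atomically replace the original file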
It seems like you are trying to go through the file twice. This is not necessary: the first time through you are already reading all the lines, so you can create your dictionary items directly.
In addition, comprehensions can be very useful when creating lists or dictionaries. In this case it might be a bit less readable, though. The alternative would be to create an empty dictionary, start a "real" for-loop, and add all the key:value pairs manually (i.e. with state_dict[row[abbr]] = row[name]).
Finally, I used the with statement when opening the file to ensure it is safely closed when we're done with it. This is good practice when opening files.
import csv

with open("state_abbreviations.csv") as state_file:
    state_reader = csv.DictReader(state_file)
    state_dict = {row['state_abbr']: row['state_name'] for row in state_reader}
print(state_dict)
Edit: note that, like the code you showed, this only creates the dictionary that maps abbreviations to state names. Actually replacing them in the second file would be the next step.
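A minimal sketch of that next step (assuming test.csv has a column literally named state holding the abbreviation, and writing to a new file; the output name is made up):

import csv

state_dict = {'FL': 'Florida'}  # in practice, built from state_abbreviations.csv as above

with open('test.csv', newline='') as src, \
        open('test_full_names.csv', 'w', newline='') as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        # fall back to the original value if the abbreviation is unknown
        row['state'] = state_dict.get(row['state'], row['state'])
        writer.writerow(row)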
Step 1: ask Python to remember the abbreviation-to-full-name mapping; we use a dictionary for that:
import csv

with open('state_abbreviations.csv', 'r') as f:
    csvreader = csv.reader(f)
    next(csvreader)
    abbrs = {r[0]: r[1] for r in csvreader}  # avoid the name `abs`, which shadows a builtin
Step 2: replace the abbreviations with full names and write to an output; I used test_output.csv:
with open('test.csv', 'r') as reading:
    csvreader = csv.reader(reading)
    next(csvreader)
    header = ['name', 'gender', 'birthdate', 'address', 'city', 'state']
    with open('test_output.csv', 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for a in csvreader:
            # writerow takes a single sequence, so wrap the fields in a list
            writer.writerow([a[0], a[1], a[2], a[3], a[4], abbrs[a[5]]])

Remove columns + keep certain rows in multiple large .csv files using python

Hello, I'm really new here as well as in the world of Python.
I have some (~1000) .csv files, each containing ~1,800,000 rows of information. The files are in the following form:
5302730,131841,-0.29999999999999999,NULL,2013-12-31 22:00:46.773
5303072,188420,28.199999999999999,NULL,2013-12-31 22:27:46.863
5350066,131841,0.29999999999999999,NULL,2014-01-01 00:37:21.023
5385220,-268368577,4.5,NULL,2014-01-01 03:12:14.163
5305752,-268368587,5.1900000000000004,NULL,2014-01-01 03:11:55.207
So, I would like, for all of the files:
(1) to remove the 4th (NULL) column
(2) to keep in every file only certain rows (depending on the value of the first column, e.g. 5302730: keep only the rows containing that value)
I don't know if this is even possible, so any answer is appreciated!
Thanks in advance.
Have a look at the csv module.
You can use the csv.reader function to generate an iterator of lines, with each line's cells as a list.
import csv

for line in csv.reader(open("filename.csv")):
    # Remove the 4th column; remember Python starts counting at 0
    line = line[:3] + line[4:]
    if line[0] == "thevalueforthefirstcolumn":
        dosomethingwith(line)
If you wish to do this sort of operation on CSV files more than once, with different parameters for which column to skip, which column to use as the key, and what to filter on, you can use something like this:
import csv

def read_csv(filename, column_to_skip=None, key_column=0, key_filter=None):
    data_from_csv = []
    with open(filename) as csvfile:
        csv_reader = csv.reader(csvfile)
        for row in csv_reader:
            # Skip data in a specific column
            if column_to_skip is not None:
                del row[column_to_skip]
            # Filter out rows where the key doesn't match
            if key_filter is not None:
                key = row[key_column]
                if key_filter != key:
                    continue
            data_from_csv.append(row)
    return data_from_csv

def write_csv(filename, data_to_write):
    with open(filename, 'w', newline='') as csvfile:
        csv_writer = csv.writer(csvfile)
        for row in data_to_write:
            csv_writer.writerow(row)

data = read_csv('data.csv', column_to_skip=3, key_filter='5302730')
write_csv('data2.csv', data)

Python: General CSV file parsing and manipulation

The purpose of my Python script is to compare the data present in multiple CSV files, looking for discrepancies. The data are ordered, but the ordering differs between files. The files contain about 70K lines, weighing around 15MB. Nothing fancy or hardcore here. Here's part of the code:
def getCSV(fpath):
    with open(fpath, "rb") as f:
        csvfile = csv.reader(f)
        for row in csvfile:
            allRows.append(row)

allCols = map(list, zip(*allRows))
Am I properly reading from my CSV files? I'm using csv.reader, but would I benefit from using csv.DictReader?
How can I create a list containing whole rows which have a certain value in a precise column?
Are you sure you want to keep all rows around? This creates a list with matching values only... fname could also come from glob.glob() or os.listdir() or whatever other data source you choose. Just to note: you mention the 20th column, but row[20] will be the 21st column...
import csv

matching20 = []
for fname in ('file1.csv', 'file2.csv', 'file3.csv'):
    with open(fname) as fin:
        csvin = csv.reader(fin)
        next(csvin)  # <--- if you want to skip the header row
        for row in csvin:
            if row[20] == 'value':
                matching20.append(row)  # or do something with it here
You only want csv.DictReader if you have a header row and want to access your columns by name.
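For instance, with a header row, the same filter can be written by column name instead of a magic index (the column name here is hypothetical):

import csv

matching = []
with open('file1.csv', newline='') as fin:
    for row in csv.DictReader(fin):
        if row['some_column'] == 'value':  # 'some_column' is a made-up header name
            matching.append(row)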
This should work, you don't need to make another list to have access to the columns.
import csv

def getCSV(fpath):
    with open(fpath) as ifile:
        csvfile = csv.reader(ifile)
        rows = list(csvfile)
    value_20 = [x for x in rows if x[20] == 'value']
    return value_20
If I understand the question correctly, you want to include a row if value is in the row, but you don't know which column value is, correct?
If your rows are lists, then this should work:
testlist = [row for row in allRows if 'value' in row]
post-edit:
If, as you say, you want a list of rows where value is in a specified column (specified by an integer pos), then:
pos = 20
testlist = [row for row in allRows if row[pos] == 'value']
(I haven't tested this, but let me know if that works.)
