I have two csv files and need Python code to do a vlookup-style merge: match the ID values between the files, take only the needed column, and write a new csv file. I know it can be done with pandas, but I need to do this without pandas or any third-party tools.
INPUT 1 csv file
ID NAME SUBJECT
1 Raj CS
2 Allen PS
3 Bradly DP
4 Tim FS
INPUT 2 csv file
ID COUNTRY TIME
2 USA 1:00
4 JAPAN 14:00
1 ENGLAND 5:00
3 CHINA 0.00
OUTPUT csv file
ID NAME SUBJECT COUNTRY
1 Raj CS ENGLAND
2 Allen PS USA
3 Bradly DP CHINA
4 Tim FS JAPAN
There is probably a more efficient way to do it, but the basic idea is to create a nested dictionary (using the ID as the key) with the other column names and their values stored under each ID key. Then, as you iterate through each file, it updates the dictionary on the ID key.
Finally, put the rows together into a list and write them to file:
input_files = ['C:/test/input_1.csv', 'C:/test/input_2.csv']
lookup_column_name = 'ID'
output_dict = {}

for file_path in input_files:
    with open(file_path, 'r') as file:
        # Read each line in the csv
        for idx, line in enumerate(file.readlines()):
            # If it's the first line, store it as the header
            if idx == 0:
                header = line.split(',')
                # Map each column index to its (stripped) header name
                header_dict = {i: x.strip() for i, x in enumerate(header)}
                # Get the index value of the lookup column from the headers
                lookup_column_idx = {v: k for k, v in header_dict.items()}[lookup_column_name]
                continue
            line_split = line.split(',')
            # Initialize the dictionary entry keyed by the lookup column
            if line_split[lookup_column_idx] not in output_dict:
                output_dict[line_split[lookup_column_idx]] = {}
            # If not the lookup column, add the other columns and data to the dictionary
            for i, value in enumerate(line_split):
                if i != lookup_column_idx:
                    output_dict[line_split[lookup_column_idx]][header_dict[i]] = value

# Create a list of the rows that will be written to file under the correct columns
rows = []
for k, v in output_dict.items():
    header = [lookup_column_name] + list(v.keys())
    row = [k] + [v[x].strip() for x in header if x != lookup_column_name]
    rows.append(','.join(row) + '\n')

# Final list of rows, beginning with the header
output_lines = [','.join(header) + '\n'] + rows

# Write to file
with open('C:/test/output.csv', 'w') as output:
    output.writelines(output_lines)
To do this without pandas (and assuming you know the structure of your data and that it fits in memory), you can iterate through the csv file and store the results in a dictionary, filling in the entries where the ID maps to the other information you want to keep.
You can do this for both csv files and join them manually afterwards by iterating over the keys of the dictionary.
import csv

input1 = './file1.csv'
input2 = './file2.csv'

with open(input1, 'r', encoding='utf-8-sig') as inputlist, \
        open(input2, 'r', encoding='utf-8-sig') as inputlist1, \
        open('./output.csv', 'w', newline='', encoding='utf-8-sig') as output:
    reader = csv.reader(inputlist)
    reader2 = csv.reader(inputlist1)
    writer = csv.writer(output)

    # Build an ID -> COUNTRY lookup from the second file; its header row
    # conveniently maps 'ID' -> 'COUNTRY', which labels the new output column
    dict1 = {}
    for xl in reader2:
        dict1[xl[0]] = xl[1]

    # Append the matching COUNTRY (or "N/A") to each row of the first file
    for i in reader:
        if i[0] in dict1:
            i.append(dict1[i[0]])
        else:
            i.append("N/A")
        writer.writerow(i)
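The prose above describes building one dictionary per file and joining afterwards, while the code takes a single pass instead. A minimal sketch of the two-dictionary variant, assuming comma-separated files with the headers shown in the question:

import csv

# Build one ID-keyed dictionary per file
with open('input_1.csv', encoding='utf-8-sig') as f1, \
        open('input_2.csv', encoding='utf-8-sig') as f2:
    left = {row['ID']: row for row in csv.DictReader(f1)}
    right = {row['ID']: row for row in csv.DictReader(f2)}

# Join on the keys of the first dictionary, keeping only COUNTRY from the second
with open('output.csv', 'w', newline='') as out:
    writer = csv.writer(out)
    writer.writerow(['ID', 'NAME', 'SUBJECT', 'COUNTRY'])
    for key, row in left.items():
        country = right.get(key, {}).get('COUNTRY', 'N/A')
        writer.writerow([key, row['NAME'], row['SUBJECT'], country])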
I have found other posts very closely related to this, but they are not helping.
I have a Master CSV file, and I need to find a specific 'string' in the second column. Shown below:
Name,ID,Title,Date,Prj1_Assigned,Prj1_closed,Prj2_assigned,Prj2_solved
Joshua Morales,MF6B9X,Tech_Rep, 08-Nov-2016,948,740,8,8
Betty García,ERTW77,SME, 08-Nov-2016,965,854,15,12
Kathleen Marrero,KTD684,Probation, 08-Nov-2016,946,948,na,na
Mark León,GSL89D,Tech_Rep, 08-Nov-2016,951,844,6,4
The ID column is unique, so I was trying to find 'KTD684' (for example). Once found, I need to export the values of "Date", "Prj1_Assigned", "Prj1_closed", "Prj2_assigned" and "Prj2_solved".
The export would be to a file 'KTD684.csv' (same as the ID), which already has the headers 'Date,Prj1_Assigned,Prj1_closed,Prj2_assigned,Prj2_solved'.
So far (as I am a non-programmer) I have not been able to draft this, but could someone please be kind enough to guide me in:
Finding the row with the element 'KTD684'.
Selecting the values of the below from that row:
['Date,Prj1_Assigned,Prj1_closed,Prj2_assigned,Prj2_solved']
Appending to the file named after the ID itself ('KTD684.csv').
I need to perform this for 45 user IDs, and now, with hiring in the company, it's 195. I tried to write an Excel macro (that didn't work either), but I feel Python is the most reliable approach.
I know I need to at least show basic progress, but after over 2 months of trying to learn from someone, I am still unable to find the element in this csv.
If I understand your problem correctly, you need to read from 2 input files:
1 containing the user IDs you are looking for
2 containing the project data related to the users
In that fashion, something like this will find all the users you specify in file 1 in file 2 and write each of them out to their own CSV file.
Specify your search IDs in search_for.csv. Keep in mind that this will rewrite the per-user output files every time you run it.
import csv
import os

# Reader for the IDs (users) you are looking to find (key)
inputPatterns = open(os.curdir + '/search_for.csv', 'rt')
reader = csv.reader(inputPatterns)

# Reading the IDs you are looking for from search_for.csv
ids = []
for row in reader:
    ids.append(row[0])
inputPatterns.close()

# Let's see if any of the user IDs we are looking for has any project related info
# and if so, write them to your output CSV
for userID in ids:
    # Organization list with names and company IDs, and its reader
    userList = open(os.curdir + '/users.csv', 'rt')
    reader = csv.reader(userList)
    # This will be the output file
    result_f = open(os.curdir + "/" + userID + ".csv", 'w')
    w = csv.writer(result_f)
    # Writing header information
    w.writerow(['Date', 'Prj1_Assigned', 'Prj1_closed', 'Prj2_assigned', 'Prj2_solved'])
    # Scanning for projects for the user and appending them
    for row in reader:
        if userID == row[1]:
            w.writerow([row[3], row[4], row[5], row[6], row[7]])
    result_f.close()
    userList.close()
For example, search_for.csv simply contains one user ID per line, as in this hypothetical sample (only the first column of each row is read):
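KTD684
MF6B9X
GSL89D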
This is an ideal use-case for pandas:
import pandas as pd

id_list = ['KTD684']
df = pd.read_csv('input.csv')

# Only keep values that are in 'id_list'
df = df[df['ID'].isin(id_list)]

gb = df.groupby('ID')
for name, group in gb:
    with open('{}.csv'.format(name), 'a') as f:
        group.to_csv(f, header=False, index=False,
                     columns=["Date", "Prj1_Assigned", "Prj1_closed",
                              "Prj2_assigned", "Prj2_solved"])
This will open the CSV, only select rows whose ID is in your list (id_list), group by the values in the ID column, and save an individual CSV file for each unique ID. You just need to expand id_list to the ids you are interested in.
Extended example:
Reading in the CSV results in a DataFrame object like this:
df = pd.read_csv('input.csv')
Name ID Title Date Prj1_Assigned \
0 Joshua Morales MF6B9X Tech_Rep 08-Nov-2016 948
1 Betty García ERTW77 SME 08-Nov-2016 965
2 Kathleen Marrero KTD684 Probation 08-Nov-2016 946
3 Mark León GSL89D Tech_Rep 08-Nov-2016 951
Prj1_closed Prj2_assigned Prj2_solved
0 740 8 8
1 854 15 12
2 948 na na
3 844 6 4
If you just select KTD684 and GSL89D:
id_list = ['KTD684', 'GSL89D']
df = df[df['ID'].isin(id_list)]
Name ID Title Date Prj1_Assigned \
2 Kathleen Marrero KTD684 Probation 08-Nov-2016 946
3 Mark León GSL89D Tech_Rep 08-Nov-2016 951
Prj1_closed Prj2_assigned Prj2_solved
2 948 na na
3 844 6 4
The groupby operation groups on ID and exports each unique ID to a CSV file, resulting in:
KTD684.csv
Date,Prj1_Assigned,Prj1_closed,Prj2_assigned,Prj2_solved
08-Nov-2016,946,948,na,na
GSL89D.csv
Date,Prj1_Assigned,Prj1_closed,Prj2_assigned,Prj2_solved
08-Nov-2016,951,844,6,4
Here's a pure python approach which reads the master .csv file with csv.DictReader, matches the ids, and appends the file data into a new or existing .csv file with csv.DictWriter():
from csv import DictReader
from csv import DictWriter
from os.path import isfile

def export_csv(user_id, master_csv, fieldnames, key_id, extension=".csv"):
    filename = user_id + extension
    file_exists = isfile(filename)
    with open(file=master_csv) as in_file, open(
        file=filename, mode="a", newline=""
    ) as out_file:
        # Create reading and writing objects
        csv_reader = DictReader(in_file)
        csv_writer = DictWriter(out_file, fieldnames=fieldnames)
        # Only write header once
        if not file_exists:
            csv_writer.writeheader()
        # Go through lines and match ids
        for line in csv_reader:
            if line[key_id] == user_id:
                # Modify line and append to file
                line = {k: v.strip() for k, v in line.items() if k in fieldnames}
                csv_writer.writerow(line)
Which can be called like this:
export_csv(
    user_id="KTD684",
    master_csv="master.csv",
    fieldnames=["Date", "Prj1_Assigned", "Prj1_closed", "Prj2_assigned", "Prj2_solved"],
    key_id="ID",
)
And produces the following KTD684.csv:
Date,Prj1_Assigned,Prj1_closed,Prj2_assigned,Prj2_solved
08-Nov-2016,946,948,na,na
I am new to Python, and I prepared a script that should modify the following csv file accordingly:
1) Each row that contains multiple Gene entries separated by the /// such as:
C16orf52 /// LOC102725138 1.00551
should be transformed to:
C16orf52 1.00551
LOC102725138 1.00551
2) The same gene may have different ratio values
AASDHPPT 0.860705
AASDHPPT 0.983691
and we want to keep only the pair with the highest ratio value (delete the pair AASDHPPT 0.860705)
Here is the script I wrote but it does not assign the correct ratio values to the genes:
import csv
import pandas as pd

with open('2column.csv', 'rb') as f:
    reader = csv.reader(f)
    a = list(reader)

gene = []
ratio = []
for t in range(len(a)):
    if '///' in a[t][0]:
        s = a[t][0].split('///')
        gene.append(s[0])
        gene.append(s[1])
        ratio.append(a[t][1])
        ratio.append(a[t][1])
    else:
        gene.append(a[t][0])
        ratio.append(a[t][1])
    gene[t] = gene[t].strip()

newgene = []
newratio = []
for i in range(len(gene)):
    g = gene[i]
    r = ratio[i]
    if g not in newgene:
        newgene.append(g)
        for j in range(i + 1, len(gene)):
            if g == gene[j]:
                if ratio[j] > r:
                    r = ratio[j]
        newratio.append(r)

for i in range(len(newgene)):
    print newgene[i] + '\t' + newratio[i]

if len(newgene) > len(set(newgene)):
    print 'missionfailed'
Thank you very much for any help or suggestion.
Try this:
with open('2column.csv') as f:
    lines = f.read().splitlines()

new_lines = {}
for line in lines:
    cols = line.split(',')
    for part in cols[0].split('///'):
        part = part.strip()
        if part not in new_lines:
            new_lines[part] = cols[1]
        else:
            if float(cols[1]) > float(new_lines[part]):
                new_lines[part] = cols[1]

import csv
with open('clean_2column.csv', 'wb') as csvfile:
    writer = csv.writer(csvfile, delimiter=' ',
                        quotechar='|', quoting=csv.QUOTE_MINIMAL)
    for k, v in new_lines.items():
        writer.writerow([k, v])
First of all, if you're importing pandas, know that you have its I/O tools to read CSV files.
So first, let's import the data that way:
import pandas as pd

df = pd.read_csv('2column.csv')
Then, you can extract the indexes where you have your '///' pattern:
l = list(df[df['Gene Symbol'].str.contains('///')].index)
Then, you can create your new rows:
for i in l:
    for sub in df['Gene Symbol'][i].split('///'):
        df = df.append(pd.DataFrame([[sub, df['Ratio(ifna vs. ctrl)'][i]]], columns=df.columns))
Then, drop the old ones:
df = df.drop(df.index[l])
Then, I'll do a little trick to remove your lowest duplicate values. First, I'll sort them by 'Ratio(ifna vs. ctrl)', then I'll drop all the duplicates but the first one:
df = df.sort('Ratio(ifna vs. ctrl)', ascending=False).drop_duplicates('Gene Symbol', keep='first')
If you want to keep your sorting by Gene Symbol and reset the indexes to simpler ones, simply do:
df = df.sort('Gene Symbol').reset_index(drop=True)
If you want to re-export your modified data to your csv, do:
df.to_csv('2column.csv')
EDIT: I edited my answer to correct syntax errors. I've tested this solution with your csv and it worked perfectly :)
This should work.
It uses the dictionary suggestion of Peter.
import csv

with open('2column.csv', 'r') as f:
    reader = csv.reader(f)
    original_file = list(reader)

# gets rid of the header
original_file = original_file[1:]

# create an empty dictionary
genes_ratio = {}

# loop over every row in the original file
for row in original_file:
    gene_name = row[0]
    gene_ratio = row[1]

    # check if /// is in the string, and if so split the string
    if '///' in gene_name:
        gene_names = gene_name.split('///')

        # loop over all the resulting components
        for gene in gene_names:
            # check if the component is in the dictionary;
            # if not in the dictionary, set its value to gene_ratio
            if gene not in genes_ratio:
                genes_ratio[gene] = gene_ratio
            # if in the dictionary, compare the stored value to gene_ratio
            # (as floats, so the comparison is numeric) and overwrite if smaller
            elif float(genes_ratio[gene]) < float(gene_ratio):
                genes_ratio[gene] = gene_ratio
    else:
        if gene_name not in genes_ratio:
            genes_ratio[gene_name] = gene_ratio
        elif float(genes_ratio[gene_name]) < float(gene_ratio):
            genes_ratio[gene_name] = gene_ratio

# loop over the dictionary and print gene names and their ratio values
for key in genes_ratio:
    print key, genes_ratio[key]
I am trying to save the dictionary below to csv. The format of my dictionary content is:
dict = {0: [u'ab', u'cd'], 1: [u'b'], 2: [u'Ge', u'TT'], 3: [u'Stas'], 4: [u'sap', u'd3', u'ch99']}
My code is:
import csv

with open('Cr_pt.csv', 'wb') as f:
    writer = csv.writer(f)
    writer.writerow(dict.keys())
    writer.writerows(zip(*dict.values()))
Below is the format I am trying to save in csv.
1 2 3 4
ab Ge Stas sap
cd TT d3
ch99
However, I am getting only a few values from the dictionary in my final csv, such as:
1 2 3 4
ab Ge Stas sap
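The rows are lost because zip() stops at the shortest of its inputs (here the single-element list for key 1). A minimal sketch of the difference, assuming Python 3 (on Python 2 the padding function is itertools.izip_longest):

from itertools import zip_longest

data = {0: [u'ab', u'cd'], 1: [u'b'], 2: [u'Ge', u'TT'], 3: [u'Stas'], 4: [u'sap', u'd3', u'ch99']}

# zip() truncates at the shortest value list, so only one data row survives:
print(list(zip(*data.values())))
# [('ab', 'b', 'Ge', 'Stas', 'sap')]

# zip_longest() pads the shorter lists with a fill value instead:
print(list(zip_longest(*data.values(), fillvalue='')))
# [('ab', 'b', 'Ge', 'Stas', 'sap'), ('cd', '', 'TT', '', 'd3'), ('', '', '', '', 'ch99')]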
Firstly, you should avoid using dict as a variable name, as it shadows a built-in.
The problem at hand is straightforward to solve with pandas (note that to_csv returns None, so there is no point assigning its result to df):
import pandas as pd

pd.DataFrame.from_dict(dict, orient='index').T.to_csv('Cr_pt.csv', index=False)
Output:
$ cat Cr_pt.csv
0,1,2,3,4
ab,b,Ge,Stas,sap
cd,,TT,,d3
,,,,ch99
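As a quick illustration of the shadowing warning above (a hypothetical session):

dict = {0: [u'ab', u'cd']}  # the name dict no longer refers to the built-in type
dict(a=1)                   # TypeError: 'dict' object is not callable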
Here's one solution that writes the CSV file directly, without using the csv module, delimiting each row with commas:
data = {0: [u'ab', u'cd'], 1: [u'b'], 2: [u'Ge', u'TT'], 3: [u'Stas'], 4: [u'sap', u'd3', u'ch99']}
delimiter = ","

with open('data.csv', 'w') as data_file:
    keys = [str(key) for key in data.keys()]
    values = data.values()
    max_values = max([len(value) for value in values])

    # Header row
    data_file.write(delimiter.join(keys) + "\n")

    # One row per position, padding the shorter lists with empty strings
    for line in range(0, max_values):
        line_values = []
        for key in keys:
            try:
                line_values.append(data[int(key)][line])
            except IndexError:
                line_values.append("")
        data_file.write(delimiter.join(line_values) + "\n")
Output:
0,1,2,3,4
ab,b,Ge,Stas,sap
cd,,TT,,d3
,,,,ch99
In this format, it is also easy to read the data back from the file into another script, since the values are comma-separated rather than delimited by variable-length whitespace.
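For instance, a minimal sketch of reading the file back into a per-column dictionary (file name as written above):

import csv

with open('data.csv', newline='') as data_file:
    reader = csv.reader(data_file)
    header = next(reader)                 # ['0', '1', '2', '3', '4']
    columns = {key: [] for key in header}
    for row in reader:
        for key, value in zip(header, row):
            if value:                     # skip the padding cells
                columns[key].append(value)

print(columns)  # {'0': ['ab', 'cd'], '1': ['b'], ...}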
I have a problem with appending data for missing rows in the csv file: I am reading rows from a csv file for each customer and appending lists with the data the rows contain. Each customer needs to have the same ids that are highlighted in green in the example image. If the next customer doesn't have rows for all the needed ids, I still need to append 0 values to the lists for these missing rows. So the customer highlighted in yellow needs to have the same number of values appended to the data lists as the one in green.
I am trying to read each row and compare its id with the list of all possible ids that I created, but I always get stuck on the first id. I am also not sure whether the right way to go is to read the previous row again until its id is equal to the id from the list of possible ids (I do this to add the missing row's data to the list). Please let me know if you have any suggestions.
Note: if we take into consideration only the column with ids, for these two customers I would like the list to look like this: list_with_ids = [410, 409, 408, 407, 406, 405, 403, 402, 410, 409, 408, 407, 406, 405, 403, 402]. So I am looking for a way, once I am on row 409 in yellow, to first append the first needed id 410, and only then 409 and so forth, and likewise to append the two missing ids at the end: 403, 402.
Code:
def write_data(workbook):
    [...]
    # Lists.
    list_cust = []
    list_quantity = []  # from Some_data columns

    # Get the start row in the csv file.
    for row in range(worksheet.nrows):
        base_id = str(410)
        value = worksheet.cell(row, 1).value
        start = str(value)
        if base_id[0] == start[0]:
            num_of_row_for_id = row

    # Append the first id.
    first_cust = str(worksheet.cell(num_of_row_for_id, 0).value)
    list_cust.append(first_cust)

    # Needed to count id's.
    count = 0
    # List with all needed id's for each customer;
    # instead of ... - all ids in green from the image.
    all_ids = [...]

    # Get data.
    for row in range(worksheet.nrows):
        next_id = str(worksheet.cell(num_of_row_for_id, 1).value)
        cust = str(worksheet.cell(num_of_row_for_id, 0).value)
        # Append id to the list.
        list_cust.append(cust)

        # Needed to separate rows for each customer.
        if list_cust[len(list_cust) - 1] == list_cust[len(list_cust) - 2]:
            # Get data: I read columns to get data.
            # Let's say I read col 4 to 21.
            for col_num in range(3, 20):
                # Here is the problem: ############################
                if next_id != all_ids[count]:
                    list_quantity.append(0)
                if next_id == all_ids[count]:
                    qty = worksheet.cell(num_of_row_for_id, col_num).value
                    list_quantity.append(qty)

        # Get the next row in reverse order.
        num_of_row_for_id -= 1

        # Increment count for id's index.
        if list_cust[len(list_cust) - 1] == list_cust[len(list_cust) - 2]:
            # 8 possible id's.
            if count < 7:
                count += 1
            else:
                count = 0
Consider the following data wrangling with list comprehensions and loops, using the data input below containing random data columns:
Input Data
# Cust ID Data1 Data2 Data3 Data4 Data5
# 2011 62,404 0.269101238 KPT 0.438881697 UAX 0.963170513
# 2011 62,405 0.142397746 XYD 0.51668728 PTQ 0.761695425
# 2011 62,406 0.782342616 QCN 0.259141256 FNX 0.870971924
# 2011 62,407 0.221750017 EIU 0.358439487 MAN 0.13633062
# 2011 62,408 0.097509568 CRU 0.410058705 BFK 0.680228327
# 2011 62,409 0.322871333 LAC 0.489425167 GUX 0.449476844
# 919 62,403 0.371461633 PUR 0.626146074 KWX 0.525711736
# 919 62,404 0.384859932 AJZ 0.223408599 JSU 0.914916663
# 919 62,405 0.020630503 SFY 0.260778598 VUU 0.213559498
# 919 62,406 0.952425138 EBI 0.59595738 ZYU 0.283794413
# 919 62,407 0.410368534 BTT 0.252698401 FFY 0.41080646
# 919 62,408 0.553390336 GMA 0.846309022 BIN 0.049852419
# 919 62,409 0.193437955 NBB 0.877311494 XQX 0.080656637
Python code
import csv

i = 0
data = []

# READ CSV AND CAPTURE HEADERS AND DATA
with open('Input.csv', 'r') as f:
    rdr = csv.reader(f)
    for line in rdr:
        if i == 0:
            headers = line
        else:
            line[1] = int(line[1].replace(',', ''))
            data.append(line)
        i += 1

# CREATE NEEDED LISTS
cust_list = list(set([i[0] for i in data]))
id_list = [62402, 62403, 62404, 62405, 62406, 62407, 62408, 62409, 62410]

# CAPTURE MISSING IDS BY CUSTOMER
for c in cust_list:
    currlist = [d[1] for d in data if d[0] == c]
    missingids = [i for i in id_list if i not in currlist]
    for m in missingids:
        data.append([c, m, '', '', '', '', ''])

# WRITE DATA TO NEW CSV IN SORTED ORDER
with open('Output.csv', 'w') as f:
    wtr = csv.writer(f, lineterminator='\n')
    wtr.writerow(headers)
    for c in cust_list:
        for i in sorted(id_list, reverse=True):
            for d in data:
                if d[0] == c and d[1] == i:
                    wtr.writerow(d)
Output Data
Consider even Python third-party modules such as pandas, the data analysis package, or an SQL solution using pyodbc, since Windows' built-in Jet/ACE SQL Engine can query CSV files directly.
You will notice, below and in the previous solution, that quite a bit of handling is needed to remove the thousands separators from the ID column, as the modules read such values as strings first. If you remove those commas from the original csv file, you can reduce the lines of code.
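Alternatively, pandas can strip the thousands separators at parse time, which would remove the str.replace step in the pandas solution below:

import pandas as pd

# thousands=',' makes read_csv parse "62,404" directly as the integer 62404
df = pd.read_csv('Input.csv', thousands=',')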
Pandas (with left merge on two dataframes)
import pandas as pd

df = pd.read_csv('Input.csv')

cust_list = df['Cust'].unique()
id_list = [62402, 62403, 62404, 62405, 62406, 62407, 62408, 62409, 62410]

ids = pd.DataFrame({'Cust': [int(c) for i in id_list for c in cust_list],
                    'ID': [int(i) for i in id_list for c in cust_list]})

df['ID'] = df['ID'].str.replace(',', '').astype(int)

df = ids.merge(df, on=['Cust', 'ID'], how='left').\
         sort_values(['Cust', 'ID'], ascending=[True, False])

df.to_csv('Output_pandas.csv', index=False)
PyODBC (works only on Windows machines, using a left join on two csv files)
import csv
import pyodbc

conn = pyodbc.connect(r'Driver=Microsoft Access Text Driver (*.txt, *.csv);'
                      r'DBQ=C:\Path\To\CSV\Files;Extensions=asc,csv,tab,txt;',
                      autocommit=True)

cur = conn.cursor()
cust_list = [i[0] for i in cur.execute("SELECT DISTINCT c.Cust FROM Input.csv c")]
id_list = [62402, 62403, 62404, 62405, 62406, 62407, 62408, 62409, 62410]
cur.close()

with open('ID_list.csv', 'w') as f:
    wtr = csv.writer(f, lineterminator='\n')
    wtr.writerow(['Cust', 'ID'])
    for item in [[int(c), int(i)] for c in cust_list for i in id_list]:
        wtr.writerow(item)

i = 0
with open('Input.csv', 'r') as f1, open('Input_without_commas.csv', 'w') as f2:
    rdr = csv.reader(f1)
    wtr = csv.writer(f2, lineterminator='\n')
    for line in rdr:
        if i > 0:
            line[1] = int(line[1].replace(',', ''))
        wtr.writerow(line)
        i += 1

strSQL = "SELECT i.Cust, i.ID, c.Data1, c.Data2, c.Data3, c.Data4, c.Data5 " +\
         " FROM ID_list.csv i" +\
         " LEFT JOIN Input_without_commas.csv c" +\
         " ON i.Cust = c.Cust AND i.ID = c.ID" +\
         " ORDER BY i.Cust, i.ID DESC"

cur = conn.cursor()
with open('Output_sql.csv', 'w') as f:
    wtr = csv.writer(f, lineterminator='\n')
    wtr.writerow(['Cust', 'ID', 'Data1', 'Data2', 'Data3', 'Data4', 'Data5'])
    for i in cur.execute(strSQL):
        wtr.writerow(i)
cur.close()

conn.close()
Output (for both above solutions)
The unique.txt file contains 2 columns separated by tabs; the total.txt file contains 3 columns, each separated by tabs.
I take each row from the unique.txt file and search for it in the total.txt file. If it is present, I extract the entire row from total.txt and save it in a new output file.
###Total.txt
column a column b column c
interaction1 mitochondria_205000_225000 mitochondria_195000_215000
interaction2 mitochondria_345000_365000 mitochondria_335000_355000
interaction3 mitochondria_345000_365000 mitochondria_5000_25000
interaction4 chloroplast_115000_128207 chloroplast_35000_55000
interaction5 chloroplast_115000_128207 chloroplast_15000_35000
interaction15 2_10515000_10535000 2_10505000_10525000
###Unique.txt
column a column b
mitochondria_205000_225000 mitochondria_195000_215000
mitochondria_345000_365000 mitochondria_335000_355000
mitochondria_345000_365000 mitochondria_5000_25000
chloroplast_115000_128207 chloroplast_35000_55000
chloroplast_115000_128207 chloroplast_15000_35000
mitochondria_185000_205000 mitochondria_25000_45000
2_16595000_16615000 2_16585000_16605000
4_2785000_2805000 4_2775000_2795000
4_11395000_11415000 4_11385000_11405000
4_2875000_2895000 4_2865000_2885000
4_13745000_13765000 4_13735000_13755000
My program:
file = open('total.txt')
file2 = open('unique.txt')

all_content = file.readlines()
all_content2 = file2.readlines()

store_id_lines = []
ff = open('match.dat', 'w')

for i in range(len(all_content)):
    line = all_content[i].split('\t')
    seq = line[1] + '\t' + line[2]
    for j in range(len(all_content2)):
        if all_content2[j] == seq:
            ff.write(seq)
            break
Problem:
Instead of giving the desired output (including the values of the 1st column for rows that fulfil the if condition), it writes only the matched pair of columns. I need something like: if the jth row of unique.txt equals columns 2 and 3 of the ith row of total.txt, then write the ith row of total.txt into the new file.
import csv

with open('unique.txt') as uniques, open('total.txt') as total:
    # The files are tab-separated, so tell csv.reader about the delimiter
    uniques = list(tuple(line) for line in csv.reader(uniques, delimiter='\t'))
    totals = {}
    for line in csv.reader(total, delimiter='\t'):
        totals[tuple(line[1:])] = line

with open('output.txt', 'w') as outfile:
    writer = csv.writer(outfile, delimiter='\t')
    for line in uniques:
        writer.writerow(totals.get(line, []))
I would write your code this way:
file = open('total.txt')
list_file = list(file)

file2 = open('unique.txt')
list_file2 = list(file2)

store_id_lines = []
ff = open('match.dat', 'w')

for curr_line_total in list_file:
    line = curr_line_total.split('\t')
    seq = line[1] + '\t' + line[2]
    if seq in list_file2:
        ff.write(curr_line_total)
Please avoid readlines() and use the with syntax when you open your files. A file object is already iterable line by line, which is why you don't need to use readlines().
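A minimal sketch of both points combined (same file as above):

# "with" closes the file for you, and the file object iterates line by line
with open('total.txt') as f:
    for line in f:
        columns = line.rstrip('\n').split('\t')
        print(columns[0])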