CSV file 2D array and TXT file 2D array comparison - Python

def read_text_file(filename):
    with open(filename, 'r') as file:
        array = [line.split() for line in file]
    return array

array = read_text_file("file.txt")
print(array[0][1])
Gives a list index out of range error.
import csv

with open('file.csv', 'r') as file:
    reader = csv.reader(file)
    data = list(reader)
print(data[0][1])
Why does the csv-file-to-2D-array version work while the txt one doesn't? Just curious.

Without seeing your inputs we can't say for sure, but I can try to approximate what's going on.
Given these TXT and CSV files:
file1.txt    file1.csv
=========    =========
foo FOO      foo,FOO
bar BAR      bar,BAR

file2.txt    file2.csv
=========    =========
foo          foo
bar          bar
If your program were to read file1.txt and file1.csv, array and data would look like:
[['foo', 'FOO'], ['bar', 'BAR']]
array[0]/data[0] is ['foo', 'FOO'], and the second item from that, [1], is FOO. All is well.
If your program were to read file2.txt and file2.csv, array and data would look like:
[['foo'], ['bar']]
array[0]/data[0] is ['foo'], and the second item from that, [1], doesn't exist. Error, [1] is out of range.
Here's a similar program I wrote that gives me those results:
import csv

def main():
    print("reading TXT files:")
    print("==================")
    for fname in ["file1.txt", "file2.txt"]:
        with open(fname) as f_txt:
            array = [line.split() for line in f_txt]
        try_second_of_first(fname, array)

    print()

    print("reading CSV files:")
    print("==================")
    for fname in ["file1.csv", "file2.csv"]:
        with open(fname, newline="") as f_csv:
            data = list(csv.reader(f_csv))
        try_second_of_first(fname, data)

def try_second_of_first(fname, stuff):
    """Try to print the second "column" from the first line/row of stuff."""
    print(f"{fname}: ", end="")
    try:
        print(stuff[0][1])
    except Exception as e:
        print(f"could not read {stuff}[0][1]: {e}")

if __name__ == "__main__":
    main()
reading TXT files:
==================
file1.txt: FOO
file2.txt: could not read [['foo'], ['bar']][0][1]: list index out of range
reading CSV files:
==================
file1.csv: FOO
file2.csv: could not read [['foo'], ['bar']][0][1]: list index out of range
So, there's nothing different in principle between processing your TXT and CSV files; you just appear to be feeding differently "shaped" data into the two code paths, and that's what gives the different results.
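If you can't guarantee that every line has at least two columns, it's worth guarding the index before using it. A minimal sketch (the length check and the fallback message are my own additions, reusing read_text_file from the question):

array = read_text_file("file.txt")

for row in array:
    if len(row) > 1:  # only index rows that actually have a second column
        print(row[1])
    else:
        print("row has no second column:", row)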

Related

Nested for loop doesn't work in Python while reading the same csv file

I'm a beginner in Python and tried to find a solution by googling, but I couldn't find the solution I wanted.
What I'm trying to do with Python is pre-processing of data: find keywords and get all the rows that include a keyword from a large csv file.
Somehow the nested loop goes through just once and then doesn't go through on the second loop.
The code shown below is the part of my code that finds keywords in the csv file and writes them into a text file.
import csv
import json

def main():
    # Calling file (directory should be changed)
    data_file = 'dataset.json'
    # Loading data.json file
    with open(data_file, 'r') as fp:
        data = json.load(fp)
    # Make the list of keys
    key_list = list(data.keys())
    #print(key_list)
    preprocess_txt = open("test_11.txt", "w+", -1, "utf-8")
    support_fact = 0
    for i, k in enumerate(key_list):
        count = 1
        # read csv, and split the line on ","
        with open("my_csvfile.csv", 'r', encoding='utf-8') as csvfile:
            reader = csv.reader(csvfile)
            # The number of q_id is 2
            # This is the part where the nested for loop doesn't work!!!!!!!!!!!!
            if len(data[k]['Qids']) == 2:
                print("Number 2")
                for m in range(len(data[k]['Qids'])):
                    print(len(data[k]['Qids']))
                    q_id = [data[k]['Qids'][m]]
                    print(q_id)
                    for row in reader:  # ---> This nested for loop doesn't work after going through one loop!!!!!
                        if all([x in row for x in q_id]):
                            print("YES!!!")
                            preprocess_txt.write("%d %s %s %s\n" % (count, row[0], row[1], row[2]))
                            count += 1
For the details of the above code:
First, it extracts all keys from the data.json file and puts them into a list (key_list).
Second, I used all([x in row for x in q_id]) to check whether each row contains a keyword (q_id).
However, as I commented in the code above, when the length of data[k]['Qids'] is 2, it correctly prints YES!!! on the first loop, but doesn't print YES!!! on the second loop, meaning it never enters the for row in reader loop even though the csv file contains the keyword.
[A screenshot of the printed output was included here.]
What did I do wrong, or what should I add to make the code work?
Can anybody help me out?
Thanks for looking!
For the sake of example, let's say I have a CSV file which looks like this:
foods.csv
beef,stew,apple,sauce
apple,pie,potato,salami
tomato,cherry,pie,bacon
And the following code, which is meant to simulate the structure of your current code:
def main():
    import csv
    keywords = ["apple", "pie"]
    with open("foods.csv", "r") as file:
        reader = csv.reader(file)
        for keyword in keywords:
            for row in reader:
                if keyword in row:
                    print(f"{keyword} was in {row}")
    print("Done")

main()
The desired result is that, for every keyword in my list of keywords, if that keyword exists in one of the lines of my CSV file, I will print a string to the screen indicating which row the keyword occurred in.
However, here is the actual output:
apple was in ['beef', 'stew', 'apple', 'sauce']
apple was in ['apple', 'pie', 'potato', 'salami']
Done
>>>
It was able to find both instances of the keyword apple in the file, but it didn't find pie! So, what gives?
The problem
The file handle (in your case csvfile) yields its contents once, and then they are consumed. Our reader object wraps around the file-handle and consumes its contents until they are exhausted, at which point there will be no rows left to read from the file (the internal file pointer has advanced to the end), and the inner for-loop will not execute a second time.
The solution
Either move the internal file pointer back to the beginning using seek after each iteration of the outer for-loop, or read the contents of the file once into a list or similar collection and then iterate over the list instead:
Updated code:
def main():
    import csv
    keywords = ["apple", "pie"]
    with open("foods.csv", "r") as file:
        contents = list(csv.reader(file))
    for keyword in keywords:
        for row in contents:
            if keyword in row:
                print(f"{keyword} was in {row}")
    print("Done")

main()
New output:
apple was in ['beef', 'stew', 'apple', 'sauce']
apple was in ['apple', 'pie', 'potato', 'salami']
pie was in ['apple', 'pie', 'potato', 'salami']
pie was in ['tomato', 'cherry', 'pie', 'bacon']
Done
>>>
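For completeness, here is a minimal sketch of the seek-based alternative mentioned above: rewind the file before each keyword instead of materializing a list (same foods.csv as before):

import csv

def main():
    keywords = ["apple", "pie"]
    with open("foods.csv", "r", newline="") as file:
        reader = csv.reader(file)
        for keyword in keywords:
            file.seek(0)  # rewind so the reader sees every row again
            for row in reader:
                if keyword in row:
                    print(f"{keyword} was in {row}")
    print("Done")

main()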
I believe that your reader variable contains only the first line of your csv file, thus for row in reader executes only once.
Try:

with open("my_csvfile.csv", 'r', newline='', encoding='utf-8') as csvfile:
newline='' is the new argument introduced above.
reference: https://docs.python.org/3/library/csv.html#id3
Quote: "If csvfile is a file object, it should be opened with newline=''

When does 1<>1 when comparing one CSV as a lookup to another CSV? [duplicate]

This question already has answers here:
Why does comparing strings using either '==' or 'is' sometimes produce a different result?
(15 answers)
Closed 4 years ago.
I want to use file1.csv as a lookup for file2.csv. Anything that matches should print the entire row from file2.csv.
However, as I loop through the rows of file2.csv and compare my lookup values to my data, I am unable to get the line variable to equal my second column (row[1]). What do I appear to be missing?
import csv
import sys

file1 = 'file1.csv'
file2 = 'file2.csv'

appcode = []
with open(file1, "r") as f:
    f.readline()  # Skip the first line
    for line in f:
        appcode.append(str(line.strip("\n")))
        print('This is what we are looking for from file1 ...' + line)
        csv_file = csv.reader(open(file2, "rb"), delimiter=",")  # was rb
        # loop through csv list
        for row in csv_file:
            print('line = ' + line + ' ' + 'row is... ' + row[1])
            # if current row's 2nd value is equal to input, print that row
            if str(line) is str(row[1]):
                print row
            else:
                print 'thinks ' + str(line) + '=' + str(row[1]) + ' is false'
You can restructure your code a bit:
import csv
import sys

file1 = 'file1.csv'
file2 = 'file2.csv'

# create your files' data:
with open(file1, "w") as f:
    f.write("""appname\n1\neee\naaa\n""")
with open(file2, "w") as f:
    f.write("""server,appname\nfrog,1\napple,aaa\n""")

# process files
with open(file1, "r") as f:
    f.readline()  # Skip the first line
    f_set = set(line.strip() for line in f)

# no need to indent the rest as well - you are done with reading file1
print('This is what we are looking for from file1 ...', f_set)

with open(file2, "r") as f:
    csv_file = csv.reader(f, delimiter=",")
    # loop through csv list
    for row in csv_file:
        if row[1] in f_set:  # checking "a in b" is fastest with sets
            print(row)
        else:
            pass  # do something else
By checking row[1] in f_set you avoid the wrong comparison using is. As a general rule, use is only if you really want to check whether two things are the identical object, not whether they contain the same value.
Output (2.7): # remove the ( and ) at print to make it nicer
('This is what we are looking for from file1 ...', set(['1', 'eee', 'aaa']))
['frog', '1']
['apple', 'aaa']
Output (3.6):
This is what we are looking for from file1 ... {'1', 'aaa', 'eee'}
['frog', '1']
['apple', 'aaa']
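As a quick illustration of why is is unreliable here (this snippet is my own, not from the original answer; whether two equal strings happen to share one object is a CPython interning detail):

a = "aaa"
b = "".join(["aa", "a"])  # equal contents, but built at runtime as a new object

print(a == b)  # True  - compares contents
print(a is b)  # False (typically) - compares object identity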
Readup:
https://docs.python.org/2/library/sets.html

Elegant way to parse a C array from python

Is there an elegant way to parse a C array and extract the elements at defined indexes into an output file?
For example:
myfile.c
my_array[SPECIFIC_SIZE]={
0x10,0x12,0x13,0x14,0x15,0x23,0x01,0x02,0x04,0x07,0x08,
0x33,0x97,0x52,0x27,0x56,0x11,0x99,0x97,0x95,0x77,0x23,
0x45,0x97,0x90,0x97,0x68,0x23,0x28,0x05,0x66,0x99,0x38,
0x11,0x37,0x27,0x11,0x22,0x33,0x44,0x66,0x09,0x88,0x17,
0x90,0x97,0x17,0x90,0x97,0x22,0x77,0x97,0x87,0x25,0x22,
0x25,0x47,0x97,0x57,0x97,0x67,0x26,0x62,0x67,0x69,0x96
}
Python script:
I would like to do something like this (just pseudocode):

def parse_data():
    outfile = open('newfile.txt', 'w')
    with open(myfile, 'r') as f:
        SEARCH FOR ELEMENT WITH INDEX 0 IN my_array
        COPY ELEMENT TO OUTFILE AND LABEL WITH "Version Number"
        SEARCH FOR ALL ELEMENTS WITH INDEX 1..10 IN my_array
        COPY ELEMENTS TO OUTFILE WITH NEW LINE AND LABEL WITH "Date"
        ...
        ...
At the end I would like to have a newfile.txt like:
Version Number:
0x10
Date:
0x12,0x13,0x14,0x15,0x23,0x01,0x02,0x04,0x07,0x08
Can you show an example based on that pseudocode?
If your .c file is always formatted like this, as in:
First line is the declaration of the array.
Middle lines are the data.
Last line is the closing bracket.
You can do...
def parse_myfile(fileName, outName):
    with open(outName, 'w') as out:
        with open(fileName, 'r') as f:
            """ 1. Read all lines, except the first and last.
                2. Join all the lines together.
                3. Replace all the '\n' by ''.
                4. Split using ','.
            """
            lines = (''.join(f.readlines()[1:-1])).replace('\n', '').split(',')
            header = lines[0]
            date = lines[1:11]
            out.write('Version Number:\n{}\n\nDate:\n{}'.format(header, date))

if __name__ == '__main__':
    fileName = 'myfile.c'
    outFile = 'output.txt'
    parse_myfile(fileName, outFile)
cat output.txt outputs...
Version Number:
0x10
Date:
['0x12', '0x13', '0x14', '0x15', '0x23', '0x01', '0x02', '0x04', '0x07', '0x08']
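If the line layout of the array isn't guaranteed, a regex-based variant may be more robust. A sketch of that idea (my own addition, assuming every 0x.. literal in the file belongs to my_array, and producing the comma-separated Date format from the question):

import re

def parse_myfile(filename, outname):
    # Grab every hex literal in the file, independent of line breaks.
    with open(filename, 'r') as f:
        values = re.findall(r'0x[0-9A-Fa-f]+', f.read())
    with open(outname, 'w') as out:
        out.write('Version Number:\n{}\n\n'.format(values[0]))
        out.write('Date:\n{}\n'.format(','.join(values[1:11])))

parse_myfile('myfile.c', 'newfile.txt')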

How can I properly read in this text data to a list?

I have some data in a .txt file structured as follows:
Soup Tomato
Beans Kidney
.
.
.
I read in the data with
combo=open("combo.txt","r")
lines=combo.readlines()
However, the data then appears as
lines=['Soup\tTomato\r\n','Beans\tKidney\r\n',...]
I would like each entry to be its own element in the list, like
lines=['Soup','Tomato',...]
And even better would be to have two lists, one for each column.
Can someone suggest a way to achieve this or fix my problem?
You should split the lines:
lines = [a_line.strip().split() for a_line in combo.readlines()]
Or without using readlines:
lines = [a_line.strip().split() for a_line in combo]
It looks like you're opening a tab-delimited csv file.
Use the Python csv module.
import csv

lines = []
with open('combo.txt', 'r', newline='') as csvfile:
    for row in csv.reader(csvfile, delimiter='\t'):
        lines += row
print(lines)

lines is now a flat list.
Or, with a list of lists, you can invert it to get the columns:

lines = []
with open('combo.txt', 'r', newline='') as csvfile:
    for row in csv.reader(csvfile, delimiter='\t'):
        lines.append(row)  # gives you a list of lists

columns = list(map(list, zip(*lines)))
# columns[0] == ['Soup', 'Beans', ...]
If you want to get all the items in a single list:
>>> with open('combo.txt','r') as f:
... all_soup = f.read().split()
...
>>> all_soup
['Soup', 'Tomato', 'Beans', 'Kidney']
If you want to get each column, then do this:
>>> with open('combo.txt','r') as f:
... all_cols = zip(*(line.strip().split() for line in f))
...
>>> all_cols
[('Soup', 'Beans'), ('Tomato', 'Kidney')]
Use the csv module to handle csv-like files (in this case, tab-separated values, not comma-separated values).
import csv
import itertools

with open('path/to/file.tsv') as tsvfile:
    reader = csv.reader(tsvfile, delimiter="\t")
    result = list(itertools.chain.from_iterable(reader))
csv.reader turns your file into a list of lists, essentially:
def reader(file, delimiter=","):
    result = []
    for line in file:
        # each line becomes one row: a list of fields
        result.append(line.strip().split(delimiter))
    return result
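Putting the two ideas together on the question's combo.txt, a rough sketch (the expected values in the comments assume the two sample lines shown in the question):

import csv
import itertools

with open('combo.txt', newline='') as tsvfile:
    rows = list(csv.reader(tsvfile, delimiter='\t'))

flat = list(itertools.chain.from_iterable(rows))  # ['Soup', 'Tomato', 'Beans', 'Kidney']
columns = list(zip(*rows))                        # [('Soup', 'Beans'), ('Tomato', 'Kidney')]
print(flat)
print(columns)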

Python: merge data from two text files and store it in a separate text file

I have 2 text files named text1.txt and text2.txt with the following data:
text1.txt
1
2
3
4
5
text2.txt
sam
Gabriel
Henry
Bob
Bill
I want to write a Python script reading both text files and displaying/writing the result in a third text file, let's call it result.txt, in the following format:
1#sam
2#Gabriel
3#Henry
4#Bob
5#Bill
So I want the data merged together, separated by '#', in result.txt.
Any help? Thanks!
This works, and unlike other answers I am not reading all the lines into memory here:
from itertools import izip

with open('text1.txt') as f1:
    with open('text2.txt') as f2:
        with open('out.txt', 'w') as out:
            for a, b in izip(f1, f2):
                out.write('{0}#{1}'.format(a.rstrip(), b))
...
>>> !cat out.txt
1#sam
2#Gabriel
3#Henry
4#Bob
5#Bill
Here you go. Code comments are inline:

data_one = []
data_two = []

# Open the input files for reading
# and the output file for writing
with open('text1.txt') as in1_file, open('text2.txt') as in2_file, open('output.txt', 'w') as o_file:
    # Store the data from the first input file
    for line in in1_file:
        data_one.append(line.strip())
    data_one = (a for a in data_one)

    # Store the data from the second input file
    for line in in2_file:
        data_two.append(line.strip())
    data_two = (a for a in data_two)

    # Combine the data from both sources
    # and write it to the output file
    for a, b in zip(data_one, data_two):
        o_file.write('{0}#{1}\n'.format(a, b))
EDIT:
For Python 2.6 and earlier, where a single with statement cannot take multiple context managers, nest them instead:

with open('text1.txt') as in1_file:
    with open('text2.txt') as in2_file:
        with open('output.txt', 'w') as o_file:
            ...
This should be your answer:
with open('text1.txt', 'r') as t1, open('text2.txt', 'r') as t2:
    with open('text3.txt', 'w') as t3:
        for a, b in zip(t1.readlines(), t2.readlines()):
            t3.write("%s#%s" % (a.rstrip(), b))
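On Python 3, izip from the first answer becomes the built-in zip, and the files can be iterated lazily without readlines. A minimal sketch (writing to result.txt, the name used in the question):

with open('text1.txt') as f1, open('text2.txt') as f2, open('result.txt', 'w') as out:
    for a, b in zip(f1, f2):
        # strip the newline from the left column; keep the one from the right
        out.write('{}#{}'.format(a.rstrip(), b))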
