A function to read CSV data from a file into memory - python

I am trying to create a function that reads a CSV file into memory as a list. When I run my code, it gives me the error message "string indices must be integers". Where am I going wrong?
Below is the code. Thanks for your help.
# create the empty lists to carry the values of the columns
Hydropower_heading = []
Solar_heading = []
Wind_heading = []
Other_heading = []

def my_task1_file(filename): # defines the function "my_task1_file"
    with open(filename, 'r') as myNew_file: # opens and reads the file
        for my_file in myNew_file.readlines(): # loops through the file
            # read the values into the empty lists created
            Hydropower_heading.append(my_file['Hydropower'])
            Solar_heading.append(my_file['Solar'])
            Wind_heading.append(my_file['Wind'])
            Other_heading.append(my_file['Other'])
            #Hydropower_heading = int(Hydropower)
            #Solar_heading = int(Solar)
            #Wind_heading = int(Wind)
            #Other_heading = int(Other)

my_task1_file('task1.csv') # calls the function on the csv file

# print the headings and the column values in row form
print('Hydropower: ', Hydropower_heading)
print('Solar: ', Solar_heading)
print('Wind: ', Wind_heading)
print('Other: ', Other_heading)

We can read a CSV file by column using the csv.DictReader class.
Code: (code.py)
import csv

def my_task1_file(filename): # defines the function "my_task1_file"
    Hydropower_heading = []
    Solar_heading = []
    Wind_heading = []
    Other_heading = []
    with open(filename, newline='') as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            # read the values into the empty lists created
            Hydropower_heading.append(row['Hydropower'])
            Solar_heading.append(row['Solar'])
            Wind_heading.append(row['Wind'])
            Other_heading.append(row['Other'])
    return Hydropower_heading, Solar_heading, Wind_heading, Other_heading

if __name__ == "__main__":
    Hydropower_heading, Solar_heading, Wind_heading, Other_heading = my_task1_file('task1.csv')
    # print the headings and the column values in row form
    print('Hydropower: ', Hydropower_heading)
    print('Solar: ', Solar_heading)
    print('Wind: ', Wind_heading)
    print('Other: ', Other_heading)
task1.csv:
Hydropower,Solar,Wind,Other
5,6,3,8
6,8,5,12
3,6,9,7
Output:
Hydropower: ['5', '6', '3']
Solar: ['6', '8', '6']
Wind: ['3', '5', '9']
Other: ['8', '12', '7']
Explanation:
The __main__ check tests whether the file is being run directly. If you run the file directly with python code.py, that portion executes; if you import code.py from another Python file, it does not.
You can remove the __main__ block if you like, as shown below, but it is good practice to use it so that module-level code does not run when one Python file imports another. Let me know if this clears up your confusion.
code.py (without __main__):
import csv

def my_task1_file(filename): # defines the function "my_task1_file"
    Hydropower_heading = []
    Solar_heading = []
    Wind_heading = []
    Other_heading = []
    with open(filename, newline='') as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            # read the values into the empty lists created
            Hydropower_heading.append(row['Hydropower'])
            Solar_heading.append(row['Solar'])
            Wind_heading.append(row['Wind'])
            Other_heading.append(row['Other'])
    return Hydropower_heading, Solar_heading, Wind_heading, Other_heading

Hydropower_heading, Solar_heading, Wind_heading, Other_heading = my_task1_file('task1.csv')

print('Hydropower: ', Hydropower_heading)
print('Solar: ', Solar_heading)
print('Wind: ', Wind_heading)
print('Other: ', Other_heading)
References:
csv.DictReader method
__main__ documentation from Python website
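Note that DictReader returns every field as a string, which is why the output above shows '5' rather than 5. If you want the columns as integers (as the commented-out int(...) lines in the question suggest), one option is to convert the lists after reading:

# a sketch: convert the collected strings to integers after reading
Hydropower_heading = [int(v) for v in Hydropower_heading]
Solar_heading = [int(v) for v in Solar_heading]
Wind_heading = [int(v) for v in Wind_heading]
Other_heading = [int(v) for v in Other_heading]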

Since the error is "string indices must be integers", you must be using a data type that cannot take in a string value as an index. In this segment of your code...
for my_file in myNew_file.readlines():
    Hydropower_heading.append(my_file['Hydropower'])
    Solar_heading.append(my_file['Solar'])
    Wind_heading.append(my_file['Wind'])
    Other_heading.append(my_file['Other'])
... you are using "Hydropower", "Solar", "Wind", and "Other" as index values. These cannot be valid indices into my_file, which is a string, since readlines() yields each line of myNew_file as a string, and strings only accept integer indices (and slices). If you index with integers instead, the error will no longer appear.
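For illustration, a minimal sketch of the integer-index approach (assuming the task1.csv layout shown in the other answer, with the header on the first line, and splitting each line so integer indices apply to a list of fields):

# a minimal sketch: split each line, then index the resulting list with integers
with open('task1.csv', 'r') as myNew_file:
    next(myNew_file)  # skip the header row
    for line in myNew_file:
        fields = line.strip().split(',')  # fields is a list, so integer indices work
        Hydropower_heading.append(fields[0])
        Solar_heading.append(fields[1])
        Wind_heading.append(fields[2])
        Other_heading.append(fields[3])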

Related

Read file lines into lists delimited by a specific string (hash "#")

My file content is something like:
############################
Data1
133
124
FRE
new
Cable
Sat
############################
DataB
233
445
DEU
Old
Sat
###########################
MyValue
4566
455
ITA
NEW
###########################
MyValue5
455
22332
Eng
Sat
Cable
##############################
What I need is to put each of them into a list and the separator must be the "#":
The result here must be:
mylist1=["Data1","133","124","FRE","new","Cable","Sat"]
mylist2=["DataB","233","445","DEU","Old","Sat"]
etc...
The number of lists is variable since the data file length can be variable.
This is one approach:
with open('your_file.txt', 'r') as f:
    data = f.readlines()

master_list = []
lst = []
for i in data:
    if '#' in i:
        master_list.append(lst)
        lst = []
    else:
        lst.append(i.replace('\n', ''))

Drop the first element:
master_list[1:]
[['Data1', '133', '124', 'FRE', 'new', 'Cable', 'Sat'], ['DataB', '233', '445', 'DEU', 'Old', 'Sat'], ['MyValue', '4566', '455', 'ITA', 'NEW'], ['MyValue5', '455', '22332', 'Eng', 'Sat', 'Cable']]
This will convert the file to a list of lists instead of named variables. It reads the file into a string and uses an intermediate character ('/' here) as a separator; you could change that character if it appears elsewhere in your data.
data = [line.split('\n') for line in re.sub('\n?#+\n?', '/', text).split('/')]
If you prefer the names, you could do something similar for a dictionary, which will likely be better than individual variables.
data = {'mylist' + str(line[0]): line[1].split('\n') for line in enumerate(re.sub('\n?#+\n?', '/', text).split('/'))}
Both of these will have an extra list if there are separators at the top or bottom of the file like in the post, but you could chop those off if needed.
If you really need to assign to variables, you could use exec to set them, but I wouldn't recommend this.
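For reference, a complete version of the dictionary variant might look like this (a sketch, assuming the data lives in a file named data.txt):

import re

# a sketch: read the whole file, collapse each run of '#' separator lines into '/',
# then split the text into blocks and each block into its lines
with open('data.txt', 'r') as f:
    text = f.read()

blocks = re.sub('\n?#+\n?', '/', text).split('/')
data = {'mylist' + str(i): block.split('\n') for i, block in enumerate(blocks) if block}
print(data)

The if block filter drops the empty entries produced by the leading and trailing separators, per the caveat above.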
I do not quite understand what you mean by "What I need is to put each of them into a list and the separator must be the #"; the example result does not quite match that requirement. Maybe you want:
filepath = "./data.txt" # the full pathname of your data file
with open(filepath, "r", encoding="utf-8") as f: # get data
    data = f.readlines()

# handle data
myList = []
for item in data[1:]:
    if "#" in item:
        print(myList)
        myList = []
        continue
    myList.append(item.strip("\n"))
The result is shown below:
['Data1', '133', '124', 'FRE', 'new', 'Cable', 'Sat']
['DataB', '233', '445', 'DEU', 'Old', 'Sat']
['MyValue', '4566', '455', 'ITA', 'NEW']
['MyValue5', '455', '22332', 'Eng', 'Sat', 'Cable']
If the number of separator characters (#) is always the same, it's easy to handle your problem with the "split" function, like below:
filepath = "./data.txt" # the full pathname of your data file
with open(filepath, "r", encoding="utf-8") as f: # get data
    # f.readline() # filter the first line
    data = f.read()

data = data.strip("#").strip("\n").split("###########################\n")
for item in data:
    print(item.split("\n"))

List of 2 elements to csv

I have been facing an issue parsing a horrible txt file. I have managed to extract the information I need into lists:
['OS-EXT-SRV-ATTR:host', 'compute-0-4.domain.tld']
['OS-EXT-SRV-ATTR:hostname', 'commvault-vsa-vm']
['OS-EXT-SRV-ATTR:hypervisor_hostname', 'compute-0-4.domain.tld']
['OS-EXT-SRV-ATTR:instance_name', 'instance-00000008']
['OS-EXT-SRV-ATTR:root_device_name', '/dev/vda']
['hostId', '985035a85d3c98137796f5799341fb65df21e8893fd988ac91a03124']
['key_name', '-']
['name', 'Commvault_VSA_VM']
['OS-EXT-SRV-ATTR:host', 'compute-0-28.domain.tld']
['OS-EXT-SRV-ATTR:hostname', 'dummy-vm']
['OS-EXT-SRV-ATTR:hypervisor_hostname', 'compute-0-28.domain.tld']
['OS-EXT-SRV-ATTR:instance_name', 'instance-0000226e']
['OS-EXT-SRV-ATTR:root_device_name', '/dev/hda']
['hostId', '7bd08d963a7c598f274ce8af2fa4f7beb4a66b98689cc7cdc5a6ef22']
['key_name', '-']
['name', 'Dummy_VM']
['OS-EXT-SRV-ATTR:host', 'compute-0-20.domain.tld']
['OS-EXT-SRV-ATTR:hostname', 'mavtel-sif-vsifarvl11']
['OS-EXT-SRV-ATTR:hypervisor_hostname', 'compute-0-20.domain.tld']
['OS-EXT-SRV-ATTR:instance_name', 'instance-00001da6']
['OS-EXT-SRV-ATTR:root_device_name', '/dev/vda']
['hostId', 'dd82c20a014e05fcfb3d4bcf653c30fa539a8fd4e946760ee1cc6f07']
['key_name', 'mav_tel_key']
['name', 'MAVTEL-SIF-vsifarvl11']
I would like to have element 0 as the headers and element 1 as the rows, for example:
OS-EXT-SRV-ATTR:host, OS-EXT-SRV-ATTR:hostname,...., name
compute-0-4.domain.tld, commvault-vsa-vm,....., Commvault_VSA_VM
compute-0-28.domain.tld, dummy-vm,...., Dummy_VM
Here is my code so far:
import re

with open('metadata.txt', 'r') as infile:
    lines = infile.readlines()

for line in lines:
    if re.search('hostId|properties|OS-EXT-SRV-ATTR:host|OS-EXT-SRV-ATTR:hypervisor_hostname|name', line):
        re.sub("[\t]+", " ", line)
        find = line.strip()
        format = ''.join(line.split()).replace('|', ',')
        list = format.split(',')
        new_list = list[1:-1]
I am very new at Python, so sometimes I run out of ideas on how to make things work.
Looking at your input file, I see that it contains what appears to be output from the openstack nova show command, mixed with other stuff. There are basically two types of lines: valid ones, and invalid ones (duh).
The valid ones have this structure:
'| key | value |'
and the invalid ones have anything else.
So we could define that every valid line can be split at the | into exactly four parts, of which the first and the last must be empty and the other two must be filled.
Python can do this (it's called unpacking assignment):
a, b, c, d = [1, 2, 3, 4]
a, b, c, d = some_string.split('|')
which will succeed when the right-hand side has exactly four parts, and otherwise fail with a ValueError. When we additionally make sure that a and d are empty, and b and c are not empty, we have a valid line.
Furthermore we can say, if b equals 'Property' and c equals 'Value', we have hit a header row and what follows must describe a "new record".
This function does exactly that:
def parse_metadata_file(path):
    """ parses a data file generated by `nova show` into records """
    with open(path, 'r', encoding='utf8') as file:
        record = {}
        for line in file:
            try:
                # unpack line into 4 fields: "| key | val |"
                a, key, val, z = map(str.strip, line.split('|'))
                if a != '' or z != '' or key == '' or val == '':
                    continue
            except ValueError:
                # skip invalid lines
                continue
            if key == 'Property' and val == 'Value' and record:
                # output current record and start a new one
                yield record
                record = {}
            else:
                # write property to current record
                record[key] = val
        # output last record
        if record:
            yield record
It spits out a new dict for each record it finds and disregards all lines that do not pass the sanity check. Effectively this function generates a stream of dicts.
Now we can use the csv module to write this stream of dicts to a CSV file:
import csv

# list of fields we are interested in
fields = ['hostId', 'properties', 'OS-EXT-SRV-ATTR:host', 'OS-EXT-SRV-ATTR:hypervisor_hostname', 'name']

with open('output.csv', 'w', encoding='utf8', newline='') as outfile:
    writer = csv.DictWriter(outfile, fieldnames=fields, extrasaction='ignore')
    writer.writeheader()
    writer.writerows(parse_metadata_file('metadata.txt'))
The CSV module has a DictWriter which is designed to accept dicts as input and write them—according to the given key names—to a CSV row.
With extrasaction='ignore' it does not matter if the current record has more fields than required.
With the fields list it becomes extremely easy to extract a different set of fields.
Configure the writer to suit your needs (docs).
This:
writer.writerows(parse_metadata_file('metadata.txt'))
is a convenient shorthand for
for record in parse_metadata_file('metadata.txt'):
    writer.writerow(record)
You can take a step-by-step approach and build a 2D array by keeping track of your headers and each entry in the text file.
headers = list(set([entry[0] for entry in data])) # obtain unique headers

num_rows = 1
for entry in data: # figure out how many rows we are going to need
    if 'name' in entry: # name is unique per row, so using that
        num_rows += 1

num_cols = len(headers)
mat = [[0 for _ in range(num_cols)] for _ in range(num_rows)]
mat[0] = headers # add headers as first row

header_lookup = {header: i for i, header in enumerate(headers)}
row = 1
for entry in data:
    header, val = entry[0], entry[1]
    col = header_lookup[header]
    mat[row][col] = val # add entries to each subsequent row
    if header == 'name':
        row += 1

print(mat)
output:
[['hostId', 'OS-EXT-SRV-ATTR:host', 'name', 'OS-EXT-SRV-ATTR:hostname', 'OS-EXT-SRV-ATTR:instance_name', 'OS-EXT-SRV-ATTR:root_device_name', 'OS-EXT-SRV-ATTR:hypervisor_hostname', 'key_name'], ['985035a85d3c98137796f5799341fb65df21e8893fd988ac91a03124', 'compute-0-4.domain.tld', 'Commvault_VSA_VM', 'commvault-vsa-vm', 'instance-00000008', '/dev/vda', 'compute-0-4.domain.tld', '-'], ['7bd08d963a7c598f274ce8af2fa4f7beb4a66b98689cc7cdc5a6ef22', 'compute-0-28.domain.tld', 'Dummy_VM', 'dummy-vm', 'instance-0000226e', '/dev/hda', 'compute-0-28.domain.tld', '-'], ['dd82c20a014e05fcfb3d4bcf653c30fa539a8fd4e946760ee1cc6f07', 'compute-0-20.domain.tld', 'MAVTEL-SIF-vsifarvl11', 'mavtel-sif-vsifarvl11', 'instance-00001da6', '/dev/vda', 'compute-0-20.domain.tld', 'mav_tel_key']]
If you need to write the new 2D array to a file so it's not as "horrible" :)
with open('output.txt', 'w') as f:
    for lines in mat:
        lines_out = '\t'.join(lines)
        f.write(lines_out)
        f.write('\n')
Looks like a job for pandas:
import pandas as pd
list_to_export = [['OS-EXT-SRV-ATTR:host', 'compute-0-4.domain.tld'],
['OS-EXT-SRV-ATTR:hostname', 'commvault-vsa-vm'],
['OS-EXT-SRV-ATTR:hypervisor_hostname', 'compute-0-4.domain.tld'],
['OS-EXT-SRV-ATTR:instance_name', 'instance-00000008'],
['OS-EXT-SRV-ATTR:root_device_name', '/dev/vda'],
['hostId', '985035a85d3c98137796f5799341fb65df21e8893fd988ac91a03124'],
['key_name', '-'],
['name', 'Commvault_VSA_VM'],
['OS-EXT-SRV-ATTR:host', 'compute-0-28.domain.tld'],
['OS-EXT-SRV-ATTR:hostname', 'dummy-vm'],
['OS-EXT-SRV-ATTR:hypervisor_hostname', 'compute-0-28.domain.tld'],
['OS-EXT-SRV-ATTR:instance_name', 'instance-0000226e'],
['OS-EXT-SRV-ATTR:root_device_name', '/dev/hda'],
['hostId', '7bd08d963a7c598f274ce8af2fa4f7beb4a66b98689cc7cdc5a6ef22'],
['key_name', '-'],
['name', 'Dummy_VM'],
['OS-EXT-SRV-ATTR:host', 'compute-0-20.domain.tld'],
['OS-EXT-SRV-ATTR:hostname', 'mavtel-sif-vsifarvl11'],
['OS-EXT-SRV-ATTR:hypervisor_hostname', 'compute-0-20.domain.tld'],
['OS-EXT-SRV-ATTR:instance_name', 'instance-00001da6'],
['OS-EXT-SRV-ATTR:root_device_name', '/dev/vda'],
['hostId', 'dd82c20a014e05fcfb3d4bcf653c30fa539a8fd4e946760ee1cc6f07'],
['key_name', 'mav_tel_key'],
['name', 'MAVTEL-SIF-vsifarvl11']]
data_dict = {}
for i in list_to_export:
    if i[0] not in data_dict:
        data_dict[i[0]] = [i[1]]
    else:
        data_dict[i[0]].append(i[1])

pd.DataFrame.from_dict(data_dict, orient='index').T.to_csv('filename.csv')
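As a side note, the dict-building loop above could be written more compactly with collections.defaultdict; a small sketch of that alternative:

from collections import defaultdict

# equivalent to the loop above: group the values by their key
data_dict = defaultdict(list)
for key, value in list_to_export:
    data_dict[key].append(value)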

Python splitting data record

I have a record as below:
29 16
A 1.2595034 0.82587254 0.7375044 1.1270138 -0.35065323 0.55985355
0.7200067 -0.889543 0.2300735 0.56767654 0.2789483 0.32296127 -0.6423197 0.26456305 -0.07363393 -1.0788593
B 1.2467299 0.78651106 0.4702038 1.204216 -0.5282698 0.13987103
0.5911153 -0.6729466 0.377103 0.34090135 0.3052503 0.028784657 -0.39129165 0.079238065 -0.29310825 -0.99383247
I want to split the data into key-value pairs, ignoring the first row (i.e. 29 16).
The output should be something like this:
x = A , B
y = 1.2595034 0.82587254 0.7375044 1.1270138 -0.35065323 0.55985355 0.7200067 -0.889543 0.2300735 0.56767654 0.2789483 0.32296127 -0.6423197 0.26456305 -0.07363393 -1.0788593
1.2467299 0.78651106 0.4702038 1.204216 -0.5282698 0.13987103 0.5911153 -0.6729466 0.377103 0.34090135 0.3052503 0.028784657 -0.39129165 0.079238065 -0.29310825 -0.99383247
I am able to skip the first line using the code below:
f = open(fileName, 'r')
lines = f.readlines()[1:]
Now how do I separate the rest of the record in Python?
So here's my take :D I expect you'd want to have the numbers parsed as well?
def generate_kv(fileName):
    with open(fileName, 'r') as file:
        # ignore first line
        file.readline()
        for line in file:
            if '' == line.strip():
                # empty line
                continue
            values = line.split(' ')
            try:
                yield values[0], [float(x) for x in values[1:]]
            except ValueError:
                print(f'one of the elements was not a float: {line}')

if __name__ == '__main__':
    x = []
    y = []
    for key, value in generate_kv('sample.txt'):
        x.append(key)
        y.append(value)
    print(x)
    print(y)
assumes that the values in sample.txt look like this:
% cat sample.txt
29 16
A 1.2595034 0.82587254 0.7375044 1.1270138 -0.35065323 0.55985355 0.7200067 -0.889543 0.2300735 0.56767654 0.2789483 0.32296127 -0.6423197 0.26456305 -0.07363393 -1.0788593
B 1.2467299 0.78651106 0.4702038 1.204216 -0.5282698 0.13987103 0.5911153 -0.6729466 0.377103 0.34090135 0.3052503 0.028784657 -0.39129165 0.079238065 -0.29310825 -0.99383247
and the output:
% python sample.py
['A', 'B']
[[1.2595034, 0.82587254, 0.7375044, 1.1270138, -0.35065323, 0.55985355, 0.7200067, -0.889543, 0.2300735, 0.56767654, 0.2789483, 0.32296127, -0.6423197, 0.26456305, -0.07363393, -1.0788593], [1.2467299, 0.78651106, 0.4702038, 1.204216, -0.5282698, 0.13987103, 0.5911153, -0.6729466, 0.377103, 0.34090135, 0.3052503, 0.028784657, -0.39129165, 0.079238065, -0.29310825, -0.99383247]]
Alternatively, if you want a dictionary, do:
if __name__ == '__main__':
    print(dict(generate_kv('sample.txt')))
That will convert the list into a dictionary and output:
{'A': [1.2595034, 0.82587254, 0.7375044, 1.1270138, -0.35065323, 0.55985355, 0.7200067, -0.889543, 0.2300735, 0.56767654, 0.2789483, 0.32296127, -0.6423197, 0.26456305, -0.07363393, -1.0788593], 'B': [1.2467299, 0.78651106, 0.4702038, 1.204216, -0.5282698, 0.13987103, 0.5911153, -0.6729466, 0.377103, 0.34090135, 0.3052503, 0.028784657, -0.39129165, 0.079238065, -0.29310825, -0.99383247]}
You can use this script if your file is a plain text file:
filename = 'file.text'
with open(filename) as f:
    data = f.readlines()

# data[0] is the "29 16" line, so the A and B records are data[1] and data[2]
x = [data[1][0], data[2][0]]
y = [data[1][1:], data[2][1:]]
If you're happy to store the data in a dictionary here is what you can do:
records = dict()
with open(filename, 'r') as f:
    f.readline() # skip the first line
    for line in f:
        key, value = line.split(maxsplit=1)
        records[key] = value.split()
The structure of records would be:
{
    'A': ['1.2595034', '0.82587254', '0.7375044', ... ],
    'B': ['1.2467299', '0.78651106', '0.4702038', ... ]
}
What's happening
with ... as f opens the file within a context manager, which automatically closes the file when the block finishes.
Because the open file keeps track of where it is in the file, we can use f.readline() to move the pointer down a line. (docs)
line.split() turns a string into a list of strings. With the maxsplit=1 argument it will only split on the first space.
e.g. after x, y = 'foo bar baz'.split(maxsplit=1), x == 'foo' and y == 'bar baz'.
If I understood correctly, you want the numbers to be collected in a list. One way of doing this is:
import string

text = '''
29 16
A 1.2595034 0.82587254 0.7375044 1.1270138 -0.35065323 0.55985355 0.7200067 -0.889543 0.2300735 0.56767654 0.2789483 0.32296127 -0.6423197 0.26456305 -0.07363393 -1.0788593
B 1.2467299 0.78651106 0.4702038 1.204216 -0.5282698 0.13987103 0.5911153 -0.6729466 0.377103 0.34090135 0.3052503 0.028784657 -0.39129165 0.079238065 -0.29310825 -0.99383247
'''

lines = text.split('\n')
x = [
    line[1:].strip().split()
    for i, line in enumerate(lines)
    if line and line[0].lower() in string.ascii_letters]
This will produce a list of lists: one inner list per labeled line (A, B, etc.), each containing the numbers associated with that label.
This code assumes that you are interested in lines starting with any single letter (case-insensitive).
For more elaborate conditions you may want to look into regular expressions.
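For instance, a sketch of the same filter written with a regular expression (assuming the same text variable as above):

import re

# a sketch: keep only lines that start with a single letter followed by whitespace
x = [line.split()[1:] for line in text.split('\n') if re.match(r'^[A-Za-z]\s', line)]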
Obviously, if your text is in a file, you could substitute lines = ... with:
with open(filepath, 'r') as lines:
    x = ...
Also, if the items in x should not be separated, but rather kept as a single string, you may want to replace line[1:].strip().split() with line[1:].strip().
Instead, if you want the numbers as floats and not strings, you should replace line[1:].strip().split() with [float(value) for value in line[1:].strip().split()].
EDIT:
As an alternative to line[1:].strip().split(), you may want to do:
line.split(maxsplit=1)[1].split()
as suggested in some other answer. This would generalize better if the first token is not a single character.

How to find a string that is repeated in a CSV file and format the result in Python?

I have a .csv file which contains this:
1,winter
2,autumn
3,winter
4,spring
5,summer
6,autumn
7,autumn
8,summer
9,spring
10,spring
I need to parse this file to generate one containing:
winter = 1,3
autumn = 2,6,7
spring = 4,9,10
summer = 5,8
I found this post, How to print count of occourance of some string in the same CSV file using Python?, but I could not adapt it to what I want.
I appreciate any help or guidance to address this concern.
Thanks.
Create an empty dict and open the CSV, reading each row.
While reading each row, check if row[1] is in the dict.
If it is not, add it to the dict with a new list containing the value row[0].
If it is already present, append row[0] to the list that is there.
Something like this:
import csv

try:
    fr = open("mycsv.csv")
except:
    print("Couldn't open the file")

reader = csv.reader(fr)
base = {}
for row in reader:
    if len(row) > 0:
        if row[1] in base: # check if that season is in the dict
            base[row[1]].append(row[0]) # if so, add the number to the list already there
        else:
            base[row[1]] = [row[0]] # else create a new list and add the number

print(base)
It gives output something like this:
{'autumn': ['2', '6', '7'], 'spring': ['4', '9', '10'], 'winter': ['1', '3'], 'summer': ['5', '8']}
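To get the winter = 1,3 format the question asks for, one could then add, for example:

# a sketch: write each season and its comma-joined numbers as "season = 1,3"
with open("out.txt", "w") as fw:
    for season, numbers in base.items():
        fw.write(season + " = " + ",".join(numbers) + "\n")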

Download CSV directly into Python CSV parser

I'm trying to download CSV content from Morningstar and then parse its contents. If I feed the HTTP content directly into Python's csv parser, the result is not formatted correctly. Yet, if I save the HTTP content to a file (/tmp/tmp.csv) and then load the file into Python's csv parser, the result is correct. In other words, why does:
def finDownload(code, report):
    h = httplib2.Http('.cache')
    url = 'http://financials.morningstar.com/ajax/ReportProcess4CSV.html?t=' + code + '&region=AUS&culture=en_us&reportType=' + report + '&period=12&dataType=A&order=asc&columnYear=5&rounding=1&view=raw&productCode=usa&denominatorView=raw&number=1'
    headers, data = h.request(url)
    return data

balancesheet = csv.reader(finDownload('FGE','is'))
for row in balancesheet:
    print row
return:
['F']
['o']
['r']
['g']
['e']
[' ']
['G']
['r']
['o']
['u']
(etc...)
instead of:
['Forge Group Limited (FGE) Income Statement']
?
The problem results from the fact that iteration over a file is done line-by-line whereas iteration over a string is done character-by-character.
You want StringIO/cStringIO (Python 2) or io.StringIO (Python 3, thanks to John Machin for pointing me to it) so a string can be treated as a file-like object:
Python 2:
mystring = 'a,"b\nb",c\n1,2,3'
import cStringIO
csvio = cStringIO.StringIO(mystring)
mycsv = csv.reader(csvio)
Python 3:
mystring = 'a,"b\nb",c\n1,2,3'
import io
csvio = io.StringIO(mystring, newline="")
mycsv = csv.reader(csvio)
Both will correctly preserve newlines inside quoted fields:
>>> for row in mycsv: print(row)
...
['a', 'b\nb', 'c']
['1', '2', '3']
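Applied to the question's finDownload function, the fix is simply to wrap the returned string before handing it to csv.reader; a sketch in Python 3 (assuming the response body has already been decoded to str):

import csv
import io

# a sketch: wrap the downloaded string so csv.reader sees a file-like object
data = finDownload('FGE', 'is')  # the question's download helper
balancesheet = csv.reader(io.StringIO(data, newline=''))
for row in balancesheet:
    print(row)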
