Need help understanding why this is only returning the results for the first row and not the remaining rows inside a csv file. Thank you
import csv

data = []
with open('customerData.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for rows in reader:
        data.append(rows)

print(data[0])
print(data[1]["Name"])
print(data[2]["Zip"])
print(data[3]["Gender"])
print(data[3]["Favorite Radio Station"])
When referencing items in a list, you can use the [] notation to indicate which item in the list you want. The first item in the list is some_list[0], the second is some_list[1], and so on. You can also go from the end of the list, where some_list[-1] is the last item, and some_list[-2] is the second to last, and so on.
However, if you want to print every row, you need to iterate through the list like so:
for item in data:
    print(item)
From here you can reference the keys of the item in the list directly, in a similar manner to how you did originally in your code:
for item in data:
    print(item["Name"])
    print(item["Zip"])
    print(item["Gender"])
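As a self-contained sketch of the difference between indexing one row and looping over all rows, the following uses an in-memory CSV in place of customerData.csv. The sample rows and column values are made up for illustration and are not from the original file:

```python
import csv
import io

# Hypothetical sample data standing in for customerData.csv
sample = io.StringIO(
    "Name,Zip,Gender\n"
    "Ann,12345,F\n"
    "Bob,23456,M\n"
    "Cy,34567,M\n"
)

data = list(csv.DictReader(sample))

first_name = data[0]["Name"]   # indexing picks exactly one row
last_name = data[-1]["Name"]   # negative indices count from the end
all_names = [row["Name"] for row in data]  # a loop visits every row

print(first_name, last_name, all_names)
```

Indexing with a fixed number always returns that single row, so seeing every row requires a loop or comprehension over data.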
Hope this helps!
For the CSV file below, I'm trying to use the first item in each of the 3rd to 6th rows (i.e. rows 2 to 5 when counting from zero) as input for a function, so I need to be able to call them. Then I need to do the same for the first items in the 8th and 9th rows. How would I do that?
NETFLIX
PROGRAMS
Halt and Catch Fire,Drama,Christopher Cantwell and Christopher C. Rogers,2014
Arrival,Drama,Denis Villeneuve,2016
Us,Horror,Jordan Peele,2018
Matrix; The,Sci-Fi,The Wachowskis,1999
SUBSCRIBERS
John,john123,password
Farah,f206,abcdef
So far, I put each line into a list of strings and when I call line[0] to give me the first item in the list, I get this:
PROGRAMS
Halt and Catch Fire
Arrival
Us
Matrix; The
SUBSCRIBERS
John
Farah
which is kind of what I want, but I don't want all the rows at the same time. I want to be able to call each 3rd to 6th rows together, then 8th and 9th together to get this:
Halt and Catch Fire
Arrival
Us
Matrix; The
and
John
Farah
The point of doing this is so I can use the elements of these rows as parameters for functions I created.
Rows with only one column separate the lists. You could store each list under the most recent single-column row seen. Note that defaultdict creates its default object (in this case an empty list) when a key doesn't exist in the dictionary yet, which saves having to check whether the key is already present before appending.
import csv
from collections import defaultdict

with open('input.csv','r',newline='') as f:
    r = csv.reader(f)
    data = defaultdict(list)
    next(f) # skip first row
    for row in r:
        if len(row) == 1: # single column, remember the list name
            key = row[0]
        else:
            data[key].append(row[0]) # append to last list remembered.

print(data)
print(data['PROGRAMS'])
print(data['SUBSCRIBERS'])
Output:
defaultdict(<class 'list'>, {'PROGRAMS': ['Halt and Catch Fire', 'Arrival', 'Us', 'Matrix; The'], 'SUBSCRIBERS': ['John', 'Farah']})
['Halt and Catch Fire', 'Arrival', 'Us', 'Matrix; The']
['John', 'Farah']
Then you can iterate on one of the lists to call some function:
for program in data['PROGRAMS']:
    some_function(program)
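If the other columns are needed as function parameters too, one possible variant keeps each whole row instead of only its first item. This is a sketch only: it reads the question's sample from an in-memory string rather than input.csv, and describe is a hypothetical stand-in for the asker's own functions.

```python
import csv
import io
from collections import defaultdict

# In-memory copy of the sample file from the question
text = """NETFLIX
PROGRAMS
Halt and Catch Fire,Drama,Christopher Cantwell and Christopher C. Rogers,2014
Arrival,Drama,Denis Villeneuve,2016
Us,Horror,Jordan Peele,2018
Matrix; The,Sci-Fi,The Wachowskis,1999
SUBSCRIBERS
John,john123,password
Farah,f206,abcdef
"""

sections = defaultdict(list)
reader = csv.reader(io.StringIO(text))
next(reader)  # skip the "NETFLIX" title row
key = None
for row in reader:
    if len(row) == 1:
        key = row[0]               # a one-column row names the next section
    else:
        sections[key].append(row)  # keep the full row, not just row[0]


def describe(title, genre, creator, year):
    # Hypothetical function; the real one would come from the asker's code
    return f"{title} ({year}, {genre})"


# Unpack each stored row into the function's parameters
descriptions = [describe(*row) for row in sections["PROGRAMS"]]
print(descriptions)
```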
I have a list like below.
list = [[Name,ID,Age,mark,subject],[karan,2344,23,87,Bio],[karan,2344,23,87,Mat],[karan,2344,23,87,Eng]]
I need to get only the name 'Karan' as output.
How can I get that?
This is a 2D list,
list[i][j]
will give you the 'i'th list within your list and the 'j'th item within that list.
So to get 'karan' you want list[1][0]
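A minimal sketch of that indexing, assuming the values in the question are strings and numbers. The variable is named rows here (an illustrative choice) to avoid shadowing the built-in list:

```python
rows = [
    ["Name", "ID", "Age", "mark", "subject"],
    ["karan", 2344, 23, 87, "Bio"],
    ["karan", 2344, 23, 87, "Mat"],
    ["karan", 2344, 23, 87, "Eng"],
]

header = rows[0]    # first inner list: the column names
name = rows[1][0]   # second inner list, first item

# If the name repeats on every row, a set collapses the duplicates
unique_names = {row[0] for row in rows[1:]}

print(name, unique_names)
```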
I upvoted Lio Elbammalf, but decided to provide an answer that made a couple of assumptions that should have been clarified in the question:
1. The first item of the list is the headers: they are actually in the list (and not there only as part of the question), and they are provided as part of the list because there is no guarantee that the headers will always be in the same order.
2. This is probably a CSV file.
Ignoring 2 for the moment, what you would want to do is remove the "headers" from the list (so that the rest of the list is uniform), and then find the index of "Name" (your desired output).
myinput = [["Name","ID","Age","mark","subject"],
           ["karan",2344,23,87,"Bio"],
           ["karan",2344,23,87,"Mat"],
           ["karan",2344,23,87,"Eng"]]

def getnames(myinput):
    ## Remove the headers from the list to simplify everything
    headers = myinput.pop(0)
    ## Figure out where to find the person's Name
    nameindex = headers.index("Name")
    ## Return a list of the Name in each row
    return [stats[nameindex] for stats in myinput]
If the name is guaranteed to be the same in each row, then you can just return myinput[0][nameindex], as suggested in the other answer.
Now, if 2 is true, I'm assuming you're using the csv module, in which case load the file using the DictReader class and then just access each row using the 'Name' key:
import csv

def loadfile(myfile):
    with open(myfile) as f:
        reader = csv.DictReader(f)
        return list(reader)

def getname(rows):
    ## This is the same return as above, and again you can just
    ## return rows[0]['Name'] if you know you only need the first one
    return [row['Name'] for row in rows]
In Python 3 you can do this
_, [x, _, _, _, _], *_ = ls
Now x will be karan.
I'm interested in finding the FASTEST way to iterate through a list of lists and replace a character in the innermost list. I am generating the list of lists from a CSV file in Python.
Bing Ads API sends me a giant report but any percentage is represented as "20.00%" as opposed to "20.00". This means I can't insert each row as is to my database because "20.00%" doesn't convert to a numeric on SQL Server.
My solution thus far has been to use a list comprehension inside a list comprehension. I wrote a small script to test how fast this runs compared to just getting the list and it's doing ok (about 2x the runtime) but I am curious to know if there is a faster way.
Note: Every record in the report has a rate and therefore a percent sign. So every record has to be visited once, and every rate has to be visited once (is that the cause of the 2x slowdown?)
Anyway, I would love a faster solution as the size of these reports continues to grow!
import time
import csv

def getRecords1():
    # 'r' instead of 'rU': the 'U' flag is no longer accepted by newer Python 3 versions
    with open('report.csv', 'r', encoding='utf-8-sig') as records:
        reader = csv.reader(records)
        while next(reader)[0]!='GregorianDate': #Skip all lines in header (the last row in header is column headers so the row containing 'GregorianDate' is the last to skip)
            next(reader)
        recordList = list(reader)
    return recordList
def getRecords2():
    with open('report.csv', 'r', encoding='utf-8-sig') as records:
        reader = csv.reader(records)
        while next(reader)[0]!='GregorianDate': #Skip all lines in header (the last row in header is column headers so the row containing 'GregorianDate' is the last to skip)
            next(reader)
        recordList = list(reader)
    data = [[field.replace('%', '') for field in record] for record in recordList]
    return data # return the cleaned rows, not the untouched recordList
def getRecords3():
    data = []
    with open('c:\\Users\\sflynn\\Documents\\Google API Project\\Bing\\uploadBing\\reports\\report.csv', 'r', encoding='utf-8-sig') as records:
        reader = csv.reader(records)
        while next(reader)[0]!='GregorianDate': #Skip all lines in header (the last row in header is column headers so the row containing 'GregorianDate' is the last to skip)
            next(reader)
        for row in reader:
            row[10] = row[10].replace('%','') # only column 10 holds the rate
            data.append(row)
    return data
def main():
    t0=time.time()
    for i in range(2000):
        getRecords1()
    t1=time.time()
    print("Get records normally takes " +str(t1-t0))

    t0=time.time()
    for i in range(2000):
        getRecords2()
    t1=time.time()
    print("Using nested list comprehension takes " +str(t1-t0))

    t0=time.time()
    for i in range(2000):
        getRecords3()
    t1=time.time()
    print("Modifying row as it's read takes " +str(t1-t0))

main()
Edit: I have added a third function getRecords3() which is the fastest implementation I have seen yet. The output of running the program is as follows:
Get records normally takes 30.61197066307068
Using nested list comprehension takes 60.81756520271301
Modifying row as it's read takes 43.761850357055664
This means we have taken it down from a 2x slower algorithm to approximately 1.5x slower. Thank you everyone!
You could check whether modifying the inner lists in place is faster than creating a new list of lists with a list comprehension.
So, something like
for record in recordList:
    for index in range(len(record)):
        record[index] = record[index].replace('%', '')
We can't really modify the string in-place since strings are immutable.
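One way to compare the approaches is a small timeit harness like the sketch below. The row layout and the helper names (make_records, strip_with_comprehension, strip_in_place, strip_one_column) are assumptions made for illustration, not the real Bing report format, and the timings will vary by machine and data:

```python
import timeit


def make_records(n=1000):
    # Hypothetical report rows; only the last column carries a percent sign
    return [["2020-01-01", "campaign", f"{i % 100}.00%"] for i in range(n)]


def strip_with_comprehension(records):
    # Builds a brand-new list of lists
    return [[field.replace("%", "") for field in record] for record in records]


def strip_in_place(records):
    # Replaces each field inside the existing inner lists
    for record in records:
        for index in range(len(record)):
            record[index] = record[index].replace("%", "")
    return records


def strip_one_column(records, column=2):
    # Touches only the column known to hold the rate
    for record in records:
        record[column] = record[column].replace("%", "")
    return records


for func in (strip_with_comprehension, strip_in_place, strip_one_column):
    # make_records() runs inside the timed call, adding the same cost to each
    seconds = timeit.timeit(lambda: func(make_records()), number=20)
    print(func.__name__, seconds)
```

Restricting the work to the one column that contains the rate avoids calling replace on every field, which is the same idea as getRecords3() in the question.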
I'm trying to loop over items in a csv and for anything in the dataset that is blank, set it to "None" or "Null". I am using the modules csv and psycopg2, both already imported.
The overall goal here is to read any items that are blank in the csv and set them to Null. I'm using item = "None" just to check if the items are found. From there I think I can set it to None.
Sample Data:
name, age, breed_name, species_name, shelter_name, adopted
Titchy, 12, mixed, cat, BCSPCA, 1
Ginger, 1, labradoodle, dog,,1
Sample Code:
import psycopg2
import csv
for new_pet in dictReader:
    for item in new_pet:
        item = item.capitalize()
        if item == '':
            print(item) # Used to check/debugging
            item = "None"
I can't figure out where I am going wrong here. Any advice is greatly appreciated.
When you update the item inside the for loop it will have no effect on the list it came from. You are not modifying the list but the local "copy" of one item.
You can replace the whole inner for loop with a dict comprehension (each row from a DictReader is a dict, so iterate over its items; iterating over the dict itself only yields the column names):
import csv
for new_pet in dictReader:
    new_pet = {key: value.capitalize() if value else None for key, value in new_pet.items()}
    print(new_pet)
This takes every value in new_pet and evaluates value.capitalize() if value else None on it.
That means: if value is truthy (any non-empty string), the capitalized value is kept; if it is falsy (empty strings are), it is replaced with None.
Remember to do your data processing per line inside the outer for loop.
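A self-contained sketch of the blank-to-None idea, using the sample data from the question as an in-memory file. It assumes the stray spaces after the commas should be ignored (hence skipinitialspace=True) and leaves out the capitalization step to keep the focus on the None replacement:

```python
import csv
import io

# Sample data from the question, read from memory instead of a file
sample = io.StringIO(
    "name, age, breed_name, species_name, shelter_name, adopted\n"
    "Titchy, 12, mixed, cat, BCSPCA, 1\n"
    "Ginger, 1, labradoodle, dog,,1\n"
)

cleaned = []
for new_pet in csv.DictReader(sample, skipinitialspace=True):
    # Build a new dict; reassigning the loop variable alone would change nothing
    row = {key: (value.strip() or None) for key, value in new_pet.items()}
    cleaned.append(row)

for row in cleaned:
    print(row)
```

A None value passed as a query parameter to psycopg2 is sent to the database as NULL, so the cleaned rows can be used directly as parameters.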
I'm writing a Python script that reads a CSV file and creates a list of deques. If I print out exactly what gets appended to the list before it gets added, it looks like what I want, but when I print out the list itself I can see that append is overwriting all of the elements in the list with the newest one.
import csv
from collections import deque

# Window is a list containing many instances
def slideWindow(window, nextInstance, num_attributes):
    attribute = nextInstance.pop(0)
    window.popleft()
    for i in range(num_attributes):
        window.pop()
    window.extendleft(reversed(nextInstance))
    window.appendleft(attribute)
    return window

def convertDataFormat(filename, window_size):
    with open(filename, 'rU') as f:
        reader = csv.reader(f)
        window = deque()
        alldata = deque()
        i = 0
        for row in reader:
            if i < (window_size-1):
                window.extendleft(reversed(row[1:]))
                i+=1
            else:
                window.extendleft(reversed(row))
                break
        alldata.append(window)
        for row in reader:
            window = slideWindow(window, row, NUM_ATTRIBUTES)
            alldata.append(window)
        # print alldata
        f.close()
    return alldata
It is difficult to tell exactly what you want from this code. I suspect the problem lies in the following:
alldata.append(window)
for row in reader:
    window = slideWindow(window, row, NUM_ATTRIBUTES)
    alldata.append(window)
Notice that in your slideWindow function, you modify the input deque (window), and then return the modified deque. So, you're putting a deque into the first element of your list, then you modify that object (inside slideWindow) and append another reference to the same object onto your list.
Is that what you intend to do?
The simple fix is to copy the window input in slideWindow and modify/return the copy.
I don't know for sure, but I'm suspicious it might be similar to this problem http://forums.devshed.com/python-programming-11/appending-object-to-list-overwrites-previous-842713.html.
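The aliasing effect and the copy-based fix can be reproduced in a few lines. This is a simplified sketch with a made-up three-element window, not the asker's actual data layout:

```python
from collections import deque

# Appending the same deque object twice stores two references to one object
window = deque([1, 2, 3])
aliased = [window]
window.popleft()
window.append(4)          # mutates the object already stored in the list
aliased.append(window)

# Appending a copy stores an independent snapshot each time
window = deque([1, 2, 3])
copied = [deque(window)]
window.popleft()
window.append(4)
copied.append(deque(window))

print(aliased)
print(copied)
```

In the aliased list both entries are the same object, so they always show the latest contents. In the copied list each entry keeps the contents the window had at the moment it was appended.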