I am setting up a script that extracts data from Excel and returns it in lists. Right now I am trying to reorganize the data into smaller lists that share a common attribute (for example, a list of the indices of the rows that contain 'Pencil'). The smaller list keeps returning None.
I've checked, and the lists that extract the data are working fine, but I can't seem to get the smaller lists working.
#Create a class for the multiple lists of Columns
class Data_Column(list):
    def Fill_List (self,col): #fills the list
        for i in range(sheet.nrows):
            self.append(sheet.cell_value(i,col))
#Create a class for a specific list that has data of a common artifact
class Specific_List(list):
    def Find_And_Fill (self, listy, word):
        for i in range (sheet.nrows):
            if listy[i] == word:
                self.append(i)
#Initiate and Populate lists from excel spreadsheet
date = Data_Column()
date.Fill_List(0)
location = Data_Column()
location.Fill_List(1)
name = Data_Column()
name.Fill_List(2)
item = Data_Column()
item.Fill_List(3)
specPencil = Specific_List()
print(specPencil.Find_And_Fill(item,'Pencil'))
I expected a List that contained the indices where 'Pencil' was found such as [1,6,12,14,19].
The actual output was: None
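I needed to take the print off the very last line. Find_And_Fill fills the list in place and has no return statement, so printing the call itself prints None.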
specPencil.Find_And_Fill(item,'Pencil')
print(specPencil)
I knew it was a simple fix
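The behavior can be reproduced without a spreadsheet. The sketch below uses a plain list in place of the Excel column (the sample values are made up, and the loop runs over len(listy) instead of sheet.nrows so that it is self-contained). It shows why printing the method call gives None while printing the list gives the indices.

```python
class Specific_List(list):
    def Find_And_Fill(self, listy, word):
        # Appends matching indices to self; there is no return statement,
        # so the call itself evaluates to None.
        for i in range(len(listy)):
            if listy[i] == word:
                self.append(i)

# Made-up stand-in for the 'item' column read from Excel
item = ['Pen', 'Pencil', 'Binder', 'Pencil']

specPencil = Specific_List()
result = specPencil.Find_And_Fill(item, 'Pencil')

print(result)       # the method's return value
print(specPencil)   # the list that was filled in place
```

If printing the call directly is preferred, one option is to end the method with `return self`.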
I have created a list during a for loop, which works well, but I want to create a dictionary instead.
from System.Collections.Generic import List
#Collector
viewPorts = list(FilteredElementCollector(doc).OfClass(Viewport))
#create a dictionary
viewPortDict = {}
#add Sheet Number, View Name and boxoutline to dictionary
for vp in viewPorts:
    sheet = doc.GetElement(vp.SheetId)
    view = doc.GetElement(vp.ViewId)
    vbox = vp.GetBoxOutline()
    viewPortDict = {view.ViewName : {'sheetNum': sheet.SheetNumber, 'viewBox' : vbox}}
print(viewPortDict)
The output from this is as follows:
{'STEEL NOTES': {'viewBox': <Autodesk.Revit.DB.Outline object at 0x000000000000065A [Autodesk.Revit.DB.Outline]>, 'sheetNum': 'A0.07'}}
The structure is perfect, but I want it to grab every viewport. As it is, the for loop seems to stop after the first iteration. Why is that, and how can I get the loop to keep going?
I have tried various things like creating another list of keys called "Keys" and list of values called "viewPortList" like:
dict.fromkeys(Keys, viewPortList)
But I always have the same problem: I am not able to iterate over all elements. For full disclosure, I am successful when I create a list instead. Here is what that looks like.
from System.Collections.Generic import List
#Collector
viewPorts = list(FilteredElementCollector(doc).OfClass(Viewport))
#create a list
viewPortList = []
#add Sheet Number, View Name and boxoutline to list
for vp in viewPorts:
    sheet = doc.GetElement(vp.SheetId)
    view = doc.GetElement(vp.ViewId)
    vbox = vp.GetBoxOutline()
    viewPortList.append([sheet.SheetNumber, view.ViewName, vbox])
print(viewPortList)
Which works fine and prints the following (only a portion of a long list):
[['A0.01', 'APPLICABLE CODES', <Autodesk.Revit.DB.Outline object at 0x000000000000060D [Autodesk.Revit.DB.Outline]>], ['A0.02', etc.]
But I want a dictionary instead. Any help would be appreciated. Thanks!
In your list example, you are appending to the list. In your dictionary example, you are creating a new dictionary each time (thus removing the data from previous iterations of the loop). You can do the equivalent of appending to it as well by just assigning to a particular key in the existing dictionary.
viewPortDict[view.ViewName] = {'sheetNum': sheet.SheetNumber, 'viewBox' : vbox}
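The difference between rebinding the name and assigning to a key can be seen without Revit. This is a minimal sketch in which made-up (view name, sheet number) pairs stand in for the viewports:

```python
# Made-up (view name, sheet number) pairs standing in for the Revit viewports
viewports = [('STEEL NOTES', 'A0.07'), ('APPLICABLE CODES', 'A0.01')]

# Rebinding the name inside the loop keeps only the last entry
overwritten = {}
for view_name, sheet_num in viewports:
    overwritten = {view_name: {'sheetNum': sheet_num}}

# Assigning to a key adds an entry to the same dictionary on every pass
accumulated = {}
for view_name, sheet_num in viewports:
    accumulated[view_name] = {'sheetNum': sheet_num}

print(overwritten)
print(accumulated)
```

Note that view names are used as keys, so two views with the same name would overwrite each other.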
I am trying to iterate through two series which are derived from ios_mod2_apps
list_of_genre = ios_mod2_apps['prime_genre'].unique()
list_of_app = ios_mod2_apps['track_name']
Then I iterate through the two series and run the following code inside the two for loops:
app_rating_percentage[app] = ios_mod2_apps['rating_count_total'][ios_mod2_apps['track_name']==app].sum() /(ios_mod2_apps['rating_count_tot'][ios_mod2_apps['prime_genre']==genre].sum())
Basically, the code above takes the sum of the 'rating_count_total' series for the rows whose track_name equals the app variable in that iteration, and divides it by the sum for the genre.
When I run this code I get the error:
KeyError: 'rating_count_total'
I have tried to understand this error but could not. I could use some help if someone has a clue about what is wrong.
Full code
#initiating an empty dictionary to take the values that will be created on the for loops
app_rating_percentage = {}
#create a list of unique genres
#create a list of apps these two lists will be used to iterate
list_of_genre = ios_mod2_apps['prime_genre'].unique()
list_of_app = ios_mod2_apps['track_name']
#iterate using the two above lists, and calculate the rating_count_total (of that app) divided by the sum of rating_count_total in that genre
for genre in list_of_genre:
    for app in list_of_app:
        app_rating_percentage[app] = ios_mod2_apps['rating_count_total'][ios_mod2_apps['track_name']==app].sum() /(ios_mod2_apps['rating_count_tot'][ios_mod2_apps['prime_genre']==genre].sum())
app_rating_percentage
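No answer is recorded here for this question. Two things stand out in the code, though. The numerator looks up 'rating_count_total' while the denominator looks up 'rating_count_tot', and the KeyError names the first spelling, so that one is probably not a column in the DataFrame (printing ios_mod2_apps.columns would confirm it). Separately, the nested loop divides every app by every genre in turn, so each stored value ends up relative to whichever genre comes last. The sketch below shows the intended calculation in plain Python; the rows and the column name rating_count_tot are assumptions for illustration, not taken from the real data.

```python
from collections import defaultdict

# Made-up rows standing in for ios_mod2_apps;
# 'rating_count_tot' is an assumed column name
rows = [
    {'track_name': 'App A', 'prime_genre': 'Games', 'rating_count_tot': 300},
    {'track_name': 'App B', 'prime_genre': 'Games', 'rating_count_tot': 100},
    {'track_name': 'App C', 'prime_genre': 'Music', 'rating_count_tot': 50},
]

# First pass: total rating count per genre
genre_totals = defaultdict(int)
for row in rows:
    genre_totals[row['prime_genre']] += row['rating_count_tot']

# Second pass: each app's share of its own genre's total
app_rating_percentage = {}
for row in rows:
    total = genre_totals[row['prime_genre']]
    app_rating_percentage[row['track_name']] = row['rating_count_tot'] / total

print(app_rating_percentage)
```

Two single passes replace the nested loop, so each app is only ever compared with the genre it belongs to.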
I am trying to figure out the most efficient way of finding values similar to a specific cell in a specified column (not all columns) of an Excel .xlsx document. The code I have currently assumes the strings are unsorted. However, the file I am using, and the files I will be using, all have strings sorted from A to Z. So instead of a linear search, I wonder what other search algorithm I could use (e.g. binary search), and how to fix my code.
So far I have created a function, find(). Before the function runs, the program takes a value from the user's input and uses it as the sheet name. I print all available sheet names in the Excel document to help the user. I created an empty list, results, to store the results. I created a for loop that iterates through column A only, because I only want to iterate through a chosen column.
I created a variable called start that holds the coordinate of the current cell in column A (e.g. A1 or A400); it changes with each iteration of the loop. I created a variable called next that gets compared with start. next is technically just start + 1, but since I can't add 1 to a string, I concatenate and cast everything so that the iteration covers A1 to A100, or however many cells are in column A. My function getVal() is called with two parameters: the coordinate of the cell and the worksheet we are working from. The value returned from getVal() is passed to my function similar(), which just calls SequenceMatcher() from difflib and returns the percentage of how similar two strings are. E.g. similar("hello", "helloo") returns roughly 90. Once similar() is called, if the strings are more than 40 percent similar, the cell is appended to the results list.
from difflib import SequenceMatcher

# wb is assumed to be a workbook that was opened earlier (e.g. with openpyxl)

def setSheet(ws):
    sheet = wb[ws]
    return sheet

def getVal(coordinate, worksheet):
    value = worksheet[coordinate].value
    return value

def similar(first, second):
    percent = SequenceMatcher(None, first, second).ratio() * 100
    return percent

def find():
    column = "A"
    print("\n")
    print("These are all available sheets: ", wb.sheetnames)
    print("\n")
    name = input("What sheet are we working out of> ")
    results = []
    ws = setSheet(name)
    for i in range(1, ws.max_row):
        start = column + str(i)        # coordinate of the current cell, e.g. "A1"
        next = column + str(i + 1)     # coordinate of the cell below it
        if similar(getVal(start, ws), getVal(next, ws)) > 40:
            results.append(getVal(start, ws))
    return results
This is some nasty looking code so I do apologize in advance. The expected results should just be a list of strings that are "similar".
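Because the strings are sorted, two standard-library tools may help. This is a sketch under assumptions, not a drop-in replacement: a plain Python list stands in for the values of column A, and the helper names find_sorted and similar_neighbours are made up for illustration. bisect gives a binary search for an exact value, and comparing each string only with its neighbor takes a single pass.

```python
from bisect import bisect_left
from difflib import SequenceMatcher

def similar(first, second):
    """Return the similarity of two strings as a percentage (0 to 100)."""
    return SequenceMatcher(None, first, second).ratio() * 100

def find_sorted(values, target):
    """Binary search: index of target in a sorted list, or -1 if absent."""
    i = bisect_left(values, target)
    if i < len(values) and values[i] == target:
        return i
    return -1

def similar_neighbours(values, threshold=40):
    """Collect adjacent pairs whose similarity exceeds the threshold."""
    pairs = []
    for first, second in zip(values, values[1:]):
        if similar(first, second) > threshold:
            pairs.append((first, second))
    return pairs

# Made-up, already sorted stand-in for the values in column A
column_a = ['apple', 'apples', 'zebra']

print(find_sorted(column_a, 'apples'))
print(similar_neighbours(column_a))
```

Binary search only finds exact matches (or the position where a value would be inserted), so the similarity check still has to look at neighboring entries.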
I have a problem with creating a reportlab table containing elements from a list of tuples.
Having the input:
meta= [('#Instances (Test)', '250'), ('#Instances (Train)', '250')]
I intuitively thought of writing it that way:
for key, value in meta:
    data = [['Solver', '%s'%(solver_name)],
            ['%s'%(key), '%s'%(value)],
            ['%s'%(key), '%s'%(value)]]
meta_data = Table(data, colWidths=None, rowHeights=None, style=None, splitByRow=1,
                  repeatRows=0, repeatCols=0)
But it only considers the last tuple making ('#Instances (Train)', '250') appear in both rows.
Any ideas on what I did wrong?
You are only getting the last key and value from your input because every pass through the loop replaces the whole data variable. What you meant is probably this:
data = [['Solver', solver_name]]
for key, value in meta:
    data.append([key, value])
meta_data = Table(data, colWidths=None, rowHeights=None, style=None,
                  splitByRow=1, repeatRows=0, repeatCols=0)
In the above code, I initialize data as a list holding the 'Solver' row, then go through each tuple in meta, assigning tuple[0] to key and tuple[1] to value. The only thing done with those variables is appending them as a new row to the list initialized at the start.
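The row-building step can be checked on its own, without reportlab. Table takes a list of rows, where each row is a list of cell values, so the goal is one inner list per tuple. In this sketch solver_name is a hypothetical value, since the question does not show it:

```python
meta = [('#Instances (Test)', '250'), ('#Instances (Train)', '250')]
solver_name = 'example_solver'  # hypothetical value; not given in the question

# Start with the header row, then add one row per (key, value) tuple
data = [['Solver', solver_name]]
for key, value in meta:
    data.append([key, value])

for row in data:
    print(row)
```

The resulting data list has three rows of two cells each and can be passed to Table as is.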
I would like to build up a list using a for loop and am trying to use a slice notation. My desired output would be a list with the structure:
known_result[i] = (record.query_id, (align.title, align.title,align.title....))
However I am having trouble getting the slice operator to work:
knowns = "output.xml"
i=0
for record in NCBIXML.parse(open(knowns)):
    known_results[i] = record.query_id
    known_results[i][1] = (align.title for align in record.alignment)
    i+=1
which results in:
list assignment index out of range.
I am iterating through a series of sequences using BioPython's NCBIXML module but the problem is adding to the list. Does anyone have an idea on how to build up the desired list either by changing the use of the slice or through another method?
thanks zach cp
(crossposted at Biostar)
You cannot assign a value to a list at an index that doesn't exist. The way to add an element (at the end of the list, which is the common use case) is to use the .append method of the list.
In your case, the lines
known_results[i] = record.query_id
known_results[i][1] = (align.title for align in record.alignment)
Should probably be changed to
element=(record.query_id, tuple(align.title for align in record.alignment))
known_results.append(element)
Warning: The code above is untested, so might contain bugs. But the idea behind it should work.
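The append approach can be tried without BioPython. In this sketch the Record and Alignment namedtuples are hypothetical stand-ins for the objects that NCBIXML.parse yields, keeping only the attributes used in the question:

```python
from collections import namedtuple

# Hypothetical stand-ins for the parsed BLAST records and their alignments
Alignment = namedtuple('Alignment', ['title'])
Record = namedtuple('Record', ['query_id', 'alignment'])

records = [
    Record('query_1', [Alignment('title_a'), Alignment('title_b')]),
    Record('query_2', [Alignment('title_c')]),
]

known_results = []
for record in records:
    # tuple(...) consumes the generator so the titles are stored, not the generator
    element = (record.query_id, tuple(align.title for align in record.alignment))
    known_results.append(element)

print(known_results)
```

No index counter is needed, because append always adds at the end of the list.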
Use:
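known_results = []
for i, record in enumerate(NCBIXML.parse(open(knowns))):
    # use a list, not a tuple, so the second slot can be assigned afterwards
    known_results.append([record.query_id, None])
    known_results[i][1] = tuple(align.title for align in record.alignment)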
If I understand you correctly, you want to assign one or more matching align.title values to every record.query_id. So I guess your query_ids are unique and those unique ids are related to some titles. If so, I would suggest a dictionary instead of a list.
A dictionary consists of keys (e.g. record.query_id) and values (e.g. a list of align.title).
catalog = {}
for record in NCBIXML.parse(open(knowns)):
    catalog[record.query_id] = [align.title for align in record.alignment]
To access this catalog you could either iterate through:
for query_id in catalog:
    print(catalog[query_id])  # prints the title list for the current key
or you could access them directly if you know what you're looking for.
query_id = XYZ_Whatever
print(catalog[query_id])
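A self-contained version of the dictionary approach is sketched below. The records are hypothetical stand-ins built with SimpleNamespace instead of real NCBIXML output, and the query ids and titles are made up:

```python
from types import SimpleNamespace

# Hypothetical stand-ins for the records that NCBIXML.parse would yield
records = [
    SimpleNamespace(query_id='query_1',
                    alignment=[SimpleNamespace(title='title_a'),
                               SimpleNamespace(title='title_b')]),
    SimpleNamespace(query_id='query_2',
                    alignment=[SimpleNamespace(title='title_c')]),
]

catalog = {}
for record in records:
    catalog[record.query_id] = [align.title for align in record.alignment]

# Iterate over keys and values together
for query_id, titles in catalog.items():
    print(query_id, titles)

# .get avoids a KeyError when the id is not in the catalog
missing = catalog.get('query_3', [])
print(missing)
```

This only works as intended if the query ids are unique; a repeated id would replace the earlier list of titles.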