I have the following problem:
list1=['xyz','xyz2','other_randoms']
list2=['xyz']
I need to find which elements of list2 are in list1. In actual fact the elements of list1 correspond to a numerical value which I need to obtain then change. The problem is that 'xyz2' contains 'xyz' and therefore matches also with a regular expression.
My code so far (where 'data' is a python dictionary and 'specie_name_and_initial_values' is a list of lists where each sublist contains two elements, the first being specie name and the second being a numerical value that goes with it):
all_keys = list(data.keys())
for i in range(len(all_keys)):
if all_keys[i]!='Time':
#print all_keys[i]
pattern = re.compile(all_keys[i])
for j in range(len(specie_name_and_initial_values)):
print re.findall(pattern,specie_name_and_initial_values[j][0])
Variations of the regular expression I have tried include:
pattern = re.compile('^'+all_keys[i]+'$')
pattern = re.compile('^'+all_keys[i])
pattern = re.compile(all_keys[i]+'$')
And I've also tried using 'in' as a qualifier (i.e. within a for loop)
Any help would be greatly appreciated. Thanks
Ciaran
----------EDIT------------
To clarify. My current code is below. its used within a class/method like structure.
def calculate_relative_data_based_on_initial_values(self,copasi_file,xlsx_data_file,data_type='fold_change',time='seconds'):
copasi_tool = MineParamEstTools()
data=pandas.io.excel.read_excel(xlsx_data_file,header=0)
#uses custom class and method to get the list of lists from a file
specie_name_and_initial_values = copasi_tool.get_copasi_initial_values(copasi_file)
if time=='minutes':
data['Time']=data['Time']*60
elif time=='hour':
data['Time']=data['Time']*3600
elif time=='seconds':
print 'Time is already in seconds.'
else:
print 'Not a valid time unit'
all_keys = list(data.keys())
species=[]
for i in range(len(specie_name_and_initial_values)):
species.append(specie_name_and_initial_values[i][0])
for i in range(len(all_keys)):
for j in range(len(specie_name_and_initial_values)):
if all_keys[i] in species[j]:
print all_keys[i]
The table returned from pandas is accessed like a dictionary. I need to go to my data table, extract the headers (i.e. the all_keys bit), then look up the name of the header in the specie_name_and_initial_values variable and obtain the corresponding value (the second element within the specie_name_and_initial_value variable). After this, I multiply all values of my data table by the value obtained for each of the matched elements.
I'm most likely over complicating this. Do you have a better solution?
thanks
----------edit 2 ---------------
Okay, below are my variables
all_keys = set([u'Cyp26_G_R1', u'Cyp26_G_rep1', u'Time'])
species = set(['[Cyp26_R1R2_RARa]', '[Cyp26_SRC3_1]', '[18-OH-RA]', '[p38_a]', '[Cyp26_G_rep1]', '[Cyp26]', '[Cyp26_G_a]', '[SRC3_p]', '[mRARa]', '[np38_a]', '[mRARa_a]', '[RARa_pp_TFIIH]', '[RARa]', '[Cyp26_G_L2]', '[atRA]', '[atRA_c]', '[SRC3]', '[RARa_Ser369p]', '[p38]', '[Cyp26_mRNA]', '[Cyp26_G_L]', '[TFIIH]', '[Cyp26_SRC3_2]', '[Cyp26_G_R1R2]', '[MSK1]', '[MSK1_a]', '[Cyp26_G]', '[Basal_Kinases]', '[Cyp26_R1_RARa]', '[4-OH-RA]', '[Cyp26_G_rep2]', '[Cyp26_Chromatin]', '[Cyp26_G_R1]', '[RXR]', '[SMRT]'])
You don't need a regex to find common elements, set.intersection will find all elements in list2 that are also in list1:
list1=['xyz','xyz2','other_randoms']
list2=['xyz']
print(set(list2).intersection(list1))
set(['xyz'])
Also if you wanted to compare 'xyz' to 'xyz2' you would use == not in and then it would correctly return False.
You can also rewrite your own code a lot more succinctly, :
for key in data:
if key != 'Time':
pattern = re.compile(val)
for name, _ in specie_name_and_initial_values:
print re.findall(pattern, name)
Based on your edit you have somehow managed to turn lists into strings, one option is to strip the []:
all_keys = set([u'Cyp26_G_R1', u'Cyp26_G_rep1', u'Time'])
specie_name_and_initial_values = set(['[Cyp26_R1R2_RARa]', '[Cyp26_SRC3_1]', '[18-OH-RA]', '[p38_a]', '[Cyp26_G_rep1]', '[Cyp26]', '[Cyp26_G_a]', '[SRC3_p]', '[mRARa]', '[np38_a]', '[mRARa_a]', '[RARa_pp_TFIIH]', '[RARa]', '[Cyp26_G_L2]', '[atRA]', '[atRA_c]', '[SRC3]', '[RARa_Ser369p]', '[p38]', '[Cyp26_mRNA]', '[Cyp26_G_L]', '[TFIIH]', '[Cyp26_SRC3_2]', '[Cyp26_G_R1R2]', '[MSK1]', '[MSK1_a]', '[Cyp26_G]', '[Basal_Kinases]', '[Cyp26_R1_RARa]', '[4-OH-RA]', '[Cyp26_G_rep2]', '[Cyp26_Chromatin]', '[Cyp26_G_R1]', '[RXR]', '[SMRT]'])
specie_name_and_initial_values = set(s.strip("[]") for s in specie_name_and_initial_values)
print(all_keys.intersection(specie_name_and_initial_values))
Which outputs:
set([u'Cyp26_G_R1', u'Cyp26_G_rep1'])
FYI, if you had lists inside the set you would have gotten an error as lists are mutable so are not hashable.
i have a query which returns me a Viewobject with all the entries i want to process. I know i can iterate over this view Object so that i can use the single entries for my purposes.
Now i want to extract only the first and the last row. The first row is no problem because i can just iterate and break the loop after the first item.
Now my question is, how to get the last element from the View.
I tried by:
for row in result_rows:
rowvalue = row[3].value
diagdata = rowvalue[models.DIAGDATA]
if models.ODOMETER in diagdata:
start_mileage = diagdata[models.ODOMETER]
start_mileage_found = True
break
row = result_rows[len(result_rows)]
rowvalue = row[3].value
diagdata = rowvalue[models.DIAGDATA]
if models.ODOMETER in diagdata:
end_mileage = diagdata[models.ODOMETER]
end_mileage_found = True
The second value i obviously wont get, because view has neither a length nor can i access the rows by a index. Has anyone an idea how to get the last element?
You might run another request but with descending=True option, so that the server will stream results in reverse order.
Or you can convert iterator to array which basically the same a iterate through all values. I'm not a python expert, but it seems like list(result_rows) will do it for you. And when you are doing len(...) it probably doing it for you implicitly. There is rows_returned method to get the number of rows without turning it to list.
I need to search a feature class for multiple text entries and display all the values and their object ids that aren't on the list, i.e. find all the mistakes. (Basically, want to search for text entries such as AVE, TRL, ST, and display entries that aren't formatted like that). I want to write it in python.
Can I use the searchCursor to do this, or is it something more complicated.
Any help would be appreciated, Thanks! I think this is the solution, but it is still printing AVE. Any idea as to why?
import arcpy
fc = "Z:\Street_Centerlines"
field = "StSuffix"
field1 = "OBJECTID"
cursor = arcpy.SearchCursor(fc)
for row in cursor:
if field == "AVE":
pass
else:
print(row.getValue(field1)); print(row.getValue(field))
The field variable is equal to "StSuffix", so field == "AVE" is always false. I think you want this:
valid_values = 'AVE', 'TRL', 'ST'
for row in cursor:
value = row.getValue(field)
if value in valid_values:
continue
print("Invalid value: OBJECTID={}, StSuffix={}".format(
row.getValue(field1),
value
))
I am working on python plugins.I used PyQt4 Designer.
I want to list query result into QTreeWidget.
My code is as follows:
c = self.db.con.cursor()
self.db._exec_sql(c, "select est from bio")
for row in c.fetchall():
item_value=unicode(row[0])
top_node1 = QTreeWidgetItem(item_value)
self.treeWidget.insertTopLevelItem(0, top_node1)
The query returns the values as:
But when i list these values into QTreeWidget using above code,it is shown as below :
Only first character is shown.If i change '0' to some other number in self.treeWidget.insertTopLevelItem(0, top_node1) ,nothing appears in QTreeWidget.
How do i do it????
thanx.
If you take a look at the documentation for a QTreeWidgetItem, you will see there are a number of possible constructors for creating an instance. Though none of which it seems you are using in a way that is going to give you desirable results. The closest match to the signature you are providing is:
QTreeWidgetItem ( const QStringList & strings, int type = Type )
What this is probably doing is taking your string (I am assuming row[0] is a string because I don't know which drivers you are using) and applying it as a sequence, which would fullfill the requiremets of QStringList. Thus what you are getting is populating multiple columns of your item with each letter of your string value. If this is what you wanted, then you would n eed to tell your widget to show more columns: self.treeWidget.setColumnCount(10). But this isn't what you are looking for I am sure.
More likely what you should be trying is to create a new item, then add the value to the desired column:
item = QTreeWidgetItem()
item.setText(0, unicode(row[0]))
self.treeWidget.insertTopLevelItem(0, item)
You can use the default constructor with no arguments, set the text value of the first column to your database record field value, and then add that item to the tree. You could also build up a list of the items and add them at once:
items = []
for row in c.fetchall():
item = QTreeWidgetItem()
item.setText(0, unicode(row[0]))
items.append(item)
self.treeWidget.insertTopLevelItems(0, items)
Your first aproach could be corrected just add a list to the widgetitem not a string like this:
top_node1 = QTreeWidgetItem([item_value])
I would like to build up a list using a for loop and am trying to use a slice notation. My desired output would be a list with the structure:
known_result[i] = (record.query_id, (align.title, align.title,align.title....))
However I am having trouble getting the slice operator to work:
knowns = "output.xml"
i=0
for record in NCBIXML.parse(open(knowns)):
known_results[i] = record.query_id
known_results[i][1] = (align.title for align in record.alignment)
i+=1
which results in:
list assignment index out of range.
I am iterating through a series of sequences using BioPython's NCBIXML module but the problem is adding to the list. Does anyone have an idea on how to build up the desired list either by changing the use of the slice or through another method?
thanks zach cp
(crossposted at [Biostar])1
You cannot assign a value to a list at an index that doesn't exist. The way to add an element (at the end of the list, which is the common use case) is to use the .append method of the list.
In your case, the lines
known_results[i] = record.query_id
known_results[i][1] = (align.title for align in record.alignment)
Should probably be changed to
element=(record.query_id, tuple(align.title for align in record.alignment))
known_results.append(element)
Warning: The code above is untested, so might contain bugs. But the idea behind it should work.
Use:
for record in NCBIXML.parse(open(knowns)):
known_results[i] = (record.query_id, None)
known_results[i][1] = (align.title for align in record.alignment)
i+=1
If i get you right you want to assign every record.query_id one or more matching align.title. So i guess your query_ids are unique and those unique ids are related to some titles. If so, i would suggest a dictionary instead of a list.
A dictionary consists of a key (e.g. record.quer_id) and value(s) (e.g. a list of align.title)
catalog = {}
for record in NCBIXML.parse(open(knowns)):
catalog[record.query_id] = [align.title for align in record.alignment]
To access this catalog you could either iterate through:
for query_id in catalog:
print catalog[query_id] # returns the title-list for the actual key
or you could access them directly if you know what your looking for.
query_id = XYZ_Whatever
print catalog[query_id]