Formatted string in list - python

What is the solution, or is there another way to do this?
I wrote a list of strings as shown below. I wanted to format each question so that it includes its options, then print it by its index (taking input from the user), but the terminal threw an error saying that 0 or 1 is not present in the list.
options = ["a) 31\n b) 32\n c) 33\n d)34", "a) Energy\n b) Motion\n c) Motion and Energy\n d)Nothing"]
questions = [f"Our brain is consists of ..... bones:\n{options.index(1)}",
f"Physics is the study of .......\n{options.index(2)}"]
q_1 = input(questions[0])

options.index(1) searches the list for the value 1 and returns its position. This is not what you seem to want.
You seem to want to get the first element of options, which is done with options[0] instead.
It is 0 instead of 1, because Python lists are indexed starting from 0, not 1.
You are doing this correctly when you index into questions.
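A minimal sketch contrasting the two operations on the options list from the question (Python 3). The exact wording of the ValueError message is an implementation detail, so it is only printed here, not relied upon:

```python
options = ["a) 31\n b) 32\n c) 33\n d)34", "a) Energy\n b) Motion\n c) Motion and Energy\n d)Nothing"]

# Indexing with [] returns the element stored at a position (positions start at 0).
first_option_block = options[0]

# list.index(value) does the reverse: it returns the position of a value.
position_of_first = options.index(first_option_block)

# Searching for a value that is not in the list raises ValueError,
# which is what happens with options.index(1).
try:
    options.index(1)
    lookup_error = None
except ValueError as exc:
    lookup_error = str(exc)

print(position_of_first)  # 0
print(lookup_error)
```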

I have corrected your code by using options[0] and options[1] instead of options.index(1) and options.index(2):
options = ["a) 31\n b) 32\n c) 33\n d)34", "a) Energy\n b) Motion\n c) Motion and Energy\n d)Nothing"]
questions = [f"Our brain is consists of ..... bones:\n{options[0]}", f"Physics is the study of .......\n{options[1]}"]
q_1 = input(questions[0])
The index method searches the list for the value you provide: options.index(0) checks whether an element equal to 0 is present and returns its position (raising ValueError if it is not).
To get the element at index 0, use options[0].
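If the number of questions grows, one option (a sketch, not the only way) is to keep the question stems in their own list and pair them with their options using zip(), so no index has to be written by hand. The names question_stems and questions are illustrative, and input() is left out so the snippet runs non-interactively:

```python
question_stems = [
    "Our brain consists of ..... bones:",
    "Physics is the study of .......",
]
options = [
    "a) 31\n b) 32\n c) 33\n d)34",
    "a) Energy\n b) Motion\n c) Motion and Energy\n d)Nothing",
]

# zip() pairs the i-th stem with the i-th options string, so the two lists
# only need to stay in the same order.
questions = [f"{stem}\n{choices}" for stem, choices in zip(question_stems, options)]

# In the real program you would then ask each question in turn, e.g.:
# answers = [input(q) for q in questions]
print(questions[1])
```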

Related

Python ValueError: dictionary update sequence element #4 has length 3; 2 is required

I need to transform a tuple into a dictionary with its respective key->value pairs. The issue is that for one specific tuple I get the following error:
ValueError: dictionary update sequence element #4 has length 3; 2 is required.
But other tuples with the same format are transformed without problems. Could someone point me to the reason for the error?
In the attached code, the tupla1 value works fine, but the tupla value gives the above error.
tupla = ['.1.3.6.1.4.1.35873.5.1.2.1.1.1.1="314"', '.1.3.6.1.4.1.35873.5.1.2.1.1.1.2="10943"', '.1.3.6.1.4.1.35873.5.1.2.1.1.1.3="RTU : otu-8000e-Comtec (172.17.74.133)..Alarm type: OPTICAL..Timestamp: Jan 15 2022 - 08:31..Severity: CLEAR..Link name: PROV-21-82-83-84 (PRI) RUTA 7 (PROV) - Port 2..Probable cause:"', '.1.3.6.1.4.1.35873.5.1.2.1.1.1.5="1"', '.1.3.6.1.4.1.35873.5.1.2.1.1.1.4="port=2"', '.1.3.6.1.4.1.35873.5.1.2.1.1.1.6="1"', '.1.3.6.1.4.1.35873.5.1.2.1.1.1.7="1"', '.1.3.6.1.4.1.35873.5.1.2.1.1.1.8="0x07e6010f081f1400"', '.1.3.6.1.4.1.35873.5.1.2.1.1.1.9="otu-8000e-Comtec (172.17.74.133)"', '.1.3.6.1.4.1.35873.5.1.2.1.1.1.10.1="PROV-21-82-83-84 (PRI) RUTA 7 (PROV)"', '.1.3.6.1.4.1.35873.5.1.2.1.1.1.10.2="0"', '.1.3.6.1.4.1.35873.5.1.2.1.1.1.10.3="0.18"', '.1.3.6.1.4.1.35873.5.1.2.1.1.1.10.4=""', '.1.3.6.1.4.1.35873.5.1.2.1.1.1.10.5=""', '.1.3.6.1.4.1.35873.5.1.2.1.1.1.10.6=""', '.1.3.6.1.4.1.35873.5.1.2.1.1.1.11.1=""', '.1.3.6.1.4.1.35873.5.1.2.1.1.1.11.2=""', '.1.3.6.1.4.1.35873.5.1.2.1.1.1.11.3=""', '.1.3.6.1.4.1.35873.5.1.2.1.1.1.11.4=""', '.1.3.6.1.4.1.35873.5.1.2.1.1.1.11.5=""', '.1.3.6.1.4.1.35873.5.1.2.1.1.1.11.6=""', '.1.3.6.1.4.1.35873.5.1.2.1.1.1.11.7=""']
tupla1 = ['.1.3.6.1.4.1.4100.2.2.1.2.1.101.1.1.3701361="3701361"', '.1.3.6.1.4.1.4100.2.2.1.2.1.101.1.2.3701361="CRITICAL"', '.1.3.6.1.4.1.4100.2.2.1.2.1.101.1.3.3701361="CRITICAL"', '.1.3.6.1.4.1.4100.2.2.1.2.1.101.1.4.3701361="VALE-078-001"', '.1.3.6.1.4.1.4100.2.2.1.2.1.101.1.5.3701361="Microreflection Threshold 1 Violation"', '.1.3.6.1.4.1.4100.2.2.1.2.1.101.1.6.3701361="2021-09-02T19:14:04.834Z"', '.1.3.6.1.4.1.4100.2.2.1.2.1.101.1.7.3701361="0"', '.1.3.6.1.4.1.4100.2.2.1.2.1.101.1.8.3701361="1333972"', '.1.3.6.1.4.1.4100.2.2.1.2.1.101.1.9.3701361="http://SRVXPTPRODSTG01.vtr.cl/pathtrak/analysis/view.html#/node/1333972"', '.1.3.6.1.4.1.4100.2.2.1.2.1.101.1.10.3701361="7"', '.1.3.6.1.4.1.4100.2.2.1.2.1.102.1.2.3701361.0="28400000"', '.1.3.6.1.4.1.4100.2.2.1.2.1.102.1.3.3701361.0="0"', '.1.3.6.1.4.1.4100.2.2.1.2.1.102.1.4.3701361.0="HOLA"', '.1.3.6.1.4.1.4100.2.2.1.2.1.102.1.2.3701361.0="30800000"', '.1.3.6.1.4.1.4100.2.2.1.2.1.102.1.3.3701361.0="7"', '.1.3.6.1.4.1.4100.2.2.1.2.1.102.1.4.3701361.0="CRITICAL"', '.1.3.6.1.4.1.4100.2.2.1.2.1.102.1.2.3701361.0="40700000"', '.1.3.6.1.4.1.4100.2.2.1.2.1.102.1.3.3701361.0="0"', '.1.3.6.1.4.1.4100.2.2.1.2.1.102.1.4.3701361.0="NONE"', '.1.3.6.1.4.1.4100.2.2.1.2.1.102.1.2.3701361.0="35600000"', '.1.3.6.1.4.1.4100.2.2.1.2.1.102.1.3.3701361.0="2"', '.1.3.6.1.4.1.4100.2.2.1.2.1.102.1.4.3701361.0="CRITICAL"']
miDiccionarioTupla= dict([(tupla[x].split('"')[0]+tupla[x].split('"')[1]).split('=') for x in range(len(tupla))])
print(miDiccionarioTupla)
#miDiccionarioTupla1= dict([(tupla1[x].split('"')[0]+tupla1[x].split('"')[1]).split('=') for x in range(len(tupla1))])
#print(miDiccionarioTupla1)
The problem is the fifth item in tupla:
'.1.3.6.1.4.1.35873.5.1.2.1.1.1.4="port=2"'
That line contains two equal signs, so the final .split('=') produces three values instead of the key/value pair that dict() needs.
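To see exactly why dict() complains, the question's splitting expression can be applied to that single item on its own. A small sketch (variable names are illustrative):

```python
item = '.1.3.6.1.4.1.35873.5.1.2.1.1.1.4="port=2"'

# Same expression as in the question: glue the text before the first quote
# to the text inside the quotes, then split on every '='.
pieces = (item.split('"')[0] + item.split('"')[1]).split('=')
print(pieces)  # ['.1.3.6.1.4.1.35873.5.1.2.1.1.1.4', 'port', '2']

# dict() needs each element to have exactly two parts (key, value).
try:
    dict([pieces])
    error_text = None
except ValueError as exc:
    error_text = str(exc)
print(error_text)
```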
As noted by John Gordon, the data has an extraneous "=" in one of the rows.
I am not one hundred percent sure what you are hoping to achieve with your code, but I have a potential solution that might help to deal with the extraneous equals sign. The code may also be a bit easier to read:
tupla = ['.1.3.6.1.4.1.35873.5.1.2.1.1.1.1="314"',
'.1.3.6.1.4.1.35873.5.1.2.1.1.1.2="10943"',
'.1.3.6.1.4.1.35873.5.1.2.1.1.1.3="RTU : otu-8000e-Comtec (172.17.74.133)..Alarm type: OPTICAL..Timestamp: Jan 15 2022 - 08:31..Severity: CLEAR..Link name: PROV-21-82-83-84 (PRI) RUTA 7 (PROV) - Port 2..Probable cause:"',
'.1.3.6.1.4.1.35873.5.1.2.1.1.1.5="1"',
'.1.3.6.1.4.1.35873.5.1.2.1.1.1.4="port=2"',
'.1.3.6.1.4.1.35873.5.1.2.1.1.1.6="1"',
'.1.3.6.1.4.1.35873.5.1.2.1.1.1.7="1"',
'.1.3.6.1.4.1.35873.5.1.2.1.1.1.8="0x07e6010f081f1400"',
'.1.3.6.1.4.1.35873.5.1.2.1.1.1.9="otu-8000e-Comtec (172.17.74.133)"',
'.1.3.6.1.4.1.35873.5.1.2.1.1.1.10.1="PROV-21-82-83-84 (PRI) RUTA 7 (PROV)"',
'.1.3.6.1.4.1.35873.5.1.2.1.1.1.10.2="0"',
'.1.3.6.1.4.1.35873.5.1.2.1.1.1.10.3="0.18"',
'.1.3.6.1.4.1.35873.5.1.2.1.1.1.10.4=""',
'.1.3.6.1.4.1.35873.5.1.2.1.1.1.10.5=""',
'.1.3.6.1.4.1.35873.5.1.2.1.1.1.10.6=""',
'.1.3.6.1.4.1.35873.5.1.2.1.1.1.11.1=""',
'.1.3.6.1.4.1.35873.5.1.2.1.1.1.11.2=""',
'.1.3.6.1.4.1.35873.5.1.2.1.1.1.11.3=""',
'.1.3.6.1.4.1.35873.5.1.2.1.1.1.11.4=""',
'.1.3.6.1.4.1.35873.5.1.2.1.1.1.11.5=""',
'.1.3.6.1.4.1.35873.5.1.2.1.1.1.11.6=""',
'.1.3.6.1.4.1.35873.5.1.2.1.1.1.11.7=""']
tupla1 = ['.1.3.6.1.4.1.4100.2.2.1.2.1.101.1.1.3701361="3701361"',
'.1.3.6.1.4.1.4100.2.2.1.2.1.101.1.2.3701361="CRITICAL"',
'.1.3.6.1.4.1.4100.2.2.1.2.1.101.1.3.3701361="CRITICAL"',
'.1.3.6.1.4.1.4100.2.2.1.2.1.101.1.4.3701361="VALE-078-001"',
'.1.3.6.1.4.1.4100.2.2.1.2.1.101.1.5.3701361="Microreflection Threshold 1 Violation"',
'.1.3.6.1.4.1.4100.2.2.1.2.1.101.1.6.3701361="2021-09-02T19:14:04.834Z"',
'.1.3.6.1.4.1.4100.2.2.1.2.1.101.1.7.3701361="0"',
'.1.3.6.1.4.1.4100.2.2.1.2.1.101.1.8.3701361="1333972"',
'.1.3.6.1.4.1.4100.2.2.1.2.1.101.1.9.3701361="http://SRVXPTPRODSTG01.vtr.cl/pathtrak/analysis/view.html#/node/1333972"',
'.1.3.6.1.4.1.4100.2.2.1.2.1.101.1.10.3701361="7"',
'.1.3.6.1.4.1.4100.2.2.1.2.1.102.1.2.3701361.0="28400000"',
'.1.3.6.1.4.1.4100.2.2.1.2.1.102.1.3.3701361.0="0"',
'.1.3.6.1.4.1.4100.2.2.1.2.1.102.1.4.3701361.0="HOLA"',
'.1.3.6.1.4.1.4100.2.2.1.2.1.102.1.2.3701361.0="30800000"',
'.1.3.6.1.4.1.4100.2.2.1.2.1.102.1.3.3701361.0="7"',
'.1.3.6.1.4.1.4100.2.2.1.2.1.102.1.4.3701361.0="CRITICAL"',
'.1.3.6.1.4.1.4100.2.2.1.2.1.102.1.2.3701361.0="40700000"',
'.1.3.6.1.4.1.4100.2.2.1.2.1.102.1.3.3701361.0="0"',
'.1.3.6.1.4.1.4100.2.2.1.2.1.102.1.4.3701361.0="NONE"',
'.1.3.6.1.4.1.4100.2.2.1.2.1.102.1.2.3701361.0="35600000"',
'.1.3.6.1.4.1.4100.2.2.1.2.1.102.1.3.3701361.0="2"',
'.1.3.6.1.4.1.4100.2.2.1.2.1.102.1.4.3701361.0="CRITICAL"']
In Python, it is not necessary to use the length of the list and reference each item by its index (using [x]). We can simply iterate over the list directly with a for loop (for item in tupla):
miDiccionarioTupla = dict()
for item in tupla:
    # We can split strings on a character (such as =) AND, with this data,
    # limit how many splits are made using the maxsplit parameter, so any
    # further '=' (as in "port=2") stays inside the value.
    key, value = item.split('=', maxsplit=1)
    # I am presuming you want to eliminate the extra quotes in the values
    # (i.e. "CRITICAL" becomes CRITICAL), so replace every " with an empty string.
    value = value.replace('"', '')
    miDiccionarioTupla.update({key: value})
print(miDiccionarioTupla)
miDiccionarioTupla1 = dict()
for item in tupla1:
    key, value = item.split('=', 1)
    value = value.replace('"', '')
    miDiccionarioTupla1.update({key: value})
print(miDiccionarioTupla1)
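One more thing worth checking in the data: several OIDs in tupla1 appear more than once (for example '.1.3.6.1.4.1.4100.2.2.1.2.1.102.1.2.3701361.0' occurs four times with different values), so a plain dict silently keeps only the last value for each of them. If every value matters, here is a sketch that collects repeated keys into lists with collections.defaultdict, using the same split('=', 1) idea; the function name is illustrative:

```python
from collections import defaultdict


def group_oid_values(items):
    """Map each OID to a list of all its values, in the order they appear."""
    grouped = defaultdict(list)
    for item in items:
        # Split only on the first '=', so values such as "port=2" stay intact.
        key, value = item.split('=', 1)
        # Remove the surrounding double quotes.
        grouped[key].append(value.strip('"'))
    return dict(grouped)


sample = [
    '.1.3.6.1.4.1.4100.2.2.1.2.1.102.1.2.3701361.0="28400000"',
    '.1.3.6.1.4.1.4100.2.2.1.2.1.102.1.4.3701361.0="HOLA"',
    '.1.3.6.1.4.1.4100.2.2.1.2.1.102.1.2.3701361.0="30800000"',
    '.1.3.6.1.4.1.35873.5.1.2.1.1.1.4="port=2"',
]
grouped = group_oid_values(sample)
print(grouped['.1.3.6.1.4.1.4100.2.2.1.2.1.102.1.2.3701361.0'])  # ['28400000', '30800000']
```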

Push data to google sheet from dataframe

I'm trying to push data into my Google Sheet with the following code. How can I change the code so that it writes starting in the 2nd row, in the correct column based on the header that I've created?
First code:
class Header:
    def __init__(self):
        self.No_DOB_Y=1
        self.No_DOB_M=2
        self.No_DOB_D=3
        self.Paid_too_much_little=4
        self.No_number_of_ins=5
        self.No_gender=6
        self.No_first_login=7
        self.No_last_login=8
        self.Too_young_old=9
    def __repr__(self):
        return str(self.__dict__)
    def add_col(self,name):
        # Assign the next free column number after the highest one in use
        setattr(self,name,max(self.__dict__.values())+1)
anomali_header=Header()
2nd part of code (NEW):
# No_gender
a = list(df.loc[df['gender'].isnull()]['id'])
#print(a)
cells=sh3.range(1,1,len(a),1)
for i,cell in enumerate(cells):
    cell.value=a[i]
sh3.update_cells(cells)
At the moment, it writes starting at cell A1.
As you can see, the code writes the results into the first available cell, A1. I want them to appear below the "No_gender" column of my anomali_header instead, but I'm not sure how to link the first part of the code to the second part.
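One way to link the two parts is to read the column number from anomali_header by attribute name with getattr() and build the target range from it. A minimal sketch in plain Python: the helper names column_letter and anomaly_range are hypothetical, the header values are copied from the Header class above, and it assumes the header labels sit in row 1 so the data starts in row 2. The resulting A1-style string is the same kind of range string that can be passed to sh3.range():

```python
class Header:
    """Column numbers of each anomaly check (values copied from the question)."""

    def __init__(self):
        self.No_DOB_Y = 1
        self.No_DOB_M = 2
        self.No_DOB_D = 3
        self.Paid_too_much_little = 4
        self.No_number_of_ins = 5
        self.No_gender = 6
        self.No_first_login = 7
        self.No_last_login = 8
        self.Too_young_old = 9


def column_letter(col):
    """Convert a 1-based column number to spreadsheet letters (1 -> A, 27 -> AA)."""
    letters = ""
    while col > 0:
        col, rem = divmod(col - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def anomaly_range(header, name, n_values, first_row=2):
    """Hypothetical helper: A1 range under the header column called `name`."""
    letter = column_letter(getattr(header, name))
    last_row = first_row + n_values - 1
    return f"{letter}{first_row}:{letter}{last_row}"


anomali_header = Header()
print(anomaly_range(anomali_header, "No_gender", 3))  # F2:F4
```

With gspread this might be used as cells = sh3.range(anomaly_range(anomali_header, 'No_gender', len(a))). Skip the write when the list is empty, since the range would then be inverted.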
Thanks to v25, the code now works, but rather than repeating it for each check one by one, I want to create a loop that goes through all of the checks.
I'm trying to run the code below, but it seems I get an error when I use the loop.
Error:
TypeError: 'list' object cannot be interpreted as an integer
Code:
# No_DOB_Y
a = list(df.loc[df['Year'].isnull()]['id'])
# No number of ins
b = list(df.loc[df['number of ins'].isnull()]['id'])
# No_gender
c = list(df.loc[df['gender'].isnull()]['id'])
# Updating anomalies to sheet
condition = [a,b,c]
column = [1,2,3]
for j in range(column,condition):
    cells=sh3.range(2,column,len(condition)+1,column)
    for i,cell in enumerate(cells):
        cell.value=condition[i]
    print('end of check')
    sh3.update_cells(cells)
You need to change the range() parameters:
first_row (int) – Row number
first_col (int) – Column number
last_row (int) – Row number
last_col (int) – Column number
So something like:
cells=sh3.range(2, 6, len(a)+1, 6)
Or you could pass the range as an A1-notation string:
cells=sh3.range('F2:F' + str(len(a)+1))
These numbers may not be perfect, but this should change the positioning. You might need to tweak the digits slightly ;)
UPDATE:
I've encountered an error when using a loop; I've updated my original post:
TypeError: 'list' object cannot be interpreted as an integer
This is happening because the built-in range function used in your for loop (not to be confused with sh3.range, which is a different function altogether) expects integers, but you're passing it lists.
However, a simpler way to implement this would be to create a list of tuples that map the DataFrame column names to sheet column numbers, then loop over it. Something like:
col_map = [('Year', 1),
           ('number of ins', 5),
           ('gender', 6)]
for col_name, col_num in col_map:
    df_list = list(df.loc[df[col_name].isnull()]['id'])
    cells = sh3.range(2, col_num, len(df_list)+1, col_num)
    for i, cell in enumerate(cells):
        cell.value = df_list[i]
    sh3.update_cells(cells)
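The original loop can also be fixed directly: instead of range(column, condition), zip() walks the two lists side by side. A sketch that separates computing the target ranges (plain Python, testable without a spreadsheet) from writing them; plan_updates and push_updates are hypothetical names, the column numbers 1, 5 and 6 are taken from the Header class, and push_updates only relies on the worksheet range(first_row, first_col, last_row, last_col) and update_cells() calls already used in this thread:

```python
def plan_updates(conditions, columns, first_row=2):
    """Pair each list of ids with its sheet column and compute the cell bounds.

    Returns a list of ((first_row, col, last_row, col), ids) tuples. Empty
    lists are skipped because they would produce an inverted range.
    """
    plans = []
    for ids, col in zip(conditions, columns):
        if not ids:
            continue
        last_row = first_row + len(ids) - 1
        plans.append(((first_row, col, last_row, col), ids))
    return plans


def push_updates(worksheet, plans):
    """Write each planned column with one batch update per column."""
    for bounds, ids in plans:
        cells = worksheet.range(*bounds)
        for cell, value in zip(cells, ids):
            cell.value = value
        worksheet.update_cells(cells)


# With the lists from the question this would be, e.g.:
# push_updates(sh3, plan_updates([a, b, c], [1, 5, 6]))
print(plan_updates([[101, 102], [], [7]], [1, 5, 6]))
# [((2, 1, 3, 1), [101, 102]), ((2, 6, 2, 6), [7])]
```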

How to control all combination subsets against user input using itertools in Python 2.7

I want to write code that gets all 5-element combinations from 5 user input sets, where each output subset shares at most 3 elements (<= 3) with any one of the input sets.
Example:
import itertools

userInput1=('a','b','c','d','e')
userInput2=('c','d','e','f','g')
userInput3=('f','g','h','i','j')
userInput4=('g','h','i','j','k')
userInput5=('k','l','m','n','o')
# Turn 5 lists into 1 large list with no duplicates
allEntries = list(set(userInput1 + userInput2 + userInput3 + userInput4 + userInput5 ))
# Generate all possible list combinations
allCombinations = list(itertools.combinations( allEntries,5))
print "All combinations:"
for subset in allCombinations:
    # ????????? (what check goes here?)
    print subset
How do I do this check to limit the overlap? For instance, (g,i,j,k,o) fails because it shares 4 elements with userInput4.
Examples of valid combinations:
(a,c,j,l,o)
(k,b,a,m,n)
This isn't a simple solution with itertools alone. However, you do have the correct start. Now, check each combination as you produce it:
check_set = [
    set(userInput1),
    set(userInput2),
    set(userInput3),
    set(userInput4),
    set(userInput5)
]
for five in itertools.combinations( allEntries,5):
    five_set = set(five)
    # If there are no overlaps of more than 3 elements,
    # accept the solution.
    if not any(len(five_set.intersection(user_set)) > 3
               for user_set in check_set):
        print five
        # ... or whatever you do to save the good combination.
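For reference, a Python 3 sketch of the same filter (the answer above targets Python 2.7). It uses the letters from the question as strings, builds the pool with sorted() so the output order is stable, and keeps a combination only if it shares at most max_shared elements with every input set; the function name is illustrative:

```python
from itertools import combinations


def limited_overlap_combinations(input_sets, size=5, max_shared=3):
    """Yield size-element combinations that share at most max_shared
    elements with each of the input sets."""
    check_sets = [set(s) for s in input_sets]
    all_entries = sorted(set().union(*check_sets))
    for combo in combinations(all_entries, size):
        combo_set = set(combo)
        if all(len(combo_set & s) <= max_shared for s in check_sets):
            yield combo


user_inputs = [
    ("a", "b", "c", "d", "e"),
    ("c", "d", "e", "f", "g"),
    ("f", "g", "h", "i", "j"),
    ("g", "h", "i", "j", "k"),
    ("k", "l", "m", "n", "o"),
]
valid = list(limited_overlap_combinations(user_inputs))
print(len(valid))
print(("g", "i", "j", "k", "o") in valid)  # False: shares g, i, j, k with the 4th set
```

Using all(... <= max_shared) is the same test as the answer's not any(... > 3), just stated positively.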

Finding exon/ intron borders in a gene

I would like to go through a gene and get a list of 10 bp long sequences spanning the exon/intron borders of each feature with feature.type == 'mRNA'. It seems like I need to use CompoundLocation and the locations used in 'join', but I cannot figure out how to do it or find a tutorial.
Could anyone please give me an example or point me to a tutorial?
Assuming all the information is in the exact format you show in your comment, and that you're looking for 20 bp on either side of each intron/exon boundary, something like this might be a start:
Edit: If you're actually starting from a GenBank record, then it's not much harder. Assuming that the full junction string you're looking for is in the CDS feature info, then:
for f in record.features:
    if f.type == 'CDS':
        jct_info = str(f.location)
converts the "location" information into a string and you can continue as below.
(There are ways to work directly with the location information without converting to a string - in particular you can use "extract" to pull the spliced sequence directly out of the parent sequence -- but the steps involved in what you want to do are faster and more easily done by converting to str and then int.)
import re
jct_info = "join{[0:229](+), [11680:11768](+), [11871:12135](+), [15277:15339](+), [16136:16416](+), [17220:17471](+), [17547:17671](+)"
# Raw string, so the backslashes reach the regex engine unchanged
jctP = re.compile(r"\[\d+:\d+\]")
jcts = jctP.findall(jct_info)
print(jcts)
# ['[0:229]', '[11680:11768]', '[11871:12135]', '[15277:15339]', '[16136:16416]', '[17220:17471]', '[17547:17671]']
Now you can loop through the list of start:end values, pull them out of the text and convert them to ints so that you can use them as sequence indexes. Something like this:
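for jct in jcts:
    (start, end) = jct.replace('[', '').replace(']', '').split(':')
    start, end = int(start), int(end)
    # seq is the parent sequence. Slicing never raises IndexError: a negative
    # start (e.g. start = 0 gives -20) would be counted from the end of seq and
    # give an empty or wrong slice, so clamp the lower bound at 0 instead.
    start_20_20 = seq[max(start - 20, 0):start + 20]
    end_20_20 = seq[max(end - 20, 0):end + 20]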
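Putting those steps together, here is a sketch that returns a window around every internal exon/intron boundary. It assumes 0-based, end-exclusive coordinates on the plus strand (minus-strand features would also need reverse-complementing), and the function name junction_windows is illustrative; flank=5 gives the 10 bp windows asked for in the question. If you are working from the Biopython location object itself, its parts attribute may let you skip the string parsing, but this sketch sticks to the string format shown above:

```python
import re

# Matches "[start:end]" blocks in a location string such as "join{[0:229](+), ...".
BLOCK_RE = re.compile(r"\[(\d+):(\d+)\]")


def junction_windows(location_str, seq, flank=20):
    """Return (boundary_position, window) pairs for every internal exon/intron boundary.

    Assumes 0-based, end-exclusive coordinates on the plus strand.
    """
    exons = [(int(start), int(end)) for start, end in BLOCK_RE.findall(location_str)]
    windows = []
    # Internal boundaries are each exon's end (exon -> intron) and the
    # following exon's start (intron -> exon).
    for (_, end), (next_start, _) in zip(exons, exons[1:]):
        for pos in (end, next_start):
            lo = max(pos - flank, 0)  # clamp so a negative index cannot wrap around
            windows.append((pos, seq[lo:pos + flank]))
    return windows


toy_seq = "AAAAAAAAAA" + "GGGGG" + "TTTTTTTTTT"  # exon, intron, exon
print(junction_windows("join{[0:10](+), [15:25](+)}", toy_seq, flank=3))
# [(10, 'AAAGGG'), (15, 'GGGTTT')]
```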

Format a python list and search for patterns

I am getting rows from a spreadsheet with a mixture of numbers, text and dates.
I want to find elements within the list, some numbers and some text.
For example:
sg = [500782, u'BMOU9015488', u'SD4', u'CLOSED', -1, '', '', -1]
sg = map(str, sg)
#sg = map(unicode, sg) #option?
if any("-1" in s for s in sg):
    #do something if matched
    pass
I don't feel this is the correct way to do this. I am also trying to match things like -1.5 and -1.5C, and other unexpected values such as OPEN15 when compared to 15.
I have also looked at
sg.index("-1")
If it returns an index instead of raising ValueError, then it's a match (only good for exact matches).
Some help would be appreciated.
If you want to call a function for each case, I would do it this way:
def stub1(elem):
    #do something for match of type '-1'
    return
def stub2(elem):
    #do something for match of type 'SD4'
    return
def stub3(elem):
    #do something for match of type 'OPEN15'
    return
sg = [500782, u'BMOU9015488', u'SD4', u'CLOSED', -1, '', '', -1]
sg = map(unicode, sg)
patterns = {u"-1":stub1, u"SD4": stub2, u"OPEN15": stub3} # add more if you want
for elem in sg:
    for k, stub in patterns.iteritems():
        if k in elem:
            stub(elem)
            break
Where stub1, stub2, ... are the functions that contain the code for each case.
Each one is called (at most once per string) if the string contains a matching substring.
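For the looser matches mentioned in the question (-1.5, -1.5C, OPEN15 versus 15), substring tests get fragile, because "-1" is also a substring of "-1.5". One alternative, sketched here in Python 3, is to pull every number out of each cell with a regular expression and compare numerically. It treats a hyphen directly before digits as a minus sign, so text like "A-1" would also yield -1; the function names are illustrative:

```python
import re

# An optional minus sign, digits, and an optional decimal part.
NUMBER_RE = re.compile(r"-?\d+(?:\.\d+)?")


def numeric_values(cell):
    """Return every number embedded in a cell, e.g. 'OPEN15' -> [15.0], '-1.5C' -> [-1.5]."""
    return [float(text) for text in NUMBER_RE.findall(str(cell))]


def matching_indexes(row, target):
    """Indexes of the cells that contain the number `target`."""
    return [i for i, cell in enumerate(row) if target in numeric_values(cell)]


sg = [500782, 'BMOU9015488', 'SD4', 'CLOSED', -1, '', '', -1]
print(matching_indexes(sg, -1))                          # [4, 7]
print(matching_indexes(['OPEN15', '15', '-1.5C'], 15))   # [0, 1]
```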
What do you mean by "I don't feel this is the correct way to do this"? Are you not getting the result you expect? Is it too slow?
Maybe you can organize your data by columns instead of rows and use more specific filters. If you are looking for speed, I'd suggest using the numpy module, which has a very interesting function called select().
Scipy select example
By transforming all your rows into a numpy array, you can test several columns in one pass. This function is efficient and powerful! Basically, it's used like this:
import numpy as np
a = np.array(...)
conds = [a < 10, a % 3 == 0, a > 25]
actions = [a + 100, a / 3, a * 10]
result = np.select(conds, actions, default=0)
Each value in a is transformed by the first condition it satisfies, checked in order:
Any value of a smaller than 10 has 100 added to it
Any other value that is a multiple of 3 is divided by 3
Any remaining value above 25 is multiplied by 10
Any other value, not matching the previous conditions, is set to 0
Both conds and actions are lists, and they must have the same length. The first element of conds has its action set as the first element of actions, and so on.
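np.select picks, for each element, the first condition that is True, so an element matching several conditions gets only one action. A small runnable check of that behaviour; the sample array is made up for illustration, and a // 3 (floor division) is used instead of a / 3 so the result stays an integer array:

```python
import numpy as np

a = np.array([3, 9, 12, 30, 7, 20])
conds = [a < 10, a % 3 == 0, a > 25]
actions = [a + 100, a // 3, a * 10]

# For each element, np.select uses the first condition that is True.
# 30 is a multiple of 3 and also > 25, but only the earlier rule applies.
result = np.select(conds, actions, default=0)
print(result.tolist())  # [103, 109, 4, 10, 107, 0]
```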
It could also be used to determine the index of a particular value in a sorted vector (even though this should be done using numpy's nonzero() function).
a = np.array(....)  # assumed sorted
conds = [a < target, a >= target]
actions = [1, 0]
index = np.select(conds, actions).sum()  # number of elements smaller than target
This is probably a stupid way of getting an index, but it demonstrates how we can use select()... and it works :-)
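For the index lookup itself, a sketch comparing the counting trick with the more direct numpy calls. It assumes a is sorted and contains target exactly once; the sample values are made up:

```python
import numpy as np

a = np.array([2, 5, 8, 11])
target = 8

# Counting the elements smaller than target gives its position in a sorted array.
count_index = int(np.select([a < target, a >= target], [1, 0]).sum())

# searchsorted finds the same insertion point directly (left side by default).
sorted_index = int(np.searchsorted(a, target))

# nonzero returns the positions where the comparison is True.
exact_index = int(np.nonzero(a == target)[0][0])

print(count_index, sorted_index, exact_index)  # 2 2 2
```

Note that counting a <= target instead would give 3 here, one past the value's actual position.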
