I am trying to import an XML file with some empty attributes into a table, but I get this error: AttributeError: 'NoneType' object has no attribute 'strip' (Python)

def XML_get_fields_and_types_and_data_levels_3(xml_file_name):
    data_2d = []
    for child in root:
        grandchildren = child.findall(".//")
        fields = []
        types = []
        data_1d = []
        data_2d.append(data_1d)
        for grandchild in grandchildren:
            data_1d.append(convert_string_to_type(grandchild.text))
            if grandchild.tag not in fields:
                fields.append(grandchild.tag)
                types.append(get_type_of_string(grandchild.text))
    return (fields, types, data_2d)

def get_type_of_string(string):
    clean_string = string.strip()
    try:
        if clean_string is not None:
            clean_string = string.strip()
            return string.strip()
        if "." in clean_string:
            clean_string = string.split()
            if isinstance(clean_string, list):
                point_or_segment = [float(i) for i in clean_string]
                if len(point_or_segment) == 2:
                    return 'POINT'
                else:
                    return 'LSEG'
            else:
                val = float(clean_string)
                return 'REAL'
        else:
            val = int(clean_string)
            return 'INTEGER'
    except ValueError:
        return 'TEXT'

The issue is the first line of code after your method definition:
def get_type_of_string(string):
    clean_string = string.strip()
Here string might be None (ElementTree, for example, sets .text to None for an empty element), so calling .strip() on it raises the exception. Instead of rewriting the method for you, which would be easy for me but not very helpful for you, I suggest you redesign this method. Here are my hints:
There is duplicated code.
The split() method always returns a list, whether or not the separator is found, so there is no need for the line if isinstance(clean_string, list).
Why is the conversion assigned to val if val is not used afterwards? The easiest way to check a variable's type is the isinstance() built-in, as you did a few lines above.
Try to split this method into simpler, smaller methods, or try to simplify its logic.
I hope these hints help you through. Enjoy coding.
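Following the hints above, one possible shape for the redesigned function is sketched below. It is not the asker's final code: it assumes that a missing or empty value should map to 'TEXT', and that a single number containing a dot is 'REAL' (the original logic would report 'LSEG' for it). Both choices are assumptions, not something the question states.

```python
def get_type_of_string(string):
    """Guess a column type name for a raw XML text value, which may be None."""
    # Guard first: empty XML elements yield None instead of a string.
    if string is None:
        return 'TEXT'
    clean_string = string.strip()
    if not clean_string:
        return 'TEXT'  # assumption: blank values are stored as text

    parts = clean_string.split()
    try:
        if len(parts) > 1:
            # Several whitespace-separated numbers: a point or a segment.
            for part in parts:
                float(part)
            return 'POINT' if len(parts) == 2 else 'LSEG'
        if '.' in clean_string:
            float(clean_string)
            return 'REAL'
        int(clean_string)
        return 'INTEGER'
    except ValueError:
        # Anything that does not parse as a number is plain text.
        return 'TEXT'
```

The None check has to come before any call to .strip(); placing it after, as in the question, is what triggers the AttributeError.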

Related

I receive too many DNA objects in my read DNA function

class DnaSeq:
    def __init__(self, accession, seq):
        self.accession = accession
        self.seq = seq
    def __len__(self):
        if self.seq == None:
            raise ValueError
        elif self.seq =='':
            raise ValueError
        else:
            return len(self.seq)
    def __str__(self):
        if self.accession =='':
            raise ValueError
        elif self.accession == None:
            raise ValueError
        else:
            return f"<DnaSeq accession='{self.accession}'>"

def read_dna(filename):
    DnaSeq_objects = []
    new_dna_seq = DnaSeq("s1", "AAA")
    with open(filename, 'r') as seq:
        for line in seq.readlines():
            if line.startswith('>'):
                new_dna_seq.accession = line
            else:
                new_dna_seq.seq = line.strip()
            DnaSeq_objects.append(new_dna_seq)
    return DnaSeq_objects
this is the .fa file I tried to read
> s0
> ATGTTTGTTTTTCTTGTTTTATTGCCACTAGTCTCTAGTCAGTGT GTTAATCTTACAACCAGAACTCAAT
> s1
> GTTAATCTTACAACCAGAACTCAATTACCCCCTGCATACACTAATTCTTTCACACGTGGTGTTTATTACCCTGACAAAGTTTTCAGATCCTCAGTTTTACATTCAACTCAGGACTTGTTCTTACCTTTCTTTTCCAATGTTACTTGGTTCCATGCTATACATGTC
> s2
> ACTCAGGACTTGTTCTTACCTTTCTTTTCCAATGTTACTTGGTTCCATGCTATACATGTCTCTGGGACCAATGGTACTAAGAGGTTTGATAACCCTGTCCTAC
> s3
> TCTGGGACCAATGGTACTAAGAGGTTTGATAACCCTGTCCTACCATTTAATGATGGTGTTTATTTTGCTTCCACTGAGAAGTCTAACATAATAAGAGGCTGGATTTTTGGTACTACTTTAGATTCGAAGACCCAGTCCCT
> s4
> AGACCCAGTCCCTACTTATTGTTAATAACGCTACTAATGTTGTTATTAAAGTCTGTGAATTTCAATTTTGTAATGATCCATTT
> s5
> TTTGTAATGATCCATTTTTGGGTGTTTATTACCACAAAAACAACAAAAGTTGGATGGAAAGTGAGTTCAGAGTTTATTCTAGTGCGA
It's supposed to return 6 DNA objects but I received too many.
read_dna('ex1.fa')
[<__main__.DnaSeq object at 0x000001C67208F820>,
<__main__.DnaSeq object at 0x000001C67208F820>,
<__main__.DnaSeq object at 0x000001C67208F820>,
<__main__.DnaSeq object at 0x000001C67208F820>,
<__main__.DnaSeq object at 0x000001C67208F820>,
<__main__.DnaSeq object at 0x000001C67208F820>,
<__main__.DnaSeq object at 0x000001C67208F820>,
<__main__.DnaSeq object at 0x000001C67208F820>,
<__main__.DnaSeq object at 0x000001C67208F820>,
<__main__.DnaSeq object at 0x000001C67208F820>,
<__main__.DnaSeq object at 0x000001C67208F820>,
<__main__.DnaSeq object at 0x000001C67208F820>
]
How can I fix this so that it returns the right number of objects?
Your code is reading every line beginning with > as an accession, but it's not populating the .seq attribute because it's not finding any sequences. In the FASTA format, only the header/description/accession ID line begins with >. The sequence line(s) don't have any prefix, they're just single-letter bases or amino acids.
There's actually a lot more you need to do. You need to have a default value for self.seq, you need to parse the sequences for spaces and other irrelevant characters, and you need to be able to concatenate multiple sequence lines. Instead of rolling your own code, I highly recommend checking out Biopython.
I decided to give you some example code that will help you on your way, using a couple of neat Python constructs to condense things down a bit and clean up your original code. Please don't use this exact code as your assignment! It may contain concepts that you haven't learned about yet, or that you don't fully understand, and your professor will quickly be able to see it's not your original work. Play around with the code, make sure you understand what it does, try to think of any edge cases where it might not work as expected (such as having an accession without a sequence, or having a sequence spread over multiple lines). Then, come up with your own algorithm and submit that.
class DnaSeq:
    def __init__(self, accession, seq):
        self.accession = accession
        self.seq = seq
    def __len__(self):
        if self.seq:
            return len(self.seq)
        else:
            raise ValueError("Sequence missing")
    def __repr__(self):
        if self.accession and self.seq:
            return f"<DnaSeq accession='{self.accession}', seq='{self.seq[:15]}...'>"
        else:
            raise ValueError("Accession ID or sequence missing")

def read_dna(filename):
    DnaSeq_objects = []
    with open(filename, 'r') as f:
        # get rid of any whitespace on either end of the line
        contents = [line.strip() for line in f.readlines()]
    while len(contents):  # while there are lines left to process
        if len(contents[0]) == 0:  # there was just whitespace and now it's an empty string
            contents.pop(0)  # pull the first item off the list
            continue  # go to the next line in the list
        # no point in creating dummy values when we can use the real thing
        new_dna_seq = DnaSeq(contents.pop(0).lstrip("> "), contents.pop(0))
        DnaSeq_objects.append(new_dna_seq)
    return DnaSeq_objects

results = [str(seq_obj) for seq_obj in read_dna("ex1.fa")]
print("\n".join(results))
# "<DnaSeq accession='s0', seq='ATGTTTGTTTTTC...'>",
# "<DnaSeq accession='s1', seq='GTTAATCTTACAA...'>",
# "<DnaSeq accession='s2', seq='ACTCAGGACTTGT...'>",
# "<DnaSeq accession='s3', seq='TCTGGGACCAATG...'>",
# "<DnaSeq accession='s4', seq='AGACCCAGTCCCT...'>",
# "<DnaSeq accession='s5', seq='TTTGTAATGATCC...'>"
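The edge cases mentioned above (a sequence spread over several lines, stray spaces inside a sequence, blank lines) could be handled along the following lines. This is only a sketch under the assumption that the input is standard FASTA, where header lines alone start with '>'. The name parse_fasta and the choice to return (accession, sequence) tuples are made up for illustration; for real work, Biopython remains the safer route.

```python
def parse_fasta(lines):
    """Return a list of (accession, sequence) tuples from FASTA-style lines."""
    records = []
    accession = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith('>'):
            # A new header closes the previous record, if there was one.
            if accession is not None:
                records.append((accession, ''.join(chunks)))
            accession = line[1:].strip()
            chunks = []
        elif accession is not None:
            # Sequence line: drop inner spaces and keep it for concatenation.
            chunks.append(line.replace(' ', ''))
    if accession is not None:
        records.append((accession, ''.join(chunks)))
    return records
```

Because the function accepts any iterable of lines, it can be fed an open file object directly, for example parse_fasta(open("ex1.fa")), and each tuple can then be turned into a DnaSeq.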
In your loop, you should change the condition:
for line in seq.readlines():
    if line.startswith('>'):
        new_dna_seq.accession = line
    else:
        new_dna_seq.seq = line.strip()
    DnaSeq_objects.append(new_dna_seq)
To:
for line in seq.readlines():
    if line.startswith('> s'):
        new_dna_seq.accession = line.strip().replace('> ', '')
    else:
        new_dna_seq.seq = line.strip().replace('> ', '')
        DnaSeq_objects.append(new_dna_seq)
The if statement now checks whether the line starts with '> s', and the append is indented into the else block so that an object is added only once per sequence line.
I have also removed '> ' from your accession and sequence, as it seems unnecessary.
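One thing to watch in the loop above: a single DnaSeq instance is created before the loop and appended repeatedly, so every list entry refers to the same object (the identical 0x000001C67208F820 address on every line of the question's output shows this). Below is a minimal sketch of the effect and one way around it. Record is a stripped-down stand-in for DnaSeq and the sample lines are made up for illustration.

```python
class Record:
    """Stand-in for DnaSeq, just enough to show the aliasing."""
    def __init__(self, accession, seq):
        self.accession = accession
        self.seq = seq

lines = [">s0", "ATG", ">s1", "GTT"]

# Reusing one instance: both list entries are the same object.
shared = Record(None, None)
aliased = []
for line in lines:
    if line.startswith(">"):
        shared.accession = line[1:]
    else:
        shared.seq = line
        aliased.append(shared)

# Creating a new instance per record keeps the values separate.
separate = []
accession = None
for line in lines:
    if line.startswith(">"):
        accession = line[1:]
    else:
        separate.append(Record(accession, line))

print([r.accession for r in aliased])   # ['s1', 's1']
print([r.accession for r in separate])  # ['s0', 's1']
```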
Since DnaSeq_objects is a list, just return DnaSeq_objects[:6]. Even if the list contains fewer than 6 elements, this syntax will not throw an error and will just return all elements.

Handle nested fields with conversion types in string with string.Formatter

Update 2
Alright, my answer to this question is not a complete solution to what I originally wanted but it's ok for simpler things like filename templating (what I originally intended to use this for). I have yet to come up with a solution for recursive templating. It might not matter to me though as I have reevaluated what I really need. Though it's possible I'll need bigger guns in the future, but then I'll probably just choose another more advanced templating engine instead of reinventing the tire.
Update
Ok I realize now string.Template probably is the better way to do this. I'll answer my own question when I have a working example.
I want to accomplish formatting strings by grouping keys and arbitrary text together in a nesting manner, like so
# conversions (!):
# u = upper case
# l = lower case
# c = capital case
# t = title case
fmt = RecursiveNamespaceFormatter(globals())
greeting = 'hello'
person = 'foreName surName'
world = 'WORLD'
sample = 'WELL {greeting!u} {super {person!t}, {tHiS iS tHe {world!t}!l}!c}!'
print(fmt.format(sample))
# output: WELL HELLO Super Forename Surname, this is the World!
I've subclassed string.Formatter to populate the nested fields, which I retrieve with a regex, and it works fine, except that fields with a conversion type don't get converted.
import re
from string import Formatter

class RecursiveNamespaceFormatter(Formatter):
    def __init__(self, namespace={}):
        Formatter.__init__(self)
        self.namespace = namespace
    def vformat(self, format_string, *args, **kwargs):
        def func(i):
            i = i.group().strip('{}')
            return self.get_value(i,(),{})
        format_string = re.sub('\{(?:[^}{]*)\}', func, format_string)
        try:
            return super().vformat(format_string, args, kwargs)
        except ValueError:
            return self.vformat(format_string)
    def get_value(self, key, args, kwds):
        if isinstance(key, str):
            try:
                # Check explicitly passed arguments first
                return kwds[key]
            except KeyError:
                return self.namespace.get(key, key)  # return key if not found (e.g. key == "this is the World")
        else:
            super().get_value(key, args, kwds)
    def convert_field(self, value, conversion):
        if conversion == "u":
            return str(value).upper()
        elif conversion == "l":
            return str(value).lower()
        elif conversion == "c":
            return str(value).capitalize()
        elif conversion == "t":
            return str(value).title()
        # Do the default conversion or raise error if no matching conversion found
        return super().convert_field(value, conversion)
# output: WELL hello!u super foreName surName!t, tHiS iS tHe WORLD!t!l!c!
What am I missing? Is there a better way to do this?
Recursion is complicated here, especially given the limitations of Python's re module. Before I settled on string.Template, I experimented with looping through the string and stacking all relevant indexes, to order each nested field hierarchically. Maybe a combination of the two could work, I'm not sure.
Here is, however, a working, non-recursive example:
from collections import ChainMap
from string import Template, _sentinel_dict

class MyTemplate(Template):
    delimiter = '$'
    pattern = r'\$(?:(?P<escaped>\$)|\{(?P<braced>[\w]+)(?:\.(?P<braced_func>\w+)\(\))*\}|(?P<named>(?:[\w]+))(?:\.(?P<named_func>\w+)\(\))*|(?P<invalid>))'

    def substitute(self, mapping=_sentinel_dict, **kws):
        if mapping is _sentinel_dict:
            mapping = kws
        elif kws:
            mapping = ChainMap(kws, mapping)
        def convert(mo):
            named = mapping.get(mo.group('named'), mapping.get(mo.group('braced')))
            func = mo.group('named_func') or mo.group('braced_func')  # i.e. $var.func() or ${var.func()}
            if named is not None:
                if func is not None:
                    # if named doesn't contain func, convert it to str and try again.
                    callable_named = getattr(named, func, getattr(str(named), func, None))
                    if callable_named:
                        return str(callable_named())
                return str(named)
            if mo.group('escaped') is not None:
                return self.delimiter
            if mo.group('invalid') is not None:
                self._invalid(mo)
            if named is not None:
                raise ValueError('Unrecognized named group in pattern',
                                 self.pattern)
        return self.pattern.sub(convert, self.template)

sample1 = 'WELL $greeting.upper() super$person.title(), tHiS iS tHe $world.title().lower().capitalize()!'
S = MyTemplate(sample1)
print(S.substitute(**{'greeting': 'hello', 'person': 'foreName surName', 'world': 'world'}))
# output: WELL HELLO super Forename Surname, tHiS iS tHe World!
sample2 = 'testing${äää.capitalize()}.upper()ing $NOT_DECLARED.upper() $greeting '
sample2 += '$NOT_DECLARED_EITHER ASDF$world.upper().lower()ASDF'
S = MyTemplate(sample2)
print(S.substitute(**{
    'some_var': 'some_value',
    'äää': 'TEST',
    'greeting': 'talofa',
    'person': 'foreName surName',
    'world': 'världen'
}))
# output: testingTest.upper()ing talofa ASDFvärldenASDF
sample3 = 'a=$a.upper() b=$b.bit_length() c=$c.bit_length() d=$d.upper()'
S = MyTemplate(sample3)
print(S.substitute(**{'a':1, 'b':'two', 'c': 3, 'd': 'four'}))
# output: a=1 b=two c=2 d=FOUR
As you can see, $var and ${var} work as expected, but the fields can also call methods of the value's type. If the method is not found, the value is converted to str and the lookup is tried again.
The methods can't take any arguments, though. Only the last method in a chain is caught, so chaining doesn't work either, which I believe is because re does not allow multiple groups to use the same name (the regex module does, however).
With some tweaking of the regex pattern and some extra logic in convert, both of these things should be easily fixed.
MyTemplate.substitute works like MyTemplate.safe_substitute in that it does not throw exceptions on missing keys or fields.
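As an illustration of the "tweaking" mentioned above, chaining can be supported by capturing the whole method chain as one group and then walking it method by method. The sketch below deliberately drops the Template subclass and uses plain re, so render, FIELD and METHOD are hypothetical names, and only the $name form (no braces, no arguments) is covered. Unknown names are replaced with an empty string, and unknown methods are skipped; both behaviors are assumptions.

```python
import re

# "$$" is an escaped dollar sign; otherwise a name followed by
# zero or more ".method()" calls captured together as one chain.
FIELD = re.compile(r'\$(?:(?P<escaped>\$)|(?P<name>\w+)(?P<chain>(?:\.\w+\(\))*))')
METHOD = re.compile(r'\.(\w+)\(\)')

def render(template, mapping):
    """Substitute $name fields and apply every zero-argument method in the chain."""
    def convert(mo):
        if mo.group('escaped'):
            return '$'
        name = mo.group('name')
        if name not in mapping:
            return ''  # assumption: unknown names are dropped
        value = mapping[name]
        for method in METHOD.findall(mo.group('chain')):
            # Look on the value first, then on its str() form.
            func = getattr(value, method, None)
            if not callable(func):
                func = getattr(str(value), method, None)
            if callable(func):
                value = func()
        return str(value)
    return FIELD.sub(convert, template)

print(render('tHiS iS tHe $world.title().lower().capitalize()!', {'world': 'WORLD'}))
# tHiS iS tHe World!
```

Calling methods by name from a template string means the template author can invoke any zero-argument method on the supplied values, so this is only reasonable for trusted templates.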

How to iterate more efficiently with multiple if statements in Python

I have 3 if statements and they are really ugly in terms of style and efficiency.
They parse HTML with BS4.
HTML is in example_post variable.
If element exists -> get text
If does not exist -> assign 'None' as a str.
if example_post.find('span', class_='tag1'):
    post_reactions = example_post.find('span', class_='tag1').getText()
else:
    post_reactions = 'None'
if example_post.find('span', class_='tag2'):
    post_comments = example_post.find('span', class_='tag2').getText()
else:
    post_comments = 'None'
if example_post.find('span', class_='tag3'):
    post_shares = example_post.find('span', class_= 'tag3').getText()
else:
    post_shares = 'None'
I started to google how to make it better and found that it is possible to use dictionaries with if statements
so the dict
post_reactions_dict = {'post_reactions': 'tag1', 'post_comments':'tag2','post_shares':'tag3'}
and tried like this
post_titles = []
post_values = []
for key,value in post_reactions_dict.items():
    if example_post.find('span', class_=key):
        post_values.append(example_post.find('span', class_=key).getText())
        post_titles.append(key)
    else:
        post_titles.append(key)
        post_values.append('None')
It is ok, but maybe it is possible to make it even better?
Ideal result:
post_titles = ['post_reactions', 'post_comments', 'post_shares']
post_values (it depends) but for the question ['None', 'None', 'None']
I would suggest making this a bit more generic and avoid using exceptions as the "normal" program flow:
def get_span(element, class_):
    tag = element.find('span', class_=class_)
    return None if tag is None else tag.getText()

post_reactions = get_span(example_post, 'tag1')
post_comments = get_span(example_post, 'tag2')
post_share = get_span(example_post, 'tag3')

post = {}
attributes = ('reactions', 'tag1'), ('comments', 'tag2'), ('shares', 'tag3')
for attribute, tag in attributes:
    try:
        post[attribute] = example_post.find('span', class_=tag).getText()
    except AttributeError:
        post[attribute] = None
Don't use individual variables, use a dict to store the data.
Figure out what the variables (the differences between your repeated code) are; in this case it's just post_* and tag*, put them as pairs together as data.
Don't repeat the example_post.find(...) call; here we're using the fact that .getText() will likely cause an AttributeError if find() returns None/False/whatever it is it returns.
I assume the .find() method returns an object or None? If so, here is my approach without any if:
def get_text(class_):
    try:
        return example_post.find('span', class_=class_).getText()
    except AttributeError:
        return 'None'

post_reactions = get_text('tag1')
post_comments = get_text('tag2')
post_share = get_text('tag3')
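The suggestions above can be combined into a form that yields the titles and values the question asks for. This is a sketch rather than a definitive version: get_span_text, collect_post_fields and POST_FIELDS are illustrative names, and the element argument is assumed to behave like a BeautifulSoup tag (a find() that returns None when nothing matches, and a getText() method on the result).

```python
def get_span_text(element, class_, default='None'):
    """Return the text of the first matching <span>, or default if absent."""
    tag = element.find('span', class_=class_)  # one lookup per field
    return default if tag is None else tag.getText()

# Field name -> CSS class, as in the question's dictionary.
POST_FIELDS = {
    'post_reactions': 'tag1',
    'post_comments': 'tag2',
    'post_shares': 'tag3',
}

def collect_post_fields(element, fields=POST_FIELDS):
    """Build {field name: span text} for every configured field."""
    return {name: get_span_text(element, class_)
            for name, class_ in fields.items()}
```

With post = collect_post_fields(example_post), list(post) gives the titles and list(post.values()) gives the values, in the same order as the dictionary.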

AttributeError: 'function' object has no attribute 'index'

I've written a function that finds the list entry containing a specific word, based on an input value.
I'd like to use 'index' from the 'search' function, but I get an AttributeError like this:
if distance.haversine([search.index]['geometry']['coordinates'][1],
[search.index]['geometry']['coordinates'][0],t_dicc['tuits']['coordinates']['latitud'],
t_dicc['tuits']['coordinates']['longitud']<=radius):
**AttributeError: 'function' object has no attribute 'index'**
I've tried to modify many times but it didn't work.
import os
import string
import json
from pprint import pprint
import distance

def main():
    f = open('monumentos-reducido.json', 'r')
    mo_dicc = json.load(f)
    g = open('tuits.json', 'r')
    t_dicc = json.load(g)
    def search():
        word = raw_input("monument name : ")
        if(word in value for word in ('nombre')):
            try:
                StopIteration
                if word > 1:
                    index = next(index for (index, d) in enumerate(mo_dicc['features']) if d["properties"]["nombre"] == word)
                    pprint(mo_dicc['features'][index])
            except StopIteration:
                exit()
    search()
    radius = input("radius(meters) : ")
    def search2rad(search):
        resultlst = []
        if distance.haversine([search.index]['geometry']['coordinates'][1],[search.index]['geometry']['coordinates'][0], t_dicc['tuits']['coordinates']['latitud'], t_dicc['tuits']['coordinates']['longitud']<=radius):
            index = next(index for (index, d) in enumerate(t_dicc['tuits']) if d['coordenadas'])
            resultlst.append(t_dicc['tuits'][index])
        print resultlst
    search2rad(search)
You are calling search2rad as search2rad(search), where search is the function itself. Within search2rad(), you are doing:
if distance.haversine([search.index]['geometry']['coordinates'][1],[search.index]['geometry']['coordinates'][0], t_dicc['tuits']['coordinates']['latitud'], t_dicc['tuits']['coordinates']['longitud']<=radius):
Here you refer to search.index. This raises the error because search is a function, and a function object has no attribute named index.
I think what you want is to pass the value returned by the search() call to search2rad(). For that, you could do:
search2rad(search())
But the cleaner way would be:
index = search()
search2rad(index)
For either form to work, search() also has to return index; as written, it returns nothing.
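To make the point concrete, here is a small sketch in Python 3 of a lookup function that returns its result so the caller can pass the value along. The function names and the sample data are made up for illustration; only the "properties"/"nombre" and "geometry"/"coordinates" keys come from the question.

```python
def find_feature_index(features, name):
    """Return the index of the first feature named `name`, or None if absent."""
    return next(
        (i for i, feature in enumerate(features)
         if feature["properties"]["nombre"] == name),
        None,
    )

def coordinates_of(features, index):
    """Use the index returned above; the caller passes a value, not a function."""
    return features[index]["geometry"]["coordinates"]

# Made-up sample data in the shape the question's JSON appears to have.
features = [
    {"properties": {"nombre": "Puerta de Alcala"},
     "geometry": {"coordinates": [-3.6887, 40.4200]}},
    {"properties": {"nombre": "Templo de Debod"},
     "geometry": {"coordinates": [-3.7178, 40.4240]}},
]

index = find_feature_index(features, "Templo de Debod")
if index is not None:
    print(coordinates_of(features, index))  # [-3.7178, 40.424]
```

Passing a default of None to next() avoids the StopIteration handling from the question: the caller simply checks for None.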

Replacing multiple words in a string from different data sets in Python

Essentially I have a python script that loads in a number of files, each file contains a list and these are used to generate strings. For example: "Just been to see $film% in $location%, I'd highly recommend it!" I need to replace the $film% and $location% placeholders with a random element of the array of their respective imported lists.
I'm very new to Python but have picked up most of it quite easily but obviously in Python strings are immutable and so handling this sort of task is different compared to other languages I've used.
Here is the code as it stands, I've tried adding in a while loop but it would still only replace the first instance of a replaceable word and leave the rest.
#!/usr/bin/python
import random

def replaceWord(string):
    #Find Variable Type
    if "url" in string:
        varType = "url"
    elif "film" in string:
        varType = "film"
    elif "food" in string:
        varType = "food"
    elif "location" in string:
        varType = "location"
    elif "tvshow" in string:
        varType = "tvshow"
    #LoadVariableFile
    fileToOpen = "/prototype/default_" + varType + "s.txt"
    var_file = open(fileToOpen, "r")
    var_array = var_file.read().split('\n')
    #Get number of possible variables
    numberOfVariables = len(var_array)
    #ChooseRandomElement
    randomElement = random.randrange(0,numberOfVariables)
    #ReplaceWord
    oldValue = "$" + varType + "%"
    newString = string.replace(oldValue, var_array[randomElement], 1)
    return newString

testString = "Just been to see $film% in $location%, I'd highly recommend it!"
Test = replaceWord(testString)
This would give the following output: Just been to see Harry Potter in $location%, I'd highly recommend it!
I have tried using while loops, counting the number of words to replace in the string etc. however it still only changes the first word. It also needs to be able to replace multiple instances of the same "variable" type in the same string, so if there are two occurrences of $film% in a string it should replace both with a random element from the loaded file.
The following program may be somewhat closer to what you are trying to accomplish. Please note that documentation has been included to help explain what is going on. The templates are a little different than yours but provide customization options.
#! /usr/bin/env python3
import random

PATH_TEMPLATE = './prototype/default_{}s.txt'

def main():
    """Demonstrate the StringReplacer class with a test string."""
    replacer = StringReplacer(PATH_TEMPLATE)
    text = "Just been to see {film} in {location}, I'd highly recommend it!"
    result = replacer.process(text)
    print(result)

class StringReplacer:
    """StringReplacer(path_template) -> StringReplacer instance"""

    def __init__(self, path_template):
        """Initialize the instance attribute of the class."""
        self.path_template = path_template
        self.cache = {}

    def process(self, text):
        """Automatically discover text keys and replace them at random."""
        keys = self.load_keys(text)
        result = self.replace_keys(text, keys)
        return result

    def load_keys(self, text):
        """Discover what replacements can be made in a string."""
        keys = {}
        while True:
            try:
                text.format(**keys)
            except KeyError as error:
                key = error.args[0]
                self.load_to_cache(key)
                keys[key] = ''
            else:
                return keys

    def load_to_cache(self, key):
        """Warm up the cache as needed in preparation for replacements."""
        if key not in self.cache:
            with open(self.path_template.format(key)) as file:
                unique = set(filter(None, map(str.strip, file)))
            self.cache[key] = tuple(unique)

    def replace_keys(self, text, keys):
        """Build a dictionary of random replacements and run formatting."""
        for key in keys:
            keys[key] = random.choice(self.cache[key])
        new_string = text.format(**keys)
        return new_string

if __name__ == '__main__':
    main()
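The load_keys method above discovers the placeholder names by catching KeyError repeatedly. As an alternative sketch, the standard library's string.Formatter.parse can list the field names directly. The helper name placeholder_names is made up for illustration, and the sketch assumes simple names such as {film}; for a field like {film.title}, the returned name would include the attribute part.

```python
from string import Formatter

def placeholder_names(text):
    """Return the set of field names used in a str.format-style template."""
    # parse() yields (literal_text, field_name, format_spec, conversion);
    # field_name is None for trailing literal text.
    return {name for _, name, _, _ in Formatter().parse(text) if name}

print(sorted(placeholder_names(
    "Just been to see {film} in {location}, I'd highly recommend it!")))
# ['film', 'location']
```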
The varType you assign is set by only one branch of your if-elif chain, and then the interpreter moves on, so only one placeholder type is ever handled. You have to check every type and perform the replacement for each. One way would be to set flags for the parts of the sentence you want to change, using independent if statements. It would go like this:
url_to_change = False
film_to_change = False
if "url" in string:
    url_to_change = True
if "film" in string:
    film_to_change = True
if url_to_change:
    change_url()
if film_to_change:
    change_film()
If you want to change all occurrences, you could use a for loop. Just do something like this in the part where you are swapping a word:
for word in sentence.split():
    if word == 'url':
        change_word()
Having said this, I'd recommend introducing two improvements. Push the changes into separate functions. It will be easier to manage your code.
For example, a function for loading the items to choose from at random could be:
def load_variable_file(file_name):
    fileToOpen = "/prototype/default_" + file_name + "s.txt"
    var_file = open(fileToOpen, "r")
    var_array = var_file.read().split('\n')
    var_file.close()
    return var_array
Instead of
if "url" in string:
    varType = "url"
you could do:
def change_url(sentence):
    var_array = load_variable_file("url")
    numberOfVariables = len(var_array)
    randomElement = random.randrange(0, numberOfVariables)
    oldValue = "$url%"
    return sentence.replace(oldValue, var_array[randomElement], 1)
if "url" in sentence:
    sentence = change_url(sentence)
And so on. You could push part of what I've put into change_url() into a separate function, since it would be used by all such functions (just like loading data from a file). I deliberately did not change everything; I hope you get my point. As you can see, with clearly named functions you can write less code and split it into logical, reusable parts, with no need to comment the code.
A few points about your code:
You can replace randrange with random.choice, as you just want to select an item from a list.
You can iterate over your types and do the replacement without specifying a limit (the third parameter), then assign the result back to the same variable, so you keep all your replacements.
readlines() does what you want after open: it reads the file and stores the lines as a list (each line keeps its trailing newline).
Return the new string after going through all the possible replacements.
Something like this:
#!/usr/bin/python
import random

def replaceWord(string):
    #Find Variable Type
    types = ("url", "film", "food", "location", "tvshow")
    for t in types:
        if "$" + t + "%" in string:
            var_array = []
            #LoadVariableFile
            fileToOpen = "/prototype/default_" + t + "s.txt"
            with open(fileToOpen) as f:
                # strip the trailing newline that readlines() keeps on each line
                var_array = [line.strip() for line in f.readlines()]
            tag = "$" + t + "%"
            # stop if the file runs out of unused entries
            while tag in string and var_array:
                choice = random.choice(var_array)
                string = string.replace(tag, choice, 1)
                # do not pick the same entry twice
                var_array.remove(choice)
    return string

testString = "Just been to see $film% in $location%, I'd highly recommend it!"
new = replaceWord(testString)
print(new)
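Another way to replace every occurrence independently is to let re.sub call a function for each placeholder it finds. The sketch below keeps the question's $name% syntax but, to stay self-contained, takes the candidate values from a dictionary instead of the default_*.txt files. fill_placeholders, PLACEHOLDER and the sample values are illustrative names and data, not part of the original code.

```python
import random
import re

# Matches placeholders such as $film% and captures the type name.
PLACEHOLDER = re.compile(r'\$(\w+)%')

def fill_placeholders(text, choices, rng=random):
    """Replace each $type% with a random entry from choices[type]."""
    def pick(match):
        options = choices.get(match.group(1))
        if not options:
            return match.group(0)  # unknown type: leave the placeholder as is
        return rng.choice(options)
    # The callback runs once per match, so repeated placeholders
    # each get their own random draw.
    return PLACEHOLDER.sub(pick, text)

choices = {
    'film': ['Harry Potter', 'Alien'],
    'location': ['London', 'Leeds'],
}
print(fill_placeholders(
    "Just been to see $film% in $location%, I'd highly recommend it!", choices))
```

Because the draws are independent, two $film% placeholders in one string may receive the same film; removing used entries, as in the answer above, would be needed to rule that out.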
