Python convert string to array - python

Hello, I have a string that looks like this:
el-gu-en-tr-ca-it-eu-ca#valencia-ar-eo-cs-et-th_TH-gl-id-es-bn_IN-ru-he-nl-pt-no-nb-id_ID-lv-lt-pa-te-pl-ta-bg_BG-be-fr-de-bn_BD-uk-pt_BR-ast-hr-jv-zh_TW-sr#latin-da-fa-hi-tr_TR-fi-hu-ja-fo-bs_BA-ro-fa_IR-zh_CN-sr-sq-mn-ko-sv-km-sk-km_KH-en_GB-ms-sc-ug-bal
How can I break the items on - and place them in an array like this?
array[0]->el
array[1]->gu
.....

Use the .split() method on your string:
>>> example = 'el-gu-en-tr-ca-it-eu-ca#valencia-ar-eo-cs-et-th_TH-gl-id-es-bn_IN-ru-he-nl-pt-no-nb-id_ID-lv-lt-pa-te-pl-ta-bg_BG-be-fr-de-bn_BD-uk-pt_BR-ast-hr-jv-zh_TW-sr#latin-da-fa-hi-tr_TR-fi-hu-ja-fo-bs_BA-ro-fa_IR-zh_CN-sr-sq-mn-ko-sv-km-sk-km_KH-en_GB-ms-sc-ug-bal'
>>> example.split('-')
['el', 'gu', 'en', 'tr', 'ca', 'it', 'eu', 'ca#valencia', 'ar', 'eo', 'cs', 'et', 'th_TH', 'gl', 'id', 'es', 'bn_IN', 'ru', 'he', 'nl', 'pt', 'no', 'nb', 'id_ID', 'lv', 'lt', 'pa', 'te', 'pl', 'ta', 'bg_BG', 'be', 'fr', 'de', 'bn_BD', 'uk', 'pt_BR', 'ast', 'hr', 'jv', 'zh_TW', 'sr#latin', 'da', 'fa', 'hi', 'tr_TR', 'fi', 'hu', 'ja', 'fo', 'bs_BA', 'ro', 'fa_IR', 'zh_CN', 'sr', 'sq', 'mn', 'ko', 'sv', 'km', 'sk', 'km_KH', 'en_GB', 'ms', 'sc', 'ug', 'bal']

Call str.split():
s = "el-gu-en-tr-ca-it-eu-ca#valencia-ar-eo-cs-et-th_TH-gl-id-es-bn_IN-ru-he-nl-pt-no-nb-id_ID-lv-lt-pa-te-pl-ta-bg_BG-be-fr-de-bn_BD-uk-pt_BR-ast-hr-jv-zh_TW-sr#latin-da-fa-hi-tr_TR-fi-hu-ja-fo-bs_BA-ro-fa_IR-zh_CN-sr-sq-mn-ko-sv-km-sk-km_KH-en_GB-ms-sc-ug-bal"
locales = s.split("-")
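The resulting list can then be indexed just like the array in the question:
print(locales[0])  # el
print(locales[1])  # gu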

Related

Is there a way to take country two letter alpha codes and display them on a map?

I had a DataFrame with country names and values corresponding to them. I used the following code to convert the countries into codes:
import pycountry
input_countries = happiness_data["Country or region"]
countries = {}
for country in pycountry.countries:
    countries[country.name] = country.alpha_2
codes = [countries.get(country, 'Unknown code') for country in input_countries]
print(codes)
Which returns this:
['FI', 'DK', 'NO', 'IS', 'NL', 'CH', 'SE', 'NZ', 'CA', 'AT', 'AU', 'CR', 'IL', 'LU', 'GB', 'IE', 'DE', 'BE', 'US', 'Unknown code', 'AE', 'MT', 'MX', 'FR', 'Unknown code', 'CL', 'GT', 'SA', 'QA', 'ES', 'PA', 'BR', 'UY', 'SG', 'SV', 'IT', 'BH', 'SK', 'Unknown code', 'PL', 'UZ', 'LT', 'CO', 'SI', 'NI', 'Unknown code', 'AR', 'RO', 'CY', 'EC', 'KW', 'TH', 'LV', 'Unknown code', 'EE', 'JM', 'MU', 'JP', 'HN', 'KZ', 'Unknown code', 'HU', 'PY', 'Unknown code', 'PE', 'PT', 'PK', 'Unknown code', 'PH', 'RS', 'Unknown code', 'LY', 'ME', 'TJ', 'HR', 'HK', 'DO', 'BA', 'TR', 'MY', 'BY', 'GR', 'MN', 'MK', 'NG', 'KG', 'TM', 'DZ', 'MA', 'AZ', 'LB', 'ID', 'CN', 'Unknown code', 'BT', 'CM', 'BG', 'GH', 'Unknown code', 'NP', 'JO', 'BJ', 'Unknown code', 'GA', 'Unknown code', 'ZA', 'AL', 'Unknown code', 'KH', 'Unknown code', 'SN', 'SO', 'NA', 'NE', 'BF', 'AM', 'Unknown code', 'GN', 'GE', 'GM', 'KE', 'MR', 'MZ', 'TN', 'BD', 'IQ', 'Unknown code', 'ML', 'SL', 'LK', 'MM', 'TD', 'UA', 'ET', 'Unknown code', 'UG', 'EG', 'ZM', 'TG', 'IN', 'LR', 'KM', 'MG', 'LS', 'BI', 'ZW', 'HT', 'BW', 'Unknown code', 'MW', 'YE', 'RW', 'Unknown code', 'AF', 'CF', 'SS']
I dropped all of the unknown codes, so I only have known codes left. I want to plot these codes on a map so I can get a visualization of where my data is coming from. Is there a way to do this? I tried using pygal to no avail.
Thanks for any help and/or advice you can give me. If you want to try this out, feel free to copy that list of countries and make up some random integer values to see if you are able to plot values corresponding to those country labels on a map. Additionally, if I could just use country names (e.g. "Bangladesh") with a value (e.g. 8) and plot hues on a map according to that, that would work too.
Thanks so much!
You can map your alpha_2 country codes to coordinates using this list of countries, codes and coordinates, then plot your data on a map using any of the more sophisticated plotting libraries, like matplotlib with cartopy, matplotlib with geopandas, or – if you want the map interactive and/or for the web – plotly and mapbox.
Have a look at Plotly and its built-in country and state geometries:
https://plotly.com/python/choropleth-maps/
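As a minimal sketch of the Plotly route (my own illustration, not part of the answer above): plotly.express.choropleth() expects ISO alpha-3 codes by default, so the alpha-2 codes are converted first with pycountry. The codes and values below are made up for demonstration.
import pycountry
import plotly.express as px

# a few of the alpha-2 codes from the question, with made-up values
codes_alpha2 = ['FI', 'DK', 'NO', 'IS', 'NL']
values = [7.8, 7.6, 7.5, 7.5, 7.4]

# convert alpha-2 -> alpha-3, which px.choropleth uses by default
codes_alpha3 = [pycountry.countries.get(alpha_2=c).alpha_3 for c in codes_alpha2]

fig = px.choropleth(locations=codes_alpha3, color=values,
                    color_continuous_scale='Viridis')
fig.show()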

How to find keywords in a text file using python's sklearn

I want to create a way to optimize my resume using a python script. To do this, I am trying to find keywords used in the job listing that I can add to my resume to make it stand out when it is run through ATS. Currently, I am using the following code to find what percent match my resume is for the job. How can I use this comparison and find how to improve my resume with specific keywords from the job listing?
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
resume = open("resume.txt", encoding='latin-1')
reference = open("reference.txt", encoding='latin-1')
compare = [resume.read(),reference.read()]
cMatrix = CountVectorizer().fit_transform(compare)
#prints how well the resume matches as a percentage
matPercent = cosine_similarity(cMatrix)[0][1] * 100
matPercent = round(matPercent, 2) # round to two decimal
print("Resume is a "+ str(matPercent)+ "% match to the job.")
I am using the following to generate keywords; however, this omits important words and produces a long list that I think could be handled better with sklearn. Instead of using FindKeywords(), how can I access this information from CountVectorizer().fit_transform(compare)?
def FindKeywords():
    file = open("reference.txt", encoding='latin-1')
    string = file.read().replace("\n", " ").replace("\t", " ").lower()
    kwDict = {}
avoidables = set(['skilled','skills','skill','minimum','tools','work','features','looking','highly','', ' ','','the', 'of', 'to', 'and', 'a', 'in', 'is', 'it', 'you', 'that', 'he', 'was', 'for', 'on', 'are', 'with', 'as', 'I', 'his', 'they', 'be', 'at', 'one', 'have', 'this', 'from', 'or', 'had', 'by', 'not', 'word', 'but', 'what', 'some', 'we', 'can', 'out', 'other', 'were', 'all', 'there', 'when', 'up', 'use', 'your', 'how', 'said', 'an', 'each', 'she', 'which', 'do', 'their', 'time', 'if', 'will', 'way', 'about', 'many', 'then', 'them', 'write', 'would', 'like', 'so', 'these', 'her', 'long', 'make', 'thing', 'see', 'him', 'two', 'has', 'look', 'more', 'day', 'could', 'go', 'come', 'did', 'number', 'sound', 'no', 'most', 'people', 'my', 'over', 'know', 'water', 'than', 'call', 'first', 'who', 'may', 'down', 'side', 'been', 'now', 'find', 'any', 'new', 'work', 'part', 'take', 'get', 'place', 'made', 'live', 'where', 'after', 'back', 'little', 'only', 'round', 'man', 'year', 'came', 'show', 'every', 'good', 'me', 'give', 'our', 'under', 'name', 'very', 'through', 'just', 'form', 'sentence', 'great', 'think', 'say', 'help', 'low', 'line', 'differ', 'turn', 'cause', 'much', 'mean', 'before', 'move', 'right', 'boy', 'old', 'too', 'same', 'tell', 'does', 'set', 'three', 'want', 'air', 'well', 'also', 'play', 'small', 'end', 'put', 'home', 'read', 'hand', 'port', 'large', 'spell', 'add', 'even', 'land', 'here', 'must', 'big', 'high', 'such', 'follow', 'act', 'why', 'ask', 'men', 'change', 'went', 'light', 'kind', 'off', 'need', 'house', 'picture', 'try', 'us', 'again', 'animal', 'point', 'mother', 'world', 'near', 'build', 'self', 'earth', 'father', 'head', 'stand', 'own', 'page', 'should', 'country', 'found', 'answer', 'school', 'grow', 'study', 'still', 'learn', 'plant', 'cover', 'food', 'sun', 'four', 'between', 'state', 'keep', 'eye', 'never', 'last', 'let', 'thought', 'city', 'tree', 'cross', 'farm', 'hard', 'start', 'might', 'story', 'saw', 'far', 'sea', 'draw', 'left', 'late', 'run', "don't", 'while', 'press', 'close', 'night', 'real', 'life', 'few', 'north', 'open', 'seem', 'together', 'next', 'white', 'children', 'begin', 'got', 'walk', 'example', 'ease', 'paper', 'group', 'always', 'music', 'those', 'both', 'mark', 'often', 'letter', 'until', 'mile', 'river', 'car', 'feet', 'care', 'second', 'book', 'carry', 'took', 'science', 'eat', 'room', 'friend', 'began', 'idea', 'fish', 'mountain', 'stop', 'once', 'base', 'hear', 'horse', 'cut', 'sure', 'watch', 'color', 'face', 'wood', 'main', 'enough', 'plain', 'girl', 'usual', 'young', 'ready', 'above', 'ever', 'red', 'list', 'though', 'feel', 'talk', 'bird', 'soon', 'body', 'dog', 'family', 'direct', 'pose', 'leave', 'song', 'measure', 'door', 'product', 'black', 'short', 'numeral', 'class', 'wind', 'question', 'happen', 'complete', 'ship', 'area', 'half', 'rock', 'order', 'fire', 'south', 'problem', 'piece', 'told', 'knew', 'pass', 'since', 'top', 'whole', 'king', 'space', 'heard', 'best', 'hour', 'better', 'true', 'during', 'hundred', 'five', 'remember', 'step', 'early', 'hold', 'west', 'ground', 'interest', 'reach', 'fast', 'verb', 'sing', 'listen', 'six', 'table', 'travel', 'less', 'morning', 'ten', 'simple', 'several', 'vowel', 'toward', 'war', 'lay', 'against', 'pattern', 'slow', 'center', 'love', 'person', 'money', 'serve', 'appear', 'road', 'map', 'rain', 'rule', 'govern', 'pull', 'cold', 'notice', 'voice', 'unit', 'power', 'town', 'fine', 'certain', 'fly', 'fall', 'lead', 'cry', 'dark', 'machine', 'note', 'wait', 'plan', 'figure', 
'star', 'box', 'noun', 'field', 'rest', 'correct', 'able', 'pound', 'done', 'beauty', 'drive', 'stood', 'contain', 'front', 'teach', 'week', 'final', 'gave', 'green', 'oh', 'quick', 'develop', 'ocean', 'warm', 'free', 'minute', 'strong', 'special', 'mind', 'behind', 'clear', 'tail', 'produce', 'fact', 'street', 'inch', 'multiply', 'nothing', 'course', 'stay', 'wheel', 'full', 'force', 'blue', 'object', 'decide', 'surface', 'deep', 'moon', 'island', 'foot', 'system', 'busy', 'test', 'record', 'boat', 'common', 'gold', 'possible', 'plane', 'stead', 'dry', 'wonder', 'laugh', 'thousand', 'ago', 'ran', 'check', 'game', 'shape', 'equate', 'hot', 'miss', 'brought', 'heat', 'snow', 'tire', 'bring', 'yes', 'distant', 'fill', 'east', 'paint', 'language', 'among', 'grand', 'ball', 'yet', 'wave', 'drop', 'heart', 'am', 'present', 'heavy', 'dance', 'engine', 'position', 'arm', 'wide', 'sail', 'material', 'size', 'vary', 'settle', 'speak', 'weight', 'general', 'ice', 'matter', 'circle', 'pair', 'include', 'divide', 'syllable', 'felt', 'perhaps', 'pick', 'sudden', 'count', 'square', 'reason', 'length', 'represent', 'art', 'subject', 'region', 'energy', 'hunt', 'probable', 'bed', 'brother', 'egg', 'ride', 'cell', 'believe', 'fraction', 'forest', 'sit', 'race', 'window', 'store', 'summer', 'train', 'sleep', 'prove', 'lone', 'leg', 'exercise', 'wall', 'catch', 'mount', 'wish', 'sky', 'board', 'joy', 'winter', 'sat', 'written', 'wild', 'instrument', 'kept', 'glass', 'grass', 'cow', 'job', 'edge', 'sign', 'visit', 'past', 'soft', 'fun', 'bright', 'gas', 'weather', 'month', 'million', 'bear', 'finish', 'happy', 'hope', 'flower', 'clothe', 'strange', 'gone', 'jump', 'baby', 'eight', 'village', 'meet', 'root', 'buy', 'raise', 'solve', 'metal', 'whether', 'push', 'seven', 'paragraph', 'third', 'shall', 'held', 'hair', 'describe', 'cook', 'floor', 'either', 'result', 'burn', 'hill', 'safe', 'cat', 'century', 'consider', 'type', 'law', 'bit', 'coast', 'copy', 'phrase', 'silent', 'tall', 'sand', 'soil', 'roll', 'temperature', 'finger', 'industry', 'value', 'fight', 'lie', 'beat', 'excite', 'natural', 'view', 'sense', 'ear', 'else', 'quite', 'broke', 'case', 'middle', 'kill', 'son', 'lake', 'moment', 'scale', 'loud', 'spring', 'observe', 'child', 'straight', 'consonant', 'nation', 'dictionary', 'milk', 'speed', 'method', 'organ', 'pay', 'age', 'section', 'dress', 'cloud', 'surprise', 'quiet', 'stone', 'tiny', 'climb', 'cool', 'design', 'poor', 'lot', 'experiment', 'bottom', 'key', 'iron', 'single', 'stick', 'flat', 'twenty', 'skin', 'smile', 'crease', 'hole', 'trade', 'melody', 'trip', 'office', 'receive', 'row', 'mouth', 'exact', 'symbol', 'die', 'least', 'trouble', 'shout', 'except', 'wrote', 'seed', 'tone', 'join', 'suggest', 'clean', 'break', 'lady', 'yard', 'rise', 'bad', 'blow', 'oil', 'blood', 'touch', 'grew', 'cent', 'mix', 'team', 'wire', 'cost', 'lost', 'brown', 'wear', 'garden', 'equal', 'sent', 'choose', 'fell', 'fit', 'flow', 'fair', 'bank', 'collect', 'save', 'control', 'decimal', 'gentle', 'woman', 'captain', 'practice', 'separate', 'difficult', 'doctor', 'please', 'protect', 'noon', 'whose', 'locate', 'ring', 'character', 'insect', 'caught', 'period', 'indicate', 'radio', 'spoke', 'atom', 'human', 'history', 'effect', 'electric', 'expect', 'crop', 'modern', 'element', 'hit', 'student', 'corner', 'party', 'supply', 'bone', 'rail', 'imagine', 'provide', 'agree', 'thus', 'capital', "won't", 'chair', 'danger', 'fruit', 'rich', 'thick', 'soldier', 'process', 'operate', 'guess', 'necessary', 'sharp', 
'wing', 'create', 'neighbor', 'wash', 'bat', 'rather', 'crowd', 'corn', 'compare', 'poem', 'string', 'bell', 'depend', 'meat', 'rub', 'tube', 'famous', 'dollar', 'stream', 'fear', 'sight', 'thin', 'triangle', 'planet', 'hurry', 'chief', 'colony', 'clock', 'mine', 'tie', 'enter', 'major', 'fresh', 'search', 'send', 'yellow', 'gun', 'allow', 'print', 'dead', 'spot', 'desert', 'suit', 'current', 'lift', 'rose', 'continue', 'block', 'chart', 'hat', 'sell', 'success', 'company', 'subtract', 'event', 'particular', 'deal', 'swim', 'term', 'opposite', 'wife', 'shoe', 'shoulder', 'spread', 'arrange', 'camp', 'invent', 'cotton', 'born', 'determine', 'quart', 'nine', 'truck', 'noise', 'level', 'chance', 'gather', 'shop', 'stretch', 'throw', 'shine', 'property', 'column', 'molecule', 'select', 'wrong', 'gray', 'repeat', 'require', 'broad', 'prepare', 'salt', 'nose', 'plural', 'anger', 'claim', 'continent', 'oxygen', 'sugar', 'death', 'pretty', 'skill', 'women', 'season', 'solution', 'magnet', 'silver', 'thank', 'branch', 'match', 'suffix', 'especially', 'fig', 'afraid', 'huge', 'sister', 'steel', 'discuss', 'forward', 'similar', 'guide', 'experience', 'score', 'apple', 'bought', 'led', 'pitch', 'coat', 'mass', 'card', 'band', 'rope', 'slip', 'win', 'dream', 'evening', 'condition', 'feed', 'tool', 'total', 'basic', 'smell', 'valley', 'nor', 'double', 'seat', 'arrive', 'master', 'track', 'parent', 'shore', 'division', 'sheet', 'substance', 'favor', 'connect', 'post', 'spend', 'chord', 'fat', 'glad', 'original', 'share', 'station', 'dad', 'bread', 'charge', 'proper', 'bar', 'offer', 'segment', 'slave', 'duck', 'instant', 'market', 'degree', 'populate', 'chick', 'dear', 'enemy', 'reply', 'drink', 'occur', 'support', 'speech', 'nature', 'range', 'steam', 'motion', 'path', 'liquid', 'log', 'meant', 'quotient', 'teeth', 'shell', 'neck'])
    for word in string.split(' '):
        if word not in kwDict and word not in avoidables:
            kwDict[word] = 1
        elif word not in avoidables:
            kwDict[word] += 1
    returns = [key for key in kwDict.keys() if kwDict[key] > 0]
    return [kw for kw in returns if kw not in avoidables]
You can use the get_feature_names() method from the CountVectorizer as documented here (in scikit-learn 1.0 and later it is called get_feature_names_out()).
So with a concrete example from your code (adjusted a bit), it could look like this:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
resume = "This is an example resume for a job"
reference = "This is an example reference for a job advertisement"
compare = [resume,reference]
cVect = CountVectorizer()
cMatrix = cVect.fit_transform(compare)
#prints how well the resume matches as a percentage
matPercent = cosine_similarity(cMatrix)[0][1] * 100
matPercent = round(matPercent, 2) # round to two decimal
print("Resume is a "+ str(matPercent)+ "% match to the job.")
Returns:
Resume is a 80.18% match to the job.
Then to get the keywords:
cVect.get_feature_names()
The returned keywords:
['advertisement',
'an',
'example',
'for',
'is',
'job',
'reference',
'resume',
'this']
If you want only the keywords from your resume or from your reference, without the other, you can fit_transform() another CountVectorizer() on just that text and then get the keywords from it.
The important thing to keep in mind is that you need to 'save' your trained CountVectorizer, so instead of
CountVectorizer().fit_transform(compare)
You need to use
cVect = CountVectorizer()
cVect.fit_transform(compare)
So that you can later still access your CountVectorizer() instance.
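To go one step further toward the original goal (finding job-listing keywords that are missing from the resume), here is a rough sketch building on the same idea. The stop_words='english' argument and the term comparison are my own additions, not part of the answer above:
from sklearn.feature_extraction.text import CountVectorizer

resume = "This is an example resume for a job"
reference = "This is an example reference for a job advertisement"

cVect = CountVectorizer(stop_words='english')        # drop very common English words
cMatrix = cVect.fit_transform([resume, reference])   # row 0: resume, row 1: reference

terms = cVect.get_feature_names_out()   # use get_feature_names() on older scikit-learn
counts = cMatrix.toarray()

# terms that occur in the job listing (reference) but not in the resume
missing = [t for t, in_resume, in_ref in zip(terms, counts[0], counts[1])
           if in_ref > 0 and in_resume == 0]
print(missing)   # ['advertisement', 'reference']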

How to remove all the elements that contain special characters and strings?

I'm trying to remove all the elements that contain special characters or strings, but some of those elements are still there.
description_list = ['$', '2,850', 'door', '.', 'sale', '...', 'trades', '.', 'pay', 'pp', 'fees', 'shipping', 'cost', 'desirable', '\x932', 'liner', 'dial\x94', 'eta', 'movement', 'watch', '\x93safe', 'queen\x94', ',', 'pristine', 'condition', '.', 'i\x92m', 'original', 'owner', 'worn', 'watch', 'gently', 'handful', 'times', '.', 'protective', 'plastics', 'still', 'intact', 'case', 'back', ',', 'parts', 'clasp', 'full', 'original', 'kit', 'you\x92ll', 'see', 'pics', '.', 'includes', 'original', 'boxes', ',', 'manuals', ',', 'warranty', 'card', 'ad', ',', 'spare', 'bracelet', 'links', ',', 'dive', 'strap', '&', 'extension', ',', 'etc', 'payment', 'paypal', ',', 'due', 'quickly', 'upon', 'agreement', 'purchase', 'watch', '.', 'holds', ',', 'delays', ',', 'games', '.', 'pay', 'pp', 'fees', 'shipping', 'us', 'postal', 'service', 'priority', 'mail', 'w/signature', 'confirmation', ',', 'paypal', 'verified', 'address', 'inside', 'usa', '.', 'please', 'don\x92t', 'ask', 'ship', 'outside', 'usa', '.', 'exceptions', 'made', '.', 'please', 'e-mail', '[', 'email', 'protected', ']', '.', 'also', 'text', 'call', '210-705-3383.', 'name', 'james', 'crockett', 'thank', ',', 'james', 'crockett', '$', '2,850', 'door', '.', 'sale', '...', 'trades', '.', 'pay', 'pp', 'fees', 'shipping', 'cost', 'desirable', '\x932', 'liner', 'dial\x94', 'eta', 'movement', 'watch', '\x93safe', 'queen\x94', ',', 'pristine', 'condition', '.', 'i\x92m', 'original', 'owner', 'worn', 'watch', 'gently', 'handful', 'times', '.', 'protective', 'plastics', 'still', 'intact', 'case', 'back', ',', 'parts', 'clasp', 'full', 'original', 'kit', 'you\x92ll', 'see', 'pics', '.', 'includes', 'original', 'boxes', ',', 'manuals', ',', 'warranty', 'card', 'ad', ',', 'spare', 'bracelet', 'links', ',', 'dive', 'strap', '&', 'extension', ',', 'etc', 'payment', 'paypal', ',', 'due', 'quickly', 'upon', 'agreement', 'purchase', 'watch', '.', 'holds', ',', 'delays', ',', 'games', '.', 'pay', 'pp', 'fees', 'shipping', 'us', 'postal', 'service', 'priority', 'mail', 'w/signature', 'confirmation', ',', 'paypal', 'verified', 'address', 'inside', 'usa', '.', 'please', 'don\x92t', 'ask', 'ship', 'outside', 'usa', '.', 'exceptions', 'made', '.', 'please', 'e-mail', '[', 'email', 'protected', ']', '.', 'also', 'text', 'call', '210-705-3383.', 'name', 'james', 'crockett', 'thank', ',', 'james', 'crockett']
price_list = [x for x in description_list if any(c.isdigit() for c in x)]
Output
# price_list
['2,850', '\x932', '210-705-3383.', '2,850', '\x932', '210-705-3383.']
It should be like this (the comma is acceptable because I want to extract the price number):
['2,850', '2,850']
You can do an all() check inside the list comprehension that keeps only strings made up entirely of digits or commas, and then filter out the values that are just a comma:
price_list = [x for x in description_list if all(c.isdigit() or c == ',' for c in x) and x != ',']
# ['2,850', '2,850']
Regex answer
import re
price_list = [x for x in description_list if re.match(r'\d+(,*\d+)?$', x)]
You were close, assuming you want to retain data that contains digits or digits with commas. The current list comprehension for price_list is returning strings if they contain at least one digit.
[str(x) for x in description_list if str(x).replace(',', '').isdigit()]
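Applied to the description_list above, this also gives the desired result:
# ['2,850', '2,850']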

Pandas not dividing length of cells

Been struggling with this problem for a long time. I have a dataframe that looks like this:
[screenshot of the dataframe]
I'm trying to divide the length of each 'counter' by the length of each 'content'. I thought this would be fairly straightforward. So far I've tried:
reviews['diversity'] = reviews['counter'].apply(lambda x: 0 if len(x) == 0 else float(len(x)) / float(len(reviews['content'][x])))
as well as using x['content']. I get the massive error message KeyError: "None of [['aberfeldy', 'recorded', 'their', 'debut', 'young', 'forever', 'using', 'a', 'single', 'microphone', 'good', 'for', 'them', 'in', 'that', 'spirit', 'i', 'cut', 'short', 'my', 'obligatory', 'introduction', 'and', 'bring', 'you', 'straight', 'to', 'the', 'edinburgh', 'group', 'lovelorn', 'unfortunately', 'still', 'heart', 'exposed', 'by', 'oh', 'production', 'love', 'is', 'verb', 'noun', 'as', 'well', 'find', 'it', 'dictionary', 'under', 'l', 'little', 'witticism', 'comes', 'from', 'an', 'arrow', 'written', 'sung', 'riley', 'briggs', 'based', 'on', 'one', 'photo', 'looks', 'like', 'anthony', 'michael', 'hall', 'though', 'his', 'vocals', 'chart', 'fairly', 'standard', 'indie', 'course', 'borrowing', 'neil', 'friend', 'ben', 'gibbard', 'what', 'do', 'plain', 'sensitive', 'guys', 'everywhere', 'listen', 'some', 'of', 'best', 'friends', 'are', 'favorite', 'albums', 'consist', 'campfire', 'singalongs', 'bands', 'with', 'modest', 'acoustic', 'guitar', 'chops', 'cute', 'names', 'accents', 'but', 'those', 'lyrics', 'no', 'band', 'would', 'sing', 'such', 'words', 'deserves', 'easily', 'made', 'comparisons', 'fellow', 'scots', 'belle', '', 'sebastian', 'or', 'even', 'camera', 'obscura', 'let', 'alone', 'earnest', 'aussies', 'lucksmiths', 'compare', 'twee', 'progenitors', 'pastels', 'talulah', 'gosh', 'owe', 'me', 'your', 'cardigan', 'moniker', 'nipped', 'scottish', 'vacation', 'destination', 'practically', 'beg', 'name', 'there', 'need', 'encourage', 'throughout', 'record', 'shows', 'predisposition', 'toward', 'bungling', 'old', 'english', 'teachers', 'motto', 'show', 'not', 'tell', 'this', 'may', 'be', 'result', 'medical', 'condition', 'dyslexia', 'which', 'case', 'we', 'should', 'hold', 'our', 'snark', 'seems', 'guy', 'can', 'open', 'mouth', 'without', 'saying', 'nothing', 'so', 'sad', 'leaving', 'he', 'sings', 'out', 'lonely', 'now', 'she', 'gone', 'adds', 'tie', 'teems', 'vivid', 'storytelling', 'goes', 'rhyme', 'sacred', 'wasted', 'reasons', 'until', 'somewhere', 'editor', 'rhyming', 'loses', 'her', 'job', 'often', 'at', 'when', 'they', 'stumble', 'beyond', 'trite', 'infantilism', 'first', 'vegetarian', 'restaurant', 'lopes', 'along', 'winning', 'tangled', 'up', 'blue', 'strums', 'accented', 'subtle', 'fiddles', 'lovely', 'boy', 'harmonies', 'seemingly', 'aiming', 'album', 'cheerful', 'unpretentious', 'look', 'everyday', 'here', 'finally', 'makes', 'interesting', 'way', 'dance', 'kitchen', 'says', 'willing', 'see', 'where', 'takes', 'him', 'then', 'proclaims', 'sometimes', 'believe', 'human', 'duck', 'cover', 'speaking', 'aliens', 'heliopolis', 'night', 'next', 'track', 'incidentally', 'its', 'second', 'whimsical', 'spaceship', 'song', 'complete', 'nose', 'perfect', 'unique', 'yeah', 'was', 'means', 'warm', 'pop', 'heats', 'headphones', 'veritable', 'help', 'root', 'begins', 'everyone', 'because', 'last', 'thing', 'world', 'needs', 'another', 'batch', 'sullen', 'scenesters', 'yet', 'any', 'relationship', 'just', 'someone', 'doesn', 'mean', 'back', 'beautiful', 'gibbs', 'tells', 'us', 'tender', 'moment', 'probably', 'if', 'hope', 'gets', 'laid']] are in the [index]".
I've tried:
def diverse(x):
    if len(x) == 0:
        return 0
    else:
        return float(len(x)) / float(len(reviews['clean'][x]))

reviews['diverse'] = reviews['counter'].apply(diverse)
and get the same thing.
I've tried using applymap with reviews['diversity'] = reviews.applymap(lambda x: 0 if len(x) == 0 else float(len(reviews['counter'][x])) / float(len(reviews['content'][x])))
and get ("object of type 'int' has no len()", 'occurred at index Unnamed: 0').
And yet if I just do float(len(reviews['counter'][4])) / float(len(reviews['clean'][4])), I get 0.634375.
Any help is much appreciated.
edit: I tried:
def test(x, y):
    for row, item in x.iteritems():
        x = float(len(item))
    for row, item in y.iteritems():
        if len(item) == 0:
            return (0)
        else:
            y = float(len(item))
            return (x/y)
When I used "print" instead of "return", it gave me all the values. But return only divides the length of the first row, which seems really weird?
Here is a toy example I constructed to show how to do what you are asking:
import pandas as pd
from collections import Counter
df = pd.DataFrame([['hello world i am a computer'],
                   ['hello i am a computer too hello computer']],
                  columns=['content'])
df['counter'] = df.content.str.split().apply(Counter)
df
# returns:
content counter
hello world i am a computer {'am': 1, 'hello': 1, 'computer': 1, 'world': ...
hello i am a computer too hello computer {'am': 1, 'hello': 2, 'computer': 2, 'a': 1, '...
This line answers the question as you phrased it:
df['diversity'] = df.counter.apply(len) / df.content.str.len()
But I think what you really wanted was to break the strings in content into a list of words by splitting on the space character. In that case, you probably want:
df['diversity'] = df.counter.apply(len) / df.content.str.split().apply(len)
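As a quick sanity check (my own calculation, not from the answer): with the word-based version, the first row has 6 unique words out of 6 words (diversity 1.0) and the second row has 6 unique words out of 8 (diversity 0.75), which matches the kind of fractional values you computed by hand.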

Python @properties raising an error

I am trying to write a class to pass the following unittest:
import unittest
from property_address import *
class TestAddresses(unittest.TestCase):
    def setUp(self):
        self.home = Address(name='Steve Holden', street_address='1972 Flying Circus', city='Arlington', state='VA', zip_code='12345')
    def test_name(self):
        self.assertEqual(self.home.name, 'Steve Holden')
        self.assertRaises(AttributeError, setattr, self.home, 'name', 'Daniel Greenfeld')
    def test_state(self):
        self.assertEqual(self.home.state, 'VA')
        self.assertRaises(StateError, setattr, self.home, 'state', 'Not a state')
        self.home.state = 'CO'
        self.assertEqual(self.home.state, 'CO')
The part I am having issues with is the self.assertRaises(StateError, setattr, self.home, 'state', 'Not a state') line.
I can't figure out how to get a StateError to be raised.
The code I am using is:
class Address(object):
    states = ['IA', 'KS', 'UT', 'VA', 'NC', 'NE', 'SD', 'AL', 'ID', 'FM', 'DE', 'AK', 'CT', 'PR', 'NM', 'MS', 'PW', 'CO', 'NJ', 'FL', 'MN',
              'VI', 'NV', 'AZ', 'WI', 'ND', 'PA', 'OK', 'KY', 'RI', 'NH', 'MO', 'ME', 'VT', 'GA', 'GU', 'AS', 'NY', 'CA', 'HI', 'IL', 'TN',
              'MA', 'OH', 'MD', 'MI', 'WY', 'WA', 'OR', 'MH', 'SC', 'IN', 'LA', 'MP', 'DC', 'MT', 'AR', 'WV', 'TX']
    def __init__(self, name, street_address, city, state, zip_code):
        self._name = name
        self._street_address = street_address
        self._city = city
        self._state = state
        self._zip_code = zip_code
    @property
    def name(self):
        return self._name.title()
    @property
    def state(self):
        return self._state
    @state.setter
    def state(self, value):
        if value in self.states:
            self._state = value
        else:
            raise  ### This is where I am stuck
Do I need to create a new @property for StateError, or should I work it into the state setter somehow?
You need to raise a StateError exception; that is all:
@state.setter
def state(self, value):
    if value not in self.states:
        raise StateError(value)
    self._state = value
This does require you to have defined the exception class first, of course:
class StateError(Exception):
    """Invalid state value used"""
Demo:
>>> class StateError(Exception): pass
...
>>> class Address(object):
...     states = ['IA', 'KS', 'UT', 'VA', 'NC', 'NE', 'SD', 'AL', 'ID', 'FM', 'DE', 'AK', 'CT', 'PR', 'NM', 'MS', 'PW', 'CO', 'NJ', 'FL', 'MN',
...               'VI', 'NV', 'AZ', 'WI', 'ND', 'PA', 'OK', 'KY', 'RI', 'NH', 'MO', 'ME', 'VT', 'GA', 'GU', 'AS', 'NY', 'CA', 'HI', 'IL', 'TN',
...               'MA', 'OH', 'MD', 'MI', 'WY', 'WA', 'OR', 'MH', 'SC', 'IN', 'LA', 'MP', 'DC', 'MT', 'AR', 'WV', 'TX']
...     @property
...     def state(self):
...         return self._state
...     @state.setter
...     def state(self, value):
...         if value not in self.states:
...             raise StateError(value)
...         self._state = value
...
>>> a = Address()
>>> a.state = 'VA'
>>> a.state = 'Nonesuch'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 11, in state
__main__.StateError: Nonesuch
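For the test_name assertion in the question, no extra code should be needed: a property defined without a setter already raises AttributeError on assignment. A quick illustration using the Address class from the question (my own check, not part of the answer; the exact error message varies by Python version):
>>> home = Address(name='Steve Holden', street_address='1972 Flying Circus', city='Arlington', state='VA', zip_code='12345')
>>> home.name = 'Daniel Greenfeld'
Traceback (most recent call last):
  ...
AttributeError: can't set attribute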
