KeyError: word fransız not in vocabulary - python
When I run the code below, I get a KeyError:
KeyError: word fransız not in vocabulary.
What is the issue?
import numpy as np
from gensim.models import Word2Vec
from nltk.tokenize import sent_tokenize,word_tokenize
import string
text="Victor Marie Hugo, Romantik akıma bağlı Fransız şair, romancı ve oyun yazarı. En büyük ve ünlü Fransız yazarlardan biri kabul edilir. Hugo'nun Fransa'daki edebi ünü ilk olarak şiirlerinden sonra da romanlarından ve tiyatro oyunlarından gelir. Pek çok şiirinin içinde özellikle Les Contemplations ve La Légende des siècles büyük saygı görür. Fransa dışında en çok Sefiller ve Notre Dame'ın Kamburu romanlarıyla tanınır.Gençliğinde şiddetli bir kral yanlısı olsa da, görüşü yıllar içinde değişti ve tutkulu bir cumhuriyet destekçisi oldu. Eserleri zamanının politik ve sosyal sorunlarına ve de sanatsal akımlarına değinir. Hugo'nun cenazesi 1885'te Panthéon'da gömüldü. Hugo hakkında en çok eser yazılan ilk 100 kişi listesinde yer almaktadır. Victor Hugo, Joseph Léopold Sigisbert Hugo (1773–1828) ve Sophie Trébuchet (1772–1821) çiftinin üçüncü oğluydu; Abel Joseph Hugo (1798–1855) ve Eugène Hugo (1800–1837) isminde iki ağabeyi vardı. 1802'de Besançon'da doğdu. Napolyon'un bir kahraman olduğunu düşünen serbest fikirli bir cumhuriyetçiydi. Annesi 1812'de Napolyon'a karşı komplo kurduğu için idam edilen General Victor Lahorie ile sevgili olduğu düşünülen Katolik bir Kralcıydı.Hugo'nun çocukluğu ülkede siyasi karmaşıklığın olduğu bir dönemde geçti. Doğumundan iki yıl sonra Napolyon İmparator ilan edilmiş, 18 yaşındayken de Bourbon Monarşisi yeniden tahta geçirilmişti. Hugo'nun ailesinin ters dini ve politik görüşleri Fransa'da egemenlik mücadelesi veren kuvvetleri yansıtıyordu. Hugo'nun babası İspanya'da yenilene kadar orduda yüksek rütbeli bir subaydı.Babası subay olduğu sürece aile sık sık taşındı ve bu yolculuklar sırasında Hugo pek çok şey öğrendi. Çocukluğunda Napoli'ye giderken geniş Alpler'deki geçitleri ve karlı zirveleri, muhteşem Akdeniz mavisini ve şenlikler yapılan Roma'yı gördü. 5 yaşında olmasına rağmen bu 6 aylık geziyi her zaman aklında tuttu. 
Aile Napoli'de birkaç ay kalıp doğruca Paris'e döndü.Hugo'nun annesi Sophie evliliğinin başında kocasına İtalya (Leopold Napoli'ye yakın bir vilayette valiydi) ve İspanya'ya (üç vilayette görev almıştı) kadar eşlik etti. Askeri hayatın getirdiği yorucu yolculuklar ve kocasının inancının zayıflığı nedeniyle ters düşmelerinden dolayı Sophie 1803'te Leopold'dan bir süreliğine ayrılıp üç çocuğuyla Paris'e yerleşti. Bundan sonra Hugo'nun eğitimi ve yetişmesi üzerine eğildi. Bu yüzden Hugo'nun kariyerinin ilk dönemindeki şiir ve kurgu çalışmaları annesinin inancının ve krala bağlılığının yansımasıydı. Ama başını Fransa'daki 1848 Devrimi'nin çektiği olaylar sırasında Katolik Kralcı yanlısı eğitime başkaldırıp Cumhuriyetçiliği ve Özgür düşünceyi desteklemeye başladı.Gençliğinde aşık oldu ve annesinin isteklerine karşı gelip çocukluk arkadaşı Adèle Foucher (1803–1868) ile gizlice nişanlandı. Annesi ile yakın ilişkisinden dolayı Adèle ile evlenmek için annesinin ölümüne (1821) kadar bekledi ve 1822'de evlendi.Adèle ve Victor Hugo'nun ilk çocuğu Leopold 1823'te doğdu ama doğduktan kısa süre sonra öldü. Sonraki sene kızları 28 Ağustos 1824'te Léopoldine doğdu. Onu 4 Kasım 1826'da doğan Charles, 28 Ekim 1828'de doğan François-Victor, ve 24 Ağustos 1830'da doğan Adèle takip etti.Hugo'nun en büyük ve en sevdiği kızı Léopoldine, Charles Vacquerie ile evliliğinden kısa süre sonra 19 yaşındayken 1843'te öldü. 4 Eylül 1843'te Seine nehrinde boğuldu. Gemi alabaro olduğundan ağır eteği tarafından dibe doğru çekildi ve kocası Charles Vacquerie de onu kurtarmaya çalışırken öldü. O zaman metresi ile Fransa'nın güneyinde seyahat etmekte olan Hugo kızının ölümünü oturduğu cafede okuduğu bir gazeteden öğrendi. Kızının ölümü Hugo'yu oldukça harap etti.III. Napolyon'un 1851 yılının sonundaki askeri darbesi sebebiyle sürgüne çıktı. Fransa'dan ayrıldıktan sonra, Channel Adaları'na gitmeden önce kısa bir süre Brüksel'de yaşadı. 1852'den 1855'e kadar Jersey'de yaşadı. 
1855'te 15 yıl yaşayacağı Guernsey'e taşındı. III. Napolyon 1859'da genel af ilan ettiğinde ülkesine dönme fırsatı elde ettiyse de sürgünde kalmayı tercih etti. Kaybedilen Fransa-Prusya Savaşı'nın sonucu olarak III. Napolyon iktidardan çekilmek zorunda kalınca ülkesine döndü. Paris Kuşatması'ndan sonra hayatının geri kalanını Fransa'da geçirmek için geri dönmeden önce tekrar Guernsey'e taşınıp 1872 ve 1873 arası orada kaldı. Hugo ilk romanını (Han d'Islande, 1823) evliliğinden bir yıl sonra yayımladı. Üç yıl sonra da ikinci romanı (Bug-Jargal, 1826) basıldı. 1829 ve 1840 arasında zamanının en iyi şairlerinden biri olarak ününü pekiştiren beş şiir kitabı (Les Orientales, 1829; Les Feuilles d'automne, 1831; Les Chants du crépuscule, 1835; Les Voix intérieures, 1837; ve Les Rayons et les ombres, 1840) yayınladı."
punctuations = ",;:()[]/{}''"
sentence = "!.?"
no_punct = ""
for char in text:
    if char not in punctuations:
        no_punct = no_punct + char
t_sen = ""
for char in no_punct:
    if char in sentence:
        t_sen = no_punct.split(char)
corpus = []
for cumle in t_sen:
    corpus.append(cumle.split())
model = Word2Vec(corpus, size=30, window=5, min_count=5, sg=1)
model.wv.most_similar('fransız')
I am wondering if your model returns anything for 'Fransız':
model.wv.most_similar('Fransız')
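You are not doing any preprocessing on the input text, so the vocabulary is case-sensitive: the model only knows the tokens exactly as they appear in the corpus. The text contains the capitalized 'Fransız' but never the lowercase 'fransız', so looking up the lowercase form cannot succeed.
Another likely reason (thanks for the suggestion, @gojomo) is the min_count parameter. Here it is 5, so any word that appears fewer than 5 times is dropped from the vocabulary. The word occurs only about 3 times in the text (counting both the lowercase and the capitalized versions), which is below that threshold, so even 'Fransız' may be missing.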
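To illustrate both points, here is a minimal sketch that uses only the standard library. It lowercases the text before tokenizing and then counts which words would survive a given min_count. The helper names build_corpus and surviving_vocab and the short sample string are made up for this example and are not part of gensim or of your code. Note also that Python's str.lower() is not locale-aware, so Turkish dotted/dotless I may need special handling on real data.

```python
import re
from collections import Counter


def build_corpus(text, lowercase=True):
    """Split text into sentences, then each sentence into word tokens."""
    if lowercase:
        # Normalize case so 'Fransız' and 'fransız' become the same token.
        text = text.lower()
    corpus = []
    # Split on runs of sentence-ending punctuation.
    for sentence in re.split(r"[.!?]+", text):
        # Drop the same punctuation characters the question removes.
        cleaned = re.sub(r"[,;:()\[\]/{}']", "", sentence)
        tokens = cleaned.split()
        if tokens:
            corpus.append(tokens)
    return corpus


def surviving_vocab(corpus, min_count):
    """Return the words (with counts) that occur at least min_count times."""
    counts = Counter(token for sent in corpus for token in sent)
    return {word: n for word, n in counts.items() if n >= min_count}


# Hypothetical short sample, not the text from the question.
sample = "Fransız şair ve yazar. En ünlü Fransız yazarlardan biri. Fransız edebiyatı."

corpus = build_corpus(sample)
vocab_5 = surviving_vocab(corpus, min_count=5)
vocab_1 = surviving_vocab(corpus, min_count=1)

print("fransız" in vocab_5)
print("fransız" in vocab_1)
```

A corpus built this way can be passed to Word2Vec in place of your corpus list, together with a lower min_count (for example 1) for a text this small. With so few occurrences the resulting vectors will not be very meaningful, but the lookup itself should stop raising the KeyError.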
You can use pandas.read_json() (ref 1) or pandas.json_normalize() (ref 2), but you first need to fix your input so that it is valid JSON.
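As a rough sketch of the json_normalize() route: the records below are trimmed-down stand-ins modelled on the data in the question. The document_id values "doc-a" and "doc-b" are hypothetical placeholders, and the enclosing [ ] is an assumption about how the fragment would be wrapped to make it valid JSON.

```python
import json
import pandas as pd

# Two shortened records modelled on the question's data.
# "doc-a" / "doc-b" are placeholder ids, not values from the original.
raw = '''
[
  {"search_id": 9086309,
   "document_id": "doc-a",
   "source_name": "PR TIMES",
   "source_reach": 16474017,
   "goal": [{"name": "Appeal"}, {"name": "Trust"}]},
  {"search_id": 9086309,
   "document_id": "doc-b",
   "source_name": "#DIME",
   "source_reach": 7164764,
   "goal": [{"name": "Appeal"}, {"name": "Trust"}]}
]
'''

# json.loads raises json.JSONDecodeError if the text is not valid JSON,
# which is a quick way to find out where the input is broken.
records = json.loads(raw)

# One row per document; nested lists such as "goal" stay as Python lists.
docs = pd.json_normalize(records)

# One row per entry of the nested "goal" list, keeping the parent document_id.
goals = pd.json_normalize(records, record_path="goal", meta=["document_id"])

print(docs[["document_id", "source_name", "source_reach"]])
print(goals)
```

Parsing with json.loads first keeps the "is it valid JSON?" question separate from the "how do I flatten it?" question.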
Tweepy: bad Authentication Data
I've been trying to do some basic sentiment analysis on some tweets about La Sagrada Familia, and cannot for the life of me figure out why I get this "Bad Authentication data" error:

Traceback (most recent call last):
  File "saDemo.py", line 15, in <module>
    public_tweets = api.search('')
  File "/Users/declancasey/opt/miniconda3/lib/python3.8/site-packages/tweepy/binder.py", line 252, in _call
    return method.execute()
  File "/Users/declancasey/opt/miniconda3/lib/python3.8/site-packages/tweepy/binder.py", line 234, in execute
    raise TweepError(error_msg, resp, api_code=api_error_code)
tweepy.error.TweepError: [{'code': 215, 'message': 'Bad Authentication data.'}]

I've seen other people have issues that relate to the keys they're using. I gave my original keys a few days in case they hadn't been activated yet, but I still get the same error. I've regenerated my keys several times over the past few days, messed with the formatting, and tried commenting out different lines, but I keep getting this error. I'm using Python 3.8 on macOS Big Sur. Any help would be appreciated. My code is below:

import tweepy
from textblob import TextBlob

consumer_key = "XXXX"
consumer_secret = "XXXX"
access_token = "XXXX"
access_token_secret = "XXXX"

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)

public_tweets = api.search("La Sagrada Familia")
for tweet in public_tweets:
    print(tweet.text)
    analysis = TextBlob(tweet.text)
    print(analysis.sentiment)
I used your code and could not reproduce the error (Python 3.7.4). I suggest you check your API keys by logging into your Twitter developer account, or try using a different key.

import tweepy
from textblob import TextBlob

# key is a list holding my own four credentials
consumer_key = key[0]
consumer_secret = key[1]
access_token = key[2]
access_token_secret = key[3]

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)

public_tweets = api.search("La Sagrada Familia")
for tweet in public_tweets:
    print(tweet.text)
    analysis = TextBlob(tweet.text)
    print(analysis.sentiment)

This is the output I got (I've removed the URLs from the output):

El Evangelio del dia 27 DE DICIEMBRE – DOMINGO -La Sagrada Familia: Jesús, María y José – Ciclo B SAN JUAN EVAN…
Sentiment(polarity=0.0, subjectivity=0.0)
Lecturas LSE: La Sagrada Familia (B), 27 de diciembre de 2020 a través de #YouTube
Sentiment(polarity=0.0, subjectivity=0.0)
LAUDES 2020/12/27 #LaudesFrayNelson para la Fiesta de la Sagrada Familia
Sentiment(polarity=0.0, subjectivity=0.0)
LECTIO 2020/12/27 LECTURA ESPIRITUAL. #LectioFrayNelson para la Fiesta de la Sagrada Familia
Sentiment(polarity=0.0, subjectivity=0.0)
RT #DespertaFerro11: Els que estigueu demà prop de la Sagrada Família, passeu a donar suport a #JuntsXCat! Es necessita el teu aval! #Junts…
Sentiment(polarity=0.0, subjectivity=0.0)
RT #HdadBorriquita: 🔴 ACTUALIDAD | La imagen de San Juan Evangelista se encuentra en el Altar Mayor de la Parroquia de San Agustín con moti…
Sentiment(polarity=0.0, subjectivity=0.0)
RT #alvpe: Basílica de la Sagrada Família, Barcelona.
Sentiment(polarity=0.0, subjectivity=0.0)
RT #DespertaFerro11: Els que estigueu demà prop de la Sagrada Família, passeu a donar suport a #JuntsXCat! Es necessita el teu aval! #Junts…
Sentiment(polarity=0.0, subjectivity=0.0)
RT #DespertaFerro11: Els que estigueu demà prop de la Sagrada Família, passeu a donar suport a #JuntsXCat! Es necessita el teu aval! #Junts…
Sentiment(polarity=0.0, subjectivity=0.0)
RT #bsainzm: hasta que una mujer alzó la mano. El padre agradeció a la mujer y pidió a la familia de migrantes que pasara al frente. La "fa…
Sentiment(polarity=0.0, subjectivity=0.0)
RT #DespertaFerro11: Els que estigueu demà prop de la Sagrada Família, passeu a donar suport a #JuntsXCat! Es necessita el teu aval! #Junts…
Sentiment(polarity=0.0, subjectivity=0.0)
RT #SemiCuenca: El Oratorio y la Capilla del Seminario se “visten” de Navidad. Sagrada Familia, sed guía y estímulo para padres e hijos. Qu…
Sentiment(polarity=0.0, subjectivity=0.0)
RT #DespertaFerro11: Els que estigueu demà prop de la Sagrada Família, passeu a donar suport a #JuntsXCat! Es necessita el teu aval! #Junts…
Sentiment(polarity=0.0, subjectivity=0.0)
La sagrada familia de Nazaret, la de Jesús, era a ojos de su tiempo una familia desestructurada: un hijo inesperado…
Sentiment(polarity=0.0, subjectivity=0.0)
RT #DespertaFerro11: Els que estigueu demà prop de la Sagrada Família, passeu a donar suport a #JuntsXCat! Es necessita el teu aval! #Junts…
Sentiment(polarity=0.0, subjectivity=0.0)
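Since the same code works with a different set of keys, one way to narrow this down is to test the credentials on their own before running any search. The sketch below is only an illustration: the helper names suspicious_credentials and credentials_work are hypothetical, and the placeholder check (a value made up only of "X") is an assumption based on the "XXXX" values in the question. It does rely on tweepy.OAuthHandler and API.verify_credentials(), which Tweepy provides.

```python
def suspicious_credentials(creds):
    """Return the names of credentials that are empty, padded with
    whitespace, or still look like an 'XXXX' placeholder."""
    bad = []
    for name, value in creds.items():
        if not value or value != value.strip() or set(value) == {"X"}:
            bad.append(name)
    return bad


def credentials_work(creds):
    """Ask Twitter to confirm the credentials; returns True or False."""
    import tweepy  # imported here so the offline check above runs without Tweepy

    auth = tweepy.OAuthHandler(creds["consumer_key"], creds["consumer_secret"])
    auth.set_access_token(creds["access_token"], creds["access_token_secret"])
    try:
        # Depending on the Tweepy version this returns a user object,
        # returns False, or raises when the credentials are rejected.
        return bool(tweepy.API(auth).verify_credentials())
    except Exception:  # the exception class name differs between Tweepy versions
        return False


# Offline check on deliberately broken sample values.
sample = {
    "consumer_key": "abc123",
    "consumer_secret": " abc123 ",  # stray spaces from copy and paste
    "access_token": "",             # never filled in
    "access_token_secret": "XXXX",  # placeholder left in place
}
print(suspicious_credentials(sample))
```

If the offline check comes back clean and credentials_work() still returns False, the problem is more likely on the developer-account side than in the script.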
Split a column in pandas twice to multiple columns
I have a column "Nome_propriedade" with complete addresses: establishment name, street, neighborhood, city and state. Each value always ends with the name of the city and the state, following this pattern:

Nome_propriedade
"Rod. BR 386, bairro Olarias/Conventos, Lajeado/RS"
"Fazenda da Várzea - zona rural, Serro/MG"
"Cidade do Rock - Jacarepaguá, Rio de Janeiro/RJ"
"Área de extração de carnaúba - Povoado Areal, zona rural, Santa Cruz do Piauí/PI"
"Pastelaria - Av. Vicente de Carvalho, 995, Loja Q, Vila da Penha, Rio de Janeiro/RJ"

I want to create two new columns, "city" and "state", and fill them with the last values found in column "Nome_propriedade". I also want to strip those away from Nome_propriedade.

Nome_propriedade                             City                 State
Rod. BR 386, bairro Olarias/Conventos        Lajeado              RS
Fazenda da Várzea - zona rural               Serro                MG
Cidade do Rock - Jacarepaguá...              Rio de Janeiro       RJ
Área de extração de carnaúba - Povoado A...  Santa Cruz do Piauí  PI
Pastelaria - Av. Vicente de Carvalho, 99...  Rio de Janeiro       RJ

Does anyone know how I can create these two columns? I cannot do a general split because I only want to separate the city and state information. The other information should remain unchanged.
What do you think about:

import pandas as pd

propiedades = ["Rod. BR 386, bairro Olarias/Conventos, Lajeado/RS",
               "Fazenda da Várzea - zona rural, Serro/MG",
               "Cidade do Rock - Jacarepaguá, Rio de Janeiro/RJ",
               "Área de extração de carnaúba - Povoado Areal, zona rural, Santa Cruz do Piauí/PI",
               "Pastelaria - Av. Vicente de Carvalho, 995, Loja Q, Vila da Penha, Rio de Janeiro/RJ"]
df = pd.DataFrame({"Nome_propriedade": propiedades})

# Take the part after the last comma, strip the space left behind by the
# comma, then split "City/State" into two columns.
df[["City", "State"]] = (df["Nome_propriedade"]
                         .apply(lambda x: x.split(",")[-1])
                         .str.strip()
                         .str.split("/", expand=True))

UPDATE

If you then want to delete this information from Nome_propriedade, you can add this line:

df["Nome_propriedade"] = df["Nome_propriedade"].apply(lambda x: ",".join(x.split(",")[:-1]))
You need to split the string in the column by ",", take the last element of the resulting list, and split it by "/". That list gives you your two columns.

pd.DataFrame(list(df['Nome_propriedade'].str.split(',').apply(lambda x: x[-1]).str.split('/')), columns=['city', 'state'])

Output:

                   city state
0               Lajeado    RS
1                 Serro    MG
2        Rio de Janeiro    RJ
3   Santa Cruz do Piauí    PI
4        Rio de Janeiro    RJ
Here is an effective solution avoiding the tedious apply and simply sticking with str-operations.

df["Nome_propriedade"], x = df["Nome_propriedade"].str.rsplit(', ', 1).str
df["City"], df['State'] = x.str.split('/').str

Full example:

import pandas as pd

propiedades = [
    "Rod. BR 386, bairro Olarias/Conventos, Lajeado/RS",
    "Fazenda da Várzea - zona rural, Serro/MG",
    "Cidade do Rock - Jacarepaguá, Rio de Janeiro/RJ",
    "Área de extração de carnaúba - Povoado Areal, zona rural, Santa Cruz do Piauí/PI",
    "Pastelaria - Av. Vicente de Carvalho, 995, Loja Q, Vila da Penha, Rio de Janeiro/RJ"
]

df = pd.DataFrame({
    "Nome_propriedade": propiedades
})

df["Nome_propriedade"], x = df["Nome_propriedade"].str.rsplit(', ', 1).str
df["City"], df['State'] = x.str.split('/').str

# Stripping Nome_propriedade to len 40 to fit screen
print(df.assign(Nome_propriedade=df['Nome_propriedade'].str[:40]))

Returns:

   Nome_propriedade                          City                 State
0  Rod. BR 386, bairro Olarias/Conventos     Lajeado              RS
1  Fazenda da Várzea - zona rural            Serro                MG
2  Cidade do Rock - Jacarepaguá              Rio de Janeiro       RJ
3  Área de extração de carnaúba - Povoado A  Santa Cruz do Piauí  PI
4  Pastelaria - Av. Vicente de Carvalho, 99  Rio de Janeiro       RJ

If you'd like to keep the items:

df["City"], df['State'] = df["Nome_propriedade"]\
    .str.rsplit(', ', 1).str[-1]\
    .str.split('/').str
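One caveat on the answer above: unpacking the .str accessor into two variables, and passing the split count positionally, rely on older pandas behaviour. As far as I can tell, both were removed in pandas 2.0. A sketch of the same idea that should work on old and new versions alike uses n=1 and expand=True. The intermediate name parts is introduced here for illustration.

```python
import pandas as pd

propiedades = [
    "Rod. BR 386, bairro Olarias/Conventos, Lajeado/RS",
    "Fazenda da Várzea - zona rural, Serro/MG",
    "Cidade do Rock - Jacarepaguá, Rio de Janeiro/RJ",
    "Área de extração de carnaúba - Povoado Areal, zona rural, Santa Cruz do Piauí/PI",
    "Pastelaria - Av. Vicente de Carvalho, 995, Loja Q, Vila da Penha, Rio de Janeiro/RJ",
]
df = pd.DataFrame({"Nome_propriedade": propiedades})

# Split once from the right on ", ": column 0 is the address without the
# city/state suffix, column 1 is the "City/State" suffix.
parts = df["Nome_propriedade"].str.rsplit(", ", n=1, expand=True)

df["Nome_propriedade"] = parts[0]
# Split the suffix on "/" into the two new columns.
df[["City", "State"]] = parts[1].str.split("/", n=1, expand=True)

print(df)
```

Because the separator is ", " rather than ",", the city comes out without a leading space and no extra strip is needed.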
The easiest approach I can see is, for a single example:

example = 'some, stuff, here, city/state'
elements = example.split(',')
city, state = elements[-1].split('/')

To apply this to the column in your dataframe:

df['city_state'] = df.Nome_propriedade.apply(lambda r: r.split(',')[-1].split('/'))
df['city'] = [cs[0] for cs in df['city_state']]
df['state'] = [cs[1] for cs in df['city_state']]

For example:

example2 = 'another, thing here city2/state2'
df = pd.DataFrame({'address': [example, example2], 'other': [1, 2]})
df['city_state'] = df.address.apply(lambda r: r.split()[-1].split('/'))
df['city'] = [cs[0] for cs in df['city_state']]
df['state'] = [cs[1] for cs in df['city_state']]
df.drop(columns=['city_state'], inplace=True)
print(df)
#                             address  other   city   state
# 0     some, stuff, here, city/state      1   city   state
# 1  another, thing here city2/state2      2  city2  state2

Note: some of the other answers provide a more efficient way to unpack the result into your dataframe. I'll leave this here because I think breaking it out into steps is illustrative, but for efficiency's sake, I'd go with one of the others.
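A different technique from the splits used in these answers is a single regular expression with named groups, applied through Series.str.extract. The sketch below assumes that every value ends in ", City/UF" where UF is exactly two uppercase letters. That holds for the sample rows but is an assumption about the full data. Rows that do not match come back as NaN.

```python
import pandas as pd

df = pd.DataFrame({"Nome_propriedade": [
    "Rod. BR 386, bairro Olarias/Conventos, Lajeado/RS",
    "Fazenda da Várzea - zona rural, Serro/MG",
    "Cidade do Rock - Jacarepaguá, Rio de Janeiro/RJ",
]})

# Greedy ".*" takes everything up to the last comma; the city is the text
# between that comma and "/", and the state is the two letters that follow.
pattern = r"^(?P<Nome_propriedade>.*),\s*(?P<City>[^,/]+)/(?P<State>[A-Z]{2})$"

# str.extract returns one column per named group.
extracted = df["Nome_propriedade"].str.extract(pattern)
print(extracted)
```

The pattern doubles as a validity check: counting the NaN rows in extracted shows how many addresses break the expected ending.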
Unicode elements in list save to file
I have two questions:

1) What have I done wrong in the script below? The result is not encoded properly and all non-standard characters are stored incorrectly. When I print out the data list, it gives me a proper list of unicode types:

[u'Est-ce que tu peux traduire \xc3\xa7a pour moi? \n \n \n Can you translate this for me?'], [u'Chicago est tr\xc3\xa8s diff\xc3\xa9rente de Boston. \n \n \n Chicago is very different from Boston.'],

After that I strip all extra spaces and newlines, and the result in the file looks like this (it looks the same when printed and when saved to file):

Est-ce que tu peux traduire ça pour moi?;Can you translate this for me?
Chicago est très différente de Boston.;Chicago is very different from Boston.

2) What scripting language other than Python would you recommend?

import requests
import unicodecsv, os
from bs4 import BeautifulSoup
import re
import html5lib

countries = ["fr"] #,"id","bn","my","chin","de","es","fr","hi","ja","ko","pt","ru","th","vi","zh"]
for country in countries:
    f = open("phrase_" + country + ".txt","w")
    w = unicodecsv.writer(f, encoding='utf-8')
    toi = 1
    print country
    while toi<2:
        url = "http://www.englishspeak.com/"+ country +"/english-phrases.cfm?newCategoryShowed=" + str(toi) + "&sortBy=28"
        r = requests.get(url)
        soup = BeautifulSoup(r.content, 'html5lib')
        soup.unicode
        [s.extract() for s in soup('script')]
        [s.extract() for s in soup('style')]
        [s.extract() for s in soup('head')]
        [s.extract() for s in soup("table" , { "height" : "102" })]
        [s.extract() for s in soup("td", { "class" : "copyLarge"})]
        [s.extract() for s in soup("td", { "width" : "21%"})]
        [s.extract() for s in soup("td", { "colspan" : "3"})]
        [s.extract() for s in soup("td", { "width" : "25%"})]
        [s.extract() for s in soup("td", { "class" : "blacktext"})]
        [s.extract() for s in soup("div", { "align" : "center"})]
        data = []
        rows = soup.find_all('tr', {"class": re.compile("Data.")})
        for row in rows:
            cols = row.find_all('td')
            cols = [ele.text.strip() for ele in cols]
            data.append([ele for ele in cols if ele])
        wordsList = []
        for index, item in enumerate(data):
            str_tmp = "".join(data[index]).encode('utf-8')
            str_tmp = re.sub(r' +\n\s+', ';', str_tmp)
            str_tmp = re.sub(r' +', ' ', str_tmp)
            wordsList.append(str_tmp.decode('utf-8'))
            print str_tmp
        w.writerow(wordsList)
        toi += 1
You should use r.text not r.content because content are the bytes and text is the decoded text: soup = BeautifulSoup(r.text, 'html5lib')
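To see why the bytes/text distinction matters, it may help to look at what the \xc3\xa7 in the question's printed list actually is. The two-character sequence is what the UTF-8 encoding of "ç" looks like when each byte is treated as a separate character. That reading is an inference from the output shown, not something confirmed by the asker. A small sketch, written for Python 3 rather than the Python 2 used in the question:

```python
# UTF-8 bytes for "ça", as they would arrive over the network.
raw = "ça".encode("utf-8")  # b'\xc3\xa7a'

# Decoding with the wrong codec turns each byte into its own character,
# giving the '\xc3\xa7a' pattern seen in the question.
wrong = raw.decode("latin-1")

# Decoding with the codec the bytes were written in gives the original text.
right = raw.decode("utf-8")

# Text that was mis-decoded as Latin-1 can be repaired by reversing the step.
repaired = wrong.encode("latin-1").decode("utf-8")

print(repr(wrong), repr(right), repr(repaired))
```

The same round trip is a handy diagnostic: if encoding a suspicious string as Latin-1 and decoding it as UTF-8 produces readable text, the bytes were decoded with the wrong codec somewhere upstream.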
You can just write utf-8 encoded to file: with open("out.txt","w") as f: for d in data: d = " ".join(d).encode("utf-8") d = re.sub(r'\n\s+', ';', d) d = re.sub(r' +', ' ', d) f.write(d) Output: Fais attention en conduisant. ;Be careful driving.Fais attention. ;Be careful.Est-ce que tu peux traduire ça pour moi? ;Can you translate this for me?Chicago est très différente de Boston. ;Chicago is very different from Boston.Ne t'inquiète pas. ;Don't worry.Tout le monde le sais. ;Everyone knows it.Tout est prêt. ;Everything is ready.Excellent. ;Excellent.De temps en temps. ;From time to time.Bonne idée. ;Good idea.Il l'aime beaucoup. ;He likes it very much.A l'aide! ;Help!Il arrive bientôt. ;He's coming soon.Il a raison. ;He's right.Il est très ennuyeux. ;He's very annoying.Il est très célèbre. ;He's very famous.Comment ça va? ;How are you?Comment va le travail? ;How's work going?Dépêche-toi! ;Hurry!J'ai déjà mangé. ;I ate already.Je ne vous entends pas. ;I can't hear you.Je ne sais pas m'en servir. ;I don't know how to use it.Je ne l'aime pas. ;I don't like him.Je ne l'aime pas. ;I don't like it.Je ne parle pas très bien. ;I don't speak very well.Je ne comprends pas. ;I don't understand.Je n'en veux pas. ;I don't want it.Je ne veux pas ça. ;I don't want that.Je ne veux pas te déranger. ;I don't want to bother you.Je me sens bien. ;I feel good.Je sors du travail à six heures. ;I get off of work at 6.J'ai mal à la tête. ;I have a headache.J'espère que votre femme et vous ferez un bon voyage. ;I hope you and your wife have a nice trip.Je sais. ;I know.Je l'aime. ;I like her.J'ai perdu ma montre. ;I lost my watch.Je t'aime. ;I love you.J'ai besoin de changer de vêtements. ;I need to change clothes.J'ai besoin d'aller chez moi. ;I need to go home.Je veux seulement un en-cas. ;I only want a snack.Je pense que c'est bon. ;I think it tastes good.Je pense que c'est très bon. ;I think it's very good.Je pensais que les vêtements étaient plus chers. 
;I thought the clothes were cheaper.J'allais quitter le restaurant quand mes amis sont arrivés. ;I was about to leave the restaurant when my friends arrived.Je voudrais faire une promenade. ;I'd like to go for a walk.Si vous avez besoin de mon aide, faites-le-moi savoir s'il vous plaît. ;If you need my help, please let me know.Je t'appellerai vendredi. ;I'll call you when I leave.Je reviendrai plus tard. ;I'll come back later.Je paierai. ;I'll pay.Je vais le prendre. ;I'll take it.Je t'emmenerai à l'arrêt de bus. ;I'll take you to the bus stop.Je suis un Américain. ;I'm an American.Je nettoie ma chambre. ;I'm cleaning my room.J'ai froid. ;I'm cold.Je viens te chercher. ;I'm coming to pick you up.Je vais partir. ;I'm going to leave.Je vais bien, et toi? ;I'm good, and you?Je suis content. ;I'm happy.J'ai faim. ;I'm hungry.Je suis marié. ;I'm married.Je ne suis pas occupé. ;I'm not busy.Je ne suis pas marié. ;I'm not married.Je ne suis pas encore prêt. ;I'm not ready yet.Je ne suis pas sûr. ;I'm not sure.Je suis désolé, nous sommes complets. ;I'm sorry, we're sold out.J'ai soif. ;I'm thirsty.Je suis très occupé. Je n'ai pas le temps maintenant. ;I'm very busy. I don't have time now.Est-ce que Monsieur Smith est un Américain? ;Is Mr. Smith an American?Est-ce que ça suffit? ;Is that enough?C'est plus long que deux kilomètres. ;It's longer than 2 miles.Je suis ici depuis deux jours. ;I've been here for two days.J'ai entendu dire que le Texas était beau comme endroit. ;I've heard Texas is a beautiful place.Je n'ai jamais vu ça avant. ;I've never seen that before.Juste un peu. ;Just a little.Juste un moment. ;Just a moment.Laisse-moi vérifier. ;Let me check.laisse-moi y réfléchir. ;Let me think about it.Allons voir. ;Let's go have a look.Pratiquons l'anglais. ;Let's practice English.Pourrais-je parler à madame Smith s'il vous plaît? ;May I speak to Mrs. Smith please?Plus que ça. ;More than that.Peu importe. ;Never mind.La prochaine fois. ;Next time.Non, merci. 
;No, thank you.Non. ;No.N'importe quoi. ;Nonsense.Pas récemment. ;Not recently.Pas encore. ;Not yet.Rien d'autre. ;Nothing else.Bien sûr. ;Of course.D'accord. ;Okay.S'il vous plaît remplissez ce formulaire. ;Please fill out this form.S'il vous plaît emmenez-moi à cette adresse. ;Please take me to this address.S'il te plaît écris-le. ;Please write it down.Vraiment? ;Really?Juste ici. ;Right here.Juste là. ;Right there.A bientôt. ;See you later.A demain. ;See you tomorrow.A ce soir. ;See you tonight.Elle est jolie. ;She's pretty.Désolé de vous déranger. ;Sorry to bother you.Arrête! ;Stop!Tente ta chance. ;Take a chance.Réglez ça dehors. ;Take it outside.Dis-moi. ;Tell me.Merci Mademoiselle. ;Thank you miss.Merci Monsieur. ;Thank you sir.Merci beaucoup. ;Thank you very much.Merci. ;Thank you.Merci pour tout. ;Thanks for everything.Merci pour ton aide. ;Thanks for your help.Ça a l'air super. ;That looks great.Ça sent mauvais. ;That smells bad.C'est pas mal. ;That's alright.Ça suffit. ;That's enough.C'est bon. ;That's fine.C'est tout. ;That's it.Ce n'est pas juste. ;That's not fair.Ce n'est pas vrai. ;That's not right.C'est vrai. ;That's right.C'est dommage. ;That's too bad.C'est trop. ;That's too many.C'est trop. ;That's too much.Le livre est sous la table. ;The book is under the table.Ils vont revenir tout de suite. ;They'll be right back.Ce sont les mêmes. ;They're the same.Ils sont très occupés. ;They're very busy.Ça ne marche pas. ;This doesn't work.C'est très difficile. ;This is very difficult.C'est très important. ;This is very important.Essaie-le/la. ;Try it.Très bien, merci. ;Very good, thanks.Nous l'aimons beaucoup. ;We like it very much.Voudriez-vous prendre un message s'il vous plaît? ;Would you take a message please?Oui, vraiment. ;Yes, really.Vos affaires sont toutes là. ;Your things are all here.Tu es belle. ;You're beautiful.Tu es très sympa. ;You're very nice.Tu es très intelligent. ;You're very smart. 
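For readers on Python 3, the same clean-up can be sketched without any manual encode step by letting open() handle the encoding. The sample data below is copied from the two rows shown in the question. The helper name clean, the temporary output path, and the added newline after each row are choices made for this illustration, not part of the answer above.

```python
import os
import re
import tempfile

# Two rows shaped like the scraped data in the question.
data = [
    ["Est-ce que tu peux traduire ça pour moi? \n \n \n Can you translate this for me?"],
    ["Chicago est très différente de Boston. \n \n \n Chicago is very different from Boston."],
]


def clean(cells):
    """Join the cells, turn the newline-filled gap into ';' and squeeze spaces."""
    text = " ".join(cells)
    text = re.sub(r"\s*\n\s+", ";", text)
    return re.sub(r" +", " ", text)


path = os.path.join(tempfile.mkdtemp(), "out.txt")

# encoding="utf-8" makes the file object do the encoding, so the loop
# only ever handles text (str), never bytes.
with open(path, "w", encoding="utf-8") as f:
    for cells in data:
        f.write(clean(cells) + "\n")

with open(path, encoding="utf-8") as f:
    lines = f.read().splitlines()
print(lines)
```

Writing one row per line also avoids the run-together output shown above, where each phrase pair follows the previous one without a separator.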
Also you don't actually use the data in your list comps so they seem a little pointless: