I'm working on a project that deciphers the secret sentence.
When I input
apple.appleapple.pear.orange.lemon
I want it to change into
A.B.E.R.T
I used split and replace to do it. However, I can't find a way to change
"apple" into A and
"appleapple" into B
at the same time because when I use replace(), appleapple changes into AA
This is what I've tried.
list1 = n.split()
list2 = f's.split([\\.]) : {list1}'
print(list2.replace("apple", "A"))
print(list2.replace("appleapple", "B"))
print(list2)
You can use a dictionary instead
secret_dict = {'apple':'A','appleapple':'B','pear':'E','orange':'R','lemon':'T'}
n = 'apple.appleapple.pear.orange.lemon'
words_in_n=n.split('.')
resulting_secret_words = [secret_dict.get(word) for word in words_in_n]
secret_sentence = '.'.join(resulting_secret_words)
print(secret_sentence)
I think that in this case you should make use of dictionaries instead of replacing all the time. You will be thankful to do so if the 'vocabulary' of your project increases. I would do this:
dictionary = {
'apple': 'A',
'appleapple': 'B',
'pear': 'E',
'orange': 'R',
'lemon': 'T'
}
original = 'apple.appleapple.pear.orange.lemon'
words_list = original.split('.')
result = [dictionary.get(word, 'unknown') for word in words_list]
result = '.'.join(result)
print(result)
The above would print this:
A.B.E.R.T
Note the use of the dictionary's get() method to add a default value if the read word is not found in your vocabulary. For example, with the same dictionary and the string apple.appleapple.pear.orange.lemon.otherthing (I added 'otherthing' at the end) we would have as result the string A.B.E.R.T.unknown.
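To see why chained replace() calls cannot work here while a per-word lookup does, here is a minimal sketch. The helper name decode is only an illustration and does not come from the question; the vocabulary is the one used above.

```python
# Hypothetical helper for illustration; the vocabulary is taken from the answer above.
vocabulary = {'apple': 'A', 'appleapple': 'B', 'pear': 'E', 'orange': 'R', 'lemon': 'T'}

def decode(sentence, default='unknown'):
    """Translate each dot-separated word as a whole, never as a substring."""
    return '.'.join(vocabulary.get(word, default) for word in sentence.split('.'))

text = 'apple.appleapple.pear.orange.lemon'

# replace() works on substrings, so 'appleapple' has already become 'AA'
# by the time the second call runs, and there is nothing left for it to match.
chained = text.replace('apple', 'A').replace('appleapple', 'B')
print(chained)        # A.AA.pear.orange.lemon
print(decode(text))   # A.B.E.R.T
```

Looking up whole words avoids any dependence on the order in which the replacements are applied.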
I am new to the regex module and am trying a simple case: extracting the keys and values from a simple dictionary.
The dictionary cannot contain nested dicts or lists, but may contain simple tuples.
MWE
import re
# note: the dictionary are simple and does NOT contains list, nested dicts, just these two example suffices for the regex matching.
d = "{'a':10,'b':True,'c':(5,'a')}" # ['a', 10, 'b', True, 'c', (5,'a') ]
d = "{'c':(5,'a'), 'd': 'TX'}" # ['c', (5,'a'), 'd', 'TX']
regexp = r"(.*):(.*)" # I am not sure how to repeat this pattern separated by ,
out = re.match(regexp,d).groups()
out
You should not use regex for this job. When the input string is valid Python syntax, you can use ast.literal_eval.
Like this:
import ast
# ...
out = ast.literal_eval(d)
Now you have a dictionary object in Python. You can for instance get the key/value pairs in a (dict_items) list:
print(out.items())
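If the flat key, value, key, value list from the question is what you are after, it could be built from the parsed dictionary like this. This is a sketch that assumes Python 3.7+, where dictionaries keep insertion order; the name flat is made up for the example.

```python
import ast

d = "{'a':10,'b':True,'c':(5,'a')}"
parsed = ast.literal_eval(d)   # a real dict: {'a': 10, 'b': True, 'c': (5, 'a')}

# Flatten the key/value pairs into one list: key, value, key, value, ...
flat = [item for pair in parsed.items() for item in pair]
print(flat)   # ['a', 10, 'b', True, 'c', (5, 'a')]
```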
Regex
Regex is not the right tool. There will always be some boundary case that is parsed wrongly. But to get the repeated matches, you are better off using findall. Here is a simple example regex:
regexp = r"([^{\s][^:]*):([^:}]*)(?:[,}])"
out = re.findall(regexp, d)
This will give a list of pairs.
Regex would be hard (perhaps impossible, but I'm not versed enough to say confidently) to use because of the ',' nested in your tuples. Just for the sake of it, I wrote (regex-less) code to parse your string for separators, ignoring parts inside parentheses:
d = "{'c':(5,'a',1), 'd': 'TX', 1:(1,2,3)}"
d=d.replace("{","").replace("}","")
indices = []
inside = False
for i,l in enumerate(d):
    if inside:
        if l == ")":
            inside = False
            continue
        continue
    if l == "(":
        inside = True
        continue
    if l in {":",","}:
        indices.append(i)
indices.append(len(d))
parts = []
start = 0
for i in indices:
    parts.append(d[start:i].strip())
    start = i+1
parts
# ["'c'", "(5,'a',1)", "'d'", "'TX'", '1', '(1,2,3)']
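The pieces are still strings at this point. One possible follow-up, not part of the code above, is to evaluate each piece with ast.literal_eval and pair them up again; the names values and result are assumptions made for this sketch.

```python
import ast

# The output of the parsing loop above, copied verbatim.
parts = ["'c'", "(5,'a',1)", "'d'", "'TX'", '1', '(1,2,3)']

# Turn every piece into a Python object, then pair keys (even positions)
# with values (odd positions).
values = [ast.literal_eval(p) for p in parts]
result = dict(zip(values[0::2], values[1::2]))
print(result)   # {'c': (5, 'a', 1), 'd': 'TX', 1: (1, 2, 3)}
```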
Good day I just want to understand the logic behind this code
lst = []
word = "ABCD"
lst[:0] = word
print(lst)
OUTPUT: ['A', 'B', 'C', 'D']. Why not ['ABCD']?
for i in word: # this code I understand it's looping through the string
    lst.append(i) # then appending to list
but the first code above I don't get the logic.
lst[:0] = ... is implemented by lst.__setitem__(slice(None, 0, None), ...), where ... is an arbitrary iterable.
The resulting slice is the empty list at the beginning of lst (though in this case, it doesn't really matter since lst is empty), so each element of ... is inserted into lst, starting at the beginning.
You can see this starting with a non-empty list.
>>> lst = [1,2,3]
>>> word = "ABCD"
>>> lst[:0] = word
>>> lst
['A', 'B', 'C', 'D', 1, 2, 3]
To get lst == ['ABCD'], you need to make the right-hand side an iterable containing the string:
lst[:0] = ('ABCD', ) # ['ABCD'] would also work.
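The equivalence can be checked directly. The sketch below compares the slice-assignment syntax with an explicit __setitem__ call; the variable names a, b and c are arbitrary.

```python
word = "ABCD"

a = []
a[:0] = word                          # slice-assignment syntax

b = []
b.__setitem__(slice(None, 0), word)   # the equivalent explicit call

c = []
c[:0] = [word]                        # one-element iterable: the string stays whole

print(a)   # ['A', 'B', 'C', 'D']
print(b)   # ['A', 'B', 'C', 'D']
print(c)   # ['ABCD']
```

A string is itself an iterable of characters, which is why assigning it to a slice spreads it out into single letters.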
Actually, this is a well-known way to convert a string to a list, character by character.
You can find it here -> https://www.geeksforgeeks.org/python-program-convert-string-list/
If you want the whole string 'ABCD' as a single list element, try
lst[:0] = [word,]
Doing that specifies that you need the whole word as one element.
I need to find the longest string in a list for each letter of the alphabet.
My first straight forward approach looked like this:
alphabet = ["a","b", ..., "z"]
text = ["ane4", "anrhgjt8", "andhjtje9", "ajhe5", "]more_crazy_words"]
result = {key:"" for key in alphabet} # create a dictionary
# go through all words, if that word is longer than the current longest, save it
for word in text:
    if word[0].lower() in alphabet and len(result[word[0].lower()]) < len(word):
        result[word[0].lower()] = word.lower()
print(result)
which returns:
{'a': 'andhjtje9'}
as it is supposed to do.
In order to practice dictionary comprehension I tried to solve this in just one line:
result2 = {key:"" for key in alphabet}
result2 = {word[0].lower(): word.lower() for word in text if word[0].lower() in alphabet and len(result2[word[0].lower()]) < len(word)}
I just copied the if statement into the comprehension loop...
result2, however, is:
{'a': 'ajhe5'}
Can someone explain to me why this is the case? I feel like I did exactly the same as in the first loop...
Thanks for any help!
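A list, dict, or set comprehension cannot refer to the object it is building. Inside your comprehension, result2 still names the old dictionary of empty strings, because the name is only rebound once the comprehension has finished. The length test therefore passes for every word, and the last word for each key wins. That is why you do not get what you want.
You can use a more complicated dictionary comprehension to do this. With the help of itertools.groupby on a sorted list, it could look like this: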
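A small experiment makes the behaviour visible. It is a reduced sketch of the code from the question: only the key 'a' is pre-filled, the .lower() calls and the alphabet check are dropped, and seen_lengths is a name invented for the demonstration.

```python
text = ["ane4", "anrhgjt8", "andhjtje9", "ajhe5"]

result2 = {"a": ""}   # the pre-filled dict; only 'a' is needed for this demo

# While the comprehension runs, the name result2 still refers to the dict above,
# so the stored length is 0 for every word and the filter rejects nothing.
seen_lengths = [len(result2[w[0]]) for w in text]
print(seen_lengths)   # [0, 0, 0, 0]

# Every word is therefore written under key 'a', and the last one wins.
result2 = {w[0]: w for w in text if len(result2[w[0]]) < len(w)}
print(result2)        # {'a': 'ajhe5'}
```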
from string import ascii_lowercase
from itertools import groupby
text = ["ane4", "anrhgjt8", "andhjtje9", "ajhe5", "]more_crazy_words"]
d = {key:sorted(value, key=len)[-1]
for key,value in groupby((s for s in sorted(text)
if s[0].lower() in frozenset(ascii_lowercase)),
lambda x:x[0].lower())}
print(d) # {'a': 'andhjtje9'}
or
text = ["ane4", "anrhgjt8", "andhjtje9", "ajhe5", "]more_crazy_words"]
d = {key:next(value) for key,value in groupby(
(s for s in sorted(text, key=lambda x: (x[0],-len(x)))
if s[0].lower() in frozenset(ascii_lowercase)),
lambda x:x[0].lower())}
print(d) # {'a': 'andhjtje9'}
or several other ways ... but why would you?
Having it as for loops is much cleaner and easier to understand and would, in this case, probably follow the Zen of Python better.
Read the Zen of Python by running:
import this
I have a list of strings (from a .tt file) that looks like this:
list1 = ['have\tVERB', 'and\tCONJ', ..., 'tree\tNOUN', 'go\tVERB']
I want to turn it into a dictionary that looks like:
dict1 = { 'have':'VERB', 'and':'CONJ', 'tree':'NOUN', 'go':'VERB' }
I was thinking of substitution, but it doesn't work that well. Is there a way to tag the tab string '\t' as a divider?
Try the following:
dict1 = dict(item.split('\t') for item in list1)
Output:
>>>dict1
{'and': 'CONJ', 'go': 'VERB', 'tree': 'NOUN', 'have': 'VERB'}
Since str.split also splits on '\t' by default ('\t' is considered white space), you could get a functional approach by feeding dict with a map that looks quite elegant:
d = dict(map(str.split, list1))
With the dictionary d now being in the wanted form:
print(d)
{'and': 'CONJ', 'go': 'VERB', 'have': 'VERB', 'tree': 'NOUN'}
If you need a split only on '\t' (while ignoring ' ' and '\n') and still want to use the map approach, you can create a partial object with functools.partial that only uses '\t' as the separator:
from functools import partial
# only splits on '\t', ignoring newlines, spaces, etc.
tabsplit = partial(str.split, sep='\t')
d = dict(map(tabsplit, list1))
this, of course, yields the same result for d using the sample list of strings.
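The difference only shows up when an entry contains other whitespace. A quick sketch with a made-up entry ('New York' is not in the original list):

```python
from functools import partial

# Same partial object as above: split on tabs only.
tabsplit = partial(str.split, sep='\t')

entry = 'New York\tNOUN'   # hypothetical entry containing a space

default_parts = str.split(entry)   # splits on every run of whitespace
tab_parts = tabsplit(entry)        # splits on the tab only

print(default_parts)   # ['New', 'York', 'NOUN']
print(tab_parts)       # ['New York', 'NOUN']
```

With the default split, such an entry yields three pieces instead of two, so dict() could no longer build a key/value pair from it.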
Do that with a simple dict comprehension and str.split (without arguments, split splits on whitespace):
list1 = ['have\tVERB', 'and\tCONJ', 'tree\tNOUN', 'go\tVERB']
dict1 = {x.split()[0]:x.split()[1] for x in list1}
result:
{'and': 'CONJ', 'go': 'VERB', 'tree': 'NOUN', 'have': 'VERB'}
EDIT: the x.split()[0]:x.split()[1] does split twice, which is not optimal. Other answers here do it better without dict comprehension.
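If you still prefer a dict comprehension, one way to split each item only once is to unpack the pair from an inner generator. This is a sketch, assuming every item contains exactly one tab; the names word and tag are arbitrary.

```python
list1 = ['have\tVERB', 'and\tCONJ', 'tree\tNOUN', 'go\tVERB']

# The inner generator splits each item once; the outer comprehension
# unpacks the resulting two-element list into key and value.
dict1 = {word: tag for word, tag in (x.split('\t') for x in list1)}
print(dict1)   # {'have': 'VERB', 'and': 'CONJ', 'tree': 'NOUN', 'go': 'VERB'}
```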
A short way to solve the problem, since split method splits '\t' by default (as pointed out by Jim Fasarakis-Hilliard), could be:
dictionary = dict(item.split() for item in list1)
print(dictionary)
I also wrote down a more simple and classic approach.
Not very pythonic but easy to understand for beginners:
list1 = ['have\tVERB', 'and\tCONJ', 'tree\tNOUN', 'go\tVERB']
dictionary1 = {}
for item in list1:
    splitted_item = item.split('\t')
    word = splitted_item[0]
    word_type = splitted_item[1]
    dictionary1[word] = word_type
print(dictionary1)
Here I wrote the same code with very verbose comments:
# Let's start with our word list, we'll call it 'list1'
list1 = ['have\tVERB', 'and\tCONJ', 'tree\tNOUN', 'go\tVERB']
# Here's an empty dictionary, 'dictionary1'
dictionary1 = {}
# Let's start to iterate using variable 'item' through 'list1'
for item in list1:
    # Here I split item in two parts, passing the '\t' character
    # to the split function and put the resulting list of two elements
    # into 'splitted_item' variable.
    # If you want to know more about split function check the link available
    # at the end of this answer
    splitted_item = item.split('\t')
    # Just to make code more readable here I now put 1st part
    # of the splitted item (part 0 because we start counting
    # from number 0) in "word" variable
    word = splitted_item[0]
    # I use the same approach to save the 2nd part of the
    # splitted item into 'word_type' variable
    # Yes, you're right: we use 1 because we start counting from 0
    word_type = splitted_item[1]
    # Finally I add to 'dictionary1', 'word' key with a value of 'word_type'
    dictionary1[word] = word_type
# After the for loop has been completed I print the now
# complete dictionary1 to check if result is correct
print(dictionary1)
Useful links:
You can quickly copy and paste this code here to check how it works and tweak it if you like: http://www.codeskulptor.com
If you want to learn more about split and string functions in general: https://docs.python.org/2/library/string.html
I'm having a little problem with an exercise I have to do:
Basically, the assignment is to open a URL, decode it with a given encoding, and count the number of occurrences of given strings in the text.
import urllib2 as ul
def word_counting(url, code, words):
page = ul.urlopen(url)
text = page.read()
decoded = ext.decode(code)
result = {}
for word in words:
count = decoded.count(word)
counted = str(word) + ":" + " " + str(count)
result.append(counted)
return finale
The result I should get looks like " word1: x, word2: y, word3: z ", with x, y, z being the number of occurrences. But it seems that I only get ONE number: when I run the test program, I get only 9 for the first occurrences, 14 for the second list, and 5 for the third, missing the other occurrences and the whole counted value.
What am I doing wrong? Thanks in advance.
You're not appending to the dictionary correctly.
The correct way is result[key] = value.
So for your loop it would be
for word in words:
    count = decoded.count(word)
    result[word] = str(count)
An example without decode but using .count()
words = ['apple', 'apple', 'pear', 'banana']
result= {}
for word in words:
    count = words.count(word)
    result[word] = count
>>> result
{'pear': 1, 'apple': 2, 'banana': 1}
Or you can use collections.Counter:
>>> from collections import Counter
>>> words = ['apple', 'apple', 'pear', 'banana']
>>> Counter(words)
Counter({'apple': 2, 'pear': 1, 'banana': 1})
Don't forget about list and dictionary comprehensions. They can be quite efficient on larger sets of data (especially if you are analysing a large web-page in your example). At the end of the day, if your data set is small, one could argue that the dict comprehension syntax is cleaner/more pythonic etc.
So in this case I would use something like:
result = {word : decoded.count(word) for word in words}
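Putting it together without the network part, a sketch of the counting and formatting step could look like this. The function name count_words and the sample text are made up for illustration; in the exercise, the text would be the decoded page.

```python
from collections import Counter

def count_words(text, words):
    """Return 'word1: x, word2: y, ...' using substring counts."""
    counts = {word: text.count(word) for word in words}
    return ", ".join(f"{word}: {count}" for word, count in counts.items())

sample = "the cat sat on the mat with the other cat"
summary = count_words(sample, ["the", "cat", "dog"])
print(summary)   # the: 4, cat: 2, dog: 0

# str.count counts substrings, so the 'the' inside 'other' is included.
# Counting whole words instead gives a different number:
whole_words = Counter(sample.split())
print(whole_words["the"])   # 3
```

Which of the two counts is wanted depends on how the exercise defines an occurrence.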