Sum of Nested Lists for each key in dictionary - python

How do you get the sum of each nested list for each key in the dictionary below?
Let's say the following below is called msgs
I tried the following code:
I ended up getting the result:
It is almost right but for some reason the sum of the first nested list is incorrect, being 0 whereas it should be 19. I have a feeling this has to do with the total = 0 part in the above code I wrote but I am not sure if this is the case and I don't know how to fix the issue.
The way I got the values in the nested list was I summed the number of strings in each index of the nested list. So for instance, this here was for the first key. As you can see, there are 15 entries in the first one and 4 in the second one.
(this dictionary is called 'kakao' in my code)
{'Saturday, July 28, 2018': [['hey', 'ben', 'u her?', 'here?', 'ok so basically', 'farzam and avash dont wanna go to vegas', 'lol', 'im offering a spontaneous trip me and you to SF', 'lol otherwise ill just go back to LA', 'i mean sf is far but', 'i mean if u really wanna hhah', 'we could go and see chris', 'but otherwise its fine', 'alright send me the code too', 'im on my way right now'], ['Wtf is happening lol', '8 haha', 'Key is #8000', 'Hf']]}
The code I used to get the sums as a nested list was:

kakao = {'Saturday, July 28, 2018': [['hey', 'ben', 'u her?', 'here?', 'ok so basically', \
'farzam and avash dont wanna go to vegas', 'lol', 'im offering a spontaneous trip me and you to SF', \
'lol otherwise ill just go back to LA', 'i mean sf is far but', 'i mean if u really wanna hhah', \
'we could go and see chris', 'but otherwise its fine', 'alright send me the code too', 'im on my way right now'], \
['Wtf is happening lol', '8 haha', 'Key is #8000', 'Hf']],
'Friday, August 3, 2018': [['Someone', 'said', 'something'], ['Just', 'test']],}
print({key: [sum(map(len, val))] for key, val in kakao.items()})
#the result --> {'Saturday, July 28, 2018': [19], 'Friday, August 3, 2018': [5]}
I guess you want to count the messages sent on the same day; I hope this code helps.
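If a single total per date is the goal, a plain loop makes the accumulator reset explicit. This is only a minimal sketch, assuming the same shape as the kakao dictionary above (date -> list of lists of strings). The name count_messages is a hypothetical helper, and since the original attempt is not shown, the comment about where total = 0 belongs is a guess at what went wrong.

```python
def count_messages(chat_log):
    """Return {date: total number of messages across all nested lists}."""
    totals = {}
    for date, groups in chat_log.items():
        total = 0  # reset once per key, before looping over the nested lists
        for group in groups:
            total += len(group)
        totals[date] = total
    return totals


# Shortened sample data (assumed shape: date -> list of lists of strings)
sample = {
    'Saturday, July 28, 2018': [['hey', 'ben', 'lol'], ['Hf']],
    'Friday, August 3, 2018': [['Someone', 'said', 'something'], ['Just', 'test']],
}
print(count_messages(sample))
# {'Saturday, July 28, 2018': 4, 'Friday, August 3, 2018': 5}
```

Resetting total inside the inner loop, or after the result is stored, is a common way to end up with a 0 where a real count was expected.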

Related

Remove numbers from list but not those in a string

I have a list of strings as follows:
list_1 = ['what are you 3 guys doing there on 5th avenue', 'my password is 5x35omega44',
'2 days ago I saw it', 'every day is a blessing',
' 345000 people have eaten here at the beach']
I want to remove 3, but not 5th or 5x35omega44. All the solutions I have searched for and tried end up removing numbers in an alphanumeric string, but I want those to remain as is. I want my list to look as follows:
list_1 = ['what are you guys doing there on 5th avenue', 'my password is 5x35omega44',
'days ago I saw it', 'every day is a blessing',
' people have eaten here at the beach']
I am trying the following:
[' '.join(s for s in words.split() if not any(c.isdigit() for c in s)) for words in list_1]
Use lookarounds to check if digits are not enclosed with letters or digits or underscores:
import re
list_1 = ['what are you 3 guys doing there on 5th avenue', 'my password is 5x35omega44',
'2 days ago I saw it', 'every day is a blessing',
' 345000 people have eaten here at the beach']
for l in list_1:
    print(re.sub(r'(?<!\w)\d+(?!\w)', '', l))
Output:
what are you guys doing there on 5th avenue
my password is 5x35omega44
days ago I saw it
every day is a blessing
people have eaten here at the beach
One approach would be to use try and except:
def is_intable(x):
    try:
        int(x)
        return True
    except ValueError:
        return False
[' '.join([word for word in sentence.split() if not is_intable(word)]) for sentence in list_1]
It sounds like you should be using regex. This will match numbers separated by word boundaries:
\b(\d+)\b
Here is a working example.
Some Python code may look like this:
import re
for item in list_1:
    new_item = re.sub(r'\b(\d+)\b', ' ', item)
    print(new_item)
I am not sure what the best way to handle spaces would be for your project. You may want to put \s at the end of the expression, making it \b(\d+)\b\s or you may wish to handle this some other way.
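One possible way to handle the leftover spaces, sketched under the assumption that collapsing every run of whitespace into a single space is acceptable for your data: remove the standalone numbers first, then normalize the spacing with split() and join(). The name strip_standalone_numbers is a hypothetical helper, not part of any library.

```python
import re


def strip_standalone_numbers(text):
    """Remove digit-only words, then collapse the whitespace left behind."""
    without_numbers = re.sub(r'\b\d+\b', ' ', text)
    # split() with no arguments drops leading, trailing, and repeated whitespace
    return ' '.join(without_numbers.split())


list_1 = ['what are you 3 guys doing there on 5th avenue', 'my password is 5x35omega44',
          '2 days ago I saw it', 'every day is a blessing',
          ' 345000 people have eaten here at the beach']

cleaned = [strip_standalone_numbers(sentence) for sentence in list_1]
for sentence in cleaned:
    print(sentence)
```

This keeps 5th and 5x35omega44 intact because there is no word boundary between a digit and an adjacent letter.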
You can get a shorter version with str.isdigit(). Note that isinstance(word, int) would not work here, because every item returned by split() is a str, so nothing would be filtered out. You could try something like this:
[' '.join([word for word in expression.split() if not word.isdigit()]) for expression in list_1]
>>>['what are you guys doing there on 5th avenue', 'my password is 5x35omega44',
'days ago I saw it', 'every day is a blessing', 'people have eaten here at the beach']
Combining the very helpful regex solutions provided, in a list comprehension format that I wanted, I was able to arrive at the following:
[' '.join([re.sub(r'\b(\d+)\b', '', item) for item in expression.split()]) for expression in list_1]

How can I detect multiple keywords in python string?

I am looking for a way to create several lists and for the keywords in those lists to be extracted and matched with a response.
User Input: This is a good day I am heading out for a jog.
List 1 : Keywords : good day, great day, awesome day, best day.
List 2 : Keywords : a run, a swim, a game.
But for a huge database of words, can this be linked to just the lists? Or does it need to be specific words?
Also would you recommend Python for a huge database of keywords?
The first thing to do is to break the input string up into tokens. A token is just a piece of the string that you want to match. In your case, it looks like your token size is 2 words (but it doesn't have to be). You might also want to strip all punctuation from the input string as well.
Then for your input, your tokens are
['This is', 'is a', 'a good', 'good day', 'day I', 'I am', 'am heading', 'heading out', 'out for', 'for a', 'a jog']
Then you can iterate over the tokens and check to see if they're contained in each one of the lists. Might look like this:
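text = 'This is a good day I am heading out for a jog'  # renamed from input, which shadows the built-in
list1 = ['good day', 'great day', 'awesome day', 'best day']
list2 = ['a run', 'a swim', 'a game']
words = text.split(' ')
tokens = [' '.join(words[i:i+2]) for i in range(len(words) - 1)]
for token in tokens:
    if token in list1:
        print('{} is in list1'.format(token))
    if token in list2:
        print('{} is in list2'.format(token))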
One thing you will likely want to do to optimize this is to use sets for list1 and list2, instead of lists.
set1 = set(list1)
Sets offer O(1) average-case lookups, as opposed to O(n) for lists, which matters a great deal when your keyword lists are large.
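Putting the tokenizing and the set lookups together might look roughly like the sketch below. The labels 'greeting' and 'activity', the keyword_sets dictionary, and the helper names bigrams and match_keywords are all hypothetical, chosen only for illustration; the keywords themselves come from the question.

```python
def bigrams(text):
    """Split text into overlapping two-word tokens."""
    words = text.split()
    return [' '.join(words[i:i + 2]) for i in range(len(words) - 1)]


# Hypothetical labels mapped to keyword sets (sets give fast membership tests)
keyword_sets = {
    'greeting': {'good day', 'great day', 'awesome day', 'best day'},
    'activity': {'a run', 'a swim', 'a game'},
}


def match_keywords(text):
    """Return {label: [tokens from text found in that label's keyword set]}."""
    tokens = bigrams(text)
    return {
        label: [token for token in tokens if token in keywords]
        for label, keywords in keyword_sets.items()
    }


matches = match_keywords('This is a good day I am heading out for a jog')
print(matches)
# {'greeting': ['good day'], 'activity': []}
```

Note that 'a jog' is not matched because it is not in the second list; only exact two-word tokens are found with this approach.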

Generate variant strings based on replacement dictionary for some words only in Python

Given a string and a word replacement dictionary, I'm trying to get Python to return all variant strings. E.g. for the string "One went to market" and the replacement dict {'One': ['One','Two','Three'], 'market': ['town','bed']}, I want to return: ['One went to town','Two went to town','Three went to town','One went to bed','Two went to bed','Three went to bed']. Currently I have this working only when there are two replacement options.
My partially working approach uses word lists generated from the dictionary, e.g. in the example above, I have ['One,Two,Three','went','to','town,bed']. Like this:
def perm(wordlist):
    a=[[]]
    for i in wordlist:
        if ',' in i:
            wds=i.split(',')
            for alis in a:
                alis.append(wds[0])
            for j in wds[1:]:
                b=[x[:-1] for x in a]
                for alis in b:
                    alis.append(j)
                a=a+b
        else:
            for alis in a:
                alis.append(i)
    return a
For ['One,Two','went','to','town,bed'] I get the required result, but any time there are more than two options it goes haywire.
Suppose you have these:
string = "One went to market"
dict_repl = {'One':['One','Two','Three'],'market':['town','bed']}
You can get your expected result with this one-liner, using a list comprehension and str.replace:
result = [string.replace('market',v).replace('One',i) for v in dict_repl['market'] for i in dict_repl['One']]
output:
['One went to town', 'Two went to town', 'Three went to town', 'One went to bed', 'Two went to bed', 'Three went to bed']
I believe this is what you're asking for.
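The one-liner hard-codes the two keys. For an arbitrary number of replaceable words, one option is itertools.product, sketched here under the assumption that the sentence can be split on whitespace and that dictionary keys match whole words exactly. The function name variants is hypothetical.

```python
from itertools import product


def variants(sentence, replacements):
    """Return every sentence formed by swapping in each replacement option."""
    words = sentence.split()
    # Each position offers its replacement options, or just the word itself.
    options = [replacements.get(word, [word]) for word in words]
    return [' '.join(combo) for combo in product(*options)]


dict_repl = {'One': ['One', 'Two', 'Three'], 'market': ['town', 'bed']}
result = variants("One went to market", dict_repl)
for line in result:
    print(line)
```

product varies the last position fastest, so the results come out as 'One went to town', 'One went to bed', 'Two went to town', and so on. The set of strings is the same as in the expected output, only the order differs.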

Python break list values into sub-components and maintain key

Hello I have a list as follows:
['2925729', 'Patrick did not shake our hands nor ask our names. He greeted us promptly and politely, but it seemed routine.'].
My goal is a result as follows:
['2925729','Patrick did not shake our hands nor ask our names'], ['2925729', 'He greeted us promptly and politely, but it seemed routine.']
Any pointers would be very much appreciated.
>>> t = ['2925729', 'Patrick did not shake our hands nor ask our names. He greeted us promptly and politely, but it seemed routine.']
>>> [ [t[0], a + '.'] for a in t[1].rstrip('.').split('.')]
[['2925729', 'Patrick did not shake our hands nor ask our names.'], ['2925729', ' He greeted us promptly and politely, but it seemed routine.']]
If you have a large dataset and want to conserve memory, you may want to create a generator instead of a list:
g = ( [t[0], a + '.'] for a in t[1].rstrip('.').split('.') )
for key, sentence in g:
    pass  # do processing here
Generators do not create lists all at once. They create each element as you access it. This is only helpful if you don't need the whole list at once.
ADDENDUM: You asked about making dictionaries if you have multiple keys:
>>> data = ['1', 'I think. I am.'], ['2', 'I came. I saw. I conquered.']
>>> dict([ [t[0], t[1].rstrip('.').split('.')] for t in data ])
{'1': ['I think', ' I am'], '2': ['I came', ' I saw', ' I conquered']}
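The split('.') approach leaves a leading space on every sentence after the first. A possible variation, assuming sentences end with '.', '!' or '?' followed by whitespace, is to split with a regex lookbehind so the punctuation stays attached and the separating spaces are dropped. The name split_record is a hypothetical helper.

```python
import re


def split_record(record):
    """Turn [key, text] into a list of [key, sentence] pairs."""
    key, text = record
    # Split on whitespace that follows sentence-ending punctuation;
    # the lookbehind keeps the punctuation with its sentence.
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    return [[key, sentence] for sentence in sentences if sentence]


t = ['2925729', 'Patrick did not shake our hands nor ask our names. '
                'He greeted us promptly and politely, but it seemed routine.']
pairs = split_record(t)
for pair in pairs:
    print(pair)
```

This is still naive: abbreviations such as "Dr. Smith" would be split in the wrong place, so a proper sentence tokenizer may be a better fit for messy text.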

Python print array with new line

I'm new to python and have a simple array:
op = ['Hello', 'Good Morning', 'Good Evening', 'Good Night', 'Bye']
When I use pprint, I get this output:
['Hello', 'Good Morning', 'Good Evening', 'Good Night', 'Bye']
Is there any way I can remove the quotes, commas and brackets and print each item on a separate line, so that the output looks like this:
Hello
Good Morning
Good Evening
Good Night
Bye
You could join the strings with a newline, and print the resulting string:
print("\n".join(op))
For Python 3, we can also use argument unpacking:
https://docs.python.org/3.7/tutorial/controlflow.html#unpacking-argument-lists
print(*op, sep='\n')
It's the same as
print('Hello', 'Good Morning', 'Good Evening', 'Good Night', 'Bye', sep='\n')
Here are a couple of points of clarification.
First of all, what you have there is a list, not an array. The difference is that a list is a far more dynamic and flexible data structure (at least in dynamic languages such as Python). For instance, it can hold multiple objects of different types (e.g. 2 strings, 3 ints, one socket, etc.).
The quotes around the words in the list denote that they are objects of type string.
When you do a print op (or print(op) for that matter in python 3+) you are essentially asking python to show you a printable representation of that specific list object and its contents. Hence the quotes, commas, brackets, etc.
In python you have a very easy for each loop, usable to iterate through the contents of iterable objects, such as a list. Just do this:
for greeting in op:
    print(greeting)
Print it line by line
for word in op:
    print(word)
This has the advantage that if op happens to be massively long, then you don't have to create a new temporary string purely for printing purposes.
You can also get some nicer print behavior by using the Pretty Print library pprint
Pretty print will automatically wrap printed contents when you go over a certain threshold, so if you only have a few short items, it'll display inline:
from pprint import pprint
xs = ['Hello', 'Morning', 'Evening']
pprint(xs)
# ['Hello', 'Morning', 'Evening']
But if you have a lot of content, it'll auto-wrap:
from pprint import pprint
xs = ['Hello', 'Good Morning', 'Good Evening', 'Good Night', 'Bye', 'Aloha', 'So Long']
pprint(xs)
# ['Hello',
#  'Good Morning',
#  'Good Evening',
#  'Good Night',
#  'Bye',
#  'Aloha',
#  'So Long']
You can also specify the column width with the width= param.
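As a small illustration of the width parameter, the short list that printed inline above wraps once the width is reduced below the length of its one-line representation. The value 20 is an arbitrary choice for this sketch; pformat is used alongside pprint only to capture the same text as a string.

```python
from pprint import pformat, pprint

xs = ['Hello', 'Morning', 'Evening']

# The one-line repr is 31 characters, so it fits in the default width of 80
pprint(xs)
# ['Hello', 'Morning', 'Evening']

# With width=20 it no longer fits, so each item goes on its own line
pprint(xs, width=20)
# ['Hello',
#  'Morning',
#  'Evening']

# pformat returns the formatted text instead of printing it
narrow = pformat(xs, width=20)
```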
See Also: How to print a list in Python "nicely"
