Iterate through the list

Iterate through the list - python

The transaction CSV looks like this, and I add each row to a list as shown below.
Bread Milk
Bread Diapers Beer Eggs Beer
[{'Bread': 1, 'Milk': 1, '': 7}, {'Bread': 1, 'Diapers': 1, 'Beer': 6, 'Eggs': 1}, {'Milk': 1, 'Diapers': 1, 'Beer': 6, 'Cola': 1}, {'Bread': 1, 'Milk': 1, 'Diapers': 1, 'Beer': 6}, {'Bread': 1, 'Milk': 1, 'Diapers': 2, 'Cola': 1, 'Chips': 2, 'Beer': 1, '': 1}, {'Bread': 1, 'Milk': 1, '': 7}, {'Bread': 1, 'Cola': 1, 'Beer': 3, 'Milk': 1, 'Chips': 1, 'Diapers': 3, '': 1}, {'Milk': 1, 'Bread': 1, 'Beer': 4, 'Cola': 1, 'Diapers': 1, 'Chips': 1}, {'Bread': 1, 'Milk': 2, 'Diapers': 2, 'Beer': 2, 'Chips': 2}, {'Bread': 2, 'Beer': 3, 'Diapers': 3, 'Milk': 1}]
I would like to keep only the transactions in which the count of Diapers is 3.
I would expect only the following transactions to be returned:
{'Bread': 2, 'Beer': 3, 'Diapers': 3, 'Milk': 1}
{'Bread': 1, 'Cola': 1, 'Beer': 3, 'Milk': 1, 'Chips': 1, 'Diapers': 3, '': 1}
{'Bread', 'Beer', 'Diapers', 'Milk'}
{'Bread', 'Cola', 'Beer', 'Milk', 'Chips', 'Diapers', ''}
The code I have is:
import csv
from collections import Counter

def M():
    li = []
    # Open the csv file
    with open('transaction.csv') as fp:
        DataCaptured = csv.reader(fp, delimiter=',')
        # Iterate through each row in the csv and add its item counts to the list
        for row in DataCaptured:
            li.append(dict(Counter(row)))
            # if li['Diaper']==3: ---> I am missing this logic, not sure how to get it.
    # Return the list of counters
    return li
print(M())

li=[{'Bread': 1, 'Milk': 1, '': 7}, {'Bread': 1, 'Diapers': 1, 'Beer': 6, 'Eggs': 1}, {'Milk': 1, 'Diapers': 1, 'Beer': 6, 'Cola': 1}, {'Bread': 1, 'Milk': 1, 'Diapers': 1, 'Beer': 6}, {'Bread': 1, 'Milk': 1, 'Diapers': 2, 'Cola': 1, 'Chips': 2, 'Beer': 1, '': 1}, {'Bread': 1, 'Milk': 1, '': 7}, {'Bread': 1, 'Cola': 1, 'Beer': 3, 'Milk': 1, 'Chips': 1, 'Diapers': 3, '': 1}, {'Milk': 1, 'Bread': 1, 'Beer': 4, 'Cola': 1, 'Diapers': 1, 'Chips': 1}, {'Bread': 1, 'Milk': 2, 'Diapers': 2, 'Beer': 2, 'Chips': 2}, {'Bread': 2, 'Beer': 3, 'Diapers': 3, 'Milk': 1}]
for d in li:
    if 'Diapers' in d and d['Diapers']==3:
        print(d)
OUTPUT:
{'Bread': 1, 'Cola': 1, 'Beer': 3, 'Milk': 1, 'Chips': 1, 'Diapers': 3, '': 1}
{'Bread': 2, 'Beer': 3, 'Diapers': 3, 'Milk': 1}
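The same check can also be applied while the rows are being read, so that only matching transactions are collected. The following is a minimal sketch, not the asker's code: the function name `transactions_with_count` and the sample rows are hypothetical, and it relies on the fact that a `Counter` returns 0 for a missing key, so no explicit membership test is needed.

```python
from collections import Counter

def transactions_with_count(rows, item, count):
    """Return one dict of item counts per row in which `item` occurs exactly `count` times."""
    matches = []
    for row in rows:
        counts = Counter(row)
        # Counter returns 0 for items that are absent, so this never raises KeyError
        if counts[item] == count:
            matches.append(dict(counts))
    return matches

# Hypothetical rows standing in for what csv.reader would yield
rows = [
    ['Bread', 'Milk'],
    ['Bread', 'Diapers', 'Beer', 'Diapers', 'Diapers'],
]
print(transactions_with_count(rows, 'Diapers', 3))
```

To use it with the file from the question, the rows could come from `csv.reader(fp)` inside the `with open('transaction.csv') as fp:` block instead of the hard-coded list.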

Related

sorting a nested dictionary by 3 criteria

I have a nested dictionary and I want to sort it by three fields: first by points, then by wins (if points are equal), and finally alphabetically by name if both points and wins are equal.
My dictionary:
{'Iran': {'draws': 1,
'goal difference': 0,
'loses': 1,
'points': 4,
'wins': 1},
'Morocco': {'draws': 1,
'goal difference': 0,
'loses': 1,
'points': 4,
'wins': 1},
'Portugal': {'draws': 1,
'goal difference': 0,
'loses': 1,
'points': 4,
'wins': 1},
'Spain': {'draws': 1,
'goal difference': 0,
'loses': 1,
'points': 4,
'wins': 1}}
My code:
sort_data=sorted(dic.keys(),key=lambda x:(-dic[x]["points"],dic[x]["wins"]))
for items in sort_data:
    x=dic[items]
    print(items," ",str(x).replace("'", "").replace("{","").replace("}", ""))
My code does not work in this case: it does not sort alphabetically when points and wins are all equal. Could you help me?

I suggest using dic.items() rather than dic.keys(), although that's mostly a matter of taste.
def sorted_dict(dic):
    def key(x):
        k,v = x
        return (v['points'], v['wins'], k)
    return dict(sorted(dic.items(), key=key))
dic1 = {'Iran': {'draws': 1, 'goal difference': 0, 'loses': 1, 'points': 4, 'wins': 1}, 'Morocco': {'draws': 1, 'goal difference': 0, 'loses': 1, 'points': 4, 'wins': 1}, 'Portugal': {'draws': 1, 'goal difference': 0, 'loses': 1, 'points': 4, 'wins': 1}, 'Spain': {'draws': 1, 'goal difference': 0, 'loses': 1, 'points': 4, 'wins': 1}}
dic2 = sorted_dict(dic1)
print(dic2)
# {'Iran': {'draws': 1, 'goal difference': 0, 'loses': 1, 'points': 4, 'wins': 1},
# 'Morocco': {'draws': 1, 'goal difference': 0, 'loses': 1, 'points': 4, 'wins': 1},
# 'Portugal': {'draws': 1, 'goal difference': 0, 'loses': 1, 'points': 4, 'wins': 1},
# 'Spain': {'draws': 1, 'goal difference': 0, 'loses': 1, 'points': 4, 'wins': 1}}
Note that in your dictionary, all countries have the same number of points and wins, and the countries are already sorted in alphabetical order, so this is a terrible example to test whether the sort worked correctly or not.

You could use dict.items with a lambda instead:
import json # For pretty printing only.
table = {
'Iran': {
'draws': 1,
'goal difference': 0,
'loses': 1,
'points': 4,
'wins': 1
},
'Morocco': {
'draws': 1,
'goal difference': 0,
'loses': 1,
'points': 4,
'wins': 1
},
'Portugal': {
'draws': 1,
'goal difference': 0,
'loses': 1,
'points': 4,
'wins': 1
},
'Spain': {
'draws': 1,
'goal difference': 0,
'loses': 1,
'points': 4,
'wins': 1
}
}
sorted_table = dict(
    sorted(table.items(),
           key=lambda kvp: (-kvp[1]['points'], -kvp[1]['wins'], kvp[0])))
print(json.dumps(sorted_table, indent=4))
Output:
{
"Iran": {
"draws": 1,
"goal difference": 0,
"loses": 1,
"points": 4,
"wins": 1
},
"Morocco": {
"draws": 1,
"goal difference": 0,
"loses": 1,
"points": 4,
"wins": 1
},
"Portugal": {
"draws": 1,
"goal difference": 0,
"loses": 1,
"points": 4,
"wins": 1
},
"Spain": {
"draws": 1,
"goal difference": 0,
"loses": 1,
"points": 4,
"wins": 1
}
}
Note: the output is identical to the input because the input is already sorted.
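Because the sample data cannot show whether the sort works, here is a small sketch with made-up standings in which all three criteria matter. The table values are hypothetical, and the key assumes that higher points and higher wins should rank first, with names in ascending alphabetical order as the final tie-break.

```python
# Hypothetical standings chosen so that every sort criterion is exercised
table = {
    'Spain': {'points': 5, 'wins': 1},
    'Morocco': {'points': 1, 'wins': 0},
    'Egypt': {'points': 3, 'wins': 0},
    'Portugal': {'points': 5, 'wins': 1},
    'Iran': {'points': 3, 'wins': 1},
}

# Negating the numbers sorts them in descending order; the name stays ascending
ranking = sorted(
    table,
    key=lambda name: (-table[name]['points'], -table[name]['wins'], name),
)
print(ranking)
# ['Portugal', 'Spain', 'Iran', 'Egypt', 'Morocco']
```

Portugal and Spain tie on points and wins, so the name decides; Iran and Egypt tie on points only, so wins decide.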

Pandas, get first and last column index for row value

I have the following dataframe:
import pandas as pd
columns = pd.date_range(start="2022-05-21", end="2022-06-30")
data = [
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5],
[5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5]
]
df = pd.DataFrame(data, columns=columns)
2022-05-21 2022-05-22 2022-05-23 ... 2022-06-28 2022-06-29 2022-06-30
0 0 0 0 ... 5 5 5
1 5 5 5 ... 1 1 1
2 5 5 5 ... 5 5 5
For each row, I need the first and last column label of every run of consecutive equal values, in the order the runs appear. The correct output for this dataframe would be:
[
[
{'value': 0, 'start': '2022-05-21', 'end': '2022-05-31'},
{'value': 2, 'start': '2022-06-01', 'end': '2022-06-19'},
{'value': 5, 'start': '2022-06-20', 'end': '2022-06-30'}
],
[
{'value': 5, 'start': '2022-05-21', 'end': '2022-05-31'},
{'value': 2, 'start': '2022-06-01', 'end': '2022-06-19'},
{'value': 1, 'start': '2022-06-20', 'end': '2022-06-30'}
],
[
{'value': 5, 'start': '2022-05-21', 'end': '2022-05-31'},
{'value': 2, 'start': '2022-06-01', 'end': '2022-06-19'},
{'value': 5, 'start': '2022-06-20', 'end': '2022-06-30'}
]
]
My best approach for the moment is:
series_set = df.apply(frozenset, axis=1)
container = []
for index in range(len(df.index)):
    row = df.iloc[[index]]
    values = series_set.iloc[[index]]
    inner_container = []
    for value in values[index]:
        single_value_series = row[row.columns[row.isin([value]).all()]]
        dates = single_value_series.columns
        result = dict(value=value, start=dates[0].strftime("%Y-%m-%d"), end=dates[-1].strftime("%Y-%m-%d"))
        inner_container.append(result)
    container.append(inner_container)
The result is:
[
[
{'value': 0, 'start': '2022-05-21', 'end': '2022-05-31'},
{'value': 2, 'start': '2022-06-01', 'end': '2022-06-19'},
{'value': 5, 'start': '2022-06-20', 'end': '2022-06-30'}
],
[
{'value': 1, 'start': '2022-06-20', 'end': '2022-06-30'},
{'value': 2, 'start': '2022-06-01', 'end': '2022-06-19'},
{'value': 5, 'start': '2022-05-21', 'end': '2022-05-31'}
],
[
{'value': 2, 'start': '2022-06-01', 'end': '2022-06-19'},
{'value': 5, 'start': '2022-05-21', 'end': '2022-06-30'}
]
]
It has several problems; only the first array is correct :)
When I convert a row to a frozenset, the original order of the values is lost, and a value that appears in more than one run is kept only once.
I would appreciate any ideas and guidance. What I want to avoid is iterating over the dataframe.
Thank you!

You can first transpose the DataFrame with DataFrame.T, then aggregate the minimal and maximal index per group, convert the values to strings with Series.dt.strftime, and finally convert to dictionaries with DataFrame.to_dict.
To get the groups of consecutive values, compare each value with the shifted (previous) value and take Series.cumsum of the result.
df1 = df.T.reset_index()
L = [df1.groupby(df1[x].ne(df1[x].shift()).cumsum())
        .agg(value=(x, 'first'),
             start=('index', 'min'),
             end=('index', 'max'))
        .assign(start=lambda x: x['start'].dt.strftime('%Y-%m-%d'),
                end=lambda x: x['end'].dt.strftime('%Y-%m-%d'))
        .to_dict(orient='records') for x in df1.columns.drop('index')]
print(L)
[[{'value': 0, 'start': '2022-05-21', 'end': '2022-05-31'},
{'value': 2, 'start': '2022-06-01', 'end': '2022-06-19'},
{'value': 5, 'start': '2022-06-20', 'end': '2022-06-30'}],
[{'value': 5, 'start': '2022-05-21', 'end': '2022-05-31'},
{'value': 2, 'start': '2022-06-01', 'end': '2022-06-19'},
{'value': 1, 'start': '2022-06-20', 'end': '2022-06-30'}],
[{'value': 5, 'start': '2022-05-21', 'end': '2022-05-31'},
{'value': 2, 'start': '2022-06-01', 'end': '2022-06-19'},
{'value': 5, 'start': '2022-06-20', 'end': '2022-06-30'}]]
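To make the idea of "consecutive groups" concrete, here is a plain-Python sketch of the same run detection for a single row, using `itertools.groupby` rather than pandas. It is an illustration of what the grouping key achieves, not a replacement for the vectorised answer; the helper name `runs` is hypothetical, and the row and date labels are rebuilt from the third row of the question's data.

```python
from datetime import date, timedelta
from itertools import groupby

def runs(values, labels):
    """Return one dict per run of consecutive equal values, with its first and last label."""
    result = []
    position = 0
    for value, group in groupby(values):
        length = len(list(group))
        result.append({
            'value': value,
            'start': labels[position],
            'end': labels[position + length - 1],
        })
        position += length
    return result

# Date labels from 2022-05-21 to 2022-06-30 (41 days), as in the question
first_day = date(2022, 5, 21)
labels = [(first_day + timedelta(days=i)).isoformat() for i in range(41)]
# Third row of the question's data: the value 5 appears in two separate runs
row = [5] * 11 + [2] * 19 + [5] * 11
print(runs(row, labels))
```

Unlike the frozenset approach, a repeated value such as 5 produces one entry per run, and the runs come out in their original order.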

Create a nested dictionary for every distinct words in a list

I have a nested list of sentences. I want to create a dictionary that maps each distinct word to another dictionary, whose keys are the words that appear in the same sentences and whose values are the number of times they appear. For example, from
sentences = [["i", "am", "a", "sick", "man"],
["i", "am", "a", "spiteful", "man"],
["i", "am", "an", "unattractive", "man"],
["i", "believe", "my", "liver", "is", "diseased"],
["however", "i", "know", "nothing", "at", "all", "about", "my",
"disease", "and", "do", "not", "know", "for", "certain", "what", "ails", "me"]]
part of the dictionary returned would be:
{ "man": {"i": 3, "am": 3, "a": 2, "sick": 1, "spiteful": 1, "an": 1, "unattractive": 1}, "liver": {"i": 1, "believe": 1, "my": 1, "is": 1, "diseased": 1}...}
with as many keys as there are distinct words in the passage.
I've tried this:
d = {}
for row in sentences:
    for words in row:
        if words not in d:
            d[words] = 1
        else:
            d[words] += 1
But this only counts the words. How could I use d as a value in another dictionary?

from collections import defaultdict
data = {}
for sentence in sentences:
    for word in sentence:
        data[word] = defaultdict(lambda: 0)
for sentence in sentences:
    length = len(sentence)
    for index1, word1 in enumerate(sentence):
        for num in range(0, length - 1):
            index2 = (index1 + 1 + num) % length
            word2 = sentence[index2]
            data[word1][word2] += 1
print(data)

sentences = [["i", "am", "a", "sick", "man"],
["i", "am", "a", "spiteful", "man"],
["i", "am", "an", "unattractive", "man"],
["i", "believe", "my", "liver", "is", "diseased"],
["however", "i", "know", "nothing", "at", "all", "about", "my",
"disease", "and", "do", "not", "know", "for", "certain", "what", "ails", "me"]]
# "as many keys as there are distinct words in the passage"
# Well then we need to start by finding the distinct words.
# sets always help for this.
# first we flatten the list. If you don't know what this is doing,
# search "flatten nested list Python". This is a common pattern:
flat_list = [term for group in sentences for term in group]
# now use set to find distinct words
distinct_words = set(flat_list)
# variable for final dictionary
result = {}
# define this function first. See invocation below
def find_related_counts(word):
    # a nice way to do counts is with
    # setdefault. If the term has already
    # been counted, then it just increments.
    # otherwise, it will create the key and
    # initialise it to the default
    related_counts = {}
    for group in sentences:
        # is "word" related to the terms in this group?
        if word in group:
            # yes it is! add the other terms:
            for other in group:
                # except, presumably, the word itself
                if other != word:
                    related_counts.setdefault(other, 0)
                    related_counts[other] += 1
    return related_counts
# for each word we have a key, and must find the value
for word in distinct_words:
    # when dealing with nested anythings, it helps to
    # make a function, so you don't have so much
    # nesting in one place and separate things out
    # nicely instead
    value = find_related_counts(word)
    result[word] = value
print(result)
print(result["man"])
OUTPUT:
{'spiteful': {'i': 1, 'am': 1, 'a': 1, 'man': 1}, 'and': {'however': 1, 'i': 1, 'know': 2, 'nothing': 1, 'at': 1, 'all': 1, 'about': 1, 'my': 1, 'disease': 1, 'do': 1, 'not': 1, 'for': 1, 'certain': 1, 'what': 1, 'ails': 1, 'me': 1}, 'unattractive': {'i': 1, 'am': 1, 'an': 1, 'man': 1}, 'nothing': {'however': 1, 'i': 1, 'know': 2, 'at': 1, 'all': 1, 'about': 1, 'my': 1, 'disease': 1, 'and': 1, 'do': 1, 'not': 1, 'for': 1, 'certain': 1, 'what': 1, 'ails': 1, 'me': 1}, 'diseased': {'i': 1, 'believe': 1, 'my': 1, 'liver': 1, 'is': 1}, 'sick': {'i': 1, 'am': 1, 'a': 1, 'man': 1}, 'man': {'i': 3, 'am': 3, 'a': 2, 'sick': 1, 'spiteful': 1, 'an': 1, 'unattractive': 1}, 'do': {'however': 1, 'i': 1, 'know': 2, 'nothing': 1, 'at': 1, 'all': 1, 'about': 1, 'my': 1, 'disease': 1, 'and': 1, 'not': 1, 'for': 1, 'certain': 1, 'what': 1, 'ails': 1, 'me': 1}, 'believe': {'i': 1, 'my': 1, 'liver': 1, 'is': 1, 'diseased': 1}, 'i': {'am': 3, 'a': 2, 'sick': 1, 'man': 3, 'spiteful': 1, 'an': 1, 'unattractive': 1, 'believe': 1, 'my': 2, 'liver': 1, 'is': 1, 'diseased': 1, 'however': 1, 'know': 2, 'nothing': 1, 'at': 1, 'all': 1, 'about': 1, 'disease': 1, 'and': 1, 'do': 1, 'not': 1, 'for': 1, 'certain': 1, 'what': 1, 'ails': 1, 'me': 1}, 'certain': {'however': 1, 'i': 1, 'know': 2, 'nothing': 1, 'at': 1, 'all': 1, 'about': 1, 'my': 1, 'disease': 1, 'and': 1, 'do': 1, 'not': 1, 'for': 1, 'what': 1, 'ails': 1, 'me': 1}, 'an': {'i': 1, 'am': 1, 'unattractive': 1, 'man': 1}, 'my': {'i': 2, 'believe': 1, 'liver': 1, 'is': 1, 'diseased': 1, 'however': 1, 'know': 2, 'nothing': 1, 'at': 1, 'all': 1, 'about': 1, 'disease': 1, 'and': 1, 'do': 1, 'not': 1, 'for': 1, 'certain': 1, 'what': 1, 'ails': 1, 'me': 1}, 'a': {'i': 2, 'am': 2, 'sick': 1, 'man': 2, 'spiteful': 1}, 'am': {'i': 3, 'a': 2, 'sick': 1, 'man': 3, 'spiteful': 1, 'an': 1, 'unattractive': 1}, 'however': {'i': 1, 'know': 2, 'nothing': 1, 'at': 1, 'all': 1, 'about': 1, 'my': 1, 'disease': 1, 'and': 1, 'do': 1, 'not': 1, 'for': 1, 
'certain': 1, 'what': 1, 'ails': 1, 'me': 1}, 'about': {'however': 1, 'i': 1, 'know': 2, 'nothing': 1, 'at': 1, 'all': 1, 'my': 1, 'disease': 1, 'and': 1, 'do': 1, 'not': 1, 'for': 1, 'certain': 1, 'what': 1, 'ails': 1, 'me': 1}, 'not': {'however': 1, 'i': 1, 'know': 2, 'nothing': 1, 'at': 1, 'all': 1, 'about': 1, 'my': 1, 'disease': 1, 'and': 1, 'do': 1, 'for': 1, 'certain': 1, 'what': 1, 'ails': 1, 'me': 1}, 'for': {'however': 1, 'i': 1, 'know': 2, 'nothing': 1, 'at': 1, 'all': 1, 'about': 1, 'my': 1, 'disease': 1, 'and': 1, 'do': 1, 'not': 1, 'certain': 1, 'what': 1, 'ails': 1, 'me': 1}, 'liver': {'i': 1, 'believe': 1, 'my': 1, 'is': 1, 'diseased': 1}, 'know': {'however': 1, 'i': 1, 'nothing': 1, 'at': 1, 'all': 1, 'about': 1, 'my': 1, 'disease': 1, 'and': 1, 'do': 1, 'not': 1, 'for': 1, 'certain': 1, 'what': 1, 'ails': 1, 'me': 1}, 'at': {'however': 1, 'i': 1, 'know': 2, 'nothing': 1, 'all': 1, 'about': 1, 'my': 1, 'disease': 1, 'and': 1, 'do': 1, 'not': 1, 'for': 1, 'certain': 1, 'what': 1, 'ails': 1, 'me': 1}, 'all': {'however': 1, 'i': 1, 'know': 2, 'nothing': 1, 'at': 1, 'about': 1, 'my': 1, 'disease': 1, 'and': 1, 'do': 1, 'not': 1, 'for': 1, 'certain': 1, 'what': 1, 'ails': 1, 'me': 1}, 'disease': {'however': 1, 'i': 1, 'know': 2, 'nothing': 1, 'at': 1, 'all': 1, 'about': 1, 'my': 1, 'and': 1, 'do': 1, 'not': 1, 'for': 1, 'certain': 1, 'what': 1, 'ails': 1, 'me': 1}, 'ails': {'however': 1, 'i': 1, 'know': 2, 'nothing': 1, 'at': 1, 'all': 1, 'about': 1, 'my': 1, 'disease': 1, 'and': 1, 'do': 1, 'not': 1, 'for': 1, 'certain': 1, 'what': 1, 'me': 1}, 'me': {'however': 1, 'i': 1, 'know': 2, 'nothing': 1, 'at': 1, 'all': 1, 'about': 1, 'my': 1, 'disease': 1, 'and': 1, 'do': 1, 'not': 1, 'for': 1, 'certain': 1, 'what': 1, 'ails': 1}, 'what': {'however': 1, 'i': 1, 'know': 2, 'nothing': 1, 'at': 1, 'all': 1, 'about': 1, 'my': 1, 'disease': 1, 'and': 1, 'do': 1, 'not': 1, 'for': 1, 'certain': 1, 'ails': 1, 'me': 1}, 'is': {'i': 1, 'believe': 1, 'my': 1, 'liver': 
1, 'diseased': 1}}
{'i': 3, 'am': 3, 'a': 2, 'sick': 1, 'spiteful': 1, 'an': 1, 'unattractive': 1}
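The same counts can be built in a single pass over the sentences with `collections.Counter`. This is a sketch under the same assumption as the answer above (a word is never counted as related to itself); the function name `related_counts` is hypothetical and only a shortened sample of the sentences is used.

```python
from collections import Counter, defaultdict

def related_counts(sentences):
    """Map each distinct word to the counts of the other words sharing a sentence with it."""
    result = defaultdict(Counter)
    for group in sentences:
        counts = Counter(group)
        for word in counts:  # each distinct word in this sentence
            others = counts.copy()
            del others[word]  # a word is not related to itself
            result[word].update(others)
    return {word: dict(counter) for word, counter in result.items()}

sample = [["i", "am", "a", "sick", "man"],
          ["i", "am", "a", "spiteful", "man"],
          ["i", "am", "an", "unattractive", "man"]]
print(related_counts(sample)["man"])
```

Each sentence is counted once and reused for every word in it, instead of rescanning all sentences for each distinct word.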

pandas - pd.replace and TypeError

I have all_data dataframe. I want to replace some categorical values in certain columns with numerical values. I'm trying to use this nested dictionary notation (I've checked that the brackets and curly brackets are in place, I don't think that's the issue):
all_data = all_data.replace({'Street': {'Pave': 1, 'Grvl': 0}},
{'LotShape': {'IR3': 1, 'IR2': 2, 'IR1': 3, 'Reg': 4}},
{'Utilities': {'ELO': 0, 'NoSeWa': 0, 'NoSewr': 0, 'AllPub': 1}},
{'LandSlope': {'Sev': 1, 'Mod': 2, 'Gtl': 3}},
{'ExterQual': {'Po': 1, 'Fa': 2, 'TA': 3, 'Gd': 4, 'Ex': 5}},
{'ExterCond': {'Po': 1, 'Fa': 2, 'TA': 3, 'Gd': 4, 'Ex': 5}},
{'BsmtQual': {'NA': 0, 'Po': 1, 'Fa': 2, 'TA': 3, 'Gd': 4,'Ex': 5}},
{'BsmtCond': {'NA': 0, 'Po': 1, 'Fa': 2, 'TA': 3, 'Gd': 4,'Ex': 5}},
{'BsmtExposure': {'NA': 0, 'No': 1, 'Mn': 2, 'Av': 3, 'Gd': 4}},
{'BsmtFinType1': {'NA': 0, 'Unf': 1, 'LwQ': 2, 'Rec': 3, 'BLQ': 4, 'ALQ': 5, 'GLQ': 6}},
{'BsmtFinType2': {'NA': 0, 'Unf': 1,'LwQ': 2,'Rec': 3, 'BLQ': 4,'ALQ': 5, 'GLQ': 6}},
{'HeatingQC': {'Po': 1,'Fa': 2,'TA': 3,'Gd': 4,'Ex': 5}},
{'CentralAir': {'No': 0,'Yes': 1}},
{'KitchenQual': {'Po': 1,'Fa': 2,'TA': 3,'Gd': 4,'Ex': 5}},
{'Functional': {'Sal': -7,'Sev': -6,'Maj1': -5,'Maj2': -4,'Mod': -3,'Min2': -2,'Min1': -1,
'Typ': 0}},
{'FireplaceQu': {'NA': 0,'Po': 1,'Fa': 2,'TA': 3,'Gd': 4,'Ex': 5}},
{'GarageFinish': {'NA': 0,'Unf': 1,'RFn': 2, 'Fin': 3}},
{'GarageQual': {'NA': 0, 'Po': 1,'Fa': 2, 'TA': 3,'Gd': 4, 'Ex': 5}},
{'GarageCond': {'NA': 0,'Po': 1,'Fa': 2,'TA': 3,'Gd': 4,'Ex': 5}},
{'PavedDrive': {'N': 0,'P': 0, 'Y': 1}},
{'Fence': {'NA': 0, 'MnWw': 1,'GdWo': 2,'MnPrv': 3,'GdPrv': 4}},
{'SaleCondition': {'Abnorml': 1, 'Alloca': 1, 'AdjLand': 1, 'Family': 1, 'Normal': 0,
'Partial': 0}}
)
Error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-40-f9c9c28b7237> in <module>()
22 {'Fence': {'NA': 0, 'MnWw': 1,'GdWo': 2,'MnPrv': 3,'GdPrv': 4}},
23 {'SaleCondition': {'Abnorml': 1, 'Alloca': 1, 'AdjLand': 1, 'Family': 1, 'Normal': 0,
---> 24 'Partial': 0}}
25 )
TypeError: replace() takes from 1 to 8 positional arguments but 23 were given
If I remove the 'SaleCondition' row from the above code, the error is again there but this time referring to 'Fence', and so on, for each line of code from bottom up. I've googled but have no idea what this means. Help MUCH appreciated.

You should pass a single dictionary with one key per column, something like:
df.replace({'Fence':{'NA': 0, 'MnWw': 1,'GdWo': 2,'MnPrv': 3,'GdPrv': 4},'SaleCondition':{'Abnorml': 1, 'Alloca': 1, 'AdjLand': 1, 'Family': 1, 'Normal': 0,
'Partial': 0}})
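The format should be .replace({'col1':{},'col2':{}}), not .replace({'col1':{}},{'col2':{}}). In the second form each dictionary is passed as a separate positional argument, which is why the error counts 23 positional arguments.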
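If the per-column mappings already exist as separate dictionaries, they can be merged into the single nested dictionary before the call. A minimal sketch, using three of the mappings from the question; the names `column_maps` and `replacements` are hypothetical, and the final `replace` call is left as a comment because it assumes the asker's `all_data` DataFrame.

```python
# Separate one-column mappings, as in the question
column_maps = [
    {'Street': {'Pave': 1, 'Grvl': 0}},
    {'LandSlope': {'Sev': 1, 'Mod': 2, 'Gtl': 3}},
    {'CentralAir': {'No': 0, 'Yes': 1}},
]

# Merge them into one {column: {old_value: new_value}} dictionary
replacements = {}
for mapping in column_maps:
    replacements.update(mapping)

print(replacements)
# all_data = all_data.replace(replacements)  # one argument instead of many
```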

Using Counter to create a dictionary

I have an output of words which I would like to use to create a dictionary where keys = word; value = word's frequency
Here is the code:
import pandas as pd
import numpy as np
import datetime
import sys
import codecs
import re
import urllib, urllib2
import nltk # Natural Language Processing
from nltk.corpus import stopwords # list of words
import string # list(string.punctuation) - produces a list of punctuations
from collections import Counter # optimized way to do this
#wordToken = ['happy', 'thursday', 'from', 'my', 'big', 'sweater', 'and', 'this', 'ART', '#', 'East', 'Village', ',', 'Manhattan', 'https', ':', '//t.co/5k8PUInmqK', 'RT', '#', 'MayorKev', ':', 'IM', 'SO', 'HYPEE', '#', 'calloutband', '#', 'FreakLikeBex', '#', 'Callout', '#', 'TheBitterEnd', '#', 'Manhattan', '#', 'Music']
# this is the output from wordToken = [token.encode('utf-8') for tweetL in tweetList for token in nltk.tokenize.word_tokenize(tweetL)]
wordTokenLw = ' '.join(map(str, wordToken))
wordTokenLw = wordTokenLw.lower()
tweetD = {}
#c = Counter(wordTokenLw)
c = Counter(word.lower() for word in wordToken) # TRYING the suggested answer
#tweetD = dict(c.most_common())
tweetD = dict(c)
print tweetD
However, my output is completely wrong:
{'\x80': 2, 'j': 4, ' ': 192, '#': 21, "'": 1, '\xa6': 2, ',': 1, '/': 37, '.': 13, '1': 1, '0': 5, '3': 2, '2': 4, '5': 3, '7': 2, '9': 2, '8': 1, ';': 1, ':': 18, '#': 14, 'b': 17, 'a': 83, 'c': 36, '\xe2': 2, 'e': 63, 'd': 16, 'g': 10, 'f': 12, 'i': 37, 'h': 33, 'k': 12, '&': 1, 'm': 38, 'l': 22, 'o': 37, 'n': 49, 'q': 5, 'p': 33, 's': 32, 'r': 44, 'u': 20, 't': 104, 'w': 11, 'v': 14, 'y': 21, 'x': 8, 'z': 5}
I think the issue is with the way my dfile is formatted (I used a space as the separator for the join function). The reason I use join is to apply lower() and get everything in lowercase. However, if there is a better way that gives the right end result, it would be great to hear about it.
This is a new area for me and I truly appreciate your help!
The output after trying:
c = Counter(word.lower() for word in wordToken)
{'over': 1, 'hypee': 1, '//t.co/0\xe2\x80\xa6': 1, ',': 1, 'thursday': 1, 'day': 1, 'to': 2, 'dreams': 1, 'main': 1, '#': 14, 'automotive': 1, 'tbt': 1, 'positivital': 1, '2ma': 1, 'amp': 1, 'traveiplaces': 1, '//t.co/vmbal\xe2\x80\xa6': 1, '//t.co/c9ezuknraq': 1, 'motorcycles': 1, 'river': 1, 'view': 1, '//t.co/kpeunlzoyf': 1, 'art': 1, 'reillyhunter': 1, '//t.co/5pcxnzpwhw': 1, 'mayorkev': 1, 'rt': 5, '#': 21, 'pinterest': 1, 'away': 1, 'traveltuesday': 1, 'ice': 1, '//t.co/simhceefqy': 1, 'state': 1, 'fog': 1, ';': 1, '3d': 1, 'be': 1, 'run': 1, '//t.co/xrqaa7cb3e': 1, 'taevision': 1, 'by': 1, 'on': 1, 'livemusic': 1, 'bmwmotorradusa': 1, 'taking': 1, 'calloutband': 1, 'jersey': 1, 'uber': 1, 'bell': 1, 'freaklikebex': 1, 'village': 1, '.': 1, 'from': 2, '//t.co/5k8puinmqk': 1, '//t.co/gappxrvuql': 1, '&': 1, '500px': 1, 'sweater': 1, 'callout': 1, 'next': 1, 'appears': 1, 'music': 1, 'https': 5, ':': 18, 'happy': 1, 'park': 1, 'mercedesbenz': 1, 'amcafee': 1, 'foggy': 1, 'east': 2, '7pm': 1, 'this': 2, 'of': 1, 'taxis': 1, 'my': 1, 'and': 2, 'bridge': 1, 'centralpark': 1, '//t.co/ujdzsywt0u': 1, 'toughrides': 1, '10/22': 1, 'am': 1, 'thebitterend': 1, 'bmwmotorrad': 1, 'im': 1, 'at': 2, 'in': 3, 'cream': 1, 'nj': 1, '//t.co/hnxktmvrsc': 1, 'ny': 2, 'big': 1, 'nyc': 3, 'rides': 1, 'manhattan': 10, 'nice': 1, 'week': 1, 'blue': 1, 'http': 7, 'effect': 1, 'paleteria': 1, "'m": 1, 'a': 1, '//t.co/ucgfcwp9j2': 1, 'i': 2, 'so': 1, 'bmw': 1}

When you join into a single string again, the Counter starts counting letters instead of words (since you're giving it an iterable of letters). Instead, you should make a Counter directly from the wordToken list; you can use a generator expression to call lower on each item as you put it in the counter:
c = Counter(word.lower() for word in wordToken)

This is one of the common mistakes when dealing with strings. Strings are iterable in Python, so when a function takes an iterable and we give it a string, the function acts on the elements of the string, which are the characters that make it up.
class collections.Counter([iterable-or-mapping])
In your case, you should simply build the counter from wordToken, like this:
Counter(map(lambda w: w.lower(), wordToken))
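The difference is easy to see side by side. In this sketch the token list is a short hypothetical stand-in for `wordToken`: joining the tokens into one string makes `Counter` count characters, while passing the lowercased tokens themselves makes it count words.

```python
from collections import Counter

# Hypothetical stand-in for the tokenised tweets
tokens = ['Happy', 'Thursday', 'Manhattan', '#', 'manhattan']

# Counting a joined string iterates over its characters
by_char = Counter(' '.join(tokens).lower())
# Counting an iterable of tokens iterates over whole words
by_word = Counter(token.lower() for token in tokens)

print(by_char.most_common(3))
print(by_word['manhattan'])  # 2
```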
