creating a list of dictionaries from pandas dataframe - python

This is my df:
df = pd.DataFrame({'sym': ['a', 'b', 'c', 'x', 'y', 'z', 'q', 'w', 'e'],
                   'sym_t': ['tsla', 'msft', 'f', 'aapl', 'aa', 'gg', 'amd', 'ba', 'c']})
I want to separate this df into groups of three and create a list of dictionaries:
options = [{'value':'a b c', 'label':'tsla msft f'}, {'value':'x y z', 'label':'aapl aa gg'}, {'value':'q w e', 'label':'amd ba c'}]
How can I create that list? My original df has over 1000 rows.

Try groupby to concatenate the rows, then to_dict:
import numpy as np

# group every three consecutive rows and join each column's strings with a space
tmp = df.groupby(np.arange(len(df)) // 3).agg(' '.join)
tmp.columns = ['value', 'label']
tmp.to_dict(orient='records')
Output:
[{'value': 'a b c', 'label': 'tsla msft f'},
{'value': 'x y z', 'label': 'aapl aa gg'},
{'value': 'q w e', 'label': 'amd ba c'}]
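An alternative sketch (not from the original answer), assuming the row count is a multiple of three, slices the frame directly instead of using groupby:
# chunk the dataframe three rows at a time and join each chunk's columns
options = [
    {'value': ' '.join(chunk['sym']), 'label': ' '.join(chunk['sym_t'])}
    for chunk in (df.iloc[i:i + 3] for i in range(0, len(df), 3))
]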

Related

Iterate over rows in pandas dataframe. If blanks exist before a specific column, move all column values over

I am attempting to iterate over all rows in a pandas dataframe and shift the leftmost columns of each row to the right until all the non-null values in the row touch. The amount of movement depends on the number of empty columns between the first null value and the cutoff column.
In this case I am attempting to 'close the gap' so the leftmost values end up against column 'ddd', touching the cutoff column 'eee'. The correlating 'abc' rows should help to visualize the problem.
Column 'eee' and the columns to its right should not be touched or moved.
def moveOver():
    df = {
        'aaa': ['a', 'a', 'a', 'a', 'a', 'a'],
        'bbb': ['', 'b', 'b', 'b', '', 'b'],
        'ccc': ['', '', 'c', 'c', '', 'c'],
        'ddd': ['', '', '', 'd', '', ''],
        'eee': ['b', 'c', 'd', 'e', 'b', 'd'],
        'fff': ['c', 'd', 'e', 'f', 'c', 'e'],
        'ggg': ['d', 'e', 'f', 'g', 'd', 'f']
    }
In rows 1 and 5, 'a' would be moved over three columns to column 'ddd'.
In row 2, ['a', 'b'] would be moved over two columns to columns ['ccc', 'ddd'] respectively,
etc.
finalOutput = {
    'aaa': ['', '', '', 'a', '', ''],
    'bbb': ['', '', 'a', 'b', '', 'a'],
    'ccc': ['', 'a', 'b', 'c', '', 'b'],
    'ddd': ['a', 'b', 'c', 'd', 'a', 'c'],
    'eee': ['b', 'c', 'd', 'e', 'b', 'd'],
    'fff': ['c', 'd', 'e', 'f', 'c', 'e'],
    'ggg': ['d', 'e', 'f', 'g', 'd', 'f']
}
You can do this:
import numpy as np
from collections import Counter

# only the columns strictly to the left of 'eee' are allowed to move;
# roll each row right by its count of blank cells so the values touch 'eee'
keep_cols = df.columns[:df.columns.get_loc('eee')]
df.loc[:, keep_cols] = [np.roll(v, Counter(v)['']) for v in df[keep_cols].values]
print(df)
  aaa bbb ccc ddd eee fff ggg
0               a   b   c   d
1           a   b   c   d   e
2       a   b   c   d   e   f
3   a   b   c   d   e   f   g
4               a   b   c   d
5           a   b   c   d   e
Explanation:
You want to consider only the columns to the left of 'eee', so those are stored in keep_cols.
Next, each row has to be shifted right by some amount; numpy's roll does the shifting. But by how much? By the number of blank values in the row, which is what Counter from collections counts.
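As a minimal illustration of how the shift amount is computed, here is the first row of the keep_cols block on its own:
import numpy as np
from collections import Counter

row = ['a', '', '', '']     # first row of the keep_cols block
shift = Counter(row)['']    # three blanks -> shift by 3
print(np.roll(row, shift))  # ['' '' '' 'a']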

Merge specific rows in pandas Df

I have a df after read_excel where some of the values (strings from one column) are split across several rows. How can I merge them back?
For example, the df I have:
{'CODE': ['A', None, 'B', None, None, 'C'],
 'TEXT': ['A', 'a', 'B', 'b', 'b', 'C'],
 'NUMBER': ['1', None, '2', None, None, '3']}
The df I want:
{'CODE': ['A', 'B', 'C'],
 'TEXT': ['Aa', 'Bbb', 'C'],
 'NUMBER': ['1', '2', '3']}
I can't find the right solution. I tried importing the data in different ways, but that did not help either.
You can forward fill the missing values/Nones in CODE to form the groups, then aggregate with join for TEXT and take the first non-None value for NUMBER:
import pandas as pd

d = {'CODE': ['A', None, 'B', None, None, 'C'],
     'TEXT': ['A', 'a', 'B', 'b', 'b', 'C'],
     'NUMBER': ['1', None, '2', None, None, '3']}
df = pd.DataFrame(d)

# group by the forward-filled CODE, join the TEXT pieces and keep the first NUMBER
df1 = df.groupby(df['CODE'].ffill()).agg({'TEXT': ''.join, 'NUMBER': 'first'}).reset_index()
print(df1)
  CODE TEXT NUMBER
0    A   Aa      1
1    B  Bbb      2
2    C    C      3
You can also generate the aggregation dictionary dynamically, so it works for any number of columns:
cols = df.columns.difference(['CODE'])
d1 = dict.fromkeys(cols, 'first')  # every column defaults to 'first'
d1['TEXT'] = ''.join               # except TEXT, which is concatenated
df1 = df.groupby(df['CODE'].ffill()).agg(d1).reset_index()
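As a quick sanity check of the dynamic mapping, here is a sketch with a hypothetical extra PRICE column (my own example data, not from the question), showing that the same two lines scale without listing every column:
import pandas as pd

# hypothetical frame with an extra PRICE column (illustrative only)
d2 = {'CODE': ['A', None, 'B'],
      'TEXT': ['A', 'a', 'B'],
      'NUMBER': ['1', None, '2'],
      'PRICE': ['10', None, '20']}
df2 = pd.DataFrame(d2)

cols = df2.columns.difference(['CODE'])
agg_map = dict.fromkeys(cols, 'first')  # every column defaults to 'first'
agg_map['TEXT'] = ''.join               # TEXT pieces are concatenated instead
print(df2.groupby(df2['CODE'].ffill()).agg(agg_map).reset_index())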

dataframe remove last digit from string if it is number

Python dataframe.
I want to delete the last character of each string if it is a number.
From the current dataframe:
import numpy as np
import pandas as pd

data = {'d': ['AAA2', 'BB 2', 'C', 'DDD ', 'EEEEEEE)', 'FFF ()', np.nan, '123456']}
df = pd.DataFrame(data)
To the new dataframe:
data = {'d': ['AAA2', 'BB 2', 'C', 'DDD ', 'EEEEEEE)', 'FFF ()', np.nan, '123456'],
        'expected': ['AAA', 'BB', 'C', 'DDD', 'EEEEEEE)', 'FFF (', np.nan, '12345']}
df = pd.DataFrame(data)
df
Using .str.replace:
# remove a trailing digit; non-string values such as NaN are left untouched
df['d'] = df['d'].str.replace(r'(\d)$', '', regex=True)
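The expected column above also drops trailing spaces (e.g. 'BB 2' becomes 'BB', 'DDD ' becomes 'DDD'). If that is wanted too, one possible variation, an assumption on my part rather than part of the original answer, strips whitespace after removing the digit:
# assumption: trailing whitespace should also be removed after dropping the digit
df['d'] = df['d'].str.replace(r'\d$', '', regex=True).str.rstrip()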

Updating a list of dictionaries from another list of dictionaries in Python

I have two lists of dictionaries. I am trying to compare test2 with test1 and update accordingly.
test1 = [{'names': ['a', 'b', 'c'], 'country': 'USA', 'state': 'Texas'},
         {'names': ['d', 'e', 'f'], 'country': 'Australia', 'state': 'Melbourne'},
         {'names': ['i', 'j', 'k'], 'country': 'canada', 'state': 'Toronto'},
         {'names': ['l', 'm', 'n'], 'country': 'Austria', 'state': 'Burgenland'}]
test2 = [{'code': 4286,
          'list_of_countries': ['USA', 'Australia', 'Colombia', 'Denmark', 'Greece', 'Iceland']},
         {'code': 4287,
          'list_of_countries': ['Texas', 'Argentina', 'Austria', 'Bangladesh', 'canada']}]
Expected Output:
test2 = [{'names': ['a', 'b', 'c', 'd', 'e', 'f'],
          'country': ['USA', 'Australia'],
          'state': ['Texas', 'Melbourne'],
          'code': 4286},
         {'names': ['i', 'j', 'k', 'l', 'm', 'n'],
          'country': ['canada', 'Austria'],
          'state': ['Toronto', 'Burgenland'],
          'code': 4287}]
I tried the snippet below, searching for each test1 country in test2's list_of_countries:
for i in test1:
    for j in test2:
        a = []
        if i.get('country') in j.get('list_of_countries'):
            a.append({'country': i.get('country'), 'state': i.get('state')})
            j.update(a)
You can transform test2 into a dictionary, associating each entry in list_of_countries with its code. Then you can use this mapping for grouping:
test2 = [{'code': 4286, 'list_of_countries': ['USA', 'Australia', 'Colombia', 'Denmark', 'Greece', 'Iceland']}, {'code': 4287, 'list_of_countries': ['Texas', 'Argentina', 'Austria', 'Bangladesh', 'canada']}]
test1 = [{'names': ['a', 'b', 'c'], 'country': 'USA', 'state': 'Texas'}, {'names': ['d', 'e', 'f'], 'country': 'Australia', 'state': 'Melbourne'}, {'names': ['i', 'j', 'k'], 'country': 'canada', 'state': 'Toronto'}, {'names': ['l', 'm', 'n'], 'country': 'Austria', 'state': 'Burgenland'}]
d = {i:k['code'] for k in test2 for i in k['list_of_countries']}
Now you can create one defaultdict per code. By looping over the country/state dicts in test1, you keep a running record of the names, states and countries associated with each code:
from collections import defaultdict

new_d = dict(zip(d.values(), [defaultdict(list) for _ in d]))

for i in test1:
    for a, b in i.items():
        new_d[d[i['country']]][a].append(b)

r = [{'code': a, **b, 'names': [j for k in b['names'] for j in k]} for a, b in new_d.items()]
The final list comprehension transforms new_d to your desired format, a list of dictionaries.
Output:
[{'code': 4286, 'names': ['a', 'b', 'c', 'd', 'e', 'f'], 'country': ['USA', 'Australia'], 'state': ['Texas', 'Melbourne']}, {'code': 4287, 'names': ['i', 'j', 'k', 'l', 'm', 'n'], 'country': ['canada', 'Austria'], 'state': ['Toronto', 'Burgenland']}]
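A possible restatement of the same idea as a reusable helper; merge_by_code is my own name and structure, not part of the original answer:
from collections import defaultdict

def merge_by_code(entries, code_lists):
    """Group country/state/names entries by the code whose list contains their country."""
    code_of = {c: item['code'] for item in code_lists for c in item['list_of_countries']}
    grouped = defaultdict(lambda: defaultdict(list))
    for entry in entries:
        bucket = grouped[code_of[entry['country']]]
        bucket['names'].extend(entry['names'])
        bucket['country'].append(entry['country'])
        bucket['state'].append(entry['state'])
    return [{'code': code, **fields} for code, fields in grouped.items()]

print(merge_by_code(test1, test2))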

How to create a heatmap with condition?

I have the following data:
keys = ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C']
values = ['111', '222', '333', '444', '555', '666', '777', '888', '222', '888', '222', '333', '999', '444', '555', '666', '777', '888']
I want to create a heatmap as follows:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

x, y = 'value', 'key'  # column names for the two lists above
mydata = pd.DataFrame({x: values, y: keys})
df_new = mydata.set_index(x)[y].astype(str).str.get_dummies().T
fig, ax = plt.subplots(figsize=(20, 5))
ax = sns.heatmap(df_new, cbar=False, linewidths=.5)
plt.show()
The only issue is that the values appear as duplicated columns in the heatmap; for example, 222 appears three times. How can I collapse each value into a single column?
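One possible approach (a sketch of my own, not from the thread) is to build the presence matrix with pd.crosstab, which aggregates duplicate values into a single column before plotting:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# crosstab counts key/value pairs, so duplicate values share one column;
# clip to 0/1 if only presence/absence matters for the heatmap
mydata = pd.DataFrame({'value': values, 'key': keys})
presence = pd.crosstab(mydata['key'], mydata['value']).clip(upper=1)

fig, ax = plt.subplots(figsize=(20, 5))
sns.heatmap(presence, cbar=False, linewidths=.5)
plt.show()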
