I am trying to convert the first header row of an Excel table into a dict with a value of 1 for each header. I'm fairly new to Python and haven't been able to get this working. My table in the spreadsheet looks like:
Matrix | Column A | Column B
Row A  | 10       | 20
Row B  | 30       | 40
I would like my output to be the following dict:
{'Column A': 1,'Column B': 1}
I tried test_row = pd.read_excel("Test.xlsx", index_col=0).to_dict('index')
The number of columns will grow in the future, so it would be nice to have a solution that puts the headers of any number of columns into a dict with a value of 1. Many thanks!
Given your example Dataframe as
df = pd.DataFrame({'Matrix': {0: 'Row A', 1: 'Row B'}, 'Column A': {0: 10, 1: 30}, 'Column B': {0: 20, 1: 40}})
You can use:
cols_dict = {col: 1 for col in df.columns} # {'Matrix': 1, 'Column A': 1, 'Column B': 1}
rows_dict = {row: 1 for row in df.Matrix} # {'Row A': 1, 'Row B': 1}
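Since the expected output leaves out 'Matrix', one option is to make that first column the index, which is what index_col=0 does in pd.read_excel, so that df.columns holds only the data headers. A minimal sketch, assuming the sheet has the layout shown in the question; the frame is built in memory here so the example runs without Test.xlsx:

```python
import pandas as pd

# Same sample data as above; 'Matrix' holds the row labels.
df = pd.DataFrame({'Matrix': ['Row A', 'Row B'],
                   'Column A': [10, 30],
                   'Column B': [20, 40]})

# Equivalent to pd.read_excel("Test.xlsx", index_col=0) for this layout:
# the first column becomes the index and drops out of df.columns.
df = df.set_index('Matrix')

# One entry per remaining header, each mapped to 1; works for any number of columns.
cols_dict = dict.fromkeys(df.columns, 1)
print(cols_dict)  # {'Column A': 1, 'Column B': 1}
```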
Related
I have a dataset with two address columns that I need to compare to determine whether they contain the same number or set of numbers.
This is my dataset:
data = [['Road 76', 'Road 12, 55'], ['Road 11, 7-9', 'Road 11, 5'], ['Road 25', 'Street 5']]
df_original = pd.DataFrame( data, columns = ['Address 1', 'Address 2'])
This is the expected outcome:
test_data = [['Road 76', 'Road 12, 55', 0], ['7-9, Road 11', 'Road 11, 5', 1], ['Road 5', 'Street 25', 0]]
df_outcome = pd.DataFrame(test_data, columns = ['Address 1', 'Address 2', 'Number Match?'])
df_outcome
This is my attempt, but it only considers the first number appearing in each cell:
df_original['Address 1'] = df_original['Address 1'].str.extract('(\d+)')
df_original['Address 2'] = df_original['Address 2'].str.extract('(\d+)')
df_original['Number match'] = np.where(df_original['Address 1']==df_original['Address 2'], 1, 0)
Suggestions?
First get all integers with Series.str.findall and convert each list to a set, then intersect the sets with &; pandas converts the result to boolean (a non-empty intersection is True), and astype(int) maps True->1, False->0:
df_original['Address 1'] = df_original['Address 1'].str.findall(r'(\d+)').apply(set)
df_original['Address 2'] = df_original['Address 2'].str.findall(r'(\d+)').apply(set)
df_original['Number match'] = (df_original['Address 1'] & df_original['Address 2']).astype(int)
print(df_original)
Address 1 Address 2 Number match
0 {76} {55, 12} 0
1 {9, 7, 11} {5, 11} 1
2 {25} {5} 0
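If you want to keep the original address text, a variant is to build the number sets in separate Series and do the intersection explicitly in Python instead of relying on pandas' & between object columns. A sketch using the question's data; note that a range like "7-9" yields the numbers 7 and 9, not 8:

```python
import pandas as pd

data = [['Road 76', 'Road 12, 55'], ['Road 11, 7-9', 'Road 11, 5'], ['Road 25', 'Street 5']]
df = pd.DataFrame(data, columns=['Address 1', 'Address 2'])

# All digit runs in each address, as sets; the address columns are left untouched.
nums_1 = df['Address 1'].str.findall(r'\d+').apply(set)
nums_2 = df['Address 2'].str.findall(r'\d+').apply(set)

# 1 if the two sets share at least one number, else 0.
df['Number match'] = [int(bool(a & b)) for a, b in zip(nums_1, nums_2)]
print(df)
```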
Current dataframe:
Expected dataframe:
I've tried different things like pivot_table, set_index, unstack, etc., but none of them work as expected. My goal is a new dataframe with one row per unique combination of 'Col 1', 'Col 2' and 'Col 3'. I also need to transpose the data in 'Col 4' and 'Col 5' into separate columns, with the values in 'Col 4' serving as the new column names. I'm having trouble reshaping the dataframe because I not only have duplicate index values but also need to transpose on duplicate values in 'Col 4'. Any ideas on how we could accomplish this?
[Apologies for linking to images of the current and expected dataframes; as a new user I could only post links.]
Try this:
import pandas as pd
df = pd.DataFrame({'Col 1': [1, 2, 3], 'Col 2': ['abc'] * 3, 'Col 3': ['xyz']*3, 'Col 4': ['a', 'b', 'a'], 'Col 5': pd.date_range("2021-08-17", periods=3, freq="D")})
df = df.sort_values('Col 4')
# number the repeats of each 'Col 4' value: 1, 2, ...
df['tmp'] = df.groupby('Col 4').cumcount() + 1
# make the 'Col 4' values unique, e.g. 'a ( 1 )', 'a ( 2 )'
df['Col 4'] = df['Col 4'] + ' ( ' + df['tmp'].astype(str) + ' )'
pd.pivot(df, values='Col 5', index=['Col 1', 'Col 2', 'Col 3'], columns=['Col 4'])
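Grouping only by 'Col 4' numbers the repeats across the whole frame, so a value that occurs in many rows produces many columns. A variant, assuming you want the numbering to restart for every ('Col 1', 'Col 2', 'Col 3') combination, groups by all four columns. The data below is made up so that two rows share the same key and the same 'Col 4' value, which is the case where a plain pivot fails:

```python
import pandas as pd

# Hypothetical data: rows 0 and 1 share (Col 1, Col 2, Col 3) and 'Col 4' == 'a'.
df = pd.DataFrame({
    'Col 1': [1, 1, 2],
    'Col 2': ['abc'] * 3,
    'Col 3': ['xyz'] * 3,
    'Col 4': ['a', 'a', 'b'],
    'Col 5': pd.date_range('2021-08-17', periods=3, freq='D'),
})

keys = ['Col 1', 'Col 2', 'Col 3']
# Number duplicates of each 'Col 4' value within every key combination: 1, 2, ...
n = df.groupby(keys + ['Col 4']).cumcount() + 1
df['Col 4'] = df['Col 4'] + ' ( ' + n.astype(str) + ' )'

# Each (key, 'Col 4') pair is now unique, so pivot no longer hits duplicates.
wide = df.pivot(index=keys, columns='Col 4', values='Col 5')
print(wide)
```

Passing a list to index= in pivot needs pandas 1.1 or newer.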
I have a dataframe with many rows.
How can I reshape the source dataframe below into the target dataframe, which has a single row per ID?
import pandas as pd
# source dataframe
df_source = pd.DataFrame({
'ID': ['A01', 'A01'],
'Code': ['101', '102'],
'amount for code': [10000, 20000],
'count for code': [4, 3]
})
# target dataframe
df_target = pd.DataFrame({
'ID': ['A01'],
'Code101': [1],
'Code102': [1],
'Code103': [0],
'amount for code101': [10000],
'count for code101': [4],
'amount for code102': [20000],
'count for code102': [3],
'amount for code103': [None],
'count for code103': [None],
'count for code': [None],
'sum of amount': [30000],
'sum of count': [7]
})
I tried pd.get_dummies, but it only indicates whether each code is present or not; it doesn't carry the amount and count for each code.
How can I reshape the dataframe to get this result?
You can iterate through the rows of your existing dataframe and populate (using .at or .loc) your new dataframe (df2). df2 will have the index ID, which is now unique.
import pandas as pd
df = pd.DataFrame({
    'ID': ['A01', 'A01'],
    'Code': ['101', '102'],
    'amount for code': [10000, 20000],
    'count for code': [4, 3]
})
df2 = pd.DataFrame()
for idx, row in df.iterrows():
    for col in df.columns:
        if col != 'ID' and col != 'Code':
            df2.at[row['ID'], col + row['Code']] = row[col]
You can use pivot_table:
df_result = df.pivot_table(index='ID', columns='Code', values=['amount for code', 'count for code'])
This returns a data frame with a multi-level column index, for example ('amount for code', '101').
Then you can add other calculated columns like sum of amount and so on.
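Putting those steps together, here is a sketch that builds the target layout. It assumes the full list of codes is known in advance (the target has Code103 columns even though 103 never occurs in the source) and uses aggfunc='sum' rather than pivot_table's default mean; pd.crosstab supplies the 1/0 presence flags that get_dummies was meant to provide:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['A01', 'A01'],
    'Code': ['101', '102'],
    'amount for code': [10000, 20000],
    'count for code': [4, 3],
})

# Assumption: all possible codes are known up front.
all_codes = ['101', '102', '103']
values = ['amount for code', 'count for code']

# One row per ID; columns are a MultiIndex like ('amount for code', '101').
wide = df.pivot_table(index='ID', columns='Code', values=values, aggfunc='sum')
# Flatten to 'amount for code101' and add NaN columns for codes that never occur.
wide.columns = [f'{name}{code}' for name, code in wide.columns]
wide = wide.reindex(columns=[f'{name}{code}' for code in all_codes for name in values])

# 1/0 flag for whether each code appears for the ID.
flags = pd.crosstab(df['ID'], df['Code']).reindex(columns=all_codes, fill_value=0)
flags = (flags > 0).astype(int).add_prefix('Code')

# Combine flags and values, then add the per-ID totals.
result = flags.join(wide)
result['sum of amount'] = df.groupby('ID')['amount for code'].sum()
result['sum of count'] = df.groupby('ID')['count for code'].sum()
result = result.reset_index()
print(result)
```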
I have two DataFrames containing examples, and I would like to check whether an example in DataFrame 1 is present in DataFrame 2.
Normally I would aggregate the rows per example and simply merge the DataFrames. Unfortunately, the merge has to go through a "matching table" with a many-to-many relationship between the keys (id_low vs. id_high).
Simplified example
Matching Table:
Input DataFrames
They are therefore matchable like this:
Expected Output:
Simplified example (for Python)
import pandas as pd
# Dataframe 1 - containing 1 Example
d1 = pd.DataFrame.from_dict({'Example': {0: 'Example 1', 1: 'Example 1', 2: 'Example 1'},
'id_low': {0: 1, 1: 2, 2: 3}})
# DataFrame 2 - containing 1 Example
d2 = pd.DataFrame.from_dict({'Example': {0: 'Example 2', 1: 'Example 2', 2: 'Example 2'},
'id_low': {0: 1, 1: 4, 2: 6}})
# DataFrame 3 - matching table
dm = pd.DataFrame.from_dict({'id_low': {0: 1, 1: 2, 2: 2, 3: 3, 4: 3, 5: 4, 6: 5, 7: 6, 8: 6},
'id_high': {0: 'A',
1: 'B',
2: 'C',
3: 'D',
4: 'E',
5: 'B',
6: 'B',
7: 'E',
8: 'F'}})
d1 and d2 are matchable as you can see above.
Expected Output (or similar):
df_output = pd.DataFrame.from_dict({'Example': {0: 'Example 1'}, 'Example_2': {0: 'Example 2'}})
Failed attempts
Aggregating the values translated through the matching table, then merging. I also considered using a regex with the OR operator.
IIUC:
(d2.merge(dm)
   .merge(d1.merge(dm), on='id_high')
   .groupby(['Example_x', 'Example_y'])['id_high'].agg(list)
   .reset_index())
Output:
Example_x Example_y id_high
0 Example 2 Example 1 [A, B, E]
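To get the output in the shape of the expected df_output (DataFrame 1's example on the left), the same idea can be written with d1 first and explicit suffixes, keeping just the matched example pairs. A sketch; the matching table is the one from the question, written with lists:

```python
import pandas as pd

d1 = pd.DataFrame({'Example': ['Example 1'] * 3, 'id_low': [1, 2, 3]})
d2 = pd.DataFrame({'Example': ['Example 2'] * 3, 'id_low': [1, 4, 6]})
dm = pd.DataFrame({'id_low': [1, 2, 2, 3, 3, 4, 5, 6, 6],
                   'id_high': ['A', 'B', 'C', 'D', 'E', 'B', 'B', 'E', 'F']})

# Translate both frames from id_low to id_high via the many-to-many matching table.
h1 = d1.merge(dm, on='id_low')
h2 = d2.merge(dm, on='id_low')

# Two examples match if they share at least one id_high value.
pairs = (h1.merge(h2, on='id_high', suffixes=('_1', '_2'))
           [['Example_1', 'Example_2']]
           .drop_duplicates()
           .reset_index(drop=True))
print(pairs)
```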
I have data like --
sample 1, domain 1, value 1
sample 1, domain 2, value 1
sample 2, domain 1, value 1
sample 2, domain 3, value 1
-- stored in a dictionary --
dict_1 = {('sample 1','domain 1'): value 1, ('sample 1', 'domain 2'): value 1}
-- etc.
Now, I have a different kind of value, named value 2 --
sample 1, domain 1, value 2
sample 1, domain 2, value 2
sample 2, domain 1, value 2
sample 2, domain 3, value 2
-- which I again put in a dictionary,
dict_2 = {('sample 1','domain 1'): value 2, ('sample 1', 'domain 2'): value 2}
How can I merge these two dictionaries in python? The keys, for instance ('sample 1', 'domain 1') are the same for both dictionaries.
I expect it to look like --
final_dict = {('sample 1', 'domain 1'): (value 1, value 2), ('sample 1', 'domain 2'): (value 1, value 2)}
-- etc.
The closest you're likely to get to this would be a dict of lists (or sets). For simplicity, you usually go with collections.defaultdict(list) so you're not constantly checking if the key already exists. You need to map to some collection type as a value because dicts have unique keys, so you need some way to group the multiple values you want to store for each key.
from collections import defaultdict
final_dict = defaultdict(list)
for d in (dict_1, dict_2):
    for k, v in d.items():
        final_dict[k].append(v)
Or equivalently with itertools.chain, you just change the loop to:
from itertools import chain
for k, v in chain(dict_1.items(), dict_2.items()):
    final_dict[k].append(v)
Side-note: If you really need it to be a proper dict at the end, and/or insist on the values being tuples rather than lists, a final pass can convert to such at the end:
final_dict = {k: tuple(v) for k, v in final_dict.items()}
You can use set intersection of keys to do this:
dict_1 = {('sample 1','domain 1'): 'value 1', ('sample 1', 'domain 2'): 'value 1'}
dict_2 = {('sample 1','domain 1'): 'value 2', ('sample 1', 'domain 2'): 'value 2'}
result = {k: (dict_1.get(k), dict_2.get(k)) for k in dict_1.keys() & dict_2.keys()}
print(result)
# {('sample 1', 'domain 1'): ('value 1', 'value 2'), ('sample 1', 'domain 2'): ('value 1', 'value 2')}
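The above uses dict.get(), which returns None by default instead of raising a KeyError. Since the comprehension only iterates over keys present in both dictionaries, a KeyError cannot actually occur here, so plain indexing (dict_1[k]) would work as well.
As @ShadowRanger suggests in the comments, to also cover keys that exist in only one dictionary, iterate over the union of the keys and fill the missing value from the other dictionary: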
{k: (dict_1.get(k, dict_2.get(k)), dict_2.get(k, dict_1.get(k))) for k in dict_1.keys() | dict_2.keys()}
Does something handcrafted like this work for you?
dict3 = {}
for i in dict_1:
    # assumes every key in dict_1 also exists in dict_2
    dict3[i] = (dict_1[i], dict_2[i])
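The loop above raises a KeyError if a key of dict_1 is missing from dict_2. A sketch that generalizes it to any number of dictionaries and puts a placeholder where a dictionary lacks a key; merge_dicts and its fill parameter are hypothetical names, not a library function:

```python
def merge_dicts(*dicts, fill=None):
    """Merge dicts into {key: (value from dicts[0], value from dicts[1], ...)}.

    Every key seen in any dict is kept (in first-seen order); a dict that
    lacks the key contributes `fill` at its position.
    """
    # dict.fromkeys de-duplicates keys while preserving first-seen order
    keys = dict.fromkeys(k for d in dicts for k in d)
    return {k: tuple(d.get(k, fill) for d in dicts) for k in keys}


dict_1 = {('sample 1', 'domain 1'): 'value 1', ('sample 2', 'domain 3'): 'value 1'}
dict_2 = {('sample 1', 'domain 1'): 'value 2'}

print(merge_dicts(dict_1, dict_2))
# {('sample 1', 'domain 1'): ('value 1', 'value 2'), ('sample 2', 'domain 3'): ('value 1', None)}
```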
from collections import defaultdict
from itertools import chain
dict_1 = {('sample 1','domain 1'): 1, ('sample 1', 'domain 2'): 2}
dict_2 = {('sample 1','domain 1'): 3, ('sample 1', 'domain 2'): 4}
new_dict_to_process = defaultdict(list)
dict_list=[dict_1.items(),dict_2.items()]
for k, v in chain(*dict_list):
    new_dict_to_process[k].append(v)
new_dict_to_process then contains:
{('sample 1', 'domain 1'): [1, 3],
 ('sample 1', 'domain 2'): [2, 4]}