How to assign dynamic variables when calling a function in Python

I have a function that does a bunch of processing and returns pandas dataframes. The dataframes are built from a dynamic list, so I'm using the method below to return them.
As soon as I call the function (second code block), my Jupyter notebook runs the cell indefinitely, as if stuck in an infinite loop. Any idea how I can do this more efficiently?
def funct(x):
    # some code which creates multiple dataframes
    i = 0
    for k in range(len(dynamic_list)):
        i += 1
        return globals()["df" + str(i)]
The next thing I do is call the function and try to assign it dynamically,
i = 0
for k in range(len(dynamic_list)):
    i += 1
    globals()["new_df" + str(i)] = funct(x)
I have tried returning selected dataframes from the first function, and that works just fine:
def funct(x):
    # some code creating df1, df2, df3, ..., df_n
    return df1, df2

new_df1, new_df2 = funct(x)

For each dataframe your code creates, you can simply add it to a dictionary, using a key taken from your dynamic list.
Here is a simple example:
import pandas as pd
test_data = {"key1":[1, 2, 3], "key2":[1, 2, 3], "key3":[1, 2, 3]}
df = pd.DataFrame.from_dict(test_data)
Dataframe example:
   key1  key2  key3
0     1     1     1
1     2     2     2
2     3     3     3
I have used a fixed list of values to focus on but this can be dynamic based on however you are creating them.
values_of_interest_list = [1, 3]
Now we can do whatever we want to do with the dataframe, in this instance, I want to filter only data where we have a value from our list.
data_dict = {}
for value_of_interest in values_of_interest_list:
    x_df = df[df["key1"] == value_of_interest]
    data_dict[value_of_interest] = x_df
To see what we have, we can print out the created dictionary that contains the key we have assigned and the associated dataframe object.
for key, value in data_dict.items():
    print(type(key))
    print(type(value))
Which returns
<class 'int'>
<class 'pandas.core.frame.DataFrame'>
<class 'int'>
<class 'pandas.core.frame.DataFrame'>
Full sample code is below:
import pandas as pd
test_data = {"key1":[1, 2, 3], "key2":[1, 2, 3], "key3":[1, 2, 3]}
df = pd.DataFrame.from_dict(test_data)
values_of_interest_list = [1, 3]
# Dictionary for data
data_dict = {}
# Loop through the values of interest
for value_of_interest in values_of_interest_list:
    x_df = df[df["key1"] == value_of_interest]
    data_dict[value_of_interest] = x_df
for key, value in data_dict.items():
    print(type(key))
    print(type(value))
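Applied to the original question, the same idea means the function builds and returns one dictionary instead of writing names into globals(). The following is only a minimal sketch: the function name build_frames and the generated keys are hypothetical, and the data is the sample data from above.

```python
import pandas as pd


def build_frames(df, values_of_interest):
    """Return one filtered DataFrame per value, keyed by a generated name."""
    frames = {}
    for i, value in enumerate(values_of_interest, start=1):
        # "new_df1", "new_df2", ... are plain dictionary keys, not variables
        frames[f"new_df{i}"] = df[df["key1"] == value]
    return frames


df = pd.DataFrame({"key1": [1, 2, 3], "key2": [1, 2, 3], "key3": [1, 2, 3]})
frames = build_frames(df, [1, 3])
print(list(frames))  # ['new_df1', 'new_df2']
print(frames["new_df2"])
```

A single call returns every dataframe at once, so there is no need to call the function inside a loop or to touch globals().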

Related

count how often a key appears in a dataset

I have a pandas dataframe with three columns; the third is the second one with some string slicing applied.
Every warranty_claim_number has a key_part_number (first column).
This dataframe has a lot of rows.
I have a second list, which contains 70 randomly selected warranty_claim_numbers.
I would like to find the corresponding key_part_number for each of those 70 claims in my dataset.
Then I would like to create a dictionary with the key_part_number as key and the corresponding warranty_claim_number as value.
Finally, I want to count how often each key_part_number appears in this dataset and update the key.
This should look like this:
dicti = {4:'000120648353',10:'000119582589',....}
First of all, you need to change the datatype of warranty_claim_number to string, or you won't get the leading 0's.
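If the claim numbers were already read in as integers, converting to string alone does not bring the zeros back; they have to be padded. A small sketch, assuming a fixed width of 12 digits (taken from the example value '000120648353') and made-up sample rows:

```python
import pandas as pd

# Claim numbers that lost their leading zeros when parsed as integers
df = pd.DataFrame({"warranty_claim_number": [120648353, 119582589]})

# Convert to string, then left-pad with zeros to the assumed width of 12
df["warranty_claim_number"] = (
    df["warranty_claim_number"].astype(str).str.zfill(12)
)
print(df["warranty_claim_number"].tolist())
# ['000120648353', '000119582589']
```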
You can subset your df from that list of claim numbers:
df = df[df["warranty_claim_number"].isin(claimnumberlist)]
This gives you a dataframe with only the rows with those claim numbers.
countofkeyparts = df["key_part_number"].value_counts()
This gives you a pandas Series with the counts, which you can convert to a dict with to_dict():
countofkeyparts = countofkeyparts.to_dict()
The keys in a dict have to be unique, so if you want the count as the key, the value can be a list of key_part_numbers:
values = {}
for key, value in countofkeyparts.items():
    values[value] = values.get(value, [])
    values[value].append(key)
Following your example, you can't use the number of occurrences as the dictionary key with a single part number as the value: dictionary keys are unique, and several part numbers can occur the same number of times. It is therefore better to store a list per count, in this format: dicti = {4:['000120648353', '09824091'],10:['000119582589'] ,....}
I'll use randomly generated data as an example:
from collections import Counter
import random
lst = [random.randint(1, 10) for i in range(20)]
counter = Counter(lst)
print(counter) # First element, then number of occurrences
nums = set(counter.values()) # All occurrences
res = {item: [val for val in counter if counter[val] == item] for item in nums}
print(res)
# Counter({5: 6, 8: 4, 3: 2, 4: 2, 9: 2, 2: 2, 6: 1, 10: 1})
# {1: [6, 10], 2: [3, 4, 9, 2], 4: [8], 6: [5]}
This does what you want:
# Select the key_part_number of the rows whose warranty_claim_number is in lst:
df_wanted = df.loc[df["warranty_claim_number"].isin(lst), "key_part_number"]
# Count the values in that column:
count_values = df_wanted.value_counts()
# Transform to Dictionary:
print(count_values.to_dict())

Get dict keys using pandas apply

I want to get values from a dict that looks like this:
pair_devices_count =
{('tWAAAA.jg', 'ttNggB.jg'): 1,
('tWAAAM.jg', 'ttWVsM.jg'): 2,
('tWAAAN.CV', 'ttNggB.AS'): 1,
('tWAAAN.CV', 'ttNggB.CV'): 2,
('tWAAAN.CV', 'ttNggB.QG'): 1}
(pairs of domains)
But when I use
train_data[['domain', 'target_domain']].apply(lambda x: pair_devices_count.get((x), 0))
it raises an error, because pandas Series are not hashable.
How can I look up the dict values to generate the column
train['pair_devices_count']?
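Without axis=1, apply passes each column to the function rather than each row, so the lambda never sees a (domain, target_domain) pair. You can try this:
train_data.apply(lambda x: pair_devices_count.get((x.domain, x.target_domain), 0), axis=1)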
"pandas series are not hashable"
Convert the pd.Series to a tuple before using .get. Consider the following simple example:
import pandas as pd
d = {('A','A'):1,('A','B'):2,('A','C'):3}
df = pd.DataFrame({'X':['A','A','A'],'Y':['C','B','A'],'Z':['X','Y','Z']})
df['d'] = df[['X','Y']].apply(lambda x:d.get(tuple(x)),axis=1)
print(df)
Output:
   X  Y  Z  d
0  A  C  X  3
1  A  B  Y  2
2  A  A  Z  1
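An alternative that avoids a row-wise apply is to zip the two columns, which yields one hashable tuple per row. This is a sketch on made-up data; the default of 0 for missing pairs follows the question's use of .get(..., 0).

```python
import pandas as pd

d = {("A", "A"): 1, ("A", "B"): 2, ("A", "C"): 3}
df = pd.DataFrame({"X": ["A", "A", "B"], "Y": ["C", "B", "A"]})

# zip yields one (X, Y) tuple per row; pairs missing from d default to 0
df["d"] = [d.get(pair, 0) for pair in zip(df["X"], df["Y"])]
print(df["d"].tolist())  # [3, 2, 0]
```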

How to merge values of two arrays into one?

How to merge two arrays as value-pairs in python?
Example as follows:
A = [0,2,2,3]
B = [1,1,4,4]
Output:
[[0,1],[2,1],[2,4],[3,4]]
You can simply use "zip"
l1 = [0,2,2,3]
l2 = [1,1,4,4]
print(list(map(list ,zip(l1,l2))))
In addition to Greg's answer: if you need key-value pairs, cast your zip result to dict.
l1 = [0,2,2,3]
l2 = [1,1,4,4]
print(dict(zip(l1,l2)))
Output
{0: 1, 2: 4, 3: 4}
Before creating any loop, try to use built-ins.
There is also a similar question that covers your need:
Zip with list output instead of tuple
You can iterate through both lists simultaneously using zip(), and append each pair to a result list like so:
A = [0,2,2,3]
B = [1,1,4,4]
result = []
for item1, item2 in zip(A,B):
    result.append([item1, item2])
Output = [[0,1],[2,1],[2,4],[3,4]]
print(result) # Prints: [[0,1],[2,1],[2,4],[3,4]]
print(Output == result) # Prints: True
This would give you a list of lists like you were looking for in your question as an output.
Things to keep in mind
If the two starting lists have different lengths, zip() discards the remaining values once the shorter list runs out. For example, with:
A = [0,2,2,3,4,5]
B = [1,1,4,4]
result = []
for item1, item2 in zip(A,B):
    result.append([item1, item2])
Output = [[0,1],[2,1],[2,4],[3,4]]
print(result) # Prints: [[0,1],[2,1],[2,4],[3,4]]
print(Output == result) # Prints: True
Notice that the 4 and 5 in list A are discarded.
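If the extra values should be kept rather than dropped, itertools.zip_longest from the standard library pads the shorter list instead. A short sketch using the same lists; the choice of None as the fill value is just an assumption for illustration:

```python
from itertools import zip_longest

A = [0, 2, 2, 3, 4, 5]
B = [1, 1, 4, 4]

# zip stops at the end of the shorter list
truncated = [list(pair) for pair in zip(A, B)]
# zip_longest keeps going and fills the gaps with fillvalue
padded = [list(pair) for pair in zip_longest(A, B, fillvalue=None)]

print(truncated)  # [[0, 1], [2, 1], [2, 4], [3, 4]]
print(padded)     # [[0, 1], [2, 1], [2, 4], [3, 4], [4, None], [5, None]]
```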
Key-Value Pair
Also, this is not a key-value pair. For that you will want to look into dictionaries in Python. That would be something like:
output = {0:1, 2:4, 3:4}
This would allow you to look up a value based on its key, like so:
output[3] # Would be 4
output[0] # Would be 1
This doesn't fully work for this example, because 2 appears twice as a key, so the first value would be overwritten.
Since you have mentioned key-value, you probably mean a dictionary.
A = [0, 2, 2, 3]
B = [1, 1, 4, 4]
dct = {} # Empty dictionary
for key, value in zip(A, B):
    dct[key] = value
print(dct)
The output will be:
{0: 1, 2: 4, 3: 4}
Note that, by definition, a dictionary can't have two identical keys. So in your case, {2: 1} will be overwritten by {2: 4}.
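If losing the earlier value is not acceptable, one option is to collect every value for a key in a list. A minimal sketch with collections.defaultdict; the name grouped is only illustrative:

```python
from collections import defaultdict

A = [0, 2, 2, 3]
B = [1, 1, 4, 4]

grouped = defaultdict(list)
for key, value in zip(A, B):
    # Repeated keys extend the existing list instead of replacing the value
    grouped[key].append(value)

print(dict(grouped))  # {0: [1], 2: [1, 4], 3: [4]}
```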

Pandas integrate over columns per each row

In a simplified dataframe:
import pandas as pd
df1 = pd.DataFrame({'350': [7.898167, 6.912074, 6.049002, 5.000357, 4.072320],
'351': [8.094912, 7.090584, 6.221289, 5.154516, 4.211746],
'352': [8.291657, 7.269095, 6.393576, 5.308674, 4.351173],
'353': [8.421007, 7.374317, 6.496641, 5.403691, 4.439815],
'354': [8.535562, 7.463452, 6.584512, 5.485725, 4.517310],
'355': [8.650118, 7.552586, 6.672383, 4.517310, 4.594806]},
index=[1, 2, 3, 4, 5])
int_range = df1.columns.astype(float)
a = 0.005
b = 0.837
I would like to solve an equation, which was attached as an image.
I is the value in the dataframe. x is given by int_range, so in this case it runs from 350 to 355 with dx=1.
a and b are optional constants.
I need a dataframe as output, with one result per row.
For now I do something like this, but I'm not sure it's correct:
from scipy import integrate

dict_INT = {}
for index, row in df1.iterrows():
    x = row.index.astype('float')
    func = row*x  # integrand: I(x) * x
    # trapz was renamed to trapezoid in newer SciPy releases
    dict_INT[index] = integrate.trapezoid(func, x)
df_out = pd.DataFrame(dict_INT, index=['INT']).T
df_fin = df_out/(a*b)
This is the final sum I get per row:
1 3.505796e+06
2 3.068796e+06
3 2.700446e+06
4 2.199336e+06
5 1.840992e+06
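One way to cross-check these numbers is to apply the trapezoidal rule to all rows at once, written out by hand with NumPy. This is only a sketch: the equation image is not available, so it assumes, as the loop above does, that the integrand is I(x)·x and that the result is divided by a·b.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'350': [7.898167, 6.912074, 6.049002, 5.000357, 4.072320],
                    '351': [8.094912, 7.090584, 6.221289, 5.154516, 4.211746],
                    '352': [8.291657, 7.269095, 6.393576, 5.308674, 4.351173],
                    '353': [8.421007, 7.374317, 6.496641, 5.403691, 4.439815],
                    '354': [8.535562, 7.463452, 6.584512, 5.485725, 4.517310],
                    '355': [8.650118, 7.552586, 6.672383, 4.517310, 4.594806]},
                   index=[1, 2, 3, 4, 5])
a = 0.005
b = 0.837

x = df1.columns.astype(float).to_numpy()
f = df1.to_numpy() * x  # assumed integrand I(x) * x, one row per sample

# Trapezoidal rule: mean of neighbouring points times the step width
integral = ((f[:, 1:] + f[:, :-1]) / 2 * np.diff(x)).sum(axis=1)
result = pd.Series(integral / (a * b), index=df1.index, name='INT')
print(result)
```

Under those assumptions, the values agree with the per-row results listed above, which suggests the loop is a correct trapezoidal integration of I(x)·x.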
I solved this by first converting the dataframe to a dict, then evaluating your equation for each item in each row, and collecting the values in a collections.defaultdict. I will break it down:
import pandas as pd
from collections import defaultdict
df1 = pd.DataFrame({'350': [7.898167, 6.912074, 6.049002, 5.000357, 4.072320],
'351': [8.094912, 7.090584, 6.221289, 5.154516, 4.211746],
'352': [8.291657, 7.269095, 6.393576, 5.308674, 4.351173],
'353': [8.421007, 7.374317, 6.496641, 5.403691, 4.439815],
'354': [8.535562, 7.463452, 6.584512, 5.485725, 4.517310],
'355': [8.650118, 7.552586, 6.672383, 4.517310, 4.594806]},
index=[1, 2, 3, 4, 5]
)
int_range = df1.columns.astype(float)
a = 0.005
b = 0.837
dx = 1
df_dict = df1.to_dict() # convert df to dict for easier operations
integrated_dict = {} # initialize empty dict
d = defaultdict(list) # initialize empty dict of lists for tuples later
integrated_list = []
for k,v in df_dict.items(): # unpack df dict of dicts (k is the column label)
    for x,y in v.items(): # unpack each column dict (x is the row index, y is the cell value)
        integrated_list.append((k, (((float(k)*float(y)*float(dx))/(a*b))))) # store a list of tuples
for x,y in integrated_list: # group the computed values by column header
    d[x].append(y)
d = {k:tuple(v) for k, v in d.items()} # unpack to multiple values
integrated_df = pd.DataFrame.from_dict(d) # to df
integrated_df['Sum'] = integrated_df.iloc[:, :].sum(axis=1)
output (updated to include sum):
350 351 352 353 354 \
0 660539.653524 678928.103226 697410.576822 710302.382557 722004.527599
1 578070.704898 594694.141935 611402.972521 622015.269056 631317.086738
2 505890.250896 521785.529032 537763.142652 547984.294624 556969.473835
3 418189.952210 432314.245161 446512.126165 455795.202628 464025.483871
4 340576.344086 353243.212903 365976.797133 374493.356033 382109.376344
355 Sum
0 733761.502987 4.202947e+06
1 640661.416965 3.678162e+06
2 565996.646356 3.236389e+06
3 383188.781362 2.600026e+06
4 389762.516129 2.206162e+06

Comparing two Dictionaries and Print the Common

I have two tab-separated files with multiple columns. I used two dictionaries to store the specific columns of interest.
import csv
dic1={}
dic2={}
with open("Table1.tsv") as samplefile:
    reader = csv.reader(samplefile, delimiter="\t")
    columns = zip(*reader)
    for column in columns:
        A, B, C, D = columns
with open("Table2.tsv") as samplefile1:
    reader = csv.reader(samplefile1, delimiter="\t")
    columns = zip(*reader)
    for column1 in columns:
        A1, B1, C1 = columns
dic1['PMID'] = A # the first dictionary storing the data of column "A"
dic2['PMID'] = A1 # the second dictionary storing the data of column "A1"
# statement to compare the data in dic1[PMID] with dic2['PMID'] and print the common
Problem: What is the proper logic or conditional statement to compare the two dictionaries and print the data common to both?
You can use set intersection as:
>>> d1={'a':2,'b':3,'c':4,'d':5}
>>> d2={'a':2,'f':3,'c':4,'b':5,'q':17}
>>> dict(set(d1.items()) & set(d2.items()))
{'a': 2, 'c': 4}
For your specific problem, this is the code:
>>> dic1={}
>>> dic2={}
>>> dic1['PMID']=[1,2,34,2,3,4,5,6,7,3,5,16]
>>> dic2['PMID']=[2,34,1,3,4,15,6,17,31,34,16]
>>> common=list(set(dic1['PMID']) & set(dic2['PMID']))
>>> common
[1, 2, 3, 4, 6, 34, 16]
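Sets do not guarantee any particular order, so the order of common can differ from the order in the file. If the original order matters, a sketch like the following keeps the order of the first list and drops repeats; the names list1, list2, lookup and seen are only illustrative.

```python
list1 = [1, 2, 34, 2, 3, 4, 5, 6, 7, 3, 5, 16]
list2 = [2, 34, 1, 3, 4, 15, 6, 17, 31, 34, 16]

lookup = set(list2)  # fast membership tests against the second list
seen = set()         # values already added to the result
common = []
for item in list1:
    if item in lookup and item not in seen:
        seen.add(item)
        common.append(item)

print(common)  # [1, 2, 34, 3, 4, 6, 16]
```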
