Comparing two dictionaries and printing the common values - Python

I have two tab-separated files with multiple columns. I used two dictionaries to store the specific columns of interest.
import csv

dic1 = {}
dic2 = {}

with open("Table1.tsv") as samplefile:
    reader = csv.reader(samplefile, delimiter="\t")
    A, B, C, D = zip(*reader)        # transpose rows into the four columns

with open("Table2.tsv") as samplefile1:
    reader = csv.reader(samplefile1, delimiter="\t")
    A1, B1, C1 = zip(*reader)        # this file has three columns

dic1['PMID'] = A   # the first dictionary storing the data of column "A"
dic2['PMID'] = A1  # the second dictionary storing the data of column "A1"
# statement to compare the data in dic1['PMID'] with dic2['PMID'] and print the common
Problem: what is the proper logic or conditional statement to use to compare the two dictionaries and print the data common to both?

You can use set intersection as:
>>> d1={'a':2,'b':3,'c':4,'d':5}
>>> d2={'a':2,'f':3,'c':4,'b':5,'q':17}
>>> dict(set(d1.items()) & set(d2.items()))
{'a': 2, 'c': 4}
For your specific problem, this is the code:
>>> dic1={}
>>> dic2={}
>>> dic1['PMID']=[1,2,34,2,3,4,5,6,7,3,5,16]
>>> dic2['PMID']=[2,34,1,3,4,15,6,17,31,34,16]
>>> common=list(set(dic1['PMID']) & set(dic2['PMID']))
>>> common
[1, 2, 3, 4, 6, 34, 16]
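If you want to go straight from the files to the common values, here is a minimal sketch of the same idea, assuming the PMIDs sit in the first column of each file (column "A"/"A1" in your code):
import csv

def read_pmids(path):
    with open(path) as f:
        reader = csv.reader(f, delimiter="\t")
        return {row[0] for row in reader if row}   # set of PMIDs from the first column

common = read_pmids("Table1.tsv") & read_pmids("Table2.tsv")   # set intersection
for pmid in sorted(common):
    print(pmid)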

Related

How to assign dynamic variables from a function call in Python

I have a function which does a bunch of stuff and returns pandas dataframes. The dataframes are extracted from a dynamic list, hence I'm using the method below to return them.
As soon as I call the function (code in the second block), my Jupyter notebook just runs the cell indefinitely, like an infinite loop. Any idea how I can do this more efficiently?
def funct(x):
    # some code which creates multiple dataframes
    i = 0
    for k in range(len(dynamic_list)):
        i += 1
    return globals()["df" + str(i)]
The next thing I do is call the function and try to assign the result dynamically:
i = 0
for k in range(len(dynamic_list)):
    i += 1
    globals()["new_df" + str(i)] = funct(x)
I have tried returning selective dataframes from the first function and that works just fine, e.g.:
def funct(x):
    # some code creating df1, df2, df3, ..., df_n
    return df1, df2

new_df1, new_df2 = funct(x)
For each dataframe object your code creates, you can simply add it to a dictionary and set the key from your dynamic list.
Here is a simple example:
import pandas as pd
test_data = {"key1":[1, 2, 3], "key2":[1, 2, 3], "key3":[1, 2, 3]}
df = pd.DataFrame.from_dict(test_data)
dataframe example:
   key1  key2  key3
0     1     1     1
1     2     2     2
2     3     3     3
I have used a fixed list of values to focus on but this can be dynamic based on however you are creating them.
values_of_interest_list = [1, 3]
Now we can do whatever we want with the dataframe; in this instance I want to keep only the rows whose value appears in our list.
data_dict = {}
for value_of_interest in values_of_interest_list:
    x_df = df[df["key1"] == value_of_interest]
    data_dict[value_of_interest] = x_df
To see what we have, we can print out the created dictionary that contains the key we have assigned and the associated dataframe object.
for key, value in data_dict.items():
    print(type(key))
    print(type(value))
Which returns
<class 'int'>
<class 'pandas.core.frame.DataFrame'>
<class 'int'>
<class 'pandas.core.frame.DataFrame'>
Full sample code is below:
import pandas as pd

test_data = {"key1": [1, 2, 3], "key2": [1, 2, 3], "key3": [1, 2, 3]}
df = pd.DataFrame.from_dict(test_data)

values_of_interest_list = [1, 3]

# Dictionary for data
data_dict = {}

# Loop through the values of interest
for value_of_interest in values_of_interest_list:
    x_df = df[df["key1"] == value_of_interest]
    data_dict[value_of_interest] = x_df

for key, value in data_dict.items():
    print(type(key))
    print(type(value))
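Once the dataframes live in data_dict, a specific one can be looked up by its key instead of through a dynamically named global, for example:
# Pull out the dataframe that was stored for the value 3
df_for_3 = data_dict[3]
print(df_for_3)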

Create a key:value pair in the first loop and append more values in subsequent loops

How can I create a key:value pair in a first loop and then just append values in subsequent loops?
For example:
a = [1,2,3]
b = [8,9,10]
c = [4,6,5]
myList= [a,b,c]
positions= ['first_position', 'second_position', 'third_position']
I would like to create a dictionary which records the position values for each letter so:
mydict = {'first_position':[1,8,4], 'second_position':[2,9,6], 'third_position':[3,10,5]}
Imagine that instead of 3 letters with 3 values each, I had millions. How could I loop through each letter and:
In the first loop create the key:value pair 'first_position':[1]
In subsequent loops append values to the corresponding key: 'first_position':[1,8,4]
Thanks!
Try this code:
mydict = {}
for i in range(len(positions)):
    mydict[positions[i]] = [each[i] for each in myList]
Output:
{'first_position': [1, 8, 4],
'second_position': [2, 9, 6],
'third_position': [3, 10, 5]}
dictionary.get('key') returns None if the key doesn't exist, so you can check for None: append to the existing list when the key is already there, and create a new list otherwise.
mydict = {}
for values in myList:
    for position, val in enumerate(values):
        this_position = positions[position]
        if mydict.get(this_position) is not None:
            mydict[this_position].append(val)
        else:
            mydict[this_position] = [val]
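A collections.defaultdict expresses the same pattern without the explicit None check; a minimal sketch using the same myList and positions:
from collections import defaultdict

mydict = defaultdict(list)              # missing keys start out as empty lists
for values in myList:
    for i, val in enumerate(values):
        mydict[positions[i]].append(val)
print(dict(mydict))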
The zip function iterates over the i-th values of positions, a, b and c together. So,
a = [1,2,3]
b = [8,9,10]
c = [4,6,5]
positions= ['first_position', 'second_position', 'third_position']
sources = [positions, a, b, c]
mydict = {vals[0]:vals[1:] for vals in zip(*sources)}
print(mydict)
This creates tuples as values, which is usually fine if the lists are read-only. Otherwise do
mydict = {vals[0]:list(vals[1:]) for vals in zip(*sources)}

How to combine multiple columns from a pandas df into a list

How can you combine multiple columns from a dataframe into a list?
Input:
df = pd.DataFrame(np.random.randn(10000, 7), columns=list('ABCDEFG'))
If I wanted to create a list from column A I would perform:
df1 = df['A'].tolist()
But if I wanted to combine numerous columns into this list, it wouldn't be efficient to write df['A','B','C'...'Z'].tolist()
I have tried the following, but it just adds the column headers to a list.
df1 = list(df.columns)[0:8]
Intended input:
A B C D E F G
0 0.787576 0.646178 -0.561192 -0.910522 0.647124 -1.388992 0.728360
1 0.265409 -1.919283 -0.419196 -1.443241 -2.833812 -1.066249 0.553379
2 0.343384 0.659273 -0.759768 0.355124 -1.974534 0.399317 -0.200278
Intended Output:
[0.787576, 0.646178, -0.561192, -0.910522, 0.647124, -1.388992, 0.728360,
0.265409, -1.919283, -0.419196, -1.443241, -2.833812, -1.066249, 0.553379,
0.343384, 0.659273, -0.759768, 0.355124, -1.974534, 0.399317, -0.200278]
Is this what you are looking for?
lst = df.values.tolist()
flat_list = [item for x in lst for item in x]
print(flat_list)
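Since the frame was built with np.random.randn, every column is numeric, so the same row-by-row flattening can also be done through NumPy, matching the intended output:
flat_list = df.values.ravel().tolist()   # C order: row 0, then row 1, ...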
You can use to_dict:
df = pd.DataFrame(np.random.randn(10, 10), columns=list('ABCDEFGHIJ'))
df.to_dict('l')
Out[1036]:
{'A': [-0.5611441440595607,
-0.3785906500723589,
-0.19480328695097676,
-0.7472526275034221,
-2.4232786057647457,
0.10506614562827334,
0.4968179288412277,
1.635737019365132,
-1.4286421753281746,
0.4973223222844811],
'B': [-1.0550082961139444,
-0.1420067090193365,
0.30130476834580633,
1.1271866812852227,
0.38587456174846285,
-0.531163142682951,
-1.1335754634118729,
0.5975963084356348,
-0.7361022807495443,
1.4329395663140427],
...}
Or add .values.tolist():
df[list('ABC')].values.tolist()
Out[1041]:
[[0.09552771302434987, 0.18551596484768904, -0.5902249875268607],
[-1.5285190712746388, 1.2922627021799646, -0.8347422966138306],
[-0.4092028716404067, -0.5669107267579823, 0.3627970727410332],
[-1.3546346273319263, -0.9352316948439341, 1.3568726575880614],
[-1.3509518030469496, 0.10487182694997808, -0.6902134363370515]]
Edit: np.concatenate(df[list('ABC')].T.values.tolist())

How to create a nested dictionary from a csv file with N rows in Python

I was looking for a way to read a CSV file with an unknown number of columns into a nested dictionary, i.e. for input of the form
file.csv:
1, 2, 3, 4
1, 6, 7, 8
9, 10, 11, 12
I want a dictionary of the form:
{1:{2:{3:4}, 6:{7:8}}, 9:{10:{11:12}}}
This is in order to allow O(1) search of a value in the csv file.
Creating the dictionary can take a relatively long time, as in my application I only create it once, but search it millions of times.
I also wanted an option to name the relevant columns, so that I can ignore unnecessary ones.
Here's a simple, albeit brittle approach:
>>> import csv, io
>>> s = "1, 2, 3, 4\n1, 6, 7, 8\n9, 10, 11, 12"
>>> d = {}
>>> with io.StringIO(s) as f:  # fake a file
... reader = csv.reader(f)
... for row in reader:
... nested = d
... for val in map(int, row[:-2]):
... nested = nested.setdefault(val, {})
... k, v = map(int, row[-2:]) # this will fail if you don't have enough columns
... nested[k] = v
...
>>> d
{1: {2: {3: 4}, 6: {7: 8}}, 9: {10: {11: 12}}}
However, this assumes the number of columns is at least 2.
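Against a real file, the io.StringIO stand-in just becomes open(); a sketch assuming the data lives in file.csv exactly as shown in the question:
import csv

d = {}
with open("file.csv", newline="") as f:
    for row in csv.reader(f):
        nested = d
        for val in map(int, row[:-2]):
            nested = nested.setdefault(val, {})
        k, v = map(int, row[-2:])
        nested[k] = v
print(d)   # {1: {2: {3: 4}, 6: {7: 8}}, 9: {10: {11: 12}}}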
Here is what I came up with. Feel free to comment and suggest improvements.
import csv
import itertools


def list_to_dict(lst):
    # Takes a list, and recursively turns it into a nested dictionary, where
    # the first element is a key, whose value is the dictionary created from the
    # rest of the list. The last element in the list will be the value of the
    # innermost dictionary.
    # INPUTS:
    #     lst - a list (e.g. of strings or floats)
    # OUTPUT:
    #     A nested dictionary
    # EXAMPLE RUN:
    # >>> lst = [1, 2, 3, 4]
    # >>> list_to_dict(lst)
    # {1: {2: {3: 4}}}
    if len(lst) == 1:
        return lst[0]
    else:
        data_dict = {lst[-2]: lst[-1]}
        lst.pop()
        lst[-1] = data_dict
        return list_to_dict(lst)


def dict_combine(d1, d2):
    # Combines two nested dictionaries into one.
    # INPUTS:
    #     d1, d2: Two nested dictionaries. The function might change d1 and d2,
    #             therefore if the input dictionaries are not to be mutated,
    #             you should pass copies of d1 and d2.
    #             Note that the function works more efficiently if d1 is the
    #             bigger dictionary.
    # OUTPUT:
    #     The combined dictionary
    # EXAMPLE RUN:
    # >>> d1 = {1: {2: {3: 4, 5: 6}}}
    # >>> d2 = {1: {2: {7: 8}, 9: {10, 11}}}
    # >>> dict_combine(d1, d2)
    # {1: {2: {3: 4, 5: 6, 7: 8}, 9: {10, 11}}}
    for key in d2:
        if key in d1:
            d1[key] = dict_combine(d1[key], d2[key])
        else:
            d1[key] = d2[key]
    return d1


def csv_to_dict(csv_file_path, params=None, n_row_max=None):
    # NAME: csv_to_dict
    #
    # DESCRIPTION: Reads a csv file and turns relevant columns into a nested
    #              dictionary.
    #
    # INPUTS:
    #     csv_file_path: The full path to the data file
    #     params:        A list of relevant column names. The resulting dictionary
    #                    will be nested in the same order as parameters in 'params'.
    #                    Default is None (read all columns)
    #     n_row_max:     The maximum number of rows to read. Default is None
    #                    (read all rows)
    #
    # OUTPUT:
    #     A nested dictionary containing all the relevant csv data
    csv_dictionary = {}
    with open(csv_file_path, 'r') as csv_file:
        csv_data = csv.reader(csv_file, delimiter=',')
        names = next(csv_data)  # Read title line
        if not params:
            # A list of column indices to read from csv
            relevant_param_indices = list(range(0, len(names) - 1))
        else:
            # A list of column indices to read from csv
            relevant_param_indices = []
            for name in params:
                if name not in names:
                    # Parameter name is not found in title line
                    raise ValueError('Could not find {} in csv file'.format(name))
                else:
                    # Get indices of the relevant columns
                    relevant_param_indices.append(names.index(name))
        for row in itertools.islice(csv_data, 1, n_row_max):
            # Get a list containing relevant columns only
            relevant_cols = [row[i] for i in relevant_param_indices]
            # Turn the strings into numbers. Not necessary
            float_row = [float(element) for element in relevant_cols]
            # Build nested dictionary
            csv_dictionary = dict_combine(csv_dictionary, list_to_dict(float_row))
    return csv_dictionary
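Once the dictionary is built, each lookup is one bracket per named column, which gives the constant-time search the question asks for. A small usage sketch (the file name and column names here are only examples):
csv_dictionary = csv_to_dict('data.csv', params=['colA', 'colB', 'colC', 'colD'])
# csv_to_dict stores the values as floats, so index with float keys:
value = csv_dictionary[1.0][2.0][3.0]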

Load data from dictionary to csv

I have a dictionary in this format:
data_dict = {'a' : [1,2,3], 'b' : [[4,5],[6,7],[8,9]]}
What I would like to do is write the data from the dictionary to a csv file in a column format, so each key becomes a title and its values follow below it. The output should look like:
a b
1 [4,5]
2 [6,7]
3 [8,9]
I have tried to use csv.DictWriter and csv.writer, but nothing has worked out for me.
You can use zip to aggregate elements from multiple iterables:
>>> rows = zip([1,2,3], [[4,5],[6,7],[8,9]])
>>> for row in rows:
... print(row)
...
(1, [4, 5])
(2, [6, 7])
(3, [8, 9])
import csv
import sys
data_dict = {'a' : [1,2,3], 'b' : [[4,5],[6,7],[8,9]]}
keys = sorted(data_dict) # to get ordered keys
values = [data_dict[key] for key in keys]
writer = csv.writer(sys.stdout, delimiter='\t') # Replace `sys.stdout` as you need
writer.writerow(keys)
for row in zip(*values):
    writer.writerow(row)
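To write to a real file instead of the terminal, the same writer can simply be pointed at a file handle (the output file name here is just an example):
with open('output.tsv', 'w', newline='') as f:
    writer = csv.writer(f, delimiter='\t')
    writer.writerow(keys)
    writer.writerows(zip(*values))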
