How to access lists in a dictionary and compare them? - python

data.txt (a plain text file, not a .py file):
Mich->
Anne
Luke
Carl
Marl->
Fill
Luke
Anne->
Luke
Fill
python file:
with open('data.txt') as f:
    dati = f.read()

dati = dati.strip()
dati = dati.splitlines()

diz = {}
for items in dati:
    if items[-2:] == '->':
        key = items.replace('->', '')
        if key in diz:
            continue
        else:
            diz[key] = []
    else:
        diz[key].append(items)

print(diz)
OUTPUT: d = {'Mich': ['Anne', 'Luke', 'Carl'], 'Marl': ['Fill', 'Luke'], 'Anne': ['Luke', 'Fill']}
I would like to understand how I can access the lists and compare the names, given that the elements of d come from another file (data.txt).
For example, if I want to know which keys contain the same names, what do I have to do?
Thanks everybody.
I tried sets to do an intersection, but I can't do that with these lists.
As output I was thinking of something like (Mich, Marl and Anne know Luke).
I searched everywhere on the internet for how to analyse lists inside a dictionary; maybe it's impossible?

One way to iterate over the dictionary would be like this:
d = {'Mich': ['Anne', 'Luke', 'Carl'],
     'Marl': ['Fill', 'Luke'],
     'Anne': ['Luke', 'Fill']}

names = []
for k, v in d.items():
    print(k, "has: ")
    for items in v:
        print(items)
        # here you can check if this data is in the other file
        names.append(items)

print(names)
This will result in:
Mich has:
Anne
Luke
Carl
Marl has:
Fill
Luke
Anne has:
Luke
Fill
['Anne', 'Luke', 'Carl', 'Fill', 'Luke', 'Luke', 'Fill']
You should give more information about how the data is structured in the other file too; this is all I can do with the information given.
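To find which keys share a name (the "Mich, Marl and Anne know Luke" part), one minimal sketch is to invert the dictionary so each name maps to the set of keys whose list contains it; the printed wording below is just an assumption based on the output described in the question:
d = {'Mich': ['Anne', 'Luke', 'Carl'],
     'Marl': ['Fill', 'Luke'],
     'Anne': ['Luke', 'Fill']}

# Invert the mapping: name -> set of keys whose list contains that name
known_by = {}
for key, friends in d.items():
    for name in friends:
        known_by.setdefault(name, set()).add(key)

# Report every name that appears under more than one key
for name, keys in known_by.items():
    if len(keys) > 1:
        print(', '.join(sorted(keys)), 'know', name)
# e.g.: Anne, Marl, Mich know Luke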

Related

Creating a dictionary with multiple values per key from specific columns in a dataframe

email_ad                  Name       Manager_Name  Manager_Band_level
example_email#gmail.com.  Tom Banks  Boss1         30
sample_email#gmail.com.   Bill Bob   Boss2         40
How do I create a dictionary with email_ad as the key and each of the other columns in my data frame as the values? I tried this:
mydict={df['email_ad']:[df['Name'],df['Manager Name'], df['Manager_Band_level']]}
This did not work. I have been stuck forever on this. Any help would be amazing!
Use zip with dict as:
mydict = dict(zip(df['email_ad'], df.iloc[:, 1:].to_numpy().tolist()))
mydict
{'example_email#gmail.com.': ['Tom Banks', 'Boss1', 30],
'sample_email#gmail.com.': ['Bill Bob', 'Boss2', 40]}
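For a self-contained check, here is a sketch that builds the sample frame first and then applies the same zip/dict idea (column names taken from the question; the trailing dots in the e-mail addresses are kept as posted):
import pandas as pd

df = pd.DataFrame({
    'email_ad': ['example_email#gmail.com.', 'sample_email#gmail.com.'],
    'Name': ['Tom Banks', 'Bill Bob'],
    'Manager_Name': ['Boss1', 'Boss2'],
    'Manager_Band_level': [30, 40],
})

# Pair each e-mail with the list built from the remaining columns of its row
mydict = dict(zip(df['email_ad'], df.iloc[:, 1:].to_numpy().tolist()))
print(mydict)
# {'example_email#gmail.com.': ['Tom Banks', 'Boss1', 30],
#  'sample_email#gmail.com.': ['Bill Bob', 'Boss2', 40]}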

Get keys and values from a nested dictionary in a list format into a DataFrame

I have a very deeply nested list of dictionaries. I am trying to capture the 'keys' from a specific nested dictionary and convert them to a data frame. How do I do this? I have basic dictionary knowledge to generate keys; I have tried appending to [] and {} and it didn't quite work. Any guidance appreciated!
import pandas as pd
from pprint import pprint

d = {'Main': {
    'SecondLevel':
        [{'Identifier': 'abc',
          'StudentInfo': {'Name': 'Mike', 'Grade': '1',
                          'TeachersAssigned': [{'Name': 'Paul'},
                                               {'Name': 'Smith'}]}},
         {'StudentInfo': {'Name': 'Mandy', 'Grade': '1',
                          'TeachersAssigned': [{'Name': 'Baker'},
                                               {'Name': 'Smith'}]}}]}}

pprint(d)

list_dict = []
for doc in d['Main']['SecondLevel']:
    identifier = '' if doc.get('Identifier') is None else doc['Identifier']
    studentname = doc['StudentInfo']['Name']
    list_dict.append(identifier)
    list_dict.append(studentname)
    for teach in doc['StudentInfo']['TeachersAssigned']:
        teachers_name = teach['Name']
        list_dict.append(teachers_name)

pprint(list_dict)
>>> ['abc', 'Mike', 'Paul', 'Smith', '', 'Mandy', 'Baker', 'Smith']
pd.DataFrame(list_dict)
>>> single column with the list of values from above
I am trying to get it to look like this:
Identifier  StudentInfo  TeachersAssigned
abc         Mike         Paul
abc         Mike         Smith
            Mandy        Baker
            Mandy        Smith
Am I doing the for loop wrong for list comprehension?
Given your dictionary, this is how I would manage it. But as I explained before, you cannot have columns of different lengths in a DataFrame, so you can use np.nan to pad the missing Identifier values:
import numpy as np
import pandas as pd

d = {'Main': {
    'SecondLevel':
        [{'Identifier': 'abc',
          'StudentInfo': {'Name': 'Mike', 'Grade': '1',
                          'TeachersAssigned': [{'Name': 'Paul'},
                                               {'Name': 'Smith'}]}},
         {'StudentInfo': {'Name': 'Mandy', 'Grade': '1',
                          'TeachersAssigned': [{'Name': 'Baker'},
                                               {'Name': 'Smith'}]}}]}}

data = {'Identifier': [], 'Name': [], 'TeachersAssigned': []}
for i in range(len(d['Main']['SecondLevel'])):
    for j in range(len(d['Main']['SecondLevel'][i]['StudentInfo']['TeachersAssigned'])):
        try:
            data['Identifier'].append(d['Main']['SecondLevel'][i]['Identifier'])
        except KeyError:
            data['Identifier'].append(np.nan)
        data['Name'].append(d['Main']['SecondLevel'][i]['StudentInfo']['Name'])
        data['TeachersAssigned'].append(d['Main']['SecondLevel'][i]['StudentInfo']['TeachersAssigned'][j]['Name'])

df = pd.DataFrame(data)
print(df)
Output:
Identifier Name TeachersAssigned
0 abc Mike Paul
1 abc Mike Smith
2 NaN Mandy Baker
3 NaN Mandy Smith
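As an alternative that avoids the index arithmetic, pandas' json_normalize can flatten the same structure; a sketch under the same input d (errors='ignore' leaves the missing Identifier as NaN):
import pandas as pd

records = d['Main']['SecondLevel']  # same nested dictionary as above

df = pd.json_normalize(
    records,
    record_path=['StudentInfo', 'TeachersAssigned'],  # one row per assigned teacher
    meta=['Identifier', ['StudentInfo', 'Name']],      # repeated for every teacher row
    errors='ignore',                                   # the second record has no 'Identifier'
)
df = df.rename(columns={'Name': 'TeachersAssigned', 'StudentInfo.Name': 'Name'})
print(df[['Identifier', 'Name', 'TeachersAssigned']])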

Can I assign multiple unique IDs to a duplicate group using dictionaries?

I'm attempting to merge two datasets using a unique ID for clients that are present in only one dataset. I've assigned the unique IDs to each full name as a dictionary, but each person is unique, even if they have the same name. I need to assign each unique ID iteratively to each instance of that person's name.
example of the dictionary:
{'Corey Davis': {'names_id':[1472]}, 'Jose Hernandez': {'names_id': [3464,15202,82567,98472]}, ...}
I've already attempted using the .map() function as well as
referrals['names_id'] = referrals['full_name'].copy()
for key, val in m.items():
    referrals.loc[referrals.names_id == key, 'names_id'] = val
but of course, it only assigns the last value encountered, 98472.
I am hoping for something along the lines of:
full_name names_id \
Corey Davis 1472
Jose Hernandez 3464
Jose Hernandez 15202
Jose Hernandez 82567
Jose Hernandez 98472
but I get
full_name names_id \
Corey Davis 1472
Jose Hernandez 98472
Jose Hernandez 98472
Jose Hernandez 98472
Jose Hernandez 98472
Personally, what I would do is:
inputs = [{'full_name': 'test', 'names_id': [1]}, {'full_name': 'test2', 'names_id': [2, 3, 4]}]

# Create a list of dictionaries, one for each 'entry'
entries = []
for input in inputs:
    for name_id in input['names_id']:
        entries.append({'full_name': input['full_name'], 'names_id': name_id})

# Now you have a list of dicts - each being one line of your table
# entries is now
# [{'full_name': 'test', 'names_id': 1},
#  {'full_name': 'test2', 'names_id': 2},
#  {'full_name': 'test2', 'names_id': 3},
#  {'full_name': 'test2', 'names_id': 4}]

# I like pandas and use it for its dataframes; you can create a dataframe from a list of dicts
import pandas as pd
final_dataframe = pd.DataFrame(entries)
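If the data is already in a DataFrame-friendly shape, pandas' explode gives the same result in one step; a short sketch using the same hypothetical inputs list as above:
import pandas as pd

inputs = [{'full_name': 'test', 'names_id': [1]},
          {'full_name': 'test2', 'names_id': [2, 3, 4]}]

# explode() turns each element of the 'names_id' lists into its own row
final_dataframe = pd.DataFrame(inputs).explode('names_id').reset_index(drop=True)
print(final_dataframe)
#   full_name names_id
# 0      test        1
# 1     test2        2
# 2     test2        3
# 3     test2        4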

How do I convert a csv file with one column to a dictionary in Python?

I need some help with an assignment for Python.
The task is to convert a .csv file to a dictionary and make some changes. The problem is that the .csv file only has 1 column, but 3 rows.
The .csv file looks like this in Excel:
     A                    B
1.   male Bob West
2.   female Hannah South
3.   male Bruce North
So everything is in column A.
My code so far looks like this:
import csv

reader = csv.reader(open("filename.csv"))
d = {}
for row in reader:
    d[row[0]] = row[0:]
print(d)
And the output
{'\ufeffmale Bob West': ['\ufeffmale Bob West'], 'female Hannah South':
['female Hannah South'], 'male Bruce North': ['male Bruce North']}
but I want
{1 : Bob West, 2 : Hannah South, 3 : Bruce North}
The male/female should be replaced with an ID (1, 2, 3), and I don't know how to work around the single-column thing.
Thanks in advance.
You can use a dict comprehension and enumerate the csv reader object:
import csv

reader = csv.reader(open("filename.csv"))
x = {num + 1: name[0].split(" ", 1)[-1].rstrip() for (num, name) in enumerate(reader)}
print(x)
# output,
{1: 'Bob West', 2: 'Hannah South', 3: 'Bruce North'}
Or you can do it without the csv module, simply by reading the file:
with open("filename.csv", 'r') as t:
    # next(t)  # uncomment to skip a header line, if your file has one
    x = {num + 1: name.split(" ", 1)[-1].strip() for (num, name) in enumerate(t)}
print(x)
# output,
{1: 'Bob West', 2: 'Hannah South', 3: 'Bruce North'}
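One more detail: the '\ufeff' in your output is a UTF-8 byte-order mark at the start of the file; opening the file with the utf-8-sig encoding strips it, which matters if you ever keep the first field instead of discarding it. A small sketch:
import csv

with open("filename.csv", encoding="utf-8-sig") as f:  # utf-8-sig removes a leading BOM
    reader = csv.reader(f)
    x = {num: row[0].split(" ", 1)[-1].strip() for num, row in enumerate(reader, 1)}
print(x)
# {1: 'Bob West', 2: 'Hannah South', 3: 'Bruce North'}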
As per Simit's answer, but using regular expressions, and realising that the 1. and the A and B are just you explaining the Excel row and column identifiers:
import csv
import re

reader = csv.reader(open("data.csv", encoding="utf-8-sig"))  # utf-8-sig strips the BOM seen in your output
out = {}
for i, line in enumerate(reader, 1):
    m = re.match(r'^(male|female) (.*)$', line[0])
    if not m:
        print(f"error processing {line!r}")
        continue
    out[i] = m[2]
print(out)
I like to use Pandas for stuff like this. You can use Pandas to import it and then export it to a dict.
import pandas as pd

df = pd.read_csv('test.csv', header=None)  # header=None: the file has no header row

# Create new columns in the dataframe based on the rules of the question
df['Name'] = df[0].str.split(' ', n=1).str.get(1)
df['ID'] = df[0].str.split('.', n=1).str.get(0)
The dataframe should have three columns:
0 - This is the raw data.
Name - The name as defined in the problem.
ID - The number that comes before the period.
I didn't include gender, but it really won't fit into the dict. I'm also assuming your data does not have a header.
The next part converts your pandas dataframe to a dict in the output that you want.
output_dict = dict()
for i in range(len(df[['ID', 'Name']])):
    output_dict[df.iloc[i]['ID']] = df.iloc[i]['Name']
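The same dictionary can also be built in one line with zip, assuming the same df with the ID and Name columns created above:
output_dict = dict(zip(df['ID'], df['Name']))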
This should work for the given input:
data.csv:
1.male Bob West,
2.female Hannah South,
3.male Bruce North,
Code:
import csv

reader = csv.reader(open("data.csv"))
d = {}
for row in reader:
    splitted = row[0].split('.')
    # print(splitted[0])
    # print(' '.join(splitted[1].split(' ')[1:]))
    d[splitted[0]] = ' '.join(splitted[1].split(' ')[1:])
print(d)
Output
{'1': 'Bob West', '3': 'Bruce North', '2': 'Hannah South'}
import csv

with open('Employee_address.txt', mode='r') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            print(f'Column names are {", ".join(row)}')  # joining a dict iterates over its keys
        print(f'\t{row["name"]} works in the {row["department"]} department, and lives in {row["living address"]}.')
        line_count += 1
    print(f'Processed {line_count} lines.')

Aggregate sets according to keys with defaultdict python

I have a bunch of lines in text with names and teams in this format:
Team (year)|Surname1, Name1
e.g.
Yankees (1993)|Abbot, Jim
Yankees (1994)|Abbot, Jim
Yankees (1993)|Assenmacher, Paul
Yankees (2000)|Buddies, Mike
Yankees (2000)|Canseco, Jose
and so on for several years and several teams.
I would like to aggregate the names of players according to the team (year) combination, deleting any duplicated names (the original database may contain some redundant information). In the example, my output should be:
Yankees (1993)|Abbot, Jim|Assenmacher, Paul
Yankees (1994)|Abbot, Jim
Yankees (2000)|Buddies, Mike|Canseco, Jose
I've written this code so far:
from collections import defaultdict

file_in = open('filein.txt')
file_out = open('fileout.txt', 'w+')

teams = defaultdict(set)
for line in file_in:
    items = [entry.strip() for entry in line.split('|') if entry]
    team = items[0]
    name = items[1]
    teams[team].add(name)
I end up with a big dictionary made up of keys (the name of the team and the year) and sets of values, but I don't know exactly how to go on and aggregate things.
I would also like to be able to compare my final sets of values (e.g. how many players do the Yankees teams of 1993 and 1994 have in common?). How can I do this?
Any help is appreciated
You can use a tuple as a key here, e.g. ('Yankees', '1994'):
from collections import defaultdict

dic = defaultdict(list)
with open('abc') as f:
    for line in f:
        key, val = line.split('|')
        keys = tuple(x.strip('()') for x in key.split())
        vals = [x.strip() for x in val.split(', ')]
        dic[keys].append(vals)

print(dic)
for k, v in dic.items():
    print("{}({})|{}".format(k[0], k[1], "|".join([", ".join(x) for x in v])))
Output:
defaultdict(<class 'list'>,
{('Yankees', '1994'): [['Abbot', 'Jim']],
('Yankees', '2000'): [['Buddies', 'Mike'], ['Canseco', 'Jose']],
('Yankees', '1993'): [['Abbot', 'Jim'], ['Assenmacher', 'Paul']]})
Yankees(1994)|Abbot, Jim
Yankees(2000)|Buddies, Mike|Canseco, Jose
Yankees(1993)|Abbot, Jim|Assenmacher, Paul
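To answer the comparison part of the question: if you keep the values as sets, as in the question's own defaultdict(set) code, plain set operators do the work. A sketch assuming the teams dictionary built by that code:
# teams maps e.g. 'Yankees (1993)' -> {'Abbot, Jim', 'Assenmacher, Paul'}
common = teams['Yankees (1993)'] & teams['Yankees (1994)']
print(len(common), 'player(s) in common:', common)
# 1 player(s) in common: {'Abbot, Jim'}

# Writing the aggregated, de-duplicated lines to the output file
with open('fileout.txt', 'w') as file_out:
    for team, names in teams.items():
        file_out.write(team + '|' + '|'.join(sorted(names)) + '\n')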
