Extracting columns from a string or list in python


I'm trying to extract columns from a string of values in Python. The string looks like this:
CN=Unix ADISID,OU=SA,OU=DGO,DC=dom,DC=ab,DC=com,1001
CN=1002--DS,OU=Process,DC=dom,DC=ab,DC=com,1002
CN=1003--Cyb,OU=SA,OU=DGO,DC=dom,DC=ab,DC=com,1003
CN=Doe--Joe,OU=Adm,DC=dom,DC=ab,DC=com,d1004
CN=cruise--bob,OU=SA,OU=DGO,DC=dom,DC=ab,DC=com,d1005
Now I would like to extract columns from this string with the column headers CN, OU1, OU2, DC1, DC2, DC3 and ID. The number of OU and DC values differs from line to line, so if a value is not present in a line, I would like to leave that column blank. I'm using the following code to generate the above string:
result = l.search_s(base, ldap.SCOPE_SUBTREE, criteria, attributes)
results=""
for i in [entry for dn, entry in result if isinstance(entry, dict)]:
    results += str(i.get('distinguishedName')[0] +","+ i.get('sAMAccountName')[0] + "\n").replace("\, ","--")
print results
Would it be easier to create results as a list to begin with?

To get the "fields left blank" behavior, you have to find the maximum number of values of each field across all users and pad the shorter rows. I believe that CN is unique, so its maximum should always be 1.
import collections

result = l.search_s(base, ldap.SCOPE_SUBTREE, criteria, attributes)
users = []
for i in [entry for dn, entry in result if isinstance(entry, dict)]:
    # Turn escaped commas ("\, ") into "--" before splitting the DN on commas
    dn = i.get('distinguishedName')[0].replace('\\, ', '--').split(',')
    info = collections.defaultdict(list)
    info['id'] = i.get('sAMAccountName')[0]
    for part in dn:
        key, value = part.split('=', 1)
        info[key].append(value)
    users.append(info)

# The widest row decides how many columns each repeated field gets
max_cn = max(len(u['CN']) for u in users)
assert max_cn == 1
max_ou = max(len(u['OU']) for u in users)
max_dc = max(len(u['DC']) for u in users)

fields = []
for u in users:
    f = list(u['CN'])
    # Pad OU and DC with '' so every row has the same number of columns
    ou = u['OU'] + [''] * max_ou
    f.extend(ou[:max_ou])
    dc = u['DC'] + [''] * max_dc
    f.extend(dc[:max_dc])
    f.append(u['id'])
    fields.append(f)
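The same padding idea can be tried without an LDAP connection. The sketch below assumes the input is the comma-separated text printed in the question (escaped commas already replaced by "--") and that the ID is the last field, without a "key=" prefix; the function name to_rows is hypothetical. It also builds the numbered header from the data instead of hard-coding it.

```python
from collections import defaultdict

def to_rows(text):
    """Parse DN-style lines into (header, rows) with blank-padded OU/DC columns."""
    parsed = []
    for line in text.strip().splitlines():
        *parts, user_id = line.split(',')  # the last field is the bare ID
        info = defaultdict(list)
        for part in parts:
            key, value = part.split('=', 1)
            info[key].append(value)
        info['ID'] = [user_id]
        parsed.append(info)

    # The widest row decides how many OU and DC columns there are
    max_ou = max(len(p['OU']) for p in parsed)
    max_dc = max(len(p['DC']) for p in parsed)
    header = (['CN'] + [f'OU{n}' for n in range(1, max_ou + 1)]
              + [f'DC{n}' for n in range(1, max_dc + 1)] + ['ID'])

    rows = []
    for p in parsed:
        # Pad missing OU/DC values with '' so every row matches the header
        ou = p['OU'] + [''] * (max_ou - len(p['OU']))
        dc = p['DC'] + [''] * (max_dc - len(p['DC']))
        cn = p['CN'][0] if p['CN'] else ''
        rows.append([cn] + ou + dc + p['ID'])
    return header, rows

sample = """\
CN=Unix ADISID,OU=SA,OU=DGO,DC=dom,DC=ab,DC=com,1001
CN=1002--DS,OU=Process,DC=dom,DC=ab,DC=com,1002
CN=1003--Cyb,OU=SA,OU=DGO,DC=dom,DC=ab,DC=com,1003
CN=Doe--Joe,OU=Adm,DC=dom,DC=ab,DC=com,d1004
CN=cruise--bob,OU=SA,OU=DGO,DC=dom,DC=ab,DC=com,d1005
"""
header, rows = to_rows(sample)
print(header)  # ['CN', 'OU1', 'OU2', 'DC1', 'DC2', 'DC3', 'ID']
for row in rows:
    print(row)
```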

For each line:
pairs = [kv.split('=') for kv in line.split(',')]
for pair in pairs:
    if len(pair) == 1:
        pair.insert(0, 'ID')
Now you have something like this:
[['CN', 'Unix ADISID'],
['OU', 'SA'],
['OU', 'DGO'],
['DC', 'dom'],
['DC', 'ab'],
['DC', 'com'],
['ID', '1001']]
Then:
from collections import defaultdict
mapping = defaultdict(list)
for k,v in pairs:
    mapping[k].append(v)
Which gives you:
{'CN': ['Unix ADISID'],
'DC': ['dom', 'ab', 'com'],
'ID': ['1001'],
'OU': ['SA', 'DGO']}
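To get from a mapping like this to fixed, possibly blank columns, one option (a sketch, not part of the original answer) is to number the repeated keys (OU becomes OU1, OU2, ...) and let csv.DictWriter fill in the missing ones through its restval argument. The FIELDS list and the helper name flatten are assumptions based on the headers in the question.

```python
import csv
import io
from collections import defaultdict

# Column order from the question; assumed wide enough for every row
FIELDS = ['CN', 'OU1', 'OU2', 'DC1', 'DC2', 'DC3', 'ID']

def flatten(line):
    """Turn one line into a dict such as {'CN': ..., 'OU1': ..., 'ID': ...}."""
    mapping = defaultdict(list)
    for pair in (kv.split('=', 1) for kv in line.split(',')):
        if len(pair) == 1:  # the trailing ID has no "key=" prefix
            pair.insert(0, 'ID')
        key, value = pair
        mapping[key].append(value)
    row = {}
    for key, values in mapping.items():
        if key in ('CN', 'ID'):
            row[key] = values[0]
        else:
            # Number repeated keys: OU -> OU1, OU2, ...
            for n, value in enumerate(values, start=1):
                row[f'{key}{n}'] = value
    return row

out = io.StringIO()
# restval='' writes an empty cell for any field missing from a row
writer = csv.DictWriter(out, fieldnames=FIELDS, restval='')
writer.writeheader()
writer.writerow(flatten('CN=1002--DS,OU=Process,DC=dom,DC=ab,DC=com,1002'))
writer.writerow(flatten('CN=Unix ADISID,OU=SA,OU=DGO,DC=dom,DC=ab,DC=com,1001'))
print(out.getvalue())
```

A row with more OU or DC values than FIELDS lists makes writerow raise ValueError, because DictWriter's extrasaction defaults to 'raise'. If that can happen, compute the field list from the data first.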

Related

Reformatting a dictionary of lists based on items in another list

I have a dictionary of lists like
source = {"name":["hans","james","mat"],"country":["spain"],"language":["english","french"]}
and another list like
data_not_avail = ["hans","spain","mat"]
How can I reformat the source dictionary into the following format?
{
"exist":{"name":["james"], "language":["english","french"]},
"not_exist":{"name":["hans","mat"], "country":["spain"]}
}
I tried to solve it by finding the key of each item that is present in the list, but it didn't work:
data_result = {}
keys_list = []
for v in data_not_avail:
    keys = [key for key, value in source.items() if v in value]
    data_result.update({keys[0]:[v]})
    keys_list.extend(keys)

Here is one approach: use a list comprehension (or Python's built-in filter) to split the elements of each source list according to whether they appear in data_not_avail.
data = {"exist": {}, "not_exist": {}}
for key, value in source.items():
    data["exist"][key] = [v for v in value if v not in data_not_avail]
    data["not_exist"][key] = [v for v in value if v in data_not_avail]
    # if you don't need empty lists in the result
    if not data["exist"][key]:
        del data["exist"][key]
    if not data["not_exist"][key]:
        del data["not_exist"][key]
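A more compact variant of the same idea, sketched here rather than taken from the answer: convert data_not_avail to a set for fast membership tests and build both sub-dictionaries with dict comprehensions, dropping keys whose lists end up empty. The helper name split_by_availability is hypothetical.

```python
def split_by_availability(source, data_not_avail):
    missing = set(data_not_avail)  # set membership is O(1) on average
    exist = {k: [v for v in vals if v not in missing] for k, vals in source.items()}
    not_exist = {k: [v for v in vals if v in missing] for k, vals in source.items()}
    # Drop keys whose filtered list ended up empty
    return {
        "exist": {k: v for k, v in exist.items() if v},
        "not_exist": {k: v for k, v in not_exist.items() if v},
    }

source = {"name": ["hans", "james", "mat"], "country": ["spain"], "language": ["english", "french"]}
data_not_avail = ["hans", "spain", "mat"]
result = split_by_availability(source, data_not_avail)
print(result)
```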

Here is a naive way of solving it:
values = list(source.values())
exist_values = []
not_values = []
for l in values:
    temp_exist = []
    temp_not = []
    for item in l:
        if item not in data_not_avail:
            temp_exist.append(item)
        else:
            temp_not.append(item)
    exist_values.append(temp_exist)
    not_values.append(temp_not)
exist = {}
not_exist = {}
keys = list(source.keys())  # same order as values, so index i lines up
for i, key in enumerate(keys):
    if len(exist_values[i]) != 0:
        exist[key] = exist_values[i]
    if len(not_values[i]) != 0:
        not_exist[key] = not_values[i]
print(exist, not_exist)
#{'name': ['james'], 'language': ['english', 'french']}
#{'name': ['hans', 'mat'], 'country': ['spain']}

How to make a dictionary from a txt file?

Assume the text file dict.txt contains:
1 2 3
aaa bbb ccc
The dictionary should be {1: 'aaa', 2: 'bbb', 3: 'ccc'}.
I did:
d = {}
with open("dict.txt") as f:
    for line in f:
        (key, val) = line.split()
        d[int(key)] = val
print (d)
but it didn't work. I think it is because of the structure of the text file.

All the keys are on the first line and all the values are on the second line, so line.split() returns three items per line and unpacking them into (key, val) fails.
So do something like this:
with open(r"dict.txt") as f:
    data = f.readlines()  # Read list of all lines
keys = list(map(int, data[0].split()))  # Data from first line
values = data[1].split()  # Data from second line
d = dict(zip(keys, values))  # Zip them and make dictionary
print(d)  # {1: 'aaa', 2: 'bbb', 3: 'ccc'}
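One caveat: zip stops at the shorter input, so a missing or extra value would be dropped silently. The sketch below checks the lengths explicitly and reads from any iterable of lines (here an io.StringIO standing in for dict.txt); the function name read_two_line_dict is hypothetical.

```python
import io

def read_two_line_dict(lines):
    """Build {int key: value} from a keys line followed by a values line."""
    # Ignore blank lines, then split the first two remaining lines
    rows = [line.split() for line in lines if line.strip()]
    keys = [int(k) for k in rows[0]]
    values = rows[1]
    # zip() would silently drop extras, so check the lengths explicitly
    if len(keys) != len(values):
        raise ValueError(f"{len(keys)} keys but {len(values)} values")
    return dict(zip(keys, values))

fake_file = io.StringIO("1 2 3\naaa bbb ccc\n")  # stands in for open("dict.txt")
d = read_two_line_dict(fake_file)
print(d)  # {1: 'aaa', 2: 'bbb', 3: 'ccc'}
```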

Updated answer based on OP edit:
#Initialize dict
d = {}
#Read in file by newline splits & ignore blank lines
with open("dict.txt", "r") as fobj:
    lines = fobj.read().split("\n")
lines = [l for l in lines if l.strip() != ""]
#Get first line (keys)
key_list = lines[0].split()
#Convert keys to integers
key_list = list(map(int, key_list))
#Get second line (values)
val_list = lines[1].split()
#Store in dict going through zipped lists
for k, v in zip(key_list, val_list):
    d[k] = v

First create separate lists for keys and values: each even-numbered line (0, 2, ...) holds keys, and the line after it holds the matching values:
if (idx % 2) == 0:
    keys = line.split()
    values = lines[idx + 1].split()
Then combine both lists into the dictionary:
d = {}
# Get all lines in a list
with open("dict.txt") as f:
    lines = f.readlines()
for idx, line in enumerate(lines):
    if idx % 2 == 0:
        # Get the key list, converting keys to integers
        keys = [int(k) for k in line.split()]
        # Get the value list from the following line
        values = lines[idx + 1].split()
        # Combine both lists in the dictionary
        d.update(zip(keys, values))
print(d)

Problem of incorrect output for dictionary returned from file

Each line of the file contains a student ID and the ID of a problem that student solved.
Example:
1,2
1,4
1,3
2,1
2,2
2,3
2,4
The task is to write a function that takes a filename as an argument and returns a dictionary mapping each student ID to the number of solved tasks.
Example output:
{1:3, 2:4}
My code doesn't produce the correct output. Please help me find the mistake and a solution:
import collections
def solved_tasks(filename):
    with open(filename) as f:
        for line in f.readlines():
            key,value = line.strip().split(',')
            dictionary = {key: collections.Counter(str(value))}
    return dictionary

Since you only care about the count, not the individual exercises, you can use a Counter on the first column:
import collections

def solved_tasks(filename):
    with open(filename) as in_stream:
        counts = collections.Counter(
            line.partition(',')[0]  # first column ...
            for line in in_stream if line.strip()  # ... of every non-empty row
        )
    return {int(key): value for key, value in counts.items()}
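Another option, sketched under the assumption that every non-blank line is "student,problem": the standard csv module parses the rows (it yields an empty list for a blank line), and a Counter over the integer student IDs gives the expected {1: 3, 2: 4}. The temporary file only makes the example self-contained.

```python
import collections
import csv
import os
import tempfile

def solved_tasks(filename):
    with open(filename, newline='') as f:
        counts = collections.Counter(
            int(row[0]) for row in csv.reader(f) if row  # skip blank lines
        )
    return dict(counts)

# Write the sample data from the question (plus a blank line) to a temporary file
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("1,2\n1,4\n1,3\n2,1\n2,2\n2,3\n2,4\n\n")
result = solved_tasks(tmp.name)
os.remove(tmp.name)
print(result)  # {1: 3, 2: 4}
```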

If you want to keep every problem each student solved, you can use a defaultdict and store the problems solved by each student as a list:
import collections

def solved_tasks(filename):
    # Create the dict inside the function so each call starts fresh
    dictionary = collections.defaultdict(list)
    with open(filename) as f:
        for line in f:
            key, value = line.strip().split(',')
            dictionary[key].append(value)
    return dictionary
Output:
defaultdict(<class 'list'>, {'1': ['2', '4', '3'], '2': ['1', '2', '3', '4']})
If you want the count instead, use a defaultdict(int):
def solved_tasks(filename):
    dictionary = collections.defaultdict(int)
    with open(filename) as f:
        for line in f:
            key, value = line.strip().split(',')
            dictionary[key] += 1
    return dictionary
Output:
defaultdict(<class 'int'>, {'1': 3, '2': 4})

You can count how often a key appears:
marks = """1,2
1,4
1,3
2,1
2,2
2,3
2,4
2,4"""
solved = {}
for line in marks.split("\n"):
    key, value = line.strip().split(",")
    solved[key] = solved.get(key, []) + [value]
for key in solved:
    solved[key] = len(set(solved[key]))  # eliminate duplicates
The solved.get(key, []) call returns an empty list as a default when the key isn't in the dict yet.
Edit: You said the data may contain duplicates; this method eliminates them.
Edit: Added a multiline string with """.
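If duplicate lines should be counted only once and the data comes from a file rather than a string, a defaultdict(set) can collect the distinct problem IDs per student. This is a sketch assuming one "student,problem" pair per line; the function name distinct_solved is hypothetical, and an io.StringIO stands in for the opened file.

```python
import io
from collections import defaultdict

def distinct_solved(lines):
    problems = defaultdict(set)  # student ID -> set of distinct problem IDs
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        student, problem = line.split(',')
        problems[int(student)].add(int(problem))
    return {student: len(ids) for student, ids in problems.items()}

data = io.StringIO("1,2\n1,4\n1,3\n2,1\n2,2\n2,3\n2,4\n2,4\n")  # "2,4" appears twice
result = distinct_solved(data)
print(result)  # {1: 3, 2: 4}
```

With a real file, the call would be distinct_solved(f) inside a with open(filename) as f: block.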

def solved_tasks(filename):
    res = {}
    ids = []
    with open(filename, "r") as f:
        for line in f:
            if line.strip():
                ids.append(line.split(",")[0])  # keep only the student ID (first column)
    for value in ids:
        res[int(value)] = ids.count(value)  # number of rows for this ID
    return res

Splitting a string into multiple variables that are subject to change

I have a byte string like this:
b'***************** Winner Prediction *****************\nDate: 2019-08-27 07:00:00\nRace Key: 190827082808\nTrack Name: Mornington\nPosition Number: 8\nName: CONSIDERING\nFinal Odds: 17.3\nPool Final: 37824.7\n'
And in Python, I want to split this string into variables such as:
Date =
Race_Key =
Track_Name =
Name =
Final_Odds =
Pool_Final =
The string will always have the same format, but the values will differ; for example, a name may contain two words, so the solution needs to handle all cases.
I have tried:
s = re.split(r'[.?!:]+', pred0)
def search(word, sentences):
    return [i for i in sentences if re.search(r'\b%s\b' % word, i)]
But that didn't work.

You can split the string and parse it into a dict like this:
s = s.decode()  # decode the byte string
n = s.split('\n')[1:-1]  # drop the Winner Prediction header and the empty string after the last newline
keys = [line.split(': ', 1)[0] for line in n]  # get keys
vals = [line.split(': ', 1)[1] for line in n]  # get values for keys
results = dict(zip(keys, vals))  # store in dict
Result (each key followed by its value):
Date 2019-08-27 07:00:00
Race Key 190827082808
Track Name Mornington
Position Number 8
Name CONSIDERING
Final Odds 17.3
Pool Final 37824.7

You can use the following:
values = [line.split(":", 1)[-1].strip() for line in s.decode().splitlines()[1:]]
This produces (for your example input):
['2019-08-27 07:00:00', '190827082808', 'Mornington', '8', 'CONSIDERING', '17.3', '37824.7']

Maybe you can try this:
p = b'***************** Winner Prediction *****************\nDate: 2019-08-27 07:00:00\nRace Key: 190827082808\nTrack Name: Mornington\nPosition Number: 8\nName: CONSIDERING\nFinal Odds: 17.3\nPool Final: 37824.7\n'
out = p.split(b"\n")[1:-1]  # drop the header line and the trailing empty string
d = {}
for i in out:
    temp = i.split(b":", 1)  # split only at the first colon so "07:00:00" stays intact
    key = temp[0].decode()
    value = temp[1].strip().decode()
    d[key] = value
The output is:
{'Date': '2019-08-27 07:00:00',
'Race Key': '190827082808',
'Track Name': 'Mornington',
'Position Number': '8',
'Name': 'CONSIDERING',
'Final Odds': '17.3',
'Pool Final': '37824.7'}
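To end up with names close to the ones in the question (Race_Key, Track_Name, ...), one option is a regular expression that captures "Label: value" pairs line by line, stopping the label at the first ": " so times like 07:00:00 stay intact, and replacing spaces in the labels with underscores. Keeping the results in a dict is usually easier than creating separate variables. This is a sketch; the function name parse_prediction is hypothetical.

```python
import re

def parse_prediction(raw):
    text = raw.decode()  # the input is a bytes object
    # Each field looks like "Label: value"; the non-greedy label stops at the first ": "
    pairs = re.findall(r'^(.+?): (.*)$', text, flags=re.MULTILINE)
    return {label.replace(' ', '_'): value for label, value in pairs}

pred0 = b'***************** Winner Prediction *****************\nDate: 2019-08-27 07:00:00\nRace Key: 190827082808\nTrack Name: Mornington\nPosition Number: 8\nName: CONSIDERING\nFinal Odds: 17.3\nPool Final: 37824.7\n'
fields = parse_prediction(pred0)
print(fields['Race_Key'], fields['Date'])  # 190827082808 2019-08-27 07:00:00
```

The header line contains no ": ", so it is skipped automatically; numeric fields such as Final_Odds are still strings and can be converted with float() where needed.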

Adding multiple dictionaries to a key in python dictionary

I am trying to add multiple dictionaries to a key.
e.g.
value = { column1 : {entry1 : val1}
{entry2 : val2}
column2 : {entry3 : val3}
{entry4 : val4}
}
Here is what I'm trying to do with this code:
file.txt contains column headers, each followed by the valid entries for that column. I am trying to build a dictionary with the columns as keys and, for each column, a dictionary of its valid entries.
So I parse the text file line by line, match the patterns for columns and entries, and store them in variables. Then I check whether the column (which is a key) already exists in the dictionary: if it exists, I add another dictionary to the column; if not, I create a new entry. I hope this makes sense.
Sample contents of file.txt
blah blah Column1 blah blah
entry1 val1
entry2 val2
blah blah Column2 blah blah
entry3 val3
entry4 val4
My code:
from __future__ import unicode_literals
import os, re, string, gzip, fnmatch, io
from array import *
header = re.compile(...) #some regex
valid_entries = re.compile(---) #some regex
matches=[]
entries=[]
value = {'MONTH OF INTERVIEW' : {'01': 'MIN VALUE'}}
counter = 0
name = ''
f =open(r'C:/file.txt')
def exists(data, name):
    for key in data.keys():
        if key == name :
            print "existing key : " + name
            return True
        else :
            return False
for line in f:
    col = ''
    ent = ''
    line = re.sub(ur'\u2013', '-', line)
    line = re.sub(ur'\u2026', '_', line)
    m = header.match(line)
    v = valid_entries.match(line)
    if m:
        name= ''
        matches.append(m.groups())
        _,_, name,_,_= m.groups()
        #print "name : " + name
    if v:
        entries.append(v.groups())
        ent,col= v.groups()
        #print v.groups()
        #print "col :" + col
        #print "ent :" + ent
    if (name is not None) and (ent is not None) and (col is not None):
        print value
        if exists(value, name):
            print 'inside existing loop'
            value[name].update({ent:col})
        else:
            value.update({name:{ent:col}})
print value
The problem with this code is that it replaces the values of the sub-dictionary, and it doesn't add all the values to the dictionary.
I am new to Python, so this could be a naive approach to this kind of situation. If you think there is a better way of getting what I want, I would really appreciate it if you could tell me.

Dictionaries have only one value per key. The trick is to make that value a container too, like a list:
value = {
'column1': [{entry1 : val1}, {entry2 : val2}],
'column2': [{entry3 : val3}, {entry4 : val4}]
}
Use dict.setdefault() to insert a list value when there is no value yet:
if name is not None and ent is not None and col is not None:
    value.setdefault(name, []).append({ent: col})
Alternatively, you could make each value a single dictionary holding multiple (ent, col) key-value pairs:
if name is not None and ent is not None and col is not None:
    value.setdefault(name, {})[ent] = col
Your exists() function overcomplicates a task dictionaries excel at (and, because of its else branch, it returns False after checking only the first key). Test for a key with in instead:
if name in value:
would have sufficed.
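Putting the setdefault idea together with a file-reading loop gives something like the sketch below. The two regular expressions are assumptions that only match the sample file.txt in the question ("Column<n>" somewhere on a header line, "<entry> <val>" on entry lines), because the real patterns were elided; parse_columns is a hypothetical name, and an io.StringIO stands in for the file.

```python
import io
import re

HEADER = re.compile(r'\b(Column\d+)\b')  # assumed header pattern
ENTRY = re.compile(r'^(\S+)\s+(\S+)$')   # assumed "entry value" pattern

def parse_columns(lines):
    value = {}
    name = None
    for line in lines:
        line = line.strip()
        m = HEADER.search(line)
        if m:
            name = m.group(1)  # start a new column
            continue
        v = ENTRY.match(line)
        if v and name is not None:
            ent, col = v.groups()
            # Create the inner dict the first time, then keep adding to it
            value.setdefault(name, {})[ent] = col
    return value

sample = io.StringIO(
    "blah blah Column1 blah blah\nentry1 val1\nentry2 val2\n"
    "blah blah Column2 blah blah\nentry3 val3\nentry4 val4\n"
)
result = parse_columns(sample)
print(result)
```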

I would make each key's value a list of dictionaries, so you can extend it or append to it:
>>> d = {}
>>> d[1] = [{'a': 1}]
>>> d[1].append({'b':2})
>>> d
{1: [{'a': 1}, {'b': 2}]}

You can use defaultdict and regex for this:
import re
from collections import defaultdict

with open('/path/to/file.txt') as f:  # read the contents from the file
    lines = f.readlines()
d = defaultdict(list)  # dict with default value: []
lastKey = None
for line in lines:
    m = re.search(r'Column\d', line)  # search the current line for a key
    if m:
        lastKey = m.group()
    else:
        m = re.search(r'(?<=entry\d ).*', line)  # search the current line for a value
        if m:
            d[lastKey].append(m.group())  # append the value
Output of list(d.items()):
[('Column1', ['val1', 'val2']), ('Column2', ['val3', 'val4'])]
Note: the above code assumes your file.txt is formatted as in your example. For your real file.txt data you might have to adjust the regex.
