String to OrderedDict conversion in Python

I created a Python OrderedDict (from the collections module) and stored it in a file named 'filename.txt'. The file content looks like this:
OrderedDict([(7, 0), (6, 1), (5, 2), (4, 3)])
I need to use this OrderedDict from another program. I read it like this:
myfile = open('filename.txt','r')
mydict = myfile.read()
I need 'mydict' to be of type
<class 'collections.OrderedDict'>
but here it comes out as type 'str'.
Is there any way in Python to convert a string to an OrderedDict? I am using Python 2.7.

You could store and load it with pickle:
import cPickle as pickle
# store (binary mode keeps the pickle intact on every platform):
with open("filename.pickle", "wb") as fp:
    pickle.dump(ordered_dict, fp)
# read:
with open("filename.pickle", "rb") as fp:
    ordered_dict = pickle.load(fp)
type(ordered_dict) # <class 'collections.OrderedDict'>
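On Python 3 there is no cPickle module; the plain pickle module plays the same role. A minimal in-memory sketch of the round trip, assuming the data comes from a trusted source (the variable names here are illustrative, not from the question):

```python
import pickle
from collections import OrderedDict

original = OrderedDict([(7, 0), (6, 1), (5, 2), (4, 3)])

# dumps() returns bytes, which is why pickle files are opened in binary mode.
payload = pickle.dumps(original)

# loads() rebuilds an object of the original type, insertion order included.
restored = pickle.loads(payload)
print(type(restored))
```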

The best solution here is to store your data in a different way. Encode it into JSON, for example.
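As a rough sketch of the JSON route: JSON object keys must be strings, so the integer keys from the question would not survive as a JSON object. One option, assumed here, is to store the items as a list of [key, value] pairs, which keeps both the key types and the order.

```python
import json
from collections import OrderedDict

original = OrderedDict([(7, 0), (6, 1), (5, 2), (4, 3)])

# Serialise the items as a list of pairs; tuples become JSON arrays.
text = json.dumps(list(original.items()))

# Rebuild the OrderedDict from the decoded list of [key, value] pairs.
restored = OrderedDict((key, value) for key, value in json.loads(text))
print(restored)
```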
You could also use the pickle module as explained in other answers, but this has potential security issues (as explained with eval() below) - so only use this solution if you know that the data is always going to be trusted.
If you can't change the format of the data, then there are other solutions.
The really bad solution is to use eval(). This is a really, really bad idea: it is insecure, because any code put in the file will be run, and it has other problems besides.
The better solution is to manually parse the file. The upside is that there is a way you can cheat at this and do it a little more easily. Python has ast.literal_eval() which allows you to parse literals easily. While this isn't a literal as it uses OrderedDict, we can extract the list literal and parse that.
E.g. (untested):
import re
import ast
import collections
with open("filename.txt") as file:
    line = next(file)
    values = re.search(r"OrderedDict\((.*)\)", line).group(1)
    mydict = collections.OrderedDict(ast.literal_eval(values))
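The same idea can be tried on an in-memory string, which makes it easier to check. This is only a sketch: parse_ordereddict_repr is a hypothetical helper name, and it assumes the text contains a single OrderedDict(...) repr. It also shows that ast.literal_eval refuses input that is not a plain literal rather than executing it.

```python
import ast
import re
from collections import OrderedDict

def parse_ordereddict_repr(text):
    """Rebuild an OrderedDict from its repr() text (hypothetical helper)."""
    match = re.search(r"OrderedDict\((.*)\)", text, re.DOTALL)
    if match is None:
        raise ValueError("no OrderedDict(...) found in input")
    # literal_eval only accepts Python literals, so the inner list of
    # tuples is parsed without executing any code.
    return OrderedDict(ast.literal_eval(match.group(1)))

mydict = parse_ordereddict_repr("OrderedDict([(7, 0), (6, 1), (5, 2), (4, 3)])")

# Anything that is not a plain literal is rejected instead of being run.
try:
    parse_ordereddict_repr("OrderedDict(__import__('os').getcwd())")
    rejected = False
except ValueError:
    rejected = True
```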

This is not a good solution but it works. :)
#######################################
# String_To_OrderedDict
# Convert String to OrderedDict
# Example String
# txt = "OrderedDict([('width', '600'), ('height', '100'), ('left', '1250'), ('top', '980'), ('starttime', '4000'), ('stoptime', '8000'), ('startani', 'random'), ('zindex', '995'), ('type', 'text'), ('title', '#WXR##TU##Izmir##brief_txt#'), ('backgroundcolor', 'N'), ('borderstyle', 'solid'), ('bordercolor', 'N'), ('fontsize', '35'), ('fontfamily', 'Ubuntu Mono'), ('textalign', 'right'), ('color', '#c99a16')])"
#######################################
def string_to_ordereddict(txt):
    from collections import OrderedDict
    import re
    tempDict = OrderedDict()
    od_start = "OrderedDict(["
    od_end = '])'
    first_index = txt.find(od_start)
    last_index = txt.rfind(od_end)
    new_txt = txt[first_index+len(od_start):last_index]
    # [^']+ (rather than \S+) also matches values containing spaces,
    # such as 'Ubuntu Mono' in the example string above
    pattern = r"(\('[^']+', '[^']+'\))"
    all_variables = re.findall(pattern, new_txt)
    for str_variable in all_variables:
        data = str_variable.split("', '")
        key = data[0].replace("('", "")
        value = data[1].replace("')", "")
        #print "key : %s" % (key)
        #print "value : %s" % (value)
        tempDict[key] = value
    #print tempDict
    #print tempDict['title']
    return tempDict

Here's how I did it on Python 2.7
from collections import OrderedDict
from ast import literal_eval
# Read in string from text file (strip any trailing newline so the
# fixed-width slicing below lines up)
with open('filename.txt','r') as myfile:
    file_str = myfile.read().strip()
# Remove ordered dict syntax from string by indexing
file_str=file_str[13:]
file_str=file_str[:-2]
# convert string to a sequence of (key, value) tuples
file_list=literal_eval(file_str)
header=OrderedDict()
for entry in file_list:
    # Extract key and value from each tuple
    key, value=entry
    # Create entry in OrderedDict
    header[key]=value
Again, you should probably write your text file differently.

Related

Python - Encoding items in a list

I have a script that works fine in Python2 but I can't get it to work in Python3. I want to base64 encode each item in a list and then write it to a json file. I know I can't use map the same way in Python3 but when I make it a list I get a different error.
import base64
import json
list_of_numbers = ['123456', '234567', '345678']
file = open("orig.json", "r")
json_object = json.load(file)
list = ["[{\"number\":\"" + str(s) + "\"}]" for s in list_of_numbers]
base64_bytes = map(base64.b64encode, list)
json_object["conditions"][1]["value"] = base64_bytes
rule = open("new.json", "w")
json.dump(json_object, rule, indent=2, sort_keys=True)
rule.close()
I'm not sure if your error is related to this, but here's what I think might be the problem. In Python 3, map returns a lazy map object rather than a list. To get the results as a list again, pass the map object to list(). Because your code rebinds the name list to your own data, rename that variable first (to items, say); otherwise the call to list() fails. Note also that in Python 3 base64.b64encode expects bytes, so each string has to be encoded first. In other words:
base64_bytes = list(map(base64.b64encode, items))
P.S. It's better to avoid list as a variable name, since it shadows the built-in list type.
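Putting those points together, a Python 3 version might look like the sketch below. It assumes UTF-8 text and stores the base64 results as ASCII strings so that json can serialise them; the "value" key and the variable names are illustrative, and the file handling from the question is left out.

```python
import base64
import json

list_of_numbers = ['123456', '234567', '345678']
payloads = ['[{"number":"' + str(s) + '"}]' for s in list_of_numbers]

# b64encode takes bytes and returns bytes; decode the result back to str
# because json cannot serialise bytes objects.
encoded = [
    base64.b64encode(p.encode('utf-8')).decode('ascii') for p in payloads
]

json_text = json.dumps({"value": encoded}, indent=2, sort_keys=True)
print(json_text)
```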

how to print after the keyword from python?

I have the following string in Python:
b'{"personId":"65a83de6-b512-4410-81d2-ada57f18112a","persistedFaceIds":["792b31df-403f-4378-911b-8c06c06be8fa"],"name":"waqas"}'
I want to print the text that follows the key "name", so that my output is
waqas
Note that waqas can change to any other value, so I want to print whatever name follows the key "name", using string operations or a regex.
First you need to decode the string, since it is a bytes object (the b prefix). Then use ast.literal_eval to build the dictionary; after that you can access the value by key:
>>> s = b'{"personId":"65a83de6-b512-4410-81d2-ada57f18112a","persistedFaceIds":["792b31df-403f-4378-911b-8c06c06be8fa"],"name":"waqas"}'
>>> import ast
>>> ast.literal_eval(s.decode())['name']
'waqas'
It is likely you should be reading your data into your program in a different manner than you are doing now.
If I assume your data is inside a JSON file, try something like the following, using the built-in json module:
import json
with open(filename) as fp:
    data = json.load(fp)
print(data['name'])
If you want a more algorithmic way to extract the value of name:
s = b'{"personId":"65a83de6-b512-4410-81d2-ada57f18112a",\
"persistedFaceIds":["792b31df-403f-4378-911b-8c06c06be8fa"],\
"name":"waqas"}'
s = s.decode("utf-8")
key = '"name":"'
start = s.find(key) + len(key)
stop = s.find('"', start + 1)
extracted_string = s[start : stop]
print(extracted_string)
output
waqas
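Since the question also mentions regex, here is a small sketch of that variant. It assumes the value contains no escaped double quotes; for anything less regular, the json module is the more robust choice.

```python
import re

raw = b'{"personId":"65a83de6-b512-4410-81d2-ada57f18112a","persistedFaceIds":["792b31df-403f-4378-911b-8c06c06be8fa"],"name":"waqas"}'
text = raw.decode("utf-8")

# Capture everything between the quotes that follow the "name" key.
match = re.search(r'"name"\s*:\s*"([^"]*)"', text)
name = match.group(1) if match else None
print(name)
```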
You can convert the string into a dictionary with json.loads()
import json
mystring = b'{"personId":"65a83de6-b512-4410-81d2-ada57f18112a","persistedFaceIds":["792b31df-403f-4378-911b-8c06c06be8fa"],"name":"waqas"}'
mydict = json.loads(mystring)
print(mydict["name"])
# output 'waqas'
First convert the bytes object into a string. The b prefix is part of the bytes literal's notation, not of the data, so it cannot be removed by slicing; decode the value instead. Suppose it is stored in a variable x:
import json
x = x.decode("utf-8")
data = json.loads(x)  # convert JSON string into dictionary
print(data["name"])

Converting a list of dicts to wiki-format

I have a list of dicts that looks like this (might look like this, I really have no idea upfront what data they contain):
data = [
{'k1': 'v1-a', 'k2': 'v2-a', 'k3': 'v3-a'},
{'k1': 'v1-b', 'k3': 'v3-b'},
{'k1': 'v1-c', 'k2': 'v2-c', 'k3': 'v3-c'},
{'k1': 'v1-d', 'k2': 'v2-d', 'k3': 'v3-d'}
]
The goal is to make it into a string that looks like this:
||k1||k2||k3||
|v1-a|v2-a|v3-a|
|v1-b||v3-b|
|v1-c|v2-c|v3-c|
|v1-d|v2-d|v3-d|
This is for the confluence wiki format.
The problem in itself is not that complicated, but the solution I come up with is so ugly that I almost don't want to use it.
What I got currently is this:
from pandas import DataFrame
# data = ...
df = DataFrame.from_dict(data).fillna('')
body = '||{header}||\n{data}'.format(
    header='||'.join(df.columns.values.tolist()),
    data='\n'.join(['|{}|'.format('|'.join(i)) for i in df.values.tolist()])
)
Which isn't just ugly, it depends on pandas, which is huge (I don't want to depend on this library just for this)!
The solution above would work without pandas if there was a good way of getting the list of headers and the list of lists of values from the dicts. But Python 2 doesn't guarantee dict order, so I can't count on .values() giving me the correct info.
Is there anything in itertools or collections that I've been missing?
This works for me in Python 3 and 2.7. Try it: https://repl.it/repls/VividMediumturquoiseAlbino
all_keys = sorted({key for dic in data for key in dic.keys()})
header = "||" + "||".join(all_keys) + "||"
lines = [header]
for row in data:
    elems_on_row = [row.get(key, "") for key in all_keys]
    current_row = "|" + "|".join(elems_on_row) + "|"
    lines.append(current_row)
wikistr = "\n".join(lines)
print(wikistr)
One approach would be to use csv.DictWriter to handle the formatting, with StringIO to collect the output and defaultdict to do a bit of creative cheating. Whether or not this is prettier is up for debate.
from StringIO import StringIO
from collections import defaultdict
from csv import DictWriter
output = StringIO()
keys = sorted(set(key for datum in data for key in datum.keys()))
header = '||' + '||'.join(keys) + '||'
output.write(header + '\n')
fields = [''] + keys + [''] # provides empty fields for starting and ending |
writer = DictWriter(output, fields, delimiter = '|')
for row in data:
    writer.writerow(defaultdict(str, **row)) # fills in the empty fields
output.seek(0)
result = output.read()
How it works
Create the list of headers by making a set containing all keys that are in any one of your dictionaries.
Make a DictWriter that uses '|' for its delimiter, to get the pipes between entries.
Add empty-string headers at the beginning and end, so that the beginning and ending pipes will get written.
Use a defaultdict to supply the empty beginning and ending values, since they're not in the dictionaries.
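The steps above can be sketched for Python 3 as follows, where StringIO lives in the io module. This variant is an assumption-laden adaptation rather than the answer's exact code: it relies on DictWriter's restval default ('') for missing keys instead of a defaultdict, sorts the keys, and sets lineterminator to '\n'.

```python
import csv
import io

data = [
    {'k1': 'v1-a', 'k2': 'v2-a', 'k3': 'v3-a'},
    {'k1': 'v1-b', 'k3': 'v3-b'},
]

keys = sorted({key for row in data for key in row})
output = io.StringIO()
output.write('||' + '||'.join(keys) + '||\n')

# Empty-named fields at both ends produce the leading and trailing pipe;
# restval (default '') fills in any key a row lacks.
fields = [''] + keys + ['']
writer = csv.DictWriter(output, fields, delimiter='|', lineterminator='\n')
for row in data:
    writer.writerow(row)

result = output.getvalue()
print(result)
```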
The answer in pure Python is to walk the list and therefore every dictionary twice.
In the first run you can collect all distinct keys and in the second run you can build your wiki formatted string output.
Let's start by collecting the keys where we can use a set as storage:
keys = set()
for dict_ in data:
    keys.update(dict_.keys())
keys = sorted(keys)
Now that we have the set of unique keys, we can run through the list again for the output:
wiki_output = '||' + '||'.join(keys) + '||\n'
for dict_ in data:
    for key in keys:
        wiki_output += '|' + dict_.get(key, '')
    wiki_output += '|\n'
There we go...
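If the columns should appear in first-seen order rather than sorted order, the two passes can be wrapped in a small function. This is a sketch, with to_wiki_table as a hypothetical name; it assumes all values are strings, as in the question.

```python
def to_wiki_table(rows):
    """Format a list of dicts as a Confluence-style table (hypothetical helper)."""
    # First pass: collect keys in the order they are first seen.
    keys = []
    seen = set()
    for row in rows:
        for key in row:
            if key not in seen:
                seen.add(key)
                keys.append(key)
    # Second pass: one header line, then one line per row.
    lines = ['||' + '||'.join(keys) + '||']
    for row in rows:
        lines.append('|' + '|'.join(row.get(key, '') for key in keys) + '|')
    return '\n'.join(lines)

data = [
    {'k1': 'v1-a', 'k2': 'v2-a', 'k3': 'v3-a'},
    {'k1': 'v1-b', 'k3': 'v3-b'},
    {'k1': 'v1-c', 'k2': 'v2-c', 'k3': 'v3-c'},
    {'k1': 'v1-d', 'k2': 'v2-d', 'k3': 'v3-d'},
]
table = to_wiki_table(data)
print(table)
```

Note that on Python versions where plain dicts do not preserve insertion order, the first-seen order within a single dict is itself arbitrary, so sorting the keys (as in the answers above) is the more predictable choice there.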

Python how to read orderedDict from a txt file

Basically I want to read a string from a text file and store it as orderedDict.
My file contains the following content.
content.txt:
variable_one=OrderedDict([('xxx', [['xxx_a', 'xxx_b'],['xx_c', 'xx_d']]),('yyy', [['yyy_a', 'yyy_b'],['yy_c', 'yy_d']]))])
variable_two=OrderedDict([('xxx', [['xxx_a', 'xxx_b'],['xx_c', 'xx_d']]),('yyy', [['yyy_a', 'yyy_b'],['yy_c', 'yy_d']]))])
How can I retrieve the values in Python as:
xxx
xxx_a -> xxx_b
xxx_c -> xxx_d
import re
from ast import literal_eval
from collections import OrderedDict
# This string is slightly different from your sample which had an extra bracket
line = "variable_one=OrderedDict([('xxx', [['xxx_a', 'xxx_b'],['xx_c', 'xx_d']]),('yyy', [['yyy_a', 'yyy_b'],['yy_c', 'yy_d']])])"
match = re.match(r'(\w+)\s*=\s*OrderedDict\((.+)\)\s*$', line)
variable, data = match.groups()
# This allows safe evaluation: data can only be a basic data structure
data = literal_eval(data)
data = [(key, OrderedDict(val)) for key, val in data]
data = OrderedDict(data)
Verification that it works:
print(variable)
import json
print(json.dumps(data, indent=4))
Output:
variable_one
{
    "xxx": {
        "xxx_a": "xxx_b",
        "xx_c": "xx_d"
    },
    "yyy": {
        "yyy_a": "yyy_b",
        "yy_c": "yy_d"
    }
}
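To get output closer to the layout the question asks for, a loop over the nested structure may be enough. The sketch below rebuilds the parsed data by hand so that it runs on its own; it prints the values as they appear in the sample line (xx_c, not xxx_c as written in the question).

```python
from collections import OrderedDict

# Same structure as produced by the parsing code above.
data = OrderedDict([
    ('xxx', OrderedDict([('xxx_a', 'xxx_b'), ('xx_c', 'xx_d')])),
    ('yyy', OrderedDict([('yyy_a', 'yyy_b'), ('yy_c', 'yy_d')])),
])

lines = []
for name, pairs in data.items():
    lines.append(name)
    for left, right in pairs.items():
        lines.append('{0} -> {1}'.format(left, right))

print('\n'.join(lines))
```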
Having said all that, your request is very odd. If you can control the source of the data, use a real serialisation format that supports order (so not JSON). Don't output Python code.

Stumped - In Python obtaining unique keys with multiple associated values from a list of dictionaries

I'm parsing a csv file to perform some basic data processing. The file that I am working with is a log of user activity to a website formatted as follows:
User ID, Url, Number of Page Loads, Number of Interactions
User ID and Url are strings, Number of Page Loads and Number of Interactions are integers.
I am attempting to determine which url has the best interaction-to-page ratio.
The part I am struggling with is getting unique values and aggregating the results from the columns.
I've written the following code:
import csv
from collections import defaultdict
fields = ["USER","URL","LOADS","ACT"]
file = csv.DictReader(open('file.csv', 'rU'), delimiter=",",fieldnames=fields)
file.next()
dict = defaultdict(int)
for i in file:
    dict[i['URL']] += int(i['LOADS'])
This works fine. It returns a list of unique urls with the number of total loads by url in a dictionary - { 'URL A' : 1000 , 'URL B' : 500}
The issue is that when I try to add multiple values to the url key, I'm stumped.
I've tried amending the for loop to do:
for i in file:
    dict[i['URL']] += int(i['LOADS']), int(i['ACT'])
and I receive TypeError: unsupported operand type(s) for +=: 'int' and 'tuple'. Why is the second value considered a tuple?
I tried adding just int(i['ACT']), and it worked fine. It's just when I try both values at the same time.
I'm on Python 2.6.7. Any ideas on how to do this, and why it's considered a tuple?
You are better off using a list as your defaultdict container:
import csv
from collections import defaultdict
d = defaultdict(list)
fields = ["USER","URL","LOADS","ACT"]
with open('file.csv', 'rU') as the_file:
    rows = csv.DictReader(the_file, delimiter=",",fieldnames=fields)
    rows.next()
    for row in rows:
        data = (int(row['LOADS']),int(row['ACT']))
        d[row['URL']].append(data)
Now you have
d['someurl'] = [(5,17),(7,14)]
Now you can do whatever sums you would like, for example, all the loads for a URL:
load_sums = dict((k, sum(i[0] for i in d[k])) for k in d)  # dict comprehensions need Python 2.7+
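For the stated goal, finding the URL with the best interaction-to-page-load ratio, one possible sketch keeps two running totals per URL and divides at the end. The sample rows are made up for illustration, the CSV is read from an in-memory string instead of file.csv, and the code is written for Python 3.

```python
import csv
import io
from collections import defaultdict

# Made-up sample data standing in for file.csv.
sample = io.StringIO(
    "USER,URL,LOADS,ACT\n"
    "u1,/a,10,5\n"
    "u2,/a,10,1\n"
    "u3,/b,4,3\n"
)

# Each URL maps to [total loads, total interactions].
totals = defaultdict(lambda: [0, 0])
for row in csv.DictReader(sample):
    totals[row['URL']][0] += int(row['LOADS'])
    totals[row['URL']][1] += int(row['ACT'])

# Skip URLs with zero loads to avoid dividing by zero.
ratios = {
    url: float(acts) / loads
    for url, (loads, acts) in totals.items() if loads
}
best_url = max(ratios, key=ratios.get)
print(best_url, ratios[best_url])
```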
Because int(i['LOADS']), int(i['ACT']) is a tuple:
>>> 1, 2
(1, 2)
If you want to add both variables at the same time, just add them together:
+= int(i['LOADS']) + int(i['ACT'])
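A short sketch of what the interpreter sees: the comma builds a tuple before += is applied, so the statement tries to add a tuple to an int. If two separate totals are wanted instead of one sum, they can be added element by element (the sample pairs are made up).

```python
total = 0
try:
    total += 1, 2          # parsed as: total += (1, 2)
except TypeError as exc:
    error_message = str(exc)

# Keeping two running totals instead: add element by element.
running = (0, 0)
for loads, acts in [(5, 17), (7, 14)]:
    running = (running[0] + loads, running[1] + acts)
print(running)
```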
Also, you're shadowing the builtin dict and list types. Use different variable names. You won't be able to use the list builtin once you shadow it:
>>> d = {1: 2, 3: 4}
>>> list(d)
[1, 3]
>>> list = 5
>>> list(d)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'int' object is not callable
"It's just when I try both values at the same time."
How do you want to "add" them? As their sum?
for i in list:
    dict[i['URL']] += int(i['LOADS']) + int(i['ACT'])
Also, don't use list and dict as variable names.
import csv
fields = ["USER","URL","LOADS","ACT"]
d = {}
with open('file.csv', 'rU') as f:
    csvr = csv.DictReader(f, delimiter=",",fieldnames=fields)
    csvr.next()
    for rec in csvr:
        d[rec['URL']] = d.get(rec['URL'], 0) + int(rec['LOADS']) + int(rec['ACT'])
You could use an object-oriented approach and define a class to hold the information. It's wordier than most of the other answers, but worth considering.
import csv
from collections import defaultdict
class Info(object):
    def __init__(self, loads=0, acts=0):
        self.loads = loads
        self.acts = acts
    def __add__(self, args): # add a tuple of values
        self.loads += args[0]
        self.acts += args[1]
        return self
    def __repr__(self):
        # explicit field indices ({0}, {1}, ...) also work on Python 2.6
        return '{0}(loads={1}, acts={2})'.format(self.__class__.__name__,
                                                 self.loads, self.acts)
summary = defaultdict(Info)
fields = ["USER", "URL", "LOADS", "ACTS"]
with open('urldata.csv', 'rU') as csv_file:
    reader = csv.DictReader(csv_file, delimiter=",", fieldnames=fields)
    reader.next() # skip header
    for rec in reader:
        summary[rec['URL']] += (int(rec['LOADS']), int(rec['ACTS']))
for url,info in summary.items():
    print '{{{0!r}: ({1}, {2})}}'.format(url, info.loads, info.acts)
