How to remove commas, brackets in python using regular expression? - python

These are the contents of my text file (e.g. abc.doc):
{'data': [{'name': 'abc'},{'name': 'xyz'}]}
After opening the file in Python, how do I remove all the brackets, quotes and commas?
The final output should be:
data:
name:abc
name:xyz

Use ast.literal_eval() to turn it into a Python structure, then print the values:
import ast
with open(r'd:\output1.doc', 'r') as inputfile:
    inputstring = inputfile.read()
data = ast.literal_eval(inputstring)
for key, sublist in data.items():
    print('{}:'.format(key))
    for subdict in sublist:
        for key, value in subdict.items():
            print('{}:{}'.format(key, value))
For your example that results in:
>>> inputstring = "{'data': [{'name': 'abc'},{'name': 'xyz'}]}"
>>> import ast
>>> data = ast.literal_eval(inputstring)
>>> for key, sublist in data.items():
...     print('{}:'.format(key))
...     for subdict in sublist:
...         for key, value in subdict.items():
...             print('{}:{}'.format(key, value))
...
data:
name:abc
name:xyz
However: If you got this from the Facebook API, then you transcribed the format incorrectly. The Facebook API gives you JSON data, which uses double quotes (") instead:
{"data": [{"name": "abc"},{"name": "xyz"}]}
in which case you should use the json library that comes with Python:
import json
data = json.loads(inputstring)
# process the same way as above.
If you have an open file object rather than a string, you can ask the library to read straight from it:
with open(filename) as inputfile:
    data = json.load(inputfile)  # note, no `s` after `load`.
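If the nesting can be deeper than one list of dictionaries, the same idea can be written recursively. This is only a sketch: the helper name flatten is made up for illustration, and it assumes the parsed data contains nothing but dicts, lists and plain values.

```python
import ast


def flatten(obj):
    """Return 'key:value' lines for nested dicts and lists (hypothetical helper)."""
    lines = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            if isinstance(value, (dict, list)):
                # A container value gets its own 'key:' header line.
                lines.append('{}:'.format(key))
                lines.extend(flatten(value))
            else:
                lines.append('{}:{}'.format(key, value))
    elif isinstance(obj, list):
        for item in obj:
            lines.extend(flatten(item))
    else:
        lines.append(str(obj))
    return lines


text = "{'data': [{'name': 'abc'},{'name': 'xyz'}]}"
result = flatten(ast.literal_eval(text))
print('\n'.join(result))
```

For the sample input this prints the three lines asked for in the question.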

Looks to me like you have JSON, which can be easily parsed using the json module, provided the strings use double quotes (json.loads rejects single-quoted strings):
import json
obj = json.loads(u'''{"data": [{"name": "abc"},{"name": "xyz"}]}''')
Bob's your uncle now, innit?
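If you are not sure which of the two formats the file holds, one possible approach is to try JSON first and fall back to ast.literal_eval. The function name parse_text is invented for this sketch, and it assumes the text is either valid JSON or a valid Python literal.

```python
import ast
import json


def parse_text(text):
    """Parse JSON if possible, otherwise treat the text as a Python literal."""
    try:
        return json.loads(text)
    except ValueError:
        # json.JSONDecodeError is a subclass of ValueError.
        return ast.literal_eval(text)


from_json = parse_text('{"data": [{"name": "abc"},{"name": "xyz"}]}')
from_literal = parse_text("{'data': [{'name': 'abc'},{'name': 'xyz'}]}")
print(from_json == from_literal)
```

Keep in mind that the two formats are not interchangeable in general: JSON spells its constants true, false and null, while Python literals use True, False and None.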

Related

How to retrieve key value pair within listed dictionary in python?

I have a list of strings holding key-value pairs, and I want to access each value by its key:
sample data:
['"imageUrl":"/images/4.jpg"', '"number":"04047122"', '"name":"test"',...
real data
>>> data
['"imageUrl":"/images/products/klein/04047122_k.jpg"', '"art":"04047122"', '"productId":"170336"'; } } }) ']
This unfortunately does not work:
re.findall(r'(?:number\(\{)(.*)', data)[0].split(',')
How can I retrieve the values by name e.g. data['number'] ?
For a more robust solution, since each string in the input list is a valid line of CSV, delimited by a colon, you can use csv.reader to parse the list and then pass the resulting sequence of key-value pairs to the dict constructor to build a dict:
import csv
lst = ['"imageUrl":"/images/4.jpg"', '"number":"04047122"', '"name":"test"']
data = dict(csv.reader(lst, delimiter=':'))
You can then access data['number'] as desired.
Try to convert your data to a real dictionary:
data = ['"imageUrl":"/images/4.jpg"', '"number":"04047122"', '"name":"test"']
data_dict = dict([x.replace('"','').split(":") for x in data])
and then you will be able to access your keys:
print(data_dict["number"]) # output: 04047122
You can convert your string list to an actual dictionary easily:
>>> ls = ['"imageUrl":"/images/4.jpg"', '"number":"04047122"', '"name":"test"']
>>> data = dict(elem.replace('"', '').split(':') for elem in ls)
>>> data
{'imageUrl': '/images/4.jpg', 'number': '04047122', 'name': 'test'}
>>> data['number']
'04047122'
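Another option, sketched here on the assumption that every item really is a well-formed "key":"value" pair, is to join the items with commas, wrap them in braces and let the json module do the parsing. Unlike a plain split(':'), this keeps values that themselves contain a colon intact. The "url" entry below is a made-up item added only to show that case.

```python
import json

lst = ['"imageUrl":"/images/4.jpg"', '"number":"04047122"', '"name":"test"']
# Hypothetical extra entry whose value contains a colon.
lst_with_url = lst + ['"url":"http://example.com/a"']

data = json.loads('{' + ','.join(lst) + '}')
data_with_url = json.loads('{' + ','.join(lst_with_url) + '}')

print(data['number'])
print(data_with_url['url'])
```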

How can I open a txt as a dictionary?

I have the following text file (each line was saved as a dictionary):
"{'03/01/20': ['luiana','macarena']}\n"
"{'03/01/21': ['juana','roberta','mariana']}\n"
"{'03/01/24': ['pedro','jose','mario','luis']}\n"
"{'03/01/24': ['pedro','jose','mario','luis']}\n"
"{'03/01/22': ['emanuel']}\n"
The problem is that I want to open it as a dictionary, but I don't know how to do it. I've tried:
f = open('usuarios.txt', 'r')
lines = f.readlines()
whip = eval(str(lines))
but it's not working. My idea is, for example, to take only the dictionaries whose date is a given day, such as 03/01/24.
If you want only one dict containing all the saved dictionaries, you can use:
import ast
my_dict = {}
with open('your_file.txt', 'r') as fp:
    for line in fp.readlines():
        new_dict = ast.literal_eval(line)
        for key, value in new_dict.items():
            if key in my_dict:
                my_dict[key].extend(value)
            else:
                my_dict[key] = value
print(my_dict)
output:
{'03/01/20': ['luiana', 'macarena'], '03/01/21': ['juana', 'roberta', 'mariana'], '03/01/24': ['pedro', 'jose', 'mario', 'luis', 'pedro', 'jose', 'mario', 'luis'], '03/01/22': ['emanuel']}
If you could change the format in which you save the strings from
"{'03/01/20': ['luiana','macarena']}\n"
to
'{"03/01/20": ["luiana","macarena"]}\n'
Then you could just do the following:
import json
line = '{"03/01/20": ["luiana","macarena"]}\n'
d = json.loads(line)
The result would be a dictionary d with dates as keys:
>>> d
{'03/01/20': ['luiana', 'macarena']}
Then it would be just a matter of looping over the lines of your file and adding them to your dictionary.
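A rough sketch of that loop, assuming the lines have already been saved with double quotes. An io.StringIO object stands in for the real file here, and the names fake_file, users_by_date and wanted are invented for the example.

```python
import io
import json

# Stand-in for the real file; assumes each line is valid JSON.
fake_file = io.StringIO(
    '{"03/01/20": ["luiana", "macarena"]}\n'
    '{"03/01/24": ["pedro", "jose", "mario", "luis"]}\n'
    '{"03/01/22": ["emanuel"]}\n'
)

users_by_date = {}
for line in fake_file:
    line = line.strip()
    if not line:
        continue  # skip blank lines
    for date, names in json.loads(line).items():
        # Merge names when the same date appears on several lines.
        users_by_date.setdefault(date, []).extend(names)

# Pick out a single day, as asked in the question.
wanted = users_by_date.get('03/01/24', [])
print(wanted)
```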
An alternative approach would be to use pickle to save your dictionary instead of the .txt, then use it to load the dictionary from disk.
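A minimal sketch of the pickle round trip. The file name usuarios.pkl and the temporary directory are assumptions made for the example; note that pickle files should only be loaded when they come from a source you trust.

```python
import os
import pickle
import tempfile

users_by_date = {'03/01/20': ['luiana', 'macarena'], '03/01/22': ['emanuel']}

# Hypothetical file name, written to a temporary directory for the demo.
path = os.path.join(tempfile.mkdtemp(), 'usuarios.pkl')

# Pickle files must be opened in binary mode.
with open(path, 'wb') as fp:
    pickle.dump(users_by_date, fp)

with open(path, 'rb') as fp:
    restored = pickle.load(fp)

print(restored)
```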

How to remove Key pairs in a line and print only the values after : in python using regex

I am pretty new to Python. The text I have is in a text file like this:
{u'metro_name': u'Phoenix-Mesa-Glendale, AZ', u'ips': 38060}
{u'metro_name': u'Los Angeles-Long Beach-Glendale, CA (MSAD)', u'ips': 31100}
How can I print only the values and concatenate them with $$ on a single line, i.e. produce the output:
'Phoenix-Mesa-Glendale, AZ$$38060','Los Angeles-Long Beach-Glendale, CA (MSAD)$$31100'
I'm pretty sure that regex can't fully parse string literal syntax, so it's more work than it's worth here. Consider using ast.literal_eval to turn each line into a dictionary. Then you can do whatever string manipulation you like on their values.
import ast
from collections import OrderedDict
dicts = []
with open("data.txt") as file:
    for line in file:
        d = ast.literal_eval(line)
        # Fix the key order so the values always come out as metro_name, ips.
        d = OrderedDict((k, d[k]) for k in ('metro_name', 'ips'))
        dicts.append(d)
output_lines = []
for d in dicts:
    values = [str(value) for value in d.values()]
    line = "$$".join(values)
    output_lines.append(repr(line))
print(",".join(output_lines))
Result:
'Phoenix-Mesa-Glendale, AZ$$38060','Los Angeles-Long Beach-Glendale, CA (MSAD)$$31100'
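The same result can be had without OrderedDict by looking the keys up in a fixed order. This is a sketch only: the helper format_line is a made-up name, and the two sample lines are inlined in place of reading data.txt.

```python
import ast

lines = [
    "{u'metro_name': u'Phoenix-Mesa-Glendale, AZ', u'ips': 38060}",
    "{u'metro_name': u'Los Angeles-Long Beach-Glendale, CA (MSAD)', u'ips': 31100}",
]


def format_line(line, keys=('metro_name', 'ips')):
    """Join the values of one dictionary line with '$$' in a fixed key order."""
    record = ast.literal_eval(line)
    return '$$'.join(str(record[key]) for key in keys)


output = ','.join(repr(format_line(line)) for line in lines)
print(output)
```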

Python how to read orderedDict from a txt file

Basically I want to read a string from a text file and store it as orderedDict.
My file contains the following content.
content.txt:
variable_one=OrderedDict([('xxx', [['xxx_a', 'xxx_b'],['xx_c', 'xx_d']]),('yyy', [['yyy_a', 'yyy_b'],['yy_c', 'yy_d']]))])
variable_two=OrderedDict([('xxx', [['xxx_a', 'xxx_b'],['xx_c', 'xx_d']]),('yyy', [['yyy_a', 'yyy_b'],['yy_c', 'yy_d']]))])
how will I retrieve values in python as:
xxx
xxx_a -> xxx_b
xxx_c -> xxx_d
import re
from ast import literal_eval
from collections import OrderedDict
# This string is slightly different from your sample which had an extra bracket
line = "variable_one=OrderedDict([('xxx', [['xxx_a', 'xxx_b'],['xx_c', 'xx_d']]),('yyy', [['yyy_a', 'yyy_b'],['yy_c', 'yy_d']])])"
match = re.match(r'(\w+)\s*=\s*OrderedDict\((.+)\)\s*$', line)
variable, data = match.groups()
# This allows safe evaluation: data can only be a basic data structure
data = literal_eval(data)
data = [(key, OrderedDict(val)) for key, val in data]
data = OrderedDict(data)
Verification that it works:
print(variable)
import json
print(json.dumps(data, indent=4))
Output:
variable_one
{
    "xxx": {
        "xxx_a": "xxx_b",
        "xx_c": "xx_d"
    },
    "yyy": {
        "yyy_a": "yyy_b",
        "yy_c": "yy_d"
    }
}
Having said all that, your request is very odd. If you can control the source of the data, use a real serialisation format that supports order (so not JSON). Don't output Python code.
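To get output shaped like the "xxx_a -> xxx_b" listing in the question, the parsed structure can be walked in order. The sketch below repeats the parsing step so that it runs on its own; the helper name render is invented, and the corrected sample line (without the extra bracket) is assumed.

```python
import re
from ast import literal_eval
from collections import OrderedDict

line = "variable_one=OrderedDict([('xxx', [['xxx_a', 'xxx_b'],['xx_c', 'xx_d']]),('yyy', [['yyy_a', 'yyy_b'],['yy_c', 'yy_d']])])"
match = re.match(r'(\w+)\s*=\s*OrderedDict\((.+)\)\s*$', line)
variable, raw = match.groups()
data = OrderedDict((key, OrderedDict(pairs)) for key, pairs in literal_eval(raw))


def render(data):
    """Return one line per key, followed by its 'left -> right' pairs."""
    lines = []
    for key, inner in data.items():
        lines.append(key)
        for left, right in inner.items():
            lines.append('{} -> {}'.format(left, right))
    return lines


rendered = render(data)
print('\n'.join(rendered))
```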

How to fetch single item out of long string?

I have a very long string as the output of a function, as follows:
tmp = <"last seen":1568,"reviews [{"id":15869,"author":"abnbvg","changes":........>
How will I fetch the "id":15869 out of it?
The string content looks like JSON, so either use the json module or use a regular expression to extract the specific string you need.
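For the regular-expression route, a sketch might look like the following. It assumes the id is always a run of digits directly after the "id" key, and it uses a shortened, complete version of the string, since the original is truncated. The name review_id is made up for the example.

```python
import re

tmp = '"last seen":1568,"reviews":[{"id":15869,"author":"abnbvg"}]'

# Capture the digits that follow the "id" key, allowing optional spaces.
match = re.search(r'"id"\s*:\s*(\d+)', tmp)
review_id = int(match.group(1)) if match else None
print(review_id)
```

This only finds the first "id" in the string; re.findall with the same pattern would return all of them.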
The data looks like a JSON string. Use:
try:
    import json
except ImportError:
    import simplejson as json
tmp = '"last seen":1568,"reviews":[{"id":15869,"author":"abnbvg"}]'
data = json.loads('{{{}}}'.format(tmp))
>>> print data
{u'reviews': [{u'id': 15869, u'author': u'abnbvg'}], u'last seen': 1568}
>>> print data['reviews'][0]['id']
15869
Note that I wrapped the string in { and } to make a dictionary. You might not have to do that if the actual JSON string is already encapsulated with braces.
If id is the only thing you need from the string and it will always look something like {"id":15869,"author":"abnbvg"..., then you can use a simple string split instead of JSON conversion.
tmp = '"last seen":1568,"reviews" : [{"id":15869,"author":"abnbvg","changes":........'
tmp1 = tmp.split('"id":', 1)[1]
id = tmp1.split(",", 1)[0]
Note that the tmp1 line raises IndexError if "id" is not found in the string. You could index with -1 instead of 1 to sidestep the exception, but catching it lets you report that "id" is missing:
try:
    tmp1 = tmp.split('"id":', 1)[1]
    id = tmp1.split(",", 1)[0]
except IndexError:
    print("id key is not present in the json")
    id = None
If you really do need more variables from the JSON string, go with mhawke's solution of converting the JSON to a dictionary and reading the values from it. Alternatively, you can use ast.literal_eval:
from ast import literal_eval
tmp = '"last seen":1568,"reviews" : [{"id":15869,"author":"abnbvg","changes":........'
tmp_dict = literal_eval("""{%s}"""%(tmp))
print(tmp_dict["reviews"][0]["id"])
In the second case, if you need to collect all the "id" values into a list, this will help:
id_list = []
for id_dict in tmp_dict["reviews"]:
    id_list.append(id_dict["id"])
print(id_list)
