Strip Quote from key Using DictReader - python

I am currently reading data from a CSV file, and I want to turn each row into a dictionary of key-value pairs.
I was able to do that using csv.DictReader. But is there any way to strip the quotes from the keys?
It currently prints like this:
{'COUNTRY': 'Germany', 'price': '49', 'currency': 'EUR', 'ID': '1', 'CITY': 'Munich'}
{'COUNTRY': 'United Kingdom', 'price': '40', 'currency': 'GBP', 'ID': '2', 'CITY': 'London'}
{'COUNTRY': 'United Kingdom', 'price': '40', 'currency': 'GBP', 'ID': '3', 'CITY': 'Liverpool'}
Is there any way to make it look like this?
{COUNTRY: 'Germany', price: '49', currency: 'EUR', ID: '1', CITY: 'Munich'}
{COUNTRY: 'United Kingdom', price: '40', currency: 'GBP', ID: '2', CITY: 'London'}
{COUNTRY: 'United Kingdom', price: '40', currency: 'GBP', ID: '3', CITY: 'Liverpool'}
import csv

# newline="" is the mode the csv module recommends for reading
with open("201611022225.csv", newline="") as f:
    for row in csv.DictReader(f):
        print(row)

When Python prints a dictionary, it shows the repr of each key and value, and the repr of a string includes quotes. In your case the dictionary uses strings as keys, so the quotes appear in the printed output. They are not stored as part of the data; they only indicate the data type.
For example, if you write the keys themselves to a text file and open it later, you will not see any quotes.
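If the goal is only to display the rows without quotes around the keys, one option is to build the output string by hand instead of relying on the dictionary's default repr. The following is a minimal sketch; format_row is a hypothetical helper name (not part of the csv module), and the sample row is copied from the question.

```python
def format_row(row):
    """Format a dict with bare keys and repr-quoted values."""
    # The key is inserted as plain text; value!r keeps the quotes around values.
    items = ", ".join(f"{key}: {value!r}" for key, value in row.items())
    return "{" + items + "}"


row = {'COUNTRY': 'Germany', 'price': '49', 'currency': 'EUR', 'ID': '1', 'CITY': 'Munich'}
print(format_row(row))
```

The result is only a display string. It is not valid Python or JSON, so it cannot be parsed back into a dictionary.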

Related

What's wrong? Pandas

I get "SyntaxError: invalid syntax" when I run this code. I am trying to create groups of countries by continent, but Python reports that the = is invalid. What should I use instead?
def country_kl(country):
    if country = ['United States', 'Mexico', 'Canada', 'Bahamas', 'Chile', 'Brazil', 'Colombia','British Virgin Islands'
                  ,'Peru','Uruguay','Turks and Caicos Islands','Cambodia','Bermuda','Argentina']:
        return '1'
    elif country = ['France', 'Spain', 'Germany', 'Switzerland', 'Belgium', 'United Kingdom', 'Austria', 'Italy', 'Swaziland'
                    ,'Russia' , 'Sweden','Czechia','Monaco','Denmark','Poland','Norway','Netherlands','Portugal','Turkey','Finland',
                    'Ukraine','Andorra','Hungary','Greece','Romania','Slovakia','Liechtenstein','Guernsey','Ireland']:
        return '2'
    elif country = ['India','China', 'Singapore', 'Hong Kong', 'Australia', 'Japan']:
        return '3'
    elif country = ['United Arab Emirates',
                    'Thailand','Malaysia','New Zealand','South Korea','Philippines','Taiwan','Israel','Vietnam','Cayman Islands',
                    'Kazakhstan' ,'Georgia','Bahrain','Nepal','Qatar','Oman','Lebanon']:
        return '3'
    else :
        return '4'
One more error in your code is that you used a single "=", which actually means assignment, not comparison.
To compare two values, use "==" (a double "=").
But of course, to check whether the value of a variable is contained
in a list, you have to use the in operator, just as Ilya suggested in his comment.
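As a minimal sketch of the membership test described above, the function from the question could look like this. The country groups are deliberately abbreviated here, and the constant names AMERICAS and EUROPE are illustrative assumptions, not names from the question.

```python
# Abbreviated groups for illustration; extend them with the full lists.
AMERICAS = {'United States', 'Mexico', 'Canada'}
EUROPE = {'France', 'Spain', 'Germany'}


def country_kl(country):
    # "in" tests membership; "=" would be an assignment and is a syntax error here.
    if country in AMERICAS:
        return '1'
    elif country in EUROPE:
        return '2'
    else:
        return '4'


print(country_kl('Mexico'))
print(country_kl('Atlantis'))
```

Sets are used instead of lists because a membership test on a set does not have to scan every element, but a list works with the in operator as well.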
Another, more readable and elegant solution is:
Create a dictionary where the key is the country name and the
value is your expected result for this country. Something like:
countries = {'United States': '1', 'Mexico': '1', 'France': '2', 'Spain': '2',
             'India': '3', 'China': '3', 'Singapore': '3'}
(include other countries too).
Look up this dictionary with a default value of '4', which you
used in your code (dict.get accepts the default only as a positional argument):
result = countries.get(country, '4')
And by the way: your question and code have nothing to do with Pandas.
You use ordinary Python lists and (as I suppose) a string variable.
But since you also tagged your question with Pandas,
I came up with a pandasonic solution as well:
Create a Series from the above dictionary (the keys become the index):
ctr = pd.Series(countries)
Look up this Series, also with a default value:
result = ctr.get(country, default='4')

How to print specific key values from a list of dictionaries in Python? [closed]

I am trying to print specific key values from a list of dictionaries in Python. The list is as follows:
employees = [
    {'name': 'Tanya', 'age': 20, 'birthday': '1990-03-10',
     'job': 'Back-end Engineer', 'address': {'city': 'New York', 'country': 'USA'}},
    {'name': 'Tim', 'age': 35, 'birthday': '1985-02-21', 'job': 'Developer', 'address': {'city': 'Sydney', 'country': 'Australia'}}]
How can I loop through each dictionary and print specific key values? The output should be the name, job and city.
A possible solution is the following:
employees = [
    {'name': 'Tanya', 'age': 20, 'birthday': '1990-03-10',
     'job': 'Back-end Engineer', 'address': {'city': 'New York', 'country': 'USA'}},
    {'name': 'Tim', 'age': 35, 'birthday': '1985-02-21', 'job': 'Developer', 'address': {'city': 'Sydney', 'country': 'Australia'}}]
for employee in employees:
    print(f"{employee['name']}, {employee['job']}, {employee['address']['city']}")
This prints:
Tanya, Back-end Engineer, New York
Tim, Developer, Sydney
for i in employees:
    print("name:{0}, job:{1}, city:{2}".format(i['name'],i['job'],i['address']['city']))

Using regex in python for a dynamic string

I have a pandas column with strings which don't share the same pattern, something like this:
{'iso_2': 'FR', 'iso_3': 'FRA', 'name': 'France'}
{'iso': 'FR', 'iso_2': 'USA', 'name': 'United States of America'}
{'iso_3': 'FR', 'iso_4': 'FRA', 'name': 'France'}
How do I keep only the name of the country for every row? I would only like to keep "France", "United States of America", "France".
I tried building a regex pattern, something like this:
r"^\W+[a-z]+_[0-9]\W+"
But this turns out to be very specific, and if there is a slight change in the string the pattern won't work. How do we resolve this?
As you have dictionaries in the column, you can get the values of the name keys:
import pandas as pd
df = pd.DataFrame({'col':[{'iso_2': 'FR', 'iso_3': 'FRA', 'name': 'France'},
                          {'iso': 'FR', 'iso_2': 'USA', 'name': 'United States of America'},
                          {'iso_3': 'FR', 'iso_4': 'FRA', 'name': 'France'}]})
df['col'] = df['col'].apply(lambda x: x["name"])
Output of df['col']:
0 France
1 United States of America
2 France
Name: col, dtype: object
If the column contains stringified dictionaries, you can use ast.literal_eval before accessing the name key value:
import pandas as pd
import ast
df = pd.DataFrame({'col':["{'iso_2': 'FR', 'iso_3': 'FRA', 'name': 'France'}",
                          "{'iso': 'FR', 'iso_2': 'USA', 'name': 'United States of America'}",
                          "{'iso_3': 'FR', 'iso_4': 'FRA', 'name': 'France'}"]})
df['col'] = df['col'].apply(lambda x: ast.literal_eval(x)["name"])
And in case your column is totally messed up, yes, you can resort to regex:
df['col'] = df['col'].str.extract(r"""['"]name['"]\s*:\s*['"]([^"']+)""")
# or to support escaped " and ':
df['col'] = df['col'].str.extract(r"""['"]name['"]\s*:\s*['"]([^"'\\]+(?:\\.[^'"\\]*)*)""")
>>> df['col']
0
0 France
1 United States of America
2 France
See the regex demo.
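Outside pandas, the same two-step idea (parse the string if possible, fall back to a regex if not) can be sketched with the standard library only. This is an illustration under assumptions: extract_name is a hypothetical helper name, and the pattern is the simpler of the two regexes above, so it does not handle escaped quotes inside the name.

```python
import ast
import re

# Same idea as the simpler pattern above: find the value that follows a "name" key.
NAME_RE = re.compile(r"""['"]name['"]\s*:\s*['"]([^"']+)""")


def extract_name(text):
    """Return the value of the 'name' key from a dict-like string, or None."""
    try:
        value = ast.literal_eval(text)
        if isinstance(value, dict) and "name" in value:
            return value["name"]
    except (ValueError, SyntaxError):
        pass  # not a valid Python literal, fall back to the regex
    match = NAME_RE.search(text)
    return match.group(1) if match else None


print(extract_name("{'iso_2': 'FR', 'iso_3': 'FRA', 'name': 'France'}"))
# A truncated string cannot be parsed as a literal, so the regex is used instead.
print(extract_name("{'iso': 'FR', 'name': 'United States of America'"))
```

A function like this could be passed to df['col'].apply in place of the lambda shown earlier.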

Convert Nested JSON into Dataframe

I have a nested JSON like below. I want to convert it into a pandas dataframe. As part of that, I also need to parse the weight value only. I don't need the unit.
I also want the number values converted from string to numeric.
Any help would be appreciated. I'm relatively new to python. Thank you.
JSON Example:
{'id': '123', 'name': 'joe', 'weight': {'number': '100', 'unit': 'lbs'},
'gender': 'male'}
Sample output below:
id name weight gender
123 joe 100 male
Use pd.json_normalize (in older pandas versions: from pandas.io.json import json_normalize). It flattens the nested keys into separate columns:
id name weight.number weight.unit gender
123 joe 100 lbs male
If you want to discard the weight unit, just flatten the JSON:
temp = {'id': '123', 'name': 'joe', 'weight': {'number': '100', 'unit': 'lbs'}, 'gender': 'male'}
temp['weight'] = temp['weight']['number']
Then turn it into a DataFrame (wrap the dict in a list, because a dict of scalar values cannot be converted without an index):
pd.DataFrame([temp])
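When there are many records, the same flattening and the string-to-number conversion can be done in plain Python before the data reaches pandas. A minimal sketch, assuming every record has the same shape as the example; flatten_record is a hypothetical helper name.

```python
def flatten_record(record):
    """Return a flat copy: 'weight' reduced to its number, numeric strings as int."""
    flat = dict(record)  # shallow copy, so the original record is left untouched
    flat["weight"] = int(record["weight"]["number"])
    flat["id"] = int(record["id"])
    return flat


records = [
    {'id': '123', 'name': 'joe', 'weight': {'number': '100', 'unit': 'lbs'}, 'gender': 'male'},
]
rows = [flatten_record(r) for r in records]
print(rows)
```

The resulting list of flat dictionaries can be passed directly to pd.DataFrame(rows).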
Something like this should do the trick:
import pandas as pd
json_data = [{'id': '123', 'name': 'joe', 'weight': {'number': '100', 'unit': 'lbs'}, 'gender': 'male'}]
# convert the data to a DataFrame
df = pd.DataFrame.from_records(json_data)
# convert id to an int
df['id'] = df['id'].apply(int)
# get the 'number' field of weight and convert it to an int
df['weight'] = df['weight'].apply(lambda x: int(x['number']))
df

Efficient looping through dictionary with keys as tuple

I have a very large dictionary with 200 million keys. Each key is a tuple of two integers. I want to find the key for which a "query integer" lies between the two integers of the tuple.
Currently, I loop through all dictionary keys and check whether the integer lies within the range given by each tuple. It works, but each lookup takes around 1-2 minutes and I need to perform around 1 million such queries. An example of the dictionary and the code I have written follow:
Sample dictionary:
[{ (3547237440, 3547237503) : {'state': 'seoul teukbyeolsi', 'country': 'korea (south)', 'country_code': 'kr', 'city': 'seoul'} },
{ (403044176, 403044235) : {'state': 'california', 'country': 'united states', 'country_code': 'us', 'city': 'pleasanton'} },
{ (3423161600, 3423161615) : {'state': 'kansas', 'country': 'united states', 'country_code': 'us', 'city': 'lenexa'} },
{ (3640467200, 3640467455) : {'state': 'california', 'country': 'united states', 'country_code': 'us', 'city': 'san jose'} },
{ (853650485, 853650485) : {'state': 'colorado', 'country': 'united states', 'country_code': 'us', 'city': 'arvada'} },
{ (2054872064, 2054872319) : {'state': 'tainan', 'country': 'taiwan', 'country_code': 'tw', 'city': 'tainan'} },
{ (1760399104, 1760399193) : {'state': 'texas', 'country': 'united states', 'country_code': 'us', 'city': 'dallas'} },
{ (2904302140, 2904302143) : {'state': 'iowa', 'country': 'united states', 'country_code': 'us', 'city': 'hampton'} },
{ (816078080, 816078335) : {'state': 'district of columbia', 'country': 'united states', 'country_code': 'us', 'city': 'washington'} },
{ (2061589204, 2061589207) : {'state': 'zhejiang', 'country': 'china', 'country_code': 'cn', 'city': 'hangzhou'} }]
The code I have written:
ipint = int(ipaddress.IPv4Address(ip))
for k in ip_dict.keys():
    if ipint >= k[0] and ipint <= k[1]:
        print(ip_dict[k]['country'], ip_dict[k]['country_code'], ip_dict[k]['state'])
where ip is just an IP address like '192.168.0.1'.
If anyone could provide a hint regarding a more efficient way to perform this task, it would be much appreciated.
Thanks
I suggest using another data structure with good query complexity, such as a tree.
Maybe you can try this library I just found: https://pypi.org/project/rangetree/
As they say, it is optimized for lookups but not for insertions, so if you need to insert once and look up many times it should be OK.
Another solution is to use a list instead of a dict, sort it, and build an index over it. Do a binary search (dichotomy) on this index when there is a query (this can be less optimal if the ranges are not regular, so I prefer the first solution).
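The sorted-list idea can be sketched with the standard bisect module. This is an illustration under a loud assumption: the ranges do not overlap, so at most one range can contain a given address. The three entries are taken from the question's sample data, and lookup is a hypothetical function name.

```python
import bisect
import ipaddress

# Sample entries from the question; the real dictionary would be much larger.
ip_dict = {
    (3547237440, 3547237503): {'state': 'seoul teukbyeolsi', 'country': 'korea (south)', 'country_code': 'kr'},
    (403044176, 403044235): {'state': 'california', 'country': 'united states', 'country_code': 'us'},
    (853650485, 853650485): {'state': 'colorado', 'country': 'united states', 'country_code': 'us'},
}

# One-time preprocessing: sort the (start, end) keys and keep the starts separately.
ranges = sorted(ip_dict)
starts = [start for start, _ in ranges]


def lookup(ip):
    """Return the info dict of the range containing ip, or None (assumes no overlaps)."""
    ipint = int(ipaddress.IPv4Address(ip))
    # Position of the last range whose start is <= ipint.
    pos = bisect.bisect_right(starts, ipint) - 1
    if pos >= 0:
        start, end = ranges[pos]
        if ipint <= end:
            return ip_dict[(start, end)]
    return None


print(lookup('24.5.247.100'))
print(lookup('192.168.0.1'))
```

After the one-time sort, each query costs a binary search over the starts plus one dictionary access, instead of a scan over every key. If the ranges can overlap, this sketch is not sufficient and an interval tree is the safer choice.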
Create an index for each of the 2 integers: a sorted list like this:
[(left_int, [list_of_row_ids_that_have_this_left_int]),
 (another_greater_left_int, [...])]
You can then find, in O(log n), all rows whose left int is less than or equal to the searched one.
A binary search will do here.
Do the same for the right int, this time keeping the rows whose right int is greater than or equal to the searched one.
Keep the rest of the data in a list of tuples.
In more detail:
data = [( (3547237440, 3547237503), {'state': 'seoul'} ), ...]
left_idx = [(3547237440, [0,43]), (9547237440, [3])]
# 0, 43, 3 are indices in the data list
# search
max_left_idx = binary_search(left_idx, 3444444)
# now all rows referred to by left_idx[0] ... left_idx[max_left_idx] start at or before the query
min_right_idx = ...
# all rows referred to by right_idx[min_right_idx] ... right_idx[-1] end at or after the query
# intersect the two sets of row ids: those rows satisfy the range check
