How can I extract just the value?
I have this code:
import csv

data = []
with open('city.txt') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=',')
    for row in readCSV:
        data.append(row[3])
That appends the following to the list (the list is massive):
[....... " 'id': 'AX~only~Mariehamn'", " 'id': 'AX~only~Saltvik'", " 'id': 'AX~only~Sund'"]
How can I append only the value of the key 'id' to the list?
I just want to append AX~only~Saltvik, and so on.
city.txt is a file (90k lines) containing the following:
{'name': 'Herat', 'displayName': 'Herat', 'meta': {'type': 'CITY'}, 'id': 'AF~HER~Herat', 'countryId': 'AF', 'countryName': 'AF', 'regionId': 'AF~HER', 'regionName': 'HER', 'latitude': 34.3482, 'longitude': 62.1997, 'links': {'self': {'path': '/api/netim/v1/cities/AF~HER~Herat'}}}
{'name': 'Kabul', 'displayName': 'Kabul', 'meta': {'type': 'CITY'}, 'id': 'AF~KAB~Kabul', 'countryId': 'AF', 'countryName': 'AF', 'regionId': 'AF~KAB', 'regionName': 'KAB', 'latitude': 34.5167, 'longitude': 69.1833, 'links': {'self': {'path': '/api/netim/v1/cities/AF~KAB~Kabul'}}}
and so on ....
When I print(row) in the for loop, I get the following (this is just the last line of the output):
["{'name': 'Advancetown'", " 'displayName': 'Advancetown'", " 'meta': {'type': 'CITY'}", " 'id': 'AU~QLD~Advancetown'", " 'countryId': 'AU'", " 'countryName': 'AU'", " 'regionId': 'AU~QLD'", " 'regionName': 'QLD'^C: 152.7713", " 'links': {'self': {'path': '/api/netim/v1/cities/AU~QLD~Basin%20Pocket'}}}"]
This answer assumes that your output is exact, and that each value appended to your list is a string along the lines of " 'id': 'AX~only~Mariehamn'".
csv.reader splits each line on commas only, so the key and its value end up together in a single string. You can get the value out of it with ordinary string methods.
for row in readCSV:
    data.append(row[3].split(": ")[1].strip("'"))
The above code splits the string into a list with two parts, one before the colon and one after it: [" 'id'", "'AX~only~Mariehamn'"]. Then it takes the second part and strips the surrounding ' characters, resulting in a clean string.
It looks like row[3] is a string representing a key, value pair.
I would split it further and select only the value portion:
import csv

data = []
with open('city.txt') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=',')
    for row in readCSV:
        data.append(row[3].split(':')[1][2:-1])
[2:-1] removes the leading space and opening ' as well as the closing '.
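Both answers above slice the string by position. Since every line of city.txt shown in the question looks like a Python dict literal, a different approach (not the one used in the answers) is to parse each line with ast.literal_eval and read the 'id' key directly. This is only a sketch under that assumption; the function name extract_ids and the shortened sample lines are made up for illustration.

```python
import ast


def extract_ids(lines):
    """Return the 'id' value from each line that holds a Python dict literal."""
    ids = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = ast.literal_eval(line)  # parses literals only, does not run code
        ids.append(record["id"])
    return ids


# Shortened sample lines in the same shape as city.txt (assumed format).
sample = [
    "{'name': 'Herat', 'meta': {'type': 'CITY'}, 'id': 'AF~HER~Herat', 'latitude': 34.3482}",
    "{'name': 'Kabul', 'meta': {'type': 'CITY'}, 'id': 'AF~KAB~Kabul', 'latitude': 34.5167}",
]
ids = extract_ids(sample)
print(ids)
```

For the real file you would pass the open file object instead of the sample list, e.g. with open('city.txt') as f: data = extract_ids(f). A line that is not a valid literal will raise an exception, so this only works if the whole file follows the format shown.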
I have this read function that reads a CSV file using csv.DictReader. The file is comma-separated and is read in full. However, one column of my file contains commas inside its values. My question is, how can I make sure such a comma is counted as part of the column? I cannot alter my CSV file to meet the criteria.
Text File:
ID,Name,University,Street,ZipCode,Country
12,Jon Snow,U of Winterfell,Winterfell #45,60434,Westeros
13,Steve Rogers,NYU,108, Chelsea St.,23333,United States
20,Peter Parker,Yale,34, Tribeca,32444,United States
34,Tyrion Lannister,U of Casterly Rock,Kings Landing #89, 43543,Westeros
The desired output is this:
{'ID': '12', 'Name': 'Jon Snow', 'University': 'U of Winterfell', 'Street': 'Winterfell #45', 'ZipCode': '60434', 'Country': 'Westeros'}
{'ID': '13', 'Name': 'Steve Rogers', 'University': 'NYU', 'Street': '108, Chelsea St.', 'ZipCode': '23333', 'Country': 'United States'}
{'ID': '20', 'Name': 'Peter Parker', 'University': 'Yale', 'Street': '34, Tribeca', 'ZipCode': '32444', 'Country': 'United States'}
{'ID': '34', 'Name': 'Tyrion Lannister', 'University': 'U of Casterly Rock', 'Street': 'Kings Landing #89', 'ZipCode': '43543', 'Country': 'Westeros'}
As you can tell, the 'Street' value contains an extra comma after the house number:
13,Steve Rogers,NYU,108, Chelsea St.,23333,United States
20,Peter Parker,Yale,34, Tribeca,32444,United States
Note: Most of the columns are separated by a comma with no space (str,str), BUT inside the 'Street' column the comma is followed by a space (str, str). I hope this makes sense.
One option I looked at is re.split, but I don't know how to apply it while reading the file. I was thinking of re.split(r'(?!\s),(?!\s)',x[:-1]). How can I make sure a comma like that is treated as part of the column? I can't use pandas.
My current output looks like this right now:
{'ID': '12', 'Name': 'Jon Snow', 'University': 'U of Winterfell', 'Street': 'Winterfell #45', 'ZipCode': '60434', 'Country': 'Westeros'}
{'ID': '13', 'Name': 'Steve Rogers', 'University': 'NYU', 'Street': '108', 'ZipCode': 'Chelsea St.', 'Country': '23333', None: ['United States']}
{'ID': '20', 'Name': 'Peter Parker', 'University': 'Yale', 'Street': '34', 'ZipCode': 'Tribeca', 'Country': '32444', None: ['United States']}
{'ID': '34', 'Name': 'Tyrion Lannister', 'University': 'U of Casterly Rock', 'Street': 'Kings Landing #89', 'ZipCode': '43543', 'Country': 'Westeros'}
This is my read function:
import csv
list = []
with open('file.csv', mode='r') as csv_file:
    csv_reader = csv.DictReader(csv_file, delimiter=",", skipinitialspace=True)
    for col in csv_reader:
        list.append(dict(col))
        print(dict(col))
You can't use csv if the file isn't valid CSV format.
You need to call re.split() on ordinary lines, not on dictionaries.
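import re

list = []
with open('file.csv', mode='r') as csv_file:
    keys = csv_file.readline().strip().split(',')  # Read header line
    for line in csv_file:
        line = line.strip()
        # The first (?!\s) always succeeds in front of a comma, so this
        # splits on every comma that is not followed by whitespace.
        row = re.split(r'(?!\s),(?!\s)', line)
        list.append(dict(zip(keys, row)))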
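A quick way to see what that pattern does is to run it on two of the sample lines. This is just a sketch; SPLIT_PATTERN and split_line are names made up for the illustration, and the pattern is written as ,(?!\s), which matches the same commas as the one above.

```python
import re

# Comma that is NOT followed by whitespace.
SPLIT_PATTERN = re.compile(r',(?!\s)')


def split_line(line):
    """Split one raw line on commas that are not followed by whitespace."""
    return SPLIT_PATTERN.split(line.strip())


good = split_line("13,Steve Rogers,NYU,108, Chelsea St.,23333,United States")
short = split_line("34,Tyrion Lannister,U of Casterly Rock,Kings Landing #89, 43543,Westeros")

print(len(good), good[3])
print(len(short), short[3])
```

The first line comes out with six fields and the street intact. The second line has a space after the comma in front of the zip code, so that comma is not treated as a separator and the line yields only five fields. The regex approach therefore depends on the rule "no space after a real separator" holding for every row.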
The actual solution to the problem is to modify the script that generates the CSV file.
If you can modify that output, you can do one of two things:
Use a delimiter other than a comma, such as | or ;, or whatever you are sure does not occur in the data.
Or enclose all columns in " so that you can split on the commas that are actual separators.
If you can't modify the output, and you are sure that the extra commas occur only in the Street column, use csv.reader instead of DictReader. That way you can access the columns by index: row[0] is ID, row[1] is Name, row[2] is University, row[-1] is Country and row[-2] is ZipCode, so row[3:-2] gives you the pieces of the street. The indexes can be adjusted, but the idea should be clear.
Hope that helps.
Edit:
import csv
list = []
with open('file.csv', mode='r') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=",", skipinitialspace=True)
    # skip the header row
    next(csv_reader)
    for row in csv_reader:
        list.append({"ID": row[0],
                     "Name": row[1],
                     "University": row[2],
                     "Street": ' '.join(row[3:-2]),
                     "Zipcode": row[-2],
                     "Country": row[-1]})
print(list)
--
Here is the output (with pprint)
[{'Country': 'Westeros',
'ID': '12',
'Name': 'Jon Snow',
'Street': 'Winterfell #45',
'University': 'U of Winterfell',
'Zipcode': '60434'},
{'Country': 'United States',
'ID': '13',
'Name': 'Steve Rogers',
'Street': '108 Chelsea St.',
'University': 'NYU',
'Zipcode': '23333'},
{'Country': 'United States',
'ID': '20',
'Name': 'Peter Parker',
'Street': '34 Tribeca',
'University': 'Yale',
'Zipcode': '32444'},
{'Country': 'Westeros',
'ID': '34',
'Name': 'Tyrion Lannister',
'Street': 'Kings Landing #89',
'University': 'U of Casterly Rock',
'Zipcode': '43543'}]
-- second edit
edited the index on the street.
Regards.
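The same index idea can be sketched so that it keeps the comma in the street (as in the desired output) and takes the keys from the header instead of hard-coding them. This is a variation under the same assumption that only one column can contain extra commas; read_rows and the wide_column parameter are hypothetical names, and the in-memory sample stands in for file.csv.

```python
import csv
import io


def read_rows(csv_file, wide_column="Street"):
    """Read rows where only `wide_column` may contain extra commas."""
    reader = csv.reader(csv_file, skipinitialspace=True)
    header = next(reader)
    wide = header.index(wide_column)
    tail = len(header) - wide - 1  # number of columns after the wide one
    rows = []
    for row in reader:
        if not row:
            continue  # skip empty lines
        end = len(row) - tail
        # Glue the surplus pieces back together with the original ", ".
        merged = row[:wide] + [", ".join(row[wide:end])] + row[end:]
        rows.append(dict(zip(header, merged)))
    return rows


sample = io.StringIO(
    "ID,Name,University,Street,ZipCode,Country\n"
    "12,Jon Snow,U of Winterfell,Winterfell #45,60434,Westeros\n"
    "13,Steve Rogers,NYU,108, Chelsea St.,23333,United States\n"
    "34,Tyrion Lannister,U of Casterly Rock,Kings Landing #89, 43543,Westeros\n"
)
rows = read_rows(sample)
for r in rows:
    print(r)
```

Because skipinitialspace=True strips the space after each comma, the " 43543" zip code in the last row is read cleanly as well.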
schoolname|category|gender|medium_of_inst|address|area|pincode|landmark
----------+----------+----------+----------+----------+----------+------
qqq|qqq|qq|aa|asd|wer|asd|wert
www|fgh|qq|aa|sg|wer|asd|wert
eee|fxg|qq|aa|axcvsd|wer|asd|wert
How can I remove the second line, split on "|", and convert it to JSON?
Try the code below. First split the string on \n, drop the heading line and the ----------+ line, and then split each remaining line on |. Hope this will help.
import json
strings = '''schoolname|category|gender|medium_of_inst|address|area|pincode|landmark
----------+----------+----------+----------+----------+----------+------
qqq|qqq|qq|aa|asd|wer|asd|wert
www|fgh|qq|aa|sg|wer|asd|wert
eee|fxg|qq|aa|axcvsd|wer|asd|wert'''
json_file_name = 'test.json'
strings = strings.split('\n')  # Split the string by newline \n
del strings[0]  # Remove the heading columns
del strings[0]  # Remove the line that starts with ----------+
data = []
try:
    for string in strings:
        row = string.split('|')  # Split the line into its columns
        row_data = {}
        row_data['schoolname'] = row[0]
        row_data['category'] = row[1]
        row_data['gender'] = row[2]
        row_data['medium_of_inst'] = row[3]
        row_data['address'] = row[4]
        row_data['area'] = row[5]
        row_data['pincode'] = row[6]
        row_data['landmark'] = row[7]
        data.append(row_data)
    # Write all rows to the JSON file
    with open(json_file_name, 'w') as outfile:
        json.dump(data, outfile)
    # Use the below to read the file
    with open(json_file_name) as file_object:
        # store file data in object
        data = json.load(file_object)
    print(data)
except Exception as e:
    print("Type error: " + str(e))
Output
[
{
'schoolname': 'qqq',
'category': 'qqq',
'gender': 'qq',
'medium_of_inst': 'aa',
'address': 'asd',
'area': 'wer',
'pincode': 'asd',
'landmark': 'wert'
},
{
'schoolname': 'www',
'category': 'fgh',
'gender': 'qq',
'medium_of_inst': 'aa',
'address': 'sg',
'area': 'wer',
'pincode': 'asd',
'landmark': 'wert'
},
{
'schoolname': 'eee',
'category': 'fxg',
'gender': 'qq',
'medium_of_inst': 'aa',
'address': 'axcvsd',
'area': 'wer',
'pincode': 'asd',
'landmark': 'wert'
}
]
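The column names do not have to be typed out by hand, since they are already in the first line of the table. A shorter sketch of the same idea, assuming the separator line consists only of - and + characters; table_to_records is a made-up name and the sample is cut down to two data rows.

```python
import json


def table_to_records(text):
    """Turn a pipe-separated table with a header line into a list of dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split("|")
    records = []
    for line in lines[1:]:
        if set(line.strip()) <= set("-+"):
            continue  # skip the ----------+ separator line
        records.append(dict(zip(header, line.split("|"))))
    return records


table = """schoolname|category|gender|medium_of_inst|address|area|pincode|landmark
----------+----------+----------+----------+----------+----------+------
qqq|qqq|qq|aa|asd|wer|asd|wert
www|fgh|qq|aa|sg|wer|asd|wert"""

records = table_to_records(table)
json_text = json.dumps(records, indent=2)
print(json_text)
```

json.dumps returns the JSON as a string; use json.dump with an open file if you want it written to disk as in the answer above.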
Here is a snippet that should help you:
import pandas as pd
# skiprows=2 drops the heading line and the ----------+ line
df = pd.read_csv('test_old.csv', skiprows=2,
                 names=['schoolname', 'category', 'gender', 'medium_of_inst',
                        'address', 'area', 'pincode', 'landmark'],
                 sep='|', engine='python')
# orient='records' gives a JSON array with one object per row
json_output = df.to_json(orient='records')
print(json_output)
Output :
[
{
"schoolname":"qqq",
"category":"qqq",
"gender":"qq",
"medium_of_inst":"aa",
"address":"asd",
"area":"wer",
"pincode":"asd",
"landmark":"wert"
},
{
"schoolname":"www",
"category":"fgh",
"gender":"qq",
"medium_of_inst":"aa",
"address":"sg",
"area":"wer",
"pincode":"asd",
"landmark":"wert"
},
{
"schoolname":"eee",
"category":"fxg",
"gender":"qq",
"medium_of_inst":"aa",
"address":"axcvsd",
"area":"wer",
"pincode":"asd",
"landmark":"wert"
}
]
import csv
dict1 = {"retrievalid": "status", "retrieval": "status"}
dict2 = {'DOB': '05/11/1961', 'DoctorLast': 'peter', 'PatientFirstname': 'ramasamy', 'DoctorFirst': 'kesava', 'PhNo': '734 535 343', 'PatientLastname': 'kum', 'Location': 'blr', 'retID': 'j233 1331', 'SiteId': '1234143', 'FaxNo': 'NULL', 'COMPLETED_DATE': '2015-02-13 09:06:27.000', 'retID_WH': 'f234343', 'STATUS': 'DELIVERED'}
with open('karthik.csv', 'w') as output:
    writer = csv.writer(output)
    for key, value in dict1.items():
        writer.writerow(dict1["retrievalid"])
Expected column
Status
Delivered
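The question does not spell out how dict1 and dict2 relate, so the following is only a guess at the intent: write a single Status column whose value comes from dict2['STATUS']. The capitalisation is assumed from the expected output, write_status is a made-up name, and dict2 is cut down to two keys. Note that writerow expects a list of fields; passing a bare string, as in the code above, writes one column per character.

```python
import csv
import io

# Cut-down version of dict2 from the question.
dict2 = {"retID": "j233 1331", "STATUS": "DELIVERED"}


def write_status(out, record):
    """Write a one-column CSV: a 'Status' header, then the record's status."""
    writer = csv.writer(out)
    writer.writerow(["Status"])  # a list with one field, not a bare string
    writer.writerow([record["STATUS"].capitalize()])


buffer = io.StringIO(newline="")
write_status(buffer, dict2)
lines = buffer.getvalue().splitlines()
print(lines)
```

To write to karthik.csv instead of an in-memory buffer, open the file with open('karthik.csv', 'w', newline='') and pass it as out.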
I want to combine longitude and latitude into
{latlon: '40.333333,-79.34343'}
The entire JSON is in the variable data = jsonData.
I also want to remove the original key-value pairs.
{
'locale': 'en_US',
'timezone': '-7',
'id': '13',
'agerangemin': '21',
'verified': 'true',
'coverimageurl': 'scontent.xx.fbcdn/t31.0-0/p480x480/13063482_1183967848280764_1411489384515766669_o.jpg',
'tagline': 'Veggien',
'lastupdated': '1462341401',
'fbupdated_time': '2016-03-30T00:38:48+0000',
'lname': 'Kulkarni',
'fname': 'Nikhil',
'email': 'nikhilhk.usa#gmail.com',
'latitude': '40.333333',
'longitude': '-79.34343',
'displayname': 'Nikhil Kulkarni',
'fbprofileid': '1121344884543061',
'profileimageurl': 'scontent.xx.fbcdn/hprofile-xft1/v/t1.0-1/p100x100/10423743_952350738109144_964810479230145631_n.jpg?oh=71f7e953dbbf8e2f1d9f22418f7888b2&oe=579F4A36',
'link': 'facebook/app_scoped_user_id/1121344884543061/',
'diet': 'Vegetarian',
'dietsinceyear': '1966',
'gender': 'M',
'vegstory': '',
'shortdescription': 'Just like that',
'categoryids': '',
'reasonforveg': 'Religious'
}
data['latlon'] = data['latitude'] + ',' + data['longitude']
del data['latitude']
del data['longitude']
This can be done in one line.
>>> dic = {'latitude': '40.333333', 'longitude': '-79.34343'}
>>>
>>> dic['latlon'] = "{0},{1}".format(dic.pop('latitude'),dic.pop('longitude'))
>>> dic
{'latlon': '40.333333,-79.34343'}
dic.pop(key) removes the key from the dictionary and returns its value, which is why the original keys are gone afterwards.
>>> json_data['latlon'] = ','.join(json_data[k] for k in ('latitude', 'longitude'))
>>> json_data['latlon']
'40.333333,-79.34343'
Note that this will preserve the original key-value pair.
UPDATE:
If you want to remove the original key-value pairs, use the pop method:
>>> json_data['latlon'] = ','.join(json_data.pop(k) for k in ('latitude', 'longitude'))
>>> json_data['latlon']
'40.333333,-79.34343'
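Wrapped in a small helper, the pop-based version might look like the sketch below. combine_lat_lon is a made-up name, and the str() call is an addition that assumes the coordinates could also arrive as numbers rather than strings.

```python
def combine_lat_lon(record, key="latlon"):
    """Replace 'latitude' and 'longitude' with a single 'lat,lon' string."""
    # pop() returns the value and removes the original key in one step.
    record[key] = ",".join(str(record.pop(k)) for k in ("latitude", "longitude"))
    return record


data = {"id": "13", "latitude": "40.333333", "longitude": "-79.34343"}
combine_lat_lon(data)
print(data)
```

If either key is missing, pop raises a KeyError, and the latitude may already have been removed by then, so check for both keys first if the input is not guaranteed to have them.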
When I print these strings
print query, netinfo
I get the output below, which is fine. How do I take these strings and put them into a single row of a CSV file?
8.8.8.8 [{'updated': '2012-02-24T00:00:00', 'handle': 'NET-8-0-0-0-1', 'description': 'Level 3 Communications, Inc.', 'tech_emails': 'ipaddressing#level3.com', 'abuse_emails': 'abuse#level3.com', 'postal_code': '80021', 'address': '1025 Eldorado Blvd.', 'cidr': '8.0.0.0/8', 'city': 'Broomfield', 'name': 'LVLT-ORG-8-8', 'created': '1992-12-01T00:00:00', 'country': 'US', 'state': 'CO', 'range': '8.0.0.0 - 8.255.255.255', 'misc_emails': None}, {'updated': '2014-03-14T00:00:00', 'handle': 'NET-8-8-8-0-1', 'description': 'Google Inc.', 'tech_emails': 'arin-contact#google.com', 'abuse_emails': 'arin-contact#google.com', 'postal_code': '94043', 'address': '1600 Amphitheatre Parkway', 'cidr': '8.8.8.0/24', 'city': 'Mountain View', 'name': 'LVLT-GOGL-8-8-8', 'created': '2014-03-14T00:00:00', 'country': 'US', 'state': 'CA', 'range': None, 'misc_emails': None}]
I have tried cobbling this together, but it's all jacked up. I could use some help on how to use the csv module.
writer=csv.writer(open('dict.csv', 'ab'))
for key in query:
    writer.writerow(query)
You can put your variables in a tuple and write it to the CSV file as one row (Python 2):
import csv
with open('ex.csv', 'wb') as csvfile:
    writer = csv.writer(csvfile, delimiter=' ')
    writer.writerow((query, netinfo))
Note: if you are on Python 3, use the following code instead:
import csv
with open('ex.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile, delimiter=' ')
    writer.writerow((query, netinfo))
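With the code above, netinfo is written as the repr of a Python list, which is awkward to read back. One possible variation, as a sketch only: serialise netinfo with json.dumps so the row still has two fields but the second one can be restored with json.loads. write_query_row is a made-up name and the netinfo sample is heavily shortened.

```python
import csv
import io
import json


def write_query_row(out, query, netinfo):
    """Write the query and the JSON-encoded netinfo as a single CSV row."""
    writer = csv.writer(out)
    writer.writerow([query, json.dumps(netinfo)])


query = "8.8.8.8"
netinfo = [{"handle": "NET-8-0-0-0-1", "cidr": "8.0.0.0/8", "range": None}]

buffer = io.StringIO(newline="")
write_query_row(buffer, query, netinfo)

# Read the row back to check that nothing was lost.
buffer.seek(0)
row = next(csv.reader(buffer))
restored = json.loads(row[1])
print(row[0], restored)
```

The csv module quotes the second field automatically because it contains commas and double quotes, so the default comma delimiter is safe here.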