How to read txt file data and convert into nested dictionary? - python

I have this txt file, but I'm having trouble converting it into a nested dictionary in Python. The file contains only the values for each pokemon and is missing the keys such as 'quantity' or 'fee'. Below is the content of the txt file. (I can change the txt file if needed.)
charmander,3,100,fire
squirtle,2,50,water
bulbasaur,5,25,grass
gyrados,1,1000,water flying
This is my desired dictionary:
pokemon = {
'charmander':{'quantity':3,'fee':100,'powers':['fire']},
'squirtle':{'quantity':2,'fee':50,'powers':['water']},
'bulbasaur':{'quantity':5,'fee':25,'powers':['grass']},
'gyrados':{'quantity':1,'fee':1000,'powers':['water','flying']}
}

Read the text file into lines, then split each line on the "," delimiter. For powers, split the string again on whitespace. Then package each extracted piece of information into your dict structure as below.
with open('pokemonInfo.txt') as f:
    data = f.readlines()
pokemon = {}  # don't call this "dict": that would shadow the built-in type
for r in data:
    fields = r.strip().split(",")
    pName = fields[0]
    qty = int(fields[1])  # convert to int to match the desired dictionary
    fee = int(fields[2])
    powers = fields[3]
    pokemon[pName] = {"quantity": qty, "fee": fee, "powers": powers.split()}
for record in pokemon.items():
    print(record)
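The steps above can also be wrapped in a small function. This is only a sketch: it assumes every non-blank line has exactly four comma-separated fields, and the function name `parse_pokemon` and the inline sample lines are made up for illustration (with a real file, you would pass the open file object instead).

```python
def parse_pokemon(lines):
    """Build {name: {'quantity': int, 'fee': int, 'powers': [str, ...]}} from lines."""
    result = {}
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        name, quantity, fee, powers = line.split(",")
        result[name] = {
            "quantity": int(quantity),
            "fee": int(fee),
            # split on whitespace: "water flying" -> ["water", "flying"]
            "powers": powers.split(),
        }
    return result


# Two sample lines copied from the question.
sample = ["charmander,3,100,fire\n", "gyrados,1,1000,water flying\n"]
pokemon = parse_pokemon(sample)
print(pokemon["gyrados"])
```

Unpacking into four names makes a malformed line fail loudly with a ValueError instead of silently producing a partial record.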

Related

Program should read from a file and returns a dictionary but returning a type error

The dataset looks like this-
Action|10|Golden Tree (2012)
Drama|3|Titanic (1967)
So it is Genre|SerialNo|Movie
Required output is-
{ "Toy Story (1995)" : "Adventure", "Golden Tree (2012)" : "Action" }
Currently, the only output generated is "Action". I tried to write some code to fix it, but it raises a type error. How do I fix this?
from collections import defaultdict
def read_genre_data(file):
    movie_genre_dict = {}
    ratings = defaultdict(list)
    for line in open(file):
        genre, num, movie = line.split('|')
        #movie[genre].append(movie)
    return genre
readGenre = read_genre_data("genreMovieSample.txt")
print(readGenre)
You need to add each entry to the dictionary and then return the dictionary. As written, you're just returning the value of genre from the last line of the file.
def read_genre_data(file):
    movie_genre_dict = {}
    with open(file) as f:
        for line in f:
            # strip the trailing newline so it doesn't end up in the movie title
            genre, num, movie = line.strip().split('|')
            movie_genre_dict[movie] = genre
    return movie_genre_dict
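The same idea can be tried without a file on disk. The sketch below is illustrative only: `read_genre_lines` is a hypothetical helper that accepts any iterable of lines, and the two sample lines are the ones shown in the question.

```python
def read_genre_lines(lines):
    """Map each movie title to its genre from 'Genre|SerialNo|Movie' lines."""
    movie_genre = {}
    for line in lines:
        line = line.strip()
        if not line:  # ignore blank lines
            continue
        # maxsplit=2 keeps any further '|' characters inside the movie title
        genre, _serial, movie = line.split("|", 2)
        movie_genre[movie] = genre
    return movie_genre


sample = ["Action|10|Golden Tree (2012)\n", "Drama|3|Titanic (1967)\n"]
genres = read_genre_lines(sample)
print(genres)
```

Because an open file is itself an iterable of lines, the same function should also work with `with open(path) as f: read_genre_lines(f)`.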

How to split a text file into a nested array?

I'm working on a project creating a Python Flask website that stores user logins in a text file. I have a text file where each line is one user, and each user has 7 parameters stored on the line. All user parameters are separated by a ; character.
Parameters are:
username
password
first name
last name
background color
title
avatar
Sample of the text file:
joebob;pass1;joe;bob;yellow;My title!!;https://upload.wikimedia.org/wikipedia/commons/c/cd/Stick_Figure.jpg
richlong;pass2;rich;long;blue;My title2!!;https://www.iconspng.com/images/stick-figure-walking/stick-figure-walking.jpg
How do I go about storing the parameters in a Python array, and how do I access them later when I need to reference logins?
Here is what I wrote so far:
accounts = { }
def readAccounts():
    file = open("assignment11-account-info.txt", "r")
    for accounts in file: #line
        tmp = accounts.split(';')
        for data in tmp: #data in line
            accounts[data[0]] = {
                'user': data[0],
                'pass': data[1],
                'first': data[2],
                'last': data[3],
                'color': data[4],
                'title': data[5],
                'avatar': data[6].rstrip()
            }
    file.close()
You can use Python's built-in csv module to parse the file:
import csv
with open("assignment11-account-info.txt", "r") as file:
    reader = csv.reader(file, delimiter=';')
    result = []
    for row in reader:
        fields = ('user', 'passwd', 'first', 'last', 'color','title','avatar')
        res = dict(zip(fields, row))
        result.append(res)
Or, equivalently, with a list comprehension, which is more Pythonic but harder for a beginner to read:
with open("assignment11-account-info.txt", "r") as file:
    reader = csv.reader(file, delimiter=';')
    fields = ('user', 'passwd', 'first', 'last', 'color','title','avatar')
    result = [ dict(zip(fields, row)) for row in reader ]
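To look up a login by username later, one option is to key the records by user instead of collecting them in a list. The sketch below uses `csv.DictReader` with explicit `fieldnames`. The helper name `load_accounts` is an assumption for illustration, and the sample lines are the two from the question, wrapped in `io.StringIO` so the example runs without a file.

```python
import csv
import io

FIELDS = ("user", "passwd", "first", "last", "color", "title", "avatar")


def load_accounts(fileobj):
    """Return {username: record_dict} from a ';'-separated file object."""
    reader = csv.DictReader(fileobj, fieldnames=FIELDS, delimiter=";")
    return {row["user"]: row for row in reader}


sample = io.StringIO(
    "joebob;pass1;joe;bob;yellow;My title!!;"
    "https://upload.wikimedia.org/wikipedia/commons/c/cd/Stick_Figure.jpg\n"
    "richlong;pass2;rich;long;blue;My title2!!;"
    "https://www.iconspng.com/images/stick-figure-walking/stick-figure-walking.jpg\n"
)
accounts = load_accounts(sample)
print(accounts["joebob"]["color"])
```

With a real file, `load_accounts(open_file)` would be called inside a `with open(...)` block. Note that a repeated username would overwrite the earlier record in this layout.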
Here's what I might do:
accounts = {}
with open("assignment11-account-info.txt", "r") as file:
    for line in file:
        fields = line.rstrip().split(";")
        user = fields[0]
        password = fields[1]  # "pass" is a reserved keyword and can't be a variable name
        first = fields[2]
        last = fields[3]
        color = fields[4]
        title = fields[5]
        avatar = fields[6]
        accounts[user] = {
            "user" : user,
            "pass" : password,
            "first" : first,
            "last" : last,
            "color" : color,
            "title" : title,
            "avatar" : avatar
        }
By using with, the file handle (file) is closed for you automatically. This is the idiomatic, Pythonic way of working with files.
So long as user is unique, you won't overwrite any entries as you read through the file assignment11-account-info.txt.
If you need to handle the case where user is repeated in the file, then each dictionary value needs to be a list ([...]) of records rather than a single record ({...}). Otherwise, reusing the value of user overwrites any previous entry you added to accounts under that key. Silently overwriting existing entries is almost always a bad thing when using dictionaries!
If that is the case, I might do the following:
accounts = {}
with open("assignment11-account-info.txt", "r") as file:
    for line in file:
        fields = line.rstrip().split(";")
        user = fields[0]
        password = fields[1]  # "pass" is a reserved keyword and can't be a variable name
        first = fields[2]
        last = fields[3]
        color = fields[4]
        title = fields[5]
        avatar = fields[6]
        if user not in accounts:
            accounts[user] = []
        accounts[user].append({
            "user" : user,
            "pass" : password,
            "first" : first,
            "last" : last,
            "color" : color,
            "title" : title,
            "avatar" : avatar
        })
In this way, you preserve any cases where user is duplicated.
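The "create the list if the key is missing" step can also be handled by `collections.defaultdict`. The following is a minimal sketch, not the answer's exact code: `group_accounts` is a hypothetical helper, and the sample lines (including the duplicated user) are invented for illustration.

```python
from collections import defaultdict

FIELDS = ("user", "pass", "first", "last", "color", "title", "avatar")


def group_accounts(lines):
    """Return {username: [record, ...]} so repeated users are all kept."""
    accounts = defaultdict(list)  # missing keys start as an empty list
    for line in lines:
        values = line.rstrip("\n").split(";")
        record = dict(zip(FIELDS, values))
        accounts[record["user"]].append(record)
    return accounts


# Made-up sample data: "joebob" appears twice on purpose.
sample = [
    "joebob;pass1;joe;bob;yellow;title one;avatar-one\n",
    "richlong;pass2;rich;long;blue;title two;avatar-two\n",
    "joebob;pass3;joe;bob;green;title three;avatar-three\n",
]
grouped = group_accounts(sample)
print(len(grouped["joebob"]))
```

Here `"pass"` is fine as a dictionary key because it is a string; only using `pass` as a variable name is a syntax error.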

How to merge three files with common id in pandas?

I have three files which are users.dat, ratings.dat and movies.dat.
users.dat
1::F::1::10::48067
1::F::1::10::48067
1::F::1::10::48067
1::F::1::10::48067
1::F::1::10::48067
1::F::1::10::48067
1::F::1::10::48067
1::F::1::10::48067
ratings.dat
1::1193::5::978300760
1::661::3::978302109
1::914::3::978301968
1::3408::4::978300275
1::2355::5::978824291
1::1197::3::978302268
1::1287::5::978302039
1::2804::5::978300719
movies.dat
1193::One Flew Over the Cuckoo's Nest (1975)::Drama
661::James and the Giant Peach (1996)::Animation|Children's|Musical
914::My Fair Lady (1964)::Musical|Romance
3408::Erin Brockovich (2000)::Drama
2355::Bug's Life, A (1998)::Animation|Children's|Comedy
1197::Princess Bride, The (1987)::Action|Adventure|Comedy|Romance
1287::Ben-Hur (1959)::Action|Adventure|Drama
2804::Christmas Story, A (1983)::Comedy|Drama
My expected output
1::1193::5::978300760::F::1::10::48067::One Flew Over the Cuckoo's Nest::Drama::1975
1::661::3::978302109::F::1::10::48067::James and the Giant Peach::Animation|Children's|Musical::1996
1::914::3::978301968::F::1::10::48067::My Fair Lady ::Musical|Romance::1964
1::3408::4::978300275::F::1::10::48067::Erin Brockovich ::Drama::2000
1::2355::5::978824291::F::1::10::48067::Bug's Life, A ::Animation|Children's|Comedy::1998
I am trying to merge these files without using pandas. I created three dictionaries, with the user ID as the common key. Then I tried to merge the three files using the user keys, but the result is not exactly what I want. Any advice and suggestions will be greatly appreciated.
My code
import json
file = open("users.dat","r",encoding = 'utf-8')
users={}
for line in file:
    x = line.split('::')
    user_id=x[0]
    gender=x[1]
    age=x[2]
    occupation=x[3]
    i_zip=x[4]
    users[user_id]=gender,age,occupation,i_zip.strip()
file = open("movies.dat","r",encoding='latin-1')
movies={}
for line in file:
    x = line.split('::')
    movie_id=x[0]
    title=x[1]
    genre=x[2]
    movies[movie_id]=title,genre.strip()
file = open("ratings.dat","r")
ratings={}
for line in file:
    x = line.split('::')
    a=x[0]
    b=x[1]
    c=x[2]
    d=x[3]
    ratings[a]=b,c,d.strip()
newdict = {}
newdict.update(users)
newdict.update(movies)
newdict.update(ratings)
for i in users.keys():
    addition = users[i] + movies[i]+ratings[i]
    newdict[i] = addition
with open('data.txt', 'w') as outfile:
    json.dump(newdict, outfile)
My output looks like this:
{"1": ["F", "1", "10", "48067", "Toy Story (1995)", "Animation|Children's|Comedy", "1246", "4", "978302091"], "2": ["M", "56", "16", "70072", "Jumanji (1995)", "Adventure|Children's|Fantasy", "1247", "5", "978298652"],
The first mistake in your code (apart from the messed-up indentation) is that you build the ratings dictionary with the user ID as the key:
ratings[a]=b,c,d.strip()
For your dataset, the ratings dictionary will end up as { '1': ('2804', '5', '978300719') }. All but one rating is lost, because you have only one user and each new rating overwrites the previous one.
What you want to do instead is treat your ratings data as a list, not a dictionary. The result you are trying to achieve is also an extended version of the ratings, because you will end up with as many rows as you have scores.
Secondly, you don't need the json module, since your desired output is not in JSON format.
Here's code that does the job:
#!/usr/bin/env python3
# Part 1: collect data from the files
users = {}
with open("users.dat","r",encoding = 'utf-8') as file:
    for line in file:
        user_id, gender, age, occupation, i_zip = line.rstrip().split('::')
        users[user_id] = (gender, age, occupation, i_zip)
movies={}
with open("movies.dat","r",encoding='latin-1') as file:
    for line in file:
        movie_id, title, genre = line.rstrip().split('::')
        # Parse year from title
        title = title.rstrip()
        year = 'N/A'
        if title.endswith(')') and '(' in title:
            short_title, in_parenthesis = title.rsplit('(', 1)
            in_parenthesis = in_parenthesis.rstrip(')').rstrip()
            if in_parenthesis.isdigit() and len(in_parenthesis)==4:
                # Text in parenthesis has four digits - it must be year
                title = short_title.rstrip()
                year = in_parenthesis
        movies[movie_id] = (title, genre, year)
ratings=[]
with open("ratings.dat","r") as file:
    for line in file:
        user_id, movie_id, score, dt = line.rstrip().split('::')
        ratings.append((user_id, movie_id, score, dt))
# Part 2: save the output
with open('output.dat','w',encoding='utf-8') as file:
    for user_id, movie_id, score, dt in ratings:
        # Get user data from dictionary
        gender, age, occupation, i_zip = users[user_id]
        # Get movie data from dictionary
        title, genre, year = movies[movie_id]
        # Merge data into a single string
        row = '::'.join([user_id, movie_id, score, dt,
                         gender, age, occupation, i_zip,
                         title, genre, year])
        # Write to the file
        file.write(row + '\n')
Part 1 is based on your code, with the main differences that I save the ratings to a list (not dictionary) and that I added parsing of years.
Part 2 is where the output is being saved.
Contents of output.dat file after running the script:
1::1193::5::978300760::F::1::10::48067::One Flew Over the Cuckoo's Nest::Drama::1975
1::661::3::978302109::F::1::10::48067::James and the Giant Peach::Animation|Children's|Musical::1996
1::914::3::978301968::F::1::10::48067::My Fair Lady::Musical|Romance::1964
1::3408::4::978300275::F::1::10::48067::Erin Brockovich::Drama::2000
1::2355::5::978824291::F::1::10::48067::Bug's Life, A::Animation|Children's|Comedy::1998
1::1197::3::978302268::F::1::10::48067::Princess Bride, The::Action|Adventure|Comedy|Romance::1987
1::1287::5::978302039::F::1::10::48067::Ben-Hur::Action|Adventure|Drama::1959
1::2804::5::978300719::F::1::10::48067::Christmas Story, A::Comedy|Drama::1983
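The year parsing in Part 1 can also be written with a regular expression. This is an alternative sketch rather than the answer's method: `split_title_year` and the pattern are assumptions for illustration, and like the original they treat a trailing four-digit number in parentheses as the year.

```python
import re

# Title text, optional spaces, then "(YYYY)" at the very end of the string.
_YEAR_SUFFIX = re.compile(r"^(.*?)\s*\((\d{4})\)\s*$")


def split_title_year(title):
    """Return (title_without_year, year), or (title, 'N/A') if no year is found."""
    match = _YEAR_SUFFIX.match(title)
    if match:
        return match.group(1), match.group(2)
    return title.strip(), "N/A"


print(split_title_year("Bug's Life, A (1998)"))
print(split_title_year("Untitled"))
```

Titles that merely contain parentheses, without a four-digit number at the end, fall through to the `'N/A'` branch unchanged.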

save two list in one json file

I'm getting data in two lists, and I want to save both of them in a single JSON file. Can someone help me?
I'm using Selenium.
def get_name(self):
    name = []
    name = self.find_elements_by_class_name ('item-desc')
    price = []
    price = self.find_elements_by_class_name ('item-goodPrice')
    for names in name :
        names = (names.text)
        #print names
    for prices in price :
        prices = (prices.text)
        #print price
I would create a dictionary and then dump it as JSON.
An example could be:
import json
def get_name(self):
    names = [ name.text for name in self.find_elements_by_class_name('item-desc') ]
    prices = [ price.text for price in self.find_elements_by_class_name('item-goodPrice')]
    with open('output-file-name.json', 'w') as f:
        json.dump({'names': names, 'prices': prices}, f)
EDIT: The first version of this answer only created the JSON string. If you want to write it to a file as well, include what @Andersson suggested in the comments.
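If each name belongs to the price at the same position, another layout is one JSON record per item. The sketch below assumes the two lists are already plain strings (for example, extracted with `.text` as above). The helper `save_items`, the sample values, and the temporary output path are all made up for illustration.

```python
import json
import os
import tempfile


def save_items(names, prices, path):
    """Pair names with prices by position and write them as a JSON list."""
    # zip stops at the shorter list, so unmatched trailing entries are dropped
    items = [{"name": n, "price": p} for n, p in zip(names, prices)]
    with open(path, "w", encoding="utf-8") as f:
        json.dump(items, f, ensure_ascii=False, indent=2)
    return items


# Invented sample data standing in for the scraped element texts.
names = ["item A", "item B"]
prices = ["10.00", "12.50"]
path = os.path.join(tempfile.mkdtemp(), "items.json")
items = save_items(names, prices, path)

with open(path, encoding="utf-8") as f:
    print(json.load(f))
```

Whether to store two parallel lists or a list of records depends on how the file is read later. Records keep each name and price together, which makes it harder for the two to drift out of step.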

In Python, trying to convert geocoded tsv file into geojson format

I'm trying to convert a geocoded TSV file into JSON format, but I'm having trouble with it. Here's the code:
import geojson
import csv
def create_map(datafile):
    geo_map = {"type":"FeatureCollection"}
    item_list = []
    datablock = list(csv.reader(datafile))
    for i, line in enumerate(datablock):
        data = {}
        data['type'] = 'Feature'
        data['id'] = i
        data['properties']={'title': line['Movie Title'],
                            'description': line['Amenities'],
                            'date': line['Date']}
        data['name'] = {line['Location']}
        data['geometry'] = {'type':'Point',
                            'coordinates':(line['Lat'], line['Lng'])}
        item_list.append(data)
    for point in item_list:
        geo_map.setdefault('features', []).append(point)
    with open("thedamngeojson.geojson", 'w') as f:
        f.write(geojson.dumps(geo_map))
create_map('MovieParksGeocode2.tsv')
I'm getting "TypeError: list indices must be integers, not str" on the data['properties'] line, but I don't understand why. Isn't that how I set values for the GeoJSON fields?
The file I'm reading from has values under these keys: Location Movie Title Date Amenities Lat Lng
The file is viewable here: https://github.com/yongcho822/Movies-in-the-park/blob/master/MovieParksGeocodeTest.tsv
Thanks guys, much appreciated as always.
You have a couple of things going on here that need to be fixed.
1. Your TSV contains newlines inside double-quoted fields. I don't think this is intended, and it will cause some problems.
Location Movie Title Date Amenities Formatted_Address Lat Lng
"
Edgebrook Park, Chicago " A League of Their Own 7-Jun "
Family friendly activities and games. Also: crying is allowed." Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA 41.9998876 -87.7627672
"
2. You don't need the geojson module to dump out JSON, which is all GeoJSON is. Just import json instead.
3. You are trying to read a TSV, but you don't pass the delimiter='\t' option that is needed for that.
4. You are trying to read keys off the rows, but you aren't using DictReader, which does that for you. Hence the TypeError about indices you mention above.
Check out my revised code block below. You still need to fix your TSV to be a valid TSV.
import csv
import json
def create_map(datafile):
    geo_map = {"type":"FeatureCollection"}
    item_list = []
    with open(datafile,'r') as tsvfile:
        reader = csv.DictReader(tsvfile,delimiter='\t')
        for i, line in enumerate(reader):
            print(line)
            data = {}
            data['type'] = 'Feature'
            data['id'] = i
            data['properties']={'title': line['Movie Title'],
                                'description': line['Amenities'],
                                'date': line['Date']}
            # a plain string: a set ({...}) is not JSON serializable
            data['name'] = line['Location']
            # GeoJSON positions are numbers in [longitude, latitude] order
            data['geometry'] = {'type':'Point',
                                'coordinates':[float(line['Lng']), float(line['Lat'])]}
            item_list.append(data)
    for point in item_list:
        geo_map.setdefault('features', []).append(point)
    with open("thedamngeojson.geojson", 'w') as f:
        f.write(json.dumps(geo_map))
create_map('MovieParksGeocode2.tsv')
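The row-to-feature step can be tried in isolation on an in-memory TSV. This is a sketch under stated assumptions: `row_to_feature` is a hypothetical helper, the single data row is a cleaned-up version of the Edgebrook Park sample above with the stray quotes and newlines removed, and the location is stored under `properties` as one possible choice.

```python
import csv
import io
import json


def row_to_feature(index, row):
    """Turn one DictReader row into a GeoJSON Feature dict."""
    return {
        "type": "Feature",
        "id": index,
        "properties": {
            "title": row["Movie Title"],
            "description": row["Amenities"],
            "date": row["Date"],
            "name": row["Location"],
        },
        "geometry": {
            "type": "Point",
            # GeoJSON positions are [longitude, latitude], as numbers
            "coordinates": [float(row["Lng"]), float(row["Lat"])],
        },
    }


sample = io.StringIO(
    "Location\tMovie Title\tDate\tAmenities\tLat\tLng\n"
    "Edgebrook Park, Chicago\tA League of Their Own\t7-Jun\t"
    "Family friendly activities and games.\t41.9998876\t-87.7627672\n"
)
reader = csv.DictReader(sample, delimiter="\t")
collection = {
    "type": "FeatureCollection",
    "features": [row_to_feature(i, row) for i, row in enumerate(reader)],
}
print(json.dumps(collection, indent=2))
```

Converting the coordinates with `float` raises a ValueError on a malformed row, which tends to surface TSV problems earlier than writing strings into the output would.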
