Turning a CSV file into a dictionary - python

I am trying to build a dictionary from the values in my CSV file, using 'Genre' as the key and a list of tuples of name, publisher, and platform as the value:
D1 = { genre: [(name, publisher, platform), ...], ... }
My Code:
import csv
fp = open('video_game_sales_tiny.csv','r')
fp.readline()
data_reader = csv.reader(fp)
D1 = {}
for line in data_reader:
    name = line[0].lower()
    platform = line[1]
    year = line[2]
    genre = line[3].lower()
    publisher = line[4].lower()
    D1[genre] = [name, publisher, platform, year]
Several rows share the same genre, and when the loop reaches a genre that is already a key, it overwrites the existing value instead of adding a tuple to the list.
I am trying to make the dictionary look like:
D1 = { Puzzle: [ (Pac-man, Atari, 2600, 1982), (BurgerTime, Mattel Interactive, 2600, 1981), (Q*bert, Parker Bros, 2600, 1982) ], Shooter: [ (), (), () ], Action: [ (), (), () ] }

You need to make the values of your dictionary lists of tuples. Then you can append new tuples instead of overwriting the existing value. Here is an example using your code:
for line in data_reader:
    name = line[0].lower()
    platform = line[1]
    year = line[2]
    genre = line[3].lower()
    publisher = line[4].lower()
    if genre in D1:
        D1[genre].append((name, publisher, platform, year))
    else:
        D1[genre] = [(name, publisher, platform, year)]

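A slightly shorter variant of the same idea, sketched under a few assumptions: it uses collections.defaultdict so that a genre seen for the first time starts as an empty list, skips the header with next() instead of fp.readline(), and reads the first five columns in the order your code uses (name, platform, year, genre, publisher). The function name group_by_genre is hypothetical, and the two sample rows are made up from the expected output in the question; io.StringIO stands in for the real video_game_sales_tiny.csv.

```python
import csv
import io
from collections import defaultdict


def group_by_genre(csv_file):
    """Return {genre: [(name, publisher, platform, year), ...]} from an open CSV file."""
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row
    by_genre = defaultdict(list)  # a genre seen for the first time starts as []
    for row in reader:
        # Assumed column order: name, platform, year, genre, publisher (extra columns ignored)
        name, platform, year, genre, publisher = row[:5]
        by_genre[genre.lower()].append((name.lower(), publisher.lower(), platform, year))
    return dict(by_genre)


# Hypothetical sample data standing in for video_game_sales_tiny.csv
sample = io.StringIO(
    "Name,Platform,Year,Genre,Publisher\n"
    "Pac-man,2600,1982,Puzzle,Atari\n"
    "BurgerTime,2600,1981,Puzzle,Mattel Interactive\n"
)
D1 = group_by_genre(sample)
print(D1)
```

With the real file you would call it as: with open('video_game_sales_tiny.csv', newline='') as fp: D1 = group_by_genre(fp). The defaultdict removes the need for the "if genre in D1" check; dict.setdefault(genre, []).append(...) would work just as well.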

Related

Getting KeyError when trying to assign data from a dictionary into class and object in python

The data file used:
Austin = null|Stone Cold Austin|996003892|987045321|Ireland
keller = null|Mathew Keller|02/05/2002|0199999999|0203140819|019607892|9801 2828 5596 0889
The Nested Dictionary
data = {'Austin': {'Full Name': 'Stone Cold Steve Austin', 'Contact Details': '996003892', 'Emergency Contact Number': '987045321', 'Country': 'Ireland'}}
The class I want to assign the dict data to:
class member2:
    def __init__(self, realname, phone, emergencyContact, country):
        self.realname = realname
        self.phone = phone
        self.emergencyContact = emergencyContact
        self.country = country
Assigning text file data into a nested dictionary
with open("something.txt", 'r') as f:
    for line in f:
        key, values = line.strip().split(" = ")  # note the space around =, to avoid trailing space in key
        values = values.split('|')
        data2 = {key: dict(zip(keys, values[1:]))}
        # To assign data to the class (NOT WORKING)
        member2.realname = data2[values[2]]
        print(member2)
        if key == username:
            data2 = {key: dict(zip(keys, values[1:]))}
Output
member2.realname = data2[values[2]]
KeyError: 'Stone Cold Steve Austin'
You are referring to a key that does not exist: 'Stone Cold Steve Austin'.
Maybe you want to access something like data2[key][keys[0]]:
keys = ["Full Name", "Contact Details", "Emergency Contact Number", "Country"]
with open("we.txt", 'r') as f:
    for line in f:
        key, values = line.strip().split(" = ")  # note the space around =, to avoid trailing space in key
        values = values.split('|')
        data2 = {key: dict(zip(keys, values[1:]))}
        print(data2[key][keys[0]])
Output:
Stone Cold Austin
Mathew Keller
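To actually fill the class, you need to create an instance of member2 rather than assigning to member2.realname (which sets an attribute on the class itself). A minimal sketch, assuming each line has the layout of the Austin line (null, name, phone, emergency number, country); the keller line has more fields, so this mapping may not fit it. The helper load_members is hypothetical, and a one-line list stands in for the file.

```python
class member2:
    def __init__(self, realname, phone, emergencyContact, country):
        self.realname = realname
        self.phone = phone
        self.emergencyContact = emergencyContact
        self.country = country


keys = ["Full Name", "Contact Details", "Emergency Contact Number", "Country"]


def load_members(lines):
    """Build {key: member2 instance} from lines like 'Austin = null|Name|phone|...'."""
    members = {}
    for line in lines:
        key, values = line.strip().split(" = ")
        fields = dict(zip(keys, values.split("|")[1:]))  # drop the leading 'null'
        # Create an instance; assigning member2.realname would change the class instead
        members[key] = member2(
            realname=fields["Full Name"],
            phone=fields["Contact Details"],
            emergencyContact=fields["Emergency Contact Number"],
            country=fields["Country"],
        )
    return members


members = load_members(["Austin = null|Stone Cold Austin|996003892|987045321|Ireland"])
print(members["Austin"].realname)  # Stone Cold Austin
```

With the real file, pass the open file object instead of the list: with open("we.txt") as f: members = load_members(f).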

How to split a text file into a nested array?

I am working on a project creating a Python Flask website that stores user logins in a text file. Each line of the text file is one user, and each user has seven parameters stored on that line, separated by a ; character.
Parameters are:
username
password
first name
last name
background color
title
avatar
Sample of the text file:
joebob;pass1;joe;bob;yellow;My title!!;https://upload.wikimedia.org/wikipedia/commons/c/cd/Stick_Figure.jpg
richlong;pass2;rich;long;blue;My title2!!;https://www.iconspng.com/images/stick-figure-walking/stick-figure-walking.jpg
How do I go about storing the parameters in a Python array, and how do I access them later when I need to look up log-ins?
Here is what I wrote so far:
accounts = { }
def readAccounts():
    file = open("assignment11-account-info.txt", "r")
    for accounts in file: #line
        tmp = accounts.split(';')
        for data in tmp: #data in line
            accounts[data[0]] = {
                'user': data[0],
                'pass': data[1],
                'first': data[2],
                'last': data[3],
                'color': data[4],
                'title': data[5],
                'avatar': data[6].rstrip()
            }
    file.close()
You can use Python's built-in csv module to parse the file:
import csv
with open("assignment11-account-info.txt", "r") as file:
    reader = csv.reader(file, delimiter=';')
    result = []
    for row in reader:
        fields = ('user', 'passwd', 'first', 'last', 'color', 'title', 'avatar')
        res = dict(zip(fields, row))
        result.append(res)
Or, equivalently, with a list comprehension, which is more Pythonic but harder to read for a beginner:
with open("assignment11-account-info.txt", "r") as file:
    reader = csv.reader(file, delimiter=';')
    fields = ('user', 'passwd', 'first', 'last', 'color', 'title', 'avatar')
    result = [dict(zip(fields, row)) for row in reader]
Here's what I might do:
accounts = {}
with open("assignment11-account-info.txt", "r") as file:
    for line in file:
        fields = line.rstrip().split(";")
        user = fields[0]
        password = fields[1]  # 'pass' is a reserved keyword, so it can't be a variable name
        first = fields[2]
        last = fields[3]
        color = fields[4]
        title = fields[5]
        avatar = fields[6]
        accounts[user] = {
            "user" : user,
            "pass" : password,
            "first" : first,
            "last" : last,
            "color" : color,
            "title" : title,
            "avatar" : avatar
        }
By using a with statement, the file handle (file) is closed for you automatically. This is the most "Pythonic" way of doing things.
So long as user is unique, you won't overwrite any entries you put in as you read through the file assignment11-account-info.txt.
If you need to handle the case where user is repeated in the file assignment11-account-info.txt, then each value in accounts should be a list ([...]) of dictionaries rather than a single dictionary ({...}). This is because reusing the value of user as a key overwrites any previous entry for that user in accounts, and overwriting existing entries is almost never what you want when using dictionaries.
If that is the case, I might do the following:
accounts = {}
with open("assignment11-account-info.txt", "r") as file:
    for line in file:
        fields = line.rstrip().split(";")
        user = fields[0]
        password = fields[1]  # 'pass' is a reserved keyword, so it can't be a variable name
        first = fields[2]
        last = fields[3]
        color = fields[4]
        title = fields[5]
        avatar = fields[6]
        if user not in accounts:
            accounts[user] = []
        accounts[user].append({
            "user" : user,
            "pass" : password,
            "first" : first,
            "last" : last,
            "color" : color,
            "title" : title,
            "avatar" : avatar
        })
In this way, you preserve any cases where user is duplicated.
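For the second half of the question (accessing the data later when checking log-ins), here is a minimal sketch that keys the accounts by username, as in the answers above, and then looks one up. It swaps in csv.DictReader with explicit fieldnames, assuming the seven fields always appear in the order listed in the question and that no field contains a ; character. The names read_accounts and check_login are hypothetical, and io.StringIO holds the two sample lines from the question in place of the real file.

```python
import csv
import io

# Field order taken from the question's parameter list
FIELDS = ["user", "pass", "first", "last", "color", "title", "avatar"]


def read_accounts(f):
    """Return {username: {field: value}} from a ';'-separated account file."""
    reader = csv.DictReader(f, fieldnames=FIELDS, delimiter=";")
    return {row["user"]: row for row in reader}  # a repeated username keeps the last row


def check_login(accounts, username, password):
    """Hypothetical helper: True if the username exists and the password matches."""
    account = accounts.get(username)
    return account is not None and account["pass"] == password


sample = io.StringIO(
    "joebob;pass1;joe;bob;yellow;My title!!;"
    "https://upload.wikimedia.org/wikipedia/commons/c/cd/Stick_Figure.jpg\n"
    "richlong;pass2;rich;long;blue;My title2!!;"
    "https://www.iconspng.com/images/stick-figure-walking/stick-figure-walking.jpg\n"
)
accounts = read_accounts(sample)
print(accounts["richlong"]["color"])  # blue
print(check_login(accounts, "joebob", "pass1"))  # True
```

In the Flask app you would call read_accounts once with the open file (with open("assignment11-account-info.txt", newline="") as f: accounts = read_accounts(f)) and then use accounts[username][...] wherever you need a user's details.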

How to merge three files with common id in pandas?

I have three files which are users.dat, ratings.dat and movies.dat.
users.dat
1::F::1::10::48067
1::F::1::10::48067
1::F::1::10::48067
1::F::1::10::48067
1::F::1::10::48067
1::F::1::10::48067
1::F::1::10::48067
1::F::1::10::48067
ratings.dat
1::1193::5::978300760
1::661::3::978302109
1::914::3::978301968
1::3408::4::978300275
1::2355::5::978824291
1::1197::3::978302268
1::1287::5::978302039
1::2804::5::978300719
movies.dat
1193::One Flew Over the Cuckoo's Nest (1975)::Drama
661::James and the Giant Peach (1996)::Animation|Children's|Musical
914::My Fair Lady (1964)::Musical|Romance
3408::Erin Brockovich (2000)::Drama
2355::Bug's Life, A (1998)::Animation|Children's|Comedy
1197::Princess Bride, The (1987)::Action|Adventure|Comedy|Romance
1287::Ben-Hur (1959)::Action|Adventure|Drama
2804::Christmas Story, A (1983)::Comedy|Drama
My expected output
1::1193::5::978300760::F::1::10::48067::One Flew Over the Cuckoo's Nest::Drama::1975
1::661::3::978302109::F::1::10::48067::James and the Giant Peach::Animation|Children's|Musical::1996
1::914::3::978301968::F::1::10::48067::My Fair Lady ::Musical|Romance::1964
1::3408::4::978300275::F::1::10::48067::Erin Brockovich ::Drama::2000
1::2355::5::978824291::F::1::10::48067::Bug's Life, A ::Animation|Children's|Comedy::1998
I am trying to merge these files without using pandas. I created three dictionaries, with the user ID as the common key, and then tried to merge the three files using the users' keys. However, the result is not exactly what I want. Any advice or suggestions would be greatly appreciated.
My code
import json
file = open("users.dat","r",encoding = 'utf-8')
users={}
for line in file:
    x = line.split('::')
    user_id=x[0]
    gender=x[1]
    age=x[2]
    occupation=x[3]
    i_zip=x[4]
    users[user_id]=gender,age,occupation,i_zip.strip()
file = open("movies.dat","r",encoding='latin-1')
movies={}
for line in file:
    x = line.split('::')
    movie_id=x[0]
    title=x[1]
    genre=x[2]
    movies[movie_id]=title,genre.strip()
file = open("ratings.dat","r")
ratings={}
for line in file:
    x = line.split('::')
    a=x[0]
    b=x[1]
    c=x[2]
    d=x[3]
    ratings[a]=b,c,d.strip()
newdict = {}
newdict.update(users)
newdict.update(movies)
newdict.update(ratings)
for i in users.keys():
    addition = users[i] + movies[i]+ratings[i]
    newdict[i] = addition
with open('data.txt', 'w') as outfile:
    json.dump(newdict, outfile)
My output looks like this:
{"1": ["F", "1", "10", "48067", "Toy Story (1995)", "Animation|Children's|Comedy", "1246", "4", "978302091"], "2": ["M", "56", "16", "70072", "Jumanji (1995)", "Adventure|Children's|Fantasy", "1247", "5", "978298652"],
The first mistake in your code (apart from the messed-up indentation) is that you build ratings as a dictionary with the user ID as the key:
ratings[a]=b,c,d.strip()
For your dataset, the ratings dictionary ends up as { '1': ('2804', '5', '978300719') }, so all but one rating is lost: every row has the same user ID, and each new row overwrites the previous one.
What you want instead is to treat your ratings data as a list, not a dictionary. The result you are trying to produce is also an extended version of the ratings, because you end up with as many output rows as you have scores.
Secondly, you don't need the json module, since your desired output is not in JSON format.
Here's a code that does the job:
#!/usr/bin/env python3
# Part 1: collect data from the files
users = {}
with open("users.dat", "r", encoding='utf-8') as file:
    for line in file:
        user_id, gender, age, occupation, i_zip = line.rstrip().split('::')
        users[user_id] = (gender, age, occupation, i_zip)
movies = {}
with open("movies.dat", "r", encoding='latin-1') as file:
    for line in file:
        movie_id, title, genre = line.rstrip().split('::')
        # Parse year from title
        title = title.rstrip()
        year = 'N/A'
        if title[-1] == ')' and '(' in title:
            short_title, in_parenthesis = title.rsplit('(', 1)
            in_parenthesis = in_parenthesis.rstrip(')').rstrip()
            if in_parenthesis.isdigit() and len(in_parenthesis) == 4:
                # Text in parenthesis has four digits - it must be the year
                title = short_title.rstrip()
                year = in_parenthesis
        movies[movie_id] = (title, genre, year)
ratings = []
with open("ratings.dat", "r") as file:
    for line in file:
        user_id, movie_id, score, dt = line.rstrip().split('::')
        ratings.append((user_id, movie_id, score, dt))
# Part 2: save the output
with open('output.dat', 'w', encoding='utf-8') as file:
    for user_id, movie_id, score, dt in ratings:
        # Get user data from dictionary
        gender, age, occupation, i_zip = users[user_id]
        # Get movie data from dictionary
        title, genre, year = movies[movie_id]
        # Merge data into a single string
        row = '::'.join([user_id, movie_id, score, dt,
                         gender, age, occupation, i_zip,
                         title, genre, year])
        # Write to the file
        file.write(row + '\n')
Part 1 is based on your code, with two main differences: the ratings are saved to a list (not a dictionary), and the year is parsed out of the title.
Part 2 is where the output is being saved.
Contents of output.dat file after running the script:
1::1193::5::978300760::F::1::10::48067::One Flew Over the Cuckoo's Nest::Drama::1975
1::661::3::978302109::F::1::10::48067::James and the Giant Peach::Animation|Children's|Musical::1996
1::914::3::978301968::F::1::10::48067::My Fair Lady::Musical|Romance::1964
1::3408::4::978300275::F::1::10::48067::Erin Brockovich::Drama::2000
1::2355::5::978824291::F::1::10::48067::Bug's Life, A::Animation|Children's|Comedy::1998
1::1197::3::978302268::F::1::10::48067::Princess Bride, The::Action|Adventure|Comedy|Romance::1987
1::1287::5::978302039::F::1::10::48067::Ben-Hur::Action|Adventure|Drama::1959
1::2804::5::978300719::F::1::10::48067::Christmas Story, A::Comedy|Drama::1983
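If you prefer, the year parsing in Part 1 can be written with a regular expression instead of the manual rsplit. This is a swapped-in alternative, not the answer's method; the pattern assumes the year is always four digits in parentheses at the very end of the title, and split_title_year is a hypothetical helper name.

```python
import re

# Title text, optional whitespace, then a four-digit year in parentheses at the very end
TITLE_YEAR = re.compile(r"(.*?)\s*\((\d{4})\)\s*")


def split_title_year(title):
    """Return (title, year); year is 'N/A' when the title does not end in '(YYYY)'."""
    match = TITLE_YEAR.fullmatch(title)
    if match:
        return match.group(1), match.group(2)
    return title.strip(), "N/A"


print(split_title_year("Bug's Life, A (1998)"))  # ("Bug's Life, A", '1998')
print(split_title_year("Toy Story"))  # ('Toy Story', 'N/A')
```

Inside the movies loop you would then write title, year = split_title_year(title) before storing movies[movie_id] = (title, genre, year).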

Converting a text file into csv file using python

I need to convert my text files into CSV, and I am using Python to do it. My text file looks like this:
Employee Name : XXXXX
Employee Number : 12345
Age : 45
Hobbies: Tennis
Employee Name: xxx
Employee Number :123456
Hobbies : Football
I want my CSV file to have the columns Employee Name, Employee Number, Age and Hobbies, and when a particular value is not present, that cell should contain NA. Is there a simple solution for this? Thanks in advance.
You can do something like this:
records = """Employee Name : XXXXX
Employee Number : 12345
Age : 45
Hobbies: Tennis
Employee Name: xxx
Employee Number :123456
Hobbies : Football"""
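rows = []
for record in records.split('Employee Name'):
    if not record.strip():
        continue  # skip the empty chunk before the first "Employee Name"
    name = 'NA'
    number = 'NA'
    age = 'NA'
    hobbies = 'NA'
    for field in record.split('\n'):
        if ':' not in field:
            continue  # skip blank lines
        field_name, field_value = field.split(':', 1)
        field_name, field_value = field_name.strip(), field_value.strip()
        if field_name == "":  # This is the employee name, since we split on it
            name = field_value
        elif field_name == "Employee Number":
            number = field_value
        elif field_name == "Age":
            age = field_value
        elif field_name == "Hobbies":
            hobbies = field_value
    rows.append((name, number, age, hobbies))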
Of course, this method assumes that there is (at least) an Employee Name field in every record.
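To get from parsed records to an actual CSV file with NA in the gaps, csv.DictWriter can fill missing columns through its restval argument. A minimal sketch, assuming every record starts with an "Employee Name" line and that keys and values are separated by the first colon on each line; parse_records and write_csv are hypothetical names, and io.StringIO stands in for the output file.

```python
import csv
import io

COLUMNS = ["Employee Name", "Employee Number", "Age", "Hobbies"]


def parse_records(text):
    """Split 'Key : value' lines into one dict per employee."""
    records = []
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank or malformed lines
        key, value = (part.strip() for part in line.split(":", 1))
        if key == "Employee Name":
            records.append({})  # every 'Employee Name' line starts a new record
        if records:
            records[-1][key] = value
    return records


def write_csv(records, out):
    # restval fills missing columns with 'NA'; extrasaction='ignore' drops unknown keys
    writer = csv.DictWriter(out, fieldnames=COLUMNS, restval="NA", extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)


text = """Employee Name : XXXXX
Employee Number : 12345
Age : 45
Hobbies: Tennis
Employee Name: xxx
Employee Number :123456
Hobbies : Football"""

out = io.StringIO()
write_csv(parse_records(text), out)
print(out.getvalue())
```

For a real file, open the output with open("output.csv", "w", newline="") and pass it to write_csv; newline="" stops the csv module's line endings from being doubled on Windows.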
Maybe this helps you get started? It just prints the static output for the first employee's data; you would still need to wrap it in some sort of iteration over the file. There is very likely a more elegant solution, but this is how you could do it without a single import statement ;)
with open('test.txt', 'r') as f:
    content = f.readlines()
output_line = "".join([line.split(':')[1].replace('\n', ';').strip() for line in content[0:4]])
print(output_line)
I followed some very simple steps; they may not be optimal, but they solve the problem. The important case I can see here is that the same key ("Employee Name" etc.) can appear multiple times in a single file.
Steps
Read the txt file into a list of lines.
Convert the list to a dict (the logic could be improved, or more complex lambdas added here).
Use pandas to convert the dict to CSV.
Below is the code,
import pandas

txt_file = r"test.txt"
with open(txt_file, "r") as txt:
    txt_lines = txt.read().split("\n")

txt_dict = {}
for txt_line in txt_lines:
    if ":" not in txt_line:
        continue  # skip blank lines
    k, v = txt_line.split(":", 1)
    # collect every value seen for this key, in file order
    txt_dict.setdefault(k.strip(), []).append(v.strip())

df = pandas.DataFrame.from_dict(txt_dict, orient="index")
print(df)
df.to_csv("output.csv")  # output file name is illustrative
Output:
0 1
Employee Number 12345 123456
Age 45 None
Employee Name XXXXX xxx
Hobbies Tennis Football
I hope this helps.

list indices must be integers, not str

I am very new to Python and am really struggling to find a solution to this issue.
I just don't understand why my list only accepts integer indices when I thought lists are supposed to support multiple data types.
I've got a very simple field entry system for an account registration, and I just can't add the items to a list.
Any help would be greatly appreciated. I have included my code and the error message I receive.
useraccounts = {}
group = []
forename = input('Forename: ')
surname = input('Surname: ')
DOB = input('DOB: ')
stu_class = input('Class: ')
group['forename'] = forename
group['surname'] = surname
group['dob'] = DOB
group['class'] = stu_class
group.append(useraccounts)
This is the error message:
Traceback (most recent call last):
File "/Users/admin/Documents/Homework/Computing/testing/testing.py", line 11, in <module>
group['forename'] = forename
TypeError: list indices must be integers, not str
It looks like you want group to be a dict, and useraccounts to be a list. You have them backwards, as well as the append:
useraccounts = [] # <-- list
group = {} # <-- dict
forename = input('Forename: ')
surname = input('Surname: ')
DOB = input('DOB: ')
stu_class = input('Class: ')
group['forename'] = forename
group['surname'] = surname
group['dob'] = DOB
group['class'] = stu_class
useraccounts.append(group) # <-- reversed. this will append group to useraccounts
As written, you were trying to append useraccounts (an empty dict) to group. That is backwards, and once group becomes a dict the call would fail anyway, because a dict has no append method.
What you want is a dictionary:
group = {}
group['forename'] = forename
group['surname'] = surname
group['dob'] = DOB
group['class'] = stu_class
In your original code, useraccounts stays an empty dict that you simply append to the list. If you want to add group to useraccounts while keeping useraccounts a dict, store it under a key:
useraccounts['key'] = group
group is a list, it cannot take string indices. It looks like you wanted to use a dictionary instead:
useraccounts = []
group = {}
group['forename'] = forename
group['surname'] = surname
group['dob'] = DOB
group['class'] = stu_class
useraccounts.append(group)
Note that you probably wanted useraccounts to be the list here; your code tried to call .append() on the group object.
Or inline the keys and values directly into the dictionary definition:
useraccounts.append({
    'forename': forename,
    'surname': surname,
    'dob': DOB,
    'class': stu_class})
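A minimal sketch putting the pieces together without input(), so it can run on its own: each account is a dict (string keys), all accounts live in a list (integer positions), and the last part reproduces the original error by indexing a list with a string. The helper make_account and the sample values are hypothetical.

```python
def make_account(forename, surname, dob, stu_class):
    """Hypothetical helper: one account as a dict, so it can use string keys."""
    return {"forename": forename, "surname": surname, "dob": dob, "class": stu_class}


useraccounts = []  # a list of accounts, indexed by position (0, 1, ...)
useraccounts.append(make_account("Ada", "Lovelace", "10/12/1815", "7B"))
print(useraccounts[0]["surname"])  # Lovelace

group = []
try:
    group["forename"] = "Ada"  # a list only accepts integer (or slice) indices
except TypeError as err:
    print(err)
```

The exact wording of the TypeError message differs between Python versions, but in each case it says that list indices must be integers.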
