Python Splitting A String to make several keys for a dictionary - python

So I am trying to write a function that will read a text file, extract the information it needs from each line of text, and then assign that information to a key in a Python dictionary. However, here is the problem I have.
def read_star_names(filename):
    """
    Given the name of a file containing a star catalog in CSV format, produces a dictionary
    where the keys are the names of the stars and the values are Henry Draper numbers as integers.
    If a star has more than one name, each name will appear as a key
    in the dictionary. If a star does not have a name it will not be
    represented in this dictionary.
    example return: {456: 'BETA', 123: 'ALPHA', 789: 'GAMMA;LITTLE STAR'}
    """
    result_name = {}
    starfile = open(filename, 'r')
    for dataline in starfile:
        items = dataline.strip().split(',')
        draper = int(items[3])
        name = str(items[6])
        result_name[name] = draper
    starfile.close()
    return result_name
This is attempting to read this:
0.35,0.45,0,123,2.01,100,ALPHA
-0.15,0.25,0,456,3.2,101,BETA
0.25,-0.1,0,789,4.3,102,GAMMA;LITTLE STAR
The problem I am having is that what it returns is this:
{'ALPHA': 123, 'GAMMA;LITTLE STAR': 789, 'BETA': 456}
I want GAMMA and LITTLE STAR to be separate keys that still refer to the same number, 789.
How should I proceed?
I tried splitting the line of text at the semicolon but then that added indexes and I had a hard time managing them.
Thanks.

You have already isolated the part that contains all the names. All you need to do is split the names apart and make a separate key for each of them, like so:
for i in name.split(";"):
    result_name[i] = draper
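For context, here is a rough sketch of how the whole function might look with that loop in place. It assumes the column layout from the question (Henry Draper number in the fourth field, names in the seventh). Skipping short lines and empty names is my own addition, not something the question spelled out.

```python
def read_star_names(filename):
    """Map each star name to its Henry Draper number (as an int)."""
    result_name = {}
    with open(filename) as starfile:
        for dataline in starfile:
            items = dataline.strip().split(",")
            if len(items) < 7:
                continue  # assumption: skip blank or malformed lines
            draper = int(items[3])
            # One field may hold several names separated by semicolons
            for name in items[6].split(";"):
                name = name.strip()
                if name:  # stars without a name are left out
                    result_name[name] = draper
    return result_name
```

With the three sample lines from the question, GAMMA and LITTLE STAR both end up as keys that map to 789.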

Related

Python: How to convert CSV file to lists of dictionaries without importing reader or external libraries

I need to convert a CSV file into a list of dictionaries without importing CSV or other external libraries for a project I am doing for class.
Attempt
I am able to get the keys using header line but when I try to extract the values it goes row by row instead of column by column and starts in the wrong place. However when I append it to the list it goes back to starting at the right place. However I am unsure of how to connect the keys to the correct column in the list.
CSV file
This is the CSV file I am using, I am only using the descriptions portion up to the first comma.
I tried using a for loop to cycle through each key, but it seems to go row by row and I don't know how to change it.
If anybody could steer me in the right direction it would be very appreciated.
CSV sample - sample is not saving correctly but it has the three headers on top and then the three matching information below and so on.
(Code,Name,State)\n
(ACAD,Acadia National Park,ME)\n
(ARCH,Arches National Park,UT)\n
(BADL, Badlands National Park,SD)\n
I read your question and am posting code based on what I understood from it. You should learn to post your code in the question; it is an essential skill. Always open a file using a "with" block. I made a demo CSV file with two rows of records. The following code fetches all the rows as a list of dictionaries.
def readParksFile(fileName="national_parks.csv"):
    with open(fileName) as infile:
        # The first line holds the column names; strip its trailing newline
        column_names = infile.readline().strip()
        keys = column_names.split(",")
        number_of_columns = len(keys)
        list_of_dictionaries = []
        data = infile.readlines()
        list_of_rows = []
        for row in data:
            # Strip the newline so it does not end up in the last value
            list_of_rows.append(row.strip().split(","))
    # The with block closes the file, so no explicit close() is needed
    for item in list_of_rows:
        row_as_a_dictionary = {}
        for i in range(number_of_columns):
            row_as_a_dictionary[keys[i]] = item[i]
        list_of_dictionaries.append(row_as_a_dictionary)
    for i in range(len(list_of_dictionaries)):
        print(list_of_dictionaries[i])
Output:
{'Code': 'cell1', 'Name': 'cell2', 'State': 'cell3', 'Acres': 'cell4', 'Latitude': 'cell5', 'Longitude': 'cell6', 'Date': 'cell7', 'Description': 'cell8'}
{'Code': 'cell11', 'Name': 'cell12', 'State': 'cell13', 'Acres': 'cell14', 'Latitude': 'cell15', 'Longitude': 'cell16', 'Date': 'cell17', 'Description': 'cell18'}
I would create a class with a constructor that has the keys from the first row of the CSV as properties. Then create an empty list to store your dictionaries. Then open the file (that is a built-in library so I assume you can use it) and read it line by line. Store the line as a string and use the split method with a comma as the delimiter and store that list in a variable. Call the constructor of your class for each line to construct your dictionary using the indexes of the list from the split method. Before reading the next line, append the dictionary to your list. This is probably not the easiest way to do it but it doesn't use any external libraries (although as others have mentioned, there is a built-in CSV module).
Code:
#Class with constructor
class Park:
    def __init__(self, code, name, state):
        self.code = code
        self.name = name
        self.state = state
#Empty list for storing the parks
parks = []
#Open file
parks_csv = open("parks.csv")
#Skip first line
lines = parks_csv.readlines()[1:]
#Read the rest of the lines
for line in lines:
    #Strip the trailing newline before splitting on commas
    parkProperties = line.strip().split(",")
    newPark = Park(parkProperties[0], parkProperties[1], parkProperties[2])
    parks.append(newPark)
#Print park dictionaries
#It would be easier to parse this using the JSON library
#But since you said you can't use any libraries
for park in parks:
    print(f'{{code: {park.code}, name: {park.name}, state: {park.state}}}')
#Don't forget to close the file
parks_csv.close()
Output:
{code: ACAD, name: Acadia National Park, state: ME}
{code: ARCH, name: Arches National Park, state: UT}
{code: BADL, name: Badlands National Park, state: SD}
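If actual dictionaries are wanted rather than objects, one possible sketch pairs each row's values with the header names using zip. The function name read_csv_as_dicts is made up for illustration, and the code assumes a simple CSV with no quoted fields containing commas.

```python
def read_csv_as_dicts(filename):
    """Return one dictionary per data row, keyed by the header names."""
    rows = []
    with open(filename) as infile:
        header = infile.readline().strip()
        if not header:
            return rows  # empty file: nothing to read
        keys = [key.strip() for key in header.split(",")]
        for line in infile:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            values = [value.strip() for value in line.split(",")]
            # zip pairs the first key with the first value, and so on
            rows.append(dict(zip(keys, values)))
    return rows
```

Because zip walks the keys and values in parallel, each value lands under the header of its own column, which is the row-versus-column mix-up described in the question.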

How to write function that measures frequency of each line (objects) - Python

Write a function create_dictionary(filename) that reads the named file and returns a dictionary mapping from object names to occurrence counts (the number of times the particular object was guessed). For example, given a file mydata.txt containing the following:
abacus
calculator
modern computer
abacus
modern computer
large white thing
modern computer
So, when I enter this:
dictionary = create_dictionary('mydata.txt')
for key in dictionary:
    print(key + ': ' + str(dictionary[key]))
The function must return the following dictionary format:
{'abacus': 2, 'calculator': 1, 'modern computer': 3, 'large white thing': 1}
Among other things, I know how to count the frequency of words. But how does one count the frequency of each line as above?
Here are some constraints:
You may assume the given file exists, but it may be empty (i.e. containing no lines).
Keys must be inserted into the dictionary in the order in which they appear in the input file.
In some of the tests we display the keys in insertion order; in others we sort the keys alphabetically.
Leading and trailing whitespace should be stripped from object names.
Empty object names (e.g. blank lines or lines with only whitespace) should be ignored.
One easy way to achieve this is to use collections.Counter.
Let the file be named a.txt:
from collections import Counter
s = open('a.txt','r').read().strip()
print(Counter(s.split('\n')))
The output will be as follows:
Counter({'abacus': 2,
'calculator': 1,
'large white thing': 1,
'modern computer': 3})
Further to what @bigbounty has suggested, here is what I could come up with.
from collections import Counter
def create_dictionary(filename):
    """Blah"""
    keys = Counter()
    s = open(filename,'r').read().strip()
    keys = (Counter(s.split('\n')))
    return keys
So, if I type:
dictionary = create_dictionary('mydata.txt')
for key in dictionary:
    print(key + ': ' + str(dictionary[key]))
I get:
abacus: 2
calculator: 1
modern computer: 3
large white thing: 1
But I need some help with how to print nothing if the text file is empty.
For example, consider an empty text file ('nothing.txt'). The expected output is blank, but I don't know how to omit the default ': 1' entry that appears for the empty key. Any advice?
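The stray ': 1' appears because ''.split('\n') returns [''], so the empty string is counted once. A minimal sketch that avoids this reads the file line by line and skips blank names. It uses a plain dict, which keeps insertion order in Python 3.7 and later, and it is only one possible approach.

```python
def create_dictionary(filename):
    """Count how many times each non-blank line occurs in the file."""
    counts = {}
    with open(filename) as infile:
        for line in infile:
            name = line.strip()  # drop leading and trailing whitespace
            if not name:
                continue  # ignore blank or whitespace-only lines
            counts[name] = counts.get(name, 0) + 1
    return counts
```

For an empty file the loop body never runs, so the function returns an empty dictionary and the printing loop from the question prints nothing.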

How do I change values in a dictionary in one key?

For example, I have a key, name. Under each name there is a birthday, an age, and a phone number. I would like to change only, say, the birthday and keep the rest. I'm also trying to use an existing file that already has the names.
So if you want to use a tuple as a key, the syntax would be
d[(name, phoneNumber)] = birthday
As far as using values from a pre-existing file, you will need the open function.
# assuming that you have a file of comma separated values
file = open("fileName.csv", "r")
text = file.read().split("\n")
file.close()
peps = [p.split(",") for p in text if p]
# the column positions below are placeholders; adjust them to match your file
NAME_COL, PHONE_COL, BDAY_COL = 0, 1, 2
d = {}
for p in peps:
    d[(p[NAME_COL], p[PHONE_COL])] = p[BDAY_COL]
Roughly. There are several types of collections in Python, including dictionaries ({}), lists ([]), tuples (()), and sets. I would recommend that you read up on each.
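Another way to read the question is that each name maps to a small dictionary of fields, in which case a single field can be reassigned without touching the others. The sketch below uses made-up names and values purely for illustration.

```python
# Hypothetical sample data: each name maps to a dictionary of fields
people = {
    "Alice": {"birthday": "1990-01-01", "age": 34, "phone": "555-0100"},
    "Bob": {"birthday": "1985-06-15", "age": 39, "phone": "555-0199"},
}

# Index by name first, then by field, and assign the new value
people["Alice"]["birthday"] = "1990-02-01"

# The other fields under "Alice" are unchanged
print(people["Alice"])
```

Only the inner "birthday" entry is replaced; "age" and "phone" keep their old values.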

retrieving name from number ID

I have code that takes data from an online source where items are referred to by a number ID, compares data about those items, and builds a list of item ID numbers based on some criteria. What I'm struggling with is taking this list of numbers and turning it into a list of names. I have a text file with the numbers and corresponding names, but I am having trouble using it because it contains multi-word names and retains the \n at the end of each line when I try to parse the file in any way with Python. The text file looks like this:
number name\n
14 apple\n
27 anjou pear\n
36 asian pear\n
7645 langsat\n
I have tried split(), as well as replacing the white space in between with several different things, to no avail. I asked a question earlier which yielded a lot of progress but still didn't quite work. The two methods that were suggested were:
d = dict()
f=open('file.txt', 'r')
for line in f:
    number, name = line.split(None,1)
    d[number] = name
this almost worked but still left me with the \n so if I call d['14'] i get 'apple\n'. The other method was:
import re
f=open('file.txt', 'r')
fr=f.read()
r=re.findall(r"(\w+)\s+(.+)", fr)
This seemed to get rid of the \n at the end of every name, but it leaves me with a list of tuples, with each number-name combo being a single entry, so if I were to say r[1] I would get ('14', 'apple'). I really don't want to delete each newline by hand on all ~8400 entries...
Any recommendations on how to get the corresponding name given a number from a file like this?
In your first method, change the line d[number] = name to d[number] = name[:-1]. This simply strips off the last character, which should remove your \n. (If the last line of the file might not end with a newline, name.rstrip('\n') is safer.)
names = {}
with open("id_file.txt") as inf:
    header = next(inf, '') # skip header row
    for line in inf:
        id, name = line.split(None, 1)
        names[int(id)] = name.strip()
names[27] # => 'anjou pear'
Use this to modify your first approach:
raw_dict = dict()
cleaned_dict = dict()
Assuming you've imported the file into a dictionary:
raw_dict = {14:"apple\n",27:"anjou pear\n",36 :"asian pear\n" ,7645:"langsat\n"}
for keys in raw_dict:
    cleaned_dict[keys] = raw_dict[keys][:len(raw_dict[keys])-1]
So now, cleaned_dict is equal to:
{27: 'anjou pear', 36: 'asian pear', 7645: 'langsat', 14: 'apple'}
*Edited to add first sentence.
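To get from a list of ID numbers to a list of names, which was the original goal, a lookup over such a dictionary could be sketched as follows. The function names and the "unknown" fallback are my own assumptions.

```python
def load_names(filename):
    """Read 'number name' lines into a dictionary keyed by integer ID."""
    names = {}
    with open(filename) as inf:
        next(inf, None)  # skip the header row, if there is one
        for line in inf:
            parts = line.split(None, 1)  # split once, on the first whitespace run
            if len(parts) != 2:
                continue  # skip blank or incomplete lines
            item_id, name = parts
            names[int(item_id)] = name.strip()  # strip() removes the trailing \n
    return names


def ids_to_names(ids, names):
    """Translate a list of IDs into names; missing IDs become 'unknown'."""
    return [names.get(item_id, "unknown") for item_id in ids]
```

Calling ids_to_names([27, 14], load_names("id_file.txt")) on the sample file from the question should give ['anjou pear', 'apple'].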

How to read from a file into a dict with string key and tuple value?

For an assignment, I'm creating a program that retrieves from a file information regarding Olympic countries and their medal count.
One of my functions goes through a list in this format:
Country,Games,Gold,Silver,Bronze
AFG,13,0,0,2
ALG,15,5,2,8
ARG,40,18,24,28
ARM,10,1,2,9
ANZ,2,3,4,5
The function needs to go through this list, and store into a dictionary with the country name as a key, and the remaining four entries as a tuple.
Here is what I am working with so far:
def medals(string):
    '''takes a file, and gathers up the country codes and their medal counts
    storing them into a dictionary'''
    #creates an empty dictionary
    medalDict = {}
    #creates an empty tuple
    medalCount = ()
    #These following two lines remove the column headings
    with open(string) as fin:
        next(fin)
        for eachline in fin:
            code, medal_count = eachline.strip().split(',',1)
            medalDict[code] = medal_count
    return medalDict
Now, the intent is for the entries to look something like this
{'AFG': (13, 0, 0, 2)}
Instead, I'm getting
{'AFG': '13,0,0,2'}
It looks like it is being stored as a string, and not a tuple. Is it something to do with the
medalDict[code] = medal_count
line of code? I'm not too sure how to convert that into separate integer values for a tuple neatly.
You are storing the whole string '13,0,0,2' as value, so
medalDict[code] = medal_count
should be replaced by:
medalDict[code] = tuple(medal_count.split(','))
Your original approach is correct, with this line being the sole exception. The change is that it now splits '13,0,0,2' into the list ['13', '0', '0', '2'] and converts that into a tuple.
You can also do this to convert strings inside into integers:
medalDict[code] = tuple([int(ele) for ele in medal_count.split(',')])
But make sure your medal_count contains only integers.
This line:
code, medal_count = eachline.strip().split(',',1)
... is splitting the whitespace-stripped eachline, 1 time, on ',', then storing the resulting two strings into code and medal_count ... so yes, medal_count contains a string.
You could handle this one of two ways:
Add a line along the lines of:
split_counts = tuple(medal_count.split(','))
... and then use split_counts from there on in the code, or
(in Python 3) Change the line above to
code, *medal_count = eachline.strip().split(',')
... which makes use of Extended iterable unpacking (and will give you a list, so if a tuple is necessary it'll need to be converted).
Your problem seems to be this:
split(',',1)
# should be
split(',')
because split(..., 1) only makes 1 split and split(...) splits as many times as possible.
So you should be able to do this:
for eachline in fin:
    code, *medal_count = eachline.strip().split(',')
    # medal_count is a list of strings at this point
    medalDict[code] = medal_count
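Putting the suggestions together, a sketch of the full function that stores a tuple of integers might look like this. It assumes every field after the country code is a whole number and that the first line is a header. Skipping blank lines is my own addition.

```python
def medals(filename):
    """Map each country code to a tuple of its integer counts."""
    medal_dict = {}
    with open(filename) as fin:
        next(fin, None)  # skip the column headings
        for eachline in fin:
            eachline = eachline.strip()
            if not eachline:
                continue  # skip blank lines
            # Extended unpacking: the first field is the code, the rest are counts
            code, *counts = eachline.split(",")
            medal_dict[code] = tuple(int(count) for count in counts)
    return medal_dict
```

For the sample data in the question, the entry for 'AFG' would be (13, 0, 0, 2).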
