a loop to read lines that start with specific letters - python

I'm using a for loop to read a file, but I only want to read specific lines, say lines that start with "af" and "apn". Is there any built-in feature to achieve this?
How do I split such a line after reading it?
How do I store the elements from the split in a dictionary?
Let's say the first element of the line after the split is the employee ID, which I store in the dictionary, and the second element is the employee's full name, which I want to store in the dictionary too.
So when I use "employee_dict[employee_ID]", will I get the full name?
Thank you.

You can do so very easily:
employee_dict = {}
with open('file.txt', 'r') as f:
    for line in f:
        # startswith accepts a tuple of prefixes
        if line.startswith(("af", "apn")):
            emprecords = line.split()  # with no argument, split() separates on any whitespace
            # assuming that all your records follow a common format, you can then create an employee dict
            employee = {}
            # the first element after the split is the employee id; it is kept as a string
            # because it begins with "af" or "apn" and so cannot be converted with int()
            employee_id = emprecords[0]
            # enter name-value pairs within the employee object, e.g. let's say the second
            # element after the split is the emp name, the third the age
            employee['name'] = emprecords[1]
            employee['age'] = emprecords[2]
            # store this in the global employee_dict
            employee_dict[employee_id] = employee
To retrieve the name of an employee after having done the above, index by that employee's ID string (some_id below is a placeholder) and then by 'name':
print(employee_dict[some_id]['name'])
Hope this gives you an idea of how to go about it.

if your file looks like
af, 1, John
ggg, 2, Dave
you could create dict like
d = {z[1].strip() : z[2].strip() for z in [y for y in [x.split(',') for x in open(r"C:\Temp\test1.txt")] if y[0] in ('af', 'apn')]}
A more readable version:
d = {}
for l in open(r"C:\Temp\test1.txt"):
    x = l.split(',')
    if x[0] not in ('af', 'apn'):
        continue
    d[x[1].strip()] = x[2].strip()
Both solutions give you d = {'1': 'John'} for this example. To get the name from the dict, you can do name = d['1'].

prefixes = ("af", "apn")
with open('file.txt', 'r') as f:
    # startswith accepts a tuple of prefixes; the first two fields become key and value
    employee_dict = dict(line.split()[:2] for line in f if line.startswith(prefixes))

dictOfNames = {}
with open("filename", "r") as file:
    for line in file:
        if line.startswith('af') or line.startswith('apn'):
            fields = line.split(',')  # split using delimiter of ','
            dictOfNames[fields[1].strip()] = fields[2].strip()  # take 2nd element of line as id and 3rd as name
The program above reads the file and, for each line that starts with 'af' or 'apn', stores the second element as the ID and the third as the name, assuming a comma is the delimiter.
Now you can use dictOfNames[id] to get the name.

Related

Creating a dictionary using data from a .csv file

I have a .csv file with 20 lines, and each line is formatted as follows:
Lucy, 23, F, diabetes
Darwin, 60, M, hypertension
Dave, 35, M, epilepsy
Tiffany, 12, F, asthma
... and so on.
I am looking to convert this .csv file into a dictionary, presented as follows:
dict = {
    'Lucy': {
        'age': 23,
        'gender': 'F',
        'condition': 'diabetes'
    },
    'Darwin': {
        'age': 60,
        'gender': 'M',
        'condition': 'hypertension'
    },
    # (and so on for all 20 lines)
}
Each line is in the form: name, age, gender, condition. Here's what I have tried so far.
dict ={}
f = open("medical.csv', mode = "rt", encoding = "utf8")
s = f.readline()
for line in f:
line.split(",")
... and this is where I hit a halt. I cannot figure out how to assign the titles to each value in the line such that the dictionary will be displayed as above, with the tags 'age', 'gender' and 'condition'. And when I run the code, there is a SyntaxError: invalid syntax message on "medical.csv".
The age has to be an integer. If it is not an integer, I want the program to skip that line when creating the dictionary.
Any help would be much appreciated!
I recommend not using the names as your dictionary keys, because names can be repeated.
First create the main dict, then iterate over the lines in the CSV. For each line, extract the person's properties (you used the split method, which fits perfectly here, but use split(", ") instead of split(",") so the space after each comma is removed too). Create a dictionary for each person and assign keys and values to it this way:
person = {}
person['age'] = 23
And so on...
Then assign this person's dictionary as a value in the main dictionary, with the person's name as the key. Hope it helps a bit!
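The steps above can be sketched as follows. This is only a minimal sketch: the function name parse_people and the in-memory sample lines are assumptions made for illustration (the second sample line has a deliberately non-integer age), and lines whose age is not an integer are skipped, as the question asks.

```python
def parse_people(lines):
    """Build {name: {'age', 'gender', 'condition'}} from 'name, age, gender, condition' lines."""
    people = {}
    for line in lines:
        parts = line.strip().split(", ")
        if len(parts) != 4:
            continue  # not in the expected four-field format
        name, age_text, gender, condition = parts
        try:
            age = int(age_text)
        except ValueError:
            continue  # age is not an integer: skip this line
        person = {}
        person['age'] = age
        person['gender'] = gender
        person['condition'] = condition
        people[name] = person
    return people


# Hypothetical sample data; "sixty" is made up to show a skipped line.
sample = [
    "Lucy, 23, F, diabetes\n",
    "Darwin, sixty, M, hypertension\n",
    "Dave, 35, M, epilepsy\n",
]
people = parse_people(sample)
print(people['Lucy']['age'])
```

To read from the real file, the same function could be given an open file object instead of the sample list, since it only needs something that yields lines.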
I suggest to use the csv module for this purpose. Note the handy skipinitialspace argument.
import csv
from pprint import pprint
def row_to_dict(ts):
    return {k: t for k, t in zip(("age", "gender", "condition"), ts)}
if __name__ == "__main__":
    result = {}
    with open("medical.csv") as f:
        reader = csv.reader(f, skipinitialspace=True)
        for row in reader:
            name, data = row[0], row[1:]
            result[name] = row_to_dict(data)
    pprint(result)
First of all, please keep in mind that there might be more "pythonic" answers to your problem.
Well, you are on the right path:
dict = {}
f = open("medical.csv", mode="rt", encoding="utf8")
s = f.readline()
for line in f:
    l = line.split(",")
Let's give a name (l) to the result of line.split(",").
Now l is in this format:
l[0] contains the name
l[1] contains the age
l[2] contains the sex
l[3] contains the condition.
Now, the first element of l is the name, so let's add it to the dictionary:
dict[l[0].strip()] = {}
Note:
I'm using l[0].strip() because there might be unwanted whitespace at the beginning or end of it
I'm initializing a new dictionary inside the dictionary (the data structure you want is a dictionary of dictionaries)
Now, let's add in turn the other fields:
dict[l[0].strip()]['gender'] = l[2].strip()
dict[l[0].strip()]['condition'] = l[3].strip()
This works, unless the age is not an integer, so we need to use a try except block for that beforehand:
try:
    age = int(l[1].strip())
except ValueError:
    continue  # You want to skip the current iteration, right?
Now we can put everything together, polishing the code a bit:
dict = {}
with open("medical.csv", mode="rt", encoding="utf8") as f:
    s = f.readline()
    for line in f:
        l = line.split(",")
        try:
            age = int(l[1].strip())
        except ValueError:
            continue
        key = l[0].strip()
        dict[key] = {}  # create the inner dictionary before filling it
        dict[key]['age'] = age
        dict[key]['gender'] = l[2].strip()
        dict[key]['condition'] = l[3].strip()
Of course this supposes all the names are different (I've just read firanek's answer: I agree that you should not use names as the key; with this approach, you lose the data for everyone who shares a name except the last such person).
Oh, I almost forgot: you can use a dict literal and replace the three dict[key][<string>] = <thing> lines with:
dict[key] = {'age': age, 'gender': l[2].strip(), 'condition': l[3].strip()}
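One way around the repeated-name problem mentioned above is to keep a list of records per name instead of a single record. The following is a sketch under assumptions: the helper name group_by_name is made up, and the second "Dave" line is invented sample data to show a repeated name.

```python
def group_by_name(lines):
    """Map each name to a list of records so that repeated names lose no data."""
    grouped = {}
    for line in lines:
        fields = [field.strip() for field in line.split(",")]
        if len(fields) != 4:
            continue  # not in the expected four-field format
        try:
            age = int(fields[1])
        except ValueError:
            continue  # skip lines whose age is not an integer
        record = {'age': age, 'gender': fields[2], 'condition': fields[3]}
        # setdefault creates the empty list the first time a name is seen
        grouped.setdefault(fields[0], []).append(record)
    return grouped


# Hypothetical sample data with a repeated name.
sample = [
    "Dave, 35, M, epilepsy\n",
    "Dave, 41, M, asthma\n",
    "Lucy, 23, F, diabetes\n",
]
grouped = group_by_name(sample)
print(len(grouped['Dave']))
```

The trade-off is that every lookup now returns a list, even for names that occur only once.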
You may want to check out the Pandas library, and manipulate the data with DataFrames as it has lots of built-in functionality.
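import pandas as pd
# skipinitialspace drops the space that follows each comma in the file
data = pd.read_csv("data.csv", header=None, names=["Name", "Age", "Gender", "Condition"], index_col=False, skipinitialspace=True)
# turn ages that are not numbers into NaN so that dropna can remove those rows
data["Age"] = pd.to_numeric(data["Age"], errors="coerce")
newdata = data.dropna(subset=['Age'])
print("new data: \n", newdata)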
Also a similar question: Pandas: drop columns with all NaN's

reading a file and parse them into section

Okay, so I have a file that contains an ID number followed by a name, like this:
10 alex de souza
11 robin van persie
9 serhat akin
I need to read this file and break each record up into 2 fields the id, and the name. I need to store the entries in a dictionary where ID is the key and the name is the satellite data. Then I need to output, in 2 columns, one entry per line, all the entries in the dictionary, sorted (numerically) by ID. dict.keys and list.sort might be helpful (I guess). Finally the input filename needs to be the first command-line argument.
Thanks for your help!
I have this so far however can't go any further.
fin = open("ids","r") #Read the file
for line in fin: #Split lines
    string = str.split()
    if len(string) > 1: #Separate names and grades
        id = map(int, string[0]
        name = string[1:]
        print(id, name) #Print results
We need sys.argv to get the command line argument (careful, the name of the script is always the 0th element of the returned list).
Now we open the file (no error handling, you should add that) and read in the lines individually. Now we have 'number firstname secondname'-strings for each line in the list "lines".
Then create an empty dictionary out and loop over the individual strings in lines, splitting each on whitespace and storing the result in the temporary variable tmp (which is now a list of strings: ['number', 'firstname', 'secondname']).
Following that we just fill the dictionary, using the number as key and the space-joined rest of the names as value.
To print the dictionary sorted just loop over the list of numbers returned by sorted(out), using the key=int option for numerical sorting. Then print the id (the number) and then the corresponding value by calling the dictionary with a string representation of the id.
import sys
try:
    infile = sys.argv[1]
except IndexError:
    infile = raw_input('Enter file name: ')  # in Python 2, raw_input returns the typed text as a string
with open(infile, 'r') as file:
    lines = file.readlines()
out = {}
for fullstr in lines:
    tmp = fullstr.split()
    out[tmp[0]] = ' '.join(tmp[1:])
for id in sorted(out, key=int):
    print id, out[str(id)]
This works for python 2.7 with ASCII-strings. I'm pretty sure that it should be able to handle other encodings as well (German Umlaute work at least), but I can't test that any further. You may also want to add a lot of error handling in case the input file is somehow formatted differently.
Just a suggestion, this code is probably simpler than the other code posted:
import sys
with open(sys.argv[1], "r") as handle:
    lines = handle.readlines()
data = dict([i.strip().split(' ', 1) for i in lines])
for idx in sorted(data, key=int):
    print(idx, data[idx])
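The question also asks for the output in two columns. A possible sketch of that formatting step, assuming Python 3 and a dictionary whose keys are the ID strings; the helper name format_entries is hypothetical, and the sample entries are the three lines from the question.

```python
def format_entries(entries):
    """Return one line per entry, sorted numerically by ID, with the IDs right-aligned."""
    # width of the longest ID, so the name column lines up
    width = max((len(key) for key in entries), default=0)
    return [key.rjust(width) + "  " + entries[key] for key in sorted(entries, key=int)]


entries = {"10": "alex de souza", "11": "robin van persie", "9": "serhat akin"}
for text in format_entries(entries):
    print(text)
```

Sorting with key=int matters here: a plain string sort would place "10" and "11" before "9".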

How to fetch these below values in python

I have a file in the below format
.aaa b/b
.ddd e/e
.fff h/h
.lop m/n
I'm trying to read this file. My desired output is if I find ".aaa" I should get b/b, if I find ".ddd" I should get e/e and so on.
I know how to fetch 1st column and 2nd column but I don't know how to compare them and fetch the value. This is what I've written.
file = open('some_file.txt')
for line in file:
    fields = line.strip().split()
    print(fields[0])  # This will give 1st column
    print(fields[1])  # This will give 2nd column
This is not the right way of doing things. What approach should I follow?
Any time you want to do lookups, a dictionary is going to be your friend.
You could write a function to load the data into a dictionary:
def load_data(filename):
    result = dict()
    with open(filename, 'r') as f:
        for line in f:
            k, v = line.strip().split()  # will fail if not exactly 2 fields
            result[k] = v
    return result
And then use it to perform your lookups like this:
data = load_data('foo.txt')
print(data['.aaa'])
It sounds like what you may want is to build a dictionary mapping column 1 to column 2. You could try:
file = open('some_file.txt')
field_dict = {}
for line in file:
    fields = line.strip().split()
    field_dict[fields[0]] = fields[1]
Then in your other code, when you see '.ddd' you can simply look up the value in the dictionary (e.g. field_dict['.ddd'] should return 'e/e').
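A lookup with square brackets raises KeyError when the key is missing, so dict.get with a default can be handy. A small sketch, assuming the four lines from the question as in-memory sample data; load_mapping is a hypothetical helper name.

```python
def load_mapping(lines):
    """Map column 1 to column 2, ignoring lines that do not have exactly two fields."""
    mapping = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue
        mapping[fields[0]] = fields[1]
    return mapping


sample = [".aaa b/b\n", ".ddd e/e\n", ".fff h/h\n", ".lop m/n\n"]
field_dict = load_mapping(sample)
print(field_dict[".ddd"])                   # direct lookup of a known key
print(field_dict.get(".zzz", "not found"))  # default instead of KeyError
```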
Just split each line on whitespace and check whether the first item matches the word you gave. If so, print the second item from the list.
word = input("Enter the word to search : ")
with open('some_file.txt') as f:
    for line in f:
        m = line.strip().split()
        if m[0] == word:
            print(m[1])

How to read and store values from a text file into a dictionary. [python]

I'm trying to figure out how to open a file and then store its contents in a dictionary, using the Part no. as the key and the other information as the value. So I want it to look something like this:
{Part no.: "Description,Price", 453: "Sperving_Bearing,9900", 1342: "Panametric_Fan,23400",9480: "Converter_Exchange,93859"}
I was able to store the text from the file into a list, but I'm not sure how to assign more than one value to a key. I'm trying to do this without importing any modules. I've been using the basic str methods, list methods and dict methods.
For a txt file like so
453 Sperving_Bearing 9900
1342 Panametric_Fan 23400
9480 Converter_Exchange 93859
You can just do
>>> newDict = {}
>>> with open('testFile.txt', 'r') as f:
...     for line in f:
...         splitLine = line.split()
...         newDict[int(splitLine[0])] = ",".join(splitLine[1:])
...
>>> newDict
{9480: 'Converter_Exchange,93859', 453: 'Sperving_Bearing,9900', 1342: 'Panametric_Fan,23400'}
You can get rid of the ----... line by just checking if line.startswith('-----').
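A sketch of that check in the loop, assuming a file whose first line is a header and whose second line is a row of dashes. That layout is an assumption based on the remark above, and the helper name parts_from_lines and the sample lines are made up for illustration.

```python
def parts_from_lines(lines):
    """Build {part_no: 'description,price'}, skipping the dashed line and the header."""
    parts = {}
    for line in lines:
        fields = line.split()
        if not fields or line.startswith('-----'):
            continue  # blank line or the row of dashes
        try:
            part_no = int(fields[0])
        except ValueError:
            continue  # header line: its first field is not a number
        parts[part_no] = ",".join(fields[1:])
    return parts


# Assumed file layout: header, dashed separator, then the data lines.
sample = [
    "Part no.  Description  Price\n",
    "------------------------------\n",
    "453 Sperving_Bearing 9900\n",
    "1342 Panametric_Fan 23400\n",
    "9480 Converter_Exchange 93859\n",
]
print(parts_from_lines(sample))
```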
EDIT - If you are sure that the first two lines contain the same stuff, then you can just do
>>> testDict = {"Part no.": "Description,Price"}
>>> with open('testFile.txt', 'r') as f:
...     _ = next(f)
...     _ = next(f)
...     for line in f:
...         splitLine = line.split()
...         testDict[int(splitLine[0])] = ",".join(splitLine[1:])
...
>>> testDict
{9480: 'Converter_Exchange,93859', 'Part no.': 'Description,Price', 453: 'Sperving_Bearing,9900', 1342: 'Panametric_Fan,23400'}
This adds the first line to the testDict in the code and skips the first two lines and then continues on as normal.
You can read a file into a list of lines like this:
lines = thetextfile.readlines()
You can split a single line by spaces using:
items = somestring.split()
Here's a basic example of how to store a list in a dictionary:
>>> mylist = [1, 2, 3]
>>> mydict = {}
>>> mydict['hello'] = mylist
>>> mydict['world'] = [4, 5, 6]
>>> print(mydict)
Containers like a tuple, list and dictionary can be nested into each other as their items.
To iterate over a list, use a for statement like this:
for item in somelist:
    # do something with the item like printing it
    print(item)
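Applied to the parts file, nesting means a key can hold more than one value by storing a tuple. A minimal sketch, assuming every data line has exactly three whitespace-separated fields with numeric part number and price; parts_as_tuples is a hypothetical name, and no modules are imported.

```python
def parts_as_tuples(lines):
    """Build {part_no: (description, price)} from 'part_no description price' lines."""
    parts = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # not a three-field data line
        part_no, description, price = fields
        parts[int(part_no)] = (description, int(price))
    return parts


sample = [
    "453 Sperving_Bearing 9900\n",
    "1342 Panametric_Fan 23400\n",
    "9480 Converter_Exchange 93859\n",
]
parts = parts_as_tuples(sample)
description, price = parts[453]  # unpack the tuple stored under the key
print(description, price)
```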
Here's my stab at it, tested on Python 2.x/3.x:
import re
def str2dict(filename="temp.txt"):
    results = {}
    with open(filename, "r") as cache:
        # read file into a list of lines
        lines = cache.readlines()
        # loop through lines
        for line in lines:
            # skip lines starting with "--".
            if not line.startswith("--"):
                # replace runs of two or more whitespace characters (\s) with a tab (\t),
                # strip the trailing newline (\n), split into list using
                # "\t" as the split pattern
                line = re.sub(r"\s\s+", "\t", line).strip().split("\t")
                # use first item in list for the key, join remaining list items
                # with ", " for the value.
                results[line[0]] = ", ".join(line[1:])
    return results
print(str2dict("temp.txt"))
You should store the values as a list or a tuple. Something like this:
textname = input("Enter a file name: ")
thetextfile = open(textname, 'r')
print("The file has been successfully opened!")
lines = thetextfile.read().splitlines()  # one string per line of the file
thetextfile.close()
wordlist = {}
for c in lines:
    fields = c.split()
    if fields:  # skip blank lines
        wordlist[fields[0]] = fields[1:]
Your file should look like this:
Part no.;Description,Price
453;Sperving_Bearing,9900
1342;Panametric_Fan,23400
9480;Converter_Exchange,93859
Then you just need to add a bit of code:
import collections
import csv
with open('your_file.txt', 'r') as f:
    reader = csv.reader(f, delimiter=';')
    # an OrderedDict keeps the rows in file order
    d = collections.OrderedDict((row[0], row[1].strip()) for row in reader)
for x, y in d.items():
    print(x)
    print(y)

Making a dictionary from file, first word is key in each line then other four numbers are to be a tuple value

This dictionary is supposed to take the three-letter country code of a country, e.g. GRE for Great Britain, and then take the four consecutive numbers after it as a tuple. It should be something like this:
{GRE:(204,203,112,116)} and continue doing that for every single country in the list. The txt file goes down like so:
Country,Games,Gold,Silver,Bronze
AFG,13,0,0,2
ALG,15,5,2,8
ARG,40,18,24,28
ARM,10,1,2,9
ANZ,2,3,4,5 etc.;
This isn't actual code; I just wanted to show how the file is formatted.
I need my program to skip the first line because it's a header. Here's what my code looks like thus far:
def medals(goldMedals):
    infile = open(goldMedals, 'r')
    medalDict = {}
    for line in infile:
        if infile[line] != 0:
            key = line[0:3]
            value = line[3:].split(',')
            medalDict[key] = value
    print(medalDict)
    infile.close()
    return medalDict
medals('GoldMedals.txt')
Your for loop should be like:
next(infile)  # Skip the first line
for line in infile:
    words = line.split(',')
    medalDict[words[0]] = tuple(map(int, words[1:]))
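Put together as a whole function, the loop above might look like the sketch below. It assumes the comma-separated layout shown in the question, with one header line followed by rows of five fields; medals_from_lines is a hypothetical name, and it takes any iterable of lines (an open file works too) so that it can be tried without a file.

```python
def medals_from_lines(lines):
    """Build {country_code: (games, gold, silver, bronze)}, skipping the header line."""
    rows = iter(lines)
    next(rows, None)  # skip the header; the default avoids an error on empty input
    medal_dict = {}
    for line in rows:
        words = line.strip().split(',')
        if len(words) != 5:
            continue  # blank or malformed line
        medal_dict[words[0]] = tuple(map(int, words[1:]))
    return medal_dict


sample = [
    "Country,Games,Gold,Silver,Bronze\n",
    "AFG,13,0,0,2\n",
    "ALG,15,5,2,8\n",
    "ARG,40,18,24,28\n",
]
medal_dict = medals_from_lines(sample)
print(medal_dict['ALG'])
```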
A variation on a theme, I'd convert all the remaining cols to ints, and I'd use a namedtuple:
from collections import namedtuple
with open('file.txt') as fin:
    # The first line names the columns
    lines = iter(fin)
    columns = next(lines).strip().split(',')
    row = namedtuple('Row', columns[1:])
    results = {}
    for line in lines:
        columns = line.strip().split(',')
        results[columns[0]] = row(*(int(c) for c in columns[1:]))
# Results is now a dict to named tuples
This has the nice feature of 1) skipping the first line and 2) providing both offset and named access to the rows:
# These both work to return the 'Games' column
results['ALG'].Games
results['ALG'][0]
with open('path/to/file') as infile:
    next(infile)  # skip the header line, whose fields cannot be converted to int
    answer = {}
    for line in infile:
        k, v = line.strip().split(',', 1)
        answer[k] = tuple(int(i) for i in v.split(','))
I think inspectorG4dget's answer is the most readable... but for those playing code golf:
with open('medals.txt', 'r') as infile:
    headers = infile.readline()
    # note: the values stay strings here; nothing is converted to int
    medals = dict([(i[0], tuple(i[1:])) for i in [list(line.strip().split(',')) for line in infile]])
