I need some help narrowing down an issue. I have a CSV file which I have converted into a dictionary.
CSV format:
switch1,10.222.197.8/29,255.255.255.248,ab14-host_mini,100,a05-02,aad
ab14-app2-1-aad.pps.ddc.net,10.222.197.10,255.255.255.248,ab14-host_mini,100,a05-02,aad
ab14-app2-3-aad.pps.ddc.net,10.222.197.11,255.255.255.248,ab14-host_mini,100,a05-02,aad
ab14-cbatch2-1-aad.pps.ddc.net,10.222.197.12,255.255.255.248,ab14-host_mini,100,a05-02,aad
switch2,10.222.197.24/29,255.255.255.248,ab14-host_mini,100,b05-02,aad
vl100--asw1-b05-02-aad.net.ddc.net,10.222.197.25,255.255.255.248,ab14-host_mini,100,b05-02,aad
ab14-app2-4-aad.pps.ddc.net,10.222.197.26,255.255.255.248,ab14-host_mini,100,b05-02,aad
ab14-app2-5-aad.pps.ddc.net,10.222.197.27,255.255.255.248,ab14-host_mini,100,b05-02,aad
switch3,10.222.197.0/29,255.255.255.248,ab14-host_mini,100,b09-02,aad
vl100--asw1-b09-02-aad.net.ddc.net,10.222.197.1,255.255.255.248,ab14-host_mini,100,b09-02,aad
ab14-app1-2-aad.pps.ddc.net,10.222.197.2,255.255.255.248,ab14-host_mini,100,b09-02,aad
ab14-app1-5-aad.pps.ddc.net,10.222.197.3,255.255.255.248,ab14-host_mini,100,b09-02,aad
ab14-cbatch1-1-aad.pps.ddc.net,10.222.197.4,255.255.255.248,ab14-host_mini,100,b09-02,aad
switch4,10.222.197.32/29,255.255.255.248,ab14-host_mini,100,b14-02,aad
vl100--asw1-b14-02-aad.net.ddc.net,10.222.197.33,255.255.255.248,ab14-host_mini,100,b14-02,aad
ab14-app2-2-aad.pps.ddc.net,10.222.197.34,255.255.255.248,ab14-host_mini,100,b14-02,aad
ab14-cbatch2-2-aad.pps.ddc.net,10.222.197.35,255.255.255.248,ab14-host_mini,100,b14-02,aad
switch5,10.222.197.40/29,255.255.255.248,ab14-host_mini,100,c12-02,aad
vl100--asw1-c12-02-aad.net.ddc.net,10.222.197.41,255.255.255.248,ab14-host_mini,100,c12-02,aad
ab14-app1-1-aad.pps.ddc.net,10.222.197.42,255.255.255.248,ab14-host_mini,100,c12-02,aad
ab14-dapp1-1-aad.pps.ddc.net,10.222.197.43,255.255.255.248,ab14-host_mini,100,c12-02,aad
vl112--asw1-a01-01-aad.net.ddc.net,10.222.250.241,255.255.255.248,aad-fdc,112,a01-01,aad
cs97-fdc2-20-aad.pps.ddc.net,10.222.250.242,255.255.255.248,aad-fdc,112,a01-01,aad
cs97-fdc2-22-aad.pps.ddc.net,10.222.250.243,255.255.255.248,aad-fdc,112,a01-01,aad
switch6,10.222.162.32/27,255.255.255.224,aad-fdc,101,a02-01,aad
vl101--asw1-a02-01-aad.net.ddc.net,10.222.162.33,255.255.255.224,aad-fdc,101,a02-01,aad
cs77-fdc2-9-aad.pps.ddc.net,10.222.162.62,255.255.255.224,aad-fdc,101,a02-01,aad
cs92-fdc2-2-aad.pps.ddc.net,10.222.162.34,255.255.255.224,aad-fdc,101,a02-01,aad
cs95-fdc2-2-aad.pps.ddc.net,10.222.162.35,255.255.255.224,aad-fdc,101,a02-01,aad
I have converted this into a dictionary using the logic below.
with open("pam.csv") as f:
reader = csv.reader(f)
mydict = {rows[0]: rows[1] for rows in reader}
print(mydict)
Now I want to fetch the switch name by giving an IP address. I have written the logic to get that, but I am getting the server name instead.
For example: when I give the input 10.222.197.4 to the logic below, I get the server name ab14-cbatch1-1-aad.pps.ddc.net.
How do I fetch the switch name for that server IP? In this case I should get switch3.
src = "10.222.197.4"
for key, value in mydict.items():
if src in value:
# key = mydict[src]
print("found")
print(key)
Output:
found
ab14-cbatch1-1-aad.pps.ddc.net
Just add one more condition to match the switch name pattern:
src = "10.222.197.4"
for key, value in mydict.items():
if src in value and "switch" in key:
# key = mydict[src]
print("found")
print(key)
Same basic idea as @Zircoz, but another approach would be to create a data structure optimized for your search cases in the first place:
import csv
from pprint import pprint
import re

switch_dict = {}
ip_pattern = r"(\d{1,3}\.?){4}"

with open("pam.csv") as file:
    reader = csv.reader(file)
    for row in reader:
        if "switch" in row[0]:
            ip_address = re.match(ip_pattern, row[1])[0]
            switch_dict[ip_address] = row[0]

pprint(switch_dict)
Output:
{'10.222.162.32': 'switch6',
'10.222.197.0': 'switch3',
'10.222.197.24': 'switch2',
'10.222.197.32': 'switch4',
'10.222.197.40': 'switch5',
'10.222.197.8': 'switch1'}
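To complete the original lookup (server IP 10.222.197.4 should give switch3), one hedged option, not part of the answer above, is to key a similar mapping on the whole subnet and test membership with Python's ipaddress module. This is only a sketch; switch_networks and find_switch are illustrative names:

import csv
import ipaddress

# Map each switch row's network column (e.g. "10.222.197.0/29") to the switch name.
switch_networks = {}
with open("pam.csv") as f:
    for row in csv.reader(f):
        if "switch" in row[0]:
            switch_networks[ipaddress.ip_network(row[1], strict=False)] = row[0]

def find_switch(ip):
    """Return the switch whose subnet contains the given host IP, or None."""
    addr = ipaddress.ip_address(ip)
    for network, switch in switch_networks.items():
        if addr in network:
            return switch
    return None

print(find_switch("10.222.197.4"))  # expected: switch3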
Here is the code, and I am trying to see what I have done wrong. I am new to Python functions and linking external files, so it would be nice if you could explain your code.
def get_data(filename):
    records = []
    with open(filename) as readfile:
        lines = readfile.readlines()
        for line in lines:
            # variable line contains:
            str_rec = line.split(",")
            pname = str_rec[0]
            price = int(str_rec[1])
            quantity = int(str_rec[2])
            records.append([pname, price, quantity])
    #caution: indentation
    return records

hell= get_data(data.txt)
print(hell)
data.txt is a link to another file that I am trying to pass as an argument.
open(filename) takes the filename as a string, so you should pass the name as a string, not the actual file.
hell= get_data("data.txt")
I'm new to the Bonobo library and built a simple flow:
read a simple CSV called input.csv with header : Header1, Header2, Header3, Header4
append a new column which is the concatenation of the others
write the result to a CSV file called output.csv
I'm using the built-in CsvReader and CsvWriter from bonobo to make it simple.
First I was stuck with the CsvReader not sending the headers along with the cells, and a suggested workaround was adding the
@use_raw_input
decorator to the transformation coming right after the CsvReader. But when passing content to the next activity, the bag once again loses its header and is seen as a simple tuple. It only works if I explicitly name the fields:
def process_rows(Header1, Header2, Header3, Header4)
My code is below (put a breakpoint in process_rows to see that you get a tuple without the header):
import bonobo
from bonobo.config import use_raw_input
# region constants
INPUT_PATH = 'input.csv'
OUTPUT_PATH = 'output.csv'
EXPECTED_HEADER = ('Header1', 'Header2', 'Header3', 'Header4')
# endregion constants
# This is stupid because all rows are checked instead of only the first
@use_raw_input  # mandatory to get the header
def validate_header(input):
    if input._fields != EXPECTED_HEADER:
        raise ValueError("This file has an unexpected header, won't be processed")
    yield input

def process_rows(*input):
    concat = ""
    for elem in input:
        concat += elem
    result = input.__add__((concat,))
    yield result
# region bonobo + main
def get_graph(**options):
    graph = bonobo.Graph()
    graph.add_chain(bonobo.CsvReader(INPUT_PATH, delimiter=','),
                    validate_header,
                    process_rows,
                    bonobo.CsvWriter(OUTPUT_PATH))
    return graph

def get_services(**options):
    return {}

if __name__ == '__main__':
    parser = bonobo.get_argument_parser()
    with bonobo.parse_args(parser) as options:
        bonobo.run(
            get_graph(**options),
            services=get_services(**options)
        )
# endregion bonobo + main
Thanks for your time and help!
I did some investigating and found this "FUTURE" document that I think is what you are after:
http://docs.bonobo-project.org/en/master/guide/future/transformations.html
But it is not implemented.
I found this similar question Why does Bonobo's CsvReader() method yield tuples and not dicts?
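Until something like that lands, one possible workaround is to pair each incoming tuple with the expected column names so later steps can work with a dict. This is a minimal sketch, assuming the header is known up front as with the EXPECTED_HEADER tuple in the question; rows_as_dicts is an illustrative name, not an existing Bonobo helper:

EXPECTED_HEADER = ('Header1', 'Header2', 'Header3', 'Header4')

def rows_as_dicts(*row):
    # Assumes the cells arrive in the same order as EXPECTED_HEADER.
    yield dict(zip(EXPECTED_HEADER, row))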
I wish to have the first field (Username) from File1 and the second field (Password) from File2 output into a third file which is created during the function, but I am unable to do it. :(
The format of the files will always be the same, which is:
File 1:
Username:DOB:Firstname:Lastname:::
File2:
Lastname:Password
My current code:
def merge(f1,f2,f3):
    with open(f3, "a") as outputFile:
        with open(f1) as usernameFile:
            for line in usernameFile:
                line = line[:-3]
                username = line.split(':')
                outputFile.write(username[0])
        with open(f2) as passwordFile:
            for line in passwordFile:
                password = line.split(':')
                outputFile.write(password[1])
merge('file1.txt', 'file2.txt', 'output.txt')
I want the Username from File1 and the Password from File2 to be written to File3 with the layout:
Username:Password
Username:Password
Username:Password
Any help would be appreciated. :)
If the files are identically sorted (i.e. the users appear in the same order in both files), use the tip in this answer to iterate over both files at the same time, rather than one after the other as in your example.
from itertools import izip

with open(f3, "a") as outputFile:
    for line_from_f1, line_from_f2 in izip(open(f1), open(f2)):
        username = line_from_f1.split(':')[0]
        password = line_from_f2.split(':')[1]  # password comes from the second file
        outputFile.write("%s:%s" % (username, password))
If the files are not identically sorted, first create a dictionary with keys lastname and values username from file1. Then create a second dictionary with keys lastname and values password from file2. Then iterate over the keys of either dict and print both values.
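A minimal sketch of that dictionary approach, assuming the File1/File2 layouts shown in the question (the variable names here are illustrative):

usernames = {}  # lastname -> username, built from file1
with open(f1) as userfile:
    for line in userfile:
        fields = line.strip().split(':')
        usernames[fields[3]] = fields[0]

passwords = {}  # lastname -> password, built from file2
with open(f2) as pwfile:
    for line in pwfile:
        lastname, password = line.strip().split(':')
        passwords[lastname] = password

with open(f3, "w") as outputFile:
    for lastname, username in usernames.items():
        outputFile.write("%s:%s\n" % (username, passwords[lastname]))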
This is the minimum change that you would need to do to your code to make it work:
def merge(f1,f2,f3):
    with open(f3, "a") as outputFile:
        with open(f1) as usernameFile:
            for line in usernameFile:
                username = line.split(':')[0]
                lastname = line.split(':')[3]
                outputFile.write(username)
                with open(f2) as passwordFile:
                    for line in passwordFile:
                        lN, password = line.split(':')
                        # the password still carries the newline read from f2
                        if lN == lastname: outputFile.write(':' + password)

merge('file1.txt', 'file2.txt', 'output.txt')
However, this method isn't very good because it reads a file multiple times. I would go ahead and make a dictionary for the second file, with the lastname as a key. Dictionaries are very helpful in these situations. The dictionary can be made a priori as follows:
def makeDict(f2):
    dOut = {}
    with open(f2) as f:
        for l in f:
            dOut[l.split(':')[0]] = l.strip().split(':')[1]  # strip the trailing newline
    return dOut

def merge(f1,f2,f3):
    pwd = makeDict(f2)
    print pwd
    with open(f3, "a") as outputFile:
        with open(f1) as usernameFile:
            for line in usernameFile:
                if line.strip() == '': continue
                username = line.split(':')[0]
                lastname = line.split(':')[3]
                if lastname in pwd:
                    outputFile.write(username + ':' + pwd[lastname] + '\n')

merge('f1.txt', 'f2.txt', 'f3.txt')
I just ran the program above using the following files:
f1.txt
Username0:DOB:Firstname:Lastname0:::
Username1:DOB:Firstname:Lastname1:::
Username2:DOB:Firstname:Lastname2:::
Username3:DOB:Firstname:Lastname3:::
f2.txt
Lastname0:Password0
Lastname1:Password1
Lastname2:Password2
Lastname3:Password3
and got the output:
Username0:Password0
Username1:Password1
Username2:Password2
Username3:Password3
I did add the last line merge(...) and another line which is used to skip blank lines in the input text, but otherwise everything should be fine. There won't be any output if the merge(...) function isn't called.
Abstract the data extraction from the file i/o, then you can re-use merge() with different extraction functions.
import itertools as it
from operator import itemgetter
from contextlib import contextmanager

def extract(foo):
    """Extract username and password, compose and return output string

    foo is a tuple or list
    returns str

    >>> len(foo) == 2
    True
    """
    username = itemgetter(0)
    password = itemgetter(1)
    formatstring = '{}:{}\n'
    item1, item2 = foo
    item1 = item1.strip().split(':')
    item2 = item2.strip().split(':')
    return formatstring.format(username(item1), password(item2))

@contextmanager
def files_iterator(files):
    """Yields an iterator that produces lines synchronously from each file

    Intended to be used with contextlib.contextmanager decorator.
    yields an itertools.izip object
    files is a list or tuple of file paths - str
    """
    files = map(open, files)
    try:
        yield it.izip(*files)
    finally:
        for file in files:
            file.close()

def merge(in_files, out_file, extract):
    """Create a new file with data extracted from multiple files.

    Data is extracted from the same/equivalent line of each file:
    i.e. File1Line1, File2Line1, File3Line1
         File1Line2, File2Line2, File3Line2

    in_files --> list or tuple of str, file paths
    out_file --> str, filepath
    extract --> function that returns list or tuple of extracted data
    returns none
    """
    with files_iterator(in_files) as files, open(out_file, 'w') as out:
        out.writelines(map(extract, files))
##        out.writelines(extract(lines) for lines in files)

merge(['file1.txt', 'file2.txt'], 'file3.txt', extract)
files_iterator is a with-statement context manager that allows multiple synchronous file iteration and ensures the files will be closed. Here is a good start for reading: Understanding Python's "with" statement.
I would recommend building two dictionaries to represent the data in each file, then write File3 based on that structure:
d1 = {}
with open("File1.txt", 'r') as f:
    for line in f:
        d1[line.split(':')[3]] = line.split(':')[0]

d2 = {}
with open("File2.txt", 'r') as f:
    for line in f:
        d2[line.split(':')[0]] = line.strip().split(':')[1]  # strip the trailing newline
This will give you two dictionaries that look like this:
d1 = {Lastname: Username}
d2 = {Lastname: Password}
To then write this to File3, simply run through the keys of either dictionary:
with open("File3.txt", 'w') as f:
for key in d1:
f.write("{}:{}\n".format(d1[key], d2[key]))
Some things to note:
If the files don't have all the same values, you'll need to throw in some handling for that (a minimal sketch follows this list; let me know if this is the case and I can toss a few more ideas your way)
This approach does not preserve any order the files were in
The code assumes that all lines are of the same format. A more complicated file will need some code to handle "odd" lines
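For the first note, here is a minimal sketch of that handling, assuming you want a blank password whenever a lastname from File1 has no match in File2 (it reuses d1 and d2 from above):

with open("File3.txt", 'w') as f:
    for key in d1:
        # dict.get avoids a KeyError when a lastname is missing from File2
        f.write("{}:{}\n".format(d1[key], d2.get(key, "")))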
It's fine to avoid this if you have identically sorted rows in each file. But if it gets any more complicated than that, you should be using pandas for this. With pandas you can essentially do a join, so no matter how the rows are ordered in each file, this will work. It's also very concise.
import pandas as pd

df1 = pd.read_csv(f1, sep=':', header=None).iloc[:, [0, 3]]
df1.columns = ['username', 'lastname']

df2 = pd.read_csv(f2, sep=':', header=None)
df2.columns = ['lastname', 'password']

df3 = pd.merge(df1, df2)[['username', 'password']]
df3.to_csv(f3, header=False, index=False, sep=':')
Note that you will also have the option to do outer joins. This is useful, if for some reason, there are usernames without passwords or vice versa in your files.
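For example, a hedged variant of the merge line above that keeps unmatched rows from both files (missing values come through as NaN):

df3 = pd.merge(df1, df2, how='outer')[['username', 'password']]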
This is pretty close. Be sure there is no blank line at the end of the input files, or add code to skip blank lines when you read (a minimal sketch of that follows the code below).
#!/usr/bin/env python
"""
File 1:
Username:DOB:Firstname:Lastname:::
File2:
Lastname:Password
File3:
Username:Password
"""

def merge(f1, f2, f3):
    username_lastname = {}
    with open(f3, "a") as outputFile:
        with open(f1) as usernameFile:
            for line in usernameFile:
                user = line.strip().split(':')
                print user
                username_lastname[user[3]] = user[0]  # dict with Lastname as key, Username as value
        print username_lastname
        with open(f2) as passwordFile:
            for line in passwordFile:
                lastname_password = line.strip().split(':')
                print lastname_password
                password = lastname_password[1]
                username = username_lastname[lastname_password[0]]
                print username, password
                out_line = "%s:%s\n" % (username, password)
                outputFile.write(out_line)
    outputFile.close()

merge('f1.txt', 'f2.txt', 'output.txt')
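If blank lines might appear in the input files, a minimal guard (just one possible way to do it) is to skip them inside each reading loop, for example:

for line in usernameFile:
    if not line.strip():
        continue  # skip blank lines
    user = line.strip().split(':')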
f1:
Username1:DOB:Firstname:Lastname1:::
Username2:DOB:Firstname:Lastname2:::
Username3:DOB:Firstname:Lastname3:::
f2:
Lastname1:Password1
Lastname2:Password2
Lastname3:Password3
f3:
Username1:Password1
Username2:Password2
Username3:Password3
I am making a flashcard program in which I take a text file that contains several columns, such as English word, French equivalent, gender, type of word, etc. My idea was to create a loop that reads each line of the text file, separating by tabs, and makes an instance of a user-defined Word object for each line.
In the following block of code I import the text file, process it into a list, then attempt to create an instance of a previously defined object: Word. I would like the object to have the second item on the list for its name so that it is easily searchable, but it's not letting me do this. Please can somebody help me with the code:
file = (open('dictionary.txt', 'r')).readline()
import re
line_list = re.split(r'\t', file.rstrip('\n'))
line_list[1] = Word(line_list[0], line_list[1], line_list[2], line_list[3])
Create a dict of instances and use the second item of the lists as key. It's a bad idea to create dynamic variables.
import re

instance_dict = {}
with open('dictionary.txt') as f:
    for line in f:
        line_list = re.split(r'\t', line.rstrip('\n'))
        instance_dict[line_list[1]] = Word(*line_list[:4])
Why the with statement?
It is good practice to use the with keyword when dealing with file
objects. This has the advantage that the file is properly closed after
its suite finishes, even if an exception is raised on the way.
You can also use the csv module:
import csv

instances = {}
with open('dictionary.txt', 'rb') as f:
    reader = csv.reader(f, delimiter='\t')
    instances = {line[1]: Word(*line) for line in reader}
Here's a cleaner solution using a namedtuple. You'll end up with a dict called "words" which you can use to look up each word by name.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import pprint
from collections import namedtuple

Word = namedtuple('Word', ['name', 'french', 'gender', 'type_'])

words = {}
with open('dictionary.txt', 'rU') as fin:
    for word in (Word(*r.rstrip('\n').split('\t')) for r in fin):
        words[word.name] = word

pprint.pprint(words)
Firstly, it's better to use with ... as statements to get input from files, as the closing procedures are automatically taken care of. Secondly, to read ALL of the lines from a file, you must use readlines() rather than readline(). Try something like this:
import re

with open('dictionary.txt', 'r') as file:
    line_list = file.readlines()

splitLineList = []
for lines in line_list:
    splitLineList.append(re.split(r'\t', lines.strip('\n')))
You may have an appropriate solution, depending on a few clarifications of your requirements.
"My idea was to create a loop that read each line of the text file,
separating by tabs, and"
If the text file is already pre-validated or reliable enough that error handling can be ignored (e.g. for lines not evenly separated by single tabs), then:
with open('dictionary.txt', 'r') as f:
    [line.strip().split("\t")
     for line in f.read().split("\n")
     if line.strip()]
will get you the (comprehensive) list required to create Word object instances, without using re
"then attempt to create an instance of a previously defined object:
Word."
with open('dictionary.txt', 'r') as f:
    [Word(line.strip().split("\t"))
     for line in f.read().split("\n")
     if line.strip()]
"I would like the object to have the second item on the list for it's
name so that it is easily searchable,"
Can you rewrite this with an example?
but it's not letting me do this,
line_list[1] = Word(line_list[0], line_list[1], line_list[2], line_list[3])
Sorry, I am losing you here; why are you using line_list[1] to refer to newly created Word instances when line_list[1] itself is an argument?
With your clarification, I would have something like this
Reworked Code:
from pprint import pprint
My assumption about your class definition:
class Word():
    def __init__(self, **kwargs):
        self.set_attrs(**kwargs)

    def __call__(self):
        return self.get_attr("swedish_word")

    def set_attrs(self, **kwargs):
        for k, v in kwargs.iteritems():
            setattr(self, k, v)

    def get_attr(self, attr):
        return getattr(self, attr)

    def get_attrs(self):
        return ({attr.upper(): getattr(self, attr) for attr in self.__dict__.keys()})

    def print_attrs(self):
        pprint(self.get_attrs())

if __name__ == '__main__':
    # sample entries in dictionary.txt
    # swedish_word english_word article word_type
    # hund dog ett noun
    # katt cat ett noun
    # sova sleep ett verb
    with open('dictionary.txt', 'r') as f:
        header = f.readline().strip().split("\t")
        instances = [Word(**dict(zip(header, line.strip().split("\t"))))
                     for line in f.read().split("\n")
                     if line.strip()]
        # for line in f.read().split("\n"):
        #     data = dict(zip(header, line.strip().split("\t")))
        #     w = Word(**data)
You can get instance properties for a given swedish_word like this
def print_swedish_word_properties(swedish_word):
    for instance in instances:
        if instance() == swedish_word:
            print "Properties for Swedish Word:", swedish_word
            instance.print_attrs()

print_swedish_word_properties("hund")
to have output like this
Properties for Swedish Word: hund
{'ARTICLE': 'ett',
'ENGLISH_WORD': 'dog',
'SWEDISH_WORD': 'hund',
'WORD_TYPE': 'noun'}
or you can use any other class methods to search instances on various attributes
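For instance, here is a short sketch (find_by_english_word is an illustrative helper, not part of the original code) that uses the get_attr method defined above to search the same instances on the english_word attribute instead:

def find_by_english_word(english_word):
    # Print the properties of every word whose English translation matches.
    for instance in instances:
        if instance.get_attr("english_word") == english_word:
            instance.print_attrs()

find_by_english_word("dog")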