I just want to check whether there is a better way of doing this than what I came up with.
I need to parse a .py file; more precisely, I have to look for a specific list named id_list that contains several integers. The numbers can be written in several formats.
For example:
id_list = [123456, 789123, 456789]
id_list = [ 123456,
789123,
456789 ]
id_list = [ 123456
,789123
,456789 ]
What I came up with works just fine, but for the sake of perfectionism I want to know if there is a "smoother" way of doing it.
with open(filepath, 'r') as input_file:
    parsed_string = ''
    id_detected = False
    start_parsing = False
    for line in input_file:
        if 'id_list' in line:
            id_detected = True
        if id_detected:
            for char in line:
                if char == '[':
                    start_parsing = True
                if start_parsing and char != '\n':
                    parsed_string += char
                if char == ']':
                    id_detected = False
                    start_parsing = False
                    break
After that is done, I just filter parsed_string:
new_string = "".join(filter(lambda char: char.isdigit() or char == ',', parsed_string))
That gives me a string containing only the numbers and commas: 123456,789123,456789
So, to wrap this up: is there anything I could improve?
You can use a regular expression to solve this:

import re

with open(filepath, 'r') as input_file:
    text = input_file.read()

match = re.search(r'id_list\s*=\s*\[(.*?)\]', text, flags=re.DOTALL)
if match is None:
    print "Not found"
else:
    id_list_str = match.group(1)
    id_list = map(int, id_list_str.split(','))
    print id_list
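Another option (not in the original answer, just a sketch): let Python's own parser do the work with the ast module, which copes with any legal formatting of the literal:

```python
import ast

# Sample source text standing in for the .py file's contents
source = "id_list = [ 123456\n,789123\n,456789 ]"

id_list = None
for node in ast.walk(ast.parse(source)):
    # Find an assignment to a name called 'id_list' and evaluate its literal value
    if isinstance(node, ast.Assign):
        for target in node.targets:
            if isinstance(target, ast.Name) and target.id == 'id_list':
                id_list = ast.literal_eval(node.value)

print(id_list)  # [123456, 789123, 456789]
```

Unlike the regex, this also copes with nested brackets or comments inside the list.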
Just use import / from.
If you don't want to import the whole Python file, import only the elements you need.
Example:
from filename import id_list
(Note there is no .py in the import statement, and the file has to be importable, i.e. on sys.path.)
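If the file can live anywhere on disk, a sketch using importlib (Python 3; the module name idmod and the temp file are just for the demo):

```python
import importlib.util
import os
import tempfile

# Write a throwaway .py file to import from (demo only)
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, 'idmod.py')
with open(path, 'w') as f:
    f.write('id_list = [123456, 789123, 456789]\n')

# Load the file as a module by path; no sys.path changes needed
spec = importlib.util.spec_from_file_location('idmod', path)
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)

print(mod.id_list)  # [123456, 789123, 456789]
```

Bear in mind this executes the whole file, so it only makes sense for files you trust.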
I am working with files right now, and I want to extract the text between brackets. This is what I mean by getting text from brackets:
{
this is text for a
this is text for a
this is text for a
this is text for a
}
[
this is text for b
this is text for b
this is text for b
this is text for b
]
The content of a is the four "this is text for a" lines, and the content of b is the four "this is text for b" lines.
My code does not print the contents of a properly; it shows both a and b from my file.
My code:
with open('file.txt', 'r') as read_obj:
    for line in read_obj.readlines():
        var = line[line.find("{")+1:line.rfind("}")]
        print(var)
iterate over the file
for each line check the first character
if the first character is either '[' or '{' start accumulating lines
if the first character is either ']' or '}' stop accumulating lines
a_s = []
b_s = []
capture = False
group = None
with open(path) as f:
    for line in f:
        if line[0] in '{[':
            capture = True
            group = a_s if line[0] == '{' else b_s
        elif line[0] in '}]':
            capture = False
            group = None
        elif capture:
            group.append(line)
print(a_s)
print(b_s)
This relies on the file being structured exactly as shown in the example.
This is what regular expressions are made for. Python has a built-in module named re to perform regular expression queries.
In your case, simply:
import re

fname = "foo.txt"

# Read data into a single string
with open(fname, "r") as f:
    data = f.read()

# Remove newline characters from the string
data = re.sub(r"\n", "", data)

# Define pattern for type A
pattern_a = r"\{(.*?)\}"

# Print text of type A
print(re.findall(pattern_a, data))

# Define pattern for type B
pattern_b = r"\[(.*?)\]"

# Print text of type B
print(re.findall(pattern_b, data))
Output:
['this is text for athis is text for athis is text for athis is text for a']
['this is text for bthis is text for bthis is text for bthis is text for b']
Read the file and split the content into a list of lines.
Define a list of brackets, skip those lines in a loop, and write the rest to a file.

with open("content.txt", 'r') as file_obj:
    content_list = file_obj.read().splitlines()

brackets = ['[', ']', '{', '}']
with open("new_content.txt", 'a') as writer:
    for i in content_list:
        if i not in brackets:
            writer.write(i + '\n')
flag = 0
with open('D:\\Tests 1\\t1.txt', 'r') as f1:
    for line in f1.readlines():
        # note: str.find returns -1 when not found, which is truthy,
        # so startswith is the right check here
        if line.startswith('{') or line.startswith('['):
            flag = 1
            continue
        elif line.startswith('}') or line.startswith(']'):
            flag = 0
        if flag == 1:
            print(line.split('\n')[0])
I'm trying to get my program to split lines from a file into 3 fields and then apply "if row1 == x:" to add to an existing class. That's not my problem; I've gotten it to work, except for when row1 is ''. So I tried changing the input file so it was ' ', then '*', and 'k' (and so on); nothing worked.
The thing is that most lines in the input file read 1234565,'streetadress1','streetadress2', but for some lines there is no streetadress1, only ''. Yet the program has no problem identifying the number or 'streetadress2'.
class adress(object):
    def __init__(self, street, ykord, xkord):
        self.street = street
        self.ykord = ykord
        self.xkord = xkord
        self.connected = []
        self.anlid = []
        self.distances = []
        self.parent = []
        self.child = []

    def set_connections(self):
        input_file = open("kopplingar2.txt")
        temp = input_file.read().splitlines()
        for l in temp:
            row = l.split(',')
            identity = row[0]
            streetA = row[1]
            streetB = row[2]
            if streetA == self.street:
                diction = {'street': streetB, 'identity': identity}
                self.child.append(diction)
            elif streetA == '':
                self.anlid.append(identity)
                print 'poop!'
            elif streetB == self.street and streetA != '':
                diction = {'street': streetA, 'identity': identity}
                self.parent.append(diction)
                print streetA
The print 'poop!' is just to see if that branch ever occurs, but it doesn't. There should be about 400 lines of poop as a result, since about 75% of the lines in the input file contain ''.
I have no idea why it works for the other fields but not for row1 (except that row1 is sometimes '' instead of a full string).
'' is an empty string in Python. If you need to compare a value with a string consisting of two apostrophe characters, you need to write streetA == "''".
As @yole said, you need to compare with "''". If, for example, one line in the file is 123,'','streetB', then l would be "123,'','streetB'" and what you get is:
>>> l="123,'','streetB'"
>>> l.split(',')
['123', "''", "'streetB'"]
>>>
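If you'd rather strip the quote characters while parsing, a sketch using the csv module with quotechar set to an apostrophe (my suggestion, not from the original answer):

```python
import csv
import io

line = "123,'','streetB'"
# Treat ' as the quote character, so '' becomes a genuinely empty field
row = next(csv.reader(io.StringIO(line), quotechar="'"))
print(row)  # ['123', '', 'streetB']
```

Then streetA == '' works as the question originally expected.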
I'm using Python 2.7.6 and learning to use the csv module. When I parsed my input file to the CSV output file, my script somehow added unexpected double quotes and several spaces after the last element on each line. I could not remove those double quotes with a regular substitution.
The only way I could remove the extra double quote was:
tmp[1] = tmp[1][:-3]
I don't understand how the extra double quotes got added when I parsed my input. Please let me know why or how those double quotes were added to the partno when the input file did not have them.
My code:
import re
import csv

fname = "inputfile"
try:
    fhand = open(fname)
except:
    print 'File cannot be opened:', fname
    exit()

domain = []
server = []
model = []
serial = []
dn = []
memsize = []
vid = []
partno = []

csv_out = open('/tmp/out.csv', 'wb')
writer = csv.writer(csv_out)

for line in fhand:
    words = line.split("; ")
    tmp_row_list = []
    for number in [0, 1, 2, 3, 4, 5, 6, 7]:
        tmp = words[number].split("=")
        if "}" in tmp[1]:
            tmp[1] = tmp[1][:-3]
            #tmp[1] = re.sub('}', '', tmp[1])
        if number == 0: domain.append(tmp[1])
        if number == 1: server.append(tmp[1])
        if number == 2: model.append(tmp[1])
        if number == 3: serial.append(tmp[1])
        if number == 4: dn.append(tmp[1])
        if number == 5: memsize.append(tmp[1])
        if number == 6: vid.append(tmp[1])
        if number == 7: partno.append(tmp[1])

rows = zip(domain, server, model, serial, dn, memsize, vid, partno)
writer.writerows(rows)
csv_out.close()
Input file:
ffile:#{Ucs=uname; ServerId=4/6; Model=UCSB-B200-M3; Serial=FCH; AssignedToDn=; TotalMemory=98304; Vid=V06; PartNumber=73-14689-04}
ffile:#{Ucs=uname; ServerId=4/7; Model=UCSB-B200-M3; Serial=FCH; AssignedToDn=; TotalMemory=98304; Vid=V06; PartNumber=73-14689-04}
My bad output with the strange double quotes, before I removed them (if I uncomment the re.sub line instead, the bad output with double quotes and extra spaces shows up in the last field/element):
uname,4/6,UCSB-B200-M3,FCH,,98304,V06,"73-14689-04
"
uname,4/7,UCSB-B200-M3,FCH,,98304,V06,"73-14689-04
"
Looks like you can simplify that a lot.
Given input.txt as:
ffile:#{Ucs=uname; ServerId=4/6; Model=UCSB-B200-M3; Serial=FCH; AssignedToDn=; TotalMemory=98304; Vid=V06; PartNumber=73-14689-04}
ffile:#{Ucs=uname; ServerId=4/7; Model=UCSB-B200-M3; Serial=FCH; AssignedToDn=; TotalMemory=98304; Vid=V06; PartNumber=73-14689-04}
Then using the following:
import re, csv

get_col_vals = re.compile(r'(?:\w+)=(.*?)[;}]').findall

with open('input.txt') as fin, open('output.csv', 'wb') as fout:
    csvout = csv.writer(fout, quoting=csv.QUOTE_NONE)
    csvout.writerows(get_col_vals(row) for row in fin)
The resulting output.csv is:
uname,4/6,UCSB-B200-M3,FCH,,98304,V06,73-14689-04
uname,4/7,UCSB-B200-M3,FCH,,98304,V06,73-14689-04
I have a function where conversion is a list and filename is the name of an input file. I want to write 5 characters to a line, add a newline, write the next 5 characters, and so on, until there is nothing left to write from the list conversion to the file filename. How can I do that?
From what I think I understand about your question, you may need code like this:

import os

def write_data_file(your_collection, data_file_path):
    """Writes data file from memory to previously specified path."""
    the_string = ''
    for row in your_collection:
        for i, c in enumerate(row):
            the_string += c
            if (i + 1) % 5 == 0:
                the_string += os.linesep
    with open(data_file_path, 'w') as df:
        df.write(the_string)

my_collection = [
    "This is more than five characters",
    "This is definitely more than five characters",
    "NOT",
    "NADA"
]

write_data_file(my_collection, '/your/file/path.txt')

Other than that, you may need to clarify what you are asking. This snippet loops through a collection (like a list), then loops over the string at each position in the collection, adding a newline whenever the 5-character limit has been reached. Note that it restarts the count for each string rather than running across the whole collection.
def foo(conversion, filename):
    with open(filename, "a") as f:
        line = ""
        for s in conversion:
            for c in s:
                if len(line) < 5:
                    line += c
                else:
                    f.write(line + "\n")
                    line = c
        f.write(line)
Convert the list of strings into one large string, then loop through that string five characters at a time, inserting "\n" after each chunk, and write the result to a file. Here's a basic example of what I mean; it might need some tweaking, but it will give you the idea:
# concatenate all the strings for simplicity
big_string = ""
for single_string in conversion:
    big_string += single_string

# loop through using 5 characters at a time
string_with_newlines = ""
done = False
while not done:
    next_five_chars = big_string[:5]
    big_string = big_string[5:]
    if next_five_chars:
        string_with_newlines += next_five_chars + "\n"
    else:
        done = True

# write it all to file
with open(filename, "w") as your_file:
    your_file.write(string_with_newlines)
l = ["one", "two", "three", "four", "five"]
def write(conversion, filename):
string = "".join(conversion)
f = open(filename, "a")
for i in range(5, len(string) + 5, 5):
f.write(string[i-5:i])
f.write("\n")
if __name__ == '__main__':
write(l, "test.txt")
This create a file called "test.txt" with the content:
onetw
othre
efour
onetw
othre
efour
five
I made a function for grouping your list:
import itertools

def group(iterable, n):
    gen = (char for substr in iterable for char in substr)
    while True:
        part = ''.join(itertools.islice(gen, 0, n))
        if not part:
            break
        yield part
Presentation:
>>> l = ['DBTsiQoECGPPo', 'd', 'aDAuehlM', 'FbUnSuMLuEbHe', 'jRvARVZMn', 'SbGCi',
... 'jhI', 'Rpbd', 'uspffRvPiAmbQEoZDFAG', 'RIbHAcbREdqpMDX', 'bqVMrN', 'FtU',
... 'nufWcfjfmAaUtYtwNUBc', 'oZvk', 'EaytqdRkICuxqbPaPulCZlD', 'dVrZdidLeakPT',
... 'qttRfHeJJMOlJRMKBM', 'SAiBrdPblHtRGpjpZKuFLGza', 'RxrLgclVavoCmPkhR',
... 'YuulTYaNTLghUkKriOicMuUD']
>>> list(group(l, 5))
['DBTsi', 'QoECG', 'PPoda', 'DAueh', 'lMFbU', 'nSuML', 'uEbHe', 'jRvAR', 'VZMnS',
'bGCij', 'hIRpb', 'duspf', 'fRvPi', 'AmbQE', 'oZDFA', 'GRIbH', 'AcbRE', 'dqpMD',
'XbqVM', 'rNFtU', 'nufWc', 'fjfmA', 'aUtYt', 'wNUBc', 'oZvkE', 'aytqd', 'RkICu',
'xqbPa', 'PulCZ', 'lDdVr', 'ZdidL', 'eakPT', 'qttRf', 'HeJJM', 'OlJRM', 'KBMSA',
'iBrdP', 'blHtR', 'GpjpZ', 'KuFLG', 'zaRxr', 'LgclV', 'avoCm', 'PkhRY', 'uulTY',
'aNTLg', 'hUkKr', 'iOicM', 'uUD']
>>> '\n'.join(group(l, 5))
'DBTsi\nQoECG\nPPoda\nDAueh\nlMFbU\nnSuML\nuEbHe\njRvAR\nVZMnS\nbGCij\nhIRpb\nduspf\nfRvPi\nAmbQE\noZDFA\nGRIbH\nAcbRE\ndqpMD\nXbqVM\nrNFtU\nnufWc\nfjfmA\naUtYt\nwNUBc\noZvkE\naytqd\nRkICu\nxqbPa\nPulCZ\nlDdVr\nZdidL\neakPT\nqttRf\nHeJJM\nOlJRM\nKBMSA\niBrdP\nblHtR\nGpjpZ\nKuFLG\nzaRxr\nLgclV\navoCm\nPkhRY\nuulTY\naNTLg\nhUkKr\niOicM\nuUD'
Write the result of '\n'.join(group(l, 5)) to a file.
I have a file that looks like this
!--------------------------------------------------------------------------DISK
[DISK]
DIRECTION = 'OK'
TYPE = 'normal'
!------------------------------------------------------------------------CAPACITY
[CAPACITY]
code = 0
ID = 110
I want to read the sections [DISK] and [CAPACITY]; there will be more sections like these, and I want to read the parameters defined under each section.
I wrote the following code:

file_open = open(myFile, "r")
all_lines = file_open.readlines()
count = len(all_lines)
file_open.close()

my_data = {}
section = None
data = ""
for line in all_lines:
    line = line.strip()  # remove whitespace
    line = line.replace(" ", "")
    if len(line) != 0:  # remove white spaces between data
        if line[0] == "[":
            section = line.strip()[1:]
            data = ""
        if line[0] != "[":
            data += line + ","
            my_data[section] = [bit for bit in data.split(",") if bit != ""]

print my_data
key = my_data.keys()
print key

Unfortunately I am unable to get those sections and the data under them. Any ideas would be helpful.
As others already pointed out, you should be able to use the ConfigParser module.
Nonetheless, if you want to implement the reading/parsing yourself, you should split it up into two parts.
Part 1 would be the parsing at file level: splitting the file up into blocks (in your example you have two blocks: DISK and CAPACITY).
Part 2 would be parsing the blocks itself to get the values.
You know you can ignore the lines starting with !, so let's skip those:
with open('myfile.txt', 'r') as f:
    content = [l for l in f.readlines() if not l.startswith('!')]
Next, read the lines into blocks:
def partition_by(l, f):
    t = []
    for e in l:
        if f(e):
            if t:
                yield t
            t = []
        t.append(e)
    yield t

blocks = partition_by(content, lambda l: l.startswith('['))
and finally read in the values for each block:
def parse_block(block):
    gen = iter(block)
    block_name = next(gen).strip()[1:-1]
    splitted = [e.split('=') for e in gen]
    values = {t[0].strip(): t[1].strip() for t in splitted if len(t) == 2}
    return block_name, values

result = [parse_block(b) for b in blocks]
That's it. Let's have a look at the result:
for section, values in result:
    print section, ':'
    for k, v in values.items():
        print '\t', k, '=', v
output:
DISK :
DIRECTION = 'OK'
TYPE = 'normal'
CAPACITY :
code = 0
ID = 110
Are you able to make a small change to the text file? If you can make it look like this (only changed the comment character):
#--------------------------------------------------------------------------DISK
[DISK]
DIRECTION = 'OK'
TYPE = 'normal'
#------------------------------------------------------------------------CAPACITY
[CAPACITY]
code = 0
ID = 110
Then parsing it is trivial:
from ConfigParser import SafeConfigParser
parser = SafeConfigParser()
parser.read('filename')
And getting data looks like this:
(Pdb) parser
<ConfigParser.SafeConfigParser instance at 0x100468dd0>
(Pdb) parser.get('DISK', 'DIRECTION')
"'OK'"
Edit based on comments:
If you're using <= 2.7, then you're a little SOL. The only way, really, would be to subclass ConfigParser and implement a custom _read method. In practice you'd just have to copy/paste everything in Lib/ConfigParser.py and edit the check on line 477 (2.7.3):
if line.strip() == '' or line[0] in '#;': # add new comment characters in the string
However, if you're running 3-ish (I'm not sure offhand which version it was introduced in; I'm running 3.4(dev)), you may be in luck: ConfigParser added the comment_prefixes __init__ param to let you customize your prefixes:
parser = ConfigParser(comment_prefixes=('#', ';', '!'))
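A self-contained sketch of that Python 3 route (the sample text mirrors the question's file; divider lines shortened):

```python
from configparser import ConfigParser

data = """\
!---DISK
[DISK]
DIRECTION = 'OK'
TYPE = 'normal'
!---CAPACITY
[CAPACITY]
code = 0
ID = 110
"""

# Treat '!' as a comment prefix so the divider lines are skipped
parser = ConfigParser(comment_prefixes=('#', ';', '!'))
parser.read_string(data)

print(parser.get('DISK', 'DIRECTION'))  # 'OK'
print(parser.get('CAPACITY', 'ID'))     # 110
```

Note the value comes back with its apostrophes intact, since ConfigParser treats everything after = as a plain string.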
If the file is not big, you can load it whole and use regexes to find the parts that are of interest to you.
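A sketch of what that could look like (assuming the layout from the question; the pattern is mine, not from the original answer):

```python
import re

text = """\
!---DISK
[DISK]
DIRECTION = 'OK'
TYPE = 'normal'
!---CAPACITY
[CAPACITY]
code = 0
ID = 110
"""

# Grab each [SECTION] header plus its body, up to the next '!' line,
# the next section, or the end of the text
sections = {}
for name, body in re.findall(r'\[(\w+)\]\n(.*?)(?=^!|^\[|\Z)', text, flags=re.M | re.S):
    # Pull out the key = value pairs inside the body
    params = dict(re.findall(r'^(\w+)\s*=\s*(.+)$', body, flags=re.M))
    sections[name] = params

print(sections['CAPACITY'])  # {'code': '0', 'ID': '110'}
```

For anything much more elaborate than this, ConfigParser (as suggested above) is the safer choice.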