How do I save specific text in an array from .txt file - python

I have a .txt file from which I want to save only the following values in an array: "N", "1.1", "XY", "N", "2.3", "XZ".
The .txt file looks like this:
[ TITLE
N 1.1 XY
N 2.3 XZ
]
Here is my code:
src = open("In.txt", "r")

def findOp(row):
    trig = False
    temp = ["", "", ""]
    i = 1
    n = 0
    for char in row:
        i += 1
        if (char != '\t') & (char != ' ') & (char != '\n'):
            trig = True
            temp[n] += char
        else:
            if trig:
                n += 1
                trig = False
    return temp

for line in src.readlines():
    print(findOp(line))
The Output from my code is:
['[', 'TITLE', '']
['', '', '']
['N', '1.1', 'XY']
['N', '2.3', 'XZ']
['', '', '']
[']', '', '']
The problem is that the program also saves whitespace characters in the array, which I don't want.

I would recommend the strip() method, with which you can remove whitespace from a string.
Whitespace on both sides:
s = s.strip()
Whitespace on the right side:
s = s.rstrip()
Whitespace on the left side:
s = s.lstrip()
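In this particular case both jobs can be done at once: str.split() with no arguments splits on any run of whitespace (spaces, tabs, newlines) and never yields empty strings, so the character-by-character loop is unnecessary. A minimal sketch:

```python
line = " N\t1.1  XY\n"

# strip() only removes leading/trailing whitespace
print(line.strip())

# split() with no arguments tokenizes on any whitespace run
print(line.split())  # ['N', '1.1', 'XY']
```

With that, the whole findOp(row) helper collapses to row.split().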

You could check the return array before exiting:
def findOp(row):
    trig = False
    temp = ["", "", ""]
    i = 1
    n = 0
    for char in row:
        i += 1
        if (char != '\t') & (char != ' ') & (char != '\n'):
            trig = True
            temp[n] += char
        else:
            if trig:
                n += 1
                trig = False
    # Will return `temp` if all elements eval to True, otherwise
    # it will return None
    return temp if all(temp) else None
The value None can then be used as a check condition in subsequent constructs:
for line in src.readlines():
    out = findOp(line)
    if out:
        print(out)
>> ['N', '1.1', 'XY']
>> ['N', '2.3', 'XZ']

Try numpy.genfromtxt:
import numpy as np
text_arr = np.genfromtxt('In.txt', skip_header = 1, skip_footer = 1, dtype = str)
print(text_arr)
Output:
[['N' '1.1' 'XY']
['N' '2.3' 'XZ']]
Or, if you want a list, use text_arr.tolist().

Try this :
with open('In.txt', 'r') as f:
    lines = [i.strip() for i in f.readlines() if i.strip()][1:-1]
output = [[word for word in line.split() if word] for line in lines]
Output :
[['N', '1.1', 'XY'], ['N', '2.3', 'XZ']]


How to extract a specific line out of a text file

I have the code from the attached picture in a .can-file, which is in this case a text file. The task is to open the file and extract the content of the void function. In this case it would be "$LIN::Kl_15 = 1;"
This is what I already got:
Masterfunktionsliste = open("C:/.../Masterfunktionsliste_Beispiel.can", "r")
Funktionen = []
Funktionen = Masterfunktionsliste.read()
Funktionen = Funktionen.split('\n')
print(Funktionen)
I receive the following list:
['', '', 'void KL15 ein', '{', '\t$LIN::Kl_15 = 1;', '}', '', 'void Motor ein', '{', '\t$LIN::Motor = 1;', '}', '', '']
Now I want to extract the $LIN::Kl_15 = 1; and the $LIN::Motor = 1; lines into variables.
Use the { and } lines to decide what lines to extract:
scope_depth = 0
line_stack = list(reversed(Funktionen))
body_lines = []
while len(line_stack) > 0:
    next = line_stack.pop()
    if next == '{':
        scope_depth = scope_depth + 1
    elif next == '}':
        scope_depth = scope_depth - 1
    else:
        # test that we're inside at least one level of {...} nesting
        if scope_depth > 0:
            body_lines.append(next)
body_lines should now have values ['$LIN::Kl_15 = 1;', '$LIN::Motor = 1;']
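Packaged as a function, the same depth-counting idea can be written as a forward scan without the reversed stack; this is just a sketch of the approach above:

```python
def extract_bodies(lines):
    """Collect lines that sit inside at least one {...} nesting level."""
    depth = 0
    body = []
    for line in lines:
        if line == '{':
            depth += 1
        elif line == '}':
            depth -= 1
        elif depth > 0:
            body.append(line)
    return body

content = ['', 'void KL15 ein', '{', '\t$LIN::Kl_15 = 1;', '}',
           'void Motor ein', '{', '\t$LIN::Motor = 1;', '}']
print(extract_bodies(content))  # ['\t$LIN::Kl_15 = 1;', '\t$LIN::Motor = 1;']
```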
You can loop through the list, search for your variables and save it as dict:
can_file_content = ['', '', 'void KL15 ein', '{', '\t$LIN::Kl_15 = 1;', '}', '', 'void Motor ein', '{', '\t$LIN::Motor = 1;', '}', '', '']
extracted = {}
for line in can_file_content:
    if "$LIN" in line:  # extract the relevant line
        parsed_line = line.replace(";", "").replace("\t", "")  # remove ";" and "\t"
        variable, value = parsed_line.split("=")  # split on "="
        extracted[variable.strip()] = value.strip()  # remove whitespace
The output is {'$LIN::Kl_15': '1', '$LIN::Motor': '1'}, and you can now access your new variables with extracted['$LIN::Motor'], which is '1'.

Is there any method to count different items in the text file for every matched string and store in dataframe?

The text file looks like
data/File_10265.data:
Apple:2kg
Apple:3kg
Banana:1kg
Banana:4kg
Some string1
data/File_10276.data:
Apple:6kg
Apple:5kg
Apple:3kg
Banana:2kg
Banana:4kg
Banana:2kg
Banana:4kg
Extra line
data/File_10278.data:
Apple:3kg
Banana:2kg
Banana:4kg
Banana:2kg
Banana:7kg
Some words
The code is as follows:
import re
import pandas as pd

f = open("Samplefruit.txt", "r")
lines = f.readlines()

Apple_count = 0
Banana_count = 0
File_count = 0
Filename_list = []
Apple_list = []
Banana_list = []

for line in lines:
    match1 = re.findall('data/(?P<File>[^\/]+(?=\..*data))', line)
    if match1:
        Filename_list.append(match1[0])
        print('Match found:', match1)
    if line.startswith("Apple"):
        Apple_count += 1
    elif line.startswith("Banana"):
        Banana_count += 1

Apple_list.append(Apple_count)
Banana_list.append(Banana_count)

df = pd.DataFrame({'Filename': Filename_list,
                   'Apple': Apple_list,
                   'Banana': Banana_list})
The desired output:
Filename: |Apple |Banana
File_10265|2 |2
File_10276|3 |4
File_10278|1 |4
Maybe there is a more efficient way to do this but here's one solution:
with open('filetest.txt') as f:
    lines = f.readlines()

unique_lines = list(dict.fromkeys(lines))
for line in unique_lines:
    print(line + str(lines.count(line)))
    f1 = open('file.txt', 'a')
    f1.write(line + str(lines.count(line)))
    f1.close()
You simply open the file, read all lines into a list, then get rid of any duplicates. Then you loop through the list (now with the duplicates removed), and use the .count (docs) function to get the number of occurrences of each unique item in the list.
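Since list.count() rescans the whole list for every unique line, a possible alternative for larger files is collections.Counter, which tallies everything in a single pass and (on Python 3.7+) keeps first-seen order:

```python
from collections import Counter

lines = ['Apple:2kg\n', 'Apple:2kg\n', 'Banana:1kg\n']
counts = Counter(lines)  # one pass over the data instead of count() per line
for line, n in counts.items():
    print(line.rstrip() + ' ' + str(n))
```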
Try this:
import itertools
import re

pattern = re.compile(r"data/File_[\d]+.data:")
lines = text.split("\n")
files = itertools.groupby(lines, lambda line: pattern.search(line) is None)
for k, content in files:
    if k:
        content = list(content)
        all_words = list(set(content))
        counts = {word: content.count(word) for word in all_words if word != ""}
        print(counts)
Output -
{'Banana:': 2, 'Apple:': 2}
{'Banana:': 4, 'Apple:': 3}
{'Banana:': 4, 'Apple:': 1}
NOTE: New changes have been made to the code as per the changes in the question.
Try this:
import re

text = {}

def unit_cal(val1, val2):  # add two quantities with units and return the total with its unit
    q1 = re.findall("[0-9]+", val1)
    unit = re.findall("[a-zA-Z]+", val1)
    if val2 != False:
        q2 = re.findall("[0-9]+", val2)
        ans = int(q1[0]) + int(q2[0])
    else:
        ans = int(q1[0])
    return str(ans) + unit[0]  # remove "+ unit[0]" to return only the value

with open("item.txt", "r") as f1:
    for line in f1:
        if "data" in line:
            temp_key = line
            k = {}
            text[temp_key] = k
        elif ":" in line:  # skip stray lines such as "Some string1"
            temp_word = line.strip().split(":")
            if temp_word[0] in text[temp_key]:
                text[temp_key][temp_word[0]] = unit_cal(temp_word[1], text[temp_key][temp_word[0]])
            else:
                text[temp_key][temp_word[0]] = unit_cal(temp_word[1], False)

final_text = ""
for main_key in text:
    final_text += main_key + "\n"
    for sub_key in text[main_key]:
        final_text += sub_key + " : " + str(text[main_key][sub_key]) + "\n\n"

print(final_text)  # the final output is displayed in the shell
with open("new_items.txt", "w") as f2:
    f2.write(final_text)  # the output is also written to a new file
Output:
data/File_10265.data:
Apple : 5kg
Banana : 5kg
data/File_10276.data:
Apple : 14kg
Banana : 12kg
data/File_10278.data:
Apple : 3kg
Banana : 15kg
Here is the answer I ended up with. Thanks @Mani, @CarySwoveland, @Zero, and @M B for your support. The code is as follows:
import pandas as pd

text = {}
with open(r"Samplefruit.txt", "r") as file:
    for line in file:
        if "data" in line:
            Filename = line.split('/')[-1].split('.')[0]
            Apple_count = 0
            Banana_count = 0
            print('----------------')
            print(Filename)
        elif "Apple" in line or "Banana" in line:
            if line.startswith("Apple"):
                Apple_count += 1
            elif line.startswith("Banana"):
                Banana_count += 1
            print('Apple:', Apple_count)
            print('Banana:', Banana_count)
            text[Filename] = {'Apple': Apple_count, 'Banana': Banana_count}

df = pd.DataFrame(
    {"Filename": text.keys(),
     "Apple": [x['Apple'] for x in text.values()],
     "Banana": [x['Banana'] for x in text.values()]}
)
print(df)

Search same string in line of file

In a file I have several lines with this structure:
> Present one time: "Instance: ...Edition: ..."
> Present two times: "Instance: ...Edition: ...Instance: ...Edition: ..."
> Present n times: "Instance: ...Edition: ... [n] Instance: ...Edition: ..."
This structure can appear once per line or several times on the same line. The idea is to read the file, line by line, isolate the values represented by ... and write them in an excel file. I can do it but I'm only able to isolate the values if the structure above is present one time in the line. If the structure is present more than once on the line, I can only save the values ​​of the first structure.
This is my code:
#READ FILE
for i in fin:
if "Instance:" in i:
instance = ((i.split('Instance:'))[1].split('Edition')[0])
worksheet.write(row, col, instance)
if "Edition:" in i:
edition = ((i.split('Edition:'))[1].split('\n')[0])
worksheet.write(row, col, edition)
row += 1
Any idea how I could solve this problem?
Note that this only works if your input ends in a newline character.
If it does not, you can add one like so: s += '\n'
s = '''Instance: A Edition: Limited
Instance: B Edition: Common Instance: C Edition: 2020 Instance: D Edition: Bla
'''

result = []
start_in = start_ed = None
for i in range(len(s)):
    # Reaching the end of a data item
    if s[i:i+9] == 'Instance:' or s[i] == '\n':
        if start_in and start_ed:
            result.append(
                (s[start_in:start_ed-8].strip(), s[start_ed:i].strip())
            )
            start_in = start_ed = None
    if s[i:i+9] == 'Instance:':
        start_in = i+9
    if s[i:i+8] == 'Edition:':
        start_ed = i+8
print(result)
[('A', 'Limited'), ('B', 'Common'), ('C', '2020'), ('D', 'Bla')]
Edit: With Version field as requested
s = '''Instance: A Edition: Limited Version: 1
Instance: B Edition: Common Version: 2 Instance: C Edition: 2020 Version: 3 Instance: D Edition: Bla Version: 4
'''

result = []
start_in = start_ed = start_vs = None
for i in range(len(s)):
    # Reaching the end of a data item
    if s[i:i+9] == 'Instance:' or s[i] == '\n':
        if start_in and start_ed and start_vs:
            result.append((
                s[start_in:start_ed-8].strip(),
                s[start_ed:start_vs-8].strip(),
                s[start_vs:i].strip()
            ))
            start_in = start_ed = start_vs = None
    if s[i:i+9] == 'Instance:':
        start_in = i+9
    if s[i:i+8] == 'Edition:':
        start_ed = i+8
    if s[i:i+8] == 'Version:':
        start_vs = i+8
print(result)
Alternative solution using a regular expression. This is shorter but maybe harder to read and maintain:
import re
r = re.findall(r'Instance:([\w|\s]+?)Edition:([\w|\s]+?)(?=Instance|\n)', s)
[(' A ', ' Limited'), (' B ', ' Common '), (' C ', ' 2020 '), (' D ', ' Bla')]
If you don't want spaces around your matches you can either apply a strip to all elements like I did in my other solution, or you can modify the regex to read Instance: ([\w...
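For example, one assumed variation of that pattern keeps the whitespace matching outside the capture groups, so the returned values are already trimmed and no strip() is needed:

```python
import re

s = 'Instance: A Edition: Limited Instance: B Edition: Common\n'

# \s* sits outside the capture groups, so the captures come back clean
pairs = re.findall(r'Instance:\s*([\w\s]+?)\s*Edition:\s*([\w\s]+?)\s*(?=Instance|\n)', s)
print(pairs)  # [('A', 'Limited'), ('B', 'Common')]
```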

Output only specific items in a txt file according to strings from another list Python

I have a list of Strings:
myStrings = [Account,Type, myID]
I also have a txt file with numbers associated with these Strings, e.g:
[90: 'Account', 5: 'Type', 6: 'MyID', 8: 'TransactionNum', 9: 'Time']
How can I print only the numbers and strings in the txt file that are in myStrings? For example, since 'Time' is not in myStrings, I do not want to print it. I would also like to turn this txt file into a list.
This should help you:
myStrings = ['Account', 'Type', 'myID']

f = open("D:\\Test.txt", "r")
txt = f.read()
f.close()

txt = txt.replace('\n', ' ')
txt = txt.split(',')
txtlst = []
for x in txt:
    txtlst.append(x.split(':'))

numslst = [int(txtlst[i][0]) for i in range(len(txtlst))]

strlst = []
for i in txtlst:
    for j in i:
        try:
            int(j)
        except ValueError:
            strlst.append(j.replace("'", ""))

for x in range(len(strlst)):
    strlst[x] = strlst[x].replace(' ', '')

for x in range(len(strlst)):
    if strlst[x] in myStrings:
        print(numslst[x])
        print(strlst[x])
Output:
90
Account
5
Type
Since you have said that the file does not contain [ and ], you could make it work like so:
myStrings = ['Account', 'Type', 'myID']

with open('text-file.txt') as filename:
    file_text = filename.read()

file_text_list = file_text.split(',')
file_text_dict = {}
for item in file_text_list:
    k, v = item.split()
    v = v.replace("'", "")
    k = k.replace(":", "")
    if v in myStrings:
        file_text_dict[k] = v

print(file_text_dict)  # output => {'90': 'Account', '5': 'Type'}
print(list(file_text_dict.values()))  # output => ['Account', 'Type']

A way to check next letter in Python after the current letter is detected?

May 1 00:00:00 date=2018-04-30 time=23:59:59 dev=A devid=1234 msg="test 1"
May 1 00:00:00 date=2018-04-31 time=00:00:01 dev=A devid=1234 msg="test 2"
Above is a sample of a log file that I am trying to convert into csv by checking letter by letter for = and save as a column value in a row.
I managed to capture the column value as long as the value after the = is not a quoted string.
Below is the part of the code that extracts the values. In some places the value after = is a quoted string with spaces in it, which breaks the extraction and starts a new find. Is it possible to check whether the next letter is "\"" and then keep saving letter by letter until the next "\"", so that I can save the column value as a string?
I'm using python 2.7
import csv

def outputCSV(log_file_path, outputCSVName, colValueSet):
    data = []
    f = open(log_file_path, "r")
    values = set()  # create empty set for all column values
    content = f.readlines()
    content = [x.strip() for x in content]  # list of lines to iterate through
    colValueSet.add("postingDate")
    for line in content:
        new_dict = dict.fromkeys(colValueSet, "")
        new_dict["postingDate"] = line[0:16]
        findingColHeader = True   # we have to find the columns first
        findingColValue = False   # after a column is found, start finding values
        col_value = ""  # empty at first
        value = ""      # empty value at first
        start = False
        for letter in line:
            if findingColHeader:
                if letter == " ":
                    # a space means start taking in a new column name;
                    # the data has a space before each column name -> " column=value"
                    start = True
                    col_value = ""
                elif letter == "=":
                    findingColValue = True
                    start = False
                    findingColHeader = False
                elif start:
                    col_value += letter
            elif findingColValue:
                if letter == " ":
                    new_dict[col_value] = value
                    value = ""
                    col_value = ""
                    findingColHeader = True
                    start = True
                    findingColValue = False
                else:
                    value += letter
        data += [new_dict]
    with open(outputCSVName, 'wb') as csvfile:
        fieldnames = list(colValueSet)
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        for row in data:
            writer.writerow(row)
    print("Writing Complete")

# findColumnValues(a) would calculate all column values from the file path
outputCSV("ttest.log", "MyProcessedLog.csv", findColumnValues("test.log"))
You may try something like this:
>>> a = 'May 1 00:00:00 date=2018-04-30 time=23:59:59 dev=A devid=1234 msg="test 1" '
>>> a.split('=')
['May 1 00:00:00 date', '2018-04-30 time', '23:59:59 dev', 'A devid', '1234 msg', '"test 1" ']
>>> parts = a.split('=')
>>> b = []
>>> for i,j in zip(parts, parts[1:]) :
... b.append( (i[i.rfind(' ')+1:], j[:j.rfind(' ')]) )
...
>>> b
[('date', '2018-04-30'), ('time', '23:59:59'), ('dev', 'A'), ('devid', '1234'), ('msg', '"test 1"')]
>>>
I could make a cute one-liner, but I think this way it's easier for you to understand, since you can see all the intermediate results and grasp the main idea: split the line at the = signs, use the last word of each chunk as the key and the rest of the following chunk as the value.
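Building on that idea, the pair list can go straight into a dict (the variable names here are just for illustration):

```python
a = 'May 1 00:00:00 date=2018-04-30 time=23:59:59 dev=A devid=1234 msg="test 1" '

parts = a.split('=')
fields = {}
for i, j in zip(parts, parts[1:]):
    key = i[i.rfind(' ') + 1:]  # last word of the left chunk
    value = j[:j.rfind(' ')]    # everything before the last space of the right chunk
    fields[key] = value

print(fields['date'])  # 2018-04-30
print(fields['msg'])   # "test 1"
```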
You could use re module for Python (it's good to know it for any advanced text processing):
data = '''May 1 00:00:00 date=2018-04-30 time=23:59:59 dev=A devid=1234 msg="test 1"
May 1 00:00:00 date=2018-04-31 time=00:00:01 dev=A devid=1234 msg="test 2"'''
import re
for line in data.split('\n'):
print(re.findall(r'([^\s]+)=([^\s"]+|"[^"]+")', line))
Outputs:
[('date', '2018-04-30'), ('time', '23:59:59'), ('dev', 'A'), ('devid', '1234'), ('msg', '"test 1"')]
[('date', '2018-04-31'), ('time', '00:00:01'), ('dev', 'A'), ('devid', '1234'), ('msg', '"test 2"')]
The explanation of this regular pattern can be found here.
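To close the loop on the original CSV goal, the findall pairs can feed csv.DictWriter directly. The field names and the 14-character timestamp slice below are assumptions based on the two sample lines, not part of the original code:

```python
import csv
import io
import re

data = '''May 1 00:00:00 date=2018-04-30 time=23:59:59 dev=A devid=1234 msg="test 1"
May 1 00:00:00 date=2018-04-31 time=00:00:01 dev=A devid=1234 msg="test 2"'''

rows = []
for line in data.split('\n'):
    # same pattern as above; strip the surrounding quotes from quoted values
    pairs = re.findall(r'([^\s]+)=([^\s"]+|"[^"]+")', line)
    row = {key: value.strip('"') for key, value in pairs}
    row['postingDate'] = line[:14]  # assumed fixed-width syslog timestamp prefix
    rows.append(row)

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=['postingDate', 'date', 'time', 'dev', 'devid', 'msg'])
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```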
