I am learning Python and also English. I wrote this code to read a TXT file, find a sequence of numbers in it, and then rename the file with the sequence found. Besides looking for this sequence of numbers, I also need to look for a set of words. For example, if the text contains the words Apple, Watermelon and Pineapple but not Pumpkin, the TXT is classified as "fruits", and the file is renamed with the sequence of digits plus an "f" for fruit. For example:
name_files2 = os.listdir(path_txt)
for TXT in name_files2:
    with open(path_txt + '\\' + TXT, "r") as content:
        search = re.search(r'(\d{5})\-(\d{2})\-(\d{4})\.(\d)\.(\d{2})\.(\d{4})|'
                           r'(\d{5})\s*\-\s*(\d{2})\s*\.\s*(\d{4})\.(\d)\.(\d{2})\.(\d{4})|'
                           r'(\d{7})\-(\d{2})\-(\d{4})\.(\d)\.(\d{4})', content.read())
    if search is not None:
        name2 = search.group(0)
        name2 = re.sub(r"\D", "", name2)
        fp = os.path.join("18_digitos", name2 + "_%d.txt")
        postfix = 0
        while os.path.exists(fp % postfix):
            postfix += 1
        os.rename(
            os.path.join(path_txt, TXT),
            fp % postfix
        )
I can find the words in the text this way, but I cannot do both at the same time:
if text_complete.find("apple") >= 0 and text_complete.find("watermelon") >= 0 and \
        text_complete.find("pineapple") >= 0 and text_complete.find("pumpkin") < 0:
    print("Find Fruit")
Basically, I need to make the two pieces of code work together: find the 18-digit sequence, identify the keywords, classify the text (as fruits, for example), and rename the file with the sequence found + keyword classification + increment. Example: 12345678901234567_f_0 , 12345678901234567_f_1.
Currently it only concatenates the sequence and the increment, for example: 12345678901234567_0 , 12345678901234567_1. I added the increment to differentiate files that have the same sequence of numbers.
EDIT: what I cannot manage is joining the sequence and the classification that were extracted from the same text. The same number may have the classification fruits or vegetables, for example, so I need to know which sequence each fruit or vegetable classification came from in order to rename the file.
If I understand correctly, you want to check the contents of the file twice: once to extract a sequence of numbers and once to check if it contains "fruit" words.
In order to look at the text more than once you should store the contents of the file in its own variable.
You can change the code in the with block to:
with open(path_txt + '\\' + TXT, "r") as content:
    text_complete = content.read()
Later you can check for your number sequence. Pass the string itself; a string has no read() method:
search = re.search(r'...', text_complete)  # ... is your long regular expression
And you can also run your if statement to check for "fruit" words:
if text_complete.find("apple") >= 0 and ... :  # ... is the rest of your condition
    found_fruit = True
By storing the contents of the file as a string as the text_complete variable, you can refer to it multiple times, checking for something different each time.
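A minimal sketch of how the two checks might be combined so that the sequence and the classification always come from the same text. The names NUMBER_PATTERN, classify and build_name are hypothetical, the pattern is a shortened stand-in for the long regular expression in the question, and lower-casing the text before matching is an assumption (the original find() calls are case-sensitive).

```python
import re

# Shortened stand-in for the long pattern in the question (assumption).
NUMBER_PATTERN = re.compile(r"\d{5}-\d{2}-\d{4}\.\d\.\d{2}\.\d{4}")


def classify(text):
    """Return 'f' when all fruit words are present and 'pumpkin' is absent, else ''."""
    lowered = text.lower()  # assumption: matching should ignore case
    required = ("apple", "watermelon", "pineapple")
    if all(word in lowered for word in required) and "pumpkin" not in lowered:
        return "f"
    return ""


def build_name(text):
    """Return '<digits>_<label>' (or just '<digits>') for one text, or None if no sequence."""
    match = NUMBER_PATTERN.search(text)
    if match is None:
        return None
    digits = re.sub(r"\D", "", match.group(0))  # keep only the digits
    label = classify(text)
    return f"{digits}_{label}" if label else digits
```

The returned value could take the place of name2 when building fp, so the existing increment loop stays unchanged. Note that substring matching means "pineapple" also satisfies the "apple" check; a word-boundary regex would be needed if that matters.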
I have a text file (s1.txt) containing the following information. There are some lines that contain only one number and others that contain two numbers separated by a hyphen.
1
3-5
10
11-13
111
113-150
1111
1123-1356
My objective is to write a program that reads the file line by line, subtracts one from each number, replaces hyphens with colons, and prints the output on a single line. The following is my expected outcome.
{0 2:4 9 10:12 110 112:149 1110 1122:1355}
Using the following code, I am receiving an output that is quite different from what I expected. Please, let me know how I can correct it.
s1_file = input("Enter the name of the S1 file: ")
s1_text = open(s1_file, "r")
# Read contents of the S1 file to string
s1_data = s1_text.read()
for atoms in s1_data.split('\n'):
    if atoms.isnumeric():
        qm_atom = int(atoms) - 1
        #print(qm_atom)
    else:
        qm_atom = atoms.split('-')
    print(qm_atom)
If your goal is to output directly to the screen as a single line you should add end=' ' to the print function.
Or you can store the values in a variable and print everything at the end.
Regardless, what was missing was subtracting 1 from the values in the else branch and then joining them with the join function. join is a string method: it creates a new string from the values of a list (all values must be strings), separated by the string on which join is called.
For example ', '.join(['car', 'bike', 'truck']) would get 'car, bike, truck'.
s1_file = input("Enter the name of the S1 file: ")
# Read contents of the S1 file to string
with open(s1_file, "r") as s1_text:
    s1_data = s1_text.read()
output = []
for atoms in s1_data.split('\n'):
    if not atoms.strip():
        continue  # skip blank lines, e.g. the empty string after a trailing newline
    if atoms.isnumeric():
        qm_atom = int(atoms) - 1
        output.append(str(qm_atom))
    else:
        qm_atom = atoms.split('-')
        # loop over the list to subtract 1 from each number
        qm_atom_subtracted = [str(int(q) - 1) for q in qm_atom]
        # join the numbers back together with ':'
        output.append(':'.join(qm_atom_subtracted))
# join all entries with spaces so they print on one line, inside braces
print('{' + ' '.join(output) + '}')
An alternative way of doing it could be:
s1_file = input("Enter the name of the S1 file: ")
with open(s1_file) as f:
    output_string = ""
    for line in f:
        elements = line.strip().split('-')
        elements = [int(element) - 1 for element in elements]
        elements = [str(element) for element in elements]
        elements = ":".join(elements)
        output_string += elements + " "
print(output_string)
Why complicate a simple task by checking whether an element is numeric and then handling the two cases differently?
Also, your code gave bad output because your else clause is incorrect: it only splits the elements into sublists and never joins each sublist back together with ':'.
Anyway, here is my complete code:
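s1_file = input("Enter the name of the S1 file: ")
with open(s1_file, 'r') as f:
    t = f.readlines()  # read all lines
for i in range(len(t)):
    t[i] = t[i].rstrip('\n')  # remove the trailing \n (if there is one)
    t[i] = t[i].replace('-', ':')  # replace - with :
    try:
        t[i] = int(t[i]) - 1  # single number: convert str to int and subtract 1
    except ValueError:
        # range case: subtract 1 from both ends
        first, second = t[i].split(':')
        t[i] = f"{int(first) - 1}:{int(second) - 1}"
print(t)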
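The approaches in this thread can also be wrapped in a small function that is easy to check against the expected outcome from the question. This is only a sketch: the name shift_ranges is hypothetical, and skipping blank lines and wrapping the result in braces are assumptions based on the sample input and output.

```python
def shift_ranges(lines):
    """Subtract 1 from every number and turn 'a-b' into 'a:b', all on one line."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # A line without a hyphen simply yields a one-element list here.
        numbers = [str(int(n) - 1) for n in line.split("-")]
        parts.append(":".join(numbers))
    return "{" + " ".join(parts) + "}"


sample = ["1", "3-5", "10", "11-13", "111", "113-150", "1111", "1123-1356"]
print(shift_ranges(sample))  # {0 2:4 9 10:12 110 112:149 1110 1122:1355}
```

Because an open file is iterable line by line, the same function could be called as shift_ranges(f) inside a with open(...) block.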
I have a long list of animal identifiers in a text file. Our convention is to use two alphabetical characters, followed by a litter identifier, a dash, and then the animal ID within that litter. The number before the dash identifies whether they are control (even) or manipulated (odd) animals.
It looks like this (the explanations in parentheses are not in the text file). The only things in the text file are the identifier and possibly some data after that identifier on the same line.
XL20-4 is a control animal (0 - even),
XL21-4 is a manipulated animal (1 - odd),
Running all the way to the 300s
XL304-5 (4 - even - control),
XL303-4 (3 - odd - manipulated).
My question: how can I create, from the original text file, a separate text file for each condition that lists its animals in order, so the files can then be read by our MATLAB code?
The order of animal generation must be retained within those new text files,
i.e.
XL302-4,
XL304-5,
XL304-6,
XL306-1,
Each with a '\n' ending.
Thanks in advance.
Based on what you have described, this is one way to do it. It may need some fine-tuning, because the exact contents of the original file (its name and how the identifiers are laid out) are unknown.
import re
def write_to_file(file_name, data_to_write):
    with open(file_name, 'w') as file:
        for item in data_to_write:
            file.write(f"{item}\n")
# read contents from file, dropping each line's trailing newline and skipping
# blank lines (otherwise the '\n' written above would be doubled)
with open('original.txt', 'r') as file:
    contents = [line.strip() for line in file if line.strip()]
# assuming that each of the 'XL20-4,' are on a new line
control_group = []
manipulated_group = []
for item in contents:
    # get only the number between the letters and the dash
    test_generation = int(item[re.search(r"\d", item).start():item.find('-')])
    if test_generation % 2:  # odd -> manipulated; an even number gives 0, which is false
        manipulated_group.append(item)
    else:
        control_group.append(item)
# write to files with the data
write_to_file('control.txt', control_group)
write_to_file('manipulated.txt', manipulated_group)
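The splitting step on its own can be sketched as a function that works on any list of lines, which makes it easy to check that the original order is kept. The name split_by_litter_parity and the regular expression are assumptions; the pattern assumes each line starts with letters, then the litter number, a dash and the animal ID, as in XL304-5.

```python
import re

# Assumed identifier shape: letters, litter number, dash, animal ID (e.g. XL304-5).
ID_PATTERN = re.compile(r"^[A-Za-z]+(\d+)-\d+")


def split_by_litter_parity(lines):
    """Return (control, manipulated) lists, each in the original line order."""
    control, manipulated = [], []
    for line in lines:
        entry = line.strip()
        match = ID_PATTERN.match(entry)
        if match is None:
            continue  # skip blank lines and anything that is not an identifier
        if int(match.group(1)) % 2 == 0:
            control.append(entry)  # even litter number -> control
        else:
            manipulated.append(entry)  # odd litter number -> manipulated
    return control, manipulated
```

Any data that follows the identifier on the same line is kept as part of the entry, since only the start of the line is matched.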
I have a folder of 181 text files, each containing numbers, but I only need to multiply the numbers on lines that contain "size" by a constant such as 0.5. I did achieve this with the approach from: Search and replace math operations with the result in Notepad++
But I am trying to do this without wrapping each number in an expression or quotation marks, so that the rest of my community can do the same without first editing every file into the format that approach needs.
For Example:
farmers = {
culture = armenian
religion = coptic
size = 11850
}
being multiplied by 0.5 to:
farmers = {
culture = armenian
religion = coptic
size = 5925
}
I tried writing a Python script, but it did not work; I don't know much Python:
import operator
with open('*.txt', 'r') as file:
    data = file.readlines()
factor = 0.5
count = 0
for index, line in enumerate(data):
    try:
        first_word = line.split()[0]
    except IndexError:
        pass
    if first_word == 'size':
        split_line = line.split(' ')
        # print(' '.join(split_line))
        # print(split_line)
        new_line = split_line
        new_line[-1] = ("{0:.6f}".format(float(split_line[-1]) * factor))
        new_line = ' '.join(new_line) + '\n'
        # print(new_line)
        data[index] = new_line
        count += 1
    elif first_word == 'text_scale':
        split_line = line.split(' ')
        # print(split_line)
        # print(' '.join(split_line))
        new_line = split_line
        new_line[-1] = "{0:.2f}".format(float(split_line[-1]) * factor)
        new_line = ' '.join(new_line) + '\n'
        # print(new_line)
        data[index] = new_line
        count += 1
with open('*.txt', 'w') as file:
    file.writelines(data)
print("Lines changed:", count)
Are there any solutions to this? I would rather not make people in my community reformat every single file to work with my solution. Anything could work; I just haven't found a simple solution that is quick and easy to understand for people who use Notepad++ or Sublime Text 3.
If you use EmEditor, you can use the Replace in Files feature of EmEditor. In EmEditor, select Replace in Files on the Search menu (or press Ctrl + Shift + H), and enter:
Find: (?<=\s)size(.*?)(\d+)
Replace with: \J "size\1" + \2 / 2
File Types: *.txt (or file extension you are looking for)
In Folder: (folder path you are searching in)
Set the Keep Modified Files Open and Regular Expressions options (and the Match Case option if size always appears in lower case),
then click the Replace All button.
Alternatively, if you would like to use a macro, this is a macro for you (you need to edit the folder path):
editor.ReplaceInFiles("(?<=\\s)size(.*?)(\\d+)","\\J \x22size\\1\x22 + \\2 / 2","E:\\Test\\*.txt",eeFindReplaceCase | eeFindReplaceRegExp | eeReplaceKeepOpen,0,"","",0,0);
To run this, save this code as, for instance, Replace.jsee, and then select this file from Select... in the Macros menu. Finally, select Run Replace.jsee in the Macros menu.
Explanations:
\J in the replacement expression specifies JavaScript. In this example, \2 is the backreference from (\d+), thus \2 / 2 represents a matched number divided by two.
References: EmEditor How to: Replace Expression Syntax
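For readers who would rather stay with Python, as in the question's attempt, here is a rough sketch of the same idea. It assumes every target line looks like "size = <integer>" with optional indentation, as in the example; the names SIZE_LINE, scale_sizes and scale_folder are hypothetical. One known difference from the question's script: open('*.txt') does not expand wildcards, so the files are listed with Path.glob instead.

```python
import re
from pathlib import Path

# Matches "<indent>size = <integer>" (assumed layout, taken from the example).
SIZE_LINE = re.compile(r"^(\s*size\s*=\s*)(\d+)(\s*)$")


def scale_sizes(text, factor=0.5):
    """Return (new_text, count) with every 'size = N' value multiplied by factor."""
    count = 0
    out = []
    for line in text.splitlines(keepends=True):
        match = SIZE_LINE.match(line)
        if match:
            # round() returns an int here; exact halves go to the nearest even integer
            new_value = round(int(match.group(2)) * factor)
            line = f"{match.group(1)}{new_value}{match.group(3)}"
            count += 1
        out.append(line)
    return "".join(out), count


def scale_folder(folder, factor=0.5):
    """Rewrite every .txt file in folder in place; return the number of lines changed."""
    total = 0
    for path in Path(folder).glob("*.txt"):
        new_text, count = scale_sizes(path.read_text(), factor)
        if count:
            path.write_text(new_text)
        total += count
    return total
```

Calling scale_folder on a copy of the folder first is a sensible precaution, since the files are overwritten in place.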
I am trying to parse certain paragraphs out of multiple text files and store them in a list. All the text files have a format similar to this:
MODEL NUMBER: A123
MODEL INFORMATION: some info about the model
DESCRIPTION: This will be a description of the Model. It
could be multiple lines but an empty line at the end of each.
CONCLUSION: Sold a lot really profitable.
Now I can pull out the information when it is on one line, but I am having trouble when I encounter something that spans multiple lines (like 'DESCRIPTION'). The description length is not known, but I know it is followed by an empty line (a line containing only '\n'). This is what I have so far:
import os
dir = 'Test'
DESCRIPTION = []
for files in os.listdir(dir):
    if files.endswith('.txt'):
        with open(dir + '/' + files) as File:
            reading = File.readlines()
            for num, line in enumerate(reading):
                if 'DESCRIPTION:' in line:
                    Start_line = num
                if len(line.strip()) == 0:
I don't know if it's the best approach, but what I was trying to do with if len(line.strip()) == 0: is to create a list of blank lines and then find the first value greater than Start_line. I came across the bisect module for this.
In the end, if I print DESCRIPTION, I would like my data to be:
['DESCRIPTION: Description from file 1',
'DESCRIPTION: Description from file 2',
'DESCRIPTION: Description from file 3']
Thanks.
Regular expression. Think about it this way: you have a pattern that will allow you to cut any file into pieces you will find palatable: "newline followed by capital letter"
re.split is your friend
Take a string
"THE
BEST things
in life are
free
IS
YET
TO
COME"
As a string:
p = "THE\nBEST things\nin life are\nfree\nIS\nYET\nTO\nCOME"
c = re.split('\n(?=[A-Z])', p)
Which produces list c
['THE', 'BEST things\nin life are\nfree', 'IS', 'YET', 'TO', 'COME']
I think you can take it from there: this splits each file into a list of strings, one per section, and you can then find the element that starts with "DESCRIPTION" and store it. Each section keeps its continuation lines, because the split only happens at a newline that is followed by a capital letter. Note that the regex recognizes the pattern "newline, then capital letter" but only cuts at the newline: the capital letter sits inside the lookahead (?=...), so it is matched without being consumed, while the \n is outside the brackets and is removed.
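Applied to the file layout from the question, the re.split idea might look like the sketch below. The function name extract_description is hypothetical, and collapsing the line breaks inside the description into single spaces is an assumption about the desired output.

```python
import re

sample = (
    "MODEL NUMBER: A123\n"
    "MODEL INFORMATION: some info about the model\n"
    "DESCRIPTION: This will be a description of the Model. It\n"
    "could be multiple lines but an empty line at the end of each.\n"
    "\n"
    "CONCLUSION: Sold a lot really profitable.\n"
)


def extract_description(text):
    """Return the DESCRIPTION section as one line, or None if it is missing."""
    # Cut at every newline that is followed by a capital letter.
    sections = re.split(r"\n(?=[A-Z])", text)
    for section in sections:
        if section.startswith("DESCRIPTION:"):
            # Collapse internal line breaks and trailing blank lines.
            return " ".join(section.split())
    return None


print(extract_description(sample))
```

One limitation: a continuation line that happens to start with a capital letter would be split off into its own section, so the pattern may need tightening (for example, requiring an upper-case label ending in a colon) for real files.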
I have a file at /location/data.txt. It contains entries like:
aaa:xxx:abc.com:1857:xxx1:rel5t2:y
ifa:yyy:xyz.com:1858:yyy1:rel5t2:y
I want to look up 'aaa' from my code. Whether I type aaa in upper or lower case, my Python code should report that aaa is a valid item.
There is one exception: if I give the input with an -mc suffix (aaa-mc), in either lower or upper case, the code should ignore the -mc.
Below is my code, followed by the output I am getting now.
def pITEMName():
    global ITEMList,fITEMList
    pITEMList = []
    fITEMList = []
    ITEMList = str(raw_input('Enter pipe separated list of ITEMS : ')).upper().strip()
    items = ITEMList.split("|")
    count = len(items)
    print 'Total Distint ITEM Count : ', count
    pipelst = [i.split('-mc')[0] for i in ITEMList.split('|')]
    filepath = '/location/data.txt'
    f = open(filepath, 'r')
    for lns in f:
        split_pipe = lns.split(':', 1)
        if split_pipe[0] in pipelst:
            index = pipelst.index(split_pipe[0])
            pITEMList=split_pipe[0]+"|"
            fITEMList.append(pITEMList)
            del pipelst[index]
    for lns in pipelst:
        print bcolors.red + lns,' is wrong ITEM Name' + bcolors.ENDC
    f.close()
When I execute above code it prompts me like :
Enter pipe separated list of ITEMS :
And if I provide the list like :
Enter pipe separated list of ITEMS : aaa-mc|ifa
it gives me the result as :
Total Distint item Count : 2
AAA-MC is wrong item Name
items Belonging to other :
Other center :
item Count From Other center = 0
items Belonging to Current Centers :
Active items in US1 :
^IFA$
Active items in US2 :
^AAA$
Ignored item Count From Current center = 0
You Have Entered itemList belonging to this center as: ^IFA$|^AAA$
Active item Count : 2
Do You Want To Continue [YES|Y|NO|N] :
As you can see in the result above, aaa is counted as valid (Active item Count : 2) because it is present in /location/data.txt, but it is also reported as "AAA-MC is wrong item Name" (second line of the result). I want '-mc' or '-MC' to be ignored for any item, whether or not it is present in /location/data.txt.
Please let me know what is wrong with my code.
The issue you're having is that your code expects the "-mc" suffix to appear in lowercase, but you're calling the upper() method on the input string, resulting in text that is all upper case. You need to change one of those so that they match (it doesn't really matter which one).
Either replace the upper() call with lower(), or replace the string "-mc" with "-MC", and your code should work better (I'm not certain I understand all of it, so there may be other issues).
The way you are constructing ITEMList is by reading in a string, upper-casing it (with upper()), and stripping leading and trailing whitespace (with strip()). Therefore, something like 'aaa-mc' is being converted to 'AAA-MC'. You're later splitting this uppercase string on the token '-mc', which it cannot contain, so the suffix is never removed.
I'd recommend either replacing upper() with lower() when you read your string in, or replacing both spellings of the suffix, so instead of
i.split('-mc')[0]
try using
i.replace('-mc','').replace('-MC','')
in your list comprehension.
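Another way to avoid the case mismatch is to normalize each name once, before any comparison. The sketch below is written for Python 3 (the question's code is Python 2), the names normalize_item and parse_items are hypothetical, and lower-casing is an assumption based on the lower-case keys shown in /location/data.txt.

```python
def normalize_item(name):
    """Lower-case an item name and drop a trailing '-mc' suffix, if present."""
    cleaned = name.strip().lower()
    if cleaned.endswith("-mc"):
        cleaned = cleaned[: -len("-mc")]
    return cleaned


def parse_items(raw):
    """Split a pipe-separated string into normalized item names, skipping empty parts."""
    return [normalize_item(part) for part in raw.split("|") if part.strip()]


print(parse_items("aaa-mc|IFA"))  # ['aaa', 'ifa']
```

Unlike replace('-mc', ''), the endswith check only removes the suffix at the end of a name, so an item that merely contains "-mc" in the middle is left alone.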