Using regex to find multiple occurrences in a file in Python


I'm trying to do the following: myfile.txt has the content below, and I want to extract the data between each 'abc start' and 'abc end' using a regular expression in Python. Thanks for the help.
abc start
1
2
3
4
abc end
5
6
7
abc start
8
9
10
abc end
Expected output: 1 2 3 4 8 9 10

import re
with open('myfile.txt') as f:
    txt = f.read()
strings = re.findall(r'abc start\n(.+?)\nabc end', txt, re.DOTALL)
# transform the matches into your output
result = []
for s in strings:
    result += s.split('\n')
print(result)
#['1', '2', '3', '4', '8', '9', '10']

Using regex:
import re
string = ''
with open('file.txt', 'r') as f:
    for i in f.readlines():
        string += i.strip() + ' '
exp = re.compile(r'abc start(.+?)abc end')
result = [[int(j) for j in i.split()] for i in exp.findall(string)]
print(result)
# [[1, 2, 3, 4], [8, 9, 10]]
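A minimal sketch combining the two answers above: one re.findall call pulls out every block between the markers, and the numbers are flattened into a single list of ints, which matches the expected output more directly. To keep it self-contained it works on an in-memory string instead of reading myfile.txt, and the function name extract_blocks is hypothetical.

```python
import re


def extract_blocks(text):
    """Return the integers found between each 'abc start' / 'abc end' pair."""
    # re.DOTALL lets '.' match newlines; the lazy '*?' stops at the nearest 'abc end'.
    # The \s* on both sides absorbs the newlines (or spaces) around each block.
    blocks = re.findall(r'abc start\s*(.*?)\s*abc end', text, re.DOTALL)
    # Flatten: split each block on any whitespace and convert every token to int
    return [int(tok) for block in blocks for tok in block.split()]


# Hypothetical stand-in for the contents of myfile.txt
sample = "abc start\n1\n2\n3\n4\nabc end\n5\n6\n7\nabc start\n8\n9\n10\nabc end\n"
print(extract_blocks(sample))  # [1, 2, 3, 4, 8, 9, 10]
```

With a real file you would pass the result of f.read() instead of sample. Because \s* absorbs the whitespace around each block, the same pattern also works on the single joined line built in the second answer.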

Related

Getting a nested list with "" in the outer list

After opening and reading an input file, I'm trying to split the input on different characters. This works, but I get a nested list that I don't want: my list does not look like [[list]], but like ["[list]"]. What did I do wrong here?
The input looks like this:
name1___1 2 3 4 5
5=20=22=10=2=0=0=1=0=1something,something
name2___1 2 3 4
2=30=15=8=4=3=2=0=0=0;
The output looks like this:
["['name1", '', '', "1 2 3 4 5', 'name2", '', '', "1 2 3 4']"]
Here is my code:
file = open("file.txt")
input_of_this_file = file.read()
a = input_of_this_file.split("\n")
b = a[0::2] # so I get only the even lines
c = str(b) # to make it a string so the .strip() works
d = c.strip() # because there was whitespace
e = d.split("_")
print e
If I then do:
x = e[0]
I get:
['name1
This removes the outer list, but also removes the last ].
I would like it to look like: name1, name2
so that I only get the names.

Use itertools.islice and a list comprehension.
>>> from itertools import islice
>>> with open("tmp.txt") as f:
... [line.rstrip("\n").split("_") for line in islice(f, None, None, 2)]
...
[['name1', '', '', '1 2 3 4 5'], ['name2', '', '', '1 2 3 4']]

Keeping your code style, without any imports:
c = []
input_of_file = '''name1___1 2 3 4 5
5=20=22=10=2=0=0=1=0=1something,something
name2___1 2 3 4
2=30=15=8=4=3=2=0=0=0;'''
a = input_of_file.split("\n")
b = a[::2]
for item in b:
    new_item = item.split('__')
    c.append(new_item)
Results:
c = [['name1', '_1 2 3 4 5'], ['name2', '_1 2 3 4']]
c[0][0] is 'name1'
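Since the goal is only the names (name1, name2), a small sketch can keep the part of each name line before the first underscore using str.partition. It assumes, as in the example, that the names sit on lines 1, 3, 5, ... and never contain '_' themselves; names_from_text is a hypothetical helper that works on an in-memory string.

```python
def names_from_text(text):
    """Return the text before the first '_' on every other line, starting with the first."""
    lines = text.splitlines()
    # lines[::2] keeps lines 0, 2, 4, ... (the name lines in the example)
    # partition('_') splits at the first underscore only; [0] is the part before it
    return [line.partition('_')[0] for line in lines[::2]]


sample = """name1___1 2 3 4 5
5=20=22=10=2=0=0=1=0=1something,something
name2___1 2 3 4
2=30=15=8=4=3=2=0=0=0;"""
names = names_from_text(sample)
print(', '.join(names))  # name1, name2
```

Working on the list of lines directly avoids the str(b) step, which is what turned the list into the string "['name1___...']" in the question.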

read txt file input and add values to two arrays

A B C D
2 4 5 6
4 5 3 7
3 6 7 8
I want to put the A, B, C column values into one array (3 x 3) and the D column into another array (3 x 1).

A simple brute-force method:
a33 = [[], [], []]
a31 = []
with open('dat.txt') as f:
    next(f)  # skip the "A B C D" header line
    for ln in f:
        a, b, c, d = ln.split()
        a33[0].append(a)  # append, so multi-digit values stay whole
        a33[1].append(b)
        a33[2].append(c)
        a31.append(d)
print(a33)
print(a31)
Output:
[['2', '4', '3'], ['4', '5', '6'], ['5', '3', '7']]
['6', '7', '8']

import numpy as np
# Read the data from a file
with open('data.txt') as file:
    lines = file.readlines()
# Chop off the header line ("A B C D")
raw_data = lines[1:]
# Now fetch all the data
data_abc = []
data_d = []
for line in raw_data:
    values = line.split()
    data_abc.append(values[:3])
    data_d.append(values[3])
# Convert to integer arrays: 3 x 3 for A, B, C and 3 x 1 for D
data_abc = np.array(data_abc).astype(int)
data_d = np.array(data_d).astype(int).reshape(-1, 1)
# Display the result
print('Data A B C:', data_abc)
print('Data D:', data_d)
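If NumPy is used anyway, np.loadtxt can parse the whole table in one call, and slicing then separates the columns. A sketch, assuming whitespace-separated integers with exactly one header row; io.StringIO stands in for data.txt so the example runs on its own.

```python
import io

import numpy as np

# Hypothetical stand-in for data.txt: one header row, then integer rows
text = """A B C D
2 4 5 6
4 5 3 7
3 6 7 8
"""
# skiprows=1 drops the "A B C D" header; dtype=int parses the values as integers
data = np.loadtxt(io.StringIO(text), skiprows=1, dtype=int)
abc = data[:, :3]  # shape (3, 3): columns A, B, C
d = data[:, 3:]    # shape (3, 1): column D, the slice keeps it 2-D
print(abc)
print(d)
```

Slicing with data[:, 3:] rather than data[:, 3] keeps D two-dimensional, which gives the requested 3 x 1 shape; data[:, 3] would give a flat array of shape (3,).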

How to edit the .txt or .dat file information in Python?

I am a complete beginner in Python and have the following problem. I would be glad if you could help me.
I have a *.dat file (let's call it file-1; the first row is just a header that I use only here to label the columns) which looks like:
1 2 3 4 5 6
6 5 -1000 "" "" ""
6 5 -1000 "" "" ""
6 5 -1000 "" "" ""
6 5 -1000 "" "" ""
6 5 -1000 "" "" ""
6 5 -1000 "" "" ""
6 5 -1000 "" "" ""
I need it to look like this (file-1 (converted)):
6 5 1 -1000
6 5 1 -1000
6 5 2 -1000
6 5 3 -1000
6 5 3 -1000
6 5 3 -1000
6 5 3 -1000
So file-1 has 9 rows (7 with data and 2 empty) and 6 columns, and I need to do the following:
Delete the last 3 columns of file-1.
Add 1 new column between columns 2 and 3.
The value of this new column should increase by 1 (like '+= 1') after each empty line.
Delete all the empty lines. The result is shown above as 'file-1 (converted)'.
I've tried to do this but got stuck. So far I have:
import sys
import csv
with open("file-1.dat", "r", newline="") as f:
    sys.stdout = open('%s2 (converted).txt' % f.name, 'a')
    incsv = csv.reader(f, delimiter="\t")
    for row in incsv:
        if len(row) == 6:
            i = 0
            row = row[0:3]
            row.insert(2, i)
            print(row)
and it looks like:
['6', '5', 0, '-1000']
['6', '5', 0, '-1000']
['6', '5', 0, '-1000']
['6', '5', 0, '-1000']
['6', '5', 0, '-1000']
['6', '5', 0, '-1000']
['6', '5', 0, '-1000']
I don't know how to change the 0 to 1, 2, and so on, so that it looks like:
['6', '5', 0, '-1000']
['6', '5', 0, '-1000']
['6', '5', 1, '-1000']
['6', '5', 2, '-1000']
['6', '5', 2, '-1000']
['6', '5', 2, '-1000']
['6', '5', 2, '-1000']
The final result should look like the 'file-1 (converted)' file.
P.S. All the examples are simplified; the real file has many rows and I don't know where the empty lines appear.
P.P.S. Sorry for the long post, I hope it makes sense. Questions and suggestions are welcome; I would be really glad to hear other opinions. Thank you.

It seems like you're almost there; you're just inserting i = 0 every time instead of the count of empty rows. Try something like:
import sys
import csv
with open("file-1.dat", "r", newline="") as f:
    sys.stdout = open('%s2 (converted).txt' % f.name, 'a')
    incsv = csv.reader(f, delimiter="\t")
    empties = 0  # init empty row counter
    for row in incsv:
        if len(row) == 6:
            row = row[0:3]
            row.insert(2, empties)  # insert number of empty rows
            print(row)
        else:
            empties += 1  # any other row (e.g. an empty one): increase counter

This is a bit different, without using the csv module. Hope this helps. :)
import sys
count = 0
with open("file-1.dat", "r") as f:
    sys.stdout = open('%s2 (converted).txt' % f.name, 'a')
    for line in f:
        converted_line = line.split()[:-3]  # split each line and remove the last 3 columns
        if not converted_line:  # if the list/line is empty
            count += 1  # increase count but DO NOT PRINT / WRITE TO FILE
        else:
            converted_line.insert(2, str(count))  # insert between the 2nd and 3rd columns
            print('\t'.join(converted_line))  # join and print with a tab delimiter

You need to increment i on every empty line:
import sys
import csv
with open("file-1.dat", "r") as f:
    sys.stdout = open('%s2 (converted).txt' % f.name, 'a')
    incsv = csv.reader(f, delimiter="\t")
    next(incsv)  # ignore the first (header) line
    i = 0
    for row in incsv:
        if len(row) == 6:
            row = row[0:3]
            row.insert(2, i)
            print(row)
        elif len(row) == 0:
            i += 1
Also, I couldn't run your code on my machine (Python 2.7.6), so I changed it to run with Python 2.x.
Edit: I see it runs with Python 3.x.
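For comparison, a sketch that separates the conversion from the file handling and starts counting at 1, matching the 'file-1 (converted)' listing rather than the 0-based list. Assumptions: fields are whitespace-separated, the header row has already been skipped, and every blank line (including consecutive ones) bumps the counter; convert and its start parameter are hypothetical names.

```python
def convert(lines, start=1):
    """Drop the last 3 columns, insert a block counter as the new 3rd column,
    and skip blank lines; each blank line increases the counter by 1."""
    counter = start
    out = []
    for line in lines:
        fields = line.split()  # whitespace split; a blank line gives []
        if not fields:  # blank line: the next block gets the next number
            counter += 1
            continue
        kept = fields[:-3]  # remove the last three columns
        kept.insert(2, str(counter))  # new column between columns 2 and 3
        out.append(' '.join(kept))
    return out


row = '6 5 -1000 "" "" ""'
# Hypothetical stand-in for file-1.dat (header removed), blank lines after rows 2 and 3
sample = [row, row, '', row, '', row, row, row, row]
result = convert(sample)
for line in result:
    print(line)
```

With the real file you could skip the header with next(f), pass the open file object to convert directly (line.split() ignores the trailing newline), and write the returned rows through an explicit output file handle instead of reassigning sys.stdout.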

How to add inputted numbers in a string (Python)

How do I add up all the numbers entered in a string?
Ex:
input:
5 5 3 5
output
18
It must also support negative numbers ('-'):
Ex.
input
-5 5 3 5
output
8
I wrote something like this:
x = raw_input()
print sum(map(int,str(x)))
and it adds correctly if x > 0.
But what do I do about '-'?
I understand that I need to use split(), but my knowledge is not enough.

You're close, you just need to split the string on spaces. Splitting will produce the list of strings ['-5', '5', '3', '5']. Then you can do the rest of the map and sum as you intended.
>>> s = '-5 5 3 5'
>>> sum(map(int, s.split()))
8

It's simple:
>>> line = raw_input('Enter your input: ')
Enter your input: 5 5 10 -10
>>> list_numbers = [int(item) for item in line.split()]
>>> print list_numbers
[5, 5, 10, -10]
After that, you can do whatever you want with the list, for example sum(list_numbers).

You can use the following line:
sum(map(int, raw_input().split()))
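The answers above use Python 2. In Python 3, raw_input() was renamed input() and print is a function. A sketch of the same split-then-sum idea, wrapped in a function (sum_numbers is a hypothetical name) so it can be tried without typing input:

```python
def sum_numbers(line):
    """Sum whitespace-separated integers; int() handles a leading '-' itself."""
    # str.split() with no argument splits on runs of whitespace and ignores
    # leading/trailing spaces, so irregular spacing is handled too.
    return sum(int(token) for token in line.split())


print(sum_numbers('5 5 3 5'))   # 18
print(sum_numbers('-5 5 3 5'))  # 8
# Reading from the keyboard in Python 3:
# print(sum_numbers(input()))
```

The original attempt failed because str(x) iterates character by character, so '-' and ' ' were passed to int() on their own; splitting first keeps '-5' together as one token.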

Python, How to use line from file as key in dictionary, use next line for value?

I have a file like this:
Ben
0 1 5 2 0 1 0 1
Tim
3 2 1 5 4 0 0 1
I would like to make a dictionary that looks like this:
{Ben: 0 1 5 2 0 1 0 1, Tim : 3 2 1 5 4 0 0 1}
So I was thinking of something like:
for line in file:
    dict[line] = line + 1
but you can't iterate through a file like that, so how would I go about doing this?

This might be what you want:
dict_data = {}
with open('data.txt') as f:
    for key in f:
        dict_data[key.strip()] = next(f).split()
print(dict_data)
Output:
{'Tim': ['3', '2', '1', '5', '4', '0', '0', '1'], 'Ben': ['0', '1', '5', '2', '0', '1', '0', '1']}
Discussion
The for loop treats each line it reads as a key; the line after it is read inside the body of the loop.
key.strip() turns 'Tim\n' into 'Tim'.
next(f) reads and returns the next line, i.e. the line after the key line.
next(f).split() therefore splits that line into a list.
dict_data[key.strip()] = ... does something like dict_data['Tim'] = [ ... ].
Update
Thanks to Blckknght for the pointer. I changed f.next() to next(f).
Update 2
If you want the values as a list of integers instead of strings, then instead of:
dict_data[key.strip()] = next(f).split()
Do this:
dict_data[key.strip()] = [int(i) for i in next(f).split()]

state = 0
d = {}
with open('data.txt') as file:
    for line in file:
        if state == 0:  # expecting a name line
            key = line.strip()
            state = 1
        elif state == 1:  # expecting the numbers for that name
            d[key] = line.split()
            state = 0

I think the easiest approach is to first load the whole file with my_file.readlines(), which returns a list of all its lines. Then you can create your dictionary with a comprehension:
lines = my_file.readlines()
my_dict = dict(lines[i:i+2] for i in range(0, len(lines), 2))
For your example file, this will give my_dict the contents:
{"Ben\n": "0 1 5 2 0 1 0 1\n", "Tim\n": "3 2 1 5 4 0 0 1\n"}
An alternative approach would be to use a while loop that reads two lines at a time:
my_dict = {}
while True:
    name = my_file.readline().strip()
    if not name:  # readline() returns "" at the end of the file (a blank name line also stops the loop)
        break
    numbers = [int(n) for n in my_file.readline().split()]
    my_dict[name] = numbers
This approach makes it easier to process the lines than the comprehension in the earlier version, for example stripping newlines and splitting the line of numbers into a list of actual int objects.
The result for the example file would be:
{"Ben": [0, 1, 5, 2, 0, 1, 0, 1], "Tim": [3, 2, 1, 5, 4, 0, 0, 1]}
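Another common idiom reads the lines in pairs by zipping a file iterator with itself. A sketch, assuming the file strictly alternates name and number lines; io.StringIO stands in for the real file, parse_pairs is a hypothetical name, and the printed order assumes Python 3.7+, where dicts keep insertion order.

```python
import io


def parse_pairs(f):
    """Build {name: [ints]} from alternating name / numbers lines."""
    # zip(f, f) pulls two lines at a time from the same iterator:
    # the first of each pair is the name, the second holds the numbers.
    return {name.strip(): [int(n) for n in numbers.split()]
            for name, numbers in zip(f, f)}


sample = io.StringIO("Ben\n0 1 5 2 0 1 0 1\nTim\n3 2 1 5 4 0 0 1\n")
result = parse_pairs(sample)
print(result)
# {'Ben': [0, 1, 5, 2, 0, 1, 0, 1], 'Tim': [3, 2, 1, 5, 4, 0, 0, 1]}
```

zip stops at the shorter input, so an unpaired trailing name line is silently ignored rather than raising an error; use one of the explicit loops above if that case needs to be reported.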

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.
