extract only integer from txt - python
I have a txt file which contains lost of information,I do not want its head and tail, I need only numbers in the middle. which is a 1x11200 matrix.
[txtpda]
LT=5.6
DATE=21.06.2018
TIME=14:11
CNT=11200
RES=0.00854518
N=5
VB=350
VT=0.5
LS=0
MEASTIME=201806211412
PICKUP=BFW-2
LC=0.8
[PROFILE]
255
256
258
264
269
273
267
258
251
255
259
262
260
256
255
260
264
266
265
263
261
263
267
275
280
280
280
280
283
284
283
277
279
280
283
285
283
282
280
280
286
288
298
299
299
299
304
303
300
297
295
296
299
301
303
301
299
296
298
299
302
303
304
307
308
312
313
314
312
311
311
310
312
310
309
305
303
299
297
294
288
280
270
266
250
242
222
213
199
180
173
...
-1062
-1063
[VALUES]
Ra;2;3;2;0.769;0;0;-1;0;-1;0
Rz;2;2;2;5.137;0;0;-1;0;-1;0
Pt;0;0;0;26.25;0;0;-1;0;-1;0
Wt;0;0;0;24.3;0;0;-1;0;-1;0
now I using the following method to extract numbers:
def OpenFile():
name=askopenfilename(parent=root)
f=open(name,'r')
originalyvec1=[]
yvec1=[]
if f==0:
print("fail to open the file")
else:
print("file successfully opened")
data=f.readlines()
for i in range(0,14):
del data[0]//delete its head(string)
del data[11204]//delete its tail(string)
del data[11203]//delete its tail(string)
del data[11202]//delete its tail(string)
del data[11201]//delete its tail(string)
del data[11200]//delete its tail(string)
for line in data:
for nbr in line.split(): //delete \n
yvec1.append(int(nbr))
if f.close()==0:
print("fail to close file")
else:
print("file closed")
I want to use numpy to manage it in a easy way. Is that possible?
like np.array or something like that.
You can use a alternative form of iter(), where you pass iter() a function and it will keep calling that function until it sees the value (2nd arg). You can use this to skip until you see [PROFILE]\n and then use that same form of iter() to read until [VALUES]\n. The function is just the one called by next(iterable), which is iterable.__next__, e.g.:
with open(name) as f:
for _ in iter(f.__next__, '[PROFILE]\n'): # Skip until PROFILE
pass
yvec1 = [int(d) for d in iter(f.__next__, '[VALUES]\n')]
yvec1 will now contain all values between [PROFILE] and [VALUES].
An alternative and potentially quicker way to consume the first iter() is to use collections.deque() instead of the for loop but this is likely over-kill for this problem, e.g.:
deque(iter(f.__next__, '[PROFILE]\n'), maxlen=0)
Note: using with will automatically close(f) at the end of the block.
You can simply replace everything from the line data=f.readlines() and below with:
data = [int(line) for line in map(str.strip, f.readlines()) if line.isdigit() or line.startswith('-') and line[1:].isdigit()]
And data will be the list of integers you're looking for.
Just to give you the idea this may help
The s3[0] will be all the numbers between PROFILE ans VALUES
#s=your data
s='sjlkf slflsafj[PROFILEl9723,2974982,2987492,886[VALUES]skjlfsajlsjal'
s2=s.split('[PROFILE]')
s3=s2[1].split('[VALUES]')
Related
Renumbering a Sequence of Numbers With Gaps using Python
I am trying to figure out how to renumber a certain file format and struggling to get it right. First, a little background may help: There is a certain file format used in computational chemistry to describe the structure of a molecule with the extension .xyz. The first column is the number used to identify a specific atom (carbon, hydrogen, etc.), and the subsequent columns show what other atom numbers it is connected to. Below is a small sample of this file, but the usual file is significantly larger. 259 252 260 254 261 255 262 256 264 248 265 268 265 264 266 269 270 266 265 267 282 267 266 268 264 269 265 270 265 271 276 277 271 270 272 273 272 271 274 278 273 271 275 279 274 272 275 280 275 273 274 281 276 270 277 270 278 272 279 273 280 274 282 266 283 286 283 282 284 287 288 284 283 285 289 285 284 286 282 287 283 288 283 289 284 290 293 290 289 291 294 295 291 290 292 304 As you can see, the numbers 263 and 281 are missing. Of course, there could be many more missing numbers so I need my script to be able to account for this. Below is the code I have thus far, and the lists missing_nums and missing_nums2 are given as well, however, I would normally obtain them from an earlier part of the script. The last element of the list missing_nums2 is where I want numbering to finish, so in this case: 289. missing_nums = ['263', '281'] missing_nums2 = ['281', '289'] with open("atom_nums.xyz", "r") as f2: lines = f2.read() for i in range(0, len(missing_nums) - 1): if i == 0: with open("atom_nums_out.xyz", "w") as f2: replacement = int(missing_nums[i]) for number in range(int(missing_nums[i]) + 1, int(missing_nums2[i])): lines = lines.replace(str(number), str(replacement)) replacement += 1 f2.write(lines) else: with open("atom_nums_out.xyz", "r") as f2: lines = f2.read() with open("atom_nums_out.xyz", "w") as f2: replacement = int(missing_nums[i]) - (i + 1) print(replacement) for number in range(int(missing_nums[i]), int(missing_nums2[i])): lines = lines.replace(str(number), str(replacement)) replacement += 1 f2.write(lines) The problem lies in the fact that as the file gets larger, there seems to be repeats of numbers for reasons I cannot figure out. I hope somebody can help me here. EDIT: The desired output of the code using the above sample would be 259 252 260 254 261 255 262 256 263 248 264 267 264 263 265 268 269 265 264 266 280 266 265 267 263 268 264 269 264 270 275 276 270 269 271 272 271 270 273 277 272 270 274 278 273 271 274 279 274 272 273 279 275 269 276 269 277 271 278 272 279 273 280 265 281 284 281 280 282 285 286 282 281 283 287 283 282 284 280 285 281 286 281 287 282 288 291 288 287 289 292 293 289 288 290 302 Which is, indeed, what I get as the output for this small sample, but as the missing numbers increase it seems to not work and I get duplicate numbers. I can provide the whole file if anyone wants. Thanks!
Assuming my interpretation of the lists missing_nums and missing_nums2 is correct, this is how I would perform the operation. from os import rename def fixFile(fn, mn1, mn2): with open(fn, "r") as fin: with open('tmp.txt', "w") as fout: for line in fin: for i in range(len(mn1)): minN = int(mn1[1]) maxN = int(mn2[i]) for nxtn in range(minN, maxN): line.replace(str(nxtn), str(nxtn +1)) fout.write(line) rename(temp, fn) missing_nums = ['263', '281'] missing_nums2 = ['281', '289'] fn = "atom_nums_out.xyz" fixFile(fn, missing_nums, missing_nums2) Note, I am only reading the file in once a line at a time, and writing the result out a line at a time. I am then renaming the temp file to the original filename after all data is processed. This means, significantly longer files, will not chew up memory.
How to convert image Run length Encoded Pixels mask to binary mask and reshape in python?
I have file having EncodedPixels mask of different size 1: I want to convert these EncodedPixels in binary and resize all into 1024 and then again convert in to EncodedPixels. Explanation: In file there is image-Mask in Encoded Pixels form, and images have different dimensions (5000x5000, 260x260 etc) So I resize all images in to 1024x1024, Now I want to resize each image-mask according to image 1024x1024. I my mind there is only one possible solution (might be more available) to resize mask is first we need to convert run length encoding pixel in to binary and then we are able to resize mask easily. File Link: link here This code will use to resize binary mask. from PIL import Image import numpy as np pil_image = Image.fromarray(binary_mask) pil_image = pil_image.resize((new_width, new_height), Image.NEAREST) resized_binary_mask = np.asarray(pil_image) Encoded Pixels Example ['6068157 7 6073371 20 6078584 34 6083797 48 6089010 62 6094223 72 6099436 76 6104649 80 6109862 85 6115075 89 6120288 93 6125501 98 6130714 102 6135927 106 6141140 111 6146354 114 6151567 118 6156780 123 6161993 127 6167206 131 6172419 136 6177632 140 6182845 144 6188058 149 6193271 153 6198484 157 6203697 162 6208910 166 6214124 169 6219337 174 6224550 178 6229763 182 6234976 187 6240189 191 6245402 195 6250615 200 6255828 204 6261041 208 6266254 213 6271467 218 6276680 224 6281893 229 6287107 233 6292320 238 6297533 244 6302746 249 6307959 254 6313172 259 6318385 265 6323598 270 6328811 275 6334024 280 6339237 286 6344450 291 6349663 296 6354877 300 6360090 306 6365303 311 6370516 316 6375729 322 6380942 327 6386155 332 6391368 337 6396581 343 6401794 348 6407007 353 6412220 358 6417433 364 6422647 368 6427860 373 6433073 378 6438286 384 6443499 389 6448712 394 6453925 399 6459138 405 6464351 410 6469564 415 6474777 420 6479990 426 17204187 78 17208797 227 17209412 56 17214025 203 17214637 34 17219253 179 17219862 11 17224481 155 17229709 131 17234937 107 17240165 83 17245393 60 17250621 36 17255849 12']
Python print string alignment
I am printing some values in a loop in Python. My current output is as follows: 0 Data Count: 249 7348 249 4469 2768 261 20 126 1 Data Count: 288 11 288 48 2284 598 137 408 2 Data Count: 808 999 808 2896 32739 138 202 678 3 Data Count: 140 26 140 2688 8054 884 433 987 What I'd like is for all values in each column to align, despite differing character/number counts in some, to make it easier to read. The pseudo code behind this is as follows: for i in range(0,3): print i, " Data Count: ", Count_A, " ", Count_B, " ", Count_C, " ", Count_D, " ", Count_E, " ", Count_F, " ", Count_G, " ", Count_H Thanks in advance everyone!
You could use format string justification: from random import randint for i in range(5): data = [randint(0, 1000) for j in range(5)] print("{:5} {:5} {:5} {:5}".format(*data)) output: 92 460 72 630 837 214 118 677 906 328 102 320 895 998 177 922 651 742 215 938 According to the format specification from Python docs
With the % string formatting operator, the minimum width of output is specified in a placeholder as a number before the data type (the full format of a placeholder is %[key][flags][width][.precision][length type]conversion type). If the result is shorter, it will be left-padded to the specified length: from random import randint for i in range(5): data = [randint(0, 1000) for j in range(5)] print("%5d %5d %5d %5d %5d" % tuple(data)) gives: 946 937 544 636 871 232 860 704 877 716 868 849 851 488 739 419 381 695 909 518 570 756 467 351 537 (code adapted from #andreihondrari's answer)
How to display a sequence of numbers in column-major order?
Program description: Find all the prime numbers between 1 and 4,027 and print them in a table which "reads down", using as few rows as possible, and using as few sheets of paper as possible. (This is because I have to print them out on paper to turn it in.) All numbers should be right-justified in their column. The height of the columns should all be the same, except for perhaps the last column, which might have a few blank entries towards its bottom row. The plan for my first function is to find all prime numbers between the range above and put them in a list. Then I want my second function to display the list in a table that reads up to down. 2 23 59 3 29 61 5 31 67 7 37 71 11 41 73 13 43 79 17 47 83 19 53 89 ect... This all I've been able to come up with myself: def findPrimes(n): """ Adds calculated prime numbers to a list. """ prime_list = list() for number in range(1, n + 1): prime = True for i in range(2, number): if(number % i == 0): prime = False if prime: prime_list.append(number) return prime_list def displayPrimes(): pass print(findPrimes(4027)) I'm not sure how to make a row/column display in Python. I remember using Java in my previous class and we had to use a for loop inside a for loop I believe. Do I have to do something similar to that?
Although I frequently don't answer questions where the original poster hasn't even made an attempt to solve the problem themselves, I decided to make an exception of yours—mostly because I found it an interesting (and surprisingly challenging) problem that required solving a number of somewhat tricky sub-problems. I also optimized your find_primes() function slightly by taking advantage of some reatively well-know computational shortcuts for calculating them. For testing and demo purposes, I made the tables only 15 rows high to force more than one page to be generated as shown in the output at the end. from itertools import zip_longest import locale import math locale.setlocale(locale.LC_ALL, '') # enable locale-specific formatting def zip_discard(*iterables, _NULL=object()): """ Like zip_longest() but doesn't fill out all rows to equal length. https://stackoverflow.com/questions/38054593/zip-longest-without-fillvalue """ return [[entry for entry in iterable if entry is not _NULL] for iterable in zip_longest(*iterables, fillvalue=_NULL)] def grouper(n, seq): """ Group elements in sequence into groups of "n" items. """ for i in range(0, len(seq), n): yield seq[i:i+n] def tabularize(width, height, numbers): """ Print list of numbers in column-major tabular form given the dimensions of the table in characters (rows and columns). Will create multiple tables of required to display all numbers. """ # Determine number of chars needed to hold longest formatted numeric value gap = 2 # including space between numbers col_width = len('{:n}'.format(max(numbers))) + gap # Determine number of columns that will fit within the table's width. num_cols = width // col_width chunk_size = num_cols * height # maximum numbers in each table for i, chunk in enumerate(grouper(chunk_size, numbers), start=1): print('---- Page {} ----'.format(i)) num_rows = int(math.ceil(len(chunk) / num_cols)) # rounded up table = zip_discard(*grouper(num_rows, chunk)) for row in table: print(''.join(('{:{width}n}'.format(num, width=col_width) for num in row))) def find_primes(n): """ Create list of prime numbers from 1 to n. """ prime_list = [] for number in range(1, n+1): for i in range(2, int(math.sqrt(number)) + 1): if not number % i: # Evenly divisible? break # Not prime. else: prime_list.append(number) return prime_list primes = find_primes(4027) tabularize(80, 15, primes) Output: ---- Page 1 ---- 1 47 113 197 281 379 463 571 659 761 863 2 53 127 199 283 383 467 577 661 769 877 3 59 131 211 293 389 479 587 673 773 881 5 61 137 223 307 397 487 593 677 787 883 7 67 139 227 311 401 491 599 683 797 887 11 71 149 229 313 409 499 601 691 809 907 13 73 151 233 317 419 503 607 701 811 911 17 79 157 239 331 421 509 613 709 821 919 19 83 163 241 337 431 521 617 719 823 929 23 89 167 251 347 433 523 619 727 827 937 29 97 173 257 349 439 541 631 733 829 941 31 101 179 263 353 443 547 641 739 839 947 37 103 181 269 359 449 557 643 743 853 953 41 107 191 271 367 457 563 647 751 857 967 43 109 193 277 373 461 569 653 757 859 971 ---- Page 2 ---- 977 1,069 1,187 1,291 1,427 1,511 1,613 1,733 1,867 1,987 2,087 983 1,087 1,193 1,297 1,429 1,523 1,619 1,741 1,871 1,993 2,089 991 1,091 1,201 1,301 1,433 1,531 1,621 1,747 1,873 1,997 2,099 997 1,093 1,213 1,303 1,439 1,543 1,627 1,753 1,877 1,999 2,111 1,009 1,097 1,217 1,307 1,447 1,549 1,637 1,759 1,879 2,003 2,113 1,013 1,103 1,223 1,319 1,451 1,553 1,657 1,777 1,889 2,011 2,129 1,019 1,109 1,229 1,321 1,453 1,559 1,663 1,783 1,901 2,017 2,131 1,021 1,117 1,231 1,327 1,459 1,567 1,667 1,787 1,907 2,027 2,137 1,031 1,123 1,237 1,361 1,471 1,571 1,669 1,789 1,913 2,029 2,141 1,033 1,129 1,249 1,367 1,481 1,579 1,693 1,801 1,931 2,039 2,143 1,039 1,151 1,259 1,373 1,483 1,583 1,697 1,811 1,933 2,053 2,153 1,049 1,153 1,277 1,381 1,487 1,597 1,699 1,823 1,949 2,063 2,161 1,051 1,163 1,279 1,399 1,489 1,601 1,709 1,831 1,951 2,069 2,179 1,061 1,171 1,283 1,409 1,493 1,607 1,721 1,847 1,973 2,081 2,203 1,063 1,181 1,289 1,423 1,499 1,609 1,723 1,861 1,979 2,083 2,207 ---- Page 3 ---- 2,213 2,333 2,423 2,557 2,687 2,789 2,903 3,037 3,181 3,307 3,413 2,221 2,339 2,437 2,579 2,689 2,791 2,909 3,041 3,187 3,313 3,433 2,237 2,341 2,441 2,591 2,693 2,797 2,917 3,049 3,191 3,319 3,449 2,239 2,347 2,447 2,593 2,699 2,801 2,927 3,061 3,203 3,323 3,457 2,243 2,351 2,459 2,609 2,707 2,803 2,939 3,067 3,209 3,329 3,461 2,251 2,357 2,467 2,617 2,711 2,819 2,953 3,079 3,217 3,331 3,463 2,267 2,371 2,473 2,621 2,713 2,833 2,957 3,083 3,221 3,343 3,467 2,269 2,377 2,477 2,633 2,719 2,837 2,963 3,089 3,229 3,347 3,469 2,273 2,381 2,503 2,647 2,729 2,843 2,969 3,109 3,251 3,359 3,491 2,281 2,383 2,521 2,657 2,731 2,851 2,971 3,119 3,253 3,361 3,499 2,287 2,389 2,531 2,659 2,741 2,857 2,999 3,121 3,257 3,371 3,511 2,293 2,393 2,539 2,663 2,749 2,861 3,001 3,137 3,259 3,373 3,517 2,297 2,399 2,543 2,671 2,753 2,879 3,011 3,163 3,271 3,389 3,527 2,309 2,411 2,549 2,677 2,767 2,887 3,019 3,167 3,299 3,391 3,529 2,311 2,417 2,551 2,683 2,777 2,897 3,023 3,169 3,301 3,407 3,533 ---- Page 4 ---- 3,539 3,581 3,623 3,673 3,719 3,769 3,823 3,877 3,919 3,967 4,019 3,541 3,583 3,631 3,677 3,727 3,779 3,833 3,881 3,923 3,989 4,021 3,547 3,593 3,637 3,691 3,733 3,793 3,847 3,889 3,929 4,001 4,027 3,557 3,607 3,643 3,697 3,739 3,797 3,851 3,907 3,931 4,003 3,559 3,613 3,659 3,701 3,761 3,803 3,853 3,911 3,943 4,007 3,571 3,617 3,671 3,709 3,767 3,821 3,863 3,917 3,947 4,013
Removing a recurrant regular expression in a string - Python
I have the following collection of items. I would like to add a comma followed by a space at the end of each item so I can create a list out of them. I am assuming the best way to do this is to form a string out of the items and then replace 3 spaces between each item with a comma, using regular expressions? I would like to do this with python, which I am new to. 179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277 281 283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409 419 421 431 433 439 443 449 457 461 463
Instead of a regular expression, how about this (assuming you have it in a file somewhere): items = open('your_file.txt').read().split() If it's just in a string variable: items = your_input.split() To combine them again with a comma in between: print ', '.join(items)
data = """179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277 281 """ To get the list out of it: lst = re.findall("(\d+)", data) print lst To add comma after each item, replace multiple spaces with , and space. data = re.sub("[ ]+", ", ", data) print data