I have the following string: "E2017010000000601". The character E is a control character, followed by the year, then the month, and the last positions hold a user code of at most 7 positions. How can I use Python to remove the unnecessary zeros from the middle of the string?
For example, in the string "E2018090001002202", I do not need the 3 zeros between the 9 and the 1.
Likewise, in the string "E2017010000000601", I do not need the 7 zeros between the 1 and the 6.
I have over 1000 files named with this type of string, and renaming them one by one is tedious. I know Python can rename this many files, but the code I wrote does not build the names the way I explained. Any help?
This is basic string slicing as long as you are sure the structure is identical for each string.
You can use something like:
original_string = "E2017010000000601"
cut_string = str(int(original_string[7:]))
This works because the slice first skips the first 7 characters: the control character, the year and the month.
Converting the remainder to an integer drops all the leading zeros, and str() turns the result back into a string.
Basically the same answer as Alexis, but since I can't comment yet, in a separate answer: since you want to keep the "EYYYYMM" part of the string, the code would be:
>>> original_string = 'E2017010000000601'
>>> cut_string = original_string[:7] + str(int(original_string[7:]))
>>> cut_string
'E201701601'
A quick explanation: we know what the first seven characters of the string will be, and we want to keep those in the string. Then we add the rest of the string, turned into an integer and back into a string, so that all unnecessary zeroes in front are removed.
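Since the original question is about renaming more than 1000 files, here is a minimal sketch of how the slicing above could be applied to a whole directory. The names shorten_code and rename_all, the dry_run flag, and the assumption that the code is the file name without its extension are all hypothetical and not from the question; adjust them to the real layout.

```python
from pathlib import Path


def shorten_code(name):
    """Keep the 7-character 'EYYYYMM' prefix and drop leading zeros from the rest."""
    prefix, user_code = name[:7], name[7:]
    return prefix + str(int(user_code))


def rename_all(directory, dry_run=True):
    """Rename files whose stem looks like 'E' followed only by digits.

    Assumption: the code is the file name without its extension.
    With dry_run=True nothing is renamed; the planned changes are only printed.
    """
    for path in sorted(Path(directory).iterdir()):
        stem = path.stem
        looks_like_code = (
            path.is_file()
            and len(stem) > 7
            and stem.startswith("E")
            and stem[1:].isdigit()
        )
        if not looks_like_code:
            continue
        target = path.with_name(shorten_code(stem) + path.suffix)
        if target.exists():
            # Covers names that are already short, and avoids overwriting files.
            continue
        if dry_run:
            print(path.name, "->", target.name)
        else:
            path.rename(target)
```

Running rename_all on the folder with dry_run=True first lets you check the planned names before anything is changed.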
I have a string that is 5 GB in size, and I would like to get its last 30 characters. Is slicing the best way to get that substring, or will it cause memory problems? Will another 5 GB be allocated because a 4.99 GB substring and a 0.1 KB substring are created during the splitting process?
str.split() creates a list. So, you will end up with, at the very least, a 5GB string and a 5GB list, plus whatever memory is used in the process. The best way to get the last x characters of a string is negative indexing.
x = 30
last_30_characters = very_long_string[-x:]
Edit: Slicing a string copies only the characters inside the slice, so beyond the original string the only extra memory needed is the 30-character result.
I believe you could use negative indexing.
sample_string = 'hello there'
print(sample_string[-3:])
You can get the last 30 characters with string slicing, e.g. name_of_string[-30:]. This won't create a new object for the rest of the string.
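To make the memory argument concrete, here is a small sketch. It uses a 10 MB stand-in rather than a real 5 GB string, and sys.getsizeof reports CPython's own size for each string object, so the exact numbers are implementation details.

```python
import sys

# Stand-in for the huge string: about 10 MB of filler plus a 30-character tail.
very_long_string = "x" * 10_000_000 + "the last thirty characters!!!!"

# Negative slicing builds a new string holding only the sliced characters.
tail = very_long_string[-30:]

print(len(tail))                          # 30
print(sys.getsizeof(very_long_string))    # roughly 10 MB
print(sys.getsizeof(tail))                # well under a kilobyte
```

The slice is a small independent object, so the original string is the only large allocation.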
I assume you have your string stored in a file.
You don't have to load your entire string into memory even if there is no \n separating them. This link is helpful: https://docs.python.org/3/tutorial/inputoutput.html
Say, text.txt file contains 0123456789\n as its content.
with open('text.txt', 'rb') as f:
    f.seek(-4, 2)  # move the file cursor to the 4th last byte.
    # read the rest string into memory, strip trailing newline, decode to normal string
    text = f.read().strip().decode("utf-8")
    print(text)  # '789'
You need to adjust it to your application.
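One possible adjustment, as a sketch: wrap the seek in a helper that also copes with files shorter than the requested tail. The name tail_text and its parameters are hypothetical. Because the seek counts bytes, not characters, a multi-byte UTF-8 character may be cut at the start of the tail; this sketch simply drops such a fragment with errors="ignore".

```python
import os


def tail_text(path, n_bytes, encoding="utf-8"):
    """Return the last n_bytes of a file as text without reading the rest."""
    with open(path, "rb") as f:
        size = f.seek(0, os.SEEK_END)    # seek() returns the new position, i.e. the file size
        f.seek(max(size - n_bytes, 0))   # never seek before the start of the file
        data = f.read()
    # A partial multi-byte character at the cut point is silently dropped.
    return data.decode(encoding, errors="ignore").strip()
```

For the text.txt example above, tail_text('text.txt', 4) gives '789'.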
I want a format string that I can format with an integer so that it:
1. Adds a sign in front of the integer (+ for positive ints, - for negative ints)
2. Surrounds the signed int with parentheses, i.e. ()
3. Left-aligns the parenthesized int, adding trailing spaces if necessary.
I know how to do these steps separately, but I haven't been able to combine them into a single string.
1 and 2 would be accomplished with for example '({:+d})'.format(3), this would result in (+3).
3 is done for an arbitrary string with '{:<5}'.format(3), this would result in 3 (4 trailing spaces).
My goal is to have a single string on which I call .format only once, so
format_string.format(3)
would result in
(+3)
with one trailing space to make the string length 5.
Is this possible?
I've tried ({{:+d}:<5}) but this doesn't work as it thinks {:+d} is the field name to format with <5, which is obviously not the case.
I've also looked into f-strings, but these are not suitable for my use case as I call .format on the format string later than when it's created.
Any help would be most welcome!
Solution with one call for format:
def special_format_int(n, SPACES=5):
    return '({:+d})'.format(n).ljust(SPACES)
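If the calling code really needs an object it can call .format(n) on later, one option is to wrap the same two steps in a small class. This is only a sketch: the class name SignedParenFormat is hypothetical, and the object is a stand-in for a format string, not an actual str.

```python
class SignedParenFormat:
    """Hypothetical stand-in for a format string: exposes a single .format(n) call."""

    def __init__(self, width=5):
        self.width = width

    def format(self, n):
        # Step 1: add the sign and the parentheses.
        inner = '({:+d})'.format(n)
        # Step 2: left-align the result in a field of self.width characters.
        return '{:<{width}}'.format(inner, width=self.width)


format_string = SignedParenFormat(width=5)
print(repr(format_string.format(3)))    # '(+3) '
print(repr(format_string.format(-12)))  # '(-12)'
```

Code that only ever calls format_string.format(value) does not need to know that this is not a plain string.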
First of all, I'm new to Python, so maybe my code is a little weird or borderline wrong, but it works, so there is that.
I've been googling this problem but can't find anyone who writes about it. I have a huge list written like this:
1 2 3 4 5
2 2 2 2 2
3 3 3 3 3
etc. Note that the separators are spaces and not tabs, and I can't change this, since I'm working with a printout from ls-dyna.
So I am using this script to remove the whitespace before the numbers, since it has been giving me trouble when I try to format the numbers into a matrix, and I remove the empty lines afterwards:
for line in input:
    print >> output, line.lstrip(' ')
But for some reason, I have 4442 lines (written lines, which are easy to track since they are numbered), while the output only has 4411, so 31 lines with numbers I need are removed.
Why is this?
lstrip() won't remove lines, because it is used inside the print statement, which (the way you use it) always appends a newline character. But the for line in input loop might step through the lines in an unexpected way, i.e. it could skip lines or combine them in a manner you didn't expect.
Maybe newline and carriage return characters cause this strange problem.
I suggest leaving out the .lstrip(' ') for testing and comparing the output with the input to find the places where something changes. You should probably use output.write(line) to bypass the automatic behavior of the print statement (especially the appended newline characters).
Then use a special separator when writing (output.write('###' + line) or similar) to find out how the iteration through the input takes place.
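The debugging advice above can be sketched as follows, in Python 3 syntax and with in-memory files so it runs on its own. The function name and the sample data are made up for illustration; with real files, open(...) objects would take the place of the io.StringIO ones.

```python
import io


def copy_without_leading_spaces(src, dst):
    """Copy lines from src to dst without leading spaces.

    Returns (lines_read, lines_written) so the two counts can be compared.
    """
    lines_read = lines_written = 0
    for line in src:
        lines_read += 1
        stripped = line.lstrip(' ')
        if stripped.strip() == '':
            continue  # skip blank lines here instead of in a second pass
        dst.write(stripped)  # write() adds no extra newline, unlike print
        lines_written += 1
    return lines_read, lines_written


src = io.StringIO("   1 2 3 4 5\n   2 2 2 2 2\n\n   3 3 3 3 3\n")
dst = io.StringIO()
print(copy_without_leading_spaces(src, dst))  # (4, 3)
print(dst.getvalue())
```

If the two counts differ by more than the number of blank lines, the lines are being lost while reading rather than by lstrip().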
I have a massive string I'm trying to parse as a series of tokens in string form, and I found a problem: because many of the strings are alike, calling string.replace() sometimes causes previously replaced characters to be replaced again.
Say the string being replaced is 'goto', and it gets replaced by '41' (hex), which is then converted into ASCII ('A'). Later on, the string 'A' is also due to be replaced, so the already converted token gets replaced again, causing problems.
What would be the best way to make sure each string is replaced only once? Breaking each token off the original string and searching for them one at a time takes very long.
This is the code I have now. Although it more or less works, it's not very fast:
# The largest token is 8 ASCII chars long
# 'out' is the string with the final outputs
while len(data) != 0:
    length = 8
    # sorry THC4k, I used your code at first, but it didn't work out
    # for this and I was too lazy to change it
    while reverse_search(data[:length]) is None:
        length -= 1
    out += reverse_search(data[:length])
    data = data[length:]
If you're trying to substitute all the strings at once, you can use a dictionary:
translation = {'PRINT': '32', 'GOTO': '41'}
code = ' '.join(translation[i] if i in translation else i for i in code.split(' '))
Each token is looked up in the dictionary exactly once, and a lookup takes constant time on average, so the pass is roughly linear in the length of the string. Very fast, although memory usage could be quite substantial, since split() builds a list of all the tokens. Because every token is translated exactly once, an already substituted value is never substituted again.
Unless there is a function in Python to translate strings via dictionaries that I don't know about, this seems to be the simplest way of putting it.
It turns
10 PRINT HELLO
20 GOTO 10
into
10 32 HELLO
20 41 10
I hope this has something to do with your problem.
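The dictionary approach above relies on tokens being separated by spaces. When they are not, a related technique is a single regular-expression pass with a dictionary lookup as the replacement callback. This is a sketch that swaps in re.sub, not the answerer's method; replace_once is a hypothetical name, and the translation dict is assumed to be non-empty.

```python
import re


def replace_once(text, translation):
    """Replace every key of translation in one left-to-right pass.

    Text produced by a replacement is never scanned again, so a replaced
    token cannot be replaced a second time.
    """
    # Longest keys first, so a long token wins over a shorter one that is its prefix.
    keys = sorted(translation, key=len, reverse=True)
    pattern = re.compile('|'.join(re.escape(key) for key in keys))
    return pattern.sub(lambda match: translation[match.group(0)], text)


translation = {'goto': 'A', 'A': 'B'}
print(replace_once('gotoA', translation))  # AB

# Chained str.replace calls show the problem from the question:
print('gotoA'.replace('goto', 'A').replace('A', 'B'))  # BB
```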
I've been searching on this but am coming up a little short on exactly how to do what I am trying to do. I want to concatenate a string with a variable, as shown below, where I need to use a variable holding a string to refer to a list by name and then index it (with the index coming from another variable). I simplified my code below to show just the relevant parts; it is part of a macro that replaces values:
toreplacetype = 'type'
toreplace_indx = 5
replacement_string = 'list'+toreplacetype[toreplace_indx]
so... I am trying to make the string on the last line equal to the actual variable name:
replacement_string = listtype[5]
Any advice on how to do this is appreciated
EDIT:
To explain further, this is for a macro that works as a sort of template system: I mark things in a Python script that I want to replace with specific values, and I use regex to do the replacing. When I match something, I want to replace it with a specific value from a list. For example, the template contains {{type}}, so I extract "type", but then I need to manipulate it as above so that I can use the extracted value to pick a specific value from a list called "listtype". There is more than one list, so I need to find the one called "listtype", which is why I want to concatenate as above, based on the value I extracted using regex.
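Looking up a variable by a constructed name is not recommended. Keep the lists in a dict instead (here one named vars, which shadows the built-in function of the same name):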
vars['list%s' % toreplacetype][5] = ...
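A minimal sketch of the dict approach, with made-up list contents. The name lists_by_type and the sample values are hypothetical; the keys are the names extracted from the template, so no variable name has to be built from a string.

```python
# One dict replaces the separate listtype, listsize, ... variables.
lists_by_type = {
    'type': ['a', 'b', 'c', 'd', 'e', 'f'],
    'size': [10, 20, 30],
}

toreplacetype = 'type'   # e.g. the name extracted from {{type}}
toreplace_indx = 5

replacement_string = lists_by_type[toreplacetype][toreplace_indx]
print(replacement_string)  # f
```

An unknown name raises a KeyError, which is easier to handle than a missing global variable.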
Hrm...
globals()['list%s'% toreplacetype][toreplace_indx]
replacement_string = 'list'+toreplacetype+'['+str(toreplace_indx)+']'
will yield listtype[5] when you print it.
You basically need to build it from 5 parts: 1 string variable, 3 string literals and an int converted to a string.
I think this is what you are asking?