I have two binary data files, and I want to replace part of the second file's contents with data from the first.
This is the sample code I have so far:
Binary_file1 = open("File1.yuv","rb")
Binary_file2 = open("File2.yuv","rb")
data1 = Binary_file1.read()
data2 = Binary_file2.read()
bytes = iter(data1)
for i in range(4, 10):
    data2[i] = next(bytes)
It fails at the line where I assign next(bytes) to data2[i], with the error “'str' object does not support item assignment”.
What I don't understand is why this is a string object, and how I can resolve the error. Any help would be appreciated.
Please note that the binary files here are huge, and I would like to avoid creating duplicate files, as I will always run into memory issues.
You opened the file and read it, so data2 holds a string. Strings do not support item assignment.
Instead, you could do:
data2 = data2[:i] + next(bytes) + data2[i + 1:]
Strings cannot be changed in place (i.e. they are immutable). Try this:
a = 'abcde'
a[2] = 'F'
You will get an error. But this will work (note that replace changes every occurrence of a[2], not just the one at index 2):
a = a.replace(a[2], 'F')
You might be better off building a new string, then slicing it into your data2.
newstring = ''
for i in range(4, 10):
    newstring += next(bytes)
data2 = data2.replace(data2[4:10], newstring)
Of course, the problem here is that data2[4:10] may not be unique within data2, in which case you will have multiple replacements. So, the following may be even better:
data2 = data2[:4] + newstring + data2[10:]
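Since the files are huge, another option is to avoid reading File2.yuv into memory at all: open it in "r+b" mode, seek to the offset, and write over the bytes in place. What follows is only a minimal sketch under assumptions: the helper name overwrite_range is made up, the small temporary files stand in for the real .yuv files, and it targets Python 3, where read() in binary mode returns bytes.

```python
import os
import tempfile

def overwrite_range(src_path, dst_path, start, stop):
    """Overwrite dst[start:stop] in place with the first bytes of src.

    Hypothetical helper: only (stop - start) bytes are held in memory.
    """
    with open(src_path, "rb") as src, open(dst_path, "r+b") as dst:
        chunk = src.read(stop - start)   # bytes taken from the start of src
        dst.seek(start)                  # jump to the first byte to replace
        dst.write(chunk)                 # overwrite the existing bytes

# Small stand-in files instead of the real File1.yuv / File2.yuv.
tmp_dir = tempfile.mkdtemp()
file1 = os.path.join(tmp_dir, "File1.yuv")
file2 = os.path.join(tmp_dir, "File2.yuv")
with open(file1, "wb") as f:
    f.write(b"ABCDEFGHIJ")
with open(file2, "wb") as f:
    f.write(b"0123456789abcdef")

overwrite_range(file1, file2, 4, 10)

with open(file2, "rb") as f:
    patched = f.read()
print(patched)
```

This changes File2.yuv itself, so keep a backup if the original is still needed. If the whole file does fit in memory, bytearray(data2) gives a mutable copy that accepts slice assignment directly.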
I have a list of lists of sequences, and a corresponding list of lists of names.
testSequences = [
['aaaa', 'cccc'],
['tt', 'gg'],
['AAAAAAA', 'CCCCCC', 'TTTTTT', 'GGGGGG']]
testNames = [
['>xx_oneFish |xzx', '>xx_twoFish |zzx'],
['>xx_redFish |zxx', '>xx_blueFish |zxx'],
['>xx_oneFish |xzx', '>xx_twoFish |xzx', '>xz_redFish |xxx', '>zx_blueFish |xzz']]
I also have a list of all the identifying parts of the names:
taxonNames = ['oneFish', 'twoFish', 'redFish', 'blueFish']
I am trying to produce a new list, where each item in the list will correspond to one of the "identifying parts of the names", and the string will be made up of all the sequences for that name.
If a name and its sequence do not appear in one of the inner lists (e.g. there is no redFish or blueFish in the first list of testNames), I want to add a string of hyphens the same length as the sequences in that list. This would give me this output:
['aaaa--AAAAAAA', 'cccc--CCCCCC', '----ttTTTTTT', '----ggGGGGGG']
I have this piece of code to do this.
complete = [''] * len(taxonNames)
for i in range(len(testSequences)):
    for j in range(len(taxonNames)):
        sequenceLength = len(testSequences[i][0])
        for k in range(len(testSequences[i])):
            if taxonNames[j] in testNames[i][k]:
                complete[j].join(testSequences[i][k])
            if taxonNames[j] not in testNames[i][k]:
                hyphenString = "-" * sequenceLength
                complete[j].join(hyphenString)
print complete
"complete" should give my final output as explained above, but it comes out looking like this:
['', '', '', '']
How can I fix my code to give me the correct answer?
The main issue with your code, which makes it very hard to understand, is you're not really leveraging the language elements that make Python so strong.
Here's a solution to your problem that works:
test_sequences = [
['aaaa', 'cccc'],
['tt', 'gg'],
['AAAAAAA', 'CCCCCC', 'TTTTTT', 'GGGGGG']]
test_names = [
['>xx_oneFish |xzx', '>xx_twoFish |zzx'],
['>xx_redFish |zxx', '>xx_blueFish |zxx'],
['>xx_oneFish |xzx', '>xx_twoFish |xzx', '>xz_redFish |xxx', '>zx_blueFish |xzz']]
taxon_names = ['oneFish', 'twoFish', 'redFish', 'blueFish']
def get_seqs(taxon_name, sequences_list, names_list):
    for seqs, names in zip(sequences_list, names_list):
        found_seq = None
        for seq, name in zip(seqs, names):
            if taxon_name in name:
                found_seq = seq
                break
        yield found_seq if found_seq else '-' * len(seqs[0])

result = [''.join(get_seqs(taxon_name, test_sequences, test_names))
          for taxon_name in taxon_names]
print(result)
The generator get_seqs pairs up lists from test_sequences and test_names and for each pair, tries to find the sequence (seq) for the name (name) that matches and yields it, or yields a string of the right number of hyphens for that list of sequences.
The generator (a function that yields multiple values) has code that quite literally follows the explanation above.
The result is then simply a matter of, for each taxon_name, getting all the resulting sequences that match in order and joining them together into a string, which is the result = ... line.
You could make it work with list indexing loops and string concatenation, but this is not a PHP question, now is it? :)
Note: for brevity, you could just access the global test_sequences and test_names instead of passing them in as parameters, but I think that would come back to haunt you if you were to actually use this code. Also, I think it makes semantic sense to change the order of names and sequences in the entire example, but I didn't to avoid further deviating from your example.
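For reference, the original loop prints empty strings because str.join does not modify the string it is called on: it returns a new string, which that code then throws away. A minimal illustration (the variable names are for demonstration only):

```python
complete = ['', '']

# join returns a new string; complete[0] itself is left untouched.
ignored = complete[0].join('aaaa')

# Concatenation with assignment is what actually stores the result.
complete[1] += 'aaaa'
complete[1] += '-' * 2

print(complete)
```

Only the second element changes, which is why an index-based version would need += (or a list of parts joined at the end) rather than a bare join call.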
Here is a solution that may do what you want. It begins, not with your data structures from this post, but with the three example files from your previous post (which you used to build this post's data structures).
The only thing I couldn't figure out was how many hyphens to use for a missing sequence from a file.
differentNames = ['oneFish', 'twoFish', 'redFish', 'blueFish']
files = ['f1.txt', 'f2.txt', 'f3.txt']
data = [[] for _ in range(len(differentNames))]
final = []
for file in files:
    d = dict()
    with open(file, 'r') as fin:
        for line in fin:
            line = line.rstrip()
            if line.startswith('>'):  # for ex., >xx_oneFish |xxx
                underscore = line.index('_')
                space = line.index(' ')
                key = line[underscore+1:space]
            else:
                d[key] = line
    for i, key in enumerate(differentNames):
        data[i].append(d.get(key, '-' * 4))
for array in data:
    final.append(''.join(array))
print(final)
Prints:
['AAAAAAAaaaa----', 'CCCCCCcccc----', 'TTTTTT----tt', 'GGGGGG----gg']
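On the open point about the hyphen count: the question asks for a run of hyphens as long as the sequences in that file, so one option is to take the width from any sequence already read from it instead of the fixed 4. A small sketch, assuming every sequence within one file has the same length; pad_missing is a hypothetical helper, and the dictionary passed in stands in for the d built per file above.

```python
taxon_names = ['oneFish', 'twoFish', 'redFish', 'blueFish']

def pad_missing(per_file, names):
    """Return one entry per name, padding absent names with hyphens.

    per_file maps taxon name -> sequence for a single file (hypothetical input).
    """
    # Take the padding width from the first sequence found in this file.
    width = len(next(iter(per_file.values()))) if per_file else 0
    return [per_file.get(name, '-' * width) for name in names]

row = pad_missing({'redFish': 'tt', 'blueFish': 'gg'}, taxon_names)
print(row)
```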
I am trying to match elements from two lists and write them to a file: match rows from both lists on col[0] and print certain columns to a new file.
with open('~/gf_out.txt', 'w') as w:
    w.write('\t'.join(headers) + '\n')
    for i in d1: #list1
        for j in d2: # list2
            if i[0] == j[0]:
                out = ((j[0:10]),i[1],i[2],j[11],j[12])
                # print out
                w.write('\t'.join(out) + '\n')
TypeError: sequence item 0: expected string, list found
If out is changed to
out = (str(j[0:10]),i[1],i[2],j[11],j[12])
the final output has [ ] around the first 10 columns. How can this be fixed?
ANALYSIS
Your problem is right where the error message told you it was, and it is just what the message described ... once you're comfortable enough with Python to interpret the description.
out = ((j[0:10]),i[1],i[2],j[11],j[12])
w.write('\t'.join(out) + '\n')
join operates on a sequence of strings. You gave it a sequence, but its first element is the list j[0:10]; the extra parentheses around it do not turn it into a tuple or a string.
REMEDY
You have nested lists, so you need nested joins.
sep = '\t' # separator
out_0 = sep.join(j[0:10])
out_line = sep.join((out_0, i[1], i[2], j[11], j[12]))
w.write(out_line + '\n')
Yes, you can recombine this to a single-line write; I broke it down to make the logic clear.
If this doesn't match your needs, then please provide the required MCVE to clarify the problems.
What exactly do you want it to do? j[0:10] is a list, so if you convert it to a string with str(), it will have square brackets. If you want those elements to be joined by tabs as well, you need to either do that explicitly or concatenate it with the other list instead of embedding it.
out = ('\t'.join(j[0:10]),i[1],i[2],j[11],j[12])
or
out = j[0:10] + [i[1],i[2],j[11],j[12]]
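To see the second variant end to end, here is a small sketch with made-up rows; d1, d2 and their contents are placeholders, and an in-memory buffer stands in for the output file.

```python
import io

# Made-up rows: d1 has 3 columns, d2 has 13, and column 0 is the shared key.
d1 = [['id1', 'a1', 'a2']]
d2 = [['id1'] + ['c%d' % n for n in range(1, 13)]]

buf = io.StringIO()  # stands in for the output file
for i in d1:
    for j in d2:
        if i[0] == j[0]:
            # List concatenation keeps every element a plain string.
            out = j[0:10] + [i[1], i[2], j[11], j[12]]
            buf.write('\t'.join(out) + '\n')

print(buf.getvalue())
```

Separately, open() does not expand '~' in a path; os.path.expanduser('~/gf_out.txt') returns the full path if that is what was intended.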
I have a long string variable full of hex values:
hexValues = 'AA08E3020202AA08E302AA1AA08E3020101' etc..
The first 2 bytes (AA08) are a signature for the start of a frame and the rest of the data up to the next AA08 are the contents of the signature.
I want to slice the string into a list based on the reoccurring start of frame sign, e.g:
list = [AA08, E3020202, AA08, F25S1212, AA08, 42ABC82] etc...
I'm not sure how I can split the string up like this. Some of the frames are also corrupted, where the start of the frame won't have AA08 but maybe AA01, so I'd need some kind of regex to spot these.
If I do list = hexValues.split('AA08'), the list just loses all the start-of-frame markers.
So I'm a bit stuck.
Newbie to python.
Thanks
For the case when you don't have "corrupted" data the following should do:
hex_values = 'AA08E3020202AA08E302AA1AA08E3020101'
delimiter = hex_values[:4]
hex_values = hex_values.replace(delimiter, ',' + delimiter + ',')
hex_list = hex_values.split(',')[1:]
print(hex_list)
['AA08', 'E3020202', 'AA08', 'E302AA1', 'AA08', 'E3020101']
Without considering corruptions, you may try this.
l = []
for s in hexValues.split('AA08'):
    if s:
        l += ['AA08', s]
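For the corrupted frames mentioned in the question, re.split with a capturing group keeps the markers in the result. This is only a sketch: the pattern assumes a damaged marker still looks like "AA0" plus one hex digit, which is a guess, and the sample string is made up so that its second marker is AA01.

```python
import re

# Assumption: a (possibly corrupted) start-of-frame marker is "AA0" followed
# by one more hex digit. Adjust the pattern to whatever the real data shows.
FRAME_START = re.compile(r'(AA0[0-9A-F])')

# Made-up sample; the second marker is the corrupted AA01.
hex_values = 'AA08E3020202AA01E302AA1AA08E3020101'

# The capturing group makes re.split keep the markers in the result;
# the filter drops the empty string before the first marker.
frames = [part for part in FRAME_START.split(hex_values) if part]
print(frames)
```

Note that any payload that happens to contain the same four characters would be split as well, so a looser pattern trades missed frames for false starts.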
I have a file in which I am trying to replace parts of a line with another word.
Each line looks like bobkeiser:bob123#bobscarshop.com:0.0.0.0.0:23rh32o3hro2rh2:234212
I need to delete everything but bob123#bobscarshop.com, but I also need to match 23rh32o3hro2rh2 with 23rh32o3hro2rh2:poniacvibe from a different text file and place poniacvibe in front of bob123#bobscarshop.com,
so it would look like this: bob123#bobscarshop.com:poniacvibe
I've had a hard time working out how to do this. I think I would have to split bobkeiser:bob123#bobscarshop.com:0.0.0.0.0:23rh32o3hro2rh2:234212 with data.split(":"), but some of the lines have a colon (:) in a spot where I don't want the line to be split, if that makes any sense.
If anyone could help, I would really appreciate it.
ok, it looks to me like you are using a colon : to separate your strings.
in this case you can use .split(":") to break your strings into their component substrings
eg:
firststring = "bobkeiser:bob123#bobscarshop.com:0.0.0.0.0:23rh32o3hro2rh2:234212"
print(firststring.split(":"))
would give:
['bobkeiser', 'bob123#bobscarshop.com', '0.0.0.0.0', '23rh32o3hro2rh2', '234212']
and assuming your substrings will always be in the same order, and the same number of substrings in the main string you could then do:
firststring = "bobkeiser:bob123#bobscarshop.com:0.0.0.0.0:23rh32o3hro2rh2:234212"
firstdata = firststring.split(":")
secondstring = "23rh32o3hro2rh2:poniacvibe"
seconddata = secondstring.split(":")
if firstdata[3] == seconddata[0]:
    outputdata = firstdata
    outputdata.insert(1,seconddata[1])
    outputstring = ""
    for item in outputdata:
        if outputstring == "":
            outputstring = item
        else:
            outputstring = outputstring + ":" + item
what this does is:
extract the bits of the strings into lists
see if the "23rh32o3hro2rh2" string can be found in the second list
find the corresponding part of the second list
create a list to contain the output data and put the first list into it
insert the "poniacvibe" string before "bob123#bobscarshop.com"
stitch the outputdata list back into a string using the colon as the separator
the reason your strings need to be the same length is because the index is being used to find the relevant strings rather than trying to use some form of string type matching (which gets much more complex)
if you can keep your data in this form it gets much simpler.
to protect against malformed data (lists too short) you can explicitly test for them before you start, using len(list) to see how many elements are in each list.
or you could let it run and catch the exception, however in this case you could end up with unintended results, as it may try to match the wrong elements from the list.
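The explicit length test could look roughly like the sketch below. merge_line is a hypothetical helper, and the expected field count of 5 is taken from the single sample line in the question, so treat it as an assumption.

```python
def merge_line(first_line, second_line, expected_fields=5):
    """Return first_line with the matched word inserted, or None if no match.

    Hypothetical helper; expected_fields=5 reflects the sample line only.
    """
    first = first_line.strip().split(":")
    second = second_line.strip().split(":")
    # Guard against malformed lines before indexing into the lists.
    if len(first) != expected_fields or len(second) != 2:
        return None
    if first[3] != second[0]:
        return None
    first.insert(1, second[1])
    return ":".join(first)  # same result as the manual stitching loop

good = merge_line(
    "bobkeiser:bob123#bobscarshop.com:0.0.0.0.0:23rh32o3hro2rh2:234212",
    "23rh32o3hro2rh2:poniacvibe",
)
bad = merge_line("too:short", "23rh32o3hro2rh2:poniacvibe")
print(good, bad)
```

A line with an extra colon fails the length test and comes back as None rather than being matched on the wrong field.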
hope this helps
James
EDIT:
ok so if you are trying to match up a long list of strings from files you would probably want something along the lines of:
firstfile = open("firstfile.txt", mode = "r")
secondfile= open("secondfile.txt",mode = "r")
first_raw_data = firstfile.readlines()
firstfile.close()
second_raw_data = secondfile.readlines()
secondfile.close()
first_data = []
for item in first_raw_data:
    first_data.append(item.replace("\n","").split(":"))
second_data = []
for item in second_raw_data:
    second_data.append(item.replace("\n","").split(":"))
output_strings = []
for item in first_data:
    searchstring = item[3]
    for entry in second_data:
        if searchstring == entry[0]:
            output_data = item
            output_string = ""
            output_data.insert(1,entry[1])
            for data in output_data:
                if output_string == "":
                    output_string = data
                else:
                    output_string = output_string + ":" + data
            output_strings.append(output_string)
            break
for entry in output_strings:
    print(entry)
this should achieve what you're after and, as a proof of concept, will print the resulting list of strings for you.
if you have any questions feel free to ask.
James
Second edit:
to make this output the results into a file change the last two lines to:
outputfile = open("outputfile.txt", mode = "w")
for entry in output_strings:
    outputfile.write(entry+"\n")
outputfile.close()
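With large files, the nested loop compares every line of the first file against every line of the second. A possible variation, sketched under the assumption that each key appears only once in the second file, builds a dictionary first and looks each key up directly. The function name and the in-memory sample lines are illustrative; the field order follows the code above (word inserted at position 1).

```python
def merge_records(first_lines, second_lines):
    """Insert the matching word from second_lines into each first_lines record."""
    # Map key -> word, e.g. "23rh32o3hro2rh2" -> "poniacvibe".
    lookup = {}
    for line in second_lines:
        fields = line.rstrip("\n").split(":")
        if len(fields) == 2:
            lookup[fields[0]] = fields[1]

    merged = []
    for line in first_lines:
        fields = line.rstrip("\n").split(":")
        # Skip lines that are too short or have no matching key.
        if len(fields) > 3 and fields[3] in lookup:
            fields.insert(1, lookup[fields[3]])
            merged.append(":".join(fields))
    return merged

first_lines = ["bobkeiser:bob123#bobscarshop.com:0.0.0.0.0:23rh32o3hro2rh2:234212\n"]
second_lines = ["23rh32o3hro2rh2:poniacvibe\n"]
merged = merge_records(first_lines, second_lines)
print(merged)
```

The two lists can come from readlines() as in the code above; opening the files in a with block would also close them automatically.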
This is my code:
import re
with open("C:\\Corpora\\record-13.txt") as f:
    concepts = f.readlines()
j = 0
for line in concepts:
    PATTERN = re.compile(r'''((?:[^ "]|"[^"]*")+)''')
    TokCurrLineCon = PATTERN.split(line)[1::2]
    temp = TokCurrLineCon[1].split(':')
    StartLineNum[j] = temp[0]
    StartOffset[j] = temp[1]
    temp = TokCurrLineCon[2].split('||')
    EndOfCon[j] = temp[0]
    TypeOfCon[j] = temp[1]
    temp = EndOfCon[j].split(':')
    EndLineNum[j] = temp[0]
    EndOffset[j] = temp[1]
    temp = TypeOfCon[j].split('"')
    TypeOfCon[j] = temp[1]
    j +=1
I need 5 lists at the end (StartLineNum, StartOffset, EndLineNum, EndOffset, TypeOfCon), but when I run it I get this error at the line StartLineNum[j] = temp[0]:
TypeError: 'str' object does not support item assignment
Any idea how to fix it?
The error message is telling you that StartLineNum is a str, so StartLineNum[j] = <anything> is illegal.
From your description, it sounds like you expected StartLineNum to be a list. So presumably the problem is that you constructed a string instead of a list somewhere in the code above. Since we can't see that code, we can't fix it, beyond saying that you should create a list if you want a list.
However, I suspect there's another problem in your code. For this to work, StartLineNum would have to be not just a list, but a list that's already got as many members as the file has lines. But you can't know how many that is until you've read the whole file in. A better solution would be to use the append method on lists. (Then you don't need the j variable, either.) For example:
StartLineNum = []
for line in concepts:
    # blah blah
    StartLineNum.append(temp[0])
    # etc.
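Extended to all five lists, the append approach might look like the sketch below. The input format is not shown in the question, so the sample line is a guess that merely fits the splitting logic, and parse_concepts is a hypothetical name.

```python
import re

# Same tokenizer as in the question: runs of non-space text or quoted strings.
PATTERN = re.compile(r'''((?:[^ "]|"[^"]*")+)''')

def parse_concepts(lines):
    """Collect the five fields into separate lists using append."""
    start_line, start_offset = [], []
    end_line, end_offset, type_of_con = [], [], []
    for line in lines:
        tokens = PATTERN.split(line.strip())[1::2]
        s_line, s_off = tokens[1].split(':')
        end_part, type_part = tokens[2].split('||')
        e_line, e_off = end_part.split(':')
        start_line.append(s_line)
        start_offset.append(s_off)
        end_line.append(e_line)
        end_offset.append(e_off)
        type_of_con.append(type_part.split('"')[1])  # text inside the quotes
    return start_line, start_offset, end_line, end_offset, type_of_con

# Guessed sample line; the real file format may differ.
sample = ['c="chest pain" 12:3 12:4||t="problem"\n']
parsed = parse_concepts(sample)
print(parsed)
```

Because each list starts empty and grows by one per line, no counter like j is needed and the lists always stay the same length.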