I am trying to zero out a substring of a Python string in memory using ctypes.memset.
Here is the code I am using:
import ctypes
import sys

def memz(string):
    buffsize = len(string) + 1
    size = sys.getsizeof(string)
    offset = size - buffsize
    start = string.find('{')    # 142 in my case
    end = string.find('}') + 1  # 167 in my case
    location = id(string)
    ctypes.memset(location + offset + start, 0, end - start)
I see that this does not memset the substring, but writes some other part of memory. I suspect I am not passing the correct memory location to ctypes.memset.
Do I need to change the format of the location (location + offset + start) that I am passing to ctypes.memset?
PS: The solution is derived from "Mark data as sensitive in python". I tried that solution, but using the memset from ctypes.CDLL('libc.so.6').memset results in a seg fault. I am using Python 2.7.11.
Your solution works too. I tried running it locally and it cleared the required substring.
import ctypes
import sys

str = 'This is a {sample} string'
print(ctypes.string_at(id(str), sys.getsizeof(str)))
buffSize = len(str)+1
size = sys.getsizeof(str)
offset = size - buffSize
start = str.find('{')
end = str.find('}') + 1
location = id(str)
ctypes.memset(location + offset + start, 0, end-start)
print(ctypes.string_at(id(str), sys.getsizeof(str)))
It produces the following output:
���A,�5~��dThis is a {sample} string
���A,�5~��dThis is a string
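Writing into the memory of a str object depends on CPython's internal object layout, so the offset arithmetic above is fragile. As a rough sketch of an alternative, assuming the sensitive data can be kept in a mutable ctypes buffer from the start (the helper name wipe and the sample data are made up for illustration), the same memset call can target memory that ctypes itself owns:

```python
import ctypes

def wipe(buf, start, end):
    """Overwrite buf[start:end] with zero bytes, in place."""
    if not 0 <= start <= end <= len(buf):
        raise ValueError("range lies outside the buffer")
    # addressof gives the address of the buffer's own storage,
    # so no object-header offset has to be guessed.
    ctypes.memset(ctypes.addressof(buf) + start, 0, end - start)

# Hypothetical example data; create_string_buffer appends a trailing NUL.
buf = ctypes.create_string_buffer(b"This is a {sample} string")
start = buf.raw.find(b"{")
end = buf.raw.find(b"}") + 1
wipe(buf, start, end)
print(repr(buf.raw))
```

This only clears the buffer itself; any copies made elsewhere (for example a str built from the same data) are not affected.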
I'm trying to go through each feature in one file (1 per line) and find all matching features based on one column of that line in a second file. I have this solution, which does what I want on small files, but it's very slow on big files (my files have >20,000,000 lines). Here's a sample of the two input files.
My (slow) code:
FEATUREFILE = 'S2_STARRseq_rep1_vsControl_peaks.bed'
CONSERVATIONFILEDIR = './conservation/'
with open(str(FEATUREFILE),'r') as peakFile, open('featureConservation.td',"w+") as outfile:
    for line in peakFile.readlines():
        chrom = line.split('\t')[0]
        startPos = int(line.split('\t')[1])
        endPos = int(line.split('\t')[2])
        peakName = line.split('\t')[3]
        enrichVal = float(line.split('\t')[4])
        #Reject negative peak starts, if they exist (sometimes this can happen w/ MACS)
        if startPos > 0:
            with open(str(CONSERVATIONFILEDIR) + str(chrom)+'.bed','r') as conservationFile:
                cumulConserv = 0.
                n = 0
                for conservLine in conservationFile.readlines():
                    position = int(conservLine.split('\t')[1])
                    conservScore = float(conservLine.split('\t')[3])
                    if position >= startPos and position <= endPos:
                        cumulConserv += conservScore
                        n+=1
            featureConservation = cumulConserv/(n)
            outfile.write(str(chrom) + '\t' + str(startPos) + '\t' + str(endPos) + '\t' + str(peakName) + '\t' + str(enrichVal) + '\t' + str(featureConservation) + '\n')
The best solution for my purposes seems to be rewriting the above code for pandas. Here's what's working well for me on some very large files:
from __future__ import division
import pandas as pd
FEATUREFILE = 'S2_STARRseq_rep1_vsControl_peaks.bed'
CONSERVATIONFILEDIR = './conservation/'
peakDF = pd.read_csv(str(FEATUREFILE), sep = '\t', header=None, names=['chrom','start','end','name','enrichmentVal'])
#Reject negative peak starts, if they exist (sometimes this can happen w/ MACS)
peakDF.drop(peakDF[peakDF.start <= 0].index, inplace=True)
peakDF.reset_index(inplace=True)
peakDF.drop('index', axis=1, inplace=True)
peakDF['conservation'] = 1.0 #placeholder
chromNames = peakDF.chrom.unique()
for chromosome in chromNames:
    chromSubset = peakDF[peakDF.chrom == str(chromosome)]
    chromDF = pd.read_csv(str(CONSERVATIONFILEDIR) + str(chromosome)+'.bed', sep='\t', header=None, names=['chrom','start','end','conserveScore'])
    for i in xrange(0,len(chromSubset.index)):
        x = chromDF[chromDF.start >= chromSubset['start'][chromSubset.index[i]]]
        featureSubset = x[x.start < chromSubset['end'][chromSubset.index[i]]]
        x=None
        featureConservation = float(sum(featureSubset.conserveScore)/(chromSubset['end'][chromSubset.index[i]]-chromSubset['start'][chromSubset.index[i]]))
        peakDF.set_value(chromSubset.index[i],'conservation',featureConservation)
        featureSubset=None
peakDF.to_csv("featureConservation.td", sep = '\t')
To start with, you are looping over ALL of conservationFile every time you read a single line from peakFile, so stick a break after n+=1 in the if statement and that should help somewhat. That is, assuming there is only one match.
Another option is to try using mmap, which may help with buffering.
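One way to avoid rescanning the conservation file for every peak is to load a chromosome's positions once, sort them, and answer each range query with binary search plus prefix sums. The sketch below is only an illustration under assumptions: the helper names build_index and mean_score are made up here, the per-chromosome data is assumed to fit in memory, and the inclusive start <= position <= end test from the question is kept.

```python
import bisect

def build_index(positions, scores):
    """Sort by position and precompute prefix sums of the scores."""
    pairs = sorted(zip(positions, scores))
    sorted_pos = [p for p, _ in pairs]
    prefix = [0.0]
    for _, score in pairs:
        prefix.append(prefix[-1] + score)
    return sorted_pos, prefix

def mean_score(sorted_pos, prefix, start, end):
    """Mean score over start <= position <= end, or None if nothing matches."""
    lo = bisect.bisect_left(sorted_pos, start)
    hi = bisect.bisect_right(sorted_pos, end)
    n = hi - lo
    if n == 0:
        return None  # avoids the division by zero of the naive loop
    return (prefix[hi] - prefix[lo]) / n

# Hypothetical toy data: positions and conservation scores for one chromosome.
sorted_pos, prefix = build_index([5, 1, 3], [6.0, 2.0, 4.0])
print(mean_score(sorted_pos, prefix, 1, 3))
```

The index is built once per chromosome, after which each peak costs two binary searches instead of a full pass over the file.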
Bedtools was made for this, specifically the intersect function:
http://bedtools.readthedocs.io/en/latest/content/tools/intersect.html
I am trying to print out the xyz coordinates over time of a series of animating locators (named tracker1, tracker2, etc). I need to reconvert the locator's xyz data into a text file so I can then bring it into an alternate tracking program. I know that what I need to do on a base level is run a mel or python script to print out the xyz data in a complete list within the script editor, but am having trouble with syntax. The text file itself I can take care of, and I do not need a compiling script for all the locators at once either, though that would be great. Any idea how to do this?
Revised:
Ok so here is what we have right now.
We are using this script, and successfully generating the xyz values for a single frame.
Example: item name "tracker1",
frame: frame "1"
Script:
for ($item in `ls -sl`){
$temp=`xform -q -t -ws $item `;
print ($temp[0]+" "+$temp[1]+" "+$temp[2]+"\n");};
0.1513777615 22.7019734 176.3084331
Thing is, we need this xyz information for every frame in the sequence (frames 1-68).
Thanks in advance
Try this.
I wrote it in Python. It records the translate attribute of all selected objects at every frame
and writes it to a .txt file.
The start and end frames are taken from the time slider's playback range.
import maya.cmds as cmds

# .txt file path where you want to save, for example C:/trackInfo.txt
outPath = 'C:/trackInfo.txt'
# start time of playback
start = cmds.playbackOptions(q= 1, min= 1)
# end time of playback
end = cmds.playbackOptions(q= 1, max= 1)
# locators list
locList = cmds.ls(sl= 1)
if len(locList) > 0:
    try:
        # create/open file path to write
        outFile = open(outPath, 'w')
    except IOError:
        cmds.error('file path does not exist!')
    # info to write in
    infoStr = ''
    # start recording
    for frame in range(int(start), int(end + 1)):
        # move frame
        cmds.currentTime(frame, e= 1)
        # if you need to add a line to write in frame number
        infoStr += str(frame) + '\n'
        # get all locators
        for loc in locList:
            # if you need to add a line to write in locator name
            infoStr += loc + '\n'
            # get position
            pos = cmds.xform(loc, q= 1, t= 1, ws= 1)
            # write in locator pos
            infoStr += str(pos[0]) + ' ' + str(pos[1]) + ' ' + str(pos[2]) + ' ' + '\n'
    # file write in and close
    outFile.write(infoStr)
    outFile.close()
else:
    cmds.warning('select at least one locator')
For moving time frame
1. Use currentTime cmd
Mel:
currentTime -e $frame
Python:
cmds.currentTime(frame, e= 1)
2. Use a for loop with the start and end frame numbers
Mel:
// in your case
int $start = 1;
int $end = 68;
for( $frame = $start; $frame < $end + 1; $frame++ ){
    currentTime -e $frame;
    // do something...
}
Python:
# in your case
start = 1
end = 68
for frame in range(start, end + 1):
    cmds.currentTime(frame, e= 1)
    # do something...
I have a string stored in a variable. Is there a way to read a string up to a certain size e.g. File objects have f.read(size) which can read up to a certain size?
Check out this post for finding object sizes in python.
If you want to read the string from the start until a certain size MAX is reached and then return that new (possibly shorter) string, you might want to try something like this:
import sys
MAX = 176 #bytes
totalSize = 0
newString = ""
s = "MyStringLength"
for c in s:
    totalSize = totalSize + sys.getsizeof(c)
    if totalSize <= MAX:
        newString = newString + str(c)
    elif totalSize > MAX:
        # newString holds the longest prefix whose summed per-character sizes stay within MAX
        print(newString)
        break
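This prints 'MyString', whose per-character sizes add up to at most 176 bytes. The exact result depends on what sys.getsizeof reports for a one-character string on your interpreter.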
Hope this helps.
message = 'a long string which contains a lot of valuable information.'
bite = 10
while message:
    # bite off a chunk of the string
    chunk = message[:bite]
    # set message to be the remaining portion
    message = message[bite:]
    do_something_with(chunk)
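If the goal is specifically a file-like read(size) on a string, the standard library's io.StringIO wraps a string in an object with that method. A minimal sketch, assuming Python 3 text strings and a size counted in characters rather than bytes (the helper name read_chunks is made up for illustration):

```python
import io

def read_chunks(text, size):
    """Yield successive pieces of text, each at most size characters long."""
    stream = io.StringIO(text)
    while True:
        chunk = stream.read(size)
        if not chunk:  # an empty string signals the end, as with files
            break
        yield chunk

message = 'a long string which contains a lot of valuable information.'
chunks = list(read_chunks(message, 10))
print(chunks[0])
```

For a single read from the start, plain slicing such as message[:10] does the same job without the wrapper.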
I have a folder filled with 20 years of daily precipitation data as a PCRaster map stack. I've managed to extract the precipitation values for my area of interest from the original netCDF file and renamed the files like this to avoid confusion:
precip.19810101
precip.19810102
precip.19810103
precip.19810104
precip.19810105
...
precip.20111231
After that, I want to rename all of my files into a PCRaster map stack based on this sequence of dates:
precip00.001
precip00.002
precip00.003
precip00.004
...
I'm a beginner in Python. Is there any help or an example that would let me figure out how to do this?
Thank you
Here's something I put together, based on some old Python scripts I once wrote:
#! /usr/bin/env python
# Rename PCRaster map stack with names following prefix.yyyymmdd to stack with valid
# PCRaster time step numbers
# Johan van der Knijff
#
# Example input stack:
#
# precip.19810101
# precip.19810102
# precip.19810103
# precip.19810104
# precip.19810105
#
# Then run script with following arguments:
#
# python renpcrstack.py precip 1
#
# Result:
#
# precip00.001
# precip00.002
# precip00.003
# precip00.004
# precip00.005
#
import sys
import os
import argparse
import math
import datetime
import glob
# Create argument parser
parser = argparse.ArgumentParser(
    description="Rename map stack")

def parseCommandLine():
    # Add arguments
    parser.add_argument('prefix',
                        action="store",
                        type=str,
                        help="prefix of input map stack (also used as output prefix)")
    parser.add_argument('stepStartOut',
                        action="store",
                        type=int,
                        help="time step number that is assigned to first map in output stack")
    # Parse arguments
    args = parser.parse_args()
    return(args)

def dateToJulianDay(date):
    # Calculate Julian Day from date
    # Source: https://en.wikipedia.org/wiki/Julian_day#Converting_Julian_or_Gregorian_calendar_date_to_Julian_day_number
    # Integer division (//) keeps a, y and m whole numbers on both Python 2 and 3
    a = (14 - date.month)//12
    y = date.year + 4800 - a
    m = date.month + 12*a - 3
    JulianDay = date.day + math.floor((153*m + 2)/5) + 365*y + math.floor(y/4) \
        - math.floor(y/100) + math.floor(y/400) - 32045
    return(JulianDay)

def genStackNames(prefix,start,end, stepSize):
    # Generate list with names of all maps
    # map name is made up of 11 characters, and chars 8 and 9 are
    # separated by a dot. Name starts with prefix, ends with time step
    # number and all character positions in between are filled with zeroes
    # define list that will contain map names
    listMaps = []
    # Count no chars prefix
    charsPrefix = len(prefix)
    # Maximum no chars needed for suffix (end step)
    maxCharsSuffix = len(str(end))
    # No of free positions between pre- and suffix
    noFreePositions = 11 - charsPrefix - maxCharsSuffix
    # Trim prefix if not enough character positions are available
    if noFreePositions < 0:
        # No of chars to cut from prefix if 11-char limit is exceeded
        charsToCut = charsPrefix + maxCharsSuffix - 11
        charsToKeep = charsPrefix - charsToCut
        # Updated prefix
        prefix = prefix[0:charsToKeep]
        # Updated prefix length
        charsPrefix = len(prefix)
    # Generate name for each step
    for i in range(start,end + 1,stepSize):
        # No of chars in suffix for this step
        charsSuffix = len(str(i))
        # No of zeroes to fill
        noZeroes = 11 - charsPrefix - charsSuffix
        # Total no of chars right of prefix
        charsAfterPrefix = noZeroes + charsSuffix
        # Name of map
        thisName = prefix + (str(i)).zfill(charsAfterPrefix)
        thisFile = thisName[0:8]+"." + thisName[8:11]
        listMaps.append(thisFile)
    return listMaps

def main():
    # Parse command line arguments
    args = parseCommandLine()
    prefix = args.prefix
    stepStartOut = args.stepStartOut
    # Glob pattern for input maps: prefix + dot + 8 char extension
    pattern = prefix + ".????????"
    # Get list of all input maps based on glob pattern
    mapsIn = glob.glob(pattern)
    # Set time format
    tfmt = "%Y%m%d"
    # Set up dictionary that will act as lookup table between Julian Days (key)
    # and Date string
    jDayDate = {}
    for map in mapsIn:
        baseNameIn = os.path.splitext(map)[0]
        dateIn = os.path.splitext(map)[1].strip(".")
        # Convert to date / time format
        dt = datetime.datetime.strptime(dateIn, tfmt)
        # Convert date to Julian day number
        jDay = int(dateToJulianDay(dt))
        # Store as key-value pair in dictionary
        jDayDate[jDay] = dateIn
    # Number of input maps (equals number of key-value pairs)
    noMaps = len(jDayDate)
    # Create list of names for output files
    mapNamesOut = genStackNames(prefix, stepStartOut, noMaps + stepStartOut -1, 1)
    # Iterate over Julian Days (ascending order)
    i = 0
    for key in sorted(jDayDate):
        # Name of input file
        fileIn = prefix + "."+ jDayDate[key]
        # Name of output file
        fileOut = mapNamesOut[i]
        # Rename file
        os.rename(fileIn, fileOut)
        print("Renamed " + fileIn + " ---> " + fileOut)
        i += 1

main()
(Alternatively download the code from my Github Gist.)
You can run it from the command line, using the prefix of your map stack and the number of the first output map as arguments, e.g.:
python renpcrmaps.py precip 1
Please note that the script renames the files in place, so make sure to make a copy of your original map stack in case something goes wrong (I only did some very limited testing on this!).
Also, the script assumes a non-sparse input map stack, i.e. in case of daily maps, an input map exists for each day. In case of missing days, the numbering of the output maps will not be what you'd expect.
The internal conversion of all dates to Julian Days may be a bit overkill here, but once you start doing more advanced transformations it does make things easier because it gives you decimal numbers which are more straightforward to manipulate than date strings.
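As a rough illustration of the naming rule and of how day numbers help with missing days, the sketch below derives the time step from the date itself instead of from the file's position in the list. It is not part of the script above: the helper names pcr_name and step_for_date are made up, it uses datetime's toordinal in place of the Julian Day function, and it assumes the prefix is short enough (at most 8 characters together with the zero padding) that no trimming is needed.

```python
import datetime

def pcr_name(prefix, step):
    """Build an 11-character PCRaster stack name with a dot after character 8."""
    padded = prefix + str(step).zfill(11 - len(prefix))
    return padded[:8] + "." + padded[8:]

def step_for_date(date_str, first_date_str, fmt="%Y%m%d"):
    """Time step of date_str when first_date_str is step 1 (daily steps)."""
    day = datetime.datetime.strptime(date_str, fmt).date()
    first = datetime.datetime.strptime(first_date_str, fmt).date()
    return day.toordinal() - first.toordinal() + 1

# Hypothetical example: the fifth day of a stack that starts on 1981-01-01.
print(pcr_name("precip", step_for_date("19810105", "19810101")))
```

With this approach a missing day leaves a gap in the step numbers instead of shifting every later map.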
As you used the [batch-file] tag, I assume Batch is OK:
@echo off
setlocal enabledelayedexpansion
set /a counti=0
for /f "delims=" %%a in ('dir /b /on precip.*') do (
    set /a counti+=1
    set "counts=000000000!counti!"
    ECHO ren "%%a" "precip!counts:~-6,3!.!counts:~-3!"
)
Remove the ECHO after checking that the output is what you expect.
EDITED to match your requirement that precip00.999 is followed by precip01.000, and so on up to precip07.300. (Your question uses precip000.001 and your comment uses precip00.001; I decided to use the first format. For the second format, change the line to ECHO ren "%%a" "precip!counts:~-5,2!.!counts:~-3!".) Although it's not Batch anymore, I'll leave the answer; maybe you can at least use the logic.
If you are not familiar with Batch, the %variable:~-6,3% syntax is explained by set /?
I faced this issue a short while ago. Please note that I am new to both Python and PCRaster, so do not use my example without checking it.
import os
import shutil
import fnmatch
import subprocess
from os import listdir
from os.path import isfile, join
from shutil import copyfile
TipeofFile = 'precip.????????' # original file
Files = []
for iListFile in sorted(os.listdir('.')):
    if fnmatch.fnmatch(iListFile, TipeofFile):
        Files.append(iListFile)
digiafter = 3 #after the point: .001, .002, 0.003
digitTotal = 8 #total: precipi00000.000 (5.3)
for j in xrange(0, len(Files)):
    num = str(j + 1)
    nameFile = Files[j]
    putZeros = digitTotal - len(num)
    for x in xrange(0,putZeros):
        num = "0" + num
    precip = num[0:digitTotal-digiafter]+ '.' +num[digitTotal-digiafter:digitTotal]
    precip = str(precip)
    precip = 'precip' + precip
    copyfile(nameFile, precip)
I am running the following code on ubuntu 11.10, python 2.7.2+.
import urllib
import Image
import StringIO
source = '/home/cah/Downloads/evil2.gfx'
dataFile = open(source, 'rb').read()
slicedFile1 = StringIO.StringIO(dataFile[::5])
slicedFile2 = StringIO.StringIO(dataFile[1::5])
slicedFile3 = StringIO.StringIO(dataFile[2::5])
slicedFile4 = StringIO.StringIO(dataFile[3::5])
jpgimage1 = Image.open(slicedFile1)
jpgimage1.save('/home/cah/Documents/pychallenge12.1.jpg')
pngimage1 = Image.open(slicedFile2)
pngimage1.save('/home/cah/Documents/pychallenge12.2.png')
gifimage1 = Image.open(slicedFile3)
gifimage1.save('/home/cah/Documents/pychallenge12.3.gif')
pngimage2 = Image.open(slicedFile4)
pngimage2.save('/home/cah/Documents/pychallenge12.4.png')
In essence, I'm taking a binary file that contains the bytes of several image files interleaved
like 123451234512345..., separating them into one stream per image, and saving each. The problem is that I'm getting the following error:
File "/usr/lib/python2.7/dist-packages/PIL/PngImagePlugin.py", line 96, in read
len = i32(s)
File "/usr/lib/python2.7/dist-packages/PIL/PngImagePlugin.py", line 44, in i32
return ord(c[3]) + (ord(c[2])<<8) + (ord(c[1])<<16) + (ord(c[0])<<24)
IndexError: string index out of range
I found PngImagePlugin.py and looked at what it had:
# line 44
def i32(c):
    return ord(c[3]) + (ord(c[2])<<8) + (ord(c[1])<<16) + (ord(c[0])<<24)

# lines 88-96
    "Fetch a new chunk. Returns header information."
    if self.queue:
        cid, pos, len = self.queue[-1]
        del self.queue[-1]
        self.fp.seek(pos)
    else:
        s = self.fp.read(8)
        cid = s[4:]
        pos = self.fp.tell()
        len = i32(s)
I would try tinkering, but I'm afraid I'll break PNG support and PIL, which have been irksome to get working.
Thanks.
It would appear that len(s) < 4 at this stage:
len = i32(s)
Which means that
s = self.fp.read(8)
isn't returning even the 4 bytes that i32 needs.
Probably the data in the fp you are passing doesn't make sense to the image decoder.
Double-check that you are slicing correctly.
Make sure that the string you are passing in is at least 4 bytes long.
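One way to double-check the slicing before handing anything to PIL is to look at the first bytes of each de-interleaved stream and compare them with the well-known file signatures. This is only a sketch: the helper names deinterleave and sniff_format are made up, and it assumes Python 3 bytes objects and checks only the three formats mentioned in the question.

```python
# Leading "magic" bytes of the three formats mentioned in the question.
SIGNATURES = {
    "png": (b"\x89PNG\r\n\x1a\n",),
    "jpeg": (b"\xff\xd8\xff",),
    "gif": (b"GIF87a", b"GIF89a"),
}

def deinterleave(data, n):
    """Split data into n streams: bytes 0, n, 2n, ... then 1, n+1, ... and so on."""
    return [data[i::n] for i in range(n)]

def sniff_format(data):
    """Return 'png', 'jpeg' or 'gif' if data starts with a known signature, else None."""
    for name, magics in SIGNATURES.items():
        if data.startswith(magics):
            return name
    return None
```

Printing len(part) and sniff_format(part) for each part of deinterleave(dataFile, 5) shows which streams really start like a PNG, JPEG, or GIF, so each one can be saved under a matching name.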