java - How to optimize the computing speed of this for loop?
I'm kind of stuck on a difficult problem (for me, at least). When profiling my code (single core), I noticed that most of the computing time is eaten by the single nested loop below (a double integral over an image). What do you think is the best way to accelerate the computation?
I tried to map it to nested streams, but I do not understand how to map the multiple if blocks... Would trying it on the GPU using OpenCL be better suited to this problem?
ip is an ImageJ ImageProcessor, and its method .getPixelValue(x,y) is quite resource-consuming. But since it belongs to an established lib, I'd like to avoid modifying it if I can.
Variable declarations:

```java
// Fields of the enclosing class. Requires:
//   import static java.lang.Math.*;
//   import ij.process.ImageProcessor;
private ImageProcessor ip = null; // this type comes from ImageJ
private double area;
private double a11, a12, a22;     // a12 is used in contrast() but was missing from the declarations
private double a3;                // also used in contrast() but not declared in the question
private double u1, u2;
private double v1, v2;
private double y1, y2;
private int width, height;        // image size; assumed to match ip.getWidth() / ip.getHeight()
private static final double SQRT2 = sqrt(2.0); // assumed; contrast() uses it without a declaration
private static final double HALF_SQRT2 = sqrt(2.0) / 2.0;
private static final double SQRT_TINY = sqrt((double) Float.intBitsToFloat((int) 0x33ffffff));
```

Function:

```java
private double contrast() {
    if (area < 1.0) {
        return 1.0 / SQRT_TINY;
    }
    double c = 0.0;
    final int xmin = max((int) floor(u1), 0);
    final int xmax = min((int) ceil(v1), width - 1);
    final int ymin = max((int) floor(u2), 0);
    final int ymax = min((int) ceil(v2), height - 1);
    if ((u1 < xmin) || (xmax < v1) || (u2 < ymin) || (ymax < v2)) {
        return 1.0 / SQRT_TINY;
    }
    if ((xmax <= xmin) || (ymax <= ymin)) {
        return 1.0 / SQRT_TINY;
    }
    for (int y = ymin; y <= ymax; y++) {
        final double dy = y2 - (double) y;
        final double dy2 = dy * dy;
        for (int x = xmin; x <= xmax; x++) {
            final double dx = y1 - (double) x;
            final double dx2 = dx * dx;
            final double d = sqrt(dx2 + dy2);
            double z = a11 * dx2 + a12 * dx * dy + a22 * dy2;
            if (z < SQRT_TINY) {
                c -= ip.getPixelValue(x, y);
                continue;
            }
            z = a3 / sqrt(z);
            double d0 = (1.0 - z / SQRT2) * d;
            if (d0 < -HALF_SQRT2) {
                c -= ip.getPixelValue(x, y);
                continue;
            }
            if (d0 < HALF_SQRT2) {
                c += SQRT2 * d0 * ip.getPixelValue(x, y);
                continue;
            }
            d0 = (1.0 - z) * d;
            if (d0 < -1.0) {
                c += ip.getPixelValue(x, y);
                continue;
            }
            if (d0 < 1.0) {
                c += (1.0 - d0) * ip.getPixelValue(x, y) / 2.0;
            }
        }
    }
    return c / area;
}
```
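On the question of how to map the multiple if blocks onto streams: one option is to move the branch chain into a pure function that returns the coefficient each branch applies to the pixel (every `continue` becomes a `return`). The loop body then reduces to one multiply-add, which fits `IntStream.mapToDouble(...).sum()`. This is a minimal sketch, not a drop-in replacement. It assumes that `sqrt2` is `sqrt(2.0)`, that the pixel values were copied once (for example outside the repeatedly called `contrast()`) into a row-major `float[]` (for a `FloatProcessor`, `getPixels()` returns such an array), and the names `ContrastStreams`, `weight` and `weightedSum` are hypothetical.

```java
import java.util.stream.IntStream;

/**
 * Sketch: the branch chain of contrast() moved into a pure per-pixel weight
 * function, so the nested loop becomes a parallel map/sum over rows.
 * ASSUMPTION: pixel values were copied once into a row-major float[]
 * (pixels[y * width + x]) instead of calling getPixelValue() in the loop.
 */
final class ContrastStreams {

    static final double SQRT2 = Math.sqrt(2.0); // assumed meaning of sqrt2
    static final double HALF_SQRT2 = SQRT2 / 2.0;
    static final double SQRT_TINY = Math.sqrt((double) Float.intBitsToFloat(0x33ffffff));

    /** Returns the coefficient the original loop applies to the pixel at offset (dx, dy). */
    static double weight(double dx, double dy,
                         double a11, double a12, double a22, double a3) {
        final double dx2 = dx * dx;
        final double dy2 = dy * dy;
        final double d = Math.sqrt(dx2 + dy2);
        double z = a11 * dx2 + a12 * dx * dy + a22 * dy2;
        if (z < SQRT_TINY) return -1.0;            // was: c -= pixel
        z = a3 / Math.sqrt(z);
        double d0 = (1.0 - z / SQRT2) * d;
        if (d0 < -HALF_SQRT2) return -1.0;         // was: c -= pixel
        if (d0 < HALF_SQRT2) return SQRT2 * d0;    // was: c += sqrt2 * d0 * pixel
        d0 = (1.0 - z) * d;
        if (d0 < -1.0) return 1.0;                 // was: c += pixel
        if (d0 < 1.0) return (1.0 - d0) / 2.0;     // was: c += (1 - d0) * pixel / 2
        return 0.0;                                // pixel does not contribute
    }

    /** Sum of weight * pixel over [xmin..xmax] x [ymin..ymax], one parallel task per row. */
    static double weightedSum(float[] pixels, int width,
                              int xmin, int xmax, int ymin, int ymax,
                              double y1, double y2,
                              double a11, double a12, double a22, double a3) {
        return IntStream.rangeClosed(ymin, ymax).parallel().mapToDouble(y -> {
            final double dy = y2 - y;
            double rowSum = 0.0;
            for (int x = xmin; x <= xmax; x++) {
                rowSum += weight(y1 - x, dy, a11, a12, a22, a3) * pixels[y * width + x];
            }
            return rowSum;
        }).sum();
    }
}
```

The caller would keep the early returns and the final division, i.e. `return weightedSum(...) / area;`. Because the rows are added in a different order, the result may differ from the sequential loop in the last few bits.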
You could try a divide-and-conquer approach.
Divide the image into n parts that are processed in parallel. You have to handle the edge cases that occur on the borders (where two parts meet).
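The splitting can be sketched with Java's fork/join framework: the row range is halved recursively until a strip is small, each strip is summed sequentially, and the partial sums are added back up. This is a sketch under assumptions: `StripSum`, `sumRows` and the `THRESHOLD` value are hypothetical, and `rowSum` stands for whatever computes one row's contribution (for example the inner x-loop of `contrast()`). In this particular loop every term depends on a single pixel, so strips split by rows need no overlap; border handling matters for neighborhood operations such as filters.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.function.IntToDoubleFunction;

/**
 * Divide-and-conquer sketch: split the row range [lo, hi) in half until a strip
 * has at most THRESHOLD rows, sum each strip sequentially, then add the halves.
 * rowSum is a hypothetical callback returning the contribution of one image row.
 */
final class StripSum extends RecursiveTask<Double> {
    private static final int THRESHOLD = 16; // rows per leaf task; tune by profiling

    private final int lo, hi;
    private final IntToDoubleFunction rowSum;

    StripSum(int lo, int hi, IntToDoubleFunction rowSum) {
        this.lo = lo;
        this.hi = hi;
        this.rowSum = rowSum;
    }

    @Override
    protected Double compute() {
        if (hi - lo <= THRESHOLD) {                  // small strip: plain loop
            double s = 0.0;
            for (int y = lo; y < hi; y++) {
                s += rowSum.applyAsDouble(y);
            }
            return s;
        }
        final int mid = (lo + hi) >>> 1;
        StripSum top = new StripSum(lo, mid, rowSum);
        top.fork();                                  // top half runs asynchronously
        double bottom = new StripSum(mid, hi, rowSum).compute();
        return bottom + top.join();                  // combine the two halves
    }

    /** Sums rowSum(y) for y in [ymin, ymax] using the common fork/join pool. */
    static double sumRows(int ymin, int ymax, IntToDoubleFunction rowSum) {
        return ForkJoinPool.commonPool().invoke(new StripSum(ymin, ymax + 1, rowSum));
    }
}
```

For example, `StripSum.sumRows(ymin, ymax, y -> /* inner x-loop for row y */ 0.0)` would replace the outer `for (int y ...)` loop.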
Or you can start looking for numeric algorithms that compute (discrete) integrals and are designed for parallelism.
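One common building block of such algorithms is a tree (pairwise) reduction: the sum is split recursively into halves that are independent of each other, which is the same shape a parallel reduction on threads or GPU work-groups uses, and the rounding error grows roughly with log n instead of n. A minimal sequential sketch; the class name `PairwiseSum` and the `BLOCK` cutoff are hypothetical.

```java
/**
 * Pairwise (tree) summation: the array is split in halves recursively, so the
 * two halves are independent sub-problems that could be evaluated in parallel.
 */
final class PairwiseSum {
    private static final int BLOCK = 8; // below this size, add sequentially

    /** Sums a[from..to). */
    static double sum(double[] a, int from, int to) {
        final int n = to - from;
        if (n <= BLOCK) {
            double s = 0.0;
            for (int i = from; i < to; i++) {
                s += a[i];
            }
            return s;
        }
        final int mid = from + n / 2;
        return sum(a, from, mid) + sum(a, mid, to); // independent halves
    }

    static double sum(double[] a) {
        return sum(a, 0, a.length);
    }
}
```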
Update
Since your method is called contrast, I assume you're changing the contrast of the image.
Operations on images can often be performed via a convolution (which, on a 2D image, is a double sum, i.e. a discrete double integral) with a specific kernel (image filter). These operations can be computed on the GPU and can yield speed-ups of orders of magnitude. You can use OpenCL to write programs that execute on one or several GPUs.
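To make the "convolution is a discrete double integral" point concrete, here is a direct 2D convolution on the CPU. It computes out(x, y) = Σ_j Σ_i k(i, j) · in(x − i, y − j) for a centered kernel. Assumptions in this sketch: the image and kernel are row-major `float[]` arrays, the kernel has odd width and height, pixels outside the image count as 0 (one of several border policies), and the name `Convolve2D` is hypothetical. Each output pixel is computed independently of the others, which is why this kind of operation maps naturally onto one OpenCL work-item per pixel.

```java
/**
 * Direct 2D discrete convolution of a row-major image with an odd-sized,
 * centered kernel: out(x,y) = sum_j sum_i k(i,j) * in(x - i, y - j).
 * Pixels outside the image are treated as 0.
 */
final class Convolve2D {
    static float[] convolve(float[] in, int width, int height,
                            float[] kernel, int kw, int kh) {
        final int rx = kw / 2, ry = kh / 2;          // kernel radii (kw, kh odd)
        final float[] out = new float[width * height];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                double acc = 0.0;                    // the double sum for pixel (x, y)
                for (int j = -ry; j <= ry; j++) {
                    for (int i = -rx; i <= rx; i++) {
                        final int sx = x - i, sy = y - j;
                        if (sx < 0 || sx >= width || sy < 0 || sy >= height) {
                            continue;                // zero padding outside the image
                        }
                        acc += kernel[(j + ry) * kw + (i + rx)] * in[sy * width + sx];
                    }
                }
                out[y * width + x] = (float) acc;
            }
        }
        return out;
    }
}
```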