Parallel Image Processing in OpenMP - Splitting the Image
I have a function defined by Intel IPP that operates on an image or a region of an image.
Its inputs are a pointer to the image, parameters that define the size to process, and the parameters of the filter.
The IPP function is single threaded.
Now, I have an image of size M x N.
I want to apply the filter to it in parallel.
The main idea is simple: break the image into 4 sub-images that are independent of each other.
Apply the filter to each sub-image and write the result to the matching sub-block of an empty image, so that each thread writes to a distinct set of pixels.
It's really like processing 4 images, each on its own core.
This is the program I'm doing it with:
#include <ipp.h>
#include <omp.h>

void openmptest()
{
    const int width = 1920;
    const int height = 1080;

    Ipp32f input_image[width * height];
    Ipp32f output_image[width * height];

    IppiSize size = { width, height };
    int step = width * sizeof(Ipp32f); /* row pitch of the full image, in bytes */

    /* splitting image */
    IppiSize section_size = { width / 2, height / 2 };

    Ipp32f* input_upper_left  = input_image;
    Ipp32f* input_upper_right = input_image + width / 2;
    Ipp32f* input_lower_left  = input_image + (height / 2) * width;
    Ipp32f* input_lower_right = input_image + (height / 2) * width + width / 2;

    /* each output section is the matching block of output_image */
    Ipp32f* output_upper_left  = output_image;
    Ipp32f* output_upper_right = output_image + width / 2;
    Ipp32f* output_lower_left  = output_image + (height / 2) * width;
    Ipp32f* output_lower_right = output_image + (height / 2) * width + width / 2;

    Ipp32f* input_sections[4] = { input_upper_left, input_upper_right, input_lower_left, input_lower_right };
    Ipp32f* output_sections[4] = { output_upper_left, output_upper_right, output_lower_left, output_lower_right };

    /* filter params */
    Ipp32f pkernel[7] = { 1, 2, 3, 4, 3, 2, 1 };

    omp_set_num_threads(4);

    /* one loop iteration per section; the loop index i is private to each thread */
    #pragma omp parallel for
    for (int i = 0; i < 4; i++)
        ippiFilterRow_32f_C1R(
            input_sections[i], step,
            output_sections[i], step,
            section_size, pkernel, 7, 3);
}
Now, the issue is that I see no gain versus working in single-threaded mode on the whole image.
I tried changing the image size and the filter size, and nothing changes the picture.
The most I gain is nothing significant (10-20%).
I thought it might have something to do with the fact that I can't "promise" each thread that the zone it receives is "read only".
Nor can I let it know that the memory location it writes to belongs only to itself.
I read about defining variables as private and shared, yet I couldn't find a guide on how to deal with arrays and pointers.
What would be the proper way to deal with pointers and sub-arrays in OpenMP?
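On the data-sharing part of the question, a small sketch may help. In OpenMP, a pointer declared before the parallel region is shared by default, which only means that every thread sees the same address. The loop index of a `parallel for` and any variable declared inside the loop body are private. The example below assumes that each iteration writes only to its own pixels. `filter_region` is a hypothetical 3-tap stand-in for the single-threaded IPP call, and `filter_quadrants` is an illustrative name. Neither is part of IPP or OpenMP.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the single-threaded filter: a 3-tap horizontal
   mean over the rectangle [x0, x0+w) x [y0, y0+h) of a width-wide image.
   Neighbours outside the image are clamped to the border pixel. */
void filter_region(const float *src, float *dst, int width,
                   int x0, int y0, int w, int h)
{
    for (int y = y0; y < y0 + h; y++) {
        const float *row = src + (size_t)y * width;
        float *out = dst + (size_t)y * width;
        for (int x = x0; x < x0 + w; x++) {
            int xl = x > 0 ? x - 1 : x;          /* clamp at left border  */
            int xr = x < width - 1 ? x + 1 : x;  /* clamp at right border */
            out[x] = (row[xl] + row[x] + row[xr]) / 3.0f;
        }
    }
}

/* Filter the four quadrants, one loop iteration per quadrant. */
void filter_quadrants(const float *src, float *dst, int width, int height)
{
    assert(width % 2 == 0 && height % 2 == 0);
    int half_w = width / 2;
    int half_h = height / 2;

    /* src and dst are shared: all threads use the same base addresses.
       i, x0 and y0 are private because they are declared in the loop. */
    #pragma omp parallel for shared(src, dst)
    for (int i = 0; i < 4; i++) {
        int x0 = (i % 2) * half_w;  /* 0 = left half,  1 = right half */
        int y0 = (i / 2) * half_h;  /* 0 = upper half, 1 = lower half */
        filter_region(src, dst, width, x0, y0, half_w, half_h);
    }
}
```

Threads may read overlapping input pixels at the quadrant borders without any conflict, since nothing writes to `src`. Only the writes to `dst` have to be disjoint, and here they are by construction.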
How does the performance of the threaded IPP compare? Assuming there are no race conditions, performance problems with writing to shared arrays are most likely to occur in cache lines where part of the line is written by one thread and another part is read by another. It may also require a data region larger than about 10 megabytes before the full parallel speedup is seen.
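The cache-line and data-size points can be checked with a little arithmetic. The sketch below assumes 64-byte cache lines (typical on x86, but not guaranteed), 4-byte floats, and a buffer whose base address is itself cache-line aligned. `plane_bytes`, `starts_on_cache_line` and `band_rows` are hypothetical helper names, not IPP or OpenMP functions.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE_BYTES 64  /* assumption: typical x86 cache line size */

/* Bytes occupied by one width x height plane of 32-bit floats. */
size_t plane_bytes(int width, int height)
{
    return (size_t)width * (size_t)height * sizeof(float);
}

/* Returns 1 if a section that starts elem_offset floats past a
   cache-line-aligned base begins exactly on a cache-line boundary. */
int starts_on_cache_line(size_t elem_offset)
{
    return (elem_offset * sizeof(float)) % CACHE_LINE_BYTES == 0;
}

/* Split height rows into nbands contiguous bands. Band b covers rows
   [*y0, *y0 + *rows); any remainder rows go to the first bands. */
void band_rows(int height, int nbands, int b, int *y0, int *rows)
{
    assert(nbands > 0 && b >= 0 && b < nbands);
    int base = height / nbands;
    int extra = height % nbands;
    *rows = base + (b < extra ? 1 : 0);
    *y0 = b * base + (b < extra ? b : extra);
}
```

For the 1920 x 1080 image in the question, `plane_bytes` gives 8,294,400 bytes per buffer. The quadrant offsets (960 elements across, and 540 rows down) both land on cache-line boundaries under these assumptions, so writes from different threads would not share a line. Splitting into horizontal bands with `band_rows` is an alternative that keeps each thread's output contiguous in memory.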
You would need deeper analysis, e.g. with Intel VTune Amplifier, to see whether memory bandwidth or data overlaps are limiting performance.
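Before reaching for a profiler, a rough first check is to turn a measured run time into effective memory traffic and compare it with the peak bandwidth of the machine. The sketch below is only an estimate: it assumes each input pixel is read once and each output pixel is written once, and it ignores cache reuse and write-allocate traffic. `effective_bandwidth_gbs` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>

/* Effective memory traffic in GB/s (1 GB = 1e9 bytes) for one filter pass.
   Assumption: one read and one write per pixel, nothing else counted. */
double effective_bandwidth_gbs(size_t pixels, size_t bytes_per_pixel,
                               double seconds)
{
    assert(seconds > 0.0);
    double bytes_moved = 2.0 * (double)pixels * (double)bytes_per_pixel;
    return bytes_moved / seconds / 1e9;
}
```

The `seconds` argument could come from two calls to `omp_get_wtime()` placed around the filter loop. If the single-threaded figure is already close to what the memory system can deliver, adding threads is unlikely to help much, which would point to bandwidth rather than data overlaps as the limit.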