JayBeams
0.1
Another project to have fun coding.
jb::opencl::generic_reduce< reducer, input_type_t, output_type_t > Class Template Reference
Implement a generic reducer for OpenCL.
#include <generic_reduce.hpp>
Public Types | |
Type traits | |
using | input_type = input_type_t |
The host type used to represent the input into the reduction. | |
using | output_type = output_type_t |
The host type representing the output of the reduction. | |
using | vector_iterator = typename boost::compute::vector< input_type >::iterator |
The iterator type for the vector used to store the results. | |
Public Member Functions | |
generic_reduce (std::size_t size, boost::compute::command_queue const &queue) | |
Constructor. | |
template<typename InputIterator > | |
boost::compute::future< vector_iterator > | execute (InputIterator begin, InputIterator end, boost::compute::wait_list const &wait=boost::compute::wait_list()) |
Schedule the execution of a reduction. | |
boost::compute::future< vector_iterator > | execute (boost::compute::vector< input_type > const &src, boost::compute::wait_list const &wait=boost::compute::wait_list()) |
Schedule a reduction for a full vector. | |
Static Public Member Functions | |
static boost::compute::program | create_program (boost::compute::command_queue const &queue) |
Private Attributes | |
boost::compute::command_queue | queue_ |
boost::compute::program | program_ |
boost::compute::kernel | initial_ |
boost::compute::kernel | intermediate_ |
std::size_t | max_workgroup_size_ |
std::size_t | sizeof_output_type_ |
std::size_t | scratch_size_ |
std::size_t | effective_workgroup_size_ |
boost::compute::vector< output_type > | ping_ |
boost::compute::vector< output_type > | pong_ |
Implement a generic reducer for OpenCL.
Aggregating all the values in a vector into a single value, an operation known as a reduction, is a common building block in parallel algorithms. All reductions follow a common form; this template class implements a generic reduction given the aggregation function and its input and output types.
This implementation uses a parallel reduction. For general motivation and a description of the technique, please see: http://developer.amd.com/resources/articles-whitepapers/opencl-optimization-case-study-simple-reductions/
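As a rough illustration of the parallel reduction idea, the following host-only sketch models how a single work-group might combine the values in its scratch buffer by repeatedly halving the active range. It does not use OpenCL, and `tree_reduce` is a hypothetical name introduced here for illustration; the actual kernels in this class may be organized differently.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical host-side model of one work-group reducing its scratch
// buffer. `tree_reduce` is an illustrative name, not part of JayBeams.
template <typename T, typename BinaryOp>
T tree_reduce(std::vector<T> scratch, T identity, BinaryOp op) {
  if (scratch.empty()) {
    return identity;
  }
  // Pad to the next power of two so every pass halves the active range.
  std::size_t n = 1;
  while (n < scratch.size()) {
    n *= 2;
  }
  scratch.resize(n, identity);
  // Each pass combines element i with element i + offset. On a device, all
  // the i < offset combinations of one pass would run in parallel, with a
  // barrier between passes; here they simply run in a loop.
  for (std::size_t offset = n / 2; offset > 0; offset /= 2) {
    for (std::size_t i = 0; i < offset; ++i) {
      scratch[i] = op(scratch[i], scratch[i + offset]);
    }
  }
  return scratch[0];
}
```

For example, calling `tree_reduce(std::vector<int>{1, 2, 3, 4, 5}, 0, std::plus<int>())` sums the five values in three passes over a buffer padded to eight elements. The sketch assumes `op` is associative and that `identity` is neutral for it.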
TODO(coryan) this class is a work in progress; it is not fully implemented.
reducer | a class derived from generic_reduce<reducer,...>. Please see jb::opencl::reducer_concept for details. |
input_type_t | the host type that represents the input |
output_type_t | the host type that represents the output |
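The template parameters above describe a curiously recurring template pattern (CRTP): the reducer derives from the generic class and passes itself as the first argument. The sketch below shows that shape on the host only, without OpenCL. The names `host_reduce`, `initialize`, `transform` and `combine` are assumptions made for this example; they are not taken from jb::opencl::reducer_concept, which should be consulted for the real requirements.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical host-only stand-in for the CRTP shape described above.
// None of these member names come from jb::opencl::reducer_concept.
template <typename reducer, typename input_type_t, typename output_type_t>
class host_reduce {
public:
  using input_type = input_type_t;
  using output_type = output_type_t;

  // The generic part: walk the input and delegate every decision about
  // the aggregation to the derived class.
  output_type execute(std::vector<input_type> const& src) const {
    output_type acc = reducer::initialize();
    for (input_type const& x : src) {
      acc = reducer::combine(acc, reducer::transform(x));
    }
    return acc;
  }
};

// A derived class supplies only the aggregation function: here, the
// largest absolute value of the inputs.
class max_abs : public host_reduce<max_abs, int, long> {
public:
  static long initialize() { return 0; }
  static long transform(int x) { return x < 0 ? -static_cast<long>(x) : x; }
  static long combine(long a, long b) { return std::max(a, b); }
};
```

With these definitions, `max_abs().execute({3, -7, 5})` evaluates to 7. Separating the input and output types, as the template parameters do, lets the aggregation widen its result (here from `int` to `long`).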
Definition at line 37 of file generic_reduce.hpp.
using jb::opencl::generic_reduce< reducer, input_type_t, output_type_t >::input_type = input_type_t |
The host type used to represent the input into the reduction.
Definition at line 44 of file generic_reduce.hpp.
using jb::opencl::generic_reduce< reducer, input_type_t, output_type_t >::output_type = output_type_t |
The host type representing the output of the reduction.
Definition at line 47 of file generic_reduce.hpp.
using jb::opencl::generic_reduce< reducer, input_type_t, output_type_t >::vector_iterator = typename boost::compute::vector<input_type>::iterator |
The iterator type for the vector used to store the results.
The final output is a single element, but in OpenCL it is easier to treat it as a vector with a single element.
Definition at line 55 of file generic_reduce.hpp.
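jb::opencl::generic_reduce< reducer, input_type_t, output_type_t >::generic_reduce (std::size_t size, boost::compute::command_queue const &queue) |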
inline |
Constructor.
Initialize a generic reduce for a given size and device queue.
size | the size of the input array |
queue | a command queue to communicate with a single OpenCL device |
Definition at line 65 of file generic_reduce.hpp.
References jb::opencl::generic_reduce< reducer, input_type_t, output_type_t >::effective_workgroup_size_, jb::opencl::generic_reduce< reducer, input_type_t, output_type_t >::max_workgroup_size_, jb::opencl::generic_reduce< reducer, input_type_t, output_type_t >::program_, jb::opencl::generic_reduce< reducer, input_type_t, output_type_t >::queue_, jb::opencl::generic_reduce< reducer, input_type_t, output_type_t >::scratch_size_, and jb::opencl::generic_reduce< reducer, input_type_t, output_type_t >::sizeof_output_type_.
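static boost::compute::program jb::opencl::generic_reduce< reducer, input_type_t, output_type_t >::create_program (boost::compute::command_queue const &queue) |
inline static |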
Definition at line 238 of file generic_reduce.hpp.
References jb::error, jb::opencl::generic_reduce_program_source, and JB_LOG.
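template<typename InputIterator > boost::compute::future< vector_iterator > jb::opencl::generic_reduce< reducer, input_type_t, output_type_t >::execute (InputIterator begin, InputIterator end, boost::compute::wait_list const &wait=boost::compute::wait_list()) |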
inline |
Schedule the execution of a reduction.
The algorithm works in phases. Each phase runs on the OpenCL device and reduces the input to a (typically much smaller) vector, which is stored in either the ping_ or pong_ member.
If necessary, the algorithm schedules multiple phases asynchronously, each waiting for the previous one, until the output has been reduced to a vector with a single element.
begin | the beginning of the range to be reduced. |
end | the end of the range to be reduced. |
wait | a list of events to wait for before any work starts on the device. |
Definition at line 126 of file generic_reduce.hpp.
References jb::opencl::generic_reduce< reducer, input_type_t, output_type_t >::effective_workgroup_size_, jb::opencl::generic_reduce< reducer, input_type_t, output_type_t >::initial_, jb::opencl::generic_reduce< reducer, input_type_t, output_type_t >::intermediate_, JB_ASSERT_THROW, jb::p2ceil(), jb::opencl::generic_reduce< reducer, input_type_t, output_type_t >::ping_, jb::opencl::generic_reduce< reducer, input_type_t, output_type_t >::pong_, jb::opencl::generic_reduce< reducer, input_type_t, output_type_t >::queue_, and jb::testing::defaults::size.
Referenced by jb::opencl::generic_reduce< reducer, input_type_t, output_type_t >::execute().
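The phased algorithm described for this overload can be sketched on the host as follows. This is a simplified model under stated assumptions, not the code in generic_reduce.hpp: it runs synchronously, it uses one helper for both the initial and the intermediate phases, and the names `reduce_phase` and `multi_phase_reduce` are hypothetical. Only the idea of alternating between two buffers named `ping` and `pong` is taken from the description above.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <utility>
#include <vector>

// One phase: every group of `group_size` consecutive inputs produces a
// single partial result. Returns the number of partial results written.
template <typename T, typename BinaryOp>
std::size_t reduce_phase(T const* in, std::size_t count, std::vector<T>& out,
                         std::size_t group_size, T identity, BinaryOp op) {
  std::size_t const groups = (count + group_size - 1) / group_size;
  for (std::size_t g = 0; g != groups; ++g) {
    std::size_t const first = g * group_size;
    std::size_t const last = std::min(first + group_size, count);
    T acc = identity;
    for (std::size_t i = first; i != last; ++i) {
      acc = op(acc, in[i]);
    }
    out[g] = acc;
  }
  return groups;
}

// Run phases until a single element remains, alternating between two
// scratch buffers. Optionally reports how many phases were needed.
template <typename T, typename BinaryOp>
T multi_phase_reduce(std::vector<T> const& input, std::size_t group_size,
                     T identity, BinaryOp op, std::size_t* phases = nullptr) {
  if (group_size < 2) {
    throw std::invalid_argument("group_size must be at least 2");
  }
  std::size_t n_phases = 0;
  T result = identity;
  if (!input.empty()) {
    // The first phase produces the largest intermediate vector, so both
    // buffers are sized for it once and then reused.
    std::size_t const max_partials =
        (input.size() + group_size - 1) / group_size;
    std::vector<T> ping(max_partials), pong(max_partials);
    std::vector<T>* src = &ping;
    std::vector<T>* dst = &pong;
    // Initial phase: input -> ping.
    std::size_t count = reduce_phase(input.data(), input.size(), *src,
                                     group_size, identity, op);
    n_phases = 1;
    // Intermediate phases: ping -> pong -> ping ... until one value is left.
    while (count > 1) {
      count = reduce_phase(src->data(), count, *dst, group_size, identity, op);
      std::swap(src, dst);
      ++n_phases;
    }
    result = (*src)[0];
  }
  if (phases != nullptr) {
    *phases = n_phases;
  }
  return result;
}
```

With 1000 inputs and a group size of 8, the intermediate sizes are 125, 16, 2 and 1, so four phases are needed. On a device, each phase would be enqueued with a wait on the event of the previous phase rather than executed inline.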
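boost::compute::future< vector_iterator > jb::opencl::generic_reduce< reducer, input_type_t, output_type_t >::execute (boost::compute::vector< input_type > const &src, boost::compute::wait_list const &wait=boost::compute::wait_list()) |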
inline |
Schedule a reduction for a full vector.
See the other overload of this member function for details.
src | the vector to be reduced |
wait | a wait list that must be completed before the reduction starts |
Definition at line 231 of file generic_reduce.hpp.
References jb::opencl::generic_reduce< reducer, input_type_t, output_type_t >::execute().
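std::size_t jb::opencl::generic_reduce< reducer, input_type_t, output_type_t >::effective_workgroup_size_ |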
private |
Definition at line 285 of file generic_reduce.hpp.
Referenced by jb::opencl::generic_reduce< reducer, input_type_t, output_type_t >::execute(), and jb::opencl::generic_reduce< reducer, input_type_t, output_type_t >::generic_reduce().
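boost::compute::kernel jb::opencl::generic_reduce< reducer, input_type_t, output_type_t >::initial_ |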
private |
Definition at line 280 of file generic_reduce.hpp.
Referenced by jb::opencl::generic_reduce< reducer, input_type_t, output_type_t >::execute().
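boost::compute::kernel jb::opencl::generic_reduce< reducer, input_type_t, output_type_t >::intermediate_ |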
private |
Definition at line 281 of file generic_reduce.hpp.
Referenced by jb::opencl::generic_reduce< reducer, input_type_t, output_type_t >::execute().
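std::size_t jb::opencl::generic_reduce< reducer, input_type_t, output_type_t >::max_workgroup_size_ |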
private |
Definition at line 282 of file generic_reduce.hpp.
Referenced by jb::opencl::generic_reduce< reducer, input_type_t, output_type_t >::generic_reduce().
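boost::compute::vector< output_type > jb::opencl::generic_reduce< reducer, input_type_t, output_type_t >::ping_ |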
private |
Definition at line 286 of file generic_reduce.hpp.
Referenced by jb::opencl::generic_reduce< reducer, input_type_t, output_type_t >::execute().
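boost::compute::vector< output_type > jb::opencl::generic_reduce< reducer, input_type_t, output_type_t >::pong_ |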
private |
Definition at line 287 of file generic_reduce.hpp.
Referenced by jb::opencl::generic_reduce< reducer, input_type_t, output_type_t >::execute().
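boost::compute::program jb::opencl::generic_reduce< reducer, input_type_t, output_type_t >::program_ |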
private |
Definition at line 279 of file generic_reduce.hpp.
Referenced by jb::opencl::generic_reduce< reducer, input_type_t, output_type_t >::generic_reduce().
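boost::compute::command_queue jb::opencl::generic_reduce< reducer, input_type_t, output_type_t >::queue_ |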
private |
Definition at line 278 of file generic_reduce.hpp.
Referenced by jb::opencl::generic_reduce< reducer, input_type_t, output_type_t >::execute(), and jb::opencl::generic_reduce< reducer, input_type_t, output_type_t >::generic_reduce().
std::size_t jb::opencl::generic_reduce< reducer, input_type_t, output_type_t >::scratch_size_ |
private |
Definition at line 284 of file generic_reduce.hpp.
Referenced by jb::opencl::generic_reduce< reducer, input_type_t, output_type_t >::generic_reduce().
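std::size_t jb::opencl::generic_reduce< reducer, input_type_t, output_type_t >::sizeof_output_type_ |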
private |
Definition at line 283 of file generic_reduce.hpp.
Referenced by jb::opencl::generic_reduce< reducer, input_type_t, output_type_t >::generic_reduce().