Focus on the issue of multi-threading tech.
As prevailing of multi-core processors, the methodology of programming has changed recently. Severe contentions are on there, though, multi-threading paradigm is a well-known solution for parallel processing. I want to see this in the aspect of the performance and its implementation.

The basic concept of multi-threading is this: divide and conquer. But you should not consider it of a speed-up engine which improves performance in proportional to the number of processors. There's a way to determine the limit on the performance gaining from adding cores: Amdahl's law. You can calculate total speed-up rate by using the formula below.
Speed-up-rate = 1 / (S + (1-S)/n)
Where S is the time spent executing the portion assigned to a thread or a processor and n is the number of processors. It shows that the speed-up rate would be decreased as the number of cores increases. Maximumly, if n goes to infinite, speed-up rate would converge to 1/S. For example, a task is divided into 10 parts and n is 2(dual-core), the speed-up rate for the task can be derived here: speedup = 1 / (0.1 + (1 - 0.1) / 2) = 1 / 0.55 ~= 1.8. (For quad-core, performance would increase more than 3 times according to the formula)
The actual importance is on its implementation. Even though the number of processor cores is numerous enough, the implementation may get down its performance. Parallelizing programming supports the hardware specification strongly and it's ineluctable to increase performance. I've implemented a multi-thread processing library recently. And I want to reveal it in the part of running and waiting thread model which is implemented in C/C++ and Windows API.
First, I wrote the task class. It has 4 major member variables: a pointer for thread handler, that of parameter structure, thread handle, and its state. And the next, task manager class. It supports some methods to add/remove task objects, run, and wait. Now I open the code for Run and Wait functions.
// Run: run all of the waiting tasks
// _max_thread is determined by the number of processors
void wc_task_manager::Run() {
size_t thread_count= _active_tasks.size();
// if thread pull is not full and any waiting task exists, assign a waiting task to the idle thread
while(thread_count < _max_thread && _wait_tasks.size() > 0) {
wc_task* task = (wc_task*)(*_wait_tasks.begin());
task->_complete = running;_wait_tasks.erase(_wait_tasks.begin());
_active_tasks.push_back(task);
thread_count++;
// CreateThread function is available only for windows.
task->_hThread = CreateThread(NULL, task->_stackSize, (LPTHREAD_START_ROUTINE)task->_handler, task->_param, 0, &task->_hID);
}
}
// Wait: wait for threads
// it uses 3 container member variables: wait_tasks list, active_tasks list, and complete_tasks map
void wc_task_manager::Wait() {
// do while active task exists
while(_active_tasks.size() > 0) {
// count the waiting objects
size_t numberOfCurrentHandle = _active_tasks.size();
if(numberOfCurrentHandle > MAXIMUM_WAIT_OBJECTS)
numberOfCurrentHandle = MAXIMUM_WAIT_OBJECTS;// fill in the array of HANDLE
HANDLE hList[MAXIMUM_WAIT_OBJECTS];
int index = 0;
for(list<wc_task*>::iterator pt = _active_tasks.begin(); pt != _active_tasks.end(); advance(pt, 1)) {
wc_task* task = *pt;
hList[index] = (HANDLE)(task->_hThread);
index++;
if(index == numberOfCurrentHandle) break;
}// waiting for signals
// _clockWait is set to 1 and that means it checks active threads each msec
DWORD signal = WaitForMultipleObjects((DWORD)numberOfCurrentHandle, hList, FALSE, _clockWait);// if some task has been completed
if(signal != WAIT_TIMEOUT) {
wc_task* task = NULL;// if some task finishes in success, set the state to complete
if(WAIT_OBJECT_0 <= signal && signal < WAIT_OBJECT_0 + numberOfCurrentHandle) {
HANDLE target_handle = hList[signal - WAIT_OBJECT_0];
task = GetActiveTask(target_handle);
if(task) task->_complete = complete;
}
// if abandoned, set the state to abandoned
else if(WAIT_ABANDONED_0 <= signal && signal < WAIT_ABANDONED_0 + numberOfCurrentHandle) {
HANDLE target_handle = hList[signal - WAIT_OBJECT_0];
task = GetActiveTask(target_handle);
if(task) task->_complete = abandoned;
}// if task exists
if(task) {
// set the task to be complete and scheduling new task
pair<HANDLE, wc_task*> p(task->_hThread, task);
_complete_tasks.insert(p);
_active_tasks.remove(task);
Run();
}
}
}
}
It's not perfect but well-tested version. The reason why I don't open the rest of the code is that there might be lots of troubles to overhaul so as to avoid exceptional errors. But the above 2 functions are the core factors of the library and it can be refered into the other applications.