… application needed a queue of small objects, and on a modern quad-core CPU the cache misses were hurting performance so much that, although a lock-free queue did have near-100% scalability, the overall operation completed 165% faster with a locked queue that had zero scalability.
The next best thing is to combine the best of both worlds: design a queue with low overhead and medium scalability. Using a reader-writer lock in combination with lock-free operations, I came up with a queue that only needs to take a full lock once every 32 or 64 operations. The result? Scalability 5% lower than a lock-free queue, with overall performance 210% better.
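One way such a hybrid could be put together is sketched below in C++17. It is an illustration under assumptions, not the actual queue described above: the names `BlockQueue`, `Block`, and `BlockSize` are hypothetical, the element type is assumed to be default-constructible, and the default sequentially consistent atomics are used for simplicity. Items live in fixed-size blocks. Claiming a slot is a single atomic operation under a shared (reader) lock, and the exclusive (writer) lock is taken only when a block fills up or drains, which is roughly once every `BlockSize` operations.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <deque>
#include <memory>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <utility>

// Hypothetical sketch: a queue of fixed-size blocks.
// T is assumed to be default-constructible and movable.
template <typename T, std::size_t BlockSize = 32>
class BlockQueue {
    struct Block {
        std::array<T, BlockSize> items{};
        std::array<std::atomic<bool>, BlockSize> ready;  // slot fully written?
        std::atomic<std::size_t> write{0};               // next slot to fill
        std::atomic<std::size_t> read{0};                // next slot to take
        Block() {
            for (auto& flag : ready) flag.store(false, std::memory_order_relaxed);
        }
    };

    std::shared_mutex mutex_;                    // guards the list of blocks
    std::deque<std::unique_ptr<Block>> blocks_;  // always holds at least one block

public:
    BlockQueue() { blocks_.push_back(std::make_unique<Block>()); }

    void push(T value) {
        for (;;) {
            {
                // Fast path: shared lock plus one atomic increment.
                std::shared_lock<std::shared_mutex> shared(mutex_);
                Block& tail = *blocks_.back();
                const std::size_t slot = tail.write.fetch_add(1);
                if (slot < BlockSize) {
                    tail.items[slot] = std::move(value);
                    tail.ready[slot].store(true, std::memory_order_release);
                    return;
                }
            }
            // Slow path: the tail block is full, so append a new one.
            std::unique_lock<std::shared_mutex> exclusive(mutex_);
            if (blocks_.back()->write.load() >= BlockSize) {
                blocks_.push_back(std::make_unique<Block>());
            }
        }
    }

    // Returns std::nullopt when the queue is empty. It may also report
    // "empty" while a producer is still writing the slot at the head.
    std::optional<T> pop() {
        for (;;) {
            {
                // Fast path: shared lock plus one compare-and-swap.
                std::shared_lock<std::shared_mutex> shared(mutex_);
                Block& head = *blocks_.front();
                std::size_t slot = head.read.load();
                while (slot < BlockSize) {
                    if (!head.ready[slot].load(std::memory_order_acquire)) {
                        return std::nullopt;
                    }
                    // On failure, slot is reloaded and the loop re-checks it.
                    if (head.read.compare_exchange_weak(slot, slot + 1)) {
                        return std::move(head.items[slot]);
                    }
                }
                if (blocks_.size() == 1) return std::nullopt;
            }
            // Slow path: the head block is drained, so release it.
            std::unique_lock<std::shared_mutex> exclusive(mutex_);
            if (blocks_.size() > 1 && blocks_.front()->read.load() == BlockSize) {
                blocks_.pop_front();
            }
        }
    }
};
```

In this sketch the reader-writer lock is what makes it safe to free a drained block: a block is only removed under the exclusive lock, and no thread can be touching it at that moment because every access to a block happens under the shared lock. Whether a design like this beats a plain locked queue or a fully lock-free one on a given workload is something only measurement can tell.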
OK, I’ll admit it: I cheated, somewhat. Lock-free algorithms are good for more than just scalability. They also offer immunity to nasty effects like deadlock, livelock, and priority inversion. In my case I wasn’t in a situation where I had to worry about these, but you might be. The lesson here is to know your situation and decide carefully. Don’t just trust what others tell you: always try things yourself and profile.