Clean Code
Chapter 13
Concurrency
-
A strategy for decoupling what gets done from when it gets done
- Improves the throughput and structure of an application
- Can make systems easier to understand and offers a way to separate concerns
-
Examples where concurrency is useful
- A system that handles one user at a time, requiring one second per user
- As the number of users grows, so does the response time
- A system that interprets large data sets but only provides a solution after processing all of them
- Each data set could be processed in parallel, providing partial solutions as they individually complete
Myths and Misconceptions
- Common myths / misconceptions:
- Concurrency always improves performance
- Can sometimes improve performance, but only when there is a lot of wait time that can be shared among multiple threads or processors
- Design doesn't change when writing concurrent programs
- A concurrent algorithm can differ from an algorithm designed for a single-threaded system
- Understanding concurrency issues is not important when working with a container such as a Web or EJB container
- Learn how to prevent issues of concurrently updating shared resources and deadlock
- Concurrency incurs some overhead, both in performance as well as writing additional code
- Correct concurrency is complex, even for simple problems
- Concurrency bugs aren't usually repeatable, so they are often ignored
- Concurrency often requires a fundamental change in design strategy
Challenges
-
Consider the snippet:
public class X {
    private int lastIdUsed;

    public int getNextId() {
        return ++lastIdUsed;
    }
}
-
Suppose an instance of X is created with lastIdUsed set to 42 and shared between two threads that each call getNextId() once; there are several possible outcomes:
- Thread 1 gets 43, thread 2 gets 44, lastIdUsed is 44
- Thread 1 gets 44, thread 2 gets 43, lastIdUsed is 44
- Thread 1 gets 43, thread 2 gets 43, lastIdUsed is 43
-
The unpredictable results occur because there are many possible paths the two threads can take through that one line of code, and different paths generate different results
- This happens because the threads access and modify a shared resource simultaneously
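One way to remove the race in getNextId() is to make the read-modify-write a single atomic operation. This is a sketch, not necessarily the book's own fix; the class name SafeX and the seed constructor are illustrative.

```java
import java.util.concurrent.atomic.AtomicInteger;

// A thread-safe variant of X: incrementAndGet() performs the read, increment,
// and write as one atomic operation, so two threads can no longer observe the
// same id or lose an update.
public class SafeX {
    private final AtomicInteger lastIdUsed;

    public SafeX(int seed) {
        this.lastIdUsed = new AtomicInteger(seed);
    }

    public int getNextId() {
        return lastIdUsed.incrementAndGet();
    }
}
```

Marking getNextId() as synchronized on the original class would also work; AtomicInteger simply avoids taking a lock for such a small critical section.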
Concurrency Defense Principles
-
Single Responsibility Principle
- Concurrency design is complex and should be kept separate from the rest of the code
- Has its own life cycle of development, change and tuning
- Has its own set of challenges
- Concurrency code is miswritten often enough that keeping it separate limits the damage
-
Limit the scope of data
- Two threads modifying the same resource can cause unexpected behaviour, so restrict access to shared data with critical sections
- The more places shared data gets updated, the more likely you are to:
- Forget to protect one or more of those places with a critical section
- Duplicate effort ensuring everything is effectively guarded
- Struggle to determine the source of failures
-
Use copies of data
- Avoid sharing data in the first place: each thread can operate on its own copy, and the results can be merged in a single thread
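The copy-and-merge idea above can be sketched as follows; the class and method names are illustrative, not from the book. Each worker receives its own copy of a slice of the input, computes a partial result with no shared mutable state, and the calling thread merges the partials.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// "Use copies of data": workers never touch the shared list, so no
// synchronisation of the data itself is needed.
public class CopyAndMerge {
    public static long parallelSum(List<Integer> data, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Long>> partials = new ArrayList<>();
            int chunk = (data.size() + workers - 1) / workers; // ceiling division
            for (int i = 0; i < data.size(); i += chunk) {
                // Copy the slice so each task owns its data outright.
                List<Integer> copy = new ArrayList<>(
                        data.subList(i, Math.min(i + chunk, data.size())));
                partials.add(pool.submit(() -> {
                    long sum = 0;
                    for (int v : copy) sum += v;
                    return sum;
                }));
            }
            long total = 0;
            for (Future<Long> f : partials) total += f.get(); // merge on one thread
            return total;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

The copies cost extra memory and time; the trade is worthwhile when it avoids locking on every element access.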
-
Threads should be as independent as possible
- Write threaded code such that each thread exists in its own world, sharing no data with other threads
- This removes the need for synchronisation
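One common way to keep per-thread state independent in Java is ThreadLocal, which gives every thread its own instance. A minimal sketch (the class name is illustrative) using the classic example of the non-thread-safe SimpleDateFormat:

```java
import java.text.SimpleDateFormat;

// Each thread gets its own SimpleDateFormat, so the formatter is never
// shared and needs no synchronisation.
public class PerThreadState {
    private static final ThreadLocal<SimpleDateFormat> FORMAT =
            ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyy-MM-dd"));

    public static SimpleDateFormat formatter() {
        return FORMAT.get(); // same instance on the same thread, distinct across threads
    }
}
```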
Execution Models
-
Terminology
- Bound resources: resources of a fixed size or number, such as database connections or fixed-size buffers
- Mutual exclusion: Only one thread can access a shared resource at a time
- Starvation: a thread or group of threads is prevented from proceeding for a very long time, or forever, while waiting for access to a resource
- Deadlock: two or more threads each wait for a resource held by another, so none of them can finish
- Livelock: threads in lockstep, each trying to do work but unable to make progress for a long time because they keep getting in each other's way
-
Producer-Consumer
- One or more producer threads create work and place it in a buffer or queue
- One or more consumer threads take work from the queue and complete it
- The queue is a bound resource: producers wait until there is free space in the queue before writing, and consumers wait until the queue is non-empty before consuming
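The coordination described above comes for free with a bounded BlockingQueue, as in this minimal sketch (class and method names are illustrative): put() blocks the producer while the queue is full, and take() blocks the consumer while it is empty.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// One producer, one consumer, sharing a bounded queue (the bound resource).
public class ProducerConsumer {
    public static int runOnce() {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(2); // capacity 2

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < 5; i++) queue.put(i); // blocks while full
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        int sum = 0;
        try {
            for (int i = 0; i < 5; i++) sum += queue.take(); // blocks while empty
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sum; // 0 + 1 + 2 + 3 + 4
    }
}
```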
-
Readers-Writers
- Writers wait until there are no active readers before performing an update
- If there is a continuous stream of readers, writers are starved
- If writers are given priority because updates are frequent, throughput suffers because readers are made to wait
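Java's ReentrantReadWriteLock implements this pattern: many readers may hold the read lock at once, while a writer needs exclusive access. A sketch (the class name is illustrative); passing true selects fair mode, which queues threads roughly in arrival order and so trades some throughput for protection against writer starvation.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Readers-writers over a single shared value.
public class SharedCounter {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true); // fair mode
    private int value;

    public int read() {
        lock.readLock().lock(); // shared: many readers at once
        try {
            return value;
        } finally {
            lock.readLock().unlock();
        }
    }

    public void write(int v) {
        lock.writeLock().lock(); // exclusive: no readers or other writers
        try {
            value = v;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```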
-
Dining philosophers
- There are many threads and few resources
- Essentially, the threads compete for the resources, which causes some of them to wait for a resource to become available again
- Can experience deadlock, livelock and low throughput
Dependencies between synchronised methods
-
Avoid using more than one method on a shared object
-
For situations using more than one method:
- Client-based locking: the client locks the server before calling the first method and holds the lock until after the last method call
- Server-based locking: within the server, create a method that locks the server, calls all of the methods, and then unlocks; the client calls this new method
- Adapted server: create an intermediary that performs the locking (useful when the server's source code cannot be changed)
-
Locks create overhead and delay, so keep critical sections as small as possible
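Server-based locking can be sketched like this; IntegerIterator and nextOrNull are hypothetical names, not from the book. The server guards the dependent hasNext()/next() pair inside one synchronised method, so clients never have to coordinate locks themselves.

```java
import java.util.Iterator;
import java.util.List;

// Server-based locking: the server owns the lock and the multi-step operation.
public class IntegerIterator {
    private final Iterator<Integer> it;

    public IntegerIterator(List<Integer> values) {
        this.it = values.iterator();
    }

    // Without this method, a client would have to lock around its own
    // hasNext() + next() calls (client-based locking); here the server
    // performs both steps under a single lock.
    public synchronized Integer nextOrNull() {
        return it.hasNext() ? it.next() : null;
    }
}
```

Note the critical section covers only the two dependent calls, keeping lock hold time minimal as advised above.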
Extra notes
- Test shut-down code early; clean shutdown can be difficult to achieve if threads deadlock before receiving the shut-down signal
- Write tests that can expose potential problems, then run them with different program and system configurations
- Treat spurious failures as candidate threading issues
- Get non-threaded code passing first
- Make threaded code pluggable and tunable
- Run with more threads than processors
- Run on different platforms
- Try to force failures