This document provides guidelines for thread-safe programming in the SHIP connection implementation.
The ShipConnection handles the data connection and coordinates SHIP and SPINE message I/O. It manages handshake state machines, timer operations, and message buffering in a concurrent environment.
The ShipConnection uses multiple mutexes to protect different aspects of the connection:
mux- Main connection state mutexbufferMux- SPINE message buffer protectionhandshakeTimerMux- Handshake timer state protectionshutdownOnce- Ensures single shutdown execution
The original setState() method held the main lock while calling timer methods, creating potential deadlocks:
// OLD PROBLEMATIC PATTERN (Fixed)
func setState(newState, err) {
c.mux.Lock() // Main lock acquired
// ... state update
c.setHandshakeTimer() // Timer method called while holding lock
c.mux.Unlock()
}The new implementation separates state changes from timer operations:
// NEW SAFE PATTERN
func (c *ShipConnection) setState(newState model.ShipMessageExchangeState, err error) {
// Phase 1: Update state atomically while holding lock
var timerOp func()
var shouldNotify bool
var notifyState model.ShipState
c.mux.Lock()
oldState := c.smeState
c.smeState = newState
c.smeError = err
// Determine timer operation while holding lock, but don't execute it yet
switch newState {
case model.SmeHelloStateReadyInit:
timerOp = func() { c.setHandshakeTimer(timeoutTimerTypeWaitForReady, tHelloInit) }
case model.SmeHelloStateOk:
timerOp = func() { c.stopHandshakeTimer() }
}
if oldState != newState {
shouldNotify = true
notifyState = model.ShipState{State: newState, Error: err}
}
c.mux.Unlock()
// Phase 2: Execute timer operations and notifications without holding lock
if timerOp != nil {
timerOp()
}
if shouldNotify {
c.infoProvider.HandleShipHandshakeStateUpdate(c.remoteSKI, notifyState)
}
}- Deadlock Prevention: Timer operations no longer execute while holding the main lock
- Atomicity: State changes remain atomic within the critical section
- Performance: Reduced lock holding time improves concurrency
- Safety: External callbacks happen without locks, preventing circular dependencies
// Timer operations are now safe to call from setState()
func (c *ShipConnection) setHandshakeTimer(timerType timeoutTimerType, duration time.Duration) {
c.stopHandshakeTimer() // Safe - no main lock held
c.handshakeTimerMux.Lock()
c.handshakeTimerRunning = true
c.handshakeTimerType = timerType
c.handshakeTimerMux.Unlock()
// Timer goroutine can safely call handleState()
go func() {
select {
case <-time.After(duration):
c.handleState(true, nil) // No deadlock risk
case <-c.handshakeTimerStopChan:
return
}
}()
}func (c *ShipConnection) getHandshakeTimerRunning() bool {
c.handshakeTimerMux.Lock()
defer c.handshakeTimerMux.Unlock()
return c.handshakeTimerRunning
}SPINE message buffering uses a separate mutex to avoid contention:
func (c *ShipConnection) HandleIncomingWebsocketMessage(message []byte) {
c.bufferMux.Lock()
c.spineBuffer = append(c.spineBuffer, message)
c.bufferMux.Unlock()
// Process buffer without holding lock
c.processBufferedMessages()
}func (c *ShipConnection) getState() model.ShipMessageExchangeState {
c.mux.Lock()
defer c.mux.Unlock()
return c.smeState
}func (c *ShipConnection) ApproveIfPending() bool {
c.mux.Lock()
if c.smeState != model.SmeHelloStatePendingListen {
c.mux.Unlock()
return false
}
// Update state while holding lock
c.smeState = model.SmeHelloStateReadyInit
c.mux.Unlock()
// Handle state transitions without lock to avoid deadlock
c.stopHandshakeTimer()
c.handleState(false, nil)
return true
}# Using Makefile (recommended)
make test-deadlock # Enhanced deadlock detection
make test-deadlock-specific # Run specific deadlock tests
make test-race # Standard race detection
make test-concurrency # All concurrency tests
# Development workflow
make dev-test # Quick format, lint, and test
make pre-commit # Full pre-commit validation
# Manual test commands
go test -race -tags=deadlock -timeout=30s ./shipDeadlock tests are automatically executed in the CI/CD pipeline:
- Every push and pull request runs deadlock detection tests
- Nightly runs with enhanced concurrency testing
- Performance regression monitoring
TestSetStateTimerDeadlock: Tests the specific setState/timer deadlock scenarioTestStateTimerConsistency: Verifies timer state consistency under concurrent modificationsTestTimerLifecycleDeadlock: Tests complete timer lifecycle for deadlock issuesTestConcurrentStateChangeWithTimerExpiry: Tests race between timer expiry and state changes
go test -bench=BenchmarkSetState -benchtime=1s ./ship
go test -bench=BenchmarkConcurrentStateChanges ./ship// Always use setState() for state changes
conn.setState(model.SmeHelloStateOk, nil)
// Read state safely
currentState := conn.getState()// Timer operations are safe to call from any context
conn.setHandshakeTimer(timeoutTimerTypeWaitForReady, duration)
conn.stopHandshakeTimer()
// Check timer state
if conn.getHandshakeTimerRunning() {
// Timer is active
}// Proper connection shutdown
conn.CloseConnection(false, 4001, "reason")
// The shutdownOnce ensures cleanup happens only once- Critical sections are kept minimal
- Timer operations happen outside of main lock
- External callbacks occur without holding locks
- State queries don't block timer operations
- Timer operations don't block state changes
- Multiple goroutines can query state simultaneously
- Buffer operations use separate mutex
- No locks held during memory allocations
- Timer goroutines clean up automatically
- Use
setState()for all state changes - Query state with
getState() - Use atomic approval/abort methods
- Test with race detector enabled
- Keep critical sections small
- Don't call timer methods while holding
mux - Don't hold locks during external callbacks
- Don't manually manipulate
c.smeStatedirectly - Don't create new timer patterns without review
State changes with errors:
conn.setState(model.SmeStateError, fmt.Errorf("connection failed: %v", err))Timer expiry handling:
// Timer goroutines automatically handle expiry
// No manual intervention requiredFollowing the pattern [COMPONENT]_[TOPIC]_GUIDE.md:
SHIP_PERFORMANCE_GUIDE.md- Performance optimization guidelinesSHIP_TESTING_GUIDE.md- Testing strategies and patternsSHIP_HANDSHAKE_GUIDE.md- Handshake state machine documentationSHIP_MESSAGE_GUIDE.md- Message handling patterns
For code that previously relied on the old setState() behavior:
- Timer operations: Now happen asynchronously after state change
- Callbacks: Now happen without locks held
- Atomicity: State changes remain atomic, but side effects are deferred
- Performance: Improved due to reduced lock contention
The two-phase approach maintains all functional correctness while eliminating deadlock risks and improving performance.