**Exclusive Control for Compound Operations On Hardware Transactional Memory**

Keisuke MASHITA<sup>†</sup>, Anju HIROTA<sup>†</sup>, and Tomoaki TSUMURA<sup>†</sup>

<sup>†</sup>Nagoya Institute of Technology, Japan

[Figure: timeline of two transactions accessing address A (load A, store A, Req.A, Nack, Commit, Abort, Restart)]

### Summary

Hardware Transactional Memory (HTM) is a promising mechanism for multi/many-core programming. However, a variable that has been read is often written soon afterwards, and this access pattern severely degrades the performance of TMs.

We propose a transaction scheduling scheme that mitigates this problem with a very simple implementation.

## Managing w/ a Flag in Each Cache Line

We propose a very simple implementation:

□ Only one additional 1-bit flag, called the *C-bit*, is required for each cache line.



□ Total execution cycles are reduced by up to 72.2%, and by 17.5% on average.

□ The additional hardware cost is only **512 Bytes**.

# Compound Ops lead to Futile Stalls

Typical HTM scheduling:

□ When a thread tries to access a shared variable, it sends a request to detect a conflict.

□ If a thread detects a deadlock, it aborts its transaction.

**Read-after-read (RaR) accesses** cause no conflict. However, many <u>read</u> accesses are followed by <u>write</u> accesses to the same addresses.

As a result, *Futile Stalls* occur even though RaR accesses are allowed.

The *C-bit* indicates that the address has been accessed by a compound operation.

Each cache line has these bits for conflict detection in general HTMs.

□ If an RaR access to an address is expected to be followed by a write access, the transactions are serialized.



**Performance Evaluation** 

Many Futile Stalls are caused by this access pattern (e.g., in *compound operations*).

□ Many transactions contain such *compound operations*.

(e.g., increment, decrement, and compound assignment operations)

Part of a transaction in Prioqueue:

```
BEGIN_TRANSACTION
if (array[key] != index) {   /* array[key] is read    */
    array[key] = index;      /* array[key] is written */
}
COMMIT_TRANSACTION
```

Many aborts are caused by *compound operations*, so **compound operations** degrade the performance of HTM.

[Figure: percentage of aborts caused by *compound operations*]



#### Simulator

□ **Simics**: full-system simulator (SPARC-V9, Solaris 10)

□ **GEMS**: memory system simulator

□ Baseline: LogTM, one of the most general HTM systems

□ Execution cycles are those of the slowest of the 16 threads.

[Figure: execution cycles of (B) LogTM (Baseline), (S) Store Predictor, and (P) Proposal, broken down into Non\_trans, Good\_trans, Stall, and Abort\_ovh]

We aim to solve these problems with a transaction scheduling scheme that has a practical, lightweight implementation.



The proposed scheduling prevents *Futile Stalls* and aborts. In Deque in particular, no aborts occur at all with our scheduling.

**Comparison between our proposal and Store Predictor\*** 

- Store Predictor is an existing technique for mitigating the adverse effects of RaR accesses.
  - Unlike our proposal, once an address is managed by Store Predictor, all accesses to that address are stalled.
  - This causes serious performance degradation in Btree because of unnecessary serialization.

\* J. Bobba et al.: Performance Pathologies in Hardware Transactional Memory, Proc. 34<sup>th</sup> ISCA.