Java's memory organization

I have recently been reading about how threads are used in the Java language.
It seems that, although objects are allocated on a heap shared by all threads (also known as main memory), each thread also has its own memory area (called working memory). Whenever a thread uses an object, it apparently copies it from main memory into its working memory. I am uncertain what is gained by having such a local copy (perhaps some performance benefit). However, if a variable is either synchronized (used inside a synchronized block) or declared "volatile", all reads must fetch the value from main memory and all writes must be flushed back to main memory. In the case of the volatile keyword, the order of reads and writes to main memory must additionally be consistent with the order stated in the program.
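As a sketch of the synchronized case described above (class and method names are my own, not from any particular source): every access to the shared field goes through a synchronized block, so each thread reads the value last flushed to main memory rather than a stale working copy.

```java
// Hypothetical example: a counter shared between threads, with reads and
// writes guarded by the object's monitor so updates are visible to all threads.
public class SharedCounter {
    private int count = 0;            // guarded by "this"

    public synchronized void increment() {
        count++;                      // read-modify-write under the lock
    }

    public synchronized int get() {
        return count;                 // the lock forces a fresh read from main memory
    }

    public static void main(String[] args) throws InterruptedException {
        SharedCounter c = new SharedCounter();
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) c.increment();
            });
            workers[i].start();
        }
        for (Thread t : workers) t.join();
        System.out.println(c.get());  // prints 4000
    }
}
```

Without the synchronized keyword on both methods, the increments could be lost to races and each thread could work from its own cached copy of the field.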
I have read that a "volatile" keyword with a similar meaning exists in C++, so I am wondering whether C++ threads also keep working copies of heap memory. In addition, as the .NET Common Language Runtime is similar to the Java Virtual Machine, I wonder whether .NET also uses such working memory.
Update: In .NET, it seems that threads can cache variables in local storage (via either the compiler's translation into CIL or the CLR's translation of CIL into native code). In such a case, either the use of locks or of the volatile keyword (at least in C#) forces the write-back to main memory to occur. I am wondering where such implicit caching occurs in C++. Perhaps the native code generated by a C++ compiler keeps the variable in a register and does not guarantee write-back to main memory in any order or within any particular time. However, if two threads are working with a shared object, I believe the updates should become visible within some bounded time (e.g. if thread 1 goes into a wait loop until thread 2 updates a shared object, thread 1's loop should be finite).
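The wait-loop scenario described above can be sketched in Java (the class and field names here are hypothetical): thread 1 spins until thread 2 sets a flag. Because the flag is declared volatile, the write is guaranteed to become visible, so the loop is finite; with a plain field, the JIT could legitimately cache the flag in a register and spin forever.

```java
// Hypothetical example: a finite spin-wait on a volatile flag.
// The write to "payload" happens-before the volatile write to "ready",
// so the waiting thread is guaranteed to observe payload == 42.
public class SpinWaitDemo {
    private static volatile boolean ready = false;
    private static int payload = 0;   // safely published via the volatile write

    static int run() throws InterruptedException {
        final int[] seen = new int[1];
        Thread waiter = new Thread(() -> {
            while (!ready) {          // finite: the volatile read sees the update
                Thread.onSpinWait();  // spin-loop hint to the runtime (Java 9+)
            }
            seen[0] = payload;
        });
        waiter.start();

        payload = 42;                 // ordinary write...
        ready = true;                 // ...made visible by the volatile write
        waiter.join();
        return seen[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run());    // prints 42
    }
}
```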