Immutable Data Structures in Concurrent Java Applications

Concurrent applications have multiple threads running simultaneously. Access to data shared by multiple threads requires synchronization which is often a source of fragile and hard to maintain code, hard to find bugs, and performance issues. You can minimize synchronization and the headaches that go with it using immutable data structures. In this article I’ll demonstrate how to create and use immutable data structures to reduce and localize the need for synchronization.

Consider an application that uses a set of contacts. The contacts are stored in a map with the contact name as the key and a Contact object as the value. Various threads in the application access the contacts. Both the Contact class and the map require synchronization. Here is thread safe implementation of the Contact class:

public class Contact {
    private volatile String name;
    private volatile String email;
    private volatile String phone;

    public String getName() {return name;}
    public void setName(String name) {this.name = name;}

    public String getEmail() {return email;}
    public void setEmail(String email) {this.email = email;}

    public String getPhone() {return phone;}
    public void setPhone(String phone) {this.phone = phone;}
}

I could have chosen to synchronize all the setters and getters but making the fields volatile is a simpler solution. Using this implementation for the contact objects and ConcurrentHashMap for the contact map provides a thread safe solution. Usually thread safety at the class level alone is not sufficient to support application logic as demonstrated in the following method:

public void handleGmailNotification(Contact contact) {
    if (contact.getEmail().endsWith("gmail.com")) {
        sendGmailNotification(contact.getEmail());
    }
}

This method is supposed to send a special email to contacts with a GMail address. If another thread changes the contact between the execution of the first two lines of code the GMail specific email may be sent to a non-GMail account. That may only happen once a month and will likely be a difficult bug to find especially if the code that changed the email was added by a developer unaware of the use above. More course grained synchronization can be used to fix the problem but that requires that all code that deals with the Contact’s email address be correctly synchronized. Multiply that by the many different pieces of data being shared by multiple threads in a highly concurrent application and you get fragile, bug ridden, and hard to maintain code. Making Contact immutable alleviates these issues:

public final class Contact {
    private final String name;
    private final String email;
    private final phone;

    public Contact(String name, String email, String phone) {
        this.name = name;
        this.email = email;
        this.phone = phone;
    }

    public String getName() {return name;}
    public String getEmail() {return email;}
    public String getPhone() {return phone;}
}

Using the immutable version I can be sure that the Contact object I’m interacting with will not change. If I need to make a change to the email address I create a new Contact object and put it in the contact map in place of the existing one.

The ConcurrentHashMap, while thread safe, can cause similar issues. This code is supposed to send a text message and an email to every contact:

public void sendMessages(Map contactMap) {
    sendTextToPhone(contactMap.values();
    sendEmail(contactMap.values();
}

If another thread updates the contact map while this is executing we will have another hard to find bug. You could make the collection unmodifiable using Collections.unmodifiableMap but its not truly immutable. The underlying map can still be modified. Also, you have no way to change the contact list. A map implementation that creates a copy when its modified is a nice alternative. Here’s a simple implementation:

public class ImmutableMap implements Map {
    private final Map map;

    public ImmutableMap() {
        this(Collections.EMPTY_MAP);
    }

    public ImmutableMap immutablePut(K key, V value) {
        Map newMap = new HashMap(map);
        newMap.put(key, value);
        return new ImmutableMap(newMap);
    }

    public ImmutableMap immutableRemove(K key) {
        Map newMap = new HashMap(map);
        newMap.remove(key);
        return new ImmutableMap(newMap);
    }

    private ImmutableMap(Map delegate) {
        this.map = Collections.unmodifiableMap(delegate);
    }

    // All the map methods are simply delegated to the map field.
    // To conserve space they are not shown here.
}

The immutablePut and immutableRemove methods return a copy of the map with the changes, leaving the original map unchanged. All the methods from the Map interface delegate to the underlying HashMap. Since it is unmodifiable any mutating methods will throw UnsupportedOperationException. Immutable versions of other mutating map methods can be implemented using the same pattern as immutablePut and immutableRemove. You can safely use any instance of this type in any way with no concern about causing issues in other parts of the application or vice-versa.

Its likely the application we’ve been using as an example will require a central reference to the contact list. Here’s a contact service implementation using the ImmutableMap:

public class ContactService {
    private final ReentrantLock lock = new ReentrantLock();
    private volatile ImmutableMap contacts = new ImmutableMap();

    public void addContact(Contact contact) {
        lock.lock();
        try {
            contacts = contacts.immutablePut(contact.getName(), contact);
        } finally {
            lock.unlock();
        }
    }

    public ImmutableMap getContacts() {
        return contacts;
    }
}

There is some synchronization required to ensure that if multiple threads want to add a contact to the central list all additions are maintained correctly. Without the lock, if two threads called addContact at the same time they would each add a different contact to the existing map and the last one assigned would be the only one saved. The other would be lost. The contact must be volatile to ensure all threads calling getContacts will get the updated reference. The synchronization required in this case is very localized and easy to maintain. Any code that gets the contacts from the service can interact with them without synchronization concerns.

This paradigm works best in cases where updates are infrequent and the data set is relatively small. If updates are frequent or the data set is very large then the copying required can become a performance issue. Don’t let that deter you from adopting this paradigm where you can. Immutable data structures make concurrent programming a much simpler and safer endeavor. It is a proven approach that is fundamental in functional programming languages such as Scala and gaining traction in the object oriented world.

About the Author

Object Partners profile.

One thought on “Immutable Data Structures in Concurrent Java Applications

  1. Paul Horn says:

    Great article Jim!

  2. Firass Shehdeh says:

    Thanks for the very nice article. It is unfortunate that immutable data structures haven’t yet reached the level of adoption that they deserve.

    I personally find this approach very useful, especially when trying to understand new codebase, or when debugging existing applications. Buy limiting the number of points where data elements can be modified, I get a better chance of determining when/why a data element got corrupted.

    I agree with you that the cloning the immutable objects might introduce a big overhead for large data structures, but this should not be a big problem, since most of the time, the immutable objects tend to be small (think of ValueObjects as discussed by Martin Fowler)

  3. Vic says:

    I am not sure, but there is a better solution, e.g. why you did not use ConcurrentHashMap? As I understand you are trying to guarantee invariability of contact list on sendMessages invocation.
    From javadoc: “Iterators and Enumerations return elements reflecting the state of the hash table at some point at or since the creation of the iterator/enumeration”. It looks like it fits to your requirements.
    BTW, I am not sure about safe publication in your code of ImmutableMap

  4. Guy Incognito says:

    “[…]cloning the immutable objects[…]”

    What for?

    You wouldn’t be able to differentiate between them.

    A very good collections library for immutable data structures is the google-collection library.

  5. Hardcoded says:

    Nice one.
    The copy-on-write style synchronisation is far more robust and understandable than the lock-/monitor-based ones.

  6. Sue Schedin says:

    Nice article!

  7. Dean says:

    Jim,

    I like the idea of using immutable data structures for simple classes, but for a collection this could incur a pretty large performance hit. Compare the performance of immutablePut() to this:

    synchronized(contactMap) {
    sendMessages(Map contactMap);
    }

    When taking over someone else’s code I like to take inventory of where in the code shared, mutable state exists. If contactMap is not shared state somewhere in the call stack above the call to sendMessages() then there’s no problem and no reason to incur the costs of ImmutableMap. If it is shared then you need to weigh the costs involved.

    Taking inventory of shared state is complicated by the situation where a local variable is turned into shared state like this:

    ….
    Map localMap = new TreeMap();
    ….
    someInstance.setMap(localMap);

    The call to setMap() could result in localMap now being an instance variable of someInstance.

    Ideally when writing concurrent code this kind of thing would be documented in a shared state inventory, but I’m dreaming.

  8. Dave Andreasen says:

    Great article. I like copy approach for handling the updates to the set of contacts. I have used other synchronization approaches in the past and they have been fragile for the reasons you have outlined above.

  9. jmcclure says:

    Dean,

    You are correct in that there is some performance overhead. This is just a shallow copy however so its not a big impact unless either the map is huge or it is updated often. The overhead is outweighed by the safety of the methodology. This is a common paradigm in functional programming that is starting to make its way into OOP due in no small part to the creation of Scala.

    Ultimately you are correct in that you should always weigh the costs of different implementations. I’m not intending to imply that you should always use immutable data structures. Only to present another tool for your development toolbox.

  10. Heinz Kabutz says:

    Fantastic article, Jim!

    Immutability certainly makes things much easier.

    Some weird effects happen when we modify the state of keys in collections – they tend to disappear completely, together with their values, only to reappear later when the map is rehashed. Depends on the map implementation of course.

    We probably also need an atomic “replace” method that removes the old contact and puts in a new one (also immutable). Otherwise there would be no way to change your email address.

    Keep up the articles, I’m looking forward to the next one!

    Heinz

  11. Such structures are called “persistent”, or fully persistent in this case since every version of the structure can be used to create a new version. Hash tables (or even array-based lists) are particularly bad examples of such, since they incur excessive computational cost.

    A much better example of such a structures would be a tree-based map (a balanced tree in particular would only incur O(log(n)) node copies instead of O(n) of hash table).

    Still, if the persistent map above is merely to used to avoid synchronization issues, and not to be able to access historical versions of a structure, then it is unnecessarily wasteful. Assuming you need a Map with the ability to capture proper snapshots, so CHM is out: if a Map is shared, fine, make that immutable. If any thread needs to update that shared instance, sure, protect it with a lock or so. But if a thread wants to access the map, and then modify it locally (in your example, by calling getContacts() and then immutablePut etc), it is much more efficient to copy the map *once* (using the immutable Map from contacts) into a regular (“ephemeral”) HashMap for further updates – no need to continuously copy a thread-local value if you don’t need the persistency.

    Dimitris

  12. bubak says:

    Scala and case classes makes it way easier.

  13. ruben says:

    Hi,

    Two things:

    1) If Contact is stored in a Map, please override equals() & hashCode().

    2) Collections.EMPTY_MAP is already immutable, no need to re-wrap it in the constructor.

    Cheers.

Leave a Reply

Your email address will not be published.

Related Blog Posts
Natively Compiled Java on Google App Engine
Google App Engine is a platform-as-a-service product that is marketed as a way to get your applications into the cloud without necessarily knowing all of the infrastructure bits and pieces to do so. Google App […]
Building Better Data Visualization Experiences: Part 2 of 2
If you don't have a Ph.D. in data science, the raw data might be difficult to comprehend. This is where data visualization comes in.
Unleashing Feature Flags onto Kafka Consumers
Feature flags are a tool to strategically enable or disable functionality at runtime. They are often used to drive different user experiences but can also be useful in real-time data systems. In this post, we’ll […]
A security model for developers
Software security is more important than ever, but developing secure applications is more confusing than ever. TLS, mTLS, RBAC, SAML, OAUTH, OWASP, GDPR, SASL, RSA, JWT, cookie, attack vector, DDoS, firewall, VPN, security groups, exploit, […]