Scala Collections Compared with Python, C++

2 minutes read

Scala’s collection data types are imporessively rich and complex. Picking the best suitable one may not be straightforward and some of the distinctions can be subtle in actual use cases. This post aims to summarize the commonly used types, compared with their counterparts in Python and in C++. This comparison is more for helping position the mindset and making analogies than being precise, espeicially from the data structure perspective.

Changes and additions to this post may be made over time.

Abstract class and trait hierarchy

scala collection hierarchy

See more here

Summary

Scala JavaPythonC++Description
abstractclass    
Seq.IndexedSeqArrayArray []Fixed length seq. Mutable in elements but not in length
Seq.IndexedSeqVector   Fixed length seq. Immutable both in elements and length. Internally as a 32-branch tree. Default for immutable.IndexedSeq
Seq.IndexedSeqArrayBuffer liststd::vectorVariable length seq. Mutable both in elements and length. Efficient to append.
Seq.LinearSeqListBuffer collections.dequestd::listInternal list. Efficient prepend and append on both ends.
MapMap   Mutable and immutable map implemented as hash trie
MapHashMap dict Both mutable and immutable
MapTreeMap  const std::mapImmutable RB tree map
  TreeMap std::mapMutable RB tree map

The underlying implementation is best described here for mutable collection and for immutable collection.

Handy literals

  • indexed seq
    • immutable: Vector(1, 2, 3)
    • mutable: collection.mutable.ArrayBuffer(1, 2, 3)
  • linear seq
    • immutable: List(1, 2, 3)
    • mutable: collection.mutable.ListBuffer(1, 2, 3)
  • map
    • immutable: Map(1->"a", 2->"b")
    • mutable: collection.mutable.Map(1->"a", 2->"b")
  • set
    • immutable: Set(1, 2, 3)
    • mutable: collection.mutable.Set(1, 2, 3)

Caveats and tricks

Mutable vs immutable

Immtable collection is optimized based on the agreement that both the size and the content cannot be changed, whereas mutable collection allows changes in both the size and the content. Array is something in between: you can change the content but not the size. Therefore, in some sense Array is mutable, but due to the underlying data structure, changes in size are natually not allowed. People are arguing on this point. The definition of mutability in the official document is that “A mutable collection can be updated or extended in place.” So logically, if any change can be made after the object being created, it is mutable. But, really, it doesn’t matter much.

val vs var

var for an immutable collection allows operations such as +=. But what’s under the hood is that a new immutable collection is being created and attached to the same reference. The old one is then recycled. In this case, you still cannot modify values. scala var x = Map("AL" -> "Alabama") // still default immutable.Map x += ("CA"->"California") // okay because a new map is being generated x("CA") = "Air China" // illegal because of an attempt of modifying values

Initialize through other collection objects

val m = Map(1->"a")
val tm = collection.immutable.TreeMap(m.toArray:_*)

Performance characteristics

The best way to choose a proper collection type is to look at the complexity of operations being used in your applications. This perhaps is also the easist way to figure out the underlying implementation and data structure.

The official document provides an excellent summary of the performance characteristics for each Scala collection class.

Updated:

Leave a Comment