2016年11月17日

Scala

by smallnest

[译]Scala Collection的性能

目录 [−]

这是翻译自 Li Haoyi 的 Benchmarking Scala Collections。

Li Haoyi 的背景了解Scala的人都知道，虽然Scala的官方文档对集合框架的性能提供了定性的分析，但是Li Haoyi的这篇文章定量的比较了各个框架类的性能，非常有参考意义，也便于你更好的正确的选择 Scala 集合中的类。

这篇文章从经验的角度深入研究了Scala集合库的运行特性。你也许已经看到了很多的从实现的角度介绍Scala集合库的文章(inheritance hierarchies, CanBuildFrom, 等等),但是很少有人写这些集合运行时的行为特性。

List比Vector快，还是Vector比List快？使用未装箱的数组存储基本类型可以节省多少内存？什么时候你会执行一些性能的技巧，比如预分配大小的数组、使用while-loop取代foreach调用等，这些技巧真的有效么？声明var l: List还是val b: mutable.Buffer？这篇文章会给你答案。

插播一条广告，对Scala集合感兴趣的读者可以阅读拙著《Scala集合技术手册》，详细介绍了Scala集合的各个类的功能，各大网上书店均有销售，繁体中文版刚刚在台湾出版。

Scala编程语言内建了丰富的集合类型： List、 Vector、 Array、 Set、 Map等等。当然还有一些众所周知的基础知识：List可以快速地在头部增加元素，但是索引起来很慢， Vector是一个很好的通用目的的集合，但是实践中这些集合处理的数据确很少。

例如，Vector集合要比数组多占用多少内存？List呢？使用while-loop比foreach调用要快多少？使用Map和Set的性能呢？

目前最好的描述这些运行时特性的文章是下面的表格，这个表格来自 docs.scala-lang.org:

operation	head	tail	apply	update	prepend	append	insert
List	C	C	L	L	C	L
Stream	C	C	L	L	C	L
Vector	eC	eC	eC	eC	eC	eC
Stack	C	C	L	L	C	C	L
Queue	aC	aC	L	L	L	C
Range	C	C	C
String	C	L	C	L	L	L

其中表格中的数据代表时间复杂度，意义如下

Key	Meaning
C	常数时间The operation takes (fast) constant time.
eC	等价常数时间The operation takes effectively constant time, but this might depend on some assumptions such as maximum length of a vector or distribution of hash keys.
aC	平均常数时间The operation takes amortized constant time. Some invocations of the operation might take longer, but if many operations are performed on average only constant time per operation is taken.
Log	对数时间The operation takes time proportional to the logarithm of the collection size.
L	线性时间The operation is linear, that is it takes time proportional to the collection size.
	The operation is not supported.

This lacks concrete numbers and is a purely theoretical analysis. Worst, it uses weird terminology like "Effectively Constant time, assuming maximum length", which is confusing and doesn't match what the rest of the world thinks when they discuss performance characteristics or asymptotic complexity (If you're wondering why you've never heard the term "Effectively Constant Time" before, it's because everyone else calls it "Logarithmic time", and it's the same as the "Log" category above).

This post will thus go into detail with benchmarking both the memory and performance characteristics of various Scala collections, from an empirical point of view. By using runtime benchmarks rather than theoretical analysis, we will gain an understanding of the behavior and nuances of the various Scala collections, far more than what you'd gain from theoretical analysis or blind-leading-the-blind in-person discussions.

这个表格的问题在于它的指标是基于纯粹的理论分析。更糟糕的是，它使用奇怪的术语，如“等价常数时间，平均常数时间”等，这让人很迷惑，而且不符合其他人在讨论性能特征或渐近复杂性时的通常理解。

因此，这篇文章将从经验的角度详细介绍各种Scala集合的内存占用和性能特性。通过使用运行时benchmark数据而不是理论分析，我们将了解各种Scala集合的行为和细微差别，远远超过了从理论分析或盲人指路的讨论中获得的知识。

内存占用

本文第一个要介绍的是各种集合的内存占用。它比性能更容易分析，因为它的结果是确定的：你不需要多次进行benchmark测试来求平均值来减少误差。虽然不太常用，但是你还是可以通过直接使用反射和Java的Instrumentation API写个程序来分析一个对象的内存占用。

有很多的文章介绍这个，比如：

它相对来说比较简单，我使用Scala写了一个，它使用反射来抓取对象引用图，然后使用Java Instrumentation agent提供的getObjectSize得到对象的大小：

package bench
import java.lang.reflect.Modifier
import java.util
import scala.collection.mutable
object DeepSize {
  private val SKIP_POOLED_OBJECTS: Boolean = false
  private def isPooled(paramObject: AnyRef): Boolean = {
    paramObject match{
      case e: java.lang.Enum[_]   => true
      case s: java.lang.String    => s eq s.intern()
      case b: java.lang.Boolean   => (b eq java.lang.Boolean.TRUE) || (b eq java.lang.Boolean.FALSE)
      case i: java.lang.Integer   => i eq java.lang.Integer.valueOf(i)
      case s: java.lang.Short     => s eq java.lang.Short.valueOf(s)
      case b: java.lang.Byte      => b eq java.lang.Byte.valueOf(b)
      case l: java.lang.Long      => l eq java.lang.Long.valueOf(l)
      case c: java.lang.Character => c eq java.lang.Character.valueOf(c)
      case _ => false
    }
  }
  /**
    * Calculates deep size
    *
    * @param obj
    * object to calculate size of
    * @return object deep size
    */
  def apply(obj: AnyRef): Long = {
    deepSizeOf(obj)
  }
  private def skipObject(obj: AnyRef, previouslyVisited: util.Map[AnyRef, AnyRef]): Boolean = {
    if (SKIP_POOLED_OBJECTS && isPooled(obj)) return true
    (obj == null) || previouslyVisited.containsKey(obj)
  }
  private def deepSizeOf(obj0: AnyRef): Long = {
    val previouslyVisited = new util.IdentityHashMap[AnyRef, AnyRef]
    val objectQueue = mutable.Queue(obj0)
    var current = 0L
    while(objectQueue.nonEmpty){
      val obj = objectQueue.dequeue()
      if (!skipObject(obj, previouslyVisited)){
        previouslyVisited.put(obj, null)
        val thisSize = agent.Agent.getObjectSize(obj)
        // get size of object + primitive variables + member pointers
        // for array header + len + if primitive total value for primitives
        obj.getClass match{
          case a if a.isArray =>
            current += thisSize
            // primitive type arrays has length two, skip them (they included in the shallow size)
            if (a.getName.length != 2) {
              val lengthOfArray = java.lang.reflect.Array.getLength(obj)
              for (i <- 0 until lengthOfArray) {
                objectQueue.enqueue(java.lang.reflect.Array.get(obj, i))
              }
            }
          case c =>
            current += thisSize
            var currentClass: Class[_] = c
            do {
              val objFields = currentClass.getDeclaredFields
              for(field <- objFields) {
                if (
                  !Modifier.isStatic(field.getModifiers) &&
                    !field.getType.isPrimitive
                ) {
                  field.setAccessible(true)
                  var tempObject: AnyRef = null
                  tempObject = field.get(obj)
                  if (tempObject != null) objectQueue.enqueue(tempObject)
                }
              }
              currentClass = currentClass.getSuperclass
            } while (currentClass != null)
        }
      }
    }
    current
  }
}

随着这段代码没有考虑一些特殊的情况(32/64位的JVM／压缩指针的启用等等)，可能还有些bug,但是实际中哦我使用它计算一堆对象的大小的时候，它的结果和其它的工具比如JProfiler是一致的，所以我倾向于它的结果还是很准确的。如果你想尝试运行这段代码，或者想修改一下测量方法，可以运行下面的代码：

Benchmark Code

这样，我们运行不同大小的不同集合的时候计算内存占用就很简单了。每个集合填充新的对象都是一样的，除了SortedSet。SortedSet使用java.lang.Integer(...)填充了一段整数，这样它们可以被排序。

下表中就是测试的结果，单位是字节(byte),测试分别填充零个元素、一个元素、四个元素以及4的指数，一直到1,048,576个元素：

Size	0	1	4	16	64	256	1,024	4,069	16,192	65,536	262,144	1,048,576

Vector	56	216	264	456	1,512	5,448	21,192	84,312	334,440	1,353,192	5,412,168	21,648,072
Array[Object]	16	40	96	336	1,296	5,136	20,496	81,400	323,856	1,310,736	5,242,896	20,971,536
List	16	56	176	656	2,576	10,256	40,976	162,776	647,696	2,621,456	10,485,776	41,943,056
Stream (unforced)	16	160	160	160	160	160	160	160	160	160	160	160
Stream (forced)	16	56	176	656	2,576	10,256	40,976	162,776	647,696	2,621,456	10,485,776	41,943,056
Set	16	32	96	880	3,720	14,248	59,288	234,648	895,000	3,904,144	14,361,000	60,858,616
Map	16	56	176	1,648	6,800	26,208	109,112	428,592	1,674,568	7,055,272	26,947,840	111,209,368
SortedSet	40	104	248	824	3,128	12,344	49,208	195,368	777,272	3,145,784	12,582,968	50,331,704
Queue	40	80	200	680	2,600	10,280	41,000	162,800	647,720	2,621,480	10,485,800	41,943,080
String	40	48	48	72	168	552	2,088	8,184	32,424	131,112	524,328	2,097,192

Size	0	1	4	16	64	256	1,024	4,069	16,192	65,536	262,144	1,048,576

m.Buffer	104	120	168	360	1,320	5,160	20,520	81,528	324,648	1,310,760	5,242,920	20,971,560
m.Map	120	176	344	1,080	4,152	16,440	65,592	260,688	1,037,880	4,194,360	16,777,272	67,108,920
m.Set	184	200	248	568	2,104	8,248	32,824	130,696	521,272	2,097,208	8,388,664	33,554,488
m.Queue	48	88	208	688	2,608	10,288	41,008	162,808	647,728	2,621,488	10,485,808	41,943,088
m.PriQueue	144	160	208	464	1,616	6,224	24,656	81,568	324,688	1,572,944	6,291,536	25,165,904
m.Stack	32	72	192	672	2,592	10,272	40,992	162,792	647,712	2,621,472	10,485,792	41,943,072
m.SortedSet	80	128	272	848	3,152	12,368	49,232	195,392	777,296	3,145,808	12,582,992	50,331,728

Size	0	1	4	16	64	256	1,024	4,069	16,192	65,536	262,144	1,048,576

Array[Boolean]	16	24	24	32	80	272	1,040	4,088	16,208	65,552	262,160	1,048,592
Array[Byte]	16	24	24	32	80	272	1,040	4,088	16,208	65,552	262,160	1,048,592
Array[Short]	16	24	24	48	144	528	2,064	8,160	32,400	131,088	524,304	2,097,168
Array[Int]	16	24	32	80	272	1,040	4,112	16,296	64,784	262,160	1,048,592	4,194,320
Array[Long]	16	24	48	144	528	2,064	8,208	32,568	129,552	524,304	2,097,168	8,388,624
Boxed Array[Boolean]	16	40	64	112	304	1,072	4,144	16,328	64,816	262,192	1,048,624	4,194,352
Boxed Array[Byte]	16	40	96	336	1,296	5,136	8,208	20,392	68,880	266,256	1,052,688	4,198,416
Boxed Array[Short]	16	40	96	336	1,296	5,136	20,496	81,400	323,856	1,310,736	5,230,608	20,910,096
Boxed Array[Int]	16	40	96	336	1,296	5,136	20,496	81,400	323,856	1,310,736	5,242,896	20,971,536
Boxed Array[Long]	16	48	128	464	1,808	7,184	28,688	113,952	453,392	1,835,024	7,340,048	29,360,144

Size	0	1	4	16	64	256	1,024	4,069	16,192	65,536	262,144	1,048,576

j.List	216	240	312	600	1,944	7,320	28,824	114,192	454,296	1,835,160	7,340,184	29,360,280
j.Map	240	296	464	1,200	4,272	16,560	65,712	260,808	1,038,000	4,194,480	16,777,392	67,109,040
j.Set	296	312	360	680	2,216	8,360	32,936	130,808	521,384	2,097,320	8,388,776	33,554,600

你可以花点时间好好看看这些数据，但是我单独挑出来下面的数据进行比较和讨论：

不可变集合的内存占用

Size	0	1	4	16	64	256	1,024	4,069	16,192	65,536	262,144	1,048,576

Vector	56	216	264	456	1,512	5,448	21,192	84,312	334,440	1,353,192	5,412,168	21,648,072
Array[Object]	16	40	96	336	1,296	5,136	20,496	81,400	323,856	1,310,736	5,242,896	20,971,536
List	16	56	176	656	2,576	10,256	40,976	162,776	647,696	2,621,456	10,485,776	41,943,056
Stream (unforced)	16	160	160	160	160	160	160	160	160	160	160	160
Stream (forced)	16	56	176	656	2,576	10,256	40,976	162,776	647,696	2,621,456	10,485,776	41,943,056
Set	16	32	96	880	3,720	14,248	59,288	234,648	895,000	3,904,144	14,361,000	60,858,616
Map	16	56	176	1,648	6,800	26,208	109,112	428,592	1,674,568	7,055,272	26,947,840	111,209,368
SortedSet	40	104	248	824	3,128	12,344	49,208	195,368	777,272	3,145,784	12,582,968	50,331,704
Queue	40	80	200	680	2,600	10,280	41,000	162,800	647,720	2,621,480	10,485,800	41,943,080
String	40	48	48	72	168	552	2,088	8,184	32,424	131,112	524,328	2,097,192

它们都是很常用的Scala集合，大部分都是不可变的，java.lang.String, 还有scala.Stream 和Array[Object]。

　看下面的比较:

Size	0	1	4	16	64	256	1,024	4,069	16,192	65,536	262,144	1,048,576

Vector	56	216	264	456	1,512	5,448	21,192	84,312	334,440	1,353,192	5,412,168	21,648,072
Array[Object]	16	40	96	336	1,296	5,136	20,496	81,400	323,856	1,310,736	5,242,896	20,971,536

小的vector会比小的数组多占用3 ~ 5倍的内存。只有一个元素的vector就会占用1K内存的五分之一。
16个元素的vector会有所好转，只多占了30%左右的内存，而1,048,576个元素的vector只多占了5%左右的内存。

内部实现上，小的vector有些浪费，而大的vector的浪费就可以忽略不计了。

再看下表：

Size	0	1	4	16	64	256	1,024	4,069	16,192	65,536	262,144	1,048,576

Array[Object]	16	40	96	336	1,296	5,136	20,496	81,400	323,856	1,310,736	5,242,896	20,971,536
List	16	56	176	656	2,576	10,256	40,976	162,776	647,696	2,621,456	10,485,776	41,943,056

从四个元素开始，List内存占用是Array[Object]的两倍。这并不奇怪，因为List的每个节点是一个包装节点，这个节点对每个元素进行了包装。

再看下表：

Size	0	1	4	16	64	256	1,024	4,069	16,192	65,536	262,144	1,048,576

List	16	56	176	656	2,576	10,256	40,976	162,776	647,696	2,621,456	10,485,776	41,943,056
Stream (unforced)	16	160	160	160	160	160	160	160	160	160	160	160
Stream (forced)	16	56	176	656	2,576	10,256	40,976	162,776	647,696	2,621,456	10,485,776	41,943,056

Stream (forced)和List占用的内存一样多，Stream (unforced)几乎不占内存，因为它还没有被填充。

继续看下个表格的比较：

Size	0	1	4	16	64	256	1,024	4,069	16,192	65,536	262,144	1,048,576

Array[Object]	16	40	96	336	1,296	5,136	20,496	81,400	323,856	1,310,736	5,242,896	20,971,536
Set	16	32	96	880	3,720	14,248	59,288	234,648	895,000	3,904,144	14,361,000	60,858,616
Map	16	56	176	1,648	6,800	26,208	109,112	428,592	1,674,568	7,055,272	26,947,840	111,209,368

小的Set的内存占用和数组一样多，但是大的Set的内存占用是数组的三倍。这是合理的，小Set专用用来存储单一的对象，而大的Set用树做存储。让我有些惊奇的是占用会那么多，我本来期望多占用50% ~ 100%也就够了，没想到会多占用200%。

同样的情况也适用小Map和大Map。一开始小Map会比数组占用两倍的内存，最后占用6倍的内存。

再看一个内存比较：

Size	0	1	4	16	64	256	1,024	4,069	16,192	65,536	262,144	1,048,576

Array[Object]	16	40	96	336	1,296	5,136	20,496	81,400	323,856	1,310,736	5,242,896	20,971,536
String	40	48	48	72	168	552	2,088	8,184	32,424	131,112	524,328	2,097,192

String存储2-byte字符，而不是存储4-byte的指针，这些指针指向16-byte的对象。因此测试结果也不意味，它们使用数组十分之一的内存来存储相同大小的元素。

不同数组的内存占用比较

Size	0	1	4	16	64	256	1,024	4,069	16,192	65,536	262,144	1,048,576

Array[Object]	16	40	96	336	1,296	5,136	20,496	81,400	323,856	1,310,736	5,242,896	20,971,536

Size	0	1	4	16	64	256	1,024	4,069	16,192	65,536	262,144	1,048,576

Array[Boolean]	16	24	24	32	80	272	1,040	4,088	16,208	65,552	262,160	1,048,592
Array[Byte]	16	24	24	32	80	272	1,040	4,088	16,208	65,552	262,160	1,048,592
Array[Short]	16	24	24	48	144	528	2,064	8,160	32,400	131,088	524,304	2,097,168
Array[Int]	16	24	32	80	272	1,040	4,112	16,296	64,784	262,160	1,048,592	4,194,320
Array[Long]	16	24	48	144	528	2,064	8,208	32,568	129,552	524,304	2,097,168	8,388,624
Boxed Array[Boolean]	16	40	64	112	304	1,072	4,144	16,328	64,816	262,192	1,048,624	4,194,352
Boxed Array[Byte]	16	40	96	336	1,296	5,136	8,208	20,392	68,880	266,256	1,052,688	4,198,416
Boxed Array[Short]	16	40	96	336	1,296	5,136	20,496	81,400	323,856	1,310,736	5,230,608	20,910,096
Boxed Array[Int]	16	40	96	336	1,296	5,136	20,496	81,400	323,856	1,310,736	5,242,896	20,971,536
Boxed Array[Long]	16	48	128	464	1,808	7,184	28,688	113,952	453,392	1,835,024	7,340,048	29,360,144

各种未装箱的基本类型的数据的内存占用要比装箱的数组内存占用小一个数量级。这是合理的，因为Array[Object]需要为每一个元素存储4-byte的指针，数组本身
要包含16-byte object header (8 bytes for mark word, 8 bytes for class-pointer)。相比之下，Byte、 Short、 Int 或 Long相应的要占用1、 2、 4 或 8 byte。你可以看到这些数组的内存占用的确切比例，数组的对象头有几个字节的开销。

译者注: 你可以参考下列文章来了解数组的开销：

How to calculate the memory usage of a Java array

An overview of memory saving techniques in Java

How Much Heap Space Does My Object Use?

值得注意的是，Array[Boolean]`占用了与Array[Byte]相同的内存。虽然理论上它们可以为布尔值进行压缩，以一个bit来存储，但实际它们不是这样存储的。如果你想要更有效地存储标志，你应该使用某种bit set来存储你的布尔值。

装箱数组的行为也很有趣。你可能认为一个装箱的原始数组需要占用Array[Object]相同的内存占用。毕竟，一个Boxed Int或Byte最终是一个java.lang.Integer或java.lang.Byte，它是Object的子类。然而，这些常量的较小值是内部池化的(interned)，因此在整个程序中只有公用的两个装箱的布尔值和256个装箱字节占用。因此，装箱的Array[Byte]最多只占用与Array[Int]相同的内存，因为它只填充了4个字节的指向共享对象的指针。

装箱的Int和Long对于小的数值是公用的，但是对绝大部分的数值不会共用。因此，这些最终占用与Array[Object]相同的（或更多）内存。

译者注：如果你对Java Integer小值的实现还不太了解的话，你可以测试下面的代码，看看结果是不是很意外
Integer i1=127;
Integer i2=127;
System.out.println(i1==i2); //true
Integer i3=128;
Integer i4=128;
System.out.println(i3==i4); //false

你可能从上面的表格数据中会发现更多的有趣的东西，但是上面基本的比较已经让你感觉到不同的数据结构有很大的不同。尽管当今的计算机拥有很多的内存，但是知道内存会占多大依然很有用：

使用更有效的数据结构意味着可以减少内存使用，可以使用更小的Amazon EC2 主机或者其它小的云主机来减少花费。在写这篇文章的时候，AWS m3-large机器的花费只是 r3-large机器的1/3，而后者只不过多了一倍的内存而已。积水成多，大的集群的节省的费用累加起来就不是一个小数目。
为每个元素使用较小的内存意味着你可以把它们直接放入内存，而不是持久化到数据库或者磁盘上。内存中的操作至少要比读取文件或者数据库访问要快一个数量级。

即使你一开始先忽略内存使用然后再回来优化它，当你回来优化的时候，你仍然想知道如何权衡不同集合的内存开销。希望这部分内容可以帮助你做出使您的代码更有效率的决定。

性能

我们下一个要关注的事情就是使用各种集合执行通用的操作要花多长时间。不像内存占用那样可以静态分析，运行时的性能通常会有噪点和=和随机性：JIT编译器是否启用、垃圾回收是否发生等。

然而，即使我们不能得到精确的数字，我们肯定能得到一个非常接近的数值。下面的内容就是执行各种有代表性的操作得到的“粗”的benchmark：

每个benchmark运行7次，一次2秒
每次运行都和其他的benchmark交叉运行
运行完毕，只取中间的五个值，最大最小值抛弃。然后计算平均值和标准差。

尽管这不是非常精确，但是对benchmark来说也足够了。Benchmark代码已经要花4个半小时运行，我不想再让它变得更严谨。

测试的独立的benchmark包括:

Construct: 构建一个大小为n的数据结构,从空的数据结构（Nil,等等）开始一次增加一个元素。使用 :: for Lists, :+ for Vectors, + for Sets, .append for mutable.Buffer等等，对于数组， benchmark即使用了:+也使用预分配大小的数组来测试。
Concat: 构建大小为n的两个同样类型的集合，然后把它们连接成一个集合。对于不可变集合使用++,对于可变集合使用 .appendAll 或 ++= 执行原地操作
Deconstruct: 开始构建一个大小为n的集合，然后一次一个的把元素移除，直到它为空为止。
Foreach: 迭代一个集合元素的时间花费
Lookup: 查找每一个元素的时间花费,使用collection(i)语法。注意它测试的查找每一个元素，通过一个while-loop实现。

Foreach 和 Lookup的 benchmark 要比其它的测试长10 甚至 100倍。总的时间被平均了（divided）。这是为了故意拉长运行时间,以便可以看到随机噪点和运行时的变化。

汇总数据如下，你可以查看原始数据。每一个benchmark（lookup,concat）占表格的一节,每一行代表一种集合的一个操作(有的集合有多个benchmark)，每一列代表执行这个benchmark所花费的平均时间。

注意为了简单这里并没有显示标准方差，如果你想查看它们可以跳到带标准差的性能数据一节。

construct	0	1	4	16	64	256	1,024	4,096	16,192	65,536	262,144	1,048,576

Array-prealloc	17	10	14	41	186	710	2,710	11,000	45,100	183,000	730,000	3,100,000
Array:+	2	12	58	270	1,460	19,800	260,000	3,170,000	60,000,000	1,020,000,000
List::	1	4	12	69	301	1,220	4,900	19,800	79,000	329,000	1,510,000	9,100,000
Vector:+	2	15	99	410	1,730	7,000	28,600	324,000	1,498,000	7,140,000	31,700,000	131,000,000
Set+	1	12	58	1,860	8,530	37,400	166,000	783,000	3,600,000	18,100,000	94,000,000	473,000,000
Map+	1	6	95	2,100	9,010	38,900	171,000	810,000	3,710,000	18,400,000	96,000,000	499,000,000
Array.toSet	73	75	187	2,140	9,220	40,000	174,000	833,000	3,800,000	19,300,000	101,000,000	506,000,000
Array.toMap	21	31	104	2,100	9,200	39,500	173,000	820,000	3,790,000	19,500,000	104,000,000	540,000,000
Array.toVector	95	109	143	287	903	3,310	12,850	51,100	203,800	821,000	3,270,000	13,300,000
m.Buffer	19	30	58	174	691	2,690	10,840	43,000	169,800	687,000	2,770,000	11,790,000
m.Map.put	6	79	297	1,420	6,200	25,500	103,000	414,000	1,820,000	8,100,000	57,000,000	348,000,000
m.Set.add	13	76	276	1,430	6,700	27,900	113,000	455,000	1,840,000	7,900,000	39,000,000	267,000,000

deconstruct	0	1	4	16	64	256	1,024	4,096	16,192	65,536	262,144	1,048,576

Array.tail	7	26	114	582	4,517	55,500	821,000	12,140,000	188,000,000	3,100,000,000
List.tail	2	2	7	21	100	420	2,100	10,000	35,000	120,000	540,000	1,500,000
Vector.tail	3	6	90	425	1,970	11,800	58,400	500,000	2,390,000	11,000,000	50,200,000	211,000,000
Vector.init	2	5	103	483	2,490	12,800	64,000	543,000	2,470,000	11,900,000	52,600,000	218,000,000
Set.-	8	30	162	1,480	7,700	34,200	164,000	770,000	3,660,000	20,300,000	94,000,000	420,000,000
Map.-	12	52	201	1,430	7,660	34,900	169,000	810,000	3,990,000	24,000,000	103,000,000	470,000,000
m.Buffer	6	8	14	43	166	630	2,510	10,000	40,600	167,000	660,000	2,490,000
m.Set	5	28	130	671	4,900	54,000	770,000	11,990,000	189,000,000	3,040,000,000
m.Map	7	44	172	670	3,650	26,400	282,000	3,970,000	62,600,000	1,000,000,000

concat	0	1	4	16	64	256	1,024	4,096	16,192	65,536	262,144	1,048,576

Array++	89	83	85	91	144	330	970	4,100	17,000	70,000	380,000	1,700,000
arraycopy	23	18	20	27	48	280	1,000	4,000	16,000	65,000	360,000	1,400,000
List	7	81	162	434	1,490	5,790	23,200	92,500	370,000	1,510,000	6,300,000	30,000,000
Vector	5	48	188	327	940	3,240	12,700	52,000	210,000	810,000	3,370,000	14,500,000
Set	91	95	877	1,130	5,900	26,900	149,000	680,000	3,600,000	23,000,000	100,000,000	280,000,000
Map	54	53	967	1,480	6,900	31,500	166,000	760,000	4,100,000	27,000,000	118,000,000	450,000,000
m.Buffer	11	32	32	38	70	250	700	3,900	20,000	40,000	400,000	1,500,000
m.Set	58	81	142	1,080	4,200	16,000	69,000	263,000	1,160,000	6,300,000	43,000,000	310,000,000
m.Map	47	69	181	990	3,700	15,000	62,000	290,000	1,500,000	16,000,000	103,000,000	493,000,000

foreach	0	1	4	16	64	256	1,024	4,096	16,192	65,536	262,144	1,048,576

Array	2	5	15	57	230	900	3,580	14,200	55,600	228,000	910,000	3,610,000
Array-while	0	1	0	1	0	0	0	-4	10	70	0	500
List	0	3	13	50	209	800	3,500	14,100	55,000	231,000	920,000	3,800,000
List-while	4	5	13	49	211	812	3,400	14,200	57,000	226,000	930,000	3,700,000
Vector	15	19	30	74	268	1,000	3,960	16,200	62,000	256,000	1,030,000	4,300,000
Set	4	5	10	99	420	1,560	10,200	51,000	217,000	2,200,000	10,800,000	48,600,000
Map	19	7	20	140	610	2,500	13,900	72,800	360,000	3,700,000	20,700,000	75,000,000
m.Buffer	0	1	1	1	1	0	1	2	-1	-10	0	-200
m.Set	19	26	50	130	508	2,190	11,900	56,600	235,000	940,000	3,800,000	14,700,000
m.Map	8	16	48	146	528	2,210	10,300	54,100	255,000	1,140,000	6,800,000	30,000,000

lookup	0	1	4	16	64	256	1,024	4,096	16,192	65,536	262,144	1,048,576

Array	0	1	1	1	0	0	1	-1	4	0	100	-200
List	0	1	8	103	2,390	47,200	870,000	16,900,000
Vector	0	1	5	17	104	440	1,780	8,940	38,000	198,000	930,000	4,260,000
Set	0	18	81	507	1,980	7,800	39,800	203,000	1,040,000	8,300,000
Map	0	12	97	578	2,250	9,400	46,000	233,000	1,150,000	11,400,000
m.Buffer	0	1	1	1	1	1	1	0	6	-10	0	0
m.Set	0	5	22	97	410	1,690	7,100	31,300	148,000	690,000	4,800,000
m.Map	0	6	25	112	454	1,910	9,400	52,500	243,000	1,760,000	9,900,000

带标准差的原始的数据和测试代码在文章的后面，你可以深入挖据一下这些数据。在本文中，我将讨论一些我看到的一些有趣的数据。

Construction Performance

construct	0	1	4	16	64	256	1,024	4,096	16,192	65,536	262,144	1,048,576

Array-prealloc	17	10	14	41	186	710	2,710	11,000	45,100	183,000	730,000	3,100,000
Array:+	2	12	58	270	1,460	19,800	260,000	3,170,000	60,000,000	1,020,000,000
List::	1	4	12	69	301	1,220	4,900	19,800	79,000	329,000	1,510,000	9,100,000
Vector:+	2	15	99	410	1,730	7,000	28,600	324,000	1,498,000	7,140,000	31,700,000	131,000,000
Set+	1	12	58	1,860	8,530	37,400	166,000	783,000	3,600,000	18,100,000	94,000,000	473,000,000
Map+	1	6	95	2,100	9,010	38,900	171,000	810,000	3,710,000	18,400,000	96,000,000	499,000,000
Array.toSet	73	75	187	2,140	9,220	40,000	174,000	833,000	3,800,000	19,300,000	101,000,000	506,000,000
Array.toMap	21	31	104	2,100	9,200	39,500	173,000	820,000	3,790,000	19,500,000	104,000,000	540,000,000
Array.toVector	95	109	143	287	903	3,310	12,850	51,100	203,800	821,000	3,270,000	13,300,000
m.Buffer	19	30	58	174	691	2,690	10,840	43,000	169,800	687,000	2,770,000	11,790,000
m.Map.put	6	79	297	1,420	6,200	25,500	103,000	414,000	1,820,000	8,100,000	57,000,000	348,000,000
m.Set.add	13	76	276	1,430	6,700	27,900	113,000	455,000	1,840,000	7,900,000	39,000,000	267,000,000

上面的表格显示的构建集合的性能,一次添加一个数据。使用 :: for Lists, :+ for Vectors, .add 或 .append 或 .put for the mutable collections. Array有两个benchmark: 一个使用 :+, 一个使用预先分配大小的数组，然后使用一个 while-loop 添加元素。

construct	0	1	4	16	64	256	1,024	4,096	16,192	65,536	262,144	1,048,576

List::	1	4	12	69	301	1,220	4,900	19,800	79,000	329,000	1,510,000	9,100,000
Vector:+	2	15	99	410	1,730	7,000	28,600	324,000	1,498,000	7,140,000	31,700,000	131,000,000

它显示构建一个Vector要比构建List慢5到15倍，具体和集合大小有关。

这也许不令人惊讶。增加一个元素到链表中非常简单，但是这个数量接让我震惊。如果你是一个一个的添加元素到集合中，使用List要比Vector快很多。如果你的代码中构建一个Vector变成了瓶颈，那么你应该考虑使用List或者Buffer来替换它。

construct	0	1	4	16	64	256	1,024	4,096	16,192	65,536	262,144	1,048,576

List::	1	4	12	69	301	1,220	4,900	19,800	79,000	329,000	1,510,000	9,100,000
m.Buffer	19	30	58	174	691	2,690	10,840	43,000	169,800	687,000	2,770,000	11,790,000

使用.append构建一个mutable.Buffer看起来要比使用::构建List要慢2 ~ 3倍，尽管对于大下降到1.5倍慢。我有点惊讶，这意味着如果你想更快，List是一个更好的选择。

construct	0	1	4	16	64	256	1,024	4,096	16,192	65,536	262,144	1,048,576

Array-prealloc	17	10	14	41	186	710	2,710	11,000	45,100	183,000	730,000	3,100,000
List::	1	4	12	69	301	1,220	4,900	19,800	79,000	329,000	1,510,000	9,100,000
m.Buffer	19	30	58	174	691	2,690	10,840	43,000	169,800	687,000	2,770,000	11,790,000

最快的是预分配内存的数组，大约比构建List快4倍，比构建mutable.Buffer快5倍，比构建Vector快15倍。

construct	0	1	4	16	64	256	1,024	4,096	16,192	65,536	262,144	1,048,576

Array-prealloc	17	10	14	41	186	710	2,710	11,000	45,100	183,000	730,000	3,100,000
Array:+	2	12	58	270	1,460	19,800	260,000	3,170,000	60,000,000	1,020,000,000

使用:+构建数组需要花费双倍的时间，因为它每次都要复制整个数组：当数组还小的时候还好，但是数组长度变大的时候就很慢了，当元素的数量到了上万个的时候时间花费变得不可思议的长了。

construct	0	1	4	16	64	256	1,024	4,096	16,192	65,536	262,144	1,048,576

Array-prealloc	17	10	14	41	186	710	2,710	11,000	45,100	183,000	730,000	3,100,000
Set+	1	12	58	1,860	8,530	37,400	166,000	783,000	3,600,000	18,100,000	94,000,000	473,000,000
Map+	1	6	95	2,100	9,010	38,900	171,000	810,000	3,710,000	18,400,000	96,000,000	499,000,000
Array.toSet	73	75	187	2,140	9,220	40,000	174,000	833,000	3,800,000	19,300,000	101,000,000	506,000,000
Array.toMap	21	31	104	2,100	9,200	39,500	173,000	820,000	3,790,000	19,500,000	104,000,000	540,000,000

一个个添加元素的方式构建Set 和 Map变得很慢：比构建List慢30倍，比预分配大小的数组慢150倍。这大概是因为Set和Map需要构建Hash或者做类似检查来维持一致性。

Set和Map慢是在意料之中，意料之外的是居然这么慢。这意味着如果你想要代码更快，你不应该使用Set或者Map，除非你必需它们的唯一性这个性质，否则使用List或者mutable.Buffer。

预分配的数组调用.toSet 或 .toMap方法并不会直接构造Set和Map更快。而调用toVector却可以更快。

construct	0	1	4	16	64	256	1,024	4,096	16,192	65,536	262,144	1,048,576

Array-prealloc	17	10	14	41	186	710	2,710	11,000	45,100	183,000	730,000	3,100,000
List::	1	4	12	69	301	1,220	4,900	19,800	79,000	329,000	1,510,000	9,100,000
Vector:+	2	15	99	410	1,730	7,000	28,600	324,000	1,498,000	7,140,000	31,700,000	131,000,000
Array.toVector	95	109	143	287	903	3,310	12,850	51,100	203,800	821,000	3,270,000	13,300,000

上表表明，通过预分配数组，填充元素，然后再调用.toVector方法可以更快的构建Vector，会比直接构建Vector快10倍。这里没有列出的benchmark表明，先构建mutable.Buffer然后调用 .toVector 方法也比直接构建Vector快。

Deconstruction Performance

deconstruct	0	1	4	16	64	256	1,024	4,096	16,192	65,536	262,144	1,048,576

Array.tail	7	26	114	582	4,517	55,500	821,000	12,140,000	188,000,000	3,100,000,000
List.tail	2	2	7	21	100	420	2,100	10,000	35,000	120,000	540,000	1,500,000
Vector.tail	3	6	90	425	1,970	11,800	58,400	500,000	2,390,000	11,000,000	50,200,000	211,000,000
Vector.init	2	5	103	483	2,490	12,800	64,000	543,000	2,470,000	11,900,000	52,600,000	218,000,000
Set.-	8	30	162	1,480	7,700	34,200	164,000	770,000	3,660,000	20,300,000	94,000,000	420,000,000
Map.-	12	52	201	1,430	7,660	34,900	169,000	810,000	3,990,000	24,000,000	103,000,000	470,000,000
m.Buffer	6	8	14	43	166	630	2,510	10,000	40,600	167,000	660,000	2,490,000
m.Set	5	28	130	671	4,900	54,000	770,000	11,990,000	189,000,000	3,040,000,000
m.Map	7	44	172	670	3,650	26,400	282,000	3,970,000	62,600,000	1,000,000,000

mutable.Buffer 和 List 是最快的移除元素操作的集合。这很合理，因为从mutable.Buffer移除元素只是改变它的size。从List的头部移除元素只是得到它的.tail，不需要做数据结构的改变。

construct	0	1	4	16	64	256	1,024	4,096	16,192	65,536	262,144	1,048,576

Vector:+	2	15	99	410	1,730	7,000	28,600	324,000	1,498,000	7,140,000	31,700,000	131,000,000

deconstruct	0	1	4	16	64	256	1,024	4,096	16,192	65,536	262,144	1,048,576

Vector.tail	3	6	90	425	1,970	11,800	58,400	500,000	2,390,000	11,000,000	50,200,000	211,000,000
Vector.init	2	5	103	483	2,490	12,800	64,000	543,000	2,470,000	11,900,000	52,600,000	218,000,000

使用.tail或.init从Vector移除元素更慢，要比:+增加一个元素慢50%。

deconstruct	0	1	4	16	64	256	1,024	4,096	16,192	65,536	262,144	1,048,576

List.tail	2	2	7	21	100	420	2,100	10,000	35,000	120,000	540,000	1,500,000
Vector.tail	3	6	90	425	1,970	11,800	58,400	500,000	2,390,000	11,000,000	50,200,000	211,000,000
Set.-	8	30	162	1,480	7,700	34,200	164,000	770,000	3,660,000	20,300,000	94,000,000	420,000,000
Map.-	12	52	201	1,430	7,660	34,900	169,000	810,000	3,990,000	24,000,000	103,000,000	470,000,000
m.Set	5	28	130	671	4,900	54,000	770,000	11,990,000	189,000,000	3,040,000,000
m.Map	7	44	172	670	3,650	26,400	282,000	3,970,000	62,600,000	1,000,000,000

因为某种原因，重复地从Map和Set移除.head也很慢，从mutable.Map和mutable.Set移除元素更慢。我不知道为什么出现这种现象。

Concatenation Performance

concat	0	1	4	16	64	256	1,024	4,096	16,192	65,536	262,144	1,048,576

Array++	89	83	85	91	144	330	970	4,100	17,000	70,000	380,000	1,700,000
arraycopy	23	18	20	27	48	280	1,000	4,000	16,000	65,000	360,000	1,400,000
List	7	81	162	434	1,490	5,790	23,200	92,500	370,000	1,510,000	6,300,000	30,000,000
Vector	5	48	188	327	940	3,240	12,700	52,000	210,000	810,000	3,370,000	14,500,000
Set	91	95	877	1,130	5,900	26,900	149,000	680,000	3,600,000	23,000,000	100,000,000	280,000,000
Map	54	53	967	1,480	6,900	31,500	166,000	760,000	4,100,000	27,000,000	118,000,000	450,000,000
m.Buffer	11	32	32	38	70	250	700	3,900	20,000	40,000	400,000	1,500,000
m.Set	58	81	142	1,080	4,200	16,000	69,000	263,000	1,160,000	6,300,000	43,000,000	310,000,000
m.Map	47	69	181	990	3,700	15,000	62,000	290,000	1,500,000	16,000,000	103,000,000	493,000,000

连接集合最快的是mutable.Buffers和普通的数组，因为它们只是简单的复制元素到一个新的集合中。 mutable.Buffer内部使用一个数组，所以它需要重新分配更大的数组来复制数据，而数组则是将两个输入数组复制到一个更大的数组中。你使用 Array ++ Array还是System.arraycopy并不重要。

It turns out that while the clever algorithms and structural-sharing and what not that go into Scala's immutable Vectors and Sets make it faster to build things up incrementally element-by-element (as seen in the Construction Performance benchmark), for this kind of bulk-concatenation it's still faster just to copy everything manually into a new array and skip all the fancy data-structure stuff.

尽管Vector和List的连接要比mutable.Buffer或Array要慢，Vector的连接却比List的连接要快一倍。

Set和Map还是很慢，比Vector或List要慢10倍，比Array或mutable.Buffer慢100倍。

Foreach Performance

foreach	0	1	4	16	64	256	1,024	4,096	16,192	65,536	262,144	1,048,576

Array	2	5	15	57	230	900	3,580	14,200	55,600	228,000	910,000	3,610,000
Array-while	0	1	0	1	0	0	0	-4	10	70	0	500
List	0	3	13	50	209	800	3,500	14,100	55,000	231,000	920,000	3,800,000
List-while	4	5	13	49	211	812	3,400	14,200	57,000	226,000	930,000	3,700,000
Vector	15	19	30	74	268	1,000	3,960	16,200	62,000	256,000	1,030,000	4,300,000
Set	4	5	10	99	420	1,560	10,200	51,000	217,000	2,200,000	10,800,000	48,600,000
Map	19	7	20	140	610	2,500	13,900	72,800	360,000	3,700,000	20,700,000	75,000,000
m.Buffer	0	1	1	1	1	0	1	2	-1	-10	0	-200
m.Set	19	26	50	130	508	2,190	11,900	56,600	235,000	940,000	3,800,000	14,700,000
m.Map	8	16	48	146	528	2,210	10,300	54,100	255,000	1,140,000	6,800,000	30,000,000

迭代大部分常用的集合都很快，无论集合是List、Vector还是Array。使用while-loop和head/tail的速度是一样的，所以如果你想手写迭代来提高性能，结果可能没什么区别。

另一方面，相比迭代数组或者Vector，迭代Set和Map要慢10～15倍。可变的Set和Map要好一点，值慢了3～8倍。

使用while-loop迭代数组是最快的，基本上快的连噪点都测量不出来。但是迭代mutable.Buffer并没有显著的提升，不清楚为什么这样。

Lookup Performance

lookup	0	1	4	16	64	256	1,024	4,096	16,192	65,536	262,144	1,048,576

Array	0	1	1	1	0	0	1	-1	4	0	100	-200
List	0	1	8	103	2,390	47,200	870,000	16,900,000
Vector	0	1	5	17	104	440	1,780	8,940	38,000	198,000	930,000	4,260,000
Set	0	18	81	507	1,980	7,800	39,800	203,000	1,040,000	8,300,000
Map	0	12	97	578	2,250	9,400	46,000	233,000	1,150,000	11,400,000
m.Buffer	0	1	1	1	1	1	1	0	6	-10	0	0
m.Set	0	5	22	97	410	1,690	7,100	31,300	148,000	690,000	4,800,000
m.Map	0	6	25	112	454	1,910	9,400	52,500	243,000	1,760,000	9,900,000

我们看到数组和mutable.Buffer的lookup非常快，以至于无法测到噪点。下一个快的集合是索引的Vector的lookup,它会使用较少的时间，例如，在一百万的元素中查找一个元素需要4毫秒：在大部分的情况下这并不明显，但是如果累加起来时间就很客观了，例如在实时的游戏中就对时间要求很高。

注意这个时间是在集合中查找每一个元素，而不是查找某一个元素。所以我们希望用大的集合来花费更多的时间来完成性能测试，即使时间花费是一个常量。

不可变的Set和Map比Vector要花费非常长的时间执行查找。不可变的Set会花费10～20倍的时间，而Map要花费10～40倍的时间。

而可变Set和Map查找的性能要比不可变的Set和Map要好。可变Set的查找的时间要比Vector慢4～5倍，而可变Map要比Vector的查找慢5～10倍。可以看到可变Set和Map要比不可变Set和Map快2～4倍，这可能是因为这些集合的可变版本使用hash表而这些不可变集合使用tree。

List lookups by index across the entire list is, as expected, quadratic in time: Each individual lookup takes time linear in the size of the collection, and there are a linear number of lookups to perform in this benchmark. While it's reasonable up to about 16 elements, it quickly blows up after that.

测试结论

我们已经看到了很多数据，也比较了一些集合的数据。对于天天使用Scala的程序员，这些数据有什么意义呢？下面列出了一些有趣的结论。

数组超级好

一个未装箱的基本类型的数组只要装箱的数组的内存的1/4或者1/5，例如 Array[Int]vs Array[java.lang.Integer]。这不简单，如果你要处理大量的基本类型的数据，使用非装箱数组会帮你省好多钱。

除了基本类型数组，即使装箱的数组要有漂亮的性能数据。连接两个数据要比连接其它集合要快，甚至比List和Vector这些有精心设计的数据结构还要快。对于有一百万元素的集合，甚至要快10倍。SI-4442提交了一个issue用来解决这个问题，但是目前的性能还是如此。

我非常惊讶的是数组的连接是那么快，甚至那些"花哨"的数据结构如List和Vector，使用共享数据避免复制整个数据都比不上数组。它表明复制整体数据事实上要比组合那些持久化数据结构都要快。所以如果你要处理一个不可变集合，有时候需要把它分成片段或者连接起来，使用数组要更快。

Set 和 Map 都很慢

使用不可变的Vector的查找只需要不可变的Set的1/10 ～ 1/20，不可变Map的1/10 ～ 1/40，即使是使用可变的Set和Map，Vector的速度也很快，当然使用原始的数组会更快。

这是合理的，因为Set和Map的查找包含很多的hash计算和hash比较，即使是对于简单的键值也是这样，这些操作会带来显著的时间花费。相反，Vector查找只是简单的整数运算，数组查找只是指针的移动。

Set和Map不只是查找很慢：构建它们也很慢，移除元素也很慢，连接集合也很慢。即使操作不需要执行hash计算，比如迭代(iteration),也比迭代一个Vector慢19倍。

所以除非你需要不能重复元素的集合才使用Set，需要键值对才选择Map，否则尽量不用它们，因为它们才是程序慢的原因。如果你的键的集合比较小并且性能有问题，你可以为每个键分配一个整数，使用Vector[Boolean]代替Set，使用Vector[T]代替Map，查找的时候使用整数查找。有时候，即使你知道集合的元素不会重复，不使用Set而是使用数组、Vector或者List会带来很好的性能。

可变Set和Map比不可变Set和Map会快一点，内存占用也小一点：1/2的内存占用，2～4倍快的查找，2倍快的迭代，2倍快的构建（.add/.put）,但是相比数组、Vector和List来说还是很慢。

Lists vs Vectors

是选择单链表形式的List还是选择tree形式的Vector，选择哪个集合的标准很模糊。“有效的常数时间”的定义不能理清这种模糊认识，但是通过上面的测试数据，我们可以有一个更好的判断：

相同大小的List的内存占用是Vector的内存。后者和数组的开销有得一拼。
构建一个List的时间比构建Vector快5～15倍。
通过.tail从List中移除元素比Vector快50～60倍
可以通过索引在Vector查找元素，而通过List按照索引查找的时间是O(n),元素数量很大的时候时间花费是很大的
迭代这两个集合的时间花费差不多

如果你自己构建集合并且一个个的移除元素，迭代它们，最好使用List。如果你需要根据索引查找元素，则选择Vector。使用:+和.tail增加元素和移除元素虽然不会把你弄垮，但是它们却非常的慢。

Lists vs mutable.Buffer

List除了作为不可变集合外，还经常使用var作为可变集合，而mutable.Buffer也有相同的目的，那你该用哪一个集合呢？

数据表明使用List要比mutable.Buffer在处理一个个的元素的时候要快：对小集合来说快2～3倍，大集合要快1.5～2倍。很大的差异！

除了性能，它们之间也有一些其它不同：

mutable.Buffer可以快速的按照索引查找，而List不可以
List是持久化的，所以你可以增加/移除你的列表副本而不会影响其他人对这个列表的使用，而mutable.Buffer却不是，一个人的修改会影响其他使用的人
List的构建是"向后"的(backwards)。最后加入的元素在迭代的时候位于第一个位置，这可能不是你所期望的，你可能在使用之前对它反转。mutable.Buffer则是"向前"构建的，第一个增加的元素在迭代的时候总是第一个。
mutable.Buffer只占List的花销（overhead）的一半

对于遍历元素的操作，List更快，但会用更多的内存。如果你根本不关心其它影响因素，这个因素可能帮你决定你该使用哪一个集合。

Vector还算好吧

尽管Vector经常倍认为是一个很好的通用目的的数据结构，但是数据表明它也不是想象的那么好：

小的vector，比如之包含一个元素，有1/5的内存花销，而对于大的Vector，它的花销却可以忽略不计。如果你有很多的小集合，使用Vector可能会带来内存的浪费
Vector一个元素一个元素填充式的构建比List或者mutable.Buffer要慢5～15倍，比预分配大小的数组慢40倍
Vector按照索引进行查找的性能可以接受，但也比数组或者mutable.Buffer要慢5～15倍，比预分配大小的数组慢40倍
两个差不多大小的Vector的连接时间比List的连接的事时间快一倍，但比数组慢10倍，尽管数组需要全复制。

总体来说，Vector还是一个可以接受的通用目的的集合。

一方面，你看不到Vector有什么严重的性能缺陷，大部分的Vector的性能都处于中等水平，如果你使用它们，不会出现O(n^2)的时间
另一方面，常用操作要比理想的数据结构慢一个数量级。增量构建比List慢5～15，索引查找比数组慢很多，即使连接操作也比数组慢10倍

Vector通常是缺省的数组结构，但是如果可能的话，直接使用List或者mutable.Buffer可能会给性能带来数量级的提升。可能有些夸张，但是的确是值得关注的事情。十倍的性能提升是巨大的。

总结

本文提供了一个具体基准帮助你选择哪一种Scala集合。理论分析经常错过很多重要因素,因为很自然的你只分析那些你认为重要的因素。本文的测试从实践的角度提供了集合的有意义的行为。

例如，令我惊讶的是从Vector移除一个元素远比增加一个元素要慢很多，还有Set和Map比Vector和List慢很多，即使是迭代这种不受hash操作影响的操作也很慢。还有数组的连接出人意料的快，比那些使用共享数据的集合还快。

就像强调集合之间的差异一样，这套benchmark也强调很多无关重要的事情：

List、Array和Vector的foreach的区别
数据巨大的Vector和Array的内存占用比较
通过+构建Set还是先构建一个数组然后调用.toSet?

尽管代码不同，内部的数据结构也不同，实践中可能你选择哪个数据结构美太大关系，选择一个合适的。

下次你要选择一个特殊的集合的时候，或者你和同事讨论哪个集合合适的时候，可以随时回来查看这篇植物。

参考

带标准差的性能数据

下面的性能数据，加上了标准差。注意它们是从7次测试中取了中间5次之后的标准差。

construct

256

1,024

4,096

16,192

65,536

262,144

1,048,576

Array-prealloc

17 ± 0%

10 ± 0%

14 ± 0%

41 ± 0%

186 ± 0.5%

710 ± 2.1%

2,710 ± 2.6%

11,000 ± 2.7%

45,100 ± 2.1%

183,000 ± 2.0%

730,000 ± 2.0%

3,100,000 ± 1.6%

Array:+

2 ± 0%

12 ± 0%

58 ± 0%

270 ± 0.4%

1,460 ± 2.4%

19,800 ± 1.5%

260,000 ± 1.7%

3,170,000 ± 1.5%

60,000,000 ± 1.8%

1,020,000,000 ± 1.2%

Vector

2 ± 0%

15 ± 6.7%

99 ± 1.0%

410 ± 2.7%

1,730 ± 2.3%

7,000 ± 2.7%

28,600 ± 3.4%

324,000 ± 0.5%

1,498,000 ± 0.5%

7,140,000 ± 0.4%

31,700,000 ± 0.4%

131,000,000 ± 1.0%

List

1 ± 0%

4 ± 0%

12 ± 0%

69 ± 1.4%

301 ± 2.7%

1,220 ± 2.3%

4,900 ± 2.2%

19,800 ± 2.8%

79,000 ± 2.5%

329,000 ± 1.6%

1,510,000 ± 1.2%

9,100,000 ± 8.3%

Set

1 ± 0%

12 ± 0%

58 ± 0%

1,860 ± 0.5%

8,530 ± 0.4%

37,400 ± 0.8%

166,000 ± 1.2%

783,000 ± 0.5%

3,600,000 ± 1.6%

18,100,000 ± 0.7%

94,000,000 ± 1.4%

473,000,000 ± 1.5%

Map

1 ± 0%

6 ± 0%

95 ± 0%

2,100 ± 1.9%

9,010 ± 0.7%

38,900 ± 0.5%

171,000 ± 1.0%

810,000 ± 1.3%

3,710,000 ± 1.3%

18,400,000 ± 1.6%

96,000,000 ± 1.1%

499,000,000 ± 1.4%

Array.toVector

95 ± 1.1%

109 ± 0%

143 ± 0%

287 ± 0.3%

903 ± 0.9%

3,310 ± 0.2%

12,850 ± 0.5%

51,100 ± 0.8%

203,800 ± 0.5%

821,000 ± 0.5%

3,270,000 ± 1.3%

13,300,000 ± 1.4%

Array.toSet

73 ± 0%

75 ± 0%

187 ± 0.5%

2,140 ± 1.4%

9,220 ± 0.9%

40,000 ± 1.2%

174,000 ± 0.9%

833,000 ± 0.6%

3,800,000 ± 1.2%

19,300,000 ± 0.6%

101,000,000 ± 1.4%

506,000,000 ± 1.7%

Array.toMap

21 ± 0%

31 ± 0%

104 ± 1.0%

2,100 ± 0.5%

9,200 ± 1.5%

39,500 ± 1.6%

173,000 ± 1.7%

820,000 ± 1.7%

3,790,000 ± 2.1%

19,500,000 ± 1.6%

104,000,000 ± 2.5%

540,000,000 ± 2.2%

m.Buffer

19 ± 0%

30 ± 0%

58 ± 0%

174 ± 1.1%

691 ± 0.7%

2,690 ± 1.0%

10,840 ± 0.7%

43,000 ± 0.7%

169,800 ± 0.4%

687,000 ± 0.7%

2,770,000 ± 0.6%

11,790,000 ± 0.7%

m.Set

13 ± 0%

76 ± 0%

276 ± 0.4%

1,430 ± 1.1%

6,700 ± 0.9%

27,900 ± 1.2%

113,000 ± 1.6%

455,000 ± 1.4%

1,840,000 ± 1.2%

7,900,000 ± 1.4%

39,000,000 ± 3.0%

267,000,000 ± 3.2%

m.Map

6 ± 0%

79 ± 1.3%

297 ± 0.3%

1,420 ± 1.0%

6,200 ± 0.7%

25,500 ± 1.0%

103,000 ± 1.9%

414,000 ± 2.0%

1,820,000 ± 2.0%

8,100,000 ± 3.3%

57,000,000 ± 4.6%

348,000,000 ± 2.4%

deconstruct

256

1,024

4,096

16,192

65,536

262,144

1,048,576

Array.tail

7 ± 0%

26 ± 0%

114 ± 0.9%

582 ± 0.5%

4,517 ± 0.1%

55,500 ± 0.9%

821,000 ± 0.3%

12,140,000 ± 0.6%

188,000,000 ± 1.0%

3,100,000,000 ± 0.4%

List.tail

2 ± 0%

7 ± 0%

21 ± 4.8%

100 ± 10.6%

420 ± 3.7%

2,100 ± 5.9%

10,000 ± 10.4%

35,000 ± 4.6%

120,000 ± 9.4%

540,000 ± 9.2%

1,500,000 ± 53.5%

Vector.tail

3 ± 0%

6 ± 0%

90 ± 1.1%

425 ± 2.1%

1,970 ± 1.7%

11,800 ± 2.6%

58,400 ± 1.1%

500,000 ± 2.2%

2,390,000 ± 1.3%

11,000,000 ± 1.2%

50,200,000 ± 0.5%

211,000,000 ± 1.3%

Vector.init

2 ± 0%

5 ± 0%

103 ± 1.0%

483 ± 1.9%

2,490 ± 1.8%

12,800 ± 2.0%

64,000 ± 2.8%

543,000 ± 0.8%

2,470,000 ± 1.7%

11,900,000 ± 1.8%

52,600,000 ± 1.5%

218,000,000 ± 1.5%

Set.-

8 ± 12.5%

30 ± 3.3%

162 ± 1.2%

1,480 ± 3.9%

7,700 ± 3.0%

34,200 ± 1.2%

164,000 ± 1.5%

770,000 ± 1.4%

3,660,000 ± 2.6%

20,300,000 ± 0.7%

94,000,000 ± 1.3%

420,000,000 ± 1.8%

Map.-

12 ± 8.3%

52 ± 0%

201 ± 0.5%

1,430 ± 1.3%

7,660 ± 0.5%

34,900 ± 0.9%

169,000 ± 0.7%

810,000 ± 2.1%

3,990,000 ± 0.3%

24,000,000 ± 3.4%

103,000,000 ± 5.1%

470,000,000 ± 3.4%

m.Buffer

6 ± 0%

8 ± 12.5%

14 ± 7.1%

43 ± 2.3%

166 ± 0.6%

630 ± 2.8%

2,510 ± 2.9%

10,000 ± 3.0%

40,600 ± 1.7%

167,000 ± 2.8%

660,000 ± 4.2%

2,490,000 ± 3.0%

m.Set

5 ± 0%

28 ± 7.1%

130 ± 1.5%

671 ± 1.0%

4,900 ± 2.9%

54,000 ± 1.6%

770,000 ± 1.1%

11,990,000 ± 0.8%

189,000,000 ± 1.1%

3,040,000,000 ± 0.5%

m.Map

7 ± 14.3%

44 ± 2.3%

172 ± 3.5%

670 ± 4.6%

3,650 ± 2.6%

26,400 ± 1.8%

282,000 ± 1.3%

3,970,000 ± 0.4%

62,600,000 ± 1.0%

1,000,000,000 ± 1.1%

concat

256

1,024

4,096

16,192

65,536

262,144

1,048,576

arraycopy

23 ± 0%

18 ± 5.6%

20 ± 5.0%

27 ± 3.7%

48 ± 16.7%

280 ± 15.5%

1,000 ± 12.0%

4,000 ± 7.2%

16,000 ± 11.0%

65,000 ± 6.7%

360,000 ± 14.4%

1,400,000 ± 23.1%

Array++

89 ± 1.1%

83 ± 1.2%

85 ± 1.2%

91 ± 1.1%

144 ± 5.6%

330 ± 14.7%

970 ± 5.6%

4,100 ± 5.5%

17,000 ± 17.0%

70,000 ± 14.9%

380,000 ± 11.5%

1,700,000 ± 7.7%

List

7 ± 0%

81 ± 1.2%

162 ± 1.2%

434 ± 0.5%

1,490 ± 2.5%

5,790 ± 0.8%

23,200 ± 1.3%

92,500 ± 0.4%

370,000 ± 1.1%

1,510,000 ± 1.7%

6,300,000 ± 1.8%

30,000,000 ± 6.5%

Vector

5 ± 20.0%

48 ± 2.1%

188 ± 1.6%

327 ± 0.3%

940 ± 2.2%

3,240 ± 2.0%

12,700 ± 2.4%

52,000 ± 4.1%

210,000 ± 2.2%

810,000 ± 1.6%

3,370,000 ± 1.3%

14,500,000 ± 2.8%

Set

91 ± 1.1%

95 ± 3.2%

877 ± 0.7%

1,130 ± 3.4%

5,900 ± 3.1%

26,900 ± 2.5%

149,000 ± 2.7%

680,000 ± 2.0%

3,600,000 ± 3.3%

23,000,000 ± 2.0%

100,000,000 ± 6.9%

280,000,000 ± 12.6%

Map

54 ± 1.9%

53 ± 1.9%

967 ± 0.9%

1,480 ± 5.4%

6,900 ± 2.2%

31,500 ± 1.0%

166,000 ± 1.4%

760,000 ± 2.9%

4,100,000 ± 2.9%

27,000,000 ± 3.5%

118,000,000 ± 4.6%

450,000,000 ± 11.7%

m.Buffer

11 ± 0%

32 ± 9.4%

32 ± 18.8%

38 ± 2.6%

70 ± 19.2%

250 ± 13.7%

700 ± 29.1%

3,900 ± 10.0%

20,000 ± 41.6%

40,000 ± 34.9%

400,000 ± 14.7%

1,500,000 ± 19.5%

m.Set

58 ± 3.4%

81 ± 6.2%

142 ± 4.9%

1,080 ± 3.1%

4,200 ± 3.3%

16,000 ± 6.7%

69,000 ± 5.3%

263,000 ± 2.1%

1,160,000 ± 4.8%

6,300,000 ± 3.7%

43,000,000 ± 5.6%

310,000,000 ± 8.1%

m.Map

47 ± 2.1%

69 ± 2.9%

181 ± 3.3%

990 ± 1.1%

3,700 ± 3.0%

15,000 ± 2.9%

62,000 ± 5.6%

290,000 ± 5.2%

1,500,000 ± 16.2%

16,000,000 ± 6.8%

103,000,000 ± 4.3%

493,000,000 ± 1.2%

foreach

256

1,024

4,096

16,192

65,536

262,144

1,048,576

Array

2 ± 0%

5 ± 0%

15 ± 0%

57 ± 1.8%

230 ± 1.7%

900 ± 2.0%

3,580 ± 1.4%

14,200 ± 1.7%

55,600 ± 0.8%

228,000 ± 1.6%

910,000 ± 1.9%

3,610,000 ± 0.7%

Array-while

0 ± 0%

1 ± 0%

0 ± 0%

1 ± 0%

0 ± 0%

-4 ± 100.0%

10 ± 166.7%

70 ± 97.1%

0 ± 507.0%

500 ± 153.3%

List

0 ± 0%

3 ± 0%

13 ± 0%

50 ± 2.0%

209 ± 1.9%

800 ± 2.1%

3,500 ± 3.5%

14,100 ± 3.8%

55,000 ± 4.6%

231,000 ± 2.8%

920,000 ± 9.7%

3,800,000 ± 6.4%

List-while

4 ± 0%

5 ± 0%

13 ± 0%

49 ± 2.0%

211 ± 0.9%

812 ± 1.1%

3,400 ± 2.9%

14,200 ± 4.5%

57,000 ± 6.9%

226,000 ± 2.2%

930,000 ± 6.5%

3,700,000 ± 4.4%

Vector

15 ± 0%

19 ± 0%

30 ± 0%

74 ± 1.4%

268 ± 2.2%

1,000 ± 2.5%

3,960 ± 1.9%

16,200 ± 4.1%

62,000 ± 2.4%

256,000 ± 1.5%

1,030,000 ± 1.5%

4,300,000 ± 3.2%

Set

4 ± 0%

5 ± 0%

10 ± 0%

99 ± 1.0%

420 ± 2.6%

1,560 ± 2.4%

10,200 ± 4.1%

51,000 ± 1.7%

217,000 ± 2.2%

2,200,000 ± 5.3%

10,800,000 ± 1.7%

48,600,000 ± 1.8%

Map

19 ± 0%

7 ± 0%

20 ± 0%

140 ± 2.1%

610 ± 4.0%

2,500 ± 3.9%

13,900 ± 3.9%

72,800 ± 0.9%

360,000 ± 3.3%

3,700,000 ± 8.2%

20,700,000 ± 1.6%

75,000,000 ± 3.6%

m.Buffer

0 ± 0%

1 ± 0%

0 ± 0%

1 ± 0%

2 ± 100.0%

-1 ± 800.0%

-10 ± 423.5%

0 ± 259.4%

-200 ± 185.1%

m.Set

19 ± 0%

26 ± 0%

50 ± 2.0%

130 ± 1.5%

508 ± 1.4%

2,190 ± 0.5%

11,900 ± 2.1%

56,600 ± 1.4%

235,000 ± 2.6%

940,000 ± 2.2%

3,800,000 ± 5.5%

14,700,000 ± 5.0%

m.Map

8 ± 0%

16 ± 0%

48 ± 2.1%

146 ± 0.7%

528 ± 1.1%

2,210 ± 1.7%

10,300 ± 2.8%

54,100 ± 0.4%

255,000 ± 2.0%

1,140,000 ± 5.4%

6,800,000 ± 5.4%

30,000,000 ± 6.6%

lookup

256

1,024

4,096

16,192

65,536

262,144

1,048,576

Array

0 ± 0%

1 ± 0%

0 ± 0%

1 ± 0%

-1 ± 200.0%

4 ± 200.0%

0 ± 675.0%

100 ± 71.6%

-200 ± 215.5%

List

0 ± 0%

1 ± 0%

8 ± 0%

103 ± 1.0%

2,390 ± 0.5%

47,200 ± 1.0%

870,000 ± 2.6%

16,900,000 ± 2.8%

Vector

0 ± 0%

1 ± 0%

5 ± 0%

17 ± 0%

104 ± 2.9%

440 ± 2.7%

1,780 ± 2.6%

8,940 ± 1.1%

38,000 ± 1.1%

198,000 ± 1.3%

930,000 ± 1.6%

4,260,000 ± 1.7%

Set

0 ± 0%

18 ± 0%

81 ± 1.2%

507 ± 0.6%

1,980 ± 0.8%

7,800 ± 1.8%

39,800 ± 1.8%

203,000 ± 1.1%

1,040,000 ± 2.3%

8,300,000 ± 2.8%

Map

0 ± 0%

12 ± 0%

97 ± 1.0%

578 ± 1.6%

2,250 ± 2.8%

9,400 ± 1.5%

46,000 ± 2.2%

233,000 ± 1.7%

1,150,000 ± 2.9%

11,400,000 ± 2.6%

m.Buffer

0 ± 0%

1 ± 0%

1 ± 100.0%

0 ± ∞%

6 ± 133.3%

-10 ± 200.0%

0 ± 415.4%

0 ± 970.0%

m.Set

0 ± 0%

5 ± 0%

22 ± 0%

97 ± 1.0%

410 ± 2.7%

1,690 ± 1.4%

7,100 ± 2.2%

31,300 ± 1.8%

148,000 ± 1.4%

690,000 ± 2.1%

4,800,000 ± 1.6%

m.Map

0 ± 0%

6 ± 0%

25 ± 0%

112 ± 1.8%

454 ± 1.3%

1,910 ± 0.7%

9,400 ± 0.5%

52,500 ± 0.7%

243,000 ± 2.7%

1,760,000 ± 1.2%

9,900,000 ± 5.3%

Raw Benchmark Data

下面的链接是原始测试数据，包括每一次独立的benchmark,而不是统计均值和标准差后的数据：

results.json

Benchmark Code

如果你想看测试代码，你可以浏览下面的链接：

https://github.com/lihaoyi/scala-bench

或者下载:

scala-bench.bundle

在下载页面点击git clone scala-bench.bundle得到你自己的仓库，然后

sbt "~bench/runMain bench.MemoryMain" 进行内存benchmark
sbt "~bench/runMain bench.PerfMain" 执行性能测试然后将结果导入到 bench/target/results.json。注意在我的笔记本上它运行了4个小时，所以最好启动它然后第二天查看结果。
sbt "~bench/runMain bench.AnalyzeMain" 用来读取结果文件results.json，计算均值和标准差，将结果写入到一个markdown文件。

这个benchmark代码绝对不像使用JMH那么严格，JMH已经相当慢了，而这个benchmark套件也需要四个半小时。

鸟窝

《深入理解Go并发编程》新书发售中。一书在手，并发无忧

[译]Scala Collection的性能

目录 [−]

内存占用

不可变集合的内存占用

不同数组的内存占用比较

性能

Construction Performance

Deconstruction Performance

Concatenation Performance

Foreach Performance

Lookup Performance

测试结论

数组超级好

Set 和 Map 都很慢

Lists vs Vectors

Lists vs mutable.Buffer

Vector还算好吧

总结

参考

带标准差的性能数据

Raw Benchmark Data

Benchmark Code