[译]Scala Collection的性能

目录 [−]

  1. 内存占用
    1. 不可变集合的内存占用
    2. 不同数组的内存占用比较
  2. 性能
    1. Construction Performance
    2. Deconstruction Performance
    3. Concatenation Performance
    4. Foreach Performance
    5. Lookup Performance
  3. 测试结论
    1. 数组超级好
    2. Set 和 Map 都很慢
    3. Lists vs Vectors
    4. Lists vs mutable.Buffer
    5. Vector还算好吧
  4. 总结
  5. 参考
    1. 带标准差的性能数据
    2. Raw Benchmark Data
    3. Benchmark Code

这是翻译自 Li HaoyiBenchmarking Scala Collections

Li Haoyi 的背景了解Scala的人都知道,虽然Scala的官方文档对集合框架的性能提供了定性的分析,但是Li Haoyi的这篇文章定量的比较了各个框架类的性能,非常有参考意义,也便于你更好的正确的选择 Scala 集合中的类。

这篇文章从经验的角度深入研究了Scala集合库的运行特性。你也许已经看到了很多的从实现的角度介绍Scala集合库的文章(inheritance hierarchies, CanBuildFrom, 等等),但是很少有人写这些集合运行时的行为特性。

ListVector快,还是VectorList快?使用未装箱的数组存储基本类型可以节省多少内存?什么时候你会执行一些性能的技巧,比如预分配大小的数组、使用while-loop取代foreach调用等,这些技巧真的有效么?声明var l: List还是val b: mutable.Buffer?这篇文章会给你答案。

插播一条广告,对Scala集合感兴趣的读者可以阅读拙著 《Scala集合技术手册》,详细介绍了Scala集合的各个类的功能,各大网上书店均有销售,繁体中文版刚刚在台湾出版。

Scala编程语言内建了丰富的集合类型: ListVectorArraySetMap等等。当然还有一些众所周知的基础知识:List可以快速地在头部增加元素,但是索引起来很慢, Vector是一个很好的通用目的的集合,但是实践中这些集合处理的数据确很少。

例如,Vector集合要比数组多占用多少内存?List呢?使用while-loopforeach调用要快多少?使用MapSet的性能呢?

目前最好的描述这些运行时特性的文章是下面的表格,这个表格来自 docs.scala-lang.org:

operationheadtailapplyupdateprependappendinsert
ListCCLLCL
StreamCCLLCL
VectoreCeCeCeCeCeC
StackCCLLCCL
QueueaCaCLLLC
RangeCCC
StringCLCLLL

其中表格中的数据代表时间复杂度,意义如下

KeyMeaning
C常数时间The operation takes (fast) constant time.
eC等价常数时间The operation takes effectively constant time, but this might depend on some assumptions such as maximum length of a vector or distribution of hash keys.
aC平均常数时间The operation takes amortized constant time. Some invocations of the operation might take longer, but if many operations are performed on average only constant time per operation is taken.
Log对数时间The operation takes time proportional to the logarithm of the collection size.
L线性时间The operation is linear, that is it takes time proportional to the collection size.
The operation is not supported.

This lacks concrete numbers and is a purely theoretical analysis. Worst, it uses weird terminology like "Effectively Constant time, assuming maximum length", which is confusing and doesn't match what the rest of the world thinks when they discuss performance characteristics or asymptotic complexity (If you're wondering why you've never heard the term "Effectively Constant Time" before, it's because everyone else calls it "Logarithmic time", and it's the same as the "Log" category above).

This post will thus go into detail with benchmarking both the memory and performance characteristics of various Scala collections, from an empirical point of view. By using runtime benchmarks rather than theoretical analysis, we will gain an understanding of the behavior and nuances of the various Scala collections, far more than what you'd gain from theoretical analysis or blind-leading-the-blind in-person discussions.

这个表格的问题在于它的指标是基于纯粹的理论分析。更糟糕的是,它使用奇怪的术语,如“等价常数时间,平均常数时间”等,这让人很迷惑,而且不符合其他人在讨论性能特征或渐近复杂性时的通常理解。

因此,这篇文章将从经验的角度详细介绍各种Scala集合的内存占用和性能特性。通过使用运行时benchmark数据而不是理论分析,我们将了解各种Scala集合的行为和细微差别,远远超过了从理论分析或盲人指路的讨论中获得的知识。

内存占用

本文第一个要介绍的是各种集合的内存占用。它比性能更容易分析,因为它的结果是确定的:你不需要多次进行benchmark测试来求平均值来减少误差。虽然不太常用,但是你还是可以通过直接使用反射和Java的Instrumentation API写个程序来分析一个对象的内存占用。

有很多的文章介绍这个,比如:

它相对来说比较简单,我使用Scala写了一个,它使用反射来抓取对象引用图,然后使用Java Instrumentation agent提供的getObjectSize得到对象的大小:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
package bench
import java.lang.reflect.Modifier
import java.util
import scala.collection.mutable
object DeepSize {
private val SKIP_POOLED_OBJECTS: Boolean = false
private def isPooled(paramObject: AnyRef): Boolean = {
paramObject match{
case e: java.lang.Enum[_] => true
case s: java.lang.String => s eq s.intern()
case b: java.lang.Boolean => (b eq java.lang.Boolean.TRUE) || (b eq java.lang.Boolean.FALSE)
case i: java.lang.Integer => i eq java.lang.Integer.valueOf(i)
case s: java.lang.Short => s eq java.lang.Short.valueOf(s)
case b: java.lang.Byte => b eq java.lang.Byte.valueOf(b)
case l: java.lang.Long => l eq java.lang.Long.valueOf(l)
case c: java.lang.Character => c eq java.lang.Character.valueOf(c)
case _ => false
}
}
/**
* Calculates deep size
*
* @param obj
* object to calculate size of
* @return object deep size
*/
def apply(obj: AnyRef): Long = {
deepSizeOf(obj)
}
private def skipObject(obj: AnyRef, previouslyVisited: util.Map[AnyRef, AnyRef]): Boolean = {
if (SKIP_POOLED_OBJECTS && isPooled(obj)) return true
(obj == null) || previouslyVisited.containsKey(obj)
}
private def deepSizeOf(obj0: AnyRef): Long = {
val previouslyVisited = new util.IdentityHashMap[AnyRef, AnyRef]
val objectQueue = mutable.Queue(obj0)
var current = 0L
while(objectQueue.nonEmpty){
val obj = objectQueue.dequeue()
if (!skipObject(obj, previouslyVisited)){
previouslyVisited.put(obj, null)
val thisSize = agent.Agent.getObjectSize(obj)
// get size of object + primitive variables + member pointers
// for array header + len + if primitive total value for primitives
obj.getClass match{
case a if a.isArray =>
current += thisSize
// primitive type arrays has length two, skip them (they included in the shallow size)
if (a.getName.length != 2) {
val lengthOfArray = java.lang.reflect.Array.getLength(obj)
for (i <- 0 until lengthOfArray) {
objectQueue.enqueue(java.lang.reflect.Array.get(obj, i))
}
}
case c =>
current += thisSize
var currentClass: Class[_] = c
do {
val objFields = currentClass.getDeclaredFields
for(field <- objFields) {
if (
!Modifier.isStatic(field.getModifiers) &&
!field.getType.isPrimitive
) {
field.setAccessible(true)
var tempObject: AnyRef = null
tempObject = field.get(obj)
if (tempObject != null) objectQueue.enqueue(tempObject)
}
}
currentClass = currentClass.getSuperclass
} while (currentClass != null)
}
}
}
current
}
}

随着这段代码没有考虑一些特殊的情况(32/64位的JVM/压缩指针的启用等等),可能还有些bug,但是实际中哦我使用它计算一堆对象的大小的时候,它的结果和其它的工具比如JProfiler是一致的,所以我倾向于它的结果还是很准确的。如果你想尝试运行这段代码,或者想修改一下测量方法,可以运行下面的代码:

Benchmark Code

这样,我们运行不同大小的不同集合的时候计算内存占用就很简单了。每个集合填充新的对象都是一样的,除了SortedSetSortedSet使用java.lang.Integer(...)填充了一段整数,这样它们可以被排序。

下表中就是测试的结果,单位是字节(byte),测试分别填充零个元素、一个元素、四个元素以及4的指数,一直到1,048,576个元素:

Size01416642561,0244,06916,19265,536262,1441,048,576
Vector562162644561,5125,44821,19284,312334,4401,353,1925,412,16821,648,072
Array[Object]1640963361,2965,13620,49681,400323,8561,310,7365,242,89620,971,536
List16561766562,57610,25640,976162,776647,6962,621,45610,485,77641,943,056
Stream (unforced)16160160160160160160160160160160160
Stream (forced)16561766562,57610,25640,976162,776647,6962,621,45610,485,77641,943,056
Set1632968803,72014,24859,288234,648895,0003,904,14414,361,00060,858,616
Map16561761,6486,80026,208109,112428,5921,674,5687,055,27226,947,840111,209,368
SortedSet401042488243,12812,34449,208195,368777,2723,145,78412,582,96850,331,704
Queue40802006802,60010,28041,000162,800647,7202,621,48010,485,80041,943,080
String404848721685522,0888,18432,424131,112524,3282,097,192
Size01416642561,0244,06916,19265,536262,1441,048,576
m.Buffer1041201683601,3205,16020,52081,528324,6481,310,7605,242,92020,971,560
m.Map1201763441,0804,15216,44065,592260,6881,037,8804,194,36016,777,27267,108,920
m.Set1842002485682,1048,24832,824130,696521,2722,097,2088,388,66433,554,488
m.Queue48882086882,60810,28841,008162,808647,7282,621,48810,485,80841,943,088
m.PriQueue1441602084641,6166,22424,65681,568324,6881,572,9446,291,53625,165,904
m.Stack32721926722,59210,27240,992162,792647,7122,621,47210,485,79241,943,072
m.SortedSet801282728483,15212,36849,232195,392777,2963,145,80812,582,99250,331,728
Size01416642561,0244,06916,19265,536262,1441,048,576
Array[Boolean]16242432802721,0404,08816,20865,552262,1601,048,592
Array[Byte]16242432802721,0404,08816,20865,552262,1601,048,592
Array[Short]162424481445282,0648,16032,400131,088524,3042,097,168
Array[Int]162432802721,0404,11216,29664,784262,1601,048,5924,194,320
Array[Long]1624481445282,0648,20832,568129,552524,3042,097,1688,388,624
Boxed Array[Boolean]1640641123041,0724,14416,32864,816262,1921,048,6244,194,352
Boxed Array[Byte]1640963361,2965,1368,20820,39268,880266,2561,052,6884,198,416
Boxed Array[Short]1640963361,2965,13620,49681,400323,8561,310,7365,230,60820,910,096
Boxed Array[Int]1640963361,2965,13620,49681,400323,8561,310,7365,242,89620,971,536
Boxed Array[Long]16481284641,8087,18428,688113,952453,3921,835,0247,340,04829,360,144
Size01416642561,0244,06916,19265,536262,1441,048,576
j.List2162403126001,9447,32028,824114,192454,2961,835,1607,340,18429,360,280
j.Map2402964641,2004,27216,56065,712260,8081,038,0004,194,48016,777,39267,109,040
j.Set2963123606802,2168,36032,936130,808521,3842,097,3208,388,77633,554,600

你可以花点时间好好看看这些数据,但是我单独挑出来下面的数据进行比较和讨论:

不可变集合的内存占用

Size01416642561,0244,06916,19265,536262,1441,048,576
Vector562162644561,5125,44821,19284,312334,4401,353,1925,412,16821,648,072
Array[Object]1640963361,2965,13620,49681,400323,8561,310,7365,242,89620,971,536
List16561766562,57610,25640,976162,776647,6962,621,45610,485,77641,943,056
Stream (unforced)16160160160160160160160160160160160
Stream (forced)16561766562,57610,25640,976162,776647,6962,621,45610,485,77641,943,056
Set1632968803,72014,24859,288234,648895,0003,904,14414,361,00060,858,616
Map16561761,6486,80026,208109,112428,5921,674,5687,055,27226,947,840111,209,368
SortedSet401042488243,12812,34449,208195,368777,2723,145,78412,582,96850,331,704
Queue40802006802,60010,28041,000162,800647,7202,621,48010,485,80041,943,080
String404848721685522,0888,18432,424131,112524,3282,097,192

它们都是很常用的Scala集合,大部分都是不可变的,java.lang.String, 还有scala.Stream 和Array[Object]。

 看下面的比较:

Size01416642561,0244,06916,19265,536262,1441,048,576
Vector562162644561,5125,44821,19284,312334,4401,353,1925,412,16821,648,072
Array[Object]1640963361,2965,13620,49681,400323,8561,310,7365,242,89620,971,536

小的vector会比小的数组多占用3 ~ 5倍的内存。只有一个元素的vector就会占用1K内存的五分之一。
16个元素的vector会有所好转,只多占了30%左右的内存,而1,048,576个元素的vector只多占了5%左右的内存。

内部实现上,小的vector有些浪费,而大的vector的浪费就可以忽略不计了。

再看下表:

Size01416642561,0244,06916,19265,536262,1441,048,576
Array[Object]1640963361,2965,13620,49681,400323,8561,310,7365,242,89620,971,536
List16561766562,57610,25640,976162,776647,6962,621,45610,485,77641,943,056

从四个元素开始,List内存占用是Array[Object]的两倍。这并不奇怪,因为List的每个节点是一个包装节点,这个节点对每个元素进行了包装。

再看下表:

Size01416642561,0244,06916,19265,536262,1441,048,576
List16561766562,57610,25640,976162,776647,6962,621,45610,485,77641,943,056
Stream (unforced)16160160160160160160160160160160160
Stream (forced)16561766562,57610,25640,976162,776647,6962,621,45610,485,77641,943,056

Stream (forced)List占用的内存一样多,Stream (unforced)几乎不占内存,因为它还没有被填充。

继续看下个表格的比较:

Size01416642561,0244,06916,19265,536262,1441,048,576
Array[Object]1640963361,2965,13620,49681,400323,8561,310,7365,242,89620,971,536
Set1632968803,72014,24859,288234,648895,0003,904,14414,361,00060,858,616
Map16561761,6486,80026,208109,112428,5921,674,5687,055,27226,947,840111,209,368

小的Set的内存占用和数组一样多,但是大的Set的内存占用是数组的三倍。这是合理的,小Set专用用来存储单一的对象,而大的Set用树做存储。让我有些惊奇的是占用会那么多,我本来期望多占用50% ~ 100%也就够了,没想到会多占用200%。

同样的情况也适用小Map和大Map。一开始小Map会比数组占用两倍的内存,最后占用6倍的内存。

再看一个内存比较:

Size01416642561,0244,06916,19265,536262,1441,048,576
Array[Object]1640963361,2965,13620,49681,400323,8561,310,7365,242,89620,971,536
String404848721685522,0888,18432,424131,112524,3282,097,192

String存储2-byte字符,而不是存储4-byte的指针,这些指针指向16-byte的对象。因此测试结果也不意味,它们使用数组十分之一的内存来存储相同大小的元素。

不同数组的内存占用比较

Size01416642561,0244,06916,19265,536262,1441,048,576
Array[Object]1640963361,2965,13620,49681,400323,8561,310,7365,242,89620,971,536
Size01416642561,0244,06916,19265,536262,1441,048,576
Array[Boolean]16242432802721,0404,08816,20865,552262,1601,048,592
Array[Byte]16242432802721,0404,08816,20865,552262,1601,048,592
Array[Short]162424481445282,0648,16032,400131,088524,3042,097,168
Array[Int]162432802721,0404,11216,29664,784262,1601,048,5924,194,320
Array[Long]1624481445282,0648,20832,568129,552524,3042,097,1688,388,624
Boxed Array[Boolean]1640641123041,0724,14416,32864,816262,1921,048,6244,194,352
Boxed Array[Byte]1640963361,2965,1368,20820,39268,880266,2561,052,6884,198,416
Boxed Array[Short]1640963361,2965,13620,49681,400323,8561,310,7365,230,60820,910,096
Boxed Array[Int]1640963361,2965,13620,49681,400323,8561,310,7365,242,89620,971,536
Boxed Array[Long]16481284641,8087,18428,688113,952453,3921,835,0247,340,04829,360,144

各种未装箱的基本类型的数据的内存占用要比装箱的数组内存占用小一个数量级。这是合理的,因为Array[Object]需要为每一个元素存储4-byte的指针,数组本身
要包含16-byte object header (8 bytes for mark word, 8 bytes for class-pointer)。相比之下,Byte、 Short、 Int 或 Long相应的要占用1、 2、 4 或 8 byte。你可以看到这些数组的内存占用的确切比例,数组的对象头有几个字节的开销。

译者注: 你可以参考下列文章来了解数组的开销:

值得注意的是,Array[Boolean]``占用了与Array[Byte]`相同的内存。虽然理论上它们可以为布尔值进行压缩,以一个bit来存储,但实际它们不是这样存储的。如果你想要更有效地存储标志,你应该使用某种bit set来存储你的布尔值。

装箱数组的行为也很有趣。你可能认为一个装箱的原始数组需要占用Array[Object]相同的内存占用。毕竟,一个Boxed IntByte最终是一个java.lang.Integerjava.lang.Byte,它是Object的子类。然而,这些常量的较小值是内部池化的(interned),因此在整个程序中只有公用的两个装箱的布尔值和256个装箱字节占用。因此,装箱的Array[Byte]最多只占用与Array[Int]相同的内存,因为它只填充了4个字节的指向共享对象的指针。

装箱的IntLong对于小的数值是公用的,但是对绝大部分的数值不会共用。 因此,这些最终占用与Array[Object]相同的(或更多)内存。

译者注: 如果你对Java Integer小值的实现还不太了解的话,你可以测试下面的代码,看看结果是不是很意外
Integer i1=127;
Integer i2=127;
System.out.println(i1==i2); //true
Integer i3=128;
Integer i4=128;
System.out.println(i3==i4); //false

你可能从上面的表格数据中会发现更多的有趣的东西,但是上面基本的比较已经让你感觉到不同的数据结构有很大的不同。尽管当今的计算机拥有很多的内存,但是知道内存会占多大依然很有用:

  • 使用更有效的数据结构意味着可以减少内存使用,可以使用更小的Amazon EC2 主机或者其它小的云主机来减少花费。在写这篇文章的时候,AWS m3-large机器的花费只是 r3-large机器的1/3,而后者只不过多了一倍的内存而已。积水成多,大的集群的节省的费用累加起来就不是一个小数目。

  • 为每个元素使用较小的内存意味着你可以把它们直接放入内存,而不是持久化到数据库或者磁盘上。内存中的操作至少要比读取文件或者数据库访问要快一个数量级。

即使你一开始先忽略内存使用然后再回来优化它,当你回来优化的时候,你仍然想知道如何权衡不同集合的内存开销。希望这部分内容可以帮助你做出使您的代码更有效率的决定。

性能

我们下一个要关注的事情就是使用各种集合执行通用的操作要花多长时间。不像内存占用那样可以静态分析,运行时的性能通常会有噪点和=和随机性:JIT编译器是否启用、垃圾回收是否发生等。

然而,即使我们不能得到精确的数字,我们肯定能得到一个非常接近的数值。下面的内容就是执行各种有代表性的操作得到的“粗”的benchmark:

  • 每个benchmark运行7次,一次2秒
  • 每次运行都和其他的benchmark交叉运行
  • 运行完毕,只取中间的五个值,最大最小值抛弃。然后计算平均值和标准差。

尽管这不是非常精确,但是对benchmark来说也足够了。Benchmark代码已经要花4个半小时运行,我不想再让它变得更严谨。

测试的独立的benchmark包括:

  • Construct: 构建一个大小为n的数据结构,从空的数据结构(Nil,等等)开始一次增加一个元素。使用 :: for Lists, :+ for Vectors, + for Sets, .append for mutable.Buffer等等, 对于数组, benchmark即使用了:+也使用预分配大小的数组来测试。
  • Concat: 构建大小为n的两个同样类型的集合,然后把它们连接成一个集合。 对于不可变集合使用++,对于可变集合使用 .appendAll++= 执行原地操作
  • Deconstruct: 开始构建一个大小为n的集合,然后一次一个的把元素移除,直到它为空为止。
  • Foreach: 迭代一个集合元素的时间花费
  • Lookup: 查找每一个元素的时间花费,使用collection(i)语法。注意它测试的查找每一个元素,通过一个while-loop实现。

ForeachLookup的 benchmark 要比其它的测试长10 甚至 100倍。总的时间被平均了(divided)。这是为了故意拉长运行时间,以便可以看到随机噪点和运行时的变化。

汇总数据如下,你可以查看原始数据。每一个benchmark(lookup,concat)占表格的一节,每一行代表一种集合的一个操作(有的集合有多个benchmark),每一列代表执行这个benchmark所花费的平均时间。

注意为了简单这里并没有显示标准方差,如果你想查看它们可以跳到 带标准差的性能数据一节。

construct01416642561,0244,09616,19265,536262,1441,048,576
Array-prealloc171014411867102,71011,00045,100183,000730,0003,100,000
Array:+212582701,46019,800260,0003,170,00060,000,0001,020,000,000
List::1412693011,2204,90019,80079,000329,0001,510,0009,100,000
Vector:+215994101,7307,00028,600324,0001,498,0007,140,00031,700,000131,000,000
Set+112581,8608,53037,400166,000783,0003,600,00018,100,00094,000,000473,000,000
Map+16952,1009,01038,900171,000810,0003,710,00018,400,00096,000,000499,000,000
Array.toSet73751872,1409,22040,000174,000833,0003,800,00019,300,000101,000,000506,000,000
Array.toMap21311042,1009,20039,500173,000820,0003,790,00019,500,000104,000,000540,000,000
Array.toVector951091432879033,31012,85051,100203,800821,0003,270,00013,300,000
m.Buffer1930581746912,69010,84043,000169,800687,0002,770,00011,790,000
m.Map.put6792971,4206,20025,500103,000414,0001,820,0008,100,00057,000,000348,000,000
m.Set.add13762761,4306,70027,900113,000455,0001,840,0007,900,00039,000,000267,000,000
deconstruct01416642561,0244,09616,19265,536262,1441,048,576
Array.tail7261145824,51755,500821,00012,140,000188,000,0003,100,000,000
List.tail227211004202,10010,00035,000120,000540,0001,500,000
Vector.tail36904251,97011,80058,400500,0002,390,00011,000,00050,200,000211,000,000
Vector.init251034832,49012,80064,000543,0002,470,00011,900,00052,600,000218,000,000
Set.-8301621,4807,70034,200164,000770,0003,660,00020,300,00094,000,000420,000,000
Map.-12522011,4307,66034,900169,000810,0003,990,00024,000,000103,000,000470,000,000
m.Buffer6814431666302,51010,00040,600167,000660,0002,490,000
m.Set5281306714,90054,000770,00011,990,000189,000,0003,040,000,000
m.Map7441726703,65026,400282,0003,970,00062,600,0001,000,000,000
concat01416642561,0244,09616,19265,536262,1441,048,576
Array++898385911443309704,10017,00070,000380,0001,700,000
arraycopy23182027482801,0004,00016,00065,000360,0001,400,000
List7811624341,4905,79023,20092,500370,0001,510,0006,300,00030,000,000
Vector5481883279403,24012,70052,000210,000810,0003,370,00014,500,000
Set91958771,1305,90026,900149,000680,0003,600,00023,000,000100,000,000280,000,000
Map54539671,4806,90031,500166,000760,0004,100,00027,000,000118,000,000450,000,000
m.Buffer11323238702507003,90020,00040,000400,0001,500,000
m.Set58811421,0804,20016,00069,000263,0001,160,0006,300,00043,000,000310,000,000
m.Map47691819903,70015,00062,000290,0001,500,00016,000,000103,000,000493,000,000
foreach01416642561,0244,09616,19265,536262,1441,048,576
Array2515572309003,58014,20055,600228,000910,0003,610,000
Array-while0101000-410700500
List0313502098003,50014,10055,000231,000920,0003,800,000
List-while4513492118123,40014,20057,000226,000930,0003,700,000
Vector151930742681,0003,96016,20062,000256,0001,030,0004,300,000
Set4510994201,56010,20051,000217,0002,200,00010,800,00048,600,000
Map197201406102,50013,90072,800360,0003,700,00020,700,00075,000,000
m.Buffer01111012-1-100-200
m.Set1926501305082,19011,90056,600235,000940,0003,800,00014,700,000
m.Map816481465282,21010,30054,100255,0001,140,0006,800,00030,000,000
lookup01416642561,0244,09616,19265,536262,1441,048,576
Array0111001-140100-200
List0181032,39047,200870,00016,900,000
Vector015171044401,7808,94038,000198,000930,0004,260,000
Set018815071,9807,80039,800203,0001,040,0008,300,000
Map012975782,2509,40046,000233,0001,150,00011,400,000
m.Buffer011111106-1000
m.Set0522974101,6907,10031,300148,000690,0004,800,000
m.Map06251124541,9109,40052,500243,0001,760,0009,900,000

带标准差的原始的数据和测试代码在文章的后面,你可以深入挖据一下这些数据。在本文中,我将讨论一些我看到的一些有趣的数据。

Construction Performance

construct01416642561,0244,09616,19265,536262,1441,048,576
Array-prealloc171014411867102,71011,00045,100183,000730,0003,100,000
Array:+212582701,46019,800260,0003,170,00060,000,0001,020,000,000
List::1412693011,2204,90019,80079,000329,0001,510,0009,100,000
Vector:+215994101,7307,00028,600324,0001,498,0007,140,00031,700,000131,000,000
Set+112581,8608,53037,400166,000783,0003,600,00018,100,00094,000,000473,000,000
Map+16952,1009,01038,900171,000810,0003,710,00018,400,00096,000,000499,000,000
Array.toSet73751872,1409,22040,000174,000833,0003,800,00019,300,000101,000,000506,000,000
Array.toMap21311042,1009,20039,500173,000820,0003,790,00019,500,000104,000,000540,000,000
Array.toVector951091432879033,31012,85051,100203,800821,0003,270,00013,300,000
m.Buffer1930581746912,69010,84043,000169,800687,0002,770,00011,790,000
m.Map.put6792971,4206,20025,500103,000414,0001,820,0008,100,00057,000,000348,000,000
m.Set.add13762761,4306,70027,900113,000455,0001,840,0007,900,00039,000,000267,000,000

上面的表格显示的构建集合的性能,一次添加一个数据。使用 :: for Lists, :+ for Vectors, .add.append.put for the mutable collections. Array有两个benchmark: 一个使用 :+, 一个使用预先分配大小的数组,然后使用一个 while-loop 添加元素。

construct01416642561,0244,09616,19265,536262,1441,048,576
List::1412693011,2204,90019,80079,000329,0001,510,0009,100,000
Vector:+215994101,7307,00028,600324,0001,498,0007,140,00031,700,000131,000,000

它显示构建一个Vector要比构建List慢5到15倍,具体和集合大小有关。

这也许不令人惊讶。增加一个元素到链表中非常简单,但是这个数量接让我震惊。如果你是一个一个的添加元素到集合中,使用List要比Vector快很多。如果你的代码中构建一个Vector变成了瓶颈,那么你应该考虑使用List或者Buffer来替换它。

construct01416642561,0244,09616,19265,536262,1441,048,576
List::1412693011,2204,90019,80079,000329,0001,510,0009,100,000
m.Buffer1930581746912,69010,84043,000169,800687,0002,770,00011,790,000

使用.append构建一个mutable.Buffer看起来要比使用::构建List要慢2 ~ 3倍,尽管对于大下降到1.5倍慢。我有点惊讶,这意味着如果你想更快,List是一个更好的选择。

construct01416642561,0244,09616,19265,536262,1441,048,576
Array-prealloc171014411867102,71011,00045,100183,000730,0003,100,000
List::1412693011,2204,90019,80079,000329,0001,510,0009,100,000
m.Buffer1930581746912,69010,84043,000169,800687,0002,770,00011,790,000

最快的是预分配内存的数组,大约比构建List快4倍,比构建mutable.Buffer快5倍,比构建Vector快15倍。

construct01416642561,0244,09616,19265,536262,1441,048,576
Array-prealloc171014411867102,71011,00045,100183,000730,0003,100,000
Array:+212582701,46019,800260,0003,170,00060,000,0001,020,000,000

使用:+构建数组需要花费双倍的时间,因为它每次都要复制整个数组:当数组还小的时候还好,但是数组长度变大的时候就很慢了,当元素的数量到了上万个的时候时间花费变得不可思议的长了。

construct01416642561,0244,09616,19265,536262,1441,048,576
Array-prealloc171014411867102,71011,00045,100183,000730,0003,100,000
Set+112581,8608,53037,400166,000783,0003,600,00018,100,00094,000,000473,000,000
Map+16952,1009,01038,900171,000810,0003,710,00018,400,00096,000,000499,000,000
Array.toSet73751872,1409,22040,000174,000833,0003,800,00019,300,000101,000,000506,000,000
Array.toMap21311042,1009,20039,500173,000820,0003,790,00019,500,000104,000,000540,000,000

一个个添加元素的方式构建SetMap变得很慢: 比构建List慢30倍,比预分配大小的数组慢150倍。这大概是因为Set和Map需要构建Hash或者做类似检查来维持一致性。

Set和Map慢是在意料之中,意料之外的是居然这么慢。这意味着如果你想要代码更快,你不应该使用Set或者Map, 除非你必需它们的唯一性这个性质,否则使用List或者mutable.Buffer。

预分配的数组调用.toSet.toMap方法并不会直接构造Set和Map更快。而调用toVector却可以更快。

construct01416642561,0244,09616,19265,536262,1441,048,576
Array-prealloc171014411867102,71011,00045,100183,000730,0003,100,000
List::1412693011,2204,90019,80079,000329,0001,510,0009,100,000
Vector:+215994101,7307,00028,600324,0001,498,0007,140,00031,700,000131,000,000
Array.toVector951091432879033,31012,85051,100203,800821,0003,270,00013,300,000

上表表明,通过预分配数组,填充元素,然后再调用.toVector方法可以更快的构建Vector,会比直接构建Vector快10倍。这里没有列出的benchmark表明,先构建mutable.Buffer然后调用 .toVector 方法也比直接构建Vector快。

Deconstruction Performance

deconstruct01416642561,0244,09616,19265,536262,1441,048,576
Array.tail7261145824,51755,500821,00012,140,000188,000,0003,100,000,000
List.tail227211004202,10010,00035,000120,000540,0001,500,000
Vector.tail36904251,97011,80058,400500,0002,390,00011,000,00050,200,000211,000,000
Vector.init251034832,49012,80064,000543,0002,470,00011,900,00052,600,000218,000,000
Set.-8301621,4807,70034,200164,000770,0003,660,00020,300,00094,000,000420,000,000
Map.-12522011,4307,66034,900169,000810,0003,990,00024,000,000103,000,000470,000,000
m.Buffer6814431666302,51010,00040,600167,000660,0002,490,000
m.Set5281306714,90054,000770,00011,990,000189,000,0003,040,000,000
m.Map7441726703,65026,400282,0003,970,00062,600,0001,000,000,000

mutable.BufferList 是最快的移除元素操作的集合。这很合理,因为从mutable.Buffer移除元素只是改变它的size。从List的头部移除元素只是得到它的.tail,不需要做数据结构的改变。

construct01416642561,0244,09616,19265,536262,1441,048,576
Vector:+215994101,7307,00028,600324,0001,498,0007,140,00031,700,000131,000,000
deconstruct01416642561,0244,09616,19265,536262,1441,048,576
Vector.tail36904251,97011,80058,400500,0002,390,00011,000,00050,200,000211,000,000
Vector.init251034832,49012,80064,000543,0002,470,00011,900,00052,600,000218,000,000

使用.tail.init从Vector移除元素更慢,要比:+增加一个元素慢50%。

deconstruct01416642561,0244,09616,19265,536262,1441,048,576
List.tail227211004202,10010,00035,000120,000540,0001,500,000
Vector.tail36904251,97011,80058,400500,0002,390,00011,000,00050,200,000211,000,000
Set.-8301621,4807,70034,200164,000770,0003,660,00020,300,00094,000,000420,000,000
Map.-12522011,4307,66034,900169,000810,0003,990,00024,000,000103,000,000470,000,000
m.Set5281306714,90054,000770,00011,990,000189,000,0003,040,000,000
m.Map7441726703,65026,400282,0003,970,00062,600,0001,000,000,000

因为某种原因,重复地从MapSet移除.head也很慢,从mutable.Mapmutable.Set移除元素更慢。我不知道为什么出现这种现象。

Concatenation Performance

concat01416642561,0244,09616,19265,536262,1441,048,576
Array++898385911443309704,10017,00070,000380,0001,700,000
arraycopy23182027482801,0004,00016,00065,000360,0001,400,000
List7811624341,4905,79023,20092,500370,0001,510,0006,300,00030,000,000
Vector5481883279403,24012,70052,000210,000810,0003,370,00014,500,000
Set91958771,1305,90026,900149,000680,0003,600,00023,000,000100,000,000280,000,000
Map54539671,4806,90031,500166,000760,0004,100,00027,000,000118,000,000450,000,000
m.Buffer11323238702507003,90020,00040,000400,0001,500,000
m.Set58811421,0804,20016,00069,000263,0001,160,0006,300,00043,000,000310,000,000
m.Map47691819903,70015,00062,000290,0001,500,00016,000,000103,000,000493,000,000

连接集合最快的是mutable.Buffers和普通的数组,因为它们只是简单的复制元素到一个新的集合中。 mutable.Buffer内部使用一个数组,所以它需要重新分配更大的数组来复制数据,而数组则是将两个输入数组复制到一个更大的数组中。你使用 Array ++ Array还是System.arraycopy并不重要。

It turns out that while the clever algorithms and structural-sharing and what not that go into Scala's immutable Vectors and Sets make it faster to build things up incrementally element-by-element (as seen in the Construction Performance benchmark), for this kind of bulk-concatenation it's still faster just to copy everything manually into a new array and skip all the fancy data-structure stuff.

尽管VectorList的连接要比mutable.BufferArray要慢,Vector的连接却比List的连接要快一倍。

SetMap还是很慢,比VectorList要慢10倍,比Arraymutable.Buffer慢100倍。

Foreach Performance

foreach01416642561,0244,09616,19265,536262,1441,048,576
Array2515572309003,58014,20055,600228,000910,0003,610,000
Array-while0101000-410700500
List0313502098003,50014,10055,000231,000920,0003,800,000
List-while4513492118123,40014,20057,000226,000930,0003,700,000
Vector151930742681,0003,96016,20062,000256,0001,030,0004,300,000
Set4510994201,56010,20051,000217,0002,200,00010,800,00048,600,000
Map197201406102,50013,90072,800360,0003,700,00020,700,00075,000,000
m.Buffer01111012-1-100-200
m.Set1926501305082,19011,90056,600235,000940,0003,800,00014,700,000
m.Map816481465282,21010,30054,100255,0001,140,0006,800,00030,000,000

迭代大部分常用的集合都很快,无论集合是ListVector还是Array。使用while-loophead/tail的速度是一样的,所以如果你想手写迭代来提高性能,结果可能没什么区别。

另一方面,相比迭代数组或者Vector,迭代SetMap要慢10~15倍。可变的SetMap要好一点,值慢了3~8倍。

使用while-loop迭代数组是最快的,基本上快的连噪点都测量不出来。但是迭代mutable.Buffer并没有显著的提升,不清楚为什么这样。

Lookup Performance

lookup01416642561,0244,09616,19265,536262,1441,048,576
Array0111001-140100-200
List0181032,39047,200870,00016,900,000
Vector015171044401,7808,94038,000198,000930,0004,260,000
Set018815071,9807,80039,800203,0001,040,0008,300,000
Map012975782,2509,40046,000233,0001,150,00011,400,000
m.Buffer011111106-1000
m.Set0522974101,6907,10031,300148,000690,0004,800,000
m.Map06251124541,9109,40052,500243,0001,760,0009,900,000

我们看到数组和mutable.Buffer的lookup非常快,以至于无法测到噪点。下一个快的集合是索引的Vector的lookup,它会使用较少的时间,例如,在一百万的元素中查找一个元素需要4毫秒:在大部分的情况下这并不明显,但是如果累加起来时间就很客观了,例如在实时的游戏中就对时间要求很高。

注意这个时间是在集合中查找每一个元素,而不是查找某一个元素。所以我们希望用大的集合来花费更多的时间来完成性能测试,即使时间花费是一个常量。

不可变的Set和Map比Vector要花费非常长的时间执行查找。不可变的Set会花费10~20倍的时间,而Map要花费10~40倍的时间。

而可变Set和Map查找的性能要比不可变的Set和Map要好。可变Set的查找的时间要比Vector慢4~5倍,而可变Map要比Vector的查找慢5~10倍。可以看到可变Set和Map要比不可变Set和Map快2~4倍,这可能是因为这些集合的可变版本使用hash表而这些不可变集合使用tree。

List lookups by index across the entire list is, as expected, quadratic in time: Each individual lookup takes time linear in the size of the collection, and there are a linear number of lookups to perform in this benchmark. While it's reasonable up to about 16 elements, it quickly blows up after that.

测试结论

我们已经看到了很多数据,也比较了一些集合的数据。对于天天使用Scala的程序员,这些数据有什么意义呢?下面列出了一些有趣的结论。

数组超级好

一个未装箱的基本类型的数组只要装箱的数组的内存的1/4或者1/5,例如 Array[Int]vs Array[java.lang.Integer]。这不简单,如果你要处理大量的基本类型的数据,使用非装箱数组会帮你省好多钱。

除了基本类型数组,即使装箱的数组要有漂亮的性能数据。连接两个数据要比连接其它集合要快,甚至比List和Vector这些有精心设计的数据结构还要快。对于有一百万元素的集合,甚至要快10倍。SI-4442提交了一个issue用来解决这个问题,但是目前的性能还是如此。

我非常惊讶的是数组的连接是那么快,甚至那些"花哨"的数据结构如List和Vector,使用共享数据避免复制整个数据都比不上数组。它表明复制整体数据事实上要比组合那些持久化数据结构都要快。所以如果你要处理一个不可变集合,有时候需要把它分成片段或者连接起来,使用数组要更快。

Set 和 Map 都很慢

使用不可变的Vector的查找只需要不可变的Set的1/10 ~ 1/20,不可变Map的1/10 ~ 1/40,即使是使用可变的Set和Map,Vector的速度也很快,当然使用原始的数组会更快。

这是合理的,因为Set和Map的查找包含很多的hash计算和hash比较,即使是对于简单的键值也是这样,这些操作会带来显著的时间花费。相反,Vector查找只是简单的整数运算,数组查找只是指针的移动。

Set和Map不只是查找很慢:构建它们也很慢,移除元素也很慢,连接集合也很慢。即使操作不需要执行hash计算,比如迭代(iteration),也比迭代一个Vector慢19倍。

所以除非你需要不能重复元素的集合才使用Set,需要键值对才选择Map,否则尽量不用它们,因为它们才是程序慢的原因。如果你的键的集合比较小并且性能有问题,你可以为每个键分配一个整数,使用Vector[Boolean]代替Set,使用Vector[T]代替Map,查找的时候使用整数查找。有时候,即使你知道集合的元素不会重复,不使用Set而是使用数组、Vector或者List会带来很好的性能。

可变Set和Map比不可变Set和Map会快一点,内存占用也小一点:1/2的内存占用,2~4倍快的查找,2倍快的迭代,2倍快的构建(.add/.put),但是相比数组、Vector和List来说还是很慢。

Lists vs Vectors

是选择单链表形式的List还是选择tree形式的Vector,选择哪个集合的标准很模糊。“有效的常数时间”的定义不能理清这种模糊认识,但是通过上面的测试数据,我们可以有一个更好的判断:

  • 相同大小的List的内存占用是Vector的内存。后者和数组的开销有得一拼。
  • 构建一个List的时间比构建Vector快5~15倍。
  • 通过.tail从List中移除元素比Vector快50~60倍
  • 可以通过索引在Vector查找元素,而通过List按照索引查找的时间是O(n),元素数量很大的时候时间花费是很大的
  • 迭代这两个集合的时间花费差不多

如果你自己构建集合并且一个个的移除元素,迭代它们,最好使用List。如果你需要根据索引查找元素,则选择Vector。使用:+.tail增加元素和移除元素虽然不会把你弄垮,但是它们却非常的慢。

Lists vs mutable.Buffer

List除了作为不可变集合外,还经常使用var作为可变集合,而mutable.Buffer也有相同的目的,那你该用哪一个集合呢?

数据表明使用List要比mutable.Buffer在处理一个个的元素的时候要快:对小集合来说快2~3倍,大集合要快1.5~2倍。很大的差异!

除了性能,它们之间也有一些其它不同:

  • mutable.Buffer可以快速的按照索引查找,而List不可以
  • List是持久化的,所以你可以增加/移除你的列表副本而不会影响其他人对这个列表的使用,而mutable.Buffer却不是,一个人的修改会影响其他使用的人
  • List的构建是"向后"的(backwards)。最后加入的元素在迭代的时候位于第一个位置,这可能不是你所期望的,你可能在使用之前对它反转。mutable.Buffer则是"向前"构建的,第一个增加的元素在迭代的时候总是第一个。
  • mutable.Buffer只占List的花销(overhead)的一半

对于遍历元素的操作,List更快,但会用更多的内存。如果你根本不关心其它影响因素,这个因素可能帮你决定你该使用哪一个集合。

Vector还算好吧

尽管Vector经常倍认为是一个很好的通用目的的数据结构,但是数据表明它也不是想象的那么好:

  • 小的vector,比如之包含一个元素,有1/5的内存花销,而对于大的Vector,它的花销却可以忽略不计。如果你有很多的小集合,使用Vector可能会带来内存的浪费
  • Vector一个元素一个元素填充式的构建比List或者mutable.Buffer要慢5~15倍,比预分配大小的数组慢40倍
  • Vector按照索引进行查找的性能可以接受,但也比数组或者mutable.Buffer要慢5~15倍,比预分配大小的数组慢40倍
  • 两个差不多大小的Vector的连接时间比List的连接的事时间快一倍,但比数组慢10倍,尽管数组需要全复制。

总体来说,Vector还是一个可以接受的通用目的的集合。

  • 一方面,你看不到Vector有什么严重的性能缺陷,大部分的Vector的性能都处于中等水平,如果你使用它们,不会出现O(n^2)的时间
  • 另一方面,常用操作要比理想的数据结构慢一个数量级。增量构建比List慢5~15,索引查找比数组慢很多,即使连接操作也比数组慢10倍

Vector通常是缺省的数组结构,但是如果可能的话,直接使用List或者mutable.Buffer可能会给性能带来数量级的提升。可能有些夸张,但是的确是值得关注的事情。十倍的性能提升是巨大的。

总结

本文提供了一个具体基准帮助你选择哪一种Scala集合。理论分析经常错过很多重要因素,因为很自然的你只分析那些你认为重要的因素。本文的测试从实践的角度提供了集合的有意义的行为。

例如,令我惊讶的是从Vector移除一个元素远比增加一个元素要慢很多,还有Set和Map比Vector和List慢很多,即使是迭代这种不受hash操作影响的操作也很慢。还有数组的连接出人意料的快,比那些使用共享数据的集合还快。

就像强调集合之间的差异一样,这套benchmark也强调很多无关重要的事情:

  • List、Array和Vector的foreach的区别
  • 数据巨大的Vector和Array的内存占用比较
  • 通过+构建Set还是先构建一个数组然后调用.toSet?

尽管代码不同,内部的数据结构也不同,实践中可能你选择哪个数据结构美太大关系,选择一个合适的。

下次你要选择一个特殊的集合的时候,或者你和同事讨论哪个集合合适的时候,可以随时回来查看这篇植物。

参考

带标准差的性能数据

下面的性能数据,加上了标准差。注意它们是从7次测试中取了中间5次之后的标准差。

construct01416642561,0244,09616,19265,536262,1441,048,576
Array-prealloc17 ± 0%10 ± 0%14 ± 0%41 ± 0%186 ± 0.5%710 ± 2.1%2,710 ± 2.6%11,000 ± 2.7%45,100 ± 2.1%183,000 ± 2.0%730,000 ± 2.0%3,100,000 ± 1.6%
Array:+2 ± 0%12 ± 0%58 ± 0%270 ± 0.4%1,460 ± 2.4%19,800 ± 1.5%260,000 ± 1.7%3,170,000 ± 1.5%60,000,000 ± 1.8%1,020,000,000 ± 1.2%
Vector2 ± 0%15 ± 6.7%99 ± 1.0%410 ± 2.7%1,730 ± 2.3%7,000 ± 2.7%28,600 ± 3.4%324,000 ± 0.5%1,498,000 ± 0.5%7,140,000 ± 0.4%31,700,000 ± 0.4%131,000,000 ± 1.0%
List1 ± 0%4 ± 0%12 ± 0%69 ± 1.4%301 ± 2.7%1,220 ± 2.3%4,900 ± 2.2%19,800 ± 2.8%79,000 ± 2.5%329,000 ± 1.6%1,510,000 ± 1.2%9,100,000 ± 8.3%
Set1 ± 0%12 ± 0%58 ± 0%1,860 ± 0.5%8,530 ± 0.4%37,400 ± 0.8%166,000 ± 1.2%783,000 ± 0.5%3,600,000 ± 1.6%18,100,000 ± 0.7%94,000,000 ± 1.4%473,000,000 ± 1.5%
Map1 ± 0%6 ± 0%95 ± 0%2,100 ± 1.9%9,010 ± 0.7%38,900 ± 0.5%171,000 ± 1.0%810,000 ± 1.3%3,710,000 ± 1.3%18,400,000 ± 1.6%96,000,000 ± 1.1%499,000,000 ± 1.4%
Array.toVector95 ± 1.1%109 ± 0%143 ± 0%287 ± 0.3%903 ± 0.9%3,310 ± 0.2%12,850 ± 0.5%51,100 ± 0.8%203,800 ± 0.5%821,000 ± 0.5%3,270,000 ± 1.3%13,300,000 ± 1.4%
Array.toSet73 ± 0%75 ± 0%187 ± 0.5%2,140 ± 1.4%9,220 ± 0.9%40,000 ± 1.2%174,000 ± 0.9%833,000 ± 0.6%3,800,000 ± 1.2%19,300,000 ± 0.6%101,000,000 ± 1.4%506,000,000 ± 1.7%
Array.toMap21 ± 0%31 ± 0%104 ± 1.0%2,100 ± 0.5%9,200 ± 1.5%39,500 ± 1.6%173,000 ± 1.7%820,000 ± 1.7%3,790,000 ± 2.1%19,500,000 ± 1.6%104,000,000 ± 2.5%540,000,000 ± 2.2%
m.Buffer19 ± 0%30 ± 0%58 ± 0%174 ± 1.1%691 ± 0.7%2,690 ± 1.0%10,840 ± 0.7%43,000 ± 0.7%169,800 ± 0.4%687,000 ± 0.7%2,770,000 ± 0.6%11,790,000 ± 0.7%
m.Set13 ± 0%76 ± 0%276 ± 0.4%1,430 ± 1.1%6,700 ± 0.9%27,900 ± 1.2%113,000 ± 1.6%455,000 ± 1.4%1,840,000 ± 1.2%7,900,000 ± 1.4%39,000,000 ± 3.0%267,000,000 ± 3.2%
m.Map6 ± 0%79 ± 1.3%297 ± 0.3%1,420 ± 1.0%6,200 ± 0.7%25,500 ± 1.0%103,000 ± 1.9%414,000 ± 2.0%1,820,000 ± 2.0%8,100,000 ± 3.3%57,000,000 ± 4.6%348,000,000 ± 2.4%
deconstruct01416642561,0244,09616,19265,536262,1441,048,576
Array.tail7 ± 0%26 ± 0%114 ± 0.9%582 ± 0.5%4,517 ± 0.1%55,500 ± 0.9%821,000 ± 0.3%12,140,000 ± 0.6%188,000,000 ± 1.0%3,100,000,000 ± 0.4%
List.tail2 ± 0%2 ± 0%7 ± 0%21 ± 4.8%100 ± 10.6%420 ± 3.7%2,100 ± 5.9%10,000 ± 10.4%35,000 ± 4.6%120,000 ± 9.4%540,000 ± 9.2%1,500,000 ± 53.5%
Vector.tail3 ± 0%6 ± 0%90 ± 1.1%425 ± 2.1%1,970 ± 1.7%11,800 ± 2.6%58,400 ± 1.1%500,000 ± 2.2%2,390,000 ± 1.3%11,000,000 ± 1.2%50,200,000 ± 0.5%211,000,000 ± 1.3%
Vector.init2 ± 0%5 ± 0%103 ± 1.0%483 ± 1.9%2,490 ± 1.8%12,800 ± 2.0%64,000 ± 2.8%543,000 ± 0.8%2,470,000 ± 1.7%11,900,000 ± 1.8%52,600,000 ± 1.5%218,000,000 ± 1.5%
Set.-8 ± 12.5%30 ± 3.3%162 ± 1.2%1,480 ± 3.9%7,700 ± 3.0%34,200 ± 1.2%164,000 ± 1.5%770,000 ± 1.4%3,660,000 ± 2.6%20,300,000 ± 0.7%94,000,000 ± 1.3%420,000,000 ± 1.8%
Map.-12 ± 8.3%52 ± 0%201 ± 0.5%1,430 ± 1.3%7,660 ± 0.5%34,900 ± 0.9%169,000 ± 0.7%810,000 ± 2.1%3,990,000 ± 0.3%24,000,000 ± 3.4%103,000,000 ± 5.1%470,000,000 ± 3.4%
m.Buffer6 ± 0%8 ± 12.5%14 ± 7.1%43 ± 2.3%166 ± 0.6%630 ± 2.8%2,510 ± 2.9%10,000 ± 3.0%40,600 ± 1.7%167,000 ± 2.8%660,000 ± 4.2%2,490,000 ± 3.0%
m.Set5 ± 0%28 ± 7.1%130 ± 1.5%671 ± 1.0%4,900 ± 2.9%54,000 ± 1.6%770,000 ± 1.1%11,990,000 ± 0.8%189,000,000 ± 1.1%3,040,000,000 ± 0.5%
m.Map7 ± 14.3%44 ± 2.3%172 ± 3.5%670 ± 4.6%3,650 ± 2.6%26,400 ± 1.8%282,000 ± 1.3%3,970,000 ± 0.4%62,600,000 ± 1.0%1,000,000,000 ± 1.1%
concat01416642561,0244,09616,19265,536262,1441,048,576
arraycopy23 ± 0%18 ± 5.6%20 ± 5.0%27 ± 3.7%48 ± 16.7%280 ± 15.5%1,000 ± 12.0%4,000 ± 7.2%16,000 ± 11.0%65,000 ± 6.7%360,000 ± 14.4%1,400,000 ± 23.1%
Array++89 ± 1.1%83 ± 1.2%85 ± 1.2%91 ± 1.1%144 ± 5.6%330 ± 14.7%970 ± 5.6%4,100 ± 5.5%17,000 ± 17.0%70,000 ± 14.9%380,000 ± 11.5%1,700,000 ± 7.7%
List7 ± 0%81 ± 1.2%162 ± 1.2%434 ± 0.5%1,490 ± 2.5%5,790 ± 0.8%23,200 ± 1.3%92,500 ± 0.4%370,000 ± 1.1%1,510,000 ± 1.7%6,300,000 ± 1.8%30,000,000 ± 6.5%
Vector5 ± 20.0%48 ± 2.1%188 ± 1.6%327 ± 0.3%940 ± 2.2%3,240 ± 2.0%12,700 ± 2.4%52,000 ± 4.1%210,000 ± 2.2%810,000 ± 1.6%3,370,000 ± 1.3%14,500,000 ± 2.8%
Set91 ± 1.1%95 ± 3.2%877 ± 0.7%1,130 ± 3.4%5,900 ± 3.1%26,900 ± 2.5%149,000 ± 2.7%680,000 ± 2.0%3,600,000 ± 3.3%23,000,000 ± 2.0%100,000,000 ± 6.9%280,000,000 ± 12.6%
Map54 ± 1.9%53 ± 1.9%967 ± 0.9%1,480 ± 5.4%6,900 ± 2.2%31,500 ± 1.0%166,000 ± 1.4%760,000 ± 2.9%4,100,000 ± 2.9%27,000,000 ± 3.5%118,000,000 ± 4.6%450,000,000 ± 11.7%
m.Buffer11 ± 0%32 ± 9.4%32 ± 18.8%38 ± 2.6%70 ± 19.2%250 ± 13.7%700 ± 29.1%3,900 ± 10.0%20,000 ± 41.6%40,000 ± 34.9%400,000 ± 14.7%1,500,000 ± 19.5%
m.Set58 ± 3.4%81 ± 6.2%142 ± 4.9%1,080 ± 3.1%4,200 ± 3.3%16,000 ± 6.7%69,000 ± 5.3%263,000 ± 2.1%1,160,000 ± 4.8%6,300,000 ± 3.7%43,000,000 ± 5.6%310,000,000 ± 8.1%
m.Map47 ± 2.1%69 ± 2.9%181 ± 3.3%990 ± 1.1%3,700 ± 3.0%15,000 ± 2.9%62,000 ± 5.6%290,000 ± 5.2%1,500,000 ± 16.2%16,000,000 ± 6.8%103,000,000 ± 4.3%493,000,000 ± 1.2%
foreach01416642561,0244,09616,19265,536262,1441,048,576
Array2 ± 0%5 ± 0%15 ± 0%57 ± 1.8%230 ± 1.7%900 ± 2.0%3,580 ± 1.4%14,200 ± 1.7%55,600 ± 0.8%228,000 ± 1.6%910,000 ± 1.9%3,610,000 ± 0.7%
Array-while0 ± 0%1 ± 0%0 ± 0%1 ± 0%0 ± 0%0 ± 0%0 ± 0%-4 ± 100.0%10 ± 166.7%70 ± 97.1%0 ± 507.0%500 ± 153.3%
List0 ± 0%3 ± 0%13 ± 0%50 ± 2.0%209 ± 1.9%800 ± 2.1%3,500 ± 3.5%14,100 ± 3.8%55,000 ± 4.6%231,000 ± 2.8%920,000 ± 9.7%3,800,000 ± 6.4%
List-while4 ± 0%5 ± 0%13 ± 0%49 ± 2.0%211 ± 0.9%812 ± 1.1%3,400 ± 2.9%14,200 ± 4.5%57,000 ± 6.9%226,000 ± 2.2%930,000 ± 6.5%3,700,000 ± 4.4%
Vector15 ± 0%19 ± 0%30 ± 0%74 ± 1.4%268 ± 2.2%1,000 ± 2.5%3,960 ± 1.9%16,200 ± 4.1%62,000 ± 2.4%256,000 ± 1.5%1,030,000 ± 1.5%4,300,000 ± 3.2%
Set4 ± 0%5 ± 0%10 ± 0%99 ± 1.0%420 ± 2.6%1,560 ± 2.4%10,200 ± 4.1%51,000 ± 1.7%217,000 ± 2.2%2,200,000 ± 5.3%10,800,000 ± 1.7%48,600,000 ± 1.8%
Map19 ± 0%7 ± 0%20 ± 0%140 ± 2.1%610 ± 4.0%2,500 ± 3.9%13,900 ± 3.9%72,800 ± 0.9%360,000 ± 3.3%3,700,000 ± 8.2%20,700,000 ± 1.6%75,000,000 ± 3.6%
m.Buffer0 ± 0%1 ± 0%1 ± 0%1 ± 0%1 ± 0%0 ± 0%1 ± 0%2 ± 100.0%-1 ± 800.0%-10 ± 423.5%0 ± 259.4%-200 ± 185.1%
m.Set19 ± 0%26 ± 0%50 ± 2.0%130 ± 1.5%508 ± 1.4%2,190 ± 0.5%11,900 ± 2.1%56,600 ± 1.4%235,000 ± 2.6%940,000 ± 2.2%3,800,000 ± 5.5%14,700,000 ± 5.0%
m.Map8 ± 0%16 ± 0%48 ± 2.1%146 ± 0.7%528 ± 1.1%2,210 ± 1.7%10,300 ± 2.8%54,100 ± 0.4%255,000 ± 2.0%1,140,000 ± 5.4%6,800,000 ± 5.4%30,000,000 ± 6.6%
lookup01416642561,0244,09616,19265,536262,1441,048,576
Array0 ± 0%1 ± 0%1 ± 0%1 ± 0%0 ± 0%0 ± 0%1 ± 0%-1 ± 200.0%4 ± 200.0%0 ± 675.0%100 ± 71.6%-200 ± 215.5%
List0 ± 0%1 ± 0%8 ± 0%103 ± 1.0%2,390 ± 0.5%47,200 ± 1.0%870,000 ± 2.6%16,900,000 ± 2.8%
Vector0 ± 0%1 ± 0%5 ± 0%17 ± 0%104 ± 2.9%440 ± 2.7%1,780 ± 2.6%8,940 ± 1.1%38,000 ± 1.1%198,000 ± 1.3%930,000 ± 1.6%4,260,000 ± 1.7%
Set0 ± 0%18 ± 0%81 ± 1.2%507 ± 0.6%1,980 ± 0.8%7,800 ± 1.8%39,800 ± 1.8%203,000 ± 1.1%1,040,000 ± 2.3%8,300,000 ± 2.8%
Map0 ± 0%12 ± 0%97 ± 1.0%578 ± 1.6%2,250 ± 2.8%9,400 ± 1.5%46,000 ± 2.2%233,000 ± 1.7%1,150,000 ± 2.9%11,400,000 ± 2.6%
m.Buffer0 ± 0%1 ± 0%1 ± 0%1 ± 0%1 ± 0%1 ± 0%1 ± 100.0%0 ± ∞%6 ± 133.3%-10 ± 200.0%0 ± 415.4%0 ± 970.0%
m.Set0 ± 0%5 ± 0%22 ± 0%97 ± 1.0%410 ± 2.7%1,690 ± 1.4%7,100 ± 2.2%31,300 ± 1.8%148,000 ± 1.4%690,000 ± 2.1%4,800,000 ± 1.6%
m.Map0 ± 0%6 ± 0%25 ± 0%112 ± 1.8%454 ± 1.3%1,910 ± 0.7%9,400 ± 0.5%52,500 ± 0.7%243,000 ± 2.7%1,760,000 ± 1.2%9,900,000 ± 5.3%

Raw Benchmark Data

下面的链接是原始测试数据,包括每一次独立的benchmark,而不是统计均值和标准差后的数据:

Benchmark Code

如果你想看测试代码,你可以浏览下面的链接:

或者下载:

在下载页面点击git clone scala-bench.bundle得到你自己的仓库,然后

  • sbt "~bench/runMain bench.MemoryMain" 进行内存benchmark
  • sbt "~bench/runMain bench.PerfMain" 执行性能测试然后将结果导入到 bench/target/results.json。注意在我的笔记本上它运行了4个小时,所以最好启动它然后第二天查看结果。
  • sbt "~bench/runMain bench.AnalyzeMain" 用来读取结果文件results.json,计算均值和标准差,将结果写入到一个markdown文件。

这个benchmark代码绝对不像使用JMH那么严格,JMH已经相当慢了,而这个benchmark套件也需要四个半小时。