lynx286 (signature: "One false step, and you become a legend for the ages!")
DataStage doesn't know how large your data is, so it cannot make an
informed choice between combining data with a Join stage or a Lookup
stage. Here's how to decide which to use:
Two kinds of dataset are being combined. One is the primary or driving
dataset, sometimes called the left of the join. The others are the
reference datasets, or the right of the join.
In all cases we are concerned with the size of the reference datasets. If
these take up a large amount of memory relative to the physical RAM of
the computer you are running on, then a Lookup stage may thrash, because
the reference datasets may not fit in RAM along with everything else
that has to be in RAM. This results in very slow performance, since each
lookup operation can, and typically does, cause a page fault and an I/O
operation.
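To make the memory concern concrete, here is a minimal Python sketch of what a lookup does conceptually. It is not DataStage code. The function names, the row layout (dicts) and the choice to drop unmatched rows are assumptions made for illustration. The point is that the whole reference dataset is loaded into a hash table first, and every driving row then makes a random access into it, which is what turns into page faults once the table no longer fits in RAM.

```python
def build_lookup(reference_rows, key):
    """Load the entire reference dataset into an in-memory hash table."""
    table = {}
    for row in reference_rows:
        table[row[key]] = row  # a later duplicate key overwrites an earlier one
    return table


def lookup_stage(driving_rows, reference_rows, key):
    """Hypothetical lookup: probe the in-memory table once per driving row."""
    table = build_lookup(reference_rows, key)  # must fit in RAM to stay fast
    for row in driving_rows:
        match = table.get(row[key])  # random access into the table
        if match is not None:  # assumption: unmatched rows are dropped
            yield {**row, **match}
```

The driving rows are streamed and never held in memory all at once, which is why only the size of the reference data matters here.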
So, if the reference datasets are big enough to cause trouble, use a
Join. A Join does a high-speed sort on the driving and reference
datasets. This can involve I/O if the data is big enough, but that I/O is
highly optimized and sequential. Once the sort is over, the join
processing itself is very fast and does not involve paging or random I/O.
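The rule of thumb and the sort-then-merge idea can be sketched in Python as follows. This is an illustration only, not DataStage's implementation: `choose_stage`, its 50% RAM threshold and `sort_merge_join` are hypothetical names and values, and the in-memory `sorted()` call stands in for the external, disk-backed sort a real engine would use on large data.

```python
def choose_stage(reference_bytes, ram_bytes, ram_fraction=0.5):
    """Pick 'lookup' if the reference data fits comfortably in RAM.

    ram_fraction is an assumed safety margin, not a DataStage setting.
    """
    return "lookup" if reference_bytes <= ram_fraction * ram_bytes else "join"


def sort_merge_join(driving_rows, reference_rows, key):
    """Inner join: sort both inputs on the key, then merge in one pass."""
    left = sorted(driving_rows, key=lambda r: r[key])
    right = sorted(reference_rows, key=lambda r: r[key])
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1  # driving row has no match, move on
        elif lk > rk:
            j += 1  # reference row is behind, catch up
        else:
            # Emit every reference row sharing this key for this driving row.
            k = j
            while k < len(right) and right[k][key] == lk:
                out.append({**left[i], **right[k]})
                k += 1
            i += 1
    return out
```

After the sort, both inputs are only ever read front to back, so the merge needs just the current rows from each side in memory rather than the whole reference dataset.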