Thursday, February 16, 2012

Can a single table be used for a cube?

Hi,
Maybe a silly question, since I am new to DW/OLAP, etc.
I was wondering: what if I have a single table containing everything
that could be used as 'measures' PLUS all the possible information
about the dimensions, in the same table?
Basically this would end up with a highly denormalized table, but
wouldn't it give very good performance, since no joins would be
required?
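Just to sketch what I mean (the table and column names are made up):

    CREATE TABLE SalesFlat (
        -- measures
        SalesAmount  money,
        Quantity     int,
        -- attributes that would normally live in dimension tables
        OrderDate    datetime,
        ProductName  varchar(100),
        CategoryName varchar(100),
        CustomerName varchar(100),
        Region       varchar(50)
    );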
I'd appreciate feedback and guidance from the experienced audience.
Thanks.
I'm not sure it would give good performance. Consider a star schema
with a date dimension, for example. To know what values are available
for the dimension, it's compacted into this nice small table that only
has to deal with dates. If you denormalized all of that into one table,
then you'd have to scan the whole table to know what dates are
available for slicing and dicing.
If you did that with all of your dimensions, you'd end up having to
scan the whole table several times to get all of the possible values
for slicing and dicing. You could, I suppose, index those, but now
you've created some overhead where you didn't have it before.
Besides--and I'm just guessing here--I would think that under the
covers there's not really a join going on anyway. The IDs for the
dimensions are probably used for retrieving data from the fact table.
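For example--hypothetical names here--just finding out which dates
exist for slicing means scanning the big flat table, versus simply
reading the small dimension table:

    -- flat table: a full scan (or an extra index) just to list the dates
    SELECT DISTINCT OrderDate FROM SalesFlat;

    -- star schema: the date dimension already is that list
    SELECT DateKey, CalendarDate FROM DimDate;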

Hi,
Thanks for the reply. Based on my findings over the past few days, I
have concluded that having only a single table serve as both the fact
table and the dimensions would be a VERY BAD APPROACH, so I agree with
your suggestions.
To answer your question: yes, I was not planning on any joins at query
time... the single table would have been made available by a DTS
package run overnight, which would have joined the many source tables
to produce the single, highly denormalized table...
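Roughly, the overnight DTS step would have run something like this
(again, made-up table names):

    INSERT INTO SalesFlat
    SELECT f.SalesAmount, f.Quantity,
           d.CalendarDate,
           p.ProductName, p.CategoryName,
           c.CustomerName, c.Region
    FROM FactSales f
    JOIN DimDate     d ON d.DateKey     = f.DateKey
    JOIN DimProduct  p ON p.ProductKey  = f.ProductKey
    JOIN DimCustomer c ON c.CustomerKey = f.CustomerKey;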
But you know, to be honest, this is where I get a little confused:
although there are many advantages to normalization, query performance
from a denormalized table is supposed to be far superior... I guess the
counter-argument to this is your own comment, i.e. "If you did that
with all of your dimensions, you'd end up having to scan the whole
table several times to get all of the possible values for slicing and
dicing..." and, as I understand and assume, the calculated aggregations
would also end up taking a lot of space on the hard disk (the server's
or the local PC's, depending on whether I am using ROLAP, MOLAP, or
HOLAP).
Please DO comment in response to my mail, as it will help me confirm my
understanding of many things so far.
Thanks again for the reply, and regards.

> I'm not sure it would give good performance. Consider a star schema with a date
> dimension, for example. To know what values are available for the dimension, it's
> compacted into this nice small table that only has to deal with dates. If you
> denormalized all of that into one table, then you'd have to scan the whole table
> to know what dates are available for slicing and dicing.
> If you did that with all of your dimensions, you'd end up having to scan the
> whole table several times to get all of the possible values for slicing and
> dicing. You could, I suppose, index those, but now you've created some overhead
> where you didn't have it before.
> Besides--and I'm just guessing here--I would think that under the covers there's
> not really a join going on anyway. The IDs for the dimensions are probably
> used for retrieving data from the fact table.
>
