Column

A column-oriented database stores the data of each table column by column, whereas a traditional RDBMS stores it row by row.

It can manage large datasets and read them quickly. It supports complex analytical computations and is most effective when the data in a column has the same type.
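Here is a minimal sketch of the two layouts. Plain Python lists stand in for disk pages, and the `users` table, its values, and the names `row_store` and `column_store` are hypothetical. It stores the same three records both ways:

```python
# Hypothetical "users" table: (id, name, age), for illustration only.
rows = [
    (1, "alice", 34),
    (2, "bob", 27),
    (3, "carol", 41),
]

# Row-oriented layout: all values of one record are stored together.
row_store = list(rows)

# Column-oriented layout: all values of one column are stored together.
column_store = {
    "id": [r[0] for r in rows],
    "name": [r[1] for r in rows],
    "age": [r[2] for r in rows],
}

print(row_store[1])         # (2, 'bob', 27)
print(column_store["age"])  # [34, 27, 41]
```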

Use cases: content management systems (CMS), blogs, counters, data that expires after a set time, etc.

Examples: Cassandra, HBase, Bigtable, Parquet (a columnar file format rather than a database)

Pros

Only the attributes a query needs are read from disk
Adding a new column is easy

Cons

Combining values from multiple columns is costly (tuple reconstruction)
Inserting a new tuple is costly (its values must be written to every column)
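Here is a toy sketch of the pros and cons above. It uses a hypothetical in-memory `column_store` (a dict of Python lists, not a real storage engine). The helper names `average_age`, `reconstruct_tuple`, `insert_tuple`, and `add_column` are illustrative only:

```python
# Hypothetical table stored column by column: one Python list per column.
COLUMNS = ("id", "name", "age")
column_store = {
    "id": [1, 2, 3],
    "name": ["alice", "bob", "carol"],
    "age": [34, 27, 41],
}

def average_age(store):
    # Pro: an aggregate over one attribute only reads that attribute's column.
    ages = store["age"]
    return sum(ages) / len(ages)

def reconstruct_tuple(store, row, columns=COLUMNS):
    # Con (tuple reconstruction): one lookup per column to rebuild a record.
    return tuple(store[c][row] for c in columns)

def insert_tuple(store, values, columns=COLUMNS):
    # Con: a single insert has to append to every column separately.
    for c, v in zip(columns, values):
        store[c].append(v)

def add_column(store, name, default=None):
    # Pro: a new column is just a new list; existing columns stay untouched.
    n_rows = len(store["id"])
    store[name] = [default] * n_rows

print(average_age(column_store))           # 34.0
print(reconstruct_tuple(column_store, 1))  # (2, 'bob', 27)
insert_tuple(column_store, (4, "dave", 30))
add_column(column_store, "city")
```

In a real column store, each column usually lives in separate files or pages. The per-column lookups in `reconstruct_tuple` and the per-column writes in `insert_tuple` therefore turn into separate disk accesses.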

Column compression

When a column contains repeated values, we can encode it to reduce the dataset size and speed up queries.

Sometimes we do not even need to decode the column to answer a query. For example, summing the encoded values (such as run lengths) gives the number of elements.

run-length

Store each run of identical consecutive values once, together with its number of repetitions. Format: (value, start row, run length).
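Here is a minimal Python sketch of run-length encoding in the (value, start row, run length) format. The function names and sample column are hypothetical. `rle_count` answers a count by summing run lengths, without decoding the column:

```python
from itertools import groupby

def rle_encode(column):
    """Encode a column as (value, start_row, run_length) triples."""
    runs = []
    start = 0
    for value, group in groupby(column):  # groups consecutive equal values
        length = sum(1 for _ in group)
        runs.append((value, start, length))
        start += length
    return runs

def rle_decode(runs):
    """Expand the triples back into the original column."""
    column = []
    for value, _start, length in runs:
        column.extend([value] * length)
    return column

def rle_count(runs, value=None):
    # Count rows (optionally only those equal to `value`) straight from the runs.
    return sum(length for v, _start, length in runs if value is None or v == value)

col = ["FR", "FR", "FR", "DE", "DE", "FR"]
runs = rle_encode(col)
print(runs)                   # [('FR', 0, 3), ('DE', 3, 2), ('FR', 5, 1)]
print(rle_count(runs))        # 6
print(rle_count(runs, "FR"))  # 4
```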

bit-vector

For each distinct value in the column, create a bit vector with one bit per row, set to 1 where the row holds that value. This works well when there are few distinct values.
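Here is a minimal sketch of bit-vector encoding. For readability it uses a Python list of 0/1 per distinct value; real implementations typically pack the bits into machine words. The function names are hypothetical:

```python
def bitvector_encode(column):
    """Build one bit vector (one 0/1 entry per row) for each distinct value."""
    vectors = {}
    for value in dict.fromkeys(column):  # distinct values, in first-seen order
        vectors[value] = [1 if v == value else 0 for v in column]
    return vectors

def bitvector_decode(vectors, n_rows):
    """Rebuild the column: each row takes the value whose bit is set."""
    column = [None] * n_rows
    for value, bits in vectors.items():
        for row, bit in enumerate(bits):
            if bit:
                column[row] = value
    return column

col = ["M", "F", "F", "M", "F"]
vectors = bitvector_encode(col)
print(vectors)            # {'M': [1, 0, 0, 1, 0], 'F': [0, 1, 1, 0, 1]}
print(sum(vectors["F"]))  # 3 rows hold "F", found without decoding
```

Counting the rows that hold a value is just the sum of its vector. A predicate over several values can be answered by OR-ing their vectors.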

dictionary

Replace values with shorter placeholders (codes) and maintain a dictionary that maps the placeholders back to the original values.
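Here is a minimal sketch of dictionary encoding. It assumes codes are small integers assigned in order of first appearance, which is one possible choice among many. The function names are hypothetical:

```python
def dict_encode(column):
    """Replace each value by an integer code; return (codes, dictionary)."""
    code_of = {}  # value -> code
    codes = []
    for value in column:
        if value not in code_of:
            code_of[value] = len(code_of)  # next unused code
        codes.append(code_of[value])
    dictionary = list(code_of)  # dictionary[code] -> original value
    return codes, dictionary

def dict_decode(codes, dictionary):
    """Map every code back to its original value."""
    return [dictionary[c] for c in codes]

col = ["Paris", "Berlin", "Paris", "Madrid", "Berlin"]
codes, dictionary = dict_encode(col)
print(codes)       # [0, 1, 0, 2, 1]
print(dictionary)  # ['Paris', 'Berlin', 'Madrid']
```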

frame of reference

Choose the median value as the reference and store each value as its offset from that reference. Values whose offset is too big are stored as exceptions, flagged with a # marker.
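Here is a minimal sketch of frame-of-reference encoding. It makes four assumptions: the values are integers; `statistics.median_low` picks the reference, so the reference is an actual value of the column; offsets must fit in one signed byte (the `OFFSET_MIN`/`OFFSET_MAX` bounds are hypothetical); and any value outside that range is stored raw as a `("#", value)` exception:

```python
import statistics

OFFSET_MIN, OFFSET_MAX = -128, 127  # assumption: offsets fit in one signed byte

def for_encode(column):
    """Frame-of-reference encoding: offsets from the (low) median value."""
    reference = statistics.median_low(column)
    encoded = []
    for value in column:
        offset = value - reference
        if OFFSET_MIN <= offset <= OFFSET_MAX:
            encoded.append(offset)
        else:
            # Offset too big: keep the raw value behind a "#" marker.
            encoded.append(("#", value))
    return reference, encoded

def for_decode(reference, encoded):
    """Add the reference back to offsets; exceptions are already raw values."""
    return [e[1] if isinstance(e, tuple) else reference + e for e in encoded]

col = [1003, 1001, 1000, 1005, 999, 5000]
reference, encoded = for_encode(col)
print(reference)  # 1001
print(encoded)    # [2, 0, -1, 4, -2, ('#', 5000)]
```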

differential

Like frame of reference, but each value is stored as the difference from the preceding row instead of from the reference.

Differences that are too big can also be stored as exceptions with a # marker.
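Here is a minimal sketch of differential (delta) encoding. It assumes integer values and deltas that must fit in one signed byte (the `DELTA_MIN`/`DELTA_MAX` bounds are hypothetical). The first row has no predecessor, so this sketch stores it raw as a `("#", value)` entry, the same way it stores big-delta exceptions:

```python
DELTA_MIN, DELTA_MAX = -128, 127  # assumption: deltas fit in one signed byte

def delta_encode(column):
    """Store each value as the difference with the preceding row."""
    if not column:
        return []
    encoded = [("#", column[0])]  # first row: no predecessor, store it raw
    for prev, value in zip(column, column[1:]):
        delta = value - prev
        if DELTA_MIN <= delta <= DELTA_MAX:
            encoded.append(delta)
        else:
            encoded.append(("#", value))  # exception: raw value behind "#"
    return encoded

def delta_decode(encoded):
    """Rebuild values by adding each delta to the previously decoded value."""
    column = []
    for e in encoded:
        if isinstance(e, tuple):
            column.append(e[1])
        else:
            column.append(column[-1] + e)
    return column

col = [100, 102, 103, 103, 500, 501]
encoded = delta_encode(col)
print(encoded)  # [('#', 100), 2, 1, 0, ('#', 500), 1]
```

Deltas stay small when consecutive values are close, for example in a sorted column. In that case this variant can use a narrower offset range than frame of reference.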
