Content: Markdown
Framework: Zola
Hosting: Github Pages
Styling: SCSS
Theme: Consoler Dark
Title: parkerjones.dev

wtf is columnar data

3 minute read Published: 2025-01-06

What is Columnar Storage?

Columnar storage is a way of organizing data in a database where values of a single column are stored together, rather than storing all the data for a single row together. This storage method is optimized for read-heavy, analytical workloads where queries often operate on a subset of columns over many rows.

How It Works:

In row-oriented databases, like MySQL or PostgreSQL, data is stored row by row:

Row 1: [ID1, Name1, Age1, Salary1]
Row 2: [ID2, Name2, Age2, Salary2]

In column-oriented databases, data is stored column by column:

ID:   [ID1, ID2, ID3, ...]
Name: [Name1, Name2, Name3, ...]
Age:  [Age1, Age2, Age3, ...]
Salary: [Salary1, Salary2, Salary3, ...]

Key Differences Between Columnar and Row-Oriented Storage

AspectRow-Oriented StorageColumnar Storage
Data LayoutStores data row by row.Stores data column by column.
Best Use CaseTransactional workloads (OLTP) like updating user accounts, processing orders, etc.Analytical workloads (OLAP) like aggregations, filtering, and reporting.
Query PerformanceGood for queries involving entire rows.Fast for queries involving a few columns over many rows.
Data CompressionLess effective; different data types in rows make compression harder.Highly compressible; similar data types in columns lead to better compression.
Write PerformanceFaster for frequent row-level updates.Slower for updates, as entire columns may need rewriting.
ExamplesMySQL, PostgreSQL, SQLite.DuckDB, Apache Parquet, ClickHouse.

Why Columnar Storage Matters

1. Efficient Analytical Queries

Columnar storage is optimized for queries that:

By only reading the columns needed, columnar storage avoids wasting time scanning irrelevant data.

2. Better Compression

Because all the data in a column is of the same type (e.g., integers, floats, strings), columnar storage compresses data better. For example:

This reduces the size of data on disk and speeds up queries.

3. Vectorized Processing

Columnar databases often use vectorized execution, which processes chunks of columns in batches. This is faster than row-by-row processing.


When to Use Columnar vs. Row-Oriented Databases

Use Columnar Storage for:

Use Row-Oriented Storage for:


Example Comparison

Scenario: Salary Data for Employees

If you want to find the average salary of employees:

This makes columnar storage faster and more resource-efficient for this type of query.


Columnar Storage in Action

Modern tools and formats like DuckDB, Apache Parquet, and Amazon Redshift leverage columnar storage to handle analytics efficiently.

Would you like help setting up a columnar database or converting your data into columnar formats like Parquet?