What is Data?

Introduction to the concept of data — its definition, history, types, and relationship to knowledge. Covers the DIKW pyramid, big data, and the difference between raw and processed data.

At its most basic level, data consists of measurements or observations collected as information for reference or analysis (Australian Bureau of Statistics). Data can be quantitative (numerical) or qualitative (descriptive), and it comes from a wide range of sources in a variety of forms. The word also has a colloquial sense, as in "mobile data plans", which illustrates how deeply the concept has penetrated everyday language.

A Brief History of Data

The word data comes from the Latin datum, "thing given", which derives from dare, "to give". It acquired a new technical meaning in the 1890s, referring to numerical facts collected for future reference. By the 1940s it was associated with computational information, giving rise to terms like database and data processing. This shift reflects a broader transformation in how modern societies generate, store, and use information. Understanding this history helps us recognise that data is not a neutral or inevitable category: it was shaped by specific social, technological, and intellectual developments.

Types of Data

Data is classified in multiple overlapping ways. Kitchin (2014) identifies several key distinctions:

  • Quantitative vs Qualitative — numerical data (counts, measurements) vs non-numerical data (observations, descriptions, categories)
  • Structured, Semi-structured, and Unstructured — the degree to which data is organised in fixed formats (e.g. databases, spreadsheets) vs loosely formatted (e.g. emails, social media posts)
  • Captured, Exhaust, Transient, Derived — how data is generated: deliberately captured, produced as a by-product of activity, temporary and short-lived, or computed from other data
  • Primary, Secondary, Tertiary — whether data was generated by the researcher, sourced from existing records, or aggregated from secondary sources
  • Indexical vs Attribute Data — whether data can function as a unique identifier (e.g. a fingerprint) or describes properties of an object
  • Metadata — data about data; descriptive information that provides context (e.g. file creation dates, GPS coordinates embedded in photos)
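
As a rough illustration of a few of these distinctions, the sketch below holds one invented set of temperature observations in structured, semi-structured, and unstructured form, then computes a derived value and attaches metadata. All field names and values are hypothetical and chosen only for illustration; they do not come from Kitchin (2014).

```python
import csv
import io
import json
from statistics import mean

# Structured: fixed columns, every row has the same fields (invented values).
structured_csv = "city,day,temp_c\nSydney,Mon,31.0\nSydney,Tue,37.0\nSydney,Wed,34.0\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))

# Semi-structured: fields are tagged, but records may differ in shape.
semi_structured = json.loads(
    '{"city": "Sydney", "readings": [{"day": "Tue", "temp_c": 37.0, "note": "heatwave"}]}'
)

# Unstructured: free text with no predefined fields.
unstructured = "Sweltering in Sydney today, it hit 37 by lunchtime!"

# Quantitative vs qualitative attributes of the same observation.
quantitative = float(rows[1]["temp_c"])               # a measurement
qualitative = semi_structured["readings"][0]["note"]  # a description

# Derived data: computed from other data rather than captured directly.
derived_mean = mean(float(r["temp_c"]) for r in rows)

# Metadata: data about the dataset itself (hypothetical fields).
metadata = {
    "source": "hypothetical weather log",
    "row_count": len(rows),
    "unit": "degrees Celsius",
}

print(quantitative, qualitative, derived_mean, metadata["row_count"])
```

The same temperature appears in all three forms; what changes is how much organisation a program can rely on when reading it.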

The DIKW Pyramid

The Data–Information–Knowledge–Wisdom (DIKW) pyramid is a widely used model illustrating the progression from raw data to actionable insight:

  • Data — raw, unprocessed facts and figures without context (e.g. the number 37)
  • Information — organised, structured, and contextualised data that answers who, what, when, where (e.g. 37 degrees Celsius, recorded in Sydney on Tuesday)
  • Knowledge — the result of analysing information to uncover patterns, trends, and relationships; answers how (e.g. 37°C is above average for this time of year)
  • Wisdom — well-informed decisions and effective action based on deep understanding; answers why and guides choices (e.g. issuing a health warning for vulnerable populations)

The pyramid is useful but has been critiqued for presenting a linear, overly neat progression. In practice, the boundaries between levels are blurry and context-dependent.
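
The four levels can be sketched in code using the temperature example from the list above. This is a minimal illustration, not a formal model: the seasonal average of 26.0 and the warning threshold of 8.0 degrees are invented assumptions, as are the class and function names.

```python
from dataclasses import dataclass

# Data: a bare value with no context.
raw_value = 37


@dataclass
class Reading:
    """Information: the value plus what/where/when context."""
    value: float
    unit: str
    place: str
    day: str


def to_information(value: float) -> Reading:
    # The context attached here is assumed for illustration.
    return Reading(value=value, unit="°C", place="Sydney", day="Tuesday")


def to_knowledge(reading: Reading, seasonal_average: float) -> float:
    """Knowledge: compare against a baseline to reveal a pattern."""
    return reading.value - seasonal_average


def to_wisdom(anomaly: float, warning_threshold: float = 8.0) -> str:
    """Wisdom: choose an action based on what the pattern means."""
    if anomaly >= warning_threshold:
        return "issue health warning"
    return "no action"


reading = to_information(raw_value)
anomaly = to_knowledge(reading, seasonal_average=26.0)  # hypothetical average
decision = to_wisdom(anomaly)
print(reading, anomaly, decision)
```

Each step depends on something supplied from outside the data (context, a baseline, a threshold), which is one reason the boundaries between levels are hard to draw cleanly.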

Raw Data and the Problem of Objectivity

The term raw data implies that data can be collected in a pure, unmediated state. Gitelman (2013), quoting Geoffrey Bowker, challenges this directly: "Raw data is both an oxymoron and a bad idea; to the contrary, data should be cooked with care." Every act of data collection involves choices: what to measure, how to measure it, what to include, and what to leave out. These choices are shaped by human values, institutional structures, and social contexts. This means data can never fully escape interpretation: it is always, in some sense, already processed.

Big Data

Big data refers to datasets so large or complex that they require specialised software tools to process. It is commonly characterised by three properties, the three Vs (see Kitchin and McArdle, 2016):

  • Volume — enormous quantities of data (far beyond the capacity of conventional databases)
  • Velocity — data created and processed in real-time or near-real-time (e.g. social media feeds, sensor networks)
  • Variety — data in structured, semi-structured, and unstructured forms simultaneously (e.g. numbers, text, images, video)

Big data is often contrasted with small data: small data tends to be limited in volume, slow-moving, and based on samples, while big data is fast-moving and aims to be exhaustive, covering entire populations. This distinction matters for research methodology: big data may seem to offer a complete picture, but it introduces its own forms of bias and error.
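
One such bias can be sketched with invented numbers. The example below assumes a population of 1,200 people and an imagined platform used only by people under 40. The platform's records are "exhaustive" for its own users, yet they estimate the population's mean age far less accurately than a much smaller systematic sample. The population, the age cut-off, and the sampling step are all assumptions made for illustration.

```python
from statistics import mean

# Hypothetical population: 1,200 people with ages spread evenly from 18 to 77.
population_ages = [18 + (i % 60) for i in range(1200)]

# "Big" dataset: every user of an imagined platform, assumed to be under 40.
platform_ages = [age for age in population_ages if age < 40]

# "Small" dataset: a systematic sample of every 7th person in the population.
sample_ages = population_ages[::7]

population_mean = mean(population_ages)
platform_error = abs(mean(platform_ages) - population_mean)
sample_error = abs(mean(sample_ages) - population_mean)

print(f"population mean age: {population_mean:.1f}")
print(f"platform data: n={len(platform_ages)}, error={platform_error:.1f} years")
print(f"sample data:   n={len(sample_ages)}, error={sample_error:.1f} years")
```

In this toy setup the larger dataset is further from the truth because of who it leaves out, not because of how many records it holds.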

A Media Studies Approach to Data

This unit combines critical data studies with communication skills. Rather than treating data purely as information or facts for predictive analysis, a media studies approach is concerned with the power of data — who generates it, who controls it, and how it shapes society. This involves developing both theoretical understanding and practical communication skills: not just how to use data, but how to communicate with and about data critically and responsibly.

Why Study Data?

  1. Data is everywhere — in science, policy, business, media, and daily life
  2. Data has enormous economic and social value, and is often described as "the new oil" (Toonders, 2014)
  3. Data literacy skills are increasingly essential for modern work and civic participation
  4. Data shapes decisions that affect people's lives — making critical engagement with it a matter of social importance