Understanding Data

Data as a Concept

A deeper conceptual examination of data — exploring how data is defined computationally, why data is never objective, the relationship between data and 'fact', and the socio-technical nature of data.

Earlier weeks introduced the basic definitions and social dimensions of data; this lecture takes a more philosophical and conceptual approach. It asks four questions: What exactly is data in a computational sense? Is data truly a modern invention? Can data ever be objective? And what does it mean to say that data is social?

Data = Objects + Features

Manovich (2019) offers a precise computational definition of data that goes beyond the common-sense understanding. He argues that in a digital environment, "data" is not just any arbitrary collection of items — it is specifically something a computer can read, transform, and analyse. This definition imposes fundamental constraints on how we represent anything as data.

Manovich identifies three key decisions that shape how anything is represented as data:

What is chosen as objects — which phenomena or entities are selected for measurement and representation (e.g. people, events, transactions)
What features are chosen — which properties or attributes of those objects are captured (e.g. age, location, income, sentiment)
How those features are encoded — the technical and categorical choices about how information is translated into a computable format (e.g. categories, numerical scales, boolean flags)

These three decisions determine what becomes computable, manageable, knowable, and shareable through data science techniques. Crucially, they are always choices — made by people, shaped by values and assumptions, and reflective of particular worldviews. Different choices would produce different data about the same phenomenon.
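The three decisions can be made concrete with a minimal sketch. Everything in it is an assumption made for illustration and does not come from Manovich: the observations, the field names, the age cut-offs and the helper functions are all hypothetical. The same three observations are turned into two different datasets, first with people as the objects, then with cities as the objects.

```python
from collections import defaultdict

# Hypothetical observations, invented purely for illustration.
OBSERVATIONS = [
    {"name": "A", "age": 17, "income": 0, "city": "Leeds"},
    {"name": "B", "age": 34, "income": 2400, "city": "Leeds"},
    {"name": "C", "age": 71, "income": 1100, "city": "York"},
]


def age_group(age):
    """Encode age as a category; the cut-offs are an arbitrary choice."""
    if age < 18:
        return "under_18"
    if age < 65:
        return "18_to_64"
    return "65_plus"


def people_as_objects(observations):
    """Objects: people. Features: age, income. Encoding: category and boolean flag."""
    return [
        {"age_group": age_group(o["age"]), "has_income": o["income"] > 0}
        for o in observations
    ]


def cities_as_objects(observations):
    """Objects: cities. Features: resident count, mean income. Encoding: numbers."""
    incomes = defaultdict(list)
    for o in observations:
        incomes[o["city"]].append(o["income"])
    return {
        city: {"residents": len(values), "mean_income": sum(values) / len(values)}
        for city, values in incomes.items()
    }


by_person = people_as_objects(OBSERVATIONS)
by_city = cities_as_objects(OBSERVATIONS)
print(by_person)
print(by_city)
```

Neither dataset is wrong, but each makes different things knowable. The person-level version cannot say where anyone lives, while the city-level version averages away the individual with no income.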

Data is Never Objective

A central claim of this lecture — and of critical data studies more broadly — is that data is never fully objective. Data is historically linked to the rise of scientific modernity and the idea of the objective fact. But as Rosenberg (in Kitchin, 2014) demonstrates, facts and data are not the same thing:

"Facts are ontological, evidence is epistemological, data is rhetorical. A datum may also be a fact, just as a fact may be evidence. The existence of a datum has been independent of any consideration of corresponding ontological truth. When a fact is proven false, it ceases to be a fact. False data is data nonetheless."

This distinction is crucial. Data can persist even when it is false, incomplete, or misleading. Data can be constructed to serve particular rhetorical goals — to persuade, to mislead, or to justify decisions that serve particular interests. Understanding data as rhetorical means recognising that its power lies not in its accuracy but in its authority and legibility.
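One way to see why false data is still data is to look at what software can actually check. The sketch below is a hypothetical illustration, not drawn from Rosenberg or Kitchin: the Reading type, the station name, the plausibility range and the values are all assumed. The check inspects only the form of a record, so a false record passes it and feeds into the result just like a true one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    station: str
    temperature_c: float


def is_well_formed(reading):
    """Check form only (types and a plausible range); this says nothing about truth."""
    return (
        isinstance(reading.station, str)
        and isinstance(reading.temperature_c, float)
        and -90.0 <= reading.temperature_c <= 60.0
    )


# Assumption for this sketch: the temperature really was 12.0,
# so the second record is a false report.
true_reading = Reading("hypothetical-station", 12.0)
false_reading = Reading("hypothetical-station", 25.0)

readings = [true_reading, false_reading]
accepted = [r for r in readings if is_well_formed(r)]
mean_temperature = sum(r.temperature_c for r in accepted) / len(accepted)
print(len(accepted), mean_temperature)
```

Both records are accepted, and the false one shifts the computed mean. Nothing in the data itself marks it as false; that judgement has to come from outside the dataset.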

Is Data New?

A common assumption is that data is a fundamentally modern phenomenon, tied to computers and digital technology. Rosenberg (2013) challenges this assumption. While today's usage of the term 'data' is recent, tied to the rise of computational software and digital infrastructure, the underlying concept is not. Human societies have long collected and stored information for reference and decision-making. What is new is the scale, speed, and reach of contemporary data collection, and the way digital infrastructure makes data so pervasive and consequential.

Gitelman (2013) extends this argument with the claim that data is never raw. Even before analysis begins, data has already been shaped by the assumptions, methods, and priorities of those who collect it. It is, in the memorable phrase Gitelman takes from Geoffrey Bowker, always "cooked": processed, selected, and interpreted. The idea that there exists a pristine, unprocessed state of data that simply mirrors reality is a myth.

Data is Social

The most important conceptual claim of this lecture is that data is inherently social. Kitchin (2021) articulates this powerfully:

"Data then are not benign, neutral measures that reflect the world as it is, within technical constraints. What data are generated, and how they are produced, handled and used, is the result of choices and decisions by people. These, in turn, are shaped by intended outcomes, theories and concepts, research designs, procedures, protocols, standards, resources and finance, regulations and laws, organizational processes, ethics reviews, political context and so on. The cooking of data does not take place in a vacuum, but within context. Data-driven endeavours are not simply technical systems, but are socio-technical systems. That is, they are as much a result of human values, desires and social relations as they are scientific principles and technologies." (Kitchin, 2021)

This means that to understand data, we must understand the social contexts, institutional structures, and human decisions that produced it. Data science divorced from social analysis risks producing technically sophisticated but socially naive results.

Key Takeaways

Data is not new — its history predates the digital age
Data is never raw — it is always already processed and interpreted (Gitelman, 2013)
Data is never fully objective — it is rhetorical and shaped by human choices
Data is always social — produced within and shaped by social relations, values, and power (Kitchin, 2021)
Data does not equal fact — false data is still data (Rosenberg, in Kitchin, 2014)