Beyond the numbers: addressing data quality challenges in the cocoa sector

With stricter regulations on the horizon in the EU, companies sourcing cocoa face mounting pressure to prove that their supply chains are free from deforestation, forced labor, and child labor. New rules such as the EU Deforestation Regulation (EUDR) and proposed due diligence directives are raising the bar for traceability, transparency, and accountability in the cocoa sector.

In practice, this means one thing: data. Lots of it. But as data collection efforts surge across producing countries, a critical question is being overlooked: what is the quality of the data being gathered? And beyond compliance, can the data currently collected truly support meaningful interventions, track real progress, and drive lasting improvements in cocoa-producing countries?

In the cocoa sector, traders are the main data collectors. Their clients are the main beneficiaries.

Data in the cocoa sector is mainly collected for compliance and reporting purposes. Cocoa traders, who are often tasked with implementing sustainability programs on behalf of their downstream clients, play a central role in both executing these interventions and monitoring the associated data. Much of this data serves to meet clients' due diligence requirements or to assess the performance of the traders' own initiatives. While chocolate manufacturers also conduct surveys, they frequently delegate data collection to cocoa traders or NGOs, often with similar objectives: fulfilling legal obligations, responding to client expectations, or evaluating program outcomes.

Cocoa traders, who oversee the movement of cocoa along the supply chain and are best positioned to capture sourcing information, also collect traceability data. Certification bodies, for their part, step in to assess compliance with sustainability standards.

Can a patchwork of processes lead to data integrity?

Across the cocoa sector, those involved in data collection often gather similar types of information, particularly on child labor, deforestation, and household income, usually directly from farmers through household surveys. Because these data are mainly collected to meet reporting requirements, they must often be produced within strict deadlines. This creates strong operational pressure to complete surveys and monitoring activities within tight windows, sometimes at the expense of depth, accuracy, or respondent availability.

Traceability data, on the other hand, draws from more diverse sources and can be captured at multiple points along the supply chain. In practice, this results in a patchwork of digital systems, manual reporting, and handwritten records, which reflect both the complexity and the fragmentation of traceability mechanisms in the sector.

The profiles of the individuals responsible for collecting data also vary widely. In some cases, they belong to the cocoa communities themselves, as cooperative agents or lead farmers; in others, they are NGO staff or hired enumerators with varying levels of training, oversight, and independence.

The identity of the enumerator is a double-edged sword. Being from the community may help build trust and encourage openness from farmers, but it can also introduce bias, since the enumerator may have an incentive to protect peers or to avoid reporting information that could lead to sanctions.

For traceability-related data, collection is most commonly carried out by employees of cocoa cooperatives or trading companies, as they are directly involved in managing and documenting cocoa flows on a daily basis. These diverse roles and responsibilities contribute to a highly heterogeneous data ecosystem, one shaped as much by reporting imperatives as by real-world operational constraints.

Challenges in current data collection practices

A serious concern lies in who is doing the data collection, and how. While nearly every actor in the cocoa sector is involved in some form of data collection, few are equipped with the specialized skills it truly requires. Gathering high-quality data, especially on sensitive and complex issues, is not just a task to be added to existing roles; it demands technical know-how and methodological precision. Designing a reliable survey requires expertise to minimize bias and elicit useful, honest responses. Administering these surveys, especially on sensitive topics like child labor or household income, demands proper training and ethical awareness. Without solid preparation of enumerators in the field, the data collected is often unreliable. And when poor-quality data feeds into monitoring systems, there is a real risk of a "garbage in, garbage out" effect: flawed reporting, misleading metrics, and ineffective program decisions based on weak evidence.

Another key issue is the lack of external oversight in how data is managed after collection, especially when it involves sensitive topics. Too often, the information gathered is not monitored, audited, or verified by independent third parties. This is particularly problematic when the actor responsible for collecting the data is also the one reporting it to validate their own performance or program outcomes. Such a setup creates a clear conflict of interest and opens the door to biased reporting, inflated figures, or selective disclosure. In the case of child labor detection, even audits conducted by third parties often fall short: farmers are typically informed of the visit in advance, giving them time to prepare and potentially conceal any issues. As a result, these audits offer limited assurance of the accuracy or completeness of the data collected.

Beyond the technical and operational shortcomings, there is a deeper ethical concern around how farmers are repeatedly drawn into overlapping data exercises. They are often surveyed multiple times, by different organizations, on similar topics, year after year. This leads to growing fatigue and, in some cases, frustration among farmers who see little return for the time and information they provide. Despite being a critical source of insight, farmers are rarely, if ever, compensated for their participation. The prevailing rationale among data-collecting actors is that participation is simply part of the certification process. But this logic ignores an uncomfortable reality: the benefits of certification remain highly variable for farmers and, in some cases, negligible.

Good intentions, bad incentives?

As new regulations like the EU Deforestation Regulation (EUDR) raise the bar for traceability and compliance in the cocoa sector, they might also bring unintended consequences for data quality. These measures are driven by important environmental and social goals, but in practice, they rely heavily on the quality, accuracy, and reliability of field-level data systems that are often still underdeveloped or fragmented. When compliance becomes a prerequisite for market access, and data systems are not yet mature enough to support that compliance credibly, the pressure to “deliver clean data” can unintentionally encourage misreporting, corner-cutting, or strategic omission of information. In environments where independent verification is weak and accountability is diffuse, stricter rules can end up creating perverse incentives that undermine the very integrity of the data they aim to improve.

Collect smarter: pathways to better cocoa data

Collecting meaningful data is not just a logistical task; it takes expertise. Yet in the cocoa sector, the responsibility for gathering complex, sensitive information is often assigned to actors like traders or NGOs whose core expertise may lie elsewhere. This raises a fundamental question: are we asking the right people to do the job? Reliable, actionable data demands specialized skills, from survey design to ethical fieldwork, and requires that enumerators receive not only robust technical training but also continuous oversight and support.

At the same time, the sector's fragmented approach to data collection compounds this challenge. With limited coordination between actors, each operating with their own tools, timelines, and methodologies, data collection in the cocoa sector lacks alignment and integration. Traders, NGOs, and certification bodies frequently collect similar types of data, yet these efforts rarely feed into a shared system. In many cases, actors invest in parallel digital platforms, each with its own traceability protocols and reporting formats, making it difficult to compare, aggregate, or validate data across the supply chain.

Encouragingly, initiatives like the International Cocoa Initiative’s push for a common methodology on child labor monitoring, or the World Cocoa Foundation’s efforts to standardize how cocoa farmer income data is gathered and used, show that progress is possible when stakeholders align around shared goals and practices. Building on these examples, the sector should move toward more open data-sharing frameworks, common indicators, and interoperable systems.

Marine Jouvin, PhD