
Every European cosmetic, detergent or paint formulator applying for the EU Ecolabel runs into the same wall. The reference document — the Detergent Ingredients Database (DID list) — is organized partly by chemical family and partly by substance, with no clear guidance on which is which. Formulators work from CAS numbers on their Safety Data Sheets; the Ecolabel criteria refer to DID entries. The bridge between the two exists nowhere officially. Each company ends up rebuilding its own mapping, inconsistently, dossier after dossier.

The data gap

The 2023 DID list is, in practice, a hybrid system. Of its 265 entries, just over a third (95 entries, 36 %) point to a single, specific substance: for example, entry DID 2022, “C16–18 Fatty acid methyl Ester Sulphonate”, maps uniquely to CAS 3076-26-4. Most of the others describe chemical families of varying breadth: small groups of 2–5 CAS, medium groups of 6–10, and a tail of broad families covering 11 or more CAS each. The remaining 33 entries are too generic to be mapped to any specific CAS at all: ethoxylation series (EO 3–12), carbon-chain ranges (C14–20), or UVCB substances whose composition is defined by a process rather than a structure.

Figure 1: Distribution of the number of CAS per DID across the 265 entries of the 2023 list. Left: per-DID histogram (mapped DIDs only, n = 232). Right: bucketed view of all 265 entries, highlighting that 95 of them are already single-substance entries.
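
The bucketing behind the right-hand panel can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the study: the function name `bucket`, the entry labels, and the toy counts are assumptions. Only the bucket boundaries (1, 2–5, 6–10, 11 or more, non-mappable) come from the text above.

```python
from collections import Counter

def bucket(n_cas: int) -> str:
    """Assign a DID entry to a breadth bucket from its number of mapped CAS."""
    if n_cas == 0:
        return "non-mappable"
    if n_cas == 1:
        return "single substance"
    if n_cas <= 5:
        return "small family (2-5)"
    if n_cas <= 10:
        return "medium family (6-10)"
    return "broad family (11+)"

# Toy input (hypothetical labels and counts, NOT the real dataset):
# DID entry -> number of CAS mapped to it.
cas_per_did = {"DID-A": 1, "DID-B": 4, "DID-C": 7, "DID-D": 23, "DID-E": 0, "DID-F": 1}

distribution = Counter(bucket(n) for n in cas_per_did.values())
for name, count in distribution.items():
    print(f"{name}: {count}")
```

Run on the real mapping, the same counter would reproduce the bucketed view of Figure 1.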

This distribution matters. When a DID already points to one CAS, a formulator knows exactly which environmental profile applies. When a DID covers twenty or more CAS, the choice becomes a judgment call — and two evaluators working in good faith can reach different conclusions. The exact choice changes the calculated Critical Dilution Volume (CDV) of the final product: a borderline CAS assigned to a different DID means different Toxicity Factors (TF) and Degradation Factors (DF), which in turn can decide whether a formulation passes the Ecolabel threshold or not.
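
A small numerical sketch shows how sensitive the result can be. It assumes the CDV formula as usually stated in the EU Ecolabel detergent criteria, CDV = 1000 × Σ (dosage × DF / TF), which should be checked against the criteria document for the product group concerned. Every dosage, DF, and TF value below is invented for illustration and is not taken from the DID list.

```python
def cdv(ingredients):
    """Critical Dilution Volume, assumed here to be 1000 * sum(dosage * DF / TF).

    Each ingredient is a tuple (dosage in g, degradation factor DF, toxicity factor TF).
    All numbers used with this function below are made up for illustration.
    """
    return 1000 * sum(dosage * df / tf for dosage, df, tf in ingredients)

# Hypothetical ingredients that both evaluators treat identically.
base = [(5.0, 0.05, 0.5), (1.0, 0.15, 1.0)]

# The same borderline surfactant (2 g) under two plausible DID attributions.
as_did_a = base + [(2.0, 0.05, 0.10)]   # attribution A: DF 0.05, TF 0.10
as_did_b = base + [(2.0, 0.15, 0.04)]   # attribution B: DF 0.15, TF 0.04

print(f"CDV with attribution A: {cdv(as_did_a):.0f}")
print(f"CDV with attribution B: {cdv(as_did_b):.0f}")
```

With these invented numbers the two attributions give roughly 1650 and 8150, a fivefold difference for an identical formulation, which is the kind of gap that can move a product across a threshold.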

Why it matters

This is not a documentation issue. It is a reproducibility issue. When the CAS→DID attribution is left to individual judgment, the same formulation can yield different CDV values depending on who evaluates it. For the industry, that means regulatory uncertainty. For the Ecolabel scheme itself, it means the scientific rigour of the underlying calculation is weakened by an arbitrary upstream decision.

The ambiguity does not stop at the Ecolabel itself. Any tool that uses the DID list as its environmental reference — whether it is an industry screening workflow, a consultancy spreadsheet, or a software platform — inherits the same upstream uncertainty. You cannot build reliable sustainability assessments on an ambiguous reference dataset.

Our approach

We rebuilt the mapping from scratch, using five independent sources and documenting every attribution:

• A.I.S.E. ESC Tool workbook — the industry-maintained reference, pre-AI, built on expert chemist consensus.

• HERA Risk Assessments — 42 technical reports covering the most-used household detergent substances.

• UBA Surfactants Suspect List (2018) — systematic CAS/name catalogue from the German Environment Agency, enriched with DTXSIDs.

• Explicit CAS numbers and IUPAC names in the DID list itself — sometimes the information is right there, just not machine-readable.

• AI-assisted web lookup — targeted queries against PubChem, ECHA, and NIST, run in a dual-mode setup (local agent + web-search chat) for cross-validation.

Every (DID, CAS) couple was then scored for reliability on a five-level scale — from very high (multi-source agreement with validated chemical structure) down to flagged (semantic inconsistency detected and awaiting expert review).

Figure 2: Five-source consolidation pipeline. Each (DID, CAS) couple carries a traceable provenance and a reliability score.
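
The consolidation and scoring idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual pipeline: the source labels, the scoring thresholds, and the three middle level names ("high", "medium", "low") are hypothetical. Only the top level ("very high"), the bottom level ("flagged"), and the five sources come from the text above. The CAS values are placeholders.

```python
from collections import defaultdict

# Hypothetical short labels for the five sources listed above.
SOURCES = ("AISE_ESC", "HERA", "UBA", "DID_LIST", "AI_LOOKUP")

def consolidate(records):
    """Merge (did, cas, source) records into {(did, cas): set of supporting sources}."""
    provenance = defaultdict(set)
    for did, cas, source in records:
        if source not in SOURCES:
            raise ValueError(f"unknown source: {source}")
        provenance[(did, cas)].add(source)
    return dict(provenance)

def reliability(sources, structure_validated, semantic_conflict):
    """Illustrative five-level score; thresholds and middle labels are assumptions."""
    if semantic_conflict:
        return "flagged"          # awaiting expert review
    if len(sources) >= 2 and structure_validated:
        return "very high"        # multi-source agreement, validated structure
    if len(sources) >= 2:
        return "high"
    if structure_validated:
        return "medium"
    return "low"

# Placeholder records: "CAS-1" and "CAS-2" are not real CAS numbers.
records = [
    ("DID-A", "CAS-1", "AISE_ESC"),
    ("DID-A", "CAS-1", "UBA"),
    ("DID-B", "CAS-2", "AI_LOOKUP"),
]
provenance = consolidate(records)
for couple, sources in provenance.items():
    print(couple, sorted(sources), reliability(sources, True, False))
```

Keeping the set of supporting sources next to each couple is what makes the score auditable: the level can always be recomputed from the stored provenance.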

Concrete results

• 1 410 DID↔CAS couples mapped across the 265 DID entries.

• 8.1 % flagged for expert review: mismatches that the automated pipeline accepted but that the AI semantic validation caught.

• Two genuine CAS typographical errors identified in upstream sources. Both passed the CAS mod-10 check-digit validation; only the semantic AI lookup exposed them.

• 33 DID entries classified as non-mappable by design — carbon-chain ranges (C14–20), ethoxylation series (EO 3–12), UVCB substances whose composition is too broad for a single-CAS attribution.
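
The check-digit validation mentioned in these results is easy to reproduce, and doing so shows why it cannot catch every typo. The sketch below implements the standard CAS check digit (digits weighted 1, 2, 3, … from the right, sum taken modulo 10). The function name and the regular expression are illustrative choices, not part of the original pipeline.

```python
import re

# A CAS number: 2-7 digits, two digits, one check digit.
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")

def cas_checksum_ok(cas: str) -> bool:
    """Return True if the string is well-formed and its check digit is consistent."""
    match = CAS_PATTERN.match(cas.strip())
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    # Weight the digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(weight * int(d) for weight, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))

print(cas_checksum_ok("3076-26-4"))  # the CAS quoted earlier in this article
print(cas_checksum_ok("3076-26-5"))  # wrong check digit
print(cas_checksum_ok("3276-26-4"))  # one digit altered, checksum still consistent
```

The third string differs from the first by a single digit yet still passes, because a change of 2 in a position weighted 5 adds a multiple of 10 to the sum. Whether or not such a string corresponds to any registered substance, the checksum alone cannot tell, which is why a semantic comparison of name and structure is still needed.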

Every couple is traceable: anyone can re-open the dataset, see which of the five sources supports a given attribution, and audit the reasoning.

The policy implication

Here is the real lesson. Even with the most advanced AI tools available today, mapping a specific CAS to a family-based DID entry remains a perilous exercise. The ambiguity is not in the mapping tool — it is in the DID list itself.

The pragmatic recommendation is straightforward: the next revision of the DID list should be rebuilt starting from the CAS level, not from chemical families. And the good news is that 36 % of entries already are CAS-level — the refactoring is not a methodological rupture, it is the extension of an existing practice to the whole list. A substance-first architecture — one DID per defined CAS, with families emerging as groupings rather than as foundations — would eliminate the reproducibility problem at its source. It would also give the European Commission’s own ecodesign toolbox a solid footing, which is essential if these tools are to be accepted by the industry they target.
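
As a rough illustration of what a substance-first structure could look like in data terms, the sketch below stores factors on the substance and treats a family as a plain grouping. The class names, fields, and all values are hypothetical and do not describe any existing or planned DID list schema.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class SubstanceEntry:
    """One entry per defined CAS, carrying its own factors (values hypothetical)."""
    cas: str
    name: str
    df: float   # degradation factor
    tf: float   # toxicity factor

@dataclass
class Family:
    """A family is only a named grouping of substance entries."""
    label: str
    members: list = field(default_factory=list)

    def tf_range(self):
        """Spread of toxicity factors across the members of the family."""
        values = [m.tf for m in self.members]
        return min(values), max(values)

# Placeholder entries: "CAS-1" and "CAS-2" are not real CAS numbers.
family = Family(
    "hypothetical surfactant family",
    [SubstanceEntry("CAS-1", "member one", 0.05, 0.10),
     SubstanceEntry("CAS-2", "member two", 0.15, 0.04)],
)
print(family.label, family.tf_range())
```

In such a layout a formulator looks up the CAS directly, and the family view becomes a derived summary that makes the spread of factors inside a group visible instead of hiding it.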

This is not a rebuttal of the DID list. It is a proposal for its evolution, consistent with the data-quality principles that the Safe and Sustainable by Design (SSbD) framework is built on.

What comes next

The complete dataset, the scoring methodology, and the five-source pipeline are being finalized for peer-review submission to Integrated Environmental Assessment and Management (IEAM). The mapping is also the substance backbone of EcoCalc.eu, the ecodesign platform we are finalizing for formulators, regulatory consultants, and competent bodies — where the reliability score is surfaced directly to the user, not buried in a spreadsheet.

A companion paper is already in preparation. Once every DID is traceable to specific CAS numbers, it becomes possible to compare the DID-list Toxicity Factors (TF) and Degradation Factors (DF) against substance-specific values derived from independent ecotoxicological databases — JRC’s PEF datasets, HESI EnviroTox, the HERA risk assessments, NORMAN, and others. This is the subject of our next study, and it is made possible precisely by the CAS-level traceability introduced in the present work.

We welcome feedback from cosmetic, detergent, and paint formulators who have lived this pain, from evaluators in competent bodies, and from colleagues at the JRC and A.I.S.E. who have worked on the DID list over the years. Rebuilding a shared reference is everyone’s job.

* * *

About the author. Erwan Saouter is the founder of Net-Zero Impact SAS and a registered A.I.S.E. Charter for Sustainable Cleaning consultant. He has contributed to the JRC Product Environmental Footprint programme, acts as an evaluator for Horizon Europe SSbD calls, and is the developer of the EcoCalc platform for chemical sustainability assessment.

Contact: saouter@net-zero-impact.eu — ssbd-expert.eu/insights

Net-Zero Impact SAS - SSbD-Expert