# Phase 1 spec schema

**Zebra Skimmers redesign · spec data layer specification · v0.1**

Companion to [phase-1-schemas.md](./zebra-redesign-phase1-schemas.md). This document specifies the metaobject types that hold structured specification data — the biggest data gap in the current catalog and the foundation for the comparison view, the printable spec sheet, the collection-page spec filters, and the AI Search spec-aware ranking.

---

## 00 · Why specs are first-class data

The scraped catalog at `product_data/products_full.json` has an empty `specifications` dict (`dict[0]`) on every product. The actual specs live in description prose (`Reach: 8" - 101"`, `Skimming Capacity: 1 quart per hour`, `Frequency / Voltage: 60 Hz / 120 V`), readable by humans but not by anything else. That means:

- The comparison view cannot exist. No structured data to compare.
- Collection-page spec filters cannot exist. No structured data to facet.
- Printable spec sheets are manually maintained PDFs that drift from the website over time.
- AI Search can't rank by spec match because the spec is unstructured text.
- The future "Find Your Fit" guided product finder has to rely on category-level heuristics instead of actual spec matching.

Specs become first-class metaobjects with two types — `product_spec_group` for categories (Performance, Electrical, Mechanical, Materials, Certifications, Service) and `product_spec_row` for individual values. Products and variants reference rows by metafield. The same row entry can appear on multiple products when the value is shared (every ZVA8 variant uses the same motor speed); variants override product-level rows when the value differs by configuration.

The same data feeds the PDP spec table, the comparison view, the printable spec sheet (single source of truth), the collection-page filters, the Find Your Fit rules engine, and the AI Search metadata. One schema, six consumers.

---

## 01 · `product_spec_group`

Categories that organize spec rows on the PDP and in printable output. Small, shared list — most stores need 5-8 groups total.

### Definition

| Property | Value |
|---|---|
| **Type** | `product_spec_group` |
| **Display name** | Spec group |
| **Description** | A category of related specifications (Performance, Electrical, Mechanical, etc.) used to group rows in PDP, comparison, and print views. |

### Fields

| Key | Type | Required | Notes |
|---|---|---|---|
| `label` | `single_line_text_field` | yes | Display name. e.g., "Performance", "Electrical", "Mechanical", "Materials", "Certifications", "Service & warranty". |
| `key` | `single_line_text_field` | yes | Machine-readable identifier. Snake_case. e.g., `performance`, `electrical`, `mechanical`. Used for URL fragments (`#spec-electrical`), CSS hooks, and comparison-view column alignment. |
| `description` | `multi_line_text_field` | no | Optional context shown under the group label. Most groups don't need it. |
| `display_order` | `number_integer` | yes | Lower numbers render first. Performance typically 10, Electrical 20, Mechanical 30, Materials 40, Certifications 50, Service 60. Leaving gaps lets you insert new groups without renumbering. |
| `icon` | `single_line_text_field` | no | Optional icon identifier. e.g., `lightning` for Electrical, `gear` for Mechanical. Drives an inline SVG in the group header — see Phase 2 icon set. |

### Default seed entries (theme install)

A fresh theme install creates these six groups by default so a merchant has a starting structure:

```yaml
- label: Performance
  key: performance
  display_order: 10
  icon: gauge

- label: Electrical
  key: electrical
  display_order: 20
  icon: lightning

- label: Mechanical
  key: mechanical
  display_order: 30
  icon: gear

- label: Materials
  key: materials
  display_order: 40
  icon: layers

- label: Certifications
  key: certifications
  display_order: 50
  icon: shield

- label: Service & warranty
  key: service
  display_order: 60
  icon: tool
```

Merchants in other verticals (food service, lab equipment, agricultural) can rename, reorder, or replace these. The schema makes no assumption about which groups exist — it just requires that every row references one.

---

## 02 · `product_spec_row`

Individual spec lines. Values can be numeric (with dual US/metric units), textual, ranges, or enums. The same row entry attaches to multiple products when the value is shared.

### Definition

| Property | Value |
|---|---|
| **Type** | `product_spec_row` |
| **Display name** | Spec row |
| **Description** | A single specification value. Belongs to a spec group. Reusable across products when the value is shared. |

### Fields

| Key | Type | Required | Notes |
|---|---|---|---|
| `label` | `single_line_text_field` | yes | Display name for the row. e.g., "Capacity", "Reach (depth)", "Power", "Material". |
| `key` | `single_line_text_field` | yes | Machine-readable identifier. Snake_case. e.g., `capacity`, `reach_depth`, `power_voltage`. Drives comparison-view column alignment, filter facets, and structured-data emission. **Rows with the same `key` are treated as the same spec across products** — that's how the comparison view aligns "Reach" from ZVA8 next to "Reach" from a competing belt skimmer. |
| `group` | `metaobject_reference[product_spec_group]` | yes | Which group this row belongs to. |
| `value_text` | `multi_line_text_field` | one of `value_text` / `value_number_us` / range | Free text value. Used when the spec isn't numeric, or when it carries qualifiers numbers can't express. e.g., "316 stainless · viton seals", "Side / clamp / magnetic". |
| `value_number_us` | `number_decimal` | one of `value_text` / `value_number_us` / range | Numeric value in US units. e.g., `4` (for 4 gph), `115` (for 115 VAC). |
| `unit_us` | `single_line_text_field` | conditional | Required when `value_number_us` or a `value_range_min` / `value_range_max` pair is set. Free text but validated against a canonical unit list (see **Unit handling** below). |
| `value_number_metric` | `number_decimal` | no | Numeric value in metric units. e.g., `15` (for 15 L/h), `203` (for 203 mm). |
| `unit_metric` | `single_line_text_field` | conditional | Required when `value_number_metric` is set. |
| `value_range_min` | `number_decimal` | no | For ranges: minimum value. e.g., `8` for `Reach: 8" - 101"`. Set together with `value_range_max`. Expressed in `unit_us`, with `unit_metric` naming the unit for the metric display value. |
| `value_range_max` | `number_decimal` | no | For ranges: maximum. e.g., `101`. |
| `comparable` | `boolean` | yes | Default true. When false, the row is suppressed from the comparison view. Used for one-off rows that aren't meaningful to compare ("Color: black"). |
| `filterable` | `boolean` | yes | Default false. When true, the row generates a facet on the parent collection page. Reserved for high-cardinality numeric specs (Reach, Capacity, Power) and discrete enums (Mount type, Material). |
| `display_priority` | `number_integer` | yes | Within a group, lower numbers render first. Default 100. |
| `notes` | `multi_line_text_field` | no | Footnote. Renders as a small ⓘ tooltip or as a `<details>` expandable below the row. e.g., "Capacity measured at 70°F coolant temperature." |
| `applies_to_variants` | `list.variant_reference` | no | When set, this row only applies to the listed variants. Empty list means it applies to all variants of any product that references this row. See **Variant-specific specs** below. |
| `auto_derived` | `boolean` | no | Default false. When true, indicates the row is computed by automation (e.g., a Shopify Flow that updates capacity from a CSV import) and shouldn't be edited manually. Surfaces a warning in the metaobject editor. |

### Unit handling

Numbers without units are meaningless in industrial contexts. The schema separates value from unit so the renderer can format consistently and the comparison view can detect unit mismatches.

The `unit_us` and `unit_metric` fields are free text for flexibility but validated against a canonical list to catch typos:

```
# Length
in, ft, mm, cm, m

# Volume / flow
gal, gph, gpm, qt, qt/h, fl_oz
L, L/h, L/min, mL

# Mass
lb, oz
kg, g

# Force
lbf, N

# Pressure
psi, bar, kPa

# Temperature  
°F, °C

# Electrical / power
VAC, VDC, V, A, W, kW, HP, Hz, Ω

# Time
s, min, h, day

# Speed / rotation
rpm, ft/s, m/s

# Dimensionless
%, ratio, qty
```
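
The value and unit requirements from the fields table, together with the canonical list above, can be checked before a row is saved. A minimal sketch in Python, assuming rows are plain dicts keyed by the schema's field names. `validate_row` and the abbreviated `CANONICAL_UNITS` set are hypothetical names, and treating a min/max range as a valid value is an assumption drawn from the `reach_depth` example in this document.

```python
# Abbreviated copy of the canonical unit list; a real check would load the full list.
CANONICAL_UNITS = {
    "in", "ft", "mm", "cm", "m", "gph", "L/h",
    "lb", "kg", "VAC", "Hz", "rpm", "°F", "°C",
}


def validate_row(row: dict) -> list[str]:
    """Return a list of problems; an empty list means the row passes."""
    problems = []
    has_text = bool(row.get("value_text"))
    has_number = row.get("value_number_us") is not None
    has_range = (
        row.get("value_range_min") is not None
        and row.get("value_range_max") is not None
    )
    if not (has_text or has_number or has_range):
        problems.append("row needs value_text, value_number_us, or a min/max range")
    if (has_number or has_range) and not row.get("unit_us"):
        problems.append("unit_us is required for numeric values")
    if row.get("value_number_metric") is not None and not row.get("unit_metric"):
        problems.append("unit_metric is required when value_number_metric is set")
    for field in ("unit_us", "unit_metric"):
        unit = row.get(field)
        if unit and unit not in CANONICAL_UNITS:
            problems.append(f"{field} '{unit}' is not in the canonical unit list")
    return problems


print(validate_row({"value_number_us": 4, "unit_us": "gallons/hr"}))
```

Returning a list of problems instead of raising on the first one lets the human review pass see every issue with a candidate row at once.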

When both `value_number_us` and `value_number_metric` are set, the PDP renders both with a separator (`4 gph · 15 L/h`). When only one is set, the renderer can compute the other on display using a conversion table — but the source-of-truth value is whichever the merchant entered. The converted value carries a small "(converted)" annotation in the printable spec sheet.
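
The rendering rule above could be sketched as follows. This is an illustration under assumptions, not the theme's renderer: `render_value`, `format_number`, and the small `US_TO_METRIC` table are hypothetical, and only a few multiplicative conversions are shown (temperature needs an offset and is left out).

```python
# Illustrative conversion table: US unit -> (metric unit, multiplier).
US_TO_METRIC = {
    "gph": ("L/h", 3.785411784),
    "in": ("mm", 25.4),
    "lb": ("kg", 0.45359237),
}


def format_number(value: float) -> str:
    """Round to 2 decimals and drop trailing zeros (15.0 -> '15')."""
    return f"{round(value, 2):g}"


def render_value(row: dict, printable: bool = False) -> str:
    us = f"{format_number(row['value_number_us'])} {row['unit_us']}"
    if row.get("value_number_metric") is not None:
        # Merchant entered both values: show them exactly as entered.
        metric = f"{format_number(row['value_number_metric'])} {row['unit_metric']}"
        return f"{us} · {metric}"
    if row["unit_us"] in US_TO_METRIC:
        unit, factor = US_TO_METRIC[row["unit_us"]]
        metric = f"{format_number(row['value_number_us'] * factor)} {unit}"
        # Only the printable spec sheet flags computed values.
        suffix = " (converted)" if printable else ""
        return f"{us} · {metric}{suffix}"
    return us  # no known conversion: render the entered value only


both = {"value_number_us": 4, "unit_us": "gph", "value_number_metric": 15, "unit_metric": "L/h"}
print(render_value(both))                                                  # 4 gph · 15 L/h
print(render_value({"value_number_us": 8, "unit_us": "in"}, printable=True))  # 8 in · 203.2 mm (converted)
```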

### Variant-specific specs

Most specs are shared across variants of a product. Some aren't:

- ZVA8 reach varies by variant: `ZVA8-08` → 8 in, `ZVA8-101` → 101 in
- ZVA8 voltage varies: standard → 115 VAC, European → 220 VAC · 50 Hz
- ZVA8 weight varies slightly with reach

Two ways to handle this:

**Option A: Per-variant metafield override.** Attach a shared `product_spec_row` at the product level for the common case (`Reach: 8" - 101"` range). On individual variants, add a row with the same `key` and a specific value — the renderer prefers variant-level rows over product-level rows with matching keys.

**Option B: `applies_to_variants` field on the row itself.** The row carries a list of variant references. The renderer includes the row only when the current variant is in the list. Multiple rows with the same key but different `applies_to_variants` lists let you express variant-specific values without creating per-product duplication.

Option A is more conventional (the variant override pattern matches how Shopify generally handles variant-level metafields). Option B is denser: fewer rows in total, but more complex render logic. **Recommendation: ship Option A as the default pattern, and support Option B as an optimization for products with many variants, where Option A would cause the row count to explode.** Both rely on the same `product_spec_row` schema.
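
For Option B, the extra render logic is essentially a filter on `applies_to_variants`. A minimal sketch, assuming rows are plain dicts and using SKUs as stand-ins for real variant references. `rows_for_variant` is a hypothetical helper name.

```python
def rows_for_variant(rows: list[dict], variant_id: str) -> list[dict]:
    """Option B: keep rows that apply to every variant or that list this variant."""
    selected = []
    for row in rows:
        targets = row.get("applies_to_variants") or []  # empty list means "all variants"
        if not targets or variant_id in targets:
            selected.append(row)
    return selected


rows = [
    {"key": "voltage", "value_number_us": 115, "unit_us": "VAC"},
    {"key": "reach_depth", "value_number_us": 8, "unit_us": "in",
     "applies_to_variants": ["ZVA8-08"]},
    {"key": "reach_depth", "value_number_us": 101, "unit_us": "in",
     "applies_to_variants": ["ZVA8-101"]},
]
print([r["value_number_us"] for r in rows_for_variant(rows, "ZVA8-101")])  # [115, 101]
```

The sketch does not decide what happens when a shared row and a variant-scoped row carry the same `key`; that precedence would need to be defined before Option B ships.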

### Product binding

| Property | Value |
|---|---|
| **Namespace** | `theme` |
| **Key** | `specs` |
| **Owner type** | `product` |
| **Type** | `list.metaobject_reference[product_spec_row]` |
| **Access** | Storefront: read |

| Property | Value |
|---|---|
| **Namespace** | `theme` |
| **Key** | `specs` |
| **Owner type** | `variant` |
| **Type** | `list.metaobject_reference[product_spec_row]` |
| **Access** | Storefront: read |

When the PDP renders the spec table:

1. Read product-level `theme.specs` list. Build a map keyed by `product_spec_row.key`.
2. Read variant-level `theme.specs` for the currently selected variant. Overlay onto the map: variant rows replace product rows with the same key.
3. Group the resulting map by `product_spec_row.group`.
4. Within each group, sort by `display_priority`.
5. Render groups in `product_spec_group.display_order`.

The same logic runs server-side in Liquid for the initial render and client-side in JS when the variant picker changes.
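
The five steps can be sketched in a few lines. This is written in Python for readability, although the storefront versions would be Liquid and JS. `build_spec_table` is a hypothetical name, rows and groups are assumed to be plain dicts, and the `value` field stands in for an already formatted display string.

```python
from itertools import groupby


def build_spec_table(product_rows, variant_rows, groups):
    """Return [(group_label, [rows])] in render order.

    `groups` maps a group key to its product_spec_group fields.
    """
    merged = {row["key"]: row for row in product_rows}        # step 1
    merged.update({row["key"]: row for row in variant_rows})  # step 2: variant rows win

    def sort_key(row):
        group = groups[row["group"]]
        # Steps 4 and 5 in one pass: group order first, then priority within the group.
        return (group["display_order"], row["group"], row.get("display_priority", 100))

    ordered = sorted(merged.values(), key=sort_key)
    # Step 3: consecutive rows with the same group form one table section.
    return [
        (groups[key]["label"], list(section))
        for key, section in groupby(ordered, key=lambda row: row["group"])
    ]


groups = {
    "performance": {"label": "Performance", "display_order": 10},
    "electrical": {"label": "Electrical", "display_order": 20},
}
product_rows = [
    {"key": "voltage", "group": "electrical", "display_priority": 10, "value": "115 VAC"},
    {"key": "reach_depth", "group": "performance", "display_priority": 20, "value": '8" - 101"'},
    {"key": "capacity", "group": "performance", "display_priority": 10, "value": "1 qt/h"},
]
variant_rows = [
    {"key": "reach_depth", "group": "performance", "display_priority": 20, "value": "8 in"},
]
for label, section in build_spec_table(product_rows, variant_rows, groups):
    print(label, [row["value"] for row in section])
```

With this data, the variant-level `reach_depth` row replaces the product-level range, and Performance renders before Electrical.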

### Example entries (real Zebra ZVA8-08 specs)

From the prose specs in the current product description, structured:

```yaml
# Performance group
- label: Capacity
  key: capacity
  group: performance
  value_number_us: 1
  unit_us: qt/h
  value_number_metric: 0.95
  unit_metric: L/h
  comparable: true
  filterable: true
  display_priority: 10
  notes: "Measured at 70°F (21°C) coolant temperature."

- label: Reach (depth)
  key: reach_depth
  group: performance
  value_range_min: 8
  value_range_max: 101
  unit_us: in
  unit_metric: mm  # min: 203, max: 2565
  comparable: true
  filterable: true
  display_priority: 20

- label: Engine speed
  key: engine_speed
  group: performance
  value_number_us: 28
  unit_us: rpm
  comparable: true
  display_priority: 30
  notes: "Continuous duty."

- label: Motor power
  key: motor_power
  group: performance
  value_number_us: 0.11
  unit_us: HP
  value_number_metric: 0.082
  unit_metric: kW
  comparable: true
  display_priority: 40

# Electrical group  
- label: Voltage
  key: voltage
  group: electrical
  value_number_us: 115
  unit_us: VAC
  comparable: true
  filterable: true
  display_priority: 10

- label: Frequency
  key: frequency
  group: electrical
  value_number_us: 60
  unit_us: Hz
  comparable: true
  display_priority: 20

# Mechanical group
- label: Mount style
  key: mount_style
  group: mechanical
  value_text: "Side / clamp · Magnetic base · BGX2 LockJaw™"
  comparable: true
  filterable: true
  display_priority: 10

- label: Shipping weight
  key: shipping_weight
  group: mechanical
  value_number_us: 8.5
  unit_us: lb
  value_number_metric: 3.86
  unit_metric: kg
  comparable: true
  display_priority: 20

# Materials group
- label: Tube material
  key: tube_material
  group: materials
  value_text: "Polyvinyl industrial grade tubing"
  comparable: true
  filterable: true
  display_priority: 10

- label: Max fluid temperature
  key: max_fluid_temp
  group: materials
  value_number_us: 90
  unit_us: °F
  value_number_metric: 32
  unit_metric: °C
  comparable: true
  display_priority: 20
  notes: "Continuous operating limit. Brief excursions to 100°F acceptable."

# Service group
- label: Warranty
  key: warranty
  group: service
  value_text: "Lifetime on mechanism"
  comparable: true
  display_priority: 10

- label: Country of origin
  key: country_of_origin
  group: service
  value_text: "USA · Solon, Ohio"
  comparable: true
  display_priority: 20
```

Twelve rows in five groups (Certifications has no rows in this example). The same data drives:
- PDP spec section
- Comparison view when ZVA8-08 is in the compare set
- Collection facets for `capacity`, `reach_depth`, `voltage`, `mount_style`, `tube_material` (the five flagged `filterable`)
- Find Your Fit rules ("if user picks 'tramp oil' + 'CNC sump' + 'depth > 30"' → recommend ZVA8 reach variants")
- Printable spec sheet
- JSON-LD `additionalProperty` array for structured data

### Variant-specific row example

For ZVA8-08 specifically (the 8-inch reach standard 115V variant), override the product-level `reach_depth` row with a variant-specific one:

```yaml
# Variant-level row, attached to ZVA8-08 variant
- label: Reach (depth)
  key: reach_depth         # same key as the product-level row
  group: performance
  value_number_us: 8        # specific value, not a range
  unit_us: in
  value_number_metric: 203
  unit_metric: mm
  comparable: true
  filterable: true
  display_priority: 20
  applies_to_variants: [<ZVA8-08 variant id>]  # optional under Option A: the variant metafield already scopes the row
```

When a customer is viewing the ZVA8-08 variant, the spec table shows `Reach (depth): 8 in · 203 mm`. When viewing ZVA8-101, which carries its own variant-level row, the value swaps to `101 in · 2565 mm`. The product-level range row (`8" - 101"`) is what shows on the collection card and in non-variant-context summaries.

---

## 03 · Comparison view

The comparison view at `/products/compare?items=A,B,C` (typically 2-4 products) is one of the largest consumers of structured spec data and the most-requested feature for industrial catalogs.

### Data assembly

For each product in the compare set:
1. Resolve to a specific variant (URL parameter, or default variant if unspecified).
2. Pull the merged spec map per the algorithm in §02.
3. Index by `product_spec_row.key`.

For the union of keys across all products in the compare set:
1. Group by `product_spec_group` (using the group of the first row encountered for that key).
2. Within each group, sort by the average `display_priority` of rows across products.
3. Render as a table: rows are spec keys, columns are products.
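
The assembly steps above could look roughly like this. It is a sketch under assumptions: `build_comparison` is a hypothetical name, each product's merged spec map is assumed to be a dict of rows, and `display` stands in for a pre-formatted cell value. Ordering the groups themselves by `display_order` and the `comparable` filter are left out for brevity.

```python
from statistics import mean


def build_comparison(spec_maps):
    """Build comparison rows from {product_handle: {spec_key: row}}.

    Returns {group: [(spec_key, [cell per product, None when missing])]}.
    """
    products = list(spec_maps)
    first_group = {}  # spec key -> group of the first row seen for that key
    priorities = {}   # spec key -> display_priority values across products
    for handle in products:
        for key, row in spec_maps[handle].items():
            first_group.setdefault(key, row["group"])
            priorities.setdefault(key, []).append(row.get("display_priority", 100))

    table = {}
    # Sorting all keys by average priority keeps each group's rows in order.
    for key in sorted(first_group, key=lambda k: mean(priorities[k])):
        cells = [spec_maps[h].get(key, {}).get("display") for h in products]
        table.setdefault(first_group[key], []).append((key, cells))
    return table


spec_maps = {
    "zva8-sidewinder-tube-skimmer": {
        "capacity": {"group": "performance", "display_priority": 10, "display": "1 qt/h"},
        "reach_depth": {"group": "performance", "display_priority": 20, "display": "8 in"},
    },
    "zva-belt-skimmer": {
        "reach_depth": {"group": "performance", "display_priority": 10, "display": "12 in"},
    },
}
print(build_comparison(spec_maps))
```

A `None` cell marks a key that one product lacks, which the template can render as the "Not specified" placeholder.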

### Handling missing values

When a key exists on Product A but not Product B, the Product B cell renders as `—` with hover text "Not specified" — not as a missing-data error. Comparison should make absence visible without making it feel like a bug.

### Unit normalization

If Product A has `capacity` in `gph` and Product B has `capacity` in `L/h`, the comparison view normalizes to the user's preferred unit system (cookie or geo) and shows both: `4 gph (15 L/h)`. The merchant doesn't have to enter values in both systems; the renderer converts using the unit table.

### Comparable filter

Rows where `comparable: false` are omitted entirely from the compare view regardless of which product they appear on. This prevents one-off product-specific specs ("Pattern: cross-hatched") from creating sparse junk rows.

### URL pattern

`/products/compare?items=zva8-sidewinder-tube-skimmer:zva8-08,zva-belt-skimmer:bpf1-12,bgx2-lockjaw-mount`: items are comma-separated, and each item is a product handle plus an optional variant SKU after a colon. Pure URL state, no client-side cookies, deeplinkable from email/chat/sales decks.
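
Parsing that URL shape needs only the standard library. A small sketch, where `parse_compare_url` is a hypothetical helper and the `items` parameter name is taken from the example URL above:

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def parse_compare_url(url: str) -> list[tuple[str, Optional[str]]]:
    """Return (product_handle, variant_sku or None) pairs from the `items` parameter."""
    query = parse_qs(urlsplit(url).query)
    items = query.get("items", [""])[0]
    pairs = []
    for item in filter(None, items.split(",")):  # skip empty segments
        handle, _, sku = item.partition(":")
        pairs.append((handle, sku or None))
    return pairs


url = (
    "/products/compare?items=zva8-sidewinder-tube-skimmer:zva8-08,"
    "zva-belt-skimmer:bpf1-12,bgx2-lockjaw-mount"
)
for handle, sku in parse_compare_url(url):
    print(handle, sku)
```

Items without a SKU come back with `None`, which maps to the "default variant if unspecified" rule in the data assembly steps.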

---

## 04 · Collection-page filters

Rows flagged `filterable: true` automatically generate facets on the parent collection page. The Find Your Fit page reuses the same facet data.

### Facet types by spec shape

| Spec shape | Facet UI |
|---|---|
| Numeric with single value (`capacity: 4 gph`) | Range slider with min/max from collection data |
| Numeric range (`reach_depth: 8-101 in`) | Range slider, matches if any value in product range overlaps filter range |
| Text enum (`mount_style: Side / clamp`) | Checkbox list of unique values |
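| Boolean (`is_made_in_usa: true`) | Single checkbox. The v0.1 row schema has no boolean value field, so this shape needs either a text-encoded value or a follow-on field. |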

The collection template introspects the spec rows referenced by products in the collection and builds the facet panel dynamically. No hardcoded facet definitions per collection.
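
One way the introspection could work, sketched under assumptions: `build_facets` is a hypothetical name, each product is represented by its list of row dicts, and a given `key` is assumed to have the same shape (text or numeric) on every product.

```python
def build_facets(products: list[list[dict]]) -> dict[str, dict]:
    """Derive facet definitions from the filterable rows of every product in a collection."""
    facets = {}
    for rows in products:
        for row in rows:
            if not row.get("filterable"):
                continue
            if row.get("value_text"):
                # Text enum: collect the unique values for a checkbox list.
                facet = facets.setdefault(row["key"], {"ui": "checkbox", "values": set()})
                facet["values"].add(row["value_text"])
                continue
            # Single numbers are treated as a range whose min equals its max.
            low = row.get("value_range_min", row.get("value_number_us"))
            high = row.get("value_range_max", row.get("value_number_us"))
            facet = facets.setdefault(row["key"], {"ui": "range", "min": low, "max": high})
            facet["min"] = min(facet["min"], low)
            facet["max"] = max(facet["max"], high)
    return facets


collection = [
    [
        {"key": "reach_depth", "filterable": True, "value_range_min": 8, "value_range_max": 101},
        {"key": "mount_style", "filterable": True, "value_text": "Clamp"},
        {"key": "engine_speed", "value_number_us": 28},
    ],
    [
        {"key": "reach_depth", "filterable": True, "value_number_us": 12},
        {"key": "mount_style", "filterable": True, "value_text": "Magnetic"},
    ],
]
print(build_facets(collection))
```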

### URL state

`/collections/tramp-oil-skimmers?reach_depth=8..30&voltage=115&mount_style=clamp`

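Standard query-string filters. Multiple values for the same key are OR'd; different keys are AND'd. The goal is compatibility with Shopify's native collection filtering so the URL pattern stays stable. Native storefront filters use `filter.p.m.<namespace>.<key>` style parameters, so the short keys shown here assume a mapping layer in the theme.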
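
The OR/AND semantics and the `8..30` range syntax can be illustrated with a short matcher. This is a sketch, not the collection template's logic: `matches` and `_match_one` are hypothetical, numeric specs are assumed to be `(min, max)` tuples, text specs plain strings, and the case-insensitive substring match for text values is an assumption.

```python
from urllib.parse import parse_qs


def _match_one(spec, wanted: str) -> bool:
    if isinstance(spec, tuple):
        low, _, high = wanted.partition("..")
        lo, hi = float(low), float(high or low)  # "115" is treated as the range 115..115
        return spec[0] <= hi and lo <= spec[1]   # true when the two ranges overlap
    return wanted.lower() in spec.lower()


def matches(product: dict, filters: dict[str, list[str]]) -> bool:
    """product maps spec key -> (min, max) tuple or text value."""
    for key, wanted_values in filters.items():   # different keys: AND
        spec = product.get(key)
        if spec is None:
            return False
        if not any(_match_one(spec, w) for w in wanted_values):  # same key: OR
            return False
    return True


filters = parse_qs("reach_depth=8..30&voltage=115&mount_style=clamp")
zva8 = {
    "reach_depth": (8, 101),
    "voltage": (115, 115),
    "mount_style": "Side / clamp · Magnetic base",
}
print(matches(zva8, filters))
```

`parse_qs` already collects repeated parameters into a list, so `voltage=115&voltage=220` becomes two OR'd values without extra parsing.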

---

## 05 · Printable spec sheet

A per-product `/products/{handle}/spec-sheet.pdf` (or `.html` for print) renders a clean engineering-style spec sheet from the same `theme.specs` data. Built as a Liquid template with print-specific CSS.

This solves the document-drift problem in the current catalog where the Zebra Skimmers Product Catalog PDF and the website specs diverge over time. With a single source of truth, the printable view is always current.

### Output structure

```
[Company logo]                              [Product image]
                                            
ZVA8 Sidewinder Tube Skimmer · ZVA8-08      [Barcode/SKU]
Tube oil skimmer · 115 VAC                  

PERFORMANCE
  Capacity              1 qt/h · 0.95 L/h
  Reach (depth)         8 in · 203 mm
  Engine speed          28 rpm
  Motor power           0.11 HP · 0.082 kW

ELECTRICAL
  Voltage               115 VAC
  Frequency             60 Hz

[... etc ...]

Zebra Skimmers · zebraskimmers.com · Made in Solon, Ohio · Since 1994
Revised {{ 'now' | date: '%Y-%m-%d' }} from theme.specs metafield. Single source of truth; replaces older static PDF spec sheets.
```

Generated on demand. No PDF storage. No drift.

---

## 06 · JSON-LD emission

The PDP's `Product` schema gains a structured `additionalProperty` array directly from `theme.specs`:

```json
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "ZVA8 Sidewinder Tube Skimmer",
  "sku": "ZVA8-08",
  ...
  "additionalProperty": [
    {
      "@type": "PropertyValue",
      "name": "Capacity",
      "value": "1",
      "unitText": "qt/h",
      "propertyID": "capacity"
    },
    {
      "@type": "PropertyValue",
      "name": "Reach (depth)",
      "value": "8",
      "unitText": "in",
      "propertyID": "reach_depth"
    },
    ...
  ]
}
```

This is the structured-data shape Google Shopping, AI crawlers, and B2B procurement tools (Octopart, Findchips, GovWin) actually parse. Today's Zebra catalog has zero `additionalProperty` data; the new pipeline emits it automatically for every spec row.
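
A sketch of how spec rows might map to that array. `to_additional_property` is a hypothetical function; emitting ranges through schema.org's `minValue` / `maxValue` properties is an assumption, since the example above only shows single values.

```python
import json


def to_additional_property(rows: list[dict]) -> list[dict]:
    """Map spec rows to schema.org PropertyValue objects."""
    props = []
    for row in rows:
        prop = {"@type": "PropertyValue", "name": row["label"], "propertyID": row["key"]}
        if row.get("value_number_us") is not None:
            prop["value"] = str(row["value_number_us"])
            prop["unitText"] = row["unit_us"]
        elif row.get("value_range_min") is not None:
            # Assumption: ranges use minValue / maxValue instead of a single value.
            prop["minValue"] = row["value_range_min"]
            prop["maxValue"] = row["value_range_max"]
            prop["unitText"] = row["unit_us"]
        else:
            prop["value"] = row["value_text"]
        props.append(prop)
    return props


rows = [
    {"label": "Capacity", "key": "capacity", "value_number_us": 1, "unit_us": "qt/h"},
    {"label": "Reach (depth)", "key": "reach_depth",
     "value_range_min": 8, "value_range_max": 101, "unit_us": "in"},
    {"label": "Warranty", "key": "warranty", "value_text": "Lifetime on mechanism"},
]
print(json.dumps(to_additional_property(rows), indent=2, ensure_ascii=False))
```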

---

## 07 · Find Your Fit integration

The guided product finder uses spec rows as the matching primitive. The finder is a tree of questions; each question maps to one or more spec keys; user answers narrow the product set by filtering on those specs.

```yaml
# Sample finder rule
question: "How deep is your CNC sump?"
spec_key: reach_depth
input_type: number
input_unit: in
match_logic: |
  Show products where reach_depth range covers the entered depth.
  If product has reach_depth as range_min..range_max, match when answer is in that range.
  If product has reach_depth as single value, match when answer ≤ value.
```

This lets the merchant build the finder by composing existing spec data instead of building a parallel rules engine. New products automatically participate in the finder as soon as their spec rows are populated — no separate finder configuration to maintain.
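
The `match_logic` in the sample rule translates into a small predicate. A sketch assuming the catalog is a dict of product handle to spec rows, with all depths in inches; `covers_depth`, `find_fit`, and the `short-reach-skimmer` handle are hypothetical.

```python
def covers_depth(row: dict, depth: float) -> bool:
    """Apply the sample rule's match_logic to one reach_depth row."""
    low, high = row.get("value_range_min"), row.get("value_range_max")
    if low is not None and high is not None:
        return low <= depth <= high              # range row: answer must fall inside it
    value = row.get("value_number_us")
    return value is not None and depth <= value  # single value: answer must not exceed it


def find_fit(catalog: dict[str, dict], spec_key: str, depth: float) -> list[str]:
    """Return handles of products whose spec row covers the entered depth."""
    return [
        handle
        for handle, specs in catalog.items()
        if spec_key in specs and covers_depth(specs[spec_key], depth)
    ]


catalog = {
    "zva8-sidewinder-tube-skimmer": {"reach_depth": {"value_range_min": 8, "value_range_max": 101}},
    "short-reach-skimmer": {"reach_depth": {"value_number_us": 12}},
    "bgx2-lockjaw-mount": {},  # no reach_depth row, so it never matches this question
}
print(find_fit(catalog, "reach_depth", 30))  # ['zva8-sidewinder-tube-skimmer']
```

Products without the relevant row drop out of the result here; whether the finder should instead keep them as "unknown" is a product decision the sketch does not settle.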

---

## 08 · Migration approach

The current data has spec content in product description prose, not in structured form. Migration is a one-time extraction pass:

1. **Parse prose specs from product descriptions.** The current scraper at `zebra-product-images/scripts/scrapers/scrape_product_info.py` already isolates spec-like sections in product copy (look for patterns like `Skimmer Type:`, `Reach:`, `Skimming Capacity:`). Extend it to write extracted pairs as candidate `product_spec_row` metaobjects.

2. **Human review pass.** Each candidate row gets reviewed: confirm label, assign to group, normalize units, mark `comparable` and `filterable`. This is the irreducible human work — there's no way to fully automate it because the prose specs use inconsistent labels ("Engine Speed" vs "RPM" vs "Motor Speed" for the same thing).

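3. **Bulk metaobject creation.** Approved rows are written via `metaobjectUpsert` mutations keyed on a stable handle per row, so rerunning the script updates existing entries instead of duplicating them. Plain `metaobjectCreate` is not idempotent, and `key` alone cannot serve as the identifier because rows on different products share keys.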

4. **Product binding.** A second script links each product to its applicable rows via `metafieldsSet` on `theme.specs`. Variant-level overrides applied separately.

Estimated effort: ~12 hours of human review for 120 products. The actual scripting is straightforward — the human pass is where the value gets added because every catalog has inconsistent prose specs and the review is also a content cleanup pass.
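
Step 1 might start from something like the sketch below. It does not reflect the existing scraper's internals: `extract_candidates`, the two regular expressions, and the `needs_review` flag are hypothetical, and the description is assumed to be plain text with one `Label: value` pair per line.

```python
import re

# Matches "Label: value" lines such as 'Reach: 8" - 101"'.
PAIR_RE = re.compile(r"^\s*(?P<label>[A-Za-z][A-Za-z /]*?)\s*:\s*(?P<value>.+?)\s*$")
# Matches inch ranges written as 8" - 101".
RANGE_RE = re.compile(r'^(?P<min>\d+(?:\.\d+)?)"\s*-\s*(?P<max>\d+(?:\.\d+)?)"$')


def extract_candidates(description: str) -> list[dict]:
    """Turn 'Label: value' lines into candidate rows for the human review pass."""
    candidates = []
    for line in description.splitlines():
        match = PAIR_RE.match(line)
        if not match:
            continue  # ordinary prose, not a spec line
        label, value = match["label"], match["value"]
        key = re.sub(r"[^a-z0-9]+", "_", label.lower()).strip("_")
        row = {"label": label, "key": key, "needs_review": True}
        range_match = RANGE_RE.match(value)
        if range_match:
            row["value_range_min"] = float(range_match["min"])
            row["value_range_max"] = float(range_match["max"])
            row["unit_us"] = "in"
        else:
            row["value_text"] = value  # left as text; units are normalized in review
        candidates.append(row)
    return candidates


description = (
    "Great skimmer for sumps.\n"
    'Reach: 8" - 101"\n'
    "Skimming Capacity: 1 quart per hour\n"
    "Frequency / Voltage: 60 Hz / 120 V"
)
for row in extract_candidates(description):
    print(row)
```

Keeping unparsed values as `value_text` is deliberate in this sketch: the reviewer decides whether "1 quart per hour" becomes a numeric row, which matches the point that label and unit normalization is human work.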

---

## 09 · Open questions

1. **Unit conversion at edit time vs. render time.** When a merchant enters `value_number_us` but no `value_number_metric`, should the metaobject editor auto-populate the metric value (storing both), or should the renderer convert on the fly (storing one)? Recommendation: render-time conversion for v1, with a small "show metric equivalent" toggle in the editor for the merchant's reference. Simpler data model, single source of truth, never drifts.

2. **Spec inheritance across product variants.** A range row at the product level (`Reach: 8" - 101"`) implies values for every variant. Should the system auto-generate per-variant rows from the range plus the variant's option selection? This is appealing but assumes a structured option name (Option 1: "Reach"). For Zebra's current data where variants encode multiple axes in option1, automation isn't possible without the variant cleanup. Recommendation: ship manual per-variant rows for v1, revisit auto-derivation after the variant axis cleanup in Phase 1.

3. **Spec set sharing across product lines.** Three Dazzle products share the same set of base specs (capacity, voltage, communication protocol) plus a few product-specific ones. The current schema requires linking each shared row from each product's metafield. A `product_spec_set` metaobject that bundles a list of rows + a name ("Dazzle base specs") would reduce metafield-list size, but adds complexity. Recommendation: skip for v1. If the metafield list grows past 30 rows on any product, revisit.

4. **Tolerance and uncertainty.** Engineering specs often have tolerances (`±0.5 in`) or qualifications (`@70°F`). The `notes` field handles qualifications. Tolerances aren't currently in the schema — recommend adding `value_tolerance` (number_decimal) as a follow-on field if tolerance data ever gets captured in the catalog. Most B2B catalogs don't expose tolerances publicly, so this might never matter.

5. **Spec versioning.** When a product spec changes (manufacturer revises motor power on a new production run), should the old value be preserved? Recommendation: no versioning in the metaobject — the row is the current value. If history matters, that's a database concern outside the storefront. Shopify metaobject audit logs capture who changed what when, which is enough for accountability.

---

## 10 · Phase 1 deliverable list — spec data

1. ✅ This document reviewed and approved
2. Metaobject definitions created (`product_spec_group`, `product_spec_row`)
3. Default spec_group entries seeded (6 groups)
4. Metafield definitions created (product `theme.specs`, variant `theme.specs`)
5. Migration script: prose spec extraction → candidate spec_row metaobjects
6. Human review pass: 120 products, normalize labels and assign groups
7. Bulk metaobject creation via `metaobjectUpsert` (or `metaobjectCreate` on first run)
8. Product binding via `metafieldsSet` on `theme.specs`
9. Variant-level override entries for known multi-axis variants (ZVA8 family especially)
10. Validation: on every multi-variant product, each spec whose value differs by variant has a variant-level row on every affected variant

---

*Companion to the Zebra Redesign Plan and phase-1-schemas.md · v0.1*