PIDGraph: Redesigning AI-Powered Industrial Tool

A research-led UX improvement that took a platform engineers struggled to use (SUS 30.4 — "unacceptable") and transformed it into one they recommend (SUS 80.3 — "excellent") through rigorous audit, remote contextual inquiry, and 40+ targeted design changes.

Key outcomes

80.3 — Final SUS score, up from 30.4
0.2% — Error rate, down from 13%
40+ — Design recommendations shipped
10+ — New features introduced

Role — UX Designer & Researcher
Timeline — Dec 2020 – Apr 2021
Method — UX Audit & Contextual Inquiry
Participants — 5 Engineers (Germany & Ukraine)

01 — Context

The problem: engineers were fighting the tool instead of using it

PIDGraph is an AI-powered platform used by industrial engineers to digitize paper-based Piping and Instrumentation Diagrams (P&IDs). The AI converts hand-drawn technical drawings into accurate digital versions — but when the AI output was incomplete or incorrect, engineers had no efficient way to fix it manually.

The result: a platform with enormous potential that was actively frustrating its core users. My brief was to understand exactly where it was failing and design a solution that gave engineers back control.

What PIDGraph does

Engineers upload scanned P&ID drawings. The AI automatically identifies and classifies piping symbols, instruments, and connections — converting them into a structured digital format.

Where it was failing

When the AI made errors — which it did — the manual correction workflow was slow, confusing, and error-prone. Engineers were losing trust in the output and spending excessive time on fixes.

02 — UX Audit

Starting with evidence: a structured heuristic evaluation

Before speaking to any users, I conducted a thorough heuristic evaluation against Nielsen's 10 usability heuristics, benchmarked against three competitor tools that solve similar problems. This served three purposes: build my own platform fluency, create stakeholder-ready evidence for why user research was necessary, and design a focused testing protocol around the real pain points — not assumptions.

Heuristic #1 — Visibility of status

No clear feedback when AI processing was running, complete, or had failed. Engineers couldn't tell if the system was working.

Heuristic #3 — User control

No undo/redo for manual edits. A single mistake required restarting the correction workflow from scratch.

Heuristic #6 — Recognition not recall

P&ID symbol library required engineers to remember classification codes rather than recognizing visual symbols.

Heuristic #4 — Consistency

Editing tools used different interaction models in different parts of the platform with no consistent pattern.

Heuristic #8 — Minimalist design

The interface surfaced all configuration options simultaneously, overwhelming engineers during routine tasks.

Heuristic #5 — Error prevention

No confirmation dialogs on destructive actions. Engineers could accidentally overwrite AI output with no warning.


Why the audit came before user testing

Presenting heuristic findings to stakeholders gave me a concrete, evidence-based case for investing in user research — before spending any budget on participant recruitment. It's a sequencing decision that builds organizational trust in the research process.

03 — User Test

Remote contextual inquiry: watching real engineers in their real environments

I chose remote contextual inquiry deliberately over lab-based usability testing. P&ID work is deeply context-dependent — engineers work with large second monitors, reference physical drawings alongside the digital tool, and are interrupted by colleagues. A decontextualized lab session would have missed half the story.

Method — Remote contextual inquiry
Participants — 5 frequent users
Session duration — 1–2 hours per participant
Locations — Germany & Ukraine

Quantitative findings from the baseline session confirmed what the heuristic audit suggested — and then some:

Before — Baseline
SUS score 30.4 · Error rate 13% · Total errors observed: 18 · Severity: Medium

After — Post-redesign
SUS score 80.3 · Error rate 0.2% · Total errors observed: 1 · Severity: Low

What do these SUS scores mean?

On the 0–100 SUS scale, 51 marks the "marginal" threshold and 71 marks "good". The baseline score of 30.4 sits deep in the unacceptable range; the post-redesign score of 80.3 sits in the excellent range.
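The SUS figures cited throughout follow John Brooke's standard scoring rule: ten 1–5 Likert items, where odd-numbered (positively worded) items contribute score − 1 and even-numbered (negatively worded) items contribute 5 − score, with the sum scaled by 2.5. A minimal sketch:

```python
def sus_score(responses):
    """Standard SUS scoring for ten 1-5 Likert responses (Brooke, 1996).

    Odd-numbered items are positively worded and contribute (response - 1);
    even-numbered items are negatively worded and contribute (5 - response).
    The resulting 0-40 sum is scaled by 2.5 onto a 0-100 scale.
    """
    if len(responses) != 10 or any(not 1 <= r <= 5 for r in responses):
        raise ValueError("SUS requires ten responses, each between 1 and 5")
    total = sum(
        (r - 1) if i % 2 == 0 else (5 - r)  # index 0 is item 1 (odd-numbered)
        for i, r in enumerate(responses)
    )
    return total * 2.5


print(sus_score([3] * 10))  # neutral answers across the board -> 50.0
```

One score is computed per participant per session; study-level figures like the 30.4 and 80.3 reported here are conventionally the mean across participants.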

Qualitative analysis via affinity mapping surfaced three dominant themes across all five participants:

Loss of trust in AI output

Engineers had learned to distrust the AI's classifications and were manually reviewing every element — even correct ones. The tool had no way to communicate its own confidence level.

Correction friction

Finding and fixing a single AI error required navigating 4–6 steps across disconnected parts of the UI. Engineers described the correction workflow as "hunting."

No recovery path

The absence of undo meant engineers were afraid to make edits. Several described working in a "read-only mindset" to avoid making irreversible mistakes.

04 — Recommendations

40+ recommendations prioritized by impact and feasibility

Not every finding becomes a recommendation, and not every recommendation ships. After the research synthesis, I developed 40+ specific changes organized into three categories — then worked with product and engineering to assess feasibility and prioritize by user impact vs. implementation cost.

18 — Interaction & workflow: undo/redo system, streamlined correction flow, confidence indicators on AI output.

14 — Information architecture: tool panel reorganization, progressive disclosure of advanced options, consistent navigation model.

10+ — New features: visual symbol library, bulk edit mode, error summary dashboard, session history.

How prioritization worked

I ran a prioritization workshop with the product and engineering leads using an impact/effort matrix. Undo/redo and the AI confidence indicator were the two highest-impact, lowest-effort items — they went into the first sprint. Feature additions were phased across subsequent releases based on engineering capacity.
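The workshop's impact/effort ranking can be sketched as a simple sort. The item names below come from the recommendations above, but the scores are illustrative placeholders, not the actual workshop data:

```python
# Illustrative impact/effort scores on a 1-5 scale; the real workshop
# values are not reproduced here.
recommendations = [
    ("Undo/redo system",        5, 2),  # (name, impact, effort)
    ("AI confidence indicator", 5, 2),
    ("Visual symbol library",   4, 4),
    ("Bulk edit mode",          3, 4),
    ("Session history",         2, 3),
]

# Highest impact first; ties broken by lowest effort.
ranked = sorted(recommendations, key=lambda r: (-r[1], r[2]))

for name, impact, effort in ranked:
    print(f"{name}: impact {impact}, effort {effort}")
```

With these sample scores the two high-impact, low-effort items surface at the top of the list, matching how undo/redo and the confidence indicator landed in the first sprint.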

05 — New Design

Redesigned within the existing design system

The redesign maintained alignment with Bilfinger's brand identity and existing component library — the goal was not to rebuild the visual design, but to fix the interaction architecture and add the missing control mechanisms engineers needed. Consistency with what already existed was intentional: engineers shouldn't need to relearn a familiar product.

Redesigned interface — before and after

06 — Validation

A second round of testing with the same participants and same tasks

The validation study used identical tasks, identical metrics, and wherever possible, the same five participants from the baseline round. Methodological consistency was deliberate: it's the only way to make before/after comparisons credible.

Method — Usability testing
Participants — 5 (same cohort as baseline)
Session duration — 1 hour per participant
Task — Identical to baseline session

The results confirmed the redesign had worked — not just marginally, but categorically. Moving from a SUS of 30.4 to 80.3 means crossing from the "unacceptable" band into the "excellent" band. The error rate drop from 13% to 0.2% (18 errors to 1) represents near-elimination of the failure modes identified in the audit.

SUS improvement — +164% (30.4 → 80.3)

Error rate reduction — −98% (13% → 0.2%)
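Both headline percentages follow from the standard relative-change formula — change divided by the baseline value. A quick check:

```python
def pct_change(before, after):
    """Change relative to the baseline value, as a percentage."""
    return (after - before) / before * 100


print(round(pct_change(30.4, 80.3)))  # SUS improvement   -> 164
print(round(pct_change(13.0, 0.2)))   # error-rate change -> -98
```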

07 — Takeaways

What this project taught me

01

Sequence research to build organizational trust, not just to gather data

Starting with the audit — before asking for user research budget — gave me credible evidence to justify the investment. This sequencing is a stakeholder management decision as much as a research one.

02

Context matters more than quantity in expert user research

Five engineers in their real working environments told a richer story than a larger lab sample would have. Remote contextual inquiry was the right method for this user group, not the convenient one.

03

The biggest UX fix is sometimes the most obvious missing feature

Undo/redo is table stakes in any editing tool. Its absence had created a cascading anxiety that affected every other interaction. Sometimes the right insight from research is embarrassingly simple — and that's exactly why you do the research.

© 2025 Michael Gibran