Testing

Butler Tax — Backend (pytest)

521 tests, all passing.

bash

cd butler-tax/backend
uv run pytest                       # all tests
uv run pytest tests/test_approvals.py
uv run pytest -k "test_matching"

Coverage by file

File	Focus
`test_approvals.py`	Approval workflow rule resolution, event sequencing
`test_matching_score_service.py`	Reconciliation scoring, due-date calculation
`test_agent_tools.py` / `test_agent_tools_suggest.py` / `test_agent_tools_exec.py`	Butler Chat tools (read / suggest / exec)
`test_data_filtering.py`	`CorporateContext` scoping, role-based query filters
`test_advisor_demo_mode.py`	MOCK_TOKEN demo bypass
`test_alert_batch.py`	Scheduled alert batch processing
`test_users.py`	Registration, employee sync, tax-firm clients
`test_studio_admin_auth.py`	Studio Admin token + cookie authentication
`test_depreciation_service.py`	Depreciation engine (#184 D2) — pure calculation, no DB. Values pinned to NTA examples (Tax Answer No.2106): straight-line, 200%/250% declining balance selected by acquisition date, guaranteed-rate floor → revised-rate switch, used-asset simplified life, SME 3-way classification, monthly-sum = annual invariant
`test_ledger.py`	Ledger engine + reports incl. consumption-tax summary (#188): tax-category aggregation normalized per 税込/税抜 (tax-inclusive/tax-exclusive) snapshot, per-rate base/tax, net payable, drafts excluded, accounting-only permission; #189 journal CSV (simple/compound/諸口 (sundry)/drafts); #190 株主資本等変動計算書 (statement of changes in equity) ties to BS + 決算報告書 (financial report) bundle/notes
`test_fixed_assets.py`	Fixed-asset candidates + bulk registration + register ops (#184 D3/D4) — flag permissions/guards, candidate filtering, all-rows-or-nothing validation, schedule v1 generation, double-registration prevention, used-asset life (+ mismatch rejection), SL-only enforcement, disposal/sale with schedule version-up and closed-month guards, depreciation journal preview/generate (idempotent, ratio-applied, missing-account 422), classification override within allowed classes, manual (existing-asset) registration incl. past-date schedules and cancel; #187 sweep skips asset-flagged docs + acquisition entry on registration (tax-aware, bulk → 一括償却資産 (lump-sum depreciable assets), manual → none, cancel deletes acquisition draft)
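
The "monthly-sum = annual invariant" listed for `test_depreciation_service.py` can be pictured with a small sketch. This is only one possible allocation rule (an even split with the remainder placed in the final month); the function name is hypothetical and the engine's actual rule is not described here.

```python
def split_annual_into_months(annual: int, months: int = 12) -> list[int]:
    """Spread an annual amount over months; the last month absorbs the remainder.

    Assumption: an even split with a final-month remainder. The real engine may
    allocate differently, but any rule must keep sum(parts) == annual.
    """
    if months <= 0:
        raise ValueError("months must be positive")
    base = annual // months
    parts = [base] * months
    parts[-1] += annual - base * months
    return parts


if __name__ == "__main__":
    parts = split_annual_into_months(222_111)
    print(parts[0], parts[-1], sum(parts))  # 18509 18512 222111
```

A test for the invariant then reduces to asserting that the monthly parts sum back to the annual figure for every year of the schedule.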

Adding backend tests

Place new tests under backend/tests/.
The shared conftest.py provides:
- mock_firestore (autouse) — in-memory Firestore fake supporting where().where().limit().stream(), document().set/update/delete, transaction(), batch(), ArrayUnion, @transactional
- mock_firebase_admin_auth — auth.create_user / get_user_by_email fakes
- mock_auth(uid) — switches the authenticated user
For routes that depend on get_corporate_context, override the dependency with app.dependency_overrides[get_corporate_context] = lambda: CorporateContext(...). Patching resolve_corporate_id alone is insufficient.

Butler Tax — Frontend E2E (Playwright)

154 tests, all passing.

bash

cd butler-tax/frontend
npx playwright test                 # all tests
npx playwright test --ui            # interactive UI mode
npx playwright test rbac            # filter by file name

Coverage by file

File	Tests	Focus
`rbac_boundaries.spec.ts`	12	URL guards, dynamic permission flips
`document_matching.spec.ts`	12	Reconciliation page, double-match prevention
`impersonation.spec.ts`	10	Tax firm proxy, banner, `as_corporate` injection
`document_approval.spec.ts`	9	Approval lifecycle, re-submission
`chat_actions.spec.ts`	9	Butler Chat ops, file attach, `download_url`
`e2e_business_flow.spec.ts`	3	Cross-page business scenarios
`customer_review.spec.ts`	12	Tax firm dashboard customer review
`ai_chat_tool.spec.ts`	3	Tool format parsing regression
`alert_settings.spec.ts` / `tax_firm_alerts.spec.ts`	19	Alert configuration
`bank_account.spec.ts` / `customer_list.spec.ts` / `direct_messages.spec.ts` / `tax_firm_dashboard.spec.ts` / `invitations.spec.ts`	65	Other UI flows

Conventions

All API calls are stubbed via page.route(). Tests never require a running backend.
Use the shared helpers:
- helpers/auth.ts — bypassAuth, stubAuthAsEmployee, stubAuthAsTaxFirm, stubAuthAsCorporate, stubPlans, stubCredits
- helpers/noise.ts — installFallback (catch-all /api/v1/* → []), stubCorporateDashboardNoise
Test IDs use a per-file prefix (RBAC-, MATCH-, IMP-, APPR-, BIZ-, CHAT-, D-, I-, A-, etc.). Avoid duplicating IDs across files.
The Butler chat overlay opens on login by default. Tests that need to click underneath the overlay should inject CSS via addInitScript to hide .overlay-root.
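
The rule about not duplicating test IDs across files can be checked mechanically. The sketch below is a hypothetical helper, not part of the repository; it assumes IDs look like upper-case prefix segments followed by a number (for example `RBAC-01` or `LS-TAX-02`) and that spec file contents have already been read into a dictionary.

```python
import re
from collections import defaultdict

# Assumed ID shape: upper-case segments joined by "-" and ending in digits.
TEST_ID = re.compile(r"\b[A-Z]+(?:-[A-Z]+)*-\d+\b")


def find_duplicate_ids(sources: dict[str, str]) -> dict[str, list[str]]:
    """Map each test ID that appears in more than one file to those files."""
    seen: dict[str, set[str]] = defaultdict(set)
    for filename, text in sources.items():
        for test_id in TEST_ID.findall(text):
            seen[test_id].add(filename)
    return {tid: sorted(files) for tid, files in seen.items() if len(files) > 1}


if __name__ == "__main__":
    specs = {
        "rbac_boundaries.spec.ts": "test('RBAC-01: guard redirects', ...)",
        "impersonation.spec.ts": "test('IMP-01: banner shows', ...)",
        "customer_list.spec.ts": "test('RBAC-01: list loads', ...)",
    }
    print(find_duplicate_ids(specs))
    # {'RBAC-01': ['customer_list.spec.ts', 'rbac_boundaries.spec.ts']}
```

Because the pattern matches anywhere in the file text, comments that mention another file's ID would also be reported; restricting the scan to test titles would avoid that.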

Known UI quirks tested via E2E

usePermissions.canAccess() short-circuits when !initialized, so the initial render does not register settings.value as a reactive dependency. When a test changes the permission settings after mount, an explicit re-render trigger (e.g. clicking the chat close button) may be required to flush the Vue v-if.
bulkActions: true is configured only for issued invoices in DocumentListPage. Receipt and received-invoice list pages cannot bulk-submit; their draft → pending transition happens at create time.

Live smoke — real login + real backend (#192)

bash

# requires the live dev stack (start-dev.sh): platform:8003 / tax:8001 / law:8002
cd butler-tax/frontend
E2E_LIVE=1 npx playwright test e2e/live-smoke.spec.ts

Opt-in (E2E_LIVE=1); skipped in normal CI so the mocked suite stays fast and backend-free. This layer covers what the mocked suite cannot catch by construction: page.route() stubs return 200 regardless of headers or cookies, so auth, cookie, and cross-service wiring is never exercised. The Bearer null download bug (#191) slipped through both pytest and the mocked E2E suite for this reason. The live smoke performs a real POST /auth/login (helpers/live.ts stores the butler_session cookie in the browser context) and then runs:

Test	Catches
`LS-TAX-01`	#191 regression — drives the real UI (clicks the journal-CSV download button) so the actual `fetch` wiring runs against the real backend. A `Bearer null`/missing-`credentials` regression 401s → no download event → fail (verified to fail when reverted).
`LS-TAX-02`	Authenticated exports (CSV, 全銀 [Zengin bank-transfer format], 決算報告書一式 [full financial-report bundle]) return 200 with the real cookie
`LS-TAX-03`	the tax customer's cookie crosses to law authed endpoints (`butler_tax` premium → law entitlement)

Design rule: a live smoke that only injects a cookie and calls the API directly would not have caught #191, because it bypasses the frontend getToken()/fetch path. The regression guard must drive the real button (this is also why butler-law's LS-07 drives the real chat UI, not just /api/chat).

Account: the dev customer butlerseriestest@ (override via LIVE_EMAIL/LIVE_PASSWORD).
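
The design rule can be illustrated with a small Python model. Everything in it is a hypothetical stand-in: `build_headers` plays the frontend getToken()/fetch path, `mocked_route` plays a page.route() stub, and `real_backend` plays the live service. For simplicity it models the credential as a bearer token only.

```python
from typing import Optional


def build_headers(token: Optional[str]) -> dict[str, str]:
    """Stand-in for the frontend wiring that turns a token into a header.

    A missing token becomes the literal string "Bearer None", the Python
    analogue of the JavaScript "Bearer null" bug class.
    """
    return {"Authorization": f"Bearer {token}"}


def mocked_route(headers: dict[str, str]) -> int:
    """Like a route stub: always answers 200 and never inspects headers."""
    return 200


def real_backend(headers: dict[str, str], valid_token: str = "secret") -> int:
    """Like the live service: rejects anything but the expected credential."""
    return 200 if headers.get("Authorization") == f"Bearer {valid_token}" else 401


def direct_api_call(valid_token: str = "secret") -> int:
    """A smoke test that crafts its own request and skips build_headers()."""
    return real_backend({"Authorization": f"Bearer {valid_token}"}, valid_token)


if __name__ == "__main__":
    broken = build_headers(None)   # regression: the token lookup returned nothing
    print(mocked_route(broken))    # 200: the mocked suite stays green
    print(direct_api_call())       # 200: an API-only smoke stays green
    print(real_backend(broken))    # 401: only the UI-driven path exposes the bug
```

Only the last call sends the header produced by the broken frontend path to a checker that actually validates it, which is the combination a UI-driven live smoke provides.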

Butler Platform — Live smoke (real login + SendGrid + register, #192)

bash

# requires the live dev stack (start-dev.sh): platform:8003 + real MongoDB + SendGrid
cd butler-platform/auth/frontend
E2E_LIVE=1 npx playwright test e2e/live-smoke.spec.ts

Opt-in (E2E_LIVE=1); skipped in normal CI. Covers the platform-crossing seams that the mocked suite cannot: real auth, SendGrid send wiring, and the sales-agent invitation → real registration flow.

Test	Catches
`LS-PLAT-01`	Sales agent (`sales-agent-token` mock) issues an invitation → 201 + token + `sales_agent_id`, and SendGrid sends in sandbox mode (`X-Email-Sandbox: 1` → `X-Email-Sandbox-Sent: 1`): verifies the API key / verified sender / payload / connectivity without delivering a real email.
`LS-PLAT-02`	The invited email completes a real `POST /auth/register` (account persisted; a duplicate re-register returns 409).

The describe block runs in serial mode; beforeAll/afterAll shell out to two dev-only backend scripts (idempotent, guarded by ENVIRONMENT != production):

scripts/seed_live_smoke_sales_agent.py — ensures the seed_sales_agent_uid account + agent_lsmoke01 master that sales-agent-token resolves to.
scripts/cleanup_live_smoke.py — deletes the real registered corporate (Firebase user + platform_accounts + its invitation) so re-runs are clean.

Layer boundary: the sales-agent commission linkage (contract.sales_agent_id) is propagated via Stripe, so it is not in the live smoke — it's covered by tests/test_sales_agent_invitations.py::test_accept_sales_agent_invitation_propagates_sales_agent_id. The sandbox header wiring is unit-tested by test_post_sandbox_header_surfaces_send_result (no real SendGrid).

Butler Law — Tests

Backend (pytest)

bash

cd butler-law/backend
uv run pytest tests/        # 122 tests; uses the isolated butler_law_test DB

Covers role-based access (require_law_read), the four knowledge sources (scraper parsers via HTML fixtures, retrieve lanes, viewer HTML structure) and revision tracking (#160).

Frontend E2E (Playwright)

bash

cd butler-law/frontend
npx playwright test                              # mocked suite (CI-safe)
E2E_LIVE=1 npx playwright test e2e/live-smoke.spec.ts   # live smoke (see below)

File	Coverage
`chat.spec.ts`	Chat (SSE streaming, error surfacing), Knowledge Base tabs, citation cards for the four source types (条文 statutes / 通達 circulars / 質疑応答事例 Q&A cases / タックスアンサー Tax Answers) and the enforcement-version footer rendering
`auth.spec.ts`	Auth redirects
`live-smoke.spec.ts`	Opt-in (`E2E_LIVE=1`): runs against the live dev backend + real MongoDB data. `LS-01..05` (API): all four sources populated, viewers return the two-pane HTML, law viewer shows the enforcement version, revisions feed responds. `LS-06`: KB tabs show real counts. `LS-07` (#192): drives the real chat UI — types a question, submits, asserts the SSE-streamed answer renders (the markdown/streaming wiring, not answer quality → golden-set). `LS-08` (#192): the tax→law escalation target (`POST :8002/api/chat` with the forwarded `butler_session`, `agent_id=tax_01`) returns a structured `{response, session_id, source_refs}` — the exact call `chat_service.query_law_agent` makes. Skipped without the env var, so CI stays mock-only

LLM answer quality (e.g. whether the model actually emits the enforcement-version footer) is out of scope for E2E — that belongs to the golden-set evaluation in butler-law/backend (Issue #40).

Studio Admin E2E (Playwright)

bash

cd butler-platform/studio_admin/frontend
npx playwright test         # 67 tests incl. law-admin.spec.ts (Law operations pages)

Butler Tax — Ledger verification harness (#175 S3)

bash

cd butler-tax/backend
uv run python -m scripts.verify_ledger_testdata          # requires local TestData/

Options: --keep leaves the generated entries in the DB after the run (default wipes them), --corporate-id <id> targets a specific corporate (e.g. to demo real figures in the ledger UI on a dev login).

Local-only (real-company data lives outside the repo): parses a Yayoi 残高試算表(年間推移) (trial balance, annual trend) PDF, reconstructs the year as opening balances + one compound journal entry per month + a closing entry, then reconciles Butler's trial balance / PL / BS against Yayoi's own figures for every leaf account and every month. An exit code of 0 means an exact match.
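
The reconcile-then-exit-code step can be sketched as follows. This is a simplified illustration, not the harness itself: the function names, the `(account, month)` key shape, and the use of 1 as the mismatch exit code are assumptions (the text above only states that 0 means an exact match).

```python
Key = tuple[str, int]  # (leaf account name, month number); an assumed key shape


def reconcile(expected: dict[Key, int], actual: dict[Key, int]) -> list[str]:
    """Return one message per cell that differs; an empty list means exact match."""
    problems = []
    for key in sorted(expected.keys() | actual.keys()):
        want, got = expected.get(key), actual.get(key)
        if want != got:
            account, month = key
            problems.append(f"{account} month {month}: expected {want}, got {got}")
    return problems


def exit_code(problems: list[str]) -> int:
    """0 for an exact match; 1 (an assumed value) for any difference."""
    return 0 if not problems else 1


if __name__ == "__main__":
    yayoi = {("Cash", 1): 1_200_000, ("Cash", 2): 950_000}
    butler = {("Cash", 1): 1_200_000, ("Cash", 2): 950_100}
    diffs = reconcile(yayoi, butler)
    for line in diffs:
        print(line)  # Cash month 2: expected 950000, got 950100
    print("exit code:", exit_code(diffs))  # exit code: 1
```

Comparing the union of keys means a cell that exists on only one side is reported as a difference rather than silently skipped.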

Verified 2026-06-12 against ㈱My2cents 令和7年12月期 (fiscal year ending December 2025): all figures match (20 BS accounts × 11 months, full PL waterfall, BS totals and equation).

Butler Tax — Depreciation verification harness (#184 D5)

bash

cd butler-tax/backend
uv run python -m scripts.verify_depreciation_testdata   # requires local TestData/ + platform on :8003

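Local-only. Parses 別表十六(二)/(七) (Appended Tables 16(2) and 16(7)) and the 適用額明細書 (statement of special-measure application amounts) from the filed corporate tax return of Eagle Asset Management (令和7年10月期, fiscal year ending October 2025) and reconciles them against the depreciation engine and the platform rate/threshold masters. The checks cover:
- cross-foots of the declining-balance schedule
- the single-asset 車両運搬具 (vehicles) row: 調整前償却額 (pre-adjustment depreciation) = floor(基礎 × 0.333) and 償却保証額 (guaranteed amount) = floor(取得価額 × 0.09911), where 基礎 is the depreciation base and 取得価額 is the acquisition cost; this is a real-world confirmation of the 6-year 200% DB master values
- the no-switch branch
- the engine's total-depreciation invariant
- the SME immediate-expensing classification of both 措法67の5 (Act on Special Measures Concerning Taxation, Art. 67-5) assets incl. the adoption statement total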
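
The two floor formulas above, together with the guaranteed-amount switch, can be worked through in a short sketch. It is a simplified model for full 12-month years, not the engine's code. The rate 0.333 and guarantee rate 0.09911 come from the text; the revised rate 0.334 is the published value for the same 6-year life. Flooring the revised-rate amount, the ¥1,000,000 acquisition cost, and the function names are assumptions made for illustration.

```python
from decimal import Decimal, ROUND_FLOOR


def floor_yen(value: Decimal) -> int:
    """Round down to whole yen using exact decimal arithmetic."""
    return int(value.to_integral_value(rounding=ROUND_FLOOR))


def declining_balance_schedule(cost: int, years: int, rate: str,
                               revised_rate: str, guarantee_rate: str) -> list[int]:
    """Annual depreciation for full years, leaving a 1-yen memorandum value."""
    guaranteed = floor_yen(Decimal(cost) * Decimal(guarantee_rate))
    book, revised_base, schedule = cost, None, []
    for _ in range(years):
        if revised_base is None:
            amount = floor_yen(Decimal(book) * Decimal(rate))
            if amount < guaranteed:
                # Pre-adjustment amount fell below the guaranteed amount:
                # freeze the base and switch to the revised rate.
                revised_base = book
        if revised_base is not None:
            amount = floor_yen(Decimal(revised_base) * Decimal(revised_rate))
        amount = min(amount, book - 1)  # never depreciate below 1 yen
        schedule.append(amount)
        book -= amount
    return schedule


if __name__ == "__main__":
    result = declining_balance_schedule(1_000_000, 6, "0.333", "0.334", "0.09911")
    print(result)       # [333000, 222111, 148148, 99111, 99111, 98518]
    print(sum(result))  # 999999
```

In this example the year-4 pre-adjustment amount (98,814) falls below the guaranteed amount (99,110), so years 4 to 6 use the revised rate on the frozen base of 296,741. Rates are passed as strings and multiplied as `Decimal` so that values such as 0.09911 are not distorted by binary floating point before flooring.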

Verified 2026-06-12: 18 checks, all matching the filed return. Per-asset multi-year schedule verification needs the Yayoi 固定資産台帳 (fixed-asset register, which carries per-asset acquisition dates); it is not in TestData yet.

Butler Tax — Payroll verification harness (#178 P4)

bash

cd butler-tax/backend
uv run python -m scripts.verify_payroll_testdata    # requires local TestData/

Local-only. Reads the monthly amounts of the five payroll accounts, 役員報酬 (officers' compensation), 給料手当 (salaries and allowances), 雑給 (miscellaneous wages), 賞与 (bonuses), and 法定福利費 (statutory welfare expenses), from the same Yayoi annual trend PDF, pushes each month through the payroll import path (import → journal generation → confirm), and reconciles the trial balance against Yayoi for every account and month. Negative months in the real data (red-slip corrections, the December 雑給→ reclassification) are booked as manual correction entries because they are corrections, not payroll.

Verified 2026-06-12: all figures match (5 accounts × 12 months + annual totals; 給料手当 ¥54,915,638 reproduced through the payroll path).

Parsing notes that generalize:
- Yayoi BS columns are month-end balances in section-direction sign convention (contra accounts negative).
- PL columns are monthly amounts.
- The 13th column is closing entries.
- Breakdown rows (e.g. 外注費 (outsourcing costs) as a sub-row of 当期商品仕入高 (purchases for the period)) must not be double-counted.
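
Two of these parsing notes translate directly into code. The sketch below is illustrative only: `Row`, `monthly_movements`, and `pl_total` are hypothetical names, and it assumes the parser has already flagged which rows are breakdown sub-rows.

```python
from dataclasses import dataclass


@dataclass
class Row:
    name: str
    values: list[int]           # one value per month column
    is_breakdown: bool = False  # True for sub-rows already included in a parent row


def monthly_movements(opening: int, month_end_balances: list[int]) -> list[int]:
    """Turn BS month-end balances into the per-month change to be journalized."""
    movements, previous = [], opening
    for balance in month_end_balances:
        movements.append(balance - previous)
        previous = balance
    return movements


def pl_total(rows: list[Row]) -> list[int]:
    """Sum PL monthly amounts per column, ignoring breakdown sub-rows."""
    counted = [row.values for row in rows if not row.is_breakdown]
    return [sum(column) for column in zip(*counted)]


if __name__ == "__main__":
    print(monthly_movements(100, [150, 120, 120]))  # [50, -30, 0]
    rows = [
        Row("当期商品仕入高", [300, 200]),
        Row("外注費", [100, 50], is_breakdown=True),
    ]
    print(pl_total(rows))  # [300, 200]
```

BS rows need the differencing step because their columns are balances, while PL rows can be used as they are because their columns are already monthly amounts.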
