Testing

Butler Tax — Backend (pytest)

521 tests, all passing.

bash
cd butler-tax/backend
uv run pytest                       # all tests
uv run pytest tests/test_approvals.py
uv run pytest -k "test_matching"

Coverage by file

| File | Focus |
| --- | --- |
| test_approvals.py | Approval workflow rule resolution, event sequencing |
| test_matching_score_service.py | Reconciliation scoring, due-date calculation |
| test_agent_tools.py / test_agent_tools_suggest.py / test_agent_tools_exec.py | Butler Chat tools (read / suggest / exec) |
| test_data_filtering.py | CorporateContext scoping, role-based query filters |
| test_advisor_demo_mode.py | MOCK_TOKEN demo bypass |
| test_alert_batch.py | Scheduled alert batch processing |
| test_users.py | Registration, employee sync, tax-firm clients |
| test_studio_admin_auth.py | Studio Admin token + cookie authentication |
| test_depreciation_service.py | Depreciation engine (#184 D2): pure calculation, no DB. Values pinned to NTA examples (Tax Answer No.2106): straight-line, 200%/250% declining balance selected by acquisition date, guaranteed-rate floor → revised-rate switch, used-asset simplified life, SME 3-way classification, monthly-sum = annual invariant |
| test_ledger.py | Ledger engine + reports incl. consumption-tax summary (#188): tax-category aggregation normalized per tax-inclusive/tax-exclusive (税込/税抜) snapshot, per-rate base/tax, net payable, draft excluded, accounting-only permission; #189 journal CSV (simple/compound/sundry (諸口)/drafts); #190 statement of changes in equity (株主資本等変動計算書) ties to BS + financial statements report (決算報告書) bundle/notes |
| test_fixed_assets.py | Fixed-asset candidates + bulk registration + register ops (#184 D3/D4): flag permissions/guards, candidate filtering, all-rows-or-nothing validation, schedule v1 generation, double-registration prevention, used-asset life (+ mismatch rejection), SL-only enforcement, disposal/sale with schedule version-up and closed-month guards, depreciation journal preview/generate (idempotent, ratio-applied, missing-account 422), classification override within allowed classes, manual (existing-asset) registration incl. past-date schedules and cancel; #187 sweep skips asset-flagged docs + acquisition entry on registration (tax-aware, bulk → lump-sum depreciable assets (一括償却資産), manual → none, cancel deletes acquisition draft) |
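
The "monthly-sum = annual invariant" listed for test_depreciation_service.py can be illustrated with a small sketch. The function name and the allocation rule (equal floor amounts, remainder in the final month) are assumptions made for this example; the real engine may distribute the remainder differently.

```python
def split_annual_into_months(annual: int, months: int = 12) -> list:
    """Split an annual depreciation amount (whole yen) into monthly amounts.

    Assumption for this sketch: each month gets floor(annual / months) and
    the final month absorbs whatever is left over.
    """
    if months <= 0:
        raise ValueError("months must be positive")
    base = annual // months
    amounts = [base] * months
    amounts[-1] += annual - base * months  # remainder goes to the last month
    return amounts


monthly = split_annual_into_months(100_000)
# The invariant: monthly amounts must add up to the annual figure exactly.
assert sum(monthly) == 100_000
```

Whatever allocation rule is used, a test of this shape guards against rounding drift between the monthly journal amounts and the annual schedule.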

Adding backend tests

  • Place new tests under backend/tests/
  • The shared conftest.py provides:
    • mock_firestore (autouse) — in-memory Firestore fake supporting where().where().limit().stream(), document().set/update/delete, transaction(), batch(), ArrayUnion, @transactional
    • mock_firebase_admin_auth: auth.create_user / get_user_by_email fakes
    • mock_auth(uid) — switches the authenticated user
  • For routes that depend on get_corporate_context, override the dependency with app.dependency_overrides[get_corporate_context] = lambda: CorporateContext(...). Patching resolve_corporate_id alone is insufficient.
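
To give a feel for what an in-memory fake supporting where().where().limit().stream() involves, here is a minimal sketch. It is not the project's conftest.py: FakeQuery and FakeSnapshot are hypothetical names, only the query chain is covered, and the supported operators are an assumption.

```python
import operator
from dataclasses import dataclass, field
from typing import Optional

_OPS = {"==": operator.eq, "<": operator.lt, "<=": operator.le,
        ">": operator.gt, ">=": operator.ge}


@dataclass
class FakeSnapshot:
    id: str
    data: dict

    def to_dict(self) -> dict:
        return dict(self.data)  # copy, so tests cannot mutate the store


@dataclass
class FakeQuery:
    docs: dict                                   # doc_id -> document data
    filters: list = field(default_factory=list)  # (field, op, value) tuples
    max_results: Optional[int] = None

    def where(self, field_name, op, value):
        # Each call returns a new query, so chains never share filter state.
        return FakeQuery(self.docs, self.filters + [(field_name, op, value)],
                         self.max_results)

    def limit(self, n):
        return FakeQuery(self.docs, self.filters, n)

    def stream(self):
        emitted = 0
        for doc_id, data in self.docs.items():
            if self.max_results is not None and emitted >= self.max_results:
                return
            if all(f in data and _OPS[op](data[f], v)
                   for f, op, v in self.filters):
                emitted += 1
                yield FakeSnapshot(doc_id, data)


docs = {
    "d1": {"corporate_id": "c1", "status": "pending"},
    "d2": {"corporate_id": "c1", "status": "approved"},
    "d3": {"corporate_id": "c2", "status": "pending"},
    "d4": {"corporate_id": "c1", "status": "pending"},
}
query = FakeQuery(docs).where("corporate_id", "==", "c1").where("status", "==", "pending")
pending_ids = [snap.id for snap in query.stream()]
first_only = [snap.id for snap in query.limit(1).stream()]
```

Returning a fresh query object from every where() and limit() call mirrors the immutable chaining style of the real client and keeps tests that reuse a base query independent of each other.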

Butler Tax — Frontend E2E (Playwright)

154 tests, all passing.

bash
cd butler-tax/frontend
npx playwright test                 # all tests
npx playwright test --ui            # interactive UI mode
npx playwright test rbac            # filter by file name

Coverage by file

| File | Tests | Focus |
| --- | --- | --- |
| rbac_boundaries.spec.ts | 12 | URL guards, dynamic permission flips |
| document_matching.spec.ts | 12 | Reconciliation page, double-match prevention |
| impersonation.spec.ts | 10 | Tax firm proxy, banner, as_corporate injection |
| document_approval.spec.ts | 9 | Approval lifecycle, re-submission |
| chat_actions.spec.ts | 9 | Butler Chat ops, file attach, download_url |
| e2e_business_flow.spec.ts | 3 | Cross-page business scenarios |
| customer_review.spec.ts | 12 | Tax firm dashboard customer review |
| ai_chat_tool.spec.ts | 3 | Tool format parsing regression |
| alert_settings.spec.ts / tax_firm_alerts.spec.ts | 19 | Alert configuration |
| bank_account.spec.ts / customer_list.spec.ts / direct_messages.spec.ts / tax_firm_dashboard.spec.ts / invitations.spec.ts | 65 | Other UI flows |

Conventions

  • All API calls are stubbed via page.route(). Tests never require a running backend.
  • Use the shared helpers:
    • helpers/auth.ts: bypassAuth, stubAuthAsEmployee, stubAuthAsTaxFirm, stubAuthAsCorporate, stubPlans, stubCredits
    • helpers/noise.ts: installFallback (catch-all /api/v1/* → []), stubCorporateDashboardNoise
  • Test IDs use a per-file prefix (RBAC-, MATCH-, IMP-, APPR-, BIZ-, CHAT-, D-, I-, A-, etc.). Avoid duplicating IDs across files.
  • The Butler chat overlay opens on login by default. Tests that need to click underneath the overlay should inject CSS via addInitScript to hide .overlay-root.

Known UI quirks tested via E2E

  • usePermissions.canAccess() short-circuits when !initialized, which means the initial render doesn't track settings.value reactively. When the test triggers a permission-settings change after mount, an explicit re-render trigger (e.g. clicking the chat close button) may be required to flush the Vue v-if.
  • bulkActions: true is configured only for issued invoices in DocumentListPage. Receipt and received-invoice list pages cannot bulk-submit; their draft → pending transition happens at create time.

Live smoke — real login + real backend (#192)

bash
# requires the live dev stack (start-dev.sh): platform:8003 / tax:8001 / law:8002
cd butler-tax/frontend
E2E_LIVE=1 npx playwright test e2e/live-smoke.spec.ts

Opt-in (E2E_LIVE=1); skipped in normal CI so the mocked suite stays fast and backend-free. This layer covers what the mocked suite cannot catch by construction: page.route() stubs return 200 regardless of headers or cookies, so auth, cookie, and cross-service wiring is never exercised (the Bearer null download bug, #191, slipped through both pytest and the mocked E2E suite). The live smoke performs a real POST /auth/login (helpers/live.ts → butler_session cookie in the context) and then:

| Test | Catches |
| --- | --- |
| LS-TAX-01 | #191 regression: drives the real UI (clicks the journal-CSV download button) so the actual fetch wiring runs against the real backend. A Bearer null/missing-credentials regression 401s → no download event → fail (verified to fail when reverted). |
| LS-TAX-02 | Authed exports (CSV / Zengin format (全銀) / full financial statements bundle (決算報告書一式)) return 200 with the real cookie |
| LS-TAX-03 | The tax customer's cookie crosses to law authed endpoints (butler_tax premium → law entitlement) |

Design rule: a live smoke that only injects a cookie and calls the API directly would not have caught #191 — it bypasses the frontend getToken()/fetch path. The regression guard must drive the real button (this is also why butler-law's LS-07 drives the real chat UI, not just /api/chat). Account: the dev customer butlerseriestest@ (override via LIVE_EMAIL/LIVE_PASSWORD).

Butler Platform — Live smoke (real login + SendGrid + register, #192)

bash
# requires the live dev stack (start-dev.sh): platform:8003 + real MongoDB + SendGrid
cd butler-platform/auth/frontend
E2E_LIVE=1 npx playwright test e2e/live-smoke.spec.ts

Opt-in (E2E_LIVE=1); skipped in normal CI. Covers the platform-crossing seams the mocked suite can't: real auth, SendGrid send wiring, and the sales-agent invitation → real registration flow.

| Test | Catches |
| --- | --- |
| LS-PLAT-01 | Sales agent (sales-agent-token mock) issues an invitation → 201 + token + sales_agent_id, and SendGrid sends in sandbox mode (X-Email-Sandbox: 1 → X-Email-Sandbox-Sent: 1): verifies the API key / verified sender / payload / connectivity without delivering a real email. |
| LS-PLAT-02 | The invited email completes a real POST /auth/register (account persisted; a duplicate re-register returns 409). |

The describe block runs serial; beforeAll/afterAll shell out to two dev-only backend scripts (idempotent, ENVIRONMENT != production):

  • scripts/seed_live_smoke_sales_agent.py — ensures the seed_sales_agent_uid account + agent_lsmoke01 master that sales-agent-token resolves to.
  • scripts/cleanup_live_smoke.py — deletes the real registered corporate (Firebase user + platform_accounts + its invitation) so re-runs are clean.

Layer boundary: the sales-agent commission linkage (contract.sales_agent_id) is propagated via Stripe, so it is not in the live smoke — it's covered by tests/test_sales_agent_invitations.py::test_accept_sales_agent_invitation_propagates_sales_agent_id. The sandbox header wiring is unit-tested by test_post_sandbox_header_surfaces_send_result (no real SendGrid).

Butler Law — Tests

Backend (pytest)

bash
cd butler-law/backend
uv run pytest tests/        # 122 tests; uses the isolated butler_law_test DB

Covers role-based access (require_law_read), the four knowledge sources (scraper parsers via HTML fixtures, retrieve lanes, viewer HTML structure) and revision tracking (#160).

Frontend E2E (Playwright)

bash
cd butler-law/frontend
npx playwright test                              # mocked suite (CI-safe)
E2E_LIVE=1 npx playwright test e2e/live-smoke.spec.ts   # live smoke (see below)

| File | Coverage |
| --- | --- |
| chat.spec.ts | Chat (SSE streaming, error surfacing), Knowledge Base tabs, citation cards (statute articles 条文 / circulars 通達 / Q&A cases 質疑応答事例 / Tax Answers タックスアンサー) and the enforcement-version footer rendering |
| auth.spec.ts | Auth redirects |
| live-smoke.spec.ts | Opt-in (E2E_LIVE=1): runs against the live dev backend + real MongoDB data. LS-01..05 (API): all four sources populated, viewers return the two-pane HTML, law viewer shows the enforcement version, revisions feed responds. LS-06: KB tabs show real counts. LS-07 (#192): drives the real chat UI: types a question, submits, asserts the SSE-streamed answer renders (the markdown/streaming wiring, not answer quality → golden-set). LS-08 (#192): the tax→law escalation target (POST :8002/api/chat with the forwarded butler_session, agent_id=tax_01) returns a structured {response, session_id, source_refs}, the exact call chat_service.query_law_agent makes. Skipped without the env var, so CI stays mock-only |

LLM answer quality (e.g. whether the model actually emits the enforcement-version footer) is out of scope for E2E — that belongs to the golden-set evaluation in butler-law/backend (Issue #40).

Studio Admin E2E (Playwright)

bash
cd butler-platform/studio_admin/frontend
npx playwright test         # 67 tests incl. law-admin.spec.ts (Law operations pages)

Butler Tax — Ledger verification harness (#175 S3)

bash
cd butler-tax/backend
uv run python -m scripts.verify_ledger_testdata          # requires local TestData/

Options: --keep leaves the generated entries in the DB after the run (default wipes them), --corporate-id <id> targets a specific corporate (e.g. to demo real figures in the ledger UI on a dev login).

Local-only (real-company data lives outside the repo): parses a Yayoi trial balance with annual trend (残高試算表(年間推移)) PDF, reconstructs the year as opening balances + one compound journal entry per month + a closing entry, then reconciles Butler's trial balance / PL / BS against Yayoi's own figures for every leaf account and every month. Exit code 0 = exact match.
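
The reconstruct-then-reconcile idea can be sketched in a few lines. This is an illustration only, not the harness itself: the function names, the toy figures, and the debit-positive sign convention (credit balances negative) are assumptions, and the closing entry is left out.

```python
def monthly_compound_entries(opening, month_ends):
    """Derive one compound entry ({account: signed delta}) per month."""
    entries, previous = [], opening
    for balances in month_ends:
        deltas = {
            account: balances.get(account, 0) - previous.get(account, 0)
            for account in set(previous) | set(balances)
        }
        entry = {account: d for account, d in deltas.items() if d != 0}
        # With debit-positive balances, debits and credits must net to zero.
        if sum(entry.values()) != 0:
            raise ValueError("month does not balance")
        entries.append(entry)
        previous = balances
    return entries


def replay(opening, entries):
    """Post the entries on top of the opening balances."""
    current, result = dict(opening), []
    for entry in entries:
        for account, delta in entry.items():
            current[account] = current.get(account, 0) + delta
        result.append(dict(current))
    return result


def mismatches(expected, actual):
    """List (month, account) pairs whose balances differ; empty = exact match."""
    return [
        (month, account)
        for month, (exp, act) in enumerate(zip(expected, actual), start=1)
        for account in sorted(set(exp) | set(act))
        if exp.get(account, 0) != act.get(account, 0)
    ]


opening = {"cash": 1000, "capital": -1000}
source_balances = [
    {"cash": 1500, "capital": -1000, "sales": -500},
    {"cash": 1200, "capital": -1000, "sales": -500, "rent": 300},
]
entries = monthly_compound_entries(opening, source_balances)
differences = mismatches(source_balances, replay(opening, entries))
```

An empty differences list corresponds to the harness's exit code 0; any non-empty result names the month and account to investigate.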

Verified 2026-06-12 against ㈱My2cents (fiscal year ending December 2025, 令和7年12月期): all figures match (20 BS accounts × 11 months, full PL waterfall, BS totals and equation).

Butler Tax — Depreciation verification harness (#184 D5)

bash
cd butler-tax/backend
uv run python -m scripts.verify_depreciation_testdata   # requires local TestData/ + platform on :8003

Local-only. Parses Schedules 16(2) and 16(7) (別表十六(二)/(七)) and the statement of applied amounts (適用額明細書) from the filed corporate tax return of Eagle Asset Management (fiscal year ending October 2025, 令和7年10月期) and reconciles them against the depreciation engine and the platform rate/threshold masters. The checks cover:

  • cross-foots of the declining-balance schedule
  • the single-asset vehicles (車両運搬具) row: pre-adjustment depreciation (調整前償却額) = floor(base (基礎) × 0.333) and guaranteed amount (償却保証額) = floor(acquisition cost (取得価額) × 0.09911), a real-world confirmation of the 6-year 200% DB master values
  • the no-switch branch
  • the engine's total-depreciation invariant
  • the SME immediate-expensing classification of both assets under Article 67-5 of the Act on Special Measures Concerning Taxation (措法67の5), incl. the adoption statement total
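
The two floor formulas for the vehicles row, and the switch decision they feed, can be worked through with integer arithmetic. The rates 0.333 and 0.09911 are the ones quoted above; the function names and the 3,000,000 yen acquisition cost are hypothetical, chosen only for illustration.

```python
RATE_PER_1000 = 333          # depreciation rate 0.333 (6-year, 200% declining balance)
GUARANTEE_PER_100000 = 9911  # guarantee rate 0.09911


def pre_adjustment_depreciation(opening_book_value: int) -> int:
    """floor(base x 0.333), using integers to avoid float rounding."""
    return opening_book_value * RATE_PER_1000 // 1000


def guaranteed_amount(acquisition_cost: int) -> int:
    """floor(acquisition cost x 0.09911)."""
    return acquisition_cost * GUARANTEE_PER_100000 // 100000


def switches_to_revised_rate(opening_book_value: int, acquisition_cost: int) -> bool:
    """True once the pre-adjustment amount falls below the guaranteed amount."""
    return (pre_adjustment_depreciation(opening_book_value)
            < guaranteed_amount(acquisition_cost))


cost = 3_000_000  # hypothetical acquisition cost in yen
first_year = pre_adjustment_depreciation(cost)   # 999_000
floor_amount = guaranteed_amount(cost)           # 297_330
no_switch = not switches_to_revised_rate(cost, cost)
```

Scaling the rates to integers before dividing keeps the floor exact; multiplying by a float such as 0.09911 can land a hair below a whole yen and floor to the wrong value.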

Verified 2026-06-12: 18 checks, all matching the filed return. Per-asset multi-year schedule verification needs the Yayoi fixed-asset ledger (固定資産台帳), which holds per-asset acquisition dates and is not in TestData yet.

Butler Tax — Payroll verification harness (#178 P4)

bash
cd butler-tax/backend
uv run python -m scripts.verify_payroll_testdata    # requires local TestData/

Local-only. Reads the monthly amounts of the five payroll accounts (officers' remuneration 役員報酬 / salaries and allowances 給料手当 / miscellaneous wages 雑給 / bonuses 賞与 / statutory welfare expenses 法定福利費) from the same Yayoi annual trend PDF, pushes each month through the payroll import path (import → journal generation → confirm), and reconciles the trial balance against Yayoi for every account and month. Negative months in the real data (red-slip corrections, the December 雑給→ reclassification) are booked as manual correction entries because they are corrections, not payroll.

Verified 2026-06-12: all figures match (5 accounts × 12 months + annual totals; salaries and allowances (給料手当) ¥54,915,638 reproduced through the payroll path). Parsing notes that generalize:

  • Yayoi BS columns are month-end balances in section-direction sign convention (contra accounts negative).
  • PL columns are monthly amounts.
  • The 13th column is closing entries.
  • Breakdown rows (e.g. outsourcing expenses (外注費) as a sub-row of merchandise purchases for the period (当期商品仕入高)) must not be double-counted.
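
Two of these parsing notes translate directly into code: PL totals must skip breakdown rows, and BS month-end balances must be differenced to obtain monthly movements. The sketch below uses a hypothetical Row type and toy figures; how the real parser detects a breakdown row is not shown in this document.

```python
from dataclasses import dataclass


@dataclass
class Row:
    name: str
    monthly: list               # PL rows: one amount per month
    is_breakdown: bool = False  # True for sub-rows already included in a parent


def pl_total(rows):
    """Sum PL rows per month, skipping breakdown rows to avoid double-counting."""
    months = len(rows[0].monthly)
    return [
        sum(row.monthly[m] for row in rows if not row.is_breakdown)
        for m in range(months)
    ]


def bs_movements(opening, month_end_balances):
    """Convert BS month-end balances into monthly movements."""
    previous = [opening] + month_end_balances[:-1]
    return [end - prev for end, prev in zip(month_end_balances, previous)]


rows = [
    Row("purchases", [100, 120]),
    Row("outsourcing (sub-row of purchases)", [30, 40], is_breakdown=True),
    Row("salaries", [50, 50]),
]
totals = pl_total(rows)                       # [150, 170]
movements = bs_movements(1000, [1200, 900])   # [200, -300]
```

Summing every row in this toy data would give [180, 210], overstating each month by the sub-row amount, which is the double-counting the note warns about.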

Butler Series — Saikoku Studio