Testing
Butler Tax — Backend (pytest)
521 tests, all passing.
```bash
cd butler-tax/backend
uv run pytest                           # all tests
uv run pytest tests/test_approvals.py   # a single file
uv run pytest -k "test_matching"        # filter by test name
```
Coverage by file
| File | Focus |
|---|---|
| test_approvals.py | Approval workflow rule resolution, event sequencing |
| test_matching_score_service.py | Reconciliation scoring, due-date calculation |
| test_agent_tools.py / test_agent_tools_suggest.py / test_agent_tools_exec.py | Butler Chat tools (read / suggest / exec) |
| test_data_filtering.py | CorporateContext scoping, role-based query filters |
| test_advisor_demo_mode.py | MOCK_TOKEN demo bypass |
| test_alert_batch.py | Scheduled alert batch processing |
| test_users.py | Registration, employee sync, tax-firm clients |
| test_studio_admin_auth.py | Studio Admin token + cookie authentication |
| test_depreciation_service.py | Depreciation engine (#184 D2): pure calculation, no DB. Values pinned to NTA examples (Tax Answer No.2106): straight-line, 200%/250% declining balance selected by acquisition date, guaranteed-rate floor → revised-rate switch, used-asset simplified life, SME 3-way classification, monthly-sum = annual invariant |
| test_ledger.py | Ledger engine + reports, incl. consumption-tax summary (#188): tax-category aggregation normalized per tax-inclusive/tax-exclusive (税込/税抜) snapshot, per-rate base/tax, net payable, drafts excluded, accounting-only permission; #189 journal CSV (simple/compound/sundry account (諸口)/drafts); #190 statement of changes in shareholders' equity (株主資本等変動計算書) ties to BS + financial-statements package (決算報告書) bundle/notes |
| test_fixed_assets.py | Fixed-asset candidates + bulk registration + register ops (#184 D3/D4): flag permissions/guards, candidate filtering, all-rows-or-nothing validation, schedule v1 generation, double-registration prevention, used-asset life (+ mismatch rejection), SL-only enforcement, disposal/sale with schedule version-up and closed-month guards, depreciation journal preview/generate (idempotent, ratio-applied, missing-account 422), classification override within allowed classes, manual (existing-asset) registration incl. past-date schedules and cancel; #187 sweep skips asset-flagged docs + acquisition entry on registration (tax-aware, bulk → lump-sum depreciable asset (一括償却資産), manual → none, cancel deletes acquisition draft) |
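The consumption-tax summary in `test_ledger.py` aggregates lines that may have been recorded tax-inclusive or tax-exclusive. The sketch below shows why each line has to be normalized to a common (base, tax) pair before per-rate totals and the net payable mean anything. It is a minimal, hypothetical illustration and not the ledger engine. The `Line` fields, the function names and the round-down rule are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Line:
    """Hypothetical journal-line shape, for illustration only."""
    amount: int         # yen, as entered
    rate: int           # consumption-tax rate in percent, e.g. 10 or 8
    tax_included: bool  # True for a 税込 snapshot, False for 税抜
    side: str           # "sales" (output tax) or "purchase" (input tax)
    draft: bool = False


def split(line: Line) -> Tuple[int, int]:
    """Normalize a line to (base, tax) in yen; tax is rounded down (assumed rule)."""
    if line.tax_included:
        tax = line.amount * line.rate // (100 + line.rate)
        return line.amount - tax, tax
    return line.amount, line.amount * line.rate // 100


def summarize(lines: List[Line]) -> Dict[str, object]:
    """Per-(side, rate) base/tax totals plus net payable = output tax - input tax."""
    per_rate: Dict[Tuple[str, int], List[int]] = {}
    for line in lines:
        if line.draft:  # drafts never reach the summary
            continue
        base, tax = split(line)
        bucket = per_rate.setdefault((line.side, line.rate), [0, 0])
        bucket[0] += base
        bucket[1] += tax
    output_tax = sum(tax for (side, _), (_, tax) in per_rate.items() if side == "sales")
    input_tax = sum(tax for (side, _), (_, tax) in per_rate.items() if side == "purchase")
    return {"per_rate": per_rate, "net_payable": output_tax - input_tax}
```

Consider ¥11,000 of 10% sales entered tax-inclusive and ¥3,000 of 10% purchases entered tax-exclusive. They normalize to (10,000, 1,000) and (3,000, 300), so the net payable is ¥700.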
Adding backend tests
- Place new tests under `backend/tests/`.
- The shared `conftest.py` provides:
  - `mock_firestore` (autouse): in-memory Firestore fake supporting `where().where().limit().stream()`, `document().set/update/delete`, `transaction()`, `batch()`, `ArrayUnion`, `@transactional`
  - `mock_firebase_admin_auth`: `auth.create_user` / `get_user_by_email` fakes
  - `mock_auth(uid)`: switches the authenticated user
- For routes that depend on `get_corporate_context`, override the dependency with `app.dependency_overrides[get_corporate_context] = lambda: CorporateContext(...)`. Patching `resolve_corporate_id` alone is insufficient.
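The internals of the shared fixture are not shown here. The sketch below gives a rough idea of how a chainable fake like `mock_firestore` can work. It covers only `where().where().limit().stream()` and `document().set/update/delete`. The class names and the operator subset are hypothetical, and transactions, batches and `ArrayUnion` are left out.

```python
import operator
from typing import Any, Callable, Dict, Iterator, Optional, Tuple

# Hypothetical subset of Firestore comparison operators.
_OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "in": lambda field_value, options: field_value in options,
    "array_contains": lambda field_value, item: isinstance(field_value, list) and item in field_value,
}

Store = Dict[str, Dict[str, Any]]
Filter = Tuple[str, str, Any]


class FakeSnapshot:
    def __init__(self, doc_id: str, data: Dict[str, Any]) -> None:
        self.id = doc_id
        self._data = dict(data)

    def to_dict(self) -> Dict[str, Any]:
        return dict(self._data)


class FakeQuery:
    """Immutable query: each where()/limit() returns a new query over the same store."""

    def __init__(self, store: Store, filters: Tuple[Filter, ...] = (), limit: Optional[int] = None) -> None:
        self._store = store
        self._filters = filters
        self._limit = limit

    def where(self, field: str, op: str, value: Any) -> "FakeQuery":
        return FakeQuery(self._store, self._filters + ((field, op, value),), self._limit)

    def limit(self, count: int) -> "FakeQuery":
        return FakeQuery(self._store, self._filters, count)

    def stream(self) -> Iterator[FakeSnapshot]:
        yielded = 0
        for doc_id, data in list(self._store.items()):
            if self._limit is not None and yielded >= self._limit:
                return
            # A document matches only if every chained where() holds.
            if all(f in data and _OPS[op](data[f], v) for f, op, v in self._filters):
                yielded += 1
                yield FakeSnapshot(doc_id, data)


class FakeDocument:
    def __init__(self, store: Store, doc_id: str) -> None:
        self._store = store
        self.id = doc_id

    def set(self, data: Dict[str, Any]) -> None:
        self._store[self.id] = dict(data)

    def update(self, data: Dict[str, Any]) -> None:
        # Updating a missing document fails (KeyError here, NotFound in Firestore).
        self._store[self.id].update(data)

    def delete(self) -> None:
        self._store.pop(self.id, None)


class FakeCollection(FakeQuery):
    def __init__(self) -> None:
        super().__init__({})

    def document(self, doc_id: str) -> FakeDocument:
        return FakeDocument(self._store, doc_id)
```

Queries keep a reference to the shared dict, so writes made after a query is built are still visible when it streams.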
Butler Tax — Frontend E2E (Playwright)
154 tests, all passing.
```bash
cd butler-tax/frontend
npx playwright test         # all tests
npx playwright test --ui    # interactive UI mode
npx playwright test rbac    # filter by file name
```
Coverage by file
| File | Tests | Focus |
|---|---|---|
| rbac_boundaries.spec.ts | 12 | URL guards, dynamic permission flips |
| document_matching.spec.ts | 12 | Reconciliation page, double-match prevention |
| impersonation.spec.ts | 10 | Tax firm proxy, banner, `as_corporate` injection |
| document_approval.spec.ts | 9 | Approval lifecycle, re-submission |
| chat_actions.spec.ts | 9 | Butler Chat ops, file attach, `download_url` |
| e2e_business_flow.spec.ts | 3 | Cross-page business scenarios |
| customer_review.spec.ts | 12 | Tax firm dashboard customer review |
| ai_chat_tool.spec.ts | 3 | Tool format parsing regression |
| alert_settings.spec.ts / tax_firm_alerts.spec.ts | 19 | Alert configuration |
| bank_account.spec.ts / customer_list.spec.ts / direct_messages.spec.ts / tax_firm_dashboard.spec.ts / invitations.spec.ts | 65 | Other UI flows |
Conventions
- All API calls are stubbed via `page.route()`. Tests never require a running backend.
- Use the shared helpers:
  - `helpers/auth.ts`: `bypassAuth`, `stubAuthAsEmployee`, `stubAuthAsTaxFirm`, `stubAuthAsCorporate`, `stubPlans`, `stubCredits`
  - `helpers/noise.ts`: `installFallback` (catch-all `/api/v1/*` → `[]`), `stubCorporateDashboardNoise`
- Test IDs use a per-file prefix (`RBAC-`, `MATCH-`, `IMP-`, `APPR-`, `BIZ-`, `CHAT-`, `D-`, `I-`, `A-`, etc.). Avoid duplicating IDs across files.
- The Butler chat overlay opens on login by default. Tests that need to click underneath the overlay should inject CSS via `addInitScript` to hide `.overlay-root`.
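The no-duplicate-IDs rule is easy to check mechanically. The sketch below is a minimal Python checker. It assumes each ID appears as the first token of a `test('…')` title in a `PREFIX-NN` form such as `RBAC-03` or `LS-TAX-01`. That format and the helper name are assumptions, not an existing project tool.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Assumed ID shape: upper-case prefix segments followed by a number, at the
# start of a test(...) or test.only(...) title, e.g. "RBAC-03" or "LS-TAX-01".
TEST_ID = re.compile(r"""\btest(?:\.only)?\(\s*['"`]([A-Z]+(?:-[A-Z]+)*-\d+)""")


def find_duplicate_ids(spec_sources: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each test ID seen more than once to the files it appears in."""
    seen: Dict[str, List[str]] = defaultdict(list)
    for filename, source in sorted(spec_sources.items()):
        for test_id in TEST_ID.findall(source):
            seen[test_id].append(filename)
    return {test_id: files for test_id, files in seen.items() if len(files) > 1}
```

Feed it a mapping of spec file name to file contents for every `*.spec.ts`. An empty result means the IDs are unique. An ID repeated inside a single file is also reported, with that file listed twice.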
Known UI quirks tested via E2E
- `usePermissions.canAccess()` short-circuits when `!initialized`, so the initial render doesn't track `settings.value` reactively. When a test changes the permission settings after mount, an explicit re-render trigger (e.g. clicking the chat close button) may be required to flush the Vue `v-if`.
- `bulkActions: true` is configured only for issued invoices in `DocumentListPage`. The receipt and received-invoice list pages cannot bulk-submit; their `draft → pending` transition happens at create time.
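The first quirk follows from how dependency tracking works in general: an effect subscribes only to the reactive values it actually reads during its last run. The sketch below is a toy Python model of that mechanism, not Vue's implementation. It assumes, for illustration, that the `initialized` flag is read without being tracked. The `Ref`, `Effect` and `Permissions` classes are hypothetical stand-ins.

```python
class Ref:
    """Toy stand-in for a reactive ref: remembers which effects read it."""

    def __init__(self, value):
        self._value = value
        self._subscribers = set()

    def read(self, effect):
        self._subscribers.add(effect)  # a dependency is recorded only on read
        return self._value

    def write(self, value):
        self._value = value
        for effect in list(self._subscribers):
            effect.run()  # re-run only the effects that read this ref


class Effect:
    """Toy stand-in for a render effect (e.g. the one behind a v-if)."""

    def __init__(self, fn):
        self.fn = fn
        self.runs = 0
        self.value = None

    def run(self):
        self.runs += 1
        self.value = self.fn(self)
        return self.value


class Permissions:
    def __init__(self):
        self.initialized = False  # assumption: a plain, untracked flag
        self.settings = Ref({})

    def can_access(self, effect, key):
        if not self.initialized:
            return False  # short-circuit: settings is never read, so never tracked
        return self.settings.read(effect).get(key, False)
```

In this model, a settings change after mount leaves the view stale until something re-runs the effect. Only after that re-run does the effect subscribe to `settings`, which is why the tests add an explicit re-render trigger.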
Live smoke — real login + real backend (#192)
```bash
# requires the live dev stack (start-dev.sh): platform:8003 / tax:8001 / law:8002
cd butler-tax/frontend
E2E_LIVE=1 npx playwright test e2e/live-smoke.spec.ts
```
Opt-in (`E2E_LIVE=1`); skipped in normal CI so the mocked suite stays fast and backend-free. This layer covers what the mocked suite cannot catch by construction. `page.route()` stubs return 200 regardless of headers/cookies, so auth/cookie/cross-service wiring is never exercised. The `Bearer null` download bug (#191) slipped through both pytest and the mocked E2E suite for this reason. The live smoke performs a real `POST /auth/login` (`helpers/live.ts` → `butler_session` cookie in the context) and then runs:
| Test | Catches |
|---|---|
| LS-TAX-01 | #191 regression: drives the real UI (clicks the journal-CSV download button) so the actual fetch wiring runs against the real backend. A `Bearer null`/missing-credentials regression 401s → no download event → fail (verified to fail when reverted). |
| LS-TAX-02 | Authed exports (CSV / Zengin format (全銀) / full financial-statements package (決算報告書一式)) return 200 with the real cookie |
| LS-TAX-03 | The tax customer's cookie crosses to law authed endpoints (butler_tax premium → law entitlement) |
Design rule: a live smoke that only injects a cookie and calls the API directly would not have caught #191, because it bypasses the frontend `getToken()`/`fetch` path. The regression guard must drive the real button (this is also why butler-law's `LS-07` drives the real chat UI, not just `/api/chat`). Account: the dev customer `butlerseriestest@` (override via `LIVE_EMAIL`/`LIVE_PASSWORD`).
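The #191 failure mode is not specific to JavaScript. Naively interpolating a missing token produces a literal `Bearer null` (JavaScript) or `Bearer None` (Python) header, and the backend answers with 401. The sketch below is a minimal Python guard that keeps such a placeholder out of the header. The helper name and the placeholder set are hypothetical. Per the design rule above, a unit check like this complements driving the real button and does not replace it.

```python
from typing import Dict, Optional

# Placeholder strings a client can end up with when the token is missing.
_MISSING = {"", "null", "undefined", "None"}


def build_download_headers(token: Optional[str]) -> Dict[str, str]:
    """Hypothetical helper: add Authorization only when a real token exists.

    Without a token, the request has to rely on the session cookie instead
    of sending a bogus "Bearer null" header.
    """
    headers = {"Accept": "text/csv"}
    if token is not None and token not in _MISSING:
        headers["Authorization"] = f"Bearer {token}"
    return headers
```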
Butler Platform — Live smoke (real login + SendGrid + register, #192)
```bash
# requires the live dev stack (start-dev.sh): platform:8003 + real MongoDB + SendGrid
cd butler-platform/auth/frontend
E2E_LIVE=1 npx playwright test e2e/live-smoke.spec.ts
```
Opt-in (`E2E_LIVE=1`); skipped in normal CI. Covers the platform-crossing seams the mocked suite can't: real auth, SendGrid send wiring, and the sales-agent invitation → real registration flow.
| Test | Catches |
|---|---|
| LS-PLAT-01 | Sales agent (`sales-agent-token` mock) issues an invitation → 201 + token + `sales_agent_id`, and SendGrid sends in sandbox mode (`X-Email-Sandbox: 1` → `X-Email-Sandbox-Sent: 1`). Verifies the API key / verified sender / payload / connectivity without delivering a real email. |
| LS-PLAT-02 | The invited email completes a real `POST /auth/register` (account persisted; a duplicate re-register returns 409). |
The describe block runs serially; `beforeAll`/`afterAll` shell out to two dev-only backend scripts (idempotent, guarded to `ENVIRONMENT != production`):
- `scripts/seed_live_smoke_sales_agent.py`: ensures the `seed_sales_agent_uid` account + `agent_lsmoke01` master that `sales-agent-token` resolves to.
- `scripts/cleanup_live_smoke.py`: deletes the real registered corporate (Firebase user + `platform_accounts` + its invitation) so re-runs are clean.
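Both scripts are described as idempotent and dev-only. The sketch below shows those two properties, with an in-memory dict standing in for the real stores. The function names and the dict store are hypothetical; the actual scripts talk to Firebase and the platform database.

```python
import os
from typing import Dict, Mapping, Optional


class ProductionGuardError(RuntimeError):
    """Raised when a dev-only fixture script is pointed at production."""


def require_non_production(env: Optional[Mapping[str, str]] = None) -> None:
    """Refuse to run when ENVIRONMENT is production (call this before any writes)."""
    env = os.environ if env is None else env
    if env.get("ENVIRONMENT") == "production":
        raise ProductionGuardError("live-smoke fixture scripts must not run in production")


def ensure_seed_agent(store: Dict[str, dict], agent_id: str, uid: str) -> bool:
    """Idempotent seed: create the agent master once; returns True only if it was created."""
    if agent_id in store:
        return False
    store[agent_id] = {"uid": uid}
    return True


def cleanup_account(store: Dict[str, dict], account_id: str) -> bool:
    """Idempotent cleanup: deleting an already-deleted account is a no-op, not an error."""
    return store.pop(account_id, None) is not None
```

Because both operations are safe to repeat, `beforeAll`/`afterAll` can call them on every run without checking what a previous, possibly aborted, run left behind.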
Layer boundary: the sales-agent commission linkage (`contract.sales_agent_id`) is propagated via Stripe, so it is not in the live smoke. It is covered by `tests/test_sales_agent_invitations.py::test_accept_sales_agent_invitation_propagates_sales_agent_id`. The sandbox header wiring is unit-tested by `test_post_sandbox_header_surfaces_send_result` (no real SendGrid).
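SendGrid's v3 Mail Send API has a `mail_settings.sandbox_mode.enable` flag. With it set, SendGrid validates a request without delivering the email, which is one plausible way to implement the `X-Email-Sandbox` header used by LS-PLAT-01. The sketch below works under that assumption. The function names, the case-sensitive header lookup (real HTTP frameworks are case-insensitive) and the `0` value on failure are hypothetical.

```python
from typing import Any, Dict, Mapping, Tuple

SANDBOX_REQUEST_HEADER = "X-Email-Sandbox"
SANDBOX_RESULT_HEADER = "X-Email-Sandbox-Sent"


def build_mail_payload(to: str, sender: str, subject: str, body: str,
                       request_headers: Mapping[str, str]) -> Tuple[Dict[str, Any], bool]:
    """Build a SendGrid v3 /mail/send body; enable sandbox mode when the caller asks for it."""
    sandbox = request_headers.get(SANDBOX_REQUEST_HEADER) == "1"
    payload: Dict[str, Any] = {
        "personalizations": [{"to": [{"email": to}]}],
        "from": {"email": sender},
        "subject": subject,
        "content": [{"type": "text/plain", "value": body}],
    }
    if sandbox:
        # SendGrid validates the request but does not deliver the email.
        payload["mail_settings"] = {"sandbox_mode": {"enable": True}}
    return payload, sandbox


def result_headers(sandbox: bool, send_accepted: bool) -> Dict[str, str]:
    """Surface the send result to the caller only for sandboxed requests."""
    if not sandbox:
        return {}
    return {SANDBOX_RESULT_HEADER: "1" if send_accepted else "0"}
```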
Butler Law — Tests
Backend (pytest)
```bash
cd butler-law/backend
uv run pytest tests/   # 122 tests; uses the isolated butler_law_test DB
```
Covers role-based access (`require_law_read`), the four knowledge sources (scraper parsers via HTML fixtures, retrieve lanes, viewer HTML structure) and revision tracking (#160).
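The scraper parsers are tested against saved HTML fixtures rather than live pages, so the tests are deterministic and run offline. The sketch below shows the pattern using the standard-library `html.parser`. The markup, the `article-title` class and the parser are hypothetical, not the butler-law scrapers.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

# Hypothetical fixture standing in for a saved page.
FIXTURE = """
<div class="law">
  <h2 class="article-title">第一条（趣旨）</h2><p>本文</p>
  <h2 class="article-title">第二条（定義）</h2><p>本文</p>
  <h2 class="note">参考</h2>
</div>
"""


class ArticleTitleParser(HTMLParser):
    """Collects the text of each <h2 class="article-title"> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.titles: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag == "h2" and ("class", "article-title") in attrs:
            self._in_title = True
            self.titles.append("")

    def handle_endtag(self, tag: str) -> None:
        if tag == "h2":
            self._in_title = False

    def handle_data(self, data: str) -> None:
        if self._in_title:
            self.titles[-1] += data.strip()


def parse_article_titles(html: str) -> List[str]:
    parser = ArticleTitleParser()
    parser.feed(html)
    parser.close()
    return parser.titles
```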
Frontend E2E (Playwright)
```bash
cd butler-law/frontend
npx playwright test                                     # mocked suite (CI-safe)
E2E_LIVE=1 npx playwright test e2e/live-smoke.spec.ts   # live smoke (see below)
```
| File | Coverage |
|---|---|
| chat.spec.ts | Chat (SSE streaming, error surfacing), Knowledge Base tabs, citation cards (statute articles 条文 / circulars 通達 / Q&A cases 質疑応答事例 / Tax Answer タックスアンサー) and the enforcement-version footer rendering |
| auth.spec.ts | Auth redirects |
| live-smoke.spec.ts | Opt-in (`E2E_LIVE=1`): runs against the live dev backend + real MongoDB data. LS-01..05 (API): all four sources populated, viewers return the two-pane HTML, the law viewer shows the enforcement version, the revisions feed responds. LS-06: KB tabs show real counts. LS-07 (#192): drives the real chat UI (types a question, submits, asserts the SSE-streamed answer renders); it checks the markdown/streaming wiring, not answer quality, which belongs to the golden set. LS-08 (#192): the tax→law escalation target (`POST :8002/api/chat` with the forwarded `butler_session`, `agent_id=tax_01`) returns a structured `{response, session_id, source_refs}`, the exact call `chat_service.query_law_agent` makes. Skipped without the env var, so CI stays mock-only |
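LS-08 asserts a structured `{response, session_id, source_refs}` reply. The sketch below is a minimal shape check in Python that returns a list of problems instead of failing on the first one, which keeps a live-smoke failure message informative. The expected field types (string, string, list) are assumptions; the table names only the keys.

```python
from typing import Any, Dict, List

# Assumed field types; only the key names come from the LS-08 description.
REQUIRED: Dict[str, type] = {"response": str, "session_id": str, "source_refs": list}


def check_law_chat_shape(payload: Any) -> List[str]:
    """Return every shape problem found; an empty list means the payload matches."""
    if not isinstance(payload, dict):
        return ["payload is not an object"]
    problems: List[str] = []
    for key, expected in REQUIRED.items():
        if key not in payload:
            problems.append(f"missing {key}")
        elif not isinstance(payload[key], expected):
            problems.append(f"{key} is {type(payload[key]).__name__}, expected {expected.__name__}")
    return problems
```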
LLM answer quality (e.g. whether the model actually emits the enforcement-version footer) is out of scope for E2E — that belongs to the golden-set evaluation in butler-law/backend (Issue #40).
Studio Admin E2E (Playwright)
```bash
cd butler-platform/studio_admin/frontend
npx playwright test   # 67 tests incl. law-admin.spec.ts (Law operations pages)
```
Butler Tax — Ledger verification harness (#175 S3)
```bash
cd butler-tax/backend
uv run python -m scripts.verify_ledger_testdata   # requires local TestData/
```
Options: `--keep` leaves the generated entries in the DB after the run (the default wipes them); `--corporate-id <id>` targets a specific corporate (e.g. to demo real figures in the ledger UI on a dev login).
Local-only (real-company data lives outside the repo). Parses a Yayoi trial balance, annual trend (残高試算表(年間推移)) PDF and reconstructs the year as opening balances + one compound journal entry per month + a closing entry. It then reconciles Butler's trial balance / PL / BS against Yayoi's own figures for every leaf account and every month. Exit code 0 = exact match.
Verified 2026-06-12 against ㈱My2cents, fiscal year ending December 2025 (令和7年12月期): all figures match (20 BS accounts × 11 months, full PL waterfall, BS totals and equation).
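The harness's pass/fail contract is exit code 0 only when every leaf account matches in every month. The sketch below shows that contract in Python. The `(account, month)` keying and the function names are assumptions, not the script's internals. A cell present on only one side counts as a mismatch.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, int]  # (leaf account name, month number), assumed keying


def reconcile(yayoi: Dict[Key, int], butler: Dict[Key, int]) -> List[str]:
    """List every cell where the two sides differ, including cells only one side has."""
    diffs: List[str] = []
    for account, month in sorted(yayoi.keys() | butler.keys()):
        expected: Optional[int] = yayoi.get((account, month))
        actual: Optional[int] = butler.get((account, month))
        if expected != actual:
            diffs.append(f"{account} month {month}: yayoi={expected} butler={actual}")
    return diffs


def exit_code(yayoi: Dict[Key, int], butler: Dict[Key, int]) -> int:
    """Print each difference and return 0 only on an exact match."""
    diffs = reconcile(yayoi, butler)
    for line in diffs:
        print(line)
    return 0 if not diffs else 1
```

A script would end with `sys.exit(exit_code(yayoi_figures, butler_figures))`, so CI or a shell can treat any difference as a failure.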
Butler Tax — Depreciation verification harness (#184 D5)
```bash
cd butler-tax/backend
uv run python -m scripts.verify_depreciation_testdata   # requires local TestData/ + platform on :8003
```
Local-only. Parses Schedules 16(2) and 16(7) (別表十六(二)/(七)) and the statement of applied special-measure amounts (適用額明細書) from the filed corporate tax return of Eagle Asset Management for the fiscal year ending October 2025 (令和7年10月期). It reconciles them against the depreciation engine and the platform rate/threshold masters:
- cross-foots of the declining-balance schedule;
- the single-asset vehicles and transport equipment (車両運搬具) row: pre-adjustment depreciation (調整前償却額) = floor(base (基礎) × 0.333) and guaranteed depreciation amount (償却保証額) = floor(acquisition cost (取得価額) × 0.09911), a real-world confirmation of the 6-year 200% DB master values;
- the no-switch branch;
- the engine's total-depreciation invariant;
- the SME immediate-expensing classification of both Act on Special Measures Concerning Taxation Art. 67-5 (措法67の5) assets, incl. the adoption statement total.
Verified 2026-06-12: 18 checks, all matching the filed return. Per-asset multi-year schedule verification needs the Yayoi fixed-asset register (固定資産台帳), which holds the per-asset acquisition dates and is not in TestData yet.
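The two formulas above can be made concrete with a sketch of a 200% declining-balance schedule that includes the guaranteed-amount switch. It uses the 6-year values: rate 0.333 and guarantee rate 0.09911 from the text, plus the revised rate 0.334 from the NTA 200% table. The sketch assumes full-year use in every year, rounding down to the yen and a ¥1 memorandum value. It is not the engine's implementation.

```python
from decimal import ROUND_FLOOR, Decimal
from typing import List, Optional


def floor_yen(value: Decimal) -> int:
    """Round down to whole yen, matching the floor() formulas above."""
    return int(value.to_integral_value(rounding=ROUND_FLOOR))


def declining_balance_schedule(cost: int, rate: str, revised_rate: str,
                               guarantee_rate: str, years: int) -> List[int]:
    """Full-year 200% DB schedule with the guaranteed-amount switch (sketch)."""
    guarantee = floor_yen(Decimal(cost) * Decimal(guarantee_rate))  # 償却保証額
    book = cost
    revised_base: Optional[int] = None  # 改定取得価額, fixed in the switch year
    schedule: List[int] = []
    for _ in range(years):
        if revised_base is None:
            amount = floor_yen(Decimal(book) * Decimal(rate))  # 調整前償却額
            if amount < guarantee:
                revised_base = book  # switch: this year's opening book value
        if revised_base is not None:
            amount = floor_yen(Decimal(revised_base) * Decimal(revised_rate))
        amount = min(amount, book - 1)  # stop at a ¥1 memorandum value
        schedule.append(amount)
        book -= amount
    return schedule
```

Take ¥1,000,000 as an example. In year 4, floor(296,741 × 0.333) = 98,814 drops below the guaranteed amount floor(1,000,000 × 0.09911) = 99,110. The schedule then switches to floor(296,741 × 0.334) = 99,111 per year, and the last year is capped so the book value ends at ¥1.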
Butler Tax — Payroll verification harness (#178 P4)
```bash
cd butler-tax/backend
uv run python -m scripts.verify_payroll_testdata   # requires local TestData/
```
Local-only. Reads the monthly amounts of the five payroll accounts from the same Yayoi annual trend PDF: officers' compensation (役員報酬), salaries and allowances (給料手当), miscellaneous wages (雑給), bonuses (賞与) and statutory welfare expenses (法定福利費). It pushes each month through the payroll import path (import → journal generation → confirm) and reconciles the trial balance against Yayoi for every account and month. Negative months in the real data (red-slip corrections, the December 雑給→ reclassification) are booked as manual correction entries, because they are corrections, not payroll.
Verified 2026-06-12: all figures match (5 accounts × 12 months + annual totals; 給料手当 ¥54,915,638 reproduced through the payroll path). Parsing notes that generalize:
- Yayoi BS columns are month-end balances, signed by each section's direction (contra accounts negative).
- PL columns are monthly amounts.
- The 13th column holds closing entries.
- Breakdown rows must not be double-counted, e.g. outsourcing costs (外注費) listed as a sub-row of current-period merchandise purchases (当期商品仕入高).
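Two of the parsing notes above are easy to get wrong: BS columns must be differenced to get monthly movements, and breakdown rows must be skipped when totalling PL amounts. The sketch below covers both. It assumes a hypothetical `is_breakdown` flag that the PDF parser sets on sub-rows such as 外注費 under 当期商品仕入高. The real parser's row model is not described here.

```python
from typing import Dict, List


def monthly_movements(opening: int, month_end_balances: List[int]) -> List[int]:
    """BS columns hold month-end balances; a month's movement is the change from the prior month-end."""
    movements: List[int] = []
    previous = opening
    for balance in month_end_balances:
        movements.append(balance - previous)
        previous = balance
    return movements


def pl_month_total(rows: List[Dict[str, object]]) -> int:
    """Sum one month of PL amounts, skipping breakdown rows already included in their parent row."""
    return sum(int(row["amount"]) for row in rows if not row.get("is_breakdown", False))
```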
