Assistant Registrar — Computer Systems Institute
A suite of Python desktop tools built to replace entirely manual administrative workflows — parsing hundreds of PDF attendance sheets with a multi-mode OCR pipeline, processing multi-quarter enrollment data through pandas, generating styled Excel output, and dispatching email drafts directly through Outlook via the Windows COM API.
What it does
Combines native PDF text extraction (PyMuPDF) with multi-PSM Tesseract OCR at 5x zoom — automatically selecting the mode that yields the most section matches per page.
Eleven regex patterns match section IDs across the format variations that appear in OCR output. Each match is normalized (dash variants, spacing) and validated against a prefix allowlist before display.
A companion script reads processed PDFs and automatically writes attendance status into the corresponding rows and date columns of the Excel tracking workbook using openpyxl.
Accepts per-quarter or combined multi-quarter Excel input, auto-splits by TermID, processes each term independently, and outputs one styled Excel workbook per quarter.
Uses win32com to create a pre-populated Outlook email draft, with the output files attached via the clipboard in CF_HDROP format, ready to send with one click.
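A minimal sketch of the Outlook draft step, under stated assumptions. The helper names, subject line, and body text are hypothetical. The sketch also swaps in Outlook's own `Attachments.Add` call in place of the CF_HDROP clipboard approach described above, because it is shorter to show. The `win32com` import is kept inside the function so the rest of the module still loads on machines without pywin32.

```python
from html import escape
from pathlib import Path


def build_draft_fields(term, attachments):
    """Assemble subject, HTML body, and absolute attachment paths.

    The wording of the subject and body is illustrative only.
    """
    names = "".join(f"<li>{escape(Path(p).name)}</li>" for p in attachments)
    return {
        "subject": f"Enrollment report: {term}",
        "html_body": (
            f"<p>Attached are the reports for {escape(term)}.</p>"
            f"<ul>{names}</ul>"
        ),
        # Outlook expects full paths when adding attachments.
        "attachments": [str(Path(p).resolve()) for p in attachments],
    }


def create_outlook_draft(fields):
    """Open a pre-populated draft in Outlook (Windows with pywin32 only)."""
    import win32com.client  # imported lazily: not available off Windows

    outlook = win32com.client.Dispatch("Outlook.Application")
    mail = outlook.CreateItem(0)  # 0 is olMailItem
    mail.Subject = fields["subject"]
    mail.HTMLBody = fields["html_body"]
    for path in fields["attachments"]:
        mail.Attachments.Add(path)
    mail.Display()  # show the draft so the user can review and send
```

Separating the field-building step from the COM call keeps the text logic checkable on any platform, while only the final function needs Outlook.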
All tools run locally, handle data only within authorized staff workflows, and expose no student records externally. Specific data structures and identifiers are redacted in this showcase.
Tool 1
A Tkinter desktop application that processes PDF attendance sheets in bulk. It first attempts native text extraction via PyMuPDF, then falls back to multi-PSM Tesseract OCR at 5x zoom if the page text is empty or a placeholder — automatically selecting the PSM mode (3, 6, or 11) that yields the most section-ID matches per page.
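The mode selection described above can be sketched as follows. This is an illustration, not the tool's actual code: the single `SECTION_ID` pattern stands in for the real pattern set, and `run_ocr` is a hypothetical callable that returns OCR text for a given PSM mode. In practice it might wrap something like `pytesseract.image_to_string(image, config=f"--psm {mode}")` on a page rendered with `page.get_pixmap(matrix=fitz.Matrix(5, 5))`.

```python
import re

# Hypothetical stand-in for the real section-ID patterns.
SECTION_ID = re.compile(r"\b[A-Z]{2,4}\s?-\s?\d{3,4}\b")


def count_matches(text):
    """Count distinct section-ID matches in a block of OCR text."""
    return len(set(SECTION_ID.findall(text)))


def best_psm(run_ocr, modes=(3, 6, 11)):
    """Run OCR once per PSM mode and keep the mode with the most matches.

    Ties go to the earlier mode in `modes`.
    """
    best_mode, best_text, best_count = modes[0], "", -1
    for mode in modes:
        text = run_ocr(mode)
        count = count_matches(text)
        if count > best_count:
            best_mode, best_text, best_count = mode, text, count
    return best_mode, best_text
```

Passing the OCR step in as a callable keeps the selection logic independent of Tesseract, so it can be exercised with canned text.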
Get-ClipboardFillAttendanceSheetTracker script reads processed PDFs and auto-fills the Excel attendance tracker using openpyxl.
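Before a section ID is displayed or written to the tracker, each match is normalized and checked against a prefix allowlist. A rough sketch of that step follows. The prefixes `ABC` and `XYZ`, the dash set, and the function name are assumptions made for illustration, since the real identifiers are redacted.

```python
import re

# Unicode dash look-alikes that OCR commonly emits in place of "-".
DASH_VARIANTS = "\u2010\u2011\u2012\u2013\u2014\u2212"

# Hypothetical allowlist; the real prefixes are not shown in this showcase.
ALLOWED_PREFIXES = {"ABC", "XYZ"}


def normalize_section_id(raw):
    """Return a canonical PREFIX-NUMBER string, or None if not allowed."""
    text = raw.strip().upper()
    text = re.sub(f"[{DASH_VARIANTS}]", "-", text)  # unify dash variants
    text = re.sub(r"\s*-\s*", "-", text)            # drop spaces around the dash
    text = re.sub(r"\s+", "", text)                 # drop any remaining spaces
    prefix = text.split("-", 1)[0]
    return text if prefix in ALLOWED_PREFIXES else None
```

For example, `normalize_section_id("abc \u2013 101")` returns `"ABC-101"`, while an ID with an unknown prefix returns `None` and would be skipped.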
Tool 2
A dark-themed Tkinter application that replaces a fully manual workflow of copying data, splitting by program category, formatting, and emailing. Accepts per-quarter or combined multi-quarter Excel input, processes each quarter independently through pandas, and outputs a styled Excel workbook per term with frozen headers, auto-width columns, and correct numeric formatting.
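The per-term split and workbook styling might look roughly like the sketch below. It assumes a `TermID` column as described in the overview; the function names and the bold header font are illustrative choices. It requires pandas and openpyxl.

```python
import pandas as pd
from openpyxl import Workbook
from openpyxl.styles import Font
from openpyxl.utils import get_column_letter


def split_by_term(frame, term_column="TermID"):
    """Return one DataFrame per distinct term value."""
    return {
        term: group.reset_index(drop=True)
        for term, group in frame.groupby(term_column, sort=True)
    }


def build_workbook(frame):
    """Write a DataFrame to a workbook with a frozen, bold header row."""
    workbook = Workbook()
    sheet = workbook.active
    sheet.append(list(frame.columns))
    for row in frame.itertuples(index=False, name=None):
        sheet.append(list(row))
    for cell in sheet[1]:
        cell.font = Font(bold=True)
    sheet.freeze_panes = "A2"  # keep the header visible while scrolling
    # Approximate auto-width: longest cell text in each column plus padding.
    for index, column in enumerate(frame.columns, start=1):
        longest = max([len(str(column))] + [len(str(v)) for v in frame[column]])
        sheet.column_dimensions[get_column_letter(index)].width = longest + 2
    return workbook
```

A caller could then loop over `split_by_term(frame).items()` and save each result with `build_workbook(part).save(...)` under a per-term file name.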
win32com creates a pre-populated Outlook draft with subject, HTML body, and attachments, ready to send.
config.json
Technology
Core
PDF & OCR
Office & OS Integration