Methodology

Rule-Based and Explainable Phishing Detection Framework

PhishAID follows a structured, rule-based detection methodology designed to identify phishing websites using transparent and explainable indicators. Unlike black-box machine learning models, this approach ensures that every detection decision can be traced back to specific, well-defined rules.

The methodology focuses on analyzing the structural, lexical, and contextual properties of URLs and domains that are commonly exploited in phishing attacks. Each website is evaluated independently without relying on prior training data or external reputation databases.

Overall Detection Flow

The detection process begins when a URL is submitted to the system. The URL is parsed and examined across multiple rule categories. Each rule evaluates a specific phishing indicator and contributes a weighted score if triggered.

The cumulative score obtained from all triggered rules determines the final classification of the website as Legitimate, Suspicious, or Phishing.
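
The flow described above can be sketched in a few lines of Python. This is a minimal illustration rather than PhishAID's actual implementation: the `Rule` type, the two placeholder rules, and their weights are assumptions chosen only to make the parse, evaluate, and accumulate steps concrete.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple
from urllib.parse import SplitResult, urlsplit


@dataclass(frozen=True)
class Rule:
    name: str
    weight: int  # illustrative weight, not a value taken from PhishAID
    check: Callable[[SplitResult], bool]


# Two placeholder rules (hypothetical), used only to demonstrate the flow.
RULES = [
    Rule("no_https", 2, lambda u: u.scheme != "https"),
    Rule("at_symbol_in_authority", 3, lambda u: "@" in u.netloc),
]


def evaluate(url: str, rules: List[Rule] = RULES) -> Tuple[int, List[str]]:
    """Parse the URL once, run every rule, and sum the weights of triggered rules."""
    parts = urlsplit(url)
    triggered = [rule for rule in rules if rule.check(parts)]
    score = sum(rule.weight for rule in triggered)
    return score, [rule.name for rule in triggered]


print(evaluate("http://user@example.com/login"))
# (5, ['no_https', 'at_symbol_in_authority'])
```

Keeping each rule as a small independent predicate means the list of triggered rule names is available alongside the score, which is what later makes the verdict explainable.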

Key Analysis Components

PhishAID applies multiple layers of analysis to ensure comprehensive detection coverage:

URL Structure Analysis – Examines the lexical composition of URLs, including length, symbols, and subdomain depth.
Protocol and Domain Inspection – Evaluates HTTPS usage, TLS (SSL) certificate age, and domain format.
Heuristic-Based Scoring – Assigns weighted penalties to triggered rules based on phishing severity.
Explainable Verdict Generation – Produces a rule-wise breakdown explaining the final decision.
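
As an illustration of the URL structure and protocol components listed above, the following sketch extracts a few lexical features from a URL. The function name, the set of counted symbols, and the way subdomain depth is computed are assumptions made for this example. Certificate age is omitted because it requires a network connection.

```python
from urllib.parse import urlsplit


def url_structure_features(url: str) -> dict:
    """Extract simple lexical features from a URL (illustrative feature set)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    labels = [label for label in host.split(".") if label]
    # Naive depth: every label beyond the last two counts as a subdomain.
    # This miscounts hosts under multi-label suffixes such as "co.uk".
    subdomain_depth = max(len(labels) - 2, 0)
    # Symbols counted here are an assumed, non-exhaustive selection.
    symbol_count = sum(url.count(ch) for ch in "@-_%=")
    return {
        "length": len(url),
        "symbol_count": symbol_count,
        "subdomain_depth": subdomain_depth,
        "uses_https": parts.scheme == "https",
    }


features = url_structure_features("http://secure.login.example.com/a-b?x=1")
print(features["subdomain_depth"], features["symbol_count"], features["uses_https"])
# 2 2 False
```

A production system would likely consult a public suffix list to separate the registered domain from its subdomains. The two-label shortcut is used here only to keep the sketch self-contained.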

Rule Categories

The rule engine is organized into well-defined categories, each targeting a specific phishing strategy:

URL Lexical Analysis – Detection of abnormal URL patterns and deceptive structures.
IP-Based URL Detection – Identification of URLs using raw IP addresses instead of domain names.
Suspicious Keyword Detection – Detection of commonly abused terms such as “login”, “verify”, and “secure”.
HTTPS and Certificate Checks – Validation of secure transport and certificate maturity.
URL Length and Entropy Analysis – Identification of excessively long or complex URLs designed to confuse users.
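
Three of the categories above lend themselves to short, self-contained checks. The sketch below is one possible way to express them using only the Python standard library. The function names are hypothetical, the keyword list contains only the three terms named in the text, and no entropy threshold is implied.

```python
import ipaddress
import math
from collections import Counter
from typing import List
from urllib.parse import urlsplit

SUSPICIOUS_KEYWORDS = ("login", "verify", "secure")  # terms named in the text


def host_is_ip(url: str) -> bool:
    """Return True when the host part is a literal IPv4 or IPv6 address."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def keyword_hits(url: str) -> List[str]:
    """Return the suspicious keywords found anywhere in the URL (case-insensitive)."""
    lowered = url.lower()
    return [kw for kw in SUSPICIOUS_KEYWORDS if kw in lowered]


def shannon_entropy(text: str) -> float:
    """Shannon entropy in bits per character; higher values suggest random-looking text."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


print(host_is_ip("http://192.168.0.1/login"))             # True
print(keyword_hits("http://192.168.0.1/Login?verify=1"))  # ['login', 'verify']
print(shannon_entropy("ab"))                              # 1.0
```

Plain substring matching for keywords is deliberately simple and will also match benign URLs that contain these terms, which is one reason such a rule would normally carry a modest weight rather than decide the verdict on its own.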

Scoring and Verdict Logic

Each rule contributes a predefined penalty to the cumulative score when triggered. Rules with higher phishing relevance impose stronger penalties. The final verdict is derived by comparing the cumulative score against fixed thresholds:

Legitimate – No or minimal risk indicators detected
Suspicious – Moderate risk indicators requiring user caution
Phishing – Strong evidence of malicious intent

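Because the rule weights and thresholds are fixed, the same URL always receives the same verdict, which keeps the scoring consistent and interpretable. Combining several weighted indicators is also intended to make evasion harder, since avoiding a single rule does not by itself clear a URL.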
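
The threshold-based classification can be sketched as follows. The source does not state the threshold values, so the defaults below (3 and 6) are placeholders, and the sketch assumes that penalties accumulate as positive magnitudes into a single risk score.

```python
def classify(score: int, suspicious_at: int = 3, phishing_at: int = 6) -> str:
    """Map a cumulative penalty score to a verdict using two thresholds.

    The default thresholds are illustrative assumptions, not PhishAID values.
    """
    if suspicious_at > phishing_at:
        raise ValueError("suspicious_at must not exceed phishing_at")
    if score >= phishing_at:
        return "Phishing"
    if score >= suspicious_at:
        return "Suspicious"
    return "Legitimate"


for score in (0, 4, 9):
    print(score, classify(score))
# 0 Legitimate
# 4 Suspicious
# 9 Phishing
```

Exposing the thresholds as parameters keeps the verdict logic separate from the rule weights, so either can be tuned without touching the other.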

Explainability and Transparency

A core strength of PhishAID lies in its explainability. For every analyzed website, the system generates a detailed rule-wise evaluation table showing which rules were triggered, their descriptions, and their individual score contributions.

This transparency makes the system suitable for academic research, security training, and regulatory environments where decision accountability is essential.
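
A rule-wise evaluation table of the kind described in this section could be produced along the following lines. The `RuleResult` fields mirror the columns named in the text (triggered status, description, score contribution). The type itself, the column layout, and the sample rules are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RuleResult:
    name: str
    description: str
    triggered: bool
    score: int  # penalty applied only if the rule triggered


def render_report(results: List[RuleResult]) -> str:
    """Render a plain-text table with one row per rule and a total row."""
    header = f"{'Rule':<20} {'Triggered':<10} {'Score':>5}  Description"
    lines = [header, "-" * len(header)]
    for r in results:
        contribution = r.score if r.triggered else 0
        flag = "yes" if r.triggered else "no"
        lines.append(f"{r.name:<20} {flag:<10} {contribution:>5}  {r.description}")
    total = sum(r.score for r in results if r.triggered)
    lines.append(f"{'Total':<20} {'':<10} {total:>5}")
    return "\n".join(lines)


# Hypothetical results for a single URL.
sample = [
    RuleResult("ip_host", "Host is a raw IP address", True, 3),
    RuleResult("no_https", "URL does not use HTTPS", False, 2),
]
print(render_report(sample))
```

Listing untriggered rules with a zero contribution, rather than omitting them, lets a reader see which checks were performed as well as which ones fired.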