Decision Tree Approach to Discovering Fraud in Leasing Agreements

Background: Fraud attempts create large losses for financing subjects in modern economies. At the same time, leasing agreements have become increasingly popular as a means of financing objects such as machinery and vehicles, but they are more vulnerable to fraud attempts. Objectives: The goal of the paper is to assess the usability of the data mining approach in discovering fraud in leasing agreements. Methods/Approach: Real-world data from one Croatian leasing firm were used to create two models for fraud detection in leasing. The decision tree method was used to create a classification model, and the CHAID algorithm was deployed. Results: The decision tree model indicated that the object of the leasing agreement had the strongest impact on the probability of fraud. Conclusions: In order to enhance the applicability of the developed model, it would be necessary to develop software that would enable automated, quick and transparent retrieval of data from the system, processing according to the rules and displaying the results in multiple categories.


Introduction
Leasing is a modern financing method developed in the U.S.A. in the 1930s, and it has been widely accepted and applied around the world from the 1950s onwards. Leasing allows the user to use needed equipment or property for a required period of time rather than buy it. A leasing object is a movable or an immovable thing in accordance with the applicable rules governing property or other proprietary rights (Smith, Wakeman, 1985; Morais, 2013).
A leasing agreement becomes realized and active after being signed by a leasing company and a customer. There is no delay in activation or conditional activation of the agreement. There are two main ways in which a leasing agreement can be terminated: the expiration of the agreement and premature termination. The circumstances that lead to an early termination can be divided into those caused by users of the lease (total loss, failure to pay monthly installments) and those caused by external influences (theft, total loss due to natural disasters).
If the agreement is terminated and an attempt to perpetrate fraud or deception is found, the leasing company incurs a loss. Therefore, risk management and credit scoring are important levers for increasing the security of a leasing company. Advanced analytical methods of assessing the risk of fraud have proved successful in predicting one of the two possible outcomes of an agreement: successful implementation and finalization of the agreement, or attempted fraud (Ngai et al., 2011; Bhattacharyya et al., 2011; Huang et al., 2012). However, previous studies have not applied knowledge discovery from databases to leasing, although the method is often used in practice. Therefore, the aim of the paper is to develop a model for detecting fraud in leasing, using actual data from a leasing company. To achieve this objective, knowledge discovery from databases was used and the decision tree method was applied (Sinha, Zhao, 2008).

Data
The database used contains information on all leasing agreements and offers in the core system on the date the report was run. The number of active or completed agreements at that time was 25,000. In the same period, a total of 561 agreements in which fraud was realized were found. To make it possible to build a decision tree model, undersampling was used: 560 agreements with no fraud attempts were randomly selected from the total number of observed agreements.
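The undersampling step can be illustrated with a minimal Python sketch on synthetic data. It is not the authors' actual procedure: the record layout, the `undersample` helper, the fixed seed and the assumption that the 561 fraud cases are part of the 25,000 agreements are all hypothetical and serve only to show the idea of keeping every fraud case and drawing a random subset of the remaining agreements.

```python
import random


def undersample(agreements, is_fraud, n_majority, seed=0):
    """Keep every fraud case and a random subset of the non-fraud cases."""
    fraud = [a for a in agreements if is_fraud(a)]
    non_fraud = [a for a in agreements if not is_fraud(a)]
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    return fraud + rng.sample(non_fraud, n_majority)


# Synthetic stand-in for the confidential data (assumption): 25,000 records,
# of which the first 561 are marked as fraud.
agreements = [{"id": i, "fraud": 1 if i < 561 else 0} for i in range(25_000)]

balanced = undersample(agreements, lambda a: a["fraud"] == 1, n_majority=560)
print(len(balanced))                      # 1121
print(sum(a["fraud"] for a in balanced))  # 561
```

The resulting sample of 1,121 agreements is roughly balanced between the two outcomes, which is what allows the tree to find splits related to fraud rather than simply predicting the majority class.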
Although the database contains more than a hundred variables, data confidentiality required that the selected variables be sufficiently general in character that they do not disclose protected information about leasing customers, suppliers and employees, while remaining specific enough to be important for the realization of the model. Figure 1 contains the variables used in the discovery of knowledge from databases. Where the sum is smaller than 100%, data were missing.

Decision trees
Decision trees are a popular and widely accepted tool for classification and prediction, and their strength lies in the fact that their graphical display makes them easy to understand (Apté, Weiss, 1997; Tsang et al., 2011). A decision tree is a statistical pattern recognition method used to solve predictive problems in which the learning process must be supervised. Predictive problems include forecasting future values, pattern recognition, regression on multiple features, differential analysis, evaluation of functions of multiple features and supervised learning. Decision trees are very efficient when dealing with large databases and when many variables should be taken into account (Li, 2005; Wu, Banzhaf, 2010). The paper used the CHAID algorithm to build trees that detect fraud in leasing agreements, since this algorithm is suitable for classification problems where the variables have more than two modalities (McCarty, Hastak, 2007; Coussement et al., 2014). The analysis was carried out in the SPSS software package, version 19, and two types of models were developed: (i) Model A: the model with a simpler classification of leased assets (the variable Object classification 1) and (ii) Model B: the model with a more complex classification of leased assets (the variable Object classification 2).
Model A is represented graphically in Figure 2, and also through generated business rules in the form of SQL code in Figure 3 (source: authors' work). Model A will be described in greater detail. The variable used for branching on the first level is Object 1, which is statistically significant at the 1% level (p-value = 0.000). The second-level nodes show the branching of the variable Object 1 into three nodes.
o Node 1 contains 210 agreements for which the average value of the variable Fraud is 0.738, which means that 73.8% of the agreements whose object is GF3 resulted in fraud.
o Node 2 contains 667 agreements for which the average value of the variable Fraud is 0.391, which means that 39.1% of the agreements for GF1 and for an unknown object resulted in fraud.
o Node 3 is interpreted in the same way. It contains 244 agreements for which the average value of the variable Fraud is 0.594, which means that 59.4% of the agreements for GF2 resulted in fraud.
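The significance of the first-level split can be checked approximately from the node sizes and fraud shares reported above. The Python sketch below makes two assumptions that are not stated in the text: the fraud count in each node is reconstructed by rounding the node size times the reported share, and only a plain Pearson chi-square statistic is computed, without the category merging and Bonferroni adjustment that CHAID applies. It is therefore an illustration of the test behind the split, not a reproduction of the SPSS output.

```python
import math

# (object group, agreements in node, reported fraud share), taken from the text.
nodes = [("GF3", 210, 0.738), ("GF1/unknown", 667, 0.391), ("GF2", 244, 0.594)]

# Assumption: fraud counts are recovered by rounding n * share.
observed = [(round(n * share), n - round(n * share)) for _, n, share in nodes]


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a 2-D table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (obs - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


stat, dof = chi_square(observed)
# With 2 degrees of freedom the chi-square upper tail equals exp(-x / 2).
p_value = math.exp(-stat / 2) if dof == 2 else None
print(observed, round(stat, 1), dof, p_value)
```

Under these assumptions the reconstructed fraud counts sum to 561, the number of fraud cases in the sample, and the unadjusted p-value lies far below 0.001, which is consistent with the value of 0.000 reported for this split.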
The variable for branching on the second level is Source of information, which is statistically significant at the 1% level (p-value = 0.000). The third-level nodes show the branching of the variable Source of information into two nodes.
o Node 4 shows the clients who came directly to the leasing company or for whom the source of initial information is not available. This node contains 261 agreements with an average value of 0.287, which means that 28.7% of the agreements resulted in fraud.
o Node 5 shows the clients who were contracted through the dealer or the manufacturer, and via the Internet (only a small share). The average value of this node is 0.458, meaning that 45.8% of the agreements resulted in fraud.
The variable used for branching on the third level is Type of leasing, which is statistically significant at the 1% level (p-value = 0.000).
o Node 6 contains operating lease agreements, where the average value is 0.583, meaning that 58.3% of the agreements resulted in fraud.
o Node 7 includes financial leasing and loans, where the average value is 0.352, meaning that 35.2% of the agreements resulted in fraud.
The variable for branching on the second level is Source of information, which is statistically significant at the 1% level (p-value = 0.000). The third-level nodes show the branching of the variable Source of information into two nodes.
o Node 6 shows the clients who came directly to the leasing company or for whom the source of initial information is not available. This node contains 165 agreements with an average value of 0.297, which means that 29.7% of the agreements resulted in fraud.
o Node 7 shows the clients who were contracted through the dealer or the manufacturer, and via the Internet (only a small share). The average value of this node is 0.491, meaning that 49.1% of the agreements resulted in fraud.
The variable used for branching on the third level is Type of leasing, which is statistically significant at the 1% level (p-value = 0.000).
o Node 8 contains 146 operating lease agreements, where the average value is 0.582, meaning that 58.2% of the agreements resulted in fraud.
o Node 9 includes financial leasing and loans and contains 139 agreements, where the average value is 0.396, meaning that 39.6% of the agreements resulted in fraud.
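A tree of this kind translates directly into nested business rules. The Python sketch below is a hypothetical rendering, not the SQL rules shown in Figure 3. It uses only the first set of node values given above (Nodes 1 to 7) and assumes that Node 2 is split by Source of information into Nodes 4 and 5, that Node 5 is split by Type of leasing into Nodes 6 and 7, and that Nodes 1 and 3 are treated as leaves; the text does not confirm this parent-child layout. The function name and the string codes for the categories are also assumptions.

```python
def fraud_share(object_group, source, leasing_type):
    """Return the reported fraud share of the node an agreement falls into.

    Category codes are hypothetical:
      object_group: "GF1", "GF2", "GF3" or "unknown"
      source:       "direct", "unknown", "dealer", "manufacturer", "internet"
      leasing_type: "operating", "financial" or "loan"
    """
    if object_group == "GF3":
        return 0.738  # Node 1
    if object_group == "GF2":
        return 0.594  # Node 3
    # GF1 or unknown object (Node 2), split by Source of information.
    if source in ("direct", "unknown"):
        return 0.287  # Node 4
    # Dealer, manufacturer or Internet (Node 5), split by Type of leasing.
    if leasing_type == "operating":
        return 0.583  # Node 6
    return 0.352      # Node 7: financial leasing and loans


# Example: an operating lease for a GF1 object arranged through a dealer.
print(fraud_share("GF1", "dealer", "operating"))  # 0.583
```

Writing the rules as a single function keeps each path through the tree readable from top to bottom, which mirrors the transparency that makes decision trees attractive for this kind of application.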

Practical implications
Introducing this model into business practice would show that certain frauds could be prevented and would flag the leasing agreements that present a fraud risk. However, to bring this project to life, it would be necessary to develop software that would enable automated, quick and transparent retrieval of data from the system, processing according to the rules and displaying the results in multiple categories. It would be necessary to show existing fraud events, emerging fraud events and potential fraud events, so that an appropriate action could be taken for each of these categories.
The solution could be implemented in the current environment through the existing SQL-based applications by developing a separate module. In this case, it would be necessary to employ the original developers to integrate the module within the existing application and set up an alarm system. This is probably the best solution, because the program would be incorporated into the existing central application, enabling full access to all data in the core system regardless of the period. Based on similar projects, the estimated cost of developing these modules would be approximately 15,000 EUR. This estimate is based on the market research conducted for the leasing firm used in the case study. Preventing even a single case of fraud would justify the project, since instances of fraud in most cases involved expensive leasing objects. Preventing fraud events not only saves the value of the lease agreements concerned but also produces a number of other positive externalities. The accounts receivable department has one less difficult case to handle, there is no need to pay for interventions to locate the leasing objects involved in fraud, and potentially significant legal costs and the costs of hiring legal staff are avoided.

Figure 1 Variables used in the discovery of knowledge from databases

Figure 2 Decision tree generated with a more aggregate object classification (Object classification 1)

Figure 3 Rules generated based on decision tree algorithm

Ivan Horvat received the univ. spec. degree in Information Management from the Faculty of Economics and Business, University of Zagreb. He is currently a controlling specialist at VB leasing Croatia and an external associate at the Faculty of Economics and Business, University of Zagreb, in the area of informatics and SAP. At VB leasing his main focus is on financial controlling, cost control and analysis, budgeting and reporting. Ivan is currently pursuing an additional specialization in internal auditing. Author can be contacted at ivan.horvat.zg@gmail.com

Mirjana Pejic-Bach, PhD, is a Full Professor of System Dynamics, Managerial Simulation Games and Data Mining at the Department of Informatics, Faculty of Economics and Business, University of Zagreb. Her current research areas are simulation modelling, data mining and web content research. She is the (co)author of a number of articles in international and national journals. She is actively engaged in a number of scientific projects (FP7, bilateral cooperation, national projects) and also collaborates in several applied projects in the field of data mining, simulation modelling and informatization. Author can be contacted at mpejic@efzg.hr

Marjana Merkač Skok earned her Ph.D. in management and organization sciences at the University of Maribor in 1997. She is currently the Dean of the Faculty of Business and Commercial Sciences in Celje, Slovenija. She also works as an independent expert for quality assurance in higher education in the EU. Before that, she worked as a developer and expert in human resource and organizational development in industry and, for several years, as a business consultant for management. She is involved in research on quality, system science, career management, lifelong learning and training. Author can be contacted at marjana.merkac@fkpv.si