# I. Introduction

The major process of discovering knowledge in databases is the extraction of rules from classes of data. In order to automate this task, many inductive learning methods have been introduced and applied to extract knowledge from databases, such as decision tree learning [1], which employs a decision tree to achieve the learning function. The rough sets concept proposed by Pawlak in 1982 [2] is a mathematical approach to imprecision, vagueness and uncertainty. The rough sets philosophy is founded on the assumption that some information is associated with every object of the universe of discourse; objects characterized by the same information are indiscernible in view of the available information about them. The indiscernibility relation is the mathematical basis of rough sets theory. Any set of indiscernible objects is called an elementary set and forms a basic granule of knowledge about the universe [14] [18]. Any union of elementary sets is referred to as a precise set; otherwise the set is rough. Each rough set has boundary-line cases. With any rough set, a pair of precise sets, called the lower and the upper approximation of the rough set, is associated [7]. In the last 20 years the rough sets approach has proved to be of fundamental importance to artificial intelligence (AI) and cognitive science and has been successfully applied to many real-life problems in medical diagnosis, engineering [3], banking [3], finance [4] and other domains. By coupling rough sets theory with genetic algorithms (GA's), it is possible to enhance search speed and to induce decision rules from inconsistent information. This paper presents a hybrid approach that integrates rough sets theory, GA's and Boolean algebra for rule induction.

Authors: Department of Information and Communication Technology (ICT), Manipal University, Manipal, Karnataka, India. E-mails: tribikram.pradhan@manipal.edu, harshanand007@yahoo.co.in, akulgoyal4@gmail.com

# II. Genetic Algorithms (GA's)

GA's have been established as a viable technique for search, optimization, machine learning and other problems. Theoretical developments by Holland and De Jong have laid the foundation of GA's [9]. GA's have been theoretically and empirically proven to provide robust search in complex spaces. The genetic algorithm consists of an iterative process that makes the population evolve [6]. Each iteration consists of the following steps:

- Selection: The first step consists of selecting individuals for reproduction [12]. This selection is done randomly with a probability depending on the relative fitness of the individuals, so that the best ones are chosen for reproduction more often than the poor ones.
- Reproduction: In the second step, offspring are bred by the selected individuals. For generating new chromosomes, the algorithm can use both recombination and mutation.
- Evaluation: Then the fitness of the new chromosomes is evaluated.
- Replacement: During the last step, individuals from the old population are killed and replaced by the new ones.

The genetic algorithm then loops over this iterative process to make the population evolve. Figure 1 depicts the life cycle of GA's.

Figure 1: Genetic algorithm cycle

Figure 2: Flow chart of the genetic algorithm

Finally, the reproduction step involves the creation of offspring chromosomes using two genetic operators, mutation and cross-over. The most important part of the genetic algorithm is cross-over, where we randomly select a cross site and swap the genes of two parent chromosomes to produce two new offspring chromosomes. This can be easily represented using a pair of chromosomes encoded as two binary strings, with the cross site denoted by "|".
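The two operators can be sketched in a few lines of Python. This is a generic illustration of single-point cross-over and bit-flip mutation on binary-string chromosomes, not the authors' implementation; the second parent string below is arbitrary.

```python
import random

def crossover(parent1: str, parent2: str) -> tuple:
    """Single-point cross-over: pick a random cross site and swap the tails."""
    site = random.randint(1, len(parent1) - 1)   # position of the cross site "|"
    return (parent1[:site] + parent2[site:],
            parent2[:site] + parent1[site:])

def mutate(chromosome: str, p_mutation: float = 0.01) -> str:
    """Bit-flip mutation: each gene is flipped with a small probability."""
    return "".join(("1" if bit == "0" else "0") if random.random() < p_mutation else bit
                   for bit in chromosome)

parent1, parent2 = "0110110110110010", "1010010011010110"  # second parent is arbitrary
child1, child2 = crossover(parent1, parent2)
print(child1, mutate(child2))
```

Keeping the cross-over probability high and the mutation probability low, as done later in this paper, preserves the genetic traits of good parents while still allowing occasional exploration.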
# III. Rough Set Theory

Definition 1: An information system is expressed as S = (U, A, V, F), where U is a finite set of objects and A is a finite set of attributes. V is the domain of each attribute of A, and F is a total function that defines the mapping F: U × A → V.

Definition 2: In an information system S, the attributes A are further classified into disjoint sets of condition attributes C and decision attributes D, i.e. A = C ∪ D and C ∩ D = Ø.

All the information is represented by a table, known as an information system, whose rows and columns correspond to the objects and the attributes respectively.

Definition 3: In an information system, each subset of attributes A determines a binary relation IND(A) = {(x, y) ∈ U × U : a(x) = a(y) for all a ∈ A}. IND(A) is an equivalence relation on the set U and is called an indiscernibility relation.

Rough sets theory offers a powerful means to deal with inconsistency problems. It uses a pair of lower and upper approximations as its main vehicle for problem solving. The lower approximation R.(X) of a set X is the union of all equivalence classes of the indiscernibility relation that are wholly contained in X, while the upper approximation R*(X) is the union of all equivalence classes that have a non-empty intersection with X.

a) There are five regions of interest, namely R*(X), R.(X), POSr(X), BNr(X) and NEGr(X).
b) The set BNr(X) = R*(X) - R.(X) is referred to as the R-boundary region of X. If the boundary region of X is the empty set, i.e. BNr(X) = Ø, then the set X is called crisp with respect to R; in the opposite case, i.e. if BNr(X) != Ø, the set X is referred to as rough with respect to R.
c) In the same way, POSr(X) and NEGr(X) are defined as follows:
- NEGr(X) = U - R*(X), the objects that are certainly non-members of X
- POSr(X) = R.(X), the objects that are certainly members of X

Table 1

| Object U | Condition attribute A | Condition attribute B | Decision attribute C |
|---|---|---|---|
| x1 | 1 | 0 | 0 |
| x2 | 0 | 1 | 1 |
| x3 | 1 | 1 | 0 |
| x4 | 0 | 0 | 0 |
| x5 | 1 | 0 | 1 |
| x6 | 0 | 0 | 1 |
| x7 | 1 | 1 | 1 |

In Table 1 there are 7 objects, 2 condition attributes A and B, and 1 decision attribute C. For example, in object x5 the condition attributes are 1 and 0, and its decision attribute is 1. Imprecise information causes indiscernibility of objects. The indiscernibility relation is an equivalence relation on the set of objects U. In the above example a conflict (inconsistency) exists between objects x1 and x5, because they are indiscernible by the condition attributes A and B but have different values of the decision attribute C. Equivalence classes of this relation are called elementary sets in S. Any finite union of elementary sets is known as a definable set. The decision elementary sets are called concepts. For example, the above table shows that the decision attribute takes two values, 0 and 1. Hence, two concepts are shaped, C1 and C2: C1 consists of the objects with decision attribute value 0 and C2 of those with value 1.

C1 = {x1, x3, x4}
C2 = {x2, x5, x6, x7}

# IV. THA: The Hybridized Model

Many inductive learning systems, such as ID3, ID4 and ID5, are not capable of handling inconsistent information in a training data set effectively. Grzymala-Busse later designed a system called LERS which can deal with inconsistent information in the training data set. But with LERS it is very difficult to maintain a huge training data set, and the rules induced by LERS are complicated and difficult to understand. So in this paper we propose a hybridized approach known as THA (Training Data Set on Hybrid Approach), which is a combination of rough set theory, genetic algorithms and Boolean algebra.

- Rough set theory can handle an inconsistent training data set and also missing values.
- A genetic algorithm based search engine can induce probable decision rules.
- Finally, Boolean operations can simplify the probable decision rules.

The framework for the hybridized approach THA is depicted in Figure 3. Basically, it consists of four modules: selection of raw data, rough set analyzer, performance evaluation of the genetic algorithm, and simplification by Boolean operations to generate rules. The knowledge collected from the process or from experts is forwarded to the hybridized model THA for classification and generation of rules.

Figure 3: Framework for the hybridized approach THA

In the selection of raw data module, the raw data, including incomplete and inconsistent data, are fed into the preprocessor, which automatically computes all the parameters necessary for the genetic algorithm, such as the population size, the length of the chromosomes, the number of generations, and the probabilities of cross-over and mutation. The rough set analyzer performs two tasks: a consistency check on the training data set and the classification of objects using rough set theory. For inconsistent data three more subtasks are performed, namely identification of the types of attributes, concept forming, and computation of the types of approximation. Approximation analysis consists of two types of approximation, the lower and the upper approximation. After that, the training data set is divided into two subsets: a certain training data set and a possible training data set.

Figure 4: Lifecycle of the hybridized model THA

After the completion of the approximation analysis, the genetic algorithm takes both the certain and the possible data set. The genetic algorithm performs reproduction, cross-over and mutation to extract certain and possible rules from the training set. Finally, Boolean operations are used to simplify the probable decision rules generated by the genetic algorithm. Boolean operators such as union and intersection are used to simplify the rules. During these operations, redundant rules are removed, whereas related rules are clustered and generalized during simplification.

For every possible rule we identify the reliability index, defined as the ratio of the number of observations that are correctly classified by the possible rule to the number of observations whose condition attributes are covered by the same rule in the original training data set:

Reliability index = (observations correctly classified by the possible rule) / (observations of the original data set covered by the same rule)

With this index the inconsistent training data set can be classified correctly.
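As a sketch of the rough set analyzer step just described, the following Python fragment groups the objects of the toy decision table in Table 1 into elementary (indiscernibility) classes, computes the lower and upper approximations of a concept, and derives the certain and possible subsets. It is an illustration under our own naming and data-structure choices, not code from the paper.

```python
from collections import defaultdict

# Toy decision table from Table 1: object -> ((condition attributes A, B), decision C)
table1 = {"x1": ((1, 0), 0), "x2": ((0, 1), 1), "x3": ((1, 1), 0), "x4": ((0, 0), 0),
          "x5": ((1, 0), 1), "x6": ((0, 0), 1), "x7": ((1, 1), 1)}

def elementary_sets(table):
    """Group objects that are indiscernible on their condition attributes."""
    classes = defaultdict(set)
    for obj, (conditions, _) in table.items():
        classes[conditions].add(obj)
    return list(classes.values())

def approximations(table, concept):
    """Return the lower and upper approximation of a concept (a set of objects)."""
    lower, upper = set(), set()
    for eq_class in elementary_sets(table):
        if eq_class <= concept:      # class wholly inside the concept -> certain
            lower |= eq_class
        if eq_class & concept:       # class overlaps the concept -> possible
            upper |= eq_class
    return lower, upper

# Concept C2: objects whose decision attribute C equals 1
c2 = {obj for obj, (_, decision) in table1.items() if decision == 1}
lower, upper = approximations(table1, c2)
print("certain training subset:", sorted(lower))           # lower approximation
print("possible training subset:", sorted(upper - lower))  # boundary region
```

For the Table 1 data, only x2 falls in the lower approximation of the concept with decision value 1; every other object lands in the boundary region, because its elementary class mixes both decision values.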
# V. Experimentation

THA is validated on data obtained from a case study which monitors water quality parameters for drinkable water standards. Table 2 shows the different parameters along with their suitable ranges of values, based on which we can determine the quality of water. For simplicity, we have taken two or three values for each water quality parameter.

Table 2

| Condition | Range | Values |
|---|---|---|
| Colour | [5-25] | (12, 15) |
| pH value | [6.5-8.5] | (7, 7.5, 8) |
| Sulphate | [200-400] | (200, 300, 375) |
| Total hardness | [300-600] | (300, 400) |
| Turbidity | [5-10] | (6, 8) |
| Fluoride | [1.0-1.5] | (1.0, 1.2) |

The observed data are listed in Table 3, where D stands for drinkable and UD stands for undrinkable.

Table 3

| Observation | Colour | pH value | Sulphate | Turbidity | Fluoride | State |
|---|---|---|---|---|---|---|
| 1 | 15 | 7.5 | 300 | 8 | 1 | D |
| 2 | 12 | 8 | 200 | 6 | 1.2 | UD |
| 3 | 12 | 8 | 375 | 6 | 1.2 | UD |
| 4 | 15 | 8 | 375 | 8 | 1.2 | D |
| 5 | 12 | 7 | 300 | 8 | 1 | D |
| 6 | 12 | 8 | 200 | 6 | 1.2 | D |
| 7 | 15 | 7 | 375 | 6 | 1 | D |
| 8 | 15 | 7.5 | 375 | 6 | 1 | UD |
| 9 | 15 | 8 | 375 | 8 | 1.2 | UD |
| 10 | 12 | 7.5 | 200 | 8 | 1.2 | D |
| 11 | 15 | 7.5 | 300 | 8 | 1 | D |
| 12 | 12 | 8 | 375 | 6 | 1 | UD |
| 13 | 12 | 7 | 375 | 8 | 1.2 | D |
| 14 | 15 | 7.5 | 300 | 8 | 1.2 | D |
| 15 | 12 | 7 | 200 | 6 | 1.2 | UD |
| 16 | 15 | 8 | 375 | 8 | 1.2 | UD |
| 17 | 15 | 7.5 | 300 | 8 | 1 | UD |
| 18 | 12 | 8 | 200 | 6 | 1.2 | D |

In order to process the information, we need to represent the parameters as integers. This is done using the descriptor scheme shown in Table 4.

Table 4: Transformation scheme

The transformed result is shown in Table 5.

Table 5: Transformed water quality parameters for drinkable water standards

| Observation | Colour | pH value | Sulphate | Turbidity | Fluoride | State |
|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 1 | 0 | 0 |
| 2 | 0 | 2 | 0 | 0 | 1 | 1 |
| 3 | 0 | 2 | 2 | 0 | 1 | 1 |
| 4 | 1 | 2 | 2 | 1 | 1 | 0 |
| 5 | 0 | 0 | 1 | 1 | 0 | 0 |
| 6 | 0 | 2 | 0 | 0 | 1 | 0 |
| 7 | 1 | 0 | 2 | 0 | 0 | 0 |
| 8 | 1 | 1 | 2 | 0 | 0 | 1 |
| 9 | 1 | 2 | 2 | 1 | 1 | 1 |
| 10 | 0 | 1 | 0 | 1 | 1 | 0 |
| 11 | 1 | 1 | 1 | 1 | 0 | 0 |
| 12 | 0 | 2 | 2 | 0 | 0 | 1 |
| 13 | 0 | 0 | 2 | 1 | 1 | 0 |
| 14 | 1 | 1 | 1 | 1 | 1 | 0 |
| 15 | 0 | 0 | 0 | 0 | 1 | 1 |
| 16 | 1 | 2 | 2 | 1 | 1 | 1 |
| 17 | 1 | 1 | 1 | 1 | 0 | 1 |
| 18 | 0 | 2 | 0 | 0 | 1 | 0 |

As observed from the above table, the observations (1, 11, 17), (2, 6, 18) and (4, 9, 16) contradict each other; hence we need to perform approximation and concept forming through the rough set analyzer. The two decision states are characterized by C1 (state = drinkable) and C2 (state = undrinkable). Applying rough set theory, we calculate the lower and upper approximations of the concepts C1 and C2. As observed from Table 5, the concepts C1 and C2 are represented by the following sets:

C1 = {1, 4, 5, 6, 7, 10, 11, 13, 14, 18}
C2 = {2, 3, 8, 9, 12, 15, 16, 17}

The lower and upper approximations of the above two concepts are respectively given by:

R.(C1) = {5, 7, 10, 13, 14}
R*(C1) = {1, 2, 4, 5, 6, 7, 9, 10, 11, 13, 14, 16, 17, 18}
R.(C2) = {3, 8, 12, 15}
R*(C2) = {1, 2, 3, 4, 6, 8, 9, 11, 12, 15, 16, 17, 18}

The boundary regions of the two states are then given by:

BNR(C1) = R*(C1) - R.(C1) = {1, 2, 4, 6, 9, 11, 16, 17, 18}
BNR(C2) = R*(C2) - R.(C2) = {1, 2, 4, 6, 9, 11, 16, 17, 18}

Now we encode the coded data of Table 5 using a chromosome encoding scheme. The chromosome coding for the water quality parameters is given in Table 6.

Table 6: Chromosome coding for water quality parameters

| Bits | Meaning | Coding |
|---|---|---|
| Bits 3, 6, 9, 12, 15 | Operator | 0 = less than or equal to (<=), 1 = greater than or equal to (>=) |
| Bits 1 and 2 | Colour | 00 = 12, 01 = 15 |
| Bits 4 and 5 | pH value | 00 = 7, 01 = 7.5, 10 = 8 |
| Bits 7 and 8 | Sulphate | 00 = 200, 01 = 300, 10 = 375 |
| Bits 10 and 11 | Turbidity | 00 = 6, 01 = 8 |
| Bits 13 and 14 | Fluoride | 00 = 1, 10 = 1.2 |
| Bit 16 | State | 0 = D, 1 = UD |

The GA based search engine is then used to extract rules from the certain and possible training data sets obtained by the rough set analyzer. We randomly generate 120 chromosomes to form an initial population of possible solutions, i.e. chromosomes. These chromosomes are coded using the scheme depicted in Table 6. For the chromosome representation, the hybrid approach follows the traditional binary string representation and its corresponding cross-over and mutation operators; each chromosome is expressed as a binary string, i.e. a string containing "0" and "1". Using this scheme, a classification rule can be represented by a 16-bit chromosome. For example, "If (colour >= 15) and (pH value >= 7.5) and (sulphate >= 300) and (turbidity >= 8) and (fluoride >= 1) then state will be drinkable" can be coded as 0110110110110010.
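For illustration, the coding scheme of Table 6 can be decoded mechanically. The helper below is our own sketch (names and data structures are assumptions, not the authors' code); it recovers the rule behind the example chromosome 0110110110110010.

```python
# Decoding a 16-bit chromosome according to the coding scheme of Table 6.
# Bits 3, 6, 9, 12 and 15 are operator bits (0 = "<=", 1 = ">=");
# bit 16 encodes the decision (0 = drinkable, 1 = un-drinkable).
VALUES = {
    "colour":    {"00": 12, "01": 15},
    "pH":        {"00": 7, "01": 7.5, "10": 8},
    "sulphate":  {"00": 200, "01": 300, "10": 375},
    "turbidity": {"00": 6, "01": 8},
    "fluoride":  {"00": 1.0, "10": 1.2},
}
FIELDS = [("colour", 0), ("pH", 3), ("sulphate", 6), ("turbidity", 9), ("fluoride", 12)]

def decode(chromosome: str) -> str:
    assert len(chromosome) == 16
    conditions = []
    for name, start in FIELDS:
        value_bits = chromosome[start:start + 2]               # two value bits
        operator = ">=" if chromosome[start + 2] == "1" else "<="
        conditions.append(f"{name} {operator} {VALUES[name][value_bits]}")
    state = "drinkable" if chromosome[15] == "0" else "un-drinkable"
    return "IF " + " AND ".join(conditions) + f" THEN state = {state}"

print(decode("0110110110110010"))
# -> IF colour >= 15 AND pH >= 7.5 AND sulphate >= 300 AND turbidity >= 8
#    AND fluoride >= 1.0 THEN state = drinkable
```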
Other than choosing a good scheme for chromosome representation, it is important to define a reasonable fitness function that rewards the right kind of chromosomes. Basically, the purpose of the GA is to extract rules that maximize the probability of classifying the objects correctly. Thus, the fitness value of a chromosome can be described by its reliability, or in other words, the probability of classifying the objects in the training data set correctly. Mathematically, the fitness function used in this work is expressed as

Fitness = (number of objects correctly classified by the rule / number of objects having the same condition attribute-value pairs as the rule)^2

For example, if a rule (the representation of a chromosome) can correctly classify five objects in a training data set, and there are six objects having the same condition attribute-value pairs as the said rule, then the fitness value of this chromosome is (5/6)^2 = 0.6944. The square operator in the fitness function is there to ensure rapid convergence: it suppresses bad chromosomes with low fitness scores and promotes the creation of good chromosomes with high fitness scores. Thus the above fitness function favours rules that can classify objects correctly. Furthermore, it also addresses the consistency and completeness criteria, which are of great importance to the evaluation of a rule. A rule is said to be consistent if it covers no negative sample, that is, if no object in the training data set violates the rule; and it is said to be complete if the rule covers all the positive samples that satisfy its condition in the training data set.

As previously mentioned, after evaluating the fitness values, chromosomes with above-average fitness values are selected for reproduction. As for cross-over and mutation, the respective probabilities are fixed at 0.85 and 0.01. With a higher probability of cross-over, offspring chromosomes that maintain the genetic traits of the parent chromosomes can be generated easily. This allows chromosomes with higher fitness values, that is, better solutions, to be discovered. A lower probability of mutation prevents the search for optimal solutions from degenerating into a random one.

The rule set induced by the GA based search engine may contain rules with identical fitness values. Some of these rules can be combined to form a more general or concise rule using Boolean algebra. The rule pruner is assigned to detect and solve this redundancy problem. Consider the following two rules:

Rule 1: If colour >= 1 and pH >= 1 and fluoride >= 1 then state will be drinkable.
Rule 2: If colour >= 0 and pH >= 1 and fluoride >= 0 then state will be drinkable.

Here, from the Boolean algebra point of view, Rule 1 is a subset of Rule 2, and the resultant rule will be: If colour >= 0 and pH >= 1 and fluoride >= 0 then state will be drinkable.
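To make the subsumption check concrete, here is a minimal Python sketch of the kind of test a rule pruner can apply to rules whose conditions are all ">=" bounds on the coded attribute values. The representation and function names are our own assumptions, not the paper's implementation.

```python
# A rule is a dict of lower bounds on the coded attribute values plus a decision.
# With ">=" conditions only, rule A is subsumed by rule B ("A is a subset of B") when
# B predicts the same state and none of B's bounds is stricter than A's, so every
# object covered by A is also covered by B and A can be pruned away.
rule1 = {"conditions": {"colour": 1, "pH": 1, "fluoride": 1}, "state": "drinkable"}
rule2 = {"conditions": {"colour": 0, "pH": 1, "fluoride": 0}, "state": "drinkable"}

def subsumed_by(a: dict, b: dict) -> bool:
    if a["state"] != b["state"]:
        return False
    # An unconstrained attribute behaves like a lower bound of 0 on the coded values.
    return all(a["conditions"].get(attr, 0) >= bound
               for attr, bound in b["conditions"].items())

def prune(rules: list) -> list:
    """Drop every rule for which a strictly more general rule exists in the set."""
    return [r for r in rules
            if not any(subsumed_by(r, other) and not subsumed_by(other, r)
                       for other in rules if other is not r)]

print(prune([rule1, rule2]))   # only the more general Rule 2 survives
```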
The results generated by the hybrid approach are shown in the next section.

# VI. Induced Rule Set

Two sets of rules are available. As already mentioned, the value recorded in the parentheses following each certain and possible rule represents the completeness and the reliability index, respectively. All the indices are represented in fraction form, with the numerator corresponding to the number of correctly classified observations whose condition attributes are covered by the rule. Analysis shows that the rules induced by THA are simple, reasonable and logical.

Possible Rules:

Rule 1: If (Colour >= 15) & (pH >= 7.5) & (Sulphate >= 300) & (Fluoride >= 1) then state = Drinkable
Rule 2: If (Colour >= 15) & (pH >= 7.5) & (Sulphate >= 300) & (Turbidity >= 8) then state = Drinkable
Rule 3: If (Colour >= 15) & (pH >= 8) & (Sulphate >= 375) & (Turbidity >= 8) then state = Un-Drinkable
Rule 4: If (Colour >= 15) & (Sulphate >= 375) & (Turbidity >= 6) then state = Un-Drinkable
Rule 5: If (Colour >= 15) & (pH >= 8) & (Sulphate >= 375) & (Fluoride >= 1.2) then state = Un-Drinkable
Rule 6: If (Colour >= 12) & (pH >= 8) & (Sulphate >= 200) & (Turbidity >= 6) then state = Drinkable
Rule 7: If (Colour >= 12) & (pH >= 7) then state = Drinkable
Rule 8: If (pH >= 7.5) & (Sulphate >= 300) & (Fluoride >= 1) then state = Drinkable
Rule 9: If (pH >= 8) & (Sulphate >= 200) & (Fluoride >= 1.2) then state = Drinkable
Rule 10: If (Colour >= 12) & (Sulphate >= 375) then state = Drinkable
Rule 11: If (Sulphate >= 300) & (Turbidity >= 8) & (Fluoride >= 1) then state = Drinkable
Rule 12: If (Sulphate >= 375) & (Turbidity >= 6) & (Fluoride >= 1) then state = Un-Drinkable
Rule 13: If (Colour >= 12) & (pH >= 8) & (Sulphate >= 200) & (Fluoride >= 1.2) then state = Drinkable

Certain Rules:

Rule 1: If (Colour >= 12) & (pH >= 8) & (Sulphate >= 375) & (Turbidity >= 6) & (Fluoride >= 1.2) then state = Un-Drinkable
Rule 2: If (Colour >= 12) & (pH >= 7) & (Sulphate >= 300) & (Turbidity >= 8) & (Fluoride >= 1) then state = Drinkable
Rule 3: If (Colour >= 15) & (pH >= 7) & (Sulphate >= 375) & (Turbidity >= 6) & (Fluoride >= 1) then state = Drinkable
Rule 4: If (Colour >= 15) & (pH >= 7.5) & (Sulphate >= 375) & (Turbidity >= 6) & (Fluoride >= 1) then state = Un-Drinkable
Rule 5: If (Colour >= 12) & (pH >= 7.5) & (Sulphate >= 200) & (Turbidity >= 8) & (Fluoride >= 1.2) then state = Drinkable
Rule 6: If (Colour >= 12) & (pH >= 8) & (Sulphate >= 375) & (Turbidity >= 6) & (Fluoride >= 1) then state = Un-Drinkable
Rule 7: If (Colour >= 12) & (pH >= 7) & (Sulphate >= 375) & (Turbidity >= 8) & (Fluoride >= 1.2) then state = Drinkable
Rule 8: If (Colour >= 15) & (pH >= 7.5) & (Sulphate >= 300) & (Turbidity >= 8) & (Fluoride >= 1.2) then state = Drinkable
Rule 9: If (Colour >= 12) & (pH >= 7) & (Sulphate >= 200) & (Turbidity >= 6) & (Fluoride >= 1.2) then state = Un-Drinkable
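As an illustration of how a candidate rule can be checked against the observations, the sketch below counts the covered and correctly classified observations and derives the reliability index and the squared-reliability fitness. The rule used (pH >= 8 implies un-drinkable) is deliberately hypothetical, the data are only the first six observations of Table 3, and the printed numbers are what this toy computation yields; they are not values reported by the authors.

```python
# First six observations of Table 3: (colour, pH, sulphate, turbidity, fluoride, state)
observations = [
    (15, 7.5, 300, 8, 1.0, "D"), (12, 8.0, 200, 6, 1.2, "UD"),
    (12, 8.0, 375, 6, 1.2, "UD"), (15, 8.0, 375, 8, 1.2, "D"),
    (12, 7.0, 300, 8, 1.0, "D"), (12, 8.0, 200, 6, 1.2, "D"),
]

# A hypothetical rule for illustration: IF pH >= 8 THEN state = UD
def covers(obs) -> bool:
    return obs[1] >= 8.0

def evaluate(rule_covers, predicted_state, data):
    covered = [obs for obs in data if rule_covers(obs)]
    correct = [obs for obs in covered if obs[-1] == predicted_state]
    reliability = len(correct) / len(covered) if covered else 0.0
    return len(covered), len(correct), reliability, reliability ** 2

covered, correct, reliability, fitness = evaluate(covers, "UD", observations)
print(covered, correct, reliability, fitness)   # 4 covered, 2 correct -> 0.5 and 0.25
```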
# VII. Discussion

The above experimentation shows that the hybridized approach THA is able to induce rules under uncertainty. We can infer that the hybrid approach can be used for inductive learning under uncertainty. This hybrid approach uses the strength of rough set theory along with an efficient GA based search engine and Boolean algebra. In the above experimentation, the GA based search engine reaches saturation within 60 generations for both the possible and the certain data sets. The hybrid approach is compared with other inductive learning techniques and the result is shown in Table 7. The certain rules generated by this approach are identical to those produced by ID3. Finally, the rules generated by this system are simple and concise compared to those produced by LERS.

Table 7: Comparison with other inductive learning techniques

| Technique | Dealing with uncertainty and inconsistency | Simple and concise rule induction | Extracting complete rules |
|---|---|---|---|
| ID3 | No | Yes | For consistent data set |
| LERS | Yes | No | Not evaluated |
| RClass | Yes | Yes | No |
| THA | Yes | Yes | Yes |

In Figure 5, we have taken the population size as 120 and the number of generations as 60. In every generation we identified the average fitness value and plotted the graph accordingly. Based on the average fitness value we selected the chromosomes with higher fitness values: in this experiment, we selected those chromosomes whose fitness value is more than 0.68 and discarded all the chromosomes whose fitness value is less than 0.68.

Figure 5: Performance of the Genetic Algorithm based search engine

The rules generated by THA have the following advantages:

- Easy to comprehend
- Easy to deduce and examine
- Easy to validate and cross-check

This hybrid approach THA can handle uncertainty and inconsistent information in real-world industrial applications such as data mining, an important area which frequently requires diagnostic rules to be induced from large databases.
# References

* J. Ross Quinlan, "Improved use of continuous attributes in C4.5", Journal of Artificial Intelligence Research, 4, 1996.
* Z. Pawlak, "Rough Sets", International Journal of Information and Computer Sciences, 11, 1982.
* Z. Pawlak, "Rough sets theory and its application to data analysis", Cybernetics and Systems, 29, 1998.
* Francis E. H. Tay and Lixiang Shen, "Economic and financial prediction using rough set model", European Journal of Operational Research, 141, 2002.
* Thomas E. McKee and Terje Lensberg, "Genetic programming and rough sets: A hybrid approach to bankruptcy classification", European Journal of Operational Research, 138, 2002.
* S. K. M. Wong, W. Ziarko and Y. R. Li, "Comparison of rough-set and statistical methods in inductive learning", International Journal of Man-Machine Studies, 24, 1986.
* J. W. Guan and D. A. Bell, "Rough computational methods for information systems", Artificial Intelligence, 105, 1998.
* Roman W. Swiniarski and Andrzej Skowron, "Rough set methods in feature selection and recognition", Pattern Recognition Letters, 24, 2003.
* J. Grant, "Churn modeling by rough set approach", manuscript, 2001.
* S. K. Pal and A. Skowron, Rough Fuzzy Hybridization, Springer, 1999.
* Z. Pawlak, Rough Sets: Theoretical Aspects of Reasoning about Data, Kluwer, Dordrecht/Boston/London, 1991.
* Z. Pawlak, "Decision rules, Bayes' rule and rough sets", in N. Zhong, A. Skowron and S. Ohsuga (eds.), New Directions in Rough Sets, Data Mining, and Granular-Soft Computing, Springer, 1999.
* Z. Pawlak, "New look on Bayes' theorem: the rough set outlook", Proc. Int. RSTGC-2001, Matsue, Shimane, Japan, 5(1/2), May 2001.
* L. Polkowski and A. Skowron (eds.), Rough Sets and Current Trends in Computing, Lecture Notes in Artificial Intelligence 1424, Springer, 1998.
* L. Polkowski, S. Tsumoto and T. Y. Lin (eds.), Rough Set Methods and Applications: New Developments in Knowledge Discovery in Information Systems, Springer, 2000 (to appear).
* N. Zhong, A. Skowron and S. Ohsuga (eds.), New Directions in Rough Sets, Data Mining, and Granular-Soft Computing, Springer, 1999.
* Q. Shen and R. Jensen, "Rough Sets, Their Extensions and Applications", 4, Jul. 2007.
* R. R. Stoll, Set Theory and Logic, Dover Publications, Mineola, USA, 1979.
* G. Xie, F. Wang and K. Xie, "RST-Based System Design of Hybrid Intelligent Control", Proceedings of the 2004 IEEE International Conference on Systems, Man and Cybernetics, The Hague, The Netherlands, IEEE Press, Oct. 10-13, 2004.
* W. Ziarko and N. Shan, "Discovering Attribute Relationships, Dependencies and Rules by using Rough Sets", Proceedings of the 28th Annual Hawaii International Conference on System Sciences, Wailea, USA, IEEE Press, Jan. 3-6, 1995.
* L. A. Zadeh, "Fuzzy Sets", Information and Control, 1965.
* J. M. Wei, "Rough Set based Approach to Selection of Node", International Journal of Computational Cognition (ISSN 1542-8060), 1(2), 2003.
* C. Wu, Y. Yue, M. Li and O. Adjei, "The Rough Set Theory and Applications", Engineering Computations, 21, 2004.