Research Article

Attribute topologies based similarity

Article: 1242291 | Received 20 Apr 2016, Accepted 01 Sep 2016, Published online: 18 Oct 2016

Abstract

In this work, we generate more topologies based on a similarity relation for an information system, and we find the corresponding lower and upper approximations. This paper discusses two approaches, Yao’s method and Pawlak’s method, for determining the accuracy of qualitative data. Both ideas show that, due to the uncertainty and vagueness of qualitative data, we obtain many topologies from one or two attributes. We determine the accuracies by the new method, which shows the difference between using one or two attributes. The method is clarified by an application.

Public Interest Statement

In this paper, we introduce a new venture to establish more topologies on a general information system. Such efforts prompt us to note that these concepts are also applicable in other areas of advanced topology.

1. Introduction

Topology (Aisling & Brian, Citation1997; Sierpenski & Krieger, Citation1956) and its branches have become hot topics not only in almost all fields of mathematics, but also in many areas of science such as chemistry (Flapan, Citation2000; Kozae, Saleh, Elsafty, & Salama, Citation2015) and information systems (Pawlak, Citation1998). For a long time, general topologists faced many questions, from themselves and from others, about the importance of abstract topological spaces. The answers were always about the importance of general spaces in other branches of mathematics such as algebra and analysis.

In our life, we collect a lot of data about subjects we are interested in; the important question is how to transform these data into knowledge that helps in decision-making. In the last decade of the twentieth century, the revolution of information became the focus of interest. Topology has a significant place in this age, the age of information. The basic problem of this age is how to transform data into knowledge using the available information.

The notion of rough sets was introduced by Pawlak (Citation1982, Citation1991, Citation1998). From the outset, rough set theory has been a methodology for database mining, or knowledge discovery, in relational databases. The rough set methodology is based on the premise that lowering the degree of precision in the data makes the data pattern more visible, whereas the central premise of the rough set philosophy is that knowledge consists in the ability to classify. It is a formal theory derived from fundamental research on the logical properties of information systems. Lin (Citation1988, Citation1998) worked to develop a measure theory, or theories, for neighborhood systems; interestingly, neighborhood systems are the most natural data structures for belief functions.

The purpose of this article is to use a generalized approximation space (U, R), based on a general binary relation, together with topological concepts; this is called a topological approximation space “TAS.” In this paper we introduce accuracies based on one or two attributes. This work is important because it is a new proposal that works with attributes, whereas past work used objects. The success of rough sets in data analysis has directed attention to topological methods for solving uncertainty problems.

2. Basic concepts

2.1. Topological space (Sierpenski & Krieger, Citation1956)

A topological space is a pair (U, τ) consisting of a set U and a family τ of subsets of U satisfying the following conditions:

τ contains Ø and U, τ is closed under arbitrary unions, and τ is closed under finite intersections.

Definition 2.1.1

A topological space (U, τ) is a set U together with a topology τ on it. The elements of τ are called open subsets of U. A subset F ⊆ U is closed if its complement U \ F is open.

A subset N containing a point x ∊ U is called a neighborhood of x if there exists an open set G with x ∊ G ⊆ N. Thus, an open neighborhood of x is simply an open subset containing x.

Definition 2.1.2

Let A ⊆ U be a subset of a topological space. The interior of A is the largest open subset contained in A, “A° = ∪{G ⊆ U : G ⊆ A and G is open},” and dually the closure of A is the smallest closed set containing A, “A̅ = ∩{F ⊆ U : A ⊆ F and F is closed}.”

Evidently, A° is the union of all open subsets of U which are contained in A. Note that A is open if and only if A = A°. The boundary of A is Aᵇ = A̅ − A°.
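On a finite space, the interior, closure, and boundary can be computed directly from the list of open sets. A minimal sketch in Python; the universe and topology below are illustrative and not taken from the paper:

```python
# A minimal sketch of interior, closure, and boundary on a finite
# topological space; the universe and topology below are illustrative.

def interior(tau, A):
    """Union of all open sets contained in A."""
    return set().union(*[G for G in tau if set(G) <= set(A)] or [set()])

def closure(tau, U, A):
    """Smallest closed set containing A = U minus interior of U \\ A."""
    return set(U) - interior(tau, set(U) - set(A))

def boundary(tau, U, A):
    """Boundary A^b = closure(A) - interior(A)."""
    return closure(tau, U, A) - interior(tau, A)

U = {1, 2, 3}
tau = [set(), {1}, {1, 2}, {1, 2, 3}]   # a topology on U
print(interior(tau, {2, 3}))   # set(): no nonempty open set fits in {2, 3}
print(closure(tau, U, {1}))    # {1, 2, 3}: the only closed set containing 1
print(boundary(tau, U, {1}))   # {2, 3}
```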

2.2. Approximation space (Lashin, Kozae, Abo Khadra, & Medhat, Citation2005; Lashin & Medhat, Citation2005)

Definition 2.2.1

Let U be a non-empty finite set of objects called the universe, and let R be an equivalence relation on U called the indiscernibility relation. The pair (U, R) is called the approximation space.

Let X be a subset of U. The lower approximation of X with respect to R is the set of all objects which can be certainly classified as members of X, and the upper approximation is the set of all objects which can possibly be classified as members of X.

That is, R̲X = {x ∊ U : Rx ⊆ X} and R̄X = {x ∊ U : Rx ∩ X ≠ Ø}, where Rx denotes the equivalence class of x under R.

The boundary region of X with respect to R is the set of all objects which can be classified neither as X nor as not-X with respect to R; it is denoted by BN_R(X) = R̄X − R̲X.

The set X is said to be rough with respect to R if R̄X ≠ R̲X, that is, if BN_R(X) ≠ Ø.

2.2.2. Accuracy of approximation

The accuracy of the rough set approximation is defined as α_R(X) = |R̲X| / |R̄X|, 0 ≤ α_R(X) ≤ 1, where |·| denotes the cardinality of a set (the number of objects contained in the lower (upper) approximation of X).
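The definitions above can be sketched directly in code. A minimal sketch, assuming the equivalence relation is given by its partition into classes; the universe, partition, and subset X below are illustrative:

```python
# A minimal sketch of Pawlak's approximation space, assuming an
# equivalence relation given by its partition into classes.

def approximations(partition, X):
    """Return (lower, upper) approximations of X w.r.t. the partition."""
    lower, upper = set(), set()
    for block in partition:
        if block <= X:            # block entirely inside X -> certainly in X
            lower |= block
        if block & X:             # block meets X -> possibly in X
            upper |= block
    return lower, upper

def accuracy(partition, X):
    """Pawlak accuracy: |lower| / |upper| (1.0 means X is exact)."""
    lower, upper = approximations(partition, X)
    return len(lower) / len(upper) if upper else 1.0

partition = [{1, 2}, {3}, {4, 5}]          # equivalence classes of U = {1..5}
X = {1, 2, 3, 4}
low, up = approximations(partition, X)
print(low, up, accuracy(partition, X))     # {1, 2, 3} {1, 2, 3, 4, 5} 0.6
```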

3. Topological approximation space “Pawlak’s method” (Yao, Citation1998, Citation1999)

The condition of an equivalence relation in the approximation space limits the range of applications. Yao introduced a method for generalizing the approximation space based on the right neighborhood, as shown below:

If U is a finite universe and R is a binary relation on U, then:

The class of right neighborhoods is (x)R = {y ∊ U : xRy}, and the lower and upper approximations of a subset X ⊆ U according to (x)R are, respectively: X̲ = ∪{(x)R : (x)R ⊆ X}, X̄ = ∪{(x)R : (x)R ∩ X ≠ Ø}.

Consider R as a general binary relation, and use the classes of “after sets” (right neighborhoods) and “for sets” (left neighborhoods) formed by R as a subbase for a topology τ on U.
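Yao's right-neighborhood approximations above can be sketched as follows; the universe, relation, and subset are illustrative, not data from the paper:

```python
# A sketch of Yao's generalized approximations from right neighborhoods
# (x)R = {y : x R y}; the relation below is illustrative.

def right_neighborhoods(U, R):
    """Map each x to its after set (right neighborhood) under relation R."""
    return {x: {y for y in U if (x, y) in R} for x in U}

def yao_approximations(U, R, X):
    """Lower = union of neighborhoods inside X; upper = union of those meeting X."""
    nbhd = right_neighborhoods(U, R)
    lower = set().union(*[n for n in nbhd.values() if n <= X] or [set()])
    upper = set().union(*[n for n in nbhd.values() if n & X] or [set()])
    return lower, upper

U = {1, 2, 3, 4}
R = {(1, 1), (1, 4), (2, 2), (2, 3), (3, 2), (3, 3), (3, 4), (4, 1), (4, 4)}
X = {2, 3}
print(yao_approximations(U, R, X))   # ({2, 3}, {2, 3, 4})
```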

Table 1. Information system


Definition 3.1

If U is a finite universe and R is a binary relation on U, then we define:

(1) “After set” as follows: xR = {y: xRy}.

To construct the topology τ using “after set,” we consider the family SR = {xR: xU} as a subbase. And we write Sx = {GSR: xG}.

(2) “For set” as follows: Ry = {x: xRy}.

To construct the topology τ using the “for set,” we consider the family RS = {Ry: y ∊ U} as a subbase. And we write xS = {G ∊ RS: x ∊ G}.
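The after sets and for sets of Definition 3.1 can be computed directly; a small sketch with an illustrative relation (`after_set` and `for_set` are assumed helper names, not from the paper):

```python
# After sets xR = {y : xRy} and for sets Ry = {x : xRy} of a binary
# relation R, plus the families used as subbases (Definition 3.1).
# The relation below is illustrative.

def after_set(R, x):
    """Right neighborhood xR = {y : (x, y) in R}."""
    return frozenset(b for (a, b) in R if a == x)

def for_set(R, y):
    """Left neighborhood Ry = {x : (x, y) in R}."""
    return frozenset(a for (a, b) in R if b == y)

U = {1, 2, 3}
R = {(1, 2), (2, 2), (2, 3), (3, 1)}
SR = {after_set(R, x) for x in U}   # subbase from after sets: {2}, {2,3}, {1}
RS = {for_set(R, y) for y in U}     # subbase from for sets: {3}, {1,2}, {2}
print(SR)
print(RS)
```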

Definition 3.2

For each B ⊆ A, the relation RB ⊆ U × U is defined by x RB y ⟺ (∑i∊B |i(x) − i(y)|)/|B| < λ, where |B| is the cardinality of B and λ is a chosen threshold.
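Definition 3.2 can be sketched directly: two objects are similar over the attribute subset B when their mean absolute difference over B is below λ. The small information system below (marks of two students) is illustrative:

```python
# A sketch of Definition 3.2: x R_B y iff the mean absolute difference
# of the attribute values over B is below the threshold lambda.
# The information system below is illustrative.

def similarity_relation(table, B, lam):
    """Return {(x, y)} with sum(|i(x) - i(y)| for i in B) / |B| < lam."""
    objs = list(table)
    R = set()
    for x in objs:
        for y in objs:
            dist = sum(abs(table[x][i] - table[y][i]) for i in B) / len(B)
            if dist < lam:
                R.add((x, y))
    return R

table = {"x1": {"M": 80, "A": 70}, "x2": {"M": 60, "A": 65}}
# mean difference between x1 and x2 over {M, A} is (20 + 5) / 2 = 12.5,
# so x1 and x2 are similar for lam = 15 but not for lam = 10
print(similarity_relation(table, ["M", "A"], 15))
```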

4. Yao’s method

Yao introduced a method for generalizing the approximation space based on the right neighborhood, as shown below.

If U is a finite universe and R is a binary relation on U, then:

The class of right neighborhoods is (x)R = {y ∊ U: x R y}. For a topological space (X, τ) and a subset A of X, we define the accuracy of Yao as |A°| / |A̅|.

Table 2. After similarity with four attributes

Table 3. Accuracies with Yao and Pawlak methods when λ ≤ 5

Table 4. Accuracies with Yao and Pawlak methods when λ ≤ 10

Example 1

Consider the information system containing the results of exams in four subjects taken by four students U = {x1, x2, x3, x4}, with C = {M, A, E, S}, where M = Mathematics, A = Arabic, E = English and S = Science, as given in Table 1.

For the four attributes we take B = {M, A, E, S}, |B| = 4, and x RB y ⟺ (∑i∊B |i(x) − i(y)|)/4 < λ, as given in Table 2.

When λ ≤ 5, we find the subset information system as follows:

R1 = {(x1, x1), (x2, x2), (x3, x3), (x4, x4)}; then x1R1 = {x1}, x2R1 = {x2}, x3R1 = {x3}, x4R1 = {x4}, and (x)R1 = {{x1}, {x2}, {x3}, {x4}} is the class of right neighborhoods as in Yao’s method.

Then, SR1 = {{x1}, {x2}, {x3}, {x4}} is a subbase of τ1 as in Pawlak’s method.

In Pawlak’s method “Topological Approximation Space,” we get:

BR1 = {Ø, {x1}, {x2}, {x3}, {x4}},

τ1 = {Ø, X, {x1}, {x2}, {x3}, {x4}, {x1, x2}, {x1, x3}, {x1, x4}, {x2, x3}, {x2, x4}, {x3, x4}, {x1, x2, x3}, {x1, x2, x4}, {x2, x3, x4}, {x1, x3, x4}} = τ¯1, where τ¯1 denotes the family of closed sets.

We find the lower and upper approximations, and the closure and interior, for all subsets of U. The accuracies using Yao’s method and Pawlak’s method are shown in Table 3.

Note: From Table 3, we found that the accuracy of Yao is the same as the accuracy of Pawlak when τ=τ¯ (clopen: every open set is also closed).

When λ ≤ 10, we find the subset information system as follows:

R2 = {(x1, x1), (x1, x4), (x2, x2), (x2, x3), (x3, x2), (x3, x3), (x3, x4), (x4, x1), (x4, x3), (x4, x4)}; then x1R2 = {x1, x4}, x2R2 = {x2, x3}, x3R2 = {x2, x3, x4}, x4R2 = {x1, x4}, and (x)R2 = {{x1, x4}, {x2, x3}, {x2, x3, x4}} is the class of right neighborhoods as in Yao’s method.

Then, SR2 = {{x1, x4}, {x2, x3}, {x2, x3, x4}} is a subbase of τ2 as in the TAS method.

In Pawlak’s method “Topological Approximation Space,” we get:

BR2 = {Ø, {x4}, {x1, x4},{x2, x3}, {x2, x3, x4}},

τ2 = {Ø, X, {x4}, {x1, x4}, {x2, x3}, {x2, x3, x4}}, τ¯2 = {X, Ø, {x1, x2, x3},{x2, x3}, {x1, x4}, {x1}}.
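The step from the subbase SR2 to τ2 can be checked mechanically: form all finite intersections (a base), then all unions. A sketch, assuming the subbase from the text; this brute-force generator is only practical for small finite universes:

```python
# Generate the topology on U from a subbase by closing under finite
# intersections and then arbitrary unions, and check it against tau_2.
from itertools import combinations

def topology_from_subbase(U, subbase):
    sets = [frozenset(s) for s in subbase]
    base = {frozenset(U)}                      # empty intersection is U
    for r in range(1, len(sets) + 1):
        for combo in combinations(sets, r):
            base.add(frozenset(U).intersection(*combo))
    tau = {frozenset()}                        # empty union is the empty set
    base = list(base)
    for r in range(1, len(base) + 1):
        for combo in combinations(base, r):
            tau.add(frozenset().union(*combo))
    return tau

U = {"x1", "x2", "x3", "x4"}
SR2 = [{"x1", "x4"}, {"x2", "x3"}, {"x2", "x3", "x4"}]
tau2 = topology_from_subbase(U, SR2)
expected = {frozenset(), frozenset(U), frozenset({"x4"}),
            frozenset({"x1", "x4"}), frozenset({"x2", "x3"}),
            frozenset({"x2", "x3", "x4"})}
print(tau2 == expected)   # True: matches the six open sets in the text
```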

The accuracies of Yao’s method and Pawlak’s method are shown in Table 4.

Note: From Table 4, we found that the accuracy of Yao is greater than the accuracy of Pawlak when τ≠τ¯. Similar results hold for one, two, and three attributes.

5. Application

On the basis of data from the securities market, the application can be described as follows: U = {x1, x2, …, x10} denotes 10 listed companies; C = {c1, c2, …, c8} = {increase percent of EPS, increase percent of net asset value per-share, net asset earning rate, increase percent of net asset earning rate, increase percent of business income, increase percent of profit, increase percent of net margin, increase percent of interests of the stockholders}; and D = {d} = {decision of investment}, as given in Table 5.

Table 5. Business statement

The result of discretizing Table 5 using C-means clustering is given in Table 6.

Table 6. Discretization of Table 5

After removing duplicate rows and columns, we get Table 7, where U = {X1, X2, …, X8} denotes eight listed companies and the attributes are C = {C1, C2, …, C6}.

Table 7. Reduced form of Table 6

When C1 is removed, the objects X3 and X4 become equal; when C3 is removed, the objects X5 and X8 become equal; and when C4 is removed, the objects X4 and X6 become equal.

We notice that, IND (C) ≠ IND (C–{C1}), IND (C) ≠ IND (C–{C3}), and IND (C) ≠ IND (C–{C4}). Then C1, C3, and C4 are indispensable.

Otherwise, when C2, C5, and C6 are removed, we get IND (C) = IND (C–{C2}), IND (C) = IND (C–{C5}), and IND (C) = IND (C–{C6}).

Then C2, C5, and C6 are superfluous, as shown in Table 8.

Table 8. Removing attributes

Then the core is {C1, C3, C4}, and C2, C5, and C6 are superfluous.

We discuss the results from Table 8, followed by Table 9.

Table 9. Discussion of attributes C2, C5, and C6

After the classification of Table 9, we get the final Table 10.

Table 10. Classification of attributes C2, C5, and C6

Let d1={X1, X5, X8}, d2={X2}, and d3={X3, X4, X6, X7}, we find the lower and upper approximations.

Lower-1={X5}, Upper-1 = {X1, X3, X4, X5, X6, X7, X8}, then the accuracy (μ1) = 1/7.

Lower-2={X2}, Upper-2 = {X2}, then the accuracy (μ2) = 1.

Lower-3 = Ø, Upper-3 = {X1, X3, X4, X6, X7, X8}; then the accuracy (μ3) = 0.

Now, we use the data of Table 7 to obtain the accuracy models by different rules.

For each B ⊆ C, the relation RB ⊆ U × U is defined by x RB y ⟺ (∑i∊B |i(x) − i(y)|)/|B| ≤ λ, where |B| is the cardinality of B and λ is a chosen threshold.

Let B = {C2}, |B| = 1, and x RB y ⟺ |i(x) − i(y)|/1 ≤ λ; we get Table 11.

Table 11. Similarity about attribute C2

5.1. Discussion

By choosing λ ≥ 1, we find the subset information system as follows:

R1 = {(t1, t2), (t1, t4), (t2, t1), (t2, t3), (t2, t4), (t3, t2), (t3, t4), (t4, t1), (t4, t2), (t4, t3)}; then t1R1 = {t2, t4}, t2R1 = {t1, t3, t4}, t3R1 = {t2, t4}, t4R1 = {t1, t2, t3}.

(x)R1 = {{t2, t4}, {t1, t3, t4}, {t1, t2, t3}} is the class of for sets (which, since R1 is symmetric, coincide with the right neighborhoods) as in Yao’s method.

Then, SR1 = {{t2, t4}, {t1, t3, t4}, {t1, t2, t3}} is a subbase of τ1 as in TAS method.

we get: BR1 = {Ø, {t2, t4}, {t1, t3, t4}, {t1, t2, t3}, {t2}, {t4}, {t1, t3}} is a base of τ1,

τ1 = {Ø, T, {t2, t4}, {t1, t3, t4}, {t1, t2, t3}, {t2}, {t4}, {t1, t3}} = τ¯1, where τ¯1 is the family of closed sets of τ1. We find the lower and upper approximations for all subsets of U (2⁴ = 16 subsets).
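The clopen property of τ1 (every open set is also closed), which is why Yao's and Pawlak's accuracies coincide here, can be checked mechanically; a small sketch using the open sets from the text:

```python
# Check that tau_1 from the discussion is clopen: the complement of
# every open set is again open (tau equals its family of closed sets).
T = frozenset({"t1", "t2", "t3", "t4"})
tau1 = [frozenset(), T,
        frozenset({"t2", "t4"}), frozenset({"t1", "t3", "t4"}),
        frozenset({"t1", "t2", "t3"}), frozenset({"t2"}),
        frozenset({"t4"}), frozenset({"t1", "t3"})]
closed = {T - G for G in tau1}   # complements of the open sets
print(closed == set(tau1))       # True: every open set is also closed
```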

Using the definitions of Yao and Pawlak, we get the accuracies shown in Table 12.

Table 12. Accuracies with Yao and Pawlak methods when λ ≥ 1

Results-1: The accuracies

Let B = {C5}, |B| = 1, and x RB y ⟺ |i(x) − i(y)|/1 ≤ λ; we get Table 13.

Table 13. Similarity about attribute C5

5.2. Discussion

Choosing λ ≥ 1, we find the subset information system as follows:

R2 = {(t1, t2), (t1, t3), (t1, t4), (t2, t1), (t2, t3), (t2, t4), (t3, t1), (t3, t2), (t4, t1), (t4, t2)}; then t1R2 = {t2, t3, t4}, t2R2 = {t1, t3, t4}, t3R2 = {t1, t2}, t4R2 = {t1, t2}.

(x)R2 = {{t1, t2}, {t1, t3, t4}, {t2, t3, t4}} is a class of For set (left neighborhoods).

Then, SR2 = {{t1, t2}, {t1, t3, t4}, {t2, t3, t4}} is a subbase of τ2 as in TAS method.

we get: BR2 = {Ø, {t1, t2}, {t1, t3, t4}, {t2, t3, t4}, {t1}, {t2}, {t3, t4}} is a base of τ2,

τ2 = {Ø, T, {t1, t2}, {t1, t3, t4}, {t2, t3, t4}, {t1}, {t2}, {t3, t4}} = τ¯2.

Using the definitions of Yao and Pawlak, we find the accuracies shown in Table 14.

Table 14. Accuracies with Yao and Pawlak methods with C5 and λ ≥ 1

Results-2: The accuracies

Let B = {C2, C5}, |B| = 2, and x RB y ⟺ (∑i∊B |i(x) − i(y)|)/2 ≤ λ; we get Table 15.

Table 15. Similarity about attributes C2 and C5

5.3. Discussion

Choosing λ ≥ 1, we find the subset information system as follows:

R3 = {(t1, t2), (t1, t4), (t2, t1), (t2, t3), (t2, t4), (t3, t2), (t3, t4), (t4, t1), (t4, t2), (t4, t3)}; then t1R3 = {t2, t4}, t2R3 = {t1, t3, t4}, t3R3 = {t2, t4}, t4R3 = {t1, t2, t3}.

(x)R3 = {{t2, t4}, {t1, t3, t4}, {t1, t2, t3}} is a class of For set (left neighborhoods). Then, SR3 = {{t2, t4}, {t1, t3, t4}, {t1, t2, t3}} is a subbase of τ3 as in TAS method.

we get: BR3 = {Ø, {t2, t4}, {t1, t3, t4}, {t1, t2, t3}, {t2}, {t4}, {t1, t3}} is a base of τ3,

τ3 = {Ø, T, {t2, t4}, {t1, t3, t4}, {t1, t2, t3}, {t2}, {t4}, {t1, t3}} = τ¯3.

We get the accuracies shown in Table 16.

Table 16. Accuracies with Yao and Pawlak methods with C2, C5 and λ ≥ 1

Results-3: The accuracies

6. Conclusion

In the tables above we computed the accuracy of Yao and the accuracy of Pawlak. This work is a new application of topology in which a general topology is obtained from the information system after reducing the superfluous data. We obtained the relationship between the accuracies of Yao and Pawlak: the accuracy of Yao is the same as the accuracy of Pawlak when the elements of the topology are both open and closed (clopen), but, in general, the accuracy of Yao is greater than the accuracy of Pawlak. We computed many accuracies with one attribute and with two attributes; the accuracy obtained from two attributes is finer than that obtained from one attribute. Topological methods are very useful for solving uncertainty problems. The results of the rough set approach are presented in the form of classification or decision rules derived from a set of previous applications. This study provides new insight into the problem of attribute reduction. It suggests that the semantic properties preserved by an attribute reduct should be carefully examined, so that the decision-maker has a fair chance to choose a reduct suitable for him.

Additional information

Funding

The authors received no direct funding for this research.

Notes on contributors

M.A. Elsafty

M.A. Elsafty is an associate professor. He received his PhD in 2011. His area of interest is topology, in particular rough sets. Our research reports a new model that finds better accuracies, competing with those of Pawlak and Yao. Rough set theory is a key branch of topology contributing to problem-solving in sciences such as engineering and chemistry by revealing the important attributes (cores) and eliminating the superfluous attributes of these disciplines.

References

  • Aisling, Mc., & Brian, Mc. (1997). Topology course lecture notes. Topology Atlas. Retrieved from http://at.yorku.ca
  • Flapan, E. (2000). When topology meets chemistry. Cambridge: Cambridge University Press. doi:10.1017/CBO9780511626272
  • Kozae, A. M., Saleh, S. A., Elsafty, M. A., & Salama, M. M. (2015). Entropy measures for topological approximations of uncertain concepts. Jokull Journal, 65, 192–206.
  • Lashin, E. F., & Medhat, T. (2005). Topological reduction of information systems. Chaos, Solitons and Fractals, 25, 277–286. doi:10.1016/j.chaos.2004.09.107
  • Lashin, E. F., Kozae, A. M., Abo Khadra, A. A., & Medhat, T. (2005). Rough set theory for topological spaces. International Journal of Approximate Reasoning, 40, 35–43. doi:10.1016/j.ijar.2004.11.007
  • Lin, T. Y. (1988). Neighborhood systems and approximation in relational databases and knowledge bases. Proceedings of the 4th International Symposium on Methodologies of Intelligent Systems (pp. 75–86). Northridge, CA: California State University.
  • Lin, T. Y. (1998). Granular computing on binary relations I: Data mining and neighborhood systems, II: Rough set representations and belief functions. In L. Polkowski & A. Skowron (Eds.), Rough sets in knowledge discovery 1 (pp. 107–140). Heidelberg: Physica-Verlag.
  • Pawlak, Z. (1982). Rough sets. International Journal of Computer & Information Sciences, 11, 341–356. doi:10.1007/BF01001956
  • Pawlak, Z. (1991). Rough sets: Theoretical aspects of reasoning about data. Dordrecht: Kluwer Academic.
  • Pawlak, Z. (1998). Granularity of knowledge, indiscernibility and rough sets. Proceedings of the 1998 IEEE International Conference on Fuzzy Systems (pp. 106–110). Piscataway, NJ: IEEE.
  • Sierpenski, W., & Krieger, C. (1956). General topology. Toronto: University of Toronto Press.
  • Yao, Y. Y. (1998). Relational interpretations of neighborhood operators and rough set approximation operators. Information Sciences, 111, 239–259. doi:10.1016/S0020-0255(98)10006-3
  • Yao, Y. Y. (1999, May 9–12). Rough sets, neighborhood systems, and granular computing. Proceedings of the 1999 IEEE Canadian Conference on Electrical and Computer Engineering. Edmonton: Shaw Conference Center.