TECHNICAL PAPER WRITING ON
DISTRIBUTED DATABASE

 

Teng Mee Ling, Haezel Ann Dicken, Marry Teo, Lee Jia Hwee, Tay Xin Hui, Dr Shahreen binti Kasim

Faculty of Computer Science and Information Technology,

University Tun Hussein Onn Malaysia, Johor, Malaysia

[email protected], [email protected], [email protected], [email protected], [email protected], [email protected]

I. INTRODUCTION

 

A distributed database is a database whose data is stored on several
computers. In an age of rapid advances in information technology, it is
important for people to have access to up-to-date information. With a
distributed database, users can reach that information from anywhere on the
network at any time. The security issues and concurrency control of
distributed databases are discussed in paper [1]. According to study [2], the
growth of distributed databases depends on the rapid development of computer
network technology. A Distributed Database Management System (DDBMS)
integrates data that is distributed over different databases at different
locations.
Security threats can arise in both centralized and distributed systems. The
operating system, the network and differing security policies among systems
are all sources of security threats to a distributed database. Security
requirements are a set of rules stated to protect the security objects, which
receive and store information, and the security subjects, which act as the
means of updating the database state. Some security threats and security
approaches are discussed in this paper. According to the reviewed paper [3], a
distributed system is one in which the data, or the control of the data, is
stored on multiple computers or in separate locations.
Such a system is more exposed to security threats than a centralized database
management system (CDBMS). The object-oriented database model is presented
there as one model for distributed databases; it increases the level of data
control in the database system. The relational model, however, is more mature
than the object-oriented model in the development of security standards and
procedures. This shows in standards compatibility: the object-oriented model
still lacks a commonly accepted standard, whereas the compatibility standards
for relational databases are well defined. Paper [3] mainly discusses the
security concerns of databases and distributed databases, the security
problems found in the object-oriented and relational database models, both
those specific to one model and those shared by both, and a security
comparison of the models. Paper [4] reviews the security concerns of
databases, and of distributed databases in particular. It examines the
security problems found in both distributed and centralized models, evaluates
the problems unique to each system, and finally compares the relative merits
of each model with respect to security. According to paper [5], association
rule learning is a technique for discovering relations between variables in a
large database. It is used to form horizontally distributed databases from
data held in a centralized database. The motivation is the slow data
retrieval that results when one centralized database server serves all sites
simultaneously. Thus, “association rule helps to group the data into small
fragments and stored at distinct site of computer”. It also improves the
availability of the database to 24 hours a day, 7 days a week, while
maintaining the normalization of the database [5]. The main benefit of the
technique is faster data retrieval and a shorter time needed to load data into
the database. Association rule mining is an advanced technique for measuring
the interestingness of rules. The algorithm used is the Apriori algorithm,
which relies on the principle that every subset of a frequent item set must
itself be frequent, and which uses if-then statements to relate frequent item
sets to their subsets. Association rule learning also offers an alternative
protocol that improves simplicity, efficiency and privacy and enhances the
security of the subset computation [6].
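
The Apriori principle mentioned above can be illustrated with a minimal
sketch. It is not the implementation used in paper [5]; the function name
`apriori`, the absolute `min_support` threshold and the sample baskets are
assumptions made for illustration only. The sketch runs on a single site and
ignores distribution and privacy.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support_count} for every frequent itemset.

    transactions: iterable of item collections (hypothetical sample data).
    min_support: minimum number of transactions that must contain an itemset.
    """
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Count the transactions containing each candidate, keep frequent ones.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    # Level 1: frequent single items.
    frequent = count({frozenset([i]) for t in transactions for i in t})
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: unite pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (Apriori principle): all (k-1)-subsets must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        frequent = count(candidates)
        result.update(frequent)
        k += 1
    return result


baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter", "bread"},
]
frequent_itemsets = apriori(baskets, min_support=3)
```

In this sample, the pair {milk, butter} appears in only two baskets, so the
three-item candidate is pruned without ever being counted.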

 

II. THEORY

 

According to paper [1], distributed database systems have improved
communication and data processing, because the data in a distributed database
is spread across different sites of a computer network. This not only speeds
up data access but also gives users local control of their data and makes a
single point of failure much less likely. A distributed database is a database
spread across multiple computers that are connected by data communication
links. One advantage is that, because the data is distributed, network traffic
can be reduced. Moreover, if the company network is temporarily down, the
local database is not affected and continues to work. Because the database is
stored on multiple computers, problems in one branch do not affect the work of
the other branches. However, it becomes more difficult to ensure that the
information and indexes are not altered. In addition, a distributed database
is inefficient when there is heavy interaction between sites.

According to paper [2], the purpose of database security is to protect the
database, and several threats deserve attention. The database manager should
identify the threats and limit user privileges for accessing the database;
checking the authority of every user is a must. A breakdown of hardware,
applications or the network can cause loss of database availability. Every
user who can access the database should be assigned only the privileges
required by his or her position [7], [8]. Acquiring extra privileges is also a
threat to database security: a user who gains administrator access can exploit
the weaknesses of the system. Denial of service, which can corrupt data and
flood the network, also needs attention. Inference and identity theft are
common threats that occur frequently.

Paper [3] shows that a secure database must satisfy several requirements:
substantial integrity, logical integrity, availability when needed, a review
(audit) mechanism, fundamental integrity, controlled access to data, system
validation, and inference protection for sensitive data. The purpose of
defining these requirements is to ensure that data stored in a DBMS is
protected from illegal actions. This protection can be achieved through access
controls, concurrency controls, the two-phase commit procedure and inference
reduction strategies. Access permission is decided against three criteria:
data availability, access adequacy and authenticity assurance. Concurrency
control plays an important role in protecting the integrity of data.
Concurrency problems that can occur include lost updates, unsynchronized
transactions and unrepeatable reads; they can be solved by locking or by
timestamp methods. In a DBMS with multilevel access, users are restricted from
acquiring complete data access, either because of secrecy requirements or in
adherence to the principle of least privilege. Complete data access is
forbidden unless the user has special access privileges. A secure multilevel
database design can prevent users from making inferences. Inference protection
strategies include data suppression, logging every move users make, and
perturbation of data.
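
As an illustration of how locking prevents the lost update problem, the
following is a minimal single-process sketch. The `Account` class and the
`run_deposits` helper are hypothetical names introduced here; a real DBMS
uses its own lock manager rather than an in-memory `threading.Lock`.

```python
import threading


class Account:
    """Hypothetical data item protected by an exclusive lock."""

    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()

    def deposit(self, amount):
        # Holding the lock makes the read-modify-write sequence atomic with
        # respect to other deposits, so no update is overwritten (lost).
        with self._lock:
            current = self.balance
            self.balance = current + amount


def run_deposits(account, n_threads, deposits_per_thread, amount):
    """Apply concurrent deposits from several threads; return the final balance."""

    def worker():
        for _ in range(deposits_per_thread):
            account.deposit(amount)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return account.balance


final_balance = run_deposits(Account(0), n_threads=4,
                             deposits_per_thread=1000, amount=1)
```

Without the lock, two threads could read the same balance and each write back
its own result, so one of the two deposits would be lost.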

According to paper [4], a distributed database is a logical union of all the
sites in which the distribution is hidden from the users. To a user, the whole
database appears local except for possible communication delays between the
sites. A distributed database system (DDBS) is preferred over a
non-distributed, centralized database system for various reasons. Concurrency
control (CC) is one of the central issues of a database system. It allows
users to access a distributed database in a multi-programmed fashion while
preserving the illusion that each user is executing alone on a dedicated
system. It does so by coordinating [9] concurrent accesses to the database in
a multi-user database management system. Several algorithms provide
concurrency control [10], such as two-phase locking, timestamping,
multi-version timestamping and optimistic non-locking mechanisms. Depending on
the system, some methods deliver better concurrency control than others. One
of the problems of distributed query processing is deciding how to execute
each query over the network in the most cost-effective way. Two measures used
in query optimization are response time and throughput: response time is the
time the system takes to respond to a query, and throughput is the average
number of transactions that successfully pass through the system per unit of
time.
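
The two measures can be made concrete with a small sketch. The function names
and the sample numbers below are assumptions for illustration; they are not
taken from paper [4].

```python
def mean_response_time(latencies_s):
    """Average time, in seconds, the system took to answer each query."""
    if not latencies_s:
        raise ValueError("at least one measurement is required")
    return sum(latencies_s) / len(latencies_s)


def throughput(completed_transactions, window_s):
    """Transactions completed successfully per second of observation."""
    if window_s <= 0:
        raise ValueError("observation window must be positive")
    return completed_transactions / window_s


# Hypothetical measurements: three queries observed over a 1.5 s window.
latencies = [0.25, 0.5, 0.75]
avg_latency = mean_response_time(latencies)
tps = throughput(completed_transactions=len(latencies), window_s=1.5)
```

An optimizer that minimizes response time favours plans that answer a single
query quickly, while one that maximizes throughput favours plans that let more
transactions complete in the same period.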

Paper [5] describes how a horizontally distributed database can use
association rules. The system is subdivided into four modules: a user module,
an administrator module, association rules and the Apriori algorithm. The user
module considers two settings. In the first, the data owner and the data miner
are separate parties that must not share the same data; in the second, several
parties share the data. The first setting applies data perturbation to hide
the data and protect it from the data miner. The second setting requires the
mining to protect each party's data from the other parties. The administrator
module lets the administrator view user details based on the users' processing
records. The third module, association rules, is applied to the horizontally
distributed database to identify relations between data items in the form of
if/then statements. The last module, the Apriori algorithm, is used to find
associations between data fragments, especially in databases that contain
transactions.
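
An if/then association rule is usually judged by its support and confidence.
The sketch below shows one common way to compute both; the helper names and
the sample transactions are hypothetical, and the privacy-preserving,
multi-party aspects of the protocol in paper [5] are not modelled.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= frozenset(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Confidence of the rule 'if antecedent then consequent'."""
    base = support(transactions, antecedent)
    if base == 0:
        raise ValueError("antecedent never occurs in the transactions")
    both = support(transactions, frozenset(antecedent) | frozenset(consequent))
    return both / base


# Hypothetical transaction records held at one site.
records = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter", "bread"},
]
rule_bread_milk = confidence(records, {"bread"}, {"milk"})
rule_milk_bread = confidence(records, {"milk"}, {"bread"})
```

Here every basket with milk also contains bread, so the rule "if milk then
bread" has a higher confidence than the reverse rule.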

 

III. LITERATURE REVIEW

 

According to journal paper [1], fragmentation, replication and data allocation
make up the design of a distributed database. According to the research of
Shin and Irani [11], fragmentation is a design method that divides a relation
into two or more partitions. Parallelism is one of its advantages: because a
transaction can be divided into several subqueries that operate on fragments,
the degree of concurrency and parallelism increases. However, because the data
is stored at different sites, overall performance may become slower and
integrity control more difficult. Fragmentation is divided into three types:
horizontal, vertical and hybrid. Coronel and Morris describe data replication
as the storage of copies of data at different sites served by a computer
network [12]. Maintaining the consistency of the data is the main problem in
managing replicated data. Replication has several advantages, including
improved response time, reduced network traffic, and increased reliability and
availability. Data allocation is the process of deciding where to locate the
data; the allocation algorithm takes into account several factors such as
performance and data availability goals [13]. Data allocation strategies are
classified into centralized, partitioned and replicated data allocation.
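
Horizontal and vertical fragmentation can be sketched on an in-memory
relation as follows. The `EMPLOYEES` rows, the column names and the helper
functions are hypothetical; in a real DDBMS each fragment would be stored at a
different site.

```python
# Hypothetical employee relation; rows and column names are illustrative only.
EMPLOYEES = [
    {"id": 1, "name": "Aisha", "branch": "Johor", "salary": 4200},
    {"id": 2, "name": "Ben", "branch": "Penang", "salary": 3900},
    {"id": 3, "name": "Chen", "branch": "Johor", "salary": 5100},
]


def horizontal_fragment(rows, predicate):
    """Horizontal fragmentation: keep whole rows that satisfy a predicate."""
    return [dict(r) for r in rows if predicate(r)]


def vertical_fragment(rows, key, columns):
    """Vertical fragmentation: keep chosen columns plus the key in each row."""
    return [{key: r[key], **{c: r[c] for c in columns}} for r in rows]


def join_on_key(left, right, key):
    """Rebuild rows from two vertical fragments by joining on the shared key."""
    right_by_key = {r[key]: r for r in right}
    return [{**l, **right_by_key[l[key]]} for l in left]


johor = horizontal_fragment(EMPLOYEES, lambda r: r["branch"] == "Johor")
others = horizontal_fragment(EMPLOYEES, lambda r: r["branch"] != "Johor")
public = vertical_fragment(EMPLOYEES, "id", ["name", "branch"])
payroll = vertical_fragment(EMPLOYEES, "id", ["salary"])

# A union restores horizontal fragments; a key join restores vertical ones.
rebuilt_h = sorted(johor + others, key=lambda r: r["id"])
rebuilt_v = join_on_key(public, payroll, "id")
```

Hybrid fragmentation would apply both steps in sequence, for example by
fragmenting `payroll` horizontally by branch.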

According to paper [2], database security is used to protect a database from
unintended or unauthorized use. There are several types of information
security measures [14], [15]. The first is access control, which checks the
authority of a user who uses or accesses the database system. The second is
auditing, which is used to analyze security violations and to collect evidence
of modifications of the database by attackers. The third is authentication,
the process of identifying a user and confirming that identity before the
database system can be used. The fourth is encryption, the transformation of
plaintext into ciphertext, and of ciphertext back into plaintext, using
specific algorithms.

Paper [3] reviews and draws on a number of references. It discusses security
features in terms of the distributed database management system model, of
which two are available: the object-oriented and the relational data model.
Paper [3] studies several factors, including single-level and multilevel
access controls, protection and integrity maintenance. The security of a
database is determined not only by its security features but also by how
effectively and efficiently those features are delivered.

According to journal paper [4], authorization is a method of supplying a
single secured access point, so that users connect to the network once and are
then allowed access to authorized resources. Encryption is the technique of
encoding data so that only authorized users can understand it. A number of
industry-standard encryption algorithms are used to encrypt and decrypt data
on the server; some of the most popular are RSA, DES and PGP. Authentication
is usually realized by password: a user must provide the correct password when
establishing a connection, which prevents unauthorized use of the database.
Multi-level access control means that a user is prevented from having complete
data access. Policies restricting user access to certain parts of the data may
result from secrecy requirements, or from adherence to the principle of least
privilege (a user only has access to relevant information). The work on
distributed security in [16] puts more emphasis on multi-level security in
database systems and proposes approaches based exclusively on distributed data
and centralized control architectures.
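
Password-based authentication, as described above, is commonly implemented by
storing a salted hash rather than the password itself. The sketch below uses
PBKDF2 from the Python standard library; this choice, the iteration count and
the function names are assumptions made here and are not prescribed by paper
[4].

```python
import hashlib
import hmac
import os


def hash_password(password, salt=None, iterations=100_000):
    """Derive a salted hash; the server stores (salt, iterations, digest)."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for each user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                 salt, iterations)
    return salt, iterations, digest


def verify_password(password, salt, iterations, expected_digest):
    """Recompute the hash for a login attempt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                    salt, iterations)
    return hmac.compare_digest(candidate, expected_digest)
```

At connection time the server would call `verify_password` with the stored
salt, iteration count and digest, and open the session only if it returns
`True`.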

According to journal paper [5], Sayad Shujaubuddin Sameer stated that securing
sensitive data by means of association rules is very important for data mining
and other learning techniques, and that privacy will receive more emphasis in
data mining in the future as data mining continues to develop. Rakesh Agrawal
and Ramakrishnan Srikant showed that fast algorithms can be applied to mine
association rules in a distributed database spread over various computers
within a network [17]. Bar-code technology allows retail organizations to
collect and store huge amounts of so-called basket data, to which the rules
are applied. M. Saraaswati and N. Kowsalya addressed privacy-preserving and
secure mining of association rules in distributed databases [5]. The Apriori
algorithm is already used in most parallel and distributed association rule
mining (ARM) algorithms, but applying Apriori directly does not noticeably
improve the performance of distributed ARM.

 

IV. RESULT

 

Paper [1] explains concurrency control and security in distributed databases
in detail. Concurrency control in a distributed database is defined as the
management of concurrent access to the database. Distributed two-phase locking
(2PL) is the most familiar distributed concurrency control protocol. Its main
approach is “read any, write all”, and it is used as the basic concurrency
control protocol [18]. Each transaction in 2PL executes in two phases, a
growing phase and a shrinking phase: the transaction obtains locks in the
growing phase and releases them in the shrinking phase. Lock managers in 2PL
are spread across all sites, and each is responsible for locking the data at
its own site. The distributed optimistic protocol is another concurrency
control protocol; it operates by exchanging certification information.
Security is important in a distributed database, where it prevents information
and data from being modified or misused by other people. The paper presents
four security components: authentication, authorization, encryption and access
control. Moreover, deadlock is identified as the major problem that occurs in
distributed systems. The study finds that the 2PL algorithm combined with a
timestamp mechanism is effective enough for concurrency control in distributed
databases.
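
The growing and shrinking phases can be sketched as follows. This is a
simplified, single-site illustration with exclusive locks only and no
blocking, deadlock handling or timestamps; the class names are hypothetical
and do not come from paper [1].

```python
class TwoPhaseLockingError(Exception):
    """Raised when a transaction requests a lock after it has released one."""


class LockManager:
    """Lock table for one site: maps each data item to its owning transaction."""

    def __init__(self):
        self._owners = {}

    def acquire(self, txn_id, item):
        owner = self._owners.get(item)
        if owner is not None and owner != txn_id:
            return False  # held by another transaction; caller must wait
        self._owners[item] = txn_id
        return True

    def release(self, txn_id, item):
        if self._owners.get(item) == txn_id:
            del self._owners[item]


class Transaction:
    def __init__(self, txn_id, lock_manager):
        self.txn_id = txn_id
        self.lock_manager = lock_manager
        self.held = set()
        self.shrinking = False  # False while the transaction is still growing

    def lock(self, item):
        # Growing phase only: no new lock once any lock has been released.
        if self.shrinking:
            raise TwoPhaseLockingError("lock requested in the shrinking phase")
        if not self.lock_manager.acquire(self.txn_id, item):
            return False
        self.held.add(item)
        return True

    def unlock(self, item):
        # The first release moves the transaction into the shrinking phase.
        self.shrinking = True
        self.lock_manager.release(self.txn_id, item)
        self.held.discard(item)
```

In the distributed protocol described above, one such lock manager would run
at every site, and a write would have to obtain the lock at all sites that
hold a copy of the item.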

According to survey [2], there are five methods for maintaining the security
and privacy of a DDBMS connected through a computer network. First,
access-control-based security is categorized into discretionary access control
(DAC), mandatory access control (MAC) and role-based access control (RBAC)
models. DAC models store access rights in an access control matrix. MAC
protects the database against unauthorized disclosure and illegal
modification. RBAC supports arbitrary, organization-specific security
policies. Second, trust-based security approaches focus on the security of
utility [19]. The purpose of developing a trusted environment is to check the
identity of users. Kerberos, which is based on symmetric-key cryptography, is
used as a lightweight protocol [19], [20]. There is also an agent-based
approach that monitors user actions with a neural network. A node registry and
Service Level Agreements should be implemented to enhance trust in a DDBMS
[21]. The third method is authentication-based security: two-factor
authentication can be extended to three factors to increase security [22]. The
fourth method is cryptography-based: plaintext is encrypted into ciphertext
with the help of a cryptographic algorithm. DNA-based cryptography maintains
data confidentiality and integrity, but the technology needs to advance
further before it reaches maturity. Finally, there are some other security
approaches: the Role Ordering (RO) and CORBA-based authentication security
models have been developed to protect and enhance the security of a DDBMS.
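
Of the access control models listed above, RBAC is the simplest to sketch:
permissions are attached to roles, and users obtain permissions only through
the roles they hold. The role names, users and resources below are
hypothetical examples, not taken from survey [2].

```python
# Hypothetical policy: each role maps to a set of (resource, action) pairs.
ROLE_PERMISSIONS = {
    "clerk": {("orders", "read")},
    "manager": {("orders", "read"), ("orders", "write")},
    "auditor": {("orders", "read"), ("audit_log", "read")},
}

# Hypothetical assignment of users to roles.
USER_ROLES = {
    "ali": {"clerk"},
    "mei": {"manager", "auditor"},
}


def is_allowed(user, resource, action):
    """Grant access if any role held by the user carries the permission."""
    return any(
        (resource, action) in ROLE_PERMISSIONS.get(role, set())
        for role in USER_ROLES.get(user, set())
    )
```

Changing what a position may do then only requires editing the role, not every
user who holds it, which is how RBAC supports organization-specific policies.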

According to paper [3], the comparison between relational and object-oriented
databases is summarized in the table below:

 

Table 1: Comparison between Relational Database and Object-Oriented Database

| Comparison | Relational database (RDBMS) | Object-oriented database (OODBMS) |
|---|---|---|
| Concepts | Extended relational model with object-oriented concepts | Pure object-oriented concepts |
| Data stored | Data is stored in tables made up of rows and columns. Every column in the table has its own name and every row of the table has its own primary key. | Data is stored together with the actions that process or read it. The data is stored in the form of objects. |
| Access controls | Access is based on views. A view is a logical table defined with the SQL VIEW command. | Access is controlled by classifying elements of the database. The basic element of this classification is the object. |
| Integrity | A relational DBMS takes a more global approach [23]. | An object-oriented database executes constraint checking methods on the affected objects to maintain integrity before and after an update. |
| Encapsulation | Absent | Present. Data is encapsulated in the object. Control of access, modification and integrity starts at the object level. |
| Complexity association | Simpler than object-oriented database | More complex than relational database |
| Control of system access and multilevel access | Less difficult, because the roles of client and server are maintained | More difficult, because the roles of client and server are not well defined |

 

According to paper [4], data mining causes critical security problems. A user
who is able to apply data mining tools can pose various queries and infer
sensitive hypotheses; that is, the inference problem arises through data
mining. There are various ways to handle this problem. Given a database and a
particular data mining tool, one can apply the tool to see whether sensitive
information can be deduced from legitimately obtained unclassified
information. If so, there is an inference problem. This approach has some
shortcomings. One is that only a single tool is applied, whereas in reality
the user may have several tools available. Furthermore, it is impossible to
cover all the ways in which the inference problem could occur. Another
solution is to build an inference controller that can detect the motives of
the user and prevent the inference problem from occurring. Such a controller
is positioned between the data mining tool and the data source or database,
which may be managed by a DBMS. Data mining systems are being extended to
function in a distributed environment. Such a system is called a distributed
data mining system and has so far received very little attention. Distributed
object management systems, collaborative computing systems and networks are
examples of technologies that have benefited greatly from progress in
distributed databases. Collaborative computing systems can usually be secured
by building on the extensive work on securing distributed databases. The work
described in [24] is that of a security special interest group within a
management group and relates to the security of distributed object systems.

According to paper [5], association rules should improve the performance of
data distribution in terms of computation time. Testing should be conducted
with a constant data size and a varying number of elements in each data set.
The estimated time should then change steadily, in inverse proportion to the
element size.

V. CONCLUSION

 

Paper [1] presents the design, concurrency control and security of distributed
databases. Security is one of the most important aspects of a distributed
database, because information and data must be handled in a secure environment
that preserves their integrity. Distributed databases are becoming
increasingly popular in computer science. Hence, we need to understand them
and look for solutions to their weaknesses.

According to survey [2], database security is a critical problem for the
system as a whole. The study discusses several possible threats to database
security and different techniques that may help future generations of systems
to increase the security of a DDBMS. Its purpose is to analyze different
security approaches. Given the rapid growth of technology, database security
remains an important field of study.

According to paper [3], database security is a pressing issue in today’s
technological world. The article studies database security issues and how the
database model affects the security of a database system. Object-oriented and
relational databases offer different security protection. The paper concludes
that an RDBMS is the better choice for a distributed application, because the
relational model is more mature and its existing standards are globally
accepted.

According to paper [4], distributed database systems are becoming more widely
used day by day, and many organizations now prefer them. The paper reviews
concurrency control in distributed databases (DDB), the main security
components of a DDB, and a number of security issues, including multi-level
security, in distributed database systems. Yet there is much room for further
research and experimentation on these issues.

According to paper [5], association rules applied to horizontally distributed
databases can improve privacy and efficiency compared with the current leading
protocol. Combining the algorithm with association rules yields a highly
secure multi-party protocol that can be used to compute a subset of data
involving various parties. The module also tests the performance of data held
by different parties. Performance problems at different sites only become
apparent when more than two players are involved.