Q. What do you mean by data an

Q. What do you mean by data and its management? How data are organized?

What is data?
* Data generally refers to fact and figures. It may be the name, object, situation or anything that one is interested to know.
* Data is a raw material to computer for producing information.
* Data when processed to present in a meaningful way, we have information.
* Data are simply values or set of values.
* Data and information are used interchangeably, as information at one level can be used as data at another level.

Data is vital to human being. Every moment one is eager to know something. An organization survives on data. It has to record data on stock at hand, demand for the product, marketing strategy, personnel information and others for effective decision making. Database and database systems have become an essential component of everyday life in modern society.

Data item refers to a single unit of values called elementary items.

For eg. rollno, customernumber,pincode etc.

Data items that are divided into sub-items are called group items.
For eg. 98-C, Main Road,GandhiNagar,Berhampur

It is difficult to find the persons who belong to GandhiNagar from group items. It needs to be divided into elementary items as follows

Room No. Location Street town 98-C Main Road GandhiNagar Berhampur
Data must be managed. Data management consists of the followings

Data capture: Collecting data as and when they originate.
Data classification: Classification of captured data based on the nature and intended usage.
Data storage: the classified data has to be store permanently so as to access it whenever required.
Data arranging : Presenting the stored data in a particular order based on users requirement.
Data retrieval : retrieval of data for further processing / decision making.
Data maintenance :keeping data up-to-date, this involves modify, insertion or deletion of data from storage.
Data verification: process of checking for valid and accurate data before storing.
Data coding: representing data in another way for easy reference. Coding helps in reducing amount of storage.
Data editing: modifying the existing data.
Data transcription: activity where the data is converted from one form into another.
Data transmission: forwarding data and information from one place to another place for further processing via network.

Organization of data

* Collection of data may be organized in hierarchy of fields, records and files.
* Data are organized into the hierarchy of fields, records and files reflects the relationship between attributes, entities and entity set.
* Entity is something that has certain attributes or properties.
* Each attribute may be assigned a value, and the value may be numeric or non-numeric.
* Entity set is a group of entities with similar attributes.
* Field is a single elementary unit of information representing an attribute of an entity.
* A record is the collection of filed values of a given entity.
* A file is the collection of records of the entities in a given entity set.

Q. What is Database? Write its characteristics.

Generally, database is a collection of related data. For effective decision making one must record and maintain data of various facts of its interest. For example an organization records data of its products manufactured, marketed and about customers, college on course, student, library and hospital on patient and doctors.

Properties of Database:

A database has the following properties
* A database represents some aspects of real world.
* A database is collection of logically related data.
* A database is build and maintained for a specific purpose.

Q. What is DBMS? Write its advantage and disadvantages.
What is DBMS?

A database management system is a software system that allows user to create and maintain a database. The objective of DBMS is to facilitate the process of defining, storing and retrieving the information contained in database among various users and applications.

Other important function of DBMS is to exert centralized control of the database including protection against both hardware/software mal-function and secure these from unauthorized users.

Advantages and disadvantages of database Approach

The advantages of DBMS over traditional (maintaining manually) database is as follows:

Redundancy can be reduced: In a non database system, every user group maintains its own files for handling its data processing applications.

Centralized control of data avoids unnecessary duplication of data and effectively reduces the total amount of data storage required.

Data can be shared: The centralized database allows sharing of data among number of application program and users.

Besides, it also allows new applications to operate against the same data.
Inconsistency can be avoided: Duplication of data may leads to inconsistency, as the two entries will not agree that is, when one of the two entries has been updated the other may not. At such times the database is said to be inconsistent.

Inconsistency can be reduced, if the redundancy is removed. Alternatively, if the redundancy is not removed but is controlled, then the DBMS can guarantee that the database is never inconsistent; by ensuring that any changes made to either of the two entries is automatically applied to the other one as well. This process is known as propagating updates.
Transaction support can be provided:
A transaction is a logical unit of work, typically involving several database operations. The transaction atomicity feature guarantees that if two or more operations are part of the same transaction, then the system can effectively operate either both of them or neither, even if in any abnormal situations e.g. the system fails.

Integrity can be maintained: It implies data contained in the database is both accurate and consistent. DBA implements integrity constraint to check to ensure that they fall within a specified range and are of correct format.

Security can be enforced: Data is vital to an organization and it is confidential. Such confidential data must not be accessed by unauthorized user. The DBA has the responsibility to ensure that proper access procedures are followed including proper authentication schemes for access to the DBMS and additional checks before permitting access to sensitive data.

Standards can be enforced: With Centralized database, the DBA can ensure that all applicable standards are observed in the representation of the data. Applicable standards include any or all of the followings: departmental, installation, corporate, industry, national and international. It is desirable as an aid to data interchange, or movement of data between systems.

Conflict resolution: Since database is under the control of DBA, she/he has to choose the best file structure and access method to get optimal performance for the response-critical applications, while permitting less critical applications to continue to use database.


Problems associated with centralization: Centralization means the data is accessible from a single source. This increase the potential severity of security breaches and disruption of the operation of the organization because of downtime and failure.

Cost of software/hardware and training: This includes the cost of purchasing or developing software, upgradation cost of hardware to allow for extensive programs and others.

Complexity of backup, recovery and security: The processing overhead introduced by DBMS to implement security, integrity, and sharing of the data causes a degradation of the response and throughput time. The backup and recovery operations are fairly complex in a DBMS environment.

Q. What is Database System?
A database system is a collection of four major components. These are
1. Data : facts and figures on which database is maintained.
2. Hardware :Consists of processor, memory, secondary storage device,
I/O device etc.
3. Software : a software system that work between user and physical database,
commonly known as DBMS.
4. User.: end users, application programmers and database administrator.

Q. Write a short note on various users of DBMS.

There are number of users of database. They can be classified into following groups, depending on their degree of expertise or mode of interaction with DBMS.

1. Database Designers: responsible for identifying the data to be stored in the database and for choosing appropriate structures to represent and store this data. These tasks are mostly undertaken before the database is actually implemented and populated with data. It is the responsibility of database designers to communicate with all prospective database users in order to understand their requirements, and to come up with design that meets these requirements.

2. Database Administrator: When database is used by many users; there must be a centralized control of the database to administrate the resources. It is exerted by a person or group of persons under the supervision of a high-level administrator. This person or group is referred to as the DBA. They are the users who are most familiar with the database and are responsible for creating, modifying, and maintaining its three levels.

3. Application programmer and system analysts:
The system analysts determine the requirement of the end users, and develop specifications for canned transactions that meet these requirements. Application programmer implements these specifications as programs.
4. End users: peoples whose jobs require access to the database for querying, updating, and generating reports; the database primarily exists for their use. There are several categories of end users.
a. Casual end user: occasionally access to the database, but they may need different information each time.
b. Naïve end user: they form a sizable portion of database end users. Their man job function revolves around constantly querying and updating the database, using standard types of queries and updates -called canned transactions- that have been carefully programmed and tested.
c. Sophisticated end user: include engineers, scientists, business analysts, and others who thoroughly familiarize themselves with the facilities of the DBMS so as to implement their applications to meet their requirement.
d. Stand-alone end user: maintain personal databases by using ready-made program packages that provide easy-to-use menu-based interfaces.
Q. Briefly define the role / task of Database Administrator

A. DBA is a person or group of persons who has centralized control over database. DBA is the custodian of the data and controls the database structure. The DBA is responsible for authorizing access to the database, for coordinating and monitoring its use, and for acquiring software and hardware resources as needed.

Working on three levels architecture: DBA administers the three levels of the database and, in consultation with the overall user community, sets up the definition of the conceptual level of the database. The DBA further specifies the external view of the various users and applications and is responsible for the definition and implementation of the internal level. Mapping between the internal and conceptual level, as well as between conceptual and external levels, are also defined by the DBA.

Defining security and integrity constraints: The DBA is accountable for problems such as breach of security or poor system response time. DBA is also responsible for granting permission to the users of the database and stores the profile of each user in the database. Besides, he/she must implement procedures for integrity constraints.
Defining dump and reload policies: DBA is also responsible for defining procedures to recover the database from failures due to human, natural or hardware causes with minimal loss of data.

Monitoring performance and responding to changing requirements: DBA is responsible for organizing the system in such a way as to get the best performance.

Q. What is the three-level architecture of DBMS? How it has to be mapped?
According to different categories of users, and the abstraction at each level ANSI/SPART developed three-level architecture, commonly known as ANSI/SPARC1 architecture. The goal of this architecture is to separate user applications and the physical database. In this architecture, schemas can be defined at the following three levels.

1. Internal level has an internal schema, which describes the physical storage structure of the database. It is a low-level representation of entire database.
the internal view is described by means of the internal schemas, which not only defines the various stored record types but also specifies what indexes exist, how stored fields are represented, what physical sequence the stored records are in , and so on.
2. Conceptual/Global level has a conceptual schema, which describes the structure of the whole database for a community of users. It hides the details of physical storage structures and concentrates on describing entries, data type, relationship, user operations and constraints. It is representation of the entire information content of the database, in a form that is somewhat abstract in comparison with the way in which the data is physically stored.

3. External /user view/view level includes number of external schema. The external level is individual user level. Each external schema describes the part of the database that a particular user group is interested in and hides the rest of database from that user group. An external view is thus the contents of database as seen by some particular user.

Mapping between views
In addition to the three levels, the architecture involves certain mappings. In DBMS, based on the three schema architecture, each user group refers only to its own external schema. Hence, the DBMS must transform a request specified on the external schema into a request against the conceptual schema, and then into a request on the internal schema for processing over the stored database. If the request is database retrieval, the data extracted from the stored database must be reformatted to match the user’s external view. The process of transforming requests and results between levels are called mappings.

Two mappings are required in a database system with three different views.
1. A mapping between the external and conceptual: defines the correspondence between a particular external view and the conceptual view.
2. A mapping between conceptual and internal: defines the correspondence between the conceptual view and the store database; it specifies how conceptual records and fields are represented at the internal level.

Q. What is Data independence?

In pre-database systems data is physically represented in secondary storage, and the technique used to access it, are both dictated by the requirements of the application under consideration. Applications of such type are data-dependent because it is impossible to change the physical representation or access technique without affecting the application.

Different applications will require different views of the same data. The DBA must have the freedom to change the physical representation or access technique in response to changing requirements, without having to modify existing applications.

Three levels architecture, along with the mappings provide two distinct level of data independence. Data independence can be defined as the capacity to change the schema at one level of a database system without having to change the schema at the next higher level.
1. Logical data independence indicates that the conceptual schema can be changed without affecting the existing external schemas. We may change the conceptual schema to expand the database. It also insulates application programs from operations such as combining two records into one or splitting an existing record into two or more records. Logical data independence is achieved by providing the external level or user view of the database. The application program or users see the database as described by their respective external views.

2. Physical data independence indicates that the physical storage structures or devices used for storing the data could be changed without necessitating a change in the conceptual view or any of the external views. If there is a need to change the file organization or the type of physical device used as a result of growth in the database or new technology, a change is required in the transformation functions between the physical and conceptual levels. This change is necessary to maintain the conceptual level invariant. Altering the physical database organization, however can affect the response and efficiency of existing application programs.
Q. Differentiate between DDL and DML.

A DBMS must provide languages and interfaces for its various users. These are as follows.

Data Definition Language (DDL):
Once the design of a database is completed and a DBMS is chosen to implement the database, the first thing is to specify conceptual and internal schemas for the database and mappings between the two.

DBMS provides a facility known as Data Definition Language, which can be used to define the conceptual scheme and also give some details about how to implement this scheme in the physical devices used to store the data. This definition includes all the entity set and their associate attributes as well as the relationships among the entity sets. The definition also includes any constraints that have to be maintained, including the constraint on the value that can be assigned to a given attribute and the constraint on the values assigned to different attributes in same or different records. These definitions, which can be described as meta-data about the data in the database, are expressed in the DDL of the DBMS and maintained in a compiled form. The compiled form of the definitions is known as data dictionary, directory, or system catalog.

The DBMS maintains the information on the file structure, the method used to efficiently access the relevant data. It also provides a method whereby the application programs indicate their data requirements. The application program could use a subset of the conceptual data definition language or separate language. The database system also contains mapping functions that allows it to interpret the stored data for the application system.

The internal schema is specified in a somewhat similar data definition language called data storage definition language. The definition of the internal view is compiled and maintained by the DBMS. The compiled internal schema specifies the implementation details of the internal database, including the access method employed.

Data Manipulation Language (DML):

Once the database schemas are compiled and the database is populated with data, users must have some means to manipulate the database. Typically manipulations include retrieval, insertion, deletion, and modification of the data. The DBMS provides a set of operations or a language called the data manipulation language.

The language used to manipulate data in the database is called Data Manipulation Language. Data manipulation involves retrieval of data from the database, insertion of new data into the database, and deletion or modification of existing data. The first of these data manipulation operation is called query. A query is a statement in the DML that requests the retrieval of data from the database. The subset of the DML used to pose a query is known as a query language.

The DML provides commands to select and retrieve data from the database. Commands are also provided to insert, update and delete records. They could be used in an interactive mode or embedded in conventional programming languages. The Data Manipulation functions provided by the DBMS can be invoked in application program directly by procedure calls or by preprocessor statements. The latter would be replaced by appropriate procedure calls by either a preprocessor or the compiler.
The DML can be procedural; the user indicates not only what to retrieve but how to go about retrieving it. If the DML is nonprocedural, the user has to indicate only what is to be retrieved.
There are two main types of DMLs.

1. high-level or nonprocedural DML
2. low-level or procedural DML
1. Nonprocedural DML: It can be used on its own to specify complex database operations in a concise manner. Many DBMSs allow high-level DML statements either to be entered interactively either from terminal or to be embedded in a general-purpose programming language. In the latter case, DML statement must be identified within the program so that they can be extracted by a pre-compiler and processed by the DBMS. High-level DMLs, such as SQL, can specify and retrieve many records in a single DML statement and are hence called set-at-a-time or set-oriented DMLs. A query in a high-level DML often specifies which data to retrieve rather than how to retrieve it; hence, such languages are also called declarative.

2. procedural DML: must be embedded in a general purpose-programming language. This type of DML typically retrieves individual records from the database and process each separately. Hence it needs to use programming language construct, such as looping, to retrieve and process each record from a set of records. Low-level DMLs are also called record-at-a-time DMLs.

Whenever DML commands,( high or low level) are embedded in a general purpose programming language, that language is called the host language and the DML is called the data sublanguage.

On the other hand, a high-level DML used in a stand-alone interactive manner is called a query language.

Q. Briefly describe about Data Modeling and its classification?
A. A model is an abstraction process that hides superfluous details while highlighting details pertinent to the applications. A data model is a mechanism that provides this abstraction for database applications. Data modeling is used for representing entities of interest and their relationships in the database. A number of models for data representation have been developed. Most data representation models provide mechanisms to structure data for the entities being modeled and allow a set of operations to be defined as well as constraints to be enforced on them. These models differ in their method of representing the associations amongst entities and attributes.

Data models can be classified as follows:

File-based system or primitive models: Entities or objects of interest are represented by records that are stored together in files. Relationships between objects are represented by using directories of various kinds.

Traditional data models: Traditional data models include hierarchical, network and relational models. The hierarchical model evolved from the file-based system and the network mode is a superset of the hierarchical model. The relational model is based on the mathematical concept of relation.

Semantic data models: this model is influenced by semantic networks developed by artificial intelligence researchers. Semantic network were developed to organize and represent general knowledge. Semantic data models are able to express greater interdependencies among entities of interest. Semantic modeling is known by many names, such as entity/relationship modeling, entity modeling and object modeling.

Entity-relationship model: Earlier commercial systems were based on the hierarchical and network approach. The E-R model is a generalization of these models. E/R diagram constitute a technique for representing the logical structure of a database in a pictorial manner. They provide a simple and readily understood means of communicating the salient features of the design of any given database. In this model object of similar entity are collected into an entity set. The relationship between entity sets is represented by a named E-R relationship and is 1:1,1:M, or M:N, mapping from one entity set to another.

Entity:. An entity is a thing which can be distinctly identified. An entity may be an object with a physical existence such as employee, dog, person etc. or may be an object with a conceptual existence such as college, organization, job, etc. Entities are the basic units used in modeling classes. An entity set is a group of similar object with same set of attributes or properties. Entities can be classified into regular entities (strong entities) and weak entities. A weak entity is an entity that is existence-dependent on some other entity, in the sense that is can not exist if that other entity does not also exist. A regular entity on the other hand is an entity that is not weak.

Each entity type is shown as a rectangle containing the name of the entity type in question, for a weak entity type, the border of the rectangle is doubled.

Attributes: Each entity has attributes that describe it. The properties that characterize an entity set are called its attributes. Attributes are also known as data element, data item, data fields. All entities or relationships of a given type have certain kinds of properties in common. To store data on an entity set, we have to create a model for it by gathering important properties concern to organization. Attributes can be

Simple or composite :A composite attribute contains numbers of simple attributes. For example the employee name contains first name, middle name and last name. The employee name is considered as composite attribute where as first name, middle name and last name are called simple attribute.
Single or multi-valued :Most attribute have a single value for a particular entity; such attributes are called single-valued. For ex. father’s name, age is single valued attribute of an entity. Often some attributes require different values. For ex. Room type in a lodge, qualification of a person etc.
Missing / null value :In some cases a particular entity may not have an applicable value for an attribute. unknown/not applicable
Base or derived :in some cases, one attribute is depend on some other attributes for example DA attribute is dependant on salary. The DA attribute is hence called a derived attribute where as salary is base or stored attribute.

Key: although an entity has number of attributes, still a particular attribute uniquely identifies the entity type. That is an entity type usually has an attribute whose values are distinct for each individual entity in the entity set. Such an attribute is called a key attribute. For example account number in banking is said to be key.

Attributes are shown as ellipses containing the name of the property in question and attached to the relevant entity or relationship by means of a solid line. The ellipse border is dotted or dashed if the property is derived and doubled if it is multi-valued. If the property is composite, its component properties are shown as further ellipses, connected to the ellipse for the composite property in question by means of further solid lines. Key properties are underlined. The value sets corresponding to properties are not shown.
Relationships: Entity can be related to other by means of relation. An association among entities is called relationship. The relationship set is used in data modeling to represent an association between entity sets. The association could have certain properties represented by the attributes of the relationship set. The entities involved in a given relationship are said to be participants in that relationship. The number of participants in a given relationship is called the degree of that relationship.

Each relationship type is shown as a diamond containing the name of the relationship type in question. The diamond border is doubled if the relationship in question is that between a weak entity type and the entity type on which its existence depends. The participants in each relationship are connected to the relevant relationship by means of solid lines; each such line is labeled “1” or “M” to indicate whether the relationship is one-to-one , many-to-one etc. The line is doubled if the participation is total.

Entity subtypes and supertypes: Any given entity is of at least one entity type, but an entity can be of several types simultaneously. For example, if some employees are programmers, then we might say that entity type PROGRAMMER is a subtype of entity type EMPLOYEE, that entity type EMPLOYEE IS super type of entity type the converse is not true. Which does not apply to
Let entity type B is a subtype of entity type A. Then e draw a solid line from the A to B, marked with an arrowhead at the B end. The line denotes what is sometimes called “the isa relationship”.
Association of data
Information is obtained from data by using the context in which the data is obtained and made available. For example in a bank it is account number, which provides details about a customer. An Association between two attributes indicates that the values of the associated attributes are interdependent. This correspondence between attributes of an entity is a property of the information that is used in modeling the object. The association among various sets of data represented in the database is called a relationship.

Relationship types
A relationship type R among n entity types E1,E2,E3,… ,En defines a set of relationship set among entities from these entity types. Mathematically, the relationship set R is a set of relationship instance ri, where each ri associates n individual entities (e1,e2,e3,….,en), and each entity ej in ri is a member of entity type Ej, 1=j =n. Hence, a relationship type is a mathematical relation on E1,E2,E3,….En; alternatively, it can be defined as a subset of the Cartesian product E1 × E2 × E3 × … ,En. Each of the entity types E1,E2,E3,….En is said to participate in the relationship type R; similarly, each of the individual entities e1,e2,e3,….,en is said to participate in the relationship instance ri= (e1,e2,e3,….,en)

Relationship Degree: The degree of a relationship type is the number of participating entity types. Hence, the WORKS_FOR relationship is of degree two. Relationship can generally be of any degree, but the ones most common are binary relationships.

Cardinality ratios for binary relationships: The cardinality ratio for a binary relationship specifies the maximum number of relationship instances that an entity can participate in. For ex. In the WORKS_FOR binary relationship type, department, employee is of cardinality ratio 1:N, meaning that each department can be related to any number of employees, but an employee can be related to only one department.
The possible cardinality ratios for binary relationship types are 1:1, 1:N,N:1, and M:N.

Case study
A company database keeps track of a company’s employees, departments and projects. Suppose that after the requirements collection and analysis phase, the database designers provided the following description to be represented in the database.

1. The company is organized into departments. Each department has a unique name, a unique number, and a particular employee who manages the department. We keep track of the start date when that employee began managing the department. A department may have several locations.
2. A department controls a number of projects, each of which has a unique name, a unique number, and a single location.
3. We store each employee’s name, social security number, address, salary, sex and birth date. An employee is assigned to one department but may work on several projects, which are not necessarily controlled by the same department. We keep track of the number of hours per week that an employee works on each project. We also keep track of the direct supervisor of each employee.

4. We want to keep track of the dependents of each employee for insurance purposes. We keep each dependent’s first name, sex, birth date, and relationship to the employee.

Q. What is relational model? Define the structure of a relational database.

A. A model is an abstraction process that hides superfluous details while highlighting details concerning to the application at hand. There are number of data models like file-based model, traditional data model and semantic data model. The traditional data model includes hierarchical, network and relational data model.

Among these relational data model is the fundamental on which the modern database system technology based. After more than a decade, it has emerged from the research as a commercial product and was introduced in 1970 by Ted Codd of IBM. Relational data model is conceptually simple and based on sound theoretical principles.

Relational data model is based on the mathematically concept of relation. In this model, the relation is the only construct required to represent the associations among the attributes of an entity as well as the relationships among different entities. A relation may be visualized as a named table.

Formal relational term Informal equivalents Relation Table Tuple Row or records Cardinality Number of rows Attribute Column or field Degree Number of columns Primary key Unique identifier Domain Pool of legal values The following structure shows a relation EMPLOYEE. Relational model concepts

The relational model represents the database as a collection of relations. It consists of three basic components:
* A set of domains and a set of relations
* Operations on relations
* Integrity rules

A relation is thought of as a table of values, each row in the table represents a collection of related data values. The table name and column names are used to help in interpreting the meaning of the values in each row. All values in a column are of the same data type.

From relational model point of view, a row is called a tuple, a column header is called an attribute, and the table is called a relation. The data type describing the types of values that can appear in each column is represented by a domain of possible values.

Domains, attributes, tuples and relations.

Domain: An object or entity is characterized by its attributes. In relational database systems, attributes correspond to fields. For a given application, an attribute may only be allowed to take a value from a set of the permissible values. This set of allowable values for the attribute is the domain of the attribute. Domain Di, is a set of values of the same data type. A common method of specifying a domain is to specify a data type from which the data values forming the domain are drawn. For example

Employee_Age : each must be a value between 18 to 60.

Employee_name : set of character stings that represent name of persons
Phone_number : set of seven digit numbers valid within a particular area.

A domain, like a data type, may be unstructured (atomic) or structured. Domain Di is said to be simple if all its elements are non-decomposable. Atomic domains are general sets, such as the set of integers, real numbers, character strings, and so on. The domain for the attribute address, for instance, which specifies street number, street name, city, state and zip or postal code is considered a composite domain.

Attribute: Attributes are defined on some underlying domain. That is they can assume values from the set of values in the domain. Attributes defined on the same domain are comparable, as these attributes draw their values form the same set.
Relations :A relation schema R, denoted by R(A1,A2…..An) is made up of a relation name R and a list of attributes A1,A2….An. Each attribute Ai is the name of a role played by some domain D in the relation scheme R. D is called the domain of Ai and is denoted by dom(Ai). A relation schema is used to describe a relation ; R is called the name of this relation.

Properties of relations
Relations possess certain properties, all of them immediate consequence of the definition of relation.
1. There are no duplicate tuples: This property follows from the fact that the body of the relation is a mathematical set; sets in mathematics do not include duplicate elements.
2. Tuples are unordered, top to bottom: this property also follows from the fact that the body of the relation is a mathematical set; sets in mathematics are not ordered.
3. Attributes are unordered, left to right: This property follows from the fact that the heading of a relation is also a set. Thus there is no such thing as “the first attribute” or “the second attribute” because attribute are referenced by name not by position.
4. Each tuple contains exactly one value for each attribute. : this property follows immediately from the definition of a tuple- a tuple is a set of n components or ordered pairs of the form ai:vi(i=1,2,3…n)
Product_name Manufacturer Sprinz Cavinkare Pvt. Ltd. Ariel Procter & Gamble Clinic Hindustan Lever Ltd. Ponds Hindustan Lever Ltd. Product_name Item Price Sprinz Deodorant 15.00 Sprinz Soap 25.00 Ariel Detergent power 70.00 Ariel Soap 10.00 Sprinz Talcum power 27.00 Clinic Clinic Plus sampoo 2.00 Clinic Clinic Coconut Oil 3.00 Ponds Talcum power 25.00

Product_name Manufacturer Product_name Item Price Sprinz Cavinkare Pvt. Ltd. Sprinz Deodorant 15.00 Sprinz Cavinkare Pvt. Ltd. Sprinz Soap 25.00 Ariel Procter & Gamble Ariel Detergent power 70.00 Ariel Procter & Gamble Ariel Soap 10.00 Sprinz Cavinkare Pvt. Ltd. Sprinz Talcum power 27.00 Clinic Hindustan Lever Ltd. Clinic Clinic Plus sampoo 2.00 Clinic Hindustan Lever Ltd. Clinic Clinic Coconut Oil 3.00 Ponds Hindustan Lever Ltd. Ponds Talcum power 25.00

The number of tuples in the join of MANUFACTURER and PRODUCT is the same as those in PRODUCT because a tuple in MANUFACTURER has the same value of the product_name attribute as a tuple in PRODUCT.

Set theory
A set is well-defined collection of objects. It is commonly represented by a list of its elements or by the specification of some membership condition. The intension of a set defines the permissible occurrences by specifying a membership condition. The extension of the set specifies on of the numerous possible occurrence by explicitly listing the set members. For example

Intension of set G: {g|g is an odd positive integer less than 10}
Extension of set G: {1,3,5,7,9}
A set is determined by its members. The number 3 is a member of the set G and this is denoted by . Given an object g and the set G exactly one of the statement “g is a member of G”or “g is not a member of G” is true.

Operations on sets include the union, intersection, Cartesian product, and difference operations. The union of two sets G and H () is the set that contains all elements belonging either to set G or H. If sets G and H have any elements in common, the union will not duplicate those members. The intersection of the set G and H ( ) is the set composed of all elements belonging to both G and H. If G and H are two sets, then G is included in H, ( ), if and only if each member of G is also a member of H. Should there be an element h such that but , then G is a proper subset of H ()


G= {Ariel}
H= {Ponds, Sprinz, Ariel}
={Ponds, Sprinz, Ariel}
= {Ariel}

Note that and and in the above example

The Cartesian product of two sets G and H ( ) is defined in terms of ordered pairs or 2- tuples. An ordered pair is conventionally denoted by enclosing it in parentheses (g,h). The product is the set consisting of all ordered pairs(g,h) for which and .
G= {Ariel, Sprinz}
H= {Soap, Deodorant }
={( Soap, Ariel),(Deodorant, Ariel),( soap, Sprinz),( deodorant, sprinz)}

Note that the individual n-tuples in the Cartesian product are ordered. Therefore, and are entirely different sets.

The difference of two sets G and H (G-H) is the set that contains all elements that are members of G but not of H.
G= {Ariel}
H= {Ponds, Sprinz, Ariel}
Then the sets G-H and H-g are
G-H = ( the null set)
H-G = {Ponds, Sprinz}

In set theory, relations between sets can be of many kinds, such as a subset of ( ) , complement of (), and so on. Pairing relations can also be defined in terms of some specific criterion. If G and H are sets of objects, and , then the possible pairing relations of degree 2 are:

(g,g) (g,h) (h,g) (h,h)

Each is a relation. We can see that a pairing relation must be a subset of the Cartesian product of the sets involved in the relationship. In the four relationships above, these Cartesian products are and and respectively.

Base Tables

SQL tables are allowed to include duplicate rows. They therefore do not necessarily have a primary key.
SQL tables are considered to have a left-to-right column ordering.

Base tables are defined by means of the CREATE TABLE statement.

CREATE TBALE <base table name> (<base table element commalist>);

Where each <base table element> is either a <column definition> or a <constraint>. The <constraint>s specify certain integrity constraints that apply to the base table in question.

A key is a single attribute or combination of two or more attribute of an entity set that is used to identify one or more instances of the set.

An attribute which is unique and will identify an instance of the entity set is known as
primary key.

If we add additional attribute to a primary key, the resulting combination would still uniquely identify an instance of the entity set. Such augmented keys are called super keys; a primary key is , therefore a minimum super key.

There may be two or more attributes or combinations of attributes that uniquely identify an instance of an entity set. These attributes or combinations of attributes are called candidate keys. In such case we must decide which of the candidate keys will be used as the primary key. The remaining candidate keys would be considered alternate keys.

A secondary key is an attribute or combination of attributes that may not be a candidate key but that classifies the entity set on a particular characteristic. For example department in employee entity is a secondary key.

Relational Algebra
Relational algebra is a collection of operations to manipulate relations. Relational algebra is a procedural language. Furthermore, it defines the complete scheme for each of the result relations. The relational algebraic operations can be divided into basic set-oriented operations and relational-oriented operations. The former are the traditional set operations, the latter, those for performing joins, selection, projection and division.

Basic operations
Basic operations are the traditional set operations: union, difference, intersection and Cartesian product. Three of these four basic operations -union ,intersection, and difference -require that operand relations be union compatible. Two relations are union compatible if they have the same arity and one-to-one correspondence of the attributes with the corresponding attribute defined over the same domain. Two relations P(P) and Q(Q) are said to be union compatible if both P and Q are of the same degree n and the domains of the corresponding n attributes re identical.
If P= {P1,P2……Pn} and Q= {Q1,Q2….Qn} then

Dom(Pi)= Dom (Qi) for I = {1,2……n}
Id Name 1 Satya 2 Syam 4 Nirmal 5 Ram 7 hari Id Name 2 Syam 5 Ram 8 madhu

If we assume that P(P) and Q(Q) are two union-compatible relations, then the union of P(P) and Q(Q) is the set-theoretic union of P(P) and Q(Q). the resultant relation, R= , has tuples drawn from P and Q such that
max( |P| ,|Q|) = |R| = |P|+|Q|

Id Name 1 Satya 2 Syam 4 Nirmal 5 Ram 7 Hari 8 Madhu The result relation R contains tuples that are in either P or Q or in both of them. The duplicate tuples are eliminated.
The degree of the relation P(P) and Q(Q), and R (R ) is the same. The cardinality of the resultant relation depends on the duplication of tuples in P and Q. From the above expression, we can see that if all the tuples in Q were contained in P, then R = P and |R| = |P| , while if the tuples in P and Q were disjoint, the |R|= |P| + |Q|.

The difference operation removes common tuples from the first relation.
Id Name 1 Satya 4 Nirmal 7 Hari R = P- Q such that
R= and
The intersection operation selects the common tuples from the two relations.

Id Name 2 Syam 5 ram R= where
R= and
0=|R| = min( |P|,|Q|)
The intersection operation is really unnecessary. It can be expressed as
= P-(P-Q)

Cartesian product
The Cartesian product of two relations is the concatenation of tuples belonging to the two relations. A new resultant relation scheme is created consisting of all combination of the tuples.
Where a tuple is given by , i.e. the result relation is obtained by concatenating each tuple in relation P with each tuple in relation Q. Here, || represents the concatenation operation.
The scheme of the result relation is given by:
R = P || Q
The degree of the result relation is given by:
|R| = |P|+|Q|
The cardinality of the result relation is given by:
|R| = |P|*|Q|

Id Name Designation 1 Satya Manager 1 Satya Executive 2 Syam Manager 2 Syam Executive 4 Nirmal Manager 4 Nirmal Executive
Id Name 1 Satya 2 Syam 4 Nirmal Designation Manager executive
The union and intersection operations are associative and commutative; therefore, given relations R( R ) and S(S), T(T) :

The difference operation, in general, is non-commutative and non-associative.
R-S ? S-R
R-(S-T) ? (R-S)-T
Additional relational algebraic operations

The basic set operations, which provide a very limited data manipulation facility, have been supplemented by the definition of the following operations: Projection, selection, join and division.
These operations are represented by the symbol respectively. Projection and selection are unary operations; join and division are binary.

The projection of a relation is defined as a projection of all its tuples over some set of attributes. i.e. it yields a “vertical subset” of the relation. The projection operation is used to either reduce the number of attributes in the resultant relation or to reorder attributes. In the first case the arity (degree) of the relation is reduced. When the number of attributes in the relation is reduced, the cardinality may also be reduced; this is due to the deletion of duplicate tuples in the projected relation.

1 American National Standards Institute/Standards Planning and Requirements Committee