How to model an entity type that can have different sets of attributes? - Databases

TopAnswers Databases
Top Answer
MDCCL (imported from SE)
According to your description of the business environment under consideration, there exists a *supertype-subtype* structure that encompasses *Item* —the supertype— and each of its *Categories*, i.e., *Car*, *Boat* and *Plane* (along with two more that were not made known) —the subtypes—.

I will detail below the method I would employ to manage said scenario.

# Business rules #

In order to start delineating the relevant *conceptual* schema, some of the most important *business rules* determined so far (restricting the analysis to the three disclosed *Categories* only, to keep things as brief as possible) can be formulated as follows:

- A *User* owns zero-one-or-many *Items*.
- An *Item* is owned by exactly-one *User* at a specific instant.
- An *Item* may be owned by one-to-many *Users* at distinct points in time.
- An *Item* is classified by exactly-one *Category*.
- An *Item* is, at all times,
  - either a *Car*
  - or a *Boat*
  - or a *Plane*.

# Illustrative IDEF1X diagram #

[Figure 1] displays an IDEF1X^1^ diagram that I created to group the previous formulations along with other business rules that appear pertinent:

![Figure 1 - Item and Categories Supertype-Subtype Structure][Figure 1]

**Supertype**

On the one hand, *Item*, the supertype, presents the properties^†^ or attributes that are common to all the *Categories*, i.e.,

- *CategoryCode* —specified as a FOREIGN KEY (FK) that references *Category.CategoryCode* and functions as a subtype *discriminator*, i.e., it indicates the exact *Category* of subtype with which a given *Item* must be connected—, 
- *OwnerId* —distinguished as a FK that points to *User.UserId*, but I assigned it a role name^2^ in order to reflect its special implications more accurately—, 
- *Foo*, 
- *Bar*, 
- *Baz* and 
- *CreatedDateTime*.

**Subtypes**

On the other hand, the properties^‡^ that pertain to every particular *Category*, i.e.,

- *Qux* and *Corge*;
- *Grault*, *Garply* and *Plugh*; 
- *Xyzzy*, *Thud*, *Wibble* and *Flob*;

are shown in the corresponding subtype box.

**Identifiers**

Then, the *Item.ItemId* PRIMARY KEY (PK) has migrated^3^ to the subtypes with different role names, i.e., 

- *CarId*, 
- *BoatId* and 
- *PlaneId*.

**Mutually exclusive associations**

As depicted, there is an association or relationship with a *one-to-one* (1:1) cardinality ratio between (a) each supertype occurrence and (b) its complementary subtype instance.

The *exclusive subtype* symbol portrays the fact that the subtypes are mutually exclusive, i.e., a concrete *Item* occurrence can be supplemented by a single subtype instance only: either one *Car*, or one *Plane*, or one *Boat* (never by zero or less, nor by two or more).

^†,\ ‡^ *I employed classic placeholder names to entitle some of the entity type properties, as their actual denominations were not supplied in the question.*

# Expository logical-level layout #

Consequently, in order to discuss an expository logical design, I derived the following SQL-DDL statements based on the IDEF1X diagram displayed and described above:

    -- You should determine which are the most fitting 
    -- data types and sizes for all your table columns 
    -- depending on your business context characteristics.

    -- Also, you should make accurate tests to define the 
    -- most convenient INDEX strategies based on the exact 
    -- data manipulation tendencies of your business context.

    -- As one would expect, you are free to utilize 
    -- your preferred (or required) naming conventions. 

    CREATE TABLE UserProfile (
        UserId          INT      NOT NULL,
        FirstName       CHAR(30) NOT NULL,
        LastName        CHAR(30) NOT NULL,
        BirthDate       DATE     NOT NULL,
        GenderCode      CHAR(3)  NOT NULL,
        Username        CHAR(20) NOT NULL,
        CreatedDateTime DATETIME NOT NULL,
        --
        CONSTRAINT UserProfile_PK  PRIMARY KEY (UserId),
        CONSTRAINT UserProfile_AK1 UNIQUE ( -- Composite ALTERNATE KEY.
            FirstName,
            LastName,
            GenderCode,
            BirthDate
        ),
        CONSTRAINT UserProfile_AK2 UNIQUE (Username) -- ALTERNATE KEY.
    );

    CREATE TABLE Category (
        CategoryCode     CHAR(1)  NOT NULL, -- Meant to contain meaningful, short and stable values, e.g.; 'C' for 'Car'; 'B' for 'Boat'; 'P' for 'Plane'.
        Name             CHAR(30) NOT NULL,
        --
        CONSTRAINT Category_PK PRIMARY KEY (CategoryCode),
        CONSTRAINT Category_AK UNIQUE      (Name) -- ALTERNATE KEY.
    );

    CREATE TABLE Item ( -- Stands for the supertype.
        ItemId           INT      NOT NULL,
        OwnerId          INT      NOT NULL,
        CategoryCode     CHAR(1)  NOT NULL, -- Denotes the subtype discriminator.
        Foo              CHAR(30) NOT NULL,
        Bar              CHAR(40) NOT NULL,
        Baz              CHAR(55) NOT NULL,  
        CreatedDateTime  DATETIME NOT NULL,
        --
        CONSTRAINT Item_PK             PRIMARY KEY (ItemId),
        CONSTRAINT Item_to_Category_FK FOREIGN KEY (CategoryCode)
            REFERENCES Category    (CategoryCode),
        CONSTRAINT Item_to_User_FK     FOREIGN KEY (OwnerId)
            REFERENCES UserProfile (UserId)  
    );

    CREATE TABLE Car ( -- Represents one of the subtypes.
        CarId INT          NOT NULL, -- Must be constrained as (a) the PRIMARY KEY and (b) a FOREIGN KEY.
        Qux   DATE         NOT NULL,
        Corge DECIMAL(5,2) NOT NULL,   
        --
        CONSTRAINT Car_PK         PRIMARY KEY (CarId),
        CONSTRAINT Car_to_Item_FK FOREIGN KEY (CarId)
            REFERENCES Item (ItemId),
        CONSTRAINT ValidQux_CK    CHECK       (Qux >= '1990-01-01')   
    );

    CREATE TABLE Boat ( -- Stands for one of the subtypes.
        BoatId INT      NOT NULL, -- Must be constrained as (a) the PRIMARY KEY and (b) a FOREIGN KEY.
        Grault SMALLINT NOT NULL,
        Garply DATETIME NOT NULL,   
        Plugh  CHAR(63) NOT NULL, 
        --
        CONSTRAINT Boat_PK         PRIMARY KEY (BoatId),
        CONSTRAINT Boat_to_Item_FK FOREIGN KEY (BoatId)
            REFERENCES Item (ItemId),
        CONSTRAINT ValidGrault_CK  CHECK       (Grault <= 10000)  
    );

    CREATE TABLE Plane ( -- Denotes one of the subtypes.
        PlaneId INT      NOT NULL, -- Must be constrained as (a) the PRIMARY KEY and (b) a FOREIGN KEY.
        Xyzzy   BIGINT   NOT NULL,
        Thud    TEXT     NOT NULL,  
        Wibble  CHAR(20) NOT NULL, 
        Flob    BIT(1)   NOT NULL,   
        --
        CONSTRAINT Plane_PK         PRIMARY KEY (PlaneId),
        CONSTRAINT Plane_to_Item_PK FOREIGN KEY (PlaneId)
            REFERENCES Item (ItemId),
        CONSTRAINT ValidXyzzy_CK    CHECK       (Xyzzy <= 3258594758)
    );

*This has been tested in [this db<>fiddle](https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=06ab489b55c5f40c1eb5d45f124bcf9f) running on MySQL 8.0.*

As demonstrated, the superentity type and each of the subentity types are represented by the corresponding *base* table.

The columns `CarId`, `BoatId` and `PlaneId`, constrained as the PKs of the appropriate tables, help in representing the conceptual-level one-to-one association by way of FK constraints^§^ that point to the `ItemId` column, which is constrained as the PK of the `Item` table. This signifies that, in an actual “pair”, both the supertype and the subtype rows are identified by the same PK value; thus, it is more than opportune to mention that

- (a) attaching an *extra* column to hold system-controlled surrogate values^‖^ to (b) the tables standing for the subtypes is (c) *entirely superfluous*.

^§^ In order to prevent problems and errors concerning (particularly FOREIGN) KEY constraint definitions —situation you referred to in comments—, it is very important to take into account the *existence-dependency* that takes place among the different tables at hand, as exemplified in the declaration order of the tables in the expository DDL structure, which I supplied in [this db<>fiddle](https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=06ab489b55c5f40c1eb5d45f124bcf9f) too.

^‖^ E.g., appending an additional column with the [AUTO_INCREMENT](https://dev.mysql.com/doc/refman/5.7/en/example-auto-increment.html) property to a table of a database built on MySQL.

**Integrity and consistency considerations**

It is critical to point out that, in your business environment, you have to (1) ensure that each “supertype” row is at all times complemented by its corresponding “subtype” counterpart, and, in turn, (2) guarantee that said “subtype” row is compatible with the value contained in the “discriminator” column of the “supertype” row. 

It would be very elegant to enforce such circumstances in a *declarative* manner but, unfortunately, none of the major SQL platforms has provided the proper mechanisms to do so, as far as I know. Therefore, resorting to procedural code within [ACID TRANSACTIONS](http://amturing.acm.org/info/gray_3649936.cfm#add_1) it is quite convenient so that these conditions are always met in your database. Other option would be employing TRIGGERS, but they tend to make things untidy, so to speak.

# Declaring useful *views* #

Having a logical design like the one explained above, it would be very practical to create one or more views, i.e., *derived* tables that comprise columns that belong to two or more of the relevant *base* tables. In this way, you can, e.g., SELECT directly FROM those views without having to write all the JOINs every time you have to retrieve “combined” information.

**Sample data**

In this respect, let us say that the base tables are “populated” with the sample data shown below:

    --
    INSERT INTO UserProfile 
        (UserId, FirstName, LastName, BirthDate, GenderCode, Username, CreatedDateTime)
    VALUES
        (1, 'Edgar', 'Codd', '1923-08-19', 'M', 'ted.codd', CURDATE()),
        (2, 'Michelangelo', 'Buonarroti', '1475-03-06', 'M', 'michelangelo', CURDATE()),
        (3, 'Diego', 'Velázquez', '1599-06-06', 'M', 'd.velazquez', CURDATE());

    INSERT INTO Category 
        (CategoryCode, Name)
    VALUES
        ('C', 'Car'), ('B', 'Boat'), ('P', 'Plane');

    -- 1. ‘Full’ Car INSERTion

    -- 1.1 
    INSERT INTO Item
        (ItemId, OwnerId, CategoryCode, Foo, Bar, Baz, CreatedDateTime)
    VALUES
        (1, 1, 'C', 'Motorway', 'Tire', 'Chauffeur', CURDATE());
     
     -- 1.2
    INSERT INTO Car
        (CarId, Qux, Corge)
    VALUES
        (1, '1999-06-11',  999.99);

    -- 2. ‘Full’ Boat INSERTion

    -- 2.1
    INSERT INTO Item
        (ItemId, OwnerId, CategoryCode, Foo, Bar, Baz, CreatedDateTime)
    VALUES
        (2, 2, 'B', 'Ocean', 'Anchor', 'Sailor', CURDATE());

    -- 2.2
    INSERT INTO Boat
        (BoatId, Grault, Garply, Plugh)
    VALUES
        (2, 10000, '2016-03-09 07:32:04.000', 'So far so good.');

    -- 3 ‘Full’ Plane INSERTion

    -- 3.1   
    INSERT INTO Item
        (ItemId, OwnerId, CategoryCode, Foo, Bar, Baz, CreatedDateTime)
    VALUES
        (3, 3, 'P', 'Sky', 'Wing', 'Aviator', CURDATE());

    -- 3.2

    INSERT INTO Plane
        (PlaneId, Xyzzy, Thud, Wibble, Flob)
    VALUES
        (3, 3258594758, 'Lorem ipsum dolor sit amet, consectetur adipiscing elit. Ut sollicitudin pharetra sem id elementum. Sed tempor hendrerit orci. Ut scelerisque pretium diam, eu sodales ante sagittis ut. Phasellus id nunc commodo, sagittis urna vitae, auctor ex. Duis elit tellus, pharetra sed ipsum sit amet, bibendum dapibus mauris. Morbi condimentum laoreet justo, quis auctor leo rutrum eu. Sed id nibh non leo sodales pulvinar. Nam ornare ipsum nunc, eget molestie nulla ultrices vel. Curabitur fermentum nisl quis lorem aliquam pretium aliquam at mauris. In vestibulum, tellus et pharetra sollicitudin, mi lacus consectetur dolor, id volutpat nulla eros a mauris. ', 'Here we go!', TRUE);

    --

Then, an advantageous view is one that gathers columns from `Item`, `Car` and `UserProfile`:

    --

    CREATE VIEW CarAndOwner AS
        SELECT C.CarId,
               I.Foo,
               I.Bar,
               I.Baz,
               C.Qux,
               C.Corge,           
               U.FirstName AS OwnerFirstName,
               U.LastName  AS OwnerLastName
            FROM Item I
            JOIN Car C
              ON C.CarId = I.ItemId
            JOIN UserProfile U
              ON U.UserId = I.OwnerId;

    --

Naturally, a similar approach can be followed so that you can as well SELECT the “full” `Boat` and `Plane` information straight FROM *one single table* (a derived one, in these cases).

After that —if you do not mind about the presence of NULL marks in result sets— with the following VIEW definition, you can, e.g., “collect” columns from the tables `Item`, `Car`, `Boat`, `Plane` and `UserProfile`:

    --

    CREATE VIEW FullItemAndOwner AS
        SELECT I.ItemId,
               I.Foo, -- Common to all Categories.
               I.Bar, -- Common to all Categories.
               I.Baz, -- Common to all Categories.
              IC.Name      AS Category,
               C.Qux,    -- Applies to Cars only.
               C.Corge,  -- Applies to Cars only.
               --
               B.Grault, -- Applies to Boats only.
               B.Garply, -- Applies to Boats only.
               B.Plugh,  -- Applies to Boats only.
               --
               P.Xyzzy,  -- Applies to Planes only.
               P.Thud,   -- Applies to Planes only.
               P.Wibble, -- Applies to Planes only.
               P.Flob,   -- Applies to Planes only.
               U.FirstName AS OwnerFirstName,
               U.LastName  AS OwnerLastName
            FROM Item I
            JOIN Category IC
              ON I.CategoryCode = IC.CategoryCode
       LEFT JOIN Car C
              ON C.CarId = I.ItemId
       LEFT JOIN Boat B
              ON B.BoatId = I.ItemId
       LEFT JOIN Plane P
              ON P.PlaneId = I.ItemId               
            JOIN UserProfile U
              ON U.UserId = I.OwnerId;

    --

The code of the views here shown is only illustrative. Of course, doing some testing exercises and modifications might help to accelerate the (physical) execution of the queries at hand. In addition, you might need to remove or add columns to said views as the business needs dictate.

The sample data and all the view definitions are incorporated into [this db<>fiddle](https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=06ab489b55c5f40c1eb5d45f124bcf9f) so that they can be observed “in action”.

**Data manipulation: Application program(s) code and column aliases**

The usage of application program(s) code (if that is what you mean by “server-side specific code”) and column aliases are other significant points that you brought up in the next comments:

> - I did manage to workaround [a JOIN] issue with server-side specific code, but I really don't want to do that -AND- adding aliases to all columns might be "stressing".

> - Very well explained, thank you very much. However, as I suspected, I'll have to manipulate the result set when listing all data because of the similarities with some columns, since I don't want to use several aliases to keep the statement cleaner.

It is opportune to indicate that while using application program code is a very fitting resource to handle the presentation or graphical features —i.e., the external level of representation of a computerized information system— of data sets, it is paramount that you avoid carrying out data retrieval on a row-by-row basis to prevent execution speed issues. The objective should be to “fetch” the pertinent data sets in toto by means of the robust data manipulation instruments provided by the (precisely) set engine of the SQL platform so that you can optimize the behaviour of your system.

Furthermore, utilizing aliases to rename one or more columns within a certain scope may appear stressing but, personally, I see such resource as a very powerful tool that helps to (i) contextualize and (ii) disambiguate the *meaning* and *intention* ascribed to the concerning columns; hence, this is an aspect that should be thoroughly pondered with respect to the manipulation of the data of interest.

# Similar scenarios #

You might as well find of help [this series of posts](https://dba.stackexchange.com/a/102069/63644) and [this group of posts](https://dba.stackexchange.com/a/107969/63644) which contain my take on two other cases that include supertype-subtype associations with mutually exclusive subtypes.

I have also proposed a solution for a business environment involving a supertype-subtype cluster where the subtypes are *not* mutually exclusive in [this (newer) answer](https://dba.stackexchange.com/a/183129/63644).

---

# Endnotes #

^1^ *Integration Definition for Information Modeling* ([IDEF1X](http://www.idef.com/idef1x-data-modeling-method/)) is a highly recommendable data modeling technique that was established as a *standard* in December 1993 by the U.S. *National Institute of Standards and Technology* (NIST). It is solidly based on (a) some of the theoretical works authored by the *sole originator* of the [relational model](http://dl.acm.org/citation.cfm?id=362685), i.e., [Dr. E. F. Codd](http://amturing.acm.org/award_winners/codd_1000892.cfm); on (b) the [entity-relationship view](http://dl.acm.org/citation.cfm?id=320440), developed by [Dr. P. P. Chen](http://awards.acm.org/award_winners/chen_6196475.cfm); and also on (c) the Logical Database Design Technique, created by Robert G. Brown.

^2^ In IDEF1X, a *role name* is a distinctive label assigned to a FK property (or attribute) in order to express the meaning that it holds within the scope of its respective entity type.

^3^ The IDEF1X standard defines *key migration* as “The modeling process of placing the primary key of a parent or generic entity in its child or category entity as a foreign key”.

  [Figure 1]: https://i.stack.imgur.com/YihR1.png
1 Answer