add tag
Arthur Tarasov (imported from SE)
Say, a table `car` has one-to-one relationship to tables `electric_car`, `gas_car`, and `hybrid_car`. If a `car` is `electric_car`, it can no longer appear in `gas_car` or a `hybrid_car`, etc. 

Is there anything wrong with such design? Some problems that may occur down the road?
Top Answer
Walter Mitty (imported from SE)
The different types of cars are an instance of a general problem that surfaces over and over again in data modeling.  It is called "generalization/specialization" in ER modeling, and "superclass/subclass" in object modeling.  

An object modeler uses the inheritance features built into the object model to solve the problem quite easily.  The subclasses simply extend the superclass.

The relational modeler is faced with a problem.  how to design the tables so as to emulate the benefits that one would get from inheritance?  

The simplest technique is called [single table inheritance][1].  Data about all types of cars are grouped into a single table for cars.  There is a column, car_type, that groups together all the cars of a single type.  No car can belong to more than one type.  If a column is irrelevant to, say, electric cars,  it will be left [NULL][2] in the rows that pertain to electric cars.

This simple solution works well for the smaller and simpler cases.  The presence of a lot of NULLs adds a tiny bit to storage overhead, and a little bit to retrieval overhead.  The developer may have to learn [SQL three-valued logic][3] if boolean tests are done on nullable columns.  This can be baffling at first, but one gets used to it.

There is another technique, called [class table inheritance][4].  In this design, there are separate tables for gas_car, electric_car, and hybrid_car, in addition to a combined table, car, for all of them.  When you want all of the data about a specific kind of car, you join the car table with the appropriate specialized table.  There are fewer NULLs in this design, but you do more joining.  This technique works better in the larger and more complex cases.

There is a third technique called shared primary key.  This technique is often used in conjunction with class table inheritance.  The specialized tables for the subclasses have, as their primary key, a copy of the primary key of the corresponding entry in the car table.  This id column can  be declared to be both the primary key and a foreign key.

This involves a little extra programming when new cars are to be added, but it makes the joins simple, easy, and fast.

Superclasses and subclasses happen all the time in the real world.  Don't be afraid.  But do test your initial design for performance. If your first attempt is simple and sound, you'll be able to tweak it to speed it up.

Answer #2
Joel Brown (imported from SE)
There is nothing wrong with having as many entity sub-types in your model as are needed to reflect the reality of the data that you're trying to model.  The question isn't whether sub-types are a bad practice.  The issue _may_ be is it a **good model**?

For example, under your example, what do you do with something like an Audi A4 eTron - which is a plug-in hybrid?  Is that an "electric car" or is it a "hybrid car"?

The other question you have to ask yourself is why you're sub-typing at all?  How many distinct predicates do you have in your sub-types?  Are any of these predicates shared between sub-types?  The situation could get complicated.

Sub-typing isn't used in database design for classification.  You can do classification with codes, foreign keys to code tables, or with flags.  Sub-typing is used to model distinct predicate sets for different types of a thing of interest.  If you're using sub-types merely for classification then that's a bad practice.

If your sub-types clearly and unambiguously model different predicate sets for the things your database cares about, then it's a perfectly good practice, regardless of how many sub-types you need.

This room is for discussion about this question.

Once logged in you can direct comments to any contributor here.

Enter question or answer id or url (and optionally further answer ids/urls from the same question) from

Separate each id/url with a space. No need to list your own answers; they will be imported automatically.