SQL Naming Convention

jake0719

浏览: 89007 次
性别:
来自: 杭州

最近访客更多访客>>

siophy

生活小丑

云过无痕

天之骄子5237

博主相关

博客

微博

相册

留言

关于我

文章分类

社区版块

存档分类

博客分类：

Database

SQL SQL Server CMS MySQL PostgreSQL

突然发现创建数据库没有统一的命名规范. 下面一篇文章不错可以参考, 另外还可以参考MSDN-Oslo的表设计模式.:

The following list provides the SQL guidelines for designing tables for the “Oslo” repository.

SQL Naming Convention

转自:http://kurafire.net/articles/sql-convention

There are millions of people making applications that use some sort of database or another, and there are several dozens of conventions which you can use to make a more well-formed database schematic. The CMS that powers this website is written by a semi-selfmade SQL convention. This article explains how it works.

The Media Design Content Management System, which is what I use to create and manage this website, uses PostgreSQL as its database system, rather than the more well-known MySQL.

For those of you that don't know, SQL stands for Structured Query Language, and is pronounced as [es queue el] (and not as sequal, as that name had caused trademark conflicts 18 years ago). There are various database systems that are based on SQL standards, although all of them have their own extentions to the official SQL Standard. If you want to read more about SQL, see this Wikipedia page on SQL. This article focuses only on PostgreSQL and this CMS's implementation using it.

Before we get started, we should look at what a standard or convention generally requires: consistency. You can make a standard without a lot of consistency (see RSS for example), but then it probably won't be a very good or useful standard. And as it is consistency and logic that determine a lot of the quality of a product or standard, it is consistency and logic that should be a major focus point of a naming convention.

Our CMS's Convention

Short and descriptive

At all times, stick to using as short as possible database-, table- and column names, but not at the cost of descriptiveness. Make sure that one look at the name can explain the purpose and content of the table or column.

Examples: filename, title, moderated, dateline.

Underscores are evil

Using underscores between words generally means your table or column names are not short and descriptive enough. There really is little need to have columns like user_name when username works just fine. Not only is the latter shorter, but it is also better: sometimes you'll have a column name that is valid as two words but also as one word, and then people may come to expect it as one word only to find out you treated it as two and put an unexpected underscore in-between. If you're the only person that will ever, ever touch your database system and your code… you should still stay away from underscores. You don't know what the future brings. Stick to consistency, and consistency can only be achieved here by not using underscores in table and column names.

Say no to CamelCase

CamelCase is an often adopted naming convention, used mainly in programming languages like Java and JavaScript. Its principle is simple: start every new word with a capital letter: camelCase, myObject, someFunction, etc. But CamelCase, or any uppercase in general, is not recommended for table and column names in SQL. Why, you wonder? Well, SQL databases tend to convert all identifiers (table/column names) to lowercase unless you wrap them in quotes. So even though you create a table called Authors, it'll be made as authors. Then, if you somehow end up moving your database to a server where the SQL system is compiled to be case-sensitive, and your code has Authors everywhere, it won't work anymore. Additionally, exporting your database scheme and moving it from Windows to Linux using certain tools comes at the risk that your names will all be lowercased along the way, which will - again - make your code (with CamelCase in it) stop working. Or imagine you have a column called authorId. The capital I for id can easily look like a lowercase L in many fonts, adding to the risk of confusing other people who have to work with your code. There are just too many small risks involved in using CamelCase, so try to avoid it and stick to lowercase.

Table names in column names

Certain conventions dictate that you use {tablename}_{columnname}, ie. table author would have author_name, author_birthday and so forth. Our suggestion: don't do that. A table is a collection of columns that belong in it, and a column name is exactly what it says: a column name; not a "column identifier with a table identifier in it as well". Stick to columnname; not only is it much shorter, it also does not require underscores nor does it clutter up queries unnecessarily. For instance, compare these two queries:

SELECT post_title AS title, post_content AS content, post_dateline AS dateline, post_author_id AS author_id FROM post;
SELECT title, content, dateline, authorid FROM post;

Which is easier to read? The second, as it keeps all data clean and easy to scan.

Furthermore, there are other reasons to stay away from using table names in column names (besides being illogical in their presence): what would happen if you had a table called post that would, later on in development of the application, start to conflict with something else? What if you had to rename it to entry? You'd end up with table entry and a bunch of columns that are prefixed with post_. Now you'll be stuck with highly inconsistent naming, or you have to go through all of your code and not only update the table name, but all of your column names as well! That can easily become quite tedious, as you can probably imagine.

And that's still not the end of it! Aside of being illogical, cluttersome, space-consuming and hardly future-proof, they also make it harder to join tables in queries. You can say that in a way it makes it easier, because you'll never have to worry about ambiguity in column names, but you're removing the use of USING() and you're making JOIN queries look very complicated, even when they're not. What's worse, some people make queries such as this:

SELECT post_title AS title, post_dateline AS dateline, 
author_name AS name FROM post LEFT JOIN author 
ON (post_author_id = author_id)

Why is that so bad?

SELECT title, dateline, name FROM post 
LEFT JOIN author USING (authorid)

…is simply much cleaner and easier to scan.

And to top it all off, using table names in your column names will only make matters highly confusing once you start having columns that reference columns on other tables. You'll end up with a column like this: post_link_id. That's 2 table names in a single column name. Not exaclty desirable, I would say.

There is one example where this is justifiable, see ID Columns below.

Avoid reserved keywords!

Tools such as phpMyAdmin allow you to name your tables and columns whatever you want, even when the names you choose are actually reserved keywords. That's because MySQL allows you to create such tables, but requires them enclosed in backticks (`). Using reserved keywords is very risky; you'll be forced to always use backticks (this is MySQL-specific, by the way) and if you forget them anywhere at all, you'll run into problems. Additionally, if your dump of the database(s) or table(s) is made without backticks, you'll have to manually add them because otherwise restoring the dump won't work. Be smart; just avoid using them at all times. They're never a necessity.

ID columns

Also known as auto_increment or serial columns, these are usually the very first column on any table and are almost always the Primary Key column of the table. For ID columns, you can choose to name them all id or, what we adopted for the Media Design CMS, {tablename}id (avoiding underscores). Having been completely consistent in the design of the database and all of its tables, we've been able to simplify a lot of functions and methods around this, making our life much easier. Whenever we need a serial (PostgreSQL's version of the ID column), we know it's found at tablename + 'id'. This makes referencing, joining and simply querying a lot easier. Additionally, by going for a tablename + 'id' instead of just 'id' we can always see from every serial and in every JOIN query what table it is for. Doing this for all columns, though, is overkill.

That sums up the rough convention used for the database scheme of our CMS. It may not be the "end-all, be-all" convention, but it's logical, consistent (providing you accept the good reasons to break one consistency rule) and works very nicely.

分享到：

[Exception... "Not enough arguments [nsI ... | 用oracle发送邮件（功能很全）

2009-04-22 12:57
浏览 2059
评论(0)
查看更多

发表评论

您还没有登录,请您登录后再发表评论

最近访客更多访客>>

博主相关

文章分类

社区版块

存档分类

最新评论