schemaball - visualize an SQL database schema in a circular composition
# all parameters read from default configuration file schemaball [-debug [-debug] ]
# read parameters from specific configuration file schemaball -conf etc/schemaball.conf
# override database connection settings schemaball [-database DATABASE] [-host HOST] [-user USER] [-password PASSWORD]
# dump configuration only schemaball -dumpconfig
# read documentation schemaball -man
Schemaball generates high-level database schema diagrams. The schema diagram is formatted along a circular track. Tables are represented by glyphs along the circumference and relationships between tables (constraints) are depicted by lines that join the table glyphs. Schemaball can follow these constraints from table to table and automatically annotate the image to help you identify tables participating in a constraint chain. Diagram elements can be flexibly hidden or highlighted to draw attention to specific parts of the schema.
Schemaball is primarily designed to help visualize the relationships between tables, rather than table structure. Currently, the schema diagrams do not indicate which field in a table participates in a constraint relationship and details of internal table structure are not shown.
The database structure can be obtained by connecting to a live server or by reading SQL schema dumps or flat files. The properties of the schema diagram are manipulated by setting parameter values stored in a configuration file.
Up-to-date documentation, code and FAQ can be viewed at http://mkweb.bcgsc.ca/schemaball.
When database schemas are small, generating effective illustrations is very easy. Diagrams can be drawn manually without much effort. As the number of tables and constraints grows, it becomes more difficult and time consuming to depict the schema in a compact and informative manner. The schema diagram becomes very complex and automated drawing tools have difficulty in optimizing the placement and routing of the glyphs and lines in the schema diagram. In a highly connected schema, it may not be possible to draw the table glyphs in clusters to limit the total length of constraint lines. As a result, constraint paths are long and tracing them is awkward.
One way to limit the complexity inherent in the routing problem of the constraint lines is to use a pre-determined layout for the table glyphs. By placing these along a circle, constraint lines are easy to generate and follow. The circular schema diagram scales well with schema size - the diagram remains compact and light on the eyes.
Database schemas, especially for large databases, are usually examined in a piece-meal fashion. You don't look at the entire schema all at once - you examine groups of tables and constraints which form a functional group. By decomposing the schema in this manner, the data, objects and interactions modeled by the database can be more easily understood. Schemaball is ideally suited for such decomposition - the colour scheme of arbitrary elements (tables or constraints) can be adjusted. The colour scheme can be propagated along constraints, thereby generating highlighted paths through the schema.
Schemaball generates static images. Currently only bitmap images (PNG) can be generated. However, coordinates of each element can be obtained and may be used to generated vector images (e.g. SVG). The properties of the schema diagrams can be tuned flexibly making publication-quality output possible.
To draw the schema diagram, a list of tables and constraints is required. In diagrams generated by Schemaball, internal table structure is not displayed. Tables and constraints may be loaded from (a) live database (b) SQL schema dump, or (c) list of table or constraint names.
Schemaball needs to know how to communicate with your database to be able to
To connect to a live database, the <db> block in the configuration file stores the server name, database name and user authentication tokens. Interface to the database is provided by DBI and the DBI driver is specified in 'dbtype' in the <sqldb> block. This block additionally defines all the SQL commands which are used to poll database structure. The <sql> block is described further on.
In addition to polling a live database, Schemaball can collect schema information from an SQL schema dump or a flat file which lists the table names. The <db> block is used to specify the data source.
To connect to a database server, the following need to be specified in the <db> block.
<db> database = sequence host = seqdb01 user = viewer password = viewer </db>
The 'sqlfile' variable stores the filename of the schema dump.
<db> sqlfile = schemas/hs.sql </db>
If your table names are in a list, use 'tablefile'.
<db> tablefile = tables.txt </db>
Flat files are useful in cases when you'd like to draw a schema diagram but you don't want to create a database or generate the appropriate SQL commands to create tables.
Named <sql> blocks store SQL commands. You may need to adjust the value of the 'sql' variable in these commands to make them compatible with your server, if you are not using MySQL. The name of the block should not be changed. Each SQL command is defined in the following way
<sql NAME> sql = SQLCOMMAND colnum = COLNUM </sql>
where NAME is a unique identifier, SQLCOMMAND is the SQL query and COLNUM is the column in the query result that contains the desired information. The value of COLNUM depends on what the query is designed to achieve.
The SQL command may be applied to specific databases and tables by using the strings _DB_ and _TABLE_ in the query string. These strings will be substituted with the current database and table name. The following <sql> blocks are defined by default and are required to extract the required schema information.
The listtables SQL command is expected to provide a list of tables in the database. In MySQL, the appropriate SQL query is ``SHOW TABLES''.
<sql listtables> sql = "show tables" colnum = 0 </sql>
This SQL command generates a list of all fields in a table, along with the field type and other information.
One way to gain access to the list of constraints for a table is to obtain the SQL statement that creates the table. In MySQL, this is done using ``SHOW CREATE TABLE tablename''. Schemaball will parse the SQL create command for constraint relationships and identify corresponding tables.
Another way to list constraints is through the table status view. The MySQL query is ``SHOW TABLE STATUS FROM database LIKE tablename''.
Schemaball generates its table list from the data source defined in the <db> block. When the <db> block defines a database, the SQL query defined in the <sql listtables> block is used to generate the table list. When the <db> block defines an SQL schema dump, the CREATE TABLE statements in the dump are parsed to extract table names.
MySQL does not support foreign keys currently for MyISAM tables (as of v4.0, see http://www.mysql.com/doc/en/ANSI_diff_Foreign_Keys.html). Support for constraints may be implemented by using (a) InnoDB tables which support foreign keys, or (b) application logic. To help automate application logic, foreign key fields may be named using a specific convention, which can be used to work out the target table and field for a given foreign key name. One such convetion is
FKtype_tablename__field
where type is some arbitrary string to differentiate of the foreign key. This field is specific to an application, which associates different TYPEs with specific business logic. The TYPE makes it possible to have multiple foreign keys pointing to the same field in the same table. The TABLE is the name of the table that the foreign key references and the FIELD is the name of the FIELD in that table. If the canonical name of the referenced field is in the format TABLE_FIELD, then the FIELD string in the FK name does not include the TABLE_ string. For example, if a Person table has a foreign key which references the field Personality_ID in the table Personality, then the name of the key would be FK_Personality__ID (not FK_Personality__Personality_ID).
The <linkrule> block in Schemaball's configuration file specifies how to detect constraint relationships. There are three ways in which constraints may be determined.
Constraint relationships can be extracted by polling the database with specific SQL statements. The SQL statement to use is stored in the 'sql' variable. The query result is parsed using a regular expression stored in 'rx'.
<linkrule fk> sql = tablestatus rx = REFER \w+\/(\w+) use = no </linkrule>
The regular expression uses a capturing bracket to extract the table name to which the constraint refers.
<linkrule tablecreate> sql = tablecreate rx = REFERENCES `?(\w+) use = no </linkrule>
If you follow a naming convention for your foreign keys, a link rule as shown below should be used.
<linkrule> type = a foreign key rx = .*FK_(\w+)__(\w+) <fields> table = 0 field = 1 </fields> </linkrule>
This link rule defines the regular expression to be applied to the field name. The regular expression contains capturing brackets, which store the table and field name pointed to by the foreign key.
You can specify all or additional constraints using a flat file. Each line specifies a single constrain, which can be in the format table1__table2, where the tables are delimited by __, or ``table1 table2'', where the tables are delimited by a space. In both cases, table1 contains a foreign key pointing to table2.
<linkrule linkfile> file = fk.txt use = yes </linkrule>
## fk1.txt table1__table2 table2__table3
## fk2.txt table1 table2 table2 table3
Once Schemaball has analyzed the data source (database, SQL dump, or table list) and compiled a list of the constraints (database, SQL dump, or constraint list), it can generate a dump of what it has parsed. If you toggle debugging (-debug), the dump is reported in lines starting with ``schemadump''. These data can be used to verify that Schemaball has correctly parsed your database structure and to create flat files of table and constraint names.
Schemaball gets most of its runtime parameters from an Apache-like configuration file. The Config::General module is required to parse the file. For the syntax of the configuration file, refer to the man page of Config::General.
If you do not specify the configuration file, Schemaball checks a few default locations before giving up. The default name of the configuration file is schemaball.conf, or SCRIPTNAME.conf where SCRIPTNAME is the name of the schemaball script. The following paths are checked for the SCRIPTNAME.conf file, in order. Relative paths are relative to the location of the script.
Blocks in the configuration file group parameters that control disparate aspects of the schema diagram. These blocks are
Data source: (a) server, database name, user and password, (b) SQL schema dump file, or (c) flat file containing table names.
Named blocks defining rules for parsing constraint relationships from (a) database using SQL queries, (b) field names, or (c) flat file containing constraint names.
Parent block for schema diagram properties. Contains an <elements> block which stores properties of the following image elements: <link> (constraint lines), <bezier> (bezier curve properties for constraint lines), <label> (text labels for tables), <table> (table glyphs) and <highlight> (constraint tracing).
Database driver and custom SQL statements. Contains named <sql> blocks which store custom SQL statements.
The image block stores the following entries
<image> file image_element_dump size glyphradius annotation background foreground <elements> ... </elements> </image>
Output file name for schema diagram.
Toggles output of coordinates for each image element. Produces lines prefixed with ``ballelem'' containing position (both cartesian and polar coordintes), rotation, size and label of each element.
Width and height of the image, which is square.
Radius at which the table glyphs will be drawn, in units of inscribed circle radius. The tables will appear in a circle whose radius is size/2 * glyphradius.
Toggles text presenting information about the data source in the lower left of the schema diagram.
Colour of image background.
background = cad9ef background = 202,217,239
Colour of foreground text.
The link block controls how constraint lines should be drawn.
<link> show_links color color_hi stroke stroke_hi colour_from stroke_from foot_fraction anchor type link_to_invisible hide_file <hide_rx> ... </hide_/rx> highlight_file <highlight_rx> ... </highlight_rx> </link>
Toggles the display of constraint lines.
Colour of the constraint lines.
Colour of highlighted constraint lines. Highlights are controlled by 'highlight_file' and <highlight_rx>.
Thickness of the constraint line.
Thickness of highlighted constraint line.
A portion of the link line near the table that contains the foreign key may be coloured differently.
Thickness of portion of link line near the table that contains the foreign key.
Fraction of the link line near the table that contains the foreign key to which colour_from and stroke_from are applied.
The location of the table glyph to which the constraint line connects. If set to 'edge', the constraint line will be drawn to the point of the table glyph closest to the center of the image. If set to 'center', the line will be drawn to the center of the table glyph.
When using bezier constraint curves, 'edge' is preferred. When using straight lines, 'center' should be used.
The geometry of the constraint line.
Toggles whether constraint lines should be drawn to invisible tables. This option only applies to hidden tables that are invisible, but still remain in the schema diagram (see table :: invisible_hide option).
^table1__ - hides all constraints originating in table1 __table2$ - hides all constraints pointing to table2
Block containing regular expressions which control visibility of constraint lines. Serves the same purpose as hide_file, but allows for the regular expressions to be stored in the configuration file.
<hide_rx> ^table1__ __table2$ </hide_rx>
File name containing regular expressions that control highlighting of constraint lines. Highlighted lines will be coloured using colour_hi, and drawn with thickness stroke_hi.
Block containing regular expressions that control highlighting of constraint lines. Serves the same purpose as highlight_file.
To highlight all constraints originating from table1,
<highlight_rx> ^table1__ </highlight_rx>
<bezier> resolution radius </bezier>
The <bezier> block controls properties of the bezier curves, if this constraint line type is chosen.
The curve is drawn using piece-wise straight lines. The resolution controls the number of these lines. The higher the number, the more smooth the curve will appear.
The location of the third point that defines the bezier curve. The third point will be half way between the table glyphs, in angular distance, and at a relative radius of 'radius'. The smaller the value of radius, which may be negative, the higher the curvature of the constraint line.
Properties of text and text labels.
<label> font size margin </label>
True type font to use for the labels. The font file is relative to the script path.
Size of the labels, in points.
Padding between the label text and the table glyph. The label is always aligned to this margin and drawn radially outwards from the center of image.
Table glyph properties.
<table> size fill fill_hi stroke outline outline_hi label anonymize hide_file <hide_rx> ... </hide_rx> invisible_hide highlight_file <highlight_rx> ... </highlight_rx> </table>
Size of the table glyph, which is a circle.
Fill colour of the table glyph.
Fill colour of a highlighted table glyph.
Thickness of table glyph edge.
Colour of table glyph edge.
Colour of a highlighted table glyph edge.
Toggles table name labels.
Toggles whether table names should be the real table name or an anonymous label in the form ``Table NNN'', where NNN is the table index. Useful if you do not want to give away too much information about your schema.
File containing regular expressions that control whether tables are included in the schema diagram. If a table name matches any regular expression, the table's status is changed to 'hidden'.
Hidden tables may be invisible in the schema, but still take up space along the circle, or removed from the schema altogether. The invisible_hide toggle controls this behaviour.
Regular expressions for defining hidden tables. For example, to hide all tables whose names contain the string ``seq'',
<hide_rx> seq </hide_rx>
Toggle that controls whether hidden tables become invisible, but take up space along the schema circle, or whether the should be removed from the schema altogether. When set to 'yes', you will see gaps in your schema circle when tables are hidden. If set to 'no' the glyphs of tables that are not hidden will be re-organized and distributed uniformly along the circle.
File with regular expressions that control table highlighting.
Block with regular expressions that control table highlighting. For example, to highlight all tables whose names contain the string ``seq'',
<highlight_rx> seq <highlight_rx>
Controls how highlighting is inherited across constraint lines.
<highlight> highlight_by_link highlight_by_table highlight_by_table_forward highlight_by_table_reverse highlight_by_iterations fade_factor_table fade_factor_link </highlight>
The inheritance of highlighting helps to follow contraint relationships between tables. For example, by first highlighting a specific table (see table :: highlight_rx)
<table> <highlight_rx> mytable </highlight_rx> </table>
you can ask that all constraint lines originating from and ending at this table are highlighted too.
highlight_by_table = yes highlight_by_table_forward = yes highlight_by_table_reverse = yes highlight_by_iterations = 1
Similarly, if you highlight a constraint line, you can highlight all the tables that participate in the lines
<link> <highlight_rx> ^table1__ <highlight_rx> </link> highlight_by_link = yes highlight_by_terations = 1
The effect will be to highlight all tables which participate in a constraint relationship with table1.
The parameter 'highlight_by_link' toggles whether tables should inherit link highlights. Similarly, links inherit table highlights if 'highlight_by_table' is set.
Toggles whether links are used to highlight tables. In order for tables to become highlighted, some links must be highlighted using the <highlight_rx> block or highlight_file parameter in the <link> block.
Toggles whether tables are used to highlight links. In order for links to become highlighted, some tables must be highlighted using the <highlight_rx> block or highlight_file parameter in the <table> block.
When a table is highlighted, the inheritance of the highlight to its links can be controlled using 'highlight_by_forward' and 'highlight_by_reverse' (see below). To highlight outgoing constraints, use highlight_by_forward. To highlight incoming constraints, use 'highlight_by_reverse'. Both can be set to highlight all constraints that involve a highlighted table.
The number of iterations of highlight inheritance. Consider the linear schema
T1 - LINK1 - T2 - LINK2 - T3 - LINK3
If we initially highlight T1,
<table><highlight_rx>T1</highlight_rx></table> and set the number of iterations to 0, the resulting schema will have only T1 highlighted.
T1(h) - LINK1 - T2 - LINK2 - T3 - LINK3 ...
Setting iterations to 1,
T1(h) - LINK1(h) - T2(h) - LINK2 - T3 - LINK3
When two iterations are used,
T1(h) - LINK1(h) - T2(h) - LINK2(h) - T3(h) - LINK3
and so on.
With each iteration, the colour of the highlights may be diluted to distinguish the inheritance depth. The first highlight is always drawn as specified by color_hi and stroke_hi in the <link> block and outline_hi and fill_hi in the <table> block. Subsequently, the highlight colours are diluted in HSB space using the following recipe
color_hi(n) = RGB(HSB(color_hi(n-1)) * fade_factor ^ (iteration-1) HSB(color) * (1-fade_factor)^ (iteration-1))
The size of the fade factor determines how quickly the highlighted colour fades towards the regular colour for the highlighted element. If fade_factor = 1, the colour never fades. If fade_factor = 0.5, the colour fades half-way towards the regular colour (in HSB space). If fade_factor = 0, the highlighted colour immediately turns to the regular colour on the first iteration.
Schemaball has two debug levels, specified by the number of -debug
command-line parameters. Level 1, set by -debug, will make
printdebug()
produce output. Level 2, set by -debug -debug,
will in addition dump the structure of the $CONF
variable.
The global $CONF
variable holds all runtime information,
parsed from the configuration file, as well as other settings. To see the
structure of this variable run
schemaball -dumpconfig
Database schemas can be parsed from flat files, SQL dumps (using SQL::Translator) or (as before) live servers.
Table names and constraints are dumped to STDOUT in schemadump lines when -debug is used.
Added tutorials using the Ensembl (www.ensembl.org) human genome database.
Fixed HSB dilution bug for greys.
Now using stringFT instead of old stringttf GD API call to make labels (Steve Postma). Fixed label positioning to accomodate proportionately spaced fonts. Bundled minihaha font, since minimono does not have lowercase letter forms.
Initial release.
Please report bugs and suggestions to martink@bcgsc.ca.
Martin Krzywinski, martink@bcgsc.ca
$Id: schemaball,v 1.5 2003/10/27 20:25:11 martink Exp $
Martin Krzywinski Genome Sciences Centre Vancouver BC Canada www.bcgsc.ca martink@bcgsc.ca