nblock's ~

SQLAlchemy automap and custom types

An API is an important part of an application when it comes to automation. For those web applications that do not offer an API, I typically write some small application on top of Scrapy or Requests to solve the problem.

The application in question is an old PHP application. There is no API available and adding an API to it is out of scope. MySQL is used as the database and most of the 79 tables are stored with MyISAM. Recently added tables are stored with InnoDB. There are just a few unique constraints on the database side and no foreign key constraints (due to MyISAM). The entire logic is coded into the PHP application.

The web application's database can be accessed directly, so there is no need for scraping. The following SQL listing provides a minimal working example for the purpose of this blog post.

-- create tables
CREATE TABLE IF NOT EXISTS `hosts` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) NOT NULL,
  `state` int(1) NOT NULL DEFAULT '1',
  `os_id` int(11) NOT NULL DEFAULT '1',
  PRIMARY KEY (`id`),
  UNIQUE KEY `name` (`name`)
) ENGINE=MyISAM  DEFAULT CHARSET=latin1 AUTO_INCREMENT=8 ;

CREATE TABLE IF NOT EXISTS `os_types` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `name` (`name`)
) ENGINE=MyISAM  DEFAULT CHARSET=latin1 AUTO_INCREMENT=4 ;

-- stuff some sample data into the tables
INSERT INTO `hosts` (`id`, `name`, `state`, `os_id`) VALUES
(1, 'astoria', 1, 3),
(2, 'fiddle', 2, 2),
(3, 'freeman', 4, 3),
(4, 'liard', 4, 3),
(5, 'leand', 1, 1),
(6, 'algar', 1, 2),
(7, 'ells', 3, 2);

INSERT INTO `os_types` (`id`, `name`) VALUES
(1, 'Debian Wheezy'),
(2, 'Debian Jessie'),
(3, 'Debian Stretch');

There are two tables, hosts and os_types. The rows of the os_types table are referenced via hosts.os_id inside the PHP application. There is no immediate connection on the database level. The hosts table contains a state column with the following magic numbers:

  • 1: active
  • 2: disabled
  • 3: unknown
  • 4: deleted

The task is simple: Connect to the database and print the name, state and the name of the OS.

Try #1

I don't want to write SQL by hand and I certainly don't want to remember all the magic numbers. So, I decided to give SQLAlchemy, a popular Object Relational Mapper for Python, another try. The typical usage is to define your model in plain Python and let SQLAlchemy manage the database side for you. This is convenient for new projects or if the database has 5 tables in total. The database of this application manages 79 tables and some of them contain a lot of columns (e.g. 26 columns for a single table). That's too much typing. Fortunately, SQLAlchemy offers a feature called Automap where it connects to a database, inspects the tables and tries to figure out the models for you. OK, now some code:

from sqlalchemy import create_engine
from sqlalchemy.ext.automap import automap_base
from sqlalchemy.orm import Session

# The declarative base used for the SQLAlchemy reflection.
Base = automap_base()


def main():
    engine = create_engine('mysql://USER:PASS@HOST:PORT/DATABASE')

    # Perform automap and create a session
    Base.prepare(engine, reflect=True)
    session = Session(engine)

    # Use the session
    Hosts = Base.classes.hosts
    OSTypes = Base.classes.os_types
    for host in session.query(Hosts).filter_by(state=1).all():
        os_type = session.query(OSTypes).get(host.os_id)
        print(host.name, host.state, os_type.name)


if __name__ == "__main__":
    main()

Run it:

$ python sqlalchemy1.py
astoria 1 Debian Stretch
leand 1 Debian Wheezy
algar 1 Debian Jessie

It works but there are some obvious limitations. SQLAlchemy was not able to figure out the relationship between hosts.os_id and os_types.id. So the programmer has to manage the relationship by hand (this happens in the PHP application). This is not only cumbersome but also error prone as one could just set os_id to an unreferenced value. Let's try to fix that.

Try #2

One can provide some hints for SQLAlchemy in order to build up a relationship. Take a look at the following version:

from sqlalchemy import Column
from sqlalchemy import ForeignKey
from sqlalchemy import Integer
from sqlalchemy import create_engine
from sqlalchemy.ext.automap import automap_base
from sqlalchemy.orm import Session
from sqlalchemy.orm import relationship

# The declarative base used for the SQLAlchemy reflection.
Base = automap_base()


class Hosts(Base):
    __tablename__ = 'hosts'

    # custom types
    os_id = Column(Integer, ForeignKey('os_types.id'))

    # relationships
    os = relationship('os_types', backref='hosts')


def main():
    engine = create_engine('mysql://USER:PASS@HOST:PORT/DATABASE')

    # Perform automap and create a session
    Base.prepare(engine, reflect=True)
    session = Session(engine)

    # Use the session
    for host in session.query(Hosts).filter_by(state=1).all():
        print(host.name, host.state, host.os.name)


if __name__ == "__main__":
    main()

Run it:

$ python sqlalchemy2.py
astoria 1 Debian Stretch
leand 1 Debian Wheezy
algar 1 Debian Jessie

I just added a Hosts class that maps to the hosts table. It adds a ForeignKey to the os_id column and a relationship named os. Usage is much simpler now: it boils down to a single query and there is no need to look up the name of the operating system by hand. But still, there is a magic number in use (state=1).

Try #3

As noted above, the state column has 4 known values. This pretty much looks like an enum. SQLAlchemy has another nice feature for this particular use case: custom types. The following version replaces the magic numbers with a custom type implemented as Python enum:

import enum

from sqlalchemy import Column
from sqlalchemy import ForeignKey
from sqlalchemy import Integer
from sqlalchemy import TypeDecorator
from sqlalchemy import create_engine
from sqlalchemy.ext.automap import automap_base
from sqlalchemy.orm import Session
from sqlalchemy.orm import relationship

# The declarative base used for the SQLAlchemy reflection.
Base = automap_base()


@enum.unique
class HostState(enum.IntEnum):
    # This host is active and currently in use.
    ACTIVE = 1

    # It is no longer in use, it is turned off.
    DISABLED = 2

    # Unknown, literally
    UNKNOWN = 3

    # This host is gone, away forever.
    DELETED = 4


class HostStateTypeDecorator(TypeDecorator):

    impl = Integer

    def process_bind_param(self, value, dialect):
        if isinstance(value, HostState):
            value = value.value
        return value

    def process_result_value(self, value, dialect):
        if value is not None:
            value = HostState(value)
        return value


class Hosts(Base):
    __tablename__ = 'hosts'

    # custom types
    state = Column(HostStateTypeDecorator)
    os_id = Column(Integer, ForeignKey('os_types.id'))

    # relationships
    os = relationship('os_types', backref='hosts')


def main():
    engine = create_engine('mysql://USER:PASS@HOST:PORT/DATABASE')

    # Perform automap and create a session
    Base.prepare(engine, reflect=True)
    session = Session(engine)

    # Use the session
    for host in session.query(Hosts).filter_by(state=HostState.ACTIVE).all():
        print(host.name, host.state, host.os.name)


if __name__ == "__main__":
    main()

Run it:

$ python sqlalchemy3.py
astoria HostState.ACTIVE Debian Stretch
leand HostState.ACTIVE Debian Wheezy
algar HostState.ACTIVE Debian Jessie

Sweet, the magic numbers are now gone and one can build a query using the custom type: state=HostState.ACTIVE. Magic numbers are converted in both directions via process_result_value() and process_bind_param().

Tested with Python 3.6.4 and SQLAlchemy 1.2.6.


permalink

tagged automation, legacy, python and sqlalchemy