An API is an important part of an application when it comes to automation. For
those web applications that do not offer an API, I typically write some small
application on top of Scrapy or Requests to solve the problem.
The application in question is an old PHP application. There is no API
available and adding an API to it is out of scope. MySQL is used as the
database and most of the 79 tables are stored with MyISAM. Recently added
tables are stored with InnoDB. There are just a few unique constraints on the
database side and no foreign key constraints (due to MyISAM). The entire logic
is coded into the PHP application.
The web application's database can be accessed directly, so there is no need
for scraping. The following SQL listing provides a minimal working example for
the purpose of this blog post.
-- create tables
CREATE TABLE IF NOT EXISTS `hosts` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
`state` int(1) NOT NULL DEFAULT '1',
`os_id` int(11) NOT NULL DEFAULT '1',
PRIMARY KEY (`id`),
UNIQUE KEY `name` (`name`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=8 ;
CREATE TABLE IF NOT EXISTS `os_types` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `name` (`name`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=4 ;
-- stuff some sample data into the tables
INSERT INTO `hosts` (`id`, `name`, `state`, `os_id`) VALUES
(1, 'astoria', 1, 3),
(2, 'fiddle', 2, 2),
(3, 'freeman', 4, 3),
(4, 'liard', 4, 3),
(5, 'leand', 1, 1),
(6, 'algar', 1, 2),
(7, 'ells', 3, 2);
INSERT INTO `os_types` (`id`, `name`) VALUES
(1, 'Debian Wheezy'),
(2, 'Debian Jessie'),
(3, 'Debian Stretch');
There are two tables, hosts and os_types. The rows of the os_types
table are referenced via hosts.os_id inside the PHP application. There is
no immediate connection on the database level. The hosts table contains a
state column with the following magic numbers:
- 1: active
- 2: disabled
- 3: unknown
- 4: deleted
The task is simple: Connect to the database and print the name, state and the
name of the OS.
Try #1
I don't want to write SQL by hand and I certainly don't want to remember all
the magic numbers. So, I decided to give SQLAlchemy, a popular Object
Relational Mapper for Python, another try. The typical usage is to define your
model in plain Python and let SQLAlchemy manage the database side for you.
This is convenient for new projects or if the database has 5 tables in total.
The database of this application manages 79 tables and some of them contain a
lot of columns (e.g. 26 columns for a single table). That's too much typing.
Fortunately, SQLAlchemy offers a feature called Automap where it connects
to a database, inspects the tables and tries to figure out the models for you.
OK, now some code:
from sqlalchemy import create_engine
from sqlalchemy.ext.automap import automap_base
from sqlalchemy.orm import Session
# The declarative base used for the SQLAlchemy reflection.
Base = automap_base()
def main():
engine = create_engine('mysql://USER:PASS@HOST:PORT/DATABASE')
# Perform automap and create a session
Base.prepare(engine, reflect=True)
session = Session(engine)
# Use the session
Hosts = Base.classes.hosts
OSTypes = Base.classes.os_types
for host in session.query(Hosts).filter_by(state=1).all():
os_type = session.query(OSTypes).get(host.os_id)
print(host.name, host.state, os_type.name)
if __name__ == "__main__":
main()
Run it:
$ python sqlalchemy1.py
astoria 1 Debian Stretch
leand 1 Debian Wheezy
algar 1 Debian Jessie
It works but there are some obvious limitations. SQLAlchemy was not able to
figure out the relationship between hosts.os_id and os_types.id. So
the programmer has to manage the relationship by hand (this happens in the PHP
application). This is not only cumbersome but also error prone as one could
just set os_id to an unreferenced value. Let's try to fix that.
Try #2
One can provide some hints for SQLAlchemy in order to build up a relationship.
Take a look at the following version:
from sqlalchemy import Column
from sqlalchemy import ForeignKey
from sqlalchemy import Integer
from sqlalchemy import create_engine
from sqlalchemy.ext.automap import automap_base
from sqlalchemy.orm import Session
from sqlalchemy.orm import relationship
# The declarative base used for the SQLAlchemy reflection.
Base = automap_base()
class Hosts(Base):
__tablename__ = 'hosts'
# custom types
os_id = Column(Integer, ForeignKey('os_types.id'))
# relationships
os = relationship('os_types', backref='hosts')
def main():
engine = create_engine('mysql://USER:PASS@HOST:PORT/DATABASE')
# Perform automap and create a session
Base.prepare(engine, reflect=True)
session = Session(engine)
# Use the session
for host in session.query(Hosts).filter_by(state=1).all():
print(host.name, host.state, host.os.name)
if __name__ == "__main__":
main()
Run it:
$ python sqlalchemy2.py
astoria 1 Debian Stretch
leand 1 Debian Wheezy
algar 1 Debian Jessie
I just added a Hosts class that maps to the hosts table. It adds a
ForeignKey to the os_id column and a relationship named os. Usage
is much simpler now: it boils down to a single query and there is no need to
look up the name of the operating system by hand. But still, there is a magic
number in use (state=1).
Try #3
As noted above, the state column has 4 known values. This pretty much looks
like an enum. SQLAlchemy has another nice feature for this particular use
case: custom types. The following version replaces the magic numbers with a
custom type implemented as Python enum:
import enum
from sqlalchemy import Column
from sqlalchemy import ForeignKey
from sqlalchemy import Integer
from sqlalchemy import TypeDecorator
from sqlalchemy import create_engine
from sqlalchemy.ext.automap import automap_base
from sqlalchemy.orm import Session
from sqlalchemy.orm import relationship
# The declarative base used for the SQLAlchemy reflection.
Base = automap_base()
@enum.unique
class HostState(enum.IntEnum):
# This host is active and currently in use.
ACTIVE = 1
# It is no longer in use, it is turned off.
DISABLED = 2
# Unknown, literally
UNKNOWN = 3
# This host is gone, away forever.
DELETED = 4
class HostStateTypeDecorator(TypeDecorator):
impl = Integer
def process_bind_param(self, value, dialect):
if isinstance(value, HostState):
value = value.value
return value
def process_result_value(self, value, dialect):
if value is not None:
value = HostState(value)
return value
class Hosts(Base):
__tablename__ = 'hosts'
# custom types
state = Column(HostStateTypeDecorator)
os_id = Column(Integer, ForeignKey('os_types.id'))
# relationships
os = relationship('os_types', backref='hosts')
def main():
engine = create_engine('mysql://USER:PASS@HOST:PORT/DATABASE')
# Perform automap and create a session
Base.prepare(engine, reflect=True)
session = Session(engine)
# Use the session
for host in session.query(Hosts).filter_by(state=HostState.ACTIVE).all():
print(host.name, host.state, host.os.name)
if __name__ == "__main__":
main()
Run it:
$ python sqlalchemy3.py
astoria HostState.ACTIVE Debian Stretch
leand HostState.ACTIVE Debian Wheezy
algar HostState.ACTIVE Debian Jessie
Sweet, the magic numbers are now gone and one can build a query using the
custom type: state=HostState.ACTIVE. Magic numbers are converted in both
directions via process_result_value() and process_bind_param().
Tested with Python 3.6.4 and SQLAlchemy 1.2.6.