An API is an important part of an application when it comes to automation. For those web applications that do not offer an API, I typically write some small application on top of Scrapy or Requests to solve the problem.
The application in question is an old PHP application. There is no API available and adding an API to it is out of scope. MySQL is used as the database and most of the 79 tables are stored with MyISAM. Recently added tables are stored with InnoDB. There are just a few unique constraints on the database side and no foreign key constraints (due to MyISAM). The entire logic is coded into the PHP application.
The web application’s database can be accessed directly, so there is no need for scraping. The following SQL listing provides a minimal working example for the purpose of this blog post.
-- create tables
CREATE TABLE IF NOT EXISTS `hosts` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
`state` int(1) NOT NULL DEFAULT '1',
`os_id` int(11) NOT NULL DEFAULT '1',
PRIMARY KEY (`id`),
UNIQUE KEY `name` (`name`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=8 ;
CREATE TABLE IF NOT EXISTS `os_types` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `name` (`name`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=4 ;
-- stuff some sample data into the tables
INSERT INTO `hosts` (`id`, `name`, `state`, `os_id`) VALUES
(1, 'astoria', 1, 3),
(2, 'fiddle', 2, 2),
(3, 'freeman', 4, 3),
(4, 'liard', 4, 3),
(5, 'leand', 1, 1),
(6, 'algar', 1, 2),
(7, 'ells', 3, 2);
INSERT INTO `os_types` (`id`, `name`) VALUES
(1, 'Debian Wheezy'),
(2, 'Debian Jessie'),
(3, 'Debian Stretch');
There are two tables, hosts
and os_types
. The rows of the os_types
table are referenced via hosts.os_id
inside the PHP application. There
is no immediate connection on the database level. The hosts
table
contains a state
column with the following magic numbers:
- 1: active
- 2: disabled
- 3: unknown
- 4: deleted
The task is simple: Connect to the database and print the name, state and the name of the OS.
Try #1
I don’t want to write SQL by hand and I certainly don’t want to remember all the magic numbers. So, I decided to give SQLAlchemy, a popular Object Relational Mapper for Python, another try. The typical usage is to define your model in plain Python and let SQLAlchemy manage the database side for you. This is convenient for new projects or if the database has 5 tables in total. The database of this application manages 79 tables and some of them contain a lot of columns (e.g. 26 columns for a single table). That’s too much typing. Fortunately, SQLAlchemy offers a feature called Automap where it connects to a database, inspects the tables and tries to figure out the models for you. OK, now some code:
from sqlalchemy import create_engine
from sqlalchemy.ext.automap import automap_base
from sqlalchemy.orm import Session
# The declarative base used for the SQLAlchemy reflection.
Base = automap_base()
def main():
engine = create_engine('mysql://USER:PASS@HOST:PORT/DATABASE')
# Perform automap and create a session
Base.prepare(engine, reflect=True)
session = Session(engine)
# Use the session
Hosts = Base.classes.hosts
OSTypes = Base.classes.os_types
for host in session.query(Hosts).filter_by(state=1).all():
os_type = session.query(OSTypes).get(host.os_id)
print(host.name, host.state, os_type.name)
if __name__ == "__main__":
main()
Run it:
$ python sqlalchemy1.py
astoria 1 Debian Stretch
leand 1 Debian Wheezy
algar 1 Debian Jessie
It works but there are some obvious limitations.
SQLAlchemy was not able to figure out the
relationship between hosts.os_id
and os_types.id
. So the programmer
has to manage the relationship by hand (this happens in the PHP
application). This is not only cumbersome but also error prone as one
could just set os_id
to an unreferenced value. Let’s try to fix that.
Try #2
One can provide some hints for SQLAlchemy in order to build up a relationship. Take a look at the following version:
from sqlalchemy import Column
from sqlalchemy import ForeignKey
from sqlalchemy import Integer
from sqlalchemy import create_engine
from sqlalchemy.ext.automap import automap_base
from sqlalchemy.orm import Session
from sqlalchemy.orm import relationship
# The declarative base used for the SQLAlchemy reflection.
Base = automap_base()
class Hosts(Base):
__tablename__ = 'hosts'
# custom types
os_id = Column(Integer, ForeignKey('os_types.id'))
# relationships
os = relationship('os_types', backref='hosts')
def main():
engine = create_engine('mysql://USER:PASS@HOST:PORT/DATABASE')
# Perform automap and create a session
Base.prepare(engine, reflect=True)
session = Session(engine)
# Use the session
for host in session.query(Hosts).filter_by(state=1).all():
print(host.name, host.state, host.os.name)
if __name__ == "__main__":
main()
Run it:
$ python sqlalchemy2.py
astoria 1 Debian Stretch
leand 1 Debian Wheezy
algar 1 Debian Jessie
I just added a Hosts
class that maps to the hosts
table. It adds a
ForeignKey
to the os_id
column and a relationship named os
. Usage
is much simpler now: it boils down to a single query and there is no
need to look up the name of the operating system by hand. But still,
there is a magic number in use (state=1
).
Try #3
As noted above, the state
column has 4 known values. This pretty much
looks like an enum. SQLAlchemy has another
nice feature for this particular use case: custom
types. The
following version replaces the magic numbers with a custom type
implemented as Python enum:
import enum
from sqlalchemy import Column
from sqlalchemy import ForeignKey
from sqlalchemy import Integer
from sqlalchemy import TypeDecorator
from sqlalchemy import create_engine
from sqlalchemy.ext.automap import automap_base
from sqlalchemy.orm import Session
from sqlalchemy.orm import relationship
# The declarative base used for the SQLAlchemy reflection.
Base = automap_base()
@enum.unique
class HostState(enum.IntEnum):
# This host is active and currently in use.
ACTIVE = 1
# It is no longer in use, it is turned off.
DISABLED = 2
# Unknown, literally
UNKNOWN = 3
# This host is gone, away forever.
DELETED = 4
class HostStateTypeDecorator(TypeDecorator):
impl = Integer
def process_bind_param(self, value, dialect):
if isinstance(value, HostState):
value = value.value
return value
def process_result_value(self, value, dialect):
if value is not None:
value = HostState(value)
return value
class Hosts(Base):
__tablename__ = 'hosts'
# custom types
state = Column(HostStateTypeDecorator)
os_id = Column(Integer, ForeignKey('os_types.id'))
# relationships
os = relationship('os_types', backref='hosts')
def main():
engine = create_engine('mysql://USER:PASS@HOST:PORT/DATABASE')
# Perform automap and create a session
Base.prepare(engine, reflect=True)
session = Session(engine)
# Use the session
for host in session.query(Hosts).filter_by(state=HostState.ACTIVE).all():
print(host.name, host.state, host.os.name)
if __name__ == "__main__":
main()
Run it:
$ python sqlalchemy3.py
astoria HostState.ACTIVE Debian Stretch
leand HostState.ACTIVE Debian Wheezy
algar HostState.ACTIVE Debian Jessie
Sweet, the magic numbers are now gone and one can build a query using
the custom type: state=HostState.ACTIVE
. Magic numbers are converted
in both directions via process_result_value()
and
process_bind_param()
.
Tested with Python 3.6.4 and SQLAlchemy 1.2.6.