A Case Study: Molecules, Atoms, and PDB Files

Molecular graphics visualization tools let users explore molecular structures interactively. Most of them read PDB-formatted files, which we describe in Multiline Records. For example, Jmol (shown in the following graphic) is a Java-based, open source 3D viewer for these structures.

[Figure: an ammonia molecule displayed in Jmol (images/oop/ammonia.png)]

In a molecular visualizer, every atom, molecule, bond, and so on has a location in 3D space, usually defined as a vector, which is an arrow from the origin to where the structure is. All of these structures can be rotated and translated.

A vector is usually represented by x, y, and z coordinates that specify how far along the x-axis, y-axis, and z-axis the vector extends.
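
To make the vector idea concrete, here is a minimal sketch that treats a vector as a plain (x, y, z) tuple. The names Vector, translate_vector, and rotate_about_z are hypothetical helpers chosen for illustration. Translation adds an offset to each component; rotation about the z-axis applies the standard 2D rotation formulas to x and y and leaves z unchanged.

```python
import math
from typing import Tuple

# Hypothetical alias: a vector is just three floats.
Vector = Tuple[float, float, float]


def translate_vector(v: Vector, dx: float, dy: float, dz: float) -> Vector:
    """Return v moved by (dx, dy, dz), component by component."""
    return (v[0] + dx, v[1] + dy, v[2] + dz)


def rotate_about_z(v: Vector, angle: float) -> Vector:
    """Return v rotated counterclockwise by angle radians about the z-axis."""
    cos_a = math.cos(angle)
    sin_a = math.sin(angle)
    x, y, z = v
    # x and y change as in a 2D rotation; z stays the same.
    return (x * cos_a - y * sin_a, x * sin_a + y * cos_a, z)


print(translate_vector((0.257, -0.363, 0.0), 0, 0, 0.2))  # (0.257, -0.363, 0.2)
```

Rotating (1.0, 0.0, 0.0) by math.pi / 2 gives a vector very close to (0.0, 1.0, 0.0); the x component comes out as a tiny nonzero number because of floating-point rounding.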

Here is how ammonia can be specified in PDB format:

 COMPND AMMONIA
 ATOM 1 N 0.257 -0.363 0.000
 ATOM 2 H 0.257 0.727 0.000
 ATOM 3 H 0.771 -0.727 0.890
 ATOM 4 H 0.771 -0.727 -0.890
 END

In our simplified PDB format, a molecule is made up of numbered atoms. In addition to the number, an atom has a symbol and (x, y, z) coordinates. For example, one of the atoms in ammonia is nitrogen, with symbol N at coordinates (0.257, -0.363, 0.0). In the following sections, we will look at how we could translate these ideas into object-oriented Python.
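
Because each ATOM line in this simplified format is just whitespace-separated fields, str.split is enough to pull out the number, symbol, and coordinates. The helper parse_atom_line is a hypothetical sketch for this simplified format only; real PDB files place fields in fixed column positions, so splitting on whitespace is not a reliable way to read them.

```python
from typing import Tuple


def parse_atom_line(line: str) -> Tuple[int, str, float, float, float]:
    """Split a simplified 'ATOM num symbol x y z' line into typed fields."""
    key, num, symbol, x, y, z = line.split()
    if key != 'ATOM':
        raise ValueError('not an ATOM line: ' + line)
    # Convert the text fields to the types an atom needs.
    return int(num), symbol, float(x), float(y), float(z)


print(parse_atom_line('ATOM 1 N 0.257 -0.363 0.000'))
# (1, 'N', 0.257, -0.363, 0.0)
```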

Class Atom

We might want to create an atom like this using information we read from the PDB file:

 nitrogen = Atom(1, "N", 0.257, -0.363, 0.0)

To do this, we’ll need a class called Atom with a constructor that creates all the appropriate instance variables:

 class Atom:
     """An atom with a number, symbol, and coordinates."""

     def __init__(self, num: int, sym: str, x: float, y: float,
                  z: float) -> None:
         """Create an Atom with number num, string symbol sym, and float
         coordinates (x, y, z).
         """

         self.number = num
         self.center = (x, y, z)
         self.symbol = sym
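
Before those methods exist, it helps to see what Python prints for an Atom. This self-contained sketch repeats only the constructor above. Without __str__ or __repr__, Python falls back to the default object representation, which shows just the class name and a memory address (the exact address differs from run to run).

```python
class Atom:
    """A minimal Atom with only a constructor, for comparison."""

    def __init__(self, num: int, sym: str, x: float, y: float,
                 z: float) -> None:
        self.number = num
        self.center = (x, y, z)
        self.symbol = sym


nitrogen = Atom(1, "N", 0.257, -0.363, 0.0)
# The instance variables are there...
print(nitrogen.symbol, nitrogen.center)  # N (0.257, -0.363, 0.0)
# ...but printing the object itself is not very informative.
print(nitrogen)  # something like <__main__.Atom object at 0x...>
```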

To inspect an Atom, we’ll want to provide __repr__ and __str__ methods:

     def __str__(self) -> str:
         """Return a string representation of this Atom in this format:

             (SYMBOL, X, Y, Z)
         """

         return '({0}, {1}, {2}, {3})'.format(
             self.symbol, self.center[0], self.center[1], self.center[2])

     def __repr__(self) -> str:
         """Return a string representation of this Atom in this format:

             Atom(NUMBER, "SYMBOL", X, Y, Z)
         """

         return 'Atom({0}, "{1}", {2}, {3}, {4})'.format(
             self.number, self.symbol,
             self.center[0], self.center[1], self.center[2])
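
Here is a small sketch of how the two methods differ in practice; it bundles the constructor with the __str__ and __repr__ methods above so it runs on its own. print and str use __str__, while repr and containers such as lists use __repr__. Because __repr__ produces a valid constructor call, eval can rebuild an equivalent Atom from it, which is fine for a demonstration but should never be done with untrusted input.

```python
class Atom:
    """An atom with a number, symbol, and coordinates."""

    def __init__(self, num: int, sym: str, x: float, y: float,
                 z: float) -> None:
        self.number = num
        self.center = (x, y, z)
        self.symbol = sym

    def __str__(self) -> str:
        return '({0}, {1}, {2}, {3})'.format(
            self.symbol, self.center[0], self.center[1], self.center[2])

    def __repr__(self) -> str:
        return 'Atom({0}, "{1}", {2}, {3}, {4})'.format(
            self.number, self.symbol,
            self.center[0], self.center[1], self.center[2])


nitrogen = Atom(1, "N", 0.257, -0.363, 0.0)
print(nitrogen)        # (N, 0.257, -0.363, 0.0)
print(repr(nitrogen))  # Atom(1, "N", 0.257, -0.363, 0.0)
print([nitrogen])      # [Atom(1, "N", 0.257, -0.363, 0.0)]

# The repr string is valid Python, so evaluating it builds a new Atom.
rebuilt = eval(repr(nitrogen))
print(rebuilt.center)  # (0.257, -0.363, 0.0)
```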

We’ll use those later when we define a class for molecules.

In visualizers, one common operation is translation, or moving an atom to a different location. We’d like to be able to write this to move the nitrogen atom up by 0.2 units along the z-axis:

 nitrogen.translate(0, 0, 0.2)

This code works as expected if we add the following method to class Atom:

     def translate(self, x: float, y: float, z: float) -> None:
         """Move this Atom by adding (x, y, z) to its coordinates.
         """

         self.center = (self.center[0] + x,
                        self.center[1] + y,
                        self.center[2] + z)
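
Since center is a tuple, and tuples are immutable, translate builds a new tuple instead of changing the old one in place. The sketch below shows the same method written with zip, a design alternative that avoids repeating each index; it is an illustration under that assumption, not a replacement for the version above.

```python
class Atom:
    """A stripped-down Atom with just enough to demonstrate translate."""

    def __init__(self, num: int, sym: str, x: float, y: float,
                 z: float) -> None:
        self.number = num
        self.center = (x, y, z)
        self.symbol = sym

    def translate(self, x: float, y: float, z: float) -> None:
        """Move this Atom by adding (x, y, z) to its coordinates."""
        # Pair each coordinate with its offset and build a brand-new tuple.
        self.center = tuple(c + d for c, d in zip(self.center, (x, y, z)))


nitrogen = Atom(1, "N", 0.257, -0.363, 0.0)
old_center = nitrogen.center
nitrogen.translate(0, 0, 0.2)
print(nitrogen.center)  # (0.257, -0.363, 0.2)
print(old_center)       # (0.257, -0.363, 0.0); the old tuple is untouched
```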

Class Molecule

Remember that we read PDB files one line at a time. When we reach the line containing COMPND AMMONIA, we know that we’re building a complex structure: a molecule with a name and a list of atoms. Here’s the start of a class for this, including an add method that adds an Atom to the molecule:

 from atom import Atom


 class Molecule:
     """A molecule with a name and a list of Atoms."""

     def __init__(self, name: str) -> None:
         """Create a Molecule named name with no Atoms.
         """

         self.name = name
         self.atoms = []

     def add(self, a: Atom) -> None:
         """Add a to my list of Atoms.
         """

         self.atoms.append(a)
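
Creating the list with self.atoms = [] inside __init__ means each Molecule gets its own list. The sketch below contrasts that with a hypothetical SharedMolecule that declares the list as a class attribute instead, so every instance appends to the same list. Plain strings stand in for Atom objects to keep it short.

```python
class Molecule:
    """Each instance creates its own atoms list in __init__."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.atoms = []

    def add(self, a: object) -> None:
        self.atoms.append(a)


class SharedMolecule:
    """Hypothetical buggy variant: one list shared by every instance."""

    atoms = []  # class attribute, created once for the whole class

    def __init__(self, name: str) -> None:
        self.name = name

    def add(self, a: object) -> None:
        self.atoms.append(a)


water, ammonia = Molecule("WATER"), Molecule("AMMONIA")
water.add("O")
print(ammonia.atoms)  # []

bad_water, bad_ammonia = SharedMolecule("WATER"), SharedMolecule("AMMONIA")
bad_water.add("O")
print(bad_ammonia.atoms)  # ['O']; the atom leaked into the other molecule
```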

As we read through the ammonia PDB information, we add atoms as we find them; here is the code from Multiline Records, rewritten to return a Molecule object instead of a list of lists:

 from typing import Optional, TextIO

 from atom import Atom
 from molecule import Molecule


 def read_molecule(r: TextIO) -> Optional[Molecule]:
     """Read a single molecule from r and return it,
     or return None to signal end of file.
     """
     # If there isn't another line, we're at the end of the file.
     line = r.readline()
     if not line:
         return None

     # Name of the molecule: "COMPND name"
     key, name = line.split()

     # Other lines are either "END" or "ATOM num kind x y z"
     molecule = Molecule(name)
     reading = True

     while reading:
         line = r.readline()
         if line.startswith('END'):
             reading = False
         else:
             key, num, kind, x, y, z = line.split()
             molecule.add(Atom(int(num), kind, float(x), float(y), float(z)))

     return molecule
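
To try read_molecule without a file on disk, io.StringIO can stand in for an open file, since it provides the same readline method. This self-contained sketch condenses Atom and Molecule to what read_molecule needs and adds a hypothetical read_all_molecules that keeps calling read_molecule until it returns None; that loop is the reason returning None at end of file is useful. The input is simply the ammonia record from earlier, repeated twice.

```python
import io
from typing import List, Optional, TextIO


class Atom:
    """Condensed Atom: number, symbol, and (x, y, z) center."""

    def __init__(self, num: int, sym: str, x: float, y: float,
                 z: float) -> None:
        self.number = num
        self.center = (x, y, z)
        self.symbol = sym


class Molecule:
    """Condensed Molecule: a name and a list of Atoms."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.atoms: List[Atom] = []

    def add(self, a: Atom) -> None:
        self.atoms.append(a)


def read_molecule(r: TextIO) -> Optional[Molecule]:
    """Read one molecule from r, or return None at end of file."""
    line = r.readline()
    if not line:
        return None
    key, name = line.split()
    molecule = Molecule(name)
    reading = True
    while reading:
        line = r.readline()
        if line.startswith('END'):
            reading = False
        else:
            key, num, kind, x, y, z = line.split()
            molecule.add(Atom(int(num), kind, float(x), float(y), float(z)))
    return molecule


def read_all_molecules(r: TextIO) -> List[Molecule]:
    """Hypothetical helper: read molecules until read_molecule returns None."""
    molecules = []
    molecule = read_molecule(r)
    while molecule is not None:
        molecules.append(molecule)
        molecule = read_molecule(r)
    return molecules


AMMONIA_PDB = """COMPND AMMONIA
ATOM 1 N 0.257 -0.363 0.000
ATOM 2 H 0.257 0.727 0.000
ATOM 3 H 0.771 -0.727 0.890
ATOM 4 H 0.771 -0.727 -0.890
END
"""

# Two copies of the same record, just to show the loop reading more than one.
molecules = read_all_molecules(io.StringIO(AMMONIA_PDB * 2))
print(len(molecules))                          # 2
print(molecules[0].name)                       # AMMONIA
print([a.symbol for a in molecules[0].atoms])  # ['N', 'H', 'H', 'H']
```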

If we compare the two versions, we can see that the code is nearly identical. The new version is just as easy to read as the old one, and arguably easier, because it includes type information. Here are the __str__ and __repr__ methods for Molecule:

     def __str__(self) -> str:
         """Return a string representation of this Molecule in this format:
             (NAME, (ATOM1, ATOM2, ...))
         """

         res = ''
         for atom in self.atoms:
             res = res + str(atom) + ', '

         # Strip off the trailing ', ' after the last atom.
         res = res[:-2]
         return '({0}, ({1}))'.format(self.name, res)

     def __repr__(self) -> str:
         """Return a string representation of this Molecule in this format:
             Molecule("NAME", (ATOM1, ATOM2, ...))
         """

         res = ''
         for atom in self.atoms:
             res = res + repr(atom) + ', '

         # Strip off the trailing ', ' after the last atom.
         res = res[:-2]
         return 'Molecule("{0}", ({1}))'.format(self.name, res)
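
Building res in a loop and then slicing off the trailing ', ' works, but str.join places the separator only between items, so no cleanup step is needed. This sketch is an alternative, not the chapter's version: Atom and Molecule are condensed, and a hypothetical loop_str function reproduces the loop-and-slice approach so the two can be compared. Both produce (NAME, ()) for a molecule with no atoms.

```python
class Atom:
    """Condensed Atom with the chapter's __str__ and __repr__ formats."""

    def __init__(self, num: int, sym: str, x: float, y: float,
                 z: float) -> None:
        self.number = num
        self.center = (x, y, z)
        self.symbol = sym

    def __str__(self) -> str:
        return '({0}, {1}, {2}, {3})'.format(self.symbol, *self.center)

    def __repr__(self) -> str:
        return 'Atom({0}, "{1}", {2}, {3}, {4})'.format(
            self.number, self.symbol, *self.center)


class Molecule:
    """Molecule whose string methods use str.join instead of a loop."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.atoms = []

    def add(self, a: Atom) -> None:
        self.atoms.append(a)

    def __str__(self) -> str:
        # join puts ', ' only between items, so nothing needs stripping.
        inner = ', '.join(str(atom) for atom in self.atoms)
        return '({0}, ({1}))'.format(self.name, inner)

    def __repr__(self) -> str:
        inner = ', '.join(repr(atom) for atom in self.atoms)
        return 'Molecule("{0}", ({1}))'.format(self.name, inner)


def loop_str(molecule: Molecule) -> str:
    """The loop-and-slice approach from the text, for comparison."""
    res = ''
    for atom in molecule.atoms:
        res = res + str(atom) + ', '
    res = res[:-2]
    return '({0}, ({1}))'.format(molecule.name, res)


ammonia = Molecule("AMMONIA")
ammonia.add(Atom(1, "N", 0.257, -0.363, 0.0))
ammonia.add(Atom(2, "H", 0.257, 0.727, 0.0))
print(str(ammonia) == loop_str(ammonia))  # True
print(ammonia)  # (AMMONIA, ((N, 0.257, -0.363, 0.0), (H, 0.257, 0.727, 0.0)))
```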

We’ll add a translate method to Molecule so that we can move the whole molecule, with all of its atoms, in a single call:

     def translate(self, x: float, y: float, z: float) -> None:
         """Move this Molecule, including all Atoms, by (x, y, z).
         """

         for atom in self.atoms:
             atom.translate(x, y, z)

And here we’ll call it:

 ammonia = Molecule("AMMONIA")
 ammonia.add(Atom(1, "N", 0.257, -0.363, 0.0))
 ammonia.add(Atom(2, "H", 0.257, 0.727, 0.0))
 ammonia.add(Atom(3, "H", 0.771, -0.727, 0.890))
 ammonia.add(Atom(4, "H", 0.771, -0.727, -0.890))
 ammonia.translate(0, 0, 0.2)
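
A quick way to confirm that Molecule.translate reached every atom is to compare the z-coordinates before and after the call. This condensed, self-contained sketch repeats just enough of Atom and Molecule to run. It uses math.isclose for the check, because floating-point sums such as -0.89 + 0.2 may not come out exactly equal to the decimal value you would write by hand.

```python
import math
from typing import List


class Atom:
    """Condensed Atom with a center and a translate method."""

    def __init__(self, num: int, sym: str, x: float, y: float,
                 z: float) -> None:
        self.number = num
        self.center = (x, y, z)
        self.symbol = sym

    def translate(self, x: float, y: float, z: float) -> None:
        self.center = (self.center[0] + x,
                       self.center[1] + y,
                       self.center[2] + z)


class Molecule:
    """Condensed Molecule that delegates translate to each Atom."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.atoms: List[Atom] = []

    def add(self, a: Atom) -> None:
        self.atoms.append(a)

    def translate(self, x: float, y: float, z: float) -> None:
        for atom in self.atoms:
            atom.translate(x, y, z)


ammonia = Molecule("AMMONIA")
ammonia.add(Atom(1, "N", 0.257, -0.363, 0.0))
ammonia.add(Atom(2, "H", 0.257, 0.727, 0.0))
ammonia.add(Atom(3, "H", 0.771, -0.727, 0.890))
ammonia.add(Atom(4, "H", 0.771, -0.727, -0.890))

before = [atom.center[2] for atom in ammonia.atoms]
ammonia.translate(0, 0, 0.2)
after = [atom.center[2] for atom in ammonia.atoms]

# Every atom should have moved up by 0.2 along the z-axis.
moved = all(math.isclose(b + 0.2, a) for b, a in zip(before, after))
print(moved)  # True
```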