Looking Ahead

Let’s add one final complication. Suppose that molecules didn’t have END markers but instead just a COMPND line followed by one or more ATOM lines. How would we read multiple molecules from a single file in that case?

 COMPND AMMONIA
 ATOM 1 N 0.257 -0.363 0.000
 ATOM 2 H 0.257 0.727 0.000
 ATOM 3 H 0.771 -0.727 0.890
 ATOM 4 H 0.771 -0.727 -0.890
 COMPND METHANOL
 ATOM 1 C -0.748 -0.015 0.024
 ATOM 2 O 0.558 0.420 -0.278
 ATOM 3 H -1.293 -0.202 -0.901
 ATOM 4 H -1.263 0.754 0.600
 ATOM 5 H -0.699 -0.934 0.609
 ATOM 6 H 0.716 1.404 0.137

At first glance, it doesn’t seem much different from the problem we just solved: read_molecule could extract the molecule’s name from the COMPND line and then read ATOM lines until it got either an empty string signaling the end of the file or another COMPND line signaling the start of the next molecule. But once it has read that COMPND line, the line isn’t available for the next call to read_molecule, so how can we get the name of the second molecule (and all the ones following it)?
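To see the problem concretely, here is a sketch of a hypothetical `naive_read_molecule` that does not look ahead. Each call reads its own COMPND line and stops when it reaches the next one. The function name, the use of `io.StringIO` as a stand-in for an open file, and the shortened sample data are all illustrative assumptions.

```python
from io import StringIO
from typing import TextIO

# Hypothetical reader that does NOT look ahead: it reads its own COMPND line,
# then reads ATOM lines until it sees the next COMPND line.
def naive_read_molecule(reader: TextIO) -> list:
    line = reader.readline()
    if not line:
        return []  # end of file: no molecule
    molecule = [line.split()[1]]  # assumes this line is a COMPND line
    line = reader.readline()
    while line and not line.startswith('COMPND'):
        key, num, atom_type, x, y, z = line.split()
        molecule.append([atom_type, x, y, z])
        line = reader.readline()
    # The COMPND line that stopped the loop is simply dropped here.
    return molecule

data = StringIO(
    'COMPND AMMONIA\n'
    'ATOM 1 N 0.257 -0.363 0.000\n'
    'COMPND METHANOL\n'
    'ATOM 1 C -0.748 -0.015 0.024\n'
    'ATOM 2 O 0.558 0.420 -0.278\n'
)
first = naive_read_molecule(data)
second = naive_read_molecule(data)
print(first[0])   # AMMONIA
print(second[0])  # 1, not METHANOL
```

The second call never sees `COMPND METHANOL`, so it mistakes the carbon's ATOM line for a header. It "names" the molecule `'1'` and loses the carbon atom entirely.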

To solve this problem, our functions must always “look ahead” one line. Let’s start with the function that reads multiple molecules:

 from typing import TextIO
 
 def read_all_molecules(reader: TextIO) -> list:
     """Read zero or more molecules from reader,
     returning a list of the molecules read.
     """
 
     result = []
     line = reader.readline()  # look ahead: read the first line
     while line:  # the empty string means end of file
         molecule, line = read_molecule(reader, line)
         result.append(molecule)
 
     return result

This function begins by reading the first line of the file. As long as that line is not the empty string (that is, the end of the file has not been reached), it passes both the opened file and that line to read_molecule. That function is supposed to return two things: the next molecule in the file, and the first line immediately after the end of that molecule (or an empty string if the end of the file has been reached). The returned line becomes the lookahead line for the next pass through the loop.

This simple description is enough to get us started writing the read_molecule function. The first thing it has to do is check that line is actually the start of a molecule. It then reads lines from reader one at a time, looking for one of three situations:

  • The end of the file, which signals the end of both the current molecule and the file

  • Another COMPND line, which signals the end of this molecule and the start of the next one

  • An ATOM line, which is to be added to the current molecule
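The three situations can be made concrete with a small hypothetical helper, `classify`, that labels the line read_molecule is currently looking at. The list relies on one detail of `readline`: it returns `''` only at end of file, while a blank line comes back as `'\n'`, which is truthy. The fourth label, `'other'`, is our own addition for lines that fit none of the three cases.

```python
# Hypothetical helper: names the situation read_molecule faces for one line.
def classify(line: str) -> str:
    if not line:                   # readline() returns '' only at end of file
        return 'end of file'
    if line.startswith('COMPND'):  # end of this molecule, start of the next
        return 'next molecule'
    if line.startswith('ATOM'):    # an atom to add to the current molecule
        return 'atom'
    return 'other'                 # e.g. a blank line, which is '\n', not ''

for line in ['', 'COMPND METHANOL\n', 'ATOM 2 O 0.558 0.420 -0.278\n', '\n']:
    print(repr(line), '->', classify(line))
```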

The most important thing is that when this function returns, it returns both the molecule and the next line, so that its caller can keep processing. The result is probably the most complicated function we have seen so far, but it is much easier to follow if you keep that idea in mind:

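 from typing import TextIO, Tuple
 
 def read_molecule(reader: TextIO, line: str) -> Tuple[list, str]:
     """Read a molecule from reader, where line refers to the first line of
     the molecule to be read. Return the molecule and the first line after
     it (or the empty string if the end of file has been reached).
     """
 
     # line must be the COMPND line that starts this molecule.
     fields = line.split()
     if not fields or fields[0] != 'COMPND':
         raise ValueError('Expected a COMPND line, got: ' + repr(line))
     molecule = [fields[1]]
 
     # Read lines until end of file ('') or the next molecule's COMPND line.
     line = reader.readline()
     while line and not line.startswith('COMPND'):
         fields = line.split()
         if fields and fields[0] == 'ATOM':  # skip blank and non-ATOM lines
             key, num, atom_type, x, y, z = fields
             molecule.append([atom_type, x, y, z])
         line = reader.readline()
 
     # Hand back the lookahead line so the caller can start the next molecule.
     return molecule, line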
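To check that the lookahead really hands each COMPND line on to the next call, here is a small self-contained run over the sample file shown earlier. It is a sketch under a few assumptions: `io.StringIO` stands in for an open file (any `TextIO` object would do), the two functions are repeated in compact form so the block runs on its own, and a `fields and` guard is included so that a blank line is skipped rather than raising `IndexError`.

```python
from io import StringIO
from typing import TextIO, Tuple

def read_all_molecules(reader: TextIO) -> list:
    result = []
    line = reader.readline()            # prime the lookahead
    while line:                         # '' means end of file
        molecule, line = read_molecule(reader, line)
        result.append(molecule)
    return result

def read_molecule(reader: TextIO, line: str) -> Tuple[list, str]:
    fields = line.split()               # line is this molecule's COMPND line
    molecule = [fields[1]]
    line = reader.readline()
    while line and not line.startswith('COMPND'):
        fields = line.split()
        if fields and fields[0] == 'ATOM':
            key, num, atom_type, x, y, z = fields
            molecule.append([atom_type, x, y, z])
        line = reader.readline()
    return molecule, line               # the molecule plus the lookahead line

SAMPLE = """COMPND AMMONIA
ATOM 1 N 0.257 -0.363 0.000
ATOM 2 H 0.257 0.727 0.000
ATOM 3 H 0.771 -0.727 0.890
ATOM 4 H 0.771 -0.727 -0.890
COMPND METHANOL
ATOM 1 C -0.748 -0.015 0.024
ATOM 2 O 0.558 0.420 -0.278
ATOM 3 H -1.293 -0.202 -0.901
ATOM 4 H -1.263 0.754 0.600
ATOM 5 H -0.699 -0.934 0.609
ATOM 6 H 0.716 1.404 0.137
"""

molecules = read_all_molecules(StringIO(SAMPLE))
for molecule in molecules:
    print(molecule[0], len(molecule) - 1)  # name and number of atoms
# AMMONIA 4
# METHANOL 6
```

Both names survive because the `COMPND METHANOL` line that ends the first molecule is returned to read_all_molecules and passed straight into the second call.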