Molecular Design: October 2009

Saturday 3 October 2009

Screening Libraries: Diversity & Coverage

I’m guessing that this may be the first blog post on screening library design to be written in Tierra del Fuego. The weather is currently rather unpleasant although less so than an hour ago when the snow was horizontal.

I introduced screening library design in the previous post with a generalised view of the workflow for fragment-based lead generation. When selecting compounds for screening it can be helpful to think in terms of a chemical space in which all possible compounds (real or virtual) can be found. Now you’ve just got to sample the regions of chemical space that you like and you’ve got your library.

Life of course is not so easy. The main problem is that, despite the occasional claim to the contrary, nobody has found a convincing set of coordinates with which to describe chemical space usefully. You can sort of describe organic molecules by size and polarity without having to worry about minor irritations like conformational flexibility, ionisation and tautomers. However, molecular recognition also depends on the shapes of molecules which, even for rigid species, are not so easy to turn into coordinates, especially when you want these coordinates to be predictive of biological activity.

All is not lost since structurally similar molecules often have similar biological properties. One way that the similarity of a pair of compounds can be quantified is by comparing their molecular connection tables (the structures that you would write down on a piece of paper) and for this reason we sometimes talk about 2D similarity. There is no need for 3D molecular structures when you calculate molecular similarity in this way which means that there is no need to deal with conformations. Molecular fingerprints are used frequently to calculate similarity and the idea behind this is that the fingerprints encode the presence or absence of structural features in molecules. Many shared features suggest that two molecules are likely to be very similar. I’ll not go into the details of fingerprints in this post although you’ll be able to find some detailed discussion in our screening library design article.
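As a rough illustration of how shared features turn into a similarity score, here is a minimal sketch. It assumes that each fingerprint is held as a set of "on" feature indices and that similarity is scored with the Tanimoto coefficient, which is a common choice but not necessarily the one used in the article. The fingerprints themselves are made up for the example.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient for two fingerprints stored as sets of 'on' features."""
    if not fp_a and not fp_b:
        return 1.0  # convention: two empty fingerprints are treated as identical
    shared = len(fp_a & fp_b)
    # shared features divided by the number of distinct features in either molecule
    return shared / (len(fp_a) + len(fp_b) - shared)


# Hypothetical fingerprints: each integer stands for one structural feature.
fp_parent = {1, 4, 7, 9, 12}
fp_analogue = {1, 4, 7, 9, 15}   # differs from the parent in one feature
fp_unrelated = {2, 3, 20}        # shares nothing with the parent

sim_analogue = tanimoto(fp_parent, fp_analogue)
sim_unrelated = tanimoto(fp_parent, fp_unrelated)
```

The close analogue shares four of the six distinct features and scores about 0.67, while the unrelated molecule scores zero. A distance for use in selection can be taken as one minus the similarity.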

The downside of 2D molecular similarity measures is that they are unlikely to reveal any but the most trivial shape match or pharmacophoric (e.g. oxadiazole replaces ester) similarity between molecules. This is not too much of a problem in library design because you’ll often want to select both molecules if they are based on different scaffolds, even if they can both orient their hydrogen bonding groups in a similar way. Once you’ve found some active compounds it becomes a very different game because now you’ll be looking for less obvious similarities between these actives, either to extract structure activity relationships or to define search queries.

So even though we don’t have a set of coordinates that defines chemical space in a way that is predictive of biological properties of molecules, we can still use molecular similarity to sample from a collection of compounds. Figure 1 illustrates how this sampling works and will give you an idea of what we mean by the terms diversity and coverage. The key thing to remember when looking at Figure 1 is that similar compounds are close to each other so there is an inverse relationship between distance and similarity. The stars are selected to cover the chemical space occupied by all the molecules, and a star can’t cover its neighbourhood effectively if the compounds in it are too far away.
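One way to put numbers on these two terms is sketched below. The definitions are assumptions made for illustration rather than those of the article: coverage is taken as the fraction of compounds lying within a chosen radius of at least one selected compound (a star), and diversity as the smallest distance between any two selected compounds. The one-dimensional positions are invented purely to generate a distance matrix.

```python
def coverage(dist, selected, radius):
    """Fraction of all compounds within `radius` of at least one selected compound."""
    covered = sum(
        1 for i in range(len(dist))
        if any(dist[i][s] <= radius for s in selected)
    )
    return covered / len(dist)


def diversity(dist, selected):
    """Smallest pairwise distance among the selected compounds (needs two or more)."""
    return min(
        dist[a][b]
        for idx, a in enumerate(selected)
        for b in selected[idx + 1:]
    )


# Hypothetical compounds on a line: one cluster at 0-2 and another at 8-9.
positions = [0, 1, 2, 8, 9]
dist = [[abs(p - q) for q in positions] for p in positions]

spread_pick = [1, 3]    # one star in each cluster
clumped_pick = [0, 1]   # both stars in the same cluster

spread_scores = (coverage(dist, spread_pick, radius=1), diversity(dist, spread_pick))
clumped_scores = (coverage(dist, clumped_pick, radius=1), diversity(dist, clumped_pick))
```

With one star per cluster every compound is covered and the stars are 7 units apart. With both stars in the first cluster only three of the five compounds are covered and the stars are 1 unit apart.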

Although I needed to put the molecules in particular positions (i.e. give them coordinates) to generate the graphic, you only need the distances between molecules to select representative subsets. In our paper we described in-house software which can be used to do this and the two programs (Flush and BigPicker) are actually quite complementary to each other. Left to its own devices, BigPicker tends to select compounds with no near neighbours and we typically use Flush to ensure that the compounds that BigPicker is selecting from all have a sufficient number of neighbours.
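The sketch below shows the general idea of pairing a neighbour filter with a diversity picker using nothing but a distance matrix. It is not the Flush or BigPicker algorithm, neither of which is described in this post. The filter simply counts neighbours within a radius and the picker is a generic greedy MaxMin selection, both chosen as stand-ins. All function names, the radius and the data are hypothetical.

```python
def filter_by_neighbours(dist, radius, min_neighbours):
    """Keep compounds that have at least `min_neighbours` others within `radius`."""
    n = len(dist)
    kept = []
    for i in range(n):
        count = sum(1 for j in range(n) if j != i and dist[i][j] <= radius)
        if count >= min_neighbours:
            kept.append(i)
    return kept


def maxmin_pick(dist, candidates, n_pick, seed):
    """Greedy MaxMin: repeatedly add the candidate farthest from those already picked."""
    selected = [seed]
    remaining = [c for c in candidates if c != seed]
    while remaining and len(selected) < n_pick:
        # score each candidate by its distance to the nearest selected compound
        best = max(remaining, key=lambda c: min(dist[c][s] for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical compounds: two clusters plus a singleton far from everything else.
positions = [0, 1, 2, 10, 11, 12, 50]
dist = [[abs(p - q) for q in positions] for p in positions]

# Picking from everything: the singleton (index 6) is chosen straight away.
unfiltered = maxmin_pick(dist, list(range(len(dist))), n_pick=2, seed=0)

# Filtering first removes the singleton, so the second pick comes from the other cluster.
well_populated = filter_by_neighbours(dist, radius=2, min_neighbours=1)
filtered = maxmin_pick(dist, well_populated, n_pick=2, seed=0)
```

The picker on its own goes for the isolated compound, which is the tendency described above. Once compounds lacking neighbours have been filtered out, the selection is drawn from regions that are actually populated.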

This is probably a good point at which to leave things. In the next post, I’ll describe the Core and Layer approach to selecting compounds for screening. This method is not specific to fragment libraries and in fact I’ve used it in the work-up of high-throughput screening output and in the selection of compounds for cell-based assays.

Literature cited

Blomberg et al., Design of compound libraries for fragment screening. JCAMD 2009, 23, 513–525.

Grant & Pickup, A Gaussian Description of Molecular Shape. J. Phys. Chem. 1995, 99, 3503–3510.