About Target Searching

Four search methods, each accessed individually through the "Search" drop-down menu, can be used to look for potential genomic sites that can be targeted by engineered homing endonucleases:

  1. "Central Four Search": This returns all DNA sequences in the genomic query sequence that contain exact matches to the 'central four' base pairs (positions -2 to +2 in the corresponding target site of the LAGLIDADG homing endonuclease, or LHE). If the ability of an LHE to accommodate variants of the central four base pairs has been determined, then any potential sites that contain those variations are also displayed. This search does not consider any flanking positions in the potential target.

    Why: Our experience indicates that mismatches between an LHE's natural DNA target site and a potential genomic target site at any of the central four base pairs are often not easily accommodated through protein engineering or selection; it is therefore usually strongly preferred to target sites that are well matched to the inherent cleavage preferences of the wild-type LHE at those base pair positions.

  2. "Identity Search": This examines the genomic query sequence for those 20 base pair sequences that harbor the fewest total mismatches relative to the wild-type LHE target. The penalty for mismatches in the central four base pair positions remains very stringent, for the reasons described above.

    Why: Our experience indicates that an efficient strategy for retasking homing endonuclease specificity to genomic sites is to choose a wild-type LHE whose natural target site is as closely related to the desired site as possible, for ease of engineering and selection.

  3. "PWM Search": For proteins whose complete specificity profile has been determined (i.e. the fidelity of recognition at each position and the relative effect of base pair substitutions at each position on overall cleavage activity are known), the user can search using a position weight matrix (PWM) that applies variable penalties to base pair mismatches depending on the enzyme's specificity and fidelity at each position.

    Why: The difference between the search strategies described in (2) and (3) above is simple: a simple identity matrix returns only those matches with the fewest mismatches. In contrast, a scoring matrix that accounts for recognition degeneracy at individual DNA base pair positions can identify sites that are more distantly related to the protein's wild-type recognition site but are nevertheless more tractable gene targeting sites.

  4. "Module Search": For those LHEs that have been systematically assayed for overall 'engineerability' of individual pockets against all possible DNA codons, an advanced scoring matrix is available that identifies the best hits based upon a modular approach to DNA target site identification.

    Why: Screening or designing specificity changes at individual base pair positions and then trying to combine those separate solutions to create an active, specific enzyme against a novel genomic target is often problematic, because there is often significant cross-talk between adjacent base pairs in the DNA target and between the corresponding amino acid residues. These effects can be accounted for only by the computational design of enzymes that can tolerate multiple base pair substitutions, or by high-throughput selection of enzymes that can tolerate multiple adjacent base pair substitutions.
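
The first three search styles above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the server's actual implementation: the 20 bp TARGET string is a made-up placeholder rather than a real LHE site, the central-four penalty of 5 and all function names are hypothetical, only the given strand is scanned, and the PWM is assumed to hold per-position penalties (lower is better).

```python
SITE_LEN = 20
CENTRAL_START, CENTRAL_END = 8, 12   # positions -2..+2 of a 20 bp site (0-based, end exclusive)

# Placeholder target for illustration only; NOT a real LHE recognition site.
TARGET = "CAGTCAGTGTACTGACTGAC"


def windows(genome, size=SITE_LEN):
    """Yield (start, site) for every window of `size` bases on the given strand."""
    genome = genome.upper()
    for start in range(len(genome) - size + 1):
        yield start, genome[start:start + size]


def central_four_search(genome, allowed_central):
    """Keep windows whose central four bases are in `allowed_central`; flanks are ignored."""
    return [(start, site) for start, site in windows(genome)
            if site[CENTRAL_START:CENTRAL_END] in allowed_central]


def identity_score(site, target, central_penalty=5):
    """Count mismatches to `target`, weighting central-four mismatches more heavily."""
    score = 0
    for i, (a, b) in enumerate(zip(site, target)):
        if a != b:
            score += central_penalty if CENTRAL_START <= i < CENTRAL_END else 1
    return score


def identity_search(genome, target, central_penalty=5, top=5):
    """Return the `top` windows with the lowest identity score as (score, start, site)."""
    hits = [(identity_score(site, target, central_penalty), start, site)
            for start, site in windows(genome)]
    return sorted(hits)[:top]


def pwm_score(site, pwm):
    """Sum per-position penalties; pwm[i][base] is the penalty for `base` at position i."""
    return sum(pwm[i][base] for i, base in enumerate(site))


def pwm_search(genome, pwm, top=5):
    """Return the `top` windows with the lowest PWM penalty, skipping ambiguous bases."""
    hits = [(pwm_score(site, pwm), start, site)
            for start, site in windows(genome) if set(site) <= set("ACGT")]
    return sorted(hits)[:top]


if __name__ == "__main__":
    genome = "GG" + TARGET + "TT"
    print(central_four_search(genome, {"GTAC"}))
    print(identity_search(genome, TARGET, top=3))
    # Identity-like penalty matrix, then mark position 0 as degenerate (C or T tolerated).
    pwm = [{b: 0.0 if b == t else 1.0 for b in "ACGT"} for t in TARGET]
    pwm[0]["T"] = 0.0
    print(pwm_search("T" + TARGET[1:], pwm))
```

In this sketch, a site that differs from the placeholder target only at the degenerate position 0 scores 1 under the identity search but 0 under the PWM search, which is the distinction drawn between strategies (2) and (3). The module search is not sketched here because it depends on per-pocket assay data that this page does not describe.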