Dr. Jaromczyk's Research Group Summer 2005
Computer Science Department
Dr. Jaromczyk
May 18--Josh Gilkerson spoke about sequence alignment. This technique compares two (or more) DNA sequences to determine their similarity. Local alignment matches a small sequence against a larger one, while global alignment scores the similarity of two large sequences. Sequences are close if only a few changes, such as changing base pairs or adding gaps, can transform one sequence into another. Dynamic programming algorithms such as Needleman-Wunsch can perform sequence alignment in O(n2) time and space. Some variations can reduce the space requirements. Multiple alignment, comparing more than two sequences, is much more difficult, and is often only approximated.
We also used BLAST, a popular sequence alignment program, to run sample searches on publicly available genome databases. An online search is available from the National Center for Biotechnology Information.
May 19--Dr. Jaromczyk spoke about dynamic programming. In this particular application, we want to find the distance between two sequences which can be transformed by changing a letter, inserting a letter, or deleting a letter. A common definition of distance is the edit distance--the minimum number of changes to change one string to the other. Handouts provided details on specific dynamic programming algorithms.
May 20--Two groups of students presented their implementations of
dynamic programming algorithms used to score the similarity of three languages--Spanish, Catalan, and Polish. These programs will eventually be available online.
May 24--Susan K. Smith from the Shaver Engineering Library demonstrated some of the databases available through UK for searching for technical articles. Many databases can be searched by such criterion as author, abstract, and full text. In some cases, the entire article may be available online. In other cases, it may be available as a physical copy (check InfoKat) or through the Interlibrary Loan.
May 25--David Egts from SGI
spoke about their Prism system for visualization. Many of their
customers have experienced an explosive growth in data size, outpacing
the growth of processor speed. For this reason, parallelization is
needed, along with a way to filter out relevant data. According to the
presentation, their system uses mainly off-the-shelf components and can
be easily expanded to 512 processors and many gigabytes of RAM. It uses
OpenGL extensions to efficiently render scenes in real time, and runs on
open source software.
June 2--Viktor Ivanov gave an introduction to PHP. PHP scripts
can be embedded in HTML, and the server transforms them into pure HTML to
send to the client. While it's not a true programming language, PHP
contains many similar features like variables and if blocks. It is
particularly useful for such tasks as making data submission forms and
interfacing with databases.
June 3--Paul Thacker gave an introduction to relational
databases. Topics included the entity-relationship model,
normalization, and SQL. Notes for the presentation can be found here.
June 6--Eren Turgay gave a presentation on using PHP to access a
MySQL database. With simple commands, PHP can connect to a database and
provide MySQL commands in the form of a string. This can be useful for
such things as entering data from a PHP form into a database or
formatting data from a database for display online. A few things to
remember are to always close a connection at the end, to free memory
when finished with large results, and to lock tables to ensure
integrity. For a complete list of commands, go to this site. Go here for some
simple examples. The PowerPoint
presentation is also available.
June 8--Joe Ibershoff gave a presentation on "Optimal, Efficient
Reconstruction of Phylogenetic Networks with Constrained Recombination."
A phylogenetic network is a DAG with a single root node and every other
node having either one or two incoming edges--one for mutation nodes or
two for recombination nodes. They are useful because they provide a
possible explanation of the history of mutations in a gene sequence.
Changes can occur either through a mutation of a base or a
recombination, which combines the prefix from one part of the tree with
a suffix from another. To make the calculations manageable, we might
restrict the form to a gall tree, in which all underlying recombination
cycles share no nodes with other recombination cycles.
June 9--Jesse Stewart gave a presentation on "Volume
Visualization and the Marching Cubes Algorithm." In general, volume
visualization takes 3D data (perhaps biomedical) and displays it in a
useful way. Techniques include direct volume rendering, which
casts rays on the data to give a realistic (but slow) rendering, and
surface fitting, which fits iso-surfaces to the data to approximate it
with a polygonal mesh. The mesh might be formed through the marching
cubes algorithm. The algorithm iterates through neighboring pairs of
slices. For each pair, it forms a large cube using four bordering
voxels from each slice. Based upon whether each corner is inside or
outside of the surface, there are 256 possible configurations, falling
into 15 different families. This classification determines how
triangles are added to the surface. The presentation is
available in both PowerPoint and HTML formats.
June 10--Joe Ibershoff gave a presentation on StarLogo. StarLogo
is an expansion of the Logo language designed to explore decentralized
systems, perhaps systems of biological agents or physical particles.
Each component is called a turtle, and the turtles can be made to
perform various functions. An observer then oversees the entire
process. Possible simulations include self-segregation and the
predator/prey
model. For more information, visit the Official StarLogo Website.
Also, a similar variation called NetLogo is available.
June 13--Paul Thacker provided an overview of cluster analysis.
Topic included a definition of cluster analysis, terminology, a summary
of the entire process, and details of a few specific clustering
algorithms. Notes for the presentation can be found here.
June 15--Eren Turgay gave a presentation on medical image
registration. Registration matches or aligns common data across two or
more images. It is used in a variety of fields including cartography,
astronomy, and medicine. In medicine, it can detect tumors and diseases
and track recovery. Temporal registration compares images of the same
object taken with the same instrument, but at different times.
Multimodal registration compares the same scene measured with different
instruments. Three major registration techniques were discussed. Linear
registration uses affine transformations. Non-linear techniques are
more useful for medicine and use various elastic models. Hierarchical
models make several passes going from course to fine detail in terms of
either data, warp complexity, or the model used. The more complex
methods tend to give better results, but are more costly. The PowerPoint presentation is available. A
bibliography of relevant papers is also available here.
Additionally, Levon Ter-Isahakyan demonstrated a NetLogo program
simulating an ant trail. If the ants are set to secrete a chemical,
they tend to move together in a line. Otherwise, the movement is
more random. The idea for the simulation comes from the book
Self-Organization in Biological Systems by Scott Camazine and
others.
June 16--Albert Kalim gave a presentation on "Visualization in
Bioinformatics." He discussed various programs, including
general-purpose visualization tools and programs designed specifically
for molecular viewing. Some general-purpose tools include GNUPlot,
SciLab, and Persistence of Vision Raytracer. Molecular viewers can fall
into several categories, including microarray design and analysis,
sequence analysis, molecular evolution, and proteomics.
Also, Viktor Ivanov spoke about web resources for bioinformatics. Go here for the complete list of links. The links
were reviewed for relevance, but none are actually endorsed by our
program.
June 17--Casey Lengacher gave a presentation on "Proteins,
Protein
Folding, and the Lattice Model." Proteins are formed by chains of amino
acids. The protein folds into a particular shape--called its tertiary
structure--which affects its function. It is very difficult to
determine the shape based upon the amino acid sequence. To make the
problem manageable, we reduce the number of possibilities by using a
lattice model. We then find the lowest energy state based upon
hydrophobic and hydrophilic attractions. The presentation is
available in both PowerPoint and HTML formats. Also, a survey of relevant
papers is available.
Additionally, Isaiah spoke about the Big Apple server. This is a
four-node Apple workgroup with over 170 open source bioinformatics
programs installed. It is designed to efficiently handle many
concurrent processes, and is accessible from the Internet.
June 20--Dr. Danny van Noort have a presentation on "DNA
Computers in Microfluidics." Dr. van Noort is an experimentalist
working at Seaul National University in South Korea. DNA computing uses
DNA molecules to perform computations. Using the selection principle,
we make strands representing all possible solutions to a problem and
filter out the correct ones. This can be rather slow, but allows for
tremendous parallelization. Microfluidics uses systems of tiny valves
which can be opened and closed with pneumatic chambers to program the
equivalent of Boolean operations.
June 21--The group toured some of the laboratories in the Plant
Science Building. One of the labs performs gene sequencing for other
enterprises on a contract basis. Another lab is investigating the
symbiotic relationship between some grass and fungus, which can result
in poisoning of livestock.
June 22--Levon Ter-Isahakyan spoke about hidden Markov models. A
Markov model explores the space of all possible solutions to a problem
by walking through various states, driven by probabilities and
considering only the immediate surroundings. The probabilities can be
represented as transition matrices. If the states are hidden, the
representation is a hidden Markov model. In bioinformatics, this can be
used to search for alignments. The process can be trained from
initially unaligned sequences using the Baum-Welch expectation
maximization or gradient descent methods.
June 23--Josh Gilkerson gave a presentation on "Finding Consensus
in a Family of DNA Sequences." In biology, a consensus sequence can
find slightly different variations of a sequence of bases in various DNA
sources. Starting with a family of sequences, the consensus sequence
minimizes the sum of the edit distances (mutations, additions, or
deletions) going from each member of the family to that sequence. Such
problems (in the general class of combinatorial optimization) are
NP-hard, so we generally use approximation techniques. Two such
techniques are simulated annealing and genetic algorithms. For further
details, see the presentation in either PowerPoint or HTML formats.
June 26-June 29--Several students attended the ISMB 2005 conference in
Detroit, Michigan.
June 30--Britt Selvitelle gave a presentation on Ruby on Rails, a
web framework similar to Python. Potential uses include automatically
generating a framework for blogs and working with databases in an
object-oriented way. For further details, visit the official website.
University of
Kentucky, Lexington, KY - 40506
Website maintained by Paul
Thacker
File last modified: Saturday, 13-Aug-2005 00:53:21 EDT$