Dr. Jaromczyk's Research Group Summer 2005

University Of Kentucky

Computer Science Department
Dr. Jaromczyk

Looking for an earlier version of this page? Go to Summer 2003 or Spring 2005.
You may also want to visit the Summer 2004 website.

During the summer, several undergraduate and graduate students will be working on computer science projects and attending educational talks. This page will summarize these activities.

May 18--Josh Gilkerson spoke about sequence alignment. This technique compares two (or more) DNA sequences to determine their similarity. Local alignment matches a small sequence against a larger one, while global alignment scores the similarity of two large sequences. Sequences are close if only a few changes, such as changing base pairs or adding gaps, can transform one sequence into another. Dynamic programming algorithms such as Needleman-Wunsch can perform sequence alignment in O(n2) time and space. Some variations can reduce the space requirements. Multiple alignment, comparing more than two sequences, is much more difficult, and is often only approximated.

We also used BLAST, a popular sequence alignment program, to run sample searches on publicly available genome databases. An online search is available from the National Center for Biotechnology Information.

May 19--Dr. Jaromczyk spoke about dynamic programming. In this particular application, we want to find the distance between two sequences which can be transformed by changing a letter, inserting a letter, or deleting a letter. A common definition of distance is the edit distance--the minimum number of changes to change one string to the other. Handouts provided details on specific dynamic programming algorithms.

May 20--Two groups of students presented their implementations of dynamic programming algorithms used to score the similarity of three languages--Spanish, Catalan, and Polish. These programs will eventually be available online.

May 24--Susan K. Smith from the Shaver Engineering Library demonstrated some of the databases available through UK for searching for technical articles. Many databases can be searched by such criterion as author, abstract, and full text. In some cases, the entire article may be available online. In other cases, it may be available as a physical copy (check InfoKat) or through the Interlibrary Loan.

May 25--David Egts from SGI spoke about their Prism system for visualization. Many of their customers have experienced an explosive growth in data size, outpacing the growth of processor speed. For this reason, parallelization is needed, along with a way to filter out relevant data. According to the presentation, their system uses mainly off-the-shelf components and can be easily expanded to 512 processors and many gigabytes of RAM. It uses OpenGL extensions to efficiently render scenes in real time, and runs on open source software.

June 2--Viktor Ivanov gave an introduction to PHP. PHP scripts can be embedded in HTML, and the server transforms them into pure HTML to send to the client. While it's not a true programming language, PHP contains many similar features like variables and if blocks. It is particularly useful for such tasks as making data submission forms and interfacing with databases.

June 3--Paul Thacker gave an introduction to relational databases. Topics included the entity-relationship model, normalization, and SQL. Notes for the presentation can be found here.

June 6--Eren Turgay gave a presentation on using PHP to access a MySQL database. With simple commands, PHP can connect to a database and provide MySQL commands in the form of a string. This can be useful for such things as entering data from a PHP form into a database or formatting data from a database for display online. A few things to remember are to always close a connection at the end, to free memory when finished with large results, and to lock tables to ensure integrity. For a complete list of commands, go to this site. Go here for some simple examples. The PowerPoint presentation is also available.

June 8--Joe Ibershoff gave a presentation on "Optimal, Efficient Reconstruction of Phylogenetic Networks with Constrained Recombination." A phylogenetic network is a DAG with a single root node and every other node having either one or two incoming edges--one for mutation nodes or two for recombination nodes. They are useful because they provide a possible explanation of the history of mutations in a gene sequence. Changes can occur either through a mutation of a base or a recombination, which combines the prefix from one part of the tree with a suffix from another. To make the calculations manageable, we might restrict the form to a gall tree, in which all underlying recombination cycles share no nodes with other recombination cycles.

June 9--Jesse Stewart gave a presentation on "Volume Visualization and the Marching Cubes Algorithm." In general, volume visualization takes 3D data (perhaps biomedical) and displays it in a useful way. Techniques include direct volume rendering, which casts rays on the data to give a realistic (but slow) rendering, and surface fitting, which fits iso-surfaces to the data to approximate it with a polygonal mesh. The mesh might be formed through the marching cubes algorithm. The algorithm iterates through neighboring pairs of slices. For each pair, it forms a large cube using four bordering voxels from each slice. Based upon whether each corner is inside or outside of the surface, there are 256 possible configurations, falling into 15 different families. This classification determines how triangles are added to the surface. The presentation is available in both PowerPoint and HTML formats.

June 10--Joe Ibershoff gave a presentation on StarLogo. StarLogo is an expansion of the Logo language designed to explore decentralized systems, perhaps systems of biological agents or physical particles. Each component is called a turtle, and the turtles can be made to perform various functions. An observer then oversees the entire process. Possible simulations include self-segregation and the predator/prey model. For more information, visit the Official StarLogo Website. Also, a similar variation called NetLogo is available.

June 13--Paul Thacker provided an overview of cluster analysis. Topic included a definition of cluster analysis, terminology, a summary of the entire process, and details of a few specific clustering algorithms. Notes for the presentation can be found here.

June 15--Eren Turgay gave a presentation on medical image registration. Registration matches or aligns common data across two or more images. It is used in a variety of fields including cartography, astronomy, and medicine. In medicine, it can detect tumors and diseases and track recovery. Temporal registration compares images of the same object taken with the same instrument, but at different times. Multimodal registration compares the same scene measured with different instruments. Three major registration techniques were discussed. Linear registration uses affine transformations. Non-linear techniques are more useful for medicine and use various elastic models. Hierarchical models make several passes going from course to fine detail in terms of either data, warp complexity, or the model used. The more complex methods tend to give better results, but are more costly. The PowerPoint presentation is available. A bibliography of relevant papers is also available here.

Additionally, Levon Ter-Isahakyan demonstrated a NetLogo program simulating an ant trail. If the ants are set to secrete a chemical, they tend to move together in a line. Otherwise, the movement is more random. The idea for the simulation comes from the book Self-Organization in Biological Systems by Scott Camazine and others.

June 16--Albert Kalim gave a presentation on "Visualization in Bioinformatics." He discussed various programs, including general-purpose visualization tools and programs designed specifically for molecular viewing. Some general-purpose tools include GNUPlot, SciLab, and Persistence of Vision Raytracer. Molecular viewers can fall into several categories, including microarray design and analysis, sequence analysis, molecular evolution, and proteomics.

Also, Viktor Ivanov spoke about web resources for bioinformatics. Go here for the complete list of links. The links were reviewed for relevance, but none are actually endorsed by our program.

June 17--Casey Lengacher gave a presentation on "Proteins, Protein Folding, and the Lattice Model." Proteins are formed by chains of amino acids. The protein folds into a particular shape--called its tertiary structure--which affects its function. It is very difficult to determine the shape based upon the amino acid sequence. To make the problem manageable, we reduce the number of possibilities by using a lattice model. We then find the lowest energy state based upon hydrophobic and hydrophilic attractions. The presentation is available in both PowerPoint and HTML formats. Also, a survey of relevant papers is available.

Additionally, Isaiah spoke about the Big Apple server. This is a four-node Apple workgroup with over 170 open source bioinformatics programs installed. It is designed to efficiently handle many concurrent processes, and is accessible from the Internet.

June 20--Dr. Danny van Noort have a presentation on "DNA Computers in Microfluidics." Dr. van Noort is an experimentalist working at Seaul National University in South Korea. DNA computing uses DNA molecules to perform computations. Using the selection principle, we make strands representing all possible solutions to a problem and filter out the correct ones. This can be rather slow, but allows for tremendous parallelization. Microfluidics uses systems of tiny valves which can be opened and closed with pneumatic chambers to program the equivalent of Boolean operations.

June 21--The group toured some of the laboratories in the Plant Science Building. One of the labs performs gene sequencing for other enterprises on a contract basis. Another lab is investigating the symbiotic relationship between some grass and fungus, which can result in poisoning of livestock.

June 22--Levon Ter-Isahakyan spoke about hidden Markov models. A Markov model explores the space of all possible solutions to a problem by walking through various states, driven by probabilities and considering only the immediate surroundings. The probabilities can be represented as transition matrices. If the states are hidden, the representation is a hidden Markov model. In bioinformatics, this can be used to search for alignments. The process can be trained from initially unaligned sequences using the Baum-Welch expectation maximization or gradient descent methods.

June 23--Josh Gilkerson gave a presentation on "Finding Consensus in a Family of DNA Sequences." In biology, a consensus sequence can find slightly different variations of a sequence of bases in various DNA sources. Starting with a family of sequences, the consensus sequence minimizes the sum of the edit distances (mutations, additions, or deletions) going from each member of the family to that sequence. Such problems (in the general class of combinatorial optimization) are NP-hard, so we generally use approximation techniques. Two such techniques are simulated annealing and genetic algorithms. For further details, see the presentation in either PowerPoint or HTML formats.

June 26-June 29--Several students attended the ISMB 2005 conference in Detroit, Michigan.

June 30--Britt Selvitelle gave a presentation on Ruby on Rails, a web framework similar to Python. Potential uses include automatically generating a framework for blogs and working with databases in an object-oriented way. For further details, visit the official website.


University of Kentucky, Lexington, KY - 40506
Website maintained by Paul Thacker


File last modified: Saturday, 13-Aug-2005 00:53:21 EDT$