I am currently part of the AI Scholars program at the University of Florida, where I work under Dr. Catalin Voiniciuc
at the Designer Glycans Lab and under Dr. Wenjun Xie.
My project is "Synthetic Biology Meets Generative AI: Extending Diffusion Models to DNA Sequences for Programmable CSLA Enzyme
Generation". I am researching how to computationally generate CSLA enzymes with programmable activities. These enzymes produce
plant β-mannan polysaccharides, and the work draws on our lab's collected data on their structures and functions.
Mannans are a class of hemicellulose polysaccharides that are critical to the food, materials, energy, and biomedical
industries. They are present in plant cell walls and are a major source of biomass. Structural variations, from β-mannans
in palm seeds to galactomannans in legumes, allow them to function as thickeners, stabilizers, and dietary fibers in food,
as well as renewable resources for biofuels and bioplastics.
So far, I have begun working with Dr. Wenjun Xie's maximum entropy (MaxEnt) model, which is trained on multiple sequence alignment (MSA) files in .a3m format. The trained model is then used to calculate mutational energies for mutations in CSLA enzyme sequences. Once ranked, these energies indicate how favorable each mutation is for the enzyme. During this time, Dr. Xie has suggested that a diffusion model is not the best fit for generating mutated enzyme sequences, and has instead directed me toward Direct Preference Optimization (DPO).
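To make the idea of ranking mutational energies concrete, here is a minimal sketch in Python. It is not Dr. Xie's model. It assumes the simplest possible MaxEnt model, one that only matches per-column residue frequencies in the alignment, whereas a real model would likely also include couplings between positions. The function names, the pseudocount, the toy alignment, and the sign convention (lower energy change means a more favorable mutation) are all assumptions made for illustration. The sketch also assumes the .a3m file has already been parsed into equal-length uppercase sequences with insertion columns removed.

```python
import math
from collections import Counter

# 20 standard amino acids plus the gap symbol (assumed alphabet).
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"


def site_fields(msa, pseudocount=1.0):
    """Fit per-column fields h_i(a) = log f_i(a) from aligned sequences.

    This is the site-independent MaxEnt solution: the least-biased
    distribution that reproduces the single-column frequencies f_i(a).
    """
    length = len(msa[0])
    total = len(msa) + pseudocount * len(ALPHABET)
    fields = []
    for i in range(length):
        counts = Counter(seq[i] for seq in msa)
        fields.append(
            {a: math.log((counts.get(a, 0) + pseudocount) / total) for a in ALPHABET}
        )
    return fields


def mutational_energy(fields, wild_type, pos, new_residue):
    """Energy change dE = E(mutant) - E(wild type), with E(s) = -sum_i h_i(s_i).

    For a single substitution only one column changes, so the sum collapses
    to the difference of two fields at that position.
    """
    return fields[pos][wild_type[pos]] - fields[pos][new_residue]


def rank_mutations(fields, wild_type):
    """Score every single amino-acid substitution, most favorable first."""
    scored = []
    for pos, wt_res in enumerate(wild_type):
        for a in ALPHABET:
            if a == wt_res or a == "-":
                continue  # skip the identity "mutation" and gaps
            label = f"{wt_res}{pos + 1}{a}"  # e.g. K2R, 1-based position
            scored.append((mutational_energy(fields, wild_type, pos, a), label))
    return sorted(scored)


# Toy alignment (hypothetical, not real CSLA sequences).
toy_msa = ["MKVL", "MKIL", "MRVL", "MKVI"]
fields = site_fields(toy_msa)
ranking = rank_mutations(fields, "MKVL")

for energy, label in ranking[:3]:
    print(f"{label}: dE = {energy:.3f}")
```

In this toy case, substitutions that already appear in the alignment receive the lowest energy changes, while substitutions at the fully conserved first position are penalized most. That is the qualitative behavior the ranking is meant to capture.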
From "Direct Preference Optimization: Your Language Model is Secretly a Reward Model" by Rafailov, Rafael, et al.
From "Modern mannan: a hemicellulose’s journey" by Voiniciuc, Cătălin