Team:METU Turkey SoftLab

From 2011.igem.org

Revision as of 07:55, 2 September 2011 by Metu 2011 (Talk | contribs)

Abstract

'''Under Construction!!!'''

...

Team

METU Turkey Software is an interdisciplinary team of 7 students and 4 advisors from various backgrounds, such as Molecular Biology, Bioinformatics, Computer Engineering, and Computer Education and Instructional Technology. We have combined our knowledge and experience in our fields to bring a much-needed solution to a daily problem in the field of synthetic biology for iGEM 2010.

Motivation

...

Scope and Future Aspects

.....

.....

....

Project Introduction

...

Notebook

January

Brainstorming about iGEM.

  • What is iGEM?
  • Previous Wet-Lab projects developed at METU.
  • What kinds of projects can be developed as a software team?

NOTE: Still the first software team in Turkey...

February
  • Reading articles about iGEM WetLab and Software team projects.
  • Looking for the members of the team.
  • Looking for the instructors who can consult the team.
March
  • Recruiting a team member interested in Synthetic Biology.
  • Reading articles about Synthetic Biology, Bioinformatics and Bioengineering.
  • Founding the team [An instructor and student members]
April
This month we started regular workshops on Synthetic Biology, Bioengineering, and Bioinformatics.
  • This month the biologists in the team are teaching the necessary basics to the software group.
Week 1

Workshop -1

[Biology basics, What is Synthetic Biology?, and the works in this field ]

Week 2

Workshop -2

[What is Synthetic Biology?, and the works in this field ]

Week 3

Workshop -3

[iGEM, Parts, Biobricks, and Devices ]

Week 4

Workshop – 4

[iGEM, Parts, Biobricks, and Devices ]

May
This month we completed our workshops and, as the masters in this field, started meeting with instructors.
  • Meanwhile, looking for sponsors [We have designed a document describing iGEM, previous projects, and our project in general, and started sending it to private companies that could fund us.]
  • This month it is the software group's turn; they are teaching the basics of software concepts to the biologists in the team.
  • Furthermore, we are discussing how we can apply the basics of computer engineering to synthetic biology and iGEM parts.
Week 1

Meeting -1

[First, discussion of articles selected by the consultants. Then, brainstorming about the selected iGEM projects from previous years and our project.]

Week 2

Meeting -2 [with consultants]

[Presenting the previous projects to the consultants and explaining our initial project idea. Then, brainstorming about our project.]

Week 3

Meeting -3

[Basic database concepts and iGEM parts.]

Week 4

Meeting -4

[What is an ER diagram, and how can we develop a database model for iGEM parts with the ER model?]

June

This month the software group continues to teach the basics of software development, programming, and computer engineering, and discussions about computer engineering approaches are continuing.

  • Furthermore, we have formed the design group for the web page, poster, presentation, and an attractive animation introducing us.
Week 1

Meeting – 5

[Graph theory, Graph theoretic modeling, and graphical modeling of iGEM parts. Using Input Output loops on iGEM parts. ]

Week 2

Meeting – 6 [With Consultants]

  • Project Description.
  • Our tasks on holiday.
  • Explaining our project and basic concepts to the design team.
  • Discussion on storyboard for animation.
Week 3 and Week 4

HAVE a NICE HOLIDAY

SEE YOU ON JULY 1 as a POWERFUL TEAM; “METU TURKEY SOFTWARE”.

July

This month we started developing the application and divided the team into 3 groups [Software - Gene - Design].

  • Gene group is providing raw data to the software group by extracting it from the Parts Registry and other resources.
  • Software group is developing the application.
  • Designers are learning new design tools and applying them to our project [Not all members of the group are working actively for the team].

NOTE: Members are not strictly assigned to a group; this is just for organization of tasks.

Week 1

Meeting-7

  • Taking stock of the current situation.
  • Discussion on web, poster, animation design.
  • Discussion on storyboard for animation.
  • Task analysis for each group [Software, Gene, Design].
Week 2

Meeting – 8

Checking the tasks of each group.

· Software Group

  • Database Design
  • Interface for DB.
  • Designing a basic SRS and SDD to state the requirements of the application exactly.

· Gene Group

  • Extracting I/O information for each part in the Parts Registry according to standards specified by the Gene group.
  • Discussion about expectations from the software.

· Design Group

  • Team Logo
  • Web site
  • Poster
  • Animation
  • Presentation
Week 3 and Week 4

DOING THE TASKS.

August

This month we started to apply graph theory to iGEM parts. We have specified nodes, edges, and graph types. Furthermore, we have started to develop a new “Part Registry Form” to standardize part entry further, so that algorithms can be applied to the parts more efficiently.

Week 1

Meeting -8

  • Node data extraction algorithm.
  • Node description.
  • Visualization of nodes.
  • Pathway finding according to specified I/O properties.
  • Representing the nodes with original images.
Week 2

Meeting -9

  • Which one is the node: the part, the subparts, or are both nodes in different graphs?
  • Part Combination rules.
  • Web site, Poster Content
  • Animation storyboard.
  • Survey for new “Part Registry Standards”.
  • New Part Registry Form.
Week 3

Meeting -10

  • Extraction of Part Combination Rules
  • Web, poster, presentation contents generally.
Week 4

Meeting -11

  • USTC and Berkeley projects [https://2009.igem.org/Team:USTC_Software and https://2009.igem.org/Team:Berkeley_Software]
  • Graphical representation of node relations.
  • Part Combination Rules
  • Subpart Combination Rules
  • Expectations from the software (SRS: Functional Requirements)
September
By this time the groundwork for the application was nearly finished, and the software group was waiting for raw data from the gene group. Meanwhile, they were working on the code base.
Week 1

DOING THE TASKS.

Week 2

Meeting -12

  • Final Database
  • Final Graphs
  • GUI
  • Expectations from the software (SRS: Functional Requirements) (Suggestions)
  • Survey details
Week 3

Meeting -13

  • Animation (Storyboard, timeline)
  • Web site (Suggestions to web site)
  • Poster (how we can use a 3D stereoscopic image, how we can convey the development progress and our concepts with 3D effects, etc.)
  • Presentation ( Suggestions about presentation )
Week 4

Meeting -14

  • Final Graphs
  • GUI (about 70% complete)
  • How to send the software to other teams for collaboration ( just general ideas, details will be talked later ).
  • With the survey or not: could it overshadow the software?
  • What to ask the teams when sending?
October

Now everything is nearly done; it is time to put everything together.

  • Gene group is explaining the methods that we have used during the project.
  • Software group is finalizing the software, testing it, adding new functions according to the collaboration results with the METU TURKEY wet lab team, and trying to solve infinite bugs…
  • Design group is putting it all together…
  • Meanwhile, the whole team is writing the content for web, poster, and presentation.
Week 1

Meeting -15

  • BioGuide Application, missing points.
  • Content writing
  • Web, poster, animation
Week 2, Week 3, & Week 4

GOOD NEWS Infinite meetings started :)

  • Writing content [shared out; tasks assigned to members according to their fields.]
  • Software: debugging, testing, adding functions…
  • Designers: web, poster, presentation, animation, importing content…
November

November 1 – 12 are our ticket dates.

See you in BOSTON …

METU TURKEY SOFTWARE

Follow our discussion topic at our googlegroups

Download Executable and Code

You can download BioGUIDE at http://sourceforge.net/projects/bioguide

Download & install it and feel its power!!

Collaboration

Part Registry Survey

Click here to go to the survey page.


Results

(including responses between 10th -22nd of Oct 2010)

General Profile of Participants

  • The following teams are designated as collaborators, with more than 60% team participation:
    • INSA-Lyon
    • Lethbridge
    • WashU
  • Out of 244 participants between 10 and 22 October 2010, 57% had scientific degrees ranging from B.Sc. to Professor, and 18% had graduate degrees. 18% of participants are enrolled in their teams as either Instructors or Advisors.
  • 95 teams have responded to the survey, while we are still waiting to hear from 33 teams. 74% of teams participated in the survey with one or more members.
  • 75% of participants were interested in the synthetic biology field for academic purposes.

Browsing the Registry of Standard Parts

  • 56% of participants think that it is not easy to search for parts in the Registry of Standard Parts. Many comments indicate a need for a better search engine and more flexible keyword search options, especially ones accepting aliases. Many also long for recognizable part names, which would ease keyword searching.

Partnership with Google and enforcing standardized part names are suggested.

As a global organization, iGEM could offer the Parts Registry in different languages, with more illustrations describing how the system works.

Content of Registry of Standard Parts

  • 57% of participants agree that the number of parts registered in the Registry of Standard Parts is not enough for their projects.
  • 55% of participants think that there are enough and useful parts distributed in iGEM Plates that we can use in our projects.

Even though most agree that the number of parts in the registry is impressive, they still find it limited when it comes to designing different devices for diverse applications, especially in species other than E. coli. Participants believe that if there were more functional standardized parts, especially protein coding sequences and promoter-RBS, they could design devices according to the needs of the community instead of designing what can simply be assembled into a device.

Encouraging development of vectors and standards for new species and new standardized parts in different research areas is suggested.

Enforcing submission of right DNA sequences and working conditions for each part is suggested.

A few recommend expanding iGEM into a collaborative effort rather than an undergraduate tournament, which would increase the number and the diversity of the parts designed and submitted throughout the year.

Submission to the Parts Registry

  • 52% of participants said that they have not encountered difficulties while submitting parts. Even though participants are satisfied with the web interface of the registry, most complain about pSB1C3 as the new standard plasmid for submitting DNA.
  • 71% of participants share our team's opinion that the nomenclature of part IDs, such as construct, device, composite part, and protein generator, is confusing, as there is no consensus on how to use them correctly.

Terminology and categorization used on iGEM’s Parts Registry should be redefined, and correct use of terminology should be enforced during the submission process.

  • 75% of participants agree that different, dedicated submission interfaces should be designed for constructs, promoters, RBSs, CDSs, and terminators in the Registry of Standard Parts. However, there are very strong and valid arguments against this, such as that losing the flexibility of the registry would prevent future submission of unclassified parts.

We suggest keeping the parts submission interface as is, until these concerns are addressed.

  • 75% of participants agree that outdated, unavailable, and uncharacterized parts in the Registry of Standard Parts should be moved to an archive with the consent of the designer.

“It would be great to see some sort of organization like this! I agree that unavailable parts should be followed up on and removed if necessary. I also think that parts which are not sufficiently documented should be highlighted in some way. Once these parts are identified, teams can actively characterize them as part of their projects or as side projects.”

“Think about these things: (i) who decides when a part is out-dated, and how can that person know that an old part cannot have a novel use in the future? (ii) likewise, an uncharacterized part may be both characterized and used in the future”

We suggest building a backup system, such as an archive, to sort out the rarely used, un-available and un-categorized parts until they are in line with the enforced standards.

  • 91% of participants share our opinion that standardization of the nomenclature used for each different composition of parts is necessary.

Standards that should be enforced and Additional New Standards

According to our survey, the following criteria used when assigning names to parts were rated, from highest to lowest:

  • 33% Type of part
  • 17% Input
  • 17% Output
  • 14% Version
  • 10% Year
  • 9% Group

Along with the above, having short, recognizable part names together with function and performance, a GenBank/EMBL link, and organism information is important.

  • 93% of participants said that, for parts marked as “WORKS”, distinguishing parts with quantitative experimental validation from parts without this information is important. Most participants have encountered similar problems with parts that do not work under their lab conditions, or that work but not as claimed.
  • 89% of participants share our opinion that having iGEM sub-categorize the “WORKS” comment into 1) “Quantitative” for parts that are characterized with experiments and 2) “Qualitative” for parts that are not characterized would be an appropriate measure for standardizing the BioBrick database.

In order to overcome these problems we suggest enforcing the working conditions title for the registry entrance, in order to collect quantitative experimental details on submitted parts, which might slow down the registration process but will definitely increase the quality of the database.

  • 61% of participants agree that PoPS (Polymerases Per Second) should be assigned to every part or BioBrick with a promoter, where appropriate. 57% of participants agree that RiPS (Ribosomes Per Second) should be assigned to every part or BioBrick with an RBS.

Though most participants agree on the need for PoPS and RiPS information, they are concerned about the workload it would bring to individual labs.

“To do this, the Registry need to define a reliable and easy method of determining the PoPS for teams to use. However, I would say that there are better systems for quantifying promoter output than PoPS, and they should be used instead, if possible”.

  • 67% of participants thought that entering PoPS information should not be mandatory when submitting new parts. Similarly, 65% of participants disagree that entering RiPS information should be mandatory when submitting new parts.

Even though researchers feel the need for this information, they shy away from requesting it as a mandatory field in the Parts Registry, as it would be difficult for underfunded and inexperienced groups to perform these measurements.

We strongly suggest starting a forum on how to quantify the performance of promoters and genes, to arrive at an easy-to-measure standard for the efficiency of the parts. Additionally, iGEM should take the responsibility and provide the measurements for each promoter and gene included in the distributions. The second choice would be even better in terms of standardization, as all the measurements would be performed by one center under similar conditions and by experienced researchers, which would allow users to compare and contrast the efficiencies of the parts more accurately.

  • 82% of participants thought that information on the working conditions of the parts should be mandatory when submitting new parts. Most find that submitting detailed experimental information and working conditions is crucial, and even easier than submitting measurements of PoPS or RiPS.

Definitions you would like to see at the Registry of Standard Parts

  • Transcriptional efficiency 13%
  • Protein lifetime 10%
  • Ribosome binding efficiency 10%
  • mRNA lifetime 9%
  • Translation initiation and efficiency 9%
  • Protein concentration 9%
  • Cooperative effects with other molecules 9%
  • Protein-DNA binding rates and efficiencies 8%
  • RNA polymerase effects 8%
  • System copy count 8%
  • Protein multimerization 6%

Additional titles include: catalytic rates and affinities for substrates, leakiness of the promoter in the absence of stimulus, and PoPS at various inducer/repressor concentrations.

Efficiency of the Database Entries

  • 86% of participants would like to see a ranking/rating system for the parts by other iGEM users, which would be one indication of whether a part works, and how well, in different laboratories. A few had concerns about how well the rating system would work for rarely used parts, while widely used parts would become even more popular due to the rating system. Still, many believe this would be one step further towards a peer-reviewed quality control system for the parts.
  • 61% of participants agreed that parts should be updated regularly by the designers, and most agreed that this should happen at least when there is new information on the parts. It has also been suggested that all users of a part be given permission to update its information.
  • 73% of participants agree with us that excluding low-ranking parts or parts with negative feedback from future plates will increase the efficiency of the system. The major concern about excluding any part is losing the variety of parts in the database. A few recommend excluding only the parts that are not working.

“Efficiency shouldn't be top priority in a database. First and foremost, data is the top priority. Excluding those parts would make the system more efficient”

“Some parts may be rare or new and have low efficiency, but can be very important! Getting rid of them would eliminate any chance of improvement to these parts, which not only a qualifier for an iGEM gold medal, but also one of the focuses of biobricks.”

We suggest excluding non-working parts, low-rated parts, and parts with negative feedback from the annual distribution plates, but still archiving them and making their data available through the Parts Registry. While individual labs receive plates with higher-rated, fully working parts for their projects, anyone who wants to work on a more exotic part can search through the archives and revitalize the parts stored there. The challenge of revitalizing parts can be encouraged as a collaborative effort.

New Options for the Parts Registry Database

  • 96% of participants agree with us that it would be useful to have a link out to the gene/protein information of the parts, and 97% of participants would like to know if a part is also involved in known biological pathways.

For receiving pathway information, more participants voted for NCBI COG (59%) than for KEGG pathways (38%), when the responses for both were distributed among the choices according to response rates. Adding a BLAST option to the Parts Registry has also been suggested, to help locate parts of interest. We are sure all of us would like to see gene-protein and pathway information integrated into the database and offered automatically for each entry.

We are planning to provide this information about the parts to all Parts Registry users as a built-in option in the next version of BioGuide in iGEM 2011.


New Parts Registry Form Suggested for The New Standards

Description

Warning Boxes:

  • If outdated, unavailable, and uncharacterized parts exist in the Registry of Standard Parts, move them to an archive with the consent of the designer. Divide the archive into three titles: outdated, unavailable, and uncharacterized parts.
  • Besides being shown as “works”, the works box should explain whether the part is characterized or non-characterized.
  • Parts should be updated regularly by the designers.
  • Low-ranking parts or parts with negative feedback should be excluded from future plates.

Characterization Boxes:

  • transcriptional efficiency
  • mRNA lifetime
  • ribosome binding efficiency
  • translation initiation and efficiency
  • protein lifetime
  • protein concentration
  • protein multimerization
  • protein-DNA binding rates and efficiencies
  • cooperative effects with other molecules
  • RNA polymerase effects
  • system copy count

Description

Search box

  • with click options
  • options: searched parts are:
    • Available
    • Length OK
    • Building
    • Planning
    • Missing
    • Unavailable

according to the clicks of above options, search is modified
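The status check boxes above can be sketched as a simple filter. This is only an illustration: the `Part` class, its field names, and the rule that an empty selection returns every part are assumptions, not part of the proposed form.

```python
from dataclasses import dataclass

# Status labels copied from the search-box options listed above.
STATUSES = {"Available", "Length OK", "Building", "Planning", "Missing", "Unavailable"}


@dataclass(frozen=True)
class Part:
    part_id: str  # hypothetical identifier, for illustration only
    status: str   # expected to be one of STATUSES


def filter_by_status(parts, checked):
    """Keep the parts whose status is ticked.

    Assumption: when no box is ticked, the search is not narrowed at all.
    """
    unknown = set(checked) - STATUSES
    if unknown:
        raise ValueError(f"unknown status option(s): {sorted(unknown)}")
    if not checked:
        return list(parts)
    return [p for p in parts if p.status in checked]
```

Ticking more boxes widens the result, since a part only has to match one of the ticked statuses.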

Description

On the part image, assume that:

  • if the part's DNA sequence is not confirmed, it is tagged "non-confirmed DNA sequence"
  • if non-characterized parts in the Parts Registry are not characterized further, they are tagged "deprecated"

also:

  • a comment box is opened in which any team can comment on its experiences with the part
  • boxes that have not been filled with data are highlighted:
    • transcriptional efficiency
    • mRNA lifetime
    • ribosome binding efficiency
    • translation initiation and efficiency
    • protein lifetime
    • protein concentration
    • protein multimerization
    • protein-DNA binding rates and efficiencies
    • cooperative effects with other molecules
    • RNA polymerase effects
    • system copy count
  • if the part is not characterized but "works", a "Qualitative part" tag is added
  • besides "works", a "Characterized" or "non-characterized" box is added
  • ranking/rating stars for the parts, voted by other iGEM users and indicating how well the parts perform in different laboratories, are added. For example: 4.5 stars voted by 27 teams (number of stars and number of votes)
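The tagging and rating rules above could be expressed roughly as follows. This is a minimal sketch: the `PartEntry` class and its field names are hypothetical, and treating a part as characterized as soon as one box is filled is an assumption, since the form does not define a threshold. The time-dependent "deprecated" tag is left out.

```python
from dataclasses import dataclass, field

# Characterization boxes as listed in the suggested form.
CHARACTERIZATION_BOXES = (
    "transcriptional efficiency",
    "mRNA lifetime",
    "ribosome binding efficiency",
    "translation initiation and efficiency",
    "protein lifetime",
    "protein concentration",
    "protein multimerization",
    "protein-DNA binding rates and efficiencies",
    "cooperative effects with other molecules",
    "RNA polymerase effects",
    "system copy count",
)


@dataclass
class PartEntry:
    works: bool
    sequence_confirmed: bool
    boxes: dict = field(default_factory=dict)    # box name -> measured value
    ratings: list = field(default_factory=list)  # one star value per voting team


def unfilled_boxes(entry):
    """Boxes that would be highlighted because they hold no data."""
    return [name for name in CHARACTERIZATION_BOXES if entry.boxes.get(name) is None]


def tags(entry):
    result = []
    if not entry.sequence_confirmed:
        result.append("non-confirmed DNA sequence")
    # Assumption: a single filled box is enough to count as characterized.
    characterized = len(unfilled_boxes(entry)) < len(CHARACTERIZATION_BOXES)
    result.append("Characterized" if characterized else "non-characterized")
    if entry.works and not characterized:
        result.append("Qualitative part")
    return result


def rating_summary(ratings):
    """Number of stars and number of votes, in the style of the example above."""
    if not ratings:
        return "not rated yet"
    average = sum(ratings) / len(ratings)
    return f"{average:.1f} stars voted by {len(ratings)} teams"
```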

Design

User Guide

For new users who want to get involved in synthetic biology, and for experienced scientists alike, our program offers two options for input/output properties:

To find input-output related BioBrick parts or larger constructs: if you want to provide specific chemical or physical inputs from the external environment, you may choose these inputs from the list. If you want other specific inputs that will be expressed from a coding sequence, the list provides the corresponding protein names.

The desired output features can be selected either from the list or by entering them individually into the text boxes.

Then click on the “Show Parts” button.

The program will then show you the parts that have the specified I/O properties.

To see the properties of a part, just click on it.

After clicking, you will see a highlighted pathway on the 2nd graph, consisting of the subparts of the clicked part and shown as a subgraph of the network.
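The two steps of the guide, selecting parts by I/O properties and highlighting the subparts of a clicked part, can be sketched as below. This is not BioGUIDE's actual code: the part names, the dictionary layout, and the rule that a part must match every requested input and output are assumptions made for illustration.

```python
# Hypothetical toy data: each part lists its inputs, outputs, and ordered subparts.
PARTS = {
    "construct_A": {
        "inputs": {"IPTG"},
        "outputs": {"GFP"},
        "subparts": ["promoter_1", "rbs_1", "cds_gfp", "terminator_1"],
    },
    "construct_B": {
        "inputs": {"temperature"},
        "outputs": {"GFP"},
        "subparts": ["promoter_2", "rbs_1", "cds_gfp", "terminator_1"],
    },
}


def show_parts(parts, wanted_inputs=(), wanted_outputs=()):
    """IDs of parts that have every wanted input and every wanted output."""
    return sorted(
        part_id
        for part_id, info in parts.items()
        if set(wanted_inputs) <= info["inputs"] and set(wanted_outputs) <= info["outputs"]
    )


def highlighted_subgraph(parts, part_id):
    """Edges between consecutive subparts of the clicked part."""
    subparts = parts[part_id]["subparts"]
    return list(zip(subparts, subparts[1:]))
```

With this toy data, asking for the output GFP alone returns both constructs, while adding the input IPTG narrows the result to `construct_A`.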

Human Practices

iGEM’s Parts Registry is currently the only database that holds information and DNA for over 3000 standard parts for the use of the synthetic biology community. BioGUIDE is the first software designed to organize the parts in the iGEM Spring 2010 distribution as possible atomic parts for building new biological devices and systems for specific inputs and outputs, based on graph theory. We are the first group to apply a novel algorithm that searches for input/output relations between the parts to reveal possible construct assemblies. This new approach will change how the Parts Registry is used by the scientific community. The availability of software implementing our algorithm, with a very user-friendly graphical user interface, will allow all users of the Parts Registry to explore new and novel constructs according to their parameters with ease. Additionally, as BioGUIDE is open-source software, any user can contribute to the development of the application. BioGUIDE will therefore improve through a collaborative effort, which will make it even more widely used among synthetic biologists.

We also had many collaborations with teams such as INSA-Lyon, Lethbridge, and WashU through our survey, and with WARSAW, as we participated in their survey. Our sister team METU_TURKEY is our main collaborator. They provided feedback on BioGUIDE at different stages of development, and they had the chance to analyze the input/output relations of their constructs through this collaboration.

ALL TEAMS: The Parts Registry Survey developed by METU_TURKEY_SOFTWARE received responses from 253 participants from 94 teams. Analysis of the survey results received so far indicates that we were able to reach out to the iGEM community and help its members voice their concerns and suggestions regarding the Parts Registry standards and the maintenance of the database.

WARSAW: Our team members participated in their survey about the profile of iGEM participants.

METU_TURKEY: Beta testing of the developed algorithm and the BioGUIDE software was performed by METU_TURKEY on a few case studies. Additionally, they tested their construct for the 2010 competition with our algorithm.

Material

Our main data source for the BioGUIDE software was the available background information on the parts distributed to the wet-lab teams in the 2010 iGEM plates (a total of three 384-well plates of dried DNA). This data was available both through the Parts Registry main website (http://partsregistry.org/Main_Page) in XML format and through the Parts Registry libraries (http://partsregistry.org/assembly/libraries.cgi?id=31) in Excel format. Data from parts with specific part IDs were parsed with custom code built on a modified SAX parser. The rest of the data, which needed to be standardized according to biological importance, was extracted manually from the Registry of Parts pages. Chemical (IPTG, galactose, etc.) or physical (UV irradiation, temperature, etc.) external inputs, as well as proteins synthesized from a BioBrick coding sequence, can affect promoters on the parts. These effectors are identified under the title “Input”. The “Output”s of these effectors are classified as inducers (molecules that start gene expression), repressors (blockers of the attachment of RNA polymerase to the promoter), activators (increasing the rate of transcription), and inhibitors (decreasing the rate of transcription). These standardizations of the database helped us build the algorithm based on input/output relationships. MySQL Server was used for database development and organization. All of our illustrations for the ER diagram and the algorithm were created in SmartDraw (trial version). The Java programming language and the NetBeans development environment were used for software development. The graphical visualization of the software is done with Cytoscape, and the yFiles libraries (trial version) are used for the presentation of graphical events. We used CSS and JavaScript for our web design. Autodesk Maya 2011 with an Academic Licence and Adobe Creative Suite 5 Master Collection (trial version) were used for animations and illustrations.
The video tutorial for BioGUIDE was created with CamStudio, and trial versions of Flash and After Effects were also used for the videos.

Supporting Tools

  • SAX Parser (modified) to parse XML files
  • Java programming language and NetBeans development environment for software development
  • MySQL Server for the database
  • Cytoscape for graph visualization
  • yFiles library for graph events
  • SmartDraw for illustration of the ER diagram and algorithm
  • Maya & Cinema 4D for 3D animation, Adobe Master Collection and Microsoft Expression Studio for design
  • CSS and JavaScript for the web
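The SAX-based extraction of part data from XML can be illustrated with a short sketch. The team worked in Java with a modified SAX parser; the example below uses Python's standard `xml.sax` module instead, and the element names (`part`, `part_name`, `part_type`) and the sample part IDs are assumptions made up for illustration, not the registry's confirmed schema.

```python
import xml.sax


class PartHandler(xml.sax.ContentHandler):
    """Collects one dict per <part> element (element names are assumed)."""

    FIELDS = ("part_name", "part_type")

    def __init__(self):
        super().__init__()
        self.parts = []
        self._current = None  # dict for the <part> being read, if any
        self._field = None    # name of the field element being read, if any
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "part":
            self._current = {}
        elif self._current is not None and name in self.FIELDS:
            self._field = name
            self._buffer = []

    def characters(self, content):
        # SAX may deliver text in several chunks, so buffer it.
        if self._field is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == self._field:
            self._current[name] = "".join(self._buffer).strip()
            self._field = None
        elif name == "part" and self._current is not None:
            self.parts.append(self._current)
            self._current = None


def parse_parts(xml_text):
    handler = PartHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.parts


# Made-up sample input, not real registry data.
SAMPLE = """<part_list>
  <part><part_name>BBa_X0001</part_name><part_type>Regulatory</part_type></part>
  <part><part_name>BBa_X0002</part_name><part_type>Coding</part_type></part>
</part_list>"""
```

A streaming parser of this kind reads the file element by element, which suits large part lists where only a few fields per part are needed.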

Safety

Synthetic biology has the potential to impact many areas of society. Synthetic biologists may use artificial molecules to reproduce emergent behavior from natural biology, with the goal of creating artificial life or seeking interchangeable biological parts to assemble them into devices and systems that function in a manner not found in nature (Benner and Sismour 2005, Endy 2005, Heinemann and Panke 2006, Luisi 2007, Serrano 2007). There is a possibility of causing intentional or accidental harm to humans, agriculture, or the environment. While deliberate damage is dealt with under the heading of biosecurity, the potential unintended consequences have to be considered under the term biosafety. As software developers, we have to consider all possible malicious uses of synthetic biology tools. However, it is difficult to know for which purposes our tools will be used. The best way to avoid garage bioterrorism is to check every part before it is added to the Parts Registry, looking for toxic effects and any harm to the environment or to humans. After scanning the candidate parts, a committee should decide whether a novel part can be added to the Parts Registry or not. We can only warn users about our intention in building the application:

“BioGUIDE v1.0 software is FOR RESEARCH USE ONLY, no medical or diagnostic use for applications of the novel Biobrick constructs generated through our software has been described”

“No military (defense or combat) applications will be allowed in future”

Methods

Part Extraction Standards

All information about the parts that is essential to the experimental setup of iGEM projects has been utilized. The information on the parts provided in all three 384-well plates of the Spring 2010 distribution has been standardized. Our standardization criteria are discussed in detail under Database Standardization. An ER diagram has been generated that describes the organization of the data. Around 70% of the part information was fetched by the custom parsing code from the XML and Excel files provided by iGEM. The rest of the data had to be collected and organized manually, as its organization could not be standardized well enough to be handled by an algorithm. This was one of the most time-consuming steps in our project. For each construct and BioBrick, the information collected was: Activity, Inducer, Activator, Repressor, and Inhibitor for promoters, and Inducer, Activator, Repressor, and Inhibitor information valid for synthesized molecules (mostly proteins, RNA fragments, etc.).
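One way to picture the standardized record, and how input/output relations follow from it, is sketched below. The `PartRecord` layout, the function name `io_edges`, and the sample molecule are assumptions for illustration; the team's actual database schema is described by their ER diagram, which is not reproduced here.

```python
from dataclasses import dataclass, field

# Effector roles named in the extraction standard.
ROLES = ("inducer", "activator", "repressor", "inhibitor")


@dataclass
class PartRecord:
    part_id: str
    # Role -> molecules that act on this part's promoter (its inputs).
    promoter_effectors: dict = field(default_factory=dict)
    # Molecules synthesized by this part (its outputs).
    products: set = field(default_factory=set)


def io_edges(records):
    """Link a part to every part whose promoter responds to one of its products."""
    edges = []
    for source in records:
        for target in records:
            for role in ROLES:
                shared = source.products & target.promoter_effectors.get(role, set())
                for molecule in sorted(shared):
                    edges.append((source.part_id, target.part_id, molecule, role))
    return edges
```

For example, if a hypothetical `part_A` produces a molecule that `part_B` lists as a repressor of its promoter, `io_edges` yields a single edge from `part_A` to `part_B` labeled with that molecule and the role "repressor".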

Combination Rules (Image Combinations)

In order to build our input/output relation graphs, we first ran our algorithm on the real combination dataset, which contains all of the few thousand different possible combinations of the BioBricks. But after all combinations had been computed for the first few hundred BioBricks, the application slowed down tremendously, and displaying the BioBrick graphs also became very time-consuming. To overcome this bottleneck, we developed a new strategy in which we used only the construct combinations of the BioBricks distributed within the plates. Moreover, from the information gathered on the subparts of the distributed constructs, we also collected the subpart assembly order: 1st: promoter, 2nd: RBS, 3rd: coding sequence, then any internal parts, and last: terminator. Each specific BioBrick type has been assigned a number from 1 to 19 as a unique image ID. Gathering the information on subparts was not a straightforward process. The image ID assembly order of each construct was used to extract the type information for each subpart within that construct. This innovative approach helped us reveal 400 possible brick combinations present within the 3x384-well plates distributed by iGEM in Spring 2010.
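The assembly-order rule and the use of image IDs to recover subpart types can be sketched as follows. The mapping of IDs to types is not given in the text, so the four IDs below are placeholders chosen for illustration, and the strict reading of the rule (at least four subparts, fixed first three and last positions) is an assumption.

```python
# Placeholder mapping: the real assignment of the 19 image IDs is not stated.
IMAGE_ID_TO_TYPE = {
    1: "promoter",
    2: "rbs",
    3: "coding",
    4: "terminator",
}


def subpart_types(image_ids):
    """Translate a construct's image ID order into subpart types."""
    return [IMAGE_ID_TO_TYPE[i] for i in image_ids]


def follows_assembly_order(types):
    """Check: 1st promoter, 2nd RBS, 3rd coding sequence,
    then any internal parts, and last a terminator."""
    return (
        len(types) >= 4
        and types[0] == "promoter"
        and types[1] == "rbs"
        and types[2] == "coding"
        and types[-1] == "terminator"
    )
```

Restricting the search to orders that pass such a check is one way to avoid enumerating every possible combination of BioBricks, which is the bottleneck described above.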

Contact

For criticism, suggestions, or praise, you can contact us at software_metuturkey@googlegroups.com

Future Plan

The application we have developed can be used by all iGEM members. As iGEM’s database expands and the recognition of the field of synthetic biology increases, data resources from other biological databases such as NCBI might need to be integrated into the application. In such a situation, extensibility of the application is vital: new data resources and new functions should be easy to add.

Before planning ahead, feedback from other teams and iGEM headquarters about BioGuide 1.0 will be collected. That will help us to fill in the missing features of the application and to check the theories that are the basis of our algorithms.

As our application is not geared towards any commercial use and will stay an academic application, keeping track of the weekly development process in the wiki notebook environment was satisfactory. If the need for a commercial application emerges, we should adopt professional software development approaches to determine the exact requirements and to facilitate the use of a common language between the interdisciplinary members of the team.

For constructing BioGuide 2.0 we have some plans.

Short Term plan:

Next year we are planning to build BioGuide 2.0 using all parts data. To do this we will have to update our parts database, but the easier way would be for all parts in partsregistry.org to be standardized and reorganized according to our suggestions, because reorganization and normalization are crucial. We are planning to add new tools to improve the graphs. Our ultimate aim is to find the best pathway based on automated construction and input-output relations. BioGuide 2.0 will be faster because we are planning to use an OODBMS, and it will be supported on all platforms.

Long term plan:

We want to improve our algorithm and add more parameters to make the graphs more effective. Our dream is to embed our software into partsregistry.org, so that more iGEMers will choose parts in real time by using our software.

Suggestions based on PartsRegistry Survey Results

Our first suggestion is a partnership with Google for easy search, and the founding of a committee to enforce a standardized nomenclature for terminology and parts registry entries.

We strongly suggest starting a forum on how to quantify the performance of promoters and genes, in order to arrive at an easy-to-measure standard for the efficiency of parts. Alternatively, iGEM should assume the responsibility and provide the measurements for each promoter and gene included in the distributions. The second choice would be even better in terms of standardization, as all measurements would be performed by one center under similar conditions and by experienced researchers, which would allow users to compare and contrast the efficiencies of parts more accurately. We suggest excluding parts that are not working, are low rated or have negative feedback from the annual distribution plates, while still archiving them and making their data available through the parts registry. So, while individual labs receive plates with higher-rated, fully working parts for their projects, anyone who wants to work on a more exotic part can search through the archives and revitalize the parts stored there. The revitalization of parts can be encouraged as a collaborative effort. We are sure all of us would like to see gene-protein and pathway information integrated into the database and offered automatically for each entry. We are planning to provide this information to all parts registry users as a built-in option in the next version of BioGuide in iGEM 2011.

Database Standardization

The two main focuses of our project were the organization of the available information about Biobricks on iGEM’s website, and the development of a software application that helps synthetic biologists at the experimental set-up level by providing all available construct combinations for any given input and output relations, which they can use for their own projects.

Normalization and reorganization of the part information on iGEM’s web site was needed in order to develop our application, which automatically searches for the possible construct combinations. For the organization and analysis of the Biobricks, we used the part information for the Spring 2010 distribution. The information on all three 384-well plates distributed by iGEM was scrutinized and checked individually to specify which standards are available and which are needed. iGEM provides a large number of parts in a hierarchical way, but there is no order in the information flow and there are no common standards. Furthermore, the bulk of the information is used ineffectively. Some of the distributed parts are known to be nonfunctional. The web pages for parts contain a lot of information, but most of it is again unordered. Moreover, some additional information had to be removed or replaced so that the part information could be used effectively. We recommend that the redundant bulk information about parts on iGEM’s web site be removed in the future.

Although the final standardization we have suggested is not intended for general public use, and was urgently needed to satisfy the needs of our algorithm, it will still be a valuable resource, since it summarizes the basic information about the parts.

As the first step in building the proposed standardization template, the headings selected for parts are listed in Table 1. Assigning part IDs to individual parts is an accepted and quite valuable way of tracking information. Although every part has a unique partID, there is also a need to assign each part a unique part name as its official iGEM name. Part names will play an important role, as they provide a short description of the part that synthetic biologists can immediately recognize and use during the construction of unique Biobricks. Additionally, unique part names will help to identify devices with more than one Biobrick in their constructs. Assigning unique and distinct names that describe the nature and content of parts will help researchers to recognize and search for parts.


Headings Selected From Previous Entry Forms for Indication of Standardized Information

=========================================

PartID:

PartName:

Bricks:

BrickIDs:

ImageIDs:

RFC10:

RFC21:

RFC23:

RFC25:

=========================================

Table 1: The table above describes the qualities of parts that identify their composition, and shows the status of previously assigned standards. PartID refers to the unique ID number of a part, including atomic parts and assemblies. PartName refers to the unique name given to a part. Bricks refers to the shortcut names that specify atomic parts. ImageIDs refers to the individual numbers, or combinations of numbers, that were assigned by us. The RFC fields refer to the status of a part with respect to the RFC standards.

iGEM provides both individual atomic parts and pre-combined constructs such as devices and systems. The availability of combined constructs is important to researchers, as combining individual Biobricks one at a time is very time consuming. These previously merged constructs serve as a repository of puzzle pieces, and they can be used for different purposes. To date, the largest and most trustworthy source for synthetic biology and its components is iGEM’s parts registry. In 2010, iGEM provided over 1000 parts, which have initiated many projects. Having more atomic parts available in iGEM’s repository will lead to the design of more complex and robust constructs, and give us a better chance to design different constructs for unique purposes. Also, for the parts that are already available, extra steps need to be taken for quality control and surveillance. Quality control of the part information is essential for the future of iGEM and synthetic biology. Even though we found the predetermined RFC standards useful and included them in our standardized template, the information for some individual parts still requires reorganization, as the RFC standards alone do not satisfy the needs of wet lab biologists regarding the functionality of parts.

Without question, there is an urgent need to build a distinct, specific and well-organized database with its own standards for synthetic biology; however, developing such a database is not an easy task.


Contact Information of Part Owners and Qualitative Group Comments about Parts

=========================================

Designers: Mail:

GroupFavorite:

StarRating:

Parameters:

=========================================

Table 2: The table above shows information about the owners of parts, their contact information, and the popularity of the parts among groups. The Parameters heading refers to distinctive experimental details unique to the usage of a part, which should be decided by the groups.

The second step in building the standardized template was to collect the provenance information about the part development process, which includes the name of the group, the designer and contact information, along with the comments of the group on the parts they have submitted. Contact information is especially important for iGEM, as it lets other groups who need extra information about an available part reach it. Even though iGEM strongly encourages contacting the designers of the available parts, the unavailability of contact information points to the fact that iGEM’s parts registry needs substantial reorganization in order to serve the synthetic biology community properly.

Additionally, the “Group Favorite” and “StarRating” fields are also important for the individual evaluation of parts, yet they do not get the attention they deserve from the iGEM groups. “Group Favorite” expresses the confidence of the designer group in the part. “StarRating” describes the part in terms of popularity and usage efficiency among the groups. According to our observations, most groups are either not aware of these fields or use them incorrectly or ineffectively. For example, for a part with a full reporter that is known to be functional and gives precise and expected results, the StarRating should be at least 2 stars, but in the 2010 distribution it is very difficult to find a part whose “StarRating” is above one. These two evaluations are important for a quick determination of the functionality of parts, so they have been included in the proposed standardization template. But, as they have not been used properly up to now, during the reorganization of the part information for our software application we had to include all parts in our queries regardless of their “Group Favorite” and “StarRating” evaluations.



Input and Output Characteristics of Parts

=========================================

Parameters:

-Input:

• Promoter:

• Activity:

• Inducer:

• Activator:

• Repressor:

• Inhibitor:

• Promoter2:

• Activity:

• Inducer:

• Activator:

• Repressor:

• Inhibitor:

-Output:

• Reporter:

• Reporter2:

• Regulator:

• Inducer:

• Activator:

• Repressor:

• Inhibitor:

• Regulator2:

• Inducer:

• Activator:

• Repressor:

• Inhibitor:

-Working Condition:

=========================================

Table 3: The table above elaborately describes the input relations based on promoters and the output products based on the functional genes and RNAs which are included within the parts. Working condition simply describes any influencing factor or circumstance which is directly related with the functional properties of parts.

The third part of our standardization template includes the parameters of contingent input and output elements. These parameters are classified into two groups for simplicity, as presented in Table 3. This final part of the standardization template includes the most important information about the Biobricks, which the BioGuide software requires to run its search algorithm.
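As an illustration, the Table 3 headings could be held in a record like the one sketched below. This is not the BioGuide database schema; the class and field names are assumptions, and lists are used in place of the numbered headings (Promoter/Promoter2, Regulator/Regulator2) so that a part may carry any number of promoters or regulators.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Regulation:
    """Molecules listed under Inducer, Activator, Repressor and Inhibitor."""
    inducer: List[str] = field(default_factory=list)
    activator: List[str] = field(default_factory=list)
    repressor: List[str] = field(default_factory=list)
    inhibitor: List[str] = field(default_factory=list)


@dataclass
class PromoterInput:
    """One entry under the Input heading."""
    name: str
    activity: Optional[str] = None  # free text; allowed values are not defined here
    regulation: Regulation = field(default_factory=Regulation)


@dataclass
class RegulatorOutput:
    """One Regulator entry under the Output heading."""
    name: str
    regulation: Regulation = field(default_factory=Regulation)


@dataclass
class PartParameters:
    """The Parameters block of the template: inputs, outputs, working condition."""
    inputs: List[PromoterInput] = field(default_factory=list)
    reporters: List[str] = field(default_factory=list)
    regulators: List[RegulatorOutput] = field(default_factory=list)
    working_condition: str = ""
```

A record for a part with a single promoter would then be created as `PartParameters(inputs=[PromoterInput(name="promoter_1")])`, where the promoter name is a placeholder.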

Briefly, the BioGuide application is designed to capture the input and output relations of individual parts in order to examine possible Biobrick pathways for specific input and output queries. In other words, at the pre-experimental stage, it helps wet lab biologists to design their unique constructs by revealing possible alternative options for predetermined purposes, along with the primary paths. Our ultimate goal is to improve the algorithm designed for iGEM 2010 and present a new version of BioGuide in iGEM 2011, which will provide optimum designs of constructs for predetermined parameters.
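One way to picture a pathway query is as a search over a directed graph of devices. The sketch below uses a breadth-first search to return one shortest chain of devices between a starting device and a goal device. This page does not state which search BioGuide uses, so the choice of breadth-first search, the function name and the device labels in the usage note are all assumptions.

```python
from collections import deque


def find_path(edges, source, target):
    """Return one shortest path from source to target as a list of nodes.

    edges is an iterable of (source_device, target_device) pairs.
    Returns None when the target cannot be reached from the source.
    """
    successors = {}
    for v, w in edges:
        successors.setdefault(v, []).append(w)

    previous = {source: None}  # also serves as the set of visited nodes
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if v == target:
            path = []
            while v is not None:
                path.append(v)
                v = previous[v]
            return path[::-1]
        for w in successors.get(v, []):
            if w not in previous:
                previous[w] = v
                queue.append(w)
    return None
```

With the placeholder edges `[("D1", "D2"), ("D2", "D3")]`, a query from `"D1"` to `"D3"` passes through `"D2"`, while the reverse query finds no path because the edges are directed.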

Most of the parts are composed of functional and nonfunctional constructs formed from atomic parts, and every part should carry the information for all of its atomic parts. The “Input” heading actually stands for promoters. Parts with one or more promoters can be found in iGEM’s Parts Registry. Along with the information on which and how many promoters a part has, the activity level of the promoters is also important, in order to distinguish between a constitutively active promoter and a promoter activated by specific physiological processes or states. It was crucial for us to extract this information in order to run our algorithm, as it directly determines which inputs can activate the devices or systems.

Throughout our investigation of the Parts Registry, we found that much of the terminology was being used ambiguously. Although this might not be critical for synthetic biologists, it still takes effort to understand the function of certain regulatory elements, which becomes a time-consuming task for the researcher. Thus, we recommend that the definitions of certain regulatory elements be revised and fixed specifically for synthetic biology, for easy communication, sharing and searching of information.

Common misuses of the terminology can guide us in constructing a standard nomenclature for synthetic biology. We claim that a standard nomenclature is urgently needed for synthetic biology for the following reasons. First, synthetic biology is an emerging research discipline and a highly promising area of industrial application. Second, some terms are prone to being used in place of one another, which causes problems for global communication about synthetic biology, so the terminology needs to be redefined. Lastly, the nomenclature is of major importance for the construction of a persistent and trustworthy database for synthetic biology that serves the global presentation and exchange of information. For instance, there are obvious misunderstandings about the words that are predominantly used for regulation processes. We have noticed that the terms “inhibitor” and “repressor” are used interchangeably in the part information pages. For example, the lactose inhibitor protein, a widely used DNA-binding transcriptional repressor, has been labeled both as “inhibitor” and as “repressor” in iGEM’s Parts Registry. Similar problems resulting from ambiguous use of terminology were also observed with regulatory elements. To sum up, we investigated all input elements for promoters and classified them in terms of their function, their effect and the input element they require. We therefore suggest that the terminology used for the regulation of transcription be defined clearly on iGEM’s website, and that correct use of the terminology be enforced.

The second group of parameters was collected under the title “Output”, which refers to the products of functional genes. Inconsistently, the term “reporter” is also listed within the same group. Reporters are also genes, whose products can be used for screening as an output. In our group’s view, using the term “reporter” for genes is unnecessary, adds complexity to the distribution of information and gives rise to discrepancies. Instead of the term “reporter”, the predefined “gene” description should be used for genes that can function as reporters. The special information related to the characteristics of that gene should also be presented on the part information web page.

Furthermore, the same term “reporter” was used for both atomic parts and composite Biobricks. The overall image descriptions for these were also defined as “reporters”. We want to point out that using the same nomenclature for both atomic genes and whole functional constructs adds complexity and makes specific searches through the Parts Registry difficult. So, assigning “reporter” to both atomic parts and whole constructs is not a good practice. Instead, we suggest using other available terms for the constructs now listed as reporters, most of which can be grouped under “protein generators”, “composite parts” or “inverters”.

Devices are whole constructs that are functional and have specific and distinct functions. But, as we have observed, the term “device” is unfortunately also used for parts that are not functional and do not have a specific function at all. Moreover, within the classification of devices, we argue that some terms are used unnecessarily and ambiguously. Devices are classified into five types: protein generators, reporters, inverters, receivers and senders, and measurement devices. For example, iGEM defines protein generators as:

Protein generator = promoter + rbs + gene + terminator

Though we accept the definition of protein generators, we observed that numerous parts are defined as protein generators although most of them do not fit the definition above. Some parts are not functional and do not generate proteins at all, yet they are classified as protein generators, which makes searching for parts in the registry difficult. Furthermore, there are also numerous parts that are defined as “composite parts” but actually fit the same definition as protein generators. To overcome the problem of misused device types, we extracted the related image ID information for the composite parts. The image ID information helped us to categorize composite parts correctly according to their individual atomic parts, and to identify those with more than one function, such as being both inhibitor and activator. In other words, we used image and part IDs in order to link an input to its outputs.

The subtitle “Working Condition” includes all the detailed information about the experimental properties of parts, and the details of the working process of individual parts and complete devices. We marked “Working Condition” in our standardization template as potentially the most important heading for helping synthetic biologists to better understand the functions of parts in iGEM’s parts registry database. The main problem we encountered with “Working Condition” is that, for most parts, the details of the working process are insufficient and not provided consistently.
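The check against the protein generator definition can be sketched as follows. This is a simplified illustration, not the BioGuide implementation: the type labels, the function name and the strict one-gene reading of the definition are assumptions, and a real construct may contain several rbs and gene pairs.

```python
# The definition quoted above, read strictly as four subparts in this order.
PROTEIN_GENERATOR = ("promoter", "rbs", "gene", "terminator")


def classify_construct(subpart_types):
    """Classify a construct from the ordered types of its atomic parts.

    Returns "atomic part" for a single subpart, "protein generator" when the
    sequence matches the definition exactly, and "composite part" otherwise.
    """
    types = tuple(subpart_types)
    if len(types) <= 1:
        return "atomic part"
    if types == PROTEIN_GENERATOR:
        return "protein generator"
    return "composite part"
```

Under this rule a construct without a promoter, for example `["rbs", "gene", "terminator"]`, is reported as a composite part rather than a protein generator.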


Examples of Misuse of Terminology:

For Composite Parts:

PartID: BBa_S04055

PartName: Synthetic lacYZ operon

This part is functional and responsible for the production of LacY and LacZ proteins. It partially fits the definition of “composite part”, but it should actually be a protein generator, as it fully fits the definition of “protein generators”.

For Protein Generators:

PartID: BBa_J45299

PartName: PchA & PchB enzyme generator

The part illustrated above actually fits the definition of “composite part”, but in the parts registry it is classified as a protein generator. This part could be functional, but it needs a promoter. Even though it is not functional on its own and is not capable of producing protein, the parts registry labels it as a protein generator. We suggest that all parts in the registry that are composed of more than one atomic part, and that are not functional on their own but can become functional, be classified as “composite parts”.

For Reporters:

PartID: BBa_J04451

PartName: RFP Coding Device with an LVA tag

This functional part is classified as “Reporter” in the parts registry database. It clearly fits the same description as a protein generator in the Biobrick parts registry standards. Although this part has a specific and known functional role, characterizing it as a reporter is unnecessary and adds to the complexity of the information provided. Instead, we suggest that this part be classified as a “protein generator” and that detailed information about its specific function be provided on the part information page.

In conclusion, as described above, we tried to reorganize and normalize the part information provided in the parts registry for 2010 in order to develop our algorithm for the BioGuide application. During this process, we encountered inconsistencies and misuses in the terminology, as well as inadequacies in the information provided about parts. First, we claim that a standard nomenclature should be established for future use in the field of synthetic biology. Based on the information gathered according to the new nomenclature, a professional database should be constructed to address the needs of synthetic biology. This will enable easy exchange and presentation of information globally. Second, although the parts registry database contains enough information about parts, this information urgently needs to be put in order. Furthermore, new experimental standards for the “Working Condition” heading should be introduced to groups in the part submission process. These standards are important because the experimental details currently provided about parts do not satisfy the needs of wet-lab biologists for the design and construction of new Biobricks.

Sponsors

  • ...
  • ...
  • ...
  • ...

Algorithm

In this section, the step-by-step functioning of our application, along with the algorithmic concepts behind the ‘standardization’ of functional iGEM devices, is depicted in flowcharts. Rectangular boxes represent the computer programs implemented to perform the tasks stated in the boxes. These boxes are sometimes called subprograms, objects or packages in the context of object-oriented software engineering. The diamonds represent decision branches, and they are found between two rectangular boxes. The arrows show the direction in which subprograms work and communicate: the subprogram at the head of an arrow starts executing after the subprogram at the tail of the arrow terminates. The following flowcharts are high-level representations of the algorithms developed for the BioGuide software.


1

Diagram 1. Flowchart of collection, formatting and storage of devices data algorithm

Information about the iGEM parts had to be collected in a standardized format for our application to function properly. Following data collection, custom subprograms are needed to parse the data and forward it to the application’s database. To achieve this we designed and implemented the algorithm shown in Diagram 1. In this algorithm, the first stage was to find the list of part IDs of the devices supplied by iGEM in the Spring 2010 distribution. This information was collected from two sources, both provided by iGEM: 1) plate files in Excel format, which were available online, and 2) device data in XML format. The last step in the algorithm was to send the collected partID data to the application’s database.
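The first stage of this algorithm, gathering part IDs from the two sources, might look roughly like the sketch below. The XML element name `part_name`, the CSV column name `part`, the use of a CSV export in place of the Excel plate files, and the decision to take the union of the two sources are all assumptions for illustration; the real file layouts provided by iGEM are not described on this page.

```python
import csv
import io
import xml.etree.ElementTree as ET


def part_ids_from_xml(xml_text):
    """Collect part IDs from every <part_name> element (hypothetical layout)."""
    root = ET.fromstring(xml_text)
    return {el.text.strip() for el in root.iter("part_name") if el.text}


def part_ids_from_plate_csv(csv_text):
    """Collect part IDs from the 'part' column of a plate file exported as CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    ids = set()
    for row in reader:
        value = (row.get("part") or "").strip()
        if value:
            ids.add(value)
    return ids


def collect_part_ids(xml_text, csv_text):
    """Merge both sources into one sorted list without duplicates."""
    return sorted(part_ids_from_xml(xml_text) | part_ids_from_plate_csv(csv_text))
```

The sorted list returned by `collect_part_ids` would then be handed to whatever routine writes to the application’s database, which is outside the scope of this sketch.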


2.

Diagram 2. Flowchart for BioGuide execution before and during user interaction

Diagram 2 presents the main algorithm, which shows how the BioGuide application works. The major components of BioGuide are the device and Biobrick graphs. While the device graph represents the input-output (promoter-regulator) compatibility combinations of iGEM devices, the Biobrick graph represents the combinations of atomic parts assembled in a device or system. The flowchart shows how these graphs are created and embedded into the program, which displays both graphs to the user when launched. On startup the application presents a few interactive options to the user, which are shown on the flowchart under the bold horizontal line. As shown in Diagram 2, BioGuide can perform four interactive tasks that make use of the device and Biobrick graphs. Upon clicking a node on the device or Biobrick graph, that node changes in size and color, and the various functions shown on the flowchart can then be performed.

Graph Modeling

Graphical Modeling for Bio-Guide

Introduction

Graphical modeling theory has been applied to construct four different graphs, which present the relations of atomic parts, devices and systems in iGEM’s parts registry database and the functional combinations that can build new constructs. Three graphs are composed of iGEM devices and one graph is based on Biobricks. Each graph comprises a set of vertices (nodes) and a set of edges. Each node represents a device, and each edge represents an input-output combination of two nodes. These graphs are directed, as the edges are created according to input-output combinations. An edge is created for every compatibility between a regulator and a promoter: the source of the edge is the device with the corresponding regulator, and the target is the device with the promoter concerned.

Fig. 1: A node representing a device

Fig. 2: Arrow representing an edge between two nodes

The atomic structures used in our graphical model are shown in Figures 1 and 2. A node is drawn as a solid circle, with the label of the device (its part/device ID according to iGEM standards) marked in the foreground. The blue arrows between nodes connect related devices, representing their input-output connectivity. The end style of the arrow indicates the direction of the edge, as in Figure 2, where the node labeled BBa_S03520 is the source and BBa_JO9250 is the target.
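The edge rule described in the introduction, from the device that carries a regulator to the device whose promoter responds to it, can be sketched as below. The dictionary layout, the key names `produces` and `responds_to`, and the device and molecule labels in the usage note are hypothetical; they stand in for the regulator and promoter fields of the standardized part records.

```python
def build_device_graph(devices):
    """Return the set of directed edges (source_id, target_id).

    devices maps a device ID to a dict with two sets of molecule names:
      "produces"    - molecules made by the device's regulators
      "responds_to" - molecules that act on the device's promoters
    An edge is added whenever the source produces a molecule the target
    responds to. A device may point to itself.
    """
    edges = set()
    for source_id, source in devices.items():
        for target_id, target in devices.items():
            if source["produces"] & target["responds_to"]:
                edges.add((source_id, target_id))
    return edges
```

For example, if a placeholder device `"D1"` produces `"m1"` and `"D2"` responds to `"m1"`, the result contains the edge `("D1", "D2")` but not `("D2", "D1")`. Storing the edges in a set means the same source and target pair is never recorded twice.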


Directivity

All four graphs built for BioGuide are directed graphs, so every edge must have a single source and a single target. No edge is bidirectional. In mathematical form this can be represented as follows.

If an edge e has node v as source and node w as target, then the edge can be expressed as

$e = (v, w)$

For a directed graph the combination (v, w) is totally different from (w, v). Therefore,

$(v, w) \neq (w, v)$

The direction of the edges has been represented with the arrows, as explained in Figure 2.


Connectivity

Some nodes form their own sub-graphs, disconnected from the rest of the nodes, which showed us that a few regulators and promoters of the devices are incompatible with the others. We observed this disconnection in all four of our graphs. It is illustrated in Figure 3, where two sub-graphs without any edge connecting them to the main graph are shown on the right-hand side of the diagram. These features classify our graphs as disconnected graphs [1].

Fig. 3: A zoomed in screenshot showing two sub-graphs within the disconnected graph.
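Disconnected sub-graphs like those in Figure 3 can be found by ignoring edge direction and grouping nodes that are reachable from one another. The sketch below is a generic breadth-first grouping, not code taken from BioGuide; the function name and the node labels in the usage note are illustrative.

```python
from collections import defaultdict, deque


def weak_components(nodes, edges):
    """Group nodes into sub-graphs, treating every edge as undirected.

    nodes is an iterable of node labels; edges is an iterable of (v, w) pairs.
    Returns a list of sets, one per sub-graph, in order of first appearance.
    """
    neighbours = defaultdict(set)
    for v, w in edges:
        neighbours[v].add(w)
        neighbours[w].add(v)

    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            v = queue.popleft()
            component.add(v)
            for w in neighbours[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        components.append(component)
    return components
```

A graph is disconnected exactly when this function returns more than one set. With nodes `A` to `E` and edges `(A, B)`, `(B, C)` and `(D, E)`, it returns two groups.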


"Semi-Simplicity"

A simple graph is a graph in which no more than one edge joins the same pair of nodes. So, in a simple graph it is not possible to find more than one edge with the same source and the same target. Additionally, an edge with the same source and target, forming a loop, is not allowed. But in synthetic biology it is possible to construct a device consisting of devices or Biobricks of the same species or type. Accordingly, our graphs are simple graphs except that self-loops, where an edge starts from and ends on the same node, are permitted. Because of this permitted flexibility we call our graphs "semi-simple".
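The "semi-simple" property can be checked mechanically: no (source, target) pair may occur twice, while self-loops are tolerated. The following is a small sketch of such a check; the function names are our own labels for illustration.

```python
from collections import Counter


def is_semi_simple(edges):
    """True when no directed edge (source, target) appears more than once.

    Self-loops such as (v, v) are allowed, which is the one relaxation
    compared with a simple graph.
    """
    counts = Counter(edges)
    return all(n == 1 for n in counts.values())


def self_loops(edges):
    """List the edges whose source and target are the same node."""
    return [(v, w) for v, w in edges if v == w]
```

Because the graphs are directed, `("A", "B")` and `("B", "A")` count as two different edges and do not violate the property.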

For general information about graphs refer to:

[1] http://en.wikipedia.org/wiki/Graph_(mathematics)

Results

  • ...
  • ...
  • ...
  • ...
  • ...
  • ...
  • ...
  • ...

Attribution and Contributions

...

...

...

Development Plans and Project Management

...

...

..