PSO-DNA
Protein Sequence Optimizer – DNA sequence
PSO-DNA (DNA sequence) is an API endpoint that optimizes
the sequence of a protein for expression in a host organism.
Other available tools use a deterministic approach and always return one, and only one,
solution. PSO uses a stochastic approach instead, so it can explore the solution space
more widely rather than being restricted to the same solution
every time.
From a single input sequence, PSO generates a population of solutions that are repeatedly
recombined, scored and selected to obtain a final pool of optimized solutions.
The key to the optimization is the selection of the best-performing sequences.
This selection is based on the target organism and on the analysis of structures, and it
can be adapted to custom needs such as forbidden restriction sites.
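The recombine, score and select loop described above can be illustrated with a toy genetic algorithm. This is only a sketch and not PSO's actual implementation: the fitness function here is simply the distance of the GC content from a target value, whereas PSO scores candidates on organism-specific and structural criteria. The function names (`evolve`, `toy_score`, `translate`) are hypothetical, and the `population_n` and `generation_n` arguments merely borrow the names of the keys seen in the example payload, without claiming the same semantics. The sketch assumes the standard genetic code, a protein of at least two residues and a population of at least four.

```python
import random

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> amino acid.
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}
# Reverse map: amino acid -> list of synonymous codons.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def translate(dna):
    """Translate a DNA string whose length is a multiple of 3."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def toy_score(codons, target_gc=50.0):
    """Toy fitness: 0 is best, more negative means further from target GC %."""
    seq = "".join(codons)
    gc = 100.0 * sum(seq.count(b) for b in "GC") / len(seq)
    return -abs(gc - target_gc)


def evolve(protein, population_n=30, generation_n=20, seed=0):
    """Return a pool of DNA encodings of `protein`, best toy score first."""
    rng = random.Random(seed)
    # Initial population: random synonymous encodings of the same protein.
    population = [[rng.choice(SYNONYMS[aa]) for aa in protein]
                  for _ in range(population_n)]
    for _ in range(generation_n):
        # Selection: keep the better-scoring half as parents.
        population.sort(key=toy_score, reverse=True)
        parents = population[:population_n // 2]
        children = []
        while len(parents) + len(children) < population_n:
            mum, dad = rng.sample(parents, 2)
            # Recombination: single crossover at a codon boundary.
            cut = rng.randrange(1, len(protein))
            child = mum[:cut] + dad[cut:]
            # Mutation: replace one codon with a (possibly identical) synonym.
            pos = rng.randrange(len(protein))
            child[pos] = rng.choice(SYNONYMS[protein[pos]])
            children.append(child)
        population = parents + children
    population.sort(key=toy_score, reverse=True)
    return ["".join(codons) for codons in population]
```

Because every candidate is built only from synonymous codons, each sequence in the returned pool still encodes the input protein. Running the function with different seeds yields different pools, which is the practical consequence of a stochastic approach.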
Once you have set the input parameters and called the API, the request is added to our task queue and the system returns a task id to use in the subsequent phases of the process.
With this task id you can check the status of the computation while it is running and retrieve the results once it has completed.
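The submit-then-poll workflow can be sketched as a small polling helper. This is a minimal sketch under stated assumptions: `wait_for_result` and its `fetch_task` argument are hypothetical names, the polling interval and attempt limit are arbitrary, and a task is treated as finished when the response contains an `output_sequence` key, as in the sample GET response. The real API may expose an explicit status field instead.

```python
import time


def wait_for_result(fetch_task, task_id, interval_s=5.0, max_attempts=60,
                    sleep=time.sleep):
    """Poll a task until its response contains a result, then return it.

    fetch_task: callable taking a task id and returning the decoded JSON dict.
    ASSUMPTION: a finished task is recognised by the presence of
    'output_sequence'; this is not confirmed by the API documentation.
    """
    for attempt in range(max_attempts):
        response = fetch_task(task_id)
        if "output_sequence" in response:
            return response
        # Do not sleep after the final attempt.
        if attempt < max_attempts - 1:
            sleep(interval_s)
    raise TimeoutError(f"task {task_id} not finished after {max_attempts} polls")
```

Passing the fetch function in as an argument keeps the loop independent of any particular HTTP client and makes it easy to exercise without network access.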
POST Parameters
is_dna : Indicates the type of the target sequence. Set it to true if the target sequence is a
DNA sequence and to false if it is an amino acid sequence. This parameter is needed to avoid
ambiguity: for example, "AAAAAAAAA" could be either a stretch of adenines (a DNA sequence)
or a stretch of alanines (an amino acid sequence).
input_sequence : The sequence to be optimized, either a DNA or an amino acid sequence.
input_organism : The host organism. Currently supported organisms: E. coli, Mammalian cell, S. rimosus, S. coelicolor, S. cerevisiae. (The example payload below passes this value under the key target_organism.)
upstream_sequence : The DNA sequence upstream of the target sequence, ideally 50 bp.
This sequence is not optimized or edited, but it is used to minimize any possible secondary structure between the CDS and the 5'-UTR.
This parameter is valid only if the target sequence is DNA.
downstream_sequence : The DNA sequence downstream of the target sequence, ideally 50 bp.
This sequence is not optimized or edited, but it is used to minimize any possible secondary structure between the CDS and the 3'-UTR.
This parameter is valid only if the target sequence is DNA.
restriction_sites_to_avoid : A list of restriction enzymes whose recognition sites must be avoided. List the enzymes by name.
This parameter supports the commercially available enzymes from the REBASE database
(listed in the commdata file).
forbidden_strings : A list of custom DNA sequences to avoid. You can also use this parameter to add custom restriction sites.
List each entry as a DNA sequence. (The example payload below uses the key custom_forbidden_strings.)
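Two of the rules above lend themselves to a client-side check before submitting: the letters of a sequence alone cannot always tell DNA from protein, and the flanking sequences are valid only for DNA input. The following is a minimal sketch, not part of the API. The helper names `possible_types` and `check_request` are hypothetical, and the strict alphabets (A, C, G, T for DNA and the 20 standard amino acid letters) are an assumption, since the characters actually accepted by the service are not stated here.

```python
DNA_ALPHABET = set("ACGT")
PROTEIN_ALPHABET = set("ACDEFGHIKLMNPQRSTVWY")  # 20 standard amino acids


def possible_types(sequence):
    """Return the interpretations that the letters alone allow."""
    letters = set(sequence.upper())
    kinds = []
    if letters <= DNA_ALPHABET:
        kinds.append("dna")
    if letters <= PROTEIN_ALPHABET:
        kinds.append("protein")
    return kinds


def check_request(params):
    """Return a list of problems found in a request dict; empty if none."""
    problems = []
    is_dna = params.get("is_dna")
    sequence = params.get("input_sequence", "")
    if not isinstance(is_dna, bool):
        problems.append("is_dna must be set to true or false")
    elif not sequence:
        problems.append("input_sequence is empty")
    else:
        expected = "dna" if is_dna else "protein"
        if expected not in possible_types(sequence):
            problems.append(f"input_sequence is not a valid {expected} sequence")
    for key in ("upstream_sequence", "downstream_sequence"):
        flank = params.get(key)
        if flank is None:
            continue
        # Flanking sequences are documented as valid only for DNA targets.
        if is_dna is not True:
            problems.append(f"{key} is only valid when is_dna is true")
        if "dna" not in possible_types(flank):
            problems.append(f"{key} must be a DNA sequence")
    return problems
```

For instance, `possible_types("AAAAAAAAA")` reports both interpretations, which is exactly the ambiguity that `is_dna` resolves.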
GET Parameters
task_id : The task id used to access the request status and results.
Example
POST Input payload
{
    "input_sequence": "ATGAACACTTTCTTCTCCTCAGACCAGGTCTCGGCGCCCGATCGCGTCGCGCTCTGGCACGATGTCATCTGCCGTAGCTATGTCCCGCTCAAC",
    "target_organism": "E. coli",
    "upstream_sequence": "ATGAACACTTTC",
    "downstream_sequence": "ATGAACACTTTC",
    "population_n": 100,
    "generation_n": 10,
    "mfe_weight": 1.5,
    "mfe_weight_downstream": 1.5,
    "cai_weight": 0.5,
    "cai_weight_downstream": 1.5,
    "gc_weight": 0,
    "target_gc": 50,
    "restriction_sites_to_avoid": [
        "Kpn2I",
        "EcoRI"
    ],
    "custom_forbidden_strings": [
        "ATTATTAT"
    ]
}
POST response
{
    "id": "f20c5db1-95e0-469e-8079-4d851dcb54ec"
}
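Sending a payload like the one above and reading the task id from the response could look as follows with Python's standard library. This is a sketch under assumptions: the submission URL is not given in this section, so `SUBMIT_URL` is a placeholder that must be replaced with the real endpoint, the helper names are hypothetical, and any authentication the service may require is not handled.

```python
import json
import urllib.request

# HYPOTHETICAL placeholder: replace with the real PSO-DNA submission URL.
SUBMIT_URL = "https://example.invalid/pso-dna"


def build_submit_request(payload, url=SUBMIT_URL):
    """Build (but do not send) a JSON POST request for the given payload."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def extract_task_id(response_body):
    """Read the task id from a POST response body such as {"id": "..."}."""
    return json.loads(response_body)["id"]


def submit(payload, url=SUBMIT_URL):
    """Send the request and return the task id (performs a network call)."""
    with urllib.request.urlopen(build_submit_request(payload, url)) as resp:
        return extract_task_id(resp.read())
```

Keeping request construction separate from the network call makes it possible to inspect the exact body and headers before anything is sent.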
Output
GET endpoint
https://api-testing.officinae.bio/api/v1/tasks/{task_id}
e.g. https://api-testing.officinae.bio/api/v1/tasks/f20c5db1-95e0-469e-8079-4d851dcb54ec
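The task URL can be built from the template above and fetched with the standard library. A minimal sketch: the function names `task_url` and `fetch_task` are hypothetical, the response is assumed to be JSON, and no authentication is handled.

```python
import json
import urllib.parse
import urllib.request

TASKS_URL = "https://api-testing.officinae.bio/api/v1/tasks/{task_id}"


def task_url(task_id):
    """Fill the URL template, percent-encoding the id so it stays one path segment."""
    return TASKS_URL.format(task_id=urllib.parse.quote(task_id, safe=""))


def fetch_task(task_id):
    """GET the task and decode its JSON body (performs a network call)."""
    with urllib.request.urlopen(task_url(task_id)) as resp:
        return json.load(resp)
```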
GET Response
{
    "input_sequence": "ATGAACACTTTCTTCTCCTCAGACCAGGTCTCGGCGCCCGATCGCGTCGCGCTCTGGCACGATGTCATCTGCCGTAGCTATGTCCCGCTCAAC",
    "target_organism": "E. coli",
    "output_sequence": "ATGAACACCTTCTTCAGCAGTGATCAGGTAAGCGCGCCGGATCGTGTTGCGCTGTGGCACGATGTTATCTGCCGTTCTTACGTTCCGCTGAAC",
    "output_score": 0.9519981447561441,
    "output_gc": 53.76,
    "restriction_sites_to_avoid": [
        "Kpn2I",
        "EcoRI"
    ],
    "remaining_sites": 0,
    "custom_forbidden_strings": [
        "ATTATTAT"
    ],
    "input_score": 0.6940335210377976,
    "input_gc": 59.14
}
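A returned result can be sanity-checked on the client side. The sketch below recomputes the GC percentage, confirms that the optimized sequence still encodes the same protein as the input, and looks for the custom forbidden strings. The helper names are hypothetical, the translation assumes the standard genetic code, and scanning the reverse complement as well as the forward strand is an assumption about what "avoid" should cover. How `output_score` is computed is not documented here, so it is not checked.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> amino acid.
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def gc_percent(dna):
    """GC content as a percentage, rounded to two decimals."""
    return round(100.0 * sum(dna.count(b) for b in "GC") / len(dna), 2)


def translate(dna):
    """Translate a DNA string whose length is a multiple of 3."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def reverse_complement(dna):
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def check_result(result):
    """Compare a decoded GET response against values recomputed locally."""
    out = result["output_sequence"]
    hits = [s for s in result.get("custom_forbidden_strings", [])
            if s in out or reverse_complement(s) in out]
    return {
        "same_protein": translate(result["input_sequence"]) == translate(out),
        "gc_matches": gc_percent(out) == result["output_gc"],
        "forbidden_hits": hits,
    }
```

Applied to the sample response above, the recomputed GC values agree with the reported `input_gc` and `output_gc`, both sequences translate to the same 31-residue protein, and the forbidden string ATTATTAT does not occur in the output.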
