Skip to content

Conversation

@manulera
Copy link
Collaborator

@manulera manulera commented Nov 11, 2025

Same as #483, but rebased. Please continue developing on this one @BjornFJohansson

Hi @BjornFJohansson look at the last commit where I fixed the looped function, now it passes the tests. I created the draft PR so we can discuss here.

I like the changes to assembly2, I think they make things clearer, and the overriding of the PCR assembly function makes a lot of sense.

I wonder if this bit from assembly2 could be turned into a function (strands_anneal or something), or some way to test for reverse-complementarily:

        seq_u = loc_u.extract(f_u).seq
        seq_v = loc_v.extract(f_v).seq
        # instead of testing for identity we test if seq_u and seq_v anneal
        anneal = all(basepair_dict.get(x, y) for x, y in zip(str(seq_u), str(seq_v)))
        if not anneal:

…mplement_table, _complement_table, to_watson_table, to_crick_table, to_N, to_5tail_table, to_3tail_table, to_full_sequence, bp_dict
2. new Dseq.__init__ w same arguments as before, but data is now stored in Bio.Seq.Seq._data
3. altered Dseq.quick classmethod
4. watson, crick and ovhg are methods decorated with @Property
5. New method to_blunt_string with returns a the string of the watson strand if the underlying Dseq object was blunt.
6. Old __getitem__ replaced
7. New __repr__ method
8. new looped method
9. new __add__ method
… imports at the top. Some tests involved strands that did not anneal prefectly, these have been corrected.
…ytestrings

2. user method that removes U and leaves an empty site.

3. cast_to_ds_right, cast_to_ds_left methods, these are *not* fill_in methods as they do not rely on a polymerase.

4. New melt method, useful for USER cloning etc..

5. reimplemented apply_cut method
…e x and y has meaning in the new Dseq implementation. (line 1074)

2. The expected result in test_pcr_assembly_uracil should be AUUAggccggTTOO.
3. Removed numbers at start and end of some sequenses. This could be discussed.
4. Four instances of FIXME: The assert below fails in the Sanity check on line 770 in assembly2, but gives the expected result.
fuction dsbreaks is called from pydna.alphabet in __init__

simplified code overall, fuction get_parts from pydna.alphabet used in several places

simpler looped method using get_parts and __add__

improved error message from __add__
BjornFJohansson and others added 19 commits November 11, 2025 18:48
This will take care of bytes.find over the origin.
Leads to simpler Dseq.find implementation.

Added a __bytes__ dunder method to Dseq.
Previously, bytes(Dseq) called __bytes__ on the Bio.Seq.Seq
returning basically the ._data property.
This is different from str(Dseq) that returns the watson strand
of an imaginary blunt version of the Dseq.
With this, bytes and str returns the same thing ecxept for encoding.

Explain the added "79.0" in mw method.

Added that Dseq.find is case sensitive. I think this is
the least surprising functionality. Debatable.

simplified the docstring for Dseq.find method

removed commented out code

reimplemented __contains__ in terms of Dseq.find

Highlighted that __eq__ is case insensitive. Debatable.

Added docstring to shifted method

simplified the looped by testing if self+self gives a TypeError.
This puts the logics in the __add__
Changed

assert str(bbfeat).upper() in bbseq
-->
assert str(bbfeat).upper() in str(bbseq).upper()

changed since Dseq.__contains__ is now case sensitive since it now relies on Dseq.find
We have to call

out_dseqrecord.seq.cast_to_ds_left()
out_dseqrecord.seq.cast_to_ds_right()

since exo1_front and exo1_end now works differently

for symmetry:

	watson_ovhg is not a property and ovhg
More intiutive implementation of Dseq.__add__

Renamed _fill_in_five_prime and _fill_in_three_prime to the less confusing
_fill_in_left and _fill_in_right to better reflect what the methods do.

Simpler implementation of mung method.

Deleted out obsolete cas9 method.

watson_ovhg renamed left_ovhg, watson_ovhg remains an alias

Renames

_fill_in_five_prime  --> _fill_in_left
_fill_in_three_prime --> _fill_in_right

__mul__ implemented in terms of .join

nibble_to_blunt

New methods used in new Assembly2 code

	nibble_five_prime_left

	nibble_five_prime_right

	nibble_three_prime_left

	nibble_three_prime_right
@pydna-group pydna-group deleted a comment from manulera Nov 14, 2025
manulera and others added 3 commits November 14, 2025 17:27
964 Changed how the slice is expressed. This solves an edge case when Sticky3 is zero.

1624

1688 1752 1804 calculate overhang in a different way. This solves removing nucleotides from sequences that already has overhang.

2503 apply_cut_new method

2730 apply_cut
@manulera
Copy link
Collaborator Author

@cursor review this

@manulera manulera requested a review from Copilot November 18, 2025 11:43
Copilot finished reviewing on behalf of manulera November 18, 2025 11:45
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces a major refactoring of pydna's DNA sequence representation system, implementing a new "dsIUPAC" alphabet (dscode) to better handle double-stranded DNA with overhangs, single-stranded regions, and USER enzyme treatment. The changes enable new molecular cloning techniques like USER cloning while maintaining backward compatibility.

Key changes:

  • New alphabet system with dscode symbols representing base pairs and single-stranded regions
  • Refactored Dseq class with improved internal representation and new methods for DNA manipulation
  • Enhanced support for sticky ends, melting, and enzymatic treatments (USER, T4, mung bean nuclease)

Reviewed Changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 27 comments.

Show a summary per file
File Description
src/pydna/alphabet.py New module defining dscode alphabet with base pair dictionaries and translation tables
src/pydna/dseq.py Major refactoring of Dseq class with new internal representation and manipulation methods
src/pydna/utils.py Added anneal_from_left function and updated complement logic
src/pydna/assembly2.py Updated assembly logic to use new Dseq methods (cast_to_ds_, exo1_)
src/pydna/amplify.py Improved primer annealing detection using new alphabet system
src/pydna/dseqrecord.py Updated looped() method to handle features properly with sticky ends
tests/test_new.py New test file for dscode representations
tests/test_USERcloning.py Complete rewrite for USER enzyme cloning
tests/test_module_dseq.py Extensive test updates for new Dseq behavior

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +811 to +813
def user_assembly_design():
"""
Work in progress
Copy link

Copilot AI Nov 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is marked as 'Work in progress' in its docstring. If it's not ready for production, it should either be completed, removed, or marked with a decorator like @pytest.mark.skip to prevent accidental use.

Copilot uses AI. Check for mistakes.
assembly_plan = [
(1, 2, loc_end, loc_start),
]
# FIXME: The assert below fails in the Sanity check on line 770 in assembly2, but gives the expected result.
Copy link

Copilot AI Nov 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multiple FIXME comments indicate that assertions are failing sanity checks but producing expected results. This suggests either the sanity checks need adjustment or there's an underlying issue that should be resolved before merging.

Copilot uses AI. Check for mistakes.
data.append(basepair_dict[w, c])
except KeyError as err:
print(f"Base mismatch in representation {err}")
raise ValueError()
Copy link

Copilot AI Nov 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message 'Base mismatch in representation' is printed but then a ValueError is raised with no message. The ValueError should include a descriptive message, such as f'Base mismatch in representation: {err}' to help debugging.

Suggested change
raise ValueError()
raise ValueError(f"Base mismatch in representation: {err}")

Copilot uses AI. Check for mistakes.
Comment on lines +274 to +275
assert Dseq("G A C ", "C T G"[::-1], 0) == s # TODO: should probably give an error

Copy link

Copilot AI Nov 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The TODO comment suggests this test case should probably raise an error but currently passes. This indicates incomplete validation logic that should be addressed.

Suggested change
assert Dseq("G A C ", "C T G"[::-1], 0) == s # TODO: should probably give an error
with pytest.raises(ValueError):
Dseq.from_representation(
"""
Dseq(-6)
G A C
C T G
""")

Copilot uses AI. Check for mistakes.
Comment on lines +673 to +678
# if seq.ovhg < 0:
# new_crick = new_crick + reverse_complement(seq.watson[: -seq.ovhg])
# # Crick 5' overhang
# elif seq.ovhg > 0:
# new_watson = reverse_complement(seq.crick[-seq.ovhg :]) + new_watson
# if _Dseq(new_watson, new_crick, 0) != seq.cast_to_ds_left():
Copy link

Copilot AI Nov 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment appears to contain commented-out code.

Copilot uses AI. Check for mistakes.

import pytest
import IPython
import sys
Copy link

Copilot AI Nov 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'sys' is not used.

Suggested change
import sys

Copilot uses AI. Check for mistakes.
import pytest
import IPython
import sys
import copy
Copy link

Copilot AI Nov 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'copy' is not used.

Suggested change
import copy

Copilot uses AI. Check for mistakes.
import sys
import copy
import warnings
import glob
Copy link

Copilot AI Nov 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'glob' is not used.

Suggested change
import glob

Copilot uses AI. Check for mistakes.
Comment on lines +26 to +46
from Bio.Restriction import (
Acc65I,
ApaI,
BamHI,
BglII,
BsaI,
Bsp120I,
Bsu36I,
BstAPI,
EcoRI,
EcoRV,
KpnI,
MaeII,
NlaIV,
NdeI,
NotI,
PacI,
PstI,
SmaI,
RestrictionBatch,
)
Copy link

Copilot AI Nov 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'MaeII' is not used.
Import of 'PacI' is not used.

Copilot uses AI. Check for mistakes.
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import pytest
Copy link

Copilot AI Nov 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'pytest' is not used.

Suggested change
import pytest

Copilot uses AI. Check for mistakes.
@manulera manulera mentioned this pull request Nov 18, 2025
@manulera
Copy link
Collaborator Author

manulera commented Nov 18, 2025

Hi @BjornFJohansson, below my review of the PR with the things that I think should be fixed before merging. I have also added some changes myself in a PR to this branch (#489). Feel free to cherry-pick the ones you agree with or just merge it.

  • docstrings: some docstrings tests are not passing
  • documentation building fails: when the other tests pass, I can have a look and fix this myself.
  • GitHub Copilot comments: it has made some good suggestions, see if you agree with some and you can add those.
  • alphabet.py:
    • I think it is better to use a normal class instead of a namedtuple. With the class, you get property auto-completion and type hints, but not with the namedtuple. Also, I don't think people are familiar with namedtuples (I did not even know they existed). I have included this in a PR.
    • Some functions are untested and what they do is not documented. It would be good to provide minimal input-output examples in the docstring and/or unit tests.
      • dsbreaks
      • regex_ss_melt_factory
      • regex_ds_melt_factory
  • assembly2:
    • Removed FIXMEs related to the Sanity check on line 770 in assembly2. I haved fixed those in the PR.
  • test_new:
    • Should these go into the test_module_dseq.py file?
  • dseq.py:
    • issue with melt as discussed in Usage of apply_cut in melt #487.
    • mw: should this handle U characters now? Also, this can probably use a dictionary from biopython rather than hardcoding the values. Can be made into a separate issue.
    • find: given the new functionality, it would be good to have some unit tests for circular sequences where the substring spans the origin.
    • translate: we discussed that this method could be removed and maybe rely on BioPython's translate method instead. It can be a good occasion for this, but it can go into a separate issue.
    • Usage of get_parts: it may make sense to create a class method for dseq, so that instead of having to call get_parts(self._data.decode("ascii")), one can call self.get_parts() or a similar name. That also will make the method more discoverable, since it will be in the Dseq class documentation.
    • shed_ss_dna: this is undocumented and untested, it would be good to explain what it does and provide some examples.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants