New dseq manu bjorn rebased #484

manulera · 2025-11-11T18:52:36Z

Same as #483, but rebased. Please continue developing on this one @BjornFJohansson

Hi @BjornFJohansson look at the last commit where I fixed the looped function, now it passes the tests. I created the draft PR so we can discuss here.

I like the changes to assembly2, I think they make things clearer, and the overriding of the PCR assembly function makes a lot of sense.

I wonder if this bit from assembly2 could be turned into a function (strands_anneal or something), or some other way to test for reverse complementarity:

        seq_u = loc_u.extract(f_u).seq
        seq_v = loc_v.extract(f_v).seq
        # instead of testing for identity we test if seq_u and seq_v anneal
        anneal = all(basepair_dict.get(x, y) for x, y in zip(str(seq_u), str(seq_v)))
        if not anneal:
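A minimal sketch of what such a helper could look like. The name strands_anneal comes from the comment above; the PAIRS table is a reduced, hypothetical stand-in for the real table in pydna.alphabet, whose contents are not shown here. One thing worth checking in the snippet above: dict.get(x, y) treats y as the default value rather than as part of the key, so if basepair_dict is keyed by (watson, crick) tuples (as basepair_dict[w, c] elsewhere in this PR suggests), the lookup would need to be written with a tuple key.

```python
# Hypothetical, reduced pairing table: (watson_base, crick_base) tuples that anneal.
# The real table in pydna.alphabet is larger and may be structured differently.
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}


def strands_anneal(seq_u: str, seq_v: str, pairs=PAIRS) -> bool:
    """Return True if seq_u and seq_v pair position by position.

    seq_v is expected to be aligned to seq_u already (3'->5'), so that
    position i of seq_u faces position i of seq_v.
    """
    # zip() silently truncates, so unequal lengths are rejected explicitly.
    if len(seq_u) != len(seq_v):
        return False
    # Membership test on a tuple key; comparison is case insensitive here.
    return all((x.upper(), y.upper()) in pairs for x, y in zip(seq_u, seq_v))
```

The call site would then reduce to something like `if not strands_anneal(str(seq_u), str(seq_v)):`.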

…mplement_table, _complement_table, to_watson_table, to_crick_table, to_N, to_5tail_table, to_3tail_table, to_full_sequence, bp_dict

2. New Dseq.__init__ with the same arguments as before, but data is now stored in Bio.Seq.Seq._data
3. Altered Dseq.quick classmethod
4. watson, crick and ovhg are methods decorated with @property
5. New method to_blunt_string, which returns the string of the watson strand if the underlying Dseq object were blunt
6. Old __getitem__ replaced
7. New __repr__ method
8. New looped method
9. New __add__ method

… imports at the top. Some tests involved strands that did not anneal perfectly; these have been corrected.

…ytestrings
2. user method that removes U and leaves an empty site.
3. cast_to_ds_right, cast_to_ds_left methods; these are *not* fill_in methods as they do not rely on a polymerase.
4. New melt method, useful for USER cloning etc.
5. Reimplemented apply_cut method

… utils. This should fix U in primers

…XME indicating a large change in behaviour.

…e x and y has meaning in the new Dseq implementation. (line 1074)
2. The expected result in test_pcr_assembly_uracil should be AUUAggccggTTOO.
3. Removed numbers at start and end of some sequences. This could be discussed.
4. Four instances of FIXME: The assert below fails in the Sanity check on line 770 in assembly2, but gives the expected result.

…he check for internal splits in init

function dsbreaks is called from pydna.alphabet in __init__
simplified code overall, function get_parts from pydna.alphabet used in several places
simpler looped method using get_parts and __add__
improved error message from __add__

This will take care of bytes.find over the origin. Leads to simpler Dseq.find implementation.
Added a __bytes__ dunder method to Dseq. Previously, bytes(Dseq) called __bytes__ on the Bio.Seq.Seq, returning basically the ._data property. This is different from str(Dseq), which returns the watson strand of an imaginary blunt version of the Dseq. With this, bytes and str return the same thing except for encoding.
Explained the added "79.0" in mw method.
Added that Dseq.find is case sensitive. I think this is the least surprising functionality. Debatable.
Simplified the docstring for Dseq.find method
Removed commented out code
Reimplemented __contains__ in terms of Dseq.find
Highlighted that __eq__ is case insensitive. Debatable.
Added docstring to shifted method
Simplified the looped method by testing if self+self gives a TypeError. This puts the logic in __add__
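To illustrate what a search "over the origin" means for a circular sequence, here is a minimal sketch. The helper find_circular is hypothetical and is not pydna's implementation; it only shows one common technique, which is to extend the data with its own first letters before calling bytes.find.

```python
def find_circular(seq: bytes, sub: bytes) -> int:
    """Return the first index of sub in the circular sequence seq, or -1.

    Hypothetical helper for illustration only. The search is case
    sensitive, like bytes.find.
    """
    if not sub:
        return 0
    if len(sub) > len(seq):
        return -1
    # Append the first len(sub) - 1 letters so a match may wrap past the origin.
    extended = seq + seq[: len(sub) - 1]
    return extended.find(sub)
```

With this approach a match that starts near the end of the stored data and continues at position 0 is reported at its start index in the original coordinates.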

Changed assert str(bbfeat).upper() in bbseq --> assert str(bbfeat).upper() in str(bbseq).upper(), since Dseq.__contains__ is now case sensitive because it relies on Dseq.find
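The consequence for tests can be shown with a toy class. ToySeq below is a hypothetical stand-in, not the real Dseq; it only mirrors the behaviour described in the commit messages: find is case sensitive, __contains__ is built on find, and __eq__ is case insensitive.

```python
class ToySeq:
    """Hypothetical stand-in for Dseq, for illustration only."""

    def __init__(self, data: str):
        self._data = data

    def __str__(self) -> str:
        return self._data

    def find(self, sub: str) -> int:
        # Case sensitive, like str.find.
        return self._data.find(sub)

    def __contains__(self, sub) -> bool:
        # Membership inherits case sensitivity from find.
        return self.find(str(sub)) != -1

    def __eq__(self, other) -> bool:
        # Equality ignores case.
        return str(self).lower() == str(other).lower()


bbseq = ToySeq("ggatcc")
print("GAT" in bbseq)               # case-sensitive membership
print("GAT" in str(bbseq).upper())  # the form used in the updated assert
print(bbseq == ToySeq("GGATCC"))    # case-insensitive equality
```

This is why the updated assert upper-cases both sides before testing membership.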

We have to call out_dseqrecord.seq.cast_to_ds_left() out_dseqrecord.seq.cast_to_ds_right() since exo1_front and exo1_end now works differently for symmetry: watson_ovhg is not a property and ovhg

More intuitive implementation of Dseq.__add__
Renamed _fill_in_five_prime and _fill_in_three_prime to the less confusing _fill_in_left and _fill_in_right to better reflect what the methods do.
Simpler implementation of mung method.
Removed obsolete cas9 method.
watson_ovhg renamed left_ovhg; watson_ovhg remains an alias
Renames: _fill_in_five_prime --> _fill_in_left, _fill_in_three_prime --> _fill_in_right
__mul__ implemented in terms of .join
nibble_to_blunt
New methods used in new Assembly2 code: nibble_five_prime_left, nibble_five_prime_right, nibble_three_prime_left, nibble_three_prime_right

…lphabet letters

…ng it

964: Changed how the slice is expressed. This solves an edge case when Sticky3 is zero.
1624, 1688, 1752, 1804: Calculate overhang in a different way. This solves removing nucleotides from sequences that already have an overhang.
2503: apply_cut_new method
2730: apply_cut
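The commit does not show the slice itself, so the following is only an assumption about the kind of edge case involved. A common Python pitfall is that a negative end index built from a length of zero (seq[:-n] with n == 0) yields an empty result instead of the whole sequence. The helper trim_right and the variable n are hypothetical names.

```python
def trim_right(seq: str, n: int) -> str:
    """Drop n letters from the right end of seq; n == 0 keeps everything.

    Hypothetical helper. Writing the end index as len(seq) - n avoids
    the seq[:-0] pitfall, because seq[:-0] is the same as seq[:0].
    """
    return seq[: len(seq) - n]


print(repr("GATTACA"[:-0]))          # '' (the pitfall)
print(repr(trim_right("GATTACA", 0)))  # 'GATTACA'
print(repr(trim_right("GATTACA", 2)))  # 'GATTA'
```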

manulera · 2025-11-18T11:43:34Z

@cursor review this

Copilot

Pull Request Overview

This PR introduces a major refactoring of pydna's DNA sequence representation system, implementing a new "dsIUPAC" alphabet (dscode) to better handle double-stranded DNA with overhangs, single-stranded regions, and USER enzyme treatment. The changes enable new molecular cloning techniques like USER cloning while maintaining backward compatibility.

Key changes:

New alphabet system with dscode symbols representing base pairs and single-stranded regions
Refactored Dseq class with improved internal representation and new methods for DNA manipulation
Enhanced support for sticky ends, melting, and enzymatic treatments (USER, T4, mung bean nuclease)

Reviewed Changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 27 comments.


File	Description
src/pydna/alphabet.py	New module defining dscode alphabet with base pair dictionaries and translation tables
src/pydna/dseq.py	Major refactoring of Dseq class with new internal representation and manipulation methods
src/pydna/utils.py	Added anneal_from_left function and updated complement logic
src/pydna/assembly2.py	Updated assembly logic to use new Dseq methods (cast_to_ds_*, exo1_*)
src/pydna/amplify.py	Improved primer annealing detection using new alphabet system
src/pydna/dseqrecord.py	Updated looped() method to handle features properly with sticky ends
tests/test_new.py	New test file for dscode representations
tests/test_USERcloning.py	Complete rewrite for USER enzyme cloning
tests/test_module_dseq.py	Extensive test updates for new Dseq behavior


Copilot · 2025-11-18T11:49:54Z

src/pydna/design.py

+def user_assembly_design():
+    """
+    Work in progress


This function is marked as 'Work in progress' in its docstring. If it's not ready for production, it should either be completed, removed, or marked with a decorator like @pytest.mark.skip to prevent accidental use.
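Since @pytest.mark.skip applies to tests rather than library functions, one alternative way to prevent accidental use is to have the stub fail loudly. The body below is a sketch under that assumption; it is not the function's actual content, which is not shown in the diff excerpt.

```python
def user_assembly_design():
    """
    Work in progress
    """
    # Fail explicitly so callers cannot mistake the stub for a working function.
    raise NotImplementedError("user_assembly_design is still a work in progress")
```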

Copilot · 2025-11-18T11:49:54Z

tests/test_module_assembly2.py

    assembly_plan = [
        (1, 2, loc_end, loc_start),
    ]
+    # FIXME: The assert below fails in the Sanity check on line 770 in assembly2, but gives the expected result.


Multiple FIXME comments indicate that assertions are failing sanity checks but producing expected results. This suggests either the sanity checks need adjustment or there's an underlying issue that should be resolved before merging.

Copilot · 2025-11-18T11:49:54Z

src/pydna/dseq.py

+                    data.append(basepair_dict[w, c])
+                except KeyError as err:
+                    print(f"Base mismatch in representation {err}")
+                    raise ValueError()


The error message 'Base mismatch in representation' is printed but then a ValueError is raised with no message. The ValueError should include a descriptive message, such as f'Base mismatch in representation: {err}' to help debugging.

Suggested change

-                    raise ValueError()
+                    raise ValueError(f"Base mismatch in representation: {err}")
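A self-contained sketch of the suggested pattern, with the message carried by the exception instead of being printed. The reduced basepair_dict and the function pair_strands are hypothetical stand-ins for the code in dseq.py; chaining with `from err` is an addition not present in the suggestion, and it keeps the original KeyError visible in the traceback.

```python
# Hypothetical, reduced table: (watson, crick) -> symbol for the paired position.
basepair_dict = {("A", "T"): "A", ("T", "A"): "T", ("G", "C"): "G", ("C", "G"): "C"}


def pair_strands(watson: str, crick: str) -> str:
    """Combine two aligned strands into one string, or raise ValueError."""
    data = []
    for w, c in zip(watson, crick):
        try:
            data.append(basepair_dict[w, c])
        except KeyError as err:
            # Put the detail in the exception itself rather than printing it.
            raise ValueError(f"Base mismatch in representation: {err}") from err
    return "".join(data)
```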

Copilot · 2025-11-18T11:49:55Z

tests/test_module_dseq.py

+    assert Dseq("G A C ", "C T G"[::-1], 0) == s # TODO: should probably give an error
+


The TODO comment suggests this test case should probably raise an error but currently passes. This indicates incomplete validation logic that should be addressed.

Suggested change

-    assert Dseq("G A C ", "C T G"[::-1], 0) == s # TODO: should probably give an error
+    with pytest.raises(ValueError):
+        Dseq.from_representation(
+        """
+        Dseq(-6)
+        G A C
+        C T G
+        """)

Copilot · 2025-11-18T11:49:55Z

src/pydna/assembly2.py

+    # if seq.ovhg < 0:
+    #     new_crick = new_crick + reverse_complement(seq.watson[: -seq.ovhg])
+    # # Crick 5' overhang
+    # elif seq.ovhg > 0:
+    #     new_watson = reverse_complement(seq.crick[-seq.ovhg :]) + new_watson
+    # if _Dseq(new_watson, new_crick, 0) != seq.cast_to_ds_left():


This comment appears to contain commented-out code.

Copilot · 2025-11-18T11:50:00Z

tests/test_module_dseqrecord.py


 import pytest
+import IPython
+import sys


Import of 'sys' is not used.

Suggested change

-import sys

Copilot · 2025-11-18T11:50:00Z

tests/test_module_dseqrecord.py

 import pytest
+import IPython
+import sys
+import copy


Import of 'copy' is not used.

Suggested change

-import copy

Copilot · 2025-11-18T11:50:00Z

tests/test_module_dseqrecord.py

+import sys
+import copy
+import warnings
+import glob


Import of 'glob' is not used.

Suggested change

-import glob

Copilot · 2025-11-18T11:50:01Z

tests/test_module_dseqrecord.py

+from Bio.Restriction import (
+    Acc65I,
+    ApaI,
+    BamHI,
+    BglII,
+    BsaI,
+    Bsp120I,
+    Bsu36I,
+    BstAPI,
+    EcoRI,
+    EcoRV,
+    KpnI,
+    MaeII,
+    NlaIV,
+    NdeI,
+    NotI,
+    PacI,
+    PstI,
+    SmaI,
+    RestrictionBatch,
+)


Import of 'MaeII' is not used.
Import of 'PacI' is not used.

Copilot · 2025-11-18T11:50:01Z

tests/test_new.py

+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+
+import pytest


Import of 'pytest' is not used.

Suggested change

-import pytest

manulera · 2025-11-18T12:55:00Z

BjornFJohansson added 30 commits November 11, 2025 18:26

updated seq class: use base class slicing, added full_sequence property

5f3eb2e

Added override of Bio.Restriction.FormattedSeq._table

8f4dd7c

Added dicts and tables _ambiguous_dna_complement, _keys, _values, _co…

3dc3f48

…mplement_table, _complement_table, to_watson_table, to_crick_table, to_N, to_5tail_table, to_3tail_table, to_full_sequence, bp_dict

Dseq + empty string now returns the Dseq obj unchanged. collected all…

0bc3678

… imports at the top. Some tests involved strands that did not anneal perfectly; these have been corrected.

_annealing_positions new implementation using _iupac_compl_regex from…

e7eceae

… utils. This should fix U in primers

fixed fill_left and fill_right and a FIXME

c48646d

fixed initiation

6ac6149

deleted

7eecce4

anneal_from_left function and more regexes and tables

04a5ea4

updated test for USER cloning

9c7689d

moved all imports to the beginning. Changed some tests. There is a FI…

d523aff

…XME indicating a large change in behaviour.

removed if __name__ == __main__

e938341

removed main test and moved imports to the top

d801d55

Updated docstrings in Dseq class for clarity, work in progress

45705d3

fix doctests

8e79a54

removed main chunk

89e6b60

moved import

c73970e

removed reference to .length property

42cd797

removed reference to .length property

39ea20b

removed code regarding the alphabet, not in the alphabet module

a092087

broke out the __repr__ code to a function for clarity, reintroduced t…

27f5e78

…he check for internal splits in init

alphabet related code in src/pydna/alphabet.py

f998c25

mostly comments

98416f4

Commented out code to be removed.

c061519

Only check for start of error message.

b551729

Clearer names for some dicts

417e62e

BjornFJohansson and others added 19 commits November 11, 2025 18:48

better Dseq.__str__ docstring

28b6227

docstring for get_parts function.

9f4ebba

Commented out test_cas9(), since this tests an obsolete feature.

6f30729

Changed assert str(bbfeat).upper() in bbseq --> assert str(bbfeat).upper() in str(bbseq).upper(), since Dseq.__contains__ is now case sensitive because it relies on Dseq.find

Commented out obsolete cas9 method.

b9fd704

Bugfix!

b5d6043

We have to call out_dseqrecord.seq.cast_to_ds_left() out_dseqrecord.seq.cast_to_ds_right() since exo1_front and exo1_end now works differently for symmetry: watson_ovhg is not a property and ovhg

watson_ovhg is a property

a53e3ae

moved big regexes to alphabet and expressed them in terms of dscode a…

0d1d4a6

…lphabet letters

docstring stubs for factory functions

15103a1

copied tests from Dseq4 branch

310953c

all melt tests pass

df48c17

fixed the range test so that two of the melt tests pass

8c9149f

fix looped and add docstring to overridden pcr linear assembly function

27eabf3

watson_ovhg is now a property.

c38a3bd

Added import.

6d44914

better docstrings.

db68d00

docstrings

c7b5ff5

deleted tolinear

86796e4

pydna-group deleted a comment from manulera Nov 14, 2025

manulera and others added 3 commits November 14, 2025 17:27

handle edge-case where feature spans both overhangs, solved by droppi…

0235627

…ng it

Added initiation tests, slice, cut and nibble

581a0f5

manulera requested a review from Copilot November 18, 2025 11:43

Copilot started reviewing on behalf of manulera November 18, 2025 11:44

Copilot finished reviewing on behalf of manulera November 18, 2025 11:45

Copilot AI reviewed Nov 18, 2025


manulera mentioned this pull request Nov 18, 2025

Manu patch #489

Draft

