
Commit 6aedf98

xarray.merge function and major refactor for merge logic (#857)
* xarray.merge function

  Fixes GH417

  New top level :py:func:`merge` function allows for combining variables from
  any number of ``Dataset`` and/or ``DataArray`` variables. Example usage:

      >>> arrays = [xr.DataArray(n, name='var%d' % n) for n in range(5)]
      >>> xr.merge(arrays)
      <xarray.Dataset>
      Dimensions:  ()
      Coordinates:
          *empty*
      Data variables:
          var0     int64 0
          var1     int64 1
          var2     int64 2
          var3     int64 3
          var4     int64 4

  The internal refactoring also lays the groundwork for supporting ufunc-like
  functions that merge three or more arguments, such as the full form of
  `where`.

* Fixes
* Add tests for _consolidate_slices
* Add another test
* Merge fix
1 parent d89a219 commit 6aedf98

19 files changed: +985 -487 lines changed

doc/api.rst

Lines changed: 1 addition & 0 deletions
@@ -17,6 +17,7 @@ Top-level functions
    align
    broadcast
    concat
+   merge
    set_options
 
 Dataset

doc/combining.rst

Lines changed: 22 additions & 8 deletions
@@ -75,14 +75,15 @@ expensive if you are manipulating your dataset lazily using :ref:`dask`.
 Merge
 ~~~~~
 
-To combine variables and coordinates between multiple Datasets, you can use the
-:py:meth:`~xarray.Dataset.merge` and :py:meth:`~xarray.Dataset.update` methods.
-Merge checks for conflicting variables before merging and by default it returns
-a new Dataset:
+To combine variables and coordinates between multiple ``DataArray`` and/or
+``Dataset`` object, use :py:func:`~xarray.merge`. It can merge a list of
+``Dataset``, ``DataArray`` or dictionaries of objects convertible to
+``DataArray`` objects:
 
 .. ipython:: python
 
-    ds.merge({'hello': ('space', np.arange(3) + 10)})
+    xr.merge([ds, ds.rename({'foo': 'bar'})])
+    xr.merge([xr.DataArray(n, name='var%d' % n) for n in range(5)])
 
 If you merge another dataset (or a dictionary including data array objects), by
 default the resulting dataset will be aligned on the **union** of all index
@@ -91,9 +92,22 @@ coordinates:
 .. ipython:: python
 
     other = xr.Dataset({'bar': ('x', [1, 2, 3, 4]), 'x': list('abcd')})
-    ds.merge(other)
-
-This ensures that the ``merge`` is non-destructive.
+    xr.merge([ds, other])
+
+This ensures that ``merge`` is non-destructive. ``xarray.MergeError`` is raised
+if you attempt to merge two variables with the same name but different values:
+
+.. ipython::
+
+    @verbatim
+    In [1]: xr.merge([ds, ds + 1])
+    MergeError: conflicting values for variable 'foo' on objects to be combined:
+    first value: <xarray.Variable (x: 2, y: 3)>
+    array([[ 0.4691123 , -0.28286334, -1.5090585 ],
+           [-1.13563237,  1.21211203, -0.17321465]])
+    second value: <xarray.Variable (x: 2, y: 3)>
+    array([[ 1.4691123 ,  0.71713666, -0.5090585 ],
+           [-0.13563237,  2.21211203,  0.82678535]])
 
 The same non-destructive merging between ``DataArray`` index coordinates is
 used in the :py:class:`~xarray.Dataset` constructor:
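
The conflict check documented in this hunk can be illustrated without xarray. What follows is a minimal sketch, assuming plain dictionaries of lists stand in for datasets. `ToyMergeError` and `toy_merge` are hypothetical names that do not exist in xarray, and the real `xarray.merge` also aligns index coordinates before comparing values, which this sketch skips.

```python
class ToyMergeError(ValueError):
    """Hypothetical stand-in for xarray.MergeError."""


def toy_merge(objects):
    """Combine dict-like objects, refusing to overwrite conflicting values."""
    merged = {}
    for obj in objects:
        for name, value in obj.items():
            # Same name with a different value is a conflict, not an update.
            if name in merged and merged[name] != value:
                raise ToyMergeError(
                    "conflicting values for variable %r on objects to be "
                    "combined:\nfirst value: %r\nsecond value: %r"
                    % (name, merged[name], value))
            merged[name] = value
    return merged


ds = {'foo': [1, 2, 3]}
renamed = {'bar': ds['foo']}             # same data under a new name
combined = toy_merge([ds, renamed])      # no conflict: the names differ

try:
    toy_merge([ds, {'foo': [2, 3, 4]}])  # same name, different values
except ToyMergeError as err:
    conflict_message = str(err)
```

Because nothing is ever overwritten silently, the inputs are left unchanged and a conflicting merge fails loudly, which is the sense in which the documentation calls merging non-destructive.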

doc/internals.rst

Lines changed: 1 addition & 1 deletion
@@ -81,7 +81,7 @@ xarray:
 This achieves the same result as if the ``Dataset`` class had a cached property
 defined that returns an instance of your class:
 
-.. python::
+.. code-block:: python
 
     class Dataset:
         ...

doc/whats-new.rst

Lines changed: 4 additions & 0 deletions
@@ -44,6 +44,10 @@ Enhancements
   option that clips coordinate elements that are fully masked. By
   `Phillip J. Wolfram <https://github.com/pwolfram>`_.
 
+- New top level :py:func:`merge` function allows for combining variables from
+  any number of ``Dataset`` and/or ``DataArray`` variables. See :ref:`merge`
+  for more details. By `Stephan Hoyer <https://github.com/shoyer>`_.
+
 - DataArray and Dataset method :py:meth:`resample` now supports the
   ``keep_attrs=False`` option that determines whether variable and dataset
   attributes are retained in the resampled object. By

xarray/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -5,6 +5,7 @@
 from .core.variable import Variable, Coordinate
 from .core.dataset import Dataset
 from .core.dataarray import DataArray
+from .core.merge import merge, MergeError
 from .core.options import set_options
 
 from .backends.api import open_dataset, open_mfdataset, save_mfdataset

xarray/core/alignment.py

Lines changed: 64 additions & 28 deletions
@@ -8,7 +8,7 @@
 from . import ops, utils
 from .common import _maybe_promote
 from .pycompat import iteritems, OrderedDict
-from .utils import is_full_slice
+from .utils import is_full_slice, is_dict_like
 from .variable import Variable, Coordinate, broadcast_variables
 
 
@@ -77,13 +77,33 @@ def align(*objects, **kwargs):
     aligned : same as *objects
         Tuple of objects with aligned coordinates.
     """
+    return partial_align(*objects, exclude=None, **kwargs)
+
+
+def partial_align(*objects, **kwargs):
+    """partial_align(*objects, join='inner', copy=True, indexes=None,
+                     exclude=set())
+
+    Like align, but don't align along dimensions in exclude. Any indexes
+    explicitly provided with the `indexes` argument should be used in preference
+    to the aligned indexes.
+
+    Not public API.
+    """
     join = kwargs.pop('join', 'inner')
     copy = kwargs.pop('copy', True)
+    indexes = kwargs.pop('indexes', None)
+    exclude = kwargs.pop('exclude', None)
+    if exclude is None:
+        exclude = set()
     if kwargs:
         raise TypeError('align() got unexpected keyword arguments: %s'
                         % list(kwargs))
 
-    joined_indexes = _join_indexes(join, objects)
+    joined_indexes = _join_indexes(join, objects, exclude=exclude)
+    if indexes is not None:
+        joined_indexes.update(indexes)
+
     result = []
     for obj in objects:
         valid_indexers = dict((k, v) for k, v in joined_indexes.items()
@@ -92,36 +112,52 @@ def align(*objects, **kwargs):
     return tuple(result)
 
 
-def partial_align(*objects, **kwargs):
-    """partial_align(*objects, join='inner', copy=True, exclude=set()
+def is_alignable(obj):
+    return hasattr(obj, 'indexes') and hasattr(obj, 'reindex')
+
 
-    Like align, but don't align along dimensions in exclude. Not public API.
+def deep_align(list_of_variable_maps, join='outer', copy=True, indexes=None):
+    """Align objects, recursing into dictionary values.
     """
-    join = kwargs.pop('join', 'inner')
-    copy = kwargs.pop('copy', True)
-    exclude = kwargs.pop('exclude', set())
-    assert not kwargs
-    joined_indexes = _join_indexes(join, objects, exclude=exclude)
-    return tuple(obj.reindex(copy=copy, **joined_indexes) for obj in objects)
+    if indexes is None:
+        indexes = {}
+
+    # We use keys to identify arguments to align. Integers indicate single
+    # arguments, while (int, variable_name) pairs indicate variables in ordered
+    # dictionaries.
+    keys = []
+    out = []
+    targets = []
+    sentinel = object()
+    for n, variables in enumerate(list_of_variable_maps):
+        if is_alignable(variables):
+            keys.append(n)
+            targets.append(variables)
+            out.append(sentinel)
+        elif is_dict_like(variables):
+            for k, v in variables.items():
+                if is_alignable(v) and k not in indexes:
+                    # don't align dict-like variables that are already fixed
+                    # indexes: we might be overwriting these index variables
+                    keys.append((n, k))
+                    targets.append(v)
+            out.append(OrderedDict(variables))
+        else:
+            out.append(variables)
 
+    aligned = partial_align(*targets, join=join, copy=copy, indexes=indexes)
 
-def align_variables(variables, join='outer', copy=False):
-    """Align all DataArrays in the provided dict, leaving other values alone.
-    """
-    from .dataarray import DataArray
-    from pandas import Series, DataFrame, Panel
-
-    new_variables = OrderedDict(variables)
-    # if an item is a Series / DataFrame / Panel, try and wrap it in a DataArray constructor
-    new_variables.update((
-        (k, DataArray(v)) for k, v in variables.items()
-        if isinstance(v, (Series, DataFrame, Panel))
-    ))
-
-    alignable = [k for k, v in new_variables.items() if hasattr(v, 'indexes')]
-    aligned = align(*[new_variables[a] for a in alignable], join=join, copy=copy)
-    new_variables.update(zip(alignable, aligned))
-    return new_variables
+    for key, aligned_obj in zip(keys, aligned):
+        if isinstance(key, tuple):
+            n, k = key
+            out[n][k] = aligned_obj
+        else:
+            out[key] = aligned_obj
+
+    # something went wrong: we should have replaced all sentinel values
+    assert all(arg is not sentinel for arg in out)
+
+    return out
 
 
 def reindex_variables(variables, indexes, indexers, method=None,
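
The key and sentinel bookkeeping in `deep_align` can be tried in isolation. The sketch below is an illustration under stated assumptions, not xarray code: `toy_deep_apply` and `pad_to_common_length` are hypothetical names, plain lists stand in for alignable objects, and padding lists to a common length stands in for an outer-join alignment. It also omits the `indexes` argument handled by the real function.

```python
from collections import OrderedDict


def toy_deep_apply(args, is_target, transform_all):
    """Pull targets out of `args`, transform them together, scatter back.

    Each item of `args` is either a target itself, a dictionary that may
    hold targets as values, or something else that is passed through.
    """
    keys = []       # n for a top-level target, (n, name) for a dict value
    targets = []
    out = []
    sentinel = object()  # placeholder that must be overwritten below
    for n, arg in enumerate(args):
        if is_target(arg):
            keys.append(n)
            targets.append(arg)
            out.append(sentinel)
        elif isinstance(arg, dict):
            for name, value in arg.items():
                if is_target(value):
                    keys.append((n, name))
                    targets.append(value)
            out.append(OrderedDict(arg))  # copy, so the input is untouched
        else:
            out.append(arg)

    transformed = transform_all(targets)

    # Use the recorded keys to put each transformed object back in place.
    for key, new_value in zip(keys, transformed):
        if isinstance(key, tuple):
            n, name = key
            out[n][name] = new_value
        else:
            out[key] = new_value

    assert all(item is not sentinel for item in out)
    return out


def pad_to_common_length(lists):
    """Toy 'outer join': pad every list with None to the longest length."""
    longest = max((len(x) for x in lists), default=0)
    return [x + [None] * (longest - len(x)) for x in lists]


original = {'a': [1], 'label': 'text'}
result = toy_deep_apply(
    [[1, 2, 3], original, 'untouched'],
    is_target=lambda x: isinstance(x, list),
    transform_all=pad_to_common_length,
)
```

Collecting every target into one flat list means the transformation sees all objects at once, however deeply they were nested in the arguments. The keys only record where each result belongs.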

xarray/core/combine.py

Lines changed: 5 additions & 3 deletions
@@ -3,7 +3,8 @@
 import pandas as pd
 
 from . import utils
-from .pycompat import iteritems, reduce, OrderedDict, basestring
+from .merge import merge
+from .pycompat import iteritems, OrderedDict, basestring
 from .variable import Variable, as_variable, Coordinate, concat as concat_vars
 
 
@@ -69,6 +70,7 @@ def concat(objs, dim=None, data_vars='all', coords='different',
 
     See also
     --------
+    merge
     auto_combine
     """
     # TODO: add join and ignore_index arguments copied from pandas.concat
@@ -204,6 +206,7 @@ def _dataset_concat(datasets, dim, data_vars, coords, compat, positions):
     # list; the gains would be minimal
     datasets = [as_dataset(ds) for ds in datasets]
     dim, coord = _calc_concat_dim_coord(dim)
+
     concat_over = _calc_concat_over(datasets, dim, data_vars, coords)
 
     def insert_result_variable(k, v):
@@ -217,7 +220,6 @@ def insert_result_variable(k, v):
     result_coord_names = set(datasets[0].coords)
     result_attrs = datasets[0].attrs
 
-    # Dataset({}, attrs=datasets[0].attrs)
     for k, v in datasets[0].variables.items():
         if k not in concat_over:
             insert_result_variable(k, v)
@@ -374,5 +376,5 @@ def auto_combine(datasets, concat_dim=None):
     grouped = itertoolz.groupby(lambda ds: tuple(sorted(ds.data_vars)),
                                 datasets).values()
     concatenated = [_auto_concat(ds, dim=concat_dim) for ds in grouped]
-    merged = reduce(lambda ds, other: ds.merge(other), concatenated)
+    merged = merge(concatenated)
     return merged
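
The last hunk leaves the shape of `auto_combine` unchanged: group datasets by their variable names, concatenate within each group, then merge the groups, now with one `merge` call instead of a pairwise `reduce`. A rough sketch of that pipeline, assuming dictionaries of lists in place of datasets and list concatenation in place of xarray's `concat`, might look like this. `toy_auto_combine` is a hypothetical name, and a plain loop replaces the `itertoolz.groupby` call used in the real code.

```python
from collections import OrderedDict


def toy_auto_combine(datasets):
    """Group by variable names, concatenate each group, then merge groups."""
    # 1. Group datasets that share exactly the same set of variable names.
    grouped = OrderedDict()
    for ds in datasets:
        grouped.setdefault(tuple(sorted(ds)), []).append(ds)

    # 2. Concatenate within each group, variable by variable.
    concatenated = []
    for names, group in grouped.items():
        concatenated.append(
            {name: sum((ds[name] for ds in group), []) for name in names})

    # 3. Merge all groups in a single pass, rejecting conflicting values.
    merged = {}
    for ds in concatenated:
        for name, values in ds.items():
            if name in merged and merged[name] != values:
                raise ValueError('conflicting values for %r' % name)
            merged[name] = values
    return merged


pieces = [
    {'temp': [1, 2]},
    {'rain': [0.1]},
    {'temp': [3]},
    {'rain': [0.2, 0.3]},
]
combined = toy_auto_combine(pieces)
```

Here the two `temp` pieces and the two `rain` pieces are each concatenated in input order before the groups are merged into one result.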
