
Conversation

andreibratu commented on Feb 24, 2025

  • Use the /otel endpoint instead of processing spans locally
  • Helpful error messages on evaluations.run()
  • Throw errors for invalid decorator uses
  • Decorators no longer serialize all errors; critical ones are raised, so evaluations will not hang mysteriously
  • Move away from context variables in favor of OTEL's runtime context API (see the sketch below)
  • @flow will also pick up SDK logging, e.g. prompts.call(...) or tools.log(...)
  • Sync with backend API: Jinja release, log status usage
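
A minimal sketch of the OTEL runtime context API mentioned above, to show the shape of the change; this is generic OpenTelemetry usage, not the SDK's internals, and the "humanloop.flow_trace_id" key is illustrative only.

# Generic OpenTelemetry runtime-context usage (illustrative, not Humanloop internals).
from opentelemetry import context as otel_context

# Store decorator state on the current OTEL context instead of a contextvars.ContextVar.
token = otel_context.attach(otel_context.set_value("humanloop.flow_trace_id", "flow-123"))
try:
    # Anything running under this context can read the value back.
    print(otel_context.get_value("humanloop.flow_trace_id"))  # "flow-123"
finally:
    # Restore the previous context when the decorated function exits.
    otel_context.detach(token)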

andreibratu (Author) commented on Feb 24, 2025

Test script

from humanloop import Humanloop
from openai import OpenAI

hl_client = Humanloop(
    api_key="<HL_API_KEY>",
    base_url="http://localhost:80/v5",
)

TEMPLATE = "You are a useful chatbot that speaks like a {{personality}}"

client = OpenAI(api_key="<OPENAI_KEY>")
# NOTE: the populated template result is not captured; the system message below is hardcoded.
hl_client.prompts.populate_template(TEMPLATE, inputs={"personality": "pirate"})
messages = [
    {
        "role": "system",
        "content": "You are a useful chatbot that speaks like a ",
    }
]


@hl_client.tool(path="Andrei QA/Calculator")
def calculator(a: int, b: int):
    return a + b


@hl_client.flow(path="Andrei QA/Flow")
def fn(cool: str, beans: int):
    output = calculator(1, 2)
    hl_client.tools.log(
        path="Andrei QA/Log",
        tool={
            "function": {
                "name": "calculator",
                "description": "Adds two numbers",
                "parameters": {
                    "a": {
                        "type": "int",
                        "description": "First number",
                    },
                    "b": {
                        "type": "int",
                        "description": "Second number",
                    },
                },
            }
        },
        inputs={
            "a": 1,
            "b": 2,
        },
        output=output,
    )

    hl_client.prompts.call(
        path="Andrei QA/Call Prompt",
        prompt={
            "provider": "openai",
            "model": "gpt-4o",
            "temperature": 0.8,
            "frequency_penalty": 0.6,
            "presence_penalty": 0.6,
        },
        messages=[
            {
                "content": "Say something funny",
                "role": "system",
            }
        ],
    )

    with hl_client.prompt(path="Andrei QA/Prompt"):
        while True:
            user_input = input("> ")
            if user_input == "exit":
                break
            messages.append(
                {
                    "role": "user",
                    "content": user_input,
                }
            )
            response = (
                client.chat.completions.create(
                    model="gpt-4o",
                    messages=messages,
                    temperature=0.8,
                    frequency_penalty=0.6,
                    presence_penalty=0.6,
                    seed=42,
                )
                .choices[0]
                .message.content
            )
            print(response)
            messages.append(
                {
                    "role": "assistant",
                    "content": response,
                }
            )

    return messages[-1]


fn("cool", 1)

# @hl_client.flow(path="Andrei QA/Flow Evaluate")
# def fn_evaluate(a: int, b: int):
#     return calculator(a, b)


# hl_client.evaluations.run(
#     file={
#         "path": "Andrei QA/Flow Evaluate",
#         "callable": fn_evaluate,
#     },
#     name="Test",
#     dataset={
#         "path": "Andrei QA/Arithmetic",
#         "datapoints": [
#             {
#                 "inputs": {
#                     "a": 1,
#                     "b": 2,
#                 },
#                 "target": {
#                     "value": 3,
#                 },
#             },
#             {
#                 "inputs": {
#                     "a": 3,
#                     "b": 4,
#                 },
#                 "target": {
#                     "value": 7,
#                 },
#             },
#         ],
#     },
#     evaluators=[
#         {
#             "path": "Andrei QA/Equals",
#             "callable": lambda x, y: x["output"] == y["target"]["value"],
#             "return_type": "boolean",
#             "args_type": "target_required",
#         }
#     ],
# )

andreibratu changed the title from "Decorators fixes" to "Decorator fixes" on Feb 24, 2025
*,
path: Optional[str] = None,
**prompt_kernel: Unpack[DecoratorPromptKernelRequestParams], # type: ignore
path: str,
Contributor:

nit for all of these: we should support passing either path or id, no?

Author:

I'd keep it to path to reduce cognitive load

else:
self._opentelemetry_tracer = opentelemetry_tracer

@contextmanager
Contributor:

as discussed earlier, this is cool but probably unnecessary -- simpler to just keep it as a decorator on a callable

Author:

yep done

Comment on lines -131 to -143
"""Upload spans to Humanloop.
Ran by worker threads. The threads use the self._shutdown flag to wait
for Spans to arrive. Setting a timeout on self._upload_queue.get() risks
shutting down the thread early as no Spans are produced e.g. while waiting
for user input into the instrumented feature or application.
Each thread will upload a Span to Humanloop, provided the Span has all its
dependencies uploaded. The dependency happens in a Flow Trace context, where
the Trace parent must be uploaded first. The Span Processor will send in Spans
bottoms-up, while the upload of a Trace happens top-down. If a Span did not
have its span uploaded yet, it will be re-queued to be uploaded later.
"""
Contributor:

should still replace this with a correct docstring

Author:

outdated

peadaroh (Contributor) commented on Mar 6, 2025

Just to confirm: this refactors the 'polling behaviour' for flow logs so that it instead checks when the log has been updated?

jamesbaskerville (Contributor) left a comment

Generally looks great. Love the error/logging improvements and how much the processor/exporter logic has been simplified.

A few higher-level things:

  • Why are there more # type: ignore[arg-type] comments littered all over? What changed to cause that?
  • We need to handle the polling fix as a follow-on, though it's good we're sleeping in between polls now.

Comment on lines +105 to +108
The Humanloop SDK File decorators use OpenTelemetry internally.
You can provide a TracerProvider and a Tracer to integrate
with your existing telemetry system. If not provided,
an internal TracerProvider will be used.
Contributor:

Does our OTEL exporting still work OK if the user passes in a TracerProvider?
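
For reference, a hypothetical sketch of what passing in an existing tracer could look like from the caller's side. The opentelemetry_tracer parameter name appears elsewhere in this diff but should be treated as an assumption here; the provider/exporter wiring is standard opentelemetry-sdk setup.

# Hypothetical usage sketch; only the opentelemetry_tracer name is taken from the diff.
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter
from humanloop import Humanloop

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))

hl_client = Humanloop(
    api_key="<HL_API_KEY>",
    opentelemetry_tracer=provider.get_tracer("my-app"),  # assumed constructor kwarg
)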

If a different model, endpoint, or hyperparameter is used, a new
Prompt version is created. For example:
```
@humanloop_client.prompt(path="My Prompt")
Contributor:

nit: inconsistent -- the example above just uses @prompt. We should standardize.

provided, the function name is used as the path and the File
is created in the root of your Humanloop organization workspace.
:param prompt_kernel: Attributes that define the Prompt. See `class:DecoratorPromptKernelRequestParams`
Contributor:

looks like we're getting rid of this?

Contributor:

are these tests all gone now? Or did they not work?

Comment on lines +52 to +54
context = get_decorator_context()
if context is None:
raise HumanloopRuntimeError("Internal error: trace_id context is set outside a decorator context.")
Contributor:

If I'm understanding the error right, this should also be a check in the line 59 block, i.e. it should apply even when it's not the flow client?

I do like all this validation/error checking though!

Comment on lines +838 to +839
(trace_info[-200:] + "..." if len(trace_info) > 200 else trace_info) if length_limit else trace_info
)
Contributor:

The "..." should be in front if we're keeping the last 200 chars, right?

Contributor:

Also feels like length_limit should just be an int instead of a boolean.
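
A small sketch of what both suggestions combined might look like (int limit, ellipsis in front when keeping the tail of the string); the helper name is illustrative, not the PR's code.

# Illustrative helper combining both review suggestions; not the SDK's actual code.
def truncate_tail(trace_info: str, length_limit: int | None = None) -> str:
    """Keep the last length_limit characters, prefixing "..." when truncating."""
    if length_limit is None or len(trace_info) <= length_limit:
        return trace_info
    return "..." + trace_info[-length_limit:]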

},
data=serialize_span(span_to_export),
)
print("RECV", span_to_export.attributes, response.json(), response.status_code)
Contributor:

fix: change this to a logger.debug call
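
For instance, the print above could become a debug log along these lines (sketch only; the logger name and message format are illustrative):

# Sketch of the suggested change, reusing the variables from the snippet above.
import logging

logger = logging.getLogger(__name__)
logger.debug(
    "Received response for span: attributes=%s body=%s status=%s",
    span_to_export.attributes,
    response.json(),
    response.status_code,
)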

pass
else:
if eval_context_callback:
print("HELLO")
Contributor:

remove

]
)

return MessageToJson(payload)
Contributor:

Hmm, not 100% sure this will work as expected -- OTLP JSON encoding differs a bit from the standard proto3 JSON mapping. Possibly we should just be working in proto anyway? That might make things simpler, since binary proto is more of the default than JSON.

https://opentelemetry.io/docs/specs/otlp/#json-protobuf-encoding

Contributor:

I guess it's fine if our IDs aren't "correct" as long as they're internally consistent?
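
To illustrate the divergence being discussed: protobuf's MessageToJson follows the standard proto3 JSON mapping, which encodes bytes fields such as trace_id as base64, whereas the OTLP/JSON spec expects hex-encoded IDs. A quick sketch (the trace ID value is arbitrary):

# Demonstrates the base64-vs-hex ID encoding difference; serializing to binary proto avoids it.
from google.protobuf.json_format import MessageToJson
from opentelemetry.proto.trace.v1.trace_pb2 import Span

span = Span(name="example", trace_id=bytes.fromhex("0af7651916cd43dd8448eb211c80319c"))

print(MessageToJson(span))           # traceId comes out base64-encoded, not the hex OTLP/JSON expects
print(span.SerializeToString()[:8])  # binary protobuf payload, no JSON-mapping concerns at all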

attributes=[
KeyValue(
key=key,
value=AnyValue(string_value=str(value)),
Contributor:

are they always strings? Aren't they sometimes ints or some other type?
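
If we wanted to preserve types, the OTLP common proto's AnyValue has dedicated fields for them; a sketch of a type-preserving mapping (illustrative, not the PR's code):

# Illustrative type-preserving conversion; the PR currently coerces everything to string_value.
from opentelemetry.proto.common.v1.common_pb2 import AnyValue, KeyValue

def to_any_value(value) -> AnyValue:
    if isinstance(value, bool):  # check bool before int, since bool is a subclass of int
        return AnyValue(bool_value=value)
    if isinstance(value, int):
        return AnyValue(int_value=value)
    if isinstance(value, float):
        return AnyValue(double_value=value)
    return AnyValue(string_value=str(value))

attributes = [KeyValue(key=k, value=to_any_value(v)) for k, v in {"retries": 3, "ok": True}.items()]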

Infinite polling of the export queue without a sleep was causing worker threads to take up a full core of CPU. A simple sleep fixes that.
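
For context, a generic sketch of the pattern (not the exporter's actual loop): either sleep between empty polls, as this commit does, or use a blocking get with a timeout; both stop the thread from pegging a core.

# Generic sketch of avoiding a busy-wait on a queue; not the exporter's actual code.
import queue
import threading
import time

def worker(upload_queue: queue.Queue, shutdown: threading.Event, upload) -> None:
    while not shutdown.is_set():
        try:
            span = upload_queue.get_nowait()
        except queue.Empty:
            time.sleep(0.05)  # yield the CPU instead of spinning at 100%
            continue
        upload(span)  # upload is a caller-supplied callable in this sketch
        upload_queue.task_done()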
andreibratu merged commit 1a709ee into master on Mar 7, 2025 (7 checks passed)