3 changes: 2 additions & 1 deletion scripts/build_flashinfer_jit_cache_whl.sh
@@ -11,7 +11,8 @@ echo "=========================================="
 # MAX_JOBS = min(nproc, max(1, MemAvailable_GB/4))
 MEM_AVAILABLE_GB=$(free -g | awk '/^Mem:/ {print $7}')
 NPROC=$(nproc)
-MAX_JOBS=$(( MEM_AVAILABLE_GB / $([ "$(uname -m)" = "aarch64" ] && echo 8 || echo 4) ))
+# MAX_JOBS=$(( MEM_AVAILABLE_GB / $([ "$(uname -m)" = "aarch64" ] && echo 8 || echo 4) ))
+MAX_JOBS=$(( MEM_AVAILABLE_GB / 8 ))
 if (( MAX_JOBS < 1 )); then
     MAX_JOBS=1
 elif (( NPROC < MAX_JOBS )); then
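Note on the hunk above: the header comment still describes a divisor of 4, while the new line divides by 8 on every architecture. The clamp it describes, min(nproc, max(1, MemAvailable_GB / divisor)), can be sketched as a standalone function. The name compute_max_jobs and its arguments are hypothetical and do not appear in the script; the default divisor of 8 mirrors the value this change hard-codes.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the script):
#   compute_max_jobs <mem_available_gb> <nproc> [gb_per_job]
# Implements min(nproc, max(1, mem_available_gb / gb_per_job)).
compute_max_jobs() {
    local mem_gb="$1" nproc="$2" gb_per_job="${3:-8}"
    local jobs=$(( mem_gb / gb_per_job ))   # integer division
    if [ "${jobs}" -lt 1 ]; then
        jobs=1                               # never drop below one job
    elif [ "${nproc}" -lt "${jobs}" ]; then
        jobs="${nproc}"                      # never exceed the core count
    fi
    echo "${jobs}"
}

# Possible call site, using the same inputs the script already reads:
#   MAX_JOBS=$(compute_max_jobs "$(free -g | awk '/^Mem:/ {print $7}')" "$(nproc)")
```

Keeping the divisor as a parameter would let an architecture-specific value be passed in again later without touching the clamp logic.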
68 changes: 68 additions & 0 deletions scripts/task_test_blackwell_kernels.sh
@@ -25,7 +25,75 @@ if [[ "$1" == "--dry-run" ]] || [[ "${DRY_RUN}" == "true" ]]; then
fi

if [ "$DRY_RUN" != "true" ]; then
    echo "Using CUDA version: ${CUDA_VERSION}"
    echo ""

    # Install precompiled kernels (require CI build artifacts)
    JIT_ARCH_EFFECTIVE=""
    # Map CUDA_VERSION to CUDA_STREAM for artifact lookup
    if [[ "${CUDA_VERSION}" == cu* ]]; then
        CUDA_STREAM="${CUDA_VERSION}"
    elif [ "${CUDA_VERSION}" = "12.9.0" ]; then
        CUDA_STREAM="cu129"
    else
        CUDA_STREAM="cu130"
    fi
    echo "Using CUDA stream: ${CUDA_STREAM}"
    echo ""
Comment on lines 27 to +42
⚠️ Potential issue | 🟠 Major

Add explicit CUDA_VERSION validation to avoid silent fallback.

Line 28 echoes CUDA_VERSION without checking whether it is set. If CUDA_VERSION is unset in the environment, the mapping at lines 34-40 silently defaults to cu130 without warning the user. The same issue was flagged as critical in the previous review and has not yet been resolved.

Add an explicit check for unset CUDA_VERSION:

if [ "$DRY_RUN" != "true" ]; then
+   if [ -z "${CUDA_VERSION}" ]; then
+       echo "⚠️  WARNING: CUDA_VERSION environment variable not set. Defaulting to cu130."
+   fi
    echo "Using CUDA version: ${CUDA_VERSION}"

This makes the fallback behavior transparent and helps users identify configuration issues.

πŸ“ Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-if [ "$DRY_RUN" != "true" ]; then
-    echo "Using CUDA version: ${CUDA_VERSION}"
-    echo ""
-    # Install precompiled kernels (require CI build artifacts)
-    JIT_ARCH_EFFECTIVE=""
-    # Map CUDA_VERSION to CUDA_STREAM for artifact lookup
-    if [[ "${CUDA_VERSION}" == cu* ]]; then
-        CUDA_STREAM="${CUDA_VERSION}"
-    elif [ "${CUDA_VERSION}" = "12.9.0" ]; then
-        CUDA_STREAM="cu129"
-    else
-        CUDA_STREAM="cu130"
-    fi
-    echo "Using CUDA stream: ${CUDA_STREAM}"
-    echo ""
+if [ "$DRY_RUN" != "true" ]; then
+    if [ -z "${CUDA_VERSION}" ]; then
+        echo "⚠️ WARNING: CUDA_VERSION environment variable not set. Defaulting to cu130."
+    fi
+    echo "Using CUDA version: ${CUDA_VERSION}"
+    echo ""
+    # Install precompiled kernels (require CI build artifacts)
+    JIT_ARCH_EFFECTIVE=""
+    # Map CUDA_VERSION to CUDA_STREAM for artifact lookup
+    if [[ "${CUDA_VERSION}" == cu* ]]; then
+        CUDA_STREAM="${CUDA_VERSION}"
+    elif [ "${CUDA_VERSION}" = "12.9.0" ]; then
+        CUDA_STREAM="cu129"
+    else
+        CUDA_STREAM="cu130"
+    fi
+    echo "Using CUDA stream: ${CUDA_STREAM}"
+    echo ""
🤖 Prompt for AI Agents
In scripts/task_test_blackwell_kernels.sh around lines 27 to 42, validate
CUDA_VERSION before using it: if CUDA_VERSION is unset or empty, print a clear
error message stating it must be provided and exit non-zero; otherwise echo
"Using CUDA version: ${CUDA_VERSION}" and proceed with the existing mapping
logic. Ensure you use quoted checks (e.g. [ -z "${CUDA_VERSION}" ] or [[ -z
"${CUDA_VERSION}" ]]) so unset/empty values are detected, and do not silently
fall back to cu130; if you prefer a fallback instead, print an explicit warning
showing the chosen default before continuing.
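
The fail-fast variant described in the prompt above could look roughly like the sketch below. The function name resolve_cuda_stream is hypothetical and not part of the script; the mapping itself (cu* passes through, 12.9.0 becomes cu129, anything else becomes cu130) is copied from the diff, and only the empty-value check is new.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the script): map a CUDA version string to the
# artifact stream name, returning non-zero when the value is unset or empty.
resolve_cuda_stream() {
    local version="${1:-}"
    if [ -z "${version}" ]; then
        echo "ERROR: CUDA_VERSION must be set (for example 12.9.0 or cu130)." >&2
        return 1
    fi
    case "${version}" in
        cu*)    echo "${version}" ;;   # already a stream name
        12.9.0) echo "cu129" ;;
        *)      echo "cu130" ;;        # same default the script uses for other versions
    esac
}
```

A caller could then write CUDA_STREAM="$(resolve_cuda_stream "${CUDA_VERSION:-}")" || exit 1, which stops the run when the variable is missing instead of defaulting to cu130. If the warning-only behaviour from the committable suggestion is preferred, the return 1 would become a fallback echo of cu130 after the message.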

    if [ -n "${JIT_ARCH}" ]; then
        # 12.0a for CUDA 12.9.0, 12.0f for CUDA 13.0.0
        if [ "${JIT_ARCH}" = "12.0" ]; then
            if [ "${CUDA_STREAM}" = "cu129" ]; then
                JIT_ARCH_EFFECTIVE="12.0a"
            else
                JIT_ARCH_EFFECTIVE="12.0f"
            fi
        else
            JIT_ARCH_EFFECTIVE="${JIT_ARCH}"
        fi

        echo "Using JIT_ARCH from environment: ${JIT_ARCH_EFFECTIVE}"
        DIST_CUBIN_DIR="../dist/${CUDA_STREAM}/${JIT_ARCH_EFFECTIVE}/cubin"
        DIST_JIT_CACHE_DIR="../dist/${CUDA_STREAM}/${JIT_ARCH_EFFECTIVE}/jit-cache"

        echo "==== Debug: listing artifact directories ===="
        echo "Tree under ../dist:"
        (cd .. && ls -al dist) || true
        echo ""
        echo "Tree under ../dist/${CUDA_STREAM}:"
        (cd .. && ls -al "dist/${CUDA_STREAM}") || true
        echo ""
        echo "Contents of ${DIST_CUBIN_DIR}:"
        ls -al "${DIST_CUBIN_DIR}" || true
        echo ""
        echo "Contents of ${DIST_JIT_CACHE_DIR}:"
        ls -al "${DIST_JIT_CACHE_DIR}" || true
        echo "============================================="

        if [ -d "${DIST_CUBIN_DIR}" ] && ls "${DIST_CUBIN_DIR}"/*.whl >/dev/null 2>&1; then
            echo "Installing flashinfer-cubin from ${DIST_CUBIN_DIR} ..."
            pip install -q "${DIST_CUBIN_DIR}"/*.whl
        else
            echo "ERROR: flashinfer-cubin wheel not found in ${DIST_CUBIN_DIR}. Ensure the CI build stage produced the artifact." >&2
        fi

        if [ -d "${DIST_JIT_CACHE_DIR}" ] && ls "${DIST_JIT_CACHE_DIR}"/*.whl >/dev/null 2>&1; then
            echo "Installing flashinfer-jit-cache from ${DIST_JIT_CACHE_DIR} ..."
            pip install -q "${DIST_JIT_CACHE_DIR}"/*.whl
        else
            echo "ERROR: flashinfer-jit-cache wheel not found in ${DIST_JIT_CACHE_DIR} for ${CUDA_VERSION}. Ensure the CI build stage produced the artifact." >&2
        fi
        echo ""
    fi

    # Install local python sources
    pip install -e . -v --no-deps
    echo ""

    # Verify installation
    echo "Verifying installation..."
    (cd /tmp && python -m flashinfer show-config)
    echo ""
fi

EXIT_CODE=0
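
Note on the artifact lookup in this hunk: both "wheel not found" branches print ERROR to stderr and then continue, so a missing artifact does not stop the run. Whether that is intended is not stated in the diff. The sketch below factors the two decisions into small functions so a caller can choose to stop; effective_jit_arch and has_wheel are hypothetical names that do not appear in the script, and the arch mapping is copied from the diff.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the script.

# Map JIT_ARCH to the artifact directory name: "12.0" becomes "12.0a" on
# cu129 and "12.0f" on any other stream; other values pass through unchanged.
effective_jit_arch() {
    local jit_arch="$1" cuda_stream="$2"
    if [ "${jit_arch}" = "12.0" ]; then
        if [ "${cuda_stream}" = "cu129" ]; then
            echo "12.0a"
        else
            echo "12.0f"
        fi
    else
        echo "${jit_arch}"
    fi
}

# Succeed only if the given directory contains at least one .whl file.
# An unmatched glob stays literal, so the -e test rejects it.
has_wheel() {
    local dir="$1" f
    for f in "${dir}"/*.whl; do
        [ -e "${f}" ] && return 0
    done
    return 1
}
```

With these, a stricter call site could read has_wheel "${DIST_CUBIN_DIR}" || exit 1 after printing the existing error message, assuming a missing wheel should fail the job rather than fall through to the editable install.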