Can't install pandas on EMR cluster #140

@czapol

System Information

  • Spark 2.4.5
  • EMR cluster 5.30.1
  • SageMaker notebook with sparkmagic kernel

I am trying to install additional Python libraries on an EMR cluster using the install_pypi_package API. A few months ago I could install the pandas and sagemaker libraries without any problem, but now I run into the long error below. I can still install some other libraries, such as boto3, without error.

Input:
sc.install_pypi_package("pandas")

Output:
FloatProgress(value=0.0, bar_style='info', description='Progress:', layout=Layout(height='25px', width='50%'),…
Collecting cython
Using cached https://files.pythonhosted.org/packages/3d/48/bbca549da0b0f636c0f161e84d30172c40aafe99552680f297da7fedf102/Cython-0.29.24-cp37-cp37m-manylinux1_x86_64.whl
Installing collected packages: cython
Successfully installed cython-0.29.24

Collecting pandas
Using cached https://files.pythonhosted.org/packages/12/01/360d7f444f910ae16496c07e3f003cb8c641b4ca6c033408a4469a904df3/pandas-1.3.1.tar.gz
Building wheels for collected packages: unknown, unknown
Running setup.py bdist_wheel for unknown: started
Running setup.py bdist_wheel for unknown: finished with status 'error'
Complete output from command /tmp/1628881028302-0/bin/python -u -c "import setuptools, tokenize;file='/mnt/tmp/pip-build-p0xe97tr/pandas/setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" bdist_wheel -d /tmp/tmprwhg6bvxpip-wheel- --python-tag cp37:
running bdist_wheel
running build
running build_ext
building 'pandas._libs.algos' extension
creating build
creating build/temp.linux-x86_64-3.7
creating build/temp.linux-x86_64-3.7/pandas
creating build/temp.linux-x86_64-3.7/pandas/_libs
gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -DNPY_NO_DEPRECATED_API=0 -I./pandas/_libs -Ipandas/_libs/src/klib -I/usr/local/lib64/python3.7/site-packages/numpy/core/include -I/usr/include/python3.7m -c pandas/_libs/algos.c -o build/temp.linux-x86_64-3.7/pandas/_libs/algos.o
pandas/_libs/algos.c:41:10: fatal error: Python.h: No such file or directory
#include "Python.h"
^~~~~~~~~~
compilation terminated.
error: command 'gcc' failed with exit status 1


Running setup.py clean for unknown
Running setup.py bdist_wheel for unknown: started
Running setup.py bdist_wheel for unknown: still running...
Running setup.py bdist_wheel for unknown: finished with status 'error'
Complete output from command /tmp/1628881028302-0/bin/python -u -c "import setuptools, tokenize;file='/mnt/tmp/pip-build-p0xe97tr/pandas/setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" bdist_wheel -d /tmp/tmp8zdwxxl1pip-wheel- --python-tag cp37:
Compiling pandas/_libs/algos.pyx because it changed.
Compiling pandas/_libs/arrays.pyx because it changed.
Compiling pandas/_libs/groupby.pyx because it changed.
Compiling pandas/_libs/hashing.pyx because it changed.
Compiling pandas/_libs/hashtable.pyx because it changed.
Compiling pandas/_libs/index.pyx because it changed.
Compiling pandas/_libs/indexing.pyx because it changed.
Compiling pandas/_libs/internals.pyx because it changed.
Compiling pandas/_libs/interval.pyx because it changed.
Compiling pandas/_libs/join.pyx because it changed.
Compiling pandas/_libs/lib.pyx because it changed.
Compiling pandas/_libs/missing.pyx because it changed.
Compiling pandas/_libs/parsers.pyx because it changed.
Compiling pandas/_libs/reduction.pyx because it changed.
Compiling pandas/_libs/ops.pyx because it changed.
Compiling pandas/_libs/ops_dispatch.pyx because it changed.
Compiling pandas/_libs/properties.pyx because it changed.
Compiling pandas/_libs/reshape.pyx because it changed.
Compiling pandas/_libs/sparse.pyx because it changed.
Compiling pandas/_libs/tslib.pyx because it changed.
Compiling pandas/_libs/tslibs/base.pyx because it changed.
Compiling pandas/_libs/tslibs/ccalendar.pyx because it changed.
Compiling pandas/_libs/tslibs/dtypes.pyx because it changed.
Compiling pandas/_libs/tslibs/conversion.pyx because it changed.
Compiling pandas/_libs/tslibs/fields.pyx because it changed.
Compiling pandas/_libs/tslibs/nattype.pyx because it changed.
Compiling pandas/_libs/tslibs/np_datetime.pyx because it changed.
Compiling pandas/_libs/tslibs/offsets.pyx because it changed.
Compiling pandas/_libs/tslibs/parsing.pyx because it changed.
Compiling pandas/_libs/tslibs/period.pyx because it changed.
Compiling pandas/_libs/tslibs/strptime.pyx because it changed.
Compiling pandas/_libs/tslibs/timedeltas.pyx because it changed.
Compiling pandas/_libs/tslibs/timestamps.pyx because it changed.
Compiling pandas/_libs/tslibs/timezones.pyx because it changed.
Compiling pandas/_libs/tslibs/tzconversion.pyx because it changed.
Compiling pandas/_libs/tslibs/vectorized.pyx because it changed.
Compiling pandas/_libs/testing.pyx because it changed.
Compiling pandas/_libs/window/aggregations.pyx because it changed.
Compiling pandas/_libs/window/indexers.pyx because it changed.
Compiling pandas/_libs/writers.pyx because it changed.
Compiling pandas/io/sas/sas.pyx because it changed.
[ 1/41] Cythonizing pandas/_libs/algos.pyx
[ 2/41] Cythonizing pandas/_libs/arrays.pyx
[ 3/41] Cythonizing pandas/_libs/groupby.pyx
[ 4/41] Cythonizing pandas/_libs/hashing.pyx
[ 5/41] Cythonizing pandas/_libs/hashtable.pyx
[ 6/41] Cythonizing pandas/_libs/index.pyx
[ 7/41] Cythonizing pandas/_libs/indexing.pyx
[ 8/41] Cythonizing pandas/_libs/internals.pyx
[ 9/41] Cythonizing pandas/_libs/interval.pyx
[10/41] Cythonizing pandas/_libs/join.pyx
[11/41] Cythonizing pandas/_libs/lib.pyx
[12/41] Cythonizing pandas/_libs/missing.pyx
[13/41] Cythonizing pandas/_libs/ops.pyx
[14/41] Cythonizing pandas/_libs/ops_dispatch.pyx
[15/41] Cythonizing pandas/_libs/parsers.pyx
[16/41] Cythonizing pandas/_libs/properties.pyx
[17/41] Cythonizing pandas/_libs/reduction.pyx
[18/41] Cythonizing pandas/_libs/reshape.pyx
[19/41] Cythonizing pandas/_libs/sparse.pyx
[20/41] Cythonizing pandas/_libs/testing.pyx
[21/41] Cythonizing pandas/_libs/tslib.pyx
[22/41] Cythonizing pandas/_libs/tslibs/base.pyx
[23/41] Cythonizing pandas/_libs/tslibs/ccalendar.pyx
[24/41] Cythonizing pandas/_libs/tslibs/conversion.pyx
[25/41] Cythonizing pandas/_libs/tslibs/dtypes.pyx
[26/41] Cythonizing pandas/_libs/tslibs/fields.pyx
[27/41] Cythonizing pandas/_libs/tslibs/nattype.pyx
[28/41] Cythonizing pandas/_libs/tslibs/np_datetime.pyx
[29/41] Cythonizing pandas/_libs/tslibs/offsets.pyx
[30/41] Cythonizing pandas/_libs/tslibs/parsing.pyx
[31/41] Cythonizing pandas/_libs/tslibs/period.pyx
[32/41] Cythonizing pandas/_libs/tslibs/strptime.pyx
[33/41] Cythonizing pandas/_libs/tslibs/timedeltas.pyx
[34/41] Cythonizing pandas/_libs/tslibs/timestamps.pyx
[35/41] Cythonizing pandas/_libs/tslibs/timezones.pyx
[36/41] Cythonizing pandas/_libs/tslibs/tzconversion.pyx
[37/41] Cythonizing pandas/_libs/tslibs/vectorized.pyx
[38/41] Cythonizing pandas/_libs/window/aggregations.pyx
[39/41] Cythonizing pandas/_libs/window/indexers.pyx
[40/41] Cythonizing pandas/_libs/writers.pyx
[41/41] Cythonizing pandas/io/sas/sas.pyx
running bdist_wheel
running build
running build_ext
building 'pandas._libs.algos' extension
creating build
creating build/temp.linux-x86_64-3.7
creating build/temp.linux-x86_64-3.7/pandas
creating build/temp.linux-x86_64-3.7/pandas/_libs
gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -DNPY_NO_DEPRECATED_API=0 -I./pandas/_libs -Ipandas/_libs/src/klib -I/usr/local/lib64/python3.7/site-packages/numpy/core/include -I/usr/include/python3.7m -c pandas/_libs/algos.c -o build/temp.linux-x86_64-3.7/pandas/_libs/algos.o
pandas/_libs/algos.c:41:10: fatal error: Python.h: No such file or directory
#include "Python.h"
^~~~~~~~~~
compilation terminated.
error: command 'gcc' failed with exit status 1


Running setup.py clean for unknown
Failed to build unknown unknown
Installing collected packages: unknown
Running setup.py install for unknown: started
Running setup.py install for unknown: still running...
Running setup.py install for unknown: finished with status 'error'
Complete output from command /tmp/1628881028302-0/bin/python -u -c "import setuptools, tokenize;file='/mnt/tmp/pip-build-p0xe97tr/pandas/setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" install --record /tmp/pip-ie8cw1pj-record/install-record.txt --single-version-externally-managed --compile --install-headers /tmp/1628881028302-0/include/site/python3.7/unknown:
Compiling pandas/_libs/algos.pyx because it changed.
Compiling pandas/_libs/arrays.pyx because it changed.
Compiling pandas/_libs/groupby.pyx because it changed.
Compiling pandas/_libs/hashing.pyx because it changed.
Compiling pandas/_libs/hashtable.pyx because it changed.
Compiling pandas/_libs/index.pyx because it changed.
Compiling pandas/_libs/indexing.pyx because it changed.
Compiling pandas/_libs/internals.pyx because it changed.
Compiling pandas/_libs/interval.pyx because it changed.
Compiling pandas/_libs/join.pyx because it changed.
Compiling pandas/_libs/lib.pyx because it changed.
Compiling pandas/_libs/missing.pyx because it changed.
Compiling pandas/_libs/parsers.pyx because it changed.
Compiling pandas/_libs/reduction.pyx because it changed.
Compiling pandas/_libs/ops.pyx because it changed.
Compiling pandas/_libs/ops_dispatch.pyx because it changed.
Compiling pandas/_libs/properties.pyx because it changed.
Compiling pandas/_libs/reshape.pyx because it changed.
Compiling pandas/_libs/sparse.pyx because it changed.
Compiling pandas/_libs/tslib.pyx because it changed.
Compiling pandas/_libs/tslibs/base.pyx because it changed.
Compiling pandas/_libs/tslibs/ccalendar.pyx because it changed.
Compiling pandas/_libs/tslibs/dtypes.pyx because it changed.
Compiling pandas/_libs/tslibs/conversion.pyx because it changed.
Compiling pandas/_libs/tslibs/fields.pyx because it changed.
Compiling pandas/_libs/tslibs/nattype.pyx because it changed.
Compiling pandas/_libs/tslibs/np_datetime.pyx because it changed.
Compiling pandas/_libs/tslibs/offsets.pyx because it changed.
Compiling pandas/_libs/tslibs/parsing.pyx because it changed.
Compiling pandas/_libs/tslibs/period.pyx because it changed.
Compiling pandas/_libs/tslibs/strptime.pyx because it changed.
Compiling pandas/_libs/tslibs/timedeltas.pyx because it changed.
Compiling pandas/_libs/tslibs/timestamps.pyx because it changed.
Compiling pandas/_libs/tslibs/timezones.pyx because it changed.
Compiling pandas/_libs/tslibs/tzconversion.pyx because it changed.
Compiling pandas/_libs/tslibs/vectorized.pyx because it changed.
Compiling pandas/_libs/testing.pyx because it changed.
Compiling pandas/_libs/window/aggregations.pyx because it changed.
Compiling pandas/_libs/window/indexers.pyx because it changed.
Compiling pandas/_libs/writers.pyx because it changed.
Compiling pandas/io/sas/sas.pyx because it changed.
[ 1/41] Cythonizing pandas/_libs/algos.pyx
[ 2/41] Cythonizing pandas/_libs/arrays.pyx
[ 3/41] Cythonizing pandas/_libs/groupby.pyx
[ 4/41] Cythonizing pandas/_libs/hashing.pyx
[ 5/41] Cythonizing pandas/_libs/hashtable.pyx
[ 6/41] Cythonizing pandas/_libs/index.pyx
[ 7/41] Cythonizing pandas/_libs/indexing.pyx
[ 8/41] Cythonizing pandas/_libs/internals.pyx
[ 9/41] Cythonizing pandas/_libs/interval.pyx
[10/41] Cythonizing pandas/_libs/join.pyx
[11/41] Cythonizing pandas/_libs/lib.pyx
[12/41] Cythonizing pandas/_libs/missing.pyx
[13/41] Cythonizing pandas/_libs/ops.pyx
[14/41] Cythonizing pandas/_libs/ops_dispatch.pyx
[15/41] Cythonizing pandas/_libs/parsers.pyx
[16/41] Cythonizing pandas/_libs/properties.pyx
[17/41] Cythonizing pandas/_libs/reduction.pyx
[18/41] Cythonizing pandas/_libs/reshape.pyx
[19/41] Cythonizing pandas/_libs/sparse.pyx
[20/41] Cythonizing pandas/_libs/testing.pyx
[21/41] Cythonizing pandas/_libs/tslib.pyx
[22/41] Cythonizing pandas/_libs/tslibs/base.pyx
[23/41] Cythonizing pandas/_libs/tslibs/ccalendar.pyx
[24/41] Cythonizing pandas/_libs/tslibs/conversion.pyx
[25/41] Cythonizing pandas/_libs/tslibs/dtypes.pyx
[26/41] Cythonizing pandas/_libs/tslibs/fields.pyx
[27/41] Cythonizing pandas/_libs/tslibs/nattype.pyx
[28/41] Cythonizing pandas/_libs/tslibs/np_datetime.pyx
[29/41] Cythonizing pandas/_libs/tslibs/offsets.pyx
[30/41] Cythonizing pandas/_libs/tslibs/parsing.pyx
[31/41] Cythonizing pandas/_libs/tslibs/period.pyx
[32/41] Cythonizing pandas/_libs/tslibs/strptime.pyx
[33/41] Cythonizing pandas/_libs/tslibs/timedeltas.pyx
[34/41] Cythonizing pandas/_libs/tslibs/timestamps.pyx
[35/41] Cythonizing pandas/_libs/tslibs/timezones.pyx
[36/41] Cythonizing pandas/_libs/tslibs/tzconversion.pyx
[37/41] Cythonizing pandas/_libs/tslibs/vectorized.pyx
[38/41] Cythonizing pandas/_libs/window/aggregations.pyx
[39/41] Cythonizing pandas/_libs/window/indexers.pyx
[40/41] Cythonizing pandas/_libs/writers.pyx
[41/41] Cythonizing pandas/io/sas/sas.pyx
running install
running build
running build_ext
building 'pandas._libs.algos' extension
creating build
creating build/temp.linux-x86_64-3.7
creating build/temp.linux-x86_64-3.7/pandas
creating build/temp.linux-x86_64-3.7/pandas/_libs
gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -DNPY_NO_DEPRECATED_API=0 -I./pandas/_libs -Ipandas/_libs/src/klib -I/usr/local/lib64/python3.7/site-packages/numpy/core/include -I/usr/include/python3.7m -c pandas/_libs/algos.c -o build/temp.linux-x86_64-3.7/pandas/_libs/algos.o
pandas/_libs/algos.c:41:10: fatal error: Python.h: No such file or directory
#include "Python.h"
^~~~~~~~~~
compilation terminated.
error: command 'gcc' failed with exit status 1

----------------------------------------

Running setup.py (path:/mnt/tmp/pip-build-p0xe97tr/pandas/setup.py) egg_info for package pandas produced metadata for project name unknown. Fix your #egg=pandas fragments.
Failed building wheel for unknown
Failed building wheel for unknown
Command "/tmp/1628881028302-0/bin/python -u -c "import setuptools, tokenize;file='/mnt/tmp/pip-build-p0xe97tr/pandas/setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" install --record /tmp/pip-ie8cw1pj-record/install-record.txt --single-version-externally-managed --compile --install-headers /tmp/1628881028302-0/include/site/python3.7/unknown" failed with error code 1 in /mnt/tmp/pip-build-p0xe97tr/pandas/
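
The step that fails in the log above is the gcc compile of pandas/_libs/algos.c, which stops at `fatal error: Python.h: No such file or directory`. pip downloaded the source tarball (pandas-1.3.1.tar.gz) and tried to build it, and that build needs the Python development headers. Below is a minimal diagnostic sketch that could be run in a notebook cell to check whether those headers are visible to the interpreter. The helper names `python_header_path` and `has_python_headers` are hypothetical, and the sketch assumes that `sysconfig` reports the same include directory the build uses (`/usr/include/python3.7m` in the gcc command above).

```python
import os
import sysconfig


def python_header_path():
    """Return the path where this interpreter expects to find Python.h."""
    # "include" is the directory sysconfig reports for the C headers.
    include_dir = sysconfig.get_paths()["include"]
    return os.path.join(include_dir, "Python.h")


def has_python_headers():
    """Return True if Python.h exists at the expected path."""
    return os.path.isfile(python_header_path())


# Report the expected header location and whether the file is present.
print(python_header_path(), "exists:", has_python_headers())
```

If this reports that the file is missing, the source build cannot succeed on that node as it stands. Two directions that may be worth exploring, neither verified here, are installing the Python development headers on the cluster nodes or requesting a pandas release for which pip selects a prebuilt wheel instead of the tarball.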
