aliccp_dataset_processing.py line 170: .tmp file is missing #3

@cywuuuu

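    # pd, np, min_v, data_path and norm_data_path come from the rest of the
    # script (not shown in this excerpt).
    def norm_df(path, out_path):
        # Every column is parsed as int32, so the input must hold only integers.
        df = pd.read_csv(path, dtype=np.int32)
        print(df.shape)
        df -= (min_v - 1)
        df[df < 0] = 0
        df = df.astype(np.int32)
        print(df.head(10))
        df.to_csv(out_path, index=False)
        return df

    train_df = norm_df(data_path.format('train') + '.tmp',
                       norm_data_path.format('train'))
    test_df = norm_df(data_path.format('test') + '.tmp',
                      norm_data_path.format('test'))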

Which file does the .tmp here refer to? I copied the .csv to .csv.tmp, but it still fails with the following error:

    df = pd.read_csv(path,dtype=np.int32)
  File "/root/miniconda3/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 912, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/root/miniconda3/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 583, in _read
    return parser.read(nrows)
  File "/root/miniconda3/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 1704, in read
    ) = self._engine.read(  # type: ignore[attr-defined]
  File "/root/miniconda3/lib/python3.8/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 234, in read
    chunks = self._reader.read_low_memory(nrows)
  File "pandas/_libs/parsers.pyx", line 814, in pandas._libs.parsers.TextReader.read_low_memory
  File "pandas/_libs/parsers.pyx", line 891, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 1036, in pandas._libs.parsers.TextReader._convert_column_data
  File "pandas/_libs/parsers.pyx", line 1137, in pandas._libs.parsers.TextReader._convert_tokens
ValueError: invalid literal for int() with base 10: 'bacff91692951881'

What could be the specific cause of this?
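
For anyone hitting the same error: the ValueError says that at least one column in the file contains a non-numeric string ('bacff91692951881'), so a plain copy of the raw .csv cannot be read with dtype=np.int32. A possible way to narrow it down is to check which columns hold values that are not integer literals. The sketch below uses only the standard csv module. The function name find_non_integer_columns, the max_rows parameter and the sample column names are hypothetical, not taken from the repository.

```python
import csv
import io


def find_non_integer_columns(lines, max_rows=1000):
    """Return {column_name: first_offending_value} for non-integer columns.

    `lines` is any iterable of CSV lines, e.g. an open file object.
    Only the first `max_rows` data rows are scanned.
    """
    reader = csv.DictReader(lines)
    bad = {}
    for row_number, row in enumerate(reader):
        if row_number >= max_rows:
            break
        for name, value in row.items():
            if name in bad:
                continue  # already recorded an offending value for this column
            try:
                int(value)
            except (TypeError, ValueError):
                # TypeError covers missing or extra fields in ragged rows.
                bad[name] = value
    return bad


# Tiny made-up sample; the column names are hypothetical.
sample = io.StringIO(
    "click,user_id,item_id\n"
    "0,bacff91692951881,12\n"
    "1,7,34\n"
)
result = find_non_integer_columns(sample)
print(result)
```

On a real file this could be called as find_non_integer_columns(open(path, newline='')). Any column it reports would need to be mapped to integer IDs before norm_df can read the file, which suggests (as an assumption, since the excerpt does not show it) that the .tmp file is meant to be produced by an earlier encoding step rather than copied from the raw .csv.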
