On the original 30GB dataset, it took just a few seconds to go from uploading the last chunk of data to "File compression and upload completed", so I find it weird that the upload of the update hangs indefinitely while processing, without using the disk at all.
@<1523701070390366208:profile|CostlyOstrich36> I'll be glad for any ideas of what might be happening
I know I can specify the repo manually in the add_function_step call, but I would like to keep the execution from the parent pipeline task, including uncommitted changes etc.
To clarify, the parameters are typed correctly inside the pipeline task process itself but are logged as strings, so they need to be cast manually if I forward some parameters using get_parameter.
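A minimal sketch of that manual cast, assuming the forwarded values arrive as plain strings. `cast_logged_parameter` is a hypothetical helper of my own, not part of ClearML, and it only handles values that parse as Python literals:

```python
import ast


def cast_logged_parameter(value):
    """Turn a parameter that was logged as a string back into a Python value.

    Hypothetical helper (not a ClearML API). Values that are not valid
    Python literals, such as a model name, are returned unchanged.
    """
    if not isinstance(value, str):
        return value  # already typed, nothing to do
    try:
        # literal_eval only parses literals (numbers, booleans, lists, ...),
        # so it does not execute arbitrary code.
        return ast.literal_eval(value)
    except (ValueError, SyntaxError):
        return value  # keep free-form strings as they are


# Example: values as they might come back after being logged as strings.
logged = {"batch_size": "32", "lr": "0.001", "augment": "True", "backbone": "resnet50"}
typed = {name: cast_logged_parameter(raw) for name, raw in logged.items()}
```

With this, `typed["batch_size"]` is an `int` again while `typed["backbone"]` stays a string. If a parameter is meant to be a string that happens to look like a number, it would need to be excluded from the cast.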
Yes, they are. It's basically the same script as the pipeline_from_functions.py example on the ClearML GitHub, but I need to import local modules from a private repository inside the steps.
Yes, I only have a single repository. The pipeline and the individual steps are implemented in the same folder. The controller task runs fine and the repo is cloned on the agent, but the function step agents only pull the single .py file.
@<1523701205467926528:profile|AgitatedDove14> I'm not really sure about "drop in replacement" but they do support importing poetry projects. It's also extensible with plug-ins (see None ) for controlling the dependency resolution and install process so you could probably find more use cases.
Just checked that my older pipeline tasks all had this issue, the oldest one being run on 1.13.2
They are in the same repository.
It does feel like the server is struggling, since the web UI is also having trouble loading debug sample artifacts during the upload. But I'm not sure why that would be the case. The client console hangs after "uploading dataset changes" and I can see the fileserver.py process putting load on the server CPU, but I don't see any files being added or changed in the local fileserver folder. Is there a way to check what the fileserver is doing? I don't see anything suspicious in the log.
I should probably add that a lot of the update is file modifications...
Sure, you can check their github or this somewhat objective comparison to other packaging tools. I personally like that it's more PEP compliant and definitely faster in my personal experience, especially with large packages like torch. Also allows using custom install scripts but I don't have enough experience with other managers to really compare that feature. Definitely not yet as po...
Just found None which mentions the same issues in HPO
Huh. So it looks like this was an issue of spawning too many upload workers which overwhelmed the fileserver limited to a single core...? When I limited max_workers in upload() on the client side, it went smoothly with no hanging. Funny thing is I had no issues with this using sync_folder() which I used for the original data upload, hence my perceived difference in performance despite similar file sizes.
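For anyone hitting the same thing, here is a rough sketch of what limiting the workers looked like on the client side. The wrapper function, its argument names and the default of 2 workers are placeholders of mine, not values from my actual setup; pick a number that suits your fileserver:

```python
def upload_dataset_update(parent_dataset_id, folder, name, project, max_workers=2):
    """Create a child dataset version and upload it with a capped worker count.

    Hypothetical wrapper; only the clearml Dataset calls inside are library API.
    """
    # Imported lazily so this module can be loaded without clearml installed.
    from clearml import Dataset

    dataset = Dataset.create(
        dataset_name=name,
        dataset_project=project,
        parent_datasets=[parent_dataset_id],  # new version on top of the original data
    )
    dataset.add_files(path=folder)  # register the new and modified files
    # Cap the number of upload threads so a single-core fileserver
    # is not flooded with parallel requests.
    dataset.upload(max_workers=max_workers)
    dataset.finalize()
    return dataset.id
```

Leaving `max_workers` unset lets the client choose its own default, which is what overwhelmed the fileserver in my case.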