I just used this to create the dual_gpu queue:

`clearml-agent daemon --queue dual_gpu --create-queue --gpus 0,1 --detached`
Thanks for your help SuccessfulKoala55! Appreciate the patience 🙂
That will come at a later stage
PricklyRaven28 That would be my fallback, but it would make development much slower (having to rebuild containers with every small change)
Alternatively, it would be good to be able to both specify some requirements and auto-detect the rest 🤔
I was thinking of using the `--volume` settings in clearml.conf to mount the relevant directories for each user (so it's somewhat customizable). Would that work?
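For reference, a sketch of what that could look like in clearml.conf (the exact keys should be checked against the agent docs, and the paths here are placeholders):

```
# clearml.conf -- hypothetical per-user volume mounts for the agent's docker mode
agent {
    default_docker {
        # extra arguments passed to `docker run`; host paths are placeholders
        arguments: ["--volume=/home/alice/data:/data", "--volume=/mnt/shared:/shared"]
    }
}
```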
It would be amazing if one could specify specific local dependencies for remote execution; those would be uploaded to the file server and downloaded before the code starts executing
I guess, following the example https://github.com/allegroai/clearml/blob/master/examples/advanced/execute_remotely_example.py , it's not clear to me how the server has access to the data loaders' location when it hits `execute_remotely`
It failed on some missing files in my remote_execution, but otherwise seems fine now
Is there a preferred way to stop the agent?
I'll kill the agent and try again, but with the detached mode 🤞
Okay trying again without detached
Hah. Now it worked.
Btw TimelyPenguin76 this should also be a good starting point:
First create the target directory and add some files:

```
sudo mkdir /data/clearml
sudo chmod -R 777 /data/clearml
touch /data/clearml/foo
touch /data/clearml/bar
touch /data/clearml/baz
```

Then list the files using the StorageManager. It shouldn't take more than a few milliseconds:

```
from clearml import StorageManager

%%timeit
StorageManager.list("/data/clearml")
-> 21.2 s ± 328 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
The error seems to come from this line:

```
self._driver = _FileStorageDriver(str(path_driver_uri.root))
```

(line #353 in clearml/storage/helper.py)

where, if the `path_driver` is a local path, the `_FileStorageDriver` starts with `base_path = '/'` and then takes an extremely long time iterating over the entire file system (e.g. in `_get_objects`, line #1931 in helper.py).
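As a hypothetical illustration of why that base path matters (the directory layout below is made up), walking from a broad root visits every entry under it, while walking the target directory touches only the files actually being listed:

```python
import os
import tempfile

def count_entries(root):
    """Count all files and directories reachable from `root`."""
    total = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        total += len(dirnames) + len(filenames)
    return total

base = tempfile.mkdtemp()
# Simulate "the rest of the file system": many unrelated directories.
for i in range(100):
    os.makedirs(os.path.join(base, f"unrelated_{i}"))

# The directory we actually want to list, with three files.
target = os.path.join(base, "data", "clearml")
os.makedirs(target)
for name in ("foo", "bar", "baz"):
    open(os.path.join(target, name), "w").close()

print(count_entries(base))    # walks everything, like base_path='/'
print(count_entries(target))  # walks only the three files
```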
Yes. Though again, just highlighting that the naming of `foo-mod` is arbitrary. The actual module simply has a folder structure with an implicit namespace:

```
foo/
    mod/
        __init__.py
        # stuff
```
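To make the implicit-namespace point concrete, here's a small self-contained sketch (directory names are illustrative): `foo` has no `__init__.py`, so under PEP 420 it becomes an implicit namespace package, while `foo/mod` is a regular package.

```python
import importlib
import os
import sys
import tempfile

# Build the layout from above in a temp dir:
#   foo/            <- no __init__.py: implicit namespace package
#       mod/
#           __init__.py
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "foo", "mod"))
with open(os.path.join(root, "foo", "mod", "__init__.py"), "w") as f:
    f.write("VALUE = 42\n")

sys.path.insert(0, root)
mod = importlib.import_module("foo.mod")
print(mod.VALUE)  # 42

import foo
# Namespace packages have no file of their own (__file__ is None on 3.7+)
print(getattr(foo, "__file__", None))
```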
FWIW, for the time being I'm just setting the packages to all the packages the pipeline task sees, with:

```
packages = get_installed_pkgs_detail()
packages = [f"{name}=={version}" if version else name for name, version in packages.values()]
packages = task.data.script.require...
```
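An alternative sketch that avoids clearml's internal `get_installed_pkgs_detail` helper and builds the same kind of pip-style list from the standard library (`importlib.metadata`, Python 3.8+):

```python
from importlib import metadata

# Build "name==version" entries for every distribution in the current
# environment; skip broken dist-info folders that report no name.
packages = sorted(
    f"{dist.metadata['Name']}=={dist.version}"
    for dist in metadata.distributions()
    if dist.metadata["Name"]
)
print(packages[:5])
```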
I'm trying to build an easy SDK that would fit DS work and fit the concept of ClearML pipelines.
In doing so, I'm planning to define various `Step` classes that the user can then experiment with, providing steps as input to other steps, etc.
Then I'd like the user to be able to run any such step, either locally or remotely. Locally is trivial; remotely is the issue. I understand I'll need to upload additional data to the remote instance, and pull a specific artifact back to the notebo...
Hey @<1537605940121964544:profile|EnthusiasticShrimp49>! You're mostly correct. The `Step` classes will be predefined (of course developers are encouraged to add/modify as needed), but as in the `DataTransformationStep`, there may be user-defined functions specified. That's not a problem though; I can provide these functions with the `helper_functions` argument.
- The `.add_function_step` is indeed a failing point. I can't really create a task from the notebook because calling `Ta...
No worries @<1537605940121964544:profile|EnthusiasticShrimp49>! I made some headway by using `Task.create`, writing a temporary Python script, and using `task.update` in a similar way to how pipeline steps are created.
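To sketch what I mean by the temporary-script trick (the script body and step names here are illustrative, and the `Task.create` call is left commented out since it needs a configured ClearML server):

```python
import tempfile
import textwrap

# Serialize the step's entry point into a standalone script that a
# ClearML task could run remotely. The source/query values are placeholders.
script_body = textwrap.dedent(
    """\
    from steps import DataFetchingStep

    step = DataFetchingStep(source="s3://bucket", query="...", locations=[], timestamps=[])
    step.run()
    """
)

with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write(script_body)
    script_path = f.name

print(script_path)

# from clearml import Task
# task = Task.create(script=script_path, ...)
```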
I'll try and create an MVC to reproduce the issue, though I may have strayed from your original suggestion because I need to be able to use classes and not just functions.
@<1537605940121964544:profile|EnthusiasticShrimp49> It'll still take me some time to find the MVC that generated this, but I do have the ClearML experiment page for it. I was running the thing from `ipython`, and was trying to create a task from a function:
Any thoughts @<1523701070390366208:profile|CostlyOstrich36> ?
I wouldn't want to run the entire notebook, just a specific part of it.
I'll give the `create_function_task` one more try 🤞
Thanks @<1537605940121964544:profile|EnthusiasticShrimp49>! That's definitely the route I was hoping to go, but `create_function_task` is still a bit of a mystery, as I'd like to use an entire class with relevant logic and proper serialization for inputs, and potentially I'll need to add more "helper functions" (as in the case of `DataTransformationStep`, for example). Any thoughts on that? 🤔
I can elaborate in more detail if you have the time, but generally the code is just defined in some source files.
I've been trying to play around with pipelines for this purpose, but as suspected, it fails finding the definition for the pickled object…
Consider e.g.:

```
# steps.py

class DataFetchingStep:
    def __init__(self, source, query, locations, timestamps):
        # ...

    def run(self, queue=None, **kwargs):
        # ...


class DataTransformationStep:
    def __init__(self, inputs, transformations):
        # inputs can include instances of DataFetchingStep, or local files, for example
        # ...

    def run(self, queue=None, **kwargs):
        # ...
```
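For what it's worth, a minimal local sketch of how such steps might compose (all logic here is stubbed with made-up data; the real classes would fetch and transform actual data):

```python
class DataFetchingStep:
    def __init__(self, source, query, locations, timestamps):
        self.source = source
        self.query = query
        self.locations = locations
        self.timestamps = timestamps

    def run(self, queue=None, **kwargs):
        # Stub: pretend we fetched one row per location.
        return [{"loc": loc} for loc in self.locations]


class DataTransformationStep:
    def __init__(self, inputs, transformations):
        self.inputs = inputs
        self.transformations = transformations

    def run(self, queue=None, **kwargs):
        rows = []
        for inp in self.inputs:
            # Inputs may themselves be steps, so run them first.
            rows.extend(inp.run(queue=queue) if hasattr(inp, "run") else inp)
        for transform in self.transformations:
            rows = [transform(r) for r in rows]
        return rows


fetch = DataFetchingStep("db", "select *", ["paris", "tokyo"], [])
transform = DataTransformationStep([fetch], [lambda r: {**r, "ok": True}])
print(transform.run())  # [{'loc': 'paris', 'ok': True}, {'loc': 'tokyo', 'ok': True}]
```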
And then the following SDK usage in a notebook:

```
from steps imp...
```