And as far as I can see there is no mechanism installed to load other objects than the model file inside the Preprocess class, right?
Well actually this is possible. Let's assume you have another Model that is part of the preprocessing; then you could have something like this (it should work):

# inside the Preprocess class (joblib and clearml.Model imported at module level)
def preprocess(self, ...):
    # lazily load the extra model once and cache it on the instance
    if not getattr(self, "_preprocess_model", None):
        self._preprocess_model = joblib.load(Model(model_id).get_weights())
Hi CooperativeFox72
Sure 🙂
task.set_resource_monitor_iteration_timeout(seconds_from_start=1800)
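For context, a minimal sketch of how that call is typically used (project/task names here are placeholders); the idea is to extend how long, counted from the task start, the resource monitor keeps reporting machine statistics before switching to iteration-based reporting:

```python
from clearml import Task

# placeholder project/task names
task = Task.init(project_name="examples", task_name="resource monitor timeout")

# keep second-based resource reporting for the first 30 minutes of the run
task.set_resource_monitor_iteration_timeout(seconds_from_start=1800)
```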
VexedCat68
But what's happening is, that I only publish a dataset once but every time it polls,
this seems wrong (i.e. a bug?!). How do you set up the trigger? Is the Trigger Task constantly running, or are you re-launching it?
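For reference, a minimal sketch of a dataset trigger set up as a single long-running service (the task id, queue, and project names are placeholders):

```python
from clearml.automation import TriggerScheduler

# poll the backend every few minutes for new dataset versions
trigger = TriggerScheduler(pooling_frequency_minutes=3)

# launch a copy of a template task whenever a new dataset shows up in the project
trigger.add_dataset_trigger(
    schedule_task_id="<template-task-id>",  # placeholder
    schedule_queue="default",
    trigger_project="my_datasets",          # placeholder
)

# the trigger service keeps running and polling; it is not re-launched per event
trigger.start()
```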
S3 access would return a different error...
Can you do:
from clearml.storage.helper import StorageHelper
helper = StorageHelper.get("s3://<bucket>/<foo>/local/<env>/<project-name>/v0-0-1/2022-05-12-30-9-rocketclassifier.7b7c02c4dac946518bf6955e83128bc2/models/2022-05-12-30-9-rocketclassifier.pkl.gz")
print("helper", helper)
ValueError: Missing key and secret for S3 storage access
Yes, that makes sense. I think we should make sure we do not suppress this warning, it is too important.
Bottom line: the configuration section is missing in your clearml.conf.
it would be clearml-server's job to distribute to each user internally?
So you mean the user will never know their own S3 access credentials?
Are those credentials unique per user, or "hidden" once for all of them?
I can verify the behavior, I think it has to do with the way the subparser was set up.
This was the only way for me to get it to run:
script.py test blah1 blah2 blah3 42
When I passed specific arguments (for example --steps) it ignored them...
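For reference, a minimal argparse sketch that matches the working invocation above (the argument names are assumptions, not taken from the original script):

```python
import argparse

parser = argparse.ArgumentParser()
subparsers = parser.add_subparsers(dest="command")

# sub-command matching: script.py test blah1 blah2 blah3 42
test_parser = subparsers.add_parser("test")
test_parser.add_argument("inputs", nargs=3)   # blah1 blah2 blah3
test_parser.add_argument("steps", type=int)   # 42

args = parser.parse_args()
print(args.command, args.inputs, args.steps)
```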
Okay let me see if I can think of something...
Basically it is crashing on the assertion here?
https://github.com/ultralytics/yolov5/blob/d95978a562bec74eed1d42e370235937ab4e1d7a/train.py#L495
Could it be you are passing "Args/resume" True, but not specifying the checkpoint?
https://github.com/ultralytics/yolov5/blob/d95978a562bec74eed1d42e370235937ab4e1d7a/train.py#L452
I think I know what's going on:
https://github.com/ultralytics/yolov5/blob/d95978a562bec74eed1d42e370235937ab4e1d7a/train...
you mean in the enterprise
Enterprise with the smarter GPU scheduler. This is an inherent problem of sharing resources; there is no perfect solution: you either have fairness, but then you get idle GPUs, or you have races, where you can get starvation.
So if everything works you should see "my_package" package in the "installed packages"
the assumption is that if you do:
pip install "my_package"
It will set "pandas" as one of its dependencies, and pip will automatically pull pandas as well.
That way we do not list the entire venv you are running on, just the packages/versions you are using, and we let pip sort the dependencies when installing with the agent
Make sense ?
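As a sketch of what that looks like on the packaging side (package name and version are illustrative), declaring pandas as a dependency of "my_package" is enough for pip to pull it in automatically:

```python
# setup.py (illustrative)
from setuptools import setup, find_packages

setup(
    name="my_package",
    version="0.1.0",
    packages=find_packages(),
    install_requires=["pandas"],  # pip resolves and installs pandas automatically
)
```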
Hi @<1573119962950668288:profile|ObliviousSealion5>
Hello, I don't really like the idea of providing my own github credentials to the ClearML agent. We have a local ClearML deployment.
if you own the agent, that should not be an issue, no?
forward my SSH credentials using ssh -A and then starting the clearml agent?
When you are running the agent and you force git cloning with SSH, it will automatically map the .ssh folder into the container for git to use.
Ba...
This is part of a more advanced set of features of the scheduler, but only available in the enterprise edition 🙂
BTW: you still can get race/starvation cases... But at least no crash
Hi SplendidToad10
In order to run a pipeline you first have to create the steps (i.e. Tasks).
This is usually done by running the code once (basically running any code with a Task.init call will create a Task for that specific code, including the environment definition needed to reproduce it by the Agent).
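A minimal sketch of that first run (project/task names are placeholders):

```python
from clearml import Task

# running this once registers a Task, including the environment definition,
# which can then be used as a pipeline step and reproduced by the Agent
task = Task.init(project_name="examples", task_name="pipeline step 1")

# ... the actual step code goes here ...
```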
Hi RoughTiger69
One quirk I found was that even with this flag on, the agent decides to install whatever is in the requirements.txt
What's the clearml-agent version you are using?
I just noticed that even when I clear the list of installed packages in the UI, upon startup, clearml agent still picks up the requirements.txt (after checking out the code) and tries to install it.
It can also just skip the entire Python installation with:
CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1
adding the functionality to clearml-task sounds very attractive!
Hmm, what do you think?
parser.add_argument('--configuration', type=str, default=None,
                    help='Specify local configuration file')
parser.add_argument('--configuration-name', type=str, default=None,
                    help='configuration section name')
...
with open(args.configuration, 'rt') as f:
    create_populate.task.set_configuration_object(args.configuration_name, config_text=f.read())
Add h...
I use torch.save to store some very large model, so it hangs forever when it uploads the model. Is there some flag to show a progress bar?
I'm assuming the upload is http upload (e.g. the default files server)?
If this is the case, the main issue is that we do not have callbacks on http uploads to update the progress (which I would love a PR for, but this is actually a "requests" issue).
I think we had a draft somewhere, but I'm not sure ...
JitteryCoyote63 This seems like exactly what you are saying, elastic license issue...
My bad, you have to pass it to the container itself:
https://github.com/allegroai/clearml-agent/blob/a5a797ec5e5e3e90b115213c0411a516cab60e83/docs/clearml.conf#L149
extra_docker_arguments: ["-e", "CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1"]
Yep, that will fix it, nice one!!
BTW I think we should add the ability to continue aborted datasets, wdyt?
would those containers best be started from something in services mode?
Yes as long as the machine has enough cpu/ram
Notice that the services mode will start a second parallel Task after the first one is done setting up the env. If running with CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL, with containers that have git/python/clearml-agent preinstalled, the overhead should be minimal.
or is it possible to get no-overhead with my approach of worker-inside-docker?
No, do not do that, see above e...
Any chance your code needs more than the main script, but it is not in a git repo? The agent supports either a single script file, or a git repo with multiple files.
My task starts up and checks the mounted EFS volume for x data, if x data does not exist there, it then pulls x data from S3.
BoredHedgehog47 you can just use StorageManager and configure the clearml cache to sit on the EFS, it will essentially do the same 🙂
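A minimal sketch of that approach (the S3 URL is a placeholder); the cache folder itself can be pointed at the EFS mount via the clearml.conf cache settings:

```python
from clearml import StorageManager

# downloads once, then serves subsequent calls from the local (EFS-backed) cache
local_path = StorageManager.get_local_copy(remote_url="s3://my-bucket/data/x_data.zip")
print(local_path)
```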
Regarding the helm chart with EFS:
you need to configure the clearml-glue pod template with the EFS mount.
Example:
https://github.com/kubernetes-sigs/aws-efs-csi-driver/blob/e7f647f4e6fc76f983d61522e635353005f1472f/examples/kubernetes/volu...
I guess this can be built into ClearML as a feature at some future point.
VexedCat68 you mean referencing an external link?
So the agent installed okay. It's the specific Task that the agent is failing to create the environment for, correct?
If this is the case, what do you have in the "Installed Packages" section of the Task (see under the Execution tab)?
Hi DeliciousBluewhale87
Hmm, good question.
Basically the idea is that if you have an ingestion service on the pods (i.e. as part of the yaml template used by the k8s glue), you can specify to the glue what the exposed ports are, so it knows (1) the maximum number of instances it can spin, e.g. one per port, and (2) to set the external port number on the Task, so that the running agent/code will be aware of the exposed port.
A use case for it would be combining the clearml-session with the k8s gl...
BTW: if you want to sync between artifacts / settings, I would recommend calling task.reload() to get the latest values back from the server.
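A small sketch, assuming task is an existing clearml.Task instance:

```python
# refresh the local Task object with the latest state stored on the server
task.reload()

# artifacts / parameters now reflect the server-side values
artifacts = task.artifacts
params = task.get_parameters()
```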