Reputation
Badges 1
25 × Eureka!I see, that means xarray
is not an actual package but a folder add to the python path.
This explains why Task.add_requirements fails, as it is supposed to add python packages to the equivalent of "requirements.txt" ...
Is the folder part of the git repository ? How would you pass it to the remote machine the cleamrl-agent is running on?
I managed to do it by using logger.report_scalar, thanks!
Sure, but for future reference where (in ignite callbacks) did you add the report_scalar
call ?
Apparently it ignores it and replaces everything...
The other way around- "8011:8008"
TartSeal39 please let me know if it works, conda is a strange beast and we do our best to tame it.
Specifically when you execute manually on a conda env we collect (separately) the conda packages & the python packages (so later we can replicate on both conda & pip, or at least do our best)
Are you running both development env and agent with conda ?
Hmm interesting, I guess once you are able to connect it with ClearML you can just clone / modify / enqueue and let users train models directly from the UI on any hardware, is that the plan ?
, I can see the shape is
[136, 64, 80, 80]
. Is that correct?
Yes that's correct. In case of the name, just try input__0
Notice you also need to convert it to torchscript
Yes it does, but these files must be committed to begin with, basically think 'git diff' output is stored and then the agent applies it
JitteryCoyote63 next week is the Trains next release with upgrade to ES 7, do you want to wait or sort a solution for this one ?
(BTW: I think that you can mount a license file or delete one, and it should be okay, I'll ask the backend guys regradless)
Remove this from your startup script:
#!/bin/bash
there is no need that, it actually "markes out" the entire thing
Hi RoughHedgehog31
I'm assuming your git diff is just too big to be stored as is (probably some binary files)
it should not really have any effect on the execution, it just means the clearml-agent will not be able to reproduce the uncommitted changes.
Make sense ?
Hi WorriedParrot51
Assuming you run the code "manually" once (i.e. without the agent). Then when you call Task.init it will register the argparser.
When running with the agent, the first time you will call parse, it will automatically override the argparse defaults with the values stored in the Task.
Make sesne?
am getting None for Task.current_task() at the beginning of my script.
Task.init() is doing the magic , only after this call you will have current_task (either running manua...
WorriedParrot51 trains should support subparsers etc.
Even if your code calls the parsing before trains.
The only thing you need is to import the package when argparser is called (not to initialize it, that can happen later)
It should (hopefully) solve the issue.
Does this mean the model weights are stored on the clearml-server file system?
By default they are just logged (i.e. the local path is stored, but the file is not uploaded). If you want to automatically store the model, pass output_uri=True
to the Task.init , or any object store / shared folder (e.g. output_uri='
s3://bucket/folder '
). ClearML will automatically create a subfolder for the Task, and upload all models/artifacts to it.
` task = Task.init(project_name='ex...
inΒ
Β issues a delete command to the ClearML API server,...
almost, it issues the boto S3 delete commands (directly to the S3 server, not through the cleaml-server)
And that I need to enter an AWS key/secret in the profile page of the web app here?Β (edited)
correct
. Would you have any suggestions about where I could look to debug? Maybe the docker logs of the web server?
Let me check, we had the same issue reported today, Let me double check with front-end people and get back to you
i have it deployed successfully with istio.
Nice!
the only thing we had to do to get it to work was to modify the nginx.conf in the webserver pod to allow http 1.1
I was under the impression we fixed that, let me check
I lost you SmallBluewhale13 is this the Task init call you used:task = Task.init( project_name="examples", task_name="load_artifacts", output_uri="s3://company-clearml/artifacts/bethan/sales_journeys/", )
Hi ExcitedFish86
Good question, how do you "connect" the 3 nodes? (i.e. what the framework you are using)
WackyRabbit7
we did execute locally
Sure, instead of pipe.start()
use pipe.start_locally(run_pipeline_steps_locally=False)
, this is it π
but I need to dig digger into the architecture to understand what we need exactly from k8s glue.
Once you do, feel free to share, basically there are two options , use the k8s scheduler with dynamic pods, or spin the trains-agent as a service pod, and let it spin the jobs
HealthyStarfish45 my apologies, they do have it (this ability needs support for both trains-agent and server) but not in the open-source ...
PompousHawk82 unfortunately this is kind of binary, either you have full tracking of load/save operations or you do not.
This warning message will disappear in the next version as we will be able to log multiple models under the same Task :)
I think that clearml should be able to do parameter sweeps using pipelines in a manner that makes use of parallelisation.
Use the HPO, it is basically doing the same thing with some more sophisticated algorithm (HBOB):
https://github.com/allegroai/clearml/blob/master/examples/optimization/hyper-parameter-optimization/hyper_parameter_optimizer.py
For example - how would this task-based example be done with pipelines?
Sure, you could do something like:
` from clearml import Pi...
Okay found the issue, to disable SSL verification global add the following env variable:CLEARML_API_HOST_VERIFY_CERT=0
(I will make sure we fix the actual issue with the config file)
Ohh, sure then editing git config will solve it.
btw: why would you need to do that, the agent knows how to do this conversion on the fly
HealthyStarfish45 if I understand correctly the trains-agent is running as daemon (i.e. automatically pulling jobs and executes them), the only point might be cancelling a daemon will cause the Task executed by that daemon to be canceled as well.
Other than that, sounds great!