Are you aware of any other way then (other than the
secure: false
flag?
Actually self -signing and providing certificate file is already supported with boto (and thus clearml)
AWS_CA_BUNDLE
https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html
Hi VexedCat68
Yes the serving is a bit complicated. Let me try to explain the underlying setup, before going into more details.
clearml-serving CLI is a tool to launch / setup. (it does the configuration and enqueuing not the actual serving) control plan Task -> Storing the state of the serving (i.e. which end points needs to be served, what models are used, collects stats). This Task has no actual communication with the serving requests/replies (Running on the services queue) Serving Task...
SmarmyDolphin68 sadly if this was not executed with trains (i.e. the offline option of trains), this is not really doable (I mean it is, if you write some code and parse the TB 😉 but let's assume this is way to much work)
A few options:
On the next run, use clearml OFFLINE option, (i.e. in your code call Task.set_offline() , or set env variable CLEARML_OFFLINE_MODE=1) You can compress the upload the checkpoint folder manually, by passing the checkpoint folder, see https://github.com...
Hi ScantChimpanzee51
Is it possible to run multiple agent on EC2 machines started by the Autoscaler?
I think that by default you cannot,
having the Autoscaler start 1x p3.8xlarge (4 GPU) on AWS might be better than 4x p3.2xlarge (1 GPU) in terms of availability, but then then we’d need one Agent per GPU.
I think that this multi-GPU setup is only available in the enterprise tier.
That said, the AWS pricing is linear, it costs the same having 2 instances with 1 GPU as 1 instanc...
Hi ReassuredOwl55
The easiest is to configure it as default output_uri in the clearml.conf of file the agent, wdyt?
https://github.com/allegroai/clearml-agent/blob/ebb955187dea384f574a52d059c02e16a49aeead/docs/clearml.conf#L430
Did you set an agent on a machine? (See clearml agent in docs for details)
I think what you are looking for is clearml-agent daemon
https://clear.ml/docs/latest/docs/clearml_agent
https://clear.ml/docs/latest/docs/getting_started/video_tutorials/agent_remote_execution_and_automation
You need to adjust it to your setup , specifically change the queue name to one you have. Does that make sense ?
ElegantKangaroo44 it seems to work here?!
https://demoapp.trains.allegro.ai/projects/0e152d03acf94ae4bb1f3787e293a9f5/experiments/48907bb6e870479f8b230e6b564cd52e/output/metrics/plots
Hmm @<1523701083040387072:profile|UnevenDolphin73> I think this is the reason, None
and this means that even without a full lock file poetry can still build an environment
How about this one:
None
Hi GiganticTurtle0
I have found that
clearml
does not automatically detect the imports specified within the function decorated
The pipeline decorator will automatically detect the imports Inside the funciton, but not outside (i.e. global), to allow better control of packages (think for example one step needs the huge torch package, and the other does not.
Make sense ?
How can I tell
clearml
I will use the same virtual environment in all steps...
Metadata might be expensive, it's a RestAPI call, and we have found users putting hundreds of artifacts, with preview entries ...
Okay, so the idea behind the new decorator is not to group all the defined steps under the same script so that they share the same environment, but rather to simplify the process of creating scripts for each step and avoid manually calling
Task.init
on those scripts.
Correct, and allow users to more easily create Tasks from code.
Regarding virtual environment creation from caching, I will keep running benchmarks (from what you say it might be due to high workload ...
Hi @<1523701868901961728:profile|ReassuredTiger98>
is there something like a clearml context manager to disable automatic logging?
Sure just do a wildcard with the files you actually want to autolog the rest will be ignored:
None
task = Task.init(..., auto_connect_frameworks={'pytorch' : '*.pt'}
PompousParrot44
It should still create a new venv, but inherit the packages from the system-wide (or specific venv) installed packages. Meaning it will not reinstalled packages you already installed, but it will ive you the option of just replacing a specific package (or install a new one) without reinstalling the entire venv
Notice both needs to be str
btw, if you need the entire folder just use StorageManager.upload_folder
Can you print the actual values you are passing? (i.e. local_file
remote_url
)
Hi BrightGoat74
So merging general purpose plotly plots is very hard (i.e. putting both on the same graph)
But if you report using logger.report_scatter(...) the UI will merge the ROC curves into the dame graph, wdyt?
https://clear.ml/docs/latest/docs/guides/reporting/scatter_hist_confusion_mat_reporting#2d-scatter-plots
PompousParrot44 now that I think about it, you might be able to limit the cpu affinity, would that help?
Thanks OutrageousGiraffe8
Any chance you can expand the example code to be a fully a reproducible toy code? (I would really like to make sure we fix it)
PompousBeetle71 just making sure, and changing the name solved it?
With the warning ?
I was able to reproduce it on the old versions, but it seems fixed on the latest from GitHub.
Thanks for pinging OutrageousGiraffe8
I think I was able to reproduce.
model is saved to the clearml as an output model when
b
is not a dictionary.
How did you make the example work with the automagic ?
Okay, progress.
What are you getting when running the following from the git repo folder:git ls-remote --get-url origin
I think, this all ties into the none-standard git repo definition. I cannot find any other reason for it. Is it actually stuck for 5 min at the end of the process, waiting for the repo detection ?
Hi UpsetBlackbird87
This is an Optuna decision on how many concurrent tests to run simultaneously.
You limited it to 100, but remember Optuna does a Bayesian optimization process, where it decides on the best set of arguments based on the performance of the previous set, this means it will first try X trials, then decide on the next batch.
That said you can a pruner to Optuna specifying how it should start
https://optuna.readthedocs.io/en/v1.4.0/reference/pruners.html#optuna.pruners.Median...
Hi SmallDeer34
On the SaaS you can right click on an experimenter and publish it 🙂
This will make the link available for everyone, would that help?
ZanyPig66 you are correct in your assumptions. What exactly do you have in the Task? If there is no git repo the entire script should be under "uncommitted changes. What is your case?