ShinyWhale52 any time 🙂
Feel free to follow up with more questions
Hmmm, what's your trains version ?
Hi @<1536881167746207744:profile|EnormousGoose35>
Could we just share the entire project instead of the Workspace ?
You mean allow access to a project between workspaces ?
If the answer is yes, then unfortunately the SaaS version (app.clear.ml) does not really support this level of RBAC; that is part of the enterprise version, which assumes a large organization with the need for that kind of access limit.
What is the use case ? Why not just share the entire workspace ?
Hi @<1610083503607648256:profile|DiminutiveToad80>
This depends on how you configure the agents in your clearml.conf
You can use HTTPS if user/pass are configured, and you can force SSH, in which case the agent will auto-mount your host's SSH folder into the container and use it.
https://github.com/allegroai/clearml-agent/blob/0254279ed5987fbc69cebae245efaea33aec1ff2/docs/cl...
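Roughly, the relevant agent keys in clearml.conf look like this (a sketch, values are placeholders):
```
agent {
    # HTTPS clone: user + password / personal access token
    git_user: "my-git-user"
    git_pass: "my-git-token"

    # Or force SSH instead: the agent converts https:// repo links to SSH
    # and mounts the host's ~/.ssh folder into the container
    force_git_ssh_protocol: true
}
```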
Hi ClumsyElephant70
extra_docker_shell_script: ["export SECRET=SECRET", ]
I think ${SECRET} will not get resolved; you have to put the actual text value there.
That said it is a good idea to resolve it if possible, wdyt?
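i.e. something like this in the agent section of clearml.conf (a sketch; the secret has to be spelled out as literal text):
```
agent {
    # the value must be the actual text; ${SECRET} is not expanded here
    extra_docker_shell_script: ["export SECRET=my-actual-secret-value"]
}
```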
ETA for the next release is end of the month/early March, it is planned to include many other improvements 🙂
Hi @<1523701337353621504:profile|FlutteringSheep58>
are you asking how to convert a worker IP into a DNS-resolved host name ?
Hmm, could you try uploading to your files server (not S3) ?
Maybe it's a credentials error ?
My question is what should be the path to the requirements.txt file?
Is it relative to the repo base?
This is actually at runtime (i.e. when running the code), so it is relative to the working directory. Make sense ? (You can specify an absolute path, though that is probably something I would avoid in the code base...)
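For example, assuming you are passing it through Task.add_requirements (project/task names here are placeholders):
```python
from clearml import Task

# The path is resolved at runtime, relative to the current working directory;
# note this must be called before Task.init
Task.add_requirements("requirements.txt")

task = Task.init(project_name="examples", task_name="my_task")
```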
What I mean is that I don't need to have cudatoolkit installed in the current conda env, right?
Wait, are you using conda as package manager ?
EDIT: meaning configured in trains.conf as package manager
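i.e. a sketch of the relevant trains.conf section:
```
agent {
    package_manager {
        # use conda instead of pip to build the execution environment
        type: conda
    }
}
```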
CrookedWalrus33 from the log it seems the code is trying to use "kwcoco" but it is not listed under "Installed packages", nor is there any attempt to install it. Can you confirm ?
hi @<1546303293918023680:profile|MiniatureRobin9>
I can still see the metrics in Grafana.
It will not delete them from Grafana; it means they are no longer collected. Make sense ?
Sorry, what I meant is that it is not documented anywhere that the agent should run in docker mode, hence my confusion
This is a good point! I'll make sure we stress it (BTW: it will work with elevated credentials, but probably not recommended)
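i.e. spinning the agent in docker mode, something like (queue name is a placeholder):
```bash
# each pulled Task will execute inside its own docker container
clearml-agent daemon --queue default --docker
```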
Is it CLEARML_CONFIG_FILE ? (I had to dig this from the GH code)
Yes it is !
https://clear.ml/docs/latest/docs/faq#clearml-configuration
(I will make sure we add it to https://clear.ml/docs/latest/docs/configs/env_vars#server-connection as well 🙂 )
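i.e. something like (path is a placeholder):
```bash
# point the SDK / agent at a specific configuration file
export CLEARML_CONFIG_FILE=/path/to/clearml.conf
python my_script.py
```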
MoodyCentipede68 could it be that the model is on one account (workspace) and your credentials (the ones provided to the docker compose) are from another workspace?
The error itself points to the Triton helper failing to get the model ID from the backend. The models are uploaded to a specific workspace, and it looks like a mismatch (i.e. the model ID is nowhere to be found). wdyt?
I get gaps in the graphs.
For example, the first time I run, I create a task and run a loop:
Hi SourOx12
Is this related to this one?
https://github.com/allegroai/clearml/issues/496
You need to adjust it to your setup, specifically change the queue name to one you have. Does that make sense ?
When is clearml-deploy coming to the open source release?
Currently available under clearml-serving (more features are being worked on, i.e. additional stats and backends)
https://github.com/allegroai/clearml-serving
Yey! BTW: what's the setup you are running it with ? Does it include "manual" tasks? Do you also report on completed experiments (not just failed ones)? Do you filter by iteration numbers?
but when we try to do a "New Run" from the UI, it tries to follow the DAG of the previous run (the run with all child nodes skipped) and the new run fails too.
This is odd. Is this reproducible ? What's the clearml python package version ?
Because of that, I cannot create a task in this project programmatically locally, because it tries to access the bucket and fails. And there is no easy way to change the default output location (not in the web UI, not in the SDK)
JitteryCoyote63 hmm that is a pickle ...
let me check the code ...
but I can't tell, is that the only way to use the services queue, or can I experiment with that?
UnevenOstrich23 I'm not sure what exactly the question is, but if you are asking whether this is limited, the answer is no, it is not limited to that use case.
Specifically, you can run as many agents in "services-mode" as you need, pulling from any queue/s, and they can run any Task that is enqueued on those queues. There is no enforced limitation. Did that answer the question ?
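For example, running a services-mode agent looks something like (queue name is a placeholder):
```bash
# a single services-mode agent can run multiple Tasks concurrently
clearml-agent daemon --services-mode --queue services --docker --cpu-only
```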
I generate some more graphs with a file called graphs.py and want to attach/upload them to this training task
Makes total sense to use Task.get_task, I just want to make sure that you are aware of all the options, so you pick the correct one for you :)
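A minimal sketch with Task.get_task (project/task names and the figure are placeholders):
```python
import matplotlib.pyplot as plt
from clearml import Task

# reconnect to the existing training task and attach an extra plot to it
task = Task.get_task(project_name="my_project", task_name="my_training_task")

fig = plt.figure()
plt.plot([1, 2, 3], [4, 5, 6])
task.get_logger().report_matplotlib_figure(
    title="extra graphs", series="analysis", figure=fig, iteration=0
)
```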
Could you download and send the entire log ?
Hmm so I guess the actual code adds it into the reporting itself ...
How about we call: task.set_initial_iteration(0)
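i.e. a sketch, right after Task.init (names are placeholders):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="training")
# start reporting from iteration 0 instead of continuing the previous
# iteration count, which should close the gaps in the scalar graphs
task.set_initial_iteration(0)
```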
BTW is it cheaper than an EC2 instance? Why not use the AWS autoscaler ?