'config.pbtxt' could not be inferred. please provide specific config.pbtxt definition.
This basically means there is no configuration on how to serve the model, i.e. the size/type of the lower (input) layer and the output layer.
You can either store the configuration on the creating Task, as is done here:
https://github.com/allegroai/clearml-serving/blob/b5f5d72046f878bd09505606ca1147d93a5df069/examples/keras/keras_mnist.py#L51
Or you can provide it as standalone file when registering the mo...
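For reference, here is a minimal config.pbtxt sketch in Triton's format; the model name, layer names, types, and dims below are placeholders you would replace with your model's actual input/output spec:
```
name: "keras_mnist"
platform: "tensorflow_savedmodel"
input [
  {
    name: "dense_input"
    data_type: TYPE_FP32
    dims: [ -1, 784 ]
  }
]
output [
  {
    name: "activation_2"
    data_type: TYPE_FP32
    dims: [ -1, 10 ]
  }
]
```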
That is a good question. Usually the CUDA version is automatically detected, unless you override it with the conf file or an OS env. What's the setup? Are you using conda as the package manager? (conda actually installs CUDA drivers, so if the original Task was executed on a machine with conda, it will take the CUDA version automatically; the reason is to match the CUDA/Torch/TF versions.)
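If you do need to force it, a sketch of the conf-file override (assuming the agent.cuda_version / agent.cudnn_version keys; the values here are just examples):
```
# in clearml.conf (values are examples)
agent.cuda_version: "11.2"
agent.cudnn_version: "8.0"
```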
Hi FlutteringSheep58
are you asking how to convert a worker IP into a DNS-resolved hostname?
CooperativeFox72 this is indeed sad news.
When you have the time, please see if you can send a code snippet to reproduce the issue. I'd like to have it fixed
Thanks DilapidatedDucks58! We ❤ suggestions for improvements!
Did you try to print the page using the browser? (I think they can all save a page as a PDF these days.) Yes, I agree, it would be useful. We have some thoughts on creating plugins for the system, and I think this could be a good use-case. Wait a week or two ;)
PunySquid88 RC1 is out with a fix: pip install trains-agent==0.14.2rc1
SuperiorDucks36, is the domain name "rz-s-git"? That does not seem like a valid domain.
EDIT:
Is it a local domain on your network?
but realized calling that from the extension would be hard, so we opted to have the TypeScript code make calls to the ClearML API server directly, e.g.
POST /tasks.get_all_ex.
did you manage to get that working?
- To get the credentials, we read the ~/clearml.conf file. I tried hard, but couldn't get a TypeScript library to parse the HOCON config file format... so I eventually resorted to using (likely brittle) regex to grab the ClearML endpoint and API ke...
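(For anyone following along, a rough Python sketch of the same two calls; the host and credentials are placeholders, and it assumes auth.login with HTTP basic auth returns a bearer token:)
```python
import requests

api_host = "https://api.clear.ml"          # placeholder
access_key, secret_key = "KEY", "SECRET"   # placeholders from ~/clearml.conf

# exchange the key/secret pair for a session token
token = requests.post(
    f"{api_host}/auth.login", auth=(access_key, secret_key)
).json()["data"]["token"]

# the same endpoint the extension calls
resp = requests.post(
    f"{api_host}/tasks.get_all_ex",
    headers={"Authorization": f"Bearer {token}"},
    json={"page_size": 10},
)
print(resp.json()["data"]["tasks"])
```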
Yes, I suspect it is too large
Notice that most parts have default values so there is no need to specify them
Hi ClumsyElephant70
extra_docker_shell_script: ["export SECRET=SECRET", ]
I think ${SECRET} will not get resolved; you have to explicitly put the text value there.
That said, it is a good idea to resolve it if possible. wdyt?
Hi CleanPigeon16
You need to pass the private repository docker credentials to the AWS instance. I would use the custom bash script option of the AWS autoscaler to create the docker credentials file.
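A rough sketch of what that script could run (assuming the aws_autoscaler example's extra_vm_bash_script option; the registry URL and credential variables are placeholders):
```
extra_vm_bash_script: |
  docker login -u "${DOCKER_USER}" -p "${DOCKER_PASS}" registry.example.com
```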
YummyWhale40 from the code snippet, it seems like the argument is passed.
"reuse_last_task_id=True" is the default, and it means that if the previous run of the task did not create any artifacts/models and was executed 72 hours ago (configurable), The Task will be reset (i.e. all logs cleared) and will be reused in the current run.
SmarmySeaurchin8 what's the mount command you are using?
So what is the mechanism by which you "automagically" pick things up (for information, I don't think this is relevant to our use-case)?
If you use joblib.dump (which is like pickle, but safer/faster), it will be auto-logged:
https://github.com/allegroai/clearml/blob/4945182fa449f8de58f2fc6d380918075eec5bcf/examples/frameworks/scikit-learn/sklearn_joblib_example.py#L28
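A minimal sketch along the lines of the linked example (the dataset/model choice is just for illustration):
```python
import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from clearml import Task

task = Task.init(project_name="examples", task_name="sklearn joblib demo")
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)
joblib.dump(model, "model.pkl")  # picked up automatically as an output model
```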
Follow up: I see that if I move an Experiment to a new project, it does not copy the associated model files; this must be done manually. Once I moved the models to the new project, the query works as expected.
Correct π
Nice catch!
Hi MelancholyElk85
So the way datasets now work is that each one is actually an entity (a folder) inside a project, all under the hidden .datasets sub-project.
This is so all the data and tasks are in the same project, but at the same time will not intersect with subprojects of the same name. Does that make sense?
Ohh RotundHedgehog76, this implies a single JupyterHub with multiple users, is that correct?
(if this is the case, then yes, clearml-session is definitely not the correct solution; I would look for a helm chart for JupyterHub)
Yeah I think this kind of makes sense to me, any chance you can open a GH issue on this feature request?
MelancholyElk85
After I set base docker for pipeline controller task, I cannot clone the repo...
What do you mean by that?
Also, how do you set the PipelineController base_docker_image? (I'm assuming this is needed to run the pipeline logic, is that correct?)
And you are calling Task.init? And the scalars show under scalars and the images are not under debug samples?
Hmm, I see. Add this, for example:
extra_docker_shell_script: ["rm ~/.bashrc", "echo removed bashrc"]
Hi StaleButterfly40
but if I sync more than once I get a duplication of each line in log
Hmm... let me check if we can "force" overwriting (it might require your sync-process code to be more stateful)
sometimes we resume training
How would that work in offline mode? The offline process cannot sync with the backend... Are you saying you would like to get a new capability, "continue-offline-session" ?
When I'm setting up my Pipeline, I can't go "here are some brand new tasks, please run them",
I think this is the main point. Can you create those Tasks via Task.create and get what you want? If so, then sure you can do that:
```python
from clearml import Task

def create_step_task(a_node):
    task = Task.create(...)
    return task

pipe.add_step(
    name="stage_process",
    parents=["stage_data"],
    base_task_factory=create_step_task,
)
```
wdyt?
As for the node, the confusing bit is that this is text from the docs...
WickedGoat98 the mechanism of cloning and parameter overriding works only when the trains-agent is launching the experiment. Think of it this way:
Manual execution: trains sends data to server
Automatic (trains-agent) execution: trains pulls data from the server
This applies to argparse, connect, and connect_configuration alike.
The trains code itself acts differently when it is executed from the trains-agent context.
Does that help clear things up?
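A minimal sketch of that two-way flow (using the current clearml import; the older trains package exposes the same Task API):
```python
import argparse
from clearml import Task

task = Task.init(project_name="examples", task_name="argparse demo")

parser = argparse.ArgumentParser()
parser.add_argument("--lr", type=float, default=0.01)
args = parser.parse_args()
# manual run: 0.01 is sent to the server;
# agent run: the value edited in the UI is pulled back into args.lr
print(args.lr)
```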
Can you try to set this in your clearml.conf:
agent.pip_download_cache.enabled: false
this should disable the local caching of your wheels; I suspect there is some issue with the local cache file on Windows...
so I end up having to clone the other ones manually in my code
Hi ConvolutedChicken69
Yes, the problem is that there is no standard for multi-repo environments.
The best solution I can come up with is using git submodules or packaging the auxiliary repo as a wheel. wdyt?
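For the wheel route, one common pattern (the repository URL below is a placeholder) is a git requirement that pip, and therefore the agent, can install:
```
# requirements.txt (hypothetical repository URL)
git+https://github.com/your-org/aux-repo.git@main#egg=aux-repo
```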