Seems like a credentials error.
Do you have everything set up correctly in your ~/clearml.conf?
Thank you @<1523701949617147904:profile|PricklyRaven28> !!!
Let me see if we can reproduce it, and how to solve it.
Yes it should
Here is the fastai example, just in case :)
https://github.com/allegroai/clearml/blob/master/examples/frameworks/fastai/fastai_with_tensorboard_example.py
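And a minimal sketch of the same idea (project/task names are placeholders), just to show that the only ClearML-specific line is the `Task.init` call; the fastai/TensorBoard reports should be picked up automatically after that:
```python
from clearml import Task

# Create the ClearML task before building the fastai learner, so the
# automatic framework hooks (TensorBoard, fastai) are installed in time.
task = Task.init(project_name="examples", task_name="fastai with tensorboard")

# ... then train as usual, e.g. (placeholder fastai v2 code):
# learn = vision_learner(dls, resnet18, metrics=accuracy, cbs=TensorBoardCallback())
# learn.fit_one_cycle(1)
```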
Hi OutrageousSheep60
Is there a way to instantiate a `clearml-task` while providing it a `Dockerfile` that it needs to build prior to executing the task?
Currently not really, as at the end the agent does need to pull a container.
But you can achieve basically the same by adding the "dockerfile" script as `--docker_bash_setup_script`.
Notice of course that this is an actual bash script, not a Dockerfile, so no need for the "RUN" prefix.
wdyt?
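Something like this rough sketch (repo, script, image and package names are all placeholders; worth double-checking the exact argument names against your clearml version):
```python
from clearml import Task

# The Dockerfile "RUN" lines become plain bash lines (no RUN prefix, no FROM).
setup_script = """\
apt-get update && apt-get install -y libsndfile1
pip install -U my-extra-package
"""

# Rough sketch: create the task on top of a base image plus the setup script,
# then enqueue it for the agent. All values below are placeholders.
task = Task.create(
    project_name="examples",
    task_name="task with bash setup",
    repo="https://github.com/me/my-repo.git",
    script="train.py",
    docker="python:3.9",
    docker_bash_setup_script=setup_script,
)
Task.enqueue(task, queue_name="default")
```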
Yea, the "-e ." seems to fit this problem the best :)
It seems like whatever I add to `docker_bash_setup_script` is having no effect.
If this is running with the k8s glue, the console output of the `docker_bash_setup_script` is currently Not logged into the Task (this bug will be solved in the next version), but the code is being executed. You can see the full logs with kubectl, or test with a simple export in the `docker_bash_setup_script`, e.g. `export MY...`
Woot woot!
Awesome! This RC is stable, so feel free to use it; the official release is probably due out next week :)
I think RoughTiger69 was discussing this exact scenario
https://clearml.slack.com/archives/CTK20V944/p1629885416175500?thread_ts=1629881415.172600&cid=CTK20V944
wdyt?
My main issue with this approach is that it breaks the workflow into an "a-sync" set of tasks:
This is kind of the way you depicted it, meaning there is an initial dataset, an "offline" process (i.e. external labeling), and then an ingest process.
I was wondering if the "waiting" operator can actually be a part of the pipeline.
This way it will be clearer what workflow we are executing.
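Something along these lines is what I have in mind, i.e. a polling step inside the pipeline (just a sketch; the function bodies, queue and parameter names are placeholders):
```python
from clearml.automation import PipelineController

def wait_for_labels(dataset_id: str) -> str:
    # Hypothetical "waiting" step: poll the external labeling process
    # until it reports completion (the check below is a placeholder).
    import time
    def labels_are_ready(_id: str) -> bool:
        return True  # replace with a call to the labeling tool's API
    while not labels_are_ready(dataset_id):
        time.sleep(300)
    return dataset_id

def ingest(dataset_id: str) -> None:
    # Placeholder for the ingest step that consumes the labeled data
    print(f"ingesting {dataset_id}")

pipe = PipelineController(name="labeling-pipeline", project="examples", version="1.0.0")
pipe.add_parameter(name="dataset_id", default="")
pipe.add_function_step(
    name="wait_for_labels",
    function=wait_for_labels,
    function_kwargs=dict(dataset_id="${pipeline.dataset_id}"),
    function_return=["dataset_id"],
)
pipe.add_function_step(
    name="ingest",
    function=ingest,
    function_kwargs=dict(dataset_id="${wait_for_labels.dataset_id}"),
    parents=["wait_for_labels"],
)
pipe.start(queue="services")
```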
Hmm, so the pipeline is "aborted", then the trigger relaunches the pipeline, and the pipeli...
Hi GrotesqueDog77
What do you mean by share resources? Do you mean compute or storage?
SlipperyDove40 I just installed a fresh copy of py3.6 and plotly on Ubuntu; the entire venv dir is ~86MB.
I hope you can do this without containers.
I think you should be fine, the only caveat is CUDA drivers, nothing we can do about that ...
BitingKangaroo95 nice work :)
I think that what did it was: changing the sshd_config so that it allows port forwarding, agent forwarding, and X11 forwarding.
But just in case, it might be that there was a pre-existing SSH identifier on your machine, and hence the error.
Clearing known_hosts under ~/.ssh is also something I would try :)
BitingKangaroo95 can you post here the entire console output of clearml-session (including the full command line)?
So the thing is, regardless of the link, you should end up with `helper` being a `<clearml.storage.helper.StorageHelper object at 0x....>`
But the code that failed seemed to return None, which makes me suspect the url itself is somehow broken.
Any chance you have a space before the "s3://"?
BTW: what's the clearml version you are using?
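For reference, this is roughly the check I mean (the bucket/path below is a placeholder):
```python
from clearml.storage.helper import StorageHelper

# With correct credentials in clearml.conf you should get a StorageHelper
# instance back; None usually means the URL could not be parsed/matched
# (e.g. a stray space or a typo in the scheme).
helper = StorageHelper.get("s3://my-bucket/some/prefix")  # placeholder bucket/path
print(helper)
# expected: <clearml.storage.helper.StorageHelper object at 0x...>
```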
ValueError: Missing key and secret for S3 storage access
Yes, that makes sense. I think we should make sure we do not suppress this warning; it is too important.
Bottom line: a missing configuration section in your clearml.conf.
Hi ReassuredTiger98
Good point, since the user actually "running" the code is the agent, all the api calls are registered under its name, including the Model creation.
This is a good point, though ...
I know the enterprise tiers add "impersonate" as part of the security layer, meaning that the agent is Not actually running the code but the creating "user" is, which solves this problem. I'm not sure what actually can be done without this feature... thoughts?
There is no way to create an artifact/model/dataset without a task, right?
Models are an entity of their own, and you can actually create one without a Task.
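For example, something along these lines should register a model entity on its own (a sketch only; the name, URL and framework values are placeholders):
```python
from clearml import InputModel

# Rough sketch: register an existing weights file as a model entity,
# without creating (or attaching to) a Task.
model = InputModel.import_model(
    name="my-standalone-model",                      # placeholder name
    weights_url="s3://my-bucket/models/model.pkl",   # placeholder URL
    framework="PyTorch",
)
print(model.id)
```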
(Just for my own interest: how much does the enterprise version diverge from the open source version? Is it just extended, or are there core changes to the enterprise version?)
It adds a few security layers on top, and adds a few features that are just not part of the open source (RBAC, hyper-datasets, advanced scheduling, cu...
See here the `docker_setup_bash_script` argument: None
It will be executed (no need for the `#!/bin/bash` btw) before starting to set up the env inside the container, so apt-get and the like can be executed if needed. Notice that if this is something that Always needs to be executed, you can put the same list of commands here: [None](https://github.com/allegroai/clearml-agen...
Would love to just cap it at a fixed amount for a month for API calls.
Try the timeout configuration; I think this should solve all your issues, and it will be fairly easy to set for everyone.
It's always preferred to use `conda_freeze: false`.
That said, if you do use `conda_freeze: true`, it should also freeze the cudatoolkit, so it should have worked.
BTW when you say it worked, is it 0.17.2 version or the hacked RC I sent ?
I have one agent running on the machine. I also have only one task running. This *only* happens to us when we use pipelines.
@<1724960468822396928:profile|CumbersomeSealion22> notice that when you are launching a pipeline you are actually running Two tasks, one is the "pipeline" itself (i.e. the logic) and one is the component in the pipeline (i.e. the step)
If you have one agent, I'm assuming what happens is the pipeline itself (the one that you launch on your machine)...
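One way to work around it with a single agent (a sketch, worth verifying against your SDK version) is to run the pipeline logic itself locally, so the agent stays free for the actual steps:
```python
from clearml.automation import PipelineController

pipe = PipelineController(name="my-pipeline", project="examples", version="1.0.0")
# ... pipe.add_step(...) / pipe.add_function_step(...) as usual ...

# Run the pipeline controller logic in the local process, while the steps are
# still enqueued and picked up by the (single) remote agent.
pipe.start_locally(run_pipeline_steps_locally=False)
```
Alternatively, a second agent running in `--services-mode` can take the pipeline controller task, leaving the main agent free for the steps.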
ChubbyLouse32 could it be the configuration file is not passed to the agent machine itself?
(Were you able to run anything against this internal server? I mean to connect to it from code, clearml/clearml-agent?)
First let's try to test if everything works as expected, since a 405 really feels odd to me here. Can I suggest following one of the examples start to end to test the setup, before adding your model?
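e.g. something as small as this sketch (project/task names are placeholders) is enough to exercise the API server end to end:
```python
from clearml import Task

# Minimal end-to-end check: if this runs, shows up in the UI and uploads the
# scalar, the server configuration (including the URLs in clearml.conf) is
# fine, and the 405 is coming from somewhere else.
task = Task.init(project_name="connectivity-test", task_name="smoke test")
task.get_logger().report_scalar(title="test", series="value", value=1.0, iteration=0)
task.close()
```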
Hi FranticCormorant35
So Tasks have a parent field that links one to another.
Unfortunately there is no visual representation for it.
What we did with the hyper-parameter optimization, for example, was to also add a tag with the ID of the "parent" Task. This would make sense if you have multiple tasks all generated from the same "parent", like in hyper-parameter optimization.
What's your use case? Is it a single evaluation Task per training, or multiple, or something cron-job-like?
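In code that would look roughly like this (the parent task ID and names are placeholders):
```python
from clearml import Task

# Rough sketch: link an evaluation task to the training task that produced it.
training_task_id = "aabbccddeeff00112233445566778899"  # placeholder ID

eval_task = Task.init(project_name="examples", task_name="evaluation")
eval_task.set_parent(training_task_id)                   # sets the "parent" field
eval_task.add_tags([f"parent:{training_task_id[:8]}"])   # optional tag for easy filtering
```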
That makes sense...
Basically in the open-source version the approach is everyone sees everything for maximum transparency (and also ease of use). I know there are access-roles in the paid tier and vault for exactly these types of things...
Where do you currently save them? and how do you pass them to the remote machine ?
Do you have two agents pulling from the same queue ?
Maybe one of them is configured differently ?