Hi ReassuredTiger98 when you get to it...
please download the wheel, then install it with
pip3 install -U clearml_agent-0.17.3rc0-py3-none-any.whl
Then run the daemon with the additional --debug
argument, basically:
clearml-agent --debug daemon --foreground ...
Once the agent is running please send the Task's log from your console
sets up the venv correctly, prints
Starting Task Execution:
then does nothing
Can you provide a log?
Do you see the code/git reference in the Pipeline Task details - Execution Tab ?
Hi GrotesqueDog77
What do you mean by share resources? Do you mean compute or storage?
I am struggling with configuring ssh authentication in docker mode
GentleSwallow91 Basically the agent will automatically mount the .ssh folder into the container, just make sure you set the following in the clearml.conf: force_git_ssh_protocol: true
https://github.com/allegroai/clearml-agent/blob/178af0dee84e22becb9eec8f81f343b9f2022630/docs/clearml.conf#L30
Hi IcySwallow94
Are you deploying the clearml server with the helm chart ?
No worries, you open the issue on pypa/pip and I will do my best to push it forward
We also have to be realistic: I have a PR that has been waiting for almost a year now (that said, it is a major one and needed to wait until a few more features were merged). Basically what I'm saying is that the best-case scenario is a month to get a PR merged
I'd definitely prefer the ability to set a docker image/docker args/requirements config for the pipeline controller too
That makes sense, any chance you can open a github issue with a feature request so that we do not forget?
The current implementation will upload the result of the first component, and then the first thing the next component will do is download it.
If they are on the same machine, it should be cached when accessed the 2nd time
Wouldn't it be more performant f...
basically ExuberantBat24 you can think of hyper-datasets as a "feature-store for unstructured data"
Hmm are you running from inside the Kaggle jupyter thing ?
Need - in my CI, the url used is https but I need the ssh url to be used. I see that we can pass repo to Task.create but not Task.init
Are you cloning an existing Task, or creating a new one ?
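For the CI case above, one possible workaround is to rewrite the https URL into its ssh form yourself and pass it to Task.create through the repo argument. This is only a minimal sketch, assuming a standard https://host/owner/repo layout. https_to_ssh and create_task_with_ssh_repo are hypothetical helper names, and the project, task, and script values are placeholders:

```python
from urllib.parse import urlparse


def https_to_ssh(url: str) -> str:
    """Convert https://host/owner/repo(.git) into git@host:owner/repo.git.

    Hypothetical helper; anything that is not an http(s) URL is returned unchanged.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return url
    path = parsed.path.strip("/")
    if not path.endswith(".git"):
        path += ".git"
    return f"git@{parsed.hostname}:{path}"


def create_task_with_ssh_repo(https_url: str, script: str):
    # Imported lazily so the URL helper can be used without clearml installed.
    from clearml import Task

    return Task.create(
        project_name="examples",        # placeholder
        task_name="ci task",            # placeholder
        repo=https_to_ssh(https_url),   # ssh form of the CI-provided URL
        script=script,
    )
```

In CI you would call create_task_with_ssh_repo with whatever https URL the CI system exposes. Whether this fits depends on the answer to the question above (cloning an existing Task vs creating a new one).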
Hi GrievingTurkey78
Turning off pytorch auto-logging: Task.init(..., auto_connect_frameworks={'pytorch': False})
To manually log a model: from clearml import OutputModel; OutputModel().update_weights('my_best_model.pt')
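Putting the two one-liners above together, a minimal sketch could look like the following. The project and task names are placeholders, init_kwargs and train_and_register are hypothetical helper names, and the training step is left out on purpose:

```python
def init_kwargs(project: str, name: str) -> dict:
    """Arguments for Task.init with PyTorch auto-logging switched off."""
    return {
        "project_name": project,
        "task_name": name,
        "auto_connect_frameworks": {"pytorch": False},
    }


def train_and_register(weights_path: str = "my_best_model.pt"):
    # Imported lazily so this module can be loaded without clearml installed.
    from clearml import Task, OutputModel

    task = Task.init(**init_kwargs("examples", "manual model logging"))

    # ... training code that writes `weights_path` goes here ...

    # Register the weights file explicitly, since auto-logging is off.
    output_model = OutputModel(task=task)
    output_model.update_weights(weights_path)
    return output_model
```

Calling train_and_register() needs a configured clearml.conf and an existing weights file, so treat it as an outline rather than something to paste as-is.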
Hi RobustRat47
What do you mean by "log space for hyperparameter", what would be the difference? (Notice that on the graph itself you can switch to log scale when viewing in the UI)
Or are you referring to the hyper parameter optimization, allowing you to add log space ?
how did you try to restart them ?
Yes, but how did you restart the agent on the remote machine ?
This is the reason you are getting an error
Basically the session asks the agent to set up a new SSH server with credentials on the remote machine. This is not an issue inside a container, as this is an isolated environment, but when running in venv mode the user running the agent is not root, hence it cannot spin up/configure an SSH server.
Makes sense?
Also btw, is this supposed to be a screenshot from the community version
Hmm seems like a screenshot from an enterprise version, I'll ask them to update
I am also not understanding how clearml-serving is doing the versioning for models in triton.
Basically you have two Tasks, one is the "controller" checking model changes and updating itself.
The other is the engine, checking on the "controller" Task, which models it needs to download/configure and replaces them.
This way you can ha...
This is done in the background while accessing the cache, so it should not have any slowdown effect
Before this line, call Task.init
I would like to force the usage of those requirements when running any script
How would you force it? Will you just ignore the "Installed Packages" section ?
Hmm, interesting, why would you want that? Is this because some of the packages will fail?
Hi NuttyCamel41
How are you creating the model? specifically what do you have in "config.pbtxt"
specifically any python code should be in the pre/post processing code (actually not running on the GPU instance)
That was the idea behind the feature (and BTW any feedback on usability and debugging will be appreciated here, pipelines are notoriously hard to debug)
the ability to execute without an agent, I was just talking about this functionality the other day in the community channel
What would be the use case ? (actually the infrastructure now supports it)
... transformed to 'str' when passed to a function decorated with PipelineDecorator.component at the time of calling it in the pipeline itself. Again, is this something intentional?
Are you sure about that? Notice the example code specifies int as well...
should reload the reported scalars
Exactly (notice it also understands when the last report of scalars was, so it should automatically increase the iterations, i.e. you will not accidentally overwrite previously reported scalars)
and the task needs to reload last checkpoints only, right?
Correct
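To make the exchange above concrete, here is a rough sketch of continuing a Task and reporting scalars again. It assumes the continuation is done with continue_last_task=True in Task.init, which the thread does not state explicitly. resume_task and report_losses are hypothetical helper names, and the "loss"/"train" titles are placeholders:

```python
def resume_task(project: str, name: str):
    # Imported lazily so the reporting helper works without clearml installed.
    from clearml import Task

    # Assumption: continue the previous run instead of starting a fresh Task.
    return Task.init(project_name=project, task_name=name, continue_last_task=True)


def report_losses(logger, losses):
    """Report one scalar per local step, counting from 0 again.

    `logger` is anything exposing report_scalar(title, series, value, iteration),
    for example the object returned by task.get_logger().
    """
    for step, loss in enumerate(losses):
        logger.report_scalar(title="loss", series="train", value=loss, iteration=step)
```

Typical usage would be task = resume_task("examples", "my training") followed by report_losses(task.get_logger(), losses). Per the explanation above, the local step counter restarting at 0 should not overwrite the previously reported scalars, and loading the last checkpoint remains up to your own code.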
We didn't figure out the best way of continuing for both the grid and optuna. Can you suggest something?
That is a good point, not sure if we have a GH issue for that, but wo...
... Does Task.connect send each element of the dictionary as a separate api request? Has anyone else encountered this issue?
Hi SuperiorPanda77
the task.connect ends up as a single call with all the data being sent on a single request.
That said, maybe the connect dict is not the best solution for a thousand-key dictionary ...
Maybe artifact, or connect_configuration are better suited ?
wdyt?
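A rough sketch of the two alternatives mentioned above, for comparison. make_large_params, store_as_configuration and store_as_artifact are hypothetical helper names, "large_params" is a placeholder name, and task is assumed to be the object returned by Task.init:

```python
def make_large_params(n: int = 1000) -> dict:
    """Build a dictionary with n keys, standing in for the real one."""
    return {f"param_{i}": i for i in range(n)}


def store_as_configuration(task, params: dict) -> None:
    # Stores the whole dictionary as a single configuration object on the Task.
    task.connect_configuration(params, name="large_params")


def store_as_artifact(task, params: dict) -> None:
    # Uploads the whole dictionary as a single artifact attached to the Task.
    task.upload_artifact(name="large_params", artifact_object=params)
```

Which one fits better probably depends on whether you want to edit the values from the UI (configuration) or just keep them with the run (artifact).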
Is it possible to launch a task from Machine C to the queue that Machine B's agent is listening to?
Yes, that's the idea
Do I have to have anything installed (aside from the trains PIP package) on Machine C to do so?
Nothing, pure magic
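For reference, a minimal sketch of what launching from Machine C could look like. It is written against the clearml package (the renamed trains package) and assumes the same calls are available in the version you have installed. launch_on_queue is a hypothetical helper name, and the task id and queue name are placeholders you would supply:

```python
def launch_on_queue(template_task_id: str, queue_name: str):
    """Clone an existing Task and push the clone into a queue.

    Runs on the machine that only has the Python package and credentials;
    the agent listening on `queue_name` picks the clone up and executes it.
    """
    # Imported lazily so the module can be loaded without clearml installed.
    from clearml import Task

    template = Task.get_task(task_id=template_task_id)
    cloned = Task.clone(source_task=template, name=f"{template.name} (launched remotely)")
    Task.enqueue(cloned, queue_name=queue_name)
    return cloned
```

Machine C still needs a clearml.conf (or the equivalent environment variables) pointing at the same server as Machine B's agent.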