If you have an idea on where to start looking for a quick win, I'm open to suggestions 🙂
What's the trains-server version ?
For visibility, after close inspection of API calls it turns out there was no work against the saas server, hence no data
Hi ScaryBluewhale66
TaskScheduler I created. The status is still running. Any idea?
The TaskScheduler needs to actually run in order to trigger the jobs (think cron daemon)
Usually it will be executed on the clearml-agent services queue/machine.
Make sense ?
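A minimal sketch of the point above, assuming a task ID you want relaunched on a schedule and queues named `default` and `services` (placeholders). `TaskScheduler`, `add_task`, `start` and `start_remotely` come from `clearml.automation`; the helper `launch_scheduler` and its `scheduler` parameter are made up for this example.

```python
def launch_scheduler(task_id, queue="default", remote=True, scheduler=None):
    """Register one scheduled job, then start the scheduler process itself."""
    if scheduler is None:
        # Imported lazily so the sketch can be read without a configured server.
        from clearml.automation import TaskScheduler
        scheduler = TaskScheduler()

    # Intended as "once a day at 07:30"; check the add_task docstring
    # for the exact meaning of minute/hour/day in your clearml version.
    scheduler.add_task(schedule_task_id=task_id, queue=queue,
                       minute=30, hour=7, day=1)

    if remote:
        # Enqueue the scheduler itself on the services queue (the "cron daemon").
        scheduler.start_remotely(queue="services")
    else:
        # Run the scheduler loop in the current process, e.g. for debugging.
        scheduler.start()
    return scheduler
```

Without one of the two `start` calls nothing polls the schedule, so the registered jobs are never triggered.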
Are you saying you have a single line in the console output of the component Task?
Hi ExcitedFish86
Of course, this is what it was designed for. Notice in the UI under Execution you can edit this section (Setup Shell Script). You can also set via task.set_base_docker
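A rough sketch of the programmatic route, assuming a recent clearml release where `Task.set_base_docker` accepts `docker_image` and `docker_setup_bash_script`. The image name, the setup commands and the helper `configure_container` are placeholders for illustration.

```python
def configure_container(task, image="python:3.9-slim", setup_lines=None):
    """Set the base docker image and the setup shell script on a Task."""
    if setup_lines is None:
        # Placeholder commands; they run inside the container before the Task.
        setup_lines = ["apt-get update", "apt-get install -y git"]
    task.set_base_docker(
        docker_image=image,
        docker_setup_bash_script=setup_lines,
    )
    return setup_lines

# Typical use (needs a configured clearml installation):
#   from clearml import Task
#   task = Task.init(project_name="examples", task_name="docker setup")
#   configure_container(task)
```

The same values should then show up in the UI under Execution, where they can still be edited before enqueuing.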
Then in theory (since the backend is python based) you just need to find a base docker image to build it on.
Actually, no. This is to spin up the clearml-server on GCP, not the agent
Debug samples can only be controlled via api.files_server (or programmatically)
Model/Artifacts see above
This has no effect. I am not able to change the files_server, e.g. I can not change from
You are not changing the files_server, just where your Task uploads Models/Artifacts. These are two different things (and again, this only applies to Artifacts/Models)
But from the log it seems that:
you are not running as root in the docker? Python3.8 is installed (and not python 3.6 as before)
But I do not know how it can help me:(
In your code itself, after the Task.init call, add: task.set_initial_iteration(0)
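In context, the suggestion looks roughly like the sketch below. The project and task names are placeholders, and the helper `init_task_from_zero` with its `task_factory` parameter is made up for illustration; only `Task.init` and `set_initial_iteration` are clearml calls.

```python
def init_task_from_zero(project, name, task_factory=None):
    """Create the Task, then reset the iteration offset to 0."""
    if task_factory is None:
        # Imported lazily so the sketch can be read without a configured server.
        from clearml import Task
        task_factory = Task.init
    task = task_factory(project_name=project, task_name=name)
    # Report scalars starting from iteration 0 rather than from a
    # previously stored offset.
    task.set_initial_iteration(0)
    return task
```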
See reply here:
https://github.com/allegroai/clearml/issues/496#issuecomment-980037382
So the only difference is how I log in into machine to start clear-ml
The only difference I can think of is the OS environment variables in the two login types:
can you run export in the two cases and check the diff between them?
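One way to compare the two outputs, as a small standard-library sketch. It assumes you saved the text printed by `export` in each login type; the function names and the sample variables in the usage note are invented for illustration.

```python
def parse_export(text):
    """Turn the output of `export` (or `env`) into a {name: value} dict."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        # bash prints `declare -x NAME="value"`, plain sh prints `export NAME='value'`.
        for prefix in ("declare -x ", "export "):
            if line.startswith(prefix):
                line = line[len(prefix):]
                break
        if "=" not in line:
            continue  # variables exported without a value
        name, value = line.split("=", 1)
        env[name] = value.strip("\"'")
    return env


def diff_env(first, second):
    """Return {name: (value_in_first, value_in_second)} for every difference."""
    a, b = parse_export(first), parse_export(second)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }
```

For example, `diff_env(open("ssh.txt").read(), open("console.txt").read())` would list only the variables that differ (the file names are hypothetical). Values that span several lines are not handled by this sketch.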
Hi DefeatedCrab47
You mean by trains-agent, or accumulated over all experiments?
just to check. Does the k8s glue install torch by default?
SubstantialElk6 what do you mean the glue installs torch ?
The glue takes a Task from the queue and creates a k8s job (basically it uses the same docker image and, inside the docker, gets the agent to execute the requested Task). Where would the "torch" come into play?
Hi GiganticTurtle0
ClearML will only list the directly imported packages (not their requirements), meaning in your case it will only list "tf_funcs" (which you imported).
But I do not think there is a package named "tf_funcs" right ?
And command is a list instead of a single str
"command list", you mean the command argument?
Hi UnevenDolphin73
I cannot initialize a task before loading the file, but the docs for connect_configuration
Yes, that's basically the problem. You have to decide where the main driver is.
If you are executing the code "manually" (i.e. not via the agent) then there is no problem, obviously you have the local file and you can use it to load the "project name" etc, then you just call Task.connect_configuration to log the content.
If you are running the same code via the agent...
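The manual case described above can be sketched as follows. This assumes a JSON file with `project_name` and `task_name` keys (both hypothetical), and the helper `init_from_config` with its `task_factory` parameter is made up for the example; `Task.init` and `connect_configuration` are the clearml calls.

```python
import json


def init_from_config(path, task_factory=None):
    """Load the local file first, use it to name the Task, then log its content."""
    with open(path) as f:
        config = json.load(f)

    if task_factory is None:
        # Imported lazily so the sketch can be read without a configured server.
        from clearml import Task
        task_factory = Task.init

    # The file decides the project/task names, so it must be readable locally.
    task = task_factory(project_name=config["project_name"],
                        task_name=config["task_name"])

    # Log the configuration content on the Task and use the returned object.
    config = task.connect_configuration(config, name="run config")
    return task, config
```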
Oh yes, you probably have sorting or a filter applied there :)
WickedGoat98 sorry, I missed the thread...
that the trains.conf has to be located on the node running the trains-agent.
Correct 🙂
The easiest way to check is to see if you can curl to the ip:port from the docker.
If you fail it is probably the wrong IP.
the IP you need to use is the IP of the machine running the docker-compose (not the IP of the docker inside that machine).
Make sense ?
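If curl is not available inside the container, a plain TCP check gives a similar first answer. This is a standard-library sketch; the function name is made up, and the address in the usage note is a placeholder (8008 is assumed to be the API server port of your deployment).

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unroutable, or the name did not resolve.
        return False
```

Run from inside the docker, `can_reach("192.168.1.10", 8008)` returning False suggests the IP is wrong or unreachable from there. Unlike curl, this only proves the port is open, not that the server answers HTTP requests.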
I'm glad to hear 🙂
If you can reproduce it, let me know
Hi SubstantialBaldeagle49
Yes, you can back up the entire trains-server (see the GitHub docs on how).
You mean upgrading the server?
Yes, you can change the name or add comments (Info tab / description), and you can add key/value descriptions (under the Configuration tab, see User Properties).
HealthyStarfish45 this sounds very cool! How can I help?
Hi FrothyShark37
Can you verify with the latest version?
pip install -U clearml
Hi StrangePelican34
Hmm, I think this is missing from the docs, let me ping the guys about that 🙏
Also could you explain the difference between trigger.start() and trigger.start_remotely()
Start will start the trigger process (the one "watching the changes") locally (this makes sense for debugging etc.)
start_remotely will launch the trigger process on the "services" queue, where it should live forever 🙂
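The difference can be sketched like this, assuming `TriggerScheduler` from `clearml.automation`. The trigger name, tag, queue names and the helper `run_trigger` are placeholders invented for the example.

```python
def run_trigger(task_id, project, remote=False, trigger=None):
    """Register one task trigger, then start the watcher locally or remotely."""
    if trigger is None:
        # Imported lazily so the sketch can be read without a configured server.
        from clearml.automation import TriggerScheduler
        trigger = TriggerScheduler()

    # Launch `task_id` when a Task in `project` gets the (placeholder) tag.
    trigger.add_task_trigger(
        name="launch-on-tag",
        schedule_task_id=task_id,
        schedule_queue="default",
        trigger_project=project,
        trigger_on_tags=["ready"],
    )

    if remote:
        # The watcher process is enqueued on the services queue and stays there.
        trigger.start_remotely(queue="services")
    else:
        # The watcher process runs in this Python process (useful for debugging).
        trigger.start()
    return trigger
```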
Okay so when I add trigger_on_tags, the repetition issue is resolved.
Nice!
This problem occurs when I'm scheduling a task. Copies of the task keep being put on the queue ...