Reputation
Badges 1
25 × Eureka!however, I don't think it's our code, since the trigger is not triggered at all, unless a new task is created :((
Yeah I think you are correct, I'm more interested in understanding the how you use it ...
BTW can you test with the latest clearml
python version (the trigger code is the important part)?
I try to add it to ClearML Serving, but it call
forward
method by default
If this is the case, then the statement above is odd to me, if this is a custom engine, who exactly is calling " forward
" ?
(in you code example you specifically call generate, as you should)
ComfortableShark77 are you saying you need "transformers" in the serving container?CLEARML_EXTRA_PYTHON_PACKAGES: "transformers==x.y"
https://github.com/allegroai/clearml-serving/blob/6005e238cac6f7fa7406d7276a5662791ccc6c55/docker/docker-compose.yml#L97
Is there a way to move existing pipelines between projects?
You should be able to, go to your settings page and turn on "show hidden folders"
Then go to your project, you should see " .pipeline
" sub project there, right click it and move it to another folder.
Oh, then just make sure you call Task.init in your code,
as long as you have clearml.conf in the container or pass the ENV variables to configure your clearml, it should just work
Hi @<1549202366266347520:profile|GorgeousMonkey78>
how do I integrate sagemaker with clearml ,
you mean to launch an experiment, or just to log it?
Should I map the poetry cache volume to a location on the host?
Yes, this will solve it! (maybe we should have that automatically if using poetry as package manager)
Could you maybe add a github issue, so we do not forget ?
Meanwhile you can add the mapping here:
https://github.com/allegroai/clearml-agent/blob/bd411a19843fbb1e063b131e830a4515233bdf04/docs/clearml.conf#L137
extra_docker_arguments: ["-v", "/mnt/cache/poetry:/root/poetry_cache_here"]
Hmm StrangePelican34
Can you verify you call Task.init before TB is created ? (basically at the start of everything)
No TB (Tesnorboard) is not enabled.
That explains it π did you manage to get it working ?
Hi StrangePelican34
What exactly I not working? Are you getting any TB reports?
I callΒ
Task.init
Β after I import tensorflow (and thus tensorboard?)
That should have worked...
Can you manually add a TB report before calling opennmt
function ?
(I want to verify the Task.init is indeed catching the TB calls, my theory is that somewhere inside the opennmt
we loose the TB)
ClearML does not work easily with Google Drive.
Yes, google drive is not google storage (which ClearML supports π )
Seems like you solved it?
Hi GreasyPenguin14
This is what I did, but I could not reproduce the hang, how is this different from your code?
` from multiprocessing import Process
import numpy as np
from matplotlib import pyplot as plt
from clearml import Task, StorageManager
class MyProcess(Process):
def run(self):
# in another process
global logger
# Create a plot
N = 50
x = np.random.rand(N)
y = np.random.rand(N)
colors = np.random.rand(N)
area = ...
If i point directly to the data.yaml the training starts without any problem
what do you mean? how do you know where the extracted file is?
basically:
data_path = Dataset.get(...).get_local_copy()
then you should be able to open your file with open(data_path + "/data.yaml", "rt")
doe that work?
I'm trying to queue a task in python but I'd like to reuse the prior task ID.
is it your own Task? i,,e, enqueue yourself, if this is the case use task.execute_remotely
it will do just that.
If this is another Task, then if it is aborted then you can just enqueue it, by definition it will continue with the Same Task ID.
agentservice...
Not related, the agent-services job is to run control jobs, such as pipelines and HPO control processes.
Hi @<1668065560107159552:profile|VivaciousPenguin20>
I think you are looking at the wrong experiment, this is a 3 year old experiment ? this does not seem to be your currently executed experiment, right?
Thanks ReassuredTiger98 , yes that makes sense.
What's the python version you are using ?
Okay, let me quickly run a test
Just to make sure I understand, running locally creates the Args/command correctly, then when actually executed on the remote machine (i.e. execute_remotely creates the correct Args/command But when the agent actually executes it) it updates back the Args/command as a list. Is that a correct description ?
Hi ReassuredTiger98
It's clearml
that needs to support subparser, and it does support it.
What are you seeing in the Args section ?
(Notice that at the end all the args parsing are stored on the global "args" variable after you call the pasre_args(), clearml
will basically take those variables and put them into Args
section)
if executed remotely...
You mean cloning the local execution, sending to the agent, then when running on the agent the Args/command is updated to a list ?
Thanks @<1527459125401751552:profile|CloudyArcticwolf80> ! let me see if we can reproduce it
nice @<1724960458047229952:profile|EnergeticKoala33> !
The issue was that the agent was trying to start the docker but had no credentials to do that, your solution is exactly what was needed to be done
oh that makes sense.
I would add to your Task's docker startup script the following:
ls -la /.ssh
ls -la ~/.ssh
cat ~/.ssh/id_rsa
Let's see what you get
Hi @<1541954607595393024:profile|BattyCrocodile47>
I
do
have the SSH key placed at
/root/.ssh/id_rsa
on the machine,
Notice that the .ssh folder is mounted from the host (EC2 / GCP) into the container,
'-v', '/tmp/clearml_agent.ssh.cbvchse1:/.ssh'
This is odd, why is it mounting it to /.ssh and not /root/.ssh ?
Actually, dumb question: how do I set the setup script for a task?
When you clone/edit the Task in the UI, under Execution / Container you should have it
After you edit it, just push it into the execution with the autoscaler and wait π
I
do
have the SSH key placed at
/root/.ssh/id_rsa
on the machine,
@<1541954607595393024:profile|BattyCrocodile47> is the SSH key part of the containers? or are you saying it is on the EC2 instance ?