So basically take data from TensorBoard, read it, and report it to the cloud?
store_code_diff_from_remote doesn't seem to change anything in regards to this issue
Correct, it is always from remote
I'll be using update_task, that worked just fine, thanks
Sure thing.
ShakyJellyfish91, I took a quick look at the diff between the versions. Can you check a non-working version (preferably the latest) and verify the issue for me?
SoreDragonfly16 the torchvision warning has nothing to do with the Trains warning.
The Trains warning means that somehow someone changed the state of the Task from running (in_progress) to "stopped" (aborted). Could it be one of the subprocesses raised an exception?
Hi ScaryKoala63
Sure, add the following to your clearml.conf:
sdk.storage.cache.default_cache_manager_size = 400
I think you are correct, it seems like for some reason you hit the cache limit, and a previous entry was deleted
Yeah, you can ignore those; this is some Python GC stuff, and it seems to be related to the OS and Python version
GrotesqueOctopus42
The problem is that when I import some function from a file in another folder, the task doesn't catch the file's dependencies.
Just to be clear, if this is another file, you have to have all the files in the same git repo for the agent to actually be able to fetch them on the remote machine.
If you have a mix of notebooks and code, you have to have the local code in a git repo,
Make sense ?
UnsightlyShark53 See if this one solves the problem :)
BTW: the reasoning for the message is that when running the task with "trains-agent", if the parsing of the argparser happens before the Task is initialized, the patching code doesn't know if it's supposed to override the values. But this scenario was fixed a long time ago, and I think the error was mistakenly left behind...
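For reference, a minimal sketch of the ordering that avoids the issue (project/task names are just placeholders):
```
from argparse import ArgumentParser
from clearml import Task

# call Task.init before parsing, so the agent-side argparse patching
# knows it is allowed to override the argument values
task = Task.init(project_name="examples", task_name="argparse order demo")

parser = ArgumentParser()
parser.add_argument("--lr", type=float, default=0.001)
args = parser.parse_args()
```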
Is this consistent on the same file? Can you provide a code snippet to reproduce (or understand the flow)?
Could it be two machines are accessing the same cache folder ?
Hi DefeatedCrab47
Not really sure, and it smells bad ...
Basically you can always use the TB logger, and call Task.init.
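For example, a minimal sketch of that combination (project/task names are placeholders); once Task.init is called, scalars written through the TensorBoard writer are also reported to the server:
```
from clearml import Task
from torch.utils.tensorboard import SummaryWriter

task = Task.init(project_name="examples", task_name="tensorboard demo")

# anything written through the TB writer below is auto-reported
writer = SummaryWriter()
for step in range(10):
    writer.add_scalar("loss", 1.0 / (step + 1), step)
writer.close()
```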
I hope you can do this without containers.
I think you should be fine, the only caveat is CUDA drivers, nothing we can do about that ...
Hi @<1547028116780617728:profile|TimelyRabbit96>
Notice that if you are running with docker compose, you can pass an argument to the clearml triton container and use shared memory. You can do the same with the Helm chart.
Sounds good to me. DepressedChimpanzee34 any chance you can add a github feature request, so we do not forget to add it?
would I have to execute each task in the pipeline locally(but still connected to trains),
Somehow you have to have the pipeline step Task in the system, you can import it from code, or you can run it once, then the pipeline will clone it and reuse it. Am I missing something ?
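As a rough sketch of the "run it once, then the pipeline clones it" flow (project/task names are placeholders):
```
from clearml import PipelineController

pipe = PipelineController(name="demo pipeline", project="examples", version="0.1")
pipe.add_step(
    name="train",
    base_task_project="examples",     # project of the template Task already in the system
    base_task_name="train template",  # that Task is cloned and reused as this step
)
pipe.start_locally(run_pipeline_steps_locally=True)
```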
Those variables are not passed to the remote instance; they are used by the AWS autoscaler to launch it, but there is no need to pass them.
I think the easiest is to add them to the "extra_vm_bash_script" as well
In Azure VMSS, there is a method called "Custom Data", which is basically a way of passing things to be executed
I know that it is on the to-do list to add an "azure_autoscaler", which is basically a sibling to the aws_autoscaler.
With the same idea of the "custom data" as an initial bash script:
You can check here:
https://github.com/allegroai/clearml/blob/4a2099b53c09d1feaf0e079092c9e075b43df7d2/clearml/automation/aws_auto_scaler.py#L54
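A rough sketch of what adding them to "extra_vm_bash_script" could look like (the variable names below are hypothetical, just to illustrate exporting them in the instance's startup script):
```
# executed on the new instance before the agent starts,
# so the exported variables are visible to the tasks it runs
extra_vm_bash_script = "\n".join([
    "export MY_ENV_VAR=value1",        # hypothetical variable
    "export MY_OTHER_ENV_VAR=value2",  # hypothetical variable
])
```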
And you are seeing a bunch of the GS SSL errors?
Yep, but only in RC (or GitHub)
PompousParrot44
It should still create a new venv, but inherit the packages from the system-wide (or a specific venv's) installed packages. Meaning it will not reinstall packages you already installed, but it will give you the option of just replacing a specific package (or installing a new one) without reinstalling the entire venv
I see them run reliably (not killed); are they running in service mode?
How do you deploy agents, with the clearml k8s glue ?
Decorators are good
Something along the lines of
```
@PipelineDecorator.pipeline(...)
def pipeline(skip_a=False):
    if not skip_a:
        a = step_a()
    else:
        # somehow get a previous A?
        # let's call it cached A
        a = "replace with real"
    step_b(a)
    ...
```
Is this the gist?
If it is, this looks like "how can I control whether A is cached or not", is that correct?
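If so, a minimal sketch of the per-step caching knob (names/values are placeholders):
```
from clearml.automation.controller import PipelineDecorator

# with cache=True the pipeline reuses the previous output of step_a
# instead of re-running it, as long as its code and inputs are unchanged
@PipelineDecorator.component(cache=True, return_values=["a"])
def step_a():
    return 42
```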
Hi @<1523711619815706624:profile|StrangePelican34>
You can either report on the Model itself:
None
or you can force it on the Task:
from clearml import Task

task = Task.get_task("task id here")
# force the task back into a running state so it accepts new reports
task.mark_started(force=True)
task.get_logger().report_scalar(...)
task.mark_completed(force=True)
Let's try:
` echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/docker-clean ; chown -R root /root/.cache/pip ; export DEBIAN_FRONTEND=noninteractive ; export CLEARML_APT_INSTALL="$CLEARML_APT_INSTALL libsm6 libxext6 libxrender-dev libglib2.0-0" ; [ ! -z $(which git) ] || export CLEARML_APT_INSTALL="$CLEARML_APT_INSTALL git" ; declare LOCAL_PYTHON ; for i in {10..5}; do which python3.$i && python3.$i -m pip --version && export LOCAL_PYTHON=$(which python3.$i) && b...
How can I get the loaded model in the Preprocess class in ClearML Serving?
ComfortableShark77
You mean your preprocess class needs a Python package, or is it your own module?
Hi @<1643423185791619072:profile|DashingCentipede5>
Notice that you called "start_locally"; it tries to run the code locally inside your Jupyter notebook, and it assumes everything, including the code, already exists. Is that your case?
Hi DeliciousKoala34
I am using PyCharm and I have set up the ClearML plugin, but it still doesn't work.
Did you provide the key/secret to the plugin? I think this is a must for it to actually work