Hmm I assume it is not running from the code directory...
(I'm still amazed it worked the first time)
Are you actually using "." ?
GrievingTurkey78 where do you see this message? Can you send the full server log?
So the TB issue was that reported images were not logged.
We are now talking about the caching, which is actually a UI thing. Which clearml-server version are you using?
And where are the images stored (the default files server or is it S3/GS etc.) ?
Out of curiosity, if Task flush worked, when did you get the error, at the end of the process ?
Thanks @<1523701868901961728:profile|ReassuredTiger98>
From the log this is what conda is installing, it should have worked
/tmp/conda_env1991w09m.yml:
channels:
- defaults
- conda-forge
- pytorch
dependencies:
- blas~=1.0
- bzip2~=1.0.8
- ca-certificates~=2020.10.14
- certifi~=2020.6.20
- cloudpickle~=1.6.0
- cudatoolkit~=11.1.1
- cycler~=0.10.0
- cytoolz~=0.11.0
- dask-core~=2021.2.0
- decorator~=4.4.2
- ffmpeg~=4.3
- freetype~=2.10.4
- gmp~=6.2.1
- gnutls~=3.6.13
- imageio~=2.9.0
-...
This is odd, what is the parameter?
I assume it needs sorting and one time this is Integer, and the next it is a String, so the server cannot sort based on it. Could that be ?
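To illustrate the suspected cause, here is a minimal sketch in plain Python (not ClearML code). It shows that a column holding both integers and strings cannot be sorted, and that coercing the value to one type before reporting avoids the problem. The helper name `normalize_param` is hypothetical, and whether the server fails for exactly this reason is still only an assumption.

```python
def normalize_param(value):
    """Coerce a hyperparameter value to float when possible, otherwise to str."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return str(value)


# The same parameter reported once as an int, once as a str, once as a float.
mixed = [3, "10", 2.5]

# Sorting mixed int/str values fails in Python 3.
try:
    sorted(mixed)
    mixed_sortable = True
except TypeError:
    mixed_sortable = False

# After coercion every value has the same type, so sorting works.
consistent = sorted(normalize_param(v) for v in mixed)
```

Note that values which are not numeric fall back to `str`, so the safest option is to pick one type per parameter and stick with it across runs.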
MelancholyElk85
After I set base docker for pipeline controller task, I cannot clone the repo...
What do you mean by that?
Also, how do you set the PipelineController base_docker_image (I'm assuming this is needed to run the pipeline logic, is that correct?)
So I'm guessing the CLI will be in the same folder as python:
import sys
from pathlib2 import Path
(Path(sys.executable).parent / 'cli-util-here').as_posix()
Getting the last checkpoint can be done via:
Task.get_task(task_id='aabbcc').models['output'][-1]
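A slightly more defensive version of the one-liner above, as a sketch: it guards against a task that has no output models yet. The helper name `last_output_model` and the stand-in object are hypothetical, only the `models['output'][-1]` access comes from the line above.

```python
from types import SimpleNamespace


def last_output_model(task):
    """Return the most recently registered output model of a task, or None."""
    outputs = task.models['output']
    return outputs[-1] if len(outputs) > 0 else None


# Stand-in objects so the sketch runs without a ClearML server (hypothetical data).
demo_task = SimpleNamespace(models={'output': ['checkpoint-1', 'checkpoint-2']})
empty_task = SimpleNamespace(models={'output': []})

latest = last_output_model(demo_task)
missing = last_output_model(empty_task)

# With ClearML installed and configured, usage would look like:
#   from clearml import Task
#   model = last_output_model(Task.get_task(task_id='aabbcc'))
```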
This task is picked up by first agent; it runs DDP launch script for itself and then creates clones of itself with task.create_function_task() and passes its address as argument to the function
Hi UnevenHorse85
Interesting use case, just for my understanding, the idea is to use ClearML for the node allocation/scheduling and PyTorch DDP for the actual communication, is that correct ?
passes its address as argument to the function
This seems like a great solution.
the queu...
The fact is that I use docker for running clearml server both on Linux and Windows.
My question was on running the agent, is it running with --docker flag, i.e. docker mode
Also, just forgot to note, that I'm running clearml-agent and clearml processes in virtual environment - conda environment on Windows and venv on Linux.
Yep that answers my question above
Does it make any sense to change system_site_packages to true if I r...
without the ClearML Server in-between.
You mean the upload/download is slow? What is the reasoning behind removing the ClearML server ?
ClearML Agent per step
You can use the ClearML agent to build a docker per Task, so all you need is just to run the docker. Will that help?
GrievingTurkey78 Actually it is in progress, see the GitHub issue for details:
https://github.com/allegroai/trains/issues/219
Yeah @<1689446563463565312:profile|SmallTurkey79> is right, reverting to image is the safest way to get exactly the same...
btw, @<1791277437087125504:profile|BrightDog7> if you can produce a standalone example of reporting the data, we can probably fix whatever is broken in the auto convert, or at least revert to image based automatically (basically if the plot is simple enough it will try to convert it, otherwise it will automatically revert to image internally)
If it cannot find the Task ID I'm guessing it is trying to connect to the demo server and not your server (i.e. configuration is missing)
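A quick way to check whether any configuration is visible to the process is sketched below. It assumes the usual locations: the `CLEARML_API_HOST` and `CLEARML_CONFIG_FILE` environment variables and a `clearml.conf` file in the home directory. The helper name `find_clearml_config` is hypothetical, and this is not how the SDK itself resolves its configuration.

```python
import os
from pathlib import Path


def find_clearml_config(env=None, home=None):
    """List the configuration sources that appear to be present."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    found = []
    # Server address given directly through the environment (assumed variable name).
    if env.get("CLEARML_API_HOST"):
        found.append("env:CLEARML_API_HOST")
    # Explicit path to a configuration file (assumed variable name).
    explicit = env.get("CLEARML_CONFIG_FILE")
    if explicit and Path(explicit).expanduser().is_file():
        found.append("file:" + explicit)
    # Default configuration file location (assumed).
    default = home / "clearml.conf"
    if default.is_file():
        found.append("file:" + str(default))
    return found


# Example with an empty environment and a home directory that does not exist.
nothing_found = find_clearml_config(env={}, home="/nonexistent-dir-for-sketch")
```

An empty list would be consistent with the guess above that the configuration is missing on that machine.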
Hi @<1649221394904387584:profile|RattySparrow90>
Are the models I defined to be served e.g. via the CLI downloaded to the serving pod
Yes, this is done automatically and online (i.e. when you update them using the CLI/API), based on the models/endpoints you set
So that they are physically lying there as a file I can see in the filesystem?
They are, and cached there
Or is it more the case that the pod gets the model when needed/when an API call for this model is incoming?
I...
and I install the tar
I think the only way to do that is add it into the docker bash setup script (this is a bash script executed before Task)
any idea why i cannot select text inside the table?
Ichh, seems again like plotly. I have to admit quite annoying to me as well ... I would vote here: None
Sorry my bad, you are looking for:
None
hit ctrl-f5 (reload the page) do you still get the same error? Is it limited to a specific experiment?
yes you are correct, I would expect the same.
Can you try manually importing pt, and maybe also moving the Task.init before darts?
I want to be able to delete only the logs since they are taking a lot of space in my case.
I see... I do not think this is possible
You can disable the auto logging though ... pass auto_connect_streams=False to Task.init
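As a minimal sketch, passing that flag would look like the following. The project and task names are placeholders, and the import is done lazily inside the function so the snippet loads even where ClearML is not installed.

```python
# Placeholder names, replace with your own project/task.
INIT_KWARGS = {
    "project_name": "examples",
    "task_name": "no console capture",
    # Do not capture stdout/stderr/logging into the Task's console log.
    "auto_connect_streams": False,
}


def init_task(**overrides):
    """Call Task.init with console capture disabled (needs ClearML configured)."""
    from clearml import Task  # lazy import: only needed when actually called

    kwargs = dict(INIT_KWARGS)
    kwargs.update(overrides)
    return Task.init(**kwargs)


# Usage, on a machine with ClearML installed and configured:
#   task = init_task(task_name="my experiment")
```

This only stops new console output from being stored, it does not delete logs of existing experiments.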
Hi @<1798887585121046528:profile|WobblyFrog79>
When I execute the pipeline remotely in Kubernetes, those components
two things, one, make sure you specify the repo you need the components from in the decorator function, what will happen is the repo will be cloned into the container running on k8s, then inside the repo root your script (i.e. pipeline step) will be running.
[None](https://github.com/clearml/clearml/blob/9c93aa9e538075c848647dcd88e3e12bec051b5f/clearml/automation/con...
My main issue with this approach is that it breaks the workflow into "a-sync" set of tasks:
This is kind of the way you depicted it, meaning, there is an initial dataset, "offline process" (i.e. external labeling) then, ingest process.
I was wondering if the "waiting" operator can actually be a part of the pipeline.
This way it will be clearer what workflow we are executing.
Hmm, so pipeline is "aborted", then the trigger relaunches the pipeline, and the pipeli...
Hi BeefyHippopotamus73
I checked the template task and the list of "Installed Packages" indeed does not have one of my required packages in the list.
Basically the "installed packages" is auto populated based on the directly imported packages in your code base.
Could it be you do not directly import snowflake-connector-python, and it is a derivative package (i.e. required by a different package)?
BTW: when you clone your Task in the UI you can edit and add the missing packages,...
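To see why a derivative package can be missed, here is a rough sketch that lists only the modules a script imports directly, using the standard `ast` module. This is not ClearML's actual detection code, just an illustration of the idea. The helper name and the example script are hypothetical.

```python
import ast


def directly_imported_modules(source):
    """Return the sorted top-level module names imported directly by the source."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 skips relative imports such as "from . import x"
            modules.add(node.module.split(".")[0])
    return sorted(modules)


# Hypothetical script: it may rely on a Snowflake driver at runtime,
# but it never imports that driver directly.
example_script = "import pandas as pd\nfrom sqlalchemy import create_engine\n"
detected = directly_imported_modules(example_script)
print(detected)  # ['pandas', 'sqlalchemy']
```

Anything that is only pulled in by another package at runtime never shows up in such a scan, which matches the missing entry in the list.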
default is clearml data server
Yes the default is the clearml files server, what did you configure it to ? (e.g. should be something like None )
LOL, if you can get it to run any python code, I can help with the rest. We just need to make sure we can capture the output, and then start the VScode remote debugging feature directly from the extension.