So I wonder - why should an agent be related to a specific user's credentials? Is the right way to go about this to create a "fake user" for the sake of the agent?
Very true, you have to have credentials for the trains-agent so it can "report" to the trains-server. That said, the creator of the Task (i.e. the person who cloned it) will be registered as the "user" in the UI.
I would recommend creating an "agent" user and putting its credentials on the trains-agent machine (the same way...
Our remote machine is Windows 10
JumpyDragonfly13 seems like the Windows 10 + docker is the issue (that would explain the OCI error)
Is this relevant ?
https://github.com/microsoft/WSL/issues/5100
LudicrousParrot69 this is an implementation issue: this entire page is based on "task comparison", and a single Task means a totally different interface for querying the data 🙂
Actually you cannot breakpoint at "atexit" calls (or at least it doesn't work with my gdb)
But I would add a few prints here:
https://github.com/allegroai/clearml/blob/aa4e5ea7454e8f15b99bb2c77c4599fac2373c9d/clearml/task.py#L3166
Does it work if you remove the Task.init call?
Why can we even change the pip version in the clearml.conf?
LOL mistakes learned the hard way 🙂
Basically, too many times in the past pip versions were a bit broken. That is fine when pip is used manually and users can reinstall a different version, but horrible when you have an automated process like the agent. So we added a "freeze version" option, only with greater control. Make sense?
have a CI/CD (e.g. GitHub Actions) that updates my "production" pipeline on ClearML UI,
I think this is the easiest way: the CI/CD launches a pipeline (which under the hood is another type of Task) by querying the latest "Published" pipeline that is also not archived, then cloning it and pushing the clone to an execution queue.
In the UI, when you want to "upgrade" the production pipeline, you just right-click and "Publish" the pipeline you want to launch. Another way is to do the same with Tags...
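A minimal sketch of the query, clone and enqueue flow described above, as it might look in a CI/CD job. The project and queue names are placeholders, and the filter keys are an assumption about how "published and not archived, newest first" is expressed in `task_filter`; the `clearml` import is done lazily so the helper can be inspected without a server.

```python
def production_pipeline_filter():
    """Filter for published, non-archived Tasks, most recently updated first."""
    return {
        "status": ["published"],
        "system_tags": ["-archived"],   # leading "-" excludes the tag
        "order_by": ["-last_update"],   # newest first
    }


def launch_latest_published(project_name, queue_name):
    """Clone the newest published pipeline Task and push the clone to a queue.

    project_name and queue_name are placeholders chosen by the caller.
    """
    from clearml import Task  # lazy import: needs clearml and a configured server

    candidates = Task.get_tasks(
        project_name=project_name,
        task_filter=production_pipeline_filter(),
    )
    if not candidates:
        raise RuntimeError("no published, non-archived pipeline found")

    latest = candidates[0]
    cloned = Task.clone(source_task=latest, name=f"{latest.name} (CI run)")
    Task.enqueue(cloned, queue_name=queue_name)
    return cloned.id
```

The CI job would then call something like `launch_latest_published("my_project", "services")`, where both names depend on your own setup.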
You can do:
task = Task.get_task(task_id='uuid_of_experiment')
task.get_logger().report_scalar(...)
Now the only question is who will create the initial Task, so that the others can report to it. Do you have like a "master" process ?
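For illustration, here is a rough sketch of several processes reporting into one existing Task. It assumes some "master" process already created the Task and handed its id to the workers; the `worker_<rank>` series naming and the default `"loss"` title are made up for this example.

```python
def series_for_worker(rank):
    """Give every worker its own series so the curves do not overwrite each other."""
    return f"worker_{rank}"


def report_from_worker(task_id, rank, value, iteration, title="loss"):
    """Report one scalar point from a worker into an already existing Task.

    task_id is assumed to come from whichever process created the Task.
    """
    from clearml import Task  # lazy import: needs clearml and a configured server

    task = Task.get_task(task_id=task_id)
    logger = task.get_logger()
    logger.report_scalar(
        title=title,
        series=series_for_worker(rank),
        value=value,
        iteration=iteration,
    )
    logger.flush()  # push the report out before the worker exits
```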
WickedGoat98 Same for me, let me ask the UI guys, I think this is a UI bug.
Also maybe before you post the article we could release a fix to both, what do you think?
EDIT:
Never mind 🙂 I just saw the Medium link, very cool!!!
Hi SpotlessFish46 ,
Is the artifact already in S3 ?
Is the S3 configured as the default files_server in the trains.conf ?
You can always use the StorageManager upload to wherever and register the url on the artifacts.
You can also programmatically change the artifact destination server to S3, then upload the artifact as usual.
What would be the best match for you?
Then you have to pass the .ssh into the remote server, probably the easiest is to have it in the "extra bash script"
currently I'm doing it by fetching the latest dataset, incrementing the version and creating a new dataset version
This seems like a very good approach, how would you improve it?
I'm assuming you mean for the clients, right?
WickedGoat98 what's the clearml version you are using?
WickedGoat98 the agent itself can be executed on bare metal, no need to setup a docker for it (although fully supported)
Specifically, the docker compose has the agent running in services mode, i.e. for lightweight CPU tasks such as running pipelines.
If the agent is running on a GPU, the easiest way is to run it on bare metal
BoredHedgehog47 you need to configure the clearml k8s glue to spin up pods (instead of statically allocating an agent per pod), does that make sense?
Good, so we narrowed it down. Now the question is how come it is empty ?
HandsomeCrow5 OMG the guys already added it to the debug samples as well, check out the demo app (drop down "test html sample"):
https://demoapp.trains.allegro.ai/projects/4e7fef090aa849b1acc37d92b59b3360/experiments/83c9ed509f0e421eaadc1ef56b3af5b4/info-output/debugImages
BeefyCow3 if you are trying to optimize a specific metric (i.e. a scalar on a graph), the template Task should report it with the same title/series combination, which should be easy enough to verify in the UI 🙂
You can either report with Tensorboard or with the Trains Logger, either way will work.
GiganticTurtle0 quick update, a fix will be pushed, so that casting is based on the actual value passed, not even on type hints 🙂
(this is only in case there is no default value, otherwise the default value type is used for casting)
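To make the rule concrete, here is a toy model of that casting order. It is not the ClearML implementation: `cast_override`, its arguments and the boolean parsing are all invented for illustration. It only shows the priority described above, where the default value's type wins if there is one, and otherwise the type of the value actually passed is used.

```python
_MISSING = object()  # sentinel meaning "no value available"


def cast_override(raw, default=_MISSING, passed=_MISSING):
    """Cast a string override using the default's type, else the passed value's type.

    Hypothetical helper: raw is the string override (e.g. edited in the UI),
    default is the argument's default value, passed is the value given at call time.
    """
    if default is not _MISSING and default is not None:
        reference = default          # a default exists: its type decides
    elif passed is not _MISSING and passed is not None:
        reference = passed           # no default: use the actual value's type
    else:
        return raw                   # nothing to infer from, keep the string

    target = type(reference)
    if target is bool:
        # bool("false") would be True, so parse booleans explicitly
        return raw.strip().lower() in ("1", "true", "yes")
    return target(raw)


print(cast_override("5", default=1))       # default is an int
print(cast_override("0.5", passed=1.0))    # no default, passed value is a float
```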
Is this per Task or for all the Tasks always ?
Yes, the agent's mode is global, i.e. all tasks are either inside docker or in venv. In theory you can have two agents on the same machine, one venv and one docker, listening to two different queues
Okay so my thinking is, on the pipelinecontroller / decorator we will have:
abort_all_running_steps_on_failure=False (if True, on step failing it will abort all running steps and leave)
Then per step / component decorator we will have:
continue_pipeline_on_failure=False (if True, on step failing, the rest of the pipeline dag will continue)
GiganticTurtle0 wdyt?
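Here is a small pure-Python simulation of how the two proposed flags could interact. It does not use the ClearML pipeline API, and it makes two assumptions that are not spelled out above: steps whose parents are finished run together as one "wave", and steps downstream of a failed step are skipped rather than executed.

```python
def run_pipeline(steps, abort_all_running_steps_on_failure=False):
    """Simulate a step DAG and return {step_name: final_status}.

    steps maps a name to a dict with optional keys:
      "parents" (list of names), "fails" (bool),
      "continue_pipeline_on_failure" (bool, per-step flag).
    """
    status = {}
    while len(status) < len(steps):
        # a step becomes ready once every parent has a final status
        ready = [
            name for name, spec in steps.items()
            if name not in status
            and all(p in status for p in spec.get("parents", []))
        ]
        if not ready:
            raise ValueError("cycle or unknown parent in step graph")

        running = []
        for name in ready:
            parents = steps[name].get("parents", [])
            if all(status[p] == "completed" for p in parents):
                running.append(name)
            else:
                status[name] = "skipped"  # assumption: downstream of a failure

        failed = [n for n in running if steps[n].get("fails", False)]
        for name in running:
            if name in failed:
                status[name] = "failed"
            elif failed and abort_all_running_steps_on_failure:
                status[name] = "aborted"  # controller flag: abort siblings
            else:
                status[name] = "completed"

        # a failure stops the pipeline unless that step allows continuing
        fatal = [
            n for n in failed
            if not steps[n].get("continue_pipeline_on_failure", False)
        ]
        if failed and (abort_all_running_steps_on_failure or fatal):
            for name in steps:
                status.setdefault(name, "not_started")
            break
    return status


dag = {
    "a": {},
    "b": {"fails": True, "continue_pipeline_on_failure": True},
    "c": {"parents": ["a"]},
    "d": {"parents": ["b"]},
}
print(run_pipeline(dag))
print(run_pipeline(dag, abort_all_running_steps_on_failure=True))
```

In the first call `b` fails but the independent branch `a` then `c` still completes; in the second call the controller-level flag aborts `a` and nothing else is started.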
well that depends on you, what did you write there to know it is the best one ? file name ? added some metric ?
Hi CluelessFlamingo93
from your log:
ImportError: cannot import name 'packaging' from 'pkg_resources' (/home/bat/.clearml/venvs-builds/3.9/lib/python3.9/site-packages/pkg_resources/__init__.py)
I'm guessing yolox/setuptools
Try adding to the "Installed packages"
setuptools==69.5.1
(Something about the `setup...
VexedCat68 are you manually creating the OutputModel object?