Thanks EnviousStarfish54, we are working on moving them there!
BTW, in the meantime, please feel free to open a GitHub issue under trains, at least until they are moved (hopefully by the end of Sept).
that machine will be able to pull and report multiple trials without restarting
What do you mean by "pull and report multiple trials"? Spawn multiple processes with different parameters?
If this is the case: the internals of the optimizer could be synced to the Task so you can access them, but this is basically the optimizer's internal representation, which is optimizer-dependent. Which one did you have in mind?
Another option is to pull Tasks from a dedicated queue and use the LocalClearMLJob ...
Or use python:3.9 when starting the agent
This is probably the best solution 🙂
Hi WackyRabbit7
First, always check the functions on the Task object; they are the most straightforward access to the system.
Then, if you need general-purpose API calls, they are currently only documented in the docstrings of the API schema (that said, they are fairly well documented).
You can check all the endpoints https://github.com/allegroai/trains/tree/master/trains/backend_api/services/v2_8
And finally, if you want to easily use the REST API:
` from trains.backend_api.session.client impo...
Dynamic GPU option only available with Enterprise version right?
Correct 🙂
Thank you DilapidatedDucks58 for the ping!
Totally slipped my mind 😞
Hi DisgustedDove53
Now for the clearml-session tasks, a port-forward should be done each time if I need to access the Jupyter notebook UI for example.
So basically this is why the k8s glue has --ports-mode.
Essentially, you set up a k8s service (handling the ingress TCP ports), and the template.yaml used by the k8s glue should specify that service. Then clearml-session knows how to access the actual pod through the parameters the k8s glue sets on the Task.
Makes sense?
Hey GiganticTurtle0,
So basically the issue is that the pipeline function (prediction_service) is getting a dict as input, while it expects basic types... If you were to do the following, it would have worked as expected:
prediction_service(**default_config)
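To illustrate why unpacking matters, here is a small plain-Python sketch. The two-parameter signature of prediction_service and the contents of default_config are hypothetical stand-ins, since the real function is not shown in the thread.

```python
# Hypothetical stand-in for the pipeline function; the real
# prediction_service signature is not shown in the thread.
def prediction_service(start: int = 0, end: int = 10) -> dict:
    return {"start": start, "end": end}

# Hypothetical config dict
default_config = {"start": 5, "end": 20}

# Passing the dict itself binds the whole dict to the first parameter.
wrong = prediction_service(default_config)

# Unpacking with ** maps each key to the parameter of the same name,
# so every argument arrives as a basic type.
right = prediction_service(**default_config)

print(wrong)  # {'start': {'start': 5, 'end': 20}, 'end': 10}
print(right)  # {'start': 5, 'end': 20}
```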
I will make sure we flatten any dictionary so that we end up with config/start instead of a serialized version of the dict.
wdyt?
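A minimal sketch of that kind of flattening in plain Python. The helper name flatten_config and the "/" separator are assumptions for illustration (only the config/start form comes from the message above); this is not ClearML's internal implementation.

```python
from typing import Any, Dict


def flatten_config(cfg: Dict[str, Any], prefix: str = "", sep: str = "/") -> Dict[str, Any]:
    """Flatten nested dicts into 'section/key' entries (hypothetical helper)."""
    flat: Dict[str, Any] = {}
    for key, value in cfg.items():
        name = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse so deeper levels become 'a/b/c'
            flat.update(flatten_config(value, name, sep))
        else:
            flat[name] = value
    return flat


nested = {"config": {"start": 5, "end": 20}}
print(flatten_config(nested))  # {'config/start': 5, 'config/end': 20}
```

Each leaf ends up as its own entry, so values stay basic types instead of one serialized dict.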
The only downside is that you cannot see it in the UI (or edit it).
You can now do:
data = {'datatask': 'idhere'}
task.connect(data, 'DataSection')
This will create another section named "DataSection" on the configuration tab. Then you will be able to see/edit the input Task ID.
JitteryCoyote63 what do you think?
Okay, now I get it!
Let me think about it for an hour or two 😄
NaughtyFish36
What's the error you are getting?
Also, did you try setting "force_git_ssh_protocol: true"?
https://github.com/allegroai/clearml-agent/blob/76c533a2e8e8e3403bfd25c94ba8000ae98857c1/docs/clearml.conf#L39
link to the line please 🙂
LudicrousParrot69 this is an implementation issue: this entire page is based on "task comparison", and a single Task means a totally different interface for querying the data 🙂
JitteryCoyote63 I meant to store the parent ID as another "hyper-parameter" (under its own section name), not the data itself.
Makes sense?
an implementation of this kind is interesting for you or do you suggest to fork
You mean adding a config map storing a default trains.conf for the agent?
Hmm, I suspect 'set_initial_iteration' does not change/store the state on the Task, so when the Task is launched, the value is not overwritten. Could you open a GitHub issue on it?
PanickyMoth78 'tensorboard_logger' is an old, deprecated package that was meant to create TB events without TB; it was created before TB was a separate package. Long story short, it is not supported. That said, if you just run the same code and replace tensorboard_logger with tensorboard, you should see all the scalars in the UI.
Background:
ClearML logs TB events in real time as they are created. tensorboard_logger is not TB: it creates events and dumps them directly into a TB-equivalent event file, which is why ClearML does not capture them.
The pip cache, git cache, and venvs cache are all supported; you just need to map the folders.
If you do not want to spin up a PVC with an NFS mount, you can just mount an S3 bucket with s3fs as part of the container's extra bash script:
https://github.com/allegroai/clearml-agent/blob/b39b54bbafab39e6731cb742fdf317bc6dcae54a/docs/clearml.conf#L140
S3 FUSE filesystems:
https://github.com/kahing/goofys
https://github.com/s3fs-fuse/s3fs-fuse
WDYT?
LudicrousParrot69
Yes, please add it to GitHub 🙂 The problem is that if this is on a single Task, we lose the nice interactive abilities (selecting different scalars/parameters, etc.)
Hi DilapidatedDucks58
Apologies, this thread slipped away.
I double-checked: the server will not allow you to overwrite it (meaning a fix would require a new server release, which usually takes longer).
That said, maybe we can pass an argument to "Task.init" so it ignores it? wdyt?
Hi SubstantialElk6
What if I have OS library dependencies as well? (apt install, rpm install, etc.)
If these are OS libraries that you always need, you can put them here:
https://github.com/allegroai/clearml-agent/blob/d9b9b4984bb8a83914d0ec6d53c86c68bb847ef8/docs/clearml.conf#L136
agent.extra_docker_shell_script: ["apt-get install -y bindfs", ]
In the next version, this could be controlled on a per-Task basis.
FYI, these are the default apt packages that are installed:
` apt-get update
a...
ItchyJellyfish73
Unfortunately, this needs backend support and is only available in the Enterprise version. What is your use case for it? (It was designed to allow out-of-the-box, bare-metal, multi-GPU dynamic allocation. Think of a DGX with 8 GPUs: instead of spinning down agents whenever you want to change the queue->num-GPUs mapping, you can do it on the fly.)
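To make the queue->num-GPUs idea concrete, here is a toy, single-process illustration. It is not the Enterprise feature's implementation or API; the queue names, the allocate/release helpers, and the first-free-GPUs policy are all hypothetical.

```python
from typing import Dict, List

# Hypothetical queue -> number-of-GPUs mapping (names are illustrative only)
queue_gpus: Dict[str, int] = {"single_gpu": 1, "quad_gpu": 4}

# An 8-GPU machine (e.g. a DGX); all GPUs start out free
free_gpus: List[int] = list(range(8))


def allocate(queue: str) -> List[int]:
    """Reserve GPUs for a job pulled from `queue`, per the current mapping."""
    needed = queue_gpus[queue]
    if needed > len(free_gpus):
        raise RuntimeError(f"not enough free GPUs for {queue!r}")
    taken = free_gpus[:needed]
    del free_gpus[:needed]
    return taken


def release(gpus: List[int]) -> None:
    """Return GPUs to the pool when the job finishes."""
    free_gpus.extend(gpus)
    free_gpus.sort()


job_a = allocate("quad_gpu")    # GPUs 0-3
queue_gpus["single_gpu"] = 2    # change the mapping on the fly, nothing restarted
job_b = allocate("single_gpu")  # GPUs 4-5
release(job_a)                  # pool is now 0, 1, 2, 3, 6, 7
```

The point of the sketch is only that the mapping is read at allocation time, so editing it takes effect for the next job without tearing anything down.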
Hi PunySquid88, I guess it's a good thing we're talking, because I believe what you are looking for is already available :)
Logger.current_logger().report_media('title', 'series', iteration=1337, local_path='/tmp/bunny.mp4')
This will actually work on any file. That said, the UI might display the wrong icon (this will be fixed in the next version).
We usually think of artifacts as data you want to reuse, so all the files uploaded there are accessibl...
What will I do to fix my problem?
What is the problem? Didn't we just show that the upload speed is fine?
MuddyCrab47 could you post the full sample code you are using?
There is some overhead, but it should be negligible.
I think there is a bug in the UI that causes series names containing "." to use only the part before the "." for color selection. This means "epsilon 0" and "epsilon 0.1" will always get the same color, which would explain why it works on other graphs.
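A hypothetical Python reproduction of the suspected behavior. The actual UI code is not shown here; the hash-based palette lookup and the palette itself are assumptions used only to show why truncating at the first "." makes the two series collide.

```python
import hashlib

# Hypothetical palette; the real UI palette is not shown in the thread
PALETTE = ["blue", "orange", "green", "red", "purple"]


def pick_color(key: str) -> str:
    """Deterministically map a key to a palette entry (assumed hash-based lookup)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return PALETTE[int(digest, 16) % len(PALETTE)]


def buggy_color_key(series: str) -> str:
    # Suspected bug: only the text before the first "." is used as the key
    return series.split(".")[0]


def fixed_color_key(series: str) -> str:
    # Use the full series name as the key
    return series


for name in ["epsilon 0", "epsilon 0.1"]:
    key = buggy_color_key(name)
    print(name, "->", key, pick_color(key))  # both lines use the key "epsilon 0"
```

With the full name as the key the two series get distinct keys, though with a small palette two distinct keys can still land on the same color by chance.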
HighOtter69 , let me check something
Hi SteadyFox10, the way it works is that Trains limits the debug image history by reusing the same file names, so the UI only presents the iterations for which the debug images are relevant. Your sample code seems to expose a bug: the generated link should contain the iteration number, but it does not, so the debug images are overwritten every iteration. Here is the image link: https://demofiles.trains.allegro.ai/Test/test_images.6ed32a2b5a094f2da47e6967bba1ebd0/metrics/Test/te...
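A minimal sketch of history-limited naming, assuming a modulo-based slot counter and a hypothetical max_history of 3. The function name debug_image_name and the file-name format are illustrative; the actual Trains naming scheme and default history size are not stated here.

```python
from typing import List, Optional, Set


def debug_image_name(title: str, series: str, iteration: Optional[int], max_history: int = 3) -> str:
    """Hypothetical naming scheme: cycle through max_history file slots."""
    if iteration is None:
        # Buggy case: no iteration in the name, so every call overwrites one file
        return f"{title}_{series}.png"
    slot = iteration % max_history
    return f"{title}_{series}_{slot:03d}.png"


# Iterations 0..4 map to slots 0, 1, 2, 0, 1: only the last 3 images survive
names: List[str] = [debug_image_name("Test", "test_images", i) for i in range(5)]

# Without the iteration, all 5 reports collapse onto a single file name
buggy: Set[str] = {debug_image_name("Test", "test_images", None) for _ in range(5)}
print(buggy)  # {'Test_test_images.png'}
```

Reusing a bounded set of names is what caps storage per series; dropping the iteration from the name shrinks that set to one file, which matches the overwrite described above.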