Reputation
Badges 1
25 × Eureka!Could I use "register artifact"
I think this is somewhat deprecated and we should probably replace it with something similar to what you mentioned (i.e. watch a file change).
Right now the easiest way would e to manually upload the trainer_state.json
every checkpoint:Task.current_task().upload_artifact('trainer_state.json
, name='state') `
Hi @<1529271085315395584:profile|AmusedCat74>
ClearML Scheduler where it doesn't reuse the task
What do you mean by doesn't reuse the Task, do you mean you want each time the scheduler is launched to basically overwrite the previous run ?
What is the link you are seeing there?
Hi @<1541954607595393024:profile|BattyCrocodile47>
Do you mean to start a remote session instead of the cli directly from the vscode ui and connect to it? If so, that would be awesome!! We have a remote session from the web were it spins you remote session and launches vscode inside the container so you work on it in your browser. But a VSCode plugin is a great idea, do you have a ref code to similar plugins?
Hi NaughtyFish36
c++ module fails to import, anyone have any insight? required c++ compilers seem to be installed on the docker container.
Can you provide log for the failed Task?
BTW: if you need build-essentials
you can add it as the Task startup scriptapt-get install build-essentials
Yes please π
BTW: I originally thought the double quotes (in your PR) were also a bug, this is why I was asking, wdyt?
Btw it seems the docker runs in
network=host
Yes, this is so if you have multiple agents running on the same machine they can find a new open port π
I can telnet the port from my mac:
Okay this seems like it is working
UnevenDolphin73 are you positive, is this reproducible? What are you getting?
Hi DeliciousBluewhale87
When you say "workflow orchestration", do you mean like a pipeline automation ?
Yes, albeit not actually "intercept" as the user will be able to directly put Task sin queues B_machine_a/B_machine_b , but any time the user is pushing Tasks into queue B, this service will pull it and push to the individual machines queue.
what do you think?
Hi EnviousStarfish54
The Enterprise edition extends Trains functionality.
It adds security, scale and full data management (data management and versioning being the key difference)
You can get it as a saas solution or on prem.
If you need more information, you can leave contact details on the website, I'm sure sales will be happy to help :)
I found the issue, the first run it jumps over the first day (let me check if we can quickly fix that)
JitteryCoyote63 the new wizard was pushed, you can check it out here:
https://github.com/allegroai/trains/blob/master/examples/services/aws-autoscaler/aws_autoscaler.py
BTW: next release to include it all is next week (hopefully :))
My main issue with this approach is that it breaks the workflow into βa-syncβ set of tasks:
This is kind of the way you depicted it, meaning, there is an an initial dataset, "offline process" (i.e. external labeling) then, ingest process.
I was wondering if the βwaitingβ operator can actually be a part of the pipeline.
This way it will look more clear what is the workflow we are executing.
Hmm, so pipeline is "aborted", then the trigger relaunches the pipeline, and the pipeli...
Basically when running remotely, the first argument to any configuration (whether object or string, or whatever) is ignored, right?
Correct π
Is there a planned documentation overhaul?
you mean specifically for the connect_configuration ? or in general on the connect
approach rationale ?
No I was was pointing out the lack of one
Sounds like a great idea, could you open a github issue (if not already opened) ? just so we do not forget
set the pytorch lightning trainer argument
log_every_n_steps
to
1
(default
50
) to prevent the ClearML iteration logger from timing-out
Hmm that should not have an effect on the training time, all logs are send in the background, that said checkpoints might slow it a bit (i.e.; i...
for future reference this is indeed a PEP-610 related bug, f
π
can we also set theΒ
poetry
Β version used?....
Actually the agent assumes poetry is preinstalled (so whatever you already have on the docker) ...
That said, maybe we should install a specific version (after installing pip, we could do that if poetry is selected)
wdyt ?
Why can't it be updated after creation?
You can but then you have to rerun it again. I mean technically this is obviously solvable, but the idea was to make it simple to use, and since we "assume" in most cases there is a single Task per execution, it made sense. wdyt?
In the UI you can edit the base container image + add "SETUP SHELL SCRIPT", with any missing "apt update && apt-get install -y ..."
Hi JitteryCoyote63
Yes I think you are correct, since torch is installed automatically as a requirement by pip, the agent is not aware of it, so it cannot download the correct one.
I think the easiest is just to add the torch as additional package# call before Task.init() Task.add_requirements(package_name="torch", package_version="==1.7.1")
I have mounted my s3 bucket at the location /opt/clearml/data/fileserver/ but I can see my data is not being stored in s3 but its storing in ebs. How so?
I'm assuming the mount was not successful
What you should see is a link to the files server inside clearml, and actual files in your S3 bucket
how to put or handle this configuration and where?
In your clearml.conf on the machine with the agent just add at the bottom of the file agent.venvs_cache.path=~/.clearml/venvs-cache
Hi @<1569496075083976704:profile|SweetShells3>
Are you using the standard docker-compose ? are using the default elastic container ?
What exactly changed ?
The pod has an annotation with a AWS role which has write access to the s3 bucket.
So assuming the boto environment variables are configured to use the IAM role, it should be transparent, no? (I can't remember what the exact envs are, but google will probably solve it π _
AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_SESSION_TOKEN. I was expecting clearml to pick them by default from the environment.
Yes it should, the OS env will always override the configuration file sect...