I think this is great! That said, it only applies when you are spinning agents (the default helm is for the server). So maybe we need another one? Or an option?
It is way too much to pass as an env variable
/opt/clearml/data/fileserver
this is on the host machine, and it is mounted into the container at /mnt/fileserver
BoredHedgehog47 were you able to locate the issue ?
What's the clearml version? Is this with the latest from GitHub?
Can you verify by adding the following to your extra_docker_shell_script:
https://github.com/allegroai/clearml-agent/blob/a5a797ec5e5e3e90b115213c0411a516cab60e83/docs/clearml.conf#L152
extra_docker_shell_script: ["echo machine example.com > ~/.netrc", "echo login MY_USERNAME >> ~/.netrc", "echo password MY_PASSWORD >> ~/.netrc"]
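If it helps to see what those three commands actually produce inside the container, here is a dry run against a throwaway HOME (so nothing real is touched); the credentials are placeholders, as above:

```shell
# Hypothetical dry-run of the three extra_docker_shell_script commands,
# using a temporary directory as HOME so no real ~/.netrc is modified.
HOME="$(mktemp -d)"
echo "machine example.com" > "$HOME/.netrc"
echo "login MY_USERNAME" >> "$HOME/.netrc"
echo "password MY_PASSWORD" >> "$HOME/.netrc"
cat "$HOME/.netrc"
```

pip (and git) read ~/.netrc automatically, which is why writing it in extra_docker_shell_script makes private hosts reachable from inside the container.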
I think we should open a GitHub Issue and get some more feedback, maybe we should just add support in the backend side ?
Could it be you have two entries of "console_cr_flush_period" ?
Could you download and send the entire log ?
I mean just add the toy tqdm loop somewhere just before starting the lightning train function. I just want to verify that it works, or maybe there is something in the specific setup happening in real-time that changes it
Plan is to have it out in the next couple of weeks.
Together with a major update in v0.16
So basically the APIClient is a pythonic interface to the RestAPI, so you can do the following
See if this one works:
# stats from the last 60 seconds
for worker in workers:
    print(client.workers.get_stats(
        worker_ids=[worker.id],
        from_date=int(time() - 60),
        to_date=int(time()),
        interval=60,
    ))
Hi EnviousStarfish54
After the pop up do you see the plot on the web UI?
Hi UnevenDolphin73
Is there an easy way to add a link to one of the tasks panels? (as an artifact, configuration, info, etc)?
You can add a link as an artifact, that is probably the easiest:
task.upload_artifact(name="just link", artifact_object="
")
EDIT: And follow up regarding the dataset. As discussed somewhere previously, the datasets are now automatically moved to a hidden "sub-project" prefixed with
.datasets
. This creates several annoyances that I...
@<1523716917813055488:profile|CloudyParrot43> yes, the server upgrades deleted it. We are redeploying a copy, should take a few min
Hi DepressedChimpanzee34
This is not a query call, this is a reporting call. see docs below
https://clear.ml/docs/latest/docs/references/api/workers#post-workersstatus_report
It is used by the worker to report its own status.
I think this is what you are looking for:
https://clear.ml/docs/latest/docs/references/api/workers#post-workersget_stats
however can you see the inconsistency between the key and the name there:
Yes, that was my point on "uniqueness" ...
The model key must be unique, and it is based on the filename itself (the context is known, since it is inside the Task), but the Model Name is an entity, so it should have the Task name as part of the entity name. Does that make sense?
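As a minimal sketch of that naming scheme (the helper and the exact format are hypothetical, not ClearML API):

```python
from pathlib import Path

def model_entity_name(task_name: str, weights_path: str) -> str:
    # Hypothetical helper: the model *key* stays the bare filename
    # (unique within a single Task), while the displayed entity name
    # is prefixed with the Task name so it stays unique globally.
    return f"{task_name} - {Path(weights_path).name}"

print(model_entity_name("train_resnet", "/tmp/checkpoints/best.pt"))
# -> train_resnet - best.pt
```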
I would say 4 vCPUs and 512 GB storage, but it really depends on the load you will put on it
Try to upload something to the file server ?
Ohh so even easier:
print(client.workers.get_all())
from clearml import TaskTypes
That will only work if you are using the latest from the GitHub, I guess the example code was modified before a stable release ...
Hmm I suspect the 'set_initial_iteration' does not change/store the state on the Task, so when it is launched, the value is not overwritten. Could you maybe open a GitHub issue on it?
sorry that I keep bothering you, I love ClearML and try to promote it whenever I can, but this thing is a real pain in the ass
No worries I totally feel you.
As a quick hack in the actual code of the Task itself, is it reasonable to have:
task = Task.init(....)
task.set_initial_iteration(0)
EnviousStarfish54 data versioning on the open source leverages the artifacts and storage and caching capabilities of Trains.
A simple workflow
- Upload data
https://github.com/allegroai/events/blob/master/odsc20-east/generic/dataset_artifact.py
- Preprocessing data
https://github.com/allegroai/events/blob/master/odsc20-east/generic/process_dataset.py
- Using data
https://github.com/allegroai/events/blob/master/odsc20-east/scikit-learn/sklearn_jupyter.ipynb
Hi SpicyLion54
the -f flag is not very stable for pip (and cannot be added in requirements.txt). The ClearML agent will automatically find the correct torch (from the torch repository) based on the CUDA version it detects at runtime.
This means it automatically translates torch==1.8.1 and will pull from the correct repo based on the torch support table.
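To illustrate the idea (the entries below are examples, not the agent's real support table, and the helper is hypothetical):

```python
# Toy version of the lookup the agent performs: map the requested torch
# version plus the detected CUDA runtime to a concrete wheel spec.
# These table entries are illustrative, NOT the actual clearml-agent table.
TORCH_WHEELS = {
    ("1.8.1", "cu111"): "torch==1.8.1+cu111",
    ("1.8.1", "cu102"): "torch==1.8.1+cu102",
    ("1.8.1", "cpu"): "torch==1.8.1+cpu",
}

def resolve_torch(requested: str, detected_cuda: str) -> str:
    # Fall back to the CPU build when the CUDA variant is unknown.
    return TORCH_WHEELS.get(
        (requested, detected_cuda), TORCH_WHEELS[(requested, "cpu")]
    )

print(resolve_torch("1.8.1", "cu111"))  # -> torch==1.8.1+cu111
print(resolve_torch("1.8.1", "cu999"))  # -> torch==1.8.1+cpu
```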
clearml-task seems to not allow passing the run argument without a value
EnviousStarfish54 did you try --args run=True
I'm assuming run is a boolean of a sort ?
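One thing to watch for if run really is a boolean: command-line values arrive as strings, so the script has to convert them explicitly. A toy illustration (this mirrors a generic argparse script, not clearml-task internals):

```python
import argparse

# Toy script: a "boolean" value passed on the command line is a string,
# and bool("False") is True, so an explicit comparison is needed.
parser = argparse.ArgumentParser()
parser.add_argument("--run", type=str, default="False")
args = parser.parse_args(["--run", "True"])

run = args.run.lower() == "true"  # explicit string -> bool conversion
print(run)  # -> True
print(bool("False"))  # -> True (the common pitfall)
```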
a. The submitted job would automatically download data from an internal data repository, but it will be time-consuming if data is re-downloaded every time. Does ClearML cache the data somewhere?
What do you mean by the agent will download the data? Are you referring to Dataset?
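For the caching question, the general pattern is to key the local cache on the data's identity so a rerun skips the download. A minimal sketch of that idea (hypothetical helper, not the ClearML Dataset API):

```python
import hashlib
import tempfile
from pathlib import Path

cache_dir = Path(tempfile.mkdtemp())  # per-run cache dir for the demo

def cached_fetch(url: str, download) -> Path:
    # Key the cache on the URL; only call `download` on a cache miss.
    key = hashlib.sha256(url.encode()).hexdigest()
    target = cache_dir / key
    if not target.exists():
        target.write_bytes(download(url))
    return target

calls = []
def fake_download(url):
    calls.append(url)
    return b"payload"

cached_fetch("https://data.internal/train.csv", fake_download)
cached_fetch("https://data.internal/train.csv", fake_download)
print(len(calls))  # -> 1: the second fetch is served from the cache
```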
Okay, some progress, so what is the difference ?
Any chance the issue can be reproduced with a small toy code ?
Can you run the tqdm loop inside the code that exhibits the CR issue ? (maybe some initialization thing that is causing it to ignore the value?!)
I see TrickyFox41, try the following:
--args overrides="param=value"
Notice this will change the Args/overrides argument that will be parsed by hydra to override its params
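To make the shape of that string concrete, here is a toy parser for hydra-style "key=value" overrides (illustrative only, not hydra's actual override grammar):

```python
def parse_overrides(spec: str) -> dict:
    # Toy parser for space-separated "key=value" overrides, mimicking
    # the Args/overrides string handed to hydra (not hydra's real parser).
    out = {}
    for item in spec.split():
        key, _, value = item.partition("=")
        out[key] = value
    return out

print(parse_overrides("param=value lr=0.1"))
# -> {'param': 'value', 'lr': '0.1'}
```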