Hmm TrickyRaccoon92 take a look at the cleanup service, I think you can hack it so instead of deleting the artifacts, it will archive them somewhere (also you can change the filter, maybe only perform on experiments with a specific user tag)
What do you think?
https://github.com/allegroai/trains/blob/master/examples/services/cleanup/cleanup_service.py
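If it helps, a very rough sketch of that idea, assuming a user tag like "to-archive" and using Task.get_tasks to filter (tag name and archiving step are made up for illustration, nothing is deleted here):
from clearml import Task

ARCHIVE_TAG = "to-archive"  # hypothetical user tag to filter on

# only look at experiments carrying the specific user tag
for task in Task.get_tasks(tags=[ARCHIVE_TAG]):
    for name, artifact in task.artifacts.items():
        # download a local copy of each artifact instead of deleting the task
        local_copy = artifact.get_local_copy()
        print(f"downloaded '{name}' of task {task.id} to {local_copy}")
        # ...move local_copy to long-term storage here, then optionally task.delete()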
Hi WickedElephant66
So I'm trying to upload an artefact to clearml's fileserver (I have a self hosted clearml server running),
Are you trying to upload an artifact? If so I would do:
task.upload_artifact('local file', artifact_object="/path/to/file")
Or is it about Model files?
You can also check how to upload artifacts / models here:
https://github.com/allegroai/clearml/blob/master/examples/reporting/artifacts.py
https://github.com/allegroai/clearml/blob/master/examples/reporti...
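In case it helps, a self-contained sketch of the call above (project/task names and the file path are placeholders):
from clearml import Task

# create (or reuse) a task; the artifact will be uploaded to the configured files server
task = Task.init(project_name="examples", task_name="artifact upload")
task.upload_artifact(name="local file", artifact_object="/path/to/file")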
Hi DepressedChimpanzee34, took me a while but I think there is a solution:
In your docker-compose file, replace:
https://github.com/allegroai/clearml-server/blob/a64c4d264d00eadd2d11818b37151d3cc6266d99/docker/docker-compose.yml#L5
with:
entrypoint: /bin/bash
command: -c "mkdir -p /var/log/clearml && cd /opt/clearml/ && python3 -m apiserver.apierrors_generator && gunicorn -w 4 -t 600 --bind=0.0.0.0:8008 apiserver.server:app"
ConvolutedChicken69
basically clearml-data needs to store an immutable copy of the delta changes per version, if the files are already uploaded, there is a good chance they could be modified...
So in order to make sure you have a clean immutable copy, it will always upload the data (notice it also packages everything into a single zip file, so it is easy to manage).
Hi @<1523711619815706624:profile|StrangePelican34>
You can either report on the Model itself:
None
or you can force it on the Task:
from clearml import Task

task = Task.get_task("task id here")
task.mark_started(force=True)          # re-open the completed task
task.get_logger().report_scalar(...)   # report the additional scalars
task.mark_completed(force=True)        # close it again
but DS in order for models to be uploaded, you still have to set: output_uri=True in the
No, if you set the default_output_uri, there is no need to pass output_uri=True in the Task.init()
🙂
It is basically setting it for you, make sense ?
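For reference, this is roughly where that lives in clearml.conf (the bucket URL below is just a placeholder):
sdk {
    development {
        # models and artifacts from every Task.init() will be uploaded here by default
        default_output_uri: "s3://my-bucket/clearml"
    }
}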
Hi @<1541954607595393024:profile|BattyCrocodile47>
I do have the SSH key placed at /root/.ssh/id_rsa on the machine,
Notice that the .ssh folder is mounted from the host (EC2 / GCP) into the container,
'-v', '/tmp/clearml_agent.ssh.cbvchse1:/.ssh'
This is odd, why is it mounting it to /.ssh and not /root/.ssh ?
Thank you so much!! 🤩
That's with the key at /root/.ssh/id_rsa
You mean inside the container that the autoscaler spun up ?
Notice that the agent by default would mount the Host .ssh over the existing .ssh inside the container, if you do not want this behavior you need to set:
agent.disable_ssh_mount: true
in clearml.conf
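A sketch of how that looks in the agent's clearml.conf:
agent {
    # do not mount the host's ~/.ssh folder into the containers the agent spins up
    disable_ssh_mount: true
}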
My bad, there is a mixture of terms.
"configuration object" is just a dictionary (or plain text) stored on the Task itself.
It has no file representation (well you could get it dumped to a file, but it is actually stored as a blob of text on the Task itself, at the backend side)
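As an illustration, a hedged sketch of storing such a configuration object on a task (the dictionary content and name are made up):
from clearml import Task

task = Task.init(project_name="examples", task_name="config demo")

# store a plain dictionary as a named configuration object on the task (backend side)
config = task.connect_configuration(
    configuration={"batch_size": 32, "lr": 0.001},
    name="my config",
)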
Yes, sorry, that wasn't clear 🙂
So I think there are two bugs here?
1. --args overrides="key=value" does not work
2. request: add --hydra to override hydra arguments (and if this is added the first one is not needed)
Is that correct?
That means I need to pass a single zip file to the path argument in add_files, right?
actually the opposite, you pass a folder (of files) to add_files. Then add_files remembers the files location (and pre-calculates the hash of the files content). When you call upload it will actually compress the files that changed into a zip file (or files depending on the chunk size), and upload the files to the destination (as specified in the upload call...
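For context, a minimal sketch of that flow (dataset/project names and the folder path are placeholders):
from clearml import Dataset

# create a new dataset version
ds = Dataset.create(dataset_name="my dataset", dataset_project="examples")

# register a local folder; at this point only file locations and content hashes are recorded
ds.add_files(path="/path/to/local/folder")

# compress the changed files into zip chunk(s) and upload them to the destination
ds.upload()
ds.finalize()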
But they are all running inside the same pod, correct ?
DepressedChimpanzee34
I am actually curious now, why is the default like this? maybe more people are facing similar bottlenecks?
On "regular" load there is no need for multiple processes, and the memory consumption might be more important than reply lag (at least before you start to scale)
DisturbedWalrus17
By spawning multiple processes for the API server, it looks like we utilise the CPU more now but the UI and API calls are still lagging a lot
Can you try with even more ...
Hmm let me check something
Hi MagnificentSeaurchin79
This sounds like a deeper bug (of a sort), I think the best approach is to open a GitHub issue with some code that can reproduce this behavior, or at least enough information so that we could try to catch the bug.
This way we will make sure it is not forgotten.
Sounds good ?
and about a month later for some reason the initial iteration seems to have changed to 0
Hmm, I see your point. Just so I fully understand, you are not saying old experiments were changed, but new experiments (running the same code-ish) have a totally different max iterations value. Is this correct ?
What does spin mean in this context?
This line:
docker-compose --env-file example.env -f docker-compose-triton-gpu.yml up
But these have: different task ids, same endpoints (from looking through the tabs)
So I am not sure why they are here and why not somewhere else
You can safely ignore them for the time being 🙂
but is it true that I can have multiple models on the same docker instance with different endpoints?
Yes! this is exactly the idea (and again I'm not sure ...
The current implementation (since 1.6.3 I think) creates the issues in the linked comment (with images to visualize).
Understood, basically the moment we add nested project view to the dataset (and pipelines for that matter, and both are already being worked on), it should solve everything. Is that correct?
Hi UnevenDolphin73
Is there an easy way to add a link to one of the tasks panels? (as an artifact, configuration, info, etc)?
You can add a link as an artifact, that is probably the easiest:
task.upload_artifact(name="just link", artifact_object=" ")
EDIT: And follow up regarding the dataset. As discussed somewhere previously, the datasets are now automatically moved to a hidden "sub-project" prefixed with .datasets. This creates several annoyances that I...
Hi @<1569858449813016576:profile|JumpyRaven4>
task.add_requirements()
This is the problem, if you look closely this is a class method, meant for helping the Task.init better capture python packages, it does Not change the task requirements.
To do that, use "task.set_packages"
This is Not an S3 endpoint... what is the files server you configured for it?
JumpyPig73 Do you see all the configurations under the Args section in the "Configuration" Tab ?
(Maybe I'm wrong and the latest RC does Not include the python-fire support)
What is the specific use case, updating a file on existing dataset and creating a new version?
To auto upload the model you have to tell clearml to upload it somewhere, usually by passing output_uri to Task.init or setting the default_output_uri in the clearml.conf
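For example, a minimal sketch of the code-side option (the bucket URL is a placeholder):
from clearml import Task

# models saved by the framework during this run will be uploaded to output_uri
task = Task.init(
    project_name="examples",
    task_name="training",
    output_uri="s3://my-bucket/models",  # or output_uri=True to use the files server
)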
Yes this is Triton failing to load the actual model file