I guess it's on me to check whether this slowdown is negligible or not
Usually the performance impact is negligible, especially with a GPU
But if you really want the best:
Add --security-opt seccomp=unconfined to the extra_docker_arguments
See details:
https://betterprogramming.pub/faster-python-in-docker-d1a71a9b9917
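One rough way to check whether the slowdown matters for you is to time the same small workload inside the container, once with and once without the flag. This is only a sketch: `time_syscalls` is a hypothetical helper, and it assumes that a loop of cheap system calls is a reasonable proxy for the overhead, so timing your real training step is the better test.

```python
import os
import statistics
import time


def time_syscalls(n_calls=20000, repeats=5):
    """Return the median wall-clock seconds needed for n_calls os.stat() calls."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        for _ in range(n_calls):
            os.stat(".")  # one cheap system call per iteration
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


if __name__ == "__main__":
    print(f"median: {time_syscalls():.4f}s")
```

Run the script in both container setups and compare the two medians.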
Actually, dumb question: how do I set the setup script for a task?
When you clone/edit the Task in the UI, under Execution / Container you should have it
After you edit it, just push it into execution with the autoscaler and wait :)
Hi TastyOwl44
So this depends on your code itself, but usually you need a CPU machine to run the ClearML server (or use the free community server), then a machine to run the pipeline controller (usually the same machine running the clearml-server, as the pipeline code is basically controller-only and does not execute the Task itself), and lastly machines with GPUs running the clearml-agent (these GPU machines are the ones actually doing the training, inference, etc.)
Make ...
Hi @<1572395181150310400:profile|DeterminedHare56>
Yes, Slack is not the best for knowledge sharing, but it is the easiest for users to communicate over, and it is the easiest to set up and scale.
Specifically you can find historical log of the Slack channel here: None
We hoped Google would index it, but it seems this is still not working as expected. If you have any input on how to improve it, that would be great
AFAICS it's quite a trivial implementation at the moment, and would otherwise require parsing the text file to find some references, right?
Yes, but the main issue is the parsing, it needs to have a specific standard. We use HOCON because it is great to read and edit (basically JSON would be a subset of HOCON)
the original pyhocon does support include statements as you mentioned -
Correct, my thinking was to expand them into "@configuration_section.key" or something of that nature
DefiantHippopotamus88
HTTPConnectionPool(host='localhost', port=8081):
This will not work because inside the container of the second docker compose "fileserver" is not defined
CLEARML_FILES_HOST=" "
You have two options:
1. Configure the docker compose to use the host network on all containers (as opposed to the isolated mode they are now running in)
2. Configure all of the CLEARML_* variables to point to the host IP address (e.g. 192.168.1.55), then rerun the entire thing.
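The second option can be sketched as a small helper that rewrites any CLEARML_* URL still pointing at localhost. This is an illustration only: `point_to_host_ip` is a hypothetical function and the address is the example IP from above, so adapt both to your setup.

```python
from urllib.parse import urlsplit, urlunsplit

# Host names that only resolve inside the container itself.
# Compose service names (e.g. "fileserver") can be passed in as well.
LOCAL_NAMES = frozenset({"localhost", "127.0.0.1"})


def point_to_host_ip(env, host_ip, local_names=LOCAL_NAMES):
    """Return a copy of env where CLEARML_* URLs using a local name use host_ip."""
    fixed = dict(env)
    for key, value in env.items():
        if not key.startswith("CLEARML_"):
            continue
        parts = urlsplit(value)
        if parts.hostname in local_names:
            # Keep the original port, swap only the host part
            netloc = host_ip if parts.port is None else f"{host_ip}:{parts.port}"
            fixed[key] = urlunsplit(parts._replace(netloc=netloc))
    return fixed


if __name__ == "__main__":
    env = {"CLEARML_FILES_HOST": "http://localhost:8081"}
    print(point_to_host_ip(env, "192.168.1.55"))
```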
which was trained on jupyter notebook.
Hmm that might be the issue, it assumes a local script running, let me verify that
Hi GrievingTurkey78
task.models['output'][-1] should return the last stored model.
What do you have under task.models['output'][-1].url
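A minimal sketch of that check, written against any object that exposes `models['output']` the way described above. `last_output_model_url` is a hypothetical helper, and the stand-in object at the bottom exists only so the snippet runs without a server; with ClearML you would pass a real Task instead.

```python
from types import SimpleNamespace


def last_output_model_url(task):
    """Return the URL of the most recently stored output model, or None."""
    output_models = task.models["output"]
    if not output_models:
        return None
    return output_models[-1].url


# With ClearML installed (the task ID below is a placeholder):
#   from clearml import Task
#   task = Task.get_task(task_id="<task-id>")
#   print(last_output_model_url(task))

# Stand-in object for illustration only, not a real Task
fake_task = SimpleNamespace(
    models={"output": [SimpleNamespace(url="s3://bucket/a.pt"),
                       SimpleNamespace(url="s3://bucket/b.pt")]}
)
print(last_output_model_url(fake_task))
```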
ouch, I think you are correct, can you test a fix?
Do you have any advice for this step, (monitoring)? I feel like it's not very well documented.
Yeah I think it is complicated.
I would start with the example here: None
Basically what it does is create a histogram over time of the values the REST API gets. Then Grafana visualizes those values.
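To make the "histogram over time" idea concrete, here is a conceptual sketch in plain Python. It is not the serving code itself: `histogram_over_time`, the bucket edges, and the window size are all assumptions chosen for illustration.

```python
from bisect import bisect_right
from collections import defaultdict


def histogram_over_time(samples, bucket_edges, window_seconds=60):
    """Count values per bucket, separately for each time window.

    samples: iterable of (timestamp_seconds, value) pairs
    bucket_edges: sorted boundaries; a value equal to an edge
        falls into the higher bucket
    Returns {window_start: [count per bucket]} with
    len(bucket_edges) + 1 buckets per window.
    """
    hist = defaultdict(lambda: [0] * (len(bucket_edges) + 1))
    for ts, value in samples:
        window_start = int(ts // window_seconds) * window_seconds
        hist[window_start][bisect_right(bucket_edges, value)] += 1
    return dict(hist)


if __name__ == "__main__":
    requests_seen = [(0, 0.2), (10, 0.7), (65, 1.5), (70, 0.1)]
    print(histogram_over_time(requests_seen, [0.5, 1.0]))
```

Each window then becomes one column of a heatmap-style panel, which is why a drift in the incoming values shows up as a visible shift over time.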
Notice that the request latency / frequency are automatically logged ...
yes, so it does exit the local process (at least, the command returns),
What do you mean by "the command returns"? Are you running the script from bash and it returns to bash?
Hi ExcitedFish86
In Pytorch-Lightning I use DDP
I think a fix for PyTorch multi-node / multi-process distribution was committed to 1.0.4rc1, could you verify it solves the issue? (rc1 should fix this specific issue)
BTW: no problem working with clearml-server < 1
I am not sure what switching back will solve, here the wheel should have been correct, it's just the architecture of the card that is incompatible
So I tested the "old" code that did the parsing and matching, and it did resolve to the correct wheel (i.e. found that there is no 117 only 115 and installed this one)
I think we should switch back, and have a configuration to control which mechanism the agent uses , wdyt?
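The matching behaviour described above (no 117 build, so fall back to 115) can be sketched as follows. This is not the agent's actual code: `pick_cuda_build` is a hypothetical helper, and it assumes CUDA builds are compared as plain integers such as 115 and 117.

```python
def pick_cuda_build(requested, available):
    """Return the highest available CUDA tag that does not exceed requested.

    Returns None when every available build is newer than requested.
    """
    candidates = [tag for tag in available if tag <= requested]
    return max(candidates, default=None)


if __name__ == "__main__":
    # 117 is requested but only older builds exist, so 115 is chosen
    print(pick_cuda_build(117, [102, 113, 115]))
```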
but I cannot compare between them
I think we noticed it, and this will be fixed in the next server update (again, some plotly.js issue there)
so 78000 entries ...
wow, a lot! Would it make sense to do 1GB chunks? Any reason for the initial 1MB chunk size?
2021-07-11 19:17:32,822 - clearml.Task - INFO - Waiting to finish uploads
I'm assuming a very large set of uncommitted changes :)
What's the clearml version? Is this with the latest from GitHub?
HighOtter69 inside the legend, click on the color rectangle next to the series name and you can change the color of the series on the graph. This property is stored, so it will always remember your color preferences (yes, even when logging from another machine :) )
SillyPuppy19 are you aborting the experiment, or are you trying to protect against a crash? Is it a callback-like functionality you are looking for?
Ohh yes, if you deleted the token then you have to recreate the clearml.conf
BTW: no need to generate a token, it will last :)
Isn't that risky? not knowing you need a package ?
How do you actually install it on the remote machine with the agent ?
AstonishingWorm64 can you share the full log (In the UI under Results/Console there is a download button)?
MuddySquid7 you mean you are creating them with TB ? or are you uploading them as debug images ?
Specifically in the ClearML UI, do you have it under "plots" tab or "debug samples" tab ?
Hi RobustHippopotamus53
The way "latest from branch" works:
1. On the Task you specify the branch name (e.g. "master", no need to add the origin/ prefix)
2. The agent then pulls the latest commit from that branch and updates the Task with the current commit ID (the latest on the branch at the time of execution)
This process ensures reproducibility and traceability, as we can always be certain of the exact commit that was executed.
Could it be that you "force-pushed" a commit/squash, hence the "origina...
TBH ClearML doesn't seem to be picking the model up so I need to do it manually
This is odd, clearml will pick up framework-level serialization, but not just any pickle call
Why do I need an output_uri for the model saving? The dataset API can figure this out on its own
So that it knows where to upload it. If you are setting True, this will be the default files server; you can also set it to a shared file system, S3, GCP storage, etc.
If no value is passed, it will just log th...