Three options:
1. In your code: `Task.init(..., output_uri='s3://.../')` (see the sketch below)
2. Configure a default output_uri to be used by all tasks: https://github.com/allegroai/clearml/blob/64042f6c4fdaaf15b6c5f816f2fbf50f89c313e2/docs/clearml.conf#L156
3. In the UI, after you clone a Task, under the Execution tab: "Output" > "Destination"
In all cases output_uri can be:
- /mnt/share/folder (if you have a shared folder between all machines)
- http://trains-server:8081/
- gs://bucket
- azure://bucket/
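For example, a minimal sketch of option 1 (the project/task names and the bucket path `s3://my-bucket/artifacts/` are just placeholders):

```python
from clearml import Task

# Everything this task outputs (artifacts, models) will be uploaded to the given destination.
task = Task.init(
    project_name='examples',
    task_name='output-uri-demo',
    output_uri='s3://my-bucket/artifacts/',  # placeholder; could also be gs://, azure://, or /mnt/share/folder
)
```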
None of them is problematic, this is what I'm trying to say 🙂
I think the minio browser gets confused.
if you want to test the upload time on the client you can try:
from time import time
task.flush(wait_for_uploads=True)
tic = time()
task.upload_artifact('test', '/tmp/localfile')
task.flush(wait_for_uploads=True)
print(time() - tic)
"Containers (and Pods) do not share GPUs. There's no overcommitting of GPUs."
Actually I am as well; this is Kubernetes doing the resource scheduling, and Kubernetes actually decided it is okay to run two pods on the same GPU, which is cool, but I was not aware Nvidia had already added this feature (I know it was in beta for a long time).
https://developer.nvidia.com/blog/improving-gpu-utilization-in-kubernetes/
I also see they added dynamic slicing and Memory Protection:
Notice you can control ...
Does StorageManager.upload and upload_artifact use the same methods?
Yes they both use StorageManager.upload
Is the only difference the task being async?
Two differences:
1. The upload being async
2. Registering the artifact on the experiment
StorageManager will only upload, whereas upload_artifact will make sure the file is registered as an artifact on the experiment, together with all of the artifact's properties.
What will I do to fix my problem?
What is the problem? We just proved the upload speed is just fine.
BTW: server-side vault is in progress, hopefully will be available in the upcoming releases :)
upload_artifact will actually do two things:
1. Upload the file to the trains-server
2. Register it as an artifact on the experiment
What did you mean by "register the artifact manually"? You still need to upload the file to the trains-server (so it is later accessible)
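To make the difference concrete, a rough sketch (the project/task names, file paths, and the destination bucket are placeholders):

```python
from clearml import Task, StorageManager

task = Task.init(project_name='examples', task_name='artifact-demo')

# StorageManager only uploads the file; nothing is registered on the experiment.
remote_url = StorageManager.upload_file(
    local_file='/tmp/localfile',
    remote_url='s3://my-bucket/files/localfile',  # placeholder destination
)

# upload_artifact uploads (asynchronously) AND registers the file as an artifact
# on the experiment, together with its properties.
task.upload_artifact(name='test', artifact_object='/tmp/localfile')
task.flush(wait_for_uploads=True)  # wait until the background upload completes
```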
there is probably some way to make an S3 path open up in the browser by default
You should have a pop-up asking for credentials ...
Could you check whether it works if you add the credentials in the profile page?
I'm assuming TF was not part of the original requirements, and was automatically pulled by one of the packages, hence the latest version ....
I was expecting the remote experiment to behave similarly, why do I need to import pandas there?
The only problem is that the remote code did not install pandas; once the package is there, we can read the artifacts
(this is in contrast to the local machine where pandas is installed and so we can create/read the object)
Does that make sense?
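One way around it (a sketch, assuming pandas is simply missing from the remote run's detected requirements; project/task names are placeholders) is to declare the requirement explicitly before Task.init:

```python
from clearml import Task

# Ensure pandas is installed on the remote machine even if it is not
# detected from the local imports. Must be called before Task.init().
Task.add_requirements('pandas')

task = Task.init(project_name='examples', task_name='remote-artifact-read')
```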
Yes, TrickySheep9, use the k8s glue from here:
https://github.com/allegroai/clearml-agent/blob/master/examples/k8s_glue_example.py
Is that normal or a possible bug?
This sounds like xgboost's internal format; it makes sense to me for it to be joblib (which is like pickle, only faster and safer)
Let me see if we can also add the model object to the callback...
Hmm interesting, will pass it along to FE 🙂
3. That is nice! I wonder if this is built into the graph library
PS. I just noticed that this function is not documented. I'll make sure it appears in the doc-string.
JitteryCoyote63 I found it 🙂
Are you working in docker mode or venv mode?
But I am starting to wonder whether it would be easier just changing sys.path in the scripts that use the sibling libs.
That depends, how would the sibling packages get to a remote machine?
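For reference, a minimal sketch of the sys.path approach mentioned above (the folder layout and the package name `sibling_lib` are hypothetical):

```python
import sys
from pathlib import Path

# Assume the script lives in <repo>/scripts/ and the sibling package in <repo>/sibling_lib/.
repo_root = Path(__file__).resolve().parent.parent
sys.path.insert(0, str(repo_root))

import sibling_lib  # hypothetical sibling package, now importable on the remote machine as well
```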
In my understanding requests still go through `clearml-server`, which configuration I left
DefiantHippopotamus88 actually this is Not correct.
clearml-server only acts as a control plane; no actual requests are routed to it. It is used to sync model state, stats etc., and is not part of the request processing flow itself.
"curl: (56) Recv failure: Connection reset by peer"
This actually indicates port 9090 is not being listened to...
What's the final docker-compose you are usi...
Yes, that's the reason. Basically there is a background thread analyzing the code; at the end of the execution, if it is still running (hence the question regarding execution time), we give it an extra 10 seconds to come up with answers, otherwise we terminate it so the code won't get stuck. Makes sense to you?
That's the theory, I still see it is not there
feature request: tell me what gets passed along each edge of the pipeline graph
Nice! please feel free to add to GH issue 🙂
Hmm yes that is odd, let me see if I can reproduce
Hi SparklingHedgehong28
What would be the use for an "end of docker hook"? Is this like an abort callback? Completion?
instance protection
Do you mean like when an instance just died (like spot in AWS)?
😞 DilapidatedDucks58 how exactly are you "relaunching/continue" the execution? And what exactly are you setting?
Yea the "-e ." seems to fit this problem the best.
👍
It seems like whatever I add to `docker_bash_setup_script` is having no effect.
If this is running with the k8s glue, the console output of the `docker_bash_setup_script` is currently not logged into the Task (this bug will be solved in the next version), but the code is being executed. You can see the full logs with kubectl, or test with a simple export test in the `docker_bash_setup_script`:
export MY...
is everything on the same network?