Hi @<1727497172041076736:profile|TightSheep99>
I think you are correct! It will use the internal per-file upload retry, but it does not let you control it.
Could you please open a github issue so that we do not forget to add it?
Thanks MuddyCrab47 !!!
I found it!
It turns out the artifact upload will always upload from stream (aka no multi-upload). I will make sure we fix it in the next RC (I think the plan is to have it out this weekend)
os.environ['CLEARML_PROC_MASTER_ID'] = ''
Nice catch! (I'm assuming you also called Task.init somewhere before, otherwise I do not think this was necessary)
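In case it helps, a minimal sketch of how that line is typically used (an assumption on my side: the goal is to let a subprocess create its own Task even though the parent process already called Task.init; the project/task names are placeholders):
```python
import os
from clearml import Task

# The parent process already called Task.init, so CLEARML_PROC_MASTER_ID is set.
# Clearing it lets this process initialize its own Task instead of attaching to the parent's.
os.environ['CLEARML_PROC_MASTER_ID'] = ''
task = Task.init(project_name="examples", task_name="subprocess task")  # placeholder names
```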
I think I solved it by deleting the project and running the base_task once before the hyperparameter optimization
So is it working now? Is everything there?
None of them is problematic, this is what I'm trying to say
I think the minio browser gets confused.
if you want to test the upload time on the client you can try:
```python
from time import time

task.flush(wait_for_uploads=True)
tic = time()
task.upload_artifact('test', '/tmp/localfile')
task.flush(wait_for_uploads=True)
print(time() - tic)
```
Containers (and Pods) do not share GPUs. There's no overcommitting of GPUs.
Actually I am as well; this is Kubernetes doing the resource scheduling, and Kubernetes actually decided it is okay to run two pods on the same GPU, which is cool, but I was not aware Nvidia had already added this feature (I know it was in beta for a long time)
https://developer.nvidia.com/blog/improving-gpu-utilization-in-kubernetes/
I also see they added dynamic slicing and Memory Protection:
Notice you can control ...
I'm assuming TF was not part of the original requirements, and was automatically pulled by one of the packages, hence the latest version ....
I was expecting the remote experiment to behave similarly, why do I need to import pandas there?
The only problem is that the remote code did not install pandas; once the package is there we can read the artifacts
(this is in contrast to the local machine where pandas is installed and so we can create/read the object)
Does that make sense ?
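One way to make sure pandas is installed on the remote run is to declare it as a requirement before Task.init, a sketch using the SDK's Task.add_requirements helper (project/task names are placeholders):
```python
from clearml import Task

# Must be called before Task.init so the agent picks it up when building the remote environment
Task.add_requirements("pandas")
task = Task.init(project_name="examples", task_name="artifact consumer")  # placeholder names
```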
yes, TrickySheep9 use the k8s glue from here:
https://github.com/allegroai/clearml-agent/blob/master/examples/k8s_glue_example.py
Is that normal or a possible bug?
This sounds like xgboost's internal format; it makes sense to me for it to be joblib (which is like pickle, only faster and safer)
Let me see if we can also add the model object to the callback...
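For reference, a minimal sketch of the joblib round-trip mentioned above (the toy model and file name are placeholders):
```python
import joblib
import numpy as np
import xgboost as xgb

# Tiny toy model just so there is something to serialize
X, y = np.random.rand(32, 4), np.random.randint(0, 2, 32)
model = xgb.XGBClassifier(n_estimators=2).fit(X, y)

# joblib is pickle-compatible, but faster for objects holding large numpy buffers
joblib.dump(model, "model.joblib")
restored = joblib.load("model.joblib")
```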
JitteryCoyote63 I found it
Are you working in docker mode or venv mode ?
feature request: tell me what gets passed along each edge of the pipeline graph
Nice! Please feel free to add it to the GH issue
Hmm yes that is odd, let me see if I can reproduce
Hi SparklingHedgehong28
What would be the use for an "end of docker hook"? Is this like an abort callback? Completion?
instance protection
Do you mean like when the instance just died (like a spot instance in AWS)?
Hi @<1544853721739956224:profile|QuizzicalFox36>
Sure just change the ports on the docker compose
```python
logger.report_scalar(title="loss", series="train", iteration=0, value=100)
logger.report_scalar(title="loss", series="test", iteration=0, value=200)
```
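For completeness, the logger above would typically come from the current Task (a sketch; project/task names are placeholders):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="scalar reporting demo")  # placeholder names
logger = task.get_logger()
# then call logger.report_scalar(...) as shown above
```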
Hi PanickyMoth78
dataset name is ignored if
use_current_task=True
Kind of, it stores the Dataset on the Task itself (then dataset.name becomes the Task name). Actually, we should probably deprecate this feature; I think it is too confusing?!
What was the use case for using it ?
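To illustrate the point above, a sketch (project/task/dataset names are placeholders):
```python
from clearml import Dataset, Task

task = Task.init(project_name="examples", task_name="my dataset task")  # placeholder names
# With use_current_task=True the Dataset is stored on this Task itself,
# so the Task name is what you will see and the dataset_name below is ignored.
ds = Dataset.create(dataset_name="this-name-is-ignored", use_current_task=True)
```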
I just set the git credentials in the
clearml.conf
and it works out of the box
git has issues with passing the user/token from the main repo to the submodules, hence my surprise that it is working out-of-the-box.
Do notice that if you are using an ssh-key this is a non-issue.
Nope, no
.netrc
defined anywhere, ...
If this is the case can you try to add the following to your "extra_vm_bash_script"
echo machine example.com > ~/.netrc && echo log...
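For reference, the general ~/.netrc layout that command is building looks like this (the host and credentials are placeholders, not the actual values elided above):
```
machine example.com
login <git-user-or-token-name>
password <git-password-or-token>
```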
You can get a mutable copy of the entire dataset (the original version) with get_mutable_local_copy(), then change the files in the returned directory, then create a new Dataset with the original version as the parent, then sync the folder.
You can also just update the specific file (without needing to download the entire original version)
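Roughly, the flow described above as a sketch (the dataset id, names and paths are placeholders):
```python
from clearml import Dataset

# Get a writable copy of the original version into a local folder
parent = Dataset.get(dataset_id="<original-dataset-id>")  # placeholder id
local_dir = parent.get_mutable_local_copy("/tmp/my_dataset_copy")

# ... change files under local_dir ...

# Create a new version with the original as parent, sync the changes and upload
child = Dataset.create(dataset_name=parent.name, parent_datasets=[parent.id])
child.sync_folder(local_dir)
child.upload()
child.finalize()
```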
Hi @<1600661423610925056:profile|StrongMouse81>
using the serving base URL and also another model endpoint that we added using:
clearml-serving model add
we get the attached response:
And other model endpoints are working for you?
Hi AbruptHedgehog21
How can I add S3 credentials for an S3 bucket in example.env for clearml-serving-triton? I need to add the bucket name, keys and endpoint
Basically boto (s3) environment variables would just work:
https://clear.ml/docs/latest/docs/clearml_serving/clearml_serving#advanced-setup---s3gsazure-access-optional
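For example, the standard boto environment variables in example.env would look something like this (values are placeholders; see the docs link above for the full setup, including bucket and endpoint settings):
```
AWS_ACCESS_KEY_ID=<your-access-key>
AWS_SECRET_ACCESS_KEY=<your-secret-key>
AWS_DEFAULT_REGION=<your-region>
```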
the hack doesn't work if conda is not installed
Of course conda needs to be installed; it is using a pre-existing conda env, no?! What am I missing?
Ideally it would just pull an experiment from a dedicated HPO queue and run it inplace
And the assumption is the code is also there ?
Hi ElegantCoyote26, yes I did
It seems cometml puts their default callback logger for you, that's it.
ModelCheckpoint('best_model', save_best_only=True)
That worked for me now, what's the diff?
VictoriousPenguin97 I'm assuming the exact same server version ?
Hi RipeGoose2
Yes, the "services-mode" of an agent will take multiple Tasks, that said, these are "service" i.e. light CPU tasks, think pipeline controllers etc.
Simple file transfer test gives me approximately 1 GBit/s transfer rate between the server and the agent, which is to be expected from the 1Gbit/s network.
Ohhh I missed that. What is the speed you get for uploading the artifacts to the server? (you can test it with a simple toy artifact upload script)
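If useful, a toy upload-timing sketch along the lines of the earlier snippet (the project/task names and the 100 MB size are arbitrary placeholders):
```python
import os
from time import time
from clearml import Task

task = Task.init(project_name="examples", task_name="upload speed test")  # placeholder names

# Create a ~100 MB toy file so the transfer time is measurable
with open("/tmp/toy_artifact.bin", "wb") as f:
    f.write(os.urandom(100 * 1024 * 1024))

task.flush(wait_for_uploads=True)
tic = time()
task.upload_artifact("toy", "/tmp/toy_artifact.bin")
task.flush(wait_for_uploads=True)
print(f"upload took {time() - tic:.1f} s")
```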