Hi JitteryCoyote63
I think this is the default python str() casting.
But you can specify the preview text when you call upload_artifact:
https://clear.ml/docs/latest/docs/references/sdk/task#upload_artifact
See the preview argument.
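A minimal sketch of passing a custom preview, assuming a working ClearML setup. The project, task and artifact names, as well as the `short_preview` helper, are made up for illustration:

```python
def short_preview(obj, max_chars=200):
    """Build a short preview string instead of relying on the default str() output."""
    text = repr(obj)
    if len(text) > max_chars:
        # Truncate and mark the cut so the preview stays readable in the UI.
        text = text[: max_chars - 3] + "..."
    return text


def upload_with_preview(artifact_object):
    # Imported here so the helper above works without clearml installed.
    from clearml import Task

    # Project/task/artifact names are placeholders (assumptions).
    task = Task.init(project_name="examples", task_name="artifact preview")
    task.upload_artifact(
        name="my_artifact",
        artifact_object=artifact_object,
        preview=short_preview(artifact_object),
    )
```

Calling `upload_with_preview(my_object)` should then show the shortened string in the artifact preview rather than the full str() output.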
Where exactly are the model files stored on the pod?
clearml cache folder, usually under ~/.clearml
Currently I encounter the problem that I always get a 404 HTTP error when I try to access the model via the...
How are you deploying it? I would start by debugging and running everything in the docker-compose (single machine), make sure you have everything running, and then deploy to the cluster
(because on a cluster level, it could be a general routing issue, way before getting t...
I think it should look something like:
files {
  gsc {
    contents: """{"type": "service_account", "project_id": "ai-platform", "private_key_id": "9999", "private_key": "-----BEGIN PRIVATE KEY-----==\n-----END PRIVATE KEY-----\n", "client_email": "a@ai.iam.gserviceaccount.com", "client_id": "111", "auth_uri": " ", "token_uri": " ", "auth_provider_x509_cert_url": " ", "client_x509_cert_url": " "}"""
    path: "~/gs.cred"
  }
}
Okay, progress.
What are you getting when running the following from the git repo folder:
git ls-remote --get-url origin
Hmm that makes sense to me, any chance you can open a github issue so we do not forget ? (I do not think it should be very complicated to fix)
well, it's only when adding a "- name" to the template
Nonetheless it should not break it
in the UI the installed packages will be determined through the code via the imports as usual ...
This is only in a case where a user manually executed their code (i.e. without trains-agent), then in the UI after they clone the experiment, they can click on the "Clear" button (hover over the "installed packages" to see it) and remove all the automatically detected packages. This will result in the trains-agent using the "requirements.txt".
Is it possible to make a connection to a S3 bucket via this authentication method with the open source version on EKS?
Hi BoredBluewhale23
In your setup, are we talking about agents running inside the Kubernetes cluster, or clients connecting from their own machine ?
Hi GrievingTurkey78
Can you test with the latest clearml-agent RC (I remember a fix just for that):
pip install clearml-agent==1.2.0rc0
Can you copy the "Installed Packages" here, and point to the package causing the issue?
ShallowCat10 Thank you for the kind words
so I'll be able to compare the two experiments over time. Is this possible?
You mean like match the loss based on "images seen" ?
Hi FriendlyKoala70 you can edit the installed package section and add the missing package. See more details on how trains-agent works here (although it's on conda the same rules apply for pip) https://github.com/allegroai/trains-agent/issues/8
Hi ItchyJellyfish73
This seems aligned with the scenario you are describing: it seems the api server is overloaded with simultaneous connections.
Add an additional apiserver instance to the docker-compose and an nginx as load balancer:
https://github.com/allegroai/clearml-server/blob/09ab2af34cbf9a38f317e15d17454a2eb4c7efd0/docker/docker-compose.yml#L4
apiserver:
  command:
  - apiserver
  container_name: clearml-apiserver
  image: allegroai/clearml:latest
  restart: unless-sto...
assume clearml has some period of time that after it, shows this message. am I right?
Yes you are
is this configurable?
It is:
task.set_resource_monitor_iteration_timeout(seconds_from_start=1800)
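As a rough sketch of where that call would go, assuming `task` is the object returned by `Task.init()`; the `extend_monitor_timeout` helper and its `minutes` parameter are hypothetical names used only for illustration:

```python
def extend_monitor_timeout(task, minutes=30):
    """Give the task more time to report a first iteration before the
    resource monitor falls back to reporting by seconds from start."""
    seconds = int(minutes * 60)
    task.set_resource_monitor_iteration_timeout(seconds_from_start=seconds)
    return seconds


# Typical usage (requires clearml and a configured server):
#   from clearml import Task
#   task = Task.init(project_name="examples", task_name="slow first iteration")
#   extend_monitor_timeout(task, minutes=60)
```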
Hi ScaryBluewhale66
TaskScheduler I created. The status is still running. Any idea?
The TaskScheduler needs to actually run in order to trigger the jobs (think cron daemon)
Usually it will be executed on the clearml-agent services queue/machine.
Makes sense?
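To illustrate the "cron daemon" point, here is a sketch of a scheduler that is itself launched on the services queue. It assumes the `TaskScheduler` class from `clearml.automation`; the `daily_schedule` helper, the task id and the queue names are placeholders, and the `day=1` convention for "every day" follows my reading of the scheduler docs:

```python
def daily_schedule(task_id, queue, hour, minute):
    """Build add_task() keyword arguments for a once-a-day trigger."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute 0-59")
    # day=1 together with hour/minute is assumed to mean "every day at hour:minute".
    return dict(schedule_task_id=task_id, queue=queue, day=1, hour=hour, minute=minute)


def launch_scheduler(task_id):
    # Imported here so the helper above works without clearml installed.
    from clearml.automation import TaskScheduler

    scheduler = TaskScheduler()
    scheduler.add_task(**daily_schedule(task_id, queue="default", hour=22, minute=30))
    # The scheduler only triggers jobs while it is running, so enqueue it
    # on the services queue instead of leaving it as an idle Task.
    scheduler.start_remotely(queue="services")
```

Using `scheduler.start()` instead would keep the scheduler running in the local process, which only works as long as that process stays alive.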
That really depends on how much data you have there, and the setup. The upside of the file server is you do not need to worry about credentials, the downside is storage is more expensive
...I'm not sure I follow, the clearml-task is designed to always be used so that at the end the agent will be running the Task. What am I missing?
In our case this is not possible due to client security (e.g. training data from clients can potentially be 'reverse engineered' from trained models in future).
Hmm I see, wouldn't it make more sense to separate clients like a multi-tenant SAAS solution ?
Now I need to figure out how to export that task id
You can always look it up
How come you do not have it?
ReassuredTiger98 yes this is odd:
also:
Warning, could not locate PyTorch torch==1.12 matching CUDA version 115, best candidate 1.12.0.dev20220407
Seems like it found a matching version and did not use it...
Let me check that
Hi WackyRabbit7
I believe this is fixed in clearml-server 1.1 (this is a plotly color issue), releasing later today or tomorrow
save off the "best" model instead of the last
Should be relatively easy to update on the main Task the model with the best performance, no?
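One way this could look, as a sketch only: track the best metric seen so far and update the model registered on the main Task whenever it improves. The `BestModelTracker` class and `register_best` function are hypothetical helpers, and the sketch assumes an `OutputModel` created once for the main Task:

```python
class BestModelTracker:
    """Remember the best metric value seen so far."""

    def __init__(self, higher_is_better=True):
        self.higher_is_better = higher_is_better
        self.best_value = None

    def update(self, value):
        """Return True when `value` beats the best value seen so far."""
        if self.best_value is None:
            improved = True
        elif self.higher_is_better:
            improved = value > self.best_value
        else:
            improved = value < self.best_value
        if improved:
            self.best_value = value
        return improved


def register_best(output_model, weights_path):
    # `output_model` is assumed to be a clearml.OutputModel attached to the
    # main Task, e.g. OutputModel(task=task, name="best").
    output_model.update_weights(weights_filename=weights_path)
```

In a training loop this would be called as `if tracker.update(val_accuracy): register_best(output_model, "best.pt")`, where the metric and the file name are placeholders.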
Hi NastyFox63 could you verify the fix works?
pip install git+
Could you clarify the question for me, please?
...
Could you please point me to the piece of ClearML code related to the downloading process?
I think I mean this part:
https://github.com/allegroai/clearml/blob/e3547cd89770c6d73f92d9a05696018957c3fd62/clearml/datasets/dataset.py#L2134
Hi SuccessfulPuppy43
How to make remote ClearML agent do
pip install -e .
In theory there is no need to do that: clearml-agent adds the repo root folder to the Python path.
If you insist on actually installing it, try adding a "requirements.txt"-compatible line to your "installed packages" section:
-e .