Hi SmugOx94
Hmm are you creating the environment manually, or is it done by Task.init ?
(Basically, Task.init will store the entire conda environment, and if the agent is working with the conda package manager it will use it to restore it)
https://github.com/allegroai/clearml-agent/blob/77d6ff6630e97ec9a322e6d265cd874d0ab00c87/docs/clearml.conf#L50
Hi FancyWhale93 you can disable the auto model uploading with:
```
@PipelineDecorator.component(..., auto_connect_frameworks={'pytorch': False})
def step():
    pass
```
CloudyHamster42 what's the trains-server version ?
okay so it is downloaded to your machine, and unzipped, is that part correct?
CrookedWalrus33 I found the issue, this is only failing with Python 3.6.
Let me check something
Hi TeenyFly97
Can I super-impose the graphs while comparing experiments?
Hmm not at the moment, I think someone asked for the option to control it, in both comparison mode and "standalone" mode.
There is a long discussion on this feature here:
https://github.com/allegroai/trains/issues/81#issuecomment-645425450
Feel free to chime in 🙂
I think that the latest agreement is a switch in the UI, separating or collecting (super-imposing) those graphs.
BTW: if you could implement _AzureBlobServiceStorageDriver with the new Azure package, it will be great:
Basically update this class:
https://github.com/allegroai/clearml/blob/6c96e6017403d4b3f991f7401e68c9aa71d55aa5/clearml/storage/helper.py#L1620
PipelineController creates another Task in the system, that you can later clone and enqueue to start a process (usually queuing it on the "services" queue)
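A minimal sketch of that clone-and-enqueue flow, assuming a controller Task already exists in the system (the task ID below is a placeholder for your own controller's ID):

```python
from clearml import Task

# fetch the controller Task that PipelineController registered in the system
# ("<controller-task-id>" is a placeholder, use your own controller's task ID)
template = Task.get_task(task_id="<controller-task-id>")

# clone it so the original stays untouched
cloned = Task.clone(source_task=template, name="pipeline re-run")

# enqueue the clone so an agent listening on the "services" queue picks it up
Task.enqueue(cloned, queue_name="services")
```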
pywin32 isn't in my requirements file,
CloudySwallow27 what's the OS/env ?
(pywin32 is not in the direct requirements of the agent)
Hi JitteryCoyote63
I think this is the default python str() casting.
But you can specify the preview test when you call upload_artifact:
https://clear.ml/docs/latest/docs/references/sdk/task#upload_artifact
see preview argument
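To illustrate the default str() casting versus an explicit preview, here is a small stdlib-only sketch; the upload_artifact call is shown as a comment only, since it needs a live Task (the artifact name and values are made up):

```python
metrics = {"accuracy": 0.91, "loss": 0.07}

# without a preview argument, the artifact preview falls back to str() casting
default_preview = str(metrics)

# an explicit, human-friendly preview string you could pass instead
custom_preview = "\n".join(f"{k}: {v}" for k, v in metrics.items())

# with a live Task you would then call something like:
# task.upload_artifact(name="metrics", artifact_object=metrics, preview=custom_preview)

print(default_preview)
print(custom_preview)
```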
Where exactly are the model files stored on the pod?
clearml cache folder, usually under ~/.clearml
Currently I encounter the problem that I always get a 404 HTTP error when I try to access the model via the...
How are you deploying it? I would start by debugging: run everything in the docker-compose (single machine), make sure everything is working, and then deploy to the cluster
(because on a cluster level, it could be a general routing issue, way before getting t...
I think it should look something like:
```
files {
  gsc {
    contents: """{"type": "service_account", "project_id": "ai-platform", "private_key_id": "9999", "private_key": "-----BEGIN PRIVATE KEY-----==\n-----END PRIVATE KEY-----\n", "client_email": "a@ai.iam.gserviceaccount.com", "client_id": "111", "auth_uri": " ", "token_uri": " ", "auth_provider_x509_cert_url": " ", "client_x509_cert_url": " "}"""
    path: "~/gs.cred"
  }
}
```
Okay, progress.
What are you getting when running the following from the git repo folder:
```
git ls-remote --get-url origin
```
Hmm that makes sense to me, any chance you can open a github issue so we do not forget ? (I do not think it should be very complicated to fix)
well, it's only when adding a `- name` to the template
Nonetheless it should not break it 🙂
in the UI the installed packages will be determined through the code via the imports as usual ...
This is only in a case where a user manually executed their code (i.e. without trains-agent), then in the UI after they clone the experiment, they can click on the "Clear" button (hover over the "installed packages" to see it) and remove all the automatically detected packages. This will result in the trains-agent using the "requirements.txt".
Is it possible to make a connection to a S3 bucket via this authentication method with the open source version on EKS?
Hi BoredBluewhale23
In your setup, are we talking about agents running inside the Kubernetes cluster, or clients connecting from their own machine ?
Hi GrievingTurkey78
Can you test with the latest clearml-agent RC (I remember a fix just for that):
```
pip install clearml-agent==1.2.0rc0
```
Can you copy the "Installed Packages" here, and point to the package causing the issue?
ShallowCat10 Thank you for the kind words 🙂
so I'll be able to compare the two experiments over time. Is this possible?
You mean like match the loss based on "images seen" ?
Hi FriendlyKoala70 you can edit the installed package section and add the missing package. See more details on how trains-agent works here (although it's on conda the same rules apply for pip) https://github.com/allegroai/trains-agent/issues/8
Hi ItchyJellyfish73
This seems aligned with the scenario you are describing; it seems the api server is overloaded with simultaneous connections.
Add an additional apiserver instance to the docker-compose and an nginx as load balancer:
https://github.com/allegroai/clearml-server/blob/09ab2af34cbf9a38f317e15d17454a2eb4c7efd0/docker/docker-compose.yml#L4
```
apiserver:
  command:
  - apiserver
  container_name: clearml-apiserver
  image: allegroai/clearml:latest
  restart: unless-sto...
```
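For the load-balancer side, a minimal nginx sketch might look like the following; the upstream names and the port (8008, the apiserver's default) are assumptions, adjust them to match your docker-compose service names:

```
upstream apiservers {
    # the two apiserver containers defined in docker-compose
    server apiserver:8008;
    server apiserver-2:8008;
}

server {
    listen 8008;
    location / {
        proxy_pass http://apiservers;
    }
}
```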
I assume clearml has some period of time after which it shows this message. Am I right?
Yes you are 🙂
is this configurable?
It is 🙂
```
task.set_resource_monitor_iteration_timeout(seconds_from_start=1800)
```
Hi ScaryBluewhale66
The TaskScheduler I created. The status is still "running". Any idea?
The TaskScheduler needs to actually run in order to trigger the jobs (think cron daemon)
Usually it will be executed on the clearml-agent services queue/machine.
Make sense ?
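A minimal sketch of how such a scheduler is typically set up and pushed to the services queue; the task ID, queue names, and schedule times here are all placeholder assumptions:

```python
from clearml.automation import TaskScheduler

scheduler = TaskScheduler()

# re-launch an existing Task every day at 09:00 on the "default" queue
# ("<task-id>" is a placeholder for the Task you want scheduled)
scheduler.add_task(schedule_task_id="<task-id>", queue="default", hour=9, minute=0)

# push the scheduler itself onto the "services" queue,
# so an agent keeps it running and it can trigger the jobs (think cron daemon)
scheduler.start_remotely(queue="services")
```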
that really depends on how much data you have there, and the setup. The upside of the file server is you do not need to worry about credentials, the downside is storage is more expensive
...I'm not sure I follow, the clearml-task is designed to always be used so that at the end the agent will be running the Task. What am I missing?