I'm trying to decide if ClearML is a good fit for my team's use case
Right now we're not looking for a complete overhaul into new tools, just some enhancements (specifically, model repository, data versioning).
We've been burnt by DVC and the like before, so I'm trying to minimize the pain for my team before we set out to explore ClearML.
That's fine for the current use-case I believe.
Once the team is happy with the logging functionality, we'll move on to remote execution and things will update.
If I add the bucket to that (so CLEARML_FILES_HOST=s3://minio_ip:9000/minio/bucket), I then get the following error instead --
2021-12-21 22:14:55,518 - clearml.storage - ERROR - Failed uploading: SSL validation failed for
... [SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:1076)
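For reference, my understanding is the SSL error comes from clearml trying HTTPS against a plain-HTTP MinIO port, so I'm assuming the clearml.conf S3 section needs something like this (keys are placeholders, and secure: false is the switch I think matters here):
```
sdk {
    aws {
        s3 {
            credentials: [
                {
                    host: "minio_ip:9000"
                    key: "MINIO_ACCESS_KEY"
                    secret: "MINIO_SECRET_KEY"
                    multipart: false
                    secure: false  # MinIO served over plain HTTP
                }
            ]
        }
    }
}
```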
Ah, uhhhh whatever is in the helm/glue charts. I think it's the allegroai/clearml-agent-k8s-base, but since I hadn't gotten a chance to try it out, it's hard to say with certainty which would be the best for us
Is it CLEARML_CONFIG_FILE? (I had to dig this from the GH code)
Or if it wasn't clear, that chunk of code is from clearml's dataset.py
I've also followed https://clearml.slack.com/archives/CTK20V944/p1628333126247800 but it did not help
I will TIAS, but maybe worthwhile to also mention whether it has to be an absolute path or if a relative path is fine too!
Honestly I wouldn't mind building the image myself, but the glue-k8s setup is missing some documentation so I'm not sure how to proceed
I think so, it was just missing from the official documentation. Thanks!
Any updates @SuccessfulKoala55?
AFAICS it's quite a trivial implementation at the moment, and would otherwise require parsing the text file to find some references, right?
https://github.com/allegroai/clearml/blob/18c7dc70cefdd4ad739be3799bb3d284883f28b2/clearml/task.py#L1592
Why not give ClearML read-only access credentials to the repository?
Right, so this is checksum based? Are there plans to only store delta changes for files (i.e. store the changed bytes instead of the entire file)?
Just because it's handy to compare differences and see how the data changed between iterations, but I guess we'll work with that
We'll probably do something like this (roughly the sketch below):
- When creating a new dataset with a parent (or parents), look at the immediate parents for identically-named files
- If those exist, load them with the matching framework (pyarrow, pandas, etc.) and log the differences to the new dataset
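Something along these lines; the dataset names, local folder, and the parquet/pandas bits are just placeholders for illustration:
```python
from pathlib import Path

import pandas as pd
from clearml import Dataset

# placeholder project/dataset names
parent = Dataset.get(dataset_project="my-project", dataset_name="my-dataset")
child = Dataset.create(
    dataset_name="my-dataset-v2",
    dataset_project="my-project",
    parent_datasets=[parent.id],
)

new_root = Path("data")  # local folder holding the new version of the files
child.add_files(new_root.as_posix())

# compare identically-named files between the parent and the new version
parent_root = Path(parent.get_local_copy())
for rel_path in parent.list_files():
    new_file = new_root / rel_path
    if rel_path.endswith(".parquet") and new_file.exists():
        old_df = pd.read_parquet(parent_root / rel_path)
        new_df = pd.read_parquet(new_file)
        # "log the differences" however makes sense for the data,
        # e.g. a quick row-count delta
        print(rel_path, "row delta:", len(new_df) - len(old_df))
```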
I also tried switching to dockerized mode now, getting the same issue
I opened a GH issue shortly after posting here. @FrothyDog40 replied (hoping I tagged the right person).
We need to close the task. This is part of our unittests for a framework built on top of ClearML, so every test creates and closes a task.
Yes exactly that AgitatedDove14
Testing that our logic maps correctly, etc., for everything related to ClearML
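For context, each test basically wraps the case in something like this (project/test names are placeholders):
```python
import pytest
from clearml import Task


@pytest.fixture
def clearml_task():
    # every test creates its own task; closing it lets the next test call Task.init again
    task = Task.init(project_name="unittests", task_name="framework-test")
    yield task
    task.close()
```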
- in the second scenario, I might not have changed the results of the step, but my refactoring changed the speed considerably and this is something I measure.
- in the third scenario, I might not have changed the results of the step and my refactoring just cleaned the code, but besides that, nothing substantial was changed. Thus I do not want a rerun.
Well, I would say then that in the second scenario it's just rerunning the pipeline, and in the third it's not running it at all
(I ...
Well, -ish. Ideally what we're after is one of the following:
- Couple a task with a dataset. Keep it visible in its destined location.
- Create a dataset separately from the task. Have control over its visibility and location. If it's hidden, it should not affect normal UI interaction (most annoying is having to click twice on the same project name when there are hidden datasets, which do not appear in the project view)
I'm using 1.1.6 (upgraded from 1.1.6rc0) - should I try 1.1.7rc0 or smth?
Ah I see, if the pipeline controller begins in a Task it does not add the tags to it…
Still failing with 1.2.0rc3. AgitatedDove14, any thoughts on your end?
Yeah I managed to work around the former two, mostly by using Task.create instead of Task.init. It's actually the whole bunch of daemons running in the background that takes a long time, not the zipping.
Regarding the second - I'm not doing anything per se. I'm running in offline mode and I'm trying to create a dataset, and this is the error I get...
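Roughly the flow that hits it for me (names are placeholders):
```python
from clearml import Dataset, Task

# enable offline mode before anything else touches clearml
Task.set_offline(offline_mode=True)

ds = Dataset.create(dataset_name="offline-test", dataset_project="tests")
ds.add_files("some_local_folder/")
ds.upload()
ds.finalize()
```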
There is a data object in it, but there is no script object attached to it (presumably again because of pytest?)
Okay this was a deep dive into clearml-agent code
Took a long time to figure out that there was a specific Python version with a specific virtualenv that was old (Python 3.6.9 and Python 3.8 had the latest virtualenv, but Python 3.7.5 had an old one).
Then the task requested to use Python 3.7, and that old virtualenv version was broken.
As a result -> could the agent maybe also output the virtualenv version used when setting up the environment for the first time?
I think I may have brought this up multiple times in different ways :D
When dealing with long and complicated configurations (whether config objects, yaml, or otherwise), it's often useful to break them down into relevant chunks (think hydra, maybe).
In our case, we have a custom YAML instruction, !include, i.e.
```
# foo.yaml
bar: baz

# bar.yaml
obj: !include foo.yaml
maybe_another_obj: !include foo.yaml
```
Say I upload each of these yamls as a configuration object (as with the above). Once I try to load bar.yaml remotely it will crash, since foo.yaml is missing (and is instead a clearml configuration object).
Does that make sense?
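For context, the !include handling is roughly this kind of thing (a simplified sketch, not our exact code):
```python
import os

import yaml


class IncludeLoader(yaml.SafeLoader):
    """YAML loader that understands our custom !include instruction."""


def _include(loader, node):
    # resolve the included file relative to the file currently being parsed
    base_dir = os.path.dirname(os.path.abspath(loader.name))
    with open(os.path.join(base_dir, loader.construct_scalar(node))) as f:
        return yaml.load(f, IncludeLoader)


IncludeLoader.add_constructor("!include", _include)

with open("bar.yaml") as f:
    cfg = yaml.load(f, IncludeLoader)  # needs foo.yaml next to bar.yaml on disk
```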
Yes, exactly! I've added instructions for the users on creating their account and running clearml-init, and then they run the snippet that updates the api and sdk sections.
Or did you mean I can couple a short "mini config" with the package and redirect clearml to use this local one (instead of the one at ~/clearml.conf)?
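The snippet is essentially just this (hosts and keys below are placeholders):
```python
from clearml import Task

# placeholder values; in practice they come from the user's clearml-init credentials
Task.set_credentials(
    api_host="https://api.clear.ml",
    web_host="https://app.clear.ml",
    files_host="https://files.clear.ml",
    key="USER_ACCESS_KEY",
    secret="USER_SECRET_KEY",
)
```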
I'll have yet another look at both the latest agent RC and at the docker-compose, thanks!
There was no "default" services agent btw, just the queue, I had to launch an agent myself (not sure if it's relevant)
Hey @CostlyOstrich36, thanks for the reply!
I'm familiar with the above repo, we have the ClearML Server and such deployed on K8s.
What's lacking is documentation regarding the clearml-agent helm chart. What exactly does it offer, etc.
We're interested in e.g. using karpenter to scale our deployments per demand, effectively replacing the AWS autoscaler.