This was a long time running since I could not access the macbook in question to debug this.
It is now resolved and indeed a user error - they had implicitly defined CLEARML_CONFIG_FILE to e.g. /home/username/clearml.conf instead of /Users/username/clearml.conf as is expected on Mac.
I guess the error message could be made clearer in this case (i.e. CLEARML_CONFIG_FILE='/home/username/clearml.conf' file does not exist). Thanks for the support! ❤
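A minimal sketch of the kind of check that would have caught this early. The helper name check_config_env is hypothetical and not part of ClearML; it only assumes that CLEARML_CONFIG_FILE, when set, should point at an existing file.

```python
import os
from pathlib import Path


def check_config_env(env=None, var="CLEARML_CONFIG_FILE"):
    """Return a readable problem description, or None if nothing looks wrong.

    Hypothetical helper (not a ClearML API): it only verifies that the
    path stored in the environment variable exists as a file.
    """
    env = os.environ if env is None else env
    value = env.get(var)
    if not value:
        return None  # variable unset or empty: nothing to validate here
    path = Path(value).expanduser()
    if not path.is_file():
        return f"{var}={value!r} file does not exist"
    return None


if __name__ == "__main__":
    problem = check_config_env()
    print(problem or "config path looks fine")
```

Running this on the affected machine before starting a task would print the offending path directly, e.g. a /home/... path on a Mac where the file actually lives under /Users/...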
No it doesn't, the agent has its own clearml.conf file.
I'm not too familiar with clearml on docker, but I do remember there are config options to pass some environment variables to docker.
You can then set your environment variables in any way you'd like before the container starts
Coming back to this; ClearML prints a lot of error messages in local tests, supposedly because the output streams are not directly available:
--- Logging error ---
Traceback (most recent call last):
  File "/usr/lib/python3.10/logging/__init__.py", line 1103, in emit
    stream.write(msg + self.terminator)
ValueError: I/O operation on closed file.
Call stack:
  File "/home/idan/CC/git/ds-platform/.venv/lib/python3.10/site-packages/clearml/task.py", line 3504, in _at_exit
    self.__shutdown...
Seems like you're missing an image definition (AMI or otherwise)
https://github.com/allegroai/clearml-agent/pull/98 AgitatedDove14
Hurrah! Added git config --system credential.helper 'store --file /root/.git-credentials' to the extra_vm_bash_script and now it works
(logs the given git credentials in the store file, which can then be used immediately for the recursive calls)
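For reference, git's store helper keeps one URL per line in that file, with the username and password embedded and percent-encoded. A small sketch of what such a line looks like; the function name and the example values are hypothetical, and in practice git writes the file itself once the helper is configured.

```python
from urllib.parse import quote


def credential_store_line(host, username, token, scheme="https"):
    """Build one line in the format used by git's credential store file.

    Illustrative only: user and token are percent-encoded so that
    characters such as '@' or ':' cannot break the URL structure.
    """
    user = quote(username, safe="")
    secret = quote(token, safe="")
    return f"{scheme}://{user}:{secret}@{host}"


if __name__ == "__main__":
    # Placeholder values, not real credentials
    print(credential_store_line("github.com", "ci-bot", "abc123"))
```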
@<1539780258050347008:profile|CheerfulKoala77> you may also need to define subnet or security groups.
Personally I do not see the point in Docker over EC2 instances for CPU instances (virtualization on top of virtualization).
Finally, just to make sure, you only ever need one autoscaler. You can monitor multiple queues with multiple instance types with one autoscaler.
This happened again
How many files does ClearML touch? :shocked_face_with_exploding_head:
Looks great, looking forward to all the new treats
Happy new year!
I'll have some reports tomorrow I hope TimelyPenguin76 SuccessfulKoala55 !
I can elaborate in more detail if you have the time, but generally the code is just defined in some source files.
I’ve been trying to play around with pipelines for this purpose, but as suspected, it fails finding the definition for the pickled object…
Honestly I wouldn't mind building the image myself, but the glue-k8s setup is missing some documentation so I'm not sure how to proceed
That's fine as well - the code simply shows the name of the environment variable, not its value, since that's taken directly from the agent listening to the services queue (which is then the one running the scaler)
Ah, uhhhh whatever is in the helm/glue charts. I think it’s the allegroai/clearml-agent-k8s-base, but since I haven’t had a chance to try it out, it’s hard to say with certainty which would be the best for us
More experiments @<1537605940121964544:profile|EnthusiasticShrimp49> - the core issue with the create_function_step seems to be that the chosen executable will be e.g. IPython or some notebook, and not e.g. python3.10, so it fails running it as a task…
I have seen this quite frequently as well tbh!
I guess the thing that's missing from offline execution is being able to load an offline task without uploading it to the backend.
Or is that functionality provided by setting offline mode and then importing an offline task?
It's okay. I was originally hoping to delete my "initializer" task, but I'll just archive it in case someone is interested in the worker data etc. Setting the queue is quite nice.
I think this should get my team excited enough
I can only say I’ve found ClearML to be very helpful, even given the documentation issue.
I think they’ve been working on upgrading it for a while, hopefully something new comes out soon.
Maybe @<1523701205467926528:profile|AgitatedDove14> has further info
For example, can't interact with these two tasks from this view (got here from searching in the dashboard view; they're in different projects):
Thanks! To clarify, all the agent does is then spawn new nodes to cover the tasks?
Yes, exactly! I've added instructions for the users on creating their account and running clearml-init, and then they run the snippet that updates the api and sdk sections.
Or did you mean I can couple a short "mini config" with the package and redirect clearml to use this local one (instead of the one at ~/clearml.conf)?
SmugDolphin23 I think you can simply change not (type(deferred_init) == int and deferred_init == 0) to deferred_init is True ?
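A quick sketch to compare the two expressions side by side. The function names are made up for illustration; only the two conditions come from the message above. They agree for True and for the integer 0, but differ for values such as False, 1 or None, so the swap is only equivalent if those never reach this check.

```python
def original_check(deferred_init):
    # Condition as quoted: false only for the plain integer 0
    # (type(False) is bool, not int, so False passes this check)
    return not (type(deferred_init) == int and deferred_init == 0)


def proposed_check(deferred_init):
    # Suggested replacement: true only for the literal True
    return deferred_init is True


if __name__ == "__main__":
    for value in (True, False, 0, 1, None):
        print(repr(value), original_check(value), proposed_check(value))
```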
Ah okay. Was confused by what you quoted haha
SuccessfulKoala55 That string was autogenerated by pyhocon and matches their documentation too - https://github.com/lightbend/config/blob/master/HOCON.md#substitutions
The first example won't work (it will treat ${...} as a string literal and won't replace it). The second does work, but as mentioned anyway, these were not hand typed, but rather generated from pyhocon, so I don't think that's the issue
Or do you mean the contents of the configuration, probably :face_palm: ... one moment
I'm not sure what you mean by "entity", but honestly anything works. We're already monkey-patching our way
That's a nice workaround of course - I'm sure it works and I'll give it a shot momentarily. I'm just wondering if ClearML could automatically recognize image files in upload_artifact (and other well known suffixes) and do that for me.
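Until something like that exists, the suffix detection itself is easy to do on the caller's side. A minimal sketch using only the standard library; looks_like_image is a hypothetical helper, and what to do with the result (e.g. reporting the file as an image instead of uploading it as a generic artifact) is left to the caller.

```python
import mimetypes


def looks_like_image(path):
    """Guess from the file suffix whether a path refers to an image.

    Hypothetical helper: relies on the standard mimetypes table, so it
    inspects the file name only and never opens the file.
    """
    mime_type, _encoding = mimetypes.guess_type(str(path))
    return mime_type is not None and mime_type.startswith("image/")


if __name__ == "__main__":
    for name in ("plots/confusion.png", "report.jpg", "table.parquet"):
        print(name, looks_like_image(name))
```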
I see! The Hyper Datasets don't really fit our use case - they seem really focused on CNNs and image-based data, but lack support for database-oriented tabular data.
So for now we mainly work with parquet and CSV files, and I was hoping there'd be an easy way to view those... I'll make a workaround with a "Datasets" project I suppose!