Same result 😄 This is frustrating, wtf happened 🤯
This is also specifically the services queue worker I'm trying to debug 🤔
Debugging. It's very useful for us to be able to see the contents of the configuration and understand what is going on and what is meant to be going on. Without a preview (which in our case is the entire content of the configuration file), one has to take the annoying route of downloading the files etc. The configurations are uploaded to a single task and then linked across all tasks to conserve storage space (so the S3 storage point is identical across tasks). Sure, sounds good. I think it's a ...
AgitatedDove14 yeah I see this now; this was an issue because I later had to "disconnect" the remote task, so it can, itself, create new tasks (using clearml.config.remote.override_current_task_id(None)). I guess you might remember that discussion? 😄
EDIT: It's the discussion we had here, for reference. https://clearml.slack.com/archives/CTK20V944/p1640955599257500?thread_ts=1640867211.238900&cid=CTK20V944
So probably not needed in JitteryCoyote63 's case, we still have some...
Maybe it's better to approach this the other way: if one uses Task.force_requirements_env_freeze(), then the locally updated packages aren't reflected in poetry 🤔
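For context, a minimal sketch of the call being discussed (the project/task names are made up); as far as I know it has to run before Task.init():
```python
from clearml import Task

# Capture a pip-freeze style snapshot of the local environment as the
# task requirements, instead of letting the agent resolve them remotely.
# Must be called before Task.init().
Task.force_requirements_env_freeze(force=True)

task = Task.init(project_name="examples", task_name="freeze-demo")
```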
Fair enough 😄
Could be nice to be able to define the fallbacks under type maybe? type: [ poetry, pip ] (current way under the hood) vs type: [ pip, poetry ]
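To illustrate, roughly what that would look like in the agent-side clearml.conf (the list form is only the proposed syntax, not something that works today):
```
agent.package_manager {
    # current: a single package manager
    type: poetry
    # proposed: an ordered fallback list
    # type: [poetry, pip]
}
```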
Here's how it failed for us 😄
poetry stores git-related data in poetry.lock , so when you pip list , you get an internal package of ours with its version but no git reference, i.e. internal_module==1.2.3 instead of internal_module @ git+https://....@commit .
Then pip actually fails (our internal module is not on PyPI), but poetry succeeds
Local changes are applied before installing requirements, right?
I'll also post this on the main channel -->
From the traceback ( backend_interface/task/task.py, line 178, in __init__ ), notice it's not Task.init
Removing the PVC is just setting the state to absent AFAIK
This took a long time to resolve since I could not access the MacBook in question to debug it.
It is now resolved and indeed a user error - they had implicitly defined CLEARML_CONFIG_FILE as e.g. /home/username/clearml.conf instead of /Users/username/clearml.conf as is expected on macOS.
I guess the error message could be made clearer in this case (i.e. CLEARML_CONFIG_FILE='/home/username/clearml.conf' file does not exist ). Thanks for the support! ❤
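For anyone hitting the same thing, a quick sanity check along these lines would have caught it (just an illustrative snippet, not something ClearML provides):
```python
import os

# On macOS home directories live under /Users, not /home, so a path copied
# from a Linux setup silently points at a file that does not exist.
cfg = os.path.expanduser(os.environ.get("CLEARML_CONFIG_FILE", "~/clearml.conf"))
if not os.path.isfile(cfg):
    raise FileNotFoundError(f"CLEARML_CONFIG_FILE={cfg!r} does not exist")
```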
No it doesn't, the agent has its own clearml.conf file.
I'm not too familiar with clearml on docker, but I do remember there are config options to pass some environment variables to docker.
You can then set your environment variables in any way you'd like before the container starts
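If it helps, I believe the agent-side clearml.conf supports something along these lines (double-check the exact key, this is from memory):
```
agent {
    # extra arguments appended to the docker run command, e.g. to
    # forward environment variables into the task container
    extra_docker_arguments: ["-e", "MY_VAR=my-value"]
}
```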
Coming back to this; ClearML prints a lot of error messages in local tests, supposedly because the output streams are not directly available:
```
--- Logging error ---
Traceback (most recent call last):
  File "/usr/lib/python3.10/logging/__init__.py", line 1103, in emit
    stream.write(msg + self.terminator)
ValueError: I/O operation on closed file.
Call stack:
  File "/home/idan/CC/git/ds-platform/.venv/lib/python3.10/site-packages/clearml/task.py", line 3504, in _at_exit
    self.__shutdown...
```
Seems like you're missing an image definition (AMI or otherwise)
https://github.com/allegroai/clearml-agent/pull/98 AgitatedDove14 😄
Hurrah! Added git config --system credential.helper 'store --file /root/.git-credentials' to the extra_vm_bash_script and now it works
(logs the given git credentials in the store file, which can then be used immediately for the recursive calls)
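For the record, the store helper just writes plain-text credential URLs, one per line, into /root/.git-credentials, which the recursive git calls then pick up (placeholder values below):
```
https://<username>:<token>@github.com
```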
@<1539780258050347008:profile|CheerfulKoala77> you may also need to define subnet or security groups.
Personally I do not see the point in Docker over EC2 instances for CPU instances (virtualization on top of virtualization).
Finally, just to make sure, you only ever need one autoscaler. You can monitor multiple queues with multiple instance types with one autoscaler.
This happened again 🤔
How many files does ClearML touch? 🤯
Looks great, looking forward to all the new treats 😄
Happy new year! 😄
I'll have some reports tomorrow I hope TimelyPenguin76 SuccessfulKoala55 !
I can elaborate in more detail if you have the time, but generally the code is just defined in some source files.
I've been trying to play around with pipelines for this purpose, but as suspected, it fails finding the definition for the pickled object…
Honestly I wouldn't mind building the image myself, but the glue-k8s setup is missing some documentation so I'm not sure how to proceed
That's fine as well - the code simply shows the name of the environment variable, not its value, since that's taken directly from the agent listening to the services queue (which is then running the scaler)
Ah, uhhhh whatever is in the helm/glue charts. I think it's the allegroai/clearml-agent-k8s-base , but since I hadn't gotten a chance to try it out, it's hard to say with certainty which would be the best for us 😄
More experiments @<1537605940121964544:profile|EnthusiasticShrimp49> - the core issue with the create_function_step seems to be that the chosen executable will be e.g. IPython or some notebook, and not e.g. python3.10 , so it fails running it as a task… 🤔
I have seen this quite frequently as well tbh!
I guess the thing that's missing from offline execution is being able to load an offline task without uploading it to the backend.
Or is that functionality provided by setting offline mode and then importing an offline task?
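As far as I can tell the current flow looks roughly like this (paths are placeholders), and the missing piece is inspecting the session without the import/upload step:
```python
from clearml import Task

# Record everything locally instead of talking to the server
Task.set_offline(offline_mode=True)
task = Task.init(project_name="examples", task_name="offline-demo")
# ... run the experiment ...
task.close()

# Today the only way back seems to be importing the session, which
# uploads it to the backend - no "load locally without uploading"
Task.import_offline_session("/path/to/offline_session.zip")
```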
It's okay 😄 I was originally hoping to delete my "initializer" task, but I'll just archive it if someone is interested in the worker data etc. Setting the queue is quite nice.
I think this should get my team excited enough 😄