I guess the thing that's missing from offline execution is being able to load an offline task without uploading it to the backend.
Or is that functionality provided by setting offline mode and then importing an offline task?
I've also tried e.g. setting `agent.package_manager.priority_packages = ["poetry"]` and/or `agent.package_manager.poetry_version = ">1.2.0"`, and other flags, but these affect only the main `/clearml_agent_venv` environment, and not the one actually generated by the clearml-agent when executing the task.
But to be fair, I've also tried with `python3.X -m pip install poetry` etc. I get the same error.
Happens pretty much consistently across all our projects:
1. Have a project with over 15 tasks (i.e. one that needs the Load More button)
2. Click Load More, select a task that's not in the first 15
3. Let the page "rest" for a while (a couple of hours)
4. Flip back to the page: the task is still active, but you cannot see it in the task list and there is no more Load More button
It misses the repository information of course, but the 'configuration/Args' were logged. So something weird is going on in identifying the repository.
I believe that happens natively thanks to pyhocon? No idea why it fails on mac
I'm not sure, I'm not getting anything (this is the only thing I could find that's weird about this project).
It has a space in the name, has no subprojects, and it just doesn't show up anywhere 🤔
I don't think there's a PR issue for that yet, at least I haven't created one.
I could have a look at this and maybe make a PR.
Not sure what the recommended flow would be, though 🤔
I'm not sure what you mean by "entity", but honestly anything works. We're already monkey-patching our way 😄
I'm not entirely sure I understand the flow but I'll give it a go. I have two final questions:
1. This seems to only work for a single file (weights_path implies a single file, not multiple ones). Is that the case?
2. Why do you see this as preferred to the dataset method we have now? 🤔
Yup, latest version of ClearML SDK, and we're deployed on AWS using K8s helm
Because setting env vars and ensuring they exist on the remote machine during execution etc is more complicated 😁
There are always ways around, I was just wondering what is the expected flow 🙂
Not that I recall
Last but not least - can I cancel the offline zip creation if I'm not interested in it 🤔
EDIT: I see not, guess one has to patch ZipFile
...
Thanks! That's what I thought, but then I get:
2021-12-21 22:08:35,376 - clearml.storage - ERROR - Failed uploading: Parameter validation failed: Invalid bucket name "": Bucket name must match the regex "^[a-zA-Z0-9.\-_]{1,255}$" or be an ARN matching the regex "^arn:(aws).*:(s3|s3-object-lambda):[a-z\-0-9]*:[0-9]{12}:accesspoint[/:][a-zA-Z0-9\-.]{1,63}$|^arn:(aws).*:s3-outposts:[a-z\-0-9]+:[0-9]{12}:outpost[/:][a-zA-Z0-9\-]{1,63}[/:]accesspoint[/:][a-zA-Z0-9\-]{1,63}$"
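For reference, a quick way to see why the empty bucket name in that error is rejected. This is only a minimal sketch: it checks the plain bucket-name pattern quoted in the message and ignores the ARN alternatives, and `is_valid_bucket_name` is a hypothetical helper name, not part of ClearML or boto.

```python
import re

# Plain bucket-name pattern copied from the error message above.
BUCKET_NAME_RE = re.compile(r"^[a-zA-Z0-9.\-_]{1,255}$")


def is_valid_bucket_name(name: str) -> bool:
    """Return True if `name` matches the plain bucket-name pattern."""
    return BUCKET_NAME_RE.match(name) is not None
```

The `{1,255}` quantifier requires at least one character, so an empty string (which is what ended up as the bucket name here) can never match.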
Is it `CLEARML_CONFIG_FILE`? (I had to dig this from the GH code 😅 )
Sure SuccessfulKoala55 , and thanks for looking into it.
As an alternative (for now, or in general), we could consider reverting back to pip. The issue we encounter is that we have a monorepo, so frozen requirements should specify relative paths, but `pip freeze` does not seem to do that, so ClearML also fails in `pip` mode.
I think I may have brought this up multiple times in different ways :D
When dealing with long and complicated configurations (whether config objects, yaml, or otherwise), it's often useful to break them down into relevant chunks (think hydra, maybe).
In our case, we have a custom YAML instruction `!include`, i.e.
```
# foo.yaml
bar: baz

# bar.yaml
obj: !include foo.yaml
maybe_another_obj: !include foo.yaml
```
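For anyone unfamiliar with the idea, a custom `!include` tag like this can be sketched with PyYAML roughly as follows. This is not our actual implementation; it assumes PyYAML is installed, and `IncludeLoader`, `construct_include` and `load_with_includes` are illustrative names only.

```python
import os

import yaml  # PyYAML, assumed to be installed


class IncludeLoader(yaml.SafeLoader):
    """SafeLoader that remembers the directory of the file it is reading."""

    def __init__(self, stream):
        # Resolve includes relative to the including file; fall back to the
        # current working directory if the stream has no usable name.
        self._root = os.path.dirname(getattr(stream, "name", "")) or os.getcwd()
        super().__init__(stream)


def construct_include(loader, node):
    """Replace `!include path` with the parsed content of that file."""
    path = os.path.join(loader._root, loader.construct_scalar(node))
    with open(path, "r") as f:
        return yaml.load(f, Loader=IncludeLoader)


# Register the tag on the custom loader only, not on yaml.SafeLoader itself.
IncludeLoader.add_constructor("!include", construct_include)


def load_with_includes(path):
    """Load a YAML file, expanding any `!include` tags along the way."""
    with open(path, "r") as f:
        return yaml.load(f, Loader=IncludeLoader)
```

With the two files above, `load_with_includes("bar.yaml")` would give a dict where both `obj` and `maybe_another_obj` hold the parsed content of `foo.yaml`.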
That's fine for the current use-case I believe.
Once the team is happy with the logging functionality, we'll move on to remote execution and things will update.
I also tried adding `agent.package_manager.system_site_packages = true` to ensure these virtual environments have access btw, still to no avail
But it is strictly that if condition in Task.init, see the issue I opened about it
Oh and clearml-agent==1.1.2
My current workaround is to use `poetry` and tell users to delete `poetry.lock` if they want their environment copied verbatim
That's what I thought too, it should only look for the `CLEARML_TASK_ID` environment variable?
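To make the question concrete, this is roughly the kind of check I had in mind. It is purely a sketch of my assumption (that the presence of `CLEARML_TASK_ID` is the only signal), not ClearML's actual logic, and `has_task_id` is a made-up helper name.

```python
import os


def has_task_id(environ=None) -> bool:
    """Return True if CLEARML_TASK_ID is set to a non-empty value.

    Assumption: this env var alone decides the behaviour; the real
    SDK may look at more than this.
    """
    env = os.environ if environ is None else environ
    return bool(env.get("CLEARML_TASK_ID"))
```

Passing a plain dict as `environ` makes it easy to try the check without touching the real environment.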
What's new in 1.1.6rc0?
For now this is okay - no data lost, really - but I'd like to make sure we're not missing any steps in the next upgrade