GrievingTurkey78 where do you see this message? Can you send the full server log?
load_model will get a link to a previously registered URL (i.e. it searches for a model pointing to that specific URL; if it finds one, it returns the Model object)
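For illustration, a minimal sketch of that lookup-by-URL behavior using InputModel.import_model (the URL and names below are placeholders, not from this thread):

from clearml import InputModel

# If a model pointing at this exact URL was registered before,
# import_model() returns the existing Model object instead of
# registering a new one.
model = InputModel.import_model(weights_url="s3://my-bucket/models/model.pkl")
print(model.id, model.url)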
My pleasure, and apologies 🙂
SubstantialElk6 I just executed it, and everything seems okay on my machine.
Could you pull the latest clearml-agent from the github and try again ?
EDIT:
just try to run:
git clone https://github.com/allegroai/clearml-agent.git
cd clearml-agent
python examples/k8s_glue_example.py
Hmm that is odd, let me see if I can reproduce it.
What's the clearml version you are using ?
This is because we have a pub-sub architecture that we already use; it can handle retries, etc. Also, we will likely want multiple systems to react to notifications in the pub-sub system. We already have a lot of setup for this.
How would you integrate with your current system? Do you have a REST API or similar to trigger events?
but I was hoping ClearML had a straightforward way to somehow represent ALL ClearML events as JSON so we could land them in our system.
Not sure I'm followi...
Epochs are still round numbers ...
Multiply by 2?! 😅
Hi PerfectChicken66
"every X iterations and delete the older ones with delete_artifacts from Task"

I have to ask, why not just overwrite the artifact? It is basically the same, no?!
I think you are correct: when you delete the entire Task you can specify deleting its artifacts, but delete_artifact does not do that 😞
You can manually do that with:
task._delete_uri(task.artifacts["artifact"].url)
task.delete_artifact() ...
It runs into the above error when I clone the task or reset it.
from here:
AssertionError: ERROR: --resume checkpoint does not exist
I assume the "internal" code state changed, and now it is looking for a file that does not exist. How would your code state change? In other words, why would it be looking for the file only when cloning? Could it be you put the state on the Task, then you clone it (i.e. clone the exact same dict), and now the newly cloned Task "thinks" it is resuming?!
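For illustration only, a hypothetical sketch of how state connected to the Task could make a clone "think" it is resuming (the dict and key names are made up, not from your code):

from clearml import Task

task = Task.init(project_name="examples", task_name="resume-demo")
state = {"resume": ""}  # empty on a fresh run
task.connect(state)     # the dict is stored on the Task

# If training later sets state["resume"] to a local checkpoint path,
# a clone of this Task starts from that stored value, so on the new
# machine it tries to --resume from a file that does not exist there.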
Hi JitteryCoyote63
cleanup_service task in the DevOps project: Does it assume that the agent in services mode is in the trains-server machine?
It assumes you have an agent connected to the "services" queue 🙂
That said, it also tries to delete the tasks artifacts/models etc, you can see it here:
https://github.com/allegroai/trains/blob/c234837ce2f0f815d3251cde7917ab733b79d223/examples/services/cleanup/cleanup_service.py#L89
The default configuration will assume you are running i...
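For reference, a minimal sketch of attaching an agent to that queue; it can run on any machine, it does not need to be the trains-server host:

trains-agent daemon --queue services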
they are efs mounts that already exist
Hmm, that might be more complicated to restore, right?
seems it was fixed 🙂
MagnificentWorm7 thank you for noticing ! 🙏
You need trains-server support, so if trains v0.15 is working against an older backend it will revert to the "training" type
Hmm, I still wonder what the "correct" answer is for most people; is an empty string in argparse redundant anyhow? Will anyone ever use it?
Yup, I just wanted to mark it completed, honestly. But then when I run it, Colab crashes.
task.close() will do that
BTW what's the exception you are getting?
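A minimal sketch of that flow (project and task names are placeholders):

from clearml import Task

task = Task.init(project_name="examples", task_name="colab-demo")
# ... notebook work ...
task.close()  # flush everything and mark the Task completed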
Nice!
script, and the kwcoco not imported directly (but from within another package).
FYI: usually the assumption is that clearml will only list the directly imported packages, as these will pull in the respective required packages when the agent installs them ... (meaning that if in the repository you never actually import kwcoco directly, it will not be listed; the package that you do import directly, which you mentioned is importing kwcoco, will be listed). I hope this ...
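If you do want an indirectly-imported package listed explicitly, a possible workaround sketch (package name taken from this thread; must be called before Task.init):

from clearml import Task

Task.add_requirements("kwcoco")  # force-list the package for the agent
task = Task.init(project_name="examples", task_name="indirect-import-demo")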
I am thinking about just installing this manually on the worker ...
If you install them system-wide (i.e. with sudo) and set agent.package_manager.system_site_packages, then they will always be available for you 🙂
And then also use
priority_optional_packages: ["carla"]
This actually means that it will always try to install the carla package first, but if that fails, it will not raise an error.
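For reference, a sketch of where both settings live in the agent's configuration file (clearml.conf, or trains.conf on older setups):

agent {
    package_manager {
        # make system-wide (sudo-installed) packages visible in the venv
        system_site_packages: true
        # try these first; if installation fails, continue without raising
        priority_optional_packages: ["carla"]
    }
}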
BTW: this would be a good use case for dockers, just saying :w...
ReassuredTiger98 I ❤ the DAG in ASCII!!!
port = task_carla_server.get_parameter("General/port")
This looks great! And it will achieve exactly what you are after.
BTW: when you are done you can do:
task_carla_server.mark_aborted(force=True)
and it will shut down the Carla Task 🙂
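Putting the pieces together, a hedged sketch of the whole flow (the Task lookup names are assumptions, not from this thread):

from clearml import Task

# fetch the running Carla server Task (project/name are placeholders)
task_carla_server = Task.get_task(project_name="examples", task_name="carla-server")
port = task_carla_server.get_parameter("General/port")  # read the reported port
# ... run the experiment against the server on that port ...
task_carla_server.mark_aborted(force=True)  # shut the server Task down when done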
current task fetches the good Task
Assuming you fork the process, the "global instance" is passed to the subprocess. Assuming the sub-process was spawned (e.g. Popen), then an environment variable with the Task's unique ID is passed. Then, when you call Task.current_task, it "knows" the Task was already created and it will fetch the state from the clearml-server and create a new Task object for you to work with.
BTW: please use the latest RC (we fixed an issue with exactly this...
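A minimal sketch of the spawn case (file names are placeholders; assumes clearml passes the Task ID to children via an environment variable, as described above):

# parent.py
import subprocess
from clearml import Task

task = Task.init(project_name="examples", task_name="spawn-demo")
subprocess.Popen(["python", "child.py"]).wait()  # child inherits the env

# child.py
from clearml import Task

task = Task.current_task()  # recovers the same Task via the inherited ID
if task:
    task.get_logger().report_text("hello from the spawned child")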
adding the functionality to clearml-task sounds very attractive!
Hmm, what do you think?

parser.add_argument('--configuration', type=str, default=None,
                    help='Specify local configuration file')
parser.add_argument('--configuration-name', type=str, default=None,
                    help='configuration section name')
...
with open(args.configuration, 'rt') as f:
    create_populate.task.set_configuration_object(args.configuration_name, config_text=f.read())

Add h...
LOL EnormousWorm79 you should have a "do not show again" option, no?
P.S. any chance you can get me the NVIDIA driver version? I can't seem to find the one for v22 on Amazon
Hi MagnificentSeaurchin79
This sounds like a deeper bug (of a sort), I think the best approach is to open a GitHub issue with some code that can reproduce this behavior, or at least enough information so that we could try to catch the bug.
This way we will make sure it is not forgotten.
Sounds good?
I was thinking mainly about AWS.
Meaning S3?
or can I directly open a PR?
Open a PR directly and link to this thread; I will make sure it is passed along 🙂