Hi WorriedParrot51, what do you mean by "call get_parameters_as_dict() from agent"?
Do you mean like change the trains-agent to run the task differently?
Or inside your code while the trains agent runs it?
From the code itself (regardless of how you run it) you can always call task.get_parameters_as_dict() and get the current state of the parameters (i.e. from the backend if running with trains-agent, or copied from the code if running manually).
(i.e. importing the trains package is enough to patch the argparser; until then the arguments are only stored in memory, and they are logged only when you call Task.init)
Oh, did you try task.connect_configuration?
https://allegro.ai/docs/examples/reporting/model_config/#using-a-configuration-file
Hmm, I think it is this line:
WARNING - Model configuration only supports dictionary or string objects
done
Let me check something.
Hi @<1549202366266347520:profile|GorgeousMonkey78>
how do I integrate sagemaker with clearml ,
you mean to launch an experiment, or just to log it?
Hi @<1654294828365647872:profile|GorgeousShrimp11>
can you run a pipeline on a schedule or are schedules only for Tasks?
I think one tiny detail got lost here: Pipelines (the logic driving them) are a type of Task, which means you can clone and enqueue them like other Tasks
(Task.enqueue / Task.clone)
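A minimal sketch of that clone-and-enqueue flow, assuming the clearml package is installed and configured. The helper name, the "(scheduled run)" suffix, and the default queue name "services" are illustrative assumptions, not something taken from this thread:

```python
def rerun_pipeline(pipeline_task_id, queue_name="services"):
    """Clone an existing pipeline controller Task and enqueue the copy."""
    # Imported lazily so that defining this helper does not require clearml.
    from clearml import Task

    # A pipeline controller is itself a Task, so it can be fetched by ID.
    source = Task.get_task(task_id=pipeline_task_id)
    # The clone is a new draft Task with the same configuration.
    cloned = Task.clone(source_task=source, name=source.name + " (scheduled run)")
    # Enqueueing hands the clone to whichever agent serves the queue.
    Task.enqueue(task=cloned, queue_name=queue_name)
    return cloned.id
```

Calling this helper from any external scheduler (cron, for example) would be one way to get a pipeline running on a schedule.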
Other than that looks good to me, did I miss anything ?
BTW: if you feel like pushing forward with integration I'll be more than happy to help PRing new capabilities, even before the "official" release
and when you remove the "." line does it work?
But the git apply failed, the error message is the "xxx already exists in working directory" (xxx is the name of the untracked file)
DefeatedOstrich93 what's the clearml-agent version?
Hi SkinnyPanda43
No idea what the ImageId actually is.
That's the ami image string that the new EC2 will be started with, make sense ?
I think we should open a GitHub Issue and get some more feedback, maybe we should just add support in the backend side ?
😞 DilapidatedDucks58 how exactly are you "relaunching/continue" the execution? And what exactly are you setting?
DilapidatedDucks58 by default, if you continue the execution, it will automatically continue reporting from the last iteration. I think this is what you are seeing
Lol, :)
I think the issue is that you do not need to manually set the initial iteration; it is supposed to pick it up automatically, as it is stored on the Task itself
Hi @<1547028074090991616:profile|ShaggySwan64>
. If I have a local repo cloned with ssh, the agent will attempt to replace the repo url with https,
Yes, if you provide git user/pass (or user / app-pass) the agent will automatically replace an ssh:// repo link with the equivalent https:// and use the user/pass for authentication
but it seems that it doesn't remove the 2222 port in my case. That leads to
Hmm... what's the clearml-agent version? if this is not the latest 2.0.0r...
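To make the expected behaviour concrete, here is a rough sketch of the ssh:// to https:// rewrite, including dropping the SSH port. This is not the agent's actual implementation; the function name and the example host are hypothetical, and only the standard library is used:

```python
from urllib.parse import urlsplit, urlunsplit


def ssh_to_https(repo_url):
    """Convert an ssh:// git URL to https://, dropping the SSH user and port."""
    parts = urlsplit(repo_url)
    if parts.scheme != "ssh":
        return repo_url  # leave https:// and other forms untouched
    # parts.hostname excludes both the "git@" user and the ":2222" port.
    return urlunsplit(("https", parts.hostname, parts.path, parts.query, parts.fragment))


print(ssh_to_https("ssh://git@example.com:2222/team/repo.git"))
```

Note that this sketch only handles the explicit ssh:// form; scp-style links such as git@host:team/repo.git are returned unchanged.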
Thanks @<1527459125401751552:profile|CloudyArcticwolf80> ! let me see if we can reproduce it
sorry that I keep bothering you, I love ClearML and try to promote it whenever I can, but this thing is a real pain in the ass
No worries I totally feel you.
As a quick hack in the actual code of the Task itself, is it reasonable to have:
task = Task.init(....)
task.set_initial_iteration(0)
Yep it should :)
I assume you add the previous iteration somewhere else, and this is the cause for the issue?
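A toy model of why the iterations can end up doubled. It assumes the reported iteration is simply the stored offset plus the step the code passes, which is a simplification for illustration and not ClearML's actual code; all names and numbers are hypothetical:

```python
def reported_iteration(initial_iteration, step):
    """Toy model: the reported iteration is the offset plus the step passed by the code."""
    return initial_iteration + step


previous_last = 100  # hypothetical last iteration of the earlier run

# The code restarts its counter at 0 and relies on the automatic offset.
auto_offset = [reported_iteration(previous_last, s) for s in range(3)]

# The code already resumes its own counter at 100, and the offset is added again.
double_offset = [reported_iteration(previous_last, previous_last + s) for s in range(3)]

# Same resuming code, but with the offset reset to 0 as in the quick hack.
reset_offset = [reported_iteration(0, previous_last + s) for s in range(3)]

print(auto_offset, double_offset, reset_offset)
```

In this model, resetting the offset to 0 only makes sense when the code itself resumes its counter from the previous iteration.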
Can you send the full log as attachment?
actually no it is not, alpine is not a good baseline, it is very very slim and is missing a ton of stuff.
I would use bullseye or slim (depending how many aux things you need on the container)
https://hub.docker.com/_/python/tags?page=1&name=bullseye
https://hub.docker.com/_/python/tags?page=1&name=slim-bullseye
Hi DilapidatedDucks58
apologies, this thread slipped away.
I double checked, the server will not allow you to overwrite it (meaning that fixing it will require releasing a server version, which usually takes longer)
That said maybe we can pass an argument to the "Task.init" so it ignores it? wdyt?
Thank you DilapidatedDucks58 for the ping!
totally slipped my mind 😞
I see,
@<1571308003204796416:profile|HollowPeacock58> can you please send the full log?
(The odd thing is it is trying to install the python 3.10 version of torch, when your command line suggests it is running python 3.8)
Can you do the following
Clone the Task you previously sent me the installed packages of, then enqueue the cloned task to the queue served by the agent that uses conda.
Then send me the full log of the task that the agent ran
Yes, I think you are correct, verified on Firefox & Chrome. I'll make sure to pass it along.
Thanks SteadyFox10 !
As I suspected, from your log:
agent.package_manager.system_site_packages = false
Which is exactly the problem of the missing tensorflow (basically it creates a new venv inside the docker, but without the flag on, it does not inherit the docker's preinstalled packages)
This flag should have been true.
Could it be that the clearml.conf you are providing for the glue includes this value?
(basically you should only have the sections that are either credentials or missing from the default, there...
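The effect of that flag can be illustrated with Python's own venv module, which has an option of the same name. This is only an analogy, on the assumption that the agent setting behaves like the standard "system site packages" switch; the helper name is hypothetical:

```python
import tempfile
import venv
from pathlib import Path


def venv_inherits_system_packages(system_site_packages):
    """Create a throwaway venv and report what its pyvenv.cfg says about inheritance."""
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = Path(tmp) / "env"
        # with_pip=False keeps creation fast; only the config file matters here.
        builder = venv.EnvBuilder(system_site_packages=system_site_packages, with_pip=False)
        builder.create(env_dir)
        for line in (env_dir / "pyvenv.cfg").read_text().splitlines():
            key, _, value = line.partition("=")
            if key.strip() == "include-system-site-packages":
                return value.strip() == "true"
    return None


print(venv_inherits_system_packages(True), venv_inherits_system_packages(False))
```

With the flag off, the new environment does not see packages installed outside it, which matches the missing tensorflow symptom described above.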
(you can find it in the pipeline component page)
Was trying to figure out how the method knows that the docker image ID belongs to ECR. Do you have any insight into that?
Basically you should have the docker service login before running the agent, then the agent uses docker to run the image from the ECR.
Make sense ?
So the "packages" are the packages you need in the steps themselves ?
is "my_package" a local package ?
what is the output of: pip freeze | grep my_package
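If running pip in a shell is inconvenient, roughly the same check can be done from Python itself. This is a small sketch using only the standard library (Python 3.8+); the helper name is hypothetical and "my_package" is the placeholder name from the question:

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


print(installed_version("my_package"))
```

A result of None would suggest the package is not installed as a distribution in that environment, which is usually the case for a local folder that is only importable via the working directory.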