LudicrousParrot69,
Are you trying to, post execution, parse the attached Table and then put it into a CSV on the HPO Task?
but then an error message pops up in the web-app:
Fetch parents failed
and the Scheduler task disappears
And the Task is still running? What's the clearml python version and webui version?
They could, but the problem is that by the time you set them, they have already been read into the variables.
Maybe we should make it lazy-loaded; that will also speed up the import.
Yeah, the ultimate goal I'm trying to achieve is to flexibly run tasks. For example, before running, a task could declare how many resources it needs, and the agent will run it as soon as it finds there are enough resources.
Check out Task.execute_remotely() - you can put it anywhere in your code. When execution gets to it and you are running without an agent, it will stop the process and re-enqueue the Task to be executed remotely; on the remote machine the call itself becomes a noop.
I...
We use an empty queue to enqueue our tasks in, just to trigger the scheduler
Its only importance is that the experiment is not enqueued anywhere else; the trigger then enqueues it.
It's just that the trigger is never triggered
(Except when a new task is created - this was not the case)
Is the trigger controller running on the services queue ?
Apologies on the typo ;)
There is also a global "running_remotely" but it's not on the task
This is not an S3 endpoint... what is the files server you configured for it?
Hmmm, that actually connects with something we were thinking about: introducing sections to the hyper parameters. This way we could easily differentiate between the command line arguments and other types of parameters. DilapidatedDucks58 what do you think?
is it a shared network mount ? could you just delete the entire ~/.clearml on the host machine ?
Notice the order here:
```python
Task.add_requirements("tensorflow")
task = Task.init(...)
```
Hi CluelessElephant89
I'm thinking that different users might want to comment on results of an experiment and stuff. I'm sure these things can be done externally on a github thread attached to the experiment
Interesting! Like a "comment section" on top of a Task?
Or should it be a project ?
Basically I have this intuition that Task granularity might be too small (I would want to talk about multiple experiments, not a single one?) and a project might be too generic?
wdyt?
btw: The addr...
WickedGoat98 is this related to plotly opening a web page when you call show() method ?
You can do:
```python
if not Task.running_locally():
    fig.show()
```
Hi VexedCat68
(sorry I just saw the message)
I wanted to ask, how to run pipeline steps conditionally? E.g if step returns a specific value, exit the pipeline or run another step instead of the sequential step
So to do so you can do:
```python
def pre_execute_callback_example(a_pipeline, a_node, current_param_override):
    # if we want to skip this node (and the subtree of this node) we return False
    ...
    # we decided to skip, so we return False
    return False

pipe.add_step(name='...
```
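A hedged, ClearML-free sketch of the decision such a callback might encode (the function name, metric, and threshold here are all hypothetical, purely for illustration):

```python
# Hypothetical skip logic: return False to skip the node (and its
# subtree), True to let the pipeline continue with this step.
def should_run_node(upstream_metric, threshold=0.9):
    # e.g. only run the expensive follow-up step when the upstream
    # step's reported metric clears an assumed threshold
    return upstream_metric >= threshold

print(should_run_node(0.95))  # step runs
print(should_run_node(0.42))  # step (and its subtree) is skipped
```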
SweetGiraffe8
That might be it, could you test with the Demo server ?
Ohh StraightCoral86 did you check clearml-task ? This is exactly what it does
(this is the CLI, from code you basically call Task.create & Task.enqueue)
Will this solve it ?
We could use our 8xA100 as 8 workers, for 8 single-gpu jobs running faster than on a single 1xV100 each.
SolidGoose91 I think that in order to have the flexibility there you need the "dynamic" GPU allocation that is only part of the "enterprise" offering
That said, why not allocate a single a100 machine? no?
Hm GiganticTurtle0 let me quickly check it
Ohh try to add --full-monitoring to the clearml-agent execute
Please go ahead with the PR
okay this points to an issue with the k8s glue, I think it somehow failed to launch the pod. Can you send me the log of the clearml-k8s-glue ?
Ohh if this is the case, and this is a stream of constant inference results, then yes, you should push it to some stream-supported DB.
Simple SQL tables would work, but for actual scale I would push into a Kafka stream, then pull it (serially) somewhere else and push into a DB
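A minimal stdlib sketch of that pattern, with `queue.Queue` standing in for the Kafka stream and in-memory `sqlite3` standing in for the DB (field names are illustrative, not a real schema):

```python
import queue
import sqlite3

# stand-in for the Kafka topic holding inference results
stream = queue.Queue()

# producer side: push each inference result into the stream
for i in range(3):
    stream.put({"request_id": i, "score": 0.1 * i})

# consumer side: pull serially and persist into the DB
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE results (request_id INTEGER, score REAL)")
while not stream.empty():
    r = stream.get()
    db.execute("INSERT INTO results VALUES (?, ?)",
               (r["request_id"], r["score"]))
db.commit()

count = db.execute("SELECT COUNT(*) FROM results").fetchone()[0]
```

The point of the serial consumer is that the DB only ever sees one writer, so the stream absorbs bursts while the DB ingests at its own pace.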
CharmingBeetle38 try adding "General/" before the arguments. This means batch_size becomes General/batch_size. This is only because we are accessing the parameters externally, when the task is executed it is resolved automatically
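Purely to illustrate the namespacing (this dict comprehension is my own sketch, not ClearML code): externally, each argument name is addressed with its section as a prefix.

```python
# hypothetical argument values, as the script would see them
args = {"batch_size": 32, "epochs": 10}

# externally (e.g. from an HPO controller) the same parameters are
# referred to by "<section>/<name>", here assumed to be "General"
namespaced = {f"General/{name}": value for name, value in args.items()}

print(namespaced["General/batch_size"])
```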
If the choice is between skipping the value or logging it as NaN, I find it a difficult call; it seems better to log than to skip, but it needs some thought.
So I "think" the issue is plotly (UI), doesn't like NaN and also elastic (storing the scalar) is not a NaN fan. We need to check if they both agree on the representation, that it should be easy to fix...
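A small stdlib sketch of why the two need to agree: strict JSON (which JSON-based stores like Elasticsearch expect) has no NaN literal. The null workaround at the end is just one assumed representation, not what ClearML actually does.

```python
import json
import math

value = float("nan")

# serializing a NaN scalar as strict JSON fails outright
try:
    json.dumps({"scalar": value}, allow_nan=False)
    nan_serializable = True
except ValueError:
    nan_serializable = False

# one possible shared representation: encode NaN as null, so the UI
# and the storage layer agree on what they receive
safe = {"scalar": None if math.isnan(value) else value}
roundtrip = json.loads(json.dumps(safe))
```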
Maybe you could open a github issue, so we do not forget?
It seems that I solved the problem by moving all of the local code (local repos) imports to after the Task.init
PunyPigeon71 I'm confused, how did that solve the issue on the remote machine?
DeterminedToad86 were you running a jupyter notebook or a jupyter console ?
Actually scikit implies joblib (so you should use scikit; anyhow I'll make sure we add joblib as it is more explicit)
What we would like ideally, is a system where development, training, and deployment are almost one and the same thing, to reduce the lead time from development code to production models.
This is very aligned with the goals of ClearML
I would like to understand more about what is currently missing in ClearML so we can better support this approach
my inexperience in using them a lot until recently. I can see how that is a better solution
I think I failed in explaining myself, I me...