Task.current_task().get_logger().flush(wait=True)  # <-- WILL HANG HERE
Okay, a bit of theoretical "how it actually works" (and I might be mistaken here...)
Console logging is being reported because the underlying DDP infra (gloo) is piping stdout to the main process, where clearml will catch it (I think). The scalars not working on the subprocess and the flush wait getting stuck are, I think, related: the wait actually waits for the flush process, and it seems it cannot actually "talk" to i...
...every user in the server has the same credentials, and they don't need to know them... makes sense?
Makes sense: a single set of credentials for everyone, without the need to distribute them
Is that correct?
Hi JitteryCoyote63
Somehow I thought it was solved
1) Yes, please add a GitHub issue so we can keep track
2)
Task.current_task().get_logger().flush(wait=True). # <-- WILL HANG HERE
Is this the main issue ?
JitteryCoyote63 How can I reproduce it quickly?
Thanks JitteryCoyote63 , once we have a reproducible example the fix should be very quick to push (with these things reproducing it is the challenge)
Amazing!
Let me know how we can help
VexedCat68 actually a few users already suggested we auto log the dataset ID used as an additional configuration section, wdyt?
Yes, I mean use the helm chart to deploy the server, but manually deploy the agent glue.
wdyt?
I think it was just pushed. To include nested calls you have to use the new argument for the decorator, helper_functions
https://github.com/allegroai/clearml/blob/400c6ec103d9f2193694c54d7491bb1a74bbe8e8/clearml/automation/controller.py#L2392
Thanks GiganticTurtle0 !
I will try to reproduce with the example you provided. Regardless, I already took a look at the code, and I'm pretty sure I know what the issue is. We will be pushing a few fixes after the weekend; I'm hoping this one will be included as well
Do you think ClearML is a strong option for running event-based training and batch inference jobs in production?
(I'm assuming that by event-based you mean triggered by events, not streaming data, i.e. ETL etc.)
I know of at least a few large organizations doing that as we speak, so I cannot see any reason not to.
That'd include monitoring and alerting. I'm afraid that Metaflow will look far more compelling to our teams for that reason.
Sure, then use Metaflow. The main issue with Metaflow...
Hi WittyOwl57
I'm guessing clearml is trying to unify the histograms for each iteration, but the result is in this case not useful.
I think you are correct, the TB histograms are actually 3D histograms (i.e. 2D histograms over time, which would be the default for kernel/bias etc.)
is there a way to ungroup the result by iteration, and, is it possible to group it by something else (e.g. the tags of the two plots displayed below side by side).
Can you provide a toy example...
GiganticTurtle0 you mean the repo for the function itself ?
The default assumes the function is "standalone"; you can specify a repo with: @PipelineDecorator.component(..., repo='.')
This will take the current folder's repo (i.e. the local one).
You can also specify the repo url/commit etc. (repo='https://github/user/repo/repo.git' ....)
See here:
https://github.com/allegroai/clearml/blob/dd3d4cec948c9f6583a0b69b05043fd60d8c103a/clearml/automation/controller.py#L1931
Hi VexedCat68
(sorry I just saw the message)
I wanted to ask, how to run pipeline steps conditionally? E.g if step returns a specific value, exit the pipeline or run another step instead of the sequential step
To do so you can do:
def pre_execute_callback_example(a_pipeline, a_node, current_param_override):
    # if we want to skip this node (and subtree of this node) we return False
    ...
    # we decided to skip so we return False
    return False
pipe.add_step(name='...
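A minimal sketch of how such a callback could be wired in, assuming the pre_execute_callback argument of PipelineController.add_step. The parameter key "General/run_step" and the project, task and step names are hypothetical, chosen only for illustration:

```python
def skip_when_disabled(a_pipeline, a_node, current_param_override):
    """Return False to skip this node (and its subtree), True to run it."""
    # "General/run_step" is a hypothetical parameter name, not a ClearML built-in
    flag = str(current_param_override.get("General/run_step", "true"))
    return flag.lower() != "false"


def build_pipeline():
    # Imported lazily so the callback above can be exercised without clearml installed
    from clearml import PipelineController

    pipe = PipelineController(name="conditional demo", project="examples", version="1.0.0")
    pipe.add_step(
        name="stage_train",            # hypothetical step name
        base_task_project="examples",  # hypothetical project
        base_task_name="train",        # hypothetical template task
        pre_execute_callback=skip_when_disabled,
    )
    return pipe
```

The decision logic lives in a plain function, so it can be checked on its own before it is attached to a real pipeline.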
HappyDove3
see here https://github.com/allegroai/clearml-pycharm-plugin
Hi GorgeousPuppy74
- Could you copy the 3 messages here into your original message? It helps keep things tidy (open the 3-dot menu and select edit)
- What do you mean by "currently its not executing in queue-01"? You changed it so it should be pushed to queue-02, no? Also notice that you can run the entire pipeline as sub-processes for debugging,
just call pipe.start_locally(run_pipeline_steps_locally=True)
You also need an agent on the ser...
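As a rough sketch of the debugging suggestion above: one small helper can switch between running everything as local sub-processes and enqueuing the pipeline for an agent. The helper name launch and its debug flag are assumptions for illustration; pipe is expected to be a PipelineController:

```python
def launch(pipe, debug=False, queue="services"):
    """Run the pipeline locally for debugging, or enqueue it for an agent."""
    if debug:
        # Controller and all steps run as sub-processes on this machine
        pipe.start_locally(run_pipeline_steps_locally=True)
        return "local"
    # The controller is enqueued; an agent must be listening on this queue
    pipe.start(queue=queue)
    return "remote"
```

Usage would be launch(pipe, debug=True) while iterating, then launch(pipe) once the steps behave as expected.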
will my datasets be stored on the same machine that hosts the clearml server?
By default yes, they will be stored to the files-server (but you can change it, this is an argument for both the CLI and the python interface)
Hi BitterStarfish58
Where are you uploading it to?
Hmm BitterStarfish58 what's the error you are getting ?
Any chance you are over the free tier quota ?
Since the error says network error, is it possible because I'm in Taiwan? Like downloading from Asia leads to this kind of issue
Can you download it from the browser ? (I mean the file size after download , is it 400mb?)
BitterStarfish58 I would suspect the upload was corrupted (I think this is the discrepancy between the file size logged and the actual file size uploaded)
It might be the file upload was broken?
You can do that programmatically: clone the pipeline Task (a pipeline is also a Task) and change the Args section of that Task, wdyt?
Example:
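A hedged sketch of that approach, assuming the pipeline parameters live under the "Args" section as described above. The helper names, the "(clone)" suffix and the default queue name are illustrative assumptions:

```python
def as_args_overrides(overrides):
    """Prefix plain parameter names with the "Args/" section name."""
    return {
        key if key.startswith("Args/") else f"Args/{key}": value
        for key, value in overrides.items()
    }


def clone_and_launch(pipeline_task_id, overrides, queue_name="services"):
    # Imported lazily so the helper above can be used without clearml installed
    from clearml import Task

    source = Task.get_task(task_id=pipeline_task_id)
    cloned = Task.clone(source_task=source, name=f"{source.name} (clone)")
    # Assumption: update_parameters changes only the keys passed in
    cloned.update_parameters(as_args_overrides(overrides))
    Task.enqueue(cloned, queue_name=queue_name)
    return cloned.id
```

With many parameters, the overrides can be kept in a single dict (or loaded from a file) and passed in one call instead of being edited one by one in the UI.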
I have a lot of parameters, about 40. It is inconvenient to overwrite them all from the window that is on the screen.
Not sure I follow, so what are you suggesting?
I'm not sure the files-server supports "continue" from last position...
ldconfig from /etc/profile which is put there by the interactive_session_task
LackadaisicalOtter14 are you sure? Maybe this is done as part of the installation the interactive session runs?
Could that be the issue? apt-get update && apt-get install -y openssh-server
Hi LackadaisicalOtter14
However, whenever we spin up a session,
always gets run and overwrites our configs
what do you mean by that?
Which configs are being overwritten? (Generally speaking, it just adds the OS environment it needs for the setup process)