If it cannot find the Task ID, I'm guessing it is trying to connect to the demo server and not your server (i.e. the configuration is missing).
JitteryCoyote63
are the calls from the agents made asynchronously/in a non blocking separate thread?
You mean like request processing on the apiserver are multi-threaded / multi-processed ?
Multi-threaded, multi-process, multi-node :)
Hmm, I see the jump from 50 to 100. Is that consistent with the last iteration on the aborted Task (before continuing)?
Hi SourOx12
How do you set the iteration when you continue the experiment? Is it with the continue_last_task argument of Task.init?
Yes docker was not installed in the machine
Okay, makes sense. We should definitely check that you have docker before starting the daemon :)
Ok, it would be nice to have a --user-folder-mounted that does the linking automatically
It might be misleading if you are running on a k8s cluster, where one cannot just -v mount a volume...
What do you think?
why are all defined components shown in the UI Results/Plots/PipelineDetails/ExecutionDetails section? Shouldn't it make more sense to show only the ones that are used in that pipeline?
They are listed there because of the decorator: you basically "say" these are steps, so they are listed. The actual resolving (i.e. which steps are actually being called) is done in "real-time".
Makes sense?
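To make the "listed at decoration time, resolved at call time" point concrete, here is a toy sketch in plain Python. It is not ClearML code: `component`, `REGISTERED_STEPS` and `EXECUTED_STEPS` are hypothetical names invented for the illustration, and the real PipelineDecorator may well work differently internally.

```python
# Toy model only: every name here is hypothetical, none of it is ClearML's API.
import functools
from typing import Callable, Dict, List

REGISTERED_STEPS: Dict[str, Callable] = {}  # filled when a function is decorated
EXECUTED_STEPS: List[str] = []              # filled when a step is actually called


def component(func: Callable) -> Callable:
    """Register func as a step at decoration time; record it when it is called."""
    REGISTERED_STEPS[func.__name__] = func

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        EXECUTED_STEPS.append(func.__name__)
        return func(*args, **kwargs)

    return wrapper


@component
def load_data():
    return [1, 2, 3]


@component
def train(data):
    return sum(data)


@component
def unused_step():
    return None


def pipeline():
    # Which steps run is only decided here, while the pipeline logic executes.
    return train(load_data())


result = pipeline()
listed = sorted(REGISTERED_STEPS)   # every decorated function, used or not
executed = list(EXECUTED_STEPS)     # only the steps the pipeline really called
```

In this toy, `unused_step` shows up in `listed` but never in `executed`, which is the same kind of gap between "defined components" and "components actually used" discussed above.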
GiganticTurtle0, in the PipelineDecorator.component, did you pass helper_functions=[] with a reference to all the sub-components?
I think that listing them all would just clutter up the results tab for that pipeline task
Can you share a screen so we can better understand the clutter?
Also, "1000 components"?! And not using them? Could you expand on how/why?
UnevenOstrich23
but interesting that auto-reload config is not working as I expected.
Unfortunately the trains-agent does not support auto-reloading the config file yet. If you think this would be a great feature, please feel free to open a GitHub feature request issue :)
Only those components that are imported in the script where the pipeline is defined would be included in the DAG plot, is that right?
Actually, the way it works currently (and we might change it if there is a better way), every time you call PipelineDecorator.component a new component is stored on the Pipeline Task, which is later translated into a DAG graph and table (the next version will have a very nice UI to display / edit them).
The idea is first to have a representation of the p...
Hi JitteryCoyote63
So the main issue is backing up the elastic & mongo DBs while they are running; once they are backed up/restored, the server will spin up as is. (Let me check regarding Redis: it might be that, since it is used for caching, there is no need to actually back up the content, only the configuration.)
Yes, that should work. The only thing is you need to call Task.init on the master process (and make sure you call Task.current_task() on the subprocesses if you want the automagic to kick in). That said, usually there is no need; they are supposed to report everything back to the main one anyhow.
basically
` @call_parse
def main(
    gpus:Param("The GPUs to use for distributed training", str)='all',
    script:Param("Script to run", str, opt=False)='',
    args:Param("Args to pass to script", nargs=...
Regarding the demo app, this is just a default server that allows you to start playing around with ClearML without needing to set up any of your own servers or sign up.
That said, I would recommend signing up (totally free) on the community server
https://app.community.clear.ml/
GrumpyPenguin23 could you help and point us to an overview/getting-started video?
server-->agent is fast, but agent-->server is slow.
Then multiple connections will not help; the bottleneck is the upload speed of your machine, regardless of what the target is (file-server, S3, etc.).
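A back-of-the-envelope sketch of why parallel connections do not help when the uplink itself is the limit. This is a deliberately simplified model (it assumes the connections split one shared uplink evenly and ignores latency and protocol overhead); `upload_seconds` and the numbers are made up for the illustration.

```python
def upload_seconds(size_mb: float, uplink_mb_per_s: float, connections: int = 1) -> float:
    """Toy model: `connections` parallel uploads share one uplink evenly."""
    per_connection_rate = uplink_mb_per_s / connections  # each stream gets a slice
    per_connection_size = size_mb / connections          # each stream sends a slice
    # All streams run in parallel, so total time equals the time of one stream.
    return per_connection_size / per_connection_rate


# Hypothetical numbers: a 1000 MB artifact over a 10 MB/s uplink.
single = upload_seconds(1000, 10, connections=1)
parallel = upload_seconds(1000, 10, connections=8)
```

Under these assumptions `single` and `parallel` come out the same, because the total bytes and the total uplink bandwidth are unchanged.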
I lost you SmallBluewhale13, is this the Task.init call you used:
task = Task.init(project_name="examples", task_name="load_artifacts", output_uri="s3://company-clearml/artifacts/bethan/sales_journeys/")
SourOx12
Hmmm. So if the last iteration was 75, the next iteration (after we continue) will be 150?
Hi FunnyTurkey96
Any chance you can try to run with the latest from GitHub (I just tested your code and it seemed to work on my machine):
pip install git+
Hi SourOx12
I think that you do not actually need this one:
step = step - cfg.start_epoch + 1
you can just do
step += 1
ClearML will take care of the offset itself
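A toy sketch of how such an offset can lead to the jump discussed earlier (75 becoming 150). `ToyLogger` is a hypothetical stand-in written for this illustration, not ClearML's logger, and the assumption that the offset equals the last recorded iteration is mine; the exact value ClearML uses on continue is not stated in this thread.

```python
class ToyLogger:
    """Hypothetical logger that adds a fixed offset to every reported step."""

    def __init__(self, initial_offset: int = 0):
        self.initial_offset = initial_offset
        self.reported = []

    def report(self, step: int) -> int:
        stored = step + self.initial_offset  # offset applied by the logger itself
        self.reported.append(stored)
        return stored


last_iteration = 75  # last iteration recorded before the task was aborted

# Case 1: the script ALSO restores its own counter to 75, so the offset is
# effectively applied twice.
logger = ToyLogger(initial_offset=last_iteration)
double_counted = logger.report(last_iteration)

# Case 2: the script simply counts 1, 2, 3... after continuing and lets the
# logger apply the offset.
fresh = ToyLogger(initial_offset=last_iteration)
continued = [fresh.report(step) for step in (1, 2, 3)]
```

In this model the first case stores 150 and the second continues at 76, 77, 78, which is why counting from the script's own restart point and leaving the offset to the logger avoids the jump.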
By default SSH server is not running in a lot of scenarios (k8s for example, Windows, MacOS)...
Hi StickyWhale51
I think this issue is due to some internal race condition. Anyhow, I think we have an RC out solving it, can you try with:
pip install clearml==1.2.0rc2
Hi JitteryCoyote63
Do you have a specific example in mind ?
Great! btw: final v1.2.0 should be out after the weekend
SharpDove45 FYI:
if you set the environment variable CLEARML_NO_DEFAULT_SERVER=1, it will make sure never to default to the demo server
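If exporting the variable in the shell is not convenient, one option is to set it from Python at the very top of the script. This is a minimal sketch: the variable name comes from the message above, but setting it before the ClearML SDK is imported is my own conservative assumption, since exactly when the SDK reads it is not stated here.

```python
import os

# Set this before importing/initialising the ClearML SDK in this process
# (assumption: the SDK reads the variable when it resolves its server config).
os.environ["CLEARML_NO_DEFAULT_SERVER"] = "1"

# Read it back to confirm the process environment now carries the flag.
no_default_server = os.environ.get("CLEARML_NO_DEFAULT_SERVER")
```

Child processes started from this script inherit the variable as well, since they copy the parent's environment.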
Because of that, I cannot create a task in this project programmatically locally because it tries to access the bucket and fails. And there is no easy way to change the default output location (not in the web UI, not in the sdk)
JitteryCoyote63 hmm that is a pickle ...
let me check the code ...