Hi NervousRabbit2, what version of ClearML server are you running? Also, what clearml version are you using?
Hi DepravedCoyote18 , can you please elaborate a bit on what the current state is now and how you would like it to be?
Hi!
I believe you can stop and resume studies by adding these actions to your script:
Add save points via joblib.dump() and connect them to ClearML via clearml.model.OutputModel.connect()
Then, when you want to start or resume a study, load the latest study file via joblib.load() and connect it to ClearML with clearml.model.InputModel.connect()
This way you can stop your training sessions with the agent and resume them from nearly the same point
I think all the required references are h...
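A rough sketch of the idea (project/model names here are placeholders, and you may need to adapt it to your study setup):
```
import joblib
import optuna
from clearml import Task, OutputModel, InputModel

task = Task.init(project_name="examples", task_name="optuna study")  # hypothetical names

def objective(trial):
    x = trial.suggest_float("x", -10, 10)
    return (x - 2) ** 2

# Resume: load the latest study snapshot that was registered in ClearML (if any)
previous_model_id = None  # e.g. copied from the UI or taken from a previous task
if previous_model_id:
    input_model = InputModel(model_id=previous_model_id)
    input_model.connect(task)                            # register it as this task's input model
    study = joblib.load(input_model.get_local_copy())
else:
    study = optuna.create_study()

study.optimize(objective, n_trials=20)

# Save point: dump the study and register it as an output model of this task
joblib.dump(study, "study.pkl")
output_model = OutputModel(task=task, name="optuna-study")
output_model.update_weights(weights_filename="study.pkl")
```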
No, but I think it would make sense to actually share reports outside of your workspace, similar to experiments. I'd suggest opening a GitHub feature request
Hi @<1649946171692552192:profile|EnchantingDolphin84> , it's not a must but it would be the suggested approach 🙂
Hi SubstantialElk6 ,
Define, prior to running the pipeline, which tasks should run on which remote queue using which images?
What type of pipeline steps are you running? From task, decorator or function?
Make certain tasks in the pipeline run in the same container session, instead of spawning new container sessions? (To improve efficiency)
If they're all running on the same container why not make them the same task and do things in parallel?
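For reference, if these end up as function steps, the queue and docker image can be set per step; a minimal sketch (queue and image names are placeholders):
```
from clearml import PipelineController

pipe = PipelineController(name="demo pipeline", project="examples", version="1.0")

def step_one():
    return 42

# each step can target its own execution queue and docker image
pipe.add_function_step(
    name="step_one",
    function=step_one,
    function_return=["value"],
    execution_queue="gpu_queue",
    docker="nvidia/cuda:11.8.0-runtime-ubuntu22.04",
)

pipe.start(queue="services")  # the controller itself runs from this queue
```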
UnevenDolphin73 , if you're launching the Autoscaler through the apps, you can also add bash init script or additional configs - that's another way to inject env vars 🙂
Hi GorgeousMole24 , I think for this your best option would be using the API to extract this information.
```
from clearml.backend_api.session.client import APIClient
client = APIClient()
```
is the pythonic usage
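For instance, something along these lines could pull the information (the project id, filters and fields here are just an illustration):
```
from clearml.backend_api.session.client import APIClient

client = APIClient()

# e.g. list the latest completed tasks in a project (the project id is a placeholder)
tasks = client.tasks.get_all(
    project=["<project_id>"],
    status=["completed"],
    order_by=["-last_update"],
    page_size=10,
)
for t in tasks:
    print(t.id, t.name, t.status)
```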
I think in this case you can fetch the task object, force it into running mode and then edit whatever you want. Afterwards just mark it completed again.
Note the force parameter
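Roughly along these lines (the task id and the parameter being edited are placeholders):
```
from clearml import Task

task = Task.get_task(task_id="<task_id>")
task.mark_started(force=True)      # force the task back into "running" so it can be edited

# ... edit whatever you need, e.g. parameters
task.set_parameters({"General/batch_size": 64})

task.mark_completed()              # mark it completed again
```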
Hi @<1523703012214706176:profile|GorgeousMole24> , I'm not sure about the exact definition, but I think it's when the script finishes running or the thread that started Task.init() finishes.
Hi GorgeousMole24 , you can certainly compare across different projects.
Simply go to "all projects" and select the two experiments there (you can search for them at the top right to find them easily)
Also, try specifying an iteration when you report
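For example (a small sketch with made-up names):
```
from clearml import Task

task = Task.init(project_name="examples", task_name="report demo")
logger = task.get_logger()

for i in range(10):
    # an explicit iteration keeps the reported points ordered on the scalar plot
    logger.report_scalar(title="loss", series="train", value=1.0 / (i + 1), iteration=i)
```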
Hi BroadSeaturtle49 , can you please elaborate on what the issue is?
Although I'm not sure it's connected
VexedCat68 , can you try accessing it as 192.168.15.118:8080/login first?
Hi FreshParrot56, I'm not sure there is a way to stop it. However, you do need to archive it and then delete it.
@<1590514584836378624:profile|AmiableSeaturtle81> , it's best to open a GitHub issue in that case to follow up on this 🙂
If you shared an experiment to a colleague in a different workspace, can't they just clone it?
Hi WhoppingMole85 , you can actually do that with the logger.
Something along the lines of: Dataset.get_logger().report_table(title="Data Sample", series="First Ten Rows", table_plot=data1[:10])
Does this help?
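A slightly fuller sketch of the same idea (the dataset name/project and csv path are placeholders):
```
import pandas as pd
from clearml import Dataset

data1 = pd.read_csv("data.csv")

dataset = Dataset.create(dataset_name="my dataset", dataset_project="examples")
dataset.add_files("data.csv")

# attach a preview of the first ten rows to the dataset
dataset.get_logger().report_table(
    title="Data Sample", series="First Ten Rows", table_plot=data1[:10]
)

dataset.upload()
dataset.finalize()
```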
Hi @<1603198163143888896:profile|LonelyKangaroo55> , you can change the value of files_server in your clearml.conf to control it as well.
How did you run the original experiment? What version of ClearML are you using?
Hi @<1649221402894536704:profile|AdventurousBee56> , I'm not sure I understand. Can you add the full log and explain step by step what's happening?
SubstantialElk6 , interesting. What metrics are you looking for?
CluelessElephant89 , try checking the Elasticsearch logs from the clearml-elastic container
Yes, you can set everything on the task level, and of course you can also use different docker images for different Python versions
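For example, something like this per task (the image name is a placeholder; the agent needs to run in docker mode):
```
from clearml import Task

task = Task.init(project_name="examples", task_name="py39 job")

# request a specific docker image for this task when it runs on an agent
task.set_base_docker("python:3.9-slim")
task.execute_remotely(queue_name="default")
```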
Please implement in Python the following command: curl <HOST_ADDRESS>/v2.14/debug/ping
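For instance, a minimal sketch using requests (assuming the debug/ping endpoint needs no credentials; keep <HOST_ADDRESS> as your api server address):
```
import requests

# python equivalent of: curl <HOST_ADDRESS>/v2.14/debug/ping
response = requests.get("http://<HOST_ADDRESS>/v2.14/debug/ping")
print(response.status_code)
print(response.text)
```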
Hi SubstantialElk6 , I think you need to have Task.init() inside these subprocesses as well.
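A rough sketch of what I mean (project/task names are made up):
```
from multiprocessing import Process
from clearml import Task

def worker(idx):
    # each subprocess initializes its own task
    task = Task.init(project_name="examples", task_name=f"subprocess-{idx}")
    task.get_logger().report_scalar("metric", "value", value=idx, iteration=0)
    task.close()

if __name__ == "__main__":
    processes = [Process(target=worker, args=(i,)) for i in range(2)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
```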
Please open developer tools (F12) and see if you're getting any console errors when loading a 'stuck' experiment