You can read up on the caching options in your ~/clearml.conf
You can make virtualenv creation a bit faster
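For reference, the virtualenv caching knobs live in the `agent` section of `~/clearml.conf`. A rough sketch of what that section looks like (the values here are illustrative, not recommendations):

```
agent {
    # cache fully-installed virtualenvs so later tasks can reuse them
    venvs_cache: {
        max_entries: 10
        free_space_threshold_gb: 2.0
        # setting `path` is what enables the cache
        path: ~/.clearml/venvs-cache
    }
}
```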
When the agent starts running a task it will print out where the logs are being saved
Hi ShallowCormorant89 ,
When does 1. happen? Can you add the full log?
Regarding 2, can you please elaborate? What is your pipeline doing and what sort of configurations would you want to add? On the pipeline controller level or steps?
Hi @<1523706700006166528:profile|DizzyHippopotamus13> , you can simply do it in the experiments dashboard in table view. You can rearrange columns and add custom columns according to metrics and hyperparameters. And of course you can sort the columns
Hi StraightParrot3 , `page_size` is indeed limited to 500 from my understanding. You need to scroll through the tasks. The first `tasks.get_all` response will return a `scroll_id`; you need to use this `scroll_id` in your following call. Every call afterwards will return a different `scroll_id`, which you will always need to use in your next call to continue scrolling through the tasks. Makes sense?
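A minimal sketch of that scrolling loop in pure Python. `fetch_page` stands in for your actual `tasks.get_all` call (however you issue it - APIClient or raw REST); the exact response shape here is an assumption, so mirror what your server actually returns:

```python
def scroll_all_tasks(fetch_page):
    """Collect every task by following scroll_id across pages.

    fetch_page(scroll_id) must return a dict shaped like a
    tasks.get_all response: {"tasks": [...], "scroll_id": "..."}.
    Pass scroll_id=None for the first call.
    """
    tasks, scroll_id = [], None
    while True:
        page = fetch_page(scroll_id)
        batch = page.get("tasks", [])
        if not batch:  # empty page means we've scrolled past the end
            break
        tasks.extend(batch)
        scroll_id = page.get("scroll_id")  # feed this into the next call
    return tasks

# Example with a stand-in fetcher (two pages of results, then an empty page):
_pages = iter([
    {"tasks": ["t1", "t2"], "scroll_id": "s1"},
    {"tasks": ["t3"], "scroll_id": "s2"},
    {"tasks": [], "scroll_id": "s2"},
])
all_tasks = scroll_all_tasks(lambda scroll_id: next(_pages))
# all_tasks == ["t1", "t2", "t3"]
```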
Hi,
can you paste the error you're getting?
What do you mean? How are you running the pipeline - locally or remotely?
Hi @<1566959349153140736:profile|ShinyChicken29> , when you try to access the image in your browser, the browser tries to access the S3 bucket directly - this is why you get the popup. Data never goes through the ClearML backend. Makes sense?
Hi @<1533619725983027200:profile|BattyHedgehong22> , I think it needs to be part of the repository
Hi @<1541954607595393024:profile|BattyCrocodile47> , how does ClearML react when you run the scripts this way? The repository is logged as usual?
For Enterprise related questions please use your dedicated Slack channel. You need to provide an extra index url, instructions are given in the ClearML Python Package Setup as shown in the screenshot:
How are you currently setting it up?
UnevenDolphin73 , can you verify that the process is not running on the machine? For example with `htop` or `top`
Hi @<1654294828365647872:profile|GorgeousShrimp11> , it appears the issue is due to running with different Python versions. It looks like the Python environment you're running the agent with doesn't have virtualenv installed
Is something failing? I think that's the suggested method
Also, what if you try using only one GPU with pytorch-lightning? Still nothing is reported - i.e. console/scalars?
GrittyKangaroo27 , I see no special reason why not, as long as you set the credentials correctly 🙂
Have you tried?
What is the address of your server?
Hi @<1523721697604145152:profile|YummyWhale40> , what if you specify the `output_uri` through the code in `Task.init()` ?
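As a side note, the same default can also be set in `~/clearml.conf` so it applies to every run without touching the code (the bucket path below is a placeholder):

```
sdk {
    development {
        # used by any Task.init() that doesn't pass output_uri explicitly
        default_output_uri: "s3://my-bucket/clearml"
    }
}
```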
MinuteGiraffe30 , Hi ! 🙂
What if you try to manually create such a folder?
Aren't you getting logs from the Docker container via ClearML? I think you can build that capability fairly easily with ClearML, maybe add a PR?
Hi @<1635088270469632000:profile|LividReindeer58> , you should do a separation. The pipeline controller should run on the services queue. Pipeline steps should run on different queues. This is why they are sitting in pending - there is no free worker to pick them up.
Hi 🙂
A task is the most basic object in the system in regards to experiments. A pipeline is a bunch of tasks that are controlled by another task 🙂
Note that you used an env variable, I want to try the config directly first 🙂
You don't need to do any special actions. Simply run your script from within a repository and ClearML will detect the repo + commit + uncommitted changes
That's strange, you don't have a create new credentials button?
@<1523703961872240640:profile|CrookedWalrus33> , you can use the UI as reference. Open dev tools (F12) and see the network (filter by XHR).
For example, in the scalars/plots/debug samples tabs the relevant calls seem to be:
`events.get_task_single_value_metrics`
`events.scalar_metrics_iter_histogram`
`events.get_task_plots`
`events.get_task_metrics`
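As a sketch of turning what you see in the network tab into your own calls: the endpoint names above are real, but the base URL and body fields below are assumptions on my side - copy the exact request body the UI sends and mirror it:

```python
def build_event_request(endpoint, task_id, **extra):
    """Return (url, json_body) for a raw call to the ClearML API server.

    endpoint: one of the events.* names seen in the browser's network tab.
    The base URL here is a hypothetical local apiserver address.
    """
    base = "http://localhost:8008"  # adjust to your deployment
    return f"{base}/{endpoint}", {"task": task_id, **extra}

url, body = build_event_request("events.get_task_plots", "abc123", iters=1)
# url  -> "http://localhost:8008/events.get_task_plots"
# body -> {"task": "abc123", "iters": 1}
```

From there you would POST `body` as JSON to `url` with your usual auth headers, exactly as the UI does.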
Hi @<1576381444509405184:profile|ManiacalLizard2> , it feels like something related to the server's resources or networking, and it's having a hard time retrieving the data from ES. What resources have you allocated for the API server / Elasticsearch?