BTW: 0.14.3 solved the issue you are referring to, so you can import trains before / after parsing the args without an issue. Regarding passing project/name as parameters, a few thoughts: (1) you can always rename / move projects from the UI (2) If you are running it with trains-agent
there is no meaning to these arguments, as by definition the Task was already created... Maybe we should add an option to exclude a few arguments from the argparser, I think this topic came up a few times... What d...
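For reference, something along these lines should work with 0.14.3+ (a minimal sketch, project/task names and the argument below are placeholders):
from argparse import ArgumentParser
from trains import Task

# Create / register the Task first; when executed by trains-agent the
# project/name arguments are effectively ignored, since the Task already exists
task = Task.init(project_name='examples', task_name='argparse example')

parser = ArgumentParser()
parser.add_argument('--batch-size', type=int, default=32)
args = parser.parse_args()  # the parsed arguments are picked up automatically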
Yes, in tandem with the experiments (because they constantly log to the server).
That said, with 0.16 we added offline mode, so you can run in offline mode, then import the experiment into the system.
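Roughly, the offline flow looks like this (a sketch; the zip path below is a placeholder, the offline run prints the actual location at the end):
from clearml import Task

# Offline mode: nothing is sent to the server, everything is stored locally
Task.set_offline(offline_mode=True)
task = Task.init(project_name='examples', task_name='offline run')
# ... training / logging as usual ...
task.close()

# Later, from a machine that can reach the server, import the stored session
Task.import_offline_session('/path/to/task_offline_data.zip')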
You might also be able to find out exactly what needs to be pickled using the f_code of the function (but that's limited to the CPython implementation of Python).
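For example, inspecting a function's code object (plain CPython, just to illustrate; helper and CONST are intentionally left undefined, we only look at what the function references):
def my_func(x):
    return helper(x) + CONST

# co_names lists the global names the function refers to, which hints at
# what would have to be pickled / importable on the other side
print(my_func.__code__.co_names)     # ('helper', 'CONST')
print(my_func.__code__.co_varnames)  # ('x',)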
Nice!
I would like to start off by saying that I absolutely love clearml.
@<1547028031053238272:profile|MassiveGoldfish6> thank you for saying that! 🙂
Is it possible to download individual files from a dataset without downloading the entire dataset? If so, how do you do that?
Well, by default files are packaged into multiple zip files; you can control the size of the zip file for finer granularity, but at the end, when you download, you are downloading the entire packaged ...
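If it helps, a rough sketch of the two knobs mentioned above (treat chunk_size and its units as an assumption, check the Dataset.upload docstring for your clearml version; paths and names are placeholders):
from clearml import Dataset

# Smaller archives give finer download granularity (chunk_size assumed to be in MB)
ds = Dataset.create(dataset_project='examples', dataset_name='my_dataset')
ds.add_files('/path/to/files')
ds.upload(chunk_size=100)
ds.finalize()

# Downloading still fetches the packaged archives, not individual files
local_path = Dataset.get(dataset_project='examples', dataset_name='my_dataset').get_local_copy()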
SmarmySeaurchin8 yes, the package containing the Controller is only RC, plan is to release the stable one in a couple of days. In the meantime:
pip install git+
A few examples here:
None
Grafana model performance example:
browse to
login with: admin/admin
create a new dashboard
select Prometheus as data source
Add a query: 100 * increase(test_model_sklearn:_latency_bucket[1m]) / increase(test_model_sklearn:_latency_sum[1m])
Change type to heatmap, and select on the right-hand side under "Data Format" s...
GrievingTurkey78 can you send the entire log?
task.wait_for_status()
task.reload()
task.artifacts["output"].get()
For artifacts already registered, it simply returns the entry; for artifacts not available locally, it contacts the server to retrieve them.
This is the current state.
Downloading the artifacts is done only when actually calling get()/get_local_copy()
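Putting it together, something like this (the task id and artifact name are placeholders):
from clearml import Task

task = Task.get_task(task_id='<task-id>')
task.wait_for_status()   # wait for the task to finish
task.reload()            # refresh the local copy of the task object

artifact = task.artifacts['output']  # just the registered entry, nothing downloaded yet
obj = artifact.get()                 # downloads (if needed) and deserializes the artifact
path = artifact.get_local_copy()     # downloads and returns a local file path instead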
What you actually specified is torch; the @ is a kind of pip notation, and pip will not actually parse it 🙂
use only the link https://download.pytorch.org/whl/cu100/torch-1.3.1%2Bcu100-cp36-cp36m-linux_x86_64.whl
SmarmySeaurchin8
When running in "dev" mode (i.e. writing the code) only packages imported directly are registered under "installed packages" , then when the agent is executing the experiment, it will update back the entire environment (including derivative packages etc.)
That said, you can set detect_with_pip_freeze to true (in trains.conf) and it will basically store the entire pip freeze.
https://github.com/allegroai/trains/blob/f8ba0495fb3af1f99732fdffbbccd2fa992934a4/docs/trains.c...
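For example, in trains.conf (a sketch; the key sits under the sdk.development section):
sdk {
    development {
        # store the full `pip freeze` instead of only the directly imported packages
        detect_with_pip_freeze: true
    }
}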
Actually you cannot breakpoint at "atexit" calls (or at least it doesn't work with my gdb)
But I would add a few prints here:
https://github.com/allegroai/clearml/blob/aa4e5ea7454e8f15b99bb2c77c4599fac2373c9d/clearml/task.py#L3166
BTW if the plots are too complicated to convert to interactive plotly graphs, they will be rendered to images and the server will show them. This is usually the case with seaborn plots
GiddyTurkey39 do you have an experiment with the jupyter notebook ?
DefeatedCrab47 no idea, but you are more than welcome to join the thread here and point it out:
https://github.com/PyTorchLightning/pytorch-lightning-bolts/issues/249
Hi FreshKangaroo33
clearml.conf is in HOCON format; to parse it you can use pyhocon:
https://github.com/chimpler/pyhocon
Or the built-in version bundled with clearml:
from clearml.utilities.pyhocon import ConfigFactory
config_dict = ConfigFactory.parse_string(text).as_plain_ordered_dict()
You can also just get the parsed object:
from clearml.config import config_obj
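A small end-to-end sketch of the above (the conf path, the api.web_server key, and the dotted get() path are just examples / assumptions):
import os
from clearml.utilities.pyhocon import ConfigFactory
from clearml.config import config_obj

# Parse the file yourself...
text = open(os.path.expanduser('~/clearml.conf')).read()
config_dict = ConfigFactory.parse_string(text).as_plain_ordered_dict()
print(config_dict['api']['web_server'])

# ...or use the object clearml already parsed on import
print(config_obj.get('api.web_server'))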
Hi GiddyTurkey39
Glad to see that you are already diving into the controllers (the stable release will be out early next week)
A bit of background on how the pipeline controllers are designed:
All steps in the pipeline are experiments already registered in the system (i.e. you can see them in the UI). Regardless on how you created those experiments they have to be there prior to the pipeline launch. The pipeline itself can be executed on any machine (it does very little, and...
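To make it concrete, a rough sketch of wiring a controller with the API from that era (treat the exact constructor as an assumption, later releases also take name/project/version; queue, project and task names here are placeholders):
from clearml.automation.controller import PipelineController

# The controller is just another Task; the steps below must already exist
# as experiments in the system -- they are cloned and enqueued at runtime
pipe = PipelineController(default_execution_queue='default', add_pipeline_tags=True)

pipe.add_step(name='stage_data',
              base_task_project='examples', base_task_name='data prep')
pipe.add_step(name='stage_train', parents=['stage_data'],
              base_task_project='examples', base_task_name='train model',
              parameter_override={'General/dataset_task_id': '${stage_data.id}'})

pipe.start()  # enqueues the steps in order, respecting the dependencies
pipe.wait()
pipe.stop()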
TrickyRaccoon92 I'm not sure I follow, the TB plots do show? And you want to add an additional plotly plot?
but I need to dig deeper into the architecture to understand what we need exactly from the k8s glue.
Once you do, feel free to share. Basically there are two options: use the k8s scheduler with dynamic pods, or spin up the trains-agent as a service pod and let it spin the jobs
HealthyStarfish45 you mean like replace the debug image viewer with custom widget ?
For the images themselves, you can get their URLs, then embed them in your static HTML.
You could also have your html talk directly with the server REST API.
What did you have in mind?
In both cases, if I get the element from the list, I am not able to get when the task started. Where is this info stored?
If you are using client.tasks.get_all(...), it should be under the started field.
Specifically you can probably also do:
queried_tasks = Task.query_tasks(additional_return_fields=['started'])
print(queried_tasks[0]['id'], queried_tasks[0]['started'])
FreshKangaroo33 you can:
from time import time
from datetime import datetime
Task.query_tasks(..., task_filter=dict(started=['<{}'.format(datetime.utcfromtimestamp(time()))]))
I think this should work
Hi SubstantialBaldeagle49
2. Sure, follow the backup procedure and restore on the new server
3. Yes
task = Task.get_task(task_id='aa')
task.get_logger().report_scalar(title='metric', series='series', value=0.5, iteration=0)
Hi SmarmySeaurchin8
I was wondering if I could change the commit id to the current one as well.
Actually that would be possible, but will need a bit of code to support controlling Task properties (not just configuration parameters)
How can I do that without running this Task on its own?
Assuming you have committed code that already supports it: you can clone the executed Task and then change the commit ID to the "latest on branch" (see the drop-down when editing)
Would t...
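Programmatically it would look roughly like this (clone + enqueue are standard calls; the commit itself is easiest to change in the UI as described; task id and queue are placeholders):
from clearml import Task

# Clone the already-executed Task; the clone is created in draft mode and is editable
cloned = Task.clone(source_task='<source-task-id>', name='same code, latest commit')

# Change the commit to "latest on branch" in the UI
# (EXECUTION tab -> SOURCE CODE -> commit drop-down), then enqueue:
Task.enqueue(cloned, queue_name='default')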
You can query the system and get all the experiments based on date, then grab the machine GPU metrics.
DefeatedCrab47 check the cleanup service, it queries the system with the APIClient.
https://github.com/allegroai/trains/blob/10ec4d56fb4a1f933128b35d68c727189310aae8/examples/services/cleanup/cleanup_service.py#L72
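A trimmed-down sketch of that pattern (same APIClient calls as the cleanup service; the date filter and returned fields are just examples):
from datetime import datetime, timedelta
from clearml.backend_api.session.client import APIClient

client = APIClient()
# e.g. all tasks whose status changed during the last 24 hours
tasks = client.tasks.get_all(
    status_changed=['>{}'.format(datetime.utcnow() - timedelta(days=1))],
    order_by=['-started'],
    page_size=100,
    page=0,
)
for t in tasks:
    print(t.id, t.started)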
SoreDragonfly16 the torchvision warning has nothing to do with the Trains warning.
The Trains warning means that somehow someone changed the state of the Task from running (in_progress) to "stopped" (aborted). Could it be that one of the subprocesses raised an exception?
JitteryCoyote63 to filter out 'archived tasks' (i.e. exclude archived tasks):
Task.get_tasks(project_name="my-project", task_name="my-task", task_filter=dict(system_tags=["-archived"]))
Hi @<1743079861380976640:profile|HighKitten20>
but when I try to use code stored in a GIT (Bitbucket) repo I got a repository cloning error, specifically
did you configure the git repo credentials (user / application password) here: None
The Task status is changed to "completed" only after all artifact uploads are completed.
JitteryCoyote63 that seems like the correct behavior for your scenario