If the right properties are set, can the profile tab be added?
I guess that is doable, that said some of the graphs are not straightforward to support, like this one:
https://www.tensorflow.org/guide/images/tf_profiler/trace_viewer.png
Hi OutrageousGrasshopper93
I think that what you are looking for is Task.import_task and Task.export_task
https://allegro.ai/docs/task.html#trains.task.Task.import_task
https://allegro.ai/docs/task.html#trains.task.Task.export_task
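A minimal sketch of moving a task definition around with those two calls (the task ID and file name are placeholders):

from trains import Task  # "from clearml import Task" in newer versions
import json

# export the full task definition as a dict and dump it to disk
source = Task.get_task(task_id="<source_task_id>")
with open("task.json", "w") as f:
    json.dump(source.export_task(), f)

# later (or on another server): recreate a task from the exported data
with open("task.json") as f:
    new_task = Task.import_task(json.load(f))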
Yup, I updated this in my local clearml.conf... or should I be updating this elsewhere as well?
On the agent's machine, you should update the default_output_uri. Make sense?
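For reference, a sketch of the relevant clearml.conf section on the agent's machine (the URI itself is a placeholder):

sdk {
    development {
        # default location for uploaded models / artifacts
        default_output_uri: "s3://my-bucket/clearml-outputs"
    }
}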
seems it was fixed 🙂
MagnificentWorm7 thank you for noticing ! 🙏
Hi @<1533620191232004096:profile|NuttyLobster9>
... but no system stats ...
If the job is too short (I think 30 seconds), it doesn't have enough time to collect stats (basically it collects them over a 30 sec window, but the task ends before it sends them)
Does that make sense?
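If you need stats from short-lived jobs, one knob worth checking (an assumption on my part, verify against your clearml.conf reference) is the worker reporting period:

sdk {
    development {
        worker {
            # how often resource monitoring samples are sent (seconds)
            report_period_sec: 5
        }
    }
}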
Omg that's a lot of submodules!
It has nothing to do with what the task sees; if you are inside a git repo, you will have to clone it on the remote machine. Let me check in the code, maybe you have a workaround
That is quite neat! You can also put a soft link from the main repo to the submodule for better visibility
I double checked the code, it's always being passed 😞
Hi @<1637624975324090368:profile|ElatedBat21>
I think that what you want is:
Task.add_requirements("unsloth", "@ git+
")
task = Task.init(...)
After you do that, what do you see in the Task's "Installed Packages"?
Makes sense. BTW: you can manually add data visualization to a Dataset with dataset.get_logger().report_table(...)
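A minimal sketch of that call (the dataset name/project and the DataFrame contents are placeholders):

from clearml import Dataset
import pandas as pd

dataset = Dataset.create(dataset_name="my-dataset", dataset_project="my-project")
preview = pd.DataFrame({"file": ["a.png", "b.png"], "label": [0, 1]})
# attach a table so it shows up with the dataset in the UI
dataset.get_logger().report_table(title="Data sample", series="head", iteration=0, table_plot=preview)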
(I think the GCP is already up, I'll double check)
The idea of queues is, on the one hand, not to give users too much freedom, and on the other, to allow for maximum flexibility & control.
The granularity offered by K8s (and as you specified) is sometimes way too detailed for a user. For example, I know I want 4 GPUs, but 100GB disk-space? No idea, just give me 3 levels to choose from (if any; actually I would prefer a default that is large enough, since this is by definition for temp cache only), and the same argument goes for the number of CPUs..
Ch...
There is a version coming out next week, the one after it (probably 2/3 weeks later) will have this feature
The easiest is export_task / update_task:
https://allegro.ai/docs/task.html#trains.task.Task.export_task
https://allegro.ai/docs/task.html#trains.task.Task.update_task
Check the structure returned by export_task; you'll find the entire configuration text there.
Then you can use that to update the Task back.
BTW:
Partial update is also supported...
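Putting it together, a sketch (the task ID is a placeholder) that leans on that partial update:

from trains import Task  # "from clearml import Task" in newer versions

task = Task.get_task(task_id="<task_id>")
exported = task.export_task()  # inspect the full structure here
# partial update: pass back only the fields you want changed
task.update_task({"name": exported["name"] + " (edited)"})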
Meanwhile you can just sleep for 24 hours and put it all on the services queue. It should work 🙂
Example here:
https://github.com/allegroai/trains/blob/master/examples/services/cleanup/cleanup_service.py
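The gist of that pattern, as a sketch (do_cleanup is a hypothetical stand-in for the actual maintenance logic in the linked example):

import time
from clearml import Task  # "from trains import Task" on older setups

task = Task.init(project_name="DevOps", task_name="cleanup", task_type=Task.TaskTypes.service)

def do_cleanup():
    # hypothetical: whatever maintenance the service should perform
    pass

while True:
    do_cleanup()
    time.sleep(24 * 60 * 60)  # sleep for 24 hours between runs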
Hi LazyLeopard18
I suggest removing the trains.conf and running:
trains-init
At the end of the wizard it verifies the credentials, so you should be good to go.
I would also recommend using the machine IP and not localhost, as on some setups (Windows / VM etc.) localhost will not be bridged to the VM/Docker but the machine IP will be.
command line 🙂
cmd.exe / bash
BTW trains-agent will not delete the venv until the next run, so you can check exactly what's missing there
Thanks MagnificentPig49 !
MagnificentPig49 that's a good question, I'll ask the guys 🙂
BTW, I think the main issue is actually making sure there is enough documentation on how to compile it...
Anyhow I'll update here
should i only do mongodb
No, you should do all 3 DBs: Elasticsearch (ELK), Mongo, Redis
im not running in docker mode though
Hmmm, that might be the first issue. It cannot skip venv creation; it can, however, use a pre-existing venv (but it will change it every time it installs a missing package)
so setting CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1 in non-docker mode has no effect
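For contrast, a sketch of where that variable does take effect, i.e. docker mode (queue name and image are placeholders):

CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1 clearml-agent daemon --queue default --docker python:3.9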
It's more or less here:
https://github.com/allegroai/clearml-session/blob/0dc094c03dabc64b28dcc672b24644ec4151b64b/clearml_session/interactive_session_task.py#L431
I think that just replacing the package would be enough (I mean you could choose hub/lab, which makes sense to me)
WorriedParrot51 I now see ...
Two solutions that I can quickly think of:
1. In the code add:
import sys
sys.path.append('./my_sub_module')
Assuming you always have to add the sub-directories to make the code work, and assuming they are part of the repository, this is probably the stable solution.
2. In the UI, in the Docker base image, add -e PYTHONPATH=/folder
or from code (which is exactly what you did), a cleaner interface:
task.set_base_docker("nvidia/cuda -e PYTHONPATH=/folder")
BroadMole98 as one can expect, long answer as well 🙂
I have a workflow with 19000 job nodes in it.
wow, 19k job nodes? As in a single pipeline with 19k steps?
The main idea of the trains-agent is to allow multi-node workloads and creating pipelines on top of a scheduler without worrying about docker packaging (done automatically for you), and to have a proper scheduler with priorities (which is missing from k8s)
If the first step is just "logging" all the steps, you can easily add "Task...
Hi BroadMole98
What I think I am understanding about trains so far is that it's great at tracking one-off script runs and storing artifacts and metadata about training jobs, but doesn't replace kubeflow or snakemake's DAG as a first-class citizen. How does Allegro handle DAGgy workflows?
Long story short, yes, you are correct. kubeflow, and snakemake for that matter, are all about DAGs where each node runs a docker (bash) for you. The missing portions (for both) are:
How do I cr...
BroadMole98
I'm still exploring what trains is for.
I guess you can think of Trains as an experiment manager + MLOps tied together.
The idea is to give a quick and easy way to move from coding/running on one machine to scaling it to multiple remote machines, with everything that comes with it.
In some ways it is like snakemake: it sets up your environment and executes the code. Snakemake also lets you set up data, which in Trains is done via code (StorageManager); pipelines are also...
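For the data part, a minimal sketch of the StorageManager pattern mentioned above (the remote URL is a placeholder):

from clearml import StorageManager

# download (and locally cache) a remote file; returns the local path
local_path = StorageManager.get_local_copy(remote_url="s3://my-bucket/data/train.zip")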