It doesn't seem to be related to the upload. The upload itself finished... What's your Trains version?
PungentLouse55 could you test with 0.15.2rc0 and see if there is any difference?
PungentLouse55 hmmm
Do you have an idea on how we could quickly reproduce it?
And is the Executor actually running something, or is it just IO?
Well it seems we forgot that one :) I'll quickly make sure it is there.
As a quick solution (no need to upgrade): task.models["output"]._models.keys()
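A minimal sketch of using that workaround, assuming `task` is the Task whose output models you want to list (the task id below is a placeholder):
```python
from clearml import Task  # `from trains import Task` on older versions

# placeholder id, point this at the task whose output models you want to inspect
task = Task.get_task(task_id="aabbccdd")

# workaround: the private _models dict holds the output models keyed by name
print(list(task.models["output"]._models.keys()))
```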
let's call it an applicative project which has experiments, and an abstract/parent project, or some other name that groups applicative projects.
That was my way of thinking, the guys argued it will soon "deteriorate" into the first option :)
PompousBeetle71 that actually brings me to another question, how do you feel about "parent" experiment ?
Since this fix is all about synchronizing different processes, we wanted to be extra careful with the release. That said I think that what we have now should be quite stable. Plan is to have the RC available right after the weekend.
Hi VexedKangaroo32 , funny enough this is one of the fixes we will be releasing soon. There is a release scheduled for later this week, right after that I'll put here a link to an RC containing a fix to this exact issue.
BTW, VexedKangaroo32 are you using torch launch ?
Just one more question, do you have any idea about how I could change the x-axis label from "Iterations" to "Epochs"?
You mean in the UI (i.e. just the title) ? or are you actually reporting iterations instead of epochs? and if so is this auto connected to tensorboard or is it reported manually ?
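If it turns out you are reporting the scalars manually, a minimal sketch of driving the x-axis by epoch instead of iteration (the training function and names here are hypothetical):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="epoch reporting")  # placeholder names
logger = task.get_logger()

for epoch in range(10):
    loss = train_one_epoch()  # hypothetical training step returning a float
    # whatever you pass as `iteration` is what the scalar plot counts on the x-axis,
    # so passing the epoch index makes the plot advance once per epoch
    logger.report_scalar(title="loss", series="train", value=loss, iteration=epoch)
```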
Hi @<1637624975324090368:profile|ElatedBat21>
I think that what you want is:
Task.add_requirements("unsloth", "@ git+
")
task = Task.init(...)
after you do that, what are you seeing in the Task "Installed Packages" ?
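For reference, a minimal sketch of the intended pattern; the git URL below is a placeholder for whichever repository/branch you actually need:
```python
from clearml import Task

# call add_requirements *before* Task.init() so the requirement is recorded on the Task
Task.add_requirements("unsloth", "@ git+https://github.com/<org>/unsloth.git")  # placeholder URL
task = Task.init(project_name="examples", task_name="unsloth run")  # placeholder names
```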
Hi @<1543766544847212544:profile|SorePelican79>
You want the pipeline configuration itself, not the pipeline component, correct?
pipeline = Task.get_task(Task.current_task().parent)
conf_text = pipeline.get_configuration_object(name="config name")
conf_dict = pipeline.get_configuration_object_as_dict(name="config name")
still, it is a ChatGPT interface, correct?
Actually, no. And we will change the wording on the website so it is more intuitive to understand.
The idea is you actually train your own model (not chatgpt/openai) and use that model internally, which means everything is done inside your organisation, from data through training and ending with deployment. Does that make sense ?
- Triton server does not support saving models off to normal RAM for faster loading/unloading
Correct, the enterprise version also does not support RAM caching
Therefore, currently, we can deploy 100 models when only 5 can be concurrently loaded, but when they are unloaded/loaded (automatically by ClearML), it will take a few seconds because it is being read from the SSD, depending on the size.
Correct, there is also deserializing CPU time (imagine unpickling a 20GB file, this takes ...
Thanks SarcasticSparrow10 !
I'll reply on the GitHub issue later (for better visibility)
But my initial thoughts:
(1) I think this was suggested, and hopefully we will get to implementing it, I can definitely see the value. Meanwhile you can achieve some of the functionality with the experiment table and custom columns :)
(2) "Don't display the performance metric" -> isn't that important? what am I missing?
(3) Hmm you mean just extra columns?
(4) sounds like a bug
(5) is this a plotly issue?...
Hi WickedGoat98
"Failed uploading to //:8081/files_server:"
Seems like the problem. what do you have defined as files_server in the trains.conf
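For reference, a sketch of the relevant section in trains.conf with placeholder host values; the empty host in your error suggests files_server is missing or blank there:
```
api {
    api_server: "http://<server-ip>:8008"
    web_server: "http://<server-ip>:8080"
    # the files server the SDK uploads artifacts/debug samples to
    files_server: "http://<server-ip>:8081"
}
```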
Hmm, yes this fits the message. Which basically says that it gave up on analyzing the code because it ran out of time. Is the execution very short? Or the repo very large?
Hi VexedKangaroo32, there is now an RC with a fix: pip install trains==0.13.4rc0
Let me know if it solved the problem
Thanks VexedKangaroo32 , this is great news :)
My question is what should be the path to the requirements.txt file?
Is it relative to the repo base?
This is actually in runtime (i.e. when running the code), so relative to the working directory. Make sense ? (you can specify absolute path, probably something I would avoid in the code base though...)
SubstantialElk6 feel free to tweet at them about their very inaccurate comparison table :)
Hi StickyMonkey98
a very large number of running and pending tasks, and doing that kind of thing via the web-interface by clicking away one-by-one is not a viable solution.
Bulk operations are now supported, upgrade the clearml-server to 1.0.2 :)
Is it possible to fetch a list of tasks via Task.get_tasks,
Sure: Task.get_tasks(project_name='example', task_filter=dict(system_tags=['-archived']))
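And a small sketch of doing something with the returned list (id/name are standard Task properties):
```python
from clearml import Task

tasks = Task.get_tasks(
    project_name="example",
    task_filter=dict(system_tags=["-archived"]),  # skip archived tasks
)
for t in tasks:
    print(t.id, t.name)
```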
So in a simple "all-or-nothing"
Actually this is the only solution unless preemption is supported, i.e. aborting a running Task to free up an agent...
There is no "magic" solution for complex multi-node scheduling, even SLURM will essentially do the same ...
Hi UnsightlySeagull42
But now I need the hyperparameters in every python file.
You can always get the Task from anywhere: main_task = Task.current_task()
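A minimal sketch of that pattern, assuming the main script already called Task.init() and this is some other module running in the same process (module and function names are hypothetical):
```python
# utils.py - hypothetical helper module imported by the main script
from clearml import Task

def get_hyperparams():
    # Task.current_task() returns the Task created by Task.init() in the main script
    main_task = Task.current_task()
    # note: get_parameters() returns the values as strings
    return main_task.get_parameters()
```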
The problem is not really for the agents to wait (this is easily solved by additional high priority queue) the problem is will you have a "free" agent... you see my point ?
Hi ProudChicken98
task.connect(input) preserves the types based on the "input" dict types; on the flip side, get_parameters returns the string representation (as stored on the clearml-server).
Is there a specific reason for using get_parameters over connect?
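To make the difference concrete, a small sketch (project/parameter names and values are just illustrative):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="connect vs get_parameters")  # placeholder names

config = {"batch_size": 32, "lr": 0.001, "use_gpu": True}
task.connect(config)

# connect() keeps the original python types on the connected dict
print(type(config["batch_size"]), type(config["use_gpu"]))  # int, bool

# get_parameters() returns the server-side string representation
params = task.get_parameters()
print(type(params["General/batch_size"]))  # str (keys are prefixed by section, "General" by default)
```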