Sorry @<1657918706052763648:profile|SillyRobin38> I missed this reply
Is ClearML-Serving using either System or CUDA shared memory?
This needs to be set on the docker-compose:
and I think this line actually includes ipc: host which means there is no need to set the shm_size, but you can play around with it and let me know if you see a difference
[None](https://github.com/allegroai/clearml-serving/blob/7ba356efc97a6ae2159283d198d981b3c1ab85e6/docker/docker-compose-triton-gpu.yml#L1...
Wait, why aren't you just calling Popen? (or os.system), I'm not sure how it relates to the torch multiprocess example. What am I missing ?
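For reference, calling the external command directly could look roughly like the sketch below. `run_command` is a hypothetical helper (not from the thread), and the child process here is just the current Python interpreter printing one line, as a stand-in for whatever command you actually need to launch.

```python
import subprocess
import sys


def run_command(cmd):
    """Run `cmd` (a list of strings) and return (exit_code, combined_output)."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout
        text=True,
    )
    output, _ = proc.communicate()  # wait for the child to finish
    return proc.returncode, output


# Stand-in command: the same interpreter, printing a single line.
code, out = run_command([sys.executable, "-c", "print('hello from child')"])
print(code, out.strip())
```

Passing the command as a list avoids going through a shell, which is usually the safer default compared with os.system.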
SubstantialElk6 could you try with the latest (just released)?
pip install clearml-agent==0.17.2
Then if possible, could you attach the full log of the agent's execution (Task->results->Console)
No, it is zipped and stored, so in order to open the zipfile and read the files you have to download them.
That said everything is cached, so if the machine already downloaded the dataset there is zero download / unzipping,
Make sense?
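To illustrate the "download once, then serve from cache" idea, here is a toy sketch using only the standard library. It is not how ClearML implements it; `get_local_copy`, `fake_fetch` and the cache layout are assumptions made up for the example.

```python
import tempfile
import zipfile
from pathlib import Path


def get_local_copy(dataset_id, fetch_archive, cache_root):
    """Return an extracted folder for `dataset_id`, downloading only on a cache miss."""
    target = Path(cache_root) / dataset_id
    if target.is_dir():
        return target  # cache hit: no download, no unzipping
    archive = fetch_archive(dataset_id)  # cache miss: "download" the zip
    target.mkdir(parents=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
    return target


# Build a fake "remote" archive so the sketch is self-contained.
workdir = Path(tempfile.mkdtemp())
archive_path = workdir / "remote.zip"
with zipfile.ZipFile(archive_path, "w") as zf:
    zf.writestr("data.csv", "a,b\n1,2\n")

downloads = []  # records every simulated download


def fake_fetch(dataset_id):
    downloads.append(dataset_id)
    return archive_path


first = get_local_copy("abc123", fake_fetch, workdir / "cache")
second = get_local_copy("abc123", fake_fetch, workdir / "cache")
print(len(downloads))
```

The second call returns the same folder without touching `fake_fetch`, which is the behaviour described above for a machine that already has the dataset.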
MelancholyBeetle72 it will be great if you could also open an issue on Trains and reference the pytorch lightning issue, could you please?
Hmmm:
WOOT WOOT we broke the record! Objective reached 17.071016994817196
WOOT WOOT we broke the record! Objective reached 17.14302934610711
These two seem strange, let me look into it
Hi ConvincingSwan15
A few background questions:
Where is the code that we want to optimize? Do you already have a Task of that code executed?
"find my learning script"
Could you elaborate? Is this connected to the first question?
MagnificentPig49 I was not aware of jsonargparse
From what I understand it's a nicer way to parse JSON configuration files, with an argparse-like interface. Did I get that correctly?
Regarding the missing argparser, you are correct, the auto-magic is not working since jsonargparse is calling an internal ArgParser function and not the external one (hence we miss it).
The quickest fix is adding the following line before you call parse_args():
task.connect(parent_parser)
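A minimal sketch of where that call would go. It uses plain argparse instead of jsonargparse to stay self-contained, and the project name, task name and arguments are made up for illustration. The clearml import is done lazily inside the function, so the parser part can be tried without a ClearML server.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser standing in for `parent_parser`."""
    parser = argparse.ArgumentParser(description="toy training script")
    parser.add_argument("--lr", type=float, default=0.01)
    parser.add_argument("--epochs", type=int, default=10)
    return parser


def parse_with_clearml(argv=None):
    """Connect the parser to the Task explicitly, then parse."""
    from clearml import Task  # lazy import: needs clearml installed and configured

    # Project and task names are placeholders.
    task = Task.init(project_name="examples", task_name="explicit parser connect")
    parent_parser = build_parser()
    task.connect(parent_parser)  # must happen before parse_args()
    return parent_parser.parse_args(argv)
```

Calling `parse_with_clearml(["--lr", "0.5"])` inside your script should then make the arguments show up on the Task, assuming the connect call behaves as described above.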
CheerfulGorilla72 as I understand there were some delays with the current release, so it is going to be out this week. The one after that includes this feature and as far as I understand would be mid Dec.
if you have cuda 10.2, then the torch 1.3.1 from the cu101 version should work
Can you fix locally, just to verify ?
the only problem with it is that it will start the task even if the task is completed
What are the criteria?
Sounds good, I assumed that was the case but I was not sure.
Let's make sure that in the clearml.conf
we write it in the comment above the use_credentials_chain
option, so that when users look for IAM roles configuration they can quickly search for it
So I think this is a good example of pipelines and data:
Basically Task A generates data stored using clearml-data
(See Dataset class). The output of that is an ID of the Dataset. Then Task B uses that ID to retrieve the Dataset created by Task A.
documentation
https://github.com/allegroai/clearml/blob/master/docs/datasets.md
Example:
Step A creating Dataset:
https://github.com/alguchg/clearml-demo/blob/main/process_dataset.py
Step B training model using the Dataset created in ...
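The two steps can be sketched roughly as below. This is not the code from the linked examples; the function names, dataset name and project name are placeholders, and the clearml imports are lazy so the snippet can be pasted anywhere.

```python
def create_dataset(data_dir, name="demo-dataset", project="demo-project"):
    """Task A: register the files under `data_dir` and return the Dataset ID."""
    from clearml import Dataset  # lazy import: needs clearml installed and configured

    dataset = Dataset.create(dataset_name=name, dataset_project=project)
    dataset.add_files(path=data_dir)
    dataset.upload()    # push the files to storage
    dataset.finalize()  # close the dataset so it can be consumed
    return dataset.id


def load_dataset(dataset_id):
    """Task B: fetch (or reuse the cached copy of) the Dataset created by Task A."""
    from clearml import Dataset

    return Dataset.get(dataset_id=dataset_id).get_local_copy()
```

Task A would pass the returned ID along (for example as a pipeline parameter), and Task B calls `load_dataset` with it to get a local folder to train from.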
it does not include the “internal.repo” as a package dependency, so it crashes.
understood
And for the time being we have not used the decorators,
So how are you building the pipeline component ?
Queues can have multiple workers, and that implies multiple instances of a task can run concurrently.
@<1533619716533260288:profile|SmallPigeon24> as long as these are the exact same instances you can have them running simultaneously (think multi node training), that said each one should "know" not to report over the others, because of course it will overwrite the reports.
Back to your point on multiple agents:
You cannot have two Tasks in the same queue, that means that a single agen...
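As a toy model of several workers listening on one queue (plain Python threads, not ClearML agents, and `run_workers` is a made-up helper), each queued entry is pulled by exactly one worker:

```python
import queue
import threading


def run_workers(task_ids, num_workers):
    """Let `num_workers` threads drain one shared queue; return (worker, task) pairs."""
    q = queue.Queue()
    for task_id in task_ids:
        q.put(task_id)

    executed = []
    lock = threading.Lock()

    def worker(name):
        while True:
            try:
                task_id = q.get_nowait()  # each entry is handed to one worker only
            except queue.Empty:
                return
            with lock:
                executed.append((name, task_id))

    threads = [
        threading.Thread(target=worker, args=(f"worker-{i}",))
        for i in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return executed


results = run_workers(["task-a", "task-b", "task-c", "task-d"], num_workers=2)
print(sorted(task for _, task in results))
```

Which worker gets which entry is not deterministic, but no entry is ever executed twice.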
Oh that makes sense:
` import os

# Create a child process using the os.fork() method
pid = os.fork()
if pid > 0:
    # pid greater than 0 represents
    # the parent process
    print("I am parent process:")
    print("Process ID:", os.getpid())
    print("Child's process ID:", pid)
else:
    # pid equal to 0 represents
    # the created child process
    print("\nI am child process - this is still fully auto logged")
    print("Process ID:", os.getpid())
    print("Parent's process ID:", o...
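The same fork pattern as a self-contained sketch that runs without ClearML (POSIX only). The pipe and the `fork_and_collect` helper are additions for the example, so the parent can confirm the child actually ran.

```python
import os


def fork_and_collect(message: bytes) -> bytes:
    """Fork a child that writes `message` to a pipe; the parent reads it back."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: send the message, then exit without running parent cleanup.
        try:
            os.close(read_fd)
            os.write(write_fd, message)
            os.close(write_fd)
        finally:
            os._exit(0)
    # Parent process: read until the child closes its end, then reap the child.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as reader:
        data = reader.read()
    os.waitpid(pid, 0)
    return data


received = fork_and_collect(b"hello from child")
print(received.decode())
```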
I have to admit, I haven't had the time
Trying to get pip to be twice as fast
https://github.com/pypa/pip/pull/8215
Please keep pinging me, I would really like to follow on it.
WackyRabbit7 just making sure I understand:
MedianPredictionCollector.process_results
Is called after the pipeline is completed.
Then inside the function, Task.current_task() returns None.
Is this correct?
Hi @<1544128915683938304:profile|DepravedBee6>
You mean like backup the entire instance and restore it on another machine? Or are you referring to specific data you want to migrate?
BTW if you are upgrading old versions of the server I would recommend upgrading to every version in the middle (there are some migration scripts that need to be run in a few of them)
HurtWoodpecker30
"The agent uses the requirements.txt"
What do you mean by that? Aren't the packages listed in the "Installed packages" section of the Task?
(or is it empty when starting, i.e. it uses the requirements.txt from the github, and then the agent lists them back into the Task)
Hi ResponsiveCamel97
The agent generates a new configuration file to be mounted into the docker, with all the new folders as they will be seen inside the docker itself. One of the changes is the system_site_packages as inside the docker we want the new venv to inherit everything from the docker system installed packages.
Make sense ?
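To see what "inherit everything from the system installed packages" means in practice, here is a small sketch with the standard venv module. It only illustrates the flag of the same name in the standard library, it is not the agent's own code, and `create_inheriting_venv` is a made-up helper.

```python
import tempfile
import venv
from pathlib import Path


def create_inheriting_venv(target_dir):
    """Create a venv that can also see the system site-packages."""
    builder = venv.EnvBuilder(system_site_packages=True, with_pip=False)
    builder.create(target_dir)
    return Path(target_dir) / "pyvenv.cfg"


cfg_path = create_inheriting_venv(Path(tempfile.mkdtemp()) / "venv")
cfg_text = cfg_path.read_text()
print(cfg_text)
```

The generated pyvenv.cfg records the choice, so packages already installed in the docker image remain importable from inside the new venv.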
ohh sorry, weights_url=path
Basically url can be the local path to the weights file
So maybe the path is related to the fact I have venv caching on?
hmmm could be...
Can you quickly disable the caching and try ?
and they don't know how to write code, is this still possible?
Well, this means there is some standard for the data, right? What is that standard? Unfortunately in our space there is no standard for data, it's just too generic, so everyone always ends up with custom parsing of a sort.
Does that make sense ?
HugeArcticwolf77 you can add --services-mode
to the agent, and it will basically keep on spinning Tasks in parallel (unfortunately the open source version does not include a way to limit it to a maximum of concurrent Tasks)
Hi @<1571308003204796416:profile|HollowPeacock58>
I'm assuming this is the ARM support (i.e. you are running on a new Mac) fix we released in one of the last clearml-agent versions. Could you update to the latest clearml-agent?
pip3 install clearml-agent==1.6.0rc2
How can I add additional information, e.g. debug samples, or scalar to the data to be shown in the UI? Logger.current_logger() is not working
Yes
dataset.get_logger() to the rescue
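A small sketch of how that could look. `report_dataset_stats` is a hypothetical helper, and the title and series names are placeholders. It takes the Dataset object you are already working with, so nothing here needs to import clearml.

```python
def report_dataset_stats(dataset, num_rows, iteration=0):
    """Report a scalar on the Dataset's own logger so it shows up in the UI."""
    logger = dataset.get_logger()  # the logger attached to the dataset
    logger.report_scalar(
        title="dataset stats",  # placeholder plot title
        series="rows",          # placeholder series name
        value=num_rows,
        iteration=iteration,
    )
    return logger
```

Usage would be something like `report_dataset_stats(dataset, num_rows=1000)` on a Dataset you are still building, before finalizing it.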
Closing the data doesn't work: dataset.close() AttributeError: 'Dataset' object has no attribute 'close'
Hi @<1523714677488488448:profile|NastyOtter17> could you send the full exception?